CN105354571B - Distortion text image baseline estimation method based on curve projection - Google Patents

Distorted text image baseline estimation method based on curve projection

Info

Publication number
CN105354571B
Authority
CN
China
Prior art keywords
image
projection
line
strip
strip image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510695611.7A
Other languages
Chinese (zh)
Other versions
CN105354571A (en)
Inventor
孟高峰
潘春洪
向世明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201510695611.7A priority Critical patent/CN105354571B/en
Publication of CN105354571A publication Critical patent/CN105354571A/en
Application granted granted Critical
Publication of CN105354571B publication Critical patent/CN105354571B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/146: Aligning or centring of the image pick-up or image-field
    • G06V30/1475: Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478: Inclination or skew detection or correction of characters or of image to be recognised, of characters or character lines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G06V10/247: Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a distorted text image baseline estimation method based on curve projection. The method at least includes: extracting the edge image of a distorted text image; cutting the edge image into strip images; calculating the projection map of each strip image; estimating the optimal projection lines of each strip image according to its projection map; and obtaining the baselines of the distorted text image from the optimal projection lines and the boundary lines of the strip images. The invention at least solves the technical problem of how to extract the curved baselines of a distorted text image.

Description

Distorted text image baseline estimation method based on curve projection
Technical Field
The embodiment of the invention relates to the technical field of digital image processing and computer vision, in particular to a distorted text image baseline estimation method based on curve projection.
Background
When a camera is used for shooting pages of curved documents such as books and periodicals, the obtained images are often accompanied with severe geometric distortion due to the perspective effect of the camera and the bending of the pages. Such geometric distortion causes serious problems in subsequent text image analysis, such as image layout analysis and character recognition, and therefore, it is often necessary to first perform distortion correction on a distorted text image. One of the first problems involved in this is how to robustly and accurately extract the baseline of curved text lines in the image.
The text line baselines are a cluster of invisible horizontal lines in a text image, parallel to each other, along which the printed content of the document (e.g., text lines, charts, etc.) is aligned. For curved document pages, the cluster of baselines is generally no longer a cluster of straight lines but a cluster of curves. Moreover, because of the perspective effect of the camera, the text line baselines in the image are no longer parallel to each other. In addition, owing to the complex and varied layouts of document pages and the ubiquity of factors such as non-character interference, imaging noise, image occlusion, low resolution and character blurring, accurate and robust extraction of text line baselines from distorted text images is usually very challenging.
To estimate curved text line baselines, a common approach first segments the horizontal text lines in the image, then extracts a corresponding reference point for each character in a text line, and finally fits the reference points with a B-spline curve to obtain an estimate of the text line baseline. According to how the text lines are obtained, such approaches can be further subdivided into methods based on text line tracking, methods based on connected-branch clustering, and methods based on image segmentation.
Early text line tracking was performed directly on binary images. Such a method first selects, according to a certain strategy, a connected branch in the binary image as the starting seed for tracking, and then grows the seed by searching the neighboring connected branches around it. It should be noted that connected branches often stick together because of image blur and low resolution, so tracking at the level of connected branches is often unstable and produces a large number of tracking errors. Furthermore, this kind of method is very sensitive to language: most Chinese characters are composed of many connected branches, so tracking on Chinese documents often yields erroneous text lines.
An improvement on the above methods is to perform text line tracking directly on the grayscale image. One proposed filtering-based method uses a set of anisotropic Gaussian filters to filter the grayscale image and extract the ridge lines of the text lines, and then tracks the obtained ridge lines to extract the text lines. Another improved method exploits the high similarity between image blocks from the same text line: a self-similarity measure between image blocks is introduced, and a text line tracking algorithm is built on this measure. Methods based on text line tracking are typically very sensitive to image noise; in addition, non-text objects and complex layout structures in text images often cause the tracking algorithm to fail.
The problem of extracting text lines can generally be viewed as a clustering problem over connected branches. On this basis, a bottom-up method was recently proposed to segment text lines in handwritten Chinese document images. The method first constructs a distance measure between connected branches using supervised learning, then organizes all the connected branches in the binary image into a tree structure using a minimum spanning tree, and finally prunes the minimum spanning tree dynamically to obtain the segmented text lines. Following a similar idea, other researchers have cast text line segmentation as an energy minimization problem over the states of the connected branches in the image, introducing a cost function to measure the interaction between text lines and the bending of text lines, and solving the optimization problem with a graph cut method to obtain the text line segmentation result. Methods based on connected-branch clustering are generally more robust than those based on text line tracking. However, the large number of manually set parameters, the heuristic merging rules, and the inability to handle changes in connected-branch topology often lead to poor performance in practical applications.
Unlike the above methods, methods based on image segmentation treat text line extraction as a classical image segmentation problem. On this basis, a text line segmentation method based on density estimation and image level sets has been proposed. One significant advantage of this type of approach is that it is language independent and can therefore be applied to text images in different languages. Inspired by the seam carving technique, some researchers have applied seam carving directly to text line segmentation of text images, with good results. Like most image segmentation methods, however, the image segmentation based approach has a significant limitation: its segmentation quality is very sensitive to image noise, image resolution, and adhesion between characters, all of which are very common in text images captured by cameras (especially camera phones).
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
The embodiment of the invention mainly aims to provide a distorted text image baseline estimation method based on curve projection, which at least partially solves the technical problem of how to estimate a distorted text image baseline.
In order to achieve the above object, according to one aspect of the present invention, the following technical solutions are provided:
a method for distorted text image baseline estimation based on curvilinear projection, the method comprising at least:
extracting an edge image of the distorted text image;
cutting the edge image into strip images;
calculating a projection map of the strip image;
estimating an optimal projection line of the strip image according to the projection map;
and obtaining a baseline of the distorted text image according to the optimal projection line and the boundary line of the strip image.
Further, the extracting the edge image of the distorted text image specifically includes:
step 1: calculating an edge image of the input image by using a Canny operator;
step 2: performing morphological closing operation and removing operation on the edge image;
and step 3: and (3) performing morphological dilation operation on the image obtained in the step (2).
Further, the calculating the projection map of the strip image specifically includes:
calculating the Radon transform matrix R̃(ρ, θ) corresponding to the strip image, and performing a coordinate transformation on R̃(ρ, θ) to obtain the projection map R(k, θ) corresponding to the strip image, wherein the coordinate transformation formula is:
R(k, θ) = R̃((k − H/2)·sin θ, θ),  θ ∈ [α, β],  k = 1, …, H,
that is, the projection line through the k-th center line point has distance ρ = (k − H/2)·sin θ from the strip image center,
wherein,
H represents the height of the strip image;
k represents the row index of the strip image;
ρ represents the distance from the center of the strip image to the projection line;
θ represents the angle between the normal of the projection line of the strip image and the abscissa axis of the strip image;
α represents the minimum angle between the projection line of the strip image and the abscissa axis;
β represents the maximum angle between the projection line of the strip image and the abscissa axis.
Further, the estimating an optimal projection line of the strip image according to the projection map specifically includes:
and constructing a constraint optimization problem on the projection graph R (k, theta), and calculating an optimal projection line of the strip image by using a dynamic programming algorithm. Wherein the constraint optimization problem is as follows:
wherein,
θkan optimal included angle parameter corresponding to a projection line representing the k (k is 1, …, H) th row center point of the strip image;
p represents a power exponent parameter of the projection value;
λ represents a weight parameter;
φ(θ1,…,θH) A smoothing term representing the projection line angle parameter, used to smooth the angle parameter of adjacent projection lines, is defined as follows:
wherein, σ is a set parameter for controlling the sensitivity of the smoothing term to the difference between the included angles of the adjacent projection lines.
Further, the calculating an optimal projection line of the strip image by using a dynamic programming algorithm specifically includes:
a weighted directed graph is constructed that is,
discrete sampling is carried out on the k coordinate and the theta coordinate of the projection graph R (k, theta) to obtain a series of grid points (k-theta) on a k-theta planesj)(1≤s≤n,1≤j≤m),
Wherein k issAre discrete sample points in the k-direction,
θjare discrete sample points in the theta direction,
n is the total number of sample points in the k coordinate,
m is the total number of sample points in the theta coordinate,
taking the grid points as vertexes of a weighted directed graph;
if and only if two vertices (k)s-1i) And (k)sj) The corresponding projection lines meet the disjointness condition, a directed edge is constructed for the two vertexes, and a vertex (k) is recordeds-1i) Corresponding projection line angle isTwo adjacent projection linesAndare not intersected if and only ifWithin the following intervals:
wherein,
v_A represents the row coordinate of the intersection of the previous projection line with the left border of the strip image, calculated according to the formula v_A = k_{s−1} + (w/2)·cot θ̂_{s−1};
v_B represents the row coordinate of the intersection of the previous projection line with the right border of the strip image, calculated according to the formula v_B = k_{s−1} − (w/2)·cot θ̂_{s−1};
w represents the width of the strip image;
for connecting verticesAnda directed edge of (2), to which a weight is given
Wherein,
the p represents a specified power exponent;
said ΔkRepresenting a sampling interval for k coordinates of the projection view;
said λ represents a weight parameter;
the sigma represents a set parameter and is used for controlling the sensitivity of the smoothing item to the difference value of the included angles of the adjacent projection lines;
the h represents an angle step and is calculated according to the following formula:
solving for the longest path on the weighted directed graph.
Further, the obtaining a baseline of the distorted text image according to the optimal projection line and the boundary line of the strip image specifically includes:
calculating the intersection point of the optimal projection line and the left and right boundaries of the strip image by using a projection line equation for the optimal projection line passing through each point on the central line of the leftmost strip image of the image, wherein if the strip image is overlapped with the adjacent strip images, the central line of the overlapped part is selected as the boundary line of the strip image;
from left to right, taking the intersection point of the optimal projection line of the previous strip image and the right boundary thereof as the starting point of the optimal projection line of the current strip image, and calculating the intersection point of the optimal projection line of the strip image and the right boundary of the strip image by utilizing a strip projection line equation;
repeating the above processes until the calculation of the strip image positioned at the rightmost side of the image is completed;
and approximating all the intersection points by utilizing a cubic spline curve so as to obtain a base line.
Compared with the prior art, the technical scheme at least has the following beneficial effects:
the embodiment of the invention extracts the edge graph of the image of the curved document (such as a book), performs stripe segmentation on the edge graph, calculates the projection graph of the stripe image by using Radon transformation for each stripe image, and constructs a constraint optimization problem based on the graph to calculate the optimal projection line of the stripe image. The solution of the optimization problem can be converted into an optimal path calculation problem on a weighted directed graph, and can be quickly solved through a classical dynamic programming algorithm. And finally, connecting the optimal projection lines obtained on each strip image to obtain a complete baseline estimation of the image. The embodiment of the invention can be applied to extracting the base line of the curved document image, thereby being used for the distortion correction of the distorted book page image. The method is independent of text line segmentation, and has the advantages of low calculation complexity, high precision, wide applicability and the like. The embodiment of the invention can be applied to high-quality correction of distorted documents such as book pages and the like, and has wide application prospect in the fields of document data digitization, digital library construction, precious historical document book protection and the like.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the means particularly pointed out in the written description and claims hereof as well as the appended drawings.
It should be noted that this summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used to determine the scope of the claimed subject matter. The claimed subject matter is not limited to addressing any or all of the disadvantages noted in the background.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it. It is obvious that the drawings in the following description show only some embodiments, and that a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow diagram illustrating a distorted text image baseline estimation method based on curvilinear projection according to an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating a strip image coordinate system definition and projection line estimation in accordance with an exemplary embodiment;
FIG. 3a is an edge diagram of a stripe image shown in accordance with an exemplary embodiment;
FIG. 3b shows the result of a Radon transform of a strip image according to an example embodiment;
FIG. 3c is a projection diagram illustrating a strip image according to a Radon transform, according to an exemplary embodiment;
FIG. 4a is a schematic diagram illustrating construction of a weighted directed graph in accordance with an illustrative embodiment;
FIG. 4b is a diagram illustrating the calculation of an adjacent projection line disjoint constraint in accordance with an exemplary embodiment;
FIG. 5a is a diagram illustrating the results of calculating an optimal projection line on a stripe projection map, according to an exemplary embodiment;
FIG. 5b illustrates an optimal projection line on a strip image according to an exemplary embodiment;
FIG. 5c is a diagram illustrating the results of computing a histogram of the projection of a strip image along a fixed direction of the image in accordance with one illustrative embodiment;
FIG. 5d is a diagram illustrating the results of computing a histogram of the projection of the strip image along an optimal projection line in accordance with one illustrative embodiment;
FIG. 6 is a schematic diagram illustrating a strap baseline connection in accordance with an exemplary embodiment;
FIG. 7 is a diagram illustrating a baseline estimate of multiple distorted text images obtained using a method according to an embodiment of the invention, according to an illustrative embodiment.
These drawings and the description are not intended to limit the scope of the present invention in any way, but rather to illustrate the inventive concept to those skilled in the art by reference to specific embodiments.
Detailed Description
The technical problems solved, the technical solutions adopted and the technical effects achieved by the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings and the specific embodiments. It is to be understood that the described embodiments are merely a few, and not all, of the embodiments of the present application. All other equivalent or obviously modified embodiments obtained by the person skilled in the art based on the embodiments in this application fall within the scope of protection of the invention without inventive step. The embodiments of the invention can be embodied in many different ways as defined and covered by the claims.
It should be noted that in the following description, numerous specific details are set forth in order to provide an understanding. It may be evident, however, that the subject invention may be practiced without these specific details.
It should be noted that, unless explicitly defined or conflicting, the embodiments and technical features in the present invention may be combined with each other to form a technical solution.
In order to solve the problem of extraction of a curved baseline of a distorted text image, the embodiment of the invention provides a distorted text image baseline estimation method based on curve projection. FIG. 1 is a flow diagram illustrating a distorted text image baseline estimation method based on curvilinear projection according to an exemplary embodiment. As shown in fig. 1, the method may include at least steps S100 to S108.
S100: an edge image of the distorted text image is extracted.
In this step, the edge map of the input image is calculated using the Canny operator; the edge map is then subjected to a morphological closing operation and a small-component removing operation, and finally the resulting image is subjected to a morphological dilation operation. The Canny operator is a multi-stage edge detection algorithm comprising denoising, computing the intensity gradient of the image, non-maximum suppression, and edge tracking by hysteresis thresholding.
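As an illustration of step S100, the sketch below follows the same pipeline with scipy.ndimage; a Sobel gradient threshold stands in for the full Canny detector, and the thresholds, structuring elements and minimum component area are assumptions rather than values from the patent.

```python
import numpy as np
from scipy import ndimage

def extract_edge_image(gray, edge_thresh=60.0, min_area=10):
    """Edge map of a text image: gradient edges, morphological closing,
    removal of tiny connected components, then dilation."""
    gx = ndimage.sobel(gray.astype(float), axis=1)
    gy = ndimage.sobel(gray.astype(float), axis=0)
    edges = np.hypot(gx, gy) > edge_thresh            # stand-in for Canny
    # morphological closing bridges small gaps between character strokes
    closed = ndimage.binary_closing(edges, structure=np.ones((3, 3)))
    # "removing operation": drop connected components below min_area pixels
    labels, n = ndimage.label(closed)
    sizes = ndimage.sum(closed, labels, range(1, n + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(sizes >= min_area))
    # final dilation thickens the remaining edges
    return ndimage.binary_dilation(keep, structure=np.ones((3, 3)))
```

The returned boolean mask plays the role of the edge image that is subsequently cut into strips.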
S102: the edge image is sliced into strip images.
In this step, the edge image obtained above is cut, according to the image size, into a plurality of mutually overlapping vertical strip images, the height of each strip image being equal to the height of the image.
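A minimal sketch of step S102, assuming a fixed strip width and overlap (the patent chooses these according to the image size; the specific numbers here are illustrative):

```python
import numpy as np

def slice_into_strips(edge_img, strip_w=64, overlap=16):
    """Cut an edge image into full-height vertical strips that overlap
    by `overlap` columns; also return each strip's column bounds."""
    h, w = edge_img.shape
    step = strip_w - overlap
    strips, bounds = [], []
    for u0 in range(0, max(w - overlap, 1), step):
        u1 = min(u0 + strip_w, w)
        strips.append(edge_img[:, u0:u1])   # full height, partial width
        bounds.append((u0, u1))
        if u1 == w:
            break
    return strips, bounds
```

The overlap gives adjacent strips a shared band whose center line later serves as the strip boundary when the per-strip baselines are chained.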
S104: a projection view of the strip image is calculated.
In this step, for each strip image, the Radon transform matrix R̃(ρ, θ) corresponding to the strip image is calculated, where ρ and θ are the parameters of the projection line in the Radon transform: ρ is the distance from the center of the strip image to the projection line, and θ is the angle between the normal of the projection line and the x coordinate axis of the strip image. A coordinate transformation is then applied to the Radon transform matrix R̃(ρ, θ), and the projection map of the resulting strip image is denoted R(k, θ), wherein the coordinate transformation formula is:
R(k, θ) = R̃((k − H/2)·sin θ, θ),  θ ∈ [α, β],  k = 1, …, H,
wherein,
H represents the height of the strip image;
k represents the row index of the strip image;
ρ represents the distance from the center of the strip image to the projection line;
θ represents the angle between the normal of the projection line of the strip image and the abscissa axis of the strip image;
α represents the minimum angle between the projection line of the strip image and the abscissa axis;
β represents the maximum angle between the projection line of the strip image and the abscissa axis.
By way of example, FIG. 2 presents a schematic diagram of the computation of the strip projection map. For each strip image, two coordinate systems are established on the strip: an image coordinate system uov and a projection coordinate system xoy. The origin of the image coordinate system is located at the upper left corner of the strip image, with the u axis along the rows (horizontally) and the v axis along the columns (vertically); the origin of the projection coordinate system is located at the center of the strip image, with the x axis and y axis parallel to the u axis and v axis, respectively. The projection map of the strip image is obtained by computing, at each point on the center line of the strip image, the projection values of the image along a range of directions. In FIG. 2, p is a point on the center line of the strip image. Estimating the optimal projection lines of the strip amounts to determining the optimal projection line through each point on its center line. By way of example, FIG. 3a shows the edge map of a strip image according to an exemplary embodiment; FIG. 3b shows the result of the Radon transform of the strip image; and FIG. 3c shows the projection map of the strip image obtained from the Radon transform.
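The projection map described above can also be sketched directly, without a separate Radon transform step, by summing the strip image along the projection line through each center-line point for each sampled angle; the angle range and sampling density below are assumptions.

```python
import numpy as np

def projection_map(strip, alpha=np.deg2rad(60), beta=np.deg2rad(120), m=61):
    """R[k, j]: sum of the strip image along the line through the
    center-line point (w/2, k) whose normal makes angle thetas[j]
    with the abscissa axis."""
    h, w = strip.shape
    thetas = np.linspace(alpha, beta, m)
    R = np.zeros((h, m))
    u = np.arange(w, dtype=float)
    for j, th in enumerate(thetas):
        # line through (w/2, k) with normal (cos th, sin th):
        #   v(u) = k - (u - w/2) * cot(th)
        dv = -(u - w / 2) / np.tan(th)
        for k in range(h):
            v = np.rint(k + dv).astype(int)
            ok = (v >= 0) & (v < h)     # clip rows that leave the strip
            R[k, j] = strip[v[ok], u[ok].astype(int)].sum()
    return R, thetas
```

For a strip containing a horizontal text line, the row of R at that line peaks at θ = 90°, which is why maximizing the projection values recovers the local baseline direction.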
S106: from the projection map, the optimal projection line of the strip image is estimated.
In this step, a constrained optimization problem is constructed on the projection map R(k, θ), and all the optimal projection lines of the strip image are calculated using a dynamic programming algorithm. The constructed constrained optimization problem is:
max over (θ_1, …, θ_H) of  Σ_{k=1}^{H} R(k, θ_k)^p − λ·φ(θ_1, …, θ_H),
subject to adjacent projection lines being disjoint within the strip image,
wherein,
θ_k represents the angle parameter of the projection line through the center point of the k-th (k = 1, …, H) row of the strip image, and is the parameter to be estimated;
p represents a power exponent parameter of the projection values; preferably, p ≥ 3;
λ represents a weight parameter;
φ(θ_1, …, θ_H) is a smoothing term on the projection line angle parameters, used to smooth the angle parameters of adjacent projection lines, defined as:
φ(θ_1, …, θ_H) = Σ_{k=2}^{H} [1 − exp(−(θ_k − θ_{k−1})² / (2σ²))],
wherein, σ is a set parameter for controlling the sensitivity of the smoothing term to the difference between the included angles of the adjacent projection lines.
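Taking φ to be a Gaussian-shaped penalty on adjacent angle differences (an assumption consistent with σ's stated role as a sensitivity control), the objective can be evaluated as:

```python
import numpy as np

def objective(R, theta_idx, thetas, p=3, lam=1.0, sigma=np.deg2rad(2)):
    """Data term sum_k R(k, theta_k)^p minus lam * smoothing term phi.
    theta_idx[k] is the chosen angle index for center-line point k."""
    data = sum(R[k, j] ** p for k, j in enumerate(theta_idx))
    d = np.diff(thetas[np.asarray(theta_idx)])
    smooth = np.sum(1.0 - np.exp(-d ** 2 / (2 * sigma ** 2)))
    return data - lam * smooth
```

A constant angle sequence incurs zero smoothing penalty, so the smoothing term only discourages abrupt angle changes between adjacent projection lines.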
Here, dynamic programming refers to multi-stage optimization in which each decision depends on the current state and in turn causes a state transition; a decision sequence is generated over the changing states, and this process of solving a multi-stage optimization decision problem is called dynamic programming.
In an optional embodiment, when the optimal projection line is calculated by using a dynamic programming algorithm, a weighted directed graph needs to be constructed first, and the specific steps include:
firstly, discrete sampling is carried out on the k coordinate and the theta coordinate of a projection graph R (k, theta) to obtain a series of grid points (k-theta) on a k-theta planesj) (s is 1. ltoreq. n, j is 1. ltoreq. m) as shown in FIG. 4 a. Wherein k issIs a discrete sampling point in the k direction, thetajIs a discrete sampling point in the theta direction, n is the total number of sampling points in the k coordinate, and m is the total number of sampling points in the theta coordinate. These grid points are taken as vertices of the constructed weighted directed graph.
Second, the edges of the weighted directed graph are connected. A directed edge is constructed between two vertices (k_{s−1}, θ_i) and (k_s, θ_j) if and only if their corresponding projection lines satisfy the disjointness condition. For convenience, the projection line angle corresponding to the vertex (k_{s−1}, θ_i) is denoted θ̂_{s−1}, with the subscript indexing the corresponding vertex, and that corresponding to (k_s, θ_j) is denoted θ̂_s. The two adjacent projection lines θ̂_{s−1} and θ̂_s do not intersect within the strip image if and only if θ̂_s lies within the interval [arccot(2(k_s − v_B)/w), arccot(2(v_A − k_s)/w)],
wherein,
v_A represents the row coordinate of the intersection of the previous projection line with the left border of the strip image, calculated according to the formula v_A = k_{s−1} + (w/2)·cot θ̂_{s−1};
v_B represents the row coordinate of the intersection of the previous projection line with the right border of the strip image, calculated according to the formula v_B = k_{s−1} − (w/2)·cot θ̂_{s−1};
w represents the width of the strip image.
and finally, calculating the weight of the weighted directed graph edge. For connecting verticesAnda directed edge of (2), to which a weight is given
Wherein,
p represents a specified power exponent;
Δkrepresenting a sampling interval for k coordinates of the projection view;
λ represents a weight parameter;
sigma represents a set parameter for controlling the sensitivity of the smoothing term to the difference value of the included angles of the adjacent projection lines;
h represents the angular step and is calculated according to the following formula:
after the weighted directed graph is obtained, the constraint optimization problem is solved
And converting into the longest path planning problem on the constructed weighted directed graph. For this purpose, a virtual start point and an end point are added to the left and right of the weighted directed graph, respectively, where the start point is connected to all the leftmost vertices of the weighted directed graph, the end point is connected to all the rightmost vertices of the weighted directed graph, and the weights of all the connected edges are set to zero. Thus, the constraint optimization problem is converted to solve a longest path from a left starting point of the weighted directed graph to a right end point of the weighted directed graph. The problem is solved by a classical longest path problem, and can be quickly solved by a classical algorithm such as Dijkstra.
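A minimal sketch of the longest-path computation: because every edge points from column s−1 to column s, one dynamic-programming sweep suffices. For brevity this sketch replaces the geometric disjointness interval with a simple bound on the angle-index jump (`max_jump`) and takes Δ_k = 1; both are simplifying assumptions.

```python
import numpy as np

def longest_path(R, p=3, lam=1.0, sigma=1.0, max_jump=2):
    """Return, for each center-line sample, the angle index on the
    maximum-weight path through the grid graph."""
    n, m = R.shape                       # n center-line samples, m angles
    score = R[0] ** p                    # best value ending at (k_1, theta_j)
    back = np.zeros((n, m), dtype=int)
    for s in range(1, n):
        new = np.full(m, -np.inf)
        for j in range(m):
            lo, hi = max(0, j - max_jump), min(m, j + max_jump + 1)
            # predecessor score minus the smoothing penalty of the edge
            cand = score[lo:hi] - lam * (1 - np.exp(
                -((np.arange(lo, hi) - j) ** 2) / (2 * sigma ** 2)))
            i = int(np.argmax(cand))
            new[j] = cand[i] + R[s, j] ** p
            back[s, j] = lo + i
        score = new
    # backtrack the optimal angle index for every center-line point
    path = [int(np.argmax(score))]
    for s in range(n - 1, 0, -1):
        path.append(int(back[s, path[-1]]))
    return path[::-1]
```

On a projection map whose rows all peak at the same angle, the recovered path simply follows that angle, since any deviation loses both data value and smoothness.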
As an example, fig. 4a gives a schematic diagram of constructing the weighted directed graph. Discrete sampling of the k and θ coordinates of the projection map R(k, θ) yields a series of grid points on the k-θ plane, which serve as the vertices of the directed graph. When constructing directed edges, edges are only connected to adjacent vertices that satisfy the disjointness constraint. As an example, fig. 4b shows a schematic diagram of the disjointness constraint between two adjacent projection lines. To ensure that two adjacent projection lines do not intersect, the current projection line must lie in the angular region defined by the previous projection line and the current point. In FIG. 4b, L denotes the center line of the strip image, k_{s−1} and k_s are two adjacent points on the center line, and the straight line AB passes through k_{s−1} and intersects the left and right borders of the strip image at the points A and B. As can be seen from the figure, for the line through k_s not to intersect the line AB inside the strip, the line through k_s must lie in the angular region defined by the three points A, B and k_s. As an example, FIG. 5a shows the result of computing the optimal projection lines on a strip projection map, which corresponds to computing an optimal path from left to right on the constructed weighted directed graph. Fig. 5b shows the corresponding optimal projection lines on the strip image. As can be seen from fig. 5b, these optimal projection lines correspond to the baselines of the strip image. Note that the baselines on the strip image are typically not parallel, due to image distortion and camera perspective effects. Figs. 5c and 5d show the projection histograms of the strip image computed along a fixed image direction and along the optimal projection lines, respectively.
It can be seen that the projection histogram obtained along the optimal projection line has a significant peak value and no aliasing phenomenon exists.
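The angular region of fig. 4b can be turned into an explicit interval test. The sketch below is illustrative, not part of the patent, and rests on assumed conventions that the patent fixes only in its figures: rows grow downward, x is measured from the centre line L, the current point k_s lies below k_{s-1}, and the angle is the inclination of the projection line itself with respect to the abscissa (not of its normal).

```python
import math

def allowed_angle_interval(k_prev, theta_prev, k_cur, w):
    """Interval of projection-line angles at the point k_cur that keeps
    the current line disjoint from the previous one inside the strip.

    k_prev, k_cur: row coordinates of two adjacent centre-line points.
    theta_prev: inclination (radians) of the previous projection line.
    w: width of the strip image.
    Returns (lo, hi), the open interval of admissible inclinations.
    """
    half = w / 2.0
    # intersections of the previous line with the left/right strip borders
    v_a = k_prev - half * math.tan(theta_prev)
    v_b = k_prev + half * math.tan(theta_prev)
    # the current line y(x) = k_cur + x * tan(theta) must stay below the
    # previous line at both borders, which bounds tan(theta) on each side
    lo = math.atan((v_b - k_cur) / half)
    hi = math.atan((k_cur - v_a) / half)
    return lo, hi
```

For a horizontal previous line 10 rows above the current point in a strip of width 20, the interval comes out as (−45°, +45°), matching the angular region spanned by A, B and k_s in fig. 4b.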
S108: obtaining the baselines of the distorted text image according to the optimal projection lines and the boundary lines of the strip images.
In this step, for the leftmost strip image, the intersections of each optimal projection line (one through each point on the center line of the strip) with the left and right boundaries of the strip are computed from the projection-line equation; if the strip overlaps an adjacent strip, the center line of the overlapping region is taken as the boundary line of the strip. Then, proceeding from left to right, the intersection of the previous strip's optimal projection line with its right boundary is taken as the starting point of the current strip's optimal projection line, and the intersection of that line with the current strip's right boundary is computed from the strip projection-line equation. This process is repeated until the rightmost strip image has been processed. Finally, all of the intersection points are approximated with a cubic spline curve to obtain smooth baselines.
As an example, fig. 6 gives a schematic diagram of the baseline connection between adjacent strip images. In the figure, S_{i-1}, S_i and S_{i+1} denote three adjacent strip images that overlap pairwise. When the baselines are connected, the starting point of each baseline is set on the center line of the overlapping region of the strip images.
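The final step, approximating the collected intersection points with a cubic spline, can be sketched as follows. The patent does not state the spline's boundary conditions, so the natural boundary condition (zero second derivative at the endpoints) is assumed here, and the sketch interpolates the points rather than fitting them in a least-squares sense.

```python
import numpy as np

def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through
    the points (xs, ys), with xs strictly increasing."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    n = len(xs)
    h = np.diff(xs)
    # solve the standard tridiagonal system for the second derivatives m,
    # with the natural boundary condition m[0] = m[n-1] = 0
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i]
                        - (ys[i] - ys[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, rhs)

    def evaluate(x):
        # locate the interval containing x and evaluate the local cubic
        i = int(np.clip(np.searchsorted(xs, x) - 1, 0, n - 2))
        t = x - xs[i]
        b = (ys[i + 1] - ys[i]) / h[i] - h[i] * (2.0 * m[i] + m[i + 1]) / 6.0
        return (ys[i] + b * t
                + m[i] / 2.0 * t ** 2
                + (m[i + 1] - m[i]) / (6.0 * h[i]) * t ** 3)

    return evaluate
```

In practice one would evaluate the spline densely between the leftmost and rightmost intersection points to draw the smooth baseline.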
In order to verify the embodiment of the invention, baseline extraction was performed on a number of actually captured text images with different distortion shapes. By way of example, fig. 7 illustrates the baseline estimates obtained for several distorted text images using the method according to an embodiment of the invention. The first row shows the original distorted text images; it can be seen that these images have different distortion shapes and different layout structures, and some of them also contain margin noise introduced during imaging. The second row shows the image baselines extracted using the method provided by the embodiments of the present invention. The third row shows the projection histograms of the images projected along the extracted baselines. It can be seen that the histograms have significant peaks, and the peaks and valleys are well separated, which facilitates subsequent layout segmentation and analysis. The fourth row shows the extracted baselines on locally enlarged images. It can be seen that the baselines extracted by the embodiment of the invention are highly accurate and fit the curved text lines well, providing good characteristic lines for subsequent image distortion correction.
While the steps in this embodiment are described as being performed in the above sequence, those skilled in the art will appreciate that, in order to achieve the effect of this embodiment, the steps may not be performed in such a sequence, and may be performed simultaneously or in a reverse sequence, and these simple changes are all within the scope of the present invention.
The technical solutions provided by the embodiments of the present invention are described in detail above. Although specific examples have been employed herein to illustrate the principles and practice of the invention, the foregoing descriptions of embodiments are merely provided to assist in understanding the principles of embodiments of the invention; also, it will be apparent to those skilled in the art that variations may be made in the embodiments and applications of the invention without departing from the spirit and scope of the invention.
It should be noted that: the numerals in the drawings are only for the purpose of illustrating the invention more clearly and are not to be construed as unduly limiting the scope of the invention.
The terms "comprises," "comprising," and any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus/device. Without further limitation, an element defined by the phrase "comprising a(n)" does not exclude the presence of other such elements in the process, method, article, or apparatus/device that comprises that element; that is, "comprising a" does not exclude "comprising another".
The various steps of the present invention may be implemented in a general purpose computing device, for example, they may be centralized on a single computing device, such as: personal computers, server computers, hand-held or portable devices, tablet-type devices or multi-processor apparatus, which may be distributed over a network of computing devices, may perform the steps shown or described in a different order than those shown or described herein, or may be implemented as separate integrated circuit modules, or may be implemented as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific hardware or software or combination thereof.
The methods provided by the present invention may be implemented using programmable logic devices, or as computer program software or program modules (including routines, programs, objects, components, data structures, etc.) that perform particular tasks or implement particular abstract data types, for example as a computer program product which is executed to cause a computer to perform the methods described herein. The computer program product includes a computer-readable storage medium having computer program logic or code portions embodied in the medium for performing the method. The computer-readable storage medium may be a built-in medium installed in the computer or a removable medium detachable from the computer main body (e.g., a storage device using hot-plug technology). The built-in media include, but are not limited to, rewritable non-volatile memory such as RAM, ROM, flash memory, and hard disks. The removable media include, but are not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).
The present invention is not limited to the above-described embodiments, and any variations, modifications, or alterations that may occur to one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.
While the fundamental novel features of the invention as applied to various embodiments have been shown, described, and pointed out in detail, it will be understood that various omissions, substitutions, and changes in the form and details of the system may be made by those skilled in the art without departing from the spirit of the invention.

Claims (5)

1. A distorted text image baseline estimation method based on curve projection is characterized by at least comprising the following steps:
extracting an edge image of the distorted text image;
cutting the edge image into strip images;
calculating a projection view of the strip image;
estimating an optimal projection line of the strip image according to the projection diagram;
obtaining a base line of the distorted text image according to the optimal projection line and the boundary line of the strip image, specifically:
calculating the intersection point of the optimal projection line and the left and right boundaries of the strip image by using a projection line equation for the optimal projection line passing through each point on the central line of the leftmost strip image of the image, wherein if the strip image is overlapped with the adjacent strip images, the central line of the overlapped part is selected as the boundary line of the strip image;
from left to right, taking the intersection point of the optimal projection line of the previous strip image and the right boundary thereof as the starting point of the optimal projection line of the current strip image, and calculating the intersection point of the optimal projection line of the strip image and the right boundary of the strip image by utilizing a strip projection line equation;
repeating the above processes until the calculation of the strip image positioned at the rightmost side of the image is completed;
and approximating all the intersection points by utilizing a cubic spline curve so as to obtain a base line.
2. The method for estimating a baseline of a distorted text image based on curved projection as claimed in claim 1, wherein the extracting the edge image of the distorted text image specifically comprises:
step 1: calculating an edge image of the input image by using a Canny operator;
step 2: performing morphological closing operation and removing operation on the edge image;
and step 3: and (3) performing morphological dilation operation on the image obtained in the step (2).
3. A distorted text image baseline estimation method based on curved projection as claimed in claim 1, wherein said calculating the projection view of the strip image specifically comprises:
calculating a Radon transformation matrix corresponding to the strip imageTo the aboveAnd performing coordinate transformation to obtain a projection graph R (k, theta) corresponding to the strip image, wherein a coordinate transformation formula is as follows:
wherein,
the H represents the height of the strip image;
the k represents the row index of the strip image;
the ρ represents the distance from the center of the strip image to the projection line;
the θ represents the included angle between the normal of the projection line of the strip image and the abscissa axis of the strip image;
the α represents the minimum included angle between the projection line of the strip image and the abscissa axis;
the β represents the maximum included angle between the projection line of the strip image and the abscissa axis.
4. A distorted text image baseline estimation method based on curved projection as claimed in claim 3, wherein said estimating the optimal projection line of the strip image according to the projection map specifically comprises:
constructing a constrained optimization problem on the projection map R(k, θ), and calculating the optimal projection lines of the strip image by using a dynamic programming algorithm; wherein the constrained optimization problem is as follows:
wherein,
the θ_k represents the optimal included-angle parameter corresponding to the projection line through the center point of the k-th row of the strip image, where k = 1, …, H;
the p represents a power exponent parameter of the projection value;
said λ represents a weight parameter;
the phi (theta)1,…,θH) Smoothing to indicate included angle parameter of projection lineThe term, the angle parameter used to smooth adjacent projection lines, is defined as follows:
wherein, the σ is a set parameter used to control the sensitivity of the smoothing term to the difference between the included angles of adjacent projection lines.
5. The method for estimating a baseline of a distorted text image based on curved projection as claimed in claim 4, wherein the calculating the optimal projection line of the strip image by using a dynamic programming algorithm specifically comprises:
a weighted directed graph is constructed that is,
performing discrete sampling on the k coordinate and the θ coordinate of the projection map R(k, θ) to obtain a series of grid points (k_s, θ_j), 1 ≤ s ≤ n, 1 ≤ j ≤ m, on the k-θ plane;
wherein, the k_s are the discrete sample points in the k direction,
the θ_j are the discrete sample points in the θ direction,
the n is the total number of sample points in the k coordinate,
the m is the total number of sample points in the θ coordinate,
taking the grid points as vertexes of a weighted directed graph;
a directed edge is constructed between two vertices (k_{s-1}, θ_i) and (k_s, θ_j) if and only if their corresponding projection lines satisfy the disjointness condition; denoting the included angle of the projection line corresponding to the vertex (k_{s-1}, θ_i) by θ_i, the two adjacent projection lines through k_{s-1} and k_s do not intersect if and only if θ_j lies within the following interval:
wherein,
the v_A represents the row coordinate of the intersection of the previous projection line with the left boundary of the strip image, calculated according to the following formula:
the v_B represents the row coordinate of the intersection of the previous projection line with the right boundary of the strip image, calculated according to the following formula:
the w represents the width of the strip image;
for the directed edge connecting the vertices (k_{s-1}, θ_i) and (k_s, θ_j), a weight is assigned as follows:
Wherein,
the p represents a specified power exponent;
the Δ_k represents the sampling interval of the k coordinate of the projection map;
said λ represents a weight parameter;
the σ represents a set parameter used to control the sensitivity of the smoothing term to the difference between the included angles of adjacent projection lines;
the h represents an angle step and is calculated according to the following formula:
solving for the longest path on the weighted directed graph.
Publications (2)

Publication Number Publication Date
CN105354571A CN105354571A (en) 2016-02-24
CN105354571B (en) 2019-02-05

