CN107729898B - Method and device for detecting text lines in text image - Google Patents

Method and device for detecting text lines in text image

Info

Publication number
CN107729898B
Authority
CN
China
Prior art keywords
link
links
weight
sum
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610654001.7A
Other languages
Chinese (zh)
Other versions
CN107729898A (en)
Inventor
刘伟
范伟
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201610654001.7A
Publication of CN107729898A
Application granted
Publication of CN107729898B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/158: Segmentation of character regions using character size, text spacings or pitch estimation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting text lines in a text image. The method comprises the following steps: performing binarization processing on the text image and extracting connected domains; establishing links between adjacent connected domains of similar size to form a plurality of interleaved links; clipping links of the plurality of links based on a first weight to obtain a main link; searching the main link for a maximum-weight-sum link based on the first weight, wherein the sum of the first weights of the links in the maximum-weight-sum link is larger than the sum of the first weights of the links in any other link; fusing a connected domain associated with a clipped link into the maximum-weight-sum link to obtain a fused link, wherein the connected domain satisfies the following condition: the sum of the second weights of the links in the fused link after fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before fusion; and obtaining the text line based on the fused link.

Description

Method and device for detecting text lines in text image
Technical Field
The present invention relates to image processing technologies, and in particular, to a method and an apparatus for detecting text lines in a text image.
Background
The text image can be stored in various media, networks and the like in an electronic form, and the cost is low, so that the text image can be widely applied to daily life of people. Text line detection plays a very important role in understanding the content in text images. However, text lines always contain different kinds of languages, different colors and different relationships, making text line detection more difficult.
Disclosure of Invention
In view of this, the present invention provides a new method and apparatus for detecting text lines in a text image.
According to an aspect of the present invention, there is provided a method of detecting text lines in a text image, comprising: performing binarization processing on the text image and extracting connected domains; establishing links between adjacent connected domains of similar size to form a plurality of interleaved links; clipping links of the plurality of links based on a first weight to obtain a main link; searching the main link for a maximum-weight-sum link based on the first weight, wherein the sum of the first weights of the links in the maximum-weight-sum link is larger than the sum of the first weights of the links in any other link; fusing a connected domain associated with a clipped link into the maximum-weight-sum link to obtain a fused link, wherein the connected domain satisfies the following condition: the sum of the second weights of the links in the fused link after fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before fusion; and obtaining the text line based on the fused link.
According to another aspect of the present invention, there is provided an apparatus for detecting text lines in a text image, comprising: connected domain extraction means for performing binarization processing on the text image and extracting connected domains; link establishing means for establishing links between adjacent connected domains having similar sizes to form a plurality of interleaved links; clipping means for clipping links of the plurality of links based on a first weight to obtain a main link; searching means for searching the main link for a maximum-weight-sum link based on the first weight, the sum of the first weights of the links in the maximum-weight-sum link being larger than the sum of the first weights of the links in any other link; fusion means for fusing a connected domain associated with a clipped link into the maximum-weight-sum link to obtain a fused link, wherein the connected domain satisfies the following condition: the sum of the second weights of the links in the fused link after fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before fusion; and text line acquisition means for acquiring a text line based on the fused link.
According to still another aspect of the present invention, there is also provided a storage medium. The storage medium includes a program code readable by a machine, which, when executed on an information processing apparatus, causes the information processing apparatus to execute the above-described method according to the present invention.
According to still another aspect of the present invention, there is also provided a program. The program comprises machine-executable instructions that, when executed on an information processing device, cause the information processing device to perform the above-described method according to the invention.
These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings.
Drawings
Other features and advantages of the present invention will be more readily understood from the following description of the various embodiments of the invention taken with the accompanying drawings, which are for the purpose of illustrating embodiments of the invention by way of illustration only, and not in all possible implementations, and which are not intended to limit the scope of the invention. In the drawings:
FIG. 1 shows a flow diagram of a method of detecting lines of text in a text image according to an embodiment of the invention;
FIG. 2 is a diagram illustrating lines of text resulting from a method of detecting lines of text in a text image, according to an embodiment of the invention;
FIGS. 3-6 illustrate a process of applying a method of detecting text lines in a text image according to one embodiment of the invention to an exemplary text image;
FIG. 7 is a block diagram illustrating an apparatus for detecting text lines in a text image according to an embodiment of the present invention; and
FIG. 8 shows a schematic block diagram of a computer that may be used to implement methods and apparatus according to embodiments of the invention.
Detailed Description
Embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the following description is only exemplary and is not intended to limit the present invention. Further, in the following description, the same reference numbers will be used to refer to the same or like parts in different drawings. The different features in the different embodiments described below can be combined with each other to form further embodiments within the scope of the invention.
FIG. 1 shows a flow diagram of a method 100 of detecting text lines in a text image according to an embodiment of the invention. As shown in fig. 1, the method 100 includes steps S110 to S180, wherein steps S130 and S180 are not necessary for implementing the method 100, but are preferred.
In step S110, binarization processing is performed on the text image, and each connected component of the image is extracted. Binarization processing is a common technique in image preprocessing, and aims to separate a text foreground region from a background region of an image. The binarized image becomes a binary image composed of 0 and 1. After the image is subjected to binarization processing, a potential character area can be obtained. Here, the connected component refers to an image region composed of foreground pixels having the same pixel value and adjacent positions in the text image.
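As an illustration of step S110, the following is a minimal sketch of binarization followed by connected-domain extraction. It is not the implementation described in the patent: the fixed threshold of 128, the assumption of dark text on a light background, the use of 4-connectivity, and the function names `binarize` and `connected_domains` are all assumptions made for this example.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Return a 0/1 image: 1 marks dark (foreground) pixels, 0 marks background.

    The fixed threshold is an assumption; the source does not name a method.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_domains(binary):
    """Find 4-connected foreground regions; return bounding boxes (x0, y0, x1, y1)."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Each connected domain is reduced here to its bounding box, which is enough to compute the sizes and distances used by the later steps.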
Then, in step S120, links are established between adjacent connected domains having similar sizes, thereby forming a plurality of interleaved links. The plurality of links formed here is regarded as a multi-way tree structure, i.e., each connected domain constitutes a node of the multi-way tree, and the links between connected domains constitute the branches between the nodes. For a more intuitive description, the plurality of links is hereinafter also referred to as a tree link, which is composed of a root connected domain and several subtrees.
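The link-establishing step can be sketched as follows. The source does not define "adjacent" or "similar size" numerically, so the gap limit `max_gap`, the height-ratio limit `max_ratio`, and the helper names are hypothetical choices for illustration only. Connected domains are represented by bounding boxes `(x0, y0, x1, y1)`.

```python
from itertools import combinations


def box_gap(a, b):
    """Distance between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def similar_size(a, b, max_ratio=2.0):
    """Assumed similarity test: heights differ by at most a factor of max_ratio."""
    ha = a[3] - a[1] + 1
    hb = b[3] - b[1] + 1
    return max(ha, hb) / min(ha, hb) <= max_ratio


def build_links(boxes, max_gap=10.0, max_ratio=2.0):
    """Return links (i, j, distance) between nearby boxes of similar size."""
    links = []
    for i, j in combinations(range(len(boxes)), 2):
        gap = box_gap(boxes[i], boxes[j])
        if gap <= max_gap and similar_size(boxes[i], boxes[j], max_ratio):
            links.append((i, j, gap))
    return links
```

Because every qualifying pair is linked, the result can contain closed loops, which is the situation that the optional optimization step addresses.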
Here, the tree link obtained in step S120 may include closed-loop links and non-text connected domains, as shown in fig. 3. To improve the accuracy of text line extraction, the method for detecting text lines in a text image according to the present invention may preferably further include step S130 of optimizing the links in the tree link to exclude closed-loop links, thereby obtaining an optimized tree link.
Specifically, links with weights greater than a predetermined threshold are first pruned from the tree link. The predetermined threshold value may be set by one skilled in the art based on practical application or experience. The weight here is a second weight depending on the distance between two connected domains establishing a link, and for example, the second weight may be set to the distance between adjacent connected domains. Then, a minimum spanning tree algorithm is applied to the resulting links to obtain optimized tree links without closed loop links.
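A minimal sketch of this optimization is given below, assuming that links are stored as `(i, j, second_weight)` tuples and using Kruskal's algorithm with a union-find structure as the minimum spanning tree algorithm. The source names no particular algorithm, so Kruskal's is an assumption, as is the function name `prune_and_span`.

```python
def prune_and_span(num_nodes, links, threshold):
    """Drop links whose second weight exceeds threshold, then remove closed loops.

    links: iterable of (i, j, second_weight) with node indices in range(num_nodes).
    Returns the links of a minimum spanning tree (or forest, if pruning
    disconnected the graph).
    """
    kept = sorted((l for l in links if l[2] <= threshold), key=lambda l: l[2])
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for i, j, weight in kept:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this link does not close a loop
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree
```

For example, with links `(0, 1, 2.0)`, `(1, 2, 3.0)`, `(0, 2, 4.0)` and `(2, 3, 50.0)` and a threshold of 10, the long link to node 3 is pruned and the loop-closing link `(0, 2, 4.0)` is discarded.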
Next, in step S140, the links in the tree link are clipped based on the first weight to obtain the main link. Specifically, the first weights of the links in the tree link may be sorted first, and a greedy algorithm is then used to cut the link with the smallest weight each time; the algorithm stops when the total number of layers of the tree link changes, thereby yielding the main body of the tree link.
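The greedy clipping can be read in more than one way; the sketch below shows one possible interpretation and should not be taken as the patented procedure. It assumes the tree is stored as `{parent: {child: first_weight}}`, that cutting a link detaches the whole subtree below it, and that the cut which would change the number of layers is undone before stopping. The names `tree_depth` and `clip_links` are hypothetical.

```python
def tree_depth(children, root):
    """Number of layers in the tree reachable from root."""
    depth, frontier = 0, [root]
    while frontier:
        depth += 1
        frontier = [c for node in frontier for c in children.get(node, {})]
    return depth


def clip_links(children, root):
    """Greedily cut smallest-weight links while the layer count is unchanged.

    children: {parent: {child: first_weight}}.
    Returns (main_tree, cut_links), where cut_links is a list of (parent, child).
    """
    main = {p: dict(cs) for p, cs in children.items()}
    original_depth = tree_depth(main, root)
    ordered = sorted((w, p, c) for p, cs in children.items() for c, w in cs.items())
    cut = []
    for weight, parent, child in ordered:
        if child not in main.get(parent, {}):
            continue  # already detached together with an earlier cut
        removed = main[parent].pop(child)
        if tree_depth(main, root) != original_depth:
            main[parent][child] = removed  # undo: this cut changes the layer count
            break
        cut.append((parent, child))
        # Drop the detached subtree so its inner links are not visited again.
        stack = [child]
        while stack:
            stack.extend(main.pop(stack.pop(), {}))
    return main, cut
```

The connected domains below each cut link are the candidates that the later fusion step may merge back into the text line.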
Here, the first weight is calculated from the distance between two adjacent connected domains establishing a link and their size, and may be calculated, for example, according to the following expression:
For links in the vertical direction:
$$W_{ij} = h_i \div h_j \times d(cc_i, cc_j), \quad \text{where } h_i < h_j \qquad (1)$$
where $W_{ij}$ represents the weight of the link between two adjacent connected domains $cc_i$ and $cc_j$, $h_i$ represents the height of connected domain $cc_i$, $h_j$ represents the height of connected domain $cc_j$, and $d(cc_i, cc_j)$ represents the distance between connected domain $cc_i$ and connected domain $cc_j$.
For links in the horizontal direction:
$$W_{ij} = w_i \div w_j \times d(cc_i, cc_j), \quad \text{where } w_i < w_j \qquad (2)$$
where $W_{ij}$ represents the weight of the link between connected domain $cc_i$ and connected domain $cc_j$, $w_i$ represents the width of connected domain $cc_i$, $w_j$ represents the width of connected domain $cc_j$, and $d(cc_i, cc_j)$ represents the distance between connected domain $cc_i$ and connected domain $cc_j$.
Note that the first weight of a link is independent of the order of connected domain $cc_i$ and connected domain $cc_j$, i.e., $W_{ij} = W_{ji}$. In expressions (1) and (2), the conditions $h_i < h_j$ and $w_i < w_j$ are set in order to exclude the influence of the order in which connected domain $cc_i$ and connected domain $cc_j$ are taken on the weight. Alternatively, $h_i > h_j$ and $w_i > w_j$ may be set.
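Expressions (1) and (2) share one form, which can be sketched as a single function. The function name `first_weight` is hypothetical; the caller is assumed to pass heights for a vertical link and widths for a horizontal link. Sorting the two sizes enforces the smaller-over-larger convention, so the result does not depend on argument order.

```python
def first_weight(size_i, size_j, distance):
    """First weight: (smaller size / larger size) * distance.

    Pass heights for a link in the vertical direction and widths for a link
    in the horizontal direction. Equal sizes give a ratio of 1, a case the
    strict inequality in the expressions leaves open (assumption).
    """
    small, large = sorted((size_i, size_j))
    return small / large * distance
```

For instance, two connected domains of height 10 and 20 separated by a distance of 4 give a first weight of 2.0 regardless of which one is passed first.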
The direction of a link may be determined as follows: the link is in the horizontal direction if the projections in the horizontal direction of the two connected domains associated with the link do not overlap and their projections in the vertical direction overlap; conversely, the link is in the vertical direction if the projections in the vertical direction of the two connected domains do not overlap and their projections in the horizontal direction overlap.
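The direction rule can be sketched with bounding boxes `(x0, y0, x1, y1)`. This sketch assumes that "projection in the horizontal direction" means the interval a box covers on the x axis and "projection in the vertical direction" means the interval it covers on the y axis; the source does not spell this out. The function names are hypothetical.

```python
def intervals_overlap(a0, a1, b0, b1):
    """True if closed intervals [a0, a1] and [b0, b1] share at least one point."""
    return a0 <= b1 and b0 <= a1


def link_direction(a, b):
    """Classify the link between boxes a and b as 'horizontal' or 'vertical'.

    Returns None when neither rule applies (for example, diagonal neighbours).
    """
    x_overlap = intervals_overlap(a[0], a[2], b[0], b[2])
    y_overlap = intervals_overlap(a[1], a[3], b[1], b[3])
    if not x_overlap and y_overlap:
        return "horizontal"  # side by side on the same row
    if not y_overlap and x_overlap:
        return "vertical"  # stacked in the same column
    return None
```

The source does not say how to treat pairs for which both or neither projection overlaps, so returning `None` for them is a choice made only for this example.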
Next, in step S150, a maximum-weight-sum link is searched for in the main link based on the first weight, wherein the sum of the first weights of the links in the maximum-weight-sum link is greater than the sum of the first weights of the links in any other link.
Specifically, in the present invention, the following recursive algorithm is used to search for the maximum-weight-sum link. The weight value Val of the maximum-weight-sum link starting from the root connected domain $r$ may be expressed as:
$$\mathrm{Val} = \max\bigl(\mathrm{Val}(r_i) + w_{r\text{-}r_i}\bigr), \quad i \in T \qquad (3)$$
where $r_i$ is one of the $T$ child connected domains of the root connected domain $r$; $w_{r\text{-}r_i}$ represents the weight of the link between the root connected domain $r$ and the child connected domain $r_i$; and $\mathrm{Val}(r_i)$ may be expressed as:
$$\mathrm{Val}(r_i) = \max\bigl(\mathrm{Val}(r_{it}) + w_{r_i\text{-}r_{it}}\bigr), \quad r_{it} \in N \qquad (4)$$
where $r_{it}$ is one of the $N$ child connected domains of connected domain $r_i$; $w_{r_i\text{-}r_{it}}$ represents the weight of the link between the connected domain $r_i$ and the child connected domain $r_{it}$; and $\mathrm{Val}(r_{it})$ may be computed recursively in the same manner as expressions (3) and (4), finally yielding the maximum-weight-sum link.
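The recursion of expressions (3) and (4) can be sketched as follows. The tree representation `{node: {child: first_weight}}`, the function name, and the base case (a connected domain without children has Val equal to 0) are assumptions; the source does not state the base case explicitly.

```python
def max_weight_sum_link(children, node):
    """Return (Val, path) for the maximum-weight-sum link starting at node.

    children: {node: {child: first_weight}}.
    A node without children contributes Val = 0 (assumed base case).
    """
    best_val, best_path = 0.0, [node]
    for child, weight in children.get(node, {}).items():
        child_val, child_path = max_weight_sum_link(children, child)
        if child_val + weight > best_val:
            best_val, best_path = child_val + weight, [node] + child_path
    return best_val, best_path
```

With `children = {"r": {"a": 2.0, "b": 3.0}, "a": {"c": 5.0}, "b": {"d": 1.0}}`, the branch through `a` wins with a weight sum of 7.0 even though the first link toward `b` is heavier, which is why the search must recurse rather than pick the locally heaviest link.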
Next, in step S160, a connected domain that is associated with a clipped link and satisfies a specific condition is fused into the maximum-weight-sum link to obtain a fused link, where the specific condition is: the sum of the second weights of the links in the fused link after fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before fusion. The second weight is calculated based on the distance between the two adjacent connected domains establishing the link; e.g., the distance between the two connected domains may be set as the weight of the link.
Preferably, the connected domain associated with the clipped link is fused into the nearest connected domain in the maximum-weight-sum link. When calculating the sum of the second weights of the links of the fused link, the links of the fused link are updated first, and then the second weights of the links are calculated to obtain the second weight sum.
Specifically, in step S160, it is first assumed that the connected domain associated with a certain link cut in step S140 is fused into the nearest connected domain within the maximum-weight-sum link, thereby merging with that nearest connected domain into a new connected domain. Then, the links of the fused link thus obtained are updated, and the weight of each link is calculated to find the weight sum of the fused link. If the weight sum after fusion is smaller than the weight sum before fusion, it is decided to fuse the connected domain; otherwise, the connected domain is not fused. In the case where it is decided to fuse the connected domain, the connected domain is fused into the maximum-weight-sum link, and the links are updated.
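The fusion decision can be sketched under simplifying assumptions: the maximum-weight-sum link is modelled as an ordered chain of bounding boxes `(x0, y0, x1, y1)`, the second weight of a link is the gap between consecutive boxes, and fusing means replacing the nearest box by the union of that box and the candidate. None of these modelling choices, nor the names `chain_weight` and `try_fuse`, come from the source.

```python
def box_gap(a, b):
    """Distance between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def chain_weight(chain):
    """Sum of second weights (gaps) between consecutive boxes in the chain."""
    return sum(box_gap(a, b) for a, b in zip(chain, chain[1:]))


def try_fuse(chain, candidate):
    """Tentatively fuse candidate into its nearest box; keep it only if the
    second weight sum of the chain decreases. Returns (chain, fused_flag)."""
    nearest = min(range(len(chain)), key=lambda k: box_gap(chain[k], candidate))
    a = chain[nearest]
    merged = (min(a[0], candidate[0]), min(a[1], candidate[1]),
              max(a[2], candidate[2]), max(a[3], candidate[3]))
    fused = chain[:nearest] + [merged] + chain[nearest + 1:]
    if chain_weight(fused) < chain_weight(chain):
        return fused, True
    return chain, False
```

In this model, a detached stroke lying in the space between two characters shortens the gap to the neighbouring box when it is merged, so the weight sum drops and the stroke is accepted.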
Next, in step S170, a text line is acquired based on the fused link.
The text line obtained in step S170 may include disjointed connected domains as shown in fig. 2; that is, in fig. 2 the text portion "Tokyo" is disjointed from the number portion "tel.0265-83-2324". In order to remove such disjointed connected domains, the method 100 for detecting text lines in a text image according to the present invention may further include step S180 of filtering the text line to remove the disjointed connected domains.
Specifically, the connected domains in the text line may be numbered sequentially; for example, the connected domains in fig. 2, i.e., "east", "jing", "T", "e", "l", "e", "0", "2" …… "4", are numbered sequentially as 1, 2, 3, 4, 5, 6 …… 18. Then, the distance of each connected domain with respect to a reference position is calculated. The reference position may be selected arbitrarily, such as the left or right boundary of the text line. Then, for each connected domain, the ratio R between the number of the connected domain and the distance of the connected domain from the reference position is calculated, i.e.:
$$R = \frac{\text{number of the connected domain}}{\text{distance of the connected domain from the reference position}} \qquad (5)$$
Each calculated ratio R is then compared with a predetermined range; if the ratio falls outside the predetermined range, the connected domain corresponding to that ratio R is filtered out. The predetermined range may be set by one skilled in the art based on practical application or experience.
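The filtering by expression (5) can be sketched as follows. The sketch assumes each connected domain is summarized by a single horizontal coordinate, that numbering starts at 1 in list order, and that a connected domain lying exactly on the reference position is kept because its ratio is undefined. The function name and the parameters `low` and `high` for the predetermined range are hypothetical.

```python
def filter_by_ratio(positions, reference, low, high):
    """Keep connected domains whose ratio R = number / distance lies in [low, high].

    positions: coordinates of the connected domains in reading order.
    reference: coordinate of the reference position (e.g. the left boundary).
    """
    kept = []
    for number, position in enumerate(positions, start=1):
        distance = abs(position - reference)
        if distance == 0:
            kept.append(position)  # ratio undefined; kept by assumption
            continue
        ratio = number / distance
        if low <= ratio <= high:
            kept.append(position)
    return kept
```

For evenly spaced connected domains the ratio stays roughly constant, so one that sits far from its expected place in the sequence produces a ratio outside the range and is removed.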
The flow of the method for detecting text lines in a text image according to the embodiment of the invention is described in detail above in conjunction with fig. 1. The method 100 for detecting text lines is specifically described below by taking the text images in fig. 3 to 6 as an example.
First, binarization processing is performed in step S110, and the connected domains [image] [image] are extracted. Then, in step S120, links are established between adjacent connected domains having similar sizes, forming the tree link shown in fig. 3. Next, in step S130, the tree link is optimized to remove the link with a larger weight, i.e., the link between the connected domain [image] and the connected domain [image], resulting in the optimized tree link shown in fig. 4. In step S140, the optimized tree link is clipped to obtain the main link shown in fig. 5, wherein the link between the connected domain [image] and the main link is cut off so that the connected domain [image] is removed from the main link. After the above operations, only one main link is obtained for the exemplary text image; in this case, step S150 of searching for the maximum-weight-sum link may be omitted because the main link is itself the maximum-weight-sum link.
Next, in step S160, it is first determined whether fusion is to be performed. It is first assumed that the connected domain [image] is fused with the main link, so that the connected domain [image] and the connected domain [image] are fused into a new connected domain [image]. Then, the second weight sum of the fused link (as shown in fig. 6) is calculated, which includes: updating the links between the connected domain [image] and the adjacent connected domains [image] and [image], and calculating the second weight sum of the fused link based on the updated links. Then, comparing the link before fusion (as shown in fig. 5) with the link after fusion, it is evident that the link after fusion has a smaller weight sum, because the distance between the fused connected domain [image] and its adjacent connected domains is shorter, and thus the weights of its links with the adjacent connected domains are smaller. Based on the comparison, it can be determined that the connected domain [image] is to be fused with the main link. Fusion is therefore performed and the relevant links are updated.
Finally, the text line is obtained based on the fused link. The resulting text line has no disjointed connected domains, so step S180 of filtering disjointed connected domains is omitted.
According to the method for detecting text lines in a text image, the connected component of the image is extracted by using binarization processing. However, in the connected component obtained by the binarization process, there may be a connected component constituted by a single character or a stroke in a character, or there may be a non-character connected component unrelated to a character. Therefore, preferably, before performing step S120, non-text connected components can be filtered out by using an image filtering method well known in the art.
The method for detecting a text line in a text image according to an embodiment of the present invention is described in detail above with reference to the accompanying drawings. An apparatus for detecting text lines in a text image according to an embodiment of the present invention will be described with reference to the accompanying drawings. Fig. 7 is a block diagram illustrating a structure of an apparatus for detecting text lines in a text image according to an embodiment of the present invention.
As shown in fig. 7, the apparatus 700 for detecting text lines in a text image includes a connected domain extraction component 710, a link establishing component 720, a clipping component 740, a search component 750, a fusion component 760, and a text line acquisition component 770.
The connected component extraction unit 710 is used to perform binarization processing on the text image and extract each connected component of the image.
The link establishing section 720 establishes a link between adjacent connected domains having similar sizes, thereby forming a plurality of links interleaved. The formed links are considered as a multi-way tree structure, and are also called tree links. Preferably, the link establishing component 720 may filter out non-textual connected domains using image filtering methods well known in the art before establishing links between adjacent connected domains to form the interleaved plurality of links.
The clipping component 740 clips the links in the tree link based on the first weight to obtain a main link. Specifically, the clipping component 740 may sort the first weights of the links, and then use a greedy algorithm to cut the link with the smallest weight each time; the algorithm stops when the total number of layers of the tree link changes, thereby yielding the main body of the tree link.
The search component 750 searches the main link for the maximum-weight-sum link based on the first weight, wherein the sum of the first weights of the links in the maximum-weight-sum link is greater than the sum of the first weights of the links in any other link. Specifically, the search component 750 recursively searches for the maximum-weight-sum link according to expressions (3) and (4).
The fusion component 760 fuses a connected domain that is associated with a clipped link and satisfies a specific condition into the maximum-weight-sum link to obtain a fused link, the specific condition being: the sum of the second weights of the links in the fused link after fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before fusion. When the sum of the second weights of the links of the fused link is calculated, the links of the fused link are updated first, then the second weights of the links are calculated, and the sum of the second weights is obtained from them.
The text line acquisition section 770 acquires a text line based on the fused link.
Preferably, the apparatus 700 for detecting text lines in a text image according to the present invention may further include a link optimization unit 730 for optimizing the links in the tree-shaped link to exclude closed-loop links to obtain an optimized tree-shaped link before clipping the links in the tree-shaped link to obtain a main body link. Specifically, the link optimization unit 730 first prunes the links of the tree-like links whose second weight is greater than a predetermined threshold. The predetermined threshold value may be set by one skilled in the art based on practical application or experience. Then, a minimum spanning tree algorithm is applied to the resulting links to obtain optimized tree links without closed loop links.
Preferably, the apparatus 700 for detecting text lines in a text image according to the present invention may further include a filtering component 780 for filtering the text lines acquired by the text line acquisition component 770 to remove disjointed connected domains. Specifically, the filtering component 780 first numbers the connected domains in sequence, and then calculates the distance of each connected domain from the reference position. Then, the ratio R corresponding to each connected domain is calculated using expression (5), and each ratio R is compared with a predetermined range; if the ratio falls outside the range, the connected domain corresponding to that ratio R is filtered out.
In addition, it is noted that the components of the above system may be configured by software, firmware, hardware or a combination thereof. The specific means or manner in which the configuration can be used is well known to those skilled in the art and will not be described further herein. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network to a computer (for example, a general-purpose computer 800 shown in fig. 8) having a dedicated hardware configuration, and the computer can execute various functions when various programs are installed.
FIG. 8 illustrates a schematic block diagram of a computer 800 that may be used to implement methods and apparatus according to embodiments of the invention.
In fig. 8, a Central Processing Unit (CPU)801 executes various processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 to a Random Access Memory (RAM) 803. In the RAM 803, data necessary when the CPU 801 executes various processes and the like is also stored as necessary. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output interface 805 is also connected to the bus 804.
The following components are connected to the input/output interface 805: an input section 806 (including a keyboard, a mouse, and the like), an output section 807 (including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker and the like), a storage section 808 (including a hard disk and the like), a communication section 809 (including a network interface card such as a LAN card, a modem, and the like). The communication section 809 performs communication processing via a network such as the internet. A drive 810 may also be connected to the input/output interface 805 as desired. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like can be mounted on the drive 810 as necessary, so that the computer program read out therefrom is installed into the storage portion 808 as necessary.
In the case where the above-described series of processes is realized by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 811.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 811 shown in fig. 8 in which the program is stored, distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 811 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disk read only memory (CD-ROM) and a Digital Versatile Disk (DVD)), a magneto-optical disk (including a Mini Disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 802, a hard disk included in the storage section 808, or the like, in which programs are stored and which are distributed to users together with the apparatus including them.
The invention also provides a program product storing machine-readable instruction codes. When read and executed by a machine, the instruction codes can perform the method according to the embodiment of the invention.
Accordingly, storage media carrying the above-described program product having machine-readable instruction codes stored thereon are also within the scope of the present invention. Such storage media include, but are not limited to, floppy disks, optical disks, magneto-optical disks, memory cards, memory sticks, and the like.
It should be noted that the method of the present invention is not limited to being performed in the chronological order described in the specification, and may be performed sequentially in other orders, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present invention.
The foregoing description of the various embodiments of the invention is provided for the purpose of illustration only and is not intended to be limiting of the invention. It should be noted that in the above description, features described and/or illustrated with respect to one embodiment may be used in the same or similar manner in one or more other embodiments, in combination with or instead of the features of the other embodiments. It will be understood by those skilled in the art that various changes and modifications may be made to the above-described embodiments without departing from the inventive concept of the present invention, and all such changes and modifications are intended to be included within the scope of the present invention.
In summary, in the embodiments according to the present invention, the present invention provides the following technical solutions.
Scheme 1. A method of detecting text lines in a text image, comprising the steps of:
performing binarization processing on the text image and extracting a connected domain;
establishing links between adjacent connected domains of similar size to form a plurality of interleaved links;
clipping links of the plurality of links based on a first weight to obtain a subject link;
searching the main link for a maximum weight and a link based on the first weight, wherein the sum of the maximum weight and the first weight of each link in the links is larger than the sum of the first weights of each link in other links;
fusing connected domains associated with the pruned links into the maximum weight and links to obtain fused links, wherein the connected domains satisfy the following conditions: executing the sum of the second weights of all the links in the fused link after fusion to be smaller than the sum of the maximum weight before fusion and the second weight of all the links in the link; and
text lines are obtained based on the fused link.
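The search step of Scheme 1 can be illustrated with a minimal sketch. It assumes, beyond what the scheme states, that the main link has already been reduced to a single tree whose first weights are non-negative, so that the chain of links whose first weights sum to the largest value is the weighted longest path of that tree. The function name `max_weight_sum_path` and the edge representation are hypothetical and are not taken from the specification.

```python
from collections import defaultdict


def max_weight_sum_path(edges):
    """Return (total_weight, nodes) for the heaviest path in a tree.

    edges: non-empty list of (u, v, first_weight) tuples forming one tree
    with non-negative weights (an assumption of this sketch).
    """
    adjacency = defaultdict(list)
    for u, v, weight in edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    def farthest(start):
        # Iterative depth-first traversal that tracks the heaviest
        # accumulated weight reachable from `start`.
        best_node, best_dist = start, 0.0
        parent = {start: None}
        stack = [(start, 0.0)]
        while stack:
            node, dist = stack.pop()
            if dist > best_dist:
                best_node, best_dist = node, dist
            for neighbour, weight in adjacency[node]:
                if neighbour not in parent:
                    parent[neighbour] = node
                    stack.append((neighbour, dist + weight))
        return best_node, best_dist, parent

    # Two passes: the node farthest from an arbitrary start is one end of
    # the heaviest path; the node farthest from that end is the other end.
    end_a, _, _ = farthest(edges[0][0])
    end_b, total, parent = farthest(end_a)

    path = []
    node = end_b
    while node is not None:
        path.append(node)
        node = parent[node]
    return total, path[::-1]
```

For a tree with links a-b (2.0), b-c (3.0) and b-d (1.0), the sketch selects the chain through a, b and c, whose first weights sum to 5.0. A forest with several trees would need the search repeated per tree.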
Scheme 2. The method of scheme 1, wherein, in performing the fusing, the connected domain associated with a pruned link is fused with the nearest connected domain in the maximum-weight-sum link.
Scheme 3. The method of scheme 2, wherein the sum of the second weights of the links in the fused link is calculated by: updating the links in the fused link, and calculating the sum of the second weights of the fused link based on the updated links.
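The fusion condition can be made concrete with a small sketch for a horizontal text line. It assumes that the second weight of a link is the edge-to-edge horizontal gap between the two connected domains it joins; the schemes only state that the second weight depends on distance, so this choice, together with the names `Box`, `gap`, `second_weight_sum` and `try_fuse`, is hypothetical. Under this assumption, a connected domain that fits into a gap of the existing chain lowers the sum of second weights and is fused, whereas one lying beyond the end of the chain raises the sum and is rejected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Horizontal extent of a connected domain (hypothetical representation)."""
    left: float
    right: float


def gap(a: Box, b: Box) -> float:
    """Assumed second weight: edge-to-edge horizontal gap between two boxes."""
    first, second = (a, b) if a.left <= b.left else (b, a)
    return max(0.0, second.left - first.right)


def second_weight_sum(chain: list[Box]) -> float:
    """Sum of second weights over the links between consecutive boxes."""
    ordered = sorted(chain, key=lambda box: box.left)
    return sum(gap(p, q) for p, q in zip(ordered, ordered[1:]))


def try_fuse(chain: list[Box], candidate: Box) -> tuple[list[Box], bool]:
    """Fuse `candidate` into `chain` only if the second-weight sum decreases."""
    before = second_weight_sum(chain)
    # Updating the links: re-sorting places the candidate beside its nearest
    # neighbours, so the links around it are rebuilt before the sum is taken.
    fused = sorted(chain + [candidate], key=lambda box: box.left)
    after = second_weight_sum(fused)
    if after < before:
        return fused, True
    return list(chain), False
```

With a chain of two boxes spanning 0 to 10 and 30 to 40, a candidate spanning 12 to 28 reduces the sum from 20 to 4 and is fused, while a candidate spanning 60 to 70 raises it to 40 and is left out.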
Scheme 4. The method of any of schemes 1-3, wherein the first weight depends on the sizes of, and the distance between, the two connected domains between which a link is established, and the second weight depends on the distance between the two connected domains between which a link is established.
Scheme 5. The method of scheme 4, wherein,
when the links are distributed vertically, the first weight is calculated based on the heights of, and the distance between, the two connected domains between which the link is established; and
when the links are distributed horizontally, the first weight is calculated based on the widths of, and the distance between, the two connected domains between which the link is established.
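As an illustration only, a first weight of this kind could be computed as follows. The formula is an assumption made for this sketch and is not the formula of the specification: it multiplies a size-similarity ratio by a term that shrinks as the centre-to-centre distance grows, using widths and horizontal distance for horizontally distributed links and heights and vertical distance for vertically distributed links. The names `Domain` and `first_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Bounding box of a connected domain, by centre and size (hypothetical)."""
    cx: float
    cy: float
    width: float
    height: float


def first_weight(a: Domain, b: Domain, orientation: str) -> float:
    """Assumed first weight: larger for similar-sized, closer domains."""
    if orientation == "horizontal":
        size_a, size_b, distance = a.width, b.width, abs(a.cx - b.cx)
    elif orientation == "vertical":
        size_a, size_b, distance = a.height, b.height, abs(a.cy - b.cy)
    else:
        raise ValueError("orientation must be 'horizontal' or 'vertical'")
    larger = max(size_a, size_b)
    # Ratio in (0, 1]: equals 1 when both domains have the same size.
    similarity = min(size_a, size_b) / larger
    # Distance is normalised by the larger size so the weight is scale-free.
    return similarity / (1.0 + distance / larger)
```

Any monotone combination with the same behaviour (higher for similar sizes, lower for greater distance) would serve equally well for the purposes of this sketch.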
Scheme 6. The method of any of schemes 1-3, further comprising: applying a minimum spanning tree algorithm to exclude closed-loop links before searching for the maximum-weight-sum link.
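One common way to remove closed loops with a minimum spanning tree algorithm is Kruskal's algorithm with a union-find structure, sketched below. The scheme does not say which algorithm or which edge cost is used, so both are assumptions here: each link carries a cost for which lower is better (for example a distance), and the function name `spanning_forest` is hypothetical.

```python
def spanning_forest(num_nodes, edges):
    """Keep a cycle-free subset of links using Kruskal's algorithm.

    num_nodes: number of connected domains, indexed 0..num_nodes-1.
    edges: list of (cost, u, v) tuples; lower cost is preferred (assumption).
    Returns the kept links in ascending order of cost.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Find the set representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for cost, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The link joins two separate groups, so it cannot close a loop.
            parent[root_u] = root_v
            kept.append((cost, u, v))
    return kept
```

For the links (0, 1), (1, 2) and (0, 2) with costs 1.0, 2.0 and 3.0, the third link would close a loop and is dropped. Links in separate groups of connected domains are kept independently, so the result is a forest rather than a single tree.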
Scheme 7. The method of any of schemes 1-3, wherein the pruning is performed using a greedy algorithm.
Scheme 8. The method of any of schemes 1-3, further comprising: filtering unconnected connected domains in the text line.
Scheme 9. The method of scheme 8, wherein filtering unconnected connected domains in the text line comprises: numbering the connected domains in sequence, and performing filtering based on the numbers of the connected domains and the distances of the connected domains from a reference position.
Scheme 10. The method of any of schemes 1-3, further comprising, prior to establishing the links, filtering the connected domains to exclude non-text connected domains.
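The binarization, connected domain extraction and pre-filtering steps can be sketched in plain Python as follows. The fixed threshold, the 4-neighbour connectivity and the area-based filter are assumptions chosen for brevity; the schemes do not specify the binarization method or the criteria that mark a connected domain as non-text. The function names `extract_connected_domains` and `filter_non_text` are hypothetical.

```python
from collections import deque


def extract_connected_domains(gray, threshold=128):
    """Binarize a grayscale image and return its connected domains.

    gray: non-empty list of equal-length rows of intensities in 0..255.
    Pixels darker than `threshold` are treated as foreground (assumption).
    Returns a list of pixel lists, one per 4-connected foreground region.
    """
    height, width = len(gray), len(gray[0])
    foreground = [[value < threshold for value in row] for row in gray]
    seen = [[False] * width for _ in range(height)]
    domains = []
    for y in range(height):
        for x in range(width):
            if not foreground[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and foreground[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            domains.append(pixels)
    return domains


def filter_non_text(domains, min_area=2, max_area=1000):
    """Drop domains whose pixel count is implausible for text (assumed rule)."""
    return [d for d in domains if min_area <= len(d) <= max_area]
```

In practice an adaptive threshold and additional shape cues such as aspect ratio would likely be needed; the area bounds here only show where such a filter sits in the pipeline, before any links are established.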
Scheme 11. An apparatus for detecting text lines in a text image, comprising:
a connected domain extraction component for extracting connected domains from the text image;
a link establishing component for establishing links between adjacent connected domains of similar size to form a plurality of interleaved links;
a pruning component for pruning links of the plurality of links based on a first weight to obtain a main link;
a search component for searching the main link, based on the first weight, for a maximum-weight-sum link, the sum of the first weights of the links in the maximum-weight-sum link being larger than the sum of the first weights of the links in any other link;
a fusion component for fusing connected domains associated with the pruned links into the maximum-weight-sum link to obtain a fused link, wherein the connected domains satisfy the following condition: the sum of the second weights of the links in the fused link after the fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before the fusion; and
a text line acquisition component for obtaining text lines based on the fused link.
Scheme 12. The apparatus of scheme 11, wherein, when performing the fusion, the fusion component fuses the connected domain associated with a pruned link with the nearest connected domain in the maximum-weight-sum link.
Scheme 13. The apparatus of scheme 12, wherein the fusion component calculates the sum of the second weights of the fused link by: updating the links in the fused link, and calculating the sum of the second weights of the fused link based on the updated links.
Scheme 14. The apparatus of any of schemes 11-13, wherein the first weight depends on the sizes of, and the distance between, the two connected domains between which a link is established, and the second weight depends on the distance between the two connected domains between which a link is established.
Scheme 15. The apparatus of scheme 14, wherein,
when the links are distributed vertically, the first weight is calculated based on the heights of, and the distance between, the two connected domains between which the link is established; and
when the links are distributed horizontally, the first weight is calculated based on the widths of, and the distance between, the two connected domains between which the link is established.
Scheme 16. The apparatus of any of schemes 11-13, further comprising: a link optimization component that applies a minimum spanning tree algorithm to the plurality of links formed by the link establishing component to exclude closed-loop links.
Scheme 17. The apparatus of any of schemes 11-13, wherein the pruning component performs the pruning using a greedy algorithm.
Scheme 18. The apparatus of any of schemes 11-13, further comprising: a filtering component that filters unconnected connected domains in the text line.
Scheme 19. The apparatus of scheme 18, wherein the filtering component numbers the connected domains in sequence, and performs filtering based on the numbers of the connected domains and the distances of the connected domains from a reference position.

Claims (10)

1. A method of detecting text lines in a text image, comprising:
performing binarization processing on the text image and extracting connected domains;
establishing links between adjacent connected domains of similar size to form a plurality of interleaved links;
pruning links of the plurality of links based on a first weight to obtain a main link;
searching the main link, based on the first weight, for a maximum-weight-sum link, wherein the sum of the first weights of the links in the maximum-weight-sum link is larger than the sum of the first weights of the links in any other link;
fusing connected domains associated with the pruned links into the maximum-weight-sum link to obtain a fused link, wherein the connected domains satisfy the following condition: the sum of the second weights of the links in the fused link after the fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before the fusion; and
obtaining text lines based on the fused link.
2. The method of claim 1, wherein, in performing the fusing, the connected domain associated with a pruned link is fused with the nearest connected domain in the maximum-weight-sum link.
3. The method of claim 2, wherein the sum of the second weights of the links in the fused link is calculated by: updating the links in the fused link, and calculating the sum of the second weights of the fused link based on the updated links.
4. The method of any of claims 1-3, wherein the first weight depends on the sizes of, and the distance between, the two connected domains between which a link is established, and the second weight depends on the distance between the two connected domains between which a link is established.
5. The method of claim 4, wherein,
when the links are distributed vertically, the first weight is calculated based on the heights of, and the distance between, the two connected domains between which the link is established; and
when the links are distributed horizontally, the first weight is calculated based on the widths of, and the distance between, the two connected domains between which the link is established.
6. The method of any of claims 1-3, further comprising: applying a minimum spanning tree algorithm to exclude closed-loop links before searching for the maximum-weight-sum link.
7. The method of any of claims 1-3, wherein the pruning is performed using a greedy algorithm.
8. The method of any of claims 1-3, further comprising: filtering unconnected connected domains in the text line.
9. The method of claim 8, wherein filtering unconnected connected domains in the text line comprises: numbering the connected domains in sequence, and performing filtering based on the numbers of the connected domains and the distances of the connected domains from a reference position.
10. An apparatus for detecting text lines in a text image, comprising:
a connected domain extraction component for performing binarization processing on the text image and extracting connected domains;
a link establishing component for establishing links between adjacent connected domains of similar size to form a plurality of interleaved links;
a pruning component for pruning links of the plurality of links based on a first weight to obtain a main link;
a search component for searching the main link, based on the first weight, for a maximum-weight-sum link, the sum of the first weights of the links in the maximum-weight-sum link being larger than the sum of the first weights of the links in any other link;
a fusion component for fusing connected domains associated with the pruned links into the maximum-weight-sum link to obtain a fused link, wherein the connected domains satisfy the following condition: the sum of the second weights of the links in the fused link after the fusion is smaller than the sum of the second weights of the links in the maximum-weight-sum link before the fusion; and
a text line acquisition component for obtaining text lines based on the fused link.
CN201610654001.7A 2016-08-10 2016-08-10 Method and device for detecting text lines in text image Active CN107729898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610654001.7A CN107729898B (en) 2016-08-10 2016-08-10 Method and device for detecting text lines in text image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610654001.7A CN107729898B (en) 2016-08-10 2016-08-10 Method and device for detecting text lines in text image

Publications (2)

Publication Number Publication Date
CN107729898A CN107729898A (en) 2018-02-23
CN107729898B true CN107729898B (en) 2020-12-22

Family

ID=61200205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610654001.7A Active CN107729898B (en) 2016-08-10 2016-08-10 Method and device for detecting text lines in text image

Country Status (1)

Country Link
CN (1) CN107729898B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant