CN103839060B - Method and device for merging single-character regions - Google Patents
- Publication number
- CN103839060B CN201210486972.7A
- Authority
- CN
- China
- Prior art keywords
- combined region
- region
- literal line
- connected component
- merging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/158—Segmentation of character regions using character size, text spacings or pitch estimation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
Abstract
Embodiments of the invention disclose a method and device for merging single-character regions. The method includes: extracting the connected components in an image and merging the connected components to obtain the multiple combined regions produced by the merging process; arranging the combined regions to obtain at least one text line; and counting the number of combined regions contained in each text line, retaining the largest text line, i.e. the one containing the most combined regions, and deleting the other text lines that overlap it, where the combined regions contained in the largest text line are the single-character regions. Embodiments of the invention can solve the problem of inaccurate merging in the prior art.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a method and device for merging single-character regions.
Background art
Character recognition in images has a wide range of practical applications, such as recognizing the content of scanned documents or automatic postal code recognition. With the spread of digital cameras and the development of Internet technology, more and more images are produced by manually editing photographs. As shown in Fig. 1, such human-edited images usually have complex background pictures and variable foreground colors and textures. To recognize the text in these complex human-edited images, the character area must first be located and segmented. The character area is the set of all single-character regions in the human-edited image, and a "single character" here means a character of any kind, including Arabic numerals and characters of various languages, e.g. Chinese characters or Latin letters.
The key to locating and segmenting the character area is to determine each single-character region in the human-edited image. Among all types of characters, a Chinese character has a more complex structure than a Latin letter because it is made up of multiple radicals (in graph-theoretic terms, the radicals of one Chinese character are multiple mutually disconnected connected components). Therefore, to determine the region of one Chinese character, the mutually disconnected connected components that belong to that character must be combined, i.e. merged. Single-character regions that need the same merging process as Chinese characters also include Korean and Japanese character regions.
Existing methods for merging single-character regions generally analyze the spacing and positional relationships between connected components, treat all connected components that satisfy a specific spacing threshold and a specific positional relationship as belonging to one single-character region, and merge them. Merging stops when the number of merged connected components reaches a specific quantity threshold.
However, in the course of realizing the invention, the inventors found that existing methods for merging single-character regions have at least the following technical problem: the number of connected components differs from one single-character region to another, and the spacing between single-character regions also varies widely. Therefore, however the spacing threshold or quantity threshold is chosen, merging easily produces either over-segmentation, in which connected components that belong to one single-character region are merged into several single-character regions, or over-merging, in which connected components that do not belong to a single-character region are merged into it.
Summary of the invention
To solve the above technical problem, embodiments of the invention provide a method and device for merging single-character regions, so as to solve the problem of inaccurate merging in the prior art.
Embodiments of the invention disclose the following technical solutions:
A method for merging single-character regions, including:
Extracting the connected components in an image and merging the connected components to obtain the multiple combined regions produced by the merging process;
Arranging the combined regions to obtain at least one text line;
Counting the number of combined regions contained in each text line, retaining the largest text line, i.e. the one containing the most combined regions, and deleting the other text lines that overlap it, where the combined regions contained in the largest text line are the single-character regions.
A device for merging single-character regions, including:
A merging module, configured to extract the connected components in an image and merge the connected components to obtain the multiple combined regions produced by the merging process;
A text line arrangement analysis module, configured to arrange the combined regions to obtain at least one text line;
A first selection module, configured to count the number of combined regions contained in each text line, retain the largest text line, i.e. the one containing the most combined regions, and delete the other text lines that overlap it, where the combined regions contained in the largest text line are the single-character regions.
As can be seen from the above embodiments, characters in human-edited images are usually arranged regularly in lines. Therefore, if a merged single-character region is correct, it should be comparable in size to, and neatly aligned with, the surrounding single-character regions, and together they can form a relatively long text line. Conversely, if a merged single-character region is wrong because of over-segmentation or over-merging, the probability that it forms a relatively long text line with the surrounding single-character regions is very small. The embodiments of the invention therefore perform text line arrangement analysis on all combined regions to obtain text lines, and select from them the text line with the largest number of connected components, i.e. the longest text line. The combined regions in this longest text line are the correctly merged single-character regions, which solves the problem of inaccurate merging in the prior art.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a human-edited image;
Fig. 2 is a flowchart of a method for merging single-character regions disclosed in Embodiment one of the invention;
Fig. 3 is a schematic diagram of a connected component in graph theory;
Fig. 4 is a flowchart of a method for merging single-character regions disclosed in Embodiment two of the invention;
Fig. 5 is a schematic diagram of combined regions in an over-segmented state produced in intermediate passes of merging;
Fig. 6 is a flowchart of a method for merging single-character regions disclosed in Embodiment three of the invention;
Fig. 7 is a structural diagram of a device for merging single-character regions disclosed in Embodiment four of the invention;
Fig. 8 is a schematic structural diagram of the text line arrangement analysis module of the invention;
Fig. 9 is a schematic structural diagram of the merging module of the invention.
Detailed description
Embodiments of the invention provide a method and device for merging single-character regions. Characters in human-edited images are usually arranged regularly in lines. Therefore, if a merged single-character region is correct, it should be comparable in size to, and neatly aligned with, the surrounding single-character regions, and together they can form a relatively long text line. Conversely, if a merged single-character region is wrong because of over-segmentation or over-merging, the probability that it forms a relatively long text line with the surrounding single-character regions is very small. The embodiments of the invention therefore perform text line arrangement analysis on all combined regions to obtain text lines, and select from them the text line with the largest number of connected components, i.e. the longest text line. The combined regions in this longest text line are the correctly merged single-character regions.
To make the above objects, features and advantages of the invention clearer and easier to understand, the embodiments of the invention are described in detail below with reference to the accompanying drawings.
Embodiment one
Referring to Fig. 2, which is a flowchart of a method for merging single-character regions disclosed in Embodiment one of the invention, the method includes the following steps:
Step 201: Extract the connected components in an image and merge the connected components to obtain the multiple combined regions produced by the merging process;
Fig. 3 is a schematic diagram of a connected component in graph theory. In graph theory, if a path exists between any two points of a subgraph and none of its points is connected to any point outside the subgraph, the subgraph is called a connected component. For example, in the human-edited image shown in Fig. 1, "促" and "销" in "促销" (sales promotion) are two single-character regions. The single-character region "促" contains two connected components, "亻" and "足", and the single-character region "销" contains two connected components, "钅" and "肖".
The embodiments of the invention do not limit the method used to merge connected components and obtain combined regions; any existing method may be used.
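Step 201 starts from the connected components of the image, and the text does not prescribe how they are extracted. As a minimal sketch only, assuming the image has already been binarized into a 2D list of 0/1 values and that 8-connectivity is appropriate, a breadth-first flood fill can collect each component and its bounding rectangle. The function names `extract_connected_components` and `bounding_box` are hypothetical and do not come from the patent.

```python
from collections import deque


def extract_connected_components(binary):
    """Return a list of components; each is a list of (row, col) foreground pixels.

    Assumption: `binary` is a 2D list of 0/1 values and pixels are 8-connected.
    """
    rows = len(binary)
    cols = len(binary[0]) if binary else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Axis-aligned rectangle (x0, y0, x1, y1) around a component, x1/y1 exclusive."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

The bounding rectangles returned here can stand in for the "fitted rectangles" used by the structure tests later in this embodiment, although the patent does not say how its fitted rectangles are computed.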
A preferred implementation is as follows: compare the connected components in the human-edited image pairwise, and merge any two connected components that satisfy the enclosing structure relationship or the adjacent structure relationship to obtain a combined region; then take the connected components and the combined regions produced by each merging pass as merge objects, and repeatedly compare the merge objects pairwise and merge any two merge objects that satisfy the enclosing structure relationship or the adjacent structure relationship, until no further merge is possible.
For example, for ease of description, suppose the human-edited image contains five connected components (connected components 1 to 5). The five connected components are compared pairwise. Suppose connected components 1 and 2 satisfy the enclosing structure relationship, as do the connected components in the Chinese characters rendered as "side", "area" and "figure", and connected components 3 and 4 satisfy the adjacent structure relationship, as do the connected components in the Chinese characters rendered as "product", "word" and "OK". In the first merging pass, connected components 1 and 2 are merged into combined region 1, and connected components 3 and 4 are merged into combined region 2. Connected components 1 to 5 and combined regions 1 and 2 are then compared pairwise again. Suppose combined region 1 and connected component 5 satisfy the enclosing structure relationship. In the second merging pass, connected component 5 and combined region 1 are merged into combined region 3. This continues until no further merge is possible. Finally, the combined regions produced by every merging pass are obtained: combined regions 1, 2 and 3.
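The five-component example above can be sketched as a loop that keeps every intermediate combined region. This is an illustration under stated assumptions, not the patent's implementation: each merge object is modeled as a `frozenset` of connected-component ids, the structure test is passed in as a hypothetical `can_merge` predicate, and pairs that already share a component are skipped.

```python
from itertools import combinations


def merge_all(component_ids, can_merge):
    """Repeat pairwise merging until no new combined region appears.

    Returns every combined region produced along the way, each represented
    as a frozenset of the connected-component ids it contains.
    """
    objects = {frozenset([c]) for c in component_ids}  # merge objects so far
    combined = set()                                   # combined regions only
    changed = True
    while changed:
        changed = False
        # Snapshot in a deterministic order; `objects` grows during the pass.
        for a, b in combinations(sorted(objects, key=sorted), 2):
            if a & b:
                continue  # assumption: objects sharing a component are not merged
            union = a | b
            if union not in objects and can_merge(a, b):
                objects.add(union)
                combined.add(union)
                changed = True
    return combined
```

With a predicate that accepts only the pairs named in the example (1 with 2, 3 with 4, and combined region 1 with component 5), the function returns the three combined regions {1, 2}, {3, 4} and {1, 2, 5}, including the intermediate region {1, 2}.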
Preferably, the invention may, but is not limited to, use the following approach to merge two connected components, a connected component and a combined region, or two combined regions that satisfy the enclosing structure relationship:
For two connected components, judge whether the ratio of the overlap area between the fitted rectangles of the two connected components to the area of the smaller of the two fitted rectangles is greater than a first preset multiple, and whether the colors and stroke widths of the two connected components are similar. If so, the enclosing structure relationship is satisfied; otherwise, it is not.
For a connected component and a combined region (merge objects), or two combined regions (merge objects), judge whether the ratio of the overlap area between the fitted rectangles of the two merge objects to the area of the smaller of the two fitted rectangles is greater than the first preset multiple, and whether the colors and strokes of the two merge objects are similar. If so, the enclosing structure relationship is satisfied; otherwise, it is not.
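The enclosing-structure test can be sketched as follows. Several things here are assumptions made for illustration and are not specified in the text: the `MergeObject` record, the use of axis-aligned rectangles `(x0, y0, x1, y1)` as fitted rectangles, Euclidean RGB distance as the color comparison, a relative difference as the stroke-width comparison, and all default threshold values.

```python
import math
from dataclasses import dataclass


@dataclass
class MergeObject:
    rect: tuple          # fitted rectangle as (x0, y0, x1, y1)
    color: tuple         # mean RGB color (assumed representation)
    stroke_width: float  # estimated stroke width in pixels, assumed > 0


def rect_area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def similar_appearance(a, b, color_tol=30.0, stroke_tol=0.3):
    # Hypothetical similarity measures; the text only says "similar".
    color_dist = math.dist(a.color, b.color)
    stroke_diff = abs(a.stroke_width - b.stroke_width) / max(a.stroke_width, b.stroke_width)
    return color_dist <= color_tol and stroke_diff <= stroke_tol


def is_enclosing(a, b, first_multiple=0.8):
    """Overlap area / smaller rectangle area must exceed the first preset multiple."""
    smaller = min(rect_area(a.rect), rect_area(b.rect))
    if smaller == 0:
        return False
    ratio = overlap_area(a.rect, b.rect) / smaller
    return ratio > first_multiple and similar_appearance(a, b)
```

Dividing by the smaller rectangle means a small component that lies fully inside a larger one yields a ratio of 1, regardless of how large the outer rectangle is.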
Preferably, the invention may, but is not limited to, use the following approach to merge two connected components, a connected component and a combined region, or two combined regions that satisfy the adjacent structure relationship:
For two connected components, judge whether the ratio of the sum of the widths of the fitted rectangles of the two connected components to the distance between their centers is greater than a second preset multiple, whether the colors and strokes of the two connected components are similar, and whether the ratio of the length to the width of the fitted rectangle of the merged region is less than a third preset multiple. If so, the adjacent structure relationship is satisfied; otherwise, it is not.
For a connected component and a combined region (merge objects), or two combined regions (merge objects), judge whether the ratio of the sum of the widths of the fitted rectangles of the two merge objects to the distance between their centers is greater than the second preset multiple, whether the colors and strokes of the two merge objects are similar, and whether the ratio of the length to the width of the fitted rectangle of the merged region is less than the third preset multiple. If so, the adjacent structure relationship is satisfied; otherwise, it is not.
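A sketch of the adjacent-structure test is given below. It reads the first condition as "sum of the two rectangle widths divided by the distance between the rectangle centers", and reads "ratio of length to width" as the long side over the short side of the merged rectangle; both readings, the `MergeObject` record, the similarity measures and every default threshold are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MergeObject:
    rect: tuple          # fitted rectangle as (x0, y0, x1, y1)
    color: tuple         # mean RGB color (assumed representation)
    stroke_width: float  # estimated stroke width in pixels, assumed > 0


def is_adjacent(a, b, second_multiple=1.0, third_multiple=1.5,
                color_tol=30.0, stroke_tol=0.3):
    ax0, ay0, ax1, ay1 = a.rect
    bx0, by0, bx1, by1 = b.rect

    # Condition 1: width sum relative to the distance between rectangle centers.
    width_sum = (ax1 - ax0) + (bx1 - bx0)
    dist = math.hypot((ax0 + ax1) / 2 - (bx0 + bx1) / 2,
                      (ay0 + ay1) / 2 - (by0 + by1) / 2)
    closeness = width_sum / dist if dist > 0 else math.inf

    # Condition 2: similar color and stroke width (hypothetical measures).
    similar = (math.dist(a.color, b.color) <= color_tol
               and abs(a.stroke_width - b.stroke_width)
               <= stroke_tol * max(a.stroke_width, b.stroke_width))

    # Condition 3: the merged rectangle must stay roughly character-shaped.
    merged_w = max(ax1, bx1) - min(ax0, bx0)
    merged_h = max(ay1, by1) - min(ay0, by0)
    if min(merged_w, merged_h) <= 0:
        return False
    aspect = max(merged_w, merged_h) / min(merged_w, merged_h)

    return closeness > second_multiple and similar and aspect < third_multiple
```

The aspect-ratio condition is what keeps two complete neighboring characters from being merged: they are close together, but their union is about twice as wide as it is tall.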
Note that the embodiments of the invention do not limit the specific values of the first, second and third preset multiples. They can be determined in advance by experiment: for characters with an enclosing structure, compute the ratio of the overlap area between the fitted rectangles of the connected components to the area of the smallest of those fitted rectangles, use sample statistics to determine a mean ratio, and take this mean ratio as the first preset multiple. The second and third preset multiples can be determined in the same way.
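The calibration described above amounts to averaging a ratio over labeled samples. A minimal sketch, assuming each sample is a pair of axis-aligned rectangles `(x0, y0, x1, y1)` with non-zero area taken from a character known to have an enclosing structure; the function names and the sample format are hypothetical.

```python
from statistics import mean


def overlap_ratio(a, b):
    """Overlap area divided by the area of the smaller rectangle."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    overlap = max(0, w) * max(0, h)
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return overlap / smaller


def estimate_first_multiple(sample_pairs):
    """Mean overlap ratio over sample rectangle pairs, used as the first preset multiple."""
    return mean(overlap_ratio(a, b) for a, b in sample_pairs)
```

Because the threshold is a mean and the test is a strict "greater than", samples below the mean would fail the test; whether a lower statistic is preferable in practice is outside what the text specifies.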
Step 202: Arrange the combined regions to obtain at least one text line;
The embodiments of the invention may use any text line arrangement analysis method in the prior art to perform text line arrangement analysis on the combined regions obtained in step 201.
For example, the prior art includes text line arrangement analysis methods based on projection and on the Hough transform; both obtain text line arrangement information from the statistical information of the regions. The prior art also includes text line arrangement analysis methods based on region clustering. Such methods usually define a similarity relationship between regions in the same line, and then use a clustering method to group regions with the similarity relationship together; each resulting group forms a text line.
Step 203: Count the number of combined regions contained in each text line, retain the largest text line, i.e. the one containing the most combined regions, and delete the other text lines that overlap it, where the combined regions contained in the largest text line are the single-character regions.
After all text lines have been obtained in step 202, count the number of combined regions contained in each text line and find the text line with the largest number, i.e. the longest text line. The combined regions in this longest text line are the correctly merged single-character regions. At the same time, delete the text lines that overlap the longest text line; the combined regions in these overlapping text lines are wrongly merged single-character regions caused by over-segmentation or over-merging.
Preferably, the method may further include:
Step 204: If text lines remain other than the largest text line and the text lines overlapping it, continue to retain the next largest text line from the remaining text lines and delete the other text lines that overlap it, and so on, until no largest text line can be retained;
Where the combined regions contained in each retained largest text line are the single-character regions.
That is, following the method above, the text line with the largest number of combined regions is again found among all remaining text lines other than those already selected, and so on, until no selectable text line remains.
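Steps 203 and 204 together form a greedy selection loop, sketched below. The text does not define precisely when two text lines "overlap"; this sketch assumes, purely for illustration, that two lines overlap when any of their combined regions share a connected component. Each text line is modeled as a list of combined regions, and each combined region as a `frozenset` of component ids. The name `select_text_lines` is hypothetical.

```python
def select_text_lines(lines):
    """Greedily keep the line with the most combined regions, drop lines that
    overlap it, and repeat on what remains.

    Assumption: two lines overlap if they share at least one connected component.
    """
    remaining = list(lines)
    kept = []
    while remaining:
        best = max(remaining, key=len)          # most combined regions wins
        kept.append(best)
        best_components = set().union(*best)    # all component ids used by `best`
        remaining = [
            line for line in remaining
            if line is not best and not (set().union(*line) & best_components)
        ]
    return kept
```

For instance, with two correct four-character lines and one candidate line whose three regions each fuse characters from both lines, the candidate contains fewer combined regions, overlaps the first retained line and is deleted, and both correct lines are kept.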
As can be seen from the above embodiment, characters in human-edited images are usually arranged regularly in lines. Therefore, if a merged single-character region is correct, it should be comparable in size to, and neatly aligned with, the surrounding single-character regions, and together they can form a relatively long text line. Conversely, if a merged single-character region is wrong because of over-segmentation or over-merging, the probability that it forms a relatively long text line with the surrounding single-character regions is very small. The embodiments of the invention therefore perform text line arrangement analysis on all combined regions to obtain text lines, and select from them the text line with the largest number of connected components, i.e. the longest text line. The combined regions in this longest text line are the correctly merged single-character regions, which solves the problem of inaccurate merging in the prior art.
In addition, a common cross-line over-merging case deserves particular emphasis: when the line spacing is very small, characters in several adjacent lines are over-merged across lines into one character region. In this case, although the over-merged regions can also form a relatively long line, the over-merging means that the number of combined regions the line contains is necessarily smaller than the number of combined regions in a correct line, so the strategy of the invention still selects the correctly merged line. The invention therefore also handles this kind of cross-line over-merging well.
Embodiment two
When text line arrangement analysis is performed to obtain text lines, the objects processed are the combined regions produced by every merging pass, and these necessarily include a large number of combined regions in an over-segmented state (the combined regions produced in intermediate passes before the final merge is completed). Performing text line arrangement analysis on these over-segmented combined regions from intermediate passes inevitably affects the accuracy and execution efficiency of the analysis. To solve this problem, Embodiment two differs from Embodiment one in that, when text line arrangement analysis is performed on the combined regions, the over-segmented combined regions are excluded from the analysis. Referring to Fig. 4, which is a flowchart of a method for merging single-character regions disclosed in Embodiment two of the invention, the method includes the following steps:
Step 401: Extract the connected components in an image and merge the connected components to obtain the multiple combined regions produced by the merging process;
For the specific implementation of this step, see step 201 in Embodiment one; it has been described in detail there and is not repeated here.
Step 402: Obtain a first combined region set, which includes at least two combined regions that have a connected component in common, and extract text lines based on the combined region in the first combined region set that contains the most connected components; obtain a second combined region set, which includes at least one combined region that has no connected component in common with others, and extract text lines based on the combined regions in the second combined region set;
For example, when the three connected components "口" of the character "品" are merged, three combined regions (1, 2 and 3) are produced, as shown in Fig. 5. Two of them (1 and 2) are over-segmented combined regions produced in intermediate passes of merging, each containing two connected components, while combined region 3 is the correct combined region produced by the final pass, containing three connected components. All three combined regions are close to the surrounding characters in size and arrangement, so in text line arrangement analysis all three would be extracted onto the same text line. This affects not only the accuracy and execution efficiency of the text line arrangement analysis but also the number of combined regions contained in the text line: the extracted text line would contain two more combined regions than the actual number. Since the number of combined regions in a text line is the basis for finally deciding whether the text line is retained, distorting this number ultimately affects the accuracy of merging single-character regions.
As Fig. 5 shows, the three combined regions all contain the same connected components, and the combined region containing the most connected components is the correct combined region produced by the final pass of merging. Therefore, if several of the combined regions contain the same connected component, the one among them with the most connected components is the correct combined region produced by the final pass, and the others are over-segmented combined regions. Lines are extracted based on the combined region with the most connected components, so the over-segmented combined regions are excluded from text line arrangement analysis.
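The filtering rule described above can be sketched with a simple pairwise test. This is one possible reading, stated as an assumption: a combined region is treated as over-segmented when some other combined region shares a connected component with it and contains strictly more connected components. Regions are modeled as `frozenset`s of component ids, and `drop_over_segmented` is a hypothetical name.

```python
def drop_over_segmented(regions):
    """Keep only combined regions that are not dominated by a larger region
    sharing at least one connected component with them."""
    kept = []
    for r in regions:
        dominated = any(
            s is not r and (s & r) and len(s) > len(r)
            for s in regions
        )
        if not dominated:
            kept.append(r)
    return kept
```

In the three-component example, the two two-component regions are dropped in favor of the three-component region, while a region that shares no component with any other is left untouched.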
Note that, depending on the text line arrangement analysis method used, the way text lines are extracted based on the combined region with the most connected components in the first combined region set also differs.
Preferably, when the Hough-transform-based text line arrangement analysis method is used, the straight-line relationships between the combined regions in the first combined region set are set to be calculated, and the first combined region set is looked up in the text lines obtained by performing text line arrangement analysis on the combined regions; in the first combined region set found, the combined region with the most connected components is retained and the other combined regions are removed.
Alternatively, and preferably, when the region-clustering-based text line arrangement analysis method is used, the number of connected components contained in a combined region is added to the weight factors of each combined region's weight, and the weight between the combined regions in the first combined region set is set to 0.
Step 403: Count the number of combined regions contained in each text line, retain the largest text line, i.e. the one containing the most combined regions, and delete the other text lines that overlap it, where the combined regions contained in the largest text line are the single-character regions.
For the specific implementation of this step, see step 203 in Embodiment one; it has been described in detail there and is not repeated here.
As can be seen from the above embodiment, in addition to the technical effect of Embodiment one, this embodiment excludes the over-segmented combined regions from text line arrangement analysis when analyzing the combined regions, and therefore further improves the accuracy of the text line arrangement analysis.
Embodiment three
Taking text line arrangement analysis by region clustering as an example, a method for merging single-character regions is described in detail below. Referring to Fig. 6, which is a flowchart of a method for merging single-character regions disclosed in Embodiment three of the invention, the method includes the following steps:
Step 601: Compare all connected components in the human-edited image pairwise, and merge any two connected components that satisfy the enclosing structure relationship or the adjacent structure relationship to obtain combined regions;
Step 602: Take all connected components and the combined regions produced by each merging pass as merge objects, and repeatedly compare the merge objects pairwise and merge any two merge objects that satisfy the enclosing structure relationship or the adjacent structure relationship, until no further merge is possible;
Step 603: When the region-clustering-based text line arrangement analysis method is used, add the number of connected components contained in a combined region to the weight factors of each combined region's weight, and set the weight between combined regions that contain the same connected component to 0, to obtain text lines;
For example, suppose that under an existing region-clustering-based text line arrangement analysis method the old weight of combined region R is W. After the number of connected components contained in the combined region is added to the weight factors, the new weight of combined region R is W + kn, where k is a constant and n is the number of connected components contained in combined region R. Likewise, if the old weight between combined regions R1 and R2 (which generally represents the probability that they belong to the same line) is W, the new weight is W + kn1 + kn2, where k is a constant, n1 is the number of connected components contained in combined region R1, and n2 is the number of connected components contained in combined region R2. For instance, in a clique extraction method based on a greedy algorithm, the weight of a selected matching pair (a vertex in the graph) is the number N of edges connected to that vertex, and the new weight can be set to N + kn1 + kn2, where n1 and n2 are the numbers of connected components contained in the combined regions of the matching pair.
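The weight adjustments above are simple arithmetic and can be written out directly. In this sketch the function names, the `frozenset` representation of combined regions and the default value of `k` are assumptions; only the formulas W + kn and W + kn1 + kn2, and the rule that regions sharing a connected component get weight 0, come from the text.

```python
def region_weight(base_weight, region, k=0.1):
    """New weight of a single combined region: W + k * n."""
    return base_weight + k * len(region)


def pair_weight(base_weight, r1, r2, k=0.1):
    """New weight between two combined regions: W + k * n1 + k * n2.

    Regions that share a connected component are alternatives for the same
    character, so their pairwise weight is forced to 0.
    """
    if r1 & r2:
        return 0.0
    return base_weight + k * len(r1) + k * len(r2)
```

Raising the weight with the component count favors the more complete combined regions during clustering, and the zero weight keeps alternative merges of the same character from being placed on one text line.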
Step 604: From all text lines other than those already selected, repeatedly select the text line containing the most combined regions and delete the text lines that overlap the selected text line, where the combined regions contained in the selected text lines are the merged single-character regions.
As can be seen from the above embodiment, individual characters in human-edited images are usually arranged regularly in lines. Therefore, if an individual character region has been merged correctly, it should be comparable in size to the surrounding individual character regions and neatly aligned with them, and together they can form a longer text line. Conversely, if an individual character region has been merged incorrectly, producing over-segmentation or over-merging, the probability that it forms a longer text line with the surrounding individual character regions is very small. The embodiment of the present invention therefore performs text line arrangement analysis on all combined regions to obtain text lines and selects from them the text line containing the most connected components, i.e. the longest text line. The combined regions in this longest text line are the correctly merged individual character regions, which solves the problem of inaccurate merging in the prior art.
In addition, a common case of cross-line over-merging deserves special emphasis: because the line spacing is very small, characters in several adjacent lines are merged across lines into a single character region. In this case the over-merged regions can also form a fairly long line, but because of the over-merging the number of combined regions it contains is necessarily smaller than the number of combined regions in the correct line, so the strategy of the present invention still selects the correctly merged line. The present invention therefore also handles this kind of cross-line over-merging well.
Example IV
Corresponding to the above method for merging individual character regions, an embodiment of the present invention further provides a device for merging individual character regions. Referring to Fig. 7, which is a structural diagram of the merging device for individual character regions disclosed in the fourth embodiment of the present invention, the device includes: a merging module 701, a text line arrangement analysis module 702 and a first selection module 703. The internal structure and connections of the device are further described below in conjunction with its working principle.
The merging module 701 is configured to extract the connected components in an image and merge the connected components to obtain a plurality of combined regions produced by the merging process;
The text line arrangement analysis module 702 is configured to arrange the combined regions to obtain at least one text line;
The first selection module 703 is configured to count the number of combined regions contained in each text line, retain the maximum text line, i.e. the one containing the largest number of combined regions, and delete the other text lines that overlap with it, wherein the combined regions contained in the maximum text line are the individual character regions.
Preferably, the device shown in Fig. 7 may further include a loop selection module, configured to: if text lines remain other than the maximum text line and the text lines overlapping it, continue to retain the next maximum text line from the remaining text lines and delete the other text lines that overlap with it, and so on, until no maximum text line can be retained;
wherein the combined regions contained in each retained maximum text line are the individual character regions.
Preferably, as shown in Fig. 8, the text line arrangement analysis module 702 further includes a first line extraction submodule 7021 and a second line extraction submodule 7022, wherein:
The first line extraction submodule 7021 is configured to obtain a first combined region set, which includes at least two combined regions that have an identical connected component, and to extract text lines based on the combined region in the first combined region set that contains the most connected components;
The second line extraction submodule 7022 is configured to obtain a second combined region set, which includes at least one combined region that has no identical connected component, and to extract text lines based on the combined regions in the second combined region set.
Further preferably, the first line extraction submodule 7021 includes:
a first mutual exclusion condition setting submodule, configured to, when the text line arrangement analysis method based on the Hough transform is used, set the line relationship between the combined regions in the first combined region set to be calculated, and search for the first combined region set in the text lines obtained by performing text line arrangement analysis on the combined regions;
a line selection submodule, configured to retain, in the first combined region set found, the combined region containing the most connected components and remove the other combined regions.
Alternatively, further preferably, the first line extraction submodule 7021 includes:
a weight factor setting sub-unit, configured to, when the text line arrangement analysis method based on region clustering is used, add the number of connected components contained in a combined region to the weight factors of each combined region weight;
a second mutual exclusion condition setting submodule, configured to, when the text line arrangement analysis method based on region clustering is used, set the weight between combined regions that contain an identical connected component to 0.
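The mutual exclusion condition for region clustering can be illustrated with a small Python sketch. It assumes, purely for illustration, that each combined region is a frozenset of connected component ids and that pairwise weights are held in a dictionary keyed by index pairs; the name `apply_mutual_exclusion` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Region = FrozenSet[int]  # assumed model: ids of the connected components in a region


def apply_mutual_exclusion(
    regions: List[Region],
    weights: Dict[Tuple[int, int], float],
) -> Dict[Tuple[int, int], float]:
    """Return a copy of `weights` in which every pair of regions sharing a
    connected component has its weight set to 0."""
    result = dict(weights)
    for (i, j) in weights:
        if regions[i] & regions[j]:      # an identical connected component exists
            result[(i, j)] = 0.0
    return result


regions = [frozenset({1, 2}), frozenset({2, 3}), frozenset({4})]
weights = {(0, 1): 2.0, (0, 2): 1.5, (1, 2): 1.5}
exclusive = apply_mutual_exclusion(regions, weights)
```

Because regions 0 and 1 both contain component 2, they are alternative readings of the same strokes, and a zero weight keeps the clustering from placing both in one text line.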
Preferably, as shown in Fig. 9, the merging module 701 includes a connected component merging submodule 7011 and a merge object merging submodule 7012, wherein:
The connected component merging submodule 7011 is configured to compare the connected components in the human-edited image pairwise and merge any two connected components that satisfy the enclosing structure relation and the adjacent structure relation, to obtain combined regions;
Further preferably, the connected component merging submodule includes: a first judging submodule, configured to judge whether the ratio of the overlapping area between the fitted rectangles of two connected components to the area of the smaller of the two fitted rectangles is greater than a first preset multiple, and whether the colors and stroke widths of the two connected components are close; if so, the enclosing structure relation is satisfied, otherwise it is not; and a second judging submodule, configured to judge whether the ratio of the sum of the widths of the fitted rectangles of two connected components to the distance between their centers is greater than a second preset multiple, whether the colors and strokes of the two connected components are close, and whether the length-to-width ratio of the fitted rectangle of the merged region is less than a third preset multiple; if so, the adjacent structure relation is satisfied, otherwise it is not.
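The two structure tests can be sketched in Python as follows. Everything numeric here is an assumption: the three preset multiples `t1`, `t2`, `t3`, the color and stroke tolerances, the reading of the length-to-width ratio as longer side over shorter side, and the `Comp` record itself are placeholders chosen for illustration rather than values or types given in the description.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Comp:
    """A connected component with its axis-aligned fitted rectangle (hypothetical record)."""
    x: float
    y: float
    w: float
    h: float
    color: float    # e.g. mean gray level
    stroke: float   # estimated stroke width


def overlap_area(a: Comp, b: Comp) -> float:
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def similar(a: Comp, b: Comp, color_tol: float = 30.0, stroke_ratio: float = 1.5) -> bool:
    """Assumed closeness test for color and stroke width."""
    stroke_ok = max(a.stroke, b.stroke) <= stroke_ratio * min(a.stroke, b.stroke)
    return abs(a.color - b.color) <= color_tol and stroke_ok


def is_enclosing(a: Comp, b: Comp, t1: float = 0.8) -> bool:
    """Overlap area over the smaller rectangle area exceeds the first multiple."""
    smaller = min(a.w * a.h, b.w * b.h)
    return similar(a, b) and overlap_area(a, b) / smaller > t1


def is_adjacent(a: Comp, b: Comp, t2: float = 1.5, t3: float = 1.5) -> bool:
    """Width sum over center distance exceeds the second multiple, and the
    merged rectangle's side ratio stays below the third multiple."""
    dist = math.hypot((a.x + a.w / 2) - (b.x + b.w / 2),
                      (a.y + a.h / 2) - (b.y + b.h / 2))
    close = dist == 0 or (a.w + b.w) / dist > t2
    w = max(a.x + a.w, b.x + b.w) - min(a.x, b.x)
    h = max(a.y + a.h, b.y + b.h) - min(a.y, b.y)
    aspect = max(w, h) / min(w, h)
    return similar(a, b) and close and aspect < t3


left = Comp(0, 0, 4, 10, color=0, stroke=2)     # left part of a character
right = Comp(5, 0, 5, 10, color=0, stroke=2)    # right part, one pixel gap
far = Comp(30, 0, 5, 10, color=0, stroke=2)     # a different character
outer = Comp(0, 0, 10, 10, color=0, stroke=2)
inner = Comp(2, 2, 4, 4, color=0, stroke=2)     # lies entirely inside `outer`
```

The same predicates would apply unchanged to merge objects, since the tests only use the fitted rectangle, color and stroke width.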
The merge object merging submodule 7012 is configured to take the connected components and the combined regions produced by each merging pass as merge objects, repeatedly compare the merge objects pairwise, and merge any two merge objects that satisfy the enclosing structure relation and the adjacent structure relation, until no further merging is possible.
Further preferably, the merge object merging submodule 7012 includes: a third judging submodule, configured to judge whether the ratio of the overlapping area between the fitted rectangles of two merge objects to the area of the smaller of the two fitted rectangles is greater than the first preset multiple, and whether the colors and stroke widths of the two merge objects are close; if so, the enclosing structure relation is satisfied, otherwise it is not; and a fourth judging submodule, configured to judge whether the ratio of the sum of the widths of the fitted rectangles of two merge objects to the distance between their centers is greater than the second preset multiple, whether the colors and strokes of the two merge objects are close, and whether the length-to-width ratio of the fitted rectangle of the merged region is less than the third preset multiple; if so, the adjacent structure relation is satisfied, otherwise it is not.
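The repeated pairwise merging performed by the merging module can be sketched as a fixed-point loop. This sketch assumes that each merge object is a frozenset of connected component ids, that merged candidates are kept alongside their constituents (which is how combined regions sharing a connected component can arise), and that the structure tests are supplied as a caller-provided predicate `can_merge`. Both function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Set

MergeObject = FrozenSet[int]  # assumed model: ids of the connected components covered


def merge_until_stable(
    component_ids: Iterable[int],
    can_merge: Callable[[MergeObject, MergeObject], bool],
) -> Set[MergeObject]:
    """Grow the pool of merge objects until no pair yields a new combined region."""
    pool: Set[MergeObject] = {frozenset([c]) for c in component_ids}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot because the pool grows inside the loop.
        for a, b in combinations(list(pool), 2):
            merged = a | b
            if merged not in pool and can_merge(a, b):
                pool.add(merged)
                changed = True
    return pool


def neighbours_only(a: MergeObject, b: MergeObject) -> bool:
    """Toy predicate: merge two disjoint objects covering exactly two consecutive ids."""
    ids = sorted(a | b)
    return not (a & b) and len(ids) == 2 and ids[1] - ids[0] == 1


candidates = merge_until_stable([1, 2, 3], neighbours_only)
```

The loop terminates because the pool only grows and is bounded by the finite number of subsets of the connected components. In the toy run, {1, 2} and {2, 3} both end up in the pool, and the later text line selection decides which of the competing candidates survives.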
As can be seen from the above embodiment, individual characters in human-edited images are usually arranged regularly in lines. Therefore, if an individual character region has been merged correctly, it should be comparable in size to the surrounding individual character regions and neatly aligned with them, and together they can form a longer text line. Conversely, if an individual character region has been merged incorrectly, producing over-segmentation or over-merging, the probability that it forms a longer text line with the surrounding individual character regions is very small. The embodiment of the present invention therefore performs text line arrangement analysis on all combined regions to obtain text lines and selects from them the text line containing the most connected components, i.e. the longest text line. The combined regions in this longest text line are the correctly merged individual character regions, which solves the problem of inaccurate merging in the prior art.
In addition, a common case of cross-line over-merging deserves special emphasis: because the line spacing is very small, characters in several adjacent lines are merged across lines into a single character region. In this case the over-merged regions can also form a fairly long line, but because of the over-merging the number of combined regions it contains is necessarily smaller than the number of combined regions in the correct line, so the strategy of the present invention still selects the correctly merged line. The present invention therefore also handles this kind of cross-line over-merging well.
It should be noted that a person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The method and device for merging individual character regions provided by the present invention have been described in detail above. Specific embodiments are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (16)
1. A method for merging individual character regions, characterized by comprising:
extracting connected components from an image and merging the connected components to obtain a plurality of combined regions produced by the merging process;
arranging the combined regions to obtain at least one text line;
counting the number of combined regions contained in each text line, retaining the maximum text line that contains the largest number of combined regions, and deleting the other text lines that overlap with it, wherein the combined regions contained in the maximum text line are the individual character regions.
2. The method according to claim 1, characterized in that the method further comprises:
if text lines remain other than the maximum text line and the text lines overlapping the maximum text line, continuing to retain the next maximum text line from the remaining text lines and deleting the other text lines that overlap with it, and so on, until no maximum text line can be retained;
wherein the combined regions contained in each retained maximum text line are the individual character regions.
3. The method according to claim 1, characterized in that said arranging the combined regions to obtain at least one text line comprises:
obtaining a first combined region set, the first combined region set including at least two combined regions that have an identical connected component, and extracting text lines based on the combined region in the first combined region set that contains the most connected components;
obtaining a second combined region set, the second combined region set including at least one combined region that has no identical connected component, and extracting text lines based on the combined regions in the second combined region set.
4. The method according to claim 3, characterized in that said obtaining a first combined region set, the first combined region set including at least two combined regions that have an identical connected component, and extracting text lines based on the combined region in the first combined region set that contains the most connected components, comprises:
when the text line arrangement analysis method based on the Hough transform is used, setting the line relationship between the combined regions in the first combined region set to be calculated, and searching for the first combined region set in the text lines obtained by performing text line arrangement analysis on the combined regions;
retaining, in the first combined region set found, the combined region containing the most connected components, and removing the other combined regions.
5. The method according to claim 3, characterized in that said obtaining a first combined region set, the first combined region set including at least two combined regions that have an identical connected component, and extracting text lines based on the combined region in the first combined region set that contains the most connected components, comprises:
when the text line arrangement analysis method based on region clustering is used, adding the number of connected components contained in a combined region to the weight factors of each combined region weight;
setting the weight between the combined regions in the first combined region set to 0.
6. The method according to claim 1, characterized in that said extracting connected components from an image and merging the connected components to obtain a plurality of combined regions produced by the merging process comprises:
comparing the extracted connected components pairwise and merging any two connected components that satisfy the enclosing structure relation and the adjacent structure relation, to obtain combined regions;
taking the connected components and the combined regions produced by the merging process as merge objects, repeatedly comparing the merge objects pairwise, and merging any two merge objects that satisfy the enclosing structure relation and the adjacent structure relation, until no further merging is possible.
7. The method according to claim 6, characterized in that said merging any two connected components that satisfy the enclosing structure relation and the adjacent structure relation comprises:
judging whether the ratio of the overlapping area between the fitted rectangles of two connected components to the area of the smaller of the two fitted rectangles is greater than a first preset multiple, and whether the colors and stroke widths of the two connected components are close; if so, the enclosing structure relation is satisfied, otherwise it is not;
judging whether the ratio of the sum of the widths of the fitted rectangles of two connected components to the distance between their centers is greater than a second preset multiple, whether the colors and strokes of the two connected components are close, and whether the length-to-width ratio of the fitted rectangle of the merged region is less than a third preset multiple; if so, the adjacent structure relation is satisfied, otherwise it is not.
8. The method according to claim 6, characterized in that said merging two merge objects that satisfy the adjacent structure relation comprises:
judging whether the ratio of the overlapping area between the fitted rectangles of two merge objects to the area of the smaller of the two fitted rectangles is greater than the first preset multiple, and whether the colors and stroke widths of the two merge objects are close; if so, the enclosing structure relation is satisfied, otherwise it is not;
judging whether the ratio of the sum of the widths of the fitted rectangles of two merge objects to the distance between their centers is greater than the second preset multiple, whether the colors and strokes of the two merge objects are close, and whether the length-to-width ratio of the fitted rectangle of the merged region is less than the third preset multiple; if so, the adjacent structure relation is satisfied, otherwise it is not.
9. A device for merging individual character regions, characterized by comprising:
a merging module, configured to extract connected components from an image and merge the connected components to obtain a plurality of combined regions produced by the merging process;
a text line arrangement analysis module, configured to arrange the combined regions to obtain at least one text line;
a first selection module, configured to count the number of combined regions contained in each text line, retain the maximum text line that contains the largest number of combined regions, and delete the other text lines that overlap with it, wherein the combined regions contained in the maximum text line are the individual character regions.
10. The device according to claim 9, characterized in that the device further comprises:
a loop selection module, configured to: if text lines remain other than the maximum text line and the text lines overlapping the maximum text line, continue to retain the next maximum text line from the remaining text lines and delete the other text lines that overlap with it, and so on, until no maximum text line can be retained;
wherein the combined regions contained in each retained maximum text line are the individual character regions.
11. The device according to claim 9, characterized in that the text line arrangement analysis module includes:
a first line extraction submodule, configured to obtain a first combined region set, the first combined region set including at least two combined regions that have an identical connected component, and to extract text lines based on the combined region in the first combined region set that contains the most connected components;
a second line extraction submodule, configured to obtain a second combined region set, the second combined region set including at least one combined region that has no identical connected component, and to extract text lines based on the combined regions in the second combined region set.
12. The device according to claim 11, characterized in that the first line extraction submodule includes:
a first mutual exclusion condition setting submodule, configured to, when the text line arrangement analysis method based on the Hough transform is used, set the line relationship between the combined regions in the first combined region set to be calculated, and search for the first combined region set in the text lines obtained by performing text line arrangement analysis on the combined regions;
a line selection submodule, configured to retain, in the first combined region set found, the combined region containing the most connected components and remove the other combined regions.
13. The device according to claim 11, characterized in that the first line extraction submodule includes:
a weight factor setting sub-unit, configured to, when the text line arrangement analysis method based on region clustering is used, add the number of connected components contained in a combined region to the weight factors of each combined region weight;
a second mutual exclusion condition setting submodule, configured to, when the text line arrangement analysis method based on region clustering is used, set the weight between the combined regions that contain an identical connected component to 0.
14. The device according to claim 9, characterized in that the merging module includes:
a connected component merging submodule, configured to compare the connected components in the human-edited image pairwise and merge any two connected components that satisfy the enclosing structure relation and the adjacent structure relation, to obtain combined regions;
a merge object merging submodule, configured to take the connected components and the combined regions produced by each merging pass as merge objects, repeatedly compare the merge objects pairwise, and merge any two merge objects that satisfy the enclosing structure relation and the adjacent structure relation, until no further merging is possible.
15. The device according to claim 14, characterized in that the connected component merging submodule includes:
a first judging submodule, configured to judge whether the ratio of the overlapping area between the fitted rectangles of two connected components to the area of the smaller of the two fitted rectangles is greater than a first preset multiple, and whether the colors and stroke widths of the two connected components are close; if so, the enclosing structure relation is satisfied, otherwise it is not;
a second judging submodule, configured to judge whether the ratio of the sum of the widths of the fitted rectangles of two connected components to the distance between their centers is greater than a second preset multiple, whether the colors and strokes of the two connected components are close, and whether the length-to-width ratio of the fitted rectangle of the merged region is less than a third preset multiple; if so, the adjacent structure relation is satisfied, otherwise it is not.
16. The device according to claim 14, characterized in that the merge object merging submodule includes:
a third judging submodule, configured to judge whether the ratio of the overlapping area between the fitted rectangles of two merge objects to the area of the smaller of the two fitted rectangles is greater than the first preset multiple, and whether the colors and stroke widths of the two merge objects are close; if so, the enclosing structure relation is satisfied, otherwise it is not;
a fourth judging submodule, configured to judge whether the ratio of the sum of the widths of the fitted rectangles of two merge objects to the distance between their centers is greater than the second preset multiple, whether the colors and strokes of the two merge objects are close, and whether the length-to-width ratio of the fitted rectangle of the merged region is less than the third preset multiple; if so, the adjacent structure relation is satisfied, otherwise it is not.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710112654.7A CN107122778B (en) | 2012-11-26 | 2012-11-26 | Method and device for merging single character areas |
CN201210486972.7A CN103839060B (en) | 2012-11-26 | 2012-11-26 | A kind of merging method in individual character region and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210486972.7A CN103839060B (en) | 2012-11-26 | 2012-11-26 | A kind of merging method in individual character region and device |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710112654.7A Division CN107122778B (en) | 2012-11-26 | 2012-11-26 | Method and device for merging single character areas |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103839060A CN103839060A (en) | 2014-06-04 |
CN103839060B true CN103839060B (en) | 2017-03-01 |
Family
ID=50802539
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210486972.7A Active CN103839060B (en) | 2012-11-26 | 2012-11-26 | A kind of merging method in individual character region and device |
CN201710112654.7A Active CN107122778B (en) | 2012-11-26 | 2012-11-26 | Method and device for merging single character areas |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710112654.7A Active CN107122778B (en) | 2012-11-26 | 2012-11-26 | Method and device for merging single character areas |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN103839060B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105989366A (en) * | 2015-01-30 | 2016-10-05 | 深圳市思路飞扬信息技术有限责任公司 | Inclination angle correcting method of text image, page layout analysis method of text image, vision assistant device and vision assistant system |
CN107977593A (en) * | 2016-10-21 | 2018-05-01 | 富士通株式会社 | Image processing apparatus and image processing method |
CN106951893A (en) * | 2017-05-08 | 2017-07-14 | 奇酷互联网络科技(深圳)有限公司 | Text information acquisition methods, device and mobile terminal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101251892A (en) * | 2008-03-07 | 2008-08-27 | 北大方正集团有限公司 | Method and apparatus for cutting character |
CN101266654A (en) * | 2007-03-14 | 2008-09-17 | 中国科学院自动化研究所 | Image text location method and device based on connective component and support vector machine |
US7697760B2 (en) * | 2001-02-22 | 2010-04-13 | International Business Machines Corporation | Handwritten word recognition using nearest neighbor techniques that allow adaptive learning |
Non-Patent Citations (1)
Title |
---|
基于连通域单元和穿越算法的汉字切分;王琳琬等;《信息技术》;20040430;第28卷(第4期);第30-33页 * |
Also Published As
Publication number | Publication date |
---|---|
CN107122778B (en) | 2020-06-23 |
CN107122778A (en) | 2017-09-01 |
CN103839060A (en) | 2014-06-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||