CN101290656A - Connected region extraction method and apparatus for copy block analysis - Google Patents


Info

Publication number
CN101290656A
Authority
CN
China
Prior art keywords
mark
pixel
labeled
value
neighborhood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA200810067409XA
Other languages
Chinese (zh)
Other versions
CN101290656B (en)
Inventor
朱慧莹
邹月娴
吴天瑞
刘宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School
Priority to CN200810067409XA
Publication of CN101290656A
Application granted
Publication of CN101290656B
Legal status: Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for extracting connected regions in layout analysis. The method comprises the following steps: the neighborhood N(p) of a target pixel p(x, y) is defined as N(p)={(x-1, y), (x+1, y), (x, y-1), (x, y+1), (x-1, y-1), (x+1, y-1), (x-1, y+1), (x+1, y+1), (x-4, y), (x-3, y), (x-2, y), (x+2, y), (x+3, y), (x+4, y), (x, y+2), (x, y-2)}; for any pixel q(i, j) that has the same pixel value as the target pixel p(x, y), it is judged whether q(i, j) lies in the neighborhood N(p); if it does, the pixels q(i, j) and p(x, y) are extracted as belonging to the same connected region. The method and the device greatly reduce the number of connected regions extracted, increase the area of the connected regions, and lower the computation and processing complexity of connected-region merging in subsequent steps, which simplifies subsequent processing.

Description

A connected region extraction method and device for layout analysis
Technical field
The present invention relates to a connected region extraction method for layout analysis, and also to a connected region extraction device for layout analysis.
Background art
As shown in Fig. 1, the business card recognition workflow generally comprises, in order, input of the scanned business card image, preprocessing, layout analysis, character recognition, post-processing and entry, and output. Preprocessing comprises image binarization and separation of text from images; layout analysis comprises connected region extraction, connected region merging, and layout understanding. Connected region extraction is a very important step in layout analysis. It consists of extracting, from a binarized bitmap image made up of white and black pixels, the sets of mutually connected black pixels or white pixels according to the connectivity between pixels; these extracted pixel sets are the connected regions.
Connectivity between pixels is a key concept in determining connected regions. Its meaning is as follows: suppose the neighborhood N(p) of a target pixel p(x, y) contains k pixels; for any i-th pixel among these k pixels, if its pixel value is the same as that of the target pixel, the two pixels are said to be connected, that is, they belong to the same connected region.
Different definitions of pixel connectivity give different connected region extraction methods. In image processing, the four-connectivity and eight-connectivity methods are currently in wide use. As shown in Fig. 2, the basic principle of the four-connectivity method is as follows:
For a target pixel p(x, y), its neighborhood N4(p) is defined as:
N4(p)={(x-1,y),(x+1,y),(x,y-1),(x,y+1)}
For any pixel q(i, j) that has the same pixel value as the target pixel p(x, y), if q(i, j) lies in the neighborhood N4(p), then pixel p(x, y) and pixel q(i, j) are said to be connected, that is, they belong to the same connected region.
The basic principle of the eight-connectivity method is as follows:
For a target pixel p(x, y), its neighborhood N8(p) is defined as:
N8(p)=N4(p)+{(x-1,y-1),(x+1,y-1),(x-1,y+1),(x+1,y+1)}, where N4(p)={(x-1,y),(x+1,y),(x,y-1),(x,y+1)}
For any pixel q(i, j) that has the same pixel value as the target pixel p(x, y), if q(i, j) lies in the neighborhood N8(p), then pixel p(x, y) and pixel q(i, j) are said to be connected, that is, they belong to the same connected region.
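The two prior-art neighborhoods can be written down directly as sets of coordinates. The following is a minimal sketch, not code from the patent: the function names `n4`, `n8` and `is_connected`, and the choice of a dictionary from (x, y) to pixel value as the image, are assumptions made for illustration.

```python
def n4(x, y):
    """Four-neighborhood N4(p) of the pixel p(x, y)."""
    return {(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)}


def n8(x, y):
    """Eight-neighborhood N8(p): N4(p) plus the four diagonal positions."""
    return n4(x, y) | {(x - 1, y - 1), (x + 1, y - 1),
                       (x - 1, y + 1), (x + 1, y + 1)}


def is_connected(image, p, q, neighborhood):
    """True when p and q share a pixel value and q lies in neighborhood(p)."""
    return image[p] == image[q] and q in neighborhood(*p)


# Two black pixels (value 0) that touch only diagonally.
image = {(0, 0): 0, (1, 1): 0}
print(is_connected(image, (0, 0), (1, 1), n4))  # False: diagonal is outside N4
print(is_connected(image, (0, 0), (1, 1), n8))  # True: diagonal is inside N8
```

The diagonal pair shows the only difference between the two definitions: N8(p) adds the four corner positions to N4(p).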
The existing four-connectivity and eight-connectivity methods agree with human cognitive logic and give good results in most practical cases, so they are widely used. However, neither method takes the particular characteristics of a concrete application into account. In business card recognition, the connected regions these two methods extract are small in area and large in number, which makes the subsequent connected-region merging computationally complex and degrades the final layout analysis result. Experiments show that neither method yields good connected region extraction in business card recognition. Moreover, business card recognition is currently deployed mainly on embedded platforms such as mobile phones, where hardware resources are limited; a computationally complex algorithm makes processing intolerably slow and so reduces the usability of the system.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the above deficiencies by providing a connected region extraction method and device for layout analysis that reduce the complexity of subsequent connected-region merging.
To solve the above technical problem, the method of the present invention comprises the following steps. First step: for a target pixel p(x, y), define its neighborhood N(p) as:
N(p)={(x-1,y),(x+1,y),(x,y-1),(x,y+1),(x-1,y-1),(x+1,y-1),(x-1,y+1),(x+1,y+1),(x-4,y),(x-3,y),(x-2,y),(x+2,y),(x+3,y),(x+4,y),(x,y+2),(x,y-2)}
Second step: for any pixel q(i, j) that has the same pixel value as the target pixel p(x, y), judge whether q(i, j) lies in the neighborhood N(p); if so, extract pixel p(x, y) and pixel q(i, j) as belonging to the same connected region.
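The two steps above can be sketched as a table of offsets and a membership test. This is an illustrative sketch only: the names `CROSS_OFFSETS`, `cross_neighborhood` and `in_same_region`, and the dictionary-based image, are assumptions and do not come from the patent.

```python
# Offsets (dx, dy) of the 16 positions in N(p), in the order the text lists them.
CROSS_OFFSETS = [
    (-1, 0), (1, 0), (0, -1), (0, 1),                    # four-neighborhood
    (-1, -1), (1, -1), (-1, 1), (1, 1),                  # diagonals (completes N8)
    (-4, 0), (-3, 0), (-2, 0), (2, 0), (3, 0), (4, 0),   # horizontal extension
    (0, 2), (0, -2),                                     # vertical extension
]


def cross_neighborhood(x, y):
    """First step: the extended neighborhood N(p) of the pixel p(x, y)."""
    return {(x + dx, y + dy) for dx, dy in CROSS_OFFSETS}


def in_same_region(image, p, q):
    """Second step: q has the same pixel value as p and lies inside N(p)."""
    return image[p] == image[q] and q in cross_neighborhood(*p)


# Black pixels (value 0) on one row, 4 and 5 columns away from p(10, 10).
image = {(10, 10): 0, (14, 10): 0, (15, 10): 0}
print(in_same_region(image, (10, 10), (14, 10)))  # True: offset (4, 0) is in N(p)
print(in_same_region(image, (10, 10), (15, 10)))  # False: offset (5, 0) is not
```

Keeping the offsets in a single table makes it easy to compare N(p) with N8(p): the first eight entries are the eight-neighborhood, and the remaining eight are the horizontal and vertical extensions.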
Preferably, the method of the present invention specifically comprises the following steps:
(1) Label the pixels of the binarized bitmap image of the business card, scanning from left to right and from top to bottom.
The pixel labeling method is: if the target pixel p(x, y) is white, label it 0; if the target pixel is black, check whether the labels in its already-scanned neighborhood are all 0. If they are not all 0, the target pixel follows the first non-zero label encountered in the scanned neighborhood; if they are all 0, the target pixel is given a new non-zero label.
(2) Merge the pixel labels, from left to right and from top to bottom.
The label merging method is: the non-zero labels of the target pixel and its neighborhood are merged into one unified non-zero label, which is the minimum of the non-zero labels of the target pixel and its neighborhood.
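The labeling rule in step (1) decides the label of one target pixel from the labels of its already-scanned neighbors. A minimal sketch of that rule follows, assuming a hypothetical helper `assign_label` that receives the neighbor labels in scan order and a counter holding the next unused label; neither name appears in the patent.

```python
def assign_label(is_black, scanned_labels, next_label):
    """Return (label, next_label) for one target pixel.

    is_black:       True when the target pixel is black.
    scanned_labels: labels of the already-scanned neighbors, in scan order.
    next_label:     the next unused non-zero label.
    """
    if not is_black:
        return 0, next_label              # white pixels are labeled 0
    for label in scanned_labels:
        if label != 0:
            return label, next_label      # follow the first non-zero label
    return next_label, next_label + 1     # all neighbors are 0: new label


print(assign_label(True, [0, 3, 2], 5))   # (3, 5)
print(assign_label(True, [0, 0], 5))      # (5, 6)
```

Returning the updated counter alongside the label keeps the helper free of global state, which is one possible design and not something the text prescribes.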
Further preferably, the method of the present invention specifically comprises the following steps:
Step 1: in the binarized bitmap image, label the first 4 pixels of row 1, p(1,1), p(1,2), p(1,3), p(1,4). For each of these 4 pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(1,1): assign it a non-zero label;
For pixel p(1,2): if the label of p(1,1) is non-zero, follow the label of p(1,1); otherwise, assign p(1,2) a new non-zero label;
For pixel p(1,3): if at least one of the labels of p(1,1) and p(1,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(1,3) a new non-zero label;
For pixel p(1,4): if at least one of the labels of p(1,1), p(1,2) and p(1,3) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(1,4) a new non-zero label;
Step 2: process the remaining pixels p(1,j) of row 1, where 4<j<N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(1,j): if at least one of the labels of p(1,j-1), p(1,j-2), p(1,j-3), p(1,j-4) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(1,j) a new non-zero label;
Step 3: process the first four pixels of row 2, p(2,1), p(2,2), p(2,3), p(2,4). For each of these four pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(2,1): if at least one of the labels of p(1,1) and p(1,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,1) a new non-zero label;
For pixel p(2,2): if at least one of the labels of p(1,1), p(1,2), p(1,3), p(2,1) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,2) a new non-zero label. For any of p(1,1), p(1,2), p(1,3), p(2,1), if its label is non-zero and differs from the label of p(2,2), record the two labels as equivalent;
For pixel p(2,3): if at least one of the labels of p(1,2), p(1,3), p(1,4), p(2,1), p(2,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,3) a new non-zero label. For any of p(1,2), p(1,3), p(1,4), p(2,1), p(2,2), if its label is non-zero and differs from the label of p(2,3), record the two labels as equivalent;
For pixel p(2,4): if at least one of the labels of p(1,3), p(1,4), p(1,5), p(2,1), p(2,2), p(2,3) is non-zero, follow any non-zero label among them; otherwise, assign p(2,4) a new non-zero label. For any of p(1,3), p(1,4), p(1,5), p(2,1), p(2,2), p(2,3), if its label is non-zero and differs from the label of p(2,4), record the two labels as equivalent;
Step 4: process the remaining pixels p(2,j) of row 2, where 4<j<N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(2,j): if at least one of the labels of p(1,j-1), p(1,j), p(1,j+1), p(2,j-1), p(2,j-2), p(2,j-3), p(2,j-4) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,j) a new non-zero label. For any of p(1,j-1), p(1,j), p(1,j+1), p(2,j-1), p(2,j-2), p(2,j-3), p(2,j-4), if its label is non-zero and differs from the label of p(2,j), record the two labels as equivalent;
Step 5: process the pixels p(i,j) in columns 1 to 4 of rows 3 to M, where 2<i<M+1 and 0<j<5. For each pixel in these four columns, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(i,1): if at least one of the labels of p(i-2,1), p(i-1,1), p(i-1,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(i,1) a new non-zero label. For any of p(i-2,1), p(i-1,1), p(i-1,2), if its label is non-zero and differs from the label of p(i,1), record the two labels as equivalent;
For pixel p(i,2): if at least one of the labels of p(i-2,2), p(i-1,1), p(i-1,2), p(i-1,3), p(i,1) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(i,2) a new non-zero label. For any of p(i-2,2), p(i-1,1), p(i-1,2), p(i-1,3), p(i,1), if its label is non-zero and differs from the label of p(i,2), record the two labels as equivalent;
For pixel p(i,3): if at least one of the labels of p(i-2,3), p(i-1,2), p(i-1,3), p(i-1,4), p(i,1), p(i,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(i,3) a new non-zero label. For any of p(i-2,3), p(i-1,2), p(i-1,3), p(i-1,4), p(i,1), p(i,2), if its label is non-zero and differs from the label of p(i,3), record the two labels as equivalent;
For pixel p(i,4): if at least one of the labels of p(i-2,4), p(i-1,3), p(i-1,4), p(i-1,5), p(i,1), p(i,2), p(i,3) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(i,4) a new non-zero label. For any of p(i-2,4), p(i-1,3), p(i-1,4), p(i-1,5), p(i,1), p(i,2), p(i,3), if its label is non-zero and differs from the label of p(i,4), record the two labels as equivalent;
Step 6: process the pixels p(i,j) in columns 5 to N of rows 3 to M, where 2<i<M+1 and 4<j<N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(i,j): if at least one of the labels of p(i-2,j), p(i-1,j-1), p(i-1,j), p(i-1,j+1), p(i,j-1), p(i,j-2), p(i,j-3), p(i,j-4) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(i,j) a new non-zero label. For any of p(i-2,j), p(i-1,j-1), p(i-1,j), p(i-1,j+1), p(i,j-1), p(i,j-2), p(i,j-3), p(i,j-4), if its label is non-zero and differs from the label of p(i,j), record the two labels as equivalent;
Step 7: merge all equivalent labels into the minimum label among them.
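Step 7 can be realised in several ways; the text does not say which data structure is used. One common choice, shown here purely as an assumption, is a union-find (disjoint-set) structure whose representative is always the smallest label of its class. The function name `merge_equivalent_labels` and the list-of-pairs input are hypothetical.

```python
def merge_equivalent_labels(pairs):
    """Map every label that occurs in `pairs` to the smallest label of its class."""
    parent = {}

    def find(label):
        parent.setdefault(label, label)
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path halving
            label = parent[label]
        return label

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller label as the representative of the merged class.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {label: find(label) for label in parent}


# Labels 1, 3 and 4 are mutually equivalent; so are 2 and 5.
print(merge_equivalent_labels([(3, 1), (4, 3), (5, 2)]))
```

Because each merge attaches the larger root to the smaller one, the root of a class is always its minimum, which matches the requirement that equivalent labels collapse to the minimum label.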
After Step 7, the method further comprises the step of counting the number of all connected regions finally obtained.
The connected region extraction method is used for layout analysis in business card recognition.
To solve the above technical problem, the device of the present invention specifically comprises the following units. Neighborhood definition unit: used to define the neighborhood N(p) of a target pixel p(x, y) as:
N(p)={(x-1,y),(x+1,y),(x,y-1),(x,y+1),(x-1,y-1),(x+1,y-1),(x-1,y+1),(x+1,y+1),(x-4,y),(x-3,y),(x-2,y),(x+2,y),(x+3,y),(x+4,y),(x,y+2),(x,y-2)}
Extraction unit: used to judge, for any pixel q(i, j) that has the same pixel value as the target pixel p(x, y), whether q(i, j) lies in the neighborhood N(p), and, if so, to extract pixel p(x, y) and pixel q(i, j) as belonging to the same connected region.
Preferably, the device specifically comprises the following units:
Labeling unit: used to label the pixels of the binarized bitmap image of the business card from left to right and from top to bottom. The pixel labeling method is: if the target pixel p(x, y) is white, label it 0; if the target pixel is black, check whether the labels in its already-scanned neighborhood are all 0. If they are not all 0, the target pixel follows the first non-zero label encountered in the scanned neighborhood; if they are all 0, the target pixel is given a new non-zero label.
Merging unit: used to merge the pixel labels from left to right and from top to bottom. The label merging method is: the non-zero labels of the target pixel and its neighborhood are merged into one unified non-zero label, which is the minimum of the non-zero labels of the target pixel and its neighborhood.
Compared with the prior art, the beneficial effects of the present invention are as follows. The connected region extraction method and device for layout analysis overcome the problems of the widely used four-connectivity and eight-connectivity extraction methods, namely high computational complexity and poor extraction performance. By simply enlarging the search neighborhood of each connected region, the invention greatly reduces the number of connected regions extracted, increases the area of each individual connected region, and improves the extraction result. This reduces the computation and processing complexity of connected-region merging in subsequent processing, lowers the workload of the whole recognition process, and meets the needs of practical applications. The invention balances the size of the search range against the quality of connected region extraction: the extraction result improves greatly while the search range grows only slightly. The method and device are easy to implement, add little computation, and are suitable for embedded platforms such as mobile phones.
Description of drawings
Fig. 1 is a schematic diagram of the existing business card recognition workflow;
Fig. 2 is a schematic diagram of the principle of the existing four-connectivity method;
Fig. 3 is a schematic diagram of the principle of the existing eight-connectivity method;
Fig. 4 is a schematic diagram of the principle of the connected region extraction method for layout analysis of the present invention;
Fig. 5 is a schematic diagram of the already-scanned neighborhood of the target pixel in the specific embodiment of the present invention;
Fig. 6 is a schematic diagram of the processing procedure of the specific embodiment of the present invention;
Fig. 7 is a schematic diagram of the result of processing the characters "Beijing" (北京) with the method of the present invention;
Fig. 8 is a schematic diagram of the result of processing the characters "Beijing" (北京) with the existing eight-connectivity method.
Embodiment
A connected region extraction method for layout analysis comprises the following steps:
First step: for a target pixel p(x, y), define its neighborhood N(p) as:
N(p)={(x-1,y),(x+1,y),(x,y-1),(x,y+1),(x-1,y-1),(x+1,y-1),(x-1,y+1),(x+1,y+1),(x-4,y),(x-3,y),(x-2,y),(x+2,y),(x+3,y),(x+4,y),(x,y+2),(x,y-2)}
Second step: for any pixel q(i, j) that has the same pixel value as the target pixel p(x, y), judge whether q(i, j) lies in the neighborhood N(p); if so, extract pixel p(x, y) and pixel q(i, j) as belonging to the same connected region.
The improvement of the present invention over existing connected region extraction methods is a redefinition of the neighborhood N(p). On the basis of eight-connectivity, one connected pixel is added above and one below, to solve the problem of characters with a top-bottom structure being split into two or more connected regions, and three connected pixels are added on the left and three on the right. We call this the "cross connected domain". In other words, when searching for connected regions, we enlarge the search neighborhood beyond the eight-connected neighborhood, and the search range is exactly the "cross connected domain" defined by the present invention.
The neighborhood N(p) of the target pixel p defined by existing connected region extraction methods is based on human cognitive logic, but the connected regions extracted are too small in area and too many in number, which complicates subsequent processing. To solve this problem we devised the "cross connected domain", whose purpose is to group characters belonging to the same line into the same connected region as far as possible. On the one hand the method reduces the number of connected regions extracted and so lowers the complexity of subsequent processing; on the other hand it avoids a neighborhood so large that searching becomes difficult. The reasons why the present invention does not define N(p) with more pixels are: 1) Adding more pixels makes the search range too large and sharply increases the computation of the algorithm. 2) Adding more pixels above and below can cause two regions that do not belong to the same connected region to be merged into one, which makes subsequent processing difficult. 3) After extensive experiments the inventors found that adding three pixels on each of the left and right sides is enough to group characters of the same line into the same connected region; adding more pixels can cause two regions that do not belong together to be merged into one, while reducing the number of search points sharply increases the number of connected regions. This scheme is therefore optimal, and experimental results show that with this method the connected region extraction time can be kept at the millisecond level.
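The effect of the enlarged neighborhood on the region count can be checked on a toy bitmap. The sketch below is an illustration under stated assumptions, not the patent's procedure: it uses a breadth-first flood fill instead of the scan-and-merge labeling, the bitmap is invented, and the names `count_regions`, `N8_OFFSETS` and `CROSS_OFFSETS` are hypothetical.

```python
from collections import deque

N8_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1),
              (-1, -1), (1, -1), (-1, 1), (1, 1)]
CROSS_OFFSETS = N8_OFFSETS + [(-4, 0), (-3, 0), (-2, 0), (2, 0), (3, 0), (4, 0),
                              (0, 2), (0, -2)]


def count_regions(rows, offsets):
    """Count groups of '#' pixels linked through the given neighborhood offsets."""
    black = {(x, y) for y, row in enumerate(rows)
             for x, ch in enumerate(row) if ch == "#"}
    seen, regions = set(), 0
    for start in black:
        if start in seen:
            continue
        regions += 1                      # unseen black pixel starts a new region
        seen.add(start)
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            for dx, dy in offsets:
                q = (x + dx, y + dy)
                if q in black and q not in seen:
                    seen.add(q)
                    queue.append(q)
    return regions


# Two strokes on one text line separated by a 3-pixel gap,
# plus a dot one blank row above the left stroke.
rows = [
    "#........",
    ".........",
    "##...##..",
    "##...##..",
]
print(count_regions(rows, N8_OFFSETS))     # 3
print(count_regions(rows, CROSS_OFFSETS))  # 1
```

In this toy case the offset (0, 2) joins the dot to the stroke below it, and the offset (4, 0) bridges the three blank columns, so three eight-connected regions become one.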
The above connected region extraction method for layout analysis specifically comprises the following steps:
(1) Label the pixels of the binarized bitmap image of the business card, scanning from left to right and from top to bottom. The pixel labeling method is: if the target pixel p(x, y) is white, label it 0; if the target pixel is black, check whether the labels in its already-scanned neighborhood are all 0. If they are not all 0, the target pixel follows the first non-zero label encountered in the scanned neighborhood; if they are all 0, the target pixel is given a new non-zero label.
As can be seen in Fig. 5, the already-scanned neighborhood of the target pixel p(x, y) may comprise a(x, y-2), b(x-1, y-1), c(x, y-1), d(x+1, y-1), e(x-4, y), f(x-3, y), g(x-2, y), h(x-1, y). For a non-boundary target pixel p(x, y), where 4<x and 2<y, the already-scanned neighborhood necessarily comprises a, b, c, d, e, f, g and h, and they are scanned in the order a, b, c, d, e, f, g, h. If a to h are all white (that is, all labels are 0), the target pixel p(x, y) is given a new non-zero label, and labeling moves on to the next target pixel p(x+1, y). If a to h are not all white (that is, the labels are not all 0), the target pixel follows the first non-zero label encountered in the scanned neighborhood, and labeling moves on to the next target pixel p(x+1, y). For example, if a=0, b is checked next; if b=3, the target pixel p(x, y) follows the label of b, that is, p(x, y) is also labeled 3. For a boundary target pixel p(x, y), where x<5 or y<3, the already-scanned neighborhood does not comprise all of a, b, c, d, e, f, g and h, but the scan order is unchanged. If the scanned neighborhood pixels are all white (that is, all labels are 0), the target pixel p(x, y) is given a new non-zero label, and labeling moves on to the next target pixel p(x+1, y). If the scanned neighborhood pixels are not all white (that is, the labels are not all 0), the target pixel follows the first non-zero label encountered in the scanned neighborhood, and labeling moves on to the next target pixel p(x+1, y).
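The boundary cases described above amount to dropping those of a to h that fall outside the image while keeping the scan order. A minimal sketch follows, using the text's 1-based (x, y) coordinates; the helper name `scanned_neighbors` and its `width` parameter are assumptions, as is the clipping of d at the right-hand edge, which the text does not spell out.

```python
# Offsets (dx, dy) of the already-scanned neighbors a..h, in scan order.
SCANNED_OFFSETS = [
    ("a", 0, -2), ("b", -1, -1), ("c", 0, -1), ("d", 1, -1),
    ("e", -4, 0), ("f", -3, 0), ("g", -2, 0), ("h", -1, 0),
]


def scanned_neighbors(x, y, width):
    """Already-scanned neighbors of p(x, y) that lie inside the image.

    Coordinates are 1-based: x is the column (1..width), y is the row (1..).
    """
    result = []
    for name, dx, dy in SCANNED_OFFSETS:
        nx, ny = x + dx, y + dy
        if 1 <= nx <= width and ny >= 1:   # skip positions outside the image
            result.append((name, nx, ny))
    return result


print([name for name, _, _ in scanned_neighbors(5, 3, 10)])  # all of a..h
print([name for name, _, _ in scanned_neighbors(2, 2, 10)])  # b, c, d, h only
```

For p(2, 2) the surviving neighbors are b, c, d and h, which correspond to the four pixels named for the second pixel of the second row in the step-by-step description.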
(2) Merge the pixel labels, from left to right and from top to bottom.
The label merging method is: the non-zero labels of the target pixel p(x, y) and its neighborhood are merged into one unified non-zero label, which is the minimum of the non-zero labels of the target pixel and its neighborhood.
The connected region extraction method for layout analysis proposed by the present invention is described in more detail below by way of a specific embodiment. This embodiment runs on an ordinary personal computer (PC) configured as follows: CPU: Intel P4 1.7 GHz; memory: 512 MB DDR333; operating system: Windows XP Professional Edition; running environment: Microsoft Visual Studio 2005. The input is the binarized bitmap image of a business card (an image made up of black and white pixels; in a computer, black pixels are generally represented by 0 and white pixels by 255), and this embodiment extracts the connected regions whose pixel value is 0. The processing steps are as follows (refer to Fig. 6 for ease of understanding; in the steps below, "following the label of a pixel" means assigning the target pixel the same label as that pixel):
Step 1: in the binarized bitmap image, label the first 4 pixels of row 1, namely p(1,1), p(1,2), p(1,3), p(1,4). For each of these 4 pixels, if its pixel value is not 0 (white pixel), set its label to 0; if its pixel value is 0 (black pixel), process it as follows:
For pixel p(1,1): assign it a non-zero label;
For pixel p(1,2): if the label of p(1,1) is non-zero, follow the label of p(1,1); otherwise, assign p(1,2) a new non-zero label;
For pixel p(1,3): if at least one of the labels of p(1,1) and p(1,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(1,3) a new non-zero label;
For pixel p(1,4): if at least one of the labels of p(1,1), p(1,2) and p(1,3) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(1,4) a new non-zero label.
Step 2: process the remaining pixels p(1,j) of row 1, where 4<j<N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For any pixel p(1,j) (4<j<N+1): if at least one of the labels of p(1,j-1), p(1,j-2), p(1,j-3), p(1,j-4) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(1,j) a new non-zero label.
Step 3: process the first four pixels of row 2, p(2,1), p(2,2), p(2,3), p(2,4). For each of these four pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(2,1): if at least one of the labels of p(1,1) and p(1,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,1) a new non-zero label;
For pixel p(2,2): if at least one of the labels of p(1,1), p(1,2), p(1,3), p(2,1) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,2) a new non-zero label. For any of p(1,1), p(1,2), p(1,3), p(2,1), if its label is non-zero and differs from the label of p(2,2), record the two labels as equivalent; here, "equivalent" means that the two labels are to be merged into one label in a later step;
For pixel p(2,3): if at least one of the labels of p(1,2), p(1,3), p(1,4), p(2,1), p(2,2) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,3) a new non-zero label. For any of p(1,2), p(1,3), p(1,4), p(2,1), p(2,2), if its label is non-zero and differs from the label of p(2,3), record the two labels as equivalent;
For pixel p(2,4): if at least one of the labels of p(1,3), p(1,4), p(1,5), p(2,1), p(2,2), p(2,3) is non-zero, follow any non-zero label among them; otherwise, assign p(2,4) a new non-zero label. For any of p(1,3), p(1,4), p(1,5), p(2,1), p(2,2), p(2,3), if its label is non-zero and differs from the label of p(2,4), record the two labels as equivalent.
Step 4: process the remaining pixels p(2,j) of row 2, where 4<j<N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, process it as follows:
For pixel p(2,j): if at least one of the labels of p(1,j-1), p(1,j), p(1,j+1), p(2,j-1), p(2,j-2), p(2,j-3), p(2,j-4) is non-zero, follow the first non-zero label scanned among them; otherwise, assign p(2,j) a new non-zero label. For any of p(1,j-1), p(1,j), p(1,j+1), p(2,j-1), p(2,j-2), p(2,j-3), p(2,j-4), if its label is non-zero and differs from the label of p(2,j), record the two labels as equivalent;
Step 5: handle 1~4 capable row pixel p of 3~M (i, j), 2<i<M+1 wherein, 0<j<5, this four row pixel if its pixel value is not 0, is put it and is labeled as 0 for all, if its pixel value is 0, then carries out following processing:
To pixel p (i, 1): if having one in the mark of p (i-2,1), p (i-1,1), p (i-1,2) at least for non-0, non-0 the mark of following that first scans among them, otherwise, distribute new non-0 mark for p (i, 1); To among p (i-2,1), p (i-1,1), the p (i-2,2) any one, if it is labeled as non-0 and do not wait with the mark of p (i, 1), it is of equal value then writing down these two marks;
To pixel p (i, 2): if p (i-2,2), p (i-1,1), p (i-1,2), has one in the mark of p (i-1,3), p (i, 1) at least for non-0, non-0 the mark of following that first scans among them, otherwise, distribute new non-0 mark for p (i, 2); To among p (i-2,2), p (i-1,1), p (i-1,2), p (i-1,3), the p (i, 1) any one, if it is labeled as non-0 and do not wait with the mark of p (i, 2), it is of equal value then writing down these two marks;
To pixel p (i, 3): if p (i-2,3), p (i-1,2), p (i-1,3), have at least one to be non-0 in the mark of p (i-1,4), p (i, 1), p (i, 2), non-0 the mark of following that first scans among them, otherwise, distribute new non-0 mark for p (i, 3); To among p (i-2,3), p (i-1,2), p (i-1,3), p (i-1,4), p (i, 1), the p (i, 2) any one, if it is labeled as non-0 and do not wait with the mark of p (i, 3), it is of equal value then writing down these two marks;
To pixel p (i, 4): if p (i-2,4), p (i-1,3), p (i-1,4), p (i-1,5), p (i, 1), p (i, 2), p (i, 3) have one in the mark at least for non-0, non-0 the mark of following that first scans among them, otherwise, distribute new non-0 mark for p (i, 4); To among p (i-2,4), p (i-1,3), p (i-1,4), p (i-1,5), p (i, 1), p (i, 2), the p (i, 3) any one, if it is labeled as non-0 and do not wait with the mark of p (i, 4), it is of equal value then writing down these two marks;
Step 6: Process columns 5 to N of rows 3 to M, i.e. the pixels p(i, j) with 2 < i < M+1 and 4 < j < N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, proceed as follows:
For pixel p(i, j): if at least one of the labels of p(i-2, j), p(i-1, j-1), p(i-1, j), p(i-1, j+1), p(i, j-1), p(i, j-2), p(i, j-3), p(i, j-4) is non-zero, p(i, j) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(i, j). For each of p(i-2, j), p(i-1, j-1), p(i-1, j), p(i-1, j+1), p(i, j-1), p(i, j-2), p(i, j-3), p(i, j-4) whose label is non-zero and differs from the label of p(i, j), the two labels are recorded as equivalent.
Step 7: Merge all equivalent labels into the smallest label among them. That is, using all the equivalent label pairs obtained, merge all labels of the binarized dot-matrix image: the two labels of each equivalent pair are merged into a single unified non-zero label, which is the smaller of the two labels in the pair.
Step 8: Count the number of distinct labels finally obtained, which is the number of connected regions extracted.
The above connected region extraction method for layout analysis can be used in any layout analysis. It is preferably used for the layout analysis of business cards, and is particularly suited to the layout analysis of Chinese business cards. The method applies directly to horizontally typeset business cards (characters arranged from left to right); for vertically typeset business cards (characters arranged from top to bottom), the card must first be rotated by 90 degrees before the method of the present invention is applied. In business card recognition, it is desirable that the correctly extracted connected regions be as large as possible and as few as possible, since this reduces the computational complexity of subsequent processing.
The advantages of the present invention can be further understood from the following comparative example. As shown in Figs. 7 and 8, for the two characters of "北京" (Beijing), the method of the present invention extracts the character "北" as one complete connected region and the character "京" as one complete connected region. With the existing eight-connectivity method, "北" is split into two connected regions and "京" into five. The method of the present invention therefore significantly reduces the number of connected regions extracted and increases their area, which reduces the computation and complexity of merging connected regions in subsequent processing and simplifies that processing.
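The labeling, merging and counting of Steps 1 to 8 can be sketched in a few lines. This is a minimal illustration, not the patented implementation. It makes several assumptions: 0-based indices, pixel value 0 meaning black (the value the steps process), scanned neighbors examined in the order listed in Step 6, and a small union-find table standing in for the record of equivalent label pairs. The names `label_regions`, `EXTENDED_SCANNED` and `EIGHT_SCANNED` are hypothetical. Clipping offsets at the image border reproduces the special cases of Steps 1 to 5.

```python
# Already-scanned neighbours as (d_row, d_col), in the order listed in Step 6.
EXTENDED_SCANNED = [(-2, 0), (-1, -1), (-1, 0), (-1, 1),
                    (0, -1), (0, -2), (0, -3), (0, -4)]
# Already-scanned neighbours of ordinary eight-connectivity, for comparison.
EIGHT_SCANNED = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]


def label_regions(image, offsets):
    """Two-pass labelling of a binary image (list of rows, 0 = black).

    Returns (merged_labels, region_count). Assumption: union-find is used
    here to store the equivalent label pairs; the text only says "record".
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = {}  # label -> representative; roots are always the minimum

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    for i in range(rows):
        for j in range(cols):
            if image[i][j] != 0:
                continue  # non-black pixel: label stays 0
            # Non-zero labels of the scanned neighbours that lie inside the image.
            seen = [labels[i + di][j + dj] for di, dj in offsets
                    if 0 <= i + di < rows and 0 <= j + dj < cols
                    and labels[i + di][j + dj] != 0]
            if not seen:
                parent[next_label] = next_label  # new non-zero label
                labels[i][j] = next_label
                next_label += 1
                continue
            labels[i][j] = seen[0]  # follow the first non-zero label found
            for other in seen[1:]:  # record equivalences (Steps 3 to 6)
                ra, rb = find(seen[0]), find(other)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)  # keep the minimum (Step 7)

    merged = [[find(l) if l else 0 for l in row] for row in labels]
    return merged, len({find(l) for l in parent})  # Step 8: count regions


# Toy image: 0 = black, 1 = white.
image = [
    [0, 1, 1, 0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1],
]
print(label_regions(image, EXTENDED_SCANNED)[1])  # 2
print(label_regions(image, EIGHT_SCANNED)[1])     # 4
```

In this toy image the extended neighborhood joins the black pixels separated by a horizontal gap of two white pixels and by a vertical gap of one white pixel, while eight-connectivity keeps all four black pixels separate.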
The above embodiment was applied to eight randomly selected business card images; the results are shown in Table 1:
Table 1: Comparison of results on eight randomly selected business card images
Figure A20081006740900161
Figure A20081006740900171
As the table above shows, the method of the present invention significantly reduces the number of connected regions extracted and increases their area, which reduces the computation and complexity of merging connected regions in subsequent processing and simplifies that processing. The present invention balances the search range against the quality of connected region extraction: it greatly improves extraction quality with only a small increase in the search range. The method is easy to implement, adds little computation, and is suitable for embedded platforms such as mobile phones.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be deemed limited to these descriptions. Persons of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the inventive concept, all of which shall be regarded as falling within the protection scope of the present invention.
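To make the size of the search range concrete, the following sketch counts how many neighbors are examined per pixel under the definition of N(p) and under ordinary eight-connectivity. It assumes that x is the column index and y the row index, increasing downward, so that a left-to-right, top-to-bottom scan has already visited offsets with dy < 0, or with dy = 0 and dx < 0. The names `N_OFFSETS`, `EIGHT_OFFSETS` and `scanned_half` are hypothetical.

```python
# Full neighbourhood N(p) as (dx, dy) offsets, copied from the definition of N(p).
N_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1),
             (-1, -1), (1, -1), (-1, 1), (1, 1),
             (-4, 0), (-3, 0), (-2, 0), (2, 0), (3, 0), (4, 0),
             (0, 2), (0, -2)]

# Ordinary eight-connectivity, for comparison.
EIGHT_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if (dx, dy) != (0, 0)]


def scanned_half(offsets):
    """Offsets already visited in a left-to-right, top-to-bottom raster scan."""
    return [(dx, dy) for dx, dy in offsets if dy < 0 or (dy == 0 and dx < 0)]


print(len(N_OFFSETS), len(scanned_half(N_OFFSETS)))          # 16 8
print(len(EIGHT_OFFSETS), len(scanned_half(EIGHT_OFFSETS)))  # 8 4
```

Under these assumptions the labeling pass inspects 8 previously scanned neighbors per black pixel instead of 4. Because N(p) is symmetric about p, checking only the scanned half is enough to find every pair of pixels that lie in each other's neighborhood.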

Claims (7)

1. A connected region extraction method for layout analysis, characterized by comprising the following steps:
Step one: for a target pixel p(x, y), define its neighborhood N(p) as:
N(p)={(x-1,y),(x+1,y),(x,y-1),(x,y+1),(x-1,y-1),(x+1,y-1),(x-1,y+1),(x+1,y+1),(x-4,y),(x-3,y),(x-2,y),(x+2,y),(x+3,y),(x+4,y),(x,y+2),(x,y-2)}
Step two: for any pixel q(i, j) having the same pixel value as the target pixel p(x, y), determine whether q(i, j) is in the neighborhood N(p); if so, extract pixel p(x, y) and pixel q(i, j) as the same connected region.
2. The connected region extraction method for layout analysis according to claim 1, characterized by specifically comprising the following steps:
(1) Label the pixels of the binarized dot-matrix image of the business card from left to right and from top to bottom;
The pixel labeling method is: if the target pixel p(x, y) is white, the target pixel is labeled 0; if the target pixel is black, determine whether the labels in the already-scanned neighborhood of the target pixel are all 0; if they are not all 0, the target pixel takes the first-scanned non-zero label in the scanned neighborhood; if they are all 0, a new non-zero label is assigned to the target pixel;
(2) Merge the pixel labels from left to right and from top to bottom;
The pixel label merging method is: the non-zero labels of the target pixel and its neighborhood are merged into a single unified non-zero label, which is the minimum of the non-zero labels of the target pixel and its neighborhood.
3. The connected region extraction method for layout analysis according to claim 2, characterized by specifically comprising the following steps:
Step 1: For the binarized dot-matrix image, label the first 4 pixels of row 1: p(1, 1), p(1, 2), p(1, 3), p(1, 4). For each of these 4 pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, proceed as follows:
For pixel p(1, 1): assign it a non-zero label;
For pixel p(1, 2): if the label of p(1, 1) is non-zero, p(1, 2) takes the label of p(1, 1); otherwise, a new non-zero label is assigned to p(1, 2);
For pixel p(1, 3): if at least one of the labels of p(1, 1) and p(1, 2) is non-zero, p(1, 3) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(1, 3);
For pixel p(1, 4): if at least one of the labels of p(1, 1), p(1, 2) and p(1, 3) is non-zero, p(1, 4) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(1, 4);
Step 2: Process the remaining pixels p(1, j) of row 1, where 4 < j < N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, proceed as follows:
For pixel p(1, j): if at least one of the labels of p(1, j-1), p(1, j-2), p(1, j-3), p(1, j-4) is non-zero, p(1, j) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(1, j);
Step 3: Process the first four pixels of row 2: p(2, 1), p(2, 2), p(2, 3), p(2, 4). For each of these four pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, proceed as follows:
For pixel p(2, 1): if at least one of the labels of p(1, 1) and p(1, 2) is non-zero, p(2, 1) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(2, 1);
For pixel p(2, 2): if at least one of the labels of p(1, 1), p(1, 2), p(1, 3), p(2, 1) is non-zero, p(2, 2) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(2, 2). For each of p(1, 1), p(1, 2), p(1, 3), p(2, 1) whose label is non-zero and differs from the label of p(2, 2), the two labels are recorded as equivalent;
For pixel p(2, 3): if at least one of the labels of p(1, 2), p(1, 3), p(1, 4), p(2, 1), p(2, 2) is non-zero, p(2, 3) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(2, 3). For each of p(1, 2), p(1, 3), p(1, 4), p(2, 1), p(2, 2) whose label is non-zero and differs from the label of p(2, 3), the two labels are recorded as equivalent;
For pixel p(2, 4): if at least one of the labels of p(1, 3), p(1, 4), p(1, 5), p(2, 1), p(2, 2), p(2, 3) is non-zero, p(2, 4) takes any one of the non-zero labels among them; otherwise, a new non-zero label is assigned to p(2, 4). For each of p(1, 3), p(1, 4), p(1, 5), p(2, 1), p(2, 2), p(2, 3) whose label is non-zero and differs from the label of p(2, 4), the two labels are recorded as equivalent;
Step 4: Process the remaining pixels p(2, j) of row 2, where 4 < j < N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, proceed as follows:
For pixel p(2, j): if at least one of the labels of p(1, j-1), p(1, j), p(1, j+1), p(2, j-1), p(2, j-2), p(2, j-3), p(2, j-4) is non-zero, p(2, j) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(2, j). For each of p(1, j-1), p(1, j), p(1, j+1), p(2, j-1), p(2, j-2), p(2, j-3), p(2, j-4) whose label is non-zero and differs from the label of p(2, j), the two labels are recorded as equivalent;
Step 5: Process columns 1 to 4 of rows 3 to M, i.e. the pixels p(i, j) with 2 < i < M+1 and 0 < j < 5. For every pixel in these four columns, if its pixel value is not 0, set its label to 0; if its pixel value is 0, proceed as follows:
For pixel p(i, 1): if at least one of the labels of p(i-2, 1), p(i-1, 1), p(i-1, 2) is non-zero, p(i, 1) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(i, 1). For each of p(i-2, 1), p(i-1, 1), p(i-1, 2) whose label is non-zero and differs from the label of p(i, 1), the two labels are recorded as equivalent;
For pixel p(i, 2): if at least one of the labels of p(i-2, 2), p(i-1, 1), p(i-1, 2), p(i-1, 3), p(i, 1) is non-zero, p(i, 2) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(i, 2). For each of p(i-2, 2), p(i-1, 1), p(i-1, 2), p(i-1, 3), p(i, 1) whose label is non-zero and differs from the label of p(i, 2), the two labels are recorded as equivalent;
For pixel p(i, 3): if at least one of the labels of p(i-2, 3), p(i-1, 2), p(i-1, 3), p(i-1, 4), p(i, 1), p(i, 2) is non-zero, p(i, 3) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(i, 3). For each of p(i-2, 3), p(i-1, 2), p(i-1, 3), p(i-1, 4), p(i, 1), p(i, 2) whose label is non-zero and differs from the label of p(i, 3), the two labels are recorded as equivalent;
For pixel p(i, 4): if at least one of the labels of p(i-2, 4), p(i-1, 3), p(i-1, 4), p(i-1, 5), p(i, 1), p(i, 2), p(i, 3) is non-zero, p(i, 4) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(i, 4). For each of p(i-2, 4), p(i-1, 3), p(i-1, 4), p(i-1, 5), p(i, 1), p(i, 2), p(i, 3) whose label is non-zero and differs from the label of p(i, 4), the two labels are recorded as equivalent;
Step 6: Process columns 5 to N of rows 3 to M, i.e. the pixels p(i, j) with 2 < i < M+1 and 4 < j < N+1. For each of these pixels, if its pixel value is not 0, set its label to 0; if its pixel value is 0, proceed as follows:
For pixel p(i, j): if at least one of the labels of p(i-2, j), p(i-1, j-1), p(i-1, j), p(i-1, j+1), p(i, j-1), p(i, j-2), p(i, j-3), p(i, j-4) is non-zero, p(i, j) takes the first-scanned non-zero label among them; otherwise, a new non-zero label is assigned to p(i, j). For each of p(i-2, j), p(i-1, j-1), p(i-1, j), p(i-1, j+1), p(i, j-1), p(i, j-2), p(i, j-3), p(i, j-4) whose label is non-zero and differs from the label of p(i, j), the two labels are recorded as equivalent;
Step 7: Merge all equivalent labels into the smallest label among them.
4. The connected region extraction method for layout analysis according to claim 3, characterized in that the method further comprises, after Step 7, the step of counting the number of all connected regions finally obtained.
5. The connected region extraction method for layout analysis according to any one of claims 1 to 4, characterized in that the connected region extraction method is used for layout analysis in business card recognition.
6. A connected region extraction apparatus for layout analysis, characterized by comprising the following units: a neighborhood definition unit, configured to define the neighborhood N(p) of a target pixel p(x, y) as:
N(p)={(x-1,y),(x+1,y),(x,y-1),(x,y+1),(x-1,y-1),(x+1,y-1),(x-1,y+1),(x+1,y+1),(x-4,y),(x-3,y),(x-2,y),(x+2,y),(x+3,y),(x+4,y),(x,y+2),(x,y-2)}
An extraction unit, configured to determine, for any pixel q(i, j) having the same pixel value as the target pixel p(x, y), whether q(i, j) is in the neighborhood N(p), and, if so, to extract pixel p(x, y) and pixel q(i, j) as the same connected region.
7. The connected region extraction apparatus for layout analysis according to claim 6, characterized by specifically comprising the following units:
A labeling unit, configured to label the pixels of the binarized dot-matrix image of the business card from left to right and from top to bottom. The pixel labeling method is: if the target pixel p(x, y) is white, the target pixel is labeled 0; if the target pixel is black, determine whether the labels in the already-scanned neighborhood of the target pixel are all 0; if they are not all 0, the target pixel takes the first-scanned non-zero label in the scanned neighborhood; if they are all 0, a new non-zero label is assigned to the target pixel;
A merging unit, configured to merge the pixel labels from left to right and from top to bottom. The pixel label merging method is: the non-zero labels of the target pixel and its neighborhood are merged into a single unified non-zero label, which is the minimum of the non-zero labels of the target pixel and its neighborhood.
CN200810067409XA 2008-05-23 2008-05-23 Connected region extraction method and apparatus for copy block analysis Expired - Fee Related CN101290656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810067409XA CN101290656B (en) 2008-05-23 2008-05-23 Connected region extraction method and apparatus for copy block analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810067409XA CN101290656B (en) 2008-05-23 2008-05-23 Connected region extraction method and apparatus for copy block analysis

Publications (2)

Publication Number Publication Date
CN101290656A true CN101290656A (en) 2008-10-22
CN101290656B CN101290656B (en) 2011-04-27

Family

ID=40034907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810067409XA Expired - Fee Related CN101290656B (en) 2008-05-23 2008-05-23 Connected region extraction method and apparatus for copy block analysis

Country Status (1)

Country Link
CN (1) CN101290656B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065314A (en) * 2012-12-28 2013-04-24 中国电子科技集团公司第五十四研究所 Image communicated domain rapid marking method based on linear description
CN104318543A (en) * 2014-01-27 2015-01-28 郑州大学 Board metering method and device based on image processing method
CN106127786A (en) * 2016-07-04 2016-11-16 大连理工大学 The Fast Calibration of a kind of complicated connected region feature and extracting method
CN110517282A (en) * 2019-08-07 2019-11-29 哈尔滨工业大学 A kind of bianry image connected component labeling method
CN113793316A (en) * 2021-09-13 2021-12-14 合肥合滨智能机器人有限公司 Ultrasonic scanning area extraction method, device, equipment and storage medium
CN116489288A (en) * 2023-06-20 2023-07-25 北京医百科技有限公司 Method and device for solving maximum connected domain in image

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065314A (en) * 2012-12-28 2013-04-24 中国电子科技集团公司第五十四研究所 Image communicated domain rapid marking method based on linear description
CN103065314B (en) * 2012-12-28 2015-07-15 中国电子科技集团公司第五十四研究所 Image communicated domain rapid marking method based on linear description
CN104318543A (en) * 2014-01-27 2015-01-28 郑州大学 Board metering method and device based on image processing method
CN106127786A (en) * 2016-07-04 2016-11-16 大连理工大学 The Fast Calibration of a kind of complicated connected region feature and extracting method
CN106127786B (en) * 2016-07-04 2018-12-18 大连理工大学 A kind of Fast Calibration and extracting method of complexity connected region feature
CN110517282A (en) * 2019-08-07 2019-11-29 哈尔滨工业大学 A kind of bianry image connected component labeling method
CN113793316A (en) * 2021-09-13 2021-12-14 合肥合滨智能机器人有限公司 Ultrasonic scanning area extraction method, device, equipment and storage medium
CN113793316B (en) * 2021-09-13 2023-09-12 合肥合滨智能机器人有限公司 Ultrasonic scanning area extraction method, device, equipment and storage medium
CN116489288A (en) * 2023-06-20 2023-07-25 北京医百科技有限公司 Method and device for solving maximum connected domain in image
CN116489288B (en) * 2023-06-20 2023-09-05 北京医百科技有限公司 Method and device for solving maximum connected domain in image

Also Published As

Publication number Publication date
CN101290656B (en) 2011-04-27

Similar Documents

Publication Publication Date Title
CN101290656B (en) Connected region extraction method and apparatus for copy block analysis
Kong et al. A federated learning-based license plate recognition scheme for 5G-enabled internet of vehicles
CN100550038C (en) Image content recognizing method and recognition system
CN104809481A (en) Natural scene text detection method based on adaptive color clustering
CN108615058A (en) A kind of method, apparatus of character recognition, equipment and readable storage medium storing program for executing
CN107766854B (en) Method for realizing rapid page number identification based on template matching
CN102592132A (en) System and method for classifying digital image data
CN1240021C (en) Bill image processing equipment
WO2020125062A1 (en) Image fusion method and related device
CN105590112B (en) Text judgment method is tilted in a kind of image recognition
CN109086772A (en) A kind of recognition methods and system distorting adhesion character picture validation code
JP2004502244A (en) Digital image segmentation of mail by Hough transform
CN105184294B (en) It is a kind of based on pixel tracking inclination text judge recognition methods
Liu et al. Stroke filter for text localization in video images
JP2004272798A (en) Image reading device
CN106940804A (en) Architectural engineering material management system form data method for automatically inputting
CN110717397A (en) Online translation system based on mobile phone camera
CN108388898A (en) Character identifying method based on connector and template
Madushanka et al. Sinhala handwritten character recognition by using enhanced thinning and curvature histogram based method
CN105894475B (en) A kind of International Phonetic Symbols image character thinning method
CN113936137A (en) Method, system and storage medium for removing overlapping of image type text line detection areas
CN104408452B (en) A kind of Latin character correcting inclination method and system based on rotation projection width
CN110084117A (en) Document table line detecting method, system based on binary map segmented projection
CN114913518A (en) License plate recognition method, device, equipment and medium based on image processing
JP2019003534A (en) Image processing program, image processing apparatus, and image processing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110427

Termination date: 20120523