BE1007824A5

BE1007824A5 - Segmentation procedure

Info

Publication number: BE1007824A5
Application number: BE9301385A
Authority: BE
Original assignee: Delva Jean Pierre
Priority date: 1993-12-14
Filing date: 1993-12-14
Publication date: 1995-10-31

Abstract

Segmentation procedure for separating, in a document digitized at several levels per pixel, the areas containing text/graphic data requiring two levels per pixel from the areas containing image data requiring several levels per pixel. The procedure consists in classifying the pixels of the digital document into at least three categories of values, depending on their brightness; considering blocks of N x M pixels thus classified, and classifying them as text blocks (T) or image blocks (P) according to whether the pixels they contain are more or less contrasted; and grouping text blocks (T) and image blocks (P) separately so that each can be processed by the most suitable process.

Description

       

 



 



  Segmentation process
The invention relates to a segmentation method for separating image areas from text/graphic areas in a mixed document that has been digitized by a scanner in halftone, and therefore with several bits per point.



A "mixed" document is understood here to mean a document composed of text/graphic areas and of image, or photographic, areas. By "image" we mean monochrome halftone images, which therefore have several gray levels, while by "text/graphic" we mean text proper, as well as graphics and other line drawings, that is to say, areas of two-level information, black and white.



The "images" thus defined require several bits per point, or pixel, to define the gray levels, while "text/graphic" information requires a single bit, to indicate the presence or absence of a mark at that point.



The object of the invention is to deliver, on reproduction, a document of optimal quality for a minimal data file size, that is, for a maximal data compression ratio.



The compression ratio is improved by processing the image areas and the text/graphic areas separately. The optimal compression process differs depending on whether the data belong to image areas with several gray levels (hereinafter "multilevel areas") or to text/graphic areas with two levels (hereinafter "two-level areas"), and the document is compressed most efficiently when a suitable compression algorithm is applied to each type of information.



Thus, the text/graphic information can be compressed optimally by the CCITT Group 4 method, while the image information can be compressed by the JPEG method.



The quality of the text/graphic areas is improved by recognizing these areas as such and replacing, within them, the multilevel pixels with two-level pixels, so as to avoid the gray tints and smudges along the edges of letters or lines that impair the clarity of the text/graphics.



An object of the invention is therefore a method for recognizing the components (text/graphic and image) of a mixed document, so that they can then be processed separately, each in an optimal manner.



According to the invention, four thresholds (S1, S2, S3 and S4) are preferably chosen on the gray-level scale, so as to define 5 intervals on said scale. Four gray classes are then defined, namely:
- class n1 for gray levels in the interval between 0 and S1;
- class n2 for gray levels in the interval between S2 and 255;
- class n3 for gray levels in the interval between S3 and S4;
- class n4 for gray levels in the intervals between S1 and S3, and between S4 and S2.



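Taking as an example a document digitized on 8 bits, and therefore with 256 gray levels running from the darkest ("0") to the lightest ("255"), the following scale of values is thus defined (figure EMI2.1, shown here schematically from the class definitions above):

0 ...... S1 ...... S3 ...... S4 ...... S2 ...... 255
|   n1   |   n4   |   n3   |   n4   |   n2   |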
 
Schematically, class n1 therefore corresponds to dark gray, that is, to the darkest areas of the document, such as the letters and lines of the text/graphic areas and the darkest regions of the image areas; class n2 corresponds to the lightest areas, that is, to the background of the document; class n3 corresponds to the halftones typical of image areas; and the discontinuous class n4 corresponds to the intermediate grays.


 



Each pixel of the document is then assigned to one of the four classes according to its gray level, by comparing the value associated with the pixel with the thresholds thus defined.
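The pixel classification step can be sketched as follows. This is only an illustration under stated assumptions: the numeric thresholds are hypothetical (the description gives no values), and the treatment of values that fall exactly on a threshold is an assumption, since the description does not say whether the intervals are open or closed.

```python
from typing import List

# Hypothetical thresholds on the 0..255 gray scale (not given in the source),
# ordered S1 < S3 < S4 < S2 as the class definitions imply.
S1, S3, S4, S2 = 50, 100, 180, 220


def classify_pixel(gray: int) -> str:
    """Return the class label ("n1" to "n4") of one 8-bit gray level."""
    if not 0 <= gray <= 255:
        raise ValueError("gray level must be in 0..255")
    if gray <= S1:
        return "n1"  # darkest values: strokes, dark image regions
    if gray >= S2:
        return "n2"  # lightest values: document background
    if S3 <= gray <= S4:
        return "n3"  # halftones typical of image areas
    return "n4"      # intermediate grays on either side of n3


def classify_pixels(rows: List[List[int]]) -> List[List[str]]:
    """Classify every pixel of an image given as a list of rows."""
    return [[classify_pixel(g) for g in row] for row in rows]
```

With these assumed thresholds, `classify_pixels([[0, 75, 150, 200, 255]])` yields one row of labels, one per pixel, covering all four classes.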



The next step is to consider blocks of N x M pixels and to classify them as "T" areas (text/graphic) or as "P" areas (image), according to whether their content is more or less contrasted, text/graphic areas being by nature highly contrasted (no halftones), while image areas are by nature less contrasted.



   In an exemplary implementation of the invention, this classification is done as follows.



A block is classified as image "P" if

Nn3 > v1

or if

Nn2 < v2 and Nn1 < v3 and Nn3 > v4,

and as text/graphic "T" if these inequalities are not satisfied, where:
- Nni represents the number of pixels of class ni in the block considered;
- v1, v2, v3 and v4 are integers whose values are determined empirically by processing a large number of documents.

v1 is therefore the threshold above which the number Nn3 of pixels of class n3, the medium grays, is considered significant enough on its own for the block to be treated as an image block, and v4 is the threshold below which this number is enough on its own for the block to be treated as a text block.

For intermediate values of Nn3, and therefore in doubtful cases, the block is still treated as an image block provided that its numbers of white and black pixels (Nn2, Nn1) remain below their respective thresholds (v2, v3).
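The decision rule above can be written as a short function. The sketch below assumes the pixels of a block have already been labelled "n1" to "n4"; the default values of v1 to v4 are purely hypothetical figures for a 64-pixel block, since the description says only that they are tuned empirically.

```python
from collections import Counter
from typing import Iterable

# Hypothetical count thresholds for an 8 x 8 block (64 pixels). The source
# gives no values; it only states that v1..v4 are empirically tuned integers.
V1, V2, V3, V4 = 32, 40, 40, 8


def classify_block(labels: Iterable[str],
                   v1: int = V1, v2: int = V2,
                   v3: int = V3, v4: int = V4) -> str:
    """Return "P" (image) or "T" (text/graphic) for one block of labels."""
    counts = Counter(labels)
    nn1, nn2, nn3 = counts["n1"], counts["n2"], counts["n3"]
    # Clear case: enough halftone pixels to decide on their own.
    if nn3 > v1:
        return "P"
    # Doubtful case: some halftones, and neither white nor black dominates.
    if nn2 < v2 and nn1 < v3 and nn3 > v4:
        return "P"
    return "T"
```

For example, a block made of 50 "n2" and 14 "n1" labels (dark strokes on a light background) comes out as "T" with these assumed thresholds, whereas a block with 40 "n3" labels comes out as "P". The rule is only meaningful when v4 is smaller than v1, which the chosen defaults respect.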



As mentioned above, the parameters v1, v2, v3 and v4 are determined experimentally, by trial and error, by analyzing a large number of documents.



The information contained in the "T" blocks is then thresholded to two levels, "white" or "black", to avoid gray tints in the text and graphics.



The "T" and "P" blocks, each associated with the information giving its position in the document, are then grouped into two distinct sets, thus achieving the segmentation sought by the invention. The text/graphic information can then be compressed by the CCITT method or a similar method, while the image information can be compressed by the JPEG method or a similar method.
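The tiling and grouping step might look like the sketch below. It is an assumption-laden illustration: the position of a block is recorded as the (row, column) of its top-left pixel, N is taken as the block height and M as its width, and the block classifier is passed in as a function so that any rule returning "T" or "P" can be used. None of these choices is fixed by the description.

```python
from typing import Callable, Dict, List, Tuple

Block = List[List[int]]
Placed = Tuple[int, int, Block]  # (top row, left column, pixel values)


def segment(rows: Block, n: int, m: int,
            classify: Callable[[Block], str]) -> Dict[str, List[Placed]]:
    """Cut an image into n x m blocks and group them into "T" and "P" sets.

    Blocks on the right and bottom edges may be smaller than n x m.
    """
    groups: Dict[str, List[Placed]] = {"T": [], "P": []}
    width = len(rows[0]) if rows else 0
    for top in range(0, len(rows), n):
        for left in range(0, width, m):
            block = [row[left:left + m] for row in rows[top:top + n]]
            # Keep the position with the block so the page can be rebuilt.
            groups[classify(block)].append((top, left, block))
    return groups
```

Because each entry keeps its position, the two sets can be handed to different compressors and the document reassembled afterwards.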



For example, the thresholding can be done by comparison with threshold S3, values below this threshold being coded "0", or "black", and values above this threshold being coded "1", or "white".
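A minimal sketch of this thresholding is given below. The value of S3 is hypothetical, and coding a pixel exactly equal to S3 as white is an assumption, since the description only covers values below and above the threshold.

```python
from typing import List

S3 = 100  # hypothetical threshold value; the source gives no number


def binarize_block(block: List[List[int]], threshold: int = S3) -> List[List[int]]:
    """Reduce a "T" block to two levels: 0 (black) below the threshold, 1 (white) otherwise."""
    return [[0 if gray < threshold else 1 for gray in row] for row in block]
```

The output holds one bit of information per pixel, which is the form expected by two-level coders such as CCITT Group 4.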



   In the description above, four classes n1 to n4 have been considered for the pixels.



   This constitutes a preferred embodiment, but the method of the invention also works with, for example, three classes, by grouping the extreme classes n1 and n2.

Claims

1. Segmentation method for separating, in a document digitized at several levels per pixel, areas containing text/graphic information requiring two levels per pixel from areas containing image information requiring several levels per pixel, in which method the document is divided into blocks and the blocks are classified into text blocks (T) and image blocks (P) by way of an intermediate classification into a larger number of classes, characterized in that it consists in:
a) defining, for the intermediate classification, at least three classes of values, namely a class n1 for the extreme values, corresponding to the lightest and darkest areas, a class n3 for the middle values, corresponding to the most nuanced areas, and a class n4 for the intermediate values, separating the middle values from the extreme values on either side;
b) classifying each pixel of the digitized document into one of said classes, according to the value associated with it;
c) forming text blocks (T) and image blocks (P) by considering blocks of N x M pixels thus classified and classifying them according to the more or less contrasted nature of the pixels they contain;
d) thresholding the pixels of the "T" blocks to two levels, "white" and "black";
e) associating with each block the information relating to its position in the document;
f) grouping the two types of blocks into distinct sets, so that each can be processed by the optimal process.

2. Method according to claim 1, characterized in that it consists in providing two distinct classes for the extreme values, namely a class n1 for the lower extreme values and a class n2 for the upper extreme values.

3. Method according to claim 1 or 2, characterized in that the blocks are classified into text blocks (T) and image blocks (P) according to the relative weight of the contrasted classes (n1 and n2) of pixels compared with the nuanced class (n3) in the block.

4. Method according to claim 3, in which the blocks are classified according to the following inequalities: a block is classified as image (P) if Nn3 > v1, or if Nn2 < v2 and Nn1 < v3 and Nn3 > v4, and is classified as text/graphic (T) if these inequalities are not satisfied, where Nni represents the number of pixels of class ni in the block considered, and v1, v2, v3 and v4 are integers whose values are determined empirically by processing a large number of documents.