WO2004013802A2 - Method and system for automatically locating text areas in an image - Google Patents
Method and system for automatically locating text areas in an image
- Publication number
- WO2004013802A2 (PCT/FR2003/002406)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- pixels
- text
- value
- binary image
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- the present invention relates to a method and a system for automatically locating text areas in an image.
- OCR optical character recognition
- WO 01/69529 A2 describes a method for locating text in digital images. According to this method, a digital image is first scaled into images of different resolutions, and then a neural network is used to determine whether the pixels in the images of different resolutions are part of text boxes or not. The results obtained are then represented by initial boxes including text. These initial boxes containing text are then examined using horizontal or vertical projection profiles with adaptive thresholds.
- the document WO 00/63833 describes a method for segmenting an image into text areas and areas without text. This process is based on a simple spatial quantification, based on blocks, of the gray level histogram at 15 intensity levels.
- the object of the present invention is to remedy the drawbacks of the systems and methods of the prior art and to allow reliable detection of text zones in an image, so that the text zones located by the method and the system according to the invention can then be subjected to conventional optical character recognition processing in order to obtain complete texts.
- the invention aims in particular to allow the localization of text zones in video images of different types of programs (advertising, television information, short or feature films, etc.) and whatever the presentation of this text, with different types and styles of characters and even in the case where the background image is complex.
- the invention thus aims to enable a search by semantic content in sequences of images, taking into account both indications in the form of natural text appearing in images, such as street names or shop signs, and indications in the form of artificial text introduced, for example as subtitles, in a post-processing of the images after the shooting.
- a method for automatically locating text areas in a digital image, characterized in that it comprises a first step of converting the digital image into a binary image, a second step of locating potential text areas and a third step of selecting effective text areas.
- the second step of locating potential text areas includes the application of morphological operations on the binary image in order to produce closed blocks capable of containing text, in the original image. If the image or images to be processed are not already in digital form, a preliminary step may simply consist in an analog-digital conversion of the images to be processed.
- the first step comprises a step of converting a digital image into an image defined by gray levels.
- the first step of converting the digital image into a binary image comprises a multiresolution step using an interpolation method to transform an input image I into an output image J of lower resolution whose size is M times that of the input image I, with 0 < M < 1.
- the first step of converting the digital image into a binary image comprises a binarization step using a thresholding method to transform an input image I in gray levels into a binary image BW, each pixel of the input image I having a value lower than a predefined threshold being converted in the binary image BW to a value 0 corresponding to black and all the other pixels of the input image I being converted in the binary image BW to a value 1 corresponding to white.
- the second step of locating potential text areas includes the application of different morphological masks in an order which can be adapted to the particular contexts of implementation of the invention.
- the second step of locating potential text areas comprises the application of at least one morphological mask to perform on the binary image at least one morphological operation according to which the value 1 is assigned to all the pixels of a line or column when, in the binary image, the end pixels of this line or of this column both have the value 1.
- the second step of locating potential text zones comprises the application of at least one morphological mask to perform on the binary image at least one morphological operation according to which the value 1 is assigned to all the pixels of a rectangle or a square defined on two lines or two columns when, in the binary image, two pixels located diagonally at the ends of this rectangle or this square both have the value 1.
- the second step of locating potential text areas comprises an initial step according to which a morphological mask is applied to perform on the binary image a morphological operation according to which, for each row or each column comprising at its ends two pixels of value 1 and having a length greater than a threshold corresponding to a percentage less than 100% of the dimension of the image resulting from the multiresolution step, all the pixels of the row or column considered are assigned a value 0.
- this threshold is set at 75% of the width of the image resulting from the multiresolution step when the line is the preferred direction.
- the second step of locating potential text areas can firstly include the application of a morphological mask to perform on the binary image a morphological operation according to which the value 0 is assigned to each pixel of the binary image which is surrounded by pixels which all have the value 0.
- the morphological operations are applied by considering exclusively the lines of the binary image.
- the image defined by gray levels and represented by a matrix G is transposed into a transposed image represented by a transposed matrix ᵗG, and the morphological operations of the second step of locating potential text areas are applied to this transposed matrix ᵗG by exclusively considering the lines of the binary image.
- the morphological operations of the second step of locating potential text areas are again applied to the image defined by gray levels and represented by a matrix G, exclusively considering the columns of the binary image through the use of transposed morphological operators.
- the third step of selecting effective text zones comprises a prior step of separating the pixels belonging to the background of the image, during which an intensity slicing is applied to the grayscale image.
- u is a constant representing a gray level value between 0 and L.
- the value of the constant u can be determined dynamically from the histogram H of the grayscale image G comprising N shades, obtained from the input image I after the step of converting a digital image into an image defined by gray levels, as follows:
- the threshold is fixed at 2% of the total number of pixels in the image, but this threshold can be modified depending on the application.
- the effective text areas are filtered by locating the two most important peaks of the histogram of each of the potential text zones, these two peaks being identified by their positions P1 and P2 respectively, and by classifying as an effective text zone any zone for which the distance D(P1, P2) is greater than a predetermined threshold S, any other potential text area that does not meet this condition being ignored.
- the method according to the invention further comprises a step of delimiting the borders of the effective text zones in a first preferred direction, according to which, for each effective text zone, a representative line Rhig(i) oriented in the first preferred direction is first selected among all the lines of the effective text zone considered that are oriented in this first preferred direction; the line Rhig(i) is then compared with the adjacent line Rhig(i-1) which immediately precedes it and, respectively, with the adjacent line Rhig(i+1) which immediately follows it; for each pair of lines, the two lines are merged into a single block of text if the intersection is not empty between the sets Pos_Rhig(i) and Pos_Rhig(i-1), or respectively between the sets Pos_Rhig(i) and Pos_Rhig(i+1), which relate to positions for the pixels of the lines Rhig(i) and Rhig(i-1), or respectively of the lines Rh
- the method according to the invention may also comprise a step of delimiting the borders of the effective text areas in a second preferred direction perpendicular to the first preferred direction, according to which, for each effective text area, a representative line Rhig(i) oriented in the first preferred direction is first selected among all the lines of the effective text area considered that are oriented in the first preferred direction; at each iteration, only the pixels lying on either side of the pixels forming said representative line Rhig(i) are considered, and only the pixels having the same color as the pixels of the representative line Rhig(i) are added to it.
- the first preferred direction can be a horizontal or vertical direction.
- the representative line Rhig(i) oriented in the first preferred direction is constituted by the line comprising the maximum number of pixels having a value equal to the maximum value L corresponding to white.
- the closed blocks produced capable of containing text advantageously have the form of parallelograms and preferably the form of rectangles.
- a limited area of the image is preselected, to which the other processing steps aimed at locating text areas are applied.
- the invention also relates to a system for automatically locating text areas in a digital image, characterized in that it comprises a unit for converting an input digital image into a binary image, a unit for locating potential text areas applied to the binary image and a unit for selecting effective text zones highlighted by said localization unit.
- the unit for locating potential text areas comprises means for applying at least one morphological filter to the binary image resulting from the conversion of the digital image into a binary image.
- the unit for converting an input digital image I into a binary image comprises means for converting a digital image I into an image G defined by gray levels.
- the unit for converting an input digital image into a binary image comprises at least one multiresolution module comprising interpolation means for transforming an input image into an output image of lower resolution.
- the unit for converting an input digital image into a binary image comprises at least one thresholding module for transforming an input image in gray levels into a binary image BW.
- the system includes means for transposing matrices representative of morphological images or masks.
- the method and the system according to the invention can give rise to a very large number of applications.
- Such a system for detecting and recognizing number plates may include a device for capturing digital images, such as a digital video camera, an image analysis module, and a database management system for storing and comparing data.
- the image analysis module must first locate the area of the license plate, then extract this area and supply the information relating to it, if necessary after post-processing, to the input of an OCR-type system in order to obtain the registration number in the form of alphanumeric text.
- Another possible application of the method and of the system according to the invention consists in the detection of logos and the recognition of these in television broadcasts.
- FIG. 1 is a flowchart showing schematically the main steps of the method for automatically locating text areas in an image, in accordance with the invention
- FIG. 2A shows an example of starting image comprising two text areas against a complex background
- FIG. 2B represents an output binary image having undergone a first processing of enhancement of the shapes of potential text zones, in accordance with the invention
- FIG. 2C represents a binary image which has also given rise to the elimination of manifestly incorrect potential text areas
- FIG. 2D represents an image such as that of FIG. 2C having also given rise, in accordance with the invention, to a step of locating potential text areas by the application of morphological masks,
- FIG. 3 shows on a larger scale the image of Figure 2D
- Figures 4 to 8 show the histograms of the different potential text regions of Figure 3, after applying a step of separating the pixels of the potential text areas from the background of the image,
- FIGS. 9 to 15 represent various examples of the application of morphological masks to an image such as that of FIG. 2C or, where appropriate, of FIG. 2B,
- Figure 16 shows various examples of images presenting text on a complex background and to which the method according to the invention can be applied
- Figure 17 is a block diagram showing the essential components of an example system for automatically locating text areas in an image, according to the invention.
- the system and the method according to the invention can be applied to the detection of natural text included in the images from the moment of shooting, such as for example names of shop signs, names of streets or indications on signs or bulletin boards. This is the case, for example, of image 143 in FIG. 16 which shows on a door a function name "guardian".
- the invention also applies to the detection of artificial text superimposed on images during editing.
- Natural text has certain special characteristics which can be used to facilitate detection:
- the characters of the text are in the foreground,
- the characters of the text have dimensions framed within certain limits (for example, a letter is never as large as the surface of the screen and the minimum size of the characters includes a minimum number of pixels for the characters to be readable).
- the method according to the invention applies to digital images having a complex background, which may have a low resolution and be affected by noise, and without control parameters.
- the method can thus be applied to video images, limits false detections and makes it possible to locate and extract text zones with very high reliability, even with low quality images.
- FIG 1 shows the main steps of the method according to the invention.
- From a digital color image, a step 10 is first carried out to transform the digital image into a gray level digital image.
- Step 20 can include a multiresolution step 21 and a binarization step 22, the order of steps 21 and 22 being interchangeable.
- With the binary image resulting from step 20, step 30 locates the potential text areas to obtain a binary image with potential text areas delimited by white blocks.
- the effective text areas are selected, which can then be subjected in the initial digital image to a conventional optical character recognition (OCR) process.
- the starting image is a digital image represented by one or more matrices. If this is not the case, for example if the input image is in a compressed format such as JPEG, the input image is first converted into a digital image in matrix form. In the same way, if the input images are in analog form, they are first converted into digital form by conventional techniques.
- if the digital input image I is a color image, it is converted into a grayscale image G.
- This conversion step 10 can be carried out by conventional techniques. It consists of a simple conversion of a digital color image, generally represented by three matrices in the color space, for example RGB, into a grayscale matrix. This step is necessary and essential for step 22 of binarization which will be described in more detail in the following description.
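The conversion of step 10 can be sketched as follows. The text only says that conventional techniques are used, so the weights below (the ITU-R BT.601 luma coefficients) and the function name `rgb_to_gray` are assumptions chosen for illustration, not values taken from the patent.

```python
def rgb_to_gray(r, g, b):
    """Combine three equally sized colour matrices (R, G, B) into one grayscale matrix."""
    # Assumed weights (ITU-R BT.601 luma); the source does not specify any.
    return [
        [round(0.299 * rv + 0.587 * gv + 0.114 * bv)
         for rv, gv, bv in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(r, g, b)
    ]


# A one-row image holding a white pixel and a black pixel.
gray = rgb_to_gray([[255, 0]], [[255, 0]], [[255, 0]])
```

Any other conventional weighting would serve equally well, since the later steps only need a single intensity value per pixel.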
- the initial step 10 can also if necessary be accompanied by an additional step of calculating the transposed matrix of the matrix G.
- the transposed matrix ᵗG resulting from this operation can be used, for example, for the detection of regions of vertical text.
- the transpose ᵗA of the matrix A is formed by interchanging the rows and the columns of the matrix A.
- the i-th row of the matrix A becomes the i-th column of the transposed matrix ᵗA, for every i.
- the transposed matrix ᵗA is thus an n × m matrix.
- a digital image I and a morphological operator M can both be considered as matrices, the transposed matrices of which can be determined according to the definition given above.
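As a small illustration of the definition above, a matrix stored as a list of rows can be transposed as follows. The function name `transpose` is only a label for this sketch.

```python
def transpose(a):
    """Return the transpose of matrix `a`: row i of `a` becomes column i of the result."""
    return [list(column) for column in zip(*a)]


a = [[1, 2, 3],
     [4, 5, 6]]          # 2 rows, 3 columns
t_a = transpose(a)       # 3 rows, 2 columns
```

Because transposing twice gives back the original matrix, an operation written for rows can be applied to columns by transposing, processing, and transposing again.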
- the step 30 of locating potential text zones comprises the application of morphological filters.
- a morphological filter is a mask.
- when the regions of the image in which text is likely to appear are known in advance, for example in the case of detecting artificial text such as subtitles, it is also possible, from the initial step 10, to define a preferred region in which the text zones will be sought.
- a preferred region definition makes it possible to speed up the localization process by limiting the extent of the image to which all of the steps 20 to 40 of the method illustrated in FIG. 1 are applied.
- step 20 of enhancing the shapes of the text zones.
- the location of probable areas of text presence in an image is part of an image preprocessing which is fundamental to allow the correct detection of text.
- a multiresolution approach and a conversion of the grayscale image into a binary image are used to highlight the shapes of probable text zones.
- the conversion of an input image in gray levels I into a binary image BW takes place by thresholding.
- the output binary image BW has a value of 0 (black) for all the pixels of the input image I which have a value below a predetermined threshold and a value of 1 (white) for all the other pixels.
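The thresholding rule can be written in a few lines. This is a minimal sketch: the function name `binarize` is hypothetical, and the threshold is assumed to be expressed on the same scale as the pixel values.

```python
def binarize(gray, threshold):
    """Return BW: 0 (black) where the gray value is below `threshold`, 1 (white) elsewhere."""
    return [[0 if value < threshold else 1 for value in row] for row in gray]


bw = binarize([[10, 200],
               [128, 127]], threshold=128)
```

A pixel exactly equal to the threshold is mapped to white, since only values strictly below the threshold become black.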
- The implementation of a multiresolution method (step 21) for locating lines of text is based on the basic characteristic that a line of text appears as a solid line in a low resolution image.
- the multiresolution method, when applied to an input image I, produces an output image J whose size is M times that of the image I.
- If M is between 0 and 1.0, image J is smaller than image I. If M is greater than 1.0, image J is larger than image I.
- the parameter M can vary and be adapted for example to the size of the image.
- the method according to the invention does not depend on the value of the parameter M, as long as it is between 0 and 1. It is also possible to change the threshold value used to convert a grayscale image into a binary image, for example depending on the input image. For example, this threshold value can be of the order of 0.7.
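The multiresolution step can be sketched as a rescaling by the factor M. The text only mentions "an interpolation method", so nearest-neighbour sampling is an assumption made here for brevity, and `rescale` is a hypothetical name.

```python
def rescale(image, m):
    """Resize `image` by factor `m` (0 < m < 1 shrinks it) using nearest-neighbour sampling."""
    rows, cols = len(image), len(image[0])
    out_rows = max(1, int(rows * m))
    out_cols = max(1, int(cols * m))
    return [
        # Output pixel (i, j) takes the value of input pixel (i / m, j / m), clamped to the image.
        [image[min(rows - 1, int(i / m))][min(cols - 1, int(j / m))]
         for j in range(out_cols)]
        for i in range(out_rows)
    ]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 test image
small = rescale(image, 0.5)                                # 2 x 2 result
```

An averaging or bilinear interpolation would smooth the characters of a text line into a more uniform band; nearest-neighbour is used only to keep the sketch short.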
- Figure 2B clearly shows that the multiresolution method makes it possible to filter the input image while keeping only related components having a homogeneous color corresponding to a significant area.
- Step 30 of locating potential text areas consists of applying morphological masks to binary images such as those of FIGS. 2B or 2C in order to close blocks likely to contain text, by filling in the spaces between characters or words.
- the starting binary image being an image such as those of Figures 2B or 2C from step 20, several binary morphological operations are repeatedly applied until the image obtained J no longer presents many changes compared to the previous image and presents an appearance with closed blocks such as that of Figure 2D.
- three different morphological masks can be used to close the blocks likely to contain text. These different morphological masks can be combined with each other and applied in different orders.
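The repeated application of the masks until the image stops changing can be sketched as a simple loop. Everything named here is hypothetical: `apply_until_stable`, its `max_changes` tolerance and iteration cap, and the toy operator `fill_single_gaps`, which merely stands in for the masks of Figures 9 to 11.

```python
def fill_single_gaps(bw):
    """Toy operator (illustrative only): set a pixel to 1 when both horizontal neighbours are 1."""
    out = [row[:] for row in bw]
    for r, row in enumerate(bw):
        for c in range(1, len(row) - 1):
            if row[c - 1] == 1 and row[c + 1] == 1:
                out[r][c] = 1
    return out


def apply_until_stable(bw, operators, max_changes=0, max_iterations=100):
    """Apply `operators` in order, repeatedly, until at most `max_changes` pixels differ."""
    current = bw
    for _ in range(max_iterations):
        updated = current
        for operator in operators:
            updated = operator(updated)
        changed = sum(
            a != b
            for row_a, row_b in zip(current, updated)
            for a, b in zip(row_a, row_b)
        )
        current = updated
        if changed <= max_changes:
            break
    return current


closed = apply_until_stable([[1, 0, 1, 0, 1]], [fill_single_gaps])
```

Passing the operators as a list reflects the remark that the masks can be combined and applied in different orders.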
- the first morphological mask M1 is represented in FIG. 9. Considering a line 50 of pixels 51 to 58, all the intermediate pixels 52 to 57 are set to the value "1", regardless of their initial value "0" or "1", when the end pixels 51 and 58, on the left and on the right, have the value 1.
- the same operation can be done on columns, for example by using the transposed matrix of M1, as indicated above, or by using the transpose of the matrix representing the input image.
- the second morphological mask M2 is represented in FIG. 10.
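One possible reading of the first mask (Figure 9) is a window sliding along each row: when both end pixels of the window are 1, the whole window is set to 1. The window width of 8 follows the eight pixels 51 to 58 of the figure; treating it as a sliding window, and the names `close_rows` and `close_columns`, are assumptions of this sketch.

```python
def close_rows(bw, width=8):
    """Set to 1 every horizontal window of `width` pixels whose two end pixels are 1."""
    out = [row[:] for row in bw]
    for r, row in enumerate(bw):
        for start in range(len(row) - width + 1):
            end = start + width - 1
            # The test reads the input image, so fills made in this pass do not cascade.
            if row[start] == 1 and row[end] == 1:
                for c in range(start, end + 1):
                    out[r][c] = 1
    return out


def close_columns(bw, width=8):
    """Same operation on columns, obtained by transposing, closing rows, and transposing back."""
    transposed = [list(column) for column in zip(*bw)]
    return [list(column) for column in zip(*close_rows(transposed, width))]


closed = close_rows([[1, 0, 0, 1, 0, 0]], width=4)
```

The column variant follows the remark above that the same result can be obtained by working on the transpose of the image matrix.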
- Starting rectangles 60 and 70 comprising pixels 61 to 66 and 71 to 76 are transformed into a rectangle 80 comprising pixels 81 to 86.
- the starting rectangle 60, respectively 70 includes pixels
- the rectangle 80 of the transformed image comprises pixels 81 to 86 which all have the value "1".
- the operation of the morphological mask M2 can be applied to rows or columns using transposed matrices.
- the third morphological mask M3 is shown in Figure 11.
- This mask M3 is very similar to the morphological mask M2 and aims to obtain the closure of diagonals. From square elements 90A, 90B of a starting image, a square element 100 of the converted image is obtained.
- FIGS. 12 and 13 show two examples of the application of the third morphological mask M3.
- the square 90C comprises two diagonal pixels 92C, 93C having the value "1", the other two pixels 91C, 94C having the value "0".
- in a second step, the value "1" is given to the pixel 91C located at the top left, while the other pixels 92C to 94C are unchanged, so that a square 100 is obtained, all the pixels 111 to 114 of which have the value 1.
- Figure 13 shows a case similar to that of Figure 12, but where the procedure is symmetrical.
- in the starting square 90D, we start by giving the value "1" to the pixel 91D located at the top left, which initially has the value 0, the other pixels 92D to 94D keeping their values, equal to "1" for the pixels 92D, 93D and equal to 0 for the pixel 94D.
- Pixel 94D' located at the bottom right is then given the value "1", while the other pixels 91D' to 93D' keep their value "1".
- a square 100 is thus obtained in the same way, all the pixels 111 to 114 of which have the value 1.
- the operations of FIGS. 12 and 13 can be carried out in parallel, which corresponds to the process illustrated in FIG. 11.
- Figures 2D and 3 show all regions as closed blocks 1 to 5 with a probability of containing text. Five candidate zones 1 to 5 likely to contain text can be identified, whereas in the initial image of Figure 2A only two zones actually containing text can be seen.
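The second and third masks can be sketched with one routine: a block two rows high is filled with 1 when two diagonally opposite corner pixels are both 1. Treating the third mask as the 2 x 2 special case of the second is an assumption of this sketch, as are the name `close_rectangles` and the default width of 3 (taken from the six pixels 61 to 66 of Figure 10).

```python
def close_rectangles(bw, width=3):
    """Fill every 2-row by `width`-column block whose diagonal corner pixels are both 1."""
    rows, cols = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for r in range(rows - 1):
        for c in range(cols - width + 1):
            right = c + width - 1
            main_diagonal = bw[r][c] == 1 and bw[r + 1][right] == 1
            anti_diagonal = bw[r][right] == 1 and bw[r + 1][c] == 1
            if main_diagonal or anti_diagonal:
                for rr in (r, r + 1):
                    for cc in range(c, right + 1):
                        out[rr][cc] = 1
    return out


rectangle = close_rectangles([[1, 0, 0],
                              [0, 0, 1]], width=3)
square = close_rectangles([[0, 1],
                           [1, 0]], width=2)
```

Testing both diagonals in the same pass mirrors the statement that the operations of Figures 12 and 13 can be carried out in parallel.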
- the detection of potential text regions on the input image I can be derived by mapping between the coordinates of blocks of potential text in the binary image and those of the input image I. We can then apply to potential text regions detected on the input image various OCR techniques.
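The mapping between block coordinates in the reduced binary image and the input image I can be sketched as a division by the scale factor M. The block representation `(top, left, bottom, right)` with inclusive bounds and the name `map_block_to_input` are assumptions; the source does not describe how blocks are stored.

```python
def map_block_to_input(block, m):
    """Map an inclusive (top, left, bottom, right) block from the reduced image to image I."""
    top, left, bottom, right = block
    return (
        int(top / m),
        int(left / m),
        int((bottom + 1) / m) - 1,   # last input row covered by the block
        int((right + 1) / m) - 1,    # last input column covered by the block
    )


region_in_input = map_block_to_input((1, 2, 3, 5), m=0.5)
```

In practice the result would also be clamped to the dimensions of the input image before the region is passed to an OCR stage.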
- step 20 based on multiresolution and binarization is an effective process when applied to a document containing text, in which a pixel belongs either to the background of the image, or to a certain significant object of the image.
- a digital image comprising a complex background
- step 21 of multiresolution constitutes only a preprocessing making it possible to carry out a first location of candidate regions likely to contain text.
- Each candidate region 1 to 5 (FIG. 3) is then examined again during a selection step 40 in order to determine whether this candidate region actually contains text or not.
- the step 40 for selecting effective text areas itself comprises two steps which include separating the pixels from the background of the image and filtering the effective text regions.
- the step of separating the background pixels from the image aims to highlight the pixels of the characters with respect to the background of the image.
- an intensity slicing method is applied to the grayscale image obtained after the first image transformation step. This technique is useful when different characteristics of an image are contained in different levels of gray.
- the value of u is determined dynamically from the histogram H of the grayscale image G (for example in 256 shades) obtained from the input image I after the step 10, as follows:
1. L is initialized with the value 256 (white color).
2. To determine the value of u, we first calculate the number of pixels Nb having the color 256, then we gradually add to the number Nb the number of pixels having the color 255, then 254 and so on until the number Nb is greater than a threshold representing a small percentage of the total number of pixels in the image. The last color of the histogram H, taken into account in this operation, is assigned to u.
- the threshold is fixed at 2% of the total number of pixels, but this threshold can be modified according to the applications.
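The dynamic choice of u can be sketched as a cumulative count taken from the brightest end of the histogram. The sketch assumes gray values stored as integers from 0 to 255 (the text numbers the white level 256), and the name `slicing_level` is hypothetical.

```python
def slicing_level(gray, fraction=0.02, levels=256):
    """Return u: the gray level at which the count of brightest pixels first exceeds the threshold."""
    histogram = [0] * levels
    total = 0
    for row in gray:
        for value in row:
            histogram[value] += 1
            total += 1
    count = 0
    # Walk from the brightest level downwards, accumulating pixel counts.
    for level in range(levels - 1, -1, -1):
        count += histogram[level]
        if count > fraction * total:
            return level
    return 0


# 200 pixels: a few bright ones and a dark background.
image = [[255, 250] + [240] * 3 + [10] * 195]
u = slicing_level(image)
```

With the 2% threshold, 200 pixels give a limit of 4 pixels, which is first exceeded when the three pixels of level 240 are added.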
- the effective text regions are filtered by a simple analysis of the spatial variation of all the candidate regions likely to contain text, after transformation by the previously described operation of separating the pixels representing characters from the background of the image. This analysis is based on the characteristic according to which the characters of a text generally present a significant contrast with the background.
- Figures 4 to 8 represent such an approach applied to the potential text regions 1 to 5 identified in Figure 3.
- the potential text region is considered to be an effective text region. Otherwise, it is simply ignored.
- regions 1, 2 and 3 in Figure 3 have little spatial variation, since the distances between the local maxima 101, 102 (Figure 4), 201 to 204 (Figure 5), 301 to 305 (Figure 6) are small. As a result, these regions are ignored.
- regions 4 and 5 of Figure 3 have a strong spatial variation, since the distances D(P1, P2) between the local maxima 401, 402 (Figure 7) or 501, 502 (Figure 8) are large. These regions 4 and 5 are therefore retained.
- the threshold value can be chosen, for example, to be equal to 15% of the total number of gray scale levels.
- the precision of the method is all the better the higher the threshold value.
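The two-peak test can be sketched as follows. The text does not say how the "most important peaks" are found, so taking the two highest local maxima of the histogram is an assumption, as are the name `is_effective_text_region` and the choice to reject regions with fewer than two peaks.

```python
def is_effective_text_region(region, levels=256, threshold_fraction=0.15):
    """Keep a region when its two highest histogram peaks are far apart in gray level."""
    histogram = [0] * levels
    for row in region:
        for value in row:
            histogram[value] += 1
    peaks = []
    for level, count in enumerate(histogram):
        left = histogram[level - 1] if level > 0 else 0
        right = histogram[level + 1] if level < levels - 1 else 0
        if count > left and count >= right:      # simple local-maximum test
            peaks.append(level)
    if len(peaks) < 2:
        return False
    p1, p2 = sorted(peaks, key=lambda level: histogram[level], reverse=True)[:2]
    # D(P1, P2) is taken as the absolute difference of the peak positions.
    return abs(p1 - p2) > threshold_fraction * levels


high_contrast = [[0, 0, 0, 255, 255], [0, 0, 255, 255, 0]]
low_contrast = [[100, 100, 110, 110, 110]]
results = (is_effective_text_region(high_contrast), is_effective_text_region(low_contrast))
```

With 256 gray levels, the 15% setting corresponds to a minimum distance of about 38 levels between the two peaks.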
- the method according to the invention can present various variants and additional steps aimed at better delimiting the borders of the text regions or at speeding up the whole process by eliminating a few potential text regions which are obviously negative.
- the representative line Rhig(i) can be chosen by selecting the line which is formed by the maximum number of horizontally aligned pixels belonging to characters.
- the selected line Rhig(i) will be the line formed by the maximum number of pixels having a value equal to L because, after the transformation consisting in separating the pixels from the background, the characters in a text region are considered to be monochrome and to contrast with the background of the image.
- we then proceed to a comparison of Rhig(i) with the adjacent line Rhig(i-1) which immediately precedes it (respectively with the adjacent line Rhig(i+1) which immediately follows it), in order to decide whether or not to merge the two lines into the same text block.
- the fusion criterion is based on the spatial distribution of the gray values and the principle of connected monochrome pixels, as follows: let Pos_Rhig(i) and Pos_Rhig(i-1) (respectively Pos_Rhig(i+1)) be two sets which describe the positions of the pixels in the lines Rhig(i) and Rhig(i-1) (respectively Rhig(i+1)) which have a gray value equal to L.
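The selection of the representative line and the merge test on position sets can be sketched as below. The function names are hypothetical, and `level` stands for the gray value L assigned to character pixels after the background separation.

```python
def positions_at_level(line, level):
    """Return the set of column positions whose gray value equals `level`."""
    return {c for c, value in enumerate(line) if value == level}


def representative_row(region, level):
    """Return the index of the row holding the most pixels equal to `level`."""
    return max(range(len(region)), key=lambda r: sum(v == level for v in region[r]))


def should_merge(line_a, line_b, level):
    """Two adjacent lines are merged when their position sets intersect."""
    return bool(positions_at_level(line_a, level) & positions_at_level(line_b, level))


region = [[0, 255, 0, 0],
          [255, 255, 255, 0],
          [0, 0, 0, 255]]
i = representative_row(region, 255)
merge_up = should_merge(region[i], region[i - 1], 255)
merge_down = should_merge(region[i], region[i + 1], 255)
```

In this example the row above shares column 1 with the representative row and would be merged, while the row below shares no position and would not.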
- the delimitation principles just set out can also be applied, for example, by first carrying out a vertical delimitation. This amounts to working on the transpose of the matrix which represents the input image, as was explained above in relation to the transformation of digital images.
- the process of locating text regions can be speeded up when one has some prior knowledge of regions likely to contain text.
- FIG. 2C illustrates the result of such a negative-form elimination method applied to the image of FIG. 2B.
- In FIG. 14, it can be seen that, for a line 120 comprising pixels 121 to 128, the two end pixels 121 and 128 of which have the value "1" while the other pixels 122 to 127 each have a value "0" or "1", in the case where the length of the line is greater than a threshold l_t (for example equal to 75% of the size of the image resulting from the multiresolution process), all the pixels 121 to 128 are set to the value "0" corresponding to black.
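The elimination of over-long lines (Figure 14) can be sketched as follows. The sketch assumes that a "line" is the span between the first and last pixels of value 1 in a row, which is one possible reading of the text; the name `remove_long_lines` is hypothetical.

```python
def remove_long_lines(bw, fraction=0.75):
    """Set to 0 every row span delimited by two 1 pixels that is longer than `fraction` of the width."""
    out = [row[:] for row in bw]
    for r, row in enumerate(bw):
        ones = [c for c, value in enumerate(row) if value == 1]
        if not ones:
            continue
        first, last = ones[0], ones[-1]
        if (last - first + 1) > fraction * len(row):
            for c in range(first, last + 1):
                out[r][c] = 0
    return out


cleaned = remove_long_lines([[1, 1, 0, 1, 1, 1, 1, 0],
                             [0, 1, 1, 1, 0, 0, 0, 0]])
```

The first row spans 7 of 8 pixels, more than the 75% limit, and is erased; the second spans only 3 pixels and is kept.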
- Figure 15 shows another example of a possible improvement consisting of filling in diagonals to eliminate an isolated pixel in the background of the image.
- the morphological operator M5 illustrated in FIG. 15 consists, in a square 130 of nine pixels, of giving the value "0" to an isolated central pixel 135 of value "1" surrounded by eight pixels 131 to 134, 136 to 139 of value "0".
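The operator of Figure 15 can be sketched as a scan of every 3 x 3 neighbourhood. Leaving the border pixels of the image untouched is an assumption of this sketch, and `remove_isolated_pixels` is a hypothetical name.

```python
def remove_isolated_pixels(bw):
    """Set to 0 each interior pixel of value 1 whose eight neighbours are all 0."""
    rows, cols = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if bw[r][c] != 1:
                continue
            neighbours = [
                bw[rr][cc]
                for rr in (r - 1, r, r + 1)
                for cc in (c - 1, c, c + 1)
                if (rr, cc) != (r, c)
            ]
            if not any(neighbours):
                out[r][c] = 0
    return out


cleaned = remove_isolated_pixels([[0, 0, 0],
                                  [0, 1, 0],
                                  [0, 0, 0]])
```

A pixel with at least one neighbour of value 1 is left unchanged, so connected blocks are not eroded.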
- FIG. 17 shows the block diagram of an example of an automatic system for locating text areas in an image implementing the invention.
- An input digital image I is first applied to a processing unit 150 which converts the input digital image I into an image G defined by gray levels.
- the grayscale image G is itself applied to a processing unit 160.
- the processing unit 160 comprises an input module 163, which can, for example, calculate the transpose of the matrix of the grayscale image G, or the transposes of the matrices representing morphological masks.
- the input module 163 can also, if necessary, make it possible to define (a priori) regions of the image G which constitute subsets in which the process of searching for text zones will be carried out.
- the input module 163 cooperates with a multiresolution module 161 which includes interpolation means to transform an image applied to it into a lower resolution image.
- the input module 163 also cooperates with a thresholding module 162 which transforms a grayscale image which is applied to it into a binary image BW.
- the input module 163 can call on modules 161 and 162 in any order. Each of the modules 161, 162 can also use as an input image directly an image produced by the other module.
- the binary image output from the processing unit 160 is applied to a unit 170 for locating potential text areas.
- the location unit 170 includes one or more morphological filters and makes it possible to apply morphological masks to the binary image from the processing unit 160 in order to close blocks likely to contain text.
- the selection unit 180 then makes it possible to select the effective text zones from the potential text zones highlighted by the localization unit 170.
- the selection unit 180 implements the previously described method of cutting out the intensity applied to the grayscale image from the processing unit 160, and applies to all candidate regions likely to contain text highlighted by the location unit 170 a filtering consisting of an analysis of the spatial variation of the candidate regions, after having carried out a separation of the pixels from the background of the image.
- The units and modules of the system for automatically locating text areas in an image can be implemented in hardware or in software.
- A processing unit 190 acts on the initial digital image I, in the areas located and selected by the location unit 170 and the selection unit 180, to carry out conventional optical character recognition processing. This conventional processing is therefore applied only to a small number of targeted regions of the input image.
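The chain of units described above (150, 161, 162, 170, 180) can be sketched in a few lines of pure Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the luma weights, the block averaging used in place of the interpolation means, the fixed threshold of 128, the horizontal structuring element, and the transition count used as a stand-in for the spatial-variation analysis are all choices made here for the example. None of them is specified in the text.

```python
def to_gray(rgb):
    """Unit 150: convert rows of (r, g, b) tuples into a gray-level image G."""
    # Integer BT.601 luma weights: an assumption, the text names no formula.
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in rgb]


def downsample(gray, factor=2):
    """Module 161: lower-resolution image by averaging factor x factor blocks."""
    h, w = len(gray) // factor, len(gray[0]) // factor
    return [[sum(gray[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) // factor ** 2
             for x in range(w)] for y in range(h)]


def threshold(gray, t=128):
    """Module 162: binary image BW, 1 where a pixel is darker than t."""
    # Dark text on a light background and a fixed t are assumptions.
    return [[1 if v < t else 0 for v in row] for row in gray]


def _filter_rows(bw, k, op):
    """Apply op (max or min) over a horizontal window of half-width k."""
    w = len(bw[0])
    return [[op(row[max(0, x - k):min(w, x + k + 1)]) for x in range(w)]
            for row in bw]


def close_horizontal(bw, k=1):
    """Unit 170: morphological closing (dilation followed by erosion)."""
    return _filter_rows(_filter_rows(bw, k, max), k, min)


def transitions(row):
    """Unit 180 (simplified): count 0/1 changes along one row of BW."""
    return sum(a != b for a, b in zip(row, row[1:]))


if __name__ == "__main__":
    bw = [[1, 0, 1, 0, 0, 0, 0, 1]]
    print(close_horizontal(bw, k=1))  # the 1-pixel gap closes, the 4-pixel gap stays
    print(transitions(bw[0]))         # many transitions suggest character strokes
```

In this sketch the closing merges strokes separated by small gaps into a single block, which plays the role of a candidate region, while the transition count gives a crude measure of how much the pixels vary inside that region. A real system would use two-dimensional masks and an adaptive threshold.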
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Facsimile Image Signal Circuits (AREA)
- Image Analysis (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03750862A EP1525553A2 (fr) | 2002-07-31 | 2003-07-30 | Procede et systeme de localisation automatique de zones de texte dans une image |
AU2003269080A AU2003269080A1 (en) | 2002-07-31 | 2003-07-30 | Method and system for automatically locating text areas in an image |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR02/09749 | 2002-07-31 | ||
FR0209749A FR2843220B1 (fr) | 2002-07-31 | 2002-07-31 | "procede et systeme de localisation automatique de zones de texte dans une image" |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2004013802A2 (fr) | 2004-02-12 |
WO2004013802A3 (fr) | 2004-04-08 |
Family
ID=30129584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2003/002406 WO2004013802A2 (fr) | 2002-07-31 | 2003-07-30 | Procede et systeme de localisation automatique de zones de texte dans une image |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1525553A2 (fr) |
CN (1) | CN1685358A (fr) |
AU (1) | AU2003269080A1 (fr) |
FR (1) | FR2843220B1 (fr) |
WO (1) | WO2004013802A2 (fr) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101667251B (zh) * | 2008-09-05 | 2014-07-23 | 三星电子株式会社 | 具备辅助定位功能的ocr识别方法和装置 |
CN102081731B (zh) * | 2009-11-26 | 2013-01-23 | 中国移动通信集团广东有限公司 | 一种从图像中提取文本的方法和装置 |
CN102411707A (zh) * | 2011-10-31 | 2012-04-11 | 世纪龙信息网络有限责任公司 | 一种图片中文本的识别方法及识别装置 |
CN103186786A (zh) * | 2011-12-30 | 2013-07-03 | 鸿富锦精密工业(深圳)有限公司 | 封闭图形识别系统及方法 |
CN108959287B (zh) | 2017-05-17 | 2021-08-03 | 中兴通讯股份有限公司 | 一种网页内容处理方法及装置、存储介质 |
CN115803772A (zh) * | 2020-05-12 | 2023-03-14 | Polycom通讯技术(北京)有限公司 | 用于检测和显示白板文本和/或活跃说话者的系统和方法 |
CN113312990B (zh) * | 2021-05-13 | 2024-08-23 | 汕头市同行网络科技有限公司 | 一种基于光学字符识别的电竞比赛赛况实时输出方法 |
2002
- 2002-07-31 FR FR0209749A patent/FR2843220B1/fr not_active Expired - Fee Related
2003
- 2003-07-30 EP EP03750862A patent/EP1525553A2/fr not_active Withdrawn
- 2003-07-30 CN CNA038235072A patent/CN1685358A/zh active Pending
- 2003-07-30 WO PCT/FR2003/002406 patent/WO2004013802A2/fr not_active Application Discontinuation
- 2003-07-30 AU AU2003269080A patent/AU2003269080A1/en not_active Abandoned
Non-Patent Citations (7)
Title |
---|
BLOOMBERG D S ET AL: "Document image summarization without OCR" PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) LAUSANNE, SEPT. 16 - 19, 1996, NEW YORK, IEEE, US, vol. 1, 16 September 1996 (1996-09-16), pages 229-232, XP010202636 ISBN: 0-7803-3259-8 * |
DEFORGES O ET AL: "Segmentation d'images de documents par une approche multirésolution" TRAITEMENT DU SIGNAL, 1995, GRETSI, FRANCE, vol. 12, no. 6, pages 527-539, XP008011651 ISSN: 0765-0019 * |
DIMITROVA N ET AL: "MPEG-7 Videotext description scheme for superimposed text in images and video" SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 16, no. 1-2, September 2000 (2000-09), pages 137-155, XP004216273 ISSN: 0923-5965 * |
LIANG J ET AL: "Document layout structure extraction using bounding boxes of different entitles" APPLICATIONS OF COMPUTER VISION, 1996. WACV '96., PROCEEDINGS 3RD IEEE WORKSHOP ON SARASOTA, FL, USA 2-4 DEC. 1996, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 2 December 1996 (1996-12-02), pages 278-283, XP010206444 ISBN: 0-8186-7620-5 * |
MESSELODI S ET AL: "Automatic identification and skew estimation of text lines in real scene images" PATTERN RECOGNITION, PERGAMON PRESS INC. ELMSFORD, N.Y, US, vol. 32, no. 5, May 1999 (1999-05), pages 791-810, XP004222747 ISSN: 0031-3203 * |
WERNICKE A ET AL: "On the segmentation of text in videos" IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, XX, XX, vol. 3, 30 July 2000 (2000-07-30), pages 1511-1514, XP002178986 * |
YU ZHONG ET AL: "Automatic caption localization in compressed video" IMAGE PROCESSING, 1999. ICIP 99. PROCEEDINGS. 1999 INTERNATIONAL CONFERENCE ON KOBE, JAPAN 24-28 OCT. 1999, PISCATAWAY, NJ, USA,IEEE, US, 24 October 1999 (1999-10-24), pages 96-100, XP010368958 ISBN: 0-7803-5467-2 * |
Also Published As
Publication number | Publication date |
---|---|
FR2843220A1 (fr) | 2004-02-06 |
CN1685358A (zh) | 2005-10-19 |
WO2004013802A3 (fr) | 2004-04-08 |
EP1525553A2 (fr) | 2005-04-27 |
FR2843220B1 (fr) | 2005-02-18 |
AU2003269080A1 (en) | 2004-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1298588B1 (fr) | Procédé de traitement d'images pour l'extraction automatique d'éléments sémantiques | |
BE1017547A6 (fr) | Compression d'images numeriques de documents scannes. | |
EP3572976A1 (fr) | Procede de traitement d'un flux d'images video | |
EP3832535A1 (fr) | Procédé de détection d'au moins un élément d'intérêt visible dans une image d'entrée au moyen d'un réseau de neurones à convolution | |
CA3043090C (fr) | Procede de reconnaissance de caracteres | |
WO2009141378A1 (fr) | Procede et systeme d'indexation et de recherche de documents video | |
FR3081244A1 (fr) | Procede de reconnaissance de caracteres | |
Fazlali et al. | Single image rain/snow removal using distortion type information | |
WO2004013802A2 (fr) | Procede et systeme de localisation automatique de zones de texte dans une image | |
FR3095286A1 (fr) | Procédé de traitement d’image d’un document d’identité. | |
WO2019129985A1 (fr) | Procede de formation d'un reseau de neurones pour la reconnaissance d'une sequence de caracteres et procede de reconnaissance associe | |
EP1390905B1 (fr) | Procede de detection de zones de texte dans une image video | |
FR2860902A1 (fr) | Determination de caracteristiques textuelles de pixels | |
WO2008087316A2 (fr) | Procede et systeme de binarisation d'une image comprenant un texte | |
EP4091098A1 (fr) | Procédé de traitement d'une image candidate | |
Saha et al. | Npix2Cpix: A GAN-based Image-to-Image Translation Network with Retrieval-Classification Integration for Watermark Retrieval from Historical Document Images | |
EP1768049B1 (fr) | Procédé et système de reproduction de documents par segmentation et amélioration sélective des images et des textes | |
CN113888758B (zh) | 一种基于复杂场景中的弯曲文字识别方法和系统 | |
Shetty et al. | Automated Identity Document Recognition and Classification (AIDRAC)-A Review | |
Khan et al. | Target detection in cluttered FLIR imagery using probabilistic neural networks | |
Bouaziz et al. | Automatic text regions location in video frames. | |
FR3112228A1 (fr) | Dispositif et procédé pour générer un masque de la silhouette du profil d’une structure | |
CN115797630A (zh) | 遮挡车辆图像生成方法、装置及电子设备 | |
BE1017576A6 (fr) | Procede d'agrandissement rapide d'images en couleur. | |
FR2982057A1 (fr) | Procede de reconnaissance d'une image dans une scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
 | AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
 | WWE | Wipo information: entry into national phase | Ref document number: 2003750862; Country of ref document: EP |
 | WWE | Wipo information: entry into national phase | Ref document number: 20038235072; Country of ref document: CN |
 | WWP | Wipo information: published in national office | Ref document number: 2003750862; Country of ref document: EP |
 | NENP | Non-entry into the national phase | Ref country code: JP |
 | WWW | Wipo information: withdrawn in national office | Country of ref document: JP |
 | WWW | Wipo information: withdrawn in national office | Ref document number: 2003750862; Country of ref document: EP |