WO2009052577A1 - Locating a character region in an image - Google Patents

Locating a character region in an image

Info

Publication number
WO2009052577A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
segment
character region
character
frequency components
Prior art date
Application number
PCT/AU2008/001576
Other languages
French (fr)
Inventor
Subhash Challa
Rajib Chakravorty
Duc Dinh Minh Vo
Original Assignee
Sensen Networks Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2007905816A external-priority patent/AU2007905816A0/en
Application filed by Sensen Networks Pty Ltd filed Critical Sensen Networks Pty Ltd
Publication of WO2009052577A1 publication Critical patent/WO2009052577A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625License plates

Definitions

  • the present invention relates to techniques for identifying features of data sets and particularly, but not exclusively, to identifying the location of a character region within a larger image.
  • ALPR automatic license plate recognition
  • As illustrated in Figures 2 and 3, after an image of a vehicle has been acquired, three steps follow. Firstly, the region of the license plate in the image is determined and a data set obtained including the data of the license plate (Figure 2); secondly, the characters on the license plate are segmented for individual processing (Figure 3); and thirdly, optical character recognition (OCR) techniques are employed on each segmented character to determine each character.
  • OCR optical character recognition
  • a number of techniques exist to perform the first step including colour detection, signature analysis, edge detection, and so on. Any inclination from the horizontal line in the captured image is determined and the image rotated before it becomes ready for a character recognition module. The image may also be further processed to remove noise.
  • a known histogram method may be used, where each character is labelled in the license plate image, and then each label is extracted. Each character in the plate is extracted in a single image and normalized prior to the recognition step.
  • the segmented characters are first normalized and then fed into a neural network for optical character recognition, for example a back propagation feed forward Neural Network consisting of two layers.
  • the neural network outputs are normalized and used as estimates of the a posteriori probability of each character:
  • the quality of the acquired image must be of a level that allows a relatively clear photograph to be taken to increase the accuracy of the OCR techniques employed. This tends to be achievable on open roads during daylight hours or under well lit street lighting. However, there are many situations where such optimum conditions are not available, such as at night time on roads with no or poor street lighting, during wet weather, in car parks, under bridges or in poorly lit tunnels. In such conditions, such prior art techniques generally require the use of relatively expensive cameras which can operate in a variety of lighting conditions, and/or the use of additional vehicle sensors to trigger lighting or flashes at the time of taking the photograph to illuminate the subject of the image being acquired.
  • License plate recognition or automatic number plate recognition (ANPR) is thus the use of video captured images for automatic identification of a vehicle through its license plate.
  • the applications are numerous and include surveillance, theft prevention, parking lot attendance, identification of stolen vehicles, traffic laws enforcement, border crossing and toll roads. While other automatic vehicle identification methods are in use, such as transponders, bar-coded labels and radio-frequency tags, or proposed, such as electronic license plates, license plate reading remains, and is likely to remain, the way a car is identified.
  • LPR attempts to make the reading automatic by processing sets of images captured by cameras.
  • LPR systems comprise a series of steps that consist of detecting a vehicle, triggering the capture of images of that vehicle, and processing those images for recognition of the characters in the license plate.
  • Image analysis in LPR has three parts: (i) localization (extraction) of the license plate from the image, (ii) segmentation (extraction) of characters from the localized license plate region, and (iii) recognition of those characters. These steps are performed automatically by software and require intelligent algorithms to achieve high reliability.
  • Plate localization is an important step in LPR. It aims to locate the license plate of the vehicle in an image. Although the human eye can immediately visually locate a license plate in an image, it is not a trivial task for a computer program to do so in real time.
  • the present invention provides a method for identifying a character region in an image, the method comprising: considering a segment of the image, and calculating a column-wise summation of image intensity for each of a plurality of columns of the segment; determining frequency components of the column-wise summation; comparing at least a subset of the frequency components to predetermined expected values of such frequency components; and calculating from the comparison a score indicating the likelihood that the segment contains a character region.
  • the present invention provides a system for identifying a character region in an image, the system comprising: an image capture device for capturing an image; a data processing means arranged to consider a segment of the image, to calculate a column-wise summation of image intensity for each of a plurality of columns of the segment, to determine frequency components of the column-wise summation, to compare at least a subset of the frequency components to predetermined expected values of such frequency components, and to calculate from the comparison a score indicating the likelihood that the segment contains a character region.
  • a computer program product comprising computer program code means to make a computer execute a procedure for identifying a character region in an image
  • the computer program product comprising: computer program code means for considering a segment of the image, and calculating a column-wise summation of image intensity for each of a plurality of columns of the segment; computer program code means for determining frequency components of the column-wise summation; computer program code means for comparing at least a subset of the frequency components to predetermined expected values of such frequency components; and computer program code means for calculating from the comparison a score indicating the likelihood that the segment contains a character region.
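The claimed steps can be sketched compactly in code. The following is an illustrative Python sketch under assumed conventions, not the patented implementation: the function name, the trained statistics `mu`/`var`, and the default band limits (matching the 15–55 range discussed below for a 128-column window) are all assumptions for illustration.

```python
import numpy as np

def segment_score(segment, mu, var, lo=15, hi=55):
    """Score one image segment for the likelihood of containing characters.

    segment : 2D array of image intensities (rows x columns)
    mu, var : trained mean/variance of |FFT| at points lo..hi (illustrative)
    """
    # 1. Column-wise summation of intensity (one value per column).
    col_sum = segment.sum(axis=0)
    # 2. Frequency components of the summation (magnitudes only).
    spectrum = np.abs(np.fft.fft(col_sum))
    # 3. Compare a subset of components against expected values using a
    #    point-wise Gaussian likelihood.
    band = spectrum[lo:hi + 1]
    lik = np.exp(-((band - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    # 4. The score is the sum of the point-wise likelihoods.
    return float(lik.sum())
```

A flat (featureless) segment concentrates all spectral energy at the DC point, so its mid-band magnitudes are near zero; a segment with character-like periodic structure scores differently against the trained statistics.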
  • the present invention thus utilises the periodic nature of printed characters as a feature by which a character region in an image, such as a license plate region in an image, may be located. It is to be appreciated that the column-wise summation is thus based on columns which are substantially perpendicular to the direction of the character script, irrespective of whether the script runs horizontally or vertically in the image, for example.
  • a plurality of segments of the image are each considered in accordance with the invention, the plurality of segments being obtained by raster scanning of a segmentation window across the image and treating the segment windowed at each increment.
  • step increments of the segmentation window are less than the dimension of the window, and for example may be substantially half the respective dimension of the window.
  • the or each segment is selected to be of a size which is around the same size or slightly larger than an anticipated area of the character region of interest.
  • the segment preferably has a number of columns which is a power of two to provide for the use of Fast Fourier Transform to obtain the frequency components.
  • the subset of the frequency components may comprise the FFT points substantially in the range of points 15 to 55.
  • the segment has 128 columns and the character region comprises a license plate
  • Six license plate characters spread across a segment of column width 128 produce distinctive frequency components at FFT points 15 to 55, providing a feature permitting automated recognition of the license plate region.
  • features in the FFT profile will appear at spectral locations which depend on the segment/window width and also on the number, size and language of characters present in the character region of interest.
  • An appropriate spectral range in which distinctive features enable identification of the character region may be determined for each such combination of segment size and character layout, and the present invention includes such alternative spectral ranges in such applications.
  • training is carried out in order to identify a mean and variance of the magnitude of the spectral points of interest to serve as the predetermined expected values, by taking such measurements for a sufficient number of sample segments in which a character region of interest is present.
  • the predetermined expected values may comprise a mean and variance of the magnitude of the spectral points of interest obtained by taking such measurements for a sufficient number of sample segments in which a character region of interest is not present.
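The training statistics described above (a mean and variance of the spectral magnitudes over sample segments, computed separately for plate and non-plate samples) might be gathered as follows; the function name and interface are illustrative assumptions:

```python
import numpy as np

def train_band_stats(segments, lo=15, hi=55):
    """Estimate per-point mean and variance of |FFT| over sample segments.

    segments : iterable of 2D intensity arrays with a common column width,
               all known to contain (or all known to lack) a character region.
    """
    mags = []
    for seg in segments:
        col_sum = seg.sum(axis=0)           # column-wise intensity summation
        mags.append(np.abs(np.fft.fft(col_sum))[lo:hi + 1])
    mags = np.asarray(mags)
    # One mean and one variance per spectral point of interest.
    return mags.mean(axis=0), mags.var(axis=0)
```

Running this once over plate samples and once over non-plate samples yields the two sets of predetermined expected values used in the scoring phase.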
  • the score of each segment comprises a sum of the point-wise Gaussian likelihoods.
  • the plurality of data sets may be acquired from one sensor at different times, or acquired from a plurality of sensors at approximately the same or different times. Additionally, data fusion in accordance with the teachings of International Patent Application No. PCT/AU2007/001274, the content of which is incorporated herein by reference, may be applied in some embodiments of the present invention.
  • the data sets may be image data sets acquired from one or more cameras.
  • the one or more features may comprise characters, alphanumeric characters, or phonograms or logograms of other character sets such as Chinese characters.
  • the plurality of data sets may comprise a different respective representation of a vehicle license plate, the plate displaying one or more of the alphanumeric characters.
  • a computer program configured to cause a computer to perform the steps of the method of any of the above described aspects.
  • the method of any of the aspects described above may be implemented on a computer.
  • Sensors used to acquire the data sets are preferably cameras, and further preferably cameras which acquire images using the visible spectrum.
  • Such cameras may include cameras which capture black and white and/or colour still or moving images.
  • the sensors may comprise infrared sensors or thermal image sensors.
  • other sensors such as motion sensors and distance sensors may be employed.
  • aspects of the invention comprise systems and apparatus for carrying out the above described method aspects.
  • the systems may comprise cameras or other sensors for acquiring sensor acquired data sets and apparatus for performing the above described method steps.
  • the apparatus may comprise programmable computers.
  • the term "license plate" should not be interpreted as requiring a physical plate, but includes physical plates and panels such as sticky paper or plastic panels, or a part of a surface of a vehicle upon which characters may be printed, embossed, impressed, lithographed or the like.
  • Figure 1 illustrates a general-purpose computing device that may be used in an exemplary system for implementing the invention
  • Figure 2 is an acquired image of a license plate
  • Figure 3 illustrates the characters of the license plate of Figure 2 where the characters have been segmented
  • Figure 4 illustrates conversion of a colour (RGB) license plate image to a greyscale image, and then to a binary black and white version
  • Figure 5 is a plot of a column-wise summation of a license plate image
  • Figure 6 illustrates a w-point FFT output of the summation of Figure 5
  • Figure 7 illustrates operation of the Sobel operator
  • Fig. 8 is a normalized Y projection function of plate (c) illustrated in Figure 12c;
  • Fig. 9 is a normalized X projection function of plate (c) illustrated in Figure 12c;
  • Figure 10 illustrates three examples of FFT Located Plate Images
  • Figure 11 illustrates edge images obtained by using a Sobel operator upon the images of Figure 10;
  • Figure 12 illustrates the edge images of Fig 11 after long line removal
  • Figure 13 illustrates edge images of Fig 12 after out-of-plate area removal
  • Figure 14 illustrates the images of Fig 13 cropped to the plate boundaries
  • Figure 15 illustrates the plate character segmentation algorithm
  • Fig. 16 (a), (b) and (c) show the normalized Y projection function and corresponding black and white plate images
  • Figure 17 shows the normalized X projection function and its corresponding black and white plate image
  • Figs 18 (a) and (b) illustrate black and white image before and after long line removal
  • Figs. 19a and 19b illustrate black and white image before and after non- character components removal
  • Figs. 20a, 20b, 20c and 20d illustrate a black and white plate image; the image after top and bottom cut off; the image after median-character-width-based long line removal; and the image after non-character component removal, respectively;
  • Fig 21 illustrates horizontal, vertical and diagonal median filtering masks
  • Fig 22 illustrates a finally cropped plate image, before and after Otsu binary thresholding.
  • the present invention also relates to apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • a machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
  • a machine-readable medium includes read only memory ("ROM"); random access memory ("RAM"); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
  • the invention is illustrated as being implemented in a suitable computing environment.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.
  • the invention may be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • With reference to FIG. 1, a general purpose computing device is shown in the form of a conventional personal computer 20, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21.
  • the system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • the system memory includes read only memory (ROM) 24 and random access memory (RAM) 25.
  • the personal computer 20 further includes a hard disk drive 27 for reading from and writing to a hard disk 60, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD ROM or other optical media.
  • the hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical disk drive interface 34, respectively.
  • the drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20.
  • Although the exemplary environment described herein employs a hard disk 60, a removable magnetic disk 29, and a removable optical disk 31, it will be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories, read only memories, storage area networks, and the like may also be used in the exemplary operating environment.
  • a number of program modules may be stored on the hard disk 60, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more applications programs 36, other program modules 37, and program data 38.
  • a user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and a pointing device 42.
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
  • These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB) or a network interface card.
  • a monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48.
  • personal computers typically include other peripheral output devices, not shown, such as speakers and printers.
  • the personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49.
  • the remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 1.
  • the logical connections depicted in FIG. 1 include a local area network (LAN) 51 and a wide area network (WAN) 52.
  • the personal computer 20 When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the WAN 52.
  • the modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46.
  • program modules depicted relative to the personal computer 20, or portions thereof may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • the following describes a preferred embodiment of the invention which involves identifying one or more features in the form of alphanumeric characters of a vehicle license plate represented in a plurality of sensor acquired data sets in the form of digital image files.
  • the plurality of digital image files are taken of the same subject, in this embodiment being a vehicle license plate, by a single camera.
  • the license plate is extracted, the characters segmented, and each character of each image determined.
  • the algorithm uses a frequency-based technique and defines a score for a given segment of the image indicating the likelihood of it being an LP region. Based on the obtained score, several possible LP regions are extracted and passed on to a module to determine tracks formed by those regions. Notably, multiple license plates present in a single image may be handled by this technique.
  • a set of M pre-segmented license plate images is used.
  • the image segment be denoted as Im(h;w) where h and w refer to the height and width of the segment (in pixels) respectively.
  • the algorithm carries out the following steps.
  • the plate is converted into a binary (black and white) image (1-0 image) using a hysteresis threshold method.
  • Figure 4 illustrates conversion of a colour (RGB) license plate image to a greyscale image, and then to a binary black and white version.
  • Figure 4 illustrates an example of such a binary thresholded image.
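As a simplified illustration of the conversion step, the sketch below binarizes an RGB image using a single global threshold. The document itself specifies a hysteresis threshold method, which is not reproduced here; the single-threshold version and the default mean-intensity threshold are stand-in assumptions.

```python
import numpy as np

def to_binary(rgb, threshold=None):
    """Convert an RGB image to greyscale, then to a 1-0 (black and white) image.

    The embodiment uses a hysteresis threshold; a single global threshold
    (the mean intensity by default) stands in here as a simplification.
    """
    # Standard luminance weighting for the RGB-to-greyscale conversion.
    grey = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])
    if threshold is None:
        threshold = grey.mean()
    return (grey > threshold).astype(np.uint8)
```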
  • the summation vector is taken as a signal, and a Fast Fourier transform (FFT) is performed.
  • the length of the vector "w" dictates the FFT to be w-point, making it desirable for w to be a power of 2, e.g. 128 or 256.
  • Each point of the FFT results in a complex number and for simplicity we denote the absolute value of the complex number to be X(n), as the phase outputs of the FFT are not used in this embodiment.
  • Figure 6a illustrates a w-point FFT output of such a summation.
  • Figure 6b illustrates this particular portion of the FFT. For embodiments using a different window width w and/or which relate to license plates of differing format, simple tests can determine the location of the FFT window of interest and appropriate values of L and H.
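To illustrate why the mid-band FFT points carry the character signature, the following synthetic example builds a 128-column summation containing six evenly spaced character-like pulses (all widths and intensities are illustrative values, not taken from the patent) and inspects the magnitude spectrum:

```python
import numpy as np

w = 128                       # window width: a power of two, so the FFT is w-point
cols = np.zeros(w)
# Synthetic column sums: six "characters", each 10 columns wide and spaced
# 18 columns apart, imitating the periodic dark/light pattern of plate text.
for k in range(6):
    cols[10 + k * 18: 20 + k * 18] = 30.0
X = np.abs(np.fft.fft(cols))  # magnitudes only; phase is not used
band = X[15:56]               # FFT points of interest for this layout
```

The pulse spacing of 18 columns places the fundamental near bin 128/18 ≈ 7, with harmonics falling inside the 15–55 band, so `band` carries significant energy for a character-bearing segment and little for a flat one.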
  • Training then continues by repeating this process in respect of images which do not contain a license plate. That is, the above steps are followed for T non-plate image regions and similar scores within the same frequency band L, L+1, …, H are calculated:
  • the means and variances of both plate regions and non-plate regions are pre-computed and stored, concluding the training phase.
  • the information stored in the training phase is used to assign a score value to every segment of the image, the score value referring to that segment's likelihood of being a LP region. This is carried out in the following manner. Once again a segment is denoted as Im(h;w).
  • the source image will be larger than the image segment size. Accordingly, the present embodiment provides for the source image to be scanned by separately considering a plurality of segments of that image.
  • segments are selected by rastering a segmentation window over the source image, with horizontal segmentation increments of w/2, and vertical segmentation increments of h/2. Selecting the windowing segmentation increments to be less than w and h, respectively, recognises that a license plate may be randomly positioned within the source image and that sufficiently fine segmentation increments are required to locate the license plate region sufficiently accurately.
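The rastering described above, with increments of half the window dimensions, can be sketched as follows; the function name is illustrative:

```python
def window_positions(img_h, img_w, h, w):
    """Top-left corners of an h x w segmentation window rastered over the
    image with vertical step h//2 and horizontal step w//2."""
    positions = []
    for top in range(0, img_h - h + 1, h // 2):
        for left in range(0, img_w - w + 1, w // 2):
            positions.append((top, left))
    return positions
```

Half-window increments guarantee that a randomly positioned plate is never more than a quarter window away from some segment's centre in each direction.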
  • For each segment so obtained, the first to fourth steps set out above in relation to the training sequence are carried out. That is, each segment is binary thresholded, then columns are summed, the column summation is FFTed, and the absolute values of the FFT output in the range of interest are stored.
  • Let the absolute values of the frequency components for each such real time image segment be P(i), where i = L, L+1, …, H.
  • a Gaussian likelihood score is assigned to each of the frequency components, resulting in a likelihood score for each of those frequency points.
  • a first score for each point, score_p(i), relates to the similarity between that point and the trained value for license plates:
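The formula itself is not legible in this extract; assuming the standard Gaussian likelihood implied by the trained mean and variance, it might be computed as follows (a sketch under that assumption):

```python
import numpy as np

def gaussian_scores(P, mu, var):
    """Point-wise Gaussian likelihood of observed magnitudes P(i) given
    trained per-point mean mu(i) and variance var(i); used with both the
    plate and non-plate statistics."""
    return np.exp(-((P - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
```

Evaluating this once with the plate statistics and once with the non-plate statistics yields score_p(i) and score_¬p(i) for each spectral point.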
  • a set of possible regions are identified as being license plate regions. These regions are selected based on the following criteria:
  • is a pre-set threshold. Regions identified as being license plate regions can then be passed to other modules for processing, such as a tracker module of the type set out in an Australian Provisional Patent application filed simultaneously herewith and entitled “verification of identification of an image characteristic over multiple images", the content of which is incorporated herein by reference.
  • a joint likelihood and likelihood ratio is calculated to give the score for each region or image segment:
  • the score calculation can be simplified by considering the logarithm of the score defined in (1).
  • the procedure is illustrated below:

    Log Likelihood = log(region_score_p) − log(region_score_¬p) = Σ_i log score_p(i) − Σ_i log score_¬p(i)    (6)
  • the first term (K_p − K_¬p) in (6) can be precomputed in the training phase. Only the second term needs to be calculated for each region, and it is an addition operation rather than a more complicated multiplication operation. The required condition that needs to be satisfied then becomes
  • is the same threshold parameter defined previously herein.
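Assuming the point scores are the independent Gaussian likelihoods described in the training discussion, the simplification referred to here can be written out as follows (a reconstruction, since the original equation is not legible in this extract; symbols follow the surrounding text):

```latex
\log\frac{\mathrm{region\_score}_{p}}{\mathrm{region\_score}_{\neg p}}
  = (K_p - K_{\neg p})
  - \sum_{i=L}^{H}\left[
      \frac{\bigl(P(i)-\mu_p(i)\bigr)^2}{2\sigma_p^2(i)}
      - \frac{\bigl(P(i)-\mu_{\neg p}(i)\bigr)^2}{2\sigma_{\neg p}^2(i)}
    \right],
\qquad
K_p = -\sum_{i=L}^{H}\log\!\bigl(\sigma_p(i)\sqrt{2\pi}\bigr),
```

with K_¬p defined analogously from the non-plate statistics. Only the quadratic sum depends on the observed segment, which is why the first term can be precomputed during training.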
  • the detection may in some embodiments be improved if used together with the distribution score set out in an Australian Provisional Application by the same applicant filed simultaneously herewith and entitled "Locating a character region in an image (II)", the content of which is incorporated herein by reference.
  • license plate regions may be verified and/or discarded, cropped to a plate region, thresholded, cleaned, character segmented, character cropped and/or undergo optical character recognition.
  • the following provides details of particularly suitable plate cropping techniques and character segmentation techniques.
  • the image segment will normally contain both license plate area and some background area as it is unlikely that the image segment will be of identical size as the license plate and be precisely aligned with the license plate. Examples are shown in Fig. 10.
  • the license plate region has been identified in images such as those in Figure 10, it remains necessary to more accurately identify the bounds of the license plate itself.
  • the present embodiment for plate cropping is based on the fact that the density of edges in the license plate region is normally much higher than in the non-plate regions.
  • An edge image is first obtained by applying a Sobel operator; examples are shown in Fig. 11. Then, any flat lines that are longer than a given threshold (normally set slightly larger than the maximum character width, which can be predetermined for a given installation of an imaging device) are removed. This is helpful in removing non-character line noise, with examples shown in Fig. 12. Finally, edge removal is carried out based on the horizontal and vertical projections; examples are shown in Figs. 13 and 14.
  • the Sobel detector is a 2D spatial gradient operator that emphasizes high spatial frequency components when applied to a grey scale image. This operator is used to detect the edges, and its operation is illustrated in Fig. 7.
  • the masks of Figure 7 are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one mask for each of the two perpendicular orientations.
  • the masks can be applied separately to the input image, to produce separate measurements of the gradient component in each orientation (Gx and Gy). These can then be combined together to find the absolute magnitude of the gradient at each point and the orientation of that gradient.
  • the gradient magnitude is given by:
  • |G| = √(Gx² + Gy²), although typically an approximate magnitude is computed using |G| ≈ |Gx| + |Gy|, which is much faster to compute.
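A direct (unoptimized) sketch of this computation, applying both 3×3 Sobel masks and combining them with the fast absolute-value approximation:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # responds to vertical edges
SOBEL_Y = SOBEL_X.T                            # responds to horizontal edges

def sobel_magnitude(img):
    """Approximate gradient magnitude |G| ~= |Gx| + |Gy| of a greyscale image.

    Border pixels are left at zero for simplicity.
    """
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            patch = img[r - 1:r + 2, c - 1:c + 2]
            gx[r, c] = (patch * SOBEL_X).sum()
            gy[r, c] = (patch * SOBEL_Y).sum()
    return np.abs(gx) + np.abs(gy)
```

A production implementation would use a vectorized convolution instead of the explicit loops; the loops are kept here to make the mask application visible.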
  • From Figure 8 it can be seen that the projection first exceeds this threshold at row 2, so that rows 0 and 1 can be excluded as being non-plate region above the license plate region. Similarly, rows 31 to 35 can be excluded as being non-plate region below the license plate region.
  • the preprocessed edge image of Figure 12c is projected to the X (horizontal) direction, and normalized with its maximum value, as shown in Fig. 9. From Figure 9 it can be seen that the leftmost columns, from column 0 to about column 57, can be excluded as being non-plate region to the left of the license plate region by reference to the threshold value of 0.4. Similarly, the rightmost columns from about column 170 and all columns farther right, can be excluded as being non-plate region to the right of the license plate region by reference to the threshold value of 0.4.
  • Figure 10 illustrates three examples of FFT Located Plate Images.
  • Figure 11 illustrates edge images obtained by using a Sobel operator upon the images of Figure 10.
  • Figure 12 illustrates edge images after long line removal.
  • Figure 13 illustrates edge images after out-of-plate area removal.
  • Figure 14 illustrates the images of Fig 13 cropped to the plate boundaries identified by this technique.
  • Character segmentation plays a very important role in a plate recognition system. Since there are many kinds of plates in different states or countries, and each image of a plate can be obtained under totally different illumination conditions, processing and segmenting these plate images becomes extremely varied and difficult. The following sets out a robust character segmentation algorithm, which includes edge removal, long line removal, character-grab-based top and bottom position estimation, etc.
  • the structure of the plate character segmentation algorithm is shown in Fig. 15.
  • the input image is from the edge detection based plate crop function explained above with reference to Figures 7 to 14.
  • the algorithm of Figure 15 is thus provided with a portion of an image which has been cropped to conform to the edges of an identified license plate region.
  • the plate frame edges are removed based on the binary image projected in the X and Y directions.
  • a pre-long-line removal operation is applied to the frame-edge-removed image to separate possible characters connected to the boundary or background.
  • the "First Character Grab Cut and Non-Character Components Removal" operation is then applied. The outputs of this operation are the median top and bottom cut off positions, which can remove some incorrectly connected components (such as bolts used to fix the plate to the car), together with the median character height, median character width and median character size, which will be used in the second and final cuts as reference sizes.
  • the "Second Character Grab Cut” operation will output "left and right” cut off positions.
  • the final operation is "Final Character Grab cut and Character Recognition", which will output recognized plate string.
  • edge removal algorithm: there are some plates that have frames around the plate characters. In order to segment these plate characters properly after edge-detection-based plate cropping, it is necessary to remove the plate edges.
  • the basic idea of removing these frame edges is to project the black and white binary image into X and Y directions, then cut off edges from top, bottom, left and right based on these project functions.
  • the preprocessed black and white image is first projected to Y (vertical) direction, and then normalized with its maximum value.
  • the top edge is removed from the first row down to the row at which the projection function value first falls below the threshold, which is currently set to 0.75.
  • the removal process similarly cuts from the bottom row, left column and right column to the position (row or column) at which the projection function value falls just below the threshold.
  • An example of normalized Y projection of plate and its corresponding images before and after top and bottom cut off are shown in Fig. 16 (a), (b) and (c), which shows the normalized Y projection function and corresponding black and white plate images. Since the bottom does not meet the cut off condition, no bottom rows are cut off in this case.
  • Figure 17 shows the normalized X projection function and its corresponding black and white plate image for this purpose.
  • the algorithm of Figure 15 moves to the First Character Grab Cut.
  • in the first character grab cut, any components that are too small or too big are removed first; the "too big" and "too small" thresholds are set as hard thresholds in this cut.
  • any components whose width-to-height ratio is too large (too wide to be a character) or too small (too thin to be a character) are also removed; these ratio thresholds are likewise set as hard thresholds in this cut.
  • An example of black and white image before and after non-character components removal is shown in Figs. 19a and 19b, respectively.
  • the outputs of the First Character Grab Cut are the top and bottom plate character cut positions, median character height, median character width and median character size of plate characters.
  • the top character cut position is calculated as the median value of top character positions of all possible character candidates, and the bottom character cut position is calculated as the median value of bottom character positions of all possible character candidates.
  • the output median character height, median character width and median character size are the median values of the heights, widths and sizes of all possible character candidates and these output values will be used as reference for the final character grab cut.
  • the output of the "Second Character Grab Cut” is the left and right cut off positions.
  • the top and bottom part of preprocessed black and white plate image will be cut off based on the output top and bottom character cut positions of the "First Character Grab Cut".
  • An example of black and white image plate before and after top and bottom cut off is shown in Fig. 20 (a) and 20(b).
  • Preferred embodiments of the invention realise several advantages.
  • the image quality does not need to be of as high a standard as prior art techniques require. Therefore, the additional lighting and ideal camera placement that may be required to increase the accuracy of prior art methods are not necessary in the preferred embodiment.
  • images may be acquired from standard closed circuit television (CCTV) cameras.
  • the preferred embodiment is therefore more cost effective and simpler to install and/or set up compared with prior art methods and equipment.
  • the preferred embodiments operate on license plates, which are typically understood to be registration plates or number plates used to identify a vehicle (eg automobile, motorbike, trailer, truck, etc) used on roadways, but may also be adapted for use in determining alphanumeric characters in different situations, such as estimating characters from images of boat registration numbers, which are typically affixed to an above-water hull side of a boat.
  • This alternative embodiment may be useful for determining the registration details of boats moored in a marina, for example. Images for use in this embodiment can be obtained from CCTV or other cameras.
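The row- and column-projection screening described in the steps above can be sketched briefly. The following is a minimal illustration only: the function name `crop_by_projection`, the use of NumPy, and the single 0.4 threshold applied to both axes are assumptions for the sketch (the frame edge removal step described above instead cuts rows whose projection exceeds a 0.75 threshold).

```python
import numpy as np

def crop_by_projection(bw, thresh=0.4):
    """Trim rows/columns whose normalized projection falls below
    `thresh`, keeping the high-density region likely to be the plate.
    Sketch only; not the patented implementation."""
    y_proj = bw.sum(axis=1).astype(float)   # one value per row
    y_proj /= y_proj.max()
    x_proj = bw.sum(axis=0).astype(float)   # one value per column
    x_proj /= x_proj.max()

    def trim(proj):
        lo, hi = 0, len(proj)
        while lo < hi and proj[lo] < thresh:      # exclude from the top/left
            lo += 1
        while hi > lo and proj[hi - 1] < thresh:  # exclude from the bottom/right
            hi -= 1
        return lo, hi

    top, bottom = trim(y_proj)
    left, right = trim(x_proj)
    return bw[top:bottom, left:right]
```

Rows and columns are trimmed from each side while their normalized projection stays below the threshold, mirroring the exclusion of non-plate regions above, below, left and right of the plate region.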

Abstract

Identifying a character region in an image comprises considering a segment of the image, and calculating a column-wise summation of image intensity for each of a plurality of columns of the segment. Frequency components of the column-wise summation are then determined. At least a subset of the frequency components are compared to predetermined expected values of such frequency components. From the comparison a score is calculated indicating the likelihood that the segment contains a character region.

Description

"Locating a character region in an image"
Cross-Reference to Related Applications
The present application claims priority from Australian Provisional Patent Application No 2007905816 filed on 24 October 2007, the content of which is incorporated herein by reference.
Technical Field
The present invention relates to techniques for identifying features of data sets and particularly, but not exclusively, to identifying the location of a character region within a larger image.
Background of the Invention
Various methods of automatic detection and recognition of predetermined features from sensor data sets have been employed to date. For example, automatic license plate recognition (ALPR) from digital photography is presently used in several applications, including speed monitoring and infringement and toll management. In prior art methods, ALPR is usually accomplished using three processing steps, illustrated in Figures 2 and 3, after an image of a vehicle has been acquired. Firstly, the region of the license plate in the image is determined and a data set obtained including the data of the license plate (Figure 2); secondly, the characters on the license plate are segmented for individual processing (Figure 3); and thirdly, optical character recognition (OCR) techniques are employed on each segmented character to determine each character.
A number of techniques exist to perform the first step, including colour detection, signature analysis, edge detection, and so on. Any inclination from the horizontal line in the captured image is determined and the image rotated before it becomes ready for a character recognition module. The image may also be further processed to remove noise.
For character segmentation, a known histogram method may be used, where each character is labelled in the license plate image, and then each label is extracted. Each character in the plate is extracted in a single image and normalized prior to the recognition step.
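The labelling-and-extraction step described above is essentially connected-component labelling. As a sketch under stated assumptions (the function name and the choice of 4-connectivity are illustrative, not taken from the specification):

```python
import numpy as np
from collections import deque

def label_components(bw):
    """Label 4-connected foreground components in a binary image;
    each label would correspond to one character candidate."""
    labels = np.zeros_like(bw, dtype=int)
    next_label = 0
    h, w = bw.shape
    for sy in range(h):
        for sx in range(w):
            if bw[sy, sx] and not labels[sy, sx]:
                next_label += 1
                labels[sy, sx] = next_label
                queue = deque([(sy, sx)])
                while queue:  # breadth-first flood fill of this component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and bw[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label
```

Each labelled region can then be cropped out into its own image and normalized prior to the recognition step.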
In one example of OCR, the segmented characters are first normalized and then fed into a neural network for optical character recognition, for example a back propagation feed forward Neural Network consisting of two layers. The neural network outputs are normalized and used as estimates of the a posteriori probability of each character:
Figure imgf000003_0001
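The normalization equation itself is reproduced only as an image in the source; one common form, shown here purely as an illustrative assumption, divides each raw output by the sum of all outputs so the results sum to one and can be read as posterior probability estimates:

```python
def normalize_outputs(raw):
    """Scale raw neural network outputs so they sum to one (illustrative
    normalization only; the patent's exact formula is not reproduced here)."""
    total = sum(raw)
    return [y / total for y in raw]
```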
For this ALPR technique to work well, the quality of the acquired image must be of a level that allows a relatively clear photograph to be taken to increase the accuracy of the OCR techniques employed. This tends to be achievable on open roads during daylight hours or under well lit street lighting. However, there are many situations where such optimum conditions are not available, such as at night time on roads with no or poor street lighting, during wet weather, in car parks, under bridges or in poorly lit tunnels. In such conditions, such prior art techniques generally require the use of relatively expensive cameras which can operate in a variety of lighting conditions, and/or the use of additional vehicle sensors to trigger lighting or flashes at the time of taking the photograph to illuminate the subject of the image being acquired.
Moreover, a considerable number of factors involved in the ALPR image analysis renders the problem hard to solve with adequate accuracy and robustness. Most LPR systems suffer reliability issues.
License plate recognition (LPR), or automatic number plate recognition (ANPR), is thus the use of video captured images for automatic identification of a vehicle through its license plate. The applications are numerous and include surveillance, theft prevention, parking lot attendance, identification of stolen vehicles, traffic laws enforcement, border crossing and toll roads. While other automatic vehicle identification methods are in use, such as transponders, bar-coded labels and radio-frequency tags, or proposed, such as electronic license plates, license plate reading remains, and is likely to remain, the way a car is identified. LPR attempts to make the reading automatic by processing sets of images captured by cameras. LPR systems comprise a series of steps that consist of detecting a vehicle, triggering the capture of images of that vehicle and treating those images for recognition of the characters in the license plate. While the capture of the images, their transfer in digital form to a processor and the coordination of all tasks in a LPR system are not trivial, these tasks often reduce to engineering skills and can be managed extremely well in most scenarios. The processing of images for recognition is where research begins. Image analysis in LPR has three parts: (i) localization (extraction) of the license plate from the image, (ii) segmentation (extraction) of characters from the localized license plate region, and (iii) recognition of those characters. These steps are performed automatically by software and require intelligent algorithms to achieve a high reliability.
Plate localization is an important step in LPR. It aims to locate the license plate of the vehicle in an image. Although the human eye can immediately visually locate a license plate in an image, it is not a trivial task for a computer program to do so in real time.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
Summary of the Invention
According to a first aspect the present invention provides a method for identifying a character region in an image, the method comprising: considering a segment of the image, and calculating a column-wise summation of image intensity for each of a plurality of columns of the segment; determining frequency components of the column-wise summation; comparing at least a subset of the frequency components to predetermined expected values of such frequency components; and calculating from the comparison a score indicating the likelihood that the segment contains a character region.
According to a second aspect the present invention provides a system for identifying a character region in an image, the system comprising: an image capture device for capturing an image; a data processing means arranged to consider a segment of the image, to calculate a column-wise summation of image intensity for each of a plurality of columns of the segment, to determine frequency components of the column-wise summation, to compare at least a subset of the frequency components to predetermined expected values of such frequency components, and to calculate from the comparison a score indicating the likelihood that the segment contains a character region.
According to a third aspect the present invention provides a computer program product comprising computer program code means to make a computer execute a procedure for identifying a character region in an image, the computer program product comprising: computer program code means for considering a segment of the image, and calculating a column-wise summation of image intensity for each of a plurality of columns of the segment; computer program code means for determining frequency components of the column-wise summation; computer program code means for comparing at least a subset of the frequency components to predetermined expected values of such frequency components; and computer program code means for calculating from the comparison a score indicating the likelihood that the segment contains a character region.
The present invention thus utilises the periodic nature of printed characters as a feature by which a character region in an image, such as a license plate region in an image, may be located. It is to be appreciated that the column-wise summation is thus based on columns which are substantially perpendicular to a direction of the character script, irrespective of whether the character script appears horizontally or vertically in the image, for example.
Preferably, a plurality of segments of the image are each considered in accordance with the invention, the plurality of segments being obtained by raster scanning of a segmentation window across the image and treating the segment windowed at each increment. Preferably, step increments of the segmentation window are less than the dimension of the window, and for example may be substantially half the respective dimension of the window.
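The raster scanning of a segmentation window with half-window step increments might be generated as follows; this is a sketch only, and the generator name is an assumption:

```python
def window_positions(img_h, img_w, win_h, win_w):
    """Yield top-left corners of a segmentation window raster-scanned
    across the image, stepping by half the window dimension so that
    successive windows overlap by roughly 50%."""
    step_y = max(1, win_h // 2)
    step_x = max(1, win_w // 2)
    for top in range(0, img_h - win_h + 1, step_y):
        for left in range(0, img_w - win_w + 1, step_x):
            yield top, left
```

Each windowed segment would then be scored as described, with the overlap reducing the chance that a plate straddles two windows without dominating either.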
Preferably, the or each segment is selected to be of a size which is around the same size or slightly larger than an anticipated area of the character region of interest. The segment preferably has a number of columns which is a power of two to provide for the use of Fast Fourier Transform to obtain the frequency components.
Where the segment has 128 columns and the character region comprises a license plate, the subset of the frequency components may comprise the FFT points substantially in the range of points 15 to 55. Such embodiments recognise that 6 license plate characters spread across a segment of column width 128 produces distinctive frequency components at FFT points 15 to 55, providing a feature permitting automated recognition of the license plate region. It is to be understood that such features in the FFT profile will appear at spectral locations which depend on the segment/window width and also depending on the number, size and language of characters present in the character region of interest. An appropriate spectral range in which distinctive features enable identification of the character region may be determined for each such combination of segment size and character layout, and the present invention includes such alternative spectral ranges in such applications.
In preferred embodiments, training is carried out in order to identify a mean and variance of the magnitude of the spectral points of interest to serve as the predetermined expected values, by taking such measurements for a sufficient number of sample segments in which a character region of interest is present. Additionally or alternatively, the predetermined expected values may comprise a mean and variance of the magnitude of the spectral points of interest obtained by taking such measurements for a sufficient number of sample segments in which a character region of interest is not present. Preferably, the score of each segment comprises a sum of the point-wise Gaussian likelihoods.
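The scoring described above, a sum of point-wise Gaussian likelihoods of the spectral magnitudes against the trained mean and variance at each point, can be sketched as follows (a minimal illustration assuming NumPy arrays covering only the spectral points of interest; the function name is hypothetical):

```python
import numpy as np

def segment_score(fft_mags, means, variances):
    """Sum of point-wise Gaussian likelihoods of the FFT magnitudes
    against the trained per-point mean and variance."""
    fft_mags = np.asarray(fft_mags, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    likelihoods = (np.exp(-0.5 * (fft_mags - means) ** 2 / variances)
                   / np.sqrt(2.0 * np.pi * variances))
    return likelihoods.sum()
```

Segments whose spectral magnitudes lie close to the trained means score highly, indicating a likely character region.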
The plurality of data sets may be acquired from one sensor at different times, or acquired from a plurality of sensors at approximately the same or different times. Additionally, data fusion in accordance with the teachings of International Patent Application No. PCT/AU2007/001274, the content of which is incorporated herein by reference, may be applied in some embodiments of the present invention.
The data sets may be image data sets acquired from one or more cameras.
The one or more features may comprise characters, alphanumeric characters, or phonograms or logograms of other character sets such as Chinese characters. The plurality of data sets may comprise a different respective representation of a vehicle license plate, the plate displaying one or more of the alphanumeric characters.
According to another aspect of the invention there is provided a computer program configured to cause a computer to perform the steps of the method of any of the above described aspects. The method of any of the aspects described above may be implemented on a computer.
Sensors used to acquire the data sets are preferably cameras, and further preferably cameras which acquire images using the visible spectrum. Such cameras may include cameras which capture black and white and/or colour still or moving images. Alternatively the sensors may comprise infrared sensors or thermal image sensors. In other alternative embodiments, other sensors such as motion sensors and distance sensors may be employed.
Other aspects of the invention comprise systems and apparatus for carrying out the above described method aspects. The systems may comprise cameras or other sensors for acquiring sensor acquired data sets and apparatus for performing the above described method steps. The apparatus may comprise programmable computers.
As will be understood, the term license plate should not be interpreted as requiring a physical plate, but includes physical plates and panels such as sticky paper or plastic panels or a part of a surface of a vehicle upon which characters may be printed, embossed, impressed, lithographed or the like.
Brief Description of the Drawings
An example of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 illustrates a general-purpose computing device that may be used in an exemplary system for implementing the invention; Figure 2 is an acquired image of a license plate;
Figure 3 illustrates the characters of the license plate of Figure 2 where the characters have been segmented;
Figure 4 illustrates conversion of a colour (RGB) license plate image to a greyscale image, and then to a binary black and white version; Figure 5 is a plot of a column-wise summation of a license plate image;
Figure 6 illustrates a w-point FFT output of the summation of Figure 5; Figure 7 illustrates operation of the Sobel operator;
Fig. 8 is a normalized Y projection function of plate (c) illustrated in Figure 12c;
Fig. 9 is a normalized X projection function of plate (c) illustrated in Figure 12c;
Figure 10 illustrates three examples of FFT Located Plate Images; Figure 11 illustrates edge images obtained by using a Sobel operator upon the images of Figure 10;
Figure 12 illustrates the edge images of Fig 11 after long line removal;
Figure 13 illustrates the edge images of Fig 12 after out-of-plate area removal;
Figure 14 illustrates the images of Fig 13 cropped to the plate boundaries; Figure 15 illustrates the plate character segmentation algorithm;
Fig. 16 (a), (b) and (c) show the normalized Y projection function and corresponding black and white plate images;
Figure 17 shows the normalized X projection function and its corresponding black and white plate image; Figs 18 (a) and (b) illustrate black and white image before and after long line removal;
Figs. 19a and 19b illustrate black and white image before and after non- character components removal;
Figs. 20a, 20b, 20c and 20d illustrate a black and white image plate, the black and white image plate after top and bottom cut off; the black and white image plate after median character width based long line removal; and the black and white image plate after non-character components removal, respectively;
Fig 21 illustrates horizontal, vertical and diagonal median filtering masks; and Fig 22 illustrates a finally cropped plate image, before and after Otsu binary thresholding.
Description of the Preferred Embodiments
Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory ("ROM"); random access memory ("RAM"); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
Turning to the drawings, wherein like reference numerals refer to like elements, the invention is illustrated as being implemented in a suitable computing environment. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. The following description begins with a description of a general-purpose computing device that may be used in an exemplary system for implementing the invention, and the invention will be described in greater detail with reference to subsequent figures. Turning now to FIG. 1, a general purpose computing device is shown in the form of a conventional personal computer 20, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read only memory (ROM) 24 and random access memory (RAM) 25.
A basic input/output system (BIOS) 26, containing the basic routines that help to transfer information between elements within the personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 further includes a hard disk drive 27 for reading from and writing to a hard disk 60, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD ROM or other optical media.
The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical disk drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20. Although the exemplary environment described herein employs a hard disk 60, a removable magnetic disk 29, and a removable optical disk 31, it will be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories, read only memories, storage area networks, and the like may also be used in the exemplary operating environment. A number of program modules may be stored on the hard disk 60, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more applications programs 36, other program modules 37, and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and a pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB) or a network interface card. A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices, not shown, such as speakers and printers.
The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and, inter alia, the Internet.
When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the WAN 52. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
In the description that follows, the invention will be described with reference to acts and symbolic representations of operations that are performed by one or more computers, unless indicated otherwise. As such, it will be understood that such acts and operations, which are at times referred to as being computer-executed, include the manipulation by the processing unit of the computer of electrical signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures where data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the invention is being described in the foregoing context, it is not meant to be limiting as those of skill in the art will appreciate that various of the acts and operations described hereinafter may also be implemented in hardware.
The following describes a preferred embodiment of the invention which involves identifying one or more features in the form of alphanumeric characters of a vehicle license plate represented in a plurality of sensor acquired data sets in the form of digital image files. The plurality of digital image files are taken of the same subject, in this embodiment being a vehicle license plate, by a single camera. For each image, the license plate is extracted, the characters segmented, and each character of each image determined.
First, the process of locating the license plate (LP) in the image is discussed. The algorithm uses a frequency-based technique and defines a score for a given segment of the image indicating the likelihood of it being an LP region. Based on the obtained score, several possible LP regions are extracted and passed on to a module that determines tracks formed by those regions. Notably, multiple license plates present in a single image may be handled by this technique.
In an initial training phase, a set of M pre-segmented license plate images is used. Let an image segment be denoted as Im(h;w), where h and w refer to the height and width of the segment (in pixels), respectively. For each plate in the training data, the algorithm carries out the following steps.
First, the plate is converted into a binary (black and white) image (1-0 image) using a hysteresis threshold method. Figure 4 illustrates conversion of a colour (RGB) license plate image to a greyscale image, and then to a binary black and white version. Figure 1 illustrates an example of such a binary thresholded image.
Second, a summation is taken of each column in the image, resulting in a w x 1 summation vector, where w is the width in pixels of the image. Figure 5 is a plot of such a column-wise summation of the LP image.
Third, the summation vector is taken as a signal, and a Fast Fourier Transform (FFT) is performed. The length w of the vector dictates a w-point FFT, making it desirable for w to be a power of 2, e.g. 128 or 256. Each point of the FFT is a complex number; for simplicity we denote the absolute value of the n-th point as X(n), since the phase outputs of the FFT are not used in this embodiment. Thus:

X(n) = absolute value of the n-th FFT point, where n = 1, ..., w
Figure 6a illustrates a w-point FFT output of such a summation. The present preferred embodiment recognises that for w=128 and for the NSW, Australia style license plates tested, there exists a window of interest in the frequency domain between points 15 and 55 of the FFT. We denote these two points as L(=15) and H(=55), respectively. Figure 6b illustrates this particular portion of the FFT. For embodiments using a different window width w and/or which relate to license plates of differing format, simple tests can determine the location of the FFT window of interest and appropriate values of L and H.
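The second and third steps, together with extraction of the window of interest used in the fourth step, can be sketched as follows. This is a minimal illustration assuming NumPy; the function name is hypothetical, and the binary thresholding of the first step is assumed to have been done upstream:

```python
import numpy as np

def plate_band_features(binary_segment, L=15, H=55):
    """Column-wise summation of a binary (0/1) image segment, followed by a
    w-point FFT; returns the magnitudes X(L)..X(H) of the band of interest."""
    col_sum = binary_segment.sum(axis=0)    # w x 1 summation vector
    spectrum = np.abs(np.fft.fft(col_sum))  # |X(n)| for the w FFT points
    return spectrum[L:H + 1]                # window of interest
```

For w = 128 and the L = 15, H = 55 window above, the returned band has H - L + 1 = 41 points.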
Fourth, for each plate the corresponding absolute FFT values X(i), where i = L, L+1, ..., H, are stored. The first to fourth steps are then repeated for all M license plate images. The mean and variance of each FFT component over the M images are then calculated:

Mean_P(i) = (1/M) * sum over j=1..M of X_j(i)

Var_P(i) = (1/M) * sum over j=1..M of (X_j(i) - Mean_P(i))^2

where i = L, L+1, ..., H
Training then continues by repeating this process in respect of images which do not contain a license plate. That is, the above steps are followed for T non-plate image regions and similar statistics within the same frequency band L, L+1, ..., H are calculated:

Mean_NP(i) = (1/T) * sum over j=1..T of X_j(i)

Var_NP(i) = (1/T) * sum over j=1..T of (X_j(i) - Mean_NP(i))^2

where i = L, L+1, ..., H
The means and variances of both plate regions and non-plate regions are pre-computed and stored, concluding the training phase. In the live detection phase, during real time operation of the application, the information stored in the training phase is used to assign a score value to every segment of the image, the score value referring to that segment's likelihood of being an LP region. This is carried out in the following manner. Once again a segment is denoted as Im(h;w).
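Computing and storing the per-point band statistics of the training phase can be sketched as follows (function name hypothetical; NumPy's default population variance matches the 1/M normalisation of the equations above):

```python
import numpy as np

def train_band_stats(band_features):
    """band_features: one row per training image, each row holding the band
    magnitudes X_j(L)..X_j(H). Returns the per-point (mean, variance) pair,
    with the 1/M (population) normalisation used in the training equations."""
    X = np.asarray(band_features, dtype=float)
    return X.mean(axis=0), X.var(axis=0)
```

The same routine would be run once over the M plate segments and once over the T non-plate segments, giving the stored (Mean_P, Var_P) and (Mean_NP, Var_NP).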
Generally, the source image will be larger than the image segment size. Accordingly, the present embodiment provides for the source image to be scanned by separately considering a plurality of segments of that image. In the present embodiment, segments are selected by rastering a segmentation window over the source image, with horizontal segmentation increments of w/2, and vertical segmentation increments of h/2. Selecting the windowing segmentation increments to be less than w and h, respectively, recognises that a license plate may be randomly positioned within the source image and that sufficiently fine segmentation increments are required to locate the license plate region sufficiently accurately.
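The half-window rastering described above can be sketched as a simple generator of window corners (names illustrative):

```python
def raster_segments(img_h, img_w, seg_h, seg_w):
    """Yield (top, left) corners of seg_h x seg_w windows rastered over an
    img_h x img_w image, with vertical increments of h/2 and horizontal
    increments of w/2 as in the present embodiment."""
    for top in range(0, img_h - seg_h + 1, max(1, seg_h // 2)):
        for left in range(0, img_w - seg_w + 1, max(1, seg_w // 2)):
            yield top, left
```

For example, a 64 x 256 image scanned with a 32 x 128 window yields a 3 x 3 grid of overlapping segments.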
For each segment so obtained, the first to fourth steps set out above in relation to the training sequence are carried out. That is, each segment is binary thresholded, its columns are summed, the column summation is transformed by FFT, and the absolute values of the FFT output in the range of interest are stored. Let the absolute values of the frequency components for each such real time image segment be P(i), where i = L, L+1, ..., H.
A Gaussian likelihood score is assigned to each of the frequency components, resulting in a likelihood score for each of those frequency points. A first score for each point, score_P(i), relates to the similarity between that point and the trained values for license plates:

score_P(i) = (1 / sqrt(2 * pi * Var_P(i))) * exp( -(P(i) - Mean_P(i))^2 / (2 * Var_P(i)) )

A second score, score_NP(i), relates to the similarity between that point and the trained values for non-license-plate regions:

score_NP(i) = (1 / sqrt(2 * pi * Var_NP(i))) * exp( -(P(i) - Mean_NP(i))^2 / (2 * Var_NP(i)) )

where i = L, L+1, ..., H
Two combined likelihood scores for the whole segment are then calculated:

region_score_P = product over i=L..H of score_P(i)

region_score_NP = product over i=L..H of score_NP(i)

with the former being the score of the segment on the question of whether the segment includes a plate, and the latter being the score of the segment on the question of whether the segment is a non-plate region.
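The point-wise Gaussian scores and their product over the band can be sketched as follows (names illustrative; the same routine serves for both the plate and non-plate statistics):

```python
import math

def region_score(P, mean, var):
    """Product over i = L..H of the Gaussian likelihoods of the observed
    band magnitudes P(i) under the trained per-point (mean, variance)."""
    score = 1.0
    for p, m, v in zip(P, mean, var):
        score *= math.exp(-(p - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return score
```

Calling it once with (Mean_P, Var_P) and once with (Mean_NP, Var_NP) gives region_score_P and region_score_NP for the segment.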
In other embodiments, alternative score calculations may be effective; for example, the logarithm of the region score is used in an alternative example discussed further in the following.
From the series of segments, a set of possible regions is identified as being license plate regions. These regions are selected based on the following criteria:

1. Satisfying a minimum region_score_P value; and

2. Satisfying a likelihood ratio test:

region_score_P / region_score_NP > γ

where γ is a pre-set threshold. Regions identified as being license plate regions can then be passed to other modules for processing, such as a tracker module of the type set out in an Australian Provisional Patent application filed simultaneously herewith and entitled "Verification of identification of an image characteristic over multiple images", the content of which is incorporated herein by reference.
In particularly preferred embodiments, a joint likelihood and likelihood ratio is calculated to give the score for each region or image segment:
score = region_score_P / region_score_NP    (1)

The score calculation can be simplified by considering the logarithm of the score defined in (1). Taking the logarithm of a region score of the Gaussian product form above gives:

log region_score = K - sum over i=L..H of (P(i) - Mean(i))^2 / (2 * σ(i)^2)    (2)

where K = -sum over i=L..H of log sqrt(2 * pi * σ(i)^2) is a constant. The calculation above applies both to the plate (region_score_P) and non-plate (region_score_NP) score calculations. With the specific means and variances for the two cases, the log region scores are given by

RS_P = K_P - sum over i=L..H of (P(i) - Mean_P(i))^2 / (2 * σ_P(i)^2)    (3)

RS_NP = K_NP - sum over i=L..H of (P(i) - Mean_NP(i))^2 / (2 * σ_NP(i)^2)    (4)

The likelihood ratio test is then, in the logarithmic operation sense, the subtraction of (4) from (3):

LR = (K_P - K_NP) + sum over i=L..H of [ (P(i) - Mean_NP(i))^2 / (2 * σ_NP(i)^2) - (P(i) - Mean_P(i))^2 / (2 * σ_P(i)^2) ]    (6)

The first term (K_P - K_NP) in (6) can be precomputed in the training phase. Only the second term needs to be calculated for each region, and it is an addition operation rather than a more complicated multiplication operation. The required condition that needs to be satisfied then becomes

LR > log γ    (7)

where γ is the same threshold parameter defined previously herein.
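The logarithmic test can be sketched as follows (names hypothetical; the constant k_diff corresponds to the precomputable K_P - K_NP term, here expanded for clarity):

```python
import math

def log_likelihood_ratio(P, mean_p, var_p, mean_np, var_np):
    """LR = (K_P - K_NP) + per-point quadratic terms, as in equation (6)."""
    # K_P - K_NP reduces to a sum of 0.5 * log(var_NP / var_P) terms;
    # in practice this constant is precomputed during training.
    k_diff = sum(0.5 * math.log(vn / vp) for vp, vn in zip(var_p, var_np))
    quad = sum((p - mn) ** 2 / (2 * vn) - (p - mp) ** 2 / (2 * vp)
               for p, mp, vp, mn, vn in zip(P, mean_p, var_p, mean_np, var_np))
    return k_diff + quad

def is_plate_region(P, mean_p, var_p, mean_np, var_np, gamma):
    """Condition (7): LR > log(gamma)."""
    return log_likelihood_ratio(P, mean_p, var_p, mean_np, var_np) > math.log(gamma)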
The detection may in some embodiments be improved if used together with the distribution score set out in an Australian Provisional Application by the same applicant filed simultaneously herewith and entitled "Locating a character region in an image (II)", the content of which is incorporated herein by reference.
Once license plate regions have been identified in accordance with the present invention, they may be verified and/or discarded, cropped to a plate region, thresholded, cleaned, character segmented, character cropped and/or undergo optical character recognition. The following provides details of particularly suitable plate cropping techniques and character segmentation techniques.
Where an image segment has been identified as a license plate area in accordance with the preceding preferred embodiment, the image segment will normally contain both license plate area and some background area as it is unlikely that the image segment will be of identical size as the license plate and be precisely aligned with the license plate. Examples are shown in Fig. 10. Thus, while the license plate region has been identified in images such as those in Figure 10, it remains necessary to more accurately identify the bounds of the license plate itself. The present embodiment for plate cropping is based on the fact that the density of edges in the license plate region is normally much higher than in the non-plate regions.
In this technique, the edge image is first obtained by applying a Sobel operator; examples are shown in Fig. 11. Then, any flat lines that are longer than a given threshold (normally set slightly bigger than the maximum character width, which can be predetermined for a given installation of an imaging device) are removed. This is helpful in removing non-character line noise, with examples shown in Fig. 12. Finally, edge removal is carried out based on the horizontal and vertical projections; examples are shown in Figs. 13 and 14.
In more detail, the Sobel detector is a 2D spatial gradient operator that emphasizes high spatial frequency components when applied to a greyscale image. This operator is used to detect the edges, and its operation is illustrated in Fig. 7.
The masks of Figure 7 are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one mask for each of the two perpendicular orientations. The masks can be applied separately to the input image, to produce separate measurements of the gradient component in each orientation (Gx and Gy). These can then be combined together to find the absolute magnitude of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:
|G| = sqrt(Gx^2 + Gy^2)

although typically an approximate magnitude is computed using:

|G| = |Gx| + |Gy|

which is much faster to compute.
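A sketch of the Sobel masks and the fast |Gx| + |Gy| approximation, assuming NumPy (the function name is illustrative, and only the border-free region of the image is processed):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # vertical edges
SOBEL_Y = SOBEL_X.T                                       # horizontal edges

def sobel_magnitude(gray):
    """Approximate gradient magnitude |G| = |Gx| + |Gy| over the valid
    (border-free) region of a greyscale image."""
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2), dtype=float)
    gy = np.zeros((h - 2, w - 2), dtype=float)
    for dy in range(3):
        for dx in range(3):
            patch = gray[dy:dy + h - 2, dx:dx + w - 2]
            gx += SOBEL_X[dy, dx] * patch
            gy += SOBEL_Y[dy, dx] * patch
    return np.abs(gx) + np.abs(gy)
```

A constant image produces zero response everywhere, since each mask sums to zero, while a step edge produces a strong response along its length.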
Long line removal is carried out to remove any flat lines that are longer than a given threshold, which is normally set slightly bigger than the maximum character width. This operation is helpful in removing some non-character line noise. Out-plate area removal is then conducted. In the present embodiment this technique is based on the fact that the density of edges in the plate area is normally much higher than in the non-plate area. The preprocessed edge image is first projected in the Y (vertical) direction, and then normalized by its maximum value. An example is shown in Fig. 8, which is the normalized Y projection function of plate (c) illustrated in Figure 12c. In Figure 8, using the threshold value of 0.4, it can be seen that row 2 is the first to exceed the threshold, so that rows 0 and 1 can be excluded as non-plate region above the license plate region. Similarly, rows 31 to 35 can be excluded as non-plate region below the license plate region.
Next, the preprocessed edge image of Figure 12c is projected to the X (horizontal) direction, and normalized with its maximum value, as shown in Fig. 9. From Figure 9 it can be seen that the leftmost columns, from column 0 to about column 57, can be excluded as being non-plate region to the left of the license plate region by reference to the threshold value of 0.4. Similarly, the rightmost columns from about column 170 and all columns farther right, can be excluded as being non-plate region to the right of the license plate region by reference to the threshold value of 0.4.
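The two projection cuts can be sketched together as follows (names illustrative; 0.4 is the threshold used in the worked example above):

```python
import numpy as np

def crop_by_projection(edge_img, threshold=0.4):
    """Keep only the rows and columns whose normalized projection of edge
    energy exceeds the threshold, cropping away out-of-plate regions."""
    y_proj = edge_img.sum(axis=1).astype(float)  # Y (vertical) projection
    x_proj = edge_img.sum(axis=0).astype(float)  # X (horizontal) projection
    rows = np.where(y_proj / y_proj.max() > threshold)[0]
    cols = np.where(x_proj / x_proj.max() > threshold)[0]
    return edge_img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

The crop keeps everything between the first and last rows and columns whose normalized projection exceeds the threshold, which matches the row/column exclusions described above.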
Figure 10 illustrates three examples of FFT-located plate images. Figure 11 illustrates edge images obtained by using a Sobel operator upon the images of Figure 10. Figure 12 illustrates the edge images after long line removal. Figure 13 illustrates the edge images after out-plate area removal. Figure 14 illustrates the images cropped to the plate boundaries identified by this technique.
Character segmentation plays a very important role in a plate recognition system. Since there are many kinds of plates in different states or countries, and each image of a plate can be obtained under totally different illumination conditions, processing and segmenting these plate images is extremely varied and difficult. The following sets out a robust character segmentation algorithm, which includes edge removal, long line removal, character-grab-based top and bottom position estimation, and so on.
The structure of the plate character segmentation algorithm is shown in Fig. 15. The input image is from the edge detection based plate crop function explained above with reference to Figures 7 to 14. The algorithm of Figure 15 is thus provided with a portion of an image which has been cropped to conform to the edges of an identified license plate region. In brief: firstly, the plate frame edge is removed based on the binary image projected in the X and Y directions. Secondly, a pre-long-line removal operation is applied to the frame-edge-removed image to separate any characters connected to the boundary or background. Then the "First Character Grab Cut and Non-Character Components Removal" operation is applied; its outputs are the median top and bottom cut-off positions, which can remove some incorrectly connected components (such as the bolts used to fix the plate to the car), together with the median character height, median character width and median character size, which are used in the second and final cuts as reference sizes. After that, the "Second Character Grab Cut" operation outputs the left and right cut-off positions. The final operation is "Final Character Grab Cut and Character Recognition", which outputs the recognized plate string.
In more detail, we refer to the edge removal algorithm. Some plates have frames around the plate characters. In order to segment these plate characters properly after edge detection based plate cropping, it is necessary to remove the plate edges. The basic idea of removing these frame edges is to project the black and white binary image in the X and Y directions, then cut off edges from the top, bottom, left and right based on these projection functions. The preprocessed black and white image is first projected in the Y (vertical) direction, and then normalized by its maximum value. The top edge is removed from the first row down to the row at which the projection function value first falls below the threshold, which is currently set to 0.75. Similarly, the removal process cuts from the bottom row, left column and right column to the position (row or column) at which the projection function value falls just below the threshold. An example of a normalized Y projection of a plate and its corresponding images before and after top and bottom cut-off is shown in Fig. 16 (a), (b) and (c), which shows the normalized Y projection function and the corresponding black and white plate images. Since the bottom does not meet the cut-off condition, no bottom rows are cut off in this case.
Similar considerations apply to the left and right edges, and Figure 17 shows the normalized X projection function and its corresponding black and white plate image for this purpose.
Next, long line removal occurs, with the goal of removing any flat lines that are longer than a given threshold, normally set slightly bigger than the maximum character width. This operation is helpful in removing some non-character line background and in separating characters from connected background. An example of a black and white image before and after long line removal is shown in Fig. 18 (a) and (b), respectively.
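The long line removal step can be sketched as a scan for horizontal runs of foreground pixels longer than the threshold (a pure-Python illustration with hypothetical names):

```python
def remove_long_lines(binary_rows, max_run):
    """Zero out any horizontal run of 1s longer than max_run pixels, where
    max_run is set slightly above the maximum expected character width."""
    out = [row[:] for row in binary_rows]  # work on a copy
    for row in out:
        start = None
        for x in range(len(row) + 1):
            if x < len(row) and row[x]:
                if start is None:
                    start = x          # run begins
            else:
                if start is not None and x - start > max_run:
                    row[start:x] = [0] * (x - start)  # erase long run
                start = None           # run ends
    return out
```

Runs no longer than max_run, such as character strokes, pass through unchanged; only long frame or background lines are erased.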
After such preprocessing, the algorithm of Figure 15 moves to the First Character Grab Cut. In the first character grab cut, any components that are too small or too big are removed first, with the "too big" and "too small" thresholds set as hard thresholds in this cut. Then, any components whose width-to-height ratio is too large (too fat to be a character) or too small (too thin to be a character) are also removed, with the ratio thresholds likewise set as hard thresholds in this cut. An example of a black and white image before and after non-character component removal is shown in Figs. 19a and 19b, respectively.
The outputs of the First Character Grab Cut are the top and bottom plate character cut positions, median character height, median character width and median character size of plate characters. The top character cut position is calculated as the median value of top character positions of all possible character candidates, and the bottom character cut position is calculated as the median value of bottom character positions of all possible character candidates. The output median character height, median character width and median character size are the median values of the heights, widths and sizes of all possible character candidates and these output values will be used as reference for the final character grab cut.
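The median-based reference statistics output by the first cut can be sketched as follows (the bounding-box representation and the function name are illustrative):

```python
import statistics

def character_cut_stats(candidate_boxes):
    """candidate_boxes: (top, bottom, width, height) for each surviving
    character candidate. Returns the median top/bottom cut positions and
    the median width, height and size used as references by later cuts."""
    tops, bottoms, widths, heights = zip(*candidate_boxes)
    sizes = [w * h for w, h in zip(widths, heights)]
    return (statistics.median(tops), statistics.median(bottoms),
            statistics.median(widths), statistics.median(heights),
            statistics.median(sizes))
```

Using medians rather than means makes the cut positions robust to a few incorrectly connected components, such as plate-fixing bolts, surviving into this stage.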
Next is the Second Character Grab Cut. The output of the "Second Character Grab Cut" is the left and right cut-off positions. Firstly, the top and bottom parts of the preprocessed black and white plate image are cut off based on the top and bottom character cut positions output by the "First Character Grab Cut". An example of a black and white plate image before and after top and bottom cut-off is shown in Figs. 20(a) and 20(b).
Then, a median-character-width-based "post long line removal" function is applied to the above output image, in which any flat lines longer than 1.4 times the median character width output by the "First Character Grab Cut" are treated as non-character noise and removed. An example before and after this median-width-based long line removal is shown in Fig. 20 (b) and (c).
Finally, any components that are too small or too big will be removed, and left and right cut off positions will be obtained at the same time. An example of before and after non-character components removal is shown in Fig. 20 (c) and (d).
The algorithm of Figure 15 then moves to Noise Removal and Broken Mending. After the "Second Character Grab Cut", we need to do either noise removal or broken mending, depending on whether there is a possible break or interruption in the license plate characters. If a broken flag is true, broken mending occurs; otherwise noise removal is carried out. Horizontal, vertical and diagonal median filtering masks, shown in Fig. 21, are used for both noise removal and broken mending.
For noise removal, these masks are applied directly to the plate characters, filtering out only high frequency spot noise without affecting non-noise pixels or edge pixels.
For broken mending, we first convert the black and white image into its complementary image, so that broken pixels become high frequency noise; we then apply the above median filter masks, and finally convert the complementary image back to its original polarity, whereupon the broken pixels have been mended automatically.
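The complement-filter-complement idea can be sketched with a single horizontal 3-tap median mask; this is a deliberate simplification of the Fig. 21 mask set (NumPy assumed, names illustrative):

```python
import numpy as np

def mend_broken(binary):
    """Complement a 0/1 image so 1-pixel breaks become isolated noise,
    apply a horizontal 3-tap median filter, then complement back."""
    comp = 1 - binary
    med = comp.copy()
    med[:, 1:-1] = np.median(
        np.stack([comp[:, :-2], comp[:, 1:-1], comp[:, 2:]]), axis=0)
    return 1 - med
```

A single-pixel break in a stroke is filled, while wider genuine gaps (for example, the spacing between characters) are left untouched by the 3-tap mask.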
The algorithm of Figure 15 then moves to the Final Character Grab Cut and Character Recognition. Based on the previous first and second cuts, the plate is cropped to an area that contains only characters. An example of a finally cropped plate image, and the binary image after Otsu thresholding, are shown in Fig. 22 (a) and (b).
In order to deal with the possible situation that the previous cuts have not fully removed the outside plate background, median-width-based long line removal is still applied to the re-Otsued black and white image of Fig. 22b. Then, non-character components are removed, and the characters are grabbed and fed into the character recognition function for recognition.
Preferred embodiments of the invention realise several advantages. For example, the image quality does not need to be of as high a standard compared with prior art techniques. Therefore, additional lighting and ideal camera placement that may be required to increase the accuracy of prior art methods are not necessary in the preferred embodiment. Also, it is not necessary to use dedicated license plate image capture cameras with the present embodiment, but instead images captured by existing devices, such as closed circuit television (CCTV) cameras, or highway monitoring cameras, may be used. The preferred embodiment is therefore more cost effective and simpler to install and/or set up compared with prior art methods and equipment.
The above embodiment has been described with reference to license plates, which are typically understood to be registration plates or number plates used to identify a vehicle (eg automobile, motorbike, trailer, truck, etc) used on roadways, but may also be adapted for use in determining alphanumeric characters in different situations, such as for estimating characters from images of boat registration numbers, which are typically affixed to an above water hull side of a boat. This alternative embodiment may be useful for determining the registration details of boats moored in a marina, for example. Images for use in this embodiment can be obtained from CCTV or other cameras.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.


CLAIMS:
1. A method for identifying a character region in an image, the method comprising: considering a segment of the image, and calculating a column-wise summation of image intensity for each of a plurality of columns of the segment; determining frequency components of the column-wise summation; comparing at least a subset of the frequency components to predetermined expected values of such frequency components; and calculating from the comparison a score indicating the likelihood that the segment contains a character region.
2. The method of claim 1, comprising considering a plurality of unique segments of the image, and performing the method in respect of each of said plurality of segments.
3. The method of claim 2 wherein the plurality of segments are obtained by raster scanning of a segmentation window across the image in step increments and treating the segment windowed at each increment.
4. The method of claim 3 wherein the step increments are less than the respective dimension of the window.
5. The method of any one of claims 1 to 4 wherein the or each segment is selected to be of a size which is around the same size or slightly larger than an anticipated area of the character region of interest.
6. The method of any one of claims 1 to 5 wherein the segment has 128 columns, wherein the character region comprises a license plate, and wherein the subset of the frequency components comprises FFT points substantially in the range of points 15 to 55.
7. The method of any one of claims 1 to 6 further comprising determining the predetermined expected values by carrying out training upon sample segments in which a character region of interest is known to be present, in order to identify a mean and variance of the magnitude of the spectral points of interest.
8. The method of any one of claims 1 to 7 further comprising determining the predetermined expected values by carrying out training upon sample segments in which it is known that a character region of interest is not present, to identify a mean and variance of the magnitude of the spectral points of interest for such segments.
9. The method of any one of claims 1 to 8 wherein the score of each segment comprises a sum of the point-wise Gaussian likelihoods.
10. A system for identifying a character region in an image, the system comprising: an image capture device for capturing an image; a data processing means arranged to consider a segment of the image, to calculate a column-wise summation of image intensity for each of a plurality of columns of the segment, to determine frequency components of the column-wise summation, to compare at least a subset of the frequency components to predetermined expected values of such frequency components, and to calculate from the comparison a score indicating the likelihood that the segment contains a character region.
11. The system of claim 10, wherein the data processing means is configured to consider a plurality of unique segments of the image, and to calculate the score for each segment.
12. The system of claim 11 wherein the data processing means is configured to obtain the plurality of segments by raster scanning of a segmentation window across the image in step increments and treating the segment windowed at each increment.
13. The system of claim 12 wherein the step increments are less than the respective dimension of the window.
14. The system of any one of claims 10 to 13 wherein the or each segment is of a size which is around the same size or slightly larger than an anticipated area of the character region of interest.
15. The system of any one of claims 10 to 14 wherein the segment has 128 columns, wherein the character region comprises a license plate, and wherein the subset of the frequency components comprises FFT points substantially in the range of points 15 to 55.
16. The system of any one of claims 10 to 15 wherein the data processing means is further configured to determine the predetermined expected values by carrying out training upon sample segments in which a character region of interest is known to be present, in order to identify a mean and variance of the magnitude of the spectral points of interest.
17. The system of any one of claims 10 to 16 wherein the data processing means is further configured to determine the predetermined expected values by carrying out training upon sample segments in which it is known that a character region of interest is not present, to identify a mean and variance of the magnitude of the spectral points of interest for such segments.
18. The system of any one of claims 10 to 17 wherein the score of each segment comprises a sum of the point-wise Gaussian likelihoods.
19. A computer program product comprising computer program code means to make a computer execute a procedure for identifying a character region in an image, the computer program product comprising: computer program code means for considering a segment of the image, and calculating a column-wise summation of image intensity for each of a plurality of columns of the segment; computer program code means for determining frequency components of the column-wise summation; computer program code means for comparing at least a subset of the frequency components to predetermined expected values of such frequency components; and computer program code means for calculating from the comparison a score indicating the likelihood that the segment contains a character region.
PCT/AU2008/001576 2007-10-24 2008-10-24 Locating a character region in an image WO2009052577A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2007905816A AU2007905816A0 (en) 2007-10-24 Locating a character region in an image
AU2007905816 2007-10-24

Publications (1)

Publication Number Publication Date
WO2009052577A1 true WO2009052577A1 (en) 2009-04-30

Family

ID=40578976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2008/001576 WO2009052577A1 (en) 2007-10-24 2008-10-24 Locating a character region in an image

Country Status (1)

Country Link
WO (1) WO2009052577A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5081685A (en) * 1988-11-29 1992-01-14 Westinghouse Electric Corp. Apparatus and method for reading a license plate
US5315668A (en) * 1991-11-27 1994-05-24 The United States Of America As Represented By The Secretary Of The Air Force Offline text recognition without intraword character segmentation based on two-dimensional low frequency discrete Fourier transforms
US5911013A (en) * 1992-08-25 1999-06-08 Canon Kabushiki Kaisha Character recognition method and apparatus capable of handling handwriting
US20060123051A1 (en) * 2004-07-06 2006-06-08 Yoram Hofman Multi-level neural network based characters identification method and system



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08842743

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08842743

Country of ref document: EP

Kind code of ref document: A1