EP2208170A1

EP2208170A1 - Method for image analysis, especially for mobile stations

Info

Publication number: EP2208170A1
Application number: EP08848083A
Authority: EP
Inventors: Gerd Mosakowski
Original assignee: T Mobile International AG
Current assignee: Deutsche Telekom AG
Priority date: 2007-11-05
Filing date: 2008-10-28
Publication date: 2010-07-21
Also published as: CN101855640B; MX2010004732A; RU2010122947A; BRPI0820570A2; CA2704830A1; RU2454718C2; CA2704830C; KR20100099154A; KR101606469B1; US20100296729A1; US8532389B2; DE102007052622A1; WO2009059715A1; CN101855640A

Abstract

A robust OCR system requiring little computing capacity is obtained by first carrying out an adaptive pre-processing, optimised in terms of pixel groups, which analyses the image for line segments. The most significant difference compared to previously known methods is that there is no longer a direct pattern comparison; instead, the line segments are traced as optimally as possible. The corresponding character is then deduced from the sequence of movements. As this sequence of movements scales well and can be described in a relatively simple manner, this technique is especially suitable for mobile use. The sequence of movements of known characters is stored in a search word, such that the letters can be deduced directly from the movement. A dictionary/lexicon can also be used. If words are recognised by means of the dictionary/lexicon, the recognised letters can be used for an even more optimised character font identification. The invention is advantageous in that a robust OCR system is provided which also requires little computing capacity. The system according to the invention is robust especially in that the recognition works better than with conventional systems even under bad conditions, especially poor lighting and interference.

Description

Method for image analysis, in particular for a mobile radio device

The invention relates to a method for image analysis, in particular for a mobile device with built-in digital camera for automatic optical character recognition (OCR), according to the preamble of patent claim 1 or 2.

There are a variety of OCR systems for PCs. Typically, a flatbed scanner is used to read in texts. For mobile use there are handheld scanners that display the scanned text, save it, or transfer it to a computer. Problems always arise when the original is scanned askew, or when only fragments of the letters can be recognized (for example, lettering on a flag waving in the wind). In addition, such techniques fail when direct scanning is not possible (e.g., roadside signage). According to the current state of the art, such a scene could be photographed at high resolution and the image scanned afterwards. However, there is no direct OCR in the camera itself, as this is too computationally intensive with conventional methods.

If longer texts are to be recognized, it is often necessary to take several pictures and then join them together (as when stitching 360° photos). To obtain sufficient quality, the result usually has to be reworked manually.

The main OCR methods work either with a pure bit pattern comparison ("pattern matching") or, as in the case of handwriting recognition, with a description of the letters by lines and crossing points. Pattern matching can be used particularly well in the case of standardized letters (e.g. vehicle license plates). In license plate recognition, the characters to be recognized are limited to a small number and are also standardized.

Furthermore, various applications in the field of augmented reality are known. An example of this is the overlaying of a photograph (satellite photo) with a road map showing the individual street names (www.clicktel.de). The prior art is a method of prioritizing pixel groups according to DE 10113880 B4 or equivalent EP 1371229 B1, which discloses the features according to the preamble of claim 2.

DE 10025017 A1 discloses a mobile phone that is suitable in particular for simpler application and use of additional services and functions such as short message service, payment transactions, identity or security checks, etc. The mobile phone has an integrated device for reading characters, symbols, codes and / or identity features, which is a scanner, a bar code reader or a fingerprint reader in the form of a CCD sensor, thus making possible a convenient and fast input and capture of text, symbols or security-relevant features.

DE 202005018376 U1 discloses a mobile telephone with keyboard, screen, data processing system and an optical scanning system arranged behind an opening or a window of the housing, in particular a hand-held scanner, and an integrated translation program. The optical scanning system makes it possible to scan characters and / or words written in another language. Once the language has been selected, the word or words are translated. As a result, the user of the mobile phone is able to read words and texts that are foreign to him. This can be beneficial for menus, warnings, operating instructions, maps and signs. In addition, users can also enter words themselves from the keyboard of the mobile phone or select them from an encyclopedia contained in the memory of the data processing system. Because the data processing system is connected to the screen and keyboard, these words are translated once the language has been selected and are displayed on the screen.

DE 10163688 A1 discloses a method and a system for tracking goods which are provided with an optically readable, alphanumeric identification, as well as a detection device therefor. The marking is captured as an image by the detection device and converted into image data. These are sent by the detection device by radio to a receiver which is connected to a computer system, which further evaluates the image data. Alternatively, the image data are evaluated in the detection device before being sent to the receiver. Exactly how the image data are evaluated is not disclosed in detail.

DE 10 2005 033 001 A1 already discloses a method for image processing in mobile terminals, e.g. mobile phones with a camera, in which digital image information is recorded and part of this image information is analyzed using pattern recognition methods such as text recognition (OCR). Exactly how these text recognition methods work, however, is not described in that document.

The object of the present invention is therefore to provide a generic method for image processing in mobile devices with a digital camera which works much more accurately and faster.

The invention is characterized by the features of independent claim 1 or 2.

Advantageous developments are the subject of the dependent claims.

An advantage of the invention is a more robust OCR recognition with optional translation in real time, which also manages with relatively little computing power. The robustness refers in particular to the fact that recognition works better than with conventional systems even under poor conditions (especially lighting conditions and superimposed interference).

This is achieved by first performing an adaptive, pixel-group-optimized preprocessing which searches the image for lines. The most important difference from the previously known methods is that no direct pattern comparison takes place any longer; instead, an attempt is made to trace the lines as optimally as possible. The corresponding character is then inferred from the sequence of movements. Because this sequence of movements scales well and can be described with relatively little effort, the technique is particularly suitable for mobile use. The sequence of movements of known characters is stored in a search word, so that the letter can be inferred directly from the movement. In addition, a dictionary / lexicon can be used. If words are recognized by means of the dictionary / lexicon, the recognized letters can be used for even more optimized character recognition.

Application scenarios are camera mobile phones for tourists abroad, in particular for reading traffic signs, menus and general information signs. The content can be translated into a second language. The translation is shown to the user on the display or read out via a "text to speech application".

The robustness of the recognition is based initially on a normalization of the line widths and letter sizes. Subsequently, the letters are traced, and the actual letters are recognized during the tracing. The robustness of the recognition method results from the combination of different solution steps. Due to the normalization of the line widths, shadow effects and poor lighting conditions have hardly any influence on the recognition rate. Due to the size normalization, effects caused by, for example, distant signs can be compensated. Tracing leads to the correct letter or number via simple, inexpensive, yet extensible solution trees. In order to make the results even more robust, a dictionary can additionally be used. Through feedback of recognized words, the solution trees and line widths of the template can be optimized accordingly.

To solve the problem, the following steps are carried out.

First, the image is converted into electrical signals by an image recording element (for example a CCD camera). These signals are then stored in a prioritized array according to the method of patent DE 101 13 880 B4. Optionally, a position factor can also be included in the prioritization. The closer the pixel group is to the start pixel, the greater the position factor. For Western languages (English, German, French), the start pixel is initially located in the upper left corner of the array.
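The optional position factor can be illustrated with a minimal sketch. The text does not specify how the position factor is combined with the priority value, so the multiplicative weighting below, the `1 / (1 + distance)` form of the factor, and the names `position_factor` and `prioritize` are all assumptions made for illustration only.

```python
import math


def position_factor(x, y, start=(0, 0)):
    """Hypothetical position factor: 1.0 at the start pixel, smaller further away."""
    distance = math.hypot(x - start[0], y - start[1])
    return 1.0 / (1.0 + distance)


def prioritize(groups, start=(0, 0)):
    """Sort pixel groups by an assumed priority (difference value * position factor).

    groups: list of (x, y, difference) tuples, where `difference` stands in for
    the pixel difference value of the group's reference pixel.
    Returns the groups ordered from highest to lowest priority.
    """
    def priority(group):
        x, y, difference = group
        return difference * position_factor(x, y, start)

    return sorted(groups, key=priority, reverse=True)


if __name__ == "__main__":
    # Three groups with identical difference values: the one closest to the
    # upper left start pixel is ranked first.
    print(prioritize([(10, 10, 50), (0, 0, 50), (3, 4, 50)]))
```

For a language with a different reading direction, the `start` argument would simply point at another corner of the array.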

In contrast to the patent DE 101 13 880 B4, which works with a predetermined form of the pixel group, the pixel groups here can also vary during the recognition process. An example of a pixel group is a one-line horizontal array of pixels whose length is dependent on a double change in brightness. For dark letters to be recognized on a light background, the distance between the first light-dark transition and the subsequent dark-light transition would then be a measure of an assumed stroke width. Pixel groups of the same assumed stroke width are each compiled in a separate list. In order to increase the robustness of the method with respect to pixel errors, it is additionally possible to work with a low-pass filter. In this filter, the sum of n adjacent pixels is taken in each case in order to find the corresponding light-dark or dark-light transitions. Due to the summation, any pixel errors or errors due to strong noise are greatly reduced.
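The one-line pixel groups and the summing low-pass filter can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows of brightness values, a fixed brightness `threshold` separates light from dark, and the function names (`smooth`, `find_pixel_groups`, `group_by_width`) are hypothetical.

```python
from collections import defaultdict


def smooth(row, n):
    """Low-pass filter: sum of n adjacent pixels at each position."""
    return [sum(row[i:i + n]) for i in range(len(row) - n + 1)]


def find_pixel_groups(row, threshold, n=1):
    """Return (start, width) for each dark run bounded by two brightness changes.

    A window counts as dark when its n-pixel sum is below threshold * n.
    Runs that reach the end of the row without a dark-light transition are
    ignored, because they lack the second brightness change.
    """
    groups, start = [], None
    for x, value in enumerate(smooth(row, n)):
        dark = value < threshold * n
        if dark and start is None:
            start = x                          # light-dark transition
        elif not dark and start is not None:
            groups.append((start, x - start))  # dark-light transition
            start = None
    return groups


def group_by_width(image, threshold, n=1):
    """Compile pixel groups of the same assumed stroke width in separate lists."""
    lists = defaultdict(list)
    for y, row in enumerate(image):
        for x, width in find_pixel_groups(row, threshold, n):
            lists[width].append((x, y, width))
    return dict(lists)


if __name__ == "__main__":
    image = [
        [255, 255, 0, 0, 255, 255],
        [255, 255, 0, 0, 255, 255],
        [255, 0, 0, 0, 0, 255],
    ]
    print(group_by_width(image, threshold=128))
```

With `n=3`, an isolated dark pixel in an otherwise light row no longer produces a pixel group, which is the noise suppression the summation is meant to provide.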

To recognize the letters, similar pixel groups are each compiled in a separate list. Each list thus obtained is sorted in such a way that the pixel groups which have a lower Y position are sorted in descending order. If several similar pixel groups are at the same Y position, new lists are created for them. From these lists, an attempt is now made to derive corresponding vectors. In the process, the pixel groups with the lowest and highest Y values are selected from the respective list. A line is calculated between these pixel group positions. Then the deviations of the other pixel groups from this line are determined. If all deviations are below a certain threshold, a description vector has been found for this list. If the deviations are above the threshold, the list is split and an attempt is made to generate corresponding vectors for each sub-list. It makes sense to split the list where the largest deviation from the calculated line occurred. In this way a number of vectors is obtained. Touching vectors are combined in a further vector list and sorted according to their Y values. This vector list then describes the corresponding letter. The vector list is then normalized (e.g. to the maximum Y difference). Such a normalized vector list can then pass through a solution tree in which the different letters are stored.

With this approach, only some of the letters will be recognized at first. However, initial information about the font to be recognized is obtained in this way. With large characters, letters are detected twice. This is because, depending on the line width of the letters, both the light-dark transition and the dark-light transition are each interpreted as a separate letter. It can be assumed that the distance between these double letters is reasonably constant. This fact can now be used to optimize the shape of the pixel groups used according to the line width. The width of the pixel group used should be chosen to be three times the line width. The optimal height of the pixel group depends on the font height. With the pixel groups optimized in this way, the image is now scanned further. Enlarging the pixel groups speeds up processing, because fewer internal lists are required, and also yields more accurate results.

Another form of optimization is to optimize the result trees. Since the font usually does not change within a text, optimized result trees exist for each text in this font. If upper and lower case are not distinguished, 26 letters are assumed. Assuming a binary tree for 128 characters, 7 branching levels (2^7 = 128) are enough to determine a letter.
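The derivation of description vectors from a sorted list can be sketched as a recursive split. The sketch below assumes that each pixel group is reduced to an (x, y) position, that the list is non-empty and already sorted by Y, and that the deviation is measured as perpendicular distance; the names `deviation`, `fit_vectors` and `normalize`, and the shift of the vector list to the origin during normalization, are illustrative assumptions rather than details taken from the text.

```python
import math


def deviation(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def fit_vectors(points, threshold):
    """Derive description vectors from a non-empty list of points sorted by Y.

    A line is drawn between the first and last point. If every other point
    deviates by at most `threshold`, that line is the description vector.
    Otherwise the list is split at the point of largest deviation and both
    sub-lists are processed in the same way.
    """
    first, last = points[0], points[-1]
    if len(points) <= 2:
        return [(first, last)]
    deviations = [deviation(p, first, last) for p in points[1:-1]]
    worst = max(range(len(deviations)), key=deviations.__getitem__)
    if deviations[worst] <= threshold:
        return [(first, last)]
    split = worst + 1  # index of the worst point in `points`
    return (fit_vectors(points[:split + 1], threshold)
            + fit_vectors(points[split:], threshold))


def normalize(vectors):
    """Scale a vector list by its maximum Y difference (origin shift is assumed)."""
    xs = [x for vector in vectors for x, _ in vector]
    ys = [y for vector in vectors for _, y in vector]
    span = (max(ys) - min(ys)) or 1
    x0, y0 = min(xs), min(ys)
    return [tuple(((x - x0) / span, (y - y0) / span) for x, y in vector)
            for vector in vectors]


if __name__ == "__main__":
    # A stroke that runs diagonally right and then back left is split in two.
    stroke = [(0, 0), (1, 1), (2, 2), (1, 3), (0, 4)]
    print(normalize(fit_vectors(stroke, threshold=0.5)))
```

Because the normalized vector list no longer depends on the letter size, the same solution tree can be used for near and distant text.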

For typescript, the entire process of text recognition could be optimized even further by storing already recognized letters, or even syllables, as a pixel group master. In parallel with the methods described above, recognition could then also be carried out with the pixel group master; vowels, for example, would be easily recognized because they would achieve an extremely high pixel group value.

As an additional option, recognition errors could be partially detected and corrected with dictionaries. The recognized characters can be output both via a display and via a "text-to-speech program". The method described above forms vectors from pixel-based images, with each individual pixel (in the case of a one-line pixel group) needing to be traversed only once. In previously known OCR methods, an edge optimization is usually carried out beforehand to increase the recognition rate, and only then is the recognition process started. In the method described above, this is done in one step, so it is both less computationally intensive and more robust.
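The dictionary option can be illustrated with a small sketch. The text does not say how the comparison with the dictionary is carried out; the example below substitutes the similarity matching from Python's standard `difflib` module, and the function name `correct_word`, the sample dictionary and the `cutoff` value are assumptions.

```python
import difflib


def correct_word(word, dictionary, cutoff=0.6):
    """Return the closest dictionary entry, or the word unchanged if none is close.

    `cutoff` is the minimum similarity (0..1) a dictionary entry must reach.
    """
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    dictionary = ["EXIT", "ENTRY", "STOP"]
    # The digit 1 was recognized in place of the letter I.
    print(correct_word("EX1T", dictionary))
```

A corrected word also shows which letter was actually present at the misrecognized position, which is the kind of feedback that could be used to refine the solution trees.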

Claims


1. A method of OCR recognition, comprising the steps of: a) recognizing strokes by pixel group-oriented listing, each of the lists representing individual strokes; b) tracing the letters based on the generated lists; c) comparing the sequence of movements when tracing the letter with standardized reference letters stored in a solution tree.

2. A method of analyzing image data consisting of an array of individual pixels, each pixel having a time-varying pixel value describing color or brightness information of the pixel, comprising the steps of: a) determining a priority value for each pixel of the array by setting the pixel used as a reference pixel and calculating a pixel difference value from the respective current pixel value of the reference pixel with respect to the current pixel values of a predetermined set of adjacent pixels; b) combining the pixels used to calculate the priority value into a pixel group; c) sorting the pixel groups based on the priority value of the associated reference pixel and storing them in a priority array; d) storing and / or transmitting the pixel groups according to their priority in the priority array, wherein only a part of the pixel groups is used for the list formation in order to optimize the computing power, characterized in that a position factor is additionally included in the priority value, which is the greater, the closer the pixel group is located to a start pixel that is predefined depending on the language.

3. The method according to claim 2, characterized in that the pixel difference value is the difference between the pixel value of a considered pixel and the pixel value of each of its considered neighboring pixels of the pixel group.

4. The method according to claim 2 or 3, characterized in that the pixel difference value allows conclusions about the stroke width.

5. The method according to any one of claims 1 to 4, characterized in that lists are formed of similar pixel groups.

6. The method according to any one of claims 2 to 5, characterized in that after the steps 1 a) to 1 d) the following steps are performed:

First, an adaptive, pixel-group-optimized preprocessing takes place which searches the image for lines; subsequently, an attempt is made to trace these lines as optimally as possible; the corresponding character is then inferred from the sequence of movements via stored search words / solution trees.

7. The method according to any one of claims 2 to 5, characterized in that after the steps 1 a) to 1 d) the following steps are performed:

Similar pixel groups are each collected in a separate list, and each list thus obtained is sorted such that the pixel groups having a lower Y position are sorted in descending order; if a plurality of similar pixel groups are at the same Y position, new lists are generated for them; vectors are derived from these lists by searching for the pixel groups with the lowest and the highest Y value, calculating a line between these pixel group positions, and determining the deviations of the other pixel groups from this line.

8. The method according to claim 7, characterized in that, if all deviations are below a certain threshold, a description vector has been found for this list, but if the deviations are above a threshold, the list is divided and an attempt is made to generate corresponding vectors for each sub-list.

9. The method according to claim 7 or 8, characterized in that the list is divided where the largest deviations from the calculated line occurred.

10. The method according to any one of claims 7 to 9, characterized in that the vector list is then normalized, e.g. to the maximum Y difference.

11. The method according to claim 10, characterized in that the normalized vector list passes through a solution tree in which the different letters are stored.

12. The method according to any one of claims 7 to 11, characterized in that touching vectors are combined in a further vector list and sorted according to the Y values.

13. The method according to any one of claims 7 to 12, characterized in that the width of the pixel group used is chosen so that it is three times the line width and the optimal height of the pixel group is dependent on the font height.

14. The method according to any one of claims 7 to 13, characterized in that the image is then further scanned with the thus optimized pixel groups.

15. The method according to any one of claims 7 to 14, characterized in that optimized result trees are generated for each text with this font.

16. The method according to any one of claims 7 to 15, characterized in that, for typescript, already recognized letters or even syllables are stored as a pixel group master.

17. The method according to any one of claims 1 to 16, characterized in that a dictionary / lexicon is used, based on which the recognized letters are used for an even more optimized character recognition.

18. The method according to any one of claims 1 to 17, characterized in that the recognized words are translated into a selectable language and output visually and / or acoustically.

19. The method according to any one of claims 1 to 18, characterized in that, through feedback of recognized words, solution trees and line widths of the template are optimized accordingly.

20. The method according to any one of claims 1 to 19, characterized in that the ongoing determination and output of the pixel groups sorted by priority is already carried out by the image recording system used, in particular a scanner or CCD camera built into a mobile phone.