CN101667251B

CN101667251B - OCR recognition method and device with auxiliary positioning function

Info

Publication number: CN101667251B
Application number: CN200810215861.6A
Authority: CN
Inventors: 陈又新; 李斌; 王�华; 王炎
Original assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd
Current assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd
Priority date: 2008-09-05
Filing date: 2008-09-05
Publication date: 2014-07-23
Anticipated expiration: 2028-09-05
Also published as: CN101667251A

Abstract

The invention relates to an OCR recognition method with an auxiliary positioning function. The method comprises the following steps: shooting a target and capturing an image containing characters; searching the image area and detecting one or more text areas; selecting a specific text area; and recognizing the characters in the selected text area. With the method and device of the invention, a user can obtain the text areas in the image automatically and pick the text areas of interest interactively, for applications such as character recognition and translation. The invention can be applied to ordinary text scenes, such as the automatic recognition and translation of guideboards, public notices and newspapers, and is particularly suitable for mobile terminals with a camera function. The invention is convenient to use, requires no complicated auxiliary operations or interactions, narrows the image search range, automatically obtains the text areas the user is interested in, reduces the system's computation time, and can improve positioning accuracy.

Description

OCR recognition method and device with an auxiliary positioning function

Technical field

The present invention relates to the fields of image processing and pattern recognition, and in particular to the detection, location and recognition of text in video and natural scenes.

Background art

OCR technology is increasingly used on devices with an image scanning (or shooting) function, such as mobile intelligent terminals and PDAs. However, because the background of a video image is often complex, the text location step that precedes OCR still presents technical difficulties. The text location result can deviate, so that the characters to be recognized are not detected easily and accurately, or a single text region is wrongly split into several related sub-regions, which affects the continuity of the OCR result and the computational cost. Combined with a low character recognition rate, this makes the final result (such as a translation) unsatisfactory. Some form of auxiliary positioning is therefore needed to improve the accuracy of text location and recognition.

The basic process of current image (or video) text recognition is as follows. The collected text image (or a frame of a video) is first preprocessed (enhancement, filtering and so on), and the page layout is analyzed and understood in order to detect and locate text regions. Character recognition is then performed on each text region, and the recognition result may be corrected by post-processing. The text region location step directly affects the final recognition result and the computational efficiency of the whole system.

An existing mobile phone with an OCR function scans text with its camera and translates between Chinese and English. In use, the user must first aim the phone's camera at the center of the text, with the phone at a perpendicular distance of more than 10 centimeters from the text; the user focuses with the phone's navigation key; the text to be recognized must be taller than the displayed focusing symbol "+"; and for vertically typeset Chinese text, the user must select "vertical text" in a menu. A highlighted band appears in the operation interface to locate the text region to be recognized, and the text inside this band is recognized and translated. This method uses the highlighted band to assist in locating the text region. It requires the user to aim the camera at the center of the text and to hold the phone perpendicular to the text at a certain distance, and it requires a special setting for vertical text regions. It therefore places many restrictions on the user's operation, the system cannot locate text regions automatically, and operation takes a long time.

[CN 1804858 A] describes an auxiliary positioning system for text to be recognized, used to implement an OCR function on a mobile terminal with a camera. In this method a crosshair cursor appears on the screen, and the user moves the cursor so that its origin lies on the text region to be recognized, which provides auxiliary positioning. The user can also align the baseline of the character region with the horizontal axis of the crosshair, so that the baseline is perpendicular to the vertical axis of the crosshair; this prevents tilted text and improves the recognition rate. The method requires the user to adjust the cursor position carefully and can locate only one text region at a time, so the overall running time for location and recognition is long.

[CN 1685358 A] proposes a method for automatically locating text regions in an image. Its steps comprise converting a digital image into a binary image, locating possible text regions, and selecting the actual text regions. The text region location step is characterized by applying a morphological mask to perform morphological operations on the binary image and then generating closed blocks in the image according to certain rules, thereby locating text regions. This method searches the entire image area to locate text regions, so the amount of computation is large and some false and missed locations occur.

[US 7171046] proposes a method for recognizing text in a captured image. Its steps comprise: capturing an image containing text information with a portable device; detecting text regions in the image in real time; adjusting the text region detection result and applying OCR technology to recognize the text; supplementing related external information, including travel information, traffic information and the like; and improving the OCR result with dictionary techniques, then outputting the recognized text and supplementary information, or translating further. An image text detection and recognition system using this method is implemented in a portable device. Because the text region location result is adjusted manually before recognition, the method requires direct user intervention and is inconvenient to use.

Summary of the invention

The object of the present invention is to provide an OCR recognition method and device with an auxiliary positioning function.

According to one aspect of the present invention, an OCR recognition method with an auxiliary positioning function comprises the steps of:

shooting a target and capturing an image containing text;

searching the image area and detecting one or more text regions;

selecting a specific text region; and

recognizing the text in the selected text region.
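The four steps above can be sketched as a small pipeline. This is a minimal illustration only: the class name OcrPipeline, the Box tuple layout and the four stage callables are hypothetical and do not come from the patent, and the stages are stubbed with placeholder functions rather than a real camera, detector or OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical box layout: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class OcrPipeline:
    """Chains the four method steps; every stage is an injected callable."""
    capture: Callable[[], object]                  # shoot and capture an image
    detect: Callable[[object], List[Box]]          # find candidate text regions
    select: Callable[[Sequence[Box]], List[Box]]   # user picks specific regions
    recognize: Callable[[object, Box], str]        # OCR inside one region

    def run(self) -> List[Tuple[Box, str]]:
        image = self.capture()
        candidates = self.detect(image)
        chosen = self.select(candidates)
        # Only the selected regions are passed on to recognition.
        return [(box, self.recognize(image, box)) for box in chosen]


# Stub stages standing in for real hardware and algorithms (assumptions).
pipeline = OcrPipeline(
    capture=lambda: "fake-image",
    detect=lambda image: [(0, 0, 50, 20), (0, 30, 80, 50)],
    select=lambda boxes: [boxes[1]],
    recognize=lambda image, box: f"text@{box}",
)
results = pipeline.run()
print(results)
```

Keeping the selection stage separate from detection reflects the point of the method: recognition runs only on the regions the user is interested in, not on every detected region.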

According to another aspect of the present invention, an OCR recognition method with an auxiliary positioning function comprises the steps of:

clicking one or more points on the screen that lie within a text region;

shooting the image area that includes the clicked positions;

performing text region detection and location on the captured image to obtain candidate text regions; and

performing OCR recognition on the text in the candidate text regions.

According to another aspect of the present invention, an OCR recognition device with an auxiliary positioning function comprises:

an image acquisition unit for obtaining a text image or video containing text;

a text detection and positioning unit for detecting and locating text regions in the image;

a character recognition unit for recognizing the text in the selected region;

a display unit for displaying the captured text image, the user's input, and the results of text detection, location and recognition; and

a storage unit for storing the data required for the operation of each unit.

With the method and device of the present invention, a user can obtain the text regions in an image automatically and pick the text regions of interest interactively, for applications such as text recognition and translation. The invention can be applied to ordinary text scenes, such as the automatic recognition and translation of guideboards, public notices and newspapers, and is particularly suitable for mobile terminals with a camera function. The invention is convenient to use and requires no complicated auxiliary operations or interactions. It narrows the image search range, automatically obtains the text regions the user is interested in, reduces the system's computation time, and can improve location accuracy.

Brief description of the drawings

Fig. 1 shows an OCR recognition device with an auxiliary positioning function;

Fig. 2 is a flowchart of the OCR recognition method in which the user selects a text region;

Fig. 3 is a flowchart of the OCR recognition method in which the user clicks a region containing text.

Detailed description of the embodiments

The device of the present invention consists of an interaction unit, an operation processing unit and a storage unit. The interaction unit captures the text image or video to be recognized, receives and displays information about user operations such as clicking and selecting, sends the received user input to the operation processing unit, and receives and displays information from the operation processing unit. It comprises an image acquisition unit, a display unit and a user input detection unit. The operation processing unit performs text region detection and location on the text image and user input received from the interaction unit, and recognizes the text in the text regions. It comprises a text detection and positioning unit and a character recognition unit. Here, a text region refers to the circumscribed rectangular area containing one or more character blocks.
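The definition of a text region as the circumscribed rectangle of one or more character blocks can be illustrated with a short sketch. The Rect type and the function name circumscribed_rect are hypothetical helpers introduced here for illustration, and the sketch assumes axis-aligned rectangles in pixel coordinates, which the source does not state explicitly.

```python
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def circumscribed_rect(blocks: Iterable[Rect]) -> Rect:
    """Return the smallest axis-aligned rectangle enclosing all character blocks."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("a text region needs at least one character block")
    return Rect(
        left=min(b.left for b in blocks),
        top=min(b.top for b in blocks),
        right=max(b.right for b in blocks),
        bottom=max(b.bottom for b in blocks),
    )


# Three character blocks lying on one text line.
line_blocks = [Rect(10, 20, 30, 44), Rect(34, 18, 52, 44), Rect(56, 22, 70, 46)]
region = circumscribed_rect(line_blocks)
print(region)  # Rect(left=10, top=18, right=70, bottom=46)
```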

The image acquisition unit captures the text image or video to be recognized; examples include a camera, or a mobile phone or notebook computer with a camera function.

The display unit displays the text image or video to be recognized, information about the user's click and selection operations, the text region detection and location results, and the text recognition results.

The user input detection unit receives information about user operations such as clicking and selecting.

The text detection and positioning unit performs text region detection and location according to the information received from the interaction unit, and outputs the location coordinates of the corresponding text regions to the character recognition unit.

The character recognition unit recognizes the text in the text regions according to the text image and location coordinates received from the interaction unit and the text detection and positioning unit, and outputs the result to the display unit.

The storage unit stores the information required for the operation of each unit, including the text image to be recognized, information about user operations such as clicking and selecting, the text region location results, the text recognition results, and other information required by the device and method.

In implementation, the OCR recognition method based on user selection of a text region comprises: starting the camera mode, then shooting and capturing an image containing text, which may be a low-resolution image; searching the image area, performing text region detection and location, and automatically presenting the candidate text regions obtained; selecting a text region from the presented candidates, which the user does by clicking or by moving the focus; and performing OCR recognition on the text in the selected candidate text regions.

In implementation, the OCR recognition method based on the user clicking a region containing text comprises: starting the camera mode, in which the user clicks the screen to shoot and capture a text image, which may be a low-resolution image; performing text region detection and location on the image area that includes the clicked positions; presenting the marked text regions to the user, who selects the text region to be recognized by clicking or by moving the focus; and performing OCR recognition on the text in the selected candidate text regions, or directly on the text in the candidate text regions.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the following description, detailed descriptions of known functions or structures are omitted for clarity and brevity.

The embodiment described in this specification is only one specific embodiment of the invention, and does not mean that the invention can be implemented only in this form.

In this specification, including the claims, a "unit" is composed of modules, and a "component" refers to an entity related to the system of the invention, which may be hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, or a computer. As an example, an application running on a mobile terminal may be a component. A component may in turn contain one or more components.

The terms "comprises", "comprising" and similar terms denote non-exclusive inclusion, so that a method or device comprising a list of components includes not only those components but also other components not listed.

Figure mono-is the first advantageous embodiment of the present invention, possesses the OCR recognition device of auxiliary positioning function, and the input equipment of this device is video capture device, output device be can display graphics interface display device, Identification display is touch-screen in the present embodiment.

The main function of the image acquisition unit 111 is to capture text images or video; examples include a camera, or a mobile phone or notebook computer with a camera function. When the OCR recognition device is started, the user can enable the image acquisition unit 111, which outputs the acquired image or video to the display unit 112. Through the user input detection unit 113, the user controls the photo capture of the image acquisition unit 111 and chooses either to proceed to OCR recognition or to shoot again.

For an image acquired by the image acquisition unit 111, the text detection and positioning unit 121 detects the text regions in the image. Alternatively, the user clicks a region containing text through the user input detection unit 113, and the text detection and positioning unit 121 detects the text regions in the part of the image that includes the clicked positions. The detection result is output to the display unit 112. A text region detection result is usually represented by a rectangle surrounding the text region, and the user can revise the result by editing the position, size and shape of the rectangle through the graphical interface.

Through the user input detection unit 113, the user selects candidate text regions by clicking or by moving the focus, and may select several text regions. The detected text regions are recognized by the character recognition unit 122 and converted into the machine encoding of the corresponding language, such as Unicode, and the recognition result is shown on the graphical interface of the display unit 112. Through the graphical interface the user can delete from, add to, or modify the recognition result, and the text can further be translated into a related language.
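The two selection modes described above, clicking a candidate rectangle and moving a focus between candidates, can be sketched as follows. The function names hit_test, toggle_selection and move_focus are hypothetical, as are the toggle behavior on repeated clicks and the wrap-around of the focus; the source only states that the user selects by clicking or by moving the focus and may select several regions.

```python
from typing import List, Set, Tuple

# Hypothetical box layout: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def hit_test(boxes: List[Box], x: int, y: int) -> List[int]:
    """Indices of the candidate boxes that contain the point (x, y)."""
    return [i for i, (l, t, r, b) in enumerate(boxes) if l <= x < r and t <= y < b]


def toggle_selection(selected: Set[int], boxes: List[Box], x: int, y: int) -> Set[int]:
    """Return a new selection with every box under the click toggled (assumed UI rule)."""
    return selected ^ set(hit_test(boxes, x, y))


def move_focus(focus: int, count: int, step: int) -> int:
    """Move the focus index by step, wrapping around the candidate list (assumed)."""
    return (focus + step) % count


candidates = [(0, 0, 100, 20), (0, 30, 100, 50), (120, 0, 200, 20)]
selection: Set[int] = set()
selection = toggle_selection(selection, candidates, 10, 10)   # selects box 0
selection = toggle_selection(selection, candidates, 150, 5)   # adds box 2
print(sorted(selection))                # [0, 2]
print(move_focus(2, len(candidates), 1))  # 0, focus wraps to the first candidate
```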

Figure bis-is process flow diagrams that user selects text filed OCR recognition methods, and the step comprising is as follows:

1) under image pickup mode (S201), user presses shutter and starts focusing automatically, and camera carries out the operation of focusing automatically, takes and capture the image (S202) that contains word, and this character image can be low-resolution image;

2) character image obtaining is above carried out to the search in global image region, detect text filed (S203), and automatically by the text filed user of being prompted to of the candidate who detects (S204), wherein adopt low-resolution image to carry out detection and location, by the experiment test on 6350 width images, the different resolution image of contrast 400*300 and 1024*768, approximately only have the latter's 20% the former operation time, improved the travelling speed of device;

3) candidate who has detected to user's prompting is text filed, and user, by the mode of click or moving focal point, selects text filed (S205), and supports to select a plurality of text filed;

4) that according to user, selects is text filed, word is wherein carried out to OCR identification (S206), and can further translate.

Wherein step S201 and S202 can carry out in image acquisition units 111, step S203 can carry out in detection and location unit 121, step S204 can carry out in display unit 112, step S205 can carry out in user's input detection unit 113, and step S206 can carry out in word recognition unit 122.
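The resolution comparison in step 2) can be put in context with a little arithmetic. A 400*300 image holds roughly 15% of the pixels of a 1024*768 image, which is in the same range as the reported running time of about 20%. The sketch below only computes that pixel ratio; the function name pixel_count is hypothetical, and the source does not claim that detection cost is strictly proportional to pixel count.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in an image of the given size."""
    return width * height


low = pixel_count(400, 300)      # 120000 pixels
high = pixel_count(1024, 768)    # 786432 pixels
ratio = low / high

# Pixel ratio of the low-resolution image relative to the high-resolution one.
print(f"{low} / {high} = {ratio:.1%}")  # 120000 / 786432 = 15.3%
```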

Figure tri-is process flow diagrams that user clicks the OCR recognition methods that contains character area, and the step comprising is as follows:

1) under image pickup mode (S301), user comprises a text filed point or a plurality of point (S302) by clicking on screen, press shutter and start focusing automatically, camera carries out the operation of focusing automatically, take and capture the image (S303) that contains word, this character image can be low-resolution image, and shutter can click screen by user and start, and user can click a plurality of regions of screen simultaneously;

2) according to user, click the position coordinates of screen, the image obtaining is above processed: can start search from whole image-region, detect and contain the text filed of click coordinate; Also can carry out text filed detection (S304) to including the image-region centered by several clicks place, and automatically by the text filed user of being prompted to of the candidate who detects (S305);

3) candidate who has detected to user's prompting is text filed, and user, by the mode of click or moving focal point, selects text filed (S306), and supports to select a plurality of text filed;

4) text filed (S306) selecting according to user, or detection and location text filed (S305) that arrive, device carries out OCR identification (S307) to word wherein, and can further translate.

Wherein step S301 and S303 can carry out in image acquisition units 111, step S304 can carry out in detection and location unit 121, step S305 can carry out in display unit 112, step S302 and S306 can carry out in user's input detection unit 113, and step S307 can carry out in word recognition unit 122.
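The two alternatives in step 2) can be sketched with plain geometry: either keep only the detected regions that contain a click coordinate, or restrict detection to a window centered on each click. The function names regions_containing and window_around are hypothetical, and the window half-sizes are illustrative assumptions, since the source does not specify how large the area around a click should be.

```python
from typing import List, Tuple

# Hypothetical layouts: Box is (left, top, right, bottom) with right/bottom
# exclusive, Point is (x, y), both in image pixel coordinates.
Box = Tuple[int, int, int, int]
Point = Tuple[int, int]


def regions_containing(boxes: List[Box], clicks: List[Point]) -> List[Box]:
    """Alternative 1: from a whole-image search, keep regions containing any click."""
    return [
        (l, t, r, b)
        for (l, t, r, b) in boxes
        if any(l <= x < r and t <= y < b for x, y in clicks)
    ]


def window_around(click: Point, image_size: Tuple[int, int],
                  half_w: int, half_h: int) -> Box:
    """Alternative 2: search window centered on a click, clamped to the image."""
    x, y = click
    width, height = image_size
    return (max(0, x - half_w), max(0, y - half_h),
            min(width, x + half_w), min(height, y + half_h))


detected = [(0, 0, 100, 20), (0, 30, 100, 50)]
print(regions_containing(detected, [(10, 35)]))     # [(0, 30, 100, 50)]
print(window_around((30, 290), (400, 300), 100, 40))  # (0, 250, 130, 300)
```

The second alternative narrows the search range before detection runs, which is where the reduction in computation described in the summary would come from.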

Claims

1. An OCR recognition method with an auxiliary positioning function, comprising the steps of:

a user starting a camera mode;

clicking one or more points on the screen that lie within a text region;

triggering the shutter and focusing automatically;

shooting and capturing an image containing text;

according to the position coordinates of the clicks on the screen, either searching from the whole image area to detect the text regions that contain the click coordinates, or performing text region detection on the image areas centered on the clicked positions;

presenting the detected candidate text regions to the user;

the user selecting one or more text regions from the candidate text regions by clicking or by moving the focus; and

performing OCR recognition on the selected text regions.

2. The method according to claim 1, characterized in that the captured image is a low-resolution image.

3. The method according to claim 1, characterized by further comprising: translating the recognized text.

4. An OCR recognition device with an auxiliary positioning function, comprising:

an image acquisition unit for obtaining, in camera mode, a text image or video containing text;

a user input detection unit for the user to click one or more points on the screen that lie within a text region, causing the image acquisition unit to trigger the shutter, focus automatically, and shoot and capture an image containing text;

a text detection and positioning unit for, according to the position coordinates of the clicks on the screen, either searching from the whole image area to detect the text regions that contain the click coordinates, or performing text region detection on the image areas centered on the clicked positions;

wherein the user input detection unit further presents the detected candidate text regions to the user, so that the user selects one or more text regions from the candidate text regions by clicking or by moving the focus; the recognition device further comprising:

a character recognition unit for performing OCR recognition on the selected text regions; and

a storage unit for storing the data required for the operation of each unit.

5. The recognition device according to claim 4, characterized by further comprising a graphical interface for displaying the text detection, text recognition and translation results on a display device.

6. The recognition device according to claim 4, characterized in that the recognition device is a mobile phone, a PDA, an intelligent terminal, a camera or a translator.

7. The recognition device according to claim 4, characterized in that the display unit is an LCD display or a touch screen.