CN101246550B - Image character recognition method and device - Google Patents

Info

Publication number
CN101246550B
Authority
CN
China
Prior art keywords
character
information
scanning
database
character information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101016961A
Other languages
Chinese (zh)
Other versions
CN101246550A (en)
Inventor
刘广振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Device Co Ltd
Original Assignee
Huawei Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Device Co Ltd
Priority to CN2008101016961A
Publication of CN101246550A
Application granted
Publication of CN101246550B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The invention relates to an image character recognition method and device. The method comprises: obtaining character image information and storing it in a database; scanning the first line of characters column by column according to the character height, then scanning the next line of characters column by column once that line has been scanned, until all characters have been scanned; and matching the scanned character information against the character information stored in the database and converting each matched piece of character information into the corresponding character. The method and device can improve the accuracy and speed of image character recognition and reduce the development effort.

Description

Image character recognition method and device
Technical field
The present invention relates to the field of character recognition technology, and in particular to an image character recognition method and device.
Background technology
At present, image character recognition technology is applied in many fields. The technology recognizes text information in an image, in particular in an image displayed by an electronic device. Recognizing such text is very difficult for a machine, however, especially for easily confused characters: for example, the digit 0 and the letter O, or the digit 1 and the letter l, are hard to tell apart in an image without contextual information. As a result, common image character recognition techniques cannot currently achieve a 100% recognition rate, and because multiple fonts have to be considered, recognition is also slow.
In addition, testing often requires checking whether the content displayed by the device under test is correct. This poses no difficulty for manual testing, but it is a hard obstacle for automated testing. First, automated testing requires accuracy: even if the recognition accuracy reaches 99%, the test results cannot be trusted. Second, it requires speed: if recognizing each image takes several seconds or even tens of seconds, the technique cannot be used for tests that have speed requirements.
To improve the accuracy and speed of image character recognition, the prior art usually adopts one of the following three methods:
1. The expected display is prepared in advance as a picture, and the target display content is checked automatically by comparing this picture with the picture actually displayed. The check is an exact, pixel-level match. With this method, however, an expected picture must be prepared in advance for every check step. The workload is large, and the pictures can usually be obtained only after the device under test has become mature and stable; this delay lengthens the test cycle and makes risk harder to control. Moreover, some display content cannot be checked automatically by picture comparison alone: for content that involves a time display, for example, an exact reference picture cannot be prepared in advance.
2. Based on rules about the characters, or using methods such as neural networks, each kind of character is analyzed and processed extensively in advance, and fuzzy image processing is applied during recognition. Most methods of this kind are in commercial use and are widely applicable. They require a great deal of analysis of character shapes, however, and are very complex to implement. In addition, because fuzzy processing has to be applied to the image during recognition, a very high recognition accuracy cannot be guaranteed, and the recognition rate is especially low for small fonts.
3. The image data is stored in the usual horizontal scanning order, that is, scanning horizontally first and then vertically (referred to below as the HV scan mode). The scan result is binary data, which is then matched to recognize the characters in the image. This approach turns two-dimensional image data into one-dimensional data, so that ordinary one-dimensional data processing can be used for recognition. Its drawback is that the data of each character is split into several disconnected segments, which greatly hinders the subsequent matching.
Therefore, an accurate and fast image character recognition method and device is currently lacking.
Summary of the invention
Embodiments of the present invention provide an image character recognition method and device for recognizing characters in images accurately and quickly.
An embodiment of the present invention provides an image character recognition method, which specifically comprises:
scanning the first line of characters column by column according to the character height and, after that line has been scanned, scanning the next line of characters column by column, until all characters have been scanned;
matching the scanned character information against the character information stored in a database, and converting the matched character information into the character corresponding to that character information, wherein the character information stored in the database is the character information of single characters.
The above method improves the accuracy and speed of image character recognition and reduces the development effort.
An embodiment of the present invention provides an image character recognition device, which specifically comprises:
an acquisition module, configured to obtain the character image information of single characters;
a storage module, configured to store the obtained character image information in a database;
a scanning module, configured to scan the first line of characters column by column according to the character height and, after that line has been scanned, to scan the next line of characters column by column, until all characters have been scanned;
a conversion module, configured to match the scanned character information against the character information stored in the database, and to convert the matched character information into the character corresponding to that character information.
The above device improves the accuracy and speed of image character recognition and reduces the development effort.
The technical solutions of the embodiments of the present invention are described in further detail below with reference to the drawings and embodiments.
Description of drawings
Fig. 1 is a flowchart of an embodiment of the image character recognition method of the present invention;
Fig. 2 is a schematic diagram of an embodiment of an image containing characters according to the present invention;
Fig. 3 is a schematic structural diagram of an embodiment of the image character recognition device of the present invention.
Embodiment
As shown in Fig. 1, which is a flowchart of an embodiment of the image character recognition method of the present invention, the method specifically comprises:
Step 101: obtain character image information and store it in a database.
The character image information can be obtained in several ways: for example, it can be taken directly from a font library, or the displayed image of each character can be captured actively. So that the scanned characters match the character images in the database more readily, each obtained character image should be stored in the database in the form of its column-by-column scan result. This step is optional: if the character image information is already stored in the database, it need not be performed.
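The storage format described in step 101 can be sketched in Python as follows. This is only an illustration under stated assumptions: the glyph bitmaps, the 4-pixel character height, the names `column_scan` and `GLYPHS`, and the use of a plain dictionary as the "database" are all hypothetical and do not come from the patent.

```python
def column_scan(glyph_rows):
    """Serialize a glyph bitmap column by column.

    glyph_rows is a list of equal-length strings ('1' = ink, '0' = background).
    Columns are read left to right, and each column top to bottom.
    """
    height = len(glyph_rows)
    width = len(glyph_rows[0])
    return "".join(glyph_rows[y][x] for x in range(width) for y in range(height))


# Hypothetical 4-pixel-high glyphs standing in for a real font library.
GLYPHS = {
    "I": ["1", "1", "1", "1"],
    "L": ["10", "10", "10", "11"],
}

# The "database": column-scan result -> single character.
database = {column_scan(rows): char for char, rows in GLYPHS.items()}

print(database)  # {'1111': 'I', '11110001': 'L'}
```

Storing the templates in the same column-major form that the scanner produces means that matching can be a direct comparison of bit strings, with no conversion at recognition time.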
Step 102: scan the first line of characters column by column according to the character height and, after that line has been scanned, scan the next line of characters column by column, until all characters have been scanned.
For an electronic device that displays text, the height of the characters to be displayed is known in advance, so the first line of characters is scanned column by column according to that height, and then the next line, until all characters have been scanned. Fig. 2 is a schematic diagram of an embodiment of an image containing characters. Each line of text in this image is 16 bits high, so each column of pixels is scanned in units of 16 bits until the first line of text, "Chinese character", has been scanned across its full width. The character information of "Chinese character" is as follows:
0000100000100000000001100010000010000000011111100110000110000000000001100000001000100000000001000011100000000100001001110000100000100000110100000010000000100000001000001101000000100111000010000011100000001100001000000000011000000000000001000000000000000000 0000000000000000000010000100000000110000010000000010010 001000000001001000100000000100100010000101010010001000001011001001 11111100010010101000000001001100100000000100100010000000010000 0010000000010100001000000001100000100000000000000010000000000000000000000
Then the next line of text, "the word Chinese" (the same two characters in reverse order), is scanned column by column. Its character information is as follows:
00000000000000000000100001000000001100000100000000100100 0100000000100100010000000010010001000010101001000100000101100 10011111110001001010100000000100110010000000010010001000000001000 000100000000101000010000000011000001000000000000000100000000000000000000000000100000100000000001100010000010000000011111100110000110000000000001100000001000100000000001000011100000000100001001110000100000100000110100000010000000100000001000001101000000100111000010000011100000001100001000000000011000000000000001000000000000000000
In the data above, the underlined portion describes the character "word", and the portion without underlining describes the character "Chinese".
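As a rough illustration of the column-by-column scan described above, the following Python sketch serializes one line of text from a binary image. The function name `scan_line`, the representation of pixel rows as strings of '0' and '1', and the 4-pixel character height are assumptions made for the example (the patent's own example uses text 16 bits high).

```python
def scan_line(pixel_rows, top, char_height):
    """Scan one text line column by column.

    pixel_rows  -- list of equal-length strings, '1' = ink, '0' = background
    top         -- index of the first pixel row of the text line
    char_height -- known height of the characters, in pixels
    Returns one bit string per pixel column, each read top to bottom.
    """
    band = pixel_rows[top:top + char_height]
    width = len(band[0])
    return ["".join(row[x] for row in band) for x in range(width)]


# Hypothetical 4-pixel-high image: an "L" shape, a blank column, then a bar.
image = [
    "1001",
    "1001",
    "1001",
    "1101",
]

columns = scan_line(image, top=0, char_height=4)
column_stream = "".join(columns)  # column-major: each glyph stays contiguous
row_stream = "".join(image)       # row-major (HV) order, for comparison

print(column_stream)  # 1111000100001111
print(row_stream)     # 1001100110011101
```

In the column-major stream the first eight bits belong entirely to the "L" shape, whereas in the row-major stream the bits of the two shapes are interleaved, which is the drawback attributed to the HV scan mode in the background section.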
In addition, if the characters within a line differ in height, the line is scanned column by column according to the maximum character height in that line. If the spacing between lines of characters varies, the first row of pixels of the first line of characters is scanned according to the character height; if the scan result of that row of pixels is empty, scanning continues with the next row of pixels. Once a row of pixels is found to contain content to be recognized, the line of characters in which that row lies is scanned column by column according to the character height, after which the first row of pixels of the next line of characters is scanned in the same way, until all characters have been scanned. If other special symbols appear among the characters, for example a blinking cursor, they can simply be ignored.
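The handling of uneven line spacing can be sketched as a search for the pixel row at which each text line starts. The Python below is one possible reading of the paragraph above, not the patent's implementation; `find_line_tops`, the string-based pixel rows and the 2-pixel character height are hypothetical.

```python
def find_line_tops(pixel_rows, char_height):
    """Return the index of the first pixel row of each text line.

    Empty pixel rows are skipped one at a time. As soon as a row containing
    ink ('1') is found, it is taken as the top of a text line, and the search
    resumes char_height rows further down.
    """
    tops = []
    y = 0
    while y < len(pixel_rows):
        if "1" in pixel_rows[y]:
            tops.append(y)
            y += char_height   # jump past the text line just found
        else:
            y += 1             # empty row: keep looking
    return tops


# Hypothetical image with two 2-pixel-high text lines and uneven gaps.
image = [
    "000",
    "010",
    "010",
    "000",
    "000",
    "111",
    "101",
]

print(find_line_tops(image, char_height=2))  # [1, 5]
```

Each returned index can then serve as the starting row for a column-by-column scan of that text line.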
Step 103: match the scanned character information against the character information stored in the database, and convert the matched character information into the character corresponding to that character information.
In the scanned character information, if the width between two pixel columns that contain no content to be recognized is judged to be less than a set width, the content between those two pixel columns is recognized as a space. In addition, when the scanned character information is matched against the character information stored in the database, if the matched character information corresponds to several candidate characters, it is converted into the longest of them. For example, if the scanned character information could be recognized either as two single quotation marks or as one double quotation mark, it is recognized as one double quotation mark.
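The preference for the longest candidate can be illustrated with a greedy longest-match loop over the scanned pixel columns. This Python sketch is an interpretation under assumptions: the function `recognize`, the `max_width` parameter, the 4-pixel-high quotation mark glyphs and the dictionary database are hypothetical, and space detection is deliberately left out (unmatched columns are simply skipped).

```python
def recognize(columns, database, max_width):
    """Convert scanned pixel columns to text, preferring the widest match.

    columns   -- list of bit strings, one per pixel column
    database  -- dict mapping a column-scan bit string to a single character
    max_width -- width in columns of the widest glyph in the database
    """
    text = []
    i = 0
    while i < len(columns):
        # Try the widest window first so that longer glyphs win.
        for w in range(min(max_width, len(columns) - i), 0, -1):
            key = "".join(columns[i:i + w])
            if key in database:
                text.append(database[key])
                i += w
                break
        else:
            i += 1  # nothing matches here (e.g. a blank column): skip it
    return "".join(text)


# Hypothetical glyphs: a single quote is one column wide; a double quote is
# two such strokes separated by one blank column.
database = {
    "1100": "'",
    "110000001100": '"',
}

scanned = ["1100", "0000", "1100"]
print(recognize(scanned, database, max_width=3))  # "
```

If the loop tried the narrowest window first, the same input would come out as two single quotation marks, which is the ambiguity that the rule in the text resolves.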
In the above embodiment of the image character recognition method, the obtained character information is stored in the database in the form of its column-by-column scan result, which makes recognition of the scanned image text fast. Scanning the first line of characters column by column according to the character height, and then the next line once that line has been scanned, until all characters have been scanned, keeps the data of each character in contiguous segments. The accuracy and speed of image character recognition are thereby improved, and the development effort is reduced.
As shown in Fig. 3, which is a schematic structural diagram of an embodiment of the image character recognition device of the present invention, the device specifically comprises: an acquisition module 1, configured to obtain character image information; a storage module 2, configured to store the obtained character image information in a database; a scanning module 3, configured to scan the first line of characters column by column according to the character height and, after that line has been scanned, to scan the next line of characters column by column, until all characters have been scanned; and a conversion module 4, configured to match the scanned character information against the character information stored in the database and to convert the matched character information into the character corresponding to that character information.
The text here includes Chinese characters, letters, digits, spaces, various punctuation marks and the like, and the storage module stores each character image in the database in the form of its column-by-column scan result.
In addition, the conversion module may specifically be a recognition module, configured to recognize the content between two pixel columns as a space when the width between the two pixel columns that contain no content to be recognized after scanning is less than a set width. The conversion module may also specifically be a selection-and-conversion module, configured to match the scanned character information against the character information stored in the database and, if the matched character information corresponds to several candidate characters, to convert it into the longest of them.
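To show how the four modules might fit together, here is a compact Python sketch that folds them into one class. It is a toy model under assumptions, not the device of Fig. 3: the class name `ImageTextRecognizer`, its method names, the 4-pixel character height and the sample glyphs are all hypothetical, and the space rule is omitted.

```python
class ImageTextRecognizer:
    """Toy model of the four modules; all names here are illustrative."""

    def __init__(self, char_height):
        self.char_height = char_height
        self.database = {}  # column-scan bit string -> character

    @staticmethod
    def _columns(rows):
        # One bit string per pixel column, read top to bottom.
        return ["".join(row[x] for row in rows) for x in range(len(rows[0]))]

    def store(self, char, glyph_rows):
        """Acquisition + storage: keep the glyph as its column-scan result."""
        self.database["".join(self._columns(glyph_rows))] = char

    def scan(self, pixel_rows, top=0):
        """Scanning: read one text line column by column."""
        return self._columns(pixel_rows[top:top + self.char_height])

    def convert(self, columns):
        """Conversion: match scanned columns, trying the widest glyph first."""
        max_width = max(len(key) for key in self.database) // self.char_height
        text, i = [], 0
        while i < len(columns):
            for w in range(min(max_width, len(columns) - i), 0, -1):
                char = self.database.get("".join(columns[i:i + w]))
                if char is not None:
                    text.append(char)
                    i += w
                    break
            else:
                i += 1  # no glyph starts here; skip this column
        return "".join(text)


recognizer = ImageTextRecognizer(char_height=4)
recognizer.store("I", ["1", "1", "1", "1"])
recognizer.store("L", ["10", "10", "10", "11"])

screen = ["1001", "1001", "1001", "1101"]
result = recognizer.convert(recognizer.scan(screen))
print(result)  # LI
```

Because the stored templates and the scanned data share the same column-major format, the conversion step reduces to dictionary lookups on concatenated column strings.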
In the above device, the acquisition module obtains the character image information, the storage module stores it, the scanning module then scans all the character information in the image, and finally the conversion module matches the scanned character information against the character information stored in the database and converts the matched character information into the corresponding characters. The device can be used in automated testing to check the correctness of displayed information, for example whether the time currently displayed uses a 12-hour format, or whether the alarm message currently displayed is correct. It can also be used to recognize the text in an image obtained by a screen capture on a computer. It further allows a font library provider to offer image character recognition for all of its own fonts without a large increase in cost.
In the above embodiment of the image character recognition device, the data of each character is kept in contiguous segments, and the format in which the storage module stores the character image information is the same as the result produced when the scanning module scans character image information. The accuracy and speed of image character recognition are thereby improved, and the development effort is reduced.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. An image character recognition method, characterized by comprising:
scanning the first line of characters column by column according to the character height and, after that line has been scanned, scanning the next line of characters column by column, until all characters have been scanned;
matching the scanned character information against the character information stored in a database, and converting the matched character information into the character corresponding to that character information, wherein the character information stored in the database is the character information of single characters.
2. The image character recognition method according to claim 1, characterized in that, before the scanning of the first line of characters column by column according to the character height and the scanning of the next line of characters column by column after that line has been scanned, until all characters have been scanned, the method further comprises:
obtaining character image information, and storing the character image information in the database.
3. The image character recognition method according to claim 2, characterized in that storing the character image information in the database specifically comprises:
storing in the database the scan result obtained by scanning the character image information column by column.
4. The image character recognition method according to any one of claims 1 to 3, characterized in that the scanning of the first line of characters column by column according to the character height and the scanning of the next line of characters column by column after that line has been scanned, until all characters have been scanned, specifically comprises:
when the character heights within a line differ, scanning the first line of characters column by column according to the maximum character height and, after that line has been scanned, scanning the next line of characters column by column according to the maximum character height, until all characters have been scanned.
5. The image character recognition method according to any one of claims 1 to 3, characterized in that the scanning of the first line of characters column by column according to the character height and the scanning of the next line of characters column by column after that line has been scanned, until all characters have been scanned, specifically comprises:
scanning the first row of pixels of the first line of characters according to the character height; if the scan result of that row of pixels is empty, continuing to scan the next row of pixels; once a row of pixels is found to contain content to be recognized, scanning the line of characters in which that row of pixels lies column by column according to the character height, and then scanning the first row of pixels of the next line of characters, until all characters have been scanned.
6. The image character recognition method according to claim 5, characterized in that matching the scanned character information against the character information stored in the database and converting the matched character information into the character corresponding to that character information specifically comprises:
matching the scanned character information against the character information stored in the database, and, when the width between two pixel columns that contain no content to be recognized after scanning is less than a set width, recognizing the content between the two pixel columns as a space.
7. The image character recognition method according to claim 5, characterized in that matching the scanned character information against the character information stored in the database and converting the matched character information into the character corresponding to that character information specifically comprises:
matching the scanned character information against the character information stored in the database, and, if the matched character information corresponds to several candidate characters, converting the matched character information into the longest of them.
8. An image character recognition device, characterized by comprising:
an acquisition module, configured to obtain the character image information of single characters;
a storage module, configured to store the obtained character image information in a database;
a scanning module, configured to scan the first line of characters column by column according to the character height and, after that line has been scanned, to scan the next line of characters column by column, until all characters have been scanned;
a conversion module, configured to match the scanned character information against the character information stored in the database, and to convert the matched character information into the character corresponding to that character information.
9. The image character recognition device according to claim 8, characterized in that the text comprises Chinese characters, letters, digits, spaces and punctuation marks.
10. The image character recognition device according to claim 8 or 9, characterized in that the conversion module specifically comprises a recognition module, configured to recognize the content between two pixel columns as a space when the width between the two pixel columns that contain no content to be recognized after scanning is less than a set width.
11. The image character recognition device according to claim 10, characterized in that the conversion module specifically comprises a selection-and-conversion module, configured to match the scanned character information against the character information stored in the database and, if the matched character information corresponds to several candidate characters, to convert the matched character information into the longest of them.
CN2008101016961A 2008-03-11 2008-03-11 Image character recognition method and device Expired - Fee Related CN101246550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101016961A CN101246550B (en) 2008-03-11 2008-03-11 Image character recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101016961A CN101246550B (en) 2008-03-11 2008-03-11 Image character recognition method and device

Publications (2)

Publication Number Publication Date
CN101246550A CN101246550A (en) 2008-08-20
CN101246550B true CN101246550B (en) 2011-05-18

Family

ID=39946989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101016961A Expired - Fee Related CN101246550B (en) 2008-03-11 2008-03-11 Image character recognition method and device

Country Status (1)

Country Link
CN (1) CN101246550B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101771752A (en) * 2009-12-29 2010-07-07 中兴通讯股份有限公司 Mobile phone TV text information extraction method and mobile terminal with same
CN103345646B (en) * 2013-06-06 2016-05-18 天地融科技股份有限公司 Stamp and the method and apparatus that obtains stamp information
US20150006361A1 (en) * 2013-06-28 2015-01-01 Google Inc. Extracting Card Data Using Three-Dimensional Models
CN105631486A (en) * 2014-10-27 2016-06-01 深圳Tcl数字技术有限公司 image character recognition method and device
CN105809170B (en) * 2016-03-04 2019-04-26 东软集团股份有限公司 Character identifying method and device
CN108062301B (en) * 2016-11-08 2021-11-05 希思特兰国际 Character translation method and device
CN109409370B (en) * 2017-08-18 2022-02-18 深圳市傲冠软件股份有限公司 Remote desktop character recognition method and device
CN109522553B (en) * 2018-11-09 2020-02-11 龙马智芯(珠海横琴)科技有限公司 Named entity identification method and device
CN111914513A (en) * 2019-05-08 2020-11-10 亿阳安全技术有限公司 RDP window title character recognition method and device
CN111753827B (en) * 2020-05-15 2024-02-13 中国科学院信息工程研究所 Scene text recognition method and system based on semantic enhancement encoder and decoder framework
CN113239929A (en) * 2021-04-30 2021-08-10 南京钢铁股份有限公司 Method for identifying numbers in computer display screen

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276790A (en) * 1991-07-12 1994-01-04 Destiny Technology Corporation Fast vertical scan-conversion and filling method and apparatus for outline font character generation in dot matrix devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
US 5276790 A, full text.

Also Published As

Publication number Publication date
CN101246550A (en) 2008-08-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20171031

Address after: Metro Songshan Lake high tech Industrial Development Zone, Guangdong Province, Dongguan City Road 523808 No. 2 South Factory (1) project B2 -5 production workshop

Patentee after: Huawei terminal (Dongguan) Co.,Ltd.

Address before: 518129 Longgang District, Guangdong, Bantian HUAWEI base B District, building 2, building No.

Patentee before: HUAWEI DEVICE Co.,Ltd.

CP01 Change in the name or title of a patent holder

Address after: 523808 Southern Factory Building (Phase I) Project B2 Production Plant-5, New Town Avenue, Songshan Lake High-tech Industrial Development Zone, Dongguan City, Guangdong Province

Patentee after: HUAWEI DEVICE Co.,Ltd.

Address before: 523808 Southern Factory Building (Phase I) Project B2 Production Plant-5, New Town Avenue, Songshan Lake High-tech Industrial Development Zone, Dongguan City, Guangdong Province

Patentee before: Huawei terminal (Dongguan) Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110518