Background technology
At present, image character recognition technology is applied in many fields. It identifies the text contained in a picture, in particular in a picture displayed by an electronic device. Recognizing this text is very difficult for a machine, however, especially for easily confused characters. For example, the digit 0 and the letter O, or the digit 1 and the letter l, are hard to distinguish in a picture without contextual information. As a result, common image character recognition techniques cannot yet achieve a 100% recognition rate, and because many fonts must be taken into account, recognition is also slow.
In addition, test work often requires checking whether the content displayed by the device under test is correct. This poses no difficulty for manual testing, but it is a hard gap to bridge for automated testing. First, automated testing requires accuracy: even if character recognition accuracy reaches 99%, the test results cannot be trusted. Second, it requires speed: if recognizing each picture takes several seconds or even tens of seconds, the approach is unsuitable for tests with strict timing requirements.
To improve the accuracy and speed of image character recognition, the prior art usually adopts one of the following three methods:
One. The expected display is prepared as a picture in advance, and the target display content is verified automatically by comparing this picture with the actually displayed picture. This check gives 100% accurate pixel-level matching. With this method, however, an expected picture must be prepared in advance for every check step, which is a heavy workload. Moreover, these pictures can usually be obtained only after the device under test is mature and stable, and this schedule lag lengthens the test cycle and makes risk harder to control. In addition, some display content cannot be verified automatically by simple picture comparison at all; for example, an accurate reference picture cannot be prepared in advance for content that shows the time.
Two. Based on character rules, or using methods such as neural networks, extensive analysis and processing is performed in advance on every character, and image blurring is applied during recognition. Most of these methods are in commercial use and are widely applicable. However, they require a great deal of work on analyzing character shapes, and their implementation is very complex. In addition, because the image must be blurred during recognition, high recognition accuracy cannot be guaranteed, and the recognition rate is especially low for small fonts.
Three. The image data is stored in the usual horizontal scanning order, that is, each pixel row is scanned horizontally before moving vertically to the next row (hereinafter the HV scan mode). The scan result is binary data, which is then matched to recognize the characters in the picture. This approach flattens two-dimensional image data into one-dimensional data, so ordinary one-dimensional data-processing techniques can be used to achieve recognition. Its drawback, however, is that the data of each character is split into several disjoint segments, which greatly hinders the subsequent matching.
Therefore, an accurate and fast image character recognition method and apparatus are currently lacking.
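The drawback of the HV scan mode, and how column-by-column scanning avoids it, can be illustrated with a small sketch. The Python below is an illustration only and is not taken from the source: for a hypothetical 16-pixel-high line holding two 16 × 16 characters, it computes where each character's pixels land in the one-dimensional scan output and counts how many disjoint runs they form. All helper names are hypothetical.

```python
from typing import Callable, Iterable

# index_fn(row, col, width, height) -> position of that pixel in the 1-D scan output
IndexFn = Callable[[int, int, int, int], int]


def hv_index(row: int, col: int, width: int, height: int) -> int:
    """Row-major ("HV") order: a whole pixel row, then the next row."""
    return row * width + col


def column_index(row: int, col: int, width: int, height: int) -> int:
    """Column-major order: a whole pixel column (top to bottom), then the next column."""
    return col * height + row


def count_runs(indices: Iterable[int]) -> int:
    """Number of maximal runs of consecutive integers among the indices."""
    runs, prev = 0, None
    for i in sorted(indices):
        if prev is None or i != prev + 1:
            runs += 1
        prev = i
    return runs


def char_runs(index_fn: IndexFn, first_col: int, char_width: int,
              line_width: int, line_height: int) -> int:
    """How many disjoint pieces one character's pixels form in the 1-D scan."""
    positions = [index_fn(r, c, line_width, line_height)
                 for r in range(line_height)
                 for c in range(first_col, first_col + char_width)]
    return count_runs(positions)


if __name__ == "__main__":
    # A 16-pixel-high line holding two 16-pixel-wide characters.
    for name, fn in (("HV scan", hv_index), ("column scan", column_index)):
        print(name, [char_runs(fn, k * 16, 16, 32, 16) for k in range(2)])
```

Run as a script, this prints `HV scan [16, 16]` and `column scan [1, 1]`: under HV order each character is cut into one piece per pixel row, whereas under column order each character occupies one contiguous stretch of the output.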
Summary of the invention
Embodiments of the present invention provide an image character recognition method and apparatus that recognize the characters in a picture accurately and quickly.
An embodiment of the present invention provides an image character recognition method, which specifically comprises:
scanning, according to the height of the characters, the first line of characters column by column and, after this line has been scanned, scanning the next line of characters column by column, until all characters have been scanned;
matching the scanned character information against the character information stored in a database, and converting the matched character information into the corresponding characters, where the character information stored in the database is the character information of individual characters.
The above method improves both the accuracy and the speed of image character recognition and shortens the development schedule.
An embodiment of the present invention provides an image character recognition apparatus, which specifically comprises:
an acquisition module, configured to obtain the character graphic information of individual characters;
a storage module, configured to store the obtained character graphic information in a database;
a scanning module, configured to scan, according to the height of the characters, the first line of characters column by column and, after this line has been scanned, to scan the next line of characters column by column, until all characters have been scanned;
a conversion module, configured to match the scanned character information against the character information stored in the database, and to convert the matched character information into the corresponding characters.
The above apparatus improves both the accuracy and the speed of image character recognition and shortens the development schedule.
The technical solutions of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
Embodiment
Figure 1 is a flowchart of an embodiment of the image character recognition method of the present invention. The method specifically comprises:
Step 101: obtain character graphic information and store it in a database.
The character graphic information can be obtained in several ways; for example, it can be taken directly from a font library, or the display image of each character can be captured actively. So that the scanned character graphics match the character graphics in the database well, the obtained character graphics should be stored in the database in the same format as the result of column-by-column scanning. This step is optional: if the character graphic information is already stored in the database, it need not be performed.
Step 102: according to the height of the characters, scan the first line of characters column by column; after this line has been scanned, scan the next line of characters column by column, until all characters have been scanned.
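Step 101 can be sketched as follows, assuming each glyph is available as a small 0/1 pixel array (for example, rendered from a font library or captured from the display). The dictionary layout and the names `column_bits` and `build_database` are hypothetical. The point is only that the stored key uses the same column-by-column order as the later scan, so matching reduces to comparing bit strings.

```python
from typing import Dict, List

Glyph = List[List[int]]  # glyph[y][x]: 1 = ink, 0 = background


def column_bits(glyph: Glyph) -> str:
    """Serialize a glyph column by column, each column read top to bottom."""
    height, width = len(glyph), len(glyph[0])
    return "".join(str(glyph[y][x]) for x in range(width) for y in range(height))


def build_database(glyphs: Dict[str, Glyph]) -> Dict[str, str]:
    """Map each glyph's column-scan bit string to its character."""
    db: Dict[str, str] = {}
    for ch, glyph in glyphs.items():
        key = column_bits(glyph)
        if key in db and db[key] != ch:
            # Two characters rendered identically cannot be told apart by exact matching.
            raise ValueError(f"{db[key]!r} and {ch!r} have identical glyphs")
        db[key] = ch
    return db


# Toy 3-pixel-high glyphs for the digit 1 and the letter l (hypothetical shapes).
db = build_database({"1": [[0, 1], [1, 1], [0, 1]], "l": [[1], [1], [1]]})
# db == {"010111": "1", "111": "l"}
```

Because the glyphs come from the actual display font, easily confused pairs such as 1 and l normally get distinct keys; the duplicate check flags the cases where they do not.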
For an electronic device that displays text, the height of the characters to be displayed is known in advance. Each line of characters is therefore scanned column by column according to this height, then the next line, until all characters have been scanned. Figure 2 is a schematic diagram of an embodiment of a picture containing characters according to the present invention. In this picture each line of text is 16 bits high, so each pixel column is scanned with a height of 16 bits, proceeding horizontally until the first line of text, "汉字" ("Chinese characters"), has been scanned completely. The character information of "汉字" is as follows:
0000100000100000000001100010000010000000011111100110000110000000000001100000001000100000000001000011100000000100001001110000100000100000110100000010000000100000001000001101000000100111000010000011100000001100001000000000011000000000000001000000000000000000
0000000000000000000010000100000000110000010000000010010001000000001001000100000000100100010000101010010001000001011001001111111000100101010000000010011001000000001001000100000000100000010000000010100001000000001100000100000000000000010000000000000000000000
Next, the following line of text, "字汉" (the same two characters in reverse order), is scanned column by column. The character information of "字汉" is as follows:
000000000000000000001000010000000011000001000000001001000100000000100100010000000010010001000010101001000100000101100100111111100010010101000000001001100100000000100100010000000010000001000000001010000100000000110000010000000000000001000000000000000000000000001000001000000000011000100000100000000111111001100001100000000000011000000010001000000000010000111000000001000010011100001000001000001101000000100000001000000010000011010000001001110000100000111000000011000010000000000110000000000000010000000000000000000
In the original figure, the underlined portion of this data describes the character 字 and the portion without underlining describes the character 汉. Because the line is scanned from left to right, the 字 portion comes first and occupies 256 bits (16 columns × 16 bits).
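Scanning one text line column by column can be sketched as follows. The sketch assumes the captured picture is available as a list of pixel rows, with 1 for ink and 0 for background, and that the line's top row and height (16 in the Figure 2 example) are known. `scan_line` is a hypothetical helper. Concatenating its output gives one bit string per line in the same layout as the data above, one `height`-bit group per pixel column.

```python
from typing import List

Image = List[List[int]]  # image[y][x]: 1 = ink, 0 = background


def scan_line(image: Image, top: int, height: int) -> List[str]:
    """Scan rows top .. top+height-1 one pixel column at a time, left to right.

    Each returned string holds `height` bits for one column, read top to bottom.
    """
    width = len(image[0])
    return ["".join(str(image[y][x]) for y in range(top, top + height))
            for x in range(width)]


image = [[1, 0],
         [0, 1],
         [1, 1]]
columns = scan_line(image, top=0, height=3)   # ["101", "011"]
line_bits = "".join(columns)                  # "101011"
```

With `height=16` and a 16-column-wide glyph, each character contributes 16 × 16 = 256 bits to the line's bit string.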
In addition, when characters are scanned, if the characters in one line differ in height, the line is scanned column by column according to the maximum character height in that line. If the spacing between lines varies, the first pixel row of the first line of characters is scanned first. If that pixel row is empty, the next pixel row is scanned, and so on, until a pixel row containing content to be recognized is found. The line of characters containing that pixel row is then scanned column by column according to the character height. After that, the search continues in the same way from the first pixel row of the next line, until all characters have been scanned. Other special symbols interspersed among the characters, such as a blinking cursor, can simply be ignored.
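The handling of variable line spacing can be sketched as follows. The sketch assumes every line is scanned with one fixed height (the maximum character height, as described above) and that the first non-empty pixel row found marks the top of a text line. It is a simplified illustration with hypothetical names. It does not handle the blinking-cursor case. A line whose topmost inked row lies below the top of its character cell would also be shifted, so the stored glyphs would need to be aligned the same way.

```python
from typing import List

Image = List[List[int]]  # image[y][x]: 1 = ink, 0 = background


def scan_all_lines(image: Image, line_height: int) -> List[str]:
    """Return one column-scan bit string per text line found in the image.

    line_height is the fixed scanning height, e.g. the maximum character
    height of the display. Blank pixel rows between lines are skipped, so
    the line spacing may vary.
    """
    lines: List[str] = []
    n_rows = len(image)
    width = len(image[0]) if image else 0
    y = 0
    while y < n_rows:
        if not any(image[y]):
            y += 1                      # empty pixel row: keep looking downward
            continue
        band = range(y, min(y + line_height, n_rows))
        # Column by column, each column read top to bottom within the band.
        lines.append("".join(str(image[r][x]) for x in range(width) for r in band))
        y += line_height                # continue below the line just scanned
    return lines


# Two 2-pixel-high lines separated by uneven blank gaps (toy data).
image = [[0, 0],
         [1, 0],
         [0, 1],
         [0, 0],
         [0, 0],
         [1, 1],
         [0, 1]]
lines = scan_all_lines(image, line_height=2)   # ["1001", "1011"]
```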
Step 103: match the scanned character information against the character information stored in the database, and convert the matched character information into the corresponding characters.
In the scanned character information, if the width between two pixel columns that contain no content to be recognized is less than a set width, the content to be recognized between these two pixel columns is recognized as a space. In addition, when the scanned character information is matched against the character information stored in the database and the matched information corresponds to more than one character sequence, it is converted to the candidate with the greatest character length. For example, if the scanned character information could be recognized either as two single quotation marks or as one double quotation mark, it is recognized as one double quotation mark.
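The longest-match rule can be sketched as a greedy pass over the column sequence. The sketch makes several hypothetical assumptions. Each database entry is stored as a list of per-column bit strings, and blank columns between characters are simply skipped. The database is also assumed to contain no all-blank glyph, because spaces are handled by the separate width rule. In the toy data, a single quotation mark is one column and a double quotation mark is quote, gap, quote. The input `["110", "000", "110"]` is therefore read as one double quotation mark rather than two single ones.

```python
from typing import Dict, List, Optional


def convert(columns: List[str], glyphs: Dict[str, List[str]]) -> str:
    """Greedily convert a column-scan sequence to text, preferring the widest match."""
    blank = "0" * len(columns[0]) if columns else ""
    out: List[str] = []
    i = 0
    while i < len(columns):
        best_char: Optional[str] = None
        best_width = 0
        for ch, cols in glyphs.items():
            width = len(cols)
            # Keep the candidate only if it matches here and is wider than the best so far.
            if width > best_width and columns[i:i + width] == cols:
                best_char, best_width = ch, width
        if best_char is not None:
            out.append(best_char)
            i += best_width
        elif columns[i] == blank:
            i += 1                      # gap column between characters
        else:
            out.append("?")             # no stored glyph matches at this column
            i += 1
    return "".join(out)


glyphs = {"'": ["110"], '"': ["110", "000", "110"]}
one_double = convert(["110", "000", "110"], glyphs)          # '"'
two_single = convert(["110", "000", "000", "110"], glyphs)   # "''"
```

Trying every stored glyph at each position is the simplest form. A larger database would more likely index glyphs by their first column.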
In the above embodiment of the image character recognition method, the obtained character information is stored in the database in the same format as the result of column-by-column scanning, which makes recognition of the scanned picture fast. In addition, each line of characters is scanned column by column according to the character height before the next line is scanned, until all characters have been scanned. As a result, the data of each character is divided into consecutive, adjacent segments rather than disjoint ones. This improves both the accuracy and the speed of image character recognition and shortens the development schedule.
Figure 3 is a schematic structural diagram of an embodiment of the image character recognition apparatus of the present invention. The apparatus specifically comprises the following modules. An acquisition module 1 is configured to obtain character graphic information. A storage module 2 is configured to store the obtained character graphic information in a database. A scanning module 3 is configured to scan, according to the height of the characters, the first line of characters column by column and, after this line has been scanned, the next line column by column, until all characters have been scanned. A conversion module 4 is configured to match the scanned character information against the character information stored in the database, and to convert the matched character information into the corresponding characters.
The above characters include Chinese characters, letters, digits, spaces, various punctuation marks, and so on. The storage module stores character images in the database in the same format as the result of column-by-column scanning.
In addition, the conversion module may specifically be a recognition module. When the width between two pixel columns that contain no content to be recognized is less than a set width, this module recognizes the content to be recognized between these two pixel columns as a space. The conversion module may also specifically be a selection-conversion module. This module matches the scanned character information against the character information stored in the database and, if the matched information corresponds to more than one character sequence, converts it to the candidate with the greatest character length.
In the above apparatus, the acquisition module obtains character information and the storage module stores the obtained character graphic information. The scanning module then scans all character information in the picture. Finally, the conversion module matches the scanned character information against the character information stored in the database and converts the matched information into the corresponding characters. The apparatus can be used in automated testing to check the correctness of displayed information. For example, it can check whether the currently displayed time uses the 12-hour display format, or whether the currently displayed warning message is correct. It can also be used to recognize the text in a picture obtained by copying the screen of a computer. It further enables a font library provider to offer image character recognition for all of its own fonts without adding significant cost.
In the above embodiment of the image character recognition apparatus, the data of each character is divided into consecutive, adjacent segments. In addition, the storage module stores character graphic information in the same format that the scanning module produces when it scans character graphics. This improves both the accuracy and the speed of image character recognition and shortens the development schedule.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand the following. They may still modify the technical solutions described in the foregoing embodiments, or replace some of their technical features with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
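As an illustration of the automated-test use, once the display has been converted to text, checking for a 12-hour time display reduces to a string test. The pattern below assumes a hypothetical format such as `9:05 PM` or `12:00AM`. The actual format depends on the device under test, and `is_12_hour` is a hypothetical helper, not part of the described apparatus.

```python
import re

# Hypothetical 12-hour format: hour 1-12 (optional leading zero), minutes, optional space, AM/PM.
TWELVE_HOUR = re.compile(r"(1[0-2]|0?[1-9]):[0-5][0-9] ?(AM|PM)")


def is_12_hour(text: str) -> bool:
    """True if the recognized text is exactly one 12-hour time string."""
    return TWELVE_HOUR.fullmatch(text.strip()) is not None
```

Because the recognition result is exact text rather than a pixel comparison, the same check works for any displayed time, which addresses the case that the prior-art picture comparison could not handle.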