CN102136064A - System for recognizing characters from image - Google Patents
System for recognizing characters from image
- Publication number
- CN102136064A (application CN 201110071825 / CN201110071825A)
- Authority
- CN
- China
- Prior art keywords
- character
- module
- distance
- image
- run
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a system for recognizing characters from an image, which comprises a data receiving module, a background filtering module, a character segmentation module, a characteristic extraction module, a characteristic comparison module and a database updating module, wherein the data receiving module is used for receiving specific image character data from a DataServer for subsequent image character recognition; the background filtering module is used for removing an image background, and extracting character areas; the character segmentation module is used for performing thinning and size normalization processing on each character area to segment characters; the characteristic extraction module is used for extracting characteristic values of each character of each area; the characteristic comparison module is used for querying a database to obtain character comparison results; and the database updating module is used for writing the characteristic values and corresponding characters into the characteristic database. The recognition system provided by the invention reduces the error rate of character recognition, and can learn the characters which cannot be recognized to achieve improved recognition capability.
Description
Technical field
The present invention relates to the field of image analysis and processing, and in particular to visual filtering and analysis applications, especially multimedia message monitoring and filtering systems of telecommunications carriers.
Background art
With the development of mobile communication technology, the data being sent are no longer limited to text messages but also include a large amount of image information. At present, short messages are already monitored fairly well in this country, which has suppressed the spread of some harmful content; image information, however, cannot yet be monitored. This gives many lawbreakers an opportunity to use this channel to spread widely information that disturbs public order or residents' daily life.
Summary of the invention
The purpose of the present invention is to provide a first-class, efficient and stable solution for image character recognition: image data are analyzed and the character information contained in the image data is obtained directly.
To this end, the present invention proposes an image character recognition system, which comprises:
a data receiving module, for receiving specific image file data from the Data Server for the subsequent image character recognition work; a background filtering module, for removing the image background and extracting character regions; a character segmentation module, for thinning and size-normalizing each character region and segmenting the characters; a feature extraction module, for extracting the feature values of each character in each region; a feature comparison module, for querying the database and obtaining character comparison results; and a database updating module, for writing the feature values and the corresponding characters into the feature database.
According to an embodiment of the invention, the background filtering module first performs color run-length coding, next obtains color clusters, then generates and selects character layers, and finally extracts the character regions.
The color run-length coding is performed according to the color Euclidean distance: starting from the first pixel of each row, this pixel is taken as the starting point of a new run, and the Euclidean distance dii′ in RGB space between this starting point and the next pixel immediately adjacent to it in the same row is computed. If dii′ is less than the threshold Th, the two pixels are merged into one run, the run length li increases by 1, and the mean RGB value (ri, gi, bi) of the run is recomputed; otherwise, if dii′ is greater than or equal to the threshold Th, the run index i increases by 1, this pixel becomes the starting point of a new run, its coordinates and color value are recorded as in formula (1), and the new run length is initialized to 1. In the same way, the Euclidean distance between the next neighboring pixel and the current run is computed: if the distance is less than Th, the pixel is merged into the run and the run's RGB value is recomputed; otherwise a new run is generated. Following this rule and traversing all pixels of every row of the image yields a number of color runs. While the color runs are being generated, starting from the second row of the image, the pairwise Euclidean distances in RGB space are computed between each color run of the current row and the color runs of the previous row that are 8-neighborhood adjacent to it, and if such a distance is less than the threshold Tv, the two runs are merged into, i.e., connected as, the same connected domain. After the whole image has been traversed, the set {Cl | l = 1, 2, ..., p} of all connected domains that make up the image is obtained from the concatenation pointers between runs, where p is the total number of connected domains in the image; wherein Tv = Th = 13 to 16.
According to an embodiment of the invention, the system obtains color clusters as follows:
the average color of the connected domain with the largest number of pixels is taken as the initial center color, and the Euclidean distances in RGB color space between the other connected domains and this center are computed; if a distance is less than the threshold TC, the average RGB value of the two connected domains is computed and replaces the original initial center color as the new center color value; if the distance is greater than TC, a second, new color center is generated, whose initial RGB value is the average color of that connected domain; the comparison proceeds one domain at a time in this manner, and color centers whose mutual distance is less than TC are merged; wherein TC = 28 to 30.
According to an embodiment of the invention, the image character recognition system generates and selects character layers according to the following steps: after the connected-domain color clustering, all connected domains with an area greater than 1 × 1 are kept, and the Euclidean distances between these connected domains and each color center are computed; if the Euclidean distance between a connected domain and one of the color centers is less than TC, the connected domain is assigned to the layer determined by that color center.
According to an embodiment of the invention, the image character recognition system extracts character regions as follows:
1) test each image layer in turn: for each image layer, if the number of pixels exceeding the layer's segmentation threshold is more than 100, the layer is treated as a text layer; if the number of pixels exceeding the layer's segmentation threshold is less than 100, it is treated as a noise or background layer;
2) test each connected domain in turn: if the height and width of the tested connected domain are about the same as the size of the test image, the average color of the tested connected domain is taken as the background color, and the layer it belongs to is a background layer;
3) after the noise layers and background layers are deleted, the remaining layers are the image character layers.
According to an embodiment of the invention, the image character recognition system comprises an image enhancement module for pre-processing each character region to strengthen the image for recognition. The image character recognition system performs the enhancement as follows: based on the image data processed by the background filtering module, the mean pixel value of the character regions in the image is computed, the difference between each pixel and the mean pixel is compared, and the pixels in the character regions with small differences are kept; a mean filter is then used to remove noise; a threshold is then obtained from histogram statistics, and the image is segmented according to this threshold.
According to an embodiment of the invention, the character segmentation module performs the following steps:
first the character skeleton is extracted; for the skeleton of the text, each point is judged according to the situation of its eight neighboring points: 1. interior points cannot be deleted; 2. isolated points cannot be deleted; 3. end points of straight lines cannot be deleted; 4. if a pixel P is a boundary point and removing P does not increase the number of connected components, then P is deleted;
according to the lookup table, the whole image is scanned one row at a time; for each non-border point, the corresponding index into the table is computed: if the entry is 0 the point is kept, otherwise the point is deleted; if a complete scan deletes no point, the loop ends and the remaining points are the skeleton points; if any point was deleted, a new round of scanning is performed, and so on until no more points are deleted;
the skeletonized image data are traversed from left to right and top to bottom, and characters are segmented according to their 8-connectivity.
According to an embodiment of the invention, the character feature extraction module performs the following steps:
first, each segmented character is size-normalized to a uniform aspect ratio;
secondly, according to the number of feature points needed horizontally and vertically, the character is cut into equal parts; or
1) obtain the start and end coordinates of the stroke, (StartX, StartY) and (EndX, EndY) respectively;
2) if EndX = StartX, the vector code of the stroke is code = 4 and the procedure jumps to step 5);
3) compute the absolute value of the stroke slope: slope = |EndY - StartY| / |EndX - StartX|;
4) determine the vector code of the stroke in the first quadrant according to the slope;
5) determine the quadrant in which the stroke lies: for the second quadrant, code = 8 - code; for the third quadrant, code = 8 + code; for the fourth quadrant, code = (16 - code) mod 16; when the algorithm finishes, code is the vector code of the required stroke; writing the codes of the individual strokes together in sequence yields the code of the whole Chinese character.
According to an embodiment of the invention, the feature comparison module performs the feature comparison by means of a correlation coefficient, where the correlation coefficient is calculated as follows: with the feature value data sequence of the character denoted x1, x2, ..., xn and the feature value data sequence of the standard character denoted y1, y2, ..., yn, the correlation coefficient of the two sequences is ρ = Σ(xi - x̄)(yi - ȳ) / sqrt(Σ(xi - x̄)² × Σ(yi - ȳ)²), where x̄ and ȳ are the means of the two sequences.
The technical solution of the present invention can provide mobile operators with a real-time, high-performance, safe and reliable monitoring and filtering system for information containing images. Through a powerful and stable image character analysis function, the system accurately identifies, promptly alarms on, intercepts and purges information containing images in the mobile communication market, protecting information security and safeguarding the legitimate rights and interests of consumers.
Description of drawings
The present invention is illustrated by way of example with reference to the accompanying drawings, in which:
Fig. 1 is the image character recognition processing flow;
Fig. 2 is the image character training processing flow;
Fig. 3 shows one of the feature extraction modes of the present invention.
Detailed description of the embodiments
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Unless expressly stated otherwise, any feature disclosed in this specification (including any appended claims, the abstract and the drawings) may be replaced by other equivalent or alternative features serving a similar purpose. That is, unless expressly stated otherwise, each feature is only one example of a series of equivalent or similar features.
The image character recognition system of the present invention mainly comprises two major parts: image character recognition and image character learning.
Fig. 1 is the image character recognition processing flow chart; the specific steps comprise:
Data receiving step: receive the image data containing image characters;
Background filtering step: remove the image background and extract character regions;
Enhancement step: pre-process each character region to strengthen the image for recognition;
Character segmentation step: thin and size-normalize each character region and segment the characters;
Feature extraction step: extract the feature values of each character in each region;
Feature comparison step: query the database and obtain the resulting comparison characters.
Fig. 2 is the image character training processing flow; the specific steps comprise:
Data receiving step: receive the image data containing image characters;
Background filtering step: remove the image background and extract character regions;
Enhancement step: pre-process each character region to strengthen the image for recognition;
Character segmentation step: thin and size-normalize each character region and segment the characters;
Feature extraction step: extract the feature values of each character in each region;
Database updating step: write the feature values and the corresponding characters into the feature database.
To implement image character recognition and image character learning, the present invention comprises the following processing modules, which carry out the steps above:
Module | Processing step |
---|---|
Data receiving module | Receive the image data containing image characters |
Background filtering module | Remove the image background and extract character regions |
Enhancement module | Pre-process each character region to strengthen the image for recognition |
Character segmentation module | Thin and size-normalize each character region and segment the characters |
Feature extraction module | Extract the feature values of each character in each region |
Feature comparison module | Query the database and obtain character comparison results |
Database updating module | Write the feature values and the corresponding characters into the feature database |
The modules of the system are described in detail below:
1. Data receiving module
The data receiving module receives the specific image file data from the Data Server for the subsequent image character recognition work.
2. Background filtering module
To handle character recognition against complex color backgrounds, and to separate the characters from the background, the present invention first performs color run-length coding:
A color run is defined as Ri = {(ri, gi, bi), (xi, yi), li}, where (ri, gi, bi) is the mean of the (r, g, b) color components in RGB color space of the points on the run, (xi, yi) is the starting coordinate of the run, i.e., the coordinate in the image of the first pixel contained in the run, and li is the run length.
Note: dii′ = sqrt((ri - ri′)² + (gi - gi′)² + (bi - bi′)²) is the three-dimensional Euclidean distance and represents how much two color values differ, where (ri, gi, bi) and (ri′, gi′, bi′) are the color component values of the adjacent pixels being compared.
If (dii′ < Th) {
    ri = (ri × li + ri′) / (li + 1);
    gi = (gi × li + gi′) / (li + 1);
    bi = (bi × li + bi′) / (li + 1);
    li = li + 1;
} Else { i = i + 1; ri = ri′; bi = bi′; gi = gi′; }    (1)
Note: Th is a preset threshold. When the three-dimensional Euclidean distance dii′ is less than the threshold, the (ri, gi, bi) components become the mean of the previous and the new components and the run length li increases by one; if dii′ is greater than or equal to the threshold, the starting-point color components of a new run are assigned instead.
The color runs are obtained as follows. On top of the run acquisition method for an ordinary binary image, a Euclidean-distance test in color RGB space is added. Starting from the first pixel of each row, this pixel is taken as the starting point of a new run, and the Euclidean distance dii′ in RGB space between this starting point and the next pixel immediately adjacent to it in the same row is computed. If dii′ is less than the threshold Th, the two pixels are merged into one run, the run length li increases by 1, and the mean RGB value (ri, gi, bi) of the run is recomputed; otherwise, if dii′ is greater than or equal to the threshold Th, the run index i increases by 1, this pixel becomes the starting point of a new run, its coordinates and color value are recorded as in formula (1), and the new run length is initialized to 1. In the same way, the Euclidean distance between the next neighboring pixel and the current run is computed: if the distance is less than Th, the pixel is merged into the run and the run's RGB value is recomputed; otherwise a new run is generated. Following this rule and traversing all pixels of every row of the image yields a number of color runs.
While the color runs are being generated, starting from the second row of the image, the pairwise Euclidean distance in RGB space is computed between each color run of the current row and each color run of the previous row that is 8-neighborhood adjacent to it in position (i.e., the overall dissimilarity between the two runs); if this distance is less than the threshold Tv (the boundary value for splitting the image, whose optimal value between 0 and 255 can be chosen experimentally), the two runs are merged into, i.e., connected as, the same connected domain. Following this rule, after the whole image has been traversed, the set {Cl | l = 1, 2, ..., p} of all connected domains that make up the image is obtained from the concatenation pointers between runs, where p is the total number of connected domains in the image.
Here, the structure of a connected domain is defined as follows:
Cl = {(rl, gl, bl), Nl, (vl, hl)}, where rl, gl and bl are the mean RGB color values of the connected domain Cl; the height vl and width hl of the connected domain are easily obtained by simple computation; and Nl = {Rli | i = 1, 2, ..., nl} is the set of color runs contained in this connected domain, where Rli is a run sequence.
Good experimental results are obtained when Tv = Th is in the range 13 to 16. The color run adjacency graph (CRAG) structure adopted in this step makes full use of the color and position information of the image, and by recording the mean color of each connected domain it preserves the color information of the original image well.
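For illustration, the following minimal C sketch shows one way the row runs defined above can be merged into connected domains; the Run structure mirrors the run definition Ri, while the union-find helpers, the MAX_RUNS bound and the function names (merge_runs, touches) are illustrative assumptions rather than the patent's own code.

#include <math.h>

/* A color run as defined above: mean RGB, starting coordinate, length. */
typedef struct { double r, g, b; int x, y, len; } Run;

#define MAX_RUNS 65536                 /* illustrative bound on the number of runs */
static int parent[MAX_RUNS];           /* union-find forest over run indices */

static int find_root(int i) { return parent[i] == i ? i : (parent[i] = find_root(parent[i])); }
static void unite(int a, int b) { parent[find_root(a)] = find_root(b); }

static double rgb_dist(const Run *a, const Run *b) {
    return sqrt((a->r - b->r) * (a->r - b->r) +
                (a->g - b->g) * (a->g - b->g) +
                (a->b - b->b) * (a->b - b->b));
}

/* Two runs on adjacent rows are 8-neighborhood adjacent when their column
 * spans overlap or touch diagonally (offset by one pixel). */
static int touches(const Run *a, const Run *b) {
    return a->x <= b->x + b->len && b->x <= a->x + a->len;
}

/* Merge 8-adjacent runs whose RGB distance is below Tv; afterwards, runs that
 * share the same find_root() value form one connected domain Cl. */
void merge_runs(const Run *runs, int n, double Tv) {
    for (int i = 0; i < n; i++) parent[i] = i;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (runs[j].y == runs[i].y + 1 &&
                touches(&runs[i], &runs[j]) &&
                rgb_dist(&runs[i], &runs[j]) < Tv)
                unite(i, j);
}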
Next, the color clusters are obtained:
When all the connected domains making up the image are obtained, the mean color of each connected domain is obtained as well. To further obtain the desired text layers, a simple clustering of these mean colors is needed. Extensive sample tests show that, when the number of color clusters is not known in advance, and considering the time cost, and because the number of text colors we care about is usually fewer than 8, the required color centers can be obtained by selecting initial cluster centers; the method is summarized as follows:
The average color of the connected domain with the largest number of pixels is chosen as the initial center color, and the Euclidean distances in RGB color space between the other connected domains and this center are computed. If a distance is less than the threshold TC, the average RGB value of the two connected domains is computed and replaces the original initial center color as the new center color value; otherwise, if it is greater than TC, a second, new color center is generated, whose initial RGB value is the average color of that connected domain. The comparison proceeds one domain at a time in this manner; since the positions of the color centers keep changing, color centers whose mutual distance is less than TC need to be merged. In addition, the following sample selection criteria also need to be applied, namely the size ratio of the character region to the image, the run length, and the pixel density of the connected region:
1) hl < P × H and vl < P × V, where H and V are the height and width of the color image, hl and vl are the corresponding height and width of the connected domain, and P is the size ratio of the character region to the original image;
2) hl × vl > 1 × 1 (the connected region is larger than a single pixel);
3) Q1 < (Σ li) / (hl × vl) < Q2, where Q1 and Q2 are the lower and upper bounds on the run pixel density of the connected region, li is a run length, and hl and vl are the width and height of the connected region.
Here (Σ li) / (hl × vl) represents the pixel density of the connected domain. In the test images, character height and width are always less than 0.95 times the image height and width, hence P = 0.95. With Q1 = 0.5 (the lower limit on pixel density) and Q2 = 0.9 (the upper limit on pixel density), the image frame and other long, narrow thin lines are deleted. TC = 28 to 30 is a good choice for extracting Chinese characters from natural scene images and effectively removes interfering noise points. With the above method, the proper number of color cluster centers is finally obtained.
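A minimal C sketch of the sequential clustering described above is given below for illustration; it assumes the connected-domain mean colors are supplied with the largest domain first, and it folds a domain into the nearest center with a running mean (the names cluster_colors, color_dist and the MAX_CENTERS bound are illustrative, not part of the patent text).

#include <math.h>

#define MAX_CENTERS 8     /* the description assumes fewer than 8 text colors */

typedef struct { double r, g, b; } Color;

static double color_dist(Color a, Color b) {
    return sqrt((a.r - b.r) * (a.r - b.r) +
                (a.g - b.g) * (a.g - b.g) +
                (a.b - b.b) * (a.b - b.b));
}

/* Sequential clustering of connected-domain mean colors: start from the color of
 * the largest domain, fold a domain into the nearest center if it is closer than
 * TC, otherwise open a new center. Returns the number of centers found. */
int cluster_colors(const Color *domains, int n, double TC, Color *centers) {
    int members[MAX_CENTERS];
    int k = 1;
    centers[0] = domains[0];           /* domains[0] = largest connected domain */
    members[0] = 1;
    for (int i = 1; i < n; i++) {
        int best = 0;
        double bestd = color_dist(domains[i], centers[0]);
        for (int c = 1; c < k; c++) {
            double d = color_dist(domains[i], centers[c]);
            if (d < bestd) { bestd = d; best = c; }
        }
        if (bestd < TC) {              /* merge: update the center color */
            int m = members[best];
            centers[best].r = (centers[best].r * m + domains[i].r) / (m + 1);
            centers[best].g = (centers[best].g * m + domains[i].g) / (m + 1);
            centers[best].b = (centers[best].b * m + domains[i].b) / (m + 1);
            members[best]++;
        } else if (k < MAX_CENTERS) {  /* open a new color center */
            centers[k] = domains[i];
            members[k] = 1;
            k++;
        }
    }
    /* A final pass merging centers closer than TC, as described above, is omitted here. */
    return k;
}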
Then the character layers are generated and selected:
After the connected-domain color clustering, in order to avoid stroke breakage caused by discarding small connected domains, all connected domains with an area greater than 1 × 1 are kept, and the Euclidean distances between these connected domains and each color center are computed. If the Euclidean distance between a connected domain and one of the color centers is less than TC, the connected domain is assigned to the layer determined by that color center, which guarantees that connected domains with similar colors end up in the same image layer.
Finally, the character regions are extracted:
The character regions are selected as follows:
1) test each layer in turn: for each layer, if the number of pixels exceeding the layer's segmentation threshold is more than 100, the layer is treated as a text layer; if the number of pixels exceeding the layer's segmentation threshold is less than 100, it is considered a noise or background layer. The segmentation threshold of a layer can be determined by a conventional segmentation method, such as histogram statistics.
2) test each connected domain in turn: if the height and width of the tested connected domain are about the same as the size of the test image, the average color of the tested connected domain is taken as the background color, because the area of a character region is certainly smaller than the area of the whole image; so if the two sizes are comparable, the layer the domain belongs to is a background layer.
3) after the noise layers and background layers are deleted, the remaining layers are candidate layers, which are further segmented with an automatic threshold. Obtaining the text layers thus yields the character regions.
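As an illustration of the selection rules above, the following C sketch tests a single layer against the 100-pixel criterion and a connected domain against the image size; the Layer structure and the function names are assumptions made only for this sketch.

/* A gray-scale image layer (one per color center), stored row by row. */
typedef struct { int width, height; const unsigned char *gray; } Layer;

/* Rule 1): a layer is a text layer only if more than 100 of its pixels
 * exceed the layer's segmentation threshold. */
int is_text_layer(const Layer *layer, unsigned char seg_threshold) {
    int count = 0;
    for (int i = 0; i < layer->width * layer->height; i++)
        if (layer->gray[i] > seg_threshold)
            count++;
    return count > 100;
}

/* Rule 2): a connected domain whose width and height nearly match the image
 * is treated as background; its mean color becomes the background color. */
int is_background_domain(int dom_w, int dom_h, int img_w, int img_h, double P) {
    return dom_w >= (int)(P * img_w) && dom_h >= (int)(P * img_h);   /* P = 0.95 above */
}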
3. Enhancement module
Based on the image data processed by the background filtering module, the mean pixel value of the character regions in the image is computed, the difference between each pixel and the mean pixel is compared, and the pixels in the character regions with small differences (for example, less than 5) are kept; a mean filter (which computes the mean of neighboring pixels and thereby filters out noise) is then used to remove noise; a threshold is then obtained from the gray-level histogram (which represents the distribution of pixel gray values), and the image is segmented according to this threshold.
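The enhancement step can be illustrated with the short C sketch below: a 3×3 mean filter followed by a global threshold taken from the gray-level histogram. The use of the histogram mean as the threshold, and the function names, are assumptions of this sketch rather than the patent's specification.

/* 3x3 mean filter: each output pixel is the average of its neighborhood. */
void mean_filter_3x3(const unsigned char *src, unsigned char *dst, int w, int h) {
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int sum = 0, n = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < h && xx >= 0 && xx < w) { sum += src[yy * w + xx]; n++; }
                }
            dst[y * w + x] = (unsigned char)(sum / n);
        }
}

/* Threshold from the gray-level histogram (here simply the histogram mean). */
unsigned char histogram_threshold(const unsigned char *img, int w, int h) {
    long hist[256] = {0}, total = (long)w * h, sum = 0;
    for (long i = 0; i < total; i++) hist[img[i]]++;
    for (int v = 0; v < 256; v++) sum += (long)v * hist[v];
    return (unsigned char)(sum / total);
}

/* Segment the image with the threshold: foreground 0 (dark), background 255. */
void threshold_segment(const unsigned char *img, unsigned char *out, int w, int h, unsigned char t) {
    for (long i = 0; i < (long)w * h; i++)
        out[i] = img[i] > t ? 255 : 0;
}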
4. Character segmentation module
Before character segmentation is carried out, the character skeleton must first be extracted. The skeleton can be understood as the central axis of the image. For the skeleton of the text, each point is judged according to the situation of its eight neighboring points: 1. interior points cannot be deleted; 2. isolated points cannot be deleted; 3. end points of straight lines cannot be deleted; 4. if a pixel P is a boundary point and removing P does not increase the number of connected components, then P can be deleted.
According to the above criteria, a lookup table is built in advance, with 256 elements indexed from 0 to 255, each element being either 0 or 1. The table is consulted according to the situation of the eight neighboring points of a given point (a black point to be processed, of course): if the element in the table is 1, the point can be deleted, otherwise it is kept. The lookup works as follows: let a white point be 1 and a black point be 0; the upper-left neighbor corresponds to the first (lowest) bit of an 8-bit number, the point directly above to the second bit, the upper-right neighbor to the third bit, the left neighbor to the fourth bit, the right neighbor to the fifth bit, the lower-left neighbor to the sixth bit, the point directly below to the seventh bit, and the lower-right neighbor to the eighth bit; the 8-bit number composed in this way is used to index the table.
static int erasetable[256] = {
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,1,
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,1,
1,1,0,0,1,1,0,0, 0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,0,0, 1,1,0,1,1,1,0,1,
0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,1,
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,0,0, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,0,0, 1,1,0,1,1,1,0,0,
1,1,0,0,1,1,1,0, 1,1,0,0,1,0,0,0
};
According to this lookup table, the whole image is scanned one row at a time; for each point (excluding points on the image border), the corresponding index into the table is computed: if the entry is 0 the point is kept, otherwise the point is deleted. If a complete scan deletes no point, the loop ends and the remaining points are the skeleton points; if any point was deleted, a new round of scanning is performed, and so on until no more points are deleted.
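One scan of this procedure can be illustrated by the C sketch below, which uses the erasetable above and the neighbor-to-bit order described earlier (upper-left as the lowest bit through lower-right as the highest, with white neighbors contributing 1 bits); the image representation (0 for black, 255 for white) and the function name thin_pass are assumptions of the sketch.

/* One thinning pass; assumes the erasetable[] above is visible in the same file.
 * Returns the number of deleted points, so the caller repeats until it returns 0. */
int thin_pass(unsigned char *img, int w, int h) {
    int deleted = 0;
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            if (img[y * w + x] != 0) continue;             /* only black points are candidates */
            int idx = 0;
            if (img[(y - 1) * w + (x - 1)]) idx |= 1 << 0; /* upper-left  -> bit 1 */
            if (img[(y - 1) * w + x      ]) idx |= 1 << 1; /* above       -> bit 2 */
            if (img[(y - 1) * w + (x + 1)]) idx |= 1 << 2; /* upper-right -> bit 3 */
            if (img[y * w + (x - 1)])       idx |= 1 << 3; /* left        -> bit 4 */
            if (img[y * w + (x + 1)])       idx |= 1 << 4; /* right       -> bit 5 */
            if (img[(y + 1) * w + (x - 1)]) idx |= 1 << 5; /* lower-left  -> bit 6 */
            if (img[(y + 1) * w + x      ]) idx |= 1 << 6; /* below       -> bit 7 */
            if (img[(y + 1) * w + (x + 1)]) idx |= 1 << 7; /* lower-right -> bit 8 */
            if (erasetable[idx]) {                         /* table entry 1: delete the point */
                img[y * w + x] = 255;
                deleted++;
            }
        }
    return deleted;
}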
The skeletonized image data are traversed from left to right and top to bottom, and characters are segmented according to their 8-connectivity (the eight directions: up, down, left, right, upper-left, upper-right, lower-left and lower-right); that is, for each character pixel it is judged whether a pixel of the same kind exists in any of the eight surrounding directions, and if so the pixel is considered to belong to the same character, and so on.
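The 8-connectivity segmentation can be illustrated with a stack-based labeling sketch in C, where each label then corresponds to one character component; the function name label_characters and the heap-allocated stack are assumptions of the sketch.

#include <stdlib.h>

/* Label every black skeleton pixel (value 0) reachable through any of the
 * eight neighboring directions with the same character label; labels start at 0. */
void label_characters(const unsigned char *img, int *labels, int w, int h) {
    static const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
    static const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
    int *stack = (int *)malloc(sizeof(int) * w * h);
    int next_label = 0;
    for (int i = 0; i < w * h; i++) labels[i] = -1;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (img[y * w + x] != 0 || labels[y * w + x] != -1) continue;
            int top = 0;
            stack[top++] = y * w + x;
            labels[y * w + x] = next_label;
            while (top > 0) {                       /* grow one character component */
                int p = stack[--top], px = p % w, py = p / w;
                for (int k = 0; k < 8; k++) {
                    int nx = px + dx[k], ny = py + dy[k];
                    if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
                    int q = ny * w + nx;
                    if (img[q] == 0 && labels[q] == -1) { labels[q] = next_label; stack[top++] = q; }
                }
            }
            next_label++;
        }
    free(stack);
}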
5. Character feature extraction module
First, each segmented character is size-normalized to a uniform aspect ratio. Features can then be extracted in either of the following two ways.
Feature extraction mode 1 (as shown in Fig. 3):
According to the number of feature points needed horizontally and vertically, the character is cut into equal parts; for example:
if three feature points are to be taken horizontally and three vertically, then, in a fixed order, the number of character pixels crossed by each line is used as a feature point. (The denser the feature points, the relatively more accurate the recognition result.)
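For illustration, the sketch below computes such features for the 3 + 3 example: each chosen horizontal and vertical line contributes the number of character pixels it crosses. The placement of the lines at quarter intervals and the function name grid_features are assumptions of the sketch.

/* Count character pixels (value 0) crossed by 3 horizontal and 3 vertical lines
 * of a size-normalized character image; the six counts are the feature values. */
void grid_features(const unsigned char *ch, int w, int h, int feats[6]) {
    for (int k = 0; k < 3; k++) {
        int y = (k + 1) * h / 4, x = (k + 1) * w / 4;
        int hc = 0, vc = 0;
        for (int i = 0; i < w; i++) if (ch[y * w + i] == 0) hc++;   /* horizontal line */
        for (int j = 0; j < h; j++) if (ch[j * w + x] == 0) vc++;   /* vertical line   */
        feats[k] = hc;
        feats[3 + k] = vc;
    }
}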
Feature extraction mode 2:
1) obtain the start and end coordinates of the stroke, (StartX, StartY) and (EndX, EndY) respectively;
2) if EndX = StartX, the vector code of the stroke is code = 4 and the procedure jumps to step 5);
3) compute the absolute value of the stroke slope: slope = |EndY - StartY| / |EndX - StartX|;
4) determine the vector code of the stroke in the first quadrant according to the slope:
if the slope lies in the lowest quantization interval, then code = 0; if it lies in the next interval, then code = 1; and so on for the remaining first-quadrant codes;
5) determine the quadrant in which the stroke lies: for the second quadrant, code = 8 - code; for the third quadrant, code = 8 + code; for the fourth quadrant, code = (16 - code) mod 16.
When the algorithm finishes, code is the vector code of the required stroke. Once the code of each stroke has been obtained, the whole Chinese character can be encoded: simply writing the codes of the individual strokes together in sequence yields the code of the whole Chinese character.
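A minimal C sketch of this stroke coding is given below. Since the patent text does not reproduce the first-quadrant slope thresholds, the sketch quantizes the slope angle into the 22.5-degree sectors of a 16-direction code, which is an assumption, and the function name stroke_code is likewise illustrative.

#include <math.h>

/* 16-direction vector code (0..15) of a stroke from its start and end points. */
int stroke_code(int StartX, int StartY, int EndX, int EndY) {
    const double PI = 3.14159265358979323846;
    int code;
    if (EndX == StartX) {
        code = 4;                                     /* vertical stroke: code = 4 */
    } else {
        double slope = fabs((double)(EndY - StartY) / (double)(EndX - StartX));
        double angle = atan(slope);                   /* first-quadrant angle */
        code = (int)(angle / (PI / 8.0) + 0.5);       /* nearest first-quadrant code 0..4 */
    }
    /* Map the first-quadrant code into the stroke's actual quadrant. */
    int dx = EndX - StartX, dy = EndY - StartY;
    if (dx < 0 && dy >= 0)      code = 8 - code;             /* second quadrant */
    else if (dx < 0 && dy < 0)  code = 8 + code;             /* third quadrant  */
    else if (dx >= 0 && dy < 0) code = (16 - code) % 16;     /* fourth quadrant */
    return code;
}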
6. Feature comparison module
Let the feature value data sequence of the character be x1, x2, ..., xn, and the feature value data sequence of the standard character be y1, y2, ..., yn; the correlation coefficient ρ of the two sequences can then be calculated as ρ = Σ(xi - x̄)(yi - ȳ) / sqrt(Σ(xi - x̄)² × Σ(yi - ȳ)²), where x̄ and ȳ are the means of the two sequences.
Ideally, if the two feature sequences are similar, the absolute value of the correlation coefficient ρ should be close to 1. Therefore, during recognition, the standard characters whose correlation with the input exceeds a set threshold in absolute value are taken as the recognition result. If the recognition yields several candidate characters, they are sorted by the absolute value of ρ and the user is allowed to select the desired character from the results; if the user makes no selection, the character with the largest absolute correlation value is taken as the final character.
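For illustration, the correlation coefficient above can be computed with the following C sketch; the function name feature_correlation and the handling of degenerate constant sequences are assumptions of the sketch.

#include <math.h>

/* Correlation coefficient between the feature sequence of an extracted character
 * and that of a standard character; |rho| close to 1 indicates a good match. */
double feature_correlation(const double *x, const double *y, int n) {
    double mx = 0.0, my = 0.0;
    for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double num = 0.0, sx = 0.0, sy = 0.0;
    for (int i = 0; i < n; i++) {
        num += (x[i] - mx) * (y[i] - my);
        sx  += (x[i] - mx) * (x[i] - mx);
        sy  += (y[i] - my) * (y[i] - my);
    }
    if (sx == 0.0 || sy == 0.0) return 0.0;   /* degenerate: constant sequence */
    return num / sqrt(sx * sy);
}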
7. Database updating module
The character features in the feature database can be continuously added and modified, and the ability to recognize characters can also be strengthened through continuous learning, so the database is constantly updated.
Unless expressly stated otherwise, any feature disclosed in this specification (including any appended claims, the abstract and the drawings) may be replaced by other equivalent or alternative features serving a similar purpose. That is, unless expressly stated otherwise, each feature is only one example of a series of equivalent or similar features.
The present invention is not limited to the foregoing embodiments. The present invention extends to any new feature or any new combination disclosed in this specification, and to any new method or process step or any new combination thereof disclosed herein.
Claims (10)
1. An image character recognition system, characterized in that the image character recognition system comprises:
a data receiving module, for receiving specific image file data from the Data Server for the subsequent image character recognition work;
a background filtering module, for removing the image background and extracting character regions;
a character segmentation module, for thinning and size-normalizing each character region and segmenting the characters;
a feature extraction module, for extracting the feature values of each character in each region;
a feature comparison module, for querying the database and obtaining character comparison results;
a database updating module, for writing the feature values and the corresponding characters into the feature database.
2. The image character recognition system as claimed in claim 1, characterized in that the background filtering module:
first performs color run-length coding;
next obtains color clusters;
then generates and selects character layers;
and finally extracts the character regions.
3. The image character recognition system as claimed in claim 2, characterized in that the color run-length coding is performed according to the color Euclidean distance:
starting from the first pixel of each row, this pixel is taken as the starting point of a new run, and the Euclidean distance dii′ in RGB space between this starting point and the next pixel immediately adjacent to it in the same row is computed;
if dii′ is less than the threshold Th, the two pixels are merged into one run, the run length li increases by 1, and the mean RGB value (ri, gi, bi) of the run is recomputed; otherwise, if dii′ is greater than or equal to the threshold Th, the run index i increases by 1, this pixel becomes the starting point of a new run, its coordinates and color value are recorded as in formula (1), and the new run length is initialized to 1;
in the same way, the Euclidean distance between the next neighboring pixel and the current run is computed: if the distance is less than Th, the pixel is merged into the run and the run's RGB value is recomputed; otherwise a new run is generated;
following this rule and traversing all pixels of every row of the image yields a number of color runs;
while the color runs are being generated, starting from the second row of the image, the pairwise Euclidean distances in RGB space are computed between each color run of the current row and the color runs of the previous row that are 8-neighborhood adjacent to it, and if such a distance is less than the threshold Tv, the two runs are merged into, i.e., connected as, the same connected domain;
after the whole image has been traversed, the set {Cl | l = 1, 2, ..., p} of all connected domains that make up the image is obtained from the concatenation pointers between runs, where p is the total number of connected domains in the image;
wherein Tv = Th = 13 to 16.
4. The image character recognition system as claimed in claim 3, characterized in that the system obtains color clusters as follows:
the average color of the connected domain with the largest number of pixels is taken as the initial center color, and the Euclidean distances in RGB color space between the other connected domains and this center are computed; if a distance is less than the threshold TC, the average RGB value of the two connected domains is computed and replaces the original initial center color as the new center color value; if the distance is greater than TC, a second, new color center is generated, whose initial RGB value is the average color of that connected domain; the comparison proceeds one domain at a time in this manner, and color centers whose mutual distance is less than TC are merged; wherein TC = 28 to 30.
5. The image character recognition system as claimed in claim 4, characterized in that the image character recognition system generates and selects character layers according to the following steps:
after the connected-domain color clustering, all connected domains with an area greater than 1 × 1 are kept, and the Euclidean distances between these connected domains and each color center are computed; if the Euclidean distance between a connected domain and one of the color centers is less than TC, the connected domain is assigned to the layer determined by that color center.
6. The image character recognition system as claimed in claim 5, characterized in that the image character recognition system extracts character regions as follows:
1) test each image layer in turn: for each image layer, if the number of pixels exceeding the layer's segmentation threshold is more than 100, the layer is treated as a text layer; if the number of pixels exceeding the layer's segmentation threshold is less than 100, it is treated as a noise or background layer;
2) test each connected domain in turn: if the height and width of the tested connected domain are about the same as the size of the test image, the average color of the tested connected domain is taken as the background color, and the layer it belongs to is a background layer;
3) after the noise layers and background layers are deleted, the remaining layers are the image character layers.
7. The image character recognition system as claimed in claim 1, characterized in that the image character recognition system comprises an image enhancement module for pre-processing each character region to strengthen the image for recognition;
the image character recognition system performs the enhancement as follows:
based on the image data processed by the background filtering module, the mean pixel value of the character regions in the image is computed, the difference between each pixel and the mean pixel is compared, and the pixels in the character regions with small differences are kept; a mean filter is then used to remove noise; a threshold is then obtained from histogram statistics, and the image is segmented according to this threshold.
8. The image character recognition system as claimed in claim 1, characterized in that the character segmentation module performs the following steps:
first the character skeleton is extracted; for the skeleton of the text, each point is judged according to the situation of its eight neighboring points: 1. interior points cannot be deleted; 2. isolated points cannot be deleted; 3. end points of straight lines cannot be deleted; 4. if a pixel P is a boundary point and removing P does not increase the number of connected components, then P is deleted;
according to the lookup table, the whole image is scanned one row at a time; for each non-border point, the corresponding index into the table is computed: if the entry is 0 the point is kept, otherwise the point is deleted; if a complete scan deletes no point, the loop ends and the remaining points are the skeleton points; if any point was deleted, a new round of scanning is performed, and so on until no more points are deleted;
the skeletonized image data are traversed from left to right and top to bottom, and characters are segmented according to their 8-connectivity.
9. The image character recognition system as claimed in claim 1, characterized in that the character feature extraction module performs the following steps:
first, each segmented character is size-normalized to a uniform aspect ratio;
secondly, according to the number of feature points needed horizontally and vertically, the character is cut into equal parts; or
1) obtain the start and end coordinates of the stroke, (StartX, StartY) and (EndX, EndY) respectively;
2) if EndX = StartX, the vector code of the stroke is code = 4 and the procedure jumps to step 5);
3) compute the absolute value of the stroke slope: slope = |EndY - StartY| / |EndX - StartX|;
4) determine the vector code of the stroke in the first quadrant according to the slope;
5) determine the quadrant in which the stroke lies: for the second quadrant, code = 8 - code; for the third quadrant, code = 8 + code; for the fourth quadrant, code = (16 - code) mod 16; when the algorithm finishes, code is the vector code of the required stroke; writing the codes of the individual strokes together in sequence yields the code of the whole Chinese character.
10. The image character recognition system as claimed in claim 1, characterized in that the feature comparison module performs the feature comparison by means of a correlation coefficient, where the correlation coefficient between the feature value sequence of the character and that of the standard character is calculated as ρ = Σ(xi - x̄)(yi - ȳ) / sqrt(Σ(xi - x̄)² × Σ(yi - ȳ)²).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110071825 CN102136064A (en) | 2011-03-24 | 2011-03-24 | System for recognizing characters from image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110071825 CN102136064A (en) | 2011-03-24 | 2011-03-24 | System for recognizing characters from image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102136064A true CN102136064A (en) | 2011-07-27 |
Family
ID=44295846
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110071825 Pending CN102136064A (en) | 2011-03-24 | 2011-03-24 | System for recognizing characters from image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102136064A (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831416A (en) * | 2012-08-15 | 2012-12-19 | 广州广电运通金融电子股份有限公司 | Character identification method and relevant device |
CN103020634A (en) * | 2011-09-26 | 2013-04-03 | 北京大学 | Segmentation method and device for recognizing identifying codes |
CN103503002A (en) * | 2011-05-04 | 2014-01-08 | 联邦印刷有限公司 | Method and device for identifying character |
CN103593329A (en) * | 2012-08-17 | 2014-02-19 | 腾讯科技(深圳)有限公司 | Text image rearrangement method and system |
CN104899586A (en) * | 2014-03-03 | 2015-09-09 | 阿里巴巴集团控股有限公司 | Method for recognizing character contents included in image and device thereof |
CN105701489A (en) * | 2016-01-14 | 2016-06-22 | 云南大学 | Novel digital extraction and identification method and system thereof |
CN106599818A (en) * | 2016-12-07 | 2017-04-26 | 广州视源电子科技股份有限公司 | Method and device for generating handwriting format file based on picture |
CN106934846A (en) * | 2015-12-29 | 2017-07-07 | 深圳先进技术研究院 | A kind of cloth image processing method and system |
CN107045632A (en) * | 2015-10-29 | 2017-08-15 | 尼尔森(美国)有限公司 | Method and apparatus for extracting text from imaging files |
CN107093172A (en) * | 2016-02-18 | 2017-08-25 | 清华大学 | character detecting method and system |
CN108009538A (en) * | 2017-12-22 | 2018-05-08 | 大连运明自动化技术有限公司 | A kind of automobile engine cylinder-body sequence number intelligent identification Method |
CN108446709A (en) * | 2017-02-16 | 2018-08-24 | 现代自动车株式会社 | Picto-diagram identification device, picto-diagram identifying system and picto-diagram recognition methods |
CN109447086A (en) * | 2018-09-19 | 2019-03-08 | 浙江口碑网络技术有限公司 | A kind of extracting method and device of picture character color |
CN109670500A (en) * | 2018-11-30 | 2019-04-23 | 平安科技(深圳)有限公司 | A kind of character area acquisition methods, device, storage medium and terminal device |
CN110490204A (en) * | 2019-07-11 | 2019-11-22 | 深圳怡化电脑股份有限公司 | Image processing method, image processing apparatus and terminal |
CN111080554A (en) * | 2019-12-20 | 2020-04-28 | 成都极米科技股份有限公司 | Method and device for enhancing subtitle area in projection content and readable storage medium |
CN111104936A (en) * | 2019-11-19 | 2020-05-05 | 泰康保险集团股份有限公司 | Text image recognition method, device, equipment and storage medium |
CN111918065A (en) * | 2019-05-08 | 2020-11-10 | 中兴通讯股份有限公司 | Information compression/decompression method and device |
CN112528755A (en) * | 2020-11-19 | 2021-03-19 | 上海至冕伟业科技有限公司 | Intelligent identification method for fire-fighting evacuation facilities |
CN113117311A (en) * | 2021-04-20 | 2021-07-16 | 重庆反切码科技有限公司 | Novel grab dish machine |
CN113808225A (en) * | 2021-09-27 | 2021-12-17 | 东华理工大学南昌校区 | Lossless coding method for image |
CN116452615A (en) * | 2023-06-19 | 2023-07-18 | 恒银金融科技股份有限公司 | Segmentation method and device for foreground and background of crown word size region |
US12045962B2 (en) | 2021-12-16 | 2024-07-23 | Acer Incorporated | Test result recognizing method and test result recognizing device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1312625C (en) * | 2004-07-02 | 2007-04-25 | 清华大学 | Character extracting method from complicated background color image based on run-length adjacent map |
CN101615244A (en) * | 2008-06-26 | 2009-12-30 | 上海梅山钢铁股份有限公司 | Handwritten plate blank numbers automatic identifying method and recognition device |
US20110058028A1 (en) * | 2009-09-09 | 2011-03-10 | Sony Corporation | Information processing apparatus, information processing method, and information processing program |
- 2011-03-24: CN application CN 201110071825, patent CN102136064A/en, active, status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1312625C (en) * | 2004-07-02 | 2007-04-25 | 清华大学 | Character extracting method from complicated background color image based on run-length adjacent map |
CN101615244A (en) * | 2008-06-26 | 2009-12-30 | 上海梅山钢铁股份有限公司 | Handwritten plate blank numbers automatic identifying method and recognition device |
US20110058028A1 (en) * | 2009-09-09 | 2011-03-10 | Sony Corporation | Information processing apparatus, information processing method, and information processing program |
Non-Patent Citations (2)
Title |
---|
China Masters' Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》), 2009-04-15, 狄光敏, "Research on license plate localization and recognition methods" (车牌定位及识别方法的研究), No. 4 *
Microcomputer Information (《微计算机信息》), 2007-12-31, 任民宏, "Handwritten character recognition based on vector feature encoding" (基于矢量特征编码的手写字符识别技术), Vol. 24, No. 5-3 *
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103503002A (en) * | 2011-05-04 | 2014-01-08 | 联邦印刷有限公司 | Method and device for identifying character |
CN103503002B (en) * | 2011-05-04 | 2018-01-19 | 联邦印刷有限公司 | Method and apparatus for distinguished symbol |
CN103020634A (en) * | 2011-09-26 | 2013-04-03 | 北京大学 | Segmentation method and device for recognizing identifying codes |
CN102831416A (en) * | 2012-08-15 | 2012-12-19 | 广州广电运通金融电子股份有限公司 | Character identification method and relevant device |
CN103593329A (en) * | 2012-08-17 | 2014-02-19 | 腾讯科技(深圳)有限公司 | Text image rearrangement method and system |
CN103593329B (en) * | 2012-08-17 | 2018-03-13 | 腾讯科技(深圳)有限公司 | A kind of text image rearrangement method and system |
CN104899586A (en) * | 2014-03-03 | 2015-09-09 | 阿里巴巴集团控股有限公司 | Method for recognizing character contents included in image and device thereof |
CN104899586B (en) * | 2014-03-03 | 2018-10-12 | 阿里巴巴集团控股有限公司 | Method and device is identified to the word content for including in image |
CN107045632A (en) * | 2015-10-29 | 2017-08-15 | 尼尔森(美国)有限公司 | Method and apparatus for extracting text from imaging files |
CN106934846B (en) * | 2015-12-29 | 2020-05-22 | 深圳先进技术研究院 | Cloth image processing method and system |
CN106934846A (en) * | 2015-12-29 | 2017-07-07 | 深圳先进技术研究院 | A kind of cloth image processing method and system |
CN105701489A (en) * | 2016-01-14 | 2016-06-22 | 云南大学 | Novel digital extraction and identification method and system thereof |
CN105701489B (en) * | 2016-01-14 | 2020-03-17 | 云南大学 | Novel digital extraction and identification method and system |
CN107093172A (en) * | 2016-02-18 | 2017-08-25 | 清华大学 | character detecting method and system |
CN107093172B (en) * | 2016-02-18 | 2020-03-17 | 清华大学 | Character detection method and system |
CN106599818B (en) * | 2016-12-07 | 2020-10-27 | 广州视源电子科技股份有限公司 | Method and device for generating handwriting format file based on picture |
CN106599818A (en) * | 2016-12-07 | 2017-04-26 | 广州视源电子科技股份有限公司 | Method and device for generating handwriting format file based on picture |
CN108446709B (en) * | 2017-02-16 | 2023-06-02 | 现代自动车株式会社 | Pictogram recognition apparatus, pictogram recognition system, and pictogram recognition method |
CN108446709A (en) * | 2017-02-16 | 2018-08-24 | 现代自动车株式会社 | Picto-diagram identification device, picto-diagram identifying system and picto-diagram recognition methods |
CN108009538A (en) * | 2017-12-22 | 2018-05-08 | 大连运明自动化技术有限公司 | A kind of automobile engine cylinder-body sequence number intelligent identification Method |
CN109447086A (en) * | 2018-09-19 | 2019-03-08 | 浙江口碑网络技术有限公司 | A kind of extracting method and device of picture character color |
CN109670500A (en) * | 2018-11-30 | 2019-04-23 | 平安科技(深圳)有限公司 | A kind of character area acquisition methods, device, storage medium and terminal device |
CN109670500B (en) * | 2018-11-30 | 2024-06-28 | 平安科技(深圳)有限公司 | Text region acquisition method and device, storage medium and terminal equipment |
CN111918065A (en) * | 2019-05-08 | 2020-11-10 | 中兴通讯股份有限公司 | Information compression/decompression method and device |
WO2020224551A1 (en) * | 2019-05-08 | 2020-11-12 | 中兴通讯股份有限公司 | Information compression/decompression methods and apparatuses, and storage medium |
CN110490204B (en) * | 2019-07-11 | 2022-07-15 | 深圳怡化电脑股份有限公司 | Image processing method, image processing device and terminal |
CN110490204A (en) * | 2019-07-11 | 2019-11-22 | 深圳怡化电脑股份有限公司 | Image processing method, image processing apparatus and terminal |
CN111104936A (en) * | 2019-11-19 | 2020-05-05 | 泰康保险集团股份有限公司 | Text image recognition method, device, equipment and storage medium |
CN111080554A (en) * | 2019-12-20 | 2020-04-28 | 成都极米科技股份有限公司 | Method and device for enhancing subtitle area in projection content and readable storage medium |
CN112528755A (en) * | 2020-11-19 | 2021-03-19 | 上海至冕伟业科技有限公司 | Intelligent identification method for fire-fighting evacuation facilities |
CN113117311A (en) * | 2021-04-20 | 2021-07-16 | 重庆反切码科技有限公司 | Novel grab dish machine |
CN113808225A (en) * | 2021-09-27 | 2021-12-17 | 东华理工大学南昌校区 | Lossless coding method for image |
CN113808225B (en) * | 2021-09-27 | 2023-09-19 | 东华理工大学南昌校区 | Lossless coding method for image |
US12045962B2 (en) | 2021-12-16 | 2024-07-23 | Acer Incorporated | Test result recognizing method and test result recognizing device |
CN116452615A (en) * | 2023-06-19 | 2023-07-18 | 恒银金融科技股份有限公司 | Segmentation method and device for foreground and background of crown word size region |
CN116452615B (en) * | 2023-06-19 | 2023-10-03 | 恒银金融科技股份有限公司 | Segmentation method and device for foreground and background of crown word size region |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102136064A (en) | System for recognizing characters from image | |
CN110738125B (en) | Method, device and storage medium for selecting detection frame by Mask R-CNN | |
CN110210475B (en) | License plate character image segmentation method based on non-binarization and edge detection | |
CN114067143B (en) | Vehicle re-identification method based on double sub-networks | |
CN110738207A (en) | character detection method for fusing character area edge information in character image | |
CN103714181B (en) | A kind of hierarchical particular persons search method | |
CN105868683A (en) | Channel logo identification method and apparatus | |
CN103295009B (en) | Based on the license plate character recognition method of Stroke decomposition | |
CN107705254B (en) | City environment assessment method based on street view | |
CN102930561A (en) | Delaunay-triangulation-based grid map vectorizing method | |
CN112749696B (en) | Text detection method and device | |
CN101655983A (en) | Device and method for exacting dominant color | |
CN111832504A (en) | Space information intelligent integrated generation method for satellite in-orbit application | |
CN108829711A (en) | A kind of image search method based on multi-feature fusion | |
CN104134067A (en) | Road vehicle monitoring system based on intelligent visual Internet of Things | |
CN116386090B (en) | Plankton identification method, system and medium based on scanning atlas | |
CN112668375A (en) | System and method for analyzing tourist distribution in scenic spot | |
CN104517262B (en) | The adaptive image scaling method detected based on DCT domain vision significance | |
CN114241358A (en) | Equipment state display method, device and equipment based on digital twin transformer substation | |
CN114120094A (en) | Water pollution identification method and system based on artificial intelligence | |
CN108830882A (en) | Video abnormal behaviour real-time detection method | |
CN116822548B (en) | Method for generating high recognition rate AI two-dimensional code and computer readable storage medium | |
CN107832732A (en) | Method for detecting lane lines based on ternary tree traversal | |
CN115100491B (en) | Abnormal robust segmentation method and system for complex automatic driving scene | |
CN114972274B (en) | Area search connection method based on crack connected domain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20110727 |