CN102136064A - System for recognizing characters from image - Google Patents

Publication number
CN102136064A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201110071825
Other languages
Chinese (zh)
Inventor
王鑫鑫
税彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU SIFANG TECHNOLOGIES Co Ltd
Original Assignee
CHENGDU SIFANG TECHNOLOGIES Co Ltd
Application filed by CHENGDU SIFANG TECHNOLOGIES Co Ltd
Priority to CN 201110071825
Publication of CN102136064A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a system for recognizing characters from an image, which comprises a data receiving module, a background filtering module, a character segmentation module, a characteristic extraction module, a characteristic comparison module and a database updating module, wherein the data receiving module is used for receiving specific image character data from a DataServer for subsequent image character recognition; the background filtering module is used for removing an image background, and extracting character areas; the character segmentation module is used for performing thinning and size normalization processing on each character area to segment characters; the characteristic extraction module is used for extracting characteristic values of each character of each area; the characteristic comparison module is used for querying a database to obtain character comparison results; and the database updating module is used for writing the characteristic values and corresponding characters into the characteristic database. The recognition system provided by the invention reduces the error rate of character recognition, and can learn the characters which cannot be recognized to achieve improved recognition capability.

Description

An image character recognition system
Technical field
The present invention relates to the field of image analysis and processing, and in particular to image filtering and analysis applications, specifically multimedia message monitoring and filtering systems for telecommunications carriers.
Background art
With the development of mobile communication technology, the data being sent is no longer limited to text messages and now includes a large amount of image content. At present, text short messages are monitored fairly well in China, which has suppressed the spread of some harmful information. Image messages, however, cannot be monitored, and this gives offenders an opportunity to spread information that disturbs public order or residents' daily lives.
Summary of the invention
The purpose of the present invention is to provide a first-rate, efficient and stable solution for image character recognition. The system performs recognition processing on image data and directly obtains the character information that the image data contains.
To this end, the present invention proposes an image character recognition system comprising:
a data receiving module, used to receive specific image file data from a Data Server for subsequent image character recognition; a background filtering module, used to remove the image background and extract character regions; a character segmentation module, used to perform thinning and size normalization on each character region and to separate the characters; a feature extraction module, used to extract the feature values of each character in each region; a feature comparison module, used to query the database and obtain the character comparison result; and a database update module, used to write feature values and their corresponding characters into the feature database.
According to an embodiment of the invention, the background filtering module first performs color run-length encoding, then obtains color clusters, then generates and selects character layers, and finally extracts the character regions.
The color run-length encoding is based on the color Euclidean distance. Starting from the first pixel of each row, that pixel is taken as the starting point of a new run, and the Euclidean distance dii′ in RGB space between this starting point and the next adjacent pixel in the same row is calculated. If dii′ is less than the threshold Th, the two pixels are merged into one run, the run length li increases by 1, and the mean RGB value of the run (ri, gi, bi) is calculated. Otherwise, if dii′ is greater than or equal to Th, the run index i increases by 1, this pixel becomes the starting point of a new run, its coordinates and color value are recorded according to formula (1), and the new run length is initialized to 1. In the same way, the Euclidean distance between the next adjacent pixel and the current run is calculated; if the distance is less than Th, the pixel is merged into the run and the run's RGB value is recalculated, otherwise a new run is generated. Traversing all pixels of every row of the image in this way yields a number of color runs. While the color runs are being generated, starting from the second row of the image, the pairwise Euclidean distance in RGB space is calculated between each color run in the current row and each color run in the previous row that is 8-neighborhood adjacent to it; if this distance is less than the threshold Tv, the two runs are merged into the same connected component, that is, the two runs are linked. After the whole image has been traversed, the link pointers between runs give the set of all connected components that make up the image, {Cl | l = 1, 2, …, p}, where p is the total number of connected components in the image. Here Tv = Th = 13 to 16.
According to an embodiment of the invention, the system obtains color clusters as follows:
The average color of the connected component containing the most pixels is taken as the initial center color, and the Euclidean distance in RGB color space between each other connected component and this center is calculated. If the distance is less than the threshold TC, the mean RGB value of the two is calculated and replaces the original center color as the new center color value. If it is greater than TC, a second, new color center is generated, with the average color of that connected component as the initial RGB value of the center. Comparisons proceed one by one in this way, and color centers whose mutual distance is less than TC are merged. Here TC = 28 to 30.
According to an embodiment of the invention, the image character recognition system generates and selects character layers as follows: after color clustering of the connected components, all connected components whose area is greater than 1 × 1 are retained, and the Euclidean distance between each of these connected components and each color center is calculated. If the Euclidean distance between a connected component and one of the color centers is less than TC, the connected component is assigned to the layer determined by that color center.
According to an embodiment of the invention, the image character recognition system extracts character regions as follows:
1) Test each image layer in turn: for each layer, if the number of pixels greater than the layer's segmentation threshold exceeds 100, the layer is treated as a text layer; if that number is less than 100, the layer is treated as a noise or background layer;
2) Test each connected component in turn: if the length and width of the tested connected component are about the same as the size of the test image, the average color of that connected component is taken as the background color, and the layer it belongs to is a background layer;
3) Delete the noise layers and background layers; the remaining layers are the image text layers.
According to an embodiment of the invention, the image character recognition system includes an image enhancement module, used to preprocess each character region and improve its recognizability. The system performs enhancement as follows: on the image data processed by the background filtering module, the mean pixel value of the character region is calculated, each pixel is compared with the mean, and the pixels in the character region whose difference from the mean is small are kept; noise is then removed with a mean filter; finally a threshold is determined from histogram statistics and the image is segmented with that threshold.
According to an embodiment of the invention, the character segmentation module performs the following steps:
First, the character skeleton is extracted. Whether a pixel of the text can be deleted is decided from the state of its eight neighboring points: 1. interior points cannot be deleted; 2. isolated points cannot be deleted; 3. end points of straight lines cannot be deleted; 4. if pixel P is a boundary point and removing P does not increase the number of connected components, P is deleted;
Using an index table, the whole image is scanned row by row in each pass. For each point (excluding border points), the index corresponding to it in the table is calculated; if the table entry is 0 the point is kept, otherwise it is deleted. If no point is deleted during a pass, the loop ends and the remaining points are the skeleton points; if any point was deleted, a new pass is started, and this repeats until no more points are deleted;
The skeletonized image data is traversed from left to right and from top to bottom, and characters are segmented according to their eight-connectivity.
According to an embodiment of the invention, the character feature extraction module performs the following steps:
First, each segmented character is size-normalized to a uniform aspect ratio;
Second, the character is divided into equal parts according to the number of feature points needed horizontally and vertically; or
1) Obtain the start and end coordinates of the stroke, (StartX, StartY) and (EndX, EndY);
2) If EndX = StartX, the vector code of the stroke is code = 4; go to step 5);
3) Calculate the absolute value of the stroke slope, slope = |(EndY − StartY) / (EndX − StartX)|;
4) Determine the vector code of the stroke in the first quadrant from slope:
If [slope condition 1, formula image not reproduced], then code = 0;
If [slope condition 2, formula image not reproduced], then code = 1;
If [slope condition 3, formula image not reproduced], then code = 2;
If [slope condition 4, formula image not reproduced], then code = 3;
If [slope condition 5, formula image not reproduced], then code = 4;
5) Determine the quadrant in which the stroke lies: for the second quadrant, code = 8 − code; for the third quadrant, code = 8 + code; for the fourth quadrant, code = (16 − code) mod 16. When the algorithm finishes, code is the required vector code of the stroke. Writing the codes of the individual strokes one after another in order gives the code of the whole Chinese character.
According to an embodiment of the invention, the feature comparison module compares features by means of a correlation value, which is calculated as follows:
[correlation formula, image not reproduced]
where the feature value sequence of the character and the feature value sequence of the standard character are the two sequences given in the formula images (not reproduced).
The technical solution of the present invention can provide mobile operators with a real-time, high-performance, secure and reliable system for monitoring and filtering information that contains images. Through powerful and stable image character analysis, information containing images is accurately identified, and alarms and interception are triggered in time, which cleans up the mobile communication market, protects information security, and safeguards the legitimate rights of consumers.
Description of drawings
The present invention is explained by way of example and with reference to the accompanying drawings, in which:
Fig. 1 is the image character recognition processing flow;
Fig. 2 is the image character training processing flow;
Fig. 3 shows one of the feature extraction modes of the present invention.
Embodiments
All features disclosed in this specification, and all steps of any disclosed method or process, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings) may, unless specifically stated otherwise, be replaced by other equivalent or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
The image character recognition system of the present invention mainly comprises two parts: image character recognition and image character learning.
Fig. 1 is the image character recognition flow chart. The specific steps are:
Data receiving step: receive image data containing image text;
Background filtering step: remove the image background and extract character regions;
Enhancement step: preprocess each character region to improve its recognizability;
Character segmentation step: perform thinning and size normalization on each character region and separate the characters;
Feature extraction step: extract the feature values of each character in each region;
Feature comparison step: query the database and obtain the character comparison result.
Fig. 2 is the image character training flow. The specific steps are:
Data receiving step: receive image data containing image text characters;
Background filtering step: remove the image background and extract character regions;
Enhancement step: preprocess each character region to improve its recognizability;
Character segmentation step: perform thinning and size normalization on each character region and separate the characters;
Feature extraction step: extract the feature values of each character in each region;
Database update step: write the feature values and their corresponding characters into the feature database.
To implement image character recognition and image character learning, the present invention comprises the following processing modules, each of which carries out one of the steps:
Module                          Processing step
Receiving module                Receive image data containing image text
Background filtering module     Remove the image background and extract character regions
Enhancement module              Preprocess each character region to improve its recognizability
Character segmentation module   Perform thinning and size normalization on each character region and separate the characters
Feature extraction module       Extract the feature values of each character in each region
Feature comparison module       Query the database and obtain the character comparison result
Database update module          Write feature values and their corresponding characters into the feature database
The modules of the system are described in detail below:
I. Data receiving module
The data receiving module receives specific image file data from the Data Server for subsequent image character recognition.
II. Background filtering module
To handle character recognition against complex color backgrounds and to separate characters from the background, the present invention first performs color run-length encoding:
A color run is defined as Ri{(ri, gi, bi), (xi, yi), li}, where (ri, gi, bi) is the mean of the (r, g, b) color components in RGB color space over the points of the run, (xi, yi) is the starting coordinate of the run, i.e. the image coordinates of the first pixel the run contains, and li is the run length.
$d_{ii'} = \sqrt{(r_i - r_{i'})^2 + (g_i - g_{i'})^2 + (b_i - b_{i'})^2}$
Note: dii′ is the three-dimensional Euclidean distance and measures the difference between two successive colors, where (ri, gi, bi) and (ri′, gi′, bi′) are the color component values of adjacent pixels.
if (dii′ < Th) {
    ri = (ri × li + ri′) / (li + 1);
    gi = (gi × li + gi′) / (li + 1);
    bi = (bi × li + bi′) / (li + 1);
    li = li + 1;
} else { i = i + 1; ri = ri′; gi = gi′; bi = bi′; li = 1; }    (1)
Note: Th is a preset threshold. When the three-dimensional Euclidean distance dii′ is less than the threshold, the components (ri, gi, bi) are updated to the running mean of the run including the new pixel, and the run length li increases by one. If dii′ is greater than or equal to the threshold, the color components are reassigned as the starting color of a new run.
Color runs are obtained as follows. The method builds on the usual run extraction for binary images and adds a Euclidean distance test in RGB color space. Starting from the first pixel of each row, that pixel is taken as the starting point of a new run, and the Euclidean distance dii′ in RGB space between this starting point and the next adjacent pixel in the same row is calculated. If dii′ is less than the threshold Th, the two pixels are merged into one run, the run length li increases by 1, and the mean RGB value of the run (ri, gi, bi) is calculated. Otherwise, if dii′ is greater than or equal to Th, the run index i increases by 1, this pixel becomes the starting point of a new run, its coordinates and color value are recorded according to formula (1), and the new run length is initialized to 1. In the same way, the Euclidean distance between the next adjacent pixel and the current run is calculated; if the distance is less than Th, the pixel is merged into the run and the run's RGB value is recalculated, otherwise a new run is generated. Traversing all pixels of every row of the image by this rule yields a number of color runs.
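The row-wise run encoding described above can be sketched in C as follows. This is a minimal illustration, not the patent's implementation: the `ColorRun` struct, the `encode_row` function, and the interleaved 8-bit RGB row layout are assumptions made for the example. It compares squared distances against Th squared, which is equivalent to comparing the Euclidean distance against Th.

```c
#include <stddef.h>

/* One color run: running mean color, start coordinate, and length.
 * Struct and field names are illustrative, not taken from the patent. */
typedef struct {
    double r, g, b;   /* mean color (ri, gi, bi) of the run */
    int x, y;         /* coordinate (xi, yi) of the first pixel */
    int len;          /* run length li */
} ColorRun;

/* Encode one row of interleaved RGB bytes into color runs.
 * A pixel joins the current run when its squared RGB distance to the
 * run's mean color is below th * th; otherwise it starts a new run.
 * `runs` must have room for `width` entries. Returns the run count. */
size_t encode_row(const unsigned char *rgb, int width, int y,
                  double th, ColorRun *runs)
{
    size_t n = 0;
    for (int x = 0; x < width; ++x) {
        double r = rgb[3 * x], g = rgb[3 * x + 1], b = rgb[3 * x + 2];
        if (n > 0) {
            ColorRun *cur = &runs[n - 1];
            double dr = cur->r - r, dg = cur->g - g, db = cur->b - b;
            if (dr * dr + dg * dg + db * db < th * th) {
                /* Running mean: (mean * len + new) / (len + 1). */
                cur->r = (cur->r * cur->len + r) / (cur->len + 1);
                cur->g = (cur->g * cur->len + g) / (cur->len + 1);
                cur->b = (cur->b * cur->len + b) / (cur->len + 1);
                cur->len += 1;
                continue;
            }
        }
        /* Start a new run at this pixel with length 1. */
        runs[n].r = r;
        runs[n].g = g;
        runs[n].b = b;
        runs[n].x = x;
        runs[n].y = y;
        runs[n].len = 1;
        ++n;
    }
    return n;
}
```

Calling `encode_row` once per image row with a threshold in the 13 to 16 range gives the per-row runs that the next step links into connected components.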
While the color runs are being generated, starting from the second row of the image, the pairwise Euclidean distance in RGB space (which measures the overall dissimilarity between two runs) is calculated between each color run in the current row and each color run in the previous row that is 8-neighborhood adjacent to it. If this distance is less than Tv (a threshold that marks the segmentation boundary; its value lies between 0 and 255 and the best value can be chosen by experiment), the two runs are merged into the same connected component, i.e. they are linked. After the whole image has been traversed by this rule, the link pointers between runs give the set of all connected components that make up the image, {Cl | l = 1, 2, …, p}, where p is the total number of connected components in the image.
Here the structure of a connected component is defined as follows:
Cl{(rl, gl, bl), Nl, (vl, hl)}, where rl, gl and bl are the average RGB color values of connected component Cl,
[formula images for the average color, not reproduced]
The length and width vl and hl of the connected component are easily obtained by simple calculation. Nl = {Rli | i = 1, 2, …, nl} is the set of all color runs contained in the connected component, where Rli is a run.
Good experimental results are obtained when Tv = Th lies in the range 13 to 16. The CRAG structure used in this step makes full use of the color and position information of the image and, by recording the average color of each connected component, preserves the color information of the original image well.
Next, color clusters are obtained:
When all connected components of the image have been obtained, the average color of each connected component is also available. To go on to obtain the required text layers, these average colors are clustered in a simple way. Tests on a large number of samples showed that the number of text colors of interest is usually fewer than 8. Since the number of color clusters is not known in advance, and to limit computation time, the required color centers are obtained by selecting an initial cluster center, as summarized below:
The average color of the connected component containing the most pixels is taken as the initial center color, and the Euclidean distance in RGB color space between each other connected component and this center is calculated. If the distance is less than the threshold TC, the mean RGB value of the two is calculated and replaces the original center color as the new center color value; otherwise, if it is greater than TC, a second, new color center is generated, with the average color of that connected component as the initial RGB value of the center. Comparisons proceed one by one in this way; because the color center positions keep changing, color centers whose mutual distance is less than TC also need to be merged. In addition, the following sample selection criteria are applied, namely the size ratio of the text region to the image, the run length, and the pixel density of the connected region:
1) hl < P × H and vl < P × V, where H is the height of the color image and V is its width; vl and hl are the width and height of the connected component, and P is the ratio of the text region size to the original image size.
2) hl × vl > 1 × 1 (the run length is greater than one);
3) $Q_1 < \frac{\sum_i l_i}{h_l \times v_l} < Q_2$, where Q1 and Q2 are bounds on the pixel density of the connected region, li is the run length, and vl and hl are the width and height of the connected region.
Here $\frac{\sum_i l_i}{h_l \times v_l}$ is the pixel density of the connected component. The character height and width in the test images are always less than 0.95 times the image height and width, so P = 0.95. With Q1 = 0.5 (lower pixel-density limit) and Q2 = 0.9 (upper pixel-density limit), image borders and other long, thin lines are deleted. TC = 28 to 30 is a good choice for extracting Chinese characters from natural scene images and effectively removes interfering noise points. With the above method, a suitable number of color cluster centers is finally obtained.
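The clustering procedure above can be illustrated with the following C sketch. It is only one possible reading of the text: the `Color` struct and `cluster_colors` function are hypothetical names, the "mean of the two" update is taken as the plain midpoint of center and component color, and the final merge is a single sweep. The sample selection criteria 1) to 3) are not applied here.

```c
#include <stddef.h>

/* RGB color as doubles; the type name is illustrative. */
typedef struct { double r, g, b; } Color;

/* Squared Euclidean distance in RGB space. */
static double color_dist2(Color a, Color b)
{
    double dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

/* Midpoint of two colors (assumed meaning of "mean RGB of the two"). */
static Color color_mid(Color a, Color b)
{
    Color m = { (a.r + b.r) / 2, (a.g + b.g) / 2, (a.b + b.b) / 2 };
    return m;
}

/* Cluster the average colors of n connected components.
 * The largest component seeds the first center. Each other component
 * either updates the first center closer than tc or creates a new one.
 * Centers closer than tc are then merged in one sweep.
 * `centers` must have room for n entries. Returns the center count. */
size_t cluster_colors(const Color *colors, const int *pixel_counts,
                      size_t n, double tc, Color *centers)
{
    if (n == 0) return 0;
    size_t largest = 0;
    for (size_t i = 1; i < n; ++i)
        if (pixel_counts[i] > pixel_counts[largest]) largest = i;

    size_t k = 0;
    centers[k++] = colors[largest];
    for (size_t i = 0; i < n; ++i) {
        if (i == largest) continue;
        size_t j;
        for (j = 0; j < k; ++j) {
            if (color_dist2(colors[i], centers[j]) < tc * tc) {
                centers[j] = color_mid(centers[j], colors[i]);
                break;
            }
        }
        if (j == k) centers[k++] = colors[i];   /* no center was close */
    }

    /* Merge centers that have drifted to within tc of each other. */
    for (size_t a = 0; a < k; ++a) {
        size_t b = a + 1;
        while (b < k) {
            if (color_dist2(centers[a], centers[b]) < tc * tc) {
                centers[a] = color_mid(centers[a], centers[b]);
                centers[b] = centers[k - 1];    /* drop b, recheck slot */
                --k;
            } else {
                ++b;
            }
        }
    }
    return k;
}
```

With TC in the 28 to 30 range, near-white and near-black components in a typical image would each collapse to one center, which is the behavior the text relies on when it builds one layer per center.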
Then character layers are generated and selected:
After color clustering of the connected components, all connected components with area greater than 1 × 1 are retained, so that strokes are not broken by the loss of small connected components, and the Euclidean distance between each of these connected components and each color center is calculated. If the Euclidean distance between a connected component and one of the color centers is less than TC, the connected component is assigned to the layer determined by that color center. This ensures that connected components with similar colors end up in the same image layer.
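A small C sketch of the layer assignment follows. The text only requires that the distance to a center be less than TC; choosing the nearest such center when several qualify is an assumption of this example, as are the `Color` type and the `assign_layer` name.

```c
/* RGB color as doubles; the type name is illustrative. */
typedef struct { double r, g, b; } Color;

/* Return the index of the nearest color center whose Euclidean distance
 * to `c` is less than tc, or -1 if no center is that close.
 * Squared distances are compared, so no square root is needed. */
int assign_layer(Color c, const Color *centers, int k, double tc)
{
    int best = -1;
    double best_d2 = tc * tc;
    for (int j = 0; j < k; ++j) {
        double dr = c.r - centers[j].r;
        double dg = c.g - centers[j].g;
        double db = c.b - centers[j].b;
        double d2 = dr * dr + dg * dg + db * db;
        if (d2 < best_d2) {
            best = j;
            best_d2 = d2;
        }
    }
    return best;
}
```

A component that returns -1 matches no layer under this reading; how such components are handled is not stated in the text.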
Finally, character regions are extracted:
Text regions are selected as follows:
1) Test each layer in turn: for each layer, if the number of pixels greater than the layer's segmentation threshold exceeds 100, the layer is treated as a text layer; if that number is less than 100, the layer is considered a noise or background layer. The layer segmentation threshold can be determined by a conventional segmentation method, such as histogram statistics.
2) Test each connected component in turn: if the length and width of the tested connected component are about the same as the size of the test image, its average color is taken as the background color. A text region is necessarily smaller in area than the whole image, so if the two sizes are comparable, the layer containing the component is a background layer.
3) After the noise layers and background layers have been deleted, the remaining layers are candidate layers, which are segmented further with an automatic threshold. Obtaining the text layers gives the text regions.
III. Enhancement module
On the image data processed by the background filtering module, the mean pixel value of the character region is calculated and each pixel is compared with the mean; pixels in the character region whose difference from the mean is small (for example less than 5) are kept. Noise is then removed with a mean filter (which filters noise by averaging neighboring pixels). Finally, a threshold is determined from the gray-level histogram (which describes the distribution of pixel gray values), and the image is segmented with that threshold.
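The mean-filter step can be sketched as below. The text does not give the window size or the border handling, so the 3 × 3 window, the rounded integer mean, and copying border pixels unchanged are all assumptions; `mean_filter3` is a hypothetical name. The histogram-based threshold selection is not shown because the text does not specify the method.

```c
/* 3x3 mean filter on an 8-bit grayscale image stored row by row.
 * Interior pixels become the rounded mean of their 3x3 neighbourhood;
 * border pixels are copied unchanged (an assumption of this sketch).
 * `src` and `dst` must not overlap. */
void mean_filter3(const unsigned char *src, unsigned char *dst,
                  int w, int h)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                dst[y * w + x] = src[y * w + x];
                continue;
            }
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[(y + dy) * w + (x + dx)];
            dst[y * w + x] = (unsigned char)((sum + 4) / 9);
        }
    }
}
```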
IV. Character segmentation module
Before characters are segmented, the character skeleton is first extracted. The skeleton can be understood as the medial axis of the image. Whether a pixel of the text can be deleted is decided from the state of its eight neighboring points: 1. interior points cannot be deleted; 2. isolated points cannot be deleted; 3. end points of straight lines cannot be deleted; 4. if P (a pixel) is a boundary point and removing P does not increase the number of connected components, P can be deleted.
Based on these criteria, an index table is prepared in advance. It has 256 elements, indexed 0 to 255, each either 0 or 1. The table is looked up according to the state of the eight neighbors of a point (which must, of course, be a black point to be processed); if the table element is 1, the point can be deleted, otherwise it is kept. The lookup works as follows: a white point counts as 1 and a black point as 0. The upper-left neighbor corresponds to the first (lowest) bit of an 8-bit number, the neighbor directly above to the second bit, the upper-right neighbor to the third, the left neighbor to the fourth, the right neighbor to the fifth, the lower-left neighbor to the sixth, the neighbor directly below to the seventh, and the lower-right neighbor to the eighth. The 8-bit number formed in this way is the index into the table.
static int erasetable[256] = {
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,1,
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,1,
1,1,0,0,1,1,0,0, 0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,0,0, 1,1,0,1,1,1,0,1,
0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,1,
0,0,1,1,0,0,1,1, 1,1,0,1,1,1,0,1,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,0,0, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,1,1, 0,0,0,0,0,0,0,0,
1,1,0,0,1,1,0,0, 1,1,0,1,1,1,0,0,
1,1,0,0,1,1,1,0, 1,1,0,0,1,0,0,0
};
Using this index table, the whole image is scanned row by row in each pass. For each point (excluding border points), its index into the table is calculated; if the table entry is 0 the point is kept, otherwise it is deleted. If no point is deleted in a pass, the loop ends and the remaining points are the skeleton points. If any point was deleted, a new pass is run, and so on until no more points are deleted.
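The table lookup and the repeated scan can be sketched as follows. The bit order follows the description above; everything else is an assumption of this example: the function names, the convention that 0 is a black text pixel and any other value is white, and deleting points in place during a pass (so a deletion can influence later points in the same pass). The 256-entry table is passed in as a parameter, so the `erasetable` array listed above can be supplied by the caller.

```c
/* Build the 8-bit table index for the pixel at (x, y).
 * Bit 0 is the upper-left neighbour, then up, upper-right, left,
 * right, lower-left, down, and bit 7 is the lower-right neighbour.
 * A white (non-zero) neighbour sets its bit; a black one leaves it 0.
 * (x, y) must not lie on the image border. */
int neighbor_index(const unsigned char *img, int w, int x, int y)
{
    static const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
    static const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
    int index = 0;
    for (int bit = 0; bit < 8; ++bit)
        if (img[(y + dy[bit]) * w + (x + dx[bit])] != 0)
            index |= 1 << bit;
    return index;
}

/* One row-by-row pass over the non-border pixels. A black pixel whose
 * table entry is 1 is turned white. Returns the number of deletions. */
int thin_pass(unsigned char *img, int w, int h, const int table[256])
{
    int deleted = 0;
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            if (img[y * w + x] == 0 &&
                table[neighbor_index(img, w, x, y)]) {
                img[y * w + x] = 255;
                ++deleted;
            }
        }
    }
    return deleted;
}

/* Repeat passes until a pass deletes nothing; what remains is the
 * skeleton under this simplified scheme. */
void thin(unsigned char *img, int w, int h, const int table[256])
{
    while (thin_pass(img, w, h, table) > 0) {
        /* keep scanning */
    }
}
```

An isolated black pixel has all eight neighbours white, so its index is 255, and the last entry of the table above is 0, which matches the rule that isolated points are never deleted.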
The skeletonized image data is traversed from left to right and from top to bottom, and characters are segmented according to their eight-connectivity (up, down, left, right, upper left, upper right, lower left and lower right). That is, for each character pixel, the eight surrounding positions are checked for another character pixel; if one exists, the two pixels are considered to belong to the same character, and so on.
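A minimal C sketch of eight-connectivity segmentation is given below. It uses a flood fill with an explicit stack, which is one common way to implement the check described above and is not claimed to be the patent's method. The function name, the label layout, and the convention that 0 is a text pixel are assumptions.

```c
#include <stdlib.h>

/* Label 8-connected groups of text pixels (value 0) in a binary image.
 * `labels` must hold w * h ints; background pixels get 0 and each
 * character gets a label 1..n in the order its first pixel is met when
 * scanning rows top to bottom, left to right.
 * Returns n, or -1 if the work stack cannot be allocated. */
int label_characters(const unsigned char *img, int w, int h, int *labels)
{
    int *stack = malloc((size_t)w * (size_t)h * sizeof *stack);
    if (stack == NULL) return -1;
    for (int i = 0; i < w * h; ++i) labels[i] = 0;

    int count = 0;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int start = y * w + x;
            if (img[start] != 0 || labels[start] != 0) continue;
            int top = 0;
            labels[start] = ++count;      /* new character found */
            stack[top++] = start;
            while (top > 0) {
                int p = stack[--top];
                int px = p % w, py = p / w;
                /* Visit the eight neighbours of the popped pixel. */
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        int nx = px + dx, ny = py + dy;
                        if (nx < 0 || ny < 0 || nx >= w || ny >= h)
                            continue;
                        int q = ny * w + nx;
                        if (img[q] == 0 && labels[q] == 0) {
                            labels[q] = count;
                            stack[top++] = q;
                        }
                    }
                }
            }
        }
    }
    free(stack);
    return count;
}
```

Each pixel is labeled at the moment it is pushed, so it is pushed at most once and a stack of w × h entries is sufficient.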
V. Character feature extraction module
First, each segmented character is size-normalized to a uniform aspect ratio. Features can then be extracted in either of the following two modes.
Feature extraction mode one (as shown in Fig. 3):
The character is divided into equal parts according to the number of feature points needed horizontally and vertically. For example:
If three feature points are needed horizontally and three vertically, the number of character pixels crossed by each dividing line, taken in a fixed order, is used as a feature point. (The denser the feature points, the more accurate the recognition result tends to be.)
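One way to read feature extraction mode one is sketched below in C. The placement of the dividing lines (evenly spaced, excluding the image edges), the output order (horizontal lines first, then vertical lines), and the name `grid_line_features` are assumptions made for the illustration; Fig. 3 is not available here to confirm them.

```c
/* Feature mode one sketch: cut the normalized character image with
 * `rows` horizontal and `cols` vertical evenly spaced lines and count
 * the text pixels (value 0) that each line crosses.
 * features[0 .. rows-1] hold the horizontal-line counts from top to
 * bottom; features[rows .. rows+cols-1] hold the vertical-line counts
 * from left to right. */
void grid_line_features(const unsigned char *img, int w, int h,
                        int rows, int cols, int *features)
{
    for (int i = 0; i < rows; ++i) {
        int y = (i + 1) * h / (rows + 1);   /* row of the i-th line */
        int count = 0;
        for (int x = 0; x < w; ++x)
            if (img[y * w + x] == 0) ++count;
        features[i] = count;
    }
    for (int j = 0; j < cols; ++j) {
        int x = (j + 1) * w / (cols + 1);   /* column of the j-th line */
        int count = 0;
        for (int y = 0; y < h; ++y)
            if (img[y * w + x] == 0) ++count;
        features[rows + j] = count;
    }
}
```

For the three-by-three example in the text, `rows = 3` and `cols = 3` would give a six-element feature sequence per character.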
Feature extraction mode two:
1) Obtain the start and end coordinates of the stroke, (StartX, StartY) and (EndX, EndY);
2) If EndX = StartX, the vector code of the stroke is code = 4; go to step 5);
3) Calculate the absolute value of the stroke slope, slope = |(EndY − StartY) / (EndX − StartX)|;
4) Determine the vector code of the stroke in the first quadrant from slope:
If [slope condition 1, formula image not reproduced], then code = 0;
If [slope condition 2, formula image not reproduced], then code = 1;
If [slope condition 3, formula image not reproduced], then code = 2;
If [slope condition 4, formula image not reproduced], then code = 3;
If [slope condition 5, formula image not reproduced], then code = 4.
5) Determine the quadrant in which the stroke lies: for the second quadrant, code = 8 − code; for the third quadrant, code = 8 + code; for the fourth quadrant, code = (16 − code) mod 16.
When the algorithm finishes, code is the required vector code of the stroke. Once the code of each stroke has been obtained, the whole Chinese character can be encoded: writing the stroke codes one after another in order gives the code of the whole character.
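The stroke coding steps can be sketched in C as follows. The slope thresholds in the original are given only as formula images that are not reproduced here, so the limits used below are an ASSUMPTION: they split the first quadrant into five sectors centred on 0°, 22.5°, 45°, 67.5° and 90° (tangents of 11.25°, 33.75°, 56.25° and 78.75°). The sketch also assumes a mathematical axis convention (y grows upward) and treats points on an axis as belonging to the quadrant shown in the comments; `stroke_code` is a hypothetical name.

```c
#include <stdlib.h>

/* Direction code (0..15) of a stroke from its start and end points.
 * The first-quadrant code 0..4 is chosen from the absolute slope, then
 * mapped to the stroke's quadrant as in step 5 of the text.
 * The slope limits are assumed values, not taken from the patent. */
int stroke_code(int start_x, int start_y, int end_x, int end_y)
{
    static const double limit[4] = {0.1989, 0.6682, 1.4966, 5.0273};
    int dx = end_x - start_x;
    int dy = end_y - start_y;
    int code;

    if (dx == 0) {
        code = 4;                              /* vertical stroke */
    } else {
        double slope = (double)abs(dy) / (double)abs(dx);
        code = 0;
        while (code < 4 && slope >= limit[code])
            ++code;
    }

    if (dx < 0 && dy >= 0)
        code = 8 - code;                       /* second quadrant */
    else if (dx < 0 && dy < 0)
        code = 8 + code;                       /* third quadrant */
    else if (dx >= 0 && dy < 0)
        code = (16 - code) % 16;               /* fourth quadrant */
    return code;                               /* first quadrant as is */
}
```

Under these assumptions a stroke pointing right gets code 0, up gets 4, left gets 8 and down gets 12, so the sixteen codes run counter-clockwise around the circle.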
Six, feature contrast module
If the characteristic value data sequence of character is
Figure 194266DEST_PATH_IMAGE007
,
Figure 249947DEST_PATH_IMAGE008
,,
Figure 66593DEST_PATH_IMAGE009
, the characteristic value data sequence of standard character is ,
Figure 389307DEST_PATH_IMAGE011
Figure 881468DEST_PATH_IMAGE012
, then can be by the correlation of two sequences of following formula calculating.
Figure 2011100718259100002DEST_PATH_IMAGE024
Ideally, if two feature sequences are similar, the absolute value of the correlation value P should be close to 1. During recognition, the standard characters whose |P| exceeds a set threshold are therefore chosen as the recognition result. If the result contains several characters, they are sorted by |P| and the user is allowed to select the desired character from the result; if the user makes no selection, the character with the largest |P| is taken as the final character.
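The comparison step can be sketched as follows. The correlation formula itself is an image that did not survive in this text, so the sketch assumes the Pearson correlation coefficient, which matches the stated property that |P| is close to 1 for similar sequences. The function names, the dictionary-based database and the threshold value 0.9 are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length sequences (ASSUMED formula)."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no shape information
    return cov / math.sqrt(var_x * var_y)


def match(features, database, threshold=0.9):
    """Return (character, |P|) pairs above the threshold, best first."""
    scored = ((abs(correlation(features, ref)), ch) for ch, ref in database.items())
    ranked = sorted((pair for pair in scored if pair[0] > threshold), reverse=True)
    return [(ch, score) for score, ch in ranked]


# Hypothetical two-entry feature database.
database = {"A": [1, 2, 3, 4], "B": [4, 1, 3, 2]}
candidates = match([2, 4, 6, 8], database)
# With no user selection, the first candidate is taken as the final character.
final_character = candidates[0][0] if candidates else None
```

Returning the whole ranked list, rather than only the best match, leaves room for the user selection described above.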
Seven, database update module
The character features in the feature database can be added to and revised continually, and the ability to recognize characters can be strengthened through continued learning, so that the database is constantly updated.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings) may, unless expressly stated otherwise, be replaced by alternative features serving an equivalent or similar purpose. That is, unless expressly stated otherwise, each feature is only one example of a series of equivalent or similar features.
The present invention is not limited to the foregoing embodiments. It extends to any new feature or any new combination of features disclosed in this specification, and to any new method or process step, or any new combination of steps, so disclosed.

Claims (10)

1. An image character recognition system, characterized in that the image character recognition system comprises:
a data receiving module, for receiving specific image file data from a Data Server for subsequent image character recognition;
a background filtering module, for removing the image background and extracting character regions;
a character segmentation module, for thinning and size-normalizing each character region and separating the characters;
a feature extraction module, for extracting the feature values of each character in each region;
a feature comparison module, for querying the database to obtain character comparison results; and
a database update module, for writing feature values and their corresponding characters into the feature database.
2. The image character recognition system as claimed in claim 1, characterized in that the background filtering module:
first performs color run-length coding;
second, obtains color clusters;
third, generates and selects character layers;
finally, extracts the character regions.
3. The image character recognition system as claimed in claim 2, characterized in that the color run-length coding encodes according to the Euclidean distance between colors:
Starting from the first pixel of each row, take this pixel as the starting point of a new run, and calculate the Euclidean distance dii' in RGB space between this starting point and the next pixel immediately adjacent to it in the same row;
If dii' is less than the threshold Th, the two pixels are merged into one run, the run length li is increased by 1, and the mean RGB value of the run, (ri, bi, gi), is calculated; otherwise, if dii' is greater than or equal to Th, the run index i is increased by 1, this pixel becomes the starting point of a new run, its coordinates and color value are recorded, and the new run length is initialized to 1;
In the same way, continue by calculating the Euclidean distance between the next adjacent pixel and the adjacent run; if the distance is less than Th, the pixel is merged into the run and the run's RGB value is recalculated; otherwise a new run is generated;
By these rules, traversing all pixels of every row of the image yields a number of color runs;
While the color runs are being generated, starting from the second row of the image, calculate the pairwise Euclidean distance in RGB space between each color run of the current row and each color run of the previous adjacent row that is 8-neighborhood connected to it in position, and judge whether this distance is less than the threshold Tv; if it is, the two runs are merged into the same connected domain, that is, the two runs are linked;
After the whole image has been traversed, the set of all connected domains making up the image, {Ci | i = 1, 2, …, p}, is obtained from the link pointers between the runs, where p is the total number of connected domains in the image;
wherein Tv = Th = 13 to 16.
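The row-wise part of the color run-length coding in claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `color_runs` and the tuple layout of a run are hypothetical, the threshold 14 is simply one value inside the stated 13 to 16 range, and the vertical linking of runs into connected domains (threshold Tv) is left out.

```python
import math


def color_runs(row, th=14.0):
    """Split one image row into color runs.

    row is a list of (r, g, b) tuples. Each run is returned as
    (start_x, length, mean_rgb). A pixel joins the current run when its
    Euclidean distance in RGB space to the run's mean color is below th.
    """
    runs = []
    for x, pixel in enumerate(row):
        if runs:
            start, length, mean = runs[-1]
            if math.dist(mean, pixel) < th:
                # Merge the pixel and update the running mean color.
                new_mean = tuple(
                    (m * length + p) / (length + 1) for m, p in zip(mean, pixel)
                )
                runs[-1] = (start, length + 1, new_mean)
                continue
        # Start a new run at this pixel with length 1.
        runs.append((x, 1, tuple(float(c) for c in pixel)))
    return runs


row = [(0, 0, 0), (2, 2, 2), (200, 200, 200), (205, 200, 200)]
runs = color_runs(row)
```

For a run of length 1 the mean color is the starting pixel itself, so the first comparison in each run matches the claim's comparison between the starting point and its neighbor.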
4. The image character recognition system as claimed in claim 3, characterized in that the system obtains the color clusters as follows:
The average color of the connected domain containing the most pixels is taken as the initial center color, and the Euclidean distance in RGB color space between each other connected domain and this center is calculated. If the distance is less than the threshold TC, the average of the two RGB values is calculated and replaces the original initial center color as the new center color value; if it is greater than TC, a second, new color center is generated, with the average color of this connected domain as the initial RGB value of that center. The comparison is carried out one connected domain at a time in this way, and color centers whose mutual distance is less than TC are merged; wherein TC = 28 to 30.
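The clustering procedure of claim 4 can be sketched as follows. The claim does not say in which order the remaining connected domains are visited or which center a domain is compared with once several centers exist, so this sketch assumes descending size order and comparison against the nearest center. The function name `cluster_colors`, the input layout and the threshold 29 (inside the stated 28 to 30 range) are hypothetical.

```python
import math


def cluster_colors(domains, tc=29.0):
    """Derive color centers from connected domains.

    domains is a list of (pixel_count, mean_rgb) pairs. The largest domain
    seeds the first center; every other domain either averages into its
    nearest center (distance < tc) or opens a new center.
    """
    if not domains:
        return []
    ordered = sorted(domains, key=lambda d: d[0], reverse=True)
    centers = [tuple(float(c) for c in ordered[0][1])]
    for _, color in ordered[1:]:
        dists = [math.dist(center, color) for center in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        if dists[k] < tc:
            # Replace the center by the average of the two RGB values.
            centers[k] = tuple((a + b) / 2 for a, b in zip(centers[k], color))
        else:
            centers.append(tuple(float(c) for c in color))
    # Final pass: merge centers that ended up closer than tc to each other.
    merged = []
    for center in centers:
        for i, kept in enumerate(merged):
            if math.dist(center, kept) < tc:
                merged[i] = tuple((a + b) / 2 for a, b in zip(kept, center))
                break
        else:
            merged.append(center)
    return merged


domains = [(100, (10, 10, 10)), (50, (12, 10, 10)), (30, (200, 0, 0))]
centers = cluster_colors(domains)
```

Each resulting center then defines one layer, to which connected domains within distance TC of that center can be assigned.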
5. The image character recognition system as claimed in claim 4, characterized in that the image character recognition system generates and selects the character layers by the following steps:
After the colors of the connected domains have been clustered, all connected domains with an area greater than 1 × 1 are kept, and the Euclidean distance between each of these connected domains and each color center is calculated; if the Euclidean distance between a connected domain and one of the color centers is less than TC, that connected domain is assigned to the layer determined by that color center.
6. The image character recognition system as claimed in claim 5, characterized in that the image character recognition system extracts the character regions as follows:
1) Test each image layer in turn: for each image layer, if the number of pixels greater than the layer's segmentation threshold exceeds 100, the layer is treated as a text layer; if the number of pixels greater than the layer's segmentation threshold is less than 100, it is treated as a noise or background layer;
2) Test each connected domain in turn: if the length and width of the tested connected domain are about the same as the size of the test image, the average color of the tested connected domain is taken as the background color, and the layer it belongs to is a background layer;
3) Delete the noise layers and background layers; the remaining layers are the image character layers.
7. The image character recognition system as claimed in claim 1, characterized in that the image character recognition system comprises an image enhancement module, for pre-processing each character region to make the image easier to recognize;
the image character recognition system performs the enhancement as follows:
From the image data produced by the background filtering module, calculate the mean pixel value of the character regions of the image, compare each pixel with the mean pixel value, and retain the pixels in the character region whose difference is small; then remove noise with a mean filter; then determine a threshold from the histogram statistics and segment the image according to that threshold.
8. The image character recognition system as claimed in claim 1, characterized in that the character segmentation module performs the following steps:
First, extract the character skeleton. For the skeleton of the text, each point is judged according to the state of its eight neighboring points: 1. an interior point cannot be deleted; 2. an isolated point cannot be deleted; 3. an end point of a straight line cannot be deleted; 4. if pixel P is a boundary point and removing P does not increase the number of connected components, P is deleted;
Using an index table, scan the whole image once, row by row; for each non-border point, calculate its corresponding index in the table; if the table value is 0, the point is kept, otherwise it is deleted. If no point is deleted during a scan, the loop ends and the remaining points are the skeleton points; if any point was deleted, a new scan is carried out, and so on until no point is deleted;
The image data from which the skeleton has been extracted is traversed from left to right and from top to bottom, and the characters are segmented according to their eight-connectivity.
9. The image character recognition system as claimed in claim 1, characterized in that the character feature extraction module performs the following steps:
First, each segmented character is size-normalized to a uniform aspect ratio;
Second, the character is cut into equal parts according to the number of feature points required horizontally and vertically; or
1) obtain the start-point and end-point coordinates of the stroke, (StartX, StartY) and (EndX, EndY);
2) if EndX = StartX, the vector code of the stroke is code = 4; go to step 5);
3) calculate slope, the absolute value of the stroke's slope;
4) determine the vector code of the stroke in the first quadrant from slope (the conditions were given as formula images in the original):
if [condition shown as image 001 in the original], then code = 0; if [condition shown as image 002 in the original], then code = 1;
if [condition shown as image 003 in the original], then code = 2;
if [condition shown as image 004 in the original], then code = 3;
if [condition shown as image 005 in the original], then code = 4;
5) determine the quadrant in which the stroke lies: for the second quadrant, code = 8 - code; for the third quadrant, code = 8 + code; for the fourth quadrant, code = (16 - code) mod 16. When the algorithm finishes, code is the required vector code of the stroke; the stroke codes only need to be joined in the order in which the strokes are written to give the code of the whole Chinese character.
10. The image character recognition system as claimed in claim 1, characterized in that the feature comparison module performs the feature comparison by means of a correlation value, the correlation value being calculated as follows:
[formula shown as image 006 in the original]
where the feature-value sequence of the character is [image 007], [image 008], …, [image 009], and the feature-value sequence of the standard character is [image 010], [image 011]
CN 201110071825 2011-03-24 2011-03-24 System for recognizing characters from image Pending CN102136064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110071825 CN102136064A (en) 2011-03-24 2011-03-24 System for recognizing characters from image

Publications (1)

Publication Number Publication Date
CN102136064A true CN102136064A (en) 2011-07-27

Family

ID=44295846

Country Status (1)

Country Link
CN (1) CN102136064A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1312625C (en) * 2004-07-02 2007-04-25 清华大学 Character extracting method from complecate background color image based on run-length adjacent map
CN101615244A (en) * 2008-06-26 2009-12-30 上海梅山钢铁股份有限公司 Handwritten plate blank numbers automatic identifying method and recognition device
US20110058028A1 (en) * 2009-09-09 2011-03-10 Sony Corporation Information processing apparatus, information processing method, and information processing program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
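China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) 20090415 Di Guangmin (狄光敏) Research on license plate location and recognition methods, No. 4 *
Microcomputer Information (《微计算机信息》) 20071231 Ren Minhong (任民宏) Handwritten character recognition technology based on vector feature coding, Vol. 24, No. 5-3 *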

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103503002A (en) * 2011-05-04 2014-01-08 联邦印刷有限公司 Method and device for identifying character
CN103503002B (en) * 2011-05-04 2018-01-19 联邦印刷有限公司 Method and apparatus for distinguished symbol
CN103020634A (en) * 2011-09-26 2013-04-03 北京大学 Segmentation method and device for recognizing identifying codes
CN102831416A (en) * 2012-08-15 2012-12-19 广州广电运通金融电子股份有限公司 Character identification method and relevant device
CN103593329B (en) * 2012-08-17 2018-03-13 腾讯科技(深圳)有限公司 A kind of text image rearrangement method and system
CN103593329A (en) * 2012-08-17 2014-02-19 腾讯科技(深圳)有限公司 Text image rearrangement method and system
CN104899586A (en) * 2014-03-03 2015-09-09 阿里巴巴集团控股有限公司 Method for recognizing character contents included in image and device thereof
CN104899586B (en) * 2014-03-03 2018-10-12 阿里巴巴集团控股有限公司 Method and device is identified to the word content for including in image
CN107045632A (en) * 2015-10-29 2017-08-15 尼尔森(美国)有限公司 Method and apparatus for extracting text from imaging files
CN106934846B (en) * 2015-12-29 2020-05-22 深圳先进技术研究院 Cloth image processing method and system
CN106934846A (en) * 2015-12-29 2017-07-07 深圳先进技术研究院 A kind of cloth image processing method and system
CN105701489A (en) * 2016-01-14 2016-06-22 云南大学 Novel digital extraction and identification method and system thereof
CN105701489B (en) * 2016-01-14 2020-03-17 云南大学 Novel digital extraction and identification method and system
CN107093172A (en) * 2016-02-18 2017-08-25 清华大学 character detecting method and system
CN107093172B (en) * 2016-02-18 2020-03-17 清华大学 Character detection method and system
CN106599818A (en) * 2016-12-07 2017-04-26 广州视源电子科技股份有限公司 Method and apparatus for generating handwriting-format document based on picture
CN106599818B (en) * 2016-12-07 2020-10-27 广州视源电子科技股份有限公司 Method and device for generating handwriting format file based on picture
CN108446709A (en) * 2017-02-16 2018-08-24 现代自动车株式会社 Picto-diagram identification device, picto-diagram identifying system and picto-diagram recognition methods
CN108446709B (en) * 2017-02-16 2023-06-02 现代自动车株式会社 Pictogram recognition apparatus, pictogram recognition system, and pictogram recognition method
CN108009538A (en) * 2017-12-22 2018-05-08 大连运明自动化技术有限公司 A kind of automobile engine cylinder-body sequence number intelligent identification Method
CN109447086A (en) * 2018-09-19 2019-03-08 浙江口碑网络技术有限公司 A kind of extracting method and device of picture character color
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device
WO2020224551A1 (en) * 2019-05-08 2020-11-12 中兴通讯股份有限公司 Information compression/decompression methods and apparatuses, and storage medium
CN110490204B (en) * 2019-07-11 2022-07-15 深圳怡化电脑股份有限公司 Image processing method, image processing device and terminal
CN110490204A (en) * 2019-07-11 2019-11-22 深圳怡化电脑股份有限公司 Image processing method, image processing apparatus and terminal
CN111104936A (en) * 2019-11-19 2020-05-05 泰康保险集团股份有限公司 Text image recognition method, device, equipment and storage medium
CN111080554A (en) * 2019-12-20 2020-04-28 成都极米科技股份有限公司 Method and device for enhancing subtitle area in projection content and readable storage medium
CN112528755A (en) * 2020-11-19 2021-03-19 上海至冕伟业科技有限公司 Intelligent identification method for fire-fighting evacuation facilities
CN113117311A (en) * 2021-04-20 2021-07-16 重庆反切码科技有限公司 Novel grab dish machine
CN113808225A (en) * 2021-09-27 2021-12-17 东华理工大学南昌校区 Lossless coding method for image
CN113808225B (en) * 2021-09-27 2023-09-19 东华理工大学南昌校区 Lossless coding method for image
CN116452615A (en) * 2023-06-19 2023-07-18 恒银金融科技股份有限公司 Segmentation method and device for foreground and background of crown word size region
CN116452615B (en) * 2023-06-19 2023-10-03 恒银金融科技股份有限公司 Segmentation method and device for foreground and background of crown word size region

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110727