Summary of the Invention
In view of this, embodiments of the present invention provide a character segmentation method and device that can cut multiple consecutive touching characters into complete single characters, avoid cutting a complete character in half, and thereby improve the accuracy of character recognition to a certain extent.
To achieve the above object, according to a first aspect of the embodiments of the present invention, a character segmentation method is provided.
The character segmentation method of the present invention includes: performing a vertical projection of the character image to be segmented, and searching the projection image for the midpoints of blank regions to serve as pre-segmentation points, thereby obtaining a pre-segmentation point set, where a blank region consists of the points whose vertical projection value is less than a set value;
calculating the mean width of a single character from the total number of characters and the total character width;
traversing the pre-segmentation point set, calculating the interval between each pair of adjacent pre-segmentation points, and determining an actual segmentation point set in combination with the mean width of a single character;
traversing the actual segmentation point set and determining the pixels between adjacent actual segmentation points, thereby obtaining the single-character images segmented from the character image to be segmented.
Optionally, the step of searching the projection image for the midpoints of blank regions to serve as pre-segmentation points, thereby obtaining the pre-segmentation point set, includes: searching the projection image for points whose projection value is less than the set value and recording the abscissa of each such point in order; then, from the abscissas of each two adjacent points, calculating in order the abscissa of their midpoint, to obtain the pre-segmentation point coordinate set blank_point = {b_0, b_1, ..., b_i, ..., b_m}, where m denotes the total number of pre-segmentation points and is less than or equal to the total number of characters N_char, and b_i denotes the abscissa of the i-th pre-segmentation point.
Optionally, the step of calculating the mean width of a single character from the total number of characters and the total character width includes: calculating the mean width of a single character according to the formula mean width of a single character = total character width / total number of characters.
Optionally, the step of traversing the pre-segmentation point set, calculating the interval between each pair of adjacent pre-segmentation points, and determining the actual segmentation point set in combination with the mean width of a single character includes: traversing the pre-segmentation point set blank_point and calculating the interval, denoted interval, between the abscissas of each two adjacent pre-segmentation points, where interval = blank_point[i+1] - blank_point[i], i ∈ [0, m); comparing interval with the mean width W of a single character, determining the abscissas of the actual segmentation points according to a preset recognition rule, and writing the actual segmentation points into the actual segmentation point set segment_point, where b_0 is the abscissa of the first actual segmentation point in the actual segmentation point set.
Optionally, the step of comparing interval with W and determining the actual segmentation points according to the preset recognition rule includes: when first coefficient * W < interval ≤ second coefficient * W, determining that the interval contains one character, i.e. blank_point[i+1] is an actual segmentation point, where the second coefficient is greater than the first coefficient.
Optionally, the step of comparing interval with W and determining the actual segmentation points according to the preset recognition rule includes: when second coefficient * W < interval ≤ third coefficient * W, where the third coefficient is greater than the second coefficient, determining that the interval contains 3 touching characters, calculating the mean width w of the characters in the interval, and, with blank_point[i] as the starting point start, determining the actual segmentation points as follows. Step A: calculate the abscissa seg_point of the first touching-character split point in the interval according to the formula seg_point = start + w; then, centered on this first split point, extend a first preset number of pixels to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual segmentation point in the interval, and update seg_point with the abscissa of that first actual segmentation point. Step B: taking the seg_point value updated in step A as the starting point start, repeat step A to obtain the second actual segmentation point in the interval. Step C: take blank_point[i+1] as the last actual segmentation point in the interval.
Optionally, the step of comparing interval with W and determining the actual segmentation points according to the preset recognition rule includes: when interval > third coefficient * W, determining that the interval contains more than 3 touching characters, and determining the actual segmentation points of the interval by the following steps. Step a: shrink the interval inward by a second preset number of pixels at each end. Step b: search the shrunken interval for the point with the minimum vertical projection value, take that point as the touching-character split point of the interval, and divide the interval into two subintervals at that point. Step c: calculate the width of each subinterval and, by comparing it with the mean width of a single character, determine the actual segmentation points of each subinterval according to the preset recognition rule.
Optionally, the step of traversing the actual segmentation point set and determining the pixels between adjacent actual segmentation points includes: determining the abscissas of adjacent actual segmentation points; and taking the pixels whose abscissa lies between the abscissas of the adjacent actual segmentation points as the pixels between those actual segmentation points.
Optionally, the character image to be segmented is a binarized image containing only the characters to be segmented.
According to a second aspect of the embodiments of the present invention, a character segmentation device is provided.
The character segmentation device of the present invention includes: a projection module, configured to perform a vertical projection of the character image to be segmented and to search the projection image for the midpoints of blank regions to serve as pre-segmentation points, thereby obtaining a pre-segmentation point set, where a blank region consists of the points whose vertical projection value is less than a set value; a calculation module, configured to calculate the mean width of a single character from the total number of characters and the total character width; a determining module, configured to traverse the pre-segmentation point set, calculate the interval between each pair of adjacent pre-segmentation points, and determine an actual segmentation point set in combination with the mean width of a single character; and a character determining module, configured to traverse the actual segmentation point set and determine the pixels between adjacent actual segmentation points, thereby obtaining the single-character images segmented from the character image to be segmented.
Optionally, the projection module is further configured to: search the projection image for points whose projection value is less than the set value and record the abscissa of each such point in order; and, from the abscissas of each two adjacent points, calculate in order the abscissa of their midpoint, to obtain the pre-segmentation point coordinate set blank_point = {b_0, b_1, ..., b_i, ..., b_m}, where m denotes the total number of pre-segmentation points and is less than or equal to the total number of characters N_char, and b_i denotes the abscissa of the i-th pre-segmentation point.
Optionally, the calculation module is further configured to calculate the mean width of a single character according to the formula mean width of a single character = total character width / total number of characters.
Optionally, the determining module is further configured to: traverse the pre-segmentation point set blank_point and calculate the interval, denoted interval, between the abscissas of each two adjacent pre-segmentation points; compare interval with the mean width W of a single character, determine the abscissas of the actual segmentation points according to a preset recognition rule, and write the actual segmentation points into the actual segmentation point set segment_point, where b_0 is the abscissa of the first actual segmentation point in the actual segmentation point set.
Optionally, the determining module is further configured to: when first coefficient * W < interval ≤ second coefficient * W, determine that the interval contains one character, i.e. blank_point[i+1] is an actual segmentation point.
Optionally, the determining module is further configured to: when second coefficient * W < interval ≤ third coefficient * W, determine that the interval contains 3 touching characters, calculate the mean width w of the characters in the interval, and, with blank_point[i] as the starting point start, determine the actual segmentation points as follows. Step A: calculate the abscissa seg_point of the first touching-character split point in the interval according to the formula seg_point = start + w; then, centered on this first split point, extend a first preset number of pixels to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual segmentation point in the interval, and update seg_point with the abscissa of that first actual segmentation point. Step B: taking the seg_point value updated in step A as the starting point start, repeat step A to obtain the second actual segmentation point in the interval. Step C: take blank_point[i+1] as the last actual segmentation point in the interval.
Optionally, the determining module is further configured to: when interval > third coefficient * W, determine that the interval contains more than 3 touching characters, and determine the actual segmentation points of the interval by the following steps. Step a: shrink the interval inward by a second preset number of pixels at each end. Step b: search the shrunken interval for the point with the minimum vertical projection value, take that point as the touching-character split point of the interval, and divide the interval into two subintervals at that point. Step c: calculate the width of each subinterval and, by comparing it with the mean width of a single character, determine the actual segmentation points of each subinterval according to the preset recognition rule.
Optionally, the character determining module is further configured to: determine the abscissas of adjacent actual segmentation points; and take the pixels whose abscissa lies between the abscissas of the adjacent actual segmentation points as the pixels between those actual segmentation points.
Optionally, the character image to be segmented is a binarized image containing only the characters to be segmented.
According to a third aspect of the embodiments of the present invention, an electronic device is provided.
The electronic device of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the character segmentation method provided by the present invention.
According to a fourth aspect of the embodiments of the present invention, a computer-readable medium is provided.
The computer-readable medium of the present invention stores a computer program which, when executed by a processor, implements the character segmentation method provided by the present invention.
An embodiment of the above invention has the following advantages or beneficial effects: where multiple characters touch one another consecutively, different segmentation rules are applied according to the number of touching characters, so that multiple consecutive touching characters can be effectively cut into complete single characters. This avoids cutting a complete character in half and improves the recognition accuracy of characters to a certain extent.
Further effects of the above non-conventional optional implementations are described below in conjunction with the specific embodiments.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
Fig. 5 is a schematic diagram of a character segmentation method according to an embodiment of the present invention. As shown in Fig. 5, the character segmentation method of the embodiment includes the following steps S50 to S53.
Step S50: Perform a vertical projection of the character image to be segmented, and search the projection image for the midpoints of blank regions to serve as pre-segmentation points, thereby obtaining a pre-segmentation point set. In this step, the character image to be segmented is a binary image containing only the characters to be segmented. A blank region consists of the points whose vertical projection value is less than a set value. For example, the set value here is 2; that is, if the vertical projection value of a column is less than 2 pixels, that column is considered to contain no character information. Note that in this step the first pre-segmentation point is predefined as the coordinate of the blank position preceding the first character; if there is no such blank position, the coordinate of the first pre-segmentation point is set to 0. To find the midpoints of the blank regions, the projection image is first searched for points whose projection value is less than the set value (that is, points whose vertical projection value is less than 2 pixels), and the abscissa of each such point is recorded in order. Then, from the abscissas of each two adjacent points, the abscissa of their midpoint is calculated in order, to obtain the pre-segmentation point coordinate set blank_point = {b_0, b_1, ..., b_i, ..., b_m}, where m denotes the total number of pre-segmentation points and is less than or equal to the total number of characters N_char, and b_i denotes the abscissa of the i-th pre-segmentation point.
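Step S50 can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation. It assumes the image is a row-major list of lists with foreground pixels equal to 0, interprets "midpoint of a blank region" as the midpoint of each run of consecutive blank columns, and uses 0 as the first point when the image does not start with a blank run. The function names `vertical_projection` and `pre_split_points` are hypothetical.

```python
def vertical_projection(binary, foreground=0):
    """Count foreground pixels in each column of a row-major binary image."""
    if not binary:
        return []
    width = len(binary[0])
    return [sum(1 for row in binary if row[x] == foreground) for x in range(width)]


def pre_split_points(projection, blank_threshold=2):
    """Return the midpoints of runs of blank columns as pre-segmentation points.

    A column is blank when its projection value is below blank_threshold
    (2 in the embodiment). Treating each run of blank columns as one blank
    region is an assumption of this sketch.
    """
    points = []
    run_start = None
    for x, value in enumerate(projection):
        if value < blank_threshold:
            if run_start is None:
                run_start = x  # a new blank run begins here
        elif run_start is not None:
            points.append((run_start + x - 1) // 2)  # midpoint of the finished run
            run_start = None
    if run_start is not None:  # blank run that reaches the right edge
        points.append((run_start + len(projection) - 1) // 2)
    # No blank before the first character: the first point is set to 0.
    if not points or projection[0] >= blank_threshold:
        points.insert(0, 0)
    return points
```

For a projection such as `[5, 5, 0, 0, 0, 4]`, the image starts with a character, so 0 is prepended and the single blank run contributes its midpoint.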
Step S51: Calculate the mean width of a single character from the total number of characters and the total character width. In this step, the mean width of a single character is calculated according to the formula mean width of a single character = total character width / total number of characters.
Step S52: Traverse the pre-segmentation point set, calculate the interval between each pair of adjacent pre-segmentation points, and determine the actual segmentation point set in combination with the mean width of a single character. In this step, the pre-segmentation point set blank_point is first traversed, and the interval, denoted interval, between the abscissas of each two adjacent pre-segmentation points is calculated;
where interval = blank_point[i+1] - blank_point[i], i ∈ [0, m);
interval is then compared with the mean width W of a single character, the abscissas of the actual segmentation points are determined according to a preset recognition rule, and the actual segmentation points are written into the actual segmentation point set segment_point, where b_0 is the abscissa of the first actual segmentation point in the actual segmentation point set.
The relationship between interval and W falls into the following three cases:
When first coefficient * W < interval ≤ second coefficient * W, it is determined that the interval contains one character, i.e. blank_point[i+1] is an actual segmentation point, where the second coefficient is greater than the first coefficient.
When second coefficient * W < interval ≤ third coefficient * W, where the third coefficient is greater than the second coefficient, it is determined that the interval contains 3 touching characters. The mean width w of the characters in the interval is calculated and, with blank_point[i] as the starting point start, the actual segmentation points are determined as follows:
Step A: Calculate the abscissa seg_point of the first touching-character split point in the interval according to the formula seg_point = start + w; then, centered on this first split point, extend a first preset number of pixels to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual segmentation point in the interval, and update seg_point with the abscissa of that first actual segmentation point;
Step B: Taking the seg_point value updated in step A as the starting point start, repeat step A to obtain the second actual segmentation point in the interval;
Step C: Take blank_point[i+1] as the last actual segmentation point in the interval.
When interval > third coefficient * W, it is determined that the interval contains more than 3 touching characters, and the actual segmentation points of the interval are determined by the following steps:
Step a: Shrink the interval inward by a second preset number of pixels at each end;
Step b: Search the shrunken interval for the point with the minimum vertical projection value, take that point as the touching-character split point of the interval, and divide the interval into two subintervals at that point;
Step c: Calculate the width of each subinterval and, by comparing it with the mean width of a single character, determine the actual segmentation points of each subinterval according to the preset recognition rule.
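The three cases above amount to classifying each interval by its ratio to W. The sketch below is illustrative only. The default coefficients 0.6, 1.2 and 3.2 are the values the embodiment gives later in the description. The function names, the returned labels and the handling of intervals at or below first coefficient * W (a case the text does not cover) are assumptions of this sketch.

```python
def adjacent_intervals(blank_point):
    """interval = blank_point[i+1] - blank_point[i] for each adjacent pair."""
    return [right - left for left, right in zip(blank_point, blank_point[1:])]


def classify_interval(interval, mean_width, c1=0.6, c2=1.2, c3=3.2):
    """Map an interval width to the rule that applies to it.

    c1, c2 and c3 are the first, second and third coefficients.
    """
    if c1 * mean_width < interval <= c2 * mean_width:
        return "single"           # one character: blank_point[i+1] is a cut point
    if c2 * mean_width < interval <= c3 * mean_width:
        return "three_touching"   # handled by steps A to C
    if interval > c3 * mean_width:
        return "more_than_three"  # handled by steps a to c
    return "too_narrow"           # not covered by the source text (assumption)
```

With W = 10, for instance, intervals of 11, 25 and 44 pixels fall into the first, second and third case respectively.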
Step S53: Traverse the actual segmentation point set and determine the pixels between adjacent actual segmentation points, thereby obtaining the single-character images segmented from the character image to be segmented. In this step, the abscissas of adjacent actual segmentation points are first determined; the pixels whose abscissa lies between the abscissas of the adjacent actual segmentation points are then taken as the pixels between those actual segmentation points; finally, the single-character image is obtained from the pixels between the adjacent actual segmentation points.
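Step S53 is essentially a column slice per pair of adjacent segmentation points. The following sketch assumes a row-major list-of-lists image and treats each pair as a half-open column range [left, right); the source does not say which endpoint is included, so that choice and the name `crop_characters` are assumptions.

```python
def crop_characters(binary, segment_point):
    """Slice the image into one sub-image per pair of adjacent cut points."""
    return [
        [row[left:right] for row in binary]
        for left, right in zip(segment_point, segment_point[1:])
    ]
```

Given n + 1 segmentation points, the function returns n single-character images, each with the same height as the input.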
The technical solution of the present invention is described in detail below, taking recognition of the policy number on an insurance policy as an example. In this embodiment, the image to be processed is first preprocessed, and a binarized image containing only the characters to be segmented is obtained through various image processing methods. The preprocessing proceeds as follows:
On an insurance policy, the region occupied by the policy number is very small relative to the whole policy, and its position is relatively fixed. To improve the efficiency of subsequent processing and avoid wasting computing resources, the input policy image is first converted to a grayscale image and a region is cropped from it. In this embodiment, the policy number is assumed to be located in the upper right corner of the policy image. Taking the upper right corner of the policy as the starting point, a region in the upper right corner is cropped whose width is no more than 1/2 of the full image width and whose height is no more than 1/6 of the full image height. As shown in Fig. 6, this region is defined as the preprocessing region.
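The crop described above can be sketched as follows, assuming a row-major list-of-lists grayscale image. Integer division is used so that the region never exceeds 1/2 of the width or 1/6 of the height. The function name and the divisor parameters are hypothetical.

```python
def crop_top_right(image, width_divisor=2, height_divisor=6):
    """Return the upper right region: at most 1/2 of the width, 1/6 of the height."""
    height = len(image)
    width = len(image[0]) if height else 0
    roi_height = height // height_divisor
    roi_width = width // width_divisor
    # Keep the first roi_height rows and the last roi_width columns.
    return [row[width - roi_width:] for row in image[:roi_height]]
```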
Adaptive binarization is applied to the preprocessing region of Fig. 6 using a local adaptive threshold method with a local neighborhood block of 35*35 pixels. After binarization, pixels above the threshold are set to 255 and pixels below the threshold are set to 0. Owing to the image characteristics of the policy, the background region usually has pixel value 255 after binarization and the foreground has pixel value 0, as shown in Fig. 7.
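The source does not say how the local threshold is derived. As one possible reading, the sketch below uses the mean of the neighborhood block minus a small constant, computed with an integral image so that each block sum costs constant time. The mean-based threshold, the `offset` parameter and the function name are assumptions of this sketch and are not taken from the patent.

```python
def adaptive_binarize(gray, block=35, offset=10):
    """Set a pixel to 255 if it exceeds (local mean - offset), otherwise to 0."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    radius = block // 2
    # Integral image with one extra row and column of zeros.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    out = []
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            count = (y1 - y0) * (x1 - x0)
            # Compare pixel * count with (mean - offset) * count to stay in integers.
            out_row.append(255 if gray[y][x] * count > total - offset * count else 0)
        out.append(out_row)
    return out
```

Blocks are clipped at the image border, so edge pixels are compared against the mean of a smaller neighborhood.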
The binarized preprocessing region (i.e. Fig. 7) is then eroded in the horizontal direction. Because the character spacing of the policy number is very small, the erosion template is no wider than 3 pixels. This processing not only removes some of the scattered noise in Fig. 7 but also strengthens the contour information of the foreground characters to a certain extent, improving the accuracy of the subsequent text line positioning.
A Sobel edge detection operation is then performed on the eroded binarized image; the result is shown in Fig. 8. In the edge-detected image, the pixels corresponding to a character consist only of its outermost contour. This reduces the projection error caused by differences in font and stroke thickness, and ensures as far as possible that the projection value depends on the size and number of characters, which makes it easier to locate characters from their relative positions in the projection image.
A horizontal projection of the edge-detected image is then computed; the projection is shown in Fig. 9. In the technical solution of this embodiment, the position of the two lines of explanatory text below the policy number is fixed relative to the policy number, and the two lines have similar character height and character count, so their horizontal projection features are nearly identical. It is therefore more reliable to first locate the explanatory text line adjacent to the policy number. In a specific embodiment of the present invention, the upper boundary of the explanatory text line adjacent to the policy number can be located by the following steps:
Step 1): The maximum horizontal projection value proj_max is obtained from the horizontal projection values horiz_proj, and a threshold thred is set as the criterion for extracting explanatory text lines:
thred = proj_max * 0.6
Step 2): horiz_proj is traversed row by row, comparing each horizontal projection value with the threshold. Consecutive rows whose horizontal projection values are all greater than the threshold are grouped into a segment list sublist, which stores the row coordinates of those consecutive rows: sublist = [row_i, row_{i+1}, ..., row_{i+n}], where i is the starting row coordinate of the consecutive rows and n is the total number of consecutive rows. Given the number of rows occupied by the explanatory text, a segment list sublist extracted with thred will not have a length less than 10; a segment list that does not satisfy this condition is therefore discarded without further processing. After the traversal, a set of segment lists seg_list = [sublist_1, sublist_2, ..., sublist_m] is obtained, where m is the number of segment lists. Note that because rows are processed in top-to-bottom order, the first value in sublist_2 is greater than the last value in sublist_1.
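Steps 1) and 2) can be sketched as a single pass over horiz_proj. The ratio 0.6 and the minimum length 10 come from the text. The function name `extract_row_runs` and the parameter names are hypothetical, and horiz_proj is assumed to be a plain list of per-row projection values.

```python
def extract_row_runs(horiz_proj, ratio=0.6, min_length=10):
    """Group consecutive rows whose projection exceeds thred into sublists."""
    if not horiz_proj:
        return []
    thred = max(horiz_proj) * ratio  # thred = proj_max * 0.6
    seg_list = []
    current = []
    for row, value in enumerate(horiz_proj):
        if value > thred:
            current.append(row)
        else:
            if len(current) >= min_length:  # shorter runs are discarded
                seg_list.append(current)
            current = []
    if len(current) >= min_length:  # run that reaches the last row
        seg_list.append(current)
    return seg_list
```

Because rows are visited from top to bottom, the sublists come out ordered by row coordinate, as the text notes.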
Step 3): The segment lists obtained in step 2) are screened by calculation to obtain the upper boundary position coboundary_row of the explanatory text line adjacent to the policy number. The specific operations are as follows:
a: Calculate the maximum horizontal projection value within each segment list sublist and the row coordinate at which that maximum occurs;
b: Traverse seg_list. Because the two lines of explanatory text are close together, this embodiment requires the spacing between the two sublists representing the explanatory text to be no more than 15. When two adjacent sublists satisfy this condition, calculate the mean and variance of the maximum values of the two sublists;
c: Select the two adjacent sublists sublist_i and sublist_{i+1} with the smallest variance as the position of the explanatory text lines; the first element of sublist_i is then the row coordinate coboundary_row of the upper boundary of the explanatory text line adjacent to the policy number.
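Operations a to c can be sketched as below. The source does not define how the spacing between two sublists is measured; this sketch assumes it is the first row of the lower sublist minus the last row of the upper one. It also reads "maximum value of a sublist" as the largest horizontal projection value over its rows. Both readings, and the name `find_comment_top`, are assumptions.

```python
def find_comment_top(horiz_proj, seg_list, max_gap=15):
    """Return coboundary_row, or None when no adjacent pair is close enough."""
    best = None  # (variance, first row of the upper sublist)
    for upper, lower in zip(seg_list, seg_list[1:]):
        if lower[0] - upper[-1] > max_gap:
            continue  # too far apart to be the two explanatory lines
        peaks = [max(horiz_proj[r] for r in upper),
                 max(horiz_proj[r] for r in lower)]
        mean = sum(peaks) / 2
        variance = sum((p - mean) ** 2 for p in peaks) / 2
        if best is None or variance < best[0]:
            best = (variance, upper[0])
    return None if best is None else best[1]
```

The pair with the most similar peak values wins, which matches the observation that the two explanatory lines have nearly identical horizontal projection features.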
Because there is no obvious adhesion between the explanatory text and the policy number line, horiz_proj is scanned upward starting from the upper boundary coboundary_row. The midpoint of the first interval in which the horizontal projection value is 0 is taken as the lower boundary of the policy number, and the midpoint of the second such interval is taken as its upper boundary. The policy number text line extracted according to these upper and lower boundary coordinates is shown in Fig. 10.
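The upward scan can be sketched as follows, assuming horiz_proj is a list indexed by row and that scanning starts at the row just above coboundary_row. The function name `locate_number_rows` and the (upper, lower) return order are choices made for this sketch.

```python
def locate_number_rows(horiz_proj, coboundary_row):
    """Scan upward and return (upper, lower) boundary rows of the policy number.

    Each boundary is the midpoint of a run of rows whose projection is 0.
    Returns None if fewer than two such runs are found.
    """
    midpoints = []
    row = coboundary_row - 1
    while row >= 0 and len(midpoints) < 2:
        if horiz_proj[row] == 0:
            bottom = row
            while row >= 0 and horiz_proj[row] == 0:
                row -= 1  # walk to the top of this zero run
            top = row + 1
            midpoints.append((top + bottom) // 2)
        else:
            row -= 1
    if len(midpoints) < 2:
        return None
    lower, upper = midpoints  # the first run found is the lower boundary
    return upper, lower
```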
Finally, a vertical projection is computed for the policy number text line image in Fig. 10, as shown in Fig. 11. Because the label "Insurance policy number:" and the policy number characters have an obvious and fixed positional relationship, the fact that the projection value is 0 at the dashed-box position in the figure can be used to separate the label characters from the policy number characters and to locate the left and right boundaries of the policy number. At this point the policy number is precisely located, as shown in Fig. 12.
After the policy number image shown in Fig. 12 is obtained, it can be seen from Fig. 12 that some characters are stuck together. A vertical projection of the policy number shown in Fig. 12 is computed (the projection is shown in Fig. 13), and the midpoints of the blank regions are taken as pre-segmentation points. Here, blank means that the vertical projection value is no higher than 2 pixels, in which case the column is considered to contain no character information. Note that the first pre-segmentation point is predefined as the coordinate of the blank position preceding the first character; if there is no such blank position, the coordinate of the first pre-segmentation point is set to 0.
The midpoint of each blank region is calculated from the vertical projection to obtain the pre-segmentation point coordinate set blank_point = {b_0, b_1, ..., b_i, ..., b_m}, where m denotes the total number of pre-segmentation points and is less than or equal to the total number of characters N_char, and b_i denotes the coordinate of the i-th pre-segmentation point. The interval between two adjacent pre-segmentation points can be expressed as interval = blank_point[i+1] - blank_point[i], i ∈ [0, m).
As shown in Fig. 13, the interval between two adjacent pre-segmentation points may contain a single non-touching policy number character, or it may contain several touching policy number characters. The mean width W of a single character is obtained from the total number of characters and the total width of the policy number. By comparing the interval between two adjacent pre-segmentation points with the mean width of a single character, it can be determined whether the interval contains touching characters:
When first coefficient * W < interval ≤ second coefficient * W, it is determined that the interval contains one character, i.e. blank_point[i+1] is an actual segmentation point, where the second coefficient is greater than the first coefficient. In this embodiment, the first coefficient is 0.6 and the second coefficient is 1.2; when 0.6*W < interval ≤ 1.2*W, the interval is determined to contain only one character, so blank_point[i+1] is an actual segmentation point.
When second coefficient * W < interval ≤ third coefficient * W, where the third coefficient is set to 3.2, that is, when 1.2*W < interval ≤ 3.2*W, it is determined that the interval contains 3 touching characters. The mean width w of the characters in the interval is calculated and, with blank_point[i] as the starting point start, the actual segmentation points are determined as follows:
Step A: Calculate the abscissa seg_point of the first touching-character split point in the interval according to the formula seg_point = start + w. Because character widths are not exactly equal, this equal-division point is not necessarily the actual segmentation point. Therefore, centered on this first split point, extend a first preset number of pixels (set to 3 in this embodiment) to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual segmentation point in the interval, and update seg_point with the abscissa of that first actual segmentation point;
Step B: Taking the seg_point value updated in step A as the starting point start, repeat step A to obtain the second actual segmentation point in the interval;
Step C: Take blank_point[i+1] as the last actual segmentation point in the interval.
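Steps A to C can be sketched as follows. The source says only that the mean width w of the characters in the interval is calculated; this sketch assumes w = (right - left) / 3 because the interval is taken to hold 3 characters. The search radius of 3 pixels is the embodiment's value. The function name `split_three`, the rounding of start + w and the clamping of the search window to the inside of the interval are assumptions.

```python
def split_three(projection, left, right, search_radius=3):
    """Return the three actual cut points for an interval holding 3 characters.

    left and right are blank_point[i] and blank_point[i+1].
    """
    if right - left < 2:
        raise ValueError("interval too narrow to split")
    w = (right - left) / 3  # assumed mean character width in this interval
    cuts = []
    start = left
    for _ in range(2):  # step A, then step B repeats it
        guess = int(round(start + w))  # seg_point = start + w
        hi = min(right - 1, guess + search_radius)
        lo = min(max(left + 1, guess - search_radius), hi)
        # Refine the guess to the column with the lowest projection nearby.
        seg_point = min(range(lo, hi + 1), key=lambda x: projection[x])
        cuts.append(seg_point)
        start = seg_point  # the updated seg_point becomes the new start
    cuts.append(right)  # step C: blank_point[i+1] closes the interval
    return cuts
```

Measuring the second estimate from the refined first cut, rather than from the equal-division point, keeps an early error from shifting the next cut.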
When interval > 3.2*W, it is determined that the interval contains more than 3 touching characters, and the actual segmentation points of the interval are determined by the following steps:
Step a: Shrink the interval inward by a second preset number of pixels at each end;
Step b: Search the shrunken interval for the point with the minimum vertical projection value, take that point as the touching-character split point of the interval, and divide the interval into two subintervals at that point;
Step c: Calculate the width of each subinterval and, by comparing it with the mean width of a single character, determine the actual segmentation points of each subinterval according to the preset recognition rule.
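Steps a and b can be sketched as below. The source does not give a value for the second preset number, so the default `shrink=5` is purely illustrative, and the function name `split_wide` is hypothetical. Step c then applies the same three interval rules to each returned subinterval, which makes the overall procedure recursive.

```python
def split_wide(projection, left, right, shrink=5):
    """Split an interval with more than 3 touching characters into two parts.

    The ends are shrunk so that the low projection values next to the
    interval boundaries are not picked as the split point.
    """
    lo = left + shrink
    hi = right - shrink
    if lo > hi:
        raise ValueError("interval too narrow for the requested shrink")
    # Step b: column with the minimum vertical projection in the shrunken range.
    cut = min(range(lo, hi + 1), key=lambda x: projection[x])
    return (left, cut), (cut, right)
```

In the check below, the low value near the left edge lies inside the shrunken margin and is skipped, so the interior valley is selected instead.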
Once all actual segmentation points have been found through the above steps, the actual segmentation point coordinate set segment_point is traversed and the pixels within each segmentation interval are collected, yielding the segmented single-character images shown in Fig. 14.
Finally, as in the prior art, a character recognition model can be obtained by training a convolutional neural network on training samples. In the prediction stage, a character image is input and the recognition model outputs the character with the highest recognition probability. The recognition model in the embodiments of the present invention includes, but is not limited to, a convolutional neural network; common supervised classifiers such as KNN and SVM may also be used.
Fig. 15 is a schematic diagram of a character segmentation device according to an embodiment of the present invention. As shown in Fig. 15, the character segmentation device 150 of the embodiment mainly includes a projection module 151, a calculation module 152, a determining module 153 and a character determining module 154. The projection module 151 is configured to perform a vertical projection of the character image to be segmented and to search the projection image for the midpoints of blank regions to serve as pre-segmentation points, thereby obtaining a pre-segmentation point set, where a blank region consists of the points whose vertical projection value is less than a set value. The calculation module 152 is configured to calculate the mean width of a single character from the total number of characters and the total character width. The determining module 153 is configured to traverse the pre-segmentation point set, calculate the interval between each pair of adjacent pre-segmentation points, and determine an actual segmentation point set in combination with the mean width of a single character. The character determining module 154 is configured to traverse the actual segmentation point set and determine the pixels between adjacent actual segmentation points, thereby obtaining the single-character images segmented from the character image to be segmented. The character image to be segmented is a binarized image containing only the characters to be segmented.
The projection module 151 of the character segmentation device 150 may further be configured to: search the
projected image for points whose projection value is less than the set value, and record the abscissa of each such
point in order; and, from the abscissas of each two adjacent points, calculate in turn the abscissa of their midpoint,
so as to obtain the pre-segmentation point coordinate set blank_point = {b0, b1, ..., bi, ..., bm}; wherein m denotes
the total number of pre-segmentation points, which is less than or equal to the total number of characters Nchar,
and bi denotes the abscissa of the i-th pre-segmentation point.
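The sketch below shows one way the projection and midpoint search could be written. It is one reading of the
text, stated as an assumption: consecutive columns whose projection value is below the set value are grouped
into a blank run, and the midpoint of each run is taken as a pre-segmentation point. The function names, the
convention that foreground pixels equal 1, and the integer rounding of midpoints are likewise assumptions.

```python
def vertical_projection(image):
    """Sum each column of a binarized image (foreground pixels assumed to be 1)."""
    return [sum(column) for column in zip(*image)]


def pre_segmentation_points(projection, threshold):
    """Return the midpoint column of every run of columns below `threshold`."""
    points = []
    run_start = None
    for x, value in enumerate(projection):
        if value < threshold:
            if run_start is None:
                run_start = x          # a blank run begins here
        elif run_start is not None:
            # The run covered columns run_start .. x-1; keep its midpoint.
            points.append((run_start + x - 1) // 2)
            run_start = None
    if run_start is not None:
        # A blank run that reaches the right edge of the image.
        points.append((run_start + len(projection) - 1) // 2)
    return points
```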
The computing module 152 of the character segmentation device 150 may further be configured to calculate the average
width of a single character according to the following formula: average width of a single character = total
character width / total number of characters.
The determining module 153 of the character segmentation device 150 may further be configured to: traverse the
pre-segmentation point set blank_point and calculate the interval, denoted interval, between the abscissas of each
two adjacent pre-segmentation points; and compare interval with the average width W of a single character, determine
the abscissa of each actual segmentation point according to a preset recognition rule, and write the actual
segmentation point into the actual segmentation point set segment_point; wherein b0 is the abscissa of the first
actual segmentation point in the actual segmentation point set.
The determining module 153 of the character segmentation device 150 may further be configured to: when
first coefficient * W < interval ≤ second coefficient * W, determine that the interval contains one character,
i.e. blank_point[i+1] is an actual segmentation point; wherein the second coefficient is greater than the first
coefficient.
The determining module 153 of the character segmentation device 150 may further be configured to: when
second coefficient * W < interval ≤ third coefficient * W, wherein the third coefficient is greater than the second
coefficient, determine that the interval contains 3 touching characters, calculate the average width w of the
characters in the interval, take blank_point[i] as the starting point start, and determine the actual segmentation
points as follows:
Step A: Calculate the abscissa seg_point of the first touching-character segmentation point in the interval according
to the formula seg_point = start + w; then, centered on this first touching-character segmentation point, extend a
first predetermined number of pixels to the left and to the right, search the extended range for the point with the
minimum projection value, take that point as the first actual segmentation point in the interval, and update
seg_point to the abscissa of that first actual segmentation point;
Step B: Take the seg_point value updated in step A as the starting point start and repeat step A to obtain the
second actual segmentation point in the interval;
Step C: Take blank_point[i+1] as the last actual segmentation point in the interval.
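Steps A to C can be sketched as a short loop. The sketch assumes that the average width w is the interval width
divided by the number of touching characters, that the estimated position is rounded to the nearest column, and
that the search window is clamped to the interval; none of these details is stated in the description, and the
function name split_touching is hypothetical.

```python
def split_touching(projection, start, end, n_chars, radius):
    """Locate cut points inside an interval holding `n_chars` touching characters.

    start, end: blank_point[i] and blank_point[i+1].
    radius: the "first predetermined number" of pixels searched on each side.
    Returns the actual segmentation points found, ending with `end`.
    """
    w = (end - start) / n_chars          # average width within the interval (assumed)
    cuts = []
    current = start
    for _ in range(n_chars - 1):
        guess = int(round(current + w))  # Step A: seg_point = start + w
        lo = max(start, guess - radius)
        hi = min(end, guess + radius)
        # Refine the estimate to the column with the smallest projection value.
        current = min(range(lo, hi + 1), key=lambda x: projection[x])
        cuts.append(current)             # Step B reuses it as the next start
    cuts.append(end)                     # Step C: blank_point[i+1] closes the interval
    return cuts
```

Restarting each estimate from the refined point, rather than from fixed multiples of w, keeps a small error in one
cut from shifting the estimates for the cuts that follow.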
The determining module 153 of the character segmentation device 150 may further be configured to: when
interval > third coefficient * W, determine that the interval contains more than 3 touching characters, and determine
the actual segmentation points of the interval according to the following steps: Step a: shrink the interval inward
by a second predetermined number of pixels at both its front and rear ends; Step b: find the point with the minimum
vertical projection value within the shrunken interval, take that point as the touching-character segmentation point
of the interval, and divide the interval into two subintervals at that point; Step c: calculate the width of each
subinterval and, by comparing the subinterval width with the average width of a single character, determine the
actual segmentation points of each subinterval according to the preset recognition rule.
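The three threshold tests applied by the determining module can be summarized in a small dispatch function. This
is only a sketch: the function name, the returned labels, and the handling of intervals at or below
first coefficient * W (returned here as "unspecified" because this passage does not cover that case) are
assumptions, and the coefficients are passed in as parameters rather than fixed.

```python
def classify_interval(interval, W, c1, c2, c3):
    """Map an interval width to the rule that applies to it.

    W: average width of a single character.
    c1 < c2 < c3: the first, second and third coefficients.
    """
    if c1 * W < interval <= c2 * W:
        return "single"        # blank_point[i+1] is an actual segmentation point
    if c2 * W < interval <= c3 * W:
        return "touching"      # handled by steps A to C
    if interval > c3 * W:
        return "wide"          # handled by steps a to c
    return "unspecified"       # interval <= c1 * W: not covered in this passage
```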
The character determining module 154 of the character segmentation device 150 may further be configured to: determine
the abscissas of adjacent actual segmentation points; and take the pixels whose abscissas lie between the abscissas
of the adjacent actual segmentation points as the pixels between those actual segmentation points.
Referring now to Figure 16, there is shown a schematic structural diagram of a computer system 1600 suitable for
implementing a terminal device of the embodiments of the present application. The terminal device shown in Figure 16
is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the
present application.
As shown in Figure 16, the computer system 1600 includes a central processing unit (CPU) 1601, which can perform
various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1602 or a
program loaded from a storage section 1608 into a random access memory (RAM) 1603. The RAM 1603 also stores various
programs and data required for the operation of the system 1600. The CPU 1601, the ROM 1602 and the RAM 1603 are
connected to one another via a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.
The following components are connected to the I/O interface 1605: an input section 1606 including a keyboard, a
mouse and the like; an output section 1607 including a cathode ray tube (CRT), a liquid crystal display (LCD) and
the like, as well as a speaker and the like; a storage section 1608 including a hard disk and the like; and a
communication section 1609 including a network interface card such as a LAN card or a modem. The communication
section 1609 performs communication processing via a network such as the Internet. A drive 1610 is also connected to
the I/O interface 1605 as needed. A removable medium 1611, such as a magnetic disk, an optical disc, a
magneto-optical disk or a semiconductor memory, is mounted on the drive 1610 as needed, so that a computer program
read from it can be installed into the storage section 1608 as needed.
In particular, according to embodiments of the present disclosure, the character segmentation process described above
may be implemented as a computer software program. For example, embodiments of the present disclosure include a
computer program product comprising a computer program carried on a computer-readable medium, the computer program
containing program code for executing the character segmentation method of the technical solution of the present
invention. In such an embodiment, the computer program may be downloaded and installed from a network through the
communication section 1609, and/or installed from the removable medium 1611. When the computer program is executed
by the central processing unit (CPU) 1601, the above-described functions defined in the system of the present
application are performed.
It should be noted that the computer-readable medium described in the present application may be a computer-readable
signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium
may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or
semiconductor system, apparatus or device, or any combination of the above. More specific examples of
computer-readable storage media may include, but are not limited to: an electrical connection having one or more
wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an
erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only
memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In
the present application, a computer-readable storage medium may be any tangible medium that contains or stores a
program, which may be used by or in connection with an instruction execution system, apparatus or device. Also in
the present application, a computer-readable signal medium may include a data signal propagated in baseband or as
part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take
many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination
of the above. A computer-readable signal medium may also be any computer-readable medium other than a
computer-readable storage medium, and that medium can send, propagate or transmit a program for use by or in
connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable
medium may be transmitted over any suitable medium, including but not limited to: wireless, wire, optical cable, RF,
or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations
of possible implementations of systems, methods and computer program products according to various embodiments of
the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program
segment, or a portion of code, and that module, program segment, or portion of code contains one or more executable
instructions for implementing the specified logical function. It should also be noted that, in some alternative
implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings.
For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes
be executed in the reverse order, depending on the functions involved. It should also be noted that each block in
the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented
by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of
dedicated hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware.
The described modules may also be provided in a processor, which may, for example, be described as: a processor
including a projection module, a computing module, a determining module and a character determining module. The
names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for
example, the projection module may also be described as "a module that searches the projected image for the
midpoints of blank areas as pre-segmentation points, so as to obtain a pre-segmentation point set".
In another aspect, the present application further provides a computer-readable medium, which may be included in the
device described in the above embodiments, or may exist separately without being assembled into the device. The
computer-readable medium carries one or more programs which, when executed by the device, cause the device to:
project the character image to be segmented in the vertical direction, and search the projected image for the
midpoints of blank areas as pre-segmentation points, so as to obtain a pre-segmentation point set, a blank area
consisting of points whose vertical projection value is less than a set value; calculate the average width of a
single character from the total number of characters and the total character width; traverse the pre-segmentation
point set, calculate the interval between each two adjacent pre-segmentation points, and determine the actual
segmentation point set in combination with the average width of a single character; and traverse the actual
segmentation point set and determine the pixels between adjacent actual segmentation points, so as to obtain the
single-character images after segmentation of the character image to be segmented.
According to the technical solution of the embodiments of the present invention, where multiple characters are
continuously touching, different segmentation rules are applied according to the number of touching characters, so
that multiple continuously touching characters can be effectively segmented into complete single characters. This
avoids cutting a complete character in half and improves character recognition accuracy to a certain extent.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the
art should understand that, depending on design requirements and other factors, various modifications, combinations,
sub-combinations and substitutions may occur. Any modification, equivalent substitution or improvement made within
the spirit and principles of the present invention shall be included within the scope of protection of the present
invention.