CN108805128A

CN108805128A - Character segmentation method and device

Info

Publication number: CN108805128A
Application number: CN201710312140.6A
Authority: CN
Inventors: 李俊玲
Original assignee: Beijing Jingdong Financial Technology Holding Co Ltd
Current assignee: Beijing Jingdong Financial Technology Holding Co Ltd
Priority date: 2017-05-05
Filing date: 2017-05-05
Publication date: 2018-11-13
Anticipated expiration: 2037-05-05
Also published as: CN108805128B

Abstract

The present invention provides a character segmentation method and device that can split multiple consecutive touching characters into complete single characters, avoid cutting a complete character in half, and improve character recognition accuracy to a certain extent. The character segmentation method of the present invention includes: projecting the character image to be segmented in the vertical direction, and searching the projection image for the midpoints of blank intervals as pre-segmentation points, so as to obtain a pre-segmentation point set, where a blank interval consists of points whose vertical projection value is less than a set value; calculating the average width of a single character from the total number of characters and the total character width; traversing the pre-segmentation point set, calculating the interval between each two adjacent pre-segmentation points, and determining an actual segmentation point set in combination with the average width of a single character; and traversing the actual segmentation point set and determining the pixels between adjacent actual segmentation points, so as to obtain the single-character images segmented from the character image to be segmented.

Description

Character segmentation method and device

Technical field

The present invention relates to the field of computer technology, and in particular to a character segmentation method and device.

Background art

OCR text recognition technology is widely and deeply applied in many fields today, such as scientific research, reading, and information retrieval. With OCR text recognition, texts and reports can be processed in batches, which improves processing efficiency and reduces labor costs. The OCR text recognition flow is broadly divided into: image input, binarization, denoising and cutting, intelligent character recognition, and result output. Denoising and cutting is an important link: useless noise in the image is removed, and the target text to be recognized is cut character by character so that each character forms a separate image, providing usable material for the subsequent intelligent character recognition step. The result of denoising and cutting therefore largely determines the efficiency and accuracy of the whole text recognition process.

In prior-art solutions, denoising and cutting of image text is mostly implemented by projection. The steps are as follows:

1) Binarize the image, converting the color image into a black-and-white image. After binarization, each character in the image can be regarded as a set of black pixels.

2) Project the target character. Taking the digit "1" as an example, the character is placed in a two-dimensional coordinate system with the upper-left corner as the origin and the pixel as the unit, and is projected onto the X axis and the Y axis. The effect is shown in Fig. 1.

3) Similarly, when multiple characters are placed together, as in Fig. 2, the ideal projections are as shown in Fig. 3(a) for the X axis and Fig. 3(b) for the Y axis. The two-dimensional projection image contains several curve segments, each being the projection of one character. Between the segments SumY is 0: the gaps between characters contain no black pixels, so the projection is 0 in those sections. From the X-axis projection image, the horizontal position of each character in the image can thus be determined. Similarly, the position of the characters in the Y direction is obtained from the Y-axis projection. The image is then cut according to these pixel coordinates to obtain the images of the single characters.
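The prior-art procedure in steps 1) to 3) can be sketched as follows. This is a minimal illustration, not code from the patent: the function names `column_projection` and `split_on_zero_columns` are hypothetical, and the image is assumed to be a list of rows in which 1 marks a black (character) pixel.

```python
def column_projection(img):
    """Count black pixels in every column (projection onto the X axis)."""
    return [sum(col) for col in zip(*img)]


def split_on_zero_columns(proj):
    """Return (start, end) column ranges, end exclusive, where proj > 0."""
    spans, start = [], None
    for x, value in enumerate(proj):
        if value > 0 and start is None:
            start = x                      # a character begins
        elif value == 0 and start is not None:
            spans.append((start, x))       # a zero column closes the character
            start = None
    if start is not None:                  # character touching the right edge
        spans.append((start, len(proj)))
    return spans


# Two well-separated characters in a 3 x 7 binary image.
img = [
    [0, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
]
proj = column_projection(img)
spans = split_on_zero_columns(proj)
print(proj)   # [0, 3, 0, 0, 3, 2, 0]
print(spans)  # [(1, 2), (4, 6)]
```

This only works when every pair of characters is separated by at least one all-zero column.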

In the course of realizing the present invention, the inventor found at least the following problem in the prior art: in real business scenarios, denoising and cutting by traditional projection alone often has very low accuracy, because the images in such scenarios are diverse and complex. As shown in Fig. 4, after binarization some characters in the image are touching and cannot be completely separated, so the projected image cannot be split at the zero-value points of SumY.

Summary of the invention

In view of this, embodiments of the present invention provide a character segmentation method and device that can cut multiple consecutive touching characters into complete single characters, avoid cutting a complete character in half, and improve character recognition accuracy to a certain extent.

To achieve the above object, according to a first aspect of the embodiments of the present invention, a character segmentation method is provided.

The character segmentation method of the present invention includes: projecting the character image to be segmented in the vertical direction, and searching the projection image for the midpoints of blank intervals as pre-segmentation points, so as to obtain a pre-segmentation point set, where a blank interval consists of points whose vertical projection value is less than a set value;

calculating the average width of a single character from the total number of characters and the total character width;

traversing the pre-segmentation point set, calculating the interval between each two adjacent pre-segmentation points, and determining an actual segmentation point set in combination with the average width of a single character;

traversing the actual segmentation point set and determining the pixels between adjacent actual segmentation points, so as to obtain the single-character images segmented from the character image to be segmented.

Optionally, the step of searching the projection image for the midpoints of blank intervals as pre-segmentation points, so as to obtain the pre-segmentation point set, includes: searching the projection image for points whose projection value is less than the set value, and recording the abscissa of each such point in order; and, from the abscissas of each two adjacent points, calculating in order the abscissa of their midpoint, so as to obtain the pre-segmentation point coordinate set blank_point={b₀,b₁,...,b_i,...,b_m}, where m denotes the total number of pre-segmentation points, which is less than or equal to the total number of characters N_char, and b_i denotes the abscissa of the i-th pre-segmentation point.

Optionally, the step of calculating the average width of a single character from the total number of characters and the total character width includes calculating it by the following formula: average width of a single character = total character width / total number of characters.

Optionally, the step of traversing the pre-segmentation point set, calculating the interval between each two adjacent pre-segmentation points, and determining the actual segmentation point set in combination with the average width of a single character includes: traversing the pre-segmentation point set blank_point and calculating the interval interval between the abscissas of each two adjacent pre-segmentation points, where interval = blank_point[i+1] - blank_point[i], i ∈ [0, m); and comparing interval with the average width W of a single character, determining the abscissas of the actual segmentation points according to a preset recognition rule, and writing the actual segmentation points into the actual segmentation point set segment_point, where b₀ is the abscissa of the first actual segmentation point in the actual segmentation point set.

Optionally, the step of comparing interval with W and determining the actual segmentation points according to the preset recognition rule includes: when first coefficient * W < interval ≤ second coefficient * W, determining that the interval contains one character, i.e. blank_point[i+1] is an actual segmentation point, where the second coefficient is greater than the first coefficient.

Optionally, the step of comparing interval with W and determining the actual segmentation points according to the preset recognition rule includes: when second coefficient * W < interval ≤ third coefficient * W, where the third coefficient is greater than the second coefficient, determining that the interval contains 3 touching characters, calculating the average width w of the characters within the interval, and, taking blank_point[i] as the starting point start, determining the actual segmentation points as follows. Step A: calculate the abscissa seg_point of the first touching-character segmentation point in the interval as seg_point = start + w; then, centered on this point, extend a first preset number of pixels to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual segmentation point in the interval, and update seg_point to its abscissa. Step B: taking the seg_point value updated in step A as the starting point start, repeat step A to obtain the second actual segmentation point in the interval. Step C: take blank_point[i+1] as the last actual segmentation point in the interval.

Optionally, the step of comparing interval with W and determining the actual segmentation points according to the preset recognition rule includes: when interval > third coefficient * W, determining that the interval contains more than 3 touching characters, and determining the actual segmentation points of the interval as follows. Step a: shrink the interval inward by a second preset number of pixels at each end. Step b: find the point with the minimum vertical projection value within the shrunken interval, take it as the touching-character segmentation point of the interval, and divide the interval into two subintervals at that point. Step c: calculate the width of each subinterval and, by comparing it with the average width of a single character, determine the actual segmentation points of each subinterval according to the preset recognition rule.

Optionally, the step of traversing the actual segmentation point set and determining the pixels between adjacent actual segmentation points includes: determining the abscissas of adjacent actual segmentation points; and taking the pixels whose abscissa lies between the abscissas of adjacent actual segmentation points as the pixels between those actual segmentation points.

Optionally, the character image to be segmented is a binary image that contains only the characters to be segmented.

According to a second aspect of the embodiments of the present invention, a character segmentation device is provided.

The character segmentation device of the present invention includes: a projection module, configured to project the character image to be segmented in the vertical direction and to search the projection image for the midpoints of blank intervals as pre-segmentation points, so as to obtain a pre-segmentation point set, where a blank interval consists of points whose vertical projection value is less than a set value; a calculation module, configured to calculate the average width of a single character from the total number of characters and the total character width; a determination module, configured to traverse the pre-segmentation point set, calculate the interval between each two adjacent pre-segmentation points, and determine an actual segmentation point set in combination with the average width of a single character; and a character determination module, configured to traverse the actual segmentation point set and determine the pixels between adjacent actual segmentation points, so as to obtain the single-character images segmented from the character image to be segmented.

Optionally, the projection module is further configured to: search the projection image for points whose projection value is less than the set value, and record the abscissa of each such point in order; and, from the abscissas of each two adjacent points, calculate in order the abscissa of their midpoint, so as to obtain the pre-segmentation point coordinate set blank_point={b₀,b₁,...,b_i,...,b_m}, where m denotes the total number of pre-segmentation points, which is less than or equal to the total number of characters N_char, and b_i denotes the abscissa of the i-th pre-segmentation point.

Optionally, the calculation module is further configured to calculate the average width of a single character by the following formula: average width of a single character = total character width / total number of characters.

Optionally, the determination module is further configured to: traverse the pre-segmentation point set blank_point and calculate the interval interval between the abscissas of each two adjacent pre-segmentation points; and compare interval with the average width W of a single character, determine the abscissas of the actual segmentation points according to a preset recognition rule, and write the actual segmentation points into the actual segmentation point set segment_point, where b₀ is the abscissa of the first actual segmentation point in the actual segmentation point set.

Optionally, the determination module is further configured to: when first coefficient * W < interval ≤ second coefficient * W, determine that the interval contains one character, i.e. blank_point[i+1] is an actual segmentation point.

Optionally, the determination module is further configured to: when second coefficient * W < interval ≤ third coefficient * W, determine that the interval contains 3 touching characters, calculate the average width w of the characters within the interval, and, taking blank_point[i] as the starting point start, determine the actual segmentation points as follows. Step A: calculate the abscissa seg_point of the first touching-character segmentation point in the interval as seg_point = start + w; then, centered on this point, extend a first preset number of pixels to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual segmentation point in the interval, and update seg_point to its abscissa. Step B: taking the seg_point value updated in step A as the starting point start, repeat step A to obtain the second actual segmentation point in the interval. Step C: take blank_point[i+1] as the last actual segmentation point in the interval.

Optionally, the determination module is further configured to: when interval > third coefficient * W, determine that the interval contains more than 3 touching characters, and determine the actual segmentation points of the interval as follows. Step a: shrink the interval inward by a second preset number of pixels at each end. Step b: find the point with the minimum vertical projection value within the shrunken interval, take it as the touching-character segmentation point of the interval, and divide the interval into two subintervals at that point. Step c: calculate the width of each subinterval and, by comparing it with the average width of a single character, determine the actual segmentation points of each subinterval according to the preset recognition rule.

Optionally, the character determination module is further configured to: determine the abscissas of adjacent actual segmentation points; and take the pixels whose abscissa lies between the abscissas of adjacent actual segmentation points as the pixels between those actual segmentation points.

According to a third aspect of the embodiments of the present invention, an electronic device is provided.

The electronic device of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the character segmentation method provided by the present invention.

According to a fourth aspect of the embodiments of the present invention, a computer-readable medium is provided.

The computer-readable medium of the present invention stores a computer program which, when executed by a processor, implements the character segmentation method provided by the present invention.

An embodiment of the above invention has the following advantage or beneficial effect: for the case where multiple characters are consecutively touching, different segmentation rules are applied depending on the number of touching characters, so that multiple consecutive touching characters can be effectively cut into complete single characters, a complete character is not cut in half, and character recognition accuracy is improved to a certain extent.

Further effects of the above non-conventional optional modes are described below in conjunction with the specific embodiments.

Description of the drawings

The drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:

Fig. 1 is a schematic diagram of the digit 1 projected onto the X axis and the Y axis respectively, where (a) is the projection in the X direction and (b) is the projection in the Y direction;

Fig. 2 is an image of multiple characters;

Fig. 3 is a schematic diagram of multiple characters projected onto the X axis and the Y axis respectively, where (a) is the projection in the X direction and (b) is the projection in the Y direction;

Fig. 4 is a schematic diagram of a character image after binarization;

Fig. 5 is a schematic diagram of a character segmentation method according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of the preprocessing region to be cropped in the insurance policy example;

Fig. 7 is the image of the preprocessing region after binarization;

Fig. 8 is the binary image after erosion and edge detection;

Fig. 9 is the projection of the edge-detected image in the horizontal direction;

Fig. 10 is the image of the extracted policy number text line;

Fig. 11 is the projection of the policy number text line image in the vertical direction;

Fig. 12 is the image of the located policy number;

Fig. 13 is the vertical projection of the policy number image;

Fig. 14 shows the images of the single characters after segmentation;

Fig. 15 is a schematic diagram of a character segmentation device according to an embodiment of the present invention;

Fig. 16 is a schematic structural diagram of a computer system suitable for implementing a terminal device of an embodiment of the present application.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

Fig. 5 is a schematic diagram of a character segmentation method according to an embodiment of the present invention. As shown in Fig. 5, the character segmentation method of this embodiment includes the following steps S50 to S53.

Step S50: Project the character image to be segmented in the vertical direction, and search the projection image for the midpoints of blank intervals as pre-segmentation points, so as to obtain a pre-segmentation point set. In this step, the character image to be segmented is a binary image that contains only the characters to be segmented. A blank interval consists of points whose vertical projection value is less than a set value. For example, with a set value of 2, a column whose vertical projection value is less than 2 pixels is considered to contain no character information. Note that in this step the first pre-segmentation point is preset to the blank position immediately before the first character; if there is no such blank position, the coordinate of the first pre-segmentation point is set to 0. To find the midpoints of the blank intervals, the projection image is first searched for points whose projection value is less than the set value (that is, points whose vertical projection value is less than 2 pixels), and the abscissa of each such point is recorded in order. Then, from the abscissas of each two adjacent points, the abscissa of their midpoint is calculated in order, giving the pre-segmentation point coordinate set blank_point={b₀,b₁,...,b_i,...,b_m}, where m denotes the total number of pre-segmentation points, which is less than or equal to the total number of characters N_char, and b_i denotes the abscissa of the i-th pre-segmentation point.
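Step S50 can be sketched as below. This is one possible reading rather than the patent's implementation: consecutive blank columns are grouped into runs and the midpoint of each run becomes a pre-segmentation point. The function name `pre_segmentation_points` and the parameter `blank_threshold` are hypothetical; the default of 2 is the example set value from the text.

```python
def pre_segmentation_points(proj, blank_threshold=2):
    """Midpoints of runs of blank columns (projection value < blank_threshold).

    proj is the vertical projection: one black-pixel count per column.
    """
    if not proj:
        return []
    blank = [value < blank_threshold for value in proj]
    points, x, n = [], 0, len(proj)
    while x < n:
        if blank[x]:
            start = x
            while x < n and blank[x]:      # walk to the end of the blank run
                x += 1
            points.append((start + x - 1) // 2)
        else:
            x += 1
    if not blank[0]:
        # No blank position before the first character: the text sets the
        # first pre-segmentation point to 0 in that case.
        points.insert(0, 0)
    return points


blank_point = pre_segmentation_points([0, 0, 5, 6, 5, 0, 0, 0, 4, 7, 9, 8, 4, 0])
print(blank_point)  # [0, 6, 13]
```

The source does not say how a character touching the right edge of the image is closed off, so this sketch adds no point there.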

Step S51: Calculate the average width of a single character from the total number of characters and the total character width. In this step, the average width is calculated by the following formula: average width of a single character = total character width / total number of characters.

Step S52: Traverse the pre-segmentation point set, calculate the interval between each two adjacent pre-segmentation points, and determine the actual segmentation point set in combination with the average width of a single character. In this step, the pre-segmentation point set blank_point is first traversed, and the interval interval between the abscissas of each two adjacent pre-segmentation points is calculated,

where interval = blank_point[i+1] - blank_point[i], i ∈ [0, m).

Then interval is compared with the average width W of a single character, the abscissas of the actual segmentation points are determined according to a preset recognition rule, and the actual segmentation points are written into the actual segmentation point set segment_point, where b₀ is the abscissa of the first actual segmentation point in the actual segmentation point set.

The relationship between interval and W falls into the following three cases:

When first coefficient * W < interval ≤ second coefficient * W, it is determined that the interval contains one character, i.e. blank_point[i+1] is an actual segmentation point, where the second coefficient is greater than the first coefficient.

When second coefficient * W < interval ≤ third coefficient * W, where the third coefficient is greater than the second coefficient, it is determined that the interval contains 3 touching characters. The average width w of the characters within the interval is calculated and, taking blank_point[i] as the starting point start, the actual segmentation points are determined as follows:

Step A: Calculate the abscissa seg_point of the first touching-character segmentation point in the interval as seg_point = start + w. Then, centered on this point, extend a first preset number of pixels to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual segmentation point in the interval, and update seg_point to its abscissa.

Step B: Taking the seg_point value updated in step A as the starting point start, repeat step A to obtain the second actual segmentation point in the interval.

Step C: Take blank_point[i+1] as the last actual segmentation point in the interval.

When interval > third coefficient * W, it is determined that the interval contains more than 3 touching characters, and the actual segmentation points of the interval are determined as follows:

Step a: Shrink the interval inward by a second preset number of pixels at each end.

Step b: Find the point with the minimum vertical projection value within the shrunken interval, take it as the touching-character segmentation point of the interval, and divide the interval into two subintervals at that point.

Step c: Calculate the width of each subinterval and, by comparing it with the average width of a single character, determine the actual segmentation points of each subinterval according to the preset recognition rule.
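The three cases can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation. The source gives no numeric values for the three coefficients or the two preset pixel counts, so `c1`, `c2`, `c3`, `search` and `shrink` are hypothetical defaults. Treating intervals at or below `c1 * W` as noise, and taking the total character width as the span between the first and last pre-segmentation points, are also assumptions. The source describes only the one-character, three-character and more-than-three-character cases, and the sketch follows it literally.

```python
def argmin_in(proj, lo, hi):
    """Column with the smallest projection value in [lo, hi], clamped to the image."""
    lo, hi = max(lo, 0), min(hi, len(proj) - 1)
    return min(range(lo, hi + 1), key=lambda x: proj[x])


def split_three(proj, start, end, search):
    """Steps A to C: two searched cuts, then the right edge of the interval."""
    w = (end - start) / 3          # average character width inside the interval
    cuts, pos = [], start
    for _ in range(2):             # step A, then step B from the updated point
        guess = int(round(pos + w))
        pos = argmin_in(proj, guess - search, guess + search)
        cuts.append(pos)
    cuts.append(end)               # step C
    return cuts


def split_interval(proj, start, end, W, c1=0.5, c2=1.5, c3=3.5, search=2, shrink=3):
    """Actual segmentation points contributed by one interval (shrink must be >= 1)."""
    interval = end - start
    if interval <= c1 * W:
        return []                  # assumption: too narrow to hold a character
    if interval <= c2 * W:
        return [end]               # one character
    if interval <= c3 * W:
        return split_three(proj, start, end, search)
    lo, hi = start + shrink, end - shrink              # step a
    if lo > hi:
        return [end]               # guard for degenerate widths (not in the source)
    mid = argmin_in(proj, lo, hi)                      # step b
    args = (W, c1, c2, c3, search, shrink)
    return (split_interval(proj, start, mid, *args)    # step c, applied to
            + split_interval(proj, mid, end, *args))   # both subintervals


def actual_segmentation_points(proj, blank_point, n_chars, **kw):
    """Build segment_point from the pre-segmentation points; b0 is always kept."""
    W = (blank_point[-1] - blank_point[0]) / n_chars
    segment_point = [blank_point[0]]
    for left, right in zip(blank_point, blank_point[1:]):
        segment_point += split_interval(proj, left, right, W, **kw)
    return segment_point


# One isolated character followed by three touching ones (weak joints at 21 and 30).
proj = [0] + [5] * 9 + [0] + [5] * 10 + [2] + [5] * 8 + [2] + [5] * 9 + [0]
segment_point = actual_segmentation_points(proj, [0, 10, 40], n_chars=4)
print(segment_point)  # [0, 10, 21, 30, 40]
```

Searching a small window around `start + w` lets the cut snap to a nearby projection minimum instead of slicing through a stroke at the nominal position.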

Step S53: Traverse the actual segmentation point set and determine the pixels between adjacent actual segmentation points, so as to obtain the single-character images segmented from the character image to be segmented. In this step, the abscissas of adjacent actual segmentation points are determined first; then the pixels whose abscissa lies between the abscissas of adjacent actual segmentation points are taken as the pixels between those actual segmentation points; finally, each single-character image is obtained from the pixels between adjacent actual segmentation points.
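Step S53 amounts to slicing the image by column ranges, as in the sketch below. The function name `crop_characters` is hypothetical. The source does not say whether the boundary columns belong to the left or the right character, so a half-open range [left, right) is assumed.

```python
def crop_characters(img, segment_point):
    """Cut img (a list of rows) into one sub-image per pair of adjacent cut points."""
    chars = []
    for left, right in zip(segment_point, segment_point[1:]):
        chars.append([row[left:right] for row in img])   # columns left..right-1
    return chars


img = [
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
]
chars = crop_characters(img, [0, 3, 6])
print(chars[0])  # [[0, 1, 1], [0, 1, 0]]
print(chars[1])  # [[0, 1, 0], [0, 1, 1]]
```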

Below, the technical solution of the present invention is described in detail, taking recognition of the policy number on an insurance policy document as an example. In this embodiment, the image to be processed is first preprocessed, and a binary image containing only the characters to be segmented is obtained through various image processing methods. The preprocessing proceeds as follows:

On a policy document, the area occupied by the policy number is very small relative to the whole document, and its position is relatively fixed. To improve subsequent processing efficiency and avoid wasting computing resources, the input policy image is first converted to a grayscale image and a region is cropped from it. In this embodiment of the present invention, the policy number is assumed to be located uniformly in the upper-right corner of the policy image. Starting from the upper-right corner, a region whose width is no more than 1/2 of the full image width and whose height is no more than 1/6 of the full image height is cropped, as shown in Fig. 6. This region is defined as the preprocessing region.
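The region crop can be sketched with plain list slicing. The function name `crop_top_right` and its divisor parameters are hypothetical; integer division keeps the region within the stated bounds of 1/2 of the width and 1/6 of the height.

```python
def crop_top_right(img, width_div=2, height_div=6):
    """Return the upper-right region of img (a list of rows)."""
    height, width = len(img), len(img[0])
    roi_h = height // height_div           # at most 1/6 of the image height
    roi_w = width // width_div             # at most 1/2 of the image width
    return [row[width - roi_w:] for row in img[:roi_h]]


# 12 x 8 test image whose pixel value encodes its position (row * 8 + column).
img = [[y * 8 + x for x in range(8)] for y in range(12)]
roi = crop_top_right(img)
print(roi)  # [[4, 5, 6, 7], [12, 13, 14, 15]]
```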

Adaptive binarization is applied to the preprocessing region of Fig. 6 using a local adaptive threshold, with a local neighborhood block of 35*35. After binarization, pixels above the threshold are set to 255 and pixels below the threshold are set to 0. Owing to the characteristics of policy images, the background pixels are usually 255 after binarization and the foreground pixels 0, as shown in Fig. 7.
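A local adaptive threshold can be sketched in pure Python as below. The source states only the 35*35 block size, so using the local mean as the threshold and subtracting a constant `offset` are assumptions, and the function name `adaptive_binarize` is hypothetical. In practice a library routine such as OpenCV's `cv2.adaptiveThreshold` would normally be used instead.

```python
def adaptive_binarize(gray, block=35, offset=10):
    """Set a pixel to 255 if it exceeds (local mean - offset), else to 0.

    gray is a list of rows of intensities; block is the neighborhood size.
    """
    h, w, r = len(gray), len(gray[0]), block // 2
    # Integral image: integ[y][x] holds the sum of gray[:y][:x].
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        row = []
        for x in range(w):
            x0, x1 = max(x - r, 0), min(x + r + 1, w)
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))   # mean of the clipped window
            row.append(255 if gray[y][x] > mean - offset else 0)
        out.append(row)
    return out


# Bright background with one dark pixel; a 3 x 3 block keeps the demo small.
gray = [[200] * 5 for _ in range(5)]
gray[2][2] = 50
binary = adaptive_binarize(gray, block=3, offset=10)
print(binary[2])  # [255, 255, 0, 255, 255]
```

Without the offset, pixels in a perfectly uniform area would equal their local mean and be classified as foreground, which is why the sketch includes one.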

The binarized preprocessing region (i.e. Fig. 7) is eroded in the horizontal direction. Because the character spacing of the policy number is very small, the erosion template is no more than 3 pixels wide. This processing not only removes some scattered noise in Fig. 7 but also strengthens the contours of the foreground characters to a certain extent, improving the accuracy of the subsequent text line positioning.

Sobel edge detection is then applied to the eroded binary image; the result is shown in Fig. 8. In the edge-detected image, the pixels of each character consist only of its outermost contour. When projecting, this reduces the projection errors caused by font and stroke thickness and ensures as far as possible that the projection value depends only on the size and number of the characters, which makes it easier to locate characters by their relative positions in the projection image.

The edge-detected image is projected in the horizontal direction; the projection is shown in Fig. 9. In the technical solution of this embodiment, the two lines of explanatory text below the policy number are at a fixed position relative to the policy number, and because the two lines have similar character height and character count, their horizontal projection features are almost identical. It is therefore more reliable to first locate the explanatory text line close to the policy number. In a specific embodiment of the present invention, the upper boundary of the explanatory text line close to the policy number can be located by the following steps:

Step 1)：Maximum horizontal projection value proj_max, given threshold can be obtained according to floor projection value horiz_proj Thred, the criterion as extraction comment line of text：

Thred=proj_max*0.6

Step 2)：Horiz_proj is traversed, the size of comparison level projection value and threshold value, all big by floor projection value line by line In the continuous row of threshold value is concluded to a section list sublist, and preserve consecutive rows of row coordinate sublist=[row_i, row_i+1,…row_i+n], wherein i is consecutive rows of initial row coordinate, and n is consecutive rows of row sum；According to row shared by comment Number characteristic will not be less than 10 using the section list sublist length of thred extractions, therefore when the length of section list is unsatisfactory for this When condition, it will give up without subsequent processing；After having traversed, a section list collection seg_list=[sublist can be obtained₁, sublist₂,…,sublist_m], m is the number of section list.It should be noted that due to being by the sequence traversed from top to bottom It is handled, therefore sublist₂In first numerical value can be more than sublist₁In the last one numerical value.

Step 3)：Calculating sifting is carried out to the multiple sections of lists that step 2) obtains, obtains the comment close to number of policy The coboundary position coboundary_row of line of text, concrete operation step are as follows：

a：Calculate the maximum value and the corresponding row coordinate of maximum value of each section list sublist；

b：Seg_list is traversed, since the distance of two row comments is close, setting of the embodiment of the present invention represents expository writing The spacing of two sublist of word should be no more than 15.After two neighboring sublist meets the condition, two sublist are calculated The mean value and variance of maximum value；

c: Select the two adjacent sublists with the minimum variance, sublist_i and sublist_i+1, as the position of the explanatory text lines. The first element of sublist_i is therefore the row coordinate coboundary_row of the upper boundary of the explanatory text line adjacent to the policy number.
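Steps 1) to 3) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names find_runs and locate_coboundary_row are hypothetical, the "maximum value" of a sublist is assumed to be its largest horizontal projection value, and the spacing between two sublists is assumed to be measured from the last row of the upper one to the first row of the lower one.

```python
from statistics import pvariance


def find_runs(horiz_proj, ratio=0.6, min_len=10):
    """Steps 1-2: group consecutive rows whose projection exceeds thred."""
    thred = max(horiz_proj) * ratio
    seg_list, sublist = [], []
    for row, value in enumerate(horiz_proj):
        if value > thred:
            sublist.append(row)
        else:
            if len(sublist) >= min_len:  # runs shorter than 10 rows are discarded
                seg_list.append(sublist)
            sublist = []
    if len(sublist) >= min_len:
        seg_list.append(sublist)
    return seg_list


def locate_coboundary_row(horiz_proj, seg_list, max_gap=15):
    """Step 3: choose the adjacent pair of runs whose peak values vary least."""
    best = None
    for upper, lower in zip(seg_list, seg_list[1:]):
        # Assumption: spacing = first row of lower run - last row of upper run.
        if lower[0] - upper[-1] > max_gap:
            continue
        peaks = [max(horiz_proj[r] for r in upper),
                 max(horiz_proj[r] for r in lower)]
        var = pvariance(peaks)
        if best is None or var < best[0]:
            best = (var, upper[0])
    return None if best is None else best[1]
```

The function returns None when no adjacent pair of runs is close enough, a case the description does not address.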

Since there is no obvious adhesion between the explanatory text line and the policy number line, horiz_proj is scanned upward starting from the upper boundary coboundary_row. The midpoint of the first interval whose horizontal projection value is 0 is taken as the lower boundary of the policy number, and the midpoint of the second such interval as its upper boundary. The policy number text line extracted according to these upper and lower boundary coordinates is shown in Figure 10.
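The upward scan can be illustrated with the short Python sketch below. The function name zero_gap_midpoints_above is hypothetical, and integer (floor) midpoints are an assumption, since the description does not say how a midpoint between rows is rounded.

```python
def zero_gap_midpoints_above(horiz_proj, start_row, count=2):
    """Scan upward from start_row; return midpoints of the first `count` zero runs."""
    midpoints = []
    row = start_row
    while row >= 0 and len(midpoints) < count:
        if horiz_proj[row] == 0:
            bottom = row
            while row >= 0 and horiz_proj[row] == 0:
                row -= 1
            top = row + 1
            midpoints.append((top + bottom) // 2)  # floor midpoint of the zero run
        else:
            row -= 1
    return midpoints
```

The first returned value corresponds to the lower boundary of the policy number and the second to its upper boundary.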

Finally, vertical projection is performed on the policy number text line image in Figure 10, as shown in Figure 11. Because the label "Policy number:" and the policy number characters have distinct and fixed positional features, the zero projection value at the dashed-box position in the figure can be used to separate the label "Policy number:" from the policy number characters and to locate the left and right boundaries of the policy number. The policy number is thus precisely located, as shown in Figure 12.

After the policy number image shown in Figure 12 is obtained, it can be seen from Figure 12 that some characters are stuck together. Vertical projection is performed on the policy number shown in Figure 12 (the projection is shown in Figure 13), and the midpoints of the blank intervals are taken as pre-segmentation points. Here, blank means that the vertical projection value is no higher than 2 pixels, i.e. the column is considered to contain no character information. Note that the first pre-segmentation point is defined as the coordinate of the blank position preceding the first character; if there is no such blank position, the first pre-segmentation point coordinate is set to 0.

The midpoint of each blank interval is calculated from the vertical projection to obtain the pre-segmentation point coordinate set blank_point = {b₀, b₁, ..., b_i, ..., b_m}, where m is the total number of pre-segmentation points, which is less than or equal to the total number of characters N_char, and b_i is the coordinate of the i-th pre-segmentation point. The interval between two adjacent pre-segmentation points can be expressed as interval = blank_point[i+1] - blank_point[i], i ∈ [0, m);
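A minimal Python sketch of the pre-segmentation step is given below. The function name pre_segmentation_points is hypothetical; the sketch assumes that vert_proj holds one projection value per column, that a column is blank when its value is at most 2, and that the midpoint of a leading blank run serves as the first pre-segmentation point.

```python
def pre_segmentation_points(vert_proj, blank_max=2):
    """Return the midpoints of blank column runs as pre-segmentation points."""
    points = []
    col, width = 0, len(vert_proj)
    while col < width:
        if vert_proj[col] <= blank_max:
            start = col
            while col < width and vert_proj[col] <= blank_max:
                col += 1
            points.append((start + col - 1) // 2)  # floor midpoint of the blank run
        else:
            col += 1
    # No blank before the first character: the first point is set to 0.
    if not points or vert_proj[0] > blank_max:
        points.insert(0, 0)
    return points
```

The interval between neighbours is then blank_point[i+1] - blank_point[i] for each consecutive pair of returned points.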

As Figure 13 shows, the section between two adjacent pre-segmentation points may contain a single non-adhered policy number character or several adhered ones. The average width W of a single character is obtained from the total number of characters and the overall width of the policy number. By comparing the interval between two adjacent pre-segmentation points with the average single-character width, it can be judged whether the section between them contains adhered characters:

When first coefficient * W < interval ≤ second coefficient * W, the interval is determined to contain one character, i.e. blank_point[i+1] is an actual cut point, where the second coefficient is greater than the first coefficient. In this embodiment the first coefficient is 0.6 and the second coefficient is 1.2: when 0.6*W < interval ≤ 1.2*W, the interval is determined to contain only one character, so blank_point[i+1] is an actual cut point.
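The single-character rule can be sketched in Python as follows. The function name single_character_cut_points and its parameters n_char and total_width are hypothetical names for the total number of characters and the overall width of the policy number; intervals outside the single-character range are simply skipped here.

```python
def single_character_cut_points(blank_point, n_char, total_width, c1=0.6, c2=1.2):
    """Return W and the cut points that close an interval holding one character."""
    W = total_width / n_char  # average width of a single character
    cuts = []
    for i in range(len(blank_point) - 1):
        interval = blank_point[i + 1] - blank_point[i]
        if c1 * W < interval <= c2 * W:
            cuts.append(blank_point[i + 1])
    return W, cuts
```

For example, with blank_point = [0, 10, 21, 52], five characters and an overall width of 52, W is 10.4; the intervals 10 and 11 each hold one character, while the interval 31 exceeds 1.2*W and is left for the adhered-character rules.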

When second coefficient * W < interval ≤ third coefficient * W, where the third coefficient is set to 3.2, i.e. when 1.2*W < interval ≤ 3.2*W, the interval is determined to contain 3 adhered characters. The average character width w within the interval section is calculated and, with blank_point[i] as the starting point start, the actual cut points are determined as follows:

Step A: Calculate the abscissa seg_point of the first adhesion cut point in the interval section by the formula seg_point = start + w. Because character widths are not exactly equal, this equal-division point is not necessarily the actual cut point. Therefore, centered on the first adhesion cut point, extend a first predetermined number of pixels (set to 3 in this embodiment) to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual cut point in the interval section, and update seg_point to the abscissa of this first actual cut point;

Step B: Take the seg_point value updated in Step A as the starting point start and repeat Step A to obtain the second actual cut point in the interval section;

Step C: Take blank_point[i+1] as the last actual cut point in the interval section.
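The equal-division estimate followed by a local minimum search, applied once for the first cut point and again for the second, can be sketched in Python as below. The function name split_three_adhered is hypothetical, w is assumed to be one third of the interval, and clamping the search window to the inside of the interval is an added safeguard that the description does not mention.

```python
def split_three_adhered(vert_proj, start, end, radius=3):
    """Split the section (start, end] holding three adhered characters."""
    w = (end - start) / 3  # assumption: average width = interval / 3
    cuts = []
    seg_point = start
    for _ in range(2):  # first and second actual cut points
        guess = round(seg_point + w)  # equal-division estimate
        lo = max(start + 1, guess - radius)
        hi = min(end - 1, guess + radius)
        # Point of minimum projection within +/- radius pixels of the estimate.
        seg_point = min(range(lo, hi + 1), key=lambda x: vert_proj[x])
        cuts.append(seg_point)
    cuts.append(end)  # blank_point[i+1] closes the last character
    return cuts
```

When several columns in the window share the minimum projection value, this sketch picks the leftmost one.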

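When interval > 3.2*W, the interval section is determined to contain more than 3 adhered characters, and its actual cut points are determined by the following steps:

Step a: Retract the front and back ends of the interval section by a second predetermined number of pixels each;

Step b: Search the retracted interval section for the point with the minimum vertical projection value, take that point as the adhesion cut point of the interval section, and divide the interval section into two subintervals at that point;

Step c: Calculate the interval of each subinterval and, by comparing the subinterval interval with the average width of a single character, determine the actual cut points of each subinterval according to the preset recognition rule.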
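For a section wider than 3.2*W, retracting both ends and then cutting at the column of minimum vertical projection can be sketched in Python as follows. The function name split_wide_interval is hypothetical, and the default of 3 pixels for the second predetermined number is an assumption, since no value is given for it here.

```python
def split_wide_interval(vert_proj, start, end, shrink=3):
    """Cut a wide section at its minimum-projection column, ignoring both ends."""
    lo, hi = start + shrink, end - shrink  # retract both ends by `shrink` pixels
    if lo > hi:
        raise ValueError("interval too narrow for the chosen shrink")
    cut = min(range(lo, hi + 1), key=lambda x: vert_proj[x])
    return (start, cut), (cut, end)  # two subintervals for further testing
```

Each returned subinterval would then be compared with W again under the same recognition rule. One plausible reason for the retraction, not stated in the description, is that it keeps the near-blank columns at the ends of the section from being chosen as the minimum.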

Once all actual cut points have been found through the above steps, the actual cut-point coordinate set segment_point is traversed and the pixels in each segmentation section are located, giving the single-character images after segmentation, as shown in Figure 14.
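Extracting the pixels between adjacent cut points can be sketched in Python as below. The function name crop_characters is hypothetical, the image is assumed to be a list of pixel rows, and each character is assumed to span the half-open column range [left, right), since the description does not say whether the cut columns themselves are included.

```python
def crop_characters(image, segment_point):
    """Return one sub-image per pair of adjacent actual cut points."""
    chars = []
    for left, right in zip(segment_point, segment_point[1:]):
        # Keep columns left..right-1 of every row.
        chars.append([row[left:right] for row in image])
    return chars
```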

Finally, as in the prior art, a character recognition model can be obtained by training a convolutional neural network on training samples. In the prediction stage, a character image is input and the recognition model outputs the character with the highest recognition probability. The recognition model in the embodiment of the present invention includes, but is not limited to, a convolutional neural network; common supervised classifiers such as KNN and SVM can also be used.
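As an illustration of the simplest of the classifiers mentioned, a k-nearest-neighbour vote can be sketched in plain Python as follows. This is a toy example, not the recognition model of the embodiment: the function name knn_predict is hypothetical, and it assumes every character image has been normalised to the same size and flattened into a list of binary pixels.

```python
from collections import Counter


def knn_predict(train, sample, k=3):
    """train: list of (flat_pixels, label); return the majority label of the k nearest."""
    def hamming(a, b):
        # Number of pixel positions at which two binary images differ.
        return sum(x != y for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: hamming(item[0], sample))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```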

Figure 15 is a schematic diagram of a character segmentation device according to an embodiment of the present invention. As shown in Figure 15, the character segmentation device 150 of this embodiment mainly includes a projection module 151, a computing module 152, a determining module 153 and a character determining module 154. The projection module 151 performs vertical projection on the character image to be segmented and searches the projection image for the midpoints of blank intervals as pre-segmentation points, thereby obtaining a pre-segmentation point set; a blank interval consists of points whose vertical projection value is less than a set value. The computing module 152 calculates the average width of a single character from the total number of characters and the overall character width. The determining module 153 traverses the pre-segmentation point set, calculates the interval between two adjacent pre-segmentation points, and determines the actual cut-point set in combination with the average width of a single character. The character determining module 154 traverses the actual cut-point set and determines the pixels between adjacent actual cut points, thereby obtaining the single-character images after segmentation of the character image to be segmented. The character image to be segmented is a binary image containing only the characters to be segmented.

The projection module 151 of the character segmentation device 150 may further be used to: search the projection image for points whose projection value is less than the set value and record the abscissa of each point in turn; and calculate in turn, from the abscissas of each two adjacent points, the abscissa of their midpoint, thereby obtaining the pre-segmentation point coordinate set blank_point = {b₀, b₁, ..., b_i, ..., b_m}, where m is the total number of pre-segmentation points, which is less than or equal to the total number of characters N_char, and b_i is the abscissa of the i-th pre-segmentation point.

The computing module 152 of the character segmentation device 150 may further be used to calculate the average width of a single character by the following formula: average width of a single character = overall character width / total number of characters.

The determining module 153 of the character segmentation device 150 may further be used to: traverse the pre-segmentation point set blank_point and calculate the interval interval between the abscissas of each two adjacent pre-segmentation points; and compare interval with the average width W of a single character, determine the abscissas of the actual cut points according to a preset recognition rule, and write the actual cut points into the actual cut-point set segment_point, where b₀ is the abscissa of the first actual cut point in the set.

The determining module 153 of the character segmentation device 150 may further be used to: when first coefficient * W < interval ≤ second coefficient * W, determine that the interval contains one character, i.e. blank_point[i+1] is an actual cut point, where the second coefficient is greater than the first coefficient.

The determining module 153 of the character segmentation device 150 may further be used to: when second coefficient * W < interval ≤ third coefficient * W, where the third coefficient is greater than the second coefficient, determine that the interval contains 3 adhered characters, calculate the average character width w within the interval section and, with blank_point[i] as the starting point start, determine the actual cut points as follows. Step A: calculate the abscissa seg_point of the first adhesion cut point in the interval section by the formula seg_point = start + w; then, centered on the first adhesion cut point, extend a first predetermined number of pixels to the left and to the right, search the extended range for the point with the minimum projection value, take that point as the first actual cut point in the interval section, and update seg_point to the abscissa of this first actual cut point. Step B: take the seg_point value updated in Step A as the starting point start and repeat Step A to obtain the second actual cut point in the interval section. Step C: take blank_point[i+1] as the last actual cut point in the interval section.

The determining module 153 of the character segmentation device 150 may further be used to: when interval > third coefficient * W, determine that the interval section contains more than 3 adhered characters, and determine the actual cut points of the interval section by the following steps. Step a: retract the front and back ends of the interval section by a second predetermined number of pixels each. Step b: search the retracted interval section for the point with the minimum vertical projection value, take that point as the adhesion cut point of the interval section, and divide the interval section into two subintervals at that point. Step c: calculate the interval of each subinterval and, by comparing the subinterval interval with the average width of a single character, determine the actual cut points of each subinterval according to the preset recognition rule.

The character determining module 154 of the character segmentation device 150 may further be used to: determine the abscissas of adjacent actual cut points; and take the pixels whose abscissa lies between the abscissas of the adjacent actual cut points as the pixels between those actual cut points.

Referring now to Figure 16, a schematic structural diagram is shown of a computer system 1600 suitable for implementing the terminal device of the embodiments of the present application. The terminal device shown in Figure 16 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in Figure 16, the computer system 1600 includes a central processing unit (CPU) 1601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1602 or a program loaded from a storage section 1608 into a random access memory (RAM) 1603. The RAM 1603 also stores the various programs and data required for the operation of the system 1600. The CPU 1601, the ROM 1602 and the RAM 1603 are connected to one another by a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.

The following components are connected to the I/O interface 1605: an input section 1606 including a keyboard, a mouse and the like; an output section 1607 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a loudspeaker; a storage section 1608 including a hard disk and the like; and a communication section 1609 including a network interface card such as a LAN card or a modem. The communication section 1609 performs communication processing via a network such as the Internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1610 as needed, so that a computer program read from it can be installed into the storage section 1608 as needed.

In particular, according to embodiments of the present disclosure, the character segmentation process described above may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the character segmentation method of the technical solution of the present invention. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 1609 and/or installed from the removable medium 1611. When the computer program is executed by the central processing unit (CPU) 1601, the above-mentioned functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus or device. Also in the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, electric wire, optical cable, RF, or any suitable combination of the above.

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in a block diagram or flowchart, and any combination of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor, which may for example be described as: a processor including a projection module, a computing module, a determining module and a character determining module. The names of these modules do not in some cases limit the modules themselves; for example, the projection module may also be described as "a module that searches the projection image for the midpoints of blank intervals as pre-segmentation points, thereby obtaining a pre-segmentation point set".

In another aspect, the present application also provides a computer-readable medium, which may be included in the device described in the above embodiments or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: perform vertical projection on a character image to be segmented, and search the projection image for the midpoints of blank intervals as pre-segmentation points, thereby obtaining a pre-segmentation point set, where a blank interval consists of points whose vertical projection value is less than a set value; calculate the average width of a single character from the total number of characters and the overall character width; traverse the pre-segmentation point set, calculate the interval between two adjacent pre-segmentation points, and determine the actual cut-point set in combination with the average width of a single character; and traverse the actual cut-point set and determine the pixels between adjacent actual cut points, thereby obtaining the single-character images after segmentation of the character image to be segmented.

According to the technical solution of the embodiments of the present invention, where multiple characters are continuously adhered, different segmentation rules are applied according to the number of adhered characters. Multiple continuously adhered characters can thus be effectively segmented into complete single characters, a complete character is not cut in half, and the recognition accuracy of characters is improved to a certain extent.

The specific embodiments described above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims

1. A character segmentation method, characterized by comprising:

performing vertical projection on a character image to be segmented, and searching the projection image for the midpoints of blank intervals as pre-segmentation points, thereby obtaining a pre-segmentation point set, wherein a blank interval consists of points whose vertical projection value is less than a set value;

traversing the pre-segmentation point set, calculating the interval between two adjacent pre-segmentation points, and determining an actual cut-point set in combination with the average width of the single character;

traversing the actual cut-point set and determining the pixels between adjacent actual cut points, thereby obtaining the single-character images after segmentation of the character image to be segmented.

2. The method according to claim 1, characterized in that the step of searching the projection image for the midpoints of blank intervals as pre-segmentation points, thereby obtaining a pre-segmentation point set, comprises:

searching the projection image for points whose projection value is less than the set value, and recording the abscissa of each point in turn;

calculating in turn, from the abscissas of each two adjacent points, the abscissa of their midpoint, thereby obtaining the pre-segmentation point coordinate set blank_point = {b₀, b₁, ..., b_i, ..., b_m}, wherein m is the total number of pre-segmentation points, which is less than or equal to the total number of characters N_char, and b_i is the abscissa of the i-th pre-segmentation point.

3. The method according to claim 1, characterized in that the step of calculating the average width of a single character from the total number of characters and the overall character width comprises:

calculating the average width of a single character by the following formula: average width of a single character = overall character width / total number of characters.

4. The method according to claim 1, characterized in that the step of traversing the pre-segmentation point set, calculating the interval between two adjacent pre-segmentation points, and determining the actual cut-point set in combination with the average width of the single character comprises:

traversing the pre-segmentation point set blank_point, and calculating the interval interval between the abscissas of each two adjacent pre-segmentation points;

wherein interval = blank_point[i+1] - blank_point[i], i ∈ [0, m);

comparing interval with the average width W of a single character, determining the abscissas of the actual cut points according to a preset recognition rule, and writing the actual cut points into the actual cut-point set segment_point, wherein b₀ is the abscissa of the first actual cut point in the actual cut-point set.

5. The method according to claim 4, characterized in that the step of comparing interval with W and determining the actual cut points according to the preset recognition rule comprises:

when first coefficient * W < interval ≤ second coefficient * W, determining that the interval contains one character, i.e. blank_point[i+1] is an actual cut point, wherein the second coefficient is greater than the first coefficient.

6. The method according to claim 4, characterized in that the step of comparing interval with W and determining the actual cut points according to the preset recognition rule comprises:

when second coefficient * W < interval ≤ third coefficient * W, wherein the third coefficient is greater than the second coefficient, determining that the interval contains 3 adhered characters, calculating the average character width w within the interval section, and, with blank_point[i] as the starting point start, determining the actual cut points as follows:

7. The method according to claim 4, characterized in that the step of comparing interval with W and determining the actual cut points according to the preset recognition rule comprises:

when interval > third coefficient * W, determining that the interval section contains more than 3 adhered characters, and determining the actual cut points of the interval section by the following steps:

Step b: searching the retracted interval section for the point with the minimum vertical projection value, taking that point as the adhesion cut point of the interval section, and dividing the interval section into two subintervals at that point;

Step c: calculating the interval of each subinterval and, by comparing the subinterval interval with the average width of a single character, determining the actual cut points of each subinterval according to the preset recognition rule.

8. The method according to claim 1, characterized in that the step of traversing the actual cut-point set and determining the pixels between adjacent actual cut points comprises:

determining the abscissas of adjacent actual cut points;

taking the pixels whose abscissa lies between the abscissas of the adjacent actual cut points as the pixels between those actual cut points.

9. The method according to any one of claims 1 to 8, characterized in that the character image to be segmented is a binary image containing only the characters to be segmented.

10. A character segmentation device, characterized by comprising:

a projection module, configured to perform vertical projection on a character image to be segmented and to search the projection image for the midpoints of blank intervals as pre-segmentation points, thereby obtaining a pre-segmentation point set, wherein a blank interval consists of points whose vertical projection value is less than a set value;

a computing module, configured to calculate the average width of a single character from the total number of characters and the overall character width;

a determining module, configured to traverse the pre-segmentation point set, calculate the interval between two adjacent pre-segmentation points, and determine an actual cut-point set in combination with the average width of the single character;

a character determining module, configured to traverse the actual cut-point set and determine the pixels between adjacent actual cut points, thereby obtaining the single-character images after segmentation of the character image to be segmented.

11. The device according to claim 10, characterized in that the projection module is further configured to:

12. The device according to claim 10, characterized in that the computing module is further configured to:

13. The device according to claim 10, characterized in that the determining module is further configured to:

14. The device according to claim 13, characterized in that the determining module is further configured to:

15. The device according to claim 13, characterized in that the determining module is further configured to:

16. The device according to claim 13, characterized in that the determining module is further configured to:

17. The device according to claim 10, characterized in that the character determining module is further configured to:

determine the abscissas of adjacent actual cut points;

18. The device according to any one of claims 10 to 17, characterized in that the character image to be segmented is a binary image containing only the characters to be segmented.

19. An electronic device, characterized by comprising:

one or more processors;

a storage device for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 9.

20. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.