CN110059695A - Character segmentation method and terminal based on vertical projection - Google Patents

Character segmentation method and terminal based on vertical projection

Info

Publication number
CN110059695A
CN110059695A CN201910328657.3A CN201910328657A CN110059695A CN 110059695 A CN110059695 A CN 110059695A CN 201910328657 A CN201910328657 A CN 201910328657A CN 110059695 A CN110059695 A CN 110059695A
Authority
CN
China
Prior art keywords
character
spacing
adjacent
obtains
central point
Prior art date
Application number
CN201910328657.3A
Other languages
Chinese (zh)
Inventor
庄国金
陈文传
杜保发
Original Assignee
厦门商集网络科技有限责任公司
Application filed by 厦门商集网络科技有限责任公司
Priority to CN201910328657.3A
Publication of CN110059695A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20 Image acquisition
    • G06K9/34 Segmentation of touching or overlapping patterns in the image field
    • G06K9/344 Segmentation of touching or overlapping patterns in the image field using recognition of characters or words

Abstract

The present invention relates to a character segmentation method and terminal based on vertical projection, and belongs to the field of data processing. The invention segments the characters in a first character string image using the vertical projection method to obtain a second character string image, and obtains a standard distance value corresponding to the second character string image; the standard distance value is the standard value of the spacing between two adjacent characters in the second character string image. If the spacing between the center points of two adjacent characters in the second character string image is greater than the standard distance value, one character of the two adjacent characters is split into two characters. If the spacing between the center points of two adjacent characters in the second character string image is less than the standard distance value, one character of the two adjacent characters is taken as a first character, a character adjacent to the first character is taken as a second character, and the first character and the second character are merged. The accuracy of character segmentation is thereby improved.

Description

Character segmentation method and terminal based on vertical projection
Technical field
The present invention relates to a character segmentation method and terminal based on vertical projection, and belongs to the field of data processing.
Background art
With the development of electronic information technology, many application scenarios need to convert the information on a physical carrier into digital information that a computer can process. For example, a parking lot entrance recognizes license plates and calculates the parking fee according to the recognized license plate number. In the license plate recognition scenario, the license plate is first photographed to obtain a license plate image, and the image is then recognized to obtain the license plate number. To improve recognition accuracy and reduce interference between characters, the characters in the license plate image should first be segmented, and each single character then recognized separately.
The patent document with application No. 201710858247.0 discloses an identity card character segmentation method. First, an ID card image is captured by a dedicated device. The black text in the ID card image is then binarized with a threshold that varies with the background color of the image to obtain a binary image, and an inverted binary image is rotated by 180 degrees to obtain an upright binary image. Next, a horizontal projection of the binary image is computed, and the upper and lower boundaries of the ID card number region are obtained from the horizontal projection result; a vertical projection of the ID card number image is computed, and the left and right boundaries of the number and the position of each digit are obtained from the vertical projection. Based on the positional relationship between the Chinese character region and the number region, the left and right boundaries of the Chinese character region are obtained, and the individual characters of the name, nationality and home address fields are segmented based on the horizontal and vertical projections of the Chinese character region image. That invention is used for character segmentation when extracting identity information; according to that document, its segmentation is accurate and its cost is low.
The above patent document segments the ID card number using the vertical projection method. For character string images with little adhesion or few broken strokes, this method achieves good segmentation, but under severe interference or severe data loss it easily produces mis-segmentation: for example, two adhered characters are not separated, or a single character is split into two.
Summary of the invention
The technical problem to be solved by the present invention is how to improve the accuracy of character segmentation.
To solve the above technical problem, the technical solution adopted by the present invention is as follows:
The present invention provides a character segmentation method based on vertical projection, comprising:
segmenting the characters in a first character string image using the vertical projection method to obtain a second character string image;
obtaining the standard distance value between two adjacent characters in the second character string image;
if the spacing between the center points of two adjacent characters in the second character string image is greater than the standard distance value, splitting one character of the two adjacent characters into two characters;
if the spacing between the center points of two adjacent characters in the second character string image is less than the standard distance value, taking one character of the two adjacent characters as a first character, taking a character adjacent to the first character as a second character, and merging the first character and the second character.
Preferably, splitting one character of the two adjacent characters into two characters specifically comprises:
obtaining the abscissa of the center point of the character as the center point abscissa;
presetting a number of pixels;
performing vertical projection on the region of the second character string image whose abscissa lies in the range (x-a, x+a) to obtain a vertical projection histogram, where x is the center point abscissa and a is the number of pixels; the abscissa of the vertical projection histogram represents the abscissa of a pixel in the second character string image, and the ordinate of the vertical projection histogram represents the pixel count;
obtaining the abscissa value with the smallest pixel count in the vertical projection histogram as the breakpoint coordinate;
splitting the character at the breakpoint coordinate.
Preferably, obtaining the standard distance value between two adjacent characters in the second character string image specifically comprises:
taking one character in the second character string image as a third character;
taking the character adjacent to the third character on its left side as a fourth character;
taking the character adjacent to the third character on its right side as a fifth character;
calculating the average of the height of the third character, the height of the fourth character and the height of the fifth character to obtain a height mean;
obtaining the distance between the center point of the third character and the center point of the fourth character as a first spacing;
obtaining the distance between the center point of the third character and the center point of the fifth character as a second spacing;
calculating the average of the first spacing and the second spacing to obtain a spacing mean;
if the difference between the first spacing and the second spacing is less than a preset spacing threshold, and the ratio of the spacing mean to the height mean is within a preset ratio range, setting the spacing mean as the standard distance value.
The present invention also provides a character segmentation terminal, comprising one or more processors and a memory, the memory storing a program configured to be executed by the one or more processors to perform the following steps:
segmenting the characters in a first character string image using the vertical projection method to obtain a second character string image;
obtaining the standard distance value between two adjacent characters in the second character string image;
if the spacing between the center points of two adjacent characters in the second character string image is greater than the standard distance value, splitting one character of the two adjacent characters into two characters;
if the spacing between the center points of two adjacent characters in the second character string image is less than the standard distance value, taking one character of the two adjacent characters as a first character, taking a character adjacent to the first character as a second character, and merging the first character and the second character.
Preferably, splitting one character of the two adjacent characters into two characters specifically comprises:
obtaining the abscissa of the center point of the character as the center point abscissa;
presetting a number of pixels;
performing vertical projection on the region of the second character string image whose abscissa lies in the range (x-a, x+a) to obtain a vertical projection histogram, where x is the center point abscissa and a is the number of pixels; the abscissa of the vertical projection histogram represents the abscissa of a pixel in the second character string image, and the ordinate of the vertical projection histogram represents the pixel count;
obtaining the abscissa value with the smallest pixel count in the vertical projection histogram as the breakpoint coordinate;
splitting the character at the breakpoint coordinate.
Preferably, obtaining the standard distance value between two adjacent characters in the second character string image specifically comprises:
taking one character in the second character string image as a third character;
taking the character adjacent to the third character on its left side as a fourth character;
taking the character adjacent to the third character on its right side as a fifth character;
calculating the average of the height of the third character, the height of the fourth character and the height of the fifth character to obtain a height mean;
obtaining the distance between the center point of the third character and the center point of the fourth character as a first spacing;
obtaining the distance between the center point of the third character and the center point of the fifth character as a second spacing;
calculating the average of the first spacing and the second spacing to obtain a spacing mean;
if the difference between the first spacing and the second spacing is less than a preset spacing threshold, and the ratio of the spacing mean to the height mean is within a preset ratio range, setting the spacing mean as the standard distance value.
The present invention has the following beneficial effects:
1. The prior art uses only the vertical projection method to segment characters and can only separate two characters that have a gap between them. Segmenting characters with the vertical projection method alone has two problems: if two characters are adhered, they cannot be separated correctly; if a character has a broken stroke, a gap exists between its two disconnected parts, and the vertical projection method mistakenly splits that character into two. The present invention provides a character segmentation method and terminal based on vertical projection that corrects the segmentation result of the vertical projection method using the standard distance value between characters in a character string image, so as to avoid two characters being left unseparated because of adhesion, or one character being mistakenly split into two because of a broken stroke or information loss. In the character string images addressed by the present invention, the spacing between adjacent characters is the same, for example the ID card number on an identity card. The standard distance value provided by the present invention is the standard value of the spacing between two adjacent characters in the character string image. For example, in the ID card number region of an ID card image, the spacing between every two digits fluctuates around 5 pixels. After performing primary segmentation on the first character string image with the vertical projection method, the present invention analyzes in turn the relationship between the spacing of every two adjacent characters and the standard distance value, and can thereby judge whether the result of the vertical projection segmentation contains adhesion or mis-segmentation. For example, suppose preliminary character segmentation of an ID card image with the vertical projection method yields 11 characters. If the spacing between the third and fourth characters is 10 pixels, which is greater than the standard distance value, the third or fourth character may be adhered and needs further splitting. If the spacing between the third and fourth characters is 2 pixels, which is less than the standard distance value, the third or fourth character may be mis-segmented, that is, it is incomplete, only half a character, and must be merged with an adjacent character to achieve correct segmentation. By correcting the segmentation result of the vertical projection method with the standard value of the spacing between two adjacent characters, the present invention improves the accuracy of segmenting character string images.
2. Further, when one character needs to be split into two, projecting the entire character and splitting at the abscissa with the smallest pixel count in the vertical projection histogram of the entire character is very likely to cause mis-segmentation. Because the spacing between adjacent characters in the character string images addressed by the present invention is roughly the same, the width of each character is also roughly the same. Therefore, the present invention performs vertical projection only on the central region of the character to be split, which helps improve the accuracy of character segmentation.
3. Further, the present invention traverses the second character string image until it finds three consecutive characters that satisfy the following conditions: (1) the spacing from the middle character to its left neighbor is roughly the same as the spacing to its right neighbor; (2) the ratio of the spacing between two characters to the mean height of the three characters falls within the preset ratio range. When three consecutive characters satisfy these conditions, the vertical projection segmentation of these three characters is correct and they exhibit no abnormality such as adhesion or mis-segmentation. Using the average character spacing of these three characters as the standard distance value improves the accuracy of correcting the other characters that the vertical projection method segmented abnormally.
Brief description of the drawings
Fig. 1 is a flow chart of a specific embodiment of the character segmentation method based on vertical projection provided by the present invention;
Fig. 2 is a schematic diagram of the result of segmenting the first character string image using the vertical projection method;
Fig. 3 is a structural block diagram of a specific embodiment of the character segmentation terminal based on vertical projection provided by the present invention;
Reference numerals:
1, processor; 2, memory.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
Please refer to Fig. 1 to Fig. 3.
Embodiment one of the present invention is as follows:
As shown in Fig. 1, the present embodiment provides a character segmentation method based on vertical projection, comprising:
S1. Segment the characters in the first character string image using the vertical projection method to obtain the second character string image.
Here, an existing vertical projection method can be used to segment the first character string image to obtain the second character string image. For example, the patent document with application No. 201810751647.6 performs character segmentation of license plates on the basis of the vertical projection method. Segmenting the characters of the first character string image with the vertical projection method specifically means performing vertical projection on the first character string image to obtain a vertical projection histogram. The vertical projection histogram counts the distribution of black pixels, that is, the number of black pixels in each column of the first character string image. If the number of black pixels in some column of the first character string image is zero, there is no stroke in that column, and it is very likely a boundary between two characters. Based on this principle, the present embodiment performs primary segmentation of the character string image. For example, Fig. 2 is a schematic diagram of the segmentation result (the second character string image) obtained by segmenting the original character string image (the first character string image) with the vertical projection method.
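The primary segmentation of step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the image is assumed to be already binarized into a list of rows in which 1 marks a black pixel, and the function names vertical_projection and split_on_empty_columns are hypothetical.

```python
from typing import List, Tuple

def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count the black pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]

def split_on_empty_columns(histogram: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, separated by empty columns."""
    spans = []
    start = None
    for x, count in enumerate(histogram):
        if count > 0 and start is None:
            start = x                  # a run of columns containing ink begins
        elif count == 0 and start is not None:
            spans.append((start, x))   # an empty column closes the run
            start = None
    if start is not None:
        spans.append((start, len(histogram)))
    return spans

# Hypothetical 2x5 image: two blobs separated by an empty column (column 2).
image = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
]
histogram = vertical_projection(image)
spans = split_on_empty_columns(histogram)
print(histogram, spans)
```

Each span is a candidate character. Adhered characters share a span and broken characters produce several spans, which is the situation the later steps are meant to correct.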
S2. Obtain the standard distance value corresponding to the second character string image; the standard distance value is the standard value of the spacing between two adjacent characters in the second character string image. Specifically:
S21. Take one character in the second character string image as the third character.
For example, as shown in Fig. 2, the character "number" in the second character string image is chosen as the third character (the quoted words in these examples are English glosses of the Chinese characters shown in the figure).
S22. Take the character adjacent to the third character on its left side as the fourth character.
For example, as shown in Fig. 2, the character adjacent to the third character "number" on its left side is "big", so the fourth character is "big".
S23. Take the character adjacent to the third character on its right side as the fifth character.
For example, as shown in Fig. 2, the character adjacent to the third character "number" on its right side is "according to", so the fifth character is "according to".
S24. Calculate the average of the height of the third character, the height of the fourth character and the height of the fifth character to obtain the height mean.
For example, as shown in Fig. 2, the height of the third character "number" is 33, the height of the fourth character "big" is 32, and the height of the fifth character "according to" is 32, so the height mean of the three characters is approximately 32.33.
S25. Obtain the distance between the center point of the third character and the center point of the fourth character as the first spacing.
Here, connected component detection is performed on the second character string image. A connected component is a set of points that are all connected to one another: connected points form one region, and points that are not connected form different regions. Connected component detection gives a preliminary position for each character in the second character string image. The present embodiment takes the center point of a connected component as the center point of the character.
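A rough sketch of locating center points by connected component detection is given below. The source does not state which connectivity is used or how the center is computed, so this sketch assumes 8-connectivity and takes the center of each region's bounding box; the function name component_centers is hypothetical.

```python
from collections import deque
from typing import List, Tuple

def component_centers(image: List[List[int]]) -> List[Tuple[float, float]]:
    """Return the bounding-box center (x, y) of each 8-connected region of
    black pixels (value 1), sorted from left to right."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    centers = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            centers.append(((min_x + max_x) / 2, (min_y + max_y) / 2))
    return sorted(centers)

# Hypothetical 3x5 image with a 2x2 block on the left and a vertical bar on the right.
image = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
centers = component_centers(image)
print(centers)
```

The distance between two characters is then the difference between the x coordinates of their centers. A character made of several disconnected strokes yields several regions in this sketch; grouping them is outside its scope.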
For example, as shown in Fig. 2, the distance between the center point of the third character "number" and the center point of the fourth character "big" is 35.
S26. Obtain the distance between the center point of the third character and the center point of the fifth character as the second spacing.
For example, as shown in Fig. 2, the distance between the center point of the third character "number" and the center point of the fifth character "according to" is 34.
S27. Calculate the average of the first spacing and the second spacing to obtain the spacing mean.
For example, as shown in Fig. 2, the spacing mean of 35 and 34 is taken as 34 pixels.
S28. If the difference between the first spacing and the second spacing is less than the preset spacing threshold, and the ratio of the spacing mean to the height mean is within the preset ratio range, set the spacing mean as the standard distance value.
Here, the present invention traverses the second character string image until it finds three consecutive characters that satisfy the following conditions: (1) the spacing from the middle character to its left neighbor is roughly the same as the spacing to its right neighbor; (2) the ratio of the spacing between two characters to the mean height of the three characters falls within the preset ratio range. When three consecutive characters satisfy these conditions, the vertical projection segmentation of these three characters is correct and they exhibit no abnormality such as adhesion or mis-segmentation. Using the average character spacing of these three characters as the standard distance value improves the accuracy of correcting the other characters that the vertical projection method segmented abnormally.
For example, as shown in Fig. 2, the difference between the first spacing and the second spacing is 1, which is less than the preset spacing threshold 6, and the ratio of the spacing mean to the height mean is 34:32, which is within the preset ratio range [27.2:32, 40.8:32]; the spacing mean 34 is therefore set as the standard distance value of the present embodiment.
Here, the smaller the difference between the first spacing and the second spacing, the lower the probability that these three characters are abnormal. The present embodiment takes 1/5 of the average height of the three characters as the empirical spacing difference threshold, that is, ((33+32+32)/3)/5 ≈ 6.47, taken as 6.
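Steps S21 to S28 can be sketched as a scan over consecutive triples of characters. This is an illustrative sketch, not the implementation of the patent: the function name find_standard_distance and the absolute center positions 100, 135 and 169 are hypothetical (only the spacings 35 and 34 and the heights come from the Fig. 2 example), and the sketch keeps the unrounded spacing mean 34.5 where the embodiment uses 34.

```python
from typing import List, Optional, Tuple

def find_standard_distance(
    centers: List[float],
    heights: List[float],
    ratio_range: Tuple[float, float],
) -> Optional[float]:
    """Scan consecutive triples and return the spacing mean of the first triple
    that passes both checks of step S28, or None if no triple passes."""
    low, high = ratio_range
    for i in range(1, len(centers) - 1):
        first_spacing = centers[i] - centers[i - 1]     # to the left neighbor
        second_spacing = centers[i + 1] - centers[i]    # to the right neighbor
        height_mean = (heights[i - 1] + heights[i] + heights[i + 1]) / 3
        spacing_mean = (first_spacing + second_spacing) / 2
        spacing_threshold = height_mean / 5             # empirical 1/5 rule
        similar = abs(first_spacing - second_spacing) < spacing_threshold
        in_range = low <= spacing_mean / height_mean <= high
        if similar and in_range:
            return spacing_mean
    return None

# Spacings 35 and 34 and heights 32, 33, 32 as in the Fig. 2 example.
centers = [100.0, 135.0, 169.0]
heights = [32.0, 33.0, 32.0]
ratio_range = (27.2 / 32, 40.8 / 32)
standard = find_standard_distance(centers, heights, ratio_range)
print(standard)
```

Returning None when no triple qualifies leaves the caller free to fall back on a fixed value for documents with a known layout.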
Here the preset ratio range is taken as the interval [standard ratio × 80%, standard ratio × 120%]. In many scenarios the standard ratio is fixed: for example, the characters on identity cards, driver's licenses and vehicle licenses follow typesetting specifications, so the standard ratio is fixed as well. General documents have no fixed rule, and the value must be calculated: build histograms of the adjacent character spacing and of the character height, find the best center spacing and the best height, and take the ratio of these two parameters as the standard ratio. Taking the calculation of the standard height as an example: in a histogram whose abscissa is the character height and whose ordinate is the number of characters, find the region where the distribution is densest; the best height generally falls in this region, and the center point of the region is taken as the standard height value. The specific steps are:
Loop the abscissa from 1 to the maximum character height max, and at each abscissa accumulate the character counts over a fixed step (for example 5): when the abscissa is Xn, the accumulated count SUMn is the sum of the ordinate values in the region Xn-2 to Xn+2. The abscissa corresponding to the maximum among the accumulated counts SUM1 to SUMmax is taken as the best character height.
For example, for Fig. 2 the best center point spacing is calculated as 34 and the best height as 32, so the standard ratio is 34:32. With preset ratio range = [standard ratio × 80%, standard ratio × 120%], the preset ratio range is determined to be [27.2:32, 40.8:32].
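The windowed histogram search described above can be sketched as follows. The function name densest_value and both sample lists are hypothetical; they are chosen only so that the result reproduces the best height 32 and best spacing 34 of the example. How ties between equally dense windows are resolved is not stated in the source, so keeping the smallest value is an assumption of this sketch.

```python
from collections import Counter
from typing import Iterable

def densest_value(values: Iterable[int], step: int = 5) -> int:
    """Return the value whose window of `step` neighboring values holds the most samples."""
    counts = Counter(values)
    half = step // 2
    best_value, best_sum = 0, -1
    for x in range(1, max(counts) + 1):
        # SUMn: total count over the region x-half .. x+half
        window_sum = sum(counts.get(v, 0) for v in range(x - half, x + half + 1))
        if window_sum > best_sum:       # on ties the smallest value is kept
            best_value, best_sum = x, window_sum
    return best_value

heights = [32, 33, 32, 31, 32, 34, 30, 60, 15]    # hypothetical measured heights
spacings = [35, 34, 34, 33, 36, 32, 53, 26, 16]   # hypothetical measured spacings
best_height = densest_value(heights)
best_spacing = densest_value(spacings)
standard_ratio = best_spacing / best_height
low, high = standard_ratio * 0.8, standard_ratio * 1.2
print(best_height, best_spacing, low, high)
```

The resulting bounds correspond to 27.2:32 and 40.8:32, the preset ratio range of the example.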
S3. Split or merge characters in the second character string image according to the standard distance value. Specifically:
S31. If the spacing between the center points of two adjacent characters in the second character string image is greater than the standard distance value, then:
split one character of the two adjacent characters into two characters.
For example, the spacing between the center point of the character "being one" in Fig. 2 and the center point of its left neighbor "net" is 53, which is greater than the standard distance value 34, and 53-34=19 is greater than the spacing threshold 6. Therefore, the character "being one" or the character "net" in the second character string image may be adhered. The present embodiment first performs the split operation on "being one".
Splitting one character of the two adjacent characters into two characters specifically comprises:
S311. Obtain the abscissa of the center point of the character as the center point abscissa.
For example, the center point abscissa of the character "being one" is 50.
S312. Preset a number of pixels.
Here, the preset number of pixels is 2.
S313. Perform vertical projection on the region of the second character string image whose abscissa lies in the range (x-a, x+a) to obtain a vertical projection histogram, where x is the center point abscissa and a is the number of pixels; the abscissa of the vertical projection histogram represents the abscissa of a pixel in the second character string image, and the ordinate of the vertical projection histogram represents the pixel count.
Here, a is generally an empirical value, such as 2, so that the best split point is sought within a small region.
For example, vertical projection is performed on the region of the second character string image whose abscissa lies in the range (48, 52).
S314. Obtain the abscissa value with the smallest pixel count in the vertical projection histogram as the breakpoint coordinate.
S315. Split the character at the breakpoint coordinate.
Here, when one character needs to be split into two, projecting the entire character and splitting at the abscissa with the smallest pixel count in the vertical projection histogram of the entire character is very likely to cause mis-segmentation. Because the spacing between adjacent characters in the character string images addressed by the present invention is roughly the same, the width of each character is also roughly the same. Therefore, the present invention performs vertical projection only on the central region of the character to be split, which helps improve the accuracy of character segmentation.
For example, the abscissa with the smallest pixel count is 50, that is, the center point of the character "being one" has the fewest black pixels. The character "being one" is split into "Yes" and "one" with abscissa 50 as the breakpoint.
After the character "being one" in Fig. 2 is split into the two characters "Yes" and "one", the spacing between the character "Yes" and the character "net" meets the requirement. The character "net" therefore has no adhesion or mis-segmentation, and no operation on "net" is needed.
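Steps S311 to S315 can be sketched as a search for the column with the fewest black pixels near the center of the character. The source writes the search range as (x-a, x+a) without saying whether the end columns are included; this sketch assumes they are, and clamps the range to the image. The function names find_breakpoint and split_span and the sample glyph are hypothetical.

```python
from typing import List, Tuple

def find_breakpoint(image: List[List[int]], center_x: int, a: int = 2) -> int:
    """Return the column in center_x - a .. center_x + a (inclusive, clamped to
    the image) that contains the fewest black pixels."""
    width = len(image[0])
    first = max(0, center_x - a)
    last = min(width - 1, center_x + a)
    counts = {x: sum(row[x] for row in image) for x in range(first, last + 1)}
    # min() returns the leftmost column when several share the minimum
    return min(counts, key=counts.get)

def split_span(span: Tuple[int, int], breakpoint_x: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Cut a (start, end) column span, end exclusive, at the breakpoint column."""
    start, end = span
    return (start, breakpoint_x), (breakpoint_x, end)

# Hypothetical adhered glyph, 7 columns wide, joined by a thin stroke at column 3.
glyph = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
]
breakpoint_x = find_breakpoint(glyph, center_x=3, a=2)
halves = split_span((0, 7), breakpoint_x)
print(breakpoint_x, halves)
```

Restricting the search to a few columns around the center keeps a thin stroke near the edge of the character from being chosen as the split point.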
S32. If the spacing between the center points of two adjacent characters in the second character string image is less than the standard distance value, then: take one character of the two adjacent characters as the first character; take a character adjacent to the first character as the second character; merge the first character and the second character.
For example, as shown in Fig. 2, the spacing between the character "stone" and its left neighbor "base" is 26, which is less than the standard distance value 34, and 34-26=8 is greater than the spacing threshold 6. The spacing between the character "stone" and its right neighbor "out" is 16, which is less than the standard distance value 34, and 34-16=18 is greater than the spacing threshold 6.
Therefore, the characters "base", "stone" and "out" may be mis-segmented. The present embodiment first merges the character "stone" with its right neighbor "out" to obtain "plinth". After merging, the spacing between "plinth" and "base" meets the requirement. The character "base" therefore has no adhesion or mis-segmentation, and no operation on "base" is needed.
Here, if the spacing between the center points of two adjacent characters is greater than the standard distance value, one of the two adjacent characters is adhered and must be split; if the spacing between the center points of two adjacent characters is less than the standard distance value, one of the two adjacent characters is incomplete and must be merged.
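The decision rule of step S3 can be sketched as a pass over adjacent center points. The claims compare the spacing directly with the standard distance value, while the worked examples also require the deviation to exceed the spacing threshold 6; this sketch follows the worked examples. The function name classify_pairs and the absolute center positions are hypothetical, chosen to give the spacings 34, 53, 26 and 16 used in the examples.

```python
from typing import List

def classify_pairs(centers: List[float], standard: float, threshold: float) -> List[str]:
    """Label each pair of adjacent characters as 'split', 'merge' or 'ok'."""
    labels = []
    for left, right in zip(centers, centers[1:]):
        spacing = right - left
        if spacing - standard > threshold:
            labels.append("split")   # spacing too large: one character is probably adhered
        elif standard - spacing > threshold:
            labels.append("merge")   # spacing too small: one character is probably a fragment
        else:
            labels.append("ok")
    return labels

# Centers giving spacings of 34, 53, 26 and 16 pixels.
centers = [0.0, 34.0, 87.0, 113.0, 129.0]
labels = classify_pairs(centers, standard=34.0, threshold=6.0)
print(labels)
```

The labels only flag suspicious pairs. Which of the two characters is actually split or merged is decided as in the embodiment, by operating on one of them and checking the spacing again.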
The present embodiment provides a character segmentation method and terminal based on vertical projection. In the character string images addressed by the present invention, the spacing between adjacent characters is the same, for example the ID card number on an identity card. The standard distance value provided by the present invention is the standard value of the spacing between two adjacent characters in the character string image. For example, in the ID card number region of an ID card image, the spacing between every two digits fluctuates around 5 pixels. After performing primary segmentation on the first character string image with the vertical projection method, the present invention analyzes in turn the relationship between the spacing of every two adjacent characters and the standard distance value, and can thereby judge whether the result of the vertical projection segmentation contains adhesion or mis-segmentation. For example, suppose preliminary character segmentation of an ID card image with the vertical projection method yields 11 characters. If the spacing between the third and fourth characters is 10 pixels, which is greater than the standard distance value, the third or fourth character may be adhered and needs further splitting. If the spacing between the third and fourth characters is 2 pixels, which is less than the standard distance value, the third or fourth character may be mis-segmented, that is, it is incomplete, only half a character, and must be merged with an adjacent character to achieve correct segmentation. By correcting the segmentation result of the vertical projection method with the standard value of the spacing between two adjacent characters, the present invention improves the accuracy of segmenting character string images.
Embodiment two of the present invention is as follows:
As shown in Fig. 3, the present embodiment provides a character segmentation terminal based on upright projection, comprising one or more processors 1 and a memory 2. The memory 2 stores a program configured to be executed by the one or more processors 1 to perform the following steps:
S1. Segment the characters in the first character string picture using the vertical projection method to obtain the second character string picture.
Here, an existing vertical projection method can be used to segment the first character string picture and obtain the second character string picture. Specifically, vertical projection is performed on the first character string picture to obtain a vertical projection histogram. The histogram records the distribution of black pixels, i.e. the number of black pixels in each column of the first character string picture. If some column contains zero black pixels, there is no stroke in that column, and it is very likely the boundary between two characters. The present embodiment preliminarily segments the character string image on this principle. For example, Fig. 2 is a schematic diagram of the segmentation result (the second character string picture) obtained by segmenting the original character string image (the first character string picture) with the vertical projection method.
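As an illustration only, the principle described above can be sketched in Python as follows. The function names vertical_projection and segment_columns are hypothetical and do not come from the original disclosure, and the image is assumed to be a binary matrix in which 1 denotes a black pixel and 0 denotes background.

```python
def vertical_projection(image):
    """Count the black pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def segment_columns(projection):
    """Return (start, end) column spans, end exclusive, of runs that contain ink.

    A column whose black pixel count is zero is treated as a boundary
    between two characters.
    """
    spans = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                      # a character begins here
        elif count == 0 and start is not None:
            spans.append((start, x))       # an empty column closes the character
            start = None
    if start is not None:                  # character touching the right edge
        spans.append((start, len(projection)))
    return spans


# Toy 3 x 8 image (assumed data): three ink runs separated by empty columns.
image = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
projection = vertical_projection(image)
spans = segment_columns(projection)
print(projection)  # [3, 2, 0, 0, 3, 0, 2, 3]
print(spans)       # [(0, 2), (4, 5), (6, 8)]
```

Because the rule relies only on empty columns, two touching characters come out as one span and a character with an internal gap comes out as two spans, which is the situation that steps S2 and S3 are meant to correct.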
S2. Obtain the gauged distance value corresponding to the second character string picture; the gauged distance value is the standard value of the spacing between two adjacent characters in the second character string picture. Specifically:
S21. Obtain a character in the second character string picture as the third character.
For example, as shown in Fig. 2, "number" in the second character string picture is chosen as the third character.
S22. Obtain a character adjacent to the third character as the fourth character; the fourth character is located on the left side of the third character.
For example, as shown in Fig. 2, the character adjacent to the third character "number" and located on its left side is "big", i.e. the fourth character is "big".
S23. Obtain a character adjacent to the third character as the fifth character; the fifth character is located on the right side of the third character.
For example, as shown in Fig. 2, the character adjacent to the third character "number" and located on its right side is "according to", i.e. the fifth character is "according to".
S24. Calculate the average of the height of the third character, the height of the fourth character and the height of the fifth character to obtain the height mean.
For example, as shown in Fig. 2, the height of the third character "number" is 33, the height of the fourth character "big" is 32, and the height of the fifth character "according to" is 32; the average height of these three characters is 32.33.
S25. Obtain the distance between the central point of the third character and the central point of the fourth character as the first spacing.
Here, connected domain detection is performed on the second character string picture. A connected domain is the set of all points that are connected to one another: connected points form one region, and points that are not connected form different regions. Connected domain detection gives a preliminary position for each character in the second character string picture. The present embodiment takes the central point of a connected domain as the central point of the character.
For example, as shown in Fig. 2, the distance between the central point of the third character "number" and the central point of the fourth character "big" is 35.
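The connected domain detection described above can be illustrated with a small flood-fill sketch. The source does not say which connectivity is used or how the central point of a connected domain is computed, so the 8-connectivity and the bounding-box center used here are assumptions, and the names connected_domains and bounding_box_center are hypothetical.

```python
from collections import deque


def connected_domains(image):
    """Group black pixels (value 1) into 8-connected regions.

    Returns a list of regions, each a list of (x, y) pixel coordinates,
    in the order in which a row-by-row scan first meets them.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    domains = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:                       # breadth-first flood fill
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            domains.append(pixels)
    return domains


def bounding_box_center(pixels):
    """Center of the bounding box of a region (assumed definition of 'central point')."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


# Toy image (assumed data) with two separate blobs.
image = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
]
centers = [bounding_box_center(p) for p in connected_domains(image)]
print(centers)  # [(0.5, 0.5), (4.5, 1.0)]
```

A character made of several disconnected strokes yields several connected domains, so in practice the domains would still have to be grouped per character; the sketch does not attempt that.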
S26. Obtain the distance between the central point of the third character and the central point of the fifth character as the second spacing.
For example, as shown in Fig. 2, the distance between the central point of the third character "number" and the central point of the fifth character "according to" is 34.
S27. Calculate the average of the first spacing and the second spacing to obtain the spacing mean.
For example, as shown in Fig. 2, the spacing mean is 34 pixels (the average of 35 and 34 is 34.5, taken as 34).
S28. If the difference between the first spacing and the second spacing is less than a preset spacing threshold, and the ratio of the spacing mean to the height mean is within a preset ratio range, set the spacing mean as the gauged distance value.
Here, the invention traverses the second character string picture until it finds three consecutive characters that satisfy the following conditions: (1) the spacing between the middle character and its left neighbor is roughly the same as the spacing between it and its right neighbor; (2) the ratio of the character spacing to the mean height of the three characters falls within the preset ratio range. When three consecutive characters satisfy these conditions, the vertical projection method has segmented them correctly, and none of them is adhered or mis-segmented. Using the average character spacing of these three characters as the gauged distance value improves the accuracy of correcting the other characters that the vertical projection method segmented abnormally.
For example, as shown in Fig. 2, the difference between the first spacing and the second spacing is 1, which is less than the preset spacing threshold 6, and the ratio of the spacing mean to the height mean is 34:32, which is within the preset ratio range [27.2:32, 40.8:32]; the spacing mean 34 is therefore set as the gauged distance value of the present embodiment.
Here, the smaller the difference between the first spacing and the second spacing, the lower the probability that the three characters are abnormal. As an empirical value, the present embodiment sets the spacing difference threshold to 1/5 of the average height of the three characters, i.e. ((33+32+32)/3)/5, which is about 6.47 and is taken as 6.
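Steps S21 to S28 can be sketched roughly as below. This is an illustrative reading rather than the patented implementation: the function name find_gauged_distance and its parameters are hypothetical, the central points are reduced to x-coordinates, and the spacing threshold is computed as 1/5 of the height mean without rounding.

```python
def find_gauged_distance(centers, heights, ratio_range, threshold_factor=0.2):
    """Scan for three consecutive characters that look correctly segmented.

    centers: x-coordinates of character central points, left to right.
    heights: character heights, in the same order.
    ratio_range: (low, high) bounds for spacing mean / height mean.
    Returns the spacing mean of the first qualifying triple, or None.
    """
    low, high = ratio_range
    for i in range(1, len(centers) - 1):
        first_spacing = centers[i] - centers[i - 1]    # to the left neighbor
        second_spacing = centers[i + 1] - centers[i]   # to the right neighbor
        height_mean = (heights[i - 1] + heights[i] + heights[i + 1]) / 3
        spacing_mean = (first_spacing + second_spacing) / 2
        spacing_threshold = height_mean * threshold_factor
        if (abs(first_spacing - second_spacing) < spacing_threshold
                and low <= spacing_mean / height_mean <= high):
            return spacing_mean
    return None


# Values modeled on the Fig. 2 example: spacings 35 and 34, heights 32, 33, 32.
# The absolute x-coordinates are made up so that the spacings come out right.
ratio_range = (27.2 / 32, 40.8 / 32)
gauged = find_gauged_distance([10, 45, 79], [32, 33, 32], ratio_range)
print(gauged)  # 34.5

# A triple with spacings 26 and 16 differs by 10, above the threshold, so it is rejected.
rejected = find_gauged_distance([10, 36, 52], [32, 32, 32], ratio_range)
print(rejected)  # None
```

The sketch returns the unrounded mean 34.5, whereas the embodiment quotes 34; how the value is rounded is not stated in the source.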
The preset ratio range here is the interval [standard ratio * 80%, standard ratio * 120%]. In many scenarios the standard (typical) ratio is fixed: for example, the characters on identity cards, driver's licenses and vehicle licenses follow typesetting specifications, so the standard ratio is fixed as well. General documents have no such fixed rule, so the value must be calculated: build histograms of the adjacent character spacing and of the character height, find the best central point spacing and the best height, and take the ratio of these two parameters as the standard ratio. Taking the standard height as an example: in a histogram whose abscissa is the character height and whose ordinate is the number of characters, find the region where the distribution is densest. The best height generally falls in this region, and the central point of the region is taken as the standard height value. The specific steps are:
Loop the abscissa from 1 to the maximum character height max. At each abscissa, sum the character counts over a fixed step (for example 5): when the abscissa is Xn, the accumulated value SUMn is the sum of the ordinate values over the region Xn-2 ~ Xn+2. Among the accumulated values SUM1 ~ SUMmax, the abscissa corresponding to the maximum value is taken as the most suitable character height.
For example, in Fig. 2 the best central point spacing is calculated to be 34 and the best height 32, so the standard (typical) ratio is 34:32. According to preset ratio range = [standard ratio * 80%, standard ratio * 120%], the preset ratio range is finally determined to be [27.2:32, 40.8:32].
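The windowed accumulation described above can be sketched as follows. The names densest_value and preset_ratio_range are hypothetical, the sample heights and spacings are invented for illustration, and keeping the first maximum when several windows tie is an assumption, since the source does not say how ties are resolved.

```python
from collections import Counter


def densest_value(values, step=5):
    """Return the abscissa whose window of `step` bins contains the most samples.

    For abscissa x the window covers x - step // 2 through x + step // 2,
    matching the Xn-2 ~ Xn+2 region used for a step of 5.
    """
    counts = Counter(values)           # histogram: value -> number of characters
    half = step // 2
    best_x, best_sum = None, -1
    for x in range(1, max(values) + 1):
        window_sum = sum(counts[k] for k in range(x - half, x + half + 1))
        if window_sum > best_sum:      # strict '>' keeps the first maximum on ties
            best_x, best_sum = x, window_sum
    return best_x


def preset_ratio_range(best_spacing, best_height, tolerance=0.2):
    """[ratio * 80%, ratio * 120%] for ratio = best spacing / best height."""
    ratio = best_spacing / best_height
    return ratio * (1 - tolerance), ratio * (1 + tolerance)


# Invented measurements: most heights cluster near 32 and most spacings near 34,
# with a few outliers from adhered or mis-segmented characters.
heights = [30, 31, 32, 32, 32, 33, 34, 60, 16]
spacings = [34, 35, 34, 33, 34, 36, 32, 53, 26, 16]

best_height = densest_value(heights)
best_spacing = densest_value(spacings)
low, high = preset_ratio_range(best_spacing, best_height)
print(best_height, best_spacing)         # 32 34
print(round(low, 4), round(high, 4))     # 0.85 1.275
```

Expressed over a common denominator of 32, the bounds 0.85 and 1.275 correspond to the range [27.2:32, 40.8:32] quoted in the example.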
S3. Split or merge characters in the second character string picture according to the gauged distance value. Specifically:
S31. If, in the second character string picture, the spacing between the central points of two adjacent characters is greater than the gauged distance value, then:
split a character in the two adjacent characters into two characters.
For example, in Fig. 2 the spacing between the central point of character "being one" and the central point of its left adjacent character "net" is 53, which is greater than the gauged distance value 34, and 53-34=19 is greater than the spacing threshold 6. Therefore character "being one" or character "net" in the second character string picture may be adhered. The present embodiment first splits "being one".
Here, splitting a character in the two adjacent characters into two characters is specifically:
S311. Obtain the abscissa of the central point of the character as the central point abscissa.
For example, the central point abscissa of character "being one" is 50.
S312. Preset a number of pixels.
Here, the preset number of pixels is 2.
S313. Perform vertical projection on the region of the second character string picture whose abscissa lies in (x-a, x+a) to obtain a vertical projection histogram; where x is the central point abscissa and a is the number of pixels; the abscissa of the vertical projection histogram is the abscissa of a pixel in the second character string picture; the ordinate of the vertical projection histogram is the number of pixels.
Here, a is generally an empirical value, such as 2, so that the optimal split point is searched for within a small region.
For example, vertical projection is performed on the region of the second character string picture whose abscissa lies in (48, 52).
S314. In the vertical projection histogram, obtain the abscissa with the smallest number of pixels as the breakpoint coordinate.
S315. Split the character according to the breakpoint coordinate.
Here, when one character needs to be split into two, projecting the whole character and splitting at the abscissa with the smallest pixel count in its vertical projection histogram is very likely to cause mis-segmentation. Because the spacing between adjacent characters in the character string images addressed by the invention is roughly the same, the character widths are also roughly the same. The invention therefore performs vertical projection only on the central region of the character to be split, which helps improve segmentation accuracy.
For example, the abscissa with the smallest number of pixels is 50, i.e. the central point of character "being one" has the fewest black pixels. With abscissa 50 as the breakpoint, character "being one" is split into "Yes" and "one".
After character "being one" in Fig. 2 is split into the two characters "Yes" and "one", the spacing between character "Yes" and character "net" meets the requirement; therefore character "net" is neither adhered nor mis-segmented, and no further operation on "net" is needed.
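Steps S311 to S315 can be illustrated with the sketch below. The names find_breakpoint and split_span are hypothetical. Two details are assumptions because the source leaves them open: the window is scanned from x-a through x+a inclusive, and when several columns share the minimum count the one nearest the central point is chosen.

```python
def find_breakpoint(image, center_x, a=2):
    """Return the column near center_x with the fewest black pixels.

    Only columns center_x - a through center_x + a are projected, so the
    search stays in the central region of the character to be split.
    """
    width = len(image[0])
    candidates = [x for x in range(center_x - a, center_x + a + 1)
                  if 0 <= x < width]

    def black_count(x):
        return sum(row[x] for row in image)

    # Fewest black pixels first; distance to the center breaks ties.
    return min(candidates, key=lambda x: (black_count(x), abs(x - center_x)))


def split_span(span, breakpoint_x):
    """Split a (start, end) column span, end exclusive, at breakpoint_x."""
    start, end = span
    return (start, breakpoint_x), (breakpoint_x, end)


# Toy adhered glyph (assumed data): two blocks joined by a thin bridge in row 1.
image = [
    [1, 1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 1],
]
breakpoint_x = find_breakpoint(image, center_x=4, a=2)
left, right = split_span((0, 9), breakpoint_x)
print(breakpoint_x)  # 4
print(left, right)   # (0, 4) (4, 9)
```

Restricting the search to a few columns around the central point reflects the reasoning in the text: with roughly equal character widths, the true boundary of an adhered pair should lie near the middle of the merged span.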
S32. If, in the second character string picture, the spacing between the central points of two adjacent characters is less than the gauged distance value, then: obtain a character in the two adjacent characters as the first character; obtain a character adjacent to the first character as the second character; merge the first character and the second character.
For example, as shown in Fig. 2, the spacing between character "stone" and its left adjacent character "base" is 26, which is less than the gauged distance value 34, and 34-26=8 is greater than the spacing threshold 6. The spacing between character "stone" and its right adjacent character "out" is 16, which is less than the gauged distance value 34, and 34-16=18 is greater than the spacing threshold 6.
Therefore, character "base", character "stone" and character "out" may have been mis-segmented. The present embodiment first merges character "stone" with the character "out" adjacent on its right side, obtaining "plinth". After merging, the spacing between "plinth" and "base" meets the requirement; therefore, character "base" is neither adhered nor mis-segmented, and no further operation on it is needed.
Here, if the spacing between the central points of two adjacent characters is greater than the gauged distance value, one of the two adjacent characters is adhered (it contains touching characters) and needs to be split; if the spacing is less than the gauged distance value, one of the two adjacent characters is incomplete and needs to be merged.
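The decision rule can be sketched as below, using the numbers from the Fig. 2 examples. The names classify_gap and merge_spans are hypothetical. Steps S31 and S32 only compare the spacing with the gauged distance value, while the worked examples also check that the difference exceeds the spacing threshold 6; the sketch follows the examples, which is an interpretation rather than something the source states as a rule.

```python
def classify_gap(spacing, gauged_distance, spacing_threshold):
    """Decide what a center-to-center spacing implies for two adjacent characters."""
    if spacing - gauged_distance > spacing_threshold:
        return "split"   # one of the two characters is probably adhered
    if gauged_distance - spacing > spacing_threshold:
        return "merge"   # one of the two characters is probably incomplete
    return "ok"          # spacing is close enough to the gauged distance value


def merge_spans(left, right):
    """Merge two (start, end) column spans into one span covering both."""
    return (min(left[0], right[0]), max(left[1], right[1]))


gauged_distance = 34
spacing_threshold = 6

# Spacings quoted in the Fig. 2 examples, plus one normal spacing.
decisions = [classify_gap(s, gauged_distance, spacing_threshold)
             for s in (53, 26, 16, 35)]
print(decisions)  # ['split', 'merge', 'merge', 'ok']

# Merging two hypothetical column spans of a mis-segmented character.
merged = merge_spans((100, 112), (116, 130))
print(merged)     # (100, 130)
```

After a split or a merge, the spacings to the neighboring characters would be recomputed, which is how the examples conclude that the untouched neighbor needs no further operation.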
The present embodiment provides a character segmentation method and terminal based on upright projection. In the character string images addressed by the present invention, the spacing between adjacent characters is the same, for example the ID number on an identity card. The gauged distance value provided by the invention is the standard value of the spacing between two adjacent characters in the character string image. For example, in the ID number region of an identity card image, the spacing between every two digits fluctuates around 5 pixels. After preliminarily segmenting the first character string picture with the vertical projection method, the invention analyzes in turn how the spacing between every two adjacent characters compares with the gauged distance value, and can thereby judge whether the vertical projection result contains adhesion or mis-segmentation. For example, suppose preliminary segmentation of an identity card image by vertical projection yields 11 characters. If the spacing between the third and fourth characters is 10 pixels, which is greater than the gauged distance value, the third or fourth character may be adhered and needs further splitting. If the spacing is 2 pixels, which is less than the gauged distance value, the third or fourth character may be mis-segmented, i.e. it is incomplete, only half a character, and must be merged with an adjacent character so that the characters are segmented correctly. The invention uses the standard value of the spacing between two adjacent characters to correct the vertical projection result, which can improve the accuracy of segmenting character string images. In the character string images to which the invention applies, the spacing between two adjacent characters is the same.
The above is only an embodiment of the present invention and does not limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the description and drawings of the invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the invention.

Claims (6)

1. A character segmentation method based on upright projection, characterized by comprising:
segmenting the characters in a first character string picture using the vertical projection method to obtain a second character string picture;
obtaining a gauged distance value corresponding to the second character string picture, the gauged distance value being the standard value of the spacing between two adjacent characters in the second character string picture;
if, in the second character string picture, the spacing between the central points of two adjacent characters is greater than the gauged distance value, splitting a character in the two adjacent characters into two characters;
if, in the second character string picture, the spacing between the central points of two adjacent characters is less than the gauged distance value, obtaining a character in the two adjacent characters as a first character; obtaining a character adjacent to the first character as a second character; and merging the first character and the second character.
2. The character segmentation method based on upright projection according to claim 1, characterized in that splitting a character in the two adjacent characters into two characters is specifically:
obtaining the abscissa of the central point of the character as a central point abscissa;
presetting a number of pixels;
performing vertical projection on the region of the second character string picture whose abscissa lies in (x-a, x+a) to obtain a vertical projection histogram, where x is the central point abscissa and a is the number of pixels; the abscissa of the vertical projection histogram is the abscissa of a pixel in the second character string picture; the ordinate of the vertical projection histogram is the number of pixels;
obtaining, in the vertical projection histogram, the abscissa with the smallest number of pixels as a breakpoint coordinate;
splitting the character according to the breakpoint coordinate.
3. The character segmentation method based on upright projection according to claim 1, characterized in that obtaining the gauged distance value between two adjacent characters in the second character string picture is specifically:
obtaining a character in the second character string picture as a third character;
obtaining a character adjacent to the third character as a fourth character, the fourth character being located on the left side of the third character;
obtaining a character adjacent to the third character as a fifth character, the fifth character being located on the right side of the third character;
calculating the average of the height of the third character, the height of the fourth character and the height of the fifth character to obtain a height mean;
obtaining the distance between the central point of the third character and the central point of the fourth character as a first spacing;
obtaining the distance between the central point of the third character and the central point of the fifth character as a second spacing;
calculating the average of the first spacing and the second spacing to obtain a spacing mean;
if the difference between the first spacing and the second spacing is less than a preset spacing threshold, and the ratio of the spacing mean to the height mean is within a preset ratio range, setting the spacing mean as the gauged distance value.
4. A character segmentation terminal, characterized by comprising one or more processors and a memory, the memory storing a program configured to be executed by the one or more processors to perform the following steps:
segmenting the characters in a first character string picture using the vertical projection method to obtain a second character string picture;
obtaining the gauged distance value between two adjacent characters in the second character string picture;
if, in the second character string picture, the spacing between the central points of two adjacent characters is greater than the gauged distance value, splitting a character in the two adjacent characters into two characters;
if, in the second character string picture, the spacing between the central points of two adjacent characters is less than the gauged distance value, obtaining a character in the two adjacent characters as a first character; obtaining a character adjacent to the first character as a second character; and merging the first character and the second character.
5. The character segmentation terminal according to claim 4, characterized in that splitting a character in the two adjacent characters into two characters is specifically:
obtaining the abscissa of the central point of the character as a central point abscissa;
presetting a number of pixels;
performing vertical projection on the region of the second character string picture whose abscissa lies in (x-a, x+a) to obtain a vertical projection histogram, where x is the central point abscissa and a is the number of pixels; the abscissa of the vertical projection histogram is the abscissa of a pixel in the second character string picture; the ordinate of the vertical projection histogram is the number of pixels;
obtaining, in the vertical projection histogram, the abscissa with the smallest number of pixels as a breakpoint coordinate;
splitting the character according to the breakpoint coordinate.
6. The character segmentation terminal according to claim 4, characterized in that obtaining the gauged distance value between two adjacent characters in the second character string picture is specifically:
obtaining a character in the second character string picture as a third character;
obtaining a character adjacent to the third character as a fourth character, the fourth character being located on the left side of the third character;
obtaining a character adjacent to the third character as a fifth character, the fifth character being located on the right side of the third character;
calculating the average of the height of the third character, the height of the fourth character and the height of the fifth character to obtain a height mean;
obtaining the distance between the central point of the third character and the central point of the fourth character as a first spacing;
obtaining the distance between the central point of the third character and the central point of the fifth character as a second spacing;
calculating the average of the first spacing and the second spacing to obtain a spacing mean;
if the difference between the first spacing and the second spacing is less than a preset spacing threshold, and the ratio of the spacing mean to the height mean is within a preset ratio range, setting the spacing mean as the gauged distance value.

Priority Applications (1)

Application Number: CN201910328657.3A
Priority Date: 2019-04-23
Filing Date: 2019-04-23
Title: A kind of character segmentation method and terminal based on upright projection

Publications (1)

Publication Number: CN110059695A (en)
Publication Date: 2019-07-26

Family

ID=67320310

Country Status (1)

Country: CN (1)
Link: CN110059695A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination