CN102156865A - Handwritten text line character segmentation method and identification method - Google Patents


Publication number
CN102156865A
Authority
CN
China
Prior art keywords
character
connected domain
cutting
adhesion
pixel
Legal status
Pending
Application number
CN 201010587738
Other languages
Chinese (zh)
Inventor
镇立新
Current Assignee
Shanghai Hehe Information Technology Development Co Ltd
Original Assignee
Shanghai Hehe Information Technology Development Co Ltd
Application filed by Shanghai Hehe Information Technology Development Co Ltd
Priority to CN 201010587738
Publication of CN102156865A

Landscapes

  • Character Input (AREA)

Abstract

The invention discloses a handwritten text line character segmentation method and a recognition method that segment text lines accurately. Three situations common in naturally written Chinese text are handled separately: naturally separated characters, overlapping characters and touching characters. Naturally separated characters are segmented by histogram projection; overlapping characters are segmented by placing isolating points in the overlapping region; touching characters are segmented by first detecting the touching points, then splitting the characters at those points, and finally segmenting them as overlapping characters. The methods segment text lines quickly while maintaining high segmentation accuracy.

Description

Handwritten text line character segmentation method and recognition method
Technical field
The invention belongs to the fields of image processing and handwriting recognition. It relates to a character segmentation method, in particular to a handwritten text line character segmentation method, and also to a handwritten text line character recognition method.
Background
Natural communication is an important direction for future human-computer interaction, and written text is one of the means of natural communication between humans and computers. A user's handwriting, written in the user's natural style, is captured by some means (for example a touch screen, scanner or camera), and the handwritten character image is then analyzed and processed so that the computer recognizes the text automatically; this is what is called text recognition.
Single-character recognition is relatively mature, but recognition of unconstrained, full-screen handwritten text lines remains an open problem. In single-character recognition, the user submits only one character image at a time to the electronic device for recognition. In text line recognition, the user writes a line of text in a natural style on a writing medium (a text line), and the whole line is submitted to the device for recognition. Clearly, text line input is more efficient than single-character input. An unconstrained text line is one written naturally: no restrictions are placed on the writer, who writes in an everyday style without needing to write neatly, avoid joined strokes, or keep the text on a horizontal line. Natural text line recognition therefore offers a more natural way to enter text into a computer. On the other hand, in natural writing the handwriting may be sloppy, neighboring characters frequently overlap, touch or intersect, and the text may not lie on a straight line. This makes automatic recognition harder, and the greatest difficulty is automatically segmenting the individual characters of a text line so that single-character recognition can then be applied.
Text line recognition builds on single-character recognition: the boundary of each character in the line is determined first, each character is then recognized, and the whole line is recognized on that basis. Under natural writing, however, segmenting every character of a text line accurately is very difficult. Existing character segmentation algorithms are either not accurate enough or too slow to meet real-time requirements. In unconstrained, naturally written offline Chinese text lines, the boundaries between characters are indistinct and overlapping and touching occur frequently (see Fig. 1), which makes accurate segmentation very difficult. Previous pre-segmentation methods for Chinese text lines mainly include thinning-based methods and stroke-extraction-based methods. The former achieve higher segmentation accuracy but are very time-consuming; the latter are simpler to implement but segment less effectively. None of these methods handles indistinct boundaries, overlapping or touching between characters well.
Summary of the invention
The technical problem to be solved by the invention is to provide a handwritten text line character segmentation method that segments text lines accurately.
In addition, the invention provides a handwritten text line character recognition method that recognizes text lines accurately.
To solve the above technical problems, the invention adopts the following technical solutions:
A handwritten text line character segmentation method, comprising: inputting a text line; segmenting the naturally separated characters in the input by histogram projection; segmenting the overlapping characters in the input by placing isolating points in the overlapping region; and, for the touching characters in the input, first detecting the touching points, then splitting the characters at those points, and then segmenting them as overlapping characters.
A handwritten text line character segmentation method, comprising the following steps:
Step 100: input a text line;
Step 200: segment the naturally separated characters;
Step 300: for the result of step 200, estimate each character width and judge whether it exceeds a predetermined threshold T; if so, go to step 400; otherwise, go to step 700;
Step 400: segment overlapping characters;
Step 500: for the result of step 400, estimate each character width and judge whether it exceeds the predetermined threshold T; if so, go to step 600; otherwise, go to step 700;
Step 600: segment touching characters;
Step 700: output the segmentation result;
Step 800: end.
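The control flow of steps 100 to 800 can be sketched in Python as a three-stage cascade. This is a minimal sketch, not the patent's implementation: the three stage functions, `width_of` and `threshold` are hypothetical callbacks supplied by the caller. Following the description, only segments wider than T are passed to the next stage, and the cascade jumps to output as soon as no segment exceeds T.

```python
from typing import Any, Callable, List

Segment = Any  # hypothetical: whatever the stage functions operate on
Stage = Callable[[Segment], List[Segment]]


def segment_line(line: Segment,
                 split_natural: Stage,
                 split_overlapping: Stage,
                 split_touching: Stage,
                 width_of: Callable[[Segment], float],
                 threshold: Callable[[List[float]], float]) -> List[Segment]:
    """Cascade of steps 100-800: each later stage only sees over-wide segments."""
    segments = split_natural(line)                              # step 200
    for stage in (split_overlapping, split_touching):           # steps 400, 600
        t = threshold([width_of(s) for s in segments])          # steps 300, 500
        if all(width_of(s) <= t for s in segments):
            break                                               # go to step 700
        refined: List[Segment] = []
        for s in segments:
            # a segment wider than T is assumed to hold two or more characters
            refined.extend(stage(s) if width_of(s) > t else [s])
        segments = refined
    return segments                                             # step 700
```

Passing the stages as callbacks keeps the cascade logic separate from the image operations, so each stage can be developed and tested on its own.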
In a preferred embodiment, step 200 segments the naturally separated characters as follows:
Compute the histogram projection of the given text line and find the zero-valued regions of the projection curve; these determine the gaps between naturally separated characters.
A vertical line through the middle of each gap, equidistant from the characters on its left and right, is taken as the segmentation path of the naturally separated characters.
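A minimal Python sketch of step 200, assuming the text line is a binary image given as a list of rows with 1 for ink and 0 for background (this representation and the function names are illustrative, not from the patent). Zero-valued runs that touch the left or right edge are treated as margins and produce no cut, which is also an assumption.

```python
from typing import List


def vertical_projection(img: List[List[int]]) -> List[int]:
    """Number of ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*img)]


def natural_cuts(img: List[List[int]]) -> List[int]:
    """Columns of vertical cut lines through interior zero-valued gaps (step 200)."""
    proj = vertical_projection(img)
    n = len(proj)
    cuts = []
    x = 0
    while x < n:
        if proj[x]:
            x += 1
            continue
        start = x
        while x < n and proj[x] == 0:
            x += 1
        # [start, x) is a zero run; runs touching either edge are margins, not gaps
        if start > 0 and x < n:
            cuts.append((start + x - 1) // 2)  # middle column of the gap
    return cuts
```

For example, `natural_cuts([[1, 1, 0, 0, 0, 1, 1], [0, 1, 0, 0, 0, 1, 0]])` returns `[3]`: the gap spans columns 2 to 4, and column 3 is two columns from the ink on either side.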
In a preferred embodiment, step 300 estimates the character width as follows:
Step 310: estimate the mean character width Mean_Wid by the following formula:
$$\mathrm{Mean\_Wid} = \frac{1}{N_{Med}} \sum_{i} w_i \cdot \mathrm{bool}\left(0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}\right)$$
where $w_i$ ($i = 1, 2, \ldots$) is the width of any character in the segmentation result; $W_{Med}$ is the median of all character widths in the segmentation result, i.e. the value in the middle position when the widths are sorted in ascending order; $\mathrm{bool}(\cdot)$ is a Boolean function equal to 1 when the condition in parentheses holds and 0 otherwise; and $N_{Med}$ is the number of characters satisfying $\{w_i \mid w_i \ge 0.8\,W_{Med} \;\&\&\; w_i \le 1.6\,W_{Med}\}$.
Step 320: compute the threshold T by the following formula:
$$T = 1.2 \times \mathrm{Mean\_Wid}$$
where Mean_Wid is determined in step 310.
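The width estimate of steps 310 and 320 can be sketched in Python as follows. The function names are hypothetical. The median is taken with `statistics.median_low`, which returns the lower of the two middle values when the count is even; this is an assumption, since the patent only speaks of the value in the middle position. Because that median is itself one of the widths, the filtered set is never empty for non-negative widths.

```python
from statistics import median_low
from typing import List


def mean_width(widths: List[float]) -> float:
    """Mean_Wid: mean of the widths within [0.8, 1.6] x the median width (step 310)."""
    w_med = median_low(widths)
    kept = [w for w in widths if 0.8 * w_med <= w <= 1.6 * w_med]
    return sum(kept) / len(kept)  # len(kept) is N_Med


def width_threshold(widths: List[float]) -> float:
    """T = 1.2 * Mean_Wid (step 320)."""
    return 1.2 * mean_width(widths)
```

With widths `[10, 12, 11, 30, 2]` the median is 11, the outliers 30 and 2 are excluded, Mean_Wid is 11 and T is about 13.2, so only the 30-pixel segment would go on to the next segmentation stage. Filtering around the median keeps merged or fragmented segments from skewing the estimate.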
In a preferred embodiment, step 400 segments overlapping characters as follows:
Step 410: perform connected-component analysis on the overlapping characters and establish the ownership relation between the new connected components and the characters. Ownership is obtained from pixel intersections: because the overlapping characters do not intersect each other, each new connected component in the overlapping region can intersect one and only one character. If a new connected component has a non-empty intersection with a character, the component belongs to that character;
Step 420: traverse each column of the overlapping region, scanning it from top to bottom. If two successively encountered connected components belong to different characters, place an isolating point in that column between the two components, equidistant from the components above and below it;
Step 430: to ensure that the isolating points are placed correctly in the overlapping region, adjust the connected components in the overlapping region according to steps 440 to 470;
Step 440: in the overlapping region, mark the connected components that have pixels in every column, to distinguish them from the small connected components;
Step 450: examine the marked connected components from top to bottom. If two successive marked components belong to the same character, insert a virtual connected component between them: in each column, set one or more first-type pixels, representing character pixels, lying between the two components and separated from the lower component by one or more second-type pixels, so that the virtual pixels avoid intersecting the small connected components as far as possible. If an intersection cannot be avoided, remove the pixel at the intersection position and the pixel above it from the small connected component, so that the small component and the virtual pixel are separated by one or more second-type pixels. Mark this virtual connected component and regard it as belonging to the other character;
Step 460: in the overlapping region, add a virtual first-type pixel to each column of the first row, forming a virtual connected component; mark it and regard it as belonging to a different character from the original topmost marked component. Similarly, add a virtual first-type pixel to each column of the last row, forming a virtual connected component; mark it and regard it as belonging to a different character from the original bottommost marked component;
Step 470: examine each pair of successive marked connected components. If there are small connected components between them that belong to different characters, ensure that the vertical projections of these small components do not overlap: if they have pixels in the same column, remove the pixels of these small components in that column;
Step 480: connect, in order, the isolating points of all columns obtained after the above processing; this forms the part of the segmentation path inside the overlapping region;
Step 490: take the vertical boundaries of the overlapping region as the remaining parts of the segmentation path and join all parts together to form a complete curved segmentation path.
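The ownership rule of step 410 and the column scan of steps 420 and 480 can be sketched in Python. This is a simplified sketch under several assumptions: the overlapping region is a binary grid (1 = ink), each overlapping character is given as a set of (row, column) pixels in the region's coordinates, components are 8-connected, and the adjustments of steps 430 to 470 are omitted, so the sketch takes the first change of owner in each column. Columns with no change of owner yield `None`. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def label_components(ink: List[List[int]]) -> List[List[int]]:
    """8-connected component labelling of a binary grid (0 = background)."""
    h, w = len(ink), len(ink[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if ink[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and ink[ny][nx] and not labels[ny][nx]:
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels


def owners(labels: List[List[int]], chars: List[Set[Tuple[int, int]]]) -> Dict[int, int]:
    """Step 410: a new component belongs to the character whose pixels it intersects."""
    owner: Dict[int, int] = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab and lab not in owner:
                for i, char_pixels in enumerate(chars):
                    if (y, x) in char_pixels:
                        owner[lab] = i
                        break
    return owner


def isolating_points(ink: List[List[int]],
                     chars: List[Set[Tuple[int, int]]]) -> List[Optional[int]]:
    """Steps 420/480: one isolating row per column, midway between owners."""
    labels = label_components(ink)
    owner = owners(labels, chars)
    points: List[Optional[int]] = []
    for x in range(len(ink[0])):
        point, prev_char, prev_bottom = None, None, None
        for y in range(len(ink)):
            lab = labels[y][x]
            if not lab:
                continue
            c = owner[lab]
            if prev_char is not None and c != prev_char:
                point = (prev_bottom + y) // 2  # equidistant from both components
                break
            prev_char, prev_bottom = c, y
        points.append(point)
    return points
```

Joining the returned rows column by column gives the curved part of the path described in step 480.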
In a preferred embodiment, step 600 segments touching characters as follows:
Step 610: estimate the boundary position of each character within the touching characters from their histogram projection;
Step 620: starting from the rough position of the common boundary of the touching characters, find one histogram peak on each side (left and right) whose value is greater than the average peak value of the histogram and which is closest to the common boundary; each such peak is assumed to lie near the center of a character. Between these two peaks, find a histogram valley whose value is less than the average valley value of the histogram and which is closest to the common boundary; this valley is taken as the exact position of the common boundary of the touching characters;
Step 630: apply character thinning to the touching characters. On the thinned image, find the strokes passing through the above valley position; these may contain the touching points. Record all fork points and corner points on these strokes as touching points;
Step 640: split the touching characters at the touching points, turning them into overlapping characters. The split is performed as follows: within the m × m pixel region centered on each touching point, set all image pixels to second-type pixels, which represent background pixels, where m is a preset value;
Step 650: segment the characters using the method for overlapping characters;
Step 660: superimpose the segmentation path produced in step 650 on the image of the original touching characters and segment the image.
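Step 620 can be sketched as a search on the vertical projection of the touching characters. The peak and valley definitions used here (a strict rise followed by a non-rise, and the reverse) and the function name are assumptions, since the patent does not define them; `rough` is the rough boundary column from step 610. The function returns `None` when no qualifying peaks or valley exist.

```python
from typing import List, Optional


def find_touching_boundary(proj: List[int], rough: int) -> Optional[int]:
    """Step 620 sketch: exact common-boundary column of two touching characters."""
    n = len(proj)
    peaks = [i for i in range(1, n - 1) if proj[i - 1] < proj[i] >= proj[i + 1]]
    valleys = [i for i in range(1, n - 1) if proj[i - 1] > proj[i] <= proj[i + 1]]
    if not peaks or not valleys:
        return None
    avg_peak = sum(proj[i] for i in peaks) / len(peaks)
    avg_valley = sum(proj[i] for i in valleys) / len(valleys)
    # strong peaks on each side of the rough boundary (near character centres)
    left = [i for i in peaks if i < rough and proj[i] > avg_peak]
    right = [i for i in peaks if i > rough and proj[i] > avg_peak]
    if not left or not right:
        return None
    lo, hi = max(left), min(right)  # the peaks closest to the rough boundary
    candidates = [i for i in valleys if lo < i < hi and proj[i] < avg_valley]
    if not candidates:
        return None
    return min(candidates, key=lambda i: abs(i - rough))  # deep valley nearest the boundary
```

For the projection `[0, 2, 8, 3, 6, 2, 1, 3, 9, 4, 2, 0]` with a rough boundary at column 5, the qualifying peaks are at columns 2 and 8, and the only valley between them below the average valley value is column 6, which is returned.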
A handwritten text line character recognition method, characterized in that it comprises the following steps:
Step S10: input a text line;
Step S20: segment the naturally separated characters;
Step S30: for the result of step S20, estimate each character width and judge whether it exceeds a predetermined threshold T; if so, go to step S40; otherwise, go to step S70;
Step S40: segment overlapping characters;
Step S50: for the result of step S40, estimate each character width and judge whether it exceeds the predetermined threshold T; if so, go to step S60; otherwise, go to step S70;
Step S60: segment touching characters;
Step S70: output the segmentation result;
Step S80: recognize each character in the output segmentation result and output the recognition result.
Beneficial effects of the invention: by segmenting images of handwritten text lines written naturally, the proposed segmentation and recognition methods effectively handle indistinct boundaries between characters, overlapping and touching, and separate the characters of a text line so that single-character recognition can then be applied.
Description of drawings
Fig. 1 shows characters in unconstrained, naturally handwritten Chinese text lines: Fig. 1(a) naturally separated; Fig. 1(b) overlapping; Fig. 1(c) touching; Fig. 1(d) overlapping and touching; Fig. 1(e) touching and overlapping.
Fig. 2 is a flowchart of the segmentation method of the invention.
Fig. 3 illustrates the segmentation of naturally separated characters: Fig. 3(a) is the histogram projection of the character image; Fig. 3(b) shows the straight segmentation paths.
Fig. 4 shows the segmentation process for overlapping characters: Fig. 4(a) is the overlapping region; Fig. 4(b) shows the new connected components in the overlapping region; Fig. 4(c) shows the isolating point in one column; Fig. 4(d) is the partial segmentation path in the overlapping region; Fig. 4(e) is the complete curved segmentation path.
Fig. 5 illustrates the detection of touching points.
Fig. 6 shows handwritten text lines segmented with the method of the invention.
Embodiment
The preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Embodiment one
The invention proposes a fast segmentation method for unconstrained handwritten text lines that segments text lines accurately. The three situations common in naturally written Chinese text lines, namely naturally separated characters, overlapping characters and touching characters, are handled separately. Naturally separated characters are segmented by histogram projection; overlapping characters are segmented by placing isolating points in the overlapping region; for touching characters, the touching points are detected first, the characters are split at those points, and they are finally segmented as overlapping characters. The method segments text lines quickly while ensuring high segmentation accuracy.
For these three common situations in unconstrained, naturally written Chinese text lines, the invention proposes a character segmentation technique that applies three different strategies to them in turn. After each situation is processed, the mean width of all characters in the text line is estimated, and the threshold T is taken to be proportional to the mean character width. Characters whose width exceeds the threshold are considered to contain two or more characters and are segmented further. Finally, the segmentation results of the three situations are combined into the segmentation result of the text line. The method consists of four main technical parts: segmentation of naturally separated characters, segmentation of overlapping characters, segmentation of touching characters, and estimation of character width.
The invention can be implemented on a smartphone with a camera (for example the HTC/Google Nexus One), whose camera captures images of the user's handwritten text lines, with the processing programs written in C++. It can also be implemented on other electronic devices such as PCs, tablet computers and PDAs, and in other programming languages such as C or Java.
Referring to Fig. 2, the invention discloses a handwritten text line character segmentation method comprising the following steps:
[Step 100] Input the text line.
[Step 200] Segment the naturally separated characters. Specifically: compute the histogram projection of the given text line and find the zero-valued regions of the projection curve, which determine the gaps between naturally separated characters. A vertical line through the middle of each gap, equidistant from the characters on its left and right, is taken as the segmentation path, as shown in Fig. 3.
[Step 300] For the result of step 200, estimate (i.e., make a preliminary calculation of) each character width and judge whether it exceeds a predetermined threshold T. If so, go to step 400; if not, go to step 700.
The character width is estimated as follows:
Step 310: estimate the mean character width Mean_Wid by the following formula:
$$\mathrm{Mean\_Wid} = \frac{1}{N_{Med}} \sum_{i} w_i \cdot \mathrm{bool}\left(0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}\right) \qquad (1)$$
where $w_i$ ($i = 1, 2, \ldots$) is the width of any character in the segmentation result; $W_{Med}$ is the median of all character widths in the segmentation result (the value in the middle position when the widths are sorted in ascending order); $\mathrm{bool}(\cdot)$ is a Boolean function (equal to 1 when the condition in parentheses holds and 0 otherwise); and $N_{Med}$ is the number of characters satisfying $\{w_i \mid w_i \ge 0.8\,W_{Med} \;\&\&\; w_i \le 1.6\,W_{Med}\}$.
Step 320: compute the threshold T by the following formula:
$$T = 1.2 \times \mathrm{Mean\_Wid} \qquad (2)$$
where Mean_Wid on the right-hand side is determined in step 310.
[Step 400] Segment overlapping characters, comprising the following steps:
Step 410: perform connected-component analysis on the overlapping characters and establish the ownership relation between the new connected components and the characters. Ownership is obtained from pixel intersections: because the overlapping characters do not intersect each other, each new connected component in the overlapping region can intersect one and only one character. If a new connected component has a non-empty intersection with a character, the component belongs to that character.
Step 420: traverse each column of the overlapping region, scanning it from top to bottom. If two successively encountered connected components belong to different characters, place an isolating point in that column between the two components, equidistant from the components above and below it;
Step 430: to ensure that the isolating points are placed correctly in the overlapping region, we adjust the connected components in the overlapping region according to steps 440 to 470;
Step 440: in the overlapping region, mark the connected components that have pixels in every column, to distinguish them from the small connected components.
Step 450: examine the marked connected components from top to bottom. If two successive marked components belong to the same character, insert a virtual connected component between them: in each column, set one black pixel (representing a character pixel) lying between the two components and separated from the lower component by only one white pixel. This keeps the virtual pixels from intersecting the small connected components as far as possible; if an intersection cannot be avoided, remove the pixel at the intersection position and the pixel above it from the small connected component, so that the small component and the virtual pixel are also separated by only one white pixel. Mark this virtual connected component and regard it as belonging to the other character.
Step 460: in the overlapping region, add a virtual black pixel to each column of the first row, forming a virtual connected component. Mark it and regard it as belonging to a different character from the original topmost marked component. Similarly, add a virtual black pixel to each column of the last row, forming a virtual connected component. Mark it and regard it as belonging to a different character from the original bottommost marked component.
Step 470: examine each pair of successive marked connected components. If there are small connected components between them that belong to different characters, ensure that the vertical projections of these small components do not overlap: if they have pixels in the same column, remove the pixels of these small components in that column.
Step 480: connect, in order, the isolating points of all columns obtained after the above processing; this forms the part of the segmentation path inside the overlapping region;
Step 490: take the vertical boundaries of the overlapping region as the remaining parts of the segmentation path and join all parts together to form a complete curved segmentation path;
The processing of these steps is illustrated in Fig. 4;
[Step 500] For the result of step 400, estimate each character width and judge whether it exceeds the predetermined threshold T. If so, go to step 600; if not, go to step 700;
[Step 600] Segment touching characters, comprising the following steps:
Step 610: estimate the boundary position of each character within the touching characters from their histogram projection;
Step 620: starting from the rough position of the common boundary of the touching characters, find one histogram peak on each side (left and right) whose value is greater than the average peak value of the histogram and which is closest to the common boundary. Each such peak is assumed to lie near the center of a character. Between these two peaks, find a histogram valley whose value is less than the average valley value of the histogram and which is closest to the common boundary. This valley is taken as the exact position of the common boundary of the touching characters;
Step 630: apply character thinning to the touching characters. On the thinned image, find the strokes passing through the above valley position; these may contain the touching points. Record all fork points and corner points on these strokes as touching points (as shown in Fig. 5);
Step 640: split the touching characters at the touching points, turning them into overlapping characters. The split is performed as follows: within the m × m pixel region centered on each touching point, set all image pixels to white (background) pixels. In this patent, m may be any number from 8 to 15.
Step 650: segment the characters using the method for overlapping characters (see steps 410 to 490);
Step 660: superimpose the segmentation path produced in step 650 on the image of the original touching characters and segment the image.
[Step 700] Output the segmentation result;
[Step 800] End.
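Steps 630 and 640 can be sketched in Python, assuming thinning has already produced a one-pixel-wide binary skeleton (the patent does not specify a thinning algorithm, and none is shown). Fork points are detected with the crossing-number test, a standard technique swapped in here: a skeleton pixel whose 8-neighbourhood, read in circular order, contains three or more background-to-ink transitions is treated as a fork. Corner-point detection is omitted. The `band` parameter, which limits the search to columns near the valley, is hypothetical, and the default `m = 11` is one value from the 8 to 15 range given above.

```python
from typing import List, Tuple

# 8-neighbourhood in circular order, starting above the pixel and going clockwise
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def fork_points(skel: List[List[int]], valley_x: int, band: int = 2) -> List[Tuple[int, int]]:
    """Step 630 sketch: skeleton fork points near the valley column."""
    h, w = len(skel), len(skel[0])
    found = []
    for y in range(1, h - 1):
        for x in range(max(1, valley_x - band), min(w - 1, valley_x + band + 1)):
            if not skel[y][x]:
                continue
            ring = [skel[y + dy][x + dx] for dy, dx in OFFSETS]
            # crossing number: count 0 -> 1 transitions around the ring
            crossings = sum(1 for k in range(8) if not ring[k] and ring[(k + 1) % 8])
            if crossings >= 3:
                found.append((y, x))
    return found


def split_at(img: List[List[int]], points: List[Tuple[int, int]], m: int = 11) -> List[List[int]]:
    """Step 640 sketch: clear an m x m window around each touching point (copy returned)."""
    out = [row[:] for row in img]
    half = m // 2
    for py, px in points:
        for y in range(max(0, py - half), min(len(out), py - half + m)):
            for x in range(max(0, px - half), min(len(out[0]), px - half + m)):
                out[y][x] = 0  # background pixel
    return out
```

On a plus-shaped skeleton, only the centre pixel has four transitions, so it is the single fork reported; clearing a window around it disconnects the four arms, which can then be treated as overlapping components.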
Fig. 6 shows the results of applying the technique of the invention to a passage of unconstrained handwritten text lines, covering the segmentation of naturally separated, overlapping and touching characters. The figure shows that this example segments the unconstrained handwritten text lines well and separates the characters effectively.
In summary, by segmenting images of handwritten text lines written naturally, the proposed handwritten text line character segmentation and recognition methods effectively handle indistinct boundaries between characters, overlapping and touching, and separate the characters of a text line so that single-character recognition can then be applied.
Embodiment two
This embodiment discloses a handwritten text line character recognition method comprising the following steps:
Step S10: input a text line;
Step S20: segment the naturally separated characters;
Step S30: for the result of step S20, estimate each character width and judge whether it exceeds a predetermined threshold T; if so, go to step S40; otherwise, go to step S70;
Step S40: segment overlapping characters;
Step S50: for the result of step S40, estimate each character width and judge whether it exceeds the predetermined threshold T; if so, go to step S60; otherwise, go to step S70;
Step S60: segment touching characters;
Step S70: output the segmentation result;
Step S80: recognize each character in the output segmentation result and output the recognition result.
In step S30, the character width is estimated as follows:
Step 310: estimate the mean character width Mean_Wid by the following formula:
$$\mathrm{Mean\_Wid} = \frac{1}{N_{Med}} \sum_{i} w_i \cdot \mathrm{bool}\left(0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}\right)$$
where $w_i$ ($i = 1, 2, \ldots$) is the width of any character in the segmentation result; $W_{Med}$ is the median of all character widths in the segmentation result, i.e. the value in the middle position when the widths are sorted in ascending order; $\mathrm{bool}(\cdot)$ is a Boolean function equal to 1 when the condition in parentheses holds and 0 otherwise; and $N_{Med}$ is the number of characters satisfying $\{w_i \mid w_i \ge 0.8\,W_{Med} \;\&\&\; w_i \le 1.6\,W_{Med}\}$.
Step 320: compute the threshold T by the following formula:
$$T = 1.2 \times \mathrm{Mean\_Wid}$$
where Mean_Wid is determined in step 310.
In step S40, overlapping characters are segmented as follows:
Step 410: perform connected-component analysis on the overlapping characters and establish which character each new connected component belongs to. Membership is determined by pixel intersection: because the overlapping characters do not intersect one another, each new connected component in the overlapping region can intersect one and only one character. If the intersection of a new connected component with a character is non-empty, the component belongs to that character;
Step 420: traverse each column of the overlapping region, scanning it from top to bottom. Whenever two successive connected components belong to different characters, set an isolation point between them in that column, equidistant from the upper and the lower component;
Step 430: to ensure that the isolation points are set correctly in the overlapping region, adjust the connected components in the overlapping region according to steps 440 to 470;
Step 440: in the overlapping region, mark the connected components that have pixels in every column, to distinguish them from small connected components;
Step 450: examine the marked connected components from top to bottom. If two successive marked components belong to the same character, insert a virtual connected component between them: in each column, set one or more first-type pixels, which represent character pixels. These pixels lie between the two components and are separated from the lower component by one or more second-type (background) pixels, so that the virtual pixels avoid intersecting small connected components as far as possible. If an intersection cannot be avoided, remove the pixels of the small component at the intersection position and directly above it, so that the small component is separated from the virtual pixels by one or more second-type pixels. Mark this virtual connected component and treat it as belonging to the other character;
Step 460: in the overlapping region, add a virtual first-type pixel to every column of the first row, forming a virtual connected component. Mark it and treat it as belonging to a different character from the original topmost marked component. Similarly, add a virtual first-type pixel to every column of the last row, forming a virtual connected component; mark it and treat it as belonging to a different character from the original bottommost marked component;
Step 470: examine each pair of successive marked connected components. If there are small connected components between them that belong to different characters, ensure that the vertical projections of these small components do not overlap; that is, if they have pixels in the same column, erase the pixels of these small components in that column;
Step 480: connect, in order, the isolation points of every column obtained after the above processing; together they form the part of the segmentation path inside the overlapping region;
Step 490: use the vertical boundaries of the overlapping region as the remaining parts of the segmentation path, and join all parts to form one complete curved segmentation path.
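The column scan of step 420 and the joining of step 480 could look roughly like the following Python sketch. It assumes that step 410 has already produced the overlapping region as a 2-D list, where 0 is background and any other value is the id of the character that owns the pixel. This data layout is hypothetical.

The adjustments of steps 440 to 470 are not implemented. Instead, the sketch raises an error for any column that does not end up with exactly one isolation point.

```python
def separation_points(labels: list[list[int]]) -> list[list[float]]:
    """Step 420: isolation-point rows for every column of the overlapping region.

    labels[r][c] is 0 for background, otherwise the id of the owning character.
    """
    n_rows = len(labels)
    n_cols = len(labels[0]) if n_rows else 0
    points: list[list[float]] = []
    for c in range(n_cols):
        col_points: list[float] = []
        prev_char, prev_row = 0, -1  # last ink pixel seen while scanning downwards
        for r in range(n_rows):
            char = labels[r][c]
            if char == 0:
                continue
            if prev_char and char != prev_char:
                # Midway between the upper component's last pixel and the
                # lower component's first pixel, i.e. equidistant from both.
                col_points.append((prev_row + r) / 2)
            prev_char, prev_row = char, r
        points.append(col_points)
    return points


def overlap_path(labels: list[list[int]]) -> list[tuple[int, float]]:
    """Step 480: join the per-column isolation points into one polyline (col, row)."""
    path: list[tuple[int, float]] = []
    for c, pts in enumerate(separation_points(labels)):
        if len(pts) != 1:
            # Steps 440-470 are meant to guarantee one point per column.
            raise ValueError(f"column {c} has {len(pts)} isolation points")
        path.append((c, pts[0]))
    return path
```

For example, consider a column that reads 1, 1, 0, 0, 0, 2, 2 from top to bottom. Its isolation point lands at row 3.0, halfway between row 1 and row 5.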
In step S60, merged characters are segmented as follows:
Step 610: estimate the boundary position of each character within the merged characters from their histogram projection;
Step 620: from the rough position of the common boundary of the merged characters, search to the left and to the right for one histogram peak each, whose value is greater than the average peak value of the histogram and which is closest to the common boundary; each such peak is taken to lie near the center of a character. Between these two peaks, find a histogram valley whose value is less than the average valley value of the histogram and which is closest to the common boundary; this valley is taken as the exact position of the common boundary of the merged characters;
Step 630: thin the merged characters. On the thinned image, find the strokes at the valley position found above, since they may contain merge points; record all fork points and corner points on these strokes as merge points;
Step 640: split the merged characters at the merge points, turning them into overlapping characters. The split is performed as follows: in the m*m pixel region centered on each merge point, set all image pixels to second-type pixels, which represent background pixels, where m is a preset value;
Step 650: segment the characters using the method for overlapping characters;
Step 660: superimpose the segmentation path produced in step 650 on the image of the original merged characters and cut the image along it.
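One possible reading of step 620 in Python is sketched below. It assumes `hist` is the vertical projection histogram of the merged characters and `rough` is the boundary column estimated in step 610.

Two details are assumptions of this sketch, not details given in the text:
- how a local peak and valley are defined;
- the non-strict comparisons (at least the average peak, at most the average valley), chosen so that a lone peak or valley still qualifies.

```python
from statistics import mean
from typing import Optional


def _local_extrema(hist: list[int]) -> tuple[list[int], list[int]]:
    """Indices of interior local peaks and valleys of a projection histogram."""
    peaks, valleys = [], []
    for i in range(1, len(hist) - 1):
        if hist[i - 1] < hist[i] >= hist[i + 1]:
            peaks.append(i)
        elif hist[i - 1] > hist[i] <= hist[i + 1]:
            valleys.append(i)
    return peaks, valleys


def refine_boundary(hist: list[int], rough: int) -> Optional[int]:
    """Step 620: exact common-boundary column near `rough`, or None if not found."""
    peaks, valleys = _local_extrema(hist)
    if not peaks or not valleys:
        return None
    avg_peak = mean(hist[p] for p in peaks)
    avg_valley = mean(hist[v] for v in valleys)
    strong = [p for p in peaks if hist[p] >= avg_peak]
    left = [p for p in strong if p < rough]
    right = [p for p in strong if p > rough]
    if not left or not right:
        return None
    # The qualifying peaks closest to the rough boundary mark the two character centres.
    lo, hi = max(left), min(right)
    candidates = [v for v in valleys if lo < v < hi and hist[v] <= avg_valley]
    if not candidates:
        return None
    return min(candidates, key=lambda v: abs(v - rough))
```

With the histogram 1, 4, 8, 5, 2, 3, 6, 8, 4, 1 and a rough boundary at column 5, the two peaks are at columns 2 and 7. The returned boundary is the valley at column 4.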
The description and application of the invention given here are illustrative and are not intended to limit the scope of the invention to the above embodiments. Variations and modifications of the embodiments disclosed here are possible, and alternatives and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. Those skilled in the art should note that the invention may be implemented in other forms, structures, arrangements, and proportions, and with other components, materials, and parts, without departing from its spirit or essential characteristics. Other variations and modifications of the disclosed embodiments may be made without departing from the scope and spirit of the invention.

Claims (11)

1. A handwritten text line character segmentation method, characterized in that the method comprises the following steps:
--step 100: input a text line;
--step 200: segment naturally separated characters;
wherein naturally separated characters are segmented as follows: compute the histogram projection curve of the given text line and find its zero-valued regions to determine the gaps between naturally separated characters; a vertical line located in the middle of each gap, equidistant from the characters on its left and right, serves as the segmentation path for the naturally separated characters;
--step 300: for the result obtained in step 200, estimate the width of each character and determine whether it exceeds the predetermined threshold T. If it does, go to step 400; otherwise, go to step 700;
wherein the character width is estimated as follows:
Step 310: estimate the mean character width Mean_Wid by the following formula:
$$\mathrm{Mean\_Wid} = \frac{1}{N_{Med}} \sum_{i} w_i \cdot \mathrm{Bool}\left(0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}\right)$$
where $w_i$ ($i = 1, 2, \ldots$) is the width of the $i$-th character in the segmentation result; $W_{Med}$ is the median of all character widths in the segmentation result, i.e., the value at the middle position when the widths are sorted in ascending order; $\mathrm{Bool}(\cdot)$ is a Boolean function whose value is 1 when the condition in parentheses is satisfied and 0 otherwise; and $N_{Med}$ is the number of characters satisfying $0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}$;
Step 320: compute the threshold T by the formula T = 1.2*Mean_Wid, where Mean_Wid is determined in step 310;
--step 400: segment overlapping characters, as follows:
Step 410: perform connected-component analysis on the overlapping characters and establish which character each new connected component belongs to. Membership is determined by pixel intersection: because the overlapping characters do not intersect one another, each new connected component in the overlapping region can intersect one and only one character. If the intersection of a new connected component with a character is non-empty, the component belongs to that character;
Step 420: traverse each column of the overlapping region, scanning it from top to bottom. Whenever two successive connected components belong to different characters, set an isolation point between them in that column, equidistant from the upper and the lower component;
Step 430: to ensure that the isolation points are set correctly in the overlapping region, adjust the connected components in the overlapping region according to steps 440 to 470;
Step 440: in the overlapping region, mark the connected components that have pixels in every column, to distinguish them from small connected components;
Step 450: examine the marked connected components from top to bottom. If two successive marked components belong to the same character, insert a virtual connected component between them: in each column, set one or more first-type pixels, which represent character pixels. These pixels lie between the two components and are separated from the lower component by one or more second-type (background) pixels, so that the virtual pixels avoid intersecting small connected components as far as possible. If an intersection cannot be avoided, remove the pixels of the small component at the intersection position and directly above it, so that the small component is separated from the virtual pixels by one or more second-type pixels. Mark this virtual connected component and treat it as belonging to the other character;
Step 460: in the overlapping region, add a virtual first-type pixel to every column of the first row, forming a virtual connected component. Mark it and treat it as belonging to a different character from the original topmost marked component. Similarly, add a virtual first-type pixel to every column of the last row, forming a virtual connected component; mark it and treat it as belonging to a different character from the original bottommost marked component;
Step 470: examine each pair of successive marked connected components. If there are small connected components between them that belong to different characters, ensure that the vertical projections of these small components do not overlap; that is, if they have pixels in the same column, erase the pixels of these small components in that column;
Step 480: connect, in order, the isolation points of every column obtained after the above processing; together they form the part of the segmentation path inside the overlapping region;
Step 490: use the vertical boundaries of the overlapping region as the remaining parts of the segmentation path, and join all parts to form one complete curved segmentation path;
--step 500: for the result obtained in step 400, estimate the width of each character and determine whether it exceeds the predetermined threshold T. If it does, go to step 600; otherwise, go to step 700;
--step 600: segment merged characters;
wherein, in step 600, merged characters are segmented as follows:
Step 610: estimate the boundary position of each character within the merged characters from their histogram projection;
Step 620: from the rough position of the common boundary of the merged characters, search to the left and to the right for one histogram peak each, whose value is greater than the average peak value of the histogram and which is closest to the common boundary; each such peak is taken to lie near the center of a character. Between these two peaks, find a histogram valley whose value is less than the average valley value of the histogram and which is closest to the common boundary; this valley is taken as the exact position of the common boundary of the merged characters;
Step 630: thin the merged characters. On the thinned image, find the strokes at the valley position found above, since they may contain merge points; record all fork points and corner points on these strokes as merge points;
Step 640: split the merged characters at the merge points, turning them into overlapping characters. The split is performed as follows: in the m*m pixel region centered on each merge point, set all image pixels to second-type pixels, which represent background pixels, where m is a preset value;
Step 650: segment the characters using the method for overlapping characters;
Step 660: superimpose the segmentation path produced in step 650 on the image of the original merged characters and cut the image along it;
--step 700: output the segmentation result;
--step 800: end.
2. A handwritten text line character segmentation method, characterized in that the method comprises the following steps:
Step 100: input a text line;
Step 200: segment naturally separated characters;
Step 300: for the result obtained in step 200, estimate the width of each character and determine whether it exceeds the predetermined threshold T. If it does, go to step 400; otherwise, go to step 700;
Step 400: segment overlapping characters;
Step 500: for the result obtained in step 400, estimate the width of each character and determine whether it exceeds the predetermined threshold T. If it does, go to step 600; otherwise, go to step 700;
Step 600: segment merged characters;
Step 700: output the segmentation result;
Step 800: end.
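The control flow of claim 2 can be sketched in Python as below. The `Segment` type and the three callables are hypothetical stand-ins: one for the width estimation of step 300, and two for the overlapping-character and merged-character procedures.

The sketch also assumes that T is computed once in step 300 and reused in step 500. That is how "predetermined threshold T" is read here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    x0: int  # left edge of the character's bounding box (inclusive)
    x1: int  # right edge (exclusive)

    @property
    def width(self) -> int:
        return self.x1 - self.x0


Splitter = Callable[[Segment], list[Segment]]


def segment_line(
    natural: list[Segment],
    threshold: Callable[[list[int]], float],
    split_overlapping: Splitter,
    split_merged: Splitter,
) -> list[Segment]:
    """Claim 2 control flow; `natural` is the output of step 200."""
    t = threshold([s.width for s in natural])  # step 300; T is reused in step 500
    result: list[Segment] = []
    for seg in natural:
        if seg.width <= t:
            result.append(seg)  # narrow enough: go straight to step 700
            continue
        for part in split_overlapping(seg):  # step 400
            if part.width <= t:  # step 500
                result.append(part)
            else:
                result.extend(split_merged(part))  # step 600
    return result  # step 700
```

Passing the splitters in as parameters keeps the dispatch logic separate from the segmentation procedures themselves. Either procedure can then be swapped or tested with simple stubs.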
3. The handwritten text line character segmentation method according to claim 2, characterized in that:
in step 200, naturally separated characters are segmented as follows:
compute the histogram projection curve of the given text line and find its zero-valued regions to determine the gaps between naturally separated characters;
a vertical line located in the middle of each gap, equidistant from the characters on its left and right, serves as the segmentation path for the naturally separated characters.
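A minimal Python sketch of the natural-separation rule in claim 3 follows. It assumes a binary text-line image given as a list of rows, with 1 marking ink; the function name is hypothetical.

The function returns the x position of each vertical segmentation line. Each line sits midway between the last ink column of the left character and the first ink column of the right one. Blank margins before the first character and after the last one are not gaps between characters, so they produce no cut.

```python
def natural_cuts(image: list[list[int]]) -> list[float]:
    """Claim 3: x positions of the vertical cut lines between naturally separated characters."""
    if not image:
        return []
    n_cols = len(image[0])
    # Vertical histogram projection: ink count per column.
    projection = [sum(row[c] for row in image) for c in range(n_cols)]
    ink_columns = [c for c, v in enumerate(projection) if v > 0]
    cuts: list[float] = []
    for left, right in zip(ink_columns, ink_columns[1:]):
        if right - left > 1:  # columns left+1 .. right-1 form a zero-valued region
            # Midway between the two characters, hence equidistant from both.
            cuts.append((left + right) / 2)
    return cuts
```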
4. The handwritten text line character segmentation method according to claim 2, characterized in that:
in step 300, the character width is estimated as follows:
Step 310: estimate the mean character width Mean_Wid by the following formula:
$$\mathrm{Mean\_Wid} = \frac{1}{N_{Med}} \sum_{i} w_i \cdot \mathrm{Bool}\left(0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}\right)$$
where $w_i$ ($i = 1, 2, \ldots$) is the width of the $i$-th character in the segmentation result; $W_{Med}$ is the median of all character widths in the segmentation result, i.e., the value at the middle position when the widths are sorted in ascending order; $\mathrm{Bool}(\cdot)$ is a Boolean function whose value is 1 when the condition in parentheses is satisfied and 0 otherwise; and $N_{Med}$ is the number of characters satisfying $0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}$;
Step 320: compute the threshold T by the following formula:
T = 1.2*Mean_Wid
where Mean_Wid is determined in step 310.
5. The handwritten text line character segmentation method according to claim 2, characterized in that:
in step 400, overlapping characters are segmented as follows:
Step 410: perform connected-component analysis on the overlapping characters and establish which character each new connected component belongs to. Membership is determined by pixel intersection: because the overlapping characters do not intersect one another, each new connected component in the overlapping region can intersect one and only one character. If the intersection of a new connected component with a character is non-empty, the component belongs to that character;
Step 420: traverse each column of the overlapping region, scanning it from top to bottom. Whenever two successive connected components belong to different characters, set an isolation point between them in that column, equidistant from the upper and the lower component;
Step 430: to ensure that the isolation points are set correctly in the overlapping region, adjust the connected components in the overlapping region according to steps 440 to 470;
Step 440: in the overlapping region, mark the connected components that have pixels in every column, to distinguish them from small connected components;
Step 450: examine the marked connected components from top to bottom. If two successive marked components belong to the same character, insert a virtual connected component between them: in each column, set one or more first-type pixels, which represent character pixels. These pixels lie between the two components and are separated from the lower component by one or more second-type (background) pixels, so that the virtual pixels avoid intersecting small connected components as far as possible. If an intersection cannot be avoided, remove the pixels of the small component at the intersection position and directly above it, so that the small component is separated from the virtual pixels by one or more second-type pixels. Mark this virtual connected component and treat it as belonging to the other character;
Step 460: in the overlapping region, add a virtual first-type pixel to every column of the first row, forming a virtual connected component. Mark it and treat it as belonging to a different character from the original topmost marked component. Similarly, add a virtual first-type pixel to every column of the last row, forming a virtual connected component; mark it and treat it as belonging to a different character from the original bottommost marked component;
Step 470: examine each pair of successive marked connected components. If there are small connected components between them that belong to different characters, ensure that the vertical projections of these small components do not overlap; that is, if they have pixels in the same column, erase the pixels of these small components in that column;
Step 480: connect, in order, the isolation points of every column obtained after the above processing; together they form the segmentation path inside the overlapping region;
Step 490: use the vertical boundaries of the overlapping region as the remaining parts of the segmentation path, and join all parts to form one complete curved segmentation path.
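Steps 410 and 440 might be sketched as follows in Python. The sketch assumes the overlapping region is supplied as `char_map`, where each ink pixel already carries the id of the character it came from. Assigning a new component to a character then reduces to reading the id of its pixels, which is the pixel-intersection test of step 410.

4-connectivity and the function names are assumptions; the text does not specify which connectivity is used.

```python
from collections import deque


def label_components(mask: list[list[int]]) -> tuple[list[list[int]], int]:
    """4-connected component labelling by breadth-first search."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    n = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                n += 1
                labels[r][c] = n
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            queue.append((ny, nx))
    return labels, n


def classify_components(char_map: list[list[int]]) -> tuple[dict[int, int], set[int]]:
    """Step 410 (owner of each new component) and step 440 (marked components)."""
    mask = [[1 if v else 0 for v in row] for row in char_map]
    labels, n = label_components(mask)
    owner: dict[int, int] = {}
    columns: dict[int, set[int]] = {k: set() for k in range(1, n + 1)}
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k:
                # Non-empty intersection with a character => component belongs to it.
                if owner.setdefault(k, char_map[r][c]) != char_map[r][c]:
                    raise ValueError(f"component {k} intersects two characters")
                columns[k].add(c)
    width = len(char_map[0]) if char_map else 0
    # Marked = has at least one pixel in every column of the region.
    marked = {k for k, cols in columns.items() if len(cols) == width}
    return owner, marked
```

A component that spans only some of the columns, such as an isolated dot, is returned in `owner` but not in `marked`. Steps 450 to 470 would then treat it as a small connected component.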
6. The handwritten text line character segmentation method according to claim 2, characterized in that:
in step 600, merged characters are segmented as follows:
Step 610: estimate the boundary position of each character within the merged characters from their histogram projection;
Step 620: from the rough position of the common boundary of the merged characters, search to the left and to the right for one histogram peak each, whose value is greater than the average peak value of the histogram and which is closest to the common boundary; each such peak is taken to lie near the center of a character. Between these two peaks, find a histogram valley whose value is less than the average valley value of the histogram and which is closest to the common boundary; this valley is taken as the exact position of the common boundary of the merged characters;
Step 630: thin the merged characters. On the thinned image, find the strokes at the valley position found above, since they may contain merge points; record all fork points and corner points on these strokes as merge points;
Step 640: split the merged characters at the merge points, turning them into overlapping characters. The split is performed as follows: in the m*m pixel region centered on each merge point, set all image pixels to second-type pixels, which represent background pixels, where m is a preset value;
Step 650: segment the characters using the method for overlapping characters;
Step 660: superimpose the segmentation path produced in step 650 on the image of the original merged characters and cut the image along it.
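Steps 630 and 640 could be approximated as below. Thinning itself is not shown; the sketch assumes an already thinned, one-pixel-wide skeleton.

Several parts of the sketch are substitutions or simplifications, not details from the text:
- Fork points are detected with the crossing-number test: a pixel whose clockwise 8-neighbour ring has at least three 0-to-1 transitions.
- Corner-point detection is omitted.
- The optional `cols` filter stands in for "strokes at the valley position".
- m is assumed to be odd.

```python
from typing import Optional

# Clockwise 8-neighbour ring starting at north: N, NE, E, SE, S, SW, W, NW.
_RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel: list[list[int]], r: int, c: int) -> int:
    """Number of 0->1 transitions around (r, c); pixels outside the image count as 0."""
    h, w = len(skel), len(skel[0])
    ring = [skel[r + dy][c + dx] if 0 <= r + dy < h and 0 <= c + dx < w else 0
            for dy, dx in _RING]
    return sum(1 for a, b in zip(ring, ring[1:] + ring[:1]) if a == 0 and b == 1)


def junction_points(skel: list[list[int]],
                    cols: Optional[set[int]] = None) -> list[tuple[int, int]]:
    """Step 630 (fork points only): skeleton pixels with crossing number >= 3."""
    points = []
    for r, row in enumerate(skel):
        for c, v in enumerate(row):
            if v and (cols is None or c in cols) and crossing_number(skel, r, c) >= 3:
                points.append((r, c))
    return points


def split_at_points(image: list[list[int]], points: list[tuple[int, int]],
                    m: int = 3) -> list[list[int]]:
    """Step 640: set the m x m window centred on each merge point to background (0)."""
    out = [row[:] for row in image]  # leave the caller's image untouched
    half = m // 2  # m is assumed odd so the window is centred exactly
    h, w = len(out), len(out[0])
    for r, c in points:
        for y in range(max(r - half, 0), min(r + half + 1, h)):
            for x in range(max(c - half, 0), min(c + half + 1, w)):
                out[y][x] = 0
    return out
```

The crossing number is used instead of a plain neighbour count. On a T-shaped skeleton, a neighbour count flags several pixels around the junction, whereas the crossing number flags only the junction pixel itself.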
7. A handwritten text line character segmentation method, characterized in that the method comprises the following steps:
inputting a text line;
segmenting the naturally separated characters in the input text by the histogram projection method;
segmenting the overlapping characters in the input text by setting isolation points in the overlapping region;
for the merged characters in the input text, first detecting the merge points, then splitting the characters at those points, and then segmenting them as overlapping characters.
8. A method for recognizing handwritten text line characters, characterized in that the method comprises the following steps:
Step S10: input a text line;
Step S20: segment naturally separated characters;
Step S30: for the result obtained in step S20, estimate the width of each character and determine whether it exceeds the predetermined threshold T. If it does, go to step S40; otherwise, go to step S70;
Step S40: segment overlapping characters;
Step S50: for the result obtained in step S40, estimate the width of each character and determine whether it exceeds the predetermined threshold T. If it does, go to step S60; otherwise, go to step S70;
Step S60: segment merged characters;
Step S70: output the segmentation result;
Step S80: recognize each character in the output segmentation result and output the recognition result.
9. The method for recognizing handwritten text line characters according to claim 8, characterized in that:
in step S30, the character width is estimated as follows:
Step 310: estimate the mean character width Mean_Wid by the following formula:
$$\mathrm{Mean\_Wid} = \frac{1}{N_{Med}} \sum_{i} w_i \cdot \mathrm{Bool}\left(0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}\right)$$
where $w_i$ ($i = 1, 2, \ldots$) is the width of the $i$-th character in the segmentation result; $W_{Med}$ is the median of all character widths in the segmentation result, i.e., the value at the middle position when the widths are sorted in ascending order; $\mathrm{Bool}(\cdot)$ is a Boolean function whose value is 1 when the condition in parentheses is satisfied and 0 otherwise; and $N_{Med}$ is the number of characters satisfying $0.8\,W_{Med} \le w_i \le 1.6\,W_{Med}$;
Step 320: compute the threshold T by the following formula:
T = 1.2*Mean_Wid
where Mean_Wid is determined in step 310.
10. The method for recognizing handwritten text line characters according to claim 8, characterized in that:
in step S40, overlapping characters are segmented as follows:
Step 410: perform connected-component analysis on the overlapping characters and establish which character each new connected component belongs to. Membership is determined by pixel intersection: because the overlapping characters do not intersect one another, each new connected component in the overlapping region can intersect one and only one character. If the intersection of a new connected component with a character is non-empty, the component belongs to that character;
Step 420: traverse each column of the overlapping region, scanning it from top to bottom. Whenever two successive connected components belong to different characters, set an isolation point between them in that column, equidistant from the upper and the lower component;
Step 430: to ensure that the isolation points are set correctly in the overlapping region, adjust the connected components in the overlapping region according to steps 440 to 470;
Step 440: in the overlapping region, mark the connected components that have pixels in every column, to distinguish them from small connected components;
Step 450: examine the marked connected components from top to bottom. If two successive marked components belong to the same character, insert a virtual connected component between them: in each column, set one or more first-type pixels, which represent character pixels. These pixels lie between the two components and are separated from the lower component by one or more second-type (background) pixels, so that the virtual pixels avoid intersecting small connected components as far as possible. If an intersection cannot be avoided, remove the pixels of the small component at the intersection position and directly above it, so that the small component is separated from the virtual pixels by one or more second-type pixels. Mark this virtual connected component and treat it as belonging to the other character;
Step 460: in the overlapping region, add a virtual first-type pixel to every column of the first row, forming a virtual connected component. Mark it and treat it as belonging to a different character from the original topmost marked component. Similarly, add a virtual first-type pixel to every column of the last row, forming a virtual connected component; mark it and treat it as belonging to a different character from the original bottommost marked component;
Step 470: examine each pair of successive marked connected components. If there are small connected components between them that belong to different characters, ensure that the vertical projections of these small components do not overlap; that is, if they have pixels in the same column, erase the pixels of these small components in that column;
Step 480: connect, in order, the isolation points of every column obtained after the above processing; together they form the segmentation path inside the overlapping region;
Step 490: use the vertical boundaries of the overlapping region as the remaining parts of the segmentation path, and join all parts to form one complete curved segmentation path.
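A small Python sketch of the projection rule in step 470 follows. It assumes the small connected components lying between one pair of successive marked components have already been collected into a dictionary. Each entry maps a component id to a pair: the owning character's id and the set of (row, column) pixels. This layout and the function name are hypothetical.

Any column shared by small components of different characters is cleared from every small component that occupies it.

```python
from collections import defaultdict

Pixels = set[tuple[int, int]]


def resolve_small_overlaps(small: dict[int, tuple[int, Pixels]]) -> dict[int, tuple[int, Pixels]]:
    """Step 470: clear columns shared by small components of different characters."""
    chars_in_column: dict[int, set[int]] = defaultdict(set)
    for char, pixels in small.values():
        for _, c in pixels:
            chars_in_column[c].add(char)
    conflicts = {c for c, chars in chars_in_column.items() if len(chars) > 1}
    # Build new pixel sets without the conflicting columns; the input is not modified.
    return {
        k: (char, {(r, c) for r, c in pixels if c not in conflicts})
        for k, (char, pixels) in small.items()
    }
```

Small components of the same character may share a column; only conflicts between different characters are removed.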
11. The method for recognizing handwritten text line characters according to claim 8, characterized in that:
in step S60, merged characters are segmented as follows:
Step 610: estimate the boundary position of each character within the merged characters from their histogram projection;
Step 620: from the rough position of the common boundary of the merged characters, search to the left and to the right for one histogram peak each, whose value is greater than the average peak value of the histogram and which is closest to the common boundary; each such peak is taken to lie near the center of a character. Between these two peaks, find a histogram valley whose value is less than the average valley value of the histogram and which is closest to the common boundary; this valley is taken as the exact position of the common boundary of the merged characters;
Step 630: thin the merged characters. On the thinned image, find the strokes at the valley position found above, since they may contain merge points; record all fork points and corner points on these strokes as merge points;
Step 640: split the merged characters at the merge points, turning them into overlapping characters. The split is performed as follows: in the m*m pixel region centered on each merge point, set all image pixels to second-type pixels, which represent background pixels, where m is a preset value;
Step 650: segment the characters using the method for overlapping characters;
Step 660: superimpose the segmentation path produced in step 650 on the image of the original merged characters and cut the image along it.
CN 201010587738 2010-12-14 2010-12-14 Handwritten text line character segmentation method and identification method Pending CN102156865A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010587738 CN102156865A 2010-12-14 2010-12-14 Handwritten text line character segmentation method and identification method

Publications (1)

Publication Number Publication Date
CN102156865A 2011-08-17

Family

ID=44438356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010587738 Pending CN102156865A 2010-12-14 2010-12-14 Handwritten text line character segmentation method and identification method

Country Status (1)

Country Link
CN (1) CN102156865A


Non-Patent Citations (1)

Title
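Doctoral dissertation (《博士学位论文》), 2010-06-25, Li Nanxi (李南希), "Writer-independent offline recognition of naturally written Chinese text lines" (非特定人的自然书写脱机中文文本行识别): p. 29, Fig. 3-3; p. 29, para. 1, Sec. 3.2.1; p. 29, para. 2, Sec. 3.2.2; p. 32, para. 3, Sec. 3.2.2.2, lines 10 to 16; p. 33, Sec. 3.2.2.3; p. 34, para. 2 to p. 35, para. 4; p. 35, 10th to 8th lines from the bottom; p. 32, para. 3, 4th line from the bottom; p. 37, Sec. 3.2.3, 2nd line from the bottom; p. 38, para. 1; p. 38, para. 2; p. 38, Sec. 3.2.3.2, para. 4, lines 2 to 5; p. 39, Sec. 3.2.4, paras. 2 to 3; p. 64, Sec. 4.3.3, para. 1. Relevant claims: 1-11 *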

Cited By (41)

Publication number Priority date Publication date Assignee Title
CN103106406A (en) * 2011-11-09 2013-05-15 佳能株式会社 Method and system for segmenting characters in text line with different character widths
CN103106406B (en) * 2011-11-09 2016-10-05 佳能株式会社 There is the method and system of character in the line of text of kinds of characters width for cutting
CN103257810A (en) * 2012-02-17 2013-08-21 汉王科技股份有限公司 Identification method and identification device of handwritten mathematical formula
CN103257810B (en) * 2012-02-17 2016-03-02 汉王科技股份有限公司 Hand-written method for identifying mathematical formula and device
CN104809483A (en) * 2014-01-26 2015-07-29 安徽科大讯飞信息科技股份有限公司 Method and system for realizing segmentation of text lines written in any directions
CN104809483B (en) * 2014-01-26 2019-04-05 科大讯飞股份有限公司 Realize the method and system of any direction text writing row cutting
CN106339704A (en) * 2015-07-14 2017-01-18 富士通株式会社 Character recognition method and character recognition equipment
CN106845473A (en) * 2015-12-03 2017-06-13 富士通株式会社 For determine image whether be the image with address information method and apparatus
CN106845473B (en) * 2015-12-03 2020-06-02 富士通株式会社 Method and device for determining whether image is image with address information
CN108701215A (en) * 2016-01-20 2018-10-23 迈思慧公司 The system and method for multipair image structures for identification
CN105912993A (en) * 2016-03-31 2016-08-31 深圳感官密码科技有限公司 Automatic paper marking image identification method and system
CN107545261A (en) * 2016-06-23 2018-01-05 佳能株式会社 The method and device of text detection
CN108121988B (en) * 2016-11-30 2021-09-24 富士通株式会社 Information processing method and device, and information detection method and device
CN108121988A (en) * 2016-11-30 2018-06-05 富士通株式会社 Information processing method and device and information detecting method and device
CN106682667A (en) * 2016-12-29 2017-05-17 成都数联铭品科技有限公司 Image-text OCR (optical character recognition) system for uncommon fonts
CN106611175A (en) * 2016-12-29 2017-05-03 成都数联铭品科技有限公司 Automatic character and picture segmentation system for recognizing image characters
CN106709474A (en) * 2017-01-23 2017-05-24 无锡职业技术学院 Handwritten telephone number identification, verification and information sending system
CN108805128B (en) * 2017-05-05 2023-11-07 京东科技控股股份有限公司 Character segmentation method and device
CN108805128A (en) * 2017-05-05 2018-11-13 北京京东金融科技控股有限公司 A kind of character segmentation method and device
CN109389115B (en) * 2017-08-11 2023-05-23 腾讯科技(上海)有限公司 Text recognition method, device, storage medium and computer equipment
CN109389115A (en) * 2017-08-11 2019-02-26 腾讯科技(上海)有限公司 Text recognition method, device, storage medium and computer equipment
CN107944451A (en) * 2017-11-27 2018-04-20 西北民族大学 The row cutting method and system of a kind of ancient Tibetan books document
CN108171237A (en) * 2017-12-08 2018-06-15 众安信息技术服务有限公司 A kind of line of text image individual character cutting method and device
CN108460384B (en) * 2018-02-08 2024-01-19 南京晓庄学院 Character segmentation method for offline handwriting text
CN108460384A (en) * 2018-02-08 2018-08-28 南京晓庄学院 A kind of character cutting method of line Handwritten text
CN108491845B (en) * 2018-03-02 2022-05-31 深圳怡化电脑股份有限公司 Character segmentation position determination method, character segmentation method, device and equipment
CN108491845A (en) * 2018-03-02 2018-09-04 深圳怡化电脑股份有限公司 Determination, character segmentation method, device and the equipment of Character segmentation position
CN109635718A (en) * 2018-12-10 2019-04-16 科大讯飞股份有限公司 A kind of text filed division methods, device, equipment and storage medium
CN109800756B (en) * 2018-12-14 2021-02-12 华南理工大学 Character detection and identification method for dense text of Chinese historical literature
CN109800756A (en) * 2018-12-14 2019-05-24 华南理工大学 A kind of text detection recognition methods for the intensive text of Chinese historical document
CN111401351A (en) * 2020-03-06 2020-07-10 南京红松信息技术有限公司 Segmentation method based on vertical character positioning expansion
CN112101347A (en) * 2020-08-27 2020-12-18 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
US11823474B2 (en) 2020-10-27 2023-11-21 Boe Technology Group Co., Ltd. Handwritten text recognition method, apparatus and system, handwritten text search method and system, and computer-readable storage medium
CN113254653B (en) * 2021-07-05 2021-12-21 明品云(北京)数据科技有限公司 Text classification method, system, device and medium
CN113254653A (en) * 2021-07-05 2021-08-13 明品云(北京)数据科技有限公司 Text classification method, system, device and medium
CN113936181A (en) * 2021-08-01 2022-01-14 北京工业大学 Method for identifying adhered handwritten English characters
CN113723413A (en) * 2021-08-01 2021-11-30 北京工业大学 Handwritten Chinese text segmentation method based on greedy snake
CN113723413B (en) * 2021-08-01 2024-03-08 北京工业大学 Handwriting Chinese text segmentation method based on greedy snake
CN113936181B (en) * 2021-08-01 2024-03-26 北京工业大学 Recognition method for adhering handwritten English characters
CN115410209A (en) * 2022-10-31 2022-11-29 山东济矿鲁能煤电股份有限公司阳城煤矿 Coal mine work order identification method based on image processing
CN115410209B (en) * 2022-10-31 2023-01-31 山东济矿鲁能煤电股份有限公司阳城煤矿 Coal mine work order identification method based on image processing

Similar Documents

Publication Publication Date Title
CN102156865A (en) Handwritten text line character segmentation method and identification method
CN109284758B (en) Invoice seal eliminating method and device and computer storage medium
CN102200834B (en) Television control-oriented finger-mouse interaction method
CN101976114B (en) System and method for realizing information interaction between computer and pen and paper based on camera
CN100550038C (en) Image content recognizing method and recognition system
CN106650780A (en) Data processing method, device, classifier training method and system
CN105511792A (en) In-position hand input method and system for form
CN102156609A (en) Overlap handwriting input method
CN104182750A (en) Extremum connected domain based Chinese character detection method in natural scene image
CN103150019A (en) Handwriting input system and method
CN106940799A (en) Method for processing text images and device
CN110969129A (en) End-to-end tax bill text detection and identification method
CN100555312C (en) Utilize character topology information to carry out the method and apparatus of the handwriting recognition of aftertreatment
CN104766076A (en) Detection method and device for video images and texts
CN101751569A (en) Character segmentation method for offline handwriting Uighur words
CN103336961A (en) Interactive natural scene text detection method
CN107944451B (en) Line segmentation method and system for ancient Tibetan book documents
CN110516673B (en) Yi-nationality ancient book character detection method based on connected component and regression type character segmentation
CN109086772A (en) A kind of recognition methods and system distorting adhesion character picture validation code
CN101581981A (en) Method and system for directly forming Chinese text by writing Chinese characters on a piece of common paper
Hu et al. Touching text line segmentation combined local baseline and connected component for Uchen Tibetan historical documents
CN113191309A (en) Method and system for recognizing, scoring and correcting handwritten Chinese characters
CN109598185A (en) Image recognition interpretation method, device, equipment and readable storage medium storing program for executing
CN102750534A (en) Method and device for segmenting characters
CN103345365B (en) The display packing of continuous handwriting input and the hand input device of employing the method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2011-08-17