CN106503688A - Writing brush word minimum bounding box extracting method based on wavelet Smoothing - Google Patents
Writing brush word minimum bounding box extracting method based on wavelet Smoothing
- Publication number
- CN106503688A (application number CN201611012109.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- image
- interval
- bounding box
- writing brush
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/333—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses a method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing. A rubbing image is first input and preprocessed to obtain a binarized handwriting image. Column integration is performed on the binarized image to find the start and end position of each column. Row integration is then performed on each column; wavelet smoothing is applied to the row integral signal to obtain a smooth waveform, and the rows are segmented on the smoothed waveform. Broken characters are merged. Column integration is performed once more on each character to cut out accurate left and right edges; the top, bottom, left, and right edges of each character are then recorded, and the segmented single-character images are saved. The method can segment rubbing images whose characters are aligned in horizontal rows and vertical columns, and it handles noise effectively.
Description
Technical field
The invention belongs to the technical field of image processing methods, and in particular relates to a method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing.
Background
In recent years, with the rapid development and spread of computer technology, computers have come into wider use in the field of traditional calligraphy. When a scanned rubbing image is processed, the individual Chinese characters must be extracted from it and the minimum bounding box of each character determined.
The minimum bounding box is the smallest rectangle that encloses a character. It is obtained with the row-column integration method, whose steps are: first separate each column of characters by column integration, then find the upper and lower edges of each character by row integration, and finally perform column integration once more to find the left and right edges of each character. The difficulty lies in performing this row and column segmentation simply and accurately. The traditional row-column integration method performs only two steps, column integration and row integration, and has the following drawbacks in practice: (1) noise such as staining in the rubbing image produces burrs in the row integral waveform, which makes the start and end of a character hard to determine, so the upper and lower edges are not cut accurately and the result is not the true minimum bounding box of the character; (2) some Chinese characters may be cut in two. The row integration method therefore needs further improvement.
Content of the invention
It is an object of the invention to provide a kind of writing brush word minimum bounding box extracting method based on wavelet Smoothing, can realize
The cutting in the case of erecting in column of embarking on journey horizontal to rubbings image, and can effective process noise situations.
The technical solution adopted in the present invention is, based on the writing brush word minimum bounding box extracting method of wavelet Smoothing, specifically
Implement according to following steps:
Step 1, first input rubbings image, obtain the handwriting image of binaryzation after pretreatment;
Step 2, enter ranks integration to the handwriting image of the binaryzation obtained through step 1, be partitioned into starting and the knot of each row
Beam position;
Step 3, through after step 2, each arranging is integrated into every trade, wavelet Smoothing is introduced, little for carrying out to row integrated signal
Popin is slided, and specially using coif2 wavelet basiss, is decomposed three-level and is reconstructed using third level profile signal, obtain smooth
Waveform, segmentation trip on smooth waveform;
Step 4, treat step 3 after the completion of, to rupture character merge;
Step 5, treat step 4 after the completion of, integration is once arranged again to each word first, accurate left and right edge is cut out;
The upper and lower, left and right edge of each word is recorded afterwards;The single character picture of well cutting is finally preserved.
The invention is further characterized as follows.
Step 1 is implemented as follows:
For a rubbing image stored in color format, binarization tests whether the sum of the R, G, and B channel values exceeds a preset value, according to the following formula:
In the formula, C(x, y) is the input original color image and b(x, y) is the resulting binary image; a threshold of 200 is used for binarization.
Step 4 is implemented in the following steps:
Step a: find the intervals in which the row integral is greater than 0, then compute the height of each interval;
Step b: after step a, if the height of an interval is less than 30% of a character and its distance to the previous or the next interval is less than 10% of a character, the interval is treated as part of the previous or the next interval;
Step c: after step b, traverse the intervals from top to bottom and merge the broken intervals to obtain the upper and lower edges of each complete character.
The beneficial effects of the invention are:
(1) The method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing effectively solves the problem of automatic, accurate segmentation of rubbing images; it can segment rubbing images whose characters are aligned in horizontal rows and vertical columns, and it handles noise effectively.
(2) The method overcomes the shortcomings of the prior art with three main improvements: 1. A three-step approach of column integration, row integration, and then column integration again is used; that is, after row integration has cut out the upper and lower edges of each character, column integration is applied once more to each character so that the left and right edges are cut accurately. 2. During row integration, the burrs in the row integral waveform are smoothed with a wavelet transform, which effectively removes the burrs at the start and end of each character and improves the accuracy of the decision. 3. Some Chinese characters, such as "think" and "three", have blank gaps in their horizontal integral and may be split; these cases are judged from the average character height and the average character spacing. The check is triggered when the number of characters obtained by segmentation exceeds the number of characters per line; if the height of a character is much smaller than the average character height and its distance to an adjacent character is below a preset value, the characters are merged, which eliminates split characters.
Description of the drawings
Fig. 1 is a flow chart of the method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing;
Fig. 2 is the row integral waveform used in the method;
Fig. 3 is the row integral waveform after wavelet smoothing used in the method;
Fig. 4 is the row integral image used in the method;
Fig. 5 is the row integral image before interval merging used in the method;
Fig. 6 is the row integral image after interval merging used in the method.
Detailed description
The invention is described in detail below with reference to the drawings and specific embodiments.
The flow of the method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing is shown in Fig. 1. A rubbing image is first input and preprocessed to obtain a binarized handwriting image. Column integration then finds the start and end position of each column. Row integration is computed for each column; because staining of the rubbing introduces noise that causes burrs in the row integral waveform, making segmentation harder and less accurate, wavelet smoothing is introduced and the rows are cut on the smoothed waveform. Characters that may have been split are then merged. Finally, the coordinates of each minimum bounding box are recorded and the segmented character images are saved.
The method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing is implemented in the following steps:
Step 1: input a rubbing image and preprocess it to obtain a binarized handwriting image, as follows:
Many binarization methods exist for rubbing images; the choice should follow the characteristics of the rubbing;
For a rubbing image stored in color format, binarization here tests whether the sum of the R, G, and B channel values exceeds a preset value, according to the following formula:
In the formula above, C(x, y) is the input original color image and b(x, y) is the resulting binary image; a threshold of 200 is used here for binarization;
Some rubbings downloaded from the Internet also have a white border around the page. The position of the border can be found by the integration method, and the image inside the border is cropped out as the effective binary image.
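The binarization rule in step 1 can be sketched as follows. This is a minimal illustration rather than the patent's own implementation: the function name `binarize_rgb_sum` is hypothetical, NumPy is assumed, and mapping pixels whose channel sum exceeds the threshold to 1 (and all others to 0) is an assumption, because the formula itself is not reproduced in this text.

```python
import numpy as np

def binarize_rgb_sum(color_image: np.ndarray, threshold: int = 200) -> np.ndarray:
    """Return b(x, y): 1 where R + G + B exceeds the threshold, else 0.

    color_image plays the role of C(x, y): an H x W x 3 array of 8-bit values.
    Mapping the bright pixels to 1 is an assumption; invert if the ink is dark.
    """
    # Widen to int32 first so that summing three uint8 channels cannot overflow.
    channel_sum = color_image.astype(np.int32).sum(axis=2)
    return (channel_sum > threshold).astype(np.uint8)

# Tiny 1 x 3 test image: a black, a dark gray, and a white pixel.
demo = np.array([[[0, 0, 0], [60, 60, 60], [255, 255, 255]]], dtype=np.uint8)
binary = binarize_rgb_sum(demo)
```

With the threshold of 200 named in the text, the dark gray pixel (channel sum 180) falls below the threshold and only the white pixel is marked.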
Step 2: perform column integration on the binarized handwriting image obtained in step 1 and find the start and end position of each column.
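One way to read the column integration in step 2 is as a vertical projection: sum the binary image down each pixel column and treat every run of non-zero sums as one text column. The sketch below assumes NumPy and foreground pixels equal to 1; the helper names `find_runs` and `split_columns` are hypothetical.

```python
import numpy as np

def find_runs(profile, min_value=0):
    """Return (start, end) index pairs, end exclusive, where profile > min_value."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > min_value and start is None:
            start = i                      # a run begins
        elif value <= min_value and start is not None:
            runs.append((start, i))        # the run ended just before i
            start = None
    if start is not None:                  # a run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs

def split_columns(binary):
    """Column integration: sum over the rows, then locate the non-empty runs."""
    column_integral = binary.sum(axis=0)
    return find_runs(column_integral)

demo = np.zeros((4, 8), dtype=np.uint8)
demo[:, 1:3] = 1      # first text column
demo[1:3, 5:7] = 1    # second text column
columns = split_columns(demo)
```

The same `find_runs` helper can be reused on a row integral to locate candidate characters inside one column.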
Step 3: after step 2, perform row integration on each column. Staining of the rubbing introduces noise that causes burrs in the row integral waveform, which makes segmentation harder and less accurate. Wavelet smoothing is therefore applied to the row integral signal, specifically using the coif2 wavelet basis, decomposing to three levels and reconstructing from the third-level approximation signal to obtain a smooth waveform; the rows are segmented on the smoothed waveform;
The row integral waveform is shown in Fig. 2, and the row integral waveform after wavelet smoothing is shown in Fig. 3. After wavelet smoothing the burrs on the waveform are clearly reduced, which avoids the defect that the upper and lower edges cannot be located when characters are segmented.
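The smoothing in step 3 can be sketched with the PyWavelets library, which is an assumption here since the patent names no software. The sketch decomposes the row integral to three levels with the coif2 wavelet, zeroes every detail band, and reconstructs from the third-level approximation only. The function name `wavelet_smooth` and the synthetic test signal are hypothetical.

```python
import numpy as np
import pywt

def wavelet_smooth(row_integral, wavelet="coif2", level=3):
    """Reconstruct a signal from its level-3 approximation coefficients only."""
    signal = np.asarray(row_integral, dtype=float)
    # wavedec returns [cA3, cD3, cD2, cD1] for level=3.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Keep the approximation band and replace each detail band with zeros.
    kept = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    smoothed = pywt.waverec(kept, wavelet)
    # waverec can return one extra sample for odd-length input; trim it.
    return smoothed[: len(signal)]

# Synthetic row integral: four "characters" separated by gaps, plus noise.
rng = np.random.default_rng(0)
clean = np.tile(np.r_[np.zeros(16), np.full(48, 20.0)], 4)
noisy = clean + rng.normal(0.0, 2.0, clean.size)
smooth = wavelet_smooth(noisy)
```

Because the reconstructed waveform is rarely exactly zero in the gaps, a small positive threshold may be more practical than a strict comparison with 0 when the rows are cut on it; that threshold is a tuning choice not specified in the text.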
Step 4: after step 3, merge broken characters, as follows:
To prevent individual characters from being split, the following processing is applied:
Step a: find the intervals in which the row integral is greater than 0, then compute the height of each interval;
Step b: after step a, if the height of an interval is less than 30% of a character and its distance to the previous or the next interval is less than 10% of a character, the interval is treated as part of the previous or the next interval;
Step c: after step b, traverse the intervals from top to bottom and merge the broken intervals to obtain the upper and lower edges of each complete character;
As shown in Fig. 4, Fig. 5, and Fig. 6, the 8th character was cut into two characters and is merged into one complete character after the algorithm is applied.
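The merging rule of steps a to c can be sketched in plain Python. The function name `merge_broken_intervals` and the `char_height` parameter are hypothetical; the text says only "a character", so how the reference character height is obtained (for example, an average over the column) is left as an assumption. Intervals are (top, bottom) pairs with an exclusive bottom, sorted from top to bottom.

```python
def merge_broken_intervals(intervals, char_height,
                           max_height_ratio=0.30, max_gap_ratio=0.10):
    """Merge short fragments into a neighbouring interval.

    A fragment is merged when its height is below 30% of char_height and the
    gap to the adjacent interval is below 10% of char_height.
    """
    max_height = max_height_ratio * char_height
    max_gap = max_gap_ratio * char_height
    merged = []
    for top, bottom in intervals:          # traverse from top to bottom
        if merged:
            prev_top, prev_bottom = merged[-1]
            gap = top - prev_bottom
            prev_small = (prev_bottom - prev_top) < max_height
            cur_small = (bottom - top) < max_height
            # Either side may be the broken fragment of the other.
            if (prev_small or cur_small) and gap < max_gap:
                merged[-1] = (prev_top, bottom)
                continue
        merged.append((top, bottom))
    return merged

# The second interval is a short fragment sitting just above the third.
rows = [(0, 100), (120, 140), (145, 220), (240, 340)]
result = merge_broken_intervals(rows, char_height=100)
```

In this example the fragment (120, 140) is too far from the first character to join it, but it lies within 10% of a character height of the interval below, so the two become (120, 220).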
Step 5: after step 4, perform column integration once more on each character to cut out accurate left and right edges; then record the top, bottom, left, and right edges of each character; finally save the segmented single-character images.
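The final tightening in step 5 can be sketched as a second column integration restricted to one character cell. NumPy is assumed, foreground pixels are assumed to be 1, and the function name `tight_box` and the (top, bottom, left, right) tuple layout are hypothetical.

```python
import numpy as np

def tight_box(binary, top, bottom, left, right):
    """Shrink a coarse cell to its left and right ink edges.

    Returns (top, bottom, left, right) with exclusive bottom and right,
    or None when the cell contains no foreground pixels.
    """
    cell = binary[top:bottom, left:right]
    column_integral = cell.sum(axis=0)             # second column integration
    ink_columns = np.flatnonzero(column_integral > 0)
    if ink_columns.size == 0:
        return None
    new_left = left + int(ink_columns[0])
    new_right = left + int(ink_columns[-1]) + 1
    return (top, bottom, new_left, new_right)

demo = np.zeros((6, 10), dtype=np.uint8)
demo[1:4, 3:6] = 1                                 # one small "character"
box = tight_box(demo, top=1, bottom=4, left=0, right=10)
crop = demo[box[0]:box[1], box[2]:box[3]]          # single-character image
```

The returned tuple is the recorded bounding box, and `crop` is the array that would be written out as the single-character image with whatever image library is in use.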
The method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing effectively solves the problem of automatic, accurate segmentation of rubbing images; it can segment rubbing images whose characters are aligned in horizontal rows and vertical columns, and it handles noise effectively.
Claims (3)
1. A method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing, characterized in that it is implemented in the following steps:
Step 1: input a rubbing image and preprocess it to obtain a binarized handwriting image;
Step 2: perform column integration on the binarized handwriting image obtained in step 1 and find the start and end position of each column;
Step 3: after step 2, perform row integration on each column and apply wavelet smoothing to the row integral signal, specifically using the coif2 wavelet basis, decomposing to three levels and reconstructing from the third-level approximation signal to obtain a smooth waveform; segment the rows on the smoothed waveform;
Step 4: after step 3, merge broken characters;
Step 5: after step 4, perform column integration once more on each character to cut out accurate left and right edges; then record the top, bottom, left, and right edges of each character; finally save the segmented single-character images.
2. The method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing according to claim 1, characterized in that step 1 is implemented as follows:
For a rubbing image stored in color format, binarization tests whether the sum of the R, G, and B channel values exceeds a preset value, according to the following formula:
In the formula, C(x, y) is the input original color image and b(x, y) is the resulting binary image; a threshold of 200 is used for binarization.
3. The method for extracting the minimum bounding box of brush-written characters based on wavelet smoothing according to claim 1, characterized in that step 4 is implemented in the following steps:
Step a: find the intervals in which the row integral is greater than 0, then compute the height of each interval;
Step b: after step a, if the height of an interval is less than 30% of a character and its distance to the previous or the next interval is less than 10% of a character, the interval is treated as part of the previous or the next interval;
Step c: after step b, traverse the intervals from top to bottom and merge the broken intervals to obtain the upper and lower edges of each complete character.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611012109.2A CN106503688A (en) | 2016-11-17 | 2016-11-17 | Writing brush word minimum bounding box extracting method based on wavelet Smoothing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611012109.2A CN106503688A (en) | 2016-11-17 | 2016-11-17 | Writing brush word minimum bounding box extracting method based on wavelet Smoothing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106503688A true CN106503688A (en) | 2017-03-15 |
Family
ID=58324748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611012109.2A Pending CN106503688A (en) | 2016-11-17 | 2016-11-17 | Writing brush word minimum bounding box extracting method based on wavelet Smoothing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106503688A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446701A (en) * | 2018-03-12 | 2018-08-24 | 南昌航空大学 | A kind of best bounding volume method of writing brush word |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102456220A (en) * | 2010-10-25 | 2012-05-16 | 新奥特(北京)视频技术有限公司 | Color image noise channel extraction method based on bounding box |
CN103093240A (en) * | 2013-01-18 | 2013-05-08 | 浙江大学 | Calligraphy character identifying method |
CN104715256A (en) * | 2015-03-04 | 2015-06-17 | 南昌大学 | Auxiliary calligraphy exercising system and evaluation method based on image method |
CN104992176A (en) * | 2015-07-24 | 2015-10-21 | 北京航空航天大学 | Inscription oriented Chinese character extracting method |
CN105117741A (en) * | 2015-09-28 | 2015-12-02 | 上海海事大学 | Recognition method of calligraphy character style |
Non-Patent Citations (4)
Title |
---|
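Yin Yue et al.: "Design of a complete Chinese character recognition system", Microcomputer Information * |
Yin Ming et al.: "A new offline handwritten signature recognition method", Modern Electronics Technique * |
Zhang Xiafen: "Research on Chinese digital calligraphy retrieval and authentication of works", China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology * |
Zhang Xiafen et al.: "Calligraphy content retrieval based on shape similarity", Journal of Computer-Aided Design & Computer Graphics * |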
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10255691B2 (en) | Method and system of detecting and recognizing a vehicle logo based on selective search | |
CN107610124B (en) | Furnace mouth image preprocessing method | |
US7298900B2 (en) | Image processing method, image processing apparatus and image processing program | |
CN108470021A (en) | The localization method and device of table in PDF document | |
US20050193327A1 (en) | Method for determining logical components of a document | |
US9275030B1 (en) | Horizontal and vertical line detection and removal for document images | |
CA2429507A1 (en) | Writing guide for a free-form document editor | |
Lehal | Ligature segmentation for Urdu OCR | |
CN103646247A (en) | Music score recognition method | |
CN111368695A (en) | Table structure extraction method | |
CN106503688A (en) | Writing brush word minimum bounding box extracting method based on wavelet Smoothing | |
EP2685426A1 (en) | Character string detection device, image processing device, character string detection method, control program and storage medium | |
CN101877062A (en) | Method for profile analysis in image layout area | |
CN107730511A (en) | A kind of Tibetan language historical document line of text cutting method based on baseline estimations | |
CN102314608A (en) | Method and device for extracting rows from character image | |
CN106446863B (en) | PDF document logic diagram identification method | |
CN101944180A (en) | Music note primitive segmentation method based on music note knowledge and double projection method | |
CN101452368B (en) | Hand-written character input method | |
CN101901333B (en) | Method for segmenting word in text image and identification device using same | |
JP2000148788A (en) | Device and method for extracting title area from document image and document retrieving method | |
CN108564078B (en) | Method for extracting axle wire of Manchu word image | |
CN108764155B (en) | Handwritten Uyghur word segmentation recognition method | |
CN106682666A (en) | Characteristic template manufacturing method for unusual font OCR identification | |
CN108596182B (en) | Manchu component cutting method | |
CN108549896B (en) | Method for deleting redundant candidate segmentation lines in Manchu component segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170315 |