CN106503688A - Method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing - Google Patents

Info

Publication number
CN106503688A
Authority
CN
China
Prior art keywords
word
image
interval
bounding box
writing brush
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611012109.2A
Other languages
Chinese (zh)
Inventor
张九龙
屈小娥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN201611012109.2A
Publication of CN106503688A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/32Digital ink
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/32Digital ink
    • G06V30/333Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/28Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing. Specifically: a rubbing image is first input and preprocessed to obtain a binarized handwriting image; column integration is performed on the binarized handwriting image to segment the start and end positions of each column; row integration is performed on each column, and wavelet smoothing is applied to the row-integration signal to obtain a smooth waveform, on which the rows are segmented; broken characters are merged; column integration is performed once more on each character to cut out accurate left and right edges; the top, bottom, left, and right edges of each character are then recorded; finally, the segmented single-character images are saved. The method can segment rubbing images whose characters are arranged in horizontal rows and vertical columns, and it handles noise effectively.

Description

Method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing
Technical field
The invention belongs to the technical field of image processing methods, and in particular relates to a method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing.
Background
In recent years, with the rapid development and spread of computer technology, computers have come into wider use in the field of traditional calligraphy. When a scanned rubbing image is processed, the individual Chinese characters must be extracted from the image and the minimum bounding box of each character determined.
The minimum bounding box is the smallest rectangle that encloses a character. It is obtained by the row-column (horizontal and vertical) integration method, which works as follows: column integration first separates each column of characters, row integration then marks the top and bottom edges of each character, and a final column integration marks the left and right edges of each character. The difficulty in this method is performing the row and column segmentation simply and accurately. The traditional row-column integration method performs only two steps, column integration and row integration, and has the following drawbacks in practice: (1) noise such as stains in the rubbing image produces burrs in the row-integration waveform, so the start and end of a character are hard to determine; the top and bottom edges are therefore not cut accurately, and the result is not the exact minimum bounding box of the character; (2) some Chinese characters may be cut in two. The row integration method therefore needs further improvement.
Summary of the invention
The object of the invention is to provide a method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing that can segment rubbing images whose characters are arranged in horizontal rows and vertical columns and that handles noise effectively.
The technical solution adopted by the invention is a method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing, implemented according to the following steps:
Step 1: input a rubbing image and preprocess it to obtain a binarized handwriting image;
Step 2: perform column integration on the binarized handwriting image obtained in step 1, and segment the start and end positions of each column;
Step 3: after step 2, perform row integration on each column and apply wavelet smoothing to the row-integration signal; specifically, decompose the signal to three levels with the coif2 wavelet basis and reconstruct it from the third-level approximation (profile) signal to obtain a smooth waveform, then segment the rows on the smooth waveform;
Step 4: after step 3, merge broken characters;
Step 5: after step 4, perform column integration once more on each character to cut out accurate left and right edges; then record the top, bottom, left, and right edges of each character; finally save the segmented single-character images.
The invention is further characterized as follows:
Step 1 is implemented as follows:
For a rubbing image stored in color format, binarization compares the sum of the R, G, and B channel values against a preset value, according to the following formula:
$$b(x, y) = \begin{cases} 0, & \sum_{r,g,b} c(x, y) < 200 \\ 1, & \sum_{r,g,b} c(x, y) \geq 200 \end{cases}$$
where c(x, y) is the input original color image and b(x, y) is the resulting binary image; a threshold of 200 is used for binarization.
Step 4 is implemented according to the following steps:
Step a: first find an interval whose values are greater than 0, then calculate the height of that interval;
Step b: after step a, if the height of the interval is less than 30% of a character and its distance from the previous or next interval is less than 10% of a character, the interval is taken by default to be part of the previous or next interval;
Step c: after step b, traverse the intervals from top to bottom and merge the broken intervals to obtain the top and bottom edges of each complete character.
The beneficial effects of the invention are:
(1) The method effectively solves the problem of automatic, accurate segmentation of rubbing images; it can segment rubbing images whose characters are arranged in horizontal rows and vertical columns, and it handles noise effectively.
(2) The method overcomes the deficiencies of the prior art with three major improvements: 1) it uses a three-step approach of column integration, row integration, and then column integration again; that is, after row integration cuts out the top and bottom edges of each character, column integration is applied once more to each character so that the left and right edges are cut accurately; 2) in the row integration in particular, the burrs present in the row-integration waveform are smoothed with a wavelet transform, which effectively removes the burrs at the start and end of each character and improves the accuracy of the decision; 3) some Chinese characters, such as "think" and "three", have blank gaps in their horizontal integration and may therefore be split; this case is judged from the average character height and the average character spacing. The check is triggered when the number of characters obtained by segmentation exceeds the number of characters per line: if the height of a character is much smaller than the average character height and its distance from a neighbouring character is less than a preset value, the characters are merged, which eliminates split characters.
Description of the drawings
Fig. 1 is a flow chart of the method of the invention for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing;
Fig. 2 is the row-integration waveform involved in the method of the invention;
Fig. 3 is the row-integration waveform after wavelet smoothing involved in the method of the invention;
Fig. 4 is a row-integration image involved in the method of the invention;
Fig. 5 is the row-integration image before interval merging involved in the method of the invention;
Fig. 6 is the row-integration image after interval merging involved in the method of the invention.
Detailed description
The invention is described in detail below with reference to the drawings and specific embodiments.
The flow of the method of the invention is shown in Fig. 1. A rubbing image is first input and preprocessed to obtain a binarized handwriting image. Column integration then segments the start and end positions of each column. Row integration is then computed for each column. Because stains in the rubbing introduce noise that produces burrs in the row-integration waveform, which makes segmentation harder and less accurate, wavelet smoothing is introduced and the rows are segmented on the smoothed waveform. Characters that may have been split are then merged. Finally, the coordinates of the minimum bounding boxes are recorded and the segmented character images are saved.
The method of the invention is implemented according to the following steps:
Step 1: input a rubbing image and preprocess it to obtain a binarized handwriting image, as follows:
There are many ways to binarize a rubbing image, and the choice should follow the characteristics of the rubbing;
Here, for a rubbing image stored in color format, binarization compares the sum of the R, G, and B channel values against a preset value, according to the following formula:
$$b(x, y) = \begin{cases} 0, & \sum_{r,g,b} c(x, y) < 200 \\ 1, & \sum_{r,g,b} c(x, y) \geq 200 \end{cases}$$
where c(x, y) is the input original color image and b(x, y) is the resulting binary image; a threshold of 200 is used here for binarization;
Some rubbings downloaded from the Internet may also have a white border around the page. The position of the white border can be found by the integration method, and the image inside the border is cropped out as the effective binary image.
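As a minimal sketch of the binarization rule in step 1 (an illustration only, not the applicants' implementation), the thresholding can be written in plain Python. The function name `binarize` and the representation of the image as nested lists of (r, g, b) tuples are assumptions made for this example.

```python
def binarize(image, threshold=200):
    """Binarize a color image given as rows of (r, g, b) tuples (0-255 each).

    A pixel becomes 1 when r + g + b >= threshold and 0 otherwise,
    following the formula of step 1 with the threshold of 200.
    """
    return [
        [1 if (r + g + b) >= threshold else 0 for (r, g, b) in row]
        for row in image
    ]


# Hypothetical 2x2 test image: channel sums are 0, 765, 199 and 200.
img = [
    [(0, 0, 0), (255, 255, 255)],
    [(100, 50, 49), (100, 50, 50)],
]
binary = binarize(img)
```

The comparison is written with `>=` so that a channel sum of exactly 200 maps to 1, matching the two cases of the formula.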
Step 2: perform column integration on the binarized handwriting image obtained in step 1, and segment the start and end positions of each column.
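Column integration as described in step 2 can be sketched as a projection profile: sum the binary image down each pixel column, then take each run of positive values as one text column. The helper names `column_profile` and `nonzero_runs` are hypothetical, and the sketch assumes that character pixels have the value 1 in the binary image.

```python
def column_profile(binary):
    """Column integration: sum of pixel values in each image column."""
    return [sum(col) for col in zip(*binary)]


def nonzero_runs(profile):
    """Return (start, end) index pairs, end exclusive, where profile > 0."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                    # a run of positive values begins
        elif value <= 0 and start is not None:
            runs.append((start, i))      # the run ended just before index i
            start = None
    if start is not None:                # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs


# Hypothetical binary image with two text columns (x = 1..2 and x = 5).
binary = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
columns = nonzero_runs(column_profile(binary))
```

The same `nonzero_runs` helper can be applied to a row-integration profile (the sum of each pixel row inside one text column) to obtain candidate top and bottom edges.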
Step 3: after step 2, perform row integration on each column. Because stains in the rubbing introduce noise that produces burrs in the row-integration waveform, which makes segmentation harder and less accurate, wavelet smoothing is applied to the row-integration signal; specifically, the signal is decomposed to three levels with the coif2 wavelet basis and reconstructed from the third-level approximation (profile) signal to obtain a smooth waveform; the rows are then segmented on the smooth waveform;
The row-integration waveform is shown in Fig. 2 and the row-integration waveform after wavelet smoothing in Fig. 3. It can be seen that wavelet smoothing markedly reduces the burrs on the waveform and avoids the defect that the top and bottom edges cannot be located when characters are segmented.
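The idea of decomposing to three levels and reconstructing from the third-level approximation alone can be sketched in plain Python. Note the substitution: the text specifies the coif2 basis, whereas this sketch uses the much simpler Haar wavelet so that it needs no filter coefficients or external library; it is a stand-in for illustration, not the method of the invention. The function name `haar_smooth` and the edge-replication padding are assumptions. With the PyWavelets library, the coif2 version could be built from `pywt.wavedec` and `pywt.waverec`.

```python
def haar_smooth(signal, levels=3):
    """Keep only the level-`levels` Haar approximation of a 1-D signal.

    Uses the averaging form of the Haar transform: each decomposition
    level replaces pairs of samples by their mean, and reconstruction
    with all detail coefficients set to zero duplicates each value.
    """
    if not signal:
        return []
    n = len(signal)
    block = 2 ** levels
    # Pad by repeating the last sample so the length is a multiple of 2**levels.
    approx = list(signal) + [signal[-1]] * (-n % block)
    for _ in range(levels):              # decomposition: keep approximations only
        approx = [(approx[i] + approx[i + 1]) / 2
                  for i in range(0, len(approx), 2)]
    smooth = approx
    for _ in range(levels):              # reconstruction with zeroed details
        smooth = [v for v in smooth for _ in (0, 1)]
    return smooth[:n]                    # drop the padding


# A flat profile with a single burr of height 8 is flattened to its block mean.
smoothed = haar_smooth([0, 0, 0, 8, 0, 0, 0, 0])
```

Haar smoothing produces a piecewise-constant waveform; a longer basis such as coif2 yields a smoother curve, which is presumably why it is chosen in the text.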
Step 4: after step 3, merge broken characters, as follows:
To prevent individual characters from being split, the following processing is performed:
Step a: first find an interval whose values are greater than 0, then calculate the height of that interval;
Step b: after step a, if the height of the interval is less than 30% of a character and its distance from the previous or next interval is less than 10% of a character, the interval is taken by default to be part of the previous or next interval;
Step c: after step b, traverse the intervals from top to bottom and merge the broken intervals to obtain the top and bottom edges of each complete character;
As shown in Fig. 4, Fig. 5, and Fig. 6, the 8th character is cut into two characters and is merged back into one complete character after the algorithm is applied.
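Steps a to c can be sketched as a single top-to-bottom pass over the row intervals. This is one possible reading of the rule, not necessarily the applicants' exact procedure: the function name `merge_broken`, the (top, bottom) tuples with an exclusive bottom, and the caller-supplied reference `char_height` are all assumptions made for the example.

```python
def merge_broken(intervals, char_height, height_ratio=0.30, gap_ratio=0.10):
    """Merge fragments of split characters.

    intervals:   (top, bottom) pairs sorted top to bottom, bottom exclusive.
    char_height: reference height of one character (assumed to be known,
                 e.g. an average over the column).
    Two neighbouring intervals are merged when one of them is shorter than
    30% of a character and the gap between them is under 10% of a character.
    """
    max_fragment = height_ratio * char_height
    max_gap = gap_ratio * char_height
    merged = []
    for top, bottom in intervals:
        if merged:
            prev_top, prev_bottom = merged[-1]
            gap = top - prev_bottom
            is_fragment = ((bottom - top) < max_fragment
                           or (prev_bottom - prev_top) < max_fragment)
            if is_fragment and gap < max_gap:
                merged[-1] = (prev_top, bottom)   # join with previous interval
                continue
        merged.append((top, bottom))
    return merged


# Hypothetical column: the short interval (120, 140) lies 5 pixels above the
# interval (145, 215), so the two are joined into one character.
merged = merge_broken([(0, 95), (120, 140), (145, 215), (240, 330)], 100)
```

Checking the height of the previous interval as well as the current one lets a short fragment join the interval that follows it, which covers the "previous or next interval" wording of step b.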
Step 5: after step 4, perform column integration once more on each character to cut out accurate left and right edges; then record the top, bottom, left, and right edges of each character; finally save the segmented single-character images.
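The final column integration of step 5 can be sketched as follows: within the region bounded by a text column and a character's top and bottom edges, sum each pixel column and keep the span between the first and last columns that contain character pixels. The function name `tight_box`, the (top, bottom, left, right) return order, and the convention that character pixels equal 1 are assumptions for illustration.

```python
def tight_box(binary, col_range, row_range):
    """Return (top, bottom, left, right) for one character, or None if empty.

    binary:    binary image as a list of rows of 0/1 values.
    col_range: (x0, x1) of the text column, x1 exclusive.
    row_range: (y0, y1) of the character's top and bottom edges, y1 exclusive.
    """
    x0, x1 = col_range
    y0, y1 = row_range
    # Column integration restricted to the character's rows.
    profile = [sum(binary[y][x] for y in range(y0, y1)) for x in range(x0, x1)]
    inked = [x0 + i for i, value in enumerate(profile) if value > 0]
    if not inked:
        return None
    return (y0, y1, inked[0], inked[-1] + 1)


# Hypothetical 4x6 image with one small character.
binary = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = tight_box(binary, (1, 5), (1, 3))
```

The returned tuple is the recorded bounding box; cropping `binary` to these bounds gives the single-character image to be saved.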
The method of the invention for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing effectively solves the problem of automatic, accurate segmentation of rubbing images; it can segment rubbing images whose characters are arranged in horizontal rows and vertical columns, and it handles noise effectively.

Claims (3)

1. A method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing, characterized in that it is implemented according to the following steps:
Step 1: input a rubbing image and preprocess it to obtain a binarized handwriting image;
Step 2: perform column integration on the binarized handwriting image obtained in step 1, and segment the start and end positions of each column;
Step 3: after step 2, perform row integration on each column and apply wavelet smoothing to the row-integration signal; specifically, decompose the signal to three levels with the coif2 wavelet basis and reconstruct it from the third-level approximation (profile) signal to obtain a smooth waveform, then segment the rows on the smooth waveform;
Step 4: after step 3, merge broken characters;
Step 5: after step 4, perform column integration once more on each character to cut out accurate left and right edges; then record the top, bottom, left, and right edges of each character; finally save the segmented single-character images.
2. The method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing according to claim 1, characterized in that step 1 is implemented as follows:
For a rubbing image stored in color format, binarization compares the sum of the R, G, and B channel values against a preset value, according to the following formula:
$$b(x, y) = \begin{cases} 0, & \sum_{r,g,b} c(x, y) < 200 \\ 1, & \sum_{r,g,b} c(x, y) \geq 200 \end{cases}$$
where c(x, y) is the input original color image and b(x, y) is the resulting binary image; a threshold of 200 is used for binarization.
3. The method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing according to claim 1, characterized in that step 4 is implemented according to the following steps:
Step a: first find an interval whose values are greater than 0, then calculate the height of that interval;
Step b: after step a, if the height of the interval is less than 30% of a character and its distance from the previous or next interval is less than 10% of a character, the interval is taken by default to be part of the previous or next interval;
Step c: after step b, traverse the intervals from top to bottom and merge the broken intervals to obtain the top and bottom edges of each complete character.
CN201611012109.2A 2016-11-17 2016-11-17 Method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing Pending CN106503688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611012109.2A CN106503688A (en) 2016-11-17 2016-11-17 Method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611012109.2A CN106503688A (en) 2016-11-17 2016-11-17 Method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing

Publications (1)

Publication Number Publication Date
CN106503688A true CN106503688A (en) 2017-03-15

Family

ID=58324748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611012109.2A Pending CN106503688A (en) 2016-11-17 2016-11-17 Method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing

Country Status (1)

Country Link
CN (1) CN106503688A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446701A (en) * 2018-03-12 2018-08-24 南昌航空大学 A kind of best bounding volume method of writing brush word

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456220A (en) * 2010-10-25 2012-05-16 新奥特(北京)视频技术有限公司 Color image noise channel extraction method based on bounding box
CN103093240A (en) * 2013-01-18 2013-05-08 浙江大学 Calligraphy character identifying method
CN104715256A (en) * 2015-03-04 2015-06-17 南昌大学 Auxiliary calligraphy exercising system and evaluation method based on image method
CN104992176A (en) * 2015-07-24 2015-10-21 北京航空航天大学 Inscription oriented Chinese character extracting method
CN105117741A (en) * 2015-09-28 2015-12-02 上海海事大学 Recognition method of calligraphy character style

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
印月等: "一种完整的汉字识别系统设计", 《微计算机信息》 *
尹 明等: "一种新的离线手写签名识别方法", 《现代电子技术》 *
章夏芬: "中国数字书法检索与作品真伪鉴别的研究", 《中国优秀博硕士学位论文全文数据库 (博士) 信息科技辑》 *
章夏芬等: "根据形状相似性的书法内容检索", 《计算机辅助设计与图形学学报》 *

Similar Documents

Publication Publication Date Title
US10255691B2 (en) Method and system of detecting and recognizing a vehicle logo based on selective search
CN107610124B (en) Furnace mouth image preprocessing method
US7298900B2 (en) Image processing method, image processing apparatus and image processing program
CN108470021A (en) The localization method and device of table in PDF document
US20050193327A1 (en) Method for determining logical components of a document
US9275030B1 (en) Horizontal and vertical line detection and removal for document images
CA2429507A1 (en) Writing guide for a free-form document editor
Lehal Ligature segmentation for Urdu OCR
CN103646247A (en) Music score recognition method
CN111368695A (en) Table structure extraction method
CN106503688A (en) Method for extracting minimum bounding boxes of brush-written characters based on wavelet smoothing
EP2685426A1 (en) Character string detection device, image processing device, character string detection method, control program and storage medium
CN101877062A (en) Method for profile analysis in image layout area
CN107730511A (en) A kind of Tibetan language historical document line of text cutting method based on baseline estimations
CN102314608A (en) Method and device for extracting rows from character image
CN106446863B (en) PDF document logic diagram identification method
CN101944180A (en) Music note primitive segmentation method based on music note knowledge and double projection method
CN101452368B (en) Hand-written character input method
CN101901333B (en) Method for segmenting word in text image and identification device using same
JP2000148788A (en) Device and method for extracting title area from document image and document retrieving method
CN108564078B (en) Method for extracting axle wire of Manchu word image
CN108764155B (en) Handwritten Uyghur word segmentation recognition method
CN106682666A (en) Characteristic template manufacturing method for unusual font OCR identification
CN108596182B (en) Manchu component cutting method
CN108549896B (en) Method for deleting redundant candidate segmentation lines in Manchu component segmentation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170315