CN104867114B - Conditional-random-field-based method for blind removal of back-side bleed-through from scanned images of ancient books - Google Patents

Info

Publication number
CN104867114B
Authority
CN
China
Prior art keywords
back side
image
background
side infiltration
random field
Legal status
Expired - Fee Related
Application number
CN201510168613.0A
Other languages
Chinese (zh)
Other versions
CN104867114A (en)
Inventor
Li Shutao (李树涛)
Sun Bin (孙斌)
Sun Jun (孙俊)
Current Assignee
Hunan University
Fujitsu Ltd
Original Assignee
Hunan University
Fujitsu Ltd
Priority date
2015-04-13
Filing date
2015-04-13
Publication date
2018-01-09
2015-04-13 Application filed by Hunan University and Fujitsu Ltd
2015-04-13 Priority to CN201510168613.0A
2015-08-26 Publication of CN104867114A
2018-01-09 Application granted
2018-01-09 Publication of CN104867114B
Current status: Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a conditional-random-field-based method for the blind removal of back-side bleed-through from scanned images of ancient books. The method comprises the following steps. First, a probability distribution model of the text image is established: the image is divided into three components (foreground, back-side bleed-through, and background), approximating functions for the gray-level histograms of the three components are obtained, and the parameters of the three components are estimated with the K-means algorithm. Next, a conditional random field model is built to refine the classification of the input image, and the bleed-through part is identified with a belief propagation algorithm. Finally, the image is repaired with a random filling algorithm, yielding a scanned text image free of bleed-through. By combining a conditional random field with random-fill repair, the invention preserves the foreground of the text image well while effectively removing bleed-through, which substantially improves the visual quality of scanned text images. It addresses problems such as the display and printing of scanned historical documents and has high practical value.

Description

Conditional-random-field-based method for blind removal of back-side bleed-through from scanned images of ancient books
Technical field
The present invention relates to a processing method for text images, and more particularly to a conditional-random-field-based method for the blind removal of back-side bleed-through from scanned images of ancient books.
Background art
Because ancient books are rare and precious, a common modern way of protecting them is to digitize them so that researchers can browse scanned images instead of the originals. Owing to poor paper quality or long-term storage, many ancient books written or printed on both sides of the paper suffer from back-side bleed-through, in which ink seeps through from one side of the sheet and becomes visible on the other. This makes the text difficult to read and also spoils the visual appearance of some rare manuscripts.
Many methods have been proposed to remove bleed-through. They fall broadly into two classes: blind removal methods and non-blind removal methods. Non-blind methods require accurately aligned scans of both the front and the back of a page. Because automatically registering the two sides remains difficult, this work usually requires a great deal of manual effort. Blind methods, by contrast, need only a single-sided image and therefore avoid the registration problem. A. Tonazzini et al. proposed using blind source separation: the input image is treated as a mixture of foreground, bleed-through, and background signals, and independent component analysis is used to try to recover the three parts. Because this approach needs signals of the same object acquired by different sensors, it requires color scans. The same authors also proposed solving the blind source separation problem with a Markov random field and the EM algorithm. Departing from the signal-separation approach, C. Wolf treated bleed-through removal as an image segmentation problem and proposed a model with two hidden Markov random fields and a single observation field. This method alternately updates the two Markov random fields with a max-flow algorithm until it converges to the final segmentation. However, such algorithms are computationally very expensive and can hardly meet the requirements of some applications.
Summary of the invention
To solve the above technical problems in the blind removal of bleed-through from scanned images of ancient books, the present invention provides a blind removal method for scanned images of ancient books based on conditional random fields. The invention effectively removes bleed-through from scanned text images while better preserving the integrity of the text foreground, thereby improving the readability of text images.
The technical solution adopted by the invention comprises the following steps:
1) Divide scanned images with known class labels into foreground, back-side bleed-through, and background parts; establish conditional probability distribution models for the foreground, bleed-through, and background of the image; and obtain approximating functions for the three parts.
2) Using the approximating functions obtained in step 1), take a text image with unknown class labels as input, obtain its foreground, bleed-through, and background parts with a K-means clustering algorithm, and compute the mean gray value and variance of each of the three parts.
3) Build a conditional random field model of the input image and, using the mean gray values and variances of the foreground, bleed-through, and background parts obtained in step 2), refine the classification of the image to obtain the bleed-through part.
4) Repair the classified image by removing the bleed-through part, obtaining the final image without bleed-through.
Technical effects of the invention: the invention divides an ancient-book text image into three different parts and establishes a conditional probability distribution model for them. After an initial classification with the K-means algorithm, the parameters of the three components are estimated. On this basis, a conditional random field is built for the input image, and the class label of each pixel is determined with a belief propagation algorithm. Finally, the bleed-through regions are repaired with a random filling algorithm, so that bleed-through is removed blindly. The invention effectively removes bleed-through from images while better preserving the foreground, which greatly improves readability.
Brief description of the drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 compares the classification results of different methods on scanned text images. From left to right, the first column shows the original text images, the second column the results of the K-means clustering algorithm, and the third column the results of the invention based on the maximum matching method;
Fig. 3 compares the foreground-pixel classification precision and recall of different classification methods;
Fig. 4 compares the results of different bleed-through removal methods on scanned text images;
In Fig. 4, from left to right, the first column shows the original text images, the second column the results of K-means clustering combined with the random filling algorithm, and the third column the results of the invention.
Detailed description of the embodiments
Fig. 1 is the process chart of the present invention.As illustrated, the present invention first establishes the random probability distribution mould of text image Type, image is divided into foreground part, back side infiltration part and three constituents of background parts, obtains the intensity histogram of three The approximating function of figure, and the parameter Estimation of three is obtained using K-means algorithms, then set up the condition random field models are to input Image is finely divided class, and using belief propagation algorithm identification back side infiltration part, image is repaired finally by random filling algorithm, Obtain the textual scan image of final no back side infiltration part.
The gray scale ancient books image for inputting to permeate with the back side, the gray scale ancient books image for exporting to permeate without the back side.This hair Bright detailed step is as follows:
1. Divide scanned images with known class labels into foreground, back-side bleed-through, and background parts, and establish conditional probability distribution models for the three components. Let H, H_fg, H_bt, and H_bg denote the gray-level histograms of the entire image, the foreground, the bleed-through part, and the background, respectively. The conditional probability distribution of each component is then given by formulas (1), (2), and (3).
Here s is the class label and d is the gray value; P(s=0|d), P(s=1|d), and P(s=2|d) are the conditional probability distributions of the foreground, bleed-through, and background parts, respectively. We choose logistic functions to model the conditional probability distributions of the foreground and background, and a Gaussian function to model that of the bleed-through part.
In these functions, the Gaussian has an amplitude factor that scales its peak, (u0, u1, u2) are the center factors, and (σ0, σ1, σ2) are the shape factors.
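A minimal Python sketch of step 1 is given below. It assumes that formulas (1) to (3) take the per-gray-level ratio form P(s=k|d) = H_k(d) / H(d), which the histogram definitions suggest but the text does not show, and it assumes the common logistic and Gaussian parameterizations. The function names, the label encoding (0 foreground, 1 bleed-through, 2 background), and the direction of each logistic curve (dark foreground, bright background) are illustrative assumptions.

```python
import numpy as np

def empirical_conditionals(gray, labels, levels=256):
    """Estimate P(s=k | d) for k = 0, 1, 2 from a labeled grayscale image.

    ASSUMPTION: P(s=k | d) = H_k(d) / H(d), the fraction of pixels with gray
    value d that carry label k (0 foreground, 1 bleed-through, 2 background).
    """
    gray = np.asarray(gray, dtype=np.int64).ravel()
    labels = np.asarray(labels).ravel()
    h = np.bincount(gray, minlength=levels).astype(float)      # H(d)
    probs = np.zeros((3, levels))
    for k in range(3):
        hk = np.bincount(gray[labels == k], minlength=levels)  # H_k(d)
        np.divide(hk, h, out=probs[k], where=h > 0)            # stays 0 where H(d) = 0
    return probs

# ASSUMED parametric forms; the text names only the function families.
def foreground_model(d, u0, sigma0):
    """Decreasing logistic: dark gray values are likely foreground ink."""
    return 1.0 / (1.0 + np.exp((np.asarray(d, dtype=float) - u0) / sigma0))

def background_model(d, u2, sigma2):
    """Increasing logistic: bright gray values are likely paper background."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(d, dtype=float) - u2) / sigma2))

def bleed_model(d, amplitude, u1, sigma1):
    """Gaussian bump centred on the mid-gray bleed-through level."""
    d = np.asarray(d, dtype=float)
    return amplitude * np.exp(-((d - u1) ** 2) / (2.0 * sigma1 ** 2))
```

In practice the parametric curves would be fitted to the empirical ratios, which smooths out gray levels that are rare in the labeled training scans.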
2. For an input image with unknown class labels, take the most frequent gray value as the mean gray value c2 of the background component, and then estimate the variance of the background component with formula (7) from the pixels whose gray value is greater than or equal to this mean.
Here N is the total number of pixels in the input image, 1{f} is the indicator function, equal to 1 when f > 0 and 0 otherwise, and I_j and I_k denote the j-th and k-th pixels of image I.
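Step 2's background estimate can be sketched as follows. The one-sided variance is an assumption about the form of formula (7): it averages the squared deviation from c2 over pixels whose gray value is at least c2, on the reasoning that the bright side of the background peak is not contaminated by darker ink. The function name is hypothetical.

```python
import numpy as np

def estimate_background(gray, levels=256):
    """Return (c2, var2): background mean as the histogram mode and a one-sided variance.

    ASSUMPTION: formula (7) is taken to be the mean squared deviation from c2
    over the pixels with gray value >= c2.
    """
    gray = np.asarray(gray, dtype=np.int64).ravel()
    hist = np.bincount(gray, minlength=levels)
    c2 = int(np.argmax(hist))                # most frequent gray value
    upper = gray[gray >= c2].astype(float)   # never empty: c2 itself occurs
    var2 = float(np.mean((upper - c2) ** 2))
    return c2, var2
```

For example, five pixels at 200, two at 202, two at 198, and one at 50 give c2 = 200 and a variance of 8/7, because the pixels at 198 and 50 are ignored.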
3. Using the estimated mean gray value and variance of the background component, subtract the background component from the histogram of the whole input image. Then apply Otsu's method to the remaining histogram to determine the gray threshold that separates the foreground component from the bleed-through component, and from the resulting split compute the mean gray values of the foreground and bleed-through components, denoted c0 and c1, respectively.
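The sketch below illustrates step 3. The text does not say how the background component is subtracted; here it is assumed to be a Gaussian with mean c2, variance var2, and peak height H(c2), with negative residuals clipped to zero. Otsu's method is implemented as the standard maximization of between-class variance. The function names are hypothetical.

```python
import numpy as np

def otsu_threshold(hist):
    """Gray level t maximizing Otsu's between-class variance (class 0: d <= t)."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()
    omega = np.cumsum(p)                     # class-0 probability
    mu = np.cumsum(p * np.arange(p.size))    # class-0 first moment
    mu_t = mu[-1]                            # global mean
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros_like(p)
    valid = denom > 1e-12                    # both classes non-empty
    sigma_b[valid] = (mu_t * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def foreground_bleed_means(gray, c2, var2, levels=256):
    """Return (t, c0, c1) after removing an ASSUMED Gaussian background from the histogram."""
    gray = np.asarray(gray, dtype=np.int64).ravel()
    hist = np.bincount(gray, minlength=levels).astype(float)
    d = np.arange(levels, dtype=float)
    sigma = max(np.sqrt(var2), 1e-6)         # guard against zero variance
    background = hist[c2] * np.exp(-((d - c2) ** 2) / (2.0 * sigma ** 2))
    residual = np.clip(hist - background, 0.0, None)
    t = otsu_threshold(residual)
    low, high = residual[: t + 1], residual[t + 1 :]
    # Assumes both classes are non-empty after thresholding.
    c0 = float((low * d[: t + 1]).sum() / low.sum())     # foreground mean
    c1 = float((high * d[t + 1 :]).sum() / high.sum())   # bleed-through mean
    return t, c0, c1
```

Removing the background first matters because the background peak usually dominates the histogram; Otsu's method applied to the raw histogram would tend to split ink from paper rather than foreground from bleed-through.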
4. From the mean gray values and variances of the three components obtained above, the conditional probability model of the bleed-through component can be determined directly. For the foreground component, fixing the probability values that its conditional probability model takes at the foreground mean and at the bleed-through mean allows the two parameters of the foreground model to be computed with formulas (8) and (9). Similarly, fixing the probability values that the background conditional probability model takes at the bleed-through mean and at the background mean allows its two parameters to be computed with formulas (10) and (11).
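Assuming the logistic form 1 / (1 + exp(-(d - u) / s)), where a negative s gives the decreasing foreground curve, fixing the model's value at two gray levels gives two linear equations in the logit domain, logit(p) = (d - u) / s, which can be solved in closed form. The sketch below shows this. The fixed probabilities 0.99 and 0.01 in the usage lines are hypothetical, since the values used in formulas (8) to (11) are not given in this text.

```python
import math

def logit(p):
    """Inverse of the logistic function: ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def fit_logistic(x_a, p_a, x_b, p_b):
    """Solve for (u, s) so that 1 / (1 + exp(-(x - u) / s)) equals p_a at x_a and p_b at x_b.

    From logit(p) = (x - u) / s at both points:
        s = (x_b - x_a) / (logit(p_b) - logit(p_a)),  u = x_a - s * logit(p_a)
    """
    if x_a == x_b or p_a == p_b:
        raise ValueError("need two distinct points with distinct probabilities")
    s = (x_b - x_a) / (logit(p_b) - logit(p_a))
    u = x_a - s * logit(p_a)
    return u, s

def logistic(x, u, s):
    return 1.0 / (1.0 + math.exp(-(x - u) / s))

# Hypothetical foreground fit with c0 = 40 and c1 = 120: probability 0.99 at the
# foreground mean and 0.01 at the bleed-through mean.
u0, s0 = fit_logistic(40, 0.99, 120, 0.01)
# Hypothetical background fit with c1 = 120 and c2 = 200.
u2, s2 = fit_logistic(120, 0.01, 200, 0.99)
```

With these symmetric choices the center lands midway between the two means (u0 = 80, u2 = 160), and s0 comes out negative, so the foreground curve decreases with gray value. In the decreasing parameterization 1 / (1 + exp((d - u0) / σ0)), the shape factor would be σ0 = -s0.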
5. Build a conditional random field model of the input image. The model consists of a hidden Markov random field and an observation field; that is, each pixel corresponds to one observation node and one hidden node. The value of an observation node is the pixel's gray value, and the value of a hidden node is one of 0 (foreground), 1 (bleed-through), and 2 (background), determined by solving the following optimization problem:
Here s_i is the hidden node to be estimated, D is the observation field, and S is the hidden label field. By the Markov property:
where d_i is the value of the observation node corresponding to s_i, and N_i is the set of neighbors of s_i in the hidden field. A belief propagation algorithm is used to approximate the optimal solution of this problem. (For implementation details of belief propagation, see J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms," IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2282-2312, Jul. 2005.)
6. Based on the image classified by the conditional random field, repair the bleed-through part with a random filling algorithm: the value of each bleed-through pixel is replaced by a pixel value randomly selected from the background pixels in its neighborhood, thereby removing the bleed-through:
where R is a random selection function, W_k is the local neighborhood of the k-th bleed-through pixel to be processed, and D_bg is the set of background pixels.
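Steps 5 and 6 can be sketched together as below. The sketch makes several assumptions that the text does not state: inference uses min-sum (max-product in the log domain) loopy belief propagation on a 4-connected grid; the unary cost of label k at pixel i would be -log P(s=k | d_i) taken from the conditional probability models; the pairwise term is a Potts penalty lam for neighboring pixels with different labels; and the neighborhood W_k is a square window. The function names and the parameters lam, n_iter, radius, and seed are hypothetical. Sampling only from background pixels follows claim 1.

```python
import numpy as np

def potts_message(h, lam):
    """Min-sum message through a Potts term: min(h(l), min h + lam), normalized to min 0."""
    m = np.minimum(h, h.min(axis=-1, keepdims=True) + lam)
    return m - m.min(axis=-1, keepdims=True)

def loopy_bp_labels(unary, lam=1.0, n_iter=20):
    """Approximate MAP labels for unary costs of shape (H, W, K) on a 4-connected grid."""
    unary = np.asarray(unary, dtype=float)
    # Incoming messages: up[i, j] comes from (i-1, j), down from (i+1, j), etc.
    up, down, left, right = (np.zeros_like(unary) for _ in range(4))
    for _ in range(n_iter):
        total = unary + up + down + left + right
        new_up, new_down = np.zeros_like(unary), np.zeros_like(unary)
        new_left, new_right = np.zeros_like(unary), np.zeros_like(unary)
        # A pixel sends everything it knows except what the receiving neighbor sent it.
        new_up[1:] = potts_message(total[:-1] - down[:-1], lam)
        new_down[:-1] = potts_message(total[1:] - up[1:], lam)
        new_left[:, 1:] = potts_message(total[:, :-1] - right[:, :-1], lam)
        new_right[:, :-1] = potts_message(total[:, 1:] - left[:, 1:], lam)
        up, down, left, right = new_up, new_down, new_left, new_right
    return np.argmin(unary + up + down + left + right, axis=-1)

def random_fill(gray, labels, radius=3, seed=0):
    """Replace each bleed-through pixel (label 1) by a random background value (label 2) from its window."""
    gray = np.asarray(gray)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    out = gray.copy()
    h, w = gray.shape
    for i, j in zip(*np.nonzero(labels == 1)):
        i0, i1 = max(i - radius, 0), min(i + radius + 1, h)
        j0, j1 = max(j - radius, 0), min(j + radius + 1, w)
        candidates = gray[i0:i1, j0:j1][labels[i0:i1, j0:j1] == 2]
        if candidates.size:        # no background nearby: leave the pixel unchanged
            out[i, j] = rng.choice(candidates)
    return out
```

Candidates are drawn from the original image rather than from the partially filled output, so earlier replacements do not propagate. Setting lam to 0 reduces the labeling to a per-pixel argmin of the unary costs, while a larger lam removes isolated labels that disagree with all their neighbors.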
The proposed classification method based on conditional random fields was compared with a classification method based on the K-means clustering algorithm. In Fig. 2, from left to right, the first column shows the original images, the second column the results of the K-means clustering algorithm, and the third column the results of the invention based on the maximum matching method. The K-means results contain considerable noise, whereas the results of the invention have more accurate and smoother edges.
To evaluate the classification results objectively against the ground truth provided in the database, precision and recall are used as evaluation criteria, calculated as follows:
where the classification produced by the algorithm is compared with S_gt, the ground-truth classification. Fig. 3 shows that the proposed algorithm achieves higher recall than the K-means algorithm, although its precision is lower than that of K-means on some images. For scanned historical documents the foreground deserves more attention, so recall matters more than precision.
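Assuming the standard definitions (precision is the number of correctly labeled foreground pixels divided by the number of pixels predicted as foreground; recall is the same count divided by the number of ground-truth foreground pixels in S_gt), the evaluation can be sketched as follows. The function name and label encoding are hypothetical.

```python
import numpy as np

def precision_recall(pred, truth, label=0):
    """Pixel-level precision and recall for one class (default 0 = foreground)."""
    pred_mask = np.asarray(pred) == label
    true_mask = np.asarray(truth) == label
    tp = np.logical_and(pred_mask, true_mask).sum()   # correctly labeled pixels
    precision = tp / pred_mask.sum() if pred_mask.any() else 0.0
    recall = tp / true_mask.sum() if true_mask.any() else 0.0
    return float(precision), float(recall)
```

For example, precision_recall([0, 0, 0, 2], [0, 0, 1, 2]) gives a precision of 2/3 and a recall of 1.0: one bleed-through pixel was wrongly kept as foreground, but no foreground stroke was lost. This is the trade-off the text favors.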
Fig. 4 shows the final bleed-through removal results. The comparison shows that the proposed algorithm protects the foreground better than the K-means algorithm. Even when the input image has high contrast and large gray-level variations, the proposed algorithm still removes bleed-through effectively.

Claims (3)

1. A conditional-random-field-based method for the blind removal of back-side bleed-through from scanned images of ancient books, comprising the following steps:
1) dividing scanned images with known class labels into foreground, back-side bleed-through, and background parts; establishing conditional probability distribution models for the foreground, bleed-through, and background of the image; and obtaining approximating functions for the three parts;
2) using the approximating functions obtained in step 1), taking a text image with unknown class labels as input, obtaining the foreground, bleed-through, and background parts of the image with a K-means clustering algorithm, and computing the mean gray values and variances of the foreground, bleed-through, and background parts;
3) building a conditional random field model of the input image and refining the classification of the image using the mean gray values and variances obtained in step 2), to obtain the bleed-through part; specifically: the conditional random field model comprises a hidden Markov random field and an observation field, the value of each observation node is the pixel gray value, the value of each hidden Markov node is the class label, and the optimal solution is approximated with a belief propagation algorithm to obtain the bleed-through region;
4) repairing the classified image by removing the bleed-through part, to obtain the final image without bleed-through; specifically: using a random filling algorithm, a background pixel gray value is randomly selected from the neighborhood of each bleed-through pixel to replace its original gray value, thereby removing the bleed-through region.
2. The method according to claim 1, characterized in that step 1) specifically comprises approximating the conditional probability distributions of the foreground and background with logistic functions, as follows:
where s is the class label, d is the gray value, (u0, u2) are the center location factors, and (σ0, σ2) are the shape factors;
and approximating the conditional probability distribution of the bleed-through part with a Gaussian function, as follows:
where the Gaussian has an amplitude factor, and u1 and σ1 are its center factor and shape factor, respectively.
3. The method according to claim 1, characterized in that step 2) specifically comprises: for an input image with unknown class labels, taking the most frequent gray value as the mean gray value c2 of the background component, and estimating the variance of the background component as follows:
where N is the total number of pixels in the input image, 1{f} is the indicator function, equal to 1 when f > 0 and 0 otherwise, and I_j and I_k denote the j-th and k-th pixels of image I;
subtracting the background component from the histogram of the whole input image according to the estimated mean gray value and variance of the background component; then determining with Otsu's method, from the remaining histogram, the gray threshold that separates the foreground component from the bleed-through component; and computing the mean gray values of the foreground and bleed-through components from the thresholded result.
CN201510168613.0A 2015-04-13 2015-04-13 Conditional-random-field-based method for blind removal of back-side bleed-through from scanned images of ancient books Expired - Fee Related CN104867114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510168613.0A CN104867114B (en) 2015-04-13 2015-04-13 Conditional-random-field-based method for blind removal of back-side bleed-through from scanned images of ancient books

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510168613.0A CN104867114B (en) 2015-04-13 2015-04-13 Conditional-random-field-based method for blind removal of back-side bleed-through from scanned images of ancient books

Publications (2)

Publication Number Publication Date
CN104867114A CN104867114A (en) 2015-08-26
CN104867114B true CN104867114B (en) 2018-01-09

Family

ID=53912931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510168613.0A Expired - Fee Related CN104867114B (en) 2015-04-13 2015-04-13 Conditional-random-field-based method for blind removal of back-side bleed-through from scanned images of ancient books

Country Status (1)

Country Link
CN (1) CN104867114B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631837A (en) * 2015-10-27 2016-06-01 江苏思曼特信用管理有限公司 Text image back penetration removing method
CN106023090A (en) * 2016-03-17 2016-10-12 陈于中 Method of eliminating displayed back side contents in shot page image
US10740644B2 (en) * 2018-02-27 2020-08-11 Intuit Inc. Method and system for background removal from documents

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1687961A (en) * 2005-04-19 2005-10-26 浙江大学 Computerized truth identifying method for traditional Chinese painting
US8194965B2 (en) * 2007-11-19 2012-06-05 Parascript, Llc Method and system of providing a probability distribution to aid the detection of tumors in mammogram images
CN103530405A (en) * 2013-10-23 2014-01-22 天津大学 Image retrieval method based on layered structure

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
A Ground Truth Bleed-Through Document Image Database; Roisín Rowley-Brooke et al.; International Conference on Theory and Practice of Digital Libraries, Springer-Verlag; 2012-12-31; pp. 1-4 *
A Markov Model for Blind Image Separation by a Mean-Field EM Algorithm; Anna Tonazzini et al.; IEEE Transactions on Image Processing; 2006-02-28; vol. 15, no. 2, pp. 473-482 *
Document Ink Bleed-Through Removal with Two Hidden Markov Random Fields and a Single Observation Field; Christian Wolf; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2010-03-31; vol. 32, no. 3; abstract, chapters 1-3, chapter 5, sections 6.1 and 6.2 *
Improving recto document side restoration with an estimation of the verso side from a single scanned page; Christian Wolf; International Conference on Pattern Recognition; 2009-12-31; pp. 185-196 *
Independent component analysis for document restoration; Anna Tonazzini et al.; International Journal on Document Analysis and Recognition (IJDAR); 2004-04-30; vol. 7, no. 1, pp. 17-27 *
Scanned Image Descreening With Image Redundancy and Adaptive Filtering; Bin Sun et al.; IEEE Transactions on Image Processing; 2014-08-31; vol. 23, no. 8, pp. 3698-3710 *
Text segmentation in complex-background images based on conditional random fields (基于条件随机场的复杂背景图像文字分割); Wang Jiaxin et al. (王佳鑫 等); Modern Electronics Technique (现代电子技术); 2011-04-01; vol. 34, no. 7; section 0 (introduction), section 1 *
Scene description with conditional random field models (条件随机场模型的场景描述); Zhao Long et al. (赵龙 等); Journal of Image and Graphics (中国图象图形学报); 2013-03-16; vol. 18, no. 3, pp. 271-276 *

Also Published As

Publication number Publication date
CN104867114A (en) 2015-08-26

Similar Documents

Publication Publication Date Title
CN108562589B (en) Method for detecting surface defects of magnetic circuit material
CN111145209B (en) Medical image segmentation method, device, equipment and storage medium
Poletti et al. A review of thresholding strategies applied to human chromosome segmentation
CN109241973B (en) Full-automatic soft segmentation method for characters under texture background
CN103473739A (en) White blood cell image accurate segmentation method and system based on support vector machine
CN111340824B (en) Image feature segmentation method based on data mining
WO2022012110A1 (en) Method and system for recognizing cells in embryo light microscope image, and device and storage medium
CN106127817B (en) A kind of image binaryzation method based on channel
CN110189383B (en) Traditional Chinese medicine tongue color and fur color quantitative analysis method based on machine learning
CN110738160A (en) human face quality evaluation method combining with human face detection
CN109035274A (en) File and picture binary coding method based on background estimating Yu U-shaped convolutional neural networks
CN109598681B (en) No-reference quality evaluation method for image after repairing of symmetrical Thangka
CN109002755A (en) Age estimation model building method and estimation method based on facial image
CN110728302A (en) Method for identifying color textile fabric tissue based on HSV (hue, saturation, value) and Lab (Lab) color spaces
CN106529432A (en) Hand area segmentation method deeply integrating significance detection and prior knowledge
CN104021566A (en) GrabCut algorithm-based automatic segmentation method of tongue diagnosis image
CN107895379A (en) The innovatory algorithm of foreground extraction in a kind of video monitoring
CN104867114B (en) Conditional-random-field-based method for blind removal of back-side bleed-through from scanned images of ancient books
Shaker et al. Automatic detection and segmentation of sperm head, acrosome and nucleus in microscopic images of human semen smears
CN107705323A (en) A kind of level set target tracking method based on convolutional neural networks
CN103268492B (en) A kind of corn grain type identification method
CN113592783A (en) Method and device for accurately quantifying basic indexes of cells in corneal confocal image
CN109712095B (en) Face beautifying method with rapid edge preservation
CN104268845A (en) Self-adaptive double local reinforcement method of extreme-value temperature difference short wave infrared image
CN110874835A (en) Crop leaf disease resistance identification method and system, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180109

Termination date: 20180413