CN104867114B - Blind bleed-through removal method for scanned images of ancient books based on a conditional random field - Google Patents
Blind bleed-through removal method for scanned images of ancient books based on a conditional random field
- Publication number
- CN104867114B (application CN201510168613.0A; earlier publication CN104867114A)
- Authority
- CN
- China
- Prior art keywords
- back side
- image
- background
- side infiltration
- random field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a blind method, based on a conditional random field, for removing back-side bleed-through from scanned text images of ancient books. It comprises the following steps: first, a probability distribution model of the text image is established, in which the image is divided into three components (foreground, bleed-through, and background), approximating functions for the gray-level histograms of the three components are obtained, and their parameters are estimated using the K-means algorithm; then a conditional random field model is built to classify the input image finely, and the bleed-through part is identified using a belief propagation algorithm; finally, the image is repaired with a random filling algorithm, yielding a scanned text image free of bleed-through. By combining a conditional random field with a random-filling repair algorithm, the invention preserves the foreground of the text image well and removes the bleed-through effectively. This markedly improves the visual quality of scanned text images, addresses problems in displaying and printing scanned historical texts, and is of high practical value.
Description
Technical field
The present invention relates to a method for processing text images, and more particularly to a blind method, based on a conditional random field, for removing back-side bleed-through from scanned images of ancient books.
Background technology
Because ancient texts are rare and precious, the modern way to protect them is often to digitize them and make them available to researchers as scanned images. Owing to paper quality or long periods of storage, many ancient books written or printed on both sides exhibit bleed-through, in which ink from one side of the sheet seeps through and shows on the other side. This makes the content of the text hard to read and also spoils the visual appeal of some rare manuscripts.
Many bleed-through removal methods have been proposed to address this problem. At present they fall broadly into two classes: blind methods and non-blind methods. Non-blind methods require accurately registered scans of both the front and the back of a page. Because automatically registering the two sides is still difficult, this kind of work usually involves a great deal of manual effort. By contrast, blind methods need only the image of one side of the sheet and so avoid the registration problem.
A. Tonazzini et al. proposed using blind source separation: the input image is treated as a mixture of foreground, bleed-through, and background signals, and independent component analysis is used to try to recover the three parts. Because that approach needs signals of the same object captured by different sensors, it requires color scans. The same authors also proposed solving the blind source separation problem with a Markov random field and the EM algorithm. Departing from the signal-separation view, C. Wolf treated bleed-through removal as an image segmentation problem and proposed a method based on a Markov random field with two hidden layers and a single observation field. That method alternately updates the values of the two Markov random fields with a max-flow algorithm until it converges to the final segmentation. However, such algorithms are computationally too expensive to meet the requirements of some practical applications.
The content of the invention
To solve the above technical problems in blind bleed-through removal for scanned images of ancient books, the present invention provides a blind bleed-through removal method for scanned images of ancient books based on a conditional random field. The invention effectively removes the bleed-through in scanned text images while better preserving the integrity of the text foreground, thereby improving the readability of the text image.
The technical solution by which the invention solves the above problems comprises the following steps:
1) A scanned image with known class labels is divided into a foreground part, a bleed-through part, and a background part; conditional probability distribution models of the three parts are established, and approximating functions for the foreground, bleed-through, and background are obtained;
2) Based on the approximating functions obtained in step 1), a text image with unknown class labels is taken as input; a K-means-based clustering algorithm is used to obtain its foreground part, bleed-through part, and background part, and the mean gray level and variance of each part are calculated;
3) A conditional random field model is built for the input image, and the image is finely classified according to the mean gray levels and variances obtained in step 2), yielding the bleed-through part;
4) The classified image is repaired by removing the bleed-through part, giving the final image free of bleed-through.
The technical effects of the invention are as follows. The invention divides an ancient text image into three distinct parts and establishes conditional probability distribution models for them. After a preliminary classification by the K-means algorithm, the parameters of the three components are estimated. On this basis, a conditional random field is built for the input image, and the class label of each pixel is determined by a belief propagation algorithm. Finally, a random filling algorithm repairs the bleed-through regions, achieving blind removal of the bleed-through in the image. The invention effectively removes the bleed-through in the image, preserves the foreground well, and greatly improves the readability of the image.
Brief description of the drawings
Fig. 1 is the processing flowchart of the present invention;
Fig. 2 compares the classification results of different classification methods on scanned text images; from left to right, the first column is the original image, the second column is the result of the K-means clustering algorithm, and the third column is the result of the present invention based on the maximum matching method;
Fig. 3 compares the foreground pixel classification precision (accuracy rate) and recall of different classification methods;
Fig. 4 compares the results of different bleed-through removal methods on scanned text images; from left to right, the first column is the original image, the second column is the result of K-means clustering combined with the random filling algorithm, and the third column is the result of the present invention.
Embodiment
Fig. 1 is the processing flowchart of the present invention. As shown, the invention first establishes a probability distribution model of the text image, dividing the image into three components (foreground, bleed-through, and background), obtains approximating functions for the gray-level histograms of the three, and estimates their parameters using the K-means algorithm. It then builds a conditional random field model to classify the input image finely, identifies the bleed-through part using a belief propagation algorithm, and finally repairs the image with a random filling algorithm to obtain the final scanned text image free of bleed-through.
The input is a grayscale image of an ancient book with bleed-through, and the output is a grayscale image of the ancient book without bleed-through. The detailed steps of the invention are as follows:
1. A scanned image with known class labels is divided into a foreground part, a bleed-through part, and a background part, and conditional probability distribution models of the three components are established. The gray-level histograms of the whole image, the foreground part, the bleed-through part, and the background part are denoted H, H_fg, H_bt, and H_bg, respectively. The conditional probability distribution of each component is then given by formulas (1), (2), and (3).
Here s is the class label and d is the gray value. P(s=0|d), P(s=1|d), and P(s=2|d) are the conditional probability distributions of the foreground, bleed-through, and background parts, respectively. Logistic functions are chosen to model the conditional probability distributions of the foreground and background parts, and a Gaussian function is chosen to model that of the bleed-through part.
In these approximating functions, the Gaussian function has an amplitude factor, (u_0, u_1, u_2) are the center factors, and (σ_0, σ_1, σ_2) are the shape factors.
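The formulas of step 1 are not reproduced in this text, so the following is only a minimal sketch under stated assumptions. It assumes that formulas (1) to (3) amount to dividing each component histogram by the whole-image histogram, and it assumes one common parameterization of the logistic and Gaussian approximating functions. The function names, the `n_levels` parameter, and the exact functional forms are illustrative and do not come from the patent.

```python
import numpy as np

def empirical_posteriors(gray, labels, n_levels=256):
    """Return a (3, n_levels) array whose row s estimates P(s | d).

    Assumption: P(s | d) is the component histogram divided by the
    whole-image histogram (labels: 0 foreground, 1 bleed-through, 2 background).
    """
    gray = np.asarray(gray).ravel()
    labels = np.asarray(labels).ravel()
    h_all = np.bincount(gray, minlength=n_levels).astype(float)
    post = np.zeros((3, n_levels))
    for s in range(3):
        h_s = np.bincount(gray[labels == s], minlength=n_levels)
        # Gray levels that never occur keep probability 0 (no division by zero).
        np.divide(h_s, h_all, out=post[s], where=h_all > 0)
    return post

def logistic(d, u, sigma):
    """Assumed logistic form with center u and shape sigma.

    Decreasing in d when sigma > 0 (dark foreground),
    increasing in d when sigma < 0 (bright background).
    """
    return 1.0 / (1.0 + np.exp((np.asarray(d, dtype=float) - u) / sigma))

def gaussian(d, a, u, sigma):
    """Assumed Gaussian form with amplitude a, center u and shape sigma."""
    d = np.asarray(d, dtype=float)
    return a * np.exp(-((d - u) ** 2) / (2.0 * sigma ** 2))
```

With a labeled training scan, `empirical_posteriors` gives the curves that the logistic and Gaussian functions are meant to approximate; the sign convention for `sigma` is a design choice of this sketch that lets one function serve both the foreground and the background.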
2. For an input image with unknown class labels, the gray value that occurs most often is taken as the mean gray level c_2 of the background component, and the variance of the background component is then estimated with formula (7) from the pixels whose gray value is greater than or equal to that mean.
Here N is the total number of pixels in the input image, and 1{f} is the indicator function, which equals 1 when the expression f > 0 and 0 otherwise; I_j and I_k denote the j-th and k-th pixels of image I, respectively.
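Formula (7) is not reproduced in this text. As a hedged illustration of step 2, the sketch below assumes the variance estimate is the mean squared deviation from c_2 taken over only those pixels at or above c_2, which is one plausible reading of the indicator-function description. The function name and the `n_levels` parameter are hypothetical.

```python
import numpy as np

def estimate_background(gray, n_levels=256):
    """Estimate the background mean c2 (histogram mode) and its variance.

    Assumption: the variance is a one-sided estimate that uses only
    pixels with gray value >= c2.
    """
    gray = np.asarray(gray).ravel()
    hist = np.bincount(gray, minlength=n_levels)
    c2 = int(np.argmax(hist))                  # most frequent gray value
    upper = gray[gray >= c2].astype(float)     # pixels at or above the mode
    var2 = float(np.mean((upper - c2) ** 2))   # second moment about c2
    return c2, var2
```

Using only the side at or above the mode is presumably meant to keep darker foreground and bleed-through pixels from inflating the background variance.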
3. Using the estimated mean gray level and variance of the background component, the background component is subtracted from the histogram of the whole input image. Otsu's method is then applied to the remaining histogram to determine the gray threshold that separates the foreground component from the bleed-through component, and the mean gray levels of the foreground and bleed-through components, denoted c_0 and c_1 respectively, are calculated from the resulting partition.
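Step 3 can be sketched as follows, reading the thresholding step as Otsu's method applied to a histogram. How the background component is subtracted is not specified in the text, so `subtract_background` assumes a Gaussian bump scaled to the histogram peak; that choice, and all function names, are assumptions made for illustration.

```python
import numpy as np

def subtract_background(hist, c2, var2):
    """Remove an assumed Gaussian background bump (scaled to the peak at c2)."""
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(hist.size)
    model = hist[c2] * np.exp(-((levels - c2) ** 2) / (2.0 * max(var2, 1e-6)))
    return np.clip(hist - model, 0.0, None)

def otsu_threshold(hist):
    """Return the level t that maximizes between-class variance (class 0: levels <= t)."""
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(hist.size)
    w0 = np.cumsum(hist)                # weight of class 0 for each candidate t
    w1 = w0[-1] - w0                    # weight of class 1
    m0 = np.cumsum(hist * levels)       # unnormalized first moment of class 0
    valid = (w0 > 0) & (w1 > 0)         # both classes must be non-empty
    between = np.zeros(hist.size)
    mu0 = m0[valid] / w0[valid]
    mu1 = (m0[-1] - m0[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return int(np.argmax(between))

def split_means(hist, t):
    """Mean gray level of the dark class (c0) and the lighter class (c1).

    Both sides of the threshold are assumed to be non-empty.
    """
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(hist.size)
    lo, hi = hist[: t + 1], hist[t + 1:]
    c0 = float((lo * levels[: t + 1]).sum() / lo.sum())
    c1 = float((hi * levels[t + 1:]).sum() / hi.sum())
    return c0, c1
```

A typical call chain would be `residual = subtract_background(hist, c2, var2)`, then `t = otsu_threshold(residual)`, then `c0, c1 = split_means(residual, t)`.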
4. From the mean gray levels and variances of the three components obtained in the steps above, the conditional probability model of the bleed-through component can be determined directly. By fixing the probability values of the foreground conditional probability model at the foreground mean and at the bleed-through mean, the two parameters of the foreground conditional probability model can be calculated with formulas (8) and (9). Similarly, by fixing the probability values of the background conditional probability model at the bleed-through mean and at the background mean, the two parameters of the background conditional probability model can be calculated with formulas (10) and (11).
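Formulas (8) to (11) are not reproduced in this text. The idea of step 4, fixing a logistic curve by its value at two gray levels, can still be worked through under an assumed parameterization P(d) = 1 / (1 + exp((d - u) / σ)). Taking the logit of both fixed values gives two linear equations in u and σ. The function names and the probability values 0.9 and 0.1 in the usage note are hypothetical.

```python
import math

def fit_logistic(d_a, p_a, d_b, p_b):
    """Solve P(d) = 1 / (1 + exp((d - u) / sigma)) through two fixed points.

    ln((1 - p) / p) = (d - u) / sigma holds at both points, so
    sigma = (d_a - d_b) / (z_a - z_b) and u = d_a - sigma * z_a.
    """
    z_a = math.log((1.0 - p_a) / p_a)
    z_b = math.log((1.0 - p_b) / p_b)
    sigma = (d_a - d_b) / (z_a - z_b)
    u = d_a - sigma * z_a
    return u, sigma

def logistic(d, u, sigma):
    """Assumed logistic form used by fit_logistic."""
    return 1.0 / (1.0 + math.exp((d - u) / sigma))
```

For example, `fit_logistic(40, 0.9, 120, 0.1)` pins a foreground curve to 0.9 at a foreground mean of 40 and to 0.1 at a bleed-through mean of 120; by symmetry of the two logits the resulting center lies midway, at gray level 80.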
5. A conditional random field model is built for the input image. The model comprises a hidden Markov random field and an observation field, so that each pixel corresponds to one observation node and one hidden node. The value of an observation node is the pixel's gray value, and the value of a hidden node is one of 0 (foreground), 1 (bleed-through), or 2 (background), determined by solving the following optimization function:
Here s_i is the hidden node to be estimated, D is the observation field, and S is the hidden class field. According to the Markov property of the field:
Here d_i is the value of the observation node corresponding to s_i, and N_i is the set of nodes neighboring s_i in the hidden field. A belief propagation algorithm is used to approximate the optimal solution of this problem. (For implementation details of belief propagation, see J. S. Yedidia, W. T. Freeman and Y. Weiss, "Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms," IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2282-2312, Jul. 2005.)
6. Based on the image classified by the conditional random field, the bleed-through part is repaired with a random filling algorithm. The original value of each bleed-through pixel is replaced with a pixel value selected at random from its neighborhood, which removes the bleed-through:
Here R is the random selection function, W_k is the local neighborhood of the bleed-through pixel being processed, and D_bg is the set of background pixels.
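Steps 5 and 6 can be sketched together as below. This is not the patent's exact procedure: the text does not give the pairwise term of the random field, so the sketch assumes a Potts smoothness cost `beta` on a 4-connected grid and uses plain min-sum loopy belief propagation rather than the generalized variant in the cited paper. The unary cost is assumed to be something like the negative log of the step 4 probabilities; `beta`, `n_iters`, `radius`, and all function names are hypothetical.

```python
import numpy as np

def shift(a, dy, dx, fill):
    """Copy of `a` moved by (dy, dx) on the first two axes; vacated cells get `fill`."""
    out = np.full_like(a, fill)
    h, w = a.shape[:2]
    ys, yd = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    xs, xd = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[yd, xd] = a[ys, xs]
    return out

def loopy_bp(unary, beta=1.0, n_iters=10):
    """Min-sum loopy BP; unary has shape (H, W, K) and holds per-label costs."""
    unary = np.asarray(unary, dtype=float)
    dirs = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    # msgs[d][y, x] is the message that arrived at (y, x) travelling in direction d.
    msgs = {d: np.zeros_like(unary) for d in dirs}
    for _ in range(n_iters):
        total = unary + sum(msgs.values())
        new = {}
        for dy, dx in dirs:
            # Exclude what the receiver itself sent (it travelled the opposite way).
            h = total - msgs[(-dy, -dx)]
            # Potts cost: keep the same label for free, or switch for beta.
            out = np.minimum(h, h.min(axis=2, keepdims=True) + beta)
            out -= out.min(axis=2, keepdims=True)   # normalize for stability
            new[(dy, dx)] = shift(out, dy, dx, 0.0)
        msgs = new
    belief = unary + sum(msgs.values())
    return belief.argmin(axis=2)

def random_fill(gray, labels, radius=2, rng=None):
    """Replace each bleed-through pixel (label 1) with a random nearby background value."""
    rng = np.random.default_rng() if rng is None else rng
    out = gray.copy()
    h, w = gray.shape
    for y, x in zip(*np.nonzero(labels == 1)):
        win = (slice(max(y - radius, 0), min(y + radius + 1, h)),
               slice(max(x - radius, 0), min(x + radius + 1, w)))
        candidates = gray[win][labels[win] == 2]   # background pixels in the window
        if candidates.size:                        # otherwise leave the pixel as is
            out[y, x] = rng.choice(candidates)
    return out
```

Sampling only from background pixels inside the window follows the claim wording (a background gray value chosen at random within the neighborhood); leaving a pixel unchanged when its window holds no background is a fallback added by this sketch.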
The classification method based on a conditional random field proposed by the invention is compared with the classification method based on the K-means clustering algorithm. In Fig. 2, from left to right, the first column is the original image, the second column is the result of the K-means clustering algorithm, and the third column is the result of the present invention based on the maximum matching method. It can be seen that the K-means result contains a great deal of noise, whereas the result of the invention has more correct and smoother edges.
To evaluate the classification results objectively against the ground truth provided in the database, the evaluation criteria are precision (accuracy rate) and recall, calculated as follows:
Here the estimated classification is the one produced by the classification algorithm, and S_gt is the ground-truth classification. Fig. 3 shows that the proposed algorithm achieves higher recall than the K-means algorithm, although its precision on some images is lower than that of K-means. For scanned images of historical texts the foreground deserves more attention, so recall matters more than precision.
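The evaluation formulas are not reproduced in this text. The sketch below assumes the standard set-based definitions for the foreground class: precision is the share of pixels predicted as foreground that are truly foreground, and recall is the share of true foreground pixels that were predicted as foreground. The function name and the `cls` parameter are hypothetical.

```python
import numpy as np

def precision_recall(pred, truth, cls=0):
    """Precision and recall of class `cls` (0 = foreground) for two label maps."""
    p = np.asarray(pred) == cls
    t = np.asarray(truth) == cls
    tp = np.count_nonzero(p & t)                      # correctly predicted pixels
    precision = tp / np.count_nonzero(p) if p.any() else 0.0
    recall = tp / np.count_nonzero(t) if t.any() else 0.0
    return precision, recall
```

A method that labels too many pixels as foreground keeps recall high at the cost of precision, which matches the trade-off described for Fig. 3.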
Fig. 4 shows the final bleed-through removal results. The comparison shows that the proposed algorithm protects the foreground better than the K-means algorithm. Even when the input image has large variations in contrast and gray level, the proposed algorithm still removes the bleed-through effectively.
Claims (3)
1. A blind bleed-through removal method for scanned images of ancient books based on a conditional random field, comprising the following steps:
1) dividing a scanned image with known class labels into a foreground part, a bleed-through part, and a background part, establishing conditional probability distribution models of the three parts, and obtaining approximating functions for the foreground, bleed-through, and background;
2) based on the approximating functions obtained in step 1), taking a text image with unknown class labels as input, using a K-means-based clustering algorithm to obtain its foreground part, bleed-through part, and background part, and calculating the mean gray level and variance of each part;
3) building a conditional random field model for the input image, and finely classifying the image according to the mean gray levels and variances obtained in step 2) to obtain the bleed-through part; specifically: the conditional random field model comprises a hidden Markov random field and an observation field, the value of an observation node is the pixel's gray value, the value of a hidden Markov node is the class label, and a belief propagation algorithm is used to approximate the optimal solution, yielding the bleed-through region;
4) repairing the classified image by removing the bleed-through part to obtain the final image free of bleed-through; specifically: using a random filling algorithm, a background pixel gray value is selected at random within the neighborhood of each bleed-through pixel and substituted for the original gray value, removing the bleed-through region.
2. The blind bleed-through removal method for scanned images of ancient books based on a conditional random field according to claim 1, characterized in that step 1) specifically comprises: approximating the foreground and background conditional probability distributions with logistic functions, whose probability distributions are as follows:
where s is the class label, d is the gray value, (u_0, u_2) are the center position factors, and (σ_0, σ_2) are the shape factors;
the conditional probability distribution of the bleed-through part is approximated with a Gaussian function as follows:
where the Gaussian function has an amplitude factor, and u_1 and σ_1 are the center factor and shape factor, respectively.
3. The blind bleed-through removal method for scanned images of ancient books based on a conditional random field according to claim 1, characterized in that step 2) specifically comprises: for an input image with unknown class labels, taking the most frequent gray value as the mean gray level c_2 of the background component, and estimating the variance of the background component as follows:
where N is the total number of pixels in the input image, and 1{f} is the indicator function, which equals 1 when the expression f > 0 and 0 otherwise; I_j and I_k denote the j-th and k-th pixels of image I, respectively;
using the estimated mean gray level and variance of the background component, subtracting the background component from the histogram of the whole input image, then applying Otsu's method to the remaining histogram to determine the gray threshold separating the foreground component from the bleed-through component, and calculating the mean gray levels of the foreground and bleed-through components from the resulting partition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510168613.0A CN104867114B (en) | 2015-04-13 | 2015-04-13 | Blind bleed-through removal method for scanned images of ancient books based on a conditional random field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510168613.0A CN104867114B (en) | 2015-04-13 | 2015-04-13 | Blind bleed-through removal method for scanned images of ancient books based on a conditional random field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104867114A CN104867114A (en) | 2015-08-26 |
CN104867114B true CN104867114B (en) | 2018-01-09 |
Family
ID=53912931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510168613.0A Expired - Fee Related CN104867114B (en) | Blind bleed-through removal method for scanned images of ancient books based on a conditional random field | 2015-04-13 | 2015-04-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104867114B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631837A (en) * | 2015-10-27 | 2016-06-01 | 江苏思曼特信用管理有限公司 | Text image back penetration removing method |
CN106023090A (en) * | 2016-03-17 | 2016-10-12 | 陈于中 | Method of eliminating displayed back side contents in shot page image |
US10740644B2 (en) * | 2018-02-27 | 2020-08-11 | Intuit Inc. | Method and system for background removal from documents |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1687961A (en) * | 2005-04-19 | 2005-10-26 | 浙江大学 | Computerized truth identifying method for traditional Chinese painting |
US8194965B2 (en) * | 2007-11-19 | 2012-06-05 | Parascript, Llc | Method and system of providing a probability distribution to aid the detection of tumors in mammogram images |
CN103530405A (en) * | 2013-10-23 | 2014-01-22 | 天津大学 | Image retrieval method based on layered structure |
-
2015
- 2015-04-13 CN CN201510168613.0A patent/CN104867114B/en not_active Expired - Fee Related
Non-Patent Citations (8)
Title |
---|
A Ground Truth Bleed-Through Document Image Database;Roisın Rowley-Brooke 等;《International Conference on Theory and Practice of Digital Libraries. Springer-Verlag》;20121231;1-4 * |
A Markov Model for Blind Image Separation by a Mean-Field EM Algorithm;Anna Tonazzini 等;《IEEE TRANSACTIONS ON IMAGE PROCESSING》;20060228;第15卷(第2期);473-482 * |
Document Ink Bleed-Through Removal with Two Hidden Markov Random Fields and a Single Observation Field;Christian Wolf;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20100331;第32卷(第3期);摘要,第1-3章,第5章,第6.1节,第6.2节 * |
Improving recto document side restoration with an estimation of the verso side from a single scanned page;Christian Wolf;《International Conference on Pattern Recognition》;20091231;185-196 * |
Independent component analysis for document restoration;Anna Tonazzini 等;《International Journal on Document Analysis and Recognition (IJDAR)》;20040430;第7卷(第1期);17-27 * |
Scanned Image Descreening With Image Redundancy and Adaptive Filtering;Bin Sun 等;《IEEE TRANSACTIONS ON IMAGE PROCESSING》;20140831;第23卷(第8期);3698-3710 * |
基于条件随机场的复杂背景图像文字分割;王佳鑫 等;《现代电子技术》;20110401;第34卷(第7期);0引言,第1节 * |
条件随机场模型的场景描述;赵龙 等;《中国图象图形学报》;20130316;第18卷(第3期);271-276 * |
Also Published As
Publication number | Publication date |
---|---|
CN104867114A (en) | 2015-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108562589B (en) | Method for detecting surface defects of magnetic circuit material | |
CN111145209B (en) | Medical image segmentation method, device, equipment and storage medium | |
Poletti et al. | A review of thresholding strategies applied to human chromosome segmentation | |
CN109241973B (en) | Full-automatic soft segmentation method for characters under texture background | |
CN103473739A (en) | White blood cell image accurate segmentation method and system based on support vector machine | |
CN111340824B (en) | Image feature segmentation method based on data mining | |
WO2022012110A1 (en) | Method and system for recognizing cells in embryo light microscope image, and device and storage medium | |
CN106127817B (en) | A kind of image binaryzation method based on channel | |
CN110189383B (en) | Traditional Chinese medicine tongue color and fur color quantitative analysis method based on machine learning | |
CN110738160A (en) | human face quality evaluation method combining with human face detection | |
CN109035274A (en) | File and picture binary coding method based on background estimating Yu U-shaped convolutional neural networks | |
CN109598681B (en) | No-reference quality evaluation method for image after repairing of symmetrical Thangka | |
CN109002755A (en) | Age estimation model building method and estimation method based on facial image | |
CN110728302A (en) | Method for identifying color textile fabric tissue based on HSV (hue, saturation, value) and Lab (Lab) color spaces | |
CN106529432A (en) | Hand area segmentation method deeply integrating significance detection and prior knowledge | |
CN104021566A (en) | GrabCut algorithm-based automatic segmentation method of tongue diagnosis image | |
CN107895379A (en) | The innovatory algorithm of foreground extraction in a kind of video monitoring | |
CN104867114B (en) | Blind bleed-through removal method for scanned images of ancient books based on a conditional random field | |
Shaker et al. | Automatic detection and segmentation of sperm head, acrosome and nucleus in microscopic images of human semen smears | |
CN107705323A (en) | A kind of level set target tracking method based on convolutional neural networks | |
CN103268492B (en) | A kind of corn grain type identification method | |
CN113592783A (en) | Method and device for accurately quantifying basic indexes of cells in corneal confocal image | |
CN109712095B (en) | Face beautifying method with rapid edge preservation | |
CN104268845A (en) | Self-adaptive double local reinforcement method of extreme-value temperature difference short wave infrared image | |
CN110874835A (en) | Crop leaf disease resistance identification method and system, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20180109 Termination date: 20180413 |