CN103024246B - Documentary archive image compressing method - Google Patents

Documentary archive image compressing method

Info

Publication number
CN103024246B
Authority
CN
China
Prior art keywords
image
compression
ll
run
archives
Prior art date
Application number
CN201210496941.XA
Other languages
Chinese (zh)
Other versions
CN103024246A (en)
Inventor
吕岳
刘丽
Original Assignee
华东师范大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华东师范大学
Priority to CN201210496941.XA
Publication of CN103024246A
Application granted
Publication of CN103024246B

Abstract

The present invention discloses a method for compressing images of documentary archives. The method comprises: preprocessing, which processes the original image to remove noise introduced during scanning; ternarization, which processes the preprocessed image so that it is represented using three kinds of color information; and LL compression coding, which applies lossless compression coding to the ternarized image. The invention retains all useful information in the document image while eliminating redundancy, and thereby obtains a relatively high compression ratio.

Description

A documentary archive image compression method

Technical Field

[0001] The present invention belongs to the field of document image processing, and in particular relates to a method for compressing images of documentary archives.

Background Art

[0002] To meet the demands of informatization, large numbers of paper documentary archives are scanned and stored in electronic form. On the one hand, this avoids the soiling and damage caused by improper storage; on the other hand, existing information technology makes it easy to manage and search electronic archives, which greatly saves manpower and material resources. Scanned paper archives are usually stored on a computer as color images, so how to compress these images to save storage space is a problem in urgent need of a solution.

[0003] In recent years, many researchers have studied image compression and proposed a variety of compression coding methods, which fall mainly into lossless and lossy compression coding. With lossless compression coding, decoding the compressed data yields data identical to the original. Commonly used lossless algorithms include the Huffman algorithm and the LZW (Lempel-Ziv & Welch) algorithm. Lossy compression coding refers to methods in which the data obtained after compression and decompression differ from the original data but are very close to it, for example the widely used JPEG compression coding method. JPEG compression is based on the discrete cosine transform (DCT): the image is first divided into non-overlapping regions, and a DCT is applied to each region. The transformed coefficients are quantized according to a quantization table, the quantized coefficients are reordered by a zigzag scan, and run-length coding, arithmetic coding, or Huffman coding is then applied.

[0004] Documentary archives are generally used to provide important historical information and carry legal significance. This characteristic means that a lossless compression method should be used when choosing a coding scheme.

Summary of the Invention

[0005] The present invention overcomes defects of the prior art, such as the large storage space required for images and the image distortion caused by lossy compression, and proposes a documentary archive image compression method.

[0006] The present invention proposes a documentary archive image compression method comprising the following steps:

[0007] Preprocessing, in which the original image is processed to remove noise introduced during scanning;

[0008] Ternarization, in which the preprocessed image is processed so that it is represented using three kinds of color information; and

[0009] LL compression coding, which applies compression coding to the ternarized image.

[0010] The preprocessing step comprises the following sub-steps:

[0011] Step A1: convert the RGB color model to the HSI color model;

[0012] Step A2: in the HSI model, enhance the I component using a grayscale image enhancement method;

[0013] Step A3: convert the result back to the RGB model.

[0014] In the ternarization step, the three kinds of color information are red information, white information, and black information.

[0015] In the ternarization step:

[0016] Each pixel (i, j) in the image is scanned in turn. Let its RGB value be $(R_{ij}, G_{ij}, B_{ij})$; then:

[0017] If $R_{ij} < T_R$, $G_{ij} < T_G$, and $B_{ij} < T_B$, the current pixel is black, and $I_{ij} = 0$;

[0018] If $R_{ij} > \bar{T}_R$, $G_{ij} > \bar{T}_G$, and $B_{ij} > \bar{T}_B$, the current pixel is white, and $I_{ij} = 1$;

[0019] Otherwise, the current pixel is red, and $I_{ij} = 2$;

[0020]-[0021] where $(T_R, T_G, T_B)$ and $(\bar{T}_R, \bar{T}_G, \bar{T}_B)$ are empirical thresholds, and $I_{ij}$ is the color flag of each pixel after ternarization, which takes one of three values: {0, 1, 2}.

[0022] The empirical thresholds are $(T_R, T_G, T_B) = (20, 15, 30)$ and $(\bar{T}_R, \bar{T}_G, \bar{T}_B) = (210, 225, 220)$.

[0023] In the LL compression coding step, the image is described using runs, a run being a sequence of consecutive pixels of the same color within one row of the image.

[0024] In the LL compression coding step, an LL compressed file is generated, comprising: an LL file header, an image-data start flag, row-compressed data, and an image-data end flag.

[0025] The technical solution of the present invention is a run-based lossless LL compression coding method for images, proposed on the basis of the correlation between pixels in the ternarized image.

[0026] The invention applies lossless compression coding to the image, retaining all useful information in it and avoiding the loss of needed information through distortion after the image is restored.

[0027] Because documentary archive images consist mainly of three colors, the invention ternarizes the scanned image so that the processed image content better conforms to the specifications of documentary archives.

[0028] The invention uses a run-based lossless compression coding method that eliminates redundancy, giving the compressed image a higher compression ratio, reducing the storage space occupied, and facilitating the preservation of documentary archive images.

Brief Description of the Drawings

[0029] Fig. 1 shows the documentary archive image compression flow;

[0030] Fig. 2 shows a documentary archive image before and after preprocessing;

[0031] Fig. 3 shows a documentary archive image before and after ternarization;

[0032] Fig. 4 shows the run compression structure CS;

[0033] Fig. 5 shows the control structure BS;

[0034] Fig. 6 shows the row compression result for a documentary archive image.

Detailed Description

[0035] The present invention is described in further detail below with reference to the following specific embodiment and the accompanying drawings. Except where specifically mentioned below, the procedures, conditions, and experimental methods for implementing the invention are general knowledge and common knowledge in the art, and the invention places no particular restriction on them.

[0036] The documentary archive image compression method of the present invention comprises preprocessing, ternarization, and LL compression coding, as shown in Fig. 1.

[0037] The purpose of preprocessing is to process the original image: by enhancing the contrast of the picture, noise introduced by scanning is filtered out and the picture becomes clearer.

[0038] Ternarization converts the type of the preprocessed image. Because documentary archive images consist mostly of red, white, and black, every other color in the image is assigned by ternarization to one of these three, so that the image is represented in red, white, and black.

[0039] LL compression coding applies run-length compression to the ternarized image. Run-length coding is lossless and retains all information in the image. It also eliminates redundancy, achieves a high compression ratio, and reduces the storage space of the image file.

[0040] Embodiment:

[0041] Documentary archives are often affected by noise during scanning, which interferes with subsequent processing, so the image needs to be preprocessed. Preferably, the present invention uses color image enhancement to improve the visual quality of the image. Processing a color image first requires choosing a suitable color model. Common color models include RGB, YUV, and HSI. The most commonly used, RGB, is tied to display systems: computer monitors use RGB to display color. It is a mixing-based color model in which colors are obtained by mixing three primaries, red (R), green (G), and blue (B), in certain proportions. The RGB model is based on a Cartesian coordinate system whose three axes are R, G, and B. The HSI color model describes color by hue H, saturation S, and intensity I; hue and saturation mainly describe the color information, while intensity represents the strength of the light. The model has two characteristics: (1) the I component is independent of the image's color information; (2) the H and S components are closely tied to the way people perceive color and match the characteristics of human vision. In this embodiment, preprocessing is performed in the HSI color model.

[0042] The conversion between the RGB model and the HSI model is as follows:

[0043] (1) RGB to HSI

Figure CN103024246BD00051

[0047] (2) HSI to RGB

[0048] When H ∈ [0, 120]:

[0049] B = I(1-S)

Figure CN103024246BD00052

[0051] G = 3I-(B+R)

[0052] When H ∈ [120, 240]:

[0053] R = I(1-S)

Figure CN103024246BD00053

[0055] B = 3I-(R+G)

[0056] When H ∈ [240, 360]:

[0057] G = I(1-S)

Figure CN103024246BD00054

[0059] R = 3I-(G+B)
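
As a rough illustration of the two conversions, the following Python sketch implements the widely used textbook form of the RGB/HSI formulas for a single pixel with components in [0, 1]. The equation images are not reproduced in the text, so the exact form used in the patent is an assumption here. The function names `rgb_to_hsi` and `hsi_to_rgb` are hypothetical, and the half-open sector boundaries are a choice made for this sketch.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H in degrees, S, I)."""
    total = r + g + b
    i = total / 3.0
    s = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        # Gray pixel: hue is undefined, 0 is used here by convention.
        return 0.0, s, i
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return h, s, i


def hsi_to_rgb(h, s, i):
    """Convert (H in degrees, S, I) back to an (R, G, B) tuple."""
    sector, local = divmod(h % 360.0, 120.0)
    # One channel equals I(1-S), one follows the cosine formula,
    # and the third is whatever remains of 3I.
    low = i * (1.0 - s)
    high = i * (1.0 + s * math.cos(math.radians(local))
                / math.cos(math.radians(60.0 - local)))
    rest = 3.0 * i - (low + high)
    if sector == 0:      # H in [0, 120): B is the low channel
        return high, rest, low
    if sector == 1:      # H in [120, 240): R is the low channel
        return low, high, rest
    return rest, low, high  # H in [240, 360): G is the low channel
```

Converting a pixel to HSI and back should reproduce the original RGB values up to floating-point error, which is a convenient sanity check for any implementation of these formulas.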

[0060] The specific preprocessing steps are as follows:

[0061] Step A1: convert the RGB color model to the HSI color model, using the conversion formulas above.

[0062] Step A2: in the HSI model, enhance the I component using a grayscale image enhancement method.

[0063] Step A3: convert the result back to the RGB model. The converted RGB image has higher contrast than the original RGB image, is clearer and more eye-catching, and has more vivid colors.
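
Step A2 does not name a specific grayscale enhancement method. As one possible choice, the sketch below applies a linear contrast stretch to a list of I values in [0, 1]. Both the choice of method and the function name `stretch_intensity` are assumptions made for illustration, not the method prescribed by the patent.

```python
def stretch_intensity(values, low=0.0, high=1.0):
    """Linearly rescale intensity values so they span [low, high].

    `values` is a flat list of I components. A constant input is
    returned unchanged because there is no contrast to stretch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (high - low) / (hi - lo)
    return [low + (v - lo) * scale for v in values]
```

Only the I channel is modified; H and S are left untouched, so hue and saturation are preserved when the result is converted back to RGB in step A3.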

[0064] Fig. 2 shows a documentary archive image before and after preprocessing: the left image is before preprocessing and the right image is after. Preprocessing raises the contrast of the image, making it clearer and more eye-catching, with more vivid colors.

[0065] In a color image, a pixel is generally represented with 24 bits, so each pixel has $2^{24}$ possible colors. Documentary archive images, however, consist mainly of red, white, and black, so representing each pixel with 24 bits involves a great deal of redundancy. A ternarization method for documentary archive images is therefore proposed, which represents the image using three kinds of color information.

[0066] In this process, the image represented in the RGB model is converted into an image composed of red, white, and black. The color of each pixel in the RGB image is examined and converted to one of red, white, or black.

[0067] For example, for each pixel (i, j) in the image, define its color flag $I_{ij}$, where:

[0068] $I_{ij} = 0$ means pixel (i, j) is black

[0069] $I_{ij} = 1$ means pixel (i, j) is white

[0070] $I_{ij} = 2$ means pixel (i, j) is red

[0071] Unlike the original 24-bit representation, the color flag $I_{ij}$ has only three possible values, {0, 1, 2}, so at most 2 bits are needed to describe the pixel.
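
To make the 2-bit observation concrete, the sketch below packs four color flags into each byte, which is a 12-fold reduction relative to 24 bits per pixel before any run-length coding is applied. The bit order (first flag in the highest bits) and the names `pack_labels` and `unpack_labels` are assumptions for illustration; the patent's own format stores runs rather than packed pixels.

```python
def pack_labels(labels):
    """Pack color flags (each 0, 1, or 2) into bytes, four flags per byte."""
    out = bytearray()
    for start in range(0, len(labels), 4):
        byte = 0
        for k, label in enumerate(labels[start:start + 4]):
            byte |= label << (6 - 2 * k)  # first flag goes in the top two bits
        out.append(byte)
    return bytes(out)


def unpack_labels(data, count):
    """Recover the first `count` color flags from packed bytes."""
    labels = []
    for byte in data:
        for k in range(4):
            labels.append((byte >> (6 - 2 * k)) & 0b11)
    return labels[:count]
```

The `count` argument is needed because the last byte may be padded with zero bits when the number of pixels is not a multiple of four.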

[0072] The specific ternarization steps for a documentary archive image are as follows:

[0073] Each pixel (i, j) in the image is scanned in turn. Let its RGB value be $(R_{ij}, G_{ij}, B_{ij})$; then:

[0074] (1) If $R_{ij} < T_R$, $G_{ij} < T_G$, and $B_{ij} < T_B$, the current pixel is black, and $I_{ij} = 0$.

[0075] (2) If $R_{ij} > \bar{T}_R$, $G_{ij} > \bar{T}_G$, and $B_{ij} > \bar{T}_B$, the current pixel is white, and $I_{ij} = 1$.

[0076] (3) If neither (1) nor (2) is satisfied, the current pixel is red, and $I_{ij} = 2$.

[0077] Here $(T_R, T_G, T_B)$ and $(\bar{T}_R, \bar{T}_G, \bar{T}_B)$ are empirical thresholds. The values used in the present invention are $(T_R, T_G, T_B) = (20, 15, 30)$ and $(\bar{T}_R, \bar{T}_G, \bar{T}_B) = (210, 225, 220)$.

[0078] The left image in Fig. 3 is the documentary archive image before ternarization, and the right image is the image after ternarization. After ternarization, the color of every pixel in the image is one of red, black, or white.
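
The ternarization rule can be sketched in a few lines of Python. The thresholds are the empirical values stated in the patent, with the upper set taken from claim 4; the image representation (a list of rows of 8-bit RGB tuples) and the names `ternarize_pixel` and `ternarize` are assumptions made for this sketch.

```python
BLACK, WHITE, RED = 0, 1, 2

DARK_THRESHOLDS = (20, 15, 30)      # below all three: black
LIGHT_THRESHOLDS = (210, 225, 220)  # above all three: white


def ternarize_pixel(r, g, b):
    """Return the color flag (0 black, 1 white, 2 red) for one RGB pixel."""
    tr, tg, tb = DARK_THRESHOLDS
    if r < tr and g < tg and b < tb:
        return BLACK
    ur, ug, ub = LIGHT_THRESHOLDS
    if r > ur and g > ug and b > ub:
        return WHITE
    return RED  # everything that is neither black nor white


def ternarize(image):
    """Apply the rule to an image given as rows of (R, G, B) tuples."""
    return [[ternarize_pixel(*pixel) for pixel in row] for row in image]
```

Note that under this rule every pixel that is neither dark enough nor bright enough is labeled red, including mid-gray pixels, so the quality of the preprocessing step directly affects the result.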

[0079] Statistics gathered over a large number of ternarized documentary archive images show a strong correlation between pixels in an image. Suppose the previous pixel in the same row is red, that is, $I_{i(j-1)} = 2$; then the conditional probability $P(I_{ij} = 2 \mid I_{i(j-1)} = 2)$ that the current pixel is red satisfies the following inequalities:

[0080] $P(I_{ij} = 2 \mid I_{i(j-1)} = 2) > P(I_{ij} = 1 \mid I_{i(j-1)} = 2)$ and $P(I_{ij} = 2 \mid I_{i(j-1)} = 2) > P(I_{ij} = 0 \mid I_{i(j-1)} = 2)$. Similarly,

[0081] $P(I_{ij} = 1 \mid I_{i(j-1)} = 1) > P(I_{ij} = 2 \mid I_{i(j-1)} = 1)$ and $P(I_{ij} = 1 \mid I_{i(j-1)} = 1) > P(I_{ij} = 0 \mid I_{i(j-1)} = 1)$

[0082] $P(I_{ij} = 0 \mid I_{i(j-1)} = 0) > P(I_{ij} = 1 \mid I_{i(j-1)} = 0)$ and $P(I_{ij} = 0 \mid I_{i(j-1)} = 0) > P(I_{ij} = 2 \mid I_{i(j-1)} = 0)$
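
These conditional probabilities can be estimated empirically by counting horizontally adjacent pairs of color flags. The following is a minimal sketch, assuming the ternarized image is a list of rows of flags in {0, 1, 2}; the function name `transition_probabilities` is hypothetical.

```python
from collections import Counter


def transition_probabilities(labels):
    """Estimate P(current = c | previous = p) from rows of color flags.

    Returns a dict keyed by (p, c). Pairs that never occur are absent.
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for row in labels:
        for prev, cur in zip(row, row[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}
```

On images where same-color neighbors dominate, the entries with equal previous and current flags come out largest, which is the property that makes run-based coding effective.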

[0083] Based on this property, an LL compression coding method is proposed. The basic idea is to represent the image using runs, that is, sequences of consecutive pixels of the same color within a row, rather than individual isolated pixels. For documentary archive images, the number of runs is far smaller than the number of pixels, so describing the image by runs preserves all the information in the image while greatly reducing redundancy.
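
Extracting the runs of one row is straightforward. The sketch below uses `itertools.groupby` from the Python standard library; the function name `row_to_runs` and the (color, length) tuple representation are assumptions for illustration.

```python
from itertools import groupby


def row_to_runs(row):
    """Split a row of color flags into (color, run_length) pairs."""
    return [(color, len(list(group))) for color, group in groupby(row)]
```

For a row such as `[1, 1, 1, 0, 0, 2]` this yields three runs for six pixels; on real document rows, which are mostly long stretches of white, the reduction is far larger.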

[0084] To describe runs effectively, statistical analysis must be carried out on a large number of ternarized documentary archive images, covering, for example, the number of runs per row and the mean and variance of run lengths. On this basis, the run compression structure CS is defined,

[0085] as shown in Fig. 4, where each cell represents one bit. Dashed boxes are optional, depending on the run-length flag (LF). The meaning of each field in CS is given in Table 1.

[0086] The control structure BS is defined as shown in Fig. 5, where:

[0087] D6D5D4D3D2D1 = 000000 marks the start of the image data, denoted BSMGSTART.

[0088] D6D5D4D3D2D1 = 111111 marks the end of the image data, denoted BSMGEND.

[0089] D6D5D4D3D2D1 = 000001 marks the start of one row of data in the image, denoted BSROWSTART.

[0090] D6D5D4D3D2D1 = 000010 marks the end of one row of data in the image, denoted BSROWEND.

[0091] Table 1

Figure CN103024246BD00071

[0093] Using the run compression structure CS and the control structure BS, each row of data in the image is compressed in turn. The row compression result is shown in Fig. 6, where N denotes the number of runs in a row.
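
The row framing can be sketched at the token level as follows. The bit widths of the CS fields are defined in Table 1 and Fig. 4, which are not reproduced in the text, so this sketch deliberately avoids any bit layout: each CS entry is modeled as a hypothetical `("CS", color, length)` tuple and each BS marker as `("BS", code)`, using only the two row-level 6-bit codes given in the text. The function names are likewise assumptions.

```python
from itertools import groupby

BSROWSTART = 0b000001  # start of one row of data
BSROWEND = 0b000010    # end of one row of data


def encode_row(row):
    """Frame the runs of one row between row-start and row-end markers."""
    tokens = [("BS", BSROWSTART)]
    for color, group in groupby(row):
        tokens.append(("CS", color, sum(1 for _ in group)))
    tokens.append(("BS", BSROWEND))
    return tokens


def decode_row(tokens):
    """Rebuild the row of color flags from its token sequence."""
    row = []
    for token in tokens:
        if token[0] == "CS":
            _, color, length = token
            row.extend([color] * length)
    return row
```

Decoding an encoded row returns exactly the original flags, which illustrates why the scheme is lossless with respect to the ternarized image.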

[0094] Taking into account statistics such as the width and height of documentary archive images, a new compressed file format, LL, is defined, composed as shown in the following table:

[0095] Table 2

Figure CN103024246BD00072

[0097]

Figure CN103024246BD00081

[0098] After a documentary archive image composed of red, white, and black is compressed by run-length coding, the storage space it occupies is further reduced. Moreover, because LL compression coding (run-length coding) is a lossless method, it retains all information in the image while eliminating redundancy, achieving a relatively high compression ratio.

[0099] 1000 documentary archive images were selected at random and compressed with the method; the average compression ratio was 1:99. The storage space occupied before and after compression and the compression ratios for some of the images are shown in Table 3.

[0100] Table 3

Figure CN103024246BD00082

[0102] The scope of protection of the present invention is not limited to the above embodiment. Variations and advantages that a person skilled in the art can conceive of without departing from the spirit and scope of the inventive concept are all included in the present invention, and the scope of protection is defined by the appended claims.

Claims (5)

1. A documentary archive image compression method, characterized by comprising the following steps:
preprocessing, in which the original image is processed to remove noise introduced during scanning;
ternarization, in which the preprocessed image is processed so that it is represented using three kinds of color information; and
LL compression coding, which applies lossless compression coding to the ternarized image;
wherein, in the ternarization step: each pixel (i, j) in the image is scanned in turn; letting its RGB value be $(R_{ij}, G_{ij}, B_{ij})$, if $R_{ij} < T_R$, $G_{ij} < T_G$, and $B_{ij} < T_B$, the current pixel is black and $I_{ij} = 0$; if $R_{ij} > \bar{T}_R$, $G_{ij} > \bar{T}_G$, and $B_{ij} > \bar{T}_B$, the current pixel is white and $I_{ij} = 1$; otherwise, the current pixel is red and $I_{ij} = 2$; where $(T_R, T_G, T_B)$ and $(\bar{T}_R, \bar{T}_G, \bar{T}_B)$ are empirical thresholds, and $I_{ij}$ is the color flag of each pixel after ternarization, taking one of three values: {0, 1, 2};
in the LL compression coding step: the image is described using runs, a run being a sequence of consecutive pixels of the same color within one row of the image; in describing the image using runs, statistical analysis is performed on ternarized documentary archive images, and a compression structure and a control structure for runs are defined; using the run compression structure and the control structure, each row of data in the image is compressed in turn;
wherein the compression structure comprises a pixel flag, a run-length flag, and a run length; and the control structure comprises the following:
D6D5D4D3D2D1 = 000000 marks the start of the image data, denoted BSMGSTART;
D6D5D4D3D2D1 = 111111 marks the end of the image data, denoted BSMGEND;
D6D5D4D3D2D1 = 000001 marks the start of one row of data in the image, denoted BSROWSTART;
D6D5D4D3D2D1 = 000010 marks the end of one row of data in the image, denoted BSROWEND.
2. The documentary archive image compression method according to claim 1, characterized in that the preprocessing step comprises the following sub-steps: Step A1: convert the RGB color model to the HSI color model; Step A2: in the HSI model, enhance the I component using a grayscale image enhancement method; Step A3: convert the result back to the RGB model.
3. The documentary archive image compression method according to claim 1, characterized in that, in the ternarization step, the three kinds of color information are red information, white information, and black information.
4. The documentary archive image compression method according to claim 1, characterized in that the empirical thresholds are $(T_R, T_G, T_B) = (20, 15, 30)$ and $(\bar{T}_R, \bar{T}_G, \bar{T}_B) = (210, 225, 220)$.
5. The documentary archive image compression method according to claim 1, characterized in that an LL compressed file is generated in the LL compression coding step, the LL compressed file comprising: an LL file header, an image-data start flag, row-compressed data, and an image-data end flag.
CN201210496941.XA 2012-11-29 2012-11-29 Documentary archive image compressing method CN103024246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210496941.XA CN103024246B (en) 2012-11-29 2012-11-29 Documentary archive image compressing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210496941.XA CN103024246B (en) 2012-11-29 2012-11-29 Documentary archive image compressing method

Publications (2)

Publication Number Publication Date
CN103024246A CN103024246A (en) 2013-04-03
CN103024246B true CN103024246B (en) 2015-06-24

Family

ID=47972346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210496941.XA CN103024246B (en) 2012-11-29 2012-11-29 Documentary archive image compressing method

Country Status (1)

Country Link
CN (1) CN103024246B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9582847B2 (en) * 2013-04-22 2017-02-28 Intel Corporation Color buffer compression
CN106791866A (en) * 2015-11-24 2017-05-31 潘晓虹 A kind of four are worth image compress processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1696977A (en) * 2005-05-26 2005-11-16 无敌科技(西安)有限公司 Method for compressing image
EP2144432A1 (en) * 2008-07-08 2010-01-13 Panasonic Corporation Adaptive color format conversion and deconversion
CN101835045A (en) * 2010-05-05 2010-09-15 哈尔滨工业大学 Hi-fidelity remote sensing image compression and resolution ratio enhancement joint treatment method

Also Published As

Publication number Publication date
CN103024246A (en) 2013-04-03

Similar Documents

Publication Publication Date Title
CN101931815B (en) Quantization adjustment based on texture level
CN1260978C (en) Image processing apparatus
CN101371583B (en) Method and device of high dynamic range coding / decoding
Lin et al. Compound image compression for real-time computer screen image transmission
CN100551058C (en) Video coding system providing separate coding chains for dynamically selected small-size or full-size playback
CN100476858C (en) Method, device and system for achieving coding ganis in wavelet-based image coding-decoding device
CN1684495B (en) Predictive lossless coding of images and video
CN100446540C (en) Method and device for compressing color images
CN101416220B (en) For enhancing compressed image data preprocessing
US7792898B2 (en) Method of remote displaying and processing based on server/client architecture
CN100423538C (en) Image processing apparatus
CN1214349C (en) Method and apparatus for processing visual image and image compression method
JP4365957B2 (en) Image processing method and apparatus and storage medium
EP0833519B1 (en) Segmentation and background suppression in JPEG-compressed images using encoding cost data
KR100566122B1 (en) Method of compressing still pictures for mobile devices
JP2000059634A (en) Variable quantization device
WO2003065708A1 (en) Coder matched layer separation for compression of compound documents
CN1402528A (en) Picture processing device and method, and computer program and storage medium
CN101019437A (en) H.264 spatial error concealment based on the intra-prediction direction
JP2003228712A (en) Method for identifying text-like pixel from image
CN1893659A (en) DCT compression using Golomb-Rice coding
CN104584559B (en) Equipment, method, system and the readable storage medium storing program for executing of a kind of scope of extension for chroma QP value
TW200826691A (en) Intra prediction encoding control method and apparatus, program therefor, and storage medium for storing the program
CN102523367B (en) Based on real-time image compression and multi-color palette reduction method
JP4773678B2 (en) Document system

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model
CF01