KR100237851B1

KR100237851B1 - Image/text data processing method and device

Info

Publication number: KR100237851B1
Application number: KR1019970028395A
Authority: KR
Inventors: 부병욱
Original assignee: 윤종용; 삼성전자주식회사
Priority date: 1997-06-27
Filing date: 1997-06-27
Publication date: 2000-01-15
Also published as: KR19990004310A

Abstract

The disclosed method for detecting image/text area data and the disclosed compression device detect image-area data and text-area data when both coexist in a document and, according to the detection result, compress the image-area data with a lossy algorithm and the text-area data with a lossless algorithm.

In the present invention, the data of a document, scanned sequentially while being partitioned into unit areas having a predetermined number of pixels in the horizontal and vertical directions, is converted into gray scale data by a gray scale conversion unit. An image/text data detection unit determines from the values of the converted gray scale data whether the data is image-area data or text-area data. According to that determination, image-area data is compressed with a lossy algorithm in a lossy compression unit and text-area data is compressed with a lossless algorithm in a lossless compression unit. Image-area data can thus be compressed at a high compression ratio, and text-area data can be compressed without loss.

Description

Method for detecting image / text area data of document and compression device

The present invention relates to a method for detecting image/text area data of a document and to a compression device, which detect image-area data and text-area data in the data of a single document and compress the data efficiently according to the type detected.

In general, when a document is transmitted or stored, transmitting or storing its data as-is takes a long transmission time and requires a large storage capacity, because the volume of data is very large.

The data of a document is therefore compressed before it is transmitted or stored, and the compressed data is restored to the original data with a corresponding restoration algorithm.

Different compression algorithms are used depending on the type of data to be compressed.

For example, different compression algorithms are used for text-area data, such as symbols and characters, and for image-area data, such as photographs and pictures.

Compression algorithms fall into lossy algorithms, such as JPEG (Joint Photographic Experts Group), and lossless algorithms, such as JBIG (Joint Bi-level Image Experts Group).

A lossy algorithm loses some data but can achieve a high compression ratio. It is commonly used to compress image-area data and cannot be applied to text-area data.

A lossless algorithm can compress and restore data without loss but achieves a low compression ratio on image-area data. It is therefore used to compress text-area data, such as documents containing characters and symbols.

Moreover, even for text-area data, the lossless algorithm is applicable only to bi-level gray scale data and is very difficult to apply to the data of a color document.

In general, a document contains a mixture of text areas, such as characters and symbols, and image areas, such as photographs and pictures.

When the data of such a mixed document is compressed with a lossy algorithm, the text-area data suffers loss and the original text-area data cannot be restored. When it is compressed with a lossless algorithm, the compression ratio of the image-area data is low and the volume of data becomes large.

Accordingly, an object of the present invention is to provide a method for detecting image/text area data of a document, which detects image-area data and text-area data in the data of the document.

Another object of the present invention is to provide an image/text area data compression device for a document, which compresses the detected image-area data with a lossy algorithm and the text-area data with a lossless algorithm.

FIG. 1 is a diagram illustrating the process of scanning a document according to the present invention;

FIG. 2 is a block diagram showing the configuration of the compression device of the present invention;

FIG. 3 is a signal flowchart showing the operation of determining image-area data and text-area data according to the detection method of the present invention; and

FIG. 4 is a diagram showing the order in which the image/text data detection unit inputs the pixels of a unit area and determines their values.

* Explanation of reference numerals for main parts of the drawings *

10: gray scale conversion unit

20: image/text data detection unit

30: lossy compression unit

40: lossless compression unit

According to the method for detecting image/text area data of a document and the compression device of the present invention for achieving these objects, color data obtained by scanning a document partitioned into unit areas is converted into gray scale data by a gray scale conversion unit.

The value of each item of gray scale data converted by the gray scale conversion unit is examined to determine the number of colors, and the number of colors is used to determine whether the data is image-area data or text-area data.

Once the type of data is determined, image-area data is compressed with a lossy algorithm in a lossy compression unit, and text-area data is compressed with a lossless algorithm in a lossless compression unit.

According to the present invention, therefore, image-area data can be compressed at a high compression ratio with a lossy algorithm, and text-area data can be compressed without loss with a lossless algorithm.

Hereinafter, the method for detecting image/text area data of a document and the compression device of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the image scanning process for a general document.

As shown, when a document is scanned with a scanner or the like and converted into data, scanning proceeds from the left of the document to the right and from the top to the bottom.

In the present invention, the document is scanned while being partitioned into unit areas each having a predetermined number of pixels.

For example, the document is partitioned into unit areas of 8 × 8 pixels.
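The partitioning described above can be sketched in Python as follows. This is a minimal illustration, not the scanner hardware of the invention: the function name `iter_unit_areas`, the `(R, G, B)` tuple representation, and the handling of page edges are all assumptions introduced here.

```python
from typing import Iterator, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B); an assumed representation


def iter_unit_areas(
    page: List[List[Pixel]], size: int = 8
) -> Iterator[Tuple[int, int, List[Pixel]]]:
    """Yield (top, left, pixels) for each size x size unit area.

    Unit areas are visited left to right, then top to bottom, matching
    the scanning order described in the text. Edge handling is an
    assumption: areas that would run off the page are truncated.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [
                page[y][x]
                for y in range(top, min(top + size, height))
                for x in range(left, min(left + size, width))
            ]
            yield top, left, block


# A blank page 16 pixels high and 24 wide splits into 2 x 3 = 6 unit areas.
page = [[(255, 255, 255)] * 24 for _ in range(16)]
areas = list(iter_unit_areas(page))
```

Each yielded unit area holds 64 pixels in row order, which is the form the later detection steps operate on.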

FIG. 2 is a block diagram showing the configuration of the compression device of the present invention.

As shown, the device comprises: a gray scale conversion unit (10) that converts into gray scale data the data input by partitioning a document into unit areas of predetermined pixels and scanning it; an image/text data detection unit (20) that uses the pixel values of the unit area output by the gray scale conversion unit (10) to detect whether the unit area is image-area data or text-area data; a lossy compression unit (30) that compresses the input color data according to a lossy algorithm when the image/text data detection unit (20) detects image-area data; and a lossless compression unit (40) that compresses the input color data according to a lossless algorithm when the image/text data detection unit (20) detects text-area data.

In the compression device configured in this way, the color data obtained by scanning a document partitioned into unit areas of predetermined pixels is input to the gray scale conversion unit (10), the lossy compression unit (30), and the lossless compression unit (40).

The gray scale conversion unit (10) converts the input color data into gray scale data and outputs it.

In general, an NTSC color video signal is made to satisfy Equation 1 so that accurate colors can be reproduced.

[Equation 1]

Luminance signal = 0.30 × R + 0.59 × G + 0.11 × B

where R is the red signal,

G is the green signal, and

B is the blue signal.

However, converting color data into gray scale data so as to satisfy Equation 1 makes the circuit configuration very complicated.

The present invention does not need to reproduce accurate colors; it only needs to determine the number of colors of the pixels making up a unit area.

The present invention therefore converts color data into gray scale data according to Equation 2 to determine the number of colors, which simplifies the circuit configuration of the gray scale conversion unit (10).

[Equation 2]

Luminance signal = R + G + B
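The two conversions can be compared with a short Python sketch. The function names and the sample colors are assumptions chosen for illustration; only the two formulas come from the text.

```python
def luminance_ntsc(r: int, g: int, b: int) -> float:
    """Equation 1: weighted sum used for the NTSC luminance signal."""
    return 0.30 * r + 0.59 * g + 0.11 * b


def gray_sum(r: int, g: int, b: int) -> int:
    """Equation 2: plain sum, which needs additions but no multiplications."""
    return r + g + b


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)

# Three different colors give three different sums here: {0, 255, 765}.
values = {gray_sum(*color) for color in (black, white, red)}

# Caveat: the plain sum can map different colors to the same value,
# for example pure red and pure green.
collision = gray_sum(255, 0, 0) == gray_sum(0, 255, 0)
```

Because the detection step only counts how many different values occur in a unit area, the sum does not have to match perceived brightness. The collision shown above is a limitation of this sketch's reading of Equation 2 that the text does not discuss.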

The gray scale data output from the gray scale conversion unit (10) is input to the image/text data detection unit (20), which determines the number of colors from the values of the input gray scale data and uses that number to determine whether the data is image-area data or text-area data.

That is, the image/text data detection unit (20) examines the values of the 64 pixels making up the unit area, counts how many different pixel values occur, and determines the type of data from the count.

In general, for a text document, a unit area of 8 × 8 pixels contains two colors or only a small number of colors.

Image data such as a photograph, however, is continuous-tone by nature: even where the colors look the same to the human eye, the actual data values differ slightly, so many color values are present.

The present invention therefore distinguishes text-area data from image-area data by the number of colors found in the 64 pixels making up the unit area.

The image/text data detection unit (20) outputs a control signal according to the result of this determination, and the lossy compression unit (30) and the lossless compression unit (40) operate selectively according to the control signal to compress the input color data.

For example, when the image/text data detection unit (20) determines that the data of the unit area is image-area data, its control signal causes the lossy compression unit (30) to operate and compress the input color data according to a lossy algorithm.

When the image/text data detection unit (20) determines that the data of the unit area is text-area data, its control signal causes the lossless compression unit (40) to operate and compress the input color data according to a lossless algorithm.
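The selective operation of the two compression units can be sketched as a simple dispatch in Python. Both compressors here are stand-ins chosen for illustration, not the algorithms of the invention: `compress_lossless` uses DEFLATE from the standard `zlib` module, and `compress_lossy` is a crude quantizer that is not JPEG. All function names are hypothetical.

```python
import zlib
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B); an assumed representation


def compress_lossless(block: List[Pixel]) -> bytes:
    """Stand-in for the lossless compression unit (40): zlib/DEFLATE."""
    raw = bytes(channel for pixel in block for channel in pixel)
    return zlib.compress(raw)


def compress_lossy(block: List[Pixel]) -> bytes:
    """Stand-in for the lossy compression unit (30).

    Drops the low 4 bits of every channel before DEFLATE, so the
    original values cannot be recovered exactly. This is NOT JPEG.
    """
    raw = bytes(channel & 0xF0 for pixel in block for channel in pixel)
    return zlib.compress(raw)


def compress_unit_area(block: List[Pixel], is_image: bool) -> Tuple[str, bytes]:
    """Route the original color data according to the detector's decision."""
    if is_image:
        return "lossy", compress_lossy(block)
    return "lossless", compress_lossless(block)


text_block = [(0, 0, 0), (255, 255, 255)] * 32
kind, payload = compress_unit_area(text_block, is_image=False)
restored = zlib.decompress(payload)  # identical to the original bytes
```

Note that the compressors receive the original color data, as in the text; the gray scale values are used only to make the decision passed in as `is_image`.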

FIG. 3 is a signal flowchart showing the operation by which the image/text data detection unit (20) determines image-area data and text-area data according to the detection method of the present invention.

As shown, the image/text data detection unit (20) initializes the pixel variable i to '0' in step S1, initializes the count variable CNT to '0' in step S2, and then inputs one pixel in step S3.

For example, the pixels in the unit area are input one at a time in sequence, as shown in FIG. 4.

In the next step S4, the value of the currently input pixel is compared with the values of the previously input pixels to determine whether they are the same.

If the comparison finds no pixel with the same value, '1' is added to the count variable CNT in step S5, and step S6 determines whether CNT is greater than or equal to a preset value N.

If step S6 finds that CNT is not greater than or equal to the preset value N, 1 is added to the pixel variable i in step S7, and step S8 determines whether i is '64'.

If i is not '64' in step S8, the process returns to step S3, inputs one pixel, and repeats the operation of determining the value of the input pixel.

If the comparison in step S4 finds that the pixel values are the same, 1 is added to the pixel variable i in step S7 and step S8 determines whether i is '64'. If i is not '64', the process returns to step S3, inputs one pixel, and repeats the operation of determining the value of the input pixel.

If, in this state, CNT becomes greater than or equal to the preset value N in step S6, the data is determined to be image data in step S9, and the lossy compression unit (30) is operated to compress the input color data according to a lossy algorithm.

If CNT has not become greater than or equal to the preset value N by the time i reaches '64' in step S8, the data is determined to be text data in step S10, and the lossless compression unit (40) is operated to compress the input color data according to a lossless algorithm.
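The flow of steps S1 to S10 can be sketched in Python as follows. This is one possible reading, not the circuit of the invention: the function name `classify_unit_area` is hypothetical, a `set` stands in for the comparison against previously input pixels, and the sketch assumes that the first pixel of a unit area counts as a new value.

```python
from typing import List


def classify_unit_area(gray: List[int], n: int) -> str:
    """Return "image" or "text" for one unit area of gray scale values.

    Follows steps S1-S10: count values not seen before in this unit
    area and stop as soon as the count reaches the preset value n.
    Assumption: the first pixel counts as a new value, so cnt is the
    number of distinct values seen so far.
    """
    seen = set()          # values of previously input pixels
    cnt = 0               # S2: count variable CNT
    for value in gray:    # S1, S3, S7, S8: visit each pixel in turn
        if value not in seen:   # S4: no earlier pixel has this value
            seen.add(value)
            cnt += 1            # S5
            if cnt >= n:        # S6
                return "image"  # S9
    return "text"               # S10: all pixels seen, cnt stayed below n


text_area = [0] * 40 + [765] * 24    # two gray values, like black on white
photo_area = list(range(64))         # 64 different values, continuous tone

text_result = classify_unit_area(text_area, 4)
photo_result = classify_unit_area(photo_area, 4)
```

The early return at S6 means a continuous-tone area is usually recognized after only a few pixels, while a text area requires all 64 pixels to be examined.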

The description above uses a unit area of 8 × 8 pixels as an example.

The present invention is not limited to this, and the size of the unit area may be changed as desired.

The number of colors that separates text-area data from image-area data is preferably set to 2; however, because a slight error can occur when pixel values are determined, it is set to a number slightly larger than 2.
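The effect of the preset value can be illustrated with a small Python sketch. The pixel values, the helper names, and the two thresholds 3 and 6 are assumptions picked for illustration; the text itself gives no specific value beyond "slightly larger than 2".

```python
from typing import List


def count_distinct(gray: List[int]) -> int:
    """Number of different gray scale values in one unit area."""
    return len(set(gray))


def is_image(gray: List[int], n: int) -> bool:
    """Image if the distinct-value count reaches the preset value n."""
    return count_distinct(gray) >= n


# Black text on white, with two pixels blurred at a character edge.
noisy_text = [0] * 38 + [765] * 24 + [12, 750]

strict = is_image(noisy_text, 3)    # True: misclassified as image
relaxed = is_image(noisy_text, 6)   # False: still treated as text
```

With four distinct values in the unit area, a threshold too close to 2 sends this text area to the lossy path, while a somewhat larger threshold tolerates the scanning error.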

As described above, the present invention partitions the data of a document into unit areas, determines the number of colors in each, and uses that number to distinguish text-area data from image-area data, so that the type of data can be determined accurately.

In addition, because the present invention compresses the data with a lossy or a lossless algorithm according to the type determined, image-area data can be compressed at a high compression ratio and text-area data can be compressed without loss.

Claims

A method for detecting image/text area data of a document, comprising:

a first step of converting color data, input in unit areas each having predetermined pixels, into gray scale data;

a second step of inputting, one pixel at a time, the gray scale data converted in the first step, and determining the number of colors based on the input gray scale data;

a third step of repeating, over the unit area, a determination as to whether the value determined in the second step is equal to the value of a previously input pixel;

a fourth step of counting the cases in which the value determined in the second step differs from the value of a previously input pixel;

a fifth step of determining the data to be image data when the value counted in the fourth step is greater than or equal to a preset value; and

a sixth step of determining the data to be text data when the value counted in the fourth step is equal to or less than a preset count value.

An image/text area data compression device for a document, comprising:

a gray scale conversion unit that converts color data, input in unit areas of predetermined pixels, into gray scale data based on Equation 2;

an image/text data detection unit that receives the gray scale data of a unit area converted by the gray scale conversion unit and detects image-area data or text-area data based on the number of colors represented by the gray scale data;

a lossy compression unit that compresses the input color data according to a lossy algorithm when the image/text data detection unit determines from the number of colors that the data is image-area data; and

a lossless compression unit that compresses the input color data according to a lossless algorithm when the image/text data detection unit determines from the number of colors that the data is text-area data.

[Equation 2]

Luminance signal = R + G + B

The device of claim 2, wherein the image/text data detection unit determines pixel values from the gray scale data, compares the value of each pixel with the values of previously input pixels, and determines whether the data is image-area data or text-area data using a count of the pixels whose values differ.