KR20010066401A - A compression and restoring method of data in Korean

Info

Publication number: KR20010066401A
Application number: KR1019990068112A
Authority: KR
Inventors: 김기영
Original assignee: 김기영; (주) 한국인프라
Priority date: 1999-12-31
Filing date: 1999-12-31
Publication date: 2001-07-11
Also published as: KR100320686B1

Abstract

PURPOSE: A method for compressing and restoring Korean character data is provided that handles all data using Korean characters effectively by outputting a code word + the prefix code of the code stream. CONSTITUTION: For compression, the format of a file is classified and a dictionary is initialized. An index table/housing table is generated for the compression code. Compression is performed by a process that outputs a code word + the prefix code of the code stream. For restoration, the format of a file is classified and the dictionary is initialized. It is checked whether a character word is in the dictionary. If the character word is in the dictionary, the code word is output and replaced with the prefix, and the prefix plus the new data is added to the dictionary. If the character word is not in the dictionary, the code stream is replaced with the prefix, the prefix + code word is output to the character stream, and the output is added to the dictionary.

Description

A compression and restoring method of data in Korean

The present invention relates to a method for compressing and restoring Hangul data. In particular, it relates to a method that is highly compatible with existing data compression programs and that, in addition to compressing and restoring, can send a compressed file directly by e-mail, split a file while compressing it, and check and repair damaged files, so that it can be used efficiently for compressing or restoring any data that uses Hangul.

It is well known that, as more homes and workplaces use computers, there is a growing tendency to store almost all information as computer files.

Storing information as a computer file means storing it as a digital signal made up of "0" and "1" data. Data consisting only of text is small, so many such files can be stored. Audio signals, or digital information made up of pictures or colour combinations, are so large that a single storage diskette cannot hold them, and in many cases they do not even fit in the remaining storage area of a computer of moderate capacity.

WinZip was therefore proposed and used to compress and restore general data.

The conventional compression and restoration method described above has several drawbacks. It compresses with support for zip files only. Restoring a format other than zip requires that a program for that format always be present. Only English menus are available. A file cannot be converted to another format while compressed. A file can be run only after it has been restored from the compressed state and installed normally. As a result, many computer users in Korea could not use it efficiently.

Accordingly, an object of the present invention is to provide a method for compressing and restoring Hangul data that is highly compatible with existing data compression programs and that, in addition to compressing and restoring, can send a compressed file directly by e-mail, split a file while compressing it, and check and repair damaged files, so that it can be used efficiently for compressing or restoring any data that uses Hangul.

To achieve this object, the present invention performs compression by the following steps: a step in which, when a CD or the like holding the compressed compression and restoration program is inserted into the computer's CD-ROM drive and a command to read the program is given, the computer operates the CD-ROM drive and reads the stored data through the pickup module;

a step of displaying an initial screen from the data read from the CD through the CD-ROM drive and asking whether the user wants to install the program;

a step of, when the user clicks the install button to install the program, reading and restoring the compressed program stored on the CD and storing it at the location designated by the user;

a step of, when the user then inputs a data file to be compressed, classifying the file by type;

a step of compressing the files of the classified types;

a step of initializing the dictionary of the compressed file and then reading in the file to be compressed from the character stream;

a step of creating an index table/housing table for the compression code and then creating the index table in memory;

a step of storing the index number for the compression code and then outputting the coded data on the code stream;

a step of checking whether there is data to compress in the character stream and repeating from the dictionary initialization step;

and a step of, when there is no more data to compress in the character stream, outputting the code word + the prefix code of the code stream and then terminating.

Restoration, in turn, is performed by the following steps: a step of, when the user inputs a data file to be restored, classifying the file by type;

a step of initializing the dictionary of the compressed file and then reading the first code of the code stream;

a step of outputting the character word of the character stream and replacing the prefix word with the character word;

a step of reading the next character word in the code stream and checking whether the character word that was read is in the dictionary;

a step of, if it is in the dictionary, outputting the code word that was read to the character stream and then replacing it with the prefix;

a step of reading new data from the code word and then adding the prefix plus the new data to the dictionary;

a step of, if it is not in the dictionary, reading the prefix and then replacing the code stream with the prefix;

and a step of outputting the prefix + code word to the character stream, adding the output value to the dictionary, and then checking whether the code stream holds further data to interpret. In this way the method can be used efficiently for compressing or restoring any data that uses Hangul.

FIG. 1 is a flow chart showing the data compression process of the present invention.

FIG. 2 is a flow chart showing the data restoration process of the present invention.

FIG. 3 is a schematic diagram showing the menu of the self-extracting editor of the present invention.

FIG. 4 is a schematic diagram showing the compressed-file split menu of the present invention.

FIG. 5 is a schematic diagram showing the Zip file check/repair menu of the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

The present invention is characterized not by a hardware configuration but by the process of compressing or restoring data while storing files, as follows.

First, when a CD or the like holding the compressed compression and restoration program is inserted into the computer's CD-ROM drive (step 1) and a command to read the program is given (step 2), the computer operates the CD-ROM drive and reads the stored data through the pickup module (step 3).

An initial screen from the data read from the CD through the CD-ROM drive is displayed and the user is asked whether to install the program (step 4). When the user clicks the install button to install the program (step 5), the compressed program stored on the CD is read, restored, and stored at the location designated by the user (step 6).

With the compression and restoration program stored as described above, when the user inputs a data file to be compressed (step 7), it is determined whether the type of the file can be identified (step 8). If it cannot, an error is displayed and the process ends (step 19). If it can, the file is classified by type, such as ARJ, ARC, GZIP, LHA, LZH, PAK, RAR, TAR, ZIP, or ZOO (step 9).
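The patent does not say how steps 8 and 9 identify a file type. As a minimal sketch, assuming classification by file-name extension only (the `KNOWN_TYPES` mapping and the function name `classify_file` are hypothetical, not taken from the patent), the decision could look like this in Python:

```python
from pathlib import Path

# Archive types listed in step 9. The extension-to-type mapping is an
# assumption made for this sketch; the patent does not define one.
KNOWN_TYPES = {
    ".arj": "ARJ", ".arc": "ARC", ".gz": "GZIP", ".lha": "LHA", ".lzh": "LZH",
    ".pak": "PAK", ".rar": "RAR", ".tar": "TAR", ".zip": "ZIP", ".zoo": "ZOO",
}


def classify_file(name: str) -> str:
    """Return the archive type for a file name (step 9).

    Raises ValueError when the type cannot be identified, which
    corresponds to the error exit of step 19.
    """
    suffix = Path(name).suffix.lower()
    if suffix not in KNOWN_TYPES:
        raise ValueError(f"cannot identify file type: {name}")
    return KNOWN_TYPES[suffix]
```

A real tool would more likely inspect the file's leading signature bytes than trust the extension; the extension lookup is used here only to keep the sketch short.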

The files of the classified types are compressed in the "LZW" format (step 10), the dictionary of the compressed file is initialized (step 11), and the file to be compressed is read in from the character stream (step 12).

An index table/housing table for the compression code is created (step 13), and the index table is then created in memory (step 14).

The index number for the compression code is stored (step 15), and the coded data is then output on the code stream (step 16).

It is checked whether there is more data to compress in the character stream (step 17). If there is, the process repeats from the dictionary initialization step. If there is no more data to compress in the character stream, the code word + the prefix code of the code stream is output and compression ends (step 18).
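Steps 10 to 18 name "LZW" and a dictionary but give no code-level detail. The following is a minimal sketch, assuming the textbook LZW scheme applied to raw bytes with an unbounded code table. The function name `lzw_compress` and the byte-level dictionary are assumptions for illustration, and the sketch does not model the index table/housing table or the repeated dictionary initialization described above.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: turn a byte string into a list of integer codes."""
    # Dictionary initialization (compare step 11): one entry per single byte.
    table = {bytes([i]): i for i in range(256)}
    prefix = b""          # longest match found so far
    codes: list[int] = []

    for byte in data:     # read the character stream one byte at a time
        candidate = prefix + bytes([byte])
        if candidate in table:
            # Keep extending the current match.
            prefix = candidate
        else:
            # Emit the code for the known prefix, then learn the new string.
            codes.append(table[prefix])
            table[candidate] = len(table)
            prefix = bytes([byte])

    # No more input (compare step 18): flush the code for the last prefix.
    if prefix:
        codes.append(table[prefix])
    return codes
```

For example, `lzw_compress(b"ABABABA")` returns `[65, 66, 256, 258]`: seven input bytes become four codes because "AB" and "ABA" are added to the dictionary as they are seen. Hangul text encoded as UTF-8 repeats three-byte syllables, so repeated words are picked up by the dictionary in the same way.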

The compressed file can be made into a self-executing file (.exe). This is done by selecting the self-extracting editor menu shown in FIG. 3 (step 20) and entering the Zip file, the Exe location, and the display title. It is useful when creating a file such as setup.exe (step 21).

If the compressed file does not fit on a single diskette, the compressed-file split menu shown in FIG. 4 is selected (step 22) and the file is split into parts of a size chosen by the user and stored (step 23).
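The splitting of steps 22 and 23 can be sketched as cutting the compressed bytes into fixed-size parts. This is an illustration only: the function name `split_bytes` and the in-memory representation are assumptions, and the patent does not describe the on-disk layout of the parts.

```python
def split_bytes(data: bytes, part_size: int) -> list[bytes]:
    """Cut data into consecutive parts of at most part_size bytes each."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # The last part may be shorter than part_size.
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]
```

Joining the parts in order with `b"".join(parts)` gives back the original bytes, so restoring a split archive only requires reading the parts in sequence.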

When the Zip file check/repair menu shown in FIG. 5 is selected (step 24), the compressed file is checked and, if the file is damaged, it can be repaired and saved under a new name (step 25).
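The patent does not disclose how the check and repair of steps 24 and 25 work. One simple interpretation, sketched here with Python's standard `zipfile` module, is to read every member, note those that fail their integrity check, and write the readable members to a new archive. The function name `check_and_repair` and the "keep what is readable" strategy are assumptions, not the patent's method.

```python
import io
import zipfile
import zlib


def check_and_repair(src: bytes) -> tuple[list[str], bytes]:
    """Check a ZIP archive held in memory.

    Returns the names of damaged members and the bytes of a new archive
    that contains only the members that could be read.
    """
    damaged: list[str] = []
    repaired = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as old, \
            zipfile.ZipFile(repaired, "w", zipfile.ZIP_DEFLATED) as new:
        for info in old.infolist():
            try:
                # Reading verifies the stored CRC-32 of the member.
                payload = old.read(info)
            except (zipfile.BadZipFile, zlib.error):
                damaged.append(info.filename)
                continue
            new.writestr(info.filename, payload)
    # Both archives are closed here, so the new one is complete.
    return damaged, repaired.getvalue()
```

The returned bytes can be written to a file with a new name, which matches the "save under a new name" behaviour of step 25. An archive whose central directory is itself unreadable cannot be opened by this sketch at all.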

When the user wants to restore data compressed by the above process and inputs the data file (step 31), it is determined whether the type of the file can be identified (step 32). If it cannot, an error is displayed and the process ends (step 33). If it can, the file is classified by type, such as ARJ, ARC, GZIP, LHA, LZH, PAK, RAR, TAR, ZIP, or ZOO (step 34).

The files of the classified types are compressed in a format such as "LZW" (step 35), the dictionary of the compressed file is initialized (step 36), and the first code of the code stream is read (step 37).

The character word of the character stream is output (step 38) and the prefix word is replaced with the character word (step 39).

The next character word in the code stream is read (step 40), and it is checked whether the character word that was read is in the dictionary (step 41).

If it is in the dictionary, the code word that was read is output to the character stream (step 42) and then replaced with the prefix (step 43).

New data is read from the code word (step 44), and the prefix plus the new data is added to the dictionary (step 45).

If it is not in the dictionary at step 41, the prefix is read (step 46) and the code stream is replaced with the prefix (step 47).

The prefix + code word is output to the character stream and the output value is added to the dictionary (step 48), and it is then checked whether the code stream holds further data to interpret (step 49). Restoration is performed by these steps.
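The branch at step 41 (code already in the dictionary or not) resembles the standard LZW decoding loop. The following is a minimal sketch, assuming textbook LZW over bytes with codes 0 to 255 reserved for single bytes. The function name `lzw_decompress` is an assumption, and the mapping of individual statements to the numbered steps is only approximate.

```python
def lzw_decompress(codes: list[int]) -> bytes:
    """Textbook LZW decoding: rebuild the byte string from integer codes."""
    if not codes:
        return b""
    # Dictionary initialization (compare step 36): one entry per single byte.
    table = {i: bytes([i]) for i in range(256)}

    # The first code is always a single byte (compare steps 37 to 39).
    prefix = table[codes[0]]
    out = [prefix]

    for code in codes[1:]:
        if code in table:
            # Code is in the dictionary (compare steps 42 to 45).
            entry = table[code]
        elif code == len(table):
            # Code not yet in the dictionary (compare steps 46 to 48):
            # it can only be the prefix followed by its own first byte.
            entry = prefix + prefix[:1]
        else:
            raise ValueError(f"invalid code: {code}")
        out.append(entry)
        # Learn the new string: previous output plus first byte of this one.
        table[len(table)] = prefix + entry[:1]
        prefix = entry

    return b"".join(out)
```

For example, `lzw_decompress([65, 66, 256, 258])` returns `b"ABABABA"`. Code 258 arrives before the decoder has stored it, so it exercises the "not in the dictionary" branch.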

According to the method for compressing and restoring Hangul data of the present invention, when a file to be compressed is input, the file is classified by type and compressed, its dictionary is initialized, the file to be compressed is read in from the character stream, an index table/housing table for the compression code is created, and the index table is created in memory.

The index number for the compression code is stored and the coded data is output on the code stream, and, while checking whether there is more data to compress, the code word + the prefix code of the code stream is output. Compression is performed by these steps.

When a data file to be restored is input, the file is classified by type and compressed, its dictionary is initialized, the character word of the character stream is output for the first code of the code stream, the prefix word is replaced with the character word, and the next character word is read and checked against the dictionary.

If it is in the dictionary, the code word that was read is output to the character stream and replaced with the prefix, new data is read from the code word, and the prefix plus the new data is added to the dictionary.

If it is not in the dictionary, the prefix is read, the code stream is replaced with the prefix, the prefix + code word is output to the character stream, the output value is added to the dictionary, and it is checked whether the code stream holds further data to interpret. Restoration is performed by these steps, so that the method can be used efficiently for compressing or restoring any data that uses Hangul.

Claims

With the compression and restoration program from a compressed CD or the like stored on the computer,

a step of, when the user inputs a data file to be compressed, classifying the file by type;

a step of compressing the files of the classified types;

a step of initializing the dictionary of the compressed file and then reading in the file to be compressed from the character stream;

a step of creating an index table/housing table for the compression code and then creating the index table in memory;

a step of storing the index number for the compression code and then outputting the coded data on the code stream;

a step of checking whether there is data to compress in the character stream and repeating from the dictionary initialization step; and

a step of, when there is no more data to compress in the character stream, outputting the code word + the prefix code of the code stream and then terminating; a method of compressing Hangul data characterized in that compression is performed by these steps.

A step of, when the user inputs a data file to be restored, classifying the file by type;

a step of compressing the files of the classified types;

a step of initializing the dictionary of the compressed file and then reading the first code of the code stream;

a step of outputting the character word of the character stream and replacing the prefix word with the character word;

a step of reading the next character word in the code stream and checking whether the character word that was read is in the dictionary;

a step of, if it is in the dictionary, outputting the code word that was read to the character stream and then replacing it with the prefix;

a step of reading new data from the code word and then adding the prefix plus the new data to the dictionary;

a step of, if it is not in the dictionary, reading the prefix and then replacing the code stream with the prefix; and

a step of outputting the prefix + code word to the character stream, adding the output value to the dictionary, and then checking whether the code stream holds further data to interpret; a method of restoring Hangul data characterized in that restoration is performed by these steps.