JP2017208644A - Decoding program, decoding method, and decoding apparatus

Info

Publication number: JP2017208644A
Application number: JP2016098753A
Authority: JP
Inventors: Masahiro Kataoka; Kiichi Yamada; Haruyasu Ueda
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-05-17
Filing date: 2016-05-17
Publication date: 2017-11-24
Anticipated expiration: 2036-05-17
Also published as: JP6665679B2

Abstract

PROBLEM TO BE SOLVED: To make it possible to assign codes of two or more bytes, such as codes associated with frequently appearing characters or words that should be assigned short codes, to 1-byte codes. SOLUTION: A code conversion unit 850 of a decoding apparatus 800 uses a plurality of automata generated on the basis of a second code allocation table, and decodes coded data into character data with the automaton that is selected from among the plurality of automata according to the value of the first 4 bits of the data. SELECTED DRAWING: Figure 25

Description

The present invention relates to a decoding program and the like.

従来のテキストデータは、ASCIIコードおよびユニコード（Unicode）のコード割当表に基づいて所定のコードに置き換えられる。図３０は、従来のASCIIコードおよびユニコードに基づくコード割当表を説明するための図である。図３０に示すように、コード割当表の００ｈ〜１Ｆｈには、所定の制御記号が設定され、各制御記号には、１バイトのコードが割り当てられる。コード割当表の２０ｈ〜７Ｆｈには、英数字が設定され、各英数字には、１バイトのコードが割り当てられる。コード割当表の８０ｈ〜ＦＦｈには、ＣＪＫ文字が設定され、各ＣＪＫ文字には、３バイトのコードが割り当てられる。 Conventional text data is replaced with a predetermined code based on an ASCII code and Unicode code assignment table. FIG. 30 is a diagram for explaining a code assignment table based on a conventional ASCII code and Unicode. As shown in FIG. 30, predetermined control symbols are set in 00h to 1Fh in the code assignment table, and a 1-byte code is assigned to each control symbol. Alphanumeric characters are set in 20h to 7Fh of the code assignment table, and a 1-byte code is assigned to each alphanumeric character. CJK characters are set in 80h to FFh of the code assignment table, and a 3-byte code is assigned to each CJK character.

Prior art 1 describes a technique in which, when free space exists in 00h to 1Fh of the code assignment table (the area to which control symbols are assigned), words and the like are registered in that free space and encoding is performed using the resulting code assignment table. Prior art 2 describes a technique in which other characters are set in place of uppercase letters in the uppercase-letter area of the code assignment table, and encoding is performed using that code assignment table.

JP-A-7-287716 (Japanese Patent Laid-Open No. 7-287716); JP-A-11-143877 (Japanese Patent Laid-Open No. 11-143877)

しかしながら、上述した従来技術では、出現頻度が高い単語や一般記号に対して、短いバイトコードを割り当てることができないという問題がある。 However, the above-described conventional technique has a problem that short bytecodes cannot be assigned to words and general symbols that appear frequently.

For example, only when the parties that exchange text data share both the set of unused control symbols and uppercase letters and the corresponding code assignment table can short byte codes be assigned to frequently appearing characters and words by placing words in the free area of the control symbols, as in prior arts 1 and 2.

On the other hand, when variable-length codes are assigned according to the appearance frequency of the words and general symbols that make up general text data, about 40 kinds receive code lengths of 5 to 8 bits and about 8,000 kinds receive code lengths of 9 to 16 bits. Accordingly, a compression process that achieves a high compression rate can be performed by assigning, according to appearance frequency, 1-byte codes to 32 or more kinds of words and general symbols and 2-byte codes to 8,192 or more kinds. However, prior arts 1 and 2 cannot assign codes to such a large number of words and general symbols.

In one aspect, an object of the present invention is to provide a decoding program, a decoding method, and a decoding apparatus that make it possible to assign codes of 2 bytes or more, such as codes associated with frequently appearing characters or words that should be assigned short codes, to 1-byte codes.

In a first proposal, a computer is caused to execute the following processing. The computer uses a plurality of automata generated based on a second code allocation table, and decodes encoded data into character data using the automaton that is selected, from among the plurality of automata, according to the value of the first 4 bits of the data. The second code allocation table is a conversion rule in which some of the characters assigned to the 1-byte area of a first code allocation table are assigned to a 2-byte area. The second code allocation table is also a conversion rule that encodes input character data by assigning codes of 2 bytes or more to at least some of the characters assigned to the 2-byte area. Further, under the conversion rule of the second code allocation table, the value of the first 4 bits of the encoded code data differs depending on the code length of that code data.

出現頻度が高い文字や単語に対して、短いバイトコードを割り当てることができる。 Short byte codes can be assigned to characters and words that appear frequently.

FIG. 1A is a diagram illustrating an example of processing of the encoding apparatus according to the first embodiment.
FIG. 1B is a diagram illustrating an example of processing of the decoding apparatus according to the first embodiment.
FIG. 2A is a functional block diagram illustrating the configuration of the encoding apparatus according to the first embodiment.
FIG. 2B is a functional block diagram illustrating the configuration of the decoding apparatus according to the first embodiment.
FIG. 3 is a diagram illustrating an example of a code assignment table according to the first embodiment.
FIG. 4 is a diagram illustrating an example of a 2-byte code allocation table according to the first embodiment.
FIG. 5 is a diagram illustrating an example of a 3-byte code allocation table according to the first embodiment.
FIG. 6A is a flowchart illustrating the processing procedure of the encoding apparatus according to the first embodiment.
FIG. 6B is a flowchart illustrating the processing procedure of the decoding apparatus according to the first embodiment.
FIG. 7A is a diagram illustrating an example of processing of the encoding apparatus according to the second embodiment.
FIG. 7B is a diagram illustrating an example of processing of the decoding apparatus according to the second embodiment.
FIG. 8A is a functional block diagram illustrating the configuration of the encoding apparatus according to the second embodiment.
FIG. 8B is a functional block diagram illustrating the configuration of the decoding apparatus according to the second embodiment.
FIG. 9 is a diagram illustrating an example of a code assignment table according to the second embodiment.
FIG. 10 is a diagram illustrating an example of a 2-byte code allocation table according to the second embodiment.
FIG. 11 is a diagram illustrating an example of a 3-byte code allocation table according to the second embodiment.
FIG. 12A is a flowchart illustrating the processing procedure of the encoding apparatus according to the second embodiment.
FIG. 12B is a flowchart illustrating the processing procedure of the decoding apparatus according to the second embodiment.
FIG. 13A is a diagram illustrating an example of processing of the encoding apparatus according to the third embodiment.
FIG. 13B is a diagram illustrating an example of processing of the decoding apparatus according to the third embodiment.
FIG. 14A is a functional block diagram illustrating the configuration of the encoding apparatus according to the third embodiment.
FIG. 14B is a functional block diagram illustrating the configuration of the decoding apparatus according to the third embodiment.
FIG. 15 is a diagram illustrating an example of a code assignment table according to the third embodiment.
FIG. 16 is a diagram illustrating an example of an English word 2-byte code assignment table according to the third embodiment.
FIG. 17 is a diagram illustrating an example of a Japanese word 2-byte allocation table according to the third embodiment.
FIG. 18 is a diagram illustrating an example of a 2/3-byte allocation table according to the third embodiment.
FIG. 19A is a flowchart illustrating the processing procedure of the encoding apparatus according to the third embodiment.
FIG. 19B is a flowchart illustrating the processing procedure of the decoding apparatus according to the third embodiment.
FIG. 20A is a flowchart illustrating the processing procedure of the first code conversion processing.
FIG. 20B is a flowchart illustrating the processing procedure of the second code conversion processing.
FIG. 21 is a diagram illustrating an example of processing of the decoding apparatus according to the fourth embodiment.
FIG. 22 is a diagram illustrating an example of the first automaton.
FIG. 23 is a diagram illustrating an example of the second automaton.
FIG. 24 is a diagram illustrating an example of the third automaton.
FIG. 25 is a functional block diagram illustrating the configuration of the decoding apparatus according to the fourth embodiment.
FIG. 26 is a flowchart illustrating the processing procedure of the decoding apparatus according to the fourth embodiment.
FIG. 27 is a diagram illustrating a hardware configuration example of a computer.
FIG. 28 is a diagram illustrating a configuration example of a program operating on a computer.
FIG. 29 is a diagram illustrating a configuration example of apparatuses in the system according to the embodiment.
FIG. 30 is a diagram for explaining a code assignment table based on the conventional ASCII code and Unicode.

以下に、本願の開示する復号化プログラム、復号化方法および復号化装置の実施例を図面に基づいて詳細に説明する。なお、この実施例によりこの発明が限定されるものではない。 Hereinafter, embodiments of a decoding program, a decoding method, and a decoding device disclosed in the present application will be described in detail with reference to the drawings. Note that the present invention is not limited to the embodiments.

FIG. 1A is a diagram illustrating an example of processing of the encoding apparatus according to the first embodiment. The encoding apparatus according to the first embodiment performs code conversion on text data 10a using a code assignment table 110 instead of the code assignment table 50 used in the prior art, thereby generating code-converted text data 10b.

Control symbols are set in 00h to 1Fh of the prior-art code assignment table 50, and a 1-byte code is assigned to each control symbol. The suffix "h" indicates a hexadecimal number. Alphanumeric characters are set in 20h to 7Fh of the code assignment table 50, and a 1-byte code is assigned to each alphanumeric character. CJK characters are set in 80h to FFh of the code assignment table 50, and a 3-byte code is assigned to each CJK character.

これに対して、本実施例１に係るコード割当表１１０の００ｈ〜２Ｆｈには、後述する所定の単語が設定され、１バイトのコードが割り当てられる。コード割当表１１０の００ｈ〜２Ｆｈは、コード割当表５０において、制御記号が割り当てられていた領域を含む。 On the other hand, predetermined words to be described later are set in 00h to 2Fh of the code assignment table 110 according to the first embodiment, and a 1-byte code is assigned. 00h to 2Fh in the code assignment table 110 includes an area to which a control symbol is assigned in the code assignment table 50.

コード割当表１１０の３０ｈ〜５Ｆｈには、高頻度の単語等が設定される。また、コード割当表１１０の３０ｈ〜５Ｆｈには、コード割当表５０の００ｈ〜１Ｆｈに設定されていた制御記号や、コード割当表５０の２０ｈ〜７Ｆｈに設定されていた英数字が設定される。また、コード割当表１１０の３０ｈ〜５Ｆｈには、コード割当表５０の８０ｈ〜ＦＦｈに設定されていたＣＪＫ文字の一部が設定される。コード割当表１１０の３０ｈ〜５Ｆｈに設定された高頻度の単語、制御記号、英数字、ＣＪＫ文字には、２バイトのコードが割り当てられる。 A high-frequency word or the like is set in 30h to 5Fh of the code assignment table 110. In addition, the control symbols set in 00h to 1Fh in the code assignment table 50 and the alphanumeric characters set in 20h to 7Fh in the code assignment table 50 are set in 30h to 5Fh in the code assignment table 110. Also, a part of CJK characters set in 80h to FFh in the code assignment table 50 are set in 30h to 5Fh in the code assignment table 110. A 2-byte code is assigned to high-frequency words, control symbols, alphanumeric characters, and CJK characters set to 30h to 5Fh in the code assignment table 110.

すなわち、コード割当表５０の００ｈ〜７Ｆｈに設定され、それまで１バイトのコードが割り当てられていた制御記号および英数字は、コード割当表１１０の３０ｈ〜５Ｆｈの一部に割り当てられ、２バイトのコードが割り当てられる。 That is, the control symbols and alphanumeric characters that have been set to 00h to 7Fh in the code assignment table 50 and have been assigned a 1-byte code until then are assigned to a part of 30h to 5Fh in the code assignment table 110, and are assigned 2 bytes. A code is assigned.

コード割当表１１０の６０ｈ〜ＦＦｈには、低頻度の単語等が設定される。また、コード割当表１１０の６０ｈ〜ＦＦｈには、コード割当表５０の８０ｈ〜ＦＦｈに設定されていたＣＪＫ文字の一部が設定される。 Infrequent words and the like are set in 60h to FFh of the code assignment table 110. Also, part of the CJK characters set in 80h to FFh in the code assignment table 50 are set in 60h to FFh in the code assignment table 110.

本実施例１について、以下の説明では、適宜、コード割当表１１０の００ｈ〜２Ｆｈの領域を「１バイト領域」と表記する。コード割当表１１０の３０ｈ〜５Ｆｈの領域を「２バイト領域」と表記する。コード割当表１１０の６０ｈ〜ＦＦｈの領域を「３バイト領域」と表記する。 Regarding the first embodiment, in the following description, the area from 00h to 2Fh in the code assignment table 110 will be appropriately expressed as “1-byte area”. The area of 30h to 5Fh in the code assignment table 110 is expressed as “2-byte area”. The area from 60h to FFh in the code assignment table 110 is denoted as “3-byte area”.
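Because the three areas partition the range 00h to FFh, the first byte of a code is enough to tell how long the code is. The following is a minimal Python sketch of that rule; the function name `code_length` is an illustrative assumption and does not appear in the patent.

```python
def code_length(lead_byte: int) -> int:
    """Return the code length in bytes implied by the first byte of a code.

    The boundaries follow the areas of code assignment table 110 in the
    first embodiment: 00h-2Fh, 30h-5Fh, and 60h-FFh.
    """
    if not 0x00 <= lead_byte <= 0xFF:
        raise ValueError("lead byte must be in the range 00h to FFh")
    if lead_byte <= 0x2F:  # 1-byte area
        return 1
    if lead_byte <= 0x5F:  # 2-byte area
        return 2
    return 3               # 3-byte area
```

For instance, the lead byte 12h falls in the 1-byte area, while 43h falls in the 2-byte area.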

Based on the code assignment table 110, the code conversion unit 150 converts the text data 10a into the text data 10b. Here, the text data 10a is assumed to be "... heΔisΔinΔtheΔhouseΔ ...". The symbol "Δ" in the text data 10a indicates a space.

The code conversion unit 150 compares each word delimited by the space "Δ" with the code assignment table 110 and converts the word into a code. The word "heΔ" included in the text data 10a is a word set in the 1-byte area of the code assignment table 110, so the code conversion unit 150 converts the word "heΔ" into the 1-byte code "12h".

The word "isΔ" included in the text data 10a is a word set in the 1-byte area of the code assignment table 110, so the code conversion unit 150 converts the word "isΔ" into the 1-byte code "08h".

The word "inΔ" included in the text data 10a is a word set in the 1-byte area of the code assignment table 110, so the code conversion unit 150 converts the word "inΔ" into the 1-byte code "07h".

The word "theΔ" included in the text data 10a is a word set in the 1-byte area of the code assignment table 110, so the code conversion unit 150 converts the word "theΔ" into the 1-byte code "00h".

The word "houseΔ" included in the text data 10a is a word set in the 2-byte area of the code assignment table 110, so the code conversion unit 150 converts the word "houseΔ" into, for example, the 2-byte code "4341h".

コード変換部１５０は、テキストデータ１０ａに含まれる各単語に対して、上記処理を実行することで、テキストデータ１０ａをテキストデータ１０ｂにコード化する。 The code conversion unit 150 encodes the text data 10a into the text data 10b by performing the above processing on each word included in the text data 10a.
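The word-by-word conversion of FIG. 1A can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionaries hold just the five entries named in the walkthrough, the names `ONE_BYTE`, `TWO_BYTE`, `split_words`, and `encode` are hypothetical, an ordinary space stands in for "Δ", and the 2-byte code is written most significant byte first.

```python
# Entries taken from the FIG. 1A walkthrough; the real table 110 holds many more.
ONE_BYTE = {"the ": 0x00, "in ": 0x07, "is ": 0x08, "he ": 0x12}
TWO_BYTE = {"house ": 0x4341}


def split_words(text: str) -> list[str]:
    """Split text into words, each keeping its trailing space delimiter."""
    words, start = [], 0
    for i, ch in enumerate(text):
        if ch == " ":
            words.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        words.append(text[start:])
    return words


def encode(text: str) -> bytes:
    """Replace each word with its 1-byte or 2-byte code."""
    out = bytearray()
    for word in split_words(text):
        if word in ONE_BYTE:
            out.append(ONE_BYTE[word])
        elif word in TWO_BYTE:
            out += TWO_BYTE[word].to_bytes(2, "big")  # byte order is an assumption
        else:
            raise KeyError(f"word not in this toy table: {word!r}")
    return bytes(out)


print(encode("he is in the house ").hex(" "))  # 12 08 07 00 43 41
```

The 19 input characters shrink to 6 bytes in this example because four of the five words sit in the 1-byte area.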

FIG. 1B is a diagram illustrating an example of processing of the decoding apparatus according to the first embodiment. The decoding apparatus according to the first embodiment performs character code conversion on the code-converted text data 10b using the code assignment table 110 instead of the code assignment table 50 used in the prior art, thereby generating the text data 10a. The code assignment table 110 is as described above.

The code conversion unit 550 converts the text data 10b into the text data 10a based on the code assignment table 110. Here, the text data 10b is assumed to be "... 12h 08h 07h 00h 4341h ...".

コード変換部５５０は、コードと、コード割当表１１０とを比較して、コードを単語に変換する。例えば、コード変換部５５０は、１バイトのコード「１２ｈ」を単語「ｈｅ△」に変換する。コード変換部５５０は、１バイトのコード「０８ｈ」を単語「ｉｓ△」に変換する。コード変換部５５０は、１バイトのコード「０７ｈ」を単語「ｉｎ△」に変換する。コード変換部５５０は、１バイトのコード「００ｈ」を単語「ｔｈｅ△」に変換する。コード変換部５５０は、２バイトのコード「４３４１ｈ」を単語「ｈｏｕｓｅ△」に変換する。 The code conversion unit 550 compares the code with the code assignment table 110 and converts the code into a word. For example, the code conversion unit 550 converts the 1-byte code “12h” into the word “heΔ”. The code conversion unit 550 converts the 1-byte code “08h” into the word “isΔ”. The code conversion unit 550 converts the 1-byte code “07h” into the word “inΔ”. The code conversion unit 550 converts the 1-byte code “00h” into the word “theΔ”. The code conversion unit 550 converts the 2-byte code “4341h” into the word “houseΔ”.

コード変換部５５０は、テキストデータ１０ｂに含まれる各コードに対して、上記処理を実行することで、テキストデータ１０ｂをテキストデータ１０ａに変換する。 The code conversion unit 550 converts the text data 10b into the text data 10a by executing the above process on each code included in the text data 10b.
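The reverse conversion of FIG. 1B can be sketched the same way. The sketch below is an assumption-laden illustration, not the patent's implementation: it covers only the five codes from the example, uses the lead byte to decide whether one or two bytes are consumed, reads 2-byte codes most significant byte first, and uses an ordinary space for "Δ". The names `ONE_BYTE`, `TWO_BYTE`, and `decode` are hypothetical.

```python
# Reverse mappings for the five codes used in the FIG. 1B walkthrough.
ONE_BYTE = {0x00: "the ", 0x07: "in ", 0x08: "is ", 0x12: "he "}
TWO_BYTE = {0x4341: "house "}


def decode(data: bytes) -> str:
    """Turn a sequence of 1-byte and 2-byte codes back into words."""
    words, i = [], 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x2F:  # 1-byte area: 00h-2Fh
            words.append(ONE_BYTE[lead])
            i += 1
        elif lead <= 0x5F:  # 2-byte area: 30h-5Fh
            if i + 2 > len(data):
                raise ValueError("truncated 2-byte code")
            words.append(TWO_BYTE[int.from_bytes(data[i:i + 2], "big")])
            i += 2
        else:
            raise ValueError("3-byte codes are outside this sketch")
    return "".join(words)


print(repr(decode(bytes.fromhex("120807004341"))))  # 'he is in the house '
```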

FIG. 2A is a functional block diagram illustrating the configuration of the encoding apparatus according to the first embodiment. As illustrated in FIG. 2A, the encoding apparatus 100 includes an input unit 101, an output unit 102, registers 105a and 105b, a storage unit 106, and a code conversion unit 150.

入力部１０１は、コード変換を行うテキストデータを受け付ける処理部である。入力部１０１は、受け付けたテキストデータを、レジスタ１０５ａに格納する。 The input unit 101 is a processing unit that receives text data for code conversion. The input unit 101 stores the received text data in the register 105a.

出力部１０２は、レジスタ１０５ｂに格納されるコード変換後のテキストデータを出力する処理部である。 The output unit 102 is a processing unit that outputs the text data after code conversion stored in the register 105b.

レジスタ１０５ａは、コード変換を行う前のテキストデータを格納するものである。レジスタ１０５ｂは、コード変換後のテキストデータを格納するものである。 The register 105a stores text data before code conversion. The register 105b stores the text data after code conversion.

記憶部１０６は、コード割当表１１０と、２バイトコード割当表１１５ａと、３バイトコード割当表１１５ｂとを有する。記憶部１０６は、例えば、ＲＡＭ（Random Access Memory）、ＲＯＭ（Read Only Memory）、フラッシュメモリ（Flash Memory）などの半導体メモリ素子などの記憶装置に対応する。 The storage unit 106 includes a code allocation table 110, a 2-byte code allocation table 115a, and a 3-byte code allocation table 115b. The storage unit 106 corresponds to a storage device such as a semiconductor memory element such as a random access memory (RAM), a read only memory (ROM), and a flash memory.

FIG. 3 is a diagram illustrating an example of a code assignment table according to the first embodiment. The code assignment table 110 is a table that associates words and the like with predetermined codes, and corresponds to the code assignment table 110 described with reference to FIG. 1A. As illustrated in FIG. 3, the code assignment table 110 has a 1-byte area 110A, a 2-byte area 110B, and a 3-byte area 110C.

The 1-byte area 110A is the area from 00h to 2Fh in the code assignment table 110. In the 1-byte area 110A, the 48 words with the highest appearance frequency are set, based on Aozora Bunko, the Oxford English Dictionary, and other general books.

１バイト領域１１０Ａに設定された単語は、１バイト領域１１０Ａの設定位置に応じた１バイトのコードが割り当てられる。単語「ｔｈｅ△」は、１バイトのコード「００ｈ」が割り当てられる。１バイト領域１１０Ａに設定された残りの単語も同様に、１バイトのコードが割り当てられる。 A 1-byte code corresponding to the set position of the 1-byte area 110A is assigned to the word set in the 1-byte area 110A. The word “theΔ” is assigned a 1-byte code “00h”. Similarly, a 1-byte code is assigned to the remaining words set in the 1-byte area 110A.

The 2-byte area 110B is the area from 30h to 5Fh in the code assignment table 110. In the 2-byte area 110B, words whose appearance frequency is equal to or higher than a predetermined value are set, based on Aozora Bunko, the Oxford English Dictionary, and other general books. In the following description, words whose appearance frequency is equal to or higher than the predetermined value are referred to as high-frequency words as appropriate. The 2-byte area 110B also includes alphanumeric characters, symbols, hiragana, katakana, kanji, numerical values, times, tags, syntax, and the like.

ここで、２バイト領域１１０Ｂには、係る２バイト領域１１０Ｂに設定された高頻度単語等に割り当てる２バイトのコードのうち、前半の１バイトのコードのみが定義されている。２バイト領域１１０Ｂに設定された単語等に割り当てる２バイトのコードは、後述する２バイトコード割当表１１５ａに定義されている。 Here, in the 2-byte area 110B, only the 1-byte code of the first half is defined among 2-byte codes assigned to the high-frequency word or the like set in the 2-byte area 110B. A 2-byte code assigned to a word or the like set in the 2-byte area 110B is defined in a 2-byte code assignment table 115a described later.

例えば、２バイト領域１１０Ｂの英数字、記号、かな、カナ、漢字、数値、時刻、タグ、構文に割り当てる２バイトのコードのうち、前半の１バイトのコードは「３０ｈ〜３Ｆｈ」となる。そして、前半の１バイトのコードと、残りの１バイトのコードは、２バイトコード割当表１１５ａに定義されている。 For example, among the 2-byte codes assigned to alphanumeric characters, symbols, kana, kana, kanji, numerical values, time, tags, and syntax in the 2-byte area 110B, the first one-byte code is “30h to 3Fh”. The 1-byte code in the first half and the remaining 1-byte code are defined in the 2-byte code allocation table 115a.

２バイト領域１１０Ｂの高頻度単語に割り当てる２バイトのコードのうち、前半の１バイトのコードは「４０ｈ〜５Ｆｈ」となる。そして、前半の１バイトのコードと、残りの１バイトのコードは、２バイトコード割当表１１５ａに定義されている。 Of the 2-byte codes assigned to the high-frequency words in the 2-byte area 110B, the 1-byte code in the first half is “40h to 5Fh”. The 1-byte code in the first half and the remaining 1-byte code are defined in the 2-byte code allocation table 115a.

The 3-byte area 110C is the area from 60h to FFh in the code assignment table 110. In the 3-byte area 110C, low-frequency words whose appearance frequency is less than the predetermined value are set, based on Aozora Bunko, the Oxford English Dictionary, and other general books. For example, the 3-byte area 110C includes CJK characters, English words, Japanese words, third-country words, numerical values, times, tags, results of syntactic and semantic analysis, and the like.

ここで、３バイト領域１１０Ｃには、係る３バイト領域１１０Ｃに設定された単語等に割り当てる３バイトのコードのうち、前半の１バイトのコードのみが定義されている。３バイト領域１１０Ｃに設定された単語等に割り当てる３バイトのコードは、後述する３バイトコード割当表１１５ｂに定義されている。 Here, in the 3-byte area 110C, only the 1-byte code of the first half is defined among 3-byte codes assigned to the word set in the 3-byte area 110C. A 3-byte code assigned to a word or the like set in the 3-byte area 110C is defined in a 3-byte code assignment table 115b described later.

For example, of the 3-byte codes assigned to the CJK characters, English words, Japanese words, third-country words, numerical values, times, tags, results of syntactic and semantic analysis, and the like in the 3-byte area 110C, the first byte is in the range "60h to FFh". The first byte and the remaining 2 bytes of each code are defined in the 3-byte code allocation table 115b.
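As a rough check of how many codes each area can hold, the short Python calculation below multiplies the number of lead-byte values in an area by the number of combinations of the remaining bytes. It assumes that every value of the trailing bytes is usable, which the description does not state; the names `AREAS` and `capacity` are hypothetical.

```python
# (first lead byte, last lead byte, code length in bytes) per area of table 110
AREAS = {
    "1-byte": (0x00, 0x2F, 1),
    "2-byte": (0x30, 0x5F, 2),
    "3-byte": (0x60, 0xFF, 3),
}


def capacity(first: int, last: int, length: int) -> int:
    """Upper bound on the number of distinct codes in one area."""
    lead_values = last - first + 1
    return lead_values * 256 ** (length - 1)


for name, area in AREAS.items():
    print(name, capacity(*area))
# 1-byte 48
# 2-byte 12288
# 3-byte 10485760
```

Under that assumption, the 48 one-byte codes and 12,288 two-byte codes exceed the targets of 32 and 8,192 kinds mentioned in the background discussion.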

FIG. 4 is a diagram illustrating an example of a 2-byte code allocation table according to the first embodiment. As illustrated in FIG. 4, the 2-byte code allocation table 115a associates high-frequency words with 2-byte codes. The 2-byte code allocation table 115a also associates alphanumeric characters, symbols, hiragana, katakana, kanji, numerical values, times, tags, and syntax with 2-byte codes.

In the 2-byte code allocation table 115a, alphanumeric characters, symbols, hiragana, katakana, kanji, numerical values, times, tags, and syntax are set in "3000h to 3FFFh", and a 2-byte code corresponding to the set position is assigned to each. For example, the 2-byte code "3000h" is assigned to "NULL".

２バイトコード割当表１１５ａにおいて、「４０００ｈ〜５ＦＦＦｈ」には、高頻度単語が設定され、設定位置に応じた２バイトのコードが割り当てられる。例えば、設定位置「４０００ｈ」に設定された高頻度単語には、２バイトのコード「４０００ｈ」が割り当てられる。 In the 2-byte code assignment table 115a, a high-frequency word is set in “4000h to 5FFFh”, and a 2-byte code corresponding to the set position is assigned. For example, the 2-byte code “4000h” is assigned to the high-frequency word set at the setting position “4000h”.

図５は、本実施例１に係る３バイトコード割当表の一例を示す図である。図５に示すように、３バイトコード割当表１１５ｂは、ＣＪＫ文字、英単語、日本単語、第３国の単語、数値、時刻、タグ、構文意味解析の結果と、３バイトのコードとを対応付ける。なお、３バイトコード割当表１１５ｂにおいて、例えば、「Ｅ０００００ｈ〜ＦＦＦＦＦＦｈ」は、予備の領域となる。 FIG. 5 is a diagram illustrating an example of a 3-byte code allocation table according to the first embodiment. As shown in FIG. 5, the 3-byte code assignment table 115b associates CJK characters, English words, Japanese words, third country words, numerical values, times, tags, syntactic and semantic analysis results, and 3-byte codes. . In the 3-byte code allocation table 115b, for example, “E00000h to FFFFFFh” is a spare area.

In the 3-byte code allocation table 115b, Japanese words, third-country words, numerical values, times, tags, and results of syntactic and semantic analysis are set in "800000h to DFFFFFh", and a 3-byte code corresponding to the set position is assigned to each. For example, the 3-byte code "800000h" is assigned to the Japanese word set at the position "800000h".
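The code ranges named for the 2-byte and 3-byte code allocation tables can be collected into a small lookup, sketched below in Python. Only ranges that the description states explicitly are listed; anything else, such as the 3-byte codes below 800000h, is reported as not itemized rather than guessed. The names `RANGES_2BYTE`, `RANGES_3BYTE`, and `describe` are hypothetical.

```python
RANGES_2BYTE = [
    (0x3000, 0x3FFF, "alphanumerics, symbols, kana, kanji, numerical values, times, tags, syntax"),
    (0x4000, 0x5FFF, "high-frequency words"),
]
RANGES_3BYTE = [
    (0x800000, 0xDFFFFF, "Japanese words, third-country words, numerical values, times, tags, analysis results"),
    (0xE00000, 0xFFFFFF, "reserved"),
]


def describe(code: int, length: int) -> str:
    """Return the kind of entry that a code of the given byte length maps to."""
    ranges = {2: RANGES_2BYTE, 3: RANGES_3BYTE}.get(length, [])
    for first, last, label in ranges:
        if first <= code <= last:
            return label
    return "not itemized in this excerpt"
```

For example, `describe(0x4000, 2)` yields the high-frequency word range, and `describe(0xE00000, 3)` yields the reserved range.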

Returning to the description of FIG. 2A, the code conversion unit 150 is a processing unit that encodes the text data stored in the register 105a based on the code assignment table 110, the 2-byte code allocation table 115a, and the 3-byte code allocation table 115b. The code conversion unit 150 stores the encoded text data in the register 105b.

以下において、コード変換部１５０の処理の一例について説明する。コード変換部１５０は、テキストデータから、スペース「△」で区切られる単語を取得し、取得した単語が、１バイト領域１１０Ａに設定された単語か、２バイト領域１１０Ｂに設定された単語か、３バイト領域１１０Ｃに設定された単語かを判定する。 Hereinafter, an example of processing of the code conversion unit 150 will be described. The code conversion unit 150 acquires words delimited by a space “Δ” from the text data, and determines whether the acquired word is a word set in the 1-byte area 110A or a word set in the 2-byte area 110B. It is determined whether the word is set in the byte area 110C.

コード変換部１５０の取得した単語が１バイト領域１１０Ａに設定された単語である場合について説明する。コード変換部１５０は、取得した単語と、１バイト領域１１０Ａの各単語とを比較して、該当する設定位置の１バイトのコードを特定し、コード化する。例えば、コード変換部１５０は、取得した単語が「ｔｈｅ△」である場合には、かかる単語「ｔｈｅ△」を「００ｈ」にコード化する。 A case where the word acquired by the code conversion unit 150 is a word set in the 1-byte area 110A will be described. The code conversion unit 150 compares the acquired word with each word in the 1-byte area 110A to identify and code a 1-byte code at the corresponding setting position. For example, if the acquired word is “theΔ”, the code converting unit 150 encodes the word “theΔ” into “00h”.

続いて、コード変換部１５０の取得した単語が２バイト領域１１０Ｂに設定された単語である場合について説明する。コード変換部１５０は、取得した単語と、２バイトコード割当表１１５ａとを比較して、該当する設定位置の２バイトのコードを特定し、コード化する。例えば、コード変換部１５０は、取得した単語が、２バイトコード割当表１１５ａの「４０００ｈ」に設定されたある高頻度単語である場合には、かかる高頻度単語を２バイトのコード「４０００ｈ」にコード化する。 Next, a case where the word acquired by the code conversion unit 150 is a word set in the 2-byte area 110B will be described. The code conversion unit 150 compares the acquired word with the 2-byte code assignment table 115a, identifies the 2-byte code at the corresponding setting position, and encodes it. For example, when the acquired word is a certain high-frequency word set to “4000h” in the 2-byte code assignment table 115a, the code conversion unit 150 converts the high-frequency word into a 2-byte code “4000h”. Encode.

なお、コード変換部１５０は、取得した情報が、２バイト領域１１０Ｂに設定された英数字、記号、かな、カナ、漢字、数値、時刻、タグ、構文である場合も、２バイトコード割当表１１５ａと比較して、コード化する。例えば、コード変換部１５０は、「ＮＵＬＬ」を取得した場合には、かかる「ＮＵＬＬ」を「３０００ｈ」にコード化する。 The code conversion unit 150 also includes the 2-byte code assignment table 115a even when the acquired information is alphanumeric characters, symbols, kana, kana, kanji, numerical values, time, tags, and syntax set in the 2-byte area 110B. Compare with. For example, if “NULL” is acquired, the code conversion unit 150 encodes “NULL” into “3000h”.

続いて、コード変換部１５０の取得した単語が３バイト領域１１０Ｃに設定された単語である場合について説明する。コード変換部１５０は、取得した単語と、３バイトコード割当表１１５ｂとを比較して、該当する設定位置の３バイトのコードを特定し、コード化する。例えば、コード変換部１５０は、取得した単語が、３バイトコード割当表１１５ｂの「７０００００ｈ」に設定されたある英単語である場合には、かかる英単語を３バイトのコード「７０００００ｈ」にコード化する。 Next, a case where the word acquired by the code conversion unit 150 is a word set in the 3-byte area 110C will be described. The code conversion unit 150 compares the acquired word with the 3-byte code assignment table 115b, identifies the 3-byte code at the corresponding setting position, and encodes the word. For example, when the acquired word is an English word set at “700000h” in the 3-byte code assignment table 115b, the code conversion unit 150 encodes that English word into the 3-byte code “700000h”.

なお、コード変換部１５０は、取得した情報が、３バイト領域１１０Ｃに設定された日本単語、第３国の単語、数値、時刻、タグ、構文意味解析の結果である場合も、３バイトコード割当表１１５ｂと比較して、コード化する。例えば、コード変換部１５０は、取得した情報が、３バイトコード割当表１１５ｂの「８０００００ｈ」に設定されたある日本単語である場合には、かかる日本単語を３バイトのコード「８０００００ｈ」にコード化する。 Note that the code conversion unit 150 likewise encodes the acquired information by comparing it with the 3-byte code assignment table 115b when that information is a Japanese word, a word of a third country, a numerical value, a time, a tag, or a result of syntactic and semantic analysis set in the 3-byte area 110C. For example, when the acquired information is a Japanese word set at “800000h” in the 3-byte code assignment table 115b, the code conversion unit 150 encodes that Japanese word into the 3-byte code “800000h”.

コード変換部１５０は、レジスタ１０５ａに格納されたテキストデータに対して、上記処理を繰り返し実行することで、テキストデータをコード化する。コード変換部１５０は、コード化したテキストデータを、レジスタ１０５ｂに格納する。 The code conversion unit 150 encodes the text data by repeatedly executing the above processing on the text data stored in the register 105a. The code conversion unit 150 stores the encoded text data in the register 105b.

図２ｂは、本実施例１に係る復号化装置の構成を示す機能ブロック図である。図２ｂに示すように、この復号化装置５００は、入力部５０１、出力部５０２、レジスタ５０５ａ，５０５ｂ、記憶部５０６、コード変換部５５０を有する。 FIG. 2B is a functional block diagram illustrating the configuration of the decoding apparatus according to the first embodiment. As illustrated in FIG. 2b, the decoding device 500 includes an input unit 501, an output unit 502, registers 505a and 505b, a storage unit 506, and a code conversion unit 550.

入力部５０１は、コード変換されたテキストデータを受け付ける処理部である。入力部５０１は、受け付けたテキストデータを、レジスタ５０５ａに格納する。 The input unit 501 is a processing unit that receives code-converted text data. The input unit 501 stores the received text data in the register 505a.

出力部５０２は、レジスタ５０５ｂに格納されたテキストデータを出力する処理部である。 The output unit 502 is a processing unit that outputs the text data stored in the register 505b.

レジスタ５０５ａは、コード変換されたテキストデータを格納するものである。レジスタ５０５ｂは、文字コード変換後のテキストデータを格納するものである。 The register 505a stores code-converted text data. The register 505b stores text data after character code conversion.

記憶部５０６は、コード割当表１１０と、２バイトコード割当表１１５ａと、３バイトコード割当表１１５ｂとを有する。記憶部５０６は、例えば、ＲＡＭ、ＲＯＭ、フラッシュメモリなどの半導体メモリ素子などの記憶装置に対応する。 The storage unit 506 includes a code allocation table 110, a 2-byte code allocation table 115a, and a 3-byte code allocation table 115b. The storage unit 506 corresponds to a storage device such as a semiconductor memory element such as a RAM, a ROM, or a flash memory.

コード割当表１１０に関する説明は、図３で説明したコード割当表１１０に関する説明と同様である。２バイトコード割当表１１５ａに関する説明は、図４で説明した２バイトコード割当表１１５ａに関する説明と同様である。３バイトコード割当表１１５ｂに関する説明は、図５で説明した３バイトコード割当表１１５ｂに関する説明と同様である。 The description regarding the code allocation table 110 is the same as the description regarding the code allocation table 110 described with reference to FIG. The description regarding the 2-byte code allocation table 115a is the same as the description regarding the 2-byte code allocation table 115a described with reference to FIG. The description regarding the 3-byte code allocation table 115b is the same as the description regarding the 3-byte code allocation table 115b described with reference to FIG.

以下において、コード変換部５５０の処理の一例について説明する。例えば、コード変換部５５０は、テキストデータからコードを取得し、取得したコードが１バイト領域１１０Ａに設定された単語に対応するものか、２バイト領域１１０Ｂに設定された単語に対応するものか、３バイト領域１１０Ｃに設定された単語に対応するものかを判定する。 Hereinafter, an example of processing of the code conversion unit 550 will be described. For example, the code conversion unit 550 acquires a code from text data, and whether the acquired code corresponds to a word set in the 1-byte area 110A or a word set in the 2-byte area 110B, It is determined whether the word corresponds to the word set in the 3-byte area 110C.

コード変換部５５０が取得したコードが１バイト領域１１０Ａに設定された単語に対応するものである場合について説明する。１バイト領域１１０Ａに設定された単語に対応するコードの１バイト目は「００ｈ〜２Ｆｈ」に含まれる。コード変換部５５０は、コードに対応する単語を、１バイト領域１１０Ａに設定された単語から選択し、選択した単語によって文字コード化する。例えば、コード変換部５５０は、取得したコードが「００ｈ」である場合には、「００ｈ」を「ｔｈｅ△」に文字コード化する。 A case where the code acquired by the code conversion unit 550 corresponds to a word set in the 1-byte area 110A will be described. The first byte of the code corresponding to the word set in the 1-byte area 110A is included in “00h to 2Fh”. The code conversion unit 550 selects a word corresponding to the code from the words set in the 1-byte area 110A, and converts the word into a character code using the selected word. For example, if the acquired code is “00h”, the code conversion unit 550 converts “00h” into a character code “theΔ”.

コード変換部５５０が取得したコードが２バイト領域１１０Ｂに設定された単語に対応するものである場合について説明する。２バイト領域１１０Ｂに設定された単語に対応するコードの１バイト目は「３０ｈ〜５Ｆｈ」に含まれる。コード変換部５５０は、コードの１バイト目と続く２バイト目とを合わせたコードと、２バイトコード割当表１１５ａとを比較して、単語を文字コード化する。例えば、コード変換部５５０は、２バイトのコードが「４０００ｈ」である場合には、２バイトコード割当表１１５ａに設定された「４０００ｈ」に対応する単語に文字コード化する。 A case where the code acquired by the code conversion unit 550 corresponds to a word set in the 2-byte area 110B will be described. The first byte of the code corresponding to the word set in the 2-byte area 110B is included in “30h to 5Fh”. The code conversion unit 550 compares the code combining the first byte and the second byte of the code with the 2-byte code assignment table 115a, and converts the word into a character code. For example, when the 2-byte code is “4000h”, the code conversion unit 550 converts the character code into a word corresponding to “4000h” set in the 2-byte code assignment table 115a.

コード変換部５５０が取得したコードが３バイト領域１１０Ｃに設定された単語に対応するものである場合について説明する。３バイト領域１１０Ｃに設定された単語に対応するコードの１バイト目は「６０ｈ〜ＦＦｈ」に含まれる。コード変換部５５０は、コードの１バイト目と続く２、３バイト目とを合わせたコードと、３バイトコード割当表１１５ｂとを比較して、単語を文字コード化する。例えば、コード変換部５５０は、３バイトのコードが「７０００００ｈ」である場合には、３バイトコード割当表１１５ｂに設定された「７０００００ｈ」に対応する単語に文字コード化する。 A case where the code acquired by the code conversion unit 550 corresponds to a word set in the 3-byte area 110C will be described. The first byte of a code corresponding to a word set in the 3-byte area 110C falls within “60h to FFh”. The code conversion unit 550 combines the first byte of the code with the following second and third bytes, compares the combined code with the 3-byte code assignment table 115b, and converts it into the character codes of the corresponding word. For example, when the 3-byte code is “700000h”, the code conversion unit 550 converts it into the character codes of the word set at “700000h” in the 3-byte code assignment table 115b.
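
The three cases above imply that the first byte alone tells the decoder how many bytes a code occupies. The following Python sketch illustrates that idea under stated assumptions: the function names `code_length` and `split_codes` are hypothetical and do not appear in the description, and big-endian byte order is assumed from the way codes such as “4000h” are written.

```python
def code_length(first_byte):
    """Return the code length implied by the first byte (Embodiment 1 ranges)."""
    if not 0x00 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be a single byte value")
    if first_byte <= 0x2F:   # 00h to 2Fh: 1-byte area 110A
        return 1
    if first_byte <= 0x5F:   # 30h to 5Fh: 2-byte area 110B
        return 2
    return 3                 # 60h to FFh: 3-byte area 110C


def split_codes(data):
    """Split an encoded byte string into integer codes of 1, 2, or 3 bytes."""
    codes = []
    i = 0
    while i < len(data):
        n = code_length(data[i])
        chunk = data[i:i + n]
        if len(chunk) < n:
            raise ValueError("encoded data ends in the middle of a code")
        codes.append(int.from_bytes(chunk, "big"))  # byte order is an assumption
        i += n
    return codes
```

Each integer returned by `split_codes` would then be looked up in the code assignment table 110, the 2-byte code assignment table 115a, or the 3-byte code assignment table 115b according to its length.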

図６ａは、本実施例１に係る符号化装置の処理手順を示すフローチャートである。図６ａに示すように、符号化装置１００の入力部１０１は、テキストデータをレジスタ１０５ａに格納する（ステップＳ１０１）。符号化装置１００のコード変換部１５０は、レジスタ１０５ａに格納されたテキストデータから単語を取得する（ステップＳ１０２）。ステップＳ１０２では、説明の便宜上、単語と表記するが、コード変換部１５０が取得するものは、単語の他に、日本単語、第３国の単語、数値、時刻、タグ、構文意味解析の結果等の場合もある。 FIG. 6A is a flowchart illustrating the processing procedure of the encoding apparatus according to the first embodiment. As shown in FIG. 6a, the input unit 101 of the encoding device 100 stores the text data in the register 105a (step S101). The code conversion unit 150 of the encoding device 100 acquires a word from the text data stored in the register 105a (step S102). Although the term "word" is used in step S102 for convenience of explanation, what the code conversion unit 150 acquires may also be a Japanese word, a word of a third country, a numerical value, a time, a tag, a result of syntactic and semantic analysis, or the like.

コード変換部１５０は、単語とコード割当表１１０とを比較する（ステップＳ１０３）。コード変換部１５０は、単語がコード割当表１１０の１バイト領域１１０Ａの単語に対応する単語である場合には（ステップＳ１０４，Ｙｅｓ）、ステップＳ１０５に移行する。コード変換部１５０は、コード割当表１１０に基づいて、単語を１バイトのコードに変換し（ステップＳ１０５）、ステップＳ１０９に移行する。 The code conversion unit 150 compares the word with the code assignment table 110 (step S103). If the word is a word corresponding to the word in the 1-byte area 110A of the code assignment table 110 (Yes in step S104), the code conversion unit 150 proceeds to step S105. The code conversion unit 150 converts the word into a 1-byte code based on the code assignment table 110 (step S105), and proceeds to step S109.

一方、コード変換部１５０は、単語がコード割当表１１０の１バイト領域１１０Ａの単語に対応する単語でない場合には（ステップＳ１０４，Ｎｏ）、ステップＳ１０６に移行する。コード変換部１５０は、単語がコード割当表１１０の２バイト領域１１０Ｂの単語に対応する単語である場合には（ステップＳ１０６，Ｙｅｓ）、ステップＳ１０７に移行する。コード変換部１５０は、２バイトコード割当表１１５ａに基づいて、単語を２バイトのコードに変換し（ステップＳ１０７）、ステップＳ１０９に移行する。 On the other hand, if the word is not a word corresponding to the word in the 1-byte area 110A of the code assignment table 110 (No at Step S104), the code conversion unit 150 proceeds to Step S106. If the word is a word corresponding to the word in the 2-byte area 110B of the code assignment table 110 (Yes in step S106), the code conversion unit 150 proceeds to step S107. The code conversion unit 150 converts the word into a 2-byte code based on the 2-byte code assignment table 115a (step S107), and proceeds to step S109.

一方、コード変換部１５０は、単語がコード割当表１１０の２バイト領域１１０Ｂの単語に対応する単語でない場合には（ステップＳ１０６，Ｎｏ）、ステップＳ１０８に移行する。コード変換部１５０は、３バイトコード割当表１１５ｂに基づいて、単語を３バイトのコードに変換し（ステップＳ１０８）、ステップＳ１０９に移行する。 On the other hand, if the word is not a word corresponding to a word in the 2-byte area 110B of the code assignment table 110 (No at Step S106), the code conversion unit 150 proceeds to Step S108. The code conversion unit 150 converts the word into a 3-byte code based on the 3-byte code assignment table 115b (step S108), and proceeds to step S109.

コード変換部１５０は、テキストデータのコード化が終了したか否かを判定する（ステップＳ１０９）。コード変換部１５０は、テキストデータのコード化が終了していない場合には（ステップＳ１０９，Ｎｏ）、ステップＳ１０２に移行する。 The code conversion unit 150 determines whether or not the text data has been encoded (step S109). If the encoding of the text data has not been completed (No at Step S109), the code converting unit 150 proceeds to Step S102.

一方、コード変換部１５０は、テキストデータのコード化が終了した場合には（ステップＳ１０９，Ｙｅｓ）、コード化したテキストデータを、レジスタ１０５ｂに格納する（ステップＳ１１０）。 On the other hand, when the encoding of the text data is completed (step S109, Yes), the code conversion unit 150 stores the encoded text data in the register 105b (step S110).
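
The flow of steps S102 to S110 can be sketched as a loop over already-extracted words. This is a minimal illustration, not the claimed implementation: the dictionary names are hypothetical, only the entries “the ” (00h) and “NULL” (3000h) come from the description, and the words placed at 4000h and 700000h are invented placeholders.

```python
# Illustrative tables. Only "the " -> 00h and "NULL" -> 3000h are stated in the
# description; the remaining entries are hypothetical placements.
ONE_BYTE_AREA = {"the ": 0x00}                        # 1-byte area 110A
TWO_BYTE_TABLE = {"NULL": 0x3000, "house ": 0x4000}   # table 115a ("house " is hypothetical)
THREE_BYTE_TABLE = {"zebra ": 0x700000}               # table 115b (hypothetical entry)


def encode_words(words):
    """Encode a sequence of words following the branch order of steps S104 to S108."""
    encoded = bytearray()                  # stands in for register 105b
    for word in words:                     # S102, repeated while S109 answers No
        if word in ONE_BYTE_AREA:          # S104
            encoded += ONE_BYTE_AREA[word].to_bytes(1, "big")      # S105
        elif word in TWO_BYTE_TABLE:       # S106
            encoded += TWO_BYTE_TABLE[word].to_bytes(2, "big")     # S107
        else:
            # S108: a word missing from every table raises KeyError in this sketch.
            encoded += THREE_BYTE_TABLE[word].to_bytes(3, "big")
    return bytes(encoded)                  # S110
```

With these placeholder tables, the four words “the ”, “NULL”, “house ”, and “zebra ” occupy 1 + 2 + 2 + 3 = 8 bytes.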

図６ｂは、本実施例１に係る復号化装置の処理手順を示すフローチャートである。図６ｂに示すように、復号化装置５００の入力部５０１は、テキストデータをレジスタ５０５ａに格納する（ステップＳ５０１）。復号化装置５００のコード変換部５５０は、レジスタ５０５ａに格納されたテキストデータからコードを取得する（ステップＳ５０２）。 FIG. 6B is a flowchart illustrating the processing procedure of the decoding apparatus according to the first embodiment. As shown in FIG. 6b, the input unit 501 of the decoding apparatus 500 stores the text data in the register 505a (step S501). The code conversion unit 550 of the decoding apparatus 500 acquires a code from the text data stored in the register 505a (step S502).

コード変換部５５０は、コードとコード割当表１１０とを比較する（ステップＳ５０３）。コード変換部５５０は、コードがコード割当表１１０の１バイト領域１１０Ａの単語に対応するコードである場合には（ステップＳ５０４，Ｙｅｓ）、ステップＳ５０５に移行する。コード変換部５５０は、コード割当表１１０に基づいて、１バイトのコードを単語に変換し（ステップＳ５０５）、ステップＳ５０９に移行する。 The code conversion unit 550 compares the code with the code assignment table 110 (step S503). If the code is a code corresponding to a word in the 1-byte area 110A of the code assignment table 110 (Yes in step S504), the code conversion unit 550 proceeds to step S505. The code conversion unit 550 converts the 1-byte code into a word based on the code assignment table 110 (step S505), and proceeds to step S509.

一方、コード変換部５５０は、コードがコード割当表１１０の１バイト領域１１０Ａの単語に対応するコードでない場合には（ステップＳ５０４，Ｎｏ）、ステップＳ５０６に移行する。コード変換部５５０は、コードがコード割当表１１０の２バイト領域１１０Ｂの単語に対応するコードである場合には（ステップＳ５０６，Ｙｅｓ）、ステップＳ５０７に移行する。コード変換部５５０は、２バイトコード割当表１１５ａに基づいて、２バイトのコードを単語に変換し（ステップＳ５０７）、ステップＳ５０９に移行する。 On the other hand, when the code is not a code corresponding to the word in the 1-byte area 110A of the code assignment table 110 (No at Step S504), the code conversion unit 550 proceeds to Step S506. If the code is a code corresponding to a word in the 2-byte area 110B of the code assignment table 110 (step S506, Yes), the code conversion unit 550 proceeds to step S507. Based on the 2-byte code assignment table 115a, the code conversion unit 550 converts the 2-byte code into a word (step S507), and proceeds to step S509.

一方、コード変換部５５０は、コードがコード割当表１１０の２バイト領域１１０Ｂの単語に対応するコードでない場合には（ステップＳ５０６，Ｎｏ）、ステップＳ５０８に移行する。コード変換部５５０は、３バイトコード割当表１１５ｂに基づいて、３バイトのコードを単語に変換し（ステップＳ５０８）、ステップＳ５０９に移行する。 On the other hand, when the code is not a code corresponding to a word in the 2-byte area 110B of the code assignment table 110 (No at Step S506), the code conversion unit 550 proceeds to Step S508. Based on the 3-byte code assignment table 115b, the code conversion unit 550 converts the 3-byte code into a word (step S508), and proceeds to step S509.

コード変換部５５０は、テキストデータの復号化が終了したか否かを判定する（ステップＳ５０９）。コード変換部５５０は、テキストデータの復号化が終了していない場合には（ステップＳ５０９，Ｎｏ）、ステップＳ５０２に移行する。 The code conversion unit 550 determines whether or not the decoding of the text data has been completed (step S509). If the decoding of the text data has not been completed (No at Step S509), the code conversion unit 550 proceeds to Step S502.

一方、コード変換部５５０は、テキストデータの復号化が終了した場合には（ステップＳ５０９，Ｙｅｓ）、復号化したテキストデータを、レジスタ５０５ｂに格納する（ステップＳ５１０）。 On the other hand, when the decoding of the text data is completed (Yes in step S509), the code conversion unit 550 stores the decoded text data in the register 505b (step S510).

次に、本実施例１に係る符号化装置１００の効果について説明する。符号化装置１００は、従来のコード割当表５０の１バイト領域に割り当てられていた文字をコード割当表１１０の２バイト領域に退避させ、コード割当表１１０の１バイト領域には厳選した単語を割り当てた割当表を用いたコード変換を行う。係る処理を実行することで、出現頻度が高い文字や単語に対して、短いバイトコードを割り当てることができる。 Next, effects of the encoding device 100 according to the first embodiment will be described. The encoding device 100 performs code conversion using an assignment table in which the characters formerly assigned to the 1-byte area of the conventional code assignment table 50 are moved to the 2-byte area of the code assignment table 110 and carefully selected words are assigned to the 1-byte area of the code assignment table 110. By executing this processing, short byte codes can be assigned to characters and words with a high appearance frequency.

また、復号化装置５００は、上記のコード割当表１１０を用いて、コード化されたテキストデータを復号化するため、出現頻度が高い単語や一般記号に対して、短いバイトコードを割り当てた場合でも、係るバイトコードを単語や一般記号に変換することができる。 In addition, because the decoding apparatus 500 decodes the encoded text data using the code assignment table 110 described above, it can convert the byte codes back into words and general symbols even when short byte codes have been assigned to words and general symbols with a high appearance frequency.

図７ａは、本実施例２に係る符号化装置の処理の一例を示す図である。本実施例２に係る符号化装置は、従来技術で用いていたコード割当表５０の代わりに、コード割当表２１０を用いて、テキストデータ２０ａをコード変換することで、コード変換されたテキストデータ２０ｂを生成する。従来技術のコード割当表５０に関する説明は、実施例１で説明したものと同様である。 FIG. 7A is a diagram illustrating an example of processing of the encoding apparatus according to the second embodiment. The encoding apparatus according to the second embodiment generates code-converted text data 20b by performing code conversion on the text data 20a using the code assignment table 210 instead of the code assignment table 50 used in the prior art. The description of the prior-art code assignment table 50 is the same as that given in the first embodiment.

本実施例２に係るコード割当表２１０について説明する。コード割当表２１０の００ｈ〜１Ｆｈには、後述する所定の単語が設定され、１バイトのコードが割り当てられる。コード割当表２１０の００ｈ〜１Ｆｈは、コード割当表５０において、制御記号が割り当てられていた領域を含む。 The code allocation table 210 according to the second embodiment will be described. A predetermined word to be described later is set in 00h to 1Fh in the code assignment table 210, and a 1-byte code is assigned. 00h to 1Fh in the code assignment table 210 includes an area to which a control symbol is assigned in the code assignment table 50.

コード割当表２１０の２０ｈ〜７Ｆｈには、英数字が設定され、１バイトのコードが割り当てられる。コード割当表２１０の２０ｈ〜７Ｆｈに設定される英数字は、コード割当表５０の２０ｈ〜７Ｆｈに設定される英数字と同様である。 Alphanumeric characters are set in 20h to 7Fh of the code assignment table 210, and a 1-byte code is assigned. The alphanumeric characters set to 20h to 7Fh in the code assignment table 210 are the same as the alphanumeric characters set to 20h to 7Fh in the code assignment table 50.

コード割当表２１０の８０ｈ〜９Ｆｈには、高頻度の単語等が設定される。また、コード割当表２１０の８０ｈ〜９Ｆｈには、コード割当表５０の００ｈ〜１Ｆｈに設定されていた制御記号や、コード割当表５０の８０ｈ〜ＦＦｈに設定されていたＣＪＫ文字の一部が設定される。コード割当表２１０の８０ｈ〜９Ｆｈに設定された高頻度の単語、制御記号、ＣＪＫ文字には、２バイトのコードが割り当てられる。 High-frequency words and the like are set in 80h to 9Fh of the code assignment table 210. The control symbols that were set in 00h to 1Fh of the code assignment table 50 and some of the CJK characters that were set in 80h to FFh of the code assignment table 50 are also set in 80h to 9Fh of the code assignment table 210. A 2-byte code is assigned to each of the high-frequency words, control symbols, and CJK characters set in 80h to 9Fh of the code assignment table 210.

コード割当表２１０のＡ０ｈ〜ＦＦｈには、低頻度の単語等が設定される。また、コード割当表２１０のＡ０ｈ〜ＦＦｈには、コード割当表５０の８０ｈ〜ＦＦｈに設定されていたＣＪＫ文字の一部が設定される。 Infrequent words and the like are set in A0h to FFh of the code assignment table 210. A part of the CJK characters set in 80h to FFh in the code assignment table 50 are set in A0h to FFh in the code assignment table 210.

本実施例２について、以下の説明では、適宜、コード割当表２１０の００ｈ〜１Ｆｈの領域を「単語１バイト領域」と表記する。コード割当表２１０の２０ｈ〜７Ｆｈの領域を「英数字１バイト領域」と表記する。コード割当表２１０の８０ｈ〜９Ｆｈの領域を「２バイト領域」と表記する。コード割当表２１０のＡ０ｈ〜ＦＦｈの領域を「３バイト領域」と表記する。 Regarding the second embodiment, in the following description, the area from 00h to 1Fh in the code assignment table 210 will be referred to as a “word 1-byte area” as appropriate. The area of 20h to 7Fh in the code assignment table 210 is expressed as “alphanumeric 1-byte area”. The area from 80h to 9Fh in the code allocation table 210 is expressed as “2-byte area”. The area from A0h to FFh in the code assignment table 210 is denoted as “3-byte area”.
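
The four areas named above partition the first byte of every code, so a simple range check identifies the area and the code length. The sketch below illustrates this; the `Region` enum, `region_of`, and `CODE_LENGTH` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Region(Enum):
    WORD_1BYTE = "word 1-byte area"           # 00h to 1Fh
    ALNUM_1BYTE = "alphanumeric 1-byte area"  # 20h to 7Fh
    TWO_BYTE = "2-byte area"                  # 80h to 9Fh
    THREE_BYTE = "3-byte area"                # A0h to FFh


# Number of bytes occupied by a code whose first byte falls in each area.
CODE_LENGTH = {
    Region.WORD_1BYTE: 1,
    Region.ALNUM_1BYTE: 1,
    Region.TWO_BYTE: 2,
    Region.THREE_BYTE: 3,
}


def region_of(first_byte):
    """Classify a first byte into one of the four areas of code assignment table 210."""
    if not 0x00 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be a single byte value")
    if first_byte <= 0x1F:
        return Region.WORD_1BYTE
    if first_byte <= 0x7F:
        return Region.ALNUM_1BYTE
    if first_byte <= 0x9F:
        return Region.TWO_BYTE
    return Region.THREE_BYTE
```

For instance, a first byte of 83h falls in the 2-byte area, so the decoder would read one further byte before consulting a table.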

コード変換部２５０は、コード割当表２１０に基づいて、テキストデータ２０ａを、テキストデータ２０ｂに変換する。ここでは、テキストデータ２０ａを「・・・ｈｅ△ｉｓ△ｉｎ△ｔｈｅ△ｈｏｕｓｅ△・・・」とする。テキストデータ２０ａの「△」はスペースを示すものである。 Based on the code assignment table 210, the code conversion unit 250 converts the text data 20a into text data 20b. Here, the text data 20a is assumed to be “...heΔisΔinΔtheΔhouseΔ...”. The “Δ” in the text data 20a indicates a space.

コード変換部２５０は、スペース「△」で区切られる単語と、コード割当表２１０とを比較して、単語をコードに変換する。テキストデータ２０ａに含まれる単語「ｈｅ△」は、コード割当表２１０の単語１バイト領域に設定された単語であり、コード変換部２５０は、単語「ｈｅ△」を１バイトのコード「１２ｈ」に変換する。 The code conversion unit 250 compares each word delimited by the space “Δ” with the code assignment table 210 and converts the word into a code. The word “heΔ” included in the text data 20a is a word set in the word 1-byte area of the code assignment table 210, and the code conversion unit 250 converts the word “heΔ” into the 1-byte code “12h”.

テキストデータ２０ａに含まれる単語「ｉｓ△」は、コード割当表２１０の単語１バイト領域に設定された単語であり、コード変換部２５０は、単語「ｉｓ△」を１バイトのコード「０８ｈ」に変換する。 The word “isΔ” included in the text data 20a is a word set in the word 1-byte area of the code assignment table 210, and the code conversion unit 250 converts the word “isΔ” into the 1-byte code “08h”.

テキストデータ２０ａに含まれる単語「ｉｎ△」は、コード割当表２１０の単語１バイト領域に設定された単語であり、コード変換部２５０は、単語「ｉｎ△」を１バイトのコード「０７ｈ」に変換する。 The word “inΔ” included in the text data 20a is a word set in the word 1-byte area of the code assignment table 210, and the code conversion unit 250 converts the word “inΔ” into the 1-byte code “07h”.

テキストデータ２０ａに含まれる単語「ｔｈｅ△」は、コード割当表２１０の単語１バイト領域に設定された単語であり、コード変換部２５０は、単語「ｔｈｅ△」を１バイトのコード「００ｈ」に変換する。 The word “theΔ” included in the text data 20a is a word set in the word 1-byte area of the code assignment table 210, and the code conversion unit 250 converts the word “theΔ” into the 1-byte code “00h”.

テキストデータ２０ａに含まれる単語「ｈｏｕｓｅ△」は、コード割当表２１０の２バイト領域に設定された単語であり、コード変換部２５０は、例えば、単語「ｈｏｕｓｅ△」を２バイトのコード「８３４１ｈ」に変換する。 The word “houseΔ” included in the text data 20a is a word set in the 2-byte area of the code assignment table 210, and the code conversion unit 250 converts the word “houseΔ” into, for example, the 2-byte code “8341h”.

コード変換部２５０は、テキストデータ２０ａに含まれる各単語に対して、上記処理を実行することで、テキストデータ２０ａをテキストデータ２０ｂにコード化する。 The code conversion unit 250 encodes the text data 20a into the text data 20b by performing the above processing on each word included in the text data 20a.
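
The worked example above can be reproduced with a few lines of Python. This is a sketch under assumptions: the five table entries are the ones given in the description, while the names `WORD_1BYTE_AREA`, `TWO_BYTE_TABLE`, `tokenize`, and `encode`, the use of an ASCII space for “△”, and big-endian byte order are choices made only for illustration.

```python
import re

# Entries taken from the worked example; each word keeps its trailing space.
WORD_1BYTE_AREA = {"the ": 0x00, "in ": 0x07, "is ": 0x08, "he ": 0x12}
TWO_BYTE_TABLE = {"house ": 0x8341}


def tokenize(text):
    """Split text into words, each including the space that terminates it."""
    return re.findall(r"[^ ]+ ", text)


def encode(text):
    """Encode text word by word using the 1-byte and 2-byte tables above."""
    out = bytearray()
    for word in tokenize(text):
        if word in WORD_1BYTE_AREA:
            out.append(WORD_1BYTE_AREA[word])
        elif word in TWO_BYTE_TABLE:
            out += TWO_BYTE_TABLE[word].to_bytes(2, "big")
        else:
            raise KeyError(f"word not covered by this sketch: {word!r}")
    return bytes(out)
```

The 19-character input “he is in the house ” becomes the 6 bytes 12h 08h 07h 00h 83h 41h, which is where the size reduction in this example comes from.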

図７ｂは、本実施例２に係る復号化装置の処理の一例を示す図である。本実施例２に係る復号化装置は、従来技術で用いていたコード割当表５０の代わりに、コード割当表２１０を用いて、コード変換されたテキストデータ２０ｂを、文字コード変換することで、テキストデータ２０ａを生成する。コード割当表２１０に関する説明は、上記の説明と同様である。 FIG. 7B is a diagram illustrating an example of a process performed by the decoding apparatus according to the second embodiment. The decoding apparatus according to the second embodiment uses the code assignment table 210 instead of the code assignment table 50 used in the prior art to perform character code conversion on the text data 20b that has been subjected to code conversion. Data 20a is generated. The description regarding the code assignment table 210 is the same as the above description.

コード変換部６５０は、コード割当表２１０に基づいて、テキストデータ２０ｂを、テキストデータ２０ａに変換する。ここでは、テキストデータ２０ｂを「・・・１２ｈ０８ｈ０７ｈ００ｈ８３４１ｈ・・・」とする。 The code conversion unit 650 converts the text data 20b into the text data 20a based on the code assignment table 210. Here, the text data 20b is assumed to be “...12h 08h 07h 00h 8341h...”.

コード変換部６５０は、コードと、コード割当表２１０とを比較して、コードを単語に変換する。例えば、コード変換部６５０は、１バイトのコード「１２ｈ」を単語「ｈｅ△」に変換する。コード変換部６５０は、１バイトのコード「０８ｈ」を単語「ｉｓ△」に変換する。コード変換部６５０は、１バイトのコード「０７ｈ」を単語「ｉｎ△」に変換する。コード変換部６５０は、１バイトのコード「００ｈ」を単語「ｔｈｅ△」に変換する。コード変換部６５０は、２バイトのコード「８３４１ｈ」を単語「ｈｏｕｓｅ△」に変換する。 The code conversion unit 650 compares the code with the code assignment table 210 and converts the code into a word. For example, the code conversion unit 650 converts the 1-byte code “12h” into the word “heΔ”. The code conversion unit 650 converts the 1-byte code “08h” into the word “isΔ”. The code conversion unit 650 converts the 1-byte code “07h” into the word “inΔ”. The code conversion unit 650 converts the 1-byte code “00h” into the word “theΔ”. The code conversion unit 650 converts the 2-byte code “8341h” into the word “houseΔ”.

コード変換部６５０は、テキストデータ２０ｂに含まれる各コードに対して、上記処理を実行することで、テキストデータ２０ｂをテキストデータ２０ａに変換する。 The code conversion unit 650 converts the text data 20b into the text data 20a by executing the above processing on each code included in the text data 20b.
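
The reverse conversion can be sketched in the same way. The table entries below mirror the worked example; the function name `decode` is hypothetical, 3-byte codes are deliberately left out, and treating the alphanumeric 1-byte area as identical to ASCII 20h to 7Fh is an assumption based on the statement that this area matches the conventional code assignment table 50.

```python
# Inverse of the tables used in the worked example.
WORDS_1BYTE = {0x00: "the ", 0x07: "in ", 0x08: "is ", 0x12: "he "}
WORDS_2BYTE = {0x8341: "house "}


def decode(data):
    """Decode bytes produced with code assignment table 210 (1- and 2-byte codes only)."""
    out = []
    i = 0
    while i < len(data):
        first = data[i]
        if first <= 0x1F:                    # word 1-byte area
            out.append(WORDS_1BYTE[first])
            i += 1
        elif first <= 0x7F:                  # alphanumeric 1-byte area (assumed ASCII)
            out.append(chr(first))
            i += 1
        elif first <= 0x9F:                  # 2-byte area: read one more byte
            code = int.from_bytes(data[i:i + 2], "big")
            out.append(WORDS_2BYTE[code])
            i += 2
        else:                                # 3-byte area is omitted from this sketch
            raise NotImplementedError("3-byte codes are not handled here")
    return "".join(out)
```

Because the first byte fixes the code length, the decoder never needs a separator between codes.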

図８ａは、本実施例２に係る符号化装置の構成を示す機能ブロック図である。図８ａに示すように、この符号化装置２００は、入力部２０１、出力部２０２、レジスタ２０５ａ，２０５ｂ、記憶部２０６、コード変換部２５０を有する。 FIG. 8A is a functional block diagram illustrating the configuration of the encoding device according to the second embodiment. As shown in FIG. 8a, the encoding device 200 includes an input unit 201, an output unit 202, registers 205a and 205b, a storage unit 206, and a code conversion unit 250.

入力部２０１は、コード変換を行うテキストデータを受け付ける処理部である。入力部２０１は、受け付けたテキストデータを、レジスタ２０５ａに格納する。 The input unit 201 is a processing unit that receives text data for code conversion. The input unit 201 stores the received text data in the register 205a.

出力部２０２は、レジスタ２０５ｂに格納されるコード変換後のテキストデータを出力する処理部である。 The output unit 202 is a processing unit that outputs the text data after code conversion stored in the register 205b.

レジスタ２０５ａは、コード変換を行う前のテキストデータを格納するものである。レジスタ２０５ｂは、コード変換後のテキストデータを格納するものである。 The register 205a stores text data before code conversion. The register 205b stores text data after code conversion.

記憶部２０６は、コード割当表２１０と、２バイトコード割当表２１５ａと、３バイトコード割当表２１５ｂとを有する。記憶部２０６は、例えば、ＲＡＭ、ＲＯＭ、フラッシュメモリなどの半導体メモリ素子などの記憶装置に対応する。 The storage unit 206 includes a code allocation table 210, a 2-byte code allocation table 215a, and a 3-byte code allocation table 215b. The storage unit 206 corresponds to a storage device such as a semiconductor memory element such as a RAM, a ROM, or a flash memory.

図９は、本実施例２に係るコード割当表の一例を示す図である。コード割当表２１０は、単語等と、所定のコードとを対応付けたテーブルであり、図７ａで説明したコード割当表２１０に対応する。図９に示すように、このコード割当表２１０は、単語１バイト領域２１０Ａと、英数字１バイト領域２１０Ｂと、２バイト領域２１０Ｃと、３バイト領域２１０Ｄとを有する。 FIG. 9 is a diagram illustrating an example of a code assignment table according to the second embodiment. The code assignment table 210 is a table in which words and the like are associated with predetermined codes, and corresponds to the code assignment table 210 described with reference to FIG. 7a. As shown in FIG. 9, the code assignment table 210 has a word 1-byte area 210A, an alphanumeric 1-byte area 210B, a 2-byte area 210C, and a 3-byte area 210D.

単語１バイト領域２１０Ａは、コード割当表２１０の００ｈ〜１Ｆｈの領域である。この単語１バイト領域２１０Ａには、青空文庫、オックスフォード英語辞典、その他の一般的な書籍を基にして、出現頻度の高い上位３２個の単語が設定される。 The word 1-byte area 210A is the area from 00h to 1Fh in the code assignment table 210. In the word 1-byte area 210A, the 32 words with the highest appearance frequency are set, based on Aozora Bunko, the Oxford English Dictionary, and other general books.

単語１バイト領域２１０Ａに設定された単語は、単語１バイト領域２１０Ａの設定位置に応じた１バイトのコードが割り当てられる。例えば、単語「ｔｈｅ△」は、１バイトのコード「００ｈ」が割り当てられる。単語１バイト領域２１０Ａに設定された残りの単語も同様に、１バイトのコードが割り当てられる。 A 1-byte code corresponding to the set position of the word 1-byte area 210A is assigned to the word set in the word 1-byte area 210A. For example, the word “theΔ” is assigned a 1-byte code “00h”. Similarly, a 1-byte code is assigned to the remaining words set in the word 1-byte area 210A.

英数字１バイト領域２１０Ｂは、コード割当表２１０の２０ｈ〜７Ｆｈの領域である。この英数字１バイト領域２１０Ｂには、コード割当表５０の２０ｈ〜７Ｆｈに設定される英数字と同様の英数字が設定される。 The alphanumeric 1-byte area 210B is an area of 20h to 7Fh in the code assignment table 210. In this alphanumeric 1-byte area 210B, alphanumeric characters similar to the alphanumeric characters set in 20h to 7Fh of the code assignment table 50 are set.

英数字１バイト領域２１０Ｂに設定された英数字は、英数字１バイト領域２１０Ｂの設定位置に応じた１バイトのコードが割り当てられる。例えば、数値「０」は、１バイトのコード「３０ｈ」が割り当てられる。英数字１バイト領域２１０Ｂに設定された残りの英数字も同様に、１バイトのコードが割り当てられる。 An alphanumeric character set in the alphanumeric 1-byte area 210B is assigned a 1-byte code corresponding to the set position of the alphanumeric 1-byte area 210B. For example, the numerical value “0” is assigned a 1-byte code “30h”. Similarly, a 1-byte code is assigned to the remaining alphanumeric characters set in the alphanumeric 1-byte area 210B.

２バイト領域２１０Ｃは、コード割当表２１０の８０ｈ〜９Ｆｈの領域である。この２バイト領域２１０Ｃには、青空文庫、オックスフォード英語辞書、その他の一般的な書籍を基にして、出現頻度が所定値以上となる単語が設定される。以下の説明では、適宜、出現頻度が所定値以上となる単語を高頻度単語と表記する。また、２バイト領域２１０Ｃには、制御記号等が含まれていても良い。 The 2-byte area 210C is the area from 80h to 9Fh in the code assignment table 210. In the 2-byte area 210C, words whose appearance frequency is equal to or higher than a predetermined value are set, based on Aozora Bunko, the Oxford English Dictionary, and other general books. In the following description, words whose appearance frequency is equal to or higher than the predetermined value are referred to as high-frequency words as appropriate. The 2-byte area 210C may also include control symbols and the like.

ここで、２バイト領域２１０Ｃには、係る２バイト領域２１０Ｃに設定された高頻度単語等に割り当てる２バイトのコードのうち、前半の１バイトのコードのみが定義されている。２バイト領域２１０Ｃに設定された単語等に割り当てる２バイトのコードは、後述する２バイトコード割当表２１５ａに定義されている。 Here, in the 2-byte area 210C, only the 1-byte code of the first half is defined among 2-byte codes assigned to the high-frequency word or the like set in the 2-byte area 210C. A 2-byte code assigned to a word or the like set in the 2-byte area 210C is defined in a 2-byte code assignment table 215a described later.

例えば、２バイト領域２１０Ｃの高頻度単語に割り当てる２バイトのコードのうち、前半の１バイトのコードは「８０ｈ〜９Ｆｈ」となる。そして、前半の１バイトのコードと、残りの１バイトのコードは、２バイトコード割当表２１５ａに定義されている。 For example, among the 2-byte codes assigned to the high-frequency words in the 2-byte area 210C, the 1-byte code in the first half is “80h-9Fh”. The 1-byte code in the first half and the remaining 1-byte code are defined in the 2-byte code allocation table 215a.

３バイト領域２１０Ｄは、コード割当表２１０のＡ０ｈ〜ＦＦｈの領域である。この３バイト領域２１０Ｄには、青空文庫、オックスフォード英語辞書、その他の一般的な書籍を基にして、出現頻度が所定値未満となる低頻度の単語が設定される。例えば、３バイト領域２１０Ｄには、ＣＪＫ文字、英単語、日本単語、数値、タグ、動的コード等が含まれる。動的コードは、例えば、人物名や住所、連結単語等に対応するものである。 The 3-byte area 210D is the area from A0h to FFh in the code assignment table 210. In the 3-byte area 210D, low-frequency words whose appearance frequency is less than a predetermined value are set, based on Aozora Bunko, the Oxford English Dictionary, and other general books. For example, the 3-byte area 210D includes CJK characters, English words, Japanese words, numerical values, tags, dynamic codes, and the like. A dynamic code corresponds to, for example, a person's name, an address, a concatenated word, or the like.

ここで、３バイト領域２１０Ｄには、係る３バイト領域２１０Ｄに設定された単語等に割り当てる３バイトのコードのうち、前半の１バイトのコードのみが定義されている。３バイト領域２１０Ｄに設定された単語等に割り当てる３バイトのコードは、後述する３バイトコード割当表２１５ｂに定義されている。 Here, in the 3-byte area 210D, only the 1-byte code of the first half is defined among 3-byte codes assigned to the word set in the 3-byte area 210D. A 3-byte code assigned to a word or the like set in the 3-byte area 210D is defined in a 3-byte code assignment table 215b described later.

図１０は、本実施例２に係る２バイトコード割当表の一例を示す図である。図１０に示すように、２バイトコード割当表２１５ａは、高頻度単語と、２バイトのコードとを対応付ける。 FIG. 10 is a diagram illustrating an example of a 2-byte code allocation table according to the second embodiment. As shown in FIG. 10, the 2-byte code assignment table 215a associates high-frequency words with 2-byte codes.

例えば、２バイトコード割当表２１５ａにおいて、「８０００ｈ〜９ＦＦＦｈ」には、高頻度単語が設定され、設定位置に応じた２バイトのコードが割り当てられる。例えば、設定位置「８０００ｈ」に設定された高頻度単語には、２バイトのコード「８０００ｈ」が割り当てられる。 For example, in the 2-byte code assignment table 215a, a high-frequency word is set in “8000h to 9FFFh”, and a 2-byte code corresponding to the set position is assigned. For example, the 2-byte code “8000h” is assigned to the high-frequency word set at the setting position “8000h”.

図１１は、本実施例２に係る３バイトコード割当表の一例を示す図である。図１１に示すように、３バイトコード割当表２１５ｂは、ＣＪＫ文字、英単語、日本単語、数値、タグ、動的コードと、３バイトのコードとを対応付ける。 FIG. 11 is a diagram illustrating an example of a 3-byte code allocation table according to the second embodiment. As shown in FIG. 11, the 3-byte code assignment table 215b associates CJK characters, English words, Japanese words, numerical values, tags, and dynamic codes with 3-byte codes.
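
The first-byte ranges fix an upper bound on how many entries each area can hold. The short calculation below illustrates this; the helper `capacity` is a hypothetical name, and the 2-byte and 3-byte figures assume that every value of the trailing bytes is usable, which the description states for “8000h to 9FFFh” but does not state explicitly for the 3-byte table.

```python
def capacity(first_lo, first_hi, length):
    """Upper bound on distinct codes: first-byte choices times 256 per trailing byte."""
    return (first_hi - first_lo + 1) * 256 ** (length - 1)


word_1byte = capacity(0x00, 0x1F, 1)    # word 1-byte area 210A
alnum_1byte = capacity(0x20, 0x7F, 1)   # alphanumeric 1-byte area 210B
two_byte = capacity(0x80, 0x9F, 2)      # 2-byte area 210C, 8000h to 9FFFh
three_byte = capacity(0xA0, 0xFF, 3)    # 3-byte area 210D (assumed A00000h to FFFFFFh)

print(word_1byte, alnum_1byte, two_byte, three_byte)
```

The first figure matches the 32 most frequent words stated for the word 1-byte area; the 2-byte area can hold up to 8,192 high-frequency words, and the 3-byte area up to 6,291,456 entries under the stated assumption.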

図８ａの説明に戻る。コード変換部２５０は、コード割当表２１０、２バイトコード割当表２１５ａ、３バイトコード割当表２１５ｂを基にして、レジスタ２０５ａに格納されたテキストデータをコード化する処理部である。コード変換部２５０は、コード化したテキストデータを、レジスタ２０５ｂに格納する。 Returning to the description of FIG. 8a. The code conversion unit 250 is a processing unit that encodes the text data stored in the register 205a based on the code allocation table 210, the 2-byte code allocation table 215a, and the 3-byte code allocation table 215b. The code conversion unit 250 stores the encoded text data in the register 205b.

以下において、コード変換部２５０の処理の一例について説明する。コード変換部２５０は、テキストデータから、スペース「△」で区切られる単語を取得する。コード変換部２５０は、取得した単語が、単語１バイト領域２１０Ａに設定された単語か、英数字１バイト領域２１０Ｂに設定された英数字に対応するものか、２バイト領域２１０Ｃに設定された単語か、３バイト領域２１０Ｄに設定された単語かを判定する。 An example of the processing of the code conversion unit 250 is described below. The code conversion unit 250 acquires words delimited by the space “Δ” from the text data. The code conversion unit 250 determines whether the acquired word is a word set in the word 1-byte area 210A, corresponds to an alphanumeric character set in the alphanumeric 1-byte area 210B, is a word set in the 2-byte area 210C, or is a word set in the 3-byte area 210D.

コード変換部２５０の取得した単語が単語１バイト領域２１０Ａに設定された単語である場合について説明する。コード変換部２５０は、取得した単語と、単語１バイト領域２１０Ａの各単語とを比較して、該当する設定位置の１バイトのコードを特定し、コード化する。例えば、コード変換部２５０は、取得した単語が「ｔｈｅ△」である場合には、かかる単語「ｔｈｅ△」を「００ｈ」にコード化する。 A case where the word acquired by the code conversion unit 250 is a word set in the word 1-byte area 210A will be described. The code conversion unit 250 compares the acquired word with each word in the word 1-byte area 210A to identify and code a 1-byte code at the corresponding setting position. For example, if the acquired word is “theΔ”, the code conversion unit 250 encodes the word “theΔ” into “00h”.

コード変換部２５０の取得した情報が英数字１バイト領域２１０Ｂに設定された英数字である場合について説明する。コード変換部２５０は、取得した英数字と、英数字１バイト領域２１０Ｂの各英数字とを比較して、該当する設定位置の１バイトのコードを特定し、コード化する。例えば、コード変換部２５０は、取得した英数字が「Ａ」である場合には、係る英数字「Ａ」を「４１ｈ」にコード化する。 A case where the information acquired by the code conversion unit 250 is an alphanumeric character set in the alphanumeric 1-byte area 210B will be described. The code conversion unit 250 compares the acquired alphanumeric character with each alphanumeric character in the alphanumeric 1-byte area 210B, identifies the 1-byte code at the corresponding setting position, and encodes the character. For example, if the acquired alphanumeric character is “A”, the code conversion unit 250 encodes the alphanumeric character “A” into “41h”.

コード変換部２５０の取得した単語が２バイト領域２１０Ｃに設定された単語である場合について説明する。コード変換部２５０は、取得した単語と、２バイトコード割当表２１５ａとを比較して、該当する設定位置の２バイトのコードを特定し、コード化する。例えば、コード変換部２５０は、取得した単語が、２バイトコード割当表２１５ａの「８０００ｈ」に設定されたある高頻度単語である場合には、かかる高頻度単語を２バイトのコード「８０００ｈ」にコード化する。 A case where the word acquired by the code conversion unit 250 is a word set in the 2-byte area 210C will be described. The code conversion unit 250 compares the acquired word with the 2-byte code assignment table 215a, identifies the 2-byte code at the corresponding setting position, and encodes the word. For example, if the acquired word is a high-frequency word set at “8000h” in the 2-byte code assignment table 215a, the code conversion unit 250 encodes that high-frequency word into the 2-byte code “8000h”.

コード変換部２５０の取得した単語が３バイト領域２１０Ｄに設定された単語である場合について説明する。コード変換部２５０は、取得した単語と、３バイトコード割当表２１５ｂとを比較して、該当する設定位置の３バイトのコードを特定し、コード化する。例えば、コード変換部２５０は、取得した単語が、３バイトコード割当表２１５ｂの「Ｂ０００００ｈ」に設定されたある英単語である場合には、かかる英単語を３バイトのコード「Ｂ０００００ｈ」にコード化する。 A case where the word acquired by the code conversion unit 250 is a word set in the 3-byte area 210D will be described. The code conversion unit 250 compares the acquired word with the 3-byte code assignment table 215b, identifies the 3-byte code at the corresponding setting position, and encodes the word. For example, if the acquired word is an English word set at “B00000h” in the 3-byte code assignment table 215b, the code conversion unit 250 encodes that English word into the 3-byte code “B00000h”.

なお、コード変換部２５０は、取得した情報が、３バイト領域２１０Ｄに設定された日本単語、ＣＪＫ文字、数値、タグ、動的コードである場合も、３バイトコード割当表２１５ｂと比較して、コード化する。 Note that when the acquired information is a Japanese word, CJK character, numerical value, tag, or dynamic code set in the 3-byte area 210D, the code conversion unit 250 likewise compares it with the 3-byte code assignment table 215b and encodes it.

図８ｂは、本実施例２に係る復号化装置の構成を示す機能ブロック図である。図８ｂに示すように、この復号化装置６００は、入力部６０１、出力部６０２、レジスタ６０５ａ，６０５ｂ、記憶部６０６、コード変換部６５０を有する。 FIG. 8B is a functional block diagram illustrating the configuration of the decoding apparatus according to the second embodiment. As illustrated in FIG. 8b, the decoding device 600 includes an input unit 601, an output unit 602, registers 605a and 605b, a storage unit 606, and a code conversion unit 650.

入力部６０１は、コード変換されたテキストデータを受け付ける処理部である。入力部６０１は、受け付けたテキストデータを、レジスタ６０５ａに格納する。 The input unit 601 is a processing unit that receives code-converted text data. The input unit 601 stores the received text data in the register 605a.

出力部６０２は、レジスタ６０５ｂに格納されたテキストデータを出力する処理部である。 The output unit 602 is a processing unit that outputs text data stored in the register 605b.

レジスタ６０５ａは、コード変換されたテキストデータを格納するものである。レジスタ６０５ｂは、文字コード変換後のテキストデータを格納するものである。 The register 605a stores code-converted text data. The register 605b stores text data after character code conversion.

記憶部６０６は、コード割当表２１０と、２バイトコード割当表２１５ａと、３バイトコード割当表２１５ｂとを有する。記憶部６０６は、例えば、ＲＡＭ、ＲＯＭ、フラッシュメモリなどの半導体メモリ素子などの記憶装置に対応する。 The storage unit 606 includes a code allocation table 210, a 2-byte code allocation table 215a, and a 3-byte code allocation table 215b. The storage unit 606 corresponds to a storage device such as a semiconductor memory element such as a RAM, a ROM, or a flash memory.

コード割当表２１０に関する説明は、図９で説明したコード割当表２１０に関する説明と同様である。２バイトコード割当表２１５ａに関する説明は、図１０で説明した２バイトコード割当表２１５ａに関する説明と同様である。３バイトコード割当表２１５ｂに関する説明は、図１１で説明した３バイトコード割当表２１５ｂに関する説明と同様である。 The description of the code assignment table 210 is the same as that given with reference to FIG. 9. The description of the 2-byte code assignment table 215a is the same as that given with reference to FIG. 10. The description of the 3-byte code assignment table 215b is the same as that given with reference to FIG. 11.

以下において、コード変換部６５０の処理の一例について説明する。例えば、コード変換部６５０は、テキストデータからコードを取得し、取得したコードが単語１バイト領域２１０Ａに設定された単語に対応するものか、英数字１バイト領域２１０Ｂに設定された英数字に対応するものかを判定する。また、コード変換部６５０は、取得したコードが、２バイト領域２１０Ｃに設定された単語に対応するものか、３バイト領域２１０Ｄに設定された単語に対応するものかを判定する。 Hereinafter, an example of processing of the code conversion unit 650 will be described. For example, the code conversion unit 650 acquires a code from the text data and determines whether the acquired code corresponds to a word set in the word 1-byte area 210A or to an alphanumeric character set in the alphanumeric 1-byte area 210B. The code conversion unit 650 also determines whether the acquired code corresponds to a word set in the 2-byte area 210C or to a word set in the 3-byte area 210D.

コード変換部６５０の取得したコードが単語１バイト領域２１０Ａに設定された単語に対応するコードである場合について説明する。単語１バイト領域２１０Ａに設定された単語に対応するコードの１バイト目は「００ｈ〜１Ｆｈ」に含まれる。コード変換部６５０は、コードに対応する単語を、単語１バイト領域２１０Ａに設定された単語から選択し、選択した単語によって文字コード化する。例えば、コード変換部６５０は、取得したコードが「００ｈ」である場合には、「００ｈ」を「ｔｈｅ△」に文字コード化する。 A case where the code acquired by the code conversion unit 650 is a code corresponding to the word set in the word 1-byte area 210A will be described. The first byte of the code corresponding to the word set in the word 1-byte area 210A is included in “00h to 1Fh”. The code conversion unit 650 selects a word corresponding to the code from words set in the word 1-byte area 210A, and converts the word into a character code using the selected word. For example, when the acquired code is “00h”, the code conversion unit 650 converts “00h” into “theΔ” as a character code.

コード変換部６５０が取得したコードが英数字１バイト領域２１０Ｂに設定された英数字に対応するコードである場合について説明する。英数字１バイト領域２１０Ｂに設定された英数字に対応するコードの１バイト目は「２０ｈ〜７Ｆｈ」に含まれる。コード変換部６５０は、コードに対応する英数字を、英数字１バイト領域２１０Ｂに設定された英数字から選択し、選択した英数字によって文字コード化する。例えば、コード変換部６５０は、取得したコードが「４１ｈ」である場合には、「４１ｈ」を「Ａ」に文字コード化する。 A case where the code acquired by the code conversion unit 650 is a code corresponding to an alphanumeric character set in the alphanumeric 1-byte area 210B will be described. The first byte of a code corresponding to an alphanumeric character set in the alphanumeric 1-byte area 210B falls within “20h to 7Fh”. The code conversion unit 650 selects the alphanumeric character corresponding to the code from those set in the alphanumeric 1-byte area 210B and converts the code into the character code of the selected alphanumeric character. For example, when the acquired code is “41h”, the code conversion unit 650 converts “41h” into the character “A”.

コード変換部６５０の取得したコードが２バイト領域２１０Ｃに設定された単語に対応するコードである場合について説明する。２バイト領域２１０Ｃに設定された単語に対応するコードの１バイト目は「８０ｈ〜９Ｆｈ」に含まれる。コード変換部６５０は、取得したコードと、２バイトコード割当表２１５ａとを比較して、コードに対応する単語を特定し、文字コード化する。コード変換部６５０は、取得したコードが「８０００ｈ」である場合には、２バイトコード割当表２１５ａの「８０００ｈ」に対応する高頻度単語に文字コード化する。 A case where the code acquired by the code conversion unit 650 is a code corresponding to the word set in the 2-byte area 210C will be described. The first byte of the code corresponding to the word set in the 2-byte area 210C is included in “80h-9Fh”. The code conversion unit 650 compares the acquired code with the 2-byte code assignment table 215a, identifies a word corresponding to the code, and converts it into a character code. When the acquired code is “8000h”, the code converting unit 650 converts the character code into a high-frequency word corresponding to “8000h” in the 2-byte code assignment table 215a.

コード変換部６５０の取得したコードが３バイト領域２１０Ｄに設定された単語に対応するコードである場合について説明する。３バイト領域２１０Ｄに設定された単語に対応するコードの１バイト目は「Ａ０ｈ〜ＦＦｈ」に含まれる。コード変換部６５０は、取得したコードと、３バイトコード割当表２１５ｂとを比較して、コードに対応する単語を特定し、文字コード化する。コード変換部６５０は、取得したコードが「Ｂ０００００ｈ」である場合には、３バイトコード割当表２１５ｂの「Ｂ０００００ｈ」に対応する英単語に文字コード化する。 A case where the code acquired by the code conversion unit 650 is a code corresponding to the word set in the 3-byte area 210D will be described. The first byte of the code corresponding to the word set in the 3-byte area 210D is included in “A0h to FFh”. The code conversion unit 650 compares the acquired code with the 3-byte code assignment table 215b, identifies a word corresponding to the code, and converts it into a character code. When the acquired code is “B00000h”, the code conversion unit 650 converts the character code into an English word corresponding to “B00000h” in the 3-byte code assignment table 215b.
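
The lead-byte ranges described above are enough to tell how long each code is before any table is consulted. The following is a minimal sketch of that classification, assuming only the ranges stated in the text (00h to 1Fh and 20h to 7Fh for 1-byte codes, 80h to 9Fh for 2-byte codes, A0h to FFh for 3-byte codes). The function names `code_length` and `split_codes` are hypothetical and do not appear in the specification.

```python
def code_length(lead_byte):
    """Return the length in bytes of the code that starts with lead_byte."""
    if not 0x00 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must fit in one byte")
    if lead_byte <= 0x7F:   # 00h-1Fh: word area 210A, 20h-7Fh: alphanumeric area 210B
        return 1
    if lead_byte <= 0x9F:   # 80h-9Fh: 2-byte area 210C
        return 2
    return 3                # A0h-FFh: 3-byte area 210D


def split_codes(data):
    """Cut an encoded byte string into individual codes (hypothetical helper)."""
    codes = []
    i = 0
    while i < len(data):
        n = code_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated code at offset %d" % i)
        codes.append(data[i:i + n])
        i += n
    return codes
```

Because the length is decided by the first byte alone, the stream can be split without any delimiter between codes.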

図１２ａは、本実施例２に係る符号化装置の処理手順を示すフローチャートである。図１２ａに示すように、符号化装置２００の入力部２０１は、テキストデータをレジスタ２０５ａに格納する（ステップＳ２０１）。符号化装置２００のコード変換部２５０は、レジスタ２０５ａに格納されたテキストデータから単語を取得する（ステップＳ２０２）。ステップＳ２０２では、説明の便宜上、単語と表記するが、コード変換部２５０が取得するものは、単語の他に、英数字、ＣＪＫ文字、日本単語、英単語、数値、タグ、動的コードの場合もある。 FIG. 12a is a flowchart illustrating the processing procedure of the encoding device according to the second embodiment. As shown in FIG. 12a, the input unit 201 of the encoding device 200 stores the text data in the register 205a (step S201). The code conversion unit 250 of the encoding device 200 acquires a word from the text data stored in the register 205a (step S202). In step S202 the acquired item is called a word for convenience of explanation, but what the code conversion unit 250 acquires may also be an alphanumeric character, a CJK character, a Japanese word, an English word, a numerical value, a tag, or a dynamic code.

コード変換部２５０は、単語とコード割当表２１０とを比較する（ステップＳ２０３）。コード変換部２５０は、単語（情報）がコード割当表２１０の単語１バイト領域２１０Ａの単語または英数字１バイト領域２１０Ｂの英数字に対応する単語である場合には（ステップＳ２０４，Ｙｅｓ）、ステップＳ２０５に移行する。コード変換部２５０は、コード割当表２１０に基づいて、単語または英数字を１バイトのコードに変換し（ステップＳ２０５）、ステップＳ２０９に移行する。 The code conversion unit 250 compares the word with the code assignment table 210 (step S203). If the word (information) corresponds to a word in the word 1-byte area 210A or to an alphanumeric character in the alphanumeric 1-byte area 210B of the code assignment table 210 (step S204, Yes), the code conversion unit 250 proceeds to step S205. Based on the code assignment table 210, the code conversion unit 250 converts the word or alphanumeric character into a 1-byte code (step S205) and proceeds to step S209.

一方、コード変換部２５０は、単語（情報）がコード割当表２１０の単語１バイト領域２１０Ａの単語あるいは英数字１バイト領域２１０Ｂの英数字に対応する単語でない場合には（ステップＳ２０４，Ｎｏ）、ステップＳ２０６に移行する。コード変換部２５０は、単語がコード割当表２１０の２バイト領域２１０Ｃの単語に対応する単語である場合には（ステップＳ２０６，Ｙｅｓ）、ステップＳ２０７に移行する。コード変換部２５０は、２バイトコード割当表２１５ａに基づいて、単語を２バイトのコードに変換し（ステップＳ２０７）、ステップＳ２０９に移行する。 On the other hand, if the word (information) corresponds neither to a word in the word 1-byte area 210A nor to an alphanumeric character in the alphanumeric 1-byte area 210B of the code assignment table 210 (step S204, No), the code conversion unit 250 proceeds to step S206. If the word corresponds to a word in the 2-byte area 210C of the code assignment table 210 (step S206, Yes), the code conversion unit 250 proceeds to step S207. The code conversion unit 250 converts the word into a 2-byte code based on the 2-byte code assignment table 215a (step S207) and proceeds to step S209.

一方、コード変換部２５０は、単語がコード割当表２１０の２バイト領域２１０Ｃの単語に対応する単語でない場合には（ステップＳ２０６，Ｎｏ）、ステップＳ２０８に移行する。コード変換部２５０は、３バイトコード割当表２１５ｂに基づいて、単語を３バイトのコードに変換し（ステップＳ２０８）、ステップＳ２０９に移行する。 On the other hand, if the word is not a word corresponding to the word in the 2-byte area 210C of the code assignment table 210 (No at Step S206), the code conversion unit 250 proceeds to Step S208. The code conversion unit 250 converts the word into a 3-byte code based on the 3-byte code assignment table 215b (step S208), and proceeds to step S209.

コード変換部２５０は、テキストデータのコード化が終了したか否かを判定する（ステップＳ２０９）。コード変換部２５０は、テキストデータのコード化が終了していない場合には（ステップＳ２０９，Ｎｏ）、ステップＳ２０２に移行する。 The code conversion unit 250 determines whether or not the text data has been encoded (step S209). If the encoding of the text data has not been completed (No at Step S209), the code conversion unit 250 proceeds to Step S202.

一方、コード変換部２５０は、テキストデータのコード化が終了した場合には（ステップＳ２０９，Ｙｅｓ）、コード化したテキストデータを、レジスタ２０５ｂに格納する（ステップＳ２１０）。 On the other hand, when the encoding of the text data is completed (Yes at Step S209), the code conversion unit 250 stores the encoded text data in the register 205b (Step S210).
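
The flow of steps S202 to S210 can be sketched as a simple loop. This is an illustration under stated assumptions, not the implementation of the code conversion unit 250: the three dictionaries are toy stand-ins for the code assignment table 210 and the tables 215a and 215b, and only the entries “theΔ” → 00h and “A” → 41h come from the text. The words placed at 8000h and B00000h are hypothetical, the input is assumed to be already split into tokens, and an ordinary space is used where the specification writes “Δ”.

```python
# Toy stand-ins for tables 210, 215a and 215b (contents partly hypothetical).
ONE_BYTE = {"the ": 0x00, "A": 0x41}   # from the text: "theΔ" -> 00h, "A" -> 41h
TWO_BYTE = {"house ": 0x8000}          # hypothetical word at 8000h
THREE_BYTE = {"zebra ": 0xB00000}      # hypothetical word at B00000h


def encode_tokens(tokens):
    """Encode pre-split tokens following steps S202 to S210."""
    out = bytearray()
    for token in tokens:                                   # S202: next token
        if token in ONE_BYTE:                              # S204: 1-byte areas?
            out += ONE_BYTE[token].to_bytes(1, "big")      # S205
        elif token in TWO_BYTE:                            # S206: 2-byte area?
            out += TWO_BYTE[token].to_bytes(2, "big")      # S207
        else:
            # S208: everything else goes through the 3-byte table;
            # an unknown token raises KeyError in this sketch.
            out += THREE_BYTE[token].to_bytes(3, "big")
    return bytes(out)                                      # S210: store result
```

For example, `encode_tokens(["the ", "A", "house ", "zebra "])` yields the seven bytes `00 41 80 00 B0 00 00`.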

図１２ｂは、本実施例２に係る復号化装置の処理手順を示すフローチャートである。図１２ｂに示すように、復号化装置６００の入力部６０１は、テキストデータをレジスタ６０５ａに格納する（ステップＳ６０１）。復号化装置６００のコード変換部６５０は、レジスタ６０５ａに格納されたテキストデータからコードを取得する（ステップＳ６０２）。 FIG. 12b is a flowchart illustrating the processing procedure of the decoding device according to the second embodiment. As shown in FIG. 12b, the input unit 601 of the decoding device 600 stores the text data in the register 605a (step S601). The code conversion unit 650 of the decoding device 600 acquires a code from the text data stored in the register 605a (step S602).

コード変換部６５０は、コードとコード割当表２１０とを比較する（ステップＳ６０３）。コード変換部６５０は、コードがコード割当表２１０の単語１バイト領域２１０Ａの単語または英数字１バイト領域２１０Ｂの英数字に対応するコードである場合には（ステップＳ６０４，Ｙｅｓ）、ステップＳ６０５に移行する。コード変換部６５０は、コード割当表２１０に基づいて、１バイトのコードを単語または英数字に変換し（ステップＳ６０５）、ステップＳ６０９に移行する。 The code conversion unit 650 compares the code with the code assignment table 210 (step S603). If the code corresponds to a word in the word 1-byte area 210A or to an alphanumeric character in the alphanumeric 1-byte area 210B of the code assignment table 210 (step S604, Yes), the code conversion unit 650 proceeds to step S605. Based on the code assignment table 210, the code conversion unit 650 converts the 1-byte code into a word or an alphanumeric character (step S605) and proceeds to step S609.

一方、コード変換部６５０は、コードがコード割当表２１０の単語１バイト領域２１０Ａの単語あるいは英数字１バイト領域２１０Ｂの英数字に対応するコードでない場合には（ステップＳ６０４，Ｎｏ）、ステップＳ６０６に移行する。コード変換部６５０は、コードがコード割当表２１０の２バイト領域２１０Ｃの単語に対応するコードである場合には（ステップＳ６０６，Ｙｅｓ）、ステップＳ６０７に移行する。コード変換部６５０は、２バイトコード割当表２１５ａに基づいて、２バイトのコードを単語に変換し（ステップＳ６０７）、ステップＳ６０９に移行する。 On the other hand, if the code corresponds neither to a word in the word 1-byte area 210A nor to an alphanumeric character in the alphanumeric 1-byte area 210B of the code assignment table 210 (step S604, No), the code conversion unit 650 proceeds to step S606. If the code corresponds to a word in the 2-byte area 210C of the code assignment table 210 (step S606, Yes), the code conversion unit 650 proceeds to step S607. Based on the 2-byte code assignment table 215a, the code conversion unit 650 converts the 2-byte code into a word (step S607) and proceeds to step S609.

一方、コード変換部６５０は、コードがコード割当表２１０の２バイト領域２１０Ｃの単語に対応するコードでない場合には（ステップＳ６０６，Ｎｏ）、ステップＳ６０８に移行する。コード変換部６５０は、３バイトコード割当表２１５ｂに基づいて、３バイトのコードを単語に変換し（ステップＳ６０８）、ステップＳ６０９に移行する。 On the other hand, if the code is not a code corresponding to the word in the 2-byte area 210C of the code assignment table 210 (No at Step S606), the code conversion unit 650 proceeds to Step S608. Based on the 3-byte code assignment table 215b, the code conversion unit 650 converts the 3-byte code into a word (step S608), and proceeds to step S609.

コード変換部６５０は、テキストデータの復号化が終了したか否かを判定する（ステップＳ６０９）。コード変換部６５０は、テキストデータの復号化が終了していない場合には（ステップＳ６０９，Ｎｏ）、ステップＳ６０２に移行する。 The code conversion unit 650 determines whether or not decoding of the text data has finished (step S609). If decoding of the text data has not finished (step S609, No), the code conversion unit 650 returns to step S602.

一方、コード変換部６５０は、テキストデータの復号化が終了した場合には（ステップＳ６０９，Ｙｅｓ）、復号化したテキストデータを、レジスタ６０５ｂに格納する（ステップＳ６１０）。 On the other hand, when decoding of the text data has finished (step S609, Yes), the code conversion unit 650 stores the decoded text data in the register 605b (step S610).
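
The decoding flow of steps S602 to S610 mirrors the encoding flow and can be sketched as follows. The tables are toy, partly hypothetical stand-ins for the code assignment table 210 and the tables 215a and 215b. Only 00h → “theΔ” and 41h → “A” are taken from the text, and an ordinary space stands in for “Δ”. The branch conditions use the lead-byte ranges given for the areas 210A to 210D.

```python
# Toy inverse tables (contents partly hypothetical).
ONE_BYTE = {0x00: "the ", 0x41: "A"}   # from the text
TWO_BYTE = {0x8000: "house "}          # hypothetical word at 8000h
THREE_BYTE = {0xB00000: "zebra "}      # hypothetical word at B00000h


def decode(data):
    """Decode a byte string following steps S602 to S610."""
    out = []
    i = 0
    while i < len(data):                 # S609: repeat until all codes are consumed
        lead = data[i]                   # S602: first byte of the next code
        if lead <= 0x7F:                 # S604 Yes: 1-byte areas 210A / 210B
            out.append(ONE_BYTE[lead])   # S605
            i += 1
        elif lead <= 0x9F:               # S606 Yes: 2-byte area 210C
            out.append(TWO_BYTE[int.from_bytes(data[i:i + 2], "big")])    # S607
            i += 2
        else:                            # S606 No: 3-byte area 210D
            out.append(THREE_BYTE[int.from_bytes(data[i:i + 3], "big")])  # S608
            i += 3
    return "".join(out)                  # S610: store the decoded text
```

With these toy tables, `decode(bytes.fromhex("00418000b00000"))` returns the concatenation of the four table entries in order.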

次に、本実施例２に係る符号化装置２００の効果について説明する。符号化装置２００は、コード割当表２１０の単語１バイト領域において、厳選した単語を割り当てた割当表を用いたコード変換を行う。なお、英数字１バイト領域には、従来のコード割当表５０の２０ｈ〜７Ｆｈに設定される英数字と同様の英数字を設定する。係る処理を実行することで、英数字に関しては、従来と同様に１バイトのコードに変換することを可能にしつつ、出現頻度が高い文字や単語に対しては、短いバイトコードを割り当てることができる。 Next, effects of the encoding device 200 according to the second embodiment will be described. The encoding device 200 performs code conversion using an assignment table in which carefully selected words are assigned to the word 1-byte area of the code assignment table 210. The alphanumeric 1-byte area holds the same alphanumeric characters as those set in 20h to 7Fh of the conventional code assignment table 50. As a result, alphanumeric characters can still be converted into 1-byte codes as before, while short byte codes can be assigned to characters and words with a high appearance frequency.

また、復号化装置６００は、上記のコード割当表２１０を用いて、コード化されたテキストデータを復号化するため、出現頻度が高い単語や一般記号に対して、短いバイトコードを割り当てた場合でも、係るバイトコードを単語や一般記号に変換することができる。 In addition, because the decoding device 600 decodes the encoded text data using the code assignment table 210 described above, even when short byte codes are assigned to words and general symbols with a high appearance frequency, those byte codes can be converted back into the corresponding words and general symbols.

図１３ａは、本実施例３に係る符号化装置の処理の一例を示す図である。本実施例３に係る符号化装置は、従来のコード割当表５０と、本実施例３特有のコード割当表３１０とを切り替えて利用する。例えば、符号化装置は、テキストデータから、制御記号「ＳＩ（Shift In）」を検出した場合には、制御記号「ＳＩ」以降のテキストデータを、コード割当表３１０を用いて、コード変換する。一方、符号化装置は、テキストデータから、制御記号「ＳＯ（Shift Out）」を検出した場合には、コード割当表５０を用いて、コード変換する。従来技術のコード割当表５０に関する説明は、実施例１で説明したものと同様である。 FIG. 13a is a diagram illustrating an example of processing performed by the encoding device according to the third embodiment. The encoding device according to the third embodiment switches between the conventional code assignment table 50 and the code assignment table 310 specific to the third embodiment. For example, when the encoding device detects the control symbol “SI (Shift In)” in the text data, it performs code conversion on the text data following the control symbol “SI” using the code assignment table 310. On the other hand, when the encoding device detects the control symbol “SO (Shift Out)” in the text data, it performs code conversion using the code assignment table 50. The description of the prior-art code assignment table 50 is the same as that given in the first embodiment.
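
The table switching described above can be sketched as a pass that cuts the input text into runs, each tagged with the table that should handle it. The values 0Eh for SO and 0Fh for SI are the standard ASCII control codes. The function name `split_by_table`, the labels `"table50"` and `"table310"`, and the choice of table 50 as the initial state are assumptions made for illustration, since the text does not state which table is active before the first control symbol.

```python
SO = "\x0e"  # Shift Out: switch to the conventional code assignment table 50
SI = "\x0f"  # Shift In: switch to the code assignment table 310


def split_by_table(text, initial="table50"):
    """Cut text into (table label, run) pairs at SI/SO control symbols."""
    segments = []
    table = initial          # assumption: table 50 is active at the start
    buf = []
    for ch in text:
        if ch in (SI, SO):
            if buf:          # close the run collected under the previous table
                segments.append((table, "".join(buf)))
                buf = []
            table = "table310" if ch == SI else "table50"
        else:
            buf.append(ch)
    if buf:
        segments.append((table, "".join(buf)))
    return segments
```

Each run can then be handed to the encoder for the table named in its label. The scan is done on the unencoded text: once text has been encoded with table 310, a trailing byte of a multi-byte code might coincide with 0Eh or 0Fh, so a decoder would need to check for control symbols only at code boundaries.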

コード割当表３１０について説明する。コード割当表３１０の００ｈ〜１Ｆｈには、制御記号が設定され、１バイトのコードが割り当てられる。コード割当表３１０の００ｈ〜１Ｆｈに設定される制御記号は、コード割当表５０の００ｈ〜１Ｆｈに設定される制御記号と同様である。 The code assignment table 310 will be described. Control symbols are set in 00h to 1Fh of the code assignment table 310, and 1-byte codes are assigned to them. The control symbols set in 00h to 1Fh of the code assignment table 310 are the same as the control symbols set in 00h to 1Fh of the code assignment table 50.

コード割当表３１０の２０ｈ〜３Ｆｈには、後述する所定の英単語が設定され、１バイトのコードが割り当てられる。コード割当表３１０の４０ｈ〜５Ｆｈには、高頻度の英単語が設定され、２バイトのコードが割り当てられる。 Predetermined English words, which will be described later, are set in 20h to 3Fh of the code assignment table 310, and a 1-byte code is assigned. Frequent English words are set in 40h to 5Fh in the code assignment table 310, and a 2-byte code is assigned.

コード割当表３１０の６０ｈ〜７Ｆｈには、後述する所定の日本単語が設定され、１バイトのコードが割り当てられる。コード割当表３１０の８０ｈ〜９Ｆｈには、高頻度の日本単語が設定される。 A predetermined Japanese word to be described later is set in 60h to 7Fh of the code assignment table 310, and a 1-byte code is assigned. Frequent Japanese words are set in 80h to 9Fh of the code assignment table 310.

コード割当表３１０のＡ０ｈ〜ＦＦｈには、低頻度の単語が設定され、２バイトまたは３バイトのコードが割り当てられる。 Infrequent words are set in A0h to FFh of the code assignment table 310, and 2-byte or 3-byte codes are assigned.

本実施例３について、以下の説明では、適宜、コード割当表３１０の００ｈ〜１Ｆｈの領域を「制御記号１バイト領域」と表記する。コード割当表３１０の２０ｈ〜３Ｆｈの領域を「英単語１バイト領域」と表記する。コード割当表３１０の４０ｈ〜５Ｆｈの領域を「英単語２バイト領域」と表記する。コード割当表３１０の６０ｈ〜７Ｆｈの領域を「日本単語１バイト領域」と表記する。コード割当表３１０の８０ｈ〜９Ｆｈの領域を「日本単語２バイト領域」と表記する。コード割当表３１０のＡ０ｈ〜ＦＦｈの領域を「２・３バイト領域」と表記する。 Regarding the third embodiment, in the following description, the area from 00h to 1Fh in the code assignment table 310 will be referred to as “control symbol 1-byte area” as appropriate. The area from 20h to 3Fh in the code assignment table 310 is expressed as “English word 1-byte area”. The area from 40h to 5Fh in the code assignment table 310 is denoted as “English word 2-byte area”. The area from 60h to 7Fh in the code assignment table 310 is expressed as “Japanese word 1-byte area”. The area from 80h to 9Fh in the code assignment table 310 is expressed as “Japanese word 2-byte area”. The area from A0h to FFh in the code assignment table 310 is denoted as “2.3 byte area”.
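
The six areas of the code assignment table 310 are contiguous ranges of the lead byte, so a lookup by range boundary is sufficient to name the area a code belongs to. The sketch below assumes only the boundaries listed above. The function name `region_of` is hypothetical, and the area names are copied from the translation used in this text.

```python
from bisect import bisect_right

# Lower bound of each area of table 310, in ascending order.
_STARTS = [0x00, 0x20, 0x40, 0x60, 0x80, 0xA0]
_NAMES = [
    "control symbol 1-byte area",   # 00h-1Fh
    "English word 1-byte area",     # 20h-3Fh
    "English word 2-byte area",     # 40h-5Fh
    "Japanese word 1-byte area",    # 60h-7Fh
    "Japanese word 2-byte area",    # 80h-9Fh
    "2.3 byte area",                # A0h-FFh (2-byte or 3-byte codes)
]


def region_of(lead_byte):
    """Return the name of the table 310 area that contains lead_byte."""
    if not 0x00 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must fit in one byte")
    return _NAMES[bisect_right(_STARTS, lead_byte) - 1]
```

For instance, `region_of(0x27)` names the English word 1-byte area and `region_of(0x43)` the English word 2-byte area.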

コード変換部３５０は、制御記号「ＳＩ」あるいは「ＳＯ」の検出により、コード割当表５０，３１０を切り替え、切り替えたコード割当表に基づいて、テキストデータ３０ａを、テキストデータ３０ｂに変換する。ここでは、テキストデータ３０ａを「・・・Ｉｓ△ｈｅ△ｉｎ△ｔｈｅ△ｈｏｕｓｅ？」とする。 The code conversion unit 350 switches the code assignment tables 50 and 310 by detecting the control symbol “SI” or “SO”, and converts the text data 30a to the text data 30b based on the switched code assignment table. Here, the text data 30a is assumed to be “... IsΔheΔinΔtheΔhouse?”.

以下の説明では、前提として、コード変換部３５０は、制御記号「ＳＩ」を検出しており、コード割当表３１０を基にして、テキストデータ３０ａをコード変換する場合について説明する。なお、コード変換部３５０が、コード割当表５０を基にして、テキストデータ３０ａをコード変換する処理は、従来技術と同じであるため、説明を省略する。 In the following description, it is assumed that the code conversion unit 350 has detected the control symbol “SI” and performs code conversion on the text data 30a based on the code assignment table 310. The process in which the code conversion unit 350 performs code conversion on the text data 30a based on the code assignment table 50 is the same as in the prior art, so its description is omitted.

コード変換部３５０は、スペース「△」で区切られる単語と、コード割当表３１０とを比較して、単語をコードに変換する。テキストデータ３０ａに含まれる単語「Ｉｓ△」は、コード割当表３１０の英単語１バイト領域に設定された単語であり、コード変換部３５０は、単語「Ｉｓ△」を１バイトのコード「２５ｈ」と、「２Ｆｈ」とに変換する。ここで、１バイトのコード「２５ｈ」は、単語の先頭が大文字であることを示す１バイトのコードである。「２Ｆｈ」は、「ｉｓ△」に対応する１バイトのコードである。 The code conversion unit 350 compares the word delimited by the space “Δ” with the code assignment table 310 and converts the word into a code. The word “IsΔ” included in the text data 30a is a word set in the 1-byte English word area of the code assignment table 310, and the code conversion unit 350 converts the word “IsΔ” into a 1-byte code “25h”. And “2Fh”. Here, the 1-byte code “25h” is a 1-byte code indicating that the beginning of the word is capitalized. “2Fh” is a 1-byte code corresponding to “isΔ”.

テキストデータ３０ａに含まれる「ｈｅ△」は、コード割当表３１０の英単語１バイト領域に設定された単語であり、コード変換部３５０は、単語「ｈｅ△」を１バイトのコード「３９ｈ」に変換する。 “heΔ” included in the text data 30a is a word set in the English word 1-byte area of the code assignment table 310, and the code conversion unit 350 converts the word “heΔ” into the 1-byte code “39h”.

テキストデータ３０ａに含まれる「ｉｎ△」は、コード割当表３１０の英単語１バイト領域に設定された単語であり、コード変換部３５０は、単語「ｉｎ△」を１バイトのコード「２Ｅｈ」に変換する。 “inΔ” included in the text data 30a is a word set in the English word 1-byte area of the code assignment table 310, and the code conversion unit 350 converts the word “inΔ” into the 1-byte code “2Eh”.

テキストデータ３０ａに含まれる「ｔｈｅ△」は、コード割当表３１０の英単語１バイト領域に設定された単語であり、コード変換部３５０は、単語「ｔｈｅ△」を１バイトのコード「２７ｈ」に変換する。 “theΔ” included in the text data 30a is a word set in the English word 1-byte area of the code assignment table 310, and the code conversion unit 350 converts the word “theΔ” into the 1-byte code “27h”.

テキストデータ３０ａに含まれる単語「ｈｏｕｓｅ」は、「ｈｏｕｓｅ△」と「−△」に分割される。「ｈｏｕｓｅ△」は、コード割当表３１０の２バイト領域に設定された単語であり、コード変換部３５０は、例えば、単語「ｈｏｕｓｅ△」を２バイトのコード「４３４１ｈ」に、単語「−△」を１バイトのコード「２１ｈ」に変換する。 The word “house” included in the text data 30a is divided into “houseΔ” and the backspace “−Δ”. “houseΔ” is a word set in the 2-byte area of the code assignment table 310, and the code conversion unit 350 converts, for example, the word “houseΔ” into the 2-byte code “4341h” and the word “−Δ” into the 1-byte code “21h”.

テキストデータ３０ａに含まれる単語「？」は、コード割当表３１０の英単語２バイト領域に設定された記号であり、コード変換部３５０は、例えば、単語「？」を２バイトのコード「４０３Ｆｈ」に変換する。 The word “?” included in the text data 30a is a symbol set in the English word 2-byte area of the code assignment table 310, and the code conversion unit 350 converts, for example, the word “?” into the 2-byte code “403Fh”.

コード変換部３５０は、テキストデータ３０ａに含まれる各単語に対して、上記処理を実行することで、テキストデータ３０ａをテキストデータ３０ｂにコード化する。 The code conversion unit 350 encodes the text data 30a into the text data 30b by performing the above processing on each word included in the text data 30a.
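
The worked example above can be reproduced with a small sketch. The code values are the ones given in the text (25h for the capital-first marker, 21h for the backspace “−Δ”, and the codes for “isΔ”, “heΔ”, “inΔ”, “theΔ”, “houseΔ” and “?”). The tokenizing regular expression and the function name `encode_sentence` are assumptions, an ordinary space stands in for “Δ”, and the sketch does not handle all-capital words or words missing from its tiny table.

```python
import re

CAPITAL_FIRST = b"\x25"   # marker: the next word starts with a capital letter
BACKSPACE = b"\x21"       # "−Δ": cancels the space that is part of the word entry
WORDS = {"is ": b"\x2f", "he ": b"\x39", "in ": b"\x2e",
         "the ": b"\x27", "house ": b"\x43\x41"}
SYMBOLS = {"?": b"\x40\x3f"}


def encode_sentence(text):
    """Encode text word by word with the example entries of table 310."""
    out = bytearray()
    # Either a word with an optional following space, or a single symbol.
    for match in re.finditer(r"([A-Za-z]+)( ?)|([^A-Za-z ])", text):
        word, space, symbol = match.groups()
        if symbol is not None:
            out += SYMBOLS[symbol]
            continue
        if word[0].isupper():
            out += CAPITAL_FIRST
        out += WORDS[word.lower() + " "]   # table entries include the trailing space
        if not space:
            out += BACKSPACE               # no space follows in the input
    return bytes(out)
```

Applied to `"Is he in the house?"`, the function produces the byte sequence 25h 2Fh 39h 2Eh 27h 4341h 21h 403Fh, which matches the codes listed in the walk-through.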

図１３ｂは、本実施例３に係る復号化装置の処理の一例を示す図である。本実施例３に係る復号化装置は、従来のコード割当表５０と、本実施例３特有のコード割当表３１０とを切り替えて利用する。例えば、復号化装置は、テキストデータから、制御記号「ＳＩ」のコードを検出した場合には、制御記号「ＳＩ」以降のテキストデータを、コード割当表３１０を用いて、文字コード変換する。一方、復号化装置は、テキストデータから、制御記号「ＳＯ」のコードを検出した場合には、コード割当表５０を用いて、文字コード変換する。従来技術のコード割当表５０に関する説明は、実施例１で説明したものと同様である。また、コード割当表３１０に関する説明は、上記の説明と同様である。 FIG. 13B is a diagram illustrating an example of a process performed by the decoding apparatus according to the third embodiment. The decoding apparatus according to the third embodiment switches between the conventional code allocation table 50 and the code allocation table 310 unique to the third embodiment. For example, when the code of the control symbol “SI” is detected from the text data, the decoding apparatus converts the text data after the control symbol “SI” into a character code using the code assignment table 310. On the other hand, when the code of the control symbol “SO” is detected from the text data, the decoding apparatus converts the character code using the code assignment table 50. The description regarding the code allocation table 50 of the prior art is the same as that described in the first embodiment. The description regarding the code assignment table 310 is the same as the above description.

コード変換部７５０は、制御記号「ＳＩ」のコードあるいは「ＳＯ」のコードの検出により、コード割当表５０，３１０を切り替え、切り替えたコード割当表に基づいて、テキストデータ３０ｂを、テキストデータ３０ａに変換する。ここでは、テキストデータ３０ｂを「・・・２５ｈ２Ｆｈ３９ｈ２Ｅｈ２７ｈ４３４１ｈ２１ｈ４０３Ｆｈ・・・」とする。 The code conversion unit 750 switches between the code assignment tables 50 and 310 upon detecting the code of the control symbol “SI” or “SO”, and converts the text data 30b into the text data 30a based on the selected code assignment table. Here, the text data 30b is assumed to be “... 25h 2Fh 39h 2Eh 27h 4341h 21h 403Fh ...”.

以下の説明では、前提として、コード変換部７５０は、制御記号「ＳＩ」のコードを検出しており、コード割当表３１０を基にして、テキストデータ３０ｂを文字コード変換する場合について説明する。なお、コード変換部７５０が、コード割当表５０を基にして、テキストデータ３０ｂを文字コード変換する処理は、従来技術と同じであるため、説明を省略する。 In the following description, it is assumed that the code conversion unit 750 detects the code of the control symbol “SI” and converts the text data 30b to a character code based on the code assignment table 310. The process in which the code conversion unit 750 converts the text data 30b into the character code based on the code assignment table 50 is the same as that in the prior art, and thus the description thereof is omitted.

コード変換部７５０は、コードと、コード割当表３１０とを比較して、コードを単語に変換する。例えば、コード変換部７５０は、１バイトのコード「２５ｈ」と、「２Ｆｈ」を、単語「Ｉｓ△」に変換する。コード変換部７５０は、１バイトのコード「３９ｈ」を、単語「ｈｅ△」に変換する。コード変換部７５０は、１バイトのコード「２Ｅｈ」を、単語「ｉｎ△」に変換する。コード変換部７５０は、１バイトのコード「２７ｈ」を、単語「ｔｈｅ△」に変換する。コード変換部７５０は、２バイトのコード「４３４１ｈ」と１バイトのコード「２１ｈ」とを、単語「ｈｏｕｓｅ」に変換する。コード変換部７５０は、２バイトのコード「４０３Ｆｈ」を、記号「？」に変換する。 The code conversion unit 750 compares the code with the code assignment table 310 and converts the code into a word. For example, the code conversion unit 750 converts the 1-byte code “25h” and “2Fh” into the word “IsΔ”. The code conversion unit 750 converts the 1-byte code “39h” into the word “heΔ”. The code conversion unit 750 converts the 1-byte code “2Eh” into the word “inΔ”. The code conversion unit 750 converts the 1-byte code “27h” into the word “theΔ”. The code conversion unit 750 converts the 2-byte code “4341h” and the 1-byte code “21h” into the word “house”. The code conversion unit 750 converts the 2-byte code “403Fh” into the symbol “?”.

コード変換部７５０は、テキストデータ３０ｂに含まれる各コードに対して、上記処理を実行することで、テキストデータ３０ｂをテキストデータ３０ａに文字コード化する。 The code conversion unit 750 performs the above processing on each code included in the text data 30b, thereby converting the text data 30b into text data 30a.
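
The decoding walk-through can be sketched in the same way. The code values come from the text. The handling of 25h as "capitalize the next word" and of 21h as "drop the trailing space of the previous word" is an interpretation of the description rather than a confirmed implementation, and the function name `decode_sentence` is hypothetical. Lead bytes 40h to 5Fh are treated as the start of a 2-byte code, as stated for the English word 2-byte area.

```python
ONE_BYTE = {0x2F: "is ", 0x39: "he ", 0x2E: "in ", 0x27: "the "}
TWO_BYTE = {0x4341: "house ", 0x403F: "?"}


def decode_sentence(data):
    """Decode bytes produced with the example entries of table 310."""
    out = []
    capitalize = False
    i = 0
    while i < len(data):
        lead = data[i]
        if lead == 0x25:                     # capital-first marker
            capitalize = True
            i += 1
            continue
        if lead == 0x21:                     # backspace: remove the previous space
            if out and out[-1].endswith(" "):
                out[-1] = out[-1][:-1]
            i += 1
            continue
        if 0x40 <= lead <= 0x5F:             # English word 2-byte area
            word = TWO_BYTE[(lead << 8) | data[i + 1]]
            i += 2
        else:                                # English word 1-byte area
            word = ONE_BYTE[lead]
            i += 1
        if capitalize:
            word = word[0].upper() + word[1:]
            capitalize = False
        out.append(word)
    return "".join(out)
```

Feeding it the example bytes 25h 2Fh 39h 2Eh 27h 4341h 21h 403Fh gives back the sentence used in the encoding example.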

図１４ａは、本実施例３に係る符号化装置の構成を示す機能ブロック図である。図１４ａに示すように、この符号化装置３００は、入力部３０１、出力部３０２、レジスタ３０５ａ，３０５ｂ、記憶部３０６、コード変換部３５０を有する。 FIG. 14A is a functional block diagram illustrating the configuration of the encoding device according to the third embodiment. As illustrated in FIG. 14A, the encoding apparatus 300 includes an input unit 301, an output unit 302, registers 305a and 305b, a storage unit 306, and a code conversion unit 350.

入力部３０１は、コード変換を行うテキストデータを受け付ける処理部である。入力部３０１は、受け付けたテキストデータを、レジスタ３０５ａに格納する。 The input unit 301 is a processing unit that accepts text data for code conversion. The input unit 301 stores the received text data in the register 305a.

出力部３０２は、レジスタ３０５ｂに格納されるコード変換後のテキストデータを出力する処理部である。 The output unit 302 is a processing unit that outputs the text data after code conversion stored in the register 305b.

レジスタ３０５ａは、コード変換を行う前のテキストデータを格納するものである。レジスタ３０５ｂは、コード変換後のテキストデータを格納するものである。 The register 305a stores text data before code conversion. The register 305b stores text data after code conversion.

記憶部３０６は、コード割当表５０と、コード割当表３１０と、英単語２バイトコード割当表３１５ａと、日本単語２バイトコード割当表３１５ｂと、２・３バイトコード割当表３１６とを有する。記憶部３０６は、例えば、ＲＡＭ、ＲＯＭ、フラッシュメモリなどの半導体メモリ素子などの記憶装置に対応する。 The storage unit 306 includes a code allocation table 50, a code allocation table 310, an English word 2-byte code allocation table 315a, a Japanese word 2-byte code allocation table 315b, and a 2.3 byte code allocation table 316. The storage unit 306 corresponds to a storage device such as a semiconductor memory element such as a RAM, a ROM, or a flash memory.

コード割当表５０は、従来のコード割当表である。例えば、コード割当表５０の説明は、実施例１で説明したものと同様である。 The code allocation table 50 is a conventional code allocation table. For example, the description of the code assignment table 50 is the same as that described in the first embodiment.

図１５は、本実施例３に係るコード割当表の一例を示す図である。コード割当表３１０は、単語等と、所定のコードとを対応付けたテーブルであり、図１３ａで説明したコード割当表３１０に対応する。図１５に示すように、このコード割当表３１０は、制御記号１バイト領域３１０Ａと、英単語１バイト領域３１０Ｂと、英単語２バイト領域３１０Ｃと、日本単語１バイト領域３１０Ｄと、日本単語２バイト領域３１０Ｅと、２・３バイト領域３１０Ｆとを有する。 FIG. 15 is a diagram illustrating an example of a code assignment table according to the third embodiment. The code assignment table 310 is a table that associates words and the like with predetermined codes, and corresponds to the code assignment table 310 described with reference to FIG. 13a. As shown in FIG. 15, the code assignment table 310 includes a control symbol 1-byte area 310A, an English word 1-byte area 310B, an English word 2-byte area 310C, a Japanese word 1-byte area 310D, a Japanese word 2-byte area 310E, and a 2.3 byte area 310F.

制御記号１バイト領域３１０Ａは、コード割当表３１０の００ｈ〜１Ｆｈの領域である。制御記号１バイト領域３１０Ａに設定される制御記号は、コード割当表５０の００ｈ〜１Ｆｈに設定される制御記号と同様である。なお、制御記号には、「ＳＯ」と「ＳＩ」が含まれる。制御記号「ＳＯ」は、コード変換部３５０に、コード割当表５０を用いてコード変換を行うことを指示する制御記号である。制御記号「ＳＩ」は、コード変換部３５０に、コード割当表３１０を用いて、コード変換することを指示する制御記号である。 The control symbol 1-byte area 310A is an area from 00h to 1Fh in the code assignment table 310. The control symbols set in the control symbol 1-byte area 310A are the same as the control symbols set in 00h to 1Fh in the code assignment table 50. The control symbols include “SO” and “SI”. The control symbol “SO” is a control symbol that instructs the code conversion unit 350 to perform code conversion using the code assignment table 50. The control symbol “SI” is a control symbol that instructs the code conversion unit 350 to perform code conversion using the code assignment table 310.

英単語１バイト領域３１０Ｂは、コード割当表３１０の２０ｈ〜３Ｆｈの領域である。英単語１バイト領域３１０Ｂに設定された英単語には１バイトのコードが割り当てられる。この英単語１バイト領域３１０Ｂには、オックスフォード英語辞典、その他の一般的な書籍を基にして、出現頻度の高い上位２５個の英単語が設定される。例えば、単語「ｔｈｅ」には、１バイトのコード「２７ｈ」が割り当てられる。 The English word 1-byte area 310B is an area of 20h to 3Fh in the code assignment table 310. A 1-byte code is assigned to the English word set in the English word 1-byte area 310B. In the English word 1-byte area 310B, the top 25 English words having the highest appearance frequency are set based on the Oxford English dictionary and other general books. For example, the 1-byte code “27h” is assigned to the word “the”.

また、英単語１バイト領域３１０Ｂには、スペース「△」、バックスペース「−△」、コンマ「，」、アポストロフィ「’」、単語の先頭が大文字であることを示すコード、単語の全部が大文字であることを示すコードが設定される。例えば、スペース「△」には、１バイトのコード「２０ｈ」が割り当てられる。 In the English word 1-byte area 310B, a space “Δ”, a back space “−Δ”, a comma “,”, an apostrophe “′”, a code indicating that the beginning of the word is capitalized, and all the words are capitalized. A code indicating that is set. For example, a 1-byte code “20h” is assigned to the space “Δ”.

英単語２バイト領域３１０Ｃは、コード割当表３１０の４０ｈ〜５Ｆｈの領域である。この英単語２バイト領域３１０Ｃには、オックスフォード英語辞書、その他の一般的な書籍を基にして、出現頻度が所定値以上となる英単語が設定される。以下の説明では、適宜、出現頻度が所定値以上となる単語を高頻度英単語と表記する。 The English word 2-byte area 310C is an area of 40h to 5Fh in the code assignment table 310. In this English word 2-byte area 310C, English words whose appearance frequency is equal to or higher than a predetermined value are set based on the Oxford English dictionary and other general books. In the following description, a word whose appearance frequency is equal to or higher than a predetermined value is described as a high-frequency English word as appropriate.

ここで、英単語２バイト領域３１０Ｃには、係る英単語２バイト領域３１０Ｃに設定された高頻度英単語に割り当てる２バイトのコードのうち、前半の１バイトのコードのみが定義されている。英単語２バイト領域３１０Ｃに設定された英単語に割り当てる２バイトのコードは、後述する英単語２バイトコード割当表３１５ａに定義されている。 Here, in the English 2-byte area 310C, only the first half-byte code is defined among 2-byte codes assigned to the high-frequency English words set in the English 2-byte area 310C. The 2-byte code assigned to the English word set in the English word 2-byte area 310C is defined in an English word 2-byte code assignment table 315a described later.

日本単語１バイト領域３１０Ｄは、コード割当表３１０の６０ｈ〜７Ｆｈの領域である。この日本単語１バイト領域３１０Ｄは、青空文庫、その他の一般的な書籍を基にして、出現頻度の高い上位の日本語が設定される。例えば、日本単語「の」には、１バイトのコード「６５ｈ」が割り当てられる。 The Japanese word 1-byte area 310D is an area of 60h to 7Fh in the code assignment table 310. In this Japanese word 1-byte area 310D, higher-order Japanese having a high appearance frequency is set based on Aozora Bunko and other general books. For example, a 1-byte code “65h” is assigned to the Japanese word “NO”.

また、日本単語１バイト領域３１０Ｄは、読点「、」、句点「。」、カギ括弧が設定される。例えば、読点「、」には、１バイトのコード「６１ｈ」が割り当てられる。 The Japanese word 1-byte area 310D also holds the Japanese comma “、”, the Japanese full stop “。”, and corner brackets. For example, the 1-byte code “61h” is assigned to the comma “、”.

日本単語２バイト領域３１０Ｅは、コード割当表３１０の８０ｈ〜９Ｆｈの領域である。この日本単語２バイト領域３１０Ｅは、青空文庫、その他の一般的な書籍を基にして、出現頻度の高い上位の日本語が設定される。以下の説明では、適宜、出現頻度が所定値以上となる単語を高頻度日本単語と表記する。 The Japanese word 2-byte area 310E is the area from 80h to 9Fh in the code assignment table 310. Japanese words with a high appearance frequency, determined from Aozora Bunko and other general books, are set in this area. In the following description, a word whose appearance frequency is equal to or higher than a predetermined value is referred to as a high-frequency Japanese word as appropriate.

ここで、日本単語２バイト領域３１０Ｅには、係る日本単語２バイト領域３１０Ｅに設定された高頻度日本単語に割り当てる２バイトのコードのうち、前半の１バイトのコードのみが設定されている。日本単語２バイト領域３１０Ｅに設定された日本語に割り当てる２バイトのコードは、後述する日本単語２バイトコード割当表３１５ｂに定義されている。 Here, in the Japanese word 2-byte area 310E, only the 1-byte code in the first half of the 2-byte codes assigned to the high-frequency Japanese words set in the Japanese word 2-byte area 310E is set. The 2-byte code assigned to Japanese set in the Japanese word 2-byte area 310E is defined in a Japanese word 2-byte code assignment table 315b described later.

２・３バイト領域３１０Ｆは、コード割当表３１０のＡ０ｈ〜ＦＦｈの領域である。この２・３バイト領域３１０Ｆには、青空文庫、オックスフォード英語辞書、その他の一般的な書籍を基にして、出現頻度が所定値未満となる低頻度の単語が設定される。以下の説明では、適宜、低頻度の単語を、低頻度単語と表記する。２・３バイト領域３１０Ｆに設定された低頻度単語には、２バイトまたは３バイトのコードが割り当てられる。 The 2/3-byte area 310F is the area from A0h to FFh in the code assignment table 310. Words whose appearance frequency is below a predetermined value, determined from Aozora Bunko, the Oxford English Dictionary, and other general books, are set in this area. In the following description, such words are referred to as low-frequency words as appropriate. A 2-byte or 3-byte code is assigned to each low-frequency word set in the 2/3-byte area 310F.

なお、２・３バイト領域３１０Ｆには、係る２・３バイト領域３１０Ｆに設定された単語に割り当てるバイトコードのうち、前半の１バイトのコードのみが設定されている。２・３バイト領域３１０Ｆに設定された単語に割り当てる２バイトまたは３バイトのコードは、後述する２・３バイト割当表３１６に定義されている。 Note that the 2/3-byte area 310F holds only the first byte of each byte code assigned to the words set in that area. The complete 2-byte or 3-byte codes assigned to those words are defined in the 2/3-byte assignment table 316 described later.
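The first-byte ranges of the code assignment table 310 can be summarized as a small lookup. The following is a minimal sketch, not the patented implementation: the names `REGIONS` and `region_of` are hypothetical, and the ranges for areas 310A and 310B (00h to 1Fh and 20h to 3Fh) are taken from the decoding description of this embodiment.

```python
# Hypothetical summary of the first-byte ranges of code assignment table 310.
# Each entry is (lowest first byte, highest first byte, area label).
REGIONS = [
    (0x00, 0x1F, "310A: control symbol, 1-byte code"),
    (0x20, 0x3F, "310B: English word, 1-byte code"),
    (0x40, 0x5F, "310C: high-frequency English word, 2-byte code"),
    (0x60, 0x7F, "310D: Japanese word, 1-byte code"),
    (0x80, 0x9F, "310E: high-frequency Japanese word, 2-byte code"),
    (0xA0, 0xFF, "310F: low-frequency word, 2-byte or 3-byte code"),
]


def region_of(first_byte):
    """Return the label of the area that the first byte of a code falls in."""
    if not 0x00 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be in the range 00h to FFh")
    for low, high, label in REGIONS:
        if low <= first_byte <= high:
            return label
    raise AssertionError("unreachable: the ranges cover 00h to FFh")
```

For instance, `region_of(0x65)` returns the label for area 310D, matching the example in which “の” is assigned the code 65h.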

図１６は、本実施例３に係る英単語２バイトコード割当表の一例を示す図である。図１６に示すように、英単語２バイトコード割当表３１５ａは、高頻度英単語と、２バイトのコードとを対応付ける。 FIG. 16 is a diagram illustrating an example of an English word 2-byte code assignment table according to the third embodiment. As shown in FIG. 16, the English word 2-byte code assignment table 315a associates high-frequency English words with 2-byte codes.

英単語２バイトコード割当表３１５ａにおいて、「４０００ｈ〜５ＦＦＦｈ」には、高頻度英単語が設定され、設置位置に応じた２バイトのコードが割り当てられる。例えば、設定位置「４０００ｈ」に設定された高頻度英単語には、２バイトのコード「４０００ｈ」が割り当てられる。 In the English word 2-byte code assignment table 315a, high-frequency English words are set in “4000h to 5FFFh”, and each is assigned the 2-byte code that corresponds to its setting position. For example, the 2-byte code “4000h” is assigned to the high-frequency English word set at the setting position “4000h”.

図１７は、本実施例３に係る日本単語２バイト割当表の一例を示す図である。図１７に示すように、この日本単語２バイトコード割当表３１５ｂは、高頻度日本単語と、２バイトのコードとを対応付ける。 FIG. 17 is a diagram illustrating an example of a Japanese word 2-byte allocation table according to the third embodiment. As shown in FIG. 17, this Japanese word 2-byte code assignment table 315b associates high-frequency Japanese words with 2-byte codes.

日本単語２バイト割当表３１５ｂにおいて、「８０００ｈ〜９ＦＦＦｈ」には、高頻度日本単語が設定され、設置位置に応じた２バイトのコードが割り当てられる。例えば、設定位置「８０００ｈ」に設定された高頻度日本単語には、２バイトのコード「８０００ｈ」が割り当てられる。 In the Japanese word 2-byte code assignment table 315b, high-frequency Japanese words are set in “8000h to 9FFFh”, and each is assigned the 2-byte code that corresponds to its setting position. For example, the 2-byte code “8000h” is assigned to the high-frequency Japanese word set at the setting position “8000h”.

図１８は、本実施例３に係る２・３バイト割当表の一例を示す図である。図１８に示すように、この２・３バイト割当表３１６は、低頻度単語と、２バイトのコードまたは３バイトのコードを割り当てる。例えば、Ａ０００ｈ〜Ｅ７ＦＦｈ、Ｆ０００ｈ〜Ｆ７ＦＦｈに設定される低頻度単語には、２バイトのコードが割り当てられる。Ｅ９００００ｈ〜ＥＦＦＦＦＦｈ、Ｆ９００００ｈ〜ＦＦＦＦＦＦｈに設定される低頻度単語には、３バイトのコードが割り当てられる。 FIG. 18 is a diagram illustrating an example of the 2/3-byte assignment table according to the third embodiment. As shown in FIG. 18, the 2/3-byte assignment table 316 assigns a 2-byte code or a 3-byte code to each low-frequency word. For example, 2-byte codes are assigned to the low-frequency words set in A000h to E7FFh and F000h to F7FFh. 3-byte codes are assigned to the low-frequency words set in E90000h to EFFFFFh and F90000h to FFFFFFh.
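Because the 2-byte and 3-byte ranges of table 316 do not overlap in their first byte, a reader of the coded data can tell the code length from that byte alone. The sketch below illustrates this under stated assumptions: the function name `low_frequency_code_length` is hypothetical, and first bytes E8h and F8h are rejected because the ranges quoted above do not cover them.

```python
def low_frequency_code_length(first_byte):
    """Return the code length in bytes for a code in the 2/3-byte area 310F.

    The ranges mirror those quoted for table 316:
      A000h-E7FFh and F000h-F7FFh      -> 2-byte codes
      E90000h-EFFFFFh, F90000h-FFFFFFh -> 3-byte codes
    """
    if 0xA0 <= first_byte <= 0xE7 or 0xF0 <= first_byte <= 0xF7:
        return 2
    if 0xE9 <= first_byte <= 0xEF or 0xF9 <= first_byte <= 0xFF:
        return 3
    # E8h and F8h (and anything below A0h) are not covered by the quoted ranges.
    raise ValueError("first byte %02Xh is outside the ranges of table 316" % first_byte)
```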

図１４ａの説明に戻る。コード変換部３５０は、制御記号に基づいてコード割当表を切り替え、切り替えたコード割当表に基づいて、テキストデータをコード化する処理部である。コード変換部３５０は、制御記号「ＳＩ」以降のテキストデータを、コード割当表３１０を用いて、コード変換する。一方、符号化装置３００は、テキストデータから、制御記号「ＳＯ」を検出した場合には、コード割当表５０を用いて、コード変換する。従来技術のコード割当表５０に関する説明は、実施例１で説明したものと同様である。コード変換部３５０は、コード化したテキストデータを、レジスタ３０５ｂに格納する。 Returning to the description of FIG. 14a. The code conversion unit 350 is a processing unit that switches the code assignment table based on a control symbol and encodes the text data based on the selected table. The code conversion unit 350 encodes the text data that follows the control symbol “SI” using the code assignment table 310. When the encoding device 300 detects the control symbol “SO” in the text data, it instead encodes using the code assignment table 50. The conventional code assignment table 50 is the same as that described in the first embodiment. The code conversion unit 350 stores the encoded text data in the register 305b.

以下において、コード変換部３５０がコード割当表３１０を用いてコード化する処理の一例について説明する。コード変換部３５０は、テキストデータから情報（英単語、日本単語、制御記号等）を取得する。コード変換部３５０は、テキストデータから取得した情報が、各領域３１０Ａ〜３１０Ｆの何れの領域の情報に対応するか特定し、特定した領域に応じたコード化を行う。 Hereinafter, an example of processing in which the code conversion unit 350 performs encoding using the code assignment table 310 will be described. The code conversion unit 350 acquires information (English words, Japanese words, control symbols, etc.) from the text data. The code conversion unit 350 identifies which area of the areas 310A to 310F the information acquired from the text data corresponds to, and performs coding according to the identified area.

コード変換部３５０の取得した情報が制御記号１バイト領域３１０Ａに設定された制御記号である場合について説明する。コード変換部３５０は、取得した制御記号と、制御記号１バイト領域３１０Ａに設定された各制御記号とを比較して、該当する設定位置の１バイトのコードを特定し、コード化する。例えば、コード変換部３５０は、取得した制御記号が「ＮＵＬ」である場合には、かかる制御記号「ＮＵＬ」を「００ｈ」にコード化する。 A case where the information acquired by the code conversion unit 350 is a control symbol set in the control symbol 1-byte area 310A will be described. The code conversion unit 350 compares the acquired control symbol with each control symbol set in the control symbol 1-byte area 310A to identify and code a 1-byte code at the corresponding set position. For example, when the acquired control symbol is “NUL”, the code conversion unit 350 encodes the control symbol “NUL” to “00h”.

なお、コード変換部３５０は、取得した制御記号が「ＳＯ」である場合には、かかる制御記号「ＳＯ」を「０Ｅｈ」にコード化すると共に、利用するコード割当表を、コード割当表５０に切り替える。 When the acquired control symbol is “SO”, the code conversion unit 350 encodes the control symbol “SO” as “0Eh” and switches the code assignment table in use to the code assignment table 50.

コード変換部３５０は、取得した制御記号が「ＳＩ」である場合には、かかる制御記号「ＳＩ」を「０Ｆｈ」にコード化すると共に、利用するコード割当表を、コード割当表３１０に切り替える。 When the acquired control symbol is “SI”, the code conversion unit 350 encodes the control symbol “SI” to “0Fh” and switches the code allocation table to be used to the code allocation table 310.

コード変換部３５０の取得した情報が英単語１バイト領域３１０Ｂに設定された英単語である場合について説明する。コード変換部３５０は、取得した英単語と、英単語１バイト領域３１０Ｂに設定された各英単語とを比較して、該当する設定位置の１バイトのコードを特定し、コード化する。例えば、コード変換部３５０は、取得した英単語が「ｔｈｅ」である場合には、係る英単語「ｔｈｅ」を「２７ｈ」にコード化する。 A case where the information acquired by the code conversion unit 350 is an English word set in the English word 1-byte area 310B will be described. The code conversion unit 350 compares the acquired English word with each English word set in the English word 1-byte area 310B, identifies the 1-byte code at the corresponding set position, and encodes it. For example, when the acquired English word is “the”, the code conversion unit 350 encodes the English word “the” into “27h”.

コード変換部３５０が取得した情報が英単語２バイト領域３１０Ｃに設定された英単語である場合について説明する。コード変換部３５０は、取得した英単語と、英単語２バイトコード割当表３１５ａとを比較して、該当する設置位置の２バイトのコードを特定し、コード化する。例えば、コード変換部３５０は、取得した単語が、英単語２バイトコード割当表３１５ａの「４０００ｈ」に設定されたある高頻度英単語である場合には、かかる高頻度英単語を２バイトのコード「４０００ｈ」にコード化する。 A case where the information acquired by the code conversion unit 350 is an English word set in the English word 2-byte area 310C will be described. The code conversion unit 350 compares the acquired English word with the English word 2-byte code assignment table 315a, identifies the 2-byte code at the corresponding setting position, and encodes the word. For example, if the acquired word is the high-frequency English word set at “4000h” in the English word 2-byte code assignment table 315a, the code conversion unit 350 encodes that word as the 2-byte code “4000h”.

コード変換部３５０の取得した情報が日本単語１バイト領域３１０Ｄに設定された日本単語である場合について説明する。コード変換部３５０は、取得した日本単語と、日本単語１バイト領域３１０Ｄに設定された各日本単語とを比較して、該当する設定位置の１バイトのコードを特定し、コード化する。例えば、コード変換部３５０は、取得した日本単語が「の」である場合には、係る日本単語「の」を「６５ｈ」にコード化する。 A case where the information acquired by the code conversion unit 350 is a Japanese word set in the Japanese word 1-byte area 310D will be described. The code conversion unit 350 compares the acquired Japanese word with each Japanese word set in the Japanese word 1-byte area 310D, identifies the 1-byte code at the corresponding setting position, and encodes the word. For example, if the acquired Japanese word is “の” (no), the code conversion unit 350 encodes it as “65h”.

コード変換部３５０の取得した情報が日本単語２バイト領域３１０Ｅに設定された日本単語である場合について説明する。コード変換部３５０は、取得した日本単語と、日本単語２バイトコード割当表３１５ｂとを比較して、該当する設置位置の２バイトのコードを特定し、コード化する。例えば、コード変換部３５０は、取得した単語が、日本単語２バイトコード割当表３１５ｂの「８０００ｈ」に設定されたある高頻度日本単語である場合には、かかる高頻度日本単語を２バイトのコード「８０００ｈ」にコード化する。 A case where the information acquired by the code conversion unit 350 is a Japanese word set in the Japanese word 2-byte area 310E will be described. The code conversion unit 350 compares the acquired Japanese word with the Japanese word 2-byte code assignment table 315b, identifies the 2-byte code at the corresponding setting position, and encodes the word. For example, if the acquired word is the high-frequency Japanese word set at “8000h” in the Japanese word 2-byte code assignment table 315b, the code conversion unit 350 encodes that word as the 2-byte code “8000h”.

コード変換部３５０の取得した情報が２・３バイト領域３１０Ｆに設定された低頻度単語である場合について説明する。コード変換部３５０は、取得した単語と、２・３バイトコード割当表３１６とを比較して、該当する設定位置の２バイトまたは３バイトのコードを特定し、コード化する。例えば、コード変換部３５０は、取得した単語が、２・３バイトコード割当表３１６の「Ａ０００ｈ」に設定された低頻度単語である場合には、係る低頻度単語を２バイトのコード「Ａ０００ｈ」にコード化する。例えば、コード変換部３５０は、取得した単語が、２・３バイトコード割当表３１６の「Ｅ９００００ｈ」に設定された低頻度単語である場合には、係る低頻度単語を３バイトのコード「Ｅ９００００ｈ」にコード化する。 A case where the information acquired by the code conversion unit 350 is a low-frequency word set in the 2/3-byte area 310F will be described. The code conversion unit 350 compares the acquired word with the 2/3-byte code assignment table 316, identifies the 2-byte or 3-byte code at the corresponding setting position, and encodes the word. For example, if the acquired word is the low-frequency word set at “A000h” in the 2/3-byte code assignment table 316, the code conversion unit 350 encodes that word as the 2-byte code “A000h”. Likewise, if the acquired word is the low-frequency word set at “E90000h” in the 2/3-byte code assignment table 316, the code conversion unit 350 encodes that word as the 3-byte code “E90000h”.

図１４ｂは、本実施例３に係る復号化装置の構成を示す機能ブロック図である。図１４ｂに示すように、この復号化装置７００は、入力部７０１、出力部７０２、レジスタ７０５ａ，７０５ｂ、記憶部７０６、コード変換部７５０を有する。 FIG. 14B is a functional block diagram illustrating the configuration of the decoding apparatus according to the third embodiment. As illustrated in FIG. 14B, the decoding device 700 includes an input unit 701, an output unit 702, registers 705a and 705b, a storage unit 706, and a code conversion unit 750.

入力部７０１は、コード変換を行うテキストデータを受け付ける処理部である。入力部７０１は、受け付けたテキストデータを、レジスタ７０５ａに格納する。 The input unit 701 is a processing unit that accepts text data for code conversion. The input unit 701 stores the received text data in the register 705a.

出力部７０２は、レジスタ７０５ｂに格納される文字コード変換後のテキストデータを出力する処理部である。 The output unit 702 is a processing unit that outputs text data after character code conversion stored in the register 705b.

レジスタ７０５ａは、コード変換されたテキストデータを格納するものである。レジスタ７０５ｂは、文字コード変換後のテキストデータを格納するものである。 The register 705a stores code-converted text data. The register 705b stores text data after character code conversion.

記憶部７０６は、コード割当表５０と、コード割当表３１０と、英単語２バイトコード割当表３１５ａと、日本単語２バイトコード割当表３１５ｂと、２・３バイトコード割当表３１６とを有する。記憶部７０６は、例えば、ＲＡＭ、ＲＯＭ、フラッシュメモリなどの半導体メモリ素子などの記憶装置に対応する。 The storage unit 706 includes a code allocation table 50, a code allocation table 310, an English word 2-byte code allocation table 315a, a Japanese word 2-byte code allocation table 315b, and a 2.3 byte code allocation table 316. The storage unit 706 corresponds to a storage device such as a semiconductor memory element such as a RAM, a ROM, or a flash memory.

コード割当表５０の説明は、実施例１で説明したものと同様である。コード割当表３１０に関する説明は、図１５で説明したコード割当表３１０に関する説明と同様である。英単語２バイトコード割当表３１５ａに関する説明は、図１６で説明した英単語２バイトコード割当表３１５ａに関する説明と同様である。日本単語２バイトコード割当表３１５ｂに関する説明は、図１７で説明した日本単語２バイト割当表３１５ｂに関する説明と同様である。２・３バイトコード割当表３１６に関する説明は、図１８で説明した２・３バイトコード割当表３１６に関する説明と同様である。 The description of the code assignment table 50 is the same as that described in the first embodiment. The description regarding the code allocation table 310 is the same as the description regarding the code allocation table 310 described with reference to FIG. The description regarding the English word 2-byte code allocation table 315a is the same as the description regarding the English word 2-byte code allocation table 315a described with reference to FIG. The description regarding the Japanese word 2-byte code allocation table 315b is the same as the description regarding the Japanese word 2-byte allocation table 315b described with reference to FIG. The description regarding the 2 · 3 byte code allocation table 316 is the same as the description regarding the 2 · 3 byte code allocation table 316 described with reference to FIG.

コード変換部７５０は、制御記号のコードに基づいてコード割当表を切り替え、切り替えたコード割当表に基づいて、テキストデータを文字コード化する処理部である。コード変換部７５０は、制御記号「ＳＩ」のコード以降のテキストデータを、コード割当表３１０を用いて、文字コード変換する。一方、復号化装置７００は、テキストデータから、制御記号「ＳＯ」のコードを検出した場合には、コード割当表５０を用いて、文字コード変換する。コード変換部７５０は、コード化したテキストデータを、レジスタ７０５ｂに格納する。 The code conversion unit 750 is a processing unit that switches the code assignment table based on the code of the control symbol and converts the text data into a character code based on the switched code assignment table. The code conversion unit 750 converts the text data after the code of the control symbol “SI” into a character code using the code assignment table 310. On the other hand, when the code of the control symbol “SO” is detected from the text data, the decoding apparatus 700 converts the character code using the code assignment table 50. The code conversion unit 750 stores the encoded text data in the register 705b.

以下において、コード変換部７５０がコード割当表３１０を用いて文字コード化する処理の一例について説明する。コード変換部７５０は、テキストデータからコードを取得する。コード変換部７５０は、テキストデータから取得したコードが、各領域３１０Ａ〜３１０Ｆの何れの領域の情報に対応するコードであるかを特定し、特定した領域に応じた文字コード化を行う。 In the following, an example of a process in which the code conversion unit 750 performs character encoding using the code assignment table 310 will be described. The code conversion unit 750 acquires a code from the text data. The code conversion unit 750 specifies whether the code acquired from the text data is a code corresponding to information in any of the areas 310A to 310F, and performs character encoding according to the specified area.

コード変換部７５０の取得したコードが制御記号１バイト領域３１０Ａに設定された制御記号のコードである場合について説明する。制御記号１バイト領域３１０Ａに設定された制御記号に対応するコードの１バイト目は「００ｈ〜１Ｆｈ」に含まれる。コード変換部７５０は、コードに対応する制御記号を、制御記号１バイト領域３１０Ａに設定された制御記号から選択し、選択した制御記号によって文字コード化する。例えば、コード変換部７５０は、取得したコードが「００ｈ」である場合には、「００ｈ」を「ＮＵＬ」に文字コード化する。 A case where the code acquired by the code conversion unit 750 is the code of the control symbol set in the control symbol 1-byte area 310A will be described. The first byte of the code corresponding to the control symbol set in the control symbol 1-byte area 310A is included in “00h to 1Fh”. The code conversion unit 750 selects a control symbol corresponding to the code from the control symbols set in the control symbol 1-byte area 310A, and character-codes the selected control symbol. For example, when the acquired code is “00h”, the code conversion unit 750 converts “00h” into “NUL” as a character code.

なお、コード変換部７５０は、取得したコードが「０Ｅｈ」である場合には、かかるコード「０Ｅｈ」を「ＳＯ」に文字コード化すると共に、利用するコード割当表を、コード割当表５０に切り替える。 When the acquired code is “0Eh”, the code conversion unit 750 converts the code “0Eh” into a character code “SO” and switches the code allocation table to be used to the code allocation table 50. .

コード変換部７５０は、取得したコードが「０Ｆｈ」である場合には、かかるコード「０Ｆｈ」を「ＳＩ」に文字コード化すると共に、利用するコード割当表を、コード割当表３１０に切り替える。 When the acquired code is “0Fh”, the code conversion unit 750 converts the code “0Fh” into a character code “SI” and switches the code allocation table to be used to the code allocation table 310.

コード変換部７５０の取得したコードが英単語１バイト領域３１０Ｂに設定された英単語に対応するコードである場合について説明する。英単語１バイト領域３１０Ｂに設定された英単語に対応するコードの１バイト目は「２０ｈ〜３Ｆｈ」に含まれる。コード変換部７５０は、取得したコードと、英単語１バイト領域３１０Ｂに設定された各英単語のコードとを比較して、該当する設定位置の英単語を特定し、文字コード化する。例えば、コード変換部７５０は、取得したコードが「２７ｈ」である場合には、係るコード「２７ｈ」を「ｔｈｅ」に文字コード化する。 A case where the code acquired by the code conversion unit 750 is a code corresponding to the English word set in the English word 1-byte area 310B will be described. The first byte of the code corresponding to the English word set in the English word 1-byte area 310B is included in “20h-3Fh”. The code conversion unit 750 compares the acquired code with the code of each English word set in the English word 1-byte area 310B, specifies the English word at the corresponding set position, and converts it into a character code. For example, when the acquired code is “27h”, the code conversion unit 750 converts the code “27h” into a character code “the”.

コード変換部７５０が取得したコードが英単語２バイト領域３１０Ｃに設定された英単語に対応するコードである場合について説明する。英単語２バイト領域３１０Ｃに設定された英単語に対応するコードの１バイト目は「４０ｈ〜５Ｆｈ」に含まれる。コード変換部７５０は、取得したコードと、英単語２バイトコード割当表３１５ａとを比較して、該当する設置位置の英単語を特定し、文字コード化する。例えば、コード変換部７５０は、取得したコードが「４０００ｈ」である場合には、英単語２バイトコード割当表３１５ａの「４０００ｈ」に対応する高頻度英単語に文字コード化する。 A case where the code acquired by the code conversion unit 750 is a code corresponding to the English word set in the English word 2-byte area 310C will be described. The first byte of the code corresponding to the English word set in the English word 2-byte area 310C is included in “40h to 5Fh”. The code conversion unit 750 compares the acquired code with the English word 2-byte code assignment table 315a, identifies the English word at the corresponding installation position, and converts it into a character code. For example, when the acquired code is “4000h”, the code conversion unit 750 converts the character code into a high-frequency English word corresponding to “4000h” in the English word 2-byte code assignment table 315a.

コード変換部７５０の取得したコードが２・３バイト領域３１０Ｆに設定された低頻度単語である場合について説明する。２・３バイト領域３１０Ｆに設定された低頻度単語に対応するコードの１バイト目は「Ａ０ｈ〜ＦＦｈ」に含まれる。コード変換部７５０は、取得したコードと、２・３バイトコード割当表３１６とを比較して、対応する設定位置の低頻度単語を特定し、文字コード化する。例えば、コード変換部７５０は、取得したコードが「Ａ０００ｈ」である場合には、２・３バイトコード割当表３１６の「Ａ０００ｈ」に対応する低頻度単語に文字コード化する。 A case where the code acquired by the code conversion unit 750 is a low-frequency word set in the 2.3 byte area 310F will be described. The first byte of the code corresponding to the low frequency word set in the 2.3 byte area 310F is included in “A0h to FFh”. The code conversion unit 750 compares the acquired code with the 2 / 3-byte code assignment table 316, identifies the low-frequency word at the corresponding set position, and converts it into a character code. For example, when the acquired code is “A000h”, the code conversion unit 750 converts the character code into a low-frequency word corresponding to “A000h” in the 2.3 byte code allocation table 316.

図１９ａは、本実施例３に係る符号化装置の処理手順を示すフローチャートである。図１９ａに示すように、符号化装置３００の入力部３０１は、テキストデータをレジスタ３０５ａに格納する（ステップＳ３０１）。符号化装置３００のコード変換部３５０は、テキストデータから情報を取得する（ステップＳ３０２）。ステップＳ３０２では、説明の便宜上、情報と表記するが、コード変換部３５０が取得する情報には、英単語、日本単語、制御記号等の情報が含まれる。 FIG. 19A is a flowchart illustrating the processing procedure of the encoding apparatus according to the third embodiment. As illustrated in FIG. 19a, the input unit 301 of the encoding device 300 stores text data in the register 305a (step S301). The code conversion unit 350 of the encoding device 300 acquires information from the text data (step S302). In step S302, for convenience of explanation, it is described as information, but the information acquired by the code conversion unit 350 includes information such as English words, Japanese words, control symbols, and the like.

コード変換部３５０は、取得した情報が制御記号の「ＳＯ」または「ＳＩ」であるか否かを判定する（ステップＳ３０３）。コード変換部３５０は、情報が制御記号の「ＳＯ」または「ＳＩ」である場合には（ステップＳ３０３，Ｙｅｓ）、ステップＳ３０４に移行する。 The code conversion unit 350 determines whether the acquired information is the control symbol “SO” or “SI” (step S303). If the information is the control symbol “SO” or “SI” (step S303, Yes), the code conversion unit 350 proceeds to step S304.

コード変換部３５０は、制御記号が「ＳＯ」の場合には、コード割当表５０を選択し、制御記号が「ＳＩ」の場合には、コード割当表３１０を選択し（ステップＳ３０４）、ステップＳ３０２に移行する。 The code conversion unit 350 selects the code assignment table 50 when the control symbol is “SO” and selects the code assignment table 310 when the control symbol is “SI” (step S304), and then returns to step S302.

一方、コード変換部３５０は、取得した情報が制御記号の「ＳＯ」でもなく、かつ、「ＳＩ」でもない場合には（ステップＳ３０３，Ｎｏ）、第１コード変換処理を実行する（ステップＳ３０５）。コード変換部３５０は、テキストデータのコード化が終了したか否かを判定する（ステップＳ３０６）。 On the other hand, when the acquired information is neither “SO” nor “SI” of the control symbol (No in step S303), the code conversion unit 350 executes the first code conversion process (step S305). . The code conversion unit 350 determines whether or not the text data has been encoded (step S306).

コード変換部３５０は、テキストデータのコード化が終了していない場合には（ステップＳ３０６，Ｎｏ）、ステップＳ３０２に移行する。一方、コード変換部３５０は、テキストデータのコード化が終了した場合には（ステップＳ３０６，Ｙｅｓ）、コード化したテキストデータを、レジスタ３０５ｂに格納する（ステップＳ３０７）。 If the encoding of the text data has not been completed (No at Step S306), the code converting unit 350 proceeds to Step S302. On the other hand, when the encoding of the text data is completed (Yes in step S306), the code conversion unit 350 stores the encoded text data in the register 305b (step S307).
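The encoding loop of FIG. 19a, together with the table selection of FIG. 20a, can be sketched as follows. This is an illustrative sketch rather than the actual implementation: the names `TABLE_310` and `encode` are hypothetical, the table contains only the entries quoted in this description, UTF-8 is used as an assumed stand-in for the conventional code assignment table 50, and the code assignment table 50 is assumed to be selected until the first “SI” appears. In line with the description of the code conversion unit 350, the control symbols “SO” and “SI” are themselves emitted as 0Eh and 0Fh.

```python
# Hypothetical excerpt of code assignment table 310 (only entries quoted in the text).
TABLE_310 = {
    "NUL": b"\x00",
    " ": b"\x20",    # the space written as a triangle in the description
    "the": b"\x27",
    "、": b"\x61",
    "の": b"\x65",
}


def encode(tokens):
    """Encode a list of tokens, switching tables on the "SO" and "SI" symbols."""
    use_table_310 = False              # assumption: table 50 is active at the start
    out = bytearray()
    for token in tokens:               # step S302: acquire the next piece of information
        if token == "SO":              # steps S303 and S304
            out += b"\x0e"
            use_table_310 = False
        elif token == "SI":
            out += b"\x0f"
            use_table_310 = True
        elif use_table_310:            # steps S404 and S405
            out += TABLE_310[token]
        else:                          # steps S402 and S403 (UTF-8 stands in for table 50)
            out += token.encode("utf-8")
    return bytes(out)                  # step S307: the encoded text data
```

With these assumptions, `encode(["SI", "the", " ", "SO", "a"])` yields the five bytes 0Fh, 27h, 20h, 0Eh, 61h; note that 61h means “a” under the stand-in for table 50 but “、” under table 310, which is why the switch symbols must be kept in the coded data.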

図２０ａは、第１コード変換処理の処理手順を示すフローチャートである。このコード変換処理は、図１９ａのステップＳ３０５に示した処理に対応するものである。図２０ａに示すように、符号化装置３００のコード変換部３５０は、コード割当表５０を選択中であるか否かを判定する（ステップＳ４０１）。 FIG. 20A is a flowchart showing the processing procedure of the first code conversion processing. This code conversion process corresponds to the process shown in step S305 of FIG. 19a. As illustrated in FIG. 20A, the code conversion unit 350 of the encoding device 300 determines whether or not the code assignment table 50 is being selected (step S401).

コード変換部３５０は、コード割当表５０を選択中である場合には（ステップＳ４０１，Ｙｅｓ）、コード割当表５０を参照し（ステップＳ４０２）、コード割当表５０に基づいて、情報をバイトコードに変換する（ステップＳ４０３）。 When the code allocation table 50 is being selected (step S401, Yes), the code conversion unit 350 refers to the code allocation table 50 (step S402), and converts the information into a byte code based on the code allocation table 50. Conversion is performed (step S403).

一方、コード変換部３５０は、コード割当表５０を選択中ではなく、コード割当表３１０を選択中である場合には（ステップＳ４０１，Ｎｏ）、ステップＳ４０４に移行する。コード変換部３５０は、コード割当表３１０を参照し（ステップＳ４０４）、コード割当表３１０に基づいて、情報をバイトコードに変換する（ステップＳ４０５）。 On the other hand, if the code assignment table 310 is selected rather than the code assignment table 50 (No in step S401), the code conversion unit 350 proceeds to step S404. The code conversion unit 350 refers to the code assignment table 310 (step S404) and converts the information into a byte code based on the code assignment table 310 (step S405).

図１９ｂは、本実施例３に係る復号化装置の処理手順を示すフローチャートである。図１９ｂに示すように、復号化装置７００の入力部７０１は、テキストデータをレジスタ７０５ａに格納する（ステップＳ７０１）。復号化装置７００のコード変換部７５０は、テキストデータからコードを取得する（ステップＳ７０２）。 FIG. 19b is a flowchart illustrating the processing procedure of the decoding device according to the third embodiment. As shown in FIG. 19b, the input unit 701 of the decoding device 700 stores the text data in the register 705a (step S701). The code conversion unit 750 of the decoding device 700 acquires a code from the text data (step S702).

コード変換部７５０は、取得したコードが制御記号の「ＳＯ」または「ＳＩ」に対応するコードであるか否かを判定する（ステップＳ７０３）。コード変換部７５０は、コードが制御記号の「ＳＯ」または「ＳＩ」に対応するコードである場合には（ステップＳ７０３，Ｙｅｓ）、ステップＳ７０４に移行する。 The code conversion unit 750 determines whether or not the acquired code is a code corresponding to the control symbol “SO” or “SI” (step S703). When the code is a code corresponding to the control symbol “SO” or “SI” (Yes in step S703), the code converting unit 750 proceeds to step S704.

コード変換部７５０は、コードが「ＳＯ」に対応するコードの場合には、コード割当表５０を選択し、コードが「ＳＩ」に対応するコードの場合には、コード割当表３１０を選択し（ステップＳ７０４）、ステップＳ７０２に移行する。 The code conversion unit 750 selects the code allocation table 50 when the code corresponds to “SO”, and selects the code allocation table 310 when the code corresponds to “SI” ( Step S704) and the process proceeds to Step S702.

一方、コード変換部７５０は、取得したコードが「ＳＯ」に対応するコードでもなく、かつ、「ＳＩ」に対応するコードでもない場合には（ステップＳ７０３，Ｎｏ）、第２コード変換処理を実行する（ステップＳ７０５）。コード変換部７５０は、テキストデータの復号化が終了したか否かを判定する（ステップＳ７０６）。 On the other hand, if the acquired code corresponds to neither “SO” nor “SI” (No in step S703), the code conversion unit 750 executes the second code conversion process (step S705). The code conversion unit 750 then determines whether decoding of the text data has finished (step S706).

コード変換部７５０は、テキストデータの復号化が終了していない場合には（ステップＳ７０６，Ｎｏ）、ステップＳ７０２に移行する。一方、コード変換部７５０は、テキストデータの復号化が終了した場合には（ステップＳ７０６，Ｙｅｓ）、復号化したテキストデータを、レジスタ７０５ｂに格納する（ステップＳ７０７）。 If the decoding of the text data has not been completed (step S706, No), the code conversion unit 750 proceeds to step S702. On the other hand, when the decoding of the text data is completed (Yes in step S706), the code conversion unit 750 stores the decoded text data in the register 705b (step S707).

図２０ｂは、第２コード変換処理の処理手順を示すフローチャートである。このコード変換処理は、図１９ｂのステップＳ７０５に示した処理に対応するものである。図２０ｂに示すように、復号化装置７００のコード変換部７５０は、コード割当表５０を選択中であるか否かを判定する（ステップＳ８０１）。 FIG. 20B is a flowchart illustrating the processing procedure of the second code conversion processing. This code conversion process corresponds to the process shown in step S705 of FIG. 19b. As illustrated in FIG. 20b, the code conversion unit 750 of the decoding device 700 determines whether or not the code allocation table 50 is being selected (step S801).

コード変換部７５０は、コード割当表５０を選択中である場合には（ステップＳ８０１，Ｙｅｓ）、コード割当表５０を参照し（ステップＳ８０２）、コード割当表５０に基づいて、バイトコードを文字コードに変換する（ステップＳ８０３）。 If the code allocation table 50 is being selected (step S801, Yes), the code conversion unit 750 refers to the code allocation table 50 (step S802), and converts the byte code into a character code based on the code allocation table 50. (Step S803).

一方、コード変換部７５０は、コード割当表５０を選択中ではなく、コード割当表３１０を選択中である場合には（ステップＳ８０１，Ｎｏ）、ステップＳ８０４に移行する。コード変換部７５０は、コード割当表３１０を参照し（ステップＳ８０４）、コード割当表３１０に基づいて、バイトコードを文字コードに変換する（ステップＳ８０５）。 On the other hand, when the code allocation table 50 is not selected but the code allocation table 310 is selected (No at Step S801), the code conversion unit 750 proceeds to Step S804. The code conversion unit 750 refers to the code assignment table 310 (step S804), and converts the byte code into a character code based on the code assignment table 310 (step S805).
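The decoding procedure of FIGS. 19b and 20b can be sketched in the same spirit. All names here (`TABLE_310_DECODE`, `code_length_310`, `decode`) are hypothetical; the table holds only entries quoted in this description plus a labeled placeholder for the unnamed word at 4000h; UTF-8 again stands in for the code assignment table 50, which is assumed to be active at the start. Following the description of the code conversion unit 750, the codes 0Eh and 0Fh are converted to “SO” and “SI” and also switch the table.

```python
# Hypothetical excerpt of table 310 for decoding. The entry for 4000h is a
# placeholder: the description does not say which word is set at that position.
TABLE_310_DECODE = {
    b"\x00": "NUL",
    b"\x20": " ",
    b"\x27": "the",
    b"\x61": "、",
    b"\x65": "の",
    b"\x40\x00": "<word at 4000h>",
}


def code_length_310(first_byte):
    """Code length in bytes implied by the first byte, per the ranges in the text."""
    if first_byte <= 0x3F or 0x60 <= first_byte <= 0x7F:
        return 1
    if first_byte <= 0x9F or 0xA0 <= first_byte <= 0xE7 or 0xF0 <= first_byte <= 0xF7:
        return 2
    if 0xE9 <= first_byte <= 0xEF or 0xF9 <= first_byte <= 0xFF:
        return 3
    raise ValueError("first byte %02Xh is not covered by the quoted ranges" % first_byte)


def decode(data):
    """Decode bytes into a list of tokens, switching tables on 0Eh (SO) and 0Fh (SI)."""
    use_table_310 = False                      # assumption: table 50 is active at the start
    tokens = []
    i = 0
    while i < len(data):                       # step S702: acquire the next code
        first = data[i]
        if first == 0x0E:                      # steps S703 and S704
            tokens.append("SO")
            use_table_310 = False
            i += 1
        elif first == 0x0F:
            tokens.append("SI")
            use_table_310 = True
            i += 1
        elif use_table_310:                    # steps S804 and S805
            n = code_length_310(first)
            tokens.append(TABLE_310_DECODE[data[i:i + n]])
            i += n
        else:                                  # steps S802 and S803 (UTF-8 stand-in)
            j = i
            while j < len(data) and data[j] not in (0x0E, 0x0F):
                j += 1
            tokens.extend(data[i:j].decode("utf-8"))
            i = j
    return tokens
```

The length is decided from the first byte before the lookup, so a 2-byte code whose second byte happens to be 0Eh or 0Fh is not mistaken for a switch symbol while table 310 is active.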

次に、本実施例３に係る符号化装置３００の効果について説明する。符号化装置３００は、従来のコード割当表５０と、本実施例３特有のコード割当表３１０とを切り替えて利用する。例えば、符号化装置３００は、テキストデータから、制御記号「ＳＩ」を検出した場合には、制御記号「ＳＩ」以降のテキストデータを、コード割当表３１０を用いて、コード変換する。一方、符号化装置３００は、テキストデータから、制御記号「ＳＯ」を検出した場合には、コード割当表５０を用いて、コード変換する。このため、従来のコード割当表５０を用いたコード変換に対応しつつ、出現頻度が高い文字や単語に対しては、短いバイトコードを割り当てることができる。 Next, effects of the encoding device 300 according to the third embodiment will be described. The encoding apparatus 300 switches between the conventional code assignment table 50 and the code assignment table 310 unique to the third embodiment. For example, when detecting the control symbol “SI” from the text data, the encoding apparatus 300 converts the text data after the control symbol “SI” using the code assignment table 310. On the other hand, when detecting the control symbol “SO” from the text data, the encoding apparatus 300 performs code conversion using the code assignment table 50. Therefore, it is possible to assign a short byte code to a character or word having a high appearance frequency while corresponding to code conversion using the conventional code assignment table 50.

また、復号化装置７００は、上記のコード割当表５０、３１０を切り替えて使用し、コード化されたテキストデータを復号化するため、従来のコード割当表５０を用いた文字コード変換に対応しつつ、出現頻度が高い単語や一般記号に対して、短いバイトコードを割り当てた場合でも、係るバイトコードを単語や一般記号に変換することができる。 In addition, the decoding device 700 uses the above code allocation tables 50 and 310 by switching and decodes the encoded text data, so that it supports character code conversion using the conventional code allocation table 50. Even when a short byte code is assigned to a word or general symbol having a high appearance frequency, the byte code can be converted into a word or general symbol.

図２１は、本実施例４に係る復号化装置の処理の一例を示す図である。本実施例４に係る復号化装置は、第１オートマトン８０６ａ、第２オートマトン８０６ｂ、第３オートマトン８０６ｃを用いて、コード変換されたテキストデータ１０ｂを、文字コード変換することで、テキストデータ１０ａを生成する。テキストデータ１０ｂは、例えば、実施例１で説明した符号化装置１００によりコード変換されたものである。 FIG. 21 is a diagram illustrating an example of processing of the decoding device according to the fourth embodiment. The decoding device according to the fourth embodiment generates text data 10a by performing character code conversion on the code-converted text data 10b using the first automaton 806a, the second automaton 806b, and the third automaton 806c. The text data 10b is, for example, data that has been code-converted by the encoding device 100 described in the first embodiment.

第１オートマトン８０６ａは、１バイトのコードと、１バイトのコードに対応する文字とが対応付けられる。図２２は、第１オートマトンの一例を示す図である。図２２に示すように、第１オートマトン８０６ａは、「００ｈ〜２Ｆｈ」と各単語とが対応付けられる。例えば、「００ｈ〜２Ｆｈ」に対応付けられた各単語は、図３で説明した１バイト領域１１０Ａの各単語に対応する。 In the first automaton 806a, a 1-byte code is associated with a character corresponding to the 1-byte code. FIG. 22 is a diagram illustrating an example of the first automaton. As shown in FIG. 22, in the first automaton 806a, “00h to 2Fh” is associated with each word. For example, each word associated with “00h to 2Fh” corresponds to each word in the 1-byte area 110A described with reference to FIG.

第２オートマトン８０６ｂは、２バイトのコードと、所定の文字列、スペース、記号、高頻度単語等とを対応付ける。図２３は、第２オートマトンの一例を示す図である。図２３に示すように、第２オートマトン８０６ｂは、「３０００ｈ〜５ＦＦＦｈ」と文字列、スペース、記号、高頻度単語等とが対応付けられる。ここでは図示を省略するが、第２オートマトン８０６ｂでは、２バイトのコードと、英数字、記号、かな、カナ、漢字、数値、時刻、タグ、構文とを対応付けてもよい。例えば、「３０００ｈ〜５ＦＦＦｈ」に対応付けられる情報は、図４で説明した２バイトコード割当表１１５ａにおいて、「３０００ｈ〜５ＦＦＦｈ」と対応付けられる情報に対応する。 The second automaton 806b associates a 2-byte code with a predetermined character string, space, symbol, high-frequency word, and the like. FIG. 23 is a diagram illustrating an example of the second automaton. As shown in FIG. 23, in the second automaton 806b, “3000h to 5FFFh” is associated with a character string, a space, a symbol, a high-frequency word, and the like. Although not shown here, in the second automaton 806b, a 2-byte code may be associated with alphanumeric characters, symbols, kana, kana, kanji, numerical values, time, tags, and syntax. For example, information associated with “3000h to 5FFFh” corresponds to information associated with “3000h to 5FFFh” in the 2-byte code assignment table 115a described with reference to FIG.

第３オートマトン８０６ｃは、３バイトのコードと、所定のＣＪＫ文字、英単語、日本単語、第３国の単語、数値、時刻、タグ、構文意味解析の結果とを対応付ける。図２４は、第３オートマトンの一例を示す図である。図２４に示すように、第３オートマトン８０６ｃは、「６０００００ｈ〜ＦＦＦＦＦＦｈ」と、所定のＣＪＫ文字、英単語、日本単語、第３国の単語、数値、時刻、タグ、構文意味解析の結果とが対応付けられる。なお、「Ｅ０００００ｈ〜ＦＦＦＦＦＦｈ」は、予備の領域となる。例えば、「６０００００ｈ〜ＦＦＦＦＦＦｈ」に対応付けられる情報は、図５で説明した３バイトコード割当表１１５ｂにおいて、「６０００００ｈ〜ＦＦＦＦＦＦｈ」に対応付けられる情報に対応する。 The third automaton 806c associates a 3-byte code with a predetermined CJK character, English word, Japanese word, third country word, numerical value, time, tag, and result of syntax semantic analysis. FIG. 24 is a diagram illustrating an example of the third automaton. As shown in FIG. 24, the third automaton 806c includes “600000h to FFFFFFh” and predetermined CJK characters, English words, Japanese words, third country words, numerical values, times, tags, and syntax semantic analysis results. It is associated. Note that “E00000h to FFFFFFh” is a spare area. For example, information associated with “600000h to FFFFFFh” corresponds to information associated with “600000h to FFFFFFh” in the 3-byte code assignment table 115b described with reference to FIG.

図２１の説明に戻る。コード変換部８５０は、コード変換されたテキストデータ１０ｂからコードを読み出し、コードの先頭４ビットの値に基づいて、第１オートマトン８０６ａ、第２オートマトン８０６ｂ、第３オートマトン８０６ｃのいずれかのオートマトンを選択する。そして、コード変換部８５０は、選択したオートマトンを基にして、コードを変換する。 Returning to the description of FIG. 21. The code conversion unit 850 reads a code from the code-converted text data 10b and, based on the value of the first 4 bits of the code, selects one of the first automaton 806a, the second automaton 806b, and the third automaton 806c. The code conversion unit 850 then converts the code using the selected automaton.

例えば、コード変換部８５０は、コードの先頭４ビットが「００ｈ〜２Ｆｈ」に含まれる場合には、第１オートマトン８０６ａを選択し、第１オートマトン８０６ａに基づいて、コードを変換する。 For example, when the first 4 bits of the code are included in “00h to 2Fh”, the code conversion unit 850 selects the first automaton 806a and converts the code based on the first automaton 806a.

コード変換部８５０は、コードの先頭４ビットが「３０ｈ〜５Ｆｈ」に含まれる場合には、第２オートマトン８０６ｂを選択し、第２オートマトン８０６ｂに基づいて、コードを変換する。 When the first 4 bits of the code are included in “30h to 5Fh”, the code conversion unit 850 selects the second automaton 806b and converts the code based on the second automaton 806b.

コード変換部８５０は、コードの先頭４ビットが「６０ｈ〜ＦＦｈ」に含まれる場合には、第３オートマトン８０６ｃを選択し、第３オートマトン８０６ｃに基づいて、コードを変換する。 When the first 4 bits of the code are included in “60h to FFh”, the code conversion unit 850 selects the third automaton 806c and converts the code based on the third automaton 806c.
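The selection rule in the three preceding paragraphs can be sketched as follows. The text gives the ranges as byte values (00h to 2Fh, 30h to 5Fh, 60h to FFh); because every boundary falls on a multiple of 10h, the upper 4 bits of the first byte are enough to tell the ranges apart. The function name `select_automaton` and the string labels are hypothetical, chosen only for this illustration, and this is a minimal sketch rather than the implementation described in the patent.

```python
def select_automaton(first_byte: int) -> str:
    """Pick an automaton from the upper 4 bits of a code's first byte.

    The labels "first", "second", "third" are illustrative stand-ins
    for automata 806a, 806b, and 806c.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be in 0x00..0xFF")
    nibble = first_byte >> 4      # upper 4 bits, 0x0..0xF
    if nibble <= 0x2:             # 00h to 2Fh: 1-byte codes
        return "first"
    if nibble <= 0x5:             # 30h to 5Fh: 2-byte codes
        return "second"
    return "third"                # 60h to FFh: 3-byte codes
```

For instance, `select_automaton(0x12)` falls in the first range and `select_automaton(0x43)` in the second, matching the codes "12h" and "4341h" used in the FIG. 21 example.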

図２１のテキストデータ１０ｂに含まれる各コード「１２ｈ、０８ｈ、０７ｈ、００ｈ」の先頭４ビットは、「００ｈ〜２Ｆｈ」に含まれるため、コード変換部８５０は、第１オートマトン８０６ａを選択し、コードを変換する。例えば、コード変換部８５０は、「１２ｈ、０８ｈ、０７ｈ、００ｈ」を第１オートマトン８０６ａに基づき「ｈｅ△、ｉｓ△、ｉｎ△、ｔｈｅ△」にそれぞれ変換する。 Since the first 4 bits of each of the codes "12h, 08h, 07h, 00h" included in the text data 10b of FIG. 21 fall within "00h to 2Fh", the code conversion unit 850 selects the first automaton 806a and converts the codes. For example, the code conversion unit 850 converts "12h, 08h, 07h, 00h" into "heΔ, isΔ, inΔ, theΔ", respectively, based on the first automaton 806a.

図２１のテキストデータ１０ｂに含まれるコード「４３４１ｈ」の先頭４ビットは、「３０ｈ〜５Ｆｈ」に含まれるため、コード変換部８５０は、第２オートマトン８０６ｂを選択し、コードを変換する。例えば、コード変換部８５０は、「４３４１ｈ」を第２オートマトン８０６ｂに基づき「ｈｏｕｓｅ△」に変換する。コード変換部８５０が上記処理を実行することで、テキストデータ１０ｂは、テキストデータ１０ａに変換される。 Since the first 4 bits of the code “4341h” included in the text data 10b of FIG. 21 are included in “30h to 5Fh”, the code conversion unit 850 selects the second automaton 806b and converts the code. For example, the code conversion unit 850 converts “4341h” into “houseΔ” based on the second automaton 806b. When the code conversion unit 850 executes the above processing, the text data 10b is converted into the text data 10a.

図２５は、本実施例４に係る復号化装置の構成を示す機能ブロック図である。図２５に示すように、この復号化装置８００は、入力部８０１、出力部８０２、レジスタ８０５ａ，８０５ｂ、記憶部８０６、コード変換部８５０を有する。 FIG. 25 is a functional block diagram of the configuration of the decoding apparatus according to the fourth embodiment. As illustrated in FIG. 25, the decoding device 800 includes an input unit 801, an output unit 802, registers 805a and 805b, a storage unit 806, and a code conversion unit 850.

入力部８０１は、コード変換されたテキストデータを受け付ける処理部である。入力部８０１は、受け付けたテキストデータを、レジスタ８０５ａに格納する。 The input unit 801 is a processing unit that receives code-converted text data. The input unit 801 stores the received text data in the register 805a.

出力部８０２は、レジスタ８０５ｂに格納されたテキストデータを出力する処理部である。 The output unit 802 is a processing unit that outputs the text data stored in the register 805b.

記憶部８０６は、第１オートマトン８０６ａと、第２オートマトン８０６ｂと、第３オートマトン８０６ｃとを有する。記憶部８０６は、例えば、ＲＡＭ、ＲＯＭ、フラッシュメモリなどの半導体メモリ素子などの記憶装置に対応する。 The storage unit 806 includes a first automaton 806a, a second automaton 806b, and a third automaton 806c. The storage unit 806 corresponds to a storage device such as a semiconductor memory element such as a RAM, a ROM, or a flash memory.

第１オートマトン８０６ａ、第２オートマトン８０６ｂ、第３オートマトン８０６ｃに関する説明は、図２１で説明した第１オートマトン８０６ａ、第２オートマトン８０６ｂ、第３オートマトン８０６ｃに関する説明と同様である。 The description regarding the first automaton 806a, the second automaton 806b, and the third automaton 806c is the same as the description regarding the first automaton 806a, the second automaton 806b, and the third automaton 806c described with reference to FIG.

コード変換部８５０は、コード変換されたテキストデータ１０ｂからコードを読み出し、コードの先頭４ビットの値に基づいて、第１オートマトン８０６ａ、第２オートマトン８０６ｂ、第３オートマトン８０６ｃのいずれかのオートマトンを選択する。そして、コード変換部８５０は、選択したオートマトンを基にして、コードを変換する。コード変換部８５０の具体的な処理は、図２１で説明したコード変換部８５０の処理と同様である。 The code conversion unit 850 reads a code from the code-converted text data 10b and, based on the value of the first 4 bits of the code, selects one of the first automaton 806a, the second automaton 806b, and the third automaton 806c. The code conversion unit 850 then converts the code using the selected automaton. The specific processing of the code conversion unit 850 is the same as the processing of the code conversion unit 850 described with reference to FIG. 21.

図２６は、本実施例４に係る復号化装置の処理手順を示すフローチャートである。図２６に示すように、復号化装置８００の入力部８０１は、テキストデータをレジスタ８０５ａに格納する（ステップＳ９０１）。復号化装置８００のコード変換部８５０は、レジスタ８０５ａに格納されたテキストデータからコードを取得する（ステップＳ９０２）。 FIG. 26 is a flowchart of the processing procedure of the decoding apparatus according to the fourth embodiment. As shown in FIG. 26, the input unit 801 of the decoding apparatus 800 stores the text data in the register 805a (step S901). The code conversion unit 850 of the decoding apparatus 800 acquires a code from the text data stored in the register 805a (step S902).

コード変換部８５０は、コードの先頭の４ビットの値と各オートマトンとを比較する（ステップＳ９０３）。コード変換部８５０は、コードの先頭の４ビットの値が第１オートマトン８０６ａにヒットしたか否かを判定する（ステップＳ９０４）。コード変換部８５０は、コードの先頭の４ビットの値が第１オートマトン８０６ａにヒットした場合には（ステップＳ９０４，Ｙｅｓ）、第１オートマトン８０６ａを選択する（ステップＳ９０５）。コード変換部８５０は、第１オートマトン８０６ａに基づいてコードを単語に変換し（ステップＳ９０６）、ステップＳ９１２に移行する。 The code conversion unit 850 compares the value of the first 4 bits of the code with each automaton (step S903). The code conversion unit 850 determines whether the value of the first 4 bits of the code hits the first automaton 806a (step S904). If the value of the first 4 bits of the code hits the first automaton 806a (step S904, Yes), the code conversion unit 850 selects the first automaton 806a (step S905). The code conversion unit 850 converts the code into a word based on the first automaton 806a (step S906), and proceeds to step S912.

一方、コード変換部８５０は、コードの先頭の４ビットの値が第１オートマトン８０６ａにヒットしていない場合には（ステップＳ９０４，Ｎｏ）、コードの先頭の４ビットの値が第２オートマトン８０６ｂにヒットしたか否かを判定する（ステップＳ９０７）。コード変換部８５０は、コードの先頭の４ビットの値が第２オートマトン８０６ｂにヒットした場合には（ステップＳ９０７，Ｙｅｓ）、第２オートマトン８０６ｂを選択する（ステップＳ９０８）。コード変換部８５０は、第２オートマトン８０６ｂに基づいてコードを単語に変換し（ステップＳ９０９）、ステップＳ９１２に移行する。 On the other hand, if the value of the first 4 bits of the code does not hit the first automaton 806a (step S904, No), the code conversion unit 850 determines whether the value of the first 4 bits of the code hits the second automaton 806b (step S907). If the value of the first 4 bits of the code hits the second automaton 806b (step S907, Yes), the code conversion unit 850 selects the second automaton 806b (step S908). The code conversion unit 850 converts the code into a word based on the second automaton 806b (step S909), and proceeds to step S912.

一方、コード変換部８５０は、コードの先頭の４ビットの値が第２オートマトン８０６ｂにヒットしない場合には（ステップＳ９０７，Ｎｏ）、第３オートマトン８０６ｃを選択する（ステップＳ９１０）。コード変換部８５０は、第３オートマトン８０６ｃに基づいてコードを単語に変換する（ステップＳ９１１）。 On the other hand, when the value of the first 4 bits of the code does not hit the second automaton 806b (No in step S907), the code conversion unit 850 selects the third automaton 806c (step S910). The code conversion unit 850 converts the code into a word based on the third automaton 806c (step S911).

コード変換部８５０は、テキストデータの復号化が終了したか否かを判定する（ステップＳ９１２）。コード変換部８５０は、テキストデータの復号化が終了していない場合には（ステップＳ９１２，Ｎｏ）、ステップＳ９０２に移行する。 The code conversion unit 850 determines whether or not the decoding of the text data has been completed (step S912). If the decoding of the text data has not been completed (No at Step S912), the code conversion unit 850 proceeds to Step S902.

一方、コード変換部８５０は、テキストデータの復号化が終了した場合には（ステップＳ９１２，Ｙｅｓ）、復号化したテキストデータを、レジスタ８０５ｂに格納する（ステップＳ９１３）。 On the other hand, when the decoding of the text data is finished (step S912, Yes), the code conversion unit 850 stores the decoded text data in the register 805b (step S913).
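The flow of steps S901 to S913 can be sketched as a simple loop. This is an illustrative sketch under stated assumptions, not the patented implementation: the automata are modeled as plain lookup tables, the table contents are limited to the entries quoted from the FIG. 21 example, an ordinary space stands in for the "△" marker, and the names `FIRST`, `SECOND`, `THIRD`, and `decode` are hypothetical.

```python
# Toy stand-ins for automata 806a/806b/806c. Only the entries quoted
# in the FIG. 21 example are filled in; everything else is omitted.
FIRST = {b"\x12": "he ", b"\x08": "is ", b"\x07": "in ", b"\x00": "the "}
SECOND = {b"\x43\x41": "house "}
THIRD = {}  # 3-byte codes; no example entries are given in the text


def decode(register_a: bytes) -> str:
    """Decode code-converted text held in 'register 805a' (S901)."""
    words = []
    pos = 0
    while pos < len(register_a):           # S912: more data to decode?
        nibble = register_a[pos] >> 4      # S902-S903: leading 4 bits
        if nibble <= 0x2:                  # S904 Yes -> S905
            table, length = FIRST, 1
        elif nibble <= 0x5:                # S907 Yes -> S908
            table, length = SECOND, 2
        else:                              # S907 No  -> S910
            table, length = THIRD, 3
        code = register_a[pos:pos + length]
        if code not in table:
            raise ValueError(f"no entry for code {code.hex()} at offset {pos}")
        words.append(table[code])          # S906 / S909 / S911
        pos += length
    return "".join(words)                  # S913: result for 'register 805b'


if __name__ == "__main__":
    data = bytes([0x12, 0x08, 0x07, 0x00, 0x43, 0x41])
    print(repr(decode(data)))
```

Because the leading 4 bits fix the code length, the loop never needs to backtrack: each iteration consumes exactly 1, 2, or 3 bytes.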

次に、復号化装置８００の効果について説明する。復号化装置８００は、コード変換されたテキストデータ１０ｂからコードを読み出し、コードの先頭４ビットの値に基づいて、第１オートマトン８０６ａ、第２オートマトン８０６ｂ、第３オートマトン８０６ｃのいずれかのオートマトンを選択する。そして、復号化装置８００は、選択したオートマトンを基にして、コードを変換する。これにより、符号化装置１００等により、短いコードに割り当てるべき、出現頻度が高い文字や単語に対応付けられたコード等の２バイト以上のコードを１バイトコードに割り当てた場合でも、復号化装置８００を用いて、適切に復号化できる。すなわち、復号化装置８００により、短いコードに割り当てるべき、出現頻度が高い文字や単語に対応付けられたコード等の２バイト以上のコードを１バイトコードに割り当てることができる。 Next, the effect of the decoding apparatus 800 will be described. The decoding apparatus 800 reads a code from the code-converted text data 10b and, based on the value of the first 4 bits of the code, selects one of the first automaton 806a, the second automaton 806b, and the third automaton 806c. The decoding apparatus 800 then converts the code using the selected automaton. As a result, even when the encoding device 100 or the like has assigned to a 1-byte code a code of 2 bytes or more that should be short, such as a code associated with a frequently occurring character or word, the data can be decoded appropriately using the decoding apparatus 800. In other words, the decoding apparatus 800 makes it possible to assign codes of 2 bytes or more that should be short, such as codes associated with frequently occurring characters or words, to 1-byte codes.

下記に、本実施形態に用いられるハードウェア及びソフトウェアについて説明する。図２７は、コンピュータ１のハードウェア構成例を示す。コンピュータ１は、例えば、プロセッサ４０１、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）４０２、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）４０３、ドライブ装置４０４、記憶媒体４０５、入力インターフェース（Ｉ／Ｆ）４０６、入力デバイス４０７、出力インターフェース（Ｉ／Ｆ）４０８、出力デバイス４０９、通信インターフェース（Ｉ／Ｆ）４１０、ＳＡＮ（ＳｔｏｒａｇｅＡｒｅａＮｅｔｗｏｒｋ）インターフェース（Ｉ／Ｆ）４１１およびバス４１２などを含む。それぞれのハードウェアはバス４１２を介して接続されている。 The hardware and software used in this embodiment will be described below. FIG. 27 shows a hardware configuration example of the computer 1. The computer 1 includes, for example, a processor 401, a RAM (Random Access Memory) 402, a ROM (Read Only Memory) 403, a drive device 404, a storage medium 405, an input interface (I / F) 406, an input device 407, an output interface (I / F) 408, an output device 409, a communication interface (I / F) 410, a SAN (Storage Area Network) interface (I / F) 411, a bus 412, and the like. Each piece of hardware is connected via a bus 412.

ＲＡＭ４０２は読み書き可能なメモリ装置であって、例えば、ＳＲＡＭ（ＳｔａｔｉｃＲＡＭ）やＤＲＡＭ（ＤｙｎａｍｉｃＲＡＭ）などの半導体メモリ、またはＲＡＭでなくてもフラッシュメモリなどが用いられる。ＲＯＭ４０３は、ＰＲＯＭ（ＰｒｏｇｒａｍｍａｂｌｅＲＯＭ）なども含む。ドライブ装置４０４は、記憶媒体４０５に記録された情報の読み出しか書き込みかの少なくともいずれか一方を行なう装置である。記憶媒体４０５は、ドライブ装置４０４によって書き込まれた情報を記憶する。記憶媒体４０５は、例えば、ハードディスク、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）などのフラッシュメモリ、ＣＤ（ＣｏｍｐａｃｔＤｉｓｃ）、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）、ブルーレイディスクなどの記憶媒体である。また、例えば、コンピュータ１は、複数種類の記憶媒体それぞれについて、ドライブ装置４０４及び記憶媒体４０５を設ける。 The RAM 402 is a readable / writable memory device. For example, a semiconductor memory such as an SRAM (Static RAM) or a DRAM (Dynamic RAM), or a flash memory is used even if it is not a RAM. The ROM 403 includes a PROM (Programmable ROM). The drive device 404 is a device that performs at least one of reading and writing of information recorded on the storage medium 405. The storage medium 405 stores information written by the drive device 404. The storage medium 405 is a storage medium such as a hard disk, a flash memory such as an SSD (Solid State Drive), a CD (Compact Disc), a DVD (Digital Versatile Disc), or a Blu-ray disc. For example, the computer 1 includes a drive device 404 and a storage medium 405 for each of a plurality of types of storage media.

入力インターフェース４０６は、入力デバイス４０７と接続されており、入力デバイス４０７から受信した入力信号をプロセッサ４０１に伝達する回路である。出力インターフェース４０８は、出力デバイス４０９と接続されており、出力デバイス４０９に、プロセッサ４０１の指示に応じた出力を実行させる回路である。通信インターフェース４１０はネットワーク３を介した通信の制御を行なう回路である。通信インターフェース４１０は、例えばネットワークインターフェースカード（ＮＩＣ）などである。ＳＡＮインターフェース４１１は、ストレージエリアネットワークによりコンピュータ１と接続された記憶装置との通信の制御を行なう回路である。ＳＡＮインターフェース４１１は、例えばホストバスアダプタ（ＨＢＡ）などである。 The input interface 406 is connected to the input device 407 and is a circuit that transmits an input signal received from the input device 407 to the processor 401. The output interface 408 is connected to the output device 409 and is a circuit that causes the output device 409 to execute an output in accordance with an instruction from the processor 401. The communication interface 410 is a circuit that controls communication via the network 3. The communication interface 410 is, for example, a network interface card (NIC). The SAN interface 411 is a circuit that controls communication with a storage device connected to the computer 1 via a storage area network. The SAN interface 411 is a host bus adapter (HBA), for example.

入力デバイス４０７は、操作に応じて入力信号を送信する装置である。入力デバイス４０７は、例えば、キーボードやコンピュータ１の本体に取り付けられたボタンなどのキー装置や、マウスやタッチパネルなどのポインティングデバイスである。出力デバイス４０９は、コンピュータ１の制御に応じて情報を出力する装置である。出力デバイス４０９は、例えば、ディスプレイなどの画像出力装置（表示デバイス）や、スピーカーなどの音声出力装置などである。また、例えば、タッチスクリーンなどの入出力装置が、入力デバイス４０７及び出力デバイス４０９として用いられる。また、入力デバイス４０７及び出力デバイス４０９は、コンピュータ１と一体になっていてもよいし、コンピュータ１に含まれず、例えば、コンピュータ１に外部から接続する装置であってもよい。 The input device 407 is a device that transmits an input signal in response to an operation. The input device 407 is, for example, a key device such as a keyboard or a button attached to the main body of the computer 1, or a pointing device such as a mouse or a touch panel. The output device 409 is a device that outputs information under the control of the computer 1. The output device 409 is, for example, an image output device (display device) such as a display, or an audio output device such as a speaker. An input/output device such as a touch screen may also be used as both the input device 407 and the output device 409. The input device 407 and the output device 409 may be integrated with the computer 1, or may be devices that are not part of the computer 1 and are, for example, connected to the computer 1 externally.

例えば、プロセッサ４０１は、ＲＯＭ４０３や記憶媒体４０５に記憶されたプログラムをＲＡＭ４０２に読み出し、読み出されたプログラムの手順に従って、入力部１０１，２０１，３０１、コード変換部１５０，２５０，３５０、出力部１０２，２０２，３０２の処理を行なう。その際にＲＡＭ４０２はプロセッサ４０１のワークエリアとして用いられる。記憶部の機能は、ＲＯＭ４０３および記憶媒体４０５がプログラムファイル（後述のアプリケーションプログラム２４、ミドルウェア２３およびＯＳ２２など）やデータファイル（テキストデータ、照合対象となる文字列）を記憶し、ＲＡＭ４０２がプロセッサ４０１のワークエリアとして用いられることによって実現される。プロセッサ４０１が読み出すプログラムについては、図２８を用いて説明する。 For example, the processor 401 reads a program stored in the ROM 403 or the storage medium 405 into the RAM 402 and, following the procedure of the read program, performs the processing of the input units 101, 201, 301, the code conversion units 150, 250, 350, and the output units 102, 202, 302. During this processing, the RAM 402 is used as a work area of the processor 401. The function of the storage unit is realized by the ROM 403 and the storage medium 405 storing program files (such as the application program 24, the middleware 23, and the OS 22 described later) and data files (text data and character strings to be collated), and by the RAM 402 being used as a work area of the processor 401. The program read by the processor 401 will be described with reference to FIG. 28.

図２８は、コンピュータで動作するプログラムの構成例を示す。コンピュータ１において、図２８に示すハードウェア群２１（４０１〜４１２）の制御を行なうＯＳ（オペレーティング・システム）２２が動作する。ＯＳ２２に従った手順でプロセッサ４０１が動作して、ハードウェア群２１の制御・管理が行なわれることにより、アプリケーションプログラム２４やミドルウェア２３に従った処理がハードウェア群２１で実行される。さらに、コンピュータ１において、ミドルウェア２３またはアプリケーションプログラム２４が、ＲＡＭ４０２に読み出されてプロセッサ４０１により実行される。 FIG. 28 shows a configuration example of the programs that run on the computer. In the computer 1, an OS (operating system) 22 runs that controls the hardware group 21 (401 to 412) shown in FIG. 28. The processor 401 operates according to the procedure of the OS 22 to control and manage the hardware group 21, so that processing according to the application program 24 and the middleware 23 is executed on the hardware group 21. Further, in the computer 1, the middleware 23 or the application program 24 is read into the RAM 402 and executed by the processor 401.

プロセッサ４０１が、照合機能が呼び出された場合に、ミドルウェア２３またはアプリケーションプログラム２４の少なくとも一部に基づく処理を行なうことにより、（それらの処理をＯＳ２２に基づいてハードウェア群２１を制御して）コード変換部１５０，２５０，３５０の機能が実現される。照合機能は、それぞれアプリケーションプログラム２４自体に含まれてもよいし、アプリケーションプログラム２４に従って呼び出されることで実行されるミドルウェア２３の一部であってもよい。 When the collation function is called, the processor 401 performs processing based on at least a part of the middleware 23 or the application program 24 (carrying out that processing by controlling the hardware group 21 through the OS 22), thereby realizing the functions of the code conversion units 150, 250, and 350. The collation function may be included in the application program 24 itself, or may be a part of the middleware 23 that is executed when called by the application program 24.

図２９は、実施形態のシステムにおける装置の構成例を示す。図２９のシステムは、コンピュータ１ａ、コンピュータ１ｂ、基地局２およびネットワーク３を含む。コンピュータ１ａは、無線または有線の少なくとも一方により、コンピュータ１ｂと接続されたネットワーク３に接続している。図２ａ、図８ａ、図１４ａに示す符号化装置１００，２００，３００の機能は、図２９に示すコンピュータ１ａとコンピュータ１ｂとのいずれに含まれてもよい。また、図２ｂ、図８ｂ、図１４ｂ、図２５に示す復号化装置５００，６００，７００、８００の機能は、図２９に示すコンピュータ１ａとコンピュータ１ｂとのいずれに含まれてもよい。 FIG. 29 shows a configuration example of an apparatus in the system of the embodiment. The system in FIG. 29 includes a computer 1 a, a computer 1 b, a base station 2 and a network 3. The computer 1a is connected to the network 3 connected to the computer 1b by at least one of wireless and wired. The functions of the encoding devices 100, 200, and 300 shown in FIGS. 2a, 8a, and 14a may be included in either the computer 1a or the computer 1b shown in FIG. Further, the functions of the decoding apparatuses 500, 600, 700, and 800 shown in FIGS. 2b, 8b, 14b, and 25 may be included in either the computer 1a or the computer 1b shown in FIG.

以上の各実施例を含む実施形態に関し、さらに以下の付記を開示する。 The following supplementary notes are further disclosed with respect to the embodiments including the above examples.

（付記１）コンピュータに、
記憶装置に格納された、第１コード割当表の１バイト領域に割り当てられた文字の一部を、２バイト領域に割り当て、かつ、前記２バイト領域に割り当てられた前記文字の少なくとも一部に対して、２バイト以上のコードを割り当てることで、入力された文字データをコード化する変換規則であって、前記コード化された符号データの先頭４ビットの値は当該符号データのコード長に応じて異なる変換規則を定義した第２コード割当表を参照し、
前記第２コード割当表を基にして生成された複数のオートマトンを利用し、コード化されたデータを前記複数のオートマトンのうち、当該データの先頭４ビットの値に応じて選択されるオートマトンにより文字データに復号化する
処理を実行させることを特徴とする復号化プログラム。 (Supplementary note 1) A decoding program that causes a computer to execute a process comprising:
referring to a second code assignment table, stored in a storage device, that defines a conversion rule for encoding input character data by assigning some of the characters assigned to the 1-byte area of a first code assignment table to a 2-byte area and assigning codes of 2 bytes or more to at least some of the characters assigned to the 2-byte area, the conversion rule being such that the value of the first 4 bits of the encoded code data differs according to the code length of that code data; and
decoding encoded data into character data by using a plurality of automata generated based on the second code assignment table, with the automaton selected from among the plurality of automata according to the value of the first 4 bits of the data.

（付記２）コンピュータが実行する復号化方法であって、
記憶装置に格納された、第１コード割当表の１バイト領域に割り当てられた文字の一部を、２バイト領域に割り当て、かつ、前記２バイト領域に割り当てられた前記文字の少なくとも一部に対して、２バイト以上のコードを割り当てることで、入力された文字データをコード化する変換規則であって、前記コード化された符号データの先頭４ビットの値は当該符号データのコード長に応じて異なる変換規則を定義した第２コード割当表を参照し、
前記第２コード割当表を基にして生成された複数のオートマトンを利用し、コード化されたデータを前記複数のオートマトンのうち、当該データの先頭４ビットの値に応じて選択されるオートマトンにより文字データに復号化する
処理を実行することを特徴とする復号化方法。 (Supplementary note 2) A decoding method executed by a computer, the method comprising:
referring to a second code assignment table, stored in a storage device, that defines a conversion rule for encoding input character data by assigning some of the characters assigned to the 1-byte area of a first code assignment table to a 2-byte area and assigning codes of 2 bytes or more to at least some of the characters assigned to the 2-byte area, the conversion rule being such that the value of the first 4 bits of the encoded code data differs according to the code length of that code data; and
decoding encoded data into character data by using a plurality of automata generated based on the second code assignment table, with the automaton selected from among the plurality of automata according to the value of the first 4 bits of the data.

（付記３）第１コード割当表の１バイト領域に割り当てられた文字の一部を、２バイト領域に割り当て、かつ、前記２バイト領域に割り当てられた前記文字の少なくとも一部に対して、２バイト以上のコードを割り当てることで、入力された文字データをコード化する変換規則であって、前記コード化された符号データの先頭４ビットの値は当該符号データのコード長に応じて異なる変換規則を定義した第２コード割当表を基に生成された複数のオートマトンを記憶する記憶部と、
前記複数のオートマトンを利用し、コード化されたデータを前記複数のオートマトンのうち、当該データの先頭４ビットの値に応じて選択されるオートマトンにより文字データに復号化するコード変換部と
を有することを特徴とする復号化装置。 (Supplementary note 3) A decoding apparatus comprising:
a storage unit that stores a plurality of automata generated based on a second code assignment table, the second code assignment table defining a conversion rule for encoding input character data by assigning some of the characters assigned to the 1-byte area of a first code assignment table to a 2-byte area and assigning codes of 2 bytes or more to at least some of the characters assigned to the 2-byte area, the conversion rule being such that the value of the first 4 bits of the encoded code data differs according to the code length of that code data; and
a code conversion unit that decodes encoded data into character data by using the plurality of automata, with the automaton selected from among the plurality of automata according to the value of the first 4 bits of the data.

１００，２００，３００符号化装置 100, 200, 300 Encoding device
１５０，２５０，３５０コード変換部 150, 250, 350 Code conversion unit

Claims

A decoding program that causes a computer to execute a process comprising:
referring to a second code assignment table, stored in a storage device, that defines a conversion rule for encoding input character data by assigning some of the characters assigned to the 1-byte area of a first code assignment table to a 2-byte area and assigning codes of 2 bytes or more to at least some of the characters assigned to the 2-byte area, the conversion rule being such that the value of the first 4 bits of the encoded code data differs according to the code length of that code data; and
decoding encoded data into character data by using a plurality of automata generated based on the second code assignment table, with the automaton selected from among the plurality of automata according to the value of the first 4 bits of the data.

A decoding method executed by a computer, the method comprising:
referring to a second code assignment table, stored in a storage device, that defines a conversion rule for encoding input character data by assigning some of the characters assigned to the 1-byte area of a first code assignment table to a 2-byte area and assigning codes of 2 bytes or more to at least some of the characters assigned to the 2-byte area, the conversion rule being such that the value of the first 4 bits of the encoded code data differs according to the code length of that code data; and
decoding encoded data into character data by using a plurality of automata generated based on the second code assignment table, with the automaton selected from among the plurality of automata according to the value of the first 4 bits of the data.

A decoding apparatus comprising:
a storage unit that stores a plurality of automata generated based on a second code assignment table, the second code assignment table defining a conversion rule for encoding input character data by assigning some of the characters assigned to the 1-byte area of a first code assignment table to a 2-byte area and assigning codes of 2 bytes or more to at least some of the characters assigned to the 2-byte area, the conversion rule being such that the value of the first 4 bits of the encoded code data differs according to the code length of that code data; and
a code conversion unit that decodes encoded data into character data by using the plurality of automata, with the automaton selected from among the plurality of automata according to the value of the first 4 bits of the data.