1306337
0758B-A31433TWF/PMTI-05-001T/Teresa

IX. Description of the Invention:

[Technical Field of the Invention]

The present invention relates to character conversion techniques, and more particularly to a method and system for converting the character encoding of a character between different character sets.

[Prior Art]

Character encoding/decoding is used in multilingual application systems, which can be implemented in many devices, such as Personal Digital Assistants (PDAs) or other mobile handheld devices.

Traditionally, converting the character code of a character between different character code sets is accomplished by building a one-to-one mapping table. Such a table usually holds character code records for two different character sets: the source character code set and the destination character code set.

For example, when the source character set is Arabic ISO (International Organization for Standardization) and the destination character set is Unicode-UCS (Universal Character Set), traditional character conversion builds a one-to-one mapping table as described above. In this example, Arabic ISO contains 209 characters, and Arabic ISO/Unicode-UCS codes are encoded in 16 bits, i.e., 2 bytes, so a one-way mapping table requires 209 × (2 + 2) = 836 bytes.
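The size estimate for the one-way table can be checked with a short sketch. It assumes each record of the one-to-one table is a pair of 16-bit codes packed without padding; the names `RECORD_FORMAT` and `one_way_table_size` are illustrative and do not come from the patent.

```python
import struct

# Assumption: one record = (source code, destination code), each an
# unsigned 16-bit value, packed little-endian with no padding.
RECORD_FORMAT = "<HH"


def one_way_table_size(num_chars: int) -> int:
    """Bytes needed for a one-way, one-to-one mapping table."""
    return num_chars * struct.calcsize(RECORD_FORMAT)


ARABIC_ISO_CHARS = 209  # character count stated in the text

print(one_way_table_size(ARABIC_ISO_CHARS))  # 836
```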
A one-way mapping table supports conversion in one direction only, for example from Arabic ISO to Unicode-UCS or from Unicode-UCS to Arabic ISO. To support conversion in both directions, that is, from Arabic ISO to Unicode-UCS and from Unicode-UCS to Arabic ISO, the mapping table must be doubled in size, so a bidirectional one-to-one mapping table occupies 836 × 2 = 1672 bytes.

Moreover, the execution cost of character conversion is roughly that of a binary search in the mapping table. For the traditional method, the execution complexity of character conversion is therefore about log₂ N, where N = 209.

However, different languages encode characters with different character sets. When a device supports multilingual applications, characters often have to be converted repeatedly among the various character sets, and a problem arises when conversions are frequent and no efficient conversion method is available. In devices such as handheld communication devices, for example, memory space and central processing unit (CPU) capabilities are quite limited, so the traditional character conversion method cannot meet the needs of present and future technology.

[Summary of the Invention]

In view of the above, an object of the present invention is to provide a systematic character conversion method that converts the character code of a character between different character sets.

To achieve this object, the present invention provides a character conversion method for converting a character encoding from a source character set to a destination character set. First, a mapping table must be provided. The mapping table represents the relationship between the source character set and the destination character set.
The mapping table is built by analyzing the source character set and the destination character set, and its records are based mainly on several discontinuous ranges in the source character set. Each record of the mapping table corresponds to one range and includes the start value, the end value, and the displacement value of that range. The displacement value serves as the value in the destination character set that corresponds to the start value.

Next, a source character code is received. The source character code is obtained by encoding the target character to be converted with the source character set. The mapping table is then searched according to the source character code, by binary search or a similar method, to obtain the start value and the displacement value of the matching range.

Then, a difference is calculated from the start value and the source character code, for example by subtracting the start value from the source character code. Finally, the displacement value is added to the difference to obtain a destination value. The destination value serves as an index of the target character in the destination character set; that is, the destination character code, which encodes the target character in the destination character set, can subsequently be obtained from this index.

Furthermore, the present invention provides a character conversion system for converting a character encoding from a source character set to a destination character set. The system includes a mapping table, a receiving module, a search module, a calculation module, and an addition module, which carry out the character conversion method described above.

[Embodiments]

Please refer to FIG. 1, which is a flowchart of the method disclosed in the present invention.
First, the source character set and the destination character set are analyzed, chiefly to identify the discontinuous ranges they contain (step S700).

Based on the result of the analysis, a mapping table is built (step S702). The mapping table thus indicates the relationship between the source character set and the destination character set according to the information obtained from the analysis. Once built, the mapping table contains multiple records (entries) that represent the discontinuous ranges in the source character set.

Next, a source character code is received (step S704). The source character code is obtained by encoding the target character to be converted with the source character set. The mapping table is then searched according to the source character code, by binary search or a similar method, to obtain the start value and the displacement value (step S706). The displacement value represents the value in the destination character set that corresponds to the start value.

Then, a difference is calculated from the start value and the source character code (step S708), for example by subtracting the start value from the source character code. Finally, the displacement value is added to the difference to obtain the destination value (step S710). The destination value is an index of the target character in the destination character set, from which the destination character code, which encodes the target character in the destination character set, can then be obtained.

FIG. 2 is a functional block diagram of the system disclosed in the present invention. As shown in the figure, the present invention provides a character conversion system for converting a character encoding from a source character set to a destination character set. The system includes a mapping table 900, a receiving module 902, a search module 904, a calculation module 906, and an addition module 908.

The mapping table 900 represents the relationship between the source character set and the destination character set. It is built from the discontinuous ranges in the source character set and the destination character set, and it includes multiple records that represent the discontinuous ranges in the source character set. Each record corresponds to one range and includes the start value, the end value, and the displacement value of that range, where the displacement value serves as the value in the destination character set that corresponds to the start value. The mapping table can be built by analyzing the source character set and the destination character set.

The receiving module 902 receives the source character code, which is obtained by encoding the target character with the source character set. The search module 904 searches the mapping table 900 according to the source character code to obtain the start value and the displacement value, where the displacement value represents the value in the destination character set that corresponds to the start value. The search module 904 may perform the search by binary search.

The calculation module 906 calculates the difference from the start value and the source character code by subtracting the start value from the source character code. The addition module 908 adds the displacement value to the difference to obtain the destination value.
The destination value can serve as an index of the target character in the destination character set. The conversion system may further include an obtaining module 910, which obtains from this index the destination character code that encodes the target character in the destination character set.

Please refer to FIG. 3, which is a flowchart of one embodiment of the method disclosed in the present invention. In this embodiment, the source character set is UCS and the destination character set is Arabic ISO. The target character to be converted is assumed to be "ق"; applying the method of the present invention converts its character encoding from UCS to Arabic ISO. The source character code of the target character "ق" in the source character set UCS is "0x642", and its destination character code in the destination character set Arabic ISO is "0xe2".
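The lookup steps of the method and the example above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's Table 1: the three range records are an illustrative subset that follows the ISO 8859-6 layout, the names `RangeRecord`, `UCS_TO_ARABIC_ISO`, and `convert` are hypothetical, and the sketch assumes the destination value can be used directly as the destination character code rather than as an index into a separate encoding table.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class RangeRecord(NamedTuple):
    start: int   # first source code of the range
    end: int     # last source code of the range (inclusive)
    offset: int  # destination value that corresponds to `start`


# Illustrative subset only (assumption), sorted by `start`.
UCS_TO_ARABIC_ISO = [
    RangeRecord(0x0000, 0x00A0, 0x00),
    RangeRecord(0x0621, 0x063A, 0xC1),
    RangeRecord(0x0640, 0x0652, 0xE0),
]
_STARTS = [record.start for record in UCS_TO_ARABIC_ISO]


def convert(source_code: int) -> Optional[int]:
    """Return the destination value, or None if no range covers the code."""
    # Binary search for the last record whose start <= source_code.
    i = bisect_right(_STARTS, source_code) - 1
    if i < 0 or source_code > UCS_TO_ARABIC_ISO[i].end:
        return None
    record = UCS_TO_ARABIC_ISO[i]
    difference = source_code - record.start  # source code minus start value
    return record.offset + difference        # displacement plus difference


print(hex(convert(0x0642)))  # 0xe2
```

Storing one record per contiguous range rather than one per character is what keeps the table small; a code that falls in a gap between ranges (for example 0x063B here) yields `None`.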
First, the source character set UCS and the destination character set Arabic ISO are analyzed, mainly according to the discontinuous ranges in the character sets, and a mapping table is built from the result of the analysis. Table 1 is the mapping table built from this analysis. In Table 1, a total of 8 records are used to represent the source character set, i.e., UCS
For bidirectional conversion between Arabic ISO and Unicode-UCS, the total size of the mapping tables is 8 × 6 + 7 × 6 = 48 + 42 = 90 bytes. If the encoding tables of the Arabic ISO and Unicode-UCS character sets are also taken into account, the required memory space becomes 48 + 42 + (209 × 2 × 2) = 926 bytes. Compared with the 1672 bytes required by the traditional method, the present invention substantially reduces the memory requirement.
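The memory figures above can be reproduced with a few lines of arithmetic. The sketch assumes each range record holds three 16-bit values (start, end, displacement), which is consistent with the 6 bytes per record used in the text; the variable names are illustrative.

```python
BYTES_PER_CODE = 2                  # 16-bit codes, as stated in the text
RECORD_BYTES = 3 * BYTES_PER_CODE   # start, end, displacement (assumed layout)

ucs_to_iso = 8 * RECORD_BYTES       # 8 range records, 48 bytes
iso_to_ucs = 7 * RECORD_BYTES       # 7 range records, 42 bytes
range_tables = ucs_to_iso + iso_to_ucs

encoding_tables = 209 * 2 * BYTES_PER_CODE   # 209 x 2 x 2 bytes
proposed_total = range_tables + encoding_tables

traditional_total = 209 * (2 + 2) * 2        # bidirectional one-to-one table

print(range_tables, proposed_total, traditional_total)  # 90 926 1672
```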
Furthermore, in terms of execution efficiency, with the method of the present invention the execution complexity of converting a character from Unicode-UCS to Arabic ISO is log₂ 8, and that of converting a character from Arabic ISO to Unicode-UCS is log₂ 7.

As can be seen from the above, the method and system of the present invention greatly improve the efficiency of character conversion. The performance benefit is even more apparent when the method or system is applied to a handheld communication device, where it can increase the device's data processing efficiency.

The method and system of the present invention, or certain parts thereof, may be implemented as a computer program (computer instructions). The computer program may be embodied in a storage medium such as floppy diskettes, CD-ROMs, hard drives, firmware, or any other machine-readable storage medium. When the computer program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. In addition, the disclosed method and system may be transmitted as a computer program (computer instructions) over a transmission medium such as electrical wire, cable, fiber optics, any other medium capable of carrying the transmission, or wireless communication. When the transmitted computer program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
Furthermore, the disclosed method and system may be applied, in the form of a computer program (computer instructions), to a general-purpose processor. When the computer program is combined with the general-purpose processor, it provides an apparatus for practicing the invention whose function is equivalent to that of logic circuits with specific functions.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a flowchart of the method disclosed in the present invention.
FIG. 2 is a functional block diagram of the system disclosed in the present invention.
FIG. 3 is a flowchart of one embodiment of the method disclosed in the present invention.

[Description of Main Reference Numerals]

900: mapping table;
902: receiving module;
904: search module;
906: calculation module;
908: addition module;
910: obtaining module.