TWI306337B - Character conversion methods and systems - Google Patents

Character conversion methods and systems

Info

Publication number
TWI306337B
Authority
TW
Taiwan
Prior art keywords
character
value
source
target
character set
Prior art date
Application number
TW95100307A
Other languages
Chinese (zh)
Other versions
TW200635238A (en)
Inventor
Salwan Vikram
Gupta Arun
Jain Anku
Original Assignee
Mediatek India Technology Pvt Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/242,421 (US7215264B2)
Application filed by Mediatek India Technology Pvt Ltd
Publication of TW200635238A
Application granted
Publication of TWI306337B

Landscapes

  • Document Processing Apparatus (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Description

IX. Description of the Invention

[Technical Field of the Invention]

The present invention relates to character conversion technology, and more particularly to a method and system for converting the character encoding of characters between different character sets.

[Prior Art]

Character encoding/decoding is used in multi-language application systems, which can be implemented in many devices, such as personal digital assistants (PDAs) or other mobile handheld devices.

Traditionally, the character code of a character is converted between different character code sets by building a one-to-one mapping table. Such a table usually holds the character code records of two different character sets, namely the source character code set and the destination character code set.

For example, when the source character set is Arabic ISO (International Organization for Standardization) and the destination character set is Unicode-UCS (Universal Character Set), conventional character conversion builds a one-to-one mapping table as described above. In this example, Arabic ISO contains 209 characters, and Arabic ISO and Unicode-UCS codes are each stored in 16 bits, i.e. 2 bytes, so a one-way mapping table requires 209 × (2 + 2) = 836 bytes.
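The size of the conventional table follows directly from the character count and the per-code width. A minimal sketch of that arithmetic, assuming each record stores one source code and one destination code; the function and constant names are illustrative and do not come from the patent:

```python
def one_to_one_table_bytes(char_count: int,
                           source_code_bytes: int = 2,
                           dest_code_bytes: int = 2) -> int:
    """Bytes needed by a one-way one-to-one mapping table.

    Each record holds one source code and one destination code.
    """
    return char_count * (source_code_bytes + dest_code_bytes)


ARABIC_ISO_CHAR_COUNT = 209  # figure quoted in the description

one_way_bytes = one_to_one_table_bytes(ARABIC_ISO_CHAR_COUNT)
print(one_way_bytes)  # 836
```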
A one-way mapping table supports conversion in one direction only, for example from Arabic ISO to Unicode-UCS or from Unicode-UCS to Arabic ISO. Bidirectional conversion, i.e. both from Arabic ISO to Unicode-UCS and from Unicode-UCS to Arabic ISO, requires doubling the table, so a bidirectional one-to-one mapping table occupies 836 × 2 = 1672 bytes.

Moreover, the cost of one conversion is roughly that of a binary search in the mapping table; for the conventional method the execution complexity is about log₂ N, with N = 209.

However, different languages use different character sets. When a device supports applications in several languages, characters must often be converted repeatedly between character sets, and frequent conversion without an efficient method becomes a problem. In devices such as handheld communication devices in particular, memory space and central processing unit (CPU) capability are quite limited, so conventional character conversion methods cannot meet current and future needs.

[Summary of the Invention]

In view of this, an object of the present invention is to provide a systematic character conversion method that converts character codes between different character sets.

To achieve this object, the present invention provides a character conversion method for converting a character encoding from a source character set to a destination character set. First, a mapping table must be provided that represents the relationship between the source character set and the destination character set.
The mapping table is built by analyzing the source character set and the destination character set, and its records are based on the discontinuous ranges found in the source character set. Each record corresponds to one range and contains that range's start value, end value, and displacement value; the displacement value is the value in the destination character set that corresponds to the start value.

Next, a source character code is received; it is the code of the target character to be converted, encoded in the source character set. Using the source character code, the mapping table is searched, for example by binary search, to obtain the start value and the displacement value.

A difference is then calculated from the start value and the source character code, for example by subtracting the start value from the source character code. Finally, the displacement value is added to the difference to obtain a destination value. The destination value serves as an index of the target character in the destination character set; from this index, the destination character code, i.e. the target character encoded in the destination character set, can be obtained.

The present invention further provides a character conversion system for converting a character encoding from a source character set to a destination character set. It comprises a mapping table, a receiving module, a search module, a calculation module, and an adding module, which carry out the character conversion method described above.

[Embodiments]

Please refer to Fig. 1, a flowchart of the method disclosed in the present invention.
First, the source character set and the destination character set are analyzed, mainly to identify the discontinuous ranges they contain (step S700).

A mapping table is built from the result of the analysis (step S702). The mapping table thus expresses the relationship between the source character set and the destination character set, based on the information obtained in the analysis. Once built, it contains multiple records (entries) representing the discontinuous ranges in the source character set.

Next, a source character code is received (step S704); it is the code of the target character to be converted, encoded in the source character set. The mapping table is then searched with the source character code, for example by binary search, to obtain the start value and the displacement value (step S706). The displacement value indicates the value in the destination character set that corresponds to the start value.

A difference is then calculated from the start value and the source character code (step S708), for example by subtracting the start value from the source character code. Finally, the displacement value is added to the difference to obtain the destination value (step S710). The destination value is an index of the target character in the destination character set; from this index, the destination character code of the target character can be obtained.

Fig. 2 is a functional block diagram of the system disclosed in the present invention. As shown in Fig. 2, the invention provides a character conversion system for converting a character encoding from a source character set to a destination character set, comprising a mapping table 900, a receiving module 902, a search module 904, a calculation module 906, and an adding module 908.

The mapping table 900 represents the relationship between the source character set and the destination character set. It is built according to the non-contiguous ranges in the source and destination character sets and contains multiple records representing the discontinuous ranges in the source character set. Each record corresponds to one range and contains that range's start value, end value, and displacement value, where the displacement value is the value in the destination character set that corresponds to the start value. The mapping table can be built by analyzing the source character set and the destination character set.

The receiving module 902 receives the source character code, which is the target character encoded in the source character set. The search module 904 searches the mapping table 900 according to the source character code to obtain the start value and the displacement value; it may use binary search.

The calculation module 906 calculates the difference from the start value and the source character code by subtracting the start value from the source character code. The adding module 908 adds the displacement value to the difference to obtain the destination value.
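Steps S700 and S702 (analyze the two character sets, then build the range-based mapping table) can be sketched as follows. This is a minimal illustration, not the patent's procedure: it assumes the analysis starts from a plain one-to-one dictionary and merges runs in which source and destination values both advance by one. `RangeRecord`, `build_range_table`, and the toy codes are hypothetical.

```python
from typing import Dict, List, NamedTuple


class RangeRecord(NamedTuple):
    start: int         # start value: first source code of the range
    end: int           # end value: last source code of the range
    displacement: int  # destination value corresponding to `start`


def build_range_table(one_to_one: Dict[int, int]) -> List[RangeRecord]:
    """Collapse a one-to-one map into records, one per contiguous range."""
    records: List[RangeRecord] = []
    for src in sorted(one_to_one):
        dst = one_to_one[src]
        if records:
            last = records[-1]
            # Extend the current range only if the source code is adjacent
            # and the destination keeps the same distance from the start.
            if src == last.end + 1 and dst == last.displacement + (src - last.start):
                records[-1] = last._replace(end=src)
                continue
        records.append(RangeRecord(src, src, dst))
    return records


# Hypothetical toy mapping, not taken from any real character set.
toy_map = {0x10: 0x80, 0x11: 0x81, 0x12: 0x82, 0x20: 0x90, 0x30: 0x40, 0x31: 0x41}
toy_table = build_range_table(toy_map)
for record in toy_table:
    print(record)
```

Six one-to-one entries collapse into three records here; the saving grows with the length of the contiguous runs.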
The destination value serves as an index of the target character in the destination character set. The conversion system may further include an obtaining module 910, which uses the index to obtain the destination character code, i.e. the target character encoded in the destination character set.

Please refer to Fig. 3, a flowchart of one embodiment of the method disclosed in the present invention. In this embodiment, the source character set is UCS and the destination character set is Arabic ISO. The target character to be converted is assumed to be "ق"; the method proposed by the present invention converts its encoding from UCS to Arabic ISO. The source character code of "ق" in the source character set UCS is "0x642", and its destination character code in the destination character set Arabic ISO is "0xe2".
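The lookup described for this embodiment (binary search for the range, subtract the start value, add the displacement value) can be sketched as below. Two assumptions are made loudly: the ranges are taken from the ISO/IEC 8859-6 to Unicode correspondence for illustration and are not a reproduction of the patent's Table 1, and the displacement value is used directly as the destination code, whereas the description treats the destination value as an index into an encoding table. The name `convert` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (start value, end value, displacement value), sorted by start value.
Record = Tuple[int, int, int]

# Assumption: ranges follow the ISO/IEC 8859-6 <-> Unicode correspondence;
# this is NOT the patent's Table 1.
UCS_TO_ARABIC_ISO: List[Record] = [
    (0x0000, 0x00A0, 0x00),
    (0x00A4, 0x00A4, 0xA4),
    (0x00AD, 0x00AD, 0xAD),
    (0x060C, 0x060C, 0xAC),
    (0x061B, 0x061B, 0xBB),
    (0x061F, 0x061F, 0xBF),
    (0x0621, 0x063A, 0xC1),
    (0x0640, 0x0652, 0xE0),
]


def convert(source_code: int, table: List[Record]) -> Optional[int]:
    """Return the destination value for source_code, or None if unmapped."""
    starts = [start for start, _end, _disp in table]
    # Binary search for the last record whose start value <= source_code.
    i = bisect_right(starts, source_code) - 1
    if i < 0:
        return None
    start, end, displacement = table[i]
    if source_code > end:
        return None  # the code falls in a gap between ranges
    difference = source_code - start   # source character code minus start value
    return displacement + difference   # displacement value plus difference


print(hex(convert(0x0642, UCS_TO_ARABIC_ISO)))  # 0xe2
```

For 0x642 the search selects the record starting at 0x640, the difference is 2, and 0xE0 + 2 gives 0xE2, which agrees with the codes stated in the embodiment.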

First, the source character set UCS and the destination character set Arabic ISO are analyzed, mainly with respect to the discontinuous ranges in the character sets, and a mapping table is built from the result of the analysis. Table 1 is the mapping table built from this analysis. In Table 1, eight records represent the source character set, i.e. UCS

For bidirectional conversion with Unicode-UCS, the total size of the mapping tables is 8 × 6 + 7 × 6 = 48 + 42 = 90 bytes. If the encoding tables of the Arabic ISO and Unicode-UCS character sets are also taken into account, the required memory becomes 48 + 42 + (209 × 2 × 2) = 926 bytes. Compared with the 1672 bytes required by the conventional method, the memory requirement of the present invention is greatly reduced.
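The comparison above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that the text does not spell out: each record occupies 6 bytes because it holds three 2-byte fields (start, end, displacement), and the term 209 × 2 × 2 stands for one 2-byte-per-character encoding table for each of the two character sets.

```python
RECORD_BYTES = 6         # assumption: start, end, displacement at 2 bytes each
UCS_TO_ISO_RECORDS = 8   # records for Unicode-UCS -> Arabic ISO
ISO_TO_UCS_RECORDS = 7   # records for Arabic ISO -> Unicode-UCS
CHAR_COUNT = 209         # characters in Arabic ISO, per the description
CODE_BYTES = 2           # bytes per stored character code

range_tables = (UCS_TO_ISO_RECORDS + ISO_TO_UCS_RECORDS) * RECORD_BYTES
# Assumption: one encoding table per character set.
encoding_tables = CHAR_COUNT * CODE_BYTES * 2
proposed_total = range_tables + encoding_tables

# Conventional bidirectional one-to-one table, for comparison.
conventional_total = CHAR_COUNT * (CODE_BYTES + CODE_BYTES) * 2

print(range_tables)        # 90
print(proposed_total)      # 926
print(conventional_total)  # 1672
print(f"saving: {1 - proposed_total / conventional_total:.1%}")
```

Under these figures most of the remaining footprint comes from the encoding tables rather than from the range records themselves.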
Furthermore, in terms of execution efficiency, with the method proposed by the present invention the execution complexity of converting a character from Unicode-UCS to Arabic ISO is log₂ 8, and that of converting a character from Arabic ISO to Unicode-UCS is log₂ 7.

As shown above, the proposed method and system greatly improve the efficiency of character conversion. The benefit is most evident when they are applied to handheld communication devices, where they increase data-processing efficiency.

The method and system of the present invention, or certain parts thereof, may be implemented as a computer program (computer instructions) stored on a storage medium such as floppy diskettes, CD-ROMs, hard drives, firmware, or any other machine-readable storage medium. When the program is loaded into and executed by a machine such as a computer, that machine becomes an apparatus for practicing the invention.

The disclosed method and system may also be transmitted as a computer program (computer instructions) over a transmission medium such as electrical wire, cable, optical fiber, or any other transmission medium, or by wireless communication. When the transmitted program is loaded into and executed by a machine such as a computer, that machine becomes an apparatus for practicing the invention.
Furthermore, the disclosed method and system may take the form of a computer program (computer instructions) running on a general-purpose processor. When combined with the processor, the program provides an apparatus for practicing the invention that operates analogously to logic circuits with specific functions.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the appended claims.

[Brief Description of the Drawings]

Fig. 1 is a flowchart of the method disclosed in the present invention.
Fig. 2 is a functional block diagram of the system disclosed in the present invention.
Fig. 3 is a flowchart of one embodiment of the method disclosed in the present invention.

[Description of Main Reference Numerals]

900 - mapping table;
902 - receiving module;
904 - search module;
906 - calculation module;
908 - adding module;
910 - obtaining module.

Claims (1)

Application No. 95100307, amended claims page. Date of amendment: 97.11.18

Claims

1. A character conversion method for converting a character encoding from a source character set to a destination character set, comprising the following steps:
analyzing the source character set and the destination character set;
providing, according to the result of the analysis, a mapping table representing the correspondence between the source character set and the destination character set;
receiving a source character code, the source character code being obtained by encoding a target character in the source character set;
searching the mapping table according to the source character code to obtain a corresponding value; and
calculating a destination value according to the corresponding value.

2. The character conversion method of claim 1, wherein the destination value is an index of the target character in the destination character set, and the method obtains, according to the index, the destination character code of the target character encoded in the destination character set.

3. The character conversion method of claim 1, wherein the mapping table is built according to the non-contiguous character code ranges contained in the source character set and the destination character set.

4. The character conversion method of claim 3, wherein the mapping table contains a plurality of records, each record corresponding to one character code range in the source character set and having a start value, an end value, and a displacement value, wherein the start value and the end value indicate the corresponding character code range, and the displacement value indicates the value in the destination character set that corresponds to the start value.

5. The character conversion method of claim 4, wherein, in the step of searching the mapping table according to the source character code, the mapping table is searched by binary search to obtain a start value and a displacement value.

6. The character conversion method of claim 5, wherein, in the step of calculating a destination value according to the corresponding value, a difference is calculated according to the start value and the source character code, and the displacement value is added to the difference to obtain a destination value.

7. The character conversion method of claim 6, wherein the difference is calculated by subtracting the start value from the source character code.

8. A character conversion system for converting a character encoding from a source character set to a destination character set, comprising:
a mapping table representing the relationship between the source character set and the destination character set, the mapping table being built according to the non-contiguous character code ranges contained in the source character set and the destination character set;
a receiving module for receiving a source character code, the source character code being obtained by encoding a target character in the source character set;
a search module, coupled to the receiving module, for searching the mapping table according to the source character code to obtain a start value and a displacement value, wherein the displacement value indicates the value in the destination character set that corresponds to the start value;
a calculation module, coupled to the search module, for calculating a difference according to the start value and the source character code; and
an adding module, coupled to the calculation module, for adding the displacement value to the difference to obtain a destination value.

9. The character conversion system of claim 8, wherein the destination value is an index of the target character in the destination character set, and the character conversion system further comprises an obtaining module, coupled to the adding module, for obtaining, according to the index, the destination character code of the target character encoded in the destination character set.

10. The character conversion system of claim 8, wherein the search module searches the mapping table by binary search.

11. The character conversion system of claim 8, wherein the mapping table contains a plurality of records, each record corresponding to one character code range in the source character set and having a start value, an end value, and a displacement value, wherein the start value and the end value indicate the corresponding character code range, and the displacement value indicates the value in the destination character set that corresponds to the start value.

12. The character conversion system of claim 8, wherein the calculation module calculates the difference by subtracting the start value from the source character code.

13. A character conversion method for converting a character encoding from a source character set to a destination character set, comprising the following steps:
providing a mapping table representing the relationship between the source character set and the destination character set, the mapping table being built according to the non-contiguous character code ranges contained in the source character set and the destination character set;
receiving a source character code, the source character code being obtained by encoding a target character in the source character set;
searching the mapping table according to the source character code to obtain a start value and a displacement value, wherein the displacement value indicates the value in the destination character set that corresponds to the start value;
calculating a difference according to the start value and the source character code; and
adding the displacement value to the difference to obtain a destination value.

14. The character conversion method of claim 13, wherein the destination value is an index of the target character in the destination character set, and the character conversion method further comprises obtaining, according to the index, the destination character code of the target character encoded in the destination character set.

15. The character conversion method of claim 13, wherein the mapping table is searched by binary search.

16. The character conversion method of claim 13, wherein the mapping table contains a plurality of records, each record corresponding to one character code range in the source character set and having a start value, an end value, and a displacement value, wherein the start value and the end value indicate the corresponding character code range, and the displacement value indicates the value in the destination character set that corresponds to the start value.

17. The character conversion method of claim 13, wherein the difference is calculated by subtracting the start value from the source character code.
TW95100307A 2005-01-24 2006-01-04 Character conversion methods and systems TWI306337B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US64640705P 2005-01-24 2005-01-24
US11/242,421 US7215264B2 (en) 2005-09-30 2005-09-30 Methods and systems for character conversion

Publications (2)

Publication Number Publication Date
TW200635238A TW200635238A (en) 2006-10-01
TWI306337B true TWI306337B (en) 2009-02-11

Family

ID=36935982

Family Applications (1)

Application Number Title Priority Date Filing Date
TW95100307A TWI306337B (en) 2005-01-24 2006-01-04 Character conversion methods and systems

Country Status (2)

Country Link
CN (1) CN1825301B (en)
TW (1) TWI306337B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7642937B2 (en) 2006-01-09 2010-01-05 Taiwan Semiconductor Manufacturing Co., Ltd. Character conversion methods and systems
CN101840483B (en) * 2009-03-17 2015-11-25 北大方正集团有限公司 A kind of method and system of protecting computer document content

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2359399B (en) * 2000-02-21 2004-06-02 Kenwood Corp Character display

Also Published As

Publication number Publication date
TW200635238A (en) 2006-10-01
CN1825301B (en) 2010-07-14
CN1825301A (en) 2006-08-30

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees