Background of the Invention
Character encoding and decoding are used in multilingual application systems. Such systems can be implemented in many kinds of devices, such as personal digital assistants (PDAs) and other mobile handheld communication devices.
Traditionally, converting the character code of a character from one character code set to another is done by building a one-to-one mapping table. Such a table generally contains the character code records of two character sets: the source character code set and the destination character code set.
For example, when the source character set is Arabic ISO (International Organization for Standardization) and the destination character set is Unicode-UCS (Universal Character Set), the traditional conversion builds a one-to-one mapping table as described above. In this example, Arabic ISO contains 209 characters, and Arabic ISO and Unicode-UCS codes are each stored in 16 bits, i.e. 2 bytes, so a one-way mapping table needs 209 × (2 + 2) = 836 bytes. A one-way mapping table supports conversion in one direction only, either from Arabic ISO to Unicode-UCS or from Unicode-UCS to Arabic ISO. Two-way conversion, from Arabic ISO to Unicode-UCS and from Unicode-UCS to Arabic ISO, requires doubling the table, so a two-way one-to-one mapping table occupies 836 × 2 = 1672 bytes.
Moreover, the efficiency of a conversion is roughly that of a binary search in the mapping table. For the traditional method, the complexity of a conversion is therefore about log2 N, with N = 209.
However, different languages may be encoded with different character sets. When a device supports applications in several languages, characters often have to be converted repeatedly between these character sets, and frequent conversion without an efficient method becomes a problem. This is especially true of handheld communication devices, where memory size and central processing unit (CPU) capability are quite limited, so the traditional conversion method cannot meet the needs of current and future technology.
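The size and search-cost figures for the traditional table can be checked with a short calculation. The sketch below is illustrative only; the function name `one_to_one_table_bytes` is hypothetical and does not come from the disclosure.

```python
import math

def one_to_one_table_bytes(num_chars: int, src_bytes: int = 2, dst_bytes: int = 2) -> int:
    """Size of a one-way one-to-one table: one (source, destination) pair per character."""
    return num_chars * (src_bytes + dst_bytes)

one_way = one_to_one_table_bytes(209)   # 209 * (2 + 2) bytes
two_way = 2 * one_way                   # one table per direction
search_cost = math.log2(209)            # binary search over 209 sorted records

print(one_way, two_way, round(search_cost, 1))  # prints: 836 1672 7.7
```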
Summary of the Invention
In view of this, an object of the present invention is to provide a systematic character conversion method that converts the character code of a character between different character sets.
To achieve this object, the invention provides a character conversion method for converting a character code from a source character set to a destination character set. First, a mapping table is provided. The mapping table expresses the relationship between the source character set and the destination character set; it is built by analyzing the two character sets, and its records are based on the non-contiguous ranges in the source character set. Each record of the mapping table corresponds to one range and contains the start value, end value and offset value of that range. The offset value is the value in the destination character set that corresponds to the start value.
Next, a source character code is received. The source character code is the result of encoding the target character to be converted with the source character set. The mapping table is then searched according to the source character code, for example by binary search, to obtain the start value and the offset value of the range that contains it.
Then a difference is calculated from the start value and the source character code, for example by subtracting the start value from the source character code. Finally, the offset value and the difference are added to obtain a destination value. The destination value serves as the index of the target character in the destination character set, and the destination character code, i.e. the target character encoded with the destination character set, can then be obtained from this index.
The invention further provides a character conversion system for converting a character code from a source character set to a destination character set. The system comprises a mapping table, a receiving module, a search module, a calculation module and a summation module, which together carry out the character conversion method described above.
Detailed Description of the Embodiments
Referring to Fig. 1, which is a flowchart of the disclosed method: first, the source character set and the destination character set are analyzed, chiefly to identify the non-contiguous ranges they contain (step S700).
A mapping table is built from the result of the analysis (step S702). The mapping table thus expresses the relationship between the source character set and the destination character set as found by the analysis. Once built, the mapping table contains a number of records (entries), each representing one non-contiguous range in the source character set.
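Steps S700 and S702 can be sketched as a single pass over the sorted source codes that groups consecutive codes into ranges. This is one possible reading, not the procedure stated in the disclosure: the function name `build_range_table` is hypothetical, and the sketch assumes that a character's destination index is its position in the ascending list of source codes.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, int, int]  # (start value, end value, offset value)

def build_range_table(source_codes: Iterable[int]) -> List[Record]:
    """Group source codes into contiguous ranges, sorted by start value.

    Assumption: the offset value of a range is the position of its start
    value in the ascending list of all source codes.
    """
    table: List[Record] = []
    for index, code in enumerate(sorted(set(source_codes))):
        if table and code == table[-1][1] + 1:
            start, _, offset = table[-1]
            table[-1] = (start, code, offset)   # extend the current range
        else:
            table.append((code, code, index))   # a gap: open a new range
    return table
```

Under that assumption, a set of codes such as 0x10 to 0x12 plus 0x20 to 0x21 yields two records, (0x10, 0x12, 0) and (0x20, 0x21, 3).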
Next, a source character code is received (step S704). The source character code is the result of encoding the target character to be converted with the source character set. The mapping table is then searched according to the source character code, for example by binary search, to obtain a start value and an offset value (step S706). The offset value is the value in the destination character set that corresponds to the start value.
Then a difference is calculated from the start value and the source character code (step S708), for example by subtracting the start value from the source character code. Finally, the offset value and the difference are added to obtain a destination value (step S710). The destination value is the index of the target character in the destination character set, and the destination character code, i.e. the target character encoded with the destination character set, can then be obtained from this index.
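Steps S704 to S710 can be sketched as a binary search over the range records followed by one subtraction and one addition. The names `find_record` and `destination_value` are hypothetical, and returning `None` for a code outside every range is an assumption; the disclosure does not say how unmapped codes are handled.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, int, int]  # (start value, end value, offset value)

def find_record(table: List[Record], source_code: int) -> Optional[Record]:
    """Binary search for the range containing source_code (step S706).

    The table must be sorted by start value with non-overlapping ranges.
    """
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end, _ = table[mid]
        if source_code < start:
            hi = mid - 1
        elif source_code > end:
            lo = mid + 1
        else:
            return table[mid]
    return None  # assumption: codes in a gap between ranges are unmapped

def destination_value(table: List[Record], source_code: int) -> Optional[int]:
    """Return the index of the target character in the destination set."""
    record = find_record(table, source_code)
    if record is None:
        return None
    start, _, offset = record
    difference = source_code - start   # step S708
    return offset + difference         # step S710
```

The search touches only range records, so its cost depends on the number of ranges rather than the number of characters.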
Fig. 2 is a functional block diagram of the disclosed system. As shown, the character conversion system converts a character code from a source character set to a destination character set and comprises a mapping table 900, a receiving module 902, a search module 904, a calculation module 906 and a summation module 908.
The mapping table 900 expresses the relationship between the source character set and the destination character set. It is built according to the non-contiguous ranges in the source character set and the destination character set, and it contains a number of records that represent the non-contiguous ranges in the source character set. Each record corresponds to one range and contains the start value, end value and offset value of that range, where the offset value is the value in the destination character set that corresponds to the start value. The mapping table can be built by analyzing the source character set and the destination character set.
The receiving module 902 receives the source character code, which is the result of encoding the target character with the source character set. The search module 904 searches the mapping table 900 according to the source character code to obtain the start value and the offset value, the offset value being the value in the destination character set that corresponds to the start value. The search module 904 may use binary search.
The calculation module 906 calculates a difference from the start value and the source character code, by subtracting the start value from the source character code. The summation module 908 adds the offset value and the difference to obtain the destination value. The destination value can serve as the index of the target character in the destination character set. The conversion system may also comprise an acquisition module 910, which uses this index to obtain the destination character code, i.e. the target character encoded with the destination character set.
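One way to picture how the modules of Fig. 2 fit together is a small class whose methods stand in for the numbered modules. This is a sketch under stated assumptions, not the disclosed implementation: the class name `CharacterConversionSystem`, the `coding_table` list (destination codes addressed by index) and the `None` result for unmapped codes are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Record = Tuple[int, int, int]  # (start value, end value, offset value)

@dataclass
class CharacterConversionSystem:
    mapping_table: List[Record]   # mapping table 900, sorted by start value
    coding_table: List[int]       # assumed: destination codes addressed by index
    _starts: List[int] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Precompute the start values once so each search is a pure bisection.
        self._starts = [start for start, _, _ in self.mapping_table]

    def search(self, source_code: int) -> Optional[Record]:
        """Search module 904: find the range containing source_code."""
        pos = bisect_right(self._starts, source_code) - 1
        if pos < 0 or source_code > self.mapping_table[pos][1]:
            return None
        return self.mapping_table[pos]

    @staticmethod
    def difference(source_code: int, start: int) -> int:
        """Calculation module 906."""
        return source_code - start

    @staticmethod
    def destination_value(offset: int, diff: int) -> int:
        """Summation module 908."""
        return offset + diff

    def acquire(self, index: int) -> int:
        """Acquisition module 910: look up the destination code by index."""
        return self.coding_table[index]

    def convert(self, source_code: int) -> Optional[int]:
        """Receiving module 902 entry point: run the full conversion."""
        record = self.search(source_code)
        if record is None:
            return None
        start, _, offset = record
        index = self.destination_value(offset, self.difference(source_code, start))
        return self.acquire(index)
```

With a made-up mapping table `[(0x10, 0x12, 0), (0x20, 0x21, 3)]` and a made-up coding table `[0xA0, 0xA1, 0xA2, 0xB0, 0xB1]`, `convert(0x21)` finds start value 0x20 and offset value 3, computes index 4, and returns 0xB1.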
Referring to Fig. 3, which is a flowchart of one embodiment of the disclosed method: in this embodiment, the source character set is UCS and the destination character set is Arabic ISO. The target character to be converted is assumed to be the Arabic letter qaf, and the method converts its character code from UCS to Arabic ISO. The source character code of the target character in the source character set UCS is "0x0642", and its destination character code in the destination character set Arabic ISO is "0xe2".
First, the source character set UCS and the destination character set Arabic ISO are analyzed, chiefly with respect to the non-contiguous ranges in the character sets, and a mapping table is built from the result. Table 1 is the mapping table built from this analysis. Table 1 has 8 records, which represent 8 character code ranges in the source character set UCS and at the same time give the correspondence between the source character set UCS and the destination character set Arabic ISO. Each record in Table 1 contains the start value, end value and offset value of its range. As the first record shows, the first character code range of UCS has start value 0x0000 and end value 0x00a0, i.e. it runs from 0x0000 to 0x00a0. This range corresponds to the first character code range of Arabic ISO, so its start value 0x0000 corresponds to the start value 0x0 of the first Arabic ISO range, and the offset value of the first record is therefore 0x0. Likewise, in the second record, the second character code range of UCS has start value 0x00a4 and end value 0x00a4, and the value corresponding to the start value 0x00a4 in Arabic ISO is 0xa1, which is the offset value. As Table 1 shows, the ranges are not contiguous with one another: for example, the end value of the first range is 0x00a0, while the start value of the second range is 0x00a4. In Table 1 these non-contiguous ranges are listed in ascending order of start value.
Table 1
Start Value | End Value | Offset |
0x0000 | 0x00a0 | 0x0 |
0x00a4 | 0x00a4 | 0xa1 |
0x00ad | 0x00ad | 0xa2 |
0x060c | 0x060c | 0xa3 |
0x061b | 0x061b | 0xa4 |
0x061f | 0x061f | 0xa5 |
0x0621 | 0x063a | 0xa6 |
0x0640 | 0x0652 | 0xc0 |
Returning to Fig. 3, the source character code "0x0642" of the target character is first received (step S800), and Table 1 is then searched according to the source character code "0x0642" to obtain a start value and an offset value (step S802). Because the source character code "0x0642" lies within the range "0x0640" to "0x0652", the search yields the start value "0x0640" and the offset value "0xc0". Next, the start value "0x0640" is subtracted from the source character code "0x0642" to obtain the difference "0x02" (step S804). The offset value "0xc0" and the difference "0x02" are then added to obtain the destination value "0xc2" (step S806). The destination value is the index of the target character in the destination character set, and the destination character code of the target character in Arabic ISO can then be obtained from this index (step S808).
Table 1 is a one-way mapping table, used only to convert characters from Unicode-UCS to Arabic ISO. For two-way conversion, i.e. from Unicode-UCS to Arabic ISO and from Arabic ISO to Unicode-UCS, a second mapping table such as Table 2 must also be built.
For Table 2, the source character set is Arabic ISO and the destination character set is Unicode-UCS. Table 2 has 7 records, which represent 7 non-contiguous character code ranges in the source character set Arabic ISO. As before, each record in Table 2 contains the start value, end value and offset value of its range. As the first record shows, the first character code range of Arabic ISO has start value 0x0 and end value 0xa0, and the value corresponding to the start value 0x0 in the destination character set UCS is 0x0, which is the offset value. Likewise, in the second record, the second character code range of Arabic ISO has start value 0xa4 and end value 0xa4, and the value corresponding to the start value 0xa4 in UCS is 0xa1, which is the offset value. As Table 2 shows, the ranges are not contiguous with one another: for example, the end value of the first range is 0xa0, while the start value of the second range is 0xa4. In Table 2 these non-contiguous ranges are listed in ascending order of start value.
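The walk through Fig. 3 can be traced in a few lines using the records of Table 1. The variable names are illustrative, and the final lookup of step S808 is left out because the Arabic ISO coding table itself is not reproduced in this description.

```python
from bisect import bisect_right

# Table 1 (UCS to Arabic ISO): (start value, end value, offset value)
TABLE_1 = [
    (0x0000, 0x00A0, 0x00),
    (0x00A4, 0x00A4, 0xA1),
    (0x00AD, 0x00AD, 0xA2),
    (0x060C, 0x060C, 0xA3),
    (0x061B, 0x061B, 0xA4),
    (0x061F, 0x061F, 0xA5),
    (0x0621, 0x063A, 0xA6),
    (0x0640, 0x0652, 0xC0),
]
STARTS = [start for start, _, _ in TABLE_1]   # sorted start values for bisection

source_code = 0x0642                          # step S800: receive the source code
pos = bisect_right(STARTS, source_code) - 1   # step S802: last range starting at or before it
start, end, offset = TABLE_1[pos]
assert start <= source_code <= end            # the code lies inside the range found
difference = source_code - start              # step S804
destination_value = offset + difference       # step S806

print(hex(start), hex(offset), hex(difference), hex(destination_value))
# prints: 0x640 0xc0 0x2 0xc2
```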
Table 2
Start Value | End Value | Offset |
0x0 | 0xa0 | 0x0 |
0xa4 | 0xa4 | 0xa1 |
0xac | 0xad | 0xa2 |
0xbb | 0xbb | 0xa4 |
0xbf | 0xbf | 0xa5 |
0xc1 | 0xda | 0xa6 |
0xe0 | 0xf2 | 0xc0 |
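Two properties of Table 2 can be checked mechanically: the ranges increase and are separated by gaps, and each offset value equals the previous offset plus the length of the previous range. The second property is an observation about the tables as printed here, not a rule stated in the disclosure, and both function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, int, int]  # (start value, end value, offset value)

# Table 2 (Arabic ISO to Unicode-UCS)
TABLE_2: List[Record] = [
    (0x00, 0xA0, 0x00),
    (0xA4, 0xA4, 0xA1),
    (0xAC, 0xAD, 0xA2),
    (0xBB, 0xBB, 0xA4),
    (0xBF, 0xBF, 0xA5),
    (0xC1, 0xDA, 0xA6),
    (0xE0, 0xF2, 0xC0),
]

def ranges_are_ascending_and_separated(table: List[Record]) -> bool:
    """True if every range is well-formed and a gap separates neighbours."""
    if any(start > end for start, end, _ in table):
        return False
    # A following range must start beyond prev_end + 1, otherwise the two
    # ranges overlap or are contiguous and should have been one record.
    return all(nxt[0] > prev[1] + 1 for prev, nxt in zip(table, table[1:]))

def offsets_are_cumulative(table: List[Record]) -> bool:
    """True if each offset continues where the previous range's indices end."""
    expected = table[0][2] if table else 0
    for start, end, offset in table:
        if offset != expected:
            return False
        expected = offset + (end - start + 1)
    return True

print(ranges_are_ascending_and_separated(TABLE_2), offsets_are_cumulative(TABLE_2))
# prints: True True
```

A table that passes the first check is suitable for binary search on start values.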
For two-way conversion, i.e. from Unicode-UCS to Arabic ISO and from Arabic ISO to Unicode-UCS, the overall size of the mapping tables is the sum of the sizes of Table 1 and Table 2, and it depends on the number of non-contiguous character code ranges in the character sets Arabic ISO and Unicode-UCS. In this example, Table 1 contains 8 records for 8 character code ranges. Each record contains a start value, an end value and an offset value of 2 bytes each, so each record takes 6 bytes and Table 1 occupies 8 × 6 = 48 bytes. Table 2 contains 7 records for 7 character code ranges, again at 6 bytes per record, so Table 2 occupies 7 × 6 = 42 bytes. For two-way conversion between Unicode-UCS and Arabic ISO, the mapping tables therefore occupy 8 × 6 + 7 × 6 = 48 + 42 = 90 bytes in total. If the encoding tables of the Arabic ISO and Unicode-UCS character sets are also counted, the memory required becomes 48 + 42 + (209 × 2 × 2) = 926 bytes. Compared with the 1672 bytes required by the traditional method, the invention substantially reduces the memory requirement.
Moreover, with the proposed method, the complexity of converting a character from Unicode-UCS to Arabic ISO is log2 8 = 3, and the complexity of converting a character from Arabic ISO to Unicode-UCS is log2 7, about 2.8, compared with log2 209, about 7.7, for the traditional method.
As the above shows, the proposed method and system substantially improve the efficiency of character conversion. The benefit is most evident when the method or system is applied in a handheld communication device, where it can increase the data-processing efficiency of the device.
The proposed method and system, or certain parts of them, may be realized as a computer program (computer instructions). The program may be stored on a storage medium such as a floppy disk, a CD-ROM, a hard drive, firmware or any other machine-readable storage medium. When the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. The disclosed method and system may also be transmitted as a computer program over a transmission medium such as electrical wire, cable, optical fiber or any other wired or wireless transmission medium. When the transmitted program is loaded into and executed by a machine such as a computer, the machine likewise becomes an apparatus for practicing the invention. Furthermore, the disclosed method and system may be applied as a computer program on a general-purpose processor. When the program is combined with the processor, it provides an apparatus for practicing the invention whose function is equivalent to that of logic circuits with a specific function.
Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and modifications without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.
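The memory and search-cost figures above can be reproduced with a short calculation. The constant and function names below are hypothetical; the numbers are the ones given in the text.

```python
import math

BYTES_PER_FIELD = 2      # start value, end value and offset value take 2 bytes each
FIELDS_PER_RECORD = 3

def range_table_bytes(num_records: int) -> int:
    """Size of a range-based mapping table with the given number of records."""
    return num_records * FIELDS_PER_RECORD * BYTES_PER_FIELD

table_1_bytes = range_table_bytes(8)                 # UCS to Arabic ISO
table_2_bytes = range_table_bytes(7)                 # Arabic ISO to UCS
coding_tables_bytes = 209 * 2 * 2                    # two coding tables, 2 bytes per code
proposed_total = table_1_bytes + table_2_bytes + coding_tables_bytes
traditional_total = 2 * 209 * (2 + 2)                # two-way one-to-one table

print(table_1_bytes, table_2_bytes, proposed_total, traditional_total)
# prints: 48 42 926 1672

# Binary-search cost in each direction, next to the traditional table.
print(math.log2(8), round(math.log2(7), 1), round(math.log2(209), 1))
# prints: 3.0 2.8 7.7
```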