CN102760119A - Method for storing Unicode coded character string in embedded device - Google Patents


Info

Publication number
CN102760119A
Authority
CN
China
Prior art keywords
unicode
character
ascii
string
character string
Legal status
Pending
Application number
CN2012102402965A
Other languages
Chinese (zh)
Inventor
张翔
李元章
王文明
谭毓安
马忠梅
张全新
张雪兰
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
2012-07-11
Filing date
2012-07-11
Publication date
2012-10-31
Application filed by Beijing Institute of Technology BIT
Priority to CN2012102402965A
Publication of CN102760119A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a method for storing a Unicode-encoded character string in an embedded device. The method comprises the following steps: (1) represent each Unicode character of the string, in order, as hexadecimal numbers, then perform steps (2) and (3) on each Unicode character so represented; (2) convert the high 4 bits and the low 4 bits of each byte of the Unicode character into ASCII characters; (3) merge the converted results, so that one Unicode character becomes four ASCII characters, and store the result represented by these four ASCII characters. The method requires no particular operating system programming environment, saves space, and makes it easy to perform the inverse conversion from the ASCII string back to the original Unicode string.

Description

Method for storing Unicode-encoded strings in an embedded device
Technical field
The present invention relates to a method for storing Unicode-encoded strings in an embedded device, and belongs to the field of computer data storage.
Background art
With the development of computer storage technology, Unicode encoding is used more and more widely in embedded devices. Unicode is a character encoding scheme designed by an international organization to accommodate the scripts of all the world's languages. Unicode is usually implemented as UTF-16 (UTF stands for Unicode Transformation Format). UTF-16 encodes each character (within the Basic Multilingual Plane) as two bytes, which strikes a good balance between saving space and keeping encoding simple, and it is widely used. Embedded software development environments, by contrast, generally use multi-byte character strings, i.e. ANSI encodings. Different countries and regions have defined their own standards, such as GB2312, BIG5 and JIS. These extended encodings, which represent a Chinese (or other non-Latin) character with 2 bytes, are called ANSI encodings. On a Simplified Chinese system, ANSI means GB2312; on a Japanese operating system, ANSI means JIS. Different ANSI encodings are incompatible with one another, so in international information exchange, text in two languages cannot be stored in the same piece of ANSI-encoded text.
At present, embedded devices usually store Unicode-encoded strings directly. When such strings are used, one of two approaches converts between Unicode and ANSI-encoded strings. Approach 1: under Win32, the MultiByteToWideChar and WideCharToMultiByte functions are used; under Linux, the iconv function is used. Approach 2: a user-defined conversion table implements the conversion between character codes manually. Storing Unicode strings directly has the following drawbacks: (1) in embedded devices that lack a Windows, Linux or other high-level operating system programming environment, approach 1 cannot be used; (2) in embedded environments with tight code space requirements, approach 2 needs hundreds of kilobytes of memory for its conversion table, which is a huge overhead for embedded systems with strict space requirements.
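The incompatibility of national ANSI code pages described above can be illustrated on a desktop system. The sketch below uses Python's standard `gb2312` and `shift_jis` codecs as stand-ins for the Simplified Chinese and Japanese code pages; this is an assumption for illustration only, since the patent names no language or library. It shows that one character has a single UTF-16 code but code-page-specific bytes, and that those bytes decode to something else under another code page.

```python
# The same character: one UTF-16 code, but bytes that depend on the code page.
text = "一"  # U+4E00, the first character of Embodiment 1's filename

utf16 = text.encode("utf-16-le")  # two bytes for a BMP character, low byte first
gb = text.encode("gb2312")        # Simplified Chinese "ANSI" bytes

# Reading the GB2312 bytes with the Japanese code page does not give the
# original character back, which is why mixed-language ANSI text breaks.
misread = gb.decode("shift_jis")

print(utf16.hex(), gb.hex(), misread)
```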
Summary of the invention
The objective of the invention is to overcome the shortcomings of existing approaches to storing and using Unicode-encoded strings by proposing a method for storing Unicode-encoded strings in an embedded device.
The objective is achieved through the following technical solution.
A method for storing a Unicode-encoded string in an embedded device comprises the following steps:
Step 1: represent each Unicode character of the Unicode-encoded string, in order, as hexadecimal numbers. One Unicode character comprises 2 bytes. Then perform steps 2 and 3 on each Unicode character so represented.
Step 2: convert the high 4 bits and the low 4 bits of each byte of the Unicode character into the corresponding ASCII characters.
Step 3: merge the converted results, so that one Unicode character becomes 4 ASCII-encoded characters, and store the result represented by these 4 ASCII characters.
Through the above steps, a Unicode-encoded string is converted into ASCII form and stored, and the embedded system can then use strings in this form directly.
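Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `nibble_to_ascii` and `unicode_to_ascii` are hypothetical, and the little-endian byte order (UTF-16LE) is an assumption taken from the byte dumps in the embodiments. The nibble rule (+0x30 for 0 to 9, +0x37 for 10 to 15) is the one given in step 2 of Embodiment 1.

```python
def nibble_to_ascii(n: int) -> str:
    """Step 2 rule: 0..9 -> n + 0x30 ('0'..'9'), 10..15 -> n + 0x37 ('A'..'F')."""
    if not 0 <= n <= 0xF:
        raise ValueError(f"not a 4-bit value: {n}")
    return chr(n + 0x30) if n <= 9 else chr(n + 0x37)


def unicode_to_ascii(text: str) -> str:
    """Encode a string as 4 ASCII characters per 2-byte UTF-16 code unit."""
    # Step 1: obtain the 2-byte representation of each character.
    # Assumption: little-endian byte order, matching the embodiments.
    data = text.encode("utf-16-le")
    out = []
    for byte in data:
        # Step 2: high nibble first, then low nibble, one ASCII char each.
        out.append(nibble_to_ascii(byte >> 4))
        out.append(nibble_to_ascii(byte & 0x0F))
    # Step 3: merge; every 2-byte code unit has become 4 ASCII characters.
    return "".join(out)
```

For example, `unicode_to_ascii("一")` returns `"004E"`. Characters outside the Basic Multilingual Plane are encoded by UTF-16 as a surrogate pair and therefore yield 8 ASCII characters; the patent only discusses 2-byte characters.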
Beneficial effects
Compared with existing techniques, the method proposed by the invention has the following advantages:
1. The method does not require a specific operating system programming environment.
2. It saves space: the method needs no large conversion table, which saves program space.
3. The storage format makes the inverse conversion from the ASCII string back to the Unicode string very easy, recovering the original Unicode string.
Embodiments
The invention is further described below with reference to specific embodiments.
Embodiment 1:
Storing the Unicode-encoded long filename "一种嵌入式设备中Unicode编码字符串的存储方法.doc" ("A method for storing Unicode-encoded strings in an embedded device.doc") in a FAT32 file system proceeds as follows:
Step 1: represent each Unicode character of this string, in order, as hexadecimal numbers (low byte first): "00 4E CD 79 4C 5D 65 51 0F 5F BE 8B 07 59 2D 4E 55 00 6E 00 69 00 63 00 6F 00 64 00 65 00 16 7F 01 78 57 5B 26 7B 32 4E 84 76 58 5B A8 50 B9 65 D5 6C 2E 00 64 00 6F 00 63 00". Then perform steps 2 and 3 on each Unicode character.
Step 2: convert the high 4 bits and the low 4 bits of each byte of the Unicode character into the corresponding ASCII characters. Specifically, if the 4-bit value lies in the range 0 to 9, add 0x30; otherwise add 0x37. This maps the 4-bit values 0 to 9 to ASCII 0x30 to 0x39 ('0' to '9') and A to F to ASCII 0x41 to 0x46 ('A' to 'F').
Step 3: merge the converted results, so that one Unicode character becomes 4 ASCII-encoded characters, and store the result represented by these 4 ASCII characters.
Through the above steps, the Unicode-encoded string is converted into the ASCII form "004ECD794C5D65510F5FBE8B07592D4E55006E00690063006F0064006500167F0178575B267B324E8476585BA850B965D56C2E0064006F006300" and stored; the embedded system can then use the string in this form directly.
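Because step 2 maps nibbles to the upper-case hex digits '0' to '9' and 'A' to 'F', the stored form is simply the upper-case hexadecimal dump of the UTF-16LE bytes. The following Python check (an illustration on a hosted runtime, not on the embedded target) confirms this for Embodiment 1, including the 29-character and 116-character counts used in the recovery steps.

```python
# Embodiment 1: the Unicode filename and the ASCII string stored for it.
filename = "一种嵌入式设备中Unicode编码字符串的存储方法.doc"
stored = ("004ECD794C5D65510F5FBE8B07592D4E55006E00690063006F0064006500"
          "167F0178575B267B324E8476585BA850B965D56C2E0064006F006300")

# Steps 1 to 3 amount to the upper-case hex of the UTF-16LE bytes.
ascii_form = filename.encode("utf-16-le").hex().upper()

print(len(filename), len(ascii_form), ascii_form == stored)
```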
The steps for recovering the Unicode string are:
Step 1: allocate 4 bits of space for each character of the ASCII string "004ECD794C5D65510F5FBE8B07592D4E55006E00690063006F0064006500167F0178575B267B324E8476585BA850B965D56C2E0064006F006300". The string has 116 ASCII characters in total, so 58 bytes are needed, corresponding to 29 Unicode characters.
Step 2: convert each ASCII character as follows: if it lies between '0' and '9', subtract 0x30; otherwise subtract 0x37. This maps ASCII 0x30 to 0x39 and 0x41 to 0x46 back to the 4-bit values 0 to F. For example, the first four ASCII characters of the string, "004E", are 0x30, 0x30, 0x34, 0x45 in hexadecimal and convert to the bytes 0x00 0x4E, which in little-endian order is the Unicode code 0x4E00 of the Chinese character "一".
Step 3: merge the converted values and store them. After all characters are converted, the ASCII string "004ECD794C5D65510F5FBE8B07592D4E55006E00690063006F0064006500167F0178575B267B324E8476585BA850B965D56C2E0064006F006300" is stored as the original Unicode string "一种嵌入式设备中Unicode编码字符串的存储方法.doc".
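The recovery steps can be sketched as the inverse of the encoder. This is again a hypothetical Python illustration (`ascii_to_nibble` and `ascii_to_unicode` are made-up names); it assumes UTF-16LE byte order and rejects input that is not made of upper-case hex digits or whose length is not a multiple of 4.

```python
def ascii_to_nibble(c: str) -> int:
    """Inverse of step 2: '0'..'9' -> code - 0x30, 'A'..'F' -> code - 0x37."""
    code = ord(c)
    if 0x30 <= code <= 0x39:
        return code - 0x30
    if 0x41 <= code <= 0x46:
        return code - 0x37
    raise ValueError(f"not an upper-case hex digit: {c!r}")


def ascii_to_unicode(stored: str) -> str:
    """Rebuild the original string from 4 ASCII characters per code unit."""
    # Step 1: 4 ASCII characters -> 2 bytes, i.e. 4 bits per character.
    if len(stored) % 4:
        raise ValueError("length must be a multiple of 4")
    buf = bytearray(len(stored) // 2)
    # Step 2: a pair of ASCII characters (high nibble, low nibble) -> one byte.
    for i in range(len(buf)):
        high = ascii_to_nibble(stored[2 * i])
        low = ascii_to_nibble(stored[2 * i + 1])
        buf[i] = (high << 4) | low
    # Step 3: merge and interpret as UTF-16LE (byte order assumed, as above).
    return bytes(buf).decode("utf-16-le")
```

For example, `ascii_to_unicode("004E")` returns `"一"`. On an embedded target the same loop would typically be written in C over a byte buffer; either way, no conversion table is involved.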
Embodiment 2:
Storing the Unicode-encoded NTFS filename "Windows编程视频教程1" ("Windows programming video tutorial 1") proceeds as follows:
Step 1: represent each Unicode character of this string, in order, as hexadecimal numbers (low byte first): "57 00 69 00 6E 00 64 00 6F 00 77 00 73 00 16 7F 0B 7A C6 89 91 98 59 65 0B 7A 31 00". Then perform steps 2 and 3 on each Unicode character.
Step 2: convert the high 4 bits and the low 4 bits of each byte of the Unicode character into the corresponding ASCII characters. Specifically, if the 4-bit value lies in the range 0 to 9, add 0x30; otherwise add 0x37. This maps the 4-bit values 0 to 9 to ASCII 0x30 to 0x39 ('0' to '9') and A to F to ASCII 0x41 to 0x46 ('A' to 'F').
Step 3: merge the converted results, so that one Unicode character becomes 4 ASCII-encoded characters, and store the result represented by these 4 ASCII characters.
Through the above steps, the Unicode-encoded string is converted into the ASCII form "570069006E0064006F0077007300167F0B7AC689919859650B7A3100" and stored; the embedded system can then use the string in this form directly.
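On a hosted system the inverse for Embodiment 2 is a one-liner with the standard library, because `bytes.fromhex` accepts the upper-case digits that step 2 produces. The snippet below is a Python illustration only (the embedded target would use a hand-written loop as in the recovery steps); it decodes the stored string, assuming UTF-16LE, and checks that re-encoding reproduces it.

```python
stored = "570069006E0064006F0077007300167F0B7AC689919859650B7A3100"

# Inverse conversion: hex digits -> bytes -> UTF-16LE text.
name = bytes.fromhex(stored).decode("utf-16-le")

# Re-encoding with the forward rule reproduces the stored form exactly.
assert name.encode("utf-16-le").hex().upper() == stored
print(name)
```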
The invention is not limited to the above embodiments; any design that applies the design idea of the invention with simple modifications falls within the scope of protection of the invention.

Claims (1)

1. A method for storing a Unicode-encoded string in an embedded device, characterized in that it comprises the following steps:
Step 1: represent each Unicode character of the Unicode-encoded string, in order, as hexadecimal numbers, one Unicode character comprising 2 bytes; then perform steps 2 and 3 on each Unicode character so represented;
Step 2: convert the high 4 bits and the low 4 bits of each byte of the Unicode character into the corresponding ASCII characters;
Step 3: merge the converted results, so that one Unicode character becomes 4 ASCII-encoded characters, and store the result represented by these 4 ASCII characters;
whereby a Unicode-encoded string is converted into ASCII form and stored, and the embedded system can use strings in this form directly.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102402965A CN102760119A (en) 2012-07-11 2012-07-11 Method for storing Unicode coded character string in embedded device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012102402965A CN102760119A (en) 2012-07-11 2012-07-11 Method for storing Unicode coded character string in embedded device

Publications (1)

Publication Number Publication Date
CN102760119A true CN102760119A (en) 2012-10-31

Family

ID=47054578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012102402965A Pending CN102760119A (en) 2012-07-11 2012-07-11 Method for storing Unicode coded character string in embedded device

Country Status (1)

Country Link
CN (1) CN102760119A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897257A (en) * 2017-02-23 2017-06-27 郑州云海信息技术有限公司 Method and device for converting between ASCII codes and character strings on the Linux platform
CN109446488A (en) * 2018-08-21 2019-03-08 深圳市华力特电气有限公司 Data processing method and device
CN112017049B (en) * 2020-10-15 2021-05-18 南京艾科朗克信息科技有限公司 Security quotation forwarding system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994027374A1 (en) * 1993-05-13 1994-11-24 Apple Computer, Inc. Method and apparatus for efficient compression of data having redundant characteristics
US20050246507A1 (en) * 2004-04-29 2005-11-03 International Business Machines Corporation Storage pre-alignment and EBCDIC, ASCII and unicode basic latin conversions for packed decimal data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
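GYZY: "编写Unicode有效的ShellCode" [Writing Unicode-valid shellcode], 《黑客防线》, no. 5, 31 December 2007 (2007-12-31), pages 1 - 2 *
程小刚 et al.: "GB18030与Unicode编码转换算法" [Conversion algorithm between GB18030 and Unicode encodings], 《华侨大学学报(自然科学版)》, vol. 30, no. 1, 31 January 2009 (2009-01-31) *
马玉春 et al.: "数据的编码与处理技术" [Data encoding and processing techniques], 《电脑编程技巧与维护》, no. 8, 31 December 2011 (2011-12-31) *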

Similar Documents

Publication Publication Date Title
US9043346B2 (en) Method of providing data included in building information modeling data file, recording medium therefor, system using the method, and method of providing data using building information modeling server
CN104753540B (en) Data compression method, data decompression method and apparatus
CN102141916B (en) Embedded equipment and method for displaying language word on OSD interface
CN101645061A (en) Information hiding method taking text information as carrier
CN100550020C (en) Method and apparatus for solving the problem of Chinese software supporting multiple languages
CN102609417A (en) Engine device and method for data integration and exchange of building information mode based on IFC (industry foundation classes) standards
CN101271463A (en) Representation method and system of layout file logical structure information
CN101202976A (en) Apparatus and method of mobile communication terminal character conversion
CA2544899A1 (en) Application conversion of source data
CN100585561C (en) Method for clipping relocatable ELF files in embedded system
CN102760119A (en) Method for storing Unicode coded character string in embedded device
JP5551660B2 (en) Computer-implemented method for encoding text into matrix code symbols, computer-implemented method for decoding matrix code symbols, encoder for encoding text into matrix code symbols, and decoder for decoding matrix code symbols
CN103529773B (en) Automatic conversion method for behavior control scripting language
CN114490853A (en) Data processing method, device, equipment, storage medium and program product
CN102339216A (en) Chinese character display method based on VxWorks operating system
CN111428441B (en) Information system cross-platform application oriented Chinese character code conversion method and equipment
CN107436938B (en) The additional log analytic method of image before a kind of relational database
CN106354746A (en) Searching method, and searching device
CN101877005B (en) Document mode-based GML compression method
CN103218349A (en) Reading and conversion method for PLC (Programmable Logic Controller) instruction storage rule in PMW-format file
US20050289132A1 (en) Method and system for converting encoding character set
US8463759B2 (en) Method and system for compressing data
JP4821287B2 (en) Structured document encoding method, encoding apparatus, encoding program, decoding apparatus, and encoded structured document data structure
CN105740374A (en) Distributed memory based three-dimensional platform data fuzzy query method
US20120017202A1 (en) Translation device, translation method, and storage medium for program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121031