EP1076419A2 - Method and system for implementing code conversions in a computer system - Google Patents
Method and system for implementing code conversions in a computer system
- Publication number
- EP1076419A2 (application EP00306815A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- codeset
- text file
- computer system
- file
- elements
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/28—Programmable structures, i.e. where the code converter contains apparatus which is operator-changeable to modify the conversion process
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
Definitions
- the present invention relates generally to computer systems, and more specifically to user-defined codeset conversions, including operation-based code conversions.
- a character set is a collection of predefined characters based on the needs of a particular language or use environment.
- the character set can be composed of alphabetic, numeric, or other characters. Characters may be grouped in a set because they are needed to communicate in a given language or in a given specialized environment. Examples of character sets of the latter type include the symbols necessary to communicate mathematical or chemical formulas.
- the codeset defines a set of unambiguous rules that establish a one-to-one relationship between each character of the character set and that character's bit representation. This bit representation can be considered as a graphical image of the character. It is this image that is displayed on the computer screen and the printed page.
- the codeset's representation can be dependent on the number of bytes used to represent each character, as well as the computer system's communication protocol. For example, a computer system using a 7-bit communication protocol often represents the same set of characters differently than a computer system using an 8-bit protocol. Thus, the choice of which codeset to use frequently depends on the user's data processing requirements.
- the first conversion technique required the user to request a conversion by defining a series of simple one-to-one mappings. These mapping requests were then converted to a binary table. This technique was onerous because it required every mapping to be listed individually. And for multi-byte codesets, this technique was particularly problematic: The conversions often could not be expressed in terms of one-to-one mappings. And even if that was not the case, the resulting table sizes were too large for practical use.
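To make the table-size concern concrete, the arithmetic can be sketched in Python. This assumes a dense table with one slot per possible input code, which is an illustrative assumption and not a layout the text specifies; `dense_table_bytes` is a hypothetical helper.

```python
def dense_table_bytes(input_bytes: int, output_bytes: int) -> int:
    """Size in bytes of a dense table holding one output code per possible input code."""
    slots = 256 ** input_bytes      # every possible input value gets its own slot
    return slots * output_bytes     # each slot stores one output code


# Single-byte input, single-byte output.
print(dense_table_bytes(1, 1))
# Two-byte input, two-byte output.
print(dense_table_bytes(2, 2))
# Three-byte input, two-byte output.
print(dense_table_bytes(3, 2))
```

Under this assumption the table grows by a factor of 256 for every extra input byte, which is why listing each mapping individually scales poorly for multi-byte codesets.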
- The second conversion technique required the user to write algorithmic converters to a defined application program interface. These converters were typically written in the C programming language, or some other compiled language. As such, a compiler, linker, and debugger were needed to test and verify such converters. This complicated system often led to lengthy development cycles. This invention provides a valuable alternative.
- Embodiments of the present invention provide methods and systems that facilitate codeset conversions.
- a user-defined text file assigns conversion rules between differing codesets.
- This text file is composed of one or more conditional, or operation-based, conversion elements.
- a utility evaluates the rules represented by these elements and produces a table that memorializes the rules in a binary file format.
- This binary file is then used to transform character data of a first codeset to character data of a second codeset.
- the preferred embodiment converts data between differing codesets in an efficient manner.
- the preferred embodiment can convert data between differing multi-byte codesets and between multi- and single-byte codesets.
- the preferred embodiment does not require the writing of complex algorithmic converter functions and is not limited to primitive one-to-one mapping requests. Nor is the user required to compile the text file, as it is passed to the utility as a text file.
- Figure 1 depicts a computer system that provides for user-defined code conversions in accordance with an embodiment of the present invention.
- Figure 2 provides an overview of logical steps employed in an embodiment of the present invention.
- Figure 3 is a flowchart illustrating steps used in an embodiment of the present invention to store a user-defined code conversion in a database.
- Figure 4 is a flowchart detailing steps used in an embodiment of the present invention to convert codeset data in accordance with the user-defined code conversion.
- The Computer System
- A computer system that provides for user-defined code conversions in accordance with an embodiment of the present invention is illustrated in Figure 1.
- This system 100 includes an array of components: a central processing unit 102; a user interface 104, which typically consists of a keyboard, a mouse, and a monitor; a primary memory 106, such as a random access memory, for program execution; and a secondary memory 108, such as a disk, for storing programs and data that are not immediately needed for execution.
- Contained within memories 106, 108 are an operating system 110, which contains instructions and data for managing the computer's resources, and a user program 112, which interacts with the operating system to convert data between codesets.
- The operating system 110 is based on the UNIX standard. Those skilled in the art will realize, however, that the present invention can be employed in any operating system.
- the conversion process begins with the production of a user-defined code conversion definition text file 114, which is composed of a number of operation-based code definition elements 116. These elements 116 are conditional constructs that define the desired codeset conversion. By using these elements 116, codeset conversions can be defined using a text file that does not require individual mappings of each desired conversion. Nor is it necessary to write and debug complicated algorithmic converters. And since the text file 114 is passed to the operating system 110 as a flat text file, it does not have to be compiled by the user. All of these facets result in faster development cycles.
- the user program 112 has two functional components: a conversion definer 118 and a conversion requester 120.
- the conversion definer 118 forwards the text file 114 to a utility of the operating system 110, the geniconvtbl() utility 122.
- this utility 122 is specifically designed to facilitate code conversions. It does this by accepting the text file 114 and converting it to a binary data file 124.
- the conversion definer 118 then stores that binary data file 124 in a code conversion table database 126 contained within the operating system 110.
- the user program's conversion requester 120 then interfaces with three operating system functions, which can be collectively referred to as the iconv subsystem, or individually referred to as iconv_open 128, iconv() 130, and iconv_close 132.
- The conversion definition text file 114 is named by listing the "convert from" and "convert to" codesets separated by a percent sign.
- For example, the name for a text file defining the conversion rules for converting codeset data from US-ASCII to ISO8859-1 would be US-ASCII%ISO8859-1.
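The naming rule can be illustrated with a short Python sketch. The helper names `conversion_file_name` and `split_conversion_file_name` are hypothetical and are not part of the described system; rejecting codeset names that contain a percent sign is likewise an assumption.

```python
from typing import Tuple


def conversion_file_name(from_codeset: str, to_codeset: str) -> str:
    """Build a FROM%TO file name from the two codeset names."""
    if "%" in from_codeset or "%" in to_codeset:
        # Assumption: a '%' inside a codeset name would make the result ambiguous.
        raise ValueError("codeset names must not contain '%'")
    return f"{from_codeset}%{to_codeset}"


def split_conversion_file_name(name: str) -> Tuple[str, str]:
    """Recover the (from, to) codeset names from a FROM%TO file name."""
    from_codeset, separator, to_codeset = name.partition("%")
    if not separator or not from_codeset or not to_codeset:
        raise ValueError(f"not a FROM%TO name: {name!r}")
    return from_codeset, to_codeset


print(conversion_file_name("US-ASCII", "ISO8859-1"))
```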
- the file 114 is composed of a number of the operation-based code conversion definition elements 116. As pictured, these elements can include directions, conditions, operations, and mappings.
- direction elements represent the pinnacle of the code conversion definition element hierarchy. This is so because they are composed of one or more condition-action pairs. Each condition-action pair contains a condition element and an action element. If the predefined condition is met, the corresponding action will be executed.
- the condition can be (1) a predefined condition element, (2) a name to a predefined condition element, or (3) a condition literal value, true, which will always result in the corresponding action being executed.
- the action component of the element can be another direction element, an operation element, or a map element. Operation and map elements will be explained below. Condition elements will be explained now.
- Condition conversion elements specify one or more condition expression elements in a preferred embodiment of the present invention. These condition expression elements can take three forms in this embodiment: (1) BETWEEN condition expression elements; (2) ESCAPE SEQUENCE condition expression elements, and (3) mathematical and logical condition expression elements, which are generically referred to as EXPRESSION elements.
- the BETWEEN condition expression element is used to define conversion rules for codeset data contained within one or more comma-separated ranges. For example, the following defines a conversion rule using a direction that is composed of a condition-action pair that uses a BETWEEN condition expression element to apply a predefined operation to all codeset data in the ranges of 0x64 and 0x7f:
- The ESCAPE SEQUENCE condition expression element can be used to define one or more comma-separated escape sequence designators. For example, the following equates an escape sequence to ESC $ ) C and the Shift-Out control character code, 0x0e: escapeseq 0x1b242943, 0x0e;
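As an informal model of how a direction pairs conditions with actions, the following Python sketch may help. It is not the definition-file syntax. All names (`between`, `escapeseq`, `always_true`, `run_direction`) are hypothetical, and the choice that the first matching pair wins is an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

Condition = Callable[[bytes], bool]
Action = Callable[[bytes], bytes]


def between(*ranges: Tuple[int, int]) -> Condition:
    """Condition: the first input byte lies in any inclusive (low, high) range."""
    def check(data: bytes) -> bool:
        return bool(data) and any(low <= data[0] <= high for low, high in ranges)
    return check


def escapeseq(*sequences: bytes) -> Condition:
    """Condition: the input starts with any of the given escape sequences."""
    def check(data: bytes) -> bool:
        return any(data.startswith(sequence) for sequence in sequences)
    return check


def always_true(data: bytes) -> bool:
    """Condition literal 'true': the paired action always runs."""
    return True


def run_direction(pairs: Sequence[Tuple[Condition, Action]], data: bytes) -> Optional[bytes]:
    """Run the action of the first pair whose condition holds (assumed ordering)."""
    for condition, action in pairs:
        if condition(data):
            return action(data)
    return None


# Illustrative pairs: uppercase the first byte if it is in 0x64..0x7f, else copy it.
pairs = [
    (between((0x64, 0x7F)), lambda data: data[:1].upper()),
    (always_true, lambda data: data[:1]),
]

# Escape sequence designators from the example: ESC $ ) C and Shift-Out.
is_escape = escapeseq(bytes.fromhex("1b242943"), b"\x0e")
```

Here `run_direction(pairs, b"dog")` applies the first action because `d` is 0x64, while `run_direction(pairs, b"cat")` falls through to the `always_true` pair.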
- Operation elements can be composed of the following operation expression elements: (1) IF-ELSE operation expressions; (2) OUTPUT operation expressions; and (3) CONTROL operation expressions.
- An operation can be composed of any number or combination of operation expression elements.
- The IF-ELSE operation expression element defines a conversion rule that is dependent on the boolean result of the IF statement. If the result is true, the task that follows the IF statement is executed. If false, the task that follows the ELSE statement is executed.
- IF-ELSE statements can be nested to create more complex conversion rules. The following is representative syntax that generates an error message if the remaining output buffer is less than a predefined minimum. Else, the syntax creates a rule that generates an output codeset character representation by performing a logical AND on the input codeset character and the hexadecimal value 0x7f:
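The logic of this rule can be approximated in Python. This is a sketch of the behavior only, not the definition-file syntax; `MIN_OUTPUT_BYTES`, `OutputBufferTooSmall`, and `convert_byte` are hypothetical names, and the minimum of one byte is an assumed value.

```python
MIN_OUTPUT_BYTES = 1  # hypothetical predefined minimum for the remaining output buffer


class OutputBufferTooSmall(Exception):
    """Raised when the remaining output buffer is below the predefined minimum."""


def convert_byte(input_byte: int, output_remaining: int) -> int:
    """IF the output buffer is too small, report an error; ELSE mask the input with 0x7f."""
    if output_remaining < MIN_OUTPUT_BYTES:
        raise OutputBufferTooSmall(
            f"need {MIN_OUTPUT_BYTES} byte(s), have {output_remaining}"
        )
    else:
        # Logical AND with 0x7f clears the high bit of the input character.
        return input_byte & 0x7F
```

For instance, `convert_byte(0xE9, 4)` clears the high bit of 0xE9 and yields 0x69.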
- The CONTROL operation expression can be used to (1) return error messages, (2) discard bytes from the input buffer and move the input buffer pointer accordingly, (3) stop the execution of the current operation, (4) execute an initialization operation and set all variables to zero, (5) execute a reset operation and set all variables to zero, (6) execute a predefined named operation, e.g., "operation ISO8859-1_to_ISO8859-2", (7) execute a predefined direction, or (8) execute a predefined mapping.
- mappings can specify a direct code conversion mapping by using one or more map pairs.
- Five possible pairings are (1) HEXADECIMAL-HEXADECIMAL, (2) HEXADECIMAL RANGE-HEXADECIMAL RANGE, (3) 'default'-HEXADECIMAL, (4) 'default'-'no_change_copy', and (5) HEXADECIMAL-ERROR.
- mappings can be used to (1) convert a specified hexadecimal value to another hexadecimal value, (2) convert a specified range of hexadecimal values to another range of hexadecimal values, (3) convert an undefined input character to a defaulting hexadecimal value, (4) leave an undefined input character unchanged, and (5) return an error message when a particular input character is encountered.
- Each map element can also have comma-separated attribute elements.
- The mapping can be encoded as one of the following table types: dense, hash, binary search tree, index, or an automatically defined type. Illustrative syntax is:
- mappings can be efficiently defined.
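The five map pairings described above can be modelled in Python as follows. This is a behavioral sketch under stated assumptions, not the table encoding or the definition-file syntax. `CodeMap`, `NO_CHANGE_COPY`, and `UnmappableCharacter` are hypothetical names, and the lookup order (error entries, then single pairs, then ranges, then the default) is assumed.

```python
from typing import Dict, List, Optional, Set, Tuple


class UnmappableCharacter(Exception):
    """Raised for HEXADECIMAL-ERROR entries and for inputs with no rule at all."""


NO_CHANGE_COPY = object()  # sentinel standing in for the 'no_change_copy' default


class CodeMap:
    def __init__(self, default: Optional[object] = None) -> None:
        self.single: Dict[int, int] = {}              # HEXADECIMAL-HEXADECIMAL
        self.ranges: List[Tuple[int, int, int]] = []  # HEXADECIMAL RANGE pairs
        self.errors: Set[int] = set()                 # HEXADECIMAL-ERROR
        self.default = default                        # 'default' pairing, if any

    def add_pair(self, source: int, target: int) -> None:
        self.single[source] = target

    def add_range(self, source_low: int, source_high: int, target_low: int) -> None:
        # The target range is assumed to have the same length as the source range.
        self.ranges.append((source_low, source_high, target_low))

    def add_error(self, source: int) -> None:
        self.errors.add(source)

    def lookup(self, code: int) -> int:
        if code in self.errors:
            raise UnmappableCharacter(hex(code))
        if code in self.single:
            return self.single[code]
        for low, high, target_low in self.ranges:
            if low <= code <= high:
                return target_low + (code - low)
        if self.default is NO_CHANGE_COPY:
            return code                               # leave the input unchanged
        if isinstance(self.default, int):
            return self.default                       # fall back to the default value
        raise UnmappableCharacter(hex(code))
```

A range entry stores only its endpoints, so a block of consecutive codes costs one rule rather than one rule per character.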
- The use of the operation-based definition elements allows the preferred embodiment to provide for conversions not only between two single-byte codesets but also between multi-byte codesets, as well as bidirectional conversion between single-byte and multi-byte codesets. This is so because the operation elements of the code conversion definition file reduce the space needed to define the conversions.
- The text file 114 is also advantageous because of its ease of use: the file 114 is composed of concise declaration statements, not complicated instructions written in a programming language such as C, which requires compilation before use. As such, the text file, when applied to the other inventive concepts disclosed herein, is a valuable tool for the computer industry. Complete examples of user-defined code conversion text files 114 are provided in Appendix B, which has been incorporated herein by reference.
- An overview of logical steps employed in an embodiment of the present invention to produce a code conversion binary table file from a user-defined code conversion text file and to convert codeset data in accordance with that binary table file is shown in Figure 2.
- This process begins by defining the code conversion definition text file 114. Once this is done, it is passed to the geniconvtbl() utility 122, which interprets the text file 114 and converts it to a binary table file 124. This file transformation takes place to render the text file into a format that the iconv subsystem 202, represented by the functions iconv_open 128, iconv() 130, and iconv_close 132, can understand.
- Once the binary file 124 is produced, it is stored in the database 126.
- the user-defined codeset conversion is now available for use.
- Actual conversion begins when character data in an input file 204 is transferred to the operating system by the user program.
- the iconv subsystem 202 recognizes that the user is requesting a conversion and instantiates the shared object 134 to retrieve the appropriate binary file 124 from the database 126.
- the subsystem 202 then interfaces with the shared object 134 to translate the data according to the protocol set forth in the binary file 124.
- the translated data is placed in an output file 206.
- the processing steps used in a preferred embodiment of the present invention to store a table representing user-defined code conversion rules are illustrated in Figure 3.
- The embodiment begins with a preprocessing step: defining the codeset conversion rules via a code conversion definition file (step 302).
- The conversion definer 118 sends the text file 114 to the geniconvtbl() utility 122, which then parses the file for errors (steps 304, 306). If an error is found, the utility 122 returns an error message to the invoking user program 118. Otherwise, the utility 122 converts the text file 114 to a binary file 124 (step 306). Processing is then returned to the calling program 118, which stores the binary file 124 in the database 126 (step 308).
- The steps used in an embodiment of the present invention to convert codeset data are shown in Figure 4.
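The store flow of Figure 3 can be sketched in Python as a toy model. The definition format used here (one "SOURCE TARGET" pair of hexadecimal byte values per line) and the binary layout (consecutive byte pairs) are simplifying assumptions and are not the actual definition-file or table formats. `generate_table`, `define_conversion`, and `DefinitionError` are hypothetical names, and a plain dictionary stands in for the database.

```python
import struct
from typing import Dict


class DefinitionError(Exception):
    """Reported to the caller when the definition text fails to parse."""


def generate_table(definition_text: str) -> bytes:
    """Parse the toy definition text and emit a binary table, or raise on error."""
    records = []
    for line_number, line in enumerate(definition_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no rule
        parts = line.split()
        try:
            if len(parts) != 2:
                raise ValueError("expected two hexadecimal fields")
            source, target = (int(part, 16) for part in parts)
            records.append(struct.pack(">BB", source, target))
        except (ValueError, struct.error) as exc:
            raise DefinitionError(f"line {line_number}: {exc}") from exc
    return b"".join(records)


def define_conversion(database: Dict[str, bytes], name: str, definition_text: str) -> None:
    """Convert the text to binary first; store it only if no error was found."""
    binary_table = generate_table(definition_text)
    database[name] = binary_table
```

Because the table is generated before anything is stored, a definition that fails to parse leaves the database untouched.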
- the processing begins when conversion requester 120 invokes iconv_open 128 to obtain a conversion descriptor for the desired conversion (step 402).
- this conversion descriptor contains pointers and data that the calling program 120 uses to later invoke iconv() 130.
- the conversion requester 120 invokes iconv_open 128 by passing to it the "from" and "to" codesets of the desired conversion.
- the function 128 searches the database for a corresponding binary file (step 404). If no such file exists, an error message is returned to the invoking program 120. Else, the function 128 instantiates the shared object and returns control to the invoking program 120 (step 406).
- the conversion requester 120 next calls iconv() 130, which interfaces with the shared object to perform the conversion (steps 408, 410).
- the shared object performs the conversion by accessing the binary file stored in the database 126.
- control is again returned to the invoking program 120.
- the process is completed when the requester 120 calls iconv_close 132 to release the shared object and the conversion descriptor (steps 412, 414). In this manner, the preferred embodiment is able to efficiently convert data between differing codesets.
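The open, convert, and close sequence of Figure 4 can be modelled in Python as follows. This is an in-memory sketch, not the iconv interface itself. `open_conversion`, `convert`, `close_conversion`, `ConversionDescriptor`, and `ConversionNotFound` are hypothetical names, the database is a plain dictionary keyed by FROM%TO, and the binary table is assumed to be a flat sequence of (source, target) byte pairs.

```python
from typing import Dict


class ConversionNotFound(Exception):
    """Raised when no binary table exists for the requested codeset pair."""


class ConversionDescriptor:
    """Holds the state that the later convert call needs."""

    def __init__(self, table: bytes) -> None:
        # Decode the assumed table layout: consecutive (source, target) byte pairs.
        self.mapping = {table[i]: table[i + 1] for i in range(0, len(table) - 1, 2)}
        self.is_open = True


def open_conversion(database: Dict[str, bytes], from_codeset: str, to_codeset: str) -> ConversionDescriptor:
    """Look up the table for the codeset pair and return a descriptor (steps 402-406)."""
    name = f"{from_codeset}%{to_codeset}"
    if name not in database:
        raise ConversionNotFound(name)
    return ConversionDescriptor(database[name])


def convert(descriptor: ConversionDescriptor, data: bytes) -> bytes:
    """Translate the data through the table; unmapped bytes are copied (an assumption)."""
    if not descriptor.is_open:
        raise ValueError("descriptor is closed")
    return bytes(descriptor.mapping.get(byte, byte) for byte in data)


def close_conversion(descriptor: ConversionDescriptor) -> None:
    """Release the descriptor's resources (steps 412, 414)."""
    descriptor.mapping = {}
    descriptor.is_open = False
```

A caller would open a descriptor once, call `convert` as many times as needed, and then close it, mirroring the three-function lifecycle described above.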
- this disclosure has outlined several operation-based definition elements that can be incorporated into the code conversion definition text file. Using the inventive concepts disclosed herein, those skilled in the art may be able to create further such elements that, although not explicitly disclosed, are implicitly disclosed by the concepts of the invention.
- Another possible modification is to store the text file, as opposed to the binary file, for use in subsequent conversions.
- the utility would recall the text file from a database when a conversion is requested. The utility would then convert the text file to a binary file so that conversion could be effectuated.
- the present invention is not limited to any particular CPU 102 or processing technology. Nor is the invention limited to any particular operating system 110. Rather, the invention could be utilized in any operating environment, such as WINDOWS 95, WINDOWS 98, UNIX, MacOS, or any JAVA runtime environment, which refers to the operating environment typified by a JAVA virtual machine and associated JAVA class libraries. JAVA is a registered trademark of SUN MICROSYSTEMS, Inc.
- The inventive concepts were described in part as being contained within random access memory and a hard disk. Those skilled in the art will recognize that these concepts can also be stored on and invoked from any media that can store data or have data read from it. Examples of such media include floppy disks, magnetic tapes, phase discs, carrier waves sent across a network, and various forms of ROM, such as DVDs and CDs. Thus, the present invention anticipates the use of all computer readable media.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US371616 | 1999-08-10 | ||
US09/371,616 US6708310B1 (en) | 1999-08-10 | 1999-08-10 | Method and system for implementing user-defined codeset conversions in a computer system |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1076419A2 (fr) | 2001-02-14 |
EP1076419A3 (fr) | 2004-04-28 |
Family
ID=23464685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP00306815A (Withdrawn; EP1076419A3 (fr)) | Method and system for implementing code conversions in a computer system | 1999-08-10 | 2000-08-09 |
Country Status (2)
Country | Link |
---|---|
US (1) | US6708310B1 (fr) |
EP (1) | EP1076419A3 (fr) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100591070C (zh) * | 2004-08-12 | 2010-02-17 | 大唐移动通信设备有限公司 | Protocol message parsing method and protocol message parsing system |
US7856653B2 (en) * | 2006-03-29 | 2010-12-21 | International Business Machines Corporation | Method and apparatus to protect policy state information during the life-time of virtual machines |
US9454514B2 (en) * | 2009-09-02 | 2016-09-27 | Red Hat, Inc. | Local language numeral conversion in numeric computing |
US9021471B2 (en) | 2011-05-03 | 2015-04-28 | International Business Machines Corporation | Managing change-set delivery |
US20160057239A1 (en) * | 2014-08-20 | 2016-02-25 | International Business Machines Corporation | Managing codeset converter usage over a communications network |
US9917598B1 (en) * | 2016-11-22 | 2018-03-13 | International Business Machines Corporation | Implementing preemptive customized codeset converter selection on SAAS |
AU2020297181A1 (en) * | 2019-06-17 | 2022-01-27 | Umwelt (Australia) Pty Limited | A data extraction method |
CN115392160B (zh) * | 2022-06-10 | 2024-04-09 | 无锡芯光互连技术研究院有限公司 | Format conversion method for a circuit diagram description file |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
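JP2830884B2 (ja) * | 1992-02-06 | 1998-12-02 | 日本電気株式会社 | Input/output conversion method for multiple character code sets |
JPH07271777A (ja) * | 1994-03-31 | 1995-10-20 | Fujitsu Ltd | Character code management method in a distributed information processing system |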
US5936636A (en) * | 1996-05-16 | 1999-08-10 | Sun Microsystems, Inc. | Encoding schemes |
US5898874A (en) * | 1996-09-20 | 1999-04-27 | Sun Microsystems, Inc. | Dynamic codeset translation environment |
US5831560A (en) * | 1996-09-20 | 1998-11-03 | Sun Microsystems, Inc. | S-table approach to data translation |
- 1999-08-10: US application US09/371,616, patent US6708310B1, status: Expired - Fee Related
- 2000-08-09: EP application EP00306815A, patent EP1076419A3, status: Withdrawn
Non-Patent Citations (1)
Title |
---|
No Search * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005083584A1 (fr) | 2004-02-25 | 2005-09-09 | Computer Associates Think, Inc. | System and method for conversion between multiple symbol sets |
US7218252B2 (en) | 2004-02-25 | 2007-05-15 | Computer Associates Think, Inc. | System and method for character conversion between character sets |
Also Published As
Publication number | Publication date |
---|---|
EP1076419A3 (fr) | 2004-04-28 |
US6708310B1 (en) | 2004-03-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
 | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
 | AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
 | RIN1 | Information on inventor provided before grant (corrected) | Inventor name: THEMBILE, MTWA; Inventor name: ASHIZAWA, KAZUNORI; Inventor name: IENUP, SUNG |
 | PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
 | RIC1 | Information provided on ipc code assigned before grant | Ipc: 7H 03M 7/28 B; Ipc: 7G 06F 17/21 A |
 | AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
 | AX | Request for extension of the european patent | Extension state: AL LT LV MK RO SI |
 | AKX | Designation fees paid | |
 | REG | Reference to a national code | Ref country code: DE; Ref legal event code: 8566 |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
 | 18D | Application deemed to be withdrawn | Effective date: 20041029 |
 | RIN1 | Information on inventor provided before grant (corrected) | Inventor name: THEMBILE, MTWA; Inventor name: ASHIZAWA, KAZUNORI; Inventor name: IENUP, SUNG |