CN107209672B - Information processing apparatus and information processing method

Info

Publication number: CN107209672B
Application number: CN201680007574.9A
Authority: CN
Inventors: 城代佳范; 坂井孝介
Original assignee: Hitachi Social Information Services Ltd
Current assignee: Hitachi Social Information Services Ltd
Priority date: 2015-01-28
Filing date: 2016-01-14
Publication date: 2020-08-14
Anticipated expiration: 2036-01-14
Also published as: HK1244910A1; CN107209672A; WO2016121509A1; JP2016139294A; JP6397343B2

Abstract

An information processing device (P) that, after the character encoding system of data to be processed has been changed, processes the data using the changed character encoding system, the information processing device (P) comprising: a DA setting unit (31) that sets data storage areas (DA), each capable of storing one unit of data and all having an equal area size; an FA setting unit (32) that sets, in association with each DA, a flag storage area (FA) storing a flag for identifying the type of data stored in the DA; an FA reading unit (33) that reads an FA in which a flag is stored; and a DA reading unit (34) that reads the DA set in correspondence with that FA.

Description

Information processing apparatus and information processing method

Technical Field

The present invention relates to a data processing technique involving a change in a character encoding system.

Background

In character encoding systems such as Shift-JIS (JIS: Japanese Industrial Standards), which are used in information systems that have been in operation for several decades, the display width (number of columns) of each character equals its data size (number of bytes). For example, Shift-JIS represents a half-width character, which occupies 1 column, with 1 byte, and a full-width character, which occupies 2 columns, with 2 bytes. Therefore, the information system can process data in units of characters and can read out the character data to be processed without error. A representative programming language used to build such information systems is COBOL (Common Business-Oriented Language). COBOL does not distinguish between character data and binary data, which is non-character data, and processes both in the same way.

Currently, with globalization, Unicode, for example, has been adopted as a standard character encoding system. Unlike Shift-JIS, Unicode does not classify characters by number of columns, and there is no predetermined relationship between the display width of each character and its data size. For an information system that was built with COBOL and uses a character encoding system such as Shift-JIS, it is desirable to change the character encoding system to Unicode or the like. In particular, it is often desired to migrate an information system that was built with COBOL and has long been operated with a character encoding system such as Shift-JIS, and to use it as a new information system that uses a character encoding system such as Unicode. Techniques related to migration are disclosed in, for example, Patent Document 1.

Documents of the prior art

Patent document

Patent Document 1: Japanese Patent No. 4405571

Disclosure of Invention

Problems to be solved by the invention

However, if the character encoding system is simply changed to Unicode or the like, in which the display width and the data size of each character have no predetermined relationship, the information system cannot know how many bytes need to be read to process 1 character. As a result, processing cannot be performed in units of characters in the changed character encoding system, and the character data to be processed is read erroneously. Conventionally, when a character encoding system is changed, a developer eliminates this problem by extracting the program code that processes character data and modifying it appropriately. Further, modifying the program code that processes binary data requires a different method from that used for character data. However, such manual modification of program code is error-prone, and it reduces both the efficiency of the migration work and the quality of the information system after the character encoding system is changed.

The present invention has been made in view of the above circumstances, and an object thereof is to help ensure that the intended processing is performed reliably regardless of the character encoding system to which the character encoding system of the data to be processed is changed.

Means for solving the problems

In order to achieve the above object, the present invention provides an information processing apparatus for processing data to be processed using a character encoding system after a change in a character encoding system of the data, the information processing apparatus comprising:

a data storage area setting unit that sets data storage areas, each capable of storing one unit of data and all having an equal area size;

a flag storage area setting unit that sets a flag storage area storing a flag for identifying a type of data stored in the data storage area, in correspondence with the data storage area;

a flag storage area reading unit that reads the flag storage area in which the flag is stored; and

a data storage area reading unit that reads the data storage area set in correspondence with the flag storage area.

Other means will be described later.

Effects of the invention

According to the present invention, regardless of the character encoding system to which the data to be processed is changed, the intended processing can be reliably performed.

Drawings

Fig. 1 is a diagram showing a functional configuration of an information processing apparatus according to the present embodiment.

Fig. 2 is a flowchart showing the processing of the information processing apparatus according to the present embodiment.

Fig. 3 is an explanatory diagram of Example 1.

Fig. 4 is an explanatory diagram of Example 2.

Fig. 5 is an explanatory diagram of Example 3.

Detailed Description

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. For convenience of explanation, the data storage area is sometimes referred to as "DA" and the flag storage area is sometimes referred to as "FA".

Configuration

The information processing apparatus P according to the present embodiment is a computer including hardware such as an input unit, an output unit, a control unit (corresponding to the processing unit 1 in Fig. 1), and a storage unit (corresponding to the storage unit 2 and the work area 4 in Fig. 1). For example, when the control unit is constituted by a CPU (Central Processing Unit), information processing by the computer including the control unit is realized by the CPU executing programs. The storage unit included in the computer stores various programs for realizing the functions of the computer in accordance with instructions from the CPU. Software and hardware thereby operate in cooperation. The programs may be provided by being stored on a recording medium or via a network.

As shown in Fig. 1, the information processing apparatus P includes a processing unit 1, a storage unit 2, an editing library 3, a work area 4, and an editing tool 5.

The processing unit 1 controls all processes executed by the information processing apparatus P. The processes handled by the processing unit 1 include, for example, the processing required to convert the character encoding system adopted by the information processing apparatus P.

The storage unit 2 is a part for storing various forms of information. The storage unit 2 stores a file f and a character code conversion table T.

The file f is a text file or a binary file. When the file f is a text file, the processing unit 1 may convert the character encoding system of the character data in the file f to another character encoding system specified by the input unit or the like, for example.

The character code conversion table T manages, for each character belonging to a predetermined character set, the character code in each character encoding system employed by the information processing apparatus P. The processing unit 1 can express character data in a text file according to the adopted character encoding system. When a change of the character encoding system is designated, the processing unit 1 can re-express character data expressed in the character encoding system before the change according to the character encoding system after the change.

The editing library 3 is a reusable program that reads a file f opened by the editing tool 5 from the storage unit 2 and performs necessary processing for the editing tool 5 to process data in the read file f. The editing library 3 includes a DA setting unit 31 (data storage area setting unit), an FA setting unit 32 (flag storage area setting unit), an FA reading unit 33 (flag storage area reading unit), and a DA reading unit 34 (data storage area reading unit). The "necessary processing" is realized by the DA setting unit 31, the FA setting unit 32, the FA reading unit 33, and the DA reading unit 34.

The DA setting unit 31 sets, in the work area 4, DAs each capable of storing one unit of data and all having an equal area size. Here, "one unit of data" means character data representing 1 character, or 1 byte of binary data. The data size of character data representing 1 character is 1 byte, or 2 bytes or more, depending on the character encoding system adopted by the information processing apparatus P. For example, when the character encoding system is UTF-8, 1 full-width character is represented by 3 bytes. In this case, the DA setting unit 31 sets, in the work area 4, DAs capable of storing 3 bytes of character data.

Being "capable of storing one unit of data" includes the following meaning: the area size (capacity) of 1 DA is set to a value equal to or larger than the maximum data size of the character data that can be expressed in the character encoding system employed by the information processing apparatus P. For example, in UTF-8, 1 character in the Basic Multilingual Plane is represented by 1 to 3 bytes (e.g., a half-width alphanumeric character is represented by 1 byte, some operator symbols are represented by 2 bytes, and a full-width character is represented by 3 bytes). In this case, the maximum data size of the character data to be expressed in UTF-8 is 3 bytes, and the area size of 1 DA is set to 3 bytes or more (for example, 4 bytes).

Further, by making the area sizes of all DAs equal, the load associated with setting the DAs can be reduced. That is, changing the area size of each DA to match the data size of each unit of data would be inefficient and would impose a large load, so such processing is not performed.
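The sizing rule and the equal-size layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `min_da_size` and `WorkArea`, the sample characters, and the zero-padding convention are hypothetical and do not come from the embodiment, which leaves the storage method open.

```python
def min_da_size(sample: str, encoding: str) -> int:
    """Largest encoded size (in bytes) among the sample characters."""
    return max(len(ch.encode(encoding)) for ch in sample)


class WorkArea:
    """Hypothetical work area made of equal-size data storage areas (DAs)."""

    def __init__(self, da_size: int, count: int) -> None:
        self.da_size = da_size
        self.buf = bytearray(da_size * count)

    def store(self, index: int, data: bytes) -> None:
        """Store one unit of data, zero-padded on the left (an assumed convention)."""
        if len(data) > self.da_size:
            raise ValueError("data does not fit in one DA")
        start = index * self.da_size  # equal sizes make the offset a multiplication
        self.buf[start:start + self.da_size] = data.rjust(self.da_size, b"\x00")

    def load(self, index: int) -> bytes:
        """Return the full contents of the DA at the given index."""
        start = index * self.da_size
        return bytes(self.buf[start:start + self.da_size])


# Half-width letter, a 2-byte symbol, and a full-width character in UTF-8.
da_size = min_da_size("A×あ", "utf-8")
area = WorkArea(da_size, count=3)
area.store(0, b"\xff")               # 1 byte of binary data
area.store(1, "×".encode("utf-8"))   # 2-byte character data
area.store(2, "あ".encode("utf-8"))  # 3-byte character data
```

Because every DA has the same size, the offset of the i-th DA is simply `i * da_size`, so no per-item size bookkeeping is needed when the areas are set.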

The FA setting unit 32 sets, in the work area 4 and in association with each DA, an FA that stores a flag for identifying the type of data stored in the DA. The flags include, for example, a flag for identifying binary data, a flag for identifying character data, and a flag for identifying specific data; for character data, a flag for identifying a half-width character or a flag for identifying part of a full-width character may also be used (details will be described later). The area size of 1 FA can be set arbitrarily, but is preferably smaller than the area size of 1 DA (for example, the area size of 1 FA is 4 bits).

The FA reading unit 33 reads the FA in which the flag is stored. The FA reading unit 33 acquires the flag stored in the read FA.

The DA reading unit 34 reads the DA set in correspondence with the FA. The DA reading unit 34 acquires the data stored in the read DA based on the flag stored in the FA corresponding to that DA.

The work area 4 is an area in which the information processing apparatus P reads and writes data. The work area 4 includes the DAs set by the DA setting unit 31 and the FAs set by the FA setting unit 32.

The editing tool 5 is software (an information system) having a function of editing the file f. The program code of the editing tool 5 was originally written in COBOL (before migration) and has been converted, using a known automatic conversion tool, into program code written in Java (registered trademark). The logic of the program code written in Java is the same as that of the program code written in COBOL. The editing tool 5 is linked to the editing library 3.

The editing library 3 and the editing tool 5 are installed in the information processing apparatus P so that the information processing apparatus P, which has taken over the file f through migration, can process the file f with the logic written in COBOL even after the conversion from COBOL to Java.

The programming language before conversion according to the present invention is not limited to COBOL, and may be another programming language. Likewise, the programming language after conversion is not limited to Java, and may be another programming language.

Processing

As shown in Fig. 2, the information processing apparatus P according to the present embodiment performs the following steps to edit the file f. The sequence is performed under the control of the processing unit 1 and starts at step S01.

In step S01, the editing tool 5 calls the editing library 3. The editing tool 5 notifies the editing library 3 of the file f to be edited. After step S01, the process proceeds to step S02.

In step S02, the editing library 3 reads a file f to be edited. When the file f is read, the data in the file f is sequentially read in units of bytes. After step S02, the process proceeds to step S03.

In step S03, the DA setting unit 31 sets a DA for the work area 4, and the FA setting unit 32 sets an FA for the work area 4. After step S03, the process proceeds to step S04.

In step S04, the DA setting unit 31 stores the data read from the file f in the set DAs in sequence, one unit of data per DA, and the FA setting unit 32 stores in each set FA, in sequence, the flag for the data stored in the corresponding DA. For example, after the editing tool 5 calls the editing library 3 (step S01), the FA setting unit 32 can check the data items defined in the execution section of the program code of the editing tool 5 and determine the values of the flags to be stored in the FAs. The DA setting unit 31 stores one unit of data in each set DA based on the flag stored in the FA. After step S04, the process proceeds to step S05.

In step S05, the FA reading unit 33 sequentially reads the set FAs and sequentially acquires the flags stored in the FAs. After step S05, the process proceeds to step S06.

In step S06, the DA reading unit 34 reads, in sequence, the DA set in correspondence with each read FA, and acquires the data stored in the read DA. The data acquired at this time is one unit of data, determined based on the value of the flag acquired from the read FA (details will be described later). The DA reading unit 34 transmits the acquired data to the editing tool 5 in sequence. After step S06, the process proceeds to step S07.

In step S07, the editing tool 5 performs the editing process on the data acquired from the DA reading unit 34 of the editing library 3. Because the type of the data has already been identified by referring to the flag in step S05, the editing process in this step takes less time than one that must analyze the data itself without referring to a flag, and the editing process can therefore be made faster. Specific examples of editing processes that use the flags will be described later. After step S07, the process proceeds to step S08.

In step S08, the editing tool 5 outputs the processing result of the editing processing of the data. Various output destinations, output modes, and the like are provided depending on the purpose of the editing process, and description thereof will be omitted. After step S08, the process of fig. 2 ends.
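The flow of steps S03 to S07 can be sketched as follows. This is only an illustrative Python sketch, not the implementation of the editing library 3: the names `Slot`, `set_areas`, `read_areas`, and `edit` are hypothetical, the flag values are borrowed from the examples given later in the description, and the caller supplies the flag for each item, whereas the embodiment derives it from the data items defined in the program code.

```python
from dataclasses import dataclass

FLAG_BINARY = -1      # flag value used for binary data in the examples
FLAG_HALF_WIDTH = 0   # flag value used for half-width characters


@dataclass
class Slot:
    flag: int    # contents of the flag storage area (FA)
    data: bytes  # contents of the data storage area (DA)


def set_areas(items, da_size):
    """S03-S04: set one DA/FA pair per unit of data and fill both."""
    return [Slot(flag, data.rjust(da_size, b"\x00")) for flag, data in items]


def read_areas(slots):
    """S05-S06: read each FA first, then the DA associated with it."""
    for slot in slots:
        flag = slot.flag                  # S05: acquire the flag
        if flag == FLAG_BINARY:
            yield flag, slot.data[-1:]    # S06: 1 byte of binary data
        else:
            yield flag, slot.data         # S06: the whole DA is one character


def edit(units):
    """S07: a stand-in editing process that only labels each unit."""
    return [("binary" if f == FLAG_BINARY else "char", d) for f, d in units]


items = [(FLAG_BINARY, b"\xff"), (FLAG_HALF_WIDTH, "1".encode("utf-32-be"))]
result = edit(read_areas(set_areas(items, da_size=4)))  # S08 would output this
```

The point of the sketch is the ordering: the flag is consulted before the data, so the editing step never has to inspect the bytes to decide how many of them form one unit.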

According to the present embodiment, the following effects are obtained when the character encoding system of the data to be processed in the file f is changed (for example, from Shift-JIS, in which the display width and the data size of each character are the same) and the data is processed using the changed character encoding system (for example, Unicode, in which the display width and the data size of each character do not have a predetermined relationship).

First, the editing library 3 stores one unit of data in each set DA in sequence, and stores a flag identifying the type of that data in each corresponding FA. That is, each DA delimits one unit of data to be processed by the editing tool 5, and the type of each unit of data is specified.

Therefore, the editing tool 5, whose program logic was constructed in COBOL, in which the data size can be determined from the display width of characters alone (although its program code is now written in Java), only needs to follow a uniform procedure: call the editing library 3, which refers to the entire data in each DA and to the flag in the FA corresponding to each DA. In this way, even if the editing tool 5 cannot determine the data size under the changed character encoding system, it can reliably process each unit of data targeted by the editing process.

Likewise, the editing tool 5, whose program logic was constructed in COBOL, which processes data in the same way without distinguishing data types (for example, both character data and binary data are handled by assigning them to X items), only needs to follow the uniform procedure of calling the editing library 3, which refers to the flag stored in each FA. In this way, even if the editing tool 5 itself does not distinguish data types, it can reliably output a processing result suited to the type of data.

The editing tool 5 may be linked to the editing library 3, or may call the editing library 3 dynamically as necessary. This avoids extensive modification of the program code of the editing tool 5 itself and thus prevents human errors in modifying the program code.

Therefore, regardless of the character encoding system to which the character encoding system of the data to be processed is changed, the intended processing can be reliably executed.

Specific applications of the present embodiment will be described in detail in Examples 1 to 3.

[Example 1]

Depending on the character encoding system in use, the information processing apparatus P of the present embodiment processes character data whose unit is 1 byte or more, or binary data whose unit is 1 byte. In either case, the editing tool 5 can perform the editing process in the same manner by using the editing library 3.

Consider the case where the editing tool 5 performs the editing process on the binary data "0xFF, 0x01" in the file f (a binary file). The changed character encoding system is UTF-32 (1 character is represented by 4 bytes), and the area size of each DA is uniformly set to 4 bytes. In this case, "0xFF, 0x01", which are two pieces of 1-byte binary data, are stored by the DA setting unit 31 in 2 DAs as, for example, "0x000000FF, 0x00000001" (various storage methods are possible, and the method is not limited to this one as long as one unit of data can be stored). The FA setting unit 32 of the editing library 3 checks the data items defined in the execution section of the program code of the editing tool 5 that called the library, and thereby learns that binary data needs to be transmitted to the editing tool 5. Therefore, when, for example, "-1" is used as the flag indicating that the data stored in a DA is binary data, the FA setting unit 32 stores "-1" in each of the FAs corresponding to the 2 DAs. Fig. 3 shows the case where the flag "-1" is assigned to the stored binary data "0xFF, 0x01".

Therefore, when the FA reading unit 33 acquires "-1" stored in the FA, the editing library 3 can determine that the data stored in the DA is binary data rather than character data, and the DA reading unit 34 can reliably acquire the data as the binary data "0xFF, 0x01". That is, the two pieces of 1-byte binary data "0xFF, 0x01" stored in the DAs are not erroneously acquired as two pieces of 4-byte character data "0x000000FF, 0x00000001" (characters having those values), and the data size and data type at the time of acquisition from the DA are the same as those at the time of storage into the DA. As a result, the editing library 3 can reliably transmit the binary data "0xFF, 0x01" to the editing tool 5.

Similarly, in UTF-32, the character data "0x00000031" (4 bytes) representing the half-width character "1" is stored in a DA (whose area size is set to 4 bytes) as, for example, "0x00000031". In this case, the FA setting unit 32 checks the data items defined in the execution section of the program code of the editing tool 5, so that the editing library 3 learns that character data needs to be transmitted to the editing tool 5. Therefore, a flag "0" indicating a half-width character is stored in the FA corresponding to the DA. When the FA reading unit 33 acquires "0" stored in the FA, the editing library 3 can determine that the data stored in the DA is character data rather than binary data, and the DA reading unit 34 can reliably acquire the data as the character data "0x00000031" (4 bytes). That is, the character data "0x00000031" (4 bytes) stored in the DA is not erroneously acquired as the 1-byte binary data "0x31", and the data size and data type at the time of acquisition from the DA are the same as those at the time of storage into the DA. As a result, the editing library 3 can reliably transmit the character data "0x00000031" to the editing tool 5.

If the FA were not set, a problem would occur in data exchange with another information processing apparatus that performs editing processing (for example, an apparatus that is unaware of the editing library 3 included in the information processing apparatus P according to the present embodiment). For example, when the character encoding system of the information processing apparatus P is changed from KEIS (a character encoding system in which the display width and the data size of each character are the same) to UTF-16 (a character encoding system in which the display width and the data size of each character do not have a predetermined relationship), the area size of each DA set by the DA setting unit 31 is 2 bytes (or more). For example, binary data "0x31" read from a binary file is stored as "0x0031" in 1 DA. If no FA is set and it cannot be determined whether the data stored in the DA is character data or binary data, the data acquired from the DA is "0x0031", whose data size differs from that of the originally stored binary data "0x31". As a result, the editing tool 5 performs the editing process on acquired data that differs from the original data, and the information processing apparatus P outputs to the other information processing apparatus a processing result based on data that differs from the original data.
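The ambiguity described above can be reproduced with a short Python sketch. The function names are hypothetical, and the big-endian layout of the 2-byte DA is an assumption chosen to match the "0x0031" notation; the flag values "-1" (binary) and "0" (half-width character) follow this example.

```python
import struct

DA_FORMAT = ">H"  # one 2-byte big-endian DA (assumed layout)


def store(value: int) -> bytes:
    """Store a value in one 2-byte DA; binary 0x31 becomes 0x0031."""
    return struct.pack(DA_FORMAT, value)


def read_without_flag(da: bytes) -> bytes:
    """Without an FA, the reader can only return the whole DA."""
    return da


def read_with_flag(da: bytes, flag: int) -> bytes:
    """With an FA, flag -1 (binary) selects only the original 1 byte."""
    return da[-1:] if flag == -1 else da


da = store(0x31)
```

The same 2-byte DA contents are returned as the 1-byte binary data 0x31 when the flag is "-1" and as the 2-byte UTF-16 character "1" when the flag is "0"; without a flag, the reader has no basis for choosing between the two.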

According to Example 1, the editing library 3 sets the FAs, determines the type of the data stored in each DA by referring to the flag stored in the corresponding FA, and transmits the data to the editing tool 5. Therefore, regardless of the character encoding system to which the character encoding system of the data to be processed is changed, the intended processing can be reliably performed.

[Example 2]

Shift-JIS, a representative example of a pre-change character encoding system in which the display width (number of columns) and the data size (number of bytes) of each character are the same, represents a full-width character, which occupies 2 columns, with 2 bytes. In this case, the position (column position) of each byte within the 1 character can be determined by referring to the byte itself. In contrast, when the character encoding system is simply changed to UTF-32 or the like, which does not classify characters by number of columns and in which the display width and the data size of each character have no predetermined relationship, the column position to which 1 byte read from the file f belongs cannot be determined even by referring to that byte.

When character data is read from a file f that is a text file, the information processing apparatus P of the present embodiment sets the DAs and FAs so that each piece of character data can be distinguished as a half-width or full-width character and so that the column position of the read character data is preserved, allowing the intended character data to be obtained accurately.

Consider the case where the editing tool 5 performs the editing process on the character string "AB12あ9zzz" (the characters other than "あ" are half-width characters, and "あ" is a full-width character). For example, when the changed character encoding system is UTF-32, each character is 4 bytes of data, and the area size of each DA is set to, for example, 4 bytes. In this case, the FA setting unit 32 of the editing library 3 checks the data items defined in the execution section of the program code of the editing tool 5 that called the library, and stores, in the FA for each half-width character, a flag "0" indicating that the character is a half-width character.

For the full-width character "あ", which occupies 2 columns in the character encoding system before the change (for example, Shift-JIS), 2 DAs are prepared and the character data of "あ" is stored redundantly in each of them, so that the number of columns of the character is preserved before and after the change of the character encoding system. Further, the FA setting unit 32 stores "1", indicating the 1st column (left half) of the full-width character, in the FA for the "あ" stored in the left DA, and stores "2", indicating the 2nd column (right half) of the full-width character, in the FA for the "あ" stored in the right DA. Fig. 4 shows the case where the character string data "AB12あ9zzz" is stored in a predetermined number of DAs arranged in a row, and the flags "0", "1", and "2" are assigned to the respective characters.

More generally, if the character encoding system before the change expresses a specific character with n columns (n = 1, 2, 3, …), n DAs may be prepared, the character data of that character may be stored redundantly in the n DAs, and n flags with different values, each indicating a column position of the character, may be stored in the n FAs.
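The assignment of flags in this example can be sketched in Python as follows. The function name `build_areas` is hypothetical, and the sketch makes one simplifying assumption: the number of columns of a character is taken to be its byte length in Shift-JIS (via Python's `shift_jis` codec), whereas the embodiment determines the flags from the data items defined in the program code of the editing tool 5.

```python
def build_areas(text: str):
    """Return parallel lists (DAs, FAs), with one DA per display column."""
    das, fas = [], []
    for ch in text:
        columns = len(ch.encode("shift_jis"))  # columns in the pre-change encoding
        data = ch.encode("utf-32-be")          # 4-byte data in the post-change encoding
        if columns == 1:
            das.append(data)
            fas.append(0)                      # half-width character
        else:
            for position in range(1, columns + 1):
                das.append(data)               # same character stored redundantly
                fas.append(position)           # 1 = left half, 2 = right half
    return das, fas


das, fas = build_areas("AB12あ9zzz")
```

For this string, the 5th and 6th DAs both hold the UTF-32 data of "あ", and only the flags "1" and "2" in the corresponding FAs distinguish the left half from the right half.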

Therefore, when the FA reading unit 33 acquires the flag from an FA, it can use the flag to determine whether the corresponding character data in the DA is a half-width or full-width character and, in the case of a full-width character, its column position. The editing library 3 can thus determine the type of character data stored in the DA at high speed by referring to the flag. As a result, the editing library 3 can accurately transmit the intended character data to the editing tool 5.

In Fig. 4, even if the editing library 3 acquires the full-width character "あ" stored in the 6th DA from the left, the data alone does not reveal its column position (whether this "あ" is the 1st or the 2nd column). To find the column position from the data alone, the DAs would have to be examined in order from the leftmost one. In this example, as soon as the flag "2" is taken from the FA corresponding to the 6th DA from the left, the data stored in that DA can be determined at high speed to be the 2nd column of a full-width character. Even when the same full-width character is repeated, as in the character string "あああああ", referring to the flag in the FA makes it possible to determine quickly whether the corresponding character data "あ" in the DA represents the 1st column (left half) or the 2nd column (right half).

In the editing process of the editing tool 5, it may be desired to output the character data "い", which follows the character data "あ" read from the text data. For example, in the Shift-JIS character encoding system, this is done by adding 2 to the value of the 2nd column (2nd byte) of "あ" (adding 2 to the value of the 1st column of "あ" would output a different character). Since the editing library 3 can extract the FA holding the flag "2" and thereby identify the 2nd column (not the 1st column) of "あ" at high speed, the editing tool 5 can output the character data "い" at high speed. This also facilitates the migration of program code that performs such editing processing.
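The Shift-JIS arithmetic mentioned here can be checked with a few lines of Python. The function name `add_to_second_byte` is hypothetical; the sketch relies only on Python's standard `shift_jis` codec and on the fact that "あ" is encoded as the two bytes 0x82 0xA0.

```python
def add_to_second_byte(ch: str, delta: int) -> str:
    """Add delta to the 2nd byte (2nd column) of a 2-byte Shift-JIS character."""
    first, second = ch.encode("shift_jis")  # unpack the two byte values
    return bytes([first, second + delta]).decode("shift_jis")


code_a = "あ".encode("shift_jis")        # 0x82 0xA0
next_char = add_to_second_byte("あ", 2)  # 0x82 0xA2
```

Logic of this kind only works if the program knows which of the two columns it is looking at, which is the information the flags "1" and "2" preserve after the change of the character encoding system.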

Further, when the editing library 3 acquires the flag "1" stored in the 5th FA from the left, it can determine at high speed that the data stored in the corresponding DA is the 1st column of a full-width character. Moreover, since the presence of the flag "1" implies with high probability that a flag "2" follows, it can also be determined at high speed that the data (2nd column) in the 6th DA from the left needs to be acquired.

[Example 3]

The DAs of the present embodiment have the following advantage: because all DAs have the same area size, character data and binary data of different data sizes can be stored in the same way, and data can easily be read and written in the changed character encoding system. However, Unicode uses more bytes per character than Shift-JIS or KEIS, so the average data size per character is larger. In addition, storing 1 byte of binary data in 1 DA whose area size is multiple bytes is wasteful. Therefore, even if the same data is processed before and after the change of the character encoding system, the number of bytes accessed in the DAs by the editing library 3 increases after the change, and a large amount of memory bandwidth is consumed. This may degrade the performance of the information processing apparatus P.

Therefore, for data having a specific value among the data stored in the DAs, the FA setting unit 32 stores, in the corresponding FA, a flag (for example, "5") indicating that data having the specific value is stored, so that the DA reading unit 34 does not need to access that data. The specific value may be a representative value used in the editing process; for example, binary data "0x00", which is often used as the initial value of a variable, may be adopted as the data having the specific value. The changed character encoding system is, for example, UTF-32, and the area size of each DA is, for example, 4 bytes. The area size of each FA and the data size of the flag stored in it are set to 4 bits here, but the data size of the flag is not limited to 4 bits.

As shown in Fig. 5, consider the case where 2 pieces of binary data "0x00, 0xAB" are stored in 2 DAs by the DA setting unit 31, and the flag "5" is stored in each of the 2 corresponding FAs by the FA setting unit 32. First, the FA reading unit 33 accesses the FA and acquires the 4-bit flag "5" (see step S05 in Fig. 2). At this stage, the FA reading unit 33 regards the binary data "0x00" as being stored in the DA corresponding to the FA that stores the flag "5", and the DA reading unit 34 omits the access to the 4-byte DA (step S06 in Fig. 2 is omitted). As a result, the amount of data accessed for 1 DA can be reduced from 4 bytes to 4 bits. The editing library 3 then sends the binary data "0x00" to the editing tool 5.

In Fig. 5, the flag "5" is also stored in the FA corresponding to the DA that stores the binary data "0xAB", so the FA reading unit 33 regards the binary data "0x00" as being stored there, and the access to the DA by the DA reading unit 34 is omitted. The editing library 3 again transmits the binary data "0x00" to the editing tool 5. That is, the FA reading unit 33 regards the binary data "0x00" as being stored regardless of the binary data actually stored. However, some editing processes performed by the editing tool 5 do not depend on the value of the binary data itself, and the method using the flag "5" is useful for such editing processes. One example is an editing process that sets the values of many data items to a specific value at once, such as the INITIALIZE statement of COBOL. The LOW-VALUE (0x00) and HIGH-VALUE (0xFF) of COBOL may also be used as the specific values.

Thus, according to embodiment 3, storing in the FA a flag that indicates data having a specific value allows the step of accessing the DA to be omitted. Consequently, even if the average data size increases after the character encoding system is changed, the increase in the number of bytes accessed in the DA can be suppressed. As a result, it is possible to avoid a situation in which the performance of the information processing apparatus P is degraded by heavy consumption of memory bandwidth.
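
The access saving of embodiment 3 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the editing library 3: the names `WorkArea`, `store`, `read_item`, and `da_bytes_read` are hypothetical, the flag value used for ordinary binary data (`FLAG_BINARY`) is assumed, the FA is modeled as one list entry per DA rather than as a packed 4-bit field, and 1-byte binary data is assumed to be padded to the 4-byte DA.

```python
DA_SIZE = 4                # area size of each DA in bytes (UTF-32 case)
FLAG_BINARY = 9            # hypothetical flag value for ordinary binary data
FLAG_SPECIFIC = 5          # flag "5": the DA is regarded as holding the specific value
SPECIFIC_VALUE = b"\x00"   # the specific value, here binary data 0x00


class WorkArea:
    """Hypothetical model of the DAs and their corresponding FAs."""

    def __init__(self):
        self.da = []             # one bytes object of DA_SIZE per DA
        self.fa = []             # one flag per DA
        self.da_bytes_read = 0   # bytes actually read from DAs

    def store(self, data: bytes, flag: int) -> None:
        # Assumption: data shorter than the DA is padded with 0x00.
        self.da.append(data.ljust(DA_SIZE, b"\x00"))
        self.fa.append(flag)

    def read_item(self, index: int) -> bytes:
        flag = self.fa[index]             # step S05: read the FA first
        if flag == FLAG_SPECIFIC:
            return SPECIFIC_VALUE         # step S06 omitted: DA is not accessed
        self.da_bytes_read += DA_SIZE     # step S06: read the DA
        return self.da[index][:1]         # the 1-byte binary payload


wa = WorkArea()
wa.store(b"\x00", FLAG_SPECIFIC)
wa.store(b"\xAB", FLAG_SPECIFIC)   # as in fig. 5: the actual content is ignored
wa.store(b"\xAB", FLAG_BINARY)     # ordinary binary data: the DA is read
results = [wa.read_item(i) for i in range(3)]
```

In this sketch only the third item causes a DA access, so the counter advances by a single DA size even though three items are read.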

Although fig. 5 is described using binary data, the present embodiment may also be applied to character data. For example, the specific value may be set to the half-width character "0" ("0x00000030" in UTF-32 and "0x30" in UTF-8). In this case, when the flag "5" is stored in the FA, the character data "0" is regarded as being stored in the corresponding DA, and access to the DA can be omitted.

(Others)

(1) In the present embodiment, when the DA setting unit 31 stores character data in the DA, the DA reading unit 34 acquires the character data from the DA, and when the DA setting unit 31 stores binary data in the DA, the DA reading unit 34 acquires the binary data from the DA. However, depending on the purpose of the editing process by the editing tool 5, the binary data may be acquired from the DA when the character data is stored in the DA, or the character data may be acquired from the DA when the binary data is stored in the DA. This is because, even in these cases, a correct flag for identifying the type of data stored in the DA is stored in the corresponding FA.

(2) In the present embodiment, when the character encoding system is changed from one such as Shift-JIS, in which a half-width character consisting of 1 bit is represented by 1 byte and a full-width character consisting of 2 bits is represented by 2 bytes, to one such as Unicode, the flags stored in the FA are "0" representing a half-width character, "1" representing the 1st bit (left half) of a full-width character, and "2" representing the 2nd bit (right half) of a full-width character. However, accessing the character data itself stored in each DA is enough to determine whether it is a half-width character or a full-width character; only the bit order of a full-width character remains unclear, because the full-width character is stored redundantly in a plurality of DAs. Therefore, for example, the character data of each DA can still be classified by assigning a flag to at least one of the 1st bit (left half) and the 2nd bit (right half) of a full-width character and assigning no flag to the other character data.
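
The flag assignment described in (2) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `is_full_width`, `layout`, and `layout_minimal` are hypothetical, UTF-32 (big-endian) is assumed as the changed character encoding system, and a character is treated as full-width when its Unicode East Asian Width class is "F" or "W", which is a choice made for this sketch only.

```python
import unicodedata

FLAG_HALF = 0       # half-width character
FLAG_FULL_1ST = 1   # 1st bit (left half) of a full-width character
FLAG_FULL_2ND = 2   # 2nd bit (right half) of a full-width character


def is_full_width(ch: str) -> bool:
    # Assumption: East Asian Width classes "F" and "W" count as full-width.
    return unicodedata.east_asian_width(ch) in ("F", "W")


def layout(text: str):
    """Return (das, fas): one 4-byte DA and one flag per display bit."""
    das, fas = [], []
    for ch in text:
        unit = ch.encode("utf-32-be")
        if is_full_width(ch):
            das += [unit, unit]                    # stored redundantly in 2 DAs
            fas += [FLAG_FULL_1ST, FLAG_FULL_2ND]
        else:
            das.append(unit)
            fas.append(FLAG_HALF)
    return das, fas


def layout_minimal(text: str):
    """Variant: only the 2nd bit of a full-width character carries a flag."""
    das, fas = layout(text)
    return das, [f if f == FLAG_FULL_2ND else None for f in fas]


das, fas = layout("Aあ")
_, sparse = layout_minimal("Aあ")
```

In the variant, a DA without a flag is classified from its own content: if it holds a full-width character it is the 1st bit, and otherwise it is a half-width character.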

(3) In the present embodiment, a single FA set in the work area 4 serves all three purposes of embodiments 1 to 3: determining the type of data (embodiment 1), determining the bit order of character data (embodiment 2), and identifying frequently occurring data having a specific value (embodiment 3). However, a separate FA may instead be prepared and set in the work area 4 for each of these uses. In addition, in embodiments 1 to 3 the FA is set in a storage area separate from the DA, although it corresponds to the DA. However, the FA may be set within the corresponding DA. Specifically, when the character encoding system is changed to UTF-32, the unused upper bits of the 4-byte character data may be used as the FA, and the flag may be stored in those upper bits. Alternatively, the area size of the DA may be set to 5 bytes, with 4 bytes of character data stored in each DA and the remaining 1 byte used as the FA to store a flag of 1 byte or less. Thus, various configurations can be adopted for the DA and the FA.
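
The two ways of setting the FA within the DA can be sketched as follows. This is an illustration only: the function names are hypothetical, the placement of a 4-bit flag in the top 4 bits of the 32-bit word is an assumption (Unicode code points need at most 21 bits, so the upper bits of a UTF-32 code unit are unused), and big-endian byte order is assumed.

```python
FLAG_SHIFT = 28          # assumption: the flag occupies the top 4 bits
CODE_MASK = 0x0FFFFFFF   # remaining 28 bits hold the code point


def pack_word(ch: str, flag: int) -> bytes:
    """Store a 4-bit flag in the unused upper bits of a UTF-32 code unit."""
    if not 0 <= flag < 16:
        raise ValueError("flag must fit in 4 bits")
    return ((flag << FLAG_SHIFT) | ord(ch)).to_bytes(4, "big")


def unpack_word(da: bytes):
    """Return (character, flag) from a 4-byte DA that embeds its FA."""
    word = int.from_bytes(da, "big")
    return chr(word & CODE_MASK), word >> FLAG_SHIFT


def pack_5byte(ch: str, flag: int) -> bytes:
    """Variant: 5-byte DA, 4 bytes of character data plus 1 byte used as FA."""
    return ch.encode("utf-32-be") + bytes([flag])


def unpack_5byte(da: bytes):
    """Return (character, flag) from a 5-byte DA."""
    return da[:4].decode("utf-32-be"), da[4]


word = pack_word("あ", 1)      # U+3042 with flag "1"
wide = pack_5byte("あ", 2)     # U+3042 with flag "2"
```

The first form keeps the DA at 4 bytes but requires masking on every read; the second form leaves the character data untouched at the cost of one extra byte per DA.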

(4) For the horizontally arranged DAs set in the work area 4 by the DA setting unit 31 (see fig. 1), the data may be stored in, or acquired from, the corresponding DA sequentially from the most significant byte, from the left side to the right side of fig. 1 (big-endian), or sequentially from the least significant byte, from the left side to the right side of fig. 1 (little-endian). Accordingly, when character string data is stored in or acquired from the horizontally arranged DAs, the storage order of the bytes representing a full-width character may be the same as the bit order, 1st bit → 2nd bit, as for the full-width character "あ" shown in fig. 4, or may be the reverse of the bit order, 2nd bit → 1st bit. The same applies to the FA.
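
The two byte orders mentioned in (4) can be illustrated as follows for a 4-byte DA holding the full-width character "あ" (U+3042) in UTF-32. The helper name `read_code_point` is hypothetical and is introduced only for this sketch.

```python
ch = "あ"                          # U+3042

big = ch.encode("utf-32-be")       # most significant byte first
little = ch.encode("utf-32-le")    # least significant byte first


def read_code_point(da: bytes, byteorder: str) -> int:
    """Recover the code point from a 4-byte DA stored in the given byte order."""
    return int.from_bytes(da, byteorder)


# Both layouts decode to the same character when read with the matching order.
same = read_code_point(big, "big") == read_code_point(little, "little")
```

Whichever order is chosen, the reading side must use the same order as the storing side; reading a big-endian DA as little-endian would yield a different value.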

Further, a technique in which various techniques described in this embodiment are combined as appropriate can also be realized.

The software described in this embodiment may be implemented as hardware, or the hardware may be implemented as software.

Hardware, software, flowcharts, and the like may be modified as appropriate without departing from the scope of the present invention.

Description of the symbols

P information processing apparatus;

1 processing unit;

2 storage unit;

3 editing library;

31 DA setting unit (data storage area setting unit);

32 FA setting unit (flag storage area setting unit);

33 FA reading unit (flag storage area reading unit);

34 DA reading unit (data storage area reading unit);

4 work area;

5 editing tool;

f file;

t character code conversion table.

Claims

1. An information processing apparatus for processing data to be processed using a character encoding system after a change in the character encoding system of the data,

the information processing device includes:

2. The information processing apparatus according to claim 1,

the data to be processed is character data or binary data in which 1 character is represented by a predetermined data size in the modified character encoding system,

the area size of the data storage area is a size equal to or larger than a maximum value of a data size of the character data that can be expressed in the character encoding system,

the flag stored in the flag storage area includes a flag indicating that the data stored in the data storage area is the character data and a flag indicating that the data stored in the data storage area is the binary data.

3. The information processing apparatus according to claim 1,

in the case where the data to be processed is character data in which 1 character is represented by a predetermined data size in the modified character encoding system,

the data storage area setting unit may redundantly store the character data representing 1 character in each of the plurality of data storage areas,

the flag stored in the flag storage region includes a flag for identifying a storage order of the character data redundantly stored in each of the plurality of data storage regions.

4. The information processing apparatus according to claim 1,

the flag stored in the flag storage area includes a flag indicating that data having a specific value among the data to be processed is regarded as being stored in the data storage area corresponding to the flag storage area.

5. An information processing method in an information processing apparatus for processing data to be processed using a character encoding system after a change in the character encoding system of the data,

the information processing method includes the steps of:

a step in which a data storage area setting unit sets a data storage area in which data of 1 data size can be stored and the respective area sizes are equal;

a step in which a flag storage area setting unit sets a flag storage area storing a flag for identifying a type of data stored in the data storage area, in association with the data storage area;

a flag storage area reading step of reading the flag storage area in which the flag is stored; and

a data storage area reading step of reading the data storage area set in correspondence with the flag storage area.