WO2005064494A1 - Character processing execution program and recording medium - Google Patents

Character processing execution program and recording medium

Info

Publication number
WO2005064494A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
information
attribute information
characters
attribute
Prior art date
Application number
PCT/JP2004/019445
Other languages
French (fr)
Japanese (ja)
Inventor
Haruhiko Yoshimeki
Takashi Igarashi
Original Assignee
Konica Minolta Photo Imaging, Inc.
Priority date
Filing date
Publication date
Application filed by Konica Minolta Photo Imaging, Inc.
Publication of WO2005064494A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing

Definitions

  • the present invention relates to a character processing method suitable for character search processing and sort processing by an information processing device such as a computer, a program for causing a computer to execute the method, and a computer-readable recording medium on which the program is recorded.
  • an information processing device such as a computer
  • a program for causing a computer to execute the method, and a computer-readable recording medium on which the program is recorded.
  • character processing in an information processing device is performed on the basis of a code number assigned to each glyph.
  • the code number is a JIS kanji code.
  • a character code (coded character set) standardized by a public standard, such as the JIS kanji code (the coded kanji set for information interchange, JIS X 0208:1997), is used.
  • variant characters: character codes standardized by public standards lack, in particular, characters needed to write personal names, place names and the like accurately. Many such characters exist: kanji that have the same reading and meaning as the so-called common-use kanji included in the JIS kanji code but whose glyphs are uncommon and stand in a variant relationship to them (hereinafter called variant characters). They are added as external characters to the user-defined area (free area) that the standardized character code leaves unused.
  • a sorting method: for example, if "reading" is specified as the highest-priority key for determining the sort order (arrangement order) and "member number" as the second key, the member names whose reading is "Otahashi" are sorted by member number and displayed as the arrangement L11 shown in FIG. 9(a). With this sorting method, the kanji notation of the member names follows no regular order, and the list display P12 is not unified.
  • when sorting is based on character codes, the member names with that reading are displayed as the arrangement L12, sorted by character code, as shown in FIG. 9(b). With this sorting method as well, the kanji notation of the member names is irregular, as in the list display P13.
  • the JIS Level 1 or JIS Level 2 kanji set, which are character codes standardized by a public standard
  • the JIS Level 1 kanji set is a coded kanji set in which character codes are assigned to commonly used kanji in order of their reading.
  • a coded kanji set in which character codes are assigned to kanji by radical and, within each radical, in order of stroke count.
  • the present invention solves this problem. Its object is to provide a character processing method capable of rearranging a plurality of characters in a highly meaningful arrangement order determined independently of the character code, a program for causing a computer to execute the method, and a computer-readable recording medium on which the program is recorded.
  • a first character processing method processes characters that are assigned arbitrary character codes and constitute a coded character set. Attribute information for identifying the attribute of each of a plurality of characters is stored in association with the character codes. The method waits for input of character codes selecting a plurality of characters from the coded character set for which the attribute information is stored, reads the attribute information for the characters selected by that input, determines the attribute of each character from the read attribute information, and determines the arrangement order of the characters according to the result.
  • a second character processing method processes characters that are assigned arbitrary character codes and constitute a coded character set. Attribute information for determining the attribute of each of a plurality of characters is stored in association with the character codes. The method waits for input of a character code selecting a character from the coded character set for which the attribute information is stored, reads the attribute information for the character selected by that input, determines the attribute of the character from the read attribute information, searches for characters that share the determined attribute, and outputs the character codes assigned to the characters found.
  • a program for causing a computer to execute the first character processing method according to the present invention processes characters that are assigned arbitrary character codes and constitute a coded character set.
  • the program includes a database in which attribute information identifying the attribute of each of a plurality of characters is stored in advance in association with the character codes. It waits for input of character codes selecting a plurality of characters from the coded character set whose attribute information is stored in this database, reads the attribute information for the selected characters from the database, determines the attributes of the characters from the read attribute information, and determines the arrangement order of the characters according to the result.
  • a program for causing a computer to execute the second character processing method according to the present invention processes characters that are assigned arbitrary character codes and constitute a coded character set.
  • the program includes a database in which attribute information for determining the attribute of each of a plurality of characters is stored in association with the character codes. It waits for input of a character code selecting a character from the coded character set whose attribute information is stored in this database, reads the attribute information for the selected character from the database, determines the attribute of the character from the attribute information read, searches the database for characters that share the determined attribute, and outputs the character codes assigned to the characters found.
  • with a computer-readable recording medium on which the program for causing a computer to execute the first character processing method according to the present invention is recorded, the computer can execute the first character processing method.
  • because the program is recorded on a computer-readable recording medium, a computer processing the characters of a coded character set with arbitrary character codes can rearrange a plurality of characters in an order determined by the attribute information, independently of the character code. This avoids unnecessary changes to the arrangement order when a new character is added to the character set.
  • FIG. 1 is a diagram showing a configuration example of a database provided in a program according to the present invention.
  • FIG. 2 is a diagram showing an example of a character arrangement before sorting.
  • FIG. 3 is a diagram showing an example of a character arrangement based on attribute information by the first program according to the present invention.
  • FIG. 4 is a diagram showing an example of sorting based on variant character information by the first program according to the present invention.
  • FIG. 5 is a flowchart showing a first example of character processing according to the present invention.
  • FIG. 6 shows a display example of the attribute common character list by the second program according to the present invention.
  • FIG. 7 is a flowchart showing a second example of character processing according to the present invention.
  • FIG. 8 (a) is a diagram for explaining a conventional sorting example, and is a diagram showing an example of character information having data delimiters.
  • FIG. 8 (b) is a diagram for explaining a conventional sorting example, and is a diagram showing a sorting range selection example.
  • FIG. 9 (a) is a diagram for explaining a conventional sorting example, and is a diagram showing an example of sorting by member number.
  • FIG. 9 (b) is a diagram for explaining a conventional sorting example, and is a diagram showing a sorting example using character codes.
  • a program for causing a computer to execute the first character processing method according to the first embodiment has the computer process characters that are assigned arbitrary character codes and constitute a coded character set.
  • a database is provided in which attribute information for identifying the attribute of each of a plurality of characters is stored in advance in association with the character codes. The attributes of the characters are determined from the attribute information stored in the database, and the arrangement order of the characters is determined according to the result. As a result, the characters can be sorted in an order determined by the attribute information, independently of the character code.
  • the above-described program is recorded in a computer-readable recording medium in advance.
  • the computer-readable recording medium includes a magnetic recording medium such as a hard disk (HD), an optical recording medium such as a compact disk (CD), and an electronic recording medium such as a semiconductor memory.
  • HDD: hard disk drive
  • FIG. 1 is a diagram showing a configuration example of a database DB provided in a program for causing a computer to execute a first character processing method according to the present invention.
  • This database DB includes a character code CD assigned to a plurality of characters C constituting an arbitrary coded character set, and attribute information D for determining an attribute of each of the plurality of characters. Attribute information D is output in response to a request without depending on the application that refers to the database DB.
  • the attribute information D consists, for example, of: radical information D1 for determining the radical C11 of the kanji (character) "Lotus" C1 as one of its attributes; element information D2 for determining the elements C12 and C13 of the kanji C1; radical-excluded stroke-count information D3 giving the total number of strokes in the part of the kanji C1 other than the radical C11, that is, in the elements C12 and C13; variant character information D4 for determining the kanji C2 that is a variant of the kanji C1; total stroke-count information D5 for determining the total number of strokes of the kanji C1; personal-name usage information D6 for determining whether the kanji C1 has been used in personal names; place-name usage information D7 for determining whether it has been used in place names; and regional usage information D8 for determining its usage in each region.
  • the database DB stores, as the attribute information D, element (character component) information D2 for determining the components (elements) of the kanji C1, variant character information D4, personal-name usage information D6, place-name usage information D7, and regional usage information D8, each in association with the character code "4F7B" assigned in advance to the kanji C1.
  • the element information D2 does not merely determine the elements but also includes variant information and related information for each element.
  • if the kanji C1 has been used in personal names, the database DB stores the character code "4F7B" of the kanji C1 in association with personal-name usage information "1".
  • if the kanji C1 has been used in place names, the character code "4F7B" of the kanji C1 is stored in association with place-name usage information "1".
  • if the kanji C1 is not used in personal or place names, the character code "4F7B" of the kanji C1 is stored in association with personal-name usage information "0" and place-name usage information "0".
  • prefectures are adopted as the regional unit, and the number of prefectures in which the kanji has been used is stored as the regional usage information D8. For example, if the kanji C1 has been used in a family name or place name in 47 prefectures, the database DB stores the character code "4F7B" of the kanji C1 in association with regional usage information "47".
  • a sort processing engine or the like that refers to the database DB can therefore determine from the attribute information D that, as attributes of the kanji "Lotus" C1, its radical C11 is "canopy", its elements C12 and C13 are "Nosurai" and "car", its radical-excluded stroke count is "9", it is a variant of the kanji C2 assigned the character code "5G6A", its total stroke count is "12", it has a place-name usage record, and its regional usage covers "47" prefectures.
  • if the radical information D1 is specified as the highest-priority key, the radical-excluded stroke-count information D3 as the second key, and the total stroke-count information D5 as the third key, a plurality of kanji whose radical is "canopy" are arranged, for example, as shown in FIG. 2.
  • in this arrangement, the variants E2 to E8 of the kanji E1, the variant F2 of the kanji F1, the variant G2 of the kanji G1, and the variants H1 to H3 of the kanji "Waka" are not placed adjacent to one another, so it is difficult to find the variant relationships among these kanji.
  • the radical information D 1 is specified as the highest priority key
  • the radical stroke number information D 3 is specified as the second key
  • the variant character information D 4 is specified as the third key.
  • the plurality of kanji whose radical is "canopy" are sorted as shown in FIG. 3.
  • the variant character relationship of a plurality of kanji is determined based on the variant character information D4, and the arrangement order is determined for the plurality of kanji according to the determination result.
  • the variants E2 to E8 of the kanji E1, the variant F2 of the kanji F1, the variant G2 of the kanji G1, and the variants H1 to H3 of the kanji "Waka" are arranged adjacent to one another, which makes it easy to find the variant relationships among these kanji.
  • the radical information D1 is specified as the highest priority key
  • the radical stroke number information D3 is specified as the second key
  • the variant character information D4 is specified as the third key
  • the total stroke-count information D5 is specified as the fourth key.
  • the radical information D1 is specified as the highest-priority key and a second key is specified.
  • if an input device such as a keyboard connected to the computer is used to specify "reading" as the highest-priority key of the sorting method and the variant character information D4 as the second key, then, as shown in FIG. 4, the member names whose reading is "Okuhashi" are placed in an arrangement L1 in which names containing kanji in a variant relationship are adjacent, and a unified member list P1 can be displayed.
  • the attribute information D and the character code CD for discriminating the attributes of a plurality of characters are provided.
  • a program provided with the database DB is installed on a computer. It is assumed that this program is started as part of a kana-kanji conversion system or the like in order to have the computer perform character processing with table-creation software or the like, and that it causes the computer to execute character processing according to the flowchart shown in FIG. 5.
  • in step S1 of the flowchart shown in FIG. 5, the computer waits for input of character codes CD selecting a plurality of characters from the coded character set whose attribute information D is stored in the database DB. When the sort range IP is selected in the member list P11, it determines that character codes CD have been input, and the process proceeds to step S2.
  • in step S2, the character codes CD of the selected sort range IP are copied to a work file as input, and in step S3 it is determined whether a sorting method has been specified.
  • if, as the sorting method, "reading" is specified as the highest-priority key and "variant" as the second key, the process proceeds to step S4 and the database DB is referenced.
  • in step S5, the attribute information D corresponding to the specified key "variant", that is, the variant character information D4, is read from the database DB for the plurality of characters selected by the character codes CD copied to the work file.
  • in step S6, the variant relationship of each of the plurality of characters is determined as its attribute on the basis of the read variant character information D4, and in step S7 the arrangement order is determined according to the result. In step S8, the characters are rearranged (sorted) according to that order so that variants are adjacent, and in step S9 the member list P1 reflecting the arrangement result L1 is displayed.
  • the database DB is referenced, in which the character code CD and the sort-order value serving as the reference for determining the arrangement order are stored as independent, separate pieces of information.
  • the database DB handles input/output, update, search and the like of the character codes CD and manages the attribute information D centrally. Therefore, even when new characters are added to the database DB, the regularity of the arrangement order can be maintained.
  • the variant character information D4 is used as the attribute information D.
  • for example, by specifying the regional usage information D8 as the attribute information D, the characters (kanji) required in each region, such as each prefecture, can easily be extracted. This makes it possible to build a coded character set made up of the kanji required in a given region and to provide a compact dictionary with a limited number of characters, saving code points.
  • according to the program for causing a computer to execute the first character processing method and the computer-readable recording medium on which the program is recorded, a plurality of characters can be rearranged in an order determined by the attribute information, independently of the character code. This avoids unnecessary changes to the arrangement order when a new character is added to the coded character set.
  • the regularity of the arrangement order based on this attribute information can be maintained, and character processing can be performed so that a plurality of characters are rearranged without impairing the significance of the arrangement order. It is possible to quickly extract a desired character from a plurality of characters and easily generate a new encoded character set.
  • the type and the order of the attribute information specified as the sort order are not particularly limited, and the attribute information can be arbitrarily selected and changed according to the purpose of sorting the plurality of characters.
  • for example, the following pattern can be realized:
  • the radical information D 1 is specified as the highest priority key
  • the element information D 2 is specified as the second key
  • the radical stroke number information D 3 is specified as the third key
  • by further specifying the variant character information D4 as the fourth key and the total stroke-count information D5 as the fifth key, the kanji are arranged so as to reflect the variant (association) relationships between their character components. Because the kanji can thus be arranged in a more meaningful variant (related-character) order, the desired variant can be extracted more efficiently.
  • a program for causing a computer to execute the second character processing method according to the second embodiment processes characters that are assigned arbitrary character codes and constitute a coded character set.
  • it includes a database in which attribute information for identifying the attributes of a plurality of characters is stored in advance in association with the character codes, and it outputs the character codes assigned to characters that share the attribute of a given character, as determined from the attribute information stored in the database.
  • the input character and the characters sharing its attribute can be arranged together.
  • the above-described program is pre-recorded on a computer-readable recording medium, installed on the computer, and saved on the HDD or the like. This allows the computer to execute the second character processing method according to the present invention, as in the first embodiment described above.
  • the database included in the program according to the second embodiment can be configured in the same manner as the database DB included in the program according to the first embodiment. Therefore, detailed description of the database DB will be omitted.
  • the program according to the second embodiment is started as part of a program for character processing, such as a kana-kanji conversion system.
  • when the character code "4F7B" of the kanji "ren" C1 is input to an application such as a kana-kanji conversion system
  • this program determines the variant C2 of the kanji C1 as a character sharing its attribute and can output the character code "5G6A" of the variant C2 from the database DB.
  • as a result, the variant C2, which shares the attribute of the input kanji C1 and has a high selection probability, can be displayed preferentially next to the kanji C1 as a conversion candidate. The desired kanji can thus be found easily among multiple variants that are otherwise hard to choose between, and converted.
  • a program for causing a computer to execute the second character processing method is installed on the computer, and character input is performed from a keyboard or the like connected to the computer.
  • the kana-kanji conversion system refers to the database DB stored on the HDD and reads in advance the character code CD and the attribute information D corresponding to the kanji "side". Based on the attribute information D read here, it determines the attribute of the kanji "side", then searches for characters sharing this attribute and displays them in a list.
  • if it has been specified in advance that conversion candidates should be displayed with priority given to variants, the variant character information D4 is read as the attribute information D. For example, when re-conversion is requested for the kanji "side", the display switches to the attribute-common character list display screen P2 shown in FIG. 6.
  • the attribute-common character list display screen P2 includes the representative character "side" M1 selected by inputting a character code CD from the keyboard, the characters M2 sharing the attribute of the representative character "side" M1, buttons B1 to B5 for selecting how the characters M2 are arranged, a "Convert" button B6 for converting a character M2, and common-attribute characters having fewer strokes
  • the characters M2 sharing the attribute are sorted so as to reflect the variant information (related information) of the elements in each character.
  • the variant character information D4 is selected by the "variant character” button B2
  • the arrangement of the characters M2 having the common attribute is sorted so as to reflect the variant character relationship between these characters.
  • conversion candidates can be called up and listed according to the attribute of the character being input. Therefore, even when several similar variants exist for the input character, the candidates with a high selection probability can be displayed preferentially, which improves character input efficiency.
  • the computer waits for input of a character code CD selecting a character from the coded character set whose attribute information D is stored in the database DB.
  • the flow advances to step S11 to refer to the database DB.
  • in step S12, the attribute information D for the character "side" M1 selected by inputting the character code CD is read from the database DB, and in step S13 the attribute of the character "side" M1 is determined from the attribute information D read here.
  • in step S14, the characters M2 sharing the determined attribute are searched for in the database DB, and in step S15 the character codes CD assigned to the characters M2 found are output from the database DB.
  • in step S16, the kana-kanji conversion system displays the characters M2 sharing the attribute, on the basis of the character codes CD output from the database DB, as on the attribute-common character list display screen P2.
  • the characters M2 sharing the attribute are displayed in a regular arrangement, and the processing ends.
  • since the input character and the characters sharing its attribute can be arranged together, a plurality of characters can be arranged in a highly meaningful order. Characters that share the attribute of the input character and have a high selection probability can therefore be displayed preferentially as selection candidates, so the desired character can be found easily and quickly. Further, even when a new character is added to the coded character set, characters can be processed while keeping the selection-candidate display regular. A desired character can therefore be extracted quickly, and a new coded character set can easily be generated.
  • the character input using the kana-kanji conversion system has been described as an example.
  • the invention can also be applied to software for an OCR (optical character reader) and the like.
  • with such OCR software, for example, when personal-name information such as that on a business card is read, variants having a personal-name usage record can be listed preferentially as conversion candidates for characters the OCR could not determine, improving the efficiency of character-string conversion in OCR.
  • the present invention is very suitably applied to character search processing and sort processing by an information processing device such as a computer.
  • a plurality of characters can be rearranged in an arrangement order determined based on attribute information without depending on a character code. Unnecessary changes in the order of the array due to the addition of characters can be avoided.
  • the regularity of the arrangement order based on the attribute information can be maintained, and character processing can be performed so as to rearrange a plurality of characters without impairing the significance of the arrangement order. For this reason, a desired character can be quickly extracted from a plurality of rearranged characters, and a new character set can be easily generated.
  • since the input character and the characters sharing its attribute can be arranged together, a plurality of characters can be arranged in a highly meaningful order. Therefore, characters that share the attribute of the input character and have a high selection probability can be displayed preferentially as selection candidates, so the desired character can be found easily and quickly. Also, even when a new character is added to the coded character set, characters can be processed while keeping the selection-candidate display regular. Therefore, a desired character can be extracted quickly and a new coded character set can easily be generated.
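The second character processing method above (steps S11 to S15) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the database DB is modeled as an in-memory dictionary, the variant character information D4 is assumed to be stored as a shared group identifier, and the names `CharRecord`, `AttributeDB`, `attribute_common_codes`, the group ids `"ren"`/`"other"` and the codes `"X001"`/`"X002"` are all hypothetical. Only the codes `"4F7B"` and `"5G6A"` come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CharRecord:
    """Hypothetical row of the database DB (only the fields this sketch uses)."""
    code: str                # character code CD, e.g. "4F7B"
    label: str               # display label, e.g. "C1"
    variant_group: str       # assumed encoding of variant info D4: a shared group id
    personal_name_use: bool  # personal-name usage information D6


class AttributeDB:
    """Hypothetical in-memory stand-in for the database DB, keyed by character code."""

    def __init__(self, records: List[CharRecord]) -> None:
        self._by_code: Dict[str, CharRecord] = {r.code: r for r in records}

    def lookup(self, code: str) -> Optional[CharRecord]:
        return self._by_code.get(code)

    def all_records(self) -> List[CharRecord]:
        return list(self._by_code.values())


def attribute_common_codes(db: AttributeDB, code: str,
                           prefer_personal_name: bool = False) -> List[str]:
    """Return the codes of characters that share the input character's variant group."""
    target = db.lookup(code)                 # S12: read the attribute information
    if target is None:
        return []
    matches = [r for r in db.all_records()   # S13/S14: judge the attribute, search
               if r.code != code and r.variant_group == target.variant_group]
    if prefer_personal_name:
        # Stable sort: characters with a personal-name record move to the front.
        matches.sort(key=lambda r: not r.personal_name_use)
    return [r.code for r in matches]         # S15: output the character codes


db = AttributeDB([
    CharRecord("4F7B", "C1", "ren", True),
    CharRecord("5G6A", "C2", "ren", False),
    CharRecord("X001", "C3", "ren", True),    # hypothetical extra variant
    CharRecord("X002", "K1", "other", True),  # hypothetical unrelated kanji
])
print(attribute_common_codes(db, "4F7B"))  # ['5G6A', 'X001']
```

The optional `prefer_personal_name` flag loosely mirrors the OCR use case, in which variants with a personal-name usage record are listed first; a real system would presumably rank candidates on more attributes than this.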

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A program causes a computer to execute a method for processing characters C1, C2 that are assigned arbitrary character codes CD and constitute a coded character set. A database DB is provided that contains attribute information D2, D4, D6, D7, D8 for judging the attribute of each of the characters, correlated in advance with the character codes CD. The program causes the computer to wait for input of character codes selecting a plurality of characters from the coded character set whose attribute information is stored in the database DB, read from the database DB the attribute information on the characters selected by that input, judge the attributes of the characters from the attribute information read, and decide the sequence order of the characters according to the judgment result.
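The sequence-ordering step in this abstract can be sketched as a multi-key sort over stored attributes. This is a hedged illustration, not the patented implementation: it assumes the variant character information D4 is stored as the code of the base character of each variant family, so that sorting on it places a family together. The class `Attributes`, the `KEYS` table, the function `arrange` and the sample codes `E1`, `E2`, `F1`, `F2` (borrowed loosely from the figure labels) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Attributes:
    """Hypothetical subset of attribute information D for one character."""
    radical: str          # D1: radical
    radical_strokes: int  # D3: strokes outside the radical
    variant_base: str     # D4 (assumed form): code of the family's base character
    total_strokes: int    # D5: total strokes


# Named sort keys mapped to attribute extractors (names are hypothetical).
KEYS: Dict[str, Callable[[Attributes], object]] = {
    "radical": lambda a: a.radical,
    "radical_strokes": lambda a: a.radical_strokes,
    "variant": lambda a: a.variant_base,
    "total_strokes": lambda a: a.total_strokes,
}


def arrange(codes: Sequence[str], db: Dict[str, Attributes],
            keys: Sequence[str]) -> List[str]:
    """Order the selected codes by their stored attributes, never by code value."""
    return sorted(codes, key=lambda c: tuple(KEYS[k](db[c]) for k in keys))


db = {
    "E1": Attributes("canopy", 9, "E1", 12),
    "F1": Attributes("canopy", 9, "F1", 12),
    "E2": Attributes("canopy", 9, "E1", 13),
    "F2": Attributes("canopy", 9, "F1", 12),
}
selected = ["E1", "F1", "E2", "F2"]
print(arrange(selected, db, ["radical", "radical_strokes", "total_strokes"]))
# ['E1', 'F1', 'F2', 'E2']  (variant families interleaved)
print(arrange(selected, db, ["radical", "radical_strokes", "variant", "total_strokes"]))
# ['E1', 'E2', 'F1', 'F2']  (each family kept together)
```

Because the order comes only from the stored attributes, the character codes never influence it, so adding a new code elsewhere in the set does not reshuffle these results. Inserting the `variant` key before total strokes is what brings each family together in this toy data.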

Description

Specification

Character processing method, character processing execution program and recording medium

Technical field

The present invention relates to a character processing method suitable for character search processing and sort processing by an information processing device such as a computer, a program for causing a computer to execute the method, and a computer-readable recording medium on which the program is recorded.

Background art
Conventionally, character processing in an information processing device such as a computer is performed on the basis of code numbers assigned to glyphs, and these code numbers are taken from a character code (coded character set) standardized by a public standard, such as the JIS kanji code (coded kanji set for information interchange, JIS X 0208:1997).
Character codes standardized by public standards lack, in particular, many characters needed to write personal names, place names, and the like accurately. Among these are numerous kanji that share the reading and meaning of the so-called common-use kanji included in the JIS kanji code but have uncommon glyphs and stand in a variant relationship with them (hereinafter "variant characters"). Such characters are added as external characters (gaiji) to the user-defined area (free area) that the standardized character code leaves unused.
An information processing method has been developed that improves the efficiency of character processing on computers and the like by making differences among such numerous variant characters easy to detect and by shortening the time needed to select an external character from among multiple variants (see Japanese Patent Application No. 2003-155787).
Suppose that, on a computer in which multiple variant characters have been installed by the information processing method described in the above application, sort processing is performed with conventional table creation software, database construction software, or the like on character information having data delimiters, such as the spreadsheet P11 shown in Fig. 8(a), which lists member names, the readings of those names, and member numbers. In that case, as shown in Fig. 8(b), the user selects a sort range IP and specifies a sort method.
For example, if "reading" is specified as the highest-priority key for determining the sort order (arrangement order) and "member number" as the second key, a result L11 is displayed in which the member names read "Okuhashi" are sorted in order of member number, as shown in Fig. 9(a). With this sorting method, the kanji spellings of the member names follow no regular pattern, and the resulting list P12 is not uniform.
Likewise, if "reading" is specified as the highest-priority key and "character code" as the second key, a result L12 is displayed in which the member names read "Okuhashi" are sorted by character code, as shown in Fig. 9(b). With this sorting method as well, the kanji spellings of the member names are irregular and the resulting list P13 is not uniform.
In general, sort processing based on character codes refers to the JIS Level 1 or JIS Level 2 kanji code, which are character codes standardized by a public standard. The JIS Level 1 kanji code is a coded kanji set formed by assigning character codes to common-use kanji in order of reading, and the JIS Level 2 kanji code is a coded kanji set formed by assigning character codes to kanji by radical and, within each radical, in order of stroke count.
JIS standards are reviewed once every five years, yet the number of characters in the JIS Level 1 and Level 2 kanji codes has scarcely increased. Consequently, variant characters (related characters) added as external characters to the free areas of the JIS Level 1 and Level 2 kanji codes carry no regularity that could determine a sort order. Therefore, when the character code is specified as a sort key, member names that include kanji in a variant relationship are sorted and displayed irregularly, as shown in Fig. 9(b), and the names cannot be rearranged and displayed in a meaningful order.
Disclosure of the Invention
An object of the present invention is to solve this problem by providing a character processing method capable of rearranging a plurality of characters in a meaningful arrangement order determined independently of the character code, a program for causing a computer to execute the method, and a computer-readable recording medium on which the program is recorded.
To solve the above problem, a first character processing method according to the present invention is a method for processing characters that are assigned arbitrary character codes and constitute a coded character set, in which: attribute information for determining the attributes of each of a plurality of characters is stored in association with the character codes; input of character codes that select a plurality of characters from the coded character set whose attribute information has been stored is awaited; the attribute information for the characters selected by that input is read; the attributes of each of the characters are determined from the attribute information read; and an arrangement order for the characters is decided according to the result of that determination.
A second character processing method according to the present invention is a method for processing characters that are assigned arbitrary character codes and constitute a coded character set, characterized in that: attribute information for determining the attributes of each of a plurality of characters is stored in association with the character codes; input of a character code that selects a character from the coded character set whose attribute information has been stored is awaited; the attribute information for the character selected by that input is read; the attributes of the character are determined from the attribute information read; characters sharing the determined attribute are searched for; and the character codes assigned to the characters found are output.
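The search step of the second method can be sketched as follows. This is a minimal illustration under assumptions, not the patent's implementation: the in-memory dictionary stands in for the database, and the codes (C100 to C103), the attribute names, and `codes_sharing_attribute` are all hypothetical. Excluding the input character itself from the output is also an assumption.

```python
# Hypothetical attribute store standing in for the database: code -> attributes.
ATTRIBUTE_DB: dict[str, dict[str, object]] = {
    "C100": {"variant_group": "G1", "radical_number": 140},
    "C101": {"variant_group": "G1", "radical_number": 140},
    "C102": {"variant_group": "G2", "radical_number": 140},
    "C103": {"variant_group": "G1", "radical_number": 85},
}


def codes_sharing_attribute(code: str, attribute: str) -> list[str]:
    """Return the codes of other characters whose `attribute` equals that of `code`."""
    # Read the attribute information of the selected character.
    target = ATTRIBUTE_DB[code][attribute]
    # Search the store for characters sharing that attribute and output their codes.
    return [
        other
        for other, attrs in ATTRIBUTE_DB.items()
        if other != code and attrs[attribute] == target
    ]


print(codes_sharing_attribute("C100", "variant_group"))   # ['C101', 'C103']
print(codes_sharing_attribute("C100", "radical_number"))  # ['C101', 'C102']
```

Because the attribute to match is a parameter, the same search serves variant groups, radicals, or any other stored attribute.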
A program according to the present invention for causing a computer to execute the first character processing method includes, for processing characters that are assigned arbitrary character codes and constitute a coded character set, a database in which attribute information for determining the attributes of each of a plurality of characters is stored in advance in association with the character codes. The program is characterized by waiting for input of character codes that select a plurality of characters from the coded character set whose attribute information is stored in the database, reading from the database the attribute information for the characters selected by that input, determining the attributes of each of the characters from the attribute information read, and deciding an arrangement order for the characters according to the determination result.
A program according to the present invention for causing a computer to execute the second character processing method likewise includes a database in which attribute information for determining the attributes of each of a plurality of characters is stored in advance in association with the character codes. The program is characterized by waiting for input of a character code that selects a character from the coded character set whose attribute information is stored in the database, reading from the database the attribute information for the character selected by that input, determining the attributes of the character from the attribute information read, searching the database for characters sharing the determined attribute, and outputting the character codes assigned to the characters found.
With the computer-readable recording medium according to the present invention, a program for causing a computer to execute the first character processing method is recorded on a computer-readable recording medium. Consequently, when a computer processes characters that are assigned arbitrary character codes and constitute a coded character set, it can rearrange a plurality of characters in an arrangement order determined from attribute information independently of the character codes, which avoids unintended changes to the arrangement order caused by adding new characters to the coded character set.
Brief Description of Drawings
Fig. 1 is a diagram showing an example configuration of the database included in a program according to the present invention.
Fig. 2 is a diagram showing an example character arrangement before sorting.
Fig. 3 is a diagram showing an example character arrangement based on attribute information, produced by the first program according to the present invention.
Fig. 4 is a diagram showing an example of sorting based on variant character information by the first program according to the present invention.
Fig. 5 is a flowchart showing a first example of character processing according to the present invention.
Fig. 6 is a diagram showing an example display of a list of characters sharing an attribute, produced by the second program according to the present invention.
Fig. 7 is a flowchart showing a second example of character processing according to the present invention.
Fig. 8(a) illustrates a conventional sorting example, showing example character information having data delimiters.
Fig. 8(b) illustrates a conventional sorting example, showing an example selection of a sort range.
Fig. 9(a) illustrates a conventional sorting example, showing sorting by member number.
Fig. 9(b) illustrates a conventional sorting example, showing sorting by character code.
Best Mode for Carrying Out the Invention
The best mode for carrying out the character processing method according to the present invention, a program for causing a computer to execute the method, and a computer-readable recording medium on which the program is recorded is described below with reference to the accompanying drawings.
[1] First Embodiment
The program of this first embodiment, which causes a computer to execute the first character processing method, includes a database in which attribute information for determining the attributes of each of a plurality of characters is stored in advance in association with character codes, so that the computer can process characters that are assigned arbitrary character codes and constitute a coded character set. The program decides the arrangement order of the characters according to the result of determining their attributes from the attribute information stored in this database. As a result, a plurality of characters can be rearranged in an arrangement order determined from attribute information, independently of the character codes.
In this embodiment, the program is recorded in advance on a computer-readable recording medium so that a computer can execute the first character processing method. The recording medium may be a magnetic recording medium such as a hard disk (HD), an optical recording medium such as a compact disc (CD), or an electronic recording medium such as semiconductor memory. For example, installing the program on a computer saves (stores) it on the hard disk drive (HDD) or the like built into the computer, which enables the computer to execute the first character processing method according to the present invention.
Fig. 1 is a diagram showing an example configuration of the database DB included in the program for causing a computer to execute the first character processing method according to the present invention. The database DB stores, in advance and in association with each other, the character codes CD assigned to a plurality of characters C constituting an arbitrary coded character set and the attribute information D for determining the attributes of each of those characters. It outputs attribute information D on request, independently of the application that refers to it.
Here, the attribute information D consists of, for example, the following attributes of the kanji (character) "蓮" C1: radical information D1 for determining the radical C11 of the kanji C1; element information D2 for determining the elements C12 and C13 of the kanji C1; residual stroke count information D3 for determining the number of strokes in the part of the kanji C1 other than the radical C11, that is, the total number of strokes of the elements C12 and C13; variant character information D4 for determining the kanji C2 that stands in a variant relationship with the kanji C1; total stroke count information D5 for determining the total number of strokes of the kanji C1; personal-name usage information D6 for determining whether the kanji C1 has been used in personal names; place-name usage information D7 for determining whether the kanji C1 has been used in place names; and regional usage information D8 for determining the kanji C1's usage record region by region.
In this way, the database DB stores, as attribute information D, the character component information (element information) D2 for determining the components (elements) of the kanji C1, the variant character information D4, the personal-name usage information D6, the place-name usage information D7, and the regional usage information D8, in association with the character code "4F7B" assigned to the kanji C1 in advance. The element information D2 not only identifies the elements but also includes variant and related information for each element.
In the personal-name usage information D6 and the place-name usage information D7, each character's usage record is stored as, for example, "1" if the character has been used in personal or place names and "0" if it has not. That is, if the kanji C1 is used in personal names, the database DB stores its character code "4F7B" in association with personal-name usage information "1"; if the kanji C1 is used in place names, it stores the character code "4F7B" in association with place-name usage information "1". If the kanji C1 is used in neither personal names nor place names, the database stores the character code "4F7B" in association with personal-name usage information "0" and place-name usage information "0".
For the regional usage information D8, prefectures are adopted as the regional unit, and the number of prefectures in which the character has been used is stored. For example, if the kanji C1 has been used in family registers as a personal-name or place-name character in all 47 prefectures, the database DB stores the character code "4F7B" in association with regional usage information "47".
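A record holding the attributes D1 to D8 might be laid out as below. This is a sketch under assumptions: the field names, the `CharAttributes` type, and `lookup` are hypothetical, and an in-memory dictionary stands in for the database DB. The values are the example values given in the text for "蓮" C1; the variant code is copied exactly as printed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharAttributes:
    radical: str                 # D1: radical
    elements: tuple[str, ...]    # D2: elements other than the radical
    residual_strokes: int        # D3: strokes outside the radical
    variants: tuple[str, ...]    # D4: codes of variant characters
    total_strokes: int           # D5: total stroke count
    used_in_personal_names: int  # D6: 1 = usage recorded, 0 = none
    used_in_place_names: int     # D7: 1 = usage recorded, 0 = none
    prefecture_count: int        # D8: prefectures with recorded usage


# Attribute information keyed by character code (example values for "蓮").
ATTRIBUTE_DB: dict[str, CharAttributes] = {
    "4F7B": CharAttributes(
        radical="艹",
        elements=("辶", "車"),
        residual_strokes=9,
        variants=("5G6A",),  # copied as printed in the source text
        total_strokes=12,
        used_in_personal_names=1,
        used_in_place_names=1,
        prefecture_count=47,
    ),
}


def lookup(code: str) -> CharAttributes:
    """Return the attribute record for a code, whichever application is asking."""
    return ATTRIBUTE_DB[code]


print(lookup("4F7B").prefecture_count)
```

Keeping the record keyed by code but separate from the code's numeric value is what lets any application (a converter, a sort engine) read the same attributes.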
Therefore, when the character code "4F7B" that selects the kanji "蓮" C1 is input through the sort processing engine or the like of an application such as a kana-kanji conversion system or table creation software, the attribute information D corresponding to the character code "4F7B" can be output from the database DB to the sort processing engine. The sort processing engine or the like reads the attribute information D output from the database DB and can determine the attributes of the kanji "蓮" C1 from it.
Specifically, from the attribute information D, the sort processing engine or the like can determine that the kanji "蓮" C1 has the grass crown (草冠) as its radical C11 and the shinnyō (之繞) and 車 as its elements C12 and C13; that it has 9 strokes outside the radical; that it stands in a variant relationship with the kanji C2 assigned the character code "5G6A"; that its total stroke count is 12; that it has been used in both personal names and place names; and that its regional usage covers 47 prefectures.
Suppose that many kanji (characters), including kanji that have multiple variant characters, are stored in advance in a database DB configured in this way, and that the kanji are arranged by specifying the radical information D1 as the highest-priority key, the residual stroke count information D3 as the second key, and the total stroke count information D5 as the third key. Kanji whose radical is the grass crown, for example, are then arranged as shown in Fig. 2.
In this arrangement, the variant characters E2 to E8 of the kanji E1, the variant F2 of the kanji F1, the variant G2 of the kanji G1, and the variants H1 to H3 of the kanji "若" are not placed next to each other, so it is difficult to recognize the variant relationships among these kanji.
On the other hand, if the kanji are arranged by specifying the radical information D1 as the highest-priority key, the residual stroke count information D3 as the second key, the variant character information D4 inserted as the third key, and the total stroke count information D5 as the fourth key, the kanji whose radical is the grass crown are sorted as shown in Fig. 3.
In this arrangement based on the attribute information D, the variant relationships among the kanji are determined from the variant character information D4, and their arrangement order is decided according to that determination. The variant characters E2 to E8 of the kanji E1, the variant F2 of the kanji F1, the variant G2 of the kanji G1, and the variants H1 to H3 of the kanji "若" are therefore each placed next to one another, making the variant relationships among these kanji easy to recognize.
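The effect of inserting the variant key can be shown with a toy sort. This is a sketch, not the patent's data: the codes K1 to K4 and their attribute values are hypothetical, and the variant character information D4 is assumed to be represented as a group ID (the code of the group's representative character). Ties are broken by Python's stable sort, which keeps input order.

```python
from typing import NamedTuple


class Kanji(NamedTuple):
    code: str
    radical_number: int    # D1 (e.g. 140 for the grass radical)
    residual_strokes: int  # D3
    variant_group: str     # D4, assumed: code of the group's representative character
    total_strokes: int     # D5


CHARS = [
    Kanji("K1", 140, 9, "K1", 12),
    Kanji("K2", 140, 9, "K2", 12),
    Kanji("K3", 140, 9, "K1", 13),  # hypothetical variant of K1
    Kanji("K4", 140, 9, "K2", 13),  # hypothetical variant of K2
]

# Three keys (D1, D3, D5): variants end up separated.
without_variants = sorted(
    CHARS, key=lambda k: (k.radical_number, k.residual_strokes, k.total_strokes)
)
# Four keys (D1, D3, D4, D5): each variant group becomes contiguous.
with_variants = sorted(
    CHARS,
    key=lambda k: (k.radical_number, k.residual_strokes, k.variant_group, k.total_strokes),
)

print([k.code for k in without_variants])  # ['K1', 'K2', 'K3', 'K4']
print([k.code for k in with_variants])     # ['K1', 'K3', 'K2', 'K4']
```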
This arrangement example specifies, as the arrangement order (sort order), the radical information D1 as the highest-priority key, the residual stroke count information D3 as the second key, the variant character information D4 as the third key, and the total stroke count information D5 as the fourth key. Alternatively, the sort order can specify the radical information D1 as the highest-priority key, the element information D2 as the second key, the residual stroke count information D3 as the third key, the variant character information D4 as the fourth key, and the total stroke count information D5 as the fifth key. This reflects the variant relationships among the elements that make up each kanji in the sort order, so that many kanji can be rearranged in variant order (related-character order) in an even more meaningful arrangement. It also makes it possible, for example, to give a kanji database a function that extracts, from many kanji, those in variant or related relationships and builds a new dictionary from them, which is highly effective for managing kanji databases.
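Treating the sort order as a configurable list of attribute names makes it easy to switch between the four-key and five-key orders. A minimal sketch, assuming hypothetical codes A to C and assuming that D2 is stored as normalized element IDs in which variant forms of an element share one ID (the text says D2 carries element variant information but does not specify its form):

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class Kanji:
    code: str
    radical_number: int        # D1
    elements: tuple[str, ...]  # D2, assumed normalized: variant forms share an ID
    residual_strokes: int      # D3
    variant_group: str         # D4, assumed: code of the group's representative
    total_strokes: int         # D5


FOUR_KEYS = ["radical_number", "residual_strokes", "variant_group", "total_strokes"]
FIVE_KEYS = ["radical_number", "elements", "residual_strokes", "variant_group", "total_strokes"]


def arrange(chars: list[Kanji], sort_order: list[str]) -> list[str]:
    """Sort by the named attributes in priority order (sort_order must be non-empty)."""
    return [k.code for k in sorted(chars, key=attrgetter(*sort_order))]


CHARS = [
    Kanji("A", 140, ("E2",), 5, "A", 8),
    Kanji("B", 140, ("E1",), 5, "B", 8),
    Kanji("C", 140, ("E1",), 6, "B", 9),
]

print(arrange(CHARS, FOUR_KEYS))  # ['A', 'B', 'C']
print(arrange(CHARS, FIVE_KEYS))  # ['B', 'C', 'A']
```

With the element key in second place, characters built from the same element cluster together before the residual stroke count is considered.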
When this attribute-based arrangement method is applied to the character information P11 having data delimiters described above, and "reading" is specified as the highest-priority key and the variant character information D4 as the second key through an input device such as a keyboard connected to the computer, the member names read "Okuhashi" are rearranged, as shown in Fig. 4, into an arrangement L in which each member name is placed next to member names containing kanji in a variant relationship with it, and a uniform member list P1 can be displayed.
Regularity can therefore be given to the kanji spellings of member names, yielding a meaningful sort order. This arrangement method is convenient and useful for sorting lists such as membership rosters and for managing kanji databases.
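Applied to a member list, the variant key can be derived per name. A sketch under assumptions: the member rows are hypothetical, the variant table is a hypothetical excerpt using the well-known pair 奥/奧 (standard and traditional forms), and the ranking of forms within a group plus the member number as a final tie-breaker are assumptions not stated in the text.

```python
# Hypothetical variant table: character -> (variant group, rank within group).
# Rank 0 is assumed to be the standard form; unlisted characters get rank 0.
VARIANTS = {"奥": ("奥", 0), "奧": ("奥", 1)}


def variant_key(name: str) -> tuple[tuple[str, int], ...]:
    """Map each character to (group, rank) so variant spellings sort side by side."""
    return tuple(VARIANTS.get(ch, (ch, 0)) for ch in name)


# Hypothetical member rows: (member number, name, reading).
members = [
    (101, "奥橋", "オクハシ"),
    (102, "奧橋", "オクハシ"),
    (103, "奥橋", "オクハシ"),
    (104, "奧橋", "オクハシ"),
]

# Conventional: reading, then member number (spellings alternate).
by_number = sorted(members, key=lambda m: (m[2], m[0]))
# Attribute-based: reading, then variant key, then member number.
by_variant = sorted(members, key=lambda m: (m[2], variant_key(m[1]), m[0]))

print([m[0] for m in by_number])   # [101, 102, 103, 104]
print([m[0] for m in by_variant])  # [101, 103, 102, 104]
```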
Next, as an embodiment of the first character processing method according to the present invention, an example of character processing by a program including the database DB is described with reference to the flowchart shown in Fig. 5.
In this example, it is assumed that, for processing characters that are assigned arbitrary character codes and constitute a coded character set, the attribute information D for determining the attributes of each of a plurality of characters is stored in advance in the database DB in association with the character codes CD, and that a program including this database DB is installed on the computer. The program is started as part of a kana-kanji conversion system or similar program that causes the computer to perform character processing in table creation software or the like, and it causes the computer to execute character processing according to the flowchart shown in Fig. 5.
Under this assumption, starting the table creation software or the like on the computer begins character processing. In step S1 of the flowchart of Fig. 5, the computer waits for input of character codes CD that select a plurality of characters from the coded character set whose attribute information D is stored in the database DB. When a sort range IP is selected in the member list P11, the computer treats this as character code CD input and proceeds to step S2.
In step S2, the character codes CD of the selected sort range IP are copied to a work file as input, and in step S3 it is determined whether a sort method has been specified. If, for example, "reading" is specified as the highest-priority key and "variant character" as the second key, processing proceeds to step S4, where the database DB is consulted.
In the next step, S5, the attribute information D corresponding to the specified key "variant character", that is, the variant character information D4, is read from the database DB for the characters selected by the character codes CD copied to the work file.
In step S6, the variant relationships among the characters are determined as their attributes on the basis of the variant character information D4 read, and in step S7 the arrangement order is decided according to this determination. In step S8, the characters are rearranged (sorted) according to this order so that variant characters are adjacent, and in step S9 the member list P1 reflecting the arrangement result L1 is displayed.
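Steps S1 to S9 can be mapped onto a small function. This is a simplified sketch, not the program described: it works on a list of character codes rather than spreadsheet rows, a plain list stands in for the work file, and the function name, attribute names, and example codes are hypothetical.

```python
def sort_selection(
    selected_codes: list[str],
    sort_keys: list[str],
    attribute_db: dict[str, dict[str, object]],
) -> list[str]:
    """Walk through steps S1 to S9 of the Fig. 5 flowchart (simplified)."""
    # S1-S2: the codes of the selected sort range are copied to a work list.
    work = list(selected_codes)
    # S3: without a specified sort method there is nothing to rearrange.
    if not sort_keys:
        return work
    # S4-S5: consult the database and read only the attributes named by the keys.
    attrs = {code: tuple(attribute_db[code][key] for key in sort_keys) for code in work}
    # S6-S8: judge the attributes, decide the order, and rearrange the work list.
    work.sort(key=lambda code: attrs[code])
    # S9: return the arranged codes for the caller to display.
    return work


# Hypothetical database contents.
DB = {
    "C1": {"radical_number": 140, "variant_group": "B"},
    "C2": {"radical_number": 85, "variant_group": "A"},
    "C3": {"radical_number": 140, "variant_group": "A"},
}

print(sort_selection(["C1", "C2", "C3"], ["radical_number", "variant_group"], DB))
# ['C2', 'C3', 'C1']
```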
As described above, the first character processing method according to the present invention refers to a database DB that stores the character codes CD and the sort order values used to determine the arrangement order as separate, independent information. It uses this database DB to perform input/output, updating, searching, and so on of character codes CD while managing the attribute information D centrally, so the regularity of the sort order is maintained even when new characters are added to the database DB.
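Why separating the sort order value from the code keeps the order stable can be seen in a few lines. A minimal sketch: the codes, the (variant group, rank) sort values, and the code given to the newly added character (taken from an otherwise unused area) are all hypothetical.

```python
# Hypothetical sort-order values stored separately from the character codes:
# code -> (variant group, rank within group).
SORT_ORDER = {
    0x3021: ("A", 0),
    0x3022: ("B", 0),
}

# A new variant of the first character is added under a hypothetical code from
# an unused area; only its stored sort-order value decides where it goes.
SORT_ORDER[0x7521] = ("A", 1)

by_code = sorted(SORT_ORDER)
by_sort_value = sorted(SORT_ORDER, key=SORT_ORDER.__getitem__)

print([hex(c) for c in by_code])        # ['0x3021', '0x3022', '0x7521']
print([hex(c) for c in by_sort_value])  # ['0x3021', '0x7521', '0x3022']
```

Sorting by code pushes the new character to the end; sorting by the stored value places it beside the character it is a variant of, without renumbering anything.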
The character processing example above used the variant character information D4 as the attribute information D. If the regional usage information D8 is specified as the attribute information D instead, the characters (kanji) needed in each region, such as each prefecture, can be extracted easily. This makes it possible to build a coded character set containing only the kanji a given region needs and to provide it as a compact dictionary with a limited number of characters, which saves code points.
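The sketch below shows this regional extraction under stated assumptions. The `RECORDS` table, the region labels "region-A" to "region-C", the usage data and the function `build_regional_charset` are all made up for illustration. Characters with a usage record in the requested region are kept in their attribute-based order and then renumbered from 0, which yields a compact coded character set.

```python
# Hypothetical records: character -> (sort-order value, regions with a usage record, D8).
# The region names and usage data are made up for illustration.
RECORDS = {
    "沢": (10, {"region-A", "region-B"}),
    "辺": (20, {"region-A", "region-C"}),
    "邊": (21, {"region-A"}),
    "邉": (22, {"region-B"}),
}


def build_regional_charset(region):
    """Extract the characters used in `region` and give them compact sequential codes."""
    used = [c for c, (_, regions) in RECORDS.items() if region in regions]
    used.sort(key=lambda c: RECORDS[c][0])  # keep the attribute-based arrangement order
    return {c: code for code, c in enumerate(used)}


print(build_regional_charset("region-B"))  # {'沢': 0, '邉': 1}
```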
Thus, the first character processing method according to the present invention can rearrange a plurality of characters in an order determined from attribute information, independently of the character codes. The same holds for the program that causes a computer to execute the method and for the computer-readable recording medium on which that program is recorded. This avoids unintended changes to the arrangement order when new characters are added to the coded character set.
The arrangement order is based on the attribute information, so its regularity is maintained. Characters can therefore be rearranged without losing the significance of that order. As a result, a desired character can be extracted quickly from the rearranged characters, and a new coded character set can be generated easily.
Neither the types of attribute information used as sort keys nor the order in which they are specified is restricted. Both can be chosen and changed freely to suit the purpose of the sort, so many different sort-order patterns can be realized.
For example, the following keys can be specified, in order of priority:

1. the radical information D1;
2. the element information D2;
3. the residual stroke count information D3 (strokes other than the radical);
4. the variant character information D4;
5. the total stroke count information D5.

With this sort order, the kanji are rearranged to reflect variant (related-character) relationships among their character components. They are therefore arranged in variant-character (related-character) order with greater significance, and the desired variant can be extracted more efficiently.
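The configurable multi-key sort described here can be sketched with Python's `operator.itemgetter`, which builds the comparison key from whichever attribute fields are listed, in priority order. The record fields, the IDs "c1" to "c4" and all numeric values are placeholders, not real dictionary data. `DEFAULT_KEYS` mirrors the D1 to D5 priority in the example above.

```python
from operator import itemgetter

# Placeholder records with made-up values; the fields mirror D1 to D5 in the text.
RECORDS = [
    {"id": "c1", "radical": 162, "element": 2, "residual": 2, "variant": 0, "total": 5},
    {"id": "c2", "radical": 162, "element": 1, "residual": 15, "variant": 1, "total": 18},
    {"id": "c3", "radical": 162, "element": 1, "residual": 15, "variant": 0, "total": 18},
    {"id": "c4", "radical": 85, "element": 3, "residual": 4, "variant": 0, "total": 7},
]

# Priority D1 > D2 > D3 > D4 > D5, as in the example; any subset, in any order, may be passed.
DEFAULT_KEYS = ("radical", "element", "residual", "variant", "total")


def sort_records(records, keys=DEFAULT_KEYS):
    # itemgetter(*keys) returns the key values in priority order (a tuple when several
    # keys are given), so earlier keys dominate; sorted() keeps ties in input order.
    return sorted(records, key=itemgetter(*keys))


print([r["id"] for r in sort_records(RECORDS)])             # ['c4', 'c3', 'c2', 'c1']
print([r["id"] for r in sort_records(RECORDS, ("total",))])  # ['c1', 'c4', 'c2', 'c3']
```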
[2] Second embodiment
The second embodiment is a program that causes a computer to execute the second character processing method. It processes characters that are assigned arbitrary character codes and make up a coded character set. The program includes a database in which attribute information for determining the attributes of each character is stored in advance in association with the character codes. It determines a character's attributes from the attribute information in this database and outputs the character codes assigned to characters that share those attributes. This allows an input character to be arranged together with the characters that share its attributes.
In this embodiment, the program is recorded in advance on a computer-readable recording medium and installed on the computer, which stores it on its built-in HDD or similar storage. As in the first embodiment described above, this allows the computer to execute the second character processing method according to the present invention.
The database included in the program of the second embodiment can be configured in the same way as the database DB of the first embodiment, so a detailed description of the database DB is omitted.
The program of the second embodiment is started as part of a kana-kanji conversion system or similar character-processing software. Suppose, for example, that the character code "4F7B" of the kanji "蓮" C1 is input through an application such as a kana-kanji conversion system. The program then identifies the variant C2 of kanji C1 as a character sharing its attributes and can output the character code "5G6A" of this variant C2 from the database DB.
The variant C2 shares attributes with the input kanji C1 and is likely to be selected, so it can be shown first as a conversion candidate alongside C1. Several variants often differ only slightly and are hard to choose between as candidates; with this ordering, the desired kanji can be found among them easily and converted.
Suppose the program for the second character processing method is installed on a computer, and characters are entered from a connected keyboard or similar device. If reconversion is requested for the kanji "辺" ("side") before the conversion is confirmed, the kana-kanji conversion system refers to the database DB stored on the HDD. It reads the attribute information D associated with the character code CD assigned in advance to "辺". From this attribute information D it determines the attributes of "辺", then searches for characters that share those attributes and displays them in a list.
In a kana-kanji conversion system that displays conversion candidates by referring to the database DB in this way, the user can specify in advance that variants be given priority. This sets the system to read the variant character information D4 as the attribute information D. When reconversion is then requested for the kanji "辺", for example, the display switches to the common-attribute character list screen P2 shown in Fig. 6.
The common-attribute character list screen P2 displays the following:

- the representative character "辺" M1, selected by the character code CD entered from the keyboard;
- the characters M2 that share attributes with M1;
- buttons B1 to B5 for selecting how the characters M2 are arranged;
- a "Convert" button B6 for converting to a character M2;
- a button B7 for displaying common-attribute characters with fewer strokes;
- a button B8 for displaying common-attribute characters with more strokes;
- check boxes B9 for selecting the desired character from among the characters M2.
On screen P2, if the element information D2 is selected with the "Element" button B1 as the arrangement method, the characters M2 are sorted to reflect the variant (related) relationships among their elements. Likewise, if the variant character information D4 is selected with the "Variant" button B2, the characters M2 are sorted to reflect the variant-character relationships among them.
If the personal-name usage information D6 is selected with the "Personal-name usage" button B3 on screen P2, characters among M2 with a record of use in personal names are shown first. Selecting the place-name usage information D7 with the "Place-name usage" button B4, or the regional usage information D8 with the "Regional usage" button B5, works the same way. Characters among M2 with a record of use in place names, or in the region, are then shown first.
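One simple way to realize this "show characters with a usage record first" behaviour is a stable sort on a boolean flag. The sketch assumes that approach, and the patent does not say how the priority display is computed. The candidate IDs, the flag values and the mapping `BUTTON_TO_FLAG` are hypothetical.

```python
# Placeholder candidates with made-up usage flags: D6 personal names, D7 place names, D8 region.
CANDIDATES = [
    {"id": "c1", "personal_name": False, "place_name": True, "regional": True},
    {"id": "c2", "personal_name": True, "place_name": True, "regional": False},
    {"id": "c3", "personal_name": True, "place_name": False, "regional": False},
]

# Hypothetical mapping from the screen P2 buttons to the flag each one selects.
BUTTON_TO_FLAG = {"B3": "personal_name", "B4": "place_name", "B5": "regional"}


def prioritize(candidates, button):
    flag = BUTTON_TO_FLAG[button]
    # sorted() is stable and False sorts before True: characters with a usage record
    # (not flag -> False) move to the front, and the existing order is kept in each group.
    return [c["id"] for c in sorted(candidates, key=lambda c: not c[flag])]


print(prioritize(CANDIDATES, "B3"))  # ['c2', 'c3', 'c1']
```

Because the sort is stable, an arrangement already chosen with B1 or B2 survives within each group.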
During character input, conversion candidates can therefore be retrieved and listed according to the attributes of the character being entered. Even when several similar variants exist for the input character, the candidates most likely to be chosen appear first, which improves input efficiency.
Next, as an embodiment of the second character processing method according to the present invention, a second example of character processing by this program is described with reference to the flowchart in Fig. 7.
This second example makes the following assumptions. The characters to be processed are assigned arbitrary character codes and make up a coded character set. The attribute information D for determining each character's attributes is stored in the database DB in advance, in association with the character codes CD. A program containing this database DB is installed on a computer. The program is started to have the computer perform character processing in a kana-kanji conversion system or similar software, following the flowchart in Fig. 7.

Under these assumptions, character processing begins when an application that uses the kana-kanji conversion system is started. In step S10 of the flowchart in Fig. 7, the computer waits for input of a character code CD that selects a character from the coded character set whose attribute information D is stored in the database DB. When a character code CD is entered from the keyboard, the process proceeds to step S11 and refers to the database DB.
In the next step S12, the attribute information D for the character "辺" M1 selected by the input character code CD is read from the database DB. In step S13, the attributes of "辺" M1 are determined from the attribute information D that was read.
In step S14, the database DB is searched for characters M2 that share the determined attributes. In step S15, the character codes CD assigned to the retrieved characters M2 are output from the database DB.
In step S16, the kana-kanji conversion system uses the character codes CD output from the database DB to display the characters M2, as on the common-attribute character list screen P2. To do this, it calls the flowchart shown in Fig. 5 with the output character codes as input and executes steps S1 to S9. The characters M2 are thus displayed in an orderly arrangement, and the process ends.
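Steps S11 to S16 can be sketched as a lookup followed by a search and a sort. This assumes the simplest possible database: an in-memory dictionary keyed by Unicode code point, holding a hypothetical variant-family ID (standing in for D4) and a hypothetical sort-order value. The function `common_attribute_codes` is an illustration, not the patent's program. The final sort plays the role of the Fig. 5 flow (steps S1 to S9).

```python
# Hypothetical database DB: character code -> attribute information D.
# Only a variant-family ID (standing in for D4) and a sort-order value are modelled.
DB = {
    ord("辺"): {"variant_group": 1, "sort_order": 100},
    ord("邊"): {"variant_group": 1, "sort_order": 110},
    ord("邉"): {"variant_group": 1, "sort_order": 120},
    ord("沢"): {"variant_group": 2, "sort_order": 200},
}


def common_attribute_codes(input_code):
    """Return the codes of characters M2 that share an attribute with the input character M1."""
    group = DB[input_code]["variant_group"]         # S11-S13: read D and determine the attribute
    matches = [code for code, attrs in DB.items()   # S14: search DB for the same attribute
               if attrs["variant_group"] == group and code != input_code]
    # S15/S16: output the codes, arranged by stored sort order (standing in for steps S1-S9).
    return sorted(matches, key=lambda code: DB[code]["sort_order"])


print("".join(chr(c) for c in common_attribute_codes(ord("辺"))))  # 邊邉
```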
The program of this embodiment can arrange an input character together with the characters that share its attributes, so characters are placed in a meaningful order. Characters that share attributes with the input and are likely to be chosen can be shown first as selection candidates, and the desired character can be found quickly and easily. Even when new characters are added to the coded character set, the candidate display stays orderly. A desired character can therefore be extracted quickly, and a new coded character set can be generated easily.
This embodiment uses character input through a kana-kanji conversion system as its example. However, the second character processing method according to the present invention can also be applied to OCR (Optical Character Reader) software and similar tools that convert images into character strings (text). For example, when OCR reads personal names from business cards, it may fail to identify some characters. For those characters, a list of variants with a record of use in personal names can be shown first as conversion candidates, which improves the efficiency of OCR text conversion.
The embodiments of the present invention and their effects have been described in detail above. The present invention is not, however, limited to the configurations of the first and second embodiments. Any configuration that achieves the functions of the embodiments described above may be used, provided it stays within the scope of the present invention as defined by the claims appended to this specification.

Industrial Applicability
The present invention is particularly well suited to character search and sorting performed by information processing devices such as computers.
According to the present invention, characters can be rearranged in an order determined from attribute information, independently of the character codes. Unintended changes to the arrangement order caused by adding new characters to a coded character set are therefore avoided.
Because the arrangement order is based on attribute information, its regularity is maintained, and characters can be rearranged without losing the significance of that order. A desired character can therefore be extracted quickly from the rearranged characters, and a new coded character set can be generated easily.
In addition, an input character can be arranged together with the characters that share its attributes, so characters are placed in a meaningful order. Characters that share attributes with the input and are likely to be chosen can be shown first as selection candidates, and the desired character can be found quickly and easily. Even when new characters are added to the coded character set, the candidate display stays orderly. A desired character can therefore be extracted quickly, and a new coded character set can be generated easily.

Claims

1. A character processing method for processing characters that make up a coded character set and are assigned arbitrary character codes, the method comprising:
storing attribute information of each of a plurality of the characters in association with the character codes;
waiting for input of character codes that select a plurality of the characters from the coded character set for which the attribute information is stored;
reading the attribute information for the plurality of characters selected by the input character codes; and
determining an attribute of each of the plurality of characters based on the read attribute information, and determining an arrangement order for the plurality of characters according to the result of that determination.
2. A character processing method for processing characters that are assigned arbitrary character codes and make up a coded character set, the method comprising:
storing attribute information of each of a plurality of the characters in association with the character codes;
waiting for input of a character code that selects a character from the coded character set for which the attribute information is stored;
reading the attribute information for the character selected by the input character code;
determining an attribute of the character based on the read attribute information; and
searching for characters that share the determined attribute, and outputting the character codes assigned to the retrieved characters that share the attribute.
3. A program for causing a computer to execute a character processing method for processing characters that are assigned arbitrary character codes and make up a coded character set, the method comprising:
providing a database in which attribute information of each of a plurality of the characters is stored in advance in association with the character codes;
waiting for input of character codes that select a plurality of the characters from the coded character set whose attribute information is stored in the database;
reading, from the database, the attribute information for the plurality of characters selected by the input character codes; and
determining an attribute of each of the plurality of characters based on the read attribute information, and determining an arrangement order for the plurality of characters according to the result of that determination.
4. The program according to claim 3, wherein the database stores character component information as the attribute information.
5. The program according to claim 3, wherein the database stores variant character information as the attribute information.
6. The program according to claim 3, wherein the database stores personal-name usage information as the attribute information.
7. The program according to claim 3, wherein the database stores place-name usage information as the attribute information.
8. The program according to claim 3, wherein the database stores regional usage information as the attribute information.
9. A program for causing a computer to execute a character processing method for processing characters that are assigned arbitrary character codes and make up a coded character set, the method comprising:
providing a database in which attribute information of each of a plurality of the characters is stored in advance in association with the character codes;
waiting for input of a character code that selects a character from the coded character set whose attribute information is stored in the database;
reading, from the database, the attribute information for the character selected by the input character code;
determining an attribute of the character based on the read attribute information; and
searching the database for characters that share the determined attribute, and outputting the character codes assigned to the retrieved characters that share the attribute.
10. The program according to claim 9, wherein the database stores character component information as the attribute information.
11. The program according to claim 9, wherein the database stores variant character information as the attribute information.
12. The program according to claim 9, wherein the database stores personal-name usage information as the attribute information.
13. The program according to claim 9, wherein the database stores place-name usage information as the attribute information.
14. The program according to claim 9, wherein the database stores regional usage information as the attribute information.
15. A computer-readable recording medium on which the program according to claim 3 is recorded.
16. The recording medium according to claim 15, wherein the database stores character component information as the attribute information.
17. The recording medium according to claim 15, wherein the database stores variant character information as the attribute information.
18. The recording medium according to claim 15, wherein the database stores personal-name usage information as the attribute information.
19. The recording medium according to claim 15, wherein the database stores place-name usage information as the attribute information.
20. The recording medium according to claim 15, wherein the database stores regional usage information as the attribute information.
21. A computer-readable recording medium on which the program according to claim 9 is recorded.
22. The recording medium according to claim 21, wherein the database stores character component information as the attribute information.
23. The recording medium according to claim 21, wherein the database stores variant character information as the attribute information.
24. The recording medium according to claim 21, wherein the database stores personal-name usage information as the attribute information.
25. The recording medium according to claim 21, wherein the database stores place-name usage information as the attribute information.
26. The recording medium according to claim 21, wherein the database stores regional usage information as the attribute information.
PCT/JP2004/019445 2003-12-25 2004-12-17 Character processing execution program and recording medium WO2005064494A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2003429407 2003-12-25
JP2003-429407 2003-12-25

Publications (1)

Publication Number Publication Date
WO2005064494A1 true WO2005064494A1 (en) 2005-07-14

Family

ID=34736302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2004/019445 WO2005064494A1 (en) 2003-12-25 2004-12-17 Character processing execution program and recording medium

Country Status (1)

Country Link
WO (1) WO2005064494A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5814248A (en) * 1981-07-17 1983-01-27 Sanyo Electric Co Ltd Kanji(chinese character) input device
JPH0696266A (en) * 1992-09-11 1994-04-08 Hitachi Ltd Correction supporting system for character recognition result
JP2001216296A (en) * 2000-01-31 2001-08-10 Fujitsu Ltd Character retrieval device, character retrieval method, and recording medium
JP2003167869A (en) * 2001-11-29 2003-06-13 Canon Inc Character processor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP

122 Ep: pct application non-entry in european phase