JP4104000B2

JP4104000B2 - Information processing apparatus, control method, program, and program recording medium

Info

Publication number: JP4104000B2
Application number: JP2003141277A
Authority: JP
Inventors: 嘯洲王; 幹人廣田; 正章石橋; 和浩薮田
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2003-05-20
Filing date: 2003-05-20
Publication date: 2008-06-18
Anticipated expiration: 2023-05-20
Also published as: JP2004348185A

Description

【０００１】
【発明の属する技術分野】
本発明は、情報処理装置、制御方法、プログラム、データ記録媒体、及びプログラム記録媒体に関する。特に本発明は、文字列を操作する情報処理装置、制御方法、プログラム、データ記録媒体、及びプログラム記録媒体に関する。
【０００２】
【従来の技術】
近年、ネットワークシステムの発達により、ネットワーク上の情報から所望の情報を検索するシステムが用いられている。このようなシステムにおいては、文字列の検索及び文字列の並び替え等の、文字列操作処理が用いられている。
文字列操作の一例として、特許文献１を参照。
【特許文献１】
特開平６−５９８５７号公報
【０００３】
【発明が解決しようとする課題】
しかしながら、上記システムにおいては、文字の属性に基づいて文字列操作処理を行うことが出来ない。例えば、文字が漢字等である場合、漢字等は複数の発音や意味を保持する場合があるので、漢字の読み方に基づいて、漢字を検索することはできない。また、当該漢字がどの言語に用いられる文字であるかを判別することが困難であり、利用者の所望する言語と異なる言語で記述された文字を検索してしまう場合や、検索結果の文字列が文字化けを起こす場合がある。
【０００４】
一例としては、従来のユニコード体系は、日本語、中国語、及び韓国語において共通して用いられる文字を、一の文字コードとして定めている。従って、上記システムは、ネットワーク上のウェブ文書等を検索する場合に、当該文書が日本語、中国語、及び韓国語の何れで記載されたものであるのかを、文字コードに基づいて判断することができず、言語を適切に識別する機能を別途備えなければならなかった。
そこで本発明は、上記の課題を解決することのできる情報処理装置、制御方法、プログラム、データ記録媒体、及びプログラム記録媒体を提供することを目的とする。この目的は特許請求の範囲における独立項に記載の特徴の組み合わせにより達成される。また従属項は本発明の更なる有利な具体例を規定する。
【０００５】
【課題を解決するための手段】
即ち、本発明の第１の形態によると、複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置であって、文字コードのそれぞれは、当該文字コードにより当該文字を特定するために必要な文字特定情報と、当該文字コードの文字特定情報に付加された、当該文字コードにより識別される文字の文字列として表されたときの語句の属性を示す拡張情報とを含み、文字コードのそれぞれにおける拡張情報が格納される領域は、当該文字コードを含む文書情報を生成するアプリケーションプログラムに応じて異なり、当該情報処理装置は、複数の文字列を含む文書情報を入力し、更に、拡張情報に、当該拡張情報が文字コードの何れの領域に格納されているかを示す情報を対応付けた格納位置指定情報を、文書情報に対応付けて取得する文書情報入力部と、文書情報入力部により入力された文書情報を生成したアプリケーションプログラムの種類を検出するアプリケーション種別検出部と、アプリケーション種別検出部により検出されたアプリケーションプログラムの種類に応じて、並び替えに用いる拡張情報が格納されている領域を格納位置指定情報に基づいて選択、文書情報に含まれる複数の文字列を、複数の文字列のそれぞれに含まれる各文字に対応する選択した当該領域に格納された拡張情報に基づいて、並び替える文字列並替部とを備える情報処理装置、当該情報処理装置を制御する制御方法、プログラム、及びプログラムを記録したプログラム記録媒体を提供する。
また、本発明の第２の形態によると、複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置であって、文字コードのそれぞれは、当該文字コードにより当該文字を特定するために必要な文字特定情報と、当該文字コードの文字特定情報に付加された、当該文字コードにより識別される文字の文字列として表されたときの語句の属性を示す拡張情報とを含み、文字コードのそれぞれにおける拡張情報が格納される領域は、当該文字コードを含む文書情報を生成するアプリケーションプログラムに応じて異なり、当該情報処理装置は、複数の文字列を含む文書情報を入力し、更に、拡張情報に、当該拡張情報が文字コードの何れの領域に格納されているかを示す情報を対応付けた格納位置指定情報を、文書情報に対応付けて取得する文書情報入力部と、文書情報入力部により入力された文書情報を生成したアプリケーションプログラムの種類を検出するアプリケーション種別検出部と、アプリケーション種別検出部により検出されたアプリケーションプログラムの種類に応じて、並び替えに用いる拡張情報が格納されている領域を格納位置指定情報に基づいて選択し、複数の文字列のそれぞれに含まれる各文字に対応する選択した当該領域に格納された拡張情報に基づいて、複数の文字列の中から少なくとも一の文字列を検索する文字列検索部とを備える情報処理装置、当該情報処理装置を制御する制御方法、プログラム、及びプログラムを記録したプログラム記録媒体を提供する。
なお上記の発明の概要は、本発明の必要な特徴の全てを列挙したものではなく、これらの特徴群のサブコンビネーションも又発明となりうる。
【０００６】
【発明の実施の形態】
以下、発明の実施の形態を通じて本発明を説明するが、以下の実施形態は特許請求の範囲にかかる発明を限定するものではなく、又実施形態の中で説明されている特徴の組み合わせの全てが発明の解決手段に必須であるとは限らない。
【０００７】
図１は、情報処理装置１０の機能ブロック図を示す。情報処理装置１０は、複数の文字列を含む文書情報を、当該文書情報に含まれる文字を特定するための文字コードの集合として外部から取得する。ここで、各文字コードは、当該文字コードに対応する文字の読み、部首、又は画数の少なくとも１つを識別する属性情報を、当該文字コードにより文字を特定するために必要な文字特定情報として含む。これにより、情報処理装置１０は、文字を特定するための文字コードを用いて、文字の属性を示す他の情報を用いることなく、文字の属性に応じた処理、例えば、文字の検索、文字列の並び替え、文字の読み上げ、及び文字の表示等の処理を行うことができる。
【０００８】
情報処理装置１０は、文書情報を用いて情報の処理を行うアプリケーションプログラム２０−１〜Ｎと、複数のアプリケーションプログラムに共通な文字コードの情報を提供するオペレーティングシステム３０とを備える。アプリケーションプログラム２０−１は、情報処理装置１０を、文書情報入力部２１０と、アプリケーション種別検出部２２０と、属性情報入力部２３０と、文字列並替部２４０と、文字列検索部２５０と、文字列出力部２６０として機能させる。アプリケーションプログラム２０−２〜Ｎのそれぞれは、アプリケーションプログラム２０−１と略同一であるので説明を省略する。
【０００９】
文書情報入力部２１０は、複数の文字列を含む文書情報を入力し、アプリケーション種別検出部２２０、文字列並替部２４０、及び文字列検索部２５０に送る。入力方法として、好ましくは、文書情報入力部２１０は、文字列の読みを示す入力文字列を利用者から受け付け、当該入力文字列を、文字コードを複数配列した出力文字列に仮名漢字変換し、当該出力文字列を文書情報として入力する。
【００１０】
ここで、文書情報入力部２１０により入力される文字コードは、文字特定情報である属性情報に、更に、各文字コードの文字特定情報に付加された、当該文字コードにより識別される文字の属性を示す拡張情報を含む。例えば、文字が漢字である場合の属性情報とは、文字が属する言語の種類、文字の音読み、部首、及び画数の組合せである。また、拡張情報とは、文字の訓読みを識別する情報及び文字が人名用漢字であるか否かを示す情報である。
【００１１】
これに代えて、文書情報入力部２１０は、アプリケーションプログラム２０−２において生成された文書情報を、アプリケーションプログラム２０−２から取得することにより入力してもよい。この場合、文書情報入力部２１０は、拡張情報を識別する情報に、当該拡張情報が文字コードのうち何れのフィールドに格納されているかを示す情報を対応付けた格納位置指定情報５０を、文書情報に対応付けて更に取得し、アプリケーション種別検出部２２０に送る。
また、文書情報とは、例えば、複数の文字列を含むテキスト文書である。これに代えて、文書情報とは、複数の文字列を含むデータベースであってもよいし、複数の文字列をエントリとして含む表又はテーブルであってもよい。
【００１２】
アプリケーション種別検出部２２０は、文書情報入力部２１０から受け取った文書情報又は格納位置指定情報に基づいて、文書情報入力部２１０により入力された文書情報を生成したアプリケーションプログラムの種類を検出し、検出結果を格納位置指定情報と伴に、文字列並替部２４０及び文字列検索部２５０に送る。例えば、アプリケーション種別検出部２２０は、文書情報又は格納位置指定情報を格納するファイル名の拡張子に基づいて、文書情報を生成したアプリケーションプログラムの種類を特定してもよいし、当該ファイルの内容を解析することにより、文書情報を生成したアプリケーションプログラムの種類を特定してもよい。
【００１３】
属性情報入力部２３０は、属性情報の種類を示す属性種類情報及び／又は拡張情報の種類を示す拡張種類情報を、文字を並び替える並替指示に対応付けて、利用者等からの指示に応じて入力し、文字列並替部２４０に送る。また、属性情報入力部２３０は、属性情報及び／又は拡張情報を、文字を検索する検索指示に対応付けて外部から入力し、文字列検索部２５０に送る。
【００１４】
文字列並替部２４０は、属性情報入力部２３０から受け取った情報が属性種類情報である場合に、文書情報入力部２１０から受け取った文書情報に含まれる複数の文字列を、当該複数の文字列のそれぞれに含まれる各文字の文字コードに含まれる属性情報に基づき並び替え、並び替えた結果の文字列を文字列出力部２６０に送る。例えば、文字列並替部２４０は、各文字の文字コードに含まれる属性情報のうち、属性情報入力部２３０から受け取った属性種類情報が示す種類の情報に基づいて並び替えを行う。一例としては、各文字の属性情報が文字の読み、部首、又は画数を識別する情報である場合に、属性情報入力部２３０から受け取った属性種類情報が文字の画数を示しているのであれば、文字列並替部２４０は、各文字列の先頭の文字の画数が少ない順に、文字列を並び替える。
【００１５】
また、文字列並替部２４０は、属性情報入力部２３０から受け取った情報が拡張種類情報である場合に、文書情報入力部２１０から受け取った文書情報に含まれる複数の文字列を、複数の文字列のそれぞれに含まれる各文字に対応する拡張情報に更に基づき並び替える。例えば、文字列並替部２４０は、文字コードのうち属性情報入力部２３０から受け取った拡張種類情報が示す種類の拡張情報が格納されているフィールドを、アプリケーション種別検出部２２０から受け取った格納位置指定情報５０及び拡張情報項目表データベース３１０から受け取った拡張情報項目表に基づいて選択し、選択した当該フィールドに格納された拡張情報に更に基づいて、文書情報における複数の文字列を並び替える。
【００１６】
文字列検索部２５０は、属性情報入力部２３０から受け取った情報が属性情報である場合に、文書情報入力部２１０から受け取った文書情報に含まれる複数の文字列のうち、受け取った当該属性情報を含む一の文字列を検索し、検索結果、例えば、検索された文字列を、文字列出力部２６０に送る。
【００１７】
また、文字列検索部２５０は、属性情報入力部２３０から受け取った情報が拡張情報である場合に、文書情報入力部２１０から受け取った文書情報に含まれる複数の文字列のうち、当該拡張情報を含む少なくとも一の文字列を検索し、検索結果を文字列出力部２６０に送る。例えば、文字列検索部２５０は、文字コードのうち属性情報入力部２３０から受け取った拡張情報が格納されるフィールドを、アプリケーション種別検出部２２０から受け取った格納位置指定情報５０及び拡張情報項目表データベース３１０から受け取った拡張情報項目表に基づいて選択し、選択した当該フィールドに格納された拡張情報を検索することにより、文字列を検索する。
【００１８】
文字列出力部２６０は、文字列並替部２４０又は文字列検索部２５０から受け取った文字列を、当該文字列を構成する文字の属性情報に基づいて、外部に出力する。例えば、文字列出力部２６０は、文字の属性情報に基づいて、出力するべき文字の形状を識別する形状識別情報を、基本次元文字コード表データベース３００から選択し、当該形状識別情報が示す形状により文字を出力する。一例としては、文字列出力部２６０は、文字コードを構成する属性情報の中から、当該文字コードに対応する文字が属する言語を含む言語情報を選択し、当該言語情報に適したフォントにより文字列を表示してもよい。また、他の例としては、文字列出力部２６０は、文字コードを構成する属性情報の中から、当該文字コードに対応する文字の読み及び読みの抑揚を示す情報を選択し、これらの情報に基づいて、文字を読み上げる音声を出力してもよい。
【００１９】
オペレーティングシステム３０は、アプリケーションプログラム２０−１〜Ｎに対して共通のデータ及び処理を提供するプログラムであり、基本次元文字コード表データベース３００と、拡張情報項目表データベース３１０とを備える。なお、オペレーティングシステム３０は、アプリケーションプログラム２０−１〜Ｎに対して共通のデータ及び処理を供給するプログラムであればよく、例えば、他のオペレーティングシステム上で動作し、アプリケーションプログラム２０−１〜Ｎに対して共通のデータ及び処理を供給するミドルウェアであってもよい。
【００２０】
図２（ａ）は、基本次元文字コード表データベース３００の詳細を示す。基本次元文字コード表データベース３００は、本発明に係るデータ記録媒体の一例であり、複数の文字のそれぞれにおいて、当該文字の表示又は印刷に用いる文字の形状を識別する形状識別情報を格納する形状識別情報格納領域と、当該形状識別情報に対応付けて当該文字の文字コードを格納する文字コード格納領域とを含む。形状識別情報とは、文字の形状を示す文字フォント、例えば、ビットマップフォント又はアウトラインフォントである。これに代えて、形状識別情報とは、文字を識別する他の文字コード、例えば、ＪＩＳコード、シフトＪＩＳコード、又はユニコード等であってもよい。
【００２１】
また、文字コードは、当該文字コードに対応する文字が属する言語を含む言語情報と、当該文字の音読みを示す情報と、当該文字の読みの抑揚を示す情報と、当該文字の画数を示す情報と、当該文字の部首を示す情報とを、この順に、属性情報として含む。例えば、漢字「廣」の文字コードは、日本語に属し、音読みが「コウ」であり、読みの抑揚が第１音節にあり、画数が１３画であり、かつ部首が「まだれ」である旨を示す属性情報を、当該文字コードにより文字を識別するために必要な文字特定情報として含む。一例としては、日本語に属する旨を０１で示し、音読みが「コウ」である旨を３Ｆで示し、読みの抑揚が第１音節にある旨を０３で示し、画数が１３画である旨を０Ｄで示し、部首が「まだれ」である旨を０３で示し、結果として、漢字「廣」の文字コードは「０１３Ｆ０３０Ｄ０３」である。
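The byte layout described in this paragraph can be sketched as follows. This is a minimal illustration, assuming one byte per attribute in the stated order (language, on-reading, intonation, stroke count, radical). The class name `BasicCharCode` and its methods are hypothetical and do not appear in the patent; only the byte values are taken from the example for 「廣」.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicCharCode:
    """Hypothetical model of one character code: five 1-byte attributes."""
    language: int    # 0x01 = Japanese in the example
    on_reading: int  # 0x3F = "コウ" in the example
    intonation: int  # 0x03 = stress on the first syllable in the example
    strokes: int     # 0x0D = 13 strokes in the example
    radical: int     # 0x03 = "まだれ" in the example

    def pack(self) -> bytes:
        """Concatenate the attributes in the order given in the text."""
        return bytes([self.language, self.on_reading, self.intonation,
                      self.strokes, self.radical])

    @classmethod
    def unpack(cls, raw: bytes) -> "BasicCharCode":
        """Split the first five bytes of a character code into attributes."""
        if len(raw) < 5:
            raise ValueError("a character code needs at least 5 attribute bytes")
        return cls(*raw[:5])


hiro = BasicCharCode(0x01, 0x3F, 0x03, 0x0D, 0x03)
print(hiro.pack().hex().upper())  # 013F030D03
```

Because every attribute sits at a fixed position, a consumer can read, for instance, the stroke count directly from the code without consulting a separate dictionary.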
【００２２】
図２（ｂ）は、基本次元文字コード表データベース３００が格納する文字コードの概念図を示す。本実施形態に係る文字コード、例えば、漢字「廣」の文字コードは、言語情報を示すベクトルと、音読みを示すベクトルと、読みの抑揚を示すベクトルと、文字の画数を示すベクトルと、部首を示すベクトルとの合成ベクトルにより表される。即ち、複数の文字コードは、これらのベクトルにより構成される多次元のユークリッド空間内の局所局面における、離散的な点の集合である。
【００２３】
このように、本実施形態に係る文字コードは、文字を単に識別するのみならず、文字の属性を示す属性情報を内包している。これにより、情報処理装置１０は、文字コードを用いて、文字の属性を示す他の情報を用いることなく、文字の属性に応じた処理を行うことができる。
なお、各属性情報を示すデータのサイズは、本図の例においては、１バイトであるが、本図の例に限定されるものではない。例えば、各属性情報のサイズは、属性情報の内容に応じて異なっていてもよい。また、文字コードに内包される属性の数及び種類は、本図の例に限定されない。情報処理装置１０は、属性情報として、本図に示した属性情報の何れか、例えば、読みの抑揚を示す情報を含んでいなくともよい。
【００２４】
図３は、拡張情報項目表データベース３１０の詳細を示す。拡張情報項目表データベース３１０は、拡張情報の種類を識別する拡張情報識別情報に対応付けて、拡張情報の種類を示す拡張種類情報を格納している。例えば、「訓読み」を示す拡張種類情報の、拡張情報識別情報は、０１である。また、「音読み」を示す拡張種類情報の、拡張情報識別情報は、０２である。また、「人名漢字」を示す拡張種類情報の、拡張情報識別情報は、０３である。また、「地名漢字」を示す拡張種類情報の、拡張情報識別情報は、０４である。
また、これらの拡張情報識別情報及び拡張種類情報は、文字コードに付加される新たな次元として、利用者により追加されてもよい。この場合、好ましくは、拡張情報項目表データベース３１０は、利用者が拡張情報識別情報及び拡張種類情報を追加するための記憶領域を、予め有している。
【００２５】
図４は、格納位置指定情報５０の詳細を示す。格納位置指定情報５０は、拡張情報の種類を識別する拡張情報識別情報に、当該拡張情報が文字コードのうち何れのフィールドに格納されているかを示すフィールド識別情報を対応付けた情報である。フィールド識別情報とは、例えば、各拡張情報が、文字コードにおいて属性情報の次に連続する領域の、先頭から何番目の領域に格納されているかを示す情報である。一例としては、拡張情報識別情報が０３である拡張情報、即ち「人名漢字であるか否か」を示す情報は、文字コードにおいて属性情報の次に連続する領域の、先頭からの順番が１番目のフィールドに格納されている。
【００２６】
これにより、例えば文字列検索部２５０は、文書情報を生成したアプリケーションプログラムの種類が、アプリケーションプログラム２０−１とは異なる場合であっても、拡張情報が、文字コードの何れのフィールドに格納されているかを特定し、当該拡張情報に基づいて文字列を適切に操作することができる。例えば、属性情報入力部２３０は、人名漢字として用いられる文字を検索する旨の検索指示を利用者から受け付けると、人名漢字に対応する拡張情報識別情報が０３である旨を拡張情報項目表データベース３１０に基づき特定し、０３の拡張情報識別情報を有する拡張情報が１番目のフィールドに格納されている旨を格納位置指定情報５０により特定する。この結果、文字列検索部２５０は、検索に用いる拡張情報（例えば、検索のキーとなる属性）が格納されているフィールドを適切に選択することができる。同様に、文字列並替部２４０は、利用者から受け付けた拡張種類情報が、文字コードの何れのフィールドに格納されているかを特定し、文字列を適切に並び替えることができる。
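The two-step lookup described here (extended type → extended information ID via the item table of FIG. 3, then ID → field position via the storage location designation information 50 of FIG. 4) can be sketched as below. This is only an illustration under stated assumptions: the attribute part is taken to be 5 bytes and each extension field 1 byte; all function and variable names are hypothetical. The IDs 01 to 04 and the mapping "03 → first field" come from the text; the mapping "01 → second field" and the sample extension bytes are invented for the example.

```python
# Extended information item table (FIG. 3): extended type -> ID.
EXTENSION_ITEM_TABLE = {
    "kun_reading": 0x01,
    "on_reading": 0x02,
    "personal_name_kanji": 0x03,
    "place_name_kanji": 0x04,
}

ATTRIBUTE_BYTES = 5  # assumption: five 1-byte attributes precede the extensions


def resolve_field_offset(extension_type, storage_location_info, field_size=1):
    """Return the byte offset of the field holding the given extension type."""
    ext_id = EXTENSION_ITEM_TABLE[extension_type]
    field_no = storage_location_info[ext_id]  # 1-based position after attributes
    return ATTRIBUTE_BYTES + (field_no - 1) * field_size


def read_extension(char_code, extension_type, storage_location_info):
    """Read one extension byte from a character code."""
    return char_code[resolve_field_offset(extension_type, storage_location_info)]


# Storage location designation information 50 supplied with a document.
# 0x03 -> field 1 is from the text; 0x01 -> field 2 is a hypothetical entry.
storage_location_info_50 = {0x03: 1, 0x01: 2}

# Attribute bytes of the example character followed by two made-up extension bytes.
code = bytes.fromhex("013F030D03" "01" "2A")
print(read_extension(code, "personal_name_kanji", storage_location_info_50))
```

A document produced by a different application would ship a different `storage_location_info_50`, and the same lookup would then land on a different field without any change to the search or sort logic.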
【００２７】
図５は、情報処理装置１０のフローチャートを示す。文書情報入力部２１０は、文書情報及び格納位置指定情報５０を入力する（Ｓ６００）。そして、属性情報入力部２３０は、属性情報を入力する（Ｓ６１０）。アプリケーション種別検出部２２０により、文書情報を生成したアプリケーションプログラムの種類が、アプリケーションプログラム２０−１と異なると判断された場合に（Ｓ６２０：ＹＥＳ）、文字列並替部２４０又は文字列検索部２５０は、格納位置指定情報５０及び拡張情報項目表データベース３１０に基づき、拡張情報が格納されている文字コード中のフィールドを選択する（Ｓ６３０）。
【００２８】
文字列並替部２４０は、文書情報に含まれる複数の文字列を、複数の文字列のそれぞれに含まれる各文字に対応する属性情報及び／又は拡張情報に基づき並び替える。また、文字列検索部２５０は、文書情報に含まれる複数の文字列のうち、属性情報及び／又は拡張情報を含む少なくとも一の文字列を検索する（Ｓ６４０）。そして、文字列出力部２６０は、Ｓ６４０の処理の結果生成された文字列を、当該文字列を構成する文字の属性情報に基づいて、外部に出力する（Ｓ６５０）。
【００２９】
図６は、文書情報入力部２１０が入力する文書情報の第１の例を示す。文書情報入力部２１０は、文書情報に含まれる文字を、属性情報に拡張情報を対応付けた文字コードとして、入力する。属性情報は、図２（ａ）において説明した属性情報と略同一であるので説明を省略する。拡張情報とは、文字コードの文字特定情報に付加された、文字コードにより識別される文字の属性を示す情報であり、例えば、当該文字が人名漢字であるか否かを示す情報及び当該文字の訓読みを示す情報である。例えば、本図における００は、人名用に用いられる漢字でない旨を示し、００以外の値は、人名用に用いられる漢字である旨を示す。
【００３０】
これにより、例えば、情報処理装置１０は、人名を示す文字列のうち、所定の読みを有する文字列を適切に検索することができる。また、情報処理装置１０は、日本語として用いられる複数の文字を、画数が少ない順に並び替えることができる。このように、文字列検索部２５０は、検索の指示内容に応じて拡張情報を参照することにより、検索範囲を人名漢字のみに適切に絞り込むことができる。例えば、文字列検索部２５０は、人名漢字が用いられている文字列を検索する旨の検索指示を受けた場合に、拡張情報を参照することにより、検索範囲を人名漢字のみに適切に絞り込むことができる。
【００３１】
図７は、文書情報入力部２１０が入力する文書情報の第２の例を示す。文書情報は、それぞれが文字列の一例である「１９９８年４月１日」、「○○」、及び「××」とを含み、「１９９８年４月１日に、○○が、××社に入社した。」の文章を構成する。例えば、「１９９８年４月１日」を構成する各文字の文字コードは、拡張情報として、当該文字が日付を示す文字列を構成している旨の情報を含んでいる。このように、文書情報は、文字列として、当該文字列として表された語句の属性を、拡張情報として含む文字コードを格納している。即ち、情報処理装置１０は、文字コードのみにより、他の付加的な情報、例えば、ＨＴＭＬ又はＸＭＬ等におけるタグ等を用いることなく、構造化文書を実現できる。
【００３２】
図８は、変形例における情報処理装置１０の機能ブロック図を示す。本例における情報処理装置１０は、図１に示した情報処理装置１０に、更に、文字列格納部２７０と、文字列選択部２８０とを備える。他の構成について、本例における情報処理装置１０は、図１の情報処理装置１０と略同一の構成を取るので、相違点を説明する。
【００３３】
文字列格納部２７０は、アプリケーションプログラム２０−１の種類に応じて予め定められた複数の文字列を格納している。そして、文書情報入力部２１０は、文書情報を、複数の文字列のそれぞれにおける先頭の文字の文字コードとして入力する。例えば、「日本語」という文字列を出力するべく、文字列格納部２７０は、文字列「本語」を予め格納しており、文書情報入力部２１０は、文書情報として、「日本語」を示す「日」を入力する。文字列選択部２８０は、文書情報入力部２１０により入力された先頭の文字の当該文字コードに含まれる文字識別情報に基づき、文字列格納部２７０に格納されている文字列の中から一の文字列である「本語」を選択し、文字列出力部２６０に送る。文字列出力部２６０は、これを受けて、文書情報入力部２１０により入力された文字コードに対応する文字「日」と、文字列選択部２８０により選択された一の文字列である「本語」とを、出力する。これにより、長い固有名詞を先頭の１文字で表すことができるので、文書情報のデータサイズを小さくすることができる。
【００３４】
図９は、変形例における文書情報及び文字列の一例を示す。文書情報入力部２１０は、文書情報を、文字列の先頭の文字の文字コードとして入力する。例えば、文書情報入力部２１０は、「日本語」の先頭の文字列「日」の文字コードとして、言語の種類、音読み、画数、及び部首を含む属性情報と、「日」の次に続いて出力されるべき後続文字列「本語」を識別する文字識別情報とを入力する。後続文字列を識別する文字識別情報とは、より具体的には、後続文字列が格納されている文字列格納部２７０内の位置を示すポインタ情報であってもよいし、後続文字列が文字列格納部２７０内において格納されている順序を示す情報であってもよい。更に、文字列格納部２７０は、後続文字列を構成する各文字について、当該文字の次に出力されるべき文字を識別する情報であるポインタを、各文字の文字コードとして含む。この場合、文字列の終端の文字コードは、文字列の終端を示す情報として、ＮＵＬＬ情報を含む。
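The pointer-chained storage described in this paragraph can be sketched as a small linked structure. The following is a hedged illustration only: `StoredChar`, `STRING_STORE`, and `expand` are hypothetical names, list indices stand in for the pointer information, and Python's `None` plays the role of the NULL terminator.

```python
from typing import List, NamedTuple, Optional


class StoredChar(NamedTuple):
    """One entry of the character string storage unit (hypothetical model)."""
    char: str
    next_index: Optional[int]  # index of the next character; None marks the end


# Storage holding the successor string "本語" for the head character "日".
STRING_STORE: List[StoredChar] = [
    StoredChar("本", 1),
    StoredChar("語", None),
]


def expand(head_char: str, successor_index: Optional[int],
           store: List[StoredChar]) -> str:
    """Output the head character followed by its chained successor string."""
    out = [head_char]
    idx = successor_index
    while idx is not None:
        entry = store[idx]
        out.append(entry.char)
        idx = entry.next_index
    return "".join(out)


print(expand("日", 0, STRING_STORE))  # 日本語
```

Only the head character and one index travel with the document, which is where the size reduction mentioned in the text comes from.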
【００３５】
このように、情報処理装置１０は、複数の文字により構成される文字列を、当該文字列の先頭の１文字を示す文字コードにより表すことができる。これにより、情報処理装置１０は、所定の文字列、例えば、「日本語」という単語が頻繁に用いられる場合には、外部から入力する文書情報のデータサイズを小さくすることができる。更に、情報処理装置１０は、語句を示す文字列を一体として扱うことができるので、削除処理等により語句の一部のみが削除され、意味を形成しない文字が残存することを防ぐことができる。
【００３６】
図１０は、上記実施形態及び変形例に係る情報処理装置１０のハードウェア構成の一例を示す。実施形態又は変形例に係る情報処理装置１０は、ホストコントローラ１０８２により相互に接続されるＣＰＵ１０００、ＲＡＭ１０２０、グラフィックコントローラ１０７５、及び表示装置１０８０を有するＣＰＵ周辺部と、入出力コントローラ１０８４によりホストコントローラ１０８２に接続される通信インターフェイス１０３０、ハードディスクドライブ１０４０、及びＣＤ−ＲＯＭドライブ１０６０を有する入出力部と、入出力コントローラ１０８４に接続されるＲＯＭ１０１０、フレキシブルディスクドライブ１０５０、及び入出力チップ１０７０を有するレガシー入出力部とを備える。
【００３７】
ホストコントローラ１０８２は、ＲＡＭ１０２０と、高い転送レートでＲＡＭ１０２０をアクセスするＣＰＵ１０００及びグラフィックコントローラ１０７５とを接続する。ＣＰＵ１０００は、ＲＯＭ１０１０及びＲＡＭ１０２０に格納されたプログラムに基づいて動作し、各部の制御を行う。グラフィックコントローラ１０７５は、ＣＰＵ１０００等がＲＡＭ１０２０内に設けたフレームバッファ上に生成する画像データを取得し、表示装置１０８０上に表示させる。これに代えて、グラフィックコントローラ１０７５は、ＣＰＵ１０００等が生成する画像データを格納するフレームバッファを、内部に含んでもよい。
【００３８】
入出力コントローラ１０８４は、ホストコントローラ１０８２と、比較的高速な入出力装置である通信インターフェイス１０３０、ハードディスクドライブ１０４０、及びＣＤ−ＲＯＭドライブ１０６０を接続する。通信インターフェイス１０３０は、ネットワークを介して他の装置と通信する。ハードディスクドライブ１０４０は、情報処理装置１０が使用するプログラム及びデータを格納する。ＣＤ−ＲＯＭドライブ１０６０は、ＣＤ−ＲＯＭ１０９５からプログラム又はデータを読み取り、ＲＡＭ１０２０を介して入出力チップ１０７０に提供する。
【００３９】
また、入出力コントローラ１０８４には、ＲＯＭ１０１０と、フレキシブルディスクドライブ１０５０や入出力チップ１０７０等の比較的低速な入出力装置とが接続される。ＲＯＭ１０１０は、情報処理装置１０の起動時にＣＰＵ１０００が実行するブートプログラムや、情報処理装置１０のハードウェアに依存するプログラム等を格納する。フレキシブルディスクドライブ１０５０は、フレキシブルディスク１０９０からプログラム又はデータを読み取り、ＲＡＭ１０２０を介して入出力チップ１０７０に提供する。入出力チップ１０７０は、フレキシブルディスク１０９０や、例えばパラレルポート、シリアルポート、キーボードポート、マウスポート等を介して各種の入出力装置を接続する。
【００４０】
情報処理装置１０に提供されるプログラムは、フレキシブルディスク１０９０、ＣＤ−ＲＯＭ１０９５、又はＩＣカード等の記録媒体に格納されて利用者によって提供される。プログラムは、記録媒体から読み出され、入出力チップ１０７０を介して情報処理装置１０にインストールされ、情報処理装置１０において実行される。
【００４１】
情報処理装置１０にインストールされて実行されるプログラムは、文書情報入力モジュールと、アプリケーション種別検出モジュールと、属性情報入力モジュールと、文字列並替モジュールと、文字列検索モジュールと、文字列選択モジュールと、文字列出力モジュールとを含む。各モジュールが情報処理装置１０に働きかけて行わせる動作は、図１から図９において説明した情報処理装置１０における、対応する部材の動作と同一であるから、説明を省略する。
【００４２】
以上に示したプログラム又はモジュールは、外部の記憶媒体に格納されてもよい。記憶媒体としては、フレキシブルディスク１０９０、ＣＤ−ＲＯＭ１０９５の他に、ＤＶＤやＰＤ等の光学記録媒体、ＭＤ等の光磁気記録媒体、テープ媒体、ＩＣカード等の半導体メモリ等を用いることができる。また、専用通信ネットワークやインターネットに接続されたサーバシステムに設けたハードディスク又はＲＡＭ等の記憶装置を記録媒体として使用し、ネットワークを介してプログラムを情報処理装置１０に提供してもよい。
【００４３】
以上の実施形態及び変形例から明らかなように、本実施形態及び変形例に係る文字コードは、文字を単に識別するのみならず、文字の属性を示す属性情報を内包している。これにより、情報処理装置１０は、文字コードを用いて、文字の属性を示す他の情報を用いることなく、文字の属性に応じた文字列操作を行うことができる。
例えば、文字コードが、文字の属性として、文字の属する言語を含んでいる場合には、情報処理装置１０は、文字の言語を特定する処理を別途行うことなく、複数の種類の言語に対応した文字列操作を行うことができる。これにより、多数の種類の文字が混在するインターネットにおいても、情報処理装置１０は、言語の種類及び言語に関連する文化等の情報に基づいて、適切かつ効率的に、文字列を操作することができる。
【００４４】
以上、本発明を実施形態を用いて説明したが、本発明の技術的範囲は上記実施形態に記載の範囲には限定されない。上記実施形態に、多様な変更または改良を加えることができる。そのような変更または改良を加えた形態も本発明の技術的範囲に含まれ得ることが、特許請求の範囲の記載から明らかである。
【００４５】
以上に示した実施形態によると、以下の各項目に示す情報処理装置、制御方法、プログラム、データ記録媒体、及びプログラム記録媒体を実現できる。
【００４６】
（項目１）複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置であって、前記文字コードのそれぞれは、当該文字コードに対応する文字の読み、部首、又は画数の少なくとも１つを識別する属性情報を、当該文字コードにより当該文字を特定するために必要な文字特定情報として含み、当該情報処理装置は、複数の文字列を含む文書情報を入力する文書情報入力部と、前記文書情報に含まれる前記複数の文字列を、前記複数の文字列のそれぞれに含まれる各文字の文字コードに含まれる前記属性情報に基づき並び替える文字列並替部とを備える情報処理装置。
（項目２）前記文字コードのそれぞれは、当該文字コードの文字特定情報に付加された、当該文字コードにより識別される文字の属性を示す拡張情報を更に含み、前記文字列並替部は、前記文書情報に含まれる前記複数の文字列を、前記複数の文字列のそれぞれに含まれる各文字に対応する前記拡張情報に更に基づき並び替える項目１記載の情報処理装置。
【００４７】
（項目３）前記文書情報入力部は、前記拡張情報に、当該拡張情報が前記文字コードのうち何れのフィールドに格納されているかを示す情報を対応付けた格納位置指定情報を、前記文書情報に対応付けて更に取得し、前記文字列並替部は、前記文字コードのうち前記拡張情報が格納されているフィールドを、前記格納位置指定情報に基づいて選択し、選択した当該フィールドに格納された拡張情報に基づいて、前記複数の文字列を並び替える項目２記載の情報処理装置。
（項目４）前記文字コードのそれぞれにおける前記拡張情報が格納されるフィールドは、当該文字コードを含む前記文書情報を生成するアプリケーションプログラムに応じて異なり、当該情報処理装置は、前記文書情報入力部により入力された前記文書情報を生成したアプリケーションプログラムの種類を検出するアプリケーション種別検出部と、前記アプリケーション種別検出部により検出されたアプリケーションプログラムの種類が、当該情報処理装置を前記文字列並替部として機能させるアプリケーションプログラムの種類と異なる場合に、前記文字列並替部は、前記格納位置指定情報に基づき、並び替えに用いる拡張情報が格納されているフィールドを選択する項目３記載の情報処理装置。
（項目５）前記文書情報入力部は、利用者から入力された入力文字列を、前記属性情報及び前記拡張情報を含む前記文字コードを複数配列した出力文字列に変換し、当該出力文字列を前記文書情報として入力する項目２記載の情報処理装置。
【００４８】
（項目６）複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置であって、前記文字コードのそれぞれは、当該文字コードに対応する文字の読み、部首、又は画数の少なくとも１つを識別する属性情報を、当該文字コードにより当該文字を特定するために必要な文字特定情報として含み、当該情報処理装置は、複数の文字列を含む文書情報を入力する文書情報入力部と、前記属性情報を入力する属性情報入力部と、前記属性情報入力部により入力された前記属性情報に基づき、前記複数の文字列の中から少なくとも一の文字列を検索する文字列検索部とを備える情報処理装置。
（項目７）複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置であって、前記文字コードのそれぞれは、当該文字コードに対応する文字の読みを識別する属性情報を、当該文字コードにより当該文字を特定するために必要な文字特定情報として含み、更に、当該文字コードに対応する文字の読みの抑揚を識別する情報を含み、当該情報処理装置は、複数の文字を含む文書情報を入力する文書情報入力部と、前記複数の文字を読み上げる音声を、前記複数の文字列のそれぞれに含まれる各文字の文字コードに対応する前記属性情報に基づいて出力する文字列出力部とを備える情報処理装置。
（項目８）前記文書情報入力部は、利用者から入力された入力文字列を、前記属性情報及び前記読みの抑揚を識別する情報を含む前記文字コードを複数配列した出力文字列に変換し、当該出力文字列を前記文書情報として入力する項目７記載の情報処理装置。
【００４９】
（項目９）入力された文字コードに基づいて文字列を出力する情報処理装置であって、前記文字コードのそれぞれは、当該文字コードに対応する文字の次に続いて出力されるべき文字列を識別する文字列識別情報を含み、複数の文字列を格納する文字列格納部と、複数の文字列を含む文書情報を、当該複数の文字列のそれぞれにおける先頭の文字の文字コードとして入力する文書情報入力部と、前記文字コード入力部により入力された当該文字コードに含まれる文字列識別情報に基づき、前記文字列格納部に格納されている文字列の中から一の文字列を選択する文字列選択部と、前記文字コード入力部により入力された文字コードに対応する文字と、前記文字列選択部により選択された前記一の文字列とを出力する文字列出力部とを備える情報処理装置。
（項目１０）複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置であって、前記文字コードのそれぞれは、当該文字コードに対応する文字が属する言語を識別する言語情報を、当該文字コードにより当該文字を特定するために必要な情報として含み、当該情報処理装置は、複数の文字コードを含む文書情報を入力する文書情報入力部と、前記文書情報に含まれる前記複数の文字のそれぞれを、当該文字の文字コードに含まれる前記言語情報に基づいて出力する文字列出力部とを備える情報処理装置。
【００５０】
（項目１１）文字コード及び文字を対応付けて記録したデータ記録媒体であって、前記文字コードのそれぞれは、当該文字コードに対応する文字の読み、部首、又は画数の少なくとも１つを識別する属性情報を、当該文字コードにより当該文字を特定するために必要な文字特定情報として含み、複数の文字のそれぞれにおいて、当該文字の表示又は印刷に用いる文字の形状を識別する形状識別情報を格納する形状識別情報格納領域と、前記形状識別情報に対応付けて、当該文字の文字コードを格納する文字コード格納領域とを備えるデータ記録媒体。
（項目１２）複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置を制御する制御方法であって、前記文字コードのそれぞれは、当該文字コードに対応する文字の読み、部首、又は画数の少なくとも１つを識別する属性情報を、当該文字コードにより当該文字を特定するために必要な文字特定情報として含み、複数の文字列を含む文書情報を入力する文書情報入力段階と、前記文書情報に含まれる前記複数の文字列を、前記複数の文字列のそれぞれに含まれる各文字の文字コードに含まれる前記属性情報に基づき並び替える並替段階とを備える制御方法。
【００５１】
（項目１３）複数の文字のそれぞれを、当該文字に対応する文字コードにより識別する情報処理装置を制御するプログラムであって、前記文字コードのそれぞれは、当該文字コードに対応する文字の読み、部首、又は画数の少なくとも１つを識別する属性情報を、当該文字コードにより当該文字を特定するために必要な文字特定情報として含み、前記情報処理装置を、複数の文字列を含む文書情報を入力する文書情報入力部と、前記文書情報に含まれる前記複数の文字列を、前記複数の文字列のそれぞれに含まれる各文字の文字コードに含まれる前記属性情報に基づき並び替える文字列並替部として機能させるプログラム。
（項目１４）項目１３に記載のプログラムを記録したプログラム記録媒体。
【００５２】
【発明の効果】
上記説明から明らかなように、本発明によれば文字列を適切に処理することができる。
【図面の簡単な説明】
【図１】図１は、情報処理装置１０の機能ブロック図を示す。
【図２】図２（ａ）は、基本次元文字コード表データベース３００の詳細を示す。
図２（ｂ）は、基本次元文字コード表データベース３００が格納する文字コードの概念図を示す。
【図３】図３は、拡張情報項目表データベース３１０の詳細を示す。
【図４】図４は、格納位置指定情報５０の詳細を示す。
【図５】図５は、情報処理装置１０のフローチャートを示す。
【図６】図６は、文書情報入力部２１０が入力する文書情報の第１の例を示す。
【図７】図７は、文書情報入力部２１０が入力する文書情報の第２の例を示す。
【図８】図８は、変形例における情報処理装置１０の機能ブロック図を示す。
【図９】図９は、変形例における文書情報及び文字列の一例を示す。
【図１０】図１０は、上記実施形態及び変形例に係る情報処理装置１０のハードウェア構成の一例を示す。
【符号の説明】
１０情報処理装置
２０アプリケーションプログラム
３０オペレーティングシステム
５０格納位置指定情報
２１０文書情報入力部
２２０アプリケーション種別検出部
２３０属性情報入力部
２４０文字列並替部
２５０文字列検索部
２６０文字列出力部
２７０文字列格納部
２８０文字列選択部
３００基本次元文字コード表データベース
３１０拡張情報項目表データベース

[0001]
[Technical Field of the Invention]
The present invention relates to an information processing apparatus, a control method, a program, a data recording medium, and a program recording medium. In particular, the present invention relates to an information processing apparatus that manipulates character strings, and to a corresponding control method, program, data recording medium, and program recording medium.
[0002]
[Prior art]
In recent years, with the development of network systems, systems for retrieving desired information from information on the network have been used. In such a system, character string manipulation processing such as character string search and character string rearrangement is used.
See Patent Document 1 as an example of character string manipulation.
[Patent Document 1]
JP-A-6-59857
[0003]
[Problems to be solved by the invention]
However, the above system cannot perform character string manipulation based on the attributes of characters. For example, a kanji or similar character may have several pronunciations and meanings, so a kanji cannot be searched for by its reading. In addition, it is difficult to determine which language a given kanji is being used in, so a search may return characters written in a language other than the one the user wants, and the character strings in the search results may be garbled.
[0004]
As an example, the conventional Unicode scheme assigns a single character code to a character that is used in common in Japanese, Chinese, and Korean. Consequently, when searching web documents and the like on a network, the above system cannot determine from the character codes whether a document is written in Japanese, Chinese, or Korean, and has to be equipped with a separate function for identifying the language properly.
Accordingly, an object of the present invention is to provide an information processing apparatus, a control method, a program, a data recording medium, and a program recording medium that can solve the above-described problems. This object is achieved by a combination of features described in the independent claims. The dependent claims define further advantageous specific examples of the present invention.
[0005]
[Means for Solving the Problems]
That is, according to a first aspect of the present invention, there is provided an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each character code includes character specifying information necessary to specify the character by the character code, and extended information that is added to the character specifying information of the character code and indicates an attribute of the phrase formed when the characters identified by the character codes are represented as a character string; the area in which the extended information is stored in each character code differs depending on the application program that generates the document information including the character code; and the information processing apparatus comprises: a document information input unit that inputs document information including a plurality of character strings and further acquires, in association with the document information, storage location designation information that associates the extended information with information indicating in which area of the character code the extended information is stored; an application type detection unit that detects the type of the application program that generated the document information input by the document information input unit; and a character string rearrangement unit that, according to the type of application program detected by the application type detection unit, selects, based on the storage location designation information, the area in which the extended information used for rearrangement is stored, and rearranges the plurality of character strings included in the document information based on the extended information stored in the selected area for each character included in each of the plurality of character strings. A control method for controlling the information processing apparatus, a program, and a program recording medium on which the program is recorded are also provided.
According to a second aspect of the present invention, there is provided an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each character code includes character specifying information necessary to specify the character by the character code, and extended information that is added to the character specifying information of the character code and indicates an attribute of the phrase formed when the characters identified by the character codes are represented as a character string; the area in which the extended information is stored in each character code differs depending on the application program that generates the document information including the character code; and the information processing apparatus comprises: a document information input unit that inputs document information including a plurality of character strings and further acquires, in association with the document information, storage location designation information that associates the extended information with information indicating in which area of the character code the extended information is stored; an application type detection unit that detects the type of the application program that generated the document information input by the document information input unit; and a character string search unit that, according to the type of application program detected by the application type detection unit, selects, based on the storage location designation information, the area in which the extended information used for rearrangement is stored, and searches for at least one character string from among the plurality of character strings based on the extended information stored in the selected area for each character included in each of the plurality of character strings. A control method for controlling the information processing apparatus, a program, and a program recording medium on which the program is recorded are also provided.
The above summary of the invention does not enumerate all the necessary features of the present invention, and sub-combinations of these feature groups can also be the invention.
[0006]
[Embodiments of the Invention]
Hereinafter, the present invention will be described through embodiments of the invention. However, the following embodiments do not limit the invention according to the claims, and not all combinations of features described in the embodiments are necessarily essential to the solution of the invention.
[0007]
FIG. 1 shows a functional block diagram of the information processing apparatus 10. The information processing apparatus 10 acquires document information including a plurality of character strings from the outside as a set of character codes for specifying the characters included in the document information. Here, each character code includes attribute information that identifies at least one of the reading, radical, or stroke count of the corresponding character, as the character specifying information necessary for specifying the character by that character code. As a result, using only the character codes that specify the characters, and without any other information indicating character attributes, the information processing apparatus 10 can perform processing that depends on character attributes, such as searching for characters, rearranging character strings, reading characters aloud, and displaying characters.
[0008]
The information processing apparatus 10 includes application programs 20-1 to 20-N that process information using document information, and an operating system 30 that provides character code information common to the plurality of application programs. The application program 20-1 causes the information processing apparatus 10 to function as a document information input unit 210, an application type detection unit 220, an attribute information input unit 230, a character string rearrangement unit 240, a character string search unit 250, and a character string output unit 260. Since each of the application programs 20-2 to 20-N is substantially the same as the application program 20-1, their description is omitted.
[0009]
The document information input unit 210 inputs document information including a plurality of character strings, and sends the document information to the application type detection unit 220, the character string rearrangement unit 240, and the character string search unit 250. As an input method, the document information input unit 210 preferably receives from the user an input character string indicating the reading of a character string, performs kana-kanji conversion of the input character string into an output character string in which a plurality of character codes are arranged, and inputs the output character string as the document information.
[0010]
Here, the character code input by the document information input unit 210 includes, in addition to the attribute information that serves as the character specifying information, extended information that is added to the character specifying information of the character code and indicates an attribute of the character identified by the character code. For example, when the character is a kanji, the attribute information is a combination of the language to which the character belongs, the on-reading (on'yomi) of the character, its radical, and its stroke count. The extended information is information identifying the kun-reading (kun'yomi) of the character and information indicating whether or not the character is a kanji for use in personal names.
[0011]
Alternatively, the document information input unit 210 may input document information generated by the application program 20-2 by acquiring it from the application program 20-2. In this case, the document information input unit 210 further acquires, in association with the document information, storage location designation information 50, which associates information identifying the extended information with information indicating in which field of the character code the extended information is stored, and sends it to the application type detection unit 220.
The document information is, for example, a text document including a plurality of character strings. Alternatively, the document information may be a database including a plurality of character strings, or a chart or table that includes a plurality of character strings as entries.
[0012]
The application type detection unit 220 detects, based on the document information or the storage location designation information received from the document information input unit 210, the type of the application program that generated the document information input by the document information input unit 210, and sends the detection result, together with the storage location designation information, to the character string rearrangement unit 240 and the character string search unit 250. For example, the application type detection unit 220 may identify the type of the application program that generated the document information from the extension of the name of the file that stores the document information or the storage location designation information, or by analyzing the contents of that file.
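The two detection strategies named above (file-name extension first, file contents as a fallback) might be sketched as follows. This is an assumption-laden illustration: the extensions, the 4-byte signatures, and the application labels are all invented for the example, since the text does not specify any concrete formats.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mappings; the patent does not define concrete extensions or signatures.
EXTENSION_TO_APP = {".apa": "application-20-1", ".apb": "application-20-2"}
SIGNATURE_TO_APP = {b"APP1": "application-20-1", b"APP2": "application-20-2"}


def detect_application_type(file_name: str, content: bytes = b"") -> Optional[str]:
    """Guess which application generated a document.

    Tries the file-name extension first, then the leading bytes of the content.
    Returns None when neither strategy recognizes the file.
    """
    app = EXTENSION_TO_APP.get(Path(file_name).suffix.lower())
    if app is not None:
        return app
    return SIGNATURE_TO_APP.get(content[:4])


print(detect_application_type("report.APB"))
print(detect_application_type("report.dat", b"APP1 rest of file"))
```

The detected type would then decide whether the storage location designation information needs to be consulted at all.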
[0013]
The attribute information input unit 230 inputs, in response to an instruction from a user or the like, attribute type information indicating a type of attribute information and/or extended type information indicating a type of extended information, in association with a rearrangement instruction for rearranging characters, and sends it to the character string rearrangement unit 240. The attribute information input unit 230 also inputs attribute information and/or extended information from the outside in association with a search instruction for searching for characters, and sends it to the character string search unit 250.
[0014]
When the information received from the attribute information input unit 230 is attribute type information, the character string rearrangement unit 240 rearranges the plurality of character strings included in the document information received from the document information input unit 210, based on the attribute information contained in the character code of each character in each of the character strings, and sends the rearranged character strings to the character string output unit 260. For example, the character string rearrangement unit 240 rearranges based on the kind of information, among the attribute information contained in the character code of each character, that is indicated by the attribute type information received from the attribute information input unit 230. As an example, when the attribute information of each character is information identifying the reading, radical, or stroke count of the character, and the attribute type information received from the attribute information input unit 230 indicates the stroke count, the character string rearrangement unit 240 rearranges the character strings in ascending order of the stroke count of the first character of each character string.
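The stroke-count example above can be sketched in a few lines. This is a minimal illustration that assumes each character code is a byte string whose fourth byte (index 3) holds the stroke count, as in the 5-byte layout of FIG. 2; the function name and the second and third sample codes are hypothetical.

```python
from typing import List

STROKES_INDEX = 3  # assumption: stroke count is the 4th attribute byte

# A character string is modeled as a list of character codes (bytes objects).
CharString = List[bytes]


def sort_by_first_char_strokes(strings: List[CharString]) -> List[CharString]:
    """Order strings by the stroke count of their first character, ascending."""
    return sorted(strings, key=lambda s: s[0][STROKES_INDEX])


a = [bytes.fromhex("013F030D03")]  # first character has 0x0D = 13 strokes
b = [bytes.fromhex("0110010403")]  # hypothetical code with 4 strokes
c = [bytes.fromhex("0122020803")]  # hypothetical code with 8 strokes

ordered = sort_by_first_char_strokes([a, b, c])
print([s[0][STROKES_INDEX] for s in ordered])  # [4, 8, 13]
```

No lookup table is needed for the sort key: it is read straight out of the character code, which is the point the paragraph makes.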
[0015]
When the information received from the attribute information input unit 230 is extended type information, the character string rearrangement unit 240 rearranges the plurality of character strings included in the document information received from the document information input unit 210, further based on the extended information corresponding to each character in each of the character strings. For example, the character string rearrangement unit 240 selects the field of the character code that stores the extended information of the type indicated by the extended type information received from the attribute information input unit 230, based on the storage location designation information 50 received from the application type detection unit 220 and the extended information item table received from the extended information item table database 310, and rearranges the plurality of character strings in the document information further based on the extended information stored in the selected field.
[0016]
When the information received from the attribute information input unit 230 is attribute information, the character string search unit 250 searches the plurality of character strings included in the document information received from the document information input unit 210 for at least one character string that includes the received attribute information, and sends the search result, for example the character string found, to the character string output unit 260.
[0017]
In addition, when the information received from the attribute information input unit 230 is extended information, the character string search unit 250 searches the plurality of character strings included in the document information received from the document information input unit 210 for at least one character string that includes the extended information, and sends the search result to the character string output unit 260. For example, the character string search unit 250 selects the field of the character code that stores the extended information received from the attribute information input unit 230, based on the storage location designation information 50 received from the application type detection unit 220 and the extended information item table received from the extended information item table database 310, and searches for the character string by examining the extended information stored in the selected field.
[0018]
The character string output unit 260 outputs the character string received from the character string rearrangement unit 240 or the character string search unit 250 to the outside based on the attribute information of the characters constituting the character string. For example, based on the attribute information of a character, the character string output unit 260 selects, from the basic dimension character code table database 300, shape identification information identifying the shape of the character to be output, and outputs the character in the shape indicated by the shape identification information. As an example, the character string output unit 260 may select, from the attribute information constituting the character code, the language information indicating the language to which the corresponding character belongs, and display the character string in a font suitable for that language. As another example, the character string output unit 260 may select, from the attribute information constituting the character code, the information indicating the reading of the corresponding character and the inflection of that reading, and output speech that reads the character aloud based on this information.
[0019]
The operating system 30 is a program that provides common data and processing to the application programs 20-1 to 20-N, and includes a basic dimension character code table database 300 and an extended information item table database 310. The operating system 30 only needs to be a program that supplies common data and processing to the application programs 20-1 to 20-N. For example, the operating system 30 may be middleware that runs on another operating system and supplies common data and processing to the application programs 20-1 to 20-N.
[0020]
FIG. 2A shows details of the basic dimension character code table database 300. The basic dimension character code table database 300 is an example of a data recording medium according to the present invention. For each of a plurality of characters, it has a shape identification information storage area that stores shape identification information identifying the shape of the character used for displaying or printing the character, and a character code storage area that stores the character code of the character in association with the shape identification information. The shape identification information is a character font indicating a character shape, for example a bitmap font or an outline font. Alternatively, the shape identification information may be another character code for identifying a character, such as a JIS code, a Shift JIS code, or Unicode.
[0021]
The character code includes, as attribute information and in this order, language information indicating the language to which the character corresponding to the character code belongs, information indicating the reading of the character, information indicating the inflection of the reading, information indicating the number of strokes of the character, and information indicating the radical of the character. For example, the character code of the kanji “廣” includes, as the character specifying information needed to identify the character by the character code, attribute information indicating that the character belongs to Japanese, that its sound reading is “Kou”, that the inflection of the reading falls on the first syllable, that the number of strokes is 14, and that the radical is “madare”. As an example, if 01 indicates that the character belongs to Japanese, 3F that the reading is “Kou”, 03 that the inflection of the reading falls on the first syllable, 0D that the number of strokes is 13, and 03 that the radical is “madare”, the character code of the kanji “廣” is “013F030D03”.
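To make the layout concrete, the following Python sketch splits the example code “013F030D03” into its five attribute fields. It assumes one byte per attribute in the order given above; the `Attributes` type and the `decode` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Attributes(NamedTuple):
    """Attribute fields of a character code, one byte each (assumed layout)."""
    language: int
    reading: int
    inflection: int
    strokes: int
    radical: int


def decode(hex_code: str) -> Attributes:
    """Split a hexadecimal character code into its attribute fields."""
    raw = bytes.fromhex(hex_code)
    if len(raw) != 5:
        raise ValueError("expected five one-byte attribute fields")
    return Attributes(*raw)


attrs = decode("013F030D03")
# attrs.language is 0x01, attrs.reading is 0x3F, attrs.inflection is 0x03,
# attrs.strokes is 0x0D (decimal 13), and attrs.radical is 0x03.
```

A fixed-width layout such as this lets any single attribute be read by offset, without consulting a separate table.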
[0022]
FIG. 2B shows a conceptual diagram of the character codes stored in the basic dimension character code table database 300. A character code according to the present embodiment, for example the character code of the kanji “廣”, is represented as the composite of a vector indicating the language information, a vector indicating the sound reading, a vector indicating the inflection of the reading, a vector indicating the number of strokes, and a vector indicating the radical. That is, the plurality of character codes form a set of discrete, locally situated points in the multidimensional Euclidean space spanned by these vectors.
[0023]
As described above, the character code according to the present embodiment not only identifies the character but also includes attribute information indicating the attributes of the character. The information processing apparatus 10 can therefore use the character code alone, without any other information indicating the character attributes, to perform processing according to those attributes.
The size of the data indicating each piece of attribute information is 1 byte in the example of this figure, but is not limited to this. For example, the size of each piece of attribute information may differ depending on its contents. Further, the number and types of attributes included in the character code are not limited to the example in this figure. The character code need not include every piece of attribute information shown in the figure; for example, the information indicating the inflection of the reading may be omitted.
[0024]
FIG. 3 shows details of the extended information item table database 310. The extended information item table database 310 stores extended type information indicating the type of extended information in association with extended information identification information identifying that type. For example, the extended information identification information of the extended type information indicating “reading” is 01, that of the extended type information indicating “sound reading” is 02, that of the extended type information indicating “personal name kanji” is 03, and that of the extended type information indicating “place name kanji” is 04.
Further, the extended information identification information and the extended type information may be added by the user as a new dimension added to the character code. In this case, preferably, the extended information item table database 310 has a storage area for the user to add extended information identification information and extended type information in advance.
[0025]
FIG. 4 shows details of the storage location designation information 50. The storage location designation information 50 associates extended information identification information, which identifies the type of extended information, with field identification information indicating in which field of the character code that extended information is stored. The field identification information indicates, for example, the ordinal position of the area storing each piece of extended information, counted from the start of the area that follows the attribute information in the character code. As an example, the extended information whose extended information identification information is 03, that is, the information indicating whether or not the character is a personal name kanji, is stored in the first field counted from the start of the area following the attribute information in the character code.
[0026]
Thus, for example, even if the type of the application program that generated the document information differs from that of the application program 20-1, the character string search unit 250 can determine in which field of the character code the extended information is stored, and can appropriately manipulate character strings based on the extended information. For example, when the attribute information input unit 230 receives an instruction from the user to search for characters used as personal name kanji, the extended information item table database 310 indicates that the extended information identification information corresponding to personal name kanji is 03. The storage location designation information 50 in turn specifies that the extended information whose extended information identification information is 03 is stored in the first field. As a result, the character string search unit 250 can appropriately select the field in which the extended information used for the search (for example, an attribute serving as a search key) is stored. Similarly, the character string rearrangement unit 240 can determine in which field of the character code the extended information of the type indicated by the extended type information received from the user is stored, and can appropriately rearrange the character strings.
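The two-step lookup described above (extended type to identification information, then identification information to field) can be sketched in Python as follows. The table contents for identifiers 01 to 04 and the placement of identifier 03 in the first field follow the text; everything else is an assumption made for illustration: the five-byte attribute area, one byte per extended field, the second entry in `storage_location`, and the function name `extended_value`.

```python
from typing import Dict

# Extended information item table (FIG. 3): extended type -> identification information.
EXTENDED_ITEM_TABLE: Dict[str, int] = {
    "reading": 0x01,
    "sound reading": 0x02,
    "personal name kanji": 0x03,
    "place name kanji": 0x04,
}

# Storage location designation information (FIG. 4): identification information ->
# 1-based field position after the attribute area. The 0x01 entry is invented.
storage_location: Dict[int, int] = {0x03: 1, 0x01: 2}

ATTRIBUTE_BYTES = 5  # assumed size of the attribute area


def extended_value(code: bytes, ext_type: str, location: Dict[int, int]) -> int:
    """Return the extended information of the given type stored in a character code."""
    ext_id = EXTENDED_ITEM_TABLE[ext_type]      # e.g. "personal name kanji" -> 0x03
    field = location[ext_id]                    # e.g. 0x03 -> first field
    return code[ATTRIBUTE_BYTES + field - 1]    # assumes one byte per extended field


# Hypothetical character code: five attribute bytes followed by two extended fields.
code = bytes.fromhex("013F030D03" "01" "2A")
name_flag = extended_value(code, "personal name kanji", storage_location)
reading = extended_value(code, "reading", storage_location)
```

Keeping the field positions in a table supplied with the document, rather than hard-coding them, is what allows documents produced by different application programs to be read by the same routine.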
[0027]
FIG. 5 shows a flowchart of the operation of the information processing apparatus 10. The document information input unit 210 inputs the document information and the storage location designation information 50 (S600). The attribute information input unit 230 then inputs attribute information (S610). When the application type detection unit 220 determines that the type of the application program that generated the document information differs from that of the application program 20-1 (S620: YES), the character string rearrangement unit 240 or the character string search unit 250 selects the field of the character code in which the extended information is stored, based on the storage location designation information 50 and the extended information item table database 310 (S630).
[0028]
The character string rearrangement unit 240 rearranges the plurality of character strings included in the document information based on the attribute information and/or extended information corresponding to each character included in each of the plurality of character strings. In addition, the character string search unit 250 searches the plurality of character strings included in the document information for at least one character string that includes the attribute information and/or extended information (S640). The character string output unit 260 then outputs the character string produced by the processing of S640 to the outside based on the attribute information of the characters constituting the character string (S650).
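The flow from S600 to S650 can be outlined in Python as below. This is a rough sketch rather than the embodiment itself: it covers only rearrangement by one extended field of the first character, and it assumes a five-byte attribute area, one byte per extended field, and a default layout `NATIVE_LAYOUT` used when the document comes from the application program 20-1. The text does not describe that default, so it and the function name `process` are hypothetical.

```python
from typing import Dict, List

ATTRIBUTE_BYTES = 5                        # assumed size of the attribute area
NATIVE_APP = "20-1"
NATIVE_LAYOUT: Dict[int, int] = {0x03: 1}  # assumed layout used by application 20-1


def process(strings: List[List[bytes]], ext_id: int, generating_app: str,
            storage_location: Dict[int, int]) -> List[List[bytes]]:
    """Rearrange strings by one extended field of their first character."""
    # S600/S610: strings, storage_location and ext_id are the inputs.
    # S620: does the generating application differ from application 20-1?
    if generating_app != NATIVE_APP:
        field = storage_location[ext_id]   # S630: consult the designation information
    else:
        field = NATIVE_LAYOUT[ext_id]
    offset = ATTRIBUTE_BYTES + field - 1
    # S640: rearrange in ascending order of the selected field.
    result = sorted(strings, key=lambda s: s[0][offset])
    return result                          # S650: handed to the output stage


# Hypothetical one-character strings with two extended fields each.
s1 = [bytes.fromhex("013F030D030005")]
s2 = [bytes.fromhex("01100104020102")]
from_other_app = process([s1, s2], 0x03, "20-2", {0x03: 2})
from_native_app = process([s2, s1], 0x03, "20-1", {0x03: 2})
```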
[0029]
FIG. 6 shows a first example of document information input by the document information input unit 210. The document information input unit 210 inputs each character included in the document information as a character code in which extended information is associated with attribute information. The attribute information is substantially the same as the attribute information described in FIG. The extended information is information that is added to the character specifying information of the character code and indicates an attribute of the character identified by the character code, for example information indicating whether or not the character is a personal name kanji and information indicating the kun reading of the character. For example, 00 in the figure indicates that the character is not used for personal names, and a value other than 00 indicates that the character is used for personal names.
[0030]
Thereby, for example, the information processing apparatus 10 can appropriately search, among character strings indicating personal names, for a character string having a predetermined reading. In addition, the information processing apparatus 10 can rearrange a plurality of characters used in Japanese in ascending order of the number of strokes. The character string search unit 250 can also narrow the search range according to the content of the search instruction by referring to the extended information. For example, when it receives an instruction to search for character strings in which personal name kanji are used, the character string search unit 250 can appropriately narrow the search range to personal name kanji only.
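A search of this kind might look like the following Python sketch. It relies on the convention stated for FIG. 6 that 00 marks a character not used for personal names, and otherwise on assumptions: a five-byte attribute area with the reading in the second byte, the personal name flag in the first extended field, and a rule that a string counts as a personal name only if every character carries a non-zero flag. The function names and sample codes are hypothetical.

```python
from typing import List

ATTRIBUTE_BYTES = 5
READING_INDEX = 1                   # reading is the second attribute byte (assumed layout)
NAME_FLAG_OFFSET = ATTRIBUTE_BYTES  # assumed: the flag is the first extended field


def is_personal_name(string: List[bytes]) -> bool:
    """True if every character is flagged as a personal name kanji (flag != 00)."""
    return all(code[NAME_FLAG_OFFSET] != 0x00 for code in string)


def search_names_by_reading(strings: List[List[bytes]], reading: int) -> List[List[bytes]]:
    """Among personal-name strings, keep those whose first character has the given reading."""
    return [s for s in strings
            if s and is_personal_name(s) and s[0][READING_INDEX] == reading]


# Hypothetical one-character strings: five attribute bytes plus the name flag.
name_kou = [bytes.fromhex("013F030D0301")]    # reading 3F, flagged as a name kanji
plain_kou = [bytes.fromhex("013F03080200")]   # reading 3F, not a name kanji
name_other = [bytes.fromhex("012201050101")]  # reading 22, flagged as a name kanji
hits = search_names_by_reading([name_kou, plain_kou, name_other], 0x3F)
```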
[0031]
FIG. 7 shows a second example of document information input by the document information input unit 210. The document information contains the sentence “On April 1, 1998, XX joined XX Company.”, which includes “April 1, 1998”, “XX”, and “XX”, each of which is an example of a character string. For example, the character code of each character constituting “April 1, 1998” includes, as extended information, information indicating that the character is part of a character string representing a date. In this way, the document information stores each character string as character codes that carry, as extended information, the attribute of the word or phrase that the character string expresses. That is, the information processing apparatus 10 can realize a structured document using the character codes alone, without other additional information such as tags in HTML or XML.
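The idea of structure carried by the characters themselves can be illustrated with a short Python sketch. Each character is modeled as a pair of the character and a phrase tag standing in for the extended information; the tag values `DATE`, `PERSON`, and `COMPANY`, the helper `tag`, and the function `phrases` are all hypothetical, and a real character code would hold the tag in one of its extended fields.

```python
from itertools import groupby
from typing import List, Tuple

NONE, DATE, PERSON, COMPANY = 0, 1, 2, 3   # hypothetical phrase-attribute values

TaggedChar = Tuple[str, int]               # (character, extended information)


def tag(text: str, kind: int) -> List[TaggedChar]:
    """Attach the same phrase attribute to every character of text."""
    return [(ch, kind) for ch in text]


def phrases(doc: List[TaggedChar], wanted: int) -> List[str]:
    """Collect each maximal run of consecutive characters carrying the wanted attribute."""
    return ["".join(ch for ch, _ in run)
            for kind, run in groupby(doc, key=lambda pair: pair[1])
            if kind == wanted]


doc = (tag("On ", NONE) + tag("April 1, 1998", DATE) + tag(", ", NONE)
       + tag("XX", PERSON) + tag(" joined ", NONE) + tag("XX Company", COMPANY)
       + tag(".", NONE))
dates = phrases(doc, DATE)
```

No markup surrounds the date in `doc`; the boundaries of the phrase are recovered purely from the per-character attribute.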
[0032]
FIG. 8 is a functional block diagram of the information processing apparatus 10 according to a modification. The information processing apparatus 10 in this example includes a character string storage unit 270 and a character string selection unit 280 in addition to the components of the information processing apparatus 10 shown in FIG. Because the information processing apparatus 10 in this example otherwise has substantially the same configuration as the information processing apparatus 10 in FIG., only the differences are described.
[0033]
The character string storage unit 270 stores a plurality of character strings predetermined according to the type of the application program 20-1. The document information input unit 210 then inputs document information in which each of a plurality of character strings is represented by the character code of its first character. For example, to output the character string “日本語” (“Japanese”), the character string storage unit 270 stores the character string “本語” in advance, and the document information input unit 210 inputs, as document information, the character “日” standing for “日本語”. Based on the character string identification information included in the character code of the first character input by the document information input unit 210, the character string selection unit 280 selects one character string, “本語”, from among the character strings stored in the character string storage unit 270 and sends it to the character string output unit 260. In response, the character string output unit 260 outputs the character “日” corresponding to the character code input by the document information input unit 210, followed by “本語”, the one character string selected by the character string selection unit 280. Because a long proper noun can thus be represented by its first character, the data size of the document information can be reduced.
[0034]
FIG. 9 shows an example of document information and a character string in the modification. The document information input unit 210 inputs document information as the character code of the first character of a character string. For example, as the character code of “日”, the first character of “日本語” (“Japanese”), the document information input unit 210 inputs attribute information including the language type, reading, number of strokes, and radical, together with character string identification information identifying the subsequent character string “本語” to be output after “日”. More specifically, the character string identification information identifying the subsequent character string may be pointer information indicating the position in the character string storage unit 270 at which the subsequent character string is stored, or may be information indicating the position of that character string in the order of storage in the character string storage unit 270. Furthermore, for each character constituting the subsequent character string, the character string storage unit 270 holds, as part of the character code of that character, a pointer identifying the character to be output next. In this case, the character code at the end of the character string includes NULL information indicating the end of the character string.
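The pointer chain described here behaves like a singly linked list, which the following Python sketch illustrates. It is a simplified model under assumptions: the storage unit is a dictionary keyed by arbitrary integer positions (10 and 11 are invented), `None` plays the role of the NULL information, and the names `Cell`, `STORE`, and `expand` are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class Cell(NamedTuple):
    """One stored character plus a pointer to the character to output next."""
    char: str
    next_ptr: Optional[int]   # None plays the role of the NULL end marker


# Hypothetical contents of the character string storage unit, keyed by position.
STORE: Dict[int, Cell] = {
    10: Cell("本", 11),
    11: Cell("語", None),
}


def expand(first_char: str, pointer: Optional[int], store: Dict[int, Cell]) -> str:
    """Output the first character, then follow the pointer chain until NULL."""
    out = [first_char]
    while pointer is not None:
        cell = store[pointer]
        out.append(cell.char)
        pointer = cell.next_ptr
    return "".join(out)


word = expand("日", 10, STORE)   # the document itself holds only "日" and the pointer 10
```

A first character whose pointer is already NULL expands to itself, so ordinary single characters and abbreviated words can share the same output path.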
[0035]
As described above, the information processing apparatus 10 can represent a character string including a plurality of characters by a character code indicating the first character of the character string. The information processing apparatus 10 can thereby reduce the data size of document information input from the outside when a predetermined character string, for example the word “日本語” (“Japanese”), is used frequently. Furthermore, because the information processing apparatus 10 can handle a character string representing a word as a single unit, it can prevent a deletion process or the like from deleting only part of a word and leaving behind characters that carry no meaning.
[0036]
FIG. 10 illustrates an example of the hardware configuration of the information processing apparatus 10 according to the embodiment and the modification. The information processing apparatus 10 according to the embodiment or the modification includes: a CPU peripheral unit having a CPU 1000, a RAM 1020, a graphic controller 1075, and a display device 1080 connected to one another by a host controller 1082; an input/output unit having a communication interface 1030, a hard disk drive 1040, and a CD-ROM drive 1060 connected to the host controller 1082 by an input/output controller 1084; and a legacy input/output unit having a ROM 1010, a flexible disk drive 1050, and an input/output chip 1070 connected to the input/output controller 1084.
[0037]
The host controller 1082 connects the RAM 1020 to the CPU 1000 and the graphic controller 1075 that access the RAM 1020 at a high transfer rate. The CPU 1000 operates based on programs stored in the ROM 1010 and the RAM 1020, and controls each unit. The graphic controller 1075 acquires image data generated by the CPU 1000 or the like on a frame buffer provided in the RAM 1020 and displays it on the display device 1080. Alternatively, the graphic controller 1075 may include a frame buffer that stores image data generated by the CPU 1000 or the like.
[0038]
The input/output controller 1084 connects the host controller 1082 to the communication interface 1030, the hard disk drive 1040, and the CD-ROM drive 1060, which are relatively high-speed input/output devices. The communication interface 1030 communicates with other devices via a network. The hard disk drive 1040 stores programs and data used by the information processing apparatus 10. The CD-ROM drive 1060 reads a program or data from the CD-ROM 1095 and provides it to the input/output chip 1070 via the RAM 1020.
[0039]
The input/output controller 1084 is also connected to the ROM 1010 and to relatively low-speed input/output devices such as the flexible disk drive 1050 and the input/output chip 1070. The ROM 1010 stores a boot program executed by the CPU 1000 when the information processing apparatus 10 starts up, programs that depend on the hardware of the information processing apparatus 10, and the like. The flexible disk drive 1050 reads a program or data from the flexible disk 1090 and provides it to the input/output chip 1070 via the RAM 1020. The input/output chip 1070 connects various input/output devices via the flexible disk 1090 and, for example, a parallel port, a serial port, a keyboard port, a mouse port, and the like.
[0040]
A program provided to the information processing apparatus 10 is stored on a recording medium such as the flexible disk 1090, the CD-ROM 1095, or an IC card and is provided by a user. The program is read from the recording medium, installed in the information processing apparatus 10 via the input/output chip 1070, and executed in the information processing apparatus 10.
[0041]
A program installed and executed in the information processing apparatus 10 includes a document information input module, an application type detection module, an attribute information input module, a character string rearrangement module, a character string search module, a character string selection module, and a character string output module. The operation that each module causes the information processing apparatus 10 to perform is the same as the operation of the corresponding member in the information processing apparatus 10 described with reference to FIGS.
[0042]
The program or module shown above may be stored in an external storage medium. As the storage medium, in addition to the flexible disk 1090 and the CD-ROM 1095, an optical recording medium such as a DVD or PD, a magneto-optical recording medium such as an MD, a tape medium, a semiconductor memory such as an IC card, or the like can be used. Further, a storage device such as a hard disk or a RAM provided in a server system connected to a dedicated communication network or the Internet may be used as a recording medium, and the program may be provided to the information processing apparatus 10 via the network.
[0043]
As is clear from the above embodiment and modification, the character code according to the present embodiment and modification not only identifies a character but also includes attribute information indicating the attributes of the character. Accordingly, the information processing apparatus 10 can use the character code alone, without any other information indicating the character attributes, to perform character string operations according to those attributes.
For example, when the character code includes, as a character attribute, the language to which the character belongs, the information processing apparatus 10 can perform character string operations that support a plurality of languages without separately performing processing to determine the language of the character. Thereby, even on the Internet, where many types of characters are mixed, the information processing apparatus 10 can operate on character strings appropriately and efficiently based on information such as the language type and the culture associated with the language.
[0044]
Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in that embodiment. Various modifications or improvements can be made to the above embodiment. It is apparent from the scope of the claims that embodiments with such modifications or improvements can also fall within the technical scope of the present invention.
[0045]
According to the embodiment described above, the information processing apparatus, control method, program, data recording medium, and program recording medium shown in the following items can be realized.
[0046]
(Item 1) An information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each of the character codes includes, as character specifying information necessary for specifying the character by the character code, attribute information identifying at least one of the reading, radical, or number of strokes of the character corresponding to the character code, the information processing apparatus comprising: a document information input unit that inputs document information including a plurality of character strings; and a character string rearrangement unit that rearranges the plurality of character strings included in the document information based on the attribute information included in the character code of each character included in each of the plurality of character strings.
(Item 2) The information processing apparatus according to item 1, wherein each of the character codes further includes extended information that is added to the character specifying information of the character code and indicates an attribute of the character identified by the character code, and the character string rearrangement unit rearranges the plurality of character strings included in the document information further based on the extended information corresponding to each character included in each of the plurality of character strings.
[0047]
(Item 3) The information processing apparatus according to item 2, wherein the document information input unit further inputs, in association with the document information, storage location designation information in which the extended information is associated with information indicating in which field of the character code the extended information is stored, and the character string rearrangement unit selects, based on the storage location designation information, the field of the character code in which the extended information is stored, and rearranges the plurality of character strings based on the extended information stored in the selected field.
(Item 4) The information processing apparatus according to item 3, wherein the field in which the extended information is stored in each of the character codes differs depending on the application program that generated the document information including the character code, the information processing apparatus further comprises an application type detection unit that detects the type of the application program that generated the document information input by the document information input unit, and, when the type of application program detected by the application type detection unit differs from the type of the application program that functions as the character string rearrangement unit, the character string rearrangement unit selects the field storing the extended information used for rearrangement based on the storage location designation information.
(Item 5) The information processing apparatus according to item 2, wherein the document information input unit converts an input character string input by a user into an output character string in which a plurality of the character codes including the attribute information and the extended information are arranged, and inputs the output character string as the document information.
[0048]
(Item 6) An information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each of the character codes includes, as character specifying information necessary for specifying the character by the character code, attribute information identifying at least one of the reading, radical, or number of strokes of the character corresponding to the character code, the information processing apparatus comprising: a document information input unit that inputs document information including a plurality of character strings; an attribute information input unit that inputs the attribute information; and a character string search unit that searches the plurality of character strings for at least one character string based on the attribute information input by the attribute information input unit.
(Item 7) An information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each of the character codes includes, as character specifying information necessary for specifying the character by the character code, attribute information identifying the reading of the character corresponding to the character code, and further includes information identifying the inflection of the reading of the character corresponding to the character code, the information processing apparatus comprising: a document information input unit that inputs document information including a plurality of character strings; and a character string output unit that outputs speech reading the plurality of characters aloud based on the attribute information corresponding to the character code of each character included in each of the plurality of character strings.
(Item 8) The information processing apparatus according to item 7, wherein the document information input unit converts an input character string input by a user into an output character string in which a plurality of the character codes including the attribute information and the information identifying the inflection of the reading are arranged, and inputs the output character string as the document information.
[0049]
(Item 9) An information processing apparatus that outputs a character string based on an input character code, wherein each of the character codes includes character string identification information identifying a character string to be output following the character corresponding to the character code, the information processing apparatus comprising: a character string storage unit that stores a plurality of character strings; a document information input unit that inputs document information including the plurality of character strings as the character code of the first character of each of the plurality of character strings; a character string selection unit that selects one character string from among the character strings stored in the character string storage unit based on the character string identification information included in the character code input by the character code input unit; and a character string output unit that outputs the character corresponding to the character code input by the character code input unit and the one character string selected by the character string selection unit.
(Item 10) An information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each of the character codes includes, as information necessary for specifying the character by the character code, language information identifying the language to which the character corresponding to the character code belongs, the information processing apparatus comprising: a document information input unit that inputs document information including a plurality of character codes; and a character string output unit that outputs each of the plurality of characters included in the document information based on the language information included in the character code of the character.
[0050]
(Item 11) A data recording medium on which character codes and characters are recorded in association with each other, wherein each of the character codes includes, as character specifying information necessary for specifying the character by the character code, attribute information identifying at least one of the reading, radical, or number of strokes of the character corresponding to the character code, the data recording medium comprising, for each of a plurality of characters: a shape identification information storage area that stores shape identification information identifying the shape of the character used for displaying or printing the character; and a character code storage area that stores the character code of the character in association with the shape identification information.
(Item 12) A control method for controlling an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each of the character codes includes, as character specifying information necessary for specifying the character by the character code, attribute information identifying at least one of the reading, radical, or number of strokes of the character corresponding to the character code, the control method comprising: a document information input stage of inputting document information including a plurality of character strings; and a rearrangement stage of rearranging the plurality of character strings included in the document information based on the attribute information included in the character code of each character included in each of the plurality of character strings.
[0051]
(Item 13) A program for controlling an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein each of the character codes includes attribute information identifying at least one of the reading, radical, or stroke count of the character corresponding to the character code, as character specifying information necessary for specifying the character by the character code, the program causing the information processing apparatus to function as: a document information input unit that inputs document information including a plurality of character strings; and a character string rearrangement unit that rearranges the plurality of character strings included in the document information based on the attribute information included in the character code of each character included in each of the plurality of character strings.
(Item 14) A program recording medium on which the program according to Item 13 is recorded.
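The rearrangement described in Items 12 and 13 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a character code is modeled as a record whose fields carry the reading, radical, and stroke count; the names `CharCode`, `rearrange`, and `render` are hypothetical and do not appear in the specification.

```python
from typing import List, NamedTuple


class CharCode(NamedTuple):
    """Hypothetical character code: the attribute fields are part of the code itself."""
    reading: str  # reading of the character (kana)
    radical: int  # radical number
    strokes: int  # stroke count
    glyph: str    # character shown when the code is displayed


def rearrange(strings: List[List[CharCode]], attribute: str) -> List[List[CharCode]]:
    """Sort character strings by one attribute of each character, taken in order."""
    return sorted(strings, key=lambda s: [getattr(c, attribute) for c in s])


def render(strings: List[List[CharCode]]) -> List[str]:
    """Turn each list of character codes back into displayable text."""
    return ["".join(c.glyph for c in s) for s in strings]


yama = CharCode("やま", 46, 3, "山")
kawa = CharCode("かわ", 47, 3, "川")
mori = CharCode("もり", 75, 12, "森")

document = [[mori], [yama], [kawa]]

by_reading = render(rearrange(document, "reading"))
by_strokes = render(rearrange(document, "strokes"))
print(by_reading)  # ['川', '森', '山']
print(by_strokes)  # ['山', '川', '森']
```

Because the sort key is read directly from the character codes, no separate dictionary of readings or stroke counts is consulted in this sketch. Strings with equal keys keep their input order, since Python's `sorted` is stable.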
[0052]
[Effect of the invention]
As is clear from the above description, according to the present invention, a character string can be appropriately processed.
[Brief description of the drawings]
FIG. 1 shows a functional block diagram of an information processing apparatus 10.
FIG. 2A shows details of a basic dimension character code table database 300.
FIG. 2B shows a conceptual diagram of character codes stored in the basic dimension character code table database 300.
FIG. 3 shows details of an extended information item table database 310.
FIG. 4 shows details of storage location designation information 50.
FIG. 5 shows a flowchart of the information processing apparatus 10.
FIG. 6 shows a first example of document information input by a document information input unit 210.
FIG. 7 shows a second example of document information input by the document information input unit 210.
FIG. 8 is a functional block diagram of the information processing apparatus 10 according to a modified example.
FIG. 9 shows an example of document information and a character string in a modified example.
FIG. 10 shows an example of a hardware configuration of the information processing apparatus 10 according to the embodiment and the modification.
[Explanation of symbols]
10 Information processing apparatus
20 Application programs
30 Operating system
50 Storage location designation information
210 Document information input unit
220 Application type detection unit
230 Attribute information input unit
240 Character string rearrangement unit
250 Character string search unit
260 Character string output unit
270 Character string storage unit
280 Character string selection unit
300 Basic dimension character code table database
310 Extended information item table database

Claims

An information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein
each of the character codes includes information for specifying the character and extended information including information indicating whether the character is a personal name kanji,
the position of the extended information in the character code differs depending on the application program that generates document information including the character code, and
the information processing apparatus comprises:
a document information input unit that inputs document information including a plurality of character strings, and position designation information indicating the position in the character code at which the extended information is included;
an application type detection unit that detects the type of the application program that generated the document information input by the document information input unit; and
a character string rearrangement unit that, for the character code of each of the plurality of characters included in the document information, selects the extended information based on the position designation information in accordance with the type of the application program detected by the application type detection unit, identifies, from the selected extended information, the information indicating whether the character is a personal name kanji, and rearranges the plurality of character strings included in the document information according to the identified information.

An information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein
each of the character codes includes information for specifying the character and extended information including information indicating whether the character is a personal name kanji,
the position of the extended information in the character code differs depending on the application program that generates document information including the character code, and
the information processing apparatus comprises:
a document information input unit that inputs document information including a plurality of character strings, and position designation information indicating the position in the character code at which the extended information is included;
an application type detection unit that detects the type of the application program that generated the document information input by the document information input unit; and
a character string search unit that, for the character code of each of the plurality of characters included in the document information, selects the extended information based on the position designation information in accordance with the type of the application program detected by the application type detection unit, identifies, from the selected extended information, the information indicating whether the character is a personal name kanji, and searches the plurality of character strings included in the document information for at least one character string according to the identified information.

A control method for controlling an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein
each of the character codes includes information for specifying the character and extended information including information indicating whether the character is a personal name kanji,
the position of the extended information in the character code differs depending on the application program that generates document information including the character code, and
the control method comprises:
a document information input step of inputting document information including a plurality of character strings, and position designation information indicating the position in the character code at which the extended information is included;
an application type detection step of detecting the type of the application program that generated the document information input in the document information input step; and
a character string rearrangement step of, for the character code of each of the plurality of characters included in the document information, selecting the extended information based on the position designation information in accordance with the type of the application program detected in the application type detection step, identifying, from the selected extended information, the information indicating whether the character is a personal name kanji, and rearranging the plurality of character strings included in the document information according to the identified information.

A control method for controlling an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein
each of the character codes includes information for specifying the character and extended information including information indicating whether the character is a personal name kanji,
the position of the extended information in the character code differs depending on the application program that generates document information including the character code, and
the control method comprises:
a document information input step of inputting document information including a plurality of character strings, and position designation information indicating the position in the character code at which the extended information is included;
an application type detection step of detecting the type of the application program that generated the document information input in the document information input step; and
a character string search step of, for the character code of each of the plurality of characters included in the document information, selecting the extended information based on the position designation information in accordance with the type of the application program detected in the application type detection step, identifying, from the selected extended information, the information indicating whether the character is a personal name kanji, and searching the plurality of character strings included in the document information for at least one character string according to the identified information.

A program for controlling an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein
each of the character codes includes information for specifying the character and extended information including information indicating whether the character is a personal name kanji,
the position of the extended information in the character code differs depending on the application program that generates document information including the character code, and
the program causes the information processing apparatus to function as:
a document information input unit that inputs document information including a plurality of character strings, and position designation information indicating the position in the character code at which the extended information is included;
an application type detection unit that detects the type of the application program that generated the document information input by the document information input unit; and
a character string rearrangement unit that, for the character code of each of the plurality of characters included in the document information, selects the extended information based on the position designation information in accordance with the type of the application program detected by the application type detection unit, identifies, from the selected extended information, the information indicating whether the character is a personal name kanji, and rearranges the plurality of character strings included in the document information according to the identified information.

A program for controlling an information processing apparatus that identifies each of a plurality of characters by a character code corresponding to the character, wherein
each of the character codes includes information for specifying the character and extended information including information indicating whether the character is a personal name kanji,
the position of the extended information in the character code differs depending on the application program that generates document information including the character code, and
the program causes the information processing apparatus to function as:
a document information input unit that inputs document information including a plurality of character strings, and position designation information indicating the position in the character code at which the extended information is included;
an application type detection unit that detects the type of the application program that generated the document information input by the document information input unit; and
a character string search unit that, for the character code of each of the plurality of characters included in the document information, selects the extended information based on the position designation information in accordance with the type of the application program detected by the application type detection unit, identifies, from the selected extended information, the information indicating whether the character is a personal name kanji, and searches the plurality of character strings included in the document information for at least one character string according to the identified information.

A program recording medium on which the program according to claim 5 is recorded.

A program recording medium on which the program according to claim 6 is recorded.
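
As an illustration only, the selection, rearrangement, and search recited in the claims might be sketched in Python as follows. The integer bit layout, the 8-bit width of the extended information, the flag bit `NAME_KANJI_BIT`, the application type names, and every identifier below are assumptions made for this sketch and are not taken from the claims.

```python
from dataclasses import dataclass
from typing import Dict, List

NAME_KANJI_BIT = 0x01  # assumed flag bit inside the extended information


@dataclass
class Document:
    app_type: str                  # result an application type detection step would report
    strings: List[List[int]]       # each character string is a list of integer character codes
    position_info: Dict[str, int]  # position designation information: app type -> bit offset


def extended_info(code: int, offset: int) -> int:
    """Extract the assumed 8-bit extended information stored at the given bit offset."""
    return (code >> offset) & 0xFF


def has_name_kanji(string: List[int], offset: int) -> bool:
    """True if any character in the string is flagged as a personal name kanji."""
    return any(extended_info(c, offset) & NAME_KANJI_BIT for c in string)


def rearrange(doc: Document) -> List[List[int]]:
    """Place strings containing a personal name kanji first (stable sort)."""
    offset = doc.position_info[doc.app_type]
    return sorted(doc.strings, key=lambda s: not has_name_kanji(s, offset))


def search(doc: Document) -> List[List[int]]:
    """Return only the strings containing a personal name kanji."""
    offset = doc.position_info[doc.app_type]
    return [s for s in doc.strings if has_name_kanji(s, offset)]


def make_code(base: int, ext: int, offset: int) -> int:
    """Build a code from 16-bit character specifying information plus extended information."""
    return base | (ext << offset)


POSITION_INFO = {"editor": 16, "mailer": 24}  # hypothetical application types

plain = [make_code(0x0001, 0x00, 16)]
name = [make_code(0x0002, NAME_KANJI_BIT, 16)]
doc = Document("editor", [plain, name], POSITION_INFO)

print(rearrange(doc) == [name, plain])  # True
print(search(doc) == [name])            # True
```

In this sketch, reading the same codes with the offset of a different application type (for example `"mailer"`) finds no flag, which is why the position is chosen from the position designation information according to the detected application type.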