JP4195780B2

JP4195780B2 - Program, data processing system and storage medium

Info

Publication number: JP4195780B2
Application number: JP2001117432A
Authority: JP
Inventors: 浩井ノ川
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2001-04-16
Filing date: 2001-04-16
Publication date: 2008-12-10
Anticipated expiration: 2021-04-16
Also published as: JP2002328950A

Description

【０００１】
【発明の属する技術分野】
本発明は、処理対象として文字列を入力し、ソート、検索などのデータ処理を行うシステムに関し、特にアプリケーションが要求する処理に対して所定の機能を付加するプログラムインターフェイスに関する。
【０００２】
【従来の技術】
コンピュータを用いて文字列のソートや検索を行う場合、何らかの情報をキーとして文字列を認識し、分類することが必要となる。日本語文字列においては、コードポイントを用いたバイナリソートにより文字列を認識することが一般的である。このバイナリソートには、通常、ＪＩＳコード（シフトＪＩＳ、日本語ＥＵＣを含む、以下、同じ）が用いられる。
【０００３】
また、特殊なデータを扱うデータベースその他のアプリケーション・ソフトウェアにおいては、取り扱うデータの種類などに応じて、書き込み日時、金額や数量などの数字製品コードなどをキー情報としてソートや検索などを行うものもある。
【０００４】
アプリケーション・ユーザは、これらの種々の手法を用いて、文字列やデータのソートを行ったり、文字列やデータベースの中から所望の単語や文を検索したりすることができる。
図１１は、従来のデータ処理システムを用いたデータのソート結果を例示する図である。
図１１を参照すると、「社内ニュース」を蓄積したデータベースの「トップニュース」に分類されるデータが日付順でソートされている。このデータベースでは、さらにニュースの種類（図では「カテゴリ別」と表記）に基づいてデータをソートすることができ、また特に図示しないが、所定の文字または文字列を含むニュースのタイトルを、ＪＩＳコードを用いて検索することもできる。
【０００５】
【発明が解決しようとする課題】
ところで、データの種類によっては、文字列の読みに基づいてソートや検索を行うことが好ましい場合がある。しかし、上述した従来のデータ処理システムでは、通常、文字列の読みに基づくソートや検索を行う機能は用意されていなかった。
【０００６】
ＪＩＳコードは一部が該当文字の代表的な読み（以下、代表読み）に対応しているので、上記のコードポイントを用いたバイナリソートにおいてＪＩＳコードを用いたソートを行えば、文字の読みによるソートと同様の処理がある程度は可能となる。
しかし、コードポイントは文字を単位としているため、単語や文といった単位では正確なソートや検索ができない場合が多い。例えば、「安」の代表読みが“アン”であり、「売」の代表読みが“バイ”である場合、「安売り」は“アンバイリ”として扱われてしまう。このような不都合は、ソートや検索の対象が文のように長くなるとさらに顕著になる。
【０００７】
さらに、日本語の漢字には、複数の読みを持つものが多数ある。これに対し、代表読みを用いたデータ処理は、処理対象の文字における代表読みのみに基づいてソートや検索を行うこととなる。そのため、高い精度を期待できなかった。
【０００８】
また、読みに基づくデータ処理の実現手段として、蓄積されたデータに対して読み（読み仮名）を予め入力しておき、この読み仮名を基準としてソートや検索を行う方法が考えられる。
しかし、予め読み仮名を入力する作業は大きな手間を要し、コストの上昇やデータベース構築の作業時間の増大を招く。
【０００９】
そこで、本発明は、アプリケーションが要求するソートや検索といったデータ処理に対して、処理対象である文字列の読みに基づくデータ処理を行う機能を付加することを目的とする。
【００１０】
また、本発明は、ソートや検索といったデータ処理の対象である文字列が複数の読みを持ち得る場合に、各読みに基づいてこれらのデータ処理を行うインターフェイスを提供することを他の目的とする。
【００１１】
また、本発明は、ソートや検索といったデータ処理の対象である文字列の読みを自動的に取得し、当該読みに基づいてこれらのデータ処理を行うインターフェイスを提供することをさらに他の目的とする。
【００１２】
【課題を解決するための手段】
本発明は、上記の目的を達成する手段として、処理対象である漢字仮名混じり文字列の読みを生成する手段を用意し、アプリケーションが要求するデータのソートや検索を行う際に、動的に生成された文字列の読みを用いて処理を行うアプリケーション・プログラミング・インターフェイス（ＡＰＩ）を提供する。
【００１３】
上記のように、漢字仮名混じり文字列の読みに基づくデータ処理を行うため、本発明は、コンピュータを制御して、入力データに対する処理結果を出力するプログラムであって、処理対象である漢字仮名混じり文字列を要素とするデータ配列の入力または指定を受け付ける処理と、入力または指定されたこのデータ配列の要素である漢字仮名混じり文字列の読み情報を生成する処理と、生成された読み情報に基づいてこの漢字仮名混じり文字列をソートし、ソート結果であるデータ配列を出力する処理とをこのコンピュータに実行させることを特徴とする。
【００１４】
また、このプログラムは、処理対象であるデータ配列に対し、要素である漢字仮名混じり文字列をその読み情報に基づいてソートする処理に加え、またはこの処理の替わりに、生成された前記読み情報のカテゴリに基づいて、前記漢字仮名混じり文字列を分類し、分類結果であるデータ配列を出力する処理をこのコンピュータに実行させる構成とすることができる。
【００１５】
また、本発明のプログラムは、処理対象である漢字仮名混じり文字列の入力を受け付ける処理と、入力された漢字仮名混じり文字列の読み情報を生成する処理と、複数の漢字仮名混じり文字列におけるこの読み情報を比較し、比較結果を出力する処理とをこのコンピュータに実行させることを特徴とする。
【００１６】
さらに、本発明のプログラムは、検索キーである漢字仮名混じり文字列と、検索対象である漢字仮名混じり文字列の入力を受け付ける処理と、入力された漢字仮名混じり文字列の読み情報を生成する処理と、この検索キー及びこの検索対象における読み情報に基づいて、この検索対象からこの検索キーと同一の読みを持つ部分を検索し、検索結果を出力する処理とをこのコンピュータに実行させることを特徴とする。
ここで、この検索処理は、前記検索対象である漢字仮名混じり文字列の形態素ごとに、前記検索キーと読みが同一かどうかを判断する処理を含むことを特徴とする。
【００１７】
上述したプログラムにおけるデータ処理、すなわち、ソート、カテゴリ分類、文字列比較及び文字列検索は、データ処理装置（またはデータ処理装置を制御するソフトウェア）に対して個別に提供するだけでなく、複数の処理を利用できるように組み合わせて提供することができる。
また、上記のプログラムにおける漢字仮名混じり文字列の読み情報を生成する処理は、処理対象である漢字仮名混じり文字列に複数の読みがある場合、対応する複数の読み情報を生成することができる。この場合、ソート、カテゴリ分類、文字列比較及び文字列検索の各処理は、生成された複数の読み情報の各々に関して実行する。文字列の比較や検索においては、複数の読み情報のうちの一つでも同一であれば、文字列の一致、検出といった判断をすることができる。
【００１８】
また、上述したプログラムは、磁気ディスクその他の記憶媒体に格納して配布したり、プログラム伝送装置からネットワークを介して配信したりすることにより提供することができる。
【００１９】
さらにまた、漢字仮名混じり文字列の読みに基づいたデータ処理を行うユーザインターフェイスを実現する本発明は、次のように構成されたことを特徴とするデータ処理システムを提供する。このデータ処理システムは、漢字仮名混じり文字列を要素とするデータ配列を格納した記憶手段と、このデータ配列のソートを指示する命令を受け付ける入力手段と、記憶手段から処理対象であるデータ配列を取得し、このデータ配列の要素である漢字仮名混じり文字列の読み情報を生成する読み情報生成手段と、生成された読み情報に基づいて処理対象である漢字仮名混じり文字列をソートする処理手段と、ソート結果であるデータ配列を出力する出力手段とを備える。
ここで、この読み情報生成手段は、前記入力手段により命令が受け付けられた後に、前記漢字仮名混じり文字列の読み情報を生成する構成とすることができる。すなわち、ソート命令を受け付けてから、処理対象である漢字仮名混じり文字列に対して動的に読み情報を生成し、読みに基づくソートを行う。
【００２０】
また、本発明は、次のように構成されたことを特徴とするデータ処理システムを提供する。このデータ処理システムは、漢字仮名混じり文字列を要素とするデータ配列を格納した記憶手段と、このデータ配列に対する要素の分類を指示する命令を受け付ける入力手段と、記憶手段から処理対象であるデータ配列を取得し、このデータ配列の要素である漢字仮名混じり文字列の読み情報を生成する読み情報生成手段と、生成された読み情報のカテゴリに基づいて、処理対象である漢字仮名混じり文字列を分類する処理手段と、この処理手段による分類結果に基づいて、この漢字仮名混じり文字列をこの読み情報のカテゴリごとに分けて出力する出力手段とを備える。
【００２１】
さらに、本発明は、次のように構成されたことを特徴とするデータ処理システムを提供する。すなわち、アプリケーション・ソフトウェアに応じたユーザインターフェイスを提供するデータ処理システムにおいて、比較対象として複数の漢字仮名混じり文字列を指定して比較入力する入力手段と、比較対象である漢字仮名混じり文字列の読みに基づいてなされた比較結果を、このアプリケーション・ソフトウェアの用途に応じて加工して出力する出力手段とを備える。
【００２２】
さらにまた、本発明は、次のように構成されたことを特徴とするデータ処理システムを提供する。すなわち、アプリケーション・ソフトウェアに応じたユーザインターフェイスを提供するデータ処理システムにおいて、検索キーである漢字仮名混じり文字列と、検索対象である漢字仮名混じり文字列とを入力する入力手段と、この検索キー及びこの検索対象における読み情報に基づいて、この検索対象からこの検索キーと同一の読みを持つ部分を検索した結果を出力する出力手段とを備える。
【００２３】
【発明の実施の形態】
以下、添付図面に示す実施の形態に基づいて、この発明を詳細に説明する。
図１は、本実施の形態によるインターフェイスを実現するデータ処理システムの構成例を説明する図である。
本実施の形態のデータ処理システムは、パーソナルコンピュータやワークステーションなどのコンピュータ装置、ＰＤＡ（Personal Digital Assistant）、その他の電子情報機器にて実現される。
【００２４】
図１を参照すると、データ処理システムは、キーボードやマウスなどの入力装置１１と、ＣＰＵ及び主メモリを備えた処理装置１２と、ハードディスク装置などの記憶装置１３と、ディスプレイ装置やプリンタなどの出力装置１４とを備える。
入力装置１１は、ユーザにて操作され、データや各種処理の実行命令を入力するために用いられる。
処理装置１２は、プログラム制御により、種々のデータ処理を行う。また、後述するように、文字列の読み情報に基づくソートや検索を行うインターフェイスを提供する。
記憶装置１３は、処理対象である文字列を含むデータを格納する。したがって、処理装置１２は、入力装置１１にて入力された文字列、または入力装置１１から入力された命令により記憶装置１３から読み出した文字列に対して、読み情報に基づくデータ処理を行うこととなる。
出力装置１４は、処理装置１２による処理結果を出力する。
【００２５】
図２は、本実施の形態によるインターフェイス提供部の概略構成を示す図である。
処理装置１２上では、ワードプロセッサやデータベースなどのアプリケーション・ソフトウェアが動作しており、図２に示すインターフェイス提供部２０は、これらのアプリケーション・ソフトウェアの機能として、文字列の読みを用いたデータ処理を実行する。
【００２６】
図２を参照すると、本実施の形態によるインターフェイス提供部２０は、読み情報生成部２１と、データ処理部２２とを備える。
なお、図２に示す読み情報生成部２１及びデータ処理部２２は、コンピュータプログラムにより制御されたＣＰＵにて実現される仮想的なソフトウェアブロックである。ＣＰＵを制御する当該コンピュータプログラムは、ＣＤ−ＲＯＭやフロッピーディスクなどの記憶媒体に格納して配布したり、ネットワークを介して伝送したりすることにより提供される。また、当該コンピュータプログラムは、本実施の形態により提供されるインターフェイスを使用するアプリケーション・ソフトウェアに組み込むプログラムモジュールなどの形態で提供することができる。
【００２７】
読み情報生成部２１は、図１に示した入力装置１１または記憶装置１３から入力した処理対象であるデータ（文字列または文字列を要素とするデータ配列）に対して、必要に応じて読み情報を生成し、付加する。読み情報の生成は、漢字仮名混じり文中の漢字に対して読み仮名（振り仮名）を生成する公知のプログラムを用いて実行することができる。読み情報を生成する手法として、文字列を形態素解析し、形態素ごとに品詞の種類などの情報に基づいて読み情報を生成する手法を用いれば、読み情報の精度を向上させることができる。以下、本実施の形態における処理対象を、漢字仮名混じり文字列またはかかる文字列を要素とするデータ配列として説明する。ただし、漢字のみで構成された文字列や仮名文字のみで構成された文字列も漢字仮名混じり文字列に含まれるものとする。
なお、処理対象である文字列には、予めユーザが入力装置１１を用いて適切な読み仮名を入力し、付加情報として付加しておくことができる。この場合、文字列の読み情報としては、ユーザによって付加された読み仮名を用いるのが最も正確であるので、読み情報生成部２１による読み情報の生成は行う必要はない。以下の説明では、特に区別する必要がない限り、ユーザによって付加された読み仮名を読み情報として説明する。
【００２８】
データ処理部２２は、処理対象である文字列に対して、読み情報生成部２１により生成された、またはユーザにより入力された読み情報に基づいて、所定のデータ処理を実行する。本実施の形態では、読み情報を用いたデータ処理として、データのソート、データのカテゴリ分類、文字列比較、文字列検索の４種類の処理について説明する。具体的な処理の詳細については後述する。
【００２９】
図３は、図２に示したインターフェイス提供部２０にて提供されるデータ処理の流れを説明する図である。
図３に示すように、ユーザは、種々のアプリケーション・ソフトウェアの要求に応じて漢字仮名混じり文を入力し、インターフェイス提供部２０による処理を経て出力された出力結果を取得する。なお、漢字仮名混じり文の入力は、上述したように、入力装置１１から文を直接入力することもできるし、記憶装置１３に格納されている文を処理対象として指定することによって間接的に入力することもできる。
インターフェイス提供部２０は、まず、入力された漢字仮名混じり文に読み仮名が付されていなければ、読み情報を生成して付加する。図示の例では、「大和事業所、週末電源工事」という漢字仮名混じり文に、“ヤマトジギョウショ、シュウマツデンゲンコウジ”という読み情報が付加されている。そして、インターフェイス提供部２０は、この読み情報を用いてデータ処理を行い、処理の結果を出力する。
以上の動作により、本実施の形態によるインターフェイスを用いることで、ユーザは、漢字仮名混じり文に関してその読みを意識することなく入力し、当該漢字仮名混じり文の読みに基づいたデータ処理の結果を得ることが可能となる。
【００３０】
また、図３に示したように、本実施の形態のインターフェイス提供部２０は、ユーザから処理対象である漢字仮名混じり文の入力があった場合に、読み情報生成部２１により、動的に読み情報を生成する。なお、一度生成した読み情報を対応する漢字仮名混じり文に付加情報として付加しておくことにより、次に同じ漢字仮名混じり文を処理する場合に、読み情報生成部２１により改めて読み情報を生成しなくても、付加されている読み情報を用いて処理することができる。
【００３１】
次に、本実施の形態にて提供されるインターフェイスにおけるデータ処理について詳細に説明する。
１．ソート
ソートは、漢字仮名混じり文字列を要素とするデータ配列を受け取り、当該漢字仮名混じり文字列の読みを基準としてソートし、ソート結果である漢字仮名混じり文字列の配列として返すデータ処理である。漢字仮名混じり文字列に対して読み仮名が既に付加されている場合は、当該読み仮名をソートの基準とすることができる。読み仮名が付加されていない場合は、読み情報生成部２１により生成された読み情報をソートの基準とする。
この種のインターフェイスを用いる例としては、住所録、電話帳、名簿などのデータベース・ソフトウェアにおけるソート処理において、データ配列を氏名の読みに基づいてソートする場合、製品一覧を製品名の読みに基づいてソートする場合、データベースに対して条件検索を行った結果をアイウエオ順にソートする場合などが考えられる。
【００３２】
本実施の形態によりデータのソートを行う場合、インターフェイス提供部２０に入力されるのは、漢字仮名混じりの文字列とその読み情報が対になったデータＡを要素としたデータ配列である。ただし、読みは空であっても良い。また、インターフェイス提供部２０には、ソートの種類、すなわち、読みの辞書的順序における昇順にソートするか降順にソートするかを指定する命令が入力される。
【００３３】
これらの入力があると、インターフェイス提供部２０は、入力されたデータＡの配列に対して、読み情報のチェックとデータのソートという２段階の処理を行う。
図４は、インターフェイス提供部２０による処理の流れを説明するフローチャートである。
図４に示すように、インターフェイス提供部２０の読み情報生成部２１は、データ配列が入力されると（ステップ４０１）、まず、データ配列の全ての要素（データＡ）に対して、読み仮名が付されているかを順次チェックする（ステップ４０２〜４０４）。そして、読み仮名が付されていないデータＡが存在する場合、読み情報生成部２１により当該データＡの漢字仮名混じり文字列から読み情報を生成し、付加する（ステップ４０５）。読み情報が生成できれば、次の要素（データＡ）に対して同様の処理を行い（ステップ４０６）、読み情報が生成できなかったならば、エラーコードを出力する（ステップ４０６、４０７）。全ての要素（データＡ）に対してチェックまたは必要な読み情報生成処理を行ったならば、ソート処理へ移行する（ステップ４０２）。
次に、データ処理部２２は、必要に応じて漢字仮名混じり文字列の読み情報が補完されたデータ配列を入力し、その読み情報をキーとして要素（データＡ）のソートを行い（ステップ４０８）、ソート結果を出力する（ステップ４０９）。このとき、ソートを読みの辞書的順序における昇順とするか降順とするかは、入力時におけるユーザによる指定に従う。
【００３４】
以上の処理の後、インターフェイス提供部２０は、インターフェイス提供部２０を使用しているアプリケーション（例えば上述したデータベース・ソフトウェア）へ、ソートにより要素（データＡ）を並べ替えられたデータ配列及びリターンコードを出力する。リターンコードには、例えば、正しく処理が完了したことを示す「成功」、読み情報生成部２１において読み情報の生成に失敗したことを示す「エラー１」、その他のエラーを示す「エラー２」を設定することができる。エラー１の場合、データ配列、リターンコードに加えて、問題の発生した配列要素を特定する情報をさらに出力することができる。
インターフェイス提供部２０を使用しているアプリケーションは、受け取ったデータ配列及びリターンコードに基づいて、ソート結果やメッセージを出力装置１４に出力する。
【００３５】
図５は、本実施の形態によるソート結果の例を示す図である。
図５を参照すると、「社内ニュース」を蓄積したデータベースの「トップニュース」に分類されるデータが、ニュースのタイトルにおける読みの辞書的順序（図では「５０音順」と表記）でソートされている。
【００３６】
２．カテゴリ分類
カテゴリ分類は、漢字仮名混じり文字列を要素とするデータ配列を受け取り、当該漢字仮名混じり文字列の読み情報を基準として、読みが「ア」で始まる文字列や読みが「タ行」で始まる文字列といったカテゴリに分類し、分類ごとの漢字仮名混じり文字列の配列として返すデータ処理である。
この種のインターフェイスを用いる例としては、上述したソートの場合と同様の例が考えられる。かかるソート処理において、要素の数が膨大であり、読みが特定の文字で始まる文字列といった一部のデータのみを取得したい場合などに利用することができる。
【００３７】
本実施の形態によりデータのカテゴリ分類を行う場合、インターフェイス提供部２０に入力されるのは、漢字仮名混じりの文字列とその読み情報が対になったデータＡを要素とした配列である。ただし、読みは空であっても良い。また、インターフェイス提供部２０には、カテゴリ分類のパターンを指定する命令が入力される。ここで、カテゴリ分類のパターンとしては、例えば、読みが、「ア行」で始まる文字列、「カ行」で始まる文字列、といった分類で分けるパターン、読みが、「ア」で始まる文字列、「イ」で始まる文字列、といった分類で分けるパターンなどが考えられる。
【００３８】
これらの入力があると、インターフェイス提供部２０は、入力されたデータＡの配列に対して、読み情報のチェックとデータのカテゴリ分類という２段階の処理を行う。
図６は、インターフェイス提供部２０による処理の流れを説明するフローチャートである。
図６において、ステップ６０１からステップ６０７までの処理は、図４に示したソート処理のフローチャートにおけるステップ４０１からステップ４０７までの処理と同様であるため、説明を省略する。
インターフェイス提供部２０のデータ処理部２２は、読み情報生成部２１により必要に応じて漢字仮名混じり文字列の読み情報が補完されたデータ配列を入力し、その読み情報をキーとして、データ配列の各要素（データＡ）を入力時にユーザにより指定されたカテゴリごとの配列に振り分け（ステップ６０８）、分類結果を出力する（ステップ６０９）。
【００３９】
以上の処理の後、インターフェイス提供部２０は、インターフェイス提供部２０を使用しているアプリケーション（例えば上述したデータベース・ソフトウェア）へ、カテゴリごとに要素（データＡ）を分類し、並べ替えられたデータ配列及びリターンコードを出力する。リターンコードには、例えば、正しく処理が完了したことを示す「成功」、読み情報生成部２１において読み情報の生成に失敗したことを示す「エラー１」、その他のエラーを示す「エラー２」を設定することができる。エラー１の場合、データ配列、リターンコードに加えて、問題の発生した配列要素を特定する情報をさらに出力することができる。
インターフェイス提供部２０を使用しているアプリケーションは、受け取ったデータ配列及びリターンコードに基づいて、カテゴリ分類の結果やメッセージを出力装置１４に出力する。
【００４０】
図７は、本実施の形態によるカテゴリ分類の結果の例を示す図である。
図７を参照すると、「社内ニュース」を蓄積したデータベースの「トップニュース」に分類されるデータが、ニュースのタイトルにおける読み情報に基づいてカテゴリ分類されている（図では「あかさたな」と表記されている）。図７において、タイトルが「ア行」の文字で始まるニュースは、「ア行」の欄に、タイトルが「カ行」の文字で始まるニュースは、「カ行」の欄に、タイトルが「サ行」の文字で始まるニュースは、「サ行」の欄に、それぞれ分類されている。
なお、処理対象であるデータ配列のデータ量が膨大である場合は、本処理のカテゴリ分類を用いて、特定のカテゴリに属するデータのみを出力するように設定することもできる。
【００４１】
３．文字列比較
文字列比較は、二つの漢字仮名混じり文字列を受け取り、両者の読み情報を比較するデータ処理である。
この種のインターフェイスを用いる例としては、各種のデータベース・ソフトウェアにおける検索処理において、所定のリストの中から、同じ読みの名前やタイトルを検索したい場合などが考えられる。具体的には、電話帳のようなリストに対して名字での絞込み検索を行い、同じ読みの名字を取得したい場合、例えば「斎藤」と入力して検索し、同じ読みの「斎藤」、「斉藤」、「齋藤」、「齊藤」、「西藤」、「西東」などの検索結果を取得したい場合が考えられる。同様に、楽曲や映画のタイトルのデータベースに対して、同じ読みのタイトルを取得したい場合にも用いることができる。すなわち、ユーザが、つづり字は明確でないが読みは覚えているという場合の検索に用いることができる。
【００４２】
本実施の形態により文字列比較を行う場合、インターフェイス提供部２０に入力されるのは、漢字仮名混じりの文字列とその読み情報が対になったデータＡと、同じく漢字仮名混じりの文字列とその読み情報が対になったデータＢである。ただし、読みは空であっても良い。また、比較対照となるデータＡ、データＢは、それぞれデータＡ、データＢを要素とするデータ配列（リスト）から選出された要素であっても良いし、検索キーのように特に入力されたデータであっても良い。
【００４３】
これらの入力があると、インターフェイス提供部２０は、入力されたデータＡの配列に対して、読み情報のチェックと読み情報に基づく文字列比較という２段階の処理を行う。
図８は、インターフェイス提供部２０による処理の流れを説明するフローチャートである。
図８に示すように、インターフェイス提供部２０の読み情報生成部２１は、データＡとデータＢとが入力されると（ステップ８０１）、まず、両データＡ、Ｂに対して読み仮名が付されているかを順次チェックする（ステップ８０２〜８０４）。そして、読み仮名が付されていないデータが存在する場合、読み情報生成部２１により当該データの漢字仮名混じり文字列から読み情報を生成し、付加する（ステップ８０５）。読み情報が生成できなかったならば、エラーコードを出力する（ステップ８０６、８０７）。データＡ、データＢの双方をチェックまたは必要な読み情報生成処理を行ったならば、読みに基づく比較処理へ移行する（ステップ８０２）。
次に、データ処理部２２は、データＡ、データＢの読み情報を比較し（ステップ８０８）、比較結果を出力する（ステップ８０９）。
【００４４】
以上の処理の後、インターフェイス提供部２０は、インターフェイス提供部２０を使用しているアプリケーションへ、データＡとデータＢとの比較結果を示す値及びリターンコードを出力する。比較結果としては、データＡとデータＢの読みが一致しているか、一致していない場合は読みの辞書的順序でどちらが先かといった情報が得られる。したがって、比較結果を示す値は、例えば、データＢの読みの辞書的順序を基準（０）とし、データＡがデータＢよりも辞書的順序で先にあるときは０より小さい値、データＡとデータＢとが一致しているときは０、データＡがデータＢよりも辞書的順序で後にあるときは０より大きい値をそれぞれ出力するような設定とすることができる。また、リターンコードには、例えば、正しく処理が完了したことを示す「成功」、読み情報生成部２１において読み情報の生成に失敗したことを示す「エラー１」、その他のエラーを示す「エラー２」を設定することができる。
【００４５】
インターフェイス提供部２０を使用しているアプリケーションは、受け取った比較結果及びリターンコードに基づいて、当該アプリケーションにおける処理の結果を生成し、出力装置１４に出力する。
例えば、この文字列比較を上述した電話帳などのデータベースに対する検索システムの機能として利用すれば、検索キーの文字列と同じ読みを持つ要素（データ）をデータベースのリストの中から検出することができる。また、複数のリストを比較し、同じ読みを持つ要素（データ）を選別して取得したり、リストどうしを照合したりするという利用も考えられる。
【００４６】
４．文字列の読みのあいまい性を吸収した文字列比較
３の文字列比較と同様に、二つの漢字仮名混じり文字列を受け取り、両者の読み情報を比較するデータ処理である。ただし、日本語の漢字は複数の読みを持つものが一般的であることに対応して、処理対象の漢字仮名混じり文字列における読みとして複数の読み情報を生成し、当該複数の読み情報どうしを比較する。なお、この場合であっても、ユーザにより予め漢字仮名混じり文の読み仮名が付加されている場合は、当該付加されている読み仮名に従う。
【００４７】
文字列の読みのあいまい性を許容する文字列比較を行う場合、インターフェイス提供部２０に入力されるのは、３の文字列比較の場合と同様のデータＡ及びデータＢと、データＡ、Ｂの文字列に対して複数の読み情報が得られる場合に、いくつの読み情報を比較に使うかを指定する命令である。
【００４８】
これらの入力があると、インターフェイス提供部２０は、入力されたデータＡの配列に対して、読み情報のチェックと読みに基づく文字列比較という２段階の処理を行う。
図９は、インターフェイス提供部２０による処理の流れを説明するフローチャートである。
図９に示すように、インターフェイス提供部２０の読み情報生成部２１は、データＡとデータＢとが入力されると（ステップ９０１）、まず、両データＡ、Ｂに対して読み仮名が付されているかを順次チェックする（ステップ９０２〜９０４）。そして、読み仮名が付されていないデータが存在する場合、読み情報生成部２１により当該データの漢字仮名混じり文字列から読み情報を生成し、付加する（ステップ９０５）。このとき、読み情報生成部２１は、入力時に指定された数の読み情報を生成する。通常、文字や単語の読みを記憶した辞書データベースは、頻出する読みや代表的な読みから順に並べて記憶しているので、読み情報生成部２１は、指定数ｎにしたがって、ｎ番目までの読みを取得し、読み情報としてデータＡまたはデータＢに付加する。
また、読み情報生成部２１は、読み情報が生成できなかったならば、エラーコードを出力する（ステップ９０６、９０７）。データＡ、データＢの双方をチェックまたは必要な読み情報生成処理を行ったならば、読みに基づく比較処理へ移行する（ステップ９０２）。
【００４９】
次に、データ処理部２２は、データＡ、データＢの読みを比較し（ステップ９０８）、比較結果を出力する（ステップ９０９）。ここで、データＡ、データＢの一方または双方に複数の読み情報が付加されている場合は、当該複数の読み情報の組み合わせを全て調べ、一つでも合致する読み情報があれば、データＡ、データＢの読みが一致すると判断する。
【００５０】
以上の処理の後、インターフェイス提供部２０は、インターフェイス提供部２０を使用しているアプリケーションへ、データＡとデータＢとの比較結果を示す値及びリターンコードを出力する。比較結果を示す値は、例えば、読みが一致する場合（同じ読み情報が一つ以上ある場合）は０、読みが一致しない場合（全ての読み情報が異なる場合）は０以外の値をそれぞれ出力するような設定とすることができる。また、リターンコードには、例えば、正しく処理が完了したことを示す「成功」、読み情報生成部２１において読み情報の生成に失敗したことを示す「エラー１」、その他のエラーを示す「エラー２」を設定することができる。
【００５１】
本処理の場合、３による文字列比較と異なり、一つの文字列から複数の読みが得られる場合に、それぞれの読みに基づいて検索結果を取得することができる。すなわち、漢字などのつづり字は明確に覚えているが読みは明確でない場合に、適切な読みを含むデータ処理を実行することができる。例えば、「上村」と入力して検索した場合、“ウエムラ”、“カミムラ”の二つの読みが得られたならば、文字列「うえむら」と一致するという検索結果を得ることができる。
さらに、３の文字列比較において説明したように、本実施の形態は、検索キーの文字列に対して同じ読みを持つ文字列を検索することができるので、「上村」と入力して“ウエムラ”、“カミムラ”の二つの読みが得られた場合、“ウエムラ”に対して「植村」、“カミムラ”に対して「神村」という検索結果を得ることができる。
【００５２】
５．文字列検索
文字列検索は、二つの漢字仮名混じり文字列を入力とし、一方の文字列の中から他方の文字列と同じ読みの部分を探し、該当部分の出現箇所を返すデータ処理である。この処理においても、４の文字列比較と同様に、処理対象の漢字仮名混じり文字列における読みとして複数の読み情報を生成し、当該複数の読み情報どうしを比較する。この場合も、ユーザにより予め漢字仮名混じり文の読み仮名が付加されている場合は、当該付加されている読みに従う。
この種のインターフェイスを用いる例としては、ワードプロセッサ・ソフトウェアにおける文字列検索処理において、所定の文書中から所望の文字列（単語など）と同じ読みの文字列を検索する場合などが考えられる。
【００５３】
本実施の形態により文字列検索を行う場合、インターフェイス提供部２０に入力されるのは、検索対象である漢字仮名混じりの文字列とその読み情報が対になったデータＡと、検索キーである漢字仮名混じりの文字列とその読み情報が対になったデータＢ、そして、検索キーであるデータＢの文字列に対して複数の読み情報が得られる場合に、いくつの読み情報を比較に使うかを指定する命令である。
【００５４】
これらの入力があると、インターフェイス提供部２０は、入力されたデータＡの配列に対して、読み情報のチェックと読みに基づく文字列検索という２段階の処理を行う。
図１０は、インターフェイス提供部２０による処理の流れを説明するフローチャートである。
図１０に示すように、インターフェイス提供部２０の読み情報生成部２１は、データＡとデータＢとが入力されると（ステップ１００１）、まず、両データＡ、Ｂに対して読み仮名が付されているかを順次チェックする（ステップ１００２〜１００４）。そして、読み仮名が付されていないデータが存在する場合、読み情報生成部２１により当該データの漢字仮名混じり文字列から読み情報を生成し、付加する（ステップ１００５）。このとき、データＢに対しては、入力時に指定された数の読み情報を生成する。通常、文字や単語の読みを記憶した辞書データベースは、頻出する読みや代表的な読みから順に並べて記憶しているので、読み情報生成部２１は、指定数ｎにしたがって、ｎ番目までの読みを取得し、読み情報としてデータＢに付加する。
また、読み情報生成部２１は、読み情報が生成できなかったならば、エラーコードを出力する（ステップ１００６、１００７）。データＡ、データＢの双方をチェックまたは必要な読み情報生成処理を行ったならば、読みに基づく検索処理へ移行する（ステップ１００２）。
【００５５】
次に、データ処理部２２は、データＡ、データＢの読みを比較し（ステップ１００８）、データＡの読みの中にデータＢの読みと同一の部分があるか調べ、その結果を出力する（ステップ１００９）。ここで、データＡ中にデータＢと同じ読みの部分がある場合は、その場所を示す位置情報を出力する。また、データＢと同じ読みの部分が複数箇所ある場合は、例えば、最初に読みが一致した部分を示す位置情報を出力することができる。なお、データＢに複数の読み情報が付加されている場合は、複数の読み情報の全てについて、データＡ中に同じ読みの部分があるかどうかを調べ、その結果を出力する。
【００５６】
以上の処理の後、インターフェイス提供部２０は、インターフェイス提供部２０を使用しているアプリケーションへ、検索結果及びリターンコードを出力する。検索結果は、例えば、データＡ中にデータＢと同じ読みの部分がある場合は、その場所を示す位置情報である。具体的には、出力装置１４に表示されたデータＡ中の該当文字列にポインタを表示したり、反転表示したりする。位置情報を表示は、検出した文字列全体に対して行っても良いし、検出した文字列の先頭の文字や最後の文字など特定の部分のみに対して行っても良い。
また、データＡ中にデータＢと同じ読みの部分がない場合は、例えば、該当部分がないことを示すメッセージを出力する。また、リターンコードには、例えば、正しく処理が完了したことを示す「成功」、読み情報生成部２１において読み情報の生成に失敗したことを示す「エラー１」、その他のエラーを示す「エラー２」を設定することができる。
インターフェイス提供部２０を使用しているアプリケーションは、受け取った比較結果及びリターンコードに基づいて、検索結果やメッセージを出力装置１４に出力する。
【００５７】
本処理は、ある程度の長さのある文章やデータ列を検索対象とし、所望の文字や単語、文などを検索キーとして検索する場合に利用することができる。そして、それぞれの文字列の読みに基づいて検索を行うため、つづり字は明確でないが読みは覚えているという場合でも、適切な検索結果を含む検索を行うことができる。例えば、検索対象「・・・肴はあぶったいかでいい・・・・」に対し、検索キーとして「魚」「サカナ」「さかな」のどれを入力しても「肴」を検出することができる。
また、検索キーについて複数の読みが得られる場合に、それぞれの読みに基づいて検索を行うことができる。例えば、検索キーとして「上村」を入力し、検索対象中に「・・・うえむらさん・・・・」「・・・かみむら先生・・・・」「・・・植村君・・・・」「・・・神村氏・・・・」などの部分があるならば、いずれも検出することができる。
なお、検索対象の文字列から検索キーの文字列に対応する部分を探す際、検索対象の文字列中を形態素単位で移動しながら検索するようにすれば、単語を途中で分割するような不自然な部分で読みが一致し、誤った検出を行ってしまうことを回避することができる。
【００５８】
このように、日本語の文字列を対象としてデータのソート、カテゴリ分類、文字列比較、文字列検索といったデータ処理を行う場合、文字列の読みに基づいて処理することにより、より応用性が高く精度の高いデータ処理を行うことが可能となる。
また、文字列の読みは、読み情報生成部２１により自動的に生成されるので、ユーザは文字列の読みを意識せず（読み仮名を入力することなく）、読みに基づくデータ処理の結果を得ることができる。
【００５９】
さらに、文字列比較や文字列検索においては、文字列の読みに基づいて処理を行うことにより、検索キーや入力文字列のつづり字が誤っている場合でも、正しい検索結果を得ることができる。また、文字列に複数の読みがある場合に、それぞれの読みについて処理を行うことにより、ユーザが読みを誤って認識していた場合でも、正しい検索結果を得ることができる。
なお、上述した実施の形態では、文字列比較及び文字列検索の場合にのみ、複数の読みに対応する処理を説明したが、データのソート及びカテゴリ分類においても複数の読みに対応した処理を行うことができる。この場合、例えば、複数の読みに基づいて、該当する複数の位置にそれぞれデータを挿入するか、またはいずれかの該当位置にデータを挿入し、他の該当位置に当該データへのポインタを挿入する。このようにすれば、ソート結果または分類結果に対して、ユーザがいずれの読みに基づいてデータを探す場合でも、所望のデータを取得することができる。
また、上述した実施の形態では、仮名漢字混じり文字列の読みとして、読み情報を生成し、文字列に付加した。この読み情報は、統一されていれば、平仮名であっても片仮名であっても良い。また、仮名文字の替わりにローマ字を読み情報として用いても良い。
【００６０】
さらにまた、本実施の形態では、データ処理としてデータのソート、データのカテゴリ分類、文字列比較及び文字列検索について説明したが、他のデータ処理として、例えば、漢字仮名混じり文字列における特定の読みを持つ部分を他の文字列に置き換える文字列置換を行うこともできる。
【００６１】
【発明の効果】
以上説明したように、本発明によれば、アプリケーションが要求するソートや検索といったデータ処理に対して、処理対象である文字列の読みに基づくデータ処理を行う機能を付加することができる。
【００６２】
また、本発明によれば、ソートや検索といったデータ処理の対象である文字列が複数の読みを持ち得る場合に、各読みに基づいてかかるデータ処理を行うインターフェイスを提供することができる。
【００６３】
さらに、本発明によれば、ソートや検索といったデータ処理の対象である文字列の読みを自動的に取得し、当該読みに基づいてかかるデータ処理を行うインターフェイスを提供することができる。
【図面の簡単な説明】
【図１】本実施の形態によるインターフェイスを実現するデータ処理システムの構成例を説明する図である。
【図２】本実施の形態によるインターフェイス提供部の概略構成を示す図である。
【図３】本実施の形態によるインターフェイス提供部にて提供されるデータ処理の流れを説明する図である。
【図４】本実施の形態のインターフェイス提供部による処理の流れを説明するフローチャートであり、データのソートを行う場合の処理を説明する図である。
【図５】本実施の形態によるソート結果の例を示す図である。
【図６】本実施の形態のインターフェイス提供部による処理の流れを説明するフローチャートであり、データのカテゴリ分類を行う場合の処理を説明する図である。
【図７】本実施の形態によるソート結果の他の例を示す図である。
【図８】本実施の形態のインターフェイス提供部による処理の流れを説明するフローチャートであり、文字列比較を行う場合の処理を説明する図である。
【図９】本実施の形態のインターフェイス提供部による処理の流れを説明するフローチャートであり、文字列の読みのあいまい性を吸収した文字列比較を行う場合の処理を説明する図である。
【図１０】本実施の形態のインターフェイス提供部による処理の流れを説明するフローチャートであり、文字列検索を行う場合の処理を説明する図である。
【図１１】従来のデータ処理システムを用いたデータのソート結果を例示する図である。
【符号の説明】
１１…入力装置、１２…処理装置、１３…記憶装置、１４…出力装置、２０…インターフェイス提供部、２１…読み情報生成部、２２…データ処理部
[0001]
[Technical field of the invention]
The present invention relates to a system that inputs a character string as a processing target and performs data processing such as sorting and searching, and more particularly to a program interface that adds a predetermined function to processing requested by an application.
[0002]
[Prior art]
When character strings are sorted or searched using a computer, the strings must be recognized and classified using some information as a key. For Japanese character strings, it is common to recognize strings by a binary sort on code points. This binary sort usually uses JIS codes (including Shift JIS and Japanese EUC; the same applies hereinafter).
[0003]
In addition, some databases and other application software that handle special data sort or search using, as key information, the write date and time, numbers such as amounts and quantities, product codes, and the like, depending on the type of data handled.
[0004]
Using these various methods, the application user can sort character strings and data, and search for desired words and sentences from the character strings and databases.
FIG. 11 is a diagram illustrating a data sorting result using a conventional data processing system.
Referring to FIG. 11, data classified as “top news” in a database storing “in-house news” is sorted in date order. In this database, the data can also be sorted by type of news (labeled “by category” in the figure), and, although not specifically shown, news titles containing a given character or character string can be searched for using JIS codes.
[0005]
[Problems to be solved by the invention]
By the way, depending on the type of data, it may be preferable to perform sorting or searching based on reading of a character string. However, the above-described conventional data processing system usually does not have a function for performing sorting or searching based on reading of a character string.
[0006]
Because part of the JIS code corresponds to a typical reading of each character (hereinafter, the representative reading), performing the code-point binary sort described above with JIS codes makes processing similar to sorting by character reading possible to some extent.
However, since code points are assigned per character, accurate sorting and searching are often impossible at the level of words or sentences. For example, if the representative reading of 「安」 is “アン” (an) and the representative reading of 「売」 is “バイ” (bai), then 「安売り」 (read yasuuri) is treated as “アンバイリ” (anbairi). This problem becomes more pronounced as the sort or search target grows longer, as with a sentence.
[0007]
In addition, many Japanese kanji have multiple readings. On the other hand, in data processing using representative reading, sorting and searching are performed based only on representative readings of characters to be processed. Therefore, high accuracy could not be expected.
[0008]
Further, as a means for realizing data processing based on reading, a method is conceivable in which readings (reading kana) are input in advance with respect to accumulated data, and sorting or searching is performed based on the reading kana.
However, the work of inputting the reading kana in advance requires a great amount of labor, which leads to an increase in cost and an increase in work time for database construction.
[0009]
Therefore, an object of the present invention is to add a function of performing data processing based on reading of a character string to be processed to data processing such as sorting and searching required by an application.
[0010]
Another object of the present invention is to provide an interface that, when a character string that is a target of data processing such as sorting and searching can have a plurality of readings, performs such data processing based on each reading.
[0011]
A further object of the present invention is to provide an interface that automatically acquires the readings of character strings that are targets of data processing such as sorting and searching, and performs such data processing based on those readings.
[0012]
[Means for Solving the Problems]
To achieve the above objects, the present invention provides means for generating the reading of a kanji-kana mixed character string to be processed, and provides an application programming interface (API) that, when sorting or searching data as requested by an application, performs the processing using the dynamically generated reading of the character string.
[0013]
As described above, in order to perform data processing based on the readings of kanji-kana mixed character strings, the present invention is a program that controls a computer to output a processing result for input data, the program causing the computer to execute: a process of accepting input or designation of a data array whose elements are kanji-kana mixed character strings to be processed; a process of generating reading information for the kanji-kana mixed character strings that are elements of the input or designated data array; and a process of sorting the kanji-kana mixed character strings based on the generated reading information and outputting the data array that is the sort result.
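The three processes above (accept an array, generate readings, sort by reading) can be sketched as follows. This is a minimal illustration, not the patent's implementation: `TOY_READINGS`, `generate_reading`, and `sort_by_reading` are hypothetical names, the lookup table stands in for a real reading generator, and plain code-point order on katakana is used as a rough approximation of the dictionary (gojūon) order.

```python
from typing import Optional

# Hypothetical toy table standing in for a real reading generator
# (for example, one based on morphological analysis).
TOY_READINGS = {
    "大和事業所": "ヤマトジギョウショ",
    "安売り": "ヤスウリ",
    "週末電源工事": "シュウマツデンゲンコウジ",
    "社内ニュース": "シャナイニュース",
}


def generate_reading(text: str) -> Optional[str]:
    """Return a katakana reading for text, or None if none can be generated."""
    return TOY_READINGS.get(text)


def sort_by_reading(items: list[str], descending: bool = False) -> list[str]:
    """Sort strings by their generated readings instead of their code points."""
    keyed = []
    for text in items:
        reading = generate_reading(text)
        if reading is None:
            # Assumption: a missing reading is reported as an error.
            raise ValueError(f"no reading available for {text!r}")
        keyed.append((reading, text))
    # Code-point order on katakana only approximates dictionary order.
    keyed.sort(key=lambda pair: pair[0], reverse=descending)
    return [text for _, text in keyed]
```

Calling `sort_by_reading(["大和事業所", "安売り", "週末電源工事", "社内ニュース"])` orders the titles by their readings (シャ…, シュ…, ヤス…, ヤマ…) rather than by the code points of the kanji.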
[0014]
This program may also be configured so that, in addition to or instead of the process of sorting the kanji-kana mixed character strings that are elements of the data array to be processed based on their reading information, it causes the computer to execute a process of classifying the kanji-kana mixed character strings based on the category of the generated reading information and outputting a data array as the classification result.
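As an illustration of classification by reading category, the sketch below groups strings by the kana row (ア行, カ行, and so on) of the first character of their reading, which is one of the patterns the embodiment mentions. The names `GOJUON_ROWS`, `row_of`, and `classify_by_row` are hypothetical, and placing voiced kana in the row of their unvoiced counterparts is an assumption made here, not something the source specifies.

```python
# Kana rows used as categories. Voiced and semi-voiced kana are grouped with
# their base row (an assumption made for this sketch).
GOJUON_ROWS = {
    "ア行": "アイウエオ",
    "カ行": "カキクケコガギグゲゴ",
    "サ行": "サシスセソザジズゼゾ",
    "タ行": "タチツテトダヂヅデド",
    "ナ行": "ナニヌネノ",
    "ハ行": "ハヒフヘホバビブベボパピプペポ",
    "マ行": "マミムメモ",
    "ヤ行": "ヤユヨ",
    "ラ行": "ラリルレロ",
    "ワ行": "ワヲン",
}


def row_of(reading: str) -> str:
    """Return the kana row of the first character of a katakana reading."""
    if not reading:
        return "other"
    for row, chars in GOJUON_ROWS.items():
        if reading[0] in chars:
            return row
    return "other"


def classify_by_row(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (text, reading) pairs into one list of texts per kana row."""
    groups: dict[str, list[str]] = {}
    for text, reading in pairs:
        groups.setdefault(row_of(reading), []).append(text)
    return groups
```

A caller that only needs one category, for instance titles whose reading starts in ヤ行, can take `classify_by_row(pairs).get("ヤ行", [])` and ignore the rest.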
[0015]
Further, the program of the present invention causes the computer to execute: a process of accepting input of kanji-kana mixed character strings to be processed; a process of generating reading information for the input kanji-kana mixed character strings; and a process of comparing the reading information of a plurality of kanji-kana mixed character strings and outputting the comparison result.
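A comparison of this kind might look like the sketch below. It follows the convention described in the embodiment (a negative value when the first reading comes earlier, 0 when the readings match, a positive value when it comes later). The reading table and the function name `compare_by_reading` are hypothetical; the entry 「佐藤」 is an illustrative addition not taken from the source, and code-point order again stands in for dictionary order.

```python
# Hypothetical toy readings; a real system would generate these on demand.
TOY_READINGS = {
    "斎藤": "サイトウ",
    "斉藤": "サイトウ",
    "西東": "サイトウ",
    "佐藤": "サトウ",
}


def compare_by_reading(text_a: str, text_b: str) -> int:
    """Compare two strings by reading.

    Returns a negative value if A's reading sorts before B's, 0 if the
    readings are identical, and a positive value if A's reading sorts after.
    """
    reading_a = TOY_READINGS[text_a]
    reading_b = TOY_READINGS[text_b]
    return (reading_a > reading_b) - (reading_a < reading_b)
```

With this table, 「斎藤」 and 「斉藤」 compare as equal even though their spellings differ, which is the behavior a name lookup by reading relies on.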
[0016]
Furthermore, the program of the present invention causes the computer to execute: a process of accepting input of a kanji-kana mixed character string serving as a search key and a kanji-kana mixed character string to be searched; a process of generating reading information for the input kanji-kana mixed character strings; and a process of searching the search target, based on the reading information of the search key and the search target, for a portion having the same reading as the search key, and outputting the search result.
Here, the search process includes a process of determining, for each morpheme of the kanji-kana mixed character string to be searched, whether its reading is the same as that of the search key.
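The morpheme-by-morpheme check can be sketched as below. The search target is assumed to be already segmented into (surface, reading) pairs; that segmentation, the readings, and the function name `find_by_reading` are all hypothetical. Matches are only allowed to start and end on morpheme boundaries, so a reading that straddles two words is not reported.

```python
from typing import Optional


def find_by_reading(
    morphemes: list[tuple[str, str]], key_reading: str
) -> Optional[tuple[int, int]]:
    """Find the first run of whole morphemes whose readings spell key_reading.

    Returns (start, end) character offsets into the concatenated surface
    text, or None when no morpheme-aligned match exists.
    """
    # offsets[i] is the character position where morpheme i starts.
    offsets = [0]
    for surface, _ in morphemes:
        offsets.append(offsets[-1] + len(surface))

    for start in range(len(morphemes)):
        accumulated = ""
        for end in range(start, len(morphemes)):
            accumulated += morphemes[end][1]
            if accumulated == key_reading:
                return offsets[start], offsets[end + 1]
            if not key_reading.startswith(accumulated):
                break  # this starting morpheme cannot lead to a match
    return None


# Hypothetical segmentation of 「肴はあぶったいかでいい」.
SENTENCE = [
    ("肴", "サカナ"),
    ("は", "ハ"),
    ("あぶった", "アブッタ"),
    ("いか", "イカ"),
    ("で", "デ"),
    ("いい", "イイ"),
]
```

Searching `SENTENCE` for "サカナ" finds 「肴」, while searching for "タイ" finds nothing: the kana sequence たい does occur in the surface text, but only across the boundary between 「あぶった」 and 「いか」.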
[0017]
The data processing in the above-described program, that is, sorting, category classification, character string comparison, and character string search, can be provided to a data processing device (or to software that controls the data processing device) not only individually but also in combination so that a plurality of the processes can be used.
In addition, when the kanji-kana mixed character string to be processed has a plurality of readings, the process of generating reading information in the above program can generate a corresponding plurality of pieces of reading information. In this case, each of the sorting, category classification, character string comparison, and character string search processes is executed for each of the generated pieces of reading information. In comparing or searching character strings, a match or detection can be determined if even one of the plural pieces of reading information is identical.
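The "match if any reading is identical" rule can be illustrated as follows. The table lists candidate readings in an assumed order of frequency, and `limit` caps how many of them are used, mirroring the count the embodiment lets the caller specify. `TOY_READINGS`, `readings_of`, and `readings_match` are hypothetical names for this sketch.

```python
# Hypothetical candidate readings, most common first.
TOY_READINGS = {
    "上村": ["ウエムラ", "カミムラ"],
    "植村": ["ウエムラ"],
    "神村": ["カミムラ"],
    "うえむら": ["ウエムラ"],
}


def readings_of(text: str, limit: int) -> list[str]:
    """Return up to `limit` candidate readings for text."""
    return TOY_READINGS.get(text, [])[:limit]


def readings_match(text_a: str, text_b: str, limit: int = 2) -> bool:
    """Return True if the two strings share at least one candidate reading."""
    candidates_a = set(readings_of(text_a, limit))
    candidates_b = readings_of(text_b, limit)
    return not candidates_a.isdisjoint(candidates_b)
```

With two readings allowed, 「上村」 matches both 「植村」 and 「神村」; with `limit=1` only the first reading "ウエムラ" is considered, so the match with 「神村」 is lost. The cap therefore trades recall against the number of comparisons.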
[0018]
The above-described program can be provided by being stored and distributed on a magnetic disk or other storage medium, or distributed from a program transmission apparatus via a network.
[0019]
Furthermore, the present invention, which realizes a user interface that performs data processing based on the readings of kanji-kana mixed character strings, provides a data processing system configured as follows. This data processing system includes: storage means storing a data array whose elements are kanji-kana mixed character strings; input means for accepting a command instructing that the data array be sorted; reading information generation means for acquiring the data array to be processed from the storage means and generating reading information for the kanji-kana mixed character strings that are its elements; processing means for sorting the kanji-kana mixed character strings to be processed based on the generated reading information; and output means for outputting the data array that is the sort result.
Here, the reading information generation means may be configured to generate the reading information of the kanji-kana mixed character strings after the command has been accepted by the input means. That is, after a sort command is accepted, reading information is dynamically generated for the kanji-kana mixed character strings to be processed, and sorting based on the readings is performed.
[0020]
The present invention also provides a data processing system configured as follows. This data processing system includes: storage means storing a data array whose elements are kanji-kana mixed character strings; input means for accepting a command instructing that the elements of the data array be classified; reading information generation means for acquiring the data array to be processed from the storage means and generating reading information for the kanji-kana mixed character strings that are its elements; processing means for classifying the kanji-kana mixed character strings to be processed based on the category of the generated reading information; and output means for outputting the kanji-kana mixed character strings separately for each category of the reading information, based on the classification result of the processing means.
[0021]
Furthermore, the present invention provides a data processing system configured as follows. That is, a data processing system that provides a user interface suited to application software includes: input means for designating and inputting a plurality of kanji-kana mixed character strings as comparison targets; and output means for outputting the result of a comparison made based on the readings of the kanji-kana mixed character strings to be compared, processed in accordance with the use of the application software.
[0022]
Furthermore, the present invention provides a data processing system configured as follows. That is, a data processing system that provides a user interface suited to application software includes: input means for inputting a kanji-kana mixed character string serving as a search key and a kanji-kana mixed character string to be searched; and output means for outputting the result of searching the search target, based on the reading information of the search key and the search target, for a portion having the same reading as the search key.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described in detail based on embodiments shown in the accompanying drawings.
FIG. 1 is a diagram illustrating a configuration example of a data processing system that implements an interface according to the present embodiment.
The data processing system of the present embodiment is realized by a computer device such as a personal computer or a workstation, a PDA (Personal Digital Assistant), or other electronic information equipment.
[0024]
Referring to FIG. 1, the data processing system includes an input device 11 such as a keyboard and a mouse, a processing device 12 having a CPU and a main memory, a storage device 13 such as a hard disk device, and an output device 14 such as a display device and a printer.
The input device 11 is operated by a user and is used to input data and execution instructions for various processes.
The processing device 12 performs various data processing under program control. In addition, as will be described later, an interface for sorting and searching based on reading information of a character string is provided.
The storage device 13 stores data including the character strings to be processed. Accordingly, the processing device 12 performs data processing based on reading information on a character string input with the input device 11, or on a character string read from the storage device 13 in response to a command input from the input device 11.
The output device 14 outputs the processing result obtained by the processing device 12.
[0025]
FIG. 2 is a diagram showing a schematic configuration of the interface providing unit according to the present embodiment.
Application software such as a word processor or a database runs on the processing device 12, and the interface providing unit 20 shown in FIG. 2 executes data processing using the readings of character strings as a function of such application software.
[0026]
Referring to FIG. 2, the interface providing unit 20 according to the present embodiment includes a reading information generating unit 21 and a data processing unit 22.
Note that the reading information generation unit 21 and the data processing unit 22 illustrated in FIG. 2 are virtual software blocks realized by a CPU controlled by a computer program. The computer program for controlling the CPU is provided by being stored and distributed in a storage medium such as a CD-ROM or a floppy disk, or transmitted via a network. In addition, the computer program can be provided in the form of a program module incorporated in application software that uses the interface provided by the present embodiment.
[0027]
The reading information generating unit 21 generates reading information for the data to be processed (a character string, or a data array having character strings as elements) input from the input device 11 or the storage device 13 shown in FIG. 1, and adds it to that data. The reading information can be generated by a known program that produces reading kana for the kanji in a kanji-kana mixed sentence. The accuracy of the reading information can be improved by performing morphological analysis on the character string and generating the reading based on information such as the part of speech of each morpheme. Hereinafter, the processing target in the present embodiment is described as a kanji-kana mixed character string or a data array having such character strings as elements. Character strings composed only of kanji and character strings composed only of kana are also treated as kanji-kana mixed character strings.
Note that the user can input an appropriate reading kana in advance using the input device 11 and add it to the character string to be processed as additional information. In this case, the reading kana added by the user is the most accurate reading information for the character string, so the reading information generation unit 21 does not need to generate reading information. In the following description, reading kana added by the user is also referred to as reading information unless the two need to be distinguished.
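The precedence rule described here (a reading kana supplied in advance is used as-is, and generation happens only when none is attached) can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: `Item`, `toy_generate`, `ensure_reading`, and the two-entry `TOY_READINGS` table are hypothetical names, and the table merely stands in for a real morphological-analysis-based reading generator.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy dictionary standing in for a real reading-kana generator.
TOY_READINGS = {"安売り": "やすうり", "東京": "とうきょう"}


@dataclass
class Item:
    text: str                      # kanji-kana mixed character string
    reading: Optional[str] = None  # reading kana; None or "" means "not attached"


def toy_generate(text: str) -> Optional[str]:
    """Return a reading for text, or None when generation fails (toy lookup)."""
    return TOY_READINGS.get(text)


def ensure_reading(
    item: Item, generate: Callable[[str], Optional[str]] = toy_generate
) -> bool:
    """Attach reading information unless one is already attached.

    Returns True if item.reading is usable afterwards, False if generation failed.
    """
    if item.reading:           # a reading supplied in advance takes precedence
        return True
    generated = generate(item.text)
    if generated is None:      # the caller can map this to an error code
        return False
    item.reading = generated   # stored, so later processing can skip generation
    return True
```

Because the generated reading is stored on the item, a second call on the same item returns immediately without invoking the generator again.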
[0028]
The data processing unit 22 performs predetermined data processing on the character string to be processed based on the reading information generated by the reading information generating unit 21 or input by the user. In the present embodiment, as data processing using reading information, four types of processing of data sorting, data category classification, character string comparison, and character string search will be described. Details of specific processing will be described later.
[0029]
FIG. 3 is a diagram for explaining the flow of data processing provided by the interface providing unit 20 shown in FIG.
As shown in FIG. 3, the user inputs a kanji-kana mixed sentence in response to a request from application software and obtains the result output through processing by the interface providing unit 20. As described above, the kanji-kana mixed sentence can be input directly from the input device 11, or indirectly by designating a sentence stored in the storage device 13 as the processing target.
The interface providing unit 20 first generates and adds reading information if no reading kana is attached to the input kanji-kana mixed sentence. In the example shown in the figure, the reading information “Yamatojigyosho, Shumatsuden Koji” is added to the kanji-kana mixed sentence “Yamato Works, Weekend Power Supply Construction”. The interface providing unit 20 then performs data processing using the reading information and outputs the processing result.
Through the above operation, a user of the interface according to the present embodiment can input a kanji-kana mixed sentence without being aware of its reading and still obtain a data processing result based on that reading.
[0030]
As shown in FIG. 3, when the user inputs a kanji-kana mixed sentence to be processed, the interface providing unit 20 according to the present embodiment dynamically generates reading information with the reading information generation unit 21. Once the generated reading information has been added to the corresponding kanji-kana mixed sentence as additional information, the next time the same sentence is processed the added reading information can be used without the reading information generation unit 21 generating it again.
[0031]
Next, data processing in the interface provided in the present embodiment will be described in detail.
1. Sort
Sorting is data processing that receives a data array containing kanji-kana mixed character strings, sorts it based on the readings of those character strings, and returns the sorted array of kanji-kana mixed character strings. When reading kana has already been added to a character string, that reading kana is used as the sort key. Otherwise, the reading information generated by the reading information generation unit 21 is used as the sort key.
Examples of use of this type of interface include sorting a data array by the readings of names in database software such as an address book, telephone directory, or other directory; sorting a product list by the readings of product names; and sorting the results of a conditional search on a database in a-i-u-e-o (Japanese syllabary) order.
[0032]
When data is sorted according to the present embodiment, a data array whose elements each pair a kanji-kana mixed character string with its reading information is input to the interface providing unit 20. However, the reading may be empty. The interface providing unit 20 also receives an instruction specifying the sort type, that is, whether to sort in ascending or descending lexicographic order of the readings.
[0033]
When these inputs are made, the interface providing unit 20 performs two-stage processing on the input data A array, that is, reading information check and data sorting.
FIG. 4 is a flowchart for explaining the flow of processing by the interface providing unit 20.
As shown in FIG. 4, when a data array is input to the reading information generating unit 21 of the interface providing unit 20 (step 401), each element (data A) of the data array is first checked in turn to determine whether reading kana is attached (steps 402 to 404). If reading kana is not attached to an element, the reading information generating unit 21 generates reading information from the kanji-kana mixed character string of that data A and adds it (step 405). If the reading information can be generated, the same processing is performed on the next element (step 406). If it cannot be generated, an error code is output (steps 406 and 407). Once all elements have been checked and any necessary reading information has been generated, the processing proceeds to sorting (step 402).
Next, the data processing unit 22 receives the data array in which reading information has been supplemented as necessary, sorts the elements (data A) using the reading information as the key (step 408), and outputs the sort result (step 409). Whether the sort is in ascending or descending lexicographic order of the readings follows the user's specification at the time of input.
[0034]
After the above processing, the interface providing unit 20 outputs the data array, in which the elements (data A) have been rearranged by sorting, together with a return code to the application (for example, the database software described above) that uses the interface providing unit 20. The return code can be set to, for example, “success” indicating that the processing completed correctly, “error 1” indicating that the reading information generation unit 21 failed to generate reading information, or “error 2” indicating any other error. In the case of error 1, information identifying the array element in which the problem occurred can also be output in addition to the data array and the return code.
An application using the interface providing unit 20 outputs a sort result and a message to the output device 14 based on the received data array and return code.
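The two-stage flow of FIG. 4 (check and supplement readings, then sort with the reading as key) and the return codes described above might look roughly like the sketch below. The names `ReturnCode`, `sort_by_reading`, and the `generate` callback are hypothetical. The sketch also assumes readings are hiragana and compares them by Unicode code point, which only approximates Japanese syllabary order (voiced and small kana are not collated the way a dictionary would).

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

Element = Tuple[str, str]  # (kanji-kana mixed string, reading; "" if not attached)


class ReturnCode(Enum):
    SUCCESS = 0
    ERROR_1 = 1  # reading information could not be generated
    ERROR_2 = 2  # any other error


def sort_by_reading(
    data: List[Element],
    generate: Callable[[str], Optional[str]],
    descending: bool = False,
) -> Tuple[ReturnCode, List[Element], Optional[int]]:
    """Return (return code, data array, index of the failing element or None)."""
    filled: List[Element] = []
    # Stage 1: check every element and generate a reading where none is attached.
    for index, (text, reading) in enumerate(data):
        if not reading:
            reading = generate(text) or ""
            if not reading:
                # Error 1: also report which element caused the problem.
                return ReturnCode.ERROR_1, list(data), index
        filled.append((text, reading))
    # Stage 2: sort with the reading as key (code-point order, an approximation).
    filled.sort(key=lambda element: element[1], reverse=descending)
    return ReturnCode.SUCCESS, filled, None
```

An application would inspect the return code first and show either the sorted array or a message naming the element that could not be read.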
[0035]
FIG. 5 is a diagram illustrating an example of a sorting result according to the present embodiment.
Referring to FIG. 5, the data classified as “top news” in a database storing “in-house news” is sorted in lexicographic order of the readings of the news titles (indicated as “50 alphabetical order”, that is, Japanese syllabary order, in the figure).
[0036]
2. Category classification
Category classification is data processing that receives a data array whose elements are kanji-kana mixed character strings, classifies the elements based on their reading information into categories such as character strings starting with “a” and character strings starting with the “ta line”, and returns an array of kanji-kana mixed character strings for each category.
Examples of use of this type of interface are similar to those for the sort described above. This processing is useful when the number of elements is enormous and only part of the data, such as character strings whose reading starts with a specific character, is to be acquired.
[0037]
When performing data category classification according to the present embodiment, the input to the interface providing unit 20 is an array whose elements are data A, each pairing a kanji-kana mixed character string with its reading information. However, the reading may be empty. An instruction designating the category classification pattern is also input to the interface providing unit 20. Possible patterns include, for example, classification by line of the kana syllabary (character strings starting with the “A line”, character strings starting with the “K line”, and so on) and classification by individual kana (character strings starting with “A”, character strings starting with “I”, and so on).
[0038]
When these inputs are made, the interface providing unit 20 performs a two-stage process on the input data A array, that is, reading information check and data category classification.
FIG. 6 is a flowchart for explaining the flow of processing by the interface providing unit 20.
In FIG. 6, the processing from step 601 to step 607 is the same as the processing from step 401 to step 407 in the sort processing flowchart shown in FIG.
The data processing unit 22 of the interface providing unit 20 receives the data array in which the reading information generation unit 21 has supplemented the reading information as necessary, classifies the elements (data A) into an array for each category designated by the user at the time of input, using the reading information as the key (step 608), and outputs the classification result (step 609).
[0039]
After the above processing, the interface providing unit 20 outputs the data arrays, in which the elements (data A) have been classified and rearranged by category, together with a return code to the application (for example, the database software described above) that uses the interface providing unit 20. The return code can be set to, for example, “success” indicating that the processing completed correctly, “error 1” indicating that the reading information generation unit 21 failed to generate reading information, or “error 2” indicating any other error. In the case of error 1, information identifying the array element in which the problem occurred can also be output in addition to the data array and the return code.
The application using the interface providing unit 20 outputs a category classification result and a message to the output device 14 based on the received data array and return code.
[0040]
FIG. 7 is a diagram showing an example of the result of category classification according to the present embodiment.
Referring to FIG. 7, the data classified as “top news” in the database storing “in-house news” is classified into categories based on the reading information of the news titles (indicated as “Akasana” in the figure). In FIG. 7, news whose title begins with a character in the “A line” is displayed in the “A line” column, news whose title begins with a character in the “K line” is displayed in the “K line” column, and news whose title begins with a character in the “sa line” is displayed in the “sa line” column.
Note that when the data amount of the data array to be processed is enormous, it is possible to set so that only data belonging to a specific category is output using the category classification of this processing.
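Classification by line of the kana syllabary can be illustrated with the sketch below, which buckets elements by the first kana of their reading. The `ROWS` table, the labels, and the function names are assumptions made for this illustration. Readings are assumed to be hiragana, and placing “ん” in the last row and using an “other” bucket are simplifications, not something the description specifies.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, str]  # (kanji-kana mixed string, hiragana reading)

# Hypothetical row table: label -> kana that belong to that line of the syllabary.
ROWS = [
    ("あ行", "あいうえおぁぃぅぇぉ"),
    ("か行", "かきくけこがぎぐげご"),
    ("さ行", "さしすせそざじずぜぞ"),
    ("た行", "たちつてとだぢづでどっ"),
    ("な行", "なにぬねの"),
    ("は行", "はひふへほばびぶべぼぱぴぷぺぽ"),
    ("ま行", "まみむめも"),
    ("や行", "やゆよゃゅょ"),
    ("ら行", "らりるれろ"),
    ("わ行", "わをん"),
]
OTHER = "その他"  # bucket for readings that start with anything else


def row_of(reading: str) -> str:
    """Return the row label for the first kana of the reading."""
    first = reading[:1]
    for label, kana in ROWS:
        if first and first in kana:
            return label
    return OTHER


def classify_by_row(data: List[Element]) -> Dict[str, List[Element]]:
    """Group elements into one array per row, keeping input order within a row."""
    result: Dict[str, List[Element]] = {label: [] for label, _ in ROWS}
    for element in data:
        result.setdefault(row_of(element[1]), []).append(element)
    return result
```

To output only one category, as the note above suggests for very large arrays, the caller can simply pick a single key, for example `classify_by_row(data)["あ行"]`.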
[0041]
3. String comparison
Character string comparison is data processing that receives two kanji-kana mixed character strings and compares their reading information.
As an example of use of this type of interface, search processing in database software can look up names or titles with the same reading in a given list. Specifically, when searching for a surname in a list such as a telephone directory, the user may enter “Saito” and wish to obtain every surname that is read “Saito”, however it is written. Similarly, the interface can be used to obtain titles with the same reading from a database of music or movie titles. That is, it supports searches in which the user is unsure of the spelling but remembers the reading.
[0042]
When character string comparison is performed according to the present embodiment, the inputs to the interface providing unit 20 are data A, which pairs a kanji-kana mixed character string with its reading information, and data B, which likewise pairs a kanji-kana mixed character string with its reading information. However, the reading may be empty. Data A and data B to be compared may each be an element selected from a data array (list), or may be data entered specifically as a search key.
[0043]
When these inputs are made, the interface providing unit 20 performs two-stage processing on the input data A and data B: a reading information check, followed by character string comparison based on the reading information.
FIG. 8 is a flowchart for explaining the flow of processing by the interface providing unit 20.
As shown in FIG. 8, when data A and data B are input to the reading information generation unit 21 of the interface providing unit 20 (step 801), data A and data B are first checked in turn to determine whether reading kana is attached to each (steps 802 to 804). If reading kana is not attached, the reading information generating unit 21 generates reading information from the kanji-kana mixed character string of that data and adds it (step 805). If the reading information cannot be generated, an error code is output (steps 806 and 807). Once both data A and data B have been checked and any necessary reading information has been generated, the processing proceeds to comparison based on the readings (step 802).
Next, the data processing unit 22 compares the reading information of data A and data B (step 808), and outputs the comparison result (step 809).
[0044]
After the above processing, the interface providing unit 20 outputs a value indicating the result of comparing data A and data B, together with a return code, to the application that uses the interface providing unit 20. The comparison result indicates whether the readings of data A and data B are the same or, if not, which comes first in lexicographic order of the readings. The value indicating the comparison result can therefore be set, for example, with the reading of data B as the reference (0): a value less than 0 is output when data A precedes data B in lexicographic order, 0 is output when data A and data B match, and a value greater than 0 is output when data A follows data B. The return code can be set to, for example, “success” indicating that the processing completed correctly, “error 1” indicating that the reading information generation unit 21 failed to generate reading information, or “error 2” indicating any other error.
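The three-way result described above (negative, zero, positive relative to data B) can be sketched as below, together with a same-reading lookup of the kind used in a telephone directory search. `compare_readings` and `find_same_reading` are hypothetical names, the sample surnames in the usage note are illustrative data rather than values taken from the description, and code-point comparison of hiragana is assumed as the ordering.

```python
from typing import List, Tuple


def compare_readings(reading_a: str, reading_b: str) -> int:
    """Return -1 if A precedes B, 0 if the readings match, 1 if A follows B.

    Ordering is by Unicode code point, an approximation of syllabary order.
    """
    return (reading_a > reading_b) - (reading_a < reading_b)


def find_same_reading(key_reading: str, entries: List[Tuple[str, str]]) -> List[str]:
    """Return the written forms of all entries whose reading equals the key's."""
    return [
        text
        for text, reading in entries
        if compare_readings(reading, key_reading) == 0
    ]
```

For instance, with entries `[("斉藤", "さいとう"), ("佐藤", "さとう"), ("斎藤", "さいとう")]`, looking up the reading `"さいとう"` returns both differently written surnames and skips `"佐藤"`.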
[0045]
The application using the interface providing unit 20 generates a processing result in the application based on the received comparison result and return code, and outputs the result to the output device 14.
For example, if this character string comparison is used as a function of a search system for a database such as the telephone directory described above, elements (data) having the same reading as the character string of the search key can be detected in the database list. It is also possible to compare multiple lists and select and acquire the elements (data) having the same reading, or to collate the lists.
[0046]
4. String comparison that absorbs ambiguity in reading strings
As with the character string comparison of item 3, this is data processing that receives two kanji-kana mixed character strings and compares their reading information. However, because Japanese kanji commonly have multiple readings, multiple pieces of reading information are generated as the readings of each kanji-kana mixed character string to be processed, and these are compared. Even in this case, when reading kana has been added to the kanji-kana mixed character string in advance by the user, the added reading kana is used.
[0047]
When performing character string comparison that absorbs ambiguity in the readings, the inputs to the interface providing unit 20 are data A and data B, as in the character string comparison of item 3, and an instruction specifying how many pieces of reading information are to be used for comparison when multiple pieces of reading information are obtained for the character strings of data A and data B.
[0048]
When these inputs are made, the interface providing unit 20 performs two-stage processing on the input data A and data B: a reading information check, followed by character string comparison based on the readings.
FIG. 9 is a flowchart for explaining the flow of processing by the interface providing unit 20.
As shown in FIG. 9, when data A and data B are input to the reading information generation unit 21 of the interface providing unit 20 (step 901), data A and data B are first checked in turn to determine whether reading kana is attached to each (steps 902 to 904). If reading kana is not attached, the reading information generating unit 21 generates reading information from the kanji-kana mixed character string of that data and adds it (step 905). At this time, the reading information generation unit 21 generates as many pieces of reading information as were designated at the time of input. Usually, a dictionary database that stores the readings of characters and words stores them in descending order of frequency or representativeness, so the reading information generation unit 21 acquires up to the n-th reading according to the designated number n and adds them to data A or data B as reading information.
If the reading information cannot be generated, the reading information generation unit 21 outputs an error code (steps 906 and 907). If both data A and data B are checked or necessary reading information generation processing is performed, the processing shifts to comparison processing based on reading (step 902).
[0049]
Next, the data processing unit 22 compares the readings of data A and data B (step 908) and outputs the comparison result (step 909). When multiple pieces of reading information are attached to one or both of data A and data B, all combinations of those pieces of reading information are checked, and if at least one pair matches, the readings of data A and data B are determined to match.
[0050]
After the above processing, the interface providing unit 20 outputs a value indicating a comparison result between the data A and the data B and a return code to the application using the interface providing unit 20. The value indicating the comparison result is, for example, 0 when the readings match (when there is one or more of the same reading information), and a value other than 0 when the readings do not match (when all reading information is different). It can be set as such. The return code includes, for example, “success” indicating that the processing has been correctly completed, “error 1” indicating that the reading information generation unit 21 has failed to generate reading information, and “error 2” indicating other errors. Can be set.
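The all-combinations check and the 0 / non-0 result described above can be sketched as follows. The function name `readings_match` and the representation of each datum as a list of candidate readings, ordered from most to least representative, are assumptions for illustration; `n` corresponds to the designated number of readings to use.

```python
from itertools import product
from typing import List


def readings_match(readings_a: List[str], readings_b: List[str], n: int) -> int:
    """Return 0 if any of the first n readings of A equals any of the first n of B.

    Returns 1 (a non-zero value) when every combination differs.
    """
    for reading_a, reading_b in product(readings_a[:n], readings_b[:n]):
        if reading_a == reading_b:
            return 0
    return 1
```

With `["うえむら", "かみむら"]` for one string and `["かみむら"]` for the other, `n=2` yields a match, while `n=1` compares only the first reading of each and reports a mismatch. This shows how the designated number trades recall against the risk of matching on an unlikely reading.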
[0051]
Unlike the character string comparison of item 3, when multiple readings are obtained from one character string, this processing can obtain search results based on each reading. In other words, when the user clearly remembers the spelling, such as the kanji, but is unsure of the reading, data processing that covers the appropriate reading can still be executed. For example, when a name that can be read either “Uemura” or “Kamimura” is entered, a character string matching the reading “Uemura” can be obtained as a search result.
Further, as described for the character string comparison of item 3, the present embodiment can search for character strings having the same reading as the character string of the search key. Therefore, when the two readings “Uemura” and “Kamimura” are obtained for the search key, search results read “Uemura” are obtained for the reading “Uemura”, and search results read “Kamimura” are obtained for the reading “Kamimura”.
[0052]
5. String search
Character string search is data processing that receives two kanji-kana mixed character strings, searches one of them for a part having the same reading as the other, and returns the position at which that part appears. As in the character string comparison of item 4, multiple pieces of reading information are generated as the readings of the kanji-kana mixed character string to be processed and are compared. Also in this case, when reading kana has been added to the kanji-kana mixed sentence in advance by the user, the added reading is used.
As an example of using this type of interface, there is a case where a character string having the same reading as a desired character string (word or the like) is searched from a predetermined document in a character string search process in word processor software.
[0053]
When a character string search is performed according to the present embodiment, the inputs to the interface providing unit 20 are data A, which pairs the kanji-kana mixed character string to be searched with its reading information; data B, which pairs the kanji-kana mixed character string serving as the search key with its reading information; and an instruction specifying how many pieces of reading information are to be used for comparison when multiple pieces of reading information are obtained for the character string of data B.
[0054]
When these inputs are made, the interface providing unit 20 performs two-stage processing on the input data A and data B: a reading information check, followed by character string search based on the readings.
FIG. 10 is a flowchart for explaining the flow of processing by the interface providing unit 20.
As shown in FIG. 10, when data A and data B are input to the reading information generating unit 21 of the interface providing unit 20 (step 1001), data A and data B are first checked in turn to determine whether reading kana is attached to each (steps 1002 to 1004). If reading kana is not attached, the reading information generating unit 21 generates reading information from the kanji-kana mixed character string of that data and adds it (step 1005). At this time, as many pieces of reading information as were designated at the time of input are generated for data B. Usually, a dictionary database that stores the readings of characters and words stores them in descending order of frequency or representativeness, so the reading information generation unit 21 acquires up to the n-th reading according to the designated number n and adds them to data B as reading information.
If the reading information cannot be generated, the reading information generation unit 21 outputs an error code (steps 1006 and 1007). If both data A and data B are checked or necessary reading information generation processing is performed, the processing proceeds to reading-based search processing (step 1002).
[0055]
Next, the data processing unit 22 compares the readings of data A and data B (step 1008), checks whether the reading of data A contains a portion identical to the reading of data B, and outputs the result (step 1009). If data A contains a portion with the same reading as data B, position information indicating that location is output. When there are several such portions, position information indicating, for example, the first portion at which the readings coincide can be output. When multiple pieces of reading information are attached to data B, data A is checked for a portion with the same reading for every one of them, and the result is output.
[0056]
After the above processing, the interface providing unit 20 outputs the search result and the return code to the application that uses the interface providing unit 20. The search result is, for example, position information indicating the location when data A contains a portion with the same reading as data B. Specifically, a pointer is displayed at the corresponding character string in data A shown on the output device 14, or that character string is displayed in reverse video. The position may be indicated over the entire detected character string or only at a specific part of it, such as its first or last character.
Further, when the data A does not have the same reading part as the data B, for example, a message indicating that there is no corresponding part is output. The return code includes, for example, “success” indicating that the processing has been correctly completed, “error 1” indicating that the reading information generation unit 21 has failed to generate reading information, and “error 2” indicating other errors. Can be set.
The application using the interface providing unit 20 outputs a search result and a message to the output device 14 based on the received comparison result and return code.
[0057]
This processing can be used when a sentence or data string of some length is the search target and a desired character, word, sentence, or the like is searched for as the search key. Because the search is based on the reading of each character string, appropriate results can be found even when the user is unsure of the spelling but remembers the reading. For example, “肴” in the search target can be detected by entering “fish” as the search key in any of several written forms that share its reading.
In addition, when multiple readings are obtained for the search key, a search can be performed based on each reading. For example, when “Uemura” is entered as the search key and the search target contains parts such as “... Uemura-san ...”, “... Kamimura-sensei ...”, “... Uemura-kun ...”, and “... Mr. Kamimura ...”, any of them can be detected.
When searching the search target for the part corresponding to the character string of the search key, moving through the search target in units of morphemes makes it possible to avoid false detections in which the readings happen to coincide at an unnatural position, such as one that splits a word in the middle.
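A morpheme-unit search of the kind described above can be sketched as follows. The sketch assumes that data A has already been segmented into `(surface, reading)` morphemes by some analyzer (the segmentation shown in the usage note is hand-written, not produced by a real tool) and that the search key is represented by its list of candidate readings. `search_by_reading` is a hypothetical name. Matches may only start and end on morpheme boundaries, so a key reading that coincides with the middle of a word is not reported.

```python
from typing import List, Optional, Tuple

Morpheme = Tuple[str, str]  # (surface form, reading)


def search_by_reading(
    morphemes: List[Morpheme], key_readings: List[str]
) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first match in the surface text.

    The match is the first run of whole morphemes whose concatenated reading
    equals one of the key readings; None is returned when there is no match.
    """
    # Character offset at which each morpheme starts in the surface string.
    offsets: List[int] = []
    position = 0
    for surface, _ in morphemes:
        offsets.append(position)
        position += len(surface)

    for start in range(len(morphemes)):
        accumulated = ""
        for end in range(start, len(morphemes)):
            accumulated += morphemes[end][1]
            if accumulated in key_readings:
                return offsets[start], offsets[end] + len(morphemes[end][0])
            # Stop extending once no key reading can still be completed.
            if not any(key.startswith(accumulated) for key in key_readings):
                break
    return None
```

For the hand-segmented text `[("今日", "きょう"), ("は", "は"), ("上村", "かみむら"), ("さん", "さん")]` and key readings `["うえむら", "かみむら"]`, the function returns the span `(3, 5)`, which covers “上村” in “今日は上村さん”. The key reading `"むら"` finds nothing, because it would split a morpheme.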
[0058]
In this way, when data processing such as data sorting, category classification, character string comparison, and character string search is performed on Japanese character strings, processing based on the readings of the character strings makes the data processing more widely applicable and more accurate.
Moreover, since the reading of a character string is generated automatically by the reading information generation unit 21, the user can obtain the results of reading-based data processing without being conscious of the reading of the character string (that is, without inputting reading kana).
[0059]
Furthermore, in character string comparison and character string search, performing the processing based on the reading of a character string allows a correct search result to be obtained even if the spelling of the search key or the input character string is incorrect. Further, when a character string has multiple readings, performing the processing for each reading allows a correct search result to be obtained even if the user has misremembered the reading.
In the above-described embodiment, processing corresponding to multiple readings has been described only for character string comparison and character string search. However, processing corresponding to multiple readings can also be performed in data sorting and category classification. In this case, for example, based on the multiple readings, the data is inserted at each of the corresponding positions, or the data is inserted at one of the corresponding positions and a pointer to that data is inserted at the other corresponding positions. In this way, the user can acquire the desired data from the sorting or classification result regardless of which reading is used to look for it.
In the embodiment described above, reading kana is generated and added to a kanji-kana mixed character string as its reading. The reading information may be in hiragana or katakana, as long as one is used consistently. Romaji may also be used as reading information instead of kana.
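The first variation mentioned above for sorting with multiple readings, inserting the data at each position that corresponds to one of its readings, can be sketched as follows. `sort_with_all_readings` is a hypothetical name, each element is assumed to carry a list of hiragana readings, and code-point order again stands in for syllabary order.

```python
from typing import List, Tuple


def sort_with_all_readings(
    data: List[Tuple[str, List[str]]]
) -> List[Tuple[str, str]]:
    """Return (reading, text) pairs sorted by reading.

    An element with several readings appears once per reading, so it can be
    found under whichever reading the user has in mind.
    """
    keyed = [(reading, text) for text, readings in data for reading in readings]
    keyed.sort()
    return keyed
```

Given `[("上村", ["うえむら", "かみむら"]), ("加藤", ["かとう"])]`, the result lists “上村” twice, once before and once after “加藤”. The pointer-based variation would store the full record at one of those positions and a reference to it at the others.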
[0060]
Furthermore, although data sorting, data category classification, character string comparison, and character string search have been described as the data processing in this embodiment, other data processing is also possible, for example character string replacement in which a part of a kanji-kana mixed character string having a specific reading is replaced with another character string.
[0061]
[Effects of the invention]
As described above, according to the present invention, a function for performing data processing based on reading of a character string to be processed can be added to data processing such as sorting and searching required by an application.
[0062]
Further, according to the present invention, when a character string that is a target of data processing such as sorting or searching can have a plurality of readings, an interface that performs such data processing based on each reading can be provided.
[0063]
Furthermore, according to the present invention, it is possible to provide an interface that automatically acquires readings of character strings that are targets of data processing such as sorting and searching, and performs such data processing based on the readings.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration example of a data processing system that realizes an interface according to an embodiment.
FIG. 2 is a diagram showing a schematic configuration of an interface providing unit according to the present embodiment.
FIG. 3 is a diagram illustrating a flow of data processing provided by an interface providing unit according to the present embodiment.
FIG. 4 is a flowchart for explaining a flow of processing by an interface providing unit according to the present embodiment, and is a diagram for explaining processing when data is sorted.
FIG. 5 is a diagram illustrating an example of a sorting result according to the present embodiment.
FIG. 6 is a flowchart for explaining a flow of processing by an interface providing unit according to the present embodiment, and is a diagram for explaining processing when data category classification is performed;
FIG. 7 is a diagram showing another example of a sorting result according to the present embodiment.
FIG. 8 is a flowchart for explaining the flow of processing by the interface providing unit according to the present embodiment, and is a diagram for explaining processing when character string comparison is performed.
FIG. 9 is a flowchart for explaining the flow of processing by the interface providing unit according to the present embodiment, and is a diagram for explaining processing when character string comparison is performed that absorbs ambiguity in reading character strings.
FIG. 10 is a flowchart for explaining the flow of processing by the interface providing unit according to the present embodiment, and is a diagram for explaining processing when performing a character string search;
FIG. 11 is a diagram illustrating a data sorting result using a conventional data processing system.
[Explanation of symbols]
11 ... Input device, 12 ... Processing device, 13 ... Storage device, 14 ... Output device, 20 ... Interface providing unit, 21 ... Reading information generation unit, 22 ... Data processing unit

Claims

A program for controlling a computer to output a processing result for input data, the program causing the computer to execute:
a process of accepting input of kanji-kana mixed character strings to be compared;
a process of generating reading information for an input kanji-kana mixed character string to which reading kana has not been added in advance; and
a process of adopting, for a kanji-kana mixed character string to which reading kana has been added in advance, that reading kana as the reading information; comparing, when a plurality of pieces of reading information have been generated for at least two of the kanji-kana mixed character strings, each of the plurality of pieces of reading information generated for one kanji-kana mixed character string with each of the plurality of pieces of reading information generated for the other kanji-kana mixed character string; determining that the kanji-kana mixed character strings being compared match when the comparison results for one or more pieces of reading information are the same; and outputting the determination result.

A data processing system that provides a user interface according to application software, the system comprising:
input means for inputting kanji-kana mixed character strings to be subjected to character string comparison;
reading information generation means for generating reading information for an input kanji-kana mixed character string to which reading kana has not been added in advance; and
output means for adopting, for an input kanji-kana mixed character string to which reading kana has been added in advance, that reading kana as the reading information; comparing, when a plurality of pieces of reading information have been obtained for at least two of the plurality of kanji-kana mixed character strings, each of the plurality of pieces of reading information generated for one kanji-kana mixed character string with each of the plurality of pieces of reading information generated for the other kanji-kana mixed character string; determining that the kanji-kana mixed character strings being compared match when the comparison results for two or more pieces of reading information are the same; and outputting the determination result.

A storage medium storing a program for controlling a computer to output a processing result for input data,
the program causing the computer to execute:
a process of accepting input of kanji-kana mixed character strings to be compared;
a process of generating reading information for an input kanji-kana mixed character string to which reading kana has not been added in advance; and
a process of adopting, for a kanji-kana mixed character string to which reading kana has been added in advance, that reading kana as the reading information; comparing, when a plurality of pieces of reading information have been generated for at least two of the kanji-kana mixed character strings, each of the plurality of pieces of reading information generated for one kanji-kana mixed character string with each of the plurality of pieces of reading information generated for the other kanji-kana mixed character string; determining that the kanji-kana mixed character strings being compared match when the comparison results for one or more pieces of reading information are the same; and outputting the determination result.