JP2007004636A

JP2007004636A - Language input system, its processing method, recording medium and program

Info

Publication number: JP2007004636A
Application number: JP2005185767A
Authority: JP
Inventors: Takeshi Fujimura; 武志藤村; Hiroaki Kaneki; 宏明鹿子木; Kotaro Yoshida; 航太郎吉田; Ken Itakura; 謙板倉
Original assignee: Microsoft Corp
Current assignee: Microsoft Corp
Priority date: 2005-06-24
Filing date: 2005-06-24
Publication date: 2007-01-11

Abstract

PROBLEM TO BE SOLVED: To provide a language input system with improved usability of user-related data.

SOLUTION: The language input system comprises a user dictionary file 203 that stores a plurality of data sets (dictionary data 203a, 203b, 203c, 203d, and 203e) by layer, each data set including a plurality of data records, and each data record including a data definition for acquiring a notation corresponding to a reading. In the system, a sentence is analyzed, a data record including a data definition corresponding to the reading of a word contained in the sentence is extracted, and it is determined whether the extracted data record is contained in any of the one or more data sets. When it is determined that the extracted data record is not contained in any of the one or more data sets, a new data set containing the extracted data record is generated, and the generated new data set is stored in an empty layer of the storage means.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a language input system, and to a processing method, recording medium, and program therefor, capable of displaying one or more notations for an input reading in a notation list and of selecting and confirming one of the displayed notations.

Conventionally, there are language input systems that run a Japanese input program, which converts an input reading in a language such as Japanese into a desired Japanese notation (hiragana, katakana, kanji, symbols, and so on). Examples of Japanese input programs include MS-IME (Input Method Editor) (trademark) from Microsoft Corporation, ATOK (trademark) from JustSystems Corporation, and VJE (trademark) from VACS Corporation, among others (see, for example, Patent Document 1).

For example, MS-IME runs on a personal computer (PC) operating under an OS (operating system) environment such as Windows (registered trademark). MS-IME refers to dictionary data that it holds in advance (a system dictionary) or to dictionary data defined by the user (a user dictionary), and acquires from the dictionary one or more notations for the input reading. MS-IME displays the notations on the display screen of the PC as a notation list, and converts the reading into the one notation that the user selects and confirms.

MS-IME is a program for inputting Japanese characters. Using a lexicon database (DB) and a Japanese language model (LM), MS-IME generates appropriate ideographic characters (notations) from phonetic characters (readings). The lexicon is a vocabulary collection, that is, the so-called dictionary of MS-IME. The LM is a grammar DB that defines rules for statistical language processing. MS-IME retrieves one or more notations corresponding to the input reading from the lexicon DB and determines the optimal notation among them by language processing using the LM.

Patent Document 1: Japanese Patent Laid-Open No. 2002-123510 (JP 2002-123510 A)

In conventional language input systems such as those described above, the default DB of a Japanese input program does not cover all Japanese words, because the Japanese vocabulary changes daily, as the abbreviations and new words that keep appearing in everyday use show. A Japanese input program is therefore required to support not only the default DB but also a user-related DB that accommodates the various words arising in everyday use.

The present invention has been made in view of this problem, and its object is to provide a language input system, a processing method therefor, a recording medium, and a program that improve the ease of handling user-related data.

To achieve this object, the language input system of the present invention is a language input system for displaying one or more notations for an input reading in a predetermined language and selecting one notation from among the displayed notations, the system comprising: storage means (108) that stores one or more data sets (203a, 203b, 203c, 203d, 203e) by layer, each data set including one or more data records, and each data record including a data definition for acquiring a notation corresponding to a reading; extraction means (101, S712) that analyzes a sentence and extracts a data record including the data definition corresponding to the reading of a word contained in the sentence; extracted data determination means (101, S714) that determines whether the data record extracted by the extraction means is contained in any of the one or more data sets; and data generation means (101, S716, S718) that, when the extracted data determination means determines that the extracted data record is not contained in any of the one or more data sets, generates a new data set including the extracted data record and stores the generated new data set in an empty layer of the storage means.

To achieve the above object, the processing method of the present invention is a processing method of a language input system for displaying one or more notations for an input reading in a predetermined language and selecting one notation from among the displayed notations. The language input system has storage means that stores one or more data sets by layer, each data set including one or more data records, and each data record including a data definition for acquiring a notation corresponding to a reading. The method comprises: an extraction step in which extraction means analyzes a sentence and extracts a data record including the data definition corresponding to the reading of a word contained in the sentence; an extracted data determination step in which extracted data determination means determines whether the data record extracted in the extraction step is contained in any of the one or more data sets; and a data generation step in which, when it is determined in the extracted data determination step that the extracted data record is not contained in any of the one or more data sets, data generation means generates a new data set including the extracted data record and stores the generated new data set in an empty layer of the storage means.
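As a non-limiting illustration, the determination and data generation steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names DataRecord, DataSet, LayeredStore, and register_unknown_records are hypothetical and do not appear in the embodiment, the number of layers is arbitrary, and the sentence analysis of the extraction step is not modeled (the extracted records are simply passed in).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataRecord:
    reading: str   # phonetic reading, e.g. "はなし"
    notation: str  # written form obtained for that reading, e.g. "花氏"


@dataclass
class DataSet:
    name: str
    records: set = field(default_factory=set)


class LayeredStore:
    """Hypothetical storage means: each layer holds one data set or None (empty)."""

    def __init__(self, num_layers=5):  # layer count is an assumption
        self.layers = [None] * num_layers

    def contains(self, record):
        # Determination: is the record in any of the stored data sets?
        return any(ds is not None and record in ds.records for ds in self.layers)

    def store_in_empty_layer(self, data_set):
        # Storage: place the data set in the first layer that is empty.
        for index, ds in enumerate(self.layers):
            if ds is None:
                self.layers[index] = data_set
                return index
        raise RuntimeError("no empty layer available")


def register_unknown_records(store, extracted, name):
    """Build a new data set from the records found in no existing data set."""
    unknown = {record for record in extracted if not store.contains(record)}
    if not unknown:
        return None  # every extracted record is already known
    return store.store_in_empty_layer(DataSet(name, unknown))
```

Returning the layer index is a design choice of this sketch only; it makes it easy for a caller to empty the same layer again later.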

The reference numerals and the like given in parentheses indicate the elements in the drawings of the embodiments that correspond to the constituent elements of the claims. However, the constituent elements recited in the claims are not limited to the elements of the embodiments indicated in parentheses.

With the above configuration, the language input system can process a variety of user-oriented vocabulary information by using a simply designed layered structure. The language input system can also cover not only the well-known user situations handled by the default DB but also new categories of vocabulary handled by a user-related DB that accommodates the various words arising in everyday use.

According to the present invention, the ease of handling user-related data in a language input system is improved.

Embodiments to which the present invention can be applied are described in detail below with reference to the drawings. In the drawings, parts having the same function are given the same reference numerals, and duplicate descriptions are omitted.

(Device configuration)
FIG. 1 is a system block diagram of a PC on which a Japanese input program is installed, for realizing the Japanese input system of the present embodiment, which converts Japanese readings into desired Japanese notations. The present embodiment describes a Japanese input system that uses a general-purpose PC running Windows (registered trademark) as its OS.

In the PC 100 of FIG. 1, reference numeral 101 denotes a CPU (central processing unit), which executes programs loaded into the RAM (random access memory) of a system memory 102. The system memory 102 comprises the RAM, which holds various data such as input data required by the programs executed by the CPU 101 and the execution results of those programs, and a ROM (read only memory), which stores a BIOS (Basic Input/Output System) and the like in advance. The RAM of the system memory 102 also temporarily stores data to be displayed on a display 107 and data input from a keyboard 106, a modem 103, a microphone 117, and the like.

Reference numeral 103 denotes a modem, which may be internal or external, is connected to a system bus 116 via a serial port interface, and establishes communication over a wide area network 120 such as the Internet. Reference numeral 104 denotes a CD (compact disc)-ROM drive, which reads data from a loaded CD-ROM 105. In the present embodiment, the program and data read from the CD-ROM 105, on which the Japanese input program and related data (such as system dictionary files) are recorded, have been installed on a hard disk storage device (HD) 108 described later.

Reference numeral 106 denotes a keyboard, with which characters are input by pressing the corresponding keys. Reference numeral 117 denotes a microphone, which inputs the user's voice to the PC 100. Reference numeral 107 denotes a display, which visibly displays characters input from the keyboard 106 and the like as well as the computation results of the CPU 101. A pointing device 115 moves a pointer (cursor graphic) displayed on the screen of the display 107 and can also designate the pointer position to confirm it; the present embodiment uses a mouse. The user moves the pointer by moving the mouse 115 itself and confirms the pointer position with the left and right click buttons. Reference numeral 116 denotes a system bus that connects the above elements of the PC 100.

The HD 108 stores the programs and data described below for retention. Reference numeral 109 denotes an OS for controlling the PC 100 and peripheral devices; the present embodiment uses Windows (registered trademark), as mentioned above. Reference numeral 110 denotes various application programs other than the Japanese input program 114 of the present embodiment, for example a browser for viewing the web pages of web content, Explorer (trademark), word processing software, database software, spreadsheet software, and speech recognition software.

Reference numeral 111 denotes various program modules other than application programs, for example device drivers and DLL (dynamic-link library) files that provide various APIs (application programming interfaces). Program data 112 is the various data used or generated by the OS and other programs. The program data 112 includes the system dictionary files and user dictionary files associated with the Japanese input program of the present embodiment, and the GUI (graphical user interface) data that the Japanese input program needs to display the notation list. A system dictionary file, supplied by the vendor, contains system dictionary data, and a user dictionary file contains user dictionary data defined by the user.

Reference numeral 114 denotes the Japanese input program, which is the program represented by the flowcharts of FIGS. 7 to 9 that define the Japanese input program processing performed by the CPU 101. Details of this program are given in the description of operation below.

Reference numeral 120 denotes a wide area network, which the present embodiment assumes to be the Internet. A WWW (World Wide Web) server 121 is connected to the wide area network 120. The web site of the vendor of the Japanese input program 114, which uses the system dictionary files of the present embodiment, is published on the WWW server 121. To obtain a new system dictionary file, the PC 100 can access the web site operated by the vendor of the Japanese input program 114 via the network 120 and download the file from there.

FIG. 2 is a system configuration diagram outlining the operation of the Japanese input program 114. The Japanese input program 114 has a dictionary search module 114a and an LM module 114b.

Basically, the dictionary search module 114a retrieves one or more notations corresponding to an input reading 201 from a system dictionary file 202 in the vendor-supplied default lexicon DB and from a user dictionary file 203 in a user lexicon DB. The dictionary search module 114a also manages the data in the user dictionary file 203, including updating and deleting it. The LM module 114b determines a notation output 205, the single optimal notation, from the one or more notations retrieved by the dictionary search module 114a, by language processing that uses a learning result file 204 in the LM.
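As a non-limiting illustration, the retrieval performed by the dictionary search module can be sketched as follows. This is a minimal sketch under stated assumptions: the function lookup_notations and the plain-dictionary representation of the dictionary files are hypothetical, and placing "花氏" in a user layer is an assumption made only for the example.

```python
def lookup_notations(reading, system_dict, user_layers):
    """Collect notations for a reading from the system dictionary and
    from every non-empty layer of the user dictionary, without duplicates."""
    found = list(system_dict.get(reading, []))
    for layer in user_layers:
        if layer is None:          # empty layer: nothing to search
            continue
        for notation in layer.get(reading, []):
            if notation not in found:
                found.append(notation)
    return found


# Hypothetical contents; only the notations themselves come from the description.
system_dict = {"はなし": ["話", "話し", "放し", "噺", "離し"]}
user_layers = [{"はなし": ["花氏"]}, None, {}]

candidates = lookup_notations("はなし", system_dict, user_layers)
```

The choice of the single optimal notation by the LM module is deliberately left out of this sketch.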

The system dictionary file 202 contains system dictionary data 202a. The user dictionary file 203 contains proper noun dictionary data 203a, e-mail (electronic mail) response dictionary data 203b, abbreviation dictionary data 203c, document folder dictionary data 203d, new word dictionary data 203e, and the like. The learning result file 204 contains learning result data 204a.

In particular, the present embodiment is characterized by the way the dictionary search module 114a manages each dictionary data in the user dictionary file 203. This management method and the data shown in FIG. 2 are described in detail below.

The learning result data 204a records the learning results of the kana-kanji conversion processing performed when the LM module determines the notation output 205. Various methods exist for recording learning results in the LM, such as the word bigram method; the present embodiment applies a learning routine that uses a well-known flag count.

(Description of operation)
The operation of the Japanese input program 114 of the Japanese input system of the present embodiment, which converts Japanese readings into desired Japanese notations in the system configuration described above, is described below with reference to FIGS. 3 to 6.

The Japanese input system of the present embodiment processes a variety of user-oriented vocabulary information by using a simply designed layered structure. Its feature is that it thereby covers not only the well-known user situations handled by the default lexicon DB but also new categories of vocabulary handled by the user lexicon DB, which accommodates the various words arising in everyday use.

Furthermore, the Japanese input system supports a user lexicon DB capable of managing dictionaries of various categories, and the dictionary data of these categories use a single format. The user-oriented vocabulary in each dictionary has its own characteristic existence period. For example, new words are coined by the public or in the market, but from time to time such a new word falls out of use and is treated as an obsolete word. In contrast, proper nouns such as personal names are used by the user permanently. A feature of the Japanese input system is that, by using this layered data management, it can easily handle the differences in existence period among the user's dictionary data.

FIG. 3 shows an example in which, during editing of an e-mail response document in a mailer, for example, the Japanese input program 114 refers to the dictionary files described above and converts the input reading "はなし" (hanashi) into an appropriate notation acquired from the dictionaries.

The Japanese input program 114 refers to each dictionary file and acquires one or more notations for the reading "はなし" input at the position indicated by reference numeral 301 on the display screen. The Japanese input program 114 displays these notations as a notation list 302 on the display screen of the display 107 of the PC 100. The fraction "6/6" indicated by reference numeral 303 in the notation list 302 shows that the notation "花氏", on which the focus 304 (highlighting for emphasis) is positioned, is the sixth of six notations in total.

The Japanese input program 114 moves the focus when the user presses the ↓ (down) key or a similar key on the keyboard 106 of the PC, and closes the notation list 302 when the user presses the confirm key (for example, the Enter key). The Japanese input program 114 then converts the input reading "はなし" at the position of reference numeral 301 into the notation "花氏", which the user selected by moving the focus and then confirmed.
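As a non-limiting illustration, the focus movement and the position indicator such as "6/6" can be sketched as follows. This is a minimal sketch under stated assumptions: the functions move_focus_down and position_label are hypothetical, stopping at the last entry (rather than wrapping around) is an assumption, and the order of the first five notations in the list is illustrative only.

```python
def move_focus_down(index, count):
    """Return the focus index after one press of the down key.
    Assumption: the focus stops at the last entry instead of wrapping."""
    return min(index + 1, count - 1)


def position_label(index, count):
    """Format the position indicator: 1-based position over the total count."""
    return f"{index + 1}/{count}"


# Illustrative notation list for the reading "はなし".
notations = ["話", "話し", "放し", "噺", "離し", "花氏"]

focus = 0
for _ in range(5):                      # five presses of the down key
    focus = move_focus_down(focus, len(notations))

print(notations[focus], position_label(focus, len(notations)))
```

With this list, five presses of the down key bring the focus to "花氏", and the indicator reads "6/6".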

FIG. 4 shows how the dictionary search module 114a manages each dictionary data in the user dictionary file 203. The user dictionary file 203 stores its data in a layered structure. The user dictionary file 203 can store the proper noun dictionary data 203a, the e-mail response dictionary data 203b, the abbreviation dictionary data 203c, the document folder dictionary data 203d, the new word dictionary data 203e, and the like, each as the data of an independent layer. FIG. 4 shows the case in which the user dictionary file 203 already stores the proper noun dictionary data 203a, the abbreviation dictionary data 203c, and the new word dictionary data 203e in their respective layers.

When editing of an e-mail response document 401 is started in the mailer, the dictionary search module 114a analyzes the original message 401a from the mail sender contained in the e-mail response document 401. The dictionary search module 114a extracts the words that appear in the original message 401a but are absent from every dictionary data already in the system, creates e-mail response dictionary data 203b containing the extracted words, and stores it in an empty layer of the user dictionary file 203. At this time, the existence period "until mail transmission" is assigned to the e-mail response dictionary data 203b.

Accordingly, upon detecting that the e-mail response document 401 has been sent, the dictionary search module 114a deletes the e-mail response dictionary data 203b from the user dictionary file 203. The layer in which the e-mail response dictionary data 203b was stored then becomes an empty layer. Thanks to the existence period "until mail transmission", the e-mail response dictionary data 203b can be judged unnecessary and discarded automatically at the time when its words fall out of use and are treated as obsolete (when the e-mail response document 401 is sent).
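As a non-limiting illustration, the life cycle of the e-mail response dictionary data can be sketched as an event-driven routine. This is a minimal sketch under stated assumptions: the functions attach_reply_dictionary and on_mail_sent, the dictionary-based layer representation, and the sample word are hypothetical, and how the mailer reports the start of editing or the transmission is not modeled.

```python
MAIL_PERIOD = "until mail transmission"


def attach_reply_dictionary(layers, words):
    """Store the e-mail response dictionary data in the first empty layer."""
    index = layers.index(None)  # raises ValueError if no layer is empty
    layers[index] = {
        "name": "E-mail response dictionary data",
        "period": MAIL_PERIOD,
        "words": words,
    }
    return index


def on_mail_sent(layers):
    """Discard every data set whose existence period ends at mail transmission."""
    for index, layer in enumerate(layers):
        if layer is not None and layer["period"] == MAIL_PERIOD:
            layers[index] = None  # the layer becomes an empty layer again


# Layers as in FIG. 4: proper nouns, abbreviations, and new words already stored.
layers = [
    {"name": "Proper noun dictionary data", "period": "permanent", "words": {}},
    None,
    {"name": "Abbreviation dictionary data", "period": "permanent", "words": {}},
    None,
    {"name": "New word dictionary data", "period": "until YYMMDD", "words": {}},
]

used = attach_reply_dictionary(layers, {"はなし": "花氏"})  # editing starts
on_mail_sent(layers)                                        # reply is sent
```

The existence period assigned to the abbreviation dictionary data above is an assumption; the description does not state it.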

Meanwhile, when editing of a document file 402b stored in a folder 402 on the HD 108 is started in word processing software or the like, the dictionary search module 114a analyzes the text of one or more other document files 402a in the folder 402. The dictionary search module 114a extracts the words that appear in the text of the document files 402a but are absent from every dictionary data, creates document folder dictionary data 203d containing the extracted words, and stores it in an empty layer of the user dictionary file 203. At this time, a predetermined existence period "until YYMMDD" is assigned to the document folder dictionary data 203d. In the format YYMMDD, YY represents the last two digits of the year in the Western calendar (00 to 99), MM the month (01 to 12), and DD the day (01 to 31).

Accordingly, upon detecting that the date YYMMDD has passed, the dictionary search module 114a deletes the document folder dictionary data 203d from the user dictionary file 203. The layer in which the document folder dictionary data 203d was stored then becomes an empty layer. The predetermined existence period "until YYMMDD" may be set to an appropriate date defined in advance by the system, chosen as the future time at which the words in the document folder dictionary data 203d are expected to fall out of use and be treated as obsolete. The document folder dictionary data 203d can thus be discarded automatically once it is no longer needed.
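As a non-limiting illustration, the date-based expiry can be sketched as follows. This is a minimal sketch under stated assumptions: the functions parse_yymmdd and purge_expired and the "until" key are hypothetical, the two-digit year is expanded by Python's %y rule (00 to 68 become 2000 to 2068, 69 to 99 become 1969 to 1999), and "passed" is taken to mean that today is strictly later than the stored date.

```python
from datetime import date, datetime


def parse_yymmdd(text):
    """Convert a YYMMDD string into a date (century chosen by Python's %y rule)."""
    return datetime.strptime(text, "%y%m%d").date()


def purge_expired(layers, today):
    """Empty every layer whose 'until YYMMDD' date has passed."""
    for index, layer in enumerate(layers):
        if layer is None:
            continue
        until = layer.get("until")  # None means no date-based expiry
        if until is not None and today > parse_yymmdd(until):
            layers[index] = None
    return layers


# Hypothetical layers; the expiry date is an arbitrary example value.
layers = [
    {"name": "Proper noun dictionary data", "until": None},
    {"name": "Document folder dictionary data", "until": "070111"},
]

purge_expired(layers, date(2007, 1, 11))  # on the expiry date: still kept
kept_on_expiry_date = layers[1] is not None
purge_expired(layers, date(2007, 1, 12))  # one day later: discarded
```

A real implementation would have to decide when to run such a check (for example at start-up or on each lookup); the description leaves this open.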

Furthermore, the dictionary data of the various categories in the user dictionary file 203 use a single format across all dictionaries. By using this layered data management together with the single format, the dictionary search module 114a can easily handle the differences in existence period among the dictionary data in the user dictionary file 203.

FIG. 5 illustrates the differences in existence period among the dictionary data in the user dictionary file 203. In FIG. 5, time advances toward the future from left to right. For example, new words are coined by the public or in the market, but from time to time such a new word falls out of use and is treated as an obsolete word. Therefore, the existence period 501c of the new word dictionary data 203e is fixed by assigning the predetermined existence period "until YYMMDD" to the new word dictionary data 203e. Likewise, once the e-mail response document 401 has been sent, the words in the e-mail response dictionary data 203b fall out of use and are treated as obsolete. Therefore, the existence period 501b of the e-mail response dictionary data 203b is fixed by assigning the existence period "until mail transmission" to the e-mail response dictionary data 203b.

In contrast, proper nouns such as personal names are used by the user permanently. Therefore, the permanent existence period 501a of the proper noun dictionary data 203a is fixed by assigning the existence period "permanent" to the proper noun dictionary data 203a. Because the system dictionary data 202a is contained in the system dictionary file 202 in the vendor-supplied default lexicon DB, the existence period 501x of the system dictionary data 202a is also permanent.

FIG. 6 shows the structure of the system dictionary file 202, the user dictionary file 203, and the learning result file 204.

FIG. 6 shows a notation list table 601, holding one or more notations for the reading "はなし", which the Japanese input program 114 obtains by referring to the learning result file 204 when displaying the notation list 302 described above. The learning result file 204 is a file that stores the learning result data 204a, which consists of one or more such notation list tables, one for each reading.

In FIG. 6, the notation list table 601 contains a plurality of records, each having a reading field 602 (uniformly "はなし"), a notation field 603, and a flag count field 604. For example, record 605 has the notation "話し" for the reading "はなし" and a flag count of "4". The flag count indicates the priority with which the notation is displayed toward the top of the notation list 302: the notations of the records in the notation list table 601 are displayed in the notation list 302 in descending order of flag count.

Suppose, for example, that at the position of reference numeral 301 in FIG. 3 the input reading "はなし" is converted into the notation "話し", which the user selected by moving the focus and then confirmed. In this case, the Japanese input program 114 uses the LM module to increment (++) the flag count of record 605 in FIG. 6 to "5", thereby recording the learning result of the kana-kanji conversion processing at the time of confirmation.
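As a non-limiting illustration, the flag count learning and the ordering of the notation list can be sketched as follows. This is a minimal sketch under stated assumptions: the functions ordered_notations and confirm are hypothetical, and the flag counts other than the value 4 for "話し" (record 605) are illustrative values that do not come from the description.

```python
def ordered_notations(table):
    """Return the notations in descending order of flag count."""
    rows = sorted(table, key=lambda row: row["flag_count"], reverse=True)
    return [row["notation"] for row in rows]


def confirm(table, notation):
    """Record a confirmed conversion by incrementing the record's flag count."""
    for row in table:
        if row["notation"] == notation:
            row["flag_count"] += 1
            return row["flag_count"]
    raise KeyError(notation)


# Notation list table for the reading "はなし"; only the count 4 is from FIG. 6.
table = [
    {"reading": "はなし", "notation": "話", "flag_count": 7},
    {"reading": "はなし", "notation": "話し", "flag_count": 4},
    {"reading": "はなし", "notation": "放し", "flag_count": 2},
]

new_count = confirm(table, "話し")  # the user confirms "話し": 4 becomes 5
order = ordered_notations(table)
```

Python's sorted is stable, so records with equal flag counts keep their table order; how the embodiment breaks such ties is not stated.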

FIG. 6 also shows a system dictionary table 606 that holds one or more notations for each reading. The system dictionary file 202 stores system dictionary data 202a, which consists of the system dictionary table 606, a layer ID (blank), an existence period (permanent), and a data name (system dictionary data).

In FIG. 6, the system dictionary table 606 contains a plurality of records, each having a reading field and a notation field. For example, the record group 607 gives, from the top, the notations "話", "話し", "放し", "噺", and "離し", each for the reading "はなし". A record group obtained by adding flag counts to the record group 607 already exists, as a previous learning result, in the notation list table 601 for the reading "はなし" in the learning result data 204a.

FIG. 6 also shows a proper noun dictionary table 608 that holds one or more notations for each reading. The user dictionary file 203 stores proper noun dictionary data 203a, which consists of the proper noun dictionary table 608, a layer ID (1), an existence period (permanent), and a data name (proper noun dictionary data). FIG. 6 further shows a new word dictionary table 609 that holds one or more notations for each reading. The user dictionary file 203 likewise stores new word dictionary data 203e, which consists of the new word dictionary table 609, a layer ID (5), an existence period (until YYMMDD), and a data name (new word dictionary data).
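The four components that each dictionary data is said to carry (table, layer ID, existence period, data name) could be modeled roughly as below. This is a sketch under assumptions: the `DictionaryData` class and `lookup` function are hypothetical names, the existence period is kept as a plain string, and `None` stands in for the blank layer ID of the system dictionary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DictionaryData:
    layer_id: Optional[int]   # None models the blank layer ID of the system dictionary
    existence_period: str     # e.g. "permanent" or "until YYMMDD"
    data_name: str
    # Dictionary table: (reading, notation) records.
    table: List[Tuple[str, str]] = field(default_factory=list)


def lookup(data, reading):
    """Return every notation the dictionary table holds for a reading."""
    return [notation for r, notation in data.table if r == reading]


system_dictionary = DictionaryData(
    None, "permanent", "system dictionary data",
    [("はなし", "話"), ("はなし", "話し"), ("はなし", "放し"),
     ("はなし", "噺"), ("はなし", "離し")],  # record group 607
)
proper_nouns = DictionaryData(1, "permanent", "proper noun dictionary data")
new_words = DictionaryData(5, "until YYMMDD", "new word dictionary data")
```

Because every dictionary data shares the same shape, the same `lookup` works on the system dictionary and on any user dictionary layer.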

Next, the case described with reference to FIG. 4, in which the dictionary search module 114a creates the e-mail response dictionary data 203b and stores it in an empty layer of the user dictionary file 203, is described with reference to FIG. 6. If the layer whose layer ID is 2 (hereinafter, layer 2) is empty, the e-mail response dictionary data 203b is inserted into and stored in layer 2.

FIG. 6 shows an e-mail response dictionary table 610, which holds one or more notations for each reading, as it appears once the e-mail response dictionary data 203b has been stored. The user dictionary file 203 then stores e-mail response dictionary data 203b consisting of the e-mail response dictionary table 610, a layer ID (2), an existence period (until the mail is sent), and a data name (e-mail response dictionary data).

In FIG. 6, the e-mail response dictionary table 610 contains a plurality of records, each having a reading field and a notation field. For example, record 611 has the notation "花氏" for the reading "はなし". In synchronization with the insertion of the e-mail response dictionary data 203b into layer 2, a record formed by adding a flag count of 0 to record 611 (reading "はなし", notation "花氏") is added to the notation list table 601 for the reading "はなし" in the learning result data 204a.

Next, the case described with reference to FIG. 4, in which the dictionary search module 114a deletes the e-mail response dictionary data 203b from the user dictionary file 203, is described with reference to FIG. 6. When the e-mail response dictionary data 203b is deleted from layer 2, everything in layer 2 other than its layer ID becomes blank, and layer 2 becomes an empty layer.

In synchronization with the deletion of the e-mail response dictionary data 203b from layer 2, the above-mentioned record formed by adding a flag count to record 611 (reading "はなし", notation "花氏") is deleted from the notation list table 601 for the reading "はなし" in the learning result data 204a.

Similarly, in synchronization with the insertion of the e-mail response dictionary data 203b into layer 2, each record other than record 611 (for example, reading "ときお", notation "時男") is also given a flag count of 0 and added to the notation list table for its reading ("ときお") in the learning result data 204a. Likewise, in synchronization with the deletion of the e-mail response dictionary data 203b from layer 2, each such record with its flag count is deleted from the notation list table for its reading ("ときお") in the learning result data 204a.
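The synchronized insertion and deletion described above can be sketched as follows. The sketch assumes a simplified representation that is not taken from the specification: layers are a dictionary from layer ID to a data set (or `None` for an empty layer), and the learning result data is a nested dictionary from reading to notation to flag count. The class name `UserDictionaryFile` and its methods are hypothetical.

```python
class UserDictionaryFile:
    """Layered store; a layer holding None is an empty layer (assumed model)."""

    def __init__(self, layer_ids):
        self.layers = {layer_id: None for layer_id in layer_ids}

    def insert(self, data, learning):
        """Store data in the lowest-numbered empty layer and sync learning data."""
        for layer_id in sorted(self.layers):
            if self.layers[layer_id] is None:
                self.layers[layer_id] = data
                # Each new record enters its notation list table with flag count 0.
                for reading, notation in data["records"]:
                    learning.setdefault(reading, {}).setdefault(notation, 0)
                return layer_id
        raise RuntimeError("no empty layer available")

    def delete(self, layer_id, learning):
        """Empty the layer (its ID remains) and remove the matching learning records."""
        data = self.layers[layer_id]
        if data is None:
            return
        for reading, notation in data["records"]:
            learning.get(reading, {}).pop(notation, None)
        self.layers[layer_id] = None
```

Choosing the lowest-numbered empty layer is an assumption; the text only requires that some empty layer be used. Keeping the layer ID as the dictionary key while setting the value to `None` mirrors the statement that everything except the layer ID becomes blank.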

(Description of program processing)
The processing related to Japanese input in this embodiment follows the procedures shown in the flowcharts of FIGS. 7 to 9. These procedures show the Japanese input program processing executed by the CPU 101, which loads the Japanese input program 114 stored in the HD 108 into the RAM of the system memory 102 and executes it. The processing procedure of the Japanese input program 114 performed by the CPU 101 is described below, with reference also to FIGS. 1 to 6.

It is assumed here that, after the OS 109 boots, the OS 109 automatically makes the Japanese input program 114 resident in the system memory 102 and starts it. It is also assumed that an application such as a mailer or word processor has already been started.

In FIG. 7, the CPU 101 determines whether there is an event from the OS (step S700). On detecting an event from the OS, the CPU 101 determines whether the event is a document editing start event issued by an application (step S702). If it is, the CPU 101 queries the OS through a facility such as an API to find out which application is editing the document (steps S704 → S706).

If the document is being edited in the mailer, the CPU 101 determines whether editing of an e-mail response document 401 such as that shown in FIG. 4 is starting (steps S708 → S710). If it is not, the CPU 101 performs the mailer's existing processing and returns to waiting for an event from the OS (steps S760 → S700).

If editing of the e-mail response document 401 is starting, the CPU 101 analyzes the original message 401a from the mail sender contained in the e-mail response document 401 and extracts records holding the reading and notation of the words in the original message 401a (hereinafter, word records) (step S712). A well-known technique such as the n-gram method or morphological analysis may be used to extract the word records. The CPU 101 then identifies, among the extracted word records, the new word records that appear in the original message 401a but are absent from every dictionary data already in the system (system dictionary data 202a, proper noun dictionary data 203a, abbreviation dictionary data 203c, new word dictionary data 203e, and so on), and extracts those new word records (step S714).
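The filtering in step S714 can be illustrated with a short Python sketch. It assumes that the analysis of step S712 has already produced word records as (reading, notation) pairs; the analysis itself (n-gram or morphological) is not shown, and the function name `new_word_records` is hypothetical.

```python
def new_word_records(extracted, dictionary_tables):
    """Return extracted word records found in none of the existing dictionaries.

    extracted: list of (reading, notation) pairs, assumed to come from step S712.
    dictionary_tables: iterable of tables, each a list of (reading, notation) pairs.
    """
    known = set()
    for table in dictionary_tables:
        known.update(table)

    result = []
    seen = set()
    for record in extracted:
        # Keep a record only once, and only if no existing dictionary holds it.
        if record not in known and record not in seen:
            seen.add(record)
            result.append(record)
    return result
```

For example, if the system dictionary already holds ("はなし", "話し") and the message yields both that record and ("はなし", "花氏"), only the latter is returned as a new word record.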

Next, the CPU 101 creates, in the RAM of the system memory 102, e-mail response dictionary data 203b containing the new word records extracted in step S714 (step S716). As shown in FIG. 6, the e-mail response dictionary data 203b consists of the layer ID of an empty layer (for example, 2), an existence period (until the mail is sent), a data name (e-mail response dictionary data), and the e-mail response dictionary table 610 holding the extracted new word records.

The CPU 101 then inserts the e-mail response dictionary data 203b into an empty layer (for example, layer 2) of the user dictionary file 203 and stores it there (step S718). In synchronization with this insertion, the CPU 101 adds records related to the e-mail response dictionary data 203b to the learning result data 204a (step S720). That is, as shown in FIG. 6, record 611 (reading "はなし", notation "花氏") and the other records (for example, reading "ときお", notation "時男") are each given a flag count of 0 and added to the corresponding notation list tables for the readings "はなし", "ときお", and so on in the learning result data 204a. The CPU 101 then returns to waiting for an event from the OS (steps S720 → S700).

On the other hand, if the document is being edited in an application other than the mailer, the CPU 101 performs processing for that application that is analogous to steps S710 to S720 (steps S708 → S750). For example, when a word processor or the like starts editing the document file 402b stored in the folder 402 on the HD 108, as shown in FIG. 4, the CPU 101 analyzes the text of one or more other document files 402a in the folder 402. The CPU 101 extracts the words that appear in the text of the document files 402a but not in any dictionary data, creates document folder dictionary data 203d containing the extracted words, and stores it in an empty layer of the user dictionary file 203. At this time, a predetermined existence period, "until YYMMDD", is assigned to the document folder dictionary data 203d. The CPU 101 then returns to waiting for an event from the OS (steps S750 → S700).

If the event from the OS is not a document editing start, the CPU 101 determines whether the event is the input of a reading from the keyboard 106 (steps S702 → S790). If it is, the CPU 101 proceeds to the processing shown in FIG. 8 (step S790 → FIG. 8). If it is not, the CPU 101 determines whether the event is a document editing end event issued by an application (steps S790 → S795). If the event is a document editing end, the CPU 101 proceeds to the processing shown in FIG. 9 (step S795 → FIG. 9). Otherwise, the CPU 101 performs the processing appropriate to the event and returns to waiting for an event from the OS (steps S795 → S700).
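The branching of FIG. 7 described above amounts to an event dispatcher. A minimal Python sketch of that branching follows; the `Event` enumeration, the `dispatch` function, and the returned labels are all hypothetical, and the function only names the branch taken rather than performing the processing.

```python
from enum import Enum, auto


class Event(Enum):
    EDIT_START = auto()      # document editing start
    READING_INPUT = auto()   # reading typed on the keyboard
    EDIT_END = auto()        # document editing end
    OTHER = auto()


def dispatch(event, application=None, is_reply=False):
    """Return a label for the branch that FIG. 7 would take for this event."""
    if event is Event.EDIT_START:                      # S702
        if application == "mailer":                    # S708
            if is_reply:                               # S710
                return "build e-mail response dictionary (S712-S720)"
            return "mailer's existing processing (S760)"
        return "application-specific dictionary processing (S750)"
    if event is Event.READING_INPUT:                   # S790
        return "conversion processing (FIG. 8)"
    if event is Event.EDIT_END:                        # S795
        return "existence-period check (FIG. 9)"
    return "other event processing"
```

In every branch the flow then returns to waiting for the next OS event (step S700), which a caller would model as a loop around `dispatch`.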

In FIG. 8, when the event from the OS is the input of a reading, the CPU 101 stores the reading entered from the keyboard 106, for example "はなし" as described above with reference to FIG. 3, in a buffer in the RAM of the system memory 102 (step S800). Next, the CPU 101 locates, in the learning result data 204a, the notation list table 601 corresponding to the input reading "はなし" (step S802). The CPU 101 then builds in the buffer the data of the notation list 302, in which the notations of the records in the notation list table 601 are arranged in descending order of flag count (step S804).

When the CPU 101 subsequently detects an input event of the user's conversion key, it takes the first notation ("話し") from the notation list 302 data built in the buffer in step S804 and displays it, as the first converted character string, at the position of the input reading "はなし" indicated by reference numeral 301 (step S806).

If, after displaying the first converted character string, the CPU 101 detects an input event of the confirmation key, it passes the first converted character string as the confirmed character string to the application editing the document (for example, the mailer) and performs the post-processing, in the sequence of steps S808 → S816 → S818. The post-processing is described later.

If, after displaying the first converted character string, the CPU 101 detects another input event of the conversion key, it displays the notation list 302 data built in the buffer as the notation list 302 on the screen of the display 107, as shown in FIG. 3 (steps S808 → S810).

As shown in FIG. 3, the CPU 101 moves the focus 304, which indicates a notation in the displayed notation list 302, downward in response to an input event of the conversion key or the down (↓) key and upward in response to an input event of the up (↑) key. When the focus 304 would move past the upper or lower end of the notation list 302, the CPU 101 scrolls and updates the notation list 302. This processing is performed as a loop between step S812 and step S814. The CPU 101 always displays the notation indicated by the focus 304 as the converted character string at the position of the input reading "はなし" indicated by reference numeral 301.
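The focus movement and scrolling described above might look like the following sketch. It rests on assumptions the text does not state: the displayed list is modeled as a fixed-height window over the full candidate list, the focus stops at the first and last candidates rather than wrapping around, and the class name `CandidateList` is hypothetical.

```python
class CandidateList:
    """Scrollable window over the notation list with a movable focus."""

    def __init__(self, notations, visible_rows):
        self.notations = notations        # assumed non-empty, already ordered
        self.visible_rows = visible_rows
        self.focus = 0                    # index of the focused notation
        self.top = 0                      # index of the first visible row

    def move(self, delta):
        """Move the focus by delta (+1 down, -1 up) and return the focused notation."""
        last = len(self.notations) - 1
        self.focus = max(0, min(last, self.focus + delta))  # clamp (assumption)
        # Scroll the window when the focus leaves the visible rows.
        if self.focus < self.top:
            self.top = self.focus
        elif self.focus >= self.top + self.visible_rows:
            self.top = self.focus - self.visible_rows + 1
        return self.notations[self.focus]

    def visible(self):
        """Return the notations currently shown in the window."""
        return self.notations[self.top:self.top + self.visible_rows]
```

The value returned by `move` corresponds to the converted character string shown at the input position, which always tracks the focused notation.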

On detecting an input event of the confirmation key in step S814, the CPU 101 performs the post-processing (steps S816 → S818). In the sequence of steps S816 → S818, the CPU 101 first closes the notation list 302 and then passes the converted character string (the notation that had the focus) as the confirmed character string to the application editing the document (for example, the mailer).

Suppose, for example, that at the position of reference numeral 301 in FIG. 3 the input reading "はなし" is converted into the notation "話し", which the user selected by moving the focus and then confirmed. In this case, the CPU 101 increments the flag count of record 605 in the notation list table 601 of the learning result data 204a in FIG. 6 to 5, thereby recording the learning result of the kana-kanji conversion at the time of confirmation (step S820). The CPU 101 then returns to waiting for an event from the OS (step S700 in FIG. 7).

In FIG. 9, when the event from the OS is a document editing end, the CPU 101 obtains the current time event from the OS and stores that information in a buffer in the RAM of the system memory 102 (step S900). Information such as "mail transmission complete" or "YYMMDD" can be obtained as the current time event, as appropriate.

Next, the CPU 101 compares the existence period of each dictionary data already in the user dictionary file 203 (proper noun dictionary data 203a, e-mail response dictionary data 203b, abbreviation dictionary data 203c, document folder dictionary data 203d, new word dictionary data 203e, and so on) with the obtained current time event (step S902). For example, if the current time event is "mail transmission complete", the CPU 101 determines that the e-mail response dictionary data 203b, whose existence period is "until the mail is sent" as shown in FIG. 6, has exceeded its existence period, and advances to step S904.

In step S904, the CPU 101 deletes, for example, the e-mail response dictionary data 203b of FIG. 6, whose existence period has expired, from layer 2, blanks everything in layer 2 other than its layer ID, and makes layer 2 an empty layer. In synchronization with this deletion, the CPU 101 deletes the records related to the e-mail response dictionary data 203b from the learning result data 204a (step S906). That is, as shown in FIG. 6, the records formed by adding flag counts to record 611 (reading "はなし", notation "花氏") and to the other records (for example, reading "ときお", notation "時男") are deleted from the corresponding notation list tables for the readings "はなし", "ときお", and so on in the learning result data 204a. The CPU 101 then returns to waiting for an event from the OS (step S700 in FIG. 7).

On the other hand, if it is determined in step S902 that every dictionary data already in the user dictionary file 203 is within its existence period, the CPU 101 returns to waiting for an event from the OS (step S902 → step S700 in FIG. 7).
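The existence-period check of FIG. 9 can be sketched in Python as below. The representation is an assumption made for illustration: an existence period is either the constant `PERMANENT`, the constant `UNTIL_MAIL_SENT`, or a `datetime.date` standing for "until YYMMDD", and a date-based period is treated as expired only once the current date is later than it. All names are hypothetical.

```python
from datetime import date

PERMANENT = "permanent"
UNTIL_MAIL_SENT = "until the mail is sent"
MAIL_SENT = "mail transmission complete"   # a current time event


def is_expired(period, time_event):
    """Compare one existence period with the current time event (S902)."""
    if period == PERMANENT:
        return False
    if period == UNTIL_MAIL_SENT:
        return time_event == MAIL_SENT
    if isinstance(period, date) and isinstance(time_event, date):
        return time_event > period   # the limit date itself still counts (assumption)
    return False


def purge(layers, learning, time_event):
    """Empty every expired layer (S904) and drop its learning records (S906).

    layers: dict of layer ID -> None, or {"period": ..., "records": [(reading, notation)]}
    learning: dict of reading -> {notation: flag_count}
    """
    freed = []
    for layer_id, data in layers.items():
        if data is not None and is_expired(data["period"], time_event):
            for reading, notation in data["records"]:
                learning.get(reading, {}).pop(notation, None)
            layers[layer_id] = None   # the layer ID remains; the layer is now empty
            freed.append(layer_id)
    return freed
```

When no layer has expired, `purge` returns an empty list and changes nothing, which corresponds to returning directly to the event wait.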

(Effects of the embodiment)
As described above, according to a first aspect of this embodiment, a language input system for displaying one or more notations for an input reading in a predetermined language and selecting one notation from among the displayed notations comprises: storage means (108) that stores one or more data sets (203a, 203b, 203c, 203d, 203e) by layer, each data set including one or more data records and each data record including a data definition for obtaining a notation corresponding to a reading; extraction means (101, S712) that analyzes text and extracts data records including the data definition corresponding to the readings of words contained in the text; extracted data determination means (101, S714) that determines whether a data record extracted by the extraction means is included in any of the one or more data sets; and data generation means (101, S716, S718) that, when the extracted data determination means determines that the extracted data record is not included in any of the one or more data sets, generates a new data set including the extracted data record and stores the generated new data set in an empty layer of the storage means.

As a second aspect, in the language input system of the first aspect, each of the one or more data sets may have a common format (FIGS. 4 and 6; 203a, 203b, 203c, 203d, 203e), and the data generation means may generate the new data set in that format.

As a third aspect, in the language input system of the first or second aspect, the data records included in the one or more data sets stored in the storage means may each include a reading and a notation corresponding to that reading (608, 609), and the data records extracted by the extraction means may each include the reading of a word contained in the text and a notation corresponding to that reading (610).

As a fourth aspect, in the language input system of any one of the first to third aspects, each of the one or more data sets stored in the storage means may have an existence period (FIG. 6; 203a, 203e), and the data generation means may assign an existence period to the new data set (FIG. 6; 203b, S716).

As a fifth aspect, the language input system of the fourth aspect may further comprise: period determination means (101, S902) that determines the existence period of each of the one or more data sets stored in the storage means; and data erasure means (101, S904) that deletes from the storage means any data set whose existence period the period determination means has determined to have expired, and makes the layer in which the deleted data set was stored an empty layer.

As a sixth aspect, in the language input system of any one of the first to fifth aspects, the text may be the original message (401a) from the e-mail sender contained in an e-mail response document.

As a seventh aspect, in the language input system of any one of the first to fifth aspects, the text may be the content of a document file (402a) in a specific folder.

As an eighth aspect, in the language input system of any one of the first to seventh aspects, the predetermined language may be Japanese.

Furthermore, because user-related data of various categories uses a single format, the language input system can easily handle differences in existence period among the user-related data by using this kind of layered data management.

(Other embodiments)
In addition to the embodiment described above, the following forms can be implemented.
1) In the embodiment described above, for convenience of illustrating one embodiment, FIGS. 2, 4, 6 to 9, and so on were described in terms of a layer structure of multiple dictionary data (203a, 203b, 203c, 203d, 203e, and so on) within a single user dictionary file 203. However, the embodiment is not limited to this. Those skilled in the art will readily understand that it can be modified so that each dictionary data (203a, 203b, 203c, 203d, 203e, and so on) constitutes its own user dictionary file and the layer structure is formed from those multiple user dictionary files.

Furthermore, the embodiment described above covers the case in which one system dictionary file 202 contains one system dictionary data 202a, but the embodiment is not limited to this. Those skilled in the art will readily understand that it can be modified into an embodiment with multiple system dictionary files, or one in which a single system dictionary file contains multiple system dictionary data.

2) The embodiment described above uses the example of kana-kanji conversion applied to the reading "はなし" that the user entered with the keyboard 106 at the position of reference numeral 301 on the screen of the display 107. However, the present invention is not limited to kana-kanji conversion and is applicable to language processing systems in general. For example, the input of a reading is not limited to the keyboard: those skilled in the art will readily understand that the present invention is also applicable when speech uttered by the user is input to the PC 100 through a microphone and the reading obtained by recognizing that speech with speech recognition software is taken in and processed.

3) The embodiment described above shows an example of a Japanese input system that processes Japanese, but the present invention is not limited to Japanese. Those skilled in the art will readily understand that the present invention is also applicable to various other languages that have homophones.

4) The embodiment described above shows an example in which the Japanese input system is implemented on a general-purpose personal computer. The present invention is also applicable to information processing apparatuses other than personal computers, such as workstations, servers, PDAs (Personal Digital Assistants), mobile phones, and other information processing apparatuses capable of executing programs.

5) The recording medium referred to in the present invention is a medium on which a program to be executed by a CPU is recorded and which can be read by a device. Besides a CD-ROM, well-known recording media such as an IC (integrated circuit) memory, an HD, a floppy (registered trademark) disk, or a magneto-optical disk (MO) can be used.
The program recorded on the recording medium may be the program itself, a compressed form of it, or an encrypted form of it; all of these data fall within the concept of the program of the present invention.
Furthermore, when a program is transferred (downloaded) to an information processing apparatus over a network such as the Internet or a LAN (local area network), or over a signal line, the recording medium or storage device that stores the program in the transfer-source apparatus corresponds to the recording medium of the present invention.

6) The embodiment described above was presented to illustrate the present invention, and modifications other than that embodiment are possible. As long as a modification is based on the technical idea of the present invention set out in the claims, it falls within the technical scope of the present invention.

FIG. 1 is a system block diagram of a PC on which the Japanese input program of an embodiment to which the present invention is applicable is installed.
FIG. 2 is a system configuration diagram showing an overview of the operation of the Japanese input program of the embodiment.
FIG. 3 is a diagram showing an example of converting an input reading into an appropriate notation in the embodiment.
FIG. 4 is a diagram showing how the dictionary search module of the embodiment manages each dictionary data in the user dictionary file.
FIG. 5 is a diagram explaining the differences in existence period among the dictionary data in the user dictionary file of the embodiment.
FIG. 6 is a diagram showing the structure of the system dictionary file, the user dictionary file, and the learning result file of the embodiment.
FIG. 7 is a flowchart showing the Japanese input program processing executed by the CPU of the embodiment.
FIG. 8 is a flowchart showing the Japanese input program processing executed by the CPU of the embodiment.
FIG. 9 is a flowchart showing the Japanese input program processing executed by the CPU of the embodiment.

Explanation of symbols

100 Personal computer
101 CPU
102 System memory
103 Modem
104 CD-ROM drive
105 CD-ROM
106 Keyboard
107 Display
108 HD
109 OS
110 Various application programs
111 Various program modules
112 Program data
114 Japanese input program
115 Mouse
116 System bus
117 Microphone
120 Wide area network
121 WWW server
114a Dictionary search module
114b LM module
201 Input reading
202 System dictionary file
202a System dictionary data
203 User dictionary file
203a Proper noun dictionary data
203b E-mail response dictionary data
203c Abbreviation dictionary data
203d Document folder dictionary data
203e New word dictionary data
204 Learning result file
204a Learning result data
205 Notation output
401 E-mail response document
401a Original message
402 Folder
402a, 402b Document file
501a, 501b, 501c, 501x Existence period

Claims

A language input system for displaying one or more notations for an input reading in a predetermined language and selecting one notation from the displayed notations, the system comprising:
storage means for storing one or more data sets by layer, wherein each data set includes one or more data records, and each data record includes a data definition for obtaining a notation corresponding to a reading;
extraction means for analyzing a sentence and extracting a data record including the data definition corresponding to the reading of a word included in the sentence;
extracted data determination means for determining whether the data record extracted by the extraction means is included in any of the one or more data sets; and
data generation means for generating, when the extracted data determination means determines that the extracted data record is not included in any of the one or more data sets, a new data set including the extracted data record, and for storing the generated new data set in an empty layer of the storage means.
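
The determination and generation elements recited above can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the names `DataRecord`, `LayeredStore`, and `store_new_records` are hypothetical, the sentence analysis performed by the extraction means is assumed to have already produced a list of reading/notation records, and a layer is modelled simply as a slot that holds either one data set or nothing.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class DataRecord:
    """Hypothetical record: a reading and the notation it maps to."""
    reading: str
    notation: str


class LayeredStore:
    """Assumed model of the storage means: a fixed number of layers,
    each holding one data set (a set of records) or None when empty."""

    def __init__(self, layer_count: int) -> None:
        self.layers: List[Optional[Set[DataRecord]]] = [None] * layer_count

    def contains(self, record: DataRecord) -> bool:
        # Determination: is the record in any stored data set?
        return any(ds is not None and record in ds for ds in self.layers)

    def store_new_records(self, extracted: List[DataRecord]) -> Optional[int]:
        """Build a new data set from records found in no existing data set
        and store it in the first empty layer. Returns the layer index,
        or None when every extracted record is already stored."""
        new_data_set = {r for r in extracted if not self.contains(r)}
        if not new_data_set:
            return None
        for index, data_set in enumerate(self.layers):
            if data_set is None:
                self.layers[index] = new_data_set
                return index
        raise RuntimeError("no empty layer available")


store = LayeredStore(layer_count=3)
store.layers[0] = {DataRecord("とうきょう", "東京")}
extracted = [DataRecord("とうきょう", "東京"), DataRecord("ふじむら", "藤村")]
layer_index = store.store_new_records(extracted)
```

In this sketch only the record absent from every existing data set ends up in the new data set; how the embodiment treats a mix of known and unknown records is not specified by the claim, so that behaviour is an assumption.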

The language input system according to claim 1, wherein each of the one or more data sets has a common format, and the data generation means generates the new data set having the common format.

The language input system according to claim 1, wherein each data record included in the one or more data sets stored in the storage means includes a reading and a notation corresponding to the reading, and the data record extracted by the extraction means includes a reading of a word included in the sentence and a notation corresponding to that reading.

The language input system according to any one of claims 1 to 3, wherein each of the one or more data sets stored in the storage means has an existence period, and the data generation means assigns an existence period to the new data set.

The language input system according to claim 4, further comprising:
period determination means for determining the existence period of each of the one or more data sets stored in the storage means; and
data erasure means for deleting, from the storage means, a data set whose existence period is determined by the period determination means to have expired, and for setting the layer in which the deleted data set was stored as an empty layer.
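
The period determination and erasure recited above can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's code: `TimedDataSet`, `expires_at`, and `erase_expired` are hypothetical names, the existence period is represented as an absolute expiry time, and a data set is treated as expired once the current time reaches that expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TimedDataSet:
    """Hypothetical data set carrying the end of its existence period."""
    name: str
    expires_at: datetime


def erase_expired(layers: List[Optional[TimedDataSet]], now: datetime) -> List[int]:
    """Delete every data set whose existence period has expired and mark
    its layer as empty (None). Returns the indices of the freed layers."""
    freed = []
    for index, data_set in enumerate(layers):
        if data_set is not None and data_set.expires_at <= now:
            layers[index] = None  # the layer becomes an empty layer
            freed.append(index)
    return freed


now = datetime(2005, 6, 24)
layers = [
    TimedDataSet("e-mail response dictionary data", now - timedelta(days=1)),
    None,
    TimedDataSet("proper noun dictionary data", now + timedelta(days=30)),
]
freed_layers = erase_expired(layers, now)
```

A layer freed this way is then available to receive a newly generated data set.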

The language input system according to claim 1, wherein the sentence is an original message from an e-mail sender in an e-mail response document.

The language input system according to claim 1, wherein the sentence is the content of a document file in a specific folder.

The language input system according to claim 1, wherein the predetermined language is Japanese.

A processing method of a language input system for displaying one or more notations for an input reading in a predetermined language and selecting one notation from the displayed notations,
wherein the language input system comprises storage means for storing one or more data sets by layer, each data set including one or more data records, and each data record including a data definition for obtaining a notation corresponding to a reading, the method comprising:
an extraction step of analyzing a sentence and extracting a data record including the data definition corresponding to the reading of a word included in the sentence;
an extracted data determination step in which extracted data determination means determines whether the data record extracted in the extraction step is included in any of the one or more data sets; and
a data generation step in which, when it is determined in the extracted data determination step that the extracted data record is not included in any of the one or more data sets, data generation means generates a new data set including the extracted data record and stores the generated new data set in an empty layer of the storage means.

The processing method of the language input system according to claim 9, wherein each of the one or more data sets has a common format, and the new data set having the common format is generated in the data generation step.

The processing method of the language input system according to claim 9 or 10, wherein each data record included in the one or more data sets stored in the storage means includes a reading and a notation corresponding to the reading, and the data record extracted in the extraction step includes a reading of a word included in the sentence and a notation corresponding to that reading.

The processing method of the language input system according to claim 9, wherein each of the one or more data sets stored in the storage means has an existence period, and an existence period is assigned to the new data set in the data generation step.

The processing method of the language input system according to claim 12, further comprising:
a period determination step in which period determination means determines the existence period of each of the one or more data sets stored in the storage means; and
a data erasure step in which data erasure means deletes, from the storage means, a data set whose existence period is determined in the period determination step to have expired, and sets the layer in which the deleted data set was stored as an empty layer.

The processing method of the language input system according to claim 9, wherein the sentence is an original message from an e-mail sender in an e-mail response document.

The processing method of the language input system according to claim 9, wherein the sentence is the content of a document file in a specific folder.

The processing method of the language input system according to claim 9, wherein the predetermined language is Japanese.

17. A computer-readable recording medium on which a program for causing a computer to execute each step of the processing method of the language input system according to claim 9 is recorded.

A program for causing a computer to execute each step of the processing method of the language input system according to any one of claims 9 to 16.