JP4119979B2 - Personal environment language conversion device, personal environment difference enhancement device, and program

Info

Publication number: JP4119979B2
Application number: JP2003053258A
Authority: JP
Inventors: 村田真樹
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2003-02-28
Filing date: 2003-02-28
Publication date: 2008-07-16
Anticipated expiration: 2023-02-28
Also published as: JP2004265014A

Description

【０００１】
【発明の属する技術分野】
本発明は、個人環境での読み書きの入力データから文字列の頻度情報を求める個人環境頻度記憶装置と、該個人環境頻度記憶装置を使用する個人環境言語変換装置及び個人環境差分強調装置及びプログラムに関する。
【０００２】
特に、個人環境の読み書きシステムにおいてその個人の良く知っている文字列（単語）を認定する個人環境頻度記憶装置及びプログラムと、なるべく個人の良く知っている文字列（単語）を使って表示するようにする個人環境言語変換装置及びプログラムと、個人環境での文字列の出現頻度により文字列を強調表示する個人環境差分強調装置及びプログラムに関する。
【０００３】
【従来の技術】
従来、自然言語で記述された文または文章に関する表現の変換処理として典型的なものは、機械翻訳である。機械翻訳では、ある国の自然言語で記述された文または文章を他の国の自然言語で記述された文または文章に変換する。
【０００４】
機械翻訳が他の国の言語に変換するのに対し、同一の自然言語間での文または文章の変換処理を行うシステムも用いられるようになってきている。例えば、要約文を自動生成したり、文章を推敲したりするシステムである。
【０００５】
一般に同一自然言語間での文の変換処理では、変換前の語・句・文などのパターンと変換後の語・句・文などのパターンとの対からなる変換規則を大量に用意し、いわゆるパターン・マッチングによって入力文中に現れる変換前のパターンを探し出し、該当するパターンがあれば、それを変換後の語・句・文などのパターンに置き換える処理を行うものであった。
【０００６】
また、各種の言い換えを統一的に扱うことができるようにするため、ある自然言語で記述された文字列を、同一の自然言語で記述された他の表現による文字列に変換するシステムとして次のものが提案されている。
【０００７】
主要なモジュールとして変形処理手段と評価処理手段と、自然言語の文字列に関する変形の規則を記憶する変形規則記憶手段と、文字列を変形した結果が目的とするふさわしい変換であるかどうかを評価するための尺度を与える評価関数または評価規則を記憶する評価情報記憶手段とを持ち、これらを変換の目的に応じて交換できるようにする。さらに、変形規則記憶手段および評価情報記憶手段に、変形規則および評価関数等を複数種類用意し、変換の目的に応じて選択できるようにする。
【０００８】
これにより、複数の言語変換処理システムを開発する場合の開発コストを低減し、また、複数の言語変換処理システムを統一されたインタフェースで利用可能にすることができるものがあった（例えば、非特許文献１及び特願２００１−２０５８８９参照）。
【０００９】
【非特許文献１】
村田真樹，井佐原均, 言い換えの統一的モデル−尺度に基づく変形の利用−，2001年 3月30日言語処理学会第７回年次大会ワークショップ論文集，ｐ．21〜26
【００１０】
【発明が解決しようとする課題】
上記従来の要約文の自動生成、文章の推敲等の同一自然言語内での文または文章の変換処理では、一般に変換規則による一律な変換を行うものであり、平易文生成、要約文生成、文章の推敲といった変換の目的に応じて、それぞれ個別に独自のシステムを構築する必要があり、個人環境に対応した言語変換を自動で行えるものではなかった。
【００１１】
また、各種の言い換えを統一的に扱うことができるようにしたものは、個人の良く知っている単語を認定し、各個人の良く知っている単語を使って表示できるものではなかった。
【００１２】
本発明は上記問題点の解決を図り、個人環境の読み書きシステムにおいてその個人の良く知っている単語を認定し、なるべくその単語を使って表示するようにすることで、各個人にとってわかりやすい表現とすることを目的とする。
【００１３】
また、個人環境での文字列の出現頻度により文字列を強調表示することで、自分の苦手な文字列や単語、自分の興味の大きい文字列や単語を容易に特定できるようにすることを目的とする。
【００１４】
【課題を解決するための手段】
図１は本発明の原理説明図である。図１中、２は個人環境頻度記憶装置、３ａは頻度記憶手段、４ａは言語変換手段、５は読み書き入力部、６は入力部、７は出力部、１３ａは格納手段である。
【００１５】
本発明は、前記従来の課題を解決するため次のような手段を有する。
【００１６】
（１）：個人環境での読み書きの入力を行う読み書き入力部５と、文字列の頻度情報の検索を行う頻度記憶手段３ａとを備え、前記頻度記憶手段３ａは、前記読み書き入力部５から入力された個人環境での読み書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする。このため、個人環境での文字列の頻度情報により、入力文の個人環境への言語変換や差分強調を容易に行うことができる。
【００１７】
（２）：前記（１）の個人環境頻度記憶装置において、前記頻度記憶手段３ａに読み入力検出部と書き入力検出部とを備え、前記頻度記憶手段３ａは、前記書き入力検出部で検出した前記任意の文字列の出現頻度の重みを前記読み入力検出部で検出した前記任意の文字列の出現頻度の重みより重くして前記任意の文字列の出現頻度を求める。このため、印象のより多い「書く」ということを重要視することができる。
【００１８】
（３）：前記（２）の個人環境頻度記憶装置において、前記読み入力検出部は、表示時間の短いものは前記任意の文字列の出現頻度として検出しないようにして前記任意の文字列の出現頻度を求める。このため、個人が読まないで単に表示するものを除くことができる。
【００１９】
（４）：前記（１）〜（３）の個人環境頻度記憶装置において、前記頻度記憶手段３ａは、前記読み書き入力部５から入力された読み書きデータのうち古いものを削除して前記任意の文字列の出現頻度を求める。このため、最近の個人環境頻度を記憶でき、個人環境の変化に対応することができる。
【００２０】
（５）：前記（１）〜（３）の個人環境頻度記憶装置において、前記頻度記憶手段３ａは、前記読み書き入力部５から入力された読み書きデータから古いものの重みを軽くして前記任意の文字列の出現頻度を求める。このため、最近の個人環境頻度を重要視でき、個人環境の変化に対応することができる。
【００２１】
（６）：前記（１）〜（５）のいずれかに記載の個人環境頻度記憶装置２と、入力文を個人環境言語に変換して出力する言語変換手段４ａとを備え、前記言語変換手段４ａは、入力された文字列を前記個人環境頻度記憶装置２に格納されている出現頻度の高い文字列に変換して出力する。このため、各個人にとって分かりやすい表現にすることができる。
【００２２】
（７）：前記（１）〜（５）のいずれかに記載の個人環境頻度記憶装置２と、入力文を個人環境言語に変換して出力する言語変換手段４ａとを備え、前記言語変換手段４ａは、入力された文字列のうち前記個人環境頻度記憶装置２に格納されている出現頻度の高い文字列を括弧づけで補助表記して出力する。このため、完全に書き換えて、勘違いや文の意味が変わるのを防止することができる。
【００２３】
（８）：前記（１）〜（５）のいずれかに記載の個人環境頻度記憶装置２と、入力文の文字列の差分を強調して表示する差分強調装置とを備え、前記差分強調装置は、入力された文字列のうち前記個人環境頻度記憶装置２に格納されている文字列の出現頻度がある閾値以下のものを強調表示する。このため、自分の苦手な文字列や単語を手際よく探すことができる。
【００２４】
（９）：前記（１）〜（５）のいずれかに記載の個人環境頻度記憶装置２と、入力文の文字列の差分を強調して表示する差分強調装置とを備え、前記差分強調装置は、入力された文字列のうち前記個人環境頻度記憶装置２に格納されている文字列の出現頻度がある閾値以上のものを強調表示する。このため、自分の興味の大きい文字列や単語を強調表示して、自分の興味のある段落や文を手際よく探すことができる。
【００２５】
（１０）：個人環境での書きの入力を行う書き入力部と、文字列の頻度情報の検索を行う頻度記憶手段３ａを備え、前記頻度記憶手段３ａは、前記書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする個人環境頻度記憶装置２と、入力文を個人環境言語に変換して出力する言語変換手段４ａとを備え、前記言語変換手段４ａは、入力された文字列のうち前記個人環境頻度記憶装置２に格納されている出現頻度の高い文字列を括弧づけで補助表記して出力する。このため、完全に書き換えて、勘違いや文の意味が変わるのを防止することができる。
【００２６】
（１１）：個人環境での書きの入力を行う書き入力部と、文字列の頻度情報の検索を行う頻度記憶手段３ａを備え、前記頻度記憶手段３ａは、前記書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする個人環境頻度記憶装置２と、入力文の文字列の差分を強調して表示する差分強調装置とを備え、前記差分強調装置は、入力された文字列のうち前記個人環境頻度記憶装置２に格納されている文字列の出現頻度がある閾値以下のものを強調表示する。このため、自分の苦手な文字列や単語を手際よく探すことができる。
【００２７】
（１２）：個人環境での書きの入力を行う書き入力部と、文字列の頻度情報の検索を行う頻度記憶手段３ａを備え、前記頻度記憶手段３ａは、前記書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする個人環境頻度記憶装置２と、入力された文字列の差分として出力する対象の単位である抽出単位と入力された文字列の差分を検出するために比較する領域の単位である検出領域を設定し、抽出手段で入力された文字列の現在の前記検出領域以外の領域から全ての前記抽出単位に相当するものを抽出して格納手段に格納し、現在の前記検出領域において、前記格納手段に格納されていない前記抽出単位に相当するものを強調表示して現在の検出領域の文書を出力することを、前記検出領域ごとに繰り返す差分強調装置とを備え、前記差分強調装置は、前記抽出手段で強調表示すべきと判断された前記入力された文字列のうち前記個人環境頻度記憶装置２に格納されている文字列の出現頻度がある閾値以下のものを強調表示する。このため、強調表示される箇所が少なく、自分の苦手な表現が初出に現れた箇所を容易に特定できる。
【００２８】
（１３）：個人環境での書きの入力を行う書き入力部と、文字列の頻度情報の検索を行う頻度記憶手段３ａを備え、前記頻度記憶手段３ａは、前記書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする個人環境頻度記憶装置２と、入力された文字列の差分として出力する対象の単位である抽出単位と入力された文字列の差分を検出するために比較する領域の単位である検出領域を設定し、抽出手段で、入力された文書データの現在の前記検出領域において、格納手段に格納されていない前記抽出単位に相当するものを強調表示して現在の検出領域の文書を出力し、前記強調表示したものを前記格納手段に格納することを、前記検出領域ごとに繰り返す差分強調装置とを備え、前記差分強調装置は、前記抽出手段で強調表示すべきと判断された前記入力された文字列のうち、前記個人環境頻度記憶装置２に格納されている文字列の出現頻度がある閾値以下のものを強調表示する。このため、強調表示される箇所が少なく、自分の苦手な表現が初出に現れた箇所を容易に特定できる。
【００２９】
（１４）：個人環境での書きの入力を行う書き入力部と、文字列の頻度情報の検索を行う頻度記憶手段３ａを備え、前記頻度記憶手段３ａは、前記書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする個人環境頻度記憶装置２と、入力文の文字列の差分を強調して表示する差分強調装置とを備え、前記差分強調装置は、入力された文字列のうち前記個人環境頻度記憶装置２に格納されている文字列の出現頻度がある閾値以上のものを強調表示する。このため、自分の興味の大きい文字列や単語を強調表示して、自分の興味のある段落や文を手際よく探すことができる。
【００３０】
（１５）：個人環境での書きの入力を行う書き入力部と、文字列の頻度情報の検索を行う頻度記憶手段３ａを備え、前記頻度記憶手段３ａは、前記書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする個人環境頻度記憶装置２と、入力文の文字列の差分を強調して表示する差分強調装置とを備え、前記差分強調装置は、入力された文字列のうちユーザ個人での文字列出現確率が一般的なテキスト集合での出現確率より有意に大きいものだけを強調表示する。このため、出現確率が有意に大きくない一般的な名詞等は強調表示されなくなり、その人の興味の大きい文字列や単語が強調表示され、自分の興味のある文字列や単語を含む段落や文を手際よく探すことができる。
【００３１】
（１６）：個人環境での書きの入力を行う書き入力部と、文字列の頻度情報の検索を行う頻度記憶手段３ａを備え、前記頻度記憶手段３ａは、前記書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする個人環境頻度記憶装置２と、入力文の文字列の差分を強調して表示する差分強調装置とを備え、前記差分強調装置は、入力された文字列のうちユーザ個人での文字列出現確率が一般的なテキスト集合での出現確率のある値倍より大きいものだけを強調表示する。このため、出現確率がある値倍より大きくない一般的な名詞等は比率的に強調表示されなくなり、その人の興味のより大きい文字列や単語が強調表示され、自分の興味のある文字列や単語を含む段落や文を手際よく探すことができる。
【００３２】
【発明の実施の形態】
§１：個人環境言語変換の説明
この発明は、基本的には同一言語内の言語変換を行うものであるが、特に、個人の環境にマッチした言語変換を行うものである。
【００３３】
つまり、個人環境の読み書きを行うシステムと同一言語内での言語変換装置をくっつけるものである。一般に、個人が文章を作成するためのワードプロセッサや文章を読むためにディスプレイに表示するための個人環境の読み書きシステムにおいてその個人の良く知っている単語を認定し、なるべくその単語を使って表示するようにするものである。
【００３４】
（１）：個人環境言語変換装置の説明
図２は個人環境言語変換装置の説明図である。図２において、個人環境言語変換装置には、頻度記憶部３、言語変換部４、読み書き入力部５、入力部６、出力部７が設けてある。
【００３５】
頻度記憶部３は、個人環境での読み書き入力部５から入力された文字列の出現頻度を求めるものである。読み書きシステムが使用されると読み書き入力部５から読み書きが入力され頻度記憶部３はいつも書き換えられるものである。
【００３６】
言語変換部４は、頻度記憶部３の変形規則によって変換の候補を獲得し、出現頻度等の評価の尺度（評価関数など）によって評価し、最もふさわしい変換の候補を選択するものである。
【００３７】
読み書き入力部５は、読みシステム、書きシステム又は読み書きが一体になったシステム等の読み書きシステムから読み書きが入力されるものである。入力部６は、変換対象文を入力するものである。出力部７は、言語変換結果を出力するものである。
【００３８】
（２）：言語補助変換部を用いる場合の説明
図３は言語補助変換部を用いる場合の説明図である。図３の個人環境言語変換装置は、図２の言語変換部４の代わりに言語補助変換部８を用いるものである。図３において、個人環境言語変換装置には、頻度記憶部３、読み書き入力部５、入力部６、出力部７、言語補助変換部８が設けてある。
【００３９】
頻度記憶部３は、個人環境での読み書きシステム等から入力された文字列の出現頻度を求めるものである。読み書き入力部５は読み書きシステムからの読み書きが入力されるものである。入力部６は、変換対象文を入力するものである。出力部７は、言語変換結果を出力するものである。
【００４０】
言語補助変換部８は、頻度記憶部３の変形規則によって変換の候補を獲得し、出現頻度等の評価の尺度（評価関数など）によって評価し、最もふさわしい変換の候補を選択し括弧づけで補助表記するものである。
【００４１】
（３）：頻度記憶部の説明
ａ）頻度記憶部の例（１）
図４は頻度記憶部の説明図（１）である。図４において、頻度記憶部には、読み入力検出部１１、読みデータ格納部１３、全文検索エンジン１４、書き入力検出部２１、書きデータ格納部２３、全文検索エンジン２４が設けてある。
【００４２】
読み入力検出部１１は、読み書き入力部５から入力された読み入力を検出するものである。読みデータ格納部１３は、読み入力検出部１１が検出した読み入力を格納するものである。全文検索エンジン１４は、読みデータ格納部１３に格納された任意の文字列の個数をカウントするものである。書き入力検出部２１は、読み書き入力部５から入力された書き入力を検出するものである。書きデータ格納部２３は、書き入力検出部２１が検出した書き入力を格納するものである。全文検索エンジン２４は、書きデータ格納部２３に格納された任意の文字列の個数をカウントするものである。なお、全文検索エンジン１４と２４は、一つの検索エンジンを用いることもできる。
【００４３】
図５は頻度記憶部の処理フローチャート（１）である。以下、頻度記憶部の処理を図５のステップＳ１〜Ｓ５に従って説明する。
【００４４】
Ｓ１：読み書き入力部５から読み入力されたデータを読み入力検出部１１で検出する。具体的には、画面上に何分か以上連続して表示された部分を読み入力されたデータとして認識する。
【００４５】
Ｓ２：読み入力検出部１１で検出された読み入力された文字列をそのまま読みデータ格納部１３に格納する。
【００４６】
Ｓ３：読み書き入力部５から書き入力されたデータを書き入力検出部２１で検出する。具体的には、キーボード入力などで入力された文字列を書き入力されたデータとして認識する。
【００４７】
Ｓ４：書き入力検出部２１で検出された書き入力された文字列をそのまま書きデータ格納部２３に格納する。
【００４８】
Ｓ５：文字列の存在または個数を高速に検索する全文検索エンジン１４、２４を用いて、任意の文字列の個数をカウントできるようにしておく。
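Steps S1 to S5 can be illustrated with a minimal Python sketch. It is an illustration under assumptions, not the patented implementation: the class name `RawFrequencyStore`, its method names, and the 120-second display threshold are hypothetical (the text says only that a passage displayed for some minutes or more is treated as read), and a plain substring count stands in for the full-text search engines 14 and 24.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed value; the description does not fix the display-time threshold.
MIN_READ_SECONDS = 120.0


@dataclass
class RawFrequencyStore:
    """Keeps read and written text as-is and counts substrings on demand."""
    read_texts: List[str] = field(default_factory=list)
    write_texts: List[str] = field(default_factory=list)

    def on_displayed(self, text: str, seconds_on_screen: float) -> None:
        # S1/S2: only text shown long enough is stored as "read" data.
        if seconds_on_screen >= MIN_READ_SECONDS:
            self.read_texts.append(text)

    def on_typed(self, text: str) -> None:
        # S3/S4: keyboard input is stored as "written" data.
        self.write_texts.append(text)

    def count_read(self, s: str) -> int:
        # S5: str.count counts non-overlapping occurrences.
        return sum(t.count(s) for t in self.read_texts)

    def count_written(self, s: str) -> int:
        return sum(t.count(s) for t in self.write_texts)
```

A real system would replace the linear scan with an index; the scan is kept here only because it makes the counting rule easy to read.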
【００４９】
ｂ）頻度記憶部の例（２）
図６は頻度記憶部の説明図（２）である。図６において、頻度記憶部には、読み入力検出部１１、形態素解析器１２、読みデータ格納部１３、単語検索エンジン１４、書き入力検出部２１、形態素解析器２２、書きデータ格納部２３、単語検索エンジン２４が設けてある。
【００５０】
読み入力検出部１１は、読み書き入力部５から入力された読み入力を検出するものである。形態素解析器１２は、読み入力を単語に分割するものである。読みデータ格納部１３は、形態素解析器１２で分割した単語を格納するものである。単語検索エンジン１４は、任意の単語の出現回数をカウントするものである。書き入力検出部２１は、読み書き入力部５から入力された書き入力を検出するものである。形態素解析器２２は、書き入力を単語に分割するものである。書きデータ格納部２３は、形態素解析器２２で分割した単語を格納するものである。単語検索エンジン２４は、任意の単語の出現回数をカウントするものである。なお、形態素解析器１２と２２、読みデータ格納部１３と書きデータ格納部２３及び単語検索エンジン１４と２４は、それぞれ一つの形態素解析器、一つの格納部及び一つの検索エンジンを用いることもできる。
【００５１】
図７は頻度記憶部の処理フローチャート（２）である。以下、形態素解析器を用いる頻度記憶部の処理を図７のステップＳ１１〜Ｓ１５に従って説明する。
【００５２】
Ｓ１１：読み書き入力部５から読み入力されたデータを読み入力検出部１１で検出する。具体的には、画面上に何分か以上連続して表示された部分を読み入力されたデータとして認識する。
【００５３】
Ｓ１２：読み入力検出部１１で検出された読み入力された文字列を形態素解析器１２で単語に分割し、単語ごとに読みデータ格納部１３に格納する。各単語ごとに何回出現したかの回数のデータも同時に格納する。すでに格納してある単語と同じ単語のものを格納する場合は出現回数のデータのみを更新する。
【００５４】
Ｓ１３：読み書き入力部５から書き入力されたデータを書き入力検出部２１で検出する。具体的には、キーボード入力などで入力された文字列を書き入力されたデータとして認識する。
【００５５】
Ｓ１４：書き入力検出部２１で検出された書き入力された文字列を形態素解析器２２で単語に分割し、単語ごとに書きデータ格納部２３に格納する。各単語ごとに何回出現したかの回数のデータも同時に格納する。すでに格納してある単語と同じ単語のものを格納する場合は出現回数のデータのみを更新する。
【００５６】
Ｓ１５：単語検索エンジン１４、２４では、任意の単語の出現回数をカウントできるようにしておく。
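The word-level variant in steps S11 to S15 can be sketched as follows. This is a hedged illustration: `WordFrequencyStore` and its methods are hypothetical names, and the morphological analyzers 12 and 22 are represented by a `tokenize` callable supplied by the caller, because the description does not name a particular analyzer.

```python
from collections import Counter
from typing import Callable, Iterable


class WordFrequencyStore:
    """Stores per-word counts for read data and written data separately."""

    def __init__(self, tokenize: Callable[[str], Iterable[str]]):
        # tokenize stands in for the morphological analyzer (assumption).
        self._tokenize = tokenize
        self.read_counts: Counter = Counter()
        self.write_counts: Counter = Counter()

    def add_read(self, text: str) -> None:
        # S12: split into words; a word seen before only has its count updated.
        self.read_counts.update(self._tokenize(text))

    def add_written(self, text: str) -> None:
        # S14: same handling for written text.
        self.write_counts.update(self._tokenize(text))

    def f_r(self, word: str) -> int:
        # S15: number of occurrences in read data (0 if never seen).
        return self.read_counts[word]

    def f_w(self, word: str) -> int:
        return self.write_counts[word]
```

For a quick trial with pre-segmented text, `WordFrequencyStore(str.split)` treats whitespace as the word boundary.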
【００５７】
（４）言語変換部の説明
図８は言語変換部の説明図である。図８において、言語変換部には、言語変換処理部３１、変換規則部（言語変換の辞書）３２、変換用尺度３３が設けてある。言語変換処理部３１は、変換規則を用いて変形の候補をあげ、変換用尺度により変換の妥当性のチェックをし、最も妥当であると判断されたものに変換するものである。即ち、変換規則を用いて変換用尺度が大きくなるような変換を行う。変換規則部（言語変換の辞書）３２は、「罷免する」を「やめさせる」に変形する等の規則を格納するものである。変換用尺度３３には、類似度、長さ、出現頻度（出現確率）等があるが、ここでは頻度記憶部の出現頻度を用いる。
【００５８】
（５）個人環境言語変換処理の説明
ａ）頻度記憶部と言語変換部を用いる個人環境言語変換の説明
▲１▼：ユーザの日々の読み書きの行動から、頻度記憶部のフローチャート（図５、図７参照）の処理で、ユーザの読みデータ、書きデータを頻度記憶部３に記憶する。
【００５９】
▲２▼：言語変換部４には言語変換規則が蓄えられているものとする。適用可能な言語変換規則があるとき、その変換をした後の文字列の頻度と、変換前の文字列の頻度を、頻度記憶部３から求めて、変換をした後の文字列の頻度の方が大きい場合、変換を行なう。
【００６０】
また、変換前の文字列の頻度の方が大きい場合は、変換を行なわない。変換を行ないうる言語変換規則が複数ある場合は、その変換をした後の文字列の頻度がもっとも大きい規則を用いて変換を行なう。
【００６１】
このときの文字列の頻度は、読みデータと書きデータの両方を組み合わせたもので、概ね以下のような式で求めなおしたものを用いる。
【００６２】
具体的には、読みシステムにおいての各単語t の出現頻度をｆ_r(t) 、書きシステムにおいての各単語t の出現頻度をｆ_w(t) とするとき、その個人の単語出現頻度分布を
α×ｆ_r(t) ＋（１−α）×ｆ_w(t) （ただし、０≦α≦１）
として、この頻度が多くなるように単語を変換する。すなわち、言語変換をする際に用いる尺度として、個人環境の読み書きシステムにおけるその個人の単語出現頻度分布を用いるものである。ここで、αを設けるのは、「読む」ということは「書く」ということより印象が少ないので、「書く」事への重みを高める（重み付ける）ためである。つまり、αは０．５より小さいものとなる。なお、αなどの定数はユーザが設定変更できるようにしておくものである。
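The combined frequency α×f_r(t) + (1−α)×f_w(t) and the rule-selection logic described above can be sketched in Python. The function names and the default α = 0.3 are assumptions for illustration (the text requires only 0 ≤ α ≤ 1 and suggests α below 0.5). The description does not say what happens when the two frequencies are equal; this sketch keeps the original string in that case.

```python
from typing import Dict, Sequence


def combined_frequency(f_r: float, f_w: float, alpha: float = 0.3) -> float:
    """alpha * f_r + (1 - alpha) * f_w; alpha < 0.5 weights writing more."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * f_r + (1.0 - alpha) * f_w


def choose_replacement(source: str, candidates: Sequence[str],
                       f_r: Dict[str, float], f_w: Dict[str, float],
                       alpha: float = 0.3) -> str:
    """Return the most frequent candidate if it beats the source, else the source."""
    def score(t: str) -> float:
        return combined_frequency(f_r.get(t, 0.0), f_w.get(t, 0.0), alpha)

    best = max(candidates, key=score, default=source)
    # Convert only when the converted string is strictly more frequent.
    return best if score(best) > score(source) else source
```

Making `alpha` a parameter mirrors the statement that the user can change such constants.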
【００６３】
例えば、その個人が政治家であるとするとなるべく政治家の専門用語を使った表現に変換するようになる。一般に自分が良く使う表現で書かれた文ほど、その人にとってわかりやすい。この発明の個人環境言語変換装置を用いると、他の分野の専門家の文章も、自分の分野でよく使われる表現に変更することができる。
【００６４】
（具体的な例による説明）
同一言語内の言語変換については、すでに出願済であるが、ここでは、個人環境の読み書きシステムと同一言語内での言語変換装置をくっつけたものである。個人環境の読み書きシステムをくっつけることで、その個人のよく知っている単語の頻度分布を容易に取得でき、その情報を言語変換に用いることができる点が重要なところである。
【００６５】
例えば、以下の矢印のように変換する。
【００６６】
「世界知識を用いた照応解析の研究」→「常識を用いた指示詞の指示先の推定の研究」
言語変換の辞書（変換規則）に次のものがあるとする。
【００６７】
世界知識＝常識
照応解析＝指示詞の指示先の推定
ここで、言語処理の分野で記述される「世界知識を用いた照応解析の研究」という文が入力されたとする。
【００６８】
（場合１）
ユーザが言語処理の研究者とする。
【００６９】
「常識」よりも「世界知識」、「指示詞の指示先の推定」よりも「照応解析」の方がそのユーザの読み書きシステムでの利用頻度（出現頻度）が高いとする。
【００７０】
これは、言語処理の研究者のためにそのような専門用語を使う頻度が多いためである。
【００７１】
この場合、個人環境言語変換装置は、書き換えずにそのまま「世界知識を用いた照応解析の研究」を出力することになる。このユーザーにとってはこの表現の方が自然なのでこのままの出力を見る方が都合が良い。
【００７２】
（場合２）
ユーザーが言語処理の分野をあまり知らない人とする。
【００７３】
この場合、より一般的な用語の「常識」や「指示詞の指示先の推定」の方が利用頻度が高くなると思われる。
【００７４】
この場合、個人環境言語変換装置は、書き換えを行ない「常識を用いた指示詞の指示先の推定の研究」を出力する。
【００７５】
このユーザーは、「照応解析」や「世界知識」という用語を知らないのでだいぶわかりやすい表現を見ることになり、文章の理解が容易になる。
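Cases 1 and 2 above can be reproduced with a short sketch. It assumes the combined personal frequencies are already available in a dictionary `freq`, and it applies rules by plain substring replacement; `convert_sentence` and the sample frequency values are hypothetical. One simplification to note: a replacement inserted by an earlier rule could itself match a later rule, which a production converter would need to guard against.

```python
from typing import Dict, List


def convert_sentence(sentence: str, rules: Dict[str, List[str]],
                     freq: Dict[str, float]) -> str:
    """Rewrite each rule source with its most frequent target when that helps."""
    out = sentence
    for source, targets in rules.items():
        if not targets or source not in out:
            continue
        best = max(targets, key=lambda t: freq.get(t, 0.0))
        # Rewrite only when the target is strictly more familiar to this user.
        if freq.get(best, 0.0) > freq.get(source, 0.0):
            out = out.replace(source, best)
    return out


RULES = {"世界知識": ["常識"], "照応解析": ["指示詞の指示先の推定"]}
```

With a researcher's frequencies the sentence is returned unchanged, while with a non-specialist's frequencies both terms are rewritten.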
【００７６】
ただし、完全に書き換えたのでは、勘違いをしたり、文の意味がかわる可能性がある。
【００７７】
ｂ）頻度記憶部と言語補助変換部を用いる個人環境言語変換の説明
▲１▼：ユーザの日々の読み書きの行動から、頻度記憶部の処理フローチャート（図５、図７参照）の処理で、ユーザの読みデータ、書きデータを頻度記憶部３に記憶する。
【００７８】
▲２▼：言語補助変換部８には言語変換規則が蓄えられているものとする。適用可能な言語変換規則があるとき、その変換をした後の文字列の頻度と、変換前の文字列の頻度を、頻度記憶部から求めて、変換をした後の文字列の頻度の方が大きい場合、補助変換を行なう。
【００７９】
また、変換前の文字列の頻度の方が大きい場合は、補助変換を行なわない。変換を行ないうる言語変換規則が複数ある場合は、その変換をした後の文字列の頻度がもっとも大きい規則を用いて補助変換を行なう。
【００８０】
このときの文字列の頻度は、読みデータと書きデータの両方を組み合わせたもので、概ね以下のような式で求めなおしたものを用いる。
【００８１】
具体的には、読みシステムにおいての各単語t の出現頻度をｆ_r(t) 、書きシステムにおいての各単語t の出現頻度をｆ_w(t) とするとき、その個人の単語出現頻度分布を
α×ｆ_r(t) ＋（１−α）×ｆ_w(t) （ただし、０≦α≦１）
として、この頻度が多くなるように単語を変換する。すなわち、言語変換をする際に用いる尺度として、個人環境の読み書きシステムにおけるその個人の単語出現頻度分布を用いるものである。ここで、αを設けるのは、「読む」ということは「書く」ということより印象が少ないので、「書く」事への重みを高める（重み付ける）ためである。つまり、αは０．５より小さいものとなる。なお、αなどの定数はユーザが設定変更できるようにしておくものである。
【００８２】
ところで、補助変換とは、文字列を変換してしまうのではなく、変換先の文字列を括弧づけで補助表記することを意味する。
【００８３】
（具体的な例による説明）
「世界知識（常識）を用いた照応解析（指示詞の指示先の推定）の研究」
のように、完全に書き換えてしまうのではなく、括弧づけで補足的な表示をする。なお、本文中に括弧が使われている場合は、それと区別するため異なる括弧を使用することもできる。
【００８４】
このときも、専門の研究者など、「世界知識」「照応解析」など、用語をよく知っている人にはこの補助変換（表示）をするとむしろ不便であるので、ユーザの用語の使用頻度によって出すか出さないかなどを判断した方がよい。
【００８５】
この括弧づけで判断する（補助変換）方法は以下で説明する。
【００８６】
・前記のように、その個人の単語出現頻度分布を〔α×ｆ_r(t) ＋（１−α）×ｆ_w(t) （ただし、０≦α≦１）〕として、この頻度が多くなるような書き換え候補の語を括弧付けで付ける。即ち、書き換える語の使用頻度が大きくなる方を括弧付けで付ける。
【００８７】
・前記のように、その個人の単語出現頻度分布〔α×ｆ_r(t) ＋（１−α）×ｆ_w(t) （ただし、０≦α≦１）〕として、この頻度が減らないような書き換え候補の語で、かつ、もとの語の頻度がある閾値よりも小さい語を括弧付けで付ける。即ち、個人の使用頻度が多ければ括弧付けは行わないが、頻度が０とか少ない場合には括弧付けで付ける。
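The two bracketing policies listed above can be sketched as one function. This is an illustration under assumptions: `annotate` and `source_threshold` are hypothetical names, `freq` holds the combined personal frequencies, and matching is by plain substring. With `source_threshold=None` the first policy applies (the candidate must be strictly more frequent); with a number, the second policy applies (the candidate must not be less frequent, and the original term must fall below the threshold).

```python
from typing import Dict, List, Optional


def annotate(sentence: str, rules: Dict[str, List[str]], freq: Dict[str, float],
             source_threshold: Optional[float] = None) -> str:
    """Append the familiar paraphrase in full-width brackets instead of rewriting."""
    out = sentence
    for source, targets in rules.items():
        if not targets or source not in out:
            continue
        f_src = freq.get(source, 0.0)
        best = max(targets, key=lambda t: freq.get(t, 0.0))
        f_best = freq.get(best, 0.0)
        if source_threshold is None:
            wanted = f_best > f_src                                 # policy 1
        else:
            wanted = f_best >= f_src and f_src < source_threshold   # policy 2
        if wanted:
            out = out.replace(source, source + "（" + best + "）")
    return out
```

Because the original term stays in place, a reader who already knows it loses nothing, which is the motivation given for auxiliary conversion.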
【００８８】
なお、読み書きシステムには、読みシステム、書きシステム又は読み書きが一体になったシステムがある。読みシステムは、メーラ、インターネット・エクスプローラ、読むために開いた（表示した）ワード文章（文章作成システムの一種）等の文章を読むためのシステムである。書きシステムは、文字を入力して文章を作成するワード文章等の文章を書くためのシステムである。また、読みシステムにおいては、ディスプレイ等に表示される文章の量が多くなるので、表示時間の短いものは除くようにすることもできる。
【００８９】
さらに、読みシステムにおいて、頻度記憶部に格納する単語の重み付けを変えることもできる。例えば、文章作成システムであるワード文章等を読む場合は丁寧に読むものと考えられるので、インターネット等で画面を見る場合と比べ重みを高くすることができる。
【００９０】
また、頻度記憶部に格納されている単語は、古いものを除くようにすることができる。例えば、個人の趣味が変わるとか、ある分野の専門家になる等で個人環境も変化する場合があるので、古いものは削除するか重み付けを低くするものである。
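The options of deleting old data, lowering its weight, and weighting sources differently can be sketched as follows. The description does not specify how weights decrease with age; the exponential half-life used in `decayed_frequency` is an assumption chosen for illustration, as are the names `Observation`, `age_days`, and `source_weight`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Observation:
    word: str
    age_days: float             # how long ago the word was read or written
    source_weight: float = 1.0  # e.g. higher for carefully read documents


def windowed_frequency(observations: Iterable[Observation], word: str,
                       max_age_days: float) -> float:
    """Deletion variant: observations older than max_age_days are ignored."""
    return sum(o.source_weight for o in observations
               if o.word == word and o.age_days <= max_age_days)


def decayed_frequency(observations: Iterable[Observation], word: str,
                      half_life_days: float) -> float:
    """Down-weighting variant: an observation's weight halves every half_life_days."""
    return sum(o.source_weight * 0.5 ** (o.age_days / half_life_days)
               for o in observations if o.word == word)
```

Either function can replace a raw count when the personal frequency is computed, so that a change of interests or profession is reflected over time.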
【００９１】
§２：個人環境差分強調装置の説明
図９は個人環境差分強調装置の説明図である。図９において、個人環境差分強調装置には、頻度記憶部３、読み書き入力部５、入力部６、出力部７、差分強調装置９が設けてある。
【００９２】
頻度記憶部３は、個人環境での読み書きシステム等から入力された文字列の出現頻度を求めるものである。読み書きシステムが使用されると頻度記憶部３はいつも書き換えられるものである。
【００９３】
読み書き入力部５は、読み書きシステムから読み書きが入力されるものである。入力部６は、変換対象文を入力するものである。出力部７は、言語変換結果を出力するものである。
【００９４】
差分強調装置９は、入力部６から入力された文章の文字列と頻度記憶部３の一定頻度の文字列等との差分から、入力部６から入力された文章の文字列の強調表示を行うものである。
【００９５】
（１）：個人環境差分強調装置の動作説明（１）
▲１▼：ユーザの日々の読み書きの行動から、頻度記憶部３のフローチャートの処理（図５、図７参照）で、ユーザの読みデータ、書きデータを記憶する。
【００９６】
▲２▼：差分強調装置９で、入力された文章のうち、ユーザ個人の単語出現頻度がある閾値以下のものだけを強調表示する。
【００９７】
（例による説明）
例えば、次のようなテキストが入力部６から入力されたとする。
【００９８】
『自然言語では，動詞を省略するということがある。この省略された動詞を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである。そこで本研究では，この省略された動詞を表層の表現（手がかり語）と用例から補完することを行なう。解析のための規則を作成する際，動詞の省略現象を補完する動詞がテキスト内にあるかいなかなどで分類した。小説を対象にして実験を行なったところ，テストサンプルで再現率84% ，適合率82% の精度で解析できた。このことは本手法が有効であることを示している。テキスト内に補完すべき動詞がある場合は非常に精度が良かった。それに比べ，テキスト内に補完すべき動詞がない場合はあまり良くなかった。しかし，テキスト内に補完すべき動詞がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある。また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう。』
ユーザは、言語学の専門の人とする。その場合、工学的な表現「テストサンプル」「再現率」「適合率」の使用頻度が極端に小さいとする。
【００９９】
その場合、次のように強調表示される。即ち、強調表示箇所は《, 》（二重山括弧）で囲って表示している。
【０１００】
『自然言語では，動詞を省略するということがある。この省略された動詞を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである。そこで本研究では，この省略された動詞を表層の表現( 手がかり語) と用例から補完することを行なう。解析のための規則を作成する際，動詞の省略現象を補完する動詞がテキスト内にあるかいなかなどで分類した。小説を対象にして実験を行なったところ，《テストサンプル》で《再現率》84% ，《適合率》82% の精度で解析できた。このことは本手法が有効であることを示している。テキスト内に補完すべき動詞がある場合は非常に精度が良かった。それに比べ，テキスト内に補完すべき動詞がない場合はあまり良くなかった。しかし，テキスト内に補完すべき動詞がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある。また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう。』
このようにして、自分の苦手な語を手際よく探すことができ、それをなんらかの辞書などで調べるとよいとすぐわかり便利である。
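The threshold rule of operation (1) can be sketched in Python. The sketch assumes the words of the input text are already available (for instance from a morphological analyzer) and that `freq` maps each word to the user's personal frequency; `highlight_rare` is a hypothetical name. Highlighting uses the same double angle brackets as the example above.

```python
import re
from typing import Dict, Iterable


def highlight_rare(text: str, words: Iterable[str],
                   freq: Dict[str, float], threshold: float) -> str:
    """Wrap every word whose personal frequency is at or below threshold in 《》."""
    rare = {w for w in words if w and freq.get(w, 0.0) <= threshold}
    if not rare:
        return text
    # Longest alternatives first, so a longer word wins over a word it contains.
    pattern = re.compile(
        "|".join(re.escape(w) for w in sorted(rare, key=len, reverse=True)))
    return pattern.sub(lambda m: "《" + m.group(0) + "》", text)
```

A single regular-expression pass is used so that a word already wrapped is not matched a second time.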
【０１０１】
（２）：個人環境差分強調装置の動作説明（２）
▲１▼：ユーザの日々の読み書きの行動から、頻度記憶部３のフローチャートの処理（図５、図７参照）で、ユーザの読みデータ、書きデータを記憶する。
【０１０２】
▲２▼：差分強調装置９で、入力部６から入力された文章において、その文章のうちで初めての文字列を強調表示すべきと判断され、かつ、ユーザ個人の単語出現頻度がある閾値以下のものだけを強調表示する。
【０１０３】
なお、差分強調装置で、入力された文章のうちで初めての文字列を強調表示すべきと判断する手法は、次の手法１、２がある（特願２００２−２９０９４６参照）。
【０１０４】
手法１
▲１▼入力部等により、予め抽出の単位（抽出単位）、検出領域の単位を定める。抽出単位とは、差分として出力する対象の単位である。抽出単位には、「単語」「漢字」「名詞句」などが考えられる。検出領域の単位とは、差分を検出するために比較する領域の単位のことである。検出領域の単位には、「文字」「単語」「文」「箇条書の項目」「段落」「特許の請求項」などが考えられる。
【０１０５】
▲２▼差分強調装置は、すべての入力データを記憶手段（差分強調装置内の）に記憶させる。
【０１０６】
▲３▼差分強調装置は、入力されたデータを左から調べて左の検出領域から▲１▼で定めた検出領域ごとに以下の処理▲４▼と処理▲５▼を繰り返す。
【０１０７】
▲４▼差分強調装置は、現在の検出領域以外の領域すべてから、すべての抽出単位に相当するもの（例えば単語）を抽出し、それを抽出物記憶手段（差分強調装置内の）に格納する。
【０１０８】
▲５▼差分強調装置は、現在の検出領域において、抽出物記憶手段（差分強調装置内の）に格納されていない抽出単位に相当するもの（例えば単語）を強調表示して現在の検出領域の文章を出力する。
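Method 1 can be sketched as follows, assuming the input has already been split into detection regions and each region into extraction units (for example, sentences split into words). The function name `method1` and the list-of-lists representation are assumptions made for illustration.

```python
from typing import List, Tuple


def method1(regions: List[List[str]]) -> List[List[Tuple[str, bool]]]:
    """For each region, flag the units that occur in no other region."""
    results = []
    for i, region in enumerate(regions):
        # Step 4: collect every unit from all regions except the current one.
        others = {unit for j, other in enumerate(regions) if j != i
                  for unit in other}
        # Step 5: a unit is highlighted when it is absent from that collection.
        results.append([(unit, unit not in others) for unit in region])
    return results
```

Note that a unit repeated only within one region is flagged at every occurrence there, since the comparison is made against the other regions only.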
【０１０９】
手法２
▲１▼入力部等により、予め抽出の単位（抽出単位）、検出領域の単位を定める。抽出単位とは、差分として出力する対象の単位である。抽出単位には、「単語」「漢字」「名詞句」などが考えられる。検出領域の単位とは、差分を検出するために比較する領域の単位のことである。検出領域の単位には、「文字」「単語」「文」「箇条書の項目」「段落」「特許の請求項」などが考えられる。
【０１１０】
▲２▼入力部から前記▲１▼で定めた検出領域ごとに入力データが入力され、差分強調装置は、以下の処理▲３▼と処理▲４▼を繰り返す。
【０１１１】
▲３▼差分強調装置は、現在の検出領域において、抽出物記憶装置（差分強調装置内の）に格納されていない抽出単位に相当するもの（例えば単語）を強調表示して現在の検出領域の文章を出力する。ただし、抽出物記憶装置（差分強調装置内の）は最初は空である。
【０１１２】
▲４▼前記処理▲３▼で強調表示した表現を抽出物記憶装置に格納する。
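Method 2 processes the regions in order and remembers what it has already highlighted. A minimal sketch, with the same assumed list-of-lists input as before and the hypothetical name `method2`:

```python
from typing import List, Set, Tuple


def method2(regions: List[List[str]]) -> List[List[Tuple[str, bool]]]:
    """Flag units not yet in the store, then add the flagged units to the store."""
    seen: Set[str] = set()  # the extracted-unit store; empty at the start
    results = []
    for region in regions:
        # Step 3: highlight units of the current region that are not stored yet.
        flagged = [(unit, unit not in seen) for unit in region]
        # Step 4: store what was highlighted in this region.
        seen.update(unit for unit, is_new in flagged if is_new)
        results.append(flagged)
    return results
```

Unlike Method 1, this needs only the text seen so far, so it can run while the input is still arriving.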
【０１１３】
（例による説明）
ここでは手法２に適用した例を説明する。差分強調装置は、上記手法２により、入力された文章のうち、以下の二重山括弧で囲った部分を強調表示すべきと判断する。なお、抽出の単位、検出領域の単位ともに単語である。
【０１１４】
『《本研究の目的は，日本語》の《受け身文》，《使役》文《を能動》文《に変換する際》に《変更され》《るべき格助詞》を《機械学習》を《用いて自動》変換する《ことである．》日本語の受け身文，使役文の《例》を《図1 と》図《2 》に《あげる》．図1 の文の日本語の《接尾辞「》れ《た」》は《受動態》を《示す助動詞》で《あり》，《この》文は受け身文である．図2 の文の日本語の接尾辞「《せ》た」は使役を示す助動詞であり，この文は使役文である．《これら》の文に《対》《応》する能動文を図《3 》に示す．図1 の文《が》能動文に変換さ《れるとき》は，《(i) 》格助詞「に」は格助詞「が」に《(ii)》格助詞「が」は格助詞「を」に変換される．図2 の文が能動文に変換されるときは，(i) 格助詞「が」の《部分》「《彼》が」の《文節》が《消去》され，(ii)格助詞「に」が格助詞「が」に変換され，《(iii) 》格助詞「を」は変換され《ず》に《そのまま残る》．本研究では，これらの格助詞の変換《( 》例《：》格《助》《詞》「に」の格助詞「が」《へ》の変換《) 》と，《不要》部分の消去( 例：「彼が」の消去) を，研究の《対象》とする．( 《以降》，《本稿》では《便宜上》「彼が」《など》の消去の部分《も》格助詞の変換と《呼ぶ》．)
受け身文，使役文の能動文への変換は，文《生成》，《言い換え》，文の《平易化／言語》《運用支援》，《自然》言語文《から》の《知識獲得や情報抽出》，《質問応答システム》と《多く》の研究《分野》で《役に立つもの》である．《例えば》，質問応答システムでは，質問文が《能》《動》文で《答え》が《受動》文で《書か》れて《いる場合》，質問文と答えを《含む》文で，文の《構造》が《異なるため》に，質問の答えを《取り出す》のが《困難な》場合がある．この《よう》な《問》《題》も受け身文，使役文の能動文への変換が《できる》ように《なる》と《解決》する《のであ》る．このように受け身文，使役文の能動文への変換は，自然言語《処理》で《重要》なものである．』
ここでユーザは言語学にも言語処理も詳しくない人とする。そうすると、言語学、言語処理の専門用語は出現確率が低く、「受け身」「使役」「能動」「態」「言い換え」が頻度が０であったとする。また、上述の閾値も０であったとする。そうすると、個人環境差分強調装置の出力として、これらの語と上記で強調されている部分の重なったところのみが次のように強調表示される。
【０１１５】
『本研究の目的は，日本語の《受け身》文，《使役》文を《能動》文に変換する際に変更されるべき格助詞を機械学習を用いて自動変換することである．日本語の受け身文，使役文の例を図1 と図2 にあげる．図1 の文の日本語の接尾辞「れた」は《受動態》を示す助動詞であり，この文は受け身文である．図2 の文の日本語の接尾辞「せた」は使役を示す助動詞であり，この文は使役文である．これらの文に対応する能動文を図3 に示す．図1 の文が能動文に変換されるときは，(i) 格助詞「に」は格助詞「が」に(ii)格助詞「が」は格助詞「を」に変換される．図2 の文が能動文に変換されるときは，(i) 格助詞「が」の部分「彼が」の文節が消去され，(ii)格助詞「に」が格助詞「が」に変換され，(iii) 格助詞「を」は変換されずにそのまま残る．本研究では，これらの格助詞の変換( 例：格助詞「に」の格助詞「が」への変換) と，不要部分の消去( 例：「彼が」の消去) を，研究の対象とする．( 以降，本稿では便宜上「彼が」などの消去の部分も格助詞の変換と呼ぶ．)
受け身文，使役文の能動文への変換は，文生成，《言い換え》，文の平易化／言語運用支援，自然言語文からの知識獲得や情報抽出，質問応答システムと多くの研究分野で役に立つものである．例えば，質問応答システムでは，質問文が能動文で答えが受動文で書かれている場合，質問文と答えを含む文で，文の構造が異なるために，質問の答えを取り出すのが困難な場合がある．このような問題も受け身文，使役文の能動文への変換ができるようになると解決するのである．このように受け身文，使役文の能動文への変換は，自然言語処理で重要なものである．』
上記のようになると、これは先にあげた単純に前回出願（手法２だけ）を使ったときに比べて強調表示される箇所が少なく、見やすい。また、単純に今回（手法２を用いない）だけだと、すべての「受け身」「使役」「能動」「態」「言い換え」が強調表示されるが、前回出願（手法１又は２）と併用することで、初出の「受け身」「使役」「能動」「態」「言い換え」だけが強調表示されることになる。
【０１１６】
これにより、自分の苦手な表現が初出にあらわれた箇所を容易に特定でき便利である。
【０１１７】
さらに、原理的に、この個人環境差分強調装置が行なっていることを考えてみると、閾値０の場合は、このテキストの最初に、その個人が読み書きしてきた全テキストをくっつけて前回出願の手法２を行なったことを意味する。即ち、もし、その人が、読み書きシステムでしか文字を見たり書いたりすることがないとすると、ここで強調表示されるものは、その人が全生涯通じて初めて見た単語を意味する。手法２を生涯にまで拡張したものと見ることができる。
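Operation (2), which combines the first-occurrence test of Method 2 with the personal frequency threshold, can be sketched as follows. The sketch assumes that both the extraction unit and the detection region are single words, as in the example above, and that the text is given as a word list; `first_and_rare` is a hypothetical name.

```python
from typing import Dict, List, Set


def first_and_rare(units: List[str], freq: Dict[str, float],
                   threshold: float = 0.0) -> str:
    """Highlight a word only at its first occurrence and only if it is rare."""
    seen: Set[str] = set()
    out = []
    for unit in units:
        is_first = unit not in seen
        seen.add(unit)
        if is_first and freq.get(unit, 0.0) <= threshold:
            out.append("《" + unit + "》")
        else:
            out.append(unit)
    return "".join(out)
```

With `threshold=0.0`, a word is highlighted exactly when it has never appeared in the user's stored data and has not appeared earlier in the text, which matches the interpretation of prepending the user's entire reading and writing history.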
【０１１８】
（３）：削除用単語記憶部を用いる個人環境差分強調装置の説明（個人環境差分強調装置の動作説明（３））
図１０は削除用単語記憶部を用いる個人環境差分強調装置の説明図である。図１０において、個人環境差分強調装置には、頻度記憶部３、読み書き入力部５、入力部６、出力部７、差分強調装置９、削除用単語記憶部１０が設けてある。
【０１１９】
図１０の個人環境差分強調装置は、図９の個人環境差分強調装置に予め指定している単語（名詞以外の単語や一般的な名詞）は強調表示しないようにする削除用単語記憶部１０を追加したものである。
【０１２０】
（動作説明）
▲１▼：ユーザの日々の読み書きの行動から、頻度記憶部３のフローチャートの処理（図５、図７参照）で、ユーザの読みデータ、書きデータを記憶する。
【０１２１】
▲２▼：差分強調装置９で、入力された文章のうち、ユーザ個人の単語出現頻度がある閾値以上のものだけを強調表示する。ただし、あらかじめ指定している単語（名詞以外の単語や一般的な名詞）は強調表示しないようにする。
【０１２２】
（例による説明）
例えば、次のようなテキストが入力部６から入力されたとする。
【０１２３】
『自然言語では，動詞を省略するということがある．この省略された動詞を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである．そこで本研究では，この省略された動詞を表層の表現（手がかり語）と用例から補完することを行なう．解析のための規則を作成する際，動詞の省略現象を補完する動詞がテキスト内にあるかいなかなどで分類した．小説を対象にして実験を行なったところ，テストサンプルで再現率84% ，適合率82% の精度で解析できた．このことは本手法が有効であることを示している．テキスト内に補完すべき動詞がある場合は非常に精度が良かった．それに比べ，テキスト内に補完すべき動詞がない場合はあまり良くなかった．しかし，テキスト内に補完すべき動詞がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある．また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう．』
ここで、ユーザは言語学の専門の人とする。その場合、言語学的な表現、「言語」「動詞」「省略」の使用頻度が極端に高いとする。ただし、名詞以外の語や「こと」「内」などの一般的な名詞はあらかじめ強調表示しない単語として削除用単語記憶部１０に登録しておく。その場合、次のように強調表示される。
【０１２４】
『自然《言語》では，《動詞》を《省略》するということがある．この《省略》された《動詞》を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである．そこで本研究では，この《省略》された《動詞》を表層の表現( 手がかり語) と用例から補完することを行なう．解析のための規則を作成する際，《動詞》の省略現象を補完する《動詞》がテキスト内にあるかいなかなどで分類した．小説を対象にして実験を行なったところ，テストサンプルで再現率84% ，適合率82% の精度で解析できた．このことは本手法が有効であることを示している．テキスト内に補完すべき《動詞》がある場合は非常に精度が良かった．それに比べ，テキスト内に補完すべき《動詞》がない場合はあまり良くなかった．しかし，テキスト内に補完すべき《動詞》がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある．また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう．』
強調表示箇所は《, 》で囲っている。このように、自分のよく使う単語、つまり、その人の興味の大きい単語が強調表示されることになる。自分の興味のある単語を含む段落や文を手際よく探すことができ、それを中心に読むなどのことができるので便利である。
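Operation (3) can be sketched in the same style: words at or above a frequency threshold are highlighted unless they appear in the exclusion list held by the deletion word storage unit 10. The function name, the sample frequencies, and the threshold value are assumptions; matching is done on substrings of the raw text rather than on analyzer output.

```python
import re
from typing import Dict, Set


def highlight_frequent(text: str, freq: Dict[str, float], threshold: float,
                       excluded: Set[str]) -> str:
    """Wrap frequent words in 《》, skipping words registered for exclusion."""
    terms = [w for w, f in freq.items()
             if w and f >= threshold and w not in excluded]
    if not terms:
        return text
    # Longest alternatives first, so a longer word wins over a word it contains.
    pattern = re.compile(
        "|".join(re.escape(w) for w in sorted(terms, key=len, reverse=True)))
    return pattern.sub(lambda m: "《" + m.group(0) + "》", text)
```

Keeping the exclusion list separate from the frequency data lets the user adjust which common nouns are suppressed without touching the stored counts.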
【０１２５】
（４）：個人環境差分強調装置の動作説明（４）
▲１▼：ユーザの日々の読み書きの行動から、頻度記憶部３のフローチャートの処理（図５、図７参照）で、ユーザの読みデータ、書きデータを記憶する。
【０１２６】
▲２▼：差分強調装置９で、入力された文章のうち、ユーザ個人での単語出現確率が、一般的なテキスト集合での出現確率よりも有意に大きいものだけを強調表示する。
【０１２７】
単語Ａの出現確率は、単語Ａの出現回数を全単語の総出現回数で割ったものである。一般的なテキスト集合をもっていればこれは実現できる。有意に大きいかどうかの判定には一般に統計的検定が用いられる。
【０１２８】
統計的検定とは、検定すべき仮説と客観的証拠としての標本データとを比較して、その間に矛盾がなければ仮説を受け入れ、矛盾が生じた場合には仮説を棄却するものである（参考文献の一例として、心理教育統計学培風館肥田他、参照）。
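The description says only that a statistical test is generally used and does not name one. As one possible choice, the sketch below applies a one-sided two-proportion z-test with a pooled standard error; the function name, the 0.05 significance level, and the choice of test are all assumptions.

```python
import math


def is_significantly_higher(user_count: int, user_total: int,
                            general_count: int, general_total: int,
                            significance: float = 0.05) -> bool:
    """One-sided z-test: is the user's word probability above the general one?"""
    if user_total <= 0 or general_total <= 0:
        raise ValueError("totals must be positive")
    p_user = user_count / user_total          # occurrences / all word occurrences
    p_general = general_count / general_total
    pooled = (user_count + general_count) / (user_total + general_total)
    se = math.sqrt(pooled * (1.0 - pooled)
                   * (1.0 / user_total + 1.0 / general_total))
    if se == 0.0:
        return False  # the word never occurs, or every token is this word
    z = (p_user - p_general) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return p_value < significance
```

The null hypothesis is that both corpora share the same probability for the word; it is rejected, and the word highlighted, only when the user's proportion is high enough that the upper-tail probability falls below the chosen level.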
【０１２９】
（例による説明）
例えば、次のようなテキストが入力部６から入力されたとする。
【０１３０】
『自然言語では，動詞を省略するということがある．この省略された動詞を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである．そこで本研究では，この省略された動詞を表層の表現( 手がかり語) と用例から補完することを行なう．解析のための規則を作成する際，動詞の省略現象を補完する動詞がテキスト内にあるかいなかなどで分類した．小説を対象にして実験を行なったところ，テストサンプルで再現率84% ，適合率82% の精度で解析できた．このことは本手法が有効であることを示している．テキスト内に補完すべき動詞がある場合は非常に精度が良かった．それに比べ，テキスト内に補完すべき動詞がない場合はあまり良くなかった．しかし，テキスト内に補完すべき動詞がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある．また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう．』
ユーザは、言語学の専門の人とする。その場合、言語学的な表現、その個人環境での「言語」「動詞」「省略」の使用頻度が、一般的テキストでの出現頻度に比べて有意に高いとする。
【０１３１】
この方法の場合、有意差検定の利用（統計的検定）により、自動的に「こと」「内」などの一般的な名詞は有意に高いとは判定されないので、強調表示されないことになる。その場合、次のような強調表示となる。
【０１３２】
『自然《言語》では，《動詞》を《省略》するということがある．この《省略》された《動詞》を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである．そこで本研究では，この《省略》された《動詞》を表層の表現( 手がかり語) と用例から補完することを行なう．解析のための規則を作成する際，《動詞》の省略現象を補完する《動詞》がテキスト内にあるかいなかなどで分類した．小説を対象にして実験を行なったところ，テストサンプルで再現率84% ，適合率82% の精度で解析できた．このことは本手法が有効であることを示している．テキスト内に補完すべき《動詞》がある場合は非常に精度が良かった．それに比べ，テキスト内に補完すべき《動詞》がない場合はあまり良くなかった．しかし，テキスト内に補完すべき《動詞》がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある．また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう．』
強調表示箇所は《, 》で囲っている。このように、自分のよく使う単語、つまり、その人の興味の大きい単語が強調表示されることになる。自分の興味のある単語を含む段落や文を手際よく探すことができ、それを中心に読むなどのことができ、便利である。
【０１３３】
（５）：個人環境差分強調装置の動作説明（５）
▲１▼：ユーザの日々の読み書きの行動から、頻度記憶部３のフローチャートの処理（図５、図７参照）で、ユーザの読みデータ、書きデータを記憶する。
【０１３４】
▲２▼：差分強調装置９で、入力された文章のうち、ユーザ個人での単語出現確率が、一般的なテキスト集合での出現確率のある値倍よりも大きいものだけを強調表示する。
【０１３５】
単語Ａの出現確率は、単語Ａの出現回数を全単語の総出現回数で割ったものである。一般的なテキスト集合をもっていればこれは実現できる。この事例では有意かどうかの判定をしないので統計的検定などの難しい方法を使わなくて済む。
【０１３６】
単に、ユーザ個人での単語出現確率と一般的なテキスト集合での出現確率を計算し、その割り算が、つまり、ユーザ個人での単語出現確率を一般的なテキスト集合での出現確率で割った値があらかじめ定めたある値よりも大きいものだけを強調表示する。
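The ratio rule reduces to a single comparison. In the sketch below the inequality is written in multiplied form, p_user > factor × p_general, which is equivalent to the division described above whenever the general probability is positive and avoids dividing by zero for words absent from the general corpus. The function name and the sample counts are assumptions.

```python
def exceeds_ratio(user_count: int, user_total: int,
                  general_count: int, general_total: int,
                  factor: float) -> bool:
    """True when the user's word probability exceeds factor times the general one."""
    if user_total <= 0 or general_total <= 0:
        raise ValueError("totals must be positive")
    p_user = user_count / user_total          # occurrences / all word occurrences
    p_general = general_count / general_total
    return p_user > factor * p_general
```

A common noun that is frequent for everyone has a ratio close to 1 and so fails the comparison for any factor noticeably larger than 1, which is why no separate exclusion list is needed.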
【０１３７】
（例による説明）
例えば、次のようなテキストが入力部６から入力されたとする。
【０１３８】
『自然言語では，動詞を省略するということがある．この省略された動詞を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである．そこで本研究では，この省略された動詞を表層の表現( 手がかり語) と用例から補完することを行なう．解析のための規則を作成する際，動詞の省略現象を補完する動詞がテキスト内にあるかいなかなどで分類した．小説を対象にして実験を行なったところ，テストサンプルで再現率84% ，適合率82% の精度で解析できた．このことは本手法が有効であることを示している．テキスト内に補完すべき動詞がある場合は非常に精度が良かった．それに比べ，テキスト内に補完すべき動詞がない場合はあまり良くなかった．しかし，テキスト内に補完すべき動詞がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある．また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう．』
ここでユーザは言語学の専門の人とする。その場合、言語学的な表現、その個人環境での「言語」「動詞」「省略」の使用頻度が、一般的テキストでの出現頻度のあらかじめ定めた値倍したものよりも高いとする。この方法の場合も、自動的に「こと」「内」などの一般的な名詞は比率的にはそれほど出現しないので強調表示されないことになる。その場合、次のような強調表示となる。
【０１３９】
『自然《言語》では，《動詞》を《省略》するということがある．この《省略》された《動詞》を復元することは，対話システムや高品質の機械翻訳システムの実現には不可欠なことである．そこで本研究では，この《省略》された《動詞》を表層の表現( 手がかり語) と用例から補完することを行なう．解析のための規則を作成する際，《動詞》の省略現象を補完する《動詞》がテキスト内にあるかいなかなどで分類した．小説を対象にして実験を行なったところ，テストサンプルで再現率84% ，適合率82% の精度で解析できた．このことは本手法が有効であることを示している．テキスト内に補完すべき《動詞》がある場合は非常に精度が良かった．それに比べ，テキスト内に補完すべき《動詞》がない場合はあまり良くなかった．しかし，テキスト内に補完すべき《動詞》がない場合の問題の難しさから考えると，少しでも解析できるだけでも価値がある．また，コーパスが多くなり，計算機の性能もあがり大規模なコーパスが利用できるようになった際には，本稿で提案した用例を利用する手法は重要になるだろう．』
強調表示箇所は《, 》で囲っている。このように、自分のよく使う単語、つまり、その人の興味の大きい単語が強調表示されることになる。自分の興味のある単語を含む段落や文を手際よく探すことができ、それを中心に読むなどのことができ、便利である。
【０１４０】
なお、全文検索技術の参考資料の一例として、日本語全文検索システムの構築と活用，馬場肇 SOFT BANK がある。
【０１４１】
また、前記実施の形態では、強調表示として、２重山括弧で囲む説明をしたが、下線、色分け、背景の変更、字体の変更、点滅等他の強調表示を行うこともできる。
【０１４２】
さらに、前記実施の形態では、頻度記憶部３で、個人環境での読み書きシステムから入力された文字列の出現頻度を求める説明をしたが、読みシステムのみ、又は、書きシステムのみから入力された文字列の出現頻度を求めるようにしてもよい。
【０１４３】
§３：プログラムインストールの説明
個人環境頻度記憶装置２、頻度記憶手段３ａ、頻度記憶部３、言語変換手段４ａ、言語変換部４、読み書き入力部５、入力部６、出力部７、言語補助変換部８、差分強調装置９、削除用単語記憶部１０、格納手段１３ａ等は、プログラムで構成でき、主制御部（ＣＰＵ）が実行するものであり、主記憶に格納されているものである。このプログラムは、一般的な、コンピュータで処理されるものである。このコンピュータは、主制御部、主記憶、ファイル装置、表示装置、キーボード等の入力手段である入力装置などのハードウェアで構成されている。このコンピュータに、本発明のプログラムをインストールする。このインストールは、フロッピィ、光磁気ディスク等の可搬型の記録（記憶）媒体に、これらのプログラムを記憶させておき、コンピュータが備えている記録媒体に対して、アクセスするためのドライブ装置を介して、或いは、ＬＡＮ等のネットワークを介して、コンピュータに設けられたファイル装置にインストールされる。そして、このファイル装置から処理に必要なプログラムステップを主記憶に読み出し、主制御部が実行するものである。
【０１４４】
【発明の効果】
以上説明したように、本発明によれば、次のような効果がある。
【０１４５】
（１）：頻度記憶手段で、読み書き入力部から入力された個人環境での読み書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントするため、個人環境での文字列の頻度情報により、入力文の個人環境への言語変換や差分強調を容易に行うことができる。
【０１４６】
（２）：頻度記憶手段で、書き入力検出部で検出した任意の文字列の出現頻度の重みを読み入力検出部で検出した前記任意の文字列の出現頻度の重みより重くして前記任意の文字列の出現頻度を求めるため、印象のより多い「書く」ということを重要視することができる。
【０１４７】
（３）：読み入力検出部で、表示時間の短いものは任意の文字列の出現頻度として検出しないようにして前記任意の文字列の出現頻度を求めるため、個人が読まないで単に表示するものを除くことができる。
【０１４８】
（４）：頻度記憶手段で、読み書き入力部から入力された読み書きデータのうち古いものを削除して前記任意の文字列の出現頻度を求めるため、最近の個人環境頻度を記憶でき、個人環境の変化に対応することができる。
【０１４９】
（５）：頻度記憶手段で、読み書き入力部から入力された読み書きデータから古いものの重みを軽くして前記任意の文字列の出現頻度を求めるため、最近の個人環境頻度を重要視でき、個人環境の変化に対応することができる。
【０１５０】
（６）：言語変換手段で、入力された文字列を個人環境頻度記憶装置に格納されている出現頻度の高い文字列に変換して出力するため、各個人にとって分かりやすい表現にすることができる。
【０１５１】
（７）：言語変換手段で、入力された文字列のうち個人環境頻度記憶装置に格納されている出現頻度の高い文字列を括弧づけで補助表記して出力するため、完全に書き換えて、勘違いや文の意味が変わるのを防止することができる。
【０１５２】
（８）：差分強調装置で、入力された文字列のうち個人環境頻度記憶装置に格納されている文字列の出現頻度がある閾値以下のものを強調表示するため、自分の苦手な文字列や単語を手際よく探すことができる。
【０１５３】
（９）：差分強調装置で、入力された文字列のうち個人環境頻度記憶装置に格納されている文字列の出現頻度がある閾値以上のものを強調表示するため、自分の興味の大きい文字列や単語を強調表示して、自分の興味のある段落や文を手際よく探すことができる。
【０１５４】
（１０）：個人環境頻度記憶装置で、個人環境での書きの入力を行う書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントし、言語変換手段で、入力された文字列のうち前記個人環境頻度記憶装置に格納されている出現頻度の高い文字列を括弧づけで補助表記して出力するため、完全に書き換えて、勘違いや文の意味が変わるのを防止することができる。
【０１５５】
（１１）：個人環境頻度記憶装置で、個人環境での書きの入力を行う書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントし、差分強調装置で、入力された文字列のうち前記個人環境頻度記憶装置に格納されている文字列の出現頻度がある閾値以下のものを強調表示するため、自分の苦手な文字列や単語を手際よく探すことができる。
【０１５６】
（１２）：入力された文字列の差分として出力する対象の単位である抽出単位と入力された文字列の差分を検出するために比較する領域の単位である検出領域を設定し、抽出手段で入力された文字列の現在の前記検出領域以外の領域から全ての前記抽出単位に相当するものを抽出して格納手段に格納し、現在の前記検出領域において、前記格納手段に格納されていない前記抽出単位に相当するものを強調表示して現在の検出領域の文書を出力することを、前記検出領域ごとに繰り返す差分強調装置で、前記抽出手段で強調表示すべきと判断された前記入力された文字列のうち個人環境頻度記憶装置に格納されている文字列の出現頻度がある閾値以下のものを強調表示するため、強調表示される箇所が少なく、自分の苦手な表現が初出に現れた箇所を容易に特定できる。
【０１５７】
（１３）：入力された文字列の差分として出力する対象の単位である抽出単位と入力された文字列の差分を検出するために比較する領域の単位である検出領域を設定し、抽出手段で、入力された文書データの現在の前記検出領域において、格納手段に格納されていない前記抽出単位に相当するものを強調表示して現在の検出領域の文書を出力し、前記強調表示したものを前記格納手段に格納することを、前記検出領域ごとに繰り返す差分強調装置で、前記抽出手段で強調表示すべきと判断された前記入力された文字列のうち、個人環境頻度記憶装置に格納されている文字列の出現頻度がある閾値以下のものを強調表示するため、強調表示される箇所が少なく、自分の苦手な表現が初出に現れた箇所を容易に特定できる。
【０１５８】
（１４）：個人環境頻度記憶装置で、個人環境での書きの入力を行う書き入力部から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントし、差分強調装置で、入力された文字列のうち個人環境頻度記憶装置に格納されている文字列の出現頻度がある閾値以上のものを強調表示するため、自分の興味の大きい文字列や単語を強調表示して、自分の興味のある段落や文を手際よく探すことができる。
【０１５９】
（１５）：差分強調装置で、入力された文字列のうちユーザ個人での文字列出現確率が一般的なテキスト集合での出現確率より有意に大きいものだけを強調表示するため、出現確率が有意に大きくない一般的な名詞等は強調表示されなくなり、その人の興味の大きい文字列や単語が強調表示され、自分の興味のある文字列や単語を含む段落や文を手際よく探すことができる。
【０１６０】
（１６）：差分強調装置で、入力された文字列のうちユーザ個人での文字列出現確率が一般的なテキスト集合での出現確率のある値倍より大きいものだけを強調表示するため、出現確率がある値倍より大きくない一般的な名詞等は比率的に強調表示されなくなり、その人の興味のより大きい文字列や単語が強調表示され、自分の興味のある文字列や単語を含む段落や文を手際よく探すことができる。
【０１６１】
（１７）：個人環境での読み書きの入力を行う読み書き入力手段と、該読み書き入力手段から入力された個人環境での読み書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする頻度記憶手段として、コンピュータを機能させるためのプログラム又はプログラムを記録した記録媒体とするため、このプログラムをコンピュータにインストールすることで、個人環境での文字列の頻度情報を得ることができる個人環境頻度記憶装置を容易に提供することができる。
【０１６２】
（１８）：個人環境での読み書きの入力を行う読み書き入力手段と、該読み書き入力手段から入力された個人環境での読み書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントして出現頻度を求める頻度記憶手段と、入力された文字列を前記頻度記憶手段に格納されている出現頻度の高い文字列に変換して出力する言語変換手段として、コンピュータを機能させるためのプログラム又はプログラムを記録した記録媒体とするため、このプログラムをコンピュータにインストールすることで、各個人にとって分かりやすい表現にすることができる個人環境言語変換装置を容易に提供することができる。
【０１６３】
（１９）：個人環境での読み書きの入力を行う読み書き入力手段と、該読み書き入力手段から入力された個人環境での読み書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントして出現頻度を求める頻度記憶手段と、入力された文字列のうち前記頻度記憶手段に格納されている出現頻度の高い文字列を括弧づけで補助表記して出力する言語変換手段として、コンピュータを機能させるためのプログラム又はプログラムを記録した記録媒体とするため、このプログラムをコンピュータにインストールすることで、完全に書き換えて、勘違いや文の意味が変わるのを防止することができる個人環境言語変換装置を容易に提供することができる。
【０１６４】
（２０）：個人環境での読み書きの入力を行う読み書き入力手段と、該読み書き入力手段から入力された個人環境での読み書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントして出現頻度を求める頻度記憶手段と、入力された文字列のうち前記頻度記憶手段に格納されている文字列の出現頻度がある閾値以下のものを強調表示する差分強調装置として、コンピュータを機能させるためのプログラム又はプログラムを記録した記録媒体とするため、このプログラムをコンピュータにインストールすることで、自分の苦手な文字列や単語を手際よく探すことができる個人環境差分強調装置を容易に提供することができる。
【０１６５】
（２１）：個人環境での読み書きの入力を行う読み書き入力手段と、該読み書き入力手段から入力された個人環境での読み書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントして出現頻度を求める頻度記憶手段と、入力された文字列のうち前記頻度記憶手段に格納されている文字列の出現頻度がある閾値以上のものを強調表示する差分強調装置として、コンピュータを機能させるためのプログラム又はプログラムを記録した記録媒体とするため、このプログラムをコンピュータにインストールすることで、自分の興味のある段落や文を手際よく探すことができる個人環境差分強調装置を容易に提供することができる。
【０１６６】
（２２）：個人環境での書きの入力を行う書き入力手段と、前記書き入力手段から入力された個人環境での書きデータから任意の文字列を抽出し、該抽出した文字列毎の個数をカウントする頻度記憶手段と、入力された文字列の差分として出力する対象の単位である抽出単位と入力された文字列の差分を検出するために比較する領域の単位である検出領域を設定し、抽出手段で、入力された文書データの現在の前記検出領域において、格納手段に格納されていない前記抽出単位に相当するものを強調表示して現在の検出領域の文書を出力し、前記強調表示したものを前記格納手段に格納することを、前記検出領域ごとに繰り返す差分強調装置と、前記抽出手段で強調表示すべきと判断された前記入力された文字列のうち、前記頻度記憶手段に格納されている文字列の出現頻度がある閾値以下のものを強調表示する前記差分強調装置として、コンピュータを機能させるためのプログラム又はプログラムを記録した記録媒体とするため、このプログラムをコンピュータにインストールすることで、強調表示される箇所が少なく、自分の苦手な表現が初出に現れた箇所を容易に特定できる個人環境差分強調装置を容易に提供することができる。
【図面の簡単な説明】
【図１】本発明の原理説明図である。
【図２】実施の形態における個人環境言語変換装置の説明図である。
【図３】実施の形態における言語補助変換部を用いる場合の説明図である。
【図４】実施の形態における頻度記憶部の説明図（１）である。
【図５】実施の形態における頻度記憶部の処理フローチャート（１）である。
【図６】実施の形態における頻度記憶部の説明図（２）である。
【図７】実施の形態における頻度記憶部の処理フローチャート（２）である。
【図８】実施の形態における言語変換部の説明図である。
【図９】実施の形態における個人環境差分強調装置の説明図である。
【図１０】実施の形態における削除用単語記憶部を用いる個人環境差分強調装置の説明図である。
【符号の説明】
２　個人環境頻度記憶装置
３ａ　頻度記憶手段
４ａ　言語変換手段
５　読み書き入力部
６　入力部
７　出力部
１３ａ　格納手段
[0001]
[Technical field to which the invention belongs]
The present invention relates to a personal environment frequency storage device that obtains character-string frequency information from the reading and writing input data of a personal environment, and to a personal environment language conversion device, a personal environment difference enhancement device, and a program that use the personal environment frequency storage device.
[0002]
In particular, it relates to a personal environment frequency storage device and program that identify the character strings (words) an individual knows well in a personal reading and writing system; to a personal environment language conversion device and program that display text using, as far as possible, the character strings (words) the individual knows well; and to a personal environment difference enhancement device and program that highlight character strings according to their frequency of appearance in the personal environment.
[0003]
[Prior art]
Conventionally, the typical conversion process applied to sentences or texts written in a natural language has been machine translation. Machine translation converts a sentence or text written in the natural language of one country into a sentence or text written in the natural language of another country.
[0004]
Whereas machine translation converts text into another language, systems that convert sentences or texts within a single natural language have also come into use. Examples are systems that automatically generate summaries and systems that revise (polish) a text.
[0005]
In general, sentence conversion within a single natural language has worked as follows. A large number of conversion rules are prepared, each pairing a pre-conversion pattern (a word, phrase, or sentence) with a post-conversion pattern. Pattern matching then locates pre-conversion patterns in the input sentence, and each matching pattern is replaced with its post-conversion pattern.
[0006]
In addition, so that various kinds of paraphrasing can be handled in a unified manner, the following system has been proposed for converting a character string written in a natural language into a character string written with a different expression in the same natural language.
[0007]
The system has, as its main modules, transformation processing means and evaluation processing means, together with transformation rule storage means that stores transformation rules for natural language character strings and evaluation information storage means that stores an evaluation function or evaluation rules giving a scale for evaluating whether the result of transforming a character string is the conversion appropriate to the purpose. These can be exchanged according to the purpose of the conversion. Furthermore, a plurality of types of transformation rules and evaluation functions are prepared in the transformation rule storage means and the evaluation information storage means so that they can be selected according to the purpose of the conversion.
[0008]
As a result, some systems could reduce the development cost of developing a plurality of language conversion processing systems and make those systems usable through a unified interface (see, for example, Non-Patent Document 1 and Japanese Patent Application No. 2001-205889).
[0009]
[Non-Patent Document 1]
Masaki Murata, Hitoshi Isahara, "Unified Model of Paraphrasing: Use of Scale-Based Transformation", Workshop Proceedings of the 7th Annual Meeting of the Association for Natural Language Processing, March 30, 2001, pp. 21-26
[0010]
[Problems to be solved by the invention]
In the above-described conventional conversion of sentences or texts within the same natural language, such as automatic summary generation and text polishing, conversion is generally performed uniformly according to conversion rules. A separate dedicated system had to be built for each conversion purpose, such as plain sentence generation, summary generation, or text polishing, and language conversion adapted to the personal environment could not be performed automatically.
[0011]
In addition, the system that handles various kinds of paraphrasing in a unified manner could not identify the words familiar to an individual and display text using the words familiar to each individual.
[0012]
The present invention solves the above conventional problems. Its object is to identify words familiar to an individual in a personal-environment read/write system and to display text using those words as far as possible, thereby producing expressions that are easy for each individual to understand.
[0013]
A further object is to highlight character strings according to their appearance frequency in the personal environment, so that character strings or words the individual is unfamiliar with, or is interested in, can be found easily.
[0014]
[Means for Solving the Problems]
FIG. 1 is a diagram illustrating the principle of the present invention. In FIG. 1, 2 is a personal environment frequency storage device, 3a is a frequency storage unit, 4a is a language conversion unit, 5 is a read / write input unit, 6 is an input unit, 7 is an output unit, and 13a is a storage unit.
[0015]
The present invention has the following means in order to solve the conventional problems.
[0016]
(1): A read/write input unit 5 that receives reading and writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings are provided. The frequency storage unit 3a extracts arbitrary character strings from the read/write data in the personal environment input from the read/write input unit 5 and counts the number of occurrences of each extracted character string. Based on this frequency information of character strings in the personal environment, language conversion of input sentences adapted to the personal environment, difference emphasis, and the like can therefore be performed easily.
[0017]
(2): In the personal environment frequency storage device of (1), the frequency storage unit 3a includes a reading input detection unit and a writing input detection unit, and obtains the appearance frequency of an arbitrary character string by weighting the appearance frequency of the character string detected by the writing input detection unit more heavily than that detected by the reading input detection unit. Greater importance can therefore be placed on “writing”, which leaves a stronger impression.
[0018]
(3): In the personal environment frequency storage device of (2), the reading input detection unit obtains the appearance frequency of the arbitrary character string without counting text whose display time is short. Text that was merely displayed without being read by the individual can therefore be excluded.
[0019]
(4): In the personal environment frequency storage device of (1) to (3), the frequency storage unit 3a deletes old data from the read/write data input from the read/write input unit 5 and then obtains the appearance frequency of the arbitrary character string. Recent personal environment frequencies can therefore be stored, which makes it possible to follow changes in the personal environment.
[0020]
(5): In the personal environment frequency storage device of (1) to (3), the frequency storage unit 3a reduces the weight of old data among the read/write data input from the read/write input unit 5 and then obtains the appearance frequency of the arbitrary character string. Recent personal environment frequencies can therefore be emphasized, which makes it possible to follow changes in the personal environment.
[0021]
(6): The personal environment frequency storage device 2 of any one of (1) to (5) and a language conversion unit 4a that converts an input sentence into the language of the personal environment and outputs it are provided. The language conversion unit 4a converts an input character string into a character string with a high appearance frequency stored in the personal environment frequency storage device 2 and outputs it. The output can therefore be made easy for each individual to understand.
[0022]
(7): The personal environment frequency storage device 2 of any one of (1) to (5) and a language conversion unit 4a that converts an input sentence into the language of the personal environment and outputs it are provided. The language conversion unit 4a outputs the input character string with a character string having a high appearance frequency stored in the personal environment frequency storage device 2 added in parentheses as a supplementary note. This prevents the misunderstandings and changes in meaning that complete rewriting can cause.
[0023]
(8): The personal environment frequency storage device 2 of any one of (1) to (5) and a difference enhancement device that highlights differences between character strings of an input sentence are provided. The difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device 2 is equal to or lower than a threshold. Character strings and words that the individual is unfamiliar with can therefore be found.
[0024]
(9): The personal environment frequency storage device 2 of any one of (1) to (5) and a difference enhancement device that highlights differences between character strings of an input sentence are provided. The difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device 2 is equal to or higher than a threshold. Character strings and words of strong interest to the individual can therefore be highlighted, and paragraphs or sentences of interest can be found.
[0025]
(10): A personal environment frequency storage device 2 and a language conversion unit 4a are provided. The personal environment frequency storage device 2 includes a writing input unit that receives writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings; the frequency storage unit 3a extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit and counts the number of occurrences of each extracted character string. The language conversion unit 4a converts an input sentence into the language of the personal environment and outputs it, adding in parentheses, as a supplementary note to the input character string, a character string with a high appearance frequency stored in the personal environment frequency storage device 2. This prevents the misunderstandings and changes in meaning that complete rewriting can cause.
[0026]
(11): A personal environment frequency storage device 2 and a difference enhancement device that highlights differences between character strings of an input sentence are provided. The personal environment frequency storage device 2 includes a writing input unit that receives writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings; the frequency storage unit 3a extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit and counts the number of occurrences of each extracted character string. The difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device 2 is equal to or lower than a threshold. Character strings and words that the individual is unfamiliar with can therefore be found.
[0027]
(12): A personal environment frequency storage device 2 and a difference enhancement device are provided. The personal environment frequency storage device 2 includes a writing input unit that receives writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings; the frequency storage unit 3a extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit and counts the number of occurrences of each extracted character string. The difference enhancement device sets an extraction unit, which is the unit output as a difference between input character strings, and a detection region, which is the unit of region compared in order to detect differences between the input character strings. Using extraction means, it extracts everything corresponding to the extraction unit from the regions of the input character strings other than the current detection region and stores it in storage means; it then highlights whatever in the current detection region corresponds to the extraction unit and is not stored in the storage means, and outputs the document of the current detection region. This is repeated for each detection region. Among the input character strings determined by the extraction means to be highlighted, the difference enhancement device highlights those whose appearance frequency stored in the personal environment frequency storage device 2 is equal to or lower than a threshold. Fewer places are therefore highlighted, and the place where an unfamiliar expression appears for the first time can be identified easily.
[0028]
(13): A personal environment frequency storage device 2 and a difference enhancement device are provided. The personal environment frequency storage device 2 includes a writing input unit that receives writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings; the frequency storage unit 3a extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit and counts the number of occurrences of each extracted character string. The difference enhancement device sets an extraction unit, which is the unit output as a difference between input character strings, and a detection region, which is the unit of region compared in order to detect differences between the input character strings. Using extraction means, it highlights whatever in the current detection region of the input document data corresponds to the extraction unit and is not stored in storage means, outputs the document of the current detection region, and stores the highlighted items in the storage means. This is repeated for each detection region. Among the input character strings determined by the extraction means to be highlighted, the difference enhancement device highlights those whose appearance frequency stored in the personal environment frequency storage device 2 is equal to or lower than a threshold. Fewer places are therefore highlighted, and the place where an unfamiliar expression first appears can be identified easily.
[0029]
(14): A personal environment frequency storage device 2 and a difference enhancement device that highlights differences between character strings of an input sentence are provided. The personal environment frequency storage device 2 includes a writing input unit that receives writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings; the frequency storage unit 3a extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit and counts the number of occurrences of each extracted character string. The difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device 2 is equal to or higher than a threshold. Character strings and words of strong interest to the individual can therefore be highlighted, and paragraphs or sentences of interest can be found.
[0030]
(15): A personal environment frequency storage device 2 and a difference enhancement device that highlights differences between character strings of an input sentence are provided. The personal environment frequency storage device 2 includes a writing input unit that receives writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings; the frequency storage unit 3a extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit and counts the number of occurrences of each extracted character string. The difference enhancement device highlights only those input character strings whose appearance probability for the individual is significantly higher than their appearance probability in a general text set. General nouns and the like whose appearance probability is not significantly higher are therefore no longer highlighted; character strings and words of stronger interest to the individual are highlighted, and paragraphs or sentences containing them can be found efficiently.
[0031]
(16): A personal environment frequency storage device 2 and a difference enhancement device that highlights differences between character strings of an input sentence are provided. The personal environment frequency storage device 2 includes a writing input unit that receives writing input in a personal environment and a frequency storage unit 3a that obtains frequency information of character strings; the frequency storage unit 3a extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit and counts the number of occurrences of each extracted character string. The difference enhancement device highlights only those input character strings whose appearance probability for the individual user is larger than a fixed multiple of their appearance probability in a general text set. General nouns and the like whose appearance probability does not exceed that multiple are therefore no longer highlighted; character strings and words of stronger interest to the individual are highlighted, and paragraphs or sentences containing them can be found.
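The criterion in (16) can be illustrated with a minimal Python sketch. The function name `strings_to_highlight`, the count dictionaries, the sample counts, and the multiple `k` are all hypothetical; the text does not say how the general text set is represented or which multiple is used.

```python
def strings_to_highlight(tokens, personal_counts, general_counts, k=2.0):
    """Return the tokens whose personal appearance probability is larger
    than k times their appearance probability in a general text set."""
    personal_total = sum(personal_counts.values())
    general_total = sum(general_counts.values())
    highlighted = []
    for tok in tokens:
        # Probabilities are relative frequencies; an empty collection gives 0.
        p_personal = personal_counts.get(tok, 0) / personal_total if personal_total else 0.0
        p_general = general_counts.get(tok, 0) / general_total if general_total else 0.0
        if p_personal > k * p_general:
            highlighted.append(tok)
    return highlighted


# Hypothetical counts: the individual writes about "anaphora" far more
# often than a general text set does.
personal = {"anaphora": 30, "research": 50, "the": 920}
general = {"anaphora": 1, "research": 40, "the": 959}
print(strings_to_highlight(["the", "research", "anaphora"], personal, general))
```

With these sample counts only "anaphora" passes the test, while the common words "the" and "research" are left unhighlighted, which is the effect (16) describes for general nouns.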
[0032]
[Detailed Description of the Invention]
§1: Explanation of personal environment language conversion
The present invention basically performs language conversion within the same language, but particularly performs language conversion that matches the personal environment.
[0033]
That is, a system that reads and writes in a personal environment is combined with a same-language conversion device. In general, words familiar to the individual are identified in a personal read/write system, such as a word processor for writing text or a display program for reading text, and output is produced using those words as far as possible.
[0034]
(1): Explanation of personal environment language conversion device
FIG. 2 is an explanatory diagram of the personal environment language conversion apparatus. In FIG. 2, the personal environment language conversion apparatus includes a frequency storage unit 3, a language conversion unit 4, a read / write input unit 5, an input unit 6, and an output unit 7.
[0035]
The frequency storage unit 3 obtains the appearance frequency of character strings input from the read/write input unit 5 in the personal environment. Whenever the read/write system is used, reading and writing input arrives from the read/write input unit 5 and the frequency storage unit 3 is continually updated.
[0036]
The language conversion unit 4 acquires conversion candidates according to transformation rules, evaluates them using an evaluation scale (evaluation function, etc.) such as the appearance frequency held in the frequency storage unit 3, and selects the most suitable conversion candidate.
[0037]
The read/write input unit 5 receives reading and writing input from a read/write system, that is, a reading system, a writing system, or an integrated reading and writing system. The input unit 6 receives the sentence to be converted. The output unit 7 outputs the language conversion result.
[0038]
(2): Explanation when using the language auxiliary conversion unit
FIG. 3 is an explanatory diagram when a language auxiliary conversion unit is used. The personal environment language conversion device of FIG. 3 uses a language auxiliary conversion unit 8 instead of the language conversion unit 4 of FIG. In FIG. 3, the personal environment language conversion apparatus includes a frequency storage unit 3, a read / write input unit 5, an input unit 6, an output unit 7, and a language auxiliary conversion unit 8.
[0039]
The frequency storage unit 3 obtains the appearance frequency of a character string input from a read / write system or the like in a personal environment. The read / write input unit 5 receives input from the read / write system. The input unit 6 inputs a conversion target sentence. The output unit 7 outputs a language conversion result.
[0040]
The language auxiliary conversion unit 8 acquires conversion candidates according to transformation rules, evaluates them using an evaluation scale (evaluation function, etc.) such as the appearance frequency held in the frequency storage unit 3, selects the most suitable conversion candidate, and adds it in parentheses as a supplementary note.
[0041]
(3): Description of the frequency storage unit
a) Example of frequency storage unit (1)
FIG. 4 is an explanatory diagram (1) of the frequency storage unit. In FIG. 4, the frequency storage unit includes a reading input detection unit 11, a reading data storage unit 13, a full text search engine 14, a writing input detection unit 21, a writing data storage unit 23, and a full text search engine 24.
[0042]
The reading input detection unit 11 detects reading input from the read/write input unit 5. The reading data storage unit 13 stores the reading input detected by the reading input detection unit 11. The full-text search engine 14 counts the occurrences of an arbitrary character string in the data stored in the reading data storage unit 13. The writing input detection unit 21 detects writing input from the read/write input unit 5. The writing data storage unit 23 stores the writing input detected by the writing input detection unit 21. The full-text search engine 24 counts the occurrences of an arbitrary character string in the data stored in the writing data storage unit 23. A single search engine can serve as both full-text search engines 14 and 24.
[0043]
FIG. 5 is a processing flowchart (1) of the frequency storage unit. Hereinafter, the processing of the frequency storage unit will be described according to steps S1 to S5 in FIG.
[0044]
S1: The reading input data from the read/write input unit 5 is detected by the reading input detection unit 11. Specifically, text displayed continuously on the screen for several minutes or longer is recognized as reading input data.
[0045]
S2: The character string of the reading input detected by the reading input detection unit 11 is stored as-is in the reading data storage unit 13.
[0046]
S3: The writing input data from the read/write input unit 5 is detected by the writing input detection unit 21. Specifically, character strings entered by keyboard input or the like are recognized as writing input data.
[0047]
S4: The character string of the writing input detected by the writing input detection unit 21 is stored as-is in the writing data storage unit 23.
[0048]
S5: The number of occurrences of an arbitrary character string can be counted using the full-text search engines 14 and 24, which search at high speed for the presence or number of occurrences of a character string.
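Steps S2, S4, and S5 can be sketched as follows. This is a minimal illustration only: the class name `RawTextFrequencyStore` and its methods are hypothetical, and a plain substring count stands in for the high-speed full-text search engines, whose implementation the text does not describe.

```python
class RawTextFrequencyStore:
    """Stores reading and writing input as-is and counts substrings on demand."""

    def __init__(self):
        self.reading_data = []  # stands in for reading data storage unit 13
        self.writing_data = []  # stands in for writing data storage unit 23

    def store_reading(self, text):
        """S2: keep the detected reading input unchanged."""
        self.reading_data.append(text)

    def store_writing(self, text):
        """S4: keep the detected writing input unchanged."""
        self.writing_data.append(text)

    def count(self, s):
        """S5: return (reading count, writing count) for string s.

        str.count counts non-overlapping occurrences, which is an
        assumption of this sketch rather than something the text fixes.
        """
        if not s:
            raise ValueError("search string must be non-empty")
        reading = sum(text.count(s) for text in self.reading_data)
        writing = sum(text.count(s) for text in self.writing_data)
        return reading, writing


store = RawTextFrequencyStore()
store.store_reading("world knowledge and common sense")
store.store_reading("common sense")
store.store_writing("world knowledge")
print(store.count("common sense"))
print(store.count("world"))
```

Because the raw text is kept, any substring can be counted later without deciding in advance where word boundaries fall.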
[0049]
b) Example of frequency storage unit (2)
FIG. 6 is an explanatory diagram (2) of the frequency storage unit. In FIG. 6, the frequency storage unit includes a reading input detection unit 11, a morphological analyzer 12, a reading data storage unit 13, a word search engine 14, a writing input detection unit 21, a morphological analyzer 22, a writing data storage unit 23, and a word search engine 24.
[0050]
The reading input detection unit 11 detects reading input from the read/write input unit 5. The morphological analyzer 12 divides the reading input into words. The reading data storage unit 13 stores the words produced by the morphological analyzer 12. The word search engine 14 counts the number of appearances of an arbitrary word. The writing input detection unit 21 detects writing input from the read/write input unit 5. The morphological analyzer 22 divides the writing input into words. The writing data storage unit 23 stores the words produced by the morphological analyzer 22. The word search engine 24 counts the number of appearances of an arbitrary word. The morphological analyzers 12 and 22, the reading data storage unit 13 and the writing data storage unit 23, and the word search engines 14 and 24 can each be implemented as a single morphological analyzer, a single storage unit, and a single search engine, respectively.
[0051]
FIG. 7 is a processing flowchart (2) of the frequency storage unit. Hereinafter, the processing of the frequency storage unit using the morphological analyzer will be described according to steps S11 to S15 of FIG.
[0052]
S11: The reading input data from the read/write input unit 5 is detected by the reading input detection unit 11. Specifically, text displayed continuously on the screen for several minutes or longer is recognized as reading input data.
[0053]
S12: The character string of the reading input detected by the reading input detection unit 11 is divided into words by the morphological analyzer 12, and each word is stored in the reading data storage unit 13 together with its number of appearances. When a word that is already stored is stored again, only its number of appearances is updated.
[0054]
S13: The writing input data from the read/write input unit 5 is detected by the writing input detection unit 21. Specifically, character strings entered by keyboard input or the like are recognized as writing input data.
[0055]
S14: The character string of the writing input detected by the writing input detection unit 21 is divided into words by the morphological analyzer 22, and each word is stored in the writing data storage unit 23 together with its number of appearances. When a word that is already stored is stored again, only its number of appearances is updated.
[0056]
S15: The word search engines 14 and 24 make it possible to count the number of appearances of an arbitrary word.
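The word-based variant in steps S12, S14, and S15 can be sketched in the same way. Here `split_into_words` is only a stand-in for the morphological analyzer: it splits English text on non-word characters, whereas Japanese text would need a real morphological analyzer. The class and method names are hypothetical.

```python
import re
from collections import Counter


def split_into_words(text):
    """Stand-in for the morphological analyzer (assumption: English-like text)."""
    return re.findall(r"\w+", text.lower())


class WordFrequencyStore:
    """Keeps one appearance count per word for reading and for writing."""

    def __init__(self):
        self.reading_counts = Counter()  # reading data storage unit 13
        self.writing_counts = Counter()  # writing data storage unit 23

    def store_reading(self, text):
        """S12: split into words; only the counts of known words change."""
        self.reading_counts.update(split_into_words(text))

    def store_writing(self, text):
        """S14: same processing for writing input."""
        self.writing_counts.update(split_into_words(text))

    def count(self, word):
        """S15: return (reading count, writing count) for a word."""
        key = word.lower()
        return self.reading_counts[key], self.writing_counts[key]


words = WordFrequencyStore()
words.store_reading("Common sense is common")
words.store_writing("common knowledge")
print(words.count("common"))
```

Compared with storing raw text, this keeps only one counter per word, so lookups are cheap but strings that cross word boundaries can no longer be counted directly.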
[0057]
(4): Description of the language conversion unit
FIG. 8 is an explanatory diagram of the language conversion unit. In FIG. 8, the language conversion unit includes a language conversion processing unit 31, a transformation rule unit (language conversion dictionary) 32, and a conversion scale 33. The language conversion processing unit 31 uses the transformation rules to list transformation candidates, checks the validity of each conversion using the conversion scale, and converts to the candidate judged most appropriate. That is, conversion is performed using the transformation rules so that the conversion scale becomes larger. The transformation rule unit (language conversion dictionary) 32 holds rules such as one that changes “dismiss” to “stop”. The conversion scale 33 may be similarity, length, appearance frequency (appearance probability), and so on; here, the appearance frequency from the frequency storage unit is used.
[0058]
(5): Explanation of personal environment language conversion processing
a) Explanation of personal environment language conversion using a frequency storage unit and a language conversion unit
(1): From the user's daily reading and writing behavior, the user's reading data and writing data are stored in the frequency storage unit 3 by the processing of the frequency storage unit flowcharts (see FIGS. 5 and 7).
[0059]
(2): Language conversion rules are stored in the language conversion unit 4. When an applicable language conversion rule exists, the frequency of the character string after conversion and the frequency of the character string before conversion are obtained from the frequency storage unit 3, and conversion is performed if the frequency of the character string after conversion is larger.
[0060]
If the frequency of the character string before conversion is larger, conversion is not performed. When a plurality of language conversion rules are applicable, conversion is performed using the rule whose post-conversion character string has the highest frequency.
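The selection rule just described can be sketched as a short Python function. The names `convert`, `rules`, and `freq` are hypothetical, as are the sample counts and the extra candidate "general knowledge". Keeping the original term when frequencies are equal is also an assumption, since the text only covers the cases where one side is larger.

```python
def convert(term, rules, freq):
    """Return the replacement for `term`, or `term` itself if no candidate
    is used more frequently.

    rules: dict mapping a term to a list of candidate replacements
    freq:  function returning the personal frequency of a string
    """
    best, best_freq = term, freq(term)
    for candidate in rules.get(term, []):
        candidate_freq = freq(candidate)
        # Convert only when the candidate is strictly more frequent, and
        # keep the most frequent candidate when several rules apply.
        if candidate_freq > best_freq:
            best, best_freq = candidate, candidate_freq
    return best


rules = {"world knowledge": ["common sense", "general knowledge"]}
expert_counts = {"world knowledge": 12, "common sense": 3}
lay_counts = {"common sense": 9, "general knowledge": 4}

print(convert("world knowledge", rules, lambda s: expert_counts.get(s, 0)))
print(convert("world knowledge", rules, lambda s: lay_counts.get(s, 0)))
```

For the hypothetical expert the term is left as it is, while for the hypothetical lay user it becomes "common sense", the more frequent of the two applicable candidates.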
[0061]
The frequency of the character string used here combines the reading data and the writing data; in general, a value computed by an expression such as the following is used.
[0062]
Specifically, let f_r(t) be the appearance frequency of each word t in the reading system and f_w(t) its appearance frequency in the writing system. The individual's word frequency distribution is then
α × f_r(t) + (1 - α) × f_w(t)  (where 0 ≤ α ≤ 1)
and words are converted so that this frequency increases. That is, the individual's word frequency distribution in the personal-environment read/write system is used as the scale for language conversion. The coefficient α is introduced to give “writing” a larger weight, because “reading” leaves a weaker impression than “writing”; α is therefore set smaller than 0.5. Constants such as α are made configurable by the user.
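As a worked illustration of the weighted frequency, the following minimal Python sketch evaluates α × f_r(t) + (1 - α) × f_w(t) for one word. The function name `personal_frequency`, the default α = 0.3, and the sample counts are assumptions chosen only to satisfy the stated condition that α is smaller than 0.5.

```python
def personal_frequency(f_r, f_w, alpha=0.3):
    """Combine reading frequency f_r and writing frequency f_w.

    alpha weights reading and (1 - alpha) weights writing, so any
    alpha below 0.5 makes writing count more heavily than reading.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    return alpha * f_r + (1.0 - alpha) * f_w


# A word read 10 times and written 4 times, with alpha = 0.3:
# 0.3 * 10 + 0.7 * 4 = 3.0 + 2.8 = 5.8
print(personal_frequency(10, 4))
```

With α = 0.3, a word written 4 times contributes nearly as much (2.8) as the same word read 10 times (3.0), which reflects the heavier weight given to writing.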
[0063]
For example, if the individual is a politician, text is converted into expressions that use politicians' terminology as far as possible. In general, text written in expressions that one often uses oneself is easier to understand. By using the personal environment language conversion device of the present invention, text written by experts in other fields can be changed into expressions commonly used in one's own field.
[0064]
(Explanation by specific example)
Language conversion within the same language has already been the subject of an earlier application; what is done here is to combine a personal-environment read/write system with a same-language conversion device. The key points are that using a personal-environment read/write system makes it easy to acquire the frequency distribution of the words familiar to the individual, and that this information can be used for language conversion.
[0065]
For example, conversion is performed as indicated by the following arrow.
[0066]
“Study on anaphora analysis using world knowledge” -> “Study on estimation of the referents of demonstratives using common sense”
Suppose that the language conversion dictionary (conversion rules) contains the following.
[0067]
world knowledge = common sense
anaphora analysis = estimation of the referents of demonstratives
Here, it is assumed that the sentence “Study on anaphora analysis using world knowledge”, written in the terminology of the language processing field, is input.
[0068]
(Case 1)
The user is a researcher of language processing.
[0069]
It is assumed that, in the user's read/write system, “world knowledge” has a higher usage frequency (appearance frequency) than “common sense”, and “anaphora analysis” a higher one than “estimation of the referents of demonstratives”.
[0070]
This is because such technical terms are used frequently by language processing researchers.
[0071]
In this case, the personal environment language conversion device outputs “Study on anaphora analysis using world knowledge” as it is, without rewriting. For this user the original expression is the more natural one, so seeing the output unchanged is more convenient.
[0072]
(Case 2)
Suppose that the user is not familiar with the field of language processing.
[0073]
In this case, the more general terms “common sense” and “estimation of the referents of demonstratives” are likely to have the higher usage frequency.
[0074]
The personal environment language conversion device therefore rewrites the input and outputs “Study on estimation of the referents of demonstratives using common sense”.
[0075]
Because this user does not know the terms “anaphora analysis” and “world knowledge”, seeing the easier expressions makes the text easier to understand.
[0076]
However, complete rewriting may cause misunderstandings or change the meaning of the sentence.
[0077]
b) Explanation of personal environment language conversion using a frequency storage unit and a language auxiliary conversion unit
(1): From the user's daily reading and writing behavior, the user's read data and write data are stored in the frequency storage unit 3 by the processing of the processing flowchart of the frequency storage unit (see FIGS. 5 and 7).
[0078]
(2): Language conversion rules are stored in the language auxiliary conversion unit 8. When an applicable language conversion rule exists, the frequency of the character string after conversion and the frequency of the character string before conversion are obtained from the frequency storage unit 3, and auxiliary conversion is performed if the frequency of the character string after conversion is larger.
[0079]
If the frequency of the character string before conversion is larger, auxiliary conversion is not performed. When a plurality of language conversion rules are applicable, auxiliary conversion is performed using the rule whose post-conversion character string has the highest frequency.
[0080]
The frequency of the character string used here combines the reading data and the writing data; in general, a value computed by an expression such as the following is used.
[0081]
Specifically, let f_r(t) be the appearance frequency of each word t in the reading system and f_w(t) its appearance frequency in the writing system. The individual's word frequency distribution is then
α × f_r(t) + (1 - α) × f_w(t)  (where 0 ≤ α ≤ 1)
and words are converted so that this frequency increases. That is, the individual's word frequency distribution in the personal-environment read/write system is used as the scale for language conversion. The coefficient α is introduced to give “writing” a larger weight, because “reading” leaves a weaker impression than “writing”; α is therefore set smaller than 0.5. Constants such as α are made configurable by the user.
[0082]
Here, auxiliary conversion means that the character string itself is not converted; instead, the conversion-destination character string is added as a supplementary note in parentheses.
[0083]
(Explanation by specific example)
"Study on anaphora analysis (estimation of indicator destination) using world knowledge (common sense)"
In this way, the text is not completely rewritten; the supplementary expression is shown in parentheses. If parentheses are already used in the text, a different kind of bracket can be used to distinguish them.
[0084]
Even so, for those who are familiar with terms such as “world knowledge” and “anaphora analysis,” such as specialized researchers, this auxiliary conversion (display) is rather a nuisance. It is therefore better to decide for each user whether or not to show the parenthetical.
[0085]
A method for deciding whether to add this parenthetical (auxiliary conversion) is described below.
[0086]
As described above, using the individual word frequency distribution [α × f_r(t) + (1 − α) × f_w(t) (where 0 ≤ α ≤ 1)], a rewrite candidate word whose frequency is higher is added in parentheses. In other words, the parenthetical is added when rewriting moves toward a word that the individual uses more frequently.
[0087]
Alternatively, using the individual word appearance frequency distribution [α × f_r(t) + (1 − α) × f_w(t) (where 0 ≤ α ≤ 1)], a rewrite candidate word is added in parentheses when it does not lower this frequency and the frequency of the original word is below a threshold. In other words, no parenthetical is added if the individual's frequency of use of the original word is high, and one is added if that frequency is low or zero.
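The threshold-based decision on whether to add a parenthetical can be sketched as follows. This is an illustrative sketch under stated assumptions: `auxiliary_convert`, the rule dictionary, and the pre-computed `freq` table (standing in for the weighted personal frequency) are hypothetical names, and plain substring replacement is used in place of real morphological matching.

```python
from __future__ import annotations


def auxiliary_convert(text: str, rules: dict[str, str],
                      freq: dict[str, float], threshold: float = 1.0) -> str:
    """Append the rewrite candidate in parentheses after unfamiliar terms.

    A term is annotated only when its personal frequency is below `threshold`
    and the candidate's frequency is not lower than the term's own frequency.
    """
    for source, target in rules.items():
        f_source = freq.get(source, 0.0)
        f_target = freq.get(target, 0.0)
        if f_source < threshold and f_target >= f_source:
            text = text.replace(source, f"{source} ({target})")
    return text


rules = {
    "world knowledge": "common sense",
    "anaphora analysis": "estimation of indicator destination",
}
title = "Study on anaphora analysis using world knowledge"

# A general reader: the technical terms have never been read or written.
general_reader = {"common sense": 5.0, "estimation of indicator destination": 0.0}
print(auxiliary_convert(title, rules, general_reader))

# A specialist: both technical terms are frequent, so nothing is added.
specialist = {"world knowledge": 40.0, "anaphora analysis": 55.0, "common sense": 5.0}
print(auxiliary_convert(title, rules, specialist))
```

For the general reader the output matches the parenthesized title in the example above, while the specialist sees the title unchanged.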
[0088]
Note that the read/write system includes a reading system, a writing system, or a system in which reading and writing are integrated. The reading system is a system for reading text, such as a mailer, Internet Explorer, or a Word document (a kind of text creation system) opened (displayed) for reading. The writing system is a system for writing text, such as a Word document into which characters are input to create text. Also, since the amount of text displayed on a display or the like in the reading system is large, text with a short display time can be excluded.
[0089]
Furthermore, in the reading system, the weighting of words stored in the frequency storage unit can be varied. For example, when a Word document or the like, which belongs to a text creation system, is being read, it is considered to be read carefully, so its weight can be made higher than when a screen is merely viewed on the Internet or the like.
[0090]
Old words can also be excluded from the words stored in the frequency storage unit. For example, the personal environment may change because the individual's hobbies change or the individual becomes an expert in a certain field, so old entries are deleted or given a lower weight.
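One way to realize the source-dependent weighting and the handling of old data is sketched below. The text only says that old entries are deleted or weighted down; the exponential half-life decay, the `max_age_days` cutoff, and the name `decayed_frequency` are assumptions chosen for illustration.

```python
from __future__ import annotations


def decayed_frequency(observations: list[tuple[float, float]], now: float,
                      half_life_days: float = 180.0,
                      max_age_days: float | None = None) -> float:
    """Sum the weights of one word's observations, discounting old ones.

    Each observation is (day_observed, weight). The weight can encode the
    source, e.g. a larger value for carefully read documents than for pages
    merely viewed in a browser.
    """
    total = 0.0
    for day, weight in observations:
        age = now - day
        if max_age_days is not None and age > max_age_days:
            continue  # "delete" observations that are too old
        total += weight * 0.5 ** (age / half_life_days)  # weight down older ones
    return total


# One observation made 180 days ago counts half as much as a fresh one.
print(decayed_frequency([(0.0, 1.0)], now=180.0))
```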
[0091]
§2: Explanation of personal environment difference enhancement device
FIG. 9 is an explanatory diagram of the personal environment difference enhancement device. In FIG. 9, the personal environment difference enhancement device includes a frequency storage unit 3, a read / write input unit 5, an input unit 6, an output unit 7, and a difference enhancement device 9.
[0092]
The frequency storage unit 3 obtains the appearance frequency of character strings input from a read/write system or the like in the personal environment. Whenever the read/write system is used, the frequency storage unit 3 is updated.
[0093]
The read / write input unit 5 receives input / output from the read / write system. The input unit 6 inputs a conversion target sentence. The output unit 7 outputs a language conversion result.
[0094]
The difference enhancement device 9 highlights character strings in the sentence input from the input unit 6 based on the difference between those character strings and the character strings having a certain frequency in the frequency storage unit 3.
[0095]
(1): Explanation of operation of personal environment difference enhancement device (1)
① Based on the user's daily reading and writing behavior, the user's reading data and writing data are stored by the processing of the flowchart of the frequency storage unit 3 (see FIGS. 5 and 7).
[0096]
② The difference enhancement device 9 highlights only those words in the input text whose appearance frequency for the individual user is at or below a threshold.
[0097]
(Description by example)
For example, it is assumed that the following text is input from the input unit 6.
[0098]
“In natural language, verbs are sometimes omitted. Restoring these omitted verbs is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the omitted verbs are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the verb that supplements the verb omission appears in the text. In an experiment on novels, the test samples could be analyzed with an accuracy of 84% recall and 82% precision. This shows that the method is effective. Accuracy was very high when the verb to be supplemented was in the text. In contrast, accuracy was not so good when the verb to be supplemented was not in the text. However, considering the difficulty of the problem when the verb to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
Here, the user is assumed to be a person specializing in linguistics. In this case, it is assumed that the user's frequency of use of the engineering expressions “test sample,” “recall,” and “precision” is extremely low.
[0099]
In that case, the text is highlighted as follows, where each highlighted portion is enclosed in << and >> (double angle brackets).
[0100]
“In natural language, verbs are sometimes omitted. Restoring these omitted verbs is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the omitted verbs are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the verb that supplements the verb omission appears in the text. In an experiment on novels, the <<test samples>> could be analyzed with an accuracy of 84% <<recall>> and 82% <<precision>>. This shows that the method is effective. Accuracy was very high when the verb to be supplemented was in the text. In contrast, accuracy was not so good when the verb to be supplemented was not in the text. However, considering the difficulty of the problem when the verb to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
In this way, the user can spot unfamiliar words, which is convenient when looking them up in a dictionary.
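Operation (1) can be sketched in a few lines. The sketch assumes whitespace-free alphabetic tokens found with a regular expression, which is only a stand-in: Japanese text would need a morphological analyzer, and multi-word terms are not handled. The name `highlight_unfamiliar` and the frequency table are hypothetical.

```python
import re


def highlight_unfamiliar(text, freq, threshold=0):
    """Wrap every word whose personal frequency is <= threshold in << >>."""
    def mark(match):
        word = match.group(0)
        if freq.get(word.lower(), 0) <= threshold:
            return f"<<{word}>>"
        return word

    return re.sub(r"[A-Za-z]+", mark, text)


# Hypothetical personal frequencies for a linguist who never uses "recall".
personal = {"the": 100, "analysis": 50, "had": 80, "high": 30}
print(highlight_unfamiliar("The analysis had high recall", personal))
```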
[0101]
(2): Explanation of the operation of the personal environment difference enhancement device (2)
① Based on the user's daily reading and writing behavior, the user's reading data and writing data are stored by the processing of the flowchart of the frequency storage unit 3 (see FIGS. 5 and 7).
[0102]
② In the text input from the input unit 5, the difference enhancement device 9 highlights only those character strings that are judged to be first occurrences in the text and whose appearance frequency for the individual user is at or below a threshold.
[0103]
Note that the difference enhancement device can determine that a first-occurring character string in the input text should be highlighted by either of the following Methods 1 and 2 (see Japanese Patent Application No. 2002-290946).
[0104]
Method 1
(1) An extraction unit and a detection area unit are determined in advance through an input unit or the like. The extraction unit is the unit to be output as a difference; it may be a “word,” “kanji,” “noun phrase,” or the like. The detection area unit is the unit of the areas that are compared in order to detect differences; possible units include “character,” “word,” “sentence,” “list item,” “paragraph,” “patent claim,” and the like.
[0105]
(2) The difference enhancement device stores all input data in its storage means (in the difference enhancement device).
[0106]
(3) The difference enhancement device examines the input data from the left and repeats the following processes (4) and (5) for each detection area determined in (1), starting from the leftmost detection area.
[0107]
(4) The difference enhancement device extracts everything (for example, every word) corresponding to the extraction unit from all areas other than the current detection area and stores it in the extract storage means (in the difference enhancement device).
[0108]
(5) The difference enhancement device highlights those items (for example, words) in the current detection area that correspond to the extraction unit and are not stored in the extract storage means (in the difference enhancement device), and outputs the text of the current detection area.
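Steps (1) to (5) of Method 1 can be sketched as follows, assuming for illustration that the extraction unit is a word and that the detection areas are supplied as a list of strings. The function name `method1` and the regex tokenization are assumptions, not part of the cited application.

```python
import re


def method1(areas):
    """For each detection area, highlight words that occur in no other area."""
    outputs = []
    for i, area in enumerate(areas):
        # Step (4): collect the extraction units of all areas except the current one.
        others = set()
        for j, other in enumerate(areas):
            if j != i:
                others.update(w.lower() for w in re.findall(r"\w+", other))

        # Step (5): highlight units of the current area that were not collected.
        def mark(match, known=others):
            word = match.group(0)
            return word if word.lower() in known else f"<<{word}>>"

        outputs.append(re.sub(r"\w+", mark, area))
    return outputs


print(method1(["case particles change", "case markers change"]))
```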
[0109]
Method 2
(1) An extraction unit and a detection area unit are determined in advance through the input unit 1 or the like. The extraction unit is the unit to be output as a difference; it may be a “word,” “kanji,” “noun phrase,” or the like. The detection area unit is the unit of the areas that are compared in order to detect differences; possible units include “character,” “word,” “sentence,” “list item,” “paragraph,” “patent claim,” and the like.
[0110]
(2) Input data is input from the input unit one detection area, as defined in (1), at a time, and the difference enhancement device repeats the following processes (3) and (4).
[0111]
(3) The difference enhancement device highlights those items (for example, words) in the current detection area that correspond to the extraction unit and are not stored in the extract storage device (in the difference enhancement device), and outputs the text of the current detection area. Note that the extract storage device (in the difference enhancement device) is initially empty.
[0112]
(4) The expressions highlighted in process (3) are stored in the extract storage device.
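Method 2 processes the areas as a stream, which the following sketch illustrates. As before, words are assumed as the extraction unit, and `method2` and the regex tokenization are hypothetical choices for the example.

```python
import re


def method2(areas):
    """Highlight each word the first time it is seen, area by area."""
    store = set()  # the extract storage device, initially empty
    outputs = []
    for area in areas:
        highlighted = set()

        # Process (3): highlight units of this area that are not yet stored.
        def mark(match):
            word = match.group(0)
            if word.lower() in store:
                return word
            highlighted.add(word.lower())
            return f"<<{word}>>"

        outputs.append(re.sub(r"\w+", mark, area))
        # Process (4): store what was highlighted in process (3).
        store.update(highlighted)
    return outputs


# With words as detection areas, only first occurrences are highlighted.
print(method2(["passive", "sentence", "passive", "voice"]))
```

Unlike Method 1, this version needs only the areas seen so far, so it can highlight text as it arrives.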
[0113]
(Description by example)
Here, an example applying Method 2 is described. By Method 2 above, the difference enhancement device determines that the portions enclosed in double angle brackets below should be highlighted in the input text. Note that both the extraction unit and the detection area unit are words.
[0114]
“The purpose of this study is to change the“ passive sentence ”of“ Japanese ”,“ usefulness ”sentence“ when transforming “active” sentence into “active” sentence << to “changed” and “case particles” to be “machine learning” << Use automatic to convert. >> “Examples” of Japanese passive sentences and usage sentences are given in “Figure 1” and “Figure 2”. The Japanese << suffix ">>"<<"in the sentence in Figure 1 is" passive "is a" showing auxiliary verb "," is ", and" this "is a passive sentence. The Japanese suffix “<< se” ta ”in the sentence in Fig. 2 is an auxiliary verb indicating a working part, and this sentence is a working sentence. Figure << 3 >> shows an active sentence that corresponds to << the >> sentence. When the sentence << in Figure 1 is transformed into an active sentence, << (i) >> the case particle "ni" is the case particle "ga", << (ii) >> the case particle "ga" is the case particle " Is converted to. When the sentence in Fig. 2 is converted to an active sentence, (i) the << part >> of the case particle "ga", << the clause >> of << he >> is << erased >>, and (ii) the case particle "ni" Is converted to the case particle "ga", and << (iii) >> In this study, conversion of these case particles <<(>> example <<: >> case << auxiliary >><< verb >> conversion of the case particle "ga""to"<<)>> and elimination of the << unnecessary >> part ( Example: Erasing “He is” is the “subject” of the study. (From 《After》 and 《This paper》《Convenience》《He is》《Erase》《Erase》《Also》《Call》 Conversion of case particles.
The conversion of passive sentences and active sentences into active sentences is performed by using the sentence << Generation >>, << paraphrase >>, sentence << simplification / language >><< operation support >>, << natural >> language sentence << from >><< knowledge acquisition and information extraction 》, 《Question Answering System》 and 《Many》 Research 《Fields》 are 《Useful》. << For example >>, in the question answering system, the question sentence is a << noh >><< motion >> sentence and the << answer >> is << written >> with a << passive >> sentence. There are cases where it is “difficult” to “take out” an answer to a question because the “structure” of the sentence is “different”. These “like” “questions” and “titles” also become “resolve” and “solve” so that “passive” and active sentences can be converted into active sentences. In this way, the conversion of passive sentences and usage sentences into active sentences is “important” in the natural language “processing”. ]
Here, it is assumed that the user is not familiar with linguistics or language processing. Technical terms of linguistics and language processing therefore have a low appearance probability for this user, and the frequencies of “passive,” “causative,” “active,” “voice,” and “paraphrase” are assumed to be zero. The above-described threshold is also assumed to be 0. Then, as the output of the personal environment difference enhancement device, only the places where these words coincide with the highlighted portions above are highlighted, as follows.
[0115]
“The purpose of this study is to use machine learning to automatically convert the case particles that must be changed when converting Japanese <<passive>> and <<causative>> sentences into <<active>> sentences. Examples of Japanese passive and causative sentences are shown in Figs. 1 and 2. The Japanese suffix “reta” in the sentence in Fig. 1 is an auxiliary verb indicating the passive, and this sentence is a passive sentence. The Japanese suffix “seta” in the sentence in Fig. 2 is an auxiliary verb indicating the causative, and this sentence is a causative sentence. Figure 3 shows the active sentences corresponding to these sentences. When the sentence in Fig. 1 is converted to an active sentence, (i) the case particle “ni” is converted to the case particle “ga,” and (ii) the case particle “ga” is converted to the case particle “o.” When the sentence in Fig. 2 is converted to an active sentence, (i) the phrase “kare ga” (“he” with the case particle “ga”) is deleted, (ii) the case particle “ni” is converted to the case particle “ga,” and (iii) the case particle “o” remains unchanged. This study deals with the conversion of these case particles (e.g., conversion of the case particle “ni” into the case particle “ga”) and the deletion of unnecessary parts (e.g., deletion of “kare ga”). (Hereafter, for convenience, deletion of parts such as “kare ga” is also referred to as case particle conversion in this paper.)
The conversion of passive and causative sentences into active sentences is useful in many research fields, such as sentence generation, <<paraphrase>>, sentence simplification / language operation support, knowledge acquisition and information extraction from natural language sentences, and question answering systems. For example, in a question answering system, when the question is written as an active sentence and the answer is written as a passive sentence, the sentence structure differs between the question and the sentence containing the answer, so it can be difficult to extract the answer to the question. Such problems can be solved once passive and causative sentences can be converted into active sentences. In this way, the conversion of passive and causative sentences into active sentences is important in natural language processing.”
As shown above, this is easier to read because there are fewer highlighted portions than when only the prior application (Method 2) is used. Conversely, if only the present technique were used (without Method 2), every occurrence of “passive,” “causative,” “active,” “voice,” and “paraphrase” would be highlighted; by combining it with the prior application (Method 1 or 2), only the first occurrence of each of these words is highlighted.
[0116]
As a result, the place where an unfamiliar expression first appears can be identified easily.
[0117]
Furthermore, considering in principle what this personal environment difference enhancement device does, when the threshold is 0 it is equivalent to applying Method 2 of the prior application to this text with all the text the individual has ever read and written attached in front of it. That is, if the person has never seen or written any text outside the read/write system, what is highlighted here are the words that the person is seeing for the first time in his or her life. Method 2 can thus be regarded as having been extended to the person's whole life.
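The combined rule of operation (2), first occurrence in the text and low personal frequency, can be sketched as follows. The input is assumed to be already split into words, and `highlight_first_unfamiliar` and the sample counts are hypothetical.

```python
def highlight_first_unfamiliar(words, freq, threshold=0):
    """Highlight a word only at its first occurrence and only if it is unfamiliar."""
    seen = set()
    output = []
    for word in words:
        key = word.lower()
        is_first = key not in seen          # first-occurrence test (Method 2)
        seen.add(key)
        is_unfamiliar = freq.get(key, 0) <= threshold  # personal-frequency test
        output.append(f"<<{word}>>" if is_first and is_unfamiliar else word)
    return output


# Hypothetical reader who knows "sentence" but none of the technical terms.
personal = {"sentence": 12}
print(highlight_first_unfamiliar(["passive", "sentence", "passive", "paraphrase"], personal))
```

With `threshold=0`, seeding `seen` with every word the user has ever read or written would give the same result, which is the equivalence described above.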
[0118]
(3): Description of the personal environment difference emphasizing apparatus using the deletion word storage unit (Description of operation of the personal environment difference emphasizing apparatus (3))
FIG. 10 is an explanatory diagram of the personal environment difference emphasizing apparatus using the deletion word storage unit. In FIG. 10, the personal environment difference enhancement device includes a frequency storage unit 3, a read / write input unit 5, an input unit 6, an output unit 7, a difference enhancement device 9, and a deletion word storage unit 10.
[0119]
The personal environment difference enhancement device of FIG. 10 is obtained by adding, to the personal environment difference enhancement device of FIG. 9, a deletion word storage unit 10 that stores words specified in advance (words other than nouns, or general nouns) so that they are not highlighted.
[0120]
(Description of operation)
① Based on the user's daily reading and writing behavior, the user's reading data and writing data are stored by the processing of the flowchart of the frequency storage unit 3 (see FIGS. 5 and 7).
[0121]
② The difference enhancement device 9 highlights only those words in the input text whose appearance frequency for the individual user is equal to or higher than a certain threshold. However, words designated in advance (words other than nouns, or general nouns) are not highlighted.
[0122]
(Description by example)
For example, it is assumed that the following text is input from the input unit 6.
[0123]
“In natural language, verbs are sometimes omitted. Restoring these omitted verbs is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the omitted verbs are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the verb that supplements the verb omission appears in the text. In an experiment on novels, the test samples could be analyzed with an accuracy of 84% recall and 82% precision. This shows that the method is effective. Accuracy was very high when the verb to be supplemented was in the text. In contrast, accuracy was not so good when the verb to be supplemented was not in the text. However, considering the difficulty of the problem when the verb to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
Here, the user is assumed to be a person specializing in linguistics. In this case, it is assumed that the user's frequency of use of the linguistic expressions “language,” “verb,” and “omission” is extremely high. However, words other than nouns and general nouns such as “koto” and “uchi” are registered in advance in the deletion word storage unit 10 as words that are not to be highlighted. In that case, the text is highlighted as follows:
[0124]
“In natural <<language>>, <<verbs>> are sometimes <<omitted>>. Restoring these <<omitted>> <<verbs>> is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the <<omitted>> <<verbs>> are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the <<verb>> that supplements the <<verb>> <<omission>> appears in the text. In an experiment on novels, the test samples could be analyzed with an accuracy of 84% recall and 82% precision. This shows that the method is effective. Accuracy was very high when the <<verb>> to be supplemented was in the text. In contrast, accuracy was not so good when the <<verb>> to be supplemented was not in the text. However, considering the difficulty of the problem when the <<verb>> to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
The highlighted portions are enclosed in << and >>. In this way, words that the user uses frequently, that is, words of great interest to that person, are highlighted. This is convenient because the user can find the paragraphs and sentences containing words of interest and concentrate on reading them.
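Operation (3) can be sketched as a frequency threshold combined with an exclusion list. The function name `highlight_interests`, the regex tokenization, and the sample counts are assumptions for illustration; the `deletion_words` set plays the role of the deletion word storage unit 10.

```python
import re


def highlight_interests(text, freq, threshold, deletion_words):
    """Highlight words at or above `threshold`, except registered deletion words."""
    def mark(match):
        word = match.group(0)
        key = word.lower()
        if key in deletion_words:
            return word  # never highlighted, however frequent
        return f"<<{word}>>" if freq.get(key, 0) >= threshold else word

    return re.sub(r"[A-Za-z]+", mark, text)


# Hypothetical counts: the function word "the" is frequent but excluded.
personal = {"verbs": 40, "the": 500, "language": 60}
print(highlight_interests("Restoring the omitted verbs", personal, 30, {"the"}))
```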
[0125]
(4): Operation explanation of personal environment difference enhancement device (4)
① Based on the user's daily reading and writing behavior, the user's reading data and writing data are stored by the processing of the flowchart of the frequency storage unit 3 (see FIGS. 5 and 7).
[0126]
② The difference enhancement device 9 highlights only those words in the input text whose appearance probability for the individual user is significantly larger than their appearance probability in a general text set.
[0127]
The appearance probability of word A is the number of appearances of word A divided by the total number of appearances of all words. It can be obtained as long as a general text set is available. A statistical test is generally used to determine whether the probability is significantly larger.
[0128]
A statistical test compares the hypothesis to be tested with sample data as objective evidence, accepts the hypothesis if there is no contradiction between them, and rejects the hypothesis if there is a contradiction (as a reference, see, for example, Hida et al., “Psychoeducational Statistics,” Baifukan).
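The text does not say which statistical test is used. As one possible illustration only, the sketch below applies a one-sided two-proportion z-test with the normal approximation; the function name `significantly_more_frequent`, the critical value 1.645 (roughly a one-sided 5% level), and the counts are assumptions.

```python
import math


def significantly_more_frequent(k_user, n_user, k_general, n_general, z_crit=1.645):
    """One-sided two-proportion z-test: is the user's rate significantly higher?

    k_* is the count of the word, n_* the total word count of each text set.
    """
    p_user = k_user / n_user
    p_general = k_general / n_general
    pooled = (k_user + k_general) / (n_user + n_general)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_user + 1.0 / n_general))
    if std_err == 0.0:
        return False  # word absent (or universal) in both sets; nothing to test
    z = (p_user - p_general) / std_err
    return z > z_crit


# A technical term: far more common for this user than in general text.
print(significantly_more_frequent(50, 10_000, 100, 1_000_000))
# A function word: equally common in both, so it is not highlighted.
print(significantly_more_frequent(500, 10_000, 50_000, 1_000_000))
```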
[0129]
(Description by example)
For example, it is assumed that the following text is input from the input unit 6.
[0130]
“In natural language, verbs are sometimes omitted. Restoring these omitted verbs is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the omitted verbs are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the verb that supplements the verb omission appears in the text. In an experiment on novels, the test samples could be analyzed with an accuracy of 84% recall and 82% precision. This shows that the method is effective. Accuracy was very high when the verb to be supplemented was in the text. In contrast, accuracy was not so good when the verb to be supplemented was not in the text. However, considering the difficulty of the problem when the verb to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
Here, the user is assumed to be a person specializing in linguistics. In this case, it is assumed that the frequencies of use of the linguistic expressions “language,” “verb,” and “omission” in the personal environment are significantly higher than their appearance frequencies in general text.
[0131]
With this method, because a significance test (statistical test) is used, general nouns such as “koto” and “uchi” are automatically left unhighlighted, since their frequencies are not significantly high. In that case, the text is highlighted as follows.
[0132]
“In natural <<language>>, <<verbs>> are sometimes <<omitted>>. Restoring these <<omitted>> <<verbs>> is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the <<omitted>> <<verbs>> are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the <<verb>> that supplements the <<verb>> <<omission>> appears in the text. In an experiment on novels, the test samples could be analyzed with an accuracy of 84% recall and 82% precision. This shows that the method is effective. Accuracy was very high when the <<verb>> to be supplemented was in the text. In contrast, accuracy was not so good when the <<verb>> to be supplemented was not in the text. However, considering the difficulty of the problem when the <<verb>> to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
The highlighted portions are enclosed in << and >>. In this way, words that the user uses frequently, that is, words of great interest to that person, are highlighted. This is convenient because the user can find the paragraphs and sentences containing words of interest.
[0133]
(5): Explanation of operation of personal environment difference enhancement device (5)
① Based on the user's daily reading and writing behavior, the user's reading data and writing data are stored by the processing of the flowchart of the frequency storage unit 3 (see FIGS. 5 and 7).
[0134]
② The difference enhancement device 9 highlights only those words in the input text whose appearance probability for the individual user is larger than a fixed multiple of their appearance probability in a general text set.
[0135]
The appearance probability of word A is the number of appearances of word A divided by the total number of appearances of all words. It can be obtained as long as a general text set is available. In this case, since significance is not judged, there is no need to use a difficult method such as a statistical test.
[0136]
Simply compute the word's appearance probability for the individual user and its appearance probability in the general text set, divide the former by the latter, and highlight only those words for which the resulting value is greater than a predetermined value.
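The ratio test of operation (5) can be sketched as follows. The comparison is written as a multiplication, `p_user > factor * p_general`, which is equivalent to the division described above whenever the general probability is nonzero and avoids dividing by zero for words unseen in the general set. The name `ratio_highlight`, the factor 5.0, and the counts are assumptions.

```python
def ratio_highlight(user_counts, general_counts, factor=5.0):
    """Return the words whose personal probability exceeds factor x general probability."""
    n_user = sum(user_counts.values())
    n_general = sum(general_counts.values())
    selected = set()
    for word, count in user_counts.items():
        p_user = count / n_user
        p_general = general_counts.get(word, 0) / n_general
        # Assumption: a word unseen in the general set (p_general == 0) is selected.
        if p_user > factor * p_general:
            selected.add(word)
    return selected


# Hypothetical counts: "verb" is over-represented for this user, "koto" is not.
user = {"verb": 30, "koto": 50, "other": 920}
general = {"verb": 10, "koto": 400, "other": 9590}
print(ratio_highlight(user, general))
```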
[0137]
(Description by example)
For example, it is assumed that the following text is input from the input unit 6.
[0138]
“In natural language, verbs are sometimes omitted. Restoring these omitted verbs is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the omitted verbs are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the verb that supplements the verb omission appears in the text. In an experiment on novels, the test samples could be analyzed with an accuracy of 84% recall and 82% precision. This shows that the method is effective. Accuracy was very high when the verb to be supplemented was in the text. In contrast, accuracy was not so good when the verb to be supplemented was not in the text. However, considering the difficulty of the problem when the verb to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
Here, the user is assumed to be a person specializing in linguistics. In this case, it is assumed that the frequencies of use of the linguistic expressions “language,” “verb,” and “omission” in the personal environment are higher than the frequencies obtained by multiplying their appearance frequencies in general text by a predetermined value. With this method as well, general nouns such as “koto” and “uchi” are not highlighted, because they do not appear proportionally more often for this user. In that case, the text is highlighted as follows.
[0139]
“In natural <<language>>, <<verbs>> are sometimes <<omitted>>. Restoring these <<omitted>> <<verbs>> is indispensable for realizing dialogue systems and high-quality machine translation systems. Therefore, in this study, the <<omitted>> <<verbs>> are supplemented using surface expressions (cue words) and examples. When creating the rules for analysis, we classified cases according to whether the <<verb>> that supplements the <<verb>> <<omission>> appears in the text. In an experiment on novels, the test samples could be analyzed with an accuracy of 84% recall and 82% precision. This shows that the method is effective. Accuracy was very high when the <<verb>> to be supplemented was in the text. In contrast, accuracy was not so good when the <<verb>> to be supplemented was not in the text. However, considering the difficulty of the problem when the <<verb>> to be supplemented is not in the text, being able to analyze even a little is worthwhile. In addition, as corpora grow, computer performance improves, and large-scale corpora become usable, the example-based method proposed in this paper will become important.”
The highlighted portions are enclosed in << and >>. In this way, words that the user uses frequently, that is, words of great interest to that person, are highlighted. This is convenient because the user can find the paragraphs and sentences containing words of interest.
[0140]
As a reference on full-text search technology, see, for example, Baba, “Construction and Use of a Japanese Full-Text Search System,” SOFT BANK.
[0141]
Further, in the above-described embodiment, description has been given with double angle brackets as the highlight display, but other highlight displays such as underline, color coding, background change, font change, and blinking can also be performed.
[0142]
Further, in the above embodiment, the frequency storage unit 3 has been described as obtaining the appearance frequency of character strings input from the read/write system in the personal environment. However, it may instead obtain the appearance frequency of character strings input from only the reading system or only the writing system.
[0143]
§3: Explanation of program installation
The personal environment frequency storage device 2, frequency storage means 3a, frequency storage unit 3, language conversion unit 4a, language conversion unit 4, read/write input unit 5, input unit 6, output unit 7, language auxiliary conversion unit 8, difference enhancement device 9, deletion word storage unit 10, storage means 13a, and the like can be configured by a program, which is stored in the main memory and executed by the main control unit (CPU). This program is generally processed by a computer. The computer is composed of hardware such as a main control unit, main memory, a file device, a display device, and an input device serving as input means, such as a keyboard. The program of the present invention is installed on this computer. For installation, the program is stored in advance on a portable recording (storage) medium such as a floppy disk or a magneto-optical disk and is installed into the file device of the computer via a drive device provided in the computer for accessing the recording medium, or via a network such as a LAN. The program steps necessary for processing are then read from the file device into the main memory and executed by the main control unit.
[0144]
【Effects of the Invention】
As described above, the present invention has the following effects.
[0145]
(1): The frequency storage means extracts arbitrary character strings from the read/write data in the personal environment input from the read/write input unit and counts the occurrences of each extracted character string. Using this frequency information, language conversion of input sentences to suit the personal environment and difference emphasis can be performed easily.
[0146]
(2): Because the frequency storage means obtains the appearance frequency of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, emphasis can be placed on "writing," which leaves the stronger impression.
[0147]
(3): Because the appearance frequency of arbitrary character strings is obtained with a reading input detection unit that does not detect character strings whose display time is short, character strings that were displayed only briefly can be excluded.
[0148]
(4): Because the frequency storage means deletes old data from the read/write data input from the read/write input unit before obtaining the appearance frequency of arbitrary character strings, it can respond to changes in the personal environment.
[0149]
(5): Because the frequency storage means obtains the appearance frequency of arbitrary character strings while reducing the weight of older items in the read/write data input from the read/write input unit, it can respond to changes with emphasis on the recent personal environment frequency.
[0150]
(6): Because the language conversion means converts an input character string into a character string with a high appearance frequency stored in the personal environment frequency storage device and outputs it, the expression can be made easy to understand for each individual.
[0151]
(7): Because the language conversion means outputs, for the input character strings, the character string with a high appearance frequency stored in the personal environment frequency storage device as auxiliary notation in parentheses, the misunderstandings and changes in sentence meaning that complete rewriting could cause can be prevented.
[0152]
(8): Because the difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device is below a threshold, character strings and words the user is weak at can be found easily.
[0153]
(9): Because the difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device is at or above a threshold, character strings and words of great interest to the user are highlighted, and paragraphs and sentences of interest can be found easily.
[0154]
(10): The personal environment frequency storage device extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit, which accepts writing input in the personal environment, and counts the occurrences of each extracted character string. Because the language conversion means outputs, for the input character strings, the character string with a high appearance frequency stored in the personal environment frequency storage device as auxiliary notation in parentheses, the misunderstandings and changes in sentence meaning that complete rewriting could cause can be prevented.
[0155]
(11): The personal environment frequency storage device extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit, which accepts writing input in the personal environment, and counts the occurrences of each extracted character string. Because the difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device is below a threshold, character strings and words the user is weak at can be found easily.
[0156]
(12): An extraction unit, which is the unit to be output as a difference between input character strings, and a detection area, which is the unit of area compared in order to detect differences between input character strings, are set. The difference enhancement device repeats the following for each detection area: the extraction means extracts everything in the input character strings that corresponds to the extraction unit from the areas other than the current detection area and stores it in the storage means, determines that whatever in the current detection area corresponds to the extraction unit and is not stored in the storage means should be highlighted, and outputs the document of the current detection area. Because, among the input character strings so determined, only those whose appearance frequency stored in the personal environment frequency storage device is below a certain threshold are highlighted, fewer places are highlighted, and places where an expression the user is weak at first appears can be identified easily.
[0157]
(13): An extraction unit, which is the unit to be output as a difference between input character strings, and a detection area, which is the unit of area compared in order to detect differences between input character strings, are set. The difference enhancement device repeats the following for each detection area: the extraction means determines that whatever in the current detection area of the input document data corresponds to the extraction unit and is not stored in the storage means should be highlighted, outputs the document of the current detection area, and stores what was so determined in the storage means. Because, among the input character strings so determined, only those whose appearance frequency stored in the personal environment frequency storage device is below a certain threshold are highlighted, fewer places are highlighted, and places where an expression the user is weak at first appears can be identified easily.
[0158]
(14): The personal environment frequency storage device extracts arbitrary character strings from the writing data in the personal environment input from the writing input unit, which accepts writing input in the personal environment, and counts the occurrences of each extracted character string. Because the difference enhancement device highlights those input character strings whose appearance frequency stored in the personal environment frequency storage device is at or above a threshold, character strings and words of great interest to the user are highlighted, and paragraphs and sentences of interest can be found easily.
[0159]
(15): Because the difference enhancement device highlights only those input character strings whose appearance probability for the user is significantly higher than their appearance probability in a general text set, general nouns whose appearance probability is not significantly higher are no longer highlighted. Character strings and words of greater interest to the user are highlighted, and paragraphs and sentences containing them can be found easily.
[0160]
(16): Because the difference enhancement device highlights only those input character strings whose appearance probability for the user is greater than a fixed multiple of their appearance probability in a general text set, general nouns whose appearance probability does not exceed that multiple are no longer highlighted. Character strings and words of greater interest to the user are highlighted, and paragraphs and sentences containing them can be found easily.
[0161]
(17): This is a program, or a recording medium on which the program is recorded, for causing a computer to function as read/write input means that accepts read/write input in a personal environment, and as frequency storage means that extracts arbitrary character strings from the read/write data in the personal environment input from the read/write input means and counts the occurrences of each extracted character string. By installing this program on a computer, a personal environment frequency storage device that can obtain frequency information of character strings in the personal environment can be provided easily.
[0162]
(18): This is a program, or a recording medium on which the program is recorded, for causing a computer to function as read/write input means that accepts read/write input in a personal environment, as frequency storage means that extracts arbitrary character strings from the read/write data in the personal environment input from the read/write input means and counts the occurrences of each extracted character string to obtain its appearance frequency, and as language conversion means that converts an input character string into a character string with a high appearance frequency stored in the frequency storage means and outputs it. By installing this program on a computer, a personal environment language conversion device that makes expressions easy to understand for each individual can be provided easily.
[0163]
(19): This is a program, or a recording medium on which the program is recorded, for causing a computer to function as read/write input means that accepts read/write input in a personal environment, as frequency storage means that extracts arbitrary character strings from the read/write data in the personal environment input from the read/write input means and counts the occurrences of each extracted character string to obtain its appearance frequency, and as language conversion means that outputs, for the input character strings, the character string with a high appearance frequency stored in the frequency storage means as auxiliary notation in parentheses. By installing this program on a computer, a personal environment language conversion device that prevents the misunderstandings and changes in sentence meaning that complete rewriting could cause can be provided easily.
[0164]
(20): This is a program, or a recording medium on which the program is recorded, for causing a computer to function as read/write input means that accepts read/write input in a personal environment, as frequency storage means that extracts arbitrary character strings from the read/write data in the personal environment input from the read/write input means and counts the occurrences of each extracted character string to obtain its appearance frequency, and as a difference enhancement device that highlights those input character strings whose appearance frequency stored in the frequency storage means is at or below a threshold. By installing this program on a computer, a personal environment difference enhancement device with which character strings and words the user is weak at can be found easily can be provided easily.
[0165]
(21): This is a program, or a recording medium on which the program is recorded, for causing a computer to function as read/write input means that accepts read/write input in a personal environment, as frequency storage means that extracts arbitrary character strings from the read/write data in the personal environment input from the read/write input means and counts the occurrences of each extracted character string to obtain its appearance frequency, and as a difference enhancement device that highlights those input character strings whose appearance frequency stored in the frequency storage means is at or above a threshold. By installing this program on a computer, a personal environment difference enhancement device with which paragraphs and sentences of interest to the user can be found easily can be provided easily.
[0166]
(22): This is a program, or a recording medium on which the program is recorded, for causing a computer to function as writing input means that accepts writing input in a personal environment, as frequency storage means that extracts arbitrary character strings from the writing data in the personal environment input from the writing input means and counts the occurrences of each extracted character string, and as a difference enhancement device. In the difference enhancement device, an extraction unit, which is the unit to be output as a difference between input character strings, and a detection area, which is the unit of area compared in order to detect differences between input character strings, are set; for each detection area, the extraction means determines that whatever in the current detection area of the input document data corresponds to the extraction unit and is not stored in the storage means should be highlighted, outputs the document of the current detection area, and stores what was so determined in the storage means; and, among the input character strings so determined, those whose appearance frequency stored in the frequency storage means is below a certain threshold are highlighted. By installing this program on a computer, a personal environment difference enhancement device in which few places are highlighted and places where an expression the user is weak at first appears can be identified easily can be provided easily.
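The frequency-counting behavior summarized in effects (2) to (5) can be illustrated with a minimal Python sketch. It is not the implementation described in this publication: the class name `FrequencyStore`, the weights, the minimum display time, and the decay factor are all hypothetical values chosen for illustration, and whitespace splitting stands in for the morphological analyzer or full-text search engine.

```python
from collections import Counter

# All constants below are assumptions for illustration only.
READ_WEIGHT = 1.0       # weight of a string the user read
WRITE_WEIGHT = 2.0      # heavier weight of a string the user wrote
MIN_DISPLAY_SEC = 5.0   # shorter display times are not counted as reading
DECAY = 0.5             # factor applied to older counts at each aging step


class FrequencyStore:
    """Toy weighted frequency table over whitespace-separated words."""

    def __init__(self):
        self.freq = Counter()

    def add_written(self, text):
        # Written strings are counted with the heavier weight.
        for word in text.split():
            self.freq[word] += WRITE_WEIGHT

    def add_read(self, text, displayed_sec):
        # Text displayed only briefly is not counted as reading input.
        if displayed_sec < MIN_DISPLAY_SEC:
            return
        for word in text.split():
            self.freq[word] += READ_WEIGHT

    def age(self):
        # Reduce the weight of everything counted so far.
        for word in self.freq:
            self.freq[word] *= DECAY


store = FrequencyStore()
store.add_read("corpus analysis", displayed_sec=10.0)
store.add_read("skimmed banner", displayed_sec=1.0)  # too brief, ignored
store.age()
store.add_written("corpus")
print(dict(store.freq))  # {'corpus': 2.5, 'analysis': 0.5}
```

Deleting old data outright, as in effect (4), would correspond to a decay factor of zero for entries older than some cutoff.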
[Brief description of the drawings]
FIG. 1 is a diagram illustrating the principle of the present invention.
FIG. 2 is an explanatory diagram of a personal environment language conversion device according to the embodiment.
FIG. 3 is an explanatory diagram in the case of using a language auxiliary conversion unit in the embodiment.
FIG. 4 is an explanatory diagram (1) of a frequency storage unit in the embodiment.
FIG. 5 is a process flowchart (1) of a frequency storage unit according to the embodiment.
FIG. 6 is an explanatory diagram (2) of the frequency storage unit in the embodiment.
FIG. 7 is a process flowchart (2) of the frequency storage unit in the embodiment.
FIG. 8 is an explanatory diagram of a language conversion unit in the embodiment.
FIG. 9 is an explanatory diagram of a personal environment difference enhancement device according to an embodiment.
FIG. 10 is an explanatory diagram of a personal environment difference emphasizing apparatus using a deletion word storage unit in the embodiment.
[Explanation of symbols]
2 Personal environment frequency storage device
3a Frequency storage means
4a Language conversion means
5 Read/write input unit
6 Input unit
7 Output unit
13a Storage means

Claims

A personal environment language conversion device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
language conversion means that uses a language conversion dictionary to convert an input sentence into character strings of the same meaning in the language of the personal environment and outputs the result,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency information of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the language conversion means converts a character string of the input sentence into the character string that, among those given the same meaning as that character string by the language conversion dictionary, has the highest appearance frequency stored in the frequency storage means, and outputs it.
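The conversion step of the claim above, replacing a word with its most frequent synonym in the personal frequency data, might be sketched as follows. The synonym groups, the frequency table, and the function names are hypothetical; the publication does not specify the dictionary format, and whitespace splitting is a simplification.

```python
# Hypothetical synonym groups standing in for the language conversion dictionary.
SYNONYMS = [
    {"purchase", "buy", "acquire"},
    {"commence", "begin", "start"},
]

# Hypothetical weighted appearance frequencies from the frequency storage means.
PERSONAL_FREQ = {"buy": 12.0, "purchase": 3.0, "start": 7.5, "begin": 7.0}


def convert_word(word, synonyms=SYNONYMS, freq=PERSONAL_FREQ):
    """Return the same-meaning word with the highest personal frequency."""
    for group in synonyms:
        if word in group:
            # sorted() makes tie-breaking deterministic.
            return max(sorted(group), key=lambda w: freq.get(w, 0.0))
    return word  # no dictionary entry: leave the word unchanged


def convert_sentence(sentence):
    return " ".join(convert_word(w) for w in sentence.split())


print(convert_sentence("we commence to purchase parts"))
# we start to buy parts
```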

A personal environment language conversion device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
language conversion means that uses a language conversion dictionary to convert an input sentence into character strings of the same meaning in the language of the personal environment and outputs the result,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency information of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the language conversion means obtains, for a character string of the input sentence, the character string that, among those given the same meaning by the language conversion dictionary, has the highest appearance frequency stored in the frequency storage means, and outputs the obtained character string as auxiliary notation in parentheses near the character string of the input sentence that has the same meaning.
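The auxiliary-notation variant in the claim above keeps the original word and appends the user's most frequent synonym in parentheses. A minimal sketch, again with a hypothetical dictionary and frequency table and with whitespace tokenization as a simplifying assumption:

```python
# Hypothetical stand-ins for the language conversion dictionary and the
# frequencies held by the frequency storage means.
SYNONYMS = [
    {"purchase", "buy", "acquire"},
    {"commence", "begin", "start"},
]
PERSONAL_FREQ = {"buy": 12.0, "purchase": 3.0, "start": 7.5, "begin": 7.0}


def annotate_sentence(sentence, synonyms=SYNONYMS, freq=PERSONAL_FREQ):
    """Append the most familiar synonym in parentheses after each word."""
    out = []
    for word in sentence.split():
        best = word
        for group in synonyms:
            if word in group:
                best = max(sorted(group), key=lambda w: freq.get(w, 0.0))
                break
        # Annotate only when a more familiar synonym exists.
        out.append(word if best == word else f"{word} ({best})")
    return " ".join(out)


print(annotate_sentence("we commence to purchase parts"))
# we commence (start) to purchase (buy) parts
```

Because the original wording stays in place, a poor dictionary match cannot silently change the meaning of the sentence.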

A personal environment difference enhancement device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
a difference enhancement device that highlights differences between the character strings of an input sentence and the character strings stored in the frequency storage means,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency information of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the difference enhancement device highlights those character strings of the input sentence whose appearance frequency stored in the frequency storage means is at or below a threshold.

A personal environment difference enhancement device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
a difference enhancement device that highlights differences between the character strings of an input sentence and the character strings stored in the frequency storage means,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency information of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the difference enhancement device highlights those character strings of the input sentence whose appearance frequency stored in the frequency storage means is at or above a threshold.
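The two threshold rules in the preceding claims (highlight at or below a threshold to find unfamiliar words, or at or above a threshold to find words of interest) can be sketched with one function. The function name, the `mode` parameter, the ASCII `<<` `>>` markers, and the sample frequencies are assumptions for illustration.

```python
def highlight(sentence, freq, threshold, mode="low"):
    """Wrap words in double angle brackets according to personal frequency.

    mode="low":  highlight words with frequency <= threshold (unfamiliar words)
    mode="high": highlight words with frequency >= threshold (words of interest)
    """
    out = []
    for word in sentence.split():
        f = freq.get(word, 0.0)  # unseen words count as frequency 0
        hit = f <= threshold if mode == "low" else f >= threshold
        out.append(f"<<{word}>>" if hit else word)
    return " ".join(out)


# Hypothetical personal frequencies.
freq = {"the": 50.0, "report": 8.0, "shows": 6.0}
text = "the report shows heteroscedasticity"

print(highlight(text, freq, threshold=2.0, mode="low"))
# the report shows <<heteroscedasticity>>
print(highlight(text, freq, threshold=8.0, mode="high"))
# <<the>> <<report>> shows heteroscedasticity
```

The second call also marks the function word "the", which illustrates why a plain upper threshold tends to pick up generally common words.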

A personal environment difference enhancement device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
a difference enhancement device in which an extraction unit, which is the unit to be output as a difference between input character strings, and a detection area, which is the unit of area compared in order to detect differences between input character strings, are set, and which repeats, for each detection area, a process in which extraction means extracts everything in the input character strings that corresponds to the extraction unit from the areas other than the current detection area and stores it in storage means, determines that whatever in the current detection area corresponds to the extraction unit and is not stored in the storage means should be highlighted, and outputs the document of the current detection area,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the difference enhancement device highlights, among the input character strings determined by the extraction means to be highlighted, those whose appearance frequency stored in the frequency storage means is at or below a threshold.

A personal environment difference enhancement device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
a difference enhancement device in which an extraction unit, which is the unit to be output as a difference between input character strings, and a detection area, which is the unit of area compared in order to detect differences between input character strings, are set, and which repeats, for each detection area, a process in which extraction means determines that whatever in the current detection area of the input document data corresponds to the extraction unit and is not stored in storage means should be highlighted, outputs the document of the current detection area, and stores what was determined to be highlighted in the storage means,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the difference enhancement device highlights, among the input character strings determined by the extraction means to be highlighted, those whose appearance frequency stored in the frequency storage means is at or below a threshold.
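One reading of the sequential procedure in the claim above is: walk through the detection areas in order, treat items not yet held in the storage means as highlight candidates, keep only the candidates with low personal frequency, and then store the candidates. The sketch below follows that reading under simplifying assumptions: the extraction unit is a whitespace-separated word, each detection area is one string, and the function name and sample data are hypothetical.

```python
def highlight_first_weak(areas, freq, threshold):
    """Highlight low-frequency words in the area where they first appear."""
    stored = set()   # plays the role of the storage means
    outputs = []
    for area in areas:
        words = area.split()
        # Candidates: extraction units not yet held in the storage means.
        candidates = {w for w in words if w not in stored}
        marked = [
            f"<<{w}>>" if w in candidates and freq.get(w, 0.0) <= threshold else w
            for w in words
        ]
        outputs.append(" ".join(marked))
        # Store the candidates after the area has been output.
        stored |= candidates
    return outputs


# Hypothetical personal frequencies and detection areas (e.g. paragraphs).
freq = {"the": 40.0, "kernel": 1.0, "recovers": 9.0}
areas = ["the kernel panics", "the kernel recovers"]
print(highlight_first_weak(areas, freq, threshold=2.0))
# ['the <<kernel>> <<panics>>', 'the kernel recovers']
```

In the second area "kernel" is no longer a candidate, so the number of highlighted places stays small.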

A personal environment difference enhancement device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
a difference enhancement device that highlights differences between the character strings of an input sentence and the character strings stored in the frequency storage means,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency information of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the difference enhancement device uses a statistical test to highlight only those character strings of the input sentence whose appearance probability for the individual user, obtained from the character string appearance frequencies in the frequency storage means, is significantly higher than the appearance probability obtained from a general text set.
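The claim above does not name a particular statistical test in this section. As one possible illustration, the sketch below uses a one-sided z-test with a normal approximation to the binomial distribution, treating the general-text probability as a known constant. The function name, the significance level, and the sample counts are assumptions, not values from the publication.

```python
import math


def significantly_more_frequent(k, n, p_general, alpha=0.05):
    """One-sided z-test: is the personal probability k/n above p_general?

    k: occurrences of the string in the personal read/write data
    n: total number of tokens in the personal data
    p_general: appearance probability in a general text set (0 < p < 1)
    """
    p_personal = k / n
    standard_error = math.sqrt(p_general * (1.0 - p_general) / n)
    z = (p_personal - p_general) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return p_value < alpha


# A string seen 30 times in 10,000 personal tokens vs. 0.1% in general text.
print(significantly_more_frequent(30, 10_000, 0.001))  # True
# A string seen 11 times: slightly above 0.1%, but not significantly so.
print(significantly_more_frequent(11, 10_000, 0.001))  # False
```

For very small counts an exact binomial test would be a safer choice than the normal approximation used here.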

A personal environment difference enhancement device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
a difference enhancement device that highlights differences between the character strings of an input sentence and the character strings stored in the frequency storage means,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency information of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the difference enhancement device highlights only those character strings of the input sentence whose appearance probability in the individual's read/write data, obtained from the character string appearance frequencies in the frequency storage means, is greater than a fixed multiple of the appearance probability obtained from a general text set.
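The fixed-multiple rule in the claim above compares the personal appearance probability with a multiple of the general one. A minimal sketch follows; the multiple of 3, the sample counts and probabilities, the function name, and the choice to leave words without a general-text probability unhighlighted are all assumptions.

```python
def highlight_by_ratio(sentence, personal_counts, total, general_prob, factor=3.0):
    """Highlight words whose personal probability exceeds factor * general."""
    out = []
    for word in sentence.split():
        p_personal = personal_counts.get(word, 0) / total
        p_general = general_prob.get(word)
        # Words with no general-text estimate are left unhighlighted (assumption).
        if p_general is not None and p_personal > factor * p_general:
            out.append(f"<<{word}>>")
        else:
            out.append(word)
    return " ".join(out)


# Hypothetical counts over 10,000 personal tokens and general probabilities.
personal_counts = {"the": 500, "parser": 50}
general_prob = {"the": 0.05, "parser": 0.0005}

print(highlight_by_ratio("the parser fails", personal_counts, 10_000, general_prob))
# the <<parser>> fails
```

Here "the" is frequent for the user but equally frequent in general text, so it is not highlighted, while "parser" is ten times more probable for the user than in general text.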

A program for causing a computer to function as a personal environment language conversion device comprising a personal environment frequency storage device, which has a read/write input unit that accepts reading and writing input in a personal environment and frequency storage means that stores the read/write data input to the read/write input unit and obtains frequency information of character strings either by counting occurrences of arbitrary character strings with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of each word, and
language conversion means that uses a language conversion dictionary to convert an input sentence into character strings of the same meaning in the language of the personal environment and outputs the result,
wherein a reading input detection unit detects, as reading input from the read/write input unit, an arbitrary character string in a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit detects, as writing input, an arbitrary character string entered in order to compose a sentence,
the frequency storage means obtains the appearance frequency information of character strings by weighting the appearance frequency of arbitrary character strings detected by the writing input detection unit more heavily than that of arbitrary character strings detected by the reading input detection unit, and
the language conversion means converts a character string of the input sentence into the character string that, among those given the same meaning as that character string by the language conversion dictionary, has the highest appearance frequency stored in the frequency storage means, and outputs it.

A program for a personal environment language conversion device that comprises: a personal environment frequency storage device having a read/write input unit for reading and writing in a personal environment, and frequency storage means that stores the read/write data input to the read/write input unit and retrieves frequency information for character strings, either by counting the occurrences of an arbitrary character string with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of an arbitrary word; and
language conversion means that uses a dictionary for language conversion to convert an input sentence into character strings of the same meaning in the language of the personal environment and outputs the result,
wherein the frequency storage means includes a reading input detection unit that detects, as reading input, an arbitrary character string input from the read/write input unit that is a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit that treats an arbitrary character string entered in order to compose a sentence as writing input and detects that writing input,
the frequency storage means retrieves appearance frequency information for character strings with the appearance frequency of a character string detected by the writing input detection unit weighted more heavily than that of a character string detected by the reading input detection unit, and
the program causes a computer to function as the language conversion means, which obtains, for a character string of the input sentence, the character string that, among the character strings listed in the dictionary for language conversion as having the same meaning, has the highest appearance frequency stored in the frequency storage means, and outputs the obtained character string as an auxiliary annotation enclosed in parentheses near the corresponding character string of the input sentence.
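The parenthesized annotation described in the claim above might look like the following sketch. The function name `annotate`, the plain `dict` of frequencies, and the choice to annotate only when the familiar word is more frequent than the original are assumptions for the example, not details taken from the claim.

```python
def annotate(tokens, synonyms, freq):
    """Append the most familiar same-meaning word in parentheses after a token."""
    annotated = []
    for tok in tokens:
        candidates = synonyms.get(tok, [])
        # Most frequent same-meaning word, or None when the dictionary has no entry.
        best = max(candidates, key=lambda w: freq.get(w, 0), default=None)
        # Assumption: annotate only if the candidate is more familiar than the original.
        if best is not None and freq.get(best, 0) > freq.get(tok, 0):
            annotated.append(f"{tok} ({best})")
        else:
            annotated.append(tok)
    return annotated


result = annotate(
    ["acquire", "books"],
    {"acquire": ["purchase", "buy"]},
    {"buy": 5, "purchase": 2, "books": 3},
)
```

The original wording is kept, and the familiar word is shown beside it, so the reader sees both.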

A program for a personal environment difference emphasis device that comprises: a personal environment frequency storage device having a read/write input unit for reading and writing in a personal environment, and frequency storage means that stores the read/write data input to the read/write input unit and retrieves frequency information for character strings, either by counting the occurrences of an arbitrary character string with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of an arbitrary word; and
a difference emphasis device that highlights the difference between the character strings of an input sentence and the character strings stored in the frequency storage means,
wherein the frequency storage means includes a reading input detection unit that detects, as reading input, an arbitrary character string input from the read/write input unit that is a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit that treats an arbitrary character string entered in order to compose a sentence as writing input and detects that writing input,
the frequency storage means retrieves appearance frequency information for character strings with the appearance frequency of a character string detected by the writing input detection unit weighted more heavily than that of a character string detected by the reading input detection unit, and
the program causes a computer to function as the difference emphasis device, which highlights those character strings of the input sentence whose appearance frequency stored in the frequency storage means is at or below a certain threshold.

A program for a personal environment difference emphasis device that comprises: a personal environment frequency storage device having a read/write input unit for reading and writing in a personal environment, and frequency storage means that stores the read/write data input to the read/write input unit and retrieves frequency information for character strings, either by counting the occurrences of an arbitrary character string with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of an arbitrary word; and
a difference emphasis device that highlights the difference between the character strings of an input sentence and the character strings stored in the frequency storage means,
wherein the frequency storage means includes a reading input detection unit that detects, as reading input, an arbitrary character string input from the read/write input unit that is a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit that treats an arbitrary character string entered in order to compose a sentence as writing input and detects that writing input,
the frequency storage means retrieves appearance frequency information for character strings with the appearance frequency of a character string detected by the writing input detection unit weighted more heavily than that of a character string detected by the reading input detection unit, and
the program causes a computer to function as the difference emphasis device, which highlights those character strings of the input sentence whose appearance frequency stored in the frequency storage means is at or above a certain threshold.
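The two preceding program claims differ only in the direction of the threshold comparison: one highlights character strings the user seldom encounters, the other those the user encounters often. A minimal sketch covering both directions is shown below; the function name `highlight_by_threshold`, the `mode` values, and the `[[...]]` marker are hypothetical.

```python
import operator


def highlight_by_threshold(tokens, freq, threshold, mode="rare"):
    """Mark tokens whose stored frequency is at or below ("rare")
    or at or above ("frequent") the threshold."""
    if mode == "rare":
        compare = operator.le
    elif mode == "frequent":
        compare = operator.ge
    else:
        raise ValueError("mode must be 'rare' or 'frequent'")
    return [
        f"[[{tok}]]" if compare(freq.get(tok, 0), threshold) else tok
        for tok in tokens
    ]


freq = {"the": 50, "ontology": 1}
rare = highlight_by_threshold(["the", "ontology"], freq, threshold=2, mode="rare")
frequent = highlight_by_threshold(["the", "ontology"], freq, threshold=10, mode="frequent")
```

A character string absent from the store is treated here as having frequency 0, so it is marked in "rare" mode; that default is an assumption of the sketch.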

A program for a personal environment difference emphasis device that comprises: a personal environment frequency storage device having a read/write input unit for reading and writing in a personal environment, and frequency storage means that stores the read/write data input to the read/write input unit and retrieves frequency information for character strings, either by counting the occurrences of an arbitrary character string with a full-text search engine or by dividing the data into words with a morphological analyzer and counting the appearance frequency of an arbitrary word; and
a difference emphasis device in which an extraction unit, which is the unit of the items to be output as differences in the input character strings, and a detection area, which is the unit of the areas compared in order to detect differences in the input character strings, are set, and which repeats the following for each detection area: extraction means extracts, from the areas of the input character strings other than the current detection area, everything corresponding to the extraction unit and stores it in storage means; whatever in the current detection area corresponds to the extraction unit and is not stored in the storage means is determined to be highlighted; and the document of the current detection area is output,
wherein the frequency storage means includes a reading input detection unit that detects, as reading input, an arbitrary character string input from the read/write input unit that is a portion displayed continuously on the screen for a predetermined time or longer, and a writing input detection unit that treats an arbitrary character string entered in order to compose a sentence as writing input and detects that writing input,
the frequency storage means retrieves the appearance frequency of character strings with the appearance frequency of a character string detected by the writing input detection unit weighted more heavily than that of a character string detected by the reading input detection unit, and
the program causes a computer to function as the difference emphasis device, which highlights, among the input character strings determined by the extraction means to be highlighted, those whose appearance frequency stored in the frequency storage means is less than a certain threshold.
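The area-by-area procedure of the claim above can be sketched as follows, assuming that each detection area is already split into extraction units (here, words). The function name `highlight_new_and_rare`, the list-of-lists input, and the `[[...]]` marker are assumptions made for the illustration.

```python
def highlight_new_and_rare(areas, freq, threshold):
    """For each detection area, mark units that appear in no other area
    and whose stored personal frequency is below the threshold."""
    results = []
    for i, area in enumerate(areas):
        # Everything extracted from the areas other than the current one.
        seen_elsewhere = {
            tok for j, other in enumerate(areas) if j != i for tok in other
        }
        marked = []
        for tok in area:
            is_difference = tok not in seen_elsewhere
            if is_difference and freq.get(tok, 0) < threshold:
                marked.append(f"[[{tok}]]")  # assumed highlight marker
            else:
                marked.append(tok)
        results.append(marked)
    return results


areas = [
    ["server", "restarted", "at", "noon"],
    ["server", "crashed", "at", "noon"],
]
result = highlight_new_and_rare(areas, {"restarted": 5, "crashed": 0}, threshold=2)
```

Both "restarted" and "crashed" are differences between the two areas, but only "crashed" is marked, because "restarted" is already familiar to this user according to the stored frequency.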