JP4041875B2

JP4041875B2 - Written word style conversion system and written word style conversion processing program

Info

Publication number: JP4041875B2
Application number: JP2001205888A
Authority: JP
Inventors: Masaki Murata (村田 真樹); Hitoshi Isahara (井佐原 均)
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2001-07-06
Filing date: 2001-07-06
Publication date: 2008-02-06
Anticipated expiration: 2021-07-06
Also published as: JP2003022266A

Description

[0001]
[Technical field of the invention]
The present invention relates to a written-language style conversion system that converts a character string of written text in a given natural language into a character string of written text in another style in the same natural language.
[0002]
[Prior art]
Machine translation is the typical example of processing that converts the expression of sentences or texts written in a natural language. Machine translation converts a sentence or text written in the natural language of one country into a sentence or text written in the natural language of another country.
[0003]
In contrast to machine translation, which converts text into the language of another country, systems that convert sentences or texts within the same natural language have also come into use. Examples include systems that automatically generate summaries and systems that revise (polish) text.
[0004]
In general, sentence conversion within the same natural language prepares a large number of conversion rules, each a pair consisting of a pre-conversion pattern (a word, phrase, sentence, or the like) and a post-conversion pattern. Pattern matching is used to find pre-conversion patterns that appear in the input sentence, and each pattern found is replaced with the corresponding post-conversion pattern.
[0005]
Furthermore, although summaries have been generated automatically and text has been revised as forms of sentence conversion within the same natural language, converting text in one particular individual's style, or in a general style, into another particular individual's style has not been done. An example would be converting a novel written by Ryunosuke Akutagawa into a novel in the style of Natsume Soseki.
[0006]
[Problems to be solved by the invention]
Conventional processing that converts sentences or texts within the same natural language generally applies conversion rules uniformly and does not evaluate the quality of the conversion result. Whether a good conversion is actually obtained therefore depends heavily on the quality of the conversion rules prepared in advance, and, depending on which rules are applied, the result can differ from the intended conversion.
[0007]
Moreover, improving conversion accuracy requires selecting only the conversion rules that are truly valid, and screening a large number of conversion rules is very laborious. For example, if the conversion rules include both a rule that converts the character string “A” into the character string “B” and, conversely, a rule that converts “B” into “A”, the conventional technology cannot obtain the intended correct conversion result.
[0008]
For this reason, with the conventional technology in particular, it has been difficult to build a system that converts an individual's writing style, or one that converts difficult sentences into plain sentences that even elementary school students can easily understand.
[0009]
The present invention aims to solve the above problems by providing a system that can appropriately convert input written text into a target style even when the rules for changing the expression of the text are not carefully selected but are a wide variety of rules, for example rules acquired automatically by a computer.
[0010]
[Means for Solving the Problems]
To solve the above problems, the present invention is a system that converts written text in a given natural language into written text in another style in the same natural language, and comprises transformation rule storage means and evaluation information storage means. The transformation rule storage means stores transformation rules, each of which paraphrases a first character string written in the natural language into a second character string having the same meaning. Each rule consists of a first character string and a second character string generated automatically by a computer from synonyms or synonymous phrases, which are obtained by extracting the explanatory texts of the same word from a plurality of different dictionary files and matching the linguistic information of the extracted explanatory texts against one another; the rules do not depend on the direction of the intended style conversion. The evaluation information storage means stores evaluation information for evaluating whether the expression obtained by transforming a character string is in the target style. The evaluation information consists of numerical information, a group of functions or subroutines, rules describing an evaluation method, or a combination of these, and is predetermined so that its evaluation scale gives a higher evaluation value to a conversion candidate the higher its appearance frequency or appearance probability among the examples in a database storing a set of texts in the destination style. When a character string subject to style conversion is input, transformation processing means transforms it using the transformation rules stored in the transformation rule storage means and generates a plurality of conversion candidates.
Next, evaluation processing means evaluates the generated conversion candidates using the evaluation function or evaluation rules stored in the evaluation information storage means, selects the expression with the best evaluation result, and outputs it as the written text converted into the target style.
[0011]
For example, a written-language style conversion system that converts the character string of an input difficult sentence into plain text can be realized by making the evaluation scale of the evaluation function or evaluation rules include the appearance frequency or appearance probability of each conversion candidate in a large number of examples drawn from a collection of plain texts, and by giving a higher evaluation to candidates with a higher appearance frequency or appearance probability.
[0012]
Similarly, a written-language style conversion system that converts the expression of an input character string into the style of a specific individual can be realized by making the evaluation scale include the appearance frequency or appearance probability of each conversion candidate in a large number of examples drawn from a collection of that individual's texts, and by giving a higher evaluation to candidates with a higher appearance frequency or appearance probability.
[0013]
The evaluation scale need not be the appearance frequency or appearance probability. What matters is that some evaluation scale is used to assess whether the text, once transformed, is in a style suited to the purpose, and that a highly evaluated transformation is selected as the conversion result. Consequently, the information prepared in advance as style transformation rules basically needs only to guarantee that the character strings are synonymous, and a wide variety of transformation rules can be used without careful selection. Nor is there any need to consider the direction of a transformation rule, that is, which side is the source and which the destination. For example, even when the transformation rules include both a rule that transforms the character string “A” into “B” and, conversely, a rule that transforms “B” into “A”, the evaluation ensures that the intended conversion result is ultimately obtained. Transformation rules are therefore easy to create, and rules prepared for one style conversion can be reused for a style conversion with a different purpose.
[0014]
The above means can be realized by a computer and a software program installed and executed on that computer, and the program can be stored on a suitable recording medium such as a computer-readable portable medium memory, a semiconductor memory, or a hard disk.
[0015]
[Embodiments of the invention]
FIG. 1 shows an example system configuration of the present invention. In the figure, reference numeral 1 denotes a written-language style conversion processing device realized by a computer comprising a CPU, memory, and the like; it includes a transformation processing unit 11, an evaluation processing unit 12, a transformation rule storage unit 14, and an evaluation information storage unit 15.
[0016]
The conversion target sentence 10 is the natural language sentence that serves as input to the system. Although not noted each time below, the conversion target sentence 10 is not necessarily limited to a single sentence and may be a passage, a phrase, a clause, or the like. The conversion result sentence 13 is the output of the system: the conversion target sentence 10 paraphrased in the same natural language in a style different from the original.
[0017]
The written-language style conversion processing device 1 basically consists of two modules, the transformation processing unit 11 and the evaluation processing unit 12. The transformation processing unit 11 obtains conversion candidates using the transformation rules stored in the transformation rule storage unit 14. The evaluation processing unit 12 evaluates how good each conversion candidate is, that is, whether it is in the style suited to the purpose, using an evaluation scale (an evaluation function or the like) stored in advance in the evaluation information storage unit 15, and selects the most highly evaluated candidate.
[0018]
When the conversion target sentence 10 is input, the transformation processing unit 11 lists conversion candidates using the transformation rules, and the evaluation processing unit 12 checks the appropriateness of each transformed style, selects the candidate judged most appropriate, and outputs it as the conversion result sentence 13.
[0019]
Each transformation rule stored in the transformation rule storage unit 14 consists of a first character string and a second character string generated automatically by a computer from synonyms or synonymous phrases. These are obtained by extracting the explanatory texts of the same word from a plurality of different dictionary files and matching the linguistic information of the extracted explanatory texts against one another. The transformation rules do not depend on the direction of the intended style conversion. The evaluation information for the evaluation function (evaluation scale) that evaluates conversion candidates is prepared to suit each problem being handled.
[0020]
The evaluation information may be numerical information for evaluation, or may be procedural information such as a function group or a subroutine group. Also, it may be a rule (rule) describing the evaluation method. It can also be realized by a combination of these. A typical example of the evaluation scale used in the evaluation processing unit 12 is an appearance frequency (or appearance probability) in a large amount of language data composed of a sentence set of a target style.
[0021]
For example, suppose that in this system, which changes the style of written text, all transformation rules used by the transformation processing unit 11 preserve synonymy. In that case, transforming the data to be converted so that its probability of appearance (occurrence) in a collection of texts in the target style becomes higher yields text very close to the target style.
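One way to state this criterion compactly is given below. The notation is introduced here only for illustration and does not appear in the patent: $s$ is the input text, $C(s)$ is the set of candidates reachable from $s$ by synonymy-preserving transformation rules (including $s$ itself), $T$ is the collection of texts in the target style, and $N_T$ is an assumed normalizing total such as the number of strings of comparable length in $T$.

```latex
\hat{c} \;=\; \operatorname*{arg\,max}_{c \,\in\, C(s)} P_T(c),
\qquad
P_T(c) \;\approx\; \frac{\operatorname{count}_T(c)}{N_T}
```

Because every candidate in $C(s)$ is assumed to mean the same thing as $s$, maximizing $P_T$ changes only the style, not the content.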
[0022]
To explain with a simpler example, suppose the input text is written in the so-called “です” (desu, polite) style and contains many occurrences of the string “〜です”, and suppose the transformation rules include one that transforms “〜です” into “〜である”. If the target is the “である” (de aru, plain) style, a database storing a large amount of text in that target style is prepared and used for evaluation. The occurrences of “〜です” and “〜である” in the database are counted. If “〜である” occurs more often, it is evaluated more highly than “〜です”. Through this evaluation, text in the “です” style is automatically converted into the “である” style.
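The counting step in this example can be sketched as follows. This is a minimal illustration, not the patented implementation: the four-sentence corpus and the helper name `count_occurrences` are assumptions made up for the example, and the endings “です” and “である” stand for the two sides of the rule.

```python
def count_occurrences(corpus: list[str], pattern: str) -> int:
    """Count non-overlapping occurrences of `pattern` across all corpus texts."""
    return sum(text.count(pattern) for text in corpus)


# Hypothetical miniature corpus written mostly in the target "である" style.
target_corpus = [
    "吾輩は猫である。",
    "名前はまだ無い。",
    "これは本である。",
    "あれは山です。",
]

# The two sides of the synonymy rule "です" <-> "である".
candidates = ["です", "である"]
scores = {c: count_occurrences(target_corpus, c) for c in candidates}

# The ending that is more frequent in the target-style corpus is preferred.
best = max(candidates, key=lambda c: scores[c])
```

Swapping in a corpus dominated by “です” would make the same code prefer the polite ending, which is the sense in which the corpus, not the rule, decides the direction.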
[0023]
Here, by changing the corpus for examining the appearance frequency (or appearance probability), the results of various style conversions can be obtained. For example, when the input data is a legal sentence, if a set of plain sentences is given as a corpus, it can be expected that a difficult sentence related to the law will be transformed into a plain sentence.
[0024]
Likewise, if prose from a novel written by some arbitrary author is supplied as input data and novels by Shakespeare are supplied as the corpus, a new novel in Shakespeare's style results. Similarly, it becomes possible to convert a novel by Ryunosuke Akutagawa into the style of Natsume Soseki.
[0025]
The transformation rules used in this system are generated automatically by a computer, for example as follows. First, pieces of linguistic information that are written in the same language and correspond semantically are extracted. Specifically, a plurality of different dictionary files are prepared, and the explanatory texts of the same word are extracted from them. Next, the extracted pieces of linguistic information are matched against one another, and synonyms or synonymous phrases are extracted from the result. From the extracted synonyms or synonymous phrases, transformation rules that paraphrase a first character string into a synonymous second character string are generated automatically.
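The text does not say how the explanatory texts are matched against one another. The sketch below assumes one simple possibility, a diff-style alignment using Python's standard `difflib.SequenceMatcher`, applied to definitions that are assumed to be already segmented into words. The function name `extract_rules` and the two sample definitions are hypothetical.

```python
from difflib import SequenceMatcher


def extract_rules(def_a: list[str], def_b: list[str]) -> set[tuple[str, str]]:
    """Align two tokenized definitions of the same headword and treat each
    differing span as a pair of synonymous strings.

    Both directions are emitted because the rules are meant to be
    direction-independent.
    """
    rules: set[tuple[str, str]] = set()
    matcher = SequenceMatcher(None, def_a, def_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # spans that differ between the two definitions
            left = "".join(def_a[i1:i2])
            right = "".join(def_b[j1:j2])
            rules.add((left, right))
            rules.add((right, left))
    return rules


# Hypothetical definitions of the same word taken from two dictionaries.
definition_a = ["職", "を", "やめさせる", "こと"]
definition_b = ["職", "を", "免ずる", "こと"]
rules = extract_rules(definition_a, definition_b)
```

Pairs obtained this way are noisy, which is acceptable under the scheme described here because unsuitable transformations are filtered out later by the evaluation step rather than by curating the rules.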
[0026]
FIG. 2 is a processing flowchart of the transformation processing unit 11. First, in step S10, the transformation processing unit 11 inputs the conversion target sentence 10 designated as the target of style conversion. The input method does not matter; it may be input from a keyboard, from a file, from an application program, and so on.
[0027]
In step S11, the transformation rules needed for the conversion are read from the transformation rule storage unit 14. If they have already been read, this step is unnecessary. In step S12, the input conversion target sentence 10 is transformed using the transformation rules, and the resulting candidate expressions are passed to the evaluation processing unit 12. The candidates may be passed one at a time or, when there are several, all together.
[0028]
In step S13, the evaluation processing unit 12 receives from the transformation processing unit 11 the candidate expressions obtained by transforming the conversion target sentence 10 and, using the evaluation information stored in the evaluation information storage unit 15, evaluates whether each is an expression suited to the target style. The evaluation information may be something like an evaluation function called by the evaluation processing unit 12, or something like parameters used by an evaluation function. An evaluation result is calculated as a numerical value (evaluation value) for each candidate. In step S14, the transformed expression with the best evaluation result is selected and output as the conversion result sentence 13.
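Steps S12 to S14 can be sketched end to end as below. This is a simplified illustration under stated assumptions: each rule is applied at most once to the whole input, the evaluation value is a plain occurrence count of the full candidate string, and the function names, rules, and two-sentence corpus are all hypothetical.

```python
def generate_candidates(sentence: str, rules: list[tuple[str, str]]) -> list[str]:
    """Step S12: apply each matching rule once; the input itself stays a candidate."""
    candidates = [sentence]
    for src, dst in rules:
        if src in sentence:
            candidates.append(sentence.replace(src, dst))
    return candidates


def evaluate(candidate: str, corpus: list[str]) -> int:
    """Step S13: evaluation value = occurrences of the candidate in the corpus."""
    return sum(text.count(candidate) for text in corpus)


def convert(sentence: str, rules: list[tuple[str, str]], corpus: list[str]) -> str:
    """Steps S12 to S14: generate candidates, score them, return the best one."""
    candidates = generate_candidates(sentence, rules)
    # max() keeps the earliest candidate on ties, so the unchanged input
    # is returned when no transformation scores strictly higher.
    return max(candidates, key=lambda c: evaluate(c, corpus))


# Hypothetical rule set containing both directions of the same synonym pair.
rules = [("罷免する", "やめさせる"), ("やめさせる", "罷免する")]
# Hypothetical corpus of plain-style text.
plain_corpus = ["王さまは大臣をやめさせる。", "先生が話をやめさせる。"]

result = convert("大臣を罷免する", rules, plain_corpus)
```

Keeping the untransformed input in the candidate list means a rule is only ever applied when the corpus supports it, which is why opposing rules can coexist in the same rule set.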
[0029]
Specific examples of applying the invention to various written-language style conversion systems are described below.
[0030]
(A) Application Example to Difficult Sentence Conversion System
FIG. 3 shows an application example to the difficult sentence conversion system. The difficult sentence conversion system shown in FIG. 3 performs processing such as rewriting legal text into plain sentences and rewriting difficult newspaper articles into easy sentences for elementary school students.
[0031]
For example, as in FIG. 3, suppose the sentence “大臣を罷免する” (“dismiss the minister”, using a formal verb) is input as the conversion target sentence 10. The transformation processing unit 11 paraphrases the conversion target sentence 10 into different expressions with the same meaning, using the transformation rules prepared in advance in the transformation rule storage unit 14. Suppose the transformation rules include
“罷免する” (dismiss) → “やめさせる” (make someone quit)
...
Applying this rule to the conversion target sentence 10, the transformation processing unit 11 generates the sentence “大臣をやめさせる” from “大臣を罷免する”. Various other transformation rules exist as well, so many transformed sentences are generated as candidates. These sentences are passed to the evaluation processing unit 12. The untransformed conversion target sentence 10 is also passed to the evaluation processing unit 12 as one of the candidates.
[0032]
The evaluation processing unit 12 evaluates the sentences produced by the transformation processing unit 11 using the evaluation information (evaluation function) prepared in advance in the evaluation information storage unit 15. Here the evaluation scale gives a high evaluation to conversions that raise the appearance frequency or appearance probability in a collection of texts for younger readers, such as texts for elementary school students. In this example, the evaluation processing unit 12 therefore obtains the appearance frequencies of “大臣を罷免する” and “大臣をやめさせる” in a predetermined collection of texts for younger readers. A simple method is to count how many times a short character string containing the transformed portion appears in the language data. If “大臣をやめさせる” appears more frequently, it is judged to be the easier expression for younger readers. The transformation is therefore accepted, and “大臣をやめさせる” is output as the conversion result sentence 13. The appearance (occurrence) probability may be calculated instead of the appearance frequency.
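The "short character string containing the transformed portion" can be sketched as a fixed window around the changed span. The window width of two characters on each side, the helper names, and the sample sentences are assumptions chosen for illustration; the text does not specify how wide the region should be.

```python
def window(sentence: str, start: int, length: int, context: int = 2) -> str:
    """Return the changed span plus `context` characters on each side."""
    lo = max(0, start - context)
    hi = min(len(sentence), start + length + context)
    return sentence[lo:hi]


def count(corpus: list[str], s: str) -> int:
    """Total non-overlapping occurrences of `s` in the corpus."""
    return sum(text.count(s) for text in corpus)


original = "昨日、首相は大臣を罷免する"
src, dst = "罷免する", "やめさせる"
start = original.index(src)
transformed = original[:start] + dst + original[start + len(src):]

# Hypothetical corpus of text for younger readers.
corpus = ["王さまは大臣をやめさせることにした。", "大臣をやめさせるのはだれ？"]

orig_window = window(original, start, len(src))    # span around "罷免する"
new_window = window(transformed, start, len(dst))  # span around "やめさせる"

whole_counts = (count(corpus, original), count(corpus, transformed))
window_counts = (count(corpus, orig_window), count(corpus, new_window))
```

Whole sentences rarely recur verbatim in a corpus, so both whole-sentence counts are zero here, while the windowed counts still separate the two candidates.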
[0033]
Further, the evaluation scale is not limited to the appearance frequency or appearance probability in a predetermined collection of texts; some other scale can also be used. For example, evaluation points can be assigned in advance to word collocations and to grammatical turns of phrase obtained from parsing results, and these points can be used for evaluation. Another possible method is to learn some measure of plainness from a collection of texts in the post-conversion style and to evaluate candidates according to it.
[0034]
Needless to say, instead of counting the appearance frequency of a conversion candidate in the collection of texts for younger readers each time an evaluation is made, a table holding the appearance frequency (appearance probability) of each sentence, phrase, clause, word, and so on that appears in the collection may be prepared in advance, and the appearance frequency (appearance probability) may be obtained by looking up that table.
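A precomputed table of this kind might look like the sketch below. The text speaks of tables per sentence, phrase, clause, or word; indexing character n-grams up to a fixed length is an assumption made here to avoid needing a segmenter, and the function name and corpus are hypothetical.

```python
from collections import Counter


def build_ngram_table(corpus: list[str], max_n: int = 8) -> Counter:
    """Count every character n-gram of length 1..max_n once, ahead of time."""
    table: Counter = Counter()
    for text in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                table[text[i:i + n]] += 1
    return table


# Hypothetical target-style corpus.
corpus = ["吾輩は猫である。", "これは本である。"]
table = build_ngram_table(corpus)

# Evaluation becomes a dictionary lookup; strings absent from the corpus
# (or longer than max_n) simply yield 0.
freq_dearu = table["である"]
freq_desu = table["です"]
```

The table trades memory for lookup speed, and unlike `str.count` it counts overlapping occurrences, which matters only for strings that can overlap themselves.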
[0035]
(B) Application Example to Personal Style Conversion System
FIG. 4 shows an application example to the personal style conversion system. The personal style conversion system shown in FIG. 4 rewrites, for example, a novel by Ryunosuke Akutagawa as a novel in the style of Natsume Soseki, or a novel by an unknown writer as a novel in the style of Shakespeare.
[0036]
For example, as in (1) of FIG. 4, suppose the sentence “大臣を罷免するなどを行った” (roughly, “did things such as dismissing the minister”) is input as the conversion target sentence 10. The transformation processing unit 11 paraphrases the conversion target sentence 10 into different synonymous expressions using the transformation rules prepared in advance in the transformation rule storage unit 14. Suppose that, as a transformation rule toward the style of a person who frequently uses the expression “といった”, there is the rule
“するなど” → “するといったこと”
...
Applying this rule to the conversion target sentence 10, the transformation processing unit 11 generates the sentence “大臣を罷免するといったことを行った” from “大臣を罷免するなどを行った”. Various other transformation rules exist as well, so many transformed sentences are generated as candidates. These sentences are passed to the evaluation processing unit 12. The untransformed conversion target sentence 10 is also passed to the evaluation processing unit 12 as one of the candidates.
[0037]
The evaluation processing unit 12 evaluates the sentences produced by the transformation processing unit 11 using the evaluation information (evaluation function) prepared in advance in the evaluation information storage unit 15. Here the evaluation scale gives a high evaluation to expressions whose appearance frequency or appearance probability is high in a collection of texts by the specific individual whose style is the target of the conversion. In this example, the evaluation processing unit 12 therefore obtains the appearance frequencies of the sentences “大臣を罷免するなどを行った” and “大臣を罷免するといったことを行った” in that individual's collection of texts. The appearance frequency need not be the number of occurrences of the whole sentence; it may be the number of times a short character string containing the transformed portion appears in the collection. If “大臣を罷免するといったことを行った” appears more frequently, the evaluation processing unit 12 outputs it as the conversion result sentence 13.
[0038]
Similarly, for conversion to the style of a person who frequently uses “であろう”, a transformation rule such as
“と思われる” (it is thought that) → “であろう” (probably will)
...
is used, together with an evaluation scale that favors expressions whose appearance frequency or appearance probability is high in the collection of texts by the specific individual who frequently uses “であろう”.
[0039]
When the sentence “大臣を罷免すると思われる” is input as the conversion target sentence 10, as in (2) of FIG. 4, the transformation processing unit 11 applies the transformation rule to it and transforms it into the expression “大臣を罷免するであろう”. If the evaluation by the evaluation processing unit 12 shows that “大臣を罷免するであろう” has the highest evaluation value, the evaluation processing unit 12 outputs this sentence as the conversion result sentence 13.
[0040]
Note that the evaluation scale is not limited to the appearance frequency and the appearance probability in a predetermined sentence set, and any other scale can be used, as in the system example described above.
[0041]
The application examples to the difficult sentence conversion system and the personal style conversion system have been described above, but the system can be applied in the same way to any conversion of the style of written text. The transformation rules can also be shared. For example, even when the rule set mixes rules that transform the style of author A into the style of author B with rules that transform the style of author B into the style of author A, the same rules can be used to convert from the style of author A to that of author B and, conversely, from the style of author B to that of author A, simply by changing the evaluation information. This is a major difference from the prior art. Of course, the transformation rules may also be screened so that only rules suited to each style conversion are used.
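The point that one mixed rule set serves both directions can be sketched as follows. This is a greedy simplification rather than the full generate-and-select procedure: for each rule it compares the corpus frequency of the rule's two sides and rewrites only when the destination side is strictly more frequent. The rule set, the two author corpora, and the function name are hypothetical.

```python
def convert(sentence: str, rules: list[tuple[str, str]], corpus: list[str]) -> str:
    """Apply a rule only if its right-hand side is more frequent in `corpus`."""
    def freq(s: str) -> int:
        return sum(text.count(s) for text in corpus)

    for src, dst in rules:
        if src in sentence and freq(dst) > freq(src):
            sentence = sentence.replace(src, dst)
    return sentence


# One shared rule set that mixes both directions.
RULES = [("と思われる", "であろう"), ("であろう", "と思われる")]

# Hypothetical text collections for two authors.
corpus_a = ["雨が降ると思われる。", "彼は来ると思われる。"]  # author A favors "と思われる"
corpus_b = ["雨が降るであろう。", "彼は来るであろう。"]      # author B favors "であろう"

a_to_b = convert("大臣を罷免すると思われる", RULES, corpus_b)
b_to_a = convert("大臣を罷免するであろう", RULES, corpus_a)
unchanged = convert("大臣を罷免すると思われる", RULES, corpus_a)
```

Only the corpus argument differs between the calls, which mirrors the statement that changing the evaluation information is enough to reverse the direction of conversion.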
[0042]
[Effects of the invention]
As described above, according to the present invention, style conversion for various purposes can be performed automatically. The character string transformation rules used for style conversion need not be directional rules that lead toward the target style. It is sufficient that they at least preserve synonymy, so rules can be collected and accumulated easily.
[Brief description of the drawings]
FIG. 1 is a diagram showing a system configuration example of the present invention.
FIG. 2 is a processing flowchart of sentence word style conversion.
FIG. 3 is a diagram illustrating an application example to a difficult sentence conversion system.
FIG. 4 is a diagram illustrating an application example to a personal style conversion system.
[Explanation of symbols]
1 Written word style conversion processing apparatus
10 Conversion target sentence
11 Transformation processing unit
12 Evaluation processing unit
13 Conversion result sentence
14 Transformation rule storage unit
15 Evaluation information storage unit

Claims

1. A system for converting written text described in a natural language into written text in another style described in the same natural language, comprising:
transformation rule storage means for storing transformation rules, each being a rule for paraphrasing a first character string described in the natural language into a second character string having the same meaning, the rules consisting of the first character string and the second character string generated automatically by a computer from synonyms or synonym phrases obtained by extracting explanatory sentences of the same word from a plurality of different dictionary files and matching the linguistic information of the extracted explanatory sentences, and the rules being independent of the direction of the target style conversion;
evaluation information storage means for storing evaluation information consisting of numerical information, a group of functions or subroutines, rules describing an evaluation method, or a combination of these, for evaluating whether the expression resulting from transforming a character string is in the target style, the evaluation information being determined in advance such that the evaluation scale defining it gives a higher evaluation value to a conversion candidate with a higher appearance frequency or appearance probability among the examples in a database storing a set of sentences in the conversion destination style;
input means for inputting a character string that is described in the natural language and is subject to style conversion;
transformation processing means for transforming the input character string using the transformation rules stored in the transformation rule storage means, and generating a plurality of conversion candidates;
evaluation processing means for calculating, for the plurality of conversion candidates generated by the transformation processing means, an evaluation value based on the evaluation scale using the evaluation information stored in the evaluation information storage means, evaluating the expression of each conversion candidate by the calculated evaluation value, and selecting the expression with the highest evaluation value; and
output means for outputting the conversion result of the selected expression as written text converted into the target style.

2. The written word style conversion system according to claim 1, wherein, when evaluating the expression of each conversion candidate, the evaluation processing means uses as the evaluation scale an appearance frequency indicating how many times the character string in a region including the transformed portion of the character string transformed in that conversion candidate appears in the sentence set in the database, and evaluates the expression of the conversion candidate by giving a higher evaluation value to a higher appearance frequency.

3. A written word style conversion processing program for realizing, on a computer, a system that converts written text described in a natural language into written text in another style described in the same natural language, the program causing the computer to function as:
transformation rule storage means for storing transformation rules, each being a rule for paraphrasing a first character string described in the natural language into a second character string having the same meaning, the rules consisting of the first character string and the second character string generated automatically from synonyms or synonym phrases obtained by extracting explanatory sentences of the same word from a plurality of different dictionary files and matching the linguistic information of the extracted explanatory sentences, and the rules being independent of the direction of the target style conversion;
evaluation information storage means for storing evaluation information consisting of numerical information, a group of functions or subroutines, rules describing an evaluation method, or a combination of these, for evaluating whether the expression resulting from transforming a character string is in the target style, the evaluation information being determined in advance such that the evaluation scale defining it gives a higher evaluation value to a conversion candidate with a higher appearance frequency or appearance probability among the examples in a database storing a set of sentences in the conversion destination style;
input means for inputting a character string that is described in the natural language and is subject to style conversion;
transformation processing means for transforming the input character string using the transformation rules stored in the transformation rule storage means, and generating a plurality of conversion candidates;
evaluation processing means for calculating, for the plurality of conversion candidates generated by the transformation processing means, an evaluation value based on the evaluation scale using the evaluation information stored in the evaluation information storage means, evaluating the expression of each conversion candidate by the calculated evaluation value, and selecting the expression with the highest evaluation value; and
output means for outputting the conversion result of the selected expression as written text converted into the target style.