JP2006227425A

JP2006227425A - Speech reproducing device and utterance support device

Info

Publication number: JP2006227425A
Application number: JP2005042916A
Authority: JP
Inventors: Masaki Murata (村田真樹); Hitoshi Isahara (井佐原均)
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2005-02-18
Filing date: 2005-02-18
Publication date: 2006-08-31
Anticipated expiration: 2025-02-18
Also published as: JP4811557B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech reproducing device and an utterance support device that improve communication with an audience by rephrasing words or phrases that are difficult for a speaker, or for speech synthesis, to pronounce.

SOLUTION: The utterance support device 1 includes an input means 21 for acquiring document text data into the device; a word string extracting means 22 for extracting a word string; a substitution candidate word string retrieving means 23 for retrieving substitutable candidate word strings by matching the word string against a synonym database 24; a pronunciation difficulty database 26 in which the difficulty of pronouncing each word string is quantitatively recorded in advance; a word string substituting means 25 for acquiring, from the pronunciation difficulty database 26, the pronunciation difficulty of both the word string to be substituted and each candidate word string, and selecting the word string with the minimum pronunciation difficulty for substitution; and an output means 12 for outputting a text for dictation.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to language processing technology, and in particular to a method of converting an original text so that it uses words and phrases that are easy to hear when output by speech synthesis, or words and phrases that are easy for the speaker to pronounce when speaking in a foreign language.

In recent years, artificially synthesized speech has become increasingly common in many everyday settings. Sound quality has improved considerably, and when a fixed phrase is to be output, easily intelligible speech can be achieved by specifying intonation and similar properties in fine detail.

However, when speech is synthesized from arbitrary input text, as when Web content is read aloud, the result is still hard to follow and the content may not be conveyed accurately. Research is also under way on making speech easier to follow, for example by applying appropriate intonation to the text, but this places a heavy processing load on speech synthesis and is ill-suited to cases such as high-speed playback.

Another case in which a listener has difficulty is speech in a foreign language: because the sounds are produced differently from those of the speaker's native language, accurate pronunciation is difficult and the intended meaning is hard to convey. At international conferences in particular, the manuscript is prepared in advance, so there are few grammatical errors and the choice of terms is accurate; nevertheless, the content often fails to reach the audience because the speaker's pronunciation is inadequate.
As is well known in foreign language learning, it is difficult for Japanese speakers, for example, to pronounce English L and R distinctly. This is thought to be because Japanese has no distinction corresponding to L and R, and both are pronounced with the single ra-ri-ru-re-ro (ラリルレロ) series of sounds.

Pronunciation can be improved greatly through training, but some sounds, such as the distinction between L and R, cannot be properly distinguished without considerable practice. For people who are not specialists such as interpreters, this burden is heavy. From the standpoint of conveying one's meaning clearly, the problem can sometimes be avoided simply by not using a hard-to-convey word, rather than insisting on the word and training to pronounce it accurately.

In Non-Patent Document 1, the present inventors proposed a system in which a transformation unit generates candidate transformations of an input sentence, an evaluation unit checks the validity of each transformation, and the sentence is transformed into the candidate judged most appropriate and then output.
That document introduces various measures that the evaluation unit can use, such as similarity between paraphrases, length, and frequency.

Non-Patent Document 1: Masaki Murata and Hitoshi Isahara, "A Unified Model of Paraphrasing: Use of Transformation Based on Measures," Proceedings of the Workshop of the 7th Annual Meeting of the Association for Natural Language Processing, 2001 (村田真樹、井佐原均「言い換えの統一的モデル尺度に基づく変形の利用」言語処理学会第７回年次大会ワークショップ論文集、２００１年)

For example, if length is used as the evaluation measure and shorter paraphrases are scored higher, sentences can be compressed.
If frequency is used as the measure and words are replaced with more frequent ones, a difficult text can be rephrased in plain words.

In that document the present inventors suggested a measure that disfavors words that are hard to pronounce, but no concrete study had been made of what that measure should be. Consequently, at the time of that document there had been no technical consideration of whether hard-to-pronounce words should be removed from the original text or replaced with other words, or of how any replacement should be made; the document went no further than stating an aspiration.

The above technique is also disclosed in, for example, Patent Document 1 below. That disclosure provides a system that can easily convert a sentence or text requiring several kinds of paraphrasing into the desired sentence or text.
Specifically, when a sentence to be converted is input, the transformation processing unit generates many conversion candidates using the transformation rules in the transformation rule storage unit. The evaluation processing unit evaluates the generated candidates using a plurality of evaluation measures that assess whether the transformed character string is an appropriate conversion for the purpose, and selects the character strings that score well. The highly rated character string is output as the converted sentence.
The evaluation measures can be selected by the evaluation measure selection unit, and the importance of each selected measure can be set by the evaluation importance setting unit.

Patent Document 1: Japanese Patent Laid-Open No. 2003-76687

A disclosure by a party other than the present applicant is Patent Document 2 below. The disclosed technique aims to provide a natural language processing method that can convert hard-to-hear expressions, whether single words or compound words, into expressions suited to being read aloud.
Specifically, the pronunciation pattern extraction unit of the text conversion unit searches the pronunciation rule table for matching pronunciation patterns and extracts the portions estimated to be hard to hear when read aloud. The text conversion processing unit then uses the text conversion rule table to convert the extracted portions into text intended for reading aloud. According to the disclosure, because hard-to-hear expressions are searched for broadly from the novel viewpoint of the combinations of sounds that arise when a document is read aloud, even expressions at the compound-word level can be replaced with appropriate ones.

Patent Document 2: JP 2000-172289 A

However, Non-Patent Document 1 and Patent Document 1 disclose general paraphrasing techniques, and cannot be said to provide a simple method for replacing words that are hard to pronounce. They disclose nothing about which measure to use, which paraphrases to consider, or how to evaluate them; so although a word could simply be replaced with one containing no L or R, the effectiveness and accuracy of that replacement could not be evaluated.
Such conventional methods amount to no more than mechanical term substitution, and cannot provide utterance support through suitable paraphrasing.

Patent Document 2 does relate to reading text aloud, but it discloses a technique that improves intelligibility mainly by inserting particles and pauses at the time of replacement; it cannot be applied where accurate reproduction of the sounds themselves is difficult. Nor, of course, can it be expected to have much effect when applied to a foreign language.

As the prior art described above shows, no known technique rephrases words and phrases that are difficult to pronounce when speaking a foreign language. In particular, no system had been realized that could present replacement candidates to the user for a presentation at an international conference or while the manuscript for one is being prepared.
An object of the present invention is to provide an utterance support method and device that express words and phrases difficult for the speaker to pronounce by means of suitable paraphrases, thereby facilitating communication with the audience.

To solve the above problems, the present invention provides the following utterance support device.
The invention according to claim 1 provides an utterance support device that replaces hard-to-pronounce words and phrases in a manuscript text and outputs a dictation text that is easy to speak.
The device comprises input means for acquiring manuscript text data into the device, and word string extraction means for extracting words or word strings (hereinafter, "word strings") from the manuscript text data.
It further has a synonym database, stored on a storage medium, containing synonymous words and phrases for word strings, and replacement candidate word string search means that matches each word string extracted by the word string extraction means against the synonym database and searches for replacement candidate word strings that can be substituted for it.
It also has a pronunciation difficulty database, stored on a storage medium, in which the pronunciation difficulty of each word string is quantitatively recorded in advance; word string replacement means that obtains from the pronunciation difficulty database the pronunciation difficulty of the word string before replacement and of each replacement candidate word string, and selects and substitutes the word string with the lowest pronunciation difficulty; and output means for outputting the dictation text in which word strings have been replaced by the above means.

According to the invention of claim 2, the pronunciation difficulty database defines difficulty according to the number of predetermined characters or phonetic symbols in a word string, and the word string replacement means counts the predetermined characters or phonetic symbols in the extracted word string and calculates the pronunciation difficulty of that word string from the calculation formula defined in the pronunciation difficulty database.

According to the invention of claim 3, the utterance support device comprises a corpus database in the same language as the manuscript text, and the word string replacement means comprises a frequency counting unit and a pronunciation difficulty comparison and replacement unit. The frequency counting unit counts how often the sequence formed by a replacement candidate word string and the k-grams of words preceding and following it (k being any number, which may be the same or different before and after) appears in the corpus database. When that frequency is at or above a predetermined value, the pronunciation difficulty comparison and replacement unit substitutes the replacement candidate word string having the lowest pronunciation difficulty and the highest frequency.
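The corpus check of claim 3 can be sketched in Python as follows. This is one possible reading, not the patented implementation: the function names, the toy corpus, and the difficulty values in the usage note are hypothetical, and "lowest difficulty and highest frequency" is interpreted here as filtering by the threshold, then preferring the lowest difficulty and breaking ties by the higher frequency.

```python
from typing import Callable, List, Optional, Sequence

def context_frequency(corpus: Sequence[str], left: Sequence[str],
                      candidate: str, right: Sequence[str]) -> int:
    """Count occurrences of left words + candidate + right words in the corpus."""
    pattern = [*left, *candidate.split(), *right]
    n = len(pattern)
    tokens = list(corpus)
    return sum(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))

def pick_candidate(candidates: List[str], left: Sequence[str], right: Sequence[str],
                   corpus: Sequence[str], difficulty_of: Callable[[str], int],
                   min_freq: int = 1) -> Optional[str]:
    """Return the easiest candidate whose context is attested in the corpus."""
    scored = [(c, context_frequency(corpus, left, c, right)) for c in candidates]
    # Keep only candidates whose surrounding context appears often enough.
    viable = [(c, f) for c, f in scored if f >= min_freq]
    if not viable:
        return None  # no natural-sounding replacement; keep the original
    # Lowest difficulty first; among equals, the higher frequency wins.
    return min(viable, key=lambda cf: (difficulty_of(cf[0]), -cf[1]))[0]
```

With a made-up corpus in which "the correct answer" appears twice and "the proper answer" once, and assumed difficulties of 7 for both words, `pick_candidate` returns "correct"; a candidate that never appears in that context is rejected regardless of its difficulty.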

According to the invention of claim 4, the following utterance support method can also be provided. The method may be provided as a program executed on a computer.
The utterance support method replaces hard-to-pronounce words and phrases in a manuscript text and outputs a dictation text that is easy to speak. It has an input step in which input means acquires manuscript text data into the device, and a word string extraction step in which word string extraction means extracts words or word strings (hereinafter, "word strings") from the manuscript text data.
The method includes at least the following further steps: a replacement candidate word string search step in which replacement candidate word string search means, using a synonym database containing synonymous words and phrases for word strings, matches each word string extracted in the word string extraction step against the synonym database and searches for substitutable replacement candidate word strings; a word string replacement step in which word string replacement means, using a pronunciation difficulty database in which the pronunciation difficulty of each word string is quantitatively recorded in advance, obtains the pronunciation difficulty of the word string before replacement and of each replacement candidate word string and selects and substitutes the word string with the lowest pronunciation difficulty; and an output step in which output means outputs the dictation text in which word strings have been replaced by the preceding steps.

In the invention of claim 5, the pronunciation difficulty database defines difficulty according to the number of predetermined characters or phonetic symbols in a word string, and in the word string replacement step the predetermined characters or phonetic symbols in the extracted word string are counted and the pronunciation difficulty of that word string is calculated from the calculation formula defined in the pronunciation difficulty database.

The invention of claim 6 provides a technique in which, in the word string replacement step of the utterance support method, a corpus database in the same language as the manuscript text is used. The word string replacement means counts how often the sequence formed by a replacement candidate word string and the k-grams of words preceding and following it (k being any number, which may be the same or different before and after) appears in the corpus database and, when that frequency is at or above a predetermined value, substitutes the replacement candidate word string having the lowest pronunciation difficulty and the highest frequency.

The invention of claim 7 proposes adding the above technique as a function of a word processor program in particular.
That is, it provides an utterance support program that is used together with a word processor program and that replaces hard-to-pronounce words and phrases in a manuscript text to output a dictation text that is easy to speak.
The program has a manuscript text data reading step of acquiring the manuscript text data being edited in the word processor processing means, and a word string extraction step in which word string extraction means extracts words or word strings (hereinafter, "word strings") from the manuscript text data, so that word strings are extracted from the document being written.

The program further has a replacement candidate word string search step in which replacement candidate word string search means, using a synonym database containing synonymous words and phrases for word strings, matches each word string extracted in the word string extraction step against the synonym database and searches for substitutable replacement candidate word strings, and a word string selection step in which word string selection means, using a pronunciation difficulty database in which the pronunciation difficulty of each word string is quantitatively recorded in advance, obtains the pronunciation difficulty of the word string before replacement and of each replacement candidate word string and selects the word string with the lowest pronunciation difficulty.
In addition, it includes a replacement word string presentation step in which replacement word string presentation means presents the word string selected in the word string selection step alongside the manuscript text data being edited in the word processor processing means and prompts the user to indicate whether to make the replacement, and a word string replacement step in which word string replacement means replaces the word string in accordance with the user's input. The invention thus provides an utterance support program used together with a word processor program.
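The present-and-confirm flow of claim 7 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions rather than the program the claim describes: `apply_with_confirmation` and its `suggest` and `confirm` callbacks are hypothetical names standing in for the word string selection means and for the prompt shown to the user in the word processor.

```python
from typing import Callable, List, Optional

def apply_with_confirmation(
    word_strings: List[str],
    suggest: Callable[[str], Optional[str]],
    confirm: Callable[[str, str], bool],
) -> List[str]:
    """Replace a word string only when a suggestion exists and the user accepts it."""
    result = []
    for original in word_strings:
        proposal = suggest(original)  # easiest-to-pronounce candidate, or None
        if proposal is not None and proposal != original and confirm(original, proposal):
            result.append(proposal)
        else:
            result.append(original)  # no suggestion, or the user declined
    return result
```

In an editor integration, `confirm` would typically open a dialog next to the text being edited; in a batch setting it could simply return `True`.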

In the invention of claim 8, the pronunciation difficulty database defines difficulty according to the number of predetermined characters or phonetic symbols in a word string, and in the replacement word string presentation step the predetermined characters or phonetic symbols in the extracted word string are counted and the pronunciation difficulty of that word string is calculated from the calculation formula defined in the pronunciation difficulty database.

In the invention of claim 9, a corpus database in the same language as the manuscript text is used in the replacement word string presentation step of the utterance support program. The replacement word string presentation means counts how often the sequence formed by a replacement candidate word string and the k-grams of words preceding and following it (k being any number, which may be the same or different before and after) appears in the corpus database and, when that frequency is at or above a predetermined value, presents the replacement candidate word string having the lowest pronunciation difficulty and the highest frequency.

The invention described above has the following effects. Many replacement candidates are obtained from the synonym database, and by referring to the pronunciation difficulty database, which defines pronunciation difficulty quantitatively, the importance of each replacement can be evaluated numerically and the most appropriate replacement candidate selected.
In particular, checking whether the sequence formed with the preceding and following word strings appears in the corpus prevents unnatural replacements.

Furthermore, when the invention is used in combination with a word processor program, as in claims 7 to 9, paraphrase candidates for hard-to-pronounce word strings can be presented at any time while the dictation manuscript is being written, which contributes to efficient utterance support.

Embodiments of the present invention are described below with reference to the drawings. The invention is not limited to what follows and may be modified and applied as desired.
FIG. 1 is a block diagram of the utterance support device (1) of the present invention. The device (1) is preferably implemented on a well-known personal computer or the like. Such a computer has a CPU (10) that performs arithmetic and text processing, a hard disk (11) that stores the data being processed and the databases used in processing, a display for presenting output, and a memory (13) that works with the CPU (10).

First, the manuscript text (20) is input to the device (1). The manuscript text is a script to be read aloud in a presentation or the like, prepared from materials, papers, and so on. Such a text is correct in grammar and word usage, but has been written without regard to ease of pronunciation.

The manuscript text acquisition unit (21) then reads the text data into the CPU (10). This is normally an ordinary operation such as reading data from the hard disk (11).
The input manuscript text is then processed by the word string extraction unit (22).
In this embodiment, the description below takes an English presentation manuscript as the example manuscript text.

Languages such as English, French, and Korean are generally written with a space between words. In actual speech, adjacent words are often run together, but in principle pronunciation breaks at each space-delimited word.
When the focus is on pronunciation, as in the present invention, extracting space-delimited words is therefore both simple and reasonable.
The word string extraction unit (22) extracts words from the text data by looking for space codes. In doing so it may also extract word strings such as idioms, in which several words carry a single meaning, either by using dictionary data held on the hard disk or by using information contained in the synonym database or corpus described later.
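A minimal Python sketch of space-based extraction with optional idiom merging is given here for illustration. The function name `extract_word_strings` and the idiom set passed to it are hypothetical; the source does not specify how idioms are matched, so a longest-match-first scan is assumed, and punctuation handling is omitted.

```python
from typing import AbstractSet, List

def extract_word_strings(text: str, idioms: AbstractSet[str] = frozenset()) -> List[str]:
    """Split text on whitespace, merging known multi-word idioms into one unit."""
    words = text.split()
    # Longest idiom length in words; 1 when no idioms are supplied.
    max_len = max((len(idiom.split()) for idiom in idioms), default=1)
    result = []
    i = 0
    while i < len(words):
        # Try the longest possible idiom first, falling back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate.lower() in idioms:
                result.append(candidate)
                i += n
                break
    return result
```

For example, with `{"in front of"}` as the idiom set, the sentence "Please turn right in front of the gate" yields "in front of" as a single word string and every other word separately.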

The present invention is not limited to utterance support for English and similar languages; it may also target Japanese, in which case word strings may be extracted from Japanese text data by known morphological analysis.
Any of various well-known methods can be used for morphological analysis; for example, the morphological analysis program JUMAN can be used to analyze the text and split it into a sequence of morphemes. Grammatically rigorous morphological analysis is not actually required to practice the invention. Because the main purpose is to split the text into units of pronunciation, it is also acceptable simply to consult dictionary data and extract, at each position, the matching word with the greatest number of characters.

For each word string extracted by the word string extraction unit (22), the following processing determines whether it needs to be paraphrased and, if so, which word or phrase should replace it.
The replacement candidate word string search unit (23) refers to the synonym database (24) stored on the hard disk (11) and searches for synonyms (replacement candidate word strings) of the extracted word string.
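The lookup performed by the search unit can be pictured as a simple table query. The Python sketch below assumes the synonym database has been loaded into an in-memory dictionary; the name `find_candidates` and the two sample entries are invented for illustration and are not taken from any real synonym database.

```python
from typing import Dict, List

# Hypothetical in-memory synonym table; entries are illustrative only.
SYNONYMS: Dict[str, List[str]] = {
    "right": ["correct", "proper"],
    "large": ["big", "huge"],
}

def find_candidates(word_string: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Return replacement candidates for a word string, or an empty list if none exist."""
    # Matching is case-insensitive; a copy is returned so callers cannot alter the table.
    return list(synonyms.get(word_string.lower(), []))
```

A word string with no entry returns an empty list, which corresponds to the case where the word is passed through unchanged.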

In this embodiment, WordNet 2.0 (Non-Patent Document 2), an English synonym database, was used as the synonym database (24).
Non-Patent Document 2: Princeton University, WordNet 2.0, 2003, http://wordnet.princeton.edu/

In English, inflected verb forms, plural nouns, and the like must also be retrieved as synonymous expressions, so this information is added to the synonym database before use.

As a result of the search, a word string for which replacement candidates exist is passed, together with its candidates, to the word string replacement unit (25); all other word strings are output unchanged from the replacement candidate word string search unit (23).
The word string replacement unit (25) refers to the pronunciation difficulty database (26) and determines by calculation whether the word string should be replaced with a replacement candidate.
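The decision made by the word string replacement unit reduces to picking the minimum-difficulty entry among the original word string and its candidates. The Python sketch below illustrates this; `choose_replacement` is a hypothetical name, and apart from the value 13 for "right", which the description itself gives, the difficulty values in the usage note are assumed for the sake of the example.

```python
from typing import Callable, Sequence

def choose_replacement(original: str, candidates: Sequence[str],
                       difficulty_of: Callable[[str], int]) -> str:
    """Return whichever of the original and its candidates is easiest to pronounce."""
    # The original is listed first, so it is kept when a candidate only ties with it.
    return min([original, *candidates], key=difficulty_of)
```

With an assumed table of `{"right": 13, "correct": 7, "proper": 7}`, the call `choose_replacement("right", ["correct", "proper"], table.__getitem__)` returns "correct"; if no candidate scores lower than the original, the original is returned and no replacement occurs.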

The pronunciation difficulty database (26) holds a data table such as that shown in FIG. 2. The database sets pronunciation difficulty quantitatively, assigning a difficulty to the characteristics that a word string has.
Difficulty may be set individually for each word, as in row (30), which gives the word "right" a difficulty of 13. However, listing every hard-to-pronounce word would require an enormous amount of work, so it is preferable for the word string replacement unit (25) to calculate the difficulty from the characteristics of the word, as in rows (31) to (33).

That is, the word string replacement unit (25) counts the predetermined characters or phonetic symbols in the extracted word string and calculates the pronunciation difficulty of the word string from the calculation formula stored in the pronunciation difficulty database (26).
The formula can be set freely; this embodiment uses the simple approach of multiplying the number of characters or phonetic symbols by a weight.

In concrete terms, row (31) looks at how many letters L or R the word string contains, and defines the difficulty for the word as that number × 1. For "left", for example, the difficulty is calculated as 1.
Row (32) looks at the number of L or R sounds contained in the primary-stress sound of the word string. The pronunciation of the primary stress is generally important, and native listeners often identify a word from this sound and the overall prosody. The difficulty when the primary stress contains L or R is therefore defined as that number × 5. If this is added to the difficulty above, the difficulty of "left" becomes 1 + 5 = 6.

The third entry (33) looks at whether swapping L and R in the word string produces another meaningful word string. For the word "right", for example, swapping R for L gives "light", and the two are likely to be confused unless pronounced accurately.
This count × 7 is therefore defined as the difficulty. Since "right" meets each of the three conditions once, its difficulty is 1 + 5 + 7 = 13.
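The three weighted counts above can be sketched in Python as follows. This is a minimal illustration, not the implementation described in the patent: the toy lexicon, the function names, and the convention that the caller supplies the spelling of the accented syllable are all assumptions, and the swap condition is treated as a 0-or-1 count.

```python
# Weights taken from the description: x1 per L/R letter, x5 per L/R in the
# first-accent sound, x7 when swapping L and R yields another word.
W_LETTER, W_ACCENT, W_SWAP = 1, 5, 7

# Hypothetical toy lexicon, used only to test the L/R-swap condition.
LEXICON = {"right", "light", "left", "way", "size"}

# Translation table that exchanges L and R (both cases).
_SWAP_LR = str.maketrans("lrLR", "rlRL")


def count_lr(text: str) -> int:
    """Number of letters L or R in the text (case-insensitive)."""
    return sum(ch in "lrLR" for ch in text)


def difficulty(word: str, accented: str, lexicon=LEXICON) -> int:
    """Pronunciation difficulty of `word`.

    `accented` is the spelling of the first-accent syllable; it is assumed
    to come from a pronunciation dictionary that is not modeled here.
    """
    swapped = word.translate(_SWAP_LR)
    # Assumption: the swap condition counts once per word.
    swap_hit = 1 if swapped != word and swapped.lower() in lexicon else 0
    return (W_LETTER * count_lr(word)
            + W_ACCENT * count_lr(accented)
            + W_SWAP * swap_hit)


print(difficulty("left", "left"))    # 1 + 5 = 6
print(difficulty("right", "right"))  # 1 + 5 + 7 = 13
```

Both monosyllabic examples carry the accent on the whole word, so the word itself is passed as the accented syllable.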

Because the contents of the pronunciation difficulty database (26) depend on the pairing of native language and foreign language, it is desirable to prepare a database for each combination, for example a database for Japanese utterances by Chinese speakers. When the device (1) is given the two items "native language" and "spoken language", it may select the matching database from among a plurality of pronunciation difficulty databases, providing a device configuration that supports multiple languages.
The device may also be configured so that the speaker can manually add word strings that he or she finds particularly hard to pronounce, so that they are paraphrased.

The word string replacement unit (25) compares the pronunciation difficulty of the word string extracted from the original text with those of its replacement candidate word strings and selects the word string with the lowest difficulty. If one of the replacement candidates has the lowest difficulty, the extracted word string is replaced with it, and the result is output as the dictation text (27). The output may be shown on the display (12) or stored on the hard disk (11).
If word strings unsuitable as replacement candidates (for example, those whose substitution would increase the number of L and R) have been removed from the synonym database in advance, every replacement candidate is guaranteed to have a lower pronunciation difficulty than the extracted word string, so the difficulties need only be compared among the replacement candidates.
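The selection step can be sketched as a minimum search over the original word string and its candidates. The scoring function below (a plain count of the letters L and R) and the function names are assumptions chosen to keep the sketch short; any difficulty function could be passed in instead.

```python
def lr_letter_count(word_string: str) -> int:
    """Stand-in difficulty score: number of letters L or R (assumed)."""
    return sum(ch in "lr" for ch in word_string.lower())


def choose_word_string(original, candidates, difficulty=lr_letter_count):
    """Return the word string with the lowest difficulty.

    The original is listed first, so it is kept when no candidate is
    strictly easier (min() returns the first minimum it encounters).
    """
    return min([original, *candidates], key=difficulty)


print(choose_word_string("approach", ["way"]))  # way
print(choose_word_string("length", ["size"]))   # size
```

Placing the original first is a design choice: it avoids replacing a word with a candidate that is no easier to pronounce.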

The above is the minimum configuration of the present invention. A technique for improving the accuracy of paraphrasing is described next as a second embodiment.
FIG. 3 is an explanatory diagram of an additional embodiment of the word string replacement unit (25). The configuration other than the word string replacement unit (25) is the same as above.
This embodiment makes it possible to determine whether the sentence remains natural after a word string has been replaced by a paraphrase.

This processing follows the flow shown in FIG. 4.
First, for an extracted word string (50), once the replacement candidate word string search unit (23) has extracted replacement candidate word strings (51), the k-gram word strings immediately before and after the extracted word string are extracted from the original manuscript text data (52). This is done by the preceding/following k-gram extraction unit (40).
In principle the units extracted are morphemes, but in this embodiment words delimited by spaces suffice.
k is a constant that can be set independently for the preceding and following sides; in this embodiment both are 2-grams (k = 2).

The word sequence generation unit (41) then generates (53) a sequence consisting of the preceding k-gram, a replacement candidate word string, and the following k-gram, in that order.
This yields sequences that share the same preceding and following k-grams and differ only in the word string being replaced.

For each generated sequence, the appearance frequency counting unit (42) counts (54) the frequency fb2 with which the same sequence appears in the corpus (43).
As is well known, a corpus is a body of text annotated with part-of-speech information, syntactic information, and the like. If the only purpose is to count appearance frequencies, any text database of sufficient size may be used instead of a corpus.

To determine whether the sentence is natural, it is checked (55) whether the counted appearance frequency fb2 is 1 or more. If the sequence never appears, the wording is judged unnatural and processing moves on to the next word string.
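Steps (52) to (55) can be sketched as below, assuming whitespace tokenization and a corpus held in memory as a flat list of tokens. The function names and the sample sentences are hypothetical; a real corpus would need an index rather than a linear scan.

```python
def context_sequence(tokens, start, end, candidate_tokens,
                     k_before=2, k_after=2):
    """Preceding k-gram + candidate + following k-gram (steps 52-53).

    tokens[start:end] is the word string being replaced.
    """
    before = tokens[max(0, start - k_before):start]
    after = tokens[end:end + k_after]
    return before + list(candidate_tokens) + after


def count_occurrences(corpus_tokens, sequence):
    """fb2: how often `sequence` occurs contiguously in the corpus (step 54)."""
    n = len(sequence)
    if n == 0:
        return 0
    return sum(corpus_tokens[i:i + n] == sequence
               for i in range(len(corpus_tokens) - n + 1))


def is_natural(corpus_tokens, sequence):
    """Step 55: the wording is accepted when fb2 is 1 or more."""
    return count_occurrences(corpus_tokens, sequence) >= 1


text = "this is a new approach to the problem".split()
corpus = "we found a new way to the top".split()

seq = context_sequence(text, 4, 5, ["way"])
print(seq)                      # ['a', 'new', 'way', 'to', 'the']
print(is_natural(corpus, seq))  # True
```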

If fb2 is 1 or more, the pronunciation difficulty calculation and comparison unit (44) calculates the pronunciation difficulty. The same method as described above is used, but in this embodiment the pronunciation difficulty database (26) holds, as one item of information, "the number of R or L immediately followed by a vowel". The English vowels are taken to be a, i, u, e, o, and y.
Accordingly, the frequency fb1 with which "R + vowel" and "L + vowel" occur in the extracted word string and in each replacement candidate word string is obtained (56), and the replacement candidate word string with the smallest value is chosen. This amounts to selecting the word string with the lowest pronunciation difficulty.

If several word strings share the minimum fb1, the one whose sequence has the highest appearance frequency fb2 in the corpus is selected (57).
This selection method replaces word strings containing hard-to-pronounce elements such as "R + vowel" as far as possible while picking sequences that appear frequently in correct foreign-language text, which contributes to natural wording.
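The selection in steps (55) to (57) can be sketched as a two-key minimum: smallest fb1 first, then largest fb2. The sketch works on spelling rather than phonetic symbols, and the fb2 values in the example are made-up numbers, not corpus measurements.

```python
import re

# "R + vowel" or "L + vowel", with the vowel set a, i, u, e, o, y from the text.
_LR_VOWEL = re.compile(r"[lr][aiueoy]", re.IGNORECASE)


def fb1(word_string: str) -> int:
    """Number of R or L immediately followed by a vowel letter."""
    return len(_LR_VOWEL.findall(word_string))


def select_candidate(fb2_by_candidate):
    """Pick the candidate with minimum fb1; break ties by maximum fb2.

    `fb2_by_candidate` maps each candidate word string to its corpus
    frequency fb2. Candidates with fb2 < 1 are discarded as unnatural.
    Returns None when no candidate survives.
    """
    viable = {w: f for w, f in fb2_by_candidate.items() if f >= 1}
    if not viable:
        return None
    return min(viable, key=lambda w: (fb1(w), -viable[w]))


# Hypothetical fb2 values for illustration only.
print(select_candidate({"way": 3, "method": 10, "route": 5}))  # method
```

Here "way" and "method" both have fb1 = 0, so the higher corpus frequency decides in favor of "method".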

Once the word string to be used as the replacement has been determined as above, processing proceeds to the next extracted word string. After all word strings extracted from the manuscript text data have been processed, the dictation text (27) is output.

This embodiment has been described with k fixed at 2, but k can also be varied dynamically so that more meaningful combinations with the preceding and following context are extracted and their appearance frequencies compared. That is, starting from the target word string and moving toward the preceding or following context, the words are extracted up to the point where one content word (a meaningful word that characterizes the sentence) appears.
The reason is that when a paraphrased expression appears in a corpus within a context made up only of function words, it is hard to conclude that the paraphrase is correct; including a content word in the comparison makes that judgment more reliable.

Specifically, a function word dictionary is stored on the hard disk or the like, and a word k-gram is extracted up to the point where m words that do not match the function word dictionary (content words) have appeared, where m is an arbitrary number. Here, m may be set to 1 and the upper limit of k to 7, for example, so that k varies dynamically between 1 and 7.
With this configuration, content words can be used effectively as the context surrounding the paraphrased expression when verifying whether the paraphrase yields a natural sentence, which contributes to reliable paraphrasing.
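The dynamic choice of k can be sketched as a walk outward from the target word string that stops once m content words have been collected or k_max tokens have been taken. The small function word set and the function names are assumptions made for the example; the text only specifies that a function word dictionary is consulted.

```python
# Hypothetical, deliberately tiny function word dictionary.
FUNCTION_WORDS = {"a", "an", "the", "of", "at", "in", "on", "to",
                  "and", "or", "is", "he", "she", "it"}


def outward_context(outward_tokens, m=1, k_max=7,
                    function_words=FUNCTION_WORDS):
    """Collect tokens, nearest first, until m content words are included."""
    picked = []
    content_seen = 0
    for token in outward_tokens[:k_max]:
        picked.append(token)
        if token.lower() not in function_words:
            content_seen += 1
            if content_seen >= m:
                break
    return picked


def dynamic_context(tokens, start, end, m=1, k_max=7):
    """Preceding and following context for tokens[start:end]."""
    # The preceding side is walked right-to-left, then restored to text order.
    before = outward_context(tokens[:start][::-1], m, k_max)[::-1]
    after = outward_context(tokens[end:], m, k_max)
    return before, after


tokens = "he looked at the length of the table".split()
print(dynamic_context(tokens, 4, 5))
# (['looked', 'at', 'the'], ['of', 'the', 'table'])
```

With k fixed at 2, both contexts in this example would consist of function words only ("at the", "of the"); the dynamic version extends each side until it reaches "looked" and "table".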

The above uses specific letters such as L and R and their positions as conditions, but since the present invention is concerned with pronunciation, replacement may also be based on phonetic symbols.
As shown in Table 1, phonetic symbols are defined for vowels and consonants, and accents are also indicated. Words can easily be converted to these symbols using dictionary data, so the manuscript text can first be converted to phonetic notation by a phonetic symbol conversion unit (not shown) of the CPU (10).

By referring to the phonetic symbols, the number of [r] and [l], or the number of cases where [r] is followed by a vowel, can then be counted. Using phonetic symbols in this way makes it possible to identify clearly, as hard to pronounce, words whose difficulty cannot be judged from the spelling alone.
When phonetic symbols are used, the pronunciation difficulty database need only hold a difficulty level for each phonetic symbol.

As a third embodiment, the present invention can also be provided by incorporating the technique into a word processor program (hereinafter, word processing software).
FIG. 5 is a block diagram of this embodiment. The CPU (60) has a word processor processing unit (61), the processing unit that runs the word processor. Known word processing software runs here and can edit, read, and write text.

The following processing units according to the present invention perform the utterance support operation on the manuscript text (62) being edited in the word processing software. A reading unit (63) reads the manuscript text, and a word string extraction unit (64) extracts word strings, in the same way as described above.
For each word string, the replacement candidate word string search unit (65) searches for replacement candidate word strings using the synonym database (66).

The manuscript text need not be complete at this point; the above operations may be performed at any time while text is being entered. For example, the processing may be run each time a sentence is entered (that is, when a period is typed).

The word string selection unit (67) then selects the best word string for replacement with reference to the pronunciation difficulty database (68). In this embodiment, after the selection, the replacement word string presentation unit (69) presents to the user the word string recommended as a replacement.
The display (70) is a convenient means of presentation; for example, the recommended word string is displayed overlaid on the corresponding word string in the word processing software. Together with the presentation, a prompt asks the user whether to make the replacement.

When the user accepts, using the keyboard (71) or the like, the word string replacement unit (72) replaces the word string with the replacement word string in the word processing software.
With this method, hard-to-pronounce words are pointed out as the user types the dictation text in the word processing software, and paraphrasing requires nothing more than an acceptance action.
When implementing the system of the present invention, it is preferable to use a synonym database, corpus, and so on that are rich in information. Even when the amount of information is small, however, the system of this embodiment remains sufficiently useful, because candidates are presented to the user, who decides each time.

FIG. 6 shows results obtained with the above device.
Words are rewritten to ones that are easier to pronounce: "approach" becomes "way", and "length" becomes "size". On the other hand, the current system cannot always preserve subtle differences in nuance when paraphrasing. Accuracy could be improved by, for example, revising the contents of the synonym database.
As in the third embodiment, the other embodiments may also present candidates to the user and let the user decide whether to paraphrase.

The present invention can also provide an audio playback device using the above technique.
Speech synthesis is a well-known technology in which speech synthesis means generates a speech waveform from text data and plays it through a speaker or the like. Some sounds are difficult to reproduce faithfully and can be hard to hear. Patent Document 2, cited above, aims to address this point by supplementing particles and pauses or by mechanical paraphrasing.

However, such a method allows only uniform paraphrasing and cannot adapt flexibly to the ambiguous rules peculiar to a language. In particular, because every word or phrase expected to be hard to hear must be registered in advance, a new word or phrase cannot be handled even if it is hard to hear.
In contrast, the technique of the present invention, as described above, judges from rules whether a word or phrase is hard to pronounce (hard to hear), and can therefore handle new words and phrases.

FIG. 7 is a block diagram of the audio playback device (80) according to the present invention.
Manuscript text data (81) is input to the CPU (82) by the text data input unit (83). The text data input unit (83) is a collective term for the various means of inputting text to the CPU, including direct input means such as a keyboard, means for reading from a magnetic disk or the like, and means for acquiring information from a network.

The word string extraction unit (84) then extracts word strings as described above, and the replacement candidate word string search unit (85) searches the synonym database (86) for replacement candidate word strings.
The word string replacement unit (88) then calculates the speech reproduction difficulty of the extracted word string and of each replacement candidate word string with reference to the speech reproduction difficulty database (87).

The speech reproduction difficulty has the same structure as the pronunciation difficulty described above, but the database holds information on sounds that are difficult to synthesize accurately in the subsequent speech synthesis unit (89).
For Japanese, for instance, difficulty is defined for particular sounds of the kana syllabary, geminate consonants (sokuon), and so on. Suppose that the speech synthesis unit (89) handles the sokuon poorly, so the sokuon "っ" is assigned a speech reproduction difficulty of 5, and that it handles voiced sounds (dakuon) somewhat poorly, so every voiced sound is assigned a difficulty of 1.
For the phrase "さっそく" (sassoku, "promptly"), "すぐに" (sugu ni, "right away") is then retrieved from the synonym database (86). "すぐに" contains one voiced sound, so its difficulty is calculated as 1, and "さっそく", whose difficulty is 5, is replaced with "すぐに".
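The example above can be reproduced with a short sketch. The weights 5 and 1 come from the text; the set of voiced hiragana, the restriction to hiragana input, and the function names are assumptions made for illustration.

```python
SOKUON = "っ"
# Voiced (dakuon) hiragana; katakana and other scripts are not handled here.
DAKUON = set("がぎぐげござじずぜぞだぢづでどばびぶべぼ")


def reproduction_difficulty(kana: str, w_sokuon: int = 5, w_dakuon: int = 1) -> int:
    """Weighted count of sokuon and voiced sounds in a hiragana string."""
    return (w_sokuon * kana.count(SOKUON)
            + w_dakuon * sum(ch in DAKUON for ch in kana))


def choose_for_synthesis(original: str, candidates):
    """Word string with the lowest speech reproduction difficulty."""
    return min([original, *candidates], key=reproduction_difficulty)


print(reproduction_difficulty("さっそく"))          # 5
print(reproduction_difficulty("すぐに"))            # 1
print(choose_for_synthesis("さっそく", ["すぐに"]))  # すぐに
```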

The speech synthesis unit (89) generates speech waveform information from text data using known speech synthesis technology. Synthesis by rule and text-to-speech synthesis are known as methods for outputting arbitrary character strings.
Taking Japanese manuscript text as an example, the word string extraction unit (84) extracts word strings by known morphological analysis or dictionary matching, and then determines the reading and accent (prosodic information) of each word from the phonetic character correspondence database stored on the hard disk (91).

The input Japanese text is thereby converted into a phonetic character string. Referring to the accents determined here, the word string replacement unit (88) can perform the same calculations as in the embodiments above, such as applying a weight for the first accent.
The speech synthesis unit (89) then converts the phonetic character string into a speech waveform, reading the waveform of each phoneme from the waveform dictionary database on the hard disk (91).

The waveform of each phoneme differs markedly depending on which phonemes appear on either side of it (the phoneme context). For this reason, different waveforms are generally prepared for the same phoneme in different phoneme contexts. How finely phoneme context is handled strongly affects the intelligibility and smoothness of the synthesized speech. Intelligibility here means the rate at which humans correctly hear (recognize) the sounds, and is a concept that contrasts with the speech reproduction difficulty database of this embodiment.

In speech synthesis, research has pursued complex processing to deal with such phoneme contexts. In the present invention, word strings are instead replaced in advance so as to lighten the processing in the speech synthesis unit (89).
The speech synthesis unit (89) is therefore given text that already favors high intelligibility, and the generated speech waveform is input to the audio output unit (90).

The audio output unit (90) consists of a known sound card or the like; it generates sound (analog data) from the speech waveform and outputs it through a speaker (92) connected to the CPU (80).

Naturally, when playing back a foreign language such as English, this device can, as in the embodiments above, avoid sounds that the audience finds hard to tell apart. For example, reducing the number of words containing L and R reduces the use of words that are hard for Japanese listeners to catch.
More generally, the comparison can be made on phonetic symbols, and words containing many phonetic symbols with similar pronunciations can be paraphrased. For example, phonetic symbols with similar pronunciations are registered in the speech reproduction difficulty database with difficulty levels, so that word strings containing many confusable sounds are paraphrased. This also makes it possible to provide an audio playback device that is easy to follow for listeners who are not fluent in the foreign language.

The audio playback device described above reduces the processing in the speech synthesis unit (89) and removes the need to prepare many waveforms for different phoneme contexts. In particular, when the speech synthesis unit plays back multiple languages, holding every speech context and waveform for every language in advance would require an enormous amount of information.
In the present invention, however, once information on the sounds that the speech synthesis unit (89) reproduces poorly has been provided (preferably as phonetic symbols), highly intelligible playback can be achieved in multiple languages by replacing word strings using the synonym database of each language.

FIG. 1 is a block diagram of the utterance support device according to the first embodiment of the present invention.
FIG. 2 is an example of the pronunciation difficulty database.
FIG. 3 is a block diagram of the word string replacement unit according to the second embodiment of the present invention.
FIG. 4 is a flowchart of the same.
FIG. 5 is a block diagram according to the third embodiment of the present invention.
FIG. 6 is a diagram showing support results obtained with the present invention.
FIG. 7 is a block diagram of the audio playback device according to the third embodiment of the present invention.

Explanation of symbols

1 Utterance support device
10 CPU
11 Hard disk
12 Display
13 Memory
20 Manuscript text
21 Manuscript text acquisition unit
22 Word string extraction unit
23 Replacement candidate word string search unit
24 Synonym database
25 Word string replacement unit
26 Pronunciation difficulty database
27 Dictation text

Claims

A voice playback device that generates voice by voice synthesis processing when a manuscript text is input,
Input means for acquiring document text data in the apparatus;
Word string extraction means for extracting words or word strings (hereinafter referred to as word strings) from the manuscript text data;
A synonym database with synonym phrases for word sequences;
A replacement candidate word string search means for collating the word string extracted by the word string extraction means with the synonym database and searching for a replaceable replacement candidate word string;
A speech reproduction difficulty database that quantitatively records the difficulty of speech reproduction in speech synthesis processing;
Word string replacement means for acquiring, from the speech reproduction difficulty database, the speech reproduction difficulty of the word string before replacement and of each replacement candidate word string, and for selecting the word string with the lowest speech reproduction difficulty and performing the replacement;
Speech synthesis means for performing speech synthesis processing based on the dictation text with the word string replaced by the above means;
An audio reproduction apparatus comprising: audio output means for outputting audio.

The audio reproduction device according to claim 1, wherein the speech reproduction difficulty database defines a difficulty level according to the number of predetermined characters or phonetic symbols in a word string, and
the word string replacement means counts the number of the predetermined characters or phonetic symbols in the extracted word string and calculates the speech reproduction difficulty of the word string based on the speech reproduction difficulty calculation formula defined in the speech reproduction difficulty database.

The audio reproduction device according to claim 1, wherein the speech reproduction difficulty database holds, for two or more characters or phonetic symbols that are difficult to distinguish from one another, at least one of the following counts: the number of any of the predetermined characters or phonetic symbols contained in a word string; the number of any of the predetermined characters or phonetic symbols that form the sound of the first accent in the word string; and the number of cases in which replacing any of the predetermined characters or phonetic symbols in the word string with the other character or phonetic symbol yields a meaningful word string; and defines a predetermined weighting value for each of these counts, and
the word string replacement means multiplies each count by its weighting value and calculates the sum of the products.

The audio reproduction device according to any one of claims 1 to 3, comprising a corpus database in the same language as the manuscript text, wherein the word string replacement means comprises:
a frequency counting unit that counts the frequency with which a sequence consisting of a replacement candidate word string and the k-gram word strings before and after it (k may be the same or different before and after) appears in the corpus database; and
a speech reproduction difficulty comparison and replacement unit that, when the frequency is equal to or higher than a predetermined value, performs the replacement using the replacement candidate word string that has the lowest speech reproduction difficulty and the highest frequency.

An utterance support device that outputs easy-to-speak dictation text by replacing hard-to-pronounce words in manuscript text,
Input means for acquiring document text data in the apparatus;
Word string extraction means for extracting words or word strings (hereinafter referred to as word strings) from the manuscript text data;
A synonym database with synonym phrases for word sequences;
A replacement candidate word string search means for collating the word string extracted by the word string extraction means with the synonym database and searching for a replaceable replacement candidate word string;
A pronunciation difficulty database that quantitatively records the difficulty of pronunciation of each word string in advance;
Word string replacement means for obtaining the pronunciation difficulty level from the pronunciation difficulty database for the word string before replacement and the replacement candidate word string, and selecting and replacing the word string having the minimum pronunciation difficulty;
An utterance support apparatus comprising: output means for outputting dictation text in which a word string is replaced by the means described above.

The utterance support device according to claim 5, wherein the pronunciation difficulty database defines a difficulty level according to the number of predetermined characters or phonetic symbols in a word string, and
the word string replacement means counts the number of the predetermined characters or phonetic symbols in the extracted word string and calculates the pronunciation difficulty of the word string based on the pronunciation difficulty calculation formula defined in the pronunciation difficulty database.

The utterance support device according to claim 5 or 6, wherein
the pronunciation difficulty database is configured to include, with respect to two or more difficult-to-distinguish characters or phonetic symbols, at least one of the following counts: the number of any of those characters or phonetic symbols contained in the word string; the number of those characters or phonetic symbols that constitute the sound carrying the first accent in the word string; and the number of those characters or phonetic symbols for which replacing the character or phonetic symbol with the other one yields a meaningful word string; and to define a predetermined weighting value for each count, and
the word string replacement means multiplies each count by its weighting value and calculates the sum of the products.
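The weighted sum described above can be illustrated with the following sketch. It assumes "r" and "l" as the difficult-to-distinguish pair, a tiny hypothetical word list for deciding whether a swapped form is a meaningful word, and arbitrary weighting values. The accent count is passed in as an argument, because deriving it would require phonetic data that this sketch does not model.

```python
PAIR = {"r": "l", "l": "r"}                  # assumed confusable pair
LEXICON = {"right", "light", "rice", "lice", "really"}  # hypothetical word list
WEIGHTS = {"total": 1.0, "accent": 2.0, "confusable": 3.0}  # hypothetical


def count_total(word):
    """Number of confusable characters contained in the word."""
    return sum(1 for ch in word if ch in PAIR)


def count_confusable(word):
    """Number of positions where swapping the character yields a lexicon word."""
    n = 0
    for i, ch in enumerate(word):
        if ch in PAIR:
            swapped = word[:i] + PAIR[ch] + word[i + 1:]
            if swapped in LEXICON:
                n += 1
    return n


def weighted_difficulty(word, accent_count):
    """Multiply each count by its weighting value and sum the products."""
    counts = {
        "total": count_total(word),
        "accent": accent_count,  # supplied by the caller (assumption)
        "confusable": count_confusable(word),
    }
    return sum(WEIGHTS[name] * counts[name] for name in counts)


print(weighted_difficulty("right", accent_count=1))   # 6.0
print(weighted_difficulty("really", accent_count=1))  # 5.0
```

With these assumed weights, "right" scores higher than "really" despite containing fewer confusable characters, because mispronouncing it produces another real word ("light").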

The utterance support device according to claim 5, further comprising a corpus database in the same language as the manuscript text, wherein
the word string replacement means comprises:
a frequency counting unit that counts the frequency with which the sequence formed by the replacement candidate word string and the k-gram word strings preceding and following it (k may be the same or different before and after) appears in the corpus database; and
a pronunciation difficulty comparison / replacement unit that, when the frequency is equal to or higher than a predetermined value, replaces the word string with the replacement candidate word string having the lowest pronunciation difficulty and the highest frequency.
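The context-frequency check can be sketched as below, under several simplifying assumptions: the corpus is a flat list of tokens, each candidate is a single token, and "lowest difficulty and highest frequency" is read as lowest difficulty first with frequency as the tie-breaker. The function names, the toy corpus and the difficulty values are hypothetical.

```python
def count_sequence(corpus, sequence):
    """Count how many times the token sequence occurs in the corpus."""
    n = len(sequence)
    return sum(1 for i in range(len(corpus) - n + 1)
               if corpus[i:i + n] == sequence)


def choose(tokens, index, candidates, difficulty, corpus,
           k_before=1, k_after=1, min_freq=1):
    """Pick a replacement for tokens[index] that also fits its context."""
    before = tokens[max(0, index - k_before):index]   # preceding k-gram
    after = tokens[index + 1:index + 1 + k_after]     # following k-gram
    scored = []
    for cand in candidates:
        freq = count_sequence(corpus, before + [cand] + after)
        if freq >= min_freq:                          # predetermined value
            scored.append((difficulty[cand], -freq, cand))
    if not scored:
        return tokens[index]   # no candidate is attested: keep the original
    return min(scored)[2]      # lowest difficulty, then highest frequency


corpus = ("it is very good . it is truly good . "
          "it is very good . he is highly skilled").split()
difficulty = {"very": 1, "truly": 2, "highly": 1}
tokens = ["it", "is", "really", "good"]
print(choose(tokens, 2, ["very", "truly", "highly"], difficulty, corpus))  # very
```

Here "highly" is as easy as "very" but never occurs between "is" and "good" in the toy corpus, so it is filtered out before the difficulty comparison.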

An utterance support method for outputting dictation text that is easy to utter by replacing hard-to-pronounce words in a manuscript text, the method comprising at least:
an input step in which input means acquires manuscript text data into the device;
a word string extraction step in which word string extraction means extracts words or word strings (hereinafter referred to as word strings) from the manuscript text data;
a replacement candidate word string search step in which replacement candidate word string search means, using a synonym database holding synonymous phrases for word strings, collates the word string extracted in the word string extraction step against the synonym database and retrieves substitutable replacement candidate word strings;
a word string replacement step in which word string replacement means, using a pronunciation difficulty database in which the pronunciation difficulty of each word string is quantitatively recorded in advance, obtains from the pronunciation difficulty database the pronunciation difficulty of the word string before replacement and of each replacement candidate word string, and selects the word string having the minimum pronunciation difficulty for the replacement; and
an output step in which output means outputs the dictation text in which word strings have been replaced by the above steps.

The utterance support method according to claim 9, wherein
the pronunciation difficulty database defines the difficulty according to the number of predetermined characters or phonetic symbols in a word string, and
in the word string replacement step, the number of the predetermined characters or phonetic symbols in the extracted word string is counted, and the pronunciation difficulty of the word string is calculated based on the pronunciation difficulty calculation formula defined in the pronunciation difficulty database.

The utterance support method according to claim 9 or 10, wherein
the pronunciation difficulty database is configured to include, with respect to two or more difficult-to-distinguish characters or phonetic symbols, at least one of the following counts: the number of any of those characters or phonetic symbols contained in the word string; the number of those characters or phonetic symbols that constitute the sound carrying the first accent in the word string; and the number of those characters or phonetic symbols for which replacing the character or phonetic symbol with the other one yields a meaningful word string; and to define a predetermined weighting value for each count, and
in the word string replacement step, each count is multiplied by its weighting value and the sum of the products is calculated.

The utterance support method according to any one of claims 9 to 11, wherein,
in the word string replacement step, the word string replacement means, using a corpus database in the same language as the manuscript text,
counts the frequency with which the sequence formed by the replacement candidate word string and the k-grams preceding and following it (k may be the same or different before and after) appears in the corpus database, and,
when the frequency is equal to or higher than a predetermined value, replaces the word string with the replacement candidate word string having the lowest pronunciation difficulty and the highest frequency.

An utterance support program that is used together with a word processor program and outputs dictation text that is easy to utter by replacing hard-to-pronounce words in a manuscript text, the program comprising at least:
a manuscript text data reading step of acquiring the manuscript text data being edited in word processor processing means;
a word string extraction step in which word string extraction means extracts words or word strings (hereinafter referred to as word strings) from the manuscript text data;
a replacement candidate word string search step in which replacement candidate word string search means, using a synonym database holding synonymous phrases for word strings, collates the word string extracted in the word string extraction step against the synonym database and retrieves substitutable replacement candidate word strings;
a word string selection step in which word string selection means, using a pronunciation difficulty database in which the pronunciation difficulty of each word string is quantitatively recorded in advance, obtains from the pronunciation difficulty database the pronunciation difficulty of the word string before replacement and of each replacement candidate word string, and selects the word string having the minimum pronunciation difficulty;
a replacement word string presenting step in which replacement word string presenting means presents the replacement word string selected in the word string selection step together with the manuscript text data being edited in the word processor processing means, and prompts the user to input whether or not to replace; and
a word string replacement step in which word string replacement means replaces the word string in response to the user input.
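The present-and-confirm behaviour recited above can be illustrated, without limitation, by the following sketch. The user prompt is modelled as a callback so that the example runs without a word processor. The function name `review_replacements`, the `suggestions` mapping from token index to proposed replacement, and the `confirm` callback signature are assumptions of this sketch.

```python
def review_replacements(tokens, suggestions, confirm):
    """Apply only the suggested replacements that the user accepts.

    tokens      -- the manuscript text as a list of word strings
    suggestions -- {index: replacement} produced by a selection step
    confirm     -- callable(original, replacement, context) -> bool,
                   standing in for the user's yes/no input
    """
    context = " ".join(tokens)
    result = list(tokens)
    for index, replacement in sorted(suggestions.items()):
        # Present the candidate together with the text being edited.
        if confirm(tokens[index], replacement, context):
            result[index] = replacement
    return result


tokens = ["I", "really", "rarely", "go"]
suggestions = {1: "very", 2: "seldom"}
# Hypothetical user who accepts only the suggestion "very".
accepted = review_replacements(tokens, suggestions,
                               lambda old, new, ctx: new == "very")
print(" ".join(accepted))  # I very rarely go
```

In an interactive setting the callback could wrap a dialog or a console prompt. The manuscript is changed only where the user agrees.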

The utterance support program used together with a word processor program according to claim 13, wherein
the pronunciation difficulty database defines the difficulty according to the number of predetermined characters or phonetic symbols in a word string, and
in the replacement word string presenting step, the number of the predetermined characters or phonetic symbols in the extracted word string is counted, and the pronunciation difficulty of the word string is calculated based on the pronunciation difficulty calculation formula defined in the pronunciation difficulty database.

The utterance support program used together with a word processor program according to claim 13 or 14, wherein
the pronunciation difficulty database is configured to include, with respect to two or more difficult-to-distinguish characters or phonetic symbols, at least one of the following counts: the number of any of those characters or phonetic symbols contained in the word string; the number of those characters or phonetic symbols that constitute the sound carrying the first accent in the word string; and the number of those characters or phonetic symbols for which replacing the character or phonetic symbol with the other one yields a meaningful word string; and to define a predetermined weighting value for each count, and
in the replacement word string presenting step, each count is multiplied by its weighting value and the sum of the products is calculated.

The utterance support program used together with a word processor program according to claim 13, wherein,
in the replacement word string presenting step, the replacement word string presenting means, using a corpus database in the same language as the manuscript text,
counts the frequency with which the sequence formed by the replacement candidate word string and the k-grams preceding and following it (k may be the same or different before and after) appears in the corpus database, and,
when the frequency is equal to or higher than a predetermined value, presents the replacement candidate word string having the lowest pronunciation difficulty and the highest frequency.