JP2004226556A

JP2004226556A - Method and device for diagnosing speaking, speaking learning assist method, sound synthesis method, karaoke practicing assist method, voice training assist method, dictionary, language teaching material, dialect correcting method, and dialect learning method

Info

Publication number: JP2004226556A
Application number: JP2003012581A
Authority: JP
Inventors: Masumi Saito; ますみ斎藤
Original assignee: Individual
Current assignee: Individual
Priority date: 2003-01-21
Filing date: 2003-01-21
Publication date: 2004-08-12

Abstract

PROBLEM TO BE SOLVED: To make a person's way of speaking easy to diagnose by presenting it visually with a computer or the like.
SOLUTION: A speech diagnosis method uses a computer to diagnose the way of speaking of a person who wishes to be diagnosed. It includes three steps:
- a voice data acquisition step of inputting into the computer voice data obtained by recording the person's way of speaking;
- a voice data analysis step of analyzing that voice data for variations in the pitch, tempo, and loudness of the voice, and the like;
- an analysis result display step of visually displaying the person's way of speaking according to the results of the analysis step.
The display in the analysis result display step uses a multi-line staff or a graph whose two axes represent the pitch and the loudness of the voice.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

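[0001]
[Field of the Invention] The present invention relates to a method of diagnosing a person's way of speaking, and related matters.
[0002]
[Prior Art] Karaoke practice systems and the like have existed that give a score such as 100 or 80 points after the user sings. For speaking, however, nothing has performed computer-based analysis.
[0003]
[Problem to Be Solved by the Invention] The inventor has for years given training courses on how to speak, how to listen, and how to handle telephone calls. The inventor has now devised an innovative method of diagnosing speech. The method diagnoses and coaches a person's way of speaking by writing the intonation of a sentence on a multi-line staff. A multi-line staff is a generic concept that includes the five-line musical staff: any notation that indicates pitch using a plurality of parallel lines.

This method was published in the inventor's own book 『電話王の話す技術・聞く技術』 ("Speaking and Listening Techniques of the Telephone King", Taiyo Kikaku Shuppan Co., Ltd., published July 30, 2002). The use of the multi-line staff to diagnose speech appears on pages 28, 30, 60, 117, 132, and 143 of that book. The inventor has also devised a method of coaching speech with a graph whose two axes are voice pitch and voice loudness. That method appears on pages 26, 126, 132, 143, 147, 151, 168, 174, 184, 188, and 198 of the same book.
[0004] The inventor has considered how to make speech diagnosis available to many more people. An object of the present invention is to present speech diagnosis and related information visually, using a computer or similar device, so that it is easy to understand.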
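[0005]
[Means for Solving the Problem] To solve the above problem, the invention according to claim 1 is a speech diagnosis method that uses a computer to diagnose a person's way of speaking. It comprises:
- a voice data acquisition step of loading into the computer voice data obtained by recording the way of speaking of a person who wishes to be diagnosed;
- a voice data analysis step of analyzing the voice data loaded in the voice data acquisition step with respect to changes in voice pitch, tempo, voice loudness, and the like;
- an analysis result display step of visually displaying the way of speaking of the person who wishes to be diagnosed, based on the results of the analysis in the voice data analysis step.
[0006] The invention according to claim 2 is the speech diagnosis method according to claim 1, wherein the display in the analysis result display step uses a multi-line staff.
[0007] The invention according to claim 3 is the speech diagnosis method according to claim 1, wherein the display in the analysis result display step draws a graph whose two axes are voice pitch and voice loudness.
[0008] The invention according to claim 4 is a speech diagnosis device that displays speech diagnosis results on the display unit of a portable device, such as a mobile phone, a personal digital assistant (PDA), an electronic organizer, or a wristwatch. It comprises:
- voice data capture means for capturing voice data spoken by the user;
- voice data analysis means for analyzing the voice data captured by the voice data capture means;
- analysis result display means for displaying the speech analysis result on the display unit of the portable device, based on the analysis result of the voice data analysis means.
[0009] The invention according to claim 5 is the speech diagnosis device according to claim 4. The device is built into the portable device, and the portable device provides these functions on its own.
[0010] The invention according to claim 6 is the speech diagnosis device according to claim 4. The device is located in a computer network reached through wireless communication, such as a mobile telephone link, and performs these functions by communicating with the portable device.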
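[0011] The invention according to claim 7 is a speaking learning support method that uses a computer to support a learner in learning how to speak. It comprises:
- a voice data acquisition step of loading into the computer voice data obtained by recording the learner's way of speaking;
- a voice data analysis step of analyzing that voice data with respect to changes in voice pitch, tempo, voice loudness, and the like;
- an analysis result display step of visually displaying the learner's way of speaking based on the results of the analysis.
[0012] The invention according to claim 8 is a speech synthesis method for synthesizing speech electronically. It comprises an intonation adjustment step of adjusting the intonation of each sentence in accordance with a multi-line staff, and a synthesized speech creation step of creating synthesized speech data based on the result of that adjustment.
[0013] The invention according to claim 9 is a karaoke practice support method that uses a computer to support a karaoke practicer. It comprises:
- a voice data acquisition step of loading into the computer voice data obtained by recording the practicer's singing;
- a voice data analysis step of analyzing that voice data with respect to changes in voice pitch, tempo, voice loudness, and the like;
- an analysis result display step of visually displaying the practicer's way of singing based on the results of the analysis.
[0014] The invention according to claim 10 is a voice training support method that uses a computer to support a voice training trainee. It comprises:
- a voice data acquisition step of loading into the computer voice data obtained by recording the trainee's voice;
- a voice data analysis step of analyzing that voice data with respect to changes in voice pitch, tempo, voice loudness, and the like;
- an analysis result display step of visually displaying the trainee's voice based on the results of the analysis.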
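[0015] The invention according to claim 11 is a dictionary in which the preferred intonation of each sentence is written using a multi-line staff.
[0016] The invention according to claim 12 is language teaching material in which the preferred intonation of each sentence is written using a multi-line staff.
[0017] The invention according to claim 13 is a dialect correction method that uses a computer to correct a dialect. It comprises:
- a voice data acquisition step of loading into the computer voice data obtained by recording the way of speaking of the person being corrected;
- a voice data analysis step of analyzing that voice data with respect to changes in voice pitch, tempo, voice loudness, and the like;
- an analysis result display step of visually displaying that person's way of speaking based on the results of the analysis.
[0018] The invention according to claim 14 is a dialect learning method that uses a computer to learn a dialect. It comprises:
- a voice data acquisition step of loading into the computer voice data obtained by recording the learner's way of speaking;
- a voice data analysis step of analyzing that voice data with respect to changes in voice pitch, tempo, voice loudness, and the like;
- an analysis result display step of visually displaying the learner's way of speaking based on the results of the analysis.
[0019] The invention according to claim 15 is a speech diagnosis method that uses a computer to diagnose a person's way of speaking. It comprises:
- a voice data acquisition step of loading into the computer voice data obtained by recording the way of speaking of a person who wishes to be diagnosed;
- a tone change point acquisition step of analyzing that voice data to obtain the points at which the tone changes;
- a note conversion step of converting the change points obtained in the tone change point acquisition step into musical notes.
[0020] The invention according to claim 16 is the speech diagnosis method according to claim 15. It further comprises a chord adjustment step of adjusting the pitches converted into notes in the note conversion step by applying a major chord to them.
[0021] The invention according to claim 17 is the speech diagnosis method according to claim 15 or 16. It further comprises a diagnosis result display step of displaying the result of the note conversion together with a learning target.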
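[0022]
[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the relationship between the loudness and the pitch of the voice when handling customers. The X axis (horizontal) represents loudness. Loudness corresponds to musical dynamics, so it is marked as pp (pianissimo), p (piano), mp (mezzo-piano), mf (mezzo-forte), f (forte), and ff (fortissimo). The Y axis (vertical) represents pitch on the do-re-mi-fa-so scale. For example, point A in the figure is where forte meets so, and point B is where piano meets do.

Taking the point where the X and Y axes cross as the center, the figure has four regions:
- upper right: "favorable response";
- lower right: "persuasion";
- lower left: "apology";
- upper left: "cooperation".

The first words of an inbound call are preferably spoken in the "favorable response" region. Unless a particular problem arises as the conversation develops, it is preferable to stay in this region until the call ends. When emotion enters the pitch of the voice and gives it shading such as brightness, the result is called the tone of the voice.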
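The FIG. 1 quadrants can be expressed in code as a simple lookup. The Python sketch below is illustrative only. The text does not say exactly where the axes cross, so it assumes that the vertical axis falls between mp and mf and that the horizontal axis sits at mi, with mi counted as low. The function name `voice_region` is hypothetical.

```python
# Dynamics along the X axis (soft -> loud) and scale degrees along the Y axis (low -> high), as in FIG. 1.
DYNAMICS = ["pp", "p", "mp", "mf", "f", "ff"]
PITCHES = ["do", "re", "mi", "fa", "so"]


def voice_region(dynamic: str, pitch: str) -> str:
    """Return the FIG. 1 region for a (loudness, pitch) pair.

    Assumption: the axes cross between mp and mf and at mi (mi counts as low).
    """
    loud = DYNAMICS.index(dynamic) >= DYNAMICS.index("mf")
    high = PITCHES.index(pitch) > PITCHES.index("mi")
    if loud and high:
        return "favorable response"  # upper right
    if loud:
        return "persuasion"  # lower right
    if high:
        return "cooperation"  # upper left
    return "apology"  # lower left


print(voice_region("f", "so"))  # point A
print(voice_region("p", "do"))  # point B
```

Under these assumptions, point A (forte, so) lands in the favorable-response region and point B (piano, do) in the apology region.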
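[0023] FIG. 2 analyzes ways of speaking using a multi-line staff. The multi-line staff generalizes the five-line staff and shows visually how pitch changes over time. The number of lines can be varied to suit the listener's understanding and ear training, for example a six-line or four-line staff.

Consider two phrases: "ありがとうございます" ("Thank you") and "日本商事お客様サービス部斎藤でございます" ("This is Saito of the Customer Service Department, Nihon Shoji"). Spoken with a straight-line movement from so down to do, they sound blunt and lack softness. A curved movement that rises slightly from the starting pitch and then falls sounds soft and is easily accepted.
- For "ありがとうございます", it is better to start on so, rise slightly, and then descend along an upwardly convex arch to do.
- For "日本商事お客様サービス部斎藤でございます", each part should rise slightly and then fall in an arch:
  - "日本商事" goes from so to fa;
  - "お客様サービス部" goes from fa to re;
  - "斎藤でございます" goes from slightly above mi down to do.

The lower half of FIG. 2 analyzes inflection, using phrases such as "どうもありがとうございました" ("Thank you very much") and "まことにもうしわけございませんでした" ("We are truly sorry").
- Saying them on mi with no inflection (a monotone reading) narrows the pitch range and gives the listener the impression that the words are not heartfelt.
- Inflection that repeatedly goes down and up between mi and do is also inappropriate; it sounds rude or causes misunderstanding.
- Appropriate inflection, shown at the bottom of FIG. 2, has an upwardly convex, arch-shaped intonation at each appropriate break. Speaking this way gives the other party the impression that the words are sincere.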
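The FIG. 2 contrast between a flat reading, a straight fall, a zigzag between mi and do, and an arch can be approximated by a rule-based classifier. The sketch below assumes one pitch value per mora, in semitones above do (so = 7, fa = 5, mi = 4, re = 2, do = 0). It also assumes a hypothetical 1-semitone threshold for "flat". It illustrates the categories only and is not a method given in the specification.

```python
from typing import Sequence


def classify_contour(pitches: Sequence[float], flat_range: float = 1.0) -> str:
    """Label a phrase's pitch contour (one value per mora, in semitones above do)."""
    if max(pitches) - min(pitches) < flat_range:
        return "monotone"  # reading on a single pitch, e.g. all on mi
    # Direction of each non-zero step: +1 rising, -1 falling.
    signs = [1 if b > a else -1 for a, b in zip(pitches, pitches[1:]) if b != a]
    turns = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    if turns >= 2:
        return "zigzag"  # repeatedly down and up, e.g. between mi and do
    if turns == 0 and signs[0] < 0:
        return "straight fall"  # e.g. so straight down to do
    if turns == 1 and signs[0] > 0 and pitches[-1] < pitches[0]:
        return "arch"  # slight rise, then an upwardly convex fall
    return "other"


print(classify_contour([7, 6, 5, 4, 3, 2, 1, 0]))  # straight fall
print(classify_contour([7, 8, 7, 5, 3, 1, 0]))     # arch
```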
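[0024] FIG. 3 uses the multi-line staff to explain three habitual patterns at the ends of phrases: rising endings, drawn-out endings, and pressed-down endings. Illustrating and pointing out habits the speaker is not aware of makes the speaker aware of them and motivates correction.
[0025] FIG. 4 uses the multi-line staff to explain various ways of saying "はい" ("yes"). The multi-line staff distinguishes ten kinds:
- the backchannel "hai";
- the agreeing "hai";
- the willing-consent "hai";
- the complaint-handling "hai";
- the reply "hai";
- the listless "hai";
- the anxious "hai";
- the sympathetic "hai";
- the irritated "hai";
- the questioning "hai".
[0026] FIG. 5 uses the multi-line staff to explain the "five-change" speaking method for adding variety to speech. The five changes are:
- changing the starting pitch;
- changing the pitch range;
- changing the timing of entry (pausing);
- changing the speed;
- changing how keywords are emphasized.
This approach can also be applied to how pauses are taken.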
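[0027] FIG. 6 uses a graph whose two axes are voice pitch and voice loudness to explain the four phases of complaint handling and the four voice regions. Here, a complaint means a customer's grievance or dissatisfaction directed at a company's support center, customer consultation office, or the like.

Seen from the customer, the four phases are:
- the assertion phase;
- the situation explanation phase;
- the phase from listening and understanding to reaching a point of agreement;
- the phase from that agreement to acceptance, satisfaction, and being moved.

Seen from the responder, they are:
- the backchannel phase;
- the situation grasping phase;
- the problem solving phase;
- the closing phase.

For each phase, the figure explains which region of voice loudness and voice tone is preferable.
[0028] FIG. 7 explains how to use the voice in the first phase of complaint handling, combining the two-axis pitch/loudness graph with a multi-line staff. It shows on the staff the change in starting pitch of the backchannel "hai" and the intonation of "それは、誠に申し訳ございませんでした。" ("We are very sorry about that."). It relates each of these to the corresponding movement on the two-axis graph.
[0029] FIG. 8 explains, in the same way, how to use the voice in the second phase of complaint handling, using the two-axis graph and the multi-line staff. FIG. 9 covers the third phase, and FIG. 10 the fourth.
[0030] FIG. 11 explains how to use the voice in sales. The four phases of sales are shown in the following figures:
- FIG. 12: which tones are accepted in the first phase;
- FIG. 13: conveying benefits while drawing out the other party's information in the second phase;
- FIG. 14: answering the other party's questions and further promoting the product in the third phase;
- FIG. 15: explaining the procedure and securing a commitment in the fourth phase.
Thus the two-axis pitch/loudness graph can be used to diagnose and coach speaking and listening in sales as well as in complaint handling.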
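[0031] FIG. 16 is a flowchart of the speech diagnosis method of the present invention. The method can be implemented, for example, on a general-purpose computer.
1. When the program starts, an initial screen is displayed (step 600). This screen shows at least a record button.
2. When the operator clicks the record button, capture of voice data from a microphone connected to the computer begins (if YES at step 601, the process proceeds to step 603).
3. The acquired voice data is analyzed (step 605). The analysis covers the loudness, pitch, tempo, and pausing of the voice, among other things.
4. The results are displayed visually (step 607). The display may be a multi-line staff or a graph with voice pitch and voice loudness as its two axes.

It is desirable to display not only the operator's own voice but also an exemplary model utterance for comparison. To accommodate individual differences among operators and users, it is also desirable to measure the person's pitch range and volume before recording at step 601.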
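A bare-bones version of step 605 is sketched below in Python using only the standard library. It splits the recording into frames and estimates loudness in each frame as an RMS level. It estimates pitch by picking the autocorrelation peak within an assumed 60 to 500 Hz speaking range. It then places a pitch within the speaker's own range, as measured before step 601. The frame length, search range, and function names are assumptions. A practical system would likely use a more robust pitch tracker.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_rms_db(frame: Sequence[float]) -> float:
    """Loudness of one frame as an RMS level in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def frame_pitch_hz(frame: Sequence[float], sample_rate: int,
                   fmin: float = 60.0, fmax: float = 500.0) -> Optional[float]:
    """Rough pitch estimate: the lag with the largest autocorrelation between 1/fmax and 1/fmin."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # None when no positive correlation is found (e.g. silence).
    return sample_rate / best_lag if best_lag else None


def analyze(samples: Sequence[float], sample_rate: int,
            frame_len: int = 400) -> List[Tuple[float, Optional[float], float]]:
    """Step 605 sketch: (start time in s, pitch in Hz or None, loudness in dB) per frame."""
    results = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        results.append((start / sample_rate, frame_pitch_hz(frame, sample_rate), frame_rms_db(frame)))
    return results


def relative_position(f_hz: float, low_hz: float, high_hz: float) -> float:
    """Place a pitch in the speaker's calibrated range on a log scale: 0.0 = lowest, 1.0 = highest."""
    return math.log2(f_hz / low_hz) / math.log2(high_hz / low_hz)


sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(1600)]  # 0.2 s test tone
for t, f0, db in analyze(tone, sr):
    print(f"{t:.2f} s  {f0:.0f} Hz  {db:.1f} dB")
```

Expressing pitch relative to the calibrated range, rather than in absolute hertz, is one way to reflect the per-speaker measurement recommended before step 601.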
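[0032] FIG. 17 is a conceptual block diagram of an example configuration of the speech diagnosis device of the present invention. A general-purpose computer can be used, as described above. When the device is built as a portable unit, an electronic organizer, a PDA, a mobile phone, a wristwatch, or the like can be used. At minimum, the following components shown in FIG. 17 are required:
- the CPU 10;
- the memory 20 connected to its bus;
- the microphone interface 30;
- the display device 40;
- the switch 50;
- the microphone 35 connected to the microphone interface 30.
The memory 20 stores the necessary programs and provides a work area for the CPU.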
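[0033] FIG. 18 is a flowchart of the speech synthesis method of the present invention. The invention assumes that a database has already been built of voice data that includes accent information for the words making up sentences.
1. The word database is accessed to obtain the necessary word data (step 810). "Necessary" means that the sentence is broken down into its constituent words and all of them are obtained. The decomposition may be done automatically by a computer program, or manually by an operator before retrieval.
2. After confirming that all the required word data has been obtained (YES at step 820), the words are joined in order (step 830).
3. The result is processed according to a predetermined rule (step 840). The predetermined rule is, for example, the arch-shaped intonation shown in the diagram of appropriate inflection in FIG. 2.
4. An appropriate adjustment is performed (step 850), and ordinary speech synthesis is then carried out.

The adjustment in step 850 is needed because of how the sentence contour grows. If the accents in the original word data are written on, say, a three-line staff, joining several words into a sentence may yield a multi-line staff of six, seven, eight, or more lines. The sentence is synthesized from the intonation information optimized for it. However, the speech synthesis system to which the multi-line staff notation is applied may, for technical reasons, limit how far the pitch can change.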
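Steps 810 to 850 can be sketched as follows. Everything specific in this sketch is a hypothetical stand-in chosen for illustration; the specification leaves both the rule and the adjustment open. The stand-ins are:
- the toy word database and its placeholder accent values;
- the sine-plus-slope form of the arch rule;
- the linear compression used for step 850.
The sketch also shows why step 850 may be needed: words written on a three-line staff join into a sentence contour that spans a wider range.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical toy database: pitch level per mora for each word on a three-line
# staff (0 = bottom line, 2 = top line). The values are placeholders, not real accent data.
WORD_DB: Dict[str, List[int]] = {
    "word_a": [0, 2, 2],
    "word_b": [2, 1, 1, 0],
    "word_c": [0, 1, 1],
}


def fetch_word_data(words: Sequence[str], db: Dict[str, List[int]]) -> Optional[List[List[int]]]:
    """Steps 810-820: return the accent pattern of every word, or None if any is missing."""
    if any(w not in db for w in words):
        return None
    return [db[w] for w in words]


def concatenate(accents: Sequence[Sequence[int]]) -> List[float]:
    """Step 830: join the word patterns in sentence order."""
    return [float(level) for pattern in accents for level in pattern]


def apply_arch_rule(contour: Sequence[float], rise: float = 1.0, fall: float = 2.0) -> List[float]:
    """Step 840 (assumed form): add a small early rise followed by a steady decline."""
    n = len(contour)
    if n < 2:
        return list(contour)
    return [v + rise * math.sin(math.pi * i / (n - 1)) - fall * i / (n - 1)
            for i, v in enumerate(contour)]


def fit_to_synth_range(contour: Sequence[float], low: float, high: float) -> List[float]:
    """Step 850 (assumed form): shift and, if needed, compress the contour into [low, high]."""
    lo, hi = min(contour), max(contour)
    if low <= lo and hi <= high:
        return list(contour)
    span = hi - lo
    scale = min(1.0, (high - low) / span) if span > 0 else 1.0
    return [low + (v - lo) * scale for v in contour]


accents = fetch_word_data(["word_a", "word_b", "word_c"], WORD_DB)
if accents is not None:  # step 820: all words were found
    shaped = apply_arch_rule(concatenate(accents))
    print([round(v, 2) for v in fit_to_synth_range(shaped, 0.0, 3.0)])
```

With `rise * pi > fall`, the added offset climbs briefly before it declines, which gives the upwardly convex shape described for FIG. 2.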
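[0034]
[Examples] Introducing note conversion (transcribing speech into musical notes) into speech analysis raises several problems. First, when off-the-shelf transcription software converts speech data into notes, the number of notes tends not to match the number of morae in the speech. Second, when the note-converted result is played back, it sounds very different from the original speech. The inventor presents solutions to these problems here. The mora count is a phonetic concept, close to the number of characters when the sentence is written in hiragana.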
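The note/mora imbalance described in [0034] can be checked by counting morae in a kana transcription and comparing the count with the number of notes. The sketch below follows standard Japanese mora counting. Small ゃ, ゅ, ょ (and the small vowels) merge with the preceding kana, while っ, ん, and the long-vowel mark ー each count as one mora. Treating a ratio near 1.0 as "balanced" is an assumption, not something the specification states.

```python
# Small kana that merge with the preceding kana into one mora (e.g. きょ).
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def is_kana(ch: str) -> bool:
    """True for hiragana, katakana, or the long-vowel mark ー."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def mora_count(kana_text: str) -> int:
    """Count morae in a kana string; っ, ん and ー each count as one mora."""
    return sum(1 for ch in kana_text if is_kana(ch) and ch not in SMALL_KANA)


def note_mora_ratio(num_notes: int, kana_text: str) -> float:
    """Notes per mora; a value far from 1.0 suggests the imbalance described above."""
    return num_notes / mora_count(kana_text)


print(mora_count("ありがとうございます"))  # 10
print(mora_count("きょう"))  # 2
```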
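[0035] To address the first problem, an inappropriate number of notes, an appropriate threshold is set. Change points where the change in tone reaches or exceeds the threshold are detected. Each stretch from one change point to the next is then represented, in simplified form, by a single note.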
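The threshold rule of [0035] can be sketched as follows. The sketch assumes that the input is a frame-by-frame pitch track already expressed in semitones (for example MIDI note numbers). The pitch at each change point serves as the reference for detecting the next one. The 1-semitone default threshold and the names are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Note(NamedTuple):
    pitch: int   # nearest semitone (e.g. a MIDI note number)
    start: int   # index of the first frame the note covers
    length: int  # number of frames the note is sustained


def notes_from_pitch_track(track: Sequence[float], threshold: float = 1.0) -> List[Note]:
    """One note per stretch between tone change points.

    A change point is a frame whose pitch differs from the pitch at the previous
    change point by at least `threshold` semitones.
    """
    if not track:
        return []
    starts = [0]
    for i in range(1, len(track)):
        if abs(track[i] - track[starts[-1]]) >= threshold:
            starts.append(i)
    ends = starts[1:] + [len(track)]
    return [Note(round(track[s]), s, e - s) for s, e in zip(starts, ends)]


print(notes_from_pitch_track([60, 60.2, 60.1, 63, 63.1, 62.9, 58, 58]))
```

Small wobbles below the threshold are absorbed into the current note. This is what keeps the note count from far exceeding the number of audible pitch movements.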
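[0036] To address the second problem, playback that sounds very different from the original speech, applying a major chord is proposed. Among the chord types used on the piano, guitar, and similar instruments, major and minor chords are well known; each is sounded as a set of simultaneous tones. Each note of the converted result is assigned to one of the tones that make up a particular major chord. This produces a more natural flow of sound and brings the played-back result closer to the impression of spoken language. Major chords are the basic choice, but for some emotional expressions a minor chord may be more appropriate.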
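[0036] does not say how the chord is chosen or how each note is assigned to a chord tone. The sketch below assumes the simplest reading. The caller picks the chord root as a pitch class (0 = C), and each note moves to the nearest chord tone, taking the lower one on a tie. The function name and the tie rule are assumptions.

```python
from typing import List, Sequence

CHORD_TONES = {"major": (0, 4, 7), "minor": (0, 3, 7)}  # semitones above the root


def snap_to_chord(notes: Sequence[int], root: int = 0, quality: str = "major") -> List[int]:
    """Move each note to the nearest tone of the chord built on `root`; ties go to the lower tone."""
    allowed = {(root + step) % 12 for step in CHORD_TONES[quality]}
    snapped = []
    for n in notes:
        for dist in range(12):  # a chord tone always exists within one octave
            if (n - dist) % 12 in allowed:
                snapped.append(n - dist)
                break
            if (n + dist) % 12 in allowed:
                snapped.append(n + dist)
                break
    return snapped


print(snap_to_chord([60, 62, 63, 65, 66, 69]))  # [60, 60, 64, 64, 67, 67]
```

Passing `quality="minor"` covers the emotional cases where [0036] suggests a minor chord.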
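[0037] Displaying the note-converted result side by side with the sound flow of the ideal way of speaking (the learning target) lets the learner see how much they have improved. The note-converted result can also be displayed together with the text of the sentence. In addition, a speech recognition program can be run while the voice data is being acquired, and its result can be shown alongside.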
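A minimal text-mode version of the side-by-side display in [0037] is sketched below. It assumes that the learner's notes and the target notes are already aligned one-to-one, for example one note per mora. In the spirit of a multi-line staff, it writes one row per semitone from high to low. Learner notes are marked L, target notes T, and matches *. The layout is an illustration only.

```python
from typing import List, Sequence


def render_staff(learner: Sequence[int], target: Sequence[int], low: int, high: int) -> List[str]:
    """Return one text row per semitone, highest first, comparing two aligned note sequences."""
    rows = []
    for pitch in range(high, low - 1, -1):
        cells = []
        for mine, goal in zip(learner, target):
            if mine == pitch and goal == pitch:
                cells.append("*")  # learner hit the target note
            elif mine == pitch:
                cells.append("L")
            elif goal == pitch:
                cells.append("T")
            else:
                cells.append("-")
        rows.append(f"{pitch:3d} " + "".join(cells))
    return rows


for row in render_staff([67, 65, 62, 60], [67, 68, 64, 60], low=60, high=68):
    print(row)
```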
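[0038] FIG. 19 is a flowchart of an example of the voice diagnosis method.
1. Voice data is acquired (step 910). This can be done with existing recording software through a microphone, or by feeding the output of a tape recorder or similar device into the computer.
2. The acquired voice data is analyzed to extract tone change points (step 920). This can be done by setting an appropriate threshold and extracting the points where the change exceeds it.
3. Note conversion is performed (step 940). A note is placed at each extracted change point and sustained until the next change point.
4. Major-chord application is performed (step 950).
5. The result is displayed (step 960). The display may simultaneously show the waveform of the voice data acquired in step 910 or the waveform formed by connecting the notes converted in step 940. Preferably, it also shows the waveform of the learning target.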
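For the display in step 960, the notes from step 940 can be turned back into a step-shaped pitch curve. That curve can then be compared with the original pitch track or with the learning target. The sketch below makes two assumptions not specified in the text. Notes are given as (semitone, start frame, length) tuples, and mean absolute deviation serves as a hypothetical measure of closeness.

```python
from typing import List, Sequence, Tuple

# (pitch in semitones, start frame, length in frames), as assumed for the output of step 940
NoteSpan = Tuple[int, int, int]


def notes_to_contour(notes: Sequence[NoteSpan]) -> List[float]:
    """Rebuild a frame-by-frame step curve by sustaining each note until the next one."""
    contour: List[float] = []
    for pitch, _start, length in notes:
        contour.extend([float(pitch)] * length)
    return contour


def mean_abs_deviation(a: Sequence[float], b: Sequence[float]) -> float:
    """Average distance in semitones between two aligned curves (e.g. learner vs. target)."""
    pairs = list(zip(a, b))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


learner = notes_to_contour([(60, 0, 3), (63, 3, 2)])
target = [60.0, 61.0, 60.0, 63.0, 65.0]
print(learner, mean_abs_deviation(learner, target))
```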
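[0039] The embodiments and examples above assume a computer such as a personal computer. For convenience, for example when a mobile phone is used, another embodiment is also possible: the program is placed on a server on the Internet or on a mobile telephone network and is run over the network.
[0040] This specification describes speech diagnosis performed with a computer, but performing the same without a computer also falls within the scope of the invention. The scope also includes creating dictionaries and language teaching materials that use multi-line staffs or two-axis pitch/loudness graphs. It further includes applications to karaoke, speech synthesis, dialect correction, dialect learning, and voice training.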
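[0041]
[Effects of the Invention] Configured as described above, the present invention can present clear learning guidance to people who are learning how to speak, learning a language, practicing karaoke, doing voice training, and so on.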
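[Brief Description of the Drawings]
[FIG. 1] A diagram showing the relationship between the loudness and the pitch of the voice when handling customers.
[FIG. 2] A diagram analyzing ways of speaking using a multi-line staff.
[FIG. 3] A diagram explaining three patterns of phrase-ending habits using a multi-line staff.
[FIG. 4] A diagram explaining various ways of saying "はい" ("yes") using a multi-line staff.
[FIG. 5] A diagram explaining the five-change speaking method for adding variety to speech, using a multi-line staff.
[FIG. 6] A diagram explaining the four phases of complaint handling and the four voice regions, using a graph whose two axes are voice pitch and voice loudness.
[FIG. 7] A diagram explaining how to use the voice in the first phase of complaint handling, combining a graph whose two axes are voice pitch and voice loudness with a multi-line staff.
[FIG. 8] A diagram explaining, in the same way, how to use the voice in the second phase of complaint handling, using the two-axis graph and a multi-line staff.
[FIG. 9] A diagram explaining how to use the voice in the third phase of complaint handling.
[FIG. 10] A diagram explaining how to use the voice in the fourth phase of complaint handling.
[FIG. 11] A diagram explaining how to use the voice in sales.
[FIG. 12] A diagram explaining which tones are accepted in the first phase of sales.
[FIG. 13] A diagram explaining how, in the second phase of sales, benefits are conveyed while the other party's information is drawn out.
[FIG. 14] A diagram explaining how, in the third phase of sales, the other party's questions are answered and the product is further promoted.
[FIG. 15] A diagram explaining how, in the fourth phase of sales, the procedure is explained and a commitment is secured.
[FIG. 16] A flowchart of the speech diagnosis method of the present invention.
[FIG. 17] A conceptual block diagram of an example configuration of the speech diagnosis device of the present invention.
[FIG. 18] A flowchart of the speech synthesis method of the present invention.
[FIG. 19] A flowchart of an example of the voice diagnosis method of the present invention.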
[Description of Reference Numerals]
10 CPU
20 Memory
30 Microphone interface
35 Microphone
40 Display device
50 Switch
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a method of diagnosing speech.
[0002]
2. Description of the Related Art Conventionally, there has been a karaoke practice system or the like that scores 100 points, 80 points, etc. after singing. There was no computer-assisted analysis of how to speak.
[0003]
SUMMARY OF THE INVENTION The inventor of the present invention has been training on how to speak, listen, and answer telephone calls on a daily basis. And this time, he came up with a revolutionary speech diagnosis method. This is a method of diagnosing and teaching how to speak by expressing the intonation of a sentence using a multi-line notation (a higher-level concept of a staff notation, which indicates the scale using a plurality of parallel lines). This method has been disclosed in "Techniques of Speaking and Listening by the Telephone King" (published on July 30, 2002, Taiyo Planning Publishing Co., Ltd.), the author of which is the author. A method of diagnosing speech using a multi-line score is described on pages 28, 30, 60, 117, 132, and 143 of this book. In addition, the inventor of the present invention has conceived a method of instructing how to speak using a graph in which the pitch of the voice and the loudness of the voice are two axes, and the book has pages 26, 126, 132, 143, 147 pages, 151 pages, 168 pages, 174 pages, 184 pages, 188 pages, and 198 pages.
[0004] The inventor of the present invention has conceived a method of disseminating speech diagnosis to more people. It is an object of the present invention to visually express diagnosis and the like using a computer or the like to make it easy to understand.
[0005]
Means for Solving the Problems To solve the above-mentioned problems, an invention according to claim 1 is a speech diagnosis method for diagnosing a speech using a computer, the speech recording a speech of a person who wants a diagnosis. An audio data acquisition step of importing data into the computer; an audio data analysis step of analyzing the audio data acquired in the audio data acquisition step for changes in voice pitch, tempo, voice volume, and the like; An analysis result display step of visually displaying the way of speech of the person who wants the diagnosis based on the analysis result analyzed in the analysis step.
According to a second aspect of the present invention, there is provided the speech diagnosis method according to the first aspect, wherein the display in the analysis result display step uses a multi-line notation.
According to a third aspect of the present invention, there is provided the method for diagnosing speech according to the first aspect, wherein the display in the analysis result displaying step includes a graph having two axes of a voice pitch and a voice loudness. It is what I drew.
According to a fourth aspect of the present invention, there is provided a speech diagnosis apparatus for displaying a speech diagnosis result on a display unit of a portable device such as a portable telephone device, a personal digital assistant (PDA), an electronic organizer, and a wristwatch. Voice data capturing means for capturing voice data spoken by a person, voice data analyzing means for analyzing voice data captured by the voice data capturing means, and based on an analysis result analyzed by the voice data analyzing means, Analysis result display means for displaying an analysis result of the way of speaking on the display unit of the portable device.
According to a fifth aspect of the present invention, there is provided a speech diagnosis apparatus according to the fourth aspect, wherein the speech diagnosis apparatus is incorporated in the portable device and has the function alone. is there.
According to a sixth aspect of the present invention, there is provided a speech diagnosis apparatus according to the fourth aspect, wherein the speech diagnosis apparatus is provided in a computer network connected via wireless communication such as a mobile phone. The function is achieved by communicating with a portable device.
According to a seventh aspect of the present invention, there is provided a speaking style learning supporting method for assisting learning of a speaking style using a computer, wherein a voice data acquiring step of capturing voice data obtained by recording the learner's speaking style into the computer. A voice data analysis step of analyzing the voice data captured in the voice data acquisition step for changes in voice pitch, tempo, voice loudness, and the like, and based on the analysis result analyzed in the voice data analysis step. Analysis result display step of visually displaying the learner's speech.
An eighth aspect of the present invention is a voice synthesizing method for synthesizing voice electronically, wherein an intonation adjusting step for adjusting intonation of each sentence in accordance with a multi-line notation, and the intonation adjusting step. And generating synthesized voice data based on the adjusted result.
According to a ninth aspect of the present invention, there is provided a karaoke practice assisting method for assisting a karaoke practicer by using a computer, wherein voice data recording the singing style of the karaoke practicer is taken into the computer. The voice data acquisition step, the voice data analysis step of analyzing the voice data captured in the voice data acquisition step for changes in voice pitch, tempo, voice loudness, etc., and the voice data analysis step An analysis result display step of visually displaying how to sing the karaoke trainer based on the analysis result.
According to a tenth aspect of the present invention, there is provided a voice training support method for supporting voice training of a voice training trainer using a computer, wherein the voice data obtained by recording the voice of the voice training trainer is transmitted to the computer. A voice data obtaining step for analyzing the voice data captured in the voice data obtaining step for voice pitch, tempo, change in voice volume, and the like; and An analysis result display step of visually displaying the voice of the voice training trainer based on the analyzed analysis result.
[0015] The invention described in claim 11 is a dictionary in which preferred intonations of sentence units are described using a multi-line notation.
The invention according to claim 12 is a language teaching material in which a preferred intonation of a sentence unit is described by using a multi-line notation.
The invention according to claim 13 is a dialect correction method for correcting a dialect using a computer, the method comprising: obtaining voice data obtained by recording the speech of the corrector into the computer; The voice data captured in the voice data obtaining step, the voice pitch, tempo, voice data analysis step for analyzing changes in voice volume and the like, based on the analysis results analyzed in the voice data analysis step, Analysis result display step of visually displaying the corrector's speech style.
An invention according to claim 14 is a dialect learning method for learning a dialect using a computer, wherein a voice data acquisition step of loading voice data obtained by recording the learner's speaking style into the computer; The voice data captured in the voice data obtaining step, the voice pitch, tempo, voice data analysis step for analyzing changes in voice volume and the like, based on the analysis results analyzed in the voice data analysis step, And an analysis result display step of visually displaying a learner's speech style.
An invention according to claim 15 is a speech diagnosis method for diagnosing speech using a computer, comprising: a speech data acquisition step of taking in speech data obtained by recording speech of a person who wants a diagnosis; A tone change point obtaining step of analyzing the voice data captured in the voice data obtaining step to obtain a change point of a tone, and a note conversion step of converting the change point obtained in the tone change point obtaining step into a note. It has.
The invention described in claim 16 is the speech diagnosis method according to claim 15, further comprising a chord adjusting step of applying a major chord to the scale converted in the note-forming step and applying a major chord. Things.
The invention described in claim 17 is the speech diagnosis method according to claim 15 or 16, further comprising a diagnosis result displaying step of displaying the musical result and the learning target together. It is.
[0022]
Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a diagram showing a relationship between “loudness” and “pitch” of a voice at the time of reception. In this figure, the X axis (horizontal axis) represents the volume of the voice. Since the magnitude of the voice is expressed in terms of musical strength, it is expressed as pp (pianissimo), p (piano), mp (meso piano), mf (meso forte), f (forte), ff (fortesimo). . The Y axis (vertical axis) represents the pitch of the voice in the scale of Doremi Faso. For example, it is assumed that point A in the figure is a coordinate at which the forte and the sone intersect, and point B is a coordinate at which the piano and the doe intersect. With respect to the coordinate axes where the X axis and the Y axis intersect, the upper right is the “favorable response”, the lower right is the “persuasion”, the lower left is the “Cheer”, and the upper left is the “cooperation” area. In the first voice of the inbound, it is preferable to speak in the "favorable response" area, and if there is no particular problem in the development of conversation, it is preferable to speak in this area until the end of the conversation. If the tone of the voice comes into the tone of the voice, the tone is called the tone of the voice.
FIG. 2 is a diagram in which the way of speaking is analyzed using a multi-line score. The multi-line notation is a superordinate concept of the staff notation and visually expresses a temporal change in pitch. In some cases, the number of lines can be appropriately changed, such as a six-score notation or a four-score notation, depending on the degree of understanding of the person to be explained and the degree of training of the pitch. If you say "Thank you" or "I'm Saito at Japan Shoji Customer Service Department" with a straight line movement from soud to mute, it's straight and lacks softness. On the other hand, it is soft and easy to accept if you make a curved movement that raises a little from the initial pitch and lowers it. In the case of "Thank you", it is better to start with a so-sound at first, then go up a little and then go down in a curved manner with a convex mountain up to the sound. In the example of "I'm Saito at the Japan Trading Customer Service Department", the "Japan Trading" part goes up a little from the so-to sound to the fa-sound, then descends in a mountain, and the "Customer Service Department" part goes from the fa-sound to the sound. It is better to go up a little and then go down in a mountain, and in the part of "It's Saito", go up a little from the top of the M sound to a little sound and then go down in a mountain.
In the lower half of Fig. 2, an analysis of intonation is written. If you say "Thank you very much" or "I'm really sorry", etc. with the M sound without inflection (stick reading), it gives the listener the feeling that the range is narrow and the heart is not muffled. Also, inflections that go down and up between the Mi and Do sounds are inappropriate, resulting in a rude impression and misunderstanding. A proper intonation is one that has a hill-like intonation that is convex upward at each appropriate cut, as shown at the bottom of FIG. Speaking in this intonation can give the other party the impression of being loving.
FIG. 3 is a diagram for explaining the habit and three patterns at the end using a multi-line notation. Explains three habits of ending, extending, and ending. By showing and pointing out a habit that the person is not aware of in this way, motivation to promote and correct the awareness can be given.
FIG. 4 is a diagram for explaining various ways of saying "yes" using a multi-line score. Yes to answer, yes to consent, yes to consent, yes to respond, yes to answer, yes to lethargy, yes to anxiety, yes to tuning, yes to frustration, 10 to yes. The type “yes” can be distinguished and explained by using a multi-line score.
FIG. 5 is a diagram for explaining a five-change speech method for changing the way of speaking using a multi-line score. Change the sound of the entry, change the range, change the timing of the entry (interval), change the speed, change the way of emphasizing the keyword. It is described using FIG. In addition, this can be applied to the way of taking time.
FIG. 6 is a diagram for explaining four aspects of the claim response and four areas of the voice using a graph in which the pitch of the voice and the volume of the voice are two axes. Here, the complaint indicates that the customer feels a so-called complaint or dissatisfaction and hits it at a company support center or a customer consultation room. The four phases are the following aspects of the customer: the phase of the claim, the phase of the situation explanation, the phase from listening to understanding and reaching an agreement, and the phase from the agreement to consent, satisfaction, and excitement. From the point of view of the respondent, there are four phases: a meeting phase, a situation understanding phase, a problem solving phase, and a closing phase. In each situation, it is described which region of the voice volume and voice tone is preferably used.
FIG. 7 is a diagram for explaining how to make a voice in the first aspect of the claim by using a graph in which the pitch of the voice and the volume of the voice are on two axes and a multi-line notation. A two-axis graph of the pitch and loudness of the voice, showing the change in the pitch of the sound of the voice of Aizuchi and the intonation of "I'm sorry." The explanation is made in correspondence with the movement in.
FIG. 8 is a diagram for explaining how to make a voice in the second aspect of the claim by using a two-axis graph and a multi-line notation. FIG. 9 illustrates how to make a voice in the third aspect of the claim, and FIG. 10 illustrates how to make a voice in the fourth aspect of the claim.
FIG. 11 is a diagram for explaining how to produce a voice in sales. FIG. 12 is a diagram illustrating what tones are accepted in the first phase of sales. FIG. 13 is a diagram illustrating that in the second phase of the sales, the information of the partner is extracted while transmitting the merit. FIG. 14 is a diagram illustrating answering a question of a partner in the third phase of sales and further promoting a product. FIG. 15 is a diagram for explaining a procedure in the fourth phase of sales and explaining that a promise is attached. As described above, not only in response to complaints but also in the sales phase, diagnosis and guidance of how to speak and how to listen can be performed using the graph in which the pitch of the voice and the volume of the voice are two axes.
FIG. 16 is a flowchart showing a speech diagnosis method according to the present invention. For example, the speech diagnosis method can be realized using a general-purpose computer. First, when this computer program is started, an initial screen is displayed (step 600). At least a record button is displayed on this initial screen, and when the operator clicks the record button, the capture of audio data from the microphone connected to the computer is started (if YES in step 601, the step Proceed to 603). Next, the acquired voice data is analyzed (step 605). The loudness, pitch, tempo, interval, and the like of the voice are analyzed, and the analysis result is visually displayed (step 607). For this visual display, a multi-line notation display or a graph display in which the pitch of the voice and the volume of the voice are two axes can be adopted. It is desirable not only to display based on the voice of the operator, but also to compare and display exemplary utterances that serve as examples. Further, in order to cope with the individual difference between the operator and the user, it is desirable to go through a process of measuring the sound range and volume of the person prior to the recording in step 601.
FIG. 17 is a conceptual block diagram showing an example of the configuration of the speech diagnosis device of the present invention. As described above, a general-purpose computer can be used, but when configured as a portable device, an electronic organizer, a PDA, a mobile phone, a wristwatch, or the like can be used. At a minimum, the CPU 10 shown in FIG. 17, the memory 20 connected to the bus, the microphone interface 30, the display device 40, the switch 50, and the microphone 35 connected to the microphone interface 30 are required. The memory 20 stores necessary programs and supplies a work area for the CPU.
FIG. 18 is a flowchart showing a speech synthesis method according to the present invention. In the present invention, it is assumed that a database has already been constructed for voice data including accent information on words constituting a sentence. First, a word database is accessed to acquire necessary word data (step 810). "Necessary" means that the sentence is broken down into words that constitute the sentence, and all of them are obtained. Although it is possible to automatically decompose sentences into words by a computer program, the sentence may be obtained after being manually decomposed by an operator. After confirming that the necessary word data has been obtained (YES in step 820), they are connected in order (step 830). Then, the result is processed based on a predetermined rule (step 840). Here, the “predetermined rule” is, for example, a sharp intonation as shown in the diagram of the appropriate intonation expressed in FIG. For example, if the accents of the original word data were expressed in a three-line notation, when connecting some of them to form a sentence, a multi-line notation such as a six-line, seven-line, or eight-line notation was used. It is possible that Speech synthesis of the sentence is performed based on the information of the intonation optimized for the sentence thus completed, but in some cases, due to the technical convenience of the speech synthesis system to which the multi-line notation expression is applied, the Since there is a case where the change of the pitch is limited, an appropriate adjustment process is executed (step 850). Then, normal speech synthesis is executed.
[0034]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The use of note-taking techniques in speech analysis has several challenges. When the music data is converted from the voice data using the existing conversion software, there is a tendency that the number of musical notes and the number of mora of the voice cannot be balanced. Also, when the result of the note conversion is reproduced, the result is far from the original sound. There are issues such as. The inventor of the present invention presents a solution to these problems here. Here, the mora number is a phonetic concept, which is similar to the number of characters in which a sentence is written in hiragana.
First, with respect to the problem that the number of sounds is not appropriate, an appropriate threshold value is provided to detect a change point where the change in tone is larger than that, and from a certain change point to the next change point. Is to make one note represent and deform.
In addition, when the result of the musical note is reproduced, it is proposed that a major code be applied to solve the problem that the result is far from the original voice. Major chords, minor chords, and the like are known as chords used in pianos and guitars, and are each represented by chords. A more natural sound flow is achieved by expressing each sound resulting from the note conversion by applying it to one of the sounds constituting the chord of a particular major chord. As a result, the reproduced result can be made closer to the image of the spoken language. Basically, major codes are used, but depending on emotional expressions, it may be appropriate to apply minor codes.
By displaying the result of the note conversion and the sound flow of the ideal way of speaking (learning target) together, it is possible to show a person who learns how to speak to his / her progress. In addition, the result of the musical note and the display of the text can be written together. Furthermore, at the time of acquiring the above-mentioned voice data, it is also possible to execute a voice recognition program at the same time and to record the result together.
FIG. 19 is a flowchart showing an embodiment of the voice diagnosis method. First, voice data is acquired (step 910). The voice data can be acquired through a microphone using existing recording software, or by feeding the output of a tape recorder or the like into a computer. Next, the acquired voice data is analyzed to extract tone change points (step 920). Tone change points can be extracted by setting an appropriate threshold and extracting the points where the change exceeds it. The process then proceeds to note conversion (step 940), in which one note is placed at each extracted change point and sustained until the next change point. Next, a major chord is applied (step 950), and the result is displayed (step 960). The display can show, at the same time, the waveform of the voice data acquired in step 910 and the waveform formed by connecting the notes obtained in step 940. It is also desirable to display the waveform of the learning target.
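Step 920 needs a pitch contour before change points can be found, and the source does not say how pitch is measured. As one common option, the sketch below estimates the fundamental frequency of each frame by autocorrelation. The frame length, hop size, 70 to 400 Hz search range, and voicing threshold are assumptions, and the function names are hypothetical. The output (Hz, or None for unvoiced frames) can be converted to semitones before the threshold-based change-point extraction.

```python
import math


def estimate_f0(frame, sample_rate, fmin=70.0, fmax=400.0, voicing=0.3):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Returns the frequency in Hz, or None when the frame is silent or the
    normalized autocorrelation peak falls below `voicing` (treated as unvoiced).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Fewer overlapping terms at longer lags biases against octave-down errors.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing:
        return None
    return sample_rate / best_lag


def pitch_contour(samples, sample_rate, frame_len=1024, hop=512):
    """Split the signal into overlapping frames and estimate F0 for each."""
    return [
        estimate_f0(samples[i:i + frame_len], sample_rate)
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]
```

Plain autocorrelation is used here only because it is short and dependency-free; a production system would more likely use a dedicated pitch tracker.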
The embodiments and examples above assume that a computer such as a personal computer is used. However, to make the method easy to use from a mobile phone, an embodiment is also possible in which the program is stored on a server on the Internet or on a mobile telephone network and is run over the network.
Although this specification describes a method of diagnosing speech using a computer, performing the same method without a computer also falls within the scope of the present invention. Creating a dictionary or language teaching material using multi-line notation, or a graph with voice pitch and voice loudness as its two axes, is also within the scope of the present invention, as is application to karaoke, speech synthesis, dialect correction, dialect learning, and voice training.
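For the graph with voice pitch and voice loudness on two axes, each analysis frame needs a loudness value alongside its pitch. The sketch below assumes loudness is measured as the root-mean-square level in dB relative to full scale (samples in the range -1 to 1), that is, 20 times the base-10 logarithm of the RMS value, and pairs it with an already-estimated pitch per frame. The choice of RMS dBFS and the function names are assumptions, since the source does not define how loudness is measured.

```python
import math


def rms_dbfs(frame):
    """Root-mean-square level of a frame in dB relative to full scale.

    Samples are assumed to lie in [-1, 1]; a silent frame returns -inf.
    """
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def pitch_loudness_points(frames, pitches):
    """Pair each voiced frame's pitch with its loudness.

    Returns (pitch_hz, loudness_dbfs) points suitable for plotting on a
    graph whose two axes are voice pitch and voice loudness; frames whose
    pitch is None (unvoiced) are dropped.
    """
    return [(p, rms_dbfs(f)) for f, p in zip(frames, pitches) if p is not None]
```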
[0041]
Because the present invention is configured as described above, it can present learning guidelines in an easy-to-understand manner to, for example, people who want to learn how to speak, learn a language, practice karaoke, or undergo voice training.
[Brief description of the drawings]
FIG. 1 is a diagram showing a relationship between “loudness” and “pitch” of a voice at the time of reception.
FIG. 2 is a diagram analyzing a way of speaking using multi-line notation.
FIG. 3 is a diagram illustrating speaking habits and three sentence-ending patterns using multi-line notation.
FIG. 4 is a diagram illustrating various ways of saying “yes” using multi-line notation.
FIG. 5 is a diagram illustrating, using multi-line notation, a five-change speaking method for varying one's way of speaking.
FIG. 6 is a diagram illustrating the four phases of complaint handling and the four regions of the voice, using a graph with voice pitch and voice loudness as its two axes.
FIG. 7 is a diagram explaining how to use the voice in the first phase of complaint handling, using a graph with voice pitch and voice loudness as its two axes together with multi-line notation.
FIG. 8 is a diagram similarly illustrating how to use the voice in the second phase of complaint handling, using the two-axis graph and multi-line notation.
FIG. 9 is a diagram illustrating how to use the voice in the third phase of complaint handling.
FIG. 10 is a diagram illustrating how to use the voice in the fourth phase of complaint handling.
FIG. 11 is a diagram illustrating how to use the voice in sales.
FIG. 12 is a diagram illustrating what tone of voice is well received in the first phase of sales.
FIG. 13 is a diagram illustrating drawing out information about the customer while conveying the benefits in the second phase of sales.
FIG. 14 is a diagram illustrating answering the customer's questions and further promoting the product in the third phase of sales.
FIG. 15 is a diagram illustrating the procedure in the fourth phase of sales and explaining how to secure a commitment.
FIG. 16 is a flowchart showing a speech style diagnosis method of the present invention.
FIG. 17 is a conceptual block diagram illustrating a configuration example of a speaking style diagnosis device of the present invention.
FIG. 18 is a flowchart showing a speech synthesis method of the present invention.
FIG. 19 is a flowchart showing an embodiment of the voice diagnosis method of the present invention.
[Explanation of symbols]
10 CPU
20 memory
30 microphone interface
35 microphone
40 display device
50 switch

Claims

A speech diagnosis method for diagnosing speech using a computer,
A voice data acquisition step of capturing into the computer voice data that records the speech of a person who wants a diagnosis;
A voice data analysis step of analyzing the voice data captured in the voice data acquisition step for voice pitch, tempo, changes in voice volume, and the like; and
An analysis result display step of visually displaying the speech of the person who wants the diagnosis based on the results of the voice data analysis step.

A speech diagnosis method according to claim 1, wherein:
The display in the analysis result display step uses multi-line notation (a visual representation of the change in pitch over time).

A speech diagnosis method according to claim 1, wherein:
The display in the analysis result display step is made by drawing a graph whose two axes are voice pitch and voice loudness.

A speech diagnosis device that displays a speech diagnosis result on the display unit of a portable device such as a mobile phone, a personal digital assistant (PDA), an electronic organizer, or a wristwatch, comprising:
Voice data capturing means for capturing voice data spoken by the user;
Voice data analyzing means for analyzing the voice data captured by the voice data capturing means; and
Analysis result display means for displaying the speech analysis result on the display unit of the portable device based on the analysis by the voice data analyzing means.

The speech diagnosis device according to claim 4,
The speech diagnosis device is built into the portable device and performs its function on its own.

The speech diagnosis device according to claim 4,
The speech diagnosis device is provided on a computer network connected via wireless communication, such as a mobile phone network, and performs its function by communicating with the portable device.

A speech learning support method for assisting learning of speech using a computer,
A voice data acquisition step of capturing voice data of the learner's speech into the computer;
A voice data analysis step of analyzing the voice data captured in the voice data acquisition step for voice pitch, tempo, changes in voice volume, and the like; and
An analysis result display step of visually displaying the learner's speech based on the results of the voice data analysis step.

A speech synthesis method for electronically synthesizing speech,
An intonation adjustment step of adjusting the intonation of each sentence according to multi-line notation; and
A synthetic speech creation step of creating synthetic speech data based on the result of the intonation adjustment step.

A karaoke practice support method for supporting the karaoke practice of a karaoke practitioner using a computer,
A voice data acquisition step of capturing voice data of the karaoke practitioner's singing into the computer;
A voice data analysis step of analyzing the voice data captured in the voice data acquisition step for voice pitch, tempo, changes in voice volume, and the like; and
An analysis result display step of visually displaying the karaoke practitioner's singing based on the results of the voice data analysis step.

A voice training support method for supporting the voice training of a voice trainee using a computer,
A voice data acquisition step of capturing into the computer voice data that records the voice of the voice trainee;
A voice data analysis step of analyzing the voice data captured in the voice data acquisition step for voice pitch, tempo, changes in voice volume, and the like; and
An analysis result display step of visually displaying the voice of the voice trainee based on the results of the voice data analysis step.

A dictionary in which the preferred intonation of each sentence is described using multi-line notation.

A language teaching material in which the preferred intonation of each sentence is described using multi-line notation.

A dialect correction method for correcting a dialect using a computer,
A voice data acquisition step of capturing into the computer voice data that records the speech of the person whose dialect is being corrected;
A voice data analysis step of analyzing the voice data captured in the voice data acquisition step for voice pitch, tempo, changes in voice volume, and the like; and
An analysis result display step of visually displaying the speech of the person whose dialect is being corrected based on the results of the voice data analysis step.

A dialect learning method of learning a dialect using a computer,
A voice data acquisition step of capturing voice data of the learner's speech into the computer;
A voice data analysis step of analyzing the voice data captured in the voice data acquisition step for voice pitch, tempo, changes in voice volume, and the like; and
An analysis result display step of visually displaying the learner's speech based on the results of the voice data analysis step.

A speech diagnosis method for diagnosing speech using a computer,
A voice data acquisition step of capturing into the computer voice data that records the speech of a person who wants a diagnosis;
A tone change point acquisition step of analyzing the voice data captured in the voice data acquisition step to acquire tone change points; and
A note conversion step of converting the tone change points acquired in the tone change point acquisition step into notes.

A speech diagnosis method according to claim 15, wherein:
The speech diagnosis method further comprises a chord adjustment step of applying a major chord to the pitches converted into notes in the note conversion step to adjust them.

A speech diagnosis method according to claim 15 or 16,
The speech diagnosis method further comprises a diagnosis result display step of displaying the note conversion result together with the learning target.