JP5521554B2

JP5521554B2 - Text conversion device, method, and program

Info

Publication number: JP5521554B2
Application number: JP2009554331A
Authority: JP
Inventors: 玲史近藤; 康行三井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-02-19
Filing date: 2009-02-17
Publication date: 2014-06-18
Anticipated expiration: 2029-02-17
Also published as: JPWO2009104613A1; WO2009104613A1

Description

(Description of related applications)
This application claims priority from the earlier Japanese Patent Application No. 2008-037603 (filed on February 19, 2008), and the entire contents of that earlier application are deemed to be incorporated herein by reference.
The present invention relates to a text conversion device, a text conversion method, a text conversion program, a speech synthesizer, and a robot, and in particular to a text conversion device, a text conversion method, a text conversion program, a speech synthesizer, and a robot that convert text so that its content is conveyed more easily.

Patent Document 1 discloses a sentence conversion technique that converts a character string written in a natural language into a character string in another expression of the same natural language, using conversion rules prepared for each conversion purpose.

Patent Document 2 discloses a technique that includes driving load determination means for determining the driving load of a driver who is driving a vehicle, controls the voice output means for the driver according to that driving load, and changes the pause periods and the speech rate of the read-out.

Patent Document 3 discloses a technique for controlling the sound quality of speech synthesis according to the user's attribute information in a voice response service device.

Patent Document 4 discloses a technique for estimating a speaker's emotion based on feature quantities of the input speech such as sound pressure, pitch frequency, and duration.

Japanese Patent No. 3932350
Japanese Unexamined Patent Application Publication No. 2005-070703
Japanese Patent No. 3936351
Japanese Unexamined Patent Application Publication No. 2003-228391

The entire disclosures of Patent Documents 1 to 4 above are incorporated herein by reference. The following is an analysis of the related art from the standpoint of the present invention.
A listener does not always devote his or her full ability to listening to the content and, depending on the psychological situation of the moment, may miss or misunderstand what is said. An object of the present invention is to provide a text conversion device, a text conversion method, a text conversion program, a speech synthesizer, and a robot that take the listener's psychological situation into account and convert text so that its meaning is conveyed more easily.

In this regard, Patent Document 1 gives, as specific applications, (A) a question answering system, (B) an in-sentence compression system, (C) a text polishing system, and (D) a difficult-sentence conversion system, together with examples of the conversions to different expressions that would be performed in each case. However, there is no guarantee that the converted text is easy for a listener to take in. Patent Document 1 also suggests application to conversion and inverse conversion between written and spoken language (paragraph 0064), but it discloses neither the conversion purpose of generating text that is easily conveyed to a listener nor specific conversion rules for that purpose.

The "driving load of the driver" in Patent Document 2 is, specifically, the vehicle speed. The document merely discloses that, while the vehicle is being driven, voice output is performed at a reading speed faster than normal (the reading speed used for the other passengers).

The technique described in Patent Document 3 does not perform text conversion of the kind stated in the above object in the first place, and it also cannot respond to the listener's psychological situation, which changes from moment to moment.

Patent Document 4 describes a technique for estimating a speaker's emotion from input speech, but what it specifically discloses is only an application to a care robot that conducts a medical interview and measures changes in emotion from the resulting feature quantities.

According to a first aspect of the present invention, there is provided a text conversion device that takes as input a parameter representing a user's psychological situation and a text; creates a plurality of text conversion candidates, within a range that does not change the meaning of the input text, by replacing words in the text using word pairs whose substitution does not change the meaning; obtains, for each text conversion candidate, a score indicating how easy the output text is to understand when heard as speech, using the score that is set for each word pair for the case where text is converted with that word pair; and converts the input text by selecting the text conversion candidate whose score is commensurate with the input parameter.

According to a second aspect of the present invention, there is provided a text conversion method performed by a text conversion device, the method including: a step of inputting a parameter representing a user's psychological situation and a text; and a step of converting the input text, based on the input parameter and within a range that does not change its meaning, by generating a plurality of text conversion candidates from the text through replacing words in the text using word pairs whose substitution does not change the meaning, obtaining for each text conversion candidate a score indicating how easy the output text is to understand when heard as speech, using the score that is set for each word pair for the case where text is converted with that word pair, and selecting the text conversion candidate whose score is commensurate with the input parameter.

According to a third aspect of the present invention, there is provided a text conversion program for realizing the functions of the text conversion device. This text conversion program can be recorded on a computer-readable storage medium.

According to the present invention, an arbitrary text can be converted into a text whose meaning is easily conveyed to the listener. This is because the invention adopts a configuration in which text conversion takes as input a parameter representing the listener's psychological situation. As a derived effect, for example, the burden of considering the listener's situation when writing text to be read aloud is reduced, which makes such text easier to prepare.

FIG. 1 is a diagram for explaining the outline of the present invention.
FIG. 2 is a block diagram showing the configuration of the text conversion device according to the first embodiment of the present invention.
FIG. 3 is a diagram for explaining the relationship between the pitch frequency of speech and the user's urgency.
FIG. 4 is a block diagram showing the detailed configuration of the text conversion unit of the text conversion device according to the first embodiment of the present invention.
FIG. 5 is a diagram for explaining the structure of the word conversion database of the text conversion device according to the first embodiment of the present invention.
FIG. 6 is a diagram for explaining the operation of the text conversion device according to the first embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the text conversion device according to the second embodiment of the present invention.
FIG. 8 is a block diagram showing the detailed configuration of the text conversion unit of the text conversion device according to the second embodiment of the present invention.
FIG. 9 is a diagram for explaining the operation of the text conversion device according to the second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the text conversion device according to the third embodiment of the present invention.
FIG. 11 is a block diagram showing the detailed configuration of the text conversion unit of the text conversion device according to the third embodiment of the present invention.

Description of symbols
11 Microphone (voice input unit)
12 Speech recognition unit
13 Response text generation unit
14 Text-to-speech synthesis unit
15 Speaker
20 Pitch frequency analysis unit
21 Psychological situation estimation unit
22 Text conversion unit
23 Speech rate measurement unit
31 Word conversion unit
32 Text division unit
33 Candidate selection unit
34 Word conversion database (word conversion DB)
36 Text summarization unit
37 Text emphasis unit

Next, the present invention is described by way of first to third embodiments as preferred embodiments. As abstracted in FIG. 1, each of these embodiments includes text conversion means (the text conversion unit in FIG. 1) that transforms the text itself, according to a parameter representing the listener's psychological situation (a psychological situation parameter), so that the target text becomes "text that is easy to understand when uttered as speech".

[First Embodiment]
First, a first embodiment of the present invention is described, in which text conversion uses the user's urgency (how much of a hurry the user is in) as the parameter representing the user's psychological situation. FIG. 2 is a block diagram showing the configuration of the text conversion device according to the first embodiment of the present invention.

Referring to FIG. 2, the text conversion device according to the first embodiment of the present invention includes a voice input unit 11 such as a microphone, a pitch frequency analysis unit 20, a psychological situation estimation unit 21, and a text conversion unit 22. These processing means of the text conversion device can be realized by a program that causes a computer constituting the text conversion device to execute the processes described below.

The pitch frequency analysis unit 20 analyzes the speech uttered by the user and input through the voice input unit 11, and obtains its pitch frequency.

The psychological situation estimation unit 21 obtains a parameter x1 representing the user's urgency from the average value of the pitch frequency of the user's speech, and outputs it to the text conversion unit 22.

FIG. 3 shows a map for obtaining the user's urgency x1 from the average value of the pitch frequency. In the example of FIG. 3, the curve is monotonically increasing, so that the higher the average pitch frequency, the higher the user's urgency x1. More precisely, it is an S-shaped curve: once the average pitch frequency exceeds a first threshold th1, the urgency x1 rises steeply, and once it exceeds a second threshold th2, the urgency x1 again rises only gradually. The rationale is that a high pitch frequency indicates that the user is speaking in a high-strung voice, so the user's urgency can be presumed to be high.
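The curve in FIG. 3 is described only qualitatively. The following is a minimal sketch of one shape such a map could take, assuming a logistic function placed between the two thresholds. The threshold values, the 0 to 1000 output range, and the name `x1_from_mean_pitch` are illustrative assumptions, not values from the specification.

```python
import math

# Hypothetical values: the specification names th1 and th2 but gives no numbers.
TH1_HZ = 180.0   # assumed first threshold
TH2_HZ = 260.0   # assumed second threshold
X1_MAX = 1000.0  # assumed upper end of the urgency scale


def x1_from_mean_pitch(mean_pitch_hz: float) -> float:
    """Map the mean pitch frequency to an urgency value x1.

    A logistic curve stands in for the S-shaped map: it rises slowly
    below th1, steeply between th1 and th2, and slowly again above th2,
    and it is monotonically increasing everywhere.
    """
    midpoint = (TH1_HZ + TH2_HZ) / 2.0
    # Chosen so that the bends of the curve fall near th1 and th2.
    steepness = 8.0 / (TH2_HZ - TH1_HZ)
    return X1_MAX / (1.0 + math.exp(-steepness * (mean_pitch_hz - midpoint)))
```

With these assumed numbers, a mean pitch halfway between the thresholds maps to the middle of the scale; any other monotonically increasing S-shaped function would serve equally well.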

The text conversion unit 22 converts the input text on the basis of scores, according to the user's urgency x1 obtained as described above, and generates the output text. In this embodiment, the score, or total score, is an index of how easy the text is to understand as speech.

FIG. 4 shows the detailed configuration of the text conversion unit 22. The text conversion unit 22 includes a word conversion unit 31, a text division unit 32, a candidate selection unit 33, and a word conversion database (word conversion DB) 34.

The word conversion unit 31 uses the word pairs registered in the word conversion DB 34 to output a word conversion candidate group, that is, the text candidates for every combination of replaceable words contained in the input text of length L, together with the conversion score of each candidate.

The word conversion DB 34 stores one or more word pairs whose members have substantially the same meaning (replacing one with the other does not change the meaning of the sentence), together with the conversion score S1(i) that applies when the wording is converted using word pair i. FIG. 5 shows an example of the word pairs and conversion scores registered in the word conversion DB 34. The conversion score can be set low for an input word that has homonyms and high for an output word that has no homonyms. This promotes the replacement of input words that have homonyms (in FIG. 5, "パーソナルコンピュータ" (personal computer) and "パトカー" (patrol car) each share the homonymous form "ＰＣ") with output words that have none (in FIG. 5, "パソコン" and "ポケコン" have no homonyms).
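The behavior of the word conversion unit 31 and the word conversion DB 34 can be sketched as follows. This is only an illustration: FIG. 5 is not reproduced here, so the dictionary layout, the name `word_conversion_candidates`, and the score attached to "ＰＣ" are assumptions; the score 50 for "パソコン" is the one used in the worked example of the text.

```python
from itertools import product

# Hypothetical DB layout: source word -> list of (replacement, score S1(i)).
# The score 50 appears in the worked example; the score for "ＰＣ" is illustrative.
WORD_DB = {
    "パーソナルコンピュータ": [("パソコン", 50), ("ＰＣ", 20)],
}


def word_conversion_candidates(text, db=WORD_DB):
    """Return every (converted_text, S1) obtainable from the DB.

    Each source word found in the text is either left unchanged or
    replaced by one of its registered alternatives. S1 is the sum of
    the conversion scores of the word pairs actually used.
    """
    present = [word for word in db if word in text]
    # For every word present: None means "leave it unchanged".
    options = [[None] + db[word] for word in present]
    candidates = []
    for choice in product(*options):
        converted, s1 = text, 0
        for word, picked in zip(present, choice):
            if picked is not None:
                replacement, score = picked
                converted = converted.replace(word, replacement)
                s1 += score
        candidates.append((converted, s1))
    return candidates
```

As a simplification, this sketch replaces all occurrences of a word together and does not guard against a replacement that itself contains another registered word.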

The text division unit 32 outputs a text conversion candidate group consisting of every combination obtained by dividing each text of the input word conversion candidate group, at commas or at predetermined pause symbols, into N division units of lengths L(1), L(2), ..., L(N).

For example, when the input text is divided at commas and commas can be inserted at M positions, 2 to the power of M different divisions are possible.
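The 2 to the power of M count follows from treating each comma as an independent yes/no choice. A minimal sketch of the enumeration is shown below; the name `enumerate_divisions` is hypothetical, and dropping the comma at each chosen boundary is an assumption made here (it happens to reproduce the total scores 159 and 855 quoted later in the text).

```python
from itertools import product

COMMA = "、"  # Japanese comma used as the division point


def enumerate_divisions(text, sep=COMMA):
    """Return all 2**M ways to divide text at its M separators.

    Each separator is independently either used as a boundary (and
    dropped from the division units) or kept inside a unit.
    """
    pieces = text.split(sep)
    m = len(pieces) - 1  # number of separators in the text
    divisions = []
    for use_boundary in product([False, True], repeat=m):
        units = [pieces[0]]
        for piece, cut in zip(pieces[1:], use_boundary):
            if cut:
                units.append(piece)        # start a new division unit
            else:
                units[-1] += sep + piece   # keep the separator in place
        divisions.append(units)
    return divisions
```

For a text with two commas this yields four divisions, from the undivided text to three separate units.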

In the following description, the text lengths L and L(1) to L(N) are taken to be character counts, which are easy to obtain and correlate with the duration of the utterance. If the number of morae can be obtained from the input text, the mora count, which correlates more strongly with the utterance duration, may be used instead.

Among all the text conversion candidates output by the text division unit 32, the candidate selection unit 33 selects the candidate for which y1 = x1 - (α1 * S1 + α2 * S2) is positive and smallest, and outputs the corresponding converted text. Here, α1 and α2 are predetermined constants.

S1, which is calculated for each word conversion candidate, is the sum of the conversion scores S1(i) of the word pairs used in the conversion.

S2, which is calculated for each text conversion candidate, is given by S2 = L^2 - Σ(L(i) * L(i)). The S2 obtained in this way is smaller when the text is divided fewer times; conversely, it becomes larger as the text is divided more often and as the phrase lengths after conversion and division become more uniform.

FIG. 6 shows the scores (total scores) calculated with S1 and S2 above for the input text "私は、パーソナルコンピュータを、買った。" ("I bought a personal computer."). In the example of FIG. 6, the constants are set to α1 = 10 and α2 = 1.

The method of calculating the score is described with reference to FIG. 6. For example, for candidate number 1, with word conversion "none" and text division "none", S1 = 0 (no conversion) and S2 = 20^2 - (20^2) = 0, so the total score is α1 × 0 + α2 × 0 = 0.

Similarly, for candidate number 2, with word conversion "a" ("パーソナルコンピュータ" in FIG. 5 converted to "パソコン") and text division "none", S1 = 50 and S2 = 20^2 - (13^2) = 231, so the total score is α1 × 50 + α2 × 231 = 731.
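The arithmetic for candidate numbers 1 and 2 can be checked with a short sketch. The function names `s2_score` and `total_score` are hypothetical; the constants and the lengths 20 and 13 are those of the FIG. 6 example.

```python
ALPHA1 = 10  # weight of S1 in the FIG. 6 example
ALPHA2 = 1   # weight of S2 in the FIG. 6 example


def s2_score(input_length, unit_lengths):
    """S2 = L^2 - sum of L(i)^2 over the division units."""
    return input_length ** 2 - sum(n * n for n in unit_lengths)


def total_score(s1, s2, alpha1=ALPHA1, alpha2=ALPHA2):
    """Total score alpha1 * S1 + alpha2 * S2."""
    return alpha1 * s1 + alpha2 * s2


# Candidate 1: no conversion, no division (20 characters in one unit).
candidate_1 = total_score(0, s2_score(20, [20]))
# Candidate 2: conversion "a" shortens the text to 13 characters, no division.
candidate_2 = total_score(50, s2_score(20, [13]))
```

Here `candidate_1` evaluates to 0 and `candidate_2` to 731, matching the text. If the dividing commas are not counted in the unit lengths (an assumption), the same functions also give 159 for units of 15 and 4 characters without conversion, and 855 for units of 2, 5, and 4 characters with conversion "a".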

Similarly, for candidate number 3, with word conversion "b" ("パーソナルコンピュータ" in FIG. 5 converted to "ＰＣ") and text division "none", the converted text is 11 characters long, so S1 = 20 and S2 = 20^2 - (11^2) = 279, and the total score is α1 × 20 + α2 × 279 = 479.

Applying these total scores to the criterion of the candidate selection unit 33 described above, namely selecting the candidate for which y1 = x1 - (α1 * S1 + α2 * S2) is positive and smallest, gives the following for candidate numbers 1 to 3. When the user's urgency x1 is very high, the text conversion candidate of candidate number 2 is selected. When x1 is not very high but is at or above a certain value, candidate number 3 is selected. When x1 is low, candidate number 1 is selected. In other words, when the user's urgency x1 is high, candidate number 2, "私は、パソコンを、買った。", a short paraphrase with few homonyms, is selected.

The total score increases each time the text is divided. For example, the text conversion candidates of candidate numbers 4 to 12 in FIG. 6 each have a higher total score than the candidate among numbers 1 to 3 that underwent the same word conversion. With all text conversion candidates available, when the user's urgency x1 is 200, candidate number 7 (total score = 159, y1 = 41), in which the input text is divided once, is selected. Similarly, when x1 rises to 500, candidate number 3 (total score = 479, y1 = 21), in which word conversion shortens the sentence to be read aloud, is selected. When x1 rises further to 900, candidate number 11 (total score = 855, y1 = 45), which is shorter still and finely divided, is selected.
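The selection rule of the candidate selection unit 33 can be sketched as below. The function name and the dictionary are hypothetical, and only the total scores actually quoted in the text are included (FIG. 6 lists 12 candidates). Returning `None` when no candidate has a positive y1 is an assumption, since the text does not describe that case for this embodiment.

```python
def select_candidate(x1, scored_candidates):
    """Pick the candidate whose y1 = x1 - total score is positive and smallest.

    scored_candidates maps a candidate number to its total score.
    Returns None when no candidate yields a positive y1.
    """
    best_id, best_y = None, None
    for cand_id, score in scored_candidates.items():
        y1 = x1 - score
        if y1 > 0 and (best_y is None or y1 < best_y):
            best_id, best_y = cand_id, y1
    return best_id


# Total scores quoted in the text for candidates 1, 2, 3, 7, and 11.
QUOTED_SCORES = {1: 0, 2: 731, 3: 479, 7: 159, 11: 855}
```

With these scores, `select_candidate(200, QUOTED_SCORES)` returns 7, an urgency of 500 returns 3, and an urgency of 900 returns 11, in line with the examples above.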

As described above, for a user with high urgency, an output text is generated that contains as few homonyms as possible (and is therefore easy to understand) and that is finely divided (and is therefore easy to hear).

In this embodiment, the description assumes that all selectable candidates are listed first and the selection is made afterwards. Alternatively, the user's urgency x1 may be input to the word conversion unit 31 and the text division unit 32, so that unnecessary candidates are discarded, or the optimal candidate is selected, at each stage. For example, when the user's urgency x1 is high, outputting only those text conversion candidates that underwent a conversion with a high score S1 reduces the load and the processing time of the text division unit 32 and the candidate selection unit 33.

[Second Embodiment]
Next, a second embodiment of the present invention is described, in which text conversion uses the user's degree of pressure (how pressed the user is) as the parameter representing the user's psychological situation. FIG. 7 is a block diagram showing the configuration of the text conversion device according to the second embodiment of the present invention.

Referring to FIG. 7, the text conversion device according to the second embodiment of the present invention includes a voice input unit 11 such as a microphone, a speech recognition unit 12, a response text generation unit 13, a speech rate measurement unit 23, a psychological situation estimation unit 21, a text conversion unit 22, a text-to-speech synthesis unit 14, and a speaker 15. The processing means of the speech recognition unit 12, the response text generation unit 13, the speech rate measurement unit 23, the psychological situation estimation unit 21, the text conversion unit 22, and the text-to-speech synthesis unit 14 can be realized by a program that causes a computer constituting the text conversion device to execute the processes described below.

The speech recognition unit 12 recognizes the speech input from the microphone 11 and outputs the result to the response text generation unit 13.

The response text generation unit 13 generates wording that responds to the content of the user's utterance as recognized by the speech recognition unit 12, and outputs it to the text conversion unit 22 as the input text.

The speech rate measurement unit 23 measures the speech rate of the speech uttered by the user. The speech input from the microphone 11 is also fed to the speech rate measurement unit 23, which measures the speech rate of the user's speech.

The psychological situation estimation unit 21 outputs a value x2 representing the degree of pressure, based on the speech rate measured by the speech rate measurement unit 23.

Here, the value x2 representing the degree of pressure can be obtained from a predetermined relationship such that x2 increases monotonically with the speech rate (in morae per second).
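The text only requires that the relationship be monotonically increasing. One simple way to hold such a predetermined relationship is a small lookup table with linear interpolation, sketched below. The breakpoints and the name `x2_from_speech_rate` are illustrative assumptions.

```python
from bisect import bisect_right

# Hypothetical breakpoints: (speech rate in morae per second, x2 value).
RATE_TO_X2 = [(4.0, 0.0), (7.0, 7.0), (9.0, 12.0), (12.0, 25.0)]


def x2_from_speech_rate(morae_per_second, table=RATE_TO_X2):
    """Piecewise-linear, non-decreasing map from speech rate to x2.

    Rates outside the table are clamped to the first or last x2 value.
    """
    rates = [rate for rate, _ in table]
    if morae_per_second <= rates[0]:
        return table[0][1]
    if morae_per_second >= rates[-1]:
        return table[-1][1]
    hi = bisect_right(rates, morae_per_second)
    (r0, v0), (r1, v1) = table[hi - 1], table[hi]
    return v0 + (v1 - v0) * (morae_per_second - r0) / (r1 - r0)
```

Because the x2 values in the table increase with the rate, the interpolated map never decreases; it is flat only outside the tabulated range.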

The text conversion unit 22 converts the input text on the basis of scores, according to the user's degree of pressure x2 obtained as described above, and generates the output text. In this embodiment, the score, or total score, is an index of how easy the text is to understand as speech.

FIG. 8 shows the detailed configuration of the text conversion unit 22. The text conversion unit 22 includes a word conversion unit 31, a text summarization unit 36, a candidate selection unit 33, and a word conversion database (word conversion DB) 34.

The word conversion unit 31 uses the word pairs registered in the word conversion DB 34 to output a word conversion candidate group, that is, the text candidates for every combination of replaceable words contained in the input text of length L.

As in the first embodiment, the word conversion DB 34 stores one or more word pairs whose members have substantially the same meaning.

In this embodiment, the score S1 calculated for each word conversion candidate is obtained as S1 = L11 - L12, where L11 is the number of characters in the text before conversion by the word conversion unit 31 and L12 is the number of characters in the text after conversion.

The text summarization unit 36 summarizes each text of length L21 in the input word conversion candidate group and outputs a summary text of length L22. When the text summarization unit 36 can generate a plurality of summary candidates, all of them are included in the text conversion candidate group.

The score S2 calculated for each text conversion candidate is obtained as S2 = L21 - L22.

Among all combinations of the text conversion candidates output by the word conversion unit 31 and the text summarization unit 36, the candidate selection unit 33 selects and outputs the candidate for which y2 = x2 - (α1 * S1 + α2 * S2) is positive and smallest. If y2 is negative for every text conversion candidate, the candidate selection unit 33 instead selects and outputs the candidate for which S1 + S2 is largest.

FIG. 9 shows the scores (total scores) calculated with S1 and S2 above for the input text "私が持っているパーソナルコンピュータが壊れてしまった。" ("The personal computer I own has broken down."). In the example of FIG. 9, the constants in the formula for y2 are set to α1 = 1 and α2 = 1.

The method of calculating the score is described with reference to FIG. 9. For example, for candidate number 1, with word conversion "none" and text summarization "none", S1 = 0 (no shortening by conversion) and S2 = 0 (no shortening by summarization), so the total score is 0.

Similarly, for candidate number 2, with word conversion "a" ("パーソナルコンピュータ" in FIG. 5 converted to "パソコン") and text summarization "none", S1 = 7 and S2 = 0, so the total score is 7.

Similarly, for candidate number 3, with word conversion "b" ("パーソナルコンピュータ" in FIG. 5 converted to "ＰＣ") and text summarization "none", S1 = 9 and S2 = 0, so the total score is 9.
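Because both scores in this embodiment are simply the number of characters saved, the S1 values 7 and 9 can be checked directly on the example sentence. The helper name `shortening_score` is hypothetical; the same helper would serve for S2 = L21 - L22 given the output of a summarizer, which is not sketched here.

```python
def shortening_score(before, after):
    """Number of characters saved: len(before) - len(after)."""
    return len(before) - len(after)


SOURCE = "私が持っているパーソナルコンピュータが壊れてしまった。"
# Word conversions "a" and "b" from the worked example.
CONVERTED_A = SOURCE.replace("パーソナルコンピュータ", "パソコン")
CONVERTED_B = SOURCE.replace("パーソナルコンピュータ", "ＰＣ")

s1_a = shortening_score(SOURCE, CONVERTED_A)  # 11 characters become 4
s1_b = shortening_score(SOURCE, CONVERTED_B)  # 11 characters become 2
```

Here `s1_a` is 7 and `s1_b` is 9, matching candidate numbers 2 and 3.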

Applying these total scores to the criterion of the candidate selection unit 33 described above, namely selecting the candidate for which y2 = x2 - (α1 * S1 + α2 * S2) is positive and smallest, gives the following for candidate numbers 1 to 3. When the user's degree of pressure x2 is very high (9 or more), candidate number 3, the shortest of candidate numbers 1 to 3, is selected. When x2 is not very high but is at or above a certain value (7 or more and less than 9), candidate number 2 is selected. When x2 is low (less than 7), candidate number 1 is selected. In other words, when the user's degree of pressure x2 is high, candidate number 3, "私が持っているＰＣが壊れてしまった。", which is paraphrased more briefly by word conversion, is selected.

The total score rises further as the effect of text summarization increases. For example, each of the text conversion candidates numbered 4 to 9 in FIG. 6 has a higher total score than the candidate among numbers 1 to 3 that uses the same word conversion. When all text conversion candidates are available and the user's time pressure x2 is at level 20, candidate number 9 (total score = 18, y2 = 2), in which the input text is shortened by both word conversion and text summarization, is selected.

As described above, in this embodiment, when the urgency is judged to be high, a shorter sentence that is more likely to be conveyed in a short time can be generated.

Although this embodiment has also been described as listing all selectable candidates in advance and then selecting among them, the user's time pressure x2 may instead be input to the word conversion unit 31 and the text summarization unit 36 so that unnecessary candidates are deleted, or the optimal candidate is selected, at each stage. For example, when x2 is high, outputting only the text conversion candidates produced by conversions with a high score S1 reduces the load and processing time of the text summarization unit 36 and the candidate selection unit 33.

The numerical value x2 representing the user's time pressure is not limited to the above and can also be obtained by the methods described below.

For example, x2 can be obtained by taking as input the moving speed (vehicle speed or drive-wheel rotation rate) of a vehicle, such as an automobile, that the user is riding in (driving), as a value that increases monotonically with that speed.

As another example, x2 can be obtained by taking the operation of the automobile's brake as input: x2 is increased when the user driving the automobile depresses the brake, and is increased further when the acceleration of the brake pedal's movement is large.
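
As a rough illustration of these two alternatives, the following Python sketch maps vehicle speed and brake operation to x2. The linear forms and the constants `gain`, `base`, and `accel_gain` are assumptions chosen only to satisfy the stated monotonic relationships; the source does not specify any particular function.

```python
def x2_from_speed(speed_kmh, gain=0.1):
    """x2 increases monotonically with vehicle speed (hypothetical linear map)."""
    return gain * max(speed_kmh, 0.0)


def x2_from_brake(brake_pressed, pedal_acceleration, base=5.0, accel_gain=2.0):
    """x2 rises when the brake is pressed and rises further with larger
    acceleration of the brake pedal movement (hypothetical constants)."""
    if not brake_pressed:
        return 0.0
    return base + accel_gain * max(pedal_acceleration, 0.0)


if __name__ == "__main__":
    print(x2_from_speed(40.0), x2_from_speed(100.0))
    print(x2_from_brake(False, 0.0), x2_from_brake(True, 0.5), x2_from_brake(True, 3.0))
```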

[Third Embodiment]
Next, a third embodiment of the present invention, in which text conversion is performed using the user's degree of concentration as the parameter representing the user's psychological state, will be described. FIG. 10 is a block diagram showing the configuration of the text conversion device according to the third embodiment of the present invention.

Referring to FIG. 10, the text conversion device according to the third embodiment of the present invention includes a voice input unit 11 such as a microphone, a voice recognition unit 12, a response message generation unit 13, a speech rate measurement unit 23, a psychological situation estimation unit 21, a text conversion unit 22, a text-to-speech synthesis unit 14, and a speaker 15. As in the second embodiment, the processing means of the text conversion device of this embodiment can be realized by a program that causes a computer constituting the text conversion device to execute the processes described below.

In this embodiment, the psychological situation estimation unit 21 outputs a value x3 indicating the user's degree of concentration from the user's speech rate, and the text conversion unit 22 performs text conversion using x3. The other elements are the same as in the second embodiment described above, so the description focuses on the differences.

The psychological situation estimation unit 21 obtains the user's degree of concentration x3 from the temporal variation component of the speech rate. Specifically, it calculates the variation component Vdiff = Vmax − Vmin from the maximum value Vmax and the minimum value Vmin of the speech rate.

When the temporal variation component Vdiff of the speech rate is large, the psychological situation estimation unit 21 judges that the user is likely to be distracted by something other than the dialogue, and outputs the numerical value x3 representing the degree of concentration accordingly.

Accordingly, the numerical value x3 representing the degree of concentration can be obtained from a relationship given in advance such that x3 decreases monotonically with the temporal variation component Vdiff of the speech rate.
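
The computation of Vdiff and a monotonically decreasing mapping to x3 might be sketched in Python as follows. The specific form 1 / (1 + scale * Vdiff) and the parameter `scale` are assumptions for illustration; the source only requires that the predefined relationship be monotonically decreasing.

```python
def speech_rate_variation(rates):
    """Vdiff = Vmax - Vmin over a sequence of measured speech rates."""
    if not rates:
        raise ValueError("at least one speech-rate measurement is required")
    return max(rates) - min(rates)


def concentration_from_variation(v_diff, scale=1.0):
    """Hypothetical monotonically decreasing map from Vdiff to x3.
    Returns 1.0 when the speech rate is perfectly steady."""
    return 1.0 / (1.0 + scale * v_diff)


if __name__ == "__main__":
    steady = [6.0, 6.2, 6.1]     # e.g. morae per second (unit is an assumption)
    erratic = [4.0, 8.5, 6.0]
    for rates in (steady, erratic):
        v = speech_rate_variation(rates)
        print(v, concentration_from_variation(v))
```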

The text conversion unit 22 is means for converting the input text on the basis of scores, in accordance with the user's degree of concentration x3 obtained as described above, and generating the output text.

FIG. 11 shows the detailed configuration of the text conversion unit 22. The text conversion unit 22 includes a word conversion unit 31, a text emphasis unit 37, a candidate selection unit 33, and a word conversion database (word conversion DB) 34.

As in the first embodiment, the word conversion DB 34 records one or more word pairs whose members have substantially the same meaning, together with the conversion score S1(i) obtained when the wording is converted using each word pair i.

The text emphasis unit 37 extracts an important word from each text of length L21 in the input word conversion candidate group and generates an output text of length L22 in which the important word is repeated an arbitrary number of times (phrase repetition processing). For example, for the input text 「次はＢボタンを押してください」 ("Next, please press the B button"), the text emphasis unit 37 extracts 「Ｂボタン」 ("B button") as the important word and repeats it twice, separated by a comma, to generate the text 「次はＢボタン、Ｂボタンを押してください」 ("Next, the B button, please press the B button"). When the input text contains multiple important words, the text emphasis unit 37 outputs, as the text conversion candidate group, all combinations of the patterns in which each important word is repeated.

Candidate important words may be defined in advance in the text emphasis unit 37, or words such as grammatical objects may be extracted as important words according to fixed rules.

The score S2 calculated for each text conversion candidate is given by S2 = L22 − L21.
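
The phrase repetition processing and the score S2 = L22 − L21 can be sketched in Python as below. Several points are assumptions rather than statements of the source: lengths are counted in characters with punctuation ignored (which happens to give S2 = 4 for a two-fold repetition of a four-character word), the important words are supplied by the caller, and no important word is a substring of another. The function names are hypothetical.

```python
from itertools import product

PUNCTUATION = "、。"


def text_length(text):
    """Length in characters, ignoring punctuation (an assumption)."""
    return sum(1 for ch in text if ch not in PUNCTUATION)


def emphasis_candidates(text, keywords, repeats=2, separator="、"):
    """Return (candidate_text, S2) for every combination of repeating or
    not repeating each important word that occurs in the text."""
    present = [kw for kw in keywords if kw in text]
    candidates = []
    for flags in product([False, True], repeat=len(present)):
        out = text
        for kw, repeat_it in zip(present, flags):
            if repeat_it:
                # Repeat the first occurrence of the important word.
                out = out.replace(kw, separator.join([kw] * repeats), 1)
        candidates.append((out, text_length(out) - text_length(text)))
    return candidates


if __name__ == "__main__":
    for cand, s2 in emphasis_candidates("次はＢボタンを押してください", ["Ｂボタン"]):
        print(s2, cand)
```

The unmodified text is included as the "no emphasis" candidate with S2 = 0, so that the candidate selection stage can still choose it.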

Among all combinations of the text conversion candidates output by the word conversion unit 31 and the text emphasis unit 37, the candidate selection unit 33 selects and outputs the candidate for which y3 = (1 / x3) − (β1 * S1 + β2 * S2) is positive and minimal. If y3 is negative for every text conversion candidate, the candidate selection unit 33 selects and outputs the candidate for which S1 + S2 is largest.

The method of calculating the score in this embodiment is as follows. For example, with word conversion "none" and text emphasis "none", S1 = 0 (no conversion) and S2 = 0 (no emphasis), so the total score is 0.

On the other hand, when a word conversion such as converting 「パーソナルコンピュータ」 ("personal computer") in FIG. 5 to 「パソコン」 is performed with text emphasis "none", S1 = 50 and S2 = 0; with β1 and β2 each set to 1, the total score is 50.

Likewise, when a word conversion such as converting 「パーソナルコンピュータ」 ("personal computer") in FIG. 5 to 「パソコン」 is performed and the text emphasis repeats 「ボタンＢ」 ("button B") twice, S1 = 50 and S2 = 4; with β1 and β2 each set to 1, the total score is 54.

These total scores are applied to the criterion of the candidate selection unit 33 described above, which selects the candidate for which y3 = (1 / x3) − (β1 * S1 + β2 * S2) is positive and minimal. When the user's degree of concentration x3 is low, the text conversion candidate to which both the word conversion and the text emphasis have been applied is selected. Conversely, when x3 is judged to be high, the text conversion candidate to which neither the word conversion nor the text emphasis has been applied is selected.
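
A minimal Python sketch of this selection, including the fallback to the largest S1 + S2 when every y3 is negative, is given below. It assumes β1 = β2 = 1 by default, treats y3 = 0 as acceptable, and uses the three example scores (0, 50, and 54) from the text; the function name, the tuple layout, and the labels are hypothetical.

```python
def select_by_concentration(candidates, x3, beta1=1.0, beta2=1.0):
    """candidates: list of (label, S1, S2) tuples.
    Pick the candidate whose y3 = 1/x3 - (beta1*S1 + beta2*S2) is the
    smallest non-negative value; if all y3 are negative, pick the
    candidate with the largest S1 + S2."""
    if x3 <= 0:
        raise ValueError("x3 must be positive")
    budget = 1.0 / x3
    feasible = []
    for cand in candidates:
        _, s1, s2 = cand
        y3 = budget - (beta1 * s1 + beta2 * s2)
        if y3 >= 0:  # assumption: zero counts as acceptable
            feasible.append((y3, cand))
    if feasible:
        return min(feasible, key=lambda pair: pair[0])[1]
    return max(candidates, key=lambda cand: cand[1] + cand[2])


EXAMPLE = [
    ("none", 0, 0),
    ("word conversion only", 50, 0),
    ("word conversion + emphasis", 50, 4),
]

if __name__ == "__main__":
    for x3 in (0.01, 1.0):
        print(x3, select_by_concentration(EXAMPLE, x3)[0])
```

A low x3 makes 1 / x3 large, so the heavily converted and emphasized candidate fits under it; a high x3 leaves only the unconverted candidate with a non-negative y3.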

As described above, in this embodiment, when the user's degree of concentration is judged to be low, a sentence that is more redundant but easier to understand can be generated.

Although this embodiment has also been described as listing all selectable candidates in advance and then selecting among them, the user's degree of concentration x3 may instead be input to the word conversion unit 31 and the text emphasis unit 37 so that unnecessary candidates are deleted, or the optimal candidate is selected, at each stage. For example, when x3 is low, outputting only the text conversion candidates produced by conversions with a high score S1 reduces the load and processing time of the text emphasis unit 37 and the candidate selection unit 33.

The numerical value x3 representing the user's degree of concentration is not limited to the above and can also be obtained by the methods described below.

For example, x3 can be obtained by measuring the electrical resistance of the user's skin, estimating the user's amount of perspiration as a value that decreases monotonically with that resistance, and using the relationship that a large amount of perspiration indicates a high degree of concentration.

As another example, x3 can be obtained by measuring the user's respiration and using the relationship that a small number of breaths per unit time indicates a high degree of concentration.

As a further example, x3 can be obtained by measuring the user's pulse and using the relationship that a large number of beats per unit time indicates a high degree of concentration.

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and further modifications, substitutions, and adjustments can be made without departing from its basic technical idea. For example, the second embodiment was described as not using a score based on whether the converted word has synonyms, but by correcting that score as appropriate and adding it to the total score, speech that is easily conveyed to a user under time pressure can be output.

Furthermore, although the first to third embodiments were described in terms of text conversion based on the user's urgency, time pressure, and degree of concentration, respectively, the device may also be configured to perform each of these text conversions and to select and utter the result best suited to the user's psychological state. Text conversion can of course also be performed using parameters representing other psychological states, not only the user's urgency, time pressure, and degree of concentration.
The parameters representing psychological states such as the user's urgency, time pressure, and degree of concentration, the input and output text, and the text conversion program need only be something a computer can handle as physical or electrical signals. The text conversion program causes a computer to which these parameters and the text have been input to function as physical means for outputting the converted text.

When combined with a text-to-speech synthesizer, the present invention can be used in various applications, such as speech synthesizers, spoken dialogue systems, automatic voice response devices, and intelligent robots, that perceive the user's psychological state and change the utterance text accordingly.
It should be noted that the embodiments and examples can be changed and adjusted within the scope of the entire disclosure (including claims) of the present invention and based on the basic technical concept. Various combinations and selections of various disclosed elements are possible within the scope of the claims of the present invention. That is, the present invention of course includes various variations and modifications that could be made by those skilled in the art according to the entire disclosure including the claims and the technical idea.
[Appendix 1-Amended claim 3 under Article 34 of the Convention]
The text conversion apparatus according to claim 1, wherein a user's urgency level is used as a parameter representing the user's psychological state.
[Appendix 2-Amended claim 4 under Article 34 of the Convention]
The text conversion apparatus according to claim 1, wherein the user's degree of time pressure is used as a parameter representing the user's psychological state.
[Appendix 3-Amended claim 5 under Article 34 of the Convention]
The text conversion apparatus according to claim 1, wherein a user concentration level is used as a parameter representing the user's psychological state.
[Appendix 4-Amended claim 6 under Article 34 of the Convention]
The text conversion device according to claim 3, wherein the urgency level of the user is replaced with an average value of pitch frequencies of voices uttered by the user.
[Appendix 5-Amended claim 7 under Article 34 of the Convention]
The text conversion device according to claim 4, wherein the user's degree of time pressure is replaced by the speed of the voice uttered by the user.
[Appendix 6-Amended claim 8 under Article 34 of the Convention]
The text conversion apparatus according to claim 5, wherein the user's concentration degree is replaced with a temporal variation component of a speed of voice uttered by the user.
[Appendix 7-Amended claim 9 under Article 34 of the Convention]
The text conversion device according to any one of claims 1 and 3 to 8, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the number of homonyms of the substituted word.
[Appendix 8-Amended claim 10 under Article 34 of the Convention]
The text conversion device according to any one of claims 1 and 3 to 8, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the length of time taken to read it aloud or the length of its sentence.
[Appendix 9-Amended claim 11 under Article 34 of the Convention]
The text conversion device according to any one of claims 1 and 3 to 8, wherein a plurality of text conversion candidates are created by dividing the input text into a plurality of division units, and a score is given to each of the text conversion candidates according to the number of division units it contains or the length of each division unit.
[Appendix 10-Amended claim 12 under Article 34 of the Convention]
The text conversion device according to any one of claims 1 and 3 to 8, wherein a plurality of text conversion candidates are created by extracting one or more important words from the input text and performing a conversion operation that repeats each important word two or more times, and a score is given to each of the text conversion candidates according to the number of times the conversion operation is performed.
[Appendix 11-Amended claim 15 under Article 34 of the Convention]
A speech output device comprising: the text conversion device according to any one of claims 1 and 3 to 13; and text-to-speech synthesis means for reading aloud the text output from the text conversion device.
[Appendix 12-Amended claim 16 under Article 34 of the Convention]
A robot comprising the speech output device according to claim 15, wherein the robot outputs speech in accordance with the listener's psychological state.
[Appendix 13-Claim 19 after amendment under Article 34 of the Convention]
The text conversion method according to claim 17, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the number of homonyms of the substituted word.
[Appendix 14-Amended claim 20 under Article 34 of the Convention]
The text conversion method according to claim 17, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the length of time taken to read it aloud or the length of its sentence.
[Appendix 15-Amended claim 21 under Article 34 of the Convention]
The text conversion method according to claim 17, wherein a plurality of text conversion candidates are created by dividing the input text into a plurality of division units, and a score is given to each of the text conversion candidates according to the number of division units it contains or the length of each division unit.
[Appendix 16-Amended claim 22 under Article 34 of the Convention]
The text conversion method according to claim 17, wherein a plurality of text conversion candidates are created by extracting one or more important words from the input text and performing a conversion operation that repeats each important word two or more times, and a score is given to each of the text conversion candidates according to the number of times the conversion operation is performed.
[Appendix 17-Amended claim 25 under Article 34 of the Convention]
The text conversion program according to claim 23, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the number of homonyms of the substituted word.
[Appendix 18-Amended claim 26 under Article 34 of the Convention]
The text conversion program according to claim 23, wherein a plurality of text conversion candidates are created by replacing a word in the input text with another word, and a score is given to each of the text conversion candidates according to the length of time taken to read it aloud or the length of its sentence.
[Appendix 19-Amended claim 27 under Article 34 of the Convention]
The text conversion program according to claim 23, wherein a plurality of text conversion candidates are created by dividing the input text into a plurality of division units, and a score is given to each of the text conversion candidates according to the number of division units it contains or the length of each division unit.
[Appendix 20-Amended claim 28 under Article 34 of the Convention]
The text conversion program according to claim 23, wherein a plurality of text conversion candidates are created by extracting one or more important words from the input text and performing a conversion operation that repeats each important word two or more times, and a score is given to each of the text conversion candidates according to the number of times the conversion operation is performed.
[Appendix 21-Amended claim 29 under Article 34 of the Convention]
A speech output program that causes the computer to further execute a process of converting the text output by the text conversion program according to any one of claims 23 and 25 to 28 into speech by text-to-speech synthesis and outputting the speech.

Claims

A text conversion device that receives as input a parameter representing a user's psychological state and a text; creates a plurality of text conversion candidates, within a range that does not change the meaning of the input text, by replacing words in the text using word pairs whose replacement does not change the meaning of the sentence; obtains, for each text conversion candidate, a score indicating ease of understanding when the output text is heard as speech, using the score that is set for each word pair and applies when the text is converted using that word pair; and performs a conversion operation on the input text by selecting a text conversion candidate having a score that matches the input parameter.

The text conversion apparatus according to claim 1, wherein an average value of pitch frequencies of speech uttered by the user is used as a parameter representing the psychological state of the user.

The text conversion apparatus according to claim 1, wherein a speed of a voice uttered by the user is used as a parameter representing the psychological state of the user.

The text conversion apparatus according to claim 1, wherein a temporal variation component of a speed of voice uttered by the user is used as a parameter representing the psychological state of the user.

The text conversion device according to any one of claims 1 to 4, wherein the score is set so as to promote replacement with words that have no homonyms.

The text conversion apparatus as described in any one of the claims above, wherein a score is given to each text conversion candidate according to the length of time taken to read it aloud or the length of its sentence.

The text conversion device according to any one of claims 1 to 5, wherein, instead of replacing words in the text using word pairs whose replacement does not change the meaning of the sentence, a plurality of text conversion candidates are created by dividing the input text into a plurality of division units, and a score is given to each text conversion candidate according to the number of division units it contains or the length of each division unit.

The text conversion device according to any one of claims 1 to 5, wherein, instead of replacing words in the text using word pairs whose replacement does not change the meaning of the sentence, a plurality of text conversion candidates are created by extracting one or more important words from the input text and performing a conversion operation that repeats each important word two or more times, and a score is given to each of the text conversion candidates according to the number of times the conversion operation is performed.

A text conversion device comprising:
a psychological situation estimation unit that outputs, on the basis of the user's voice or the content of the user's operations, a parameter for estimating the user's psychological state; and
a text conversion unit that replaces words in the input text with synonyms by generating a text conversion candidate group through replacement of words in the text using word pairs whose replacement does not change the meaning of the sentence, assigning a score to each text conversion candidate in the group using the score that is set for each word pair and applies when the text is converted using that word pair, and selecting the text conversion candidate whose assigned score matches the parameter for estimating the user's psychological state.

A text conversion method performed by a text conversion device, comprising:
inputting a parameter representing a user's psychological state and a text;
creating a plurality of text conversion candidates from the text by replacing words in the text using word pairs whose replacement does not change the meaning of the sentence; and
obtaining, for each of the text conversion candidates, a score indicating ease of understanding when the output text is heard as speech, using the score that is set for each word pair and applies when the text is converted using that word pair, and selecting a text conversion candidate having a score that matches the input parameter, thereby converting the input text, on the basis of the input parameter, within a range that does not change the meaning of the sentence.