JPWO2019044534A1

JPWO2019044534A1 - Information processing device and information processing method

Info

Publication number: JPWO2019044534A1
Application number: JP2019539360A
Authority: JP
Inventors: 角川　元輝; 元輝角川; 政明星野; 亜由美中川
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2017-08-31
Filing date: 2018-08-17
Publication date: 2020-11-19
Also published as: WO2019044534A1

Abstract

The present technology relates to an information processing device and an information processing method that enable more human-like utterances by means of a seven-five meter (shichigocho) constraint. An information processing device is provided that includes a processing unit that converts input text information into seven-five meter and outputs it, which makes more human-like utterances possible under the seven-five meter constraint. The present technology can be applied, for example, to a system that generates a response to a user's utterance, such as a dialogue system, or to a system that reads out text information by speech synthesis, such as a news program production system or a digital signage system.

Description

The present technology relates to an information processing device and an information processing method, and more particularly to an information processing device and an information processing method that enable more human-like utterances by means of a seven-five meter (shichigocho) constraint.

In recent years, products and services that conduct spoken dialogue have become widespread. For example, Patent Document 1 discloses an electronic toy (domestic robot) that performs an arbitrary operation in response to external speech.

Japanese Unexamined Patent Application Publication No. 2002-307354

Incidentally, in spoken dialogue, responses are often monotonous, as in "Tomorrow's weather in Yokohama is sunny." (「明日の横浜の天気は晴れです。」), and clearly differ from dialogue between humans.

As a result, some users may find such responses uninteresting, unenjoyable as dialogue, hard to remember, or not worth continuing to use, and a technique for producing more human-like utterances has been sought.

The present technology has been made in view of such circumstances, and enables more human-like utterances by means of a seven-five meter constraint.

An information processing device according to one aspect of the present technology is an information processing device including a processing unit that converts input text information into seven-five meter and outputs it.

An information processing method according to one aspect of the present technology is an information processing method for an information processing device, in which the information processing device converts input text information into seven-five meter and outputs it.

In the information processing device and the information processing method according to one aspect of the present technology, input text information is converted into seven-five meter and output.

The information processing device according to one aspect of the present technology may be an independent device or may be an internal block constituting a single device.

According to one aspect of the present technology, more human-like utterances can be produced by means of a seven-five meter constraint.

Note that the effects described here are not necessarily limiting, and any of the effects described in the present disclosure may be obtained.

A block diagram showing an example of the hardware configuration of an information processing device to which the present technology is applied (FIG. 1).
A block diagram showing an example of the software configuration of an information processing device to which the present technology is applied (FIG. 2).
Two flowcharts showing the flow of the dialogue process (FIGS. 3 and 4).
A diagram showing an example of the context information DB.
A diagram showing an example of the user feedback information DB.
A diagram showing an example of the ending list.
A diagram showing an example of the meaning-invariant word list.
A diagram showing an example of the onomatopoeia list.
A diagram showing an example of the synonym dictionary.
A diagram showing an example of a news program production system to which the present technology is applied.
A diagram showing an example of a digital signage system to which the present technology is applied.
A block diagram showing an example of the configuration of a dialogue system.

Hereinafter, embodiments of the present technology will be described with reference to the drawings. The description is given in the following order.

1. First embodiment: Dialogue system
2. Second embodiment: News program production system
3. Third embodiment: Digital signage system
4. Modifications

<1. First Embodiment>

(Hardware configuration example)
FIG. 1 is a block diagram showing an example of the hardware configuration of an information processing device to which the present technology is applied.

The information processing device 10 of FIG. 1 is, for example, a speaker that can be connected to a network, also referred to as a smart speaker or a home agent. In addition to playing music, this type of speaker can, for example, hold a spoken dialogue with a user and perform voice operation of equipment such as lighting fixtures and air conditioners.

Note that the information processing device 10 is not limited to a speaker and may be configured, for example, as a mobile device such as a smartphone or a mobile phone, or as an electronic device such as a tablet computer, a personal computer, a television receiver, or a game console.

In FIG. 1, the information processing device 10 includes a CPU 101, a ROM 102, a RAM 103, an information access unit 104, a hard disk 105, an operation I/F 106, an operation unit 107, an audio input I/F 108, a microphone 109, a video input I/F 110, a camera 111, an audio output I/F 112, a speaker 113, a video output I/F 114, a display 115, a communication I/F 116, and a bus 117.

The CPU (Central Processing Unit) 101, the ROM (Read Only Memory) 102, and the RAM (Random Access Memory) 103 constitute a control unit 100. The control unit 100 operates as the central processing device of the information processing device 10, performing various arithmetic processes and controlling the operation of each unit.

The information access unit 104 includes, for example, a data write/read circuit. Under control from the control unit 100 via the bus 117, the information access unit 104 writes various data to the hard disk 105 and reads various data recorded on the hard disk 105. The hard disk 105 is an HDD (Hard Disk Drive) and is configured as a large-capacity recording device.

The operation I/F 106 includes, for example, an operation interface circuit. The operation unit 107 includes, for example, buttons, a keyboard, and a mouse. The operation I/F 106 supplies an operation signal corresponding to the user's operation of the operation unit 107 to the control unit 100 via the bus 117.

The audio input I/F 108 includes, for example, an audio input interface circuit. The microphone 109 is a device (sound collector) that converts external sound into an electric signal. The audio input I/F 108 supplies an audio signal corresponding to the sound picked up by the microphone 109 to the control unit 100, the audio output I/F 112, and the like via the bus 117.

The video input I/F 110 includes, for example, a video input interface circuit. The camera 111 has an image sensor, a signal processing circuit, and the like, and captures a subject to generate a captured image. The video input I/F 110 supplies the image data of the captured image generated by the camera 111 to the control unit 100, the information access unit 104, the video output I/F 114, and the like via the bus 117.

The audio output I/F 112 includes, for example, an audio output interface circuit. The speaker 113 is a device that converts an electric signal into physical vibration to produce sound. Under control from the control unit 100 via the bus 117, the audio output I/F 112 outputs sound corresponding to an audio signal from the speaker 113.

The video output I/F 114 includes, for example, a video output interface circuit. The display 115 is, for example, a liquid crystal display or an organic EL display. Under control from the control unit 100 via the bus 117, the video output I/F 114 displays various information (for example, characters and images) corresponding to a video signal on the display 115.

The communication I/F 116 includes, for example, a communication interface circuit. Under control from the control unit 100 via the bus 117, the communication I/F 116 accesses a server (not shown) connected to the Internet 30 and exchanges various data.

Note that, in the configuration shown in FIG. 1, the image sensor of the camera 111 is given as an example of a sensor, but other sensors may also be provided. Sensor information obtained as a result of sensing by the various sensors is supplied to the control unit 100 via the bus 117 and processed.

Here, the sensors can include, for example, a magnetic sensor that detects the magnitude and direction of a magnetic field, an acceleration sensor that detects acceleration, a gyro sensor that detects angle (attitude), angular velocity, and angular acceleration, a proximity sensor that detects nearby objects, and a biometric sensor that detects biometric information such as fingerprints, irises, and pulse.

The sensors can further include sensors for measuring the surrounding environment, such as a temperature sensor that detects temperature, a humidity sensor that detects humidity, and an ambient light sensor that detects ambient brightness. In addition to the sensor information obtained from the sensors described above, the sensor information can include various other information, such as position information calculated from GPS (Global Positioning System) signals and time information measured by a timekeeping means.

(Software configuration example)
FIG. 2 is a block diagram showing an example of the software configuration of an information processing device to which the present technology is applied.

That is, in the information processing device 10 shown in FIG. 1, the CPU 101 loads a program recorded in a recording device such as the ROM 102 or the hard disk 105 into the RAM 103 and executes it, whereby the functions of the information processing unit 150 shown in FIG. 2 are realized and various processes are executed.

In FIG. 2, the information processing unit 150 includes a speech recognition unit 151, an utterance intention understanding unit 152, an application/service execution unit 153, a response generation unit 154, a context acquisition unit 155, a seven-five meter conversion unit 156, a speech synthesis unit 157, and a user feedback collection unit 158.

In the following description, among the response sentences for the user, a response sentence before conversion into seven-five meter by the seven-five meter conversion unit 156 is written as "response sentence (before conversion)", and a response sentence after conversion into seven-five meter by the seven-five meter conversion unit 156 is written as "response sentence (after conversion)", to distinguish the two. In addition, a response sentence generated as a candidate for the response sentence (after conversion) is written as "response sentence (post-conversion candidate)".

The seven-five meter (shichigocho) refers to a form in which units of five sounds and seven sounds are repeated, such as "5-7-5" or "5-7-5-7-7". The unit of counting here can be, for example, the mora.

Note that the syllable count is the number of vowels when the word is written in Roman letters, and differs from the mora count. That is, the mora count adds the number of occurrences of "っ", "ん", and "ー" to the syllable count. For example, the word "チューリッヒ" (Zurich) has three syllables but five morae. These definitions of syllable and mora are examples, however, and other definitions may be adopted.
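The mora definition above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes the input is already a kana reading (no kanji), and the names `SMALL_KANA`, `is_kana`, and `count_morae` are hypothetical. Small kana such as "ュ" merge with the preceding kana, while "っ", "ん", and "ー" each count as one mora.

```python
# Small kana that attach to the preceding kana and do not form a mora by themselves.
# The small "っ"/"ッ" is deliberately NOT listed: it counts as one mora.
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def is_kana(ch: str) -> bool:
    """Return True for hiragana, katakana, and the long-vowel mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "ー"


def count_morae(reading: str) -> int:
    """Count morae in a kana reading; non-kana characters are ignored."""
    return sum(1 for ch in reading if is_kana(ch) and ch not in SMALL_KANA)


print(count_morae("チューリッヒ"))  # 5
```

Under this counting, a 5-7-5 phrase such as よこはまの / あすのてんきは / はれなのさ yields 5, 7, and 5 morae respectively (the reading あす for 明日 is an assumption made here for illustration).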

The speech recognition unit 151 executes speech recognition processing that converts the audio signal corresponding to the user's utterance, input via the microphone 109, into text information, and supplies the resulting speech recognition result to the utterance intention understanding unit 152. This speech recognition processing uses, for example, a database for speech-to-text conversion.

Based on the speech recognition result supplied from the speech recognition unit 151, the utterance intention understanding unit 152 executes analysis processing (processing for understanding the user's intention) on the text information corresponding to the user's utterance. The utterance intention understanding unit 152 supplies the intention understanding result obtained by the analysis processing to the application/service execution unit 153.

Here, an intent such as "weather check" or "schedule check" is estimated as the user's intention, and parameters (slots) of the intent, such as "date and time" and "place", are extracted.

Based on the intention understanding result supplied from the utterance intention understanding unit 152, the application/service execution unit 153 executes an application or service that matches the user's intention, and supplies the execution result to the response generation unit 154.

Here, for example, when "weather check" is estimated as the user's intention and "date and time" and "place" are extracted as its parameters, information on the weather at the target date, time, and place can be obtained by passing the "date and time" and "place" parameters as arguments to a weather check API (Application Programming Interface) provided by an external service.
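The hand-off from intent understanding to service execution can be illustrated as follows. This is a sketch under stated assumptions: `IntentResult`, `HANDLERS`, `execute`, and `fake_weather_api` are hypothetical names, and the weather function is a local stand-in rather than any real external API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IntentResult:
    """Intention understanding result (hypothetical shape)."""
    intent: str                                           # e.g. "weather_check"
    slots: Dict[str, str] = field(default_factory=dict)   # e.g. {"date": ..., "place": ...}


def fake_weather_api(date: str, place: str) -> str:
    """Stand-in for an external weather API; always reports sunny weather."""
    return f"{date} {place}: sunny"


# Maps each intent to the application or service that handles it.
HANDLERS: Dict[str, Callable[..., str]] = {"weather_check": fake_weather_api}


def execute(result: IntentResult) -> str:
    """Dispatch on the intent and pass the slots as keyword arguments."""
    handler = HANDLERS[result.intent]
    return handler(**result.slots)


info = execute(IntentResult("weather_check", {"date": "tomorrow", "place": "Yokohama"}))
print(info)  # tomorrow Yokohama: sunny
```

Keeping the slots as a plain dictionary lets each handler declare only the parameters it needs; a mismatch between slots and handler arguments surfaces as a `TypeError` at call time.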

The response generation unit 154 generates a response sentence (before conversion) for the user based on, for example, the execution result of the application or service supplied from the application/service execution unit 153, and supplies it to the seven-five meter conversion unit 156.

The context acquisition unit 155 acquires context information based on, for example, time information, position information, and the analysis result of an image captured by the camera 111, and supplies it to the seven-five meter conversion unit 156.

The seven-five meter conversion unit 156 executes the seven-five meter conversion process on the response sentence (before conversion) supplied from the response generation unit 154, and supplies the resulting response sentence (after conversion) to the speech synthesis unit 157. At this time, the seven-five meter conversion unit 156 can convert the response sentence (before conversion) into seven-five meter more appropriately, and thereby obtain the response sentence (after conversion), based on the context information supplied from the context acquisition unit 155 and the user feedback information supplied from the user feedback collection unit 158.

The speech synthesis unit 157 synthesizes speech based on the response sentence (after conversion) supplied from the seven-five meter conversion unit 156 and outputs it from the speaker 113. That is, the speech synthesis unit 157 has a function of reading out the text information of the response sentence (after conversion) (TTS: Text To Speech).

The user feedback collection unit 158 collects information on the user's reaction when the response sentence (after conversion) is output to the user, and supplies it to the seven-five meter conversion unit 156 as user feedback information.

The configuration of the information processing device 10 has been described above.

(Flow of the dialogue process)
Next, the overall flow of the dialogue process executed by the information processing unit 150 shown in FIG. 2 will be described with reference to the flowcharts of FIGS. 3 and 4.

Here, the input to the information processing unit 150 is either an audio signal corresponding to the user's utterance, supplied from the microphone 109, or text information corresponding to an operation signal supplied from the operation unit 107, which is a keyboard.

In the information processing unit 150, the process of step S11 is executed when an audio signal corresponding to the user's utterance is input. In step S11, the speech recognition unit 151 executes speech recognition processing that converts the audio signal corresponding to the user's utterance into text information.

In this way, either the text information obtained as a result of the process of step S11 (speech recognition processing) or the text information corresponding to the user's input operation is input to the utterance intention understanding unit 152.

In step S12, the utterance intention understanding unit 152 understands the user's intention by executing analysis processing on the input text information.

For example, when the user utters "Tell me tomorrow's weather in Yokohama." (「明日の横浜の天気を教えて？」), the intent "weather check" is estimated as the user's intention, and "tomorrow" as the "date and time" and "Yokohama" as the "place" are obtained as its parameters.

In step S13, the application/service execution unit 153 executes an application or service that matches the user's intention, based on the intention understanding result obtained in step S12.

Here, for example, when "weather check" is estimated as the user's intention and "tomorrow" as the "date and time" and "Yokohama" as the "place" are extracted as its parameters, information on tomorrow's weather in Yokohama can be obtained by passing these "date and time" and "place" parameters as arguments to the weather check API provided by an external service.

In step S14, the response generation unit 154 generates a response sentence (before conversion) for the user based on the execution result of the application or service obtained in step S13.

Here, for example, the response sentence (before conversion) "Tomorrow's weather in Yokohama is sunny." (「明日の横浜の天気は晴れです。」) is generated based on the information on tomorrow's weather in Yokohama obtained from the weather check API.

In step S15, the context acquisition unit 155 acquires context information based on, for example, time information, position information, and the analysis result of an image captured by the camera 111.

Here, the context information includes current environment information related to the utterance, such as the time of day and place of the user's utterance, the speaker, other people present, and the atmosphere of the setting, and can be temporarily recorded in the context information DB 201. The current environment information may, however, be processed directly without being recorded in the context information DB 201.

In step S16, the seven-five meter conversion unit 156 executes the seven-five meter conversion process on the response sentence (before conversion) generated in step S14, thereby converting the response sentence (before conversion) into seven-five meter and generating the response sentence (after conversion).

In this seven-five meter conversion process, a conversion process suited to the context information acquired in step S15 is selected with reference to the user feedback information collected in step S18, described later, and the selected conversion process is executed on the response sentence (before conversion).

Specifically, for example, the user feedback information includes, for each candidate generation pattern that generates seven-five meter candidates, information in which past environment information and the user's reaction are expressed as a score, and the context information includes the current environment information. The seven-five meter conversion unit 156 can therefore select a candidate generation pattern for which the score of past environment information that is identical or similar to the current environment information is equal to or greater than a threshold.

At this time, if a plurality of candidate generation patterns can be selected, one candidate generation pattern can be selected at random, for example. That is, in the seven-five meter conversion process, the selected candidate generation pattern (a seven-five meter candidate generation process, or a combination of such processes) is executed in order, so that the response sentence (before conversion) comes to fit seven-five meter.
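The selection rule described above (similar past environment, score at or above a threshold, random choice among the survivors) might be sketched as follows. The record schema, the field-matching similarity measure, and the default thresholds are all assumptions introduced for illustration; the text does not specify how similarity is computed or how the DB is laid out.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FeedbackRecord:
    """One stored feedback case (hypothetical schema, not the patent's DB layout)."""
    pattern: Tuple[str, ...]     # candidate generation processes to run in order, e.g. ("A", "C")
    environment: Dict[str, str]  # past environment information
    score: float                 # user reaction expressed as a score


def similarity(past: Dict[str, str], current: Dict[str, str]) -> float:
    """Fraction of the current environment's fields matched by a past record."""
    if not current:
        return 0.0
    matched = sum(1 for key, value in current.items() if past.get(key) == value)
    return matched / len(current)


def select_pattern(
    records: List[FeedbackRecord],
    current: Dict[str, str],
    score_threshold: float = 0.5,
    min_similarity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[str, ...]]:
    """Pick one pattern at random among those that scored well in a similar environment."""
    rng = rng or random.Random()
    eligible = [
        r.pattern
        for r in records
        if similarity(r.environment, current) >= min_similarity and r.score >= score_threshold
    ]
    return rng.choice(eligible) if eligible else None


records = [
    FeedbackRecord(("A",), {"time": "morning", "speaker": "adult"}, 0.9),
    FeedbackRecord(("C", "F"), {"time": "morning", "speaker": "adult"}, 0.2),
    FeedbackRecord(("G",), {"time": "night", "speaker": "adult"}, 0.8),
]
current = {"time": "morning", "speaker": "adult"}
print(select_pattern(records, current))  # ('A',)
```

In this example only the first record passes both filters, so the result is deterministic; when several records pass, the choice is random. Lowering `min_similarity` admits "similar" rather than only "identical" environments.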

The details of the seven-five meter conversion process will be described after the overall flow of this dialogue process, but the candidate generation patterns can include, for example, the following seven-five meter candidate generation processes.

(A) Particle removal: generates candidates that fit seven-five meter by removing particles
(B) Unnecessary-part removal: generates candidates that fit seven-five meter by removing semantically unnecessary parts
(C) Ending addition: generates candidates that fit seven-five meter by adding a sentence ending
(D) Meaning-invariant word addition: generates candidates that fit seven-five meter by adding words that do not change the meaning
(E) Repetition addition: generates candidates that fit seven-five meter by repeating a word
(F) Onomatopoeia addition: generates candidates that fit seven-five meter by adding an onomatopoeia
(G) Synonym addition: generates candidates that fit seven-five meter by replacing words with synonyms
(H) A combination of (A) to (G) above

Note that the ending list 211 is used when generating seven-five meter candidates by ending addition in (C), and the meaning-invariant word list 212 is used when generating candidates by meaning-invariant word addition in (D). Furthermore, the onomatopoeia list 213 is used when generating candidates by onomatopoeia addition in (F), and the synonym dictionary 214 is used when generating candidates by synonym addition in (G).
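Two of the processes listed above, (A) particle removal and (C) ending addition, can be sketched as below. Several things here are assumptions rather than details from the text: the `Morpheme` shape, the rule that a candidate "fits" when its morphemes split into 5-7-5 morae at morpheme boundaries, the reading アス for 明日, and the two sample endings (the actual contents of ending list 211 are not given here). Punctuation is expected to be stripped before the check.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple

SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


@dataclass(frozen=True)
class Morpheme:
    surface: str  # text as written
    reading: str  # kana reading
    pos: str      # part of speech, e.g. "noun", "particle", "ending"


def count_morae(reading: str) -> int:
    """Count kana, skipping small kana that merge with the preceding one."""
    return sum(1 for ch in reading if ch not in SMALL_KANA)


def fits_meter(morphemes: Sequence[Morpheme], meter: Tuple[int, ...] = (5, 7, 5)) -> bool:
    """True if the morphemes split, at morpheme boundaries, into the given mora counts."""
    targets = list(meter)
    filled = 0
    for m in morphemes:
        if not targets:
            return False      # morphemes left over after the last phrase
        filled += count_morae(m.reading)
        if filled == targets[0]:
            targets.pop(0)
            filled = 0
        elif filled > targets[0]:
            return False      # a morpheme straddles a phrase boundary
    return not targets and filled == 0


def drop_particles(morphemes: List[Morpheme]) -> Iterator[List[Morpheme]]:
    """(A) Yield variants with exactly one particle removed."""
    for i, m in enumerate(morphemes):
        if m.pos == "particle":
            yield morphemes[:i] + morphemes[i + 1:]


def add_endings(morphemes: List[Morpheme], endings: List[Morpheme]) -> Iterator[List[Morpheme]]:
    """(C) Yield variants with one sentence ending appended."""
    for ending in endings:
        yield morphemes + [ending]


def generate(morphemes: List[Morpheme], endings: List[Morpheme]) -> List[List[Morpheme]]:
    """Collect all variants from (A) and (C) and keep those that fit 5-7-5."""
    candidates = [morphemes, *drop_particles(morphemes), *add_endings(morphemes, endings)]
    return [c for c in candidates if fits_meter(c)]


sentence = [
    Morpheme("横浜", "ヨコハマ", "noun"), Morpheme("の", "ノ", "particle"),
    Morpheme("明日", "アス", "noun"), Morpheme("の", "ノ", "particle"),
    Morpheme("天気", "テンキ", "noun"), Morpheme("は", "ハ", "particle"),
    Morpheme("晴れ", "ハレ", "noun"),
]
endings = [Morpheme("なのさ", "ナノサ", "ending"), Morpheme("だよ", "ダヨ", "ending")]
results = ["".join(m.surface for m in c) for c in generate(sentence, endings)]
print(results)  # ['横浜の明日の天気は晴れなのさ']
```

Generating every variant and filtering afterwards keeps each process independent, which makes combinations as in (H) a matter of chaining generators; a real system would also need to rank the surviving candidates.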

In this way, the response sentence (before conversion) is converted into seven-five meter and the response sentence (after conversion) is generated. When the seven-five meter conversion result obtained in step S16 is to be output as audio, the response sentence (after conversion) is output to the speech synthesis unit 157.

In step S17, the speech synthesis unit 157 synthesizes speech based on the response sentence (after conversion) obtained in step S16 and outputs it from the speaker 113 via the audio output I/F 112. As a result, speech corresponding to the seven-five meter response is output from the speaker 113.

On the other hand, when the seven-five meter conversion result obtained in step S16 is to be output as video, the response sentence (after conversion) is output to the display 115 via the video output I/F 114. As a result, text corresponding to the seven-five meter response is displayed on the screen of the display 115.

More specifically, for example, the response sentence (before conversion) "Tomorrow's weather in Yokohama is sunny." (「明日の横浜の天気は晴れです。」) described above is converted into seven-five meter, as in the response sentence (after conversion) 「横浜の明日の天気は晴れなのさ。」 (roughly, "In Yokohama, tomorrow's weather is sunny, you know."), and output.

Note that the user feedback collection unit 158 collects information on the user's reaction when the response sentence (after conversion) is output to the user (S18), and records the user feedback information obtained at that time in the user feedback information DB 202. This user feedback information is used in the seven-five meter conversion process described above.

The overall flow of the dialogue process has been described above.

(Flow of Shichigocho conversion process)
Next, among the dialogue processes shown in FIGS. 3 to 4 described above, the details of the process in step S16 of FIG. 4 (the Shichigocho conversion process) will be described.

ステップＳ１１１において、七五調変換部１５６は、言語解析処理を実行する。 In step S111, the Shichigocho conversion unit 156 executes the language analysis process.

In this language analysis process, morphological analysis is performed on the input response sentence (before conversion), and a morphological analysis result is obtained. In this morphological analysis, the response sentence (before conversion) is divided into a sequence of morphemes (words) and the part of speech of each morpheme is determined; in addition, a kana reading is assigned to each morpheme.

Contents of language analysis process (S111 in FIG. 4)
Input (IN): Response sentence (before conversion)
Output (OUT): Morphological analysis result
Processing: Morphological analysis process
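As a minimal sketch of why a kana reading per morpheme is useful, the reading can be used to count morae (sound units), which is what a Shichigocho check would presumably operate on. The function name `count_morae`, the morpheme tuple layout, and the sample morphemes are assumptions for illustration and do not appear in the specification.

```python
# Small kana combine with the preceding kana into a single mora (e.g. "じゃ").
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(reading: str) -> int:
    """Count morae in a kana reading.

    The moraic nasal (ん), the geminate marker (っ) and the long-vowel
    mark (ー) each count as one mora; small kana do not add a mora.
    """
    return sum(1 for ch in reading if ch not in SMALL_KANA and not ch.isspace())


# Hypothetical morphological analysis result: (surface, part of speech, reading).
morphemes = [
    ("横浜", "noun", "よこはま"),
    ("の", "particle", "の"),
    ("明日", "noun", "あした"),
    ("の", "particle", "の"),
]

total = sum(count_morae(reading) for _, _, reading in morphemes)
print(total)
```

Counting on the reading rather than on the surface form avoids having to know how many sounds a kanji such as 「横浜」 contributes.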

ステップＳ１１２において、七五調変換部１５６は、候補生成選定処理を実行する。 In step S112, the Shichigocho conversion unit 156 executes the candidate generation selection process.

In this candidate generation selection process, for example, the user feedback information stored in the user feedback information DB 202 is referred to in order to identify cases close to the current context information, and one candidate generation process is randomly selected from among the combinations of candidate generation processes whose feedback score is equal to or higher than a threshold value.

In this candidate generation selection process, it is also possible, for example, to randomly select a combination of candidate generation processes that does not exist in the user feedback information, and whether such a combination is selected can itself be decided randomly. If there is no case whose feedback score is equal to or higher than the threshold value, the response sentence (before conversion) may be output as it is, without executing the Shichigocho conversion process.

図５は、コンテキスト情報DB２０１の例を示す図である。 FIG. 5 is a diagram showing an example of the context information DB 201.

In FIG. 5, the context information DB 201 stores a value for each item of context information. In the example of FIG. 5, the stored context information includes a time zone of "Sunday, 21:00 hour", a place of "home", a speaker of "Hiroshi Yamada", attendees of "family", and an atmosphere of "fun".

ここで、「時間帯」は、例えば、情報処理装置１０に内蔵された計時手段により計時された時刻情報を用いればよい。また、「場所」は、GPS(Global Positioning System)信号などから算出される位置情報を用いればよい。 Here, for the "time zone", for example, time information measured by a time measuring means built in the information processing apparatus 10 may be used. Further, as the "location", position information calculated from a GPS (Global Positioning System) signal or the like may be used.

The "speaker", "attendees", and "atmosphere" may be determined by analyzing image data of subjects captured by the camera 111 and using the analysis result. For example, for the "atmosphere", "fun", "sad", and the like are judged from the facial expressions of the speaker and attendees obtained from the image analysis result, and the atmosphere of the place can be determined from the sum of those judgment results.
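One way to read "the sum of those judgment results" is as a simple tally of per-person expression labels. The sketch below takes that reading; the function name `decide_atmosphere` and the label strings are assumptions, and the specification does not say how ties or missing faces are handled.

```python
from collections import Counter


def decide_atmosphere(expressions):
    """Return the most frequent expression label, or None if nobody was detected."""
    if not expressions:
        return None
    # Counter.most_common(1) returns [(label, count)] for the top label.
    return Counter(expressions).most_common(1)[0][0]


# Hypothetical per-person judgments from image analysis.
print(decide_atmosphere(["fun", "fun", "sad"]))
```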

Various studies have already been conducted on techniques for reading human facial expressions, emotions, and the like, and APIs (Application Programming Interfaces) for recognizing faces, images, and voices that use such known techniques can be used here. For example, by sending a captured image containing the faces of the speaker and attendees to a service that provides this kind of API, information on their facial expressions and emotions, such as joy, surprise, anger, sadness, contempt, and disgust, can be obtained.

また、ここでは、コンテキスト情報を、コンテキスト情報DB２０１として、データベースで管理しているが、必ずしもデータベースで管理する必要はない。ただし、例えば、現在の環境情報のうち、場所などの時間的に変化しにくい情報については、データベースとして管理したほうがよい。 Further, here, the context information is managed in the database as the context information DB201, but it is not always necessary to manage it in the database. However, for example, among the current environmental information, information such as a location that does not change easily over time should be managed as a database.

図６は、ユーザフィードバック情報DB２０２の例を示す図である。 FIG. 6 is a diagram showing an example of the user feedback information DB 202.

図６において、ユーザフィードバック情報DB２０２には、候補生成パターンごとに、コンテキスト情報とユーザの反応をスコア値化して格納している。ここで、スコア値の算出方法であるが、例えば、対象の候補生成パターンを用いて生成された応答文（変換後）の出力に対し、ユーザの反応が良かった場合には、"+ 1" とする一方で、ユーザの反応が悪かった場合には、"- 1" とする。 In FIG. 6, the user feedback information DB 202 stores the context information and the user's reaction as score values for each candidate generation pattern. Here, the score value is calculated. For example, if the user's response to the output of the response sentence (after conversion) generated using the target candidate generation pattern is good, "+1". On the other hand, if the user's reaction is bad, set it to "-1".

For example, "when the user's reaction is good" is assumed to be when it can be recognized from the voice recognition result that the speaker, an attendee, or the like said "That was funny" (「おもしろかった」), or when it can be recognized from the image analysis result that they "laughed". On the other hand, "when the user's reaction is bad" is assumed to be when it can be recognized that the speaker, an attendee, or the like said "What was that?" (「なんだそれ」), or when it can be recognized from the image analysis result that they are "angry".

なお、音声認識結果や画像解析結果などに基づき、ユーザの反応がなかったと判定された場合には、スコア値として、例えば、"0" とすることができる。 If it is determined that there is no reaction from the user based on the voice recognition result or the image analysis result, the score value can be set to, for example, "0".
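The scoring rule above (+1, -1, or 0 per reaction) can be sketched as follows. Accumulating the values per candidate generation pattern and context is an assumption inferred from the large totals shown in FIG. 6; the names `REACTION_DELTA` and `record_feedback` and the key layout are hypothetical.

```python
from collections import defaultdict

# Score contribution for each kind of user reaction, as described in the text.
REACTION_DELTA = {"good": 1, "bad": -1, "none": 0}


def record_feedback(scores, pattern, context, reaction):
    """Add the reaction's contribution to the running score and return the new total."""
    key = (pattern, context)
    scores[key] += REACTION_DELTA[reaction]
    return scores[key]


scores = defaultdict(int)
ctx = ("Sunday 21:00", "home", "Hiroshi Yamada")
for reaction in ["good", "good", "bad", "none"]:
    latest = record_feedback(
        scores, "addition of ending & addition of onomatopoeia", ctx, reaction
    )
print(latest)
```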

図６の例では、"語尾追加＆オノマトペ追加" である候補生成パターンに対し、"時間帯 = 日曜21時台"，"場所 = 自宅"， "話者 = 山田浩" であるコンテキスト情報と、"+129" であるスコアが格納されている。 In the example of Fig. 6, for the candidate generation pattern of "addition of ending & addition of onomatopoeia", context information of "time zone = Sunday 21:00", "location = home", "speaker = Hiroshi Yamada", The score that is "+129" is stored.

In the example of FIG. 6, context information and scores such as "+86", "+29", and "-42" are also given to candidate generation patterns such as "addition of meaning-invariant word", "particle removal & addition of repetition & synonym replacement", and "particle removal & removal of unnecessary part".

Here, for example, when "+80" is set as the threshold value to be compared with the score, "addition of ending & addition of onomatopoeia" and "addition of meaning-invariant word" are eligible as candidate generation patterns, and one of these two candidate generation patterns can be selected at random.

In this way, when a plurality of candidate generation patterns are eligible, one of them is selected at random. The Shichigocho conversion is then performed according to one candidate generation pattern chosen from among those satisfying a certain condition, so that Shichigocho response sentences of various patterns can be presented to the user.

Note that if, for example, the candidate generation pattern with the highest score were always selected, a specific candidate generation pattern would likely be selected repeatedly, so it is preferable to select the candidate generation pattern at random as described above. Depending on how the dialogue system is operated, however, the selection need not be random; for example, the pattern with the highest score may be selected. Furthermore, although this example selects from among candidate generation patterns whose score is equal to or higher than the threshold value, a candidate generation pattern whose score is below the threshold value may, for example, be selected at random.

Contents of candidate generation selection process (S112 in FIG. 4)
Input (IN): Response sentence (before conversion), morphological analysis result (S111 in FIG. 4), context information (S15 in FIG. 4), user feedback information (S18 in FIG. 4)
Output (OUT): Selection result of candidate generation process (which candidate generation process to apply)
Processing: Candidate generation selection process
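The threshold-then-random selection described for step S112 might look like the sketch below. It simplifies "cases close to the current context" to exact equality, assigns the FIG. 6 scores to patterns in the order they are listed, and gives every row the same context; all three are assumptions, as are the names `FEEDBACK` and `select_pattern`.

```python
import random

CTX = ("Sunday 21:00", "home", "Hiroshi Yamada")

# Hypothetical rows modeled on FIG. 6 (contexts of rows 2 to 4 are assumed).
FEEDBACK = [
    {"pattern": "addition of ending & addition of onomatopoeia", "context": CTX, "score": 129},
    {"pattern": "addition of meaning-invariant word", "context": CTX, "score": 86},
    {"pattern": "particle removal & addition of repetition & synonym replacement", "context": CTX, "score": 29},
    {"pattern": "particle removal & removal of unnecessary part", "context": CTX, "score": -42},
]


def select_pattern(feedback, context, threshold, rng=random):
    """Pick one eligible pattern at random, or None when nothing reaches the threshold."""
    eligible = [
        row["pattern"]
        for row in feedback
        if row["context"] == context and row["score"] >= threshold
    ]
    if not eligible:
        return None  # caller outputs the response sentence unconverted
    return rng.choice(eligible)


print(select_pattern(FEEDBACK, CTX, threshold=80))
```

Returning `None` corresponds to the case in which no score reaches the threshold and the response sentence is output as it is.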

In step S113, the Shichigocho conversion unit 156 executes the Shichigocho candidate generation process.

In this Shichigocho candidate generation process, among the candidate generation processes that generate Shichigocho candidates by, for example, particle removal, unnecessary part removal, ending addition, meaning-invariant word addition, repetition addition, onomatopoeia addition, or synonym replacement, one or more candidate generation processes are executed according to the selection result obtained in step S112.

The details of the candidate generation processes listed here (particle removal, unnecessary part removal, ending addition, meaning-invariant word addition, repetition addition, onomatopoeia addition, synonym replacement, and the like) will be described later as the first to seventh candidate generation processes.

Contents of Shichigocho candidate generation process (S113 in FIG. 4)
Input (IN): Response sentence (before conversion), morphological analysis result (S111 in FIG. 4), selection result of candidate generation process (S112 in FIG. 4)
Output (OUT): Response sentence (post-conversion candidate)
Processing: Shichigocho candidate generation process according to the selection result of the candidate generation process

（Ａ）第１候補生成処理 (A) First candidate generation process

In the first candidate generation process, a response sentence (post-conversion candidate) in Shichigocho is generated by removing a particle contained in the response sentence (before conversion) (S113A in FIG. 4).

以下に、助詞抜きの七五調候補生成時の変換前と変換後候補の応答文の例を示す。ただし、応答文（変換前）については、日本語と、ローマ字と、英語の３種類の表記をし、応答文（変換後）については、日本語とローマ字の２種類の表記をする。 The following is an example of the response sentences of the pre-conversion and post-conversion candidates when the 75-tone candidate without particles is generated. However, the response sentence (before conversion) is written in Japanese, Romaji, and English, and the response sentence (after conversion) is written in two types, Japanese and Romaji.

In these examples, Japanese is abbreviated as "日", Romaji as "ロ", and English as "英". The same notation is used for the other example response sentences described below.

Example of Shichigocho candidate generation by particle removal

Response sentence (before conversion)
（日）：松山で人気の居酒屋はサバサバ亭。
（ロ）：matsuyama de ninki no izakaya ha sabasabatei
（英）：Popular bar in Matsuyama is Sabasabatei.

Response sentence (post-conversion candidate)
（日）：松山で人気の居酒屋サバサバ亭。
（ロ）：matsuyama de ninki no izakaya sabasabatei

In this example of Shichigocho candidate generation by particle removal, the particle "は" (ha), which connects "居酒屋" (izakaya) and "サバサバ亭" (Sabasabatei) in the response sentence (before conversion), is removed so that the response sentence (post-conversion candidate) is in Shichigocho.
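A minimal sketch of particle removal on a morpheme sequence is shown below. It generates one candidate per particle and leaves the choice among candidates (for example by a meter check) to a later step; the function name `particle_removal_candidates`, the part-of-speech labels, and the segmentation of the sentence are assumptions.

```python
def particle_removal_candidates(morphemes):
    """Return one candidate sentence per particle, each with that particle removed."""
    candidates = []
    for i, (_, pos) in enumerate(morphemes):
        if pos == "particle":
            kept = [surface for j, (surface, _) in enumerate(morphemes) if j != i]
            candidates.append("".join(kept))
    return candidates


# Hypothetical morphological analysis of the example sentence: (surface, part of speech).
sentence = [
    ("松山", "noun"), ("で", "particle"), ("人気", "noun"), ("の", "particle"),
    ("居酒屋", "noun"), ("は", "particle"), ("サバサバ亭", "noun"), ("。", "symbol"),
]

for candidate in particle_removal_candidates(sentence):
    print(candidate)
```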

（Ｂ）第２候補生成処理 (B) Second candidate generation process

In the second candidate generation process, a response sentence (post-conversion candidate) in Shichigocho is generated by removing a semantically unnecessary part contained in the response sentence (before conversion) (S113B in FIG. 4).

以下に、不要部分除去の七五調候補生成時の変換前と変換後候補の応答文の例を示す。 The following is an example of the response sentences of the candidates before and after conversion when the 75-tone candidate for removing unnecessary parts is generated.

Example of Shichigocho candidate generation by unnecessary part removal

Response sentence (before conversion)
（日）：ロンドンの現在時刻は夜８時になりました。
（ロ）：rondon no genzai zikoku ha yoru hatizi ni narimasita
（英）：The current time in London is 8 o'clock in the evening.

Response sentence (post-conversion candidate)
（日）：ロンドンの現在時刻は夜８時。
（ロ）：rondon no genzai zikoku ha yoru hatizi

In this example of Shichigocho candidate generation by unnecessary part removal, the semantically unnecessary part "になりました" (ni narimasita), which follows "夜８時" (8 o'clock in the evening) in the response sentence (before conversion), is removed so that the response sentence (post-conversion candidate) is in Shichigocho.

（Ｃ）第３候補生成処理 (C) Third candidate generation process

In the third candidate generation process, a response sentence (post-conversion candidate) in Shichigocho is generated by adding an ending to the response sentence (before conversion) using the ending list 211 (S113C in FIG. 4).

図７は、語尾リスト２１１の例を示す図である。なお、図７において、"*"（アスタリスク）は、任意の文字（文字列）や品詞を表している。 FIG. 7 is a diagram showing an example of the ending list 211. In FIG. 7, "*" (asterisk) represents an arbitrary character (character string) or part of speech.

In FIG. 7, the "main notation" is the word to be added as an ending, and its part of speech is shown as the "main part of speech". In a given response sentence, the notation preceding the "main notation" is the "pre-notation", and the notation following it is the "post-notation". The part of speech of the "pre-notation" is shown as the "pre-part of speech", and that of the "post-notation" as the "post-part of speech".

In the example of FIG. 7, the main notation "よ" (yo) is a particle (sentence-final particle); an arbitrary character string whose part of speech is an auxiliary verb (terminal form) is specified as its pre-notation, and an arbitrary character string of any part of speech is specified as its post-notation. Likewise, the main notation "ね" (ne) is a particle (sentence-final particle); an arbitrary character string whose part of speech is a particle (case particle) is specified as its pre-notation, and an arbitrary character string of any part of speech is specified as its post-notation.

このとき、前形態素、本形態素、及び後形態素が条件に一致した場合に、本形態素を追加したとしても意味が変わらないため、応答文（変換前）に対し、本形態素を追加することができる。 At this time, when the pre-morpheme, the present morpheme, and the post-morpheme match the conditions, the meaning does not change even if the present morpheme is added, so that the present morpheme can be added to the response sentence (before conversion). ..

なお、図７に示した"*" の意味や、「本表記」と、「前表記」や「後表記」などとの関係は、後述する他の図（図８や図９）においても同様とされる。 The meaning of "*" shown in FIG. 7 and the relationship between the "main notation" and the "pre-notation" and "post-notation" are the same in other figures (FIGS. 8 and 9) described later. It is said that.
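The list-driven matching described for FIG. 7 can be sketched as a small rule table with "*" as a wildcard. This sketch matches only on part of speech and ignores the surface-form columns; the class `EndingRule`, the function names, and the English part-of-speech labels are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndingRule:
    surface: str   # main notation: the ending to add
    prev_pos: str  # required pre-part of speech ("*" matches anything)
    next_pos: str  # required post-part of speech ("*" matches anything)


def matches(pattern: str, value: str) -> bool:
    """A pattern matches when it is the wildcard or equals the value."""
    return pattern == "*" or pattern == value


def applicable_endings(rules, prev_pos, next_pos):
    """Return the endings whose conditions hold at the given position."""
    return [
        rule.surface
        for rule in rules
        if matches(rule.prev_pos, prev_pos) and matches(rule.next_pos, next_pos)
    ]


# The two rows described for FIG. 7.
RULES = [
    EndingRule("よ", "auxiliary verb", "*"),
    EndingRule("ね", "case particle", "*"),
]

print(applicable_endings(RULES, "auxiliary verb", "symbol"))
```

The same structure would apply to the lists in FIGS. 8 and 9, since they are said to share the "*" convention and the pre/post columns.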

以下に、語尾追加の七五調候補生成時の変換前と変換後候補の応答文の例を示す。 The following is an example of the response sentences of the candidates before and after conversion when the 75-tone candidate with the ending added is generated.

First example of Shichigocho candidate generation by ending addition

Response sentence (before conversion)
（日）：岸さんにメールを送信しました。
（ロ）：kisi san ni me-ru wo sousin simasita
（英）：I sent an email to Kishi.

Response sentence (post-conversion candidate)
（日）：岸さんにメールを送信しましたよ。
（ロ）：kisi san ni me-ru wo sousin simasita yo

In this first example of Shichigocho candidate generation by ending addition, the ending "よ" (yo) is added after "送信しました" (sousin simasita) in the response sentence (before conversion) so that the response sentence (post-conversion candidate) is in Shichigocho.

Second example of Shichigocho candidate generation by ending addition

Response sentence (before conversion)
（日）：横浜の明日の天気は晴れです。
（ロ）：yokohama no asita no tenki ha hare desu
（英）：The weather in Yokohama is sunny tomorrow.

Response sentence (post-conversion candidate)
（日）：横浜の明日の天気は晴れなのさ。
（ロ）：yokohama no asita no tenki ha hare nanosa

In this second example of Shichigocho candidate generation by ending addition, the ending "なのさ" (nanosa) is added after "晴れ" (hare) in the response sentence (before conversion) so that the response sentence (post-conversion candidate) is in Shichigocho. In this second example, however, the ending "です" (desu) of the response sentence (before conversion) is deleted.

As another example of Shichigocho candidate generation by ending addition, the ending "ね" (ne) may be added to a response sentence (before conversion) "日本は" (nihon ha) to generate the response sentence (post-conversion candidate) "日本はね" (nihon ha ne).

（Ｄ）第４候補生成処理 (D) Fourth candidate generation process

In the fourth candidate generation process, a response sentence (post-conversion candidate) in Shichigocho is generated by adding a word that does not change the meaning (a meaning-invariant word) to the response sentence (before conversion) using the meaning-invariant word list 212 (S113D in FIG. 4).

図８は、意味不変語リスト２１２の例を示す図である。 FIG. 8 is a diagram showing an example of the meaning immutable word list 212.

In the example of FIG. 8, the main notation "やっぱり" (yappari, "after all") is an adverb, and an arbitrary character string of any part of speech is specified as both its pre-notation and its post-notation. Likewise, the main notation "ところで" (tokorode, "by the way") is a conjunction, and an arbitrary character string of any part of speech is specified as both its pre-notation and its post-notation.

以下に、意味不変語追加の七五調候補生成時の変換前と変換後候補の応答文の例を示す。 The following is an example of the response sentences of the candidates before and after conversion when the 75-tone candidate with the addition of meaning-invariant words is generated.

Example of Shichigocho candidate generation by meaning-invariant word addition

Response sentence (before conversion)
（日）：川崎の地図を映します。
（ロ）：kawasaki no tizu wo utusi masu
（英）：I will display a map of Kawasaki.

Response sentence (post-conversion candidate)
（日）：川崎の地図をやっぱり映します。
（ロ）：kawasaki no tizu wo yappari utusi masu

In this example of Shichigocho candidate generation by meaning-invariant word addition, the meaning-invariant word "やっぱり" (yappari) is added between "地図を" (tizu wo) and "映します" (utusi masu) in the response sentence (before conversion) so that the response sentence (post-conversion candidate) is in Shichigocho.
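Because the FIG. 8 entries accept any pre-notation and post-notation, a meaning-invariant word could in principle be tried at every morpheme boundary, with a later step choosing the candidate that fits the meter. The sketch below only enumerates those positions; the function name `insertion_candidates` and the segmentation of the sentence are assumptions.

```python
def insertion_candidates(morphemes, word):
    """Insert `word` before each morpheme in turn and return the resulting sentences."""
    return [
        "".join(morphemes[:i] + [word] + morphemes[i:])
        for i in range(len(morphemes))
    ]


# Hypothetical segmentation of the example sentence.
sentence = ["川崎", "の", "地図", "を", "映し", "ます", "。"]

for candidate in insertion_candidates(sentence, "やっぱり"):
    print(candidate)
```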

（Ｅ）第５候補生成処理 (E) Fifth candidate generation process

In the fifth candidate generation process, a response sentence (post-conversion candidate) in Shichigocho is generated by repeating a word contained in the response sentence (before conversion) (S113E in FIG. 4).

以下に、繰り返し追加の七五調候補生成時の変換前と変換後候補の応答文の例を示す。 The following is an example of the response sentences of the pre-conversion and post-conversion candidates when the 75-tone candidates are repeatedly added.

Example of Shichigocho candidate generation by repetition addition

Response sentence (before conversion)
（日）：明後日の天気予報は晴れです。
（ロ）：asatte no tenkiyohou ha hare desu
（英）：The weather forecast for the day after tomorrow is sunny.

Response sentence (post-conversion candidate)
（日）：明後日の天気予報は晴れ晴れ晴れ。
（ロ）：asatte no tenkiyohou ha hare hare hare

In this example of Shichigocho candidate generation by repetition addition, the word "晴れ" (hare) in the response sentence (before conversion) is repeated three times so that the response sentence (post-conversion candidate) is in Shichigocho. In this example, however, the ending "です" (desu) of the response sentence (before conversion) is removed.

（Ｆ）第６候補生成処理 (F) 6th candidate generation process

In the sixth candidate generation process, a response sentence (post-conversion candidate) in Shichigocho is generated by adding an onomatopoeia to the response sentence (before conversion) using the onomatopoeia list 213 (S113F in FIG. 4).

FIG. 9 is a diagram showing an example of the onomatopoeia list 213.

In the example of FIG. 9, the main notation "じゃんじゃん" (zyanzyan) is an adverb; an arbitrary character string of any part of speech is specified as its pre-notation, and an arbitrary character string whose part of speech is a verb is specified as its post-notation. Likewise, the main notation "ぎんぎん" (gingin) is an adjectival verb; an arbitrary character string of any part of speech is specified as its pre-notation, and an arbitrary character string whose part of speech is a verb is specified as its post-notation.

以下に、オノマトペ追加の七五調候補生成時の変換前と変換後候補の応答文の例を示す。 The following is an example of the response sentences of the pre-conversion and post-conversion candidates when the 75-tone candidate is added to the onomatopoeia.

First example of Shichigocho candidate generation by onomatopoeia addition

Response sentence (before conversion)
（日）：メールがきています。どうしますか？
（ロ）：me-ru ga kite imasu dou simasuka
（英）：E-mail is coming. What will you do?

Response sentence (post-conversion candidate)
（日）：メールがねじゃんじゃんきています。どうします。
（ロ）：me-ru gane zyanzyan kite imasu dou simasu

In this first example of Shichigocho candidate generation by onomatopoeia addition, the onomatopoeia "じゃんじゃん" (zyanzyan) is added before "きています" (kite imasu) in the response sentence (before conversion) so that the response sentence (post-conversion candidate) is in Shichigocho. In this first example, the ending "ね" (ne, sentence-final particle) is also added after "メールが" (me-ru ga), and the particle "か" (ka, sentence-final particle) is removed.

Here, since "じゃんじゃん" (zyanzyan) is 4 morae, the result is 5 sounds, 8 sounds, and 5 sounds, but this is regarded as within the permissible range. The concept expressed by the preceding and following morphemes may also be used as a condition. For example, in this first example, there is a morpheme expressing quantity before or after "じゃんじゃん".
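The idea of a permissible range can be sketched as a per-phrase tolerance on mora counts. The target pattern (5, 7, 5) and the tolerance of one mora are assumptions chosen so that the 5-8-5 case mentioned above is accepted; the specification does not state the actual limits, and `within_meter` is a hypothetical name.

```python
def within_meter(phrase_morae, target=(5, 7, 5), tolerance=1):
    """Check that each phrase is within `tolerance` morae of its target length."""
    if len(phrase_morae) != len(target):
        return False
    return all(abs(n - t) <= tolerance for n, t in zip(phrase_morae, target))


print(within_meter((5, 8, 5)))   # one phrase is one mora over the assumed target
print(within_meter((5, 10, 5)))  # too far from the assumed target
```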

Second example of Shichigocho candidate generation by onomatopoeia addition

Response sentence (before conversion)
（日）：２分後にタイマーセットしました。
（ロ）：ni hungo ni taima setto simasita
（英）：Timer set after 2 minutes.

Response sentence (post-conversion candidate)
（日）：２分後にタイマーセットチクタック。
（ロ）：ni hungo ni taima setto tikutakku

In this second example of Shichigocho candidate generation by onomatopoeia addition, the onomatopoeia "チクタック" (tikutakku) is added after "タイマーセット" (taima setto) in the response sentence (before conversion) so that the response sentence (post-conversion candidate) is in Shichigocho. In this second example, however, unnecessary part removal is applied to "しました" (simasita) in the response sentence (before conversion).

Third example of Shichigocho candidate generation by onomatopoeia addition

Response sentence (before conversion)
（日）：明後日の天気予報は晴れです。
（ロ）：asatte no tenkiyohou ha hare desu
（英）：The weather forecast for the day after tomorrow is sunny.

Response sentence (post-conversion candidate)
（日）：明後日の天気予報は晴れれれれ
（ロ）：asatte no tenkiyohou ha hare re re re

In this third example of Shichigocho candidate generation by onomatopoeia addition, the response sentence (before conversion) cannot otherwise be made into Shichigocho, so the "れ" (re) contained in the word "晴れ" (hare) is repeated three times so that the response sentence (post-conversion candidate) is in Shichigocho. In this third example, however, unnecessary part removal is applied to "です" (desu) in the response sentence (before conversion). If Shichigocho still cannot be achieved, a word such as "ダダダ" (dadada) may be added as filler.
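The padding fallback in this third example can be sketched as below: the last kana of a phrase reading is repeated, or a filler kana is appended, until the phrase reaches a target mora count. The target of 5 for the final phrase is an assumption consistent with 「はれれれれ」; `pad_to_target` and `count_morae` are hypothetical names, and the sketch assumes the reading ends in a full-size kana and the filler is a single mora.

```python
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(reading):
    """Count morae in a kana reading (small kana do not add a mora)."""
    return sum(1 for ch in reading if ch not in SMALL_KANA)


def pad_to_target(reading, target, filler=None):
    """Pad `reading` up to `target` morae.

    With filler=None the last kana is repeated; otherwise `filler` is appended.
    A reading that is already long enough is returned unchanged.
    """
    shortfall = target - count_morae(reading)
    if shortfall <= 0:
        return reading
    unit = filler if filler is not None else reading[-1]
    return reading + unit * shortfall


print(pad_to_target("はれ", 5))               # repeats the final kana
print(pad_to_target("はれ", 5, filler="ダ"))  # appends a filler instead
```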

（Ｇ）第７候補生成処理 (G) 7th candidate generation process

In the seventh candidate generation process, a response sentence (post-conversion candidate) in Shichigocho is generated by replacing a word contained in the response sentence (before conversion) with a synonym using the synonym dictionary 214 (S113G in FIG. 4).

図１０は、同義語辞書２１４の例を示す図である。 FIG. 10 is a diagram showing an example of the synonym dictionary 214.

In FIG. 10, "notation 1" and "notation 2" are words having the same meaning; the part of speech of "notation 1" is shown as "part of speech 1", and that of "notation 2" as "part of speech 2".

In the example of FIG. 10, the noun "ラーメン" (ra-men, ramen) and the noun "中華そば" (tyuuka soba, Chinese noodles) are designated as synonyms. Likewise, the adjective "うれしい" (uresii, glad) and the noun "ハッピー" (happi-, happy) are designated as synonyms.

このとき、第１形態素を、第２形態素に置き換えることができる。またその逆に、第２形態素を、第１形態素に置き換えるようにしてもよい。 At this time, the first morpheme can be replaced with the second morpheme. On the contrary, the second morpheme may be replaced with the first morpheme.
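A synonym dictionary of this shape, with replacement allowed in both directions, might be represented as in the sketch below. The two rows are the entries mentioned for FIG. 10; the names `SYNONYM_ROWS`, `build_index`, and `synonyms` are hypothetical and not taken from the patent.

```python
from collections import defaultdict

# Rows of (notation 1, part of speech 1, notation 2, part of speech 2)
SYNONYM_ROWS = [
    ("ラーメン", "noun", "中華そば", "noun"),
    ("うれしい", "adjective", "ハッピー", "noun"),
]


def build_index(rows):
    """Index every row in both directions so either notation can be replaced."""
    index = defaultdict(list)
    for notation1, pos1, notation2, pos2 in rows:
        index[notation1].append((notation2, pos2))
        index[notation2].append((notation1, pos1))
    return index


def synonyms(index, word):
    """Return the notations registered as synonyms of `word` (may be empty)."""
    return [notation for notation, _pos in index.get(word, [])]


index = build_index(SYNONYM_ROWS)
print(synonyms(index, "中華そば"))
print(synonyms(index, "ラーメン"))
```

Keeping the part of speech next to each notation would let a caller reject substitutions that do not fit the sentence grammatically, although that check is not shown here.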

The following are examples of a response sentence before conversion and its post-conversion candidate when a seven-five meter candidate is generated by synonym substitution.

First example of seven-five meter candidate generation by synonym substitution

Response sentence (before conversion)
(Japanese): 今晩は中華そばがおすすめです。
(Romaji): konban ha tyuuka soba ga osusume desu
(English): Chinese noodles are recommended tonight.

Response sentence (post-conversion candidate)
(Japanese): 今晩はラーメンがおすすめよ。
(Romaji): konban ha ra-men ga osusume yo

In this first example of seven-five meter candidate generation by synonym substitution, the word "中華そば" (chuka soba) in the response sentence (before conversion) is replaced with its synonym "ラーメン" (ramen) so that the response sentence (post-conversion candidate) fits seven-five meter. Note that, in this first example, unnecessary-part removal is also applied to "です" (desu) in the response sentence (before conversion), and the ending "よ" (yo, a sentence-final particle) is added.

Second example of seven-five meter candidate generation by synonym substitution

Response sentence (before conversion)
(Japanese): 明後日は買い物の予定です。
(Romaji): asatte ha kaimono no yotei desu
(English): I will go shopping the day after tomorrow.

Response sentence (post-conversion candidate)
(Japanese): 明後日はショッピングの予定です。
(Romaji): asatte ha syoppingu no yotei desu

In this second example of seven-five meter candidate generation by synonym substitution, the word "買い物" (kaimono, "shopping") in the response sentence (before conversion) is replaced with its synonym "ショッピング" (shoppingu) so that the response sentence (post-conversion candidate) fits seven-five meter.
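To illustrate why a synonym can change the rhythm of a sentence, the sketch below counts sounds in a kana reading. It rests on assumptions that are not stated in the patent: each "sound" is taken to be one mora, small kana such as ゃ/ゅ/ょ are treated as merging with the preceding character, and っ, ん, and the long-vowel mark ー each count as one. The function name `count_morae` is hypothetical, and the kana readings are transcribed from the romaji in the examples above.

```python
# Small kana that combine with the preceding character into a single mora
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana):
    """Count morae in a hiragana/katakana string (whitespace is ignored)."""
    return sum(
        1 for ch in kana
        if ch not in SMALL_KANA and not ch.isspace()
    )


for reading in ["かいもの", "ショッピング", "ちゅうかそば", "ラーメン"]:
    print(reading, count_morae(reading))
```

Under this counting rule, replacing "かいもの" with "ショッピング" lengthens the phrase by one sound, which is the kind of adjustment synonym substitution makes available.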

(H) Combination of the first to seventh candidate generation processes

Each of the first to seventh candidate generation processes described above can of course be used alone as a candidate generation pattern. In addition, a candidate generation pattern that combines them may be used to generate a response sentence (post-conversion candidate) in seven-five meter.

Note that seven-five meter candidate generation by (A) particle omission and (B) unnecessary-part removal can be classified as a first case, in which words included in the response sentence (before conversion) are removed. (C) ending addition, (D) meaning-invariant word addition, (E) repetition addition, and (F) onomatopoeia addition can be classified as a second case, in which words are added to the response sentence (before conversion). Further, (G) synonym substitution can be classified as a third case, in which a word included in the response sentence (before conversion) is replaced.
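The three-way classification above can be written down as a small lookup table. This is only an illustrative encoding: the enum `Case`, the table `PROCESS_CASE`, and the helper `processes_in` are hypothetical names.

```python
from enum import Enum


class Case(Enum):
    REMOVE = 1   # first case: words are removed
    ADD = 2      # second case: words are added
    REPLACE = 3  # third case: a word is replaced


# Candidate generation processes (A) to (G) and the case each belongs to
PROCESS_CASE = {
    "A": ("particle omission", Case.REMOVE),
    "B": ("unnecessary-part removal", Case.REMOVE),
    "C": ("ending addition", Case.ADD),
    "D": ("meaning-invariant word addition", Case.ADD),
    "E": ("repetition addition", Case.ADD),
    "F": ("onomatopoeia addition", Case.ADD),
    "G": ("synonym substitution", Case.REPLACE),
}


def processes_in(case):
    """Return the labels of all processes classified under `case`."""
    return [label for label, (_name, c) in PROCESS_CASE.items() if c is case]


print(processes_in(Case.ADD))
```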

The following shows, as an example of combining any of the first to seventh candidate generation processes, a response sentence before conversion and its post-conversion candidate when a seven-five meter candidate is generated by combining ending addition with synonym substitution.

Example of seven-five meter candidate generation by ending addition and synonym substitution

Response sentence (before conversion)
(Japanese): 今日の占い結果はとてもいいです。
(Romaji): kyou no uranai kekka ha totemo ii desu
(English): Today's fortune-telling results are very good.

Response sentence (post-conversion candidate)
(Japanese): 今日のね占い結果はグッドです。
(Romaji): kyou no ne uranai kekka ha guddo desu

In this example of seven-five meter candidate generation by ending addition and synonym substitution, the ending "ね" (ne) is added after "今日の" (kyou no, "today's") in the response sentence (before conversion), and the phrase "とてもいい" (totemo ii, "very good") is replaced with its synonym "グッド" (guddo, "good"), so that the response sentence (post-conversion candidate) fits seven-five meter.
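The combination in this example can be sketched as a small pipeline that applies the two operations in sequence to the romanized tokens. The helper names (`add_ending_after`, `replace_phrase`, `apply_all`) are hypothetical, and working on whitespace-separated romaji rather than on morphemes is a simplification made for illustration.

```python
def add_ending_after(tokens, anchor, ending):
    """Insert `ending` right after the first occurrence of the token run `anchor`."""
    n = len(anchor)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == anchor:
            return tokens[:i + n] + [ending] + tokens[i + n:]
    return list(tokens)  # anchor not found: leave the sentence unchanged


def replace_phrase(tokens, old, new):
    """Replace the first occurrence of the token run `old` with `new`."""
    n = len(old)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == old:
            return tokens[:i] + new + tokens[i + n:]
    return list(tokens)


def apply_all(tokens, steps):
    """Apply each candidate generation step in order."""
    for step in steps:
        tokens = step(tokens)
    return tokens


before = "kyou no uranai kekka ha totemo ii desu".split()
steps = [
    lambda t: add_ending_after(t, ["kyou", "no"], "ne"),     # ending addition
    lambda t: replace_phrase(t, ["totemo", "ii"], ["guddo"]),  # synonym substitution
]
after = apply_all(before, steps)
print(" ".join(after))  # kyou no ne uranai kekka ha guddo desu
```

Representing each process as a function from tokens to tokens is one way to let any subset of the processes be chained into a combined candidate generation pattern.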

(I) Examples in languages other than Japanese

The above description covers examples of converting Japanese response sentences into seven-five meter, but the same conversion can be performed for languages other than Japanese. Here, English is taken as an example of another language, and cases in which an English response sentence is converted into seven-five meter are shown.

The following are examples of English response sentences before conversion and their post-conversion candidates when seven-five meter candidates are generated.

First example of seven-five meter candidate generation for an English response sentence

Response sentence (before conversion)
(English): Today's weather in Tokyo is rainy.

Response sentence (post-conversion candidate)
(English): In Tokyo today's weather is rainy.

In this first example of seven-five meter candidate generation for an English response sentence, the order of "today's weather" and "in Tokyo" is swapped between the response sentence (before conversion) and the response sentence (post-conversion candidate) so that the response sentence fits seven-five meter.

Second example of seven-five meter candidate generation for an English response sentence

Response sentence (before conversion)
(English): These are maps you want.

Response sentence (post-conversion candidate)
(English): These are maps which you want.

In this second example of seven-five meter candidate generation for an English response sentence, "which" is inserted after "maps" in the response sentence (before conversion) so that the response sentence fits seven-five meter.

Third example of seven-five meter candidate generation for an English response sentence

Response sentence (before conversion)
(English): This is recommendation for you.

Response sentence (post-conversion candidate)
(English): This is a recommended restaurant.

In this third example of seven-five meter candidate generation for an English response sentence, "for you" in the response sentence (before conversion) is deleted and "a" and "restaurant" are inserted so that the response sentence fits seven-five meter.

Fourth example of seven-five meter candidate generation for an English response sentence

Response sentence (before conversion)
(English): You got a mail.

Response sentence (post-conversion candidate)
(English): Just you got a mail you're happy?

In this fourth example of seven-five meter candidate generation for an English response sentence, "Just" and "you're happy?" are inserted into the response sentence (before conversion) so that the response sentence fits seven-five meter.

Note that seven-five meter also exists in languages other than Japanese, such as English as described above, but in some languages it is less familiar than it is in Japanese. When processing such a language, measures may therefore be taken such as setting the threshold used in the threshold processing based on user feedback information higher than the threshold used for Japanese.
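A language-dependent threshold of this kind could look like the sketch below. The numeric values, the language codes, and the rule that conversion is applied when the feedback score reaches the threshold are all assumptions made for illustration; the passage only states that the threshold may be set higher for languages in which the meter is less familiar.

```python
# Hypothetical thresholds: the non-Japanese value is deliberately higher
DEFAULT_THRESHOLD = 0.5
LANGUAGE_THRESHOLDS = {
    "ja": 0.5,
    "en": 0.7,
}


def should_convert(feedback_score, language):
    """Apply seven-five meter conversion only if the score clears the threshold."""
    threshold = LANGUAGE_THRESHOLDS.get(language, DEFAULT_THRESHOLD)
    return feedback_score >= threshold


print(should_convert(0.6, "ja"))  # True
print(should_convert(0.6, "en"))  # False
```

With these example values, the same feedback score triggers conversion for Japanese but not for English, so English responses are converted only when past reactions were clearly favorable.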

The first embodiment has been described above.

In the first embodiment, the response (text information) to the user's utterance is converted into seven-five meter and output, so that the seven-five meter constraint allows more human-like utterances (for example, utterances that are pleasant to hear).

That is, products and services that carry out spoken dialogue have become widespread in recent years, but in many of them the responses are simple, such as "明日の横浜の天気は晴れです。" ("Tomorrow's weather in Yokohama is sunny."), and clearly differ from the dialogue that people have with one another. Some users can then be expected to find the dialogue uninteresting, unenjoyable, or hard to remember, or to feel no desire to keep using the system, and so there has been a demand for more human-like utterances.

In the first embodiment, therefore, the response to the user's utterance is returned in seven-five meter, which gives the system a human touch and lets the user enjoy the dialogue with it. For example, the above "明日の横浜の天気は晴れです。" is turned into "横浜の明日の天気は晴れなのさ。" so that it follows a pattern of 5 sounds, 7 sounds, and 5 sounds, which makes the dialogue more human-like.

Further, in the first embodiment, using user feedback information and context information allows the conversion into seven-five meter to be performed more appropriately when the seven-five meter constraint is applied.

That is, some users may find responses in seven-five meter annoying or bothersome and feel uncomfortable. For this reason, the system collects how the user reacted (for example, well or badly) to its utterances in past situations similar to the current one, and preferentially selects the seven-five meter conversion process to which the user reacted well. This makes it possible, for example, to give seven-five meter responses that suit the time, place, and occasion (TPO).
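The feedback-driven selection described here can be sketched as a score table keyed by context and conversion pattern. This is a simplified illustration under stated assumptions: "similar situation" is reduced to an exact match on a context tuple, each reaction changes the score by a fixed amount, and the class `FeedbackStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class FeedbackStore:
    """Accumulates user reactions per (context, conversion pattern)."""

    def __init__(self):
        self._scores = defaultdict(float)

    def record(self, context, pattern, good):
        """Raise the score after a good reaction, lower it after a bad one."""
        self._scores[(context, pattern)] += 1.0 if good else -1.0

    def best_pattern(self, context, patterns):
        """Pick the pattern with the highest score in this context.

        Patterns with no history score 0.0; ties go to the earliest pattern.
        """
        return max(patterns, key=lambda p: self._scores.get((context, p), 0.0))


store = FeedbackStore()
evening_home = ("evening", "home")  # hypothetical context: (time, place)
store.record(evening_home, "G", good=True)
store.record(evening_home, "C", good=False)
print(store.best_pattern(evening_home, ["A", "C", "G"]))
```

A fuller version would compare contexts by similarity rather than equality, which is where the time and position information mentioned for the context could come in.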

Note that, with the present technology, the agent of the dialogue system makes more human-like utterances under the seven-five meter constraint, which can give the agent a distinctive character.

Further, even when the accuracy of speech synthesis (TTS) by the speech synthesis unit 157 is low (for example, the intonation is unstable), applying the seven-five meter constraint can mask that low accuracy. Moreover, even when the utterance intention understanding unit 152 cannot analyze the user's intention, returning a response in seven-five meter can make the user more likely to tolerate the fact that the system did not understand the intention.

The seven-five meter candidate generation processes described above are examples, and any process may be adopted as long as it can convert the response sentence (before conversion) into seven-five meter. For example, when the user says "クラシックの曲をかけて。" ("Play some classical music."), the system would normally respond with something like "クラシックの曲は入っていません。" ("There is no classical music stored."), but here a case frame dictionary may be used to respond with something like "かけてもいいのマヨネーズ。" (roughly, "Is it all right to put on the mayonnaise?").

That is, in addition to the sense of "playing music", the word "かける" (kakeru) also has the sense of "putting on a seasoning" such as mayonnaise, and the response deliberately uses "かける" in the other sense so that the dialogue feels human. A case frame organizes predicates and the nouns related to them by each usage of the predicate. Further, when the response sentence (before conversion) is converted into seven-five meter, a colloquial tone may be given priority overall.
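One way to picture a case frame dictionary lookup for this kind of wordplay is sketched below. The data layout, the usage labels, and the function `other_sense_nouns` are hypothetical; only the two senses of "かける" and the nouns "曲", "音楽", and "マヨネーズ" come from the passage.

```python
# Hypothetical case frame dictionary: predicate -> usage -> related nouns
CASE_FRAMES = {
    "かける": {
        "play (music)": ["曲", "音楽"],
        "put on (seasoning)": ["マヨネーズ"],
    },
}


def other_sense_nouns(predicate, intended_usage):
    """Collect nouns tied to usages of `predicate` other than the intended one."""
    frames = CASE_FRAMES.get(predicate, {})
    return [
        noun
        for usage, nouns in frames.items()
        if usage != intended_usage
        for noun in nouns
    ]


print(other_sense_nouns("かける", "play (music)"))
```

The nouns returned for the unintended sense are the raw material for a deliberately off-target reply such as the mayonnaise response.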

<2. Second Embodiment>

In the first embodiment described above, in dialogue with the user, that is, when the content of the user's utterance is analyzed and a response to it is output, the seven-five meter constraint is used to make the response more human-like.

The present technology is not limited to such a dialogue system and can be applied to various speech scenes, such as a scene in which a character such as an avatar reads out a script (text) for news or a weather forecast. A case where the present technology is applied to a news program production system is therefore described below as a second embodiment.

(Example of news program production system)
FIG. 11 shows an example of a news program production system to which the present technology is applied.

In FIG. 11, the information processing device 10 is configured as part of the news program production system. For example, on a large screen installed in a city street or in front of a station, a female character reads out the text information of news and weather forecasts by speech synthesis.

Here, for example, at a timing such as between news items or at the end of the program, the female character reads out text information that has been converted into seven-five meter by applying the present technology.

By making utterances more human-like under the seven-five meter constraint in this way, the character can, for example, be presented as goofy (boke), giving viewers the impression that she has an unexpected side. This can increase the number of users who take an interest in the female character and, as a result, increase the audience of the news program.

As for user feedback information when the present technology is applied to the news program production system, a captured image taken by the camera 111 provided in the information processing device 10 can be analyzed, for example, and a score for each candidate generation pattern can be calculated from, for example, the facial expressions of passers-by watching the large screen installed in the street.

Here too, by using the above-mentioned service that provides an API for face and image recognition and sending it a captured image containing the faces of many passers-by, information on the passers-by's expressions and emotions, such as joy or surprise, can be obtained.

For example, when it is recognized that many passers-by look amused, the score is increased, whereas when it is recognized that many passers-by look bored, the score is decreased.
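The score adjustment described above can be sketched as follows. The expression labels ("happy", "bored"), the interpretation of "many" as more than half of the recognized faces, and the step size are assumptions made for illustration, and `update_score` is a hypothetical name.

```python
from collections import Counter


def update_score(score, expressions, step=1.0):
    """Adjust a candidate generation pattern's score from recognized expressions.

    `expressions` is a list of labels, one per passer-by, assumed to come
    from an external face recognition service.
    """
    if not expressions:
        return score
    counts = Counter(expressions)
    half = len(expressions) / 2
    if counts["happy"] > half:
        return score + step
    if counts["bored"] > half:
        return score - step
    return score  # no clear majority: leave the score unchanged


print(update_score(0.0, ["happy", "happy", "bored"]))
print(update_score(0.0, ["bored", "bored", "neutral"]))
```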

Note that, when users are watching the news program at home on a television receiver, a smartphone, or the like, they may operate a remote controller, an application running on the smartphone, or the like to vote (for example, good or bad) on the female character's seven-five meter utterances.

Here too, in addition to time information and position information, information obtained from the analysis result of a captured image taken by the camera 111 provided in the information processing device 10 can be used as context information, for example.

Note that the news program production system to which the present technology is applied does not need to recognize user utterances or generate responses. Of the functions of the information processing unit 150 shown in FIG. 2 (the speech recognition unit 151 to the user feedback collection unit 158), it therefore only needs to include, for example, the functions corresponding to the context acquisition unit 155, the seven-five meter conversion unit 156, the speech synthesis unit 157, and the user feedback collection unit 158.

Although the above description uses speech by a female character as an example, the output is not limited to speech: the content of the news and weather forecasts and the content of the seven-five meter utterances may be displayed as text information on a screen such as a large screen or a display.

The second embodiment has been described above.

In the second embodiment, text information intended to be read aloud by speech synthesis (for example, utterances between news items) is converted into seven-five meter and output, so that the seven-five meter constraint allows more human-like utterances (for example, utterances that create a sense of familiarity). In the second embodiment as well, using user feedback information and context information allows the conversion into seven-five meter to be performed more appropriately when the seven-five meter constraint is applied.

<3. Third Embodiment>

As a configuration other than a dialogue system, the second embodiment above describes a news program production system. The present technology can also be applied to other configurations, such as a scene in which commercials are played on digital signage. A case where the present technology is applied to a digital signage system is therefore described below as a third embodiment.

(Example of digital signage system)
FIG. 12 shows an example of a digital signage system to which the present technology is applied.

In FIG. 12, the information processing device 10 is configured as part of the digital signage system and is installed indoors, for example in a station or a commercial facility, where it displays information such as advertisements and guidance.

Here, for example, at a timing such as between one commercial and the next, text information converted into seven-five meter by applying the present technology is read out. More specifically, as shown in FIG. 12, between a car commercial played at a certain time and a detergent commercial played after it, the system outputs the audio of a seven-five meter utterance, for example one with goofy (boke) content, that is likely to catch the interest of passers-by walking near the digital signage.

By making utterances more human-like under the seven-five meter constraint in this way, the product being advertised can, for example, be impressed on passers-by so that they take an interest in it.

Although speech is used as the example here, the output is not limited to speech, and the text information converted into seven-five meter may be displayed on the screen of the digital signage.

As for user feedback information when the present technology is applied to the digital signage system, a captured image taken by the camera 111 provided in the information processing device 10 can be analyzed, for example, and a score for each candidate generation pattern can be calculated from, for example, the facial expressions of passers-by watching the digital signage installed indoors in a station or the like.

Note that the digital signage system to which the present technology is applied likewise does not need to recognize user utterances or generate responses. Of the functions of the information processing unit 150 shown in FIG. 2 (the speech recognition unit 151 to the user feedback collection unit 158), it therefore only needs to include, for example, the functions corresponding to the context acquisition unit 155, the seven-five meter conversion unit 156, the speech synthesis unit 157, and the user feedback collection unit 158.

The third embodiment has been described above.

In the third embodiment, text information intended to be read aloud by speech synthesis (for example, an utterance output between commercials) is converted into seven-five meter and output, so that the seven-five meter constraint allows more human-like utterances (for example, utterances that arouse interest in a product). In the third embodiment as well, using user feedback information and context information allows the conversion into seven-five meter to be performed more appropriately when the seven-five meter constraint is applied.

<4. Modifications>

(Configuration example of dialogue system)
The above description illustrates the case where the dialogue service is realized by the information processing device 10 (its information processing unit 150) executing the dialogue processing. As a configuration for realizing such a dialogue service, the configuration shown in FIG. 13 can be adopted, for example.

In FIG. 13, the dialogue system 1 includes the information processing device 10, which is installed on the local side, such as in a user's home, and functions as the user interface of the dialogue service, and the server 20, which is installed on the cloud side, such as in a data center, and performs processing for realizing the dialogue function of the dialogue service.

In the dialogue system 1, the information processing device 10 and the server 20 are connected to each other via the Internet 30.

The information processing device 10 is, for example, a speaker that can be connected to a network such as a home LAN, and is also called a smart speaker or a home agent. In addition to playing music, this type of speaker can, for example, hold spoken dialogue with the user and perform voice operation of equipment such as lighting fixtures and air conditioners.

By cooperating with the server 20 via the Internet 30, the information processing device 10 can provide the user with (the user interface of) the dialogue service.

That is, the information processing device 10 picks up the speech uttered by the user (user utterance) and transmits the speech data to the server 20 via the Internet 30. The information processing device 10 also receives processing data transmitted from the server 20 via the Internet 30 and outputs speech corresponding to the processing data.

The server 20 is a server that provides a cloud-based dialogue service. The server 20 performs speech recognition processing to convert the speech data transmitted from the information processing device 10 via the Internet 30 into text information. The server 20 also performs processing such as dialogue processing according to the user's intention on the text information, and transmits the processing data obtained as a result to the information processing device 10 via the Internet 30.

For the configuration with a local side and a cloud side shown in FIG. 13, the dialogue system 1 is described as a system that generates responses to user utterances, but it may instead be configured as a system that reads out text information by speech synthesis, such as the news program production system or the digital signage system described above.

In the above description, the functions of the information processing unit 150 in FIG. 2 (the speech recognition unit 151 to the user feedback collection unit 158) are described as being incorporated in the information processing device 10, but they may instead be incorporated as functions of the server 20. That is, each of the functions of the information processing unit 150 in FIG. 2 (the speech recognition unit 151 to the user feedback collection unit 158) may be incorporated in either the information processing device 10 or the server 20.

For example, of the functions of the information processing unit 150 in FIG. 2, the speech recognition unit 151 to the response generation unit 154 can be incorporated in the server 20 on the cloud side, and the context acquisition unit 155 to the user feedback collection unit 158 can be incorporated in the information processing device 10 on the local side.
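The example split between the cloud side and the local side can be expressed as a simple placement table. The sketch below only encodes the reference numerals 151 to 158 mentioned in the text; the function `split_placement` and its parameters are hypothetical.

```python
SERVER = "server 20"
DEVICE = "information processing device 10"


def split_placement(first_local_unit=155, units=range(151, 159)):
    """Place units numbered below `first_local_unit` on the server, the rest locally."""
    return {
        unit: (DEVICE if unit >= first_local_unit else SERVER)
        for unit in units
    }


placement = split_placement()
for unit, host in placement.items():
    print(unit, "->", host)
```

Changing `first_local_unit` moves the boundary, which reflects the point that each function may sit on either machine.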

なお、いずれの構成を採用した場合でも、ユーザフィードバック情報DB２０２や、語尾リスト２１１、意味不変語リスト２１２、オノマトペリスト２１３、同義語辞書２１４などのデータベースは、インターネット３０上のサーバ２０が管理することができる。 Regardless of which configuration is adopted, the database such as the user feedback information DB 202, the ending list 211, the meaning-invariant word list 212, the onomatopoeia 213, and the synonym dictionary 214 shall be managed by the server 20 on the Internet 30. Can be done.

以上のように、本技術は、１つの機能を、ネットワークを介して複数の装置で分担、共同して処理するクラウドコンピューティングの構成をとることができる。 As described above, the present technology can have a cloud computing configuration in which one function is shared by a plurality of devices via a network and jointly processed.

（コンピュータの構成）
上述した一連の処理（例えば、図３乃至図４に示した七五調変換出力処理）は、ハードウェアにより実行することもできるし、ソフトウェアにより実行することもできる。図１及び図２の構成に示したように、一連の処理をソフトウェアにより実行する場合には、そのソフトウェアを構成するプログラムが、情報処理装置１０（コンピュータ）にインストールされる。(Computer configuration)
The series of processes described above (for example, the 75-tone conversion output process shown in FIGS. 3 to 4) can be executed by hardware or software. As shown in the configurations of FIGS. 1 and 2, when a series of processes are executed by software, a program constituting the software is installed in the information processing apparatus 10 (computer).

図１の情報処理装置１０（CPU１０１）が実行するプログラムは、例えば、パッケージメディア等としてのリムーバブル記録媒体に記録して提供することができる。なお、リムーバブル記録媒体は、例えば、磁気ディスク、光ディスク、光磁気ディスク、又は半導体メモリ等から構成される。 The program executed by the information processing device 10 (CPU101) of FIG. 1 can be recorded and provided on a removable recording medium such as a package medium. The removable recording medium is composed of, for example, a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like.

The program can also be provided via a wired or wireless transmission medium, such as a local area network or digital satellite broadcasting, in addition to the Internet 30.

In the information processing device 10 of FIG. 1, the program can be installed in a recording device such as the hard disk 105 via the information access unit 104 by mounting the removable recording medium in a drive (not shown).

The program can also be received by the communication I/F 116 via a wired or wireless transmission medium and installed in a recording device such as the hard disk 105. Alternatively, the program can be installed in advance in the ROM 102, the recording device, or the like.

Here, in this specification, the processes that the information processing device 10 (CPU 101) of FIG. 1 performs according to the program do not necessarily have to be performed in time series in the order described in the flowcharts of FIGS. 3 to 4. That is, the processes that the information processing device 10 (CPU 101) of FIG. 1 performs according to the program also include processes executed in parallel or individually (for example, parallel processing or object-based processing).

The program may be processed by a single computer (processor), or may be processed in a distributed manner by a plurality of computers. That is, each step of the seven-five meter conversion output process shown in FIGS. 3 to 4 can be executed by one device or shared among a plurality of devices. Further, when one step includes a plurality of processes, the plurality of processes included in that step can be executed by one device or shared among a plurality of devices.

The embodiments of the present technology are not limited to the embodiment described above, and various changes can be made without departing from the gist of the present technology.

In addition, the present technology can take the following configurations.

(1)
An information processing device including a processing unit that converts input text information into seven-five meter (shichigo-cho) and outputs it.
(2)
The information processing device according to (1) above, wherein the processing unit converts the text information into seven-five meter based on user feedback information obtained from feedback from a user.
(3)
The information processing device according to (1) or (2) above, wherein the processing unit converts the text information into seven-five meter based on context information for the text information.
(4)
The information processing device according to (3) above, wherein the processing unit refers to the user feedback information to select a seven-five meter conversion process suited to the context information, and executes the selected seven-five meter conversion process on the text information.
(5)
The information processing device according to (4) above, wherein
the user feedback information includes, for each candidate generation pattern that generates seven-five meter candidates, information in which past environment information and the user's reaction are converted into score values,
the context information includes current environment information, and
the processing unit selects a candidate generation pattern for which the score of past environment information that is the same as or similar to the current environment information, among the past environment information, is equal to or higher than a threshold value.
(6)
The information processing device according to (5) above, wherein the processing unit randomly selects one candidate generation pattern when a plurality of candidate generation patterns can be selected.
(7)
The information processing device according to (5) or (6) above, wherein the candidate generation pattern includes one seven-five meter candidate generation, or a combination of a plurality of seven-five meter candidate generations, from among:
particle removal, which generates candidates that fall into seven-five meter by removing particles;
unnecessary part removal, which generates candidates that fall into seven-five meter by removing semantically unnecessary parts;
ending addition, which generates candidates that fall into seven-five meter by adding a sentence ending;
meaning-invariant word addition, which generates candidates that fall into seven-five meter by adding words that do not change the meaning;
repetition addition, which generates candidates that fall into seven-five meter by repeating a word;
onomatopoeia addition, which generates candidates that fall into seven-five meter by adding an onomatopoeia; and
synonym addition, which generates candidates that fall into seven-five meter by replacing a word with a synonym.
(8)
The information processing device according to any one of (5) to (7) above, wherein the context information includes at least one of information indicating a time of day, a place, a speaker, people present, and the atmosphere of the place.
(9)
The information processing device according to any one of (1) to (8) above, wherein the text information is a response to a user's utterance.
(10)
The information processing device according to any one of (1) to (8) above, wherein the text information is information intended to be read aloud by voice synthesis.
(11)
An information processing method of an information processing device, in which the information processing device converts input text information into seven-five meter and outputs it.
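Configurations (5) and (6) above can be read as a filter followed by a random pick. The sketch below illustrates that reading under stated assumptions: the `FeedbackRecord` structure, the field-matching `similarity` measure, and both threshold values are hypothetical, since the configurations do not fix how similarity between environments is computed or what the threshold is.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    pattern: str       # candidate generation pattern name
    environment: dict  # past environment information
    score: float       # user's reaction converted into a score value


def similarity(past, current):
    """Assumed measure: fraction of the current environment's fields matched by the past one."""
    if not current:
        return 0.0
    matches = sum(1 for key, value in current.items() if past.get(key) == value)
    return matches / len(current)


def select_pattern(records, current_env, score_threshold=0.5,
                   similarity_threshold=0.5, rng=random):
    """Pick one pattern whose score in a same-or-similar past environment meets the threshold."""
    eligible = sorted({
        r.pattern for r in records
        if similarity(r.environment, current_env) >= similarity_threshold
        and r.score >= score_threshold
    })
    if not eligible:
        return None  # no pattern qualifies for this environment
    # Configuration (6): when several patterns qualify, choose one at random.
    return rng.choice(eligible)


# Hypothetical feedback data for illustration only.
records = [
    FeedbackRecord("particle removal", {"time": "morning", "place": "living room"}, 0.8),
    FeedbackRecord("ending addition", {"time": "morning", "place": "living room"}, 0.2),
    FeedbackRecord("onomatopoeia addition", {"time": "morning", "place": "kitchen"}, 0.9),
    FeedbackRecord("repetition addition", {"time": "night", "place": "bedroom"}, 0.9),
]
current = {"time": "morning", "place": "living room"}
chosen = select_pattern(records, current)
print(chosen)
```

With this data, "ending addition" is excluded by its low score and "repetition addition" by its dissimilar environment, so the random pick is made between the two remaining patterns.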

1 dialogue system, 10 information processing device, 20 server, 30 Internet, 100 control unit, 101 CPU, 102 ROM, 103 RAM, 150 information processing unit, 151 voice recognition unit, 152 utterance intention understanding unit, 153 application/service execution unit, 154 response generation unit, 155 context acquisition unit, 156 seven-five meter conversion unit, 157 voice synthesis unit, 158 user feedback collection unit, 201 context information DB, 202 user feedback information DB, 211 ending list, 212 meaning-invariant word list, 213 onomatopoeia list, 214 synonym list

Claims

An information processing device including a processing unit that converts input text information into seven-five meter (shichigo-cho) and outputs it.

The information processing device according to claim 1, wherein the processing unit converts the text information into seven-five meter based on user feedback information obtained from feedback from a user.

The information processing device according to claim 2, wherein the processing unit converts the text information into seven-five meter based on context information for the text information.

The information processing device according to claim 3, wherein the processing unit refers to the user feedback information to select a seven-five meter conversion process suited to the context information, and executes the selected seven-five meter conversion process on the text information.

The information processing device according to claim 4, wherein
the user feedback information includes, for each candidate generation pattern that generates seven-five meter candidates, information in which past environment information and the user's reaction are converted into score values,
the context information includes current environment information, and
the processing unit selects a candidate generation pattern for which the score of past environment information that is the same as or similar to the current environment information, among the past environment information, is equal to or higher than a threshold value.

The information processing device according to claim 5, wherein the processing unit randomly selects one candidate generation pattern when a plurality of candidate generation patterns can be selected.

The information processing device according to claim 5, wherein the candidate generation pattern includes one seven-five meter candidate generation, or a combination of a plurality of seven-five meter candidate generations, from among:
particle removal, which generates candidates that fall into seven-five meter by removing particles;
unnecessary part removal, which generates candidates that fall into seven-five meter by removing semantically unnecessary parts;
ending addition, which generates candidates that fall into seven-five meter by adding a sentence ending;
meaning-invariant word addition, which generates candidates that fall into seven-five meter by adding words that do not change the meaning;
repetition addition, which generates candidates that fall into seven-five meter by repeating a word;
onomatopoeia addition, which generates candidates that fall into seven-five meter by adding an onomatopoeia; and
synonym addition, which generates candidates that fall into seven-five meter by replacing a word with a synonym.

The information processing device according to claim 5, wherein the context information includes at least one of information indicating a time of day, a place, a speaker, people present, and the atmosphere of the place.

The information processing device according to claim 1, wherein the text information is a response to a user's utterance.

The information processing device according to claim 1, wherein the text information is information intended to be read aloud by voice synthesis.

An information processing method of an information processing device, in which the information processing device converts input text information into seven-five meter and outputs it.
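As an illustration of the candidate generation pattern that removes particles, the following sketch drops one particle at a time and keeps the results that fit seven-five meter. It is not the claimed implementation: the token format (a kana reading plus an `is_particle` flag, assumed to come from a morphological analyzer that is not shown), the function names, and the rule that the break between the 7-mora and 5-mora phrases must fall on a word boundary are all assumptions made for this example.

```python
# Small kana combine with the preceding kana and do not add a mora of their own.
# The sokuon (っ), the long vowel mark (ー), and ん each count as one mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def count_morae(kana):
    """Count morae in a pure-kana string."""
    return sum(1 for ch in kana if ch not in SMALL_KANA)


def is_seven_five(words):
    """True if the words total 12 morae and a word boundary falls exactly after 7 morae."""
    counts = [count_morae(w) for w in words]
    if sum(counts) != 12:
        return False
    running = 0
    for c in counts:
        running += c
        if running == 7:
            return True
        if running > 7:
            return False
    return False


def particle_removal_candidates(tokens):
    """Return word lists made by dropping exactly one particle that fit seven-five meter."""
    results = []
    for i, (_, is_particle) in enumerate(tokens):
        if not is_particle:
            continue
        remaining = [reading for j, (reading, _) in enumerate(tokens) if j != i]
        if is_seven_five(remaining):
            results.append(remaining)
    return results


# Hypothetical input: (kana reading, is_particle) pairs totalling 13 morae.
tokens = [
    ("わたし", False), ("は", True), ("まいにち", False),
    ("ほん", False), ("を", True), ("よむ", False),
]
candidates = particle_removal_candidates(tokens)
for words in candidates:
    print("".join(words))
```

In this example, dropping は leaves a 7-mora phrase followed by a 5-mora phrase, whereas dropping を leaves 12 morae with no word boundary after the seventh, so only the first result is kept.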