JPS59216242A - Voice recognizing response device - Google Patents

Voice recognizing response device

Info

Publication number
JPS59216242A
JPS59216242A JP58091809A JP9180983A
Authority
JP
Japan
Prior art keywords
voice
response
speed
pattern
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP58091809A
Other languages
Japanese (ja)
Other versions
JPH0721759B2 (en)
Inventor
Yoichi Takebayashi
洋一 竹林
Hidenori Shinoda
篠田 英範
Teruhiko Ukita
浮田 輝彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp filed Critical Toshiba Corp
Priority to JP58091809A priority Critical patent/JPH0721759B2/en
Publication of JPS59216242A publication Critical patent/JPS59216242A/en
Publication of JPH0721759B2 publication Critical patent/JPH0721759B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Abstract

PURPOSE: To make natural and smooth interaction between a person and a machine possible by measuring the speaking speed of an input voice and controlling the output of the response voice in accordance with that speaking speed. CONSTITUTION: For the input voice, the start and end of a word in the voice pattern are detected by an analyzer 1, a voice pattern memory 2, and a voice section detector 3. A pattern collating circuit 4 collates the voice pattern of the word data with the standard patterns of plural words registered beforehand in a word dictionary memory 5 to recognize the word. The recognition result obtained by the pattern collating circuit 4 is given to a voice response output part 6 and a speaking speed measurer 7; the measurer 7 uses the recognition result and the time length between the start and the end of the word to obtain the corresponding standard time length, and the speaking speed is calculated on the basis of the average value and variance of the standard time lengths. A voice response speed controller 9 obtains this speaking-speed information and variably controls the speed of the response voice produced by the voice response output part 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field of the Invention]

The present invention relates to a voice recognition response device for use in an information processing system that operates by voice input.

[Technical Background of the Invention and Its Problems]

In recent years, speech recognition and speech synthesis technology have advanced remarkably: continuous speech recognition and speaker-independent recognition have become possible, and highly accurate speech synthesis using linear predictive coding is now available. Rule-based synthesis methods for converting text into speech are also being actively researched and developed.

Using such technology, attempts have been made to develop, for example, telephone voice response service systems that provide various services over public telephone lines, and on-line business systems for banks and the like, and their usefulness is attracting attention.

However, the users of this kind of system are an unspecified, large number of people: some, such as the elderly and children, are unfamiliar with it, while others use it many times a day. In spite of this, conventional devices give voice responses whose content is uniform and whose speaking rate is fixed, so dialogue between the human and the machine does not proceed smoothly. That is, the responses are redundant and irritating, or are difficult to understand.

[Purpose of the Invention]

The present invention has been made in view of these circumstances, and its object is to provide a highly practical voice recognition response device that enables natural and smooth dialogue between humans and machines and thereby makes effective information processing by voice input possible.

[Summary of the Invention]

In the present invention, when an input voice is recognized and a voice response is given, the speaking rate of the input voice is measured, and the output of the response voice, for example the response speed or the response content, is controlled in accordance with this speaking rate.

[Effects of the Invention]

Thus, according to the present invention, the voice response output is controlled in accordance with the speaking rate of the input voice, so an appropriate response can be given to each speaker. For example, by giving a concise response to a frequent user and a polite, more detailed response to an infrequent user, suitable guidance for voice input can be provided, and the naturalness and smoothness of the dialogue can be improved considerably.

[Embodiments of the Invention]

Embodiments of the present invention will now be described with reference to the drawings.

FIG. 1 is a schematic block diagram of a device according to the first embodiment. This device takes words as the units of speech recognition and controls the speed of the voice response according to the rate at which the words are spoken. Specifically, the input voice is passed through an analyzer 1, where it undergoes A/D conversion, spectrum analysis and similar processing and is converted into a series of feature parameters, which is stored in a voice pattern memory 2.
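By way of illustration only (the patent does not disclose a concrete implementation), the following Python sketch shows one way the front-end stage corresponding to the analyzer 1 and voice pattern memory 2 could be realized: a sampled waveform is converted into a sequence of short-time log-spectral feature vectors. The sampling rate, frame length, hop size, and the choice of a log-magnitude spectrum are assumptions introduced for the example.

```python
import numpy as np

def analyze(waveform, sample_rate=8000, frame_ms=25, hop_ms=10):
    """Convert a waveform into a sequence of log-magnitude spectral feature
    vectors, i.e. the kind of feature-parameter series stored in the
    voice pattern memory."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hamming(frame_len)
    features = []
    for start in range(0, len(waveform) - frame_len + 1, hop_len):
        frame = waveform[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        features.append(np.log(spectrum + 1e-10))
    return np.array(features)  # shape: (num_frames, frame_len // 2 + 1)
```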

A voice section detector 3 detects the start and end of a word in the voice pattern by using, for example, the energy information of this feature-parameter time series, and the word data portion is thereby cut out. A pattern collating circuit 4 then collates the voice pattern of the word data with the standard patterns of a plurality of words registered beforehand in a word dictionary memory 5 and thus recognizes the word. This pattern collation is performed, for example, by a similarity calculation method. The recognition result is supplied to a voice response output section 6.
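The similarity calculation is not specified in the patent, so the sketch below should be read only as one plausible realization of the voice section detector 3 and pattern collating circuit 4: the word is located from frame energies, and the extracted pattern is compared with each stored standard pattern by cosine similarity over a length-normalized version of the pattern. The energy threshold, the fixed resampling length, and the cosine measure are all assumptions.

```python
import numpy as np

def detect_word(frame_energies, threshold):
    """Return (start_frame, end_frame) of the word: the first and last frames
    whose energy exceeds the threshold (cf. voice section detector 3)."""
    voiced = np.where(np.asarray(frame_energies) > threshold)[0]
    if voiced.size == 0:
        return None
    return int(voiced[0]), int(voiced[-1])

def recognize_word(word_pattern, dictionary, num_frames=32):
    """Compare the extracted word pattern with the standard patterns in the
    dictionary and return the best match (cf. pattern collating circuit 4
    and word dictionary memory 5)."""
    def normalize(pattern):
        # Linearly resample to a fixed number of frames, flatten, unit-normalize.
        pattern = np.asarray(pattern)
        idx = np.linspace(0, len(pattern) - 1, num_frames).astype(int)
        vec = pattern[idx].ravel()
        return vec / (np.linalg.norm(vec) + 1e-10)

    query = normalize(word_pattern)
    similarities = {word: float(normalize(ref) @ query)
                    for word, ref in dictionary.items()}
    return max(similarities, key=similarities.get), similarities
```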

Meanwhile, the recognition result of the input voice obtained by the pattern collating circuit 4 is also supplied to a speech rate measuring unit 7. Using the recognition result Wi of the input voice and the word duration Li given by the start and end information, the speech rate measuring unit 7 looks up the standard duration Ri of the recognized word Wi registered beforehand in a word duration memory 8, and calculates the speech rate τ from the mean and variance of these values. From this it is determined, for example, whether the speech rate τ of the input voice is faster or slower than the average standard speech rate; in other words, whether the speaker is a so-called fast talker, an average talker or a slow talker. A voice response speed controller 9 receives this speech-rate information and variably controls the speed of the response voice produced by the voice response output section 6.
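The patent gives no explicit formula for the speech rate τ, so the following is only a minimal sketch of the idea: each recognized word's measured duration is divided by its standard duration from the word duration memory, the mean of these ratios serves as τ, and the variance is available as a reliability check. The classification thresholds for fast, normal and slow talkers are assumptions.

```python
import statistics

def speech_rate(recognized, standard_durations):
    """recognized: list of (word, measured_duration_seconds) pairs.
    standard_durations: dict mapping each word to its standard duration
    (cf. word duration memory 8). Returns (tau, variance)."""
    ratios = [duration / standard_durations[word] for word, duration in recognized]
    tau = statistics.mean(ratios)             # tau < 1.0: faster than standard
    variance = statistics.pvariance(ratios)   # spread, usable as a reliability check
    return tau, variance

def classify_talker(tau, fast=0.85, slow=1.15):
    """Map tau to a coarse talker class (thresholds are illustrative only)."""
    if tau < fast:
        return "fast"
    if tau > slow:
        return "slow"
    return "normal"
```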

As a result, the voice response output section 6 delivers the voice response with the output speed of the response sentence, which is determined according to the recognition result of the input voice, variably controlled in accordance with the speaking rate of that input voice.

When the response voice is synthesized and output by a rule-based synthesis method, its speed is variably controlled by controlling the rate of change of the various parameters used for the rule synthesis. When recorded-speech editing synthesis is used instead, the response speed is controlled by, for example, selecting among pre-recorded sentences or speech segments spoken at different rates.
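A rough sketch of both control routes follows; it is not the patent's implementation. For rule synthesis, a per-frame parameter track is simply time-scaled by τ so that the response is spoken faster or slower; for the recorded-editing case, a pre-recorded variant is chosen by talker class. The interpolation scheme and the three-way variant table are assumptions.

```python
import numpy as np

def rescale_parameter_track(track, tau):
    """Time-scale a synthesis parameter track of shape (frames, params) by tau.
    tau < 1.0 shortens the track (faster speech); tau > 1.0 lengthens it."""
    track = np.asarray(track)
    n_out = max(1, int(round(len(track) * tau)))
    src = np.linspace(0, len(track) - 1, n_out)
    cols = [np.interp(src, np.arange(len(track)), track[:, k])
            for k in range(track.shape[1])]
    return np.array(cols).T

def select_response_recording(variants, talker_class):
    """Recorded-editing alternative: pick the pre-recorded variant
    ('fast' / 'normal' / 'slow') whose speaking rate matches the user."""
    return variants[talker_class]
```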

Thus, with the device configured in this way, the voice response is delivered at a speaking rate matched to that of the speaker: a so-called impatient, fast talker receives a fast-spoken response, while a leisurely, slow talker receives a response at a gentler pace. This raises the naturalness of the dialogue between human and machine and makes it smoother, and overall it becomes possible to improve the efficiency of information processing by voice recognition and response.

FIG. 2 is a schematic block diagram of a second embodiment of the present invention, in which the pitch frequency of the input voice is obtained, the speaking rate is detected from its variation, and the content of the response sentence itself is changed accordingly. The input voice is analyzed through an analyzer 11, and the resulting voice pattern is stored in a voice pattern memory 12. A voice recognition section 13 recognizes this voice pattern by referring to a voice dictionary registered in a dictionary memory 14.

Meanwhile, the pitch frequency component of the input voice is extracted by a pitch extractor 15.

This pitch frequency component is detected independently of the recognition processing of the input voice, using, for example, the cepstrum method, the modified correlation method or the AMDF method. From the variation of the time-series pattern of the pitch frequency, a speech rate measuring unit 16 obtains the speech rate τ of the input voice. A voice response control section 17 receives this speech-rate information together with the recognition result of the input voice and determines a response sentence whose content corresponds to them; the sentence is then output as speech through a voice response output section 18.
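The sketch below illustrates the cepstrum route only (the modified correlation and AMDF methods would be drop-in replacements) and a deliberately crude way of turning the pitch contour's variation into a rate indicator; no voiced/unvoiced decision is made, and the movement threshold is an assumption rather than anything taken from the patent.

```python
import numpy as np

def cepstral_pitch(frame, sample_rate=8000, fmin=60, fmax=400):
    """Estimate F0 of one frame with the cepstrum method."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))
    qmin = int(sample_rate / fmax)          # quefrency search range for F0
    qmax = int(sample_rate / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return sample_rate / peak

def rate_from_pitch_contour(contour, hop_seconds=0.01, move_threshold=10.0):
    """Count frame-to-frame F0 changes above a threshold (in Hz) per second,
    as a rough proxy for how quickly the utterance is articulated."""
    contour = np.asarray(contour)
    moves = int(np.sum(np.abs(np.diff(contour)) > move_threshold))
    return moves / (len(contour) * hop_seconds)
```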

That is, depending on the speech recognition result and the speaking rate, one of several responses that express the same meaning but differ in form, for example 「ありがとう」 ("Thanks"), 「ありがとうございます」 ("Thank you"), or 「ありがとうございました、またどうぞ」 ("Thank you very much, please come again"), is selected and output as speech. In other words, a voice response is given whose content and speed suit the individual speaker.
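A minimal sketch of this content selection, assuming a three-way talker classification and a hand-written table of equivalent phrasings (both introduced only for illustration):

```python
# Hypothetical table: one meaning, three phrasings of different verbosity.
ACKNOWLEDGEMENTS = {
    "fast":   "Thanks.",
    "normal": "Thank you.",
    "slow":   "Thank you very much. Please come again.",
}

def choose_response(meaning_table, talker_class):
    """Pick the phrasing whose length and politeness suit the speaker."""
    return meaning_table.get(talker_class, meaning_table["normal"])

print(choose_response(ACKNOWLEDGEMENTS, "fast"))   # -> Thanks.
```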

Accordingly, when the voice response gives the instruction for the next voice input, the instruction can be given concisely, or politely and in detail to an unfamiliar user, which enhances the naturalness of the dialogue and improves processing efficiency.

As described above, according to the present invention, the speaking rate of the input voice, which well reflects the character of the speaker, is detected and the voice response is controlled accordingly, so the naturalness of the dialogue with the speaker can be enhanced.

As a result, considerable practical benefits are obtained, such as the elimination of problems like irritating the speaker.

The present invention is not limited to the embodiments described above. For example, the speaking rate of the input voice may be measured using vowel similarity or the temporal variation of the spectrum, or, when continuously spoken digits are the recognition target, from the lengths of the pauses between them (see the sketch below). The voice response may also be controlled by adjusting the speaking rate together with the content of the response sentence. For the speech recognition processing and the speech synthesis, any of various conventionally known methods may be adopted as appropriate. In short, the present invention can be practiced with various modifications without departing from its gist.
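As one reading of the pause-length variant mentioned above (the patent gives no formula, so every detail here is an assumption), the rate of a connected-digit utterance could be gauged from the silences between recognized digit segments:

```python
def mean_pause_length(segment_boundaries):
    """segment_boundaries: list of (start_s, end_s) for each recognized digit,
    in temporal order. Longer average pauses suggest a slower speaker."""
    pauses = [nxt_start - prev_end
              for (_, prev_end), (nxt_start, _) in zip(segment_boundaries,
                                                       segment_boundaries[1:])]
    return sum(pauses) / len(pauses) if pauses else 0.0
```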

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram of a device according to the first embodiment of the present invention, and FIG. 2 is a schematic block diagram of a device according to the second embodiment. 1, 11: analyzer; 2, 12: voice pattern memory; 3: voice section detector; 4: pattern collating circuit; 5: word dictionary memory; 6, 18: voice response output section; 7, 16: speech rate measuring unit; 8: word duration memory; 9: voice response speed controller; 13: voice recognition section; 14: dictionary memory; 15: pitch extractor; 17: voice response control section.

Claims (2)

[Claims]

(1) A voice recognition response device which recognizes an input voice and outputs a voice response to the recognition result, characterized in that the speaking rate of the input voice is measured and the voice response output is controlled in accordance with this speaking rate.

(2) A voice recognition response device according to claim 1, wherein the voice response output is controlled by varying the speed of the response voice or by changing the content of the response sentence.
JP58091809A 1983-05-25 1983-05-25 Speech recognition response device Expired - Lifetime JPH0721759B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58091809A JPH0721759B2 (en) 1983-05-25 1983-05-25 Speech recognition response device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58091809A JPH0721759B2 (en) 1983-05-25 1983-05-25 Speech recognition response device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
JP4139390A Division JPH0731508B2 (en) 1992-05-29 1992-05-29 Speech recognition response device

Publications (2)

Publication Number Publication Date
JPS59216242A true JPS59216242A (en) 1984-12-06
JPH0721759B2 JPH0721759B2 (en) 1995-03-08

Family

ID=14036950

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58091809A Expired - Lifetime JPH0721759B2 (en) 1983-05-25 1983-05-25 Speech recognition response device

Country Status (1)

Country Link
JP (1) JPH0721759B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008026463A (en) * 2006-07-19 2008-02-07 Denso Corp Voice interaction apparatus
JP2012128440A (en) * 2012-02-06 2012-07-05 Denso Corp Voice interactive device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5757375A (en) * 1981-08-06 1982-04-06 Noriko Ikegami Electronic translator
JPS59153238A (en) * 1983-02-21 1984-09-01 Nec Corp Voice input/output system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62145294A (en) * 1985-12-20 1987-06-29 株式会社東芝 Voice notice unit
JPH0631998B2 (en) * 1985-12-20 1994-04-27 株式会社東芝 Voice notification device
JPH01169660A (en) * 1987-12-25 1989-07-04 Toshiba Corp Pattern generating device
EP1081683A1 (en) * 1999-08-30 2001-03-07 Philips Corporate Intellectual Property GmbH Speech recognition method and device
US6629072B1 (en) 1999-08-30 2003-09-30 Koninklijke Philips Electronics N.V. Method of an arrangement for speech recognition with speech velocity adaptation
US8364475B2 (en) 2008-12-09 2013-01-29 Fujitsu Limited Voice processing apparatus and voice processing method for changing accoustic feature quantity of received voice signal
US10157607B2 (en) 2016-10-20 2018-12-18 International Business Machines Corporation Real time speech output speed adjustment

Also Published As

Publication number Publication date
JPH0721759B2 (en) 1995-03-08

Similar Documents

Publication Publication Date Title
Deshwal et al. Feature extraction methods in language identification: a survey
JP2815579B2 (en) Word candidate reduction device in speech recognition
EP0535146A4 (en)
US20240038214A1 (en) Attention-Based Clockwork Hierarchical Variational Encoder
DE112021000959T5 (en) Synthetic Language Processing
Savery et al. Shimon the rapper: A real-time system for human-robot interactive rap battles
Kumar et al. Machine learning based speech emotions recognition system
CN113470622B (en) Conversion method and device capable of converting any voice into multiple voices
US11475874B2 (en) Generating diverse and natural text-to-speech samples
Nedjah et al. Automatic speech recognition of Portuguese phonemes using neural networks ensemble
JPS59216242A (en) Voice recognizing response device
WO2023279976A1 (en) Speech synthesis method, apparatus, device, and storage medium
Kwon et al. Effective parameter estimation methods for an excitnet model in generative text-to-speech systems
JP5300000B2 (en) Articulation feature extraction device, articulation feature extraction method, and articulation feature extraction program
Chen et al. A new learning scheme of emotion recognition from speech by using mean fourier parameters
Heo et al. Classification based on speech rhythm via a temporal alignment of spoken sentences
Bansal et al. Automatic speech recognition by cuckoo search optimization based artificial neural network classifier
Rusan et al. Human-Computer Interaction Through Voice Commands Recognition
Prasanna et al. Comparative deep network analysis of speech emotion recognition models using data augmentation
Alastalo Finnish end-to-end speech synthesis with Tacotron 2 and WaveNet
Kondhalkar et al. Speech recognition using novel diatonic frequency cepstral coefficients and hybrid neuro fuzzy classifier
Hamzah et al. Comparing statistical classifiers for emotion classification
Jeyalakshmi et al. Integrated models and features-based speaker independent emotion recognition
Khan et al. detection of questions in Arabic audio monologues using prosodic features
JPH05173589A (en) Speech recognizing and answering device