JPS63121098A

JPS63121098A - Voice recognition equipment for telephone

Info

Publication number: JPS63121098A
Application number: JP61266960A
Authority: JP
Inventors: 正宏浜田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-11-10
Filing date: 1986-11-10
Publication date: 1988-05-25

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a telephone voice recognition device that recognizes speech input from telephone terminals.

Prior Art

In recent years, voice recognition devices for recognizing telephone speech have gradually come into general use.

A conventional telephone voice recognition device is described below with reference to the drawings. FIG. 2 shows a conventional telephone voice recognition device. In FIG. 2, 10 and 11 are telephone terminals of various types, each connected to the selection means 3 in the telephone exchange 2. The output of the selection means 3 is input to the voice matching section 4. The output of the standard pattern set 5 is also connected to the voice matching section 4.

The operation of the telephone voice recognition device configured as above is described below.

The user inputs speech from telephone terminal 10 or 11, and this speech is input to the telephone exchange 2. The selection means 3 in the telephone exchange 2 determines which telephone terminal the speech came from, controls a switch, and outputs the speech to be recognized to the voice matching section 4. Meanwhile, the standard pattern set 5 stores a set of statistically optimal speech analysis results for each utterance to be recognized, and these data are also input to the voice matching section 4. The voice matching section receives these two inputs, performs a predetermined matching operation, and outputs a recognition result. Many methods have been devised for this matching, but most are based on the concept of a spectral distance between the input speech and the standard patterns: the utterance whose standard pattern gives the smallest spectral distance is taken as the recognition result for the input speech.
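The matching step described above can be sketched as a nearest-pattern search. The following is a minimal illustration only, assuming that each utterance is summarized by a short vector of band levels in dB and that the spectral distance is Euclidean; the source does not specify the analysis or the distance measure, and the names `spectral_distance`, `recognize`, and `STANDARD_PATTERNS` as well as all numeric values are hypothetical.

```python
import math


def spectral_distance(a, b):
    """Euclidean distance between two log-spectrum vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(input_spectrum, standard_patterns):
    """Return the label whose standard pattern is closest to the input."""
    return min(
        standard_patterns,
        key=lambda label: spectral_distance(input_spectrum, standard_patterns[label]),
    )


# Hypothetical 4-band log spectra in dB, one standard pattern per utterance.
STANDARD_PATTERNS = {
    "yes": [10.0, 20.0, 15.0, 5.0],
    "no": [18.0, 8.0, 6.0, 14.0],
}

if __name__ == "__main__":
    print(recognize([11.0, 19.0, 14.0, 6.0], STANDARD_PATTERNS))
```

In this toy case the input lies at distance 2 from the "yes" pattern and much farther from "no", so "yes" is returned.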

Problems to Be Solved by the Invention

However, with the above configuration, when the telephone terminals used by users differ in acoustic characteristics and impart different spectra to the speech, the spectral distance caused by the difference between terminals obscures, in the spectral distance comparison performed in the voice matching section 4, the spectral distance differences caused by differences in the speech itself, which in turn leads to misrecognition.
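One way to see why the terminal matters is to write the problem in log-spectral terms. The notation below is an assumption introduced for illustration and does not appear in the source: S is the speech spectrum, H_k the transmitting-system response of terminal type k, G_k a compensation filter, and lowercase letters denote the corresponding log-magnitude spectra in dB.

```latex
\begin{align}
Y_k(\omega) &= H_k(\omega)\,S(\omega)
  && \text{speech as received from terminal type } k \\
y_k &= s + h_k
  && \text{log-magnitude spectra add} \\
d(y_k, r) &= \lVert (s - r) + h_k \rVert
  && \text{distance to standard pattern } r \text{ includes the terminal term } h_k \\
\hat{y}_k &= y_k + g_k = s + (h_k + g_k) \approx s
  && \text{if } G_k(\omega) \approx 1/H_k(\omega),\ \text{i.e. } g_k \approx -h_k
\end{align}
```

Under these assumptions, the terminal term h_k is added to the speech difference s - r in every comparison, and a filter that approximates the inverse of H_k removes it before matching.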

As an example, FIG. 3 shows the transmitting-system frequency characteristics of three commercially available telephone terminals. FIG. 3(a) shows the characteristics of a transmitting system using a carbon microphone; a broad peak at 2 to 3 kHz and a shoulder at 300 Hz are prominent. FIG. 3(b) shows the characteristics of a transmitting system using an electret condenser microphone, in which the prominent features seen in FIG. 3(a) do not appear. FIG. 3(c) shows the characteristics of another transmitting system that also uses an electret condenser microphone, in which only the broad peak at 2 to 3 kHz appears. As these examples show, the frequency characteristics of telephone terminal transmitting systems differ greatly from model to model. It is therefore easy to imagine that the frequency characteristics of speech input through them also differ greatly. It is likewise easy to imagine that, when speech input from telephone terminals of different models is matched against a single kind of standard pattern, the spectral distance differences of the speech itself become unclear, leading to misrecognition.

In view of the above problems, the present invention provides a telephone voice recognition device that can perform good speech recognition even when the telephone terminals used by users differ from one another in acoustic characteristics.

Means for Solving the Problems

To achieve this object, the telephone voice recognition device of the present invention comprises: a plurality of telephone terminals; a selection means that determines and selects which telephone terminal the input speech comes from; a filter means that filters the output of the selection means; a set of standard patterns; a plurality of sets of filter coefficients to be input to the filter means; a correspondence table that stores a predetermined many-to-one correspondence between the plurality of telephone terminals and the plurality of sets of filter coefficients; and a voice matching section.

Operation

With this configuration, the telephone terminal from which the speech to be recognized was input is identified, and one set of filter coefficients corresponding to the acoustic characteristics of that terminal is determined. Each set of filter coefficients is given in advance a spectrum corresponding to the acoustic characteristics of the telephone terminals associated with it in the correspondence table, so that good speech recognition can be performed with the differences between telephone terminals canceled out.
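The many-to-one correspondence can be pictured as a lookup table in which several terminals share one coefficient set. The sketch below is illustrative only: lines 60 and 61 come from the embodiment, while line 62, the set names, and the coefficient values are hypothetical, as are the identifiers `CORRESPONDENCE_TABLE`, `FILTER_COEFFICIENT_SETS`, and `select_coefficients`.

```python
# Hypothetical filter coefficient sets, one per terminal type.
FILTER_COEFFICIENT_SETS = {
    "carbon": [1.0, -0.5, 0.1],
    "electret": [1.0, 0.0, 0.0],
}

# Many-to-one table: several lines (terminals) may map to the same set.
# Line 62 is an invented entry used only to show the many-to-one case.
CORRESPONDENCE_TABLE = {
    60: "carbon",
    61: "electret",
    62: "carbon",
}


def select_coefficients(active_line):
    """Look up the coefficient set registered for the given line number."""
    if active_line not in CORRESPONDENCE_TABLE:
        raise KeyError(f"no filter coefficient set registered for line {active_line}")
    return FILTER_COEFFICIENT_SETS[CORRESPONDENCE_TABLE[active_line]]


if __name__ == "__main__":
    print(select_coefficients(60))
    print(select_coefficients(62) is select_coefficients(60))
```

Keeping the table separate from the coefficient sets means a new terminal of a known type needs only a new table entry, not a new filter design.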

Embodiment

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows a telephone voice recognition device according to an embodiment of the present invention. In FIG. 1, 10 and 11 are telephone terminals of various types, connected via lines 60 and 61, respectively, to the selection means 3 in the telephone exchange 2. The output of the selection means 3 is applied to the filter means 7, and the output of the filter means 7 is input to the voice matching section 4. Meanwhile, the selection information from the selection means 3 is input to the correspondence table 9, and the table lookup result output by the correspondence table 9 is input to the plurality of sets of filter coefficients 8. One set of filter coefficients selected from the plurality of sets 8 is input to the filter means 7. The contents of the standard pattern set 5 are input to the voice matching section 4.

The operation of the telephone voice recognition device configured as above is described below.

The user inputs speech from telephone terminal 10 or 11, and this speech is input to the telephone exchange 2 via line 60 or 61. The selection means 3 in the telephone exchange 2 determines which line the speech came from, controls a switch, and outputs the speech to be recognized to the filter means 7. For convenience of explanation, the telephone terminal the user is using for speech input is called the active terminal, and the line to which it is connected is called the active line. The selection means 3 outputs the line number of the current active line to the correspondence table 9. The correspondence table uses this active line number to search the many-to-one table, determines the type of the active terminal, and selects, from the plurality of sets of filter coefficients 8, the set best suited to canceling the acoustic characteristics of the active terminal that have been imparted to the speech signal at the input stage of the filter means 7. The selected set of filter coefficients is input to the filter means 7, and the filtering performed there yields a speech signal in which the acoustic characteristics of the active terminal have been canceled. The voice matching section 4 receives this speech signal and the output of the standard pattern set 5 and performs matching.
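The signal path of the embodiment (active line number, table lookup, filtering) can be sketched as follows. This is a minimal illustration assuming the filter means is a causal FIR filter, which the source does not state; the names `fir_filter`, `process_call`, `TABLE`, and `COEFFICIENT_SETS`, the set labels, and all sample and coefficient values are hypothetical.

```python
def fir_filter(signal, coefficients):
    """Causal FIR filter: y[n] = sum over k of coefficients[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


def process_call(active_line, samples, table, coefficient_sets):
    """Look up the coefficient set for the active line, then filter the speech."""
    coefficients = coefficient_sets[table[active_line]]
    return fir_filter(samples, coefficients)


# Hypothetical table and coefficient sets for lines 60 and 61.
TABLE = {60: "type_a", 61: "type_b"}
COEFFICIENT_SETS = {
    "type_a": [1.0, -0.5],  # short approximate inverse of a terminal with response 1 + 0.5 z^-1
    "type_b": [1.0],        # pass-through for a terminal assumed to be flat
}

if __name__ == "__main__":
    # Impulse as colored by the hypothetical type_a terminal.
    received = [1.0, 0.5, 0.0]
    print(process_call(60, received, TABLE, COEFFICIENT_SETS))
    print(process_call(61, received, TABLE, COEFFICIENT_SETS))
```

The filtered samples would then go to the matching stage. With only two taps the type_a compensation leaves a small residual at the third sample, which reflects the approximate nature of a truncated inverse.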

In the present invention, a set of filter coefficients for canceling the acoustic characteristics of each telephone terminal must be prepared in advance. One example method is to measure in advance the transmitting-system impulse response of each type of telephone terminal and to construct a digital filter that approximates its inverse characteristic. Under the assumption of a linear system, this method corrects the transmitting system of the telephone terminal completely, to within the accuracy of that approximation. The method is well suited to transmitting systems that use an electret condenser microphone. When a carbon microphone is used, on the other hand, its nonlinearity means that correction by an inverse characteristic as described above cannot hold in a strict sense. As a rough approximation, however, the intended purpose can be expected to be achieved by measuring a representative frequency characteristic of the carbon microphone in question and applying the same processing to it.
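The inverse-characteristic idea can be sketched with a simple time-domain construction. The source does not say how the approximating digital filter is designed; the example below substitutes one common approach, a truncated power-series inverse of a measured FIR impulse response, which assumes a nonzero first sample and a minimum-phase response so that the inverse decays. The names `inverse_fir` and `convolve` and the impulse response values are hypothetical.

```python
def inverse_fir(impulse_response, num_taps):
    """Truncated inverse g of h, chosen so that (h * g)[n] = delta[n] for n < num_taps.

    Assumes h[0] != 0; a decaying result additionally requires a minimum-phase h.
    """
    h = impulse_response
    if not h or h[0] == 0:
        raise ValueError("impulse response must start with a nonzero sample")
    g = []
    for n in range(num_taps):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, min(n, len(h) - 1) + 1):
            acc -= h[k] * g[n - k]
        g.append(acc / h[0])
    return g


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    h = [1.0, 0.5]            # hypothetical measured impulse response
    g = inverse_fir(h, 4)     # four-tap approximate inverse
    print(g)
    print(convolve(h, g))     # close to a unit impulse, with a small tail
```

For this toy response the cascade of terminal and compensation filter is a unit impulse over the first four samples, with a residual of -0.0625 at the fifth; more taps shrink the residual, which corresponds to the "accuracy of the approximation" mentioned above.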

As described above, according to this embodiment, the output of the filter means 7 is always a signal in which the characteristics of the telephone terminal have been canceled, regardless of which telephone terminal is used, so the voice matching section can perform good matching. Furthermore, when line 60 or 61 in FIG. 1 has its own distinct acoustic characteristics, this influence can be canceled or reduced by the same method.

Effects of the Invention

As described above, the present invention provides a plurality of telephone terminals, a selection means that determines and selects which telephone terminal the input speech comes from, a filter means that filters the output of the selection means, a set of standard patterns, a plurality of sets of filter coefficients to be input to the filter means, a correspondence table that stores a predetermined many-to-one correspondence between the plurality of telephone terminals and the plurality of sets of filter coefficients, and a voice matching section. This makes it possible to perform good speech recognition in which differences in the acoustic characteristics of the telephone terminal model actually used for input are canceled. Moreover, because the selection means identifies and selects the telephone terminal and its line as a pair, good speech recognition can be performed by the same method even when the lines have distinct acoustic characteristics. An excellent telephone voice recognition device can thus be realized.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of a telephone voice recognition device according to an embodiment of the present invention, FIG. 2 is a block diagram of a conventional telephone voice recognition device, and FIGS. 3(a), (b), and (c) are transmitting-system frequency characteristic diagrams of three commercially available telephone terminals.

10, 11: telephone terminals; 2: telephone exchange; 3: selection means; 4: voice matching section; 5: set of standard patterns; 60, 61: lines; 7: filter means; 8: plurality of sets of filter coefficients; 9: correspondence table.

Agent: Patent attorney Toshio Nakao and one other

[FIG. 2 and FIG. 3: drawings not reproduced]

Claims

[Claims]

A voice recognition device for a telephone, comprising: a plurality of telephone terminals; a selection means for determining and selecting which telephone terminal the input speech comes from; a filter means for filtering the output of the selection means; a set of standard patterns; a plurality of sets of filter coefficients to be input to the filter means; a correspondence table that stores a predetermined many-to-one correspondence between the plurality of telephone terminals and the plurality of sets of filter coefficients; and a voice matching section, wherein the selection means determines and selects from which telephone terminal the speech to be recognized, input from any one of the plurality of telephone terminals, was input; a corresponding set of filter coefficients is selected from the plurality of sets of filter coefficients based on the result of that determination and selection and on the correspondence table; the speech input to the filter means is filtered using the selected set of filter coefficients; and the voice matching section matches the filtered output against the set of standard patterns.