JPS63167400A

JPS63167400A - Voice recognition equipment for telephone

Info

Publication number: JPS63167400A
Application number: JP61311016A
Authority: JP
Inventors: 正宏浜田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-12-29
Filing date: 1986-12-29
Publication date: 1988-07-11

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Field of Industrial Application
The present invention relates to a telephone speech recognition device that recognizes speech input from a telephone terminal.

Prior Art
In recent years, speech recognition devices for recognizing telephone speech have gradually come into general use.

A conventional telephone speech recognition device will now be described with reference to the drawings. FIG. 2 shows a conventional telephone speech recognition device. In FIG. 2, reference numerals 10 and 11 denote telephone terminals of various types, each connected to a selection section 3 in a telephone exchange 2. The speech signal output of the selection section 3 is input to an analysis section 6, and the acoustic feature output of the analysis section 6 is input to a matching section 4. The output of a standard pattern set 5 is also connected to the matching section 4.

The operation of the telephone speech recognition device configured as described above will now be described.

The user speaks into telephone terminal 10 or 11, and the speech is input to the telephone exchange 2 via line 12 or 13. The selection section 3 in the telephone exchange 2 determines which telephone terminal the speech input came from, controls a switch, and passes the speech to be recognized to the analysis section 6. The analysis section 6 performs acoustic analysis on the input speech and outputs the extracted acoustic features to the matching section 4. Meanwhile, the standard pattern set 5 stores, for each speech item to be recognized, a set of statistically optimal speech analysis results, and these data are also input to the matching section 4. The matching section receives these two inputs, performs a predetermined matching process, and outputs the recognition result. Many methods have been proposed for this matching task, but most are based on the concept of spectral distance between the input speech and the standard patterns: the speech item whose standard pattern gives the smallest spectral distance is taken as the recognition result for the input speech.
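The nearest-pattern rule described above can be sketched as follows. The vocabulary ("yes", "no"), the four-band feature vectors, and the choice of Euclidean distance are illustrative assumptions; the text says only that most methods rely on some spectral distance.

```python
import math

# Hypothetical standard pattern set: one feature vector (e.g. band energies in dB)
# per recognition target. Values are illustrative only.
STANDARD_PATTERNS = {
    "yes": [60.0, 55.0, 40.0, 30.0],
    "no": [50.0, 58.0, 52.0, 35.0],
}


def spectral_distance(a, b):
    """Euclidean distance between feature vectors (one of many possible measures)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, patterns=STANDARD_PATTERNS):
    """Return the label whose standard pattern is closest to the input features."""
    return min(patterns, key=lambda label: spectral_distance(features, patterns[label]))


print(recognize([59.0, 56.0, 41.0, 29.0]))  # prints "yes"
```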

Problems to Be Solved by the Invention
However, the configuration described above has the following problem. When the telephone terminals used by users differ in acoustic characteristics, each terminal imposes a different spectrum on the speech. In the spectral distance comparison performed by the matching section 4, the spectral distance caused by the difference between terminals then obscures the spectral distance difference caused by the difference in the speech itself, which ultimately leads to misrecognition.
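A toy numerical illustration of this problem, assuming the features are band energies in dB so that a terminal's transmitting response adds a fixed offset to each band. All numbers are invented for illustration and do not come from the measurements in FIG. 3.

```python
# Hypothetical standard patterns (band energies in dB).
PATTERNS = {
    "yes": [60.0, 55.0, 40.0, 30.0],
    "no": [56.0, 57.0, 46.0, 32.0],
}


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(features):
    return min(PATTERNS, key=lambda label: city_block(features, PATTERNS[label]))


speech_yes = [60.0, 55.0, 40.0, 30.0]      # clean utterance of "yes"
terminal_response = [-4.0, 1.0, 6.0, 2.0]  # hypothetical terminal that boosts upper bands
received = [s + h for s, h in zip(speech_yes, terminal_response)]

print(recognize(speech_yes))  # yes
print(recognize(received))    # no: the terminal's coloring outweighs the word difference
```

Here the two words differ by a city-block distance of 14 dB, while the terminal alone shifts the input by 13 dB, which is enough to move it next to the wrong pattern.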

As an example, FIG. 3 shows the frequency characteristics of the transmitting systems of three commercially available telephone terminals. FIG. 3(a) shows the characteristic of a transmitting system using a carbon microphone; a broad peak at 2 to 3 kHz and a shoulder at 300 Hz are prominent. FIG. 3(b), on the other hand, shows the characteristic of a transmitting system using an electret condenser microphone, which does not exhibit the prominent features seen in FIG. 3(a).

FIG. 3(c) shows the characteristic of another transmitting system, also using an electret condenser microphone; only the broad peak at 2 to 3 kHz appears in it.

As shown above, the frequency characteristics of telephone transmitting systems differ greatly from model to model, and it is easy to infer that the frequency characteristics of speech input through them also differ greatly.

As these examples make clear, conventional telephone speech recognition devices match speech input from different models of telephone terminal against a single type of standard pattern set, so the spectral distance differences due to the speech itself become unclear, leading to misrecognition.

In view of the above problems, the present invention provides a telephone speech recognition device that can perform good speech recognition even when the telephone terminals used by users differ from one another in acoustic characteristics.

Means for Solving the Problems
To achieve this object, the telephone speech recognition device of the present invention comprises: a plurality of telephone terminals; a selection section that determines which telephone terminal the input speech comes from and selects it; an analysis section that acoustically analyzes the output of the selection section; a correction section that corrects the output of the analysis section; a standard pattern set; a plurality of sets of correction coefficients to be input to the correction section; a correspondence table that stores a predetermined many-to-one correspondence between the plurality of telephone terminals and the plurality of sets of correction coefficients; and a matching section.
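The many-to-one correspondence table can be pictured as a plain mapping from terminal to coefficient-set name. This is a sketch under assumptions: the terminal model names, the set names, and the four-band dB offsets are all hypothetical. A dictionary already guarantees that each terminal maps to exactly one set; the helper below inverts it to show which terminals share a set.

```python
from collections import defaultdict

# Hypothetical correction-coefficient sets (per-band dB offsets).
CORRECTION_SETS = {
    "carbon": [-3.0, 0.0, 6.0, 4.0],
    "electret_flat": [0.0, 0.0, 0.0, 0.0],
    "electret_peak": [0.0, 0.0, 4.0, 3.0],
}

# Hypothetical terminal models: each maps to one set; several may share a set.
CORRESPONDENCE_TABLE = {
    "model_A": "carbon",
    "model_B": "electret_flat",
    "model_C": "electret_peak",
    "model_D": "carbon",
}


def validate(table, sets):
    """Reject entries that point at a coefficient set that does not exist."""
    unknown = sorted({name for name in table.values() if name not in sets})
    if unknown:
        raise ValueError(f"unknown correction sets: {unknown}")


def terminals_per_set(table):
    """Invert the many-to-one table: coefficient set -> terminals that use it."""
    groups = defaultdict(list)
    for terminal, set_name in table.items():
        groups[set_name].append(terminal)
    return dict(groups)


validate(CORRESPONDENCE_TABLE, CORRECTION_SETS)
print(terminals_per_set(CORRESPONDENCE_TABLE)["carbon"])  # ['model_A', 'model_D']
```

Grouping terminals this way keeps the number of stored coefficient sets smaller than the number of terminal models.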

Operation
With this configuration, the telephone terminal through which the speech to be recognized was input is identified, and one set of correction coefficients corresponding to that terminal's acoustic characteristics is determined. Each set of correction coefficients is suited to canceling the acoustic characteristics of the telephone terminals associated with it in the correspondence table, so good speech recognition is achieved regardless of differences between telephone terminals.
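Why a fixed coefficient set can cancel a terminal's coloring can be sketched under assumptions that are not stated in the patent: the terminal $t$ behaves as a linear filter $H_t(f)$ whose gain is roughly constant within each analysis band $k$, the acoustic features are band energies in decibels, and $g(t)$ denotes the many-to-one mapping stored in the correspondence table.

```latex
\begin{aligned}
X_t(f) &= S(f)\,H_t(f) \\
x_{t,k} &= 10\log_{10} E_k[X_t] \approx 10\log_{10} E_k[S] + 10\log_{10}\lvert H_{t,k}\rvert^{2} = s_k + h_{t,k} \\
y_k &= x_{t,k} - c_{g(t),k} = s_k + \bigl(h_{t,k} - c_{g(t),k}\bigr)
\end{aligned}
```

If the coefficient set assigned to terminal $t$ satisfies $c_{g(t),k} = h_{t,k}$, the corrected feature $y_k$ equals the terminal-independent speech feature $s_k$. When several terminals share one set, the residual $h_{t,k} - c_{g(t),k}$ is the approximation error introduced by the grouping.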

Embodiment
An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 shows a telephone speech recognition device according to an embodiment of the present invention. In FIG. 1, reference numerals 10 and 11 denote telephone terminals of various types, which are connected via lines 12 and 13, respectively, to a selection section 3 in a telephone exchange 2. The output selected by the selection section 3 is input to an analysis section 6, the output of the analysis section 6 is input to a correction section 7, and the output of the correction section 7 is input to a matching section 4. Meanwhile, the selection information from the selection section 3 is input to a correspondence table 9, and the result of the table lookup in the correspondence table 9 is input to the plurality of sets of correction coefficients 8. The one set of correction coefficients selected from the plurality of sets 8 is input to the correction section 7. The contents of a standard pattern set 5 are input to the matching section 4.
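The FIG. 1 signal flow can be sketched as a small pipeline object. This is a hypothetical illustration: the internals of each section are injected as callables or tables, the subtractive correction follows the filter-bank example given later in this embodiment, and all numbers are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Features = List[float]


@dataclass
class TelephoneRecognizer:
    """Sketch of the FIG. 1 flow; component internals are hypothetical stand-ins."""
    analyze: Callable[[Sequence[float]], Features]   # analysis section 6
    correspondence_table: Dict[int, str]             # table 9: line number -> set name
    correction_sets: Dict[str, Features]             # correction coefficient sets 8
    standard_patterns: Dict[str, Features]           # standard pattern set
    distance: Callable[[Features, Features], float]  # measure used by matching section 4

    def recognize(self, active_line: int, signal: Sequence[float]) -> str:
        # Selection section 3 has already routed the signal and reports active_line.
        features = self.analyze(signal)
        coeffs = self.correction_sets[self.correspondence_table[active_line]]
        # Correction section 7 (subtractive form assumed).
        corrected = [x - c for x, c in zip(features, coeffs)]
        # Matching section 4: nearest standard pattern.
        return min(self.standard_patterns,
                   key=lambda word: self.distance(corrected, self.standard_patterns[word]))


def city_block(a: Features, b: Features) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


recognizer = TelephoneRecognizer(
    analyze=list,  # stand-in: the "signal" here is already a feature vector
    correspondence_table={12: "carbon", 13: "flat"},
    correction_sets={"carbon": [-4.0, 1.0, 6.0, 2.0], "flat": [0.0, 0.0, 0.0, 0.0]},
    standard_patterns={"yes": [60.0, 55.0, 40.0, 30.0], "no": [56.0, 57.0, 46.0, 32.0]},
    distance=city_block,
)
print(recognizer.recognize(12, [56.0, 56.0, 46.0, 32.0]))  # yes
```

Injecting each section keeps the sketch neutral about how analysis and matching are actually implemented, which the embodiment leaves open.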

The operation of the telephone speech recognition device configured as described above will now be described.

The user speaks into telephone terminal 10 or 11, and the speech is input to the telephone exchange 2 via line 12 or 13. The selection section 3 in the telephone exchange 2 determines which line the speech input came from, controls a switch, and outputs the speech to be recognized to the analysis section 6. For convenience of explanation, the telephone terminal the user is speaking into will be called the active terminal, and the line to which it is connected the active line. The selection section 3 outputs the line number of the current active line to the correspondence table 9.

The correspondence table 9 uses this active line number to search its many-to-one table and determine the type of the active terminal. It then selects, from the plurality of correction coefficient sets 8, the one set best suited to canceling the acoustic characteristics of the active terminal that are mixed into the acoustic features at the input stage of the correction section 7.
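The two-step lookup (active line number, then terminal type, then coefficient set) might look like the sketch below. The table contents are hypothetical, line 14 does not appear in FIG. 1 and is added only to show two lines sharing a terminal type, and raising an error for an unregistered line is an assumed design choice.

```python
# Hypothetical tables. Line 14 is illustrative only (FIG. 1 shows lines 12 and 13).
LINE_TO_TERMINAL_TYPE = {12: "carbon", 13: "electret", 14: "carbon"}
TYPE_TO_SET = {"carbon": "set_carbon", "electret": "set_electret"}
CORRECTION_SETS = {
    "set_carbon": [-3.0, 0.0, 6.0, 4.0],
    "set_electret": [0.0, 0.0, 4.0, 3.0],
}


def select_correction(active_line):
    """Return (terminal type, coefficient set) for the line reported by the selection section."""
    try:
        terminal_type = LINE_TO_TERMINAL_TYPE[active_line]
    except KeyError:
        raise LookupError(f"no terminal registered on line {active_line}") from None
    return terminal_type, CORRECTION_SETS[TYPE_TO_SET[terminal_type]]


print(select_correction(12))  # ('carbon', [-3.0, 0.0, 6.0, 4.0])
```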

The selected set of correction coefficients is input to the correction section 7, whose correction processing yields acoustic features in which the acoustic characteristics of the active terminal have been canceled.

The matching section 4 receives these acoustic features and the output of the standard pattern set 5 and performs speech matching.

The coefficient values in the correction coefficient sets 8 and the operation of the correction section 7 in this embodiment must be determined in relation to the operations of the analysis section 6 and the matching section 4. As an example, consider the case where the analysis section 6 performs band-by-band energy analysis using a filter bank and the matching section 4 calculates the city-block distance between the standard patterns and the input pattern. In this case, one possible method is to measure in advance the transmission characteristics of the transmitting system of each type of telephone terminal, hold the resulting acoustic features as the correction coefficient sets 8, and, during actual recognition, have the correction section 7 subtract the correction coefficients of the telephone terminal in use from the acoustic features of the input signal. Provided the acoustic features are additive, this subtraction fully corrects for the telephone terminal's transmitting system. This method is suitable when the transmitting system uses an electret condenser microphone.
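The filter-bank example above can be sketched end to end. Assumptions not in the patent: band energies are expressed in dB (which is what makes a linear terminal's effect additive), the terminal is modeled as a fixed linear gain at each frequency, and a frame of four pure tones stands in for speech. Band edges, gains, and amplitudes are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000           # Hz (assumed telephone sampling rate)
FRAME = 64                   # samples per frame -> 125 Hz DFT bin spacing
BANDS = [(300, 1000), (1000, 2000), (2000, 3000), (3000, 3400)]  # hypothetical filter bank
TONE_BINS = [4, 12, 20, 26]  # 500, 1500, 2500, 3250 Hz: one test tone per band


def synth(amplitudes):
    """Stand-in for a speech frame: one cosine per band, placed on an exact DFT bin."""
    return [sum(a * math.cos(2 * math.pi * b * t / FRAME) for a, b in zip(amplitudes, TONE_BINS))
            for t in range(FRAME)]


def analyze(frame):
    """Filter-bank analysis: DFT power summed within each band, expressed in dB."""
    n = len(frame)
    power = [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
             for k in range(n // 2 + 1)]
    return [10 * math.log10(sum(p for k, p in enumerate(power) if lo <= k * SAMPLE_RATE / n < hi)
                            + 1e-12)
            for lo, hi in BANDS]


def through_terminal(amplitudes, gains):
    """Hypothetical transmitting system: a fixed linear gain at each tone frequency."""
    return [a * g for a, g in zip(amplitudes, gains)]


def correct(features, coeffs):
    """Correction section: subtract the terminal's stored coefficients band by band."""
    return [f - c for f, c in zip(features, coeffs)]


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


terminal_gains = [0.7, 1.0, 2.0, 1.5]

# Measured in advance: features of a flat reference sent through the terminal,
# minus the features of the reference itself, stored as this terminal's coefficient set.
reference = [1.0, 1.0, 1.0, 1.0]
measured = analyze(synth(through_terminal(reference, terminal_gains)))
coeffs = [m - r for m, r in zip(measured, analyze(synth(reference)))]

speech = [1.0, 0.5, 0.25, 0.1]
clean = analyze(synth(speech))
received = analyze(synth(through_terminal(speech, terminal_gains)))
print(city_block(received, clean))                   # terminal mismatch, roughly 12.6 dB
print(city_block(correct(received, coeffs), clean))  # essentially 0 after correction
```

Because each stored coefficient equals 20·log10 of the terminal's gain in that band, subtracting it recovers the clean features for any speech input, which is the additivity the text relies on.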

On the other hand, when a carbon microphone is used, its nonlinearity means that additivity does not hold, so correction by subtracting acoustic features as described above is not strictly valid. As a rough approximation, however, the intended purpose can presumably be achieved by measuring the representative frequency characteristic of the carbon microphone in question and applying the same processing on that basis.
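One hypothetical way to obtain a "representative" characteristic for a nonlinear carbon microphone is to measure its response under several conditions (here, input levels) and average band by band. The measurements below are invented, and averaging is an assumed method; the patent does not say how the representative characteristic is derived.

```python
# Hypothetical dB responses of one carbon-microphone terminal at three input levels.
# Because the microphone is nonlinear, the response changes with level.
MEASUREMENTS = {
    "soft": [-4.0, 0.0, 7.0, 5.0],
    "normal": [-3.0, 0.0, 6.0, 4.0],
    "loud": [-2.0, 0.0, 5.0, 3.0],
}


def representative_response(measurements):
    """Band-wise mean of the measured responses, used as an approximate correction set."""
    rows = list(measurements.values())
    return [sum(column) / len(rows) for column in zip(*rows)]


def worst_case_residual(measurements, coeffs):
    """Largest per-band error left after subtracting coeffs, over all measured conditions."""
    return max(abs(m - c) for row in measurements.values() for m, c in zip(row, coeffs))


coeffs = representative_response(MEASUREMENTS)
print(coeffs)                                     # [-3.0, 0.0, 6.0, 4.0]
print(worst_case_residual(MEASUREMENTS, coeffs))  # 1.0
```

The nonzero residual is the price of the approximation: a single coefficient set cannot match every operating condition of a nonlinear transducer.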

As described above, according to this embodiment, the output of the correction section 7 is always a signal in which the telephone terminal's characteristics have been canceled, regardless of which telephone terminal was used, so the matching section can perform good speech matching. Furthermore, if line 12 or 13 in FIG. 1 has its own distinct acoustic characteristics, this effect can also be canceled or reduced in the same way.
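If terminal and line responses are both treated as additive in the dB domain (an assumption that is only approximate for carbon microphones), the correction for a terminal-and-line pair can simply be the sum of the two measured responses. Values are hypothetical.

```python
# Hypothetical per-band responses in dB, measured separately for terminal types and lines.
TERMINAL_RESPONSE = {"carbon": [-3.0, 0.0, 6.0, 4.0], "electret": [0.0, 0.0, 0.0, 0.0]}
LINE_RESPONSE = {12: [0.0, -1.0, -2.0, -4.0], 13: [0.0, 0.0, 0.0, 0.0]}


def pair_correction(terminal_type, line):
    """Correction for a (terminal, line) pair: in the dB domain the two responses add."""
    return [t + l for t, l in zip(TERMINAL_RESPONSE[terminal_type], LINE_RESPONSE[line])]


print(pair_correction("carbon", 12))  # [-3.0, -1.0, 4.0, 0.0]
```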

Effects of the Invention
As described above, the present invention provides a plurality of telephone terminals; a selection section that determines which telephone terminal the input speech comes from and selects it; an analysis section that acoustically analyzes the output of the selection section; a correction section that corrects the output of the analysis section; a standard pattern set; a plurality of sets of correction coefficients to be input to the correction section; a correspondence table that stores a predetermined many-to-one correspondence between the plurality of telephone terminals and the plurality of sets of correction coefficients; and a matching section. This makes it possible to perform good speech recognition that cancels out differences in the acoustic characteristics of the telephone terminal models through which speech is actually input. Furthermore, because the selection section identifies and selects the telephone terminal and its line as a pair, good speech recognition can be achieved by the same method even when the lines have their own distinct acoustic characteristics. The invention thus realizes an excellent telephone speech recognition device.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a telephone speech recognition device according to an embodiment of the present invention; FIG. 2 is a block diagram of a conventional telephone speech recognition device; and FIGS. 3(a), 3(b), and 3(c) show the transmitting-system frequency transmission characteristics of three commercially available telephone terminals.

2: telephone exchange; 3: selection section; 4: matching section; 5: standard pattern set; 6: analysis section; 7: correction section; 8: plurality of sets of correction coefficients; 9: correspondence table; 10, 11: telephone terminals; 12, 13: lines.

Name of agent: Toshio Nakao, patent attorney, and one other.

Claims

A telephone speech recognition device comprising: a plurality of telephone terminals; a selection section that determines which telephone terminal the input speech comes from and selects it; an analysis section that analyzes the speech signal output of the selection section; a correction section that corrects the acoustic feature output of the analysis section; a standard pattern set; a plurality of sets of correction coefficients to be input to the correction section; a correspondence table that stores a predetermined many-to-one correspondence between the plurality of telephone terminals and the plurality of sets of correction coefficients; and a matching section; wherein the selection section determines and selects from which of the plurality of telephone terminals the speech to be recognized was input, a corresponding set of correction coefficients is selected from the plurality of sets of correction coefficients by referring to the correspondence table on the basis of the selection result, the acoustic feature output of the analysis section is corrected by the correction section using the selected set of correction coefficients, and the matching section performs speech matching between the corrected output and the standard pattern set.