JPH012462A

JPH012462A - How to distinguish between audio and control signals

Info

Publication number: JPH012462A
Application number: JP62-158020A
Authority: JP
Inventors: 富永　好彦
Original assignee: 富士電機株式会社
Filing date: 1987-06-25
Publication date: 1989-01-06

Abstract

(57) [Summary] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention] [Industrial application field]

The present invention relates to a method for distinguishing between voice signals and control signals, that is, between "human voice", which consists of a characteristic frequency spectrum, and a "control signal", which consists of a single sine wave. In particular, within the voice frequency band of a telephone line or the like (300 Hz to 3.4 kHz), it makes it possible to distinguish "human voice" from the "control signal" of other connected equipment such as a facsimile machine (in the case of facsimile, the 1100 Hz automatic answering signal generally called the CNG signal; hereinafter referred to as the CNG signal), thereby enabling effective use of telephone lines and the like.

[Conventional technology]

To make effective use of a telephone line, an additional device having one office line connection terminal connected to the telephone line and a plurality of extension terminal connection terminals connected to a telephone and other connected equipment (for example, a facsimile machine) is used to switch a single telephone line between the telephone and the other connected equipment. In such a case, the additional device identifies the signal from the calling side and controls the switching of the telephone line to either the telephone or the other connected equipment.

[Problems to be solved by the invention]

However, when the devices connected to the extension terminal connection terminals are a telephone and other connected equipment such as a facsimile machine, the frequency components of "human voice" from the telephone and those of the "control signal" of the other connected equipment may coincide within the same voice frequency band. For example, "human voice" has frequency components across the voice frequency band of 300 Hz to 3.4 kHz, whereas the CNG signal is at 1100 Hz. In such a case, because "human voice" and the "control signal" share the same frequency component, it has been very difficult for the additional device to distinguish them accurately.

In view of the above, an object of the present invention is to provide an identification method that can distinguish "human voice" from a telephone and a "control signal" from other connected equipment within the voice frequency band of a telephone line or the like.

[Means to solve the problem]

A human voice signal is detected under the AND condition of at least two frequency components, F+nP and F−nP, which differ by n times the pitch frequency P (where n is an integer) from a formant frequency component F equal to the frequency of the control signal. The control signal is detected as the frequency component equal to the formant frequency component F.
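The detection rule stated above can be sketched as a small decision function. This is only an illustrative sketch: the function name `classify` and its three boolean inputs are hypothetical and stand for "energy was detected at F", "at F+nP", and "at F−nP"; the source does not define any software interface.

```python
def classify(f_present: bool, upper_present: bool, lower_present: bool) -> str:
    """Classify one observation from three band detectors (hypothetical names).

    f_present     : a component was detected at F (the control-signal frequency)
    upper_present : a component was detected at F + nP
    lower_present : a component was detected at F - nP
    """
    # Voice is detected by the AND condition of the two sideband components.
    if upper_present and lower_present:
        return "voice"
    # A component at F without both sidebands is treated as the control signal.
    if f_present:
        return "control"
    return "none"


print(classify(True, True, True))     # voice
print(classify(True, False, False))   # control
```

The "none" result is an assumption added for completeness; the text only describes the voice and control-signal cases.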

[Operation]

A control signal consists of a single sine wave; the CNG signal, for example, is a single 1100 Hz sine wave. Human voice, by contrast, has a characteristic spectrum made up mainly of three resonant regions, or formants. The pitch frequency P of these formants corresponds to the fundamental frequency of the voice, which is about 125 Hz for men and about 250 Hz for women. Therefore, by detecting the voice signal under the AND condition of at least two frequency components F+nP and F−nP, which differ from the formant frequency component F by n times the pitch frequency P, the voice signal can be distinguished from the control signal.
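As a rough numerical illustration of this reasoning, the sketch below synthesizes a single 1100 Hz tone and a simplified voice-like signal with components at F−P, F, and F+P, then measures the power at each of the three frequencies. Everything here is an assumption for illustration: the 8000 Hz sample rate, the 0.2 s window, the detection threshold, the single-bin DFT used in place of the analog band-pass filters, and the helper names `tone_power`, `synth`, and `components`. Real speech is far more complex than three equal sine waves.

```python
import math

F_HZ, P_HZ = 1100.0, 125.0   # control-signal frequency and male pitch from the text
RATE_HZ = 8000               # assumed sample rate
THRESHOLD = 0.05             # assumed detection threshold on normalized power


def tone_power(samples, freq_hz, rate_hz=RATE_HZ):
    """Normalized power of `samples` at `freq_hz`, computed as a single DFT bin."""
    n = len(samples)
    w = 2 * math.pi * freq_hz / rate_hz
    re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
    return (re * re + im * im) / (n * n)


def synth(freqs_hz, rate_hz=RATE_HZ, duration_s=0.2):
    """Sum of unit-amplitude sine waves at the given frequencies."""
    n = int(rate_hz * duration_s)
    return [sum(math.sin(2 * math.pi * f * i / rate_hz) for f in freqs_hz)
            for i in range(n)]


def components(samples):
    """Return (F-P present, F present, F+P present) as booleans."""
    return tuple(tone_power(samples, f) > THRESHOLD
                 for f in (F_HZ - P_HZ, F_HZ, F_HZ + P_HZ))


cng = synth([F_HZ])                               # single sine wave
voiced = synth([F_HZ - P_HZ, F_HZ, F_HZ + P_HZ])  # simplified voice-like signal

print(components(cng))     # (False, True, False)
print(components(voiced))  # (True, True, True)
```

Only the voice-like signal satisfies the AND condition on the two sideband components, which is the property the method relies on.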

【Example】

Vowels are normally produced with the vocal cords as the sound source and with the oral cavity held in a fixed shape for each vowel (this is called articulation). Specific frequency components are emphasized according to the articulation of each vowel, producing the frequency spectra that characterize the five vowels a, i, u, e, and o. The dominant frequency components that characterize a vowel are called formants; the formants of the five Japanese vowels are shown in Fig. 2. As Fig. 2 makes clear, human voice consists mainly of three resonant regions, that is, formants. The vibration frequency of the voice corresponds to the fundamental frequency of the voice, that is, the pitch frequency of the formants. The mean and standard deviation of the fundamental frequency are about 125 Hz and about 20.5 Hz for men, and about 250 Hz and about 41.5 Hz for women.

Human voice thus has a characteristic frequency spectrum. Fig. 3 shows the frequency spectra of human voice and of a control signal consisting of a single sine wave: (a) is the human voice and (b) is the control signal, for which the 1100 Hz CNG signal is shown. As Fig. 3 makes clear, because the control signal consists of a single sine wave, it lacks, unlike human voice, the at least two frequency components F+nP and F−nP that differ from the formant frequency component F by n times the pitch frequency P. Therefore, even when the control signal and human voice lie in the same voice frequency band, they can be distinguished by detecting human voice under the AND condition of the frequency components F+nP and F−nP.

Fig. 1 is a block diagram of an identification circuit that applies the identification method of the present invention based on the above principle. In Fig. 1, BPF1, BPF2, and BPF3 are band-pass filters; AMP1, AMP2, AMP3, and AMP4 are amplifiers; AD1, AD2, and AD3 are waveform shaping circuits; AND1 and AND2 are AND circuits; and L is a logic circuit. Filter BPF1 passes the formant frequency component F equal to the control signal frequency, filter BPF2 passes the component F+P, which is higher than F by the pitch frequency P, and filter BPF3 passes the component F−P, which is lower than F by the pitch frequency P. Accordingly, when the facsimile CNG signal is assumed as the "control signal", the passband of BPF1 is 1100 Hz; the passband of BPF2, taking the male and female pitch (fundamental) frequencies into account, is (1100 Hz + 125 Hz) to (1100 Hz + 250 Hz) = 1225 Hz to 1350 Hz; and the passband of BPF3 is likewise (1100 Hz − 250 Hz) to (1100 Hz − 125 Hz) = 850 Hz to 975 Hz. These passbands naturally change accordingly if the standard deviation of the fundamental frequency of the voice is taken into account.

With this configuration, when a "human voice" signal is applied to the input terminal of amplifier AMP1, the signal contains all of the frequency components F, F+P, and F−P, so outputs appear from all of the filters BPF1, BPF2, and BPF3, and inputs are applied to the AND circuits AND1 and AND2 through the waveform shaping circuits AD1, AD2, and AD3 and the amplifiers AMP2, AMP3, and AMP4. Both inputs of AND circuit AND1 become "1", so its output is "1". The single input of AND circuit AND2 also becomes "1", so its output is "1". In contrast, when the CNG signal is applied as a control signal to the input terminal of amplifier AMP1, the signal contains only the frequency component F, so an output appears from filter BPF1 but not from filters BPF2 and BPF3. Both inputs of AND circuit AND1 are therefore "0" and its output is "0", while the single input of AND circuit AND2 is "1" and its output is "1".

In the circuit of Fig. 1, the outputs of AND circuits AND1 and AND2 thus differ between human voice and the control signal, so "human voice" and the "control signal" can be distinguished by configuring the logic circuit L according to the intended use.

Fig. 4 shows a configuration of the logic circuit L that distinguishes the facsimile CNG signal from voice and outputs a signal only when the CNG signal is input. In Fig. 4, INV is an inverter, OR is an OR circuit, and S is an analog switch that turns ON when the output of the OR circuit OR is "0". When the logic circuit L is configured in this way, the analog switch S operates as shown in the following table.

In the logic circuit L of Fig. 4, the switch S therefore turns ON only when the CNG signal is input, so the output of switch S can be used to connect the telephone line to the facsimile machine. Naturally, by changing the configuration of the logic circuit L, switch S can instead be made to turn ON only when a human voice signal is input, and other configurations can also be implemented as appropriate.
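The passband arithmetic and the switch behavior described in this embodiment can be sketched as follows. The passband values come directly from the text. The wiring of the logic circuit is an assumption: Fig. 4 and its table are not reproduced in this text, so the sketch assumes that the inverter INV is applied to the AND2 output and that OR combines it with the AND1 output. That assumption is one arrangement consistent with the stated behavior (S turns ON when the OR output is "0", and only for the CNG signal). The function names `passbands` and `switch_on` are hypothetical.

```python
CNG_HZ = 1100            # facsimile CNG signal frequency from the text
PITCH_MALE_HZ = 125      # approximate male fundamental frequency
PITCH_FEMALE_HZ = 250    # approximate female fundamental frequency


def passbands(f_hz=CNG_HZ, p_low=PITCH_MALE_HZ, p_high=PITCH_FEMALE_HZ):
    """Return (low, high) passband edges in Hz for the three filters."""
    return {
        "BPF1": (f_hz, f_hz),                    # F
        "BPF2": (f_hz + p_low, f_hz + p_high),   # F + P
        "BPF3": (f_hz - p_high, f_hz - p_low),   # F - P
    }


def switch_on(bpf1: bool, bpf2: bool, bpf3: bool) -> bool:
    """Return True when analog switch S is ON (assumed Fig. 4 wiring)."""
    and1 = bpf2 and bpf3           # both sidebands present
    and2 = bpf1                    # AND2 has a single input
    or_out = and1 or (not and2)    # assumed: OR of AND1 and inverted AND2
    return not or_out              # S is ON when the OR output is "0"


print(passbands())                     # BPF2: (1225, 1350), BPF3: (850, 975)
print(switch_on(True, False, False))   # CNG signal -> True
print(switch_on(True, True, True))     # human voice -> False
```

Under this assumed wiring the switch also stays OFF when no filter produces an output, which matches the statement that S turns ON only when the CNG signal is input.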

【Effect of the invention】

According to the present invention, human voice is detected under the AND condition of at least two frequency components F+nP and F−nP, which differ by n times the pitch frequency P from the formant frequency component F equal to the frequency of the control signal. Human voice can therefore be distinguished relatively easily, and with high reliability, from a control signal that does not have these two frequency components F+nP and F−nP.

[Brief explanation of drawings]

Fig. 1 is a block diagram of an identification circuit to which the identification method of the present invention is applied, Fig. 2 is an explanatory diagram of the formants of the five Japanese vowels, Fig. 3 is an explanatory diagram of the frequency spectra of human voice and a control signal, and Fig. 4 is a circuit diagram showing an embodiment of the logic circuit.

AMP1 to AMP4: amplifiers; BPF1 to BPF3: band-pass filters; AD1 to AD3: waveform shaping circuits; AND1, AND2: AND circuits; L: logic circuit.

Claims

[Claims]

1) A method for distinguishing between a voice signal and a control signal in a system that identifies a human voice signal and a control signal transmitted over a telephone line or the like within the same voice frequency band, characterized in that the voice signal is detected under the AND condition of at least two frequency components F+nP and F−nP, which differ by n times the pitch frequency P (where n is an integer) from a formant frequency component F equal to the frequency of the control signal, and the control signal is detected by the formant frequency component F.