JP5812932B2

JP5812932B2 - Voice listening device, method and program thereof

Info

Publication number: JP5812932B2
Application number: JP2012098839A
Authority: JP
Inventors: 哲小橋川; 済央野本; 浩和政瀧; 高橋　敏; 敏高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-04-24
Filing date: 2012-04-24
Publication date: 2015-11-17
Anticipated expiration: 2032-04-24
Also published as: JP2013228459A

Description

The present invention relates to a voice listening device that acquires a transmission signal and a received signal from a telephone, and to a method and a program therefor.

Voice listening devices that acquire the transmission signal and the received signal of a telephone have long existed. One known example is a speech recognition apparatus that performs speech recognition on the transmission signal and the received signal and outputs the listening data as an audio file for playback. When such an apparatus recognizes the transmission signal and the received signal separately, the crosstalk state, in which the transmitting-side speaker and the receiving-side speaker talk at the same time, has been a problem. In the crosstalk state each utterance becomes a disturbed signal, unlike normal speech, and the disturbed transmission and received signals cause recognition errors.

To prevent recognition errors in the crosstalk state, Patent Document 1 discloses a speech recognition apparatus 110 that performs speech interval detection on the transmission signal and the received signal separately, detects the intervals in which only one speaker is talking, and performs speech recognition on those intervals only.

The operation of the conventional speech recognition apparatus 110 is briefly described with reference to its functional configuration shown in FIG. 12. The speech recognition apparatus 110 includes a sidetone suppression processing unit 21, a transmission speech interval detection unit 22, a received speech interval detection unit 23, a speech interval information management unit 24, a transmission signal extraction unit 25, a received signal extraction unit 26, a transmission signal recording unit 16, a received signal recording unit 17, and a speech recognition processing unit 111.

The sidetone suppression processing unit 21 removes the sidetone signal that leaks into the received signal. Sidetone is a small amount of the local terminal's own transmission signal that is output from the local terminal's speaker to make conversation easier. The sidetone signal is added to the received signal by the sidetone circuit 15 in the telephone 19.

The transmission speech interval detection unit 22 detects the speech and non-speech intervals of the transmission signal. The received speech interval detection unit 23 detects the speech and non-speech intervals of the received signal. The speech interval information management unit 24 takes the speech and non-speech intervals of the transmission signal and the received signal as inputs and identifies a transmission speech extraction interval and a received speech interval that are not in the crosstalk state.

The speech recognition processing unit 111 performs speech recognition on the transmission signal in the transmission speech extraction interval and on the received signal in the received speech extraction interval to produce listening data, and outputs an audio file so that it can be played back.

Patent Document 1: JP 2006-343642 A

The conventional speech recognition apparatus 110, which performs speech recognition on the transmission and received signals for the purpose of listening to the speech, has the problem that it cannot extract the received signal in the crosstalk state.

The present invention has been made in view of this problem, and its object is to provide a voice listening device, and a method and a program therefor, that can also extract the received signal in the crosstalk state.

The voice listening device of the present invention includes a sidetone suppression processing unit, a received speech interval detection unit, a received signal extraction unit, and a transmission/reception signal recording unit. The sidetone suppression processing unit takes the received signal and the transmission signal from a telephone as inputs and outputs a sidetone-suppressed received signal in which the sidetone signal has been suppressed. The received speech interval detection unit takes the sidetone-suppressed received signal as input, detects its speech intervals, and outputs received speech interval information. The received signal extraction unit takes the received speech interval information and the received signal as inputs and outputs the portion of the received signal corresponding to the received speech interval information as a sidetone-removed received signal. The transmission/reception signal recording unit records the sidetone-removed received signal and the transmission signal.

According to the voice listening device of the present invention, the received speech interval is detected from the sidetone-suppressed received signal, in which the sidetone signal has been suppressed, and the received signal within that interval is recorded as the received signal. As a result, all of the received signal can be listened to without omission even in the crosstalk state.

FIG. 1 is a diagram showing an example functional configuration of the voice listening device 100 of the present invention.
FIG. 2 is a diagram showing the operation flow of the voice listening device 100.
FIG. 3 is a diagram showing an example functional configuration of the voice listening device 200 of the present invention.
FIG. 4 is a diagram showing an example functional configuration of the voice listening device 300 of the present invention.
FIG. 5 is a diagram showing an example functional configuration of the received signal extraction unit 301.
FIG. 6 is a diagram showing an example of the sidetone-removed received signal output by the received signal extraction unit 301.
FIG. 7 is a diagram showing an example functional configuration of the voice listening device 400 of the present invention.
FIG. 8 is a diagram showing an example functional configuration of the voice listening device 500 of the present invention.
FIG. 9 is a diagram showing an example functional configuration of the received signal extraction unit 503.
FIG. 10 is a diagram showing the operation flow of the received signal extraction unit 503.
FIG. 11 is a diagram showing an example of the sidetone-removed received signal output by the received signal extraction unit 503.
FIG. 12 is a diagram showing an example functional configuration of the conventional speech recognition apparatus 110.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in the drawings are given the same reference numerals, and their description is not repeated.

FIG. 1 shows an example functional configuration of the voice listening device 100 of the present invention, and FIG. 2 shows its operation flow. The voice listening device 100 includes a sidetone suppression processing unit 10, a received speech interval detection unit 20, a received signal extraction unit 30, and a transmission/reception signal recording unit 40. The voice listening device 100 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program. The same applies to the voice listening devices of the other embodiments described later.

The sidetone suppression processing unit 10 takes the received signal and the transmission signal from the telephone 19 as inputs and outputs a sidetone-suppressed received signal in which the sidetone signal has been suppressed (step S10). The telephone 19, made up of a microphone 11, a speaker 12, a transmitting unit 13, a receiving unit 14, and a sidetone circuit 15, is the same as the one shown for the prior art. The reference numeral of the telephone 19 is omitted from the drawing for convenience. The telephone 19 is an ordinary one: the received signal carries the voice of the other party, and the transmission signal carries the voice of the talker on the side receiving the call.

The sidetone suppression processing unit 10 takes the received signal and the transmission signal as inputs, suppresses the transmission signal superimposed on the received signal, and outputs the sidetone-suppressed received signal. The sidetone suppression processing unit 10 can be implemented with a general echo canceller. In this example, the sidetone-suppressed received signal is a signal in which the talker's speech superimposed on the other party's speech has been suppressed.
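The publication only says that a general echo canceller can be used. As a minimal sketch of one common choice, the following uses a normalized LMS (NLMS) adaptive filter; the function name `nlms_suppress` and the parameters `taps`, `mu`, and `eps` are illustrative assumptions, not taken from the patent.

```python
def nlms_suppress(received, transmitted, taps=8, mu=0.5, eps=1e-8):
    """Suppress the sidetone in `received` using `transmitted` as reference.

    Assumption: an NLMS adaptive filter stands in for the "general echo
    canceller" mentioned in the text. Returns the residual signal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent transmitted samples, newest first
    residual = []
    for r, s in zip(received, transmitted):
        history = [s] + history[:-1]
        # Estimate of the sidetone component present in the received sample.
        estimate = sum(w * h for w, h in zip(weights, history))
        error = r - estimate
        # Normalized update: step size scaled by the reference energy.
        energy = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / energy
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

Because the residual is used here only for detecting speech intervals, an aggressive setting that distorts the waveform is acceptable in this role.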

The received speech interval detection unit 20 takes the sidetone-suppressed received signal as input, detects its speech intervals, and outputs received speech interval information (step S20). The received speech interval information is interval information that represents the received speech intervals. A general method based on volume (power) may be used for speech interval detection. Alternatively, speech interval detection may use a speech model and a non-speech model based on a Gaussian mixture model (GMM). When GMM-based detection is used, training the GMMs on the sidetone-suppressed received signal described above can improve speech/non-speech discrimination.
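The power-based option can be sketched as follows. This is an illustration under stated assumptions rather than the patented detector: samples are taken to be floats in [-1.0, 1.0], and the frame length and threshold (`frame_len`, `threshold_db`) are hypothetical values.

```python
import math


def detect_speech_intervals(samples, frame_len=160, threshold_db=-40.0):
    """Return (start, end) sample-index pairs judged to contain speech.

    A frame counts as speech when its mean power, in dB relative to full
    scale, is at or above `threshold_db`. Adjacent speech frames are merged.
    """
    intervals = []
    start = None
    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        power = sum(x * x for x in frame) / len(frame)
        level_db = 10.0 * math.log10(power) if power > 0.0 else float("-inf")
        if level_db >= threshold_db:
            if start is None:
                start = pos  # a speech interval begins at this frame
        elif start is not None:
            intervals.append((start, pos))  # interval ended at this frame
            start = None
    if start is not None:
        intervals.append((start, len(samples)))
    return intervals
```

In the device described here, the input would be the sidetone-suppressed received signal, so that the talker's own voice does not trigger the detector.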

The received signal extraction unit 30 takes the received speech interval information output by the received speech interval detection unit 20 and the received signal as inputs and outputs the portion of the received signal corresponding to the received speech interval information as the sidetone-removed received signal (step S30). Outside the intervals given by the received speech interval information, the received signal extraction unit 30 sets the amplitude of the sidetone-removed received signal to 0 (silence). The sidetone-removed received signal therefore consists only of the received signal in the intervals that the received speech interval detection unit 20 detected as received speech.
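The extraction step amounts to gating the original received signal with the detected intervals. A minimal sketch, assuming intervals are given as (start, end) sample-index pairs; the function name is illustrative:

```python
def extract_received_signal(received, intervals):
    """Keep `received` inside the given intervals and output silence elsewhere."""
    out = [0.0] * len(received)
    for start, end in intervals:
        # Copy the untouched received samples; the suppressed signal was
        # used only to find the intervals, not as the output waveform.
        out[start:end] = received[start:end]
    return out
```

Note that the samples copied are from the original received signal, which is why the output keeps its low distortion even when the detector ran on a heavily suppressed signal.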

Because the sidetone-removed received signal is the received signal in the intervals detected from a signal in which the talker's voice has been suppressed, the other party's speech (the received signal) can be extracted without omission even in the crosstalk state. In other words, the received signal can be listened to even when the received signal and the transmission signal overlap.

The transmission/reception signal recording unit 40 records the sidetone-removed received signal output by the received signal extraction unit 30 and the transmission signal (step S40). The transmission/reception signal recording unit 40 generates a file of the transmission signal and an audio file of the sidetone-removed received signal that is time-synchronized with the transmission signal. The audio may be stored as two separate files or as a single stereo file.
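For the single stereo file option, a sketch using Python's standard `wave` module is shown below. The channel assignment (transmission on the left, sidetone-removed received signal on the right), the 16-bit sample format, and the 8 kHz default rate are assumptions made for illustration.

```python
import io
import struct
import wave


def write_stereo_wav(target, transmitted, received, sample_rate=8000):
    """Write two time-aligned float signals as one 16-bit stereo WAV file.

    `target` is a file path or a writable binary file object.
    """
    frames = bytearray()
    for t, r in zip(transmitted, received):
        for x in (t, r):  # interleave: left = transmitted, right = received
            clipped = max(-1.0, min(1.0, x))
            frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

Interleaving the two channels sample by sample keeps them synchronized by construction, which matches the requirement that the recorded signals stay aligned in time.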

As described above, the voice listening device 100 detects the received speech interval information from the sidetone-suppressed received signal and records the received signal corresponding to that information as the sidetone-removed received signal, so all of the received signal can be listened to without omission even in the crosstalk state. Because the sidetone suppression processing unit 10 suppresses the sidetone solely for the purpose of detecting the received speech intervals, the suppression amount can be made large. In other words, distortion of the waveform is acceptable, so the received speech intervals can be detected accurately. The received signal extracted according to that accurate interval information has little distortion, which makes it easy to listen to and easy to use for speech recognition and similar processing.

As indicated by the speech recognition processing unit 50 drawn with a broken line in FIG. 1, the sidetone-removed received signal and the transmission signal recorded in the transmission/reception signal recording unit 40 may be converted to text data by speech recognition in the speech recognition processing unit 50. In addition, an audio viewing unit 60 may associate the text data of the recognition result with the audio files of the sidetone-removed received signal and the transmission signal so that the audio files can be reviewed efficiently. The audio viewing unit 60 is, for example, a functional unit containing application software that displays the text data on a display device and plays back the audio files.

Next, a voice listening device 200 that extracts the received signal from a sidetone-suppressed received signal is described.

FIG. 3 shows an example functional configuration of the voice listening device 200 of the present invention. The voice listening device 200 differs from the voice listening device 100 in that it additionally includes a low sidetone suppression processing unit 201. The low sidetone suppression processing unit 201 takes the received signal and the transmission signal as inputs and outputs to the received signal extraction unit 30 a lightly sidetone-suppressed received signal, that is, one with a low suppression amount and a large original-sound addition rate.

The received signal extraction unit 30 takes the received speech interval information output by the received speech interval detection unit 20 and the lightly sidetone-suppressed received signal as inputs and outputs the portion of the lightly sidetone-suppressed received signal corresponding to the received speech interval information as the sidetone-removed received signal.

The echo suppression amount of the low sidetone suppression processing unit 201 is set lower than that of the sidetone suppression processing unit 10. By comparison, the sidetone suppression processing unit 10 is set to a high echo suppression amount, expressed through the original-sound addition rate A (for example, A = 0.1). A high echo suppression amount distorts the received signal, but the received speech interval information can still be detected accurately.

The lightly sidetone-suppressed received signal output by the low sidetone suppression processing unit 201, which is set to a low echo suppression amount of, for example, about A = 0.6, is a received signal with little distortion in which the transmission signal has been suppressed. The received signal extraction unit 30 extracts the sidetone-removed received signal from the received speech interval information and this low-distortion signal, so the resulting sidetone-removed received signal produces few recognition errors when, for example, it is passed to speech recognition.
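This passage does not define the original-sound addition rate A. One plausible reading, assumed here purely for illustration, is a linear blend in which a fraction A of the unprocessed received signal is added back to the fully suppressed output, so that a small A means strong suppression and a large A means light suppression. The function name and the blend formula are hypothetical.

```python
def blend_original(received, fully_suppressed, a):
    """Blend the original received signal back into the suppressed signal.

    Assumed definition (not from the source): out = a * original
    + (1 - a) * suppressed, with 0 <= a <= 1.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be in [0, 1]")
    return [a * r + (1.0 - a) * s
            for r, s in zip(received, fully_suppressed)]
```

Under this reading, A = 0.1 would feed the interval detector and A = 0.6 would feed the extraction stage, matching the two roles described in the text.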

Thus, in the voice listening device 200, the echo suppression amount applied to the signal used by the received speech interval detection unit 20 is raised so that accurate received speech interval information can be detected, and a low-distortion sidetone-removed received signal is extracted from the low-distortion, lightly sidetone-suppressed received signal according to that accurate interval information.

Next, a voice listening device 300 is described that superimposes noise where the amplitude of the sidetone-removed received signal would otherwise be set to 0, that is, outside the received speech intervals.

FIG. 4 shows an example functional configuration of the voice listening device 300 of the present invention. The voice listening device 300 differs from the voice listening device 100 in the operation of its received signal extraction unit 301. The received signal extraction unit 301 superimposes low-amplitude noise (for example, white noise) on the portions where the amplitude of the sidetone-removed received signal described above is 0 (silence).

The received signal extraction unit 301 outputs the received signal, or the lightly sidetone-suppressed received signal, corresponding to the received speech interval information as the sidetone-removed received signal, and outputs low-level noise as the sidetone-removed received signal in the intervals that do not correspond to the received speech interval information.

Compared with the voice listening devices 100 and 200 described above, the voice listening device 300 produces a sidetone-removed received signal whose amplitude does not change abruptly. This makes it easier for the perceptual completion reported as the auditory continuity effect (see, for example, http://www.brl.ntt.co.jp/IllusionForum/a/continuityIllusion/ja/index.html) to take effect, which helps prevent the listener from mishearing. Besides preventing mishearing, the voice listening device 300 also improves the stability of speech recognition accuracy.

FIG. 5 shows an example functional configuration of the received signal extraction unit 301. The received signal extraction unit 301 includes noise generating means 3010 and noise superimposing means 3011. The noise generating means 3010 generates white noise with a very low volume level of, for example, about -80 dB. White noise can be generated easily by conventional methods, for example by using normally distributed random numbers.

The noise superimposing means 3011 takes the received speech interval information and the received signal, or the lightly sidetone-suppressed received signal, as inputs and outputs a sidetone-removed received signal that contains the received signal, or the lightly sidetone-suppressed received signal, in the intervals corresponding to the received speech interval information and white noise in all other intervals.
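The two means can be sketched together as follows. This is an illustration that assumes full scale is 1.0, so that -80 dB corresponds to a Gaussian standard deviation of 10^(-80/20) = 1e-4; the function name and the `seed` parameter are hypothetical.

```python
import random


def fill_gaps_with_noise(received, intervals, noise_db=-80.0, seed=0):
    """Output `received` inside the intervals and faint white noise elsewhere."""
    rng = random.Random(seed)
    sigma = 10.0 ** (noise_db / 20.0)  # dB relative to full scale -> amplitude
    # Start from white noise drawn from a normal distribution ...
    out = [rng.gauss(0.0, sigma) for _ in received]
    # ... then overwrite the detected speech intervals with the real signal.
    for start, end in intervals:
        out[start:end] = received[start:end]
    return out
```

A fixed seed is used only to make the sketch reproducible; a recording system would normally not need that.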

FIG. 6 shows an example of the sidetone-removed received signal output by the received signal extraction unit 301. The first row is the received signal, and the second row is the received speech interval information that the received speech interval detection unit 20 outputs for that received signal. The third row is the white noise output by the noise generating means 3010. The fourth row is the sidetone-removed received signal output by the noise superimposing means 3011 when it is given the received speech interval information, the received signal, and the white noise. The figure shows that the received signal is output in the intervals corresponding to the received speech interval information and white noise is output in the other intervals.

When the volume level of the received signal is low, the drop in speech recognition rate caused by the superimposed noise can become a problem. The level of the superimposed noise may therefore be varied according to the volume level of the received signal or its S/N ratio. In that case, S/N ratio detecting means 3012, which detects the S/N ratio of the received signal or of the lightly sidetone-suppressed received signal, is provided in the received signal extraction unit 301, and the amplitude of the white noise generated by the noise generating means 3010 may be controlled automatically according to the detected S/N ratio so that, for example, the S/N ratio is 30 dB or more. The S/N ratio detecting means 3012 may be replaced by power detecting means that detects the power of the received signal or of the lightly sidetone-suppressed received signal. Although Embodiment 3 describes superimposing white noise on the sidetone-removed received signal, the noise need not be limited to white noise. For example, pink noise, whose power spectral density is inversely proportional to frequency, may be used. Any other noise that provides a similar effect may also be used.
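One way to realize this level control is to cap the noise level at whichever is lower: the default level or the signal level minus the target S/N ratio. The sketch below uses the power-detection variant mentioned in the text; the function name and the default values are assumptions for illustration.

```python
import math


def choose_noise_level_db(signal, target_snr_db=30.0, default_noise_db=-80.0):
    """Pick a noise level (dB re full scale) that keeps S/N >= target_snr_db."""
    if not signal:
        return default_noise_db
    power = sum(x * x for x in signal) / len(signal)
    if power <= 0.0:
        return default_noise_db  # nothing to protect; use the default level
    signal_db = 10.0 * math.log10(power)
    # Lower the noise below the default only when the signal is too quiet.
    return min(default_noise_db, signal_db - target_snr_db)
```

For a quiet signal at about -60 dB this yields a noise level of about -90 dB, while a signal at normal level leaves the default of -80 dB unchanged.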

The description above covers the case in which low-level noise is output as the sidetone-removed received signal in the intervals that do not correspond to the received speech interval information. When the superimposed noise is small, however, its effect on the speech it is added to is also small, so low-level noise may instead be superimposed on the entire received speech, including the intervals corresponding to the received speech interval information. The auditory continuity effect is obtained in the same way.

Next, a voice listening device 400 is described in which the received signal extraction unit lowers the amplitude of the sidetone-removed received signal in intervals where the volume level on the transmitting side is excessively high.

FIG. 7 shows an example functional configuration of the voice listening device 400. The voice listening device 400 differs from the voice listening device 100 in that it includes a transmission speech interval detection unit 401 and in the operation of its received signal extraction unit 402.

The transmission speech interval detection unit 401 takes the transmission signal as input, performs speech interval detection on it to find the transmission speech intervals, detects the volume level within each transmission speech interval, and outputs the volume level together with the transmission speech interval. The speech interval detection method is the same as that of the received speech interval detection unit 20 described above.

The received signal extraction unit 402 refers to the volume level output by the transmission speech interval detection unit 401 and, when that volume level is at or above a predetermined value, lowers the amplitude of the received signal in the corresponding interval. For example, when the volume level of the transmission signal is -20 dB or higher, with α defined as the ratio of the transmission signal's volume level to the maximum volume level (the largest volume level the voice listening device 400 can handle), the amplitude of the sidetone-removed received signal covering that transmission speech interval is reduced to, for example, (1 - α) times its value.
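The attenuation rule can be sketched as below. The text does not say how the volume level is measured, so this sketch assumes the RMS amplitude of each transmission speech interval on a linear scale with a maximum level of 1.0; the function and parameter names are illustrative.

```python
import math


def attenuate_when_talker_is_loud(received, transmitted, send_intervals,
                                  threshold_db=-20.0, max_level=1.0):
    """Scale `received` by (1 - alpha) where the transmission signal is loud.

    alpha = (RMS level of the transmission interval) / max_level, applied
    only when that level is at or above `threshold_db` (assumed measure).
    """
    out = list(received)
    for start, end in send_intervals:
        segment = transmitted[start:end]
        if not segment:
            continue
        rms = math.sqrt(sum(x * x for x in segment) / len(segment))
        level_db = (20.0 * math.log10(rms / max_level)
                    if rms > 0.0 else float("-inf"))
        if level_db >= threshold_db:
            alpha = min(rms / max_level, 1.0)
            for i in range(start, min(end, len(out))):
                out[i] = received[i] * (1.0 - alpha)
    return out
```

With this rule a talker at half of full scale halves the received amplitude in that interval, while a quiet talker leaves the received signal untouched.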

In this way, the volume level of the sidetone-removed received signal can be lowered in intervals where the transmitting-side volume level is excessively high, which keeps the leaked speech of the transmission signal from being heard at high volume. The approach of the voice listening device 400 may be combined with the voice listening devices 200 and 300, and the same effect is obtained in that case as well.

Next, a voice listening device 500 is described that outputs, as the sidetone-removed received signal, the received signal corresponding to a received speech interval to which margin time has been added before and after the received speech interval information.

FIG. 8 shows an example functional configuration of the voice listening device 500. The voice listening device 500 includes the sidetone suppression processing unit 10, a received/transmission signal recording unit 501, a speech interval detection unit 502, a received signal extraction unit 503, and a sidetone-removed received signal recording unit 504. As its reference numeral indicates, the sidetone suppression processing unit 10 is the same as that of the voice listening device 100.

The sidetone suppression processing unit 10 takes the received signal and the transmission signal from a telephone (not shown) as inputs and outputs a sidetone-suppressed received signal in which the sidetone signal has been suppressed.

The received/transmission signal recording unit 501 records the sidetone-suppressed received signal output by the sidetone suppression processing unit 10 together with the transmission signal. The voice listening device 500 does not operate in real time; instead it works, for example, by playing back an audio file in which a conversation between the receiving party and the talker was recorded, and it processes the sidetone-suppressed received signal and the transmission signal after they have been recorded.

The speech interval detection unit 502 reads the sidetone-suppressed received signal and the transmission signal from the received/transmission signal recording unit 501, detects the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of the speech intervals of the sidetone-suppressed received signal and the start times Su_1 to Su_n and end times Sd_1 to Sd_n of the speech intervals of the transmission signal, and outputs them as speech interval information.

The reception signal extraction unit 503 receives the voice section information (Ru_1 to Ru_n, Rd_1 to Rd_n, Su_1 to Su_n, Sd_1 to Sd_n) and the reception signal. It outputs, as the side-tone-removed reception signal, the reception signal corresponding to reception voice sections obtained by adding margin times before and after the voice section information of the side-tone-suppressed reception signal.

The side-tone-removed reception signal recording unit 504 records the side-tone-removed reception signal output by the reception signal extraction unit 503.

Because the voice listening device 500 can add margin times before and after the side-tone-removed reception signal, it can prevent voice section detection errors, for example the loss of part of a voice section when the side-tone-removed reception signal begins with a consonant.

FIG. 9 shows a more specific example of the functional configuration of the reception signal extraction unit 503. The reception signal extraction unit 503 includes margin time adding means 512 and side-tone-removed reception signal extracting means 522. The margin time adding means 512 receives the voice section information (Ru_1 to Ru_n, Rd_1 to Rd_n, Su_1 to Su_n, Sd_1 to Sd_n) and determines whether an end time Sd_1 to Sd_n of a voice section of the transmission signal lies within a predetermined time Rms before a start time Ru_1 to Ru_n of a voice section of the side-tone-suppressed reception signal. It also determines whether a start time Su_1 to Su_n of a voice section of the transmission signal lies within a predetermined time Rme after an end time Rd_1 to Rd_n of a voice section of the side-tone-suppressed reception signal. When no such end time or start time lies within the predetermined time, a margin time of the predetermined length is added before or after the reception voice section information, as applicable.

A further description is given with reference to the operation flow of the voice section detection unit 502 and the reception signal extraction unit 503 shown in FIG. 10. The voice section detection unit 502 reads the side-tone-suppressed reception signal and the transmission signal from the reception/transmission signal recording unit 501 and detects the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of the side-tone-suppressed reception signal (step S5020). A start time is the time at which a reception voice section rises, and an end time is the time at which it falls. Similarly, the start times Su_1 to Su_n and end times Sd_1 to Sd_n of the transmission voice sections are detected (step S5021). FIG. 11 illustrates two pieces of reception voice section information, with start times Ru_1 and Ru_2 and end times Rd_1 and Rd_2, and two pieces of transmission voice section information, with start times Su_1 and Su_2 and end times Sd_1 and Sd_2. The start times and end times are numbered in order of elapsed time.

The reception signal extraction unit 503 initializes a variable i representing the voice section number (i = 1) (step S5121). It then determines whether the end time Sd_1 of the transmission signal lies within the time Rms (for example, 0.5 seconds) before the start time Ru_1 of the first voice section of the side-tone-suppressed reception signal (step S5122). If the end time Sd_1 does not lie within the preceding Rms seconds (No in step S5122), the start time Ru_1 of the first voice section is set to Ru_1 = Ru_1 - Rms (step S5123). If the end time Sd_1 does lie within the preceding Rms seconds (Yes in step S5122), the start time Ru_1 of the first voice section is set to the first end time Sd_1 of the transmission signal (Ru_1 = Sd_1) (step S5124). Here, for example, Rms = 0.5 seconds and Rme = 0.1 seconds, although other predetermined times may be used. Rms and Rme differ because an utterance may begin with a consonant, which is quiet and hard to detect as a voice section, so a longer margin time is preferable at the start. By contrast, Japanese words end in vowels, so a short margin time at the end of the reception signal causes little inconvenience.

Next, it is determined whether the start time Su_2 of the transmission signal lies within Rme seconds after the end time Rd_1 of the first voice section of the side-tone-suppressed reception signal (step S5125). If the start time Su_2 does not lie within the following Rme seconds (No in step S5125), the end time Rd_1 of the first voice section is set to Rd_1 = Rd_1 + Rme (step S5126). If the start time Su_2 does lie within the following Rme seconds (Yes in step S5125), the end time Rd_1 of the first voice section is set to the start time Su_2 of the transmission signal (Rd_1 = Su_2) (step S5127).

The above processing continues, incrementing the variable i representing the voice section number, until all voice sections have been processed (the No loop of step S5129). In this way, margin times of predetermined widths are added before and after each voice section of the side-tone-suppressed reception signal.
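The margin rule of steps S5122 to S5127 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function name `add_margins` and the tuple representation of voice sections are hypothetical, as are two details the text leaves open: when several transmission boundaries fall inside a window, the sketch picks the one closest to the reception section, and it clamps start times at zero.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # (start time, end time) in seconds


def add_margins(received: List[Section], sent: List[Section],
                rms: float = 0.5, rme: float = 0.1) -> List[Section]:
    """Widen each reception voice section (Ru_i, Rd_i) by up to Rms before
    and Rme after, stopping at an adjacent transmission section boundary."""
    padded: List[Section] = []
    for ru, rd in received:
        # S5122: transmission end times Sd within Rms seconds before Ru_i.
        ends = [sd for _, sd in sent if ru - rms <= sd <= ru]
        # S5124 if one exists (assumption: take the latest), else S5123.
        new_ru = max(ends) if ends else ru - rms
        # S5125: transmission start times Su within Rme seconds after Rd_i.
        starts = [su for su, _ in sent if rd <= su <= rd + rme]
        # S5127 if one exists (assumption: take the earliest), else S5126.
        new_rd = min(starts) if starts else rd + rme
        # Assumption: a section cannot start before time zero.
        padded.append((max(new_ru, 0.0), new_rd))
    return padded
```

With the example values Rms = 0.5 s and Rme = 0.1 s, a reception section that follows a transmission section by 0.2 s is extended back only to the end of that transmission section, whereas an isolated reception section is extended by the full 0.5 s.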

FIG. 11 shows an example of the side-tone-removed reception signal heard with the voice listening device 500. The first row is the reception voice section information of the side-tone-suppressed reception signal, the second row is the transmission signal, the third row is the transmission voice section information of the transmission signal, the fourth row is the margin-added reception voice section information output by the margin time adding means 512, and the fifth row is the side-tone-removed reception signal output by the reception signal extraction unit 503. As the broken lines indicate, the output of the margin time adding means 512 has margin times of predetermined length added before and after the reception voice section information. The waveform of the side-tone-removed reception signal corresponding to the margin-added reception voice section information extends further forward and backward in time than the waveform shown in FIG. 6.
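The extraction step that produces the fifth row can be sketched as follows, assuming the reception signal is available as a list of samples. The name `extract_sections` is hypothetical, and filling everything outside the margin-added sections with zeros is an assumption made for brevity. One of the claims instead outputs low-level noise in those sections.

```python
from typing import List, Sequence, Tuple

Section = Tuple[float, float]  # (start time, end time) in seconds


def extract_sections(samples: Sequence[float], sections: List[Section],
                     sample_rate: int) -> List[float]:
    """Keep the samples that fall inside the given sections and silence
    the rest. Hypothetical stand-in for extracting means 522."""
    out = [0.0] * len(samples)  # assumption: silence outside the sections
    for start, end in sections:
        # Convert section times to sample indices, clipped to the signal.
        lo = max(0, int(round(start * sample_rate)))
        hi = min(len(samples), int(round(end * sample_rate)))
        if lo < hi:
            out[lo:hi] = samples[lo:hi]
    return out
```

Passing the original reception signal together with the margin-added sections gives a signal in which quiet speech onsets just outside the detected sections are retained.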

As described above, the voice listening device 500 can extract and record, as the side-tone-removed reception signal, the reception signal corresponding to reception voice section information to which margin times of predetermined length have been added. The functional configuration of the voice listening device 500 is not limited to the example shown in FIG. 8. For example, although the above description has the voice section detection unit 502 detect the voice sections of both the transmission signal and the reception signal, the voice sections of the reception signal may instead be detected by the reception voice section detection unit 20 and those of the transmission signal by the transmission voice section detection unit 401 described in Embodiment 4, with the voice section information obtained from each. Likewise, although the example includes the reception/transmission signal recording unit 501, the audio data before voice acquisition processing may instead be recorded in the transmission/reception signal recording unit 40 described in Embodiments 1 to 4. Thus, the functional configuration of Embodiment 5 is not limited to that shown in FIG. 8.

The idea of adding margin times of predetermined width before and after the voice sections of the side-tone-suppressed reception signal can also be combined with Embodiments 2 to 4.

According to the voice listening devices described in the above embodiments, reception voice section information is detected from the side-tone-suppressed reception signal, in which the side-tone signal has been suppressed, and the reception signal corresponding to that reception voice section information is recorded as the side-tone-removed reception signal. All of the reception signal can therefore be heard without omission, even in a crosstalk state.

The processes described for the above methods and devices need not be executed in time series in the order described. They may also be executed in parallel or individually, depending on the processing capability of the device executing them or as needed.

When the processing means of the above devices are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on the computer implements the processing means of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be implemented in hardware.

Claims

A voice listening device comprising:
a side-tone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception voice section detection unit that receives the side-tone-suppressed reception signal x_2, detects voice sections of the side-tone-suppressed reception signal x_2, and outputs reception voice section information;
a side-tone low-suppression processing unit that receives the reception signal x_1 and the transmission signal and outputs a side-tone low-suppressed reception signal x_4 having a low suppression amount with a large original-sound addition rate;
a reception signal extraction unit that receives the reception voice section information and the side-tone low-suppressed reception signal x_4 and, regardless of whether or not a crosstalk state exists, outputs the side-tone low-suppressed reception signal x_4 corresponding to the reception voice section information as a side-tone-removed reception signal x_3; and
a transmission/reception signal recording unit that records the side-tone-removed reception signal x_3 and the transmission signal.

A voice listening device comprising:
a side-tone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception voice section detection unit that receives the side-tone-suppressed reception signal x_2, detects voice sections of the side-tone-suppressed reception signal x_2, and outputs reception voice section information;
a reception signal extraction unit that receives the reception voice section information and the reception signal x_1 and, regardless of whether or not a crosstalk state exists, outputs the reception signal x_1 corresponding to the reception voice section information as a side-tone-removed reception signal x_3, and outputs the side-tone-removed reception signal x_3 for sections not corresponding to the reception voice section information as noise with a low volume level; and
a transmission/reception signal recording unit that records the side-tone-removed reception signal x_3 and the transmission signal.

A voice listening device comprising:
a side-tone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception voice section detection unit that receives the side-tone-suppressed reception signal x_2, detects voice sections of the side-tone-suppressed reception signal x_2, and outputs reception voice section information;
a transmission voice section detection unit that detects voice sections of the transmission signal as transmission voice sections, detects the volume level in each transmission voice section, and outputs the transmission voice sections and their volume levels;
a reception signal extraction unit that receives the reception voice section information and the reception signal x_1 and, regardless of whether or not a crosstalk state exists, outputs the reception signal x_1 corresponding to the reception voice section information as a side-tone-removed reception signal x_3, and that refers to the volume level output by the transmission voice section detection unit and, when the volume level is higher than a predetermined value, performs control to reduce the amplitude of the reception signal x_1 corresponding to that section; and
a transmission/reception signal recording unit that records the side-tone-removed reception signal x_3 and the transmission signal.

A voice listening device comprising:
a side-tone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception/transmission signal recording unit that records the side-tone-suppressed reception signal x_2 and the transmission signal;
a voice section detection unit that reads the side-tone-suppressed reception signal x_2 and the transmission signal from the reception/transmission signal recording unit, detects the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of a plurality of voice sections of the side-tone-suppressed reception signal x_2 and the start times Su_1 to Su_n and end times Sd_1 to Sd_n of a plurality of voice sections of the transmission signal, and outputs them as voice section information;
a reception signal extraction unit that receives the voice section information and the reception signal x_1 and, regardless of whether or not a crosstalk state exists, outputs, as a side-tone-removed reception signal x_3, the reception signal x_1 corresponding to reception voice sections obtained by adding margin times before and after the voice section information of the side-tone-suppressed reception signal x_2; and
a side-tone-removed reception signal recording unit that records the side-tone-removed reception signal x_3.

A voice listening method comprising:
a side-tone suppression process of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception voice section detection process of receiving the side-tone-suppressed reception signal x_2, detecting voice sections of the side-tone-suppressed reception signal x_2, and outputting reception voice section information;
a side-tone low-suppression process of receiving the reception signal x_1 and the transmission signal and outputting a side-tone low-suppressed reception signal x_4 having a low suppression amount with a large original-sound addition rate;
a reception signal extraction process of receiving the reception voice section information and the side-tone low-suppressed reception signal x_4 and, regardless of whether or not a crosstalk state exists, outputting the side-tone low-suppressed reception signal x_4 corresponding to the reception voice section information as a side-tone-removed reception signal x_3; and
a transmission/reception signal recording process of recording the side-tone-removed reception signal x_3 and the transmission signal.

A voice listening method comprising:
a side-tone suppression process of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception voice section detection process of receiving the side-tone-suppressed reception signal x_2, detecting voice sections of the side-tone-suppressed reception signal x_2, and outputting reception voice section information;
a reception signal extraction process of receiving the reception voice section information and the reception signal x_1 and, regardless of whether or not a crosstalk state exists, outputting the reception signal x_1 corresponding to the reception voice section information as a side-tone-removed reception signal x_3, and outputting the side-tone-removed reception signal x_3 for sections not corresponding to the reception voice section information as noise with a low volume level; and
a transmission/reception signal recording process of recording the side-tone-removed reception signal x_3 and the transmission signal.

A voice listening method comprising:
a side-tone suppression process of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception voice section detection process of receiving the side-tone-suppressed reception signal x_2, detecting voice sections of the side-tone-suppressed reception signal x_2, and outputting reception voice section information;
a transmission voice section detection process of detecting voice sections of the transmission signal as transmission voice sections, detecting the volume level in each transmission voice section, and outputting the transmission voice sections and their volume levels;
a reception signal extraction process of receiving the reception voice section information and the reception signal x_1 and, regardless of whether or not a crosstalk state exists, outputting the reception signal x_1 corresponding to the reception voice section information as a side-tone-removed reception signal x_3, and of referring to the volume level output by the transmission voice section detection process and, when the volume level is greater than or equal to a predetermined value, performing control to reduce the amplitude of the reception signal x_1 corresponding to that section; and
a transmission/reception signal recording process of recording the side-tone-removed reception signal x_3 and the transmission signal.

A voice listening method comprising:
a side-tone suppression process of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a side-tone-suppressed reception signal x_2 in which the side-tone signal is suppressed;
a reception/transmission signal recording process of recording the side-tone-suppressed reception signal x_2 and the transmission signal;
a voice section detection process of reading the side-tone-suppressed reception signal x_2 and the transmission signal recorded in the reception/transmission signal recording process, detecting the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of a plurality of voice sections of the side-tone-suppressed reception signal x_2 and the start times Su_1 to Su_n and end times Sd_1 to Sd_n of a plurality of voice sections of the transmission signal, and outputting them as voice section information;
a reception signal extraction process of receiving the voice section information and the reception signal x_1 and, regardless of whether or not a crosstalk state exists, outputting, as a side-tone-removed reception signal x_3, the reception signal x_1 corresponding to reception voice sections obtained by adding margin times before and after the voice section information of the side-tone-suppressed reception signal x_2; and
a side-tone-removed reception signal recording process of recording the side-tone-removed reception signal x_3.

A program for causing a computer to function as the voice listening device according to any one of claims 1 to 4.