JP5812932B2 - Voice listening device, method and program thereof - Google Patents


Info

Publication number
JP5812932B2
Authority
JP
Japan
Prior art keywords
signal
tone
voice
received
transmission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2012098839A
Other languages
Japanese (ja)
Other versions
JP2013228459A (en)
Inventor
哲 小橋川
済央 野本
浩和 政瀧
高橋 敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2012098839A
Publication of JP2013228459A
Application granted
Publication of JP5812932B2
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Telephone Function (AREA)

Description

The present invention relates to a voice listening device that acquires a transmitted signal and a received signal from a telephone, and to a method and a program for the same.

Voice listening devices that acquire the transmitted and received signals of a telephone have long existed. One known type is a speech recognition device. It performs speech recognition on the transmitted and received signals and outputs the listening data as an audio file so that it can be played back.

When such a device recognizes the transmitted and received signals separately, crosstalk has been a problem. Crosstalk occurs when the talker on the transmitting side and the talker on the receiving side speak at the same time. During crosstalk, each utterance becomes a disturbed signal that differs from normal speech, and these disturbed signals cause speech misrecognition.

Patent Document 1 discloses a speech recognition apparatus 110 intended to prevent misrecognition during crosstalk. It performs voice section detection on each of the transmitted and received signals and detects the sections in which only one talker is speaking. It then performs speech recognition only on those sections.

The operation of the conventional speech recognition apparatus 110 is briefly described with reference to its functional configuration, shown in FIG. 12. The speech recognition apparatus 110 includes:
- a side tone suppression processing unit 21;
- a transmitted voice section detection unit 22;
- a received voice section detection unit 23;
- a voice section information management unit 24;
- a transmitted signal extraction unit 25;
- a received signal extraction unit 26;
- a transmitted signal recording unit 16;
- a received signal recording unit 17;
- a speech recognition processing unit 111.

The side tone suppression processing unit 21 removes the side tone signal that leaks into the received signal. Side tone is the local terminal's own transmitted signal, played at a low level from the local terminal's speaker to make conversation easier. The side tone circuit 15 in the telephone 19 adds the side tone signal to the received signal.

The transmitted voice section detection unit 22 detects the voice sections and non-voice sections of the transmitted signal. The received voice section detection unit 23 does the same for the received signal. From these sections, the voice section information management unit 24 identifies the transmitted voice extraction sections and received voice extraction sections that are not in a crosstalk state.
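The prior-art selection rule can be sketched in a few lines. The sketch assumes frame-level voice activity flags are already available for both signals. The function name `single_talker_frames` and the frame-flag representation are illustrative assumptions, not taken from Patent Document 1. Frames where both talkers are active are discarded from both outputs, which is why the received signal is lost during crosstalk.

```python
def single_talker_frames(tx_active, rx_active):
    """Prior-art style selection: keep only frames where exactly one side talks.

    tx_active, rx_active: per-frame voice activity flags (lists of bool).
    Returns (tx_only, rx_only); crosstalk frames are False in both.
    """
    if len(tx_active) != len(rx_active):
        raise ValueError("flag sequences must have equal length")
    # A frame survives only if its own side is active and the other is not.
    tx_only = [t and not r for t, r in zip(tx_active, rx_active)]
    rx_only = [r and not t for t, r in zip(tx_active, rx_active)]
    return tx_only, rx_only
```

Consider `tx = [True, True, False, False]` and `rx = [False, True, True, False]`. The received speech in frame 1 is dropped because it overlaps the transmitted speech.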

The speech recognition processing unit 111 recognizes two signals to produce listening data: the transmitted signal in the transmitted voice extraction sections and the received signal in the received voice extraction sections. It also outputs audio files so that they can be played back.

Patent Document 1: JP 2006-343642 A

The conventional speech recognition apparatus 110 recognizes the transmitted and received signals so that the voice can be listened to. However, it cannot extract the received signal during crosstalk.

The present invention has been made in view of this problem. Its object is to provide a voice listening device, a method, and a program that can also extract the received signal during crosstalk.

The voice listening device of the present invention includes a side tone suppression processing unit, a received voice section detection unit, a received signal extraction unit, and a transmitted/received signal recording unit.
- The side tone suppression processing unit receives the received signal and the transmitted signal from a telephone. It outputs a side-tone-suppressed received signal in which the side tone signal is suppressed.
- The received voice section detection unit receives the side-tone-suppressed received signal, detects its voice sections, and outputs received voice section information.
- The received signal extraction unit receives the received voice section information and the received signal. It outputs the received signal corresponding to the received voice section information as a side-tone-removed received signal.
- The transmitted/received signal recording unit records the side-tone-removed received signal and the transmitted signal.

The voice listening device of the present invention detects the received voice sections from the side-tone-suppressed received signal and records the received signal within those sections. All of the received signal can therefore be listened to without omission, even during crosstalk.

FIG. 1 shows an example functional configuration of a voice listening device 100 of the present invention.
FIG. 2 shows the operation flow of the voice listening device 100.
FIG. 3 shows an example functional configuration of a voice listening device 200 of the present invention.
FIG. 4 shows an example functional configuration of a voice listening device 300 of the present invention.
FIG. 5 shows an example functional configuration of the received signal extraction unit 301.
FIG. 6 shows an example of the side-tone-removed received signal output by the received signal extraction unit 301.
FIG. 7 shows an example functional configuration of a voice listening device 400 of the present invention.
FIG. 8 shows an example functional configuration of a voice listening device 500 of the present invention.
FIG. 9 shows an example functional configuration of the received signal extraction unit 503.
FIG. 10 shows the operation flow of the received signal extraction unit 503.
FIG. 11 shows an example of the side-tone-removed received signal output by the received signal extraction unit 503.
FIG. 12 shows an example functional configuration of the conventional speech recognition apparatus 110.

Embodiments of the present invention are described below with reference to the drawings. Identical components in multiple drawings are given the same reference numerals, and their description is not repeated.

FIG. 1 shows an example functional configuration of the voice listening device 100 of the present invention, and FIG. 2 shows its operation flow. The voice listening device 100 includes a side tone suppression processing unit 10, a received voice section detection unit 20, a received signal extraction unit 30, and a transmitted/received signal recording unit 40.

The voice listening device 100 can be implemented on a computer comprising, for example, a ROM, a RAM, and a CPU. A predetermined program is loaded into the computer and executed by the CPU. The same applies to the voice listening devices of the other embodiments described below.

The side tone suppression processing unit 10 receives the received signal and the transmitted signal from the telephone 19. It outputs a side-tone-suppressed received signal in which the side tone signal is suppressed (step S10).

The telephone 19 is the same as the one shown in the related art. It consists of a microphone 11, a speaker 12, a transmission unit 13, a reception unit 14, and a side tone circuit 15. Its reference numeral is omitted from the drawing for convenience. The telephone 19 is an ordinary one: the received signal is the voice of the far-end party, and the transmitted signal is the voice of the talker on the side receiving the call.

The side tone suppression processing unit 10 takes the received signal and the transmitted signal as inputs. It suppresses the transmitted signal superimposed on the received signal and outputs the side-tone-suppressed received signal. The unit can be built from an ordinary echo canceller. In this example, the side-tone-suppressed received signal is the far-end party's speech with the superimposed speech of the local talker suppressed.
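The text only says that an ordinary echo canceller can be used. As one common choice, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter. The filter estimates the side-tone path from the transmitted signal and subtracts the estimate from the received signal. The following are all illustrative assumptions:
- the filter length `taps`;
- the step size `mu`;
- the regularizer `eps`;
- signals given as equal-length lists of float samples.

No double-talk detection is included, so adaptation is disturbed while the far-end party speaks. Practical cancellers normally freeze adaptation in that case.

```python
def nlms_sidetone_suppress(received, transmitted, taps=32, mu=0.5, eps=1e-6):
    """Suppress side tone in `received` with an NLMS adaptive echo canceller.

    The side tone is modelled as `transmitted` passed through an unknown FIR
    path of at most `taps` samples; the filter learns that path and the
    residual (received minus estimated side tone) is returned.
    """
    if len(received) != len(transmitted):
        raise ValueError("signals must have equal length")
    weights = [0.0] * taps   # current estimate of the side-tone path
    history = [0.0] * taps   # recent transmitted samples, newest first
    output = []
    for r, x in zip(received, transmitted):
        history.insert(0, x)
        history.pop()
        estimate = sum(w * h for w, h in zip(weights, history))
        error = r - estimate  # side-tone-suppressed sample
        # Normalized step: divide by input energy so adaptation speed
        # does not depend on the transmitted signal level.
        step = mu * error / (eps + sum(h * h for h in history))
        weights = [w + step * h for w, h in zip(weights, history)]
        output.append(error)
    return output
```

A larger `mu` adapts faster but leaves more residual error. This path only feeds voice section detection, so aggressive settings are acceptable here: distortion of its output does not matter.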

The received voice section detection unit 20 receives the side-tone-suppressed received signal, detects its voice sections, and outputs received voice section information (step S20). The received voice section information is section information representing the received voice sections.

Voice section detection may use a common method based on sound volume (power). Alternatively, it may use a speech model and a non-speech model based on Gaussian mixture models (GMMs). In that case, training the GMMs on side-tone-suppressed received signals like the one described above improves speech/non-speech discrimination.
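A minimal power-based detector, one reading of the "common method based on sound volume (power)", is shown below. It assumes floating-point samples with full scale 1.0, frames of `frame_len` samples, and a fixed threshold `threshold_db`. All three values are hypothetical. A practical detector would typically add hangover smoothing or use the GMM-based classification described above.

```python
import math

def detect_voice_sections(signal, frame_len=160, threshold_db=-40.0):
    """Power-based voice section detection.

    Returns (start, end) sample indices (end exclusive) for each run of
    frames whose mean power exceeds `threshold_db` dB re full scale 1.0.
    A trailing partial frame is ignored.
    """
    sections = []
    start = None
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        power = sum(s * s for s in frame) / frame_len
        level_db = 10.0 * math.log10(power) if power > 0.0 else float("-inf")
        if level_db > threshold_db:
            if start is None:
                start = i * frame_len  # a voice section begins
        elif start is not None:
            sections.append((start, i * frame_len))  # the section ends
            start = None
    if start is not None:
        sections.append((start, n_frames * frame_len))
    return sections
```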

The received signal extraction unit 30 receives the received signal and the received voice section information output by the received voice section detection unit 20. It outputs the portion of the received signal corresponding to the received voice section information as the side-tone-removed received signal (step S30). Outside those sections, it sets the amplitude of the side-tone-removed received signal to 0 (silence). The side-tone-removed received signal thus contains only the received signal in the sections that the received voice section detection unit 20 detected as received voice sections.
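The extraction step can be sketched as follows. Sections are assumed to be given as `(start, end)` sample indices with `end` exclusive, a hypothetical representation of the received voice section information. The sections are detected on the side-tone-suppressed signal, but the samples copied out come from the unprocessed received signal.

```python
def extract_sections(received, sections):
    """Keep `received` inside the detected sections and set it to 0 (silence)
    elsewhere. `sections` holds (start, end) sample indices, end exclusive."""
    out = [0.0] * len(received)
    for start, end in sections:
        # Clamp to the signal bounds before copying the original samples.
        start, end = max(0, start), min(len(received), end)
        out[start:end] = received[start:end]
    return out
```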

The side-tone-removed received signal is the received signal in the sections given by the received voice section information. That information is detected from a signal in which the local talker's voice is suppressed. As a result, the far-end party's voice (the received signal) is extracted without omission even during crosstalk. In other words, the received signal can be listened to even when it overlaps the transmitted signal.

The transmitted/received signal recording unit 40 records the transmitted signal and the side-tone-removed received signal output by the received signal extraction unit 30 (step S40). It generates an audio file of the transmitted signal and an audio file of the side-tone-removed received signal, time-synchronized with each other. These may be two separate files or a single stereo file.
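One way to produce the single time-aligned stereo file is Python's standard `wave` module. The format choices below are assumptions:
- 16-bit PCM;
- 8 kHz by default;
- the transmitted signal on the left channel;
- the side-tone-removed received signal on the right.

The function returns the WAV bytes so the caller can write them to any file.

```python
import io
import struct
import wave

def stereo_wav_bytes(transmitted, received_clean, sample_rate=8000):
    """Encode the transmitted signal (left) and the side-tone-removed received
    signal (right) as one time-aligned 16-bit stereo WAV, returned as bytes."""
    if len(transmitted) != len(received_clean):
        raise ValueError("channels must be time-aligned and equal length")

    def to_int16(x):
        # Clip to [-1, 1] and scale to the signed 16-bit range.
        return int(max(-1.0, min(1.0, x)) * 32767)

    frames = bytearray()
    for left, right in zip(transmitted, received_clean):
        frames += struct.pack("<hh", to_int16(left), to_int16(right))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()
```

Writing the returned bytes to a file opened in `"wb"` mode yields an ordinary stereo WAV file.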

As described above, the voice listening device 100 detects received voice section information from the side-tone-suppressed received signal. It then records the received signal corresponding to that information as the side-tone-removed received signal. All of the received signal can therefore be listened to without omission, even during crosstalk.

The side tone suppression processing unit 10 suppresses the side tone only for detecting received voice sections, so its suppression amount can be set large. Waveform distortion is acceptable on this path, which allows the received voice sections to be detected accurately. The received signal extracted according to this information has not itself passed through the suppression, so it has little distortion. It is therefore easy to listen to and easy to use for speech recognition and similar applications.

FIG. 1 shows a speech recognition processing unit 50 with a broken line. This unit may convert the side-tone-removed received signal and the transmitted signal recorded by the transmitted/received signal recording unit 40 into text data by speech recognition.

An audio viewing unit 60 may also link the recognized text data to the audio files of the side-tone-removed received signal and the transmitted signal, so that the recordings can be reviewed efficiently. The audio viewing unit 60 is, for example, a functional unit including application software that displays the text data on a display device and plays back the audio files.

Next, a voice listening device 200 is described that extracts the received signal from a lightly side-tone-suppressed received signal.

FIG. 3 shows an example functional configuration of the voice listening device 200 of the present invention. The voice listening device 200 differs from the voice listening device 100 in that it also includes a side tone low-suppression processing unit 201. This unit receives the received signal and the transmitted signal. It outputs a side-tone low-suppressed received signal to the received signal extraction unit 30. This signal has a small suppression amount and a large original-sound addition rate.

The received signal extraction unit 30 receives the side-tone low-suppressed received signal and the received voice section information output by the received voice section detection unit 20. It outputs the part of the side-tone low-suppressed received signal corresponding to the received voice section information as the side-tone-removed received signal.

The echo suppression amount of the side tone low-suppression processing unit 201 is set lower than that of the side tone suppression processing unit 10. The side tone suppression processing unit 10, by contrast, is set to a high echo suppression amount, that is, a small original-sound addition rate A (for example, A = 0.1). A high echo suppression amount distorts the received signal, but it allows the received voice section information to be detected accurately.

The side tone low-suppression processing unit 201 is set to a low echo suppression amount, for example an original-sound addition rate of about A = 0.6. Its output suppresses the transmitted signal while introducing little distortion.

The received signal extraction unit 30 extracts the side-tone-removed received signal from this low-distortion signal, using the received voice section information. The result therefore yields few misrecognitions when, for example, it is passed to speech recognition.
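The document does not define the original-sound addition rate A precisely. The sketch below assumes A is the weight of the unprocessed received signal blended back into the echo canceller output. Under that assumption, the two paths of the voice listening device 200 could be formed as shown. `mix_original` is a hypothetical helper.

```python
def mix_original(received, echo_cancelled, a):
    """Blend the echo canceller output with the unprocessed received signal.

    ASSUMPTION: `a` is read as the original-sound addition rate A, so
    a = 0.1 keeps strong suppression and a = 0.6 gives mild suppression.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if len(received) != len(echo_cancelled):
        raise ValueError("signals must have equal length")
    return [(1.0 - a) * e + a * r for r, e in zip(received, echo_cancelled)]
```

With this reading, the two example values route the signals as follows:
- `mix_original(rx, ec, 0.1)` would feed the received voice section detection unit 20.
- `mix_original(rx, ec, 0.6)` would feed the received signal extraction unit 30.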

The voice listening device 200 thus combines two paths with different echo suppression amounts:
- The signal used by the received voice section detection unit 20 receives a high echo suppression amount, so accurate received voice section information can be detected.
- A low-distortion side-tone-removed received signal is extracted, in the sections given by that information, from the low-distortion side-tone low-suppressed received signal.

Next, a voice listening device 300 is described. Outside the received voice sections, where the amplitude of the side-tone-removed received signal would otherwise be set to 0, it superimposes noise instead.

FIG. 4 shows an example functional configuration of the voice listening device 300 of the present invention. The voice listening device 300 differs from the voice listening device 100 in the operation of its received signal extraction unit 301. Wherever the side-tone-removed received signal described above has an amplitude of 0 (silence), this unit superimposes low-amplitude noise, for example white noise.

In the sections corresponding to the received voice section information, the received signal extraction unit 301 outputs the received signal or the side-tone low-suppressed received signal as the side-tone-removed received signal. In all other sections, it outputs low-level noise.

Unlike the voice listening devices 100 and 200 described above, the voice listening device 300 avoids abrupt changes in the amplitude of the side-tone-removed received signal. This helps perceptual completion take effect, as reported for the auditory continuity effect (see, for example, http://www.brl.ntt.co.jp/IllusionForum/a/continuityIllusion/ja/index.html), and so helps prevent the listener from mishearing. The voice listening device 300 also makes speech recognition accuracy more stable.

FIG. 5 shows an example functional configuration of the received signal extraction unit 301. The unit includes noise generation means 3010 and noise superimposition means 3011. The noise generation means 3010 generates white noise at a very low volume level, for example about −80 dB. White noise is easy to generate with conventional methods, for example using normally distributed random numbers.

The noise superimposition means 3011 receives the received voice section information and either the received signal or the side-tone low-suppressed received signal. It outputs a side-tone-removed received signal with two parts:
- in the sections corresponding to the received voice section information, the received signal (or the side-tone low-suppressed received signal);
- in all other sections, white noise.
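Below is a sketch of the noise generation means 3010 and the noise superimposition means 3011. It makes two assumptions:
- "−80 dB" means an RMS level 80 dB below a full scale of 1.0;
- sections are `(start, end)` sample indices with `end` exclusive.

Gaussian samples from `random.Random.gauss` stand in for the normally distributed random numbers mentioned in the text. The function names are illustrative.

```python
import random

def white_noise(n, level_db=-80.0, seed=None):
    """Gaussian white noise with RMS `level_db` dB re full scale 1.0."""
    rng = random.Random(seed)
    sigma = 10.0 ** (level_db / 20.0)  # -80 dB -> 1e-4
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def fill_gaps_with_noise(received, sections, noise):
    """Received signal inside `sections`, noise everywhere else."""
    if len(noise) < len(received):
        raise ValueError("noise must be at least as long as the signal")
    out = list(noise[:len(received)])
    for start, end in sections:
        start, end = max(0, start), min(len(received), end)
        out[start:end] = received[start:end]  # voice replaces the noise
    return out
```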

FIG. 6 shows an example of the side-tone-removed received signal output by the received signal extraction unit 301. The figure has four rows:
- row 1: the received signal;
- row 2: the received voice section information that the received voice section detection unit 20 outputs for that received signal;
- row 3: the white noise output by the noise generation means 3010;
- row 4: the side-tone-removed received signal that the noise superimposition means 3011 outputs from the section information, the received signal, and the white noise.

The received signal appears in the sections corresponding to the received voice section information, and white noise appears in the other sections.

When the volume level of the received signal is low, superimposing noise can lower the speech recognition rate. The level of the superimposed noise may therefore be varied according to the volume level or the S/N ratio of the received signal.

To do this, S/N ratio detection means 3012 is provided in the received signal extraction unit 301. It detects the S/N ratio of the received signal or of the side-tone low-suppressed received signal. The amplitude of the white noise generated by the noise generation means 3010 is then controlled automatically according to the detected S/N ratio, so that, for example, the S/N ratio is 30 dB or more. The S/N ratio detection means 3012 may be replaced by power detection means that detects the power of the same signal.

This embodiment (Embodiment 3) superimposes white noise on the side-tone-removed received signal, but the noise need not be white. For example, pink noise, whose power spectral density is inversely proportional to frequency, may be used. Any other noise with the same effect may also be used.
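The S/N-based level control could look like the following. The 30 dB target is the example from the text. The remaining details are assumptions:
- the signal level is measured only inside the received voice sections;
- the default −80 dB level is kept whenever it already satisfies the target;
- the function name is hypothetical.

```python
import math

def noise_level_for_snr(received, sections, min_snr_db=30.0, default_level_db=-80.0):
    """Choose a noise level (dB re full scale 1.0) that keeps the S/N ratio of
    the received voice sections at or above `min_snr_db`."""
    samples = [s for start, end in sections for s in received[start:end]]
    power = sum(s * s for s in samples) / len(samples) if samples else 0.0
    if power <= 0.0:
        return default_level_db  # no measurable voice: keep the default level
    signal_db = 10.0 * math.log10(power)
    # Never louder than the default, and at least min_snr_db below the voice.
    return min(default_level_db, signal_db - min_snr_db)
```

The returned level would then set the RMS amplitude of the generated noise, as `10 ** (level_db / 20)`.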

The description above replaces the side-tone-removed received signal with low-level noise only in sections that do not correspond to the received voice section information. Small noise, however, has little effect on the speech it is superimposed on. Low-level noise may therefore instead be superimposed on the entire received voice, including the sections corresponding to the received voice section information. The same auditory continuity effect is obtained.

Next, a voice listening device 400 is described. Its received signal extraction unit lowers the amplitude of the side-tone-removed received signal in sections where the volume level on the transmitting side is excessively high.

FIG. 7 shows an example functional configuration of the voice listening device 400. It differs from the voice listening device 100 in two ways: it includes a transmitted voice section detection unit 401, and its received signal extraction unit 402 operates differently.

The transmitted voice section detection unit 401 receives the transmitted signal and performs voice section detection on it to find the transmitted voice sections. It measures the volume level within each transmitted voice section and outputs these levels together with the sections. The detection method is the same as that of the received voice section detection unit 20 described above.

The received signal extraction unit 402 refers to the volume level output by the transmitted voice section detection unit 401. When that level is at or above a predetermined value, it reduces the amplitude of the received signal in the corresponding section.

For example, suppose the volume level of the transmitted signal is −20 dB or more. Let α be the ratio of that volume level to the maximum volume level, which is the largest level the voice listening device 400 can handle. The amplitude of the side-tone-removed received signal in the section containing that transmitted voice section is then multiplied by, for example, (1 − α).
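The attenuation rule can be sketched as below. The sketch assumes levels are RMS values in dB relative to the device's maximum level (0 dB). It also assumes α is the corresponding linear amplitude ratio, which the text does not state explicitly. The `(start, end, level_db)` tuples stand in for the output of the transmitted voice section detection unit 401.

```python
import math

def section_level_db(signal, start, end):
    """RMS level of signal[start:end] in dB re the maximum level 1.0."""
    segment = signal[start:end]
    power = sum(s * s for s in segment) / len(segment) if segment else 0.0
    return 10.0 * math.log10(power) if power > 0.0 else float("-inf")

def attenuate_loud_transmit(received_clean, tx_sections, threshold_db=-20.0):
    """Scale the side-tone-removed received signal by (1 - alpha) inside each
    transmitted voice section whose level is at or above `threshold_db`.

    tx_sections: (start, end, level_db) tuples, end exclusive.
    ASSUMPTION: alpha is the linear amplitude ratio level / maximum level.
    """
    out = list(received_clean)
    for start, end, level_db in tx_sections:
        if level_db < threshold_db:
            continue  # transmit side not loud enough: leave untouched
        alpha = min(1.0, 10.0 ** (level_db / 20.0))
        for i in range(max(0, start), min(len(out), end)):
            out[i] *= 1.0 - alpha
    return out
```

Under this assumption, a section at −20 dB gives α = 0.1, so the received signal there is scaled by 0.9. A section at 0 dB is muted.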

This lowers the volume of the side-tone-removed received signal in sections where the transmitting-side volume is excessively high, so leaked transmitted speech is not heard at a high volume. The approach of the voice listening device 400 may also be combined with the voice listening devices 200 and 300, with the same effect.

Next, a voice listening device 500 is described. It widens each section given by the received voice section information by a margin time before and after. It then outputs the received signal in the widened sections as the side-tone-removed received signal.

FIG. 8 shows an example functional configuration of the voice listening device 500. The voice listening device 500 includes:
- a side tone suppression processing unit 10;
- a received/transmitted signal recording unit 501;
- a voice section detection unit 502;
- a received signal extraction unit 503;
- a side-tone-removed received signal recording unit 504.

As its reference numeral indicates, the side tone suppression processing unit 10 is the same as in the voice listening device 100.

The sidetone suppression processing unit 10 receives a reception signal and a transmission signal from a telephone (not shown). It outputs a sidetone-suppressed reception signal in which the sidetone signal is suppressed.

The reception/transmission signal recording unit 501 records the transmission signal and the sidetone-suppressed reception signal output by the sidetone suppression processing unit 10. The voice listening device 500 does not operate in real time. Instead, it processes the sidetone-suppressed reception signal and transmission signal after they have been recorded, for example by playing back an audio file of a recorded conversation between the two parties.

The voice section detection unit 502 reads the sidetone-suppressed reception signal and the transmission signal from the reception/transmission signal recording unit 501. It then detects:

- the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of the voice sections of the sidetone-suppressed reception signal;
- the start times Su_1 to Su_n and end times Sd_1 to Sd_n of the voice sections of the transmission signal.

It outputs these times as voice section information.
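The source does not say how unit 502 detects voice sections. As an illustration only, the sketch below uses a simple frame-energy detector. This is a stand-in technique, not the patented method. It produces start/end time pairs of the kind described above, and `frame_len` and `threshold_db` are hypothetical parameters.

```python
import math
from typing import List, Tuple


def detect_voice_sections(signal: List[float], sample_rate: int,
                          frame_len: int = 160,
                          threshold_db: float = -40.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of frames whose RMS level
    is at or above threshold_db (relative to a full scale of 1.0)."""
    sections = []
    start = None
    n_frames = len(signal) // frame_len
    for f in range(n_frames):
        frame = signal[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        active = rms > 0 and 20.0 * math.log10(rms) >= threshold_db
        t = f * frame_len / sample_rate
        if active and start is None:
            start = t                      # rising edge: a start time (Ru_i or Su_i)
        elif not active and start is not None:
            sections.append((start, t))    # falling edge: an end time (Rd_i or Sd_i)
            start = None
    if start is not None:                  # section still open at the end of the signal
        sections.append((start, n_frames * frame_len / sample_rate))
    return sections
```

Run separately on the sidetone-suppressed reception signal and on the transmission signal, this yields the (Ru_i, Rd_i) and (Su_i, Sd_i) lists.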

The reception signal extraction unit 503 takes as input the voice section information (Ru_1 to Ru_n, Rd_1 to Rd_n, Su_1 to Su_n, Sd_1 to Sd_n) and the reception signal. It first adds a margin time before and after each voice section of the sidetone-suppressed reception signal. It then outputs the reception signal in these extended reception voice sections as the sidetone-removed reception signal.
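Below is a sketch of the extraction step. It assumes that sections are given in seconds and that everything outside them is simply silenced; claim 2 instead fills those gaps with low-level noise. The section boundaries come from the sidetone-suppressed signal x_2, while the samples are copied from the unprocessed reception signal x_1.

```python
from typing import List, Tuple


def extract_received(x1: List[float], sections: List[Tuple[float, float]],
                     sample_rate: int) -> List[float]:
    """Keep x1 samples inside the (start, end) sections given in seconds and
    silence everything else, giving a sidetone-removed signal x3."""
    x3 = [0.0] * len(x1)
    for start, end in sections:
        lo = max(int(round(start * sample_rate)), 0)
        hi = min(int(round(end * sample_rate)), len(x1))
        x3[lo:hi] = x1[lo:hi]  # copy original reception samples, not x2
    return x3
```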

The sidetone-removed reception signal recording unit 504 records the sidetone-removed reception signal output by the reception signal extraction unit 503.

The voice listening device 500 adds a margin time before and after each section of the sidetone-removed reception signal. This prevents voice section detection errors, for example losing the beginning of a voice section when the utterance starts with a consonant.

FIG. 9 shows a more specific functional configuration example of the reception signal extraction unit 503. The unit includes a margin time adding means 512 and a sidetone-removed reception signal extraction means 522. The margin time adding means 512 takes the voice section information (Ru_1 to Ru_n, Rd_1 to Rd_n, Su_1 to Su_n, Sd_1 to Sd_n) as input and makes two checks:

- whether any transmission end time Sd_1 to Sd_n lies within a predetermined time Rms before the start times Ru_1 to Ru_n of the voice sections of the sidetone-suppressed reception signal;
- whether any transmission start time Su_1 to Su_n lies within a predetermined time Rme after the end times Rd_1 to Rd_n.

When no such start or end time lies within the predetermined time, a margin of that predetermined time is added before or after the reception voice section information, respectively.

The operation flow of the voice section detection unit 502 and the reception signal extraction unit 503, shown in FIG. 10, is as follows. The voice section detection unit 502 reads the sidetone-suppressed reception signal and the transmission signal from the reception/transmission signal recording unit 501. It detects the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of the sidetone-suppressed reception signal (step S5020). The start time is the rising edge of a reception voice section, and the end time is its falling edge. Likewise, it detects the start times Su_1 to Su_n and end times Sd_1 to Sd_n of the transmission voice sections (step S5021). FIG. 11 illustrates two pieces of reception voice section information, (Ru_1, Rd_1) and (Ru_2, Rd_2), and two pieces of transmission voice section information, (Su_1, Sd_1) and (Su_2, Sd_2). The start and end times are numbered in order of elapsed time.

The reception signal extraction unit 503 initializes a variable i, representing the voice section number, to i = 1 (step S5121). It then checks whether the transmission end time Sd_1 lies within Rms seconds (for example, 0.5 seconds) before the start time Ru_1 of the first voice section of the sidetone-suppressed reception signal (step S5122).

- If it does not (No in step S5122), the start time of that voice section is set to Ru_1 = Ru_1 − Rms (step S5123).
- If it does (Yes in step S5122), the start time is set to the transmission end time, Ru_1 = Sd_1 (step S5124).

Here, for example, Rms = 0.5 seconds and Rme = 0.1 seconds, although other predetermined times may be used. Rms is longer than Rme for a practical reason. An utterance that begins with a consonant is quiet at its start and hard to detect as a voice section, so a longer leading margin helps. In Japanese, by contrast, utterances typically end in a vowel, so a short margin after the end of the reception signal causes little inconvenience.

Next, the unit checks whether the transmission start time Su_2 lies within Rme seconds after the end time Rd_1 of the first voice section of the sidetone-suppressed reception signal (step S5125).

- If it does not (No in step S5125), the end time is set to Rd_1 = Rd_1 + Rme (step S5126).
- If it does (Yes in step S5125), the end time is set to the transmission start time, Rd_1 = Su_2 (step S5127).

This processing is repeated, incrementing the voice section number i, until every voice section has been processed (the No loop of step S5129). As a result, a margin of a predetermined width is added before and after each voice section of the sidetone-suppressed reception signal.
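The margin rule of steps S5121 to S5129 can be sketched as follows. The sketch generalizes the first-section example to every section i. For each section, it looks for any transmission end time inside [Ru_i − Rms, Ru_i] and any transmission start time inside [Rd_i, Rd_i + Rme]. Several details are assumptions of this sketch, not the source's specification:

- when several transmission boundaries fall inside a window, the latest end and the earliest start are used;
- the extended start is clamped at 0 s.

```python
from typing import List, Tuple


def add_margins(rx: List[Tuple[float, float]],
                tx: List[Tuple[float, float]],
                rms: float = 0.5, rme: float = 0.1) -> List[Tuple[float, float]]:
    """Extend each reception section (Ru_i, Rd_i) by Rms before and Rme after,
    stopping at a transmission boundary that falls inside the margin window."""
    tx_ends = [sd for _, sd in tx]
    tx_starts = [su for su, _ in tx]
    out = []
    for ru, rd in rx:
        # Steps S5122-S5124: a transmission end Sd in [Ru - Rms, Ru] caps the start.
        prior = [sd for sd in tx_ends if ru - rms <= sd <= ru]
        new_ru = max(prior) if prior else ru - rms
        # Steps S5125-S5127: a transmission start Su in [Rd, Rd + Rme] caps the end.
        later = [su for su in tx_starts if rd <= su <= rd + rme]
        new_rd = min(later) if later else rd + rme
        out.append((max(new_ru, 0.0), new_rd))  # clamp at 0 s (assumption)
    return out
```

Capping at the neighbouring transmission boundary keeps the added margin from pulling the other party's speech back into the extracted reception signal.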

FIG. 11 shows an example of the sidetone-removed reception signal obtained by the voice listening device 500. The rows are:

1. the reception voice section information of the sidetone-suppressed reception signal;
2. the transmission signal;
3. the transmission voice section information of the transmission signal;
4. the margin-extended reception voice section information output by the margin time adding means 512;
5. the sidetone-removed reception signal output by the reception signal extraction unit 503.

In the output of the margin time adding means 512, the margin time is added before and after the reception voice section information, as shown by broken lines. The corresponding speech waveform of the sidetone-removed reception signal therefore extends further forward and backward in time than the one shown in FIG. 6.

As described above, the voice listening device 500 extracts and records, as the sidetone-removed reception signal, the reception signal corresponding to reception voice section information to which a predetermined margin time has been added. The functional configuration of the voice listening device 500 is not limited to the example in FIG. 8. In that example, the voice section detection unit 502 detects the voice sections of both the transmission and reception signals. Alternatively, the reception voice section detection unit 20 may detect the voice sections of the reception signal, and the transmission voice section detection unit 401 described in Embodiment 4 may detect those of the transmission signal. Similarly, instead of the reception/transmission signal recording unit 501, the transmission/reception signal recording unit 40 of Embodiments 1 to 4 may record the audio data before voice acquisition processing.

The idea of adding a margin of a predetermined width before and after each voice section of the sidetone-suppressed reception signal can also be combined with Embodiments 2 to 4.

The voice listening devices described in the embodiments above detect reception voice section information from the sidetone-suppressed reception signal, in which the sidetone signal has been suppressed. They then record the reception signal corresponding to that information as the sidetone-removed reception signal. As a result, every part of the reception signal can be heard without omission, even during crosstalk.

The processes described for the above method and device need not run in time series in the order described. They may also run in parallel or individually, depending on the processing capability of the executing device or as needed.

When the processing means of the above devices are implemented by a computer, a program describes the processing performed by each device's functions. Executing this program on the computer realizes each device's processing means on that computer.

The program describing this processing can be recorded on any computer-readable recording medium, such as:

- a magnetic recording device, e.g. a hard disk drive, flexible disk, or magnetic tape;
- an optical disc, e.g. a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable);
- a magneto-optical recording medium, e.g. an MO (Magneto-Optical disc);
- a semiconductor memory, e.g. an EEP-ROM (Electronically Erasable and Programmable Read Only Memory).

The program can be distributed, for example, by selling, transferring, or lending a portable recording medium, such as a DVD or CD-ROM, on which it is recorded. Alternatively, the program may be stored on a server computer and transferred over a network to other computers.

Each means may be implemented by executing a predetermined program on a computer, or at least part of the processing may be implemented in hardware.

Claims (9)

A voice listening device comprising:
a sidetone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception voice section detection unit that receives the sidetone-suppressed reception signal x_2, detects its voice sections, and outputs reception voice section information;
a low sidetone suppression processing unit that receives the reception signal x_1 and the transmission signal and outputs a lightly sidetone-suppressed reception signal x_4, which has a small suppression amount and a large original-sound addition ratio;
a reception signal extraction unit that receives the reception voice section information and the lightly sidetone-suppressed reception signal x_4 and, whether or not a crosstalk state exists, outputs the portion of x_4 corresponding to the reception voice section information as a sidetone-removed reception signal x_3; and
a transmission/reception signal recording unit that records the sidetone-removed reception signal x_3 and the transmission signal.
A voice listening device comprising:
a sidetone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception voice section detection unit that receives the sidetone-suppressed reception signal x_2, detects its voice sections, and outputs reception voice section information;
a reception signal extraction unit that receives the reception voice section information and the reception signal x_1 and, whether or not a crosstalk state exists, performs both of the following:
outputs the portion of x_1 corresponding to the reception voice section information as a sidetone-removed reception signal x_3, and
outputs the sidetone-removed reception signal x_3 in sections not corresponding to the reception voice section information as low-volume noise; and
a transmission/reception signal recording unit that records the sidetone-removed reception signal x_3 and the transmission signal.
A voice listening device comprising:
a sidetone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception voice section detection unit that receives the sidetone-suppressed reception signal x_2, detects its voice sections, and outputs reception voice section information;
a transmission voice section detection unit that detects transmission voice sections in the transmission signal, detects the volume level within each transmission voice section, and outputs the transmission voice sections and their volume levels;
a reception signal extraction unit that receives the reception voice section information and the reception signal x_1 and, whether or not a crosstalk state exists, performs both of the following:
outputs the portion of x_1 corresponding to the reception voice section information as a sidetone-removed reception signal x_3, and
refers to the volume level output by the transmission voice section detection unit and, when that volume level is at or above a predetermined value, reduces the amplitude of the reception signal x_1 in the corresponding section; and
a transmission/reception signal recording unit that records the sidetone-removed reception signal x_3 and the transmission signal.
A voice listening device comprising:
a sidetone suppression processing unit that receives a reception signal x_1 and a transmission signal from a telephone and outputs a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception/transmission signal recording unit that records the sidetone-suppressed reception signal x_2 and the transmission signal;
a voice section detection unit that reads the sidetone-suppressed reception signal x_2 and the transmission signal from the reception/transmission signal recording unit and detects the following, which it outputs as voice section information:
the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of a plurality of voice sections of x_2, and
the start times Su_1 to Su_n and end times Sd_1 to Sd_n of a plurality of voice sections of the transmission signal;
a reception signal extraction unit that receives the voice section information and the reception signal x_1 and, whether or not a crosstalk state exists, outputs as a sidetone-removed reception signal x_3 the reception signal x_1 in reception voice sections obtained by adding a margin time before and after the voice sections of the sidetone-suppressed reception signal x_2; and
a sidetone-removed reception signal recording unit that records the sidetone-removed reception signal x_3.
A voice listening method comprising:
a sidetone suppression processing step of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception voice section detection step of receiving the sidetone-suppressed reception signal x_2, detecting its voice sections, and outputting reception voice section information;
a low sidetone suppression processing step of receiving the reception signal x_1 and the transmission signal and outputting a lightly sidetone-suppressed reception signal x_4, which has a small suppression amount and a large original-sound addition ratio;
a reception signal extraction step of receiving the reception voice section information and the lightly sidetone-suppressed reception signal x_4 and, whether or not a crosstalk state exists, outputting the portion of x_4 corresponding to the reception voice section information as a sidetone-removed reception signal x_3; and
a transmission/reception signal recording step of recording the sidetone-removed reception signal x_3 and the transmission signal.
A voice listening method comprising:
a sidetone suppression processing step of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception voice section detection step of receiving the sidetone-suppressed reception signal x_2, detecting its voice sections, and outputting reception voice section information;
a reception signal extraction step of receiving the reception voice section information and the reception signal x_1 and, whether or not a crosstalk state exists, performing both of the following:
outputting the portion of x_1 corresponding to the reception voice section information as a sidetone-removed reception signal x_3, and
outputting the sidetone-removed reception signal x_3 in sections not corresponding to the reception voice section information as low-volume noise; and
a transmission/reception signal recording step of recording the sidetone-removed reception signal x_3 and the transmission signal.
A voice listening method comprising:
a sidetone suppression processing step of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception voice section detection step of receiving the sidetone-suppressed reception signal x_2, detecting its voice sections, and outputting reception voice section information;
a transmission voice section detection step of detecting transmission voice sections in the transmission signal, detecting the volume level within each transmission voice section, and outputting the transmission voice sections and their volume levels;
a reception signal extraction step of receiving the reception voice section information and the reception signal x_1 and, whether or not a crosstalk state exists, performing both of the following:
outputting the portion of x_1 corresponding to the reception voice section information as a sidetone-removed reception signal x_3, and
referring to the volume level output by the transmission voice section detection step and, when that volume level is at or above a predetermined value, reducing the amplitude of the reception signal x_1 in the corresponding section; and
a transmission/reception signal recording step of recording the sidetone-removed reception signal x_3 and the transmission signal.
A voice listening method comprising:
a sidetone suppression processing step of receiving a reception signal x_1 and a transmission signal from a telephone and outputting a sidetone-suppressed reception signal x_2 in which the sidetone signal is suppressed;
a reception/transmission signal recording step of recording the sidetone-suppressed reception signal x_2 and the transmission signal;
a voice section detection step of reading the sidetone-suppressed reception signal x_2 and the transmission signal recorded in the reception/transmission signal recording step and detecting the following, which are output as voice section information:
the start times Ru_1 to Ru_n and end times Rd_1 to Rd_n of a plurality of voice sections of x_2, and
the start times Su_1 to Su_n and end times Sd_1 to Sd_n of a plurality of voice sections of the transmission signal;
a reception signal extraction step of receiving the voice section information and the reception signal x_1 and, whether or not a crosstalk state exists, outputting as a sidetone-removed reception signal x_3 the reception signal x_1 in reception voice sections obtained by adding a margin time before and after the voice sections of the sidetone-suppressed reception signal x_2; and
a sidetone-removed reception signal recording step of recording the sidetone-removed reception signal x_3.
A program for causing a computer to function as the voice listening device according to any one of claims 1 to 4.
JP2012098839A 2012-04-24 2012-04-24 Voice listening device, method and program thereof Expired - Fee Related JP5812932B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2012098839A JP5812932B2 (en) 2012-04-24 2012-04-24 Voice listening device, method and program thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2012098839A JP5812932B2 (en) 2012-04-24 2012-04-24 Voice listening device, method and program thereof

Publications (2)

Publication Number Publication Date
JP2013228459A JP2013228459A (en) 2013-11-07
JP5812932B2 true JP5812932B2 (en) 2015-11-17

Family

ID=49676174

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2012098839A Expired - Fee Related JP5812932B2 (en) 2012-04-24 2012-04-24 Voice listening device, method and program thereof

Country Status (1)

Country Link
JP (1) JP5812932B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103747117B (en) * 2013-12-31 2017-01-25 余姚市盛飞电器有限公司 Fixed-line telephone equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2921472B2 (en) * 1996-03-15 1999-07-19 日本電気株式会社 Voice and noise elimination device, voice recognition device
US5765130A (en) * 1996-05-21 1998-06-09 Applied Language Technologies, Inc. Method and apparatus for facilitating speech barge-in in connection with voice recognition systems
JP2001343985A (en) * 2000-06-02 2001-12-14 Nippon Telegr & Teleph Corp <Ntt> Method of voice switching and voice switch
JP2002149198A (en) * 2000-11-13 2002-05-24 Matsushita Electric Ind Co Ltd Voice encoder and decoder
JP2004271607A (en) * 2003-03-05 2004-09-30 Toyota Motor Corp Device and method for speech recognition
JP2005321539A (en) * 2004-05-07 2005-11-17 Nippon Telegr & Teleph Corp <Ntt> Voice recognition method, its device and program and its recording medium
JP2006343642A (en) * 2005-06-10 2006-12-21 Nippon Telegr & Teleph Corp <Ntt> Speech recognition method, speech recognition device, program, and recording medium
US9002709B2 (en) * 2009-12-10 2015-04-07 Nec Corporation Voice recognition system and voice recognition method

Also Published As

Publication number Publication date
JP2013228459A (en) 2013-11-07

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20140703

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20150126

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20150303

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20150417

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20150908

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20150915

R150 Certificate of patent or registration of utility model

Ref document number: 5812932

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

LAPS Cancellation because of no payment of annual fees