WO2012157785A1 - Audio processing device, audio processing method, and recording medium on which audio processing program is recorded - Google Patents

Publication number
WO2012157785A1
Authority
WO
WIPO (PCT)
Prior art keywords
suppression
signal
echo
unit
regression coefficient
Prior art date
Application number
PCT/JP2012/063404
Other languages
French (fr)
Japanese (ja)
Inventor
宝珠山 治
Original Assignee
日本電気株式会社
Priority date
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to JP2013515243A (JPWO2012157785A1)
Publication of WO2012157785A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B3/00 Line transmission systems
    • H04B3/02 Details
    • H04B3/20 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • H04B3/23 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other using a replica of transmitted signal in the time domain, e.g. echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/02 Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback

Definitions

  • the present invention relates to a technique for suppressing echo.
  • the echo suppression device described in Patent Document 1 includes a conversion unit having the following functions.
  • the conversion unit inputs the first signal and the second signal.
  • the first signal is either a microphone input signal (output signal in Patent Document 1) or a signal obtained by subtracting the output signal of the linear echo canceller from the output signal of the speaker by the subtractor.
  • the second signal is an output signal of the linear echo canceller.
  • the conversion unit calculates an estimated value of echo leakage from the first signal and the second signal.
  • the conversion unit corrects the first signal based on the calculated estimated value, thereby generating a near-end signal obtained by removing the echo from the first signal, and outputs the generated near-end signal to the output terminal.
  • an apparatus in one aspect of the present invention includes: nonlinear echo suppression means for suppressing nonlinear echo included in an input signal; determination means for determining, according to the result of suppression by the nonlinear echo suppression means, whether a desired signal is mixed in the input signal; and suppression strength control means for weakening the suppression strength of the nonlinear echo in the nonlinear echo suppression means when it is determined that the desired signal is mixed in the input signal.
  • a method in one aspect of the present invention suppresses nonlinear echo included in an input signal, determines, according to the result of the suppression, whether a desired signal is mixed in the input signal, and, when it is determined that the desired signal is mixed in the input signal, performs nonlinear echo suppression with a weakened suppression strength.
  • the program recorded on the non-volatile medium in one aspect of the present invention causes a computer to execute: processing to suppress nonlinear echo included in an input signal; processing to determine, according to the result of the suppression, whether a desired signal is mixed in the input signal; and, when it is determined that the desired signal is mixed in the input signal, processing to suppress nonlinear echo with a weakened suppression strength.
  • FIG. 1 is a block diagram showing the configuration of the speech processing apparatus according to the first embodiment of the present invention.
  • FIG. 2 is a block diagram showing the configuration of the speech processing apparatus according to the second embodiment of the present invention.
  • FIG. 3 is a flowchart showing a flow of processing of the speech processing apparatus according to the second embodiment of the present invention.
  • FIG. 4 is a diagram showing the effect of the speech processing apparatus according to the second embodiment of the present invention.
  • FIG. 5 is a block diagram showing the configuration of the speech processing apparatus according to the third embodiment of the present invention.
  • FIG. 6 is a flowchart showing the flow of processing of the speech processing apparatus according to the fourth embodiment of the present invention.
  • FIG. 7 is a flowchart showing a process flow of the speech processing apparatus according to the fifth embodiment of the present invention.
  • FIG. 8 is a block diagram showing a configuration of a speech processing apparatus according to the sixth embodiment of the present invention.
  • FIG. 9 is a flowchart showing the flow of processing of the speech processing apparatus according to the sixth embodiment of the present invention.
  • FIG. 10 is a block diagram showing the configuration of the speech processing apparatus according to the seventh embodiment of the present invention.
  • FIG. 11 is a diagram for explaining the effect of the speech processing apparatus according to the seventh embodiment of the present invention.
  • FIG. 12 is a block diagram showing a configuration of a sound processing apparatus according to another embodiment of the present invention.
  • FIG. 13 is a diagram showing a recording medium on which the program of the present invention is recorded.
  • a speech processing apparatus 100 as a first embodiment of the present invention will be described with reference to FIG. 1.
  • the audio processing device 100 is a device for suppressing echo in a signal.
  • the speech processing apparatus 100 includes a nonlinear echo suppression unit 101, a desired signal mixture determination unit 102, and an echo suppression intensity control unit 103.
  • the nonlinear echo suppression unit 101 suppresses nonlinear echo included in the input signal.
  • the desired signal mixture determination unit 102 determines whether or not the desired signal is mixed in the input signal according to the result of suppression by the nonlinear echo suppression unit 101. If the desired signal mixture determination unit 102 determines that the desired signal is mixed in the input signal, the echo suppression intensity control unit 103 weakens the suppression intensity of the nonlinear echo in the nonlinear echo suppression unit 101. With the above configuration, the speech processing apparatus 100 can suppress echo while suppressing deterioration of a desired signal. The reason is that the speech processing apparatus 100 has the following configuration. First, the desired signal mixture determination unit 102 determines whether or not the desired signal is mixed in the input signal.
  • FIG. 2 is a block diagram for explaining a schematic configuration of the speech processing apparatus 200 according to the present embodiment.
  • the speech processing apparatus 200 determines the suppression amount after determining the presence or absence of the desired signal, so that suppressing the echo does not attenuate the high-frequency component of the desired signal (for example, the near-end signal) and reduce intelligibility.
  • the sound processing apparatus 200 includes a microphone 201 as sound input means and a speaker 202 as sound output means.
  • the speaker 202 outputs sound corresponding to the output signal
  • an echo signal may be mixed into the input signal of the microphone 201.
  • the pseudo linear echo generation unit 203 generates a pseudo signal of the linear echo signal among the echo signals from the output signal.
  • the pseudo linear echo generation unit 203 is configured, for example, by an adaptive filter that updates its coefficients using the residual signal.
  • the linear echo suppression unit 204 subtracts the pseudo linear echo signal from the input signal input by the microphone 201.
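The pseudo linear echo generation and subtraction described above can be sketched as follows. The text only states that unit 203 is, for example, an adaptive filter updated from the residual signal; the normalized LMS (NLMS) update used here is a common choice swapped in for illustration, not something the source specifies. The function name `nlms_echo_canceller`, the tap count, the step size `mu`, and the synthetic two-tap echo path are all assumptions.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Return (residual, coefficients) for a simple NLMS adaptive filter.

    far_end : samples sent to the loudspeaker (the "output signal")
    mic     : samples picked up by the microphone (the "input signal")
    """
    w = [0.0] * taps      # adaptive filter coefficients (assumed length)
    buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, m in zip(far_end, mic):
        buf = [x] + buf[:-1]
        # Pseudo linear echo: filter output for the current sample.
        y = sum(wi * bi for wi, bi in zip(w, buf))
        # Residual signal: microphone input minus pseudo linear echo.
        e = m - y
        # NLMS coefficient update driven by the residual.
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w

# Hypothetical, purely linear echo path: 0.5 * x[n] + 0.25 * x[n-1].
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.5 * far_end[n] + (0.25 * far_end[n - 1] if n > 0 else 0.0)
       for n in range(len(far_end))]
residual, w = nlms_echo_canceller(far_end, mic)
```

With a purely linear echo path like this one, the residual decays toward zero. In a real device, loudspeaker distortion leaves a nonlinear component in the residual, which is what the nonlinear echo suppression unit 205 is meant to handle.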
  • the pseudo linear echo signal and the residual signal generated by the pseudo linear echo generation unit 203 are input to the nonlinear echo suppression unit 205.
  • the nonlinear echo suppression unit 205 estimates the nonlinear echo signal and suppresses the nonlinear echo signal component in the residual signal.
  • the speech processing apparatus 200 includes an output determination unit 206 and a suppression strength control unit 207.
  • the output determination unit 206 determines the presence of the desired signal from the signal after the echo is strongly suppressed. In particular, the presence of the desired signal is determined only from the high frequency component of the sound.
  • the non-linear echo suppression unit 205 suppresses non-linear echo based on a non-linear echo signal estimated by multiplying by a regression coefficient.
  • the suppression intensity control unit 207 reduces the suppression amount by correcting or selecting the regression coefficient. That is, when the desired signal is present in the signal after the echo is strongly suppressed using the first regression coefficient, the suppression intensity control unit 207 supplies a second regression coefficient smaller than the first regression coefficient to the nonlinear echo suppression unit 205.
  • the nonlinear echo suppression unit 205 estimates the nonlinear echo signal by multiplying the pseudo linear echo signal by a second regression coefficient smaller than the first regression coefficient.
  • FIG. 3 is a flowchart for explaining the flow of processing performed by the speech processing apparatus 200.
  • in step S301, the nonlinear echo suppression unit 205 strongly suppresses the echo with a relatively large first regression coefficient.
  • in step S302, the output determination unit 206 calculates the power after suppression.
  • in step S303, the output determination unit 206 determines whether or not the power after suppression (that is, the square of the amplitude) is greater than a threshold value. When the power after suppression is larger than the threshold value, the desired signal is considered to be included in the original input signal, so the process proceeds to step S304, where the suppression strength control unit 207 decreases the regression coefficient. That is, the suppression intensity control unit 207 outputs a second regression coefficient that is smaller than the first regression coefficient.
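The flow of steps S301 to S304 can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `suppress_two_pass`, the subtract-and-clamp form of the suppression, and all numeric values (coefficients and threshold) are chosen only for the example.

```python
def suppress_two_pass(residual_amp, echo_replica_amp, a1, a2, threshold):
    """Two-pass nonlinear echo suppression for one frequency bin.

    residual_amp     : amplitude of the residual signal
    echo_replica_amp : amplitude of the pseudo linear echo
    a1, a2           : first (large) and second (small) regression coefficients
    threshold        : power threshold for "desired signal present" (assumed)
    """
    # S301: strong suppression with the large first regression coefficient.
    first = max(residual_amp - a1 * echo_replica_amp, 0.0)
    # S302: power after suppression (square of the amplitude).
    power = first ** 2
    # S303: power above the threshold suggests a desired signal is present.
    if power > threshold:
        # S304: suppress again with the smaller second regression coefficient.
        return max(residual_amp - a2 * echo_replica_amp, 0.0)
    return first

# Echo only: the residual is fully explained by the estimated echo.
echo_only = suppress_two_pass(0.5, 1.0, a1=0.6, a2=0.3, threshold=0.01)
# Near-end speech present: the residual is much larger than the estimated echo.
double_talk = suppress_two_pass(1.5, 1.0, a1=0.6, a2=0.3, threshold=0.01)
```

In the echo-only case the strong suppression is kept; in the second case the weaker coefficient is used so that less of the desired signal is removed.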
  • FIG. 4 is a diagram illustrating effects that can be achieved by the sound processing apparatus 200 according to the present embodiment.
  • the graph on the left side of the upper row 401 is a graph showing the spectrum after the first suppression process.
  • the graph on the right side of the upper row 401 is a graph showing the spectrum after the second suppression process, performed after the regression coefficient is corrected to a smaller value. As shown in the upper row 401, reducing the regression coefficient makes it possible to perform echo suppression so that more of the desired speech remains.
  • the graph on the left side of the lower stage 402 is a graph showing a spectrum when the output power after the first suppression process is smaller than the threshold value.
  • the suppression strength control unit 207 does not change the regression coefficient.
  • the spectrum after the second suppression process indicated by the graph on the right side of the lower stage 402 is a spectrum having the same shape as the graph on the left side.
  • the vertical axis represents intensity (decibel)
  • the horizontal axis represents frequency (hertz).
  • a speech processing apparatus according to a third embodiment of the present invention will be described with reference to FIG.
  • the speech processing apparatus according to this embodiment differs from the nonlinear echo suppression unit 205 of the second embodiment in that the nonlinear echo suppression unit includes an output determination unit and a suppression intensity control unit. Since other configurations and operations are the same as those of the second embodiment, description thereof is omitted here.
  • the nonlinear echo suppression unit 500 includes a fast Fourier transform (FFT) unit 501, a fast Fourier transform unit 502, a spectrum amplitude estimation unit 503, a spectrum flooring unit 504, a spectrum gain calculation unit 505, and an inverse fast Fourier transform (IFFT) unit 506.
  • the fast Fourier transform units 501 and 502 respectively convert the residual signal d (k) and the pseudo linear echo y (k) into a frequency spectrum.
  • a spectrum amplitude estimation unit 503, a spectrum flooring unit 504, and a spectrum gain calculation unit 505 are prepared for each frequency component.
  • the inverse fast Fourier transform unit 506 integrates the amplitude spectrum derived for each frequency component with the corresponding phase, performs inverse fast Fourier transform, and re-synthesizes the time-domain output signal zi (k), that is, the speech waveform to be sent to the other party. Linear echo and nonlinear echo are completely different waveforms. However, comparing their spectrum amplitudes for each frequency, the nonlinear echo tends to be large when the pseudo linear echo is large. That is, there is an amplitude correlation between the linear echo and the nonlinear echo, so the amount of nonlinear echo can be estimated from the pseudo linear echo.
  • the spectrum amplitude estimation unit 503 estimates the spectrum amplitude of the desired signal based on the estimated amount of nonlinear echo. There is an error in the spectral amplitude of the estimated signal. Therefore, the spectrum flooring unit 504 performs flooring processing so that the estimation error does not become subjectively unpleasant in the voice waveform sent to the other party. For example, when the estimated spectral amplitude of the signal is excessively small and lower than the spectral amplitude of the background noise, the signal level fluctuates depending on the presence or absence of an echo, causing a strange feeling to the other party. As a countermeasure, the spectrum flooring unit 504 reduces the level fluctuation by estimating the background noise level and setting it as the lower limit of the estimated spectrum amplitude.
  • the spectral gain calculation unit 505 does not cancel the echo by subtracting the estimated nonlinear echo; instead, it multiplies the signal by a gain chosen so that the amplitude is reduced to about the level that subtraction would give. By smoothing the gain to prevent sudden changes, it is possible to suppress intermittent changes in residual echo.
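The flooring and gain steps described above can be sketched for one frequency bin as follows. This is a hedged illustration: the function name `spectral_gain`, the first-order smoothing with factor `alpha`, the cap of the gain at 1, and the numeric values are assumptions, since the source does not give the exact smoothing rule or constants.

```python
def spectral_gain(residual_amp, echo_replica_amp, noise_floor_amp,
                  a, prev_gain, alpha=0.5):
    """Smoothed multiplicative gain for one frequency bin."""
    # Estimated nonlinear echo amplitude from the amplitude correlation.
    echo_est = a * echo_replica_amp
    # Estimated desired-signal amplitude (may come out too small or negative).
    desired_est = residual_amp - echo_est
    # Flooring: use the background noise estimate as the lower limit.
    floored = max(desired_est, noise_floor_amp)
    # Gain that scales the residual to the floored amplitude, capped at 1.
    gain = min(floored / residual_amp, 1.0) if residual_amp > 0.0 else 1.0
    # Smoothing (assumed first-order recursion) to avoid sudden gain changes.
    return alpha * prev_gain + (1.0 - alpha) * gain

# Strong echo, previous gain 1.0: the gain moves only halfway toward the floor.
g = spectral_gain(1.0, 1.0, 0.1, a=0.95, prev_gain=1.0)
output_amp = g * 1.0  # the gain multiplies the residual amplitude
```

Multiplying by a smoothed gain rather than subtracting directly is what keeps the output level from jumping between frames when the echo estimate fluctuates.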
  • the internal configurations of the spectrum amplitude estimation unit 503, the spectrum flooring unit 504, and the spectrum gain calculation unit 505 will be described using mathematical expressions.
  • the residual signal d (k) input to the nonlinear echo suppression unit 500 is the sum of the desired signal s (k) and the residual nonlinear echo q (k).
  • $d(k) = s(k) + q(k)$ (1)
  • assuming that the linear echo suppression unit 204 has almost completely removed the linear echo, only the nonlinear component is considered in the frequency domain.
  • the residual signal represented by the equation (1) is transformed into the frequency domain and represented by the following equation.
  • $D(m) = S(m) + Q(m)$ (2)
  • m is a frame number
  • vectors D (m), S (m), and Q (m) are expressions obtained by converting d (k), s (k), and q (k), respectively, into the frequency domain.
  • $S_i(m) = D_i(m) - Q_i(m)$ (3) Since the pseudo linear echo generation unit 203 and the linear echo suppression unit 204 perform correlation removal, there is almost no correlation between Di (m) and Yi (m). Therefore, the subtractor 536 is as follows: This is an echo replica of the i-th frequency when converted.
  • the product can be modeled as follows.
  • the absolute value converting circuit 532 and the averaging circuit 534 calculate the average echo replica from Yi (m). The multiplication unit 535 multiplies the average echo replica by the regression coefficient ai1.
  • the regression coefficient ai1 is a regression coefficient indicating the correlation between the spectrum amplitude of the residual nonlinear echo and that of the echo replica.
  • Expression (3) is an additive model widely used in noise suppression.
  • the spectrum shaping of the nonlinear echo suppressor 500 shown in FIG. 5 has a spectrum multiplication type configuration that is less likely to cause unpleasant musical noise in noise suppression.
  • is obtained as the product of the spectral gain Gi (m) and the residual signal
  • the comparing unit 537 compares this output signal with a threshold value.
  • when the output signal obtained with strong suppression exceeds the threshold value, the comparison unit 537 determines that the desired signal is included, and the regression coefficient calculation unit 538 corrects the regression coefficient. Specifically, the regression coefficient calculation unit 538 calculates a smaller regression coefficient ai2 by multiplying the regression coefficient ai1 by a predetermined correction value.
  • the nonlinear echo is suppressed again by the multiplier 540 and the subtracter 539. The estimate is not free of error. If the error is large and oversubtraction occurs, a high frequency component of the desired signal is reduced, or a sense of modulation is produced in the voice waveform sent to the other party. In particular, when the desired signal is stationary, like air-conditioning sound, the sense of modulation is uncomfortable for the other party.
  • the flooring unit 504 performs flooring on the spectrum. In flooring, the averaging circuit 541 first estimates the steady component
  • the multiplier 553 outputs the amplitude
  • the inverse fast Fourier transform unit 506 performs inverse Fourier transform on the amplitude
  • the regression coefficient calculation unit 538 stores the first regression coefficient and the regression coefficient correction value. When the comparison result in the comparison unit 537 is a determination that a desired signal is mixed in the input signal, the regression coefficient calculation unit 538 multiplies the first regression coefficient ai1 by the regression coefficient correction value to calculate and output the second regression coefficient ai2.
  • alternatively, the regression coefficient calculation unit 538 may store the first regression coefficient and the smaller second regression coefficient and, when the comparison result in the comparison unit 537 is a determination that the desired signal is mixed in the input signal, read and output the second regression coefficient.
  • the nonlinear echo suppressor 500 according to the present embodiment has the following configuration. First, the fast Fourier transform units 501 and 502 respectively convert the residual signal d (k) and the pseudo linear echo y (k) into a frequency spectrum.
  • the spectrum amplitude estimation unit 503, the spectrum flooring unit 504, and the spectrum gain calculation unit 505 perform echo suppression while suppressing degradation of the desired signal for each frequency component.
  • the inverse fast Fourier transform unit 506 integrates the amplitude spectrum derived for each frequency component with the corresponding phase, performs inverse fast Fourier transform, and re-synthesizes the output signal zi (k) in the time domain.
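The transform, per-frequency suppression, and re-synthesis chain can be sketched as follows. To stay dependency-free, the sketch uses a naive DFT from the standard library in place of an FFT. It also omits the averaging, flooring, and gain smoothing stages and simply subtracts a scaled echo amplitude per bin. The function names and the test signal are assumptions for illustration.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]

def idft(spectrum):
    """Naive inverse DFT returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[i] * cmath.exp(2j * cmath.pi * i * k / n)
                for i in range(n)).real / n
            for k in range(n)]

def suppress_frame(d, y, a):
    """Suppress echo in one frame.

    d : residual signal frame d(k)
    y : pseudo linear echo frame y(k)
    a : regression coefficient (one value for all bins, for simplicity)
    """
    D, Y = dft(d), dft(y)
    Z = []
    for Di, Yi in zip(D, Y):
        # Amplitude after removing the estimated nonlinear echo, clamped at 0.
        amp = max(abs(Di) - a * abs(Yi), 0.0)
        # Recombine the new amplitude with the phase of the residual signal.
        Z.append(cmath.rect(amp, cmath.phase(Di)))
    return idft(Z)

d = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]
unchanged = suppress_frame(d, [0.0] * 8, a=0.8)   # no echo replica: frame passes through
silenced = suppress_frame(d, d, a=1.0)            # replica equals residual: frame is removed
```

Only the amplitude is modified; the phase of the residual signal is reused, which is why the frame passes through unchanged when the echo replica is zero.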
  • FIG. 6 is a flowchart for explaining the flow of processing in the speech processing apparatus according to this embodiment.
  • the suppression intensity control unit 207 provides a small regression coefficient prepared in advance to the nonlinear echo suppression unit 205.
  • the nonlinear echo suppression unit 205 differs from the second embodiment in that it recalculates the output signal using the provided regression coefficient. Since other configurations and operations are the same as those in the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted. According to this embodiment, the same effect as that of the second embodiment can be obtained. The reason is that the suppression intensity control unit 207 provides a small regression coefficient prepared in advance to the nonlinear echo suppression unit 205, and the nonlinear echo suppression unit 205 recalculates the output signal using that regression coefficient.
  • [Fifth Embodiment] A fifth embodiment of the present invention will be described with reference to FIG.
  • this embodiment differs from the second embodiment in that, in step S703, the output determination unit 206 determines whether the output signal is larger than the threshold only in the voice band. Since other configurations and operations are the same as those in the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted. According to this embodiment, the same effect as that of the second embodiment can be obtained. Moreover, according to this embodiment, the clarity of the desired signal after echo suppression in the voice band is improved. The reason is that the output determination unit 206 determines whether the output signal is larger than the threshold only in the voice band.
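A band-limited version of the threshold test can be sketched as follows. The source does not state the band edges, so the 300 to 3400 Hz range used here (the conventional telephone voice band) is an assumption, as are the function name `desired_present_in_band` and the example values.

```python
def desired_present_in_band(amps, sample_rate, frame_len, threshold,
                            band=(300.0, 3400.0)):
    """Decide whether a desired signal is present using only in-band bins.

    amps        : amplitudes after strong suppression, for bins 0 .. frame_len // 2
    sample_rate : sampling rate in Hz
    frame_len   : transform length used to produce the bins
    band        : (low, high) edges of the band in Hz (assumed values)
    """
    bin_hz = sample_rate / frame_len
    # Sum the power (squared amplitude) of the bins inside the band only.
    power = sum(a * a for i, a in enumerate(amps)
                if band[0] <= i * bin_hz <= band[1])
    return power > threshold

# Bins at 0, 1000, 2000, 3000 and 4000 Hz for an 8 kHz, 8-point frame.
out_of_band_only = desired_present_in_band([5.0, 0.1, 0.1, 0.1, 5.0], 8000, 8, 0.5)
in_band = desired_present_in_band([0.0, 1.0, 0.0, 0.0, 0.0], 8000, 8, 0.5)
```

Energy at 0 Hz and 4000 Hz is ignored in the first call, so large out-of-band components alone do not trigger the weaker suppression.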
  • FIG. 8 is a block diagram showing the configuration of the nonlinear echo suppression unit 800 provided in the speech processing apparatus according to this embodiment.
  • the nonlinear echo suppression unit 800 includes a spectrum amplitude estimation unit 803 instead of the spectrum amplitude estimation unit 503.
  • the spectrum amplitude estimation unit 803 further includes a selection unit 838, a multiplication unit 835, a subtracter 836 and a threshold comparison unit 837 compared to the spectrum amplitude estimation unit 503.
  • the spectrum amplitude estimation unit 803 does not include the comparison unit 537, the regression coefficient calculation unit 538, the subtractor 539, and the multiplication unit 540, as compared with the spectrum amplitude estimation unit 503.
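  • the absolute value converting circuit 532 and the averaging circuit 534 calculate the average echo replica from Yi (m). The multiplication units 535 and 835 multiply the average echo replica by the regression coefficients ai1 and ai2, respectively.
  • the regression coefficients ai1 and ai2 are regression coefficients indicating the correlation between the spectrum amplitude of the residual nonlinear echo and that of the echo replica.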
  • the threshold comparison unit 837 compares the output from the subtractor 536 with the threshold and passes the result to the selection unit 838.
  • the selection unit 838 selects one of the outputs from the subtracter 536 and the subtracter 836 according to the comparison result by the threshold comparison unit 837 and passes the selected output to the maximum value selection circuit 542. Since other configurations and operations are the same as those of the third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • FIG. 9 is a flowchart showing a simplified flow of processing in the present embodiment.
  • the multiplication unit 535 and the subtractor 536 perform echo suppression with a large regression coefficient ai1, and output a first output signal.
  • the multiplier 835 and the subtractor 836 perform echo suppression with a small regression coefficient ai2 and output a second output signal.
  • the threshold value comparison unit 837 determines whether or not the first output signal is larger than the threshold value. If the first output signal is greater than the threshold value, the selection unit 838 selects and outputs the second output signal in step S906. If the first output signal is smaller than the threshold value, the selection unit 838 selects and outputs the first output signal in step S907. According to the present embodiment, the same effects as those of the second embodiment can be obtained with a simpler configuration. The reason is that the nonlinear echo suppression unit 800 according to the present embodiment has the following configuration. First, the multiplier 535 and the subtractor 536 perform echo suppression with a large regression coefficient ai1 and output a first output signal.
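The parallel compute-and-select structure of this embodiment can be sketched as follows, with comments mapping each line to the units named in the text. The function name `suppress_parallel_select` and the numeric values are assumptions. The sketch stops at the selection unit, so negative values are left for the later maximum value selection stage.

```python
def suppress_parallel_select(residual_amps, avg_echo_amps, a1, a2, threshold):
    """Per-bin selection between strongly and weakly suppressed outputs."""
    selected = []
    for d_amp, y_amp in zip(residual_amps, avg_echo_amps):
        first = d_amp - a1 * y_amp    # multiplication unit 535 and subtractor 536
        second = d_amp - a2 * y_amp   # multiplication unit 835 and subtractor 836
        # Threshold comparison unit 837 inspects the strongly suppressed output;
        # selection unit 838 passes the weakly suppressed one if it is large.
        selected.append(second if first > threshold else first)
    return selected

# Bin 0 contains echo only; bin 1 also contains a desired signal.
selected = suppress_parallel_select([0.5, 1.5], [1.0, 1.0],
                                    a1=0.6, a2=0.3, threshold=0.1)
```

Because both candidates are computed up front, no second suppression pass is needed after the comparison, which is the simplification this embodiment relies on.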
  • FIG. 10 is a block diagram showing a configuration of the nonlinear echo suppression unit 1000 provided in the speech processing apparatus according to the present embodiment.
  • the nonlinear echo suppression unit 1000 includes a spectrum amplitude estimation unit 1003 instead of the spectrum amplitude estimation unit 503, and includes a spectrum gain calculation unit 1005 instead of the spectrum gain calculation unit 505.
  • the spectrum amplitude estimation unit 1003 further includes a peak search unit 1031 as compared with the spectrum amplitude estimation unit 503. Further, the spectrum amplitude estimation unit 1003 does not include the comparison unit 537, the regression coefficient calculation unit 538, the subtracter 539, and the multiplication unit 540, as compared with the spectrum amplitude estimation unit 503.
  • the spectral gain calculation unit 1005 further includes a spectral gain correction unit 1051 as compared with the spectral gain calculation unit 505.
  • the peak search unit 1031 derives the maximum value in the mid-high range (higher component of the main audio frequency) out of the spectral gain for suppressing the echo.
  • the peak search unit 1031 outputs this maximum value to the spectrum gain correction unit 1051.
  • the spectrum gain correction unit 1051 calculates the minimum spectrum gain that makes the spectrum shape look like speech with reference to the input maximum value. Specifically, the spectral gain correction unit 1051 calculates the spectral gain attenuated by several dB/octave from the maximum value of the spectral gain.
  • the spectral gain correction unit 1051 sets these spectral gains as the lower limit value of the spectral gain at each frequency.
  • the spectral gain correction unit 1051 functions as a suppression strength control unit that weakens the suppression strength of the nonlinear echo by attenuating the gain with respect to the frequency with a predetermined slope.
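The peak search and gain lower limit can be sketched as follows. The source gives only "several dB/octave", so the 6 dB/octave slope, the 1000 to 4000 Hz search band, the function name `apply_gain_floor`, and the choice to apply the slope on both sides of the peak frequency are all assumptions made for this illustration.

```python
import math

def apply_gain_floor(gains, freqs_hz, search_band=(1000.0, 4000.0),
                     slope_db_per_octave=6.0):
    """Raise each gain to a lower limit that falls off from the peak gain.

    gains    : linear spectral gains per bin (must be > 0 inside the search band)
    freqs_hz : centre frequency of each bin in Hz (must be > 0)
    """
    # Peak search: largest gain inside the (assumed) mid-high band.
    candidates = [(g, f) for g, f in zip(gains, freqs_hz)
                  if search_band[0] <= f <= search_band[1]]
    peak_gain, peak_freq = max(candidates)
    peak_db = 20.0 * math.log10(peak_gain)
    corrected = []
    for g, f in zip(gains, freqs_hz):
        # Distance from the peak frequency in octaves.
        octaves = abs(math.log2(f / peak_freq))
        # Lower limit: peak gain attenuated by the slope, converted to linear.
        floor = 10.0 ** ((peak_db - slope_db_per_octave * octaves) / 20.0)
        corrected.append(max(g, floor))
    return corrected

freqs = [500.0, 1000.0, 2000.0, 4000.0]
gains = [0.1, 0.2, 0.8, 0.05]
corrected = apply_gain_floor(gains, freqs)
```

A large peak gain raises the lower limit everywhere, which weakens suppression when a strong desired signal is likely present; a small peak gain leaves the original gains almost untouched.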
  • FIG. 11 is a diagram for explaining the effect of the present embodiment.
  • the graph on the left side of the upper column 1101 is a graph showing the relationship between the uncorrected spectral gain 1101A and the desired signal spectrum 1101B.
  • the graph on the right side of the upper column 1101 is a graph showing a desired signal spectrum 1101C that has been subjected to suppression processing based on the spectrum gain 1101A.
  • the graph on the left side of the lower column 1102 is a graph showing the relationship between the corrected spectrum gain 1102A and the desired signal spectrum 1102B.
  • the graph on the right side of the lower column 1102 is a graph showing a desired signal spectrum 1102C that has been subjected to suppression processing based on the spectrum gain 1102A.
  • the vertical axis represents intensity (decibel), and the horizontal axis represents frequency (hertz).
  • the spectrum gain attenuated by several dB/octave from the maximum value of the spectrum gain 1101A is set as the lower limit value of the spectrum gain at each frequency.
  • the maximum value of the spectral gain 1102A is large, and the lower limit spectral gain is also large.
  • the present invention can also be applied to a case where an information processing program that implements the functions of the embodiments is supplied directly or remotely to a system or apparatus. Therefore, a program installed in a computer in order to realize the functions of the present invention on that computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded are also included in the scope of the present invention.
  • a flow of processing executed by a CPU (Central Processing Unit) 1202 provided in the computer 1200 when the audio processing described in the second embodiment is realized by software will be described with reference to FIG.
  • the CPU 1202 executes, for example, the program read from the memory 1204 to realize steps S301 to S304 described with reference to FIG.
  • FIG. 13 is a diagram illustrating an example of a recording medium (storage medium) 1207 that records (stores) a program.
  • the recording medium 1207 is a non-volatile recording medium that stores information non-temporarily.
  • the recording medium 1307 may be a recording medium that temporarily stores information.
  • the recording medium 1307 records a program (software) that causes the computer 1200 (CPU 1202) to execute the operation illustrated in FIG.
  • the recording medium 1207 may further record arbitrary programs and data. Even if such a computer is used, the same effect as in the second embodiment can be obtained. While the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention. This application claims priority based on Japanese Patent Application No. 2011-112077, filed on May 19, 2011, the entire disclosure of which is incorporated herein.

Abstract

The present invention provides an audio processing device capable of suppressing an echo while minimizing degradation of a desired signal. Said audio processing device is provided with the following: a nonlinear-echo suppression means that suppresses a nonlinear echo in an input signal; a determination means that uses suppression results from the nonlinear-echo suppression means to determine the presence or absence of a desired signal in the input signal; and a suppression-strength control means that, if it has been determined that the desired signal is present in the input signal, reduces the strength of the nonlinear-echo suppression performed by the nonlinear-echo suppression means.

Description

Audio processing apparatus, audio processing method, and recording medium recording audio processing program
The present invention relates to a technique for suppressing echo.
In the above technical field, a technique for suppressing echo is known, as shown in Patent Document 1.
The echo suppression device described in Patent Document 1 includes a conversion unit having the following functions. First, the conversion unit receives a first signal and a second signal. Here, the first signal is either a microphone input signal (called the output signal in Patent Document 1) or a signal obtained by a subtractor subtracting the output signal of the linear echo canceller from the output signal of the speaker. The second signal is the output signal of the linear echo canceller. Second, the conversion unit calculates an estimated value of the degree of echo leakage from the first signal and the second signal. Third, the conversion unit corrects the first signal based on the calculated estimated value, thereby generating a near-end signal in which the echo has been removed from the first signal, and outputs it to the output terminal.
Patent Document 1: JP 2004-56453 A
However, with the technique described in Patent Document 1, echo suppression often degrades the desired signal, and the quality of the output signal is insufficient.
This is because the conversion unit of the echo suppression device described in Patent Document 1 corrects the first signal without considering whether the first signal or the second signal contains the desired signal.
An object of the present invention is to provide a technique that solves the above problem.
An apparatus according to one aspect of the present invention includes:
nonlinear echo suppression means for suppressing a nonlinear echo contained in an input signal;
determination means for determining, according to the result of suppression by the nonlinear echo suppression means, whether a desired signal is mixed in the input signal; and
suppression strength control means for weakening the strength with which the nonlinear echo suppression means suppresses the nonlinear echo when the determination finds that the desired signal is mixed in the input signal.
A method according to one aspect of the present invention includes:
suppressing a nonlinear echo contained in an input signal;
determining, according to the result of the suppression, whether a desired signal is mixed in the input signal; and
when the determination finds that the desired signal is mixed in the input signal, suppressing the nonlinear echo with a weakened suppression strength.
A program recorded on a non-volatile medium according to one aspect of the present invention causes a computer to execute:
a process of suppressing a nonlinear echo contained in an input signal;
a process of determining, according to the result of the suppression, whether a desired signal is mixed in the input signal; and
a process of suppressing the nonlinear echo with a weakened suppression strength when the determination finds that the desired signal is mixed in the input signal.
According to the present invention, it is possible to suppress echo while limiting degradation of a desired signal.
FIG. 1 is a block diagram showing the configuration of the audio processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the audio processing apparatus according to the second embodiment of the present invention.
FIG. 3 is a flowchart showing the flow of processing of the audio processing apparatus according to the second embodiment of the present invention.
FIG. 4 is a diagram showing the effect of the audio processing apparatus according to the second embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of the audio processing apparatus according to the third embodiment of the present invention.
FIG. 6 is a flowchart showing the flow of processing of the audio processing apparatus according to the fourth embodiment of the present invention.
FIG. 7 is a flowchart showing the flow of processing of the audio processing apparatus according to the fifth embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of the audio processing apparatus according to the sixth embodiment of the present invention.
FIG. 9 is a flowchart showing the flow of processing of the audio processing apparatus according to the sixth embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the audio processing apparatus according to the seventh embodiment of the present invention.
FIG. 11 is a diagram explaining the effect of the audio processing apparatus according to the seventh embodiment of the present invention.
FIG. 12 is a block diagram showing the configuration of an audio processing apparatus according to another embodiment of the present invention.
FIG. 13 is a diagram showing a recording medium on which the program of the present invention is recorded.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the drawings. However, the components described in the following embodiments are merely examples, and are not intended to limit the technical scope of the present invention only to them.
[First Embodiment]
An audio processing apparatus 100 according to a first embodiment of the present invention will be described with reference to FIG. 1. The audio processing apparatus 100 is an apparatus for suppressing echo in a signal.
As shown in FIG. 1, the audio processing apparatus 100 includes a nonlinear echo suppression unit 101, a desired signal mixture determination unit 102, and an echo suppression strength control unit 103.
The nonlinear echo suppression unit 101 suppresses the nonlinear echo contained in the input signal. The desired signal mixture determination unit 102 determines, according to the result of suppression by the nonlinear echo suppression unit 101, whether a desired signal is mixed in the input signal.
When the desired signal mixture determination unit 102 determines that the desired signal is mixed in the input signal, the echo suppression strength control unit 103 weakens the strength with which the nonlinear echo suppression unit 101 suppresses the nonlinear echo.
With the above configuration, the audio processing apparatus 100 can suppress echo while limiting degradation of the desired signal.
This is because the audio processing apparatus 100 has the following configuration. First, the desired signal mixture determination unit 102 determines whether the desired signal is mixed in the input signal. Second, when the determination result indicates that the desired signal is mixed, the echo suppression strength control unit 103 weakens the strength with which the nonlinear echo suppression unit 101 suppresses the nonlinear echo.
[Second Embodiment]
Next, an audio processing apparatus 200 according to a second embodiment of the present invention will be described with reference to FIG. 2. FIG. 2 is a block diagram for explaining the schematic configuration of the audio processing apparatus 200 according to the present embodiment. So that suppressing the echo does not attenuate the high-frequency components of the desired signal (for example, the near-end signal) and reduce its intelligibility, the audio processing apparatus 200 determines whether the desired signal is present before deciding the amount of suppression. Specifically, if a signal remains after the echo has been reliably suppressed, the apparatus determines that the desired signal is present, reduces the amount of suppression, and performs the echo suppression process again. Reducing the amount of suppression improves the intelligibility of the desired signal. When the desired signal is present, a small amount of residual echo is not noticeable, so the echo suppression is still subjectively sufficient.
As shown in FIG. 2, the audio processing apparatus 200 includes a microphone 201 as audio input means and a speaker 202 as audio output means. When the speaker 202 outputs sound corresponding to the output signal, an echo signal may be mixed into the input signal of the microphone 201. The pseudo linear echo generation unit 203 generates, from the output signal, a pseudo signal of the linear echo component of that echo signal. The pseudo linear echo generation unit 203 is configured, for example, as an adaptive filter whose coefficients are updated using the residual signal. The linear echo suppression unit 204 subtracts the pseudo linear echo signal from the input signal picked up by the microphone 201.
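The adaptive-filter arrangement described above can be illustrated with a minimal sketch. The text only says that the filter coefficients are updated using the residual signal; the normalized LMS (NLMS) update used here, the function name `nlms_echo_canceller`, and the parameters `taps`, `mu`, and `eps` are assumptions chosen for illustration, not details taken from this document.

```python
def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Sketch of units 203/204: return (pseudo_echo, residual) sample lists.

    far_end: samples sent to the speaker (the "output signal").
    mic:     samples picked up by the microphone (the "input signal").
    NLMS is an assumed update rule; the source does not name one.
    """
    w = [0.0] * taps        # adaptive filter coefficients
    x_buf = [0.0] * taps    # most recent far-end samples, newest first
    pseudo_echo, residual = [], []
    for x, d_mic in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        # Pseudo linear echo y(k): filter output for the current far-end history.
        y = sum(wi * xi for wi, xi in zip(w, x_buf))
        # Residual d(k): microphone signal minus the pseudo linear echo.
        e = d_mic - y
        # Coefficient update driven by the residual, normalized by input power.
        norm = sum(xi * xi for xi in x_buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        pseudo_echo.append(y)
        residual.append(e)
    return pseudo_echo, residual
```

Both returned sequences matter for what follows: the residual and the pseudo linear echo are the two inputs of the nonlinear echo suppression stage.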
The pseudo linear echo signal and the residual signal generated by the pseudo linear echo generation unit 203 are input to the nonlinear echo suppression unit 205. The nonlinear echo suppression unit 205 estimates the nonlinear echo signal and suppresses the nonlinear echo signal component in the residual signal.
The audio processing apparatus 200 also includes an output determination unit 206 and a suppression strength control unit 207. The output determination unit 206 determines, from the signal obtained after the echo has been strongly suppressed, whether the desired signal is present. In particular, it determines the presence of the desired signal from only the high-frequency components of the speech.
In the present embodiment, the nonlinear echo suppression unit 205 suppresses the nonlinear echo based on a nonlinear echo signal estimated by multiplication with a regression coefficient. The suppression strength control unit 207 reduces the amount of suppression by correcting or selecting the regression coefficient. That is, when the desired signal is present in the signal obtained after the echo has been strongly suppressed using a first regression coefficient, the suppression strength control unit 207 provides the nonlinear echo suppression unit 205 with a second regression coefficient smaller than the first regression coefficient. The nonlinear echo suppression unit 205 then estimates the nonlinear echo signal by multiplying the pseudo nonlinear echo signal by this second regression coefficient.
FIG. 3 is a flowchart for explaining the flow of processing performed by the audio processing apparatus 200. In step S301, the nonlinear echo suppression unit 205 strongly suppresses the echo using a relatively large first regression coefficient. Next, in step S302, the output determination unit 206 calculates the power after suppression. In step S303, the output determination unit 206 determines whether the power after suppression (that is, the square of the amplitude) is larger than a threshold. If the power after suppression is larger than the threshold, the original input signal is considered to have contained the desired signal, so the process proceeds to step S304, where the suppression strength control unit 207 reduces the regression coefficient. That is, the suppression strength control unit 207 outputs a second regression coefficient smaller than the first regression coefficient. The nonlinear echo suppression unit 205 then performs the echo suppression process again based on that regression coefficient. In other words, the echo suppression effect is reduced so that more of the desired speech is preserved.
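The flow of steps S301 to S304 can be sketched as follows. This is a simplified illustration under stated assumptions: the suppression is modeled as a per-frequency amplitude subtraction clipped at zero, and the function names and the default values of `a1`, `a2`, and `threshold` are hypothetical, not values given in this document.

```python
def suppress(residual_amp, echo_replica_amp, coeff):
    """Assumed suppression model: |Z_i| = max(|D_i| - coeff * |Y_i|, 0)."""
    return [max(d - coeff * y, 0.0)
            for d, y in zip(residual_amp, echo_replica_amp)]


def two_pass_suppress(residual_amp, echo_replica_amp,
                      a1=2.0, a2=0.5, threshold=1.0):
    """Return (output amplitudes, desired_signal_detected)."""
    # S301: strong suppression with the larger first regression coefficient.
    first = suppress(residual_amp, echo_replica_amp, a1)
    # S302: power after suppression (sum of squared amplitudes).
    power = sum(z * z for z in first)
    # S303: power above the threshold is taken to mean a desired signal is present.
    if power > threshold:
        # S304: suppress again with the smaller second regression coefficient.
        return suppress(residual_amp, echo_replica_amp, a2), True
    return first, False
```

With an echo-only frame the first pass is returned unchanged, while a frame containing near-end speech is reprocessed with the weaker coefficient so that less of the speech is removed.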
FIG. 4 is a diagram showing the effect achieved by the audio processing apparatus 200 according to the present embodiment. The left graph in the upper row 401 shows the spectrum after the first suppression pass. The right graph in the upper row 401 shows the spectrum after the second suppression pass, performed after the regression coefficient has been corrected to a smaller value. As the upper row 401 shows, reducing the regression coefficient allows echo suppression that preserves more of the desired speech. The left graph in the lower row 402, by contrast, shows the spectrum when the output power after the first suppression pass is smaller than the threshold. In this case, the suppression strength control unit 207 does not change the regression coefficient, so the spectrum after the second suppression pass, shown in the right graph of the lower row 402, has the same shape as the left graph. In each graph, for example, the vertical axis is intensity (decibels) and the horizontal axis is frequency (hertz).
As described above, according to the present embodiment, the clarity of a desired signal after echo suppression is improved.
This is because the output determination unit 206 determines whether the power after suppression is larger than the threshold, and the suppression strength control unit 207 outputs the regression coefficient generated based on that determination to the nonlinear echo suppression unit 205.
[Third Embodiment]
Next, an audio processing apparatus according to a third embodiment of the present invention will be described with reference to FIG. 5. The audio processing apparatus according to this embodiment differs from the second embodiment in that its nonlinear echo suppression unit, unlike the nonlinear echo suppression unit 205, incorporates the output determination unit and the suppression strength control unit. The other configurations and operations are the same as in the second embodiment, and their description is omitted here.
FIG. 5 is a diagram showing the internal configuration of the nonlinear echo suppression unit 500 according to the present embodiment. The nonlinear echo suppression unit 500 includes a fast Fourier transform (FFT) unit 501, a fast Fourier transform unit 502, a spectrum amplitude estimation unit 503, a spectrum flooring unit 504, a spectrum gain calculation unit 505, and an inverse fast Fourier transform (IFFT) unit 506.
The fast Fourier transform units 501 and 502 convert the residual signal d(k) and the pseudo linear echo y(k), respectively, into frequency spectra. A spectrum amplitude estimation unit 503, a spectrum flooring unit 504, and a spectrum gain calculation unit 505 are provided for each frequency component. The inverse fast Fourier transform unit 506 combines the amplitude spectrum derived for each frequency component with the corresponding phase, applies the inverse fast Fourier transform, and re-synthesizes the time-domain output signal zi(k), that is, the speech waveform to be sent to the other party.
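The re-synthesis step, in which a modified amplitude spectrum is recombined with the original phase, can be sketched as below. To stay self-contained the sketch uses a naive O(n²) discrete Fourier transform in place of an FFT; the function names are hypothetical, and framing, windowing, and overlap handling, which a practical system would need, are left out.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (stand-in for FFT units 501/502)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]


def idft(spectrum):
    """Naive inverse transform (stand-in for IFFT unit 506); returns real parts."""
    n = len(spectrum)
    return [sum(spectrum[i] * cmath.exp(2j * cmath.pi * i * k / n)
                for i in range(n)).real / n
            for k in range(n)]


def apply_amplitude(frame, new_amplitude):
    """Replace each bin's amplitude while keeping the phase of the input frame."""
    spectrum = dft(frame)
    shaped = [cmath.rect(a, cmath.phase(c))
              for a, c in zip(new_amplitude, spectrum)]
    return idft(shaped)
```

Passing the frame's own amplitudes reproduces the input frame, which is a convenient sanity check; in the arrangement described here the amplitudes passed in would be the suppressed amplitudes |Zi(m)|.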
Linear echo and nonlinear echo are completely different waveforms. However, looking at the spectrum amplitude for each frequency of the linear echo and the non-linear echo, the non-linear echo tends to increase when the pseudo-linear echo is large. That is, there is an amplitude correlation between the linear echo and the nonlinear echo. That is, the amount of non-linear echo can be estimated based on the quasi-linear echo.
Therefore, the spectrum amplitude estimation unit 503 estimates the spectrum amplitude of the desired signal based on the estimated amount of nonlinear echo. The estimated spectrum amplitude contains error. The spectrum flooring unit 504 therefore applies flooring so that this estimation error does not become subjectively unpleasant in the speech waveform sent to the other party.
For example, when the estimated spectrum amplitude of the signal is excessively small and falls below the spectrum amplitude of the background noise, the signal level fluctuates depending on whether echo is present, which sounds unnatural to the other party. As a countermeasure, the spectrum flooring unit 504 estimates the background noise level and uses it as the lower limit of the estimated spectrum amplitude, thereby reducing the level fluctuation.
On the other hand, if a large echo remains in the estimated spectrum amplitude because of estimation error, the residual echo changes intermittently and abruptly and becomes an artificial additive sound known as musical noise. As a countermeasure, the spectrum gain calculation unit 505 cancels the echo not by subtracting the estimated nonlinear echo but by multiplying by a gain chosen so that the amplitude is roughly what the subtraction would have produced. Smoothing the gain to prevent abrupt changes suppresses intermittent changes in the residual echo.
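The two countermeasures above, flooring at the background noise level and applying a smoothed multiplicative gain, can be sketched for a single frequency bin as follows. The first-order recursive smoothing, the cap of the gain at 1.0, and every name and default value here are assumptions made for illustration; the document does not specify the smoothing method.

```python
def floored_gain(residual_amp, est_desired_amp, noise_floor_amp,
                 prev_gain, smoothing=0.8):
    """Return (smoothed_gain, output_amp) for one frequency bin.

    residual_amp:    |D_i(m)|, amplitude of the residual signal.
    est_desired_amp: estimated amplitude of the desired signal.
    noise_floor_amp: estimated background noise amplitude (lower limit).
    prev_gain:       smoothed gain from the previous frame.
    """
    # Flooring: never let the estimate fall below the background noise level.
    floored = max(est_desired_amp, noise_floor_amp)
    # Gain that would bring |D_i| down to the floored estimate (capped at 1).
    raw_gain = min(floored / residual_amp, 1.0) if residual_amp > 0 else 0.0
    # Recursive smoothing to avoid abrupt gain changes between frames.
    gain = smoothing * prev_gain + (1.0 - smoothing) * raw_gain
    return gain, gain * residual_amp
```

Because the output is obtained by scaling the residual rather than subtracting from it, an estimation error changes the output level gradually instead of producing isolated spectral peaks.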
Hereinafter, the internal configurations of the spectrum amplitude estimation unit 503, the spectrum flooring unit 504, and the spectrum gain calculation unit 505 will be described using mathematical expressions.
The residual signal d(k) input to the nonlinear echo suppression unit 500 is the sum of the desired signal s(k) and the residual nonlinear echo q(k).
d(k) = s(k) + q(k) ... (1)
Assuming that the linear echo suppression unit 204 has removed the linear echo almost completely, only the nonlinear component is considered in the frequency domain. The fast Fourier transform units 501 and 502 transform the residual signal of equation (1) into the frequency domain, where it is expressed as follows.
D(m) = S(m) + Q(m) ... (2)
Here, m is the frame number, and the vectors D(m), S(m), and Q(m) are the frequency-domain representations of d(k), s(k), and q(k), respectively. Treating each frequency independently and rearranging equation (2), the i-th frequency component of the desired signal is expressed as follows.
Si(m) = Di(m) − Qi(m) ... (3)
Because the pseudo linear echo generation unit 203 and the linear echo suppression unit 204 remove correlation, there is almost no correlation between Di(m) and Yi(m). Therefore, the subtractor 536 is as follows:
Figure JPOXMLDOC01-appb-I000001
This is an echo replica of the i-th frequency when converted.
Figure JPOXMLDOC01-appb-I000002
Figure JPOXMLDOC01-appb-I000003
Figure JPOXMLDOC01-appb-I000004
The product can be modeled as follows.
Figure JPOXMLDOC01-appb-I000005
Accordingly, the absolute value circuit 532 and the averaging circuit 534 compute the average echo replica from Yi(m)
Figure JPOXMLDOC01-appb-I000006
which is then multiplied by the regression coefficient ai1. Here, the regression coefficient ai1 expresses the correlation between |Qi(m)| and |Yi(m)|. This model is based on experimental results showing a significant correlation between |Qi(m)| and |Yi(m)|.
Equation (3) is an additive model of the kind widely used in noise suppression. The spectrum shaping in the nonlinear echo suppression unit 500 shown in FIG. 5 instead uses a spectral multiplication configuration, which in noise suppression is less prone to unpleasant musical noise. Using spectral multiplication, the output signal amplitude |Zi(m)| is obtained as the product of the spectral gain Gi(m) and the residual signal amplitude |Di(m)|.
Figure JPOXMLDOC01-appb-I000007
By taking the square root of equation (6) and substituting ai^2·|Yi(m)|^2 for |Qi(m)|^2 in equation (4),
Figure JPOXMLDOC01-appb-I000008
Figure JPOXMLDOC01-appb-I000009
Figure JPOXMLDOC01-appb-I000010
The comparison unit 537 compares this output signal with a threshold and supplies the comparison result for suppression strength control.
Figure JPOXMLDOC01-appb-I000011
In that case, the desired signal is judged to be present, and the regression coefficient calculation unit 538 corrects the regression coefficient. Specifically, the regression coefficient calculation unit 538 multiplies the regression coefficient ai1 by a predetermined correction value to calculate a smaller regression coefficient ai2. The multiplier 540 and the subtractor 539 then suppress the nonlinear echo again.
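Because the equation images for expressions (4) to (7) are not reproduced in this text, the relations they express can only be inferred from the surrounding prose. The block below is a hedged reconstruction, not the original typesetting. It assumes that (4) is the power-domain counterpart of equation (3) when the desired signal and the residual echo are uncorrelated, that (5) is the regression model linking the residual echo to the averaged echo replica, that (6) is the multiplicative output described in the text, and that (7) follows by substituting (5) into (4). The overbar denoting averaging and the exact form of (7) are assumptions.

```latex
\begin{align}
|S_i(m)|^2 &\approx |D_i(m)|^2 - |Q_i(m)|^2 \tag{4} \\
|Q_i(m)| &\approx a_i \,\overline{|Y_i(m)|} \tag{5} \\
|Z_i(m)| &= G_i(m)\,|D_i(m)| \tag{6} \\
G_i(m) &= \sqrt{\frac{|D_i(m)|^2 - a_i^2\,\overline{|Y_i(m)|}^{\,2}}{|D_i(m)|^2}} \tag{7}
\end{align}
```

Under this reading, a smaller regression coefficient a_i moves the gain G_i(m) closer to 1, which is consistent with the statement that reducing the coefficient weakens the suppression. The expression under the square root must be non-negative, so an implementation would need to clip it at zero or apply the flooring described in the text.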
Figure JPOXMLDOC01-appb-I000012
The estimated spectrum amplitude contains error. If the error is large and oversubtraction occurs, the high-frequency components of the desired signal are reduced, or the speech waveform sent to the other party sounds modulated. This modulation is especially unpleasant for the other party when the desired signal is stationary, like air-conditioning noise. To reduce the modulation subjectively, the spectrum flooring unit 504 performs flooring on the spectrum.
In flooring, the averaging circuit 541 first estimates the stationary component |Ni(m)| of the desired signal Di(m). Next, the maximum value selection circuit 542 performs flooring with this as the lower limit.
Figure JPOXMLDOC01-appb-I000013
Finally, as shown in Expression (6), the multiplier 553 outputs the amplitude | Zi (m) | calculated by calculating the product of the spectral gain Gi (m) and the residual signal | Di (m) | Output as a signal. The inverse fast Fourier transform unit 506 performs inverse Fourier transform on the amplitude | Zi (em) |, and outputs a signal zi (k) in which nonlinear echoes are effectively suppressed.
The regression coefficient calculation unit 538 stores the first regression coefficient and the regression coefficient correction value. When the comparison unit 537 determines that a desired signal is mixed in the input signal, the regression coefficient calculation unit 538 multiplies the first regression coefficient ai1 by the regression coefficient correction value to calculate and output the second regression coefficient ai2.
Alternatively, the regression coefficient calculation unit 538 may store the first regression coefficient and the smaller second regression coefficient, and may read and output the second regression coefficient when the comparison unit 537 determines that the desired signal is mixed in the input signal.
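The correct-and-retry behaviour of the comparison unit 537 and the regression coefficient calculation unit 538 can be sketched as below. This is a hedged illustration for a single frequency bin: the function names, the per-bin threshold test, and the numeric values are assumptions made here, and the correction value is assumed to lie between 0 and 1 so that the second coefficient is smaller than the first.

```python
def suppress(residual_amp, echo_amp, coeff):
    """Subtract the estimated nonlinear echo coeff * |Yi(m)| from |Di(m)|."""
    return residual_amp - coeff * echo_amp


def suppress_with_correction(residual_amp, echo_amp, a1, correction, threshold):
    """Suppress with a1; if the output still exceeds the threshold, retry with a2.

    Returns the output amplitude and the coefficient that was actually used.
    """
    first = suppress(residual_amp, echo_amp, a1)
    if first > threshold:
        # A desired signal is judged to be present: weaken the suppression.
        a2 = a1 * correction
        return suppress(residual_amp, echo_amp, a2), a2
    return first, a1


# Illustrative values only.
out_talk, used_talk = suppress_with_correction(1.0, 0.5, a1=0.8, correction=0.5, threshold=0.3)
out_echo, used_echo = suppress_with_correction(0.5, 0.5, a1=0.8, correction=0.5, threshold=0.3)
```

The variant that stores the second coefficient directly would replace the multiplication `a1 * correction` with a lookup of the stored value; the rest of the flow is unchanged.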
As described above, according to the configuration of the present embodiment, it is possible to correct the regression coefficient and improve the clarity of the desired signal after echo suppression.
The reason is that the nonlinear echo suppressor 500 according to the present embodiment has the following configuration. First, the fast Fourier transform units 501 and 502 respectively convert the residual signal d (k) and the pseudo linear echo y (k) into a frequency spectrum. Second, the spectrum amplitude estimation unit 503, the spectrum flooring unit 504, and the spectrum gain calculation unit 505 perform echo suppression while suppressing degradation of the desired signal for each frequency component. Third, the inverse fast Fourier transform unit 506 integrates the amplitude spectrum derived for each frequency component with the corresponding phase, performs inverse fast Fourier transform, and re-synthesizes the output signal zi (k) in the time domain.
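The three stages listed above (transform, per-frequency suppression, resynthesis with phase) can be sketched end to end. This is a simplified illustration, not the embodiment itself: a naive DFT stands in for the fast Fourier transform units, the suppressed amplitude is clamped at zero instead of being floored and gain-smoothed, and the phase of the residual signal is assumed to be the "corresponding phase". All names are hypothetical.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform (stand-in for units 501 and 502)."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
        for i in range(n)
    ]


def idft(spectrum):
    """Naive inverse transform (stand-in for unit 506); keeps the real part."""
    n = len(spectrum)
    return [
        (sum(spectrum[i] * cmath.exp(2j * cmath.pi * i * k / n) for i in range(n)) / n).real
        for k in range(n)
    ]


def suppress_frame(d, y, coeff):
    """Suppress the echo in one frame of residual d(k) using pseudo linear echo y(k)."""
    out_spectrum = []
    for d_i, y_i in zip(dft(d), dft(y)):
        # Amplitude-domain subtraction, clamped at zero (assumed simplification).
        amp = max(abs(d_i) - coeff * abs(y_i), 0.0)
        # Recombine the new amplitude with the phase of the residual signal.
        out_spectrum.append(cmath.rect(amp, cmath.phase(d_i)))
    return idft(out_spectrum)


frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.0, -0.5, 0.25]
unchanged = suppress_frame(frame, [0.0] * len(frame), 0.8)  # no echo reference
silenced = suppress_frame(frame, frame, 1.0)                # residual is all echo
```

When the echo reference is silent the frame passes through unchanged, and when the residual equals the echo reference with a coefficient of 1 the output collapses to zero, which is the behaviour expected of amplitude subtraction.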
[Fourth Embodiment]
A fourth embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a flowchart for explaining the flow of processing in the speech processing apparatus according to this embodiment. In the present embodiment, in step S604, the suppression intensity control unit 207 provides a small regression coefficient prepared in advance to the nonlinear echo suppression unit 205. The present embodiment differs from the second embodiment in that the nonlinear echo suppression unit 205 recalculates the output signal using the provided regression coefficient. Since other configurations and operations are the same as those in the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to this embodiment, the same effect as that of the second embodiment can be obtained.
The reason is that the suppression intensity control unit 207 provides a small regression coefficient prepared in advance to the nonlinear echo suppression unit 205, and the nonlinear echo suppression unit 205 recalculates the output signal using that regression coefficient.
[Fifth Embodiment]
A fifth embodiment of the present invention will be described with reference to FIG. 7. FIG. 7 is a flowchart for explaining the flow of processing in the speech processing apparatus according to this embodiment. The present embodiment differs from the second embodiment in that, in step S703, the output determination unit 206 determines whether the output signal is larger than the threshold only in the voice band. Since other configurations and operations are the same as those in the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to this embodiment, the same effect as that of the second embodiment can be obtained. Moreover, according to this embodiment, the clarity of the desired signal after echo suppression in the voice band is improved.
The reason is that the output determination unit 206 determines whether the output signal is larger than the threshold only in the voice band.
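A band-limited determination of this kind can be sketched as follows. The band edges (300 Hz to 3400 Hz), the rule that a single in-band bin above the threshold counts as a desired signal, and the function names are all assumptions made for illustration; the source only states that the comparison is restricted to the voice band.

```python
def band_bins(n_fft, sample_rate, low_hz, high_hz):
    """Indices of the non-negative-frequency bins that lie inside [low_hz, high_hz]."""
    return [
        i for i in range(n_fft // 2 + 1)
        if low_hz <= i * sample_rate / n_fft <= high_hz
    ]


def desired_signal_present(output_amp, n_fft, sample_rate, threshold,
                           low_hz=300.0, high_hz=3400.0):
    """Return True if any in-band bin of the suppressed output exceeds the threshold."""
    bins = band_bins(n_fft, sample_rate, low_hz, high_hz)
    return any(output_amp[i] > threshold for i in bins)


# 16-point transform at 8 kHz: bins are 500 Hz apart (illustrative values only).
rumble_only = [5.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]   # energy at 0 Hz only
with_speech = [0.1, 0.1, 0.1, 2.0, 0.1, 0.1, 0.1, 0.1, 0.1]   # energy at 1500 Hz

rumble_flag = desired_signal_present(rumble_only, 16, 8000, threshold=1.0)
speech_flag = desired_signal_present(with_speech, 16, 8000, threshold=1.0)
```

Restricting the test to the voice band keeps out-of-band residue, such as low-frequency rumble, from being mistaken for a desired signal and weakening the suppression unnecessarily.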
[Sixth Embodiment]
A sixth embodiment of the present invention will be described with reference to FIG. 8. FIG. 8 is a block diagram showing the configuration of the nonlinear echo suppression unit 800 provided in the speech processing apparatus according to this embodiment.
Compared with the nonlinear echo suppression unit 500, the nonlinear echo suppression unit 800 includes a spectrum amplitude estimation unit 803 instead of the spectrum amplitude estimation unit 503. The spectrum amplitude estimation unit 803 further includes a selection unit 838, a multiplication unit 835, a subtracter 836 and a threshold comparison unit 837 compared to the spectrum amplitude estimation unit 503. Further, the spectrum amplitude estimation unit 803 does not include the comparison unit 537, the regression coefficient calculation unit 538, the subtractor 539, and the multiplication unit 540, as compared with the spectrum amplitude estimation unit 503.
In FIG. 8, the absolute value converting circuit 532 and the averaging circuit 534 are arranged such that the average echo signal is calculated from Yi (m).
Figure JPOXMLDOC01-appb-I000014
is multiplied by each of the two regression coefficients ai1 and ai2. Here, the regression coefficients ai1 and ai2 are regression coefficients indicating the correlation between |Qi(m)| and |Yi(m)|, and ai1 > ai2.
The threshold comparison unit 837 compares the output from the subtractor 536 with the threshold and passes the result to the selection unit 838. The selection unit 838 selects one of the outputs from the subtracter 536 and the subtracter 836 according to the comparison result by the threshold comparison unit 837 and passes the selected output to the maximum value selection circuit 542. Since other configurations and operations are the same as those of the third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
FIG. 9 is a flowchart showing a simplified flow of processing in the present embodiment. In step S901, the multiplication unit 535 and the subtractor 536 perform echo suppression with a large regression coefficient ai1, and output a first output signal. In step S903, the multiplier 835 and the subtractor 836 perform echo suppression with a small regression coefficient ai2 and output a second output signal. In step S905, the threshold value comparison unit 837 determines whether or not the first output signal is larger than the threshold value. If the first output signal is greater than the threshold value, the selection unit 838 selects and outputs the second output signal in step S906. If the first output signal is smaller than the threshold value, the selection unit 838 selects and outputs the first output signal in step S907.
According to the present embodiment, the same effects as those of the second embodiment can be obtained with a simpler configuration.
The reason is that the nonlinear echo suppression unit 800 according to the present embodiment has the following configuration. First, the multiplier 535 and the subtractor 536 perform echo suppression with a large regression coefficient ai1 and output a first output signal. Second, the multiplier 835 and the subtractor 836 perform echo suppression with a small regression coefficient ai2 and output a second output signal. Third, the threshold comparison unit 837 determines whether or not the first output signal is greater than the threshold. Fourth, the selection unit 838 selects and outputs the second output signal if the first output signal is greater than the threshold, and selects and outputs the first output signal if the first output signal is less than the threshold.
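The parallel compute-then-select structure of this embodiment can be sketched for a single frequency bin as below. The function name and sample values are hypothetical. The source does not say what happens when the first output equals the threshold; this sketch assumes the first output is kept in that case.

```python
def select_output(residual_amp, echo_amp, a1, a2, threshold):
    """Compute both suppressed outputs, then pick one by thresholding the first."""
    first = residual_amp - a1 * echo_amp    # strong suppression (step S901)
    second = residual_amp - a2 * echo_amp   # weak suppression (step S903)
    # Steps S905 to S907: a large first output suggests a desired signal,
    # so the weakly suppressed output is selected.
    return second if first > threshold else first


# Illustrative values only, with a1 > a2.
double_talk = select_output(1.0, 0.5, a1=0.8, a2=0.4, threshold=0.3)
echo_only = select_output(0.5, 0.5, a1=0.8, a2=0.4, threshold=0.3)
```

Because both outputs are always computed, no coefficient needs to be recalculated and no second suppression pass is required after the comparison, which is where the simpler configuration comes from.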
[Seventh Embodiment]
A seventh embodiment of the present invention will be described with reference to FIG. 10. FIG. 10 is a block diagram showing a configuration of the nonlinear echo suppression unit 1000 provided in the speech processing apparatus according to the present embodiment.
Compared with the nonlinear echo suppression unit 500, the nonlinear echo suppression unit 1000 includes a spectrum amplitude estimation unit 1003 instead of the spectrum amplitude estimation unit 503, and includes a spectrum gain calculation unit 1005 instead of the spectrum gain calculation unit 505. The spectrum amplitude estimation unit 1003 further includes a peak search unit 1031 as compared with the spectrum amplitude estimation unit 503. Further, the spectrum amplitude estimation unit 1003 does not include the comparison unit 537, the regression coefficient calculation unit 538, the subtracter 539, and the multiplication unit 540, as compared with the spectrum amplitude estimation unit 503. Further, the spectral gain calculation unit 1005 further includes a spectral gain correction unit 1051 as compared with the spectral gain calculation unit 505.
Figure JPOXMLDOC01-appb-I000015
Search for peaks in the mid-high range.
The peak search unit 1031 derives the maximum value in the mid-high range (higher component of the main audio frequency) out of the spectral gain for suppressing the echo. The peak search unit 1031 outputs this maximum value to the spectrum gain correction unit 1051.
The spectrum gain correction unit 1051 calculates the minimum spectrum gain that makes the spectrum shape look like speech with reference to the input maximum value. Specifically, the spectral gain correction unit 1051 calculates the spectral gain attenuated by several dB / Octave from the maximum value of the spectral gain. The spectral gain correction unit 1051 sets these spectral gains as the lower limit value of the spectral gain at each frequency. That is, the spectral gain correction unit 1051 functions as a suppression strength control unit that weakens the suppression strength of the nonlinear echo by attenuating the gain with respect to the frequency with a predetermined slope.
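The lower-limit construction can be sketched as follows. Several details are assumptions made only for illustration: gains are expressed in decibels, the mid-high range is passed in as a pair of frequencies, the slope is 6 dB per octave, and the lower limit falls off symmetrically with the distance in octaves from the peak frequency. The source specifies only "several dB/Octave from the maximum value".

```python
import math


def gain_lower_limits(gains_db, freqs_hz, mid_high_hz, slope_db_per_octave):
    """Build a per-frequency lower limit from the peak gain in the mid-high range."""
    low, high = mid_high_hz
    candidates = [i for i, f in enumerate(freqs_hz) if low <= f <= high]
    peak = max(candidates, key=lambda i: gains_db[i])  # peak search
    peak_gain, peak_freq = gains_db[peak], freqs_hz[peak]
    limits = []
    for f in freqs_hz:
        octaves = abs(math.log2(f / peak_freq))  # distance from the peak in octaves
        limits.append(peak_gain - slope_db_per_octave * octaves)
    return limits


def correct_gains(gains_db, limits_db):
    """Raise every gain that falls below its lower limit."""
    return [max(g, lim) for g, lim in zip(gains_db, limits_db)]


# Illustrative values only: frequencies must be positive.
freqs = [500.0, 1000.0, 2000.0, 4000.0]
gains = [-30.0, -20.0, -3.0, -25.0]
limits = gain_lower_limits(gains, freqs, mid_high_hz=(1000.0, 4000.0), slope_db_per_octave=6.0)
corrected = correct_gains(gains, limits)
```

When a desired signal is present the peak gain is close to 0 dB, so the limits are high and neighbouring bands are protected; when only echo is present the peak gain is low, so the limits are low and strong suppression is retained.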
FIG. 11 is a diagram for explaining the effect of the present embodiment.
The graph on the left side of the upper column 1101 is a graph showing the relationship between the uncorrected spectral gain 1101A and the desired signal spectrum 1101B. The graph on the right side of the upper column 1101 is a graph showing a desired signal spectrum 1101C that has been subjected to suppression processing based on the spectrum gain 1101A.
The graph on the left side of the lower column 1102 is a graph showing the relationship between the corrected spectrum gain 1102A and the desired signal spectrum 1102B. The graph on the right side of the lower column 1102 is a graph showing a desired signal spectrum 1102C that has been subjected to suppression processing based on the spectrum gain 1102A.
In each graph, for example, the vertical axis represents intensity (decibel), and the horizontal axis represents frequency (hertz).
As shown in the lower column 1102 of FIG. 11, when the desired signal 1102B exists, the spectrum gain attenuated by several dB/octave from the maximum value of the spectrum gain 1101A is set as the lower limit value of the spectrum gain at each frequency. As a result, the maximum value of the spectral gain 1102A is large, and the lower limit spectral gain is also large. Therefore, compared with the suppression method shown in the upper column 1101 of FIG. 11, which does not correct the spectral gain, the clarity of the desired signal improves in the lower column 1102 of FIG. 11. On the other hand, when the desired signal does not exist, the maximum value of the spectral gain is small, so the lower limit spectral gain is also small, and a sufficient echo suppression effect can be obtained.
(Other embodiments)
Although embodiments of the present invention have been described in detail above, any system or apparatus that combines, in any manner, the separate features included in the respective embodiments is also included in the scope of the present invention.
In addition, the present invention may be applied to a system composed of a plurality of devices, or may be applied to a single device. Furthermore, the present invention can also be applied to a case where an information processing program that implements the functions of the embodiments is supplied directly or remotely to a system or apparatus. Therefore, a program installed in a computer in order to realize the functions of the present invention on that computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded are also included in the scope of the present invention.
Hereinafter, as an example, the flow of processing executed by a CPU (Central Processing Unit) 1202 provided in the computer 1200 when the audio processing described in the second embodiment is realized by software will be described with reference to FIG. 12. The CPU 1202 executes, for example, a program read from the memory 1204 to realize steps S301 to S304 described with reference to FIG. 3, performs predetermined processing on the input signal input from the input unit 1201, and outputs the result from the output unit 1203.
Note that the input unit 1201 may include a microphone 201. The output unit 1203 may include a speaker 202. The memory 1204 stores information. The CPU 1202 writes necessary information to the memory 1204 and reads necessary information from the memory 1204 when executing the operation of each step.
FIG. 13 is a diagram illustrating an example of a recording medium (storage medium) 1207 that records (stores) a program. The recording medium 1207 is a non-volatile recording medium that stores information non-temporarily. Alternatively, the recording medium 1207 may be a recording medium that stores information temporarily. The recording medium 1207 records a program (software) that causes the computer 1200 (CPU 1202) to execute the operation illustrated in FIG. 3. The recording medium 1207 may further record arbitrary programs and data.
Even if such a computer is used, the same effect as in the second embodiment can be obtained.
While the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2011-112077, filed on May 19, 2011, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
 100  Speech processing apparatus
 101  Nonlinear echo suppression unit
 101  Echo suppression unit
 102  Desired signal mixture determination unit
 103  Echo suppression intensity control unit
 200  Speech processing apparatus
 201  Microphone
 202  Speaker
 203  Pseudo linear echo generation unit
 204  Linear echo suppression unit
 205  Nonlinear echo suppression unit
 206  Output determination unit
 207  Suppression intensity control unit
 404  Flooring unit
 500  Nonlinear echo suppression unit
 501  Fast Fourier transform unit
 502  Fast Fourier transform unit
 503  Spectrum amplitude estimation unit
 504  Spectrum flooring unit
 505  Spectrum gain calculation unit
 506  Inverse fast Fourier transform unit
 531  Absolute value circuit
 532  Absolute value circuit
 533  Averaging circuit
 534  Averaging circuit
 535  Multiplication unit
 536  Subtractor
 537  Comparison unit
 538  Regression coefficient calculation unit
 539  Subtractor
 540  Multiplication unit
 541  Averaging circuit
 542  Maximum value selection circuit
 551  Divider
 552  Averaging circuit
 553  Multiplication unit
 800  Nonlinear echo suppression unit
 835  Multiplication unit
 837  Threshold comparison unit
 838  Selection unit
 1000  Nonlinear echo suppression unit
 1200  Computer
 1201  Input unit
 1202  CPU
 1203  Output unit
 1207  Recording medium

Claims (10)

  1.  A speech processing apparatus comprising:
     nonlinear echo suppression means for suppressing a nonlinear echo included in an input signal;
     determination means for determining, according to a result of the suppression by the nonlinear echo suppression means, whether a desired signal is mixed in the input signal; and
     suppression intensity control means for weakening the suppression intensity of the nonlinear echo in the nonlinear echo suppression means when the determination indicates that the desired signal is mixed in the input signal.
  2.  The speech processing apparatus according to claim 1, wherein
     the nonlinear echo suppression means is means for suppressing the nonlinear echo based on a nonlinear echo signal estimated by multiplying a pseudo linear echo signal by a first regression coefficient, and
     the suppression intensity control means estimates the nonlinear echo signal by multiplying the pseudo nonlinear echo signal by a second regression coefficient smaller than the first regression coefficient.
  3.  The speech processing apparatus according to claim 2, wherein the suppression intensity control means
     includes storage means for storing the first regression coefficient and the second regression coefficient, and
     regression coefficient selection means for reading the second regression coefficient from the storage means when the determination means determines that a desired signal is mixed in the input signal,
     and estimates the estimated nonlinear echo signal by multiplying by the read second regression coefficient.
  4.  The speech processing apparatus according to claim 2, wherein the suppression intensity control means
     includes storage means for storing the first regression coefficient and the regression coefficient correction value, and
     second regression coefficient calculation means for calculating the second regression coefficient by multiplying the first regression coefficient by the regression coefficient correction value when the determination means determines that a desired signal is mixed in the input signal,
     and estimates the estimated nonlinear echo signal using the calculated second regression coefficient.
  5.  The speech processing apparatus according to claim 1, wherein
     the nonlinear echo suppression means is means for suppressing the nonlinear echo by multiplying by a gain that depends on an estimated nonlinear echo signal, the estimated nonlinear echo signal being estimated by multiplying a pseudo linear echo signal by a first regression coefficient, and
     the suppression intensity control means weakens the suppression intensity of the nonlinear echo by reducing the gain.
  6.  The speech processing apparatus according to claim 5, wherein the suppression intensity control means weakens the suppression intensity of the nonlinear echo by attenuating the gain with respect to frequency with a predetermined slope.
  7.  The speech processing apparatus according to any one of claims 1 to 6, further comprising:
     audio output means for outputting audio based on an output signal;
     audio input means for receiving audio and outputting the input signal; and
     pseudo linear echo generation means for generating, from the output signal, a pseudo linear echo signal estimated to have been produced by the audio output from the audio output means leaking into the audio input means,
     wherein the nonlinear echo suppression means suppresses the nonlinear echo using the pseudo linear echo signal.
  8.  The speech processing apparatus according to any one of claims 1 to 7, wherein the determination means determines whether a desired signal is mixed in the input signal based on the spectrum shape obtained as a result of the suppression by the nonlinear echo suppression means.
  9.  A speech processing method comprising:
     suppressing a nonlinear echo included in an input signal;
     determining, according to a result of the suppression, whether a desired signal is mixed in the input signal; and
     when the determination indicates that the desired signal is mixed in the input signal, suppressing the nonlinear echo with a weakened suppression intensity.
  10.  A non-volatile medium recording a speech processing program that causes a computer to execute:
     a process of suppressing a nonlinear echo included in an input signal;
     a process of determining, according to a result of the suppression, whether a desired signal is mixed in the input signal; and
     a process of suppressing the nonlinear echo with a weakened suppression intensity when the determination indicates that the desired signal is mixed in the input signal.
PCT/JP2012/063404 2011-05-19 2012-05-18 Audio processing device, audio processing method, and recording medium on which audio processing program is recorded WO2012157785A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2013515243A JPWO2012157785A1 (en) 2011-05-19 2012-05-18 Audio processing apparatus, audio processing method, and recording medium recording audio processing program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011-112077 2011-05-19
JP2011112077 2011-05-19

Publications (1)

Publication Number Publication Date
WO2012157785A1 true WO2012157785A1 (en) 2012-11-22

Family

ID=47177099

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/063404 WO2012157785A1 (en) 2011-05-19 2012-05-18 Audio processing device, audio processing method, and recording medium on which audio processing program is recorded

Country Status (2)

Country Link
JP (1) JPWO2012157785A1 (en)
WO (1) WO2012157785A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6125179A (en) * 1995-12-13 2000-09-26 3Com Corporation Echo control device with quick response to sudden echo-path change
JP2003101445A (en) * 2001-09-20 2003-04-04 Mitsubishi Electric Corp Echo processor
JP2007060644A (en) * 2005-07-28 2007-03-08 Toshiba Corp Signal processor
JP2007189536A (en) * 2006-01-13 2007-07-26 Matsushita Electric Ind Co Ltd Acoustic echo canceler, acoustic error canceling method and speech communication equipment
WO2009051197A1 (en) * 2007-10-19 2009-04-23 Nec Corporation Echo suppressing method and device

Also Published As

Publication number Publication date
JPWO2012157785A1 (en) 2014-07-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12786496

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013515243

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12786496

Country of ref document: EP

Kind code of ref document: A1