WO2012157785A1

WO2012157785A1 - Audio processing device, audio processing method, and recording medium on which audio processing program is recorded

Info

Publication number: WO2012157785A1
Application number: PCT/JP2012/063404
Authority: WO
Inventors: 宝珠山　治
Original assignee: 日本電気株式会社
Priority date: 2011-05-19
Filing date: 2012-05-18
Publication date: 2012-11-22
Also published as: JPWO2012157785A1

Abstract

The present invention provides an audio processing device capable of suppressing an echo while minimizing degradation of a desired signal. Said audio processing device is provided with the following: a nonlinear-echo suppression means that suppresses a nonlinear echo in an input signal; a determination means that uses suppression results from the nonlinear-echo suppression means to determine the presence or absence of a desired signal in the input signal; and a suppression-strength control means that, if it has been determined that the desired signal is present in the input signal, reduces the strength of the nonlinear-echo suppression performed by the nonlinear-echo suppression means.

Description

Audio processing apparatus, audio processing method, and recording medium recording audio processing program

The present invention relates to a technique for suppressing echo.

In the above technical field, as shown in Patent Document 1, a technique for suppressing echoes is known.
The echo suppression device described in Patent Document 1 includes a conversion unit having the following functions. First, the conversion unit inputs the first signal and the second signal. Here, the first signal is either a microphone input signal (output signal in Patent Document 1) or a signal obtained by subtracting the output signal of the linear echo canceller from the output signal of the speaker by the subtractor. The second signal is an output signal of the linear echo canceller. Second, the conversion unit calculates an estimated value of echo leakage from the first signal and the second signal. Third, the conversion unit corrects the first signal based on the calculated estimated value, thereby generating a near-end signal obtained by removing the echo from the first signal, and outputs the generated near-end signal to the output terminal.

JP 2004-56453 A

However, with the technique described in Patent Document 1, the desired signal often deteriorates due to echo suppression, and the output signal quality is not sufficient.
The reason is that the conversion unit of the echo suppression device described in Patent Document 1 corrects the first signal without considering whether the first signal or the second signal contains the desired signal.
An object of the present invention is to provide a technique that solves the above problem.

In one embodiment of the present invention, an apparatus includes:
nonlinear echo suppression means for suppressing a nonlinear echo included in an input signal;
determination means for determining the presence or absence of a desired signal in the input signal according to the result of suppression by the nonlinear echo suppression means; and
suppression strength control means for weakening the suppression strength of the nonlinear echo in the nonlinear echo suppression means when it is determined that the desired signal is mixed in the input signal.
In one aspect of the present invention, a method includes:
suppressing a nonlinear echo included in an input signal;
determining the presence or absence of a desired signal in the input signal according to the result of the suppression; and
when it is determined that the desired signal is mixed in the input signal, performing nonlinear echo suppression with a weakened suppression strength.
In one embodiment of the present invention, a program recorded on a non-volatile medium causes a computer to execute:
a process of suppressing a nonlinear echo included in an input signal;
a process of determining the presence or absence of a desired signal in the input signal according to the result of the suppression; and
a process of suppressing the nonlinear echo with a weakened suppression strength when it is determined that the desired signal is mixed in the input signal.

According to the present invention, it is possible to suppress echoes while suppressing deterioration of a desired signal.

FIG. 1 is a block diagram showing the configuration of the speech processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the speech processing apparatus according to the second embodiment of the present invention.
FIG. 3 is a flowchart showing the flow of processing of the speech processing apparatus according to the second embodiment of the present invention.
FIG. 4 is a diagram showing the effect of the speech processing apparatus according to the second embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of the speech processing apparatus according to the third embodiment of the present invention.
FIG. 6 is a flowchart showing the flow of processing of the speech processing apparatus according to the fourth embodiment of the present invention.
FIG. 7 is a flowchart showing the flow of processing of the speech processing apparatus according to the fifth embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of the speech processing apparatus according to the sixth embodiment of the present invention.
FIG. 9 is a flowchart showing the flow of processing of the speech processing apparatus according to the sixth embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the speech processing apparatus according to the seventh embodiment of the present invention.
FIG. 11 is a diagram for explaining the effect of the speech processing apparatus according to the seventh embodiment of the present invention.
FIG. 12 is a block diagram showing the configuration of a speech processing apparatus according to another embodiment of the present invention.
FIG. 13 is a diagram showing a recording medium on which the program of the present invention is recorded.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the drawings. However, the components described in the following embodiments are merely examples, and are not intended to limit the technical scope of the present invention only to them.
[First Embodiment]
A speech processing apparatus 100 as a first embodiment of the present invention will be described with reference to FIG. 1. The speech processing apparatus 100 is a device for suppressing echo in a signal.
As shown in FIG. 1, the speech processing apparatus 100 includes a nonlinear echo suppression unit 101, a desired signal mixture determination unit 102, and an echo suppression intensity control unit 103.
The nonlinear echo suppression unit 101 suppresses nonlinear echo included in the input signal. Further, the desired signal mixture determination unit 102 determines whether or not the desired signal is mixed in the input signal according to the result of suppression by the nonlinear echo suppression unit 101.
When the desired signal mixture determination unit 102 determines that the desired signal is mixed in the input signal, the echo suppression intensity control unit 103 weakens the suppression intensity of the nonlinear echo in the nonlinear echo suppression unit 101.
With the above configuration, the speech processing apparatus 100 can suppress echo while suppressing deterioration of a desired signal.
The reason is that the speech processing apparatus 100 has the following configuration. First, the desired signal mixture determination unit 102 determines whether or not the desired signal is mixed in the input signal. Second, the echo suppression strength control unit 103 weakens the suppression strength of the nonlinear echo in the nonlinear echo suppression unit 101 when the determination result indicates that the desired signal is mixed.
[Second Embodiment]
Next, a speech processing apparatus 200 according to the second embodiment of the present invention will be described with reference to FIG. 2. FIG. 2 is a block diagram for explaining a schematic configuration of the speech processing apparatus 200 according to the present embodiment. The speech processing apparatus 200 determines the suppression amount after determining the presence or absence of the desired signal, so that echo suppression does not attenuate the high-frequency component of the desired signal (for example, the near-end signal) and reduce its intelligibility. Specifically, if a signal remains after the echo has been reliably suppressed, it is determined that the desired signal exists, the suppression amount is reduced, and the echo suppression process is performed again. Reducing the suppression amount improves the clarity of the desired signal. When the desired signal is present, some residual echo is not noticeable, so the echo suppression effect remains subjectively sufficient.
As shown in FIG. 2, the speech processing apparatus 200 includes a microphone 201 as sound input means and a speaker 202 as sound output means. When the speaker 202 outputs sound corresponding to the output signal, an echo signal may be mixed into the input signal of the microphone 201. The pseudo linear echo generation unit 203 generates, from the output signal, a pseudo signal of the linear echo component of the echo signal. The pseudo linear echo generation unit 203 is configured by, for example, an adaptive filter that updates its coefficients using a residual signal. The linear echo suppression unit 204 subtracts the pseudo linear echo signal from the input signal picked up by the microphone 201 and outputs the result as the residual signal.
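The adaptive-filter arrangement of units 203 and 204 can be illustrated with the following minimal sketch. It assumes a normalized LMS (NLMS) update; the source only states that the coefficients are updated using the residual signal, so the NLMS rule, the filter length `taps`, the step size `mu`, and all names here are hypothetical choices made for illustration.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Return (pseudo_echo, residual) as lists of samples.

    far_end: samples sent to the speaker (the output signal)
    mic:     samples picked up by the microphone (the input signal)
    The NLMS update is an assumption, not taken from the source.
    """
    w = [0.0] * taps   # adaptive filter coefficients
    x = [0.0] * taps   # most recent far-end samples, newest first
    pseudo_echo, residual = [], []
    for u, m in zip(far_end, mic):
        x = [u] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))   # pseudo linear echo (unit 203)
        d = m - y                                  # residual signal (unit 204)
        norm = sum(xi * xi for xi in x) + eps
        # Coefficient update driven by the residual signal
        w = [wi + mu * d * xi / norm for wi, xi in zip(w, x)]
        pseudo_echo.append(y)
        residual.append(d)
    return pseudo_echo, residual


# Usage: a toy echo path that delays the far-end signal by one sample and halves it.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] + [0.5 * u for u in far_end[:-1]]
pseudo_echo, residual = nlms_echo_canceller(far_end, mic)
tail_error = max(abs(d) for d in residual[-100:])   # residual after convergence
```

In this toy case the echo path is purely linear, so the residual decays toward zero. With a real loudspeaker the residual would still contain the nonlinear echo that the nonlinear echo suppression unit 205 handles.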
The pseudo linear echo signal and the residual signal generated by the pseudo linear echo generation unit 203 are input to the nonlinear echo suppression unit 205. The nonlinear echo suppression unit 205 estimates the nonlinear echo signal and suppresses the nonlinear echo signal component in the residual signal.
The speech processing apparatus 200 includes an output determination unit 206 and a suppression strength control unit 207. The output determination unit 206 determines the presence of the desired signal from the signal after the echo is strongly suppressed. In particular, the presence of the desired signal is determined only from the high frequency component of the sound.
In the present embodiment, the nonlinear echo suppression unit 205 suppresses the nonlinear echo based on a nonlinear echo signal estimated by multiplying by a regression coefficient. The suppression intensity control unit 207 reduces the suppression amount by correcting or selecting the regression coefficient. That is, when the desired signal is present in the signal after the echo has been strongly suppressed using the first regression coefficient, the suppression intensity control unit 207 supplies a second regression coefficient, smaller than the first regression coefficient, to the nonlinear echo suppression unit 205. The nonlinear echo suppression unit 205 then estimates the nonlinear echo signal by multiplying the pseudo linear echo signal by the second regression coefficient.
FIG. 3 is a flowchart for explaining the flow of processing performed by the speech processing apparatus 200. In step S301, the nonlinear echo suppression unit 205 strongly suppresses the echo with a relatively large first regression coefficient. Next, in step S302, the output determination unit 206 calculates the power after suppression (that is, the square of the amplitude). In step S303, the output determination unit 206 determines whether or not the power after suppression is greater than a threshold value. When the power after suppression is larger than the threshold value, the original input signal is considered to include the desired signal, so the process proceeds to step S304, and the suppression strength control unit 207 decreases the regression coefficient. That is, the suppression intensity control unit 207 outputs a second regression coefficient that is smaller than the first regression coefficient. Then, the nonlinear echo suppression unit 205 performs echo suppression processing again based on the second regression coefficient. In other words, the echo suppression effect is weakened so that more of the desired voice remains.
FIG. 4 is a diagram illustrating effects that can be achieved by the speech processing apparatus 200 according to the present embodiment. The graph on the left side of the upper row 401 shows the spectrum after the first suppression process. The graph on the right side of the upper row 401 shows the spectrum after the second suppression process, performed after the regression coefficient has been corrected to a smaller value. As shown in the upper row 401, reducing the regression coefficient makes it possible to perform echo suppression so that more of the desired speech remains. On the other hand, the graph on the left side of the lower row 402 shows a spectrum when the output power after the first suppression process is smaller than the threshold value. In this case, the suppression strength control unit 207 does not change the regression coefficient. For this reason, the spectrum after the second suppression process, shown by the graph on the right side of the lower row 402, has the same shape as the graph on the left side. In each graph, for example, the vertical axis represents intensity (decibels), and the horizontal axis represents frequency (hertz).
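The flow of steps S301 to S304 can be sketched as follows for a single frame. This is an illustration under stated assumptions: the suppression is modeled in the power domain as the residual power minus the squared regression coefficient times the pseudo linear echo power, and the numeric defaults (`first_coeff`, `correction`, `threshold`) and function names are hypothetical.

```python
def suppress(residual_power, echo_power, coeff):
    """Power left after removing the estimated nonlinear echo (assumed model)."""
    return max(residual_power - coeff ** 2 * echo_power, 0.0)


def two_pass_suppression(residual_power, echo_power,
                         first_coeff=2.0, correction=0.5, threshold=0.1):
    """Return (output_power, desired_present) for one frame."""
    # S301: strong suppression with the relatively large first regression coefficient
    out = suppress(residual_power, echo_power, first_coeff)
    # S302/S303: compare the power after suppression with the threshold
    desired_present = out > threshold
    if desired_present:
        # S304: use a smaller second coefficient and suppress again
        out = suppress(residual_power, echo_power, first_coeff * correction)
    return out, desired_present
```

For example, `two_pass_suppression(1.0, 0.2)` leaves power after the strong first pass, so the second pass runs with the halved coefficient, whereas `two_pass_suppression(0.5, 0.2)` is fully suppressed by the first pass and is output unchanged.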
As described above, according to the present embodiment, the clarity of a desired signal after echo suppression is improved.
The reason is that the output determination unit 206 determines whether or not the power after suppression is larger than the threshold value, and the suppression intensity control unit 207 outputs the regression coefficient generated based on the determination result to the nonlinear echo suppression unit 205.
[Third Embodiment]
Next, a speech processing apparatus according to a third embodiment of the present invention will be described with reference to FIG. 5. The nonlinear echo suppression unit of this embodiment differs from the nonlinear echo suppression unit 205 of the second embodiment in that it includes an output determination unit and a suppression intensity control unit. Since other configurations and operations are the same as those of the second embodiment, description thereof is omitted here.
FIG. 5 is a diagram illustrating an internal configuration of the nonlinear echo suppression unit 500 according to the present embodiment. The nonlinear echo suppression unit 500 includes a fast Fourier transform (FFT) unit 501, a fast Fourier transform unit 502, a spectrum amplitude estimation unit 503, a spectrum flooring unit 504, a spectrum gain calculation unit 505, and an inverse fast Fourier transform (IFFT) unit 506.
The fast Fourier transform units 501 and 502 convert the residual signal d(k) and the pseudo linear echo y(k), respectively, into frequency spectra. A spectrum amplitude estimation unit 503, a spectrum flooring unit 504, and a spectrum gain calculation unit 505 are provided for each frequency component. The inverse fast Fourier transform unit 506 combines the amplitude spectrum derived for each frequency component with the corresponding phase, performs an inverse fast Fourier transform, and re-synthesizes the time-domain output signal zi(k), that is, the speech waveform to be sent to the other party.
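The analysis and re-synthesis around the per-frequency processing can be sketched as follows. This is a minimal illustration, not the apparatus itself: a naive discrete Fourier transform stands in for the FFT units 501 and 506, the per-bin processing is reduced to a list of gains, and the function names are hypothetical.

```python
import cmath


def dft(frame):
    """Naive DFT of a real frame (stand-in for the FFT unit)."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]


def idft(spectrum):
    """Naive inverse DFT returning the real part (stand-in for the IFFT unit)."""
    n = len(spectrum)
    return [sum(spectrum[i] * cmath.exp(2j * cmath.pi * i * k / n)
                for i in range(n)).real / n
            for k in range(n)]


def resynthesize(residual_frame, gains):
    """Scale each bin's amplitude, keep the phase of the residual, return to time domain."""
    spectrum = dft(residual_frame)
    shaped = [cmath.rect(g * abs(c), cmath.phase(c)) for g, c in zip(gains, spectrum)]
    return idft(shaped)
```

Only the amplitude of each bin is modified; the phase is taken from the residual signal. For the output to be a faithful real waveform, the gain of bin i should equal that of bin n - i, which a practical implementation would guarantee by processing only the non-redundant half of the spectrum.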
Linear echo and nonlinear echo have completely different waveforms. However, comparing their spectral amplitudes at each frequency, the nonlinear echo tends to be large when the pseudo linear echo is large. That is, there is an amplitude correlation between the linear echo and the nonlinear echo, so the amount of nonlinear echo can be estimated from the pseudo linear echo.
Therefore, the spectrum amplitude estimation unit 503 estimates the spectrum amplitude of the desired signal based on the estimated amount of nonlinear echo. There is an error in the spectral amplitude of the estimated signal. Therefore, the spectrum flooring unit 504 performs flooring processing so that the estimation error does not become subjectively unpleasant in the voice waveform sent to the other party.
For example, when the estimated spectral amplitude of the signal is excessively small and lower than the spectral amplitude of the background noise, the signal level fluctuates depending on the presence or absence of an echo, causing a strange feeling to the other party. As a countermeasure, the spectrum flooring unit 504 reduces the level fluctuation by estimating the background noise level and setting it as the lower limit of the estimated spectrum amplitude.
On the other hand, if a large amount of echo remains in the estimated spectrum amplitude due to the estimation error, the remaining echo changes intermittently and rapidly, and becomes an artificial additional sound called musical noise. As a countermeasure, the spectral gain calculation unit 505 does not cancel the echo by subtracting the estimated nonlinear echo. Instead, it multiplies the signal by a gain chosen so that the resulting amplitude is about the same as it would be after subtraction. Smoothing the gain to prevent sudden changes makes it possible to suppress intermittent changes in the residual echo.
Hereinafter, the internal configurations of the spectrum amplitude estimation unit 503, the spectrum flooring unit 504, and the spectrum gain calculation unit 505 will be described using mathematical expressions.
The residual signal d(k) input to the nonlinear echo suppression unit 500 is the sum of the desired signal s(k) and the residual nonlinear echo q(k).
$$d(k) = s(k) + q(k) \tag{1}$$
Assuming that the linear echo suppression unit 204 has almost completely removed the linear echo, only the nonlinear component is considered in the frequency domain. The fast Fourier transform units 501 and 502 perform the transformation into the frequency domain, where the residual signal of equation (1) is expressed as follows.
$$D(m) = S(m) + Q(m) \tag{2}$$
Here, m is a frame number, and the vectors D(m), S(m), and Q(m) are the frequency-domain representations of d(k), s(k), and q(k), respectively. Treating each frequency independently and rearranging equation (2), the i-th frequency component of the desired signal is expressed as follows.
$$S_i(m) = D_i(m) - Q_i(m) \tag{3}$$
Since the pseudo linear echo generation unit 203 and the linear echo suppression unit 204 perform decorrelation, there is almost no correlation between Di(m) and Yi(m). Therefore, the subtractor 536 calculates the power of the desired signal according to the following relation.
$$|S_i(m)|^2 = |D_i(m)|^2 - |Q_i(m)|^2 \tag{4}$$
Here, Yi(m) is the i-th frequency component of the echo replica after conversion to the frequency domain. The amplitude |Qi(m)| can be modeled as the product of a regression coefficient ai and the average echo replica |Yi(m)|, as follows.
$$|Q_i(m)| = a_i \cdot |Y_i(m)| \tag{5}$$
Therefore, the absolute value converting circuit 532 and the averaging circuit 534 calculate the average echo replica from Yi(m), and the multiplier 535 multiplies it by the regression coefficient ai1. Here, the regression coefficient ai1 is a regression coefficient indicating the correlation between |Qi(m)| and |Yi(m)|. This model is based on experimental results showing that there is a significant correlation between |Qi(m)| and |Yi(m)|.
Expression (3) is an additive model widely used in noise suppression. The spectrum shaping of the nonlinear echo suppression unit 500 shown in FIG. 5 instead uses a spectral multiplication configuration, which is less likely to cause unpleasant musical noise. With spectral multiplication, the output signal amplitude |Zi(m)| is obtained as the product of the spectral gain Gi(m) and the residual signal amplitude |Di(m)|.
$$|Z_i(m)| = G_i(m) \cdot |D_i(m)| \tag{6}$$
Substituting ai²·|Yi(m)|² for |Qi(m)|² in equation (4) and taking the square root gives the estimate of the desired signal amplitude.
$$|\hat{S}_i(m)| = \sqrt{|D_i(m)|^2 - a_i^2 \cdot |Y_i(m)|^2} \tag{7}$$
The comparison unit 537 compares this output signal, obtained with strong suppression, with a threshold value. When the comparison result shows that the output signal is larger than the threshold value, it is determined that the desired signal is included, and the regression coefficient calculation unit 538 corrects the regression coefficient. Specifically, the regression coefficient calculation unit 538 calculates a smaller regression coefficient ai2 by multiplying the regression coefficient ai1 by a predetermined correction value. Then, the nonlinear echo is suppressed again by the multiplier 540 and the subtractor 539 using the regression coefficient ai2.

The estimated amplitude of the desired signal is not free of error. If the error is large and oversubtraction occurs, high frequency components of the desired signal are reduced, or a sense of modulation is produced in the voice waveform sent to the other party. In particular, when the desired signal is stationary, like air-conditioning noise, the sense of modulation is uncomfortable for the other party. To reduce the sense of modulation subjectively, the spectrum flooring unit 504 performs flooring on the spectrum.
In flooring, the averaging circuit 541 first estimates the stationary component |Ni(m)| of the residual signal Di(m). Next, the maximum value selection circuit 542 performs flooring with this as the lower limit, so that the floored amplitude is the larger of the estimated amplitude and |Ni(m)|.
Finally, as shown in Expression (6), the multiplier 553 calculates the product of the spectral gain Gi(m) and the residual signal amplitude |Di(m)| and outputs the resulting amplitude |Zi(m)| as the output signal. The inverse fast Fourier transform unit 506 performs an inverse Fourier transform on the amplitude |Zi(m)| and outputs a signal zi(k) in which nonlinear echoes are effectively suppressed.
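The per-frequency chain of amplitude estimation, threshold comparison, flooring, and spectral multiplication can be summarized in the following sketch for one bin of one frame. Several points are assumptions and are marked as such: the estimate is clamped at zero when the subtraction goes negative, the spectral gain is taken as the floored estimate divided by |Di(m)| and capped at 1, gain smoothing is omitted, and the default `correction` and `threshold` values and the function name are hypothetical.

```python
import math


def suppress_bin(d_amp, y_avg, noise_amp, a1, correction=0.5, threshold=0.5):
    """Return the output amplitude |Z_i(m)| for one frequency bin.

    d_amp:     residual amplitude |D_i(m)|
    y_avg:     average echo replica amplitude |Y_i(m)|
    noise_amp: stationary component |N_i(m)| used as the floor
    a1:        first (large) regression coefficient a_i1
    """
    def estimate(a):
        # sqrt(|D|^2 - a^2 |Y|^2), clamped at zero (clamping is an assumption)
        return math.sqrt(max(d_amp ** 2 - (a * y_avg) ** 2, 0.0))

    s_hat = estimate(a1)                     # strong suppression with a_i1
    if s_hat > threshold:                    # comparison unit 537
        s_hat = estimate(a1 * correction)    # smaller a_i2 from unit 538, suppress again

    floored = max(s_hat, noise_amp)          # flooring: maximum value selection
    # Assumed gain rule: floored estimate relative to the residual, at most 1
    gain = min(floored / d_amp, 1.0) if d_amp > 0.0 else 0.0
    return gain * d_amp                      # |Z| = G * |D|
```

When only echo is present, the first estimate falls to zero and the output settles at the noise floor, which avoids the level fluctuation that flooring is meant to prevent. When a desired signal is present, the second, weaker pass determines the output.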
The regression coefficient calculation unit 538 stores the first regression coefficient and the regression coefficient correction value. When the comparison result in the comparison unit 537 indicates that a desired signal is mixed in the input signal, the regression coefficient calculation unit 538 multiplies the first regression coefficient ai1 by the regression coefficient correction value to calculate and output the second regression coefficient ai2.
Alternatively, the regression coefficient calculation unit 538 may store the first regression coefficient and the smaller second regression coefficient, and read and output the second regression coefficient when the comparison result in the comparison unit 537 indicates that the desired signal is mixed in the input signal.
As described above, according to the configuration of the present embodiment, it is possible to correct the regression coefficient and improve the clarity of the desired signal after echo suppression.
The reason is that the nonlinear echo suppression unit 500 according to the present embodiment has the following configuration. First, the fast Fourier transform units 501 and 502 convert the residual signal d(k) and the pseudo linear echo y(k), respectively, into frequency spectra. Second, the spectrum amplitude estimation unit 503, the spectrum flooring unit 504, and the spectrum gain calculation unit 505 perform echo suppression while suppressing degradation of the desired signal for each frequency component. Third, the inverse fast Fourier transform unit 506 combines the amplitude spectrum derived for each frequency component with the corresponding phase, performs an inverse fast Fourier transform, and re-synthesizes the output signal zi(k) in the time domain.
[Fourth Embodiment]
A fourth embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a flowchart for explaining the flow of processing in the speech processing apparatus according to this embodiment. This embodiment differs from the second embodiment in that, in step S604, the suppression intensity control unit 207 provides a small regression coefficient prepared in advance to the nonlinear echo suppression unit 205, and the nonlinear echo suppression unit 205 recalculates the output signal using the provided regression coefficient. Since other configurations and operations are the same as those in the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to this embodiment, the same effect as that of the second embodiment can be obtained.
The reason is that the suppression intensity control unit 207 provides a small regression coefficient prepared in advance to the nonlinear echo suppression unit 205, and the nonlinear echo suppression unit 205 recalculates the output signal using that regression coefficient.
[Fifth Embodiment]
A fifth embodiment of the present invention will be described with reference to FIG. 7. FIG. 7 is a flowchart for explaining the flow of processing in the speech processing apparatus according to this embodiment. This embodiment differs from the second embodiment in that, in step S703, the output determination unit 206 determines whether the output signal is larger than the threshold only in the voice band. Since other configurations and operations are the same as those in the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to this embodiment, the same effect as that of the second embodiment can be obtained. Moreover, according to this embodiment, the clarity of the desired signal after echo suppression in the voice band is improved.
The reason is that the output determination unit 206 determines whether the output signal is larger than the threshold only in the voice band.
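The band-limited determination of step S703 can be illustrated with the following sketch. The band edges of 300 Hz and 3400 Hz are an assumption (the source does not give numeric limits), as are the function names and the use of summed squared amplitudes as the power measure.

```python
def band_power(amplitudes, sample_rate, fft_size, low_hz=300.0, high_hz=3400.0):
    """Sum |Z_i|^2 over bins whose centre frequency lies within [low_hz, high_hz]."""
    bin_hz = sample_rate / fft_size
    return sum(a * a for i, a in enumerate(amplitudes)
               if low_hz <= i * bin_hz <= high_hz)


def desired_signal_in_band(amplitudes, sample_rate, fft_size, threshold):
    """Decide whether a desired signal is present, looking only at the assumed band."""
    return band_power(amplitudes, sample_rate, fft_size) > threshold
```

Restricting the comparison to the band keeps strong components outside it, such as low-frequency rumble, from triggering a reduction of the suppression strength.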
[Sixth Embodiment]
A sixth embodiment of the present invention will be described with reference to FIG. 8. FIG. 8 is a block diagram showing the configuration of the nonlinear echo suppression unit 800 provided in the speech processing apparatus according to this embodiment.
Compared with the nonlinear echo suppression unit 500, the nonlinear echo suppression unit 800 includes a spectrum amplitude estimation unit 803 instead of the spectrum amplitude estimation unit 503. The spectrum amplitude estimation unit 803 further includes a selection unit 838, a multiplication unit 835, a subtracter 836 and a threshold comparison unit 837 compared to the spectrum amplitude estimation unit 503. Further, the spectrum amplitude estimation unit 803 does not include the comparison unit 537, the regression coefficient calculation unit 538, the subtractor 539, and the multiplication unit 540, as compared with the spectrum amplitude estimation unit 503.
In FIG. 8, the absolute value converting circuit 532 and the averaging circuit 534 calculate the average echo replica from Yi(m), and the multiplication units 535 and 835 multiply it by the regression coefficients ai1 and ai2, respectively. Here, the regression coefficients ai1 and ai2 are regression coefficients indicating the correlation between |Qi(m)| and |Yi(m)|, and ai1 > ai2.
The threshold comparison unit 837 compares the output from the subtractor 536 with the threshold and passes the result to the selection unit 838. The selection unit 838 selects one of the outputs from the subtracter 536 and the subtracter 836 according to the comparison result by the threshold comparison unit 837 and passes the selected output to the maximum value selection circuit 542. Since other configurations and operations are the same as those of the third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
FIG. 9 is a flowchart showing a simplified flow of processing in the present embodiment. In step S901, the multiplication unit 535 and the subtractor 536 perform echo suppression with a large regression coefficient ai1, and output a first output signal. In step S903, the multiplier 835 and the subtractor 836 perform echo suppression with a small regression coefficient ai2 and output a second output signal. In step S905, the threshold value comparison unit 837 determines whether or not the first output signal is larger than the threshold value. If the first output signal is greater than the threshold value, the selection unit 838 selects and outputs the second output signal in step S906. If the first output signal is smaller than the threshold value, the selection unit 838 selects and outputs the first output signal in step S907.
According to the present embodiment, the same effects as those of the second embodiment can be obtained with a simpler configuration.
The reason is that the nonlinear echo suppression unit 800 according to the present embodiment has the following configuration. First, the multiplication unit 535 and the subtractor 536 perform echo suppression with a large regression coefficient ai1 and output a first output signal. Second, the multiplication unit 835 and the subtractor 836 perform echo suppression with a small regression coefficient ai2 and output a second output signal. Third, the threshold comparison unit 837 determines whether or not the first output signal is greater than the threshold. Fourth, the selection unit 838 selects and outputs the second output signal if the first output signal is greater than the threshold, and selects and outputs the first output signal if the first output signal is smaller than the threshold.
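The parallel arrangement of this embodiment, in which both outputs are computed and one is selected, can be sketched as follows for a single frequency bin. The amplitude model (square root of the power difference, clamped at zero) and the function names are assumptions for illustration.

```python
import math


def suppressed_amplitude(d_amp, y_avg, coeff):
    """Amplitude after removing coeff * |Y| in the power domain (assumed model)."""
    return math.sqrt(max(d_amp ** 2 - (coeff * y_avg) ** 2, 0.0))


def select_output(d_amp, y_avg, a1, a2, threshold):
    """Compute both suppression results and pick one, with a1 > a2."""
    first = suppressed_amplitude(d_amp, y_avg, a1)    # S901: units 535 and 536
    second = suppressed_amplitude(d_amp, y_avg, a2)   # S903: units 835 and 836
    # S905 to S907: threshold comparison unit 837 and selection unit 838
    return second if first > threshold else first
```

Unlike the correct-and-repeat flow of the earlier embodiments, nothing is fed back here: the weakly suppressed result is always available, and the comparison only decides which result is passed on.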
[Seventh Embodiment]
A seventh embodiment of the present invention will be described with reference to FIG. 10. FIG. 10 is a block diagram showing a configuration of the nonlinear echo suppression unit 1000 provided in the speech processing apparatus according to the present embodiment.
Compared with the nonlinear echo suppression unit 500, the nonlinear echo suppression unit 1000 includes a spectrum amplitude estimation unit 1003 instead of the spectrum amplitude estimation unit 503, and includes a spectrum gain calculation unit 1005 instead of the spectrum gain calculation unit 505. The spectrum amplitude estimation unit 1003 further includes a peak search unit 1031 as compared with the spectrum amplitude estimation unit 503. Further, the spectrum amplitude estimation unit 1003 does not include the comparison unit 537, the regression coefficient calculation unit 538, the subtracter 539, and the multiplication unit 540, as compared with the spectrum amplitude estimation unit 503. Further, the spectral gain calculation unit 1005 further includes a spectral gain correction unit 1051 as compared with the spectral gain calculation unit 505.

The peak search unit 1031 searches for a peak of the spectral gain in the mid-high range. Specifically, the peak search unit 1031 derives the maximum value, in the mid-high range (the upper part of the main voice frequency range), of the spectral gain for suppressing the echo, and outputs this maximum value to the spectrum gain correction unit 1051.
The spectrum gain correction unit 1051 calculates, with reference to the input maximum value, the minimum spectral gain that keeps the spectrum shape speech-like. Specifically, the spectral gain correction unit 1051 calculates spectral gains attenuated by several dB/octave from the maximum value of the spectral gain, and sets them as the lower limit values of the spectral gain at each frequency. That is, the spectral gain correction unit 1051 functions as a suppression strength control unit that weakens the suppression strength of the nonlinear echo by attenuating the gain with respect to frequency at a predetermined slope.
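The peak search and the sloped lower limit can be illustrated with the following sketch. The source says only "mid-high range" and "several dB/octave", so the band of 1000 Hz to 4000 Hz, the slope of 6 dB/octave, the choice to attenuate with octave distance on both sides of the peak frequency, and the function name are all assumptions.

```python
import math


def apply_gain_floor(freqs_hz, gains, band=(1000.0, 4000.0), slope_db_per_octave=6.0):
    """Raise each spectral gain to a floor that falls off from the mid-high peak.

    Raises ValueError if no frequency lies inside the assumed band.
    """
    # Peak search: largest gain within the assumed mid-high band
    in_band = [(g, f) for f, g in zip(freqs_hz, gains) if band[0] <= f <= band[1]]
    peak_gain, peak_freq = max(in_band)
    corrected = []
    for f, g in zip(freqs_hz, gains):
        # Distance from the peak in octaves; the 0 Hz bin gets no floor
        octaves = abs(math.log2(f / peak_freq)) if f > 0 else float("inf")
        floor = peak_gain * 10.0 ** (-slope_db_per_octave * octaves / 20.0)
        corrected.append(max(g, floor))   # lower limit of the spectral gain
    return corrected
```

When the peak gain is large, which suggests that a desired signal is present, the floor lifts heavily suppressed neighbouring bins. When all gains are small, the floor is small as well and the gains pass through unchanged, so the echo suppression effect is retained.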
FIG. 11 is a diagram for explaining the effect of the present embodiment.
The graph on the left side of the upper column 1101 is a graph showing the relationship between the uncorrected spectral gain 1101A and the desired signal spectrum 1101B. The graph on the right side of the upper column 1101 is a graph showing a desired signal spectrum 1101C that has been subjected to suppression processing based on the spectrum gain 1101A.
The graph on the left side of the lower column 1102 is a graph showing the relationship between the corrected spectrum gain 1102A and the desired signal spectrum 1102B. The graph on the right side of the lower column 1102 is a graph showing a desired signal spectrum 1102C that has been subjected to suppression processing based on the spectrum gain 1102A.
In each graph, for example, the vertical axis represents intensity (decibel), and the horizontal axis represents frequency (hertz).
As shown in the lower column 1102 of FIG. 11, when the desired signal 1102B exists, the spectral gain attenuated by several dB/octave from the maximum value of the spectral gain 1101A is set as the lower limit of the spectral gain at each frequency. As a result, because the maximum value of the spectral gain 1102A is large, the lower-limit spectral gain is also large. Therefore, compared with the suppression method shown in the upper column 1101 of FIG. 11, which does not correct the spectral gain, the degradation of the desired signal is reduced in the lower column 1102 of FIG. 11. On the other hand, when the desired signal does not exist, the maximum value of the spectral gain is small, so the lower-limit spectral gain is also small, and a sufficient echo suppression effect can be obtained.
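The dependence of the lower limit on the peak gain can be illustrated numerically. The sketch below is hypothetical: the peak gains 0.9 and 0.05, the 6 dB/octave slope, and the function name `floor_at` are assumed values chosen only to show the two cases described above, not figures taken from FIG. 11.

```python
import math


def floor_at(peak_gain, octaves_from_peak, slope_db_per_octave=6.0):
    """Lower-limit gain at a given distance (in octaves) from the peak bin."""
    peak_db = 20.0 * math.log10(peak_gain)
    return 10.0 ** ((peak_db - slope_db_per_octave * octaves_from_peak) / 20.0)


# Desired signal present: the peak gain is large, so the floor stays high
# and the desired signal is suppressed less.
floor_present = floor_at(0.9, 1.0)

# Desired signal absent (echo only): the peak gain is small, so the floor
# is low and strong echo suppression remains possible.
floor_absent = floor_at(0.05, 1.0)
```

With these assumed numbers the floor one octave from the peak is roughly 0.45 when the desired signal is present and roughly 0.025 when it is absent, which mirrors the qualitative behavior described for FIG. 11.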
(Other embodiments)
Although embodiments of the present invention have been described in detail above, any system or apparatus that combines, in any way, the separate features included in the respective embodiments is also included in the scope of the present invention.
In addition, the present invention may be applied to a system composed of a plurality of devices, or may be applied to a single device. Furthermore, the present invention can also be applied to a case where an information processing program that implements the functions of the embodiments is supplied directly or remotely to a system or apparatus. Therefore, a program installed in a computer in order to realize the functions of the present invention on the computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded are also included in the scope of the present invention.
Hereinafter, as an example, the flow of processing executed by a CPU (Central Processing Unit) 1202 provided in a computer 1200 when the audio processing described in the second embodiment is realized by software will be described with reference to FIG. The CPU 1202 executes, for example, a program read from the memory 1204 to realize steps S301 to S304 described with reference to FIG. 3, performs predetermined processing on the input signal received from the input unit 1201, and outputs the result from the output unit 1203.
Note that the input unit 1201 may include a microphone 201. The output unit 1203 may include a speaker 202. The memory 1204 stores information. The CPU 1202 writes necessary information to the memory 1204 and reads necessary information from the memory 1204 when executing the operation of each step.
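A software realization of this kind could be organized as in the following sketch. It does not reproduce steps S301 to S304, which are defined with FIG. 3; it only illustrates the three operations named in the claims (suppress, determine, weaken). The subtraction-based suppression, the mean-amplitude detector, the threshold 0.1, and the coefficient values 2.0 and 1.0 are all assumptions introduced for illustration.

```python
def suppress(input_amp, pseudo_echo_amp, coef):
    """Assumed suppression rule: subtract coef times the pseudo linear echo
    amplitude from the input amplitude in each bin, clamped at zero."""
    return [max(x - coef * e, 0.0) for x, e in zip(input_amp, pseudo_echo_amp)]


def desired_signal_present(residual_amp, threshold=0.1):
    """Hypothetical detector: the desired signal is judged present when the
    mean residual amplitude after suppression exceeds a threshold."""
    return sum(residual_amp) / len(residual_amp) > threshold


def process_frame(input_amp, pseudo_echo_amp, first_coef=2.0, second_coef=1.0):
    """Suppress with the first coefficient, then redo the suppression with the
    smaller second coefficient if a desired signal is judged to be mixed in."""
    residual = suppress(input_amp, pseudo_echo_amp, first_coef)
    if desired_signal_present(residual):
        # Weaken the suppression strength by using the smaller coefficient.
        residual = suppress(input_amp, pseudo_echo_amp, second_coef)
    return residual


# Near-end speech plus echo: suppression is weakened.
with_speech = process_frame([1.0, 1.0], [0.2, 0.2])
# Echo only: the full suppression strength is kept.
echo_only = process_frame([0.3, 0.3], [0.2, 0.2])
```

In this sketch the determination uses the result of the first suppression pass, which matches the claimed order of operations; how the embodiment schedules the determination across frames is not shown here.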
FIG. 13 is a diagram illustrating an example of a recording medium (storage medium) 1207 that records (stores) a program. The recording medium 1207 is a non-volatile recording medium that stores information non-temporarily. Alternatively, the recording medium 1207 may be a recording medium that stores information temporarily. The recording medium 1207 records a program (software) that causes the computer 1200 (CPU 1202) to execute the operation illustrated in FIG. The recording medium 1207 may further record arbitrary programs and data.
Even if such a computer is used, the same effect as in the second embodiment can be obtained.
While the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2011-112077 filed on May 19, 2011, the entire disclosure of which is incorporated herein.

DESCRIPTION OF SYMBOLS
100 Speech processing apparatus
101 Nonlinear echo suppression unit
102 Desired signal mixture determination unit
103 Echo suppression intensity control unit
200 Speech processing apparatus
201 Microphone
202 Speaker
203 Pseudo linear echo generation unit
204 Linear echo suppression unit
205 Nonlinear echo suppression unit
206 Output determination unit
207 Suppression intensity control unit
404 Flooring unit
500 Nonlinear echo suppression unit
501 Fast Fourier transform unit
502 Fast Fourier transform unit
503 Spectrum amplitude estimation unit
504 Spectrum flooring unit
505 Spectrum gain calculation unit
506 Inverse fast Fourier transform unit
531 Absolute value circuit
532 Absolute value circuit
533 Averaging circuit
534 Averaging circuit
535 Multiplication unit
536 Subtractor
537 Comparison unit
538 Regression coefficient calculation unit
539 Subtractor
540 Multiplication unit
541 Averaging circuit
542 Maximum value selection circuit
551 Divider
552 Averaging circuit
553 Multiplication unit
800 Nonlinear echo suppression unit
835 Multiplication unit
837 Threshold comparison unit
838 Selection unit
1000 Nonlinear echo suppression unit
1200 Computer
1201 Input unit
1202 CPU
1203 Output unit
1207 Recording medium

Claims

1. A speech processing apparatus comprising:
nonlinear echo suppression means for suppressing a nonlinear echo included in an input signal;
determination means for determining, in accordance with a result of the suppression by the nonlinear echo suppression means, whether a desired signal is mixed in the input signal; and
suppression intensity control means for weakening the suppression intensity of the nonlinear echo in the nonlinear echo suppression means when it is determined that the desired signal is mixed in the input signal.
2. The speech processing apparatus according to claim 1, wherein
the nonlinear echo suppression means is means for suppressing the nonlinear echo based on a nonlinear echo signal estimated by multiplying a pseudo linear echo signal by a first regression coefficient, and
the suppression intensity control means estimates the nonlinear echo signal by multiplying the pseudo linear echo signal by a second regression coefficient smaller than the first regression coefficient.
3. The speech processing apparatus according to claim 2, wherein
the suppression intensity control means includes:
storage means for storing the first regression coefficient and the second regression coefficient; and
regression coefficient selection means for reading the second regression coefficient from the storage means when the determination means determines that the desired signal is mixed in the input signal, and
the nonlinear echo signal is estimated by multiplication by the read second regression coefficient.
4. The speech processing apparatus according to claim 2, wherein
the suppression intensity control means includes:
storage means for storing the first regression coefficient and a regression coefficient correction value; and
second regression coefficient calculation means for calculating the second regression coefficient by multiplying the first regression coefficient by the regression coefficient correction value when the determination means determines that the desired signal is mixed in the input signal, and
the nonlinear echo signal is estimated using the calculated second regression coefficient.
5. The speech processing apparatus according to claim 1, wherein
the nonlinear echo suppression means is means for suppressing the nonlinear echo by applying a gain according to an estimated nonlinear echo signal, the estimated nonlinear echo signal being estimated by multiplying a pseudo linear echo signal by a first regression coefficient, and
the suppression intensity control means weakens the suppression intensity of the nonlinear echo by reducing the gain.
6. The speech processing apparatus according to claim 5, wherein the suppression intensity control means weakens the suppression intensity of the nonlinear echo by attenuating the gain with respect to frequency with a predetermined slope.
7. The speech processing apparatus according to claim 1, further comprising:
audio output means for outputting audio based on an output signal;
audio input means for receiving audio and outputting the input signal; and
pseudo linear echo generation means for generating a pseudo linear echo signal estimated to be produced by the audio output from the audio output means leaking into the audio input means,
wherein the nonlinear echo suppression means suppresses the nonlinear echo using the pseudo linear echo signal.
8. The speech processing apparatus according to any one of claims 1 to 7, wherein the determination means determines whether the desired signal is mixed in the input signal based on a spectrum shape obtained as the result of the suppression by the nonlinear echo suppression means.
9. An audio processing method comprising:
suppressing a nonlinear echo included in an input signal;
determining, in accordance with a result of the suppression, whether a desired signal is mixed in the input signal; and
suppressing the nonlinear echo with a weakened suppression intensity when it is determined that the desired signal is mixed in the input signal.
10. A non-volatile medium on which is recorded an audio processing program that causes a computer to execute:
processing for suppressing a nonlinear echo included in an input signal;
processing for determining, in accordance with a result of the suppression, whether a desired signal is mixed in the input signal; and
processing for suppressing the nonlinear echo with a weakened suppression intensity when it is determined that the desired signal is mixed in the input signal.