WO2005112007A1 - Acoustic signal removal device, acoustic signal removal method, and acoustic signal removal program - Google Patents
Acoustic signal removal device, acoustic signal removal method, and acoustic signal removal program
- Publication number
- WO2005112007A1 (PCT/JP2004/013168)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- amplitude spectrum
- mixed
- sound
- acoustic
- spectral intensity
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Definitions
- the present invention relates to an acoustic signal removal apparatus, an acoustic signal removal method, and an acoustic signal removal program.
- more specifically, the present invention relates to an acoustic signal removal device, method, and program that remove sounds such as BGM and speech mixed into content when the content is reused, for example when rebroadcasting a program that has already been broadcast.
- the bass and treble of the BGM are emphasized or attenuated for program effect at the time of program creation.
- the frequency characteristics of the background music have changed during the process of recording and playback, and the subtraction process cannot be performed simply.
- the present invention has been made to solve the above-described problem, and it is an object of the present invention to provide an acoustic signal removal device, an acoustic signal removal method, and an acoustic signal removal program that can avoid erroneous processing caused by volume discontinuities or phase shifts arising when a known sound is removed from a mixed sound, and that can automatically and accurately estimate changes in the sound to be removed and remove it appropriately.
- the present invention extracts a known sound amplitude spectrum from the known sound signal to be removed, and extracts a mixed sound amplitude spectrum from a mixed sound signal in which the known sound signal is mixed with other sound signals.
- the degree of coincidence between the known sound amplitude spectrum and the mixed sound amplitude spectrum is calculated, and according to the calculated degree of coincidence, the temporal position of the known sound amplitude spectrum is shifted so that it matches the temporal position of the known sound contained in the mixed sound amplitude spectrum.
- the shifted known sound amplitude spectrum is then removed from the mixed sound amplitude spectrum.
- a stationary block defined by a frequency band of a predetermined width and a time width is set.
- for every stationary block, an estimation block covering a range that includes the stationary block is set, and the corresponding spectral intensity points of the known sound amplitude spectrum and the mixed sound amplitude spectrum within the estimation block are plotted on a plane.
- a common line is fitted to all the spectral intensity points, and the degree of coincidence is calculated from the degree of deviation of each spectral intensity point from the common line.
- according to the present invention, the process of aligning the start times of the mixed sound and the known sound, performed when removing the known sound from a mixed sound containing it, can be carried out automatically and accurately.
- the present invention also extracts a known sound amplitude spectrum from the known sound signal to be removed, and extracts a mixed sound amplitude spectrum from a mixed sound signal in which the known sound signal is mixed with other sound signals.
- the degree of coincidence between the known sound amplitude spectrum and the mixed sound amplitude spectrum is calculated, and the frequency characteristic of the known sound amplitude spectrum is corrected according to the calculated degree of coincidence.
- the known sound amplitude spectrum whose frequency characteristic has been corrected is then removed from the mixed sound amplitude spectrum.
- for the known sound amplitude spectrum and the mixed sound amplitude spectrum, a stationary block defined by a frequency band of a predetermined width and a time width is set.
- for every stationary block, an estimation block covering a range that includes the stationary block is set, and the corresponding spectral intensity points of the two spectra within the estimation block are plotted on a plane.
- a common line is fitted to all the spectral intensity points, and the degree of coincidence is calculated from the degree of deviation of each spectral intensity point from the common line.
- according to the present invention, the process of correcting the frequency characteristic of the known sound amplitude spectrum, performed when removing the known sound from a mixed sound containing it, can be carried out automatically and accurately.
- a stationary block defined by a frequency band of a predetermined width and a time width is set, and for all stationary blocks, the contribution of the known sound to the mixed sound is judged from the degree of divergence of the spectral intensity points.
- this reduces erroneous processing such as estimating the known sound to be louder than it actually is (in intensity and amplitude spectrum), so the sound signal is not removed excessively and degradation of the processed sound is avoided.
- the known sound included in the mixed sound is often produced by taking a sound given as the known sound, such as music from a CD, adjusting its frequency characteristics and volume according to the production intent, and mixing it with other sounds. Even in this case, the intensity and frequency characteristics of the known sound contained at each time of the mixed sound can be accurately estimated.
- in conventional methods, the frequency characteristic is estimated only within a sample section, so the characteristics of frequencies not contained in the known sound within that section cannot be estimated; to avoid this, characteristics obtained discretely for each frequency had to be interpolated or smoothed. The present invention eliminates the need for such processing, and also eliminates the need for the operator to display a frequency correction graph and correct the frequencies manually, as in conventional processing. As a result, the efficiency and accuracy of the known sound removal operation can be improved.
- the present invention also extracts a known sound amplitude spectrum from the known sound signal, and extracts a mixed sound amplitude spectrum from a mixed sound signal in which the known sound signal and other sound signals are mixed.
- the degree of coincidence between the known sound amplitude spectrum and the mixed sound amplitude spectrum is calculated, the range of the mixed sound signal containing only the known sound signal is estimated according to the calculated degree of coincidence, and the mixed sound signal in that range is removed.
- a stationary block defined by a frequency band of a predetermined width and a time width is set; for every stationary block, an estimation block covering a range that includes the stationary block is set; the corresponding spectral intensity points of the known sound amplitude spectrum and the mixed sound amplitude spectrum within the estimation block are plotted on a plane; a common line is fitted to all the spectral intensity points; and the degree of coincidence is calculated from the degree of deviation of each spectral intensity point from the common line.
- this solves the problem of sound remaining unremoved, due to estimation errors of the known sound and the like, in time intervals of the mixed sound that contain only the known sound.
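The core removal step described above subtracts the (corrected) known sound amplitude spectrum from the mixed sound amplitude spectrum. A minimal sketch follows; the function name, the scalar `gain`, and the clamping of negative residuals to zero are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def subtract_known_spectrum(mixed_amp, known_amp, gain=1.0):
    """Remove a gain-scaled known amplitude spectrum from a mixed
    amplitude spectrum, clamping negative results to zero."""
    residual = mixed_amp - gain * known_amp
    return np.maximum(residual, 0.0)

# toy 4-channel amplitude spectra for one time frame
mixed = np.array([1.0, 0.8, 0.5, 0.2])
known = np.array([0.4, 0.9, 0.1, 0.0])
print(subtract_known_spectrum(mixed, known))  # [0.6 0.  0.4 0.2]
```

The clamp illustrates why over-estimating the known sound is harmful: any channel where the estimate exceeds the mixture is silenced entirely.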
- FIG. 1 is a block diagram showing a configuration of an acoustic signal removal system according to an embodiment.
- FIG. 2 is a flowchart showing an operation of the acoustic signal removal system according to the embodiment.
- FIG. 3 is a functional block diagram of a removal engine according to the embodiment.
- FIG. 4 is a flowchart showing an operation of the removal engine according to the embodiment.
- FIG. 5 is an explanatory diagram showing the setting of a stationary block in the known acoustic method according to the embodiment.
- FIG. 6 is an explanatory diagram showing an example of changing a stationary block setting in the known acoustic method according to the embodiment.
- FIG. 7 is an explanatory diagram showing a state in which spectral intensity points are plotted in the known sound removal method according to the embodiment.
- FIG. 8 is a perspective view showing a computer-readable recording medium on which a program according to the embodiment is recorded.
- FIG. 9 is an explanatory diagram showing the effect of the acoustic signal elimination method according to the embodiment.
- FIG. 1 is a block diagram showing the overall configuration of the acoustic signal removal system according to the present embodiment.
- the audio signal removal system includes an input I / F 1 and a DV capture 2 for inputting mixed sound or known sound.
- Files (for example, AVI files and WAV files) input from the input I / F 1 and the DV capture 2 are stored in the storage device 5.
- the input I/F 1 is an interface that captures audio signals from a playback device such as a CD player or an MD player.
- the DV capture 2 is an interface for capturing content in which video and audio are combined, from which the MIX audio to be processed for removal is extracted.
- the audio signal removal system includes an audio conversion unit (PreWav/PostWav) 4 and an audio data extraction unit (DVReMix) 3, which perform audio data extraction and audio conversion on the various data stored in the storage device 5.
- the audio conversion unit 4 and the audio data extraction unit 3 read the specified file (AVI file or WAV file) from the storage device 5, perform predetermined processing, and then store the processed file (WAV file) in the storage device 5.
- the audio conversion unit 4 performs frequency conversion and separation of stereo into monaural (S103).
- that is, the WAV file is separated into the left and right channels to match the format of the removal engine 100, the sampling rate is converted to 48 kHz, and two WAV files are generated (output file names: MIX-L.WAV for the left channel, MIX-R.WAV for the right channel) and stored in the storage device 5.
- the audio data extraction unit 3 is a module that extracts only the audio data from content composed of video data and audio data, and outputs it in WAV format.
- the WAV file here is in stereo format, and its sampling rate is 32kHz or 48kHz, which is the same as DV audio.
- the extracted WAV file is stored in the storage device 5.
- the acoustic signal removal system includes a removal engine 100 that removes a known acoustic signal from the mixed acoustic signal.
- the removal engine 100 reads each audio file (WAV file) stored in the storage device 5, stores the removed data and various data related to the removal processing in the storage device 5 via the temporary memory 7, and outputs them from the monitor 10 or the speaker 11 through the output I/F 8.
- the monitor 10 displays a GUI showing the operations and processing results of the user interface 6, and the speaker 11 outputs the mixed sound, the known sound, and the post-removal sound based on user operations in the user interface 6.
- the removal engine 100 acquires an operation signal based on a user operation by an input device such as the keyboard 6a or the mouse 6b through the user interface 6, and performs various processes based on the operation signal.
- the acoustic signal removal processing by the removal engine 100 will be described later.
- the acoustic signal removal system includes a synchronization control unit 9, which synchronizes the reading of data from the storage device 5, the removal processing by the removal engine 100, and the data input/output through the memory 7 and the output I/F 8. Thereby, the video displayed on the monitor 10 and the audio output from the speaker 11 can be synchronized with the processing by the removal engine 100 and with user operations in the user interface 6.
- the acoustic signal elimination system includes a simulation unit 14 that sets a default value by a simulation when setting a parameter, and assists a user's work.
- the simulation unit 14 inputs a single tone (480 Hz) of constant amplitude as the mixed sound, performs removal processing with the known sound set to zero, compares the output volume with that of the mixed sound before processing to measure the difference, and sets the default value of the removal strength in the user interface 6 so that the difference becomes zero.
- FIG. 2 is a flowchart showing the operation of the acoustic signal removal system.
- a video file (DV) in which video and stereo audio are recorded is assumed as the source of the mixed sound (MIX audio), and the BGM is included in this video file.
- the processing in the present embodiment is roughly divided into (1) preprocessing, (2) music removal processing, and (3) postprocessing. Hereinafter, each processing will be described in detail.
- MIX audio to be removed is extracted from DV, and BGM audio (original music) is prepared.
- first, the video is captured from the DV capture 2 using DV video editing software (S101), and the captured file is converted to a Type 1 AVI file (output file name: MIX.AVI) and stored in the storage device 5.
- the audio data extraction unit (DVReMix) 3 extracts audio data from the AVI file in the WAV format (output file name: MIX.WAV) (S102).
- the WAV file here is in stereo format, and its sampling rate is 32kHz or 48kHz, which is the same as DV audio.
- the extracted WAV file is stored in the storage device 5.
- the audio conversion unit (PreWav) 4 performs frequency conversion and separates stereo into monaural (S103).
- that is, the WAV file is separated into the left and right channels, the sampling rate is converted to 48 kHz, and two WAV files are generated (output file names: MIX-L.WAV for the left channel, MIX-R.WAV for the right channel) and stored in the storage device 5.
- the offset of the video start time is output to a setting file (file name: MIX.time) simultaneously with the audio conversion, and is stored in the storage device 5.
- the original music (BGM music) is imported from a CD or the like, and stored in the storage device 5 as a 44.1 kHz stereo WAV file (output file name: BGM.WAV).
- the audio conversion unit (PreWav) 4 performs frequency conversion and separation of stereo into monaural (S105).
- that is, the WAV file imported in step S104 is separated into the left and right channels, the sampling rate is converted to 48 kHz, and two WAV files are generated (output file names: BGM-L.WAV for the left channel and BGM-R.WAV for the right channel).
- the background music is removed from the MIX sound by the removal engine (GEQ) 100 (S106).
- the audio files output after this removal are monaural 48 kHz WAV files for both the left and right channels (output file names: ERASE-L.WAV for the left channel, ERASE-R.WAV for the right channel), stored in the memory 7 or the storage device 5.
- the audio from which sound has been removed by the removal engine is converted back to DV audio and restored into the DV (AVI file).
- the audio conversion unit (PostWav) 4 performs frequency conversion and conversion from monaural to stereo (S107). That is, the audio conversion unit 4 combines the left and right single-channel WAV files output from the removal engine 100 into stereo, converts the sampling rate back to that of the original DV audio if necessary, and stores the WAV file (file name: ERASE.WAV) in the storage device 5.
- the audio data extraction unit (DVReMix) 3 replaces the audio of the captured AVI file (MIX.AVI) with the removed audio (ERASE.WAV) (S108), and the resulting file (file name: ERASE.AVI) is stored in the storage device 5.
- FIG. 3 is a block diagram showing functions of the removal engine 100.
- the removal engine 100 according to the present embodiment is a module virtually constructed by executing an acoustic signal removal program on an arithmetic processing unit such as a CPU.
- the removal engine 100 includes, as signal input means, a mixed sound signal input section 101 for inputting the mixed sound signal and a known sound signal input section 102 for inputting the known sound signal to be removed; it also includes a post-removal audio signal output unit 107.
- the removal engine 100 includes an amplitude spectrum extracting unit 200 that extracts an amplitude spectrum from an input acoustic signal.
- the amplitude spectrum extracting section 200 includes a data dividing section 201, a window function processing section 202, and a Fourier transform section 203.
- the window function processing unit 202 multiplies the audio signal data of each window-size section (about 170 ms) divided by the data division unit 201 by a Hanning function, smoothing the beginning and end of the data into a signal waveform that converges to zero.
- the Fourier transform unit 203 performs a Fourier transform on the data of each of the mixed sound signal and the known sound signal, and separates and outputs the phase and the amplitude spectrum for each frequency channel. The data consisting only of the amplitude spectrum is output as "time-frequency data".
- the Fourier transform section 203 performs a fast Fourier transform (FFT) on the audio data that has been processed with the Hanning function.
- the input audio data consists only of real numbers and contains no imaginary part.
- since the FFT computes with complex numbers for both input and output, two windows of input data are packed into the real and imaginary parts respectively, a single fast Fourier transform is performed, and the two results are separated afterwards using the conjugate symmetry, achieving a two-fold speed-up.
- this system uses the SSE2 instructions available on processors such as the Intel Pentium 4 (registered trademark) to achieve high-speed processing.
- the amplitude spectrum extracting section 200 shifts the section to be Fourier-transformed by 480 samples at a time.
- time-frequency data that is data representing “only the amplitude” of the audio signal is obtained for each frequency channel.
- the frequency channels obtained in this way number 4096, spaced every 5.86 Hz from 0 Hz (DC) to about 24 kHz: 0 Hz, 5.86 Hz, 11.72 Hz, 17.57 Hz, ..., 23,994.14 Hz.
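The channel count and spacing follow directly from the sampling rate and window width stated elsewhere in this description (48 kHz, 8192-point window, 480-point frame shift); a quick check:

```python
fs = 48_000        # sampling rate (Hz)
n_fft = 8_192      # FFT window width in samples (about 170 ms)
hop = 480          # frame shift in samples

bin_spacing = fs / n_fft          # Hz between adjacent frequency channels
n_channels = n_fft // 2           # channels from DC up to the Nyquist frequency
frame_shift_ms = hop / fs * 1000  # frame shift as a processing time unit

print(bin_spacing, n_channels, frame_shift_ms)  # 5.859375 4096 10.0
```

This reproduces the figures in the text: about 5.86 Hz per channel, 4096 channels to 24 kHz, and a 10 ms frame shift.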
- the amplitude spectrum extracting section 200 functions as a mixed acoustic amplitude extracting section that extracts a mixed acoustic amplitude spectrum from the mixed acoustic signal.
- when the input signal is the known sound signal to be removed, it functions as a known sound amplitude extraction unit that extracts the known sound amplitude spectrum from the known sound signal.
- the removal engine 100 includes a parameter estimating unit 300 that automatically estimates the change of the known sound within the mixed sound based on the amplitude spectra extracted by the amplitude spectrum extraction unit 200, and that allows the automatic estimation result to be corrected by user operation.
- the parameter estimating unit 300 is a module that estimates the frequency characteristic, intensity, and time position of the known sound and corrects each parameter to match the known sound contained in the mixed sound; the estimation is performed based on the degree of coincidence calculated by the coincidence calculation unit 304.
- the parameter estimating unit 300 includes a frequency characteristic correction unit 301, an intensity correction unit 302, and a time position correction unit 303, whereby (1) the temporal misalignment between the mixed sound and the known sound, (2) the frequency characteristics of the known sound, and (3) the temporal change in the volume of the known sound are estimated.
- the frequency characteristic correction unit 301 is a module for estimating the frequency distribution; in this estimation, a function c(ω, t) of arbitrary shape, used for equalizing processing and fader operation processing on the amplitude spectrum, has its shape changed in the ω direction, like a graphic equalizer, to adjust the frequency characteristics after removal of the known sound signal.
- the frequency characteristic correction unit 301 also smooths the frequency characteristics, because in portions where the volume of the BGM is low the estimated values become unstable due to noise and the like. This smoothing is realized by averaging the values of the neighbouring frequency channels on either side.
- Intensity correction section 302 performs estimation and smoothing of a temporal change in volume.
- that is, the shape of the spectral function c(ω, t) in the t direction is corrected, as with the volume fader of a mixer, so that the change in volume after removing the known sound signal can be adjusted.
- the intensity correction unit 302 detects the temporal change in the volume of the known sound over the entire time range of the mixed sound. Since the mixed sound includes sounds such as voices in addition to the known sound, the frequency channels of the mixed sound and of the frequency-corrected known sound are summed for each octave (every doubling in frequency).
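Per-octave summation can be sketched as follows. The grouping rule (bands of doubling width starting from an assumed lowest band edge `f0`, DC and sub-`f0` channels skipped) is an illustrative choice; the patent does not fix these details:

```python
import numpy as np

def sum_per_octave(amps, bin_hz, f0=62.5):
    """Group frequency channels into octave bands (each band spans a
    doubling of frequency starting at f0) and sum the amplitudes."""
    freqs = np.arange(len(amps)) * bin_hz
    bands = {}
    for f, a in zip(freqs, amps):
        if f < f0:
            continue                            # skip DC and sub-band channels
        k = int(np.floor(np.log2(f / f0)))      # octave index of this channel
        bands[k] = bands.get(k, 0.0) + a
    return bands

# 32 unit-amplitude channels spaced 50 Hz apart (0 Hz .. 1550 Hz)
print(sum_per_octave(np.ones(32), bin_hz=50.0))
```

Each successive octave collects roughly twice as many channels, which is why summing per octave stabilizes the volume estimate in the higher bands.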
- the graph display allows the user to identify points where the volume is clearly overestimated and to correct them manually.
- alternatively, an automatic judgment method such as robust statistics may be adopted.
- the intensity correction unit 302 also performs smoothing when estimating the time change, averaging the volume of the known sound over the neighbouring times before and after.
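Both smoothing steps (averaging over neighbouring frequency channels, and over neighbouring times) are simple moving averages. A sketch with an assumed window radius, which the patent leaves unspecified:

```python
import numpy as np

def smooth(values, radius=1):
    """Smooth a sequence by averaging each value with its neighbours
    within `radius` positions (the window shrinks at the edges)."""
    out = np.empty_like(values, dtype=float)
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        out[i] = values[lo:hi].mean()
    return out

v = np.array([1.0, 3.0, 5.0, 3.0, 1.0])
print(smooth(v))  # spike at the centre is flattened toward its neighbours
```

The same routine serves for a row of frequency-channel estimates or a row of per-frame volume estimates.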
- the time position correction unit 303 is a module that corrects a temporal displacement between the start point of the mixed sound and the start point of the known sound.
- the mixed sound and the known sound may also be output from separate left and right speakers so that the user can compare them by listening, using the user's own sense of hearing for the alignment.
- the removal engine 100 includes a removal processing unit 104 that removes the known sound amplitude spectrum extracted by the amplitude spectrum extraction unit 200 from the mixed sound amplitude spectrum, an oscillator unit 105 that restores the sound after removal by superposition (inverse Fourier) conversion, and an arrangement processing unit 106.
- the removal processing unit 104 converts the known sound according to the estimation data generated by the parameter estimating unit 300, and removes the converted signal from the “time-frequency data” of the mixed sound.
- the oscillator unit 105 restores, by superposition conversion, the sound with the known sound removed, using the "time-frequency data" obtained by the subtraction calculation together with the phase data of the mixed sound signal.
- an inverse Fourier transform may be performed instead of such superposition conversion.
- when performing the inverse Fourier transform of the frequency channel data after subtraction at each time, the phase should be given the same value as the phase of the known sound or of the mixed sound before removal.
- the arrangement processing unit 106 superimposes the outputs of windows of the same width by the overlap-add method on the sound at each time point, each about 170 milliseconds wide (the width of the Hanning window), and finally restores the audio from which the music has been removed.
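The overlap-add reconstruction can be sketched generically: each processed frame is added back into the output at its hop-sized offset, so overlapping window regions sum together. The toy frame sizes below are illustrative:

```python
import numpy as np

def overlap_add(frames, hop):
    """Reconstruct a signal by summing frames placed at hop-sized offsets."""
    n_frames, frame_len = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

frames = np.ones((3, 4))       # three constant frames of length 4
print(overlap_add(frames, 2))  # interior samples receive two overlapping frames
```

With a Hanning window and a hop much smaller than the window (here 10 ms against 170 ms), the summed window contributions stay nearly constant, which is what makes the resynthesis free of amplitude ripple.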
- the post-removal sound signal output unit 107 is a module that outputs mixed sound from which the known sound has been removed as audio data.
- the post-removal sound signal output unit 107 also functions as a sound removal unit that estimates the range of the mixed sound signal containing only the known sound signal according to the degree of coincidence calculated by the coincidence calculation unit 304, and removes the sound signal in that range.
- FIG. 4 shows the flow of processing by the removal engine 100.
- first, the removal engine 100 acquires the phase and amplitude spectrum of the mixed sound signal from the mixed sound signal by Fourier transform (S201).
- specifically, the removal engine 100 A/D-converts the sound signal at a sampling frequency of 48 kHz with 16-bit quantization, and computes the STFT by fast Fourier transform (FFT), using a Hanning window with a window width of 8192 points as the window function h(t).
- the removal engine 100 shifts the FFT frame by 480 points at a time, making the frame shift time (one frame shift) of 10 ms the processing time unit.
- the removal engine 100 can easily cope with other standardized frequencies (16 kHz, 44 kHz, etc.), window widths, and frame shifts.
- step S202 the removal engine 100 performs a Fourier transform of the known acoustic signal to obtain an amplitude spectrum of the known acoustic signal.
- next, the coincidence calculation unit 304 compares the amplitude spectrum of the mixed sound with the amplitude spectrum of the known sound.
- specifically, the spectral intensity points of the mixed sound and the known sound signal at each time are plotted, and the degree of coincidence is calculated from the deviation of the plotted intensity points from a common line (S203).
- the frequency characteristic and the intensity are estimated from the slope of the common line. The calculation of the degree of coincidence and of the slope of the common line will be described later.
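The exact form of the common line and of the coincidence measure is not given in closed form at this point in the text; as an illustrative sketch, the following assumes an ordinary least-squares line through the plotted intensity points, with the RMS deviation from the line as the divergence (smaller deviation meaning higher coincidence) and the slope as the intensity estimate:

```python
import numpy as np

def common_line_fit(known_pts, mixed_pts):
    """Fit a common line mixed = a * known + b through corresponding
    spectral-intensity points; return the slope, intercept, and the RMS
    deviation of the points from the line."""
    a, b = np.polyfit(known_pts, mixed_pts, 1)
    residuals = mixed_pts - (a * known_pts + b)
    rms = float(np.sqrt(np.mean(residuals ** 2)))
    return a, b, rms

known = np.array([1.0, 2.0, 3.0, 4.0])
mixed = 0.5 * known + 1.0          # perfectly collinear points
a, b, rms = common_line_fit(known, mixed)
print(a, b, rms)
```

For collinear points the deviation is zero (full coincidence); other sounds mixed in push points off the line and raise the deviation.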
- the time position correction unit 303 then detects the start time of the known sound signal and corrects the amplitude spectrum of the known sound signal (S204 and S205).
- thereafter, the corrected amplitude spectrum of the known sound signal is removed from the amplitude spectrum of the mixed sound signal (S206), and the oscillator unit 105 performs superposition conversion on the removed amplitude spectrum together with the phase of the mixed sound signal (S207); the arrangement processing unit 106 then performs arrangement conversion by the overlap-add method (S208). Further, based on the degree of coincidence calculated in step S203, the ranges containing only the known sound are determined, and those ranges are removed by the post-removal sound signal output unit 107 (S209 and S210).
- the coincidence calculation unit 304 calculates the degree of coincidence between the mixed sound and the known sound by comparing the intensities of their amplitude spectra, as follows.
- given the mixed sound signal, the known sound signal, and the amplitude spectra obtained by Fourier-transforming each of them, together with a mixed sound section (Tms, Tme) and a known sound section (Tbs, Tbe), the coincidence calculation unit 304 determines the degree of coincidence from the difference (Td) between the start times. If Td is a multiple of Tf (the frame length at the time of the Fourier transform), the coincidence calculation unit 304 can use the amplitude spectrum obtained in step S201; otherwise it recalculates the amplitude spectrum each time.
- the coincidence calculating section 304 obtains the common section in which the known sound is included in the mixed sound. Denoting this section as (Ts, Te), the following equations hold. [Number 1]
- Ts = max(Tms, Tbs − Td)
- Te = min(Tme, Tbe − Td)
- Nf = (Te − Ts) / Tf + 1
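The equations above can be written directly as a small helper; `common_section` is a hypothetical name introduced here for illustration:

```python
def common_section(Tms, Tme, Tbs, Tbe, Td, Tf):
    """Common section (Ts, Te) where the known sound, shifted by the
    start-time difference Td, overlaps the mixed-sound section, and the
    number of frames Nf it spans at frame length Tf."""
    Ts = max(Tms, Tbs - Td)
    Te = min(Tme, Tbe - Td)
    Nf = int((Te - Ts) / Tf) + 1
    return Ts, Te, Nf
```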
- the coincidence calculating section 304 sets a stationary block defined by a frequency band having a predetermined width and a time width.
- the coincidence calculating section 304 divides the mixed acoustic amplitude spectrum M(ω, t) and the known acoustic amplitude spectrum B(ω, t) into stationary blocks along the time axis and the (logarithmic) frequency axis (for example, as shown in Fig. 5, each stationary block is 200 milliseconds (20 frames) wide and about 0.5 octave high).
- the coincidence calculating section 304 sets, for every stationary block, an estimation block having a range that includes the stationary block. Specifically, it sets a larger estimation block surrounding each stationary block (for example, in Fig. 5, 400 milliseconds (40 frames) wide and about 1 octave high). The coincidence calculating section 304 estimates the actual intensity of the known sound (BGM) using the data included in the estimation block.
- the blocks may have any size and shape. In the present embodiment, the stationary block is a rectangle with a fixed frequency-band width and time width as shown in Fig. 5, but it may instead be, for example, a strip-shaped block separated in frequency as shown in Fig. 6. If the amount of calculation (calculation speed) is not a concern, the width and height of the stationary block can be reduced to the minimum (for example, 10 ms horizontally and one frequency channel vertically), allowing even more accurate estimation. Conversely, increasing the horizontal and vertical widths of the stationary block reduces the amount of calculation and allows high-speed execution.
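The block layout can be sketched as follows. Note the simplification: the embodiment groups frequencies into roughly 0.5-octave bands on a logarithmic axis, while this hypothetical sketch splits linearly into `f_width` bins; `block_ranges` and `estimation_range` are names introduced here.

```python
def block_ranges(n_frames, n_bins, t_width=20, f_width=8):
    """Partition a (time x frequency) spectrogram into stationary blocks,
    e.g. 20 frames (~200 ms) by f_width bins per block."""
    blocks = []
    for t0 in range(0, n_frames, t_width):
        for f0 in range(0, n_bins, f_width):
            blocks.append((t0, min(t0 + t_width, n_frames),
                           f0, min(f0 + f_width, n_bins)))
    return blocks

def estimation_range(block, n_frames, n_bins, t_pad=10, f_pad=4):
    """Estimation block: the stationary block enlarged on every side
    (cf. Fig. 5, where 20 frames grow to 40 and 0.5 octave to 1)."""
    t0, t1, f0, f1 = block
    return (max(0, t0 - t_pad), min(n_frames, t1 + t_pad),
            max(0, f0 - f_pad), min(n_bins, f1 + f_pad))
```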
- the coincidence calculating section 304 calculates an average signal intensity (spectral intensity) for each block. Specifically, the following processing is performed for all estimation blocks.
- let the range of the estimation block set on the mixed acoustic amplitude spectrum be (Tks ≤ Ti ≤ Tke) on the time axis and (ωks ≤ ωi ≤ ωke) on the frequency axis.
- for all Ti and ωi in this range, find all pairs of the mixed acoustic amplitude spectrum values M(ωi, Ti) and the corresponding known acoustic amplitude spectrum values B(ωi, Ti + Td) for the assumed start-time shift Td.
- the coincidence calculating section 304 obtains the degree of coincidence in each estimation block by one of the following procedures. Specifically, as shown in Fig. 7, the coincidence calculating section 304 plots the spectral intensities of the estimation block on a plane whose axes are the known acoustic amplitude spectrum and the mixed acoustic amplitude spectrum, fits a common straight line to all spectral intensity points, calculates the degree of coincidence from the divergence of each spectral intensity point from that line, finds the common line with the lowest divergence, and obtains the spectral intensity of the known sound from the slope of that line. In Fig. 7(a) the degree of coincidence is high because the intensity points lie almost on a common straight line; in Fig. 7(b) sounds other than the known sound (speech and noise) interfere at the same frequencies, so the degree of coincidence is low.
- the degree of coincidence is calculated by the following equation. This is a correlation value; the closer it is to 1, the higher the degree of coincidence.
- let Ns be the number of samples in the estimation block (the number of all combinations of ωi and Ti).
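The patent's correlation formula is an image not reproduced in this text, so the sketch below assumes an uncentered correlation (cosine similarity) of the spectral-intensity points through the origin; `coincidence` is a name introduced here.

```python
import numpy as np

def coincidence(M_pts, B_pts):
    """Degree of coincidence of one estimation block's spectral-intensity
    points, as a correlation through the origin.  A value of 1.0 means the
    (B, M) points lie exactly on one line through the origin, as in
    Fig. 7(a); interference as in Fig. 7(b) lowers the value."""
    M = np.asarray(M_pts, float)
    B = np.asarray(B_pts, float)
    denom = np.sqrt(np.sum(M ** 2) * np.sum(B ** 2))
    return float(np.sum(M * B) / denom) if denom > 0 else 0.0
```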
- this method is a typical robust statistical method. First, a fitting straight line passing through the origin is found by linear regression; the points farthest from this line are then found, and about 10% of the points, starting from the farthest, are eliminated. A fitting line is found again for the remaining points, and about 10% of the points farthest from the new line are again excluded. Repeating this process about 5 times leaves 50% of the points as the points to be fitted, and the slope of the last fitted line is taken as the result. Any suitable method may be used to set the amount to be trimmed.
- W is initialized to 1 for all relevant Ti and ωi. At this time, the following holds. [Number 6]
- Step 3: the standard ratio α is obtained by the following equation.
- Step 4: the deviation from the standard ratio is obtained for all applicable Ti and ωi.
- Step 5: a constant value Cs is subtracted from Rs.
- the constant value Cs is greater than 0.0 and smaller than 0.5 (for example, Cs = 0.1).
- W is set to 1 for the (number of samples × Rs) points with the smallest deviations.
- W is set to 0 for the remaining (number of samples × (1.0 − Rs)) points.
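The iterative trimming described above (drop roughly 10% of the points per round, five rounds, keep 50%) can be sketched as follows; `robust_slope` is a hypothetical name, and the regression-through-the-origin formula is the standard least-squares one, which the patent's unshown equations may refine.

```python
import numpy as np

def robust_slope(B, M, trim=0.1, iters=5):
    """Trimmed linear regression through the origin: fit a slope, drop the
    points farthest from the line (about 10% of the original count per
    round), refit; after about 5 rounds, 50% of the points remain and the
    last slope is the estimated known-sound intensity."""
    B = np.asarray(B, float)
    M = np.asarray(M, float)
    idx = np.arange(len(B))            # indices of points still kept (W = 1)
    for _ in range(iters):
        # least-squares slope through the origin on the kept points
        alpha = np.sum(B[idx] * M[idx]) / np.sum(B[idx] ** 2)
        resid = np.abs(M[idx] - alpha * B[idx])
        n_keep = max(1, len(idx) - int(len(B) * trim))
        idx = idx[np.argsort(resid)[:n_keep]]   # discard the farthest points
    return float(np.sum(B[idx] * M[idx]) / np.sum(B[idx] ** 2))
```

Outliers caused by speech or noise overlapping the known sound get large residuals and are excluded early, so the final slope reflects only the well-fitting points.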
- the degree of coincidence is calculated by the following equation. This is a correlation value; the closer it is to 1, the higher the degree of coincidence.
- W is initialized to 1 for all relevant Ti and ωi.
- the finally obtained α is defined as the intensity of the known sound in the stationary block. The deviation from the standard ratio is then obtained for all applicable Ti and ωi.
- the magnitude Dmedian of the (number of samples × 0.5)-th deviation, counted in ascending order, is obtained.
- the degree of coincidence is calculated by the following equation. This is a correlation value; the closer it is to 1, the higher the degree of coincidence.
- a weight is set for each spectral intensity point based on its distances from the X-axis and Y-axis of the plane; the slope of the straight line passing through the origin and each spectral intensity point is accumulated with this weight, and the resulting statistic is defined as the slope of the common line.
- a weighting function W(ωi, Ti) that becomes larger as M(ωi, Ti) and B(ωi, Ti + Td) become larger is determined. One example is the following:
- W(ωi, Ti) = (M(ωi, Ti) · B(ωi, Ti + Td))²
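A sketch of the weighted slope estimate with the example weighting above. The patent does not show how W enters the fit, so a weighted least-squares line through the origin is assumed here, and `weighted_slope` is a name introduced for illustration.

```python
import numpy as np

def weighted_slope(M, B):
    """Slope of the common line using the example weighting
    W = (M * B)**2, which emphasizes points where both the mixed and the
    known amplitude spectra are strong (weighted regression through the
    origin)."""
    M = np.asarray(M, float)
    B = np.asarray(B, float)
    W = (M * B) ** 2
    return float(np.sum(W * B * M) / np.sum(W * B * B))
```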
- the degree of coincidence obtained above is computed for all estimation blocks, and their average is taken.
- the accuracy of the degree of coincidence can be improved by excluding frequency bands that are empirically known to be noisy (extremely low and high frequencies).
- the slope of the common line can also be obtained by the following procedure.
- among all the straight lines passing through the origin and each spectral intensity point, the slope of the N/2-th line in ascending order of slope (that is, the median slope) is taken as the slope of the common line.
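The median-of-slopes procedure above is a few lines of code; `median_slope` is a hypothetical name introduced here.

```python
import numpy as np

def median_slope(B, M):
    """Slope of the common line as the N/2-th (median) slope among the
    lines joining the origin to each (B, M) spectral intensity point.
    The median is insensitive to the interfering points of Fig. 7(b)."""
    slopes = np.sort(np.asarray(M, float) / np.asarray(B, float))
    return float(slopes[len(slopes) // 2])
```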
- the acoustic signal removal system is realized, for example, by an acoustic signal removal program installed in a computer such as a user terminal or a Web server, or in an IC chip, together with the CPU, memory, hard disk, and so on provided in the computer. Therefore, by installing the acoustic signal removal program, an acoustic signal removal device or acoustic signal removal system having the above-described functions can easily be constructed, and the acoustic signal removal method can be implemented.
- this acoustic signal elimination program can be distributed, for example, through a communication line, or transferred as a packaged application that runs on a stand-alone computer.
- such a program can be recorded on recording media 116 to 119 readable by a general-purpose computer 120 as shown in FIG. More specifically, the program can be recorded on various recording media: magnetic recording media such as the flexible disk 116 and the cassette tape 119, optical disks such as the CD-ROM and DVD-ROM 117, and the RAM card 118.
- using such a recording medium, a general-purpose or special-purpose computer can construct the above-described acoustic signal removal system or implement the acoustic signal removal method, and the program can easily be stored, transported, and installed.
- since processing that does not depend on the phase of the amplitude data can be performed even when the phase changes, it is possible, for example, to remove only the music from the audio signal of a program in which speech and music are mixed, using acoustic data such as the music CD used when the program was created.
- Fig. 9 shows the result of actually processing a mixed sound in which classical music plays as background music behind a dialogue between a man and a woman.
- BGM component: known sound signal
- the synchronization control unit 9 synchronizes the video and the audio and outputs them from the monitor 10 and the speaker 11, so the user can work while visually comparing the sound before and after music removal against the video, which improves work efficiency.
- since the temporal change graph is displayed and can be corrected by an intuitive operation in which the user draws with the mouse, the user's intentions, such as consideration of each scene of the program or how the result will be reused, can be reflected, and the effect of the music removal can be adjusted.
- the acoustic signal removal apparatus according to the present invention can appropriately remove the sound to be removed from a mixed sound, and is therefore useful.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Stereophonic System (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004144177A JP4272107B2 (ja) | 2004-05-13 | 2004-05-13 | 音響信号除去装置、音響信号除去方法及び音響信号除去プログラム |
JP2004-144177 | 2004-05-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005112007A1 true WO2005112007A1 (ja) | 2005-11-24 |
Family
ID=35394384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/013168 WO2005112007A1 (ja) | 2004-05-13 | 2004-09-09 | 音響信号除去装置、音響信号除去方法及び音響信号除去プログラム |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP4272107B2 (ja) |
WO (1) | WO2005112007A1 (ja) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5365380B2 (ja) * | 2009-07-07 | 2013-12-11 | ソニー株式会社 | 音響信号処理装置、その処理方法およびプログラム |
JP5057535B1 (ja) | 2011-08-31 | 2012-10-24 | 国立大学法人電気通信大学 | ミキシング装置、ミキシング信号処理装置、ミキシングプログラム及びミキシング方法 |
JP7344649B2 (ja) * | 2019-02-25 | 2023-09-14 | 株式会社ベネッセコーポレーション | 情報端末装置およびプログラム |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS57161800A (en) * | 1981-03-30 | 1982-10-05 | Toshiyuki Sakai | Voice information filter |
JPS58100199A (ja) * | 1981-10-19 | 1983-06-14 | ボータン | 音声認識及び再生方法とその装置 |
JPS59165098A (ja) * | 1983-03-10 | 1984-09-18 | 三洋電機株式会社 | 音声スペクトルパラメ−タ抽出装置 |
JPH04340599A (ja) * | 1991-05-16 | 1992-11-26 | Ricoh Co Ltd | 雑音除去装置 |
JPH09251299A (ja) * | 1996-03-15 | 1997-09-22 | Toshiba Corp | マイクロホンアレイ入力型音声認識装置及び方法 |
JPH10133689A (ja) * | 1996-10-30 | 1998-05-22 | Kyocera Corp | 雑音除去装置 |
JPH1115494A (ja) * | 1997-06-25 | 1999-01-22 | Denso Corp | 音声認識装置 |
JPH1138997A (ja) * | 1997-07-16 | 1999-02-12 | Olympus Optical Co Ltd | 雑音抑圧装置および音声の雑音除去の処理をするための処理プログラムを記録した記録媒体 |
JP2002314637A (ja) * | 2001-04-09 | 2002-10-25 | Denso Corp | 雑音低減装置 |
JP2003140671A (ja) * | 2001-11-05 | 2003-05-16 | Honda Motor Co Ltd | 混合音の分離装置 |
JP2003271166A (ja) * | 2002-03-14 | 2003-09-25 | Nissan Motor Co Ltd | 入力信号処理方法および入力信号処理装置 |
-
2004
- 2004-05-13 JP JP2004144177A patent/JP4272107B2/ja not_active Expired - Lifetime
- 2004-09-09 WO PCT/JP2004/013168 patent/WO2005112007A1/ja active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP4272107B2 (ja) | 2009-06-03 |
JP2005326587A (ja) | 2005-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10650796B2 (en) | Single-channel, binaural and multi-channel dereverberation | |
JP3670562B2 (ja) | ステレオ音響信号処理方法及び装置並びにステレオ音響信号処理プログラムを記録した記録媒体 | |
US8891778B2 (en) | Speech enhancement | |
US6405163B1 (en) | Process for removing voice from stereo recordings | |
RU2467406C2 (ru) | Способ и устройство для поддержки воспринимаемости речи в многоканальном звуковом сопровождении с минимальным влиянием на систему объемного звучания | |
EP1741313B1 (en) | A method and system for sound source separation | |
US8885839B2 (en) | Signal processing method and apparatus | |
KR20180050652A (ko) | 음향 신호를 사운드 객체들로 분해하는 방법 및 시스템, 사운드 객체 및 그 사용 | |
US9071215B2 (en) | Audio signal processing device, method, program, and recording medium for processing audio signal to be reproduced by plurality of speakers | |
KR100750148B1 (ko) | 음성신호 제거 장치 및 그 방법 | |
JP4274419B2 (ja) | 音響信号除去装置、音響信号除去方法及び音響信号除去プログラム | |
KR101008250B1 (ko) | 기지 음향신호 제거방법 및 장치 | |
JP2005284163A (ja) | 雑音スペクトル推定方法、雑音抑圧方法および雑音抑圧装置 | |
WO2005112007A1 (ja) | 音響信号除去装置、音響信号除去方法及び音響信号除去プログラム | |
JP4922427B2 (ja) | 信号補正装置 | |
CN101422054A (zh) | 声像定位装置 | |
JP2008072600A (ja) | 音響信号処理装置、音響信号処理プログラム、音響信号処理方法 | |
JP2006235102A (ja) | 音声処理装置および音声処理方法 | |
JP4274418B2 (ja) | 音響信号除去装置、音響信号除去方法及び音響信号除去プログラム | |
KR101096091B1 (ko) | 음성 분리 장치 및 이를 이용한 단일 채널 음성 분리 방법 | |
KR20090054583A (ko) | 휴대용 단말기에서 스테레오 효과를 제공하기 위한 장치 및방법 | |
JP2005284016A (ja) | 音声信号の雑音推定方法およびそれを用いた雑音除去装置 | |
JP2014158103A (ja) | 音声信号処理装置、音声信号処理装置の制御方法およびプログラム | |
JP2004020945A (ja) | 音声認識装置、音声認識方法、および、音声認識プログラム | |
JP2014206559A (ja) | 受信装置及びプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase |