WO2020166216A1 - Processing device, processing method, reproduction method, and program - Google Patents
- Publication number
- WO2020166216A1 (PCT/JP2019/050601)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- frequency
- data
- envelope
- characteristic
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- H04S1/005—For headphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K15/00—Acoustics not otherwise provided for
- G10K15/02—Synthesis of acoustic waves
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/02—Spatial or constructional arrangements of loudspeakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Definitions
- the present invention relates to a processing device, a processing method, a reproducing method, and a program.
- the recording and playback system disclosed in Patent Document 1 uses filter means for processing the signal supplied to the loudspeaker.
- the filter means comprises two filter design steps.
- the transfer function between the position of the virtual sound source and the specific position of the reproduced sound field is described in the form of the filter (A).
- the specific position of the reproduced sound field is the ear or head region of the listener.
- the transfer function filter (A) is convolved with a matrix of crosstalk cancellation filters (Hx), which is used to invert the electroacoustic transfer path or paths (C) between the loudspeaker input and the specific position.
- the matrix of the crosstalk canceling filter (Hx) is created by measuring the impulse response.
- a measurement signal (an impulse sound, etc.) is recorded on 2 channels (hereinafter, ch). Then, the processing device generates a filter based on the sound pickup signal obtained by picking up the measurement signal. By convolving the generated filter with a 2ch audio signal, out-of-head localization reproduction can be realized.
- the characteristics from the headphones to the ear, up to the eardrum, are measured with a microphone installed in the listener's own ear.
- Patent Document 2 discloses a method for generating an inverse filter of the ear canal transfer function.
- the amplitude component of the ear canal transfer function is corrected in order to prevent treble noise due to the notch. Specifically, when the gain of the amplitude component is below the gain threshold, the notch is adjusted by correcting the gain value. Then, an inverse filter is generated based on the corrected ear canal transfer function.
- when performing out-of-head localization, it is preferable to measure the characteristics with a microphone installed in the listener's own ear.
- impulse response measurement or the like is performed with a microphone and headphones attached to the listener's ear.
- by using the listener's own characteristics, a filter suited to the listener can be generated. For such filter generation, it is desirable to process the sound pickup signal obtained by the measurement appropriately.
- the present embodiment has been made in view of the above points, and an object thereof is to provide a processing device, a processing method, a reproducing method, and a program that can appropriately process a sound pickup signal.
- the processing device according to the present embodiment includes: an envelope calculation unit that calculates an envelope of the frequency characteristic of a sound pickup signal; a scale conversion unit that generates scale conversion data by performing scale conversion and data interpolation on the frequency data of the envelope; a normalization coefficient calculation unit that divides the scale conversion data into a plurality of frequency bands, obtains a feature value for each frequency band, and calculates a normalization coefficient based on the feature values; and a normalization unit that normalizes the sound pickup signal in the time domain using the normalization coefficient.
- the processing method according to the present embodiment includes the steps of: calculating an envelope of the frequency characteristic of a sound pickup signal; generating scale conversion data by performing scale conversion and data interpolation on the frequency data of the envelope; dividing the scale conversion data into a plurality of frequency bands and obtaining a feature value for each frequency band; calculating a normalization coefficient based on the feature values; and normalizing the sound pickup signal in the time domain using the normalization coefficient.
- the program according to the present embodiment causes a computer to execute a processing method that includes the steps of calculating an envelope of the frequency characteristic of a sound pickup signal, generating scale conversion data by performing scale conversion and data interpolation on the frequency data of the envelope, and normalizing the sound pickup signal in the time domain using a normalization coefficient calculated from feature values of the scale conversion data in a plurality of frequency bands.
- the out-of-head localization process according to the present embodiment uses the spatial acoustic transfer characteristics and the ear canal transfer characteristics.
- the spatial acoustic transfer characteristic is a transfer characteristic from a sound source such as a speaker to the ear canal.
- the ear canal transfer characteristic is a transfer characteristic from a speaker unit of headphones or earphones to an eardrum.
- the spatial acoustic transfer characteristics are measured without headphones or earphones worn, the ear canal transfer characteristics are measured with headphones or earphones worn, and out-of-head localization processing is realized using these measurement data.
- the present embodiment is characterized by a microphone system for measuring spatial acoustic transfer characteristics or ear canal transfer characteristics.
- the out-of-head localization process is executed by a user terminal such as a personal computer, a smart phone, or a tablet PC.
- the user terminal is an information processing apparatus having a processing unit such as a processor, a storage unit such as a memory or a hard disk, a display unit such as a liquid crystal monitor, and an input unit such as a touch panel, buttons, a keyboard, or a mouse.
- the user terminal may have a communication function of transmitting and receiving data.
- output means (an output unit) having headphones or earphones is connected to the user terminal.
- the connection between the user terminal and the output means may be wired or wireless.
- FIG. 1 shows a block diagram of an out-of-head localization processing device 100, which is an example of the sound field reproducing device according to the present embodiment.
- the out-of-head localization processing device 100 reproduces a sound field for the user U who wears the headphones 43. Therefore, the out-of-head localization processing device 100 performs sound image localization processing on the Lch and Rch stereo input signals XL and XR.
- the Lch and Rch stereo input signals XL and XR are analog audio reproduction signals output from a CD (Compact Disc) player or digital audio data such as mp3 (MPEG Audio Layer-3).
- the audio reproduction signal or digital audio data is collectively referred to as a reproduction signal. That is, the stereo input signals XL and XR of Lch and Rch are reproduction signals.
- out-of-head localization processing device 100 is not limited to a physically single device, and a part of processing may be performed by a different device.
- a part of the processing may be performed by a smart phone or the like, and the remaining processing may be performed by a DSP (Digital Signal Processor) or the like built in the headphones 43.
- the out-of-head localization processing device 100 includes an out-of-head localization processing unit 10, a filter unit 41 that stores an inverse filter Linv, a filter unit 42 that stores an inverse filter Rinv, and headphones 43.
- the out-of-head localization processing unit 10, the filter unit 41, and the filter unit 42 can be specifically realized by a processor or the like.
- the out-of-head localization processing unit 10 includes convolution operation units 11, 12, 21, and 22, which store the spatial acoustic transfer characteristics Hls, Hlo, Hro, and Hrs, respectively, and adders 24 and 25.
- the convolution operation units 11 to 12 and 21 to 22 perform the convolution process using the spatial acoustic transfer characteristics.
- the stereo input signals XL and XR from a CD player or the like are input to the out-of-head localization processing unit 10. Spatial acoustic transfer characteristics are set in the out-of-head localization processing unit 10.
- the out-of-head localization processing unit 10 convolves a filter of spatial acoustic transfer characteristics (hereinafter, also referred to as spatial acoustic filter) with the stereo input signals XL and XR of each channel.
- the spatial acoustic transfer characteristic may be a head-related transfer function (HRTF) measured on the listener's own head and auricles, on a dummy head, or on a third party.
- the spatial acoustic transfer function is a set of four spatial acoustic transfer characteristics Hls, Hlo, Hro, and Hrs.
- the data used for convolution in the convolution operation units 11, 12, 21, and 22 becomes a spatial acoustic filter.
- a spatial acoustic filter is generated by cutting out the spatial acoustic transfer characteristics Hls, Hlo, Hro, and Hrs with a predetermined filter length.
- Each of the spatial acoustic transfer characteristics Hls, Hlo, Hro, and Hrs is acquired in advance by impulse response measurement or the like.
- the user U wears a microphone on each of the left and right ears.
- the left and right speakers arranged in front of the user U respectively output impulse sounds for performing impulse response measurement.
- the measurement signal such as the impulse sound output from the speaker is picked up by the microphone.
- the spatial acoustic transfer characteristics Hls, Hlo, Hro, and Hrs are acquired based on the sound pickup signal from the microphone.
- the spatial acoustic transfer characteristic Hls between the left speaker and the left microphone, the spatial acoustic transfer characteristic Hlo between the left speaker and the right microphone, the spatial acoustic transfer characteristic Hro between the right speaker and the left microphone, and the spatial acoustic transfer characteristic Hrs between the right speaker and the right microphone are measured.
- the convolution operation unit 11 convolves the spatial acoustic filter according to the spatial acoustic transfer characteristic Hls with respect to the Lch stereo input signal XL.
- the convolution operation unit 11 outputs the convolution operation data to the adder 24.
- the convolution operation unit 21 convolves a spatial acoustic filter according to the spatial acoustic transfer characteristic Hro with respect to the Rch stereo input signal XR.
- the convolution operation unit 21 outputs the convolution operation data to the adder 24.
- the adder 24 adds the two convolution operation data and outputs the result to the filter unit 41.
- the convolution operation unit 12 convolves a spatial acoustic filter according to the spatial acoustic transfer characteristic Hlo with the Lch stereo input signal XL.
- the convolution operation unit 12 outputs the convolution operation data to the adder 25.
- the convolution operation unit 22 convolves a spatial acoustic filter according to the spatial acoustic transfer characteristic Hrs with the Rch stereo input signal XR.
- the convolution operation unit 22 outputs the convolution operation data to the adder 25.
- the adder 25 adds the two convolution operation data and outputs the result to the filter unit 42.
- Inverse filters Linv and Rinv that cancel the headphone characteristics are set in the filter units 41 and 42. Then, the inverse filters Linv and Rinv are convolved with the reproduction signal (convolution operation signal) processed by the out-of-head localization processing unit 10.
- the filter unit 41 convolves the Lch signal from the adder 24 with the inverse filter Linv of the Lch-side headphone characteristic.
- the filter unit 42 convolves the Rch signal from the adder 25 with the inverse filter Rinv of the Rch-side headphone characteristic.
- the inverse filters Linv and Rinv cancel the characteristics from the headphone unit to the microphone when the headphones 43 are attached.
- the microphone may be placed anywhere between the entrance to the ear canal and the eardrum.
- the filter unit 41 outputs the processed Lch signal YL to the left unit 43L of the headphones 43.
- the filter unit 42 outputs the processed Rch signal YR to the right unit 43R of the headphones 43.
- the user U wears the headphones 43.
- the headphones 43 output the Lch signal YL and the Rch signal YR (hereinafter, the Lch signal YL and the Rch signal YR are also collectively referred to as a stereo signal) to the user U. Thereby, the sound image localized outside the head of the user U can be reproduced.
- the out-of-head localization processing device 100 performs the out-of-head localization process using the spatial acoustic filters according to the spatial acoustic transfer characteristics Hls, Hlo, Hro, and Hrs and the inverse filters Linv and Rinv of the headphone characteristics.
- the spatial acoustic filter according to the spatial acoustic transfer characteristics Hls, Hlo, Hro, and Hrs and the inverse filters Linv and Rinv of the headphone characteristic are collectively referred to as an out-of-head localization filter.
- the out-of-head localization filter is composed of four spatial acoustic filters and two inverse filters.
- the out-of-head localization processing device 100 performs the out-of-head localization processing by performing convolution calculation processing on the stereo reproduction signal using a total of six out-of-head localization filters.
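The signal flow through the convolution operation units 11, 12, 21, and 22, the adders 24 and 25, and the filter units 41 and 42 can be sketched as below. This is a minimal illustration, not the patented implementation: the function name `out_of_head_localization` and the use of NumPy are assumptions, and the sketch assumes that XL and XR have equal length and that the filters feeding each adder have equal length.

```python
import numpy as np

def out_of_head_localization(xl, xr, hls, hlo, hro, hrs, linv, rinv):
    """Apply four spatial acoustic filters and two inverse filters to a stereo signal.

    Hypothetical helper: all arguments are 1-D arrays of samples / filter taps.
    """
    # Units 11 and 21 feed adder 24 (left-ear path): XL*Hls + XR*Hro
    left = np.convolve(xl, hls) + np.convolve(xr, hro)
    # Units 12 and 22 feed adder 25 (right-ear path): XL*Hlo + XR*Hrs
    right = np.convolve(xl, hlo) + np.convolve(xr, hrs)
    # Filter units 41 and 42 convolve the inverse filters Linv and Rinv
    yl = np.convolve(left, linv)
    yr = np.convolve(right, rinv)
    return yl, yr
```

With unit-impulse filters for Hls, Hrs, Linv, and Rinv and all-zero crosstalk filters Hlo and Hro, the outputs YL and YR equal the inputs, which is a quick sanity check of the wiring.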
- the out-of-head localization filter is preferably based on the user U's personal measurements.
- the out-of-head localization filter is set based on the sound collection signal collected by the microphone mounted on the ear of the user U.
- the spatial acoustic filter and the headphone characteristic inverse filters Linv and Rinv are filters for audio signals.
- the out-of-head localization processing device 100 executes out-of-head localization processing.
- the processing for generating the inverse filters Linv and Rinv is one of the technical features. The process for generating the inverse filter will be described below.
- FIG. 2 shows a configuration for measuring the transfer characteristic of the user U.
- the measuring device 200 includes a microphone unit 2, headphones 43, and a processing device 201.
- the measured person 1 is the same person as the user U in FIG.
- the processing device 201 of the measurement device 200 performs arithmetic processing for appropriately generating a filter according to the measurement result.
- the processing device 201 is a personal computer (PC), a tablet terminal, a smart phone, or the like, and includes a memory and a processor.
- the memory stores a processing program, various parameters, measurement data, and the like.
- the processor executes the processing program stored in the memory. Each process is executed by the processor executing the processing program.
- the processor may be, for example, a CPU (Central Processing Unit), an FPGA (Field-Programmable Gate Array), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), or a GPU (Graphics Processing Unit).
- a microphone unit 2 and headphones 43 are connected to the processing device 201.
- the microphone unit 2 may be built in the headphones 43.
- the microphone unit 2 includes a left microphone 2L and a right microphone 2R.
- the left microphone 2L is attached to the left ear 9L of the user U.
- the right microphone 2R is attached to the right ear 9R of the user U.
- the processing device 201 may be the same processing device as the out-of-head localization processing device 100, or may be a different processing device. Further, earphones can be used instead of the headphones 43.
- the headphones 43 have a headphone band 43B, a left unit 43L, and a right unit 43R.
- the headphone band 43B connects the left unit 43L and the right unit 43R.
- the left unit 43L outputs a sound toward the left ear 9L of the user U.
- the right unit 43R outputs a sound toward the right ear 9R of the user U.
- the headphones 43 may be of any type, such as closed, open, semi-open, or semi-closed.
- the user U wears the headphones 43 while the microphone unit 2 is worn by the user U.
- the left unit 43L and the right unit 43R of the headphones 43 are attached to the left ear 9L and the right ear 9R to which the left microphone 2L and the right microphone 2R are attached, respectively.
- the headphone band 43B generates a biasing force that presses the left unit 43L and the right unit 43R against the left ear 9L and the right ear 9R, respectively.
- the left microphone 2L picks up the sound output from the left unit 43L of the headphones 43.
- the right microphone 2R picks up the sound output from the right unit 43R of the headphones 43.
- the microphone parts of the left microphone 2L and the right microphone 2R are arranged at sound collecting positions near the external ear canal.
- the left microphone 2L and the right microphone 2R are configured not to interfere with the headphones 43. That is, the user U can wear the headphones 43 in a state where the left microphone 2L and the right microphone 2R are arranged at appropriate positions of the left ear 9L and the right ear 9R.
- the processing device 201 outputs a measurement signal to the headphones 43.
- the headphones 43 generate an impulse sound or the like.
- the impulse sound output from the left unit 43L is measured by the left microphone 2L.
- the right microphone 2R measures the impulse sound output from the right unit 43R.
- Impulse response measurement is performed by the microphones 2L and 2R acquiring the sound pickup signal when the measurement signal is output.
- FIG. 3 is a control block diagram showing the processing device 201.
- the processing device 201 includes a measurement signal generation unit 211, a sound pickup signal acquisition unit 212, an envelope calculation unit 214, and a scale conversion unit 215. Furthermore, the processing device 201 includes a normalization coefficient calculation unit 216, a normalization unit 217, a conversion unit 218, a dip correction unit 219, and a filter generation unit 220.
- the measurement signal generation unit 211 includes a D/A converter and an amplifier, and generates a measurement signal for measuring the ear canal transfer characteristic.
- the measurement signal is, for example, an impulse signal or a TSP (Time Stretched Pulse) signal.
- the measurement device 200 performs impulse response measurement using impulse sound as the measurement signal.
- the left microphone 2L and the right microphone 2R of the microphone unit 2 pick up the measurement signals, respectively, and output the picked up signals to the processing device 201.
- the sound pickup signal acquisition unit 212 acquires the sound pickup signals picked up by the left microphone 2L and the right microphone 2R.
- the sound pickup signal acquisition unit 212 may include an A/D converter that performs A/D conversion on the sound pickup signals from the microphones 2L and 2R.
- the sound pickup signal acquisition unit 212 may synchronously add the signals obtained by a plurality of measurements.
- the sound pickup signal in the time domain is called ECTF.
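Synchronous addition of repeated measurements can be sketched as follows. The function name `synchronous_average` is hypothetical, and the sketch assumes the recordings are already time-aligned and of equal length. It divides the sum by the number of measurements so that the level does not depend on how many were taken, which is an assumption that the text does not state.

```python
import numpy as np

def synchronous_average(recordings):
    """Average time-aligned recordings of the same measurement signal."""
    # Shape: (num_measurements, num_samples)
    stacked = np.asarray(recordings, dtype=float)
    # Summing aligned samples reinforces the response and averages out uncorrelated noise
    return stacked.mean(axis=0)
```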
- the envelope calculation unit 214 calculates the envelope of the frequency characteristics of the collected sound signal.
- the envelope calculation unit 214 can obtain the envelope using the cepstrum analysis.
- the envelope calculation unit 214 calculates the frequency characteristic of the collected sound signal (ECTF) by discrete Fourier transform or discrete cosine transform.
- the envelope calculation unit 214 calculates the frequency characteristic by, for example, performing FFT (Fast Fourier Transform) on the ECTF in the time domain.
- the frequency characteristic includes a power spectrum and a phase spectrum.
- the envelope calculation unit 214 may generate an amplitude spectrum instead of the power spectrum.
- the envelope calculation unit 214 obtains a cepstrum by applying an inverse Fourier transform to the logarithmically transformed power spectrum.
- the envelope calculation unit 214 applies a lifter to the cepstrum.
- the lifter is a low-pass lifter that passes only the low-quefrency components of the cepstrum.
- the envelope calculation unit 214 may use a method other than the cepstrum analysis.
- the envelope may be calculated by applying a general smoothing method to the logarithmically converted amplitude value.
- as the smoothing method, a simple moving average, a Savitzky-Golay filter, a smoothing spline, or the like can be used.
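The cepstrum-based envelope calculation described above can be sketched as follows. This is an illustrative sketch only: the function name `cepstral_envelope`, the lifter length `n_lifter`, and the small constant `eps` that guards the logarithm are all assumptions, not values from the source.

```python
import numpy as np

def cepstral_envelope(ectf, n_lifter=32, eps=1e-12):
    """Return the liftered log-power spectrum (dB) for bins 0..N/2 of a time signal."""
    spectrum = np.fft.fft(ectf)
    # Logarithmic power spectrum in dB (eps avoids log of zero)
    log_power = 10.0 * np.log10(np.abs(spectrum) ** 2 + eps)
    # Cepstrum: inverse Fourier transform of the log spectrum
    cepstrum = np.fft.ifft(log_power).real
    # Low-pass lifter: keep low quefrencies and their mirrored counterparts
    lifter = np.zeros(len(cepstrum))
    lifter[:n_lifter] = 1.0
    if n_lifter > 1:
        lifter[-(n_lifter - 1):] = 1.0
    # Back to the frequency domain gives the smoothed envelope
    envelope = np.fft.fft(cepstrum * lifter).real
    return envelope[: len(ectf) // 2 + 1]
```

A smaller `n_lifter` gives a smoother envelope. Setting it to cover the whole cepstrum returns the unsmoothed log-power spectrum.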
- the scale conversion unit 215 changes the scale of the envelope data so that the discrete spectrum data are evenly spaced on the logarithmic axis.
- the envelope data obtained by the envelope calculation unit 214 are equidistant in frequency. That is, since the envelope data are evenly spaced on the frequency linear axis, they are not evenly spaced on the frequency logarithmic axis. Therefore, the scale conversion unit 215 performs an interpolation process on the envelope data so that the envelope data has equal intervals on the frequency logarithmic axis.
- the scale conversion unit 215 interpolates the data in the low frequency band, where the data interval is coarse on the logarithmic axis. Specifically, the scale conversion unit 215 obtains discrete envelope data arranged at equal intervals on the logarithmic axis by performing interpolation processing such as cubic spline interpolation. Envelope data that has undergone scale conversion is referred to as scale conversion data.
- the scale conversion data is a spectrum in which the frequency and the power value are associated with each other.
- the scale conversion unit 215 is not limited to the logarithmic scale and may convert the envelope data into a scale close to human hearing (referred to as a hearing scale).
- scale conversion may be performed using a logarithmic scale (Log scale), a mel (mel) scale, a Bark scale, an ERB (Equivalent Rectangular Bandwidth) scale, or the like.
- the scale conversion unit 215 scales the envelope data with an auditory scale by data interpolation. For example, the scale conversion unit 215 makes the data in the low frequency band dense by interpolating the data in the low frequency band having a rough data interval in the auditory scale. Data that is equidistant on the auditory scale is dense on the low frequency band and coarse on the high frequency band on the linear scale. By doing so, the scale conversion unit 215 can generate scale conversion data at equal intervals on the auditory scale. Of course, the scale conversion data does not have to be completely equidistant data on the auditory scale.
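Resampling the envelope onto a logarithmic frequency axis can be sketched as below. Note the substitution: this sketch uses NumPy's linear interpolation (`np.interp`) in place of the spline interpolation mentioned in the text, purely to stay dependency-free. The function name `to_log_scale`, the default of 256 output points, and the 10 Hz lower bound (taken from the example minimum frequency) are assumptions.

```python
import numpy as np

def to_log_scale(freqs_hz, envelope_db, num_points=256, f_min=10.0):
    """Resample a linearly spaced envelope onto log-spaced frequencies."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    envelope_db = np.asarray(envelope_db, dtype=float)
    f_max = freqs_hz[-1]
    # Points equally spaced on the logarithmic frequency axis
    log_freqs = np.geomspace(f_min, f_max, num_points)
    # Linear interpolation stands in for the spline interpolation of the text
    scaled = np.interp(log_freqs, freqs_hz, envelope_db)
    return log_freqs, scaled
```

Because the output grid is denser at low frequencies than the input grid, most low-band output points are interpolated values, which matches the behavior described for the scale conversion unit 215.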
- the normalization coefficient calculation unit 216 calculates the normalization coefficient based on the scale conversion data. To do so, the normalization coefficient calculation unit 216 divides the scale conversion data into a plurality of frequency bands and calculates a feature value for each frequency band. The normalization coefficient is then obtained by weighting and adding the feature values of the frequency bands.
- the normalization coefficient calculation unit 216 divides the scale conversion data into four frequency bands (hereinafter, referred to as first to fourth bands).
- the first band is at least the minimum frequency (for example, 10 Hz) and less than 1000 Hz.
- the first band is a range that changes depending on whether or not the headphones 43 fit.
- the second band is 1000 Hz or more and less than 4 kHz.
- the second band is a range in which the characteristics of the headphones themselves appear regardless of the individual.
- the third band is 4 kHz or more and less than 12 kHz.
- the third band is the range in which the characteristics of the individual appear most strongly.
- the fourth band has a frequency of 12 kHz or more and a maximum frequency (for example, 22.4 kHz) or less.
- the fourth band is a range that changes every time the headphones are worn.
- the range of each band is an example, and the range is not limited to the above values.
- the characteristic value is, for example, four values of the maximum value, the minimum value, the average value, and the median value of the scale conversion data in each band.
- the four values of the first band are Amax (maximum value), Amin (minimum value), Aave (average value), and Amed (median value).
- the four values of the second band are Bmax, Bmin, Bave, and Bmed.
- the four values of the third band are Cmax, Cmin, Cave, and Cmed.
- the four values of the fourth band are Dmax, Dmin, Dave, and Dmed.
- the normalization coefficient calculator 216 calculates a reference value for each band based on the four feature values.
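- the reference values of the first to fourth bands are Astd, Bstd, Cstd, and Dstd, respectively. For example, the reference value Dstd of the fourth band is expressed by the following equation (4).
- $\mathrm{Dstd} = \mathrm{Dmax} \times 0.1 + \mathrm{Dmin} \times 0.1 + \mathrm{Dave} \times 0.5 + \mathrm{Dmed} \times 0.3 \quad (4)$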
- the normalization coefficient Std is expressed by the following equation (5).
- $\mathrm{Std} = \mathrm{Astd} \times 0.25 + \mathrm{Bstd} \times 0.4 + \mathrm{Cstd} \times 0.25 + \mathrm{Dstd} \times 0.1 \quad (5)$
- the normalization coefficient calculation unit 216 calculates the normalization coefficient Std by weighting and adding the feature values for each band.
- the normalization coefficient calculation unit 216 divides into four frequency bands and extracts four feature values from each band.
- the normalization coefficient calculation unit 216 weights and adds 16 feature values.
- the variance value of each band may be calculated and the weighting may be changed according to the variance value.
- An integral value or the like may be used as the characteristic value.
- the number of characteristic values in one band is not limited to four, and may be five or more or three or less. It suffices if at least one of the maximum value, the minimum value, the average value, the median value, the integral value, and the variance value is the feature value.
- the weighted addition coefficient for one or more of the maximum value, the minimum value, the average value, the median value, the integral value, and the variance value may be zero.
- the normalization unit 217 normalizes the sound pickup signal using the normalization coefficient. Specifically, the normalization unit 217 calculates Std × ECTF as the normalized sound pickup signal. The sound pickup signal after the normalization is referred to as a normalized ECTF. The normalization unit 217 can normalize the ECTF to an appropriate level by using the normalization coefficient.
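The band split, feature extraction, weighted addition, and normalization can be sketched as follows. Several parts are assumptions and are marked as such. The band weights come from equation (5) and the band edges from the example ranges. The per-feature weights are given in the text only for the fourth band (equation (4)), so this sketch reuses them for every band as a placeholder. The function names are hypothetical, each band is assumed to contain at least one data point, and the normalized signal is read as the product of Std and the ECTF.

```python
import numpy as np

# Example band edges from the text: 10 Hz, 1 kHz, 4 kHz, 12 kHz, 22.4 kHz
BAND_EDGES_HZ = [10.0, 1000.0, 4000.0, 12000.0, 22400.0]
# Band weights of equation (5)
BAND_WEIGHTS = [0.25, 0.4, 0.25, 0.1]
# ASSUMPTION: (max, min, average, median) weights of equation (4), reused for all bands
FEATURE_WEIGHTS = (0.1, 0.1, 0.5, 0.3)

def band_reference(values):
    """Weighted sum of the four feature values of one band."""
    feats = (np.max(values), np.min(values), np.mean(values), np.median(values))
    return float(np.dot(FEATURE_WEIGHTS, feats))

def normalization_coefficient(freqs_hz, scaled_data):
    """Weighted sum of the per-band reference values (Std)."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    scaled_data = np.asarray(scaled_data, dtype=float)
    std = 0.0
    for i, weight in enumerate(BAND_WEIGHTS):
        lo, hi = BAND_EDGES_HZ[i], BAND_EDGES_HZ[i + 1]
        is_last = i == len(BAND_WEIGHTS) - 1
        # Bands are [lo, hi), except the last one, which includes the maximum frequency
        upper = (freqs_hz <= hi) if is_last else (freqs_hz < hi)
        std += weight * band_reference(scaled_data[(freqs_hz >= lo) & upper])
    return std

def normalize(ectf, std):
    """Normalized ECTF, taken here as Std multiplied by the time-domain ECTF."""
    return std * np.asarray(ectf, dtype=float)
```

Since both weight sets sum to 1, scale conversion data that is constant across all bands yields a coefficient equal to that constant.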
- the conversion unit 218 calculates the frequency characteristic of the normalized ECTF by the discrete Fourier transform or the discrete cosine transform. For example, the conversion unit 218 calculates the frequency characteristic by performing FFT (Fast Fourier Transform) on the normalized ECTF in the time domain.
- the frequency characteristics of the normalized ECTF include a power spectrum and a phase spectrum.
- the conversion unit 218 may generate an amplitude spectrum instead of the power spectrum.
- the power spectrum and the phase spectrum of the normalized ECTF will be referred to as a normalized power spectrum and a normalized phase spectrum.
- FIG. 5 shows power spectra before and after normalization. By performing the normalization, the power value of the power spectrum changes to an appropriate level.
- the dip correction unit 219 corrects the dip in the normalized power spectrum.
- the dip correction unit 219 determines a portion where the power value of the normalized power spectrum is equal to or less than the threshold value as a dip, and corrects the power value of the portion where the dip occurs. For example, the dip correction unit 219 corrects the dip by interpolating the portion below the threshold value.
- the normalized power spectrum after dip correction is used as the corrected power spectrum.
- the dip correction unit 219 divides the normalized power spectrum into two bands and sets different thresholds for each band.
- the boundary frequency is 12 kHz: frequencies below 12 kHz form the low frequency band, and frequencies of 12 kHz or more form the high frequency band.
- the threshold of the low frequency band is the first threshold TH1.
- the threshold of the high frequency band is the second threshold TH2.
- the first threshold TH1 is preferably lower than the second threshold TH2.
- the first threshold TH1 can be set to -13 dB and the second threshold TH2 can be set to -9 dB.
- the dip correction unit 219 may divide the normalized power spectrum into three or more bands and set a different threshold for each band.
- FIGS. 6 and 7 show power spectra before and after dip correction.
- FIG. 6 is a graph showing a power spectrum before dip correction, that is, a normalized power spectrum.
- FIG. 7 is a graph showing the corrected power spectrum after the dip correction.
- the power value is below the first threshold TH1 at the point P1.
- the dip correction unit 219 determines a portion P1 where the power value is below the first threshold value TH1 in the low frequency band as a dip.
- the power value is below the second threshold TH2 at the point P2.
- the dip correction unit 219 determines a location P2 where the power value is below the second threshold TH2 in the high frequency band as a dip.
- the dip correction unit 219 increases the power value at the points P1 and P2. For example, the dip correction unit 219 replaces the power value at the point P1 with the first threshold TH1 and the power value at the point P2 with the second threshold TH2. Further, the dip correction unit 219 may round the boundary between the portion above the threshold and the portion below the threshold, as shown in FIG. 7. Alternatively, the dip correction unit 219 may correct the dip by interpolating the points P1 and P2 using a method such as spline interpolation.
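The threshold-replacement variant of the dip correction can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `correct_dips`, the list-based data layout, and the choice to assign the 12 kHz bin itself to the high band are assumptions. The boundary rounding and spline interpolation mentioned above are not shown.

```python
from typing import List, Sequence


def correct_dips(
    freqs_hz: Sequence[float],
    power_db: Sequence[float],
    boundary_hz: float = 12000.0,
    th1_db: float = -13.0,
    th2_db: float = -9.0,
) -> List[float]:
    """Raise every bin that falls below its band's threshold up to that threshold."""
    corrected = []
    for f, p in zip(freqs_hz, power_db):
        # Low band uses TH1, high band uses TH2.
        # Assumption: the boundary bin itself is treated as high band.
        threshold = th1_db if f < boundary_hz else th2_db
        corrected.append(max(p, threshold))
    return corrected


freqs = [1000.0, 4000.0, 8000.0, 14000.0, 18000.0]
power = [-3.0, -20.0, -10.0, -12.0, -5.0]
print(correct_dips(freqs, power))  # [-3.0, -13.0, -10.0, -9.0, -5.0]
```

Only the bins at 4 kHz and 14 kHz are changed in this example, because they are the only ones below the threshold of their band.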
- the filter generation unit 220 uses the corrected power spectrum to generate a filter.
- the filter generation unit 220 obtains the inverse characteristic of the corrected power spectrum. Specifically, the filter generation unit 220 obtains an inverse characteristic that cancels the corrected power spectrum (frequency characteristic in which the dip is corrected).
- the inverse characteristic is a power spectrum having a filter coefficient that cancels the corrected logarithmic power spectrum.
- the filter generation unit 220 calculates a signal in the time domain from the inverse characteristic and the phase characteristic (normalized phase spectrum) by inverse discrete Fourier transform or inverse discrete cosine transform.
- the filter generation unit 220 generates a time signal by performing IFFT (Inverse Fast Fourier Transform) on the inverse characteristic and the phase characteristic.
- the filter generation unit 220 calculates the inverse filter by cutting out the generated time signal with a predetermined filter length.
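The filter generation steps above (inverse characteristic, inverse transform, truncation) can be sketched as below. Several points are assumptions made for illustration: the spectra are given as full-length (two-sided) lists, the power spectrum is in dB so that the inverse characteristic is its negation, the phase is used as supplied, and a naive inverse DFT stands in for the IFFT. The name `inverse_filter` is hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def inverse_filter(
    corrected_db: Sequence[float],
    phase_rad: Sequence[float],
    filter_len: int,
) -> List[float]:
    """Build a time-domain inverse filter from a corrected dB power spectrum."""
    n = len(corrected_db)
    # Inverse characteristic: negate the dB power spectrum, then convert
    # power dB to linear amplitude (10 ** (dB / 20)).
    spectrum = [
        cmath.rect(10.0 ** (-db / 20.0), ph)
        for db, ph in zip(corrected_db, phase_rad)
    ]
    # Naive O(n^2) inverse DFT; an FFT routine would normally be used.
    # The real part is kept, which is exact for a conjugate-symmetric spectrum.
    time_signal = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
    # Cut out the leading samples to the requested filter length.
    return time_signal[:filter_len]
```

A flat 0 dB spectrum with zero phase yields a unit impulse, which is a quick sanity check that the inverse of "no coloration" is "no correction".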
- the processing device 201 generates the inverse filter Linv by performing the above processing on the sound collection signal collected by the left microphone 2L.
- the processing device 201 generates the inverse filter Rinv by performing the above processing on the sound pickup signal picked up by the right microphone 2R.
- the inverse filters Linv and Rinv are set in the filter units 41 and 42 of FIG. 1, respectively.
- the normalization coefficient calculation unit 216 calculates the normalization coefficient based on the scale conversion data. Thereby, the normalization unit 217 can perform normalization using an appropriate normalization coefficient.
- the normalization coefficient can be calculated by paying attention to a band that is important for hearing.
- in a general method, the coefficient is calculated such that the sum of squares or the RMS (root mean square) has a predetermined value. Compared with the case where such a general method is used, the processing of the present embodiment can obtain an appropriate normalization coefficient.
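For comparison, the general RMS-based approach mentioned above can be written in a few lines. This sketch shows only that baseline, not the method of the embodiment; the function name and the default target value of 1.0 are assumptions.

```python
import math
from typing import Sequence


def rms_normalization_coefficient(signal: Sequence[float], target_rms: float = 1.0) -> float:
    """Return the gain that scales `signal` so that its RMS equals `target_rms`."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    if rms == 0.0:
        raise ValueError("cannot normalize an all-zero signal")
    return target_rms / rms


print(rms_normalization_coefficient([2.0, -2.0, 2.0, -2.0]))  # 0.5
```

Because this coefficient weights all frequencies equally, it has no way to emphasize a band that matters more for hearing, which is the limitation the embodiment addresses.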
- the measurement of the ear canal transfer characteristics of the person to be measured 1 is performed using the microphone unit 2 and the headphones 43.
- the processing device 201 may be a smart phone or the like. For this reason, the measurement settings may be different for each measurement. Further, there is a possibility that variations may occur in the mounting of the headphones 43 and the microphone unit 2.
- the processing device 201 performs normalization by multiplying the ECTF by the normalization coefficient Std calculated as described above. By doing so, it is possible to suppress variations due to settings during measurement and measure the ear canal transfer characteristics.
- the filter generation unit 220 calculates the inverse characteristic by using the corrected power spectrum in which the dip has been corrected. As a result, it is possible to prevent the power value of the inverse characteristic from having a steep rising waveform in the frequency band corresponding to the dip. Thereby, an appropriate inverse filter can be generated. Further, the dip correction unit 219 divides the frequency characteristic into two or more frequency bands and sets different thresholds. By doing so, the dip can be appropriately corrected for each frequency band. Therefore, more appropriate inverse filters Linv and Rinv can be generated.
- the normalization unit 217 normalizes the ECTF in order to appropriately perform such dip correction.
- the dip correction unit 219 corrects the dip in the power spectrum (or amplitude spectrum) of the normalized ECTF. Therefore, the dip correction unit 219 can appropriately correct the dip.
- FIG. 8 is a flowchart showing the processing method according to this embodiment.
- the envelope calculating unit 214 calculates the envelope of the power spectrum of the ECTF by using the cepstrum analysis (S1). As described above, the envelope calculation unit 214 may use a method other than the cepstrum analysis.
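One common form of cepstrum-based envelope estimation is low-quefrency liftering of the log power spectrum, sketched below. The lifter length `n_keep`, the two-sided spectrum layout, and the helper names are assumptions; the source does not specify them here, and a naive DFT is used in place of an FFT to keep the example self-contained.

```python
import cmath
import math
from typing import List, Sequence


def _dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) DFT; an FFT routine would normally replace this."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def cepstral_envelope(log_power: Sequence[float], n_keep: int) -> List[float]:
    """Smooth a full-length log power spectrum by keeping low-quefrency terms."""
    n = len(log_power)
    cepstrum = _dft(log_power, inverse=True)
    # Keep quefrencies 0..n_keep-1 and their mirrored counterparts; zero the rest.
    liftered = [
        c if (q < n_keep or q > n - n_keep) else 0j
        for q, c in enumerate(cepstrum)
    ]
    return [v.real for v in _dft(liftered)]
```

A smaller `n_keep` gives a smoother envelope; with `n_keep = 1` the envelope collapses to the mean of the log power spectrum.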
- the scale conversion unit 215 scales the envelope data into logarithmically spaced data (S2).
- the scale conversion unit 215 interpolates low frequency band data having a coarse data interval by cubic spline interpolation or the like. As a result, scale conversion data with equal intervals on the logarithmic frequency axis can be obtained.
- the scale conversion unit 215 may perform scale conversion using not only the logarithmic scale but also the various auditory scales described above.
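The resampling of envelope data onto a logarithmically spaced frequency axis can be illustrated as follows. Note the substitution: this sketch uses linear interpolation as a simple stand-in for the spline interpolation mentioned above, and the function name `to_log_scale` and its parameters are assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def to_log_scale(
    freqs_hz: Sequence[float],
    values: Sequence[float],
    n_points: int,
) -> Tuple[List[float], List[float]]:
    """Resample (freqs_hz, values) onto n_points log-spaced frequencies.

    freqs_hz must be ascending and strictly positive.
    """
    f_lo, f_hi = freqs_hz[0], freqs_hz[-1]
    # Equal spacing on a log axis means a constant ratio between neighbours.
    log_freqs = [f_lo * (f_hi / f_lo) ** (i / (n_points - 1)) for i in range(n_points)]
    resampled = []
    for f in log_freqs:
        # Find the segment [freqs_hz[j], freqs_hz[j + 1]] that contains f.
        j = min(max(bisect_right(freqs_hz, f) - 1, 0), len(freqs_hz) - 2)
        t = (f - freqs_hz[j]) / (freqs_hz[j + 1] - freqs_hz[j])
        # Linear interpolation (stand-in for a cubic spline).
        resampled.append(values[j] + t * (values[j + 1] - values[j]))
    return log_freqs, resampled
```

On a linear FFT grid the low band has few bins per octave, so most of the log-spaced points there are interpolated values; this is why the interpolation quality matters mainly at low frequencies.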
- the normalization coefficient calculation unit 216 calculates the normalization coefficient using weighting for each frequency band (S3). In the normalization coefficient calculation unit 216, weights are set in advance for each of a plurality of frequency bands. The normalization coefficient calculation unit 216 extracts the characteristic value of the scale conversion data for each frequency band. Then, the normalization coefficient calculation unit 216 calculates the normalization coefficient by weighting and adding a plurality of feature values.
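The band-weighted calculation in S3 can be sketched as a weighted sum of per-band feature values. The band edges, the weights, and the use of the band mean as the feature value are all assumptions chosen for illustration; this passage does not specify them.

```python
from typing import Sequence, Tuple


def normalization_coefficient(
    freqs_hz: Sequence[float],
    values: Sequence[float],
    bands: Sequence[Tuple[float, float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of one feature value per frequency band."""
    total = 0.0
    for (lo, hi), w in zip(bands, weights):
        in_band = [v for f, v in zip(freqs_hz, values) if lo <= f < hi]
        if not in_band:
            continue  # skip bands that contain no data points
        feature = sum(in_band) / len(in_band)  # assumption: mean as the feature value
        total += w * feature
    return total


coef = normalization_coefficient(
    [100.0, 200.0, 1000.0, 2000.0],
    [1.0, 3.0, 10.0, 20.0],
    bands=[(0.0, 500.0), (500.0, 5000.0)],
    weights=[0.5, 0.25],
)
print(coef)  # 4.75
```

Giving a larger weight to a band makes the coefficient track that band more closely, which is how a band that is important for hearing can dominate the result.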
- the normalization unit 217 calculates the normalized ECTF using the normalization coefficient (S4).
- the normalization unit 217 calculates the normalized ECTF by multiplying the time domain ECTF by the normalization coefficient.
- the conversion unit 218 calculates the frequency characteristic of the normalized ECTF (S5).
- the conversion unit 218 calculates the normalized power spectrum and the normalized phase spectrum by subjecting the normalized ECTF to discrete Fourier transform or the like.
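Steps S4 and S5 can be sketched together: scale the time-domain ECTF by the coefficient, then take a discrete Fourier transform and split it into power and phase. The function name, the dB representation of the power spectrum, and the small floor that avoids `log10(0)` are assumptions; a naive DFT is used so that the example needs no external library.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def normalized_spectra(ectf: Sequence[float], std: float) -> Tuple[List[float], List[float]]:
    """Return (power spectrum in dB, phase spectrum in radians) of std * ectf."""
    x = [std * v for v in ectf]  # S4: normalization in the time domain
    n = len(x)
    # S5: naive O(n^2) DFT; an FFT routine would normally be used.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    power_db = [10.0 * math.log10(max(abs(c) ** 2, 1e-20)) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return power_db, phase
```

Multiplying by a constant in the time domain shifts the whole dB power spectrum by `20 * log10(std)` and leaves the phase unchanged for a positive coefficient, so fixed dB thresholds become meaningful only after this step.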
- the dip correction unit 219 interpolates the dip of the normalized power spectrum using different thresholds for each frequency band (S6). For example, the dip correction unit 219 interpolates a portion where the power value of the normalized power spectrum is lower than the first threshold TH1 in the low frequency band. The dip correction unit 219 interpolates a portion where the power value of the normalized power spectrum falls below the second threshold TH2 in the high frequency band. Thereby, the dip of the normalized power spectrum can be corrected so as to have the respective threshold values for each band. Thereby, the corrected power spectrum can be obtained.
- the filter generation unit 220 calculates time domain data using the corrected power spectrum (S7).
- the filter generation unit 220 calculates the inverse characteristic of the corrected power spectrum.
- the inverse characteristic is data that cancels the headphone characteristic based on the corrected power spectrum. Then, the filter generation unit 220 calculates time domain data by performing inverse FFT on the inverse characteristic and the normalized phase spectrum obtained in S5.
- the filter generation unit 220 calculates an inverse filter by cutting out the time domain data with a predetermined filter length (S8).
- the filter generation unit 220 outputs the inverse filters Linv and Rinv to the out-of-head localization processing device 100.
- the out-of-head localization processing device 100 reproduces the reproduction signal subjected to the out-of-head localization using the inverse filters Linv and Rinv. As a result, the user U can listen to the reproduction signal that has been appropriately subjected to the out-of-head localization process.
- although the processing device 201 generates the inverse filters Linv and Rinv in the above embodiment, the processing device 201 is not limited to a device that generates the inverse filters Linv and Rinv.
- the processing device 201 is suitable when it is necessary to appropriately normalize the collected sound signal.
- Non-transitory computer-readable media include various types of tangible storage media.
- Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)).
- the program may also be supplied to the computer by various types of transitory computer-readable media.
- transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves.
- the transitory computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire and an optical fiber, or a wireless communication path.
- the present disclosure can be applied to a processing device that processes a sound pickup signal.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Stereophonic System (AREA)
- Circuit For Audible Band Transducer (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201980090944.3A CN113412630B (zh) | 2019-02-14 | 2019-12-24 | Processing device, processing method, reproducing method, and program |
| EP19914812.3A EP3926977B1 (en) | 2019-02-14 | 2019-12-24 | Processing device, processing method, reproducing method, and program |
| US17/400,672 US11997468B2 (en) | 2019-02-14 | 2021-08-12 | Processing device, processing method, reproducing method, and program |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019-024336 | 2019-02-14 | ||
| JP2019024336A JP7115353B2 (ja) | 2019-02-14 | 2019-02-14 | Processing device, processing method, reproducing method, and program |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/400,672 Continuation US11997468B2 (en) | 2019-02-14 | 2021-08-12 | Processing device, processing method, reproducing method, and program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020166216A1 (ja) | 2020-08-20 |
Family
ID=72045256
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2019/050601 Ceased WO2020166216A1 (ja) | 2019-02-14 | 2019-12-24 | Processing device, processing method, reproducing method, and program |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11997468B2 |
| EP (1) | EP3926977B1 |
| JP (1) | JP7115353B2 |
| CN (1) | CN113412630B |
| WO (1) | WO2020166216A1 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240305950A1 (en) * | 2021-01-29 | 2024-09-12 | Sony Group Corporation | Information processing device, information processing method, and program |
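| JP7632163B2 (ja) | 2021-08-06 | 2025-02-19 | 株式会社Jvcケンウッド | Processing device and processing method |
| JP7750003B2 (ja) * | 2021-09-27 | 2025-10-07 | 株式会社Jvcケンウッド | Filter generation device, filter generation method, and program |
| JP7772229B2 (ja) * | 2022-07-28 | 2025-11-18 | Ntt株式会社 | Transfer characteristic correction device, transfer characteristic correction method, and program |
| WO2024024053A1 (ja) * | 2022-07-28 | 2024-02-01 | 日本電信電話株式会社 | Transfer characteristic correction device, transfer characteristic correction method, and program |
| KR102740590B1 (ko) * | 2022-12-15 | 2024-12-11 | 주식회사 지오드사운드 | Method for implementing stereophonic sound and system for implementing stereophonic sound using the same |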
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
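| JP2015032933A (ja) * | 2013-08-01 | 2015-02-16 | クラリオン株式会社 | Low-frequency complement device and low-frequency complement method |
| JP2015126268A (ja) | 2013-12-25 | 2015-07-06 | 株式会社Jvcケンウッド | Out-of-head sound image localization device, out-of-head sound image localization method, and program |
| JP2017060040A (ja) * | 2015-09-17 | 2017-03-23 | 株式会社Jvcケンウッド | Out-of-head localization processing device and out-of-head localization processing method |
| JP2019024336A (ja) | 2017-07-26 | 2019-02-21 | 日清製粉株式会社 | Method for producing bread |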
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB9417185D0 (en) | 1994-08-25 | 1994-10-12 | Adaptive Audio Ltd | Sounds recording and reproduction systems |
| US5974387A (en) * | 1996-06-19 | 1999-10-26 | Yamaha Corporation | Audio recompression from higher rates for karaoke, video games, and other applications |
| JP4274614B2 (ja) * | 1999-03-09 | 2009-06-10 | パナソニック株式会社 | Audio signal decoding method |
| JP2003280691A (ja) * | 2002-03-19 | 2003-10-02 | Sanyo Electric Co Ltd | Voice processing method and voice processing device |
| JP5792994B2 (ja) * | 2011-05-18 | 2015-10-14 | 日本放送協会 | Voice comparison device and voice comparison program |
| CN104041054A (zh) * | 2012-01-17 | 2014-09-10 | 索尼公司 | Encoding device and encoding method, decoding device and decoding method, and program |
| WO2016133988A1 (en) * | 2015-02-19 | 2016-08-25 | Dolby Laboratories Licensing Corporation | Loudspeaker-room equalization with perceptual correction of spectral dips |
| CN106878866B (zh) * | 2017-03-03 | 2020-01-10 | Oppo广东移动通信有限公司 | Audio signal processing method, device, and terminal |
2019
- 2019-02-14 JP JP2019024336A patent/JP7115353B2/ja active Active
- 2019-12-24 EP EP19914812.3A patent/EP3926977B1/en active Active
- 2019-12-24 WO PCT/JP2019/050601 patent/WO2020166216A1/ja not_active Ceased
- 2019-12-24 CN CN201980090944.3A patent/CN113412630B/zh active Active
2021
- 2021-08-12 US US17/400,672 patent/US11997468B2/en active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3926977A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3926977B1 (en) | 2026-03-25 |
| EP3926977A4 (en) | 2022-04-13 |
| US20210377684A1 (en) | 2021-12-02 |
| US11997468B2 (en) | 2024-05-28 |
| JP7115353B2 (ja) | 2022-08-09 |
| EP3926977A1 (en) | 2021-12-22 |
| JP2020136752A (ja) | 2020-08-31 |
| CN113412630B (zh) | 2024-03-08 |
| CN113412630A (zh) | 2021-09-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11997468B2 (en) | | Processing device, processing method, reproducing method, and program |
| US11115743B2 (en) | | Signal processing device, signal processing method, and program |
| JP6866679B2 (ja) | | Out-of-head localization processing device, out-of-head localization processing method, and out-of-head localization processing program |
| US10687144B2 (en) | | Filter generation device and filter generation method |
| US12137318B2 (en) | | Processing device and processing method |
| JP6981330B2 (ja) | | Out-of-head localization processing device, out-of-head localization processing method, and program |
| JP7639607B2 (ja) | | Processing device and processing method |
| JP6805879B2 (ja) | | Filter generation device, filter generation method, and program |
| US12192742B2 (en) | | Filter generation device and filter generation method |
| JP7755780B2 (ja) | | Filter generation device, filter generation method, and program |
| JP7750003B2 (ja) | | Filter generation device, filter generation method, and program |
| US12170884B2 (en) | | Processing device and processing method |
| JP7677052B2 (ja) | | Processing device and processing method |
| JP7439502B2 (ja) | | Processing device, processing method, filter generation method, reproduction method, and program |
| JP2024125727A (ja) | | Clustering device and clustering method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19914812; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2019914812; Country of ref document: EP; Effective date: 20210914 |
| | WWG | Wipo information: grant in national office | Ref document number: 2019914812; Country of ref document: EP |