WO2022145154A1 - Stereophonic processing device, stereophonic processing method, and stereophonic processing program - Google Patents

Stereophonic processing device, stereophonic processing method, and stereophonic processing program

Info

Publication number
WO2022145154A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing
signal
stereophonic
processing unit
frequency band
Application number
PCT/JP2021/043354
Other languages
French (fr)
Japanese (ja)
Inventor
宏 岩村
興一郎 落合
Original Assignee
オーディージョンサウンドラボ合同会社
Application filed by オーディージョンサウンドラボ合同会社
Publication of WO2022145154A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems

Definitions

  • the present invention relates to a stereophonic processing device that localizes the reproduced sound for earphones out of the head, a processing method thereof, and a program thereof.
  • the sound emitted from one of multiple speakers is divided into direct sound that reaches the ear directly and reflected sound that is reflected by the wall or floor and reaches the ear.
  • the direct sound reaches the eardrum in a straight line, and also reaches the eardrum under the influence of diffraction and reflection around the head and of reflection and resonance in the pinna and the ear canal.
  • a head-related transfer function (HRTF) that expresses the change in the physical characteristics of the incident sound due to the periphery of the head in the frequency domain is added to the direct sound.
  • the head related transfer function represented in the graph by the relative amplitude [dB] (vertical axis) with respect to the frequency [kHz] (horizontal axis) is the relative amplitude [dB] (vertical axis) with respect to the time region [ms] (horizontal axis).
  • the relative amplitude is expressed by multiplying the logarithm of the amplitude ratio of the received signal to the sound source signal by 20. Head-related transfer functions differ between the left and right ears.
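  • As a minimal sketch of the relative amplitude defined above, the following converts an amplitude ratio to decibels using the common (base-10) logarithm. The function name `relative_amplitude_db` and its argument names are illustrative assumptions, not terms from the publication.

```python
import math


def relative_amplitude_db(received_amplitude, source_amplitude):
    """Return 20 * log10(received / source), the relative amplitude in dB.

    Hypothetical helper for illustration; both amplitudes must be positive.
    """
    if received_amplitude <= 0 or source_amplitude <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(received_amplitude / source_amplitude)


# A received signal with half the source amplitude is about 6 dB down.
print(round(relative_amplitude_db(0.5, 1.0), 2))
```

Because the left and right ears receive different signals, the same conversion would be applied separately to each ear's measurement.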
  • In order to obtain the left and right head-related transfer functions HRTF_L, HRTF_R, head-related transfer responses are first measured in the free sound field of an anechoic chamber, from the sound source to microphones attached to both ears of a dummy head (pseudo head) or a human head.
  • Left and right head-related transfer responses g L , g R are divided by the blank head-related transfer response g C from the same sound source to the microphone at the position corresponding to the head center without the head, and the left and right head-related transfer responses (g L / g C) .
  • the sound from the sound output part such as headphones reaches the eardrum almost without being affected by the shape around the head, unlike the sound from the speaker.
  • the head related transfer function is not taken into account.
  • the brain of the headphone wearer cannot recognize the stereophonic effect of sound (3D sound, 3D sound image, sound image localization, out-of-head localization), and obtains only a narrow sound image confined to the inside of the head between the left and right ears (in-head localization).
  • a narrow sound image not only does not produce a three-dimensional sound, but also causes discomfort to the headphone wearer and causes brain fatigue due to long-term listening.
  • a sound source including a head-related transfer function is recorded with a microphone attached to both human ears or a dummy head, and the sound source signal is played back with headphones.
  • a binaural reproduction method that takes into account the characteristics is known.
  • the recognition of stereophonic sound varies greatly among individuals, and in the binaural reproduction method only the person who wore the microphones, or a person whose head-related transfer function is close to that of the dummy head, correctly recognizes the stereophonic sound image.
  • the person who wore the microphones can obtain the effect of a stereophonic image by this reproduction method, but even a dummy head that attempts standardization and averaging cannot eliminate the harmful effects of individual differences in the head-related transfer function.
  • a method of convolving a head-related transfer function measured in advance or measured at the time of reproduction into a sound source signal and correcting the sound source signal to form a stereophonic sound image is known.
  • This is a method of personalizing, that is, measuring a head-related transfer function peculiar to the listener and correcting the sound source signal at the time of reproduction, contrary to the above-mentioned standardization and averaging attempt by a dummy head or the like.
  • the measurement of the head-related transfer function needs to be performed with high accuracy by a special device in an anechoic chamber, which is not realistic for general listeners.
  • There is also a simple measurement method for the head-related transfer function but due to lack of accuracy, the head-related transfer function unique to each person cannot be obtained with high accuracy, and the three-dimensional effect cannot be sufficiently obtained.
  • the head-related transfer function differs not only between individuals but also with the position of the sound source, that is, its angle and distance, and its frequency characteristics are extremely complicated. Therefore, when the head-related transfer function is convolved into the sound source signal for correction, the amount of calculation is extremely large, which lowers the processing speed and increases the size of the device.
  • it is necessary to reproduce, by arithmetic processing and with high accuracy, a reproduced sound that matches the characteristics peculiar to the headphone wearer. Otherwise the sound image cannot be formed, the sound quality changes, and in-head localization occurs, so the capacities of the arithmetic processing device and the storage device must inevitably be increased.
  • Patent Document 1 discloses a three-dimensional sound reproduction device that estimates the center frequency and bandwidth of the first peak from the measured dimensions of a specific part of the listener's ear, further estimates the center frequency and bandwidth of the first notch, obtains each transfer function, convolves it into a sound source signal, and generates a personally adapted binaural signal. However, it is difficult to measure each dimension of the auricle accurately, and although the low-precision measurement may reduce the amount of calculation, the device of Patent Document 1 cannot generate a personalized stereophonic signal.
  • in an audio signal convolved with a general head-related transfer function, the head-related transfer function information in the high frequency band is lost or not stored by the compression process, and the stereophonic effect is reduced. That is, the delay of the reflected sound in the high frequency band due to the head-related transfer function is as short as several hundred μs (on the order of 0.0001 seconds) from the direct sound, so compression processing reduces or deletes it as a sound that is difficult to recognize from a psychoacoustic point of view. To prevent this, the high band component of the audio signal can be amplified and corrected before compression, but then the sound quality changes.
  • Patent Document 2 includes a reflected sound synthesis processing unit that synthesizes a reflected sound component with an input audio signal, and an out-of-head localization processing unit that filters the output of the reflected sound component and outputs the left and right audio signals.
  • coefficients based on measurement of the head-related transfer function, including the floor-reflected sound in the sound source direction, are stored in each circuit of the reflected sound synthesis processing unit and the out-of-head localization processing unit, and an apparatus for performing virtual sound image localization processing is disclosed.
  • because a head-related transfer function that has not been suppressed or attenuated is convolved over the entire band of the audio signal, the information of the head-related transfer function in the high frequency band disappears from the audio signal when the audio signal is compressed, and the stereophonic effect is significantly reduced.
  • the direct sound reaches the ear predominantly, and the ear on the opposite side is shielded by the head and the direct sound hardly reaches.
  • the head related transfer function is measured in a free sound field (anechoic chamber) for accurate measurement.
  • the head-related transfer function on the side opposite the sound source cannot be measured with sufficient accuracy and S/N ratio. If that head-related transfer function is used as it is for reproduction in a real environment with reflections, accuracy deteriorates and human recognition of the sound source space does not work properly.
  • an object of the present invention is to provide a stereophonic processing device for earphones or headphones that generates high-quality stereophonic sound with a small amount of calculation and storage capacity, and a processing method thereof. Another object of the present invention is to provide a stereophonic sound processing device and a processing method thereof for generating a stereophonic sound with little recognition difference between listeners and almost no change in sound quality with respect to a sound source. Another object of the present invention is to provide a stereophonic processing device for forming a stereophonic sound image by high-speed processing using a transfer function and a processing method thereof. Further, it is an object of the present invention to provide a stereophonic processing apparatus and a processing method thereof in which the stereophonic effect does not deteriorate with respect to the compression processing of the audio signal.
  • the stereophonic processing apparatus (10, 10', 10'') of the present invention generates a stereophonic sound reproduction signal from a sound source signal including a first frequency band (91) including the lowest region of the audible range and a second frequency band (92) higher than the first frequency band (91), and supplies the signal to the earphone (50) or the headphone.
  • the stereophonic processing unit (1) that performs out-of-head localization processing at least the first frequency band (91) of the input signal.
  • a reflection processing unit (2) connected to the stereophonic processing unit (1) and adding a reflected sound component to at least a part of the second frequency band (92) of the input signal.
  • stereophonic processing filters (11a-11d) are given, in advance or by selection, the coefficients of only one pair of left and right transfer functions. This single pair consists of left and right high-frequency suppression transfer functions obtained from the pair of head-related transfer functions from the sound source position to each of the left and right ears, at only one horizontal-plane azimuth angle from 45° to 90°, by suppressing or attenuating the band (92') corresponding to the second frequency band of the sound source signal.
  • because the transfer function of only one angle is used, and the high-frequency suppression transfer function (BRTF) in which the band (92') corresponding to the second frequency band is suppressed or attenuated is used, the sound source signal can be stereophonically processed with a small amount of calculation.
  • because the high-frequency suppression transfer function (BRTF) is obtained from the head-related transfer function (HRTF) at a horizontal-plane azimuth (θ) of 45° to 90°, individual differences in stereophonic sound recognition are small, and for all listeners wearing earphones (50) or headphones, high-quality stereophonic sound can be reproduced without in-head localization.
  • because the coefficients (h_B0 to h_Bn) of the high-frequency suppression transfer function (BRTF) are given to the stereophonic processing unit (1) in advance and are not updated during operation, no storage device or additional arithmetic unit is needed; the entire device can be made smaller and lighter, and can be built into earphones (50) or headphones.
  • when the coefficients (h_B0 to h_Bn) of the high-frequency suppression transfer function (BRTF) at one horizontal-plane azimuth (θ) can be selected, optimum reproduction of stereophonic sound can be realized by switching according to the type of earphone (50) or headphone, the type of sound, and the taste of the listener.
  • stereophonic processing apparatus (10''') of the present invention includes a first and second input terminals (6a, 6b) for inputting a sound source signal to the stereophonic processing apparatus (10'''). It is provided with a stereophonic processing unit (1) that performs out-of-head localization processing at least the first frequency band (91) of the input signals from the first and second input terminals (6a, 6b).
  • the stereophonic processing unit (1) includes first and second stereophonic processing filters (11a, 11b) connected to the first input end (6a), third and fourth stereophonic processing filters (11c, 11d) connected to the second input end (6b), and an M/S processor (51) to which the output signals of both the second and third stereophonic processing filters (11b, 11c) are input as left and right signals, which separates them into a center signal and a side signal, and which adds a reflected sound component to the side signal.
  • the other adder (16b) that adds the output signal of one of the M / S processor (51) and the output signal of the fourth stereoscopic processing filter (11d).
  • the one adder (16a) for adding the other output signal of the M / S processor (51) and the output signal of the first three-dimensional processing filter (11a) is provided.
  • the transfer function coefficients of only one pair of left and right are given in advance or by selection. This single pair consists of left and right high-frequency suppression transfer functions in which the band (92') corresponding to the second frequency band of the sound source signal is suppressed or attenuated relative to the pair of head-related transfer functions from the sound source position to the left and right ears at only one horizontal-plane azimuth angle from 45° to 90°.
  • by separating the center and side signals with the M/S processor (51) and adding reflected sound to the side signal, for example the side signal on the side opposite the sound source, it is possible to generate sound in which humans can easily recognize three-dimensional space.
  • the stereophonic processing method of the present invention generates a stereophonic sound reproduction signal from a sound source signal including a first frequency band (91) including the lowest region of the audible range and a second frequency band (92) higher than the first frequency band (91), and supplies the signal to the earphone (50) or headphones.
  • the process of inputting the processing signal of the reflection processing unit (2) and performing out-of-head localization processing of at least the first frequency band (91) of the input signal to generate the processing signal of the three-dimensional processing unit (1) is included.
  • the process of assigning the transfer function coefficients in advance includes a process of preparing a pair of left and right head-related transfer functions from the sound source position to each of the left and right ears at only one horizontal-plane azimuth angle from 45° to 90°, and a process of giving the coefficients in advance to the stereophonic processing filters (11a-11d).
  • a sound source signal is input to the process of generating and accumulating the coefficient of the transfer function and the three-dimensional processing filter (11a-11d) of the three-dimensional processing unit (1). It includes a process of inputting a processing signal of the reflection processing unit (2), performing out-of-head localization processing of at least the first frequency band (91) of the input signal, and generating a processing signal of the three-dimensional processing unit (1).
  • the process of generating and accumulating transfer function coefficients includes a process of preparing multiple pairs of left and right head-related transfer functions from each sound source position to each of the left and right ears at multiple horizontal-plane azimuth angles from 45° to 90°, and a process in which the coefficient generator (3) suppresses or attenuates the band (92') corresponding to the second frequency band of those multiple pairs to generate multiple pairs of left and right high-frequency suppression transfer functions.
  • the process of generating the processing signal of the stereophonic processing unit (1) includes a process of assigning, from among the coefficients of the multiple pairs of left and right high-frequency suppression transfer functions stored in the coefficient storage unit (4), the coefficients of only one pair to the stereophonic processing filters (11a-11d), and a process of passing the sound source signal or the processing signal of the reflection processing unit (2) through the stereophonic processing filters (11a-11d) to which those coefficients are assigned.
  • the stereophonic processing program of the present invention causes a computer to function as each part of the stereophonic processing apparatus.
  • the stereophonic processing device, processing method, and program of the present invention can realize stereophonic sound with high sound quality and natural sound quality with little influence of individual differences. Further, with a small amount of calculation and storage capacity, the stereophonic processing device of the present invention, which can be mounted as software on a personal computer, a smartphone, a game machine or the like, and which is downsized and lightweight, can be built in an earphone or a headphone. Further, in the present invention, even a wireless earphone or headphone that requires transmission of a compressed signal can realize a high-quality stereophonic effect without correction such as amplification.
  • a block diagram showing a stereophonic processing apparatus according to the first embodiment of the present invention.
  • a block diagram illustrating the configuration of the three-dimensional processing unit of FIG. A block diagram illustrating the configuration of the three-dimensional processing unit of FIG.
  • a block diagram illustrating the configuration of the reflection processing unit of FIG. A block diagram illustrating the configuration of the reflection processing unit of FIG.
  • a block diagram illustrating the configuration of the reflection processing unit of FIG. A block diagram illustrating the configuration of the reflection processing unit of FIG.
  • A schematic diagram showing how to obtain a head impulse response from a sound source. A flowchart for generating the high-frequency suppression transfer function from the head impulse response. Graphs showing the transfer functions of the left ear (a) and the right ear (b). Graphs showing the impulse responses of the left ear (a) and the right ear (b). Graphs showing audio signals processed by the prior art (a) and by the method of the present invention (b). Graphs showing the audio signals obtained by compressing each audio signal in FIG. 11.
  • a block diagram illustrating the configuration of the M / S delay unit of FIG. A block diagram showing a stereophonic processing apparatus according to a fourth embodiment of the present invention.
  • FIG. 1 shows a block diagram of a two-channel stereophonic processing device 10 that generates a stereophonic sound reproduction signal from a sound source signal stored in the sound source apparatus 30 and supplies the stereophonic sound reproduction signal to the earphone 50 or headphones.
  • the sound source signal includes a first frequency band 91 including the lowest region of the audible region and a second frequency band 92 higher than the first frequency band 91.
  • the human audible range is generally said to have a frequency range of 20 Hz to 20,000 Hz, the lowest of which is about 20 Hz.
  • the stereophonic processing device 10 of the present invention includes a stereophonic processing unit 1 that performs out-of-head localization processing on at least the first frequency band 91 of the input signal, and a reflection processing unit 2 that is connected to the stereophonic processing unit 1 and adds a reflected sound component to at least a part of the second frequency band 92, which includes the high region exceeding the first frequency band 91 of the input signal.
  • the three-dimensional processing unit 1 shown in FIG. 1 is connected to a sound source device 30 in which a sound source of digital or analog data is stored via the first and second input terminals 6a and 6b.
  • when the sound source is analog data, an analog-to-digital conversion device (not shown) is required between the sound source device 30 and the stereophonic processing device 10.
  • the stereophonic processing unit 1 includes stereophonic processing filters 11a-11d to which the coefficients of only one pair of left and right transfer functions are given in advance. The filters 11a-11d pass the input sound source signal and perform out-of-head localization processing on it by digital signal processing.
  • the block diagram of FIG. 2A, illustrating an embodiment of the stereophonic processing unit 1, includes a first stereophonic processing filter 11a and a second stereophonic processing filter 11b, each connected at one end to the first input end 6a of the stereophonic processing apparatus 10; a third stereophonic processing filter 11c and a fourth stereophonic processing filter 11d, each connected at one end to the second input end 6b of the stereophonic processing apparatus 10; an adder 16a connected to the other ends of the first and third stereophonic processing filters 11a and 11c; and an adder 16b connected to the other ends of the second and fourth stereophonic processing filters 11b and 11d. This is a so-called cross-coupled (tasuki-gake) structure.
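  • The cross-coupled topology described for FIG. 2A can be sketched as follows, under the assumption that each of the four filters is a plain FIR filter and that both inputs have the same length. The function names `convolve` and `cross_coupled` and the argument names are hypothetical; which coefficient set belongs to which filter is not specified here and is left to the caller.

```python
def convolve(signal, taps):
    """Causal FIR convolution, truncated to the length of the input signal."""
    return [
        sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(signal))
    ]


def cross_coupled(in_6a, in_6b, h_11a, h_11b, h_11c, h_11d):
    """Return the outputs of adders 16a and 16b for the four-filter structure.

    Filters 11a and 11b are fed from input 6a, filters 11c and 11d from
    input 6b. Adder 16a sums 11a and 11c; adder 16b sums 11b and 11d.
    """
    out_16a = [p + q for p, q in zip(convolve(in_6a, h_11a), convolve(in_6b, h_11c))]
    out_16b = [p + q for p, q in zip(convolve(in_6a, h_11b), convolve(in_6b, h_11d))]
    return out_16a, out_16b


# An impulse on input 6a only: it appears directly at 16a and, delayed and
# attenuated by the illustrative cross filter 11b, at 16b.
a, b = cross_coupled([1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                     [1.0], [0.0, 0.5], [0.0, 0.5], [1.0])
print(a, b)
```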
  • the M / S processor 51 may be arranged between the second and third three-dimensional processing filters 11b and 11c and the two adders 16a and 16b.
  • the M/S processor 51 includes an M/S converter 52, reflection processing filters 54a and 54b, an L/R converter 53, and two adders 57a and 57b. Left and right signals are input to the L terminal and the R terminal of the M/S converter 52 from the branch points 55a and 55b between the second and third stereophonic processing filters 11b and 11c and the adders 16a and 16b; the M/S converter 52 separates them into a center signal and a side signal and outputs these from its M terminal and S terminal, respectively.
  • the reflection processing filters 54a and 54b receive the center signal and the side signal from the M/S converter 52, respectively, and add reflected sound. The L/R converter 53 receives the reflection-processed signals at its M terminal and S terminal, converts them back into left and right signals, and outputs the left and right feedback signals from its L terminal and R terminal. The two adders 57a and 57b output to the adders 16b and 16a.
  • the first to fourth three-dimensional processing filters 11a-11d are FIR filters (finite impulse response filters) or IIR filters (infinite impulse response filters).
  • the block diagram of FIG. 3 illustrates the configuration of the stereophonic processing filter 11a as a general FIR filter. It includes a delayer 12_1 and a multiplier 13_0 connected to the input end 6a, a multiplier 13_1 connected to the delayer 12_1, and an adder 14_1 that adds the output signals of the multipliers 13_0 and 13_1. The delayer 12_1, the multiplier 13_1 and the adder 14_1 form one tap, and a plurality of taps form a higher-order (n-th order) FIR filter.
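  • A general n-th order FIR filter of the kind described for FIG. 3 can be sketched as a chain of delayers, multipliers and adders computing y[n] = h[0]·x[n] + h[1]·x[n-1] + ... + h[n]·x[n-n]. The function name `fir_filter` and the sample coefficients are illustrative assumptions, not values from the publication.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k]."""
    delay_line = [0.0] * len(coefficients)
    output = []
    for x in samples:
        # Shift the newest sample into the chain of one-sample delayers.
        delay_line = [x] + delay_line[:-1]
        # Each tap multiplies its delayed sample by a coefficient; the
        # adders accumulate the products into one output sample.
        output.append(sum(h * d for h, d in zip(coefficients, delay_line)))
    return output


# Feeding a unit impulse returns the coefficients themselves.
print(fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.125]))
```

Because the coefficients are fixed in advance, the per-sample cost is one multiply and one add per tap, which is why a shorter or band-limited impulse response directly reduces the amount of calculation.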
  • the coefficients of the single pair of left and right transfer functions given in advance to the first to fourth stereophonic processing filters 11a-11d are obtained by arithmetically processing the pair of head impulse responses HRIR_L and HRIR_R, shown in FIG. 7, from the position of the sound source 71 to the left and right ears 70_L and 70_R at any one horizontal-plane azimuth (θ) from 45° to 90°. Since only one angle is used, the amount of calculation and the processing time can be reduced in the present invention.
  • the horizontal-plane azimuth (θ) of 45° to 90° covers both the left and right directions, that is, any angle from +45° to +90° and from −45° to −90°, when the front direction of the head 70 of the human or dummy head is 0°. In this angle range the influence of individual differences is small and a stereophonic sound image can be formed.
  • the horizontal-plane azimuth (θ) may be, for example, plus or minus 45°, 50°, 55°, 60°, 65°, 70°, 75°, 80°, 85° or 90°, but is not limited to these values as long as it is within the range of 45° to 90°.
  • the vertical angle (elevation angle) in which the individual difference in the head impulse response is large is set to 0 ° and is not considered in the present invention.
  • the pre-assigned transfer function coefficients are the coefficients of only one pair of left and right high-frequency suppression transfer functions BRTF_L, BRTF_R. These are obtained by arithmetically processing a pair of left and right head-related impulse responses HRIR_L and HRIR_R, at one horizontal-plane azimuth angle from 45° to 90°, into a pair of head-related transfer functions HRTF_L and HRTF_R, and then suppressing or attenuating the band 92' corresponding to the second frequency band 92 of the sound source signal.
  • the first frequency band 91 and the corresponding band 91' of the transfer functions HRTF and BRTF are included in the range of 0 to 8 kHz, preferably 0.1 kHz to 8 kHz.
  • the second frequency band 92 and the corresponding band 92' of the transfer functions HRTF and BRTF are included in the range of 8 kHz to 96 kHz, usually 8 kHz to 20 kHz in the audible range, and preferably 8 kHz to 15 kHz.
  • for convenience of processing, the upper limit may be half the sampling frequency Fs (Fs/2); for example, 44.1 kHz/2 for a CD sound source, or 48 kHz/2, 96 kHz/2 or 192 kHz/2 for DVD, Blu-ray and the like.
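  • As a rough illustration of suppressing the band 92' of a head-related transfer function, the sketch below transforms an impulse response with a naive DFT, scales every bin above a cutoff, and transforms back. This is a simplified brick-wall approach chosen for brevity and is an assumption, not the procedure of the publication's flowchart; the function names, the 8 kHz default cutoff and the `gain` parameter are likewise hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short responses."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def suppress_high_band(hrir, fs_hz, cutoff_hz=8000.0, gain=0.0):
    """Scale all frequency bins above cutoff_hz by gain and return real taps."""
    n = len(hrir)
    spectrum = dft(hrir)
    for k in range(n):
        # Bin k and its mirror n - k carry the same physical frequency, so
        # both are scaled together to keep the result real-valued.
        freq_hz = min(k, n - k) * fs_hz / n
        if freq_hz > cutoff_hz:
            spectrum[k] *= gain
    return [value.real for value in idft(spectrum)]


# An 8-tap unit impulse at Fs = 48 kHz keeps only the 0 Hz and 6 kHz bins.
taps = suppress_high_band([1.0, 0, 0, 0, 0, 0, 0, 0], 48000.0)
print([round(t, 3) for t in taps])
```

Setting `gain` to a value between 0 and 1 attenuates the band instead of removing it, matching the "suppressed or attenuated" wording.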
  • in the stereophonic processing device 10 shown in FIG. 1, one side is connected to the stereophonic processing unit 1 and the other side is connected to the digital-to-analog (D/A) conversion device 40 via the output terminals 7a and 7b.
  • the D/A conversion device 40 is connected to the earphone 50 by a wired or wireless transmission line 41, and transmits the stereophonic sound reproduction signal to the earphone 50.
  • alternatively, without providing the D/A conversion device on the stereophonic processing device 10 side, the digital signal may be wirelessly transmitted as it is, for example by Bluetooth, from a transmitter (indicated by reference numeral 40 in this case) connected to the output terminals 7a and 7b, and converted into an analog signal by a D/A conversion device (not shown) added to or built into the earphone or the like.
  • FIG. 4A is a block diagram illustrating the configuration of the two-channel reflection processing unit 2.
  • the reflection processing unit 2 includes a first reflection processing filter 21a, which is connected to a first branch point 25a between the first input end 6a (FIG. 1) of the stereophonic processing device 10 and the earphone 50 and receives the signal there.
  • a first adder 27a is provided at a first addition point 26a between the first input end 6a of the stereophonic processing device 10 and the first branch point 25a, and adds the processing signal of the stereophonic processing unit 1 and the feedback signal from the first reflection processing filter 21a.
  • the reflection processing unit 2 likewise includes a second reflection processing filter 21b, which is connected to a second branch point 25b on the second input end 6b side and receives the signal there.
  • a second adder 27b is provided at a second addition point 26b between the second input end 6b of the stereophonic processing device 10 and the second branch point 25b, and adds the processing signal of the stereophonic processing unit 1 and the feedback signal from the second reflection processing filter 21b.
  • the feedback signal from the first reflection processing filter 21a is input to either or both of the second adder 27b and the first adder 27a by the switches 17a and 17b.
  • the feedback signal from the second reflection processing filter 21b may be input to either or both of the first adder 27a and the second adder 27b.
  • an M/S converter 28 is provided between the first and second branch points 25a and 25b and the first and second reflection processing filters 21a and 21b, so that the stereo left and right signals L and R can be separated into center and side signals M and S and processed individually. For the separation, a Fourier transform is performed and an arctan value is evaluated.
  • the coefficient of the low frequency portion (preferably about 200 Hz) may be used as the center signal component regardless of the arctan value. Only the center signal component may be extracted and subtracted from the original signal to be used as the side signal component.
  • an L/R converter 29 is provided between the first and second reflection processing filters 21a and 21b and the first and second adders 27a and 27b, and converts the center and side signals M and S back into stereo left and right signals.
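  • For orientation, the round trip between left/right and center/side signals can be sketched with the conventional sum-and-difference matrix, M = (L + R)/2 and S = (L − R)/2. This matrix is an assumption made for illustration; the publication also describes a Fourier-transform-based separation, which is not reproduced here. The function names `lr_to_ms` and `ms_to_lr` are hypothetical.

```python
def lr_to_ms(left, right):
    """Split left/right samples into center (mid) and side components."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Recombine center and side components into left/right samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Round trip: converting to M/S and back recovers the original channels.
mid, side = lr_to_ms([1.0, 0.5], [0.25, -0.5])
print(ms_to_lr(mid, side))
```

With this split, a reflection filter applied only to `side` leaves the center component untouched before the signals are recombined.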
  • FIG. 4C shows first and third reflection processing filters 21a and 21c, each connected at one end to the M terminal of the M/S converter 28; second and fourth reflection processing filters 21b and 21d, each connected at one end to the S terminal of the M/S converter 28; an adder 19a that adds the outputs of the first and third reflection processing filters 21a and 21c and is connected to the M terminal of the L/R converter 29; and an adder 19b that adds the outputs of the second and fourth reflection processing filters 21b and 21d and is connected to the S terminal of the L/R converter 29.
  • switching switches 18a and 18b are provided between the third and fourth reflection processing filters 21c and 21d and the adders 19a and 19b, and two output signals from the third reflection processing filter 21c are added. It can be switched and transmitted to either or both of the devices 19a and 19b, and the output signal from the fourth reflection processing filter 21d can be switched and transmitted to either or both of the two adders 19b and 19a.
  • FIG. 5 illustrates the configuration of the first reflection processing filter 21a of the reflection processing unit 2, which includes a delayer 22 that is connected to the first branch point 25a (FIGS. 4A to 4D) and receives a signal, and a multiplier 23 that amplifies the output signal of the delayer 22.
  • FIG. 6 is a block diagram showing an IIR filter, which is an embodiment of the frequency filter 24 of FIG.
  • The frequency filter 24 is provided with a feedforward unit including a multiplier that multiplies the input signal X_24 by a coefficient b_0, a delayer 62_1 that delays the input signal X_24, a multiplier that multiplies the signal from the delayer 62_1 by a coefficient b_1, an adder 64_1 that adds signals, and an adder 64_2 that adds the signal from the adder 64_1 and the signal from a multiplier 63_2.
  • The frequency filter 24 is also provided with a feedback unit including a delayer 62_3 that receives part of the output signal Y_24 generated from the signal of the adder 64_2 through the adders 64_3 and 64_4, a multiplier 63_3 that multiplies the signal from the delayer 62_3 by the coefficient a_1, a delayer 62_4 that delays the signal from the delayer 62_3, a multiplier 63_4 that multiplies the signal from the delayer 62_4 by the coefficient a_2, an adder 64_3 that adds the signal from the adder 64_2 and the signal from the multiplier 63_3, and an adder 64_4 that adds the signal from the adder 64_3 and the signal from the multiplier 63_4.
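The feedforward and feedback structure described here corresponds to a second-order (biquad) difference equation. A minimal sketch in Python follows, assuming the common sign convention in which the feedback terms are subtracted and the coefficients are already normalized. The function name and the sample coefficient values are illustrative only, and FIG. 6 may fold the signs into the coefficients differently.

```python
def biquad(x, b0, b1, b2, a1, a2):
    """Direct-form I biquad (assumed sign convention):
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    x1 = x2 = 0.0  # delayed inputs  x[n-1], x[n-2]
    y1 = y2 = 0.0  # delayed outputs y[n-1], y[n-2]
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


# Placeholder coefficients chosen so the arithmetic is easy to follow.
impulse = [1.0, 0.0, 0.0, 0.0]
h = biquad(impulse, b0=0.5, b1=0.25, b2=0.0, a1=-0.5, a2=0.0)
```

Feeding a unit impulse returns the first samples of the filter's impulse response, which, unlike an FIR filter, continues indefinitely because of the feedback path.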
  • the reflected sound includes an initial reflected sound and an additional reflected sound.
  • The initial reflected sound (early reflection) is a reflected sound that arrives shortly after the direct sound arriving straight from the sound source, and refers to sound that has been reflected only once by an object.
  • The additional reflected sound is a reflected sound that has undergone two or more reflections, and is also referred to as late reverberation, simply reverberation, or reverb.
  • the coefficient of the transfer function is given in advance to the three-dimensional processing filter 11a-11d of the three-dimensional processing unit 1.
  • This includes the process of preparing, in an anechoic chamber, a pair of left and right head-related transfer functions HRTF_L, HRTF_R from the sound source position to each of the left and right ears at only one angle within the horizontal plane azimuth range of 45° to 90°. For example, in FIG.
  • The left and right impulse responses g_L and g_R are recorded, the blank impulse response g_C is recorded by a microphone (not shown) placed at the position 70C corresponding to the center of the head, with the head 70 absent, and the left and right impulse responses g_L and g_R are divided by g_C to obtain the left and right head impulse responses HRIR_L and HRIR_R.
  • FIG. 8 shows a flowchart (90) for generating an impulse response (high frequency suppression impulse response BRIR L , BRIR R ) of a high frequency suppression transfer function from the left and right head impulse responses HRIR L , HRIR R (80).
  • The left and right head impulse responses HRIR_L and HRIR_R are processed by the Fourier transform $\mathcal{F}$ (s81), and the common logarithm of the result is multiplied by 20 to convert it into a dB value (s82), giving a pair of left and right head-related transfer functions: $\mathcal{F}\,\mathrm{HRIR}_L = \mathrm{HRTF}_L$, $\mathcal{F}\,\mathrm{HRIR}_R = \mathrm{HRTF}_R$.
  • a pair of left and right head-related transfer functions HRTF L , HRTF R obtained from or corrected from a database of a research institution, a university, a general company, etc. may be used.
  • the horizontal axis is the frequency [kHz] and the vertical axis is the relative amplitude [dB].
  • The solid line in FIG. 9 (a) shows the head-related transfer function HRTF_L of the left ear 70L, and the solid line in FIG. 9 (b) shows the head-related transfer function HRTF_R of the right ear 70R, at a horizontal azimuth of 45°.
  • the sampling frequency Fs is 44.1 kHz.
  • The band 92' corresponding to the second frequency band 92 of the sound source signal, for example the component at frequencies exceeding 8 kHz, is suppressed or attenuated by a low-pass filter (s83). This produces a pair of left and right high-frequency suppression transfer functions BRTF_L, BRTF_R, which are shown by the broken lines in FIGS. 9 (a) and 9 (b).
  • As a result of the suppression or attenuation processing (s83) by the low-pass filter, fine irregularities disappear on the time axis and the waveform is smoothed. The smoothing may also be performed by, for example, a moving average method.
  • When the head-related transfer functions HRTF_L and HRTF_R before the high-frequency suppression processing (solid lines) are compared with the high-frequency suppression transfer functions BRTF_L and BRTF_R after the processing (broken lines), the solid and broken lines show almost the same values in the band 91' corresponding to the first frequency band from 0 to 8 kHz. In the band 92' corresponding to the second frequency band exceeding 8 kHz, on the other hand, the solid and broken lines show similar changes and behavior, but the high-frequency suppression transfer functions BRTF_L and BRTF_R show values attenuated by a few dB from the head-related transfer functions HRTF_L and HRTF_R.
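Steps s81 to s83 can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the embodiment: the impulse response values are made-up placeholders, a plain DFT stands in for the Fourier transform, and a fixed attenuation of 3 dB above the 8 kHz cutoff stands in for the low-pass filter or moving-average smoothing.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform of a real sequence (s81)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def magnitude_db(spectrum, floor=1e-12):
    """20 * log10 of the magnitude, floored to avoid log(0) (s82)."""
    return [20.0 * math.log10(max(abs(v), floor)) for v in spectrum]


def suppress_high_band(mag_db, fs, cutoff_hz, attenuation_db):
    """Attenuate every bin above cutoff_hz by a fixed amount (stand-in for s83)."""
    n = len(mag_db)
    out = []
    for m, value in enumerate(mag_db):
        freq = min(m, n - m) * fs / n  # bins above n/2 mirror the lower half
        out.append(value - attenuation_db if freq > cutoff_hz else value)
    return out


FS = 44100.0  # sampling frequency given in the text
hrir = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]  # placeholder, not measured
hrtf_db = magnitude_db(dft(hrir))
brtf_db = suppress_high_band(hrtf_db, FS, cutoff_hz=8000.0, attenuation_db=3.0)
```

Bins at or below 8 kHz are left unchanged, which mirrors the observation that the solid and broken curves coincide in the band 91'.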
  • FIG. 10 (a) shows the impulse response BRIR_L on the left ear 70L side, proximal to the sound source 71, and FIG. 10 (b) shows the impulse response BRIR_R on the right ear 70R side, distal to the sound source 71. Because of the difference in distance from the sound source 71, the amplitude in FIG. 10 (a) is larger overall than in FIG. 10 (b).
  • the value of the amplitude of the impulse response corresponds to the filter coefficient value in a frequency filter such as an FIR filter, and the number of samples corresponds to the number of taps (n + 1) of the FIR filter. If the number of taps (n + 1) is small, the amount of calculation by the FIR filter can be reduced. Therefore, in the present invention in which the high-frequency suppression impulse responses BRIR L and BRIR R are adopted, the effect of significantly reducing the amount of calculation can be obtained as compared with the prior art.
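The relation between impulse response samples, filter coefficients, and amount of calculation can be illustrated with a direct-form FIR filter. In the sketch below the coefficient values are arbitrary placeholders rather than a measured BRIR; each output sample costs at most one multiplication per tap, so halving the number of taps halves that bound.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def max_multiplications(num_samples, num_taps):
    """Upper bound on multiplications for num_samples output samples."""
    return num_samples * num_taps


h = [0.5, 0.3, 0.2]            # placeholder coefficients, taps (n + 1) = 3
x = [1.0, 0.0, 0.0, 0.0, 0.0]  # unit impulse
y = fir_filter(x, h)            # impulse response reproduces the coefficients
```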
  • The frequency response of Equation 1 is expressed by Equation 2 by substituting $e^{-j\omega T}$ for $z^{-1}$.
  • T is the time interval [seconds] between data
  • ω is the angular frequency [rad/sec] represented by 2πFc/Fs
  • Fs is the sampling frequency [Hz]
  • Fc is the cutoff frequency [Hz].
  • $H(j\omega) = h_{B0} + h_{B1}\,e^{-j\omega T} + \cdots + h_{Bn}\,e^{-jn\omega T}$ (Equation 2)
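Equation 2 can be evaluated numerically as a check. The sketch below assumes T = 1/Fs, so that the phase advance per tap at frequency f is 2πf/Fs; the two-tap averaging filter is a made-up example, not a coefficient set from the embodiment.

```python
import cmath
import math


def fir_frequency_response(h, freq_hz, fs):
    """Evaluate H(jw) = sum over k of h[k] * exp(-j*k*w*T), assuming T = 1/fs."""
    omega_t = 2.0 * math.pi * freq_hz / fs  # phase advance per tap [rad]
    return sum(hk * cmath.exp(-1j * k * omega_t) for k, hk in enumerate(h))


FS = 44100.0
h = [0.5, 0.5]  # placeholder two-tap averaging filter

gain_dc = abs(fir_frequency_response(h, 0.0, FS))
gain_nyquist = abs(fir_frequency_response(h, FS / 2.0, FS))
```

The averaging filter passes DC unchanged and cancels the Nyquist frequency, which is the simplest instance of the low-pass behavior used for the high-frequency suppression.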
  • The three-dimensional processing filter 11a of the three-dimensional processing unit 1 is given in advance the coefficients of only one pair of left and right transfer functions.
  • The high-frequency suppression impulse responses BRIR_L and BRIR_R are likewise given in advance as coefficients to the other three-dimensional processing filters 11b-11d.
  • When the sound source signal from the sound source device 30 is input through the first and second input terminals 6a and 6b to the three-dimensional processing unit 1, to which the coefficients of the transfer function have been given, the input signal is subjected to out-of-head localization processing and the processed signal of the three-dimensional processing unit 1 is obtained.
  • the entire first and second frequency bands 91 and 92 of the sound source signal are subjected to out-of-head localization processing, but in the high frequency suppression transfer function BRTF, the band 92'corresponding to the second frequency band is suppressed or attenuated.
  • the influence of the high frequency suppression transfer function BRTF on the second frequency band 92 is smaller than the influence on the first frequency band 91.
  • signal processing using the coefficient of the high-frequency suppression transfer function BRTF for the first frequency band 91 is effective for forming a stereophonic image with little individual difference.
  • The processed signal subjected to the out-of-head localization processing by the three-dimensional processing unit 1 is input to the reflection processing unit 2, where a reflected sound component is added to at least a part of the second frequency band 92, which includes the high band exceeding the first frequency band 91 of the input signal, to generate the processed signal of the reflection processing unit 2.
  • When the processed signal of the three-dimensional processing unit 1 passes through the reflection processing unit 2, an additional reflected sound component (reverberation, reverb) delayed from the direct sound by 0.001 to 10 seconds is added, together with the initial reflected sound, to the second frequency band 92, which is higher than the first frequency band 91 of the input signal.
  • The input signal introduced from the branch points 25a and 25b of FIGS. 4A to 4D to the reflection processing filters 21a and 21b of the reflection processing unit 2 passes through the delayer 22 shown in FIG. 5 and the multiplier 23, to which an arbitrary reflection processing gain coefficient is applied, and is then input to the frequency filter 24.
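A sketch of this signal path under simplifying assumptions: the frequency filter 24 is omitted (treated as a pass-through), a single reflection is added to a single channel, and the 1 ms delay and 0.5 gain are illustrative values, not settings from the embodiment.

```python
def reflection_path(x, delay_samples, gain):
    """Delay x by delay_samples (delayer 22) and scale it by gain (multiplier 23)."""
    delayed = [0.0] * delay_samples + list(x)
    return [gain * v for v in delayed[:len(x)]]


def add_reflection(x, delay_samples, gain):
    """Add the reflected component to the direct signal, sample by sample."""
    reflected = reflection_path(x, delay_samples, gain)
    return [direct + refl for direct, refl in zip(x, reflected)]


FS = 44100                          # sampling frequency [Hz]
delay_samples = round(0.001 * FS)   # a 1 ms delay expressed in samples
x = [1.0] + [0.0] * 99              # unit impulse standing in for the direct sound
out = add_reflection(x, delay_samples, gain=0.5)
```

The output contains the direct impulse at sample 0 and a half-amplitude copy at the delayed position; longer delays and several such paths would approximate the additional reflected sound.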
  • When the biquadratic (BiQuad) IIR filter shown in FIG. 6 is used as the frequency filter 24 and configured as a low shelf filter (LSF: a filter that amplifies components below the cutoff frequency by the amplification amount), its transfer function H(z) and its frequency response are represented by Equations 3 and 4.
  • $H(z) = A\,\dfrac{z^{-2} + (A^{1/2}/Q)\,z^{-1} + A}{A\,z^{-2} + (A^{1/2}/Q)\,z^{-1} + 1}$ (Equation 3)
  • $H(j\omega) = A\,\dfrac{e^{-j2\omega} + (A^{1/2}/Q)\,e^{-j\omega} + A}{A\,e^{-j2\omega} + (A^{1/2}/Q)\,e^{-j\omega} + 1}$ (Equation 4)
  • Q is the filter Q value (a value that affects the amplification curve around the cutoff frequency)
  • G is the amplification amount [dB] that determines the filter characteristics
  • ω is the angular frequency [rad/sec] expressed by 2πFc/Fs.
  • Fs indicates the sampling frequency [Hz]
  • Fc indicates the cutoff frequency [Hz]. For example, if Fc is set to 8 kHz, the band 92'corresponding to the second frequency band is in the range of 8 kHz or more.
  • A is represented by $10^{G/40}$ and α is represented by $\sin\omega / (2Q)$.
  • In an FIR filter the impulse response is equal to the coefficients, but the coefficients $a_1$, $a_2$, $b_0$, $b_1$, $b_2$ of the LSF in the IIR filter 24 of FIG. 6 are determined by the following Equations 5 to 9 in consideration of the frequency characteristics.
  • $a_1 = -2\left((A-1) + (A+1)\cos\omega\right)$ (Equation 5)
  • $a_2 = (A+1) + (A-1)\cos\omega - 2\alpha A^{1/2}$ (Equation 6)
  • $b_0 = A\left((A+1) - (A-1)\cos\omega + 2\alpha A^{1/2}\right)$ (Equation 7)
  • $b_1 = 2A\left((A-1) - (A+1)\cos\omega\right)$ (Equation 8)
  • $b_2 = A\left((A+1) - (A-1)\cos\omega - 2\alpha A^{1/2}\right)$ (Equation 9)
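Equations 5 to 9 can be evaluated directly. The sketch below additionally uses a normalizing coefficient a0 = (A + 1) + (A - 1) cos ω + 2α√A, which does not appear in the text; it is taken from the widely used Audio EQ Cookbook low-shelf formulas, with which Equations 5 to 9 coincide, and is an assumption here. The example values G = 6 dB and Q = 0.707 are likewise illustrative.

```python
import math


def low_shelf_coefficients(fc, fs, gain_db, q):
    """Return normalized ([b0, b1, b2], [1, a1, a2]) for a low shelf filter."""
    a = 10.0 ** (gain_db / 40.0)
    omega = 2.0 * math.pi * fc / fs
    alpha = math.sin(omega) / (2.0 * q)
    cos_w = math.cos(omega)
    sqrt_a = math.sqrt(a)

    b0 = a * ((a + 1) - (a - 1) * cos_w + 2 * alpha * sqrt_a)   # Equation 7
    b1 = 2 * a * ((a - 1) - (a + 1) * cos_w)                    # Equation 8
    b2 = a * ((a + 1) - (a - 1) * cos_w - 2 * alpha * sqrt_a)   # Equation 9
    a0 = (a + 1) + (a - 1) * cos_w + 2 * alpha * sqrt_a         # assumption, not in the text
    a1 = -2 * ((a - 1) + (a + 1) * cos_w)                       # Equation 5
    a2 = (a + 1) + (a - 1) * cos_w - 2 * alpha * sqrt_a         # Equation 6

    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]


b, a = low_shelf_coefficients(fc=8000.0, fs=44100.0, gain_db=6.0, q=0.707)

# Gain at DC (z = 1) and at the Nyquist frequency (z = -1), in dB.
dc_gain_db = 20.0 * math.log10(sum(b) / sum(a))
nyquist_gain_db = 20.0 * math.log10(
    (b[0] - b[1] + b[2]) / (a[0] - a[1] + a[2]))
```

The DC gain equals the amplification amount G and the Nyquist gain is 0 dB, which is the expected shape for a filter that boosts only the band below the cutoff frequency.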
  • The coefficients $a_1$, $a_2$, $b_0$, $b_1$, $b_2$ can be determined by arbitrarily setting the filter Q value, the amplification amount G, and the cutoff frequency Fc according to the earphone characteristics and the listener's preference. Further, instead of the LSF, other filter types such as low-pass, high-pass, band-pass, notch, high-shelf, peaking, and
  • FIG. 11 (a) schematically shows a three-dimensional sound reproduction signal 34 in which the coefficient of a conventional head-related transfer function HRTF at a horizontal plane azimuth angle of substantially 45° is applied to the sound source signal.
  • FIG. 11 (b) schematically shows the three-dimensional sound reproduction signal 35, in which the coefficient of the high-frequency suppression transfer function BRTF is applied to the same sound source signal by the three-dimensional processing unit 1 and the components of the initial reflected sound and the additional reflected sound are added by the reflection processing unit 2.
  • the sound source signal 33 (broken line) is shown in both figures.
  • The horizontal axis is frequency [kHz] and the vertical axis is relative amplitude [dB]; the data on the left ear 70L side when the sound source 71 is at substantially +45° (FIG. 7) are illustrated.
  • The three-dimensional sound reproduction signals 34 and 35 of FIGS. 11 (a) and 11 (b) show almost the same characteristics in the first frequency band 91 of substantially 8 kHz or less, but their characteristics differ in the second frequency band 92 of substantially more than 8 kHz.
  • The three-dimensional sound reproduction signal 35 generated by both the three-dimensional processing unit 1 and the reflection processing unit 2 is transmitted to the digital-to-analog conversion device 40 via the output terminals 7a and 7b and is then supplied to the earphone 50 or headphones. In this case, for example, when the three-dimensional sound reproduction signal 35 is transmitted to a wireless earphone, signal compression processing is required.
  • FIG. 12 (a) schematically shows an audio signal 34' obtained when the audio signal 34 (FIG. 11 (a)), to which the coefficient of the conventional head-related transfer function HRTF is applied, is compressed during transmission to, for example, a wireless earphone. The audio signal 35' obtained by compressing the audio signal 35 in the same way is likewise outlined.
  • In the audio signal 34' of the conventional technique, the sound image information in the high second frequency band 92, which was added by applying the coefficient of the head-related transfer function HRTF, that is, a short delay component of about 0.0001 seconds from the direct sound, is reduced or disappears.
  • HRTF head related transfer function
  • An additional reflected sound component delayed from the direct sound by 0.001 to 10 seconds is added, separately from the transfer function, together with the initial reflected sound component. The reflected sound component added to the second frequency band 92 therefore does not disappear, and the stereophonic effect is maintained.
  • FIG. 13 shows the stereophonic device 10' of the second embodiment according to the present invention.
  • a stereophonic sound reproduction signal is generated from a sound source signal including the first frequency band 91 including the lowest region of the audible region and the second frequency band 92 higher than the first frequency band 91 and supplied to the earphone 50 or the headphone.
  • The stereophonic processing device 10' is the same as the stereophonic device 10 of FIG. 1 in that it includes a stereophonic processing unit 1 that performs out-of-head localization processing on at least the first frequency band 91 of the input signal, and a reflection processing unit 2 that is connected to the stereophonic processing unit 1 and adds a reflected sound component to at least a part of the second frequency band 92 of the input signal.
  • The coefficient generation unit 3 takes a plurality of pairs of head-related transfer functions HRTF_L, HRTF_R, from each sound source position 71 to the left and right ears 70L and 70R at a plurality of angles within the horizontal plane azimuth range of 45° to 90°, and generates the coefficients of a plurality of pairs of left and right high-frequency suppression transfer functions BRTF_L and BRTF_R in which the band corresponding to the second frequency band 92 of the sound source signal is suppressed or attenuated.
  • the coefficient storage unit 4 accumulates the coefficients of a plurality of pairs of left and right high-frequency suppression transfer functions BRTF L and BRTF R.
  • The three-dimensional processing unit 1 is equipped with three-dimensional processing filters 11a-11d, which are given a selected pair of the coefficients of the high-frequency suppression transfer functions BRTF_L and BRTF_R stored in the coefficient storage unit 4 and through which the input signal is passed to undergo out-of-head localization processing.
  • Multiple pairs of left and right head-related transfer functions HRTF_L and HRTF_R, from each sound source 71 position to the left and right ears 70L and 70R at multiple angles within the horizontal plane azimuth range of 45° to 90°, are prepared.
  • the head-related transfer functions HRTF L and HRTF R are prepared by performing arithmetic processing by Fourier transform (s81) and multiplying the common logarithm by 20 to convert it to a dB value (s82). Processing is performed in the same manner as in the flowchart (FIG. 8) of the first embodiment, except that a plurality of pairs are prepared.
  • The coefficient generation unit 3 suppresses or attenuates the band 92' corresponding to the second frequency band (s83) and generates multiple pairs of left and right high-frequency suppression transfer functions BRTF_L and BRTF_R.
  • The logarithmic values are returned to amplitude values (s84), an inverse Fourier transform is performed (s85), and window function processing is performed (s86), whereby the high-frequency suppression impulse responses BRIR_L, BRIR_R are obtained (90).
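Steps s84 to s86 can be sketched as follows. The text does not state how phase is handled or which window function is used, so this sketch assumes the phase is supplied separately and uses a half-Hann fade-out as the window; both are assumptions, as are the function names.

```python
import cmath
import math


def db_to_amplitude(mag_db):
    """Return dB values to linear amplitude (s84)."""
    return [10.0 ** (v / 20.0) for v in mag_db]


def idft(spectrum):
    """Inverse DFT, keeping the real part of the result (s85)."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]


def half_hann_fade(n_taps):
    """Right half of a Hann window: 1 at the first tap, 0 at the last (assumed window)."""
    if n_taps == 1:
        return [1.0]
    return [0.5 * (1.0 + math.cos(math.pi * k / (n_taps - 1)))
            for k in range(n_taps)]


def to_impulse_response(mag_db, phase_rad, n_taps):
    """Rebuild a truncated, windowed impulse response (s84 to s86)."""
    amplitude = db_to_amplitude(mag_db)
    spectrum = [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase_rad)]
    response = idft(spectrum)[:n_taps]
    window = half_hann_fade(n_taps)
    return [r * w for r, w in zip(response, window)]


# Placeholder input: a flat 0 dB magnitude with zero phase over 8 bins.
brir = to_impulse_response([0.0] * 8, [0.0] * 8, n_taps=4)
```

A flat, zero-phase spectrum returns a unit impulse, which confirms the round trip; truncating to `n_taps` samples is where the reduction in the number of FIR taps would take effect.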
  • the coefficients h B0 to h Bn of a plurality of pairs of left and right high-frequency suppression transfer functions corresponding to the values of the high-frequency suppression impulse responses BRIR L and BRIR R are stored in the coefficient storage unit 4.
  • The coefficients of the high-frequency suppression transfer function are selected manually, automatically, or according to the program.
  • The coefficients h_B0 to h_Bn of the high-frequency suppression transfer function of only one left-right pair, at only one selected horizontal plane azimuth angle, are given.
  • The sound source signal is passed through the three-dimensional processing filters 11a-11d, to which the coefficients h_B0 to h_Bn of the high-frequency suppression transfer function of only one left-right pair are given, to generate the processed signal of the three-dimensional processing unit 1, in which at least the first frequency band 91 has been subjected to out-of-head localization processing.
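One way to picture the coefficient storage unit 4 and the selection of a single pair is a lookup table keyed by azimuth angle. In the sketch below the stored coefficient values are placeholders, and the nearest-angle rule is only one possible automatic selection; the text leaves the selection method open (manual, automatic, or programmed).

```python
# Hypothetical contents of the coefficient storage unit: one left/right
# coefficient pair per horizontal azimuth angle [degrees]. Placeholder values.
coefficient_store = {
    45: {"left": [0.9, 0.3, 0.1], "right": [0.5, 0.2, 0.1]},
    60: {"left": [0.9, 0.2, 0.1], "right": [0.4, 0.2, 0.1]},
    90: {"left": [1.0, 0.2, 0.0], "right": [0.3, 0.1, 0.1]},
}


def select_pair(store, requested_azimuth_deg):
    """Pick the single stored pair whose angle is nearest to the request."""
    nearest = min(store, key=lambda angle: abs(angle - requested_azimuth_deg))
    return nearest, store[nearest]


angle, pair = select_pair(coefficient_store, 50)
```

Only the selected pair is loaded into the filters, so the run-time cost is the same as in the single-pair configuration regardless of how many pairs are stored.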
  • The processed signal of the three-dimensional processing unit 1 is input to the reflection processing unit 2, and a reflected sound component is added to at least a part of the second frequency band 92, which includes the high band exceeding the first frequency band 91 of the input signal, to generate the processed signal of the reflection processing unit 2.
  • The three-dimensional sound reproduction signal generated by both the three-dimensional processing unit 1 and the reflection processing unit 2 is supplied to the earphone 50 or the headphones. As with the stereophonic device 10 (FIG. 1), the stereophonic device 10' (FIG. 13) and the processing method using it provide the excellent effect of forming high-quality stereophonic sound with a small amount of calculation and small individual differences.
  • the sound source signals input to the stereophonic devices 10 and 10' are processed in the order of the stereophonic processing unit 1 and the reflection processing unit 2.
  • the same effect can be obtained by reversing the order and processing in the order of the reflection processing unit 2 and the three-dimensional processing unit 1.
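For a single signal path in which both stages behave as linear time-invariant filters, the interchangeability of the processing order follows from the commutativity of convolution. The sketch below illustrates this with two made-up impulse responses; it is a simplified model and does not cover cross-channel paths or any nonlinear processing.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[i + k] += xv * hv
    return y


x = [1.0, 2.0, 3.0]            # placeholder input samples
h_stereo = [1.0, 0.5]          # stand-in for the out-of-head localization stage
h_reflect = [1.0, 0.0, 0.25]   # stand-in for the reflection stage (direct + echo)

stereo_then_reflect = convolve(convolve(x, h_stereo), h_reflect)
reflect_then_stereo = convolve(convolve(x, h_reflect), h_stereo)
```

Both orderings yield the same output sequence, here `[1.0, 2.5, 4.25, 2.125, 1.0, 0.375]`.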
  • the two-channel stereophonic device 10,10' is shown, but two or more channels may be used.
  • The entire first and second frequency bands 91 and 92 of the sound source signal are subjected to out-of-head localization processing by the stereoscopic processing unit 1, but out-of-head localization processing using the coefficients of the high-frequency suppression transfer function BRTF may instead be performed only on the first frequency band 91.
  • FIG. 14 shows a stereophonic device 10'' according to the third embodiment, which includes an M/S delay unit 93 connected to the reflection processing unit 2 at a position after the processing of both the stereophonic processing unit 1 and the reflection processing unit 2.
  • FIG. 15 shows an exemplary configuration of the M / S delay unit 93.
  • The M/S delay unit 93 includes an M/S converter 98 that receives the output signals from the reflection processing unit 2 at its L and R terminals and converts them into center and side signals, a reflection processing filter 21a that delays the center and side signals from the M/S converter 98, and an L/R converter that receives the delayed signals from the reflection processing filter 21a at its M and S terminals and converts them into left and right signals.
  • the reflection processing filter 21a has the same reference numerals as those shown in FIG.
  • the delay device 22 of the reflection processing filter 21a delays each signal separately or only to one of them, and then L / R conversion is performed again.
  • the center signal component preferably arctan (0.7) ⁇ arctan (
  • the center signal component may be used. Only the center signal component may be extracted and subtracted from the original signal to be used as the side signal component.
  • FIG. 16 shows a stereophonic device 10''' according to a fourth embodiment of the present invention, which does not include a reflection processing unit 2 and is provided with a three-dimensional processing unit 1 that includes an M/S processor 51 for separating a center signal and a side signal.
  • The three-dimensional processing unit 1 includes first and second three-dimensional processing filters 11a and 11b connected to the first input end 6a; third and fourth three-dimensional processing filters 11c and 11d connected to the second input end 6b; an M/S processor 51 that receives the output signals of the second and third three-dimensional processing filters 11b and 11c as left and right signals, separates them into a center signal and a side signal, and adds reflected sound to at least the side signal or only the side signal; one adder 16a that adds the other output signal of the M/S processor 51 (from the adder 57b of FIG. 2B) and the output signal of the first three-dimensional processing filter 11a; and the other adder 16b that adds one output signal of the M/S processor 51 (from the adder 57a of FIG. 2B) and the output signal of the fourth three-dimensional processing filter 11d.
  • the first to fourth three-dimensional processing filters 11a-11d are given the coefficients of only a pair of left and right transfer functions in advance or by selection.
  • The transfer function of only one left-right pair is a high-frequency suppression transfer function in which the band corresponding to the second frequency band of the sound source signal is suppressed or attenuated with respect to the pair of head-related transfer functions from the sound source position to the left and right ears at only one angle within the horizontal plane azimuth range of 45° to 90°.
  • The signal on the sound source side (the output signal from the first or fourth three-dimensional processing filter 11a or 11d) does not pass through the M/S processor 51, while the signal on the side opposite the sound source (the output signal from the second or third three-dimensional processing filter 11b or 11c) is passed through the M/S processor 51, where a reflected sound component is added to enhance human perception of spatial sound.
  • additional processing such as reflected sound may be performed only on the side signal component after conversion to the center and side signals.
  • The present invention may be a stereophonic processing program for operating a computer as each part of the stereophonic processing device 10, 10', 10'', 10'''.
  • the processing content of the function of each part may be.
  • the processing of each part can be realized on the computer.
  • The computer includes a storage unit that stores the stereophonic processing program and a control unit that is connected to the storage unit and controls the processing operation of the program.
  • The storage unit includes a computer-readable storage medium such as a magnetic storage device, an optical disk, a magneto-optical storage medium, or a semiconductor memory. Further, the first and second storage units are included.
  • the stereophonic processing method described in detail according to the embodiment may be executed by the stereophonic processing program stored in the storage unit.
  • the stereophonic processing device, processing method, and processing program of the present invention can be used not only for music sound sources but also for all stereo sound sources such as environmental sounds, conversation sounds, and game sounds.

Abstract

[Problem] To easily realize acoustic localization with a low amount of computation and storage capacity. [Solution] The present invention comprises: a stereo processing unit (1) that performs an out-of-head localization process on at least a first frequency band (91) of an input signal; and a reflection processing unit (2) that is connected to the stereo processing unit (1) and adds a reflected sound component to at least a portion of a second frequency band (92) of the input signal. The stereo processing unit (1) includes stereo processing filters (11a-11d) in which the coefficient of a transfer function for only a left-right pair is applied in advance or selectively, the transfer function of only the left-right pair being a high-frequency suppression transfer function for the left-right pair only, in which the band corresponding to the second frequency band (92) of a sound source signal is suppressed or attenuated for a pair of head-related transfer functions from the sound source position at only one angle within a 45-90° horizontal plane azimuth, to the left and right ears.

Description

Stereophonic processing device, stereophonic processing method, and stereophonic processing program
 The present invention relates to a stereophonic processing device that localizes the reproduced sound for earphones outside the head, a processing method thereof, and a program thereof.
 When sound is reproduced through speakers, the sound emitted from one of multiple speakers is divided into direct sound, which reaches the ear directly, and reflected sound, which reaches the ear after being reflected by walls or the floor. The direct sound reaches the eardrum along a straight path, and also reaches it under the influence of diffraction and reflection at the head and of reflection and resonance in the pinna and ear canal. The direct sound therefore incorporates the head-related transfer function (HRTF), which expresses in the frequency domain the change in the physical characteristics of incident sound caused by the region around the head.
 Normally, the head-related transfer function, graphed as relative amplitude [dB] (vertical axis) against frequency [kHz] (horizontal axis), forms a Fourier transform pair with the head-related impulse response (HRIR), graphed as relative amplitude [dB] (vertical axis) against time [ms] (horizontal axis). The relative amplitude is expressed as 20 times the logarithm of the amplitude ratio of the received signal to the sound source signal. The head-related transfer function differs between the left and right ears. To obtain the left and right head-related transfer functions HRTF_L and HRTF_R, first, in the free sound field of an anechoic chamber, the left and right impulse responses g_L and g_R from the sound source to microphones attached to both ears of a dummy head or a human head are divided by the blank impulse response g_C from the same sound source to a microphone at the position corresponding to the head center with no head present, giving the left and right head-related impulse responses (g_L/g_C = HRIR_L, g_R/g_C = HRIR_R). Next, the left and right head-related impulse responses HRIR_L and HRIR_R are each Fourier transformed (F) to obtain the left and right head-related transfer functions (F HRIR_L = HRTF_L, F HRIR_R = HRTF_R).
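The two relations in this paragraph, the relative amplitude in dB and the division of the ear responses by the blank response, can be sketched numerically. The division is carried out bin by bin in the frequency domain here, which is an assumption about how the quotient g_L/g_C is evaluated; the sample values are placeholders, not measurements.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping the real part of the result."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]


def relative_amplitude_db(received, source):
    """20 * log10 of the amplitude ratio of received to source signal."""
    return 20.0 * math.log10(received / source)


def head_impulse_response(g_ear, g_blank):
    """Divide the ear response by the blank response, bin by bin (assumed method).

    Requires that no bin of the blank spectrum is zero.
    """
    ear_spectrum = dft(g_ear)
    blank_spectrum = dft(g_blank)
    hrtf = [e / b for e, b in zip(ear_spectrum, blank_spectrum)]
    return idft(hrtf), hrtf


g_c = [0.5, 0.0, 0.0, 0.0]   # placeholder blank response (no head present)
g_l = [0.4, 0.2, 0.0, 0.0]   # placeholder left-ear response
hrir_l, hrtf_l = head_impulse_response(g_l, g_c)
hrtf_l_db = [20.0 * math.log10(abs(v)) for v in hrtf_l]
```

Because the blank response here is a scaled impulse, the division simply rescales the ear response, which makes the result easy to verify by hand.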
 On the other hand, when sound is reproduced through earphones or earphones with a headband (headphones), the sound from the sound-emitting part of the headphones or the like, unlike sound from speakers, reaches the eardrum almost unaffected by the shape around the head, so no head-related transfer function is incorporated. As a result, the brain of the headphone wearer cannot perceive the spatial impression of the sound (stereophonic sound, stereophonic sound image, three-dimensional sound, sound image localization, out-of-head localization) and obtains only a narrow sound image confined inside the head between the left and right ears (in-head localization). A narrow sound image not only fails to provide stereophonic sound but is also uncomfortable for the headphone wearer, and prolonged listening causes brain fatigue.
 As a method of obtaining stereophonic sound in reproduction through earphones or headphones, the binaural reproduction method is known, which takes the characteristics of the shape around the head into account: a sound source including the head-related transfer function is recorded with microphones attached to both ears of a person or a dummy head, and the sound source signal is reproduced through headphones. However, the perception of stereophonic sound (the head-related transfer function) varies greatly between individuals, and with the binaural reproduction method, anyone whose head-related transfer function does not approximate that of the person or dummy head fitted with the microphones cannot correctly perceive the stereophonic sound image. In practice, only the person who wore the microphones obtains the stereophonic sound image effect with this method, and even a dummy head, which attempts standardization and averaging, cannot eliminate the adverse effect of individual differences in head-related transfer functions.
 As a method of obtaining stereophonic sound in reproduction through headphones or the like, a method is known in which a head-related transfer function, measured in advance or at the time of reproduction, is convolved with the sound source signal by computation, correcting the sound source signal to form a stereophonic sound image. Contrary to the standardization and averaging attempted with a dummy head or the like, this is personalization: the listener's own head-related transfer function is measured and the sound source signal is corrected at reproduction. However, head-related transfer functions must be measured with high accuracy using special equipment in an anechoic chamber, which is not realistic for general listeners. Simplified measurement methods also exist, but because of insufficient accuracy, each person's own head-related transfer function cannot be obtained precisely and a sufficient stereophonic effect is not obtained.
 The head-related transfer function differs not only between individuals but also with the position of the sound source, that is, its angle and distance, and its frequency characteristics are extremely complex. Convolving the head-related transfer function with the source signal therefore requires a very large amount of computation, which lowers processing speed and enlarges the device. To realize stereophonic sound with the head-related transfer function, the reproduced sound must be computed so as to match the headphone wearer's own characteristics with high accuracy. Even a slight deviation from those characteristics prevents accurate sound image formation and causes changes in sound quality and in-head localization, so the capacities of the arithmetic processing device and the storage device inevitably have to be increased.
 Furthermore, when the signals of multiple localized sound sources (multi-track, multi-object) are corrected, processing with a head-related transfer function is required for each localized source, which requires still more computation.
 To realize sound image localization with a small amount of computation, attempts have been made to render headphone playback three-dimensional by a simplified technique that uses in the computation only the head-related transfer function at representative, characteristic peak frequencies and at dip or notch (a particularly steep negative peak) frequencies for out-of-head localization. However, a listener whose peaks and notches do not coincide with these representative ones does not obtain a sufficient stereophonic image, so such a simplified technique cannot sufficiently eliminate the adverse effect of individual differences.
 Patent Document 1 discloses a stereophonic sound reproduction device that estimates the center frequency and bandwidth of the first peak, and further of the first notch, from measured dimensions of specific parts of the listener's pinna, obtains the respective transfer functions, convolves them with the sound source signal, and generates an individually adapted binaural signal. However, precise measurement of each pinna dimension is difficult, and with low-accuracy measurements the device of Patent Document 1, even if it reduces the amount of computation, cannot generate an individually adapted stereophonic audio signal.
 In addition, when an audio signal convolved with a general head-related transfer function is compressed, the head-related transfer function information in the high frequency band is lost or not stored by the compression, and the stereophonic effect is reduced. That is, the delay of the high-band reflected sound due to the head-related transfer function is short, on the order of several hundred μs (about 0.0001 s) after the direct sound, and the compression reduces or removes it as a sound that is psychoacoustically hard to perceive. To prevent this, the high-band components of the audio signal are amplified before compression, but this changes the sound quality.
 Patent Document 2 discloses a device for virtual sound image localization that includes a reflected sound synthesis unit, which synthesizes a reflected sound component with an input audio signal, and an out-of-head localization unit, which filters the output of the synthesis unit and outputs left and right audio signals. Coefficients based on measurement of a head-related transfer function that includes floor-reflected sound for a predetermined horizontal sound source direction are stored in the circuits of both units. However, in Patent Document 2 a head-related transfer function that has not been suppressed or attenuated is convolved over the entire band of the audio signal, so when the audio signal is compressed the head-related transfer function information in the high frequency band is lost from the signal and the stereophonic effect drops markedly.
 Furthermore, for a sound source located at 45° to 90° relative to the listener, the direct sound dominates at the ear on the source side, while at the opposite ear the direct sound is shadowed by the head and hardly arrives. Head-related transfer functions are generally measured in a free sound field (anechoic chamber) for accuracy. However, because a free sound field has no reflections, the head-related transfer function for the ear opposite the source cannot be measured with sufficient accuracy and S/N ratio. If it is used as measured, reproduction accuracy degrades in a real environment with reflections, and human spatial perception of the sound source does not function properly.
Patent Document 1: Japanese Unexamined Patent Publication No. 2017-85362
Patent Document 2: Japanese Unexamined Patent Publication No. 2009-105565
 Accordingly, an object of the present invention is to provide a stereophonic processing device for earphones or headphones that generates high-quality stereophonic sound with a small amount of computation and storage, and a processing method therefor. Another object is to provide a stereophonic processing device and processing method that generate stereophonic playback sound with little difference in perception between listeners and almost no change in sound quality relative to the sound source. Another object is to provide a stereophonic processing device and processing method that form a stereophonic sound image by high-speed processing using transfer functions. A further object is to provide a stereophonic processing device and processing method whose stereophonic effect is not degraded by compression of the audio signal.
 The stereophonic processing device (10, 10’, 10”) of the present invention generates a stereophonic reproduction signal from a sound source signal that contains a first frequency band (91) including the lowest part of the audible range and a second frequency band (92) higher than the first frequency band (91), and supplies it to an earphone (50) or headphones. The device comprises a three-dimensional processing unit (1) that applies out-of-head localization processing to at least the first frequency band (91) of its input signal, and a reflection processing unit (2) that is connected to the three-dimensional processing unit (1) and adds a reflected sound component to at least part of the second frequency band (92) of its input signal. The three-dimensional processing unit (1) includes three-dimensional processing filters (11a-11d) to which the coefficients of only one left-right pair of transfer functions are given in advance or by selection. This single pair is a pair of high-band-suppressed transfer functions, obtained from the pair of head-related transfer functions from a sound source position at only one horizontal-plane azimuth within 45° to 90° to the left and right ears, by suppressing or attenuating the band (92’) corresponding to the second frequency band of the sound source signal.
 In the present invention, a transfer function for only one angle is used, and it is a high-band-suppressed transfer function (BRTF) in which the band (92’) corresponding to the second frequency band has been suppressed or attenuated, so the sound source signal can be rendered three-dimensional with a small amount of computation and memory. In addition, because the BRTF is obtained from a head-related transfer function (HRTF) at a horizontal-plane azimuth (θ) of 45° to 90°, individual differences in stereophonic perception are small, and high-quality stereophonic sound without in-head localization can be reproduced for any listener wearing the earphone (50) or headphones. Furthermore, when the BRTF coefficients (h_B0 to h_Bn) are given to the three-dimensional processing unit (1) in advance and are not updated during operation, no storage device or additional arithmetic device is needed, so the whole device can be made small and light and can be built into the earphone (50) or headphones. Alternatively, because the BRTF coefficients (h_B0 to h_Bn) for one horizontal-plane azimuth (θ) can be selected, they can be switched according to the type of earphone (50) or headphones, the type of audio, or the listener's preference to achieve optimum stereophonic reproduction.
 Another embodiment of the stereophonic processing device (10’’’) of the present invention comprises first and second input ends (6a, 6b) through which the sound source signal is input to the device (10’’’), and a three-dimensional processing unit (1) that applies out-of-head localization processing to at least the first frequency band (91) of the input signals from the first and second input ends (6a, 6b). The three-dimensional processing unit (1) comprises first and second three-dimensional processing filters (11a, 11b) connected to the first input end (6a); third and fourth three-dimensional processing filters (11c, 11d) connected to the second input end (6b); an M/S processor (51) that receives the output signals of the second and third filters (11b, 11c) as left and right signals, separates them into a center signal and a side signal, and adds a reflected sound component to the side signal; another adder (16b) that adds one output signal of the M/S processor (51) and the output signal of the fourth filter (11d); and one adder (16a) that adds the other output signal of the M/S processor (51) and the output signal of the first filter (11a). The first to fourth three-dimensional processing filters (11a-11d) are given the coefficients of only one left-right pair of transfer functions, in advance or by selection. This single pair is a pair of high-band-suppressed transfer functions, obtained from the pair of head-related transfer functions from a sound source position at only one horizontal-plane azimuth within 45° to 90° to the left and right ears, by suppressing or attenuating the band (92’) corresponding to the second frequency band of the sound source signal. By separating the signals into center and side signals in the M/S processor (51) and adding reflected sound to the side signal, for example the side signal on the side opposite the sound source, a sound can be generated in which humans readily perceive three-dimensional spatial acoustics.
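The center/side separation and the addition of reflected sound to the side signal can be sketched as follows. This is an illustration under assumptions that the text does not state: the (L+R)/2 and (L−R)/2 scaling is one common M/S convention, and the reflection is modelled as a single delayed, attenuated copy of the signal; the function names are hypothetical.

```python
def ms_encode(left, right):
    """Split a left/right pair into center (mid) and side signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Rebuild left/right from center and side (inverse of ms_encode)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def add_reflection(x, delay_samples, gain):
    """Add one delayed, attenuated copy of x to itself (assumed reflection model)."""
    return [v + (gain * x[i - delay_samples] if i >= delay_samples else 0.0)
            for i, v in enumerate(x)]


left_in = [1.0, 0.5, -0.25]
right_in = [0.0, 0.5, 0.75]
mid, side = ms_encode(left_in, right_in)
round_trip = ms_decode(mid, side)

# Reflection applied to the side signal only, then converted back to left/right.
left_out, right_out = ms_decode(mid, add_reflection(side, delay_samples=1, gain=0.5))
```

Because the center signal is left untouched in this sketch, content common to both channels is unchanged and only the difference between the channels gains the added reflection.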
 The stereophonic processing method of the present invention generates a stereophonic reproduction signal from a sound source signal that contains a first frequency band (91) including the lowest part of the audible range and a second frequency band (92) higher than the first frequency band (91), and supplies it to an earphone (50) or headphones. The method includes a step of giving transfer function coefficients in advance to the three-dimensional processing filters (11a-11d) of the three-dimensional processing unit (1), and a step of inputting the sound source signal, or the processed signal of the reflection processing unit (2), to the filters (11a-11d) holding those coefficients and applying out-of-head localization processing to at least the first frequency band (91) of the input signal to generate the processed signal of the three-dimensional processing unit (1). The step of giving the coefficients in advance includes: preparing a left-right pair of head-related transfer functions from a sound source position at only one horizontal-plane azimuth within 45° to 90° to the left and right ears; suppressing or attenuating, in that pair, the band (92’) corresponding to the second frequency band to generate a left-right pair of high-band-suppressed transfer functions; computing the inverse Fourier transform of that pair to obtain a left-right pair of high-band-suppressed impulse responses; and giving the filters (11a-11d) in advance the coefficients of only that one pair of high-band-suppressed transfer functions, which correspond to the values of the high-band-suppressed impulse responses.
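The coefficient-generation steps listed above (transfer function, high-band suppression, inverse Fourier transform, filter coefficients) can be sketched as below. This is a rough illustration only: the brick-wall treatment with a single `gain` factor, the 8 kHz default cutoff taken as inclusive, and the function names are assumptions, and a naive DFT is used to keep the example dependency-free.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2), adequate for a short example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def high_band_suppressed_coeffs(hrir, fs, cutoff_hz=8000.0, gain=0.0):
    """HRIR -> HRTF -> attenuate bins at or above cutoff -> impulse response."""
    n = len(hrir)
    hrtf = dft(hrir)
    brtf = []
    for k, value in enumerate(hrtf):
        # Bins k and n-k carry the same physical frequency; treating them
        # alike keeps the spectrum conjugate-symmetric and the result real.
        freq = min(k, n - k) * fs / n
        brtf.append(value * gain if freq >= cutoff_hz else value)
    return [c.real for c in idft(brtf)]


# Hypothetical 16-tap unit impulse sampled at 32 kHz (2 kHz per bin).
coeffs = high_band_suppressed_coeffs([1.0] + [0.0] * 15, fs=32000)
```

The returned list plays the role of the filter coefficients. With `gain=0.0` the band is removed entirely, whereas a value between 0 and 1 would attenuate it instead.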
 Another embodiment of the stereophonic processing method of the present invention includes a step of generating and storing transfer function coefficients, and a step of inputting the sound source signal, or the processed signal of the reflection processing unit (2), to the three-dimensional processing filters (11a-11d) of the three-dimensional processing unit (1) and applying out-of-head localization processing to at least the first frequency band (91) of the input signal to generate the processed signal of the three-dimensional processing unit (1). The step of generating and storing the coefficients includes: preparing multiple left-right pairs of head-related transfer functions from sound source positions at multiple horizontal-plane azimuths within 45° to 90° to the left and right ears; suppressing or attenuating, in a coefficient generation unit (3), the band (92’) corresponding to the second frequency band in those pairs to generate multiple left-right pairs of high-band-suppressed transfer functions; computing the inverse Fourier transform of those pairs to obtain multiple left-right pairs of high-band-suppressed impulse responses; and storing in a coefficient storage unit (4) the coefficients of the multiple pairs of high-band-suppressed transfer functions, which correspond to the values of the high-band-suppressed impulse responses. The step of generating the processed signal of the three-dimensional processing unit (1) includes: selecting the coefficients of only one left-right pair from the multiple pairs stored in the coefficient storage unit (4); giving the selected pair, which belongs to only one horizontal-plane azimuth, to the filters (11a-11d), whether or not they already hold coefficients; and passing the sound source signal, or the processed signal of the reflection processing unit (2), through the filters (11a-11d) holding that single pair to generate the processed signal of the three-dimensional processing unit (1), with out-of-head localization applied to at least the first frequency band (91).
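The store-then-select arrangement described above can be sketched as a small lookup keyed by azimuth. The class and field names are hypothetical, and the coefficient values are placeholders rather than measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoeffPair:
    """One left/right pair of high-band-suppressed filter coefficients."""
    left: tuple
    right: tuple


class CoefficientStore:
    """Holds several pairs, one per azimuth, and hands out exactly one at a time."""

    def __init__(self):
        self._pairs = {}

    def add(self, azimuth_deg, pair):
        # The text restricts azimuths to 45-90 degrees on either side of the front.
        if not 45 <= abs(azimuth_deg) <= 90:
            raise ValueError("azimuth must lie within 45 to 90 degrees, left or right")
        self._pairs[azimuth_deg] = pair

    def select(self, azimuth_deg):
        """Return the single pair that is loaded into the filters."""
        return self._pairs[azimuth_deg]


store = CoefficientStore()
store.add(60, CoeffPair(left=(0.5, 0.25), right=(0.1, 0.05)))
store.add(-90, CoeffPair(left=(0.1, 0.02), right=(0.6, 0.2)))
selected = store.select(60)
```

Only the selected pair ever reaches the filters, so the run-time cost is the same as with coefficients fixed in advance; the storage simply makes the choice switchable.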
 The stereophonic processing program of the present invention causes a computer to function as each unit of the stereophonic processing device.
 The stereophonic processing device, processing method, and program of the present invention can realize stereophonic sound of high, natural sound quality with little influence from individual differences. Because the computation and storage requirements are small, the invention can be installed as software on a personal computer, smartphone, game console, or the like, and the compact, lightweight stereophonic processing device can be built into earphones or headphones. Furthermore, even with wireless earphones or headphones that require transmission of a compressed signal, a high-quality stereophonic effect can be realized without corrections such as amplification.
FIG. 1: Block diagram showing a stereophonic processing device according to a first embodiment of the present invention.
FIG. 2A: Block diagram illustrating a configuration of the three-dimensional processing unit of FIG. 1.
FIG. 2B: Block diagram illustrating a configuration of the three-dimensional processing unit of FIG. 1.
FIG. 3: Block diagram illustrating a configuration of the three-dimensional processing filter of FIGS. 2A and 2B.
FIG. 4A: Block diagram illustrating a configuration of the reflection processing unit of FIG. 1.
FIG. 4B: Block diagram illustrating a configuration of the reflection processing unit of FIG. 1.
FIG. 4C: Block diagram illustrating a configuration of the reflection processing unit of FIG. 1.
FIG. 4D: Block diagram illustrating a configuration of the reflection processing unit of FIG. 1.
FIG. 5: Block diagram illustrating a configuration of the reflection processing filter of FIGS. 4A to 4D.
FIG. 6: Block diagram illustrating a configuration of the frequency filter of FIG. 5.
FIG. 7: Schematic diagram showing a method of acquiring head-related impulse responses from a sound source.
FIG. 8: Flowchart for generating high-band-suppressed transfer functions from head-related impulse responses.
FIG. 9: Graphs showing the transfer functions for the left ear (a) and the right ear (b).
FIG. 10: Graphs showing the impulse responses for the left ear (a) and the right ear (b).
FIG. 11: Graphs showing audio signals processed by the prior art (a) and by the method of the present invention (b).
FIG. 12: Graphs showing the audio signals of FIG. 11 after compression.
FIG. 13: Block diagram showing a stereophonic processing device according to a second embodiment of the present invention.
FIG. 14: Block diagram showing a stereophonic processing device according to a third embodiment of the present invention.
FIG. 15: Block diagram illustrating a configuration of the M/S delay unit of FIG. 14.
FIG. 16: Block diagram showing a stereophonic processing device according to a fourth embodiment of the present invention.
 Embodiments of the stereophonic processing device, its processing method, and its program according to the present invention are described below with reference to FIGS. 1 to 16. The following embodiments are examples and are not to be construed as limiting the present invention.
 FIG. 1 is a block diagram of a two-channel stereophonic processing device 10 that generates a stereophonic reproduction signal from a sound source signal stored in a sound source device 30 and supplies it to an earphone 50 or headphones. The sound source signal contains a first frequency band 91 including the lowest part of the audible range and a second frequency band 92 higher than the first frequency band 91. The human audible range is generally said to extend from 20 Hz to 20,000 Hz, so its lowest part is about 20 Hz. The stereophonic processing device 10 comprises a three-dimensional processing unit 1 that applies out-of-head localization processing to at least the first frequency band 91 of its input signal, and a reflection processing unit 2 that is connected to the three-dimensional processing unit 1 and adds a reflected sound component to at least part of the second frequency band 92, which covers the range above the first frequency band 91 of the input signal.
 The three-dimensional processing unit 1 shown in FIG. 1 is connected, via first and second input ends 6a, 6b, to the sound source device 30, which stores a sound source as digital or analog data. When the stored sound source is analog data, an analog-to-digital converter (not shown) is required between the sound source device 30 and the stereophonic processing device 10. The three-dimensional processing unit 1 includes three-dimensional processing filters 11a-11d to which the coefficients of only one left-right pair of transfer functions are given in advance. The sound source signal is passed through the filters 11a-11d, which apply out-of-head localization processing to it by digital signal processing.
 FIG. 2A is a block diagram illustrating an embodiment of the three-dimensional processing unit 1. It comprises a first three-dimensional processing filter 11a and a second three-dimensional processing filter 11b, each connected at one side to the first input end 6a of the stereophonic processing device 10; a third three-dimensional processing filter 11c and a fourth three-dimensional processing filter 11d, each connected at one side to the second input end 6b; an adder 16a connected to the other sides of the first and third filters 11a, 11c; and an adder 16b connected to the other sides of the second and fourth filters 11b, 11d. This is a so-called crossed (tasuki-gake) structure. The outputs of the adders 16a, 16b are connected to the reflection processing unit 2 as shown in FIG. 1. As shown in the block diagram of FIG. 2B, an M/S processor 51 may be arranged between the second and third filters 11b, 11c and the two adders 16a, 16b. The M/S processor 51 comprises: an M/S converter 52 whose L and R terminals receive left and right signals from branch points 55a, 55b located between the second and third filters 11b, 11c and the adders 16a, 16b, and which separates them into a center signal and a side signal and outputs these from its M and S terminals; reflection processing filters 54a, 54b that receive the center signal and the side signal from the M/S converter 52 and add reflected sound; an L/R converter 53 whose M and S terminals receive the signals after reflection processing from the filters 54a, 54b, and which converts them back to left and right signals and outputs left and right feedback signals from its L and R terminals; and two adders 57a, 57b that add, at summing points 56a, 56b, each feedback signal from the L/R converter 53 to the corresponding output signal of the second and third filters 11b, 11c and output the sums crosswise to the adders 16b, 16a.
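The crossed four-filter structure of FIG. 2A can be sketched as follows. This is a simplified illustration that omits the optional M/S processor 51; the function names and the toy coefficients are assumptions, and the two inputs are called `in_a` and `in_b` after input ends 6a and 6b because the text does not assign them to a particular ear here.

```python
def fir(x, h):
    """Same-length FIR filtering: y[n] = sum over k of h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def crossfeed(in_a, in_b, h11a, h11b, h11c, h11d):
    """Four filters and two adders in the crossed arrangement.

    Adder 16a sums filter 11a (fed by 6a) and filter 11c (fed by 6b);
    adder 16b sums filter 11b (fed by 6a) and filter 11d (fed by 6b).
    """
    out_a = [p + q for p, q in zip(fir(in_a, h11a), fir(in_b, h11c))]
    out_b = [p + q for p, q in zip(fir(in_a, h11b), fir(in_b, h11d))]
    return out_a, out_b


# An impulse on input 6a only; direct paths are pass-through, cross paths are
# delayed by one sample and halved (placeholder coefficients, not measured data).
out_a, out_b = crossfeed(
    in_a=[1.0, 0.0, 0.0],
    in_b=[0.0, 0.0, 0.0],
    h11a=[1.0], h11b=[0.0, 0.5], h11c=[0.0, 0.5], h11d=[1.0],
)
```

The impulse appears unchanged on the same-side output and delayed and attenuated on the opposite output, which is the role the cross filters play in this structure.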
 The first to fourth three-dimensional processing filters 11a-11d are FIR (finite impulse response) filters or IIR (infinite impulse response) filters. The block diagram of FIG. 3 illustrates the configuration of the three-dimensional processing filter 11a as a general FIR filter. It comprises a delay element 12_1 and a multiplier 13_0 connected to the input end 6a, a multiplier 13_1 connected to the delay element 12_1, and an adder 14_1 that adds the output signals of the multipliers 13_0 and 13_1. The delay element 12_1, the multiplier 13_1, and the adder 14_1 constitute one tap, and a plurality of taps form a higher-order (n-th order) FIR filter.
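The tap structure of a general FIR filter can be sketched as a per-sample delay line. This is a generic direct-form implementation, not a transcription of FIG. 3; the class name `FirFilter` is hypothetical, and the comments map the pieces to the delay elements, multipliers, and adders only loosely.

```python
class FirFilter:
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n-k]."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        # One delay element per tap after the first (cf. delay elements 12_1 ... 12_n).
        self.delay_line = [0.0] * (len(self.coeffs) - 1)

    def process(self, sample):
        # The taps see the current sample followed by the delayed samples.
        taps = [sample] + self.delay_line
        # Multiply each tap by its coefficient (multipliers) and sum (adders).
        out = sum(h * x for h, x in zip(self.coeffs, taps))
        # Shift the delay line by one sample for the next call.
        self.delay_line = taps[:-1]
        return out


# Feeding a unit impulse returns the coefficients themselves, one per sample.
fir_filter = FirFilter([0.5, 0.25, 0.125])
response = [fir_filter.process(x) for x in [1.0, 0.0, 0.0, 0.0]]
```

Each output sample costs one multiplication and one addition per coefficient, so the filter length directly sets the computational load of the three-dimensional processing.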
 The coefficients of the single left-right pair of transfer functions given in advance to the first to fourth three-dimensional processing filters 11a-11d are obtained by processing the pair of head-related impulse responses HRIR_L, HRIR_R from the position of the sound source 71 in FIG. 7 to the left and right ears 70L, 70R at only one arbitrary horizontal-plane azimuth (θ) within 45° to 90°. Because only one angle is used, the amount and time of computation can be reduced. Here, a horizontal-plane azimuth (θ) of 45° to 90° covers both the left and right directions, that is, both +45° to +90° and -45° to -90°, where the frontal direction of the head 70 of a person or dummy head is 0°. Within this angle range a stereophonic sound image can be formed with little influence from individual differences. The single azimuth may be, for example, plus or minus 45°, 50°, 55°, 60°, 65°, 70°, 75°, 80°, 85°, or 90°, but is not limited to these values as long as it lies within 45° to 90°. The vertical angle (elevation), for which individual differences in the head-related impulse response are large, is fixed at 0° and is not considered in the present invention.
The transfer-function coefficients assigned in advance are the coefficients of a single left/right pair of high-frequency suppression transfer functions BRTF_L, BRTF_R. These are obtained from the left/right pair of head-related transfer functions HRTF_L, HRTF_R, which are themselves computed from the pair of head-related impulse responses HRIR_L, HRIR_R at one angle within the horizontal-plane azimuth range of 45° to 90°, by suppressing or attenuating the band 92' that corresponds to the second frequency band 92 of the sound source signal. The first frequency band 91, and the corresponding band 91' of the transfer functions HRTF and BRTF, lies within the range of 0 to 8 kHz, preferably 0.1 kHz to 8 kHz. The second frequency band 92, and the corresponding band 92' of the transfer functions HRTF and BRTF, lies within the range of 8 kHz to 96 kHz; it is usually the audible range of 8 kHz to 20 kHz, preferably 8 kHz to 15 kHz. For processing convenience, the upper limit may be half the sampling frequency Fs (Fs/2), for example 44.1 kHz/2 when the sound source is a CD, or 48 kHz/2, 96 kHz/2, or 192 kHz/2 for DVD or Blu-ray.
The stereophonic processing device 10 shown in FIG. 1 includes a reflection processing unit 2, one side of which is connected to the three-dimensional processing unit 1 and the other side of which is connected to a digital-to-analog (D/A) converter 40 via the output terminals 7a, 7b. The D/A converter 40 is connected to the earphones 50 or headphones by a wired or wireless transmission path 41 and transmits the stereophonic sound reproduction signal to them. Alternatively, instead of providing the D/A converter on the stereophonic processing device 10 side, a transmitter (in that case also denoted by reference numeral 40) connected to the output terminals 7a, 7b may wirelessly transmit the digital signal as is, and a D/A converter (not shown) attached to or built into Bluetooth (registered trademark) earphones or the like may convert it into an analog signal.
FIG. 4A is a block diagram illustrating the configuration of a two-channel reflection processing unit 2. The reflection processing unit 2 includes a first reflection processing filter 21a that is connected to, and receives its input signal from, a first branch point 25a located between the first input terminal 6a (FIG. 1) of the stereophonic processing device 10 and the earphones 50. At a first addition point 26a between the first input terminal 6a and the first branch point 25a, a first adder 27a is provided that adds the processed signal of the three-dimensional processing unit 1 and the feedback signal from the first reflection processing filter 21a. Similarly, the reflection processing unit 2 includes a second reflection processing filter 21b that is connected to, and receives its input signal from, a second branch point 25b located between the second input terminal 6b (FIG. 1) of the stereophonic processing device 10 and the earphones 50. At a second addition point 26b between the second input terminal 6b and the second branch point 25b, a second adder 27b is provided that adds the processed signal of the three-dimensional processing unit 1 and the feedback signal from the second reflection processing filter 21b. Further, as shown in FIGS. 4B and 4D, switches 17a, 17b may route the feedback signal from the first reflection processing filter 21a to either or both of the second adder 27b and the first adder 27a, and the feedback signal from the second reflection processing filter 21b to either or both of the first adder 27a and the second adder 27b. FIG. 4B includes an M/S converter 28 between the first and second branch points 25a, 25b and the first and second reflection processing filters 21a, 21b, so that the stereo left and right signals L, R can be separated into center (mid) and side signals M, S and processed individually. Alternatively, a Fourier transform may be performed, and the components for which |L amplitude| - |R amplitude|, (L amplitude)^2 - (R amplitude)^2, or arctan(|L amplitude| / |R amplitude|) falls within a certain range (preferably about arctan(0.7) ≤ arctan(|L amplitude| / |R amplitude|) ≤ arctan(1.3)) may be separated as the center signal component and the remainder as the side signal component, each then being returned by an inverse Fourier transform before use (| | denotes the absolute value). Because music signals generally place low-frequency components in the center, the coefficients of the low-frequency portion (preferably up to about 200 Hz) may be treated as the center signal component regardless of the arctan value. It is also possible to extract only the center signal component and subtract it from the original signal to obtain the side signal component. The conversion formulas are M = (L + R)/2 and S = (L - R)/2. FIG. 4B also includes an L/R converter 29 between the first and second reflection processing filters 21a, 21b and the first and second adders 27a, 27b, which converts the center and side signals M, S back into the stereo left and right signals L, R. The conversion formulas are L = M + S and R = M - S. FIG. 4C includes first and third reflection processing filters 21a, 21c, each connected on one side to the M terminal of the M/S converter 28; second and fourth reflection processing filters 21b, 21d, each connected on one side to the S terminal of the M/S converter 28; an adder 19a that adds the outputs of the first and third reflection processing filters 21a, 21c and is connected to the M terminal of the L/R converter 29; and an adder 19b that adds the outputs of the second and fourth reflection processing filters 21b, 21d and is connected to the S terminal of the L/R converter 29. Further, in FIG. 4D, switches 18a, 18b are provided between the third and fourth reflection processing filters 21c, 21d and the adders 19a, 19b, so that the output signal of the third reflection processing filter 21c can be routed to either or both of the two adders 19a, 19b, and the output signal of the fourth reflection processing filter 21d can be routed to either or both of the two adders 19b, 19a.
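The M/S and L/R conversion formulas given for FIG. 4B, M = (L + R)/2, S = (L - R)/2, L = M + S, and R = M - S, can be checked with a short round-trip sketch. The function names `lr_to_ms` and `ms_to_lr` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def lr_to_ms(left: Sequence[float],
             right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """M/S conversion: M = (L + R) / 2, S = (L - R) / 2, sample by sample."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid: Sequence[float],
             side: Sequence[float]) -> Tuple[List[float], List[float]]:
    """L/R conversion: L = M + S, R = M - S, sample by sample."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical stereo samples; a signal identical in L and R ends up entirely in M.
left_in = [1.0, 0.5, -0.25]
right_in = [1.0, -0.5, 0.75]
mid, side = lr_to_ms(left_in, right_in)
left_out, right_out = ms_to_lr(mid, side)
```

Because the two conversions are exact inverses, any difference between the input and output of the M/S path comes only from the reflection processing applied between them.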
FIG. 5 illustrates the configuration of the first reflection processing filter 21a of the reflection processing unit 2. The filter includes a delay element 22 that is connected to the first branch point 25a (FIGS. 4A to 4D) and receives the signal; a multiplier 23 that amplifies the output signal of the delay element 22; and a frequency filter 24 to which coefficients are assigned and which generates a reflected sound component from the signal amplified by the multiplier 23 and sends it to the first adder 27a as a feedback signal. FIG. 6 is a block diagram showing an IIR filter as one embodiment of the frequency filter 24 of FIG. 5. The frequency filter 24 has a feedforward section that includes a multiplier 63_0 that multiplies the input signal X_24 by a coefficient b_0; a delay element 62_1 that delays the input signal X_24; a multiplier 63_1 that multiplies the output of the delay element 62_1 by a coefficient b_1; a delay element 62_2 that delays the output of the delay element 62_1; a multiplier 63_2 that multiplies the output of the delay element 62_2 by a coefficient b_2; an adder 64_1 that adds the outputs of the multipliers 63_0 and 63_1; and an adder 64_2 that adds the outputs of the adder 64_1 and the multiplier 63_2.
The frequency filter 24 also has a feedback section that includes a delay element 62_3 that receives part of the output signal Y_24, which is produced from the output of the adder 64_2 via the adders 64_3 and 64_4; a multiplier 63_3 that multiplies the output of the delay element 62_3 by a coefficient a_1; a delay element 62_4 that delays the output of the delay element 62_3; a multiplier 63_4 that multiplies the output of the delay element 62_4 by a coefficient a_2; an adder 64_3 that adds the outputs of the adder 64_2 and the multiplier 63_3; and an adder 64_4 that adds the outputs of the adder 64_3 and the multiplier 63_4. The second reflection processing filter 21b in FIGS. 4A to 4D has the same configuration as the first reflection processing filter 21a, so its description is omitted. The reflected sound includes an initial reflected sound and an additional reflected sound. The initial reflected sound (early reflection) arrives shortly after the direct sound, which travels directly from the sound source, and refers to sound that has been reflected by an object only once. The additional reflected sound has been reflected two or more times and is also called late reverberation, simply reverberation, or reverb.
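The signal path of FIGS. 4A, 5, and 6 (branch point, delay element 22, multiplier 23, frequency filter 24, and feedback into the adder) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the names `Biquad` and `add_reflections` are hypothetical, and the feedback terms are subtracted following the common textbook sign convention, whereas the description only states that the adders sum the multiplier outputs.

```python
from typing import List, Sequence


class Biquad:
    """Second-order IIR section (direct form I), as in the structure of FIG. 6."""

    def __init__(self, b0: float, b1: float, b2: float, a1: float, a2: float):
        self.b0, self.b1, self.b2, self.a1, self.a2 = b0, b1, b2, a1, a2
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0  # delay-element states

    def process(self, x: float) -> float:
        # Feedforward part (b0, b1, b2) plus feedback part (a1, a2).
        # Assumption: a1 and a2 are subtracted (textbook convention).
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


def add_reflections(x: Sequence[float], delay_samples: int, gain: float,
                    filt: Biquad) -> List[float]:
    """out[n] = x[n] + filt(gain * out[n - delay_samples]); delay_samples >= 1."""
    out: List[float] = []
    for n, sample in enumerate(x):
        # Signal taken from the branch point, delayed by delay_samples.
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        # Gain, then frequency filter, then feedback into the adder.
        out.append(sample + filt.process(gain * delayed))
    return out


# With a pass-through filter, an impulse produces echoes that decay by `gain`.
passthrough = Biquad(1.0, 0.0, 0.0, 0.0, 0.0)
echoes = add_reflections([1.0] + [0.0] * 9, delay_samples=3, gain=0.5,
                         filt=passthrough)
```

Because the delayed signal is fed back into the adder, a single delay path produces a repeating series of reflections; a gain magnitude below 1 keeps the loop decaying.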
A method for stereophonic processing of a sound source signal using the stereophonic processing device 10 of the present invention shown in FIG. 1 is described in detail below.
In the present invention, the transfer-function coefficients are first assigned in advance to the three-dimensional processing filters 11a-11d of the three-dimensional processing unit 1. This step includes preparing, in an anechoic chamber, a left/right pair of head-related transfer functions HRTF_L, HRTF_R from the sound source position to the left and right ears at only one angle within the horizontal-plane azimuth range of 45° to 90°. For example, in FIG. 7, with the azimuth (θ) set to substantially +45° as the one angle, left and right impulse responses g_L, g_R are recorded by microphones (not shown) placed at the left and right ears 70L, 70R of a dummy head or human head 70 located about 1 m from the sound source 71; a blank impulse response g_C is recorded by a microphone (not shown) placed at the position 70C corresponding to the center of the head, with the head 70 absent; and the left and right impulse responses g_L, g_R are divided by the blank impulse response g_C to obtain the left and right head-related impulse responses (g_L/g_C = HRIR_L, g_R/g_C = HRIR_R).
FIG. 8 is a flowchart for generating (90) the impulse responses of the high-frequency suppression transfer functions (high-frequency suppression impulse responses BRIR_L, BRIR_R) from the left and right head-related impulse responses HRIR_L, HRIR_R (80). First, the left and right head-related impulse responses HRIR_L, HRIR_R are processed by a Fourier transform F (s81), and the result is converted to dB values by taking 20 times its common logarithm (s82), which yields the left/right pair of head-related transfer functions (F[HRIR_L] = HRTF_L, F[HRIR_R] = HRTF_R). Alternatively, a left/right pair of head-related transfer functions HRTF_L, HRTF_R obtained from a database of a research institute, university, company, or the like, or a corrected version of such data, may be used. In FIGS. 9(a) and 9(b), the horizontal axis is frequency [kHz] and the vertical axis is relative amplitude [dB]. The solid line in FIG. 9(a) shows the head-related transfer function HRTF_L of the left ear 70L at a horizontal-plane azimuth (θ) of substantially 45°, and the solid line in FIG. 9(b) shows the head-related transfer function HRTF_R of the right ear 70R at the same azimuth. In FIG. 9, the sampling frequency Fs is 44.1 kHz.
For the obtained left/right pair of head-related transfer functions HRTF_L, HRTF_R, the components in the band 92' corresponding to the second frequency band 92 of the sound source signal, for example components at frequencies above 8 kHz, are suppressed or attenuated with a low-pass filter (s83) to generate the left/right pair of high-frequency suppression transfer functions BRTF_L, BRTF_R, which are shown by the broken lines in FIGS. 9(a) and 9(b). The suppression or attenuation by the low-pass filter (s83) removes fine irregularities on the time axis and smooths the waveform. Smoothing may also be performed by, for example, a moving-average method. Comparing the head-related transfer functions HRTF_L, HRTF_R before high-frequency suppression (solid lines) with the high-frequency suppression transfer functions BRTF_L, BRTF_R after processing (broken lines) in FIGS. 9(a) and 9(b), the solid and broken lines show nearly identical values in the band 91' corresponding to the first frequency band, from 0 to substantially 8 kHz. In the band 92' corresponding to the second frequency band, above substantially 8 kHz, the solid and broken lines vary in a similar manner, but the high-frequency suppression transfer functions BRTF_L, BRTF_R are several dB lower than the head-related transfer functions HRTF_L, HRTF_R.
Next, for the left/right pair of high-frequency suppression transfer functions BRTF_L, BRTF_R (broken lines in FIG. 9) at the one horizontal-plane azimuth angle (θ), the logarithmic values are converted back to amplitude values (s84), an inverse Fourier transform F^-1 is applied (F^-1[BRTF_L], F^-1[BRTF_R]) (s85), and window-function processing is performed, for example with a Hamming window (s86), to obtain the left/right pair of high-frequency suppression impulse responses BRIR_L, BRIR_R (90). A rectangular window, a Hanning window, or a Bartlett window may also be used as the window function. FIGS. 10(a) and 10(b), in which the horizontal axis is the number of samples (maximum 512) and the vertical axis is amplitude, are graphs comparing the left/right pair of high-frequency suppression impulse responses BRIR_L, BRIR_R obtained by the present invention (solid lines) with the conventional left/right pair of head-related impulse responses HRIR_L, HRIR_R (broken lines). FIG. 10(a) shows the impulse responses HRIR_L, BRIR_L on the left ear 70L side, proximal to the sound source 71, and FIG. 10(b) shows the impulse responses HRIR_R, BRIR_R on the right ear 70R side, distal to the sound source 71. Because of the difference in distance from the sound source 71, the amplitudes in FIG. 10(a) are larger overall than those in FIG. 10(b).
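The flow of FIG. 8 (steps s81 to s86) can be sketched in a few lines. This is a minimal illustration under several assumptions that are not stated in the description: the high band is attenuated by a fixed amount `atten_db` rather than by a designed low-pass filter, the original phase of each frequency bin is kept, and the window is the decaying half of a Hamming window so that the start of the impulse response is preserved. The names `dft`, `idft_real`, and `make_brir` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform (adequate for short responses)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft_real(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse DFT, returning the real part."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]


def make_brir(hrir: Sequence[float], fs_hz: float,
              cutoff_hz: float = 8000.0, atten_db: float = 6.0) -> List[float]:
    n = len(hrir)
    shaped = []
    for m, value in enumerate(dft(hrir)):                      # s81: Fourier transform
        freq_hz = min(m, n - m) * fs_hz / n                    # fold mirrored bins
        level_db = 20.0 * math.log10(max(abs(value), 1e-12))   # s82: to dB
        if freq_hz > cutoff_hz:
            level_db -= atten_db                               # s83: attenuate band 92'
        amplitude = 10.0 ** (level_db / 20.0)                  # s84: back to amplitude
        # Assumption: the phase of the original bin is kept unchanged.
        shaped.append(cmath.rect(amplitude, cmath.phase(value)))
    brir = idft_real(shaped)                                   # s85: inverse transform
    # s86: decaying half of a Hamming window (assumption), 1.0 at k = 0.
    window = [0.54 + 0.46 * math.cos(math.pi * k / (n - 1)) for k in range(n)]
    return [b * w for b, w in zip(brir, window)]
```

For example, `make_brir(hrir_left, 44100.0)` would return a response of the same length as the hypothetical input `hrir_left`, with the band above 8 kHz lowered by 6 dB.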
As shown in FIGS. 10(a) and 10(b), the amplitude of the high-frequency suppression impulse responses BRIR_L, BRIR_R obtained by the present invention (broken lines) converges to 0 at about 150 samples, whereas the amplitude of the conventional head-related impulse responses HRIR_L, HRIR_R (solid lines) does not converge even at the maximum of 512 samples (right end of FIG. 10). This is because, in the present invention, the band 92' corresponding to the second frequency band is suppressed or attenuated, which suppresses the late part of the impulse response (beyond about 150 samples) that consists mainly of high-frequency components. The amplitude values of the impulse response correspond to the filter coefficient values of a frequency filter such as an FIR filter, and the number of samples corresponds to the number of taps (n+1) of the FIR filter. The fewer the taps (n+1), the smaller the amount of computation required by the FIR filter. The present invention, which adopts the high-frequency suppression impulse responses BRIR_L, BRIR_R, therefore significantly reduces the amount of computation compared with the prior art.
Because the obtained values of the left/right pair of high-frequency suppression impulse responses BRIR_L, BRIR_R (broken lines in FIG. 10) correspond directly to FIR filter coefficients, they are assigned in advance to the three-dimensional processing filters 11a-11d as the coefficients h_B0 to h_Bn of the single left/right pair of high-frequency suppression transfer functions at the one horizontal-plane azimuth angle θ. Specifically, in the FIR filter 11a of filter length (number of taps) n+1 shown in FIG. 3, the values of the high-frequency suppression impulse responses BRIR_L, BRIR_R are assigned to the multipliers 13_0 to 13_n as the filter coefficients h_B0 to h_Bn. The transfer function H(z) of the FIR filter is expressed by Equation 1.
$$H(z) = h_{B0} + h_{B1} z^{-1} + \cdots + h_{Bn} z^{-n} \qquad \text{(Equation 1)}$$
The frequency response of Equation 1 is obtained by substituting e^{-jωT} for z^{-1} and is expressed by Equation 2. T is the time interval between data samples [s], ω is the angular frequency [rad/s] given by 2πFc/Fs, Fs is the sampling frequency [Hz], and Fc is the cutoff frequency [Hz]. For example, if Fc is set to 8 kHz, the band 91' corresponding to the first frequency band becomes the band at or below 8 kHz.
$$H(j\omega) = h_{B0} + h_{B1} e^{-j\omega T} + \cdots + h_{Bn} e^{-jn\omega T} \qquad \text{(Equation 2)}$$
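Equation 2 can be evaluated numerically to inspect the response of a given set of FIR coefficients. The sketch below assumes the normalized form in which ω = 2πf/Fs is expressed in radians per sample, so that the k-th term is h_Bk·e^{-jkω}; the function name `fir_frequency_response` and the two-tap example are hypothetical.

```python
import cmath
import math
from typing import Sequence


def fir_frequency_response(h: Sequence[float], freq_hz: float,
                           fs_hz: float) -> complex:
    """Evaluate H = sum_k h[k] * exp(-j * k * omega) with omega = 2*pi*f/Fs."""
    omega = 2.0 * math.pi * freq_hz / fs_hz  # radians per sample (assumption)
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(h))


# Hypothetical two-tap averaging filter: passes DC, blocks Fs/2.
h_example = [0.5, 0.5]
gain_dc = abs(fir_frequency_response(h_example, 0.0, 44100.0))
gain_quarter = abs(fir_frequency_response(h_example, 11025.0, 44100.0))
gain_nyquist = abs(fir_frequency_response(h_example, 22050.0, 44100.0))
```

Evaluating the same function with the BRIR values as `h` at frequencies below and above 8 kHz would show how much the band 92' is lowered relative to the band 91'.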
By entering the values of the high-frequency suppression impulse responses BRIR_L, BRIR_R obtained according to the flowchart of FIG. 8 as h_B0 to h_Bn in Equation 2, which accounts for the frequency characteristics, the coefficients of the single left/right pair of transfer functions are assigned in advance to the three-dimensional processing filter 11a of the three-dimensional processing unit 1. When the sound source is passed through the three-dimensional processing filter 11a to which the coefficients have been assigned, the input signal Xa (FIG. 3) is converted into the output signal Ya with the coefficients of the high-frequency suppression transfer functions BRTF_L, BRTF_R applied (H·Xa = Ya). The high-frequency suppression impulse responses BRIR_L, BRIR_R are likewise assigned in advance as coefficients to the other three-dimensional processing filters 11b-11d. When the sound source signal from the sound source device 30 is input through the first and second input terminals 6a, 6b to the three-dimensional processing unit 1 to which the transfer-function coefficients have been assigned, the input signal undergoes out-of-head localization processing and the processed signal of the three-dimensional processing unit 1 is obtained. In the present invention, the whole of the first and second frequency bands 91, 92 of the sound source signal undergoes out-of-head localization processing. However, because the band 92' corresponding to the second frequency band is suppressed or attenuated in the high-frequency suppression transfer function BRTF, the BRTF affects the second frequency band 92 less than it affects the first frequency band 91. In other words, signal processing of the first frequency band 91 with the coefficients of the high-frequency suppression transfer function BRTF is effective for forming a stereophonic sound image with little individual variation.
Subsequently, the processed signal that has undergone out-of-head localization processing in the three-dimensional processing unit 1 is input to the reflection processing unit 2, which adds a reflected sound component to at least part of the second frequency band 92, a band that includes frequencies above the first frequency band 91 of the input signal, to generate the processed signal of the reflection processing unit 2. As the processed signal of the three-dimensional processing unit 1 passes through the reflection processing unit 2, the unit adds to the second frequency band 92, which is higher than the first frequency band 91 of the input signal, an additional reflected sound component (reverberation, reverb) delayed by 0.001 to 10 seconds from the direct sound, together with the initial reflected sound.
Specifically, the input signal introduced from the branch points 25a, 25b in FIGS. 4A to 4D into the reflection processing filters 21a, 21b of the reflection processing unit 2 passes through the delay element 22 shown in FIG. 5 and the multiplier 23, to which an arbitrary reflection-processing gain coefficient is assigned, and is input to the frequency filter 24. When the frequency filter 24 is the biquad (second-order) IIR filter shown in FIG. 6 and is configured as a low-shelf filter (LSF: a filter that amplifies components below the cutoff frequency by the amplification amount), its transfer function H(z) and frequency response are expressed by Equations 3 and 4.
$$H(z) = \frac{A\left(z^{-2} + (A^{1/2}/Q)\,z^{-1} + A\right)}{A z^{-2} + (A^{1/2}/Q)\,z^{-1} + 1} \qquad \text{(Equation 3)}$$
$$H(j\omega) = \frac{A\left(e^{-j2\omega} + (A^{1/2}/Q)\,e^{-j\omega} + A\right)}{A e^{-j2\omega} + (A^{1/2}/Q)\,e^{-j\omega} + 1} \qquad \text{(Equation 4)}$$
Q is the filter Q value (which shapes the gain curve around the cutoff frequency), G is the amplification amount [dB] that determines the filter characteristic, ω is the angular frequency [rad/s] given by 2πFc/Fs, Fs is the sampling frequency [Hz], and Fc is the cutoff frequency [Hz]. For example, if Fc is set to 8 kHz, the band 92' corresponding to the second frequency band becomes the range at or above 8 kHz. A is given by 10^{G/40} and α by sin ω/(2Q). In the FIR filter described above, the coefficients are equal to the impulse response, whereas the LSF coefficients a_1, a_2, b_0, b_1, b_2 of the IIR filter 24 in FIG. 6 are determined by Equations 5 to 9 below, which account for the frequency characteristics.
$$a_1 = -2\left((A-1) + (A+1)\cos\omega\right) \qquad \text{(Equation 5)}$$
$$a_2 = (A+1) + (A-1)\cos\omega - 2\alpha A^{1/2} \qquad \text{(Equation 6)}$$
$$b_0 = A\left((A+1) - (A-1)\cos\omega + 2\alpha A^{1/2}\right) \qquad \text{(Equation 7)}$$
$$b_1 = 2A\left((A-1) - (A+1)\cos\omega\right) \qquad \text{(Equation 8)}$$
$$b_2 = A\left((A+1) - (A-1)\cos\omega - 2\alpha A^{1/2}\right) \qquad \text{(Equation 9)}$$
In the frequency filter 24, the coefficients a_1, a_2, b_0, b_1, b_2 can be determined by freely setting the filter Q value, the amplification amount G, and the cutoff frequency Fc according to the earphone characteristics and the listener's preference. Instead of an LSF, another filter type such as a low-pass, high-pass, band-pass, notch, high-shelf, peaking, or all-pass filter may be used, and the filter may be combined with other filters.
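Equations 5 to 9 can be turned into a small coefficient calculator. The sketch below makes one assumption beyond the text: the description gives no normalizing coefficient a_0, so the sketch borrows a_0 = (A+1) + (A-1)cos ω + 2αA^{1/2} from the widely used Audio EQ Cookbook low-shelf design, whose remaining coefficients match Equations 5 to 9, and divides all coefficients by it. The names `BiquadCoeffs` and `low_shelf` are hypothetical.

```python
import math
from typing import NamedTuple


class BiquadCoeffs(NamedTuple):
    b0: float
    b1: float
    b2: float
    a1: float
    a2: float


def low_shelf(fc_hz: float, fs_hz: float, gain_db: float, q: float) -> BiquadCoeffs:
    """Low-shelf coefficients per Equations 5 to 9, normalized by a0."""
    a = 10.0 ** (gain_db / 40.0)           # A = 10^(G/40)
    omega = 2.0 * math.pi * fc_hz / fs_hz  # omega = 2*pi*Fc/Fs
    alpha = math.sin(omega) / (2.0 * q)    # alpha = sin(omega) / (2Q)
    cos_w = math.cos(omega)
    k = 2.0 * alpha * math.sqrt(a)         # the 2*alpha*A^(1/2) term

    b0 = a * ((a + 1) - (a - 1) * cos_w + k)   # Equation 7
    b1 = 2 * a * ((a - 1) - (a + 1) * cos_w)   # Equation 8
    b2 = a * ((a + 1) - (a - 1) * cos_w - k)   # Equation 9
    a1 = -2 * ((a - 1) + (a + 1) * cos_w)      # Equation 5
    a2 = (a + 1) + (a - 1) * cos_w - k         # Equation 6
    # Assumption: a0 is not given in the text; taken from the Audio EQ Cookbook.
    a0 = (a + 1) + (a - 1) * cos_w + k
    return BiquadCoeffs(b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)


# Example settings: Fc = 8 kHz, Fs = 44.1 kHz, G = 6 dB, Q = 0.707.
coeffs_lsf = low_shelf(8000.0, 44100.0, 6.0, 0.707)
```

With this normalization the gain at 0 Hz equals G dB and the gain at Fs/2 equals 0 dB, which is the expected behaviour of a low-shelf filter.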
FIG. 11(A) schematically shows a stereophonic sound reproduction signal 34 obtained by applying to the sound source signal the coefficients of a conventional head-related transfer function HRTF at a horizontal-plane azimuth of substantially 45°. FIG. 11(B) schematically shows a stereophonic sound reproduction signal 35 obtained from the same sound source signal according to the present invention, by applying the coefficients of the high-frequency suppression transfer function BRTF in the three-dimensional processing unit 1 and adding the initial and additional reflected sound components in the reflection processing unit 2. For comparison, the sound source signal 33 (broken line) is shown in both figures. In FIGS. 11(A) and 11(B), the horizontal axis is frequency [kHz] and the vertical axis is relative amplitude [dB], and the data shown are for the left ear 70L side with the sound source 71 at substantially +45° (FIG. 7). The two stereophonic sound reproduction signals 34, 35 of FIGS. 11(A) and 11(B) show nearly identical characteristics in the first frequency band 91, at or below substantially 8 kHz, but differ in the second frequency band 92, above substantially 8 kHz.
The stereophonic sound reproduction signal 35 generated by the processing of both the three-dimensional processing unit 1 and the reflection processing unit 2 is sent via the output terminals 7a, 7b to the digital-to-analog converter 40 and is then supplied to the earphones 50 or headphones. In this case, when the stereophonic sound reproduction signal 35 is transmitted to wireless earphones, for example, the signal must be compressed. FIG. 12(A) schematically shows an audio signal 34' obtained by compressing, for example for transmission to wireless earphones, the audio signal 34 (FIG. 11(A)) to which the coefficients of the conventional head-related transfer function HRTF have been applied. FIG. 12(B) schematically shows an audio signal 35' obtained by compressing in the same manner the audio signal 35 (FIG. 11(B)) processed by the three-dimensional processing unit 1 and the reflection processing unit 2 according to the processing method of the present invention. In the conventional audio signal 34', the application of the coefficients of the head-related transfer function HRTF, that is, the addition of short delay components of about 0.0001 seconds after the direct sound, results in the sound image information of the high second frequency band 92 being reduced or lost. In contrast, in the audio signal 35' produced by the processing method of the present invention, an additional reflected sound component delayed by 0.001 to 10 seconds from the direct sound is added, together with the initial reflected sound component, separately from the transfer function, so the reflected sound component added to the second frequency band 92 is not lost and the stereophonic effect is maintained.
FIG. 13 shows a stereophonic device 10' according to a second embodiment of the present invention.
The stereophonic processing device 10' generates a three-dimensional sound reproduction signal from a sound source signal that includes a first frequency band 91 containing the lowest part of the audible range and a second frequency band 92 higher than the first frequency band 91, and supplies the signal to the earphone 50 or headphones. Like the stereophonic device 10 of FIG. 1, it includes a three-dimensional processing unit 1 that performs out-of-head localization processing on at least the first frequency band 91 of the input signal, and a reflection processing unit 2 that is connected to the three-dimensional processing unit 1 and adds a reflected sound component to at least part of the second frequency band 92 of the input signal. The device of FIG. 13 differs from that of FIG. 1 in that it also includes a coefficient generation unit 3 and a coefficient storage unit 4. The coefficient generation unit 3 takes a plurality of pairs of head-related transfer functions HRTF_L, HRTF_R, from sound source positions 71 at a plurality of horizontal-plane azimuth angles between 45° and 90° to the left and right ears 70L, 70R, and generates the coefficients of a plurality of left-right pairs of high-frequency suppression transfer functions BRTF_L, BRTF_R in which the band corresponding to the second frequency band 92 of the sound source signal has been suppressed or attenuated. The coefficient storage unit 4 stores the coefficients of the plurality of left-right pairs of high-frequency suppression transfer functions BRTF_L, BRTF_R. The three-dimensional processing unit 1 includes three-dimensional processing filters 11a-11d, which are assigned one pair selected from the stored coefficients of the plurality of left-right pairs of high-frequency suppression transfer functions BRTF_L, BRTF_R and through which the input signal is passed for out-of-head localization processing.
Next, a three-dimensional processing method according to the present invention using the stereophonic device 10' of FIG. 13 is described.
In the processing method using the stereophonic device 10', the transfer function coefficients are generated and stored as follows. First, a plurality of left-right pairs of head-related transfer functions HRTF_L, HRTF_R, from the positions of the sound source 71 at a plurality of horizontal-plane azimuth angles between 45° and 90° to the left and right ears 70L, 70R, are prepared. Specifically, a plurality of pairs of head-related impulse responses HRIR_L, HRIR_R (80), from the positions of the sound source 71 at a plurality of horizontal-plane azimuth angles θ between 45° and 90° to the left and right ears 70L, 70R, are Fourier-transformed in the coefficient generation unit 3 (s81), and the result is converted to dB values by multiplying its common logarithm by 20 (s82). This yields the plurality of left-right pairs of head-related transfer functions HRTF_L, HRTF_R. Apart from preparing a plurality of pairs, the processing is the same as in the flowchart of the first embodiment (FIG. 8).
Second, in the plurality of left-right pairs of head-related transfer functions HRTF_L, HRTF_R, the band 92' corresponding to the second frequency band is suppressed or attenuated in the coefficient generation unit 3 (s83) to generate a plurality of left-right pairs of high-frequency suppression transfer functions BRTF_L, BRTF_R. Third, the logarithmic values of the high-frequency suppression transfer functions BRTF_L, BRTF_R are converted back to amplitude values (s84), inverse Fourier-transformed (s85), and processed with a window function (s86), which yields a plurality of left-right pairs of high-frequency suppression impulse responses BRIR_L, BRIR_R (90). Fourth, the coefficients h_B0 to h_Bn of the plurality of left-right pairs of high-frequency suppression transfer functions, which correspond to the values of the high-frequency suppression impulse responses BRIR_L, BRIR_R, are stored in the coefficient storage unit 4.
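The coefficient generation steps s81 to s86 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the coefficient generation unit 3: the function names, the 40 dB attenuation, the choice to keep the original phase, and the half-Hann taper used as the window function are all assumptions, since the text specifies only that the band is "suppressed or attenuated" and that "window function processing" is applied. A naive DFT keeps the sketch free of external dependencies.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for short responses."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def suppress_high_band(hrir, fs, cutoff_hz=8000.0, atten_db=40.0):
    """Steps s81-s85: HRIR -> HRTF in dB -> attenuate above cutoff -> impulse response."""
    n = len(hrir)
    spec = dft(hrir)                                        # s81: Fourier transform
    shaped = []
    for k, c in enumerate(spec):
        level_db = 20.0 * math.log10(max(abs(c), 1e-12))    # s82: 20 * log10(|H|)
        freq = min(k, n - k) * fs / n                       # mirrored bin frequency in Hz
        if freq > cutoff_hz:
            level_db -= atten_db                            # s83: attenuate the band 92'
        magnitude = 10.0 ** (level_db / 20.0)               # s84: dB back to amplitude
        shaped.append(cmath.rect(magnitude, cmath.phase(c)))  # phase kept (assumption)
    return idft(shaped)                                     # s85: inverse Fourier transform


def taper(ir):
    """Step s86: half-Hann fade-out (assumed window), 1 at the start and 0 at the end."""
    n = len(ir)
    if n < 2:
        return list(ir)
    return [v * 0.5 * (1.0 + math.cos(math.pi * t / (n - 1)))
            for t, v in enumerate(ir)]


def make_brir(hrir, fs, cutoff_hz=8000.0, atten_db=40.0):
    """Return the coefficients h_B0..h_Bn for one ear from one measured HRIR."""
    return taper(suppress_high_band(hrir, fs, cutoff_hz, atten_db))
```

Calling `make_brir` once per ear and per azimuth angle would give the left-right pairs that the text stores in the coefficient storage unit 4.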
Next, the processing signal of the three-dimensional processing unit 1 is generated as follows. First, from the coefficients h_B0 to h_Bn of the plurality of left-right pairs of high-frequency suppression transfer functions stored in the coefficient storage unit 4, the coefficients of only one left-right pair are selected manually, automatically, or under program control. Second, the coefficients h_B0 to h_Bn of that single left-right pair, corresponding to only the one selected horizontal-plane azimuth angle, are assigned to the three-dimensional processing filters 11a-11d, whether or not transfer function coefficients have already been assigned to them. Third, the sound source signal is passed through the three-dimensional processing filters 11a-11d to which the coefficients h_B0 to h_Bn of the single left-right pair have been assigned, generating a processing signal of the three-dimensional processing unit 1 in which at least the first frequency band 91 has undergone out-of-head localization processing.
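The selection and filtering steps above might look like the following sketch. Several points are assumptions made for illustration: the store is modeled as a dictionary keyed by azimuth angle, the nearest stored angle is chosen, the two responses of a pair are treated as same-side and opposite-side responses for symmetrically placed virtual sources, and the routing of the filters 11a-11d to the adders 16a, 16b is inferred rather than stated in this passage.

```python
def fir(x, h):
    """Direct-form FIR filter; the output is truncated to the input length."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def select_pair(store, azimuth_deg):
    """Pick the single stored pair whose angle is closest to the request."""
    angle = min(store, key=lambda a: abs(a - azimuth_deg))
    return store[angle]


def localize(left_in, right_in, pair):
    """Pass a stereo signal through four filters that share one coefficient pair."""
    h_same, h_opposite = pair
    out_11a = fir(left_in, h_same)        # left input to left ear
    out_11b = fir(left_in, h_opposite)    # left input to right ear
    out_11c = fir(right_in, h_opposite)   # right input to left ear
    out_11d = fir(right_in, h_same)       # right input to right ear
    out_left = [a + c for a, c in zip(out_11a, out_11c)]    # adder 16a (assumed routing)
    out_right = [b + d for b, d in zip(out_11b, out_11d)]   # adder 16b (assumed routing)
    return out_left, out_right
```

Because only one pair is active at a time, the per-sample cost is four short convolutions regardless of how many angles are stored.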
The processing signal of the three-dimensional processing unit 1 is then input to the reflection processing unit 2, which adds a reflected sound component to at least part of the second frequency band 92, a band that includes the high range above the first frequency band 91 of the input signal, and thereby generates the processing signal of the reflection processing unit 2. Finally, the three-dimensional sound reproduction signal generated by the processing of both the three-dimensional processing unit 1 and the reflection processing unit 2 is supplied to the earphone 50 or headphones. Like the stereophonic device 10 (FIG. 1), the stereophonic device 10' (FIG. 13) and the processing method using it provide the advantage that high-quality stereophonic sound with little individual variation can be formed with a small amount of computation.
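One way to picture the reflection processing is a feedback loop built from the delayer 22, the multiplier 23, and the frequency filter 24 named in the reference sign list. The sketch below is an assumption-laden illustration: the one-pole high-pass filter, the gain value, and the function name are hypothetical, and the document does not state the filter type or coefficients.

```python
def add_reflections(x, delay, gain=0.5, hp_coeff=0.5):
    """Add recirculating reflections: y[n] = x[n] + HP(gain * y[n - delay]).

    delay is in samples and must be at least 1; for example, 0.001 s at a
    48 kHz sample rate is 48 samples. With |gain| < 1 and 0 < hp_coeff <= 1
    the loop gain stays below 1, so the reflections decay.
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    y = []
    prev_in = 0.0    # previous input of the frequency filter
    prev_out = 0.0   # previous output of the frequency filter
    for n, sample in enumerate(x):
        # Delayer 22 followed by multiplier 23, fed from the branch point.
        u = gain * y[n - delay] if n >= delay else 0.0
        # Frequency filter 24: one-pole high-pass (assumed), favoring the upper band.
        f = hp_coeff * (prev_out + u - prev_in)
        prev_in, prev_out = u, f
        # Adder at the addition point: direct signal plus feedback signal.
        y.append(sample + f)
    return y
```

Applying the function to each output channel would add reflections concentrated in the upper band while leaving the direct sound untouched.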
In the first and second embodiments shown in FIGS. 1 and 13, the sound source signal input to the stereophonic device 10, 10' is processed first by the three-dimensional processing unit 1 and then by the reflection processing unit 2. The order may be reversed, with the reflection processing unit 2 first and the three-dimensional processing unit 1 second, and the same effects are obtained. The first and second embodiments show two-channel stereophonic devices 10, 10', but a multichannel configuration with two or more channels may also be used. In the first and second embodiments, the three-dimensional processing unit 1 applies out-of-head localization processing to the whole of the first and second frequency bands 91, 92 of the sound source signal, but out-of-head localization processing using the coefficients of the high-frequency suppression transfer function BRTF may instead be applied to the first frequency band 91 only.
FIG. 14 shows a stereophonic device 10" according to a third embodiment, which includes an M/S delay unit 93 connected to the reflection processing unit 2 at a position after the processing of both the three-dimensional processing unit 1 and the reflection processing unit 2. FIG. 15 shows an exemplary configuration of the M/S delay unit 93. The M/S delay unit 93 includes an M/S converter 98 that receives the output signals of the reflection processing unit 2 at its L and R terminals and converts them into center and side signals, a reflection processing filter 21a that delays each of the center and side signals from the M/S converter 98, and an L/R converter 99 that receives the delayed signals from the reflection processing filter 21a at its M and S terminals, converts them back into left and right signals, and sends them from its L and R terminals to the output ends 7a, 7b. The reflection processing filter 21a has the same configuration as in FIG. 5, so the same reference signs are used and its description is omitted. The delayer 22 of the reflection processing filter 21a delays the center and side signals separated by the M/S converter 98, either each signal separately or only one of them, and the L/R converter 99 then converts them back into left and right signals. This emphasizes the three-dimensionality and localization of the sound through the Haas effect (the phenomenon in which, when the same sound is heard from several directions at the same volume, it is localized toward the direction from which it arrives first). Even when the center and side signals contain the same component and the sense of three-dimensionality is weak, giving the center and side signals different relative delays shifts localization toward the leading sound and forms stereophonic sound. Alternatively, a Fourier transform may be performed and the components for which |L amplitude| - |R amplitude|, (L amplitude)^2 - (R amplitude)^2, or arctan(|L amplitude| / |R amplitude|) falls within a certain range (preferably about arctan(0.7) ≤ arctan(|L amplitude| / |R amplitude|) ≤ arctan(1.3)) may be separated as the center signal component and the rest as the side signal component, each then being restored by an inverse Fourier transform (| | denotes the absolute value). Because music signals generally place low-frequency components in the center, the coefficients of the low-frequency portion (preferably up to about 200 Hz) may be treated as center signal components regardless of the arctan value. It is also possible to extract only the center signal component and subtract it from the original signal to obtain the side signal component.
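The time-domain path through the M/S delay unit 93 can be sketched as below. The sum and difference convention with a factor of 1/2, the sample-based delay arguments, and the function names are assumptions for illustration; the passage does not give the conversion formulas or delay values.

```python
def delay_signal(x, d):
    """Delay x by d samples, padding with zeros and keeping the original length."""
    d = min(max(d, 0), len(x))
    return [0.0] * d + list(x[:len(x) - d])


def ms_delay(left, right, center_delay=0, side_delay=0):
    """Convert L/R to center/side, delay each independently, and convert back."""
    # M/S converter 98 (assumed convention: half-sum and half-difference).
    center = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    # Delayer 22 inside reflection processing filter 21a.
    center = delay_signal(center, center_delay)
    side = delay_signal(side, side_delay)
    # L/R converter 99.
    out_left = [c + s for c, s in zip(center, side)]
    out_right = [c - s for c, s in zip(center, side)]
    return out_left, out_right
```

With both delays at zero the function returns its input unchanged, which makes it easy to check that the M/S round trip itself is transparent before experimenting with relative delays.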
FIG. 16 shows a stereophonic device 10''' according to a fourth embodiment of the present invention, which includes an M/S processor 51 that separates center and side signals but does not include the reflection processing unit 2. The device includes first and second input ends 6a, 6b through which the sound source signal is input to the three-dimensional processing device 10''', and a three-dimensional processing unit 1 that performs out-of-head localization processing on at least the first frequency band 91 of the input signals from the first and second input ends 6a, 6b. The three-dimensional processing unit 1 includes first and second three-dimensional processing filters 11a, 11b connected to the first input end 6a; third and fourth three-dimensional processing filters 11c, 11d connected to the second input end 6b; an M/S processor 51 that receives the output signals of the second and third three-dimensional processing filters 11b, 11c as left and right signals, separates them into a center signal and a side signal, and adds reflected sound to at least the side signal or to the side signal only; one adder 16a that adds the other output signal of the M/S processor 51 (from the adder 57b of FIG. 2B) to the output signal of the first three-dimensional processing filter 11a; and the other adder 16b that adds one output signal of the M/S processor 51 (from the adder 57a of FIG. 2B) to the output signal of the fourth three-dimensional processing filter 11d. The specific configuration of the M/S processor 51 is substantially the same as that of reference sign 51 illustrated in FIG. 2B, so a detailed description is omitted. The coefficients of only one left-right pair of transfer functions are assigned to the first to fourth three-dimensional processing filters 11a-11d in advance or by selection. The one left-right pair of transfer functions is a single left-right pair of high-frequency suppression transfer functions obtained by suppressing or attenuating the band 92' corresponding to the second frequency band of the sound source signal in a pair of head-related transfer functions from a sound source position at only one horizontal-plane azimuth angle between 45° and 90° to the left and right ears. With this configuration, the signals on the sound source side (the output signals of the first or fourth three-dimensional processing filter 11a or 11d) do not pass through the M/S processor 51, while the signals on the side opposite the sound source (the output signals of the second or third three-dimensional processing filter 11b or 11c) pass through the M/S processor 51, where a reflected sound component is added to enhance human spatial acoustic perception. For a stereo sound source, the sense of spaciousness depends strongly on the side signal component, so after conversion to center and side signals, additional processing such as adding reflected sound may be applied to the side signal component only.
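The signal flow of this embodiment, from the four filter outputs to the two adders, can be sketched as follows. The internals of the M/S processor 51 are described with reference to FIG. 2B, which is outside this passage, so the single delayed and scaled copy used here as the reflected sound, the half-sum and half-difference convention, and all names are assumptions.

```python
def ms_side_reflection(in_left, in_right, delay, gain):
    """Hypothetical stand-in for M/S processor 51: reflection added to the side signal only."""
    center = [(l + r) / 2 for l, r in zip(in_left, in_right)]
    side = [(l - r) / 2 for l, r in zip(in_left, in_right)]
    # Side signal plus one delayed, attenuated copy of itself (assumed reflection model).
    side_out = [s + (gain * side[n - delay] if n >= delay else 0.0)
                for n, s in enumerate(side)]
    out_a = [c + s for c, s in zip(center, side_out)]   # corresponds to the 11b path
    out_b = [c - s for c, s in zip(center, side_out)]   # corresponds to the 11c path
    return out_a, out_b


def mix_embodiment4(out_11a, out_11b, out_11c, out_11d, delay, gain):
    """Combine the direct filter outputs with the M/S-processed cross signals."""
    proc_a, proc_b = ms_side_reflection(out_11b, out_11c, delay, gain)
    out_left = [d + p for d, p in zip(out_11a, proc_b)]    # adder 16a
    out_right = [d + p for d, p in zip(out_11d, proc_a)]   # adder 16b
    return out_left, out_right
```

The direct signals from the filters 11a and 11d bypass the reflection stage entirely, which matches the stated intent that only the signals on the side opposite the sound source are processed.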
The present invention may also be a stereophonic processing program that causes a computer to function as each unit of the stereophonic processing device 10, 10', 10", 10'''. In this case, the processing performed by each unit is described in the program, and executing the program on a computer realizes the processing of each unit on the computer. For example, although not shown, a storage unit that stores the stereophonic processing program and a control unit that is connected to the storage unit and controls the processing operations of the program are provided. The storage unit includes a computer-readable storage medium such as a magnetic storage device, an optical disc, a magneto-optical storage medium, or a semiconductor memory. The three-dimensional processing method described in detail in the first and second embodiments may also be executed by the three-dimensional processing program stored in the storage unit.
The stereophonic processing device, processing method, and processing program of the present invention can be used not only with music sources but with any stereo audio source, such as environmental sound, conversation, and game audio.
1: three-dimensional processing unit; 2: reflection processing unit; 3: coefficient generation unit; 4: coefficient storage unit; 6a, 6b: input ends; 10, 10': stereophonic processing devices; 11a-11d: three-dimensional processing filters; 16a, 16b: adders; 21a, 21b: reflection processing filters; 22: delayer; 23: multiplier; 24: frequency filter; 25a, 25b: branch points; 26a, 26b: addition points; 27a, 27b: adders; 51: M/S processor

Claims (10)

  1.  A stereophonic processing device that generates a three-dimensional sound reproduction signal from a sound source signal including a first frequency band containing the lowest part of the audible range and a second frequency band higher than the first frequency band, and supplies the signal to earphones or headphones, the device comprising:
    a three-dimensional processing unit that performs out-of-head localization processing on at least the first frequency band of an input signal; and
    a reflection processing unit that is connected to the three-dimensional processing unit and adds a reflected sound component to at least part of the second frequency band of the input signal,
    wherein the three-dimensional processing unit includes a three-dimensional processing filter to which the coefficients of only one left-right pair of transfer functions are assigned in advance or by selection, and
    the one left-right pair of transfer functions is a single left-right pair of high-frequency suppression transfer functions obtained by suppressing or attenuating the band corresponding to the second frequency band of the sound source signal in a pair of head-related transfer functions from a sound source position at only one horizontal-plane azimuth angle between 45° and 90° to the left and right ears.
  2.  A stereophonic processing device that generates a three-dimensional sound reproduction signal from a sound source signal including a first frequency band containing the lowest part of the audible range and a second frequency band higher than the first frequency band, and supplies the signal to earphones or headphones, the device comprising:
    first and second input ends through which the sound source signal is input to the three-dimensional processing device; and
    a three-dimensional processing unit that performs out-of-head localization processing on at least the first frequency band of the input signals from the first and second input ends,
    wherein the three-dimensional processing unit includes:
    first and second three-dimensional processing filters connected to the first input end;
    third and fourth three-dimensional processing filters connected to the second input end;
    an M/S processor that receives the output signals of the second and third three-dimensional processing filters as left and right signals, separates them into a center signal and a side signal, and adds a reflected sound component to the side signal;
    the other adder, which adds one output signal of the M/S processor to the output signal of the fourth three-dimensional processing filter; and
    one adder, which adds the other output signal of the M/S processor to the output signal of the first three-dimensional processing filter,
    the coefficients of only one left-right pair of transfer functions are assigned to the first to fourth three-dimensional processing filters in advance or by selection, and
    the one left-right pair of transfer functions is a single left-right pair of high-frequency suppression transfer functions obtained by suppressing or attenuating the band corresponding to the second frequency band of the sound source signal in a pair of head-related transfer functions from a sound source position at only one horizontal-plane azimuth angle between 45° and 90° to the left and right ears.
  3.  The stereophonic processing device according to claim 1, wherein the reflection processing unit includes:
    a reflection processing filter that receives a signal from a branch point between the input end, through which the sound source signal is input to the three-dimensional processing device, and the earphones or headphones; and
    an adder that, at an addition point between the input end and the branch point, adds the sound source signal or the processing signal of the three-dimensional processing unit to a feedback signal from the reflection processing filter,
    and the reflection processing filter includes:
    a delayer that receives the signal from the branch point;
    a multiplier that amplifies the signal from the delayer; and
    a frequency filter to which coefficients are assigned, which generates a reflected sound component from the signal amplified by the multiplier and sends the reflected sound component to the adder as the feedback signal.
  4.  The stereophonic processing device according to any one of claims 1 to 3, wherein the first frequency band is a band within the range of 0 to 8 kHz, and the second frequency band is a band within the range of 8 kHz to 96 kHz.
  5.  A stereophonic processing method that generates a three-dimensional sound reproduction signal from a sound source signal including a first frequency band containing the lowest part of the audible range and a second frequency band higher than the first frequency band, and supplies the signal to earphones or headphones, the method comprising:
    a step of assigning transfer function coefficients in advance to a three-dimensional processing filter of a three-dimensional processing unit; and
    a step of inputting the sound source signal or a processing signal of a reflection processing unit to the three-dimensional processing filter to which the transfer function coefficients have been assigned in advance, and performing out-of-head localization processing on at least the first frequency band of the input signal to generate a processing signal of the three-dimensional processing unit,
    wherein the step of assigning the transfer function coefficients in advance includes:
    a step of preparing a left-right pair of head-related transfer functions from a sound source position at only one horizontal-plane azimuth angle between 45° and 90° to the left and right ears;
    a step of suppressing or attenuating the band corresponding to the second frequency band in the left-right pair of head-related transfer functions to generate a left-right pair of high-frequency suppression transfer functions;
    a step of processing the left-right pair of high-frequency suppression transfer functions by an inverse Fourier transform to obtain a left-right pair of high-frequency suppression impulse responses; and
    a step of assigning in advance to the three-dimensional processing filter the coefficients of only the one left-right pair of high-frequency suppression transfer functions, which correspond to the values of the high-frequency suppression impulse responses.
  6.  A stereophonic processing method that generates a three-dimensional sound reproduction signal from a sound source signal including a first frequency band containing the lowest part of the audible range and a second frequency band higher than the first frequency band, and supplies the signal to earphones or headphones, the method comprising:
    a step of generating and storing transfer function coefficients; and
    a step of inputting the sound source signal or a processing signal of a reflection processing unit to a three-dimensional processing filter of a three-dimensional processing unit, and performing out-of-head localization processing on at least the first frequency band of the input signal to generate a processing signal of the three-dimensional processing unit,
    wherein the step of generating and storing the transfer function coefficients includes:
    a step of preparing a plurality of left-right pairs of head-related transfer functions from sound source positions at a plurality of horizontal-plane azimuth angles between 45° and 90° to the left and right ears;
    a step of suppressing or attenuating, in a coefficient generation unit, the band corresponding to the second frequency band in the plurality of left-right pairs of head-related transfer functions to generate a plurality of left-right pairs of high-frequency suppression transfer functions;
    a step of processing the plurality of left-right pairs of high-frequency suppression transfer functions by an inverse Fourier transform to obtain a plurality of left-right pairs of high-frequency suppression impulse responses; and
    a step of storing in a coefficient storage unit the coefficients of the plurality of left-right pairs of high-frequency suppression transfer functions, which correspond to the values of the high-frequency suppression impulse responses,
    and the step of generating the processing signal of the three-dimensional processing unit includes:
    a step of selecting, from the coefficients of the plurality of left-right pairs of high-frequency suppression transfer functions stored in the coefficient storage unit, the coefficients of only one left-right pair;
    a step of assigning the coefficients of the one left-right pair of high-frequency suppression transfer functions, corresponding to only the one selected horizontal-plane azimuth angle, to the three-dimensional processing filter, whether or not transfer function coefficients have already been assigned to it; and
    a step of passing the sound source signal or the processing signal of the reflection processing unit through the three-dimensional processing filter to which the coefficients of the one left-right pair of high-frequency suppression transfer functions have been assigned, to generate a processing signal of the three-dimensional processing unit in which at least the first frequency band has undergone out-of-head localization processing.
  7.  The stereophonic processing method according to claim 5 or 6, further comprising:
    a step of inputting the processing signal of the three-dimensional processing unit or the sound source signal to the reflection processing unit, and adding a reflected sound component to at least part of the second frequency band of the input signal to generate the processing signal of the reflection processing unit; and
    a step of supplying to earphones or headphones the three-dimensional sound reproduction signal generated by the processing of both the three-dimensional processing unit and the reflection processing unit.
  8.  The stereophonic processing method according to claim 7, wherein the process of generating the processing signal of the reflection processing unit includes:
    The process in which the processing signal of the three-dimensional processing unit, or the sound source signal, is input to the reflection processing unit and passes through the reflection processing unit, and
    The process of adding, to the second frequency band of the input signal, an initial reflected sound component and a reflected sound component delayed by 0.001 to 10 seconds relative to the direct sound.
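The reflection step of claims 7 and 8 can be illustrated in a similar way. The sketch below assumes NumPy, a 48 kHz sampling rate, an 8 kHz band edge, a band split done with an FFT mask, and hypothetical delay and gain values chosen inside the claimed range of 0.001 to 10 seconds. The specification does not prescribe these names, values, or this particular way of isolating the second frequency band.

```python
import numpy as np

FS = 48_000            # assumed sampling rate in Hz
BAND_EDGE_HZ = 8_000   # assumed boundary between first and second bands


def split_bands(signal, fs=FS, edge_hz=BAND_EDGE_HZ):
    """Split a signal into complementary low and high bands with an FFT mask."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    high_mask = freqs >= edge_hz
    low = np.fft.irfft(np.where(high_mask, 0.0, spectrum), n=len(signal))
    high = np.fft.irfft(np.where(high_mask, spectrum, 0.0), n=len(signal))
    return low, high


def add_reflections(signal, reflections, fs=FS):
    """Add delayed, scaled copies of the high band only.

    reflections is a list of (delay_seconds, gain) tuples; both values are
    illustrative assumptions, not figures from the specification.
    """
    for delay_s, _ in reflections:
        if not 0.001 <= delay_s <= 10.0:
            raise ValueError("delay must lie between 0.001 and 10 seconds")
    low, high = split_bands(signal, fs)
    delays = [int(round(delay_s * fs)) for delay_s, _ in reflections]
    out = np.zeros(len(signal) + max(delays))
    out[: len(signal)] = low + high          # direct sound, left unchanged
    for delay, (_, gain) in zip(delays, reflections):
        out[delay : delay + len(signal)] += gain * high
    return out


# A 1 kHz tone (first band) plus a 12 kHz tone (second band), 0.1 s long.
t = np.arange(4800) / FS
low_tone = np.sin(2 * np.pi * 1000 * t)
high_tone = np.sin(2 * np.pi * 12000 * t)
reflections = [(0.005, 0.5), (0.02, 0.25)]   # hypothetical early and later reflection
wet = add_reflections(low_tone + high_tone, reflections)
```

Because only the high band is delayed and summed, a signal confined to the first frequency band passes through this sketch without any added reflection tail.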
  9.  The stereophonic processing method according to any one of claims 5 to 8, wherein the first frequency band is a band within the range of 0 to 8 kHz, and the second frequency band is a band within the range of 8 kHz to 96 kHz.
  10.  A stereophonic processing program that causes a computer to function as each unit of the stereophonic processing device according to any one of claims 1 to 4.
PCT/JP2021/043354 2020-12-29 2021-11-26 Stereophonic processing device, stereophonic processing method, and stereophonic processing program WO2022145154A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-219867 2020-12-29
JP2020219867A JP7319687B2 (en) 2020-12-29 2020-12-29 3D sound processing device, 3D sound processing method and 3D sound processing program

Publications (1)

Publication Number Publication Date
WO2022145154A1 true WO2022145154A1 (en) 2022-07-07

Family

ID=82260373

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/043354 WO2022145154A1 (en) 2020-12-29 2021-11-26 Stereophonic processing device, stereophonic processing method, and stereophonic processing program

Country Status (2)

Country Link
JP (1) JP7319687B2 (en)
WO (1) WO2022145154A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110026718A1 (en) * 2006-01-04 2011-02-03 Texas Instruments Incorporated Virtualizer with cross-talk cancellation and reverb
JP2014239429A (en) * 2013-05-23 2014-12-18 ジーエヌ リザウンド エー/エスGn Resound A/S Hearing aid with spatial signal enhancement
US20200336858A1 (en) * 2019-04-18 2020-10-22 Facebook Technologies, Llc Individualization of head related transfer function templates for presentation of audio content

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIROSHI IWAMURA: "Spatial Acoustic Reproduction Using Speaker State Space Model and Modern Control Theory", PROCEEDINGS OF THE 2019 SPRING MEETING OF THE ACOUSTICAL SOCIETY OF JAPAN; TOKYO, JAPAN; MARCH 3-5, 2019, 19 February 2019 (2019-02-19) - 5 March 2019 (2019-03-05), JP, pages 355 - 358, XP009538012 *
TOMOMI HASEGAWA: "1-10-9 Influence of reflected sound structure on front-back perception", PROCEEDINGS OF THE 2015 SPRING MEETING OF THE ACOUSTICAL SOCIETY OF JAPAN; TOKYO, JAPAN; MARCH 16-18, 2015, 6 March 2015 (2015-03-06) - 18 March 2015 (2015-03-18), JP, pages 669 - 670, XP009538011 *

Also Published As

Publication number Publication date
JP7319687B2 (en) 2023-08-02
JP2022104731A (en) 2022-07-11

Similar Documents

Publication Publication Date Title
US9432793B2 (en) Head-related transfer function convolution method and head-related transfer function convolution device
JP5533248B2 (en) Audio signal processing apparatus and audio signal processing method
US8873761B2 (en) Audio signal processing device and audio signal processing method
JP4780119B2 (en) Head-related transfer function measurement method, head-related transfer function convolution method, and head-related transfer function convolution device
KR100739798B1 (en) Method and apparatus for reproducing a virtual sound of two channels based on the position of listener
JP6824155B2 (en) Audio playback system and method
KR100636252B1 (en) Method and apparatus for spatial stereo sound
KR20050060789A (en) Apparatus and method for controlling virtual sound
JP2012004668A (en) Head transmission function generation device, head transmission function generation method, and audio signal processing apparatus
CN109155896B (en) System and method for improved audio virtualization
US7826630B2 (en) Sound image localization apparatus
EP2484127B1 (en) Method, computer program and apparatus for processing audio signals
CN112956210B (en) Audio signal processing method and device based on equalization filter
US8320590B2 (en) Device, method, program, and system for canceling crosstalk when reproducing sound through plurality of speakers arranged around listener
CN101489173A (en) Signal processing apparatus, signal processing method, and storage medium
WO2022145154A1 (en) Stereophonic processing device, stereophonic processing method, and stereophonic processing program
JP2011259299A (en) Head-related transfer function generation device, head-related transfer function generation method, and audio signal processing device
JP4427915B2 (en) Virtual sound image localization processor
JP5163685B2 (en) Head-related transfer function measurement method, head-related transfer function convolution method, and head-related transfer function convolution device
JP2009105565A (en) Virtual sound image localization processor and virtual sound image localization processing method
JP3090416B2 (en) Sound image control device and sound image control method
US11545130B1 (en) System and method for an audio reproduction device
DK180449B1 (en) A method and system for real-time implementation of head-related transfer functions
JPH10126898A (en) Device and method for localizing sound image
CA3192986A1 (en) Sound reproduction with multiple order hrtf between left and right ears

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21915015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21915015

Country of ref document: EP

Kind code of ref document: A1