WO2016056410A1 - Speech processing device and method, and program - Google Patents
- Publication number
- WO2016056410A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spatial
- sound
- frequency
- spatial filter
- microphone array
- Prior art date
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S3/00—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
- G01S3/80—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
- G01S3/802—Systems for determining direction or deviation from predetermined direction
- G01S3/808—Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/406—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/403—Linear arrays of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Definitions
- the present technology relates to an audio processing device and method, and a program, and more particularly to an audio processing device and method, and a program capable of improving sound image localization at lower cost.
- a wavefront synthesis technique for reproducing a sound field using a flat speaker array or a linear speaker array is known.
- Such a wavefront synthesis technique can be used for next-generation bidirectional communication, for example, as shown in FIG.
- next-generation bidirectional communication is performed between a space P11 where the caller W11 is located and a space P12 where the caller W12 is located.
- the sound field A, consisting mainly of the voice uttered by the caller W11, is picked up by the linear microphone array MCA11, which consists of a plurality of microphones arranged in the vertical direction in the figure, and the resulting sound source signal is transmitted to the space P12.
- the arrows in the figure indicate the propagation direction of the sound from the speaker W11 as the sound source; the sound of the speaker W11 arrives from the direction of the angle θ as viewed from the linear microphone array MCA11 and is collected.
- this angle θ, that is, the angle between the propagation direction of the sound from the sound source and the direction in which the microphones constituting the microphone array are arranged, is referred to as the arrival angle θ.
- in the space P12, a speaker drive signal for reproducing the sound field A is generated from the sound source signal transmitted from the space P11, and the sound field A is reproduced based on the generated speaker drive signal by the linear speaker array SPA11, which consists of a plurality of speakers arranged in the vertical direction in the figure.
- the arrow in the figure indicates the propagation direction of the sound that is output from the linear speaker array SPA11 and propagated to the caller W12.
- the angle formed by the propagation direction and the linear speaker array SPA11 is the same as the arrival angle θ.
- a linear microphone array is also provided in the space P12, and this linear microphone array collects a sound field B mainly composed of voices uttered by the caller W12.
- the resulting sound source signal is transmitted to the space P11.
- a speaker drive signal is generated from the sound source signal transmitted from the space P12, and the sound field B is reproduced by a linear speaker array (not shown) based on the obtained speaker drive signal.
- the highest spatial frequency that is not affected by spatial aliasing (hereinafter referred to as the upper limit spatial frequency) k_lim is determined by the lower of the spatial Nyquist frequencies computed from the distance between the speakers constituting the speaker array and the distance between the microphones constituting the microphone array.
- the upper limit spatial frequency k_lim is obtained by the following equation (1).
- the upper limit spatial frequency k_lim obtained in this way affects the localization of the sound image, and in general a higher value is preferable.
- the relationship between the frequency f of the sound source signal (hereinafter referred to as the time frequency) and the spatial frequency k is as shown in the following equation (2).
- c represents the speed of sound.
- the highest time frequency that is not affected by spatial aliasing (hereinafter referred to as the upper limit time frequency) f_lim can be obtained from equation (2).
- the upper limit time frequency f_lim affects the sound quality; in general, a higher value means higher reproducibility, that is, HiFi (High Fidelity).
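Equations (1) and (2) themselves are not reproduced in this excerpt. As a hedged numerical sketch, the following assumes the standard spatial Nyquist relation k_lim = min(π/d_mic, π/d_spk) and the plane-wave relation f = c·k/(2π) implied by the surrounding text; the function name and the 5 cm spacings are illustrative, not taken from the patent.

```python
import numpy as np

def upper_limit_frequencies(d_mic, d_spk, c=343.0):
    """Upper-limit spatial frequency k_lim [rad/m] and the corresponding
    upper-limit time frequency f_lim [Hz], assuming the standard spatial
    Nyquist relation k_nyq = pi / spacing and f = c * k / (2 * pi)."""
    k_lim = min(np.pi / d_mic, np.pi / d_spk)  # lower of the two Nyquist limits
    f_lim = c * k_lim / (2.0 * np.pi)
    return k_lim, f_lim

# 5 cm spacing on both arrays gives k_lim ~ 62.8 rad/m, f_lim = 3430 Hz
k_lim, f_lim = upper_limit_frequencies(d_mic=0.05, d_spk=0.05)
```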
- FIG. 2 shows a spatial spectrum according to the difference in the arrival angle of the plane wave of the sound from the sound source.
- the spatial spectrum is also called an angle spectrum because the position of the spectrum peak changes depending on the arrival angle of the plane wave.
- the vertical axis indicates the time frequency f
- the horizontal axis indicates the spatial frequency k.
- straight lines L11 to L13 indicate spectral peaks, respectively.
- the spectrum peak appears in the positive direction of the spatial frequency k.
- the straight line L12 corresponds to the straight line L11, and shows a spectral peak that should appear originally.
- the straight line L13 shows a spectrum peak that appears due to spatial aliasing.
- spatial aliasing appears noticeably in the region where the time frequency f is higher than the upper limit time frequency f_lim and the spatial frequency k is negative.
- the spectrum peak should appear in the region where the spatial frequency k is negative only when the plane-wave arrival angle θ lies between π/2 and π.
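As a hedged illustration of why the aliased peak in FIG. 2 lands at negative spatial frequencies: a discretely sampled array observes spatial frequencies only within [-k_lim, k_lim), and components above that band wrap back into it with flipped sign. The function name and values below are illustrative, not from the patent.

```python
import numpy as np

def observed_spatial_frequency(k_true, k_lim):
    """Spatial frequency actually observed by a discrete array: the true
    spatial frequency wraps (aliases) into the interval [-k_lim, k_lim)."""
    return (k_true + k_lim) % (2.0 * k_lim) - k_lim

k_lim = np.pi / 0.05                                      # ~62.8 rad/m for 5 cm spacing
in_band = observed_spatial_frequency(0.5 * k_lim, k_lim)  # within band: unchanged
aliased = observed_spatial_frequency(1.5 * k_lim, k_lim)  # above band: wraps negative
```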
- a technique has been proposed that further increases the upper limit time frequency f_lim that is not affected by spatial aliasing by using two speaker arrays with different speaker spacings (see, for example, Patent Document 1). According to this technique, a signal having a higher time frequency can be reproduced accurately.
- the present technology has been made in view of such circumstances, and is intended to improve the localization of sound images at a lower cost.
- An audio processing apparatus according to one aspect of the present technology includes a direction information acquisition unit that acquires direction information indicating the direction of a sound source, and a spatial filter application unit that applies, to a sound collection signal obtained by collecting sound from the sound source with a microphone array including a plurality of microphones, a spatial filter whose characteristics are determined by the direction information.
- the spatial filter application unit can determine a center frequency and a bandwidth, as the characteristics of the spatial filter, based on the direction information.
- the spatial filter may be a filter that takes the spatial frequency band determined by the center frequency and the bandwidth as its transmission frequency band and transmits the components of the collected sound signal within that band.
- the spatial filter may be a filter that takes the time frequency band determined by the center frequency and the bandwidth as its transmission frequency band and transmits the components of the collected sound signal within that band.
- the spatial filter application unit may determine the characteristics of the spatial filter such that the bandwidth becomes wider as the angle between the direction of the sound source indicated by the direction information and the microphone array approaches π/2.
- the microphone array can be a linear microphone array.
- An audio processing method or program according to one aspect of the present technology includes the steps of acquiring direction information indicating the direction of a sound source, and applying, to a sound collection signal obtained by collecting sound from the sound source with a microphone array including a plurality of microphones, a spatial filter whose characteristics are determined by the direction information.
- In one aspect of the present technology, direction information indicating the direction of a sound source is acquired, and a spatial filter whose characteristics are determined by the direction information is applied to a sound collection signal obtained by collecting sound from the sound source with a microphone array including a plurality of microphones.
- sound image localization can be improved at lower cost.
- an increase in the upper limit time frequency f_lim is realized by reducing spatial aliasing at the expense of wavefronts propagating in non-target directions.
- in the present technology, the propagation direction of the wavefront whose reproduction is to be prioritized, and the other, non-target directions, can be specified. The upper limit time frequency f_lim can therefore be increased by blocking the spatial frequencies corresponding to the specified non-target directions.
- FIG. 3 is a diagram illustrating a configuration example of an embodiment of a spatial aliasing controller to which the present technology is applied.
- the spatial aliasing controller 11 has a transmitter 21 and a receiver 22.
- the transmitter 21 is arranged in a sound collection space for collecting the sound field
- the receiver 22 is arranged in a reproduction space for reproducing the sound field collected in the sound collection space.
- the transmitter 21 collects the sound field, generates a spatial frequency spectrum from the collected sound signal obtained by the sound collection, and transmits the spatial frequency spectrum to the receiver 22.
- the receiver 22 receives the spatial frequency spectrum transmitted from the transmitter 21 to generate a speaker drive signal, and reproduces a sound field based on the obtained speaker drive signal.
- the transmitter 21 includes a microphone array 31, a time frequency analysis unit 32, a spatial frequency analysis unit 33, and a communication unit 34.
- the receiver 22 includes an orientation information acquisition unit 35, a communication unit 36, a drive signal generation unit 37, a spatial filter application unit 38, a spatial frequency synthesis unit 39, a time frequency synthesis unit 40, and a speaker array 41.
- the microphone array 31 is, for example, a linear microphone array composed of a plurality of microphones arranged in a straight line; it picks up the plane wave of the incoming sound and supplies the collected sound signal obtained by each microphone to the time frequency analysis unit 32.
- the time frequency analysis unit 32 performs time frequency conversion on the collected sound signal supplied from the microphone array 31 and supplies the time frequency spectrum obtained as a result to the spatial frequency analysis unit 33.
- the spatial frequency analysis unit 33 performs spatial frequency conversion on the temporal frequency spectrum supplied from the temporal frequency analysis unit 32 and supplies the spatial frequency spectrum obtained as a result to the communication unit 34.
- the communication unit 34 transmits the spatial frequency spectrum supplied from the spatial frequency analysis unit 33 to the communication unit 36 of the receiver 22 by wire or wireless.
- the azimuth information acquisition unit 35 of the receiver 22 acquires speaker azimuth information indicating the azimuth (direction) of the caller, that is, the sound source of the sound collected by the microphone array 31, and supplies it to the spatial filter application unit 38.
- in this example, the sound source of the collected sound field is a caller.
- the sound source is not limited to a caller, however, and may be any sound source, such as a car or a source of environmental sound.
- the speaker orientation information may be any information that indicates the relative positional relationship between the main sound source and the listener, for example the direction of the caller, as the sound source, relative to the listener who listens to the sound from the speakers.
- here, the description continues on the assumption that the speaker orientation information is the arrival angle θ described above, as in the example of next-generation bidirectional communication shown in FIG.
- hereinafter, the speaker orientation information indicating the arrival angle θ is also referred to as speaker orientation information θ.
- the communication unit 36 receives the spatial frequency spectrum transmitted from the communication unit 34 and supplies it to the drive signal generation unit 37. Based on this spatial frequency spectrum, the drive signal generation unit 37 generates a spatial-domain speaker drive signal for reproducing the collected sound field and supplies it to the spatial filter application unit 38.
- the spatial filter application unit 38 performs a filtering process on the speaker drive signal supplied from the drive signal generation unit 37 using a spatial filter having a characteristic determined by the speaker orientation information supplied from the orientation information acquisition unit 35.
- the spatial filter spectrum obtained as a result is supplied to the spatial frequency synthesizer 39.
- the spatial frequency synthesis unit 39 performs spatial frequency synthesis of the spatial filter spectrum supplied from the spatial filter application unit 38 and supplies the time frequency spectrum obtained as a result to the time frequency synthesis unit 40.
- the time frequency synthesizer 40 performs time frequency synthesis of the time frequency spectrum supplied from the spatial frequency synthesizer 39 and supplies the speaker drive signal obtained as a result to the speaker array 41.
- the speaker array 41 includes, for example, a linear speaker array including a plurality of speakers arranged in a straight line, and reproduces sound based on the speaker drive signal supplied from the time-frequency synthesizer 40. Thereby, the sound field in the sound collection space is reproduced.
- the time frequency analysis unit 32 analyzes the time-frequency information of the collected sound signal s(n_mic, t) obtained by each microphone constituting the microphone array 31.
- N_mic is the number of microphones constituting the microphone array 31.
- t represents time in the collected sound signal s(n_mic, t).
- the time-frequency analysis unit 32 divides the collected sound signal s(n_mic, t) into fixed-size time frames to obtain an input frame signal s_fr(n_mic, n_fr, l). It then multiplies the input frame signal s_fr(n_mic, n_fr, l) by the window function w_T(n_fr) shown in the following equation (3) to obtain the window-function-applied signal s_w(n_mic, n_fr, l); that is, it performs the calculation of the following equation (4).
- here, n_fr is the time index indicating a sample within the time frame, with n_fr = 0, ..., N_fr - 1, and l is the time frame index, with l = 0, ..., L - 1.
- N_fr is the frame size (the number of samples per time frame), and L is the total number of frames.
- for example, the duration T_fr of one frame is 1.0 [s], and the rounding function R() rounds to the nearest integer.
- the frame shift amount is set to 50% of the frame size N_fr, but other shift amounts may be used.
- although the square root of the Hann window is used here as the window function, other windows such as the Hamming window or the Blackman-Harris window may be used.
- next, the time-frequency analysis unit 32 computes the following equations (5) and (6) to perform a time-frequency transform on the window-function-applied signal s_w(n_mic, n_fr, l) and calculate the time-frequency spectrum S(n_mic, n_T, l). That is, the zero-padded signal s_w'(n_mic, m_T, l) is obtained by the calculation of equation (5), and equation (6) is then computed on the obtained zero-padded signal s_w'(n_mic, m_T, l) to calculate the time-frequency spectrum S(n_mic, n_T, l).
- here, M_T is the number of points used in the time-frequency transform, n_T is the time-frequency spectrum index, and i in equation (6) denotes the imaginary unit.
- here, the time-frequency transform is performed by the STFT (Short-Time Fourier Transform), but other time-frequency transforms such as the DCT (Discrete Cosine Transform) or the MDCT (Modified Discrete Cosine Transform) may be used.
- the number of STFT points M_T is the power of 2 that is greater than or equal to N_fr and closest to N_fr, but other numbers of points M_T may be used.
- the time-frequency analysis unit 32 supplies the time-frequency spectrum S(n_mic, n_T, l) obtained by the above processing to the spatial frequency analysis unit 33.
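The analysis stage described above (fixed-size frames, 50% frame shift, square-root Hann window, zero-padding to the nearest power of two, then an STFT per frame) can be sketched as follows. The helper name, array shapes, and sample values are assumptions for illustration, not part of the patent.

```python
import numpy as np

def stft_analysis(s, n_fr):
    """Per-microphone time-frequency analysis: fixed-size frames with a 50%
    shift, square-root Hann window, zero-padding to the nearest power of two
    M_T >= N_fr, and an FFT per frame.
    s: (n_mics, n_samples) array. Returns (n_mics, n_frames, M_T) spectra."""
    shift = n_fr // 2                                  # 50% frame shift
    m_t = 1 << (n_fr - 1).bit_length()                 # next power of two >= N_fr
    w = np.sqrt(np.hanning(n_fr))                      # sqrt-Hann window
    n_frames = (s.shape[1] - n_fr) // shift + 1
    spec = np.empty((s.shape[0], n_frames, m_t), dtype=complex)
    for l in range(n_frames):
        frame = s[:, l * shift:l * shift + n_fr] * w   # window each frame
        spec[:, l, :] = np.fft.fft(frame, n=m_t)       # zero-padded FFT
    return spec

spec = stft_analysis(np.random.randn(8, 16000), n_fr=1024)
```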
- the spatial frequency analysis unit 33 performs a spatial frequency transform on the time-frequency spectrum S(n_mic, n_T, l) supplied from the time frequency analysis unit 32 by calculating the following equation (7), and calculates the spatial frequency spectrum S_SP(n_S, n_T, l).
- here, S'(m_S, n_T, l) denotes the zero-padded time-frequency spectrum obtained by zero-padding the time-frequency spectrum S(n_mic, n_T, l), i denotes the imaginary unit, and n_S denotes the spatial frequency spectrum index.
- zero padding may be performed as appropriate according to the number of points M_S of the IDFT.
- specifically, the zero-padded time-frequency spectrum S'(m_S, n_T, l) equals the time-frequency spectrum S(n_mic, n_T, l) for m_S = 0, ..., N_mic - 1, and S'(m_S, n_T, l) = 0 for the remaining m_S.
- the spatial frequency spectrum S_SP(n_S, n_T, l) obtained by the above processing represents, in the spatial domain, the waveform of the signal of time frequency n_T contained in time frame l.
- the spatial frequency analysis unit 33 supplies the spatial frequency spectrum S_SP(n_S, n_T, l) to the communication unit 34.
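The spatial frequency analysis step, as the text describes it (zero padding along the microphone axis followed by an M_S-point IDFT across that axis), might be sketched as below. Equation (7) is not reproduced in this excerpt, so the exact normalization is an assumption, as are the function name and shapes.

```python
import numpy as np

def spatial_analysis(spec, m_s):
    """Spatial frequency analysis in the spirit of equation (7): zero-pad the
    time-frequency spectrum along the microphone axis to M_S points, then take
    an inverse DFT across that axis to obtain the spatial frequency spectrum.
    spec: (n_mics, n_frames, M_T) complex array from the STFT stage."""
    n_mics = spec.shape[0]
    padded = np.zeros((m_s,) + spec.shape[1:], dtype=complex)
    padded[:n_mics] = spec                       # S' = S for m_S < N_mic, else 0
    return np.fft.ifft(padded, axis=0)           # IDFT over the spatial axis

s_sp = spatial_analysis(np.random.randn(8, 30, 1024) + 0j, m_s=32)
```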
- the drive signal generation unit 37 is supplied with the spatial frequency spectrum S_SP(n_S, n_T, l) from the spatial frequency analysis unit 33 via the communication unit 34 and the communication unit 36.
- the drive signal generation unit 37 computes the following equation (8) based on the spatial frequency spectrum S_SP(n_S, n_T, l) to obtain the spatial-domain speaker drive signal D_SP(m_S, n_T, l) for reproducing the sound field (wavefront) with the speaker array 41. That is, the speaker drive signal D_SP(m_S, n_T, l), which is itself a spatial frequency spectrum, is calculated by the SDM (Spectral Division Method).
- here, y_ref denotes the SDM reference distance, that is, the position at which the wavefront is reproduced accurately.
- this reference distance y_ref is measured in the direction perpendicular to the direction in which the microphones constituting the microphone array 31 are arranged.
- the reference distance y_ref = 1 [m] is used here, but other values may be used.
- H_0^(2) denotes the Hankel function of the second kind, K_0 the Bessel function, i the imaginary unit, c the speed of sound, and ω the time angular frequency.
- in equation (8), k denotes the spatial frequency, and m_S, n_T, and l denote the spatial frequency spectrum index, the time frequency spectrum index, and the time frame index, respectively.
- here, the method of calculating the speaker drive signal D_SP(m_S, n_T, l) by SDM has been described as an example, but the speaker drive signal may be calculated by another method.
- the SDM is described in detail in, for example, Jens Ahrens and Sascha Spors, "Applying the Ambisonics Approach on Planar and Linear Arrays of Loudspeakers," in 2nd International Symposium on Ambisonics and Spherical Acoustics.
- the drive signal generation unit 37 supplies the speaker drive signal D_SP(m_S, n_T, l) obtained as described above to the spatial filter application unit 38.
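Equation (8) is not reproduced in this excerpt. As a heavily hedged sketch in the spirit of the cited Ahrens and Spors SDM work, one common 2.5D driving function for a linear array is D = 4i · exp(-i·k_y·y_ref) / H_0^(2)(k_y·y_ref) · S with k_y = sqrt((ω/c)² - k_x²). To stay numpy-only, the Hankel function of the second kind is replaced here by its large-argument asymptotic, and evanescent components (which involve K_0 per the text) are simply zeroed; none of these simplifications are claimed by the patent.

```python
import numpy as np

def sdm_drive_signal(s_sp, k_x, omega, y_ref=1.0, c=343.0):
    """Hedged SDM-style driving function for a linear array:
    D = 4i * exp(-i*k_y*y_ref) / H0^(2)(k_y*y_ref) * S, with
    k_y = sqrt((omega/c)^2 - k_x^2). H0^(2) is approximated by its
    large-argument asymptotic sqrt(2/(pi*x)) * exp(-i*(x - pi/4));
    evanescent components (|k_x| >= omega/c) are zeroed."""
    k = omega / c
    d = np.zeros_like(s_sp, dtype=complex)
    prop = np.abs(k_x) < k                         # propagating components only
    k_y = np.sqrt(k**2 - k_x[prop]**2)
    h0_2 = np.sqrt(2.0 / (np.pi * k_y * y_ref)) * np.exp(-1j * (k_y * y_ref - np.pi / 4))
    d[prop] = 4j * np.exp(-1j * k_y * y_ref) / h0_2 * s_sp[prop]
    return d

k_x = np.linspace(-20.0, 20.0, 64)
d = sdm_drive_signal(np.ones(64, dtype=complex), k_x, omega=2 * np.pi * 1000.0)
```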
- the spatial filter application unit 38 obtains the spatial filter spectrum F(m_S, n_T, l) by applying, to the speaker drive signal D_SP(m_S, n_T, l) supplied from the drive signal generation unit 37, the spatial bandpass filter B_θ(m_S, n_T) whose characteristics are determined by the speaker orientation information θ supplied from the orientation information acquisition unit 35.
- in this example, the shape of the spatial bandpass filter B_θ(m_S, n_T) is assumed to be rectangular, but it may be any other shape.
- the spatial filter application unit 38 determines the characteristics of the spatial bandpass filter B_θ(m_S, n_T) by determining its center frequency k_cen and bandwidth k_len based on the speaker orientation information θ. That is, the characteristics of the spatial bandpass filter B_θ(m_S, n_T) are determined according to the arrival angle θ of the plane wave of the sound from the target main sound source.
- the spatial filter application unit 38 calculates the center frequency k_cen by the following equation (9), and the bandwidth k_len by the following equation (10).
- here, θ denotes the speaker orientation information, that is, the arrival angle, at the microphone array 31, of the plane wave (voice) emitted from the sound source toward the receiver.
- k_lim denotes the upper limit spatial frequency determined from the microphone spacing of the microphone array 31 and the speaker spacing of the speaker array 41.
- the spatial bandpass filter B_θ(m_S, n_T) is a bandpass filter whose passband (transmission frequency band) is the spatial frequency band of width k_len centered at the center frequency k_cen, and whose stopband (cutoff frequency band) is the remaining spatial frequencies.
- the value of the spatial bandpass filter B_θ(m_S, n_T) is 1 if the spatial frequency indicated by the spatial frequency spectrum index m_S lies within the transmission frequency band, and 0 if it lies within the cutoff frequency band.
- the spatial bandpass filter B_θ(m_S, n_T) is therefore a spatial filter that transmits only the components within the transmission frequency band.
- specifically, the spatial filter application unit 38 multiplies the speaker drive signal D_SP(m_S, n_T, l) by the spatial bandpass filter B_θ(m_S, n_T) to obtain the spatial filter spectrum F(m_S, n_T, l).
- the spatial filter application unit 38 supplies the spatial filter spectrum F(m_S, n_T, l) obtained by the calculation of equation (11) to the spatial frequency synthesis unit 39.
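The rectangular filter and the multiplication of equation (11) can be sketched as below. Equations (9) and (10) are not reproduced in this excerpt, so k_cen and k_len are passed in directly rather than derived from θ; the two calls reproduce the two passbands the surrounding text describes (0 to k_lim, and -k_lim to k_lim for θ = π/2). Names and the 128-point grid are illustrative.

```python
import numpy as np

def spatial_bandpass(d_sp, k_axis, k_cen, k_len):
    """Rectangular spatial bandpass filter B(m_S): 1 inside the passband of
    width k_len centered on k_cen, 0 outside, multiplied into the
    spatial-domain drive signal as in equation (11)."""
    mask = np.abs(k_axis - k_cen) <= k_len / 2.0
    return d_sp * mask

k_lim = np.pi / 0.05
k_axis = np.linspace(-k_lim, k_lim, 128)
# Case of FIG. 4: passband from k = 0 to k_lim  ->  k_cen = k_lim/2, k_len = k_lim
f1 = spatial_bandpass(np.ones(128, dtype=complex), k_axis, k_lim / 2, k_lim)
# Case theta = pi/2: passband from -k_lim to k_lim  ->  nothing is removed
f2 = spatial_bandpass(np.ones(128, dtype=complex), k_axis, 0.0, 2 * k_lim)
```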
- in this case, the band from the spatial frequency k = 0 to the upper limit spatial frequency k_lim becomes the transmission frequency band.
- the vertical axis indicates the time frequency f
- the horizontal axis indicates the spatial frequency k.
- the spectrum peak shown on the straight line L21, which appears in the region where the spatial frequency k ≥ 0, is the spectrum peak that should originally appear.
- the spectral peak indicated by the straight line L22 appears due to spatial aliasing, and it can be seen that spatial aliasing is significant in a region where the spatial frequency k is negative.
- the region of the time frequency f in which no spatial-aliasing spectrum peak indicated by the straight line L22 appears is the non-aliasing band R11.
- the upper limit time frequency of the non-aliasing band R11 is the upper limit time frequency f_lim described above; the region where the time frequency is higher than f_lim is the aliasing band R12, which is affected by spatial aliasing.
- in this case, the characteristics of the spatial bandpass filter B_θ(m_S, n_T) obtained from equations (9) and (10) above are those indicated by the broken line L23.
- in FIG. 4, the shaded region indicates the region blocked by the spatial bandpass filter B_θ(m_S, n_T).
- by this filtering, the portion of the spatial-aliasing spectrum indicated by the straight line L22 in which the spatial frequency k is negative is removed.
- as a result, the non-aliasing band R13, that is, the region of the time frequency f free of spatial-aliasing spectrum peaks, becomes wider than the non-aliasing band R11, and the aliasing band R14 affected by spatial aliasing narrows accordingly.
- that is, the upper limit time frequency f_lim can be further increased by the filtering process using the spatial bandpass filter B_θ(m_S, n_T).
- in this example, the upper limit time frequency f_lim that is not affected by spatial aliasing is doubled.
- by raising the upper limit time frequency f_lim in this way, the sound quality can be improved, particularly for the plane wave propagating at the arrival angle θ. Moreover, since spatial aliasing is reduced, the localization of the sound image, which would otherwise be degraded by plane waves apparently arriving from angles at which no sound should be present, is improved. That is, more accurate sound image localization can be realized.
- in this case, the band from the spatial frequency k = -k_lim to the upper limit spatial frequency k_lim becomes the transmission frequency band.
- the vertical axis represents the time frequency f
- the horizontal axis represents the spatial frequency k.
- the spectrum peak indicated by the straight line L31 is observed in the spatial spectrum (angle spectrum) of the plane wave collected by the microphone array 31 as indicated by the arrow A21.
- in this case, the characteristics of the spatial bandpass filter B_θ(m_S, n_T) obtained from equations (9) and (10) above are those indicated by the broken line L32.
- the shaded area in the figure indicates the area blocked by the spatial bandpass filter B_θ(m_S, n_T).
- since the start frequency sb is -k_lim and the end frequency eb is k_lim, neither the positive nor the negative spatial frequency components are removed.
- even in this case, the filtering process using the spatial bandpass filter B_θ(m_S, n_T) increases the upper limit time frequency f_lim, improving both the sound quality, particularly for the plane wave propagating in the direction of the intended arrival angle θ, and the localization of the sound image.
- on the other hand, when components are removed by the spatial bandpass filter B_θ(m_S, n_T), the sound quality of plane waves propagating at angles other than θ deteriorates according to the removed components. The area of the reproduction space in which the sound can be heard with good quality therefore shrinks accordingly.
- in the present technology, however, the bandwidth k_len is made wider as the arrival angle θ approaches π/2, that is, as spatial aliasing decreases, so that the area in which the sound can be heard with good quality is kept large and the adverse effect of the filtering is reduced.
- a transmission frequency band corresponding to the speaker orientation information θ may also be set for the time frequency.
- alternatively, a transmission frequency band corresponding to the speaker orientation information θ may be set for both the spatial frequency and the time frequency.
- in that case, the center frequency and bandwidth corresponding to the speaker orientation information θ, that is, the transmission frequency band, are determined not only for the spatial frequency but also for the time frequency.
- if both the spatial frequency and the time frequency lie within their respective transmission frequency bands, the value of the spatial bandpass filter B_θ(m_S, n_T) is 1. That is, the spatial bandpass filter B_θ(m_S, n_T) is a spatial filter that transmits only the components within both the spatial frequency passband and the time frequency passband.
- the spatial frequency synthesis unit 39 computes the following equation (12) to perform spatial frequency synthesis of the spatial filter spectrum F(m_S, n_T, l) supplied from the spatial filter application unit 38, that is, an inverse spatial frequency transform of F(m_S, n_T, l), and calculates the time-frequency spectrum D(n_spk, n_T, l).
- DFT (Discrete Fourier Transform)
- nspk represents a speaker index identifying each speaker constituting the speaker array 41, MS indicates the number of DFT points, and i denotes the imaginary unit.
- The spatial frequency synthesis unit 39 supplies the time-frequency spectrum D(nspk, nT, l) obtained in this way to the time-frequency synthesis unit 40.
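Equation (12) is described as an inverse spatial DFT over MS points. A minimal sketch using numpy's `ifft` (whose 1/MS normalization convention may differ from the patent's):

```python
import numpy as np

def inverse_spatial_dft(F):
    """F: spatial filter spectrum with axes (m_S, n_T, l).  The inverse
    DFT along the spatial axis yields the time-frequency spectrum
    D(n_spk, n_T, l), one row per speaker of the array."""
    return np.fft.ifft(F, axis=0)
```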
- The time-frequency synthesis unit 40 performs time-frequency synthesis of the time-frequency spectrum D(nspk, nT, l) supplied from the spatial frequency synthesis unit 39 by calculating the following Equation (13), and obtains the output frame signal dfr(nspk, nfr, l).
- ISTFT (Inverse Short-Time Fourier Transform)
- Here, i denotes the imaginary unit, nfr a time index, MT the number of ISTFT points, and nspk the speaker index.
- The time-frequency synthesis unit 40 then multiplies the obtained output frame signal dfr(nspk, nfr, l) by the window function wT(nfr) and performs overlap addition for frame synthesis. For example, frame synthesis is performed by calculating the following Equation (15), and the output signal d(nspk, t) is obtained.
- The same window function as that used in the time-frequency analysis unit 32 is used as the window function wT(nfr) by which the output frame signal dfr(nspk, nfr, l) is multiplied.
- However, a rectangular window may also be used.
- the time-frequency synthesizer 40 supplies the output signal d (n spk , t) thus obtained to the speaker array 41 as a speaker drive signal.
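The windowing and overlap addition of Equation (15) can be sketched for a single speaker channel as follows; the hop size and the window are assumptions, since the equation itself is not reproduced in this passage:

```python
import numpy as np

def frame_synthesis(frames, window, hop):
    """Multiply each output frame d_fr(n_spk, n_fr, l) by the window
    w_T(n_fr) and overlap-add the frames at the given hop to obtain the
    output signal d(n_spk, t) for one speaker channel.
    frames: array of shape (n_frames, frame_len)."""
    n_frames, frame_len = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for l in range(n_frames):
        out[l * hop : l * hop + frame_len] += window * frames[l]
    return out
```

For perfect reconstruction the analysis and synthesis windows must jointly satisfy a constant-overlap-add condition; that choice is left open here, as in the text above.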
- In step S11, the microphone array 31 collects a plane wave of sound in the sound collection space and supplies the resulting collected sound signal s(nmic, t) to the time-frequency analysis unit 32.
- In step S12, the time-frequency analysis unit 32 analyzes the time-frequency information of the collected sound signal s(nmic, t) supplied from the microphone array 31.
- Specifically, the time-frequency analysis unit 32 divides the collected sound signal s(nmic, t) into time frames and multiplies the resulting input frame signal sfr(nmic, nfr, l) by the window function wT(nfr) to compute the window-function-applied signal sw(nmic, nfr, l).
- Further, the time-frequency analysis unit 32 performs time-frequency conversion on the window-function-applied signal sw(nmic, nfr, l) and supplies the resulting time-frequency spectrum S(nmic, nT, l) to the spatial frequency analysis unit 33. That is, the calculation of Equation (6) is performed to compute the time-frequency spectrum S(nmic, nT, l).
- In step S13, the spatial frequency analysis unit 33 performs spatial frequency conversion on the time-frequency spectrum S(nmic, nT, l) supplied from the time-frequency analysis unit 32 and supplies the resulting spatial frequency spectrum SSP(nS, nT, l) to the communication unit 34.
- Specifically, the spatial frequency analysis unit 33 calculates Equation (7) to convert the time-frequency spectrum S(nmic, nT, l) into the spatial frequency spectrum SSP(nS, nT, l).
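Steps S12 and S13 together amount to a short-time Fourier transform per microphone followed by a DFT across the microphone axis. A minimal sketch, in which the frame length, hop, and DFT normalization are assumptions:

```python
import numpy as np

def analyze(s, frame_len, hop, window):
    """s: collected signal of shape (n_mic, t).  Framing and windowing
    give s_w(n_mic, n_fr, l); a temporal DFT (Eq. (6) style) gives
    S(n_mic, n_T, l); a DFT across the microphone axis (Eq. (7) style)
    gives the spatial frequency spectrum S_SP(n_S, n_T, l)."""
    n_mic, t = s.shape
    n_frames = 1 + (t - frame_len) // hop
    S = np.empty((n_mic, frame_len, n_frames), dtype=complex)
    for l in range(n_frames):
        frame = s[:, l * hop : l * hop + frame_len] * window
        S[:, :, l] = np.fft.fft(frame, axis=1)   # time-frequency spectrum
    return np.fft.fft(S, axis=0)                 # spatial-frequency spectrum
```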
- In step S14, the communication unit 34 transmits the spatial frequency spectrum SSP(nS, nT, l) supplied from the spatial frequency analysis unit 33 to the receiver 22 arranged in the reproduction space by wireless communication.
- In step S15, the communication unit 36 of the receiver 22 receives the spatial frequency spectrum SSP(nS, nT, l) transmitted by wireless communication and supplies it to the drive signal generation unit 37.
- In step S16, the orientation information acquisition unit 35 acquires the speaker orientation information θ and supplies it to the spatial filter application unit 38.
- Note that the speaker orientation information θ may be determined in advance or may be acquired from the transmitter 21 or the like.
- In step S17, the drive signal generation unit 37 computes the spatial-domain speaker drive signal DSP(mS, nT, l) based on the spatial frequency spectrum SSP(nS, nT, l) supplied from the communication unit 36 and supplies it to the spatial filter application unit 38.
- Specifically, the drive signal generation unit 37 computes the spatial-domain speaker drive signal DSP(mS, nT, l) by calculating Equation (8).
- In step S18, the spatial filter application unit 38 determines the characteristics of the spatial bandpass filter Bθ(mS, nT) based on the speaker orientation information θ supplied from the orientation information acquisition unit 35.
- Specifically, the spatial filter application unit 38 calculates the above Equations (9) and (10) to compute the center frequency kcen and the bandwidth klen of the spatial bandpass filter Bθ(mS, nT), thereby determining the characteristics of the spatial bandpass filter Bθ(mS, nT), that is, its transmission frequency band.
- In step S19, the spatial filter application unit 38 applies the spatial bandpass filter Bθ(mS, nT) with the determined characteristics to the speaker drive signal DSP(mS, nT, l) supplied from the drive signal generation unit 37.
- That is, the spatial filter application unit 38 performs the calculation of Equation (11), applying the spatial bandpass filter Bθ(mS, nT) to the speaker drive signal DSP(mS, nT, l) to obtain the spatial filter spectrum F(mS, nT, l).
- The spatial filter application unit 38 supplies the spatial filter spectrum F(mS, nT, l) obtained by the filtering process to the spatial frequency synthesis unit 39.
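As described, Equation (11) applies the filter by element-wise multiplication in the spatial-frequency domain. A sketch of that step — the (mS, nT, l) array layout and the broadcasting over time frames l are assumptions:

```python
import numpy as np

def apply_spatial_filter(D_sp, B):
    """D_sp: speaker drive signal with axes (m_S, n_T, l);
    B: spatial bandpass filter with axes (m_S, n_T).  Element-wise
    multiplication, broadcast over the frame axis l, yields the
    spatial filter spectrum F(m_S, n_T, l)."""
    return B[:, :, None] * D_sp
```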
- In step S20, the spatial frequency synthesis unit 39 performs inverse spatial frequency conversion on the spatial filter spectrum F(mS, nT, l) supplied from the spatial filter application unit 38 and supplies the resulting time-frequency spectrum D(nspk, nT, l) to the time-frequency synthesis unit 40.
- Specifically, the spatial frequency synthesis unit 39 performs the inverse spatial frequency conversion by calculating Equation (12).
- In step S21, the time-frequency synthesis unit 40 performs time-frequency synthesis of the time-frequency spectrum D(nspk, nT, l) supplied from the spatial frequency synthesis unit 39.
- Specifically, the time-frequency synthesis unit 40 calculates Equation (13) to compute the output frame signal dfr(nspk, nfr, l) from the time-frequency spectrum D(nspk, nT, l). Further, the time-frequency synthesis unit 40 multiplies the output frame signal dfr(nspk, nfr, l) by the window function wT(nfr), calculates Equation (15), and computes the output signal d(nspk, t) by frame synthesis.
- the time-frequency synthesizer 40 supplies the output signal d (n spk , t) thus obtained to the speaker array 41 as a speaker drive signal.
- In step S22, the speaker array 41 reproduces sound based on the speaker drive signal supplied from the time-frequency synthesis unit 40, and the sound field reproduction process ends.
- the sound field of the sound collection space is reproduced in the reproduction space.
- As described above, the spatial aliasing controller 11 determines the characteristics of the spatial bandpass filter Bθ(mS, nT) based on the speaker orientation information θ and applies the spatial bandpass filter Bθ(mS, nT) to the speaker drive signal DSP(mS, nT, l) to reduce spatial aliasing.
- In this way, spatial aliasing is reduced by using the spatial bandpass filter Bθ(mS, nT) with characteristics according to the speaker orientation information θ, thereby increasing the upper limit temporal frequency flim and improving both the sound quality and the localization of the sound image.
- Moreover, the spatial aliasing controller 11 does not require a special speaker array and can reduce spatial aliasing by simple filter processing, so the upper limit temporal frequency can be increased at lower cost.
- the series of processes described above can be executed by hardware or can be executed by software.
- a program constituting the software is installed in the computer.
- The computer here includes, for example, a computer incorporated in dedicated hardware, or a general-purpose personal computer capable of executing various functions when various programs are installed.
- FIG. 7 is a block diagram showing an example of the hardware configuration of a computer that executes the above-described series of processing by a program.
- In the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are connected to one another by a bus 504.
- An input / output interface 505 is further connected to the bus 504.
- An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input / output interface 505.
- the input unit 506 includes a keyboard, a mouse, a microphone, an image sensor, and the like.
- the output unit 507 includes a display, a speaker, and the like.
- the recording unit 508 includes a hard disk, a nonvolatile memory, and the like.
- the communication unit 509 includes a network interface or the like.
- the drive 510 drives a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
- In the computer configured as described above, the CPU 501 loads the program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes it, whereby the above-described series of processes is performed.
- the program executed by the computer (CPU 501) can be provided by being recorded on the removable medium 511 as a package medium, for example.
- the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
- the program can be installed in the recording unit 508 via the input / output interface 505 by attaching the removable medium 511 to the drive 510. Further, the program can be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. In addition, the program can be installed in the ROM 502 or the recording unit 508 in advance.
- The program executed by the computer may be a program that is processed in time series in the order described in this specification, or a program that is processed in parallel or at necessary timings, such as when calls are made.
- the present technology can take a cloud computing configuration in which one function is shared by a plurality of devices via a network and is jointly processed.
- each step described in the above flowchart can be executed by one device or can be shared by a plurality of devices.
- Furthermore, when one step includes a plurality of processes, those processes can be executed by one apparatus or shared and executed by a plurality of apparatuses.
- the present technology can be configured as follows.
- [1] An audio processing apparatus comprising: an orientation information acquisition unit that acquires orientation information indicating the direction of a sound source; and a spatial filter application unit that applies, to a collected sound signal obtained by collecting sound from the sound source with a microphone array including a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
- [2] The audio processing apparatus according to [1], wherein the spatial filter application unit determines a center frequency and a bandwidth as the characteristics of the spatial filter based on the orientation information.
- [3] The audio processing apparatus according to [2], wherein the spatial filter is a filter that passes components of the collected sound signal in a transmission frequency band, the transmission frequency band being a spatial frequency band determined by the center frequency and the bandwidth.
- [4] The audio processing apparatus according to [2] or [3], wherein the spatial filter is a filter that passes components of the collected sound signal in a transmission frequency band, the transmission frequency band being a temporal frequency band determined by the center frequency and the bandwidth.
- [5] The audio processing apparatus according to any one of [2] to [4], wherein the spatial filter application unit determines the characteristics of the spatial filter such that the bandwidth becomes wider as the angle between the direction of the sound source indicated by the orientation information and the microphone array approaches π/2.
- [6] The audio processing apparatus according to any one of [1] to [5], wherein the microphone array is a linear microphone array.
- [7] An audio processing method including the steps of: acquiring orientation information indicating the direction of a sound source; and applying, to a collected sound signal obtained by collecting sound from the sound source with a microphone array including a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
- [8] A program for causing a computer to execute processing including the steps of: acquiring orientation information indicating the direction of a sound source; and applying, to a collected sound signal obtained by collecting sound from the sound source with a microphone array including a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
- 11 Spatial aliasing controller, 31 Microphone array, 32 Time-frequency analysis unit, 33 Spatial frequency analysis unit, 35 Orientation information acquisition unit, 37 Drive signal generation unit, 38 Spatial filter application unit, 39 Spatial frequency synthesis unit, 40 Time-frequency synthesis unit, 41 Speaker array
Abstract
Description
〈Example Configuration of the Spatial Aliasing Controller〉
The present technology reduces the spatial aliasing caused by the discrete arrangement of speakers by applying an appropriate spatial filter to the speaker drive signals when generating those signals for wave field synthesis in reproducing a sound field with a planar or linear speaker array.
The time-frequency analysis unit 32 analyzes the time-frequency information of the collected sound signal s(nmic, t) obtained by each microphone constituting the microphone array 31.
Next, the spatial frequency analysis unit 33 performs spatial frequency conversion on the time-frequency spectrum S(nmic, nT, l) supplied from the time-frequency analysis unit 32 by calculating the following Equation (7), and computes the spatial frequency spectrum SSP(nS, nT, l).
The spatial frequency spectrum SSP(nS, nT, l) is supplied from the spatial frequency analysis unit 33 to the drive signal generation unit 37 via the communication unit 36 and the communication unit 34.
The spatial filter application unit 38 obtains the spatial filter spectrum F(mS, nT, l) using the speaker drive signal DSP(mS, nT, l) supplied from the drive signal generation unit 37 and the spatial bandpass filter Bθ(mS, nT), whose characteristics are determined by the speaker orientation information θ supplied from the orientation information acquisition unit 35. Here the spatial bandpass filter Bθ(mS, nT) is assumed to have a rectangular shape, but it may have any other shape.
Next, the spatial frequency synthesis unit 39 will be described.
The time-frequency synthesis unit 40 performs time-frequency synthesis of the time-frequency spectrum D(nspk, nT, l) supplied from the spatial frequency synthesis unit 39 by calculating the following Equation (13), obtaining the output frame signal dfr(nspk, nfr, l). Here the ISTFT (Inverse Short-Time Fourier Transform) is used for the time-frequency synthesis, but any transform corresponding to the inverse of the time-frequency (forward) transform performed by the time-frequency analysis unit 32 may be used.
Next, the flow of processing performed by the spatial aliasing controller 11 described above will be explained. When instructed to collect a plane wave of sound in the sound collection space, the spatial aliasing controller 11 collects the plane wave and performs a sound field reproduction process that reproduces the sound field.
An audio processing apparatus comprising:
an orientation information acquisition unit that acquires orientation information indicating the direction of a sound source; and
a spatial filter application unit that applies, to a collected sound signal obtained by collecting sound from the sound source with a microphone array comprising a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
[2]
The audio processing apparatus according to [1], wherein the spatial filter application unit determines a center frequency and a bandwidth as the characteristics of the spatial filter based on the orientation information.
[3]
The audio processing apparatus according to [2], wherein the spatial filter is a filter that passes components of the collected sound signal in a transmission frequency band, the transmission frequency band being a spatial frequency band determined by the center frequency and the bandwidth.
[4]
The audio processing apparatus according to [2] or [3], wherein the spatial filter is a filter that passes components of the collected sound signal in a transmission frequency band, the transmission frequency band being a temporal frequency band determined by the center frequency and the bandwidth.
[5]
The audio processing apparatus according to any one of [2] to [4], wherein the spatial filter application unit determines the characteristics of the spatial filter such that the bandwidth becomes wider as the angle between the direction of the sound source indicated by the orientation information and the microphone array approaches π/2.
[6]
The audio processing apparatus according to any one of [1] to [5], wherein the microphone array is a linear microphone array.
[7]
An audio processing method including the steps of:
acquiring orientation information indicating the direction of a sound source; and
applying, to a collected sound signal obtained by collecting sound from the sound source with a microphone array comprising a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
[8]
A program for causing a computer to execute processing including the steps of:
acquiring orientation information indicating the direction of a sound source; and
applying, to a collected sound signal obtained by collecting sound from the sound source with a microphone array comprising a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
Claims (8)
- An audio processing apparatus comprising:
an orientation information acquisition unit that acquires orientation information indicating the direction of a sound source; and
a spatial filter application unit that applies, to a collected sound signal obtained by collecting sound from the sound source with a microphone array comprising a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
- The audio processing apparatus according to claim 1, wherein the spatial filter application unit determines a center frequency and a bandwidth as the characteristics of the spatial filter based on the orientation information.
- The audio processing apparatus according to claim 2, wherein the spatial filter is a filter that passes components of the collected sound signal in a transmission frequency band, the transmission frequency band being a spatial frequency band determined by the center frequency and the bandwidth.
- The audio processing apparatus according to claim 2, wherein the spatial filter is a filter that passes components of the collected sound signal in a transmission frequency band, the transmission frequency band being a temporal frequency band determined by the center frequency and the bandwidth.
- The audio processing apparatus according to claim 2, wherein the spatial filter application unit determines the characteristics of the spatial filter such that the bandwidth becomes wider as the angle between the direction of the sound source indicated by the orientation information and the microphone array approaches π/2.
- The audio processing apparatus according to claim 1, wherein the microphone array is a linear microphone array.
- An audio processing method including the steps of: acquiring orientation information indicating the direction of a sound source; and applying, to a collected sound signal obtained by collecting sound from the sound source with a microphone array comprising a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
- A program for causing a computer to execute processing including the steps of: acquiring orientation information indicating the direction of a sound source; and applying, to a collected sound signal obtained by collecting sound from the sound source with a microphone array comprising a plurality of microphones, a spatial filter whose characteristics are determined by the orientation information.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/516,563 US10602266B2 (en) | 2014-10-10 | 2015-09-28 | Audio processing apparatus and method, and program |
JP2016553046A JP6604331B2 (ja) | 2014-10-10 | 2015-09-28 | 音声処理装置および方法、並びにプログラム |
EP15849523.4A EP3206415B1 (en) | 2014-10-10 | 2015-09-28 | Sound processing device, method, and program |
CN201580053837.5A CN106797526B (zh) | 2014-10-10 | 2015-09-28 | 音频处理装置、方法和计算机可读记录介质 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014208865 | 2014-10-10 | ||
JP2014-208865 | 2014-10-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016056410A1 true WO2016056410A1 (ja) | 2016-04-14 |
Family
ID=55653027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/077242 WO2016056410A1 (ja) | 2014-10-10 | 2015-09-28 | 音声処理装置および方法、並びにプログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US10602266B2 (ja) |
EP (1) | EP3206415B1 (ja) |
JP (1) | JP6604331B2 (ja) |
CN (1) | CN106797526B (ja) |
WO (1) | WO2016056410A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106604191A (zh) * | 2016-12-20 | 2017-04-26 | 广州视源电子科技股份有限公司 | 一种扩音方法及扩音系统 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10477309B2 (en) | 2014-04-16 | 2019-11-12 | Sony Corporation | Sound field reproduction device, sound field reproduction method, and program |
US10674255B2 (en) | 2015-09-03 | 2020-06-02 | Sony Corporation | Sound processing device, method and program |
WO2017098949A1 (ja) | 2015-12-10 | 2017-06-15 | ソニー株式会社 | 音声処理装置および方法、並びにプログラム |
WO2018042791A1 (ja) | 2016-09-01 | 2018-03-08 | ソニー株式会社 | 情報処理装置、情報処理方法及び記録媒体 |
US11565365B2 (en) * | 2017-11-13 | 2023-01-31 | Taiwan Semiconductor Manufacturing Co., Ltd. | System and method for monitoring chemical mechanical polishing |
JP6959134B2 (ja) * | 2017-12-28 | 2021-11-02 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | エリア再生方法、エリア再生プログラム及びエリア再生システム |
US20220167085A1 (en) * | 2019-05-28 | 2022-05-26 | Sony Group Corporation | Audio processing device, audio processing method, and program |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008048294A (ja) * | 2006-08-18 | 2008-02-28 | Kanazawa Univ | 指向性アレーマイクロホンおよび指向性アレースピーカ |
JP2011107602A (ja) * | 2009-11-20 | 2011-06-02 | Sony Corp | 信号処理装置、および信号処理方法、並びにプログラム |
JP2014501064A (ja) * | 2010-10-25 | 2014-01-16 | クゥアルコム・インコーポレイテッド | マルチマイクロフォンを用いた3次元サウンド獲得及び再生 |
JP2014501945A (ja) * | 2010-12-03 | 2014-01-23 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | 幾何ベースの空間オーディオ符号化のための装置および方法 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS59193380A (ja) * | 1983-04-18 | 1984-11-01 | Yokogawa Medical Syst Ltd | 方位角適応型フエ−ズド・アレイ・ソ−ナ− |
JP4124182B2 (ja) | 2004-08-27 | 2008-07-23 | ヤマハ株式会社 | アレイスピーカ装置 |
US8238569B2 (en) * | 2007-10-12 | 2012-08-07 | Samsung Electronics Co., Ltd. | Method, medium, and apparatus for extracting target sound from mixed sound |
JP2012150237A (ja) * | 2011-01-18 | 2012-08-09 | Sony Corp | 音信号処理装置、および音信号処理方法、並びにプログラム |
JP2014014410A (ja) * | 2012-07-06 | 2014-01-30 | Sony Corp | 記憶制御装置、記憶制御システムおよびプログラム |
EP2738762A1 (en) * | 2012-11-30 | 2014-06-04 | Aalto-Korkeakoulusäätiö | Method for spatial filtering of at least one first sound signal, computer readable storage medium and spatial filtering system based on cross-pattern coherence |
EP2747451A1 (en) * | 2012-12-21 | 2014-06-25 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrivial estimates |
CN104010265A (zh) * | 2013-02-22 | 2014-08-27 | 杜比实验室特许公司 | 音频空间渲染设备及方法 |
JP5741866B2 (ja) * | 2013-03-05 | 2015-07-01 | 日本電信電話株式会社 | 音場収音再生装置、方法及びプログラム |
JP5986966B2 (ja) * | 2013-08-12 | 2016-09-06 | 日本電信電話株式会社 | 音場収音再生装置、方法及びプログラム |
-
2015
- 2015-09-28 CN CN201580053837.5A patent/CN106797526B/zh active Active
- 2015-09-28 JP JP2016553046A patent/JP6604331B2/ja active Active
- 2015-09-28 WO PCT/JP2015/077242 patent/WO2016056410A1/ja active Application Filing
- 2015-09-28 EP EP15849523.4A patent/EP3206415B1/en active Active
- 2015-09-28 US US15/516,563 patent/US10602266B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008048294A (ja) * | 2006-08-18 | 2008-02-28 | Kanazawa Univ | 指向性アレーマイクロホンおよび指向性アレースピーカ |
JP2011107602A (ja) * | 2009-11-20 | 2011-06-02 | Sony Corp | 信号処理装置、および信号処理方法、並びにプログラム |
JP2014501064A (ja) * | 2010-10-25 | 2014-01-16 | クゥアルコム・インコーポレイテッド | マルチマイクロフォンを用いた3次元サウンド獲得及び再生 |
JP2014501945A (ja) * | 2010-12-03 | 2014-01-23 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | 幾何ベースの空間オーディオ符号化のための装置および方法 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3206415A4 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106604191A (zh) * | 2016-12-20 | 2017-04-26 | 广州视源电子科技股份有限公司 | 一种扩音方法及扩音系统 |
Also Published As
Publication number | Publication date |
---|---|
US10602266B2 (en) | 2020-03-24 |
CN106797526A (zh) | 2017-05-31 |
CN106797526B (zh) | 2019-07-12 |
US20180279042A1 (en) | 2018-09-27 |
EP3206415A1 (en) | 2017-08-16 |
JPWO2016056410A1 (ja) | 2017-07-20 |
JP6604331B2 (ja) | 2019-11-13 |
EP3206415A4 (en) | 2018-06-06 |
EP3206415B1 (en) | 2019-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6604331B2 (ja) | 音声処理装置および方法、並びにプログラム | |
EP3320692B1 (en) | Spatial audio processing apparatus | |
US11310617B2 (en) | Sound field forming apparatus and method | |
US9361898B2 (en) | Three-dimensional sound compression and over-the-air-transmission during a call | |
US10477335B2 (en) | Converting multi-microphone captured signals to shifted signals useful for binaural signal processing and use thereof | |
EP3080806B1 (en) | Extraction of reverberant sound using microphone arrays | |
EP2777298B1 (en) | Method and apparatus for processing signals of a spherical microphone array on a rigid sphere used for generating a spherical harmonics representation or an ambisonics representation of the sound field | |
WO2015196729A1 (zh) | 一种麦克风阵列语音增强方法及装置 | |
EP3133833B1 (en) | Sound field reproduction apparatus, method and program | |
KR20160086831A (ko) | 음장 재현 장치 및 방법, 그리고 프로그램 | |
JP6508539B2 (ja) | 音場収音装置および方法、音場再生装置および方法、並びにプログラム | |
US20130253923A1 (en) | Multichannel enhancement system for preserving spatial cues | |
JP2014165901A (ja) | 音場収音再生装置、方法及びプログラム | |
JP4116600B2 (ja) | 収音方法、収音装置、収音プログラム、およびこれを記録した記録媒体 | |
WO2021212287A1 (zh) | 音频信号处理方法、音频处理装置及录音设备 | |
EP3866485A1 (en) | Method and apparatus for rendering audio | |
JP2015164267A (ja) | 収音装置および収音方法、並びにプログラム | |
JP5734327B2 (ja) | 音場収音再生装置、方法及びプログラム | |
Chisaki et al. | Network-based multi-channel signal processing using the precision time protocol | |
Okamoto et al. | Estimation of high-resolution sound properties for realizing an editable sound-space system | |
KR20060091966A (ko) | 머리 모델링을 이용한 입체음향 합성방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15849523 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016553046 Country of ref document: JP Kind code of ref document: A |
|
REEP | Request for entry into the european phase |
Ref document number: 2015849523 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2015849523 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15516563 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |