WO2022034795A1 - Signal processing device and method, noise cancelling device, and program - Google Patents



Publication number
WO2022034795A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal processing
signal
speaker
spatial frequency
microphone
Application number
PCT/JP2021/027823
Other languages
French (fr)
Japanese (ja)
Inventor
徹徳 板橋
直毅 村田
悠 前野
Original Assignee
ソニーグループ株式会社
Application filed by ソニーグループ株式会社
Publication of WO2022034795A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00: Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00: Details of transducers, loudspeakers or microphones
    • H04R1/20: Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers

Definitions

  • The present technology relates to signal processing devices and methods, noise canceling devices, and programs, and in particular to signal processing devices and methods, noise canceling devices, and programs capable of reducing delay time.
  • spatial noise canceling (hereinafter, also referred to as spatial NC (Noise Canceling)), which performs noise canceling using wave field synthesis technology, is known.
  • In order to realize such spatial NC, a method of performing operations such as filtering in the spatial frequency domain can be considered.
  • A technique for performing operations in the spatial frequency domain has been proposed in order to realize wave field synthesis (see, for example, Non-Patent Document 1). If arithmetic in the spatial frequency domain is used, higher spatial NC performance can be realized by taking into account the correlation between the channels corresponding to the plurality of speakers.
  • In such a technique, however, a temporal Fourier transform must first be performed on the microphone signals obtained by all the microphones, and a spatial Fourier transform must then be performed on the resulting signals to convert them into signals in the spatial frequency domain, which increases the delay time.
  • This technology was made in view of such a situation, and makes it possible to reduce the delay time.
  • The signal processing device of the first aspect of the present technology has one or a plurality of signal processing units that perform signal processing in the spatial frequency domain, and the signal processing unit performs the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound collection by a plurality of microphones.
  • The signal processing method or program of the first aspect of the present technology is a signal processing method or program for a signal processing device having one or a plurality of signal processing units that perform signal processing in the spatial frequency domain, and includes a step in which the one or plurality of signal processing units perform the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound collection by a plurality of microphones.
  • In the first aspect of the present technology, the signal processing is performed by the one or plurality of signal processing units on a signal converted into the spatial frequency domain on the basis of the microphone signals obtained by sound collection by the plurality of microphones.
  • The noise canceling device of the second aspect of the present technology includes a plurality of microphones, one or a plurality of signal processing units that perform signal processing in the spatial frequency domain, and a plurality of speakers that output sound based on a noise canceling signal generated by the signal processing, and the signal processing unit performs the signal processing on a signal converted into the spatial frequency domain on the basis of the microphone signals obtained by sound collection by the plurality of microphones to generate the noise canceling signal.
  • In the second aspect of the present technology, a plurality of microphones, one or a plurality of signal processing units that perform signal processing in the spatial frequency domain, and a plurality of speakers that output sound based on the noise canceling signal generated by the signal processing are provided, and the signal processing unit performs the signal processing on a signal converted into the spatial frequency domain on the basis of the microphone signals obtained by sound collection by the plurality of microphones to generate the noise canceling signal.
  • The present technology realizes spatial NC that requires neither time-frequency conversion nor its inverse, by directly performing spatial frequency conversion on the time domain microphone signals and converting them into signals in the spatial frequency domain. As a result, the delay time can be reduced and higher spatial NC performance can be obtained.
  • In a general spatial frequency domain processing system, the temporal Fourier transform is performed on the microphone signals obtained by all the microphones, and the spatial Fourier transform is then performed on the resulting signals. Filtering is performed on the signals in the spatial frequency domain obtained by the spatial Fourier transform to generate speaker signals, and the inverse spatial Fourier transform and the inverse temporal Fourier transform are then applied to all the speaker signals to obtain speaker signals in the time domain.
  • the spatial Fourier transform, its inverse transform, and the filtering in the spatial frequency domain are performed using the signals obtained from the microphone signals of all the microphones.
  • Furthermore, in the present technology the microphones and speakers used for spatial NC are divided into a plurality of groups and processing is performed for each group, so that processing which, in general spatial NC, could only be performed by one device can be shared among a plurality of devices or arithmetic units. As a result, the amount of computation of each device or arithmetic unit can be reduced, and the number of input/output lines required for one device can also be reduced.
  • FIG. 1 is a diagram showing a configuration of a multi-input multi-output system that realizes noise canceling with general headphones or the like, that is, a parallel SISO (Single Input Single Output) system.
  • the parallel SISO system shown in FIG. 1 has microphones 11-1 to 11-6, SISO filters 12-1 to SISO filters 12-6, and speakers 13-1 to 13-6.
  • the microphones 11-1 to 11-6 pick up the ambient sound, and supply the resulting microphone signal to the SISO filter 12-1 to the SISO filter 12-6.
  • The SISO filters 12-1 to 12-6 filter the microphone signals supplied from the microphones 11-1 to 11-6 with time domain SISO filters, and supply the resulting speaker signals to the speakers 13-1 to 13-6.
  • Here, the speaker signal is a drive signal for outputting sound (hereinafter also referred to as noise canceling sound) from the speaker so that noise is canceled in a target area (position), that is, so that noise canceling is realized.
  • the speaker signal is a noise canceling signal for outputting a noise canceling sound from the speaker.
  • Speakers 13-1 to 13-6 output sound based on the speaker signals supplied from the SISO filters 12-1 to SISO filters 12-6, and realize noise canceling.
  • Hereinafter, when it is not necessary to distinguish the microphones 11-1 to 11-6, they are also simply referred to as microphones 11; when it is not necessary to distinguish the SISO filters 12-1 to 12-6, they are also simply referred to as SISO filters 12; and when it is not necessary to distinguish the speakers 13-1 to 13-6, they are also simply referred to as speakers 13.
  • A system consisting of, for example, the microphone 11-1, the SISO filter 12-1, and the speaker 13-1 is a SISO system corresponding to one channel, and the parallel SISO system is configured by arranging a plurality of such SISO systems in parallel.
  • Since each channel, that is, each SISO system, operates independently, the amount of computation in each of those SISO systems can be small.
  • On the other hand, since the correlation between channels is not taken into consideration, the higher the frequency, the greater the influence of phase shifts between the sounds output by the speakers 13, and the lower the noise canceling effect.
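  • The following is a minimal Python/NumPy sketch (not code from the patent) of the parallel SISO structure of FIG. 1, under the assumption that each SISO filter 12 is an FIR filter; filter design and real-time block handling are omitted.

```python
import numpy as np

def parallel_siso(mic_signals, siso_filters):
    """mic_signals: (M, N) array, one row per microphone 11.
    siso_filters: (M, Nf) array, one independent time-domain FIR filter per channel."""
    M, N = mic_signals.shape
    speaker_signals = np.zeros((M, N))
    for m in range(M):
        # Each channel is processed on its own; inter-channel correlation is ignored.
        speaker_signals[m] = np.convolve(mic_signals[m], siso_filters[m])[:N]
    return speaker_signals
```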
  • FIG. 2 is a diagram showing a configuration of a multi-point control MIMO (Multi Input Multi Output) system.
  • In FIG. 2, the same reference numerals are given to the portions corresponding to those in FIG. 1, and the description thereof will be omitted as appropriate.
  • the multipoint control MIMO system shown in FIG. 2 has microphones 11-1 to 11-6, a MIMO filter 41, and speakers 13-1 to 13-6.
  • the microphone signals obtained from all the microphones 11 are input to one MIMO filter 41.
  • The MIMO filter 41 performs MIMO filtering on the microphone signals supplied from the microphones 11 to generate a speaker signal for each channel, and outputs the speaker signal of each channel to the corresponding speaker 13.
  • Since filter calculation in the time domain is performed between every microphone 11 and every speaker 13, the correlation between channels is taken into consideration, and a good noise canceling effect can be obtained even at high frequencies.
  • However, the amount of computation in the MIMO filter 41 increases in proportion to the square of the number of channels, so the larger the number of channels, the larger the amount of computation. For example, when the number of channels is 48, the parallel SISO system only requires the amount of filtering computation for 48 channels, whereas the multipoint control MIMO system requires filtering computation for every combination of the 48 input channels and 48 output channels.
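  • For comparison, a minimal sketch (assumed FIR filters, not the patent's) of the multipoint control MIMO filtering of FIG. 2; every output channel depends on every input channel, which is why the cost grows with the square of the channel count.

```python
import numpy as np

def mimo_filter(mic_signals, mimo_filters):
    """mic_signals: (M, N); mimo_filters: (M, M, Nf), one FIR filter per
    (speaker, microphone) pair."""
    M, N = mic_signals.shape
    speaker_signals = np.zeros((M, N))
    for spk in range(M):
        for mic in range(M):
            # M x M convolutions in total, e.g. 48 x 48 when the channel count is 48.
            speaker_signals[spk] += np.convolve(mic_signals[mic],
                                                mimo_filters[spk, mic])[:N]
    return speaker_signals
```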
  • Therefore, it is known that the amount of computation can be significantly reduced by performing the processing in the spatial frequency domain, for example with the spatial frequency domain processing system shown in FIG. 3.
  • In FIG. 3, the same reference numerals are given to the portions corresponding to those in FIG. 1, and the description thereof will be omitted as appropriate.
  • The spatial frequency domain processing system shown in FIG. 3 has microphones 11-1 to 11-6, time FFT (Fast Fourier Transform) units 71-1 to 71-6, a spatial FFT unit 72, a filter processing unit 73, a spatial inverse FFT unit 74, time inverse FFT units 75-1 to 75-6, and speakers 13-1 to 13-6.
  • The time FFT units 71-1 to 71-6 perform time FFT processing on the microphone signals supplied from the microphones 11-1 to 11-6, and supply the resulting time-frequency domain signals to the spatial FFT unit 72.
  • The spatial FFT unit 72 performs spatial FFT processing on the signals supplied from the time FFT units 71-1 to 71-6, and supplies the resulting spatial frequency domain signals (one signal per frequency bin) to the filter processing unit 73.
  • Hereinafter, when it is not necessary to particularly distinguish the time FFT units 71-1 to 71-6, they are also simply referred to as time FFT units 71.
  • In the time FFT unit 71, for example, an STFT (Short Time Fourier Transform) or other time-axis FFT processing (temporal Fourier transform) is performed as the time FFT processing, and the microphone signal, which is a time signal, is converted into a signal in the time-frequency domain.
  • In the spatial FFT unit 72, FFT processing along the space axis (spatial Fourier transform) is performed as the spatial FFT processing on the time-frequency domain signals obtained by the time FFT units 71, yielding time-frequency signals in the spatial frequency domain.
  • the filter processing unit 73 filters the signal supplied from the spatial FFT unit 72 in the spatial frequency domain, and supplies the speaker signal for each frequency bin obtained as a result to the spatial inverse FFT unit 74.
  • The filtering in the filter processing unit 73 is a multiplication on the frequency axes, so the amount of computation can be greatly reduced compared with the multipoint control MIMO system of FIG. 2.
  • The spatial inverse FFT unit 74 performs spatial inverse FFT processing, that is, the inverse transform of the spatial FFT processing, on the spatial frequency domain speaker signals supplied from the filter processing unit 73, and supplies the resulting time-frequency domain speaker signals of each channel to the time inverse FFT units 75-1 to 75-6.
  • The time inverse FFT units 75-1 to 75-6 perform time inverse FFT processing, that is, the inverse transform of the time FFT processing, on the time-frequency domain speaker signals supplied from the spatial inverse FFT unit 74, and supply the resulting time domain speaker signals of each channel to the speakers 13-1 to 13-6.
  • Hereinafter, when it is not necessary to distinguish the time inverse FFT units 75-1 to 75-6, they are also simply referred to as time inverse FFT units 75.
  • Such a spatial frequency domain processing system can be used when the microphones 11 and the speakers 13 are arranged in an array with a specific shape such as an annular shape, and can realize spatial NC with a large number of channels over a wide frequency band with a small amount of computation.
  • That is, the correlation between channels is taken into consideration, so higher spatial NC performance can be obtained up to higher frequencies than in the parallel SISO system, and moreover the amount of computation can be kept lower than in the multipoint control MIMO system.
  • a wide region can be controlled, that is, a desired wavefront can be formed with high accuracy in a wide region, so that spatial NC can be performed for a wide region.
  • Furthermore, the adaptive processing of the filter used in the filter processing unit 73 converges quickly, so spatial NC that follows environmental changes can be performed.
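  • As an illustration only (not code from the patent), one STFT frame of the spatial frequency domain processing system of FIG. 3 can be sketched as follows; W stands for an assumed pre-designed filter W'[l, k], and windowing and overlap-add details of the STFT are omitted.

```python
import numpy as np

def spatial_freq_domain_frame(mic_frame, W):
    """mic_frame: (M, Nfft) time-domain samples of one frame, one row per microphone.
    W: (M, Nfft // 2 + 1) complex spatial-frequency-domain filter W'[l, k]."""
    X = np.fft.rfft(mic_frame, axis=1)    # time FFT units 71: X[m, k]
    Xp = np.fft.fft(X, axis=0)            # spatial FFT unit 72: X'[l, k]
    Yp = W * Xp                           # filter processing unit 73: per-bin multiply
    Y = np.fft.ifft(Yp, axis=0)           # spatial inverse FFT unit 74: Y[m, k]
    return np.fft.irfft(Y, n=mic_frame.shape[1], axis=1)  # time inverse FFT units 75
```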
  • Consider a microphone array consisting of a plurality of microphones arranged at equal intervals in a ring centered on the origin of an xy coordinate system, which is a predetermined two-dimensional Cartesian coordinate system set in space, and a speaker array consisting of a plurality of speakers likewise arranged at equal intervals in a ring centered on the origin of the xy coordinate system.
  • Here, the number of elements of the microphone array and of the speaker array, that is, the number of microphones constituting the microphone array and the number of speakers constituting the speaker array, are each M.
  • The radius of the microphone array, that is, the distance from the center position (origin) of the microphone array to each microphone, is R_mic, and the radius of the speaker array is R_spc.
  • The coordinates indicating the arrangement position of each speaker constituting the speaker array can be expressed in the same manner as the coordinates indicating the arrangement position of each microphone.
  • Let n be the discrete time index (time index), let x[m, n] be the time domain microphone signal obtained by the m-th microphone constituting the microphone array, and let y[m, n] be the time domain speaker signal of the m-th speaker constituting the speaker array.
  • Here, k in the microphone signal X[m, k] obtained by the temporal Fourier transform is an index indicating the temporal frequency, and N in equation (2) indicates the temporal Fourier transform length.
  • The spatial Fourier transform is defined in the same way as the temporal Fourier transform. However, while the temporal Fourier transform is performed with respect to the time index n, the spatial Fourier transform is performed with respect to the microphone index m.
  • Let l be the index indicating the spatial frequency, that is, the index of the spatial frequency bin, and let X'[l, k] be the spatial frequency domain signal obtained by performing the spatial Fourier transform on the microphone signal X[m, k]. The relationship between the microphone signal X[m, k] and the signal X'[l, k] is shown in the following equation (3).
  • Similarly, let Y'[l, k] be the spatial frequency domain speaker signal obtained by performing the spatial Fourier transform on the speaker signal Y[m, k]; the relationship between Y[m, k] and Y'[l, k] can also be expressed by an equation of the same form as equation (3).
  • Filtering in the spatial frequency domain is performed on the spatial frequency domain signal X'[l, k]. If the spatial frequency domain filter of this filtering process is W'[l, k], the filtering is expressed by the following equation (4); that is, the speaker signal Y'[l, k] can be obtained by calculating equation (4) from the signal X'[l, k] and the filter W'[l, k].
  • the filtering represented by the equation (4) is performed to generate a speaker signal in the spatial frequency domain.
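  • The numbered equations themselves appear only as images in the source and are not reproduced in this text. Based on the surrounding definitions and the standard DFT convention, equations (3) and (4) presumably take the following form (a hedged reconstruction; the normalization constant is an assumption).

```latex
% Hedged reconstruction of equations (3) and (4) from the surrounding text.
X'[l, k] = \sum_{m=0}^{M-1} X[m, k]\, e^{-j 2\pi l m / M}   \tag{3}

Y'[l, k] = W'[l, k]\, X'[l, k]                              \tag{4}
```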
  • In equation (5), w[m', n'] represents the time domain filter for the microphone with microphone index m' corresponding to the filter W'[l, k], and (P)_Q denotes P mod Q.
  • Equation (5) expresses the relationship among the time domain filter w[m, n], the microphone signal x[m, n], and the time domain speaker signal y[m, n].
  • Filtering based on equation (5) can be used for spatial NC because it is not affected by the system delay caused by the temporal Fourier transform, but its amount of computation is the same as that of the above-mentioned multipoint control MIMO system, so a large amount of computation is required.
  • The multipoint control MIMO system is described in detail in, for example, "C. Hansen, et al., 'Active Control of Noise and Vibration', CRC Press, 2012" (hereinafter referred to as Reference 2).
  • Now, let x'[l, n] be the signal obtained by performing, on the time domain microphone signal x[m, n], the spatial Fourier transform with the total number M of microphones as the DFT point length, that is, the conversion into the spatial frequency domain (spatial frequency conversion).
  • In equation (6), the microphone signal x[m, n] is converted into the frequency domain only in the spatial direction; therefore, the signal x'[l, n] obtained by equation (6) can be said to be a time signal in the spatial frequency domain.
  • Similarly, the time domain filter w[m, n] is not subjected to the temporal Fourier transform (temporal frequency conversion) but only to the spatial Fourier transform (spatial frequency conversion), and the resulting spatial frequency domain filter (the filter of each spatial frequency bin) is defined as w'[l, n].
  • Likewise, the spatial frequency domain speaker signal (the speaker signal of each spatial frequency bin) obtained by performing only the spatial Fourier transform (spatial frequency conversion), without performing the temporal Fourier transform (temporal frequency conversion), on the time domain speaker signal y[m, n] is defined as y'[l, n]. More precisely, the speaker signal y'[l, n] is not by itself a speaker drive signal that drives a speaker; rather, the noise canceling sound output from each speaker is calculated from this speaker signal y'[l, n].
  • The following equation (7) can be obtained by performing only the inverse temporal Fourier transform, without performing the inverse spatial Fourier transform, on both sides of equation (4).
  • That is, the speaker signal y'[l, n] can be obtained by filtering the signal x'[l, n] with the filter w'[l, n] having filter length N_f, that is, by convolving the filter w'[l, n] and the signal x'[l, n] in the time direction.
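  • As with the other numbered equations, (6) and (7) appear only as images in the source; based on the definitions above, they presumably read as follows (hedged reconstruction; sign and normalization conventions are assumptions).

```latex
% Hedged reconstruction of equations (6) and (7) from the surrounding text.
x'[l, n] = \sum_{m=0}^{M-1} x[m, n]\, e^{-j 2\pi l m / M}            \tag{6}

y'[l, n] = \sum_{n'=0}^{N_f - 1} w'[l, n']\, x'[l, n - n']           \tag{7}
```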
  • The filtering operation shown in equation (7) is a process in the spatial frequency domain, but it does not require spatial convolution; that is, it only requires convolution in the temporal direction, performed independently for each spatial frequency index (frequency bin) l.
  • Therefore, the actual amount of computation for obtaining the speaker signal y'[l, n] is the same as that of the above-mentioned parallel SISO system, apart from the additional constant-factor operations.
  • Hereinafter, a system that generates a speaker signal as a noise canceling signal by the spatial frequency conversion shown in equation (6) and the filtering shown in equation (7) is also referred to as a low-delay spatial frequency domain processing system.
  • In the low-delay spatial frequency domain processing system, the filtering operation is a convolution rather than the simple multiplication used in the spatial frequency domain processing system, but since SISO filtering is sufficient, the amount of computation is significantly reduced compared with the multipoint control MIMO system.
  • In addition, since no temporal Fourier transform is performed, the delay time (delay) generated in the system is extremely small; the spatial FFT (spatial frequency conversion) itself does not introduce a time delay.
  • As a result, the low-delay spatial frequency domain processing system is a practical system with low delay and a small amount of computation, and can realize higher-performance spatial NC.
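  • A minimal sketch of the low-delay processing described by equations (6) and (7), written in Python/NumPy under the assumption that the per-bin filters w'[l, n] have been designed offline; this illustrates the computation only and is not the patent's implementation. Because the spatial DFT acts across the microphone index at every time sample, no block of samples has to be buffered for a temporal Fourier transform, which is where the delay reduction comes from.

```python
import numpy as np

def low_delay_spatial_nc(x, w_sf):
    """x: (M, N) time-domain microphone signals x[m, n].
    w_sf: (M, Nf) complex per-bin filters w'[l, n] (spatial DFT of a real
    time-domain filter, assumed to be prepared in advance)."""
    M, N = x.shape
    x_sf = np.fft.fft(x, axis=0)                     # eq. (6): spatial DFT only, no time FFT
    # eq. (7): independent convolution in the time direction for every spatial bin l
    y_sf = np.stack([np.convolve(x_sf[l], w_sf[l])[:N] for l in range(M)])
    return np.fft.ifft(y_sf, axis=0).real            # inverse spatial DFT -> y[m, n]

# Hypothetical usage for a 16-channel setup (random placeholders for real data):
M, N, Nf = 16, 4800, 256
mic = np.random.randn(M, N)                          # microphone signals x[m, n]
w_time = np.random.randn(M, Nf) * 0.01               # some time-domain filters w[m, n]
spk = low_delay_spatial_nc(mic, np.fft.fft(w_time, axis=0))
```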
  • FIG. 4 is a diagram showing a configuration example of a noise canceling device which is an example of an embodiment of a low delay spatial frequency domain processing system to which the present technology is applied.
  • the noise canceling device 101 shown in FIG. 4 has a microphone array 111, a signal processing device 112, and a speaker array 113.
  • the microphone array 111 is a microphone array such as an annular microphone array obtained by arranging the microphones 121-1 to 121-16 in a predetermined shape such as an annular shape or a rectangular shape.
  • the microphones 121-1 to 121-16 collect ambient sounds including noise to be canceled, and supply the resulting microphone signal to the signal processing device 112.
  • Hereinafter, when it is not necessary to distinguish the microphones 121-1 to 121-16, they are also simply referred to as microphones 121.
  • The signal processing device 112 comprises, for example, a personal computer having one or more arithmetic units; it generates time domain speaker signals for spatial NC based on the microphone signals supplied from the microphone array 111 and outputs them to the speaker array 113.
  • the speaker signal in this time domain is a noise canceling signal for spatial NC, and is a speaker driving signal that drives the speakers constituting the speaker array 113 to output a noise canceling sound.
  • the signal processing device 112 has a signal processing unit 131 including one arithmetic unit such as a DSP (Digital Signal Processor) or an FPGA (Field Programmable Gate Array).
  • the signal processing unit 131 has a spatial frequency conversion unit 141, a filter processing unit 142-1 to a filter processing unit 142-16, and a spatial frequency synthesis unit 143.
  • The spatial frequency conversion unit 141 performs spatial frequency conversion on the time domain microphone signals (time signals) supplied from the microphones 121-1 to 121-16, and supplies the resulting spatial frequency domain signals to the filter processing units 142-1 to 142-16. In other words, the spatial frequency conversion unit 141 converts the time domain microphone signals into the spatial frequency domain.
  • Specifically, the DFT shown in equation (6) is performed as the spatial frequency conversion based on the microphone signals supplied from all the microphones 121.
  • In this example, the total number M of microphones 121 is 16, and the signal x'[l, n] is calculated for each of the 16 spatial frequency bins l corresponding to the filter processing units 142-1 to 142-16.
  • The filter processing units 142-1 to 142-16 generate speaker signals in the spatial frequency domain by performing signal processing in the spatial frequency domain on the signals supplied from the spatial frequency conversion unit 141, and supply them to the spatial frequency synthesis unit 143.
  • That is, the filter processing units 142-1 to 142-16 each hold a SISO filter for spatial NC, and filtering of the spatial frequency domain signal from the spatial frequency conversion unit 141 by that SISO filter is performed as the signal processing in the spatial frequency domain. More specifically, the process of convolving the filter coefficients constituting the SISO filter with the spatial frequency domain signal is performed as the filtering by the SISO filter.
  • Specifically, the filter processing units 142-1 to 142-16 hold the above-mentioned filter w'[l, n] as the SISO filter, perform the calculation represented by equation (7) as the filtering, and generate the speaker signal y'[l, n].
  • Hereinafter, when it is not necessary to distinguish the filter processing units 142-1 to 142-16, they are also simply referred to as filter processing units 142.
  • In other words, one filter processing unit 142 performs the filtering for one spatial frequency bin l.
  • The SISO filter held by the filter processing unit 142 is, for example, an FIR (Finite Impulse Response) filter generated in advance by LMS (Least Mean Squares) or the like based on the shape of the microphone array 111, the total number of microphones 121, and so on.
  • A SISO filter prepared in advance may be used continuously, or the SISO filter may be updated sequentially based on a microphone signal obtained by sound collection with a microphone or the like installed at a control point.
  • The spatial frequency synthesis unit 143 generates a time domain speaker signal for each speaker by performing spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 142, and supplies the speaker signals to the speaker array 113.
  • Here, the inverse of the spatial frequency conversion performed by the spatial frequency conversion unit 141 is performed as the spatial frequency synthesis. Therefore, for example, when the DFT (spatial Fourier transform) shown in equation (6) is performed by the spatial frequency conversion unit 141, the spatial frequency synthesis unit 143 performs the IDFT (Inverse Discrete Fourier Transform), that is, the inverse spatial Fourier transform, corresponding to equation (6).
  • The speaker array 113 is a speaker array, such as an annular speaker array, obtained by arranging the speakers 151-1 to 151-16, which are speaker units, in a predetermined shape such as an annular shape or a rectangular shape.
  • Speakers 151-1 to 151-16 are driven based on the speaker signal in the time domain supplied from the spatial frequency synthesis unit 143, and output a noise canceling sound. As a result, the noise sound is canceled in the predetermined target area, and the spatial NC is realized.
  • Hereinafter, when it is not necessary to distinguish the speakers 151-1 to 151-16, they are also simply referred to as speakers 151.
  • In FIG. 5, the parts corresponding to those in FIG. 4 are designated by the same reference numerals, and the description thereof will be omitted. Further, in FIG. 5, in order to make the figure easier to see, reference numerals are not attached to every microphone 121 and speaker 151.
  • the user U11 who is a listener or the like listening to the content is in the predetermined area R11, and this area R11 is the area (cancellation area) targeted by the spatial NC.
  • the speakers 151 constituting the speaker array 113 are arranged in a ring shape so as to surround the area R11 which is a cancellation area to form an annular speaker array.
  • the microphones 121 constituting the microphone array 111 are arranged in a ring shape on the outside of the speaker array 113 so as to surround the speaker array 113 to form a ring-shaped microphone array.
  • the speaker array 113 and the microphone array 111 are arranged so that the center of the speaker array 113 and the microphone array 111 is at the center position of the circular region R11.
  • In the microphone array 111, which is arranged outside the speaker array 113 as viewed from the region R11, noise (noise sound) generated outside the microphone array 111 and propagating toward the region R11 is collected.
  • a speaker signal is generated based on the microphone signal obtained by collecting the sound, and the noise canceling sound based on the speaker signal is output in the direction of the region R11 from each speaker 151 constituting the speaker array 113.
  • The wavefronts of the noise canceling sounds output from the speakers 151 are combined to form a wavefront that cancels the noise sound in the region R11, and spatial NC by wave field synthesis is thereby realized.
  • Note that the number of microphones 121 and the number of speakers 151, and the shapes of the microphone array 111 and the speaker array 113, do not necessarily have to be the same, and may be different numbers and shapes.
  • In that case, the spatial frequency conversion unit 141 or the spatial frequency synthesis unit 143 may perform upsampling or downsampling on the signal in the spatial frequency domain according to those numbers.
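  • One possible way of realizing such up- or down-sampling, offered here only as an assumption since the patent does not specify the method, is to resize the spatial spectrum itself by zero-padding or truncating spatial frequency bins so that the synthesis can use a different point length than the analysis.

```python
import numpy as np

def resize_spatial_spectrum(x_sf, m_out):
    """x_sf: (m_in, N) spatial-frequency-domain signal; returns (m_out, N) so that the
    inverse spatial DFT yields m_out channels (zero-pad to upsample, truncate to downsample)."""
    m_in = x_sf.shape[0]
    centered = np.fft.fftshift(x_sf, axes=0)          # low spatial frequencies in the middle
    if m_out >= m_in:
        pad = m_out - m_in
        centered = np.pad(centered, ((pad // 2, pad - pad // 2), (0, 0)))
    else:
        cut = m_in - m_out
        centered = centered[cut // 2:cut // 2 + m_out]
    return np.fft.ifftshift(centered, axes=0) * (m_out / m_in)
```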
  • the number of microphones 121 and speakers 151 may be any number, and the shape (array shape) of the microphone array 111 and speaker array 113 may be any shape.
  • In step S11, each microphone 121 of the microphone array 111 collects ambient sound and supplies the resulting time domain microphone signal to the spatial frequency conversion unit 141.
  • In step S12, the spatial frequency conversion unit 141 performs spatial frequency conversion on the time domain microphone signals supplied from the microphones 121, and supplies the resulting spatial frequency domain signals to the filter processing units 142. For example, in step S12, the calculation of the above equation (6) is performed and signals in the spatial frequency domain are generated.
  • In step S13, the filter processing units 142 filter the spatial frequency domain signals supplied from the spatial frequency conversion unit 141 with the SISO filters they hold, and supply the resulting spatial frequency domain speaker signals to the spatial frequency synthesis unit 143. For example, in step S13, the calculation of equation (7) is performed as the filtering.
  • In step S14, the spatial frequency synthesis unit 143 performs spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 142, and generates time domain speaker signals.
  • In step S15, the spatial frequency synthesis unit 143 supplies the speaker signals obtained in the process of step S14 to the speakers 151 of the speaker array 113 to output sound (noise canceling sound).
  • As described above, the noise canceling device 101 performs spatial frequency conversion on the time domain microphone signals without performing time-frequency conversion, and generates the speaker signals based on the resulting signals in the spatial frequency domain.
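  • The processing of steps S11 to S15 of FIG. 6 can be pictured as the block-wise loop below (a sketch only; read_mic_block and play_speaker_block are hypothetical stand-ins for real audio I/O, and the filter state carried across blocks is ignored for brevity).

```python
import numpy as np

def noise_cancel_loop(w_sf, read_mic_block, play_speaker_block, block_len=256):
    """w_sf: (M, Nf) complex per-bin filters w'[l, n]; the two callables are
    hypothetical audio input/output helpers, not APIs named in the patent."""
    while True:
        x = read_mic_block(block_len)                     # S11: sound collection, shape (M, block_len)
        x_sf = np.fft.fft(x, axis=0)                      # S12: spatial frequency conversion
        y_sf = np.stack([np.convolve(x_sf[l], w_sf[l])[:block_len]
                         for l in range(x.shape[0])])     # S13: SISO filtering, eq. (7)
        y = np.fft.ifft(y_sf, axis=0).real                # S14: spatial frequency synthesis
        play_speaker_block(y)                             # S15: output noise canceling sound
```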
  • <Second embodiment> <Processing in the spatial frequency domain>
  • In the noise canceling device 101 described above, the spatial frequency conversion unit 141 first performs spatial frequency conversion, and the time domain microphone signals are converted into signals in the spatial frequency domain.
  • Therefore, the outputs (microphone signals) of all the microphones 121 constituting the microphone array 111 must be input to the signal processing unit 131 for the spatial frequency conversion (DFT). Furthermore, after the filtering in the spatial frequency domain, spatial frequency synthesis must be performed using the outputs of all the filter processing units 142.
  • As a result, one arithmetic unit serving as the signal processing unit 131 must perform the spatial frequency conversion, the signal processing (filtering) in the spatial frequency domain, and the spatial frequency synthesis. That is, it is not possible to divide the hardware that performs these processes and share (distribute) the processes among a plurality of pieces of hardware (arithmetic units).
  • For example, suppose the upper limit of the frequency targeted for noise canceling is set to 1 kHz and the cancel area (region R11) is a region with a diameter of 2 m; a correspondingly large number of microphones 121 and speakers 151, and thus of channels, is then required in order to obtain sufficient spatial NC performance.
  • In such a case, realizing the signal processing unit 131 with a single arithmetic unit such as one DSP or FPGA may be physically impossible because of the number of PINs (the number of input terminals and output terminals) provided on the arithmetic unit.
  • In addition, one signal processing unit 131 may not be able to perform the computation (processing) because the amount of computation is too large, and it may therefore not be possible to realize spatial NC.
  • Therefore, the plurality of microphones 121 and speakers 151 constituting the microphone array 111 and the speaker array 113 may be divided into a plurality of groups, and the spatial frequency conversion, the filtering in the spatial frequency domain, and the spatial frequency synthesis may be performed for each of the divided groups.
  • In this way, the computation for spatial NC can be shared by a plurality of arithmetic units, so high spatial NC performance can be obtained while reducing the number of PINs and the amount of computation required for one arithmetic unit.
  • As shown in FIG. 7, consider a case where a noise sound is generated with the position P11 outside the microphone array 111 as the sound source position of a single point sound source.
  • the same reference numerals are given to the portions corresponding to those in FIG. 5, and the description thereof will be omitted as appropriate.
  • all the microphones 121 constituting the microphone array 111 and all the speakers 151 constituting the speaker array 113 are used in order to perform spatial NC.
  • However, the degree to which each microphone 121 and speaker 151 contributes to the sound collection and sound output, that is, its importance (contribution rate) for the realization of spatial NC, differs.
  • Specifically, the microphones 121 and speakers 151 arranged at positions close to the position P11 where the noise source is located are highly important, and conversely, the microphones 121 and speakers 151 arranged at positions far from the position P11 are less important.
  • Therefore, as shown in FIG. 8, it is also possible to perform spatial NC using only the four microphones 121 and twelve speakers 151 near the position P11 where the noise source is located.
  • the parts corresponding to the case in FIG. 7 are designated by the same reference numerals, and the description thereof will be omitted as appropriate.
  • In the example of FIG. 8, the number of microphones 121 used for spatial NC is four, while the number of speakers 151 used is twelve. Therefore, in order to generate the speaker signals for spatial NC by the same calculation as in the noise canceling device 101, the microphone signals of eight additional microphones 121 are nominally required. If, for example, zero signals are used in place of those eight microphone signals, the speaker signals of each of the twelve speakers 151 can be generated in the same manner as in the noise canceling device 101.
  • the noise sound generated at the position P11 can be sufficiently canceled without using all the microphones 121 and the speakers 151.
  • However, in this case the noise generation position (noise source position) must be near the position P11, and noise sounds from all directions cannot be dealt with.
  • Therefore, if the microphones 121 constituting the microphone array 111 and the speakers 151 constituting the speaker array 113 are each divided into four groups and speaker signals are generated for each group, it becomes possible to deal with noise sounds from all directions.
  • the parts corresponding to the case in FIG. 7 are designated by the same reference numerals, and the description thereof will be omitted as appropriate.
  • the 16 microphones 121 constituting the microphone array 111 and the 16 speakers 151 constituting the speaker array 113 are each divided into four groups.
  • Hereinafter, a group of microphones 121 will also be referred to as a microphone group, and a group of speakers 151 will also be referred to as a speaker group.
  • the 16 microphones 121 constituting the microphone array 111 are divided into four microphone groups as shown by arrows Q21 to Q24.
  • Here, the grouping is performed so that each microphone 121 belongs to only one microphone group and microphones 121 arranged adjacent to each other belong to the same microphone group.
  • one microphone group consists of four microphones 121.
  • For example, one microphone group is formed by the four microphones 121 arranged adjacent to each other on the right front side as viewed from the user U11.
  • Similarly, microphone groups are formed by the four microphones 121 arranged adjacent to each other in each of the right rear, left rear, and left front directions as viewed from the user U11.
  • For the microphone group on the right front side as viewed from the user U11, a speaker group consisting of the twelve speakers 151 arranged adjacent to each other and centered on the position to the right front of the user U11 is formed.
  • Similarly, for the microphone groups in the right rear, left rear, and left front directions as viewed from the user U11, speaker groups each consisting of twelve speakers 151 arranged adjacent to each other and centered on the position in the corresponding direction are formed.
  • Since the speaker groups are formed so that twelve speakers 151 arranged adjacent to each other belong to one speaker group, one speaker 151 belongs to three speaker groups.
  • By generating speaker signals for each such pair of microphone group and speaker group, the entire processing for spatial NC can be divided into four. That is, for example, the hardware is divided into four by providing one arithmetic unit corresponding to the signal processing unit 131 for each corresponding pair of microphone group and speaker group, and the processing for spatial NC can be distributed among the plurality of arithmetic units.
  • In other words, the plurality of microphones 121 constituting the microphone array 111 are divided into four microphone groups so that four mutually adjacent microphones 121 belong to the same microphone group; that is, the microphones 121 used for one filtering operation are selected while shifting by four microphones at a time.
  • the speaker 151 that is the output destination of the speaker signal obtained by one filtering is selected, and the speaker signal is supplied to the selected speaker 151.
  • In this way, grouping is performed so as to divide the entire processing into parts, and where the filtering output destinations overlap, the speaker signals are added to obtain the final speaker signals. As a result, spatial NC can be performed using all the microphones 121 and speakers 151, which makes it possible to deal with noise sounds from all directions.
  • the noise canceling device is configured as shown in FIG. 10, for example.
  • In FIG. 10, the same reference numerals are given to the portions corresponding to those in FIG. 4, and the description thereof will be omitted as appropriate.
  • the noise canceling device 191 shown in FIG. 10 has a microphone array 111, a signal processing device 201, and a speaker array 113.
  • the microphone array 111 and the speaker array 113 are each divided into four groups.
  • Specifically, the microphones 121-1 to 121-4 form one group, and similarly, the microphones 121-5 to 121-8, the microphones 121-9 to 121-12, and the microphones 121-13 to 121-16 each form one group.
  • Also, the speakers 151-1 to 151-12 form one group, the speakers 151-5 to 151-16 form one group, the speakers 151-9 to 151-16 together with the speakers 151-1 to 151-4 form one group, and the speakers 151-13 to 151-16 together with the speakers 151-1 to 151-8 form one group.
  • the signal processing device 201 corresponds to the signal processing device 112 of FIG. 4, and is composed of, for example, a personal computer having one or a plurality of arithmetic units.
  • the signal processing device 201 has a signal processing unit 211-1 to a signal processing unit 211-4, and an addition unit 212-1 to an addition unit 212-16.
  • Each of the signal processing unit 211-1 to the signal processing unit 211-4 is composed of one arithmetic unit such as a DSP or FPGA, and corresponds to the signal processing unit 131 of FIG.
  • The signal processing unit 211-1 performs the same processing as the signal processing unit 131 based on the microphone signals supplied from the microphones 121-1 to 121-4 and eight predetermined zero signals treated as microphone signals, and generates speaker signals.
  • That is, the signal processing unit 211-1 generates speaker signals for twelve channels, namely speaker signals whose output destinations are the speakers 151-13 to 151-16 and the speakers 151-1 to 151-8.
  • The signal processing unit 211-1 supplies the generated speaker signals to the addition units of the corresponding channels, that is, the addition units 212-13 to 212-16 and the addition units 212-1 to 212-8.
  • Similarly, the signal processing units 211-2 to 211-4 also generate and output speaker signals for twelve channels based on the microphone signals from four microphones 121 and eight zero signals.
  • That is, the signal processing unit 211-2 receives the microphone signals from the microphones 121-5 to 121-8 and supplies speaker signals to the addition units 212-1 to 212-12.
  • The signal processing unit 211-3 receives the microphone signals from the microphones 121-9 to 121-12 and supplies speaker signals to the addition units 212-5 to 212-16.
  • The signal processing unit 211-4 receives the microphone signals from the microphones 121-13 to 121-16 and supplies speaker signals to the addition units 212-9 to 212-16 and the addition units 212-1 to 212-4.
  • Hereinafter, when it is not necessary to distinguish the signal processing units 211-1 to 211-4, they are also simply referred to as signal processing units 211.
  • In the noise canceling device 191, the signal processing unit 211 to which the microphone signals obtained by sound collection by the microphones 121 of a microphone group are input is predetermined for each microphone group.
  • Each signal processing unit 211 performs filtering by SISO filters based on the microphone signals supplied from all the microphones 121 belonging to one microphone group, and generates the speaker signals of some of the speakers 151 of the speaker array 113, that is, of the speakers 151 belonging to the speaker group corresponding to that microphone group.
  • The addition units 212-1 to 212-16 add the speaker signals of the same channel supplied from a plurality of signal processing units 211 to obtain the final speaker signal, and supply the final speaker signal to the speaker 151 of the corresponding channel.
  • Specifically, the addition units 212-1 to 212-4 receive speaker signals from the signal processing units 211-1, 211-2, and 211-4, and the addition units 212-5 to 212-8 receive speaker signals from the signal processing units 211-1 to 211-3.
  • The addition units 212-9 to 212-12 receive speaker signals from the signal processing units 211-2 to 211-4, and the addition units 212-13 to 212-16 receive speaker signals from the signal processing units 211-1, 211-3, and 211-4.
  • Hereinafter, when it is not necessary to distinguish the addition units 212-1 to 212-16, they are also simply referred to as addition units 212.
  • In other words, one corresponding addition unit 212 is provided for each of the plurality of speakers 151 constituting the speaker array 113, and each addition unit 212 adds and outputs the speaker signals for the same speaker 151 obtained by two or more signal processing units 211.
  • Note that the signal processing units 211 may each be provided in a plurality of signal processing devices different from each other.
  • FIG. 11 is a diagram showing a configuration example of the signal processing unit 211 of the noise canceling device 191.
  • the signal processing unit 211 has a spatial frequency conversion unit 241, a filter processing unit 242-1 to a filter processing unit 242-12, and a spatial frequency synthesis unit 243.
  • The spatial frequency conversion unit 241, the filter processing units 242-1 to 242-12, and the spatial frequency synthesis unit 243 correspond to the spatial frequency conversion unit 141, the filter processing units 142, and the spatial frequency synthesis unit 143 shown in FIG. 4, respectively.
  • the spatial frequency conversion unit 241 performs spatial frequency conversion based on the time domain microphone signals supplied from each of the four microphones 121 and the eight zero signals supplied as dummy microphone signals.
  • Here, the same DFT as in equation (6) is performed as the spatial frequency conversion, with the DFT point length set to 12, and the signal x'[l, n] is calculated for each of the 12 spatial frequency bins l corresponding to the filter processing units 242-1 to 242-12.
  • the spatial frequency conversion unit 241 supplies the signal in the spatial frequency domain obtained by the spatial frequency conversion to the filter processing unit 242-1 to the filter processing unit 242-12.
  • The filter processing units 242-1 to 242-12 generate speaker signals in the spatial frequency domain by performing signal processing in the spatial frequency domain on the signals supplied from the spatial frequency conversion unit 241, and supply them to the spatial frequency synthesis unit 243.
  • the filter processing unit 242-1 to the filter processing unit 242-12 hold an SISO filter for spatial NC.
  • The filter processing units 242-1 to 242-12 generate speaker signals by filtering the spatial frequency domain signals from the spatial frequency conversion unit 241 with the SISO filters they hold as the signal processing. This SISO filter is, for example, the above-mentioned filter w'[l, n], and the calculation of equation (7) is performed as the filtering.
  • Hereinafter, when it is not necessary to distinguish the filter processing units 242-1 to 242-12, they are also simply referred to as filter processing units 242.
  • The spatial frequency synthesis unit 243 generates a time domain speaker signal for each speaker 151 by performing spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 242, and supplies them toward the speaker array 113.
  • the inverse conversion of the spatial frequency conversion performed by the spatial frequency conversion unit 241 is performed as the spatial frequency synthesis.
  • In the noise canceling device 191, the outputs of the plurality of microphones 121 constituting the microphone array 111 are divided into four sets and input to the respective signal processing units 211.
  • In each signal processing unit 211, the microphone signals supplied from the four microphones 121 are input to the four input terminals at the center of the twelve input terminals, and a zero signal, which is a dummy microphone signal, is input to each of the remaining eight input terminals, four at each of the left and right ends.
  • Then, a DFT with a DFT point length of 12, for example, is performed as the spatial frequency conversion based on the signals input from the input terminals, and in each filter processing unit 242 the DFT output is filtered by the SISO filter.
  • IDFT is performed as spatial frequency synthesis for the output of each filter processing unit 242, and a speaker signal in the time domain of each channel is generated.
  • The speaker signal of each channel generated in this way is input to the speaker 151 corresponding to that channel, but before that, the speaker signals of the same channel from three mutually adjacent signal processing units 211 are added in the corresponding addition unit 212.
  • Note that the addition processing of the speaker signals of the same channel may be performed in the amplifier or before the speaker signals are input to the amplifier, and the addition processing may be performed on digital or analog signals.
  • In each signal processing unit 211, the number of inputs and outputs of the spatial frequency conversion unit 241 and the spatial frequency synthesis unit 243, that is, the point length of the DFT and IDFT, is 12, which is smaller than the point length of 16 in the case of the signal processing unit 131 shown in FIG. 4.
  • Therefore, the number of PINs (input/output terminals) of the signal processing unit 211 can be reduced compared with the signal processing unit 131, and the amount of computation (signal processing) performed by the signal processing unit 211 can also be reduced.
  • With the noise canceling device 191, it is possible to reduce the number of PINs and the amount of computation of each signal processing unit 211 while also reducing the delay time, and to obtain high spatial NC performance in real time. Moreover, noise sounds from all directions can be dealt with.
  • The point length (number of divisions) in each signal processing unit can be set arbitrarily according to the specifications of the signal processing unit (arithmetic unit), such as the number of PINs and the number of MIPS (Million Instructions Per Second); for example, a 256-channel microphone signal may be divided into units of 12 channels for processing.
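  • A hedged sketch of the grouped processing of FIGS. 10 and 11: each of the four units processes four real microphone signals placed at the center of a 12-input block together with eight zero signals, and the twelve resulting channels are accumulated into the overlapping speaker channels by the addition units. The index mapping between the twelve output terminals and the sixteen speakers is an assumption made for illustration, not taken from the patent figures.

```python
import numpy as np

def grouped_spatial_nc(mic_signals, group_filters):
    """mic_signals: (16, N) time-domain signals from the microphone array 111.
    group_filters: (4, 12, Nf) complex per-bin filters, one set per signal processing
    unit 211, assumed to come from real time-domain filters so the spatial IDFT is real."""
    M, N = mic_signals.shape                        # 16 microphones and 16 speakers
    speakers = np.zeros((M, N))
    for g in range(4):                              # one iteration = one signal processing unit 211
        block = np.zeros((12, N))
        block[4:8] = mic_signals[4 * g:4 * g + 4]   # 4 mic signals centred, 8 zero signals
        x_sf = np.fft.fft(block, axis=0)            # 12-point spatial DFT (unit 241)
        y_sf = np.stack([np.convolve(x_sf[l], group_filters[g, l])[:N]
                         for l in range(12)])       # per-bin SISO filtering (units 242)
        y_block = np.fft.ifft(y_sf, axis=0).real    # 12-point spatial IDFT (unit 243)
        for j in range(12):
            spk = (4 * g - 4 + j) % M               # assumed overlapping speaker mapping
            speakers[spk] += y_block[j]             # addition units 212
    return speakers
```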
  • Since the process of step S51 is the same as the process of step S11 of FIG. 6, the description thereof will be omitted; note, however, that in step S51 the microphone signal obtained by each microphone 121 is supplied to the spatial frequency conversion unit 241 of the corresponding signal processing unit 211.
  • In step S52, the spatial frequency conversion unit 241 of each signal processing unit 211 performs spatial frequency conversion on the time domain microphone signals supplied from the four microphones 121 and on the eight zero signals, and supplies the resulting spatial frequency domain signals to the filter processing units 242. For example, in step S52, the same calculation as in the above equation (6) is performed.
  • In step S53, the filter processing units 242 filter the spatial frequency domain signals supplied from the spatial frequency conversion unit 241 with the SISO filters they hold, and supply the resulting spatial frequency domain speaker signals to the spatial frequency synthesis unit 243. For example, in step S53, the same calculation as in equation (7) is performed as the filtering.
  • In step S54, the spatial frequency synthesis unit 243 performs spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 242, and supplies the resulting time domain speaker signals to the addition units 212.
  • In step S55, each addition unit 212 performs addition processing to add the speaker signals of the same channel supplied from the spatial frequency synthesis units 243 of the three corresponding signal processing units 211, and obtains the final speaker signal.
  • Finally, each addition unit 212 supplies the speaker signal obtained in the process of step S55 to the corresponding speaker 151 of the speaker array 113 to output sound (noise canceling sound), and the noise canceling process ends.
  • As described above, the noise canceling device 191 divides the output of the microphone array 111 into four parts, inputs them to the signal processing units 211, and has each signal processing unit 211 generate speaker signals by signal processing in the spatial frequency domain. By doing so, the number of PINs and the amount of computation of one signal processing unit 211 can be reduced, the delay time can also be reduced, and high-performance spatial NC that can handle all directions can be realized in real time.
  • In the noise canceling device 191 described above, the addition units 212 are provided after the signal processing units 211 in order to share the processing among the plurality of signal processing units 211. However, as shown in FIGS. 13 and 14, if the number of microphones 121 belonging to each microphone group is increased and the outputs of the microphones 121 are input, with overlap, to a plurality of adjacent signal processing units 211 (arithmetic units), a configuration without the addition units 212 is also possible.
  • In FIG. 13 and FIG. 14, the parts corresponding to those in FIG. 7 are designated by the same reference numerals, and the description thereof will be omitted as appropriate.
  • In this case, the noise sound from the position P11 is signal-processed, that is, filtered, using twelve microphones 121, and of the speaker signals obtained as a result, the speaker signals of four channels are used and sound is output from the corresponding four speakers 151.
  • the microphone 121 and the speaker 151 are each divided into four groups.
  • For example, one microphone group is formed by the twelve microphones 121 arranged adjacent to each other and centered on the position to the right front of the user U11.
  • Similarly, microphone groups are formed by the twelve microphones 121 arranged adjacent to each other and centered on the positions in the right rear, left rear, and left front directions of the user U11.
  • In this case, one microphone 121 belongs to three microphone groups.
  • a speaker group consisting of four speakers 151 arranged adjacent to each other is formed with respect to the microphone group on the right front side of the user U11, centered on the position on the right side front side of the user U11. Has been done.
  • the right rear, left rear, and left front of the user U11 are also for the microphone groups in the right rear, left rear, and left front directions of the user U11.
  • a speaker group consisting of four speakers 151 arranged adjacent to each other is formed with the position in each direction as the center.
  • the speaker groups are grouped so that one speaker 151 belongs to only one speaker group, and the speakers 151 arranged adjacent to each other belong to one speaker group.
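As a rough sketch of this grouping on a 16-element circular array (the index offsets below are placeholders chosen only to reproduce the overlap pattern described here), overlapping microphone groups of twelve adjacent elements and disjoint speaker groups of four adjacent elements can be built as follows:

```python
# Hypothetical grouping for a 16-microphone / 16-speaker circular array.
N_MIC = N_SPK = 16

def circular_range(start, length, n):
    """Indices of `length` adjacent elements on an n-element ring."""
    return [(start + i) % n for i in range(length)]

# Four overlapping microphone groups of 12 adjacent microphones each;
# consecutive groups are shifted by 4, so each microphone ends up in 3 groups.
mic_groups = [circular_range(4 * g - 4, 12, N_MIC) for g in range(4)]

# Four disjoint speaker groups of 4 adjacent speakers each.
spk_groups = [circular_range(4 * g, 4, N_SPK) for g in range(4)]

# Every microphone belongs to exactly three groups, every speaker to one.
assert all(sum(m in grp for grp in mic_groups) == 3 for m in range(N_MIC))
assert all(sum(s in grp for grp in spk_groups) == 1 for s in range(N_SPK))
```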
<Configuration example of noise canceling device>
In such a case, the noise canceling device is configured as shown in FIG. 15, for example. In FIG. 15, the parts corresponding to those in FIG. 10 are designated by the same reference numerals, and their description will be omitted as appropriate.

The noise canceling device 281 shown in FIG. 15 has a microphone array 111, a signal processing device 201, and a speaker array 113. Further, the signal processing device 201 has signal processing units 211-1 to 211-4.

The configuration of the noise canceling device 281 differs from that of the noise canceling device 191 in that the addition units 212 are not provided, and is otherwise the same as the configuration of the noise canceling device 191. However, the noise canceling device 281 and the noise canceling device 191 differ in the input/output relationship between the signal processing units 211 and the microphones 121 and speakers 151.
That is, in this example, the microphones 121-1 to 121-8 and the microphones 121-13 to 121-16 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-1. Similarly, the microphones 121-1 to 121-12 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-2. The microphones 121-5 to 121-16 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-3. The microphones 121-9 to 121-16 and the microphones 121-1 to 121-4 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-4.

In this way, the output of one microphone 121 is input to two or more, more specifically three, signal processing units 211 predetermined for that microphone 121 (microphone group). Therefore, no dummy microphone signal (zero signal) is supplied to the spatial frequency conversion unit 241 of each signal processing unit 211; instead, the microphone signals of twelve microphones 121 are input to it.

Further, the speakers 151-1 to 151-4 form one group, and the speakers 151-5 to 151-8 also form one group. Similarly, the speakers 151-9 to 151-12 form one group, and the speakers 151-13 to 151-16 form one group.
In each signal processing unit 211, filtering by the SISO filters and the like is performed based on the microphone signals, and speaker signals for some of the speakers 151 of the speaker array 113, that is, for all the speakers 151 belonging to the speaker group corresponding to the microphone group, are generated.

Note that, in the spatial frequency synthesis unit 243, the same number of channels of speaker signals as the number of inputs of the spatial frequency conversion unit 241, that is, 12 channels of speaker signals corresponding to 12 speakers 151, are obtained. However, of these speaker signals, only the speaker signals for four channels, that is, the speaker signals of some of the twelve speakers 151, are actually output to the speakers 151.

That is, a speaker signal is output from each of the four central output terminals of the twelve output terminals to the speaker 151 connected to that terminal. On the other hand, since no speaker 151 is connected to the remaining eight output terminals, four at each of the left and right ends, no speaker signal is supplied from those output terminals to a speaker 151.

In other words, speaker signals are output from only four of the twelve output terminals, and the remaining eight output terminals are not used. Therefore, for example, part of the output of the spatial frequency synthesis (IDFT) may be omitted from the calculation.
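A minimal sketch of that shortcut, under assumed sizes and an assumed IDFT normalization, is shown below: only the rows of the inverse spatial DFT that correspond to the speakers actually connected are evaluated, instead of the full 12-point IDFT. The bin count and the set of used output channels are placeholders.

```python
import numpy as np

M = 12                         # spatial frequency bins / outputs per unit (assumed)
used_channels = [4, 5, 6, 7]   # hypothetical: the four central output terminals
n_samples = 480

# y_spatial[l, n] would come from the per-bin SISO filtering (equation (7)).
y_spatial = np.random.randn(M, n_samples) + 1j * np.random.randn(M, n_samples)

# A full inverse spatial DFT would produce all 12 speaker channels:
#   y[m, n] = (1 / M) * sum_l y'[l, n] * exp(+j * 2 * pi * l * m / M)   (normalization assumed)
# Here only the rows for the speakers that are actually connected are evaluated.
l = np.arange(M)
partial_idft = np.exp(2j * np.pi * np.outer(used_channels, l) / M) / M   # (4, M)
speaker_signals = np.real(partial_idft @ y_spatial)                      # (4, n_samples)
```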
The calculation for spatial NC performed by the noise canceling device 281 is completely equivalent to the calculation for spatial NC performed by the noise canceling device 191. Therefore, either the configuration of the noise canceling device 281 or the configuration of the noise canceling device 191 may be selected; depending on the case, the configuration of the noise canceling device 281 may be adopted, or the configuration of the noise canceling device 191 may be adopted.

In the noise canceling device 281 described above, basically the noise canceling process described with reference to FIG. 6 is performed.
That is, in step S11, the microphone signal obtained by each microphone 121 is supplied to the spatial frequency conversion unit 241 of the signal processing unit 211. In step S12, the spatial frequency conversion unit 241 of each signal processing unit 211 performs spatial frequency conversion, and the resulting signals are supplied to the filter processing units 242-1 to 242-12 of that signal processing unit 211.

In step S13, filtering is performed by the filter processing units 242 of each signal processing unit 211, and the resulting spatial frequency domain speaker signals are supplied to the spatial frequency synthesis unit 243 of that signal processing unit 211. In step S14, spatial frequency synthesis is performed by the spatial frequency synthesis unit 243 of each signal processing unit 211, and in step S15 the resulting time domain speaker signals are supplied to the speakers 151, whereby spatial NC is realized.
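Putting these steps together for the noise canceling device 281, the sketch below (sizes, filters, group indices, and the choice of which four outputs are wired to speakers are all placeholders) has each signal processing unit consume its overlapping twelve-microphone group and drive only the four speakers of its own speaker group, so that no addition unit is needed:

```python
import numpy as np

N_MIC, N_SPK, GROUP, Nf, n_samples = 16, 16, 12, 64, 480
mics = np.random.randn(N_MIC, n_samples)
speaker_signals = np.zeros((N_SPK, n_samples))

for g in range(4):
    mic_idx = [(4 * g - 4 + i) % N_MIC for i in range(GROUP)]   # overlapping group
    spk_idx = [4 * g + i for i in range(4)]                     # disjoint group
    filters = np.random.randn(GROUP, Nf) * 0.01                 # placeholder w'[l, n]

    x_spatial = np.fft.fft(mics[mic_idx], axis=0)               # step S12
    y_spatial = np.stack([np.convolve(x_spatial[l], filters[l])[:n_samples]
                          for l in range(GROUP)])               # step S13
    y_time = np.real(np.fft.ifft(y_spatial, axis=0))            # step S14
    # Step S15: only the four channels assumed to be wired to this unit's speakers are used.
    speaker_signals[spk_idx] = y_time[4:8]
```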
By the way, the series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the programs constituting the software are installed on a computer. Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.

FIG. 16 is a block diagram showing a configuration example of the hardware of a computer that executes the above-described series of processes by a program.
In the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are connected to each other by a bus 504.
An input/output interface 505 is further connected to the bus 504. An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input/output interface 505.

The input unit 506 includes a keyboard, a mouse, a microphone, an image pickup element, and the like. The output unit 507 includes a display, a speaker, and the like. The recording unit 508 includes a hard disk, a non-volatile memory, and the like. The communication unit 509 includes a network interface and the like. The drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
In the computer configured as described above, the CPU 501 loads the program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes it, whereby the above-described series of processes is performed.

The program executed by the computer (CPU 501) can be provided by being recorded on the removable recording medium 511 as a package medium or the like, for example. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

In the computer, the program can be installed in the recording unit 508 via the input/output interface 505 by mounting the removable recording medium 511 in the drive 510. The program can also be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. In addition, the program can be installed in advance in the ROM 502 or the recording unit 508.

The program executed by the computer may be a program in which processing is performed in chronological order in the order described in the present specification, or may be a program in which processing is performed in parallel or at a necessary timing such as when a call is made.
The embodiments of the present technology are not limited to the above-described embodiments, and various changes can be made without departing from the gist of the present technology.

For example, the present technology can take a cloud computing configuration in which one function is shared by a plurality of devices via a network and processed jointly.

Further, each step described in the above flowcharts can be executed by one device or shared among a plurality of devices. Furthermore, when one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared among a plurality of devices.
Further, the present technology can also have the following configurations.

(1) A signal processing device including one or more signal processing units that perform signal processing in the spatial frequency domain, in which the signal processing unit performs the signal processing on a signal converted in the spatial frequency domain based on microphone signals obtained by sound collection by a plurality of microphones.
(2) The signal processing device according to (1), in which the signal processing unit generates a noise canceling signal by performing the signal processing.
(3) The signal processing device according to (1) or (2), further including: a spatial frequency conversion unit that performs spatial frequency conversion on the plurality of time domain microphone signals; and a spatial frequency synthesis unit that performs spatial frequency synthesis on the signal in the spatial frequency domain obtained by the signal processing.
(4) The signal processing device according to any one of (1) to (3), in which the signal processing unit performs the signal processing with a plurality of signals based on the plurality of microphone signals obtained by the plurality of microphones as inputs, and outputs a plurality of signals.
(5) The signal processing device according to any one of (1) to (4), in which the signal processing unit has a plurality of filter processing units and performs filtering by the filter processing units as the signal processing.
(6) The signal processing device according to any one of (1) to (5), including a plurality of the signal processing units and an addition unit, in which the signal processing unit performs the signal processing on a signal based on the microphone signals obtained by all the microphones belonging to one group when the plurality of microphones are divided into a plurality of groups, and the addition unit adds the speaker signals, for the speaker corresponding to the addition unit, obtained by two or more of the plurality of signal processing units, and outputs the final speaker signal obtained by the addition to the corresponding speaker.
(7) The signal processing device according to (6), in which the plurality of the microphones are divided into a predetermined number of the groups so that microphones adjacent to each other belong to the same group, and the signal processing unit to which the microphone signals obtained by the microphones belonging to a group are input is defined for each of the predetermined number of the groups.
(8) The signal processing device according to any one of (1) to (5), including a plurality of the signal processing units, in which, for each of the plurality of microphones, the microphone signal obtained by one microphone is input to two or more predetermined signal processing units among the plurality of signal processing units, and the signal processing unit performs the signal processing on a signal based on the input microphone signals to generate speaker signals corresponding to each of a plurality of speakers, and outputs the speaker signals to some of the plurality of speakers.
(11) A noise canceling device including: a plurality of microphones; one or more signal processing units that perform signal processing in the spatial frequency domain; and a plurality of speakers that output sound based on a noise canceling signal generated by the signal processing, in which the signal processing unit generates the noise canceling signal by performing the signal processing on a signal converted in the spatial frequency domain based on the microphone signals obtained by sound collection by the plurality of microphones.

Abstract

The present technology pertains to a signal processing device and method, a noise cancelling device, and a program configured to enable a reduction in delay time. The signal processing device includes one or a plurality of signal processing units that carry out signal processing in the spatial frequency domain. The signal processing units carry out the signal processing on signals converted in the spatial frequency domain on the basis of microphone signals obtained by sound collection using a plurality of microphones. This technology can be applied to noise cancelling devices.

Description

Signal processing device and method, noise cancelling device, and program
The present technology relates to a signal processing device and method, a noise canceling device, and a program, and more particularly to a signal processing device and method, a noise canceling device, and a program capable of reducing delay time.

Conventionally, spatial noise canceling (hereinafter also referred to as spatial NC (Noise Cancelling)), which performs noise canceling using wave field synthesis technology, is known.

For example, as one method for performing spatial NC using an annular microphone array and an annular speaker array, a method of performing operations such as filtering in the spatial frequency domain is conceivable.

A technique for performing operations in the spatial frequency domain in order to realize wave field synthesis has been proposed (see, for example, Non-Patent Document 1), and by using operations in the spatial frequency domain, higher spatial NC performance can be realized in consideration of the correlation between the channels corresponding to a plurality of speakers.

By the way, in order to realize spatial NC by performing filtering or the like in the spatial frequency domain, it is necessary to perform a temporal Fourier transform on the microphone signals obtained by all the microphones, perform a spatial Fourier transform on the resulting signals, and thereby convert them into signals in the spatial frequency domain.

Moreover, even after the filtering, it is necessary to perform the inverse of the spatial Fourier transform and the inverse of the temporal Fourier transform on the signals in the spatial frequency domain to return them to signals in the time domain.

However, such a temporal Fourier transform causes an algorithmic delay that is unavoidable in principle, so that a delay occurs before the sound for spatial NC is output from the speakers, and the spatial NC performance deteriorates.

The present technology has been made in view of such a situation, and makes it possible to reduce the delay time.

A signal processing device according to a first aspect of the present technology includes one or more signal processing units that perform signal processing in the spatial frequency domain, and the signal processing unit performs the signal processing on a signal converted in the spatial frequency domain based on microphone signals obtained by sound collection by a plurality of microphones.

A signal processing method or program according to the first aspect of the present technology is a signal processing method or program for a signal processing device having one or more signal processing units that perform signal processing in the spatial frequency domain, and includes a step in which the one or more signal processing units perform the signal processing on a signal converted in the spatial frequency domain based on microphone signals obtained by sound collection by a plurality of microphones.

In the first aspect of the present technology, in a signal processing device having one or more signal processing units that perform signal processing in the spatial frequency domain, the signal processing is performed by the one or more signal processing units on a signal converted in the spatial frequency domain based on microphone signals obtained by sound collection by a plurality of microphones.

A noise canceling device according to a second aspect of the present technology includes a plurality of microphones, one or more signal processing units that perform signal processing in the spatial frequency domain, and a plurality of speakers that output sound based on a noise canceling signal generated by the signal processing, and the signal processing unit generates the noise canceling signal by performing the signal processing on a signal converted in the spatial frequency domain based on microphone signals obtained by sound collection by the plurality of microphones.

In the second aspect of the present technology, in a noise canceling device including a plurality of microphones, one or more signal processing units that perform signal processing in the spatial frequency domain, and a plurality of speakers that output sound based on a noise canceling signal generated by the signal processing, the noise canceling signal is generated by the signal processing unit performing the signal processing on a signal converted in the spatial frequency domain based on microphone signals obtained by sound collection by the plurality of microphones.
FIG. 1 is a diagram showing the configuration of a parallel SISO system.
FIG. 2 is a diagram showing the configuration of a multipoint control MIMO system.
FIG. 3 is a diagram showing the configuration of a spatial frequency domain processing system.
FIG. 4 is a diagram showing a configuration example of a noise canceling device.
FIG. 5 is a diagram showing an example of the speaker arrangement of a speaker array.
FIG. 6 is a flowchart explaining noise canceling processing.
FIG. 7 is a diagram explaining the use of microphones and speakers.
FIG. 8 is a diagram explaining the use of microphones and speakers.
FIG. 9 is a diagram explaining the grouping of a microphone array and a speaker array.
FIG. 10 is a diagram showing a configuration example of a noise canceling device.
FIG. 11 is a diagram showing a configuration example of a signal processing unit.
FIG. 12 is a flowchart explaining noise canceling processing.
FIG. 13 is a diagram explaining the use of microphones and speakers.
FIG. 14 is a diagram explaining the grouping of a microphone array and a speaker array.
FIG. 15 is a diagram showing a configuration example of a noise canceling device.
FIG. 16 is a diagram showing a configuration example of a computer.
Hereinafter, embodiments to which the present technology is applied will be described with reference to the drawings.
<First Embodiment>
<About the present technology>
The present technology realizes spatial NC that does not require a temporal frequency transform and its inverse transform by directly performing a spatial frequency transform on the time domain microphone signals and converting them into signals in the spatial frequency domain. As a result, the delay time can be reduced, and higher spatial NC performance can be obtained.
In order to realize spatial NC at desired frequencies in a desired region, it is necessary to use as many microphones and speakers as are needed to satisfy the spatial Nyquist theorem.

In particular, in general spatial NC, in order to obtain a cancellation effect up to higher frequencies, it is necessary to use a large number of microphones and speakers and to perform an enormous amount of calculation.

That is, in spatial NC using general spatial frequency domain processing operations, a temporal Fourier transform is performed on the microphone signals obtained by all the microphones, and a spatial Fourier transform is performed on the resulting signals. Then, the signals in the spatial frequency domain obtained by the spatial Fourier transform are filtered to generate speaker signals, after which the inverse spatial Fourier transform and the inverse temporal Fourier transform are performed on all the speaker signals to obtain time domain speaker signals.

As described above, since a temporal Fourier transform causes a time delay that is unavoidable in principle, a system delay occurs, and it is difficult to realize high spatial NC performance in real time.

Further, in spatial NC using general spatial frequency domain processing operations, the spatial Fourier transform, its inverse transform, and the filtering in the spatial frequency domain are performed using signals obtained from the microphone signals of all the microphones.

Therefore, if a large number of microphones and speakers are used for spatial NC, a correspondingly large number of input/output lines to a processing device such as a DSP (Digital Signal Processor) are required, and each process for spatial NC must be handled by a single device.

Accordingly, not only is the computational load of the device performing spatial NC large, but a large number of input/output lines are also required, and a device with low processing capacity or few input/output lines cannot perform spatial NC.

Therefore, in the present technology, the time domain microphone signals are converted directly into signals in the spatial frequency domain, so that spatial NC that does not require a temporal frequency transform or its inverse transform can be realized. As a result, not only can the amount of calculation be significantly reduced, but the delay time can also be reduced, and high spatial NC performance can be obtained in real time.

Further, in the present technology, the microphones and speakers used for spatial NC are divided into a plurality of groups and processing is performed for each group, so that processing that could only be performed by one device in general spatial NC can be shared by a plurality of devices or arithmetic units. As a result, the amount of calculation of each device or arithmetic unit can be reduced, and the number of input/output lines required for one device can also be reduced.

In the following, a general noise canceling technique and the present technology will be described in more detail.
FIG. 1 is a diagram showing the configuration of a multi-input multi-output system that realizes noise canceling in general headphones or the like, that is, a parallel SISO (Single Input Single Output) system.

The parallel SISO system shown in FIG. 1 has microphones 11-1 to 11-6, SISO filters 12-1 to 12-6, and speakers 13-1 to 13-6.

The microphones 11-1 to 11-6 pick up ambient sound and supply the resulting microphone signals to the SISO filters 12-1 to 12-6.

The SISO filters 12-1 to 12-6 filter the microphone signals supplied from the microphones 11-1 to 11-6 with SISO filters in the time domain, and supply the resulting speaker signals to the speakers 13-1 to 13-6.

A speaker signal is a drive signal for causing a speaker to output sound (hereinafter also referred to as a noise canceling sound) so that the noise sound is canceled in the target region (position), that is, so that noise canceling is realized. In other words, the speaker signal is a noise canceling signal for causing the speaker to output the noise canceling sound.

The speakers 13-1 to 13-6 output sound based on the speaker signals supplied from the SISO filters 12-1 to 12-6, thereby realizing noise canceling.

Hereinafter, when it is not necessary to distinguish the microphones 11-1 to 11-6, they are also simply referred to as microphones 11, and when it is not necessary to distinguish the SISO filters 12-1 to 12-6, they are also simply referred to as SISO filters 12. Similarly, when it is not necessary to distinguish the speakers 13-1 to 13-6, they are also simply referred to as speakers 13.

In FIG. 1, for example, the system consisting of the microphone 11-1, the SISO filter 12-1, and the speaker 13-1 is a SISO system corresponding to one channel, and a plurality of such SISO systems are arranged in parallel to form the parallel SISO system.

Since each channel, that is, each SISO system, operates independently, the amount of calculation in each SISO system is small. However, in the parallel SISO system, the correlation between channels is not taken into consideration, so the higher the frequency, the greater the influence of the phase shift of the sound output from each speaker 13, and the lower the noise canceling effect.
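A minimal sketch of this parallel SISO structure (filter lengths and coefficients are placeholders, not taken from the document): each channel convolves its own microphone signal with its own FIR filter, independently of all other channels, so the cost grows only linearly with the number of channels.

```python
import numpy as np

def parallel_siso(mic_signals, siso_filters):
    """mic_signals: (C, n) time-domain signals; siso_filters: (C, L) FIR taps.
    Each channel is filtered independently; no cross-channel terms appear."""
    C, n = mic_signals.shape
    return np.stack([np.convolve(mic_signals[c], siso_filters[c])[:n] for c in range(C)])

# Hypothetical example: 6 channels, 128-tap filters.
mics = np.random.randn(6, 1024)
filters = np.random.randn(6, 128) * 0.01
speaker_signals = parallel_siso(mics, filters)   # cost grows linearly with C
```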
FIG. 2 is a diagram showing the configuration of a multipoint control MIMO (Multi Input Multi Output) system. In FIG. 2, the parts corresponding to those in FIG. 1 are designated by the same reference numerals, and their description will be omitted as appropriate.

The multipoint control MIMO system shown in FIG. 2 has microphones 11-1 to 11-6, a MIMO filter 41, and speakers 13-1 to 13-6.

In this multipoint control MIMO system, the microphone signals obtained by all the microphones 11 are input to one MIMO filter 41.

The MIMO filter 41 filters the microphone signals supplied from the microphones 11 to generate a speaker signal for each channel, and outputs the speaker signal of each channel to the speaker 13 corresponding to that channel.

In the MIMO filter 41, a time domain filter operation is performed between every microphone 11 and every speaker 13, so the correlation between channels is also taken into consideration, and a good noise canceling effect can be obtained even at high frequencies.

However, in the multipoint control MIMO system, the larger the number of channels, the larger the amount of calculation in the MIMO filter 41, which grows in proportion to the square of the number of channels.

For example, in the parallel SISO system of FIG. 1, if the number of channels is 48, only 48 channels of filtering are required. In contrast, in the multipoint control MIMO system of FIG. 2, if the number of channels is 48, 2304 (= 48^2) channels of filtering are required, and the amount of calculation increases significantly compared with a system with a small number of channels.
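For comparison, a minimal sketch of the multipoint control MIMO filtering (sizes and coefficients are again placeholders): every microphone-speaker pair has its own time domain FIR filter, so the number of convolutions grows with the square of the channel count.

```python
import numpy as np

def mimo_filter(mic_signals, filters):
    """mic_signals: (C, n); filters: (C_out, C_in, L) time-domain MIMO filters.
    Each output channel sums the filtered contributions of every input channel."""
    C_out, C_in, _ = filters.shape
    n = mic_signals.shape[1]
    out = np.zeros((C_out, n))
    for o in range(C_out):
        for i in range(C_in):                      # C_out * C_in convolutions in total
            out[o] += np.convolve(mic_signals[i], filters[o, i])[:n]
    return out

mics = np.random.randn(6, 1024)
w = np.random.randn(6, 6, 128) * 0.01              # 36 filters for 6 channels
speaker_signals = mimo_filter(mics, w)
```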
Therefore, it is known that, as shown in FIG. 3 for example, performing processing in the spatial frequency domain with a spatial frequency domain processing system can significantly reduce the amount of calculation. In FIG. 3, the parts corresponding to those in FIG. 1 are designated by the same reference numerals, and their description will be omitted as appropriate.

The spatial frequency domain processing system shown in FIG. 3 has microphones 11-1 to 11-6, time FFT (Fast Fourier Transform) units 71-1 to 71-6, a spatial FFT unit 72, a filter processing unit 73, a spatial inverse FFT unit 74, time inverse FFT units 75-1 to 75-6, and speakers 13-1 to 13-6.

The time FFT units 71-1 to 71-6 perform time FFT processing on the microphone signals supplied from the microphones 11-1 to 11-6, and supply the resulting time frequency domain signals to the spatial FFT unit 72.

The spatial FFT unit 72 performs spatial FFT processing on the signals supplied from the time FFT units 71-1 to 71-6, and supplies the resulting spatial frequency domain signals (a signal for each frequency bin) to the filter processing unit 73.

Hereinafter, when it is not necessary to distinguish the time FFT units 71-1 to 71-6, they are also simply referred to as time FFT units 71.

In the time FFT unit 71, FFT processing along the time axis (a temporal Fourier transform) such as an STFT (Short Time Fourier Transform) is performed as the time FFT processing, and the microphone signal, which is a time signal, is converted into a signal in the time frequency domain.

Further, in the spatial FFT unit 72, FFT processing along the spatial axis (a spatial Fourier transform) is performed as the spatial FFT processing on the time frequency domain signals obtained by the time FFT units 71, thereby obtaining time frequency signals in the spatial frequency domain.

The filter processing unit 73 filters the signals supplied from the spatial FFT unit 72 in the spatial frequency domain, and supplies the resulting speaker signal for each frequency bin to the spatial inverse FFT unit 74.

Since the spatial FFT unit 72 obtains time frequency signals in the spatial frequency domain by the spatial FFT processing, the filtering in the filter processing unit 73 becomes a multiplication along the frequency axis, which greatly reduces the amount of calculation compared with the multipoint control MIMO system of FIG. 2.

Such operations in the spatial frequency domain are often used for sound field reproduction by wave field synthesis and are described in detail in, for example, "Sascha Spors and Herbert Buchner, 'Efficient Massive Multichannel Active Noise Control using Wave-Domain Adaptive Filtering', 2008 3rd International Symposium on Communications, Control and Signal Processing." (hereinafter referred to as Reference 1).

The spatial inverse FFT unit 74 performs spatial inverse FFT processing, that is, the inverse transform of the spatial FFT processing, on the spatial frequency domain speaker signals supplied from the filter processing unit 73, and supplies the resulting time frequency domain speaker signals of each channel to the time inverse FFT units 75-1 to 75-6.

The time inverse FFT units 75-1 to 75-6 perform time inverse FFT processing, that is, the inverse transform of the time FFT processing, on the time frequency domain speaker signals supplied from the spatial inverse FFT unit 74, and supply the resulting time domain speaker signals of each channel to the speakers 13-1 to 13-6.

Hereinafter, when it is not necessary to distinguish the time inverse FFT units 75-1 to 75-6, they are also simply referred to as time inverse FFT units 75.

Such a spatial frequency domain processing system can be used when the microphones 11 and the speakers 13 are arranged in an array with a specific shape such as a ring, and can realize spatial NC with many channels and over a wide frequency band with a low amount of calculation.

That is, in the spatial frequency domain processing system, the correlation between channels is taken into consideration, so that higher spatial NC performance can be obtained up to higher frequencies than in the parallel SISO system, while keeping the amount of calculation lower than in the multipoint control MIMO system.

Further, in the spatial frequency domain processing system, a wide region can be controlled, that is, a desired wavefront can be formed with high accuracy over a wide region, so that spatial NC can be performed over a wide region. Furthermore, in the spatial frequency domain processing system, the adaptive processing of the filter used in the filter processing unit 73 converges quickly, so that spatial NC that follows environmental changes can be performed.
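A block-processing sketch of this pipeline is given below (it is only an illustration under assumed frame length, window handling, and filter values, not the system of FIG. 3 itself). It shows why the filtering reduces to a per-bin multiplication once both the time FFT and the spatial FFT have been applied, and also why the block-wise time FFT inherently costs at least one frame of latency.

```python
import numpy as np

def wave_domain_frame(mic_frame, W):
    """Process one frame of C microphone signals.
    mic_frame: (C, N) time samples; W: (C, N // 2 + 1) complex per-bin filter gains."""
    X = np.fft.rfft(mic_frame, axis=1)      # time FFT per channel (latency: one frame)
    Xs = np.fft.fft(X, axis=0)              # spatial FFT across channels
    Ys = W * Xs                             # filtering = element-wise multiplication
    Y = np.fft.ifft(Ys, axis=0)             # inverse spatial FFT
    return np.fft.irfft(Y, n=mic_frame.shape[1], axis=1)  # inverse time FFT

C, N = 6, 512
frame = np.random.randn(C, N)
W = (np.random.randn(C, N // 2 + 1) + 1j * np.random.randn(C, N // 2 + 1)) * 0.01
speaker_frame = wave_domain_frame(frame, W)
```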
Here, the filtering in the spatial frequency domain in the filter processing unit 73 will be described more specifically.

In the following, it is assumed that all processing is performed on a computer, and what should properly be written as a discrete Fourier transform (DFT (Discrete Fourier Transform)) will be simply referred to as a Fourier transform.

For example, consider a microphone array consisting of a plurality of microphones arranged annularly at equal intervals around the origin of an xy coordinate system, which is a predetermined two-dimensional orthogonal coordinate system set in space, and a speaker array consisting of a plurality of speakers arranged annularly at equal intervals around the origin of the xy coordinate system.

Here, it is assumed that the number of elements of the microphone array and of the speaker array, that is, the number of microphones constituting the microphone array and the number of speakers constituting the speaker array, is M in each case.

Further, the radius of the microphone array, that is, the distance from the center position (origin position) of the microphone array to a microphone, is denoted by Rmic, and similarly the radius of the speaker array is denoted by Rspc.

At this time, if the index (microphone index) m indicating a microphone constituting the microphone array is m = 0, 1, ..., M-1, the coordinates (x, y) indicating the arrangement position of each microphone constituting the microphone array are as shown in the following equation (1).
(x, y) = \left( R_{\mathrm{mic}} \cos\frac{2\pi m}{M},\ R_{\mathrm{mic}} \sin\frac{2\pi m}{M} \right) \qquad (1)
The coordinates indicating the arrangement position of each speaker constituting the speaker array can be expressed in the same manner as the coordinates indicating the arrangement position of each microphone.

Further, let n be the discrete time index (time index), and let x[m, n] be the time domain microphone signal obtained by the m-th microphone constituting the microphone array. Similarly, let y[m, n] be the time domain speaker signal of the m-th speaker constituting the speaker array.

If the time frequency domain microphone signal obtained by performing a Fourier transform in the time domain on the microphone signal x[m, n] is denoted by X[m, k], the relationship between the microphone signal x[m, n] and the microphone signal X[m, k] is as shown in the following equation (2).
X[m, k] = \sum_{n=0}^{N-1} x[m, n]\, e^{-j 2\pi k n / N} \qquad (2)
Note that k in the microphone signal X[m, k] is an index indicating the time frequency, and N in equation (2) indicates the temporal Fourier transform length.

If the time frequency domain speaker signal obtained by performing a Fourier transform in the time domain on the speaker signal y[m, n] is denoted by Y[m, k], the relationship between the speaker signal y[m, n] and the speaker signal Y[m, k] can also be expressed by an equation similar to equation (2).

The spatial Fourier transform is defined in the same way as such a temporal Fourier transform. That is, whereas in equation (2) the Fourier transform is performed with respect to the time index n, in the spatial Fourier transform the Fourier transform is performed with respect to the microphone index m.

For example, let l be the index indicating the spatial frequency, that is, the index of the frequency bin of the spatial frequency, and let X'[l, k] be the spatial frequency domain signal obtained by performing a spatial Fourier transform on the microphone signal X[m, k]. In this case, the relationship between the microphone signal X[m, k] and the signal X'[l, k] is as shown in the following equation (3).
X'[l, k] = \sum_{m=0}^{M-1} X[m, k]\, e^{-j 2\pi l m / M} \qquad (3)
If the spatial frequency domain speaker signal obtained by performing a spatial Fourier transform in the frequency domain on the speaker signal Y[m, k] is denoted by Y'[l, k], the relationship between the speaker signal Y[m, k] and the speaker signal Y'[l, k] can also be expressed by an equation similar to equation (3).

According to Reference 1 described above, the filtering in the spatial frequency domain is performed on the spatial frequency domain signal X'[l, k], and this filtering process is expressed by the following equation (4), where W'[l, k] denotes a filter in the spatial frequency domain. That is, the speaker signal Y'[l, k] can be obtained by calculating equation (4) based on the signal X'[l, k] and the filter W'[l, k].
Y'[l, k] = W'[l, k]\, X'[l, k] \qquad (4)
In the filter processing unit 73 of the spatial frequency domain processing system shown in FIG. 3, the filtering expressed by equation (4) is performed to generate the spatial frequency domain speaker signals.

However, it is difficult to use the filtering shown in equation (4) in a spatial NC system. The reason is that a temporal Fourier transform is required in order to perform the filtering shown in equation (4), and this temporal Fourier transform causes an unavoidable system delay due to block processing.

Specifically, for example, in the temporal Fourier transform shown in equation (2), if the temporal Fourier transform length N is 512 and a 512-sample temporal Fourier transform is performed, a delay of about 10 msec occurs, and this delay time corresponds to 3 m or more when converted into distance using the speed of sound. Therefore, even if one tries to realize spatial NC by the filtering shown in equation (4), it is actually difficult to obtain high spatial NC performance.
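As a rough check of these figures, assuming a 48 kHz sampling rate and a speed of sound of roughly 343 m/s (neither value is stated here):

512\ \text{samples} / 48\,000\ \text{Hz} \approx 10.7\ \text{ms}, \qquad 0.0107\ \text{s} \times 343\ \text{m/s} \approx 3.7\ \text{m}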
Therefore, consider transforming equation (4) so that processing equivalent to equation (4) can be realized without using the temporal Fourier transform.

For example, if the filter length of the spatial frequency domain filter W'[l, k] is Nf and the inverse temporal Fourier transform and the inverse spatial Fourier transform are applied to both sides of equation (4), the following equation (5) is obtained.
y[m, n] = \sum_{m'=0}^{M-1} \sum_{n'=0}^{N_f - 1} w\left[(m - m')_M,\, n'\right] x[m',\, n - n'] \qquad (5)
In equation (5), w[m', n'] represents the time domain filter for the microphone with microphone index m' corresponding to the filter W'[l, k], and (P)_Q represents P mod Q, that is, the remainder of P divided by Q.

Therefore, equation (5) expresses the relationship between the time domain filter w[m, n] and microphone signal x[m, n] and the time domain speaker signal y[m, n].

Filtering based on equation (5) is not affected by the system delay caused by the temporal Fourier transform and can therefore be used for spatial NC, but its amount of calculation is equivalent to that of the above-described multipoint control MIMO system, and a large amount of computation is required. The multipoint control MIMO system is described in detail in, for example, "C. Hansen, et al., 'Active Control of Noise and Vibration', CRC press, 2012." (hereinafter referred to as Reference 2).

In contrast, focusing on the commutativity of the multidimensional Fourier transform, consider applying only the inverse temporal Fourier transform to both sides of equation (4), without applying the inverse spatial Fourier transform.
Here, as shown in the following equation (6), the signal obtained by applying to the time domain microphone signal x[m, n] a spatial Fourier transform (DFT) whose DFT point length is the total number M of microphones, that is, a conversion into a signal in the spatial frequency domain (spatial frequency conversion), is defined as x'[l, n].
x'[l, n] = \sum_{m=0}^{M-1} x[m, n]\, e^{-j 2\pi l m / M} \qquad (6)
That is, in equation (6), a conversion process is performed in which only the spatial Fourier transform (spatial frequency conversion) is applied to the microphone signal x[m, n], without a temporal Fourier transform (time frequency conversion); in other words, the time domain microphone signal x[m, n] is converted into the frequency domain only in the spatial direction. Therefore, the signal x'[l, n] obtained by equation (6) can be said to be a time signal in the spatial frequency domain.

Similarly to the signal x'[l, n], the spatial frequency domain filter (a filter for each spatial frequency bin) obtained by applying only the spatial Fourier transform (spatial frequency conversion) to the time domain filter w[m, n], without a temporal Fourier transform (time frequency conversion), is defined as w'[l, n].

Further, the spatial frequency domain speaker signal (a speaker signal for each spatial frequency bin) obtained by applying only the spatial Fourier transform (spatial frequency conversion) to the time domain speaker signal y[m, n], without a temporal Fourier transform (time frequency conversion), is defined as y'[l, n]. More precisely, the speaker signal y'[l, n] is not a drive signal that drives a single speaker, but the noise canceling sound to be output from each speaker is calculated from this speaker signal y'[l, n].

In this case, by applying only the inverse temporal Fourier transform to both sides of equation (4), without applying the inverse spatial Fourier transform, the following equation (7) is obtained.
y'[l, n] = \sum_{n'=0}^{N_f - 1} w'[l, n']\, x'[l,\, n - n'] \qquad (7)
In equation (7), the speaker signal y'[l, n] is obtained by filtering the signal x'[l, n] with the filter w'[l, n] of filter length Nf, that is, by performing a convolution in the time direction between the filter w'[l, n] and the signal x'[l, n].

The filtering operation shown in equation (7) is processing in the spatial frequency domain, but it requires no convolution in the spatial direction; only a convolution in the time direction has to be performed, independently for each spatial frequency index (frequency bin) l. Therefore, the substantial amount of calculation for obtaining the speaker signal y'[l, n] is, apart from constant factors, equivalent to that of the above-described parallel SISO system.
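A sample-by-sample sketch of equations (6) and (7) follows (filter values and sizes are placeholders, and the inverse DFT normalization is assumed). For every incoming set of simultaneous microphone samples, one M-point spatial DFT is taken, each spatial frequency bin is filtered by its own FIR filter in the time direction, and an inverse spatial DFT yields the simultaneous speaker samples; no block-wise time FFT appears, so no frame of latency is introduced.

```python
import numpy as np

M, Nf = 16, 64
w_spatial = (np.random.randn(M, Nf) + 1j * np.random.randn(M, Nf)) * 0.01  # w'[l, n]
history = np.zeros((M, Nf), dtype=complex)   # last Nf values of x'[l, n] per bin

def process_sample(mic_sample):
    """mic_sample: length-M vector of simultaneous microphone samples x[m, n]."""
    global history
    x_spatial = np.fft.fft(mic_sample)                    # equation (6), one sample
    history = np.roll(history, 1, axis=1)
    history[:, 0] = x_spatial
    y_spatial = np.sum(w_spatial * history, axis=1)       # equation (7), per bin l
    return np.real(np.fft.ifft(y_spatial))                # speaker samples y[m, n]

speaker_out = process_sample(np.random.randn(M))
```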
In the following, a system that generates speaker signals as noise canceling signals by the spatial frequency conversion shown in equation (6) and the filtering shown in equation (7) will also be referred to in particular as a low-delay spatial frequency domain processing system.

According to the present technology, by realizing spatial NC with a low-delay spatial frequency domain processing system, the delay time occurring in the system can be reduced and higher spatial NC performance can be obtained.

That is, in the low-delay spatial frequency domain processing system to which the present technology is applied, the filtering operation is a convolution rather than the simple multiplication of the spatial frequency domain processing system, but since SISO filtering is sufficient, the amount of calculation is significantly smaller than that of the multipoint control MIMO system.

Moreover, in the low-delay spatial frequency domain processing system, no Fourier transform (FFT) on the time axis is needed, so the delay (latency) generated in the system becomes extremely small. In addition, although a spatial FFT (spatial frequency conversion) is performed in the low-delay spatial frequency domain processing system, only the outputs (microphone signals) of the microphones at the same time instant are transformed at once, so there is no need to buffer the microphone signals in time and almost no delay occurs.

Therefore, the low-delay spatial frequency domain processing system is a practical system with low delay and a small amount of calculation, and can realize higher-performance spatial NC.
<Configuration example of noise canceling device>
FIG. 4 is a diagram showing a configuration example of a noise canceling device that is an example of an embodiment of the low-delay spatial frequency domain processing system to which the present technology is applied.
 図4に示すノイズキャンセリング装置101は、マイクアレイ111、信号処理装置112、およびスピーカアレイ113を有している。 The noise canceling device 101 shown in FIG. 4 has a microphone array 111, a signal processing device 112, and a speaker array 113.
 マイクアレイ111は、マイク121-1乃至マイク121-16により構成され、それらのマイクを環状や矩形状などの所定の形状に並べることで得られる環状マイクアレイ等のマイクアレイである。 The microphone array 111 is a microphone array such as an annular microphone array obtained by arranging the microphones 121-1 to 121-16 in a predetermined shape such as an annular shape or a rectangular shape.
 マイク121-1乃至マイク121-16は、キャンセル対象のノイズ音を含む周囲の音を収音し、その結果得られたマイク信号を信号処理装置112に供給する。なお、以下、マイク121-1乃至マイク121-16を特に区別する必要のない場合、単にマイク121とも称することとする。 The microphones 121-1 to 121-16 collect ambient sounds including noise to be canceled, and supply the resulting microphone signal to the signal processing device 112. Hereinafter, when it is not necessary to distinguish between the microphones 121-1 and the microphones 121-16, they are simply referred to as the microphones 121.
 信号処理装置112は、例えば1または複数の演算器を有するパーソナルコンピュータ等からなり、マイクアレイ111から供給されたマイク信号に基づいて、空間NCのための時間領域のスピーカ信号を生成し、スピーカアレイ113に出力する。 The signal processing device 112 comprises, for example, a personal computer having one or more arithmetic units, and generates a speaker signal in the time domain for spatial NC based on the microphone signal supplied from the microphone array 111, and the speaker array. Output to 113.
 この時間領域のスピーカ信号は、空間NCのためのノイズキャンセル信号であって、スピーカアレイ113を構成するスピーカを駆動させて、ノイズキャンセル音を出力させるスピーカ駆動信号である。 The speaker signal in this time domain is a noise canceling signal for spatial NC, and is a speaker driving signal that drives the speakers constituting the speaker array 113 to output a noise canceling sound.
 信号処理装置112は、例えばDSP(Digital Signal Processor)やFPGA(Field Programmable Gate Array)などの1つの演算器からなる信号処理部131を有している。 The signal processing device 112 has a signal processing unit 131 including one arithmetic unit such as a DSP (Digital Signal Processor) or an FPGA (Field Programmable Gate Array).
 信号処理部131は、空間周波数変換部141、フィルタ処理部142-1乃至フィルタ処理部142-16、および空間周波数合成部143を有している。 The signal processing unit 131 has a spatial frequency conversion unit 141, a filter processing unit 142-1 to a filter processing unit 142-16, and a spatial frequency synthesis unit 143.
The spatial frequency conversion unit 141 performs spatial frequency conversion on the time-domain microphone signals, that is, the time signals, supplied from the microphones 121-1 to 121-16, and supplies the resulting spatial frequency domain signals to the filter processing units 142-1 to 142-16. In other words, the spatial frequency conversion unit 141 converts the time-domain microphone signals into the spatial frequency domain.
For example, in the spatial frequency conversion unit 141, the DFT shown in equation (6) is performed as the spatial frequency conversion on the basis of the microphone signals supplied from all the microphones 121.
In particular, in this example the total number M of microphones 121 is 16, and a signal x'[l,n] is calculated for each of the 16 spatial frequency bins l corresponding to the filter processing units 142-1 to 142-16.
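For illustration, a minimal Python sketch of this spatial frequency conversion is given below. It assumes that equation (6) is a standard M-point DFT taken across the microphone channels at a single sample time n; the exact normalization and index conventions of equation (6) may differ, so this is only a sketch under that assumption, not the definitive formula of the present description.

```python
import numpy as np

def spatial_dft(mic_samples):
    """Spatial frequency conversion of one time sample.

    mic_samples: array of shape (M,), the outputs of the M microphones
    at the same sample time n (here M = 16).
    Returns an array of shape (M,) whose element l is the assumed
    x'[l, n] = sum_m mic_samples[m] * exp(-2j*pi*l*m / M).
    """
    M = len(mic_samples)
    m = np.arange(M)
    l = m.reshape(-1, 1)                      # spatial frequency bin index
    dft_matrix = np.exp(-2j * np.pi * l * m / M)
    return dft_matrix @ mic_samples           # equivalent to np.fft.fft(mic_samples)

# Example: one snapshot of 16 microphone samples at time n (random stand-in).
x_n = np.random.randn(16)
x_prime_n = spatial_dft(x_n)                  # 16 spatial frequency bins l = 0..15
```

Because only one snapshot of simultaneous microphone samples enters the transform, no time-domain buffering is involved, which is what keeps the delay small.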
The filter processing units 142-1 to 142-16 perform signal processing in the spatial frequency domain on the signals supplied from the spatial frequency conversion unit 141, thereby generating spatial frequency domain speaker signals, and supply them to the spatial frequency synthesis unit 143.
That is, the filter processing units 142-1 to 142-16 hold SISO filters for spatial NC, and use those SISO filters to filter the spatial frequency domain signals from the spatial frequency conversion unit 141 as the signal processing in the spatial frequency domain. More specifically, the filtering by a SISO filter is performed as a process of convolving the filter coefficients constituting the SISO filter with the spatial frequency domain signal.
Specifically, for example, the filter processing units 142-1 to 142-16 hold the above-described filter w'[l,n] as the SISO filter, and the calculation shown in equation (7) is performed as the filtering to generate the speaker signal y'[l,n].
Hereinafter, when there is no particular need to distinguish the filter processing units 142-1 to 142-16 from one another, they are also simply referred to as the filter processing units 142.
For example, when the filtering (signal processing) shown in equation (7) is performed in the filter processing units 142, one filter processing unit 142 performs the filtering for one spatial frequency bin l.
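The per-bin SISO filtering can be pictured with the sketch below. It assumes, as a plausible reading of equation (7), that each filter processing unit holds an FIR filter w'[l,k] of some tap length for its own spatial frequency bin l and convolves it along the time axis with the sequence x'[l,n]; the tap length and coefficients here are placeholders, not values from the present description.

```python
import numpy as np

class SisoFilter:
    """One filter processing unit 142: an FIR filter for a single
    spatial frequency bin l, applied along the time axis."""

    def __init__(self, coeffs):
        self.coeffs = np.asarray(coeffs, dtype=complex)       # assumed w'[l, 0..K-1]
        self.state = np.zeros(len(self.coeffs) - 1, dtype=complex)

    def process(self, x_l_n):
        """Filter one new input sample x'[l, n] and return
        y'[l, n] = sum_k w'[l, k] * x'[l, n - k] (streaming FIR convolution)."""
        buf = np.concatenate(([x_l_n], self.state))            # [x[n], x[n-1], ...]
        y_l_n = np.dot(self.coeffs, buf)
        self.state = buf[:-1]                                  # shift the delay line
        return y_l_n

# Example: 16 filter processing units, one per spatial frequency bin,
# with hypothetical 8-tap filters (random coefficients just to show the shapes).
rng = np.random.default_rng(0)
units = [SisoFilter(rng.standard_normal(8)) for _ in range(16)]
x_prime_n = rng.standard_normal(16) + 1j * rng.standard_normal(16)   # x'[l, n] for one n
y_prime_n = np.array([u.process(x_prime_n[l]) for l, u in enumerate(units)])
```

Each unit operates on a single bin independently, which is why the filters can be simple single-input single-output (SISO) filters rather than a full multichannel filter matrix.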
The SISO filter held by each filter processing unit 142 is, for example, an FIR (Finite Impulse Response) filter generated in advance by LMS (Least Mean Squares) or the like on the basis of the shape of the microphone array 111, the total number of microphones 121, and so on.
In the filter processing units 142, SISO filters prepared in advance may be used continuously, or the SISO filters may be updated successively on the basis of, for example, microphone signals obtained by picking up sound with microphones installed at control points.
The spatial frequency synthesis unit 143 performs spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 142, thereby generating a time-domain speaker signal for each speaker, and supplies them to the speaker array 113.
For example, in the spatial frequency synthesis unit 143, the inverse of the spatial frequency conversion performed by the spatial frequency conversion unit 141 is performed as the spatial frequency synthesis. Therefore, when the DFT (spatial Fourier transform) shown in equation (6) is performed in the spatial frequency conversion unit 141, an IDFT (Inverse Discrete Fourier Transform), that is, an inverse spatial Fourier transform corresponding to equation (6), is performed in the spatial frequency synthesis unit 143.
The speaker array 113 is made up of speakers 151-1 to 151-16, which are speaker units, and is a speaker array, such as an annular speaker array, obtained by arranging those speakers in a predetermined shape such as a ring or a rectangle.
The speakers 151-1 to 151-16 are driven on the basis of the time-domain speaker signals supplied from the spatial frequency synthesis unit 143 and output noise canceling sound. As a result, the noise sound is canceled in a predetermined target region, and spatial NC is realized.
Hereinafter, when there is no particular need to distinguish the speakers 151-1 to 151-16 from one another, they are also simply referred to as the speakers 151.
Here, an arrangement example of the microphone array 111 and the speaker array 113 will be described with reference to FIG. 5. In FIG. 5, parts corresponding to those in FIG. 4 are denoted by the same reference numerals, and their description is omitted. In addition, in FIG. 5, the reference numerals of the microphones 121 and the speakers 151 are omitted to keep the figure easy to read.
In the example of FIG. 5, a user U11, such as a listener who listens to content, is inside a predetermined region R11, and this region R11 is the region (cancellation area) targeted by the spatial NC.
The speakers 151 constituting the speaker array 113 are arranged in a ring surrounding the region R11, which is the cancellation area, to form an annular speaker array.
Further, the microphones 121 constituting the microphone array 111 are arranged in a ring outside the speaker array 113 so as to surround it, forming an annular microphone array.
Here, the speaker array 113 and the microphone array 111 are arranged so that their centers coincide with the center of the circular region R11.
In the noise canceling device 101, the microphone array 111, which is arranged outside the speaker array 113 as viewed from the region R11, picks up noise (noise sound) that is generated outside the microphone array 111 and propagates toward the region R11.
Speaker signals are then generated on the basis of the microphone signals obtained by the sound pickup, and noise canceling sound based on the speaker signals is output toward the region R11 from each speaker 151 constituting the speaker array 113. The wavefronts of the noise canceling sound output from the speakers 151 are combined to form, within the region R11, a wavefront that cancels out the noise sound. In this way, spatial NC by wave field synthesis is realized.
Here, an example has been described in which the number of microphones 121 constituting the microphone array 111 is the same as the number of speakers 151 constituting the speaker array 113, and the microphone array 111 and the speaker array 113 have the same (annular) shape.
However, the numbers of microphones 121 and speakers 151, and the shapes of the microphone array 111 and the speaker array 113, do not necessarily have to be the same, and different numbers and shapes may be used. For example, when the numbers of microphones 121 and speakers 151 differ, the spatial frequency conversion unit 141 or the spatial frequency synthesis unit 143 may upsample or downsample the spatial frequency domain signal according to those numbers.
In addition, any number of microphones 121 and speakers 151 may be used, and the microphone array 111 and the speaker array 113 may have any shape (array shape).
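When the numbers of microphones and speakers differ, one simple way to resample in the spatial frequency domain, as mentioned above, is to zero-pad or truncate the spatial spectrum before the synthesis. The sketch below illustrates that idea for converting an M-bin spectrum into an N-bin spectrum; it is one possible interpretation of the up/downsampling, with an arbitrary scaling convention, and is not a method prescribed by the present description.

```python
import numpy as np

def resample_spatial_spectrum(x_prime, num_out):
    """Map an M-bin spatial spectrum (one bin per microphone) onto a
    num_out-bin spectrum (one bin per speaker) by keeping the lowest
    spatial frequencies and zero-padding or discarding the rest."""
    M = len(x_prime)
    out = np.zeros(num_out, dtype=complex)
    half = min(M, num_out) // 2
    out[:half] = x_prime[:half]            # low non-negative spatial frequencies
    if half:
        out[-half:] = x_prime[-half:]      # corresponding negative spatial frequencies
    return out * (num_out / M)             # one possible amplitude convention

# Example: 16 microphone bins mapped onto 24 speaker bins (spatial upsampling).
spec_mics = np.fft.fft(np.random.randn(16))
spec_spks = resample_spatial_spectrum(spec_mics, 24)
speaker_samples = np.fft.ifft(spec_spks).real   # take the real part after synthesis
```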
<Explanation of noise canceling processing>
Next, the operation of the noise canceling device 101 will be described. That is, the noise canceling processing performed by the noise canceling device 101 will be described below with reference to the flowchart of FIG. 6.
In step S11, each microphone 121 of the microphone array 111 picks up ambient sound and supplies the resulting time-domain microphone signal to the spatial frequency conversion unit 141.
In step S12, the spatial frequency conversion unit 141 performs spatial frequency conversion on the time-domain microphone signals supplied from the microphones 121 and supplies the resulting spatial frequency domain signals to each filter processing unit 142. For example, in step S12, the calculation of equation (6) described above is performed to generate the spatial frequency domain signals.
In step S13, each filter processing unit 142 filters the spatial frequency domain signal supplied from the spatial frequency conversion unit 141 with the SISO filter it holds, and supplies the resulting spatial frequency domain speaker signal to the spatial frequency synthesis unit 143. For example, in step S13, the calculation of equation (7) is performed as the filtering.
In step S14, the spatial frequency synthesis unit 143 performs spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 142 and generates time-domain speaker signals.
In step S15, the spatial frequency synthesis unit 143 supplies the speaker signals obtained in the processing of step S14 to the speakers 151 of the speaker array 113 and causes them to output sound (noise canceling sound).
As a result, a wavefront that cancels the noise sound is formed in the cancellation area, and spatial NC is realized. When spatial NC has been performed in this way, the noise canceling processing ends.
As described above, the noise canceling device 101 performs spatial frequency conversion on the time-domain microphone signals without performing time-frequency conversion, and generates the speaker signals on the basis of the resulting spatial frequency domain signals.
By generating the speaker signals through signal processing in the spatial frequency domain in this way, without performing time-frequency conversion and its inverse, not only can the amount of computation be greatly reduced, but the delay time can also be reduced, and high spatial NC performance can be obtained in real time.
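Putting the pieces together, one way to picture the whole per-sample flow of FIG. 6 (pick up, spatial DFT, per-bin filtering, inverse spatial DFT, playback) is the Python sketch below. It processes one snapshot of microphone samples at a time with no time-domain FFT or buffering; the single-tap gains used as "SISO filters" are placeholders standing in for the real filters w'[l,n] of equation (7), not filters taken from the present description.

```python
import numpy as np

M = 16                                   # microphones = speakers = DFT point length

# Placeholder per-bin filters: a single complex gain per spatial frequency bin;
# the actual system would use FIR filters designed in advance by LMS or the like.
filter_gains = np.ones(M, dtype=complex) * -1.0

def process_snapshot(mic_samples):
    """Steps S12-S14 for one sample time: spatial DFT of the M microphone
    samples, per-bin filtering, inverse spatial DFT to M speaker samples."""
    x_prime = np.fft.fft(mic_samples)            # step S12: spatial frequency conversion
    y_prime = filter_gains * x_prime             # step S13: per-bin SISO filtering
    speaker_samples = np.fft.ifft(y_prime).real  # step S14: spatial frequency synthesis
    return speaker_samples

def run(mic_stream):
    """Steps S11 and S15: for each incoming snapshot of M microphone
    samples, emit M speaker samples with no time-domain buffering."""
    for mic_samples in mic_stream:               # step S11: sound pickup
        yield process_snapshot(mic_samples)      # step S15: drive the speakers

# Example with random snapshots in place of real microphone capture.
stream = (np.random.randn(M) for _ in range(5))
for spk in run(stream):
    pass  # each spk is the 16-channel speaker output for one sample time
```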
<Second embodiment>
<Processing in the spatial frequency domain>
In the noise canceling device 101 shown in FIG. 4, in order for the signal processing unit 131 to perform signal processing in the spatial frequency domain, the spatial frequency conversion unit 141 first performs spatial frequency conversion, converting the time-domain microphone signals into spatial frequency domain signals.
At this time, the outputs (microphone signals) of all the microphones 121 constituting the microphone array 111 must be input to the signal processing unit 131 for the spatial frequency conversion (DFT). Also, even after the filtering in the spatial frequency domain, the spatial frequency synthesis must be performed using the outputs of all the filter processing units 142.
Therefore, in the noise canceling device 101, the spatial frequency conversion, the signal processing (filtering) in the spatial frequency domain, and the spatial frequency synthesis must all be performed by the single arithmetic unit serving as the signal processing unit 131. That is, the hardware that performs these processes cannot be divided so that the processing is shared (distributed) among a plurality of pieces of hardware (arithmetic units).
In FIG. 5, for simplicity of explanation, an example of a 16-point FFT was described, that is, an example in which the DFT point length M (the total number M of microphones 121) in equation (6) is 16.
In spatial NC, the narrower the spacing between the microphones 121 and between the speakers 151 arranged in the array, the higher the frequencies that can be controlled, that is, the higher the frequencies up to which the noise sound can be canceled.
Also, if the region R11 serving as the cancellation area is to be made wider, more microphones 121 and speakers 151 are needed than when control up to the same frequency is performed over a narrower region R11.
As a specific example, if the upper limit of the frequency targeted for noise canceling is 1 kHz and the cancellation area (region R11) is to be a region 2 m in diameter, 40 or more microphones 121 and speakers 151 are required to obtain sufficient spatial NC performance. Therefore, when spatial NC targeting an even higher frequency or a wider region is attempted, the number of required microphones 121 and speakers 151 may exceed several hundred.
In such a case, inputting the outputs of all the microphones 121 to a single arithmetic unit (arithmetic device) such as a DSP or an FPGA serving as the signal processing unit 131 may be physically impossible because of the limit on the number of input and output PINs (the number of input terminals and output terminals) provided on the arithmetic unit.
Even if the number of PINs were sufficient, as the number of microphones 121 and speakers 151 increases, the number of required filter processing units 142 (SISO filters) increases accordingly, and the amount of computation of the signal processing unit 131 as a whole increases.
The amount of computation may then become too large for a single signal processing unit 131 to handle, and spatial NC may no longer be realizable.
Therefore, for example, the plurality of microphones 121 and speakers 151 constituting the microphone array 111 and the speaker array 113 may be divided into a plurality of groups, and the spatial frequency conversion, the filtering in the spatial frequency domain, and the spatial frequency synthesis may be performed for each of the divided groups.
By doing so, the computation for spatial NC can be shared among a plurality of arithmetic units (arithmetic devices), so high spatial NC performance can be obtained while reducing the number of PINs and the amount of computation required per arithmetic unit.
For example, as shown in FIG. 7, consider a case where a noise sound is generated with a position P11 outside the microphone array 111 as the sound source position of a single point sound source. In FIG. 7, parts corresponding to those in FIG. 5 are denoted by the same reference numerals, and their description is omitted as appropriate.
In this case, in the noise canceling device 101 described above, all the microphones 121 constituting the microphone array 111 and all the speakers 151 constituting the speaker array 113 are used to perform spatial NC.
However, when performing spatial NC, it is not always necessary to use all the microphones 121 and speakers 151, for example when there is only one noise source.
For example, when the noise of a single point sound source is considered as in the example of FIG. 7, the degree to which each microphone 121 and each speaker 151 contributes to the sound pickup and sound output (the contribution rate), that is, its importance for realizing spatial NC, differs.
Specifically, in the example of FIG. 7, the microphones 121 and speakers 151 arranged closer to the position P11 where the noise source is located are more important, and conversely, the microphones 121 and speakers 151 arranged farther from the position P11 are less important.
Therefore, in some cases spatial NC can be performed with sufficient performance without necessarily using all the microphones 121 and speakers 151.
In this example, sufficient spatial NC performance can be obtained even if only the microphones 121 and speakers 151 arranged within the range indicated by the arrow Q11, which is close to the position P11, are used.
Therefore, for example, as shown in FIG. 8, spatial NC can also be performed using only the four microphones 121 and twelve speakers 151 close to the position P11 where the noise source is located. In FIG. 8, parts corresponding to those in FIG. 7 are denoted by the same reference numerals, and their description is omitted as appropriate.
In the example of FIG. 8, four microphones 121 are used for the spatial NC, whereas twelve speakers 151 are used. Therefore, to generate the speaker signals for spatial NC by the same computation as in the noise canceling device 101, microphone signals from eight more microphones 121 would be needed.
Therefore, if zero signals, which are not actually obtained by sound pickup, are used as the microphone signals of the remaining eight microphones 121, the speaker signals of the twelve speakers 151 can be generated by the same computation as in the noise canceling device 101.
In this way, the noise sound generated at the position P11 can be sufficiently canceled without using all the microphones 121 and speakers 151.
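The use of zero signals as dummy microphone inputs can be sketched as follows: the four real microphone signals are placed alongside eight zeros so that the same 12-channel processing can be reused unchanged. Placing the real signals at the central channels is an assumption for illustration, chosen to match the input-terminal arrangement described for the second embodiment below.

```python
import numpy as np

def pad_with_dummy_zeros(real_mic_samples, total_channels=12):
    """Build a total_channels-long input vector from a few real microphone
    samples, filling the remaining channels with zero 'dummy' signals."""
    n_real = len(real_mic_samples)
    padded = np.zeros(total_channels)
    start = (total_channels - n_real) // 2       # assumed: real mics in the middle
    padded[start:start + n_real] = real_mic_samples
    return padded

# Example: 4 real microphone samples plus 8 zero signals -> a 12-channel input
# that can go through the same 12-point spatial DFT / filter / IDFT chain.
mic4 = np.random.randn(4)
x12 = pad_with_dummy_zeros(mic4)                 # shape (12,)
```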
However, in this case, the position where the noise sound is generated (the position of the noise source) must be in the vicinity of the position P11, and noise sound arriving from all directions cannot be handled.
However, if, for example, the microphones 121 constituting the microphone array 111 and the speakers 151 constituting the speaker array 113 are each divided into four groups and speaker signals are generated for each group as shown in FIG. 9, noise sound from all directions can be handled. In FIG. 9, parts corresponding to those in FIG. 7 are denoted by the same reference numerals, and their description is omitted as appropriate.
In this example, the 16 microphones 121 constituting the microphone array 111 and the 16 speakers 151 constituting the speaker array 113 are each divided into four groups. In the following, a group of microphones 121 is also referred to in particular as a microphone group, and a group of speakers 151 as a speaker group.
In this example, the 16 microphones 121 constituting the microphone array 111 are divided into four microphone groups as indicated by the arrows Q21 to Q24.
In particular, here the microphone groups are formed so that each microphone 121 belongs to only one microphone group, and each microphone group consists of microphones 121 arranged adjacent to one another. In this example, one microphone group consists of four microphones 121.
Specifically, as indicated by the arrow Q21, one microphone group is formed by the four mutually adjacent microphones 121 located at the front right as viewed from the user U11. Similarly, as indicated by the arrows Q22 to Q24, microphone groups are formed by the four mutually adjacent microphones 121 located at the rear right, the rear left, and the front left, respectively, as viewed from the user U11.
For these four microphone groups, four corresponding speaker groups are provided.
That is, as indicated by the arrow Q21, for the microphone group at the front right as viewed from the user U11, a speaker group is formed that consists of twelve speakers 151 arranged adjacent to one another and centered on the position at the front right of the user U11.
Similarly, as indicated by the arrows Q22 to Q24, for the microphone groups at the rear right, the rear left, and the front left as viewed from the user U11, speaker groups are formed, each consisting of twelve speakers 151 arranged adjacent to one another and centered on the position in the corresponding direction at the rear right, the rear left, and the front left of the user U11.
In this example, the speaker groups are formed so that each speaker group contains twelve mutually adjacent speakers 151, so each speaker 151 belongs to three speaker groups.
If such grouping is performed and, for each group, the microphone signals of the microphones 121 belonging to one microphone group are used to generate the speaker signals of the speakers 151 belonging to the speaker group corresponding to that microphone group, the overall spatial NC processing can be divided into four parts. That is, the hardware can be divided into four, for example by providing one arithmetic unit corresponding to the signal processing unit 131 for each corresponding pair of microphone group and speaker group, and the processing for spatial NC can be distributed over a plurality of arithmetic units.
In this example, the plurality of microphones 121 constituting the microphone array 111 is divided into four microphone groups so that four mutually adjacent microphones 121 belong to the same microphone group. That is, the microphones 121 used for a single filtering operation are selected while shifting by four microphones at a time.
Similarly, the speakers 151 to which the speaker signals obtained by a single filtering operation are output are selected while shifting by four speakers at a time, and the speaker signals are supplied to the selected speakers 151.
At this time, the same speaker 151 is selected as the output destination of speaker signals obtained by three different filtering operations, so the three speaker signals destined for the same speaker 151 are added to obtain the final speaker signal.
As described above, by performing grouping so that the overall processing is shared part by part, and by adding the speaker signals where the filtering output destinations overlap to obtain the final speaker signals, spatial NC can as a result be performed using all the microphones 121 and speakers 151. This makes it possible to handle noise sound from all directions.
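To make this grouping concrete, the sketch below builds the index sets for a 16-microphone, 16-speaker ring: four disjoint microphone groups of four, each paired with a 12-speaker group centered on the same direction, so that every speaker ends up in exactly three speaker groups and its three overlapping partial outputs are later summed. The 0-based starting indices are an assumption for illustration, chosen to be consistent with the wiring described for FIG. 10 below.

```python
M = 16                                  # microphones and speakers on the ring
GROUPS = 4                              # number of microphone/speaker groups
MICS_PER_GROUP = M // GROUPS            # 4 adjacent microphones per group
SPKS_PER_GROUP = 12                     # 12 adjacent speakers per group

def ring(start, count, size=M):
    """Indices of `count` adjacent elements on a ring of `size`, from `start`."""
    return [(start + i) % size for i in range(count)]

# Microphone group g: microphones 4g .. 4g+3 (disjoint groups).
mic_groups = [ring(g * MICS_PER_GROUP, MICS_PER_GROUP) for g in range(GROUPS)]

# Speaker group g: 12 speakers centered on the same direction as mic group g,
# i.e. the group's 4 facing speakers plus 4 on each side (assumed centering).
spk_groups = [ring(g * MICS_PER_GROUP - 4, SPKS_PER_GROUP) for g in range(GROUPS)]

# Every speaker appears in exactly 3 of the 4 overlapping speaker groups,
# so its final drive signal is the sum of 3 partial speaker signals.
membership = {s: sum(s in grp for grp in spk_groups) for s in range(M)}
assert all(count == 3 for count in membership.values())
```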
<Configuration example of noise canceling device>
When the microphones 121 and the speakers 151 are divided into groups and the processing is shared among those groups, the noise canceling device is configured, for example, as shown in FIG. 10. In FIG. 10, parts corresponding to those in FIG. 4 are denoted by the same reference numerals, and their description is omitted as appropriate.
The noise canceling device 191 shown in FIG. 10 includes the microphone array 111, a signal processing device 201, and the speaker array 113.
In this example, the microphone array 111 and the speaker array 113 are each divided into four groups.
That is, the microphones 121-1 to 121-4 form one group.
Similarly, the microphones 121-5 to 121-8, the microphones 121-9 to 121-12, and the microphones 121-13 to 121-16 each form one group.
Also, the speakers 151-1 to 151-12 form one group, and the speakers 151-5 to 151-16 also form one group.
Similarly, the speakers 151-9 to 151-16 together with the speakers 151-1 to 151-4 form one group, and the speakers 151-13 to 151-16 together with the speakers 151-1 to 151-8 form one group.
The signal processing device 201 corresponds to the signal processing device 112 of FIG. 4 and is, for example, a personal computer having one or more arithmetic units.
The signal processing device 201 has signal processing units 211-1 to 211-4 and addition units 212-1 to 212-16.
Each of the signal processing units 211-1 to 211-4 consists of one arithmetic unit such as a DSP or an FPGA and corresponds to the signal processing unit 131 of FIG. 4.
The signal processing unit 211-1 performs the same processing as the signal processing unit 131 on the basis of the microphone signals supplied from the microphones 121-1 to 121-4 and eight predetermined zero signals treated as microphone signals, and generates speaker signals.
The signal processing unit 211-1 generates speaker signals for twelve channels, that is, speaker signals destined for the speakers 151-13 to 151-16 and the speakers 151-1 to 151-8.
The signal processing unit 211-1 supplies the generated speaker signals to the addition units of the corresponding channels, that is, to the addition units 212-13 to 212-16 and the addition units 212-1 to 212-8.
The signal processing units 211-2 to 211-4, like the signal processing unit 211-1, each generate and output speaker signals for twelve channels on the basis of the microphone signals from four microphones 121 and eight zero signals.
That is, the signal processing unit 211-2 receives microphone signals from the microphones 121-5 to 121-8 and supplies speaker signals to the addition units 212-1 to 212-12.
The signal processing unit 211-3 receives microphone signals from the microphones 121-9 to 121-12 and supplies speaker signals to the addition units 212-5 to 212-16.
The signal processing unit 211-4 receives microphone signals from the microphones 121-13 to 121-16 and supplies speaker signals to the addition units 212-9 to 212-16 and the addition units 212-1 to 212-4.
Hereinafter, when there is no particular need to distinguish the signal processing units 211-1 to 211-4 from one another, they are also simply referred to as the signal processing units 211.
In the noise canceling device 191, for each microphone group, the signal processing unit 211 to which the microphone signals obtained by sound pickup by the microphones 121 of that microphone group are input is determined in advance.
Each signal processing unit 211 performs filtering by the SISO filters and the like on the basis of the microphone signals supplied from all the microphones 121 belonging to one microphone group, and generates the speaker signals for some of the speakers 151 of the speaker array 113, that is, for the speakers 151 belonging to the speaker group corresponding to that microphone group.
The addition units 212-1 to 212-16 add the speaker signals of the same channel supplied from the plurality of signal processing units 211 to obtain the final speaker signals, and supply each final speaker signal to the speaker 151 of the corresponding channel.
In this example, the addition units 212-1 to 212-4 receive speaker signals from the signal processing unit 211-1, the signal processing unit 211-2, and the signal processing unit 211-4, and the addition units 212-5 to 212-8 receive speaker signals from the signal processing units 211-1 to 211-3.
The addition units 212-9 to 212-12 receive speaker signals from the signal processing units 211-2 to 211-4, and the addition units 212-13 to 212-16 receive speaker signals from the signal processing unit 211-1, the signal processing unit 211-3, and the signal processing unit 211-4.
Hereinafter, when there is no particular need to distinguish the addition units 212-1 to 212-16 from one another, they are also simply referred to as the addition units 212.
In this example, one corresponding addition unit 212 is provided for each of the plurality of speakers 151 constituting the speaker array 113, and each addition unit 212 adds and outputs the speaker signals for the same speaker 151 obtained by two or more signal processing units 211.
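The role of the addition units 212 can be sketched as below: each of the four signal processing units produces a 12-channel partial output destined for a particular window of speakers, and each speaker's final signal is the sum of the three partial signals addressed to it. The speaker windows follow the FIG. 10 wiring described above (for example, the unit 211-1 feeds the speakers 151-13 to 151-16 and 151-1 to 151-8), converted to 0-based indices for the code; the signal values themselves are random stand-ins.

```python
import numpy as np

NUM_SPEAKERS = 16

# 0-based speaker indices driven by each signal processing unit 211-1..211-4,
# taken from the FIG. 10 wiring (e.g. 211-1 -> speakers 151-13..16 and 151-1..8).
speaker_targets = [
    [12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7],    # signal processing unit 211-1
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],      # signal processing unit 211-2
    [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],  # signal processing unit 211-3
    [8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3],  # signal processing unit 211-4
]

def add_partial_outputs(partial_outputs):
    """Addition units 212: sum the 12-channel partial speaker signals from
    the four signal processing units into one 16-channel speaker signal."""
    final = np.zeros(NUM_SPEAKERS)
    for targets, partial in zip(speaker_targets, partial_outputs):
        for channel, speaker in enumerate(targets):
            final[speaker] += partial[channel]   # three partials land on each speaker
    return final

# Example: four dummy 12-channel partial outputs for one sample time.
partials = [np.random.randn(12) for _ in range(4)]
speaker_signal = add_partial_outputs(partials)   # shape (16,)
```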
Also, although an example in which the plurality of signal processing units 211 is provided in a single signal processing device 201 has been described here, these signal processing units 211 may each be provided in a plurality of mutually different signal processing devices.
<Configuration example of signal processing unit>
FIG. 11 is a diagram showing a configuration example of the signal processing unit 211 of the noise canceling device 191.
The signal processing unit 211 has a spatial frequency conversion unit 241, filter processing units 242-1 to 242-12, and a spatial frequency synthesis unit 243.
The spatial frequency conversion unit 241, the filter processing units 242-1 to 242-12, and the spatial frequency synthesis unit 243 correspond to the spatial frequency conversion unit 141, the filter processing units 142, and the spatial frequency synthesis unit 143 shown in FIG. 4.
The spatial frequency conversion unit 241 performs spatial frequency conversion on the basis of the time-domain microphone signals supplied from the four microphones 121 and the eight zero signals supplied as dummy microphone signals.
For example, in the spatial frequency conversion unit 241, a DFT similar to equation (6) is performed as the spatial frequency conversion. In this case, the DFT point length is 12, and the signal x'[l,n] is calculated for each of the twelve spatial frequency bins l corresponding to the filter processing units 242-1 to 242-12.
The spatial frequency conversion unit 241 supplies the spatial frequency domain signals obtained by the spatial frequency conversion to the filter processing units 242-1 to 242-12.
The filter processing units 242-1 to 242-12 perform signal processing in the spatial frequency domain on the signals supplied from the spatial frequency conversion unit 241, thereby generating spatial frequency domain speaker signals, and supply them to the spatial frequency synthesis unit 243.
Specifically, the filter processing units 242-1 to 242-12 hold SISO filters for spatial NC.
The filter processing units 242-1 to 242-12 generate the speaker signals by using the SISO filters they hold to filter the spatial frequency domain signals from the spatial frequency conversion unit 241 as the signal processing. The SISO filter is, for example, the filter w'[l,n] described above, and the calculation of equation (7) is performed as the filtering.
Hereinafter, when there is no particular need to distinguish the filter processing units 242-1 to 242-12 from one another, they are also simply referred to as the filter processing units 242.
The spatial frequency synthesis unit 243 performs spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 242, thereby generating a time-domain speaker signal for each speaker 151, and supplies them to the speaker array 113.
In the spatial frequency synthesis unit 243, the inverse of the spatial frequency conversion performed by the spatial frequency conversion unit 241 is performed as the spatial frequency synthesis.
As described above, in the noise canceling device 191, the outputs of the plurality of microphones 121 constituting the microphone array 111 are divided into sets of four and input to the respective signal processing units 211.
In each signal processing unit 211, the microphone signals supplied from the four microphones 121 are input to the four input terminals at the center of the twelve input terminals. Zero signals, which are dummy microphone signals, are input to the remaining eight input terminals, four on each side of those four central input terminals.
In the spatial frequency conversion unit 241, a DFT with a DFT point length of, for example, 12 is performed as the spatial frequency conversion on the basis of the microphone signals input from the input terminals, and then, in each filter processing unit 242, the output of the DFT is filtered by the SISO filter.
Further, in the spatial frequency synthesis unit 243, for example an IDFT is performed as the spatial frequency synthesis on the outputs of the filter processing units 242, and the time-domain speaker signal of each channel is generated.
The speaker signal of each channel generated in this way is input to the speaker 151 corresponding to that channel, but before that, in each addition unit 212, the speaker signals of the same channel from three mutually adjacent signal processing units 211 are added.
An amplifier (not shown) is provided in front of each speaker 151; the addition of the speaker signals of the same channel may be performed inside the amplifier, or the addition may be performed in a digital or analog state before the speaker signals are input to the amplifier.
In each signal processing unit 211, the input/output of the spatial frequency conversion unit 241 and the spatial frequency synthesis unit 243, that is, the input/output (point length) of the DFT and IDFT, is 12, which is smaller than the point length of 16 in the case of the signal processing unit 131 shown in FIG. 4.
Therefore, the number of PINs (the number of input/output terminals) of the signal processing unit 211 can be made smaller than in the case of the signal processing unit 131, and the amount of computation (signal processing) performed by the signal processing unit 211 can also be reduced.
Thus, according to the noise canceling device 191, the number of PINs and the amount of computation of each signal processing unit 211 can be reduced, the delay time can also be reduced, and high spatial NC performance can be obtained in real time. Moreover, noise sound from all directions can be handled.
Note that in FIG. 10, for simplicity of explanation, an example of reducing the point length from 16 to 12 has been described.
However, the point length (number of divisions) in each signal processing unit can be set arbitrarily according to the specifications of the signal processing unit (arithmetic unit), such as the number of PINs and the number of MIPS (Million Instructions Per Second), for example by dividing 256 channels of microphone signals into blocks of 12 channels each for processing.
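As a rough illustration of that sizing, the snippet below counts the arithmetic units needed under the simplest reading of the example, namely that the microphone channels are split into non-overlapping blocks of 12, with the last block zero-padded if it is not full. The exact splitting scheme for such a large array is not specified here, so this is only an assumption for illustration.

```python
import math

total_mic_channels = 256          # example figure from the text
channels_per_unit = 12            # point length chosen per arithmetic unit

# Number of signal processing units when the 256 microphone channels are
# split into blocks of 12 (the final block would be zero-padded if not full).
num_units = math.ceil(total_mic_channels / channels_per_unit)
print(num_units)                  # 22 units, each needing only 12 input PINs
```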
<Explanation of noise canceling processing>
Next, the operation of the noise canceling device 191 will be described. That is, the noise canceling processing performed by the noise canceling device 191 will be described below with reference to the flowchart of FIG. 12.
Since the processing of step S51 is the same as the processing of step S11 of FIG. 6, its description is omitted. However, in step S51, the microphone signal obtained by each microphone 121 is supplied to the spatial frequency conversion unit 241 of the corresponding signal processing unit 211.
In step S52, the spatial frequency conversion unit 241 of each signal processing unit 211 performs spatial frequency conversion on the time-domain microphone signals supplied from the four microphones 121 and the eight zero signals, and supplies the resulting spatial frequency domain signals to each filter processing unit 242. For example, in step S52, a calculation similar to equation (6) described above is performed.
In step S53, each filter processing unit 242 filters the spatial frequency domain signal supplied from the spatial frequency conversion unit 241 with the SISO filter it holds, and supplies the resulting spatial frequency domain speaker signal to the spatial frequency synthesis unit 243. For example, in step S53, a calculation similar to equation (7) is performed as the filtering.
In step S54, the spatial frequency synthesis unit 243 performs spatial frequency synthesis on the spatial frequency domain speaker signals supplied from the filter processing units 242 and supplies the resulting time-domain speaker signals to the addition units 212.
In step S55, each addition unit 212 performs addition processing to add the speaker signals of the same channel supplied from the spatial frequency synthesis units 243 of the three signal processing units 211, and obtains the final speaker signal.
In step S56, each addition unit 212 supplies the speaker signal obtained in the processing of step S55 to the corresponding speaker 151 of the speaker array 113 and causes it to output sound (noise canceling sound), and the noise canceling processing ends.
As described above, the noise canceling device 191 divides the output of the microphone array 111 into four parts, inputs them to the signal processing units 211, and generates the speaker signals through signal processing in the spatial frequency domain in each signal processing unit 211. In this way, the number of PINs and the amount of computation of each signal processing unit 211 can be reduced, the delay time can also be reduced, and high-performance spatial NC that can handle all directions can be realized in real time.
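A compact end-to-end sketch of this FIG. 12 flow is given below: each of the four signal processing units takes its four real microphone samples plus eight zero signals, runs a 12-point spatial DFT, per-bin filtering, and IDFT, and the addition step then sums the three overlapping partial outputs per speaker. The per-bin gains and the placement of the real inputs at the central channels are placeholders consistent with the description above, not values taken from it.

```python
import numpy as np

NUM_MICS, NUM_SPEAKERS, POINT_LEN, GROUPS = 16, 16, 12, 4

# Per-unit, per-bin placeholder gains standing in for the SISO filters.
gains = [np.ones(POINT_LEN, dtype=complex) * -1.0 for _ in range(GROUPS)]

mic_groups = [[(4 * g + i) % NUM_MICS for i in range(4)] for g in range(GROUPS)]
spk_groups = [[(4 * g - 4 + i) % NUM_SPEAKERS for i in range(POINT_LEN)]
              for g in range(GROUPS)]    # 12 speakers centered on each mic group

def process_unit(g, mic_samples):
    """One signal processing unit 211: steps S52-S54 for one sample time."""
    x12 = np.zeros(POINT_LEN)
    x12[4:8] = mic_samples[mic_groups[g]]        # 4 real mics at the central inputs
    x_prime = np.fft.fft(x12)                    # step S52: 12-point spatial DFT
    y_prime = gains[g] * x_prime                 # step S53: per-bin SISO filtering
    return np.fft.ifft(y_prime).real             # step S54: 12-channel partial output

def process_snapshot(mic_samples):
    """Steps S51-S56 for one sample time of all 16 microphones."""
    final = np.zeros(NUM_SPEAKERS)
    for g in range(GROUPS):
        partial = process_unit(g, mic_samples)
        for ch, spk in enumerate(spk_groups[g]):
            final[spk] += partial[ch]            # step S55: addition units 212
    return final                                 # step S56: drive the 16 speakers

speaker_out = process_snapshot(np.random.randn(NUM_MICS))   # one sample time
```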
<Third embodiment>
<Another example of grouping>
In the noise canceling device 191, the addition units 212 are provided after the signal processing units 211 so that the processing can be shared among the plurality of signal processing units 211.
However, if, for example, the number of microphones 121 belonging to each microphone group is increased and the outputs of the microphones 121 are input to a plurality of adjacent signal processing units 211 (arithmetic units) in an overlapping manner, as shown in FIGS. 13 and 14, a configuration without the addition units 212 can be adopted. In FIGS. 13 and 14, parts corresponding to those in FIG. 7 are denoted by the same reference numerals, and their description is omitted as appropriate.
In the example of FIG. 13, signal processing, that is, filtering and the like, is performed on the noise sound from the position P11 using twelve microphones 121, and of the resulting speaker signals, those of four channels are used so that sound is output from four speakers 151.
When the processing is divided among a plurality of signal processing units 211 with such a combination of microphones 121 and speakers 151, the grouping may be performed, for example, as shown in FIG. 14.
That is, in the example shown in FIG. 14, the microphones 121 and the speakers 151 are each divided into four groups, as indicated by the arrows Q41 to Q44.
Specifically, as indicated by the arrow Q41, one microphone group is formed by twelve microphones 121 arranged adjacent to one another and centered on the position at the front right of the user U11. Similarly, as indicated by the arrows Q42 to Q44, microphone groups are formed by twelve microphones 121 arranged adjacent to one another and centered on the rear right, the rear left, and the front left of the user U11, respectively.
In particular, here the grouping is performed so that each microphone group contains twelve mutually adjacent microphones 121, so each microphone 121 belongs to three microphone groups.
For these four microphone groups, four corresponding speaker groups are provided.
That is, as indicated by the arrow Q41, for the microphone group at the front right of the user U11, a speaker group is formed that consists of four speakers 151 arranged adjacent to one another and centered on the position at the front right of the user U11.
Similarly, as indicated by the arrows Q42 to Q44, for the microphone groups at the rear right, the rear left, and the front left of the user U11, speaker groups are formed, each consisting of four speakers 151 arranged adjacent to one another and centered on the position in the corresponding direction at the rear right, the rear left, and the front left of the user U11.
In this example, the speaker groups are formed so that each speaker 151 belongs to only one speaker group, and each speaker group consists of speakers 151 arranged adjacent to one another.
<Configuration example of noise canceling device>
When the microphones 121 and the speakers 151 are grouped as shown in FIG. 14, the noise canceling device is configured, for example, as shown in FIG. 15. In FIG. 15, parts corresponding to those in FIG. 10 are denoted by the same reference numerals, and their description is omitted as appropriate.
The noise canceling device 281 shown in FIG. 15 includes the microphone array 111, the signal processing device 201, and the speaker array 113. The signal processing device 201 has the signal processing units 211-1 to 211-4.
The configuration of the noise canceling device 281 differs from that of the noise canceling device 191 in that the addition units 212 are not provided, and is otherwise the same as that of the noise canceling device 191. However, the noise canceling device 281 and the noise canceling device 191 differ in the input/output relationships between the signal processing units 211 and the microphones 121 and speakers 151.
In this example, the microphones 121-1 to 121-8 and the microphones 121-13 to 121-16 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-1.
Similarly, the microphones 121-1 to 121-12 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-2.
The microphones 121-5 to 121-16 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-3.
The microphones 121-9 to 121-16 and the microphones 121-1 to 121-4 form one group, and the microphone signals of these microphones 121 are supplied to the signal processing unit 211-4.
Therefore, in this example, the output of one microphone 121 is input to two or more, more specifically three, signal processing units 211 determined in advance for that microphone 121 (microphone group). Accordingly, no dummy microphone signals (zero signals) are supplied to the spatial frequency conversion unit 241 of each signal processing unit 211, and the microphone signals of twelve microphones 121 are input.
 Further, speakers 151-1 to 151-4 form one group, and speakers 151-5 to 151-8 also form one group.
 Similarly, speakers 151-9 to 151-12 form one group, and speakers 151-13 to 151-16 form one group.
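 As an illustration only (not part of the embodiment), the following Python sketch enumerates the microphone window supplied to each signal processing unit 211 and a disjoint group of four adjacent speakers per unit, using the indices 1 to 16 given above; the assignment of a particular speaker group to a particular unit is an assumption made for the example. The assertions confirm the two properties stated in the text: each microphone 121 feeds exactly three signal processing units 211, and each speaker 151 belongs to exactly one speaker group.

```python
# Hypothetical sketch of the grouping in FIG. 15 (indices 1..16 as in the text).
# Each signal processing unit 211-k receives a 12-microphone window that wraps
# around the circular array and is shifted by 4 microphones per unit.
NUM_MICS = 16
NUM_UNITS = 4
WINDOW = 12
SHIFT = 4

def mic_group(unit):                       # unit = 0..3 for 211-1..211-4
    start = 12 + SHIFT * unit              # unit 0 then gets mics 13..16, 1..8
    return [(start + i) % NUM_MICS + 1 for i in range(WINDOW)]

def speaker_group(unit):                   # disjoint groups of four adjacent
    return [4 * unit + i + 1 for i in range(4)]   # speakers (unit mapping assumed)

mic_groups = [mic_group(u) for u in range(NUM_UNITS)]
speaker_groups = [speaker_group(u) for u in range(NUM_UNITS)]

# Each microphone signal is branched to exactly three signal processing units,
# while each speaker is driven by exactly one unit, as stated in the text.
for m in range(1, NUM_MICS + 1):
    assert sum(m in g for g in mic_groups) == 3
for s in range(1, NUM_MICS + 1):
    assert sum(s in g for g in speaker_groups) == 1
```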
 Each signal processing unit 211 performs filtering with SISO filters and the like based on the microphone signals, and generates speaker signals for some of the speakers 151 of the speaker array 113, that is, for all the speakers 151 belonging to the speaker group corresponding to the microphone group.
 More specifically, the spatial frequency synthesis unit 243 obtains speaker signals for twelve channels, the same number as the inputs of the spatial frequency conversion unit 241, that is, speaker signals corresponding to each of twelve speakers 151. Of these speaker signals, however, only the speaker signals of four channels, that is, the speaker signals of some of the twelve speakers 151, are actually output to the speakers 151.
 That is, the spatial frequency synthesis unit 243 outputs a speaker signal from each of the four central output terminals among its twelve output terminals to the speaker 151 connected to that terminal. Since no speakers 151 are connected to the remaining eight output terminals, four on either side of those four central terminals, no speaker signals are supplied from these terminals to speakers 151.
 In such a noise canceling device 281, twelve microphone signals are input to each signal processing unit 211, with the outputs of the plurality of microphones 121 constituting the microphone array 111 shifted by four at a time. Therefore, microphone signals actually obtained by picking up sound are input to all twelve input terminals of the spatial frequency conversion unit 241.
 In contrast, the spatial frequency synthesis unit 243 outputs speaker signals from only four of its twelve output terminals, and the remaining eight output terminals are not used. Therefore, part of the output of the spatial frequency synthesis (IDFT) may be omitted, for example.
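 To make that remark concrete, the sketch below (an illustration under assumed conventions, not text from the embodiment) computes only the output channels of a 12-point inverse DFT across the spatial axis that are actually wired to speakers; the choice of channels 4 to 7 (0-based) as the "central" terminals is a placeholder. The result matches the corresponding entries of a full inverse FFT while skipping the eight unused channels.

```python
import numpy as np

def pruned_spatial_idft(spectrum, needed_channels):
    """Compute only the needed output channels of an M-point inverse DFT taken
    across the spatial (channel) axis; the other channels are skipped, mirroring
    the unused output terminals of the spatial frequency synthesis unit 243."""
    m = len(spectrum)                      # m = 12 spatial frequency bins
    out = {}
    for n in needed_channels:              # e.g. the four central terminals
        out[n] = sum(spectrum[k] * np.exp(2j * np.pi * k * n / m)
                     for k in range(m)) / m
    return out

# Example: 12 spatial-frequency bin values for one time/frequency sample, of
# which only output channels 4..7 (assumed indices) drive connected speakers.
bins = np.random.randn(12) + 1j * np.random.randn(12)
partial = pruned_spatial_idft(bins, needed_channels=[4, 5, 6, 7])
full = np.fft.ifft(bins)
assert np.allclose([partial[n] for n in [4, 5, 6, 7]], full[4:8])
```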
 The operations for spatial NC performed by the noise canceling device 281 are exactly equivalent to the operations for spatial NC performed by the noise canceling device 191.
 Therefore, whether to adopt the configuration of the noise canceling device 281 or that of the noise canceling device 191 can be decided in consideration of how easily the outputs of the microphones 121 can be input to the signal processing units 211 and how easily the speaker signals supplied to the speakers 151 can be added (superimposed).
 For example, if analog microphone signals can easily be added or branched, the configuration of the noise canceling device 281 may be adopted. If, for example, speaker signals can easily be added in an analog amplifier, the configuration of the noise canceling device 191 may be adopted.
 In the noise canceling device 281 described above, basically the noise canceling process described with reference to FIG. 6 is performed.
 However, in step S11, the microphone signals obtained by the microphones 121 are supplied to the spatial frequency conversion units 241 of the signal processing units 211.
 Then, in step S12, spatial frequency conversion is performed by the spatial frequency conversion unit 241 of each signal processing unit 211, and the resulting signals are supplied to the filter processing units 242-1 to 242-12 of that signal processing unit 211.
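 Although the specific transform is not restated here, for a uniform array the spatial frequency conversion in step S12 can be read as a DFT taken across the twelve microphone channels independently at each temporal frequency; a generic form under that assumption (the symbols x_m, X_k, and M are introduced only for this illustration) is:

```latex
% Assumed generic spatial DFT; M = 12 channels per signal processing unit 211.
% x_m(\omega): signal of microphone channel m at temporal frequency \omega,
% X_k(\omega): spatial frequency bin k passed to filter processing unit 242-(k+1).
X_k(\omega) \;=\; \sum_{m=0}^{M-1} x_m(\omega)\, e^{-j\,\frac{2\pi k m}{M}},
\qquad k = 0, 1, \ldots, M-1 .
```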
 In step S13, filtering is performed by the filter processing units 242 of each signal processing unit 211, and the resulting speaker signals in the spatial frequency domain are supplied to the spatial frequency synthesis unit 243 of that signal processing unit 211.
 In step S14, spatial frequency synthesis is performed by the spatial frequency synthesis unit 243 of each signal processing unit 211, and the resulting time-domain speaker signals are supplied to the speakers 151 in step S15, whereby spatial NC is realized.
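 Combining steps S12 to S15, a minimal per-block sketch of what one signal processing unit 211 could do under the above assumptions (spatial DFT across channels, one independent SISO FIR filter per spatial bin, spatial inverse DFT, output of four assumed central channels) is shown below; the filter coefficients, block handling, and channel indices are placeholders and are not values taken from the embodiment.

```python
import numpy as np

class SignalProcessingUnitSketch:
    """Hypothetical sketch of one signal processing unit 211:
    spatial DFT -> per-bin SISO filtering -> spatial IDFT -> central outputs."""

    def __init__(self, num_channels=12, filter_len=64, out_channels=(4, 5, 6, 7)):
        self.m = num_channels
        self.out_channels = out_channels
        # One FIR filter per spatial frequency bin (placeholder coefficients).
        self.filters = np.zeros((num_channels, filter_len), dtype=complex)
        # Per-bin filter state so blocks can be processed in a streaming fashion.
        self.state = np.zeros((num_channels, filter_len - 1), dtype=complex)

    def process_block(self, mic_block):
        """mic_block: (num_channels, block_len) time-domain microphone samples."""
        # Step S12: spatial frequency conversion (DFT across the channel axis).
        spatial = np.fft.fft(mic_block, axis=0)
        # Step S13: independent SISO filtering of each spatial frequency bin.
        filtered = np.empty_like(spatial)
        for k in range(self.m):
            padded = np.concatenate([self.state[k], spatial[k]])
            filtered[k] = np.convolve(padded, self.filters[k], mode="valid")
            self.state[k] = padded[-(self.filters.shape[1] - 1):]
        # Step S14: spatial frequency synthesis (inverse DFT across channels).
        speaker_all = np.fft.ifft(filtered, axis=0).real
        # Step S15: only the channels wired to this unit's speaker group are output.
        return speaker_all[list(self.out_channels)]
```

 This sketch is block-by-block only to show where per-bin filter state would live; the actual filter structure, coefficients, and delays of the embodiment are not specified here.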
 In this way, the noise canceling device 281 also reduces the pin count and the amount of computation of each signal processing unit 211 as well as the delay time, and can realize high-performance spatial NC that handles all directions in real time.
<Configuration example of computer>
 The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed on a computer. Here, the computer includes a computer built into dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.
 FIG. 16 is a block diagram showing a configuration example of the hardware of a computer that executes the above-described series of processes by means of a program.
 In the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are connected to one another by a bus 504.
 An input/output interface 505 is further connected to the bus 504. An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input/output interface 505.
 The input unit 506 includes a keyboard, a mouse, a microphone, an image sensor, and the like. The output unit 507 includes a display, a speaker, and the like. The recording unit 508 includes a hard disk, a non-volatile memory, and the like. The communication unit 509 includes a network interface and the like. The drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory.
 In the computer configured as described above, the CPU 501 loads, for example, a program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes it, whereby the above-described series of processes is performed.
 The program executed by the computer (CPU 501) can be provided by being recorded on the removable recording medium 511 as a package medium or the like, for example. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
 In the computer, the program can be installed in the recording unit 508 via the input/output interface 505 by mounting the removable recording medium 511 on the drive 510. The program can also be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. In addition, the program can be installed in advance in the ROM 502 or the recording unit 508.
 The program executed by the computer may be a program in which the processes are performed in chronological order in the order described in this specification, or a program in which the processes are performed in parallel or at necessary timings such as when a call is made.
 The embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present technology.
 For example, the present technology can take a cloud computing configuration in which one function is shared and jointly processed by a plurality of devices via a network.
 Each step described in the above flowcharts can be executed by one device or shared and executed by a plurality of devices.
 Furthermore, when one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared and executed by a plurality of devices.
 Furthermore, the present technology can also be configured as follows.
(1)
 A signal processing device including one or more signal processing units that perform signal processing in a spatial frequency domain,
 in which the signal processing unit performs the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by a plurality of microphones.
(2)
 The signal processing device according to (1), in which the signal processing unit generates a noise cancel signal by performing the signal processing.
(3)
 The signal processing device according to (1) or (2), further including:
 a spatial frequency conversion unit that performs spatial frequency conversion on the plurality of time-domain microphone signals; and
 a spatial frequency synthesis unit that performs spatial frequency synthesis on a signal in the spatial frequency domain obtained by the signal processing,
 in which the signal processing unit performs the signal processing on a signal in the spatial frequency domain obtained by the spatial frequency conversion.
(4)
 The signal processing device according to any one of (1) to (3), in which the signal processing unit performs the signal processing with a plurality of signals based on the plurality of microphone signals obtained by the plurality of microphones as inputs, and outputs a plurality of signals.
(5)
 The signal processing device according to any one of (1) to (4), in which the signal processing unit has a plurality of filter processing units and performs filtering by the filter processing units as the signal processing.
(6)
 The signal processing device according to any one of (1) to (5), including a plurality of the signal processing units,
 in which the signal processing unit performs the signal processing on signals based on the microphone signals obtained by all the microphones belonging to one group when the plurality of microphones is divided into a plurality of groups, and generates speaker signals corresponding to some of a plurality of speakers,
 the signal processing device further including a plurality of addition units corresponding to the respective plurality of speakers,
 in which the addition unit adds the speaker signals, obtained by two or more of the plurality of signal processing units, of the speaker corresponding to the addition unit, and outputs the final speaker signal obtained by the addition to the corresponding speaker.
(7)
 The signal processing device according to (6), in which the plurality of microphones is divided into a predetermined number of groups so that microphones adjacent to one another belong to the same group, and
 for each of the predetermined number of groups, a corresponding one of the predetermined number of signal processing units to which the microphone signals obtained by the microphones belonging to that group are input is defined.
(8)
 The signal processing device according to any one of (1) to (5), including a plurality of the signal processing units,
 in which, for each of the plurality of microphones, the microphone signal obtained by one microphone is input to two or more predetermined signal processing units among the plurality of signal processing units, and
 the signal processing unit performs the signal processing on signals based on the microphone signals input from the plurality of microphones to generate speaker signals corresponding to each of a plurality of speakers, and outputs the speaker signals to some of the plurality of speakers.
(9)
 A signal processing method for a signal processing device including one or more signal processing units that perform signal processing in a spatial frequency domain, the method including:
 performing, by the one or more signal processing units, the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by a plurality of microphones.
(10)
 A program that causes a computer controlling a signal processing device including one or more signal processing units that perform signal processing in a spatial frequency domain to execute processing including a step of:
 performing, by the one or more signal processing units, the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by a plurality of microphones.
(11)
 A noise canceling device including:
 a plurality of microphones;
 one or more signal processing units that perform signal processing in a spatial frequency domain; and
 a plurality of speakers that output sound based on a noise cancel signal generated by the signal processing,
 in which the signal processing unit generates the noise cancel signal by performing the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by the plurality of microphones.
 111 microphone array, 112 signal processing device, 113 speaker array, 131 signal processing unit, 141 spatial frequency conversion unit, 142-1 to 142-16, 142 filter processing unit, 143 spatial frequency synthesis unit, 211-1 to 211-4, 211 signal processing unit, 212-1 to 212-16, 212 addition unit

Claims (11)

  1.  A signal processing device comprising one or more signal processing units that perform signal processing in a spatial frequency domain,
      wherein the signal processing unit performs the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by a plurality of microphones.
  2.  The signal processing device according to claim 1, wherein the signal processing unit generates a noise cancel signal by performing the signal processing.
  3.  The signal processing device according to claim 1, further comprising:
      a spatial frequency conversion unit that performs spatial frequency conversion on the plurality of time-domain microphone signals; and
      a spatial frequency synthesis unit that performs spatial frequency synthesis on a signal in the spatial frequency domain obtained by the signal processing,
      wherein the signal processing unit performs the signal processing on a signal in the spatial frequency domain obtained by the spatial frequency conversion.
  4.  The signal processing device according to claim 1, wherein the signal processing unit performs the signal processing with a plurality of signals based on the plurality of microphone signals obtained by the plurality of microphones as inputs, and outputs a plurality of signals.
  5.  The signal processing device according to claim 1, wherein the signal processing unit has a plurality of filter processing units and performs filtering by the filter processing units as the signal processing.
  6.  The signal processing device according to claim 1, comprising a plurality of the signal processing units,
      wherein the signal processing unit performs the signal processing on signals based on the microphone signals obtained by all the microphones belonging to one group when the plurality of microphones is divided into a plurality of groups, and generates speaker signals corresponding to some of a plurality of speakers,
      the signal processing device further comprising a plurality of addition units corresponding to the respective plurality of speakers,
      wherein the addition unit adds the speaker signals, obtained by two or more of the plurality of signal processing units, of the speaker corresponding to the addition unit, and outputs the final speaker signal obtained by the addition to the corresponding speaker.
  7.  The signal processing device according to claim 6, wherein the plurality of microphones is divided into a predetermined number of groups so that microphones adjacent to one another belong to the same group, and
      for each of the predetermined number of groups, a corresponding one of the predetermined number of signal processing units to which the microphone signals obtained by the microphones belonging to that group are input is defined.
  8.  The signal processing device according to claim 1, comprising a plurality of the signal processing units,
      wherein, for each of the plurality of microphones, the microphone signal obtained by one microphone is input to two or more predetermined signal processing units among the plurality of signal processing units, and
      the signal processing unit performs the signal processing on signals based on the microphone signals input from the plurality of microphones to generate speaker signals corresponding to each of a plurality of speakers, and outputs the speaker signals to some of the plurality of speakers.
  9.  A signal processing method for a signal processing device comprising one or more signal processing units that perform signal processing in a spatial frequency domain, the method comprising:
      performing, by the one or more signal processing units, the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by a plurality of microphones.
  10.  A program that causes a computer controlling a signal processing device comprising one or more signal processing units that perform signal processing in a spatial frequency domain to execute processing comprising a step of:
      performing, by the one or more signal processing units, the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by a plurality of microphones.
  11.  A noise canceling device comprising:
      a plurality of microphones;
      one or more signal processing units that perform signal processing in a spatial frequency domain; and
      a plurality of speakers that output sound based on a noise cancel signal generated by the signal processing,
      wherein the signal processing unit generates the noise cancel signal by performing the signal processing on a signal converted into the spatial frequency domain on the basis of microphone signals obtained by sound pickup by the plurality of microphones.
PCT/JP2021/027823 2020-08-11 2021-07-28 Signal processing device and method, noise cancelling device, and program WO2022034795A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2020135544 2020-08-11
JP2020-135544 2020-08-11
JP2020-145742 2020-08-31
JP2020145742 2020-08-31

Publications (1)

Publication Number Publication Date
WO2022034795A1 true WO2022034795A1 (en) 2022-02-17

Family

ID=80247161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/027823 WO2022034795A1 (en) 2020-08-11 2021-07-28 Signal processing device and method, noise cancelling device, and program

Country Status (1)

Country Link
WO (1) WO2022034795A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017118189A (en) * 2015-12-21 2017-06-29 日本電信電話株式会社 Sound collection signal estimating device, sound collection signal estimating method and program
WO2018163810A1 (en) * 2017-03-07 2018-09-13 ソニー株式会社 Signal processing device and method, and program
WO2019198557A1 (en) * 2018-04-09 2019-10-17 ソニー株式会社 Signal processing device, signal processing method, and signal processing program



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21855874

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21855874

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP