WO2020039734A1 - Audio reproducing device, audio reproduction method, and audio reproduction program - Google Patents

Publication number
WO2020039734A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
audio
decoder
channels
hoa
Prior art date
Application number
PCT/JP2019/025199
Other languages
French (fr)
Japanese (ja)
Inventor
哲 曲谷地
一敦 大栗
Original Assignee
ソニー株式会社
Application filed by ソニー株式会社
Priority to DE112019004193.2T (patent DE112019004193T5)
Priority to CN201980053901.8A (patent CN112567769B)
Publication of WO2020039734A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/02 Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the present disclosure relates to an audio playback device, an audio playback method, and an audio playback program.
  • As an audio reproducing apparatus, in addition to one that reproduces an audio signal in stereo form (two channels), a multi-channel form in which the number of speakers is further increased is known.
  • Patent Document 1 discloses a method of encoding an audio signal using higher-order ambisonics for such audio reproduction using multi-channels.
  • One object of the present disclosure is to provide an audio reproducing device, an audio reproducing method, and an audio reproducing program for improving the quality of reproduced audio.
  • An audio playback device comprising: a first decoder that decodes a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoder that decodes a second signal, which includes a band different from the first signal and corresponds to spatial coordinates, into audio signals of a plurality of channels; and an adder that adds the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
  • a first signal associated with a spatial frequency is decoded into an audio signal of a plurality of channels, and a second signal including a band different from that of the first signal and associated with a spatial coordinate is converted into an audio signal of a plurality of channels.
  • An audio reproduction program for causing an information processing apparatus to execute: a first decoding process of decoding a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoding process of decoding a second signal, which includes a band different from the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and an addition process of adding the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
  • According to the present disclosure, it is possible to improve the quality of the reproduced audio.
  • The effects described here are not necessarily limiting, and any of the effects described in the present disclosure may be obtained.
  • the contents of the present disclosure are not to be construed as being limited by the illustrated effects.
  • FIG. 1 is a diagram illustrating a configuration of an audio system as a comparative example.
  • FIG. 2 is a diagram for describing an overview of the audio system according to the first embodiment.
  • FIG. 3 is a diagram illustrating frequency characteristics of the audio system according to the first embodiment.
  • FIG. 4 is a diagram illustrating a configuration of the audio system according to the first embodiment.
  • FIG. 5 is a diagram showing a recording format of a recording signal used in the audio system according to the first embodiment.
  • FIG. 6 is a diagram illustrating a configuration of an audio system according to the second embodiment.
  • FIG. 7 is a diagram illustrating a configuration of an audio system according to the third embodiment.
  • FIG. 8 is a diagram illustrating a configuration of an audio system according to the fourth embodiment.
  • FIG. 9 is a diagram illustrating a configuration of an audio system according to the fifth embodiment.
  • FIG. 10 is a diagram illustrating frequency characteristics of the audio system according to the fifth embodiment.
  • FIG. 11 is a diagram illustrating a configuration of an audio system according to the sixth embodiment.
  • FIG. 12 is a diagram illustrating a recording format of a recording signal used in an audio system according to a modification.
  • ambisonics that can flexibly cope with an arbitrary recording and reproducing system.
  • Ambisonics whose order is second or higher are called higher order ambisonics (HOA: Higher Order Ambisonics).
  • In HOA, information is stored by performing spatial frequency conversion (spherical harmonic function conversion) in the angular direction of three-dimensional polar coordinates. This can be considered to correspond to time-frequency conversion of the audio signal with respect to the time axis.
  • An advantage of this method is that information can be encoded and decoded from an arbitrary microphone array to an arbitrary speaker array without limiting the number of microphones and the number of speakers.
  • Encodings used in the HOA method can be roughly classified into two types. One is a recording base, and the other is an object base. Since the first recording base is targeted in the present embodiment, this will be described.
  • A certain time frequency $\omega$ of the sound signal recorded by the annular or spherical microphone array is converted into the HOA signals $A_m(\omega)$ and $A_n^m(\omega)$ according to the following equations, respectively.
  • Equation (1) is for a circular microphone array
  • Equation (2) is for a spherical microphone array.
  • $\phi_q$ and $\theta_q$ represent the azimuth and elevation of the q-th microphone
  • $P_q(\omega)$ represents the sound pressure of the q-th microphone.
  • $J_m(ka)$ is a Bessel function
  • m is its order
  • k is the wave number
  • a is the radius of the microphone array.
  • In Equation (2), the Bessel function of Equation (1) is replaced with the spherical Bessel function, and $e^{-im\phi_q}$ is replaced with the spherical harmonics $Y_n^m(\theta_q, \phi_q)$.
  • the spherical harmonic is
  • n is the order of the HOA. Since this is a conversion of the sound pressure P which is a continuous function of the azimuth and the elevation, the orders m and n exist up to infinity. However, when recording with a spherical microphone array, it is impossible to capture the sound pressure P as a continuous function. Therefore, similar to the sampling theorem at the time frequency, the following relation exists between the reproducible HOA orders M and N and the number Q of microphones.
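  • The relation between array size and reproducible order can be illustrated with a short sketch. It assumes the bounds commonly quoted in the ambisonics literature, Q >= 2M + 1 for a ring and Q >= (N + 1)^2 for a sphere; these are an assumption here and may differ in detail from the relation intended by the original equations. The function names are hypothetical.

```python
import math


def max_circular_order(num_mics: int) -> int:
    """Largest order M satisfying the assumed ring bound 2*M + 1 <= num_mics."""
    return (num_mics - 1) // 2


def max_spherical_order(num_mics: int) -> int:
    """Largest order N satisfying the assumed sphere bound (N + 1)**2 <= num_mics."""
    return math.isqrt(num_mics) - 1


if __name__ == "__main__":
    for q in (4, 8, 32):
        print(q, max_circular_order(q), max_spherical_order(q))
```

  • Under these assumed bounds, an eight-microphone ring supports orders up to 3, while eight microphones on a sphere support only order 1.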
  • Equation (4) is for a ring, and Equation (5) is for a sphere.
  • R is the radius of the speaker array
  • $\theta_i$, $\phi_i$ are the elevation and azimuth angles of the i-th speaker
  • $G_m(R, \omega)$ and $G_n^0(R, \omega)$ are the transfer functions of the HOA coefficients.
  • $H_m^{(2)}(kR)$ is a Hankel function of the second kind
  • $h_n^{(2)}(kR)$ is a spherical Hankel function of the second kind.
  • the conversion formula between the HOA signal and the audio signal differs depending on the shape of the microphone array, the shape of the speaker array, the directivity, and the like.
  • descriptions as HOA encoding and HOA decoding mean that these various systems are included, and are not limited to any of them.
  • Spatial aliasing. As described above, in recording by a microphone set, the order is finite because the number of microphones is limited. Therefore, if a signal of a higher order is mixed in, spatial aliasing occurs. If a signal in which spatial aliasing occurs is encoded and decoded by the HOA method, a sound field different from the recorded space will be reproduced. The effect of this aliasing depends on the time frequency and the radius of the microphone array. As the time frequency becomes lower and the array radius becomes smaller, the higher-order components of the HOA signal become smaller. In other words, for the same time frequency, the smaller the radius of the microphone array, the smaller the higher-order HOA signal and the weaker the aliasing effect. Likewise, for the same array radius, the effect of aliasing is reduced in the low frequency band.
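  • The dependence on frequency and radius can be made concrete with a rough estimate. The sketch below assumes the widely used rule of thumb that an order-N representation is adequate while ka <= N, where k is the wave number and a the array radius; this rule, the speed of sound value, and the function names are assumptions for illustration and do not come from the original text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def wave_number(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Wave number k = 2*pi*f / c."""
    return 2.0 * math.pi * freq_hz / c


def aliasing_limit_hz(order: int, radius_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Frequency at which k * radius reaches the given order (rule of thumb)."""
    return order * c / (2.0 * math.pi * radius_m)


if __name__ == "__main__":
    for radius in (0.05, 0.10):
        print(radius, round(aliasing_limit_hz(3, radius)))
```

  • Halving the radius doubles the estimated limit, which matches the qualitative statement that a smaller array suffers less from aliasing at a given frequency.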
  • An object of the present disclosure is to perform high-quality audio reproduction by suppressing the influence of spatial aliasing, which arises particularly at high frequencies depending on the number and spacing of the microphones and the radius of the array.
  • FIG. 1 is a diagram showing a configuration of an audio system 1 as a comparative example.
  • This comparative example is a conventional form using only the HOA method, and includes a HOA encoder 22 and a HOA decoder 31. Audio signals collected by a plurality of microphones provided in the microphone set 41 are input to the HOA encoder 22.
  • the microphone set 41 includes a plurality of microphones provided in an appropriate arrangement such as a ring, a sphere, and a line.
  • the HOA encoder 22 performs HOA encoding on a plurality of audio signals collected by the microphone set 41, thereby converting the audio signals into a HOA signal represented as a spatial frequency.
  • the HOA decoder 31 can reproduce the received HOA signal using an arbitrary speaker set 42.
  • The speaker set 42 includes a plurality of speakers provided in an appropriate arrangement such as a ring, a sphere, or a line. Further, the arrangement of the speaker set 42 does not need to depend on the microphone arrangement of the microphone set 41 that collected the sound. This is because the HOA signal is expressed in the spatial frequency. By setting the arrangement of the speakers of the speaker set 42 in the HOA decoder 31, it is possible to reproduce the collected sound field.
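  • Why the speaker layout can differ from the microphone layout can be seen in a deliberately simplified sketch for a ring array. It keeps only the angular part of the transform (a discrete circular harmonic expansion) and omits the radial Bessel and Hankel terms of the encoding and decoding equations, so it is an illustration under that assumption rather than the conversion used in the disclosure. All function names are hypothetical.

```python
import cmath
import math


def ring_azimuths(count):
    """Azimuths of `count` equally spaced points on a ring, in radians."""
    return [2.0 * math.pi * q / count for q in range(count)]


def encode_circular(pressures, mic_azimuths, max_order):
    """Project microphone pressures onto circular harmonics -M..M."""
    q = len(pressures)
    return {
        m: sum(p * cmath.exp(-1j * m * phi)
               for p, phi in zip(pressures, mic_azimuths)) / q
        for m in range(-max_order, max_order + 1)
    }


def decode_circular(coeffs, speaker_azimuths):
    """Evaluate the harmonic expansion at arbitrary speaker azimuths."""
    return [
        sum(a * cmath.exp(1j * m * phi) for m, a in coeffs.items()).real
        for phi in speaker_azimuths
    ]


if __name__ == "__main__":
    mics = ring_azimuths(8)
    pressures = [math.cos(phi - 0.3) for phi in mics]  # first-order test field
    coeffs = encode_circular(pressures, mics, 3)
    print(decode_circular(coeffs, [0.1, 1.0, 2.5, 4.0]))
```

  • Because the coefficients describe the field as a function of angle, the decoder can evaluate it at four arbitrary speaker angles even though eight microphones were used for recording.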
  • FIG. 2 is a diagram for describing an overview of the audio system 1 according to the first embodiment.
  • the audio system 1 according to the first embodiment includes an LPF 21 (Low Pass Filter), a HOA encoder 22, a HOA decoder 31, an HPF 23 (High Pass Filter), a multiplier 33, and an adder 34.
  • The audio system 1 takes as input the signals of the plurality of microphones provided in the microphone set 41 and outputs to the speaker set 42, in which a plurality of speakers are arranged.
  • The audio signal output from the microphone set 41 and input to the HOA encoder 22 via the LPF 21, and the audio signal input to the HPF 23, each have a number of channels equal to the number of microphones provided in the microphone set 41.
  • the audio signal output from the HOA decoder 31 and output to the speaker set 42 via the adder 34 has the same number of channels as the number of speakers arranged in the speaker set 42. As described above, in the block diagram shown in FIG. 2, for convenience of drawing, there are places where a plurality of channels are indicated by one line.
  • a plurality of audio signals collected by a plurality of microphones of the microphone set 41 are input to the HOA encoder 22.
  • The LPF 21 removes high-frequency components from the plurality of audio signals input from the microphone set 41, limiting them to a frequency band that can be correctly expressed by the HOA signal.
  • the HOA encoder 22 converts the plurality of audio signals from which the high-frequency components have been removed by the LPF 21 into HOA signals represented as spatial frequency.
  • The HOA decoder 31 decodes the HOA signal output from the HOA encoder 22 and reproduces it using an arbitrary speaker set 42. At this time, the high frequency band of the plurality of audio signals from the microphone set 41, which cannot be expressed by the HOA encoder 22, is extracted as a high-frequency-only component by the HPF 23, gain-adjusted by the multiplier 33, added by the adder 34 to the HOA-decoded audio signal, and output to the speaker set 42.
  • FIG. 3 is a diagram illustrating frequency characteristics of the audio system 1 according to the first embodiment.
  • the low-pass characteristics shown by the solid lines indicate the characteristics of the LPF 21.
  • the high-pass characteristics indicated by the broken lines indicate the characteristics of the HPF 23.
  • Together, the two characteristics form a flat frequency characteristic from low to high frequencies.
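  • One simple way to obtain a flat combined response is a complementary split, sketched below for a single channel. The one-pole low-pass filter, the choice of deriving the high band as the input minus the low band, and the parameter `alpha` are assumptions for illustration; the disclosure does not specify the filter design of the LPF 21 and the HPF 23.

```python
def one_pole_lowpass(signal, alpha):
    """First-order recursive low-pass filter; 0 < alpha <= 1 (assumed design)."""
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def split_bands(signal, alpha=0.2):
    """Return (low, high) where high is the complement of the low band."""
    low = one_pole_lowpass(signal, alpha)
    high = [x - l for x, l in zip(signal, low)]
    return low, high


def recombine(low, high, gain=1.0):
    """Add the gain-adjusted high band back to the low band."""
    return [l + gain * h for l, h in zip(low, high)]


if __name__ == "__main__":
    x = [0.0, 1.0, -0.5, 0.25, 0.8]
    low, high = split_bands(x)
    print(recombine(low, high))
```

  • With unit gain the sum of the two bands reproduces the input, which is the time-domain counterpart of a flat overall frequency characteristic.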
  • FIG. 4 is a diagram illustrating a configuration of the audio system 1 according to the first embodiment.
  • the audio system 1 is actually divided into a recording device 2 provided on the recording side and a reproducing device 3 provided on the reproducing side.
  • the recording signal recorded by the recording device 2 is recorded on a recording medium or transmitted via communication.
  • the reproduction device 3 reproduces the sound field at the time of recording by reproducing the recording signal recorded on the recording medium or the recording signal transmitted via communication.
  • In this example, the input side and the output side each have eight channels (ch): the microphone set 41 uses eight microphones m1 to m8, and the speaker set 42 uses eight speakers s1 to s8.
  • the microphones m1 to m8 and the speakers s1 to s8 are arranged such that the numbers of the subscripts correspond to each other. In FIG. 4, the numbers shown on the lines between the blocks indicate the number of channels.
  • the recording device 2 located on the recording side of the audio system 1 includes the LPF 21, the HOA encoder 22, the HPF 23, and the encoder 24.
  • the LPF 21, the HOA encoder 22, and the HPF 23 are the same as those described with reference to FIG.
  • the encoder 24 converts the audio signal that has passed through the HPF 23 into a signal corresponding to spatial coordinates.
  • As a method of converting into a signal corresponding to the spatial coordinates, for example, PCM (Pulse Code Modulation) coding, ADPCM (Adaptive Differential Pulse Code Modulation) coding, delta modulation, or another method that depends on coordinates can be used.
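  • As an illustration of a coordinate-based encoding, the sketch below applies plain linear PCM quantization to each microphone channel independently, so every encoded channel stays tied to one microphone position. The 16-bit depth, the sample range of -1.0 to 1.0, and the function names are assumptions chosen for the example.

```python
def pcm_encode(samples, bits=16):
    """Quantize samples in [-1.0, 1.0] to signed integer codes, with clipping."""
    full_scale = 2 ** (bits - 1) - 1
    return [max(-full_scale, min(full_scale, round(s * full_scale)))
            for s in samples]


def pcm_decode(codes, bits=16):
    """Map integer codes back to floating-point samples."""
    full_scale = 2 ** (bits - 1) - 1
    return [c / full_scale for c in codes]


def encode_channels(channels, bits=16):
    """Encode each microphone channel separately (one stream per position)."""
    return [pcm_encode(ch, bits) for ch in channels]


if __name__ == "__main__":
    eight_channels = [[0.0, 0.3, -0.3] for _ in range(8)]
    encoded = encode_channels(eight_channels)
    print(len(encoded), encoded[0])
```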
  • the HOA encoder 22 is different from the encoder 24 in that the HOA encoder 22 converts the audio signal input from the LPF 21 into a signal corresponding to a spatial frequency.
  • The HOA signal converted by the HOA encoder 22 can reproduce the sound at a spatial coordinate position by designating the spatial coordinates to be reproduced, that is, the positions of the speakers s1 to s8 in the speaker set 42.
  • The HOA signal obtained as a result of conversion by the HOA encoder 22 of the recording device 2 and the high-frequency signal obtained as a result of conversion by the encoder 24 are recorded on a recording medium as a recording signal, or sent to the reproducing device 3 located on the reproducing side.
  • FIG. 5 is a diagram showing a recording format of a recording signal used in the audio system 1 according to the first embodiment.
  • the recording signal has a header section and a data section.
  • the header section is a section in which various meta information necessary for reproducing the recorded audio signal is recorded.
  • The meta information recorded in the header section includes a sampling rate, a frame length, the number of frames, the number of band divisions, and band information (first band information and second band information) for each band.
  • the sampling rate is the sampling rate used at the time of recording, and may be fixed or variable.
  • the frame length is information defining the length of a frame recorded in the data section. Either fixed or variable frame length may be adopted.
  • the number of frames (L) is a number that defines the number of frames forming a chunk that is a unit of one data in the data portion.
  • The number of band divisions is a number indicating the number of bands divided in the audio system 1. In the present embodiment, the number of band divisions is "2", produced by the LPF 21 and the HPF 23 as described with reference to FIG. 2.
  • the first band information is information relating to conversion on the low band side, that is, the conversion of the HOA encoder 22.
  • The first band information includes a cutoff frequency, spatial domain information, signal domain information, compression scheme information, and an order.
  • the cutoff frequency corresponds to the cutoff frequency on the high frequency side of the LPF 21 described in FIG.
  • the spatial domain information includes information indicating that the band is a HOA signal.
  • In addition, information on the microphone set 41 used for sound collection may be included, for example information on the arrangement of the microphones m1 to m8 in the microphone set 41 (spherical, annular, linear, inward, outward, and the like).
  • the signal domain information is information indicating whether it is recorded on the time axis or on the time frequency axis.
  • the compression method information is information indicating the presence or absence of compression and the compression method being used.
  • the order is the order used in the HOA encoder 22.
  • the second band information is information relating to the conversion on the high frequency side, that is, the encoder 24.
  • The second band information includes a cutoff frequency, spatial domain information, signal domain information, compression scheme information, and channel information.
  • the cutoff frequency corresponds to the cutoff frequency on the low frequency side of the HPF 23 described with reference to FIG.
  • the spatial domain information includes information indicating that the band is a signal encoded by the encoder 24.
  • In addition, information on the microphone set 41 used for sound collection may be included, for example information on the arrangement of the microphones m1 to m8 in the microphone set 41 (spherical, annular, linear, inward, outward, and the like).
  • the signal domain information is information indicating whether it is recorded on the time axis or on the time frequency axis.
  • the compression method information is information indicating the presence or absence of compression and the compression method being used.
  • the channel information includes the number of channels and channel coordinates. The number of channels corresponds to the number of microphones in the microphone set 41 (in this case, “8”).
  • the channel coordinates are coordinates indicating the spatial arrangement of the microphones m1 to m8 in the microphone set 41.
  • the data section stores signals converted by the HOA encoder 22 and the encoder 24.
  • frame chunks having frames are provided by the number of frames (L).
  • the data recorded in the frame as described above is converted into a sound signal by the HOA decoder 31 or the decoder 32 with reference to the meta information described in the header portion.
  • In the recording format described above, information common to the bands can be combined into one.
  • the recording format described above is merely an example, and the present invention is not limited to this format, and can be configured in various forms.
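  • The header layout described above can be pictured as a small data structure. The sketch below is only one possible in-memory representation: the field names, types, and all example values (48000 Hz, 1024 samples, 16 frames, a 2000 Hz cutoff, order 3) are hypothetical and do not come from the disclosure, which does not define a concrete byte layout.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BandInfo:
    cutoff_hz: float
    spatial_domain: str            # e.g. "HOA" or "coordinates"
    signal_domain: str             # "time" or "time-frequency"
    compression: Optional[str]     # None when uncompressed
    hoa_order: Optional[int] = None                      # first band only
    channel_coords: List[Tuple[float, float]] = field(   # second band only
        default_factory=list)


@dataclass
class Header:
    sampling_rate_hz: int
    frame_length: int
    num_frames: int                # L, frames per chunk
    bands: List[BandInfo]

    @property
    def num_band_divisions(self) -> int:
        return len(self.bands)


def example_header() -> Header:
    """Build a header for an eight-microphone ring with hypothetical values."""
    coords = [(2.0 * math.pi * q / 8, 0.0) for q in range(8)]  # (azimuth, elevation)
    low = BandInfo(2000.0, "HOA", "time", None, hoa_order=3)
    high = BandInfo(2000.0, "coordinates", "time", None, channel_coords=coords)
    return Header(48000, 1024, 16, [low, high])


if __name__ == "__main__":
    print(example_header().num_band_divisions)
```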
  • the playback device 3 located on the playback side of the audio system 1 includes a HOA decoder 31, a decoder 32, a multiplier 33, and an adder 34.
  • the HOA decoder 31 decodes the HOA signal encoded by the HOA encoder 22 and forms an 8-channel audio signal.
  • the decoder 32 combines the signals encoded by the encoder 24 to form an 8-channel audio signal.
  • The adder 34 adds, for each channel, the audio signal formed by the HOA decoder 31 and the audio signal formed by the decoder 32 and appropriately scaled by the multiplier 33, and outputs the result to the speaker set 42.
  • Since the number of microphones m1 to m8 of the microphone set 41 and the number of speakers s1 to s8 of the speaker set 42 are both eight, signals of the corresponding channels are output to the speakers s1 to s8. This makes it possible to reproduce the sound field at the time of sound pickup.
  • For the HOA signal, which is a signal corresponding to the spatial frequency, this configuration suppresses the effect of spatial aliasing that depends on the number and spacing of the microphones m1 to m8 in the microphone set 41, the radius of the array, and the like. It thereby suppresses the deterioration of the audio signal that occurs at or above a certain frequency and makes it possible to collect and reproduce the sound field with high accuracy.
  • Second Embodiment. In the first embodiment, as described with reference to FIG. 4, the number of microphones m1 to m8 of the microphone set 41 and the number of speakers s1 to s8 of the speaker set 42 match. However, it is conceivable that, for reasons on the reproduction side, the speaker set 42 cannot be arranged in the same manner as the microphone set 41 was at the time of sound collection.
  • the second and third embodiments described below are embodiments in which the number of microphones in the microphone set 41 and the number of speakers in the speaker set 42 do not match.
  • FIG. 6 is a diagram showing a configuration of the audio system 1 according to the second embodiment.
  • the configurations of the microphone set 41 and the recording device 2 located on the recording side are the same as those described with reference to FIG. 4, and a description thereof will be omitted.
  • the speaker set 42 located on the reproduction side is different from the configuration in FIG. 4 in that the number of speakers s1 to s4 is smaller than the number (eight) of microphones m1 to m8. Further, the reproducing apparatus 3 is different in that a matrix section 35 is provided between the multiplication section 33 and the addition section 34. Then, by specifying the number and positions of the speakers s1 to s4 in the speaker set 42, the HOA decoder 31 outputs audio signals for four channels according to the arrangement of the speakers s1 to s4.
  • The decoder 32, on the other hand, outputs audio signals for eight channels corresponding to the microphones m1 to m8 at the time of sound pickup.
  • the audio signal output from the decoder 32 is mixed by the matrix unit 35 as a conversion unit in accordance with the arrangement of the speakers s1 to s4 of the speaker set 42. Specifically, audio signals collected by three microphones m1, m2, and m8 are mixed as audio signals to be output to the speaker s1. At this time, the audio signals collected by the microphones m2 and m8 are multiplied by a coefficient of 0.25.
  • the audio signal output to the speaker s2 is obtained by mixing the audio signals collected by the three microphones m2, m3, and m4 in the matrix unit 35.
  • the audio signal output to the speaker s3 is obtained by mixing the audio signals collected by the three microphones m4, m5, and m6 in the matrix unit 35.
  • the audio signal output to the speaker s4 is obtained by mixing the audio signals collected by the three microphones m6, m7, and m8 in the matrix unit 35.
  • The HOA decoder 31, which restores an audio signal from a signal corresponding to a spatial frequency, can reproduce the sound field according to the arrangement of the speakers s1 to s4 regardless of the sound collecting form, that is, the arrangement of the microphones m1 to m8. The decoder 32, which restores the audio signal from the signal corresponding to the spatial coordinates, instead reproduces the sound field with an audio signal that depends on the positions of the microphones m1 to m8.
  • a matrix unit 35 is provided as a conversion unit, and the number of channels of the audio signal output from the decoder 32 is converted according to the arrangement of the speakers s1 to s4 of the speaker set 42. This makes it possible to reproduce a sound field according to the arrangement of the speakers s1 to s4 on the reproduction side.
  • the configuration of the matrix unit 35 may be various methods other than those described in the present embodiment, and is not limited to one method.
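  • The eight-to-four mixing performed by the matrix unit 35 can be written as a 4 x 8 coefficient matrix applied to each sample frame. In the sketch below, the microphone groupings and the coefficient 0.25 for m2 and m8 in the s1 row follow the description; the weight 0.5 for the centre microphone of each group and the reuse of 0.25 for the outer microphones of s2 to s4 are assumptions made so that each row sums to 1.

```python
# Rows are speakers s1..s4, columns are microphones m1..m8.
DOWNMIX_8_TO_4 = [
    # m1   m2    m3   m4    m5   m6    m7   m8
    [0.5, 0.25, 0.0, 0.0,  0.0, 0.0,  0.0, 0.25],  # s1 <- m1, m2, m8
    [0.0, 0.25, 0.5, 0.25, 0.0, 0.0,  0.0, 0.0],   # s2 <- m2, m3, m4
    [0.0, 0.0,  0.0, 0.25, 0.5, 0.25, 0.0, 0.0],   # s3 <- m4, m5, m6
    [0.0, 0.0,  0.0, 0.0,  0.0, 0.25, 0.5, 0.25],  # s4 <- m6, m7, m8
]


def apply_matrix(matrix, frame):
    """Multiply one frame (one sample per input channel) by the mixing matrix."""
    return [sum(c * x for c, x in zip(row, frame)) for row in matrix]


if __name__ == "__main__":
    only_m2 = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(apply_matrix(DOWNMIX_8_TO_4, only_m2))
```

  • A sound captured only by m2 is split between s1 and s2, reflecting that m2 sits between the microphones corresponding to those two speakers.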
  • FIG. 7 is a diagram illustrating a configuration of the audio system 1 according to the third embodiment.
  • the third embodiment is different from the second embodiment described with reference to FIG. 6 in the arrangement of the microphones in the microphone set 41 and the arrangement of the speakers in the speaker set 42. More specifically, the microphone set 41 includes four microphones m1 to m4, and the speaker set 42 includes eight speakers s1 to s8.
  • the matrix unit 35 as a conversion unit converts the four-channel audio signals output from the decoder 32 corresponding to the microphones m1 to m4 into eight-channel audio signals corresponding to the speakers s1 to s8.
  • The audio signals collected by the microphones m1, m2, m3, and m4 are output directly to the speakers s1, s3, s5, and s7, which are in the corresponding arrangement.
  • The audio signals for the speakers s2, s4, s6, and s8, which do not have correspondingly arranged microphones, are formed by mixing the audio signals of a plurality of microphones.
  • the audio signal for the speaker s2 is formed by mixing the audio signals of the microphone m1 and the microphone m2.
  • Although the mixing here is performed by multiplying by fixed coefficients and adding, the coefficients may be changed dynamically. For example, by assigning a larger coefficient to the direction in which the magnitude (level) of the audio signal is large, it is possible to emphasize the sense of direction of the sound field during reproduction.
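  • The four-to-eight conversion and the idea of level-dependent coefficients can be sketched as follows. The description only states that s2 is formed from m1 and m2; pairing s4, s6, and s8 with the neighbouring microphones around the ring, the fixed weight of 0.5, and the level-proportional weighting rule are assumptions for illustration.

```python
def upmix_4_to_8(frame, weights=None):
    """Map microphones m1..m4 to speakers s1..s8 for one sample frame.

    Odd speakers copy one microphone; even speakers mix two neighbours.
    `weights[i]` is the share given to microphone i in the mix with i+1.
    """
    n = len(frame)
    out = []
    for i in range(n):
        nxt = (i + 1) % n
        w = 0.5 if weights is None else weights[i]
        out.append(frame[i])                               # s1, s3, s5, s7
        out.append(w * frame[i] + (1.0 - w) * frame[nxt])  # s2, s4, s6, s8
    return out


def level_weights(levels):
    """Weights proportional to the non-negative level of each neighbour."""
    n = len(levels)
    result = []
    for i in range(n):
        a, b = levels[i], levels[(i + 1) % n]
        result.append(0.5 if a + b == 0 else a / (a + b))
    return result


if __name__ == "__main__":
    frame = [1.0, 0.0, 0.0, 0.0]
    print(upmix_4_to_8(frame))
    print(upmix_4_to_8(frame, level_weights([3.0, 1.0, 0.0, 0.0])))
```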
  • The matrix unit 35, acting as a conversion unit, converts the number of channels into the number of channels corresponding to the arrangement of the speakers of the speaker set 42 on the reproduction side, so that the sound field can be reproduced properly.
  • The conversion unit can not only convert the number of channels but also convert the audio signal into an appropriate form when the sound pickup direction of the microphones and the sound emission direction of the speakers differ.
  • the number of microphones and the number of speakers may be the same.
  • the configuration of the matrix unit 35 may be various methods other than the one described in the present embodiment, and is not limited to one method.
  • FIG. 8 is a diagram illustrating a configuration of an audio system 1 according to the fourth embodiment.
  • The fourth embodiment differs from the audio system 1 according to the first embodiment described with reference to FIG. 4 in that a downsampling unit 26 as a sampling frequency conversion unit and a delay unit 25 for compensating for the delay caused by the downsampling unit 26 are provided on the recording side, and an upsampling unit 37 as a sampling frequency conversion unit is provided on the reproduction side.
  • Since the processing in the HOA encoder 22 does not include a signal in the high frequency region, it is conceivable that reducing the sampling frequency of the input audio signal has little effect on the sound quality.
  • the amount of calculation in the HOA encoder 22 is reduced by performing a down-sampling process on the time axis in the down-sampling unit 26 for the audio signal input to the HOA encoder 22. Further, by performing the downsampling, the data amount of the signal output from the HOA encoder 22 can be reduced, and the storage capacity and the communication amount can be reduced.
  • an up-sampling section 37 arranged downstream of the HOA decoder 31 performs up-sampling at the same sampling frequency as that of the decoder 32 side.
  • an FIR filter is mainly used in many cases.
  • the delay unit 25 is provided on the path on the encoder 24 side to compensate for the delay generated in the downsampling unit 26.
  • the delay may be compensated on the reproducing side (or the recording side and the reproducing side).
  • When compensating for the delay on the reproduction side, for example, it is conceivable to arrange a delay unit at a stage subsequent to the decoder 32.
  • Performing the downsampling process on the time axis on the audio signal input to the HOA encoder 22 reduces both the calculation amount in the HOA encoder 22 and the data amount of the signal it outputs. In addition, because the amount of data output from the HOA encoder 22 is reduced, a larger amount of information (for example, a larger number of bits) can be assigned to the signal output from the encoder 24.
  • The conversion of the sampling frequency may be performed on the encoder 24 and decoder 32 side instead of the HOA encoder 22 and HOA decoder 31 side, or on both sides.
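  • The interplay between downsampling and delay compensation can be sketched for one channel. The example assumes a symmetric (linear-phase) FIR anti-aliasing filter, whose delay is (number of taps - 1) / 2 input samples, and delays the other path by the same amount. The three-tap filter and the function names are illustrative assumptions; a practical design would use a much longer filter.

```python
def fir_filter(signal, taps):
    """Causal FIR filtering (direct convolution, output length = input length)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out


def downsample(signal, factor, taps):
    """Low-pass filter, then keep every `factor`-th sample."""
    return fir_filter(signal, taps)[::factor]


def group_delay_samples(taps):
    """Delay of a symmetric FIR filter with an odd number of taps."""
    return (len(taps) - 1) // 2


def delay(signal, num_samples):
    """Delay a signal by prepending zeros and keeping the original length."""
    return ([0.0] * num_samples + list(signal))[:len(signal)]


if __name__ == "__main__":
    taps = [0.25, 0.5, 0.25]
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(downsample(impulse, 2, taps))
    print(delay(impulse, group_delay_samples(taps)))
```

  • Delaying the coordinate-based path by the filter delay keeps the two bands aligned when they are added on the reproduction side.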
  • FIG. 9 is a diagram illustrating a configuration of an audio system 1 according to the fifth embodiment.
  • The audio system 1 according to the first embodiment described with reference to FIG. 4 uses one HOA encoder 22, whereas the fifth embodiment differs in that it provides a HOA encoder 22a in charge of the low band and a HOA encoder 22b in charge of the mid band.
  • An LPF 21a for removing a high-frequency component of the input audio signal is arranged at a stage preceding the HOA encoder 22a, and a BPF 21b (Band Pass Filter) for extracting a mid-range component of the input audio signal is arranged at a stage preceding the HOA encoder 22b.
  • A HOA decoder 31a for decoding the audio signal encoded by the HOA encoder 22a and a HOA decoder 31b for decoding the audio signal encoded by the HOA encoder 22b are arranged on the reproduction side.
  • The adder 34 adds the audio signal decoded by the decoder 32 and multiplied by the coefficient by the multiplier 33 to the audio signals decoded by the HOA decoder 31a and the HOA decoder 31b, and outputs the result to the speaker set 42.
  • FIG. 10 is a diagram illustrating frequency characteristics of the audio system 1 according to the fifth embodiment.
  • the low-pass characteristic shown by the solid line indicates the characteristic of the LPF 21a.
  • the mid-pass characteristic indicated by the dashed line indicates the characteristic of the BPF 21b.
  • the high-pass characteristics indicated by broken lines indicate the characteristics of the HPF 23.
  • By providing the HOA encoders 22a and 22b separately for a plurality of frequency bands, the order used in the HOA processing can be varied between the HOA encoders 22a and 22b.
  • For example, in the low band the wavelength is sufficiently long that human perception of the direction of arrival of sound is insensitive, so it is conceivable to reduce the calculation amount by processing that band at a lower order.
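To make the saving from per-band orders concrete, the following sketch counts HOA channels per band. It relies on the standard counts of (N + 1)^2 channels for three-dimensional HOA of order N and 2M + 1 channels for the two-dimensional case; the band names and the orders chosen are hypothetical examples, not values from the source.

```python
def hoa_channels(order, three_d=True):
    """Number of HOA channels for a given order (3D: (N+1)^2, 2D: 2M+1)."""
    return (order + 1) ** 2 if three_d else 2 * order + 1

# Hypothetical orders: a low order for the low band, a higher one for the mid band.
band_orders = {"low": 1, "mid": 3}
per_band = {band: hoa_channels(order) for band, order in band_orders.items()}
```

With these example orders, the low band carries 4 channels instead of the 16 that order 3 would require.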
  • FIG. 11 is a diagram illustrating a configuration of an audio system 1 according to the sixth embodiment.
  • The fourth embodiment described a mode in which the delay unit 25 is provided on the recording side to compensate for the delay generated in the downsampling unit 26. More generally, since the processing is performed separately for each band, a time lag may occur between the bands.
  • The sixth embodiment shows a configuration for eliminating such a time lag between bands on the reproduction side.
  • A time lag caused by the processing of the encoder 24 and the HOA encoder 22 on the recording side, or by the processing of the decoder 32 and the HOA decoder 31 on the reproduction side, can be eliminated by providing the delay unit 36 on the reproduction side.
  • Instead of compensating for the delay on the reproducing side, the delay may be compensated on the recording side (or on both the recording side and the reproducing side).
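One way to picture the role of the delay unit 36 is to delay every band until it lines up with the slowest one. The sketch below assumes that each band's processing latency is known in samples; the band names and latency values are hypothetical.

```python
def alignment_delays(latencies):
    """Extra delay per band so that all bands line up with the slowest band."""
    slowest = max(latencies.values())
    return {band: slowest - latency for band, latency in latencies.items()}

def apply_delay(samples, n):
    """Prepend n zero samples to delay a band."""
    return [0.0] * n + list(samples)

# Hypothetical per-band latencies in samples.
latencies = {"sf_low": 48, "sf_mid": 32, "sa_high": 0}
delays = alignment_delays(latencies)
```

After applying these delays, every band has the same total latency, so the bands can be added without a time lag between them.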
  • <8. Modification> (Modification of the HOA method)
  • The first to sixth embodiments have been described using the HOA method with the HOA encoder 22, the HOA decoder 31, and the like.
  • The signals used in the various embodiments are not limited to signals encoded by the HOA method. Various methods can be adopted as long as the signal is associated with spatial frequencies, in other words, as long as the audio signal can be reproduced at the positions to be reproduced at the time of decoding (the positions at which the speakers are installed).
  • Hereinafter, a signal associated with a spatial frequency is referred to as an SF signal.
  • On the other hand, the method used by the encoder 24, the decoder 32, and the like uses a signal associated with spatial coordinates.
  • That is, the signal can reproduce the audio signal at the position (spatial coordinates) where it was picked up.
  • Hereinafter, a signal associated with spatial coordinates is referred to as an SA signal.
  • FIG. 12 is a diagram illustrating a recording format of a recording signal used in the audio system 1 according to a modification.
  • The format of the recording signal used in the first embodiment has already been described.
  • Here, a description will be given of the recording format when the HOA signal is generalized as a signal associated with a spatial frequency (SF signal) and the signal in charge of the high band is generalized as a signal associated with spatial coordinates (SA signal).
  • the recording signal has a header section and a data section.
  • the header section is a section in which various meta information necessary for reproducing the recorded audio signal is recorded.
  • The meta information recorded in the header section includes a sampling rate, a frame length, the number of frames (L), the number of band divisions (N), and band information for each band (first to N-th band information). For example, when the frequency band is divided into three frequency bands as in the fifth embodiment, first to third band information is provided.
  • the sampling rate is the sampling rate used at the time of recording, and may be fixed or variable.
  • the frame length is information defining the length of a frame recorded in the data section. Either fixed or variable frame length may be adopted.
  • the number of frames (L) is a number that defines the number of frames forming a chunk that is a unit of one data in the data portion.
  • the number of band divisions is a number indicating the number of bands to be divided in the audio system 1. For example, when the band is divided into three frequency bands as in the fifth embodiment, the number of band divisions is "3".
  • In this case, first to third band information is provided.
  • Each band information (first to N-th band information) is provided with a first cutoff frequency indicating the lower limit of the assigned frequency band and a second cutoff frequency indicating the upper limit.
  • the time delay information is information indicating delay or advance with respect to another band, and can be used, for example, for setting the delay time in the delay unit 36 described in the sixth embodiment.
  • The spatial domain information is information indicating whether the band is an SF signal or an SA signal. By referring to the spatial domain information, the reproducing device 3 can determine the decoding method for the band.
  • the spatial domain information may include information on the microphone arrangement of the collected microphone set 41 and the like.
  • the signal domain information is information indicating whether it is recorded on the time axis or on the time frequency axis.
  • the compression method information is information indicating the presence or absence of compression and the compression method being used.
  • the order is stored when the SF signal is used, and the channel information is stored when the SA signal is used.
  • the order stored for the frequency band using the SF signal is the order used for the process of forming a signal corresponding to the spatial frequency.
  • The channel information stored for a frequency band using the SA signal is configured to include the number of channels and the channel coordinates.
  • the number of channels corresponds to the number of microphones in the microphone set 41 (for example, “4” in the case of the third embodiment shown in FIG. 7).
  • the channel coordinates are coordinates indicating the spatial arrangement of the microphones m1 to m4 in the microphone set 41.
  • The matrix unit 35 (conversion unit) described in the second embodiment with reference to FIG. 6 and in the third embodiment with reference to FIG. 7 can perform various conversions, such as converting the number of channels of an audio signal, based on this channel information and the arrangement of the speakers in the speaker set 42.
  • the data section stores signals converted for each band.
  • frame chunks having frames are provided for the first to N-th bands by the number of frames (L).
  • the data recorded in such a frame is converted into an audio signal with reference to the meta information described in the header.
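The header layout described above can be sketched as a pair of data classes. This is only an illustration of the listed fields: the field names, the units (Hz, samples), the use of (azimuth, elevation) pairs for channel coordinates, and the validation rules are assumptions, and no byte-level layout is implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BandInfo:
    cutoff_low_hz: float               # first cutoff frequency (lower limit)
    cutoff_high_hz: float              # second cutoff frequency (upper limit)
    spatial_domain: str                # "SF" (spatial frequency) or "SA" (spatial coordinates)
    time_delay: int = 0                # delay (+) or advance (-) vs. other bands; samples assumed
    signal_domain: str = "time"        # "time" or "time-frequency"
    compression: Optional[str] = None  # None means uncompressed
    order: Optional[int] = None        # stored for SF bands
    channel_coords: List[Tuple[float, float]] = field(default_factory=list)  # stored for SA bands

    def validate(self) -> None:
        if self.cutoff_low_hz >= self.cutoff_high_hz:
            raise ValueError("lower cutoff must be below upper cutoff")
        if self.spatial_domain == "SF":
            if self.order is None:
                raise ValueError("SF band needs an order")
        elif self.spatial_domain == "SA":
            if not self.channel_coords:
                raise ValueError("SA band needs channel coordinates")
        else:
            raise ValueError("spatial_domain must be 'SF' or 'SA'")

    @property
    def num_channels(self) -> int:
        """Number of channels of an SA band (one per microphone)."""
        return len(self.channel_coords)


@dataclass
class Header:
    sampling_rate_hz: int
    frame_length: int
    num_frames: int          # L: frames per chunk
    bands: List[BandInfo]    # first to N-th band information

    @property
    def num_bands(self) -> int:  # N: number of band divisions
        return len(self.bands)

    def validate(self) -> None:
        for band in self.bands:
            band.validate()


# Hypothetical three-band example in the spirit of the fifth embodiment.
header = Header(
    sampling_rate_hz=48000, frame_length=1024, num_frames=8,
    bands=[
        BandInfo(0.0, 500.0, "SF", order=1),
        BandInfo(500.0, 4000.0, "SF", order=3, time_delay=16),
        BandInfo(4000.0, 24000.0, "SA",
                 channel_coords=[(0.0, 0.0), (90.0, 0.0), (180.0, 0.0), (270.0, 0.0)]),
    ],
)
header.validate()
```

A reproducing device could inspect `spatial_domain` per band to choose between the SF decoder and the SA decoder, and use `time_delay` to configure a delay unit.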
  • the playback device 3 in each of the above-described embodiments outputs an audio signal to the speaker set 42 including a plurality of speakers.
  • The reproducing device 3 may instead reproduce an audio signal in a virtual environment using, for example, headphones. That is, in the above-described embodiments, if the head-related transfer functions from each speaker in the speaker set 42 to both ears of the listener are known, convolving each head-related transfer function with the audio signal driving the corresponding speaker shows how the sound of each speaker is heard at both ears of the listener. By summing these contributions for the left ear and for the right ear and reproducing the sums through headphones or the like, a sound field similar to that experienced by a listener using the speaker set 42 can be reproduced.
  • Sound field formation using such a virtual environment can be realized not only with headphones but also with any electro-acoustic transducer driven by two or more channels. In that case, various corrections such as crosstalk cancellation can be applied, if necessary, to the audio signal reproduced by the electro-acoustic transducer.
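The headphone rendering described above amounts to convolving each speaker feed with a left-ear and a right-ear impulse response and summing per ear. The sketch below assumes time-domain head-related impulse responses given as plain lists; the toy responses and function names are hypothetical and carry no acoustic meaning.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def binauralize(speaker_signals, hrirs):
    """speaker_signals: one sample list per speaker.
    hrirs: one (left_ir, right_ir) pair per speaker.
    Returns the summed left-ear and right-ear signals."""
    length = max(len(s) + max(len(hl), len(hr)) - 1
                 for s, (hl, hr) in zip(speaker_signals, hrirs))
    left, right = [0.0] * length, [0.0] * length
    for s, (hl, hr) in zip(speaker_signals, hrirs):
        for n, v in enumerate(convolve(s, hl)):
            left[n] += v
        for n, v in enumerate(convolve(s, hr)):
            right[n] += v
    return left, right

# Two virtual speakers with toy impulse responses (near ear: direct, far ear: delayed and attenuated).
speaker_signals = [[1.0, 0.0], [0.0, 1.0]]
hrirs = [([1.0], [0.0, 0.5]), ([0.0, 0.5], [1.0])]
left, right = binauralize(speaker_signals, hrirs)
```

In practice the impulse responses would come from measured head-related transfer functions, and a faster convolution method would be used for long responses.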
  • the present disclosure can be realized in various forms such as an apparatus, a method, and a program.
  • the items described in each of the embodiments and the modified examples can be appropriately combined.
  • An audio reproducing apparatus comprising: a first decoder that decodes a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoder that decodes a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and an adder that adds the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
  • The audio reproduction device according to (1) or (2), wherein the first decoder performs decoding based on an arrangement of speakers to be output.
  • The audio playback device according to any one of (1) to (3), wherein the first decoder uses a HOA method.
  • The audio playback device according to any one of (1) to (4), further including a conversion unit configured to convert audio signals of a plurality of channels output from the second decoder based on an arrangement of speakers to be output.
  • The audio reproduction device according to (5), wherein the conversion unit converts the number of channels of the audio signal output from the second decoder.
  • The audio playback device according to any one of (1) to (6), wherein the first signal and the second signal have different sampling frequencies, the device further including a sampling frequency conversion unit configured to convert the sampling frequency of at least one of the first signal and the second signal.
  • The audio reproduction device according to any one of (1) to (7), wherein a plurality of the second decoders are provided, one for each band, and the plurality of second decoders use different orders for decoding.
  • The audio playback device according to any one of (1) to (8), further including a delay unit that adjusts a time shift generated between the first decoder and the second decoder.
  • An audio reproduction program for causing an information processing device to execute: a first decoding process of decoding a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoding process of decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and an addition process of adding the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
  • 1: audio system; 2: recording device; 3: playback device; 21 (21a, 21b): LPF; 22 (22a, 22b): HOA encoder; 23: HPF; 24: encoder; 25: delay unit; 26: downsampling unit; 31 (31a, 31b): HOA decoder; 32: decoder; 33: multiplying unit; 34: adding unit; 35: matrix unit; 36: delay unit; 37: upsampling unit; 41: microphone set; 42: speaker set; m1 to m8: microphones; s1 to s8: speakers


Abstract

An audio reproducing device comprising a first decoder for decoding a first signal correlated to a spatial frequency into a multiple-channel audio signal, a second decoder for decoding a second signal that includes a band different from the first signal and is correlated to spatial coordinates into a multiple-channel audio signal, and an addition unit for adding together the multiple-channel audio signal decoded by the first decoder and the multiple-channel audio signal decoded by the second decoder.

Description

Audio playback device, audio playback method, and audio playback program
The present disclosure relates to an audio playback device, an audio playback method, and an audio playback program.
Conventionally, audio playback devices are known that, in addition to reproducing an audio signal in stereo form (two channels), use a multichannel form with a larger number of speakers. Audio signal reproduction using such multiple channels can reproduce the sound field at the time of sound pickup three-dimensionally and can provide the listener with a realistic sound field.
Patent Document 1 discloses a method of encoding an audio signal using higher-order ambisonics for such multichannel audio reproduction.
Patent Document 1: JP 2012-133366 A
In such a field, it is desired to improve the quality of reproduced audio.
One object of the present disclosure is to provide an audio playback device, an audio playback method, and an audio playback program that improve the quality of reproduced audio.
The present disclosure is, for example, an audio playback device comprising:
a first decoder that decodes a first signal associated with a spatial frequency into audio signals of a plurality of channels;
a second decoder that decodes a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
an adder that adds the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
The present disclosure is, for example, an audio playback method comprising:
decoding a first signal associated with a spatial frequency into audio signals of a plurality of channels;
decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
adding the audio signals of the plurality of channels decoded based on the first signal and the audio signals of the plurality of channels decoded based on the second signal.
The present disclosure is, for example, an audio playback program that causes an information processing device to execute:
a first decoding process of decoding a first signal associated with a spatial frequency into audio signals of a plurality of channels;
a second decoding process of decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
an addition process of adding the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
According to at least one embodiment of the present disclosure, the quality of reproduced audio can be improved. The effects described here are not necessarily limiting, and any of the effects described in the present disclosure may be obtained. The contents of the present disclosure are not to be construed as limited by the exemplified effects.
FIG. 1 is a diagram illustrating a configuration of an audio system as a comparative example.
FIG. 2 is a diagram for describing an overview of the audio system according to the first embodiment.
FIG. 3 is a diagram illustrating frequency characteristics of the audio system according to the first embodiment.
FIG. 4 is a diagram illustrating a configuration of the audio system according to the first embodiment.
FIG. 5 is a diagram illustrating a recording format of a recording signal used in the audio system according to the first embodiment.
FIG. 6 is a diagram illustrating a configuration of an audio system according to the second embodiment.
FIG. 7 is a diagram illustrating a configuration of an audio system according to the third embodiment.
FIG. 8 is a diagram illustrating a configuration of an audio system according to the fourth embodiment.
FIG. 9 is a diagram illustrating a configuration of an audio system according to the fifth embodiment.
FIG. 10 is a diagram illustrating frequency characteristics of the audio system according to the fifth embodiment.
FIG. 11 is a diagram illustrating a configuration of an audio system according to the sixth embodiment.
FIG. 12 is a diagram illustrating a recording format of a recording signal used in an audio system according to a modification.
Hereinafter, embodiments and the like of the present disclosure will be described with reference to the drawings. The description will be given in the following order.
<1. Higher Order Ambisonics (HOA)>
<2. First Embodiment>
<3. Second Embodiment>
<4. Third Embodiment>
<5. Fourth Embodiment>
<6. Fifth Embodiment>
<7. Sixth Embodiment>
<8. Modification>
The embodiments and the like described below are preferred specific examples of the present disclosure, and the contents of the present disclosure are not limited to these embodiments.
<1. Higher Order Ambisonics (HOA)>
In recent years, in the field of audio, three-dimensional sound that records, transmits, and reproduces spatial information from all directions has been developed and has become increasingly widespread. In broadcasting, progress has been remarkable; for example, three-dimensional multichannel sound broadcasting using 22.2 channels is planned. In the field of virtual reality as well, in addition to video that surrounds the viewer, products that reproduce audio signals surrounding the listener are coming onto the market.
Under such circumstances, attention has been paid to ambisonics, a method of representing three-dimensional audio information that can flexibly adapt to arbitrary recording and reproduction systems. In particular, ambisonics of second or higher order is called higher order ambisonics (HOA). In three-dimensional multichannel sound, sound information extends along the spatial axes in addition to the time axis, and ambisonics holds the information by applying a spatial frequency transform (spherical harmonic transform) in the angular directions of three-dimensional polar coordinates. This can be regarded as the counterpart of the time-frequency transform applied to the time axis of an audio signal. An advantage of this method is that information can be encoded from an arbitrary microphone array and decoded to an arbitrary speaker array, without limiting the number of microphones or speakers.
On the other hand, recording and reproduction with the HOA method have the following problems.
- When an input signal from a spherical or annular microphone set (also referred to as a microphone array) is HOA-encoded, spatial aliasing occurs at high frequencies depending on the number and spacing of the microphones and the radius of the array, and above a certain frequency the sound field cannot be recorded or represented correctly.
- When a microphone set is built from actual microphones, this frequency, calculated from a realistic array size and number of microphones, lies within the audible band, so recording and reproduction by the HOA method cause sound quality degradation within the perceptible frequency band.
First, the HOA method is described in detail below. Encoding used in the HOA method can be broadly classified into two types: recording-based and object-based. The present embodiment concerns the recording-based type, which is therefore described here.
(Conversion from microphone recording signal to HOA signal)
For a given time frequency ω, a sound signal recorded by an annular or spherical microphone array is converted into the HOA signal A_m(ω) or A_n^m(ω), respectively, by the following equations.
[Equation (1): formula image JPOXMLDOC01-appb-M000001]
[Equation (2): formula image JPOXMLDOC01-appb-M000002]
Equation (1) applies to an annular microphone array and Equation (2) to a spherical one. Here, φ_q and θ_q are the azimuth and elevation angles of the q-th microphone, and P_q(ω) is the sound pressure at the q-th microphone. In Equation (1), J_m(ka) is the Bessel function, m is its order, k is the wave number, and a is the radius of the microphone array. In Equation (2), the Bessel function of Equation (1) is replaced by the spherical Bessel function, and e^{-imφ_q} is replaced by the spherical harmonic function Y_n^m(φ_q, θ_q). Here, the spherical harmonic function is defined as follows.
[Equation (3): formula image JPOXMLDOC01-appb-M000003]
P_n^m is the associated Legendre function. Various other definitions of the spherical harmonic function exist, but the choice of definition does not affect the gist of the present disclosure, so this definition is used hereafter.
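The formula images for Equations (1) and (2) are not available in this text. As a hedged sketch only, an encoding consistent with the symbol definitions above could take the following form. The 1/Q normalization, the factors i^m and i^n, the symbol j_n for the spherical Bessel function, and the complex conjugate on the spherical harmonic are assumptions based on common HOA conventions and may differ from the source's exact expressions.

```latex
% Annular array (two-dimensional case), assumed form:
A_m(\omega) \approx \frac{1}{i^{m} J_m(ka)} \cdot \frac{1}{Q}
  \sum_{q=1}^{Q} P_q(\omega)\, e^{-i m \phi_q}

% Spherical array (three-dimensional case), assumed form:
A_n^m(\omega) \approx \frac{1}{i^{n} j_n(ka)} \cdot \frac{1}{Q}
  \sum_{q=1}^{Q} P_q(\omega)\, \overline{Y_n^m(\phi_q, \theta_q)}
```

In both forms the sum over microphones approximates the angular integral of the sound pressure, and the division by the Bessel term removes the radial dependence at the array radius a.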
m and n are the HOA orders. Because the transform is applied to the sound pressure P, which is a continuous function of azimuth and elevation, the orders m and n extend to infinity. However, when recording with a spherical microphone array, the sound pressure P cannot be captured as a continuous function. Therefore, as with the sampling theorem for time frequency, the following relations hold between the reproducible HOA orders M and N and the number of microphones Q.
[Equation (4): formula image JPOXMLDOC01-appb-M000004]
[Equation (5): formula image JPOXMLDOC01-appb-M000005]
Equation (4) applies to the annular case and Equation (5) to the spherical case.
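The formula images for Equations (4) and (5) are not available in this text. The sketch below assumes the standard spatial sampling conditions, 2M + 1 ≤ Q for an annular array and (N + 1)^2 ≤ Q for a spherical array, which may differ in form from the source's expressions. Under that assumption, the highest reproducible order follows directly from the microphone count; the function names are hypothetical.

```python
import math

def max_order_circular(num_mics):
    """Largest M satisfying 2*M + 1 <= Q (assumed condition for an annular array)."""
    return (num_mics - 1) // 2

def max_order_spherical(num_mics):
    """Largest N satisfying (N + 1)**2 <= Q (assumed condition for a spherical array)."""
    return math.isqrt(num_mics) - 1
```

For example, under these assumed conditions a spherical array needs at least 16 microphones to support order 3.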
(Conversion from HOA signal to audio signal)
When the HOA signal is converted into an audio signal, in the two-dimensional case, at a given time frequency ω,
[Equation (6): formula image JPOXMLDOC01-appb-M000006]
holds. In the three-dimensional case,
[Equation (7): formula image JPOXMLDOC01-appb-M000007]
holds.
Here, R is the radius of the speaker array, α_i and β_i are the elevation and azimuth angles of the i-th speaker, and G_m(R, ω) and G_n^0(R, ω) are the HOA coefficients of the transfer function, given below.
[Equation (8): formula image JPOXMLDOC01-appb-M000008]
[Equation (9): formula image JPOXMLDOC01-appb-M000009]
Here, H_m^(2)(kR) is the Hankel function of the second kind, and h_n^(2)(kR) is the spherical Hankel function of the second kind. The conversion formula between the HOA signal and the audio signal differs depending on the shape of the microphone array, the shape of the speaker array, directivity, and so on. Hereinafter, the terms HOA encoding and HOA decoding are meant to cover these various methods and are not limited to any one of them.
(Spatial aliasing)
As described above, when recording with a microphone set, the order is finite because the number of microphones is limited. If signal components of a higher order are present, spatial aliasing occurs. When a signal with spatial aliasing is encoded and decoded by the HOA method, a sound field different from the recorded one is reproduced. The effect of this aliasing depends on the time frequency and the radius of the microphone array. The lower the time frequency and the smaller the array radius, the smaller the high-order HOA components. In other words, at a given time frequency, a smaller array radius yields smaller high-order HOA components and less aliasing, and for a given array radius, aliasing has less effect at low frequencies.
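The dependence on frequency and array radius can be illustrated with a common rule of thumb that is not stated in the source: an array supporting order N represents the sound field reasonably well while k·a ≤ N, where k = 2πf/c is the wave number and a is the array radius. The sketch below turns that assumption into an approximate upper frequency; the speed of sound, the function name, and the example radius are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def alias_free_limit_hz(order, radius_m, c=SPEED_OF_SOUND):
    """Approximate upper frequency from the rule of thumb k * a <= N."""
    return order * c / (2 * math.pi * radius_m)

# Example: order 4 with a 4.2 cm array radius (illustrative numbers only).
limit = alias_free_limit_hz(4, 0.042)
```

Under this rule, halving the radius doubles the limit, which matches the qualitative statement above that a smaller array suffers less aliasing at a given frequency. Such an estimate could guide the choice of cutoff between the band handled by HOA and the band handled otherwise.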
One object of the various embodiments described below is to suppress the influence of spatial aliasing, which occurs particularly at high frequencies depending on the number and spacing of microphones and the radius of the array, and to achieve high-quality audio reproduction.
FIG. 1 is a diagram illustrating a configuration of an audio system 1 as a comparative example. This comparative example is a conventional form using only the HOA method and includes a HOA encoder 22 and a HOA decoder 31. Audio signals picked up by a plurality of microphones provided in the microphone set 41 are input to the HOA encoder 22. The microphone set 41 includes a plurality of microphones arranged as appropriate, for example in a ring, on a sphere, or in a line. The HOA encoder 22 performs HOA encoding on the plurality of audio signals picked up by the microphone set 41, thereby converting them into a HOA signal in a spatial frequency representation.
The HOA decoder 31 can reproduce the received HOA signal using an arbitrary speaker set 42. The speaker set 42 includes a plurality of speakers arranged as appropriate, for example in a ring, on a sphere, or in a line. The arrangement of the speaker set 42 does not need to depend on the microphone arrangement of the microphone set 41 used for pickup. This is because the HOA signal is expressed in terms of spatial frequency. By setting the arrangement of the speakers of the speaker set 42 in the HOA decoder 31, the picked-up sound field can be reproduced.
In the audio system 1 shown in FIG. 1, however, physical constraints on the microphones in the microphone set 41 mean that the HOA signal cannot correctly represent spatial sound information, particularly in the high frequency region, which degrades sound quality. This problem is not limited to HOA and exists in recording systems using multiple microphones in general.
<2. First Embodiment>
FIG. 2 is a diagram for describing an overview of the audio system 1 according to the first embodiment. The audio system 1 according to the first embodiment includes an LPF 21 (Low Pass Filter), a HOA encoder 22, a HOA decoder 31, an HPF 23 (High Pass Filter), a multiplier 33, and an adder 34. The audio system 1 takes as input the plurality of microphones provided in the microphone set 41 and outputs to the speaker set 42, in which a plurality of speakers are arranged. In FIG. 2, the audio signal output from the microphone set 41 and input to the HOA encoder 22 via the LPF 21, and the audio signal input to the HPF 23, each have as many channels as there are microphones in the microphone set 41. The audio signal output from the HOA decoder 31 and output to the speaker set 42 via the adder 34 has as many channels as there are speakers in the speaker set 42. Thus, in the block diagram shown in FIG. 2, a plurality of channels is in places drawn as a single line for convenience of illustration.
The plurality of audio signals picked up by the plurality of microphones of the microphone set 41 are input to the HOA encoder 22. In the present embodiment, these audio signals are first passed through the LPF 21, which removes the high-frequency components and limits the signals to the frequency band that the HOA signal can represent correctly. The HOA encoder 22 converts the audio signals from which the high-frequency components have been removed by the LPF 21 into an HOA signal, which is a spatial-frequency representation.
The HOA decoder 31 decodes the HOA signal output from the HOA encoder 22 for reproduction on an arbitrary speaker set 42. The high-frequency band of the audio signals picked up by the microphone set 41, which the HOA encoder 22 cannot represent, is passed through the HPF 23 so that only the high-frequency components remain, gain-adjusted by the multiplier 33, added by the adder 34 to the HOA-decoded audio signal, and output to the speaker set 42.
FIG. 3 is a diagram showing the frequency characteristics of the audio system 1 according to the first embodiment. In FIG. 3, the low-pass characteristic drawn with a solid line is that of the LPF 21, and the high-pass characteristic drawn with a broken line is that of the HPF 23. Adding the low-pass characteristic of the LPF 21 and the high-pass characteristic of the HPF 23 yields a frequency characteristic that is flat from low to high frequencies. These characteristics are only one example; various characteristics are possible depending on the design.
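One way to obtain a low-pass and high-pass pair whose sum is flat, given here only as a sketch under stated assumptions, is to derive the high band by subtracting the low-pass output from a correspondingly delayed copy of the input. The five-tap moving average and the function names below are hypothetical stand-ins for the LPF 21 and the HPF 23; the actual filter design is not specified in this description.

```python
def lowpass_fir(x, taps):
    """Causal FIR filter; the output has the same length as the input."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y

def split_bands(x, taps):
    """Split x into complementary low and high bands (high = delayed input - low)."""
    delay = (len(taps) - 1) // 2   # group delay of a symmetric odd-length FIR
    low = lowpass_fir(x, taps)
    delayed = [0.0] * delay + x[:len(x) - delay]
    high = [d - l for d, l in zip(delayed, low)]
    return low, high, delay

taps = [1 / 5] * 5                 # hypothetical low-pass: 5-tap moving average
x = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.75, 0.0, 0.0, 0.0]
low, high, delay = split_bands(x, taps)
recombined = [l + h for l, h in zip(low, high)]   # equals x delayed by `delay` samples
```

Because the two bands sum to a delayed copy of the input, the combined response is flat by construction, which corresponds to the property illustrated in FIG. 3.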
(Configuration of audio playback system)
FIG. 4 is a diagram showing the configuration of the audio system 1 according to the first embodiment. Although FIG. 2 gave an overview of the audio system 1, the audio system 1 is in practice divided into a recording device 2 provided on the recording side and a reproduction device 3 provided on the reproduction side. The recording signal generated by the recording device 2 is recorded on a recording medium or transmitted via communication. The reproduction device 3 reproduces the sound field at the time of recording by reproducing the recording signal recorded on the recording medium or received via communication.
In the present embodiment, both the input side and the output side have eight channels (ch): the microphone set 41 uses eight microphones m1 to m8, and the speaker set 42 uses eight speakers s1 to s8. The microphones m1 to m8 and the speakers s1 to s8 are arranged so that their index numbers correspond. In FIG. 4, the numbers shown on the lines between blocks indicate the number of channels.
The recording device 2, located on the recording side of the audio system 1, includes the LPF 21, the HOA encoder 22, the HPF 23, and an encoder 24. The LPF 21, the HOA encoder 22, and the HPF 23 are the same as those described with reference to FIG. 2, so their description is omitted here. The encoder 24 converts the audio signal that has passed through the HPF 23 into a signal associated with spatial coordinates. Methods for converting into a signal associated with spatial coordinates are methods in which reproduction depends on the spatial coordinates at the time of recording, for example PCM (Pulse Code Modulation) coding, ADPCM (Adaptive Differential Pulse Code Modulation) coding, and delta modulation.
The HOA encoder 22, by contrast, differs from the encoder 24 in that it converts the audio signal input from the LPF 21 into a signal associated with spatial frequency. By specifying the spatial coordinates for reproduction, that is, the positions of the speakers s1 to s8 in the speaker set 42, the sound at those spatial coordinates can be reproduced from the HOA signal produced by the HOA encoder 22.
The HOA signal produced by the HOA encoder 22 of the recording device 2 and the high-frequency signal produced by the encoder 24 are recorded on a recording medium as a recording signal, or transmitted to the reproduction device 3 located on the reproduction side.
FIG. 5 is a diagram showing the recording format of the recording signal used in the audio system 1 according to the first embodiment. The recording signal consists of a header section and a data section. The header section records the various kinds of meta information needed to reproduce the recorded audio signal. In the present embodiment, the meta information recorded in the header section includes the sampling rate, the frame length, the number of frames, the number of band divisions, and band information for each band (first band information and second band information).
The sampling rate is the sampling rate used during recording and may be either fixed or variable. The frame length is information specifying the length of the frames recorded in the data section; it too may be either fixed or variable. The number of frames (L) specifies the number of frames that make up a chunk, which is one unit of data in the data section. The number of band divisions indicates the number of bands into which the signal is divided in the audio system 1; in the present embodiment, the signal is divided into two bands by the LPF 21 and the HPF 23 as described with reference to FIG. 4, so the number of band divisions is "2".
The first band information is information on the low-frequency side, that is, on the conversion performed by the HOA encoder 22. In the present embodiment it includes a cutoff frequency, spatial domain information, signal domain information, compression scheme information, and an order. The cutoff frequency corresponds to the high-side cutoff frequency of the LPF 21 described with reference to FIG. 3. The spatial domain information includes information indicating that the band is an HOA signal, and may additionally include information on the microphone set 41 used for pickup, for example information on the arrangement of the microphones m1 to m8 in the microphone set 41, such as spherical, annular, linear, inward-facing, or outward-facing. The signal domain information indicates whether the signal is recorded in the time domain or in the time-frequency domain. The compression scheme information indicates whether compression is used and, if so, which compression scheme. The order is the order used by the HOA encoder 22.
The second band information, on the other hand, is information on the high-frequency side, that is, on the conversion performed by the encoder 24. In the present embodiment it includes a cutoff frequency, spatial domain information, signal domain information, compression scheme information, and channel information. The cutoff frequency corresponds to the low-side cutoff frequency of the HPF 23 described with reference to FIG. 3. The spatial domain information includes information indicating that the band is a signal encoded by the encoder 24, and may additionally include information on the microphone set 41 used for pickup, for example information on the arrangement of the microphones m1 to m8 in the microphone set 41, such as spherical, annular, linear, inward-facing, or outward-facing. The signal domain information indicates whether the signal is recorded in the time domain or in the time-frequency domain. The compression scheme information indicates whether compression is used and, if so, which compression scheme. The channel information includes the number of channels and the channel coordinates. The number of channels corresponds to the number of microphones in the microphone set 41 (eight in this case). The channel coordinates are coordinates indicating the spatial arrangement of the microphones m1 to m8 in the microphone set 41.
The data section stores the signals converted by the HOA encoder 22 and the encoder 24. In the present embodiment, frame chunks, each holding frames for the first band (low band) and the second band (high band), are provided in a quantity equal to the number of frames (L). The data recorded in frames in this way is converted into audio signals by the HOA decoder 31 or a decoder 32 with reference to the meta information described in the header section.
In the recording format described above, information common to the bands may, for example, be consolidated into a single entry. The recording format described above is only an example; the format is not limited to this form and can be configured in various ways.
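The header fields described above can be pictured, as a rough sketch only, with the following data structures. The class names, field names, string values, and all numeric values (cutoff frequency, order, frame length, microphone coordinates) are hypothetical; the description does not define a concrete binary layout or any of these values.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class BandInfo:
    cutoff_hz: float
    spatial_domain: str               # "HOA" (spatial frequency) or "channel" (spatial coordinates)
    signal_domain: str                # "time" or "time-frequency"
    compression: Optional[str]        # None means uncompressed
    hoa_order: Optional[int] = None   # used by the first (HOA) band only
    channel_coords: List[Tuple[float, float, float]] = field(default_factory=list)

@dataclass
class Header:
    sampling_rate_hz: int
    frame_length: int
    frames_per_chunk: int             # the number of frames (L)
    bands: List[BandInfo]

    @property
    def band_count(self) -> int:
        """Number of band divisions, derived from the band list."""
        return len(self.bands)

# Hypothetical ring of eight microphones on a unit circle.
ring = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8), 0.0)
        for i in range(8)]

header = Header(
    sampling_rate_hz=48000,
    frame_length=1024,
    frames_per_chunk=4,
    bands=[
        BandInfo(cutoff_hz=4000.0, spatial_domain="HOA", signal_domain="time",
                 compression=None, hoa_order=3),
        BandInfo(cutoff_hz=4000.0, spatial_domain="channel", signal_domain="time",
                 compression=None, channel_coords=ring),
    ],
)
```

Deriving the number of band divisions from the list of bands is one possible way of avoiding inconsistent header fields; a real format might store it explicitly instead.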
The reproduction device 3, located on the reproduction side of the audio system 1, includes the HOA decoder 31, the decoder 32, the multiplier 33, and the adder 34. The HOA decoder 31 decodes the HOA signal encoded by the HOA encoder 22 to form an 8-channel audio signal. The decoder 32 decodes the signal encoded by the encoder 24 to form an 8-channel audio signal. The adder 34 adds, channel by channel, the audio signal formed by the HOA decoder 31 and the audio signal formed by the decoder 32 and multiplied by an appropriate coefficient in the multiplier 33, and outputs the result to the speaker set 42. In the present embodiment, the number of microphones m1 to m8 in the microphone set 41 and the number of speakers s1 to s8 in the speaker set 42 are both eight, so the sound field at the time of pickup can be reproduced by outputting the signal of each channel to the corresponding speaker s1 to s8.
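The per-channel combination performed by the multiplier 33 and the adder 34 can be sketched as follows. The function name and the representation of each signal as a list of channels, each a list of samples, are assumptions made for illustration.

```python
def mix_output(hoa_decoded, high_decoded, gain):
    """Add the gain-adjusted high band to the HOA-decoded low band, channel by channel."""
    if len(hoa_decoded) != len(high_decoded):
        raise ValueError("both paths must have the same number of channels")
    return [[lo + gain * hi for lo, hi in zip(lo_ch, hi_ch)]
            for lo_ch, hi_ch in zip(hoa_decoded, high_decoded)]

# Two hypothetical channels of three samples each.
low_band = [[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]]
high_band = [[0.5, -0.5, 0.0], [1.0, 0.0, -1.0]]
speaker_feed = mix_output(low_band, high_band, gain=2.0)
```

The check on the channel count reflects the first embodiment, in which both paths deliver eight channels; when the counts differ, a conversion such as the matrix unit of the later embodiments is needed before the addition.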
The audio system 1 according to the first embodiment has been described above. According to the first embodiment, for the HOA signal, which is a signal associated with spatial frequency, it is possible to suppress the degradation of the audio signal that arises above a certain frequency because of spatial aliasing, which depends on the number and spacing of the microphones m1 to m8 in the microphone set 41, the radius of the array, and so on. The sound field can therefore be picked up and reproduced with high accuracy.
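As a rough guide to where such a frequency limit may lie, a commonly used rule of thumb for a circular array is that an order-N representation is reliable while k·r ≤ N, where k is the wavenumber and r the array radius, and that M microphones support at most N = ⌊(M − 1)/2⌋. This rule and the numbers below (array radius, speed of sound) are assumptions added for illustration; this description does not state how the cutoff frequency is chosen.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def max_order_circular(num_mics):
    """Highest circular-harmonic order supported by num_mics sensors (2N + 1 <= M)."""
    return (num_mics - 1) // 2

def aliasing_frequency_hz(num_mics, radius_m):
    """Rule of thumb k*r <= N, i.e. f <= N * c / (2 * pi * r)."""
    order = max_order_circular(num_mics)
    return order * SPEED_OF_SOUND / (2.0 * math.pi * radius_m)

# Hypothetical array: eight microphones on a ring of radius 5 cm.
f_limit = aliasing_frequency_hz(num_mics=8, radius_m=0.05)
```

Under these assumptions the limit falls at a few kilohertz, which suggests why a band split between an HOA path and a channel-based path can be useful for compact arrays.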
<3. Second Embodiment>
In the first embodiment, as described with reference to FIG. 4, the number of microphones m1 to m8 in the microphone set 41 matched the number of speakers s1 to s8 in the speaker set 42. However, for reasons on the reproduction side, it may not be possible to arrange the speaker set 42 in the same way as the microphone set 41 was arranged at the time of pickup. The second and third embodiments described below are embodiments for the case where the number of microphones in the microphone set 41 and the number of speakers in the speaker set 42 do not match.
FIG. 6 is a diagram showing the configuration of the audio system 1 according to the second embodiment. The configurations of the microphone set 41 and the recording device 2 on the recording side are the same as those described with reference to FIG. 4, so their description is omitted here. The speaker set 42 on the reproduction side differs from the configuration of FIG. 4 in that it consists of four speakers s1 to s4, fewer than the eight microphones m1 to m8. The reproduction device 3 also differs in that a matrix unit 35 is provided between the multiplier 33 and the adder 34. When the number and positions of the speakers s1 to s4 in the speaker set 42 are specified to the HOA decoder 31, it outputs audio signals for four channels according to the arrangement of the speakers s1 to s4.
The decoder 32, on the other hand, outputs audio signals for eight channels corresponding to the microphones m1 to m8 used at the time of pickup. In the present embodiment, the matrix unit 35, serving as a conversion unit, mixes the audio signals output from the decoder 32 to match the arrangement of the speakers s1 to s4 of the speaker set 42. Specifically, the audio signal output to the speaker s1 is a mix of the audio signals picked up by the three microphones m1, m2, and m8, where the audio signals picked up by the microphones m2 and m8 are multiplied by a coefficient of 0.25. Similarly, the matrix unit 35 forms the audio signal output to the speaker s2 by mixing the audio signals picked up by the three microphones m2, m3, and m4, the audio signal output to the speaker s3 by mixing those picked up by the three microphones m4, m5, and m6, and the audio signal output to the speaker s4 by mixing those picked up by the three microphones m6, m7, and m8.
Thus, the HOA decoder 31, which restores audio signals from a signal associated with spatial frequency, can reproduce the sound field according to the arrangement of the speakers s1 to s4 regardless of the pickup arrangement, that is, the arrangement of the microphones m1 to m8. The decoder 32, which restores audio signals from a signal associated with spatial coordinates, instead reproduces the sound field with audio signals that depend on the positions of the microphones m1 to m8. In view of this difference, the present embodiment provides the matrix unit 35 as a conversion unit, which converts the number of channels of the audio signal output from the decoder 32 according to the arrangement of the speakers s1 to s4 of the speaker set 42, so that the sound field can be reproduced according to the speaker arrangement on the reproduction side. The configuration of the matrix unit 35 is not limited to the one described in this embodiment; various other methods are conceivable.
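The 8-to-4 mixing performed by the matrix unit 35 can be sketched as a matrix applied to each frame of samples. The side coefficient of 0.25 and the microphone groupings follow the description above; the coefficient of 0.5 for the central microphone of each group is an assumption, since the description does not state it, and the function names are hypothetical.

```python
CENTER = 0.5   # assumption: weight of the central microphone of each group
SIDE = 0.25    # from the description: weight of the two neighbouring microphones

def build_downmix():
    """4x8 matrix; rows are speakers s1..s4, columns are microphones m1..m8."""
    # (central microphone index, neighbouring microphone indices), zero-based
    groups = [(0, (1, 7)),   # s1 <- m1, m2, m8
              (2, (1, 3)),   # s2 <- m3, m2, m4
              (4, (3, 5)),   # s3 <- m5, m4, m6
              (6, (5, 7))]   # s4 <- m7, m6, m8
    matrix = [[0.0] * 8 for _ in range(4)]
    for speaker, (center, sides) in enumerate(groups):
        matrix[speaker][center] = CENTER
        for mic in sides:
            matrix[speaker][mic] = SIDE
    return matrix

def apply_matrix(matrix, frame):
    """Mix one frame (one sample per input channel) into the output channels."""
    return [sum(c * x for c, x in zip(row, frame)) for row in matrix]

downmix = build_downmix()
only_m2 = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
speakers = apply_matrix(downmix, only_m2)   # m2 is shared by s1 and s2
```

With these weights each row sums to 1, so a signal common to all microphones keeps its level after mixing; other normalizations are equally possible.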
<4. Third Embodiment>
FIG. 7 is a diagram showing the configuration of the audio system 1 according to the third embodiment. The third embodiment differs from the second embodiment described with reference to FIG. 6 in the arrangement of the microphones in the microphone set 41 and the arrangement of the speakers in the speaker set 42. Specifically, the microphone set 41 consists of four microphones m1 to m4, and the speaker set 42 consists of eight speakers s1 to s8.
In this case, the matrix unit 35, serving as a conversion unit, converts the 4-channel audio signal output from the decoder 32, corresponding to the microphones m1 to m4, into an 8-channel audio signal corresponding to the speakers s1 to s8. Specifically, the audio signals picked up by the microphones m1, m2, m3, and m4 are output unchanged to the correspondingly placed speakers s1, s3, s5, and s7. For the speakers s2, s4, s6, and s8, which have no correspondingly placed microphone, the audio signal is formed by mixing the audio signals of a plurality of microphones. For example, the audio signal for the speaker s2 is formed by mixing the audio signals of the microphone m1 and the microphone m2. The mixing is performed by multiplying by fixed coefficients and adding, but the coefficients may also be changed dynamically. For example, allocating larger coefficients toward the direction in which the level of the audio signal is higher makes it possible to emphasize the sense of direction of the sound field during reproduction.
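A minimal sketch of this 4-to-8 conversion, including an optional level-dependent choice of coefficients, is given below. The fixed coefficient of 0.5, the pairing of the speaker s8 with the microphones m4 and m1 (which assumes a ring arrangement), and the weighting rule in `level_weights` are all assumptions; the description only states that fixed or dynamically varied coefficients may be used.

```python
def upmix_4_to_8(frame, weights=None):
    """frame: one sample per microphone m1..m4; returns samples for s1..s8."""
    out = []
    for i in range(4):
        a, b = frame[i], frame[(i + 1) % 4]        # neighbouring microphones on the ring
        wa, wb = weights[i] if weights else (0.5, 0.5)
        out.append(a)                              # s1, s3, s5, s7: direct copy
        out.append(wa * a + wb * b)                # s2, s4, s6, s8: mix of neighbours
    return out

def level_weights(levels):
    """Bias each in-between speaker toward the louder of its two neighbours."""
    result = []
    for i in range(4):
        la, lb = levels[i], levels[(i + 1) % 4]
        total = la + lb
        result.append((0.5, 0.5) if total == 0 else (la / total, lb / total))
    return result

fixed = upmix_4_to_8([1.0, 0.0, 0.0, 0.0])
dynamic = upmix_4_to_8([1.0, 1.0, 0.0, 0.0],
                       weights=level_weights([3.0, 1.0, 0.0, 0.0]))
```

In the dynamic case the levels would typically be short-term estimates of each microphone signal, which is one possible reading of distributing larger coefficients toward the louder direction.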
Thus, even when the number of microphones m1 to m4 is smaller than the number of speakers s1 to s8, the matrix unit 35, serving as a conversion unit, converts the signal to the number of channels corresponding to the arrangement of the speakers s1 to s8 on the reproduction side, so that the sound field can be reproduced appropriately. The conversion unit can not only convert the number of channels but also, when the pickup direction of the microphones and the emission direction of the speakers differ, convert the audio signals so that they take an appropriate form. In that case, the number of microphones and the number of speakers may be the same. The configuration of the matrix unit 35 is not limited to the one described in this embodiment; various other methods are conceivable.
<5. Fourth Embodiment>
FIG. 8 is a diagram showing the configuration of the audio system 1 according to the fourth embodiment. The fourth embodiment differs from, for example, the audio system 1 according to the first embodiment described with reference to FIG. 4 in that, on the recording side, a downsampling unit 26 serving as a sampling frequency conversion unit and a delay unit 25 for compensating for the delay introduced by the downsampling unit 26 are provided, and in that, on the reproduction side, an upsampling unit 37 serving as a sampling frequency conversion unit is provided.
Because the processing in the HOA encoder 22 does not involve signals in the high-frequency range, lowering the sampling frequency of the input audio signal is considered to have little effect on sound quality. In the fourth embodiment, the downsampling unit 26 applies time-domain downsampling to the audio signal input to the HOA encoder 22, thereby reducing the amount of computation in the HOA encoder 22. Downsampling also reduces the amount of data in the signal output from the HOA encoder 22, which makes it possible to reduce storage capacity and communication volume.
For example, when the sampling frequency of the audio signal input from the microphone set 41 is a common value of Fs = 44.1 kHz (or 48 kHz), the downsampling unit 26 may reduce it to Fs = 22.05 kHz (or 24 kHz), half the sampling frequency of the original signal. On the reproduction side, the upsampling unit 37 arranged after the HOA decoder 31 upsamples the signal, for example to the same sampling frequency as on the decoder 32 side. FIR filters are typically used for this purpose. In the present embodiment, the delay unit 25 is provided in the path on the encoder 24 side to compensate for the delay introduced by the downsampling unit 26. Instead of compensating for the delay on the recording side in this way, the delay may be compensated on the reproduction side (or on both the recording and reproduction sides). When compensating on the reproduction side, a delay unit may, for example, be arranged after the decoder 32.
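The relationship between the downsampling filter and the compensating delay can be sketched as follows. A symmetric FIR filter with T taps delays its input by (T − 1)/2 samples, so the other path is delayed by the same amount. The three-tap filter, the function names, and the omission of the LPF 21, the HPF 23, and the upsampling-side latency are simplifications assumed for illustration.

```python
def fir(x, taps):
    """Causal FIR filter with zero initial state; output length equals input length."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def delay(x, n):
    """Delay x by n samples, keeping the original length."""
    return [0.0] * n + x[:len(x) - n]

ANTI_ALIAS = [0.25, 0.5, 0.25]          # crude symmetric low-pass, illustration only
LATENCY = (len(ANTI_ALIAS) - 1) // 2    # group delay of a symmetric FIR, in samples

def low_path(x):
    """Stand-in for the downsampling unit 26: filter, then keep every second sample."""
    filtered = fir(x, ANTI_ALIAS)
    return filtered, filtered[::2]

def high_path(x):
    """Stand-in for the delay unit 25: match the latency of the low path."""
    return delay(x, LATENCY)

impulse = [0.0] * 4 + [1.0] + [0.0] * 7
filtered, decimated = low_path(impulse)
aligned_high = high_path(impulse)
```

After compensation, the peak of the filtered low-band signal and the delayed high-band impulse fall on the same sample index at the original rate, and the decimated signal carries half as many samples.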
Thus, in the fourth embodiment, applying time-domain downsampling to the audio signal input to the HOA encoder 22 reduces both the amount of computation in the HOA encoder 22 and the amount of data in the signal output from the HOA encoder 22. Because the amount of data output from the HOA encoder 22 is reduced, a larger amount of information (for example, a larger number of bits) can also be allocated to the signal output from the encoder 24. The sampling frequency conversion may be performed on the encoder 24 and decoder 32 side instead of the HOA encoder 22 and HOA decoder 31 side, or on both sides.
<6. Fifth Embodiment>
FIG. 9 is a diagram showing the configuration of the audio system 1 according to the fifth embodiment. Whereas the audio system 1 according to the first embodiment described with reference to FIG. 4, for example, uses a single HOA encoder 22, the fifth embodiment differs in that it provides an HOA encoder 22a that handles the low band and an HOA encoder 22b that handles the mid band.
An LPF 21a that removes the high-frequency components of the input audio signal is arranged before the HOA encoder 22a, and a BPF 21b (Band Pass Filter) that extracts the mid-frequency components of the input audio signal is arranged before the HOA encoder 22b. In line with the configuration on the recording side, the reproduction side is provided with an HOA decoder 31a that decodes the audio signal encoded by the HOA encoder 22a and an HOA decoder 31b that decodes the audio signal encoded by the HOA encoder 22b. The adder 34 adds the audio signal decoded by the decoder 32 and multiplied by a coefficient in the multiplier 33 to the audio signals decoded by the HOA decoder 31a and the HOA decoder 31b, and outputs the result to the speaker set 42.
FIG. 10 is a diagram showing the frequency characteristics of the audio system 1 according to the fifth embodiment. In FIG. 10, the low-pass characteristic drawn with a solid line is that of the LPF 21a, the band-pass characteristic drawn with a dash-dot line is that of the BPF 21b, and the high-pass characteristic drawn with a broken line is that of the HPF 23. Adding the low-pass characteristic of the LPF 21a, the band-pass characteristic of the BPF 21b, and the high-pass characteristic of the HPF 23 yields a frequency characteristic that is flat from low to high frequencies. These characteristics are only one example; various characteristics are possible depending on the design.
As in the fifth embodiment, providing the HOA encoders 22a and 22b separately for a plurality of frequency bands makes it possible to use different orders in the HOA processing of the HOA encoders 22a and 22b. In particular, for very-low-frequency signals the wavelength is sufficiently long that human perception is insensitive to the direction of arrival of the sound, so the HOA encoder 22a, which handles this band, can be operated at a lower order than the HOA encoder 22b to reduce the amount of computation.
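The saving obtained by lowering the order can be estimated from the number of coefficient channels, which is (N + 1)² for a three-dimensional representation of order N and 2N + 1 for a two-dimensional one. The orders assigned to the two encoders below are hypothetical; the description does not give concrete values.

```python
def hoa_channels(order, dims=3):
    """Number of HOA coefficient channels for a given order."""
    if dims == 3:
        return (order + 1) ** 2      # spherical harmonics
    if dims == 2:
        return 2 * order + 1         # circular harmonics
    raise ValueError("dims must be 2 or 3")

# Hypothetical per-band orders for the HOA encoders 22a (low) and 22b (mid).
band_orders = {"low (22a)": 1, "mid (22b)": 3}
band_channels = {band: hoa_channels(n) for band, n in band_orders.items()}
saved = hoa_channels(3) - band_channels["low (22a)"]   # channels saved in the low band
```

With these example orders, the low band needs 4 channels instead of 16, which gives a sense of the reduction in computation and data that a lower order can bring.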
<7. Sixth Embodiment>
FIG. 11 is a diagram showing the configuration of the audio system 1 according to the sixth embodiment. In the fourth embodiment described with reference to FIG. 8, the delay unit 25 was provided on the recording side to compensate for the delay introduced by the downsampling unit 26. However, time offsets between bands can arise not only from downsampling but also from processing each band separately.
The sixth embodiment shows a configuration for eliminating such a time lag between bands on the reproduction side. For example, a time lag caused by the processing of the encoder 24 and the HOA encoder 22 on the recording side, or by the processing of the decoder 32 and the HOA decoder 31 on the reproduction side, can be eliminated by providing the delay unit 36 on the reproduction side. In the sixth embodiment as well, the delay may be compensated on the recording side (or on both the recording side and the reproduction side) instead of on the reproduction side.
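The role of a delay unit such as the delay unit 36 can be sketched as follows: every band is delayed by the difference between the largest latency and its own latency, so that all bands line up before they are added. The band names, latency values, and function names are hypothetical, and latencies are assumed to be known in whole samples.

```python
def compensating_delays(latencies):
    """Map each band to the extra delay (in samples) that aligns it with the slowest band."""
    worst = max(latencies.values())
    return {band: worst - latency for band, latency in latencies.items()}

def delay(signal, num_samples):
    """Delay a signal by prepending zeros."""
    return [0.0] * num_samples + list(signal)

def sum_bands(bands):
    """Add band signals sample by sample; shorter signals are treated as zero-padded."""
    length = max(len(b) for b in bands)
    return [sum(b[i] for b in bands if i < len(b)) for i in range(length)]

# Hypothetical example: the same impulse arrives 5 samples late in the low band
# and 2 samples late in the high band.
low_band = [0.0] * 5 + [1.0]
high_band = [0.0] * 2 + [1.0]
delays = compensating_delays({"low": 5, "high": 2})
aligned = sum_bands([delay(low_band, delays["low"]), delay(high_band, delays["high"])])
```

After compensation both impulses fall on sample index 5, so they add coherently instead of being smeared across two positions.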
<8. Modifications>
(Modification of the HOA method)
The first to sixth embodiments above use the HOA method, with the HOA encoder 22, the HOA decoder 31, and so on. The signals used in the embodiments are not limited to signals encoded by the HOA method. Various methods can be adopted as long as the signal is associated with spatial frequencies, in other words, as long as the audio signal at a given position can be reproduced by specifying the reproduction position (the installation position of a speaker) at decoding time. Here, a signal associated with spatial frequencies is called an SF signal.
In contrast, the method used by the encoder 24, the decoder 32, and so on uses a signal associated with spatial coordinates, that is, a signal that can reproduce the audio signal at the position (spatial coordinates) where the sound was picked up. Here, a signal associated with spatial coordinates is called an SA signal. The format of the recording signal when SA signals and SF signals are used is described below.
(Modification of the recording format)
FIG. 12 is a diagram illustrating the recording format of the recording signal used in the audio system 1 according to the modification. FIG. 5 described the format of the recording signal used in the first embodiment. This modification describes a generalized recording format in which the HOA signal is treated as a signal associated with spatial frequencies (SF signal) and the signal in charge of the high range is treated as a signal associated with spatial coordinates (SA signal).
The recording signal consists of a header section and a data section. The header section records the various meta information needed to reproduce the recorded audio signal. In this embodiment, the meta information recorded in the header section includes the sampling rate, the frame length, the number of frames (L), the number of band divisions (N), and band information for each band (first to N-th band information). For example, when the signal is divided into three frequency bands as in the fifth embodiment, first to third band information is provided.
The sampling rate is the sampling rate used at the time of recording and may be either fixed or variable. The frame length specifies the length of the frames recorded in the data section and may likewise be fixed or variable. The number of frames (L) specifies how many frames make up a chunk, which is one unit of data in the data section. The number of band divisions indicates the number of bands into which the signal is divided in the audio system 1. For example, when the signal is divided into three frequency bands as in the fifth embodiment, the number of band divisions is "3" and first to third band information is provided.
Each piece of band information (first to N-th band information) includes a first cutoff frequency indicating the lower limit of the frequency band it covers and a second cutoff frequency indicating the upper limit. The time delay information indicates the delay or advance relative to the other bands and can be used, for example, to set the delay time of the delay unit 36 described in the sixth embodiment. The spatial domain information indicates whether the band is an SF signal or an SA signal; by referring to it, the playback device 3 can determine the decoding method for that band. The spatial domain information may also include information such as the arrangement of the microphones in the microphone set 41 used for sound pickup. The signal domain information indicates whether the band is recorded on the time axis or on the time-frequency axis. The compression method information indicates whether compression is applied and, if so, which compression method is used.
The order/channel information stores the order when an SF signal is used and the channel information when an SA signal is used. The order stored for a frequency band that uses an SF signal is the order used in the processing that forms the signal associated with spatial frequencies.
The channel information stored for a frequency band that uses an SA signal consists, as described with reference to FIG. 5, of the number of channels and the channel coordinates of each channel. The number of channels corresponds to the number of microphones in the microphone set 41 (for example, "4" in the third embodiment shown in FIG. 7). In that case, the channel coordinates indicate the spatial arrangement of the microphones m1 to m4 in the microphone set 41. The matrix unit 35 (conversion unit) described in the second embodiment (FIG. 6) and the third embodiment (FIG. 7) can perform various conversions, such as converting the number of channels of the audio signal, based on this channel information and the arrangement of the speakers in the speaker set 42.
The data section stores the signals converted for each band. In this modification, frame chunks, each holding frames for the first to N-th bands, are provided for the number of frames (L). Data recorded in frames in this way is converted into audio signals with reference to the meta information described in the header section.
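The header layout described above can be pictured as a small set of record types. The following is a hedged sketch only: the field names, types, units, and the string values "SF", "SA", "time", and "time-frequency" are assumptions made for illustration, and the actual byte layout of the recording format is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ChannelInfo:
    """Channel information for an SA band: one coordinate per channel (microphone)."""
    coordinates: List[Tuple[float, float, float]]  # 3D coordinates are an assumption

    @property
    def num_channels(self):
        return len(self.coordinates)

@dataclass
class BandInfo:
    cutoff_low_hz: float           # first cutoff frequency (lower limit)
    cutoff_high_hz: float          # second cutoff frequency (upper limit)
    time_delay_samples: int        # delay (+) or advance (-) relative to other bands
    spatial_domain: str            # "SF" or "SA"
    signal_domain: str             # "time" or "time-frequency"
    compression: Optional[str]     # None means no compression
    order: Optional[int] = None              # stored for SF bands
    channels: Optional[ChannelInfo] = None   # stored for SA bands

    def __post_init__(self):
        if self.spatial_domain not in ("SF", "SA"):
            raise ValueError("spatial_domain must be 'SF' or 'SA'")
        if self.spatial_domain == "SF" and self.order is None:
            raise ValueError("an SF band needs an order")
        if self.spatial_domain == "SA" and self.channels is None:
            raise ValueError("an SA band needs channel information")

@dataclass
class Header:
    sampling_rate_hz: int
    frame_length: int
    frames_per_chunk: int          # number of frames (L)
    bands: List[BandInfo]          # first to N-th band information

    @property
    def num_bands(self):           # number of band divisions (N)
        return len(self.bands)

def decoder_for(band):
    """Pick the decoding path from the spatial domain information."""
    return "HOA decoder 31" if band.spatial_domain == "SF" else "decoder 32"

# Hypothetical two-band header: an SF low band and an SA high band with 4 microphones.
mics = ChannelInfo([(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, -0.1, 0.0)])
header = Header(48000, 1024, 8, [
    BandInfo(0.0, 1000.0, 0, "SF", "time", None, order=1),
    BandInfo(1000.0, 24000.0, 0, "SA", "time", None, channels=mics),
])
paths = [decoder_for(b) for b in header.bands]
```

Keeping the order and the channel information as two optional fields mirrors the description that only one of them is stored for a given band, depending on whether the band is an SF signal or an SA signal.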
In the recording format described above as well, information common to the bands can be consolidated into a single entry. The recording format described above is only an example; it is not limited to this form and can be configured in various ways.
(Modification of the playback environment)
The playback device 3 in each of the embodiments described above outputs audio signals to the speaker set 42, which consists of a plurality of speakers. The playback device 3 is not limited to this form and may, for example, reproduce audio signals in a virtual environment using headphones. That is, if the head-related transfer functions from each speaker in the speaker set 42 to both ears of the listener are known, convolving each head-related transfer function with the audio signal that drives the corresponding speaker shows how the sound of each speaker is heard at each of the listener's ears. Reproducing the sum for the left ear and the sum for the right ear through headphones or the like reproduces a sound field similar to the one experienced by a listener using the speaker set 42.
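The headphone rendering described above amounts to one convolution per speaker and ear, followed by a sum per ear. A minimal time-domain sketch is given below; the impulse responses are toy values rather than measured head-related impulse responses, and the function names are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def binauralize(speaker_signals, hrirs_left, hrirs_right):
    """Convolve each speaker signal with its left/right impulse response and sum per ear."""
    def render_ear(hrirs):
        parts = [convolve(s, h) for s, h in zip(speaker_signals, hrirs)]
        length = max(len(p) for p in parts)
        return [sum(p[i] for p in parts if i < len(p)) for i in range(length)]
    return render_ear(hrirs_left), render_ear(hrirs_right)

# Toy example with two virtual speakers: speaker 1 is louder in the left ear,
# speaker 2 is louder in the right ear.
speaker_signals = [[1.0, 0.0], [0.0, 1.0]]
hrirs_left = [[1.0, 0.0], [0.5, 0.0]]
hrirs_right = [[0.5, 0.0], [1.0, 0.0]]
left, right = binauralize(speaker_signals, hrirs_left, hrirs_right)
```

With real impulse responses a frequency-domain (FFT-based) convolution would normally be preferred for efficiency; the direct form is used here only to keep the sketch self-contained.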
Sound field formation using such a virtual environment is not limited to headphones and can be realized with any electroacoustic transducers driven by two or more channels. In that case, various corrections such as crosstalk cancellation can be applied, as necessary, to the audio signals reproduced by the electroacoustic transducers.
The present disclosure can be realized in various forms, such as a device, a method, and a program. The matters described in the embodiments and modifications can be combined as appropriate.
The present disclosure can also adopt the following configurations.
(1)
An audio reproducing device including:
a first decoder that decodes a first signal associated with spatial frequencies into audio signals of a plurality of channels;
a second decoder that decodes a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
an adder that adds the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
(2)
The audio reproducing device according to (1), in which the audio signals decoded by the second decoder are in a higher frequency range than the audio signals decoded by the first decoder.
(3)
The audio reproducing device according to (1) or (2), in which the first decoder performs decoding based on the arrangement of the speakers to which the signals are output.
(4)
The audio reproducing device according to any one of (1) to (3), in which the first decoder uses the HOA method.
(5)
The audio reproducing device according to any one of (1) to (4), further including a conversion unit that converts the audio signals of the plurality of channels output from the second decoder based on the arrangement of the speakers to which the signals are output.
(6)
The audio reproducing device according to (5), in which the conversion unit converts the number of channels of the audio signals output from the second decoder.
(7)
The audio reproducing device according to any one of (1) to (6), in which the first signal and the second signal have different sampling frequencies, the device further including a sampling frequency conversion unit that converts the sampling frequency of at least one of the first signal and the second signal.
(8)
The audio reproducing device according to any one of (1) to (7), in which a plurality of the second decoders are provided, one for each band, and the plurality of second decoders use different orders for decoding.
(9)
The audio reproducing device according to any one of (1) to (8), further including a delay unit that adjusts a time lag arising between the first decoder and the second decoder.
(10)
An audio reproduction method including:
decoding a first signal associated with spatial frequencies into audio signals of a plurality of channels;
decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
adding the audio signals of the plurality of channels decoded based on the first signal and the audio signals of the plurality of channels decoded based on the second signal.
(11)
An audio reproduction program that causes an information processing device to execute:
a first decoding process of decoding a first signal associated with spatial frequencies into audio signals of a plurality of channels;
a second decoding process of decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
an addition process of adding the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
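The data flow of configuration (1) can be sketched in a few lines: the first decoder turns coefficient channels into speaker channels, the second decoder supplies speaker channels for another band, and the adder sums the two channel by channel. In the sketch below the first decoder is assumed to be a fixed matrix multiplication, and the matrix entries and signal values are hypothetical; an actual decode matrix would be derived from the speaker arrangement.

```python
def matrix_decode(matrix, coeffs):
    """Decode coefficient channels to speaker channels.
    matrix: one row per speaker, one column per coefficient channel.
    coeffs: list of coefficient channels, each a list of samples of equal length."""
    num_samples = len(coeffs[0])
    return [
        [sum(row[k] * coeffs[k][t] for k in range(len(coeffs))) for t in range(num_samples)]
        for row in matrix
    ]

def add_channels(a, b):
    """Adder: sum two multichannel signals channel by channel."""
    if len(a) != len(b):
        raise ValueError("both inputs must have the same number of channels")
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]

# Hypothetical low-band coefficients (2 channels, 2 samples) and a 2-speaker decode matrix.
decode_matrix = [[1.0, 1.0], [1.0, -1.0]]
low_band_coeffs = [[1.0, 2.0], [0.5, 0.5]]
low_band_speakers = matrix_decode(decode_matrix, low_band_coeffs)

# Hypothetical output of the second decoder, already one channel per speaker.
high_band_speakers = [[0.25, 0.25], [0.5, 0.5]]
output = add_channels(low_band_speakers, high_band_speakers)
```

The adder requires both decoders to deliver the same number of channels, which is why a conversion unit as in configurations (5) and (6) may be needed when the number of channels of the second signal differs from the number of speakers.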
1: audio system
2: recording device
3: playback device
21 (21a, 21b): LPF
22 (22a, 22b): HOA encoder
23: HPF
24: encoder
25: delay unit
26: downsampling unit
31 (31a, 31b): HOA decoder
32: decoder
33: multiplication unit
34: addition unit
35: matrix unit
36: delay unit
37: upsampling unit
41: microphone set
42: speaker set
m1 to m8: microphones
s1 to s8: speakers

Claims (11)

  1.  An audio reproducing device comprising:
     a first decoder that decodes a first signal associated with spatial frequencies into audio signals of a plurality of channels;
     a second decoder that decodes a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
     an adder that adds the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
  2.  The audio reproducing device according to claim 1, wherein the audio signals decoded by the second decoder are in a higher frequency range than the audio signals decoded by the first decoder.
  3.  The audio reproducing device according to claim 1, wherein the first decoder performs decoding based on the arrangement of the speakers to which the signals are output.
  4.  The audio reproducing device according to claim 1, wherein the first decoder uses the HOA method.
  5.  The audio reproducing device according to claim 1, further comprising a conversion unit that converts the audio signals of the plurality of channels output from the second decoder based on the arrangement of the speakers to which the signals are output.
  6.  The audio reproducing device according to claim 5, wherein the conversion unit converts the number of channels of the audio signals output from the second decoder.
  7.  The audio reproducing device according to claim 1, wherein the first signal and the second signal have different sampling frequencies, the device further comprising a sampling frequency conversion unit that converts the sampling frequency of at least one of the first signal and the second signal.
  8.  The audio reproducing device according to claim 1, wherein a plurality of the second decoders are provided, one for each band, and the plurality of second decoders use different orders for decoding.
  9.  The audio reproducing device according to claim 1, further comprising a delay unit that adjusts a time lag arising between the first decoder and the second decoder.
  10.  An audio reproduction method comprising:
     decoding a first signal associated with spatial frequencies into audio signals of a plurality of channels;
     decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
     adding the audio signals of the plurality of channels decoded based on the first signal and the audio signals of the plurality of channels decoded based on the second signal.
  11.  An audio reproduction program that causes an information processing device to execute:
     a first decoding process of decoding a first signal associated with spatial frequencies into audio signals of a plurality of channels;
     a second decoding process of decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and
     an addition process of adding the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
PCT/JP2019/025199 2018-08-21 2019-06-25 Audio reproducing device, audio reproduction method, and audio reproduction program WO2020039734A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
DE112019004193.2T DE112019004193T5 (en) 2018-08-21 2019-06-25 AUDIO PLAYBACK DEVICE, AUDIO PLAYBACK METHOD AND AUDIO PLAYBACK PROGRAM
CN201980053901.8A CN112567769B (en) 2018-08-21 2019-06-25 Audio reproducing apparatus, audio reproducing method, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-154456 2018-08-21
JP2018154456 2018-08-21

Publications (1)

Publication Number Publication Date
WO2020039734A1 true WO2020039734A1 (en) 2020-02-27

Family

ID=69592557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/025199 WO2020039734A1 (en) 2018-08-21 2019-06-25 Audio reproducing device, audio reproduction method, and audio reproduction program

Country Status (3)

Country Link
CN (1) CN112567769B (en)
DE (1) DE112019004193T5 (en)
WO (1) WO2020039734A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015137146A1 (en) * 2014-03-12 2015-09-17 ソニー株式会社 Sound field sound pickup device and method, sound field reproduction device and method, and program
WO2017035163A1 (en) * 2015-08-25 2017-03-02 Dolby Laboratories Licensing Corporation Audo decoder and decoding method
JP2017523451A (en) * 2014-07-02 2017-08-17 ドルビー・インターナショナル・アーベー Method and apparatus for decoding a compressed HOA representation and method and apparatus for encoding a compressed HOA representation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB8920259D0 (en) * 1989-09-07 1989-10-18 British Broadcasting Corp Hybrid predictive coders and decoders for digital video signals
CN101140759B (en) * 2006-09-08 2010-05-12 华为技术有限公司 Band-width spreading method and system for voice or audio signal
EP2782094A1 (en) * 2013-03-22 2014-09-24 Thomson Licensing Method and apparatus for enhancing directivity of a 1st order Ambisonics signal
US9502045B2 (en) * 2014-01-30 2016-11-22 Qualcomm Incorporated Coding independent frames of ambient higher-order ambisonic coefficients
CN105898669B (en) * 2016-03-18 2017-10-20 南京青衿信息科技有限公司 A kind of coding method of target voice

Also Published As

Publication number Publication date
DE112019004193T5 (en) 2021-07-15
CN112567769B (en) 2022-11-04
CN112567769A (en) 2021-03-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19852545

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19852545

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP