WO2020039734A1 - Audio reproduction device, audio reproduction method, and audio reproduction program - Google Patents

Audio reproduction device, audio reproduction method, and audio reproduction program

Info

Publication number
WO2020039734A1
WO2020039734A1 PCT/JP2019/025199 JP2019025199W WO2020039734A1 WO 2020039734 A1 WO2020039734 A1 WO 2020039734A1 JP 2019025199 W JP2019025199 W JP 2019025199W WO 2020039734 A1 WO2020039734 A1 WO 2020039734A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
audio
decoder
channels
hoa
Prior art date
Application number
PCT/JP2019/025199
Other languages
English (en)
Japanese (ja)
Inventor
哲 曲谷地
一敦 大栗
Original Assignee
ソニー株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニー株式会社
Priority to CN201980053901.8A (CN112567769B)
Priority to DE112019004193.2T (DE112019004193T5)
Publication of WO2020039734A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S3/02: Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03: Application of parametric coding in stereophonic audio systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11: Application of ambisonics in stereophonic audio systems

Definitions

  • the present disclosure relates to an audio playback device, an audio playback method, and an audio playback program.
  • For audio reproducing apparatuses, in addition to reproduction of an audio signal in stereo form (two channels), multi-channel forms in which the number of speakers is further increased are known.
  • Patent Document 1 discloses a method of encoding an audio signal using higher-order ambisonics for such audio reproduction using multi-channels.
  • One object of the present disclosure is to provide an audio reproducing device, an audio reproducing method, and an audio reproducing program for improving the quality of reproduced audio.
  • The present disclosure is, for example, an audio playback device comprising: a first decoder that decodes a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoder that decodes a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and an adder that adds the audio signals of the plurality of channels decoded by the first decoder and the audio signals of the plurality of channels decoded by the second decoder.
  • The present disclosure is, for example, an audio playback method in which a first signal associated with a spatial frequency is decoded into audio signals of a plurality of channels, a second signal that includes a band different from that of the first signal and is associated with spatial coordinates is decoded into audio signals of a plurality of channels, and the decoded audio signals of the plurality of channels are added.
  • The present disclosure is, for example, an audio reproduction program that causes an information processing apparatus to execute: a first decoding process of decoding a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoding process of decoding a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into audio signals of a plurality of channels; and an addition process of adding the audio signals of the plurality of channels decoded in the first decoding process and the audio signals of the plurality of channels decoded in the second decoding process.
  • According to the present disclosure, it is possible to improve the quality of the reproduced audio.
  • The effects described here are not necessarily limiting, and any of the effects described in the present disclosure may be obtained.
  • The contents of the present disclosure are not to be construed as being limited by the illustrated effects.
  • FIG. 1 is a diagram illustrating a configuration of an audio system as a comparative example.
  • FIG. 2 is a diagram for describing an overview of the audio system according to the first embodiment.
  • FIG. 3 is a diagram illustrating frequency characteristics of the audio system according to the first embodiment.
  • FIG. 4 is a diagram illustrating a configuration of the audio system according to the first embodiment.
  • FIG. 5 is a diagram showing a recording format of a recording signal used in the audio system according to the first embodiment.
  • FIG. 6 is a diagram illustrating a configuration of an audio system according to the second embodiment.
  • FIG. 7 is a diagram illustrating a configuration of an audio system according to the third embodiment.
  • FIG. 8 is a diagram illustrating a configuration of an audio system according to the fourth embodiment.
  • FIG. 9 is a diagram illustrating a configuration of an audio system according to the fifth embodiment.
  • FIG. 10 is a diagram illustrating frequency characteristics of the audio system according to the fifth embodiment.
  • FIG. 11 is a diagram illustrating a configuration of an audio system according to the sixth embodiment.
  • FIG. 12 is a diagram illustrating a recording format of a recording signal used in an audio system according to a modification.
  • Ambisonics is known as a technique that can flexibly cope with arbitrary recording and reproduction systems.
  • Ambisonics of second or higher order is called higher-order ambisonics (HOA: Higher Order Ambisonics).
  • In higher-order ambisonics (HOA), sound-field information is stored by performing a spatial frequency transform (spherical harmonic transform) in the angular direction of three-dimensional polar coordinates. This can be regarded as the spatial counterpart of the time-frequency transform applied to an audio signal along the time axis.
  • An advantage of this method is that information can be encoded and decoded from an arbitrary microphone array to an arbitrary speaker array without limiting the number of microphones and the number of speakers.
  • Encodings used in the HOA method can be roughly classified into two types: recording-based and object-based. Since the present embodiment targets the former, recording-based encoding is described here.
  • A given time frequency ω of the sound signal recorded by an annular or spherical microphone array is converted into the HOA signals A_m(ω) and A_n^m(ω), respectively, according to the following equations.
  • Equation (1) is for a circular microphone array
  • Equation (2) is for a spherical microphone array.
  • φ_q and θ_q represent the azimuth and elevation of the q-th microphone
  • P_q(ω) represents the sound pressure of the q-th microphone.
  • J_m(ka) is a Bessel function
  • m is its order
  • k is the wave number
  • a is the radius of the microphone array.
  • In Equation (2), the Bessel function of Equation (1) is replaced by the spherical Bessel function, and e^(-imφ_q) is replaced by the spherical harmonic Y_n^m(θ_q, φ_q).
  • the spherical harmonic is
  • n is the order of the HOA. Since this is a conversion of the sound pressure P which is a continuous function of the azimuth and the elevation, the orders m and n exist up to infinity. However, when recording with a spherical microphone array, it is impossible to capture the sound pressure P as a continuous function. Therefore, similar to the sampling theorem at the time frequency, the following relation exists between the reproducible HOA orders M and N and the number Q of microphones.
  • Equation (4) is for a ring, and Equation (5) is for a sphere.
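  • As a rough illustration of this kind of sampling condition, the sketch below assumes the forms commonly cited in the ambisonics literature: Q ≥ 2M + 1 for a circular array and Q ≥ (N + 1)^2 for a spherical array. These forms are an assumption and may differ in detail from the equations of this disclosure; the function names are hypothetical.

```python
import math


def max_circular_order(num_mics: int) -> int:
    """Largest order M satisfying 2*M + 1 <= Q (assumed circular-array condition)."""
    if num_mics < 1:
        raise ValueError("at least one microphone is required")
    return (num_mics - 1) // 2


def max_spherical_order(num_mics: int) -> int:
    """Largest order N satisfying (N + 1)**2 <= Q (assumed spherical-array condition)."""
    if num_mics < 1:
        raise ValueError("at least one microphone is required")
    return math.isqrt(num_mics) - 1


# Eight microphones, as in the embodiments described later.
print(max_circular_order(8))   # order usable with a ring of 8 microphones
print(max_spherical_order(8))  # order usable with a sphere of 8 microphones
```

  • Under these assumed conditions, the same eight microphones support a higher order on a ring than on a sphere, because a sphere has to sample elevation as well as azimuth.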
  • R is the radius of the speaker array
  • θ_i and φ_i are the elevation and azimuth angles of the i-th speaker
  • G_m(R, ω) and G_n^0(R, ω) are the transfer functions of the HOA coefficients.
  • H_m^(2)(kR) is a Hankel function of the second kind
  • h_n^(2)(kR) is a spherical Hankel function of the second kind.
  • the conversion formula between the HOA signal and the audio signal differs depending on the shape of the microphone array, the shape of the speaker array, the directivity, and the like.
  • In this description, the terms HOA encoding and HOA decoding encompass these various schemes and are not limited to any one of them.
  • Spatial aliasing: As described above, in recording with a microphone set, the order is finite because the number of microphones is limited. Therefore, if a higher-order signal is mixed in, spatial aliasing occurs. If a signal in which spatial aliasing has occurred is encoded and decoded by the HOA method, a sound field different from the recorded one is reproduced. The effect of this aliasing depends on the time frequency and the radius of the microphone array. The lower the time frequency and the smaller the array radius, the smaller the higher-order HOA components. In other words, for the same time frequency, a smaller array radius gives smaller higher-order HOA components and less aliasing. Likewise, for the same array radius, the effect of aliasing is smaller in the low frequency band.
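  • The dependence on frequency and radius can be illustrated with a common rule of thumb that is not taken from this disclosure: an array of radius a captures order M reasonably well while ka ≤ M, where k = 2πf/c. The sketch below uses a hypothetical function name and an assumed speed of sound of 343 m/s to estimate the frequency above which aliasing becomes significant.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def aliasing_limit_hz(order: int, radius_m: float,
                      speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Frequency at which k*a equals the HOA order (rule of thumb, not from the patent)."""
    if order < 0 or radius_m <= 0:
        raise ValueError("order must be >= 0 and radius must be positive")
    return order * speed_of_sound / (2.0 * math.pi * radius_m)


# Halving the array radius doubles the estimated usable bandwidth.
for radius in (0.10, 0.05):
    print(f"radius {radius:.2f} m -> about {aliasing_limit_hz(3, radius):.0f} Hz")
```

  • A value of this kind is one plausible starting point for choosing the crossover frequency between the HOA band and the remaining band, although the disclosure does not prescribe how the frequency is chosen.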
  • In the present disclosure, an object is to achieve high-quality audio reproduction by suppressing the influence of spatial aliasing, which arises particularly at high frequencies depending on the number and spacing of the microphones and the radius of the array.
  • FIG. 1 is a diagram showing a configuration of an audio system 1 as a comparative example.
  • This comparative example is a conventional form using only the HOA method, and includes a HOA encoder 22 and a HOA decoder 31. Audio signals collected by a plurality of microphones provided in the microphone set 41 are input to the HOA encoder 22.
  • the microphone set 41 includes a plurality of microphones provided in an appropriate arrangement such as a ring, a sphere, and a line.
  • the HOA encoder 22 performs HOA encoding on a plurality of audio signals collected by the microphone set 41, thereby converting the audio signals into a HOA signal represented as a spatial frequency.
  • the HOA decoder 31 can reproduce the received HOA signal using an arbitrary speaker set 42.
  • The speaker set 42 includes a plurality of speakers provided in an appropriate arrangement such as a ring, a sphere, or a line. The arrangement of the speaker set 42 does not need to depend on the microphone arrangement of the microphone set 41 that collected the sound, because the HOA signal is expressed in the spatial frequency domain. By setting the arrangement of the speakers of the speaker set 42 in the HOA decoder 31, it is possible to reproduce the collected sound field.
  • FIG. 2 is a diagram for describing an overview of the audio system 1 according to the first embodiment.
  • the audio system 1 according to the first embodiment includes an LPF 21 (Low Pass Filter), a HOA encoder 22, a HOA decoder 31, an HPF 23 (High Pass Filter), a multiplier 33, and an adder 34.
  • The audio system 1 takes as input the signals from the plurality of microphones provided in the microphone set 41 and outputs to the speaker set 42, in which a plurality of speakers are arranged.
  • The audio signal output from the microphone set 41 and input to the HOA encoder 22 via the LPF 21, and the audio signal input to the HPF 23, each have as many channels as there are microphones in the microphone set 41.
  • The audio signal output from the HOA decoder 31 and output to the speaker set 42 via the adder 34 has the same number of channels as the number of speakers arranged in the speaker set 42. Thus, in the block diagram shown in FIG. 2, for convenience of drawing, there are places where a plurality of channels are indicated by one line.
  • a plurality of audio signals collected by a plurality of microphones of the microphone set 41 are input to the HOA encoder 22.
  • The LPF 21 removes high-frequency components from the plurality of audio signals input from the microphone set 41, limiting them to a frequency band that can be correctly expressed by the HOA signal.
  • the HOA encoder 22 converts the plurality of audio signals from which the high-frequency components have been removed by the LPF 21 into HOA signals represented as spatial frequency.
  • The HOA decoder 31 decodes the HOA signal output from the HOA encoder 22 so that it can be reproduced using an arbitrary speaker set 42. Meanwhile, from the plurality of audio signals input from the microphone set 41, the HPF 23 extracts only the high-frequency components, that is, the band that cannot be expressed by the HOA encoder 22. After gain adjustment by the multiplier 33, these components are added by the adder 34 to the HOA-decoded audio signals and output to the speaker set 42.
  • FIG. 3 is a diagram illustrating frequency characteristics of the audio system 1 according to the first embodiment.
  • the low-pass characteristics shown by the solid lines indicate the characteristics of the LPF 21.
  • the high-pass characteristics indicated by the broken lines indicate the characteristics of the HPF 23.
  • Combining the two characteristics yields a flat frequency characteristic from low to high frequencies.
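  • The band split and recombination can be sketched for a single channel as follows. The disclosure does not specify the filter design, so this sketch assumes a one-pole low-pass filter and derives the high band as its complement; the HOA encoding and decoding that sit between the split and the sum in the actual system are omitted. All names and parameter values are hypothetical.

```python
import math


def split_bands(samples, cutoff_hz, sample_rate_hz):
    """Split one channel into a low band and its complementary high band."""
    # One-pole low-pass coefficient (assumed filter, not from the patent).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)  # low-pass output (role of LPF 21)
        low.append(state)
        high.append(x - state)        # complementary high band (role of HPF 23)
    return low, high


def recombine(low, high, high_gain=1.0):
    """Gain-adjust the high band (multiplier 33) and sum the bands (adder 34)."""
    return [lo + high_gain * hi for lo, hi in zip(low, high)]


signal = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(64)]
low_band, high_band = split_bands(signal, cutoff_hz=2000.0, sample_rate_hz=48000.0)
restored = recombine(low_band, high_band)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # close to zero
```

  • With a unity gain the two bands sum back to the input, which is the flat overall response that the figure describes. A gain other than 1.0 would tilt the response toward or away from the high band.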
  • FIG. 4 is a diagram illustrating a configuration of the audio system 1 according to the first embodiment.
  • the audio system 1 is actually divided into a recording device 2 provided on the recording side and a reproducing device 3 provided on the reproducing side.
  • the recording signal recorded by the recording device 2 is recorded on a recording medium or transmitted via communication.
  • the reproduction device 3 reproduces the sound field at the time of recording by reproducing the recording signal recorded on the recording medium or the recording signal transmitted via communication.
  • The input side and the output side each have eight channels (ch: channel): the microphone set 41 uses eight microphones m1 to m8, and the speaker set 42 uses eight speakers s1 to s8.
  • the microphones m1 to m8 and the speakers s1 to s8 are arranged such that the numbers of the subscripts correspond to each other. In FIG. 4, the numbers shown on the lines between the blocks indicate the number of channels.
  • the recording device 2 located on the recording side of the audio system 1 includes the LPF 21, the HOA encoder 22, the HPF 23, and the encoder 24.
  • the LPF 21, the HOA encoder 22, and the HPF 23 are the same as those described with reference to FIG.
  • the encoder 24 converts the audio signal that has passed through the HPF 23 into a signal corresponding to spatial coordinates.
  • Methods of converting into a signal corresponding to spatial coordinates include, for example, PCM (Pulse Code Modulation) coding, ADPCM (Adaptive Differential Pulse Code Modulation) coding, and delta modulation, that is, methods that depend on the coordinates.
  • the HOA encoder 22 is different from the encoder 24 in that the HOA encoder 22 converts the audio signal input from the LPF 21 into a signal corresponding to a spatial frequency.
  • The HOA signal converted by the HOA encoder 22 can reproduce the sound at a given spatial coordinate position when the spatial coordinates to be reproduced, that is, the positions of the speakers s1 to s8 in the speaker set 42, are designated.
  • The HOA signal obtained by conversion in the HOA encoder 22 of the recording device 2 and the high-frequency signal obtained by conversion in the encoder 24 are recorded on a recording medium as a recording signal, or are sent to the reproducing device 3 located on the reproducing side.
  • FIG. 5 is a diagram showing a recording format of a recording signal used in the audio system 1 according to the first embodiment.
  • the recording signal has a header section and a data section.
  • the header section is a section in which various meta information necessary for reproducing the recorded audio signal is recorded.
  • The meta information recorded in the header section includes a sampling rate, a frame length, the number of frames, the number of band divisions, and band information (first band information and second band information) for each band.
  • the sampling rate is the sampling rate used at the time of recording, and may be fixed or variable.
  • the frame length is information defining the length of a frame recorded in the data section. Either fixed or variable frame length may be adopted.
  • The number of frames (L) is the number of frames that form a chunk, which is one unit of data in the data section.
  • The number of band divisions indicates the number of bands into which the signal is divided in the audio system 1. In the present embodiment, the number of band divisions is "2", produced by the LPF 21 and the HPF 23 as described with reference to FIG. 2.
  • the first band information is information relating to conversion on the low band side, that is, the conversion of the HOA encoder 22.
  • The first band information includes a cutoff frequency, spatial domain information, signal domain information, compression scheme information, and an order.
  • the cutoff frequency corresponds to the cutoff frequency on the high frequency side of the LPF 21 described in FIG.
  • the spatial domain information includes information indicating that the band is a HOA signal.
  • It may also include information on the microphone set 41 used for sound collection, for example, information on the arrangement of the microphones m1 to m8 in the microphone set 41, such as spherical, annular, linear, inward-facing, or outward-facing.
  • the signal domain information is information indicating whether it is recorded on the time axis or on the time frequency axis.
  • the compression method information is information indicating the presence or absence of compression and the compression method being used.
  • the order is the order used in the HOA encoder 22.
  • the second band information is information relating to the conversion on the high frequency side, that is, the encoder 24.
  • The second band information includes a cutoff frequency, spatial domain information, signal domain information, compression scheme information, and channel information.
  • the cutoff frequency corresponds to the cutoff frequency on the low frequency side of the HPF 23 described with reference to FIG.
  • the spatial domain information includes information indicating that the band is a signal encoded by the encoder 24.
  • It may also include information on the microphone set 41 used for sound collection, for example, information on the arrangement of the microphones m1 to m8 in the microphone set 41, such as spherical, annular, linear, inward-facing, or outward-facing.
  • the signal domain information is information indicating whether it is recorded on the time axis or on the time frequency axis.
  • the compression method information is information indicating the presence or absence of compression and the compression method being used.
  • the channel information includes the number of channels and channel coordinates. The number of channels corresponds to the number of microphones in the microphone set 41 (in this case, “8”).
  • the channel coordinates are coordinates indicating the spatial arrangement of the microphones m1 to m8 in the microphone set 41.
  • the data section stores signals converted by the HOA encoder 22 and the encoder 24.
  • In the data section, chunks are provided, each containing the number of frames (L) specified in the header section.
  • the data recorded in the frame as described above is converted into a sound signal by the HOA decoder 31 or the decoder 32 with reference to the meta information described in the header portion.
  • In the recording format described above, information common to the bands can be combined into one.
  • the recording format described above is merely an example, and the present invention is not limited to this format, and can be configured in various forms.
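  • One way to picture the header described above is as a small data structure. The sketch below is only an illustration: the class names, field names, string labels, and all numeric values (sampling rate, frame length, cutoff frequency, order, coordinates) are hypothetical and are not defined by this disclosure.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BandInfo:
    cutoff_hz: float                   # cutoff frequency of the band
    spatial_domain: str                # "hoa" or "channel" (hypothetical labels)
    signal_domain: str                 # "time" or "time-frequency"
    compression: Optional[str] = None  # None means no compression
    hoa_order: Optional[int] = None    # used by the HOA (first) band only
    # Channel coordinates, used by the channel-based (second) band only.
    channel_coords: List[Tuple[float, float]] = field(default_factory=list)


@dataclass
class RecordingHeader:
    sampling_rate_hz: int
    frame_length: int
    frames_per_chunk: int              # the number of frames (L)
    bands: List[BandInfo]

    @property
    def num_band_divisions(self) -> int:
        return len(self.bands)


# Eight microphones on a unit circle (illustrative coordinates).
mic_coords = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8))
              for i in range(8)]
header = RecordingHeader(
    sampling_rate_hz=48000,
    frame_length=1024,
    frames_per_chunk=16,
    bands=[
        BandInfo(cutoff_hz=2000.0, spatial_domain="hoa",
                 signal_domain="time", hoa_order=3),
        BandInfo(cutoff_hz=2000.0, spatial_domain="channel",
                 signal_domain="time", channel_coords=mic_coords),
    ],
)
print(header.num_band_divisions, len(header.bands[1].channel_coords))
```

  • Here the number of channels is derived from the length of the coordinate list rather than stored separately, which is one way of combining redundant information.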
  • the playback device 3 located on the playback side of the audio system 1 includes a HOA decoder 31, a decoder 32, a multiplier 33, and an adder 34.
  • the HOA decoder 31 decodes the HOA signal encoded by the HOA encoder 22 and forms an 8-channel audio signal.
  • The decoder 32 decodes the signals encoded by the encoder 24 and forms an 8-channel audio signal.
  • The adder 34 adds, for each channel, the audio signal formed by the HOA decoder 31 and the audio signal formed by the decoder 32 and multiplied by an appropriate coefficient in the multiplier 33, and outputs the result to the speaker set 42.
  • Since the number of microphones m1 to m8 of the microphone set 41 and the number of speakers s1 to s8 of the speaker set 42 are both eight, the signals of the corresponding channels are output to the speakers s1 to s8. This makes it possible to reproduce the sound field at the time of sound pickup.
  • In the first embodiment, for the HOA signal, which is a signal corresponding to spatial frequency, it is possible to suppress the effect of spatial aliasing that arises from the number and spacing of the microphones m1 to m8 in the microphone set 41, the radius of the array, and the like. This suppresses the deterioration of the audio signal that occurs above a certain frequency and makes it possible to collect and reproduce the sound field with high accuracy.
  • Second Embodiment: In the first embodiment, as described with reference to FIG. 4, the number of microphones m1 to m8 of the microphone set 41 and the number of speakers s1 to s8 of the speaker set 42 match. However, circumstances on the reproduction side may prevent the speaker set 42 from being arranged in the same way as the microphone set 41 was at the time of sound collection.
  • the second and third embodiments described below are embodiments in which the number of microphones in the microphone set 41 and the number of speakers in the speaker set 42 do not match.
  • FIG. 6 is a diagram showing a configuration of the audio system 1 according to the second embodiment.
  • the configurations of the microphone set 41 and the recording device 2 located on the recording side are the same as those described with reference to FIG. 4, and a description thereof will be omitted.
  • The speaker set 42 located on the reproduction side differs from the configuration in FIG. 4 in that the number of speakers s1 to s4 (four) is smaller than the number of microphones m1 to m8 (eight). The reproducing device 3 also differs in that a matrix unit 35 is provided between the multiplier 33 and the adder 34. By specifying the number and positions of the speakers s1 to s4 in the speaker set 42, the HOA decoder 31 outputs audio signals for four channels according to the arrangement of the speakers s1 to s4.
  • The decoder 32, on the other hand, outputs audio signals for eight channels corresponding to the microphones m1 to m8 used at the time of sound pickup.
  • the audio signal output from the decoder 32 is mixed by the matrix unit 35 as a conversion unit in accordance with the arrangement of the speakers s1 to s4 of the speaker set 42. Specifically, audio signals collected by three microphones m1, m2, and m8 are mixed as audio signals to be output to the speaker s1. At this time, the audio signals collected by the microphones m2 and m8 are multiplied by a coefficient of 0.25.
  • the audio signal output to the speaker s2 is obtained by mixing the audio signals collected by the three microphones m2, m3, and m4 in the matrix unit 35.
  • the audio signal output to the speaker s3 is obtained by mixing the audio signals collected by the three microphones m4, m5, and m6 in the matrix unit 35.
  • the audio signal output to the speaker s4 is obtained by mixing the audio signals collected by the three microphones m6, m7, and m8 in the matrix unit 35.
  • The HOA decoder 31, which restores audio signals from a signal corresponding to spatial frequency, can reproduce the sound field according to the arrangement of the speakers s1 to s4 regardless of the sound collection form, that is, the arrangement of the microphones m1 to m8. The decoder 32, by contrast, restores audio signals from a signal corresponding to spatial coordinates, so the sound field it reproduces depends on the positions of the microphones m1 to m8.
  • a matrix unit 35 is provided as a conversion unit, and the number of channels of the audio signal output from the decoder 32 is converted according to the arrangement of the speakers s1 to s4 of the speaker set 42. This makes it possible to reproduce a sound field according to the arrangement of the speakers s1 to s4 on the reproduction side.
  • the configuration of the matrix unit 35 may be various methods other than those described in the present embodiment, and is not limited to one method.
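  • The 8-to-4 mixing described for the matrix unit 35 can be sketched as a matrix multiplication. The microphone groupings and the 0.25 coefficient for m2 and m8 in the s1 mix follow the description above. The coefficient of 1.0 for the centre microphone and the reuse of 0.25 for the neighbours of s2 to s4 are assumptions, as are the function names.

```python
def build_downmix_matrix(num_mics=8, num_speakers=4, center=1.0, side=0.25):
    """Rows are speakers s1..s4, columns are microphones m1..m8."""
    step = num_mics // num_speakers
    matrix = [[0.0] * num_mics for _ in range(num_speakers)]
    for s in range(num_speakers):
        c = s * step                              # centre mic: m1, m3, m5, m7
        matrix[s][c] = center                     # assumed centre coefficient
        matrix[s][(c - 1) % num_mics] = side      # neighbour on one side
        matrix[s][(c + 1) % num_mics] = side      # neighbour on the other side
    return matrix


def apply_matrix(matrix, frame):
    """Mix one multichannel sample frame through the matrix."""
    return [sum(coef * x for coef, x in zip(row, frame)) for row in matrix]


downmix = build_downmix_matrix()
for row in downmix:
    print(row)
print(apply_matrix(downmix, [1.0] * 8))
```

  • The first row mixes m1 with m2 and m8, the second mixes m3 with m2 and m4, and so on, matching the groupings listed for s1 to s4. The modulo indexing reflects the wrap-around between m8 and m1.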
  • FIG. 7 is a diagram illustrating a configuration of the audio system 1 according to the third embodiment.
  • the third embodiment is different from the second embodiment described with reference to FIG. 6 in the arrangement of the microphones in the microphone set 41 and the arrangement of the speakers in the speaker set 42. More specifically, the microphone set 41 includes four microphones m1 to m4, and the speaker set 42 includes eight speakers s1 to s8.
  • the matrix unit 35 as a conversion unit converts the four-channel audio signals output from the decoder 32 corresponding to the microphones m1 to m4 into eight-channel audio signals corresponding to the speakers s1 to s8.
  • The audio signals collected by the microphones m1, m2, m3, and m4 are output directly to the speakers s1, s3, s5, and s7 in the corresponding positions.
  • The audio signals for the speakers s2, s4, s6, and s8, which have no correspondingly positioned microphones, are formed by mixing the audio signals of a plurality of microphones.
  • The audio signal for the speaker s2 is formed by mixing the audio signals of the microphone m1 and the microphone m2.
  • Although the mixing here is performed by multiplying by fixed coefficients and adding, the coefficients may be changed dynamically. For example, by assigning a larger coefficient to the direction in which the magnitude (level) of the audio signal is larger, it is possible to emphasize the sense of direction of the sound field during reproduction.
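  • The 4-to-8 conversion can be sketched in the same way. The direct routing to s1, s3, s5, and s7 and the m1 plus m2 mix for s2 follow the description above. The pairings for s4, s6, and s8, the fixed coefficient of 0.5, and the level-proportional weighting rule are assumptions made for illustration, and the function names are hypothetical.

```python
def build_upmix_matrix(num_mics=4, mix=0.5):
    """Rows are speakers s1..s8, columns are microphones m1..m4."""
    matrix = [[0.0] * num_mics for _ in range(2 * num_mics)]
    for m in range(num_mics):
        matrix[2 * m][m] = 1.0                       # s1, s3, s5, s7: direct
        matrix[2 * m + 1][m] = mix                   # in-between speaker:
        matrix[2 * m + 1][(m + 1) % num_mics] = mix  # mix of the two neighbours
    return matrix


def mix_channels(matrix, frame):
    """Mix one multichannel sample frame through the matrix."""
    return [sum(coef * x for coef, x in zip(row, frame)) for row in matrix]


def dynamic_pair_weights(level_a, level_b):
    """Weights proportional to signal level (one possible dynamic rule)."""
    total = abs(level_a) + abs(level_b)
    if total == 0.0:
        return 0.5, 0.5
    return abs(level_a) / total, abs(level_b) / total


upmix = build_upmix_matrix()
print(mix_channels(upmix, [1.0, 2.0, 3.0, 4.0]))
print(dynamic_pair_weights(3.0, 1.0))
```

  • Replacing the fixed 0.5 entries of a row with the output of `dynamic_pair_weights` for the two neighbouring microphones would steer each in-between speaker toward the louder side.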
  • In the second and third embodiments described above, the matrix unit 35 serving as a conversion unit converts the number of channels to match the arrangement of the speakers on the reproduction side, so that the sound field can be reproduced properly.
  • The conversion unit can not only convert the number of channels but also, when the sound pickup direction of a microphone differs from the sound emission direction of a speaker, convert the audio signal into an appropriate form.
  • the number of microphones and the number of speakers may be the same.
  • the configuration of the matrix unit 35 may be various methods other than the one described in the present embodiment, and is not limited to one method.
  • FIG. 8 is a diagram illustrating a configuration of an audio system 1 according to the fourth embodiment.
  • The fourth embodiment differs from the audio system 1 according to the first embodiment described with reference to FIG. 4 in that a downsampling unit 26 serving as a sampling frequency conversion unit and a delay unit 25 for compensating for the delay caused by the downsampling unit 26 are provided on the recording side, and an upsampling unit 37 serving as a sampling frequency conversion unit is provided on the reproduction side.
  • Since the signal processed in the HOA encoder 22 does not include the high-frequency region, reducing the sampling frequency of the input audio signal is considered to have little effect on sound quality.
  • the amount of calculation in the HOA encoder 22 is reduced by performing a down-sampling process on the time axis in the down-sampling unit 26 for the audio signal input to the HOA encoder 22. Further, by performing the downsampling, the data amount of the signal output from the HOA encoder 22 can be reduced, and the storage capacity and the communication amount can be reduced.
  • On the reproduction side, the upsampling unit 37 arranged downstream of the HOA decoder 31 upsamples the signal to the same sampling frequency as on the decoder 32 side.
  • For such sampling frequency conversion, an FIR filter is used in many cases.
  • the delay unit 25 is provided on the path on the encoder 24 side to compensate for the delay generated in the downsampling unit 26.
  • the delay may be compensated on the reproducing side (or the recording side and the reproducing side).
  • When compensating for the delay on the reproduction side, for example, it is conceivable to arrange a delay unit at a stage subsequent to the decoder 32.
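  • The need for delay compensation can be illustrated with a small sketch. It assumes that the filter in the sampling frequency conversion path is a symmetric (linear-phase) FIR filter with an odd number of taps, whose delay is (taps - 1) / 2 samples. The decimation itself is omitted, and the tap values and function names are hypothetical.

```python
def fir_filter(samples, taps):
    """Causal FIR filter: y[n] = sum over k of taps[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def fir_delay_samples(taps):
    """Delay of a symmetric FIR filter with an odd number of taps."""
    return (len(taps) - 1) // 2


def delay(samples, num_samples):
    """Delay line standing in for the delay unit on the other path."""
    return [0.0] * num_samples + list(samples[:len(samples) - num_samples])


taps = [0.1, 0.2, 0.4, 0.2, 0.1]               # illustrative anti-aliasing filter
impulse = [0.0] * 12
impulse[3] = 1.0

filtered_path = fir_filter(impulse, taps)       # HOA-side path with the filter
compensated_path = delay(impulse, fir_delay_samples(taps))  # other path

print(filtered_path.index(max(filtered_path)))  # peak position after filtering
print(compensated_path.index(1.0))              # peak position after the delay
```

  • Both paths now peak at the same sample index, so the bands line up again when they are added. The same delay could equally be inserted after the decoder on the reproduction side.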
  • In the fourth embodiment, performing downsampling on the time axis on the audio signal input to the HOA encoder 22 makes it possible to reduce both the amount of calculation in the HOA encoder 22 and the data amount of the signal output from the HOA encoder 22. In addition, because the amount of data output from the HOA encoder 22 is reduced, a larger amount of information (for example, the number of bits) can be assigned to the signal output from the encoder 24.
  • The conversion of the sampling frequency may be performed on the encoder 24 and decoder 32 side instead of the HOA encoder 22 and HOA decoder 31 side, or on both sides.
  • FIG. 9 is a diagram illustrating a configuration of an audio system 1 according to the fifth embodiment.
  • Whereas the audio system 1 according to the first embodiment described with reference to FIG. 4 uses one HOA encoder 22, the fifth embodiment differs in that a HOA encoder 22a in charge of the low band and a HOA encoder 22b in charge of the mid band are provided.
  • An LPF 21a for removing high-frequency components of the input audio signal is arranged at the stage preceding the HOA encoder 22a, and a BPF 21b (Band Pass Filter) for extracting mid-range components of the input audio signal is arranged at the stage preceding the HOA encoder 22b.
  • A HOA decoder 31a for decoding the audio signal encoded by the HOA encoder 22a and a HOA decoder 31b for decoding the audio signal encoded by the HOA encoder 22b are arranged on the reproduction side.
  • The adder 34 adds the audio signal decoded by the decoder 32 and multiplied by a coefficient in the multiplier 33 to the audio signals decoded by the HOA decoder 31a and the HOA decoder 31b, and outputs the result to the speaker set 42.
  • FIG. 10 is a diagram illustrating frequency characteristics of the audio system 1 according to the fifth embodiment.
  • the low-pass characteristic shown by the solid line indicates the characteristic of the LPF 21a.
  • the mid-pass characteristic indicated by the dashed line indicates the characteristic of the BPF 21b.
  • the high-pass characteristics indicated by broken lines indicate the characteristics of the HPF 23.
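  • The three-band split suggested by these characteristics can be illustrated with a short sketch. It swaps in ideal brick-wall masks in the FFT domain in place of the LPF 21a, BPF 21b and HPF 23, purely so that the bands are easy to inspect; the function name `split_bands` and the crossover frequencies are assumptions, not values from the disclosure.

```python
import numpy as np

def split_bands(x, fs, f_low, f_high):
    """Split x into low, mid and high bands with brick-wall FFT masks."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    masks = {
        "low": freqs < f_low,                          # stands in for LPF 21a
        "mid": (freqs >= f_low) & (freqs < f_high),    # stands in for BPF 21b
        "high": freqs >= f_high,                       # stands in for HPF 23
    }
    return {name: np.fft.irfft(spectrum * mask, n=len(x))
            for name, mask in masks.items()}

fs = 48000
t = np.arange(4800) / fs
tone_low = np.sin(2 * np.pi * 200 * t)
x = tone_low + np.sin(2 * np.pi * 2000 * t) + np.sin(2 * np.pi * 10000 * t)

# Hypothetical crossover frequencies of 1 kHz and 6 kHz.
bands = split_bands(x, fs, f_low=1000.0, f_high=6000.0)
```

    Because the masks do not overlap and together cover the whole spectrum, the three bands sum back to the input in this idealized case; practical filters with finite slopes would only approximate that.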
  • by providing the HOA encoders 22a and 22b separately for a plurality of frequency bands, it is possible to vary the order used in the HOA processing between the HOA encoders 22a and 22b.
  • in a band where the wavelength is sufficiently long, human perception of the direction of arrival of sound is insensitive. It is therefore conceivable to reduce the calculation amount by performing the processing for such a band at a lower order.
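  • As a rough illustration of why a per-band order matters, the sketch below counts coefficient channels per band. It assumes a 3D representation, where an order-n signal has (n + 1)^2 spherical-harmonic coefficients; the orders chosen for each band are hypothetical and `hoa_channel_count` is an illustrative name.

```python
def hoa_channel_count(order):
    """Coefficient channels of a 3D spherical-harmonic signal of this order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

# Hypothetical orders: a low order for the low band, a higher one for the mid band.
band_orders = {"low": 1, "mid": 3}
channels = {band: hoa_channel_count(n) for band, n in band_orders.items()}

# Fraction of channels saved compared with using order 3 in both bands.
saving = 1 - sum(channels.values()) / (len(band_orders) * hoa_channel_count(3))
```

    Under these assumed orders the low band carries 4 channels instead of 16, so the two bands together need 20 channels rather than 32.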
  • FIG. 11 is a diagram illustrating a configuration of an audio system 1 according to the sixth embodiment.
  • the mode in which the delay unit 25 is provided on the recording side to compensate for the delay generated in the downsampling unit 26 has been described above. Because processing is performed separately for each band, a time lag may occur between the bands.
  • the sixth embodiment shows a configuration for eliminating such a time lag between bands on the reproduction side.
  • a time lag caused by the processing of the encoder 24 and the HOA encoder 22 on the recording side or a time lag caused by the processing of the decoder 32 and the HOA decoder 31 on the reproduction side can be eliminated by providing the delay unit 36 on the reproduction side.
  • instead of compensating for the delay on the reproducing side, the delay may be compensated on the recording side (or on both the recording side and the reproducing side).
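  • The alignment performed by a delay unit such as the delay unit 36 can be sketched as follows, assuming the latency of each band's path is known as a whole number of samples. The function `align_bands`, the band names and the latency values are hypothetical.

```python
import numpy as np

def align_bands(bands, latency_samples):
    """Delay every band so that all share the latency of the slowest path."""
    target = max(latency_samples.values())
    length = max(len(sig) + target - latency_samples[name]
                 for name, sig in bands.items())
    aligned = {}
    for name, sig in bands.items():
        pad = target - latency_samples[name]   # extra delay this band needs
        out = np.zeros(length)
        out[pad:pad + len(sig)] = sig
        aligned[name] = out
    return aligned

# The same impulse, seen after two paths with different (assumed) latencies.
sf_path = np.zeros(32)
sf_path[15] = 1.0      # 5 samples late
sa_path = np.zeros(32)
sa_path[12] = 1.0      # 2 samples late

aligned = align_bands({"sf": sf_path, "sa": sa_path}, {"sf": 5, "sa": 2})
```

    After alignment both impulses fall on the same sample index, so the bands can be added without a time lag between them.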
  • <Modification> Modification of HOA method
  • the first to sixth embodiments have described the embodiments using the HOA method using the HOA encoder 22, the HOA decoder 31, and the like.
  • the signals used in the various embodiments are not limited to signals encoded by the HOA method. Various methods can be adopted as long as the signal is associated with spatial frequencies, in other words, as long as the signal can reproduce the audio signal at the positions to be reproduced when decoding (the positions at which the speakers are installed).
  • a signal associated with a spatial frequency is referred to as an SF signal.
  • the method used by the encoder 24, the decoder 32, and the like uses a signal corresponding to the spatial coordinates.
  • the signal is a signal that can reproduce the audio signal at the picked-up position (spatial coordinates).
  • a signal associated with the spatial coordinates is referred to as an SA signal.
  • FIG. 12 is a diagram illustrating a recording format of a recording signal used in the audio system 1 according to a modification.
  • the format of the recording signal used in the first embodiment has been described.
  • a description will be given of a recording format in which the HOA signal is generalized as a signal associated with a spatial frequency (SF signal), and the signal in charge of the high band is a signal associated with spatial coordinates (SA signal).
  • the recording signal has a header section and a data section.
  • the header section is a section in which various meta information necessary for reproducing the recorded audio signal is recorded.
  • the meta information recorded in the header section includes a sampling rate, a frame length, the number of frames (L), the number of band divisions (N), and band information for each band (first to N-th band information). For example, when the frequency band is divided into three frequency bands as in the fifth embodiment, first to third band information is provided.
  • the sampling rate is the sampling rate used at the time of recording, and may be fixed or variable.
  • the frame length is information defining the length of a frame recorded in the data section. Either fixed or variable frame length may be adopted.
  • the number of frames (L) is a number that defines the number of frames forming a chunk that is a unit of one data in the data portion.
  • the number of band divisions is a number indicating the number of bands to be divided in the audio system 1. For example, when the band is divided into three frequency bands as in the fifth embodiment, the number of band divisions is "3".
  • Each band information (first to N-th band information) is provided with a first cutoff frequency indicating the lower limit of the assigned frequency band and a second cutoff frequency indicating the upper limit.
  • the time delay information is information indicating delay or advance with respect to another band, and can be used, for example, for setting the delay time in the delay unit 36 described in the sixth embodiment.
  • the spatial domain information is information indicating whether the band is an SF signal or an SA signal; by referring to it, the reproducing device 3 can determine a decoding method for the band.
  • the spatial domain information may include information on the microphone arrangement of the collected microphone set 41 and the like.
  • the signal domain information is information indicating whether it is recorded on the time axis or on the time frequency axis.
  • the compression method information is information indicating the presence or absence of compression and the compression method being used.
  • the order is stored when the SF signal is used, and the channel information is stored when the SA signal is used.
  • the order stored for the frequency band using the SF signal is the order used for the process of forming a signal corresponding to the spatial frequency.
  • the channel information stored for a frequency band using the SA signal is, as shown in the figure, configured to include the number of channels and the channel coordinates.
  • the number of channels corresponds to the number of microphones in the microphone set 41 (for example, “4” in the case of the third embodiment shown in FIG. 7).
  • the channel coordinates are coordinates indicating the spatial arrangement of the microphones m1 to m4 in the microphone set 41.
  • the matrix unit 35 (conversion unit) described in the second embodiment with reference to FIG. 6 and in the third embodiment with reference to FIG. 7 can perform various conversions, such as converting the number of channels of an audio signal, based on this channel information and the arrangement of speakers in the speaker set 42.
  • the data section stores signals converted for each band.
  • in the data section, a frame chunk consisting of L frames (the number of frames) is provided for each of the first to N-th bands.
  • the data recorded in such a frame is converted into an audio signal with reference to the meta information described in the header.
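  • The header described above can be modeled as a small data structure. The sketch below is only one possible reading: the field names, types, units and the string labels returned by `decoder_for` are assumptions, and the byte-level layout of the recording signal is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class BandInfo:
    low_cutoff_hz: float               # first cutoff frequency (lower limit)
    high_cutoff_hz: float              # second cutoff frequency (upper limit)
    time_delay: int                    # delay or advance relative to another band (unit assumed: samples)
    spatial_domain: str                # "SF" or "SA"
    signal_domain: str                 # "time" or "time-frequency"
    compression: Optional[str] = None  # None when the band is not compressed
    order: Optional[int] = None        # stored for SF bands
    channel_coords: Optional[List[Tuple[float, float, float]]] = None  # stored for SA bands

    @property
    def num_channels(self) -> Optional[int]:
        return None if self.channel_coords is None else len(self.channel_coords)

@dataclass
class Header:
    sampling_rate: int
    frame_length: int
    num_frames: int          # L: frames per chunk
    bands: List[BandInfo]    # first to N-th band information

    @property
    def num_bands(self) -> int:  # N: number of band divisions
        return len(self.bands)

def decoder_for(band: BandInfo) -> str:
    """Pick a decoding path from the spatial domain information."""
    if band.spatial_domain == "SF":
        if band.order is None:
            raise ValueError("an SF band needs an order")
        return "spatial-frequency decoder"
    if band.spatial_domain == "SA":
        if not band.channel_coords:
            raise ValueError("an SA band needs channel coordinates")
        return "spatial-coordinate decoder"
    raise ValueError("unknown spatial domain: " + band.spatial_domain)
```

    A reproducing device could walk `header.bands`, call `decoder_for` on each entry, and use `time_delay` to set up the delay compensation before adding the decoded bands.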
  • the playback device 3 in each of the above-described embodiments outputs an audio signal to the speaker set 42 including a plurality of speakers.
  • the reproducing device 3 may reproduce an audio signal in a virtual environment using headphones, for example. That is, in the above-described embodiments, if the head-related transfer functions from each speaker in the speaker set 42 to both ears of the listener are known, convolving each head-related transfer function with the audio signal driving the corresponding speaker shows how the sound of each speaker is heard at both ears of the listener. By summing these contributions for each of the left and right ears and reproducing the result through headphones or the like, a sound field similar to that heard by a listener using the speaker set 42 can be reproduced.
  • sound field formation using such a virtual environment can be realized not only with headphones but also with any electro-acoustic transducer driven by two or more channels. In that case, various corrections such as crosstalk cancellation can be applied, if necessary, to the audio signal reproduced by the electro-acoustic transducer.
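  • The headphone rendering described above amounts to convolving each speaker feed with a pair of head-related impulse responses and summing per ear. The sketch below assumes time-domain impulse responses of equal length; the function name `binauralize`, the array layout and the toy impulse responses are illustrative, not measured data.

```python
import numpy as np

def binauralize(speaker_signals, hrirs):
    """Render speaker feeds to two ears.

    speaker_signals: array of shape (num_speakers, num_samples)
    hrirs: array of shape (num_speakers, 2, num_taps), ear 0 = left, ear 1 = right
    Returns an array of shape (2, num_samples + num_taps - 1).
    """
    num_speakers, num_samples = speaker_signals.shape
    num_taps = hrirs.shape[2]
    out = np.zeros((2, num_samples + num_taps - 1))
    for s in range(num_speakers):
        for ear in range(2):
            # Contribution of speaker s to this ear, accumulated over speakers.
            out[ear] += np.convolve(speaker_signals[s], hrirs[s, ear])
    return out

# Toy impulse responses: each speaker is direct at the near ear and
# delayed by one sample at half amplitude at the far ear.
hrirs = np.array([
    [[1.0, 0.0], [0.0, 0.5]],   # speaker 0: left ear, right ear
    [[0.0, 0.5], [1.0, 0.0]],   # speaker 1: left ear, right ear
])
feeds = np.array([
    [1.0, 0.0, 0.0],            # speaker 0 plays a single impulse
    [0.0, 0.0, 0.0],            # speaker 1 is silent
])
ears = binauralize(feeds, hrirs)
```

    Crosstalk cancellation for loudspeaker playback would be an additional step applied to the two output channels and is not shown here.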
  • the present disclosure can be realized in various forms such as an apparatus, a method, and a program.
  • the items described in each of the embodiments and the modified examples can be appropriately combined.
  • a first decoder for decoding a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoder that decodes a second signal including a band different from the first signal and corresponding to spatial coordinates into audio signals of a plurality of channels;
  • An audio reproducing apparatus comprising: an adder that adds a plurality of channels of audio signals decoded by the first decoder and a plurality of channels of audio signals decoded by the second decoder.
  • the audio reproduction device according to (1) or (2), wherein the first decoder performs decoding based on an arrangement of speakers to be output.
  • the audio playback device according to any one of (1) to (3), wherein the first decoder uses a HOA method.
  • the audio playback device according to any one of (1) to (4), further including a conversion unit configured to convert audio signals of a plurality of channels output from the second decoder based on an arrangement of speakers to be output.
  • the audio reproduction device according to (5), wherein the conversion unit converts the number of channels of the audio signal output from the second decoder.
  • the first signal and the second signal have different sampling frequencies
  • the audio playback device according to any one of (1) to (6), further including a sampling frequency conversion unit configured to convert at least one of the first signal and the second signal.
  • a plurality of the second decoders are provided for each band, The audio reproduction device according to any one of (1) to (7), wherein the plurality of second decoders use different orders for decoding.
  • the audio playback device according to any one of (1) to (8), further including a delay unit that adjusts a time shift generated between the first decoder and the second decoder.
  • a first decoding process of decoding a first signal associated with a spatial frequency into audio signals of a plurality of channels; a second decoding process of decoding a second signal including a band different from the first signal and corresponding to spatial coordinates into audio signals of a plurality of channels;
  • An audio reproduction program for causing an information processing device to execute an addition process of adding audio signals of a plurality of channels decoded by the first decoder and audio signals of a plurality of channels decoded by the second decoder.
  • 1: audio system; 2: recording device; 3: playback device; 21 (21a, 21b): LPF; 22 (22a, 22b): HOA encoder; 23: HPF; 24: encoder; 25: delay unit; 26: downsampling unit; 31 (31a, 31b): HOA decoder; 32: decoder; 33: multiplying unit; 34: adding unit; 35: matrix unit; 36: delay unit; 37: upsampling unit; 41: microphone set; 42: speaker set; m1 to m8: microphones; s1 to s8: speakers

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Stereophonic System (AREA)

Abstract

The invention relates to an audio reproduction device comprising a first decoder that decodes a first signal associated with a spatial frequency into a multichannel audio signal, a second decoder that decodes a second signal, which includes a band different from that of the first signal and is associated with spatial coordinates, into a multichannel audio signal, and an addition unit that adds together the multichannel audio signal decoded by the first decoder and the multichannel audio signal decoded by the second decoder.
PCT/JP2019/025199 2018-08-21 2019-06-25 Dispositif de reproduction audio, procédé de reproduction audio et programme de reproduction audio WO2020039734A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980053901.8A CN112567769B (zh) 2018-08-21 2019-06-25 音频再现装置、音频再现方法和存储介质
DE112019004193.2T DE112019004193T5 (de) 2018-08-21 2019-06-25 Audiowiedergabevorrichtung, audiowiedergabeverfahren und audiowiedergabeprogramm

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018154456 2018-08-21
JP2018-154456 2018-08-21

Publications (1)

Publication Number Publication Date
WO2020039734A1 true WO2020039734A1 (fr) 2020-02-27

Family

ID=69592557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/025199 WO2020039734A1 (fr) 2018-08-21 2019-06-25 Dispositif de reproduction audio, procédé de reproduction audio et programme de reproduction audio

Country Status (3)

Country Link
CN (1) CN112567769B (fr)
DE (1) DE112019004193T5 (fr)
WO (1) WO2020039734A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015137146A1 (fr) * 2014-03-12 2015-09-17 ソニー株式会社 Dispositif et procédé de capture de son de champ sonore, dispositif et procédé de reproduction de champ sonore et programme
WO2017035163A1 (fr) * 2015-08-25 2017-03-02 Dolby Laboratories Licensing Corporation Décodeur audio et procédé de décodage
JP2017523451A (ja) * 2014-07-02 2017-08-17 ドルビー・インターナショナル・アーベー 圧縮hoa表現をデコードする方法および装置ならびに圧縮hoa表現をエンコードする方法および装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB8920259D0 (en) * 1989-09-07 1989-10-18 British Broadcasting Corp Hybrid predictive coders and decoders for digital video signals
CN101140759B (zh) * 2006-09-08 2010-05-12 华为技术有限公司 语音或音频信号的带宽扩展方法及系统
EP2782094A1 (fr) * 2013-03-22 2014-09-24 Thomson Licensing Procédé et appareil permettant d'améliorer la directivité d'un signal ambisonique de 1er ordre
US9502045B2 (en) * 2014-01-30 2016-11-22 Qualcomm Incorporated Coding independent frames of ambient higher-order ambisonic coefficients
CN105898669B (zh) * 2016-03-18 2017-10-20 南京青衿信息科技有限公司 一种声音对象的编码方法

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015137146A1 (fr) * 2014-03-12 2015-09-17 ソニー株式会社 Dispositif et procédé de capture de son de champ sonore, dispositif et procédé de reproduction de champ sonore et programme
JP2017523451A (ja) * 2014-07-02 2017-08-17 ドルビー・インターナショナル・アーベー 圧縮hoa表現をデコードする方法および装置ならびに圧縮hoa表現をエンコードする方法および装置
WO2017035163A1 (fr) * 2015-08-25 2017-03-02 Dolby Laboratories Licensing Corporation Décodeur audio et procédé de décodage

Also Published As

Publication number Publication date
DE112019004193T5 (de) 2021-07-15
CN112567769A (zh) 2021-03-26
CN112567769B (zh) 2022-11-04

Similar Documents

Publication Publication Date Title
US10231073B2 (en) Ambisonic audio rendering with depth decoding
US10674262B2 (en) Merging audio signals with spatial metadata
US10999689B2 (en) Audio signal processing method and apparatus
CN107533843B (zh) 用于捕获、编码、分布和解码沉浸式音频的系统和方法
US9361898B2 (en) Three-dimensional sound compression and over-the-air-transmission during a call
RU2640647C2 (ru) Устройство и способ преобразования первого и второго входных каналов, по меньшей мере, в один выходной канал
CN100496149C (zh) 把两频道矩阵编码音频重构为多频道音频的解码方法
US8284946B2 (en) Binaural decoder to output spatial stereo sound and a decoding method thereof
JP5054035B2 (ja) 符号化/復号化装置及び方法
CN106797526A (zh) 音频处理装置、方法和程序
WO2019239011A1 (fr) Capture, transmission et reproduction audio spatiales
KR101637407B1 (ko) 부가적인 출력 채널들을 제공하기 위하여 스테레오 출력 신호를 발생시키기 위한 장치와 방법 및 컴퓨터 프로그램
CN112823534B (zh) 信号处理设备和方法以及程序
WO2020039734A1 (fr) Dispositif de reproduction audio, procédé de reproduction audio et programme de reproduction audio
JPWO2020100670A1 (ja) 信号処理装置および方法、並びにプログラム
CN112133316A (zh) 空间音频表示和渲染
WO2021261235A1 (fr) Dispositif et procédé de traitement de signaux et programme
WO2022050087A1 (fr) Dispositif et procédé de traitement de signal, dispositif et procédé d'apprentissage, et programme
EP4264962A1 (fr) Système de localisation sonore psychoacoustique pour casque stéréo et procédé de reconstruction de signaux sonores psychoacoustiques stéréo l'utilisant

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19852545

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19852545

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP