WO2024038702A1 - Sound field reproduction device, sound field reproduction method, and sound field reproduction system - Google Patents

Sound field reproduction device, sound field reproduction method, and sound field reproduction system

Info

Publication number
WO2024038702A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound field
sound
signal
field reproduction
acoustic signal
Prior art date
Application number
PCT/JP2023/025363
Other languages
French (fr)
Japanese (ja)
Inventor
宏正 大橋
Original Assignee
Panasonic IP Management Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Panasonic IP Management Co., Ltd.
Publication of WO2024038702A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 1/00: Details of transducers, loudspeakers or microphones
    • H04R 1/20: Arrangements for obtaining desired frequency or directional characteristics
    • H04R 1/32: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R 1/40: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 7/00: Indicating arrangements; Control arrangements, e.g. balance control

Definitions

  • the present disclosure relates to a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system.
  • Scene-based stereophonic sound reproduction technology performs signal processing on multi-channel signals recorded using an ambisonics microphone, in which multiple directional microphone elements are arranged on a rigid sphere or a hollow sphere, and uses speakers placed to surround the listening environment (space) to reproduce in real time a three-dimensional sound field as if the listener were present at the location where the ambisonics microphone is installed.
  • Patent Document 1 is known as a prior art related to sound field reproduction.
  • Patent Document 1 discloses a signal processing device that acquires a plurality of sound collection signals based on sound collection by a plurality of sound collection units installed together in a sound collection target space and oriented in different directions depending on the position of a sound source and the position of an object that reflects the sound emitted from the sound source, and that generates, based on the acquired sound collection signals, an acoustic signal corresponding to a designated listening point in the sound collection target space.
  • The configuration of Patent Document 1 is based on the premise that a listening point exists within the sound collection target space in which the plurality of sound collection units are arranged. For this reason, even if one attempts to construct a scene-based stereophonic sound system using Patent Document 1, a listener must be present within the sound collection target space where the sound collection units are arranged. In other words, if the listener is at a position outside the sound collection target space, it is difficult to reproduce the sound field so that the sound signals collected in the sound collection target space are heard as they would be within that space.
  • It is known that an ambisonics microphone can synthesize higher-order ambisonics components (sound field components), and therefore achieve higher directional resolution, as the number of microphone elements arranged on its spherical surface increases. However, to synthesize higher-order sound field components during real-time live streaming at events such as public viewings, the number of microphone elements placed in the recording target space must be increased. Consequently, depending on the installation space, there are physical restrictions on the arrangement of the microphone elements, and the increase in the number of transmission channels that accompanies more microphone elements increases the processing load of signal processing and synthesis processing, causing a delay in the output for sound field reproduction.
  • The present disclosure was devised in view of the conventional situation described above, and aims to provide a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system that utilize low-order sound field components recorded using an ambisonics microphone while suppressing an increase in sound source localization error in a sound field reproduction space.
  • The present disclosure provides a sound field reproduction device comprising: a sound source extraction direction control unit that receives a designation of a sound source extraction direction in a sound field recording space in which a recording device is placed; a re-encoding unit that re-encodes, among low-order basis acoustic signals based on an encoding process using a recorded signal from the recording device, the acoustic signal corresponding to the sound source extraction direction to generate high-order basis acoustic signals; and a sound field reproduction unit that outputs signals based on the high-order basis acoustic signals from each of a plurality of speakers installed in a sound field reproduction space different from the sound field recording space.
  • The present disclosure also provides a sound field reproduction method comprising: a step of receiving a designation of a sound source extraction direction in a sound field recording space in which a recording device is arranged; a step of re-encoding, among low-order basis acoustic signals based on an encoding process using a recorded signal from the recording device, the acoustic signal corresponding to the sound source extraction direction to generate high-order basis acoustic signals; and a step of outputting signals based on the high-order basis acoustic signals from each of a plurality of speakers installed in a sound field reproduction space different from the sound field recording space.
  • The present disclosure also provides a sound field reproduction system comprising: a sound field recording device having a recording device that can record a sound source in a sound field recording space; and a sound field reproduction device that is placed in a sound field reproduction space different from the sound field recording space and reproduces the sound field of the sound field recording space, wherein the sound field reproduction device includes a sound source extraction direction control unit that receives a designation of a sound source extraction direction within the sound field recording space, a re-encoding unit that re-encodes, among low-order basis acoustic signals based on an encoding process using a recorded signal from the recording device, the acoustic signal corresponding to the sound source extraction direction to generate high-order basis acoustic signals, and a sound field reproduction section that outputs signals based on the high-order basis acoustic signals from each of a plurality of speakers.
  • Diagram schematically showing the concept from sound field recording to sound field reproduction in scene-based stereophonic sound reproduction technology using ambisonics microphones
  • Diagram showing an example of the basis of an ambisonics component based on spherical harmonic expansion for order n and degree m
  • Diagram showing an example of an overview of operations from sound field recording to sound field reproduction in Embodiment 1
  • Flowchart chronologically showing an example of an operation procedure for sound field reproduction by the sound field reproduction device according to Embodiment 1
  • Block diagram showing a system configuration example of a sound field reproduction system according to Embodiment 2
  • Diagram showing an example of an overview of operations from sound field recording to sound field reproduction in Embodiment 2
  • Flowchart chronologically showing an example of an operation procedure for sound field reproduction by the sound field reproduction device according to Embodiment 2
  • In the following, scene-based stereophonic sound reproduction technology is described that uses an ambisonics microphone as a recording device for recording sound source signals, such as sounds, music, and human voices, in a sound field recording space (for example, a live venue).
  • Point sound sources recorded by the multiple microphone elements constituting the ambisonics microphone are expressed (encoded) as an intermediate representation ITMR1 (see FIG. 1) using spherical harmonic functions, that is, as a B-format signal, so that the sound field arriving from all directions is handled uniformly in the ambisonics signal domain (see below).
  • a speaker drive signal is generated, thereby realizing a desired sound field reproduction within a sound field reproduction space (for example, a satellite venue).
  • FIG. 1 is a diagram schematically showing the concept from sound field recording to sound field reproduction in scene-based stereophonic sound reproduction technology using an ambisonics microphone 11.
  • the ambisonics microphone 11 is placed in a sound field recording space such as the live venue LV1.
  • Performances are given by multiple sound sources (for example, in the case of a band performance by multiple people, various sound sources such as vocals, bass, guitar, and drums), and the sounds of the performances are recorded by the ambisonics microphone 11.
  • The ambisonics microphone 11, which is an example of a recording device, includes four microphone elements Mc1, Mc2, Mc3, and Mc4. With the direction Dr1 as the front direction, each of the microphone elements Mc1 to Mc4 is arranged in a hollow configuration so as to face one of the four vertices from the center of the cube CB1 in FIG. 1, and has unidirectionality in the direction of its vertex.
  • the microphone element Mc1 faces the front left up (FLU) of the ambisonics microphone 11, and records sound in the front left up (FLU) direction.
  • the microphone element Mc2 faces the front right down (FRD) of the ambisonics microphone 11, and records sound in the front right down direction (FRD).
  • the microphone element Mc3 faces the back left down (BLD) of the ambisonics microphone 11, and records sound in the back left down direction.
  • the microphone element Mc4 faces the back right up (BRU) of the ambisonics microphone 11, and records the sound in the back right up direction.
  • the sound recording signals in these four directions are called A-format signals.
  • The A-format signal cannot be used as is for sound field reproduction, and is therefore converted into a B-format signal as an intermediate representation ITMR1 having directional characteristics (directivity).
  • The B-format signals include a B-format signal W for omnidirectional sound, a B-format signal X for front-back sound, a B-format signal Y for left-right sound, and a B-format signal Z for up-down sound.
  • The A-format signal is converted into a B-format signal using the following conversion formula.
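The conversion formula itself does not survive in this extract. As a hedged illustration only, the following sketch shows the standard tetrahedral A-to-B conversion commonly used for four-capsule ambisonics microphones (sum/difference combinations of the FLU, FRD, BLD, BRU capsule signals); the disclosure's exact formula and normalisation may differ.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample of a tetrahedral A-format recording
    (FLU, FRD, BLD, BRU capsule signals) to first-order B-format.

    W: omnidirectional, X: front-back, Y: left-right, Z: up-down.
    The 0.5 scaling is one common convention, assumed here.
    """
    w = 0.5 * (flu + frd + bld + bru)
    x = 0.5 * (flu + frd - bld - bru)
    y = 0.5 * (flu - frd + bld - bru)
    z = 0.5 * (flu - frd - bld + bru)
    return w, x, y, z

# A source exciting only the front-left-up capsule contributes
# positively to X (front), Y (left), and Z (up):
w, x, y, z = a_to_b_format(1.0, 0.0, 0.0, 0.0)
```

Note how an identical signal on all four capsules cancels in X, Y, Z and survives only in the omnidirectional channel W, which is exactly the directional separation the B-format intermediate representation provides.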
  • The position of each of the speakers SPk1 to SPk8 can be specified by a predetermined distance and angle (azimuth angle θi and elevation angle φi) from the reference position (for example, center position LSP1) of the sound field reproduction space (for example, satellite venue STL1).
  • i is a variable indicating a speaker placed in the sound field reproduction space (for example, satellite venue STL1), and takes any integer from 1 to 8 in the example of FIG.
  • Accordingly, the sound field of the sound field recording space (for example, live venue LV1) can be freely reproduced in the sound field reproduction space (for example, satellite venue STL1) based on the respective directions of the speakers SPk1 to SPk8 in that space.
  • Taking the front direction of the listener as the reference direction, it becomes possible to reproduce and output the sound in any three-dimensional direction from that reference direction (for example, the sound source presentation direction θtarget).
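As a small sketch of how the azimuth/elevation speaker description above maps to geometry (assuming the common convention of azimuth measured in the horizontal plane from the front direction and elevation measured upward from that plane; the disclosure's exact convention is not stated in this extract):

```python
import math

def direction_vector(azimuth_deg, elevation_deg):
    """Unit vector for a direction given by azimuth and elevation.

    Convention assumed: x points front, y points left,
    z points up; azimuth rotates counter-clockwise from front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    return x, y, z

# e.g. a speaker straight ahead at ear height:
front = direction_vector(0.0, 0.0)
```

With eight such vectors (one per speaker around the reference position), the decoder can weight each speaker's drive signal according to how close its direction lies to the desired presentation direction.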
  • FIG. 2 is a diagram showing an example of the basis of an ambisonics component based on spherical harmonic expansion with respect to order n and degree m.
  • the horizontal axis (m) in FIG. 2 indicates degree, and the vertical axis (n) in FIG. 2 indicates order.
  • the degree m takes a value from -n to +n.
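The layout of FIG. 2 (degree m running from -n to +n for each order n) can be enumerated directly; truncating at order N yields (N + 1)² basis functions, which is why first-order ambisonics has exactly four channels:

```python
def ambisonic_basis_indices(max_order):
    """Enumerate (order n, degree m) pairs of the spherical-harmonic
    basis up to a truncation order N, as laid out in FIG. 2:
    for each order n, the degree m runs from -n to +n."""
    return [(n, m) for n in range(max_order + 1) for m in range(-n, n + 1)]

# First-order ambisonics (N = 1) has (N + 1)^2 = 4 basis functions,
# matching the four B-format channels W, Y, Z, X.
first_order = ambisonic_basis_indices(1)
```

Raising the truncation order N increases directional resolution at the cost of quadratically more channels, which is the transmission/processing trade-off the disclosure addresses.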
  • FIG. 3 is a block diagram showing an example of the system configuration of the sound field reproduction system 100 according to the first embodiment.
  • FIG. 4 is a diagram showing an example of an outline of operations from sound field recording to sound field reproduction in the first embodiment.
  • the sound field reproduction system 100 includes a sound field recording device 1 and a sound field reproduction device 2.
  • the sound field recording device 1 and the sound field reproduction device 2 are connected to each other via a network NW1 so as to be capable of data communication.
  • Network NW1 may be a wired network or a wireless network.
  • the wired network is, for example, at least one of a wired LAN (Local Area Network), a wired WAN (Wide Area Network), and a power line communication (PLC), and may be any other network configuration that allows wired communication.
  • The wireless network is, for example, at least one of a wireless LAN such as Wi-Fi (registered trademark), a wireless WAN, short-range wireless communication such as Bluetooth (registered trademark), and a mobile communication network such as 4G or 5G, and may be any other network configuration that allows wireless communication.
  • The sound field recording device 1 is arranged, for example, in a sound field recording space (for example, a live venue LV1), and includes an ambisonics microphone 11, an A/D conversion section 12, an encoding section 13, and a microphone element direction specifying section 14. Note that the sound field recording device 1 need only include at least the ambisonics microphone 11; the A/D conversion section 12, the encoding section 13, and the microphone element direction specifying section 14 may instead be provided in the sound field reproduction device 2. In other words, only the ambisonics microphone 11 may be provided outside the sound field reproduction device 2.
  • the ambisonics microphone 11 includes four microphone elements Mc1, Mc2, Mc3, and Mc4.
  • the microphone element Mc1 records sound in the front upper left direction (see Figure 1), and the microphone element Mc2 records sound in the front lower right direction (see Figure 1).
  • the microphone element Mc3 records the sound in the rear lower left direction (see FIG. 1), and the microphone element Mc4 records the sound in the rear upper right direction (see FIG. 1).
  • The ambisonics microphone 11 may include more unidirectional microphone elements than the four hollow-arranged microphone elements Mc1, Mc2, Mc3, and Mc4, or may include omnidirectional microphone elements arranged on a rigid sphere.
  • The A/D conversion section 12, the encoding section 13, and the microphone element direction specifying section 14 are configured by, for example, a semiconductor chip or dedicated hardware on which at least one of a CPU (Central Processing Unit), a DSP (Digital Signal Processor), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array) is mounted.
  • the A/D converter 12 converts the recorded signal in analog format from each microphone element constituting the ambisonics microphone 11 into a recorded signal in digital format, and sends the signal to the encoder 13.
  • The encoding unit 13 generates a low-order basis acoustic signal (for example, a first-order ambisonics signal) by performing encoding processing using the recorded signal after conversion by the A/D converter 12 and the direction vector θm from the microphone element direction specifying unit 14. Details of the encoding processing by the encoding unit 13 will be described later.
  • Equation (4) can be expanded based on the spherical harmonic function of Equation (2).
  • a m n is an expansion coefficient
  • R n (kr) is a radial function term.
  • The infinite sum with respect to order n is approximated by truncating it at a finite order N, and the accuracy of sound field reproduction changes according to this truncation order N.
  • the truncation order will be expressed as N.
  • i is an imaginary unit
  • j n (kr) is an n-th spherical Bessel function
  • j ′ n (kr) is its derivative.
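For reference, the spherical Bessel function jₙ(kr) and its derivative j′ₙ(kr) appearing in the radial term can be evaluated from the closed forms for n = 0, 1 and the standard recurrences. This is a sketch adequate for the small orders used in low-order ambisonics; production code would guard against the numerical instability of upward recurrence at large n.

```python
import math

def spherical_jn(n, x):
    """n-th order spherical Bessel function j_n(x) by upward recurrence:
    j_{k+1}(x) = (2k+1)/x * j_k(x) - j_{k-1}(x)."""
    if x == 0.0:
        return 1.0 if n == 0 else 0.0
    j0 = math.sin(x) / x                      # j_0(x) = sin(x)/x
    if n == 0:
        return j0
    j1 = math.sin(x) / x**2 - math.cos(x) / x  # j_1(x)
    for k in range(1, n):
        j0, j1 = j1, (2 * k + 1) / x * j1 - j0
    return j1

def spherical_jn_deriv(n, x):
    """Derivative j'_n(x) via the identity j'_n = j_{n-1} - (n+1)/x * j_n
    (and j'_0 = -j_1)."""
    if n == 0:
        return -spherical_jn(1, x)
    return spherical_jn(n - 1, x) - (n + 1) / x * spherical_jn(n, x)
```

These two quantities are what the rigid-sphere radial function Rₙ(kr) combines, so the truncation order N directly bounds how many of these terms the encoder must evaluate per frequency bin.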
  • the expansion coefficient vector ⁇ m n for this plane wave is handled as a B format signal (intermediate representation) that is the output of the encoding process by the encoding unit 13.
  • this expansion coefficient vector may be referred to as an ambisonics region signal or simply an ambisonics signal.
  • The recorded signal, which is a time-domain signal after conversion by the A/D converter 12, is thus converted into an ambisonics signal (for example, a first-order ambisonics signal).
  • This ambisonics signal (for example, a first-order ambisonics signal) is decoded by the first decoding section 25 and the second decoding section 26 of the sound field reproduction device 2 and converted into a speaker drive signal.
  • The sound field reproduction device 2 is arranged, for example, in a sound field reproduction space (for example, satellite venue STL1), and includes a sound source extraction direction control section 21, a sound source presentation direction control section 22, a re-encoding section 23, a speaker direction specifying section 24, a first decoding section 25, a second decoding section 26, a signal mixing section 27, a sound field reproduction section 28, and speakers SPk1, SPk2, . . . , SPk8.
  • the number of speakers arranged is 8 as an example, but it goes without saying that the number is not limited to 8 as long as it is an integer of 2 or more.
  • The signal mixing section 27 mixes, per speaker, the speaker drive signal corresponding to the high-order basis acoustic signal from the first decoding section 25 and the speaker drive signal corresponding to the low-order basis acoustic signal from the second decoding section 26, and sends the mixed signals to the sound field reproduction section 28.
  • The signal mixing section 27 may be omitted from the sound field reproduction device 2; in this case, only the speaker drive signal based on the high-order basis acoustic signal from the first decoding section 25 is output from each of the speakers SPk1 to SPk8 via the sound field reproduction section 28.
  • the sound field reproducing unit 28 converts the digital speaker drive signal for each speaker mixed by the signal mixing unit 27 into an analog speaker drive signal, amplifies the signal, and outputs (reproduces) the signal from the corresponding speaker.
  • Each of the speakers SPk1, SPk2, ..., SPk8 is placed at a vertex of the sound field reproduction space modeled as a cube (for example, satellite venue STL1), and reproduces (recreates) the sound field based on a speaker drive signal from the sound field reproduction section 28.
  • The number of speakers installed can be changed depending on the sound field to be reproduced. If a specific direction need not be reproduced, or if commonly known virtual sound image generation methods such as the transaural system or the VBAP (Vector Based Amplitude Panning) method are combined, sound field reproduction may be performed using fewer than eight speakers. Conversely, sound field reproduction may be performed using more than eight speakers.
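For context on the VBAP method mentioned above: VBAP computes amplitude gains for a virtual source direction from the unit vectors of the three nearest loudspeakers, by solving the source direction as a weighted sum of those vectors. A minimal sketch in plain Python (constant-power normalisation assumed; a full implementation would also select the active speaker triplet):

```python
def vbap_gains(p, l1, l2, l3):
    """Gains (g1, g2, g3) such that p = g1*l1 + g2*l2 + g3*l3, for a
    virtual source direction p inside the triangle spanned by
    loudspeaker unit vectors l1, l2, l3 (Pulkki's 3-D VBAP)."""
    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0])
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    det = dot(l1, cross(l2, l3))          # scalar triple product
    g1 = dot(p, cross(l2, l3)) / det      # Cramer's rule, column 1
    g2 = dot(p, cross(l3, l1)) / det      # column 2
    g3 = dot(p, cross(l1, l2)) / det      # column 3
    norm = (g1*g1 + g2*g2 + g3*g3) ** 0.5  # constant-power normalisation
    return g1 / norm, g2 / norm, g3 / norm
```

A source aimed exactly at one speaker gets gain 1 on that speaker and 0 on the others; directions between speakers are panned smoothly, which is how fewer than eight physical speakers can still cover intermediate directions.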
  • the speaker installation position may be other than each vertex of the sound field reproduction space (for example, the satellite venue STL1) as long as it is installed so as to surround the reference position (for example, the center position LSP1) of the satellite venue STL1.
  • the sound field reproduction unit 28 may output the signal to a binaural reproduction device such as headphones or earphones worn by the listener (user) instead of the speaker.
  • When the sound field reproduction unit 28 supplies a signal to a binaural reproduction device worn by the listener (user) (for example, the above-mentioned headphones or earphones), virtual sound images may be generated, by the decoding processing described later, in multiple directions surrounding the head (for example, directions corresponding to azimuth angles of ±90°), and the reproduction signal may be generated by multiplying the virtual sound image in each direction by the HRTF (Head Related Transfer Function) transfer characteristic corresponding to that angle in the frequency domain, or by convolving it in the time domain, so that the user perceives a three-dimensional sound image.
  • That is, the sound field need not be reproduced only from the speakers SPk1, SPk2, ..., SPk8 placed in the satellite venue STL1; it is also possible to reproduce the sound field on a binaural playback device (for example, the above-mentioned headphones or earphones) worn by a listener (user) in the satellite venue STL1.
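The time-domain option mentioned above, convolving a virtual-sound-image signal with a head-related impulse response (HRIR) rather than multiplying spectra in the frequency domain, can be sketched as a direct convolution, computed once per ear:

```python
def convolve(signal, hrir):
    """Direct time-domain convolution of a virtual-sound-image signal
    with a head-related impulse response (one ear). The frequency-domain
    alternative would multiply their spectra instead; for long HRIRs an
    FFT-based method would be used in practice."""
    out = [0.0] * (len(signal) + len(hrir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(hrir):
            out[i + j] += s * h
    return out
```

Summing the convolved signals over all virtual directions, separately for the left and right HRIRs, yields the two-channel binaural reproduction signal.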
  • FIG. 5 is a flowchart chronologically showing an example of an operation procedure for sound field reproduction by the sound field reproduction device 2 according to the first embodiment.
  • Each process of step St1 and step St2 will be explained as being executed within the sound field recording device 1; however, if the components of the sound field recording device 1 other than the ambisonics microphone 11 are provided in the sound field reproduction device 2, the process of step St2 may be executed by the sound field reproduction device 2.
  • The sound field reproduction device 2 executes in parallel the series of processes in steps St3 to St6 (that is, re-encoding processing for generating a high-order basis acoustic signal) and the process in step St7 (that is, decoding processing for the low-order basis acoustic signal).
  • The signal mixing unit 27 of the sound field reproduction device 2 mixes, per speaker, the speaker drive signal (an example of the output of the first decoding process) corresponding to the high-order basis acoustic signal from the first decoding unit 25 in step St6 and the speaker drive signal (an example of the output of the second decoding process) corresponding to the low-order basis acoustic signal from the second decoding unit 26 in step St7 (step St8).
  • The sound field reproducing unit 28 of the sound field reproducing device 2 converts the digital speaker drive signal for each speaker after mixing by the signal mixing unit 27 in step St8 into an analog speaker drive signal, amplifies it, and outputs (reproduces) it from each of the corresponding speakers SPk1 to SPk8 (step St9).
  • the recording device is composed of an ambisonics microphone 11 that is three-dimensionally arranged so that each of the plurality of microphone elements Mc1 to Mc4 faces in a different direction.
  • the sound field recording device 1 can three-dimensionally record atmospheric sounds such as performances by a plurality of sound sources in the sound field recording space (live venue LV1).
  • FIG. 6 is a block diagram showing an example of a system configuration of a sound field reproduction system 100A according to the second embodiment.
  • FIG. 7 is a diagram showing an example of an outline of operations from sound field recording to sound field reproduction in the second embodiment.
  • the same reference numerals are used to simplify or omit the description of the contents that overlap with the configurations and operations of the corresponding FIGS. 3 and 4, and the different contents will be described.
  • the sound field reproduction system 100A includes a sound field recording device 1 and a sound field reproduction device 2A.
  • the configuration of the sound field recording device 1 is the same as that in Embodiment 1, so a description thereof will be omitted.
  • The sound field reproduction device 2A is arranged, for example, in a sound field reproduction space (for example, satellite venue STL1), and includes a sound source extraction direction control section 21, a sound source presentation direction control section 22, a re-encoding section 23, a speaker direction specifying section 24, a first decoding section 25, a sound source acquisition section 29, a second encoding section 30, a second signal mixing section 31, a second decoding section 32, a signal mixing section 27, a sound field reproduction section 28, and speakers SPk1, SPk2, ..., SPk8.
  • The sound source acquisition unit 29 acquires acoustic signals s1[n], ..., sb[n] of a plurality of sound sources (for example, various sound sources such as vocals, bass, guitar, and drums) to be presented in the sound field reproduction space (for example, satellite venue STL1), and sends them to the second encoding unit 30.
  • Each acoustic signal s1[n], ..., sb[n] can be expressed as a point sound source.
  • n indicates discrete time
  • b indicates the number of sound sources.
  • These sound sources may be individually recorded in the sound field recording space (live venue LV1), or may be sound sources unrelated to the sound field recording space.
  • The second signal mixing unit 31 mixes the high-order basis acoustic signals (for example, N-th order ambisonics signals) for each sound source obtained by the encoding processing of the second encoding unit 30, and sends the mixture to the second decoding unit 32.
  • FIG. 8 is a flowchart chronologically showing an example of an operation procedure for sound field reproduction by the sound field reproduction device 2A according to the second embodiment.
  • the same step numbers are given to the processes that overlap with the explanation of FIG. 5 to simplify or omit the explanation, and different contents will be explained.
  • The sound source acquisition unit 29 of the sound field reproduction device 2A acquires the acoustic signals s1[n], ..., sb[n] (an example of point sound source signals) of a plurality of sound sources (for example, various sound sources such as vocals, bass, guitar, and drums) to be presented in the sound field reproduction space (for example, the satellite venue STL1) (step St11).
  • The second encoding unit 30 of the sound field reproduction device 2A reads the direction vectors θb of the b point sound sources from a memory (not shown), or acquires them based on a designation from a user interface (not shown) (step St12).
  • The sound field reproduction device 2A further includes a second encoding unit 30 that generates second high-order basis acoustic signals (N-th order ambisonics signals) by encoding each of a plurality of sound source signals (for example, sounds from various sound sources such as vocals, bass, guitar, and drums) to be presented in the sound field reproduction space (satellite venue STL1), and a second signal mixing section 31 that mixes the second high-order basis acoustic signals for each sound source signal.
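The per-source encoding-then-mixing performed by the second encoding unit 30 and the second signal mixing section 31 can be sketched at first order. This is a hedged illustration using real first-order spherical-harmonic weights for a point source at a given direction; the disclosure's encoder works up to order N and its exact normalisation is not reproduced in this extract.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one sample of a point-source signal into first-order
    B-format (W, X, Y, Z) for the given arrival direction: each
    directional channel is the sample weighted by the direction cosine."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample
    x = sample * math.cos(el) * math.cos(az)
    y = sample * math.cos(el) * math.sin(az)
    z = sample * math.sin(el)
    return w, x, y, z

def mix_sources(encoded_frames):
    """Mix per-source ambisonics frames channel-wise, as the second
    signal mixing section does before decoding."""
    return tuple(sum(channel) for channel in zip(*encoded_frames))
```

Because the encoded signals live in a common ambisonics domain, any number of point sources can be mixed channel-wise into one multichannel frame before a single decoding pass to the speakers.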
  • Thereby, the sound field reproduction device 2A can output, with high directional resolution based on the high-order basis, atmospheric sounds from sound sources that one uniquely wishes to present in the sound field reproduction space (satellite venue STL1), independently of the sound field recording space (live venue LV1).
  • The present disclosure is useful as a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system that utilize low-order sound field components recorded using an ambisonics microphone while suppressing an increase in sound source localization error in a sound field reproduction space.
  • Second signal mixing section 100A Sound field reproduction system SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, SPk8 speaker

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Otolaryngology (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

This sound field reproduction device comprises: a sound source extraction direction control unit which receives a designation of a sound source extraction direction in a sound field recording space in which a recording device is disposed; a re-encoding unit which re-encodes an acoustic signal corresponding to the sound source extraction direction among low-order basis acoustic signals based on an encoding process using a recorded signal from the recording device, and generates high-order basis acoustic signals; and a sound field reproduction unit which outputs signals based on the respective high-order basis acoustic signals from a plurality of speakers provided in a sound field reproduction space different from the sound field recording space.

Description

Sound field reproduction device, sound field reproduction method, and sound field reproduction system
 The present disclosure relates to a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system.
 Recently, scene-based stereophonic sound reproduction technology has been attracting attention as a means of reproducing sound fields in real time. Scene-based stereophonic sound reproduction applies signal processing to the multichannel signal recorded (captured) with an ambisonics microphone, in which a plurality of directional microphone elements are arranged on a rigid sphere or over a hollow spherical surface. Using loudspeakers arranged to surround the listening environment (space), it then reproduces, in real time, a three-dimensional sound field as if the listener were present at the location where the ambisonics microphone is installed.
 As prior art related to sound field reproduction, for example, Patent Literature 1 is known. Patent Literature 1 discloses a signal processing device that acquires a plurality of picked-up sound signals from a plurality of sound pickup units installed as one body in a sound pickup target space and oriented in a plurality of different directions according to the position of a sound source and the position of an object that reflects the sound emitted from that sound source, and that generates, based on the acquired picked-up sound signals, an acoustic signal corresponding to a designated listening point in the sound pickup target space.
Patent Literature 1: Japanese Patent Application Publication No. 2019-192975
 The configuration of Patent Literature 1 presupposes that the listening point exists within the sound pickup target space in which the plurality of sound pickup units are arranged. Therefore, even if one attempts to build a scene-based stereophonic sound system using Patent Literature 1, the listener must be present in the sound pickup target space where the sound pickup units are arranged. In other words, when the listener is at a location different from the sound pickup target space, it is difficult to reproduce the sound field so that the acoustic signals picked up in that space can be heard as they would be heard within it.
 On the other hand, an ambisonics microphone can synthesize higher-order ambisonics components (sound field components) as the number of microphone elements arranged on its spherical surface increases, so it is known that increasing the number of microphone elements improves the directional resolution of recording and reproduction. However, to synthesize higher-order sound field components in real-time live distribution at an event such as a public viewing, the number of microphone elements placed in the recording target space must be increased. Depending on the installation space, there are physical constraints on the arrangement of the microphone elements, and the increase in the number of transmission channels that accompanies a larger number of elements increases the processing load of signal processing, synthesis processing, and the like, causing delays in the output for sound field reproduction. It is therefore desirable to reproduce the sound field using low-order sound field components without unnecessarily increasing the number of microphone elements; in that case, however, the sound source localization error grows markedly as the listening position moves away from the center of the surrounding loudspeaker array, and the expected sound field reproduction cannot be achieved.
 The present disclosure has been devised in view of the conventional situation described above, and aims to provide a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system that use low-order sound field components recorded with an ambisonics microphone while suppressing an increase in sound source localization error within the sound field reproduction space.
 The present disclosure provides a sound field reproduction device including: a sound source extraction direction control unit that receives a designation of a sound source extraction direction in a sound field recording space in which a recording device is placed; a re-encoding unit that re-encodes, among low-order basis acoustic signals obtained by an encoding process using signals recorded by the recording device, an acoustic signal corresponding to the sound source extraction direction to generate high-order basis acoustic signals; and a sound field reproduction unit that outputs a signal based on each high-order basis acoustic signal from each of a plurality of loudspeakers provided in a sound field reproduction space different from the sound field recording space.
 The present disclosure also provides a sound field reproduction method including the steps of: receiving a designation of a sound source extraction direction in a sound field recording space in which a recording device is placed; re-encoding, among low-order basis acoustic signals obtained by an encoding process using signals recorded by the recording device, an acoustic signal corresponding to the sound source extraction direction to generate high-order basis acoustic signals; and outputting a signal based on each high-order basis acoustic signal from each of a plurality of loudspeakers provided in a sound field reproduction space different from the sound field recording space.
 The present disclosure further provides a sound field reproduction system including: a sound field recording device having a recording device capable of recording a sound source in a sound field recording space; and a sound field reproduction device that reproduces the acoustic signal recorded by the recording device in a sound field reproduction space different from the sound field recording space. The sound field reproduction device includes: a sound source extraction direction control unit that receives a designation of a sound source extraction direction in the sound field recording space; a re-encoding unit that re-encodes, among low-order basis acoustic signals obtained by an encoding process using signals recorded by the recording device, an acoustic signal corresponding to the sound source extraction direction to generate high-order basis acoustic signals; and a sound field reproduction unit that outputs a signal based on each high-order basis acoustic signal from each of a plurality of loudspeakers provided in the sound field reproduction space.
 These comprehensive or specific aspects may be implemented as a system, a device, a method, an integrated circuit, a computer program, or a recording medium, or as any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium.
 According to the present disclosure, an increase in sound source localization error within the sound field reproduction space can be suppressed while using low-order sound field components recorded with an ambisonics microphone.
Fig. 1 is a diagram schematically showing the concept from sound field recording to sound field reproduction in scene-based stereophonic sound reproduction using an ambisonics microphone.
Fig. 2 is a diagram showing an example of the ambisonics component basis given by spherical harmonic expansion for order n and degree m.
Fig. 3 is a block diagram showing a system configuration example of the sound field reproduction system according to Embodiment 1.
Fig. 4 is a diagram showing an operational overview from sound field recording to sound field reproduction in Embodiment 1.
Fig. 5 is a flowchart showing, in time series, an example of the sound field reproduction procedure performed by the sound field reproduction device according to Embodiment 1.
Fig. 6 is a block diagram showing a system configuration example of the sound field reproduction system according to Embodiment 2.
Fig. 7 is a diagram showing an operational overview from sound field recording to sound field reproduction in Embodiment 2.
Fig. 8 is a flowchart showing, in time series, an example of the sound field reproduction procedure performed by the sound field reproduction device according to Embodiment 2.
 Hereinafter, embodiments specifically disclosing a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system according to the present disclosure will be described in detail with reference to the drawings as appropriate. However, unnecessarily detailed description may be omitted; for example, detailed descriptions of already well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to avoid making the following description unnecessarily redundant and to facilitate understanding by those skilled in the art. The accompanying drawings and the following description are provided so that those skilled in the art can fully understand the present disclosure, and are not intended to limit the subject matter described in the claims.
 In each of the following embodiments, scene-based stereophonic sound reproduction using an ambisonics microphone as a recording device for recording sound source signals (sounds, music, human voices, and the like) in a sound field recording space (for example, a live venue) is described as an example. In scene-based stereophonic sound reproduction with an ambisonics microphone, the signals (recorded signals) captured by the multiple microphone elements of the microphone, or point sound sources, are expressed (encoded) as an intermediate representation ITMR1 (see Fig. 1) using spherical harmonics, i.e., as a B-format signal, so that the sound field arriving from all directions is handled in a unified manner in the ambisonics signal domain (described later). Furthermore, by decoding this intermediate representation, loudspeaker drive signals are generated, realizing the desired sound field reproduction in a sound field reproduction space (for example, a satellite venue).
(Embodiment 1)
 First, the concept of scene-based stereophonic sound reproduction will be described with reference to Fig. 1. Fig. 1 schematically shows the concept from sound field recording to sound field reproduction in scene-based stereophonic sound reproduction using the ambisonics microphone 11. The ambisonics microphone 11 is placed in a sound field recording space such as the live venue LV1. At the live venue LV1, a performance with multiple sound sources takes place (for a band of several performers, for example, vocals, bass, guitar, drums, and other sources), and the sound of the performance is recorded by the ambisonics microphone 11.
 The ambisonics microphone 11, which is an example of the recording device, includes four microphone elements Mc1, Mc2, Mc3, and Mc4. With the direction Dr1 as the front, the elements are arranged in a hollow configuration so as to point from the center of the cube CB1 in Fig. 1 toward four of its vertices, each element having a unidirectional pattern toward its vertex. The microphone element Mc1 points toward the front-left-up (FLU: Front Left Up) of the ambisonics microphone 11 and records sound from that direction. The microphone element Mc2 points toward the front-right-down (FRD: Front Right Down) and records sound from that direction. The microphone element Mc3 points toward the back-left-down (BLD: Back Left Down) and records sound from that direction. The microphone element Mc4 points toward the back-right-up (BRU: Back Right Up) and records sound from that direction.
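As an illustration of the tetrahedral capsule layout described above, the following sketch builds unit direction vectors for the four capsules. The exact capsule angles are an assumption for illustration (a regular tetrahedron inscribed in the cube CB1 with x = front, y = left, z = up); the disclosure does not give numeric angles.

```python
import math

# Assumed capsule directions of the tetrahedral A-format microphone:
# vertices of cube CB1 (x: front, y: left, z: up), then normalized.
CAPSULES = {
    "FLU": (+1.0, +1.0, +1.0),  # front-left-up
    "FRD": (+1.0, -1.0, -1.0),  # front-right-down
    "BLD": (-1.0, +1.0, -1.0),  # back-left-down
    "BRU": (-1.0, -1.0, +1.0),  # back-right-up
}

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

CAPSULES = {name: _normalize(v) for name, v in CAPSULES.items()}
```

In a regular tetrahedral layout like this, the angle between any two capsule axes is arccos(-1/3), about 109.47 degrees.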
 The signals recorded from these four directions (FLU, FRD, BLD, and BRU) are called A-format signals. An A-format signal is not used as-is; it is converted into a B-format signal, the intermediate representation ITMR1, which has directional characteristics (directivity). The B-format signal consists of, for example, the omnidirectional component W, the front-back component X, the left-right component Y, and the up-down component Z. The A-format signal is converted into the B-format signal by the following conversion equations.
  W=FLU+FRD+BLD+BRU
  X=FLU+FRD-BLD-BRU
  Y=FLU-FRD+BLD-BRU
  Z=FLU-FRD-BLD+BRU
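The four sum/difference equations above can be written directly as a small conversion routine; this is a per-sample sketch of the A-format to B-format conversion, with no gain normalization applied (some implementations scale W, which the equations above do not).

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample of the four A-format capsule signals
    (FLU, FRD, BLD, BRU) into first-order B-format (W, X, Y, Z)
    using the sum/difference matrix given above."""
    w = flu + frd + bld + bru   # omnidirectional
    x = flu + frd - bld - bru   # front-back
    y = flu - frd + bld - bru   # left-right
    z = flu - frd - bld + bru   # up-down
    return w, x, y, z
```

For example, a signal present only in the FLU capsule contributes positively to all four B-format components, while a signal identical in all four capsules appears only in W.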
 By combining the B-format signals W, X, Y, and Z, sound signals covering all directions (front-back, left-right, and up-down) are obtained. By varying the signal level of each of W, X, Y, and Z before combining them, a sound signal with an arbitrary directional characteristic over all directions can be generated. For example, as shown in Fig. 1, consider a sound field reproduction space modeled as a cube (for example, the satellite venue STL1) with a total of eight loudspeakers SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, and SPk8 placed at its vertices, using a three-dimensional coordinate system aligned with that of the sound field recording space (for example, the live venue LV1), i.e., with parallel (same) front-back, left-right, and up-down directions.
 The position of each of the loudspeakers SPk1 to SPk8 can be specified by a predetermined distance and angles (azimuth θi and elevation φi) from a reference position (for example, the center position LSP1) of the sound field reproduction space (for example, the satellite venue STL1). Here, i is a variable identifying a loudspeaker placed in the sound field reproduction space (for example, the satellite venue STL1), and takes an integer from 1 to 8 in the example of Fig. 1.
 Suppose that a listener (the user) is at the center position LSP1 of the sound field reproduction space (for example, the satellite venue STL1) and faces the front direction (Front). Under these conditions, based on the data of the B-format signals W, X, Y, and Z obtained by encoding the A-format signals recorded in the sound field recording space (for example, the live venue LV1) and on the directions of the loudspeakers SPk1 to SPk8 in the sound field reproduction space, the sound field of the recording space can be freely reproduced in the reproduction space. That is, when a listener (the user) is present in the sound field reproduction space, sound can be reproduced and output in any three-dimensional direction (for example, the sound source presentation direction θtarget described later) relative to the reference direction defined by the listener's front.
 Next, the basis of the ambisonics components given by spherical harmonic expansion for order n and degree m will be described with reference to Fig. 2. Fig. 2 shows an example of this basis.
 The horizontal axis (m) of Fig. 2 indicates the degree, and the vertical axis (n) indicates the order. The degree m takes values from -n to +n. The spherical harmonics up to order n = N comprise a total of (N+1)² basis functions. For example, when n = N = 0, one basis function is obtained (the omnidirectional B-format signal W). When n = N = 1, four basis functions are obtained: the omnidirectional B-format signal W corresponding to (n, m) = (0, 0), the front-back B-format signal X corresponding to (n, m) = (1, -1), the up-down B-format signal Z corresponding to (n, m) = (1, 0), and the left-right B-format signal Y corresponding to (n, m) = (1, 1). The same applies for n = N = 2 and above, so the description is omitted.
 It is known that spherical harmonics have the property that their spatial periodicity increases as n and m increase. B-format signals with different directional patterns (directivities) can therefore be expressed by combinations of n and m. Defining the dimension for order n and degree m as K = n(n+1) + m according to Ambisonics Channel Numbering (ACN), the spherical harmonics can be expressed in vector form as in equation (1). In equation (1), the superscript T denotes transposition.
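The ACN indexing K = n(n+1) + m and the (N+1)² channel count described above can be sketched as two small helpers:

```python
def acn_index(n, m):
    """Ambisonics Channel Number K = n*(n+1) + m for order n, degree m.
    Valid degrees satisfy -n <= m <= n."""
    assert -n <= m <= n, "degree m must lie in [-n, n]"
    return n * (n + 1) + m

def num_channels(N):
    """A spherical-harmonic representation truncated at order N carries
    (N+1)**2 basis signals in total."""
    return (N + 1) ** 2
```

For N = 1 this gives the four first-order channels K = 0..3; for N = 2 it gives nine, and so on.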
[Math. 1]
[Math. 2]
[Math. 3]
[Math. 4]
 Next, the system configuration and an operational overview of the sound field reproduction system 100 according to Embodiment 1 will be described with reference to Figs. 3 and 4. Fig. 3 is a block diagram showing a system configuration example of the sound field reproduction system 100 according to Embodiment 1. Fig. 4 is a diagram showing an operational overview from sound field recording to sound field reproduction in Embodiment 1.
 The sound field reproduction system 100 includes the sound field recording device 1 and the sound field reproduction device 2, which are connected via the network NW1 so that they can exchange data with each other. The network NW1 may be a wired network or a wireless network. A wired network corresponds to at least one of, for example, a wired LAN (Local Area Network), a wired WAN (Wide Area Network), and power line communication (PLC: Power Line Communication), and may have any other configuration capable of wired communication. A wireless network corresponds to at least one of a wireless LAN such as Wi-Fi (registered trademark), a wireless WAN, short-range wireless communication such as Bluetooth (registered trademark), and a mobile cellular network such as 4G or 5G, and may have any other configuration capable of wireless communication.
 The sound field recording device 1 is placed, for example, in the sound field recording space (for example, the live venue LV1) and includes the ambisonics microphone 11, the A/D conversion unit 12, the encoding unit 13, and the microphone element direction designation unit 14. The sound field recording device 1 need only have at least the ambisonics microphone 11; the A/D conversion unit 12, the encoding unit 13, and the microphone element direction designation unit 14 may instead be provided in the sound field reproduction device 2. In other words, the ambisonics microphone 11 may be provided outside the sound field reproduction device 2.
 The ambisonics microphone 11 includes the four microphone elements Mc1, Mc2, Mc3, and Mc4: Mc1 records sound from the front-left-up direction (see Fig. 1), Mc2 from the front-right-down direction (see Fig. 1), Mc3 from the back-left-down direction (see Fig. 1), and Mc4 from the back-right-up direction (see Fig. 1). The ambisonics microphone 11 may include more unidirectional microphone elements than the four hollow-mounted elements Mc1 to Mc4, or may include omnidirectional microphone elements arranged on a rigid sphere. Using an ambisonics microphone with many microphone elements enables the encoding unit 13 to synthesize ambisonics signals of second or higher order. The signals recorded by the microphone elements of the ambisonics microphone 11 (recorded signals) are input to the A/D conversion unit 12.
 The A/D conversion unit 12, the encoding unit 13, and the microphone element direction designation unit 14 are implemented by a semiconductor chip on which at least one electronic device such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array) is mounted, or by dedicated hardware.
 The A/D conversion unit 12 converts the analog recorded signal from each microphone element of the ambisonics microphone 11 into a digital recorded signal and sends it to the encoding unit 13.
 The encoding unit 13 uses the recorded signals converted by the A/D conversion unit 12 and the direction vectors θm supplied from the microphone element direction designation unit 14 to encode the converted recorded signals, thereby generating low-order basis acoustic signals (for example, first-order ambisonics signals). Details of the encoding process performed by the encoding unit 13 will be described later.
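As a minimal sketch of what first-order encoding produces, the following encodes a single plane-wave sample arriving from a given direction into four first-order channels. It assumes a generic ambiX-style convention (ACN channel order, SN3D scaling); the disclosure defines its actual encoder through the spherical-harmonic expressions given in the equations below, so the gains here are illustrative, not the patented encoder.

```python
import math

def encode_first_order(sample, azimuth, elevation):
    """Encode one sample arriving from (azimuth, elevation), in radians,
    into first-order ambisonics channels [W, Y, Z, X] (ACN order 0..3,
    SN3D scaling). Illustrative stand-in for the encoding unit 13."""
    w = sample                                          # n = 0
    y = sample * math.sin(azimuth) * math.cos(elevation)  # (1, -1)
    z = sample * math.sin(elevation)                      # (1,  0)
    x = sample * math.cos(azimuth) * math.cos(elevation)  # (1, +1)
    return [w, y, z, x]
```

A source straight ahead (azimuth 0, elevation 0) lands entirely in the W and X channels.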
[Math. 5]
 Here, the encoding process performed by the encoding unit 13 will be described in detail.
 In general, the sound pressure p observed (recorded) at radius r for an arbitrary angle (θ, φ) on a sphere is known, as the interior solution of the wave equation in the spherical harmonic domain, to expand for wavenumber k as in equation (4), with the spherical harmonics of equation (2) as the basis. In equation (4), A^m_n is an expansion coefficient and R_n(kr) is the radial function term. The infinite sum over order n is approximated by truncating it at a finite order N, and the accuracy of the sound field reproduction varies with this truncation order. Hereinafter, the truncation order is denoted by N.
[Math. 6]
[Math. 7]
[Math. 8]
[Math. 9]
 In equation (6), i is the imaginary unit, j_n(kr) is the n-th order spherical Bessel function, and j'_n(kr) is its derivative. In the present disclosure, the expansion coefficient vector γ^m_n for this plane wave is treated as the B-format signal (intermediate representation) output by the encoding process of the encoding unit 13. Hereinafter, this expansion coefficient vector may be referred to as the ambisonics-domain signal, or simply the ambisonics signal.
 More specifically, in the encoding process of the encoding unit 13, the recorded signal, which is a time-domain signal after conversion by the A/D conversion unit 12, is converted into an ambisonics signal (for example, a first-order ambisonics signal). This ambisonics signal is then decoded by each of the first decoding unit 25 and the second decoding unit 26 of the sound field reproduction device 2 and converted into loudspeaker drive signals.
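One simple way to turn an ambisonics signal into loudspeaker drive signals is a sampling ("projection") decoder: each loudspeaker is driven by the B-format signal re-evaluated in that loudspeaker's direction. This is only an illustrative stand-in for the first and second decoding units; the disclosure specifies its decoding through the equations referenced above, and it again assumes the ambiX-style (ACN/SN3D) channel convention.

```python
import math

def sh_first_order(azimuth, elevation):
    """Real first-order spherical-harmonic vector [W, Y, Z, X]
    (ACN order, SN3D scaling) for a direction in radians."""
    return [1.0,
            math.sin(azimuth) * math.cos(elevation),
            math.sin(elevation),
            math.cos(azimuth) * math.cos(elevation)]

def sampling_decode(b, speaker_dirs):
    """Drive each loudspeaker with the inner product of the first-order
    ambisonics signal b and the harmonics sampled at that loudspeaker's
    (azimuth, elevation), normalized by the number of loudspeakers."""
    return [sum(bi * yi for bi, yi in zip(b, sh_first_order(az, el)))
            / len(speaker_dirs)
            for az, el in speaker_dirs]
```

With a source encoded at the front, a front-facing loudspeaker receives the full signal while a rear-facing one receives nothing, which is the expected first-order behavior.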
[Math. 10]
[Math. 11]
[Math. 12]
[Math. 13]
[Math. 14]
 The sound field reproduction device 2 is placed, for example, in the sound field reproduction space (for example, the satellite venue STL1) and includes the sound source extraction direction control unit 21, the sound source presentation direction control unit 22, the re-encoding unit 23, the loudspeaker direction designation unit 24, the first decoding unit 25, the second decoding unit 26, the signal mixing unit 27, the sound field playback unit 28, and the loudspeakers SPk1, SPk2, ..., SPk8. In the following description, the number of loudspeakers is eight as an example, but it goes without saying that any integer of two or more may be used.
Figure JPOXMLDOC01-appb-C000015
Figure JPOXMLDOC01-appb-C000016
Figure JPOXMLDOC01-appb-C000017
Figure JPOXMLDOC01-appb-C000018
Figure JPOXMLDOC01-appb-C000019
Figure JPOXMLDOC01-appb-C000020
 The signal mixing unit 27 mixes, for each speaker, the speaker drive signal corresponding to the high-order base acoustic signal from the first decoding unit 25 and the speaker drive signal corresponding to the low-order base acoustic signal from the second decoding unit 26, and sends the result to the sound field reproduction unit 28. Note that the signal mixing unit 27 may be omitted from the sound field reproduction device 2; in that case, only the speaker drive signals for the high-order base acoustic signal from the first decoding unit 25 are output from each of the speakers SPk1 to SPk8 via the sound field reproduction unit 28.
 The sound field reproduction unit 28 converts the digital speaker drive signal for each speaker, after mixing by the signal mixing unit 27, into an analog speaker drive signal, amplifies it, and outputs (reproduces) it from the corresponding speaker.
 Each of the speakers SPk1, SPk2, ..., SPk8 is placed at a vertex of the sound field reproduction space modeled as a cube (for example, the satellite venue STL1) and reproduces (recreates) the sound field based on the speaker drive signals from the sound field reproduction unit 28. The number of installed speakers may be varied according to the sound field to be reproduced; when reproduction for a particular direction is not required, or when a commonly known virtual sound image generation method such as a transaural system or the VBAP (Vector Based Amplitude Panning) method is combined, the sound field may be reproduced with fewer than eight speakers. Conversely, more than eight speakers may be used. Furthermore, the speakers may be installed at positions other than the vertices of the sound field reproduction space (for example, the satellite venue STL1) as long as they surround its reference position (for example, the center position LSP1).
 The sound field reproduction unit 28 may output the signals to a binaural playback device worn by the listener (user), such as headphones or earphones, instead of to the speakers. When supplying signals to such a binaural playback device, the sound field reproduction unit 28 may generate reproduction signals corresponding to azimuth angles of ±90° by the decoding process described later. Alternatively, it may generate virtual sound images in a plurality of directions surrounding the head and generate the reproduction signals by multiplying each virtual sound image, in the frequency domain, by a transfer characteristic corresponding to its direction, such as an HRTF (Head Related Transfer Function) that allows the user to perceive a three-dimensional sound image, or by convolving that transfer characteristic with the virtual sound image in the time domain. As a result, the sound field is reproduced not only from the speakers SPk1, SPk2, ..., SPk8 placed in the satellite venue STL1, but can also be reproduced on a playback device (for example, the above-mentioned headphones or earphones) worn by a listener (user) in the satellite venue STL1.
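The time-domain variant of this binaural rendering, convolving each virtual sound image with the head-related impulse response (HRIR) for its direction and summing the results per ear, can be sketched as follows. This is a simplified illustration using NumPy; the HRIRs and the set of virtual-source directions are assumed inputs, not data from this disclosure:

```python
import numpy as np

def binaural_render(virtual_sources):
    """virtual_sources: iterable of (signal, hrir_left, hrir_right),
    one triple per virtual sound image direction, all 1-D arrays.
    Each source is convolved with its per-ear impulse response and
    the results are summed into left and right reproduction signals."""
    left_parts, right_parts = [], []
    for sig, hl, hr in virtual_sources:
        left_parts.append(np.convolve(sig, hl))
        right_parts.append(np.convolve(sig, hr))
    n = max(len(a) for a in left_parts + right_parts)
    left, right = np.zeros(n), np.zeros(n)
    for a in left_parts:
        left[:len(a)] += a
    for a in right_parts:
        right[:len(a)] += a
    return left, right

# With a unit impulse as the "HRIR", each ear simply receives the
# sum of the source signals unchanged.
sig = np.array([1.0, 0.5])
left, right = binaural_render([(sig, np.array([1.0]), np.array([1.0]))])
```

In practice each virtual direction would use a measured HRIR pair; the frequency-domain alternative mentioned above multiplies the source spectrum by the HRTF instead of convolving in time.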
 Here, the re-encoding process by the re-encoding unit 23 and the processes by the first decoding unit 25 and the second decoding unit 26 will be described in detail.
Figure JPOXMLDOC01-appb-C000021
Figure JPOXMLDOC01-appb-C000022
Figure JPOXMLDOC01-appb-M000023
Figure JPOXMLDOC01-appb-C000024
Figure JPOXMLDOC01-appb-M000025
Figure JPOXMLDOC01-appb-C000026
Figure JPOXMLDOC01-appb-M000027
Figure JPOXMLDOC01-appb-C000028
Figure JPOXMLDOC01-appb-M000029
Figure JPOXMLDOC01-appb-C000030
Figure JPOXMLDOC01-appb-M000031
 Next, the operating procedure for sound field reproduction by the sound field reproduction device 2 will be described with reference to FIG. 5. FIG. 5 is a flowchart chronologically showing an example of the operating procedure for sound field reproduction by the sound field reproduction device 2 according to the first embodiment. In the following description, the processes of steps St1 and St2 are described as being executed within the sound field recording device 1; however, the process of step St2 may be executed by the sound field reproduction device 2 when the components of the sound field recording device 1 other than the ambisonics microphone 11 are provided in the sound field reproduction device 2.
Figure JPOXMLDOC01-appb-C000032
 In response to the process in step St2, the sound field reproduction device 2 executes in parallel the series of processes in steps St3 to St6 (that is, the re-encoding process for generating the high-order base acoustic signal) and the process in step St7 (that is, the decoding process for generating the low-order base acoustic signal).
Figure JPOXMLDOC01-appb-C000033
Figure JPOXMLDOC01-appb-C000034
 The signal mixing unit 27 of the sound field reproduction device 2 mixes, for each speaker, the speaker drive signal corresponding to the high-order base acoustic signal from the first decoding unit 25 in step St6 (an example of the output of the first decoding process) and the speaker drive signal corresponding to the low-order base acoustic signal from the second decoding unit 26 in step St7 (an example of the output of the second decoding process) (step St8). The sound field reproduction unit 28 of the sound field reproduction device 2 converts the digital speaker drive signal for each speaker, after mixing by the signal mixing unit 27 in step St8, into an analog speaker drive signal, amplifies it, and outputs (reproduces) it from each of the corresponding speakers SPk1 to SPk8 (step St9).
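Step St8 amounts to a per-speaker, per-sample sum of the two decoder outputs. A minimal sketch follows; the weighting parameters are hypothetical knobs for illustration, since the text does not specify mixing gains:

```python
def mix_drive_signals(high, low, w_high=1.0, w_low=1.0):
    """Mix the first-decoder (high-order) and second-decoder
    (low-order) speaker drive signals channel by channel, as in
    step St8. `high` and `low` are lists of per-speaker sample
    lists of equal shape; the weights are assumed, not specified."""
    return [[w_high * h + w_low * l for h, l in zip(hs, ls)]
            for hs, ls in zip(high, low)]

# Two speakers, two samples each.
mixed = mix_drive_signals([[1.0, 2.0], [0.0, 1.0]],
                          [[0.5, 0.5], [1.0, 0.0]])
```

The mixed per-speaker streams are what step St9 then converts to analog and amplifies.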
Figure JPOXMLDOC01-appb-C000035
Figure JPOXMLDOC01-appb-C000036
Figure JPOXMLDOC01-appb-C000037
Figure JPOXMLDOC01-appb-C000038
Figure JPOXMLDOC01-appb-C000039
Figure JPOXMLDOC01-appb-C000040
Figure JPOXMLDOC01-appb-C000041
 The recording device is composed of an ambisonics microphone 11 in which a plurality of microphone elements Mc1 to Mc4 are three-dimensionally arranged so that each faces a different direction. This allows the sound field recording device 1 to three-dimensionally record the ambient sound, such as performances by a plurality of sound sources, in the sound field recording space (live venue LV1).
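As an illustration of how four capsule signals from such a tetrahedrally arranged microphone relate to a first-order signal, the well-known A-format to B-format sum/difference conversion can be sketched as follows. This is a generic textbook formula, not taken from this disclosure, and it assumes the usual capsule labeling (front-left-up, front-right-down, back-left-down, back-right-up):

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample from each of four tetrahedral capsules
    (A-format) into B-format (W, X, Y, Z) using the standard
    sum/difference matrix. Capsule names follow the conventional
    labeling: front-left-up, front-right-down, back-left-down,
    back-right-up."""
    w = flu + frd + bld + bru   # omnidirectional component
    x = flu + frd - bld - bru   # front-back axis
    y = flu - frd + bld - bru   # left-right axis
    z = flu - frd - bld + bru   # up-down axis
    return (w, x, y, z)
```

Equal pressure on all four capsules yields a purely omnidirectional W with zero directional components, which matches the intuition that such a field has no preferred direction.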
Figure JPOXMLDOC01-appb-C000042
 First, the system configuration and an operational overview of the sound field reproduction system 100A according to the second embodiment will be described with reference to FIGS. 6 and 7. FIG. 6 is a block diagram showing an example of the system configuration of the sound field reproduction system 100A according to the second embodiment. FIG. 7 is a diagram showing an example of an overview of operations from sound field recording to sound field reproduction in the second embodiment. In the description of FIGS. 6 and 7, content that overlaps with the corresponding configurations and operations of FIGS. 3 and 4 is given the same reference numerals and its description is simplified or omitted; only the differences are described.
 The sound field reproduction system 100A includes the sound field recording device 1 and a sound field reproduction device 2A. The configuration of the sound field recording device 1 is the same as in the first embodiment, so its description is omitted.
 The sound field reproduction device 2A is arranged, for example, in a sound field reproduction space (for example, the satellite venue STL1), and includes a sound source extraction direction control unit 21, a sound source presentation direction control unit 22, a re-encoding unit 23, a speaker direction designation unit 24, a first decoding unit 25, a sound source acquisition unit 29, a second encoding unit 30, a second signal mixing unit 31, a second decoding unit 32, a signal mixing unit 27, a sound field reproduction unit 28, and speakers SPk1, SPk2, ..., SPk8.
 The sound source acquisition unit 29 acquires the acoustic signals s1[n], ..., sb[n] of a plurality of sound sources (for example, various sources such as vocals, bass, guitar, and drums) to be presented in the sound field reproduction space (for example, the satellite venue STL1) and sends them to the second encoding unit 30. Each of the acoustic signals s1[n], ..., sb[n] can be expressed as a point sound source, where n denotes the discrete time and b denotes the number of sound sources. These sound sources may have been individually recorded in the sound field recording space (live venue LV1), or may be sound sources unrelated to the sound field recording space.
Figure JPOXMLDOC01-appb-C000043
Figure JPOXMLDOC01-appb-M000044
 The second signal mixing unit 31 mixes the high-order base acoustic signals (for example, N-th order ambisonics signals) for the individual sound sources obtained by the encoding process of the second encoding unit 30 and sends the result to the second decoding unit 32.
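Because ambisonics encoding is linear, mixing the per-source N-th order signals reduces to summing their coefficient arrays. A minimal sketch follows; the array shapes (coefficients by samples) are an assumption for illustration:

```python
import numpy as np

def mix_hoa_sources(encoded_sources):
    """Sum the per-source ambisonics coefficient arrays, each of
    shape (n_coefficients, n_samples), into one mixed N-th order
    signal, as the second signal mixing unit does."""
    return np.sum(np.stack(list(encoded_sources), axis=0), axis=0)

src_a = np.ones((4, 3))          # e.g. first order: 4 coefficients
src_b = 2.0 * np.ones((4, 3))
mixed = mix_hoa_sources([src_a, src_b])
```

The mixed coefficient array is then decoded into per-speaker drive signals downstream.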
Figure JPOXMLDOC01-appb-C000045
Figure JPOXMLDOC01-appb-M000046
 Next, the operating procedure for sound field reproduction by the sound field reproduction device 2A will be described with reference to FIG. 8. FIG. 8 is a flowchart chronologically showing an example of the operating procedure for sound field reproduction by the sound field reproduction device 2A according to the second embodiment. In the description of FIG. 8, processes that overlap with those of FIG. 5 are given the same step numbers and their description is simplified or omitted; only the differences are described.
 In FIG. 8, the sound source acquisition unit 29 of the sound field reproduction device 2A acquires the acoustic signals s1[n], ..., sb[n] (an example of point sound source signals) of a plurality of sound sources (for example, various sources such as vocals, bass, guitar, and drums) to be presented in the sound field reproduction space (for example, the satellite venue STL1) (step St11). The second encoding unit 30 of the sound field reproduction device 2A obtains the direction vectors θb of the b point sound sources by reading them from a memory (not shown) or based on a designation from a user interface (not shown) (step St12).
Figure JPOXMLDOC01-appb-C000047
 As described above, the sound field reproduction device 2A according to the second embodiment further includes the second encoding unit 30, which encodes each of a plurality of sound source signals (for example, sound signals from various sources such as vocals, bass, guitar, and drums) to be presented in the sound field reproduction space (satellite venue STL1) to generate second high-order base acoustic signals (N-th order ambisonics signals), and the second signal mixing unit 31, which mixes the second high-order base acoustic signals of the individual sound source signals. This allows the sound field reproduction device 2A according to the second embodiment to output, with the high directional resolution afforded by the high-order basis, the ambient sound of sound sources that are to be presented independently in the sound field reproduction space (satellite venue STL1), unlike in the sound field recording space (live venue LV1).
Figure JPOXMLDOC01-appb-C000048
Figure JPOXMLDOC01-appb-C000049
 Although the embodiments have been described above with reference to the accompanying drawings, the present disclosure is not limited to these examples. It is clear that those skilled in the art can conceive of various changes, modifications, substitutions, additions, deletions, and equivalents within the scope of the claims, and it is understood that these also fall within the technical scope of the present disclosure. Furthermore, the constituent elements of the embodiments described above may be combined as desired without departing from the spirit of the invention.
 This application is based on a Japanese patent application (Japanese Patent Application No. 2022-129434) filed on August 15, 2022, the contents of which are incorporated herein by reference.
 The present disclosure is useful as a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system that use low-order sound field components recorded with an ambisonics microphone and suppress an increase in sound source localization error in a sound field reproduction space.
 1 Sound field recording device
 2, 2A Sound field reproduction device
 11 Ambisonics microphone
 12 A/D conversion unit
 13 Encoding unit
 14 Microphone element direction designation unit
 21 Sound source extraction direction control unit
 22 Sound source presentation direction control unit
 23 Re-encoding unit
 24 Speaker direction designation unit
 25 First decoding unit
 26, 32 Second decoding unit
 27 Signal mixing unit
 28 Sound field reproduction unit
 29 Sound source acquisition unit
 30 Second encoding unit
 31 Second signal mixing unit
 100, 100A Sound field reproduction system
 SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, SPk8 Speakers

Claims (13)

  1.  A sound field reproduction device comprising:
     a sound source extraction direction control unit that receives designation of a sound source extraction direction in a sound field recording space in which a recording device is placed;
     a re-encoding unit that generates a high-order base acoustic signal by re-encoding an acoustic signal corresponding to the sound source extraction direction among low-order base acoustic signals based on an encoding process using a signal recorded by the recording device; and
     a sound field reproduction unit that outputs a signal based on the high-order base acoustic signal from each of a plurality of speakers provided in a sound field reproduction space different from the sound field recording space.
  2.  The sound field reproduction device according to claim 1, further comprising:
     a sound source presentation direction control unit that receives designation of a sound source presentation direction, which is a direction identical to or different from the sound source extraction direction and is a direction in which sound field reproduction is emphasized in the sound field reproduction space different from the sound field recording space,
     wherein the re-encoding unit generates the high-order base acoustic signal by performing the re-encoding using the acoustic signal and the sound source presentation direction.
  3.  The sound field reproduction device according to claim 1, further comprising:
     a first decoding unit that generates, using the high-order base acoustic signal and placement information of each of the plurality of speakers, a first sound field drive signal having a high-order basis component for each of the speakers.
  4.  The sound field reproduction device according to claim 3, further comprising:
     a second decoding unit that generates, using the low-order base acoustic signal and the placement information of each of the plurality of speakers, a second sound field drive signal having a low-order basis component for each of the speakers.
  5.  The sound field reproduction device according to claim 4, further comprising:
     a signal mixing unit that mixes the first sound field drive signal and the second sound field drive signal for each of the speakers,
     wherein the sound field reproduction unit outputs, for each of the speakers, the signal mixed by the signal mixing unit as the signal based on the high-order base acoustic signal.
  6.  The sound field reproduction device according to claim 1, wherein the sound source extraction direction is designated as a three-dimensional direction from a reference position within the sound field recording space.
  7.  The sound field reproduction device according to claim 2, wherein the sound source presentation direction is designated as a three-dimensional direction from a reference position within the sound field recording space.
  8.  The sound field reproduction device according to claim 3, further comprising:
     a second encoding unit that generates a second high-order base acoustic signal by encoding each of a plurality of sound source signals to be presented in the sound field reproduction space; and
     a second signal mixing unit that mixes the second high-order base acoustic signals of the individual sound source signals.
  9.  The sound field reproduction device according to claim 8, further comprising:
     a second decoding unit that generates, using the second high-order base acoustic signals of the individual sound source signals mixed by the second signal mixing unit and the placement information of each of the plurality of speakers, a third sound field drive signal having a high-order basis component for each of the speakers.
  10.  The sound field reproduction device according to claim 9, further comprising:
     a signal mixing unit that mixes the first sound field drive signal and the third sound field drive signal for each of the speakers,
     wherein the sound field reproduction unit outputs, for each of the speakers, the signal mixed by the signal mixing unit as the signal based on the high-order base acoustic signal.
  11.  A sound field reproduction method comprising:
     receiving designation of a sound source extraction direction in a sound field recording space in which a recording device is placed;
     generating a high-order base acoustic signal by re-encoding an acoustic signal corresponding to the sound source extraction direction among low-order base acoustic signals based on an encoding process using a signal recorded by the recording device; and
     outputting a signal based on the high-order base acoustic signal from each of a plurality of speakers provided in a sound field reproduction space different from the sound field recording space.
  12.  A sound field reproduction system comprising:
     a sound field recording device having a recording device capable of recording a sound source within a sound field recording space; and
     a sound field reproduction device that reproduces the acoustic signal recorded by the recording device in a sound field reproduction space different from the sound field recording space,
     wherein the sound field reproduction device includes:
     a sound source extraction direction control unit that receives designation of a sound source extraction direction within the sound field recording space;
     a re-encoding unit that generates a high-order base acoustic signal by re-encoding an acoustic signal corresponding to the sound source extraction direction among low-order base acoustic signals based on an encoding process using a signal recorded by the recording device; and
     a sound field reproduction unit that outputs a signal based on the high-order base acoustic signal from each of a plurality of speakers provided in the sound field reproduction space.
  13.  The sound field reproduction system according to claim 12, wherein the recording device is composed of an ambisonics microphone in which a plurality of microphone elements are three-dimensionally arranged so that each faces a different direction.
PCT/JP2023/025363 2022-08-15 2023-07-07 Sound field reproduction device, sound field reproduction method, and sound field reproduction system WO2024038702A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-129434 2022-08-15
JP2022129434A JP2024026010A (en) 2022-08-15 2022-08-15 Sound field reproduction device, sound field reproduction method, and sound field reproduction system

Publications (1)

Publication Number Publication Date
WO2024038702A1 true WO2024038702A1 (en) 2024-02-22

Family

ID=89941433

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/025363 WO2024038702A1 (en) 2022-08-15 2023-07-07 Sound field reproduction device, sound field reproduction method, and sound field reproduction system

Country Status (2)

Country Link
JP (1) JP2024026010A (en)
WO (1) WO2024038702A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016517033A (en) * 2013-03-22 2016-06-09 トムソン ライセンシングThomson Licensing Method and apparatus for enhancing directivity of primary ambisonics signal
US20180218740A1 (en) * 2017-01-27 2018-08-02 Google Inc. Coding of a soundfield representation
WO2018162803A1 (en) * 2017-03-09 2018-09-13 Aalto University Foundation Sr Method and arrangement for parametric analysis and processing of ambisonically encoded spatial sound scenes
JP2019530389A (en) * 2016-09-28 2019-10-17 ノキア テクノロジーズ オーユー Spatial audio signal format generation from a microphone array using adaptive capture
JP2019192975A (en) * 2018-04-19 2019-10-31 キヤノン株式会社 Signal processing device, signal processing method, and program
JP2022518744A (en) * 2019-01-21 2022-03-16 フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Devices and methods for encoding spatial audio representations, or devices and methods for decoding audio signals encoded using transport metadata, and related computer programs.

Also Published As

Publication number Publication date
JP2024026010A (en) 2024-02-28

Similar Documents

Publication Publication Date Title
US10674262B2 (en) Merging audio signals with spatial metadata
TWI744341B (en) Distance panning using near / far-field rendering
JP5688030B2 (en) Method and apparatus for encoding and optimal reproduction of a three-dimensional sound field
RU2666473C2 (en) Apparatus and method for audio rendering employing geometric distance definition
JP2017501438A (en) Multiplet-based matrix mixing for high channel count multi-channel audio
EP4173314A2 (en) Sound field adjustment
KR20190019915A (en) Method and apparatus for processing audio signals using Ambisonic signals
Braasch et al. A loudspeaker-based projection technique for spatial music applications using virtual microphone control
WO2024038702A1 (en) Sound field reproduction device, sound field reproduction method, and sound field reproduction system
Wakefield Third-order Ambisonic extensions for Max/MSP with musical applications
WO2022110722A1 (en) Audio encoding/decoding method and device
GB2578715A (en) Controlling audio focus for spatial audio processing
CN110782865B (en) Three-dimensional sound creation interactive system
WO2024070127A1 (en) Sound field reproduction device, sound field reproduction method, and sound field reproduction system
Tsutsumi et al. Directivity synthesis with multipoles comprising a cluster of focused sources using a linear loudspeaker array
Paterson et al. Producing 3-D audio
JP2014204316A (en) Acoustic signal reproducing device and acoustic signal preparation device
JP6204680B2 (en) Acoustic signal reproduction device, acoustic signal creation device
JP2024043430A (en) Sound field presence reproducing device and sound field presence reproducing method
Melchior et al. Spatial audio authoring for Ambisonics reproduction
WO2022034805A1 (en) Signal processing device and method, and audio playback system
Devonport et al. Full Reviewed Paper at ICSA 2019
JP2024043429A (en) Presence sound field reproducing device and presence sound field reproducing method
Nettingsmeier Higher order Ambisonics-a future-proof 3D audio technique
Thomaz et al. Orchestra spatialization using the AUDIENCE engine

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854739

Country of ref document: EP

Kind code of ref document: A1