WO2022184097A1 - Method and apparatus for determining a virtual speaker set - Google Patents

Method and apparatus for determining a virtual speaker set

Info

Publication number
WO2022184097A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual
latitude
virtual speakers
speakers
speaker
Application number
PCT/CN2022/078824
Other languages
English (en)
French (fr)
Inventor
高原
刘帅
王宾
王喆
曲天书
徐佳浩
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to AU2022230620A (AU2022230620A1)
Priority to JP2023553928A (JP2024512347A)
Priority to EP22762560.5A (EP4294056A1)
Priority to KR1020237033855A (KR20230154241A)
Priority to BR112023017996A (BR112023017996A2)
Publication of WO2022184097A1 (Chinese)
Priority to US18/241,698 (US20230412981A1)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/02 Spatial or constructional arrangements of loudspeakers
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S5/005 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation of the pseudo five- or more-channel type, e.g. virtual surround
    • H04R2205/00 Details of stereophonic arrangements covered by H04R5/00 but not provided for in any of its subgroups
    • H04R2205/024 Positioning of loudspeaker enclosures for spatial sound reproduction
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H04S2420/00 Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the present application relates to the field of audio technology, and in particular, to a method and apparatus for determining a virtual speaker set.
  • 3D audio technology is an audio technology that acquires, processes, transmits, renders and plays back sound events and 3D sound field information in the real world by means of computer and signal processing.
  • three-dimensional audio technology gives sound a strong sense of space, envelopment, and immersion, providing listeners with an immersive listening experience.
  • the current mainstream 3D audio technology is higher order ambisonics (HOA) technology.
  • because HOA recording and encoding are independent of the loudspeaker layout used at the playback stage, and because HOA-format data can be rotated during playback, HOA technology offers greater flexibility in 3D audio playback and has therefore received more extensive attention and research.
  • HOA technology can convert HOA signals into virtual speaker signals and then map them to binaural signals for playback.
  • the best sampling effect is achieved when the virtual speakers are uniformly distributed, for example, placed on the vertices of a regular polyhedron: a regular tetrahedron, regular hexahedron, regular octahedron, regular dodecahedron, or regular icosahedron (4, 8, 6, 20, or 12 vertices, respectively). However, because only these five regular polyhedra exist, the number of virtual speakers that can be arranged this way is limited, and the approach cannot be applied to distributions with larger numbers of virtual speakers.
  • the present application provides a method and apparatus for determining a virtual speaker set, so as to improve the playback effect of an audio signal.
  • the present application provides a method for determining a virtual speaker set, including: determining a target virtual speaker from F preset virtual speakers according to an audio signal to be processed, where each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1; and obtaining position information of each of the S virtual speakers corresponding to the target virtual speaker from a preset virtual speaker distribution table, where the virtual speaker distribution table includes position information of K virtual speakers, the position information includes a pitch angle index and a horizontal angle index, K is a positive integer greater than 1, F ⁇ K, F ⁇ S ⁇ K.
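A minimal sketch of the lookup step in this method, assuming the distribution table and the per-speaker sets are held in plain Python dictionaries. The names `distribution_table`, `speaker_sets`, and `lookup_speaker_set` are hypothetical, and the toy index values are illustrative only, not taken from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical distribution table for the K virtual speakers:
# speaker index -> (pitch angle index, horizontal angle index).
DistributionTable = Dict[int, Tuple[int, int]]

# Hypothetical mapping from each of the F preset virtual speakers
# to the indices of its S virtual speakers (S > 1).
SpeakerSets = Dict[int, List[int]]


def lookup_speaker_set(target: int,
                       speaker_sets: SpeakerSets,
                       distribution_table: DistributionTable) -> List[Tuple[int, int]]:
    """Return the position information of the S virtual speakers of `target`."""
    if target not in speaker_sets:
        raise KeyError(f"{target} is not one of the F preset virtual speakers")
    return [distribution_table[k] for k in speaker_sets[target]]


if __name__ == "__main__":
    table = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (1, 0)}  # K = 4 (toy values)
    sets = {0: [0, 1, 3], 2: [2, 1, 3]}                    # F = 2, S = 3
    print(lookup_speaker_set(0, sets, table))  # [(0, 0), (0, 1), (1, 0)]
```

Storing only indices in the table keeps it compact; the actual angles would be recovered from the indices and the layout parameters.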
  • a virtual speaker distribution table is preset so that virtual speakers deployed according to it yield a higher average signal-to-noise ratio (SNR) for the reconstructed HOA signal, and then the selection and processing based on this distribution
  • determining the target virtual speaker from the F preset virtual speakers according to the audio signal to be processed includes: acquiring higher order ambisonics (HOA) coefficients of the audio signal; acquiring F groups of HOA coefficients corresponding to the F virtual speakers, where the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and determining, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups, that has the greatest correlation with the HOA coefficients of the audio signal.
  • encoding analysis is performed on the audio signal to be processed, for example, analyzing its sound field distribution, including the number of sound sources, directivity, dispersion, and other characteristics, to obtain the HOA coefficients of the audio signal, which serve as one of the criteria for deciding how to select the target virtual speaker.
  • based on the HOA coefficients of the audio signal to be processed and the HOA coefficients of the candidate virtual speakers (that is, the F virtual speakers above), a virtual speaker matching the audio signal to be processed can be selected; this application refers to it as the target virtual speaker.
  • for example, the inner product of the HOA coefficients of each of the F virtual speakers with the HOA coefficients of the audio signal may be computed, and the virtual speaker with the largest absolute value of the inner product is selected as the target virtual speaker. Other methods may also be used to determine the target virtual speaker, which is not specifically limited in this application.
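The inner-product selection described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; `select_target_speaker` is a hypothetical name, and the HOA coefficients are assumed to be plain real-valued vectors of equal length.

```python
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two HOA coefficient vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("HOA coefficient vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def select_target_speaker(signal_hoa: Sequence[float],
                          candidate_hoa: Sequence[Sequence[float]]) -> int:
    """Index of the candidate whose HOA coefficients give the largest |inner product|."""
    if not candidate_hoa:
        raise ValueError("at least one candidate virtual speaker is required")
    scores = [abs(inner_product(signal_hoa, c)) for c in candidate_hoa]
    return max(range(len(scores)), key=scores.__getitem__)


signal = [0.1, -0.9, 0.2, 0.0]
candidates = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(select_target_speaker(signal, candidates))  # 1
```

Taking the absolute value means a candidate that matches the signal with opposite polarity (here, the -0.9 component) is still treated as the best match.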
  • the S virtual speakers corresponding to the target virtual speaker satisfy the following condition: the S virtual speakers include the target virtual speaker and S-1 virtual speakers located around it, and each of the S-1 correlations between those S-1 virtual speakers and the target virtual speaker is greater than every one of the K-S correlations between the other K-S virtual speakers (the K virtual speakers excluding the S virtual speakers) and the target virtual speaker.
  • the target virtual speaker is the center virtual speaker with the highest correlation with the HOA coefficients of the audio signal to be processed.
  • the S virtual speakers corresponding to each center virtual speaker are the S virtual speakers with the highest correlation with the HOA coefficients of that center virtual speaker; therefore, the S virtual speakers corresponding to the target virtual speaker are also the S virtual speakers with the highest correlation with the HOA coefficients of the audio signal to be processed.
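One way to build the S-speaker set for a center speaker is to rank all K speakers by correlation with it. The sketch below assumes that each speaker's HOA coefficients are real, N3D-normalized spherical harmonics of order N, so that the inner product of two speakers' coefficient vectors reduces, by the spherical harmonic addition theorem, to the sum over n = 0..N of (2n+1)·P_n(cos γ), where γ is the angle between them. This correlation measure and the names `hoa_correlation` and `speaker_set` are assumptions for illustration; the patent's own correlation formula may differ.

```python
import math
from typing import List, Tuple

Direction = Tuple[float, float]  # (horizontal angle, pitch angle) in degrees


def unit_vector(azimuth_deg: float, elevation_deg: float) -> Tuple[float, float, float]:
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def hoa_correlation(dir_a: Direction, dir_b: Direction, order: int) -> float:
    """Inner product of two order-`order` N3D HOA coefficient vectors,
    via the addition theorem: sum_{n=0}^{N} (2n+1) P_n(cos gamma)."""
    x = sum(p * q for p, q in zip(unit_vector(*dir_a), unit_vector(*dir_b)))
    x = max(-1.0, min(1.0, x))  # guard against rounding outside [-1, 1]
    p_prev, p_curr = 1.0, x     # P_0(x), P_1(x)
    total = 1.0 + (3.0 * x if order >= 1 else 0.0)
    for n in range(2, order + 1):
        # Bonnet recurrence: n P_n = (2n-1) x P_{n-1} - (n-1) P_{n-2}
        p_prev, p_curr = p_curr, ((2 * n - 1) * x * p_curr - (n - 1) * p_prev) / n
        total += (2 * n + 1) * p_curr
    return total


def speaker_set(target: int, speakers: List[Direction], s: int, order: int) -> List[int]:
    """Indices of the S speakers most correlated with `target` (target included)."""
    scores = [hoa_correlation(speakers[target], d, order) for d in speakers]
    ranked = sorted(range(len(speakers)), key=lambda k: scores[k], reverse=True)
    return ranked[:s]


layout = [(float(az), 0.0) for az in range(0, 360, 30)] + [(0.0, 90.0), (0.0, -90.0)]
print(speaker_set(0, layout, s=3, order=3))  # the speaker at 0 deg and its 30-degree neighbours
```

With 12 equatorial speakers spaced 30° apart plus two poles, order 3, and S = 3, the set for the speaker at 0° is itself and its two 30° neighbours. The correlation is not monotonic in angle (at order 3 the speakers 120° away score higher than those 60° away), so ranking by correlation and ranking by angular distance can differ beyond the nearest neighbours.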
  • the K virtual speakers satisfy the following conditions: the K virtual speakers are distributed on a preset spherical surface, and the preset spherical surface includes L latitude regions, L > 1. The m-th of the L latitude regions includes T_m latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers distributed on the m_i-th latitude circle is Δθ_m, where 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m. When T_m > 1, the pitch angle difference between any two adjacent latitude circles in the m-th latitude region is Δφ_m.
  • the n-th of the L latitude regions includes T_n latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers distributed on the n_i-th latitude circle is Δθ_n, where 1 ≤ n ≤ L, T_n is a positive integer, and 1 ≤ n_i ≤ T_n; when T_n > 1, any two of the n-th latitude regions
  • the c-th of the L latitude regions includes T_c latitude circles, one of which is the equatorial latitude circle, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers distributed on the c_i-th latitude circle is Δθ_c, where 1 ≤ c ≤ L, T_c is a positive integer, and 1 ≤ c_i ≤ T_c. When T_c > 1, the pitch angle difference between any two adjacent latitude circles in the c-th latitude region is Δφ_c, where Δθ_c ⁇ Δθ_m, c ≠ m.
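A sketch of how a distribution table satisfying these latitude-region conditions could be generated. Each hypothetical `LatitudeRegion` holds a number of latitude circles separated by a fixed pitch step, with speakers separated by a fixed horizontal step on every circle of that region. The pitch angle index is taken as the circle's position in order of increasing pitch, and the horizontal angle index as the speaker's position on its circle starting from 0°; these index conventions and all numeric values are assumptions, not values from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

Entry = Tuple[int, int, float, float]  # (pitch index, horizontal index, pitch deg, horizontal deg)


@dataclass
class LatitudeRegion:
    first_pitch: float      # pitch angle of the lowest circle in the region, degrees
    num_rings: int          # number of latitude circles in the region
    pitch_step: float       # pitch difference between adjacent circles (used when num_rings > 1)
    horizontal_step: float  # horizontal angle difference between adjacent speakers, degrees


def build_distribution_table(regions: List[LatitudeRegion]) -> List[Entry]:
    """One entry per virtual speaker; circles are numbered by increasing pitch angle."""
    rings = []
    for region in regions:
        per_ring = round(360.0 / region.horizontal_step)
        if abs(per_ring * region.horizontal_step - 360.0) > 1e-9:
            raise ValueError("horizontal_step must divide 360 degrees")
        for t in range(region.num_rings):
            pitch = region.first_pitch + t * region.pitch_step
            rings.append((pitch, region.horizontal_step, per_ring))
    rings.sort(key=lambda ring: ring[0])
    table: List[Entry] = []
    for pitch_index, (pitch, step, per_ring) in enumerate(rings):
        for h in range(per_ring):
            table.append((pitch_index, h, pitch, h * step))
    return table


regions = [
    LatitudeRegion(first_pitch=-10.0, num_rings=3, pitch_step=10.0, horizontal_step=10.0),  # equatorial
    LatitudeRegion(first_pitch=30.0, num_rings=2, pitch_step=20.0, horizontal_step=20.0),
    LatitudeRegion(first_pitch=-50.0, num_rings=2, pitch_step=20.0, horizontal_step=20.0),
]
table = build_distribution_table(regions)
print(len(table))  # 180 = 3 * 36 + 4 * 18
```

In this toy layout the equatorial region uses a finer horizontal step than the others, which keeps the speaker spacing along the longer equatorial circles comparable to that near the poles.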
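  • the F virtual speakers satisfy the following condition: among the F virtual speakers, the horizontal angle difference between adjacent virtual speakers distributed on the m_i-th latitude circle is greater than the corresponding horizontal angle difference among the K virtual speakers on that circle.
  • specifically, the horizontal angle difference among the F virtual speakers is q times the horizontal angle difference among the K virtual speakers, where q is a positive integer greater than 1.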
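Because the F preset speakers use q times the horizontal spacing of the K speakers on the same circle, they can be picked out of the distribution table by keeping every q-th horizontal index. This sketch assumes table entries of the form (pitch index, horizontal index, pitch angle, horizontal angle), that horizontal index 0 lies at 0° on every circle, and that each circle's speaker count is a multiple of q; `select_preset_speakers` is a hypothetical name.

```python
from typing import List, Tuple

Entry = Tuple[int, int, float, float]  # (pitch index, horizontal index, pitch deg, horizontal deg)


def select_preset_speakers(table: List[Entry], q: int) -> List[Entry]:
    """Keep every q-th speaker on each latitude circle (horizontal index divisible by q),
    so the horizontal spacing of the F preset speakers is q times that of the K speakers.
    Assumes the number of speakers on every circle is a multiple of q."""
    if q <= 1:
        raise ValueError("q must be an integer greater than 1")
    return [entry for entry in table if entry[1] % q == 0]


ring = [(0, h, 0.0, h * 10.0) for h in range(36)]  # one equatorial circle, 10-degree spacing
preset = select_preset_speakers(ring, q=3)
print(len(preset), preset[1][3] - preset[0][3])      # 12 30.0
```

If a circle's speaker count were not a multiple of q, the gap between the last kept speaker and the first would differ from q times the base spacing, which is why the sketch states that assumption.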
  • the correlation R_fk between the k-th virtual speaker among the K virtual speakers and the target virtual speaker satisfies the following formula:
  • where the symbols denote, respectively, the horizontal angle of the target virtual speaker, the pitch angle of the target virtual speaker, the HOA coefficients of the target virtual speaker, and the HOA coefficients of the k-th virtual speaker among the K virtual speakers.
  • the present application provides an apparatus for determining a virtual speaker set, including: a determining module, configured to determine a target virtual speaker from F preset virtual speakers according to an audio signal to be processed, where each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1; and an acquisition module, configured to acquire position information of each of the S virtual speakers corresponding to the target virtual speaker from a preset virtual speaker distribution table, where the virtual speaker distribution table includes position information of K virtual speakers, the position information includes a pitch angle index and a horizontal angle index, K is a positive integer greater than 1, F ⁇ K, F ⁇ S ⁇ K.
  • the determining module is specifically configured to: acquire higher order ambisonics (HOA) coefficients of the audio signal; acquire F groups of HOA coefficients corresponding to the F virtual speakers, where the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and determine, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups, that has the greatest correlation with the HOA coefficients of the audio signal.
  • in the apparatus, the S virtual speakers corresponding to the target virtual speaker, the K virtual speakers, the F virtual speakers, and the correlation R_fk satisfy the same conditions as described above for the method.
  • the present application provides an audio processing device, including: one or more processors; and a memory, configured to store one or more programs, where when the one or more programs are executed by the one or more processors, the one or more processors are enabled to implement the method according to any implementation of the first aspect.
  • the present application provides a computer-readable storage medium, including a computer program, where when the computer program is executed on a computer, the computer is enabled to perform the method according to any implementation of the first aspect.
  • FIG. 1 is an exemplary structural diagram of the audio playback system of the present application.
  • FIG. 2 is an exemplary structural diagram of the audio decoding system 10 of the present application.
  • FIG. 3 is an exemplary structural diagram of the HOA encoding apparatus of the present application.
  • FIG. 4a is an exemplary schematic diagram of the preset spherical surface of the present application.
  • FIG. 4b is an exemplary schematic diagram of the pitch angle and the horizontal angle of the present application.
  • FIG. 5a and FIG. 5b are exemplary distribution diagrams of K virtual speakers.
  • FIG. 6a and FIG. 6b are exemplary distribution diagrams of K virtual speakers.
  • FIG. 7 is an exemplary flowchart of the method for determining a virtual speaker set of the present application.
  • FIG. 8 is an exemplary structural diagram of an apparatus for determining a virtual speaker set of the present application.
  • "At least one (item)" means one or more, and "a plurality of" means two or more.
  • "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural.
  • the character "/" generally indicates an "or" relationship between the associated objects.
  • "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of a plurality of items.
  • for example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a, b and c", where each of a, b, and c may be single or multiple.
  • the two values connected by the character "~" generally represent a value range, and the range includes both values.
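The "at least one of" convention covers every non-empty combination of the listed items, so three items give 2^3 - 1 = 7 possibilities. A quick check with the standard library's `itertools.combinations`; `at_least_one_of` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def at_least_one_of(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty combinations of `items`, i.e. what "at least one of" covers."""
    return [combo for r in range(1, len(items) + 1) for combo in combinations(items, r)]


print(at_least_one_of(["a", "b", "c"]))
# [('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
```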
  • Audio frame: audio data is streamed, and in practice the audio data within a period of time is usually taken as one audio frame. This period is called the "sampling time", and its value can be determined according to the requirements of the codec and the specific application, for example, a duration of 2.5 ms to 60 ms, where ms denotes milliseconds.
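Since a frame is defined by its duration, the number of samples per frame follows from the sampling rate. The sketch below assumes a 48 kHz sampling rate, which the text does not specify; `samples_per_frame` is a hypothetical helper.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of samples in one audio frame of `frame_ms` milliseconds."""
    samples = sample_rate_hz * frame_ms / 1000.0
    if abs(samples - round(samples)) > 1e-9:
        raise ValueError("frame duration does not map to a whole number of samples")
    return round(samples)


for ms in (2.5, 20.0, 60.0):
    print(ms, "ms ->", samples_per_frame(48_000, ms), "samples")
# 2.5 ms -> 120 samples, 20.0 ms -> 960 samples, 60.0 ms -> 2880 samples
```

The check for a whole number of samples matters at rates such as 44.1 kHz, where a 2.5 ms frame would correspond to 110.25 samples.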
  • Audio signal: an audio signal is the information carrier of the frequency and amplitude variations of regular sound waves carrying speech, music, and sound effects. Audio is a continuously varying analog signal that can be represented by a continuous curve called a sound wave. Digital audio is a digital signal produced from audio by analog-to-digital conversion or generated by a computer. A sound wave has three important parameters, frequency, amplitude, and phase, which determine the characteristics of the audio signal.
  • FIG. 1 is an exemplary structural diagram of an audio playback system of the application.
  • the audio playback system includes an audio sending device and an audio receiving device. The audio sending device is a device that can perform audio encoding and send an audio stream, for example, a mobile phone, a computer (laptop, desktop computer, or the like), or a tablet (handheld tablet, in-vehicle tablet, or the like). The audio receiving device is a device that can receive an audio stream, decode it, and play it back, for example, true wireless stereo (TWS) earphones, ordinary wireless headphones, speakers, smart watches, or smart glasses.
  • a Bluetooth connection can be established between the audio sending device and the audio receiving device, and the two can support the transmission of voice and music.
  • for example, the audio sending device and the audio receiving device may be a mobile phone and TWS earphones, wireless headphones, or wireless neckband earphones, or a mobile phone and another terminal device (such as a smart speaker, smart watch, smart glasses, or in-vehicle speaker).
  • the audio sending device and the audio receiving device may also be a tablet, laptop, or desktop computer and TWS earphones, wireless headphones, wireless neckband earphones, or another terminal device (such as a smart speaker, smart watch, smart glasses, or in-vehicle speaker).
  • the audio sending device and the audio receiving device may also be connected by other communication methods, such as WiFi connection, wired connection or other wireless connection, which is not specifically limited in this application.
  • FIG. 2 is an exemplary structural diagram of the audio decoding system 10 of the present application.
  • the audio decoding system 10 may include a source device 12 and a destination device 14, and the source device 12 may be the audio transmitting device in FIG. 1 .
  • the destination device 14 may be the audio receiving device of FIG. 1 .
  • the source device 12 generates encoded stream information, and therefore, the source device 12 may also be referred to as an audio encoding device.
  • the destination device 14 may decode the encoded bitstream information generated by the source device 12, and thus, the destination device 14 may also be referred to as an audio decoding device.
  • the source device 12, that is, the audio encoding device, may also be referred to as an audio sending device, and the destination device 14, that is, the audio decoding device, may also be referred to as an audio receiving device.
  • the source device 12 includes an encoder 20 and, optionally, an audio source 16 , an audio preprocessor 18 , and a communication interface 22 .
  • the audio source 16 may include or be any type of audio capture device, for example, for capturing real-world sound, and/or any type of audio generating device, for example, a computer audio processor, or any type of device for acquiring and/or providing real-world audio or computer-generated audio (for example, screen content or audio in virtual reality (VR)), and/or any combination thereof (for example, audio in augmented reality (AR), mixed reality (MR), and/or extended reality (XR)).
  • Audio source 16 may be a microphone for capturing audio or a memory for storing audio, audio source 16 may also include any kind of interface (internal or external) that stores previously captured or generated audio and/or acquires or receives audio.
  • when the audio source 16 is a microphone, it may be, for example, a local audio capture device or one integrated in the source device; when the audio source 16 is a memory, it may be, for example, a local memory or a memory integrated in the source device.
  • when the audio source 16 includes an interface, the interface may be, for example, an external interface that receives audio from an external audio source, such as an external audio capture device (for example, a microphone), an external memory, or an external audio generating device, for example, an external computer audio processor, a computer, or a server.
  • the interface may be any kind of interface according to any proprietary or standardized interface protocol, such as a wired or wireless interface or an optical interface.
  • the audio source 16 acquires the audio signal of the current scene, and the audio signal of the current scene refers to the audio signal obtained by collecting the sound field at the position of the microphone in the space.
  • the audio signal of the current scene may also be referred to as the original scene audio signal.
  • the current scene audio signal may be an audio signal obtained through higher order ambisonics (HOA) technology.
  • the audio source 16 acquires the HOA signal to be encoded.
  • the HOA signal can be acquired by using an actual acquisition device or synthesized by using an artificial audio object.
  • the HOA signal to be encoded may be a time-domain HOA signal or a frequency-domain HOA signal.
  • the audio preprocessor 18 is used for receiving the original audio signal and performing preprocessing on the original audio signal to obtain the preprocessed audio signal.
  • the preprocessing performed by the audio preprocessor 18 may include trimming or denoising.
  • the encoder 20 is configured to receive the pre-processed audio signal, and process the pre-processed audio signal to provide encoded code stream information.
  • the communication interface 22 in the source device 12 can be used to receive the code stream information and send the code stream to the destination device 14 through the communication channel 13 .
  • the communication channel 13 is, for example, a direct wired or wireless connection, a network of any kind such as a wired or wireless network or any combination thereof, or a private network and a public network of any kind, or any combination thereof.
  • the destination device 14 includes a decoder 30 and, optionally, a communication interface 28 , an audio post-processor 32 and a playback device 34 .
  • the communication interface 28 in the destination device 14 is used to receive the codestream information directly from the source device 12 and provide the codestream information to the decoder 30 .
  • Communication interface 22 and communication interface 28 may be used to send or receive stream information through communication channel 13 between source device 12 and destination device 14 .
  • both the communication interface 22 and the communication interface 28 may be configured as one-way communication interfaces, as indicated by the arrow for the communication channel 13 pointing from the source device 12 to the destination device 14 in FIG. 2, or as two-way communication interfaces, and may be used to send and receive messages and the like, to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or data transmission, such as transmission of encoded audio data.
  • the decoder 30 is configured to receive the code stream information, and decode the code stream information to obtain decoded audio data.
  • the audio post-processor 32 is used for post-processing the decoded audio data to obtain post-processed audio data.
  • the post-processing performed by the audio post-processor 32 may include, for example, trimming or resampling, and the like.
  • the playback device 34 is used for receiving the post-processed audio data to play the audio to the user or listener.
  • the playback device 34 may be or include any type of player for playing the reconstructed audio, for example, an integrated or external speaker.
  • the speaker may include a loudspeaker, a speaker box, and the like.
  • FIG. 3 is an exemplary structural diagram of the HOA encoding apparatus of the present application. As shown in FIG. 3 , the HOA encoding apparatus may be applied to the encoder 20 of the audio decoding system 10 described above.
  • the HOA encoding apparatus includes a virtual speaker configuration unit, an encoding analysis unit, a virtual speaker set generation unit, a virtual speaker selection unit, a virtual speaker signal generation unit, and a core encoder processing unit, where:
  • the virtual speaker configuration unit is used to configure the virtual speaker according to the encoder configuration information to obtain virtual speaker configuration parameters.
  • the encoder configuration information includes but is not limited to: HOA order, encoding bit rate, user-defined information, etc.
  • the virtual speaker configuration parameters include but are not limited to: the number of virtual speakers, the HOA order of the virtual speakers, etc.
  • the virtual speaker configuration parameters output by the virtual speaker configuration unit are used as input to the virtual speaker set generation unit.
  • the encoding analysis unit is configured to perform encoding analysis on the HOA signal to be encoded, for example, analyzing its sound field distribution, including the number of sound sources, directivity, dispersion, and other characteristics, as one of the criteria for deciding how to select the target virtual speaker.
  • the HOA encoding apparatus may not include an encoding analysis unit, that is, the HOA encoding apparatus may not analyze the input signal, and a default configuration is used to determine how to select the target virtual speaker.
  • the HOA encoding device obtains the HOA signal to be encoded.
  • an HOA signal recorded by an actual acquisition device or an HOA signal synthesized from artificial audio objects can be used as the input of the encoder, and the HOA signal to be encoded that is input to the encoder may be a time-domain HOA signal or a frequency-domain HOA signal.
  • the virtual speaker set generating unit is configured to generate a virtual speaker set, the virtual speaker set may include: a plurality of virtual speakers, and the virtual speakers in the virtual speaker set may also be referred to as "candidate virtual speakers”.
  • the virtual speaker set generating unit generates the designated candidate virtual speaker HOA coefficients.
  • the coordinates (ie, position information) of the candidate virtual speakers provided by the virtual speaker configuration unit and the HOA order of the candidate virtual speakers are used to generate the candidate virtual speaker HOA coefficients.
  • the coordinates of the candidate virtual speakers may be determined in ways including, but not limited to: generating K virtual speakers according to an equidistant rule; generating K non-uniformly distributed candidate virtual speakers according to auditory perception principles; or generating the coordinates of uniformly distributed candidate virtual speakers according to the number of candidate virtual speakers.
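For the option of generating uniformly distributed candidate coordinates from the number of candidate virtual speakers, one well-known approach is the Fibonacci (golden-angle) lattice, swapped in here as an assumption since the text does not name a method: the sine of the pitch angle is stepped evenly between -1 and 1 (giving equal-area bands) while the horizontal angle advances by the golden angle. `fibonacci_sphere` is a hypothetical name.

```python
import math
from typing import List, Tuple


def fibonacci_sphere(num_speakers: int) -> List[Tuple[float, float]]:
    """Approximately uniform (horizontal angle, pitch angle) pairs in degrees
    on the unit sphere, using a Fibonacci (golden-angle) lattice."""
    if num_speakers < 1:
        raise ValueError("num_speakers must be positive")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 137.5 degrees
    points = []
    for i in range(num_speakers):
        z = 1.0 - 2.0 * (i + 0.5) / num_speakers      # sin(pitch), evenly spaced in (-1, 1)
        pitch = math.degrees(math.asin(z))
        horizontal = math.degrees((i * golden_angle) % (2.0 * math.pi))
        points.append((horizontal, pitch))
    return points


candidates = fibonacci_sphere(64)
print(len(candidates))  # 64
```

Unlike placement on a regular polyhedron, this works for any number of candidate virtual speakers, at the cost of the spacing being only approximately uniform.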
  • r represents the radius of the sphere.
  • θ represents the horizontal angle (the horizontal angle may also be called the azimuth).
  • k represents the wave number.
  • s represents the amplitude of the ideal plane wave.
  • m represents the HOA order index.
  • the first j is the imaginary unit, and the radial term does not change with the angle; the angular term is the spherical harmonic function of the direction given by θ and the pitch angle, and the corresponding term is the spherical harmonic function of the sound source direction.
  • the Ambisonics coefficient is:
  • formula (3) above shows that the sound field can be expanded on the spherical surface in terms of spherical harmonic functions, with the expansion weights given by the Ambisonics coefficients.
  • conversely, given the Ambisonics coefficients, the sound field can be reconstructed.
  • truncating formula (3) at the N-th term and using the retained Ambisonics coefficients as an approximate description of the sound field yields what are called the N-th order HOA coefficients (also called Ambisonics coefficients).
  • the HOA order may range from 2 to 10.
  • the horizontal angle in this formula is the horizontal angle in the position information of the virtual speaker on the preset spherical surface; l represents the HOA order, l = 0, 1, ..., N; and m represents the direction parameter within each order, m = -l, ..., l.
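A sketch of generating a virtual speaker's HOA coefficients from its horizontal and pitch angles for l = 0..N and m = -l..l. The conventions here are assumptions, since the formula itself is not reproduced above: real spherical harmonics with N3D normalization, ACN channel ordering (index l*l + l + m), no Condon-Shortley phase, and the pitch angle treated as elevation above the horizontal plane. Other conventions such as SN3D would scale the coefficients differently. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def assoc_legendre(order: int, x: float) -> Dict[Tuple[int, int], float]:
    """P_l^m(x) for 0 <= m <= l <= order, without the Condon-Shortley phase."""
    p: Dict[Tuple[int, int], float] = {}
    s = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for m in range(order + 1):
        if m > 0:
            pmm *= (2 * m - 1) * s  # P_m^m = (2m-1)!! (1-x^2)^(m/2)
        p[(m, m)] = pmm
        if m + 1 <= order:
            p[(m + 1, m)] = x * (2 * m + 1) * pmm
        for l in range(m + 2, order + 1):
            # (l-m) P_l^m = (2l-1) x P_{l-1}^m - (l+m-1) P_{l-2}^m
            p[(l, m)] = ((2 * l - 1) * x * p[(l - 1, m)] - (l + m - 1) * p[(l - 2, m)]) / (l - m)
    return p


def virtual_speaker_hoa(horizontal_deg: float, pitch_deg: float, order: int) -> List[float]:
    """Real N3D spherical harmonics of one direction, in ACN order."""
    az, el = math.radians(horizontal_deg), math.radians(pitch_deg)
    p = assoc_legendre(order, math.sin(el))
    coeffs = [0.0] * (order + 1) ** 2
    for l in range(order + 1):
        for m in range(-l, l + 1):
            am = abs(m)
            norm = math.sqrt((2 * l + 1) * (1 if m == 0 else 2)
                             * math.factorial(l - am) / math.factorial(l + am))
            trig = math.cos(am * az) if m >= 0 else math.sin(am * az)
            coeffs[l * l + l + m] = norm * p[(l, am)] * trig
    return coeffs


c = virtual_speaker_hoa(40.0, 25.0, order=3)
print(len(c), round(sum(v * v for v in c), 9))  # 16 16.0
```

An order-N speaker has (N + 1)^2 coefficients, so orders 2 to 10 give 9 to 121 coefficients; with N3D normalization the squared coefficients of any direction sum to (N + 1)^2, which makes a convenient sanity check.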
  • the HOA coefficients of the candidate virtual speakers output by the virtual speaker set generation unit are used as inputs to the virtual speaker selection unit.
  • the virtual speaker selection unit is configured to select a target virtual speaker from the plurality of candidate virtual speakers in the virtual speaker set according to the HOA signal to be encoded; the target virtual speaker may be referred to as a "virtual speaker matching the HOA signal to be encoded", or a "matching virtual speaker" for short.
  • the virtual speaker selection unit selects the specified matching virtual speaker according to the HOA signal to be encoded and the candidate virtual speaker HOA coefficient output by the virtual speaker set generation unit.
  • for example, the inner product of each candidate virtual speaker's HOA coefficients with the HOA signal to be encoded is computed, and the candidate virtual speaker with the largest absolute inner product is selected as the target virtual speaker, that is, the matching virtual speaker.
  • the projection of the HOA signal to be encoded onto that candidate virtual speaker is superimposed on the linear combination of the candidate virtual speakers' HOA coefficients, and the projection vector is then subtracted from the HOA signal to be encoded to obtain a difference (residual).
  • the above process is repeated on the difference to realize iterative calculation. Each iteration yields one matching virtual speaker and outputs its coordinates and HOA coefficients, so multiple matching virtual speakers are selected, one per iteration. (Other implementations are not excluded.)
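The iterative selection above resembles a greedy matching-pursuit loop. The sketch below is a minimal illustration under simplifying assumptions:

- the HOA signal is reduced to a single (M,) coefficient vector for one frame;
- each column of the hypothetical `candidates` array holds one candidate's HOA coefficients;
- the residual is updated by subtracting the projection onto the newly chosen speaker only.

The projection actually used by the encoder may differ.

```python
import numpy as np

def select_matching_speakers(hoa_frame, candidates, num_speakers):
    """Greedy (matching-pursuit style) selection of matching virtual speakers.

    hoa_frame: (M,) HOA coefficients of one frame of the signal to be encoded.
    candidates: (M, K) array; column k holds the HOA coefficients of candidate k.
    Returns the chosen column indices and the final residual.
    """
    residual = np.asarray(hoa_frame, dtype=float).copy()
    chosen = []
    for _ in range(num_speakers):
        scores = np.abs(candidates.T @ residual)   # |inner product| per candidate
        if chosen:
            scores[chosen] = -np.inf               # never pick the same speaker twice
        best = int(np.argmax(scores))
        chosen.append(best)
        a = candidates[:, best]
        # subtract the projection of the residual onto this speaker's HOA vector
        residual = residual - (a @ residual) / (a @ a) * a
    return chosen, residual
```

Each pass picks one speaker, so `num_speakers` passes yield that many matching virtual speakers, as described above.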
  • the coordinates of the target virtual speaker and the HOA coefficient of the target virtual speaker output by the virtual speaker selection unit are used as inputs to the virtual speaker signal generation unit.
  • the virtual speaker signal generation unit is configured to generate a virtual speaker signal according to the HOA signal to be encoded and the attribute information of the target virtual speaker. When the attribute information is position information, the HOA coefficients of the target virtual speaker are determined from its position information; when the attribute information includes the HOA coefficients, the HOA coefficients of the target virtual speaker are acquired directly from the attribute information.
  • the virtual speaker signal generation unit calculates the virtual speaker signal by using the HOA signal to be encoded and the HOA coefficient of the target virtual speaker.
  • the HOA coefficients of the target virtual speakers are arranged in a matrix A whose columns can be linearly combined to approximate the HOA signal to be encoded. The least-squares method then gives the theoretically optimal solution w, which is the virtual speaker signal, for example: $w = A^{-1}X$
  • A^{-1} denotes the inverse of matrix A. Because A is generally not square, this is understood as the (Moore-Penrose) pseudo-inverse, which yields the least-squares solution.
  • the size of matrix A is (M × C), where C is the number of target virtual speakers and M is the number of channels of the Nth-order HOA coefficients, M = (N+1)^2
  • a denotes the HOA coefficients of a target virtual speaker (an element of A)
  • X denotes the HOA signal to be encoded; the size of matrix X is (M × L), where L is the number of time-domain or frequency-domain samples
  • x denotes a coefficient of the HOA signal to be encoded (an element of X)
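A minimal NumPy sketch of the least-squares step follows. Because A is (M × C) and generally not square, `np.linalg.pinv` (the Moore-Penrose pseudo-inverse) stands in for A⁻¹. The matrix sizes and random data are made up for illustration, and `virtual_speaker_signals` is a hypothetical helper name.

```python
import numpy as np

def virtual_speaker_signals(A, X):
    """Least-squares virtual speaker signals W (C x L) such that A @ W approximates X (M x L)."""
    # pinv(A) equals inv(A) when A is square and invertible, and gives the
    # least-squares solution when A has more rows (channels) than columns (speakers).
    return np.linalg.pinv(A) @ X

# Example: order N = 3 gives M = (N + 1) ** 2 = 16 channels.
rng = np.random.default_rng(0)
N, C, L = 3, 4, 960
M = (N + 1) ** 2
A = rng.standard_normal((M, C))   # stand-in for the target speakers' HOA coefficients
W_true = rng.standard_normal((C, L))
X = A @ W_true                    # an HOA signal exactly representable by A
W = virtual_speaker_signals(A, X)
```

The same result can be obtained with `np.linalg.lstsq(A, X, rcond=None)[0]`, which avoids forming the pseudo-inverse explicitly.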
  • the virtual speaker signal output by the virtual speaker signal generation unit is used as the input of the core encoder processing unit.
  • the core encoder processing unit is used to perform core encoder processing on the virtual speaker signal to obtain a transmission code stream.
  • core encoder processing includes, but is not limited to, transformation, quantization, psychoacoustic modeling, and bitstream generation.
  • the processing may be applied to frequency-domain transmission channels or to time-domain transmission channels; this is not limited here.
  • the present application provides a method for determining a virtual speaker set.
  • the virtual speaker set determination method is based on the following presets:
  • the virtual speaker distribution table includes position information of K virtual speakers, where the position information includes a pitch angle index and a horizontal angle index, and K is a positive integer greater than 1.
  • the preset spherical surface may include X latitude circles and Y longitude circles (meridians). X and Y may be the same or different and are both positive integers; for example, X is 512, 768, or 1024, and Y is 512, 768, or 1024.
  • each virtual speaker is located at an intersection of the X latitude circles and the Y longitude circles. The larger X and Y are, the more candidate positions are available for virtual speakers, and the better the playback effect of the sound field formed by the finally selected virtual speakers.
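To make the grid concrete, the sketch below enumerates the intersections of X latitude circles and Y longitude circles. It assumes, hypothetically, two things:

- the X circles are evenly spaced in pitch from pole to pole;
- each pole collapses to a single point, because all meridians meet there.

Under that assumption, X = Y = 1024 gives 1022 × 1024 + 2 = 1,046,530 points. This matches the 1046530 intersection count given for Table 3 in this text, but the text does not say that this is how the count arises.

```python
def grid_points(num_lat, num_lon):
    """Return (horizontal_deg, pitch_deg) intersections of a latitude/longitude grid.

    Assumption: num_lat circles evenly spaced in pitch from -90 to +90 degrees
    (both poles included) and num_lon evenly spaced horizontal angles; each pole
    contributes a single point.
    """
    points = []
    for i in range(num_lat):
        pitch = -90.0 + 180.0 * i / (num_lat - 1)
        if i in (0, num_lat - 1):
            points.append((0.0, pitch))  # pole: one point only
            continue
        for j in range(num_lon):
            points.append((360.0 * j / num_lon, pitch))
    return points

def grid_size(num_lat, num_lon):
    """Closed form of len(grid_points(num_lat, num_lon)) under the same assumption."""
    return (num_lat - 2) * num_lon + 2
```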
  • Fig. 4a is an exemplary schematic diagram of the preset spherical surface of the present application.
  • the preset spherical surface includes L (L > 1) latitude regions, and the m-th latitude region includes T_m latitude circles.
  • among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the m_i-th latitude circle is α_m, where 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m.
  • when T_m > 1, the pitch angle difference between any two adjacent latitude circles in the m-th latitude region is α_m.
  • FIG. 4b is an exemplary schematic diagram of the pitch angle and the horizontal angle of the present application.
  • the angle between the line connecting the virtual speaker's position to the center of the sphere and a preset horizontal plane (for example, the plane of the equator, or the plane in which the south pole lies) is the pitch angle of the virtual speaker; the angle between the projection of that line onto the horizontal plane and a set initial direction is the horizontal angle of the virtual speaker.
  • the K virtual speakers are distributed on one or more latitude circles in each latitude region. The spacing between adjacent virtual speakers on the same latitude circle is expressed as a horizontal angle difference, and on a given latitude circle all horizontal angle differences between adjacent virtual speakers are equal.
  • within the m-th latitude region, the horizontal angle difference between adjacent virtual speakers on the m_i-th latitude circle and that on the (m_i+1)-th latitude circle are both α_m.
  • the spacing between latitude circles within a latitude region is expressed as a pitch angle difference, and the pitch angle difference between any two adjacent latitude circles in the region equals the horizontal angle difference between adjacent virtual speakers in that region.
  • α_n = α_m or α_n ≠ α_m, where α_n is the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, on any latitude circle in the n-th latitude region, and n ≠ m.
  • α_c < α_m (c ≠ m), where α_c is the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, on the c_i-th latitude circle, and the c_i-th latitude circle is any latitude circle in the latitude region, among the L latitude regions, that contains the equator.
  • that is, the horizontal angle difference between adjacent virtual speakers is smallest in the latitude region containing the equator, so among the L latitude regions the virtual speakers are most densely distributed there.
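A sketch of laying out candidate virtual speakers by latitude region follows. Within each region, the same angle α serves both as the horizontal spacing on every latitude circle and as the pitch spacing between circles, and the region containing the equator gets the smallest α.

The region boundaries and α values below are hypothetical. They were chosen only so that the equatorial region is densest; they are not the values behind Table 3.

```python
def layout_by_region(regions):
    """regions: list of (first_pitch_deg, num_circles, alpha_deg).

    Within each region, latitude circles are alpha degrees apart in pitch and
    adjacent speakers on every circle are alpha degrees apart in horizontal angle.
    Returns a list of (horizontal_deg, pitch_deg) positions.
    """
    speakers = []
    for first_pitch, num_circles, alpha in regions:
        per_circle = round(360 / alpha)  # assumes alpha divides 360
        for i in range(num_circles):
            pitch = first_pitch + i * alpha
            speakers.extend((j * alpha, pitch) for j in range(per_circle))
    return speakers

# Hypothetical regions: south (alpha 20), equatorial (alpha 10, densest), north (alpha 20).
regions = [(-60, 2, 20), (-20, 5, 10), (40, 2, 20)]
layout = layout_by_region(regions)
```

Here the equator carries 36 speakers spaced 10 degrees apart, while the circle at pitch 60 carries 18 spaced 20 degrees apart, 252 positions in total.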
  • the positions of the K virtual speakers in the virtual speaker distribution table may be represented by an index, and the indices may include a pitch angle index and a horizontal angle index.
  • because the pitch angle difference between adjacent virtual speakers along a longitude circle satisfies the requirements above, once the virtual speaker with a pitch angle of 0 is fixed, the pitch angles of the other virtual speakers can be derived.
  • using the conversion formula between pitch angle and pitch angle index, the pitch angle indices of all virtual speakers along the longitude circle can then be obtained. It should be noted that this application does not specifically limit which virtual speaker on the circle is assigned a pitch angle of 0; for example, it may be the virtual speaker on the equator, at the south pole, or at the north pole.
  • the pitch angle of the k-th of the K virtual speakers and its pitch angle index satisfy the following formula (the conversion formula between pitch angle and pitch angle index):
  • r_k represents the radius of the circle on which the k-th virtual speaker is located
  • round() represents rounding to the nearest integer
  • r_k represents the radius of the latitude circle on which the k-th virtual speaker is located
  • round() represents rounding to the nearest integer
  • FIG. 5a and 5b are exemplary distribution diagrams of K virtual speakers.
  • the horizontal angle difference between adjacent virtual speakers in the latitude region containing the equator is smaller than that in the other latitude regions, that is, α_c < α_m.
  • K virtual speakers are randomly and approximately uniformly distributed on the preset sphere.
  • SNR: signal-to-noise ratio
  • the file names from 1 to 12 are respectively a single-sound source voice signal, a single-sound source musical instrument signal, a two-sound source voice signal, and a two-sound source musical instrument signal.
  • FIG. 6a and 6b are exemplary distribution diagrams of K virtual speakers.
  • K virtual speakers are randomly and approximately uniformly distributed on the preset sphere.
  • this embodiment adopts 12 different types of test audio, and the file names from 1 to 12 are respectively a single-sound source voice signal, a single-sound source musical instrument signal, a two-sound source voice signal, and a two-sound source musical instrument signal.
  • Table 3 is an example of a virtual speaker distribution table.
  • K is 530; that is, Table 3 describes the specific distribution of 530 virtual speakers with serial numbers 0 to 529, and the position column gives the horizontal angle index and pitch angle index of the virtual speaker with each serial number.
  • in the position column, the number before the comma is the horizontal angle index, and the number after the comma is the pitch angle index.
  • the positions of the 530 virtual speakers in Table 3 are 530 of the 1046530 intersection points on the preset spherical surface.
  • the pitch angle indices in Table 3 are calculated with the pitch angle of the equator taken as 0; that is, apart from the equator, the pitch angles corresponding to the other pitch angle indices are measured relative to the equatorial plane.
  • the F virtual speakers satisfy the condition: the horizontal angle difference α_mi between adjacent virtual speakers, among the F virtual speakers, on the m_i-th latitude circle is greater than α_m, where the m_i-th latitude circle is one of the latitude circles in the m-th latitude region.
  • a virtual speaker among the K virtual speakers is referred to as a candidate virtual speaker
  • any virtual speaker among the F virtual speakers is referred to as a center virtual speaker (also referred to as a first-round virtual speaker). That is, for any latitude circle on the preset spherical surface, one or more virtual speakers can be selected as center virtual speakers from the candidate virtual speakers on that circle and added to the F virtual speakers. If multiple virtual speakers are selected, the horizontal angle difference α_mi between adjacent center virtual speakers is greater than the horizontal angle difference α_m between adjacent candidate virtual speakers, that is, α_mi > α_m.
  • the center virtual speakers are thus selected from the candidate virtual speakers and are distributed more sparsely than the candidates.
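One simple way to obtain center virtual speakers with α_mi = q × α_m is to keep every q-th candidate on a latitude circle. That relation is the one the text states for the F virtual speakers. The sketch below assumes the candidates on a circle are listed in order of horizontal angle and that q divides their count; both are illustrative assumptions, and `pick_centers_on_circle` is a hypothetical name.

```python
def pick_centers_on_circle(circle, q):
    """Keep every q-th candidate of one latitude circle (q > 1), so that adjacent
    centers are q times as far apart in horizontal angle as adjacent candidates."""
    if q <= 1:
        raise ValueError("q must be an integer greater than 1")
    return circle[::q]

# 36 candidates spaced 10 degrees apart on the equator (alpha_m = 10 degrees).
equator = [(10 * j, 0) for j in range(36)]
centers = pick_centers_on_circle(equator, 3)   # alpha_mi = 30 degrees
```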
  • Each of the F virtual speakers corresponds to S virtual speakers
  • a virtual speaker among the S virtual speakers is referred to as a target virtual speaker. That is, the S virtual speakers corresponding to any center virtual speaker satisfy the condition: they include that center virtual speaker and S-1 virtual speakers located around it, and every one of the S-1 correlations between those S-1 virtual speakers and the center virtual speaker is greater than all of the K-S correlations between the remaining K-S virtual speakers (the K virtual speakers excluding the S virtual speakers) and the center virtual speaker.
  • the S values of R_fk corresponding to the S virtual speakers are the S largest of the K values of R_fk corresponding to the K virtual speakers.
  • "the S largest" means that when the K values of R_fk are sorted in descending order, the first S values are taken.
  • R_fk represents the correlation between the center virtual speaker and the k-th of the K virtual speakers, and R_fk satisfies the following formula:
  • represents the horizontal angle of any one of the above virtual speakers
  • S target virtual speakers can thus be determined for each center virtual speaker. It should be understood that the F virtual speakers are preset from the K virtual speakers, so the position of each center virtual speaker can also be represented by a pitch angle index and a horizontal angle index. Likewise, the S virtual speakers corresponding to each center virtual speaker are also taken from the K virtual speakers, so the position of each target virtual speaker can also be represented by a pitch angle index and a horizontal angle index.
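The rule that each center speaker's S virtual speakers are those with the S largest R_fk can be sketched as a top-S search. The formula for R_fk is not reproduced in this text. The sketch therefore assumes, hypothetically, that R_fk is the absolute normalized inner product of the two speakers' HOA coefficient vectors; any other correlation measure can be substituted.

```python
import numpy as np

def neighborhood(center_idx, hoa, S):
    """Return indices of the S speakers with the largest correlation to speaker center_idx.

    hoa: (K, M) array, one HOA coefficient vector per candidate speaker.
    Correlation here (an assumption) is |<h_f, h_k>| / (||h_f|| * ||h_k||); the
    center itself scores 1, so it is normally the first entry.
    """
    norms = np.linalg.norm(hoa, axis=1)
    r = np.abs(hoa @ hoa[center_idx]) / (norms * norms[center_idx])
    order = np.argsort(-r, kind="stable")   # descending correlation, ties by index
    return order[:S].tolist()
```

Because the neighborhoods depend only on the fixed K candidate positions, a table of them could be computed offline once per center speaker.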
  • FIG. 7 is an exemplary flowchart of the method for determining a virtual speaker set of the present application.
  • the process 700 can be performed by the encoder 20 or the decoder 30 in the above embodiment. That is, the encoder 20 in the audio sending device performs audio encoding and sends the bitstream to the audio receiving device, and the decoder 30 in the audio receiving device decodes the bitstream to obtain a target audio frame and then renders, based on the target audio frame, the sound field audio signal corresponding to one or more virtual speakers.
  • process 700 is described as a series of steps or operations; it should be understood that process 700 may be performed in various orders and/or concurrently and is not limited to the execution order shown in FIG. 7. As shown in FIG. 7, the method includes:
  • Step 701 Determine a target virtual speaker from preset F virtual speakers according to the audio signal to be processed.
  • encoding analysis is performed on the audio signal to be processed, for example, analyzing its sound field distribution, including features such as the number of sound sources, directivity, and diffuseness. This yields the HOA coefficients of the audio signal, which serve as one of the criteria for deciding how to select the target virtual speaker.
  • based on the HOA coefficients of the audio signal to be processed and the HOA coefficients of the candidate virtual speakers (that is, the F virtual speakers above), a virtual speaker matching the audio signal can be selected; in this application, this virtual speaker is referred to as the target virtual speaker.
  • the HOA coefficients of the audio signal can be obtained first, followed by the F groups of HOA coefficients corresponding to the F virtual speakers (one group per virtual speaker). The virtual speaker corresponding to the group, among the F groups, with the greatest correlation with the HOA coefficients of the audio signal is then determined as the target virtual speaker.
  • for example, the inner product of each of the F virtual speakers' HOA coefficients with the HOA coefficients of the audio signal may be computed, and the virtual speaker with the largest absolute inner product is selected as the target virtual speaker. Each of the F groups of HOA coefficients contains (N+1)^2 coefficients, as do the HOA coefficients of the audio signal, where N is the order of the audio signal. The coefficients of the audio signal therefore correspond one to one with those of each group, and taking the inner product of the audio signal's HOA coefficients with each group gives the correlation between the audio signal and each of the F groups. It should be noted that other methods may also be used to determine the target virtual speaker; this is not specifically limited in this application.
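Step 701 can be sketched as an argmax over absolute inner products. The shapes below are assumptions for illustration: the F groups of (N+1)² coefficients are stored as rows, and the signal has one HOA coefficient vector. A real encoder would work frame by frame and, as the text notes, may use another criterion.

```python
import numpy as np

def select_target(signal_hoa, center_hoa):
    """Return the index of the center virtual speaker whose HOA coefficients have
    the largest absolute inner product with the signal's HOA coefficients.

    signal_hoa: shape ((N+1)**2,); center_hoa: shape (F, (N+1)**2).
    """
    signal_hoa = np.asarray(signal_hoa, dtype=float)
    center_hoa = np.asarray(center_hoa, dtype=float)
    if center_hoa.shape[1] != signal_hoa.shape[0]:
        raise ValueError("both sides need (N+1)**2 coefficients")
    return int(np.argmax(np.abs(center_hoa @ signal_hoa)))
```

Taking the absolute value means that a speaker whose coefficients are anti-correlated with the signal can still be selected, matching the "largest absolute inner product" rule.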
  • Step 702 Acquire respective position information of the S virtual speakers corresponding to the target virtual speaker from a preset virtual speaker distribution table, where the position information includes a pitch angle index and a horizontal angle index.
  • from the target virtual speaker (that is, a center virtual speaker), the S virtual speakers corresponding to it can be obtained, together with their position information, which is represented by a pitch angle index and a horizontal angle index.
  • the target virtual speaker is the center virtual speaker with the highest correlation with the HOA coefficients of the audio signal to be processed.
  • the S virtual speakers corresponding to each center virtual speaker are the S virtual speakers with the highest correlation with that center virtual speaker's HOA coefficients. Therefore, the S virtual speakers corresponding to the target virtual speaker are also the S virtual speakers with the highest correlation with the HOA coefficients of the audio signal to be processed.
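Step 702 is then a table lookup. In the sketch below, two hypothetical dictionaries stand in for the preset data:

- the distribution table maps each speaker serial number to a (horizontal angle index, pitch angle index) pair, mirroring the position column of Table 3;
- a second dictionary maps each center speaker to its S speakers.

The entries are invented placeholders, not values from Table 3.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (horizontal angle index, pitch angle index)

def neighbor_positions(target: int,
                       neighborhoods: Dict[int, List[int]],
                       table: Dict[int, Position]) -> List[Position]:
    """Look up the positions of the S virtual speakers associated with the target."""
    return [table[k] for k in neighborhoods[target]]

# Invented example data (S = 3): serial number -> position, center -> its S speakers.
table = {0: (0, 0), 1: (10, 0), 2: (0, 5), 3: (500, 0)}
neighborhoods = {0: [0, 1, 2]}
positions = neighbor_positions(0, neighborhoods, table)
```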
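  • a virtual speaker distribution table is preset so that deploying virtual speakers according to the table yields a higher average signal-to-noise ratio (SNR) of the reconstructed HOA signal. On the basis of this distribution, selecting the S virtual speakers with the highest correlation with the HOA coefficients of the audio signal to be processed achieves the best sampling effect and thus improves the playback effect of the audio signal.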
  • FIG. 8 is an exemplary structural diagram of an apparatus for determining a virtual speaker set of the present application.
  • the apparatus may be applied to the encoder 20 or the decoder 30 in the above-mentioned embodiment.
  • the apparatus for determining a virtual speaker set in this embodiment may include a determining module 801 and an acquiring module 802. The determining module 801 is configured to determine a target virtual speaker from the F preset virtual speakers according to the audio signal to be processed, where each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1. The acquiring module 802 is configured to acquire, from a preset virtual speaker distribution table, the position information of each of the S virtual speakers corresponding to the target virtual speaker, where the virtual speaker distribution table includes the position information of K virtual speakers, the position information includes a pitch angle index and a horizontal angle index, K is a positive integer greater than 1, F ≤ K, and F × S ≥ K.
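The constraints F ≤ K and F × S ≥ K can be checked mechanically. One way to read F × S ≥ K, which is an interpretation not stated in the text, is as a necessary condition for the F neighborhoods of S speakers to cover all K candidates. The hypothetical helper below checks both the counts and actual coverage.

```python
def check_sets(K, neighborhoods):
    """neighborhoods: dict mapping each of the F centers to its list of S speaker indices.

    Returns (counts_ok, covers_all): whether F <= K, S > 1 and F * S >= K hold,
    and whether the union of the neighborhoods contains every index 0..K-1.
    """
    F = len(neighborhoods)
    sizes = {len(v) for v in neighborhoods.values()}
    if len(sizes) != 1:
        raise ValueError("every center must have the same number S of speakers")
    S = sizes.pop()
    counts_ok = F <= K and S > 1 and F * S >= K
    covered = set().union(*neighborhoods.values())
    return counts_ok, covered == set(range(K))
```

Satisfying the counts does not by itself guarantee coverage. In the second test case, the counts hold but speaker 5 belongs to no neighborhood.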
  • the determining module 801 is specifically configured to: acquire higher order ambisonics (HOA) coefficients of the audio signal; acquire F groups of HOA coefficients corresponding to the F virtual speakers, where the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and determine, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups, with the greatest correlation with the HOA coefficients of the audio signal.
  • the S virtual speakers corresponding to the target virtual speaker satisfy the following condition: the S virtual speakers include the target virtual speaker and S-1 virtual speakers located around it, and every one of the S-1 correlations between the S-1 virtual speakers and the target virtual speaker is greater than all of the K-S correlations between the other K-S virtual speakers (the K virtual speakers excluding the S virtual speakers) and the target virtual speaker.
  • the K virtual speakers satisfy the following conditions: the K virtual speakers are distributed on a preset spherical surface; the preset spherical surface includes L latitude regions, L > 1; the m-th of the L latitude regions includes T_m latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the m_i-th latitude circle is α_m, where 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m; and when T_m > 1, the pitch angle difference between any two adjacent latitude circles in the m-th latitude region is α_m.
  • the n-th of the L latitude regions includes T_n latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the n_i-th latitude circle is α_n, where 1 ≤ n ≤ L, T_n is a positive integer, and 1 ≤ n_i ≤ T_n; when T_n > 1, the pitch angle difference between any two adjacent latitude circles in the n-th latitude region is α_n; and α_n = α_m or α_n ≠ α_m, where n ≠ m.
  • the c-th of the L latitude regions includes T_c latitude circles, one of which is the equator; among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the c_i-th latitude circle is α_c, where 1 ≤ c ≤ L, T_c is a positive integer, and 1 ≤ c_i ≤ T_c; when T_c > 1, the pitch angle difference between any two adjacent latitude circles in the c-th latitude region is α_c; and α_c < α_m, where c ≠ m.
  • the F virtual speakers satisfy the following condition: among the F virtual speakers, the horizontal angle difference α_mi between adjacent virtual speakers on the m_i-th latitude circle is greater than α_m.
  • α_mi = q × α_m, where q is a positive integer greater than 1.
  • the correlation R_fk between the k-th of the K virtual speakers and the target virtual speaker satisfies the following formula:
  • θ represents the horizontal angle of the target virtual speaker, and the remaining symbols in the formula represent the pitch angle of the target virtual speaker, the HOA coefficients of the target virtual speaker, and the HOA coefficients of the k-th of the K virtual speakers.
  • the apparatus in this embodiment can be used to execute the technical solution of the method embodiment shown in FIG. 7 , and the implementation principle and technical effect thereof are similar, and are not repeated here.
  • each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in the present application can be directly embodied as executed by a hardware encoding processor, or executed by a combination of hardware and software modules in the encoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • the RAM may take forms such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • in essence, the technical solution of the present application, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

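The present application provides a method and apparatus for determining a virtual speaker set. The method includes: determining a target virtual speaker from F preset virtual speakers according to an audio signal to be processed, where each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1; and acquiring, from a preset virtual speaker distribution table, position information of each of the S virtual speakers corresponding to the target virtual speaker, where the virtual speaker distribution table includes position information of K virtual speakers, the position information includes a pitch angle index and a horizontal angle index, K is a positive integer greater than 1, F ≤ K, and F × S ≥ K. The present application can improve the playback effect of audio signals.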

Description

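Method and apparatus for determining a virtual speaker set
This application claims priority to Chinese Patent Application No. 202110247466.1, filed with the Chinese Patent Office on March 5, 2021 and entitled "Method and Apparatus for Determining Virtual Speaker Set", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of audio technologies, and in particular, to a method and apparatus for determining a virtual speaker set.
Background
Three-dimensional audio technology uses computers, signal processing, and similar means to acquire, process, transmit, and render for playback the sound events and three-dimensional sound field information of the real world. Three-dimensional audio gives sound a strong sense of space, envelopment, and immersion, giving listeners the auditory experience of "being there in person". The current mainstream three-dimensional audio technology is higher order ambisonics (HOA). HOA recording and encoding are independent of the speaker layout used at playback, and HOA-format data can be rotated; both properties give HOA greater flexibility in three-dimensional audio playback, so it has received wide attention and research.
HOA technology can convert an HOA signal into virtual speaker signals, which are then mapped to binaural signals for playback. In this process, a uniform distribution of virtual speakers achieves the best sampling effect, for example, placing virtual speakers at the vertices of a regular tetrahedron. However, three-dimensional space has only five regular polyhedra: the regular tetrahedron, regular hexahedron (cube), regular octahedron, regular dodecahedron, and regular icosahedron. The number of virtual speakers that can be arranged this way is therefore limited, and the approach cannot be applied to distributions with larger numbers of virtual speakers.
Summary
The present application provides a method and apparatus for determining a virtual speaker set, to improve the playback effect of audio signals.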
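According to a first aspect, the present application provides a method for determining a virtual speaker set, including: determining a target virtual speaker from F preset virtual speakers according to an audio signal to be processed, where each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1; and acquiring, from a preset virtual speaker distribution table, position information of each of the S virtual speakers corresponding to the target virtual speaker, where the virtual speaker distribution table includes position information of K virtual speakers, the position information includes a pitch angle index and a horizontal angle index, K is a positive integer greater than 1, F ≤ K, and F × S ≥ K.
In the present application, a virtual speaker distribution table is preset so that deploying virtual speakers according to the table yields a higher average signal-to-noise ratio (SNR) of the reconstructed HOA signal. On the basis of this distribution, selecting the S virtual speakers with the highest correlation with the HOA coefficients of the audio signal to be processed achieves the optimal sampling effect and thus improves the playback effect of the audio signal.
In a possible implementation, the determining a target virtual speaker from F preset virtual speakers according to an audio signal to be processed includes: acquiring higher order ambisonics (HOA) coefficients of the audio signal; acquiring F groups of HOA coefficients corresponding to the F virtual speakers, where the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and determining, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups, that has the greatest correlation with the HOA coefficients of the audio signal.
Encoding analysis is performed on the audio signal to be processed, for example, analyzing its sound field distribution, including features such as the number of sound sources, directivity, and diffuseness. This yields the HOA coefficients of the audio signal, which serve as one of the criteria for deciding how to select the target virtual speaker. Based on the HOA coefficients of the audio signal to be processed and the HOA coefficients of the candidate virtual speakers (that is, the F virtual speakers above), a virtual speaker matching the audio signal can be selected; in this application, this virtual speaker is referred to as the target virtual speaker. The inner product of each of the F virtual speakers' HOA coefficients with the HOA coefficients of the audio signal may be computed, and the virtual speaker with the largest absolute inner product is selected as the target virtual speaker. It should be noted that other methods may also be used to determine the target virtual speaker; this is not specifically limited in the present application.
In a possible implementation, the S virtual speakers corresponding to the target virtual speaker satisfy the following condition: the S virtual speakers include the target virtual speaker and S-1 virtual speakers located around it, and every one of the S-1 correlations between the S-1 virtual speakers and the target virtual speaker is greater than all of the K-S correlations between the other K-S virtual speakers (the K virtual speakers excluding the S virtual speakers) and the target virtual speaker.
When the target virtual speaker is determined, it is the center virtual speaker with the highest correlation with the HOA coefficients of the audio signal to be processed. The S virtual speakers corresponding to each center virtual speaker are the S virtual speakers with the highest correlation with that center virtual speaker's HOA coefficients. Therefore, the S virtual speakers corresponding to the target virtual speaker are also the S virtual speakers with the highest correlation with the HOA coefficients of the audio signal to be processed.
In a possible implementation, the K virtual speakers satisfy the following conditions: the K virtual speakers are distributed on a preset spherical surface; the preset spherical surface includes L latitude regions, L > 1; the m-th of the L latitude regions includes T_m latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the m_i-th latitude circle is α_m, where 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m; and when T_m > 1, the pitch angle difference between any two adjacent latitude circles in the m-th latitude region is α_m.
In a possible implementation, the n-th of the L latitude regions includes T_n latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the n_i-th latitude circle is α_n, where 1 ≤ n ≤ L, T_n is a positive integer, and 1 ≤ n_i ≤ T_n; when T_n > 1, the pitch angle difference between any two adjacent latitude circles in the n-th latitude region is α_n; and α_n = α_m or α_n ≠ α_m, where n ≠ m.
In a possible implementation, the c-th of the L latitude regions includes T_c latitude circles, one of which is the equator; among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the c_i-th latitude circle is α_c, where 1 ≤ c ≤ L, T_c is a positive integer, and 1 ≤ c_i ≤ T_c; when T_c > 1, the pitch angle difference between any two adjacent latitude circles in the c-th latitude region is α_c; and α_c < α_m, where c ≠ m.
In a possible implementation, the F virtual speakers satisfy the following condition: among the F virtual speakers, the horizontal angle difference α_mi between adjacent virtual speakers on the m_i-th latitude circle is greater than α_m.
In a possible implementation, α_mi = q × α_m, where q is a positive integer greater than 1.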
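In a possible implementation, the correlation R_fk between the k-th of the K virtual speakers and the target virtual speaker satisfies the following formula:
[Formula: published as image PCTCN2022078824-appb-000001; not reproduced in the text]
where θ denotes the horizontal angle of the target virtual speaker, and the remaining symbols (published as images appb-000002 to appb-000004) denote, respectively, the pitch angle of the target virtual speaker, the HOA coefficients of the target virtual speaker, and the HOA coefficients of the k-th of the K virtual speakers.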
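According to a second aspect, the present application provides an apparatus for determining a virtual speaker set, including: a determining module, configured to determine a target virtual speaker from F preset virtual speakers according to an audio signal to be processed, where each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1; and an acquiring module, configured to acquire, from a preset virtual speaker distribution table, position information of each of the S virtual speakers corresponding to the target virtual speaker, where the virtual speaker distribution table includes position information of K virtual speakers, the position information includes a pitch angle index and a horizontal angle index, K is a positive integer greater than 1, F ≤ K, and F × S ≥ K.
In a possible implementation, the determining module is specifically configured to: acquire higher order ambisonics (HOA) coefficients of the audio signal; acquire F groups of HOA coefficients corresponding to the F virtual speakers, where the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and determine, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups, that has the greatest correlation with the HOA coefficients of the audio signal.
In a possible implementation, the S virtual speakers corresponding to the target virtual speaker satisfy the following condition: the S virtual speakers include the target virtual speaker and S-1 virtual speakers located around it, and every one of the S-1 correlations between the S-1 virtual speakers and the target virtual speaker is greater than all of the K-S correlations between the other K-S virtual speakers (the K virtual speakers excluding the S virtual speakers) and the target virtual speaker.
In a possible implementation, the K virtual speakers satisfy the following conditions: the K virtual speakers are distributed on a preset spherical surface; the preset spherical surface includes L latitude regions, L > 1; the m-th of the L latitude regions includes T_m latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the m_i-th latitude circle is α_m, where 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m; and when T_m > 1, the pitch angle difference between any two adjacent latitude circles in the m-th latitude region is α_m.
In a possible implementation, the n-th of the L latitude regions includes T_n latitude circles, and among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the n_i-th latitude circle is α_n, where 1 ≤ n ≤ L, T_n is a positive integer, and 1 ≤ n_i ≤ T_n; when T_n > 1, the pitch angle difference between any two adjacent latitude circles in the n-th latitude region is α_n; and α_n = α_m or α_n ≠ α_m, where n ≠ m.
In a possible implementation, the c-th of the L latitude regions includes T_c latitude circles, one of which is the equator; among the K virtual speakers, the horizontal angle difference between adjacent virtual speakers on the c_i-th latitude circle is α_c, where 1 ≤ c ≤ L, T_c is a positive integer, and 1 ≤ c_i ≤ T_c; when T_c > 1, the pitch angle difference between any two adjacent latitude circles in the c-th latitude region is α_c; and α_c < α_m, where c ≠ m.
In a possible implementation, the F virtual speakers satisfy the following condition: among the F virtual speakers, the horizontal angle difference α_mi between adjacent virtual speakers on the m_i-th latitude circle is greater than α_m.
In a possible implementation, α_mi = q × α_m, where q is a positive integer greater than 1.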
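In a possible implementation, the correlation R_fk between the k-th virtual speaker of the K virtual speakers and the target virtual speaker satisfies the following formula:
[Formula image: PCTCN2022078824-appb-000005]
where θ denotes the horizontal angle of the target virtual speaker, [image PCTCN2022078824-appb-000006] denotes the elevation angle of the target virtual speaker, [image PCTCN2022078824-appb-000007] denotes the HOA coefficients of the target virtual speaker, and [image PCTCN2022078824-appb-000008] denotes the HOA coefficients of the k-th virtual speaker of the K virtual speakers.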
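According to a third aspect, this application provides an audio processing device, including: one or more processors; and a memory, configured to store one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors are enabled to implement the method according to any one of the implementations of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium, including a computer program. When the computer program is executed on a computer, the computer is enabled to perform the method according to any one of the implementations of the first aspect.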
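Brief Description of Drawings
FIG. 1 is an example structural diagram of an audio playback system according to this application;
FIG. 2 is an example structural diagram of an audio coding system 10 according to this application;
FIG. 3 is an example structural diagram of an HOA encoding apparatus according to this application;
FIG. 4a is an example schematic diagram of a preset sphere according to this application;
FIG. 4b is an example schematic diagram of the elevation angle and the horizontal angle according to this application;
FIG. 5a and FIG. 5b are example distribution diagrams of K virtual speakers;
FIG. 6a and FIG. 6b are example distribution diagrams of K virtual speakers;
FIG. 7 is an example flowchart of a method for determining a virtual speaker set according to this application;
FIG. 8 is an example structural diagram of an apparatus for determining a virtual speaker set according to this application.
Detailed Description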
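To make the objectives, technical solutions, and advantages of this application clearer, the following clearly and completely describes the technical solutions in this application with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.
In the specification, embodiments, claims, and accompanying drawings of this application, the terms "first", "second", and the like are used only for distinction and description, and shall not be understood as indicating or implying relative importance or order. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion: for example, a method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may indicate a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be singular or plural. Two values connected by the character "~" generally denote a value range that includes both values.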
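Terms used in this application are explained as follows:
Audio frame: audio data is streamed. In practice, to facilitate audio processing and transmission, the audio data within a certain duration is usually taken as one audio frame. This duration is referred to as the "sampling time", and its value can be determined according to the requirements of the codec and the specific application; for example, the duration is 2.5 ms to 60 ms, where ms denotes milliseconds.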
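As a quick illustration of the frame-length arithmetic, the number of samples in one frame is the frame duration multiplied by the sampling rate. This sketch assumes a 48 kHz sampling rate, which the text does not specify; the helper name `samples_per_frame` is hypothetical.

```python
def samples_per_frame(frame_ms: float, sample_rate_hz: int) -> int:
    """Number of samples in one audio frame of duration frame_ms."""
    samples = frame_ms * sample_rate_hz / 1000.0
    # Frame durations are normally chosen so that this is a whole number.
    if abs(samples - round(samples)) > 1e-9:
        raise ValueError("frame duration does not span a whole number of samples")
    return int(round(samples))


# Assumed 48 kHz sampling rate; 2.5 ms and 60 ms are the ends of the example range.
for ms in (2.5, 20, 60):
    print(ms, "ms ->", samples_per_frame(ms, 48000), "samples")  # 120, 960, 2880
```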
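Audio signal: an audio signal is a carrier of the frequency and amplitude variation information of regular sound waves carrying speech, music, and sound effects. Audio is a continuously varying analog signal that can be represented by a continuous curve called a sound wave. A digital signal produced from audio by analog-to-digital conversion, or generated by a computer, is an audio signal. A sound wave has three important parameters, frequency, amplitude, and phase, which determine the characteristics of the audio signal.
The following describes the system architecture to which this application applies.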
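FIG. 1 is an example structural diagram of an audio playback system according to this application. As shown in FIG. 1, the audio playback system includes an audio sending device and an audio receiving device. The audio sending device is a device that can perform audio encoding and send an audio bitstream, for example, a mobile phone, a computer (a laptop, a desktop computer, or the like), or a tablet (a handheld tablet, an in-vehicle tablet, or the like). The audio receiving device is a device that can receive an audio bitstream, decode it, and play the result, for example, true wireless stereo (TWS) earbuds, ordinary wireless earphones, a speaker, a smart watch, or smart glasses.
A Bluetooth connection may be established between the audio sending device and the audio receiving device, and the connection may support transmission of speech and music. Common examples of an audio sending device and audio receiving device pair are a mobile phone with TWS earbuds, wireless over-ear headphones, or wireless neckband earphones, or a mobile phone with another terminal device (for example, a smart speaker, a smart watch, smart glasses, or an in-vehicle speaker). Optionally, the pair may also be a tablet, a laptop, or a desktop computer with TWS earbuds, wireless over-ear headphones, wireless neckband earphones, or another terminal device (for example, a smart speaker, a smart watch, smart glasses, or an in-vehicle speaker).
It should be noted that, in addition to Bluetooth, the audio sending device and the audio receiving device may be connected in other ways, for example, over Wi-Fi, a wired connection, or another wireless connection. This is not specifically limited in this application.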
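FIG. 2 is an example structural diagram of an audio coding system 10 according to this application. As shown in FIG. 2, the audio coding system 10 may include a source device 12 and a destination device 14. The source device 12 may be the audio sending device in FIG. 1, and the destination device 14 may be the audio receiving device in FIG. 1. The source device 12 generates encoded bitstream information and may therefore also be referred to as an audio encoding device. The destination device 14 can decode the encoded bitstream information generated by the source device 12 and may therefore also be referred to as an audio decoding device. In this application, the source device 12 and the audio encoding device may be collectively referred to as the audio sending device, and the destination device 14 and the audio decoding device may be collectively referred to as the audio receiving device.
The source device 12 includes an encoder 20 and, optionally, an audio source 16, an audio preprocessor 18, and a communication interface 22.
The audio source 16 may include or be any type of audio capture device, for example, a device for capturing real-world sound, and/or any type of audio generation device, for example, a computer audio processor, or any type of device for obtaining and/or providing real-world audio or computer-animated audio (for example, screen content or audio in virtual reality (VR)), and/or any combination thereof (for example, audio in augmented reality (AR), mixed reality (MR), and/or extended reality (XR)). The audio source 16 may be a microphone for capturing audio or a memory for storing audio, and may further include any type of (internal or external) interface for storing previously captured or generated audio and/or for obtaining or receiving audio. When the audio source 16 is a microphone, it may be, for example, a local audio collection apparatus or one integrated in the source device; when the audio source 16 is a memory, it may be, for example, a local memory or a memory integrated in the source device. When the audio source 16 includes an interface, the interface may be, for example, an external interface that receives audio from an external audio source, such as an external audio capture device (for example, a microphone), an external memory, or an external audio generation device (for example, an external computer audio processor, a computer, or a server). The interface may be any type of interface conforming to any proprietary or standardized interface protocol, for example, a wired or wireless interface or an optical interface.
In this application, the audio source 16 obtains a current scene audio signal, which is an audio signal obtained by capturing the sound field at the position of a microphone in space; the current scene audio signal may also be referred to as an original scene audio signal. For example, the current scene audio signal may be an audio signal obtained by using higher order ambisonics (HOA) technology. The audio source 16 obtains the HOA signal to be encoded; for example, the HOA signal may be captured by an actual capture device or synthesized from artificial audio objects. Optionally, the HOA signal to be encoded may be a time-domain HOA signal or a frequency-domain HOA signal.
The audio preprocessor 18 is configured to receive an original audio signal and preprocess it to obtain a preprocessed audio signal. For example, the preprocessing performed by the audio preprocessor 18 may include trimming or denoising.
The encoder 20 is configured to receive the preprocessed audio signal and process it to provide encoded bitstream information.
The communication interface 22 in the source device 12 may be configured to receive the bitstream information and send it to the destination device 14 over a communication channel 13. The communication channel 13 is, for example, a direct wired or wireless connection, or any type of network such as a wired network, a wireless network, or any combination thereof, or any type of private or public network, or any combination thereof.
The destination device 14 includes a decoder 30 and, optionally, a communication interface 28, an audio postprocessor 32, and a playback device 34.
The communication interface 28 in the destination device 14 is configured to receive the bitstream information directly from the source device 12 and provide it to the decoder 30. The communication interface 22 and the communication interface 28 may be configured to send or receive the bitstream information over the communication channel 13 between the source device 12 and the destination device 14.
Each of the communication interface 22 and the communication interface 28 may be configured as a unidirectional communication interface, as indicated by the arrow of the communication channel 13 pointing from the source device 12 to the destination device 14 in FIG. 2, or as a bidirectional communication interface, and may be configured to send and receive messages and the like, to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as encoded audio data.
The decoder 30 is configured to receive the bitstream information and decode it to obtain decoded audio data.
The audio postprocessor 32 is configured to postprocess the decoded audio data to obtain postprocessed audio data. The postprocessing performed by the audio postprocessor 32 may include, for example, trimming or resampling.
The playback device 34 is configured to receive the postprocessed audio data and play the audio to a user or listener. The playback device 34 may be or include any type of player for playing the reconstructed audio, for example, an integrated or external loudspeaker. For example, the loudspeaker may include a horn, a sound box, or the like.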
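FIG. 3 is an example structural diagram of an HOA encoding apparatus according to this application. As shown in FIG. 3, the HOA encoding apparatus may be applied to the encoder 20 of the audio coding system 10. The HOA encoding apparatus includes a virtual speaker configuration unit, an encoding analysis unit, a virtual speaker set generation unit, a virtual speaker selection unit, a virtual speaker signal generation unit, and a core encoder processing unit, described as follows.
The virtual speaker configuration unit is configured to configure virtual speakers according to encoder configuration information, to obtain virtual speaker configuration parameters. The encoder configuration information includes but is not limited to the HOA order, the encoding bit rate, and user-defined information; the virtual speaker configuration parameters include but are not limited to the number of virtual speakers and the HOA order of the virtual speakers.
The virtual speaker configuration parameters output by the virtual speaker configuration unit serve as input to the virtual speaker set generation unit.
The encoding analysis unit is configured to perform encoding analysis on the HOA signal to be encoded, for example, to analyze its sound field distribution, including characteristics such as the number of sound sources, directivity, and diffuseness, as one of the criteria for deciding how to select the target virtual speaker.
Without limitation, in this application the HOA encoding apparatus may also omit the encoding analysis unit; that is, the apparatus may skip analyzing the input signal and use a default configuration to decide how to select the target virtual speaker.
The HOA encoding apparatus obtains the HOA signal to be encoded. For example, an HOA signal recorded by an actual capture device or an HOA signal synthesized from artificial audio objects may be used as the input of the encoder, and the HOA signal input to the encoder may be a time-domain or frequency-domain HOA signal.
The virtual speaker set generation unit is configured to generate a virtual speaker set, which may include a plurality of virtual speakers; the virtual speakers in the set may also be referred to as "candidate virtual speakers".
The virtual speaker set generation unit generates the HOA coefficients of the specified candidate virtual speakers, using the coordinates (that is, position information) and the HOA order of the candidate virtual speakers provided by the virtual speaker configuration unit. Methods for determining the coordinates of the candidate virtual speakers include but are not limited to generating K virtual speakers according to an equidistant rule and generating K non-uniformly distributed candidate virtual speakers according to auditory perception principles. For example, coordinates of uniformly distributed candidate virtual speakers may be generated according to the number of candidate virtual speakers.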
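Next, the HOA coefficients of the virtual speakers are generated.
For a sound wave propagating in an ideal medium, the wavenumber is k = ω/c, where ω = 2πf is the angular frequency, f is the frequency of the sound wave, and c is the speed of sound. The sound pressure p then satisfies formula (1):
$$\nabla^{2}p + k^{2}p = 0 \tag{1}$$
where ∇² is the Laplace operator.
Solving formula (1) in spherical coordinates gives the sound pressure p as formula (2):
$$p(r,\theta,\varphi,k) = s\sum_{m=0}^{\infty}(2m+1)\,j^{m}\,j_{m}(kr)\sum_{0\le n\le m,\;\sigma=\pm 1}Y_{m,n}^{\sigma}(\theta_{s},\varphi_{s})\,Y_{m,n}^{\sigma}(\theta,\varphi) \tag{2}$$
where r is the radius of the sphere, θ is the horizontal angle (azimuth), φ is the elevation angle, k is the wavenumber, s is the amplitude of the ideal plane wave, m is the HOA order index, j_m(kr) is the spherical Bessel function (also called the radial basis function), the first j is the imaginary unit, the factor (2m+1) j^m j_m(kr) does not vary with angle, Y_{m,n}^σ(θ, φ) is the spherical harmonic corresponding to θ and φ, and Y_{m,n}^σ(θ_s, φ_s) is the spherical harmonic in the direction of the sound source.
The ambisonics coefficients are:
$$B_{m,n}^{\sigma} = s\,Y_{m,n}^{\sigma}(\theta_{s},\varphi_{s}) \tag{3}$$
Substituting formula (3) into formula (2) gives the general expansion of the sound pressure p, formula (4):
$$p(r,\theta,\varphi,k) = \sum_{m=0}^{\infty}(2m+1)\,j^{m}\,j_{m}(kr)\sum_{0\le n\le m,\;\sigma=\pm 1}B_{m,n}^{\sigma}\,Y_{m,n}^{\sigma}(\theta,\varphi) \tag{4}$$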
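Formula (4) shows that the sound field can be expanded over the sphere in spherical harmonics, with the expansion expressed through the ambisonics coefficients.
Conversely, the sound field can be reconstructed from known ambisonics coefficients. Truncating formula (4) at the N-th order and using the ambisonics coefficients as an approximate description of the sound field yields the N-th order HOA coefficients (also called ambisonics coefficients). N-th order ambisonics coefficients have (N+1)² channels in total. Optionally, the HOA order may range from 2 to 10. Superimposing the spherical harmonics weighted by the coefficients of one sampling point of the HOA signal reconstructs the spatial sound field at the instant corresponding to that sampling point. The HOA coefficients of a virtual speaker can be generated on the same principle: setting θ_s and φ_s in formula (3) to the position information of the virtual speaker, that is, its horizontal angle and elevation angle, gives the HOA coefficients (ambisonics coefficients) of that virtual speaker through formula (3). For example, for a 3rd-order HOA signal and assuming s = 1, the 16 channels of HOA coefficients are obtained from the spherical harmonics Y_{m,n}^σ(θ_s, φ_s); the formulas for the 16 channels of a 3rd-order HOA signal are listed in Table 1.
Table 1
[Table images: PCTCN2022078824-appb-000023, PCTCN2022078824-appb-000024]
In Table 1, θ is the horizontal angle of the virtual speaker's position on the preset sphere, φ is the elevation angle of that position, l is the HOA order, l = 0, 1, …, N, and m is the direction parameter within each order, m = -l, …, l. Using the polar-coordinate expressions in Table 1, the 16 channels of 3rd-order HOA coefficients of a virtual speaker can be computed from its position information.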
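A minimal sketch of computing a virtual speaker's HOA coefficients from its horizontal angle and elevation angle, in the spirit of formula (3) with s = 1. Table 1 is only available as an image, so this sketch assumes a common convention rather than reproducing it: real spherical harmonics with SN3D normalization, no Condon-Shortley phase, ACN channel ordering, and elevation measured from the equatorial plane (so the Legendre argument is the sine of the elevation). Angles are in radians; the function names are hypothetical.

```python
import math


def assoc_legendre(l: int, m: int, x: float) -> float:
    """Associated Legendre function P_l^m(x) for 0 <= m <= l, without the
    Condon-Shortley phase (-1)^m."""
    # P_m^m(x) = (2m-1)!! * (1 - x^2)^(m/2)
    pmm = 1.0
    somx2 = math.sqrt(max(0.0, 1.0 - x * x))
    for i in range(1, m + 1):
        pmm *= (2 * i - 1) * somx2
    if l == m:
        return pmm
    # P_{m+1}^m(x) = x (2m+1) P_m^m(x)
    pmmp1 = x * (2 * m + 1) * pmm
    if l == m + 1:
        return pmmp1
    # Upward recurrence: (l-m) P_l^m = (2l-1) x P_{l-1}^m - (l+m-1) P_{l-2}^m
    for ll in range(m + 2, l + 1):
        pll = ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1


def hoa_coefficients(azimuth: float, elevation: float, order: int) -> list[float]:
    """(order+1)**2 real SN3D spherical harmonics in ACN order for one direction.

    azimuth = horizontal angle, elevation measured from the equatorial plane,
    both in radians; plane-wave amplitude s = 1 (assumed convention).
    """
    x = math.sin(elevation)
    coeffs = []
    for l in range(order + 1):
        for m in range(-l, l + 1):  # ACN channel index = l*l + l + m
            am = abs(m)
            norm = math.sqrt((1.0 if am == 0 else 2.0)
                             * math.factorial(l - am) / math.factorial(l + am))
            trig = math.cos(am * azimuth) if m >= 0 else math.sin(am * azimuth)
            coeffs.append(norm * assoc_legendre(l, am, x) * trig)
    return coeffs


h = hoa_coefficients(math.radians(30), math.radians(10), order=3)
print(len(h))  # 16 channels for 3rd order
```

With this convention the first-order channels are sin θ cos φ, sin φ, and cos θ cos φ (ACN order Y, Z, X after the constant W = 1), and the squared coefficients within each order sum to 1, which is a convenient self-check. Other normalizations such as N3D differ only by a per-order scale factor of sqrt(2l+1).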
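The HOA coefficients of the candidate virtual speakers output by the virtual speaker set generation unit serve as input to the virtual speaker selection unit.
The virtual speaker selection unit is configured to select a target virtual speaker from the plurality of candidate virtual speakers in the virtual speaker set based on the HOA signal to be encoded. The target virtual speaker may be referred to as "the virtual speaker matching the HOA signal to be encoded", or simply the matching virtual speaker.
The virtual speaker selection unit selects the specified matching virtual speaker based on the HOA signal to be encoded and the candidate virtual speaker HOA coefficients output by the virtual speaker set generation unit.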
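The following illustrates one way to select the matching virtual speaker. In a possible implementation, the inner product of the HOA coefficients of each candidate virtual speaker with the HOA signal to be encoded is computed, and the candidate with the largest absolute inner product is selected as the target virtual speaker, that is, the matching virtual speaker. The projection of the HOA signal onto that candidate's HOA coefficients is added to the linear combination of the selected HOA coefficients, and the projection vector is subtracted from the HOA signal to obtain a residual. Repeating this process on the residual gives an iterative computation that produces one matching virtual speaker per iteration, and the coordinates and HOA coefficients of the matching virtual speakers are output. Multiple matching virtual speakers are therefore selected, one per iteration. (Other implementations are not excluded.)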
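The iterative selection described above resembles matching pursuit. A sketch under simplifying assumptions: the HOA signal is a single coefficient vector of length M = (N+1)² (for example one sample), candidate HOA coefficients are stored as the columns of an M × K matrix, and a speaker is never selected twice. NumPy is assumed; the function name is hypothetical.

```python
import numpy as np


def select_virtual_speakers(x: np.ndarray, candidates: np.ndarray, num_select: int):
    """Greedy, matching-pursuit style selection of virtual speakers.

    x:          (M,) HOA coefficients of the signal to encode
    candidates: (M, K) candidate HOA coefficients, one column per virtual speaker
    Returns (selected column indices in selection order, final residual).
    """
    residual = np.asarray(x, dtype=float).copy()
    sq_norms = np.sum(candidates * candidates, axis=0)
    chosen: list[int] = []
    for _ in range(num_select):
        # Absolute inner product of every candidate with what is still unexplained.
        scores = np.abs(candidates.T @ residual)
        if chosen:
            scores[chosen] = -np.inf  # never pick the same speaker twice
        best = int(np.argmax(scores))
        chosen.append(best)
        atom = candidates[:, best]
        # Subtract the projection of the residual onto the selected speaker.
        residual = residual - (atom @ residual) / sq_norms[best] * atom
    return chosen, residual


# Toy example with orthonormal candidates (identity matrix).
chosen, residual = select_virtual_speakers(np.array([0.0, 5.0, 0.0, -2.0]), np.eye(4), 2)
print(chosen)  # [1, 3]
```

With non-orthogonal candidates, greedy selection need not find the best subset; it simply mirrors the one-speaker-per-iteration loop described in the text.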
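The coordinates and HOA coefficients of the target virtual speaker output by the virtual speaker selection unit serve as input to the virtual speaker signal generation unit.
The virtual speaker signal generation unit is configured to generate virtual speaker signals based on the HOA signal to be encoded and the attribute information of the target virtual speaker. If the attribute information is position information, the HOA coefficients of the target virtual speaker are determined from its position information; if the attribute information includes HOA coefficients, the HOA coefficients of the target virtual speaker are obtained directly from the attribute information.
The virtual speaker signal generation unit computes the virtual speaker signals from the HOA signal to be encoded and the HOA coefficients of the target virtual speakers.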
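The HOA coefficients of the virtual speakers are represented by a matrix A, and the HOA signal to be encoded can be expressed as a linear combination of the columns of A. The theoretically optimal solution w, namely the virtual speaker signals, can then be obtained by the least squares method, for example:
$$w = A^{-1}X$$
where A^{-1} denotes the inverse of matrix A. Because A has size M × C and is in general not square, A^{-1} is understood here as the (Moore-Penrose) pseudo-inverse, which equals (AᵀA)^{-1}Aᵀ and gives the least-squares solution when A has full column rank. C is the number of target virtual speakers, M = (N+1)² is the number of channels of the N-th order HOA coefficients, and a denotes the HOA coefficients of a target virtual speaker, for example:
$$A=\begin{bmatrix}a_{11}&\cdots&a_{1C}\\ \vdots&\ddots&\vdots\\ a_{M1}&\cdots&a_{MC}\end{bmatrix}$$
X denotes the HOA signal to be encoded, of size M × L, where M is the number of channels of the N-th order HOA coefficients and L is the number of time-domain or frequency-domain samples, and x denotes the coefficients of the HOA signal to be encoded, for example:
$$X=\begin{bmatrix}x_{11}&\cdots&x_{1L}\\ \vdots&\ddots&\vdots\\ x_{M1}&\cdots&x_{ML}\end{bmatrix}$$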
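The least-squares step can be sketched with NumPy. Because A is M × C and generally not square, the sketch uses `numpy.linalg.lstsq`, which returns the same solution as the pseudo-inverse when A has full column rank. Shapes follow the text (A: M × C, X: M × L, result: C × L); the function name and the random test data are hypothetical.

```python
import numpy as np


def virtual_speaker_signals(A: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Least-squares W minimizing ||A @ W - X|| (Frobenius norm).

    A: (M, C) HOA coefficients of the C target virtual speakers, one per column
    X: (M, L) HOA signal to encode, L time- or frequency-domain samples
    Returns W of shape (C, L): one virtual speaker signal per row.
    """
    W, _residuals, _rank, _sv = np.linalg.lstsq(A, X, rcond=None)
    return W


rng = np.random.default_rng(0)
M, C, L = 16, 4, 8          # 3rd-order HOA: M = (3 + 1) ** 2
A = rng.standard_normal((M, C))
W_true = rng.standard_normal((C, L))
X = A @ W_true              # a signal that lies exactly in the span of A
W = virtual_speaker_signals(A, X)
print(np.allclose(W, W_true), np.allclose(W, np.linalg.pinv(A) @ X))  # True True
```

`lstsq` is preferred over forming (AᵀA)^{-1} explicitly because it is numerically more stable when the speaker directions are close together.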
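The virtual speaker signals output by the virtual speaker signal generation unit serve as input to the core encoder processing unit.
The core encoder processing unit is configured to perform core encoder processing on the virtual speaker signals to obtain a transport bitstream.
Core encoder processing includes but is not limited to transform, quantization, psychoacoustic modeling, and bitstream generation, and may operate on frequency-domain or time-domain transport channels; this is not limited herein.
Based on the foregoing embodiments, this application provides a method for determining a virtual speaker set. The method is based on the following presets:
1. Virtual speaker distribution table
The virtual speaker distribution table includes position information of K virtual speakers; the position information includes an elevation index and a horizontal angle index, and K is a positive integer greater than 1. The K virtual speakers are set to be distributed on a preset sphere. The preset sphere may include X latitude circles and Y longitude circles, where X and Y may be the same or different and are both positive integers; for example, X is 512, 768, or 1024, and Y is 512, 768, or 1024. The virtual speakers are located at intersections of the X latitude circles and the Y longitude circles. The larger X and Y are, the more candidate positions there are for the virtual speakers, and the better the playback of the sound field formed by the finally selected virtual speakers can be.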
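FIG. 4a is an example schematic diagram of a preset sphere according to this application. As shown in FIG. 4a, the preset sphere includes L (L > 1) latitude regions. The m-th latitude region includes T_m latitude circles, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the m_i-th latitude circle is α_m, where 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m. When T_m > 1, the elevation angle difference between any two adjacent latitude circles in the m-th latitude region is α_m. FIG. 4b is an example schematic diagram of the elevation angle and the horizontal angle according to this application. As shown in FIG. 4b, the elevation angle of a virtual speaker is the angle between the line connecting the virtual speaker's position to the center of the sphere and a preset horizontal plane (for example, the plane of the equator, the plane through the south pole, or the plane through the north pole, where the planes through the south pole and the north pole are each perpendicular to the line connecting the two poles). The horizontal angle of a virtual speaker is the angle between the projection of that line onto the horizontal plane and a preset initial direction.
It should be understood that the K virtual speakers are distributed on one or more latitude circles in each latitude region. The spacing between adjacent virtual speakers on the same latitude circle is expressed as a horizontal angle difference, and all pairs of adjacent virtual speakers on the same latitude circle have the same horizontal angle difference. For example, on the m_i-th latitude circle, the horizontal angle difference between any two adjacent virtual speakers is α_m. If a latitude region contains multiple latitude circles, the horizontal angle difference between adjacent virtual speakers in that region is the same regardless of which latitude circle they lie on. For example, in the m-th latitude region, the horizontal angle difference between adjacent virtual speakers on the m_i-th latitude circle and that on the (m_i+1)-th latitude circle are both α_m. In addition, if a latitude region contains multiple latitude circles, the spacing between the latitude circles in that region is expressed as an elevation angle difference, and the elevation angle difference between any two adjacent latitude circles equals the horizontal angle difference between adjacent virtual speakers in that region.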
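In a possible implementation, α_n = α_m or α_n ≠ α_m, where α_n is the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on any latitude circle in the n-th latitude region, and n ≠ m.
That is, for virtual speakers in different latitude regions, the horizontal angle differences between adjacent virtual speakers may be equal (α_n = α_m) or unequal (α_n ≠ α_m). It should be understood that this application neither requires the horizontal angle differences between adjacent virtual speakers to be equal in all L latitude regions nor requires them to differ in all L latitude regions; some of the L latitude regions may even share the same horizontal angle difference while differing from the other latitude regions.
In a possible implementation, α_c < α_m, where α_c is the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the m_c-th latitude circle, and the m_c-th latitude circle is any latitude circle in the latitude region, among the L latitude regions, that contains the equatorial latitude circle.
That is, among the L latitude regions, the latitude region containing the equatorial latitude circle has the smallest horizontal angle difference between adjacent virtual speakers; in other words, its virtual speakers are the most densely distributed.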
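To make the latitude-region constraints concrete, the sketch below generates speaker positions from a list of regions, each described by the elevation of its first latitude circle, its number of circles T, and its spacing α (used both as the horizontal step on every circle and as the elevation step between circles). The region values in the usage example are hypothetical, chosen only so that the equatorial region is denser (α_c = 5°) than the others (α = 10°); they are not the layout behind Table 3 or FIG. 5a. Angles are in degrees and the function name is hypothetical.

```python
import math


def latitude_region_layout(regions):
    """Virtual speaker positions for a list of latitude regions.

    regions: iterable of (first_circle_elevation_deg, T, alpha_deg). Inside a
    region, adjacent speakers on a circle are alpha apart in horizontal angle
    and adjacent circles are alpha apart in elevation.
    Returns a list of (horizontal_angle_deg, elevation_deg) pairs.
    """
    points = []
    for first_el, num_circles, alpha in regions:
        per_circle = 360.0 / alpha
        if not math.isclose(per_circle, round(per_circle)):
            raise ValueError(f"alpha={alpha} does not divide 360 degrees")
        for j in range(num_circles):
            el = first_el + j * alpha
            if abs(el) > 90.0 + 1e-9:
                raise ValueError(f"elevation {el} is off the sphere")
            if math.isclose(abs(el), 90.0):
                points.append((0.0, el))  # a pole holds a single speaker
                continue
            points.extend((k * alpha, el) for k in range(int(round(per_circle))))
    return points


# Hypothetical regions: a dense equatorial band plus sparser bands north and south.
layout = latitude_region_layout([(-10.0, 5, 5.0), (20.0, 6, 10.0), (-70.0, 6, 10.0)])
print(len(layout))  # 5*72 + 6*36 + 6*36 = 792
```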
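Optionally, the positions of the K virtual speakers in the virtual speaker distribution table may be represented by indexes, including an elevation index and a horizontal angle index. For example, on any latitude circle, the horizontal angle of one of the virtual speakers on it is set to 0, and the corresponding horizontal angle index is obtained by a preset conversion formula between horizontal angle and horizontal angle index. Because the horizontal angle differences between any adjacent virtual speakers on a latitude circle are equal, the horizontal angles of the other virtual speakers on that circle can then be obtained, and their horizontal angle indexes follow from the same conversion formula. It should be noted that this application does not specifically limit which virtual speaker on a latitude circle has its horizontal angle set to 0. Similarly, because the elevation angle differences between adjacent virtual speakers along a longitude circle satisfy the foregoing requirements, once a virtual speaker with an elevation angle of 0 is set, the elevation angles of the other virtual speakers can be obtained, and the elevation indexes of all virtual speakers on the longitude circle follow from a preset conversion formula between elevation angle and elevation index. It should be noted that this application does not specifically limit which virtual speaker on a longitude circle has its elevation angle set to 0; for example, it may be a virtual speaker on the equator, at the south pole, or at the north pole.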
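Optionally, for the k-th virtual speaker of the K virtual speakers, the elevation angle [image PCTCN2022078824-appb-000028] and the elevation index [image PCTCN2022078824-appb-000029] satisfy the following formula, which is the conversion formula between elevation angle and elevation index:
[Formula image: PCTCN2022078824-appb-000030]
where r_k is the radius of the longitude circle on which the k-th virtual speaker lies, and round() denotes rounding to the nearest integer.
For the k-th virtual speaker of the K virtual speakers, the horizontal angle θ_k and the horizontal angle index θ_k' satisfy the following formula, which is the conversion formula between horizontal angle and horizontal angle index:
[Formula image: PCTCN2022078824-appb-000031]
where r_k is the radius of the latitude circle on which the k-th virtual speaker lies, and round() denotes rounding to the nearest integer.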
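FIG. 5a and FIG. 5b are example distribution diagrams of K virtual speakers. As shown in FIG. 5a, the horizontal angle difference between adjacent virtual speakers in the latitude region containing the equatorial latitude circle is smaller than that in the other latitude regions, that is, α_c < α_m. As shown in FIG. 5b, the K virtual speakers are randomly and approximately uniformly distributed on the preset sphere.
Table 1 compares the distributions shown in FIG. 5a and FIG. 5b. With K = 1669, the average signal-to-noise ratio (SNR) of the HOA reconstructed signal obtained with the distribution of FIG. 5a is higher than that obtained with the distribution of FIG. 5b.
Table 1
[Table image: PCTCN2022078824-appb-000032]
As shown in Table 1, this embodiment uses 12 test audio items of different types. Files 1 to 12 are, respectively: a single-source speech signal, a single-source instrument signal, a two-source speech signal, a two-source instrument signal, a three-source speech and instrument mixture, a four-source speech and instrument mixture, two-source noise signals 1, 2, 3, and 4, and two-source reverberant signals 1 and 2.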
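FIG. 6a and FIG. 6b are example distribution diagrams of K virtual speakers. As shown in FIG. 6a, the horizontal angle differences between adjacent virtual speakers are equal in all L latitude regions, that is, α_n = α_m. As shown in FIG. 6b, the K virtual speakers are randomly and approximately uniformly distributed on the preset sphere.
Table 2 compares the distributions shown in FIG. 6a and FIG. 6b. With K = 1669, the average SNR of the HOA reconstructed signal obtained with the distribution of FIG. 6a is higher than that obtained with the distribution of FIG. 6b.
Table 2
[Table image: PCTCN2022078824-appb-000033]
As shown in Table 2, this embodiment uses 12 test audio items of different types. Files 1 to 12 are, respectively: a single-source speech signal, a single-source instrument signal, a two-source speech signal, a two-source instrument signal, a three-source speech and instrument mixture, a four-source speech and instrument mixture, two-source noise signals 1, 2, 3, and 4, and two-source reverberant signals 1 and 2.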
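For example, Table 3 is an example of the virtual speaker distribution table with K = 530; that is, Table 3 describes the distribution of 530 virtual speakers numbered 0 to 529. The position gives the horizontal angle index and the elevation index of the virtual speaker with the corresponding number: in each position entry, the number before the comma is the horizontal angle index and the number after the comma is the elevation index.
Table 3 Virtual speaker distribution table
No. Position No. Position No. Position No. Position No. Position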
0 5,768 106 444,987 212 453,5 318 208,34 424 19,68
1 5,805 107 478,987 213 470,5 319 226,34 425 37,68
2 146,805 108 512,987 214 487,5 320 243,34 426 56,68
3 293,805 109 546,987 215 504,5 321 260,34 427 74,68
4 439,805 110 580,987 216 520,5 322 278,34 428 93,68
5 585,805 111 614,987 217 537,5 323 295,34 429 112,68
6 731,805 112 649,987 218 554,5 324 312,34 430 130,68
7 878,805 113 683,987 219 571,5 325 330,34 431 149,68
8 5,841 114 717,987 220 588,5 326 347,34 432 168,68
9 73,841 115 751,987 221 604,5 327 364,34 433 186,68
10 146,841 116 785,987 222 621,5 328 382,34 434 205,68
11 219,841 117 819,987 223 638,5 329 399,34 435 223,68
12 293,841 118 853,987 224 655,5 330 417,34 436 242,68
13 366,841 119 887,987 225 671,5 331 434,34 437 261,68
14 439,841 120 922,987 226 688,5 332 451,34 438 279,68
15 512,841 121 956,987 227 705,5 333 469,34 439 298,68
16 585,841 122 990,987 228 722,5 334 486,34 440 317,68
17 658,841 123 5,256 229 739,5 335 503,34 441 335,68
18 731,841 124 5,222 230 755,5 336 521,34 442 354,68
19 805,841 125 146,222 231 772,5 337 538,34 443 372,68
20 878,841 126 293,222 232 789,5 338 555,34 444 391,68
21 951,841 127 439,222 233 806,5 339 573,34 445 410,68
22 5,878 128 585,222 234 823,5 340 590,34 446 428,68
23 54,878 129 731,222 235 839,5 341 607,34 447 447,68
24 108,878 130 878,222 236 856,5 342 625,34 448 465,68
25 162,878 131 5,188 237 873,5 343 642,34 449 484,68
26 216,878 132 79,188 238 890,5 344 660,34 450 503,68
27 269,878 133 158,188 239 906,5 345 677,34 451 521,68
28 323,878 134 236,188 240 923,5 346 694,34 452 540,68
29 377,878 135 315,188 241 940,5 347 712,34 453 559,68
30 431,878 136 394,188 242 957,5 348 729,34 454 577,68
31 485,878 137 473,188 243 974,5 349 746,34 455 596,68
32 539,878 138 551,188 244 990,5 350 764,34 456 614,68
33 593,878 139 630,188 245 1007,5 351 781,34 457 633,68
34 647,878 140 709,188 246 5,17 352 798,34 458 652,68
35 701,878 141 788,188 247 17,17 353 816,34 459 670,68
36 755,878 142 866,188 248 34,17 354 833,34 460 689,68
37 808,878 143 945,188 249 51,17 355 850,34 461 707,68
38 862,878 144 5,154 250 68,17 356 868,34 462 726,68
39 916,878 145 57,154 251 85,17 357 885,34 463 745,68
40 970,878 146 114,154 252 102,17 358 903,34 464 763,68
41 5,914 147 171,154 253 119,17 359 920,34 465 782,68
42 43,914 148 228,154 254 137,17 360 937,34 466 801,68
43 85,914 149 284,154 255 154,17 361 955,34 467 819,68
44 128,914 150 341,154 256 171,17 362 972,34 468 838,68
45 171,914 151 398,154 257 188,17 363 989,34 469 856,68
46 213,914 152 455,154 258 205,17 364 1007,34 470 875,68
47 256,914 153 512,154 259 222,17 365 5,51 471 894,68
48 299,914 154 569,154 260 239,17 366 18,51 472 912,68
49 341,914 155 626,154 261 256,17 367 35,51 473 931,68
50 384,914 156 683,154 262 273,17 368 53,51 474 950,68
51 427,914 157 740,154 263 290,17 369 71,51 475 968,68
52 469,914 158 796,154 264 307,17 370 88,51 476 987,68
53 512,914 159 853,154 265 324,17 371 106,51 477 1005,68
54 555,914 160 910,154 266 341,17 372 124,51 478 5,85
55 597,914 161 967,154 267 358,17 373 141,51 479 20,85
56 640,914 162 5,119 268 375,17 374 159,51 480 39,85
57 683,914 163 45,119 269 393,17 375 177,51 481 59,85
58 725,914 164 89,119 270 410,17 376 194,51 482 79,85
59 768,914 165 134,119 271 427,17 377 212,51 483 98,85
60 811,914 166 178,119 272 444,17 378 230,51 484 118,85
61 853,914 167 223,119 273 461,17 379 247,51 485 138,85
62 896,914 168 267,119 274 478,17 380 265,51 486 158,85
63 939,914 169 312,119 275 495,17 381 282,51 487 177,85
64 981,914 170 356,119 276 512,17 382 300,51 488 197,85
65 5,951 171 401,119 277 529,17 383 318,51 489 217,85
66 37,951 172 445,119 278 546,17 384 335,51 490 236,85
67 73,951 173 490,119 279 563,17 385 353,51 491 256,85
68 110,951 174 534,119 280 580,17 386 371,51 492 276,85
69 146,951 175 579,119 281 597,17 387 388,51 493 295,85
70 183,951 176 623,119 282 614,17 388 406,51 494 315,85
71 219,951 177 668,119 283 631,17 389 424,51 495 335,85
72 256,951 178 712,119 284 649,17 390 441,51 496 354,85
73 293,951 179 757,119 285 666,17 391 459,51 497 374,85
74 329,951 180 801,119 286 683,17 392 477,51 498 394,85
75 366,951 181 846,119 287 700,17 393 494,51 499 414,85
76 402,951 182 890,119 288 717,17 394 512,51 500 433,85
77 439,951 183 935,119 289 734,17 395 530,51 501 453,85
78 475,951 184 979,119 290 751,17 396 547,51 502 473,85
79 512,951 185 5,5 291 768,17 397 565,51 503 492,85
80 549,951 186 17,5 292 785,17 398 583,51 504 512,85
81 585,951 187 34,5 293 802,17 399 600,51 505 532,85
82 622,951 188 50,5 294 819,17 400 618,51 506 551,85
83 658,951 189 67,5 295 836,17 401 636,51 507 571,85
84 695,951 190 84,5 296 853,17 402 653,51 508 591,85
85 731,951 191 101,5 297 870,17 403 671,51 509 610,85
86 768,951 192 118,5 298 887,17 404 689,51 510 630,85
87 805,951 193 134,5 299 905,17 405 706,51 511 650,85
88 841,951 194 151,5 300 922,17 406 724,51 512 670,85
89 878,951 195 168,5 301 939,17 407 742,51 513 689,85
90 914,951 196 185,5 302 956,17 408 759,51 514 709,85
91 951,951 197 201,5 303 973,17 409 777,51 515 729,85
92 987,951 198 218,5 304 990,17 410 794,51 516 748,85
93 5,987 199 235,5 305 1007,17 411 812,51 517 768,85
94 34,987 200 252,5 306 5,34 412 830,51 518 788,85
95 68,987 201 269,5 307 17,34 413 847,51 519 807,85
96 102,987 202 285,5 308 35,34 414 865,51 520 827,85
97 137,987 203 302,5 309 52,34 415 883,51 521 847,85
98 171,987 204 319,5 310 69,34 416 900,51 522 866,85
99 205,987 205 336,5 311 87,34 417 918,51 523 886,85
100 239,987 206 353,5 312 104,34 418 936,51 524 906,85
101 273,987 207 369,5 313 121,34 419 953,51 525 926,85
102 307,987 208 386,5 314 139,34 420 971,51 526 945,85
103 341,987 209 403,5 315 156,34 421 989,51 527 965,85
104 375,987 210 420,5 316 174,34 422 1006,51 528 985,85
105 410,987 211 436,5 317 191,34 423 5,68 529 1004,85
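It should be noted that the sphere on which the virtual speakers in Table 3 are distributed includes 1024 longitude circles and 1024 latitude circles (the south pole and the north pole each count as one latitude circle). The 1024 longitude circles and 1024 latitude circles yield 1024 × 1022 + 2 = 1046530 intersections, each with its own elevation angle and horizontal angle and, correspondingly, its own elevation index and horizontal angle index. The positions of the 530 virtual speakers in Table 3 are 530 of these 1046530 intersections. The elevation indexes in Table 3 are computed with the elevation angle of the equator taken as 0; that is, apart from the equator, the elevation angle corresponding to each elevation index is measured relative to the plane of the equator.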
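The grid count above and the flattened layout of Table 3 can both be checked mechanically. The sketch counts intersections with each pole treated as a single point, and parses one row of the table, where each row holds five (number, position) pairs and a position is written as "horizontal angle index,elevation index". Function names are hypothetical.

```python
def grid_intersections(num_longitude: int, num_latitude: int) -> int:
    """Intersections of longitude and latitude circles when the two poles are
    counted as latitude circles that each collapse to a single point."""
    return num_longitude * (num_latitude - 2) + 2


def parse_table3_row(row: str) -> dict[int, tuple[int, int]]:
    """Map speaker number -> (horizontal angle index, elevation index)."""
    tokens = row.split()
    entries = {}
    for number, position in zip(tokens[0::2], tokens[1::2]):
        h_idx, e_idx = position.split(",")
        entries[int(number)] = (int(h_idx), int(e_idx))
    return entries


print(grid_intersections(1024, 1024))  # 1046530
row = parse_table3_row("0 5,768 106 444,987 212 453,5 318 208,34 424 19,68")
print(row[212])  # (453, 5)
```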
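2. Preset F virtual speakers
The F virtual speakers satisfy the following condition: the horizontal angle difference α_mi between adjacent virtual speakers, among the F virtual speakers, distributed on the m_i-th latitude circle is greater than α_m, where the m_i-th latitude circle is one of the latitude circles in the m-th latitude region.
For ease of description, a virtual speaker among the K virtual speakers is referred to as a candidate virtual speaker, and any virtual speaker among the F virtual speakers is referred to as a central virtual speaker (or a first-round virtual speaker). That is, for any latitude circle on the preset sphere, one or more of the candidate virtual speakers distributed on that circle may be chosen as central virtual speakers and added to the F virtual speakers. If multiple virtual speakers are chosen, the horizontal angle difference α_mi between adjacent central virtual speakers is greater than the horizontal angle difference α_m between adjacent candidate virtual speakers, that is, α_mi > α_m. In other words, a latitude circle carries multiple candidate virtual speakers, and the central virtual speakers are chosen from them at a lower density. For example, the horizontal angle difference between adjacent candidate virtual speakers on a latitude circle is α_m = 5°, and that between adjacent central virtual speakers is α_mi = 8°.
In a possible implementation, α_mi = q × α_m, where q is a positive integer greater than 1. That is, the horizontal angle difference between adjacent central virtual speakers is an integer multiple of that between adjacent candidate virtual speakers. For example, the horizontal angle difference between adjacent candidate virtual speakers on a latitude circle is α_m = 5°, and that between adjacent central virtual speakers is α_mi = 10°.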
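With α_mi = q × α_m, one simple way to obtain evenly spaced central virtual speakers on a circle is to keep every q-th candidate. This is an assumed construction for illustration, not necessarily how the F virtual speakers of this application were chosen; it also assumes q divides the number of candidates on the circle so that the spacing stays uniform across the wrap-around at 360°. The function name is hypothetical.

```python
def central_speakers_on_circle(candidate_angles, q: int):
    """Keep every q-th candidate on one latitude circle so adjacent central
    speakers are q * alpha_m apart (alpha_mi = q * alpha_m)."""
    if not isinstance(q, int) or q <= 1:
        raise ValueError("q must be an integer greater than 1")
    ordered = sorted(candidate_angles)
    if len(ordered) % q:
        raise ValueError("q must divide the number of candidates for uniform spacing")
    return ordered[::q]


candidates = [5.0 * k for k in range(72)]     # alpha_m = 5 degrees
centers = central_speakers_on_circle(candidates, q=2)
print(len(centers), centers[1] - centers[0])  # 36 10.0
```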
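3. Each of the F virtual speakers corresponds to S virtual speakers
For ease of description, the virtual speakers among the S virtual speakers are referred to as target virtual speakers. The S virtual speakers corresponding to any central virtual speaker satisfy the following condition: the S virtual speakers include that central virtual speaker and S-1 virtual speakers around it, and each of the S-1 correlations between those S-1 virtual speakers and the central virtual speaker is greater than every one of the K-S correlations between the remaining K-S virtual speakers (the K virtual speakers other than the S virtual speakers) and the central virtual speaker.
That is, the S values of R_fk corresponding to the S virtual speakers are the S largest of the K values of R_fk corresponding to the K virtual speakers, where the S largest are the first S values when the K values of R_fk are sorted in descending order.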
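The top-S condition amounts to sorting the K correlations in descending order and keeping the first S. Since the exact R_fk formula is only available as an image, this sketch assumes R_fk is the absolute inner product of the two HOA coefficient vectors normalized by their norms; under that assumption the central virtual speaker has correlation 1 with itself and is always among the S, consistent with the condition that the S virtual speakers include it. NumPy and the function name are assumptions.

```python
import numpy as np


def s_most_correlated(center: np.ndarray, all_coeffs: np.ndarray, s: int) -> list[int]:
    """Indices of the S speakers with the largest correlation to `center`.

    center:     (M,) HOA coefficients of one central virtual speaker
    all_coeffs: (M, K) HOA coefficients of the K virtual speakers (columns)
    Correlation is ASSUMED to be |<a, b>| / (||a|| ||b||).
    """
    r = np.abs(all_coeffs.T @ center)
    r = r / (np.linalg.norm(all_coeffs, axis=0) * np.linalg.norm(center))
    order = np.argsort(-r, kind="stable")  # descending correlation
    return [int(k) for k in order[:s]]


coeffs = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])  # toy example: K = 3 speakers, M = 2
print(s_most_correlated(coeffs[:, 0], coeffs, s=2))  # [0, 2]
```

Because this is computed once per central virtual speaker when the tables are built, its cost does not affect the per-frame work in steps 701 and 702.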
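R_fk denotes the correlation between the central virtual speaker and the k-th virtual speaker of the K virtual speakers, and satisfies the following formula:
[Formula image: PCTCN2022078824-appb-000034]
where θ denotes the horizontal angle of the central virtual speaker, [image PCTCN2022078824-appb-000035] denotes its elevation angle, [image PCTCN2022078824-appb-000036] denotes its HOA coefficients, and [image PCTCN2022078824-appb-000037] denotes the HOA coefficients of the k-th virtual speaker of the K virtual speakers.
In this way, S target virtual speakers are determined for each central virtual speaker. It should be understood that, as preset in this application, the F virtual speakers are taken from the K virtual speakers, so the position of each central virtual speaker can also be represented by an elevation index and a horizontal angle index; and the S virtual speakers corresponding to each central virtual speaker also come from the K virtual speakers, so the position of each target virtual speaker can likewise be represented by an elevation index and a horizontal angle index.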
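FIG. 7 is an example flowchart of a method for determining a virtual speaker set according to this application. The process 700 may be performed by the encoder 20 or the decoder 30 in the foregoing embodiments: the encoder 20 in the audio sending device performs audio encoding and sends bitstream information to the audio receiving device, and the decoder 30 in the audio receiving device decodes the bitstream information to obtain a target audio frame and then renders, based on the target audio frame, sound field audio signals corresponding to one or more virtual speakers. The process 700 is described as a series of steps or operations; it should be understood that the process 700 may be performed in various orders and/or concurrently and is not limited to the execution order shown in FIG. 7. As shown in FIG. 7, the method includes the following steps.
Step 701: Determine a target virtual speaker from the F preset virtual speakers based on the audio signal to be processed.
As described above, encoding analysis is performed on the audio signal to be processed, for example, analyzing its sound field distribution, including characteristics such as the number of sound sources, directivity, and diffuseness, and the HOA coefficients of the audio signal are obtained, as one of the criteria for deciding how to select the target virtual speaker. Based on the HOA coefficients of the audio signal to be processed and the HOA coefficients of the candidate virtual speakers (that is, the F virtual speakers), the virtual speaker matching the audio signal can be selected; in this application, that virtual speaker is referred to as the target virtual speaker.
In a possible implementation, the HOA coefficients of the audio signal are obtained first, and then F groups of HOA coefficients corresponding to the F virtual speakers are obtained, where the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients. The virtual speaker corresponding to the group, among the F groups, that has the highest correlation with the HOA coefficients of the audio signal is then determined as the target virtual speaker.
In this application, the inner product of the HOA coefficients of each of the F virtual speakers with the HOA coefficients of the audio signal may be computed, and the virtual speaker with the largest absolute inner product is selected as the target virtual speaker. Each of the F groups contains (N+1)² coefficients, and the HOA coefficients of the audio signal also contain (N+1)² coefficients, where N is the order of the audio signal, so the coefficients of the audio signal correspond one-to-one with those of each group. Based on this correspondence, the inner product of the audio signal's HOA coefficients with each of the F groups gives the correlation between the audio signal and that group. It should be noted that the target virtual speaker may also be determined by other methods; this is not specifically limited in this application.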
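Step 701 as described reduces to an argmax over absolute inner products. A minimal NumPy sketch, assuming the F groups of HOA coefficients are stacked as the rows of an F × M array with M = (N+1)²; the function name and toy values are hypothetical.

```python
import numpy as np


def select_target(signal_coeffs: np.ndarray, center_coeffs: np.ndarray) -> int:
    """Index f of the central virtual speaker whose HOA coefficients have the
    largest absolute inner product with the signal's HOA coefficients.

    signal_coeffs: (M,) with M = (N + 1) ** 2
    center_coeffs: (F, M), one row per central virtual speaker
    """
    if center_coeffs.shape[1] != signal_coeffs.shape[0]:
        raise ValueError("both must have (N + 1) ** 2 coefficients")
    scores = np.abs(center_coeffs @ signal_coeffs)
    return int(np.argmax(scores))


centers = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.5, 0.5, 0.5, 0.5]])
print(select_target(np.array([0.1, -0.9, 0.0, 0.0]), centers))  # 1
```

Taking the absolute value means a strongly anti-correlated speaker (here, inner product -0.9) still wins, which matches the "largest absolute inner product" rule.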
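Step 702: Obtain, from the preset virtual speaker distribution table, position information of each of the S virtual speakers corresponding to the target virtual speaker, where the position information includes an elevation index and a horizontal angle index.
Based on the presets described above, once the target virtual speaker (that is, a central virtual speaker) is determined, the S virtual speakers corresponding to it are known, and their position information can be read from the preset virtual speaker distribution table. As for the K virtual speakers, the position information of the S virtual speakers is represented by elevation indexes and horizontal angle indexes.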
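Putting steps 701 and 702 together: because the S virtual speakers of every central virtual speaker are determined in advance, the runtime work is one argmax followed by table lookups. In this sketch the neighbour lists and the coefficient values in the usage example are hypothetical; the distribution table entries reuse the first rows of Table 3 only as sample data.

```python
import numpy as np


def determine_virtual_speaker_set(signal_coeffs, center_ids, center_coeffs,
                                  neighbours, distribution_table):
    """Sketch of steps 701 and 702.

    center_ids:         table numbers of the F central virtual speakers
    center_coeffs:      (F, M) HOA coefficients; row f belongs to center_ids[f]
    neighbours:         precomputed map, central speaker number -> S numbers
    distribution_table: map, speaker number -> (horizontal idx, elevation idx)
    """
    # Step 701: the central speaker best correlated with the signal is the target.
    f = int(np.argmax(np.abs(center_coeffs @ signal_coeffs)))
    target = center_ids[f]
    # Step 702: read the positions of its S precomputed speakers from the table.
    positions = [distribution_table[k] for k in neighbours[target]]
    return target, positions


table = {0: (5, 768), 1: (5, 805), 2: (146, 805), 3: (293, 805)}  # Table 3 rows 0-3
neighbours = {0: [0, 1, 2], 2: [2, 1, 3]}                          # hypothetical, S = 3
target, positions = determine_virtual_speaker_set(
    np.array([0.2, 0.8]), [0, 2], np.eye(2), neighbours, table)
print(target, positions)  # 2 [(146, 805), (5, 805), (293, 805)]
```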
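It can be seen that the target virtual speaker is the central virtual speaker whose HOA coefficients have the highest correlation with the HOA coefficients of the audio signal to be processed, and the S virtual speakers corresponding to each central virtual speaker are the S virtual speakers whose HOA coefficients have the highest correlation with that central virtual speaker. The S virtual speakers corresponding to the target virtual speaker are therefore taken to be the S virtual speakers most highly correlated with the HOA coefficients of the audio signal to be processed.
By presetting the virtual speaker distribution table, this application enables virtual speakers deployed according to the table to achieve a higher average SNR of the HOA reconstructed signal. Selecting, under this distribution, the S virtual speakers most highly correlated with the HOA coefficients of the audio signal to be processed then achieves a better sampling effect and improves the playback of the audio signal.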
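FIG. 8 is an example structural diagram of an apparatus for determining a virtual speaker set according to this application. As shown in FIG. 8, the apparatus may be applied to the encoder 20 or the decoder 30 in the foregoing embodiments. The apparatus in this embodiment may include a determining module 801 and an obtaining module 802. The determining module 801 is configured to determine a target virtual speaker from F preset virtual speakers based on an audio signal to be processed, where each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1. The obtaining module 802 is configured to obtain, from a preset virtual speaker distribution table, position information of each of the S virtual speakers corresponding to the target virtual speaker, where the virtual speaker distribution table includes position information of K virtual speakers, the position information includes an elevation index and a horizontal angle index, K is a positive integer greater than 1, F ≤ K, and F × S ≥ K.
In a possible implementation, the determining module 801 is specifically configured to: obtain higher order ambisonics (HOA) coefficients of the audio signal; obtain F groups of HOA coefficients corresponding to the F virtual speakers, where the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and determine, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups, that has the highest correlation with the HOA coefficients of the audio signal.
In a possible implementation, the S virtual speakers corresponding to the target virtual speaker satisfy the following condition: the S virtual speakers include the target virtual speaker and S-1 virtual speakers located around it, and each of the S-1 correlations between those S-1 virtual speakers and the target virtual speaker is greater than every one of the K-S correlations between the remaining K-S virtual speakers (the K virtual speakers other than the S virtual speakers) and the target virtual speaker.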
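In a possible implementation, the K virtual speakers satisfy the following conditions: the K virtual speakers are distributed on a preset sphere, and the preset sphere includes L latitude regions, L > 1. The m-th latitude region of the L latitude regions includes T_m latitude circles, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, that are distributed on the m_i-th latitude circle is α_m, where 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m. When T_m > 1, the elevation angle difference between any two adjacent latitude circles in the m-th latitude region is α_m.
In a possible implementation, the n-th latitude region of the L latitude regions includes T_n latitude circles, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, that are distributed on the n_i-th latitude circle is α_n, where 1 ≤ n ≤ L, T_n is a positive integer, and 1 ≤ n_i ≤ T_n. When T_n > 1, the elevation angle difference between any two adjacent latitude circles in the n-th latitude region is α_n. Either α_n = α_m or α_n ≠ α_m, where n ≠ m.
In a possible implementation, the c-th latitude region of the L latitude regions includes T_c latitude circles, one of which is the equatorial latitude circle, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, that are distributed on the c_i-th latitude circle is α_c, where 1 ≤ c ≤ L, T_c is a positive integer, and 1 ≤ c_i ≤ T_c. When T_c > 1, the elevation angle difference between any two adjacent latitude circles in the c-th latitude region is α_c, and α_c < α_m, where c ≠ m.
In a possible implementation, the F virtual speakers satisfy the following condition: the horizontal angle difference α_mi between adjacent virtual speakers, among the F virtual speakers, that are distributed on the m_i-th latitude circle is greater than α_m.
In a possible implementation, α_mi = q × α_m, where q is a positive integer greater than 1.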
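In a possible implementation, the correlation R_fk between the k-th virtual speaker of the K virtual speakers and the target virtual speaker satisfies the following formula:
[Formula image: PCTCN2022078824-appb-000038]
where θ denotes the horizontal angle of the target virtual speaker, [image PCTCN2022078824-appb-000039] denotes the elevation angle of the target virtual speaker, [image PCTCN2022078824-appb-000040] denotes the HOA coefficients of the target virtual speaker, and [image PCTCN2022078824-appb-000041] denotes the HOA coefficients of the k-th virtual speaker of the K virtual speakers.
The apparatus in this embodiment can be used to perform the technical solution of the method embodiment shown in FIG. 7. Its implementation principle and technical effects are similar and are not described again here.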
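During implementation, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in this application may be directly performed by a hardware encoding processor, or by a combination of hardware and software modules in an encoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in a memory, and the processor reads information from the memory and completes the steps of the foregoing methods in combination with its hardware.
The memory mentioned in the foregoing embodiments may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), and a direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but not be limited to, these and any other suitable types of memory.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the foregoing systems, apparatuses, and units, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative: the division into units is merely a logical function division, and other divisions are possible in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.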

Claims (20)

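  1. A method for determining a virtual speaker set, comprising:
    determining a target virtual speaker from F preset virtual speakers based on an audio signal to be processed, wherein each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1; and
    obtaining, from a preset virtual speaker distribution table, position information of each of the S virtual speakers corresponding to the target virtual speaker, wherein the virtual speaker distribution table comprises position information of K virtual speakers, the position information comprises an elevation index and a horizontal angle index, K is a positive integer greater than 1, F ≤ K, and F × S ≥ K.
  2. The method according to claim 1, wherein the determining a target virtual speaker from F preset virtual speakers based on an audio signal to be processed comprises:
    obtaining higher order ambisonics (HOA) coefficients of the audio signal;
    obtaining F groups of HOA coefficients corresponding to the F virtual speakers, wherein the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and
    determining, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups of HOA coefficients, that has the highest correlation with the HOA coefficients of the audio signal.
  3. The method according to claim 1 or 2, wherein the S virtual speakers corresponding to the target virtual speaker satisfy the following condition:
    the S virtual speakers comprise the target virtual speaker and S-1 virtual speakers located around the target virtual speaker, and each of the S-1 correlations between the S-1 virtual speakers and the target virtual speaker is greater than every one of the K-S correlations between the other K-S virtual speakers, among the K virtual speakers other than the S virtual speakers, and the target virtual speaker.
  4. The method according to any one of claims 1 to 3, wherein the K virtual speakers satisfy the following conditions:
    the K virtual speakers are distributed on a preset sphere, and the preset sphere comprises L latitude regions, L > 1;
    the m-th latitude region of the L latitude regions comprises T_m latitude circles, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the m_i-th latitude circle is α_m, wherein 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m; and
    when T_m > 1, the elevation angle difference between any two adjacent latitude circles in the m-th latitude region is α_m.
  5. The method according to claim 4, wherein the n-th latitude region of the L latitude regions comprises T_n latitude circles, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the n_i-th latitude circle is α_n, wherein 1 ≤ n ≤ L, T_n is a positive integer, and 1 ≤ n_i ≤ T_n;
    when T_n > 1, the elevation angle difference between any two adjacent latitude circles in the n-th latitude region is α_n; and
    α_n = α_m or α_n ≠ α_m, wherein n ≠ m.
  6. The method according to claim 4, wherein the c-th latitude region of the L latitude regions comprises T_c latitude circles, one of the T_c latitude circles is the equatorial latitude circle, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the c_i-th latitude circle is α_c, wherein 1 ≤ c ≤ L, T_c is a positive integer, and 1 ≤ c_i ≤ T_c;
    when T_c > 1, the elevation angle difference between any two adjacent latitude circles in the c-th latitude region is α_c; and
    α_c < α_m, wherein c ≠ m.
  7. The method according to any one of claims 4 to 6, wherein the F virtual speakers satisfy the following condition:
    the horizontal angle difference α_mi between adjacent virtual speakers, among the F virtual speakers, distributed on the m_i-th latitude circle is greater than α_m.
  8. The method according to claim 7, wherein α_mi = q × α_m, and q is a positive integer greater than 1.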
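  9. The method according to claim 3, wherein the correlation R_fk between the k-th virtual speaker of the K virtual speakers and the target virtual speaker satisfies the following formula:
    [Formula image: PCTCN2022078824-appb-100001]
    wherein θ denotes the horizontal angle of the target virtual speaker, [image PCTCN2022078824-appb-100002] denotes the elevation angle of the target virtual speaker, [image PCTCN2022078824-appb-100003] denotes the HOA coefficients of the target virtual speaker, and [image PCTCN2022078824-appb-100004] denotes the HOA coefficients of the k-th virtual speaker.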
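  10. An apparatus for determining a virtual speaker set, comprising:
    a determining module, configured to determine a target virtual speaker from F preset virtual speakers based on an audio signal to be processed, wherein each of the F virtual speakers corresponds to S virtual speakers, F is a positive integer, and S is a positive integer greater than 1; and
    an obtaining module, configured to obtain, from a preset virtual speaker distribution table, position information of each of the S virtual speakers corresponding to the target virtual speaker, wherein the virtual speaker distribution table comprises position information of K virtual speakers, the position information comprises an elevation index and a horizontal angle index, K is a positive integer greater than 1, F ≤ K, and F × S ≥ K.
  11. The apparatus according to claim 10, wherein the determining module is specifically configured to: obtain higher order ambisonics (HOA) coefficients of the audio signal; obtain F groups of HOA coefficients corresponding to the F virtual speakers, wherein the F virtual speakers are in one-to-one correspondence with the F groups of HOA coefficients; and determine, as the target virtual speaker, the virtual speaker corresponding to the group of HOA coefficients, among the F groups of HOA coefficients, that has the highest correlation with the HOA coefficients of the audio signal.
  12. The apparatus according to claim 10 or 11, wherein the S virtual speakers corresponding to the target virtual speaker satisfy the following condition:
    the S virtual speakers comprise the target virtual speaker and S-1 virtual speakers located around the target virtual speaker, and each of the S-1 correlations between the S-1 virtual speakers and the target virtual speaker is greater than every one of the K-S correlations between the other K-S virtual speakers, among the K virtual speakers other than the S virtual speakers, and the target virtual speaker.
  13. The apparatus according to any one of claims 10 to 12, wherein the K virtual speakers satisfy the following conditions:
    the K virtual speakers are distributed on a preset sphere, and the preset sphere comprises L latitude regions, L > 1;
    the m-th latitude region of the L latitude regions comprises T_m latitude circles, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the m_i-th latitude circle is α_m, wherein 1 ≤ m ≤ L, T_m is a positive integer, and 1 ≤ m_i ≤ T_m; and
    when T_m > 1, the elevation angle difference between any two adjacent latitude circles in the m-th latitude region is α_m.
  14. The apparatus according to claim 13, wherein the n-th latitude region of the L latitude regions comprises T_n latitude circles, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the n_i-th latitude circle is α_n, wherein 1 ≤ n ≤ L, T_n is a positive integer, and 1 ≤ n_i ≤ T_n;
    when T_n > 1, the elevation angle difference between any two adjacent latitude circles in the n-th latitude region is α_n; and
    α_n = α_m or α_n ≠ α_m, wherein n ≠ m.
  15. The apparatus according to claim 13, wherein the c-th latitude region of the L latitude regions comprises T_c latitude circles, one of the T_c latitude circles is the equatorial latitude circle, and the horizontal angle difference between adjacent virtual speakers, among the K virtual speakers, distributed on the c_i-th latitude circle is α_c, wherein 1 ≤ c ≤ L, T_c is a positive integer, and 1 ≤ c_i ≤ T_c;
    when T_c > 1, the elevation angle difference between any two adjacent latitude circles in the c-th latitude region is α_c; and
    α_c < α_m, wherein c ≠ m.
  16. The apparatus according to any one of claims 13 to 15, wherein the F virtual speakers satisfy the following condition:
    the horizontal angle difference α_mi between adjacent virtual speakers, among the F virtual speakers, distributed on the m_i-th latitude circle is greater than α_m.
  17. The apparatus according to claim 16, wherein α_mi = q × α_m, and q is a positive integer greater than 1.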
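  18. The apparatus according to claim 12, wherein the correlation R_fk between the k-th virtual speaker of the K virtual speakers and the target virtual speaker satisfies the following formula:
    [Formula image: PCTCN2022078824-appb-100005]
    wherein θ denotes the horizontal angle of the target virtual speaker, [image PCTCN2022078824-appb-100006] denotes the elevation angle of the target virtual speaker, [image PCTCN2022078824-appb-100007] denotes the HOA coefficients of the target virtual speaker, and [image PCTCN2022078824-appb-100008] denotes the HOA coefficients of the k-th virtual speaker.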
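  19. An audio processing device, comprising:
    one or more processors; and
    a memory, configured to store one or more programs,
    wherein when the one or more programs are executed by the one or more processors, the one or more processors are enabled to implement the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, comprising a computer program, wherein when the computer program is executed on a computer, the computer is enabled to perform the method according to any one of claims 1 to 9.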
PCT/CN2022/078824 2021-03-05 2022-03-02 Method and apparatus for determining virtual speaker set WO2022184097A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
AU2022230620A AU2022230620A1 (en) 2021-03-05 2022-03-02 Method and apparatus for determining virtual speaker set
JP2023553928A JP2024512347A (ja) 2021-03-05 2022-03-02 Method and apparatus for determining virtual speaker set
EP22762560.5A EP4294056A1 (en) 2021-03-05 2022-03-02 Virtual speaker set determination method and device
KR1020237033855A KR20230154241A (ko) 2021-03-05 2022-03-02 Virtual speaker set determination method and device
BR112023017996A BR112023017996A2 (pt) 2021-03-05 2022-03-02 Method and apparatus for determining virtual speaker set
US18/241,698 US20230412981A1 (en) 2021-03-05 2023-09-01 Method and apparatus for determining virtual speaker set

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110247466.1 2021-03-05
CN202110247466.1A CN115038028B (zh) Method and apparatus for determining virtual speaker set

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/241,698 Continuation US20230412981A1 (en) 2021-03-05 2023-09-01 Method and apparatus for determining virtual speaker set

Publications (1)

Publication Number Publication Date
WO2022184097A1 true WO2022184097A1 (zh) 2022-09-09

Family

ID=83117702

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078824 WO2022184097A1 (zh) 2021-03-05 2022-03-02 Method and apparatus for determining virtual speaker set

Country Status (9)

Country Link
US (1) US20230412981A1 (zh)
EP (1) EP4294056A1 (zh)
JP (1) JP2024512347A (zh)
KR (1) KR20230154241A (zh)
CN (3) CN117061983A (zh)
AU (1) AU2022230620A1 (zh)
BR (1) BR112023017996A2 (zh)
TW (2) TW202410705A (zh)
WO (1) WO2022184097A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103618986A (zh) * 2013-11-19 2014-03-05 Shenzhen New Generation Information Technology Research Institute Co., Ltd. Method and device for extracting the sound image body of a sound source in 3D space
CN105637901A (zh) * 2013-10-07 2016-06-01 Dolby Laboratories Licensing Corporation Spatial audio processing system and method
EP3209036A1 (en) * 2016-02-19 2017-08-23 Thomson Licensing Method, computer readable storage medium, and apparatus for determining a target sound scene at a target position from two or more source sound scenes
JP2018157309A (ja) * 2017-03-16 2018-10-04 Yamaha Corporation Microphone array

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2645748A1 (en) * 2012-03-28 2013-10-02 Thomson Licensing Method and apparatus for decoding stereo loudspeaker signals from a higher-order Ambisonics audio signal

Also Published As

Publication number Publication date
CN115038028A (zh) 2022-09-09
BR112023017996A2 (pt) 2023-11-14
CN117061983A (zh) 2023-11-14
KR20230154241A (ko) 2023-11-07
TW202410705A (zh) 2024-03-01
AU2022230620A1 (en) 2023-09-21
TW202245487A (zh) 2022-11-16
CN116980818A (zh) 2023-10-31
TWI816313B (zh) 2023-09-21
JP2024512347A (ja) 2024-03-19
CN115038028B (zh) 2023-07-28
EP4294056A1 (en) 2023-12-20
US20230412981A1 (en) 2023-12-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22762560

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023553928

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 2022230620

Country of ref document: AU

Ref document number: AU2022230620

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 2022762560

Country of ref document: EP

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112023017996

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2022230620

Country of ref document: AU

Date of ref document: 20220302

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2022762560

Country of ref document: EP

Effective date: 20230911

ENP Entry into the national phase

Ref document number: 20237033855

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020237033855

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 112023017996

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20230905