CN106415712B - Device and method for rendering higher-order ambisonic coefficients

Info

Publication number: CN106415712B
Application number: CN201580028070.0A
Authority: CN
Inventors: N. G. Peters; D. Sen; M. J. Morrell
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2014-05-30
Filing date: 2015-05-29
Publication date: 2019-11-15
Anticipated expiration: 2035-05-29
Also published as: ES2699657T3; HUE042058T2; CN110827839A; CN106415712A; CN110827839B; BR112016028215B1; EP3149971A1; JP6297721B2; EP3149971B1; KR101818877B1; WO2015184307A1; JP2017520177A; KR20170015897A; CA2949108C; BR112016028215A2; CA2949108A1

Abstract

In general, techniques are described for obtaining audio rendering information in a bitstream. A device configured to render higher-order ambisonic coefficients, comprising a processor and a memory, may perform the techniques. The processor may be configured to obtain sparsity information indicative of a sparsity of a matrix used to render the higher-order ambisonic coefficients to a plurality of loudspeaker feeds. The memory may be configured to store the sparsity information.

Description

Device and method for rendering higher-order ambisonic coefficients

This application claims the benefit of U.S. Provisional Application No. 62/023,662, entitled "SIGNALING AUDIO RENDERING INFORMATION IN A BITSTREAM," filed July 11, 2014, and U.S. Provisional Application No. 62/005,829, entitled "SIGNALING AUDIO RENDERING INFORMATION IN A BITSTREAM," filed May 30, 2014, the entire contents of each of which are hereby incorporated by reference as if set forth in their respective entireties herein.

Technical field

This disclosure relates to rendering information and, more specifically, to rendering information for higher-order ambisonic (HOA) audio data.

Background

During the production of audio content, a sound engineer may render the audio content using a specific renderer in an attempt to tailor the audio content for the target configuration of loudspeakers used to reproduce it. In other words, the sound engineer may render the audio content and play back the rendered audio content using loudspeakers arranged in the target configuration. The sound engineer may then remix various aspects of the audio content, render the remixed audio content, and again play back the rendered, remixed audio content using the loudspeakers arranged in the target configuration. The sound engineer may iterate in this manner until the audio content conveys a certain artistic intent. In this way, the sound engineer may produce audio content that conveys a certain artistic intent or otherwise provides a certain sound field during playback (e.g., to accompany video content played along with the audio content).

Summary of the invention

In general, techniques are described for specifying audio rendering information in a bitstream representative of audio data. In other words, the techniques may provide a way to signal to a playback device the audio rendering information used during audio content production; the playback device may then use the audio rendering information to render the audio content. Providing the rendering information in this manner enables the playback device to render the audio content in the manner intended by the sound engineer, and thereby potentially ensures appropriate playback of the audio content such that the artistic intent is potentially understood by a listener. In other words, the rendering information used by the sound engineer during rendering is provided in accordance with the techniques described in this disclosure so that the audio playback device may use the rendering information to render the audio content in the manner intended by the sound engineer, thereby ensuring a more consistent experience during both production and playback of the audio content in comparison to systems that do not provide this audio rendering information.

In one aspect, a device configured to render higher-order ambisonic coefficients comprises: one or more processors configured to obtain sparsity information indicative of a sparsity of a matrix used to render the higher-order ambisonic coefficients to a plurality of loudspeaker feeds; and a memory configured to store the sparsity information.

In another aspect, a method of rendering higher-order ambisonic coefficients comprises: obtaining sparsity information indicative of a sparsity of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of loudspeaker feeds.

In another aspect, a device configured to generate a bitstream comprises: a memory configured to store a matrix; and one or more processors configured to obtain sparsity information indicative of a sparsity of the matrix, the matrix being used to render higher-order ambisonic coefficients to generate a plurality of loudspeaker feeds.

In another aspect, a method of generating a bitstream comprises: obtaining sparsity information indicative of a sparsity of a matrix used to render higher-order ambisonic coefficients to generate a plurality of loudspeaker feeds.

In another aspect, a device configured to render higher-order ambisonic coefficients comprises: one or more processors configured to obtain sign symmetry information indicative of a sign symmetry of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of loudspeaker feeds; and a memory configured to store the sparsity information.

In another aspect, a method of rendering higher-order ambisonic coefficients comprises: obtaining sign symmetry information indicative of a sign symmetry of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of loudspeaker feeds.

In another aspect, a device configured to generate a bitstream comprises: a memory configured to store a matrix used to render higher-order ambisonic coefficients to generate a plurality of loudspeaker feeds; and one or more processors configured to obtain sign symmetry information indicative of a sign symmetry of the matrix.

The details of one or more aspects of the techniques are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques will be apparent from the description and drawings, and from the claims.

Brief description of the drawings

Fig. 1 is a diagram illustrating spherical harmonic basis functions of various orders and sub-orders.

Fig. 2 is a diagram illustrating a system that may perform various aspects of the techniques described in this disclosure.

Fig. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device shown in the example of Fig. 2 that may perform various aspects of the techniques described in this disclosure.

Fig. 4 is a block diagram illustrating the audio decoding device of Fig. 2 in more detail.

Fig. 5 is a flowchart illustrating example operation of an audio encoding device in performing various aspects of the vector-based synthesis techniques described in this disclosure.

Fig. 6 is a flowchart illustrating example operation of an audio decoding device in performing various aspects of the techniques described in this disclosure.

Fig. 7 is a flowchart illustrating example operation of a system, such as one of the systems shown in the example of Fig. 2, in performing various aspects of the techniques described in this disclosure.

Figs. 8A to 8D are diagrams illustrating bitstreams formed in accordance with the techniques described in this disclosure.

Figs. 8E to 8G are diagrams illustrating, in more detail, portions of the bitstream or side channel information that may specify the compressed spatial components.

Fig. 9 is a diagram illustrating an example of HOA-order-dependent minimum and maximum gains within a higher-order ambisonic (HOA) rendering matrix.

Fig. 10 is a diagram illustrating a partially sparse 6th-order HOA rendering matrix for 22 loudspeakers.

Fig. 11 is a flowchart illustrating the signaling of symmetry properties.

Detailed description

The evolution of surround sound has made many output formats available for entertainment. Examples of such consumer surround sound formats are mostly "channel" based, in that they implicitly specify feeds to loudspeakers at certain geometrical coordinates. Consumer surround sound formats include the popular 5.1 format (which includes the following six channels: front left (FL), front right (FR), center or front center, back left or surround left, back right or surround right, and low-frequency effects (LFE)), the growing 7.1 format, and various formats that include height speakers, such as the 7.1.4 format and the 22.2 format (e.g., for use with the Ultra High Definition Television standard). Non-consumer formats can span any number of loudspeakers (in symmetric and non-symmetric geometries) and are often termed "surround arrays." One example of such an array includes 32 loudspeakers positioned at coordinates on the corners of a truncated icosahedron.

The input to a future MPEG encoder is optionally one of three possible formats: (i) traditional channel-based audio (as discussed above), which is meant to be played through loudspeakers at pre-specified positions; (ii) object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates (among other information); and (iii) scene-based audio, which involves representing the sound field using coefficients of spherical harmonic basis functions (also called "spherical harmonic coefficients" or SHC, "higher-order ambisonics" or HOA, and "HOA coefficients"). The future MPEG encoder is described in more detail in a document entitled "Call for Proposals for 3D Audio," by the International Organization for Standardization/International Electrotechnical Commission (ISO)/(IEC) JTC1/SC29/WG11/N13411, released in January 2013 in Geneva, Switzerland, and available at http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w13411.zip.

There are various channel-based "surround sound" formats in the market. They range, for example, from the 5.1 home theatre system (which has been the most successful in terms of making inroads into living rooms beyond stereo) to the 22.2 system developed by NHK (Nippon Hoso Kyokai, or Japan Broadcasting Corporation). Content creators (e.g., Hollywood studios) would like to produce the soundtrack for a movie once, and not spend effort remixing it for each loudspeaker configuration. Recently, Standards Developing Organizations have been considering ways to provide an encoding into a standardized bitstream and a subsequent decoding that is adaptable and agnostic to the loudspeaker geometry (and number) and acoustic conditions at the location of the playback (involving a renderer).

To provide such flexibility for content creators, a hierarchical set of elements may be used to represent a sound field. The hierarchical set of elements may refer to a set of elements in which the elements are ordered such that a basic set of lower-ordered elements provides a full representation of the modeled sound field. As the set is extended to include higher-order elements, the representation becomes more detailed, increasing resolution.

One example of a hierarchical set of elements is a set of spherical harmonic coefficients (SHC). The following expression demonstrates a description or representation of a sound field using SHC:

$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty}\left[4\pi\sum_{n=0}^{\infty} j_n(k r_r)\sum_{m=-n}^{n} A_n^m(k)\,Y_n^m(\theta_r, \varphi_r)\right]e^{j\omega t}$$

The expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the sound field, at time $t$, can be represented uniquely by the SHC, $A_n^m(k)$. Here, $k = \omega/c$, $c$ is the speed of sound (~343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions of order $n$ and suborder $m$. It can be recognized that the term in square brackets is a frequency-domain representation of the signal (i.e., $S(\omega, r_r, \theta_r, \varphi_r)$), which can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform. Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multiresolution basis functions.

Fig. 1 is a diagram illustrating spherical harmonic basis functions from the zeroth order (n = 0) to the fourth order (n = 4). As can be seen, for each order there is an expansion of suborders m, which are shown but not explicitly noted in the example of Fig. 1 for ease of illustration.

The SHC $A_n^m(k)$ can either be physically acquired (e.g., recorded) by various microphone array configurations or, alternatively, be derived from channel-based or object-based descriptions of the sound field. The SHC represent scene-based audio, where the SHC may be input to an audio encoder to obtain encoded SHC that may promote more efficient transmission or storage. For example, a fourth-order representation involving (1+4)² (25, and hence fourth-order) coefficients may be used.

As noted above, the SHC may be derived from a microphone recording using a microphone array. Various examples of how SHC may be derived from microphone arrays are described in Poletti, M., "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics," J. Audio Eng. Soc., Vol. 53, No. 11, November 2005, pp. 1004-1025.

To illustrate how the SHC may be derived from an object-based description, consider the following equation. The coefficients $A_n^m(k)$ for the sound field corresponding to an individual audio object may be expressed as:

$$A_n^m(k) = g(\omega)\,(-4\pi i k)\,h_n^{(2)}(k r_s)\,Y_n^{m*}(\theta_s, \varphi_s)$$

where $i$ is $\sqrt{-1}$, $h_n^{(2)}(\cdot)$ is the spherical Hankel function (of the second kind) of order $n$, and $\{r_s, \theta_s, \varphi_s\}$ is the location of the object. Knowing the object source energy $g(\omega)$ as a function of frequency (e.g., using time-frequency analysis techniques, such as performing a fast Fourier transform on the PCM stream) allows each PCM object and its corresponding location to be converted into the SHC $A_n^m(k)$. Further, it can be shown (since the above is a linear and orthogonal decomposition) that the $A_n^m(k)$ coefficients for each object are additive. In this manner, a multitude of PCM objects can be represented by the $A_n^m(k)$ coefficients (e.g., as a sum of the coefficient vectors for the individual objects). Essentially, the coefficients contain information about the sound field (the pressure as a function of 3D coordinates), and the above represents the transformation from individual objects to a representation of the overall sound field in the vicinity of the observation point $\{r_r, \theta_r, \varphi_r\}$. The remaining figures are described below in the context of object-based and SHC-based audio coding.
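As a hedged worked example, the lowest-order coefficient of a single object can be evaluated in closed form. The derivation assumes the point-source expression $A_n^m(k) = g(\omega)(-4\pi i k)\,h_n^{(2)}(k r_s)\,Y_n^{m*}(\theta_s, \varphi_s)$, the orthonormal convention $Y_0^0 = 1/\sqrt{4\pi}$, and the closed form $h_0^{(2)}(x) = i e^{-ix}/x$. The normalization convention is an assumption for illustration; the text does not fix one.

```latex
\begin{aligned}
A_0^0(k) &= g(\omega)\,(-4\pi i k)\,h_0^{(2)}(k r_s)\,Y_0^{0*}(\theta_s, \varphi_s) \\
         &= g(\omega)\,(-4\pi i k)\,\frac{i\,e^{-i k r_s}}{k r_s}\,\frac{1}{\sqrt{4\pi}} \\
         &= g(\omega)\,\sqrt{4\pi}\,\frac{e^{-i k r_s}}{r_s}
\end{aligned}
```

Under these assumptions the zeroth-order term carries only the $1/r_s$ attenuation and the propagation phase of the object; directional dependence enters through the terms with $n \geq 1$. For several objects, additivity means the combined $A_0^0(k)$ is simply the sum of the per-object results.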

Fig. 2 is a diagram illustrating a system 10 that may perform various aspects of the techniques described in this disclosure. As shown in the example of Fig. 2, the system 10 includes a content creator device 12 and a content consumer device 14. While described in the context of the content creator device 12 and the content consumer device 14, the techniques may be implemented in any context in which SHC (which may also be referred to as HOA coefficients) or any other hierarchical representation of a sound field is encoded to form a bitstream representative of audio data. Moreover, the content creator device 12 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, or a desktop computer, to provide a few examples. Likewise, the content consumer device 14 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, a set-top box, or a desktop computer, to provide a few examples.

The content creator device 12 may be operated by a movie studio or other entity that may generate multi-channel audio content for consumption by operators of content consumer devices, such as the content consumer device 14. In some examples, the content creator device 12 may be operated by an individual user who would like to compress HOA coefficients 11. Often, the content creator generates audio content in conjunction with video content. The content consumer device 14 may be operated by an individual. The content consumer device 14 may include an audio playback system 16, which may refer to any form of audio playback system capable of rendering SHC for playback as multi-channel audio content.

The content creator device 12 includes an audio editing system 18. The content creator device 12 obtains live recordings 7 in various formats (including directly as HOA coefficients) and audio objects 9, which the content creator device 12 may edit using the audio editing system 18. A microphone 5 may capture the live recordings 7. During the editing process, the content creator may render HOA coefficients 11 from the audio objects 9 and listen to the rendered loudspeaker feeds in an attempt to identify various aspects of the sound field that require further editing. The content creator device 12 may then edit the HOA coefficients 11 (potentially indirectly, through manipulation of different ones of the audio objects 9 from which the source HOA coefficients may be derived in the manner described above). The content creator device 12 may employ the audio editing system 18 to generate the HOA coefficients 11. The audio editing system 18 represents any system capable of editing audio data and outputting the audio data as one or more source spherical harmonic coefficients.

When the editing process is complete, the content creator device 12 may generate a bitstream 21 based on the HOA coefficients 11. That is, the content creator device 12 includes an audio encoding device 20, which represents a device configured to encode or otherwise compress the HOA coefficients 11 in accordance with various aspects of the techniques described in this disclosure to generate the bitstream 21. The audio encoding device 20 may generate the bitstream 21 for transmission, as one example, across a transmission channel, which may be a wired or wireless channel, a data storage device, or the like. The bitstream 21 may represent an encoded version of the HOA coefficients 11 and may include a primary bitstream and another side bitstream, which may be referred to as side channel information.

While shown in Fig. 2 as being transmitted directly to the content consumer device 14, the bitstream 21 may instead be output by the content creator device 12 to an intermediate device positioned between the content creator device 12 and the content consumer device 14. The intermediate device may store the bitstream 21 for later delivery to the content consumer device 14, which may request the bitstream. The intermediate device may comprise a file server, a web server, a desktop computer, a laptop computer, a tablet computer, a mobile phone, a smartphone, or any other device capable of storing the bitstream 21 for later retrieval by an audio decoder. The intermediate device may reside in a content delivery network capable of streaming the bitstream 21 (possibly in conjunction with transmitting a corresponding video data bitstream) to subscribers, such as the content consumer device 14, that request the bitstream 21.

Alternatively, the content creator device 12 may store the bitstream 21 to a storage medium, such as a compact disc, a digital video disc, a high-definition video disc, or other storage media, most of which can be read by a computer and may therefore be referred to as computer-readable storage media or non-transitory computer-readable storage media. In this context, the transmission channel may refer to the channels by which content stored to the media is transmitted (and may include retail stores and other store-based delivery mechanisms). In any event, the techniques of this disclosure should not be limited in this respect to the example of Fig. 2.

As further shown in the example of Fig. 2, the content consumer device 14 includes the audio playback system 16. The audio playback system 16 may represent any audio playback system capable of playing back multi-channel audio data. The audio playback system 16 may include a number of different renderers 22. The renderers 22 may each provide a different form of rendering, where the different forms of rendering may include one or more of the various ways of performing vector-base amplitude panning (VBAP) and/or one or more of the various ways of performing sound field synthesis. As used herein, "A and/or B" means "A or B", or both "A and B".

The audio playback system 16 may further include an audio decoding device 24. The audio decoding device 24 may represent a device configured to decode HOA coefficients 11' from the bitstream 21, where the HOA coefficients 11' may be similar to the HOA coefficients 11 but differ due to lossy operations (e.g., quantization) and/or transmission via the transmission channel. After decoding the bitstream 21 to obtain the HOA coefficients 11', the audio playback system 16 may render the HOA coefficients 11' to output loudspeaker feeds 25. The loudspeaker feeds 25 may drive one or more loudspeakers (which are not shown in the example of Fig. 2 for ease of illustration).

To select the appropriate renderer or, in some instances, generate an appropriate renderer, the audio playback system 16 may obtain loudspeaker information 13 indicative of the number of loudspeakers and/or the spatial geometry of the loudspeakers. In some instances, the audio playback system 16 may obtain the loudspeaker information 13 by using a reference microphone and driving the loudspeakers in such a manner as to dynamically determine the loudspeaker information 13. In other instances, or in conjunction with the dynamic determination of the loudspeaker information 13, the audio playback system 16 may prompt a user to interface with the audio playback system 16 and input the loudspeaker information 13.

The audio playback system 16 may then select one of the audio renderers 22 based on the loudspeaker information 13. In some instances, when none of the audio renderers 22 is within some threshold similarity measure (in terms of loudspeaker geometry) of the loudspeaker geometry specified in the loudspeaker information 13, the audio playback system 16 may generate one of the audio renderers 22 based on the loudspeaker information 13. The audio playback system 16 may, in some instances, generate one of the audio renderers 22 based on the loudspeaker information 13 without first attempting to select an existing one of the audio renderers 22. One or more loudspeakers 3 may then play back the rendered loudspeaker feeds 25.
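The select-or-generate decision described above can be sketched as follows. This is a minimal illustration, not the method of this disclosure: the similarity measure (mean great-circle angle from each actual loudspeaker to the nearest loudspeaker of a candidate layout), the 10-degree threshold, and the names `layout_distance` and `select_renderer` are all assumptions made for the example.

```python
import math
from typing import Dict, List, Optional, Tuple

# A layout is a list of (azimuth_deg, elevation_deg) pairs, one per loudspeaker.
Layout = List[Tuple[float, float]]


def angular_distance_deg(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle angle in degrees between two loudspeaker directions."""
    az1, el1 = map(math.radians, a)
    az2, el2 = map(math.radians, b)
    cos_angle = (math.sin(el1) * math.sin(el2)
                 + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def layout_distance(actual: Layout, candidate: Layout) -> float:
    """Hypothetical similarity measure: mean angle to the nearest candidate speaker."""
    if len(actual) != len(candidate):
        return math.inf  # different loudspeaker counts never match in this sketch
    return sum(min(angular_distance_deg(s, c) for c in candidate)
               for s in actual) / len(actual)


def select_renderer(actual: Layout,
                    renderers: Dict[str, Layout],
                    threshold_deg: float = 10.0) -> Optional[str]:
    """Return the closest renderer name, or None if a new renderer must be generated."""
    best_name, best_dist = None, math.inf
    for name, layout in renderers.items():
        dist = layout_distance(actual, layout)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold_deg else None


renderers = {
    "stereo": [(30.0, 0.0), (-30.0, 0.0)],
    "5.0": [(30.0, 0.0), (-30.0, 0.0), (0.0, 0.0), (110.0, 0.0), (-110.0, 0.0)],
}
print(select_renderer([(28.0, 0.0), (-33.0, 0.0)], renderers))  # close to "stereo"
print(select_renderer([(90.0, 0.0), (-90.0, 0.0)], renderers))  # nothing within threshold
```

A `None` result corresponds to the case where no existing renderer is similar enough and a renderer would be generated from the loudspeaker information instead.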

In some cases, any of 16 selectable audio renderer 22 of audio playback systems, and can be configured To depend on the source (such as DVD player, Blu-ray player, smart phone, tablet computer, the game that therefrom receive bit stream 21 One or more of system and TV (several examples are provided)) selection sound renderer 22.Although selectable audio renderer 22 Any one of, but be attributed to by creator of content 12 using this one in sound renderer (that is, being in the example of fig. 3 Sound renderer 5) creation content the fact, is usually provided preferably (and most preferably possible) when create content using sound renderer Rendering form.Identical or at least close to (for rendering form) one of sound renderer 22 is selected to can provide sound field Preferable expression, and preferable surround sound can be brought to experience for content consumer 14.

In accordance with the techniques described in this disclosure, the audio encoding device 20 may generate the bitstream 21 to include audio rendering information 2 ("rendering information 2"). The audio rendering information 2 may include a signal value identifying the audio renderer used when generating the multi-channel audio content (i.e., the audio renderer 1 in the example of Fig. 3). In some instances, the signal value includes a matrix used to render spherical harmonic coefficients to a plurality of loudspeaker feeds.

In some instances, the signal value includes two or more bits that define an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of loudspeaker feeds. In some instances, when an index is used, the signal value further includes two or more bits that define the number of rows of the matrix included in the bitstream and two or more bits that define the number of columns of the matrix included in the bitstream. Using this information, and given that each coefficient of the two-dimensional matrix is typically defined by a 32-bit floating-point number, the size of the matrix in bits may be computed as a function of the number of rows, the number of columns, and the size of the floating-point number defining each coefficient of the matrix (i.e., 32 bits in this example).
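The size computation just described amounts to a single product. A minimal sketch follows; the function name and the choice of one row per loudspeaker and one column per HOA coefficient are assumptions for illustration.

```python
def rendering_matrix_size_bits(num_rows: int, num_cols: int,
                               bits_per_coefficient: int = 32) -> int:
    """Size in bits of a dense matrix whose coefficients are fixed-width floats."""
    if num_rows < 0 or num_cols < 0:
        raise ValueError("matrix dimensions must be non-negative")
    return num_rows * num_cols * bits_per_coefficient


# Assumed example: 22 loudspeakers (rows) and a 6th-order HOA signal,
# i.e. (6 + 1) ** 2 = 49 coefficients (columns).
size_bits = rendering_matrix_size_bits(22, (6 + 1) ** 2)
print(size_bits, "bits =", size_bits // 8, "bytes")
```

For this assumed 22 x 49 matrix the dense payload is 34,496 bits (4,312 bytes), which gives a sense of why properties such as sparsity may be worth signaling.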

In some instances, the signal value specifies a rendering algorithm used to render spherical harmonic coefficients to a plurality of loudspeaker feeds. The rendering algorithm may include a matrix that is known to both the audio encoding device 20 and the decoding device 24. That is, the rendering algorithm may include application of a matrix in addition to other rendering steps, such as panning (e.g., VBAP, DBAP, or simple panning) or NFC filtering. In some instances, the signal value includes two or more bits that define an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of loudspeaker feeds. Again, both the audio encoding device 20 and the decoding device 24 may be configured with information indicating the plurality of matrices and the order of the plurality of matrices, such that the index may uniquely identify a particular one of the plurality of matrices. Alternatively, the audio encoding device 20 may specify data in the bitstream 21 defining the plurality of matrices and/or the order of the plurality of matrices, such that the index may uniquely identify a particular one of the plurality of matrices.

In some instances, the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render spherical harmonic coefficients to a plurality of loudspeaker feeds. Again, both the audio encoding device 20 and the decoding device 24 may be configured with information indicating the plurality of rendering algorithms and the order of the plurality of rendering algorithms, such that the index may uniquely identify a particular one of the plurality of matrices. Alternatively, the audio encoding device 20 may specify data in the bitstream 21 defining the plurality of matrices and/or the order of the plurality of matrices, such that the index may uniquely identify a particular one of the plurality of matrices.

In some instances, the audio encoding device 20 specifies the audio rendering information 2 in the bitstream on a per-audio-frame basis. In other instances, the audio encoding device 20 specifies the audio rendering information 2 a single time in the bitstream.

The decoding device 24 may then determine the audio rendering information 2 specified in the bitstream. Based on the signal value included in the audio rendering information 2, the audio playback system 16 may render a plurality of loudspeaker feeds 25. As noted above, the signal value may in some instances include a matrix used to render spherical harmonic coefficients to a plurality of loudspeaker feeds. In this case, the audio playback system 16 may configure one of the audio renderers 22 with the matrix and use this one of the audio renderers 22 to render the loudspeaker feeds 25 based on the matrix.

In some instances, the signal value includes two or more bits that define an index indicating that the bitstream includes a matrix used to render the HOA coefficients 11' to the loudspeaker feeds 25. The decoding device 24 may parse the matrix from the bitstream in response to the index, whereupon the audio playback system 16 may configure one of the audio renderers 22 with the parsed matrix and invoke this one of the renderers 22 to render the loudspeaker feeds 25. When the signal value includes two or more bits that define the number of rows of the matrix included in the bitstream and two or more bits that define the number of columns of the matrix included in the bitstream, the decoding device 24 may parse the matrix from the bitstream in response to the index and based on the two or more bits that define the number of rows and the two or more bits that define the number of columns, in the manner described above.
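The parsing step can be illustrated with a toy round trip. The layout below is purely hypothetical and is not the syntax of any real bitstream: a 2-bit index whose value 0 is assumed to mean "a matrix follows", an 8-bit row count, an 8-bit column count, and then rows x columns big-endian 32-bit floats. The field widths and all names are assumptions for the sketch.

```python
import struct
from typing import List, Optional

INDEX_BITS = 2            # hypothetical width of the index field
DIM_BITS = 8              # hypothetical width of the row and column count fields
MATRIX_IN_BITSTREAM = 0   # hypothetical index value meaning "a matrix follows"


class BitWriter:
    def __init__(self) -> None:
        self.bits: List[int] = []

    def write(self, value: int, width: int) -> None:
        # Append the value most significant bit first.
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self) -> bytes:
        padded = self.bits + [0] * (-len(self.bits) % 8)  # zero-pad to a byte boundary
        return bytes(int("".join(map(str, padded[i:i + 8])), 2)
                     for i in range(0, len(padded), 8))


class BitReader:
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # position in bits

    def read(self, width: int) -> int:
        value = 0
        for _ in range(width):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def write_rendering_info(matrix: List[List[float]]) -> bytes:
    writer = BitWriter()
    writer.write(MATRIX_IN_BITSTREAM, INDEX_BITS)
    writer.write(len(matrix), DIM_BITS)
    writer.write(len(matrix[0]), DIM_BITS)
    for row in matrix:
        for coeff in row:
            # Reinterpret the 32-bit float as an unsigned integer bit pattern.
            writer.write(struct.unpack(">I", struct.pack(">f", coeff))[0], 32)
    return writer.to_bytes()


def parse_rendering_info(data: bytes) -> Optional[List[List[float]]]:
    reader = BitReader(data)
    if reader.read(INDEX_BITS) != MATRIX_IN_BITSTREAM:
        return None  # the index refers to something other than an embedded matrix
    rows = reader.read(DIM_BITS)
    cols = reader.read(DIM_BITS)
    return [[struct.unpack(">f", struct.pack(">I", reader.read(32)))[0]
             for _ in range(cols)] for _ in range(rows)]


matrix = [[0.5, -0.25, 0.0], [1.0, 0.0, 0.125]]
payload = write_rendering_info(matrix)
print(len(payload), parse_rendering_info(payload) == matrix)
```

The row and column counts are what let the parser know how many 32-bit coefficients to consume, which mirrors the size computation described earlier.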

In some instances, the signal value specifies a rendering algorithm used to render the HOA coefficients 11' to the loudspeaker feeds 25. In these instances, some or all of the audio renderers 22 may perform these rendering algorithms. The audio playback system 16 may then use the specified rendering algorithm (e.g., one of the audio renderers 22) to render the loudspeaker feeds 25 from the HOA coefficients 11'.

When the signal value includes two or more bits that define an index associated with one of a plurality of matrices used to render the HOA coefficients 11' to the loudspeaker feeds 25, some or all of the audio renderers 22 may represent this plurality of matrices. Thus, the audio playback system 16 may render the loudspeaker feeds 25 from the HOA coefficients 11' using the one of the audio renderers 22 associated with the index.
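Index-based signaling works only if encoder and decoder agree on the same ordered table. The sketch below assumes a hypothetical shared table of first-order (four-coefficient) matrices; the table contents, the names, and the rule of using at least two index bits are illustrative assumptions, not values taken from this disclosure.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical ordered table known to both encoder and decoder.
# The coefficient values are placeholders, not tuned rendering matrices.
SHARED_MATRICES: List[Tuple[str, Matrix]] = [
    ("mono", [[1.0, 0.0, 0.0, 0.0]]),
    ("stereo", [[0.5, 0.5, 0.0, 0.0],
                [0.5, -0.5, 0.0, 0.0]]),
    ("front-back", [[0.5, 0.0, 0.0, 0.5],
                    [0.5, 0.0, 0.0, -0.5]]),
]


def index_width_bits(num_entries: int) -> int:
    """Bits needed to address the table, with the text's minimum of two bits."""
    return max(2, (num_entries - 1).bit_length())


def encode_index(name: str) -> int:
    """Encoder side: map a renderer name to its position in the shared table."""
    for index, (entry_name, _) in enumerate(SHARED_MATRICES):
        if entry_name == name:
            return index
    raise KeyError(f"no shared matrix named {name!r}")


def decode_index(index: int) -> Matrix:
    """Decoder side: recover the matrix from the signaled index."""
    if not 0 <= index < len(SHARED_MATRICES):
        raise IndexError("index does not identify a shared matrix")
    return SHARED_MATRICES[index][1]


signaled = encode_index("stereo")
print(index_width_bits(len(SHARED_MATRICES)), signaled, decode_index(signaled))
```

Because only the index travels in the bitstream, the cost is a few bits per frame (or once per stream), at the price of requiring both sides to hold identical tables in identical order.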

When the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render the HOA coefficients 11' to the loudspeaker feeds 25, some or all of the audio renderers 22 may represent these rendering algorithms. Thus, the audio playback system 16 may render the loudspeaker feeds 25 from the spherical harmonic coefficients 11' using the one of the audio renderers 22 associated with the index.

Depending on the frequency with which this audio rendering information is specified in the bitstream, the decoding device 24 may determine the audio rendering information 2 on a per-audio-frame basis or a single time.

By specifying the audio rendering information 2 in this manner, the techniques may potentially result in better reproduction of the multi-channel audio content, in the manner in which the content creator 12 intended the multi-channel audio content to be reproduced. As a result, the techniques may provide a more immersive surround sound or multi-channel audio experience.

In other words, and as noted above, higher-order ambisonics (HOA) may represent a way to describe the directional information of a sound field based on a spatial Fourier transform. Typically, the higher the ambisonics order N, the higher the spatial resolution, the larger the number of spherical harmonic (SH) coefficients, (N+1)^2, and the larger the bandwidth required for transmitting and storing the data.
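The growth of the coefficient count with the order can be tabulated directly. In the sketch below the (N+1)^2 relation comes from the text, while the 48 kHz sample rate and 32 bits per sample used for the uncompressed data rate are assumptions chosen only to make the scaling concrete.

```python
def num_hoa_coefficients(order: int) -> int:
    """Number of spherical harmonic coefficients for ambisonics order N."""
    if order < 0:
        raise ValueError("ambisonics order must be non-negative")
    return (order + 1) ** 2


def raw_bitrate_bps(order: int, sample_rate_hz: int = 48_000,
                    bits_per_sample: int = 32) -> int:
    """Uncompressed data rate, assuming one PCM channel per coefficient."""
    return num_hoa_coefficients(order) * sample_rate_hz * bits_per_sample


for n in range(7):
    print(f"N={n}: {num_hoa_coefficients(n):2d} coefficients, "
          f"{raw_bitrate_bps(n) / 1e6:.1f} Mbit/s uncompressed")
```

Under these assumptions a fourth-order signal already needs 25 channels, which is the motivation for compressing the coefficients before transmission or storage.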

One potential advantage of this description is the possibility of reproducing the sound field on almost any loudspeaker setup (for example, 5.1, 7.1 or 22.2). The conversion from the sound field description into M loudspeaker signals may be done via a static rendering matrix with (N+1)² inputs and M outputs. Consequently, every loudspeaker setup may require a dedicated rendering matrix. Several algorithms may exist for computing the rendering matrix for a desired loudspeaker setup, and these may be optimized for certain objective or subjective measures, such as the Gerzon criteria. For irregular loudspeaker setups, the algorithms may become complex due to iterative numerical optimization procedures, such as convex optimization. To compute a rendering matrix for an irregular loudspeaker layout without waiting time, it may be beneficial to have sufficient computational resources available. Irregular loudspeaker setups may be common in domestic living-room environments due to architectural constraints and aesthetic preferences. Therefore, for the best sound field reproduction, a rendering matrix optimized for such a scenario may be preferred, because it may enable a more accurate reproduction of the sound field.
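The conversion via a static rendering matrix with (N+1)² inputs and M outputs may be sketched as a plain matrix-vector product. The function name render_sample and the gain values below are made up for illustration; they do not correspond to an optimized rendering matrix for any real loudspeaker setup.

```python
def render_sample(rendering_matrix, hoa_sample):
    """Map one HOA sample ((N+1)^2 values) to M loudspeaker values.

    rendering_matrix: M rows, each with (N+1)^2 gains.
    hoa_sample:       list of (N+1)^2 HOA coefficients for one time instant.
    """
    n_inputs = len(hoa_sample)
    for row in rendering_matrix:
        if len(row) != n_inputs:
            raise ValueError("each matrix row needs (N+1)^2 gains")
    # Each loudspeaker feed is a weighted sum of the HOA coefficients.
    return [sum(g * c for g, c in zip(row, hoa_sample))
            for row in rendering_matrix]


# First-order example (4 inputs) rendered to M = 2 outputs, with made-up gains.
matrix = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, -0.5, 0.0, 0.0],
]
feeds = render_sample(matrix, [1.0, 0.5, 0.0, 0.0])
```

Because the matrix is static, a different loudspeaker setup would simply use a different matrix with the same number of columns.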

Because audio decoders usually do not have many computational resources, the device may not be able to compute an irregular rendering matrix in a consumer-friendly time. Various aspects of the techniques described in this disclosure may provide for the use of a cloud-based computing approach, as follows:

1. The audio decoder may send, via an Internet connection, the loudspeaker coordinates (and, in some instances, also SPL measurements obtained with a calibration microphone) to a server;

2. The cloud-based server may compute the rendering matrix (and possibly a few different versions, so that the consumer may later choose from these different versions); and

3. The server may then send the rendering matrix (or the different versions) back to the audio decoder via the Internet connection.

This approach may allow the manufacturer to keep the manufacturing cost of the audio decoder low (because a powerful processor may not be needed to compute these irregular rendering matrices), while also facilitating better audio reproduction than rendering matrices usually designed for regular speaker configurations or geometries. The algorithm for computing the rendering matrix may also be optimized after an audio decoder has shipped, potentially reducing the costs of hardware revisions or even recalls. The techniques may also, in some instances, gather a great deal of information about the different loudspeaker setups of consumer products, which may be beneficial for future product development.

In some instances, the system shown in Fig. 3 may not signal the audio rendering information 2 in the bitstream 21 as described above, but may instead signal this audio rendering information 2 as metadata separate from the bitstream 21. Alternatively, or in conjunction with what is described above, the system shown in Fig. 3 may signal a portion of the audio rendering information 2 in the bitstream 21 as described above and signal a portion of this audio rendering information 2 as metadata separate from the bitstream 21. In some examples, the audio encoding device 20 may output this metadata, which may then be uploaded to a server or other device. The audio decoding device 24 may then download or otherwise retrieve this metadata, which is then used to augment the audio rendering information extracted from the bitstream 21 by the audio decoding device 24. The bitstream 21 formed in accordance with the rendering information aspects of the techniques is described below with respect to the examples of Figs. 8A to 8D.

Fig. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device 20 shown in the example of Fig. 2 that may perform various aspects of the techniques described in this disclosure. The audio encoding device 20 includes a content analysis unit 26, a vector-based decomposition unit 27 and a direction-based decomposition unit 28. Although described briefly below, more information regarding the audio encoding device 20 and the various aspects of compressing or otherwise encoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.

The content analysis unit 26 represents a unit configured to analyze the content of the HOA coefficients 11 to identify whether the HOA coefficients 11 represent content generated from a live recording or content generated from an audio object. The content analysis unit 26 may determine whether the HOA coefficients 11 were generated from a recording of an actual sound field or from an artificial audio object. In some instances, when the framed HOA coefficients 11 were generated from a recording, the content analysis unit 26 passes the HOA coefficients 11 to the vector-based decomposition unit 27. In some instances, when the framed HOA coefficients 11 were generated from a synthetic audio object, the content analysis unit 26 passes the HOA coefficients 11 to the direction-based synthesis unit 28. The direction-based synthesis unit 28 may represent a unit configured to perform a direction-based synthesis of the HOA coefficients 11 to generate a direction-based bitstream 21.

As shown in the example of Fig. 3, the vector-based decomposition unit 27 may include a linear invertible transform (LIT) unit 30, a parameter calculation unit 32, a reorder unit 34, a foreground selection unit 36, an energy compensation unit 38, a psychoacoustic audio coder unit 40, a bitstream generation unit 42, a sound field analysis unit 44, a coefficient reduction unit 46, a background (BG) selection unit 48, a spatio-temporal interpolation unit 50 and a quantization unit 52.

The linear invertible transform (LIT) unit 30 receives the HOA coefficients 11 in the form of HOA channels, each channel representing a block or frame of coefficients associated with a given order and sub-order of the spherical basis functions (which may be denoted as HOA[k], where k may denote the current frame or block of samples). The matrix of HOA coefficients 11 may have dimensions D: M × (N+1)².

The LIT unit 30 may represent a unit configured to perform a form of analysis referred to as singular value decomposition (SVD). While described with respect to SVD, the techniques described in this disclosure may be performed with respect to any similar transformation or decomposition that provides sets of linearly uncorrelated, energy-compacted output. Also, reference to "sets" in this disclosure is generally intended to refer to non-zero sets (unless specifically stated to the contrary) and is not intended to refer to the classical mathematical definition of sets that includes the so-called "empty set." An alternative transformation may comprise a principal component analysis, often referred to as "PCA." Depending on the context, PCA may be referred to by a number of different names, such as the discrete Karhunen-Loève transform, the Hotelling transform, proper orthogonal decomposition (POD) and eigenvalue decomposition (EVD), to name a few. Properties of such operations that are conducive to the underlying goal of compressing audio data are "energy compaction" and "decorrelation" of the multi-channel audio data.

In any event, assuming for purposes of example that the LIT unit 30 performs a singular value decomposition (which, again, may be referred to as "SVD"), the LIT unit 30 may transform the HOA coefficients 11 into two or more sets of transformed HOA coefficients. The "sets" of transformed HOA coefficients may include vectors of transformed HOA coefficients. In the example of Fig. 3, the LIT unit 30 may perform the SVD with respect to the HOA coefficients 11 to generate a so-called V matrix, an S matrix and a U matrix. In linear algebra, the SVD may represent a factorization of a y-by-z real or complex matrix X (where X may represent multi-channel audio data, such as the HOA coefficients 11) in the following form:

X=USV*

U may represent a y-by-y real or complex unitary matrix, where the y columns of U are known as the left-singular vectors of the multi-channel audio data. S may represent a y-by-z rectangular diagonal matrix with non-negative real numbers on the diagonal, where the diagonal values of S are known as the singular values of the multi-channel audio data. V* (which may denote the conjugate transpose of V) may represent a z-by-z real or complex unitary matrix, where the z columns of V* are known as the right-singular vectors of the multi-channel audio data.

In some examples, the V* matrix in the SVD mathematical expression referenced above is denoted as the conjugate transpose of the V matrix to reflect that the SVD may be applied to matrices comprising complex numbers. When applied to matrices comprising only real numbers, the complex conjugate of the V matrix (or, in other words, the V* matrix) may be considered to be the transpose of the V matrix. For ease of illustration below, it is assumed that the HOA coefficients 11 comprise real numbers, with the result that the V matrix, rather than the V* matrix, is output through the SVD. Moreover, while denoted as the V matrix in this disclosure, reference to the V matrix should be understood to refer to the transpose of the V matrix where appropriate. While assumed to be the V matrix, the techniques may be applied in a similar fashion to HOA coefficients 11 having complex coefficients, where the output of the SVD is the V* matrix. Accordingly, the techniques should not be limited in this respect to only applying the SVD to generate a V matrix, but may include applying the SVD to HOA coefficients 11 having complex components to generate a V* matrix.

In this way, the LIT unit 30 may perform the SVD with respect to the HOA coefficients 11 to output US[k] vectors 33 (which may represent a combined version of the S vectors and the U vectors) having dimensions D: M × (N+1)², and V[k] vectors 35 having dimensions D: (N+1)² × (N+1)². Individual vector elements in the US[k] matrix may also be termed X_PS(k), while individual vectors of the V[k] matrix may also be termed v(k).

An analysis of the U, S and V matrices may reveal that the matrices carry or represent spatial and temporal characteristics of the underlying sound field represented above by X. Each of the N vectors in U (of length M samples) may represent normalized separated audio signals as a function of time (for the time period represented by M samples) that are orthogonal to each other and that have been decoupled from any spatial characteristics (which may also be referred to as directional information). The spatial characteristics, representing spatial shape and position, may instead be represented by the individual i-th vectors v⁽ⁱ⁾(k) in the V matrix (each of length (N+1)²). The individual elements of each of the v⁽ⁱ⁾(k) vectors may represent an HOA coefficient describing the shape (including width) and position of the sound field for an associated audio object. The vectors in both the U matrix and the V matrix are normalized such that their root-mean-square energies are equal to unity. The energy of the audio signals in U is thus represented by the diagonal elements in S. Multiplying U and S to form US[k] (with individual vector elements X_PS(k)) thus represents the audio signals with their energies. The ability of the SVD to decouple the audio time signals (in U), their energies (in S) and their spatial characteristics (in V) may support various aspects of the techniques described in this disclosure. Further, the model of synthesizing the underlying HOA[k] coefficients, X, by a vector multiplication of US[k] and V[k] gives rise to the term "vector-based decomposition," which is used throughout this document.
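The decomposition of a real-valued HOA frame into US[k] and V[k], and the synthesis of HOA[k] from their product, may be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function name decompose_frame, the frame length M = 1024, the order N = 1 and the random test data are chosen only for illustration, and the reduced ("economy") form of the SVD is used so that US[k] has dimensions M × (N+1)².

```python
import numpy as np


def decompose_frame(hoa_frame):
    """Split a real M x (N+1)^2 HOA frame into US[k] and V[k] via SVD.

    Returns:
        us: M x (N+1)^2 matrix, U with each column scaled by its singular value.
        v:  (N+1)^2 x (N+1)^2 matrix whose columns are the right-singular vectors.
    """
    u, s, vt = np.linalg.svd(hoa_frame, full_matrices=False)
    us = u * s   # same as u @ np.diag(s): energy folded into the signals
    v = vt.T     # real-valued case: V* is simply the transpose of V
    return us, v


rng = np.random.default_rng(0)
M, N = 1024, 1                                    # assumed frame length and order
hoa_frame = rng.standard_normal((M, (N + 1) ** 2))
us, v = decompose_frame(hoa_frame)
reconstructed = us @ v.T                          # HOA[k] = US[k] V[k]^T
```

In this sketch the columns of V are orthonormal, so the spatial vectors carry no energy of their own, and multiplying US[k] by the transpose of V[k] recovers the original frame up to floating-point error.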

Although described as being performed directly with respect to the HOA coefficients 11, the LIT unit 30 may apply the linear invertible transform to derivatives of the HOA coefficients 11. For example, the LIT unit 30 may apply the SVD with respect to a power spectral density matrix derived from the HOA coefficients 11. By performing the SVD with respect to the power spectral density (PSD) of the HOA coefficients rather than the coefficients themselves, the LIT unit 30 may potentially reduce the computational complexity of performing the SVD in terms of one or more of processor cycles and storage space, while achieving the same source audio coding efficiency as if the SVD were applied directly to the HOA coefficients.

The parameter calculation unit 32 represents a unit configured to calculate various parameters, such as a correlation parameter (R), directional property parameters (θ, φ, r) and an energy property (e). Each of the parameters for the current frame may be denoted as R[k], θ[k], φ[k], r[k] and e[k]. The parameter calculation unit 32 may perform an energy analysis and/or correlation (or so-called cross-correlation) with respect to the US[k] vectors 33 to identify the parameters. The parameter calculation unit 32 may also determine the parameters for the previous frame, where the previous frame parameters may be denoted as R[k-1], θ[k-1], φ[k-1], r[k-1] and e[k-1], based on the previous frame of US[k-1] vectors and V[k-1] vectors. The parameter calculation unit 32 may output the current parameters 37 and the previous parameters 39 to the reorder unit 34.

The parameters calculated by the parameter calculation unit 32 may be used by the reorder unit 34 to reorder the audio objects to represent their natural evaluation or continuity over time. The reorder unit 34 may compare, in turn, each of the parameters 37 from the first US[k] vectors 33 against each of the parameters 39 for the second US[k-1] vectors 33. The reorder unit 34 may reorder (using, as one example, a Hungarian algorithm) the various vectors within the US[k] matrix 33 and the V[k] matrix 35 based on the current parameters 37 and the previous parameters 39 to output a reordered US[k] matrix 33' and a reordered V[k] matrix 35' to a foreground sound (or predominant sound, PS) selection unit 36 ("foreground selection unit 36") and an energy compensation unit 38.
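The reordering step may be illustrated with a small sketch that matches the vectors of the current frame to those of the previous frame. Two assumptions should be noted: an exhaustive search over permutations is used here as a simple stand-in for the Hungarian algorithm named above (it finds the same optimal assignment but is only practical for a handful of vectors), and the matching score (absolute correlation between vectors) is chosen for illustration rather than taken from the parameters 37 and 39. The function names are hypothetical.

```python
from itertools import permutations


def correlation(a, b):
    """Plain dot product used as an illustrative similarity measure."""
    return sum(x * y for x, y in zip(a, b))


def reorder_to_match(previous, current):
    """Return order such that current[order[i]] best matches previous[i].

    Exhaustive search over all assignments; a stand-in for the Hungarian
    algorithm, feasible only for small numbers of vectors.
    """
    best_order, best_score = None, float("-inf")
    for order in permutations(range(len(previous))):
        score = sum(abs(correlation(previous[i], current[j]))
                    for i, j in enumerate(order))
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order)


previous = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
current = [[0.1, 0.9, 0.0], [0.95, 0.05, 0.0]]   # objects swapped between frames
order = reorder_to_match(previous, current)
reordered = [current[j] for j in order]
```

In this example the two vectors have swapped positions between frames, and the computed order restores the continuity of each audio object over time.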

The sound field analysis unit 44 may represent a unit configured to perform a sound field analysis with respect to the HOA coefficients 11 so as to potentially achieve a target bitrate 41. The sound field analysis unit 44 may, based on the analysis and/or on a received target bitrate 41, determine the total number of psychoacoustic coder instantiations (which may be a function of the total number of ambient or background channels (BG_TOT)) and the number of foreground channels (or, in other words, predominant channels). The total number of psychoacoustic coder instantiations may be denoted as numHOATransportChannels.

Again to potentially achieve the target bitrate 41, the sound field analysis unit 44 may also determine the total number of foreground channels (nFG) 45, the minimum order of the background (or, in other words, ambient) sound field (N_BG or, alternatively, MinAmbHOAorder), the corresponding number of actual channels representative of the minimum order of the background sound field (nBGa = (MinAmbHOAorder+1)²), and the indices (i) of additional BG HOA channels to send (which may collectively be denoted as background channel information 43 in the example of Fig. 3). The background channel information 43 may also be referred to as ambient channel information 43. Each of the channels that remains from numHOATransportChannels - nBGa may be an "additional background/ambient channel," an "active vector-based predominant channel," an "active direction-based predominant signal" or "completely inactive." In one aspect, the channel types may be indicated as a syntax element (denoted "ChannelType") by two bits (for example, 00: direction-based signal; 01: vector-based predominant signal; 10: additional ambient signal; 11: inactive signal). The total number of background or ambient signals, nBGa, may be given by (MinAmbHOAorder+1)² plus the number of times the index 10 (in the above example) appears as a channel type in the bitstream for that frame.

The sound field analysis unit 44 may select the number of background (or, in other words, ambient) channels and the number of foreground (or, in other words, predominant) channels based on the target bitrate 41, selecting more background and/or foreground channels when the target bitrate 41 is relatively high (for example, when the target bitrate 41 equals or exceeds 512 Kbps). In one aspect, numHOATransportChannels may be set to 8 while MinAmbHOAorder may be set to 1 in the header section of the bitstream. In this scenario, at every frame, four channels may be dedicated to representing the background or ambient portion of the sound field, while the other four channels may vary in channel type on a frame-by-frame basis, for example, being used as either an additional background/ambient channel or a foreground/predominant channel. The foreground/predominant signals may be either vector-based or direction-based signals, as described above.
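The channel bookkeeping described above may be sketched as follows, using the example values numHOATransportChannels = 8 and MinAmbHOAorder = 1. The function names and the per-frame list of channel types are hypothetical and serve only to illustrate the arithmetic; they are not bitstream parsing code.

```python
# Two-bit ChannelType codes from the example in the text.
DIRECTION_BASED = 0b00
VECTOR_BASED = 0b01
ADDITIONAL_AMBIENT = 0b10
INACTIVE = 0b11


def flexible_channel_count(num_transport_channels, min_amb_hoa_order):
    """Channels left over after the always-present ambient channels."""
    return num_transport_channels - (min_amb_hoa_order + 1) ** 2


def count_ambient_signals(min_amb_hoa_order, flexible_channel_types):
    """Total ambient signals: fixed ones plus channels typed 10."""
    fixed = (min_amb_hoa_order + 1) ** 2
    additional = sum(1 for t in flexible_channel_types
                     if t == ADDITIONAL_AMBIENT)
    return fixed + additional


flexible = flexible_channel_count(8, 1)
# Hypothetical channel types for the flexible channels of one frame.
frame_types = [VECTOR_BASED, VECTOR_BASED, ADDITIONAL_AMBIENT, INACTIVE]
total_ambient = count_ambient_signals(1, frame_types)
vector_based = sum(1 for t in frame_types if t == VECTOR_BASED)
```

With these values, four channels are always ambient, four may change type per frame, and the hypothetical frame shown carries five ambient signals and two vector-based predominant signals.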

In some instances, the total number of vector-based predominant signals for a frame may be given by the number of times the ChannelType index is 01 in the bitstream of that frame. In the above aspect, for every additional background/ambient channel (for example, corresponding to a ChannelType of 10), corresponding information indicating which of the possible HOA coefficients (beyond the first four) is represented in that channel may be signaled. For fourth-order HOA content, the information may be an index indicating one of HOA coefficients 5 to 25. The first four ambient HOA coefficients 1 to 4 may be sent at all times when minAmbHOAorder is set to 1; hence, the audio encoding device may only need to indicate the one of the additional ambient HOA coefficients having an index of 5 to 25. The information could thus be sent using a 5-bit syntax element (for fourth-order content), which may be denoted as "CodedAmbCoeffIdx." In any event, the sound field analysis unit 44 outputs the background channel information 43 and the HOA coefficients 11 to the background (BG) selection unit 48, the background channel information 43 to the coefficient reduction unit 46 and the bitstream generation unit 42, and the nFG 45 to the foreground selection unit 36.
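The 5-bit width stated for fourth-order content may be checked with a short calculation. The general relation used below (the smallest number of bits able to distinguish all candidate indices) is an assumption made for this sketch; the text above only states the 5-bit case. The function name is hypothetical.

```python
def coded_amb_coeff_idx_bits(hoa_order, min_amb_hoa_order):
    """Bits needed to index the HOA coefficients that are not always sent."""
    total = (hoa_order + 1) ** 2               # e.g. 25 for fourth order
    always_sent = (min_amb_hoa_order + 1) ** 2  # e.g. 4 for MinAmbHOAorder 1
    candidates = total - always_sent           # e.g. indices 5 to 25
    if candidates <= 1:
        return 0                               # nothing to distinguish
    # Smallest b with 2**b >= candidates, computed without floating point.
    return (candidates - 1).bit_length()


bits = coded_amb_coeff_idx_bits(hoa_order=4, min_amb_hoa_order=1)
```

For fourth-order content with MinAmbHOAorder set to 1, there are 21 candidate coefficients (indices 5 to 25), which fit in 5 bits.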

The background selection unit 48 may represent a unit configured to determine the background or ambient HOA coefficients 47 based on the background channel information (for example, the background sound field (N_BG) and the number (nBGa) and the indices (i) of additional BG HOA channels to send). For example, when N_BG equals one, the background selection unit 48 may select the HOA coefficients 11 for each sample of the audio frame having an order equal to or less than one. The background selection unit 48 may, in this example, then select the HOA coefficients 11 having an index identified by one of the indices (i) as additional BG HOA coefficients, where the nBGa is provided to the bitstream generation unit 42 to be specified in the bitstream 21 so as to enable the audio decoding device (such as the audio decoding device 24 shown in the examples of Figs. 2 and 4) to parse the background HOA coefficients 47 from the bitstream 21. The background selection unit 48 may then output the ambient HOA coefficients 47 to the energy compensation unit 38. The ambient HOA coefficients 47 may have dimensions D: M × [(N_BG+1)² + nBGa]. The ambient HOA coefficients 47 may also be referred to as "ambient HOA channels 47," where each of the ambient HOA coefficients 47 corresponds to a separate ambient HOA channel 47 to be encoded by the psychoacoustic audio coder unit 40.
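The selection of ambient HOA coefficients may be sketched as a column selection on the frame. The function name select_ambient_channels and the 1-based indexing of the additional channels (chosen to match the "coefficients 5 to 25" numbering used above) are assumptions for this example.

```python
def select_ambient_channels(hoa_frame, n_bg, additional_indices):
    """Keep the coefficients up to order n_bg plus the additional BG channels.

    hoa_frame:          list of M samples, each a list of (N+1)^2 coefficients.
    n_bg:               minimum order of the background sound field.
    additional_indices: 1-based indices of additional BG HOA channels.
    """
    base = (n_bg + 1) ** 2
    width = len(hoa_frame[0])
    for i in additional_indices:
        if not base < i <= width:
            raise ValueError("additional index must lie beyond the base set")
    columns = list(range(base)) + [i - 1 for i in additional_indices]
    return [[sample[c] for c in columns] for sample in hoa_frame]


# Two samples of second-order content (9 coefficients), labeled 1..9.
frame = [[1, 2, 3, 4, 5, 6, 7, 8, 9],
         [10, 20, 30, 40, 50, 60, 70, 80, 90]]
ambient = select_ambient_channels(frame, n_bg=1, additional_indices=[6])
```

Each output sample in this example has (N_BG+1)² + 1 = 5 values: the four coefficients of order one or less plus the one additional channel.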

The foreground selection unit 36 may represent a unit configured to select those of the reordered US[k] matrix 33' and the reordered V[k] matrix 35' that represent foreground or distinct components of the sound field based on nFG 45 (which may represent one or more indices identifying the foreground vectors). The foreground selection unit 36 may output nFG signals 49 (which may be denoted as a reordered US[k]_1,…,nFG 49 or FG_1,…,nFG[k] 49) to the psychoacoustic audio coder unit 40, where the nFG signals 49 may have dimensions D: M × nFG and each represent mono audio objects. The foreground selection unit 36 may also output the reordered V[k] matrix 35' (or v^(1..nFG)(k) 35') corresponding to foreground components of the sound field to the spatio-temporal interpolation unit 50, where the subset of the reordered V[k] matrix 35' corresponding to the foreground components may be denoted as foreground V[k] matrix 51_k having dimensions D: (N+1)² × nFG.

The energy compensation unit 38 may represent a unit configured to perform energy compensation with respect to the ambient HOA coefficients 47 to compensate for energy loss due to the removal of various ones of the HOA channels by the background selection unit 48. The energy compensation unit 38 may perform an energy analysis with respect to one or more of the reordered US[k] matrix 33', the reordered V[k] matrix 35', the nFG signals 49, the foreground V[k] vectors 51_k and the ambient HOA coefficients 47, and then perform energy compensation based on this energy analysis to generate energy-compensated ambient HOA coefficients 47'. The energy compensation unit 38 may output the energy-compensated ambient HOA coefficients 47' to the psychoacoustic audio coder unit 40.

The spatio-temporal interpolation unit 50 may represent a unit configured to receive the foreground V[k] vectors 51_k for the k-th frame and the foreground V[k-1] vectors 51_k-1 for the previous frame (hence the k-1 notation) and perform spatio-temporal interpolation to generate interpolated foreground V[k] vectors. The spatio-temporal interpolation unit 50 may recombine the nFG signals 49 with the foreground V[k] vectors 51_k to recover reordered foreground HOA coefficients. The spatio-temporal interpolation unit 50 may then divide the reordered foreground HOA coefficients by the interpolated V[k] vectors to generate interpolated nFG signals 49'. The spatio-temporal interpolation unit 50 may also output those of the foreground V[k] vectors 51_k that were used to generate the interpolated foreground V[k] vectors, so that an audio decoding device, such as the audio decoding device 24, may generate the interpolated foreground V[k] vectors and thereby recover the foreground V[k] vectors 51_k. The foreground V[k] vectors 51_k used to generate the interpolated foreground V[k] vectors are denoted as the remaining foreground V[k] vectors 53. In order to ensure that the same V[k] and V[k-1] are used at the encoder and decoder (to create the interpolated vectors V[k]), quantized/dequantized versions of the vectors may be used at the encoder and decoder. The spatio-temporal interpolation unit 50 may output the interpolated nFG signals 49' to the psychoacoustic audio coder unit 40 and the interpolated foreground V[k] vectors 51_k to the coefficient reduction unit 46.

The coefficient reduction unit 46 may represent a unit configured to perform coefficient reduction with respect to the remaining foreground V[k] vectors 53 based on the background channel information 43 to output reduced foreground V[k] vectors 55 to the quantization unit 52. The reduced foreground V[k] vectors 55 may have dimensions D: [(N+1)² - (N_BG+1)² - BG_TOT] × nFG. The coefficient reduction unit 46 may, in this respect, represent a unit configured to reduce the number of coefficients in the remaining foreground V[k] vectors 53. In other words, the coefficient reduction unit 46 may represent a unit configured to eliminate the coefficients in the foreground V[k] vectors (that form the remaining foreground V[k] vectors 53) having little to no directional information. In some examples, the coefficients of the distinct or, in other words, foreground V[k] vectors corresponding to first- and zero-order basis functions (which may be denoted as N_BG) provide little directional information and therefore can be removed from the foreground V vectors (through a process that may be referred to as "coefficient reduction"). In this example, greater flexibility may be provided to not only identify the coefficients that correspond to N_BG but also to identify additional HOA channels (which may be denoted by the variable TotalOfAddAmbHOAChan) from the set [(N_BG+1)²+1, (N+1)²].
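Coefficient reduction for a single foreground V[k] vector may be sketched as follows. The function name reduce_v_vector and the 1-based indexing of the additional ambient HOA channels are assumptions for this example; the sketch only illustrates how the reduced length [(N+1)² - (N_BG+1)² - BG_TOT] arises.

```python
def reduce_v_vector(v_vector, n_bg, additional_ambient_indices):
    """Drop the elements of a V vector that are carried as ambient channels.

    v_vector:                   list of (N+1)^2 elements.
    n_bg:                       minimum order of the background sound field.
    additional_ambient_indices: 1-based indices of additional ambient HOA
                                channels, taken from (N_BG+1)^2+1 .. (N+1)^2.
    """
    base = (n_bg + 1) ** 2
    for i in additional_ambient_indices:
        if not base < i <= len(v_vector):
            raise ValueError("index outside [(N_BG+1)^2+1, (N+1)^2]")
    dropped = set(range(base)) | {i - 1 for i in additional_ambient_indices}
    return [c for idx, c in enumerate(v_vector) if idx not in dropped]


# Fourth-order vector (25 elements) labeled 1..25 for readability.
v_vector = list(range(1, 26))
reduced = reduce_v_vector(v_vector, n_bg=1, additional_ambient_indices=[6, 9])
```

Here 25 - 4 - 2 = 19 elements remain, since the first four elements and the two additional ambient channels are removed.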

The quantization unit 52 may represent a unit configured to perform any form of quantization to compress the reduced foreground V[k] vectors 55 to generate coded foreground V[k] vectors 57, outputting the coded foreground V[k] vectors 57 to the bitstream generation unit 42. In operation, the quantization unit 52 may represent a unit configured to compress a spatial component of the sound field, that is, one or more of the reduced foreground V[k] vectors 55 in this example. The quantization unit 52 may perform any one of the following 12 quantization modes, as indicated by a quantization mode syntax element denoted "NbitsQ":

The quantization unit 52 may also perform predicted versions of any of the foregoing types of quantization modes, where a difference is determined between an element (or a weight, when vector quantization is performed) of the V vector of a previous frame and the element (or the weight, when vector quantization is performed) of the V vector of the current frame. The quantization unit 52 may then quantize the difference between the elements or weights of the current frame and the previous frame rather than the value of the element of the V vector of the current frame itself.
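The predicted form of quantization may be sketched with a uniform scalar quantizer. The uniform step size and the function names below are assumptions made for illustration; the actual quantization modes selected by NbitsQ are not reproduced here. The sketch only shows that the difference from the previous frame, rather than the element itself, is quantized.

```python
def quantize(value, step):
    """Uniform scalar quantizer: nearest integer multiple of step."""
    return round(value / step)


def predictive_encode(current, previous_reconstructed, step):
    """Quantize the element-wise difference to the previous frame's V vector."""
    return [quantize(c - p, step)
            for c, p in zip(current, previous_reconstructed)]


def predictive_decode(indices, previous_reconstructed, step):
    """Add the dequantized differences back onto the previous frame."""
    return [p + q * step for q, p in zip(indices, previous_reconstructed)]


previous = [0.5, -0.25]     # reconstructed V vector elements of frame k-1
current = [0.75, -0.5]      # V vector elements of frame k
indices = predictive_encode(current, previous, step=0.25)
decoded = predictive_decode(indices, previous, step=0.25)
```

Because the decoder needs the same previous-frame values as the encoder, the sketch predicts from the reconstructed (dequantized) previous frame rather than from the original one.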

The quantization unit 52 may perform multiple forms of quantization with respect to each of the reduced foreground V[k] vectors 55 to obtain multiple coded versions of the reduced foreground V[k] vectors 55. The quantization unit 52 may select one of the coded versions of the reduced foreground V[k] vectors 55 as the coded foreground V[k] vector 57. In other words, the quantization unit 52 may select one of the non-predicted vector-quantized V vector, the predicted vector-quantized V vector, the non-Huffman-coded scalar-quantized V vector and the Huffman-coded scalar-quantized V vector to use as the output switched-quantized V vector, based on any combination of the criteria discussed in this disclosure. In some examples, the quantization unit 52 may select a quantization mode from a set of quantization modes that includes a vector quantization mode and one or more scalar quantization modes, and quantize an input V vector based on (or according to) the selected mode. The quantization unit 52 may then provide the selected one of the following to the bitstream generation unit 42 for use as the coded foreground V[k] vectors 57: the non-predicted vector-quantized V vector (for example, in terms of weight values or bits indicative thereof), the predicted vector-quantized V vector (for example, in terms of error values or bits indicative thereof), the non-Huffman-coded scalar-quantized V vector and the Huffman-coded scalar-quantized V vector. The quantization unit 52 may also provide the syntax elements indicative of the quantization mode (for example, the NbitsQ syntax element) and any other syntax elements used to dequantize or otherwise reconstruct the V vector.

The psychoacoustic audio coder unit 40 included within the audio encoding device 20 may represent multiple instances of a psychoacoustic audio coder, each of which is used to encode a different audio object or HOA channel of each of the energy-compensated ambient HOA coefficients 47' and the interpolated nFG signals 49' to generate encoded ambient HOA coefficients 59 and encoded nFG signals 61. The psychoacoustic audio coder unit 40 may output the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 to the bitstream generation unit 42.

The bitstream generation unit 42 included within the audio encoding device 20 represents a unit that formats data to conform to a known format (which may refer to a format known by a decoding device), thereby generating the vector-based bitstream 21. The bitstream 21 may, in other words, represent encoded audio data, having been encoded in the manner described above. The bitstream generation unit 42 may represent a multiplexer in some examples, which may receive the coded foreground V[k] vectors 57, the encoded ambient HOA coefficients 59, the encoded nFG signals 61 and the background channel information 43. The bitstream generation unit 42 may then generate the bitstream 21 based on the coded foreground V[k] vectors 57, the encoded ambient HOA coefficients 59, the encoded nFG signals 61 and the background channel information 43. In this way, the bitstream generation unit 42 may specify the vectors 57 in the bitstream 21 to obtain the bitstream 21. The bitstream 21 may include a primary or main bitstream and one or more side channel bitstreams.

As described above, various aspects of the techniques may also enable the bitstream generation unit 42 to specify the audio rendering information 2 in the bitstream 21. While the current version of the upcoming 3D audio compression working draft provides for signaling specific downmix matrices within the bitstream 21, the working draft does not provide for specifying, in the bitstream 21, the renderers used in rendering the HOA coefficients 11. For HOA content, the equivalent of such a downmix matrix is the rendering matrix that converts the HOA representation into the desired loudspeaker feeds. Various aspects of the techniques described in this disclosure propose to further harmonize the feature sets of channel content and HOA by allowing the bitstream generation unit 42 to signal HOA rendering matrices within the bitstream (as, for example, the audio rendering information 2).

An exemplary signaling solution based on the coding scheme for downmix matrices and optimized for HOA is presented below. Similar to the transmission of downmix matrices, HOA rendering matrices may be signaled within mpegh3daConfigExtension(). The techniques may provide a new extension type ID_CONFIG_EXT_HOA_MATRIX, as set forth in the following tables (where italics and bold indicate changes to the existing tables).

Table - Syntax of mpegh3daConfigExtension() (Table 13 in CD)

Table - Value of usacConfigExtType (Table 1 in CD)

usacConfigExtType	Value
ID_CONFIG_EXT_FILL	0
ID_CONFIG_EXT_DMX_MATRIX	1
ID_CONFIG_EXT_LOUDNESS_INFO	2
ID_CONFIG_EXT_HOA_MATRIX	3
/* reserved for ISO use */	4-127
/* reserved for use outside of ISO scope */	128 and higher
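The values in the table above may be mirrored in code as an enumeration. The class name UsacConfigExtType, the helper classify_ext_type and the returned labels are hypothetical and are provided only as a sketch of how a parser might interpret the usacConfigExtType value.

```python
from enum import IntEnum


class UsacConfigExtType(IntEnum):
    """Extension types from the usacConfigExtType table."""
    ID_CONFIG_EXT_FILL = 0
    ID_CONFIG_EXT_DMX_MATRIX = 1
    ID_CONFIG_EXT_LOUDNESS_INFO = 2
    ID_CONFIG_EXT_HOA_MATRIX = 3   # the new extension type described here


def classify_ext_type(value):
    """Return the extension name, or the reserved range the value falls in."""
    if value < 0:
        raise ValueError("usacConfigExtType must be non-negative")
    if value <= 3:
        return UsacConfigExtType(value).name
    if value <= 127:
        return "reserved for ISO use"
    return "reserved for use outside of ISO scope"


name = classify_ext_type(3)
```

A decoder that does not know a given extension type could use such a classification to skip the corresponding payload, although that behavior is not specified in this section.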

Compared to DownmixMatrixSet(), the bit field HOARenderingMatrixSet() may be equivalent in structure and functionality. Instead of inputCount(audioChannelLayout), HOARenderingMatrixSet() may use the "equivalent" NumOfHoaCoeffs value computed in HOAConfig. Further, because the ordering of the HOA coefficients may be fixed within the HOA decoder (see, for example, Annex G in CD), HOARenderingMatrixSet does not need any equivalent to inputConfig(audioChannelLayout).

Table 2 - Syntax of HOARenderingMatrixSet() (adapted from Table 15 in CD)

Various aspects of the techniques may also enable bitstream producing unit 42 to specify bit stream 21, when HOA audio data (e.g., HOA coefficients 11 in the example of Fig. 4) is compressed using a first compression scheme (such as the decomposition compression scheme represented by vector-based decomposition unit 27), such that bits corresponding to a second compression scheme (e.g., the direction-based or directionality-based compression scheme represented by direction-based decomposition unit 28) are not included in bit stream 21. For example, bitstream producing unit 42 may generate bit stream 21 so as not to include the HOAPredictionInfo syntax element or field, which may be reserved for specifying prediction information between directional signals of the direction-based compression scheme. Examples of bit stream 21 generated according to various aspects of the techniques described in the present invention are shown in the examples of Fig. 8E and 8F.

In other words, the prediction of directional signals may be part of the predominant sound synthesis employed by direction-based decomposition unit 28, and depends on the presence of ChannelType 0 (which may indicate a direction-based signal). When no direction-based signal is present in a frame, no prediction of directional signals can be performed. However, the associated sideband information HOAPredictionInfo() may be written to every frame, even if unused, independent of the presence of direction-based signals. When no directional signal is present in a frame, the techniques described in the present invention may enable bitstream producing unit 42 to reduce the size of the sideband by not signaling HOAPredictionInfo in the sideband, as set forth in the following table (where underlined italics indicate additions):

Table - Syntax of HOAFrame

In this respect, the techniques may enable a device, such as audio coding apparatus 20, to be configured to specify, when compressing high-order ambiophony audio data using a first compression scheme, a bit stream representative of a compressed version of the high-order ambiophony audio data that does not include bits corresponding to a second compression scheme also used for compressing high-order ambiophony audio data.

In some cases, the first compression scheme comprises a vector-based decomposition compression scheme. In these and other cases, the vector-based decomposition compression scheme comprises a compression scheme that involves application of a singular value decomposition (or equivalents thereof described in more detail in the present invention) to the high-order ambiophony audio data.

In these and other cases, audio coding apparatus 20 may be configured to specify the bit stream such that it does not include the bits corresponding to at least one syntax element used for performing the second type of compression scheme. As mentioned above, the second compression scheme may comprise a directionality-based compression scheme.

Audio coding apparatus 20 may also be configured to specify bit stream 21 such that bit stream 21 does not include the bits corresponding to the HOAPredictionInfo syntax element of the second compression scheme.

When the second compression scheme comprises the directionality-based compression scheme, audio coding apparatus 20 may be configured to specify bit stream 21 such that bit stream 21 does not include the bits corresponding to the HOAPredictionInfo syntax element of the directionality-based compression scheme. In other words, audio coding apparatus 20 may be configured to specify bit stream 21 such that bit stream 21 does not include the bits corresponding to at least one syntax element used for performing the second type of compression scheme, the at least one syntax element indicating a prediction between two or more direction-based signals. Restated, when the second compression scheme comprises the directionality-based compression scheme, audio coding apparatus 20 may be configured to specify bit stream 21 such that bit stream 21 does not include the bits corresponding to the HOAPredictionInfo syntax element of the directionality-based compression scheme, where the HOAPredictionInfo syntax element indicates a prediction between two or more direction-based signals.

Various aspects of the techniques may further enable bitstream producing unit 42, in some cases, to specify bit stream 21 such that bit stream 21 does not include gain correction data. When gain correction is suppressed, bitstream producing unit 42 may specify bit stream 21 such that bit stream 21 does not include the gain correction data. As mentioned above, examples of bit stream 21 generated according to various aspects of the techniques are shown in the examples of Fig. 8E and 8F.

In some cases, gain correction is applied when certain types of psychoacoustic coding are performed, given that these types have a relatively small dynamic range compared to other types of psychoacoustic coding. For example, AAC has a relatively small dynamic range compared to unified speech and audio coding (USAC). When the compression scheme (such as the vector-based synthesis compression scheme or the direction-based compression scheme) involves USAC, bitstream producing unit 42 may signal in bit stream 21 that gain correction is suppressed (e.g., by specifying the syntax element MaxGainCorrAmpExp in HOAConfig with a value of 0) and then specify bit stream 21 so as not to include the gain correction data (in the HOAGainCorrectionData() field).

In other words, the bit field MaxGainCorrAmpExp, as part of HOAConfig (see Table 71 in CD), may control the extent to which the automatic gain control module affects the transport channel signals prior to USAC core coding. In some cases, this module was developed for RM0 to improve the non-ideal dynamic range of the available AAC encoder implementation. With the change from AAC to the USAC core coder during the integration phase, the dynamic range of the core encoder may have improved, and this gain control module may therefore not be as important as before.

In some cases, the gain control functionality may be suppressed if MaxGainCorrAmpExp is set to 0. In these cases, the associated side information HOAGainCorrectionData() may not be written to every HOA frame, in accordance with the above table setting forth the "Syntax of HOAFrame". For configurations in which MaxGainCorrAmpExp is set to 0, the techniques described in the present invention may not signal HOAGainCorrectionData. Moreover, in such cases the inverse gain control module may even be bypassed, reducing decoder complexity by about 0.05 MOPS per transport channel without any negative side effects.
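The two conditional omissions described so far (HOAPredictionInfo when no direction-based signal is present, and HOAGainCorrectionData when MaxGainCorrAmpExp is 0) can be summarized in a minimal sketch. The function name `hoa_frame_side_info` and its parameters are hypothetical; the sketch only lists which side-information fields would be written and does not model the actual HOAFrame syntax.

```python
from typing import List


def hoa_frame_side_info(has_direction_signal: bool,
                        max_gain_corr_amp_exp: int) -> List[str]:
    """Return the names of the optional side-information fields written for one frame.

    Assumption: only the two optional fields discussed in the text are modeled.
    """
    fields = []
    # Prediction info is only meaningful when a direction-based signal exists.
    if has_direction_signal:
        fields.append("HOAPredictionInfo")
    # A value of 0 signals that gain correction is suppressed.
    if max_gain_corr_amp_exp != 0:
        fields.append("HOAGainCorrectionData")
    return fields
```

For a vector-based frame coded with gain correction suppressed, the sketch returns an empty list, which corresponds to the smallest sideband.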

In this respect, the techniques may configure audio coding apparatus 20 to specify, when gain correction is suppressed during compression of the high-order ambiophony audio data, bit stream 21 representing a compressed version of the high-order ambiophony audio data such that bit stream 21 does not include gain correction data.

In these and other cases, audio coding apparatus 20 may be configured to compress the high-order ambiophony audio data in accordance with a vector-based decomposition compression scheme to generate the compressed version of the high-order ambiophony audio data. Examples of the decomposition compression scheme may involve application of a singular value decomposition (or equivalents thereof described in more detail above) to the high-order ambiophony audio data to generate the compressed version.

In these and other cases, audio coding apparatus 20 may be configured to specify the MaxGainCorrAmpExp syntax element in bit stream 21 as zero to indicate that gain correction is suppressed. In some cases, when gain correction is suppressed, audio coding apparatus 20 may be configured to specify bit stream 21 such that bit stream 21 does not include the HOAGainCorrectionData field that stores the gain correction data. In other words, audio coding apparatus 20 may be configured to specify the MaxGainCorrAmpExp syntax element in bit stream 21 as zero to indicate that gain correction is suppressed, and not to include in the bit stream the HOAGainCorrectionData field that stores the gain correction data.

In these and other cases, audio coding apparatus 20 may be configured to suppress gain correction when the compression of the high-order ambiophony audio data includes application of unified speech and audio coding (USAC) to the high-order ambiophony audio data.

The foregoing potential optimizations to the signaling of various information in bit stream 21 may be adapted or otherwise updated in the manner described in further detail below. The updates may be applied in combination with other updates discussed below, or used to update only selected aspects of the optimizations described above. As such, every potential combination of updates to the optimizations described above is contemplated, including application of a single update described below, or of any particular combination of the updates described below, to the optimizations described above.

To specify the matrix in the bit stream, bitstream producing unit 42 may, for example, specify ID_CONFIG_EXT_HOA_MATRIX in mpegh3daConfigExtension() of bit stream 21, as shown in bold and highlighted text in the following table. The following table represents the syntax for specifying the mpegh3daConfigExtension() portion of bit stream 21:

Table - Syntax of mpegh3daConfigExtension()

ID_CONFIG_EXT_HOA_MATRIX in the foregoing table provides a container in which to specify the rendering matrix, the container being denoted "HoaRenderingMatrixSet()".

The contents of the HoaRenderingMatrixSet() container may be defined according to the syntax set forth in the following table:

Table - Syntax of HoaRenderingMatrixSet()

As shown in the table directly above, HoaRenderingMatrixSet() includes a number of different syntax elements, including numHoaRenderingMatrices, HoaRenderingMatrixId, CICPspeakerLayoutIdx, HoaMatrixLenBits and HoaRenderingMatrix.

The numHoaRenderingMatrices syntax element may specify the number of HoaRenderingMatrixId definitions present in the bit stream element. The HoaRenderingMatrixId syntax element may represent a field that uniquely defines an Id for a default HOA rendering matrix available on the decoder side or for a transmitted HOA rendering matrix. In this respect, HoaRenderingMatrixId may represent an example of a signal value comprising two or more bits that define an index indicating that the bit stream includes a matrix used to render spherical harmonic coefficients to a plurality of loudspeaker feeds, or an example of a signal value comprising two or more bits that define an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of loudspeaker feeds. The CICPspeakerLayoutIdx syntax element may represent a value that describes the output loudspeaker layout for the given HOA rendering matrix, and may correspond to the ChannelConfiguration element defined in ISO/IEC 23000 1-8. The HoaMatrixLenBits syntax element (which may also be denoted "HoaRenderingMatrixLenBits") may specify the length, in bits, of the following bit stream element (e.g., the HoaRenderingMatrix() container).

The HoaRenderingMatrix() container includes NumOfHoaCoeffs, followed by an outputConfig() container and an outputCount() container. The outputConfig() container may include a channel configuration vector specifying information about each loudspeaker. Bitstream producing unit 42 may assume that this loudspeaker information is known from the channel configuration of the output layout. Each entry, outputConfig[i], may represent a data structure with the following members:

azimuthAngle (which may indicate the absolute value of the loudspeaker azimuth angle);

azimuthDirection (which may indicate the azimuth direction using, as one example, 0 for left and 1 for right);

elevationAngle (which may indicate the absolute value of the loudspeaker elevation angle);

elevationDirection (which may indicate the elevation direction using, as one example, 0 for up and 1 for down); and

isLFE (which may indicate whether the loudspeaker is a low-frequency effects (LFE) loudspeaker).

Bitstream producing unit 42 may in some cases invoke a helper function denoted "findSymmetricSpeakers", which may further specify the following:

pairType (which may store a value of SYMMETRIC (meaning, in some examples, a symmetric pair of two loudspeakers), CENTER or ASYMMETRIC); and

symmetricPair->originalPosition (which may indicate the position, in the original channel configuration, of the second (e.g., right) loudspeaker in the group, for SYMMETRIC groups only).
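The loudspeaker record and the notion of a symmetric pair can be illustrated with a minimal sketch. It is not the findSymmetricSpeakers helper itself: the class `SpeakerInformation` (in Python form), the function `find_symmetric_pairs`, the snake_case field names and the treatment of LFE loudspeakers are all assumptions made for illustration. The sketch simply pairs a left loudspeaker with a right loudspeaker that has the same absolute azimuth and the same elevation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeakerInformation:
    azimuth_angle: int        # absolute azimuth in degrees
    azimuth_direction: int    # 0 = left, 1 = right
    elevation_angle: int      # absolute elevation in degrees
    elevation_direction: int  # 0 = up, 1 = down
    is_lfe: bool


def find_symmetric_pairs(speakers: List[SpeakerInformation],
                         has_lfe_rendering: bool) -> List[Tuple[int, int]]:
    """Return (left_index, right_index) tuples of mirror-image loudspeakers."""
    pairs = []
    used_right = set()
    for i, left in enumerate(speakers):
        # Assumption: LFE loudspeakers are ignored unless LFE rendering is present.
        if left.is_lfe and not has_lfe_rendering:
            continue
        # Loudspeakers on the median plane (0 or 180 degrees) have no mirror partner.
        if left.azimuth_angle in (0, 180) or left.azimuth_direction != 0:
            continue
        for j, right in enumerate(speakers):
            if j in used_right or right.azimuth_direction != 1:
                continue
            same_elevation = (
                right.elevation_angle == left.elevation_angle
                and (left.elevation_angle == 0
                     or right.elevation_direction == left.elevation_direction)
            )
            if (right.is_lfe == left.is_lfe
                    and right.azimuth_angle == left.azimuth_angle
                    and same_elevation):
                pairs.append((i, j))
                used_right.add(j)
                break
    return pairs
```

The length of the returned list plays the role that numPairs plays in the syntax, and the second index of each tuple corresponds to the original position of the right loudspeaker.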

The outputCount() container may specify the number of loudspeakers for which the HOA rendering matrix is defined.

Bitstream producing unit 42 may specify the HoaRenderingMatrix() container according to the syntax set forth in the following table:

Table - Syntax of HoaRenderingMatrix()

As shown in the table directly above, the numPairs syntax element is set to the value output from invoking the findSymmetricSpeakers helper function with outputCount, outputConfig and hasLfeRendering as inputs. numPairs may therefore indicate the number of symmetric loudspeaker pairs identified in the output loudspeaker setup that may be considered for efficient symmetry coding. The precisionLevel syntax element in the above table may indicate the precision used for uniform quantization of the gains according to the following table:

Table - Uniform quantization step size of hoaGain as a function of precisionLevel

precisionLevel	Smallest quantization step size [dB]
0	1.0
1	0.5
2	0.25
3	0.125
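As a rough illustration of the table above, the sketch below maps precisionLevel to its step size and applies plain uniform quantization to a gain expressed in dB. The names `STEP_SIZE_DB` and `quantize_gain_db` are hypothetical, and the sketch assumes simple rounding to the nearest step; the actual coding of hoaGain in the bit stream (for example relative to maxGain) is not modeled.

```python
# Step sizes in dB, taken from the precisionLevel table.
STEP_SIZE_DB = {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.125}


def quantize_gain_db(gain_db: float, precision_level: int):
    """Uniformly quantize a gain in dB; return (integer index, reconstructed gain)."""
    step = STEP_SIZE_DB[precision_level]
    index = round(gain_db / step)  # nearest quantization step
    return index, index * step
```

Each increment of precisionLevel halves the step size, so the step equals 2 to the power of minus precisionLevel, at the cost of roughly one additional bit per coded gain.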

The gainLimitPerHoaOrder syntax element shown in the above table setting forth the syntax of HoaRenderingMatrix() may represent a flag indicating whether maxGain and minGain are specified individually for each order or for the entire HOA rendering matrix. The maxGain[i] syntax element may specify the maximum actual gain in the matrix for the coefficients of HOA order i, expressed, as one example, in decibels (dB). The minGain[i] syntax element may specify the minimum actual gain in the matrix for the coefficients of HOA order i, again expressed, as one example, in dB. The isFullMatrix syntax element may represent a flag indicating whether the HOA rendering matrix is sparse or full. Where the isFullMatrix syntax element specifies that the HOA rendering matrix is sparse, the firstSparseOrder syntax element may specify the first HOA order that is sparsely coded. The isHoaCoefSparse syntax element may represent a bitmask vector derived from the firstSparseOrder syntax element. The lfeExists syntax element may represent a flag indicating whether one or more LFEs exist in outputConfig. The hasLfeRendering syntax element may indicate whether the rendering matrix contains non-zero elements for the one or more LFE channels. The zerothOrderAlwaysPositive syntax element may represent a flag indicating whether the zeroth HOA order has only positive values.

The isAllValueSymmetric syntax element may represent a flag indicating whether all symmetric loudspeaker pairs have equal absolute values in the HOA rendering matrix. The isAnyValueSymmetric syntax element may represent a flag indicating (for example, when isAllValueSymmetric is false) whether some of the symmetric loudspeaker pairs have equal absolute values in the HOA rendering matrix. The valueSymmetricPairs syntax element may represent a bitmask of length numPairs indicating the loudspeaker pairs that have value symmetry. The isValueSymmetric syntax element may represent a bitmask derived from the valueSymmetricPairs syntax element in the manner shown in Table 3. The isAllSignSymmetric syntax element may indicate, when no value symmetry exists in the matrix, whether all symmetric loudspeaker pairs have at least number-sign symmetry. The isAnySignSymmetric syntax element may represent a flag indicating whether at least some symmetric loudspeaker pairs have number-sign symmetry. The signSymmetricPairs syntax element may represent a bitmask of length numPairs indicating the loudspeaker pairs that have sign symmetry. The isSignSymmetric variable may represent a bitmask derived from the signSymmetricPairs syntax element in the manner shown in the table setting forth the syntax of HoaRenderingMatrix() above. The hasVerticalCoef syntax element may represent a flag indicating whether the matrix is a horizontal-only HOA rendering matrix. The bootVal syntax element may represent a variable used in the decoding loop.

In other words, bitstream producing unit 42 may analyze audio renderer 1 to generate any one or more items of the above value symmetry information (e.g., any combination of one or more of the isAllValueSymmetric syntax element, the isAnyValueSymmetric syntax element, the valueSymmetricPairs syntax element and the isValueSymmetric syntax element), or may otherwise obtain the value symmetry information. Bitstream producing unit 42 may specify audio renderer information 2 in bit stream 21 in the manner shown above, such that audio renderer information 2 includes the value symmetry information.

In addition, bitstream producing unit 42 may also analyze audio renderer 1 to generate any one or more items of the above sign symmetry information (e.g., any combination of one or more of the isAllSignSymmetric syntax element, the isAnySignSymmetric syntax element, the signSymmetricPairs syntax element and the isSignSymmetric syntax element), or may otherwise obtain the sign symmetry information. Bitstream producing unit 42 may specify audio renderer information 2 in bit stream 21 in the manner shown above, such that audio renderer information 2 includes the sign symmetry information.

When determining the value symmetry information and the sign symmetry information, bitstream producing unit 42 may analyze the various values of audio renderer 1, which may be formulated as a matrix. The rendering matrix may be formulated as the pseudo-inverse of a matrix R. In other words, to render the (N+1)² HOA channels (denoted Z below) to L loudspeaker signals (denoted by the column vector p of the L loudspeaker signals), the following equation may be given:

$$Z = R\,p.$$

To arrive at the rendering matrix that outputs the L loudspeaker signals, the inverse of the R matrix is multiplied by the Z HOA channels, as shown in the following equation:

$$p = R^{-1} Z.$$

Unless the number L of loudspeaker channels equals the number (N+1)² of the Z HOA channels, the matrix R is not square and a full inverse cannot be determined. As a result, the pseudo-inverse may be used instead, defined as:

$$\operatorname{pinv}(R) = R^{T}\left(R R^{T}\right)^{-1},$$

where $R^{T}$ denotes the transpose of the matrix R. Substituting pinv(R) for $R^{-1}$ in the equation above, the solution for the L loudspeaker signals denoted by the column vector p may be expressed mathematically as follows:

$$p = \operatorname{pinv}(R)\,Z = R^{T}\left(R R^{T}\right)^{-1} Z.$$
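The pseudo-inverse formula can be checked numerically with a small, dependency-free sketch. The helper names (`transpose`, `matmul`, `invert`, `pinv`) are introduced here for illustration, and the 2-by-3 matrix in the usage note is a toy example rather than a matrix of real spherical harmonic values. Note that the product of R and its transpose must be invertible, which requires R to have full row rank (and therefore at least as many loudspeakers as HOA channels).

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pinv(r):
    """Right pseudo-inverse R^T (R R^T)^-1, as in the equation above."""
    rt = transpose(r)
    return matmul(rt, invert(matmul(r, rt)))
```

For example, with `R = [[1, 1, 1], [1, 0, -1]]` and `Z = [[3], [2]]`, `matmul(pinv(R), Z)` yields loudspeaker signals close to `[[2], [1], [0]]`, and multiplying R by that result reproduces Z.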

The entries of the R matrix are the values of the spherical harmonics at the loudspeaker positions, with (N+1)² rows for the different spherical harmonics and L columns for the loudspeakers. Bitstream producing unit 42 may determine loudspeaker pairs based on the values of the loudspeakers. By analyzing the spherical harmonic values of the loudspeaker positions, bitstream producing unit 42 may determine which loudspeaker positions form pairs (e.g., because a pair may have similar, nearly identical, or identical values but with opposite signs).

After identifying the pairs, bitstream producing unit 42 may determine, for each pair, whether the pair has the same value or nearly the same value. When all of the pairs have the same value, bitstream producing unit 42 may set the isAllValueSymmetric syntax element to one. When not all of the pairs have the same value, bitstream producing unit 42 may set the isAllValueSymmetric syntax element to zero. When one or more, but not all, of the pairs have the same value, bitstream producing unit 42 may set the isAnyValueSymmetric syntax element to one. When none of the pairs has the same value, bitstream producing unit 42 may set the isAnyValueSymmetric syntax element to zero. For pairs having symmetric values, bitstream producing unit 42 may specify only one value for the loudspeaker pair rather than two separate values, thereby reducing the number of bits used in bit stream 21 to represent audio spatial cue 2 (e.g., the matrix in this example).

When no value symmetry exists among the pairs, bitstream producing unit 42 may also determine, for each pair, whether the loudspeaker pair has sign symmetry (meaning that one loudspeaker has a negative value while the other has a positive value). When all of the pairs have sign symmetry, bitstream producing unit 42 may set the isAllSignSymmetric syntax element to one. When not all of the pairs have sign symmetry, bitstream producing unit 42 may set the isAllSignSymmetric syntax element to zero. When one or more, but not all, of the pairs have sign symmetry, bitstream producing unit 42 may set the isAnySignSymmetric syntax element to one. When none of the pairs has sign symmetry, bitstream producing unit 42 may set the isAnySignSymmetric syntax element to zero. For pairs having symmetric signs, bitstream producing unit 42 may specify only one sign, or no sign, for the loudspeaker pair rather than two separate signs, thereby reducing the number of bits used in bit stream 21 to represent audio spatial cue 2 (e.g., the matrix in this example).
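The flag logic described for value symmetry and sign symmetry follows the same pattern, which the minimal sketch below captures. The function names `has_value_symmetry` and `summarize_pairs` and the tolerance parameter are hypothetical. One case is not covered by the text: when all pairs are symmetric, the "any" flag is reported as 0 here, which is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def has_value_symmetry(left: Sequence[float], right: Sequence[float],
                       tol: float = 1e-9) -> bool:
    """True when the two loudspeakers of a pair have equal absolute matrix values."""
    return all(abs(abs(a) - abs(b)) <= tol for a, b in zip(left, right))


def summarize_pairs(per_pair: List[bool]) -> Tuple[int, int]:
    """Derive the (isAll..., isAny...) flags from per-pair symmetry decisions."""
    is_all = int(len(per_pair) > 0 and all(per_pair))
    # "Any" is set when one or more, but not all, pairs are symmetric.
    is_any = int(any(per_pair) and not is_all)
    return is_all, is_any
```

The same `summarize_pairs` helper can be applied once to the per-pair value symmetry decisions and once to the per-pair sign symmetry decisions, with the per-pair list itself playing the role of the valueSymmetricPairs or signSymmetricPairs bitmask.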

Bitstream producing unit 42 may specify the DecodeHoaMatrixData() container, shown in the table setting forth the syntax of HoaRenderingMatrix(), according to the syntax shown in the following table:

Table - Syntax of DecodeHoaMatrixData

The hasValue syntax element in the foregoing table setting forth the syntax of DecodeHoaMatrixData may represent a flag indicating whether the matrix element is sparsely coded. The signMatrix syntax element may represent a matrix holding the sign values of the HOA rendering matrix in, as one example, linearized vector form. The hoaMatrix syntax element may represent the HOA rendering matrix values in, as one example, linearized vector form. Bitstream producing unit 42 may specify the DecodeHoaGainValue() container, shown in the table setting forth the syntax of DecodeHoaMatrixData, according to the syntax shown in the following table:

Table - Syntax of DecodeHoaGainValue

Bitstream producing unit 42 may specify the ReadRange() container, shown in the table setting forth the syntax of DecodeHoaGainValue, according to the syntax specified in the following table:

Table 7 - Syntax of ReadRange

Although not shown in the example of Fig. 3, audio coding apparatus 20 may also include a bitstream output unit that switches the bit stream output from audio coding apparatus 20 (e.g., between the direction-based bit stream 21 and the vector-based bit stream 21) based on whether the current frame is to be encoded using the direction-based synthesis or the vector-based synthesis. The bitstream output unit may perform the switch based on a syntax element, output by content analysis unit 26, indicating whether a direction-based synthesis was performed (as a result of detecting that HOA coefficients 11 were generated from a synthetic audio object) or a vector-based synthesis was performed (as a result of detecting that the HOA coefficients were recorded). The bitstream output unit may specify the correct header syntax to indicate the switch or the current encoding used for the current frame, along with the respective one of bit streams 21.

In addition, as mentioned above, sound field analysis unit 44 may identify BG_TOT ambient HOA coefficients 47, which may change on a frame-by-frame basis (although at times BG_TOT may remain constant or the same across two or more adjacent (in time) frames). A change in BG_TOT may result in changes to the coefficients expressed in reduced foreground V[k] vectors 55. A change in BG_TOT may result in background HOA coefficients (which may also be referred to as "ambient HOA coefficients") that change on a frame-by-frame basis (although, again, BG_TOT may at times remain constant or the same across two or more adjacent (in time) frames). The changes often result in a change of energy for the aspects of the sound field represented by the addition or removal of the additional ambient HOA coefficients and the corresponding removal of coefficients from, or addition of coefficients to, reduced foreground V[k] vectors 55.

As a result, sound field analysis unit 44 may further determine when the ambient HOA coefficients change from frame to frame, and generate a flag or other syntax element indicative of the change to the ambient HOA coefficients in terms of their use in representing the ambient components of the sound field (where the change may also be referred to as a "transition" of the ambient HOA coefficients). In particular, coefficient reduction unit 46 may generate the flag (which may be denoted as an AmbCoeffTransition flag or an AmbCoeffIdxTransition flag), providing the flag to bitstream producing unit 42 so that the flag may be included in bit stream 21 (possibly as part of the side channel information).

In addition to specifying the ambient coefficient transition flag, coefficient reduction unit 46 may also modify how reduced foreground V[k] vectors 55 are generated. In one example, upon determining that one of the ambient HOA coefficients is in transition during the current frame, coefficient reduction unit 46 may specify, for each of the V vectors of reduced foreground V[k] vectors 55, a vector coefficient (which may also be referred to as a "vector element" or "element") that corresponds to the ambient HOA coefficient in transition. Again, the ambient HOA coefficient in transition may add to or remove from the BG_TOT total number of background coefficients. The resulting change in the total number of background coefficients therefore affects whether the ambient HOA coefficient is included in the bit stream, and whether the corresponding element of the V vectors is included for the V vectors specified in the bit stream in the second and third configuration modes described above. More information regarding how coefficient reduction unit 46 may specify reduced foreground V[k] vectors 55 to overcome the changes in energy is provided in U.S. Application No. 14/594,533, entitled "TRANSITIONING OF AMBIENT HIGHER_ORDER AMBISONIC COEFFICIENTS," filed January 12, 2015.

Fig. 4 is a block diagram illustrating audio decoding apparatus 24 of Fig. 2 in more detail. As shown in the example of Fig. 4, audio decoding apparatus 24 may include extraction unit 72, renderer reconstruction unit 81, directionality-based reconstruction unit 90 and vector-based reconstruction unit 92. Although described below, more information regarding audio decoding apparatus 24 and the various aspects of decompressing or otherwise decoding the HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.

Extraction unit 72 may represent a unit configured to receive bit stream 21 and extract audio spatial cue 2 and the various encoded versions (e.g., a direction-based encoded version or a vector-based encoded version) of HOA coefficients 11. In other words, a high-order ambiophony (HOA) rendering matrix may be transmitted by audio coding apparatus 20 to enable control over the HOA rendering process at audio playback system 16. Transmission may be facilitated by means of the mpegh3daConfigExtension of type ID_CONFIG_EXT_HOA_MATRIX shown above. The mpegh3daConfigExtension may contain several HOA rendering matrices for different loudspeaker reproduction configurations. When HOA rendering matrices are transmitted, audio coding apparatus 20 signals, for each HOA rendering matrix, the associated target loudspeaker layout that, together with HoaOrder, determines the dimensions of the rendering matrix.

The transmission of a unique HoaRenderingMatrixId allows reference to a default HOA rendering matrix available at audio playback system 16, or to a transmitted HOA rendering matrix from outside of audio bit stream 21. In some cases, every HOA rendering matrix is assumed to be normalized in N3D and to follow the HOA coefficient ordering as defined in bit stream 21.

As mentioned above, the function findSymmetricSpeakers may indicate the number and positions of all loudspeaker pairs within the provided loudspeaker setup that are, as one example, symmetric with respect to the median plane of a listener at the so-called "sweet spot". This helper function may be defined as follows:

int findSymmetricSpeakers(int outputCount, SpeakerInformation* outputConfig, int hasLfeRendering);

Extraction unit 72 may invoke the function createSymSigns to compute a vector of 1.0 and -1.0 values, which may then be used to generate the matrix elements associated with symmetric loudspeakers. The createSymSigns function may be defined as follows:
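As an illustration only, one way such a vector of 1.0 and -1.0 values could be built is sketched below in Python. The function `create_sym_signs` is a hypothetical stand-in and not the listing of createSymSigns. It assumes that the HOA coefficients are ordered by ascending order n and, within each order, by ascending degree m from -n to n, and that mirroring a loudspeaker across the median plane flips the sign of the harmonics with negative m while leaving the others unchanged.

```python
from typing import List


def create_sym_signs(hoa_order: int) -> List[float]:
    """Return one sign per HOA coefficient: -1.0 for m < 0, +1.0 otherwise.

    Assumption: coefficient index = n*n + n + m (order n, degree m).
    """
    signs = []
    for n in range(hoa_order + 1):
        for m in range(-n, n + 1):
            signs.append(1.0 if m >= 0 else -1.0)
    return signs
```

Under these assumptions, the matrix elements of one loudspeaker of a value-symmetric pair could be obtained by multiplying the decoded elements of the other loudspeaker element-wise by this vector.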

Extraction unit 72 may invoke the function create2dBitmask to generate a bitmask identifying the HOA coefficients that are used only in the horizontal plane. The create2dBitmask function may be defined as follows:

To decode the HOA rendering matrix coefficients, extraction unit 72 may first extract the syntax element HoaRenderingMatrixSet(), which, as mentioned above, may contain one or more HOA rendering matrices that may be applied to achieve HOA rendering to the desired loudspeaker layout. In some cases, a given bit stream may not contain more than one instance of HoaRenderingMatrixSet(). The syntax element HoaRenderingMatrix() contains the HOA rendering matrix information (which may be denoted as renderer information 2 in the example of Fig. 4). Extraction unit 72 may first read in the configuration information, which may guide the decoding process. Extraction unit 72 then reads the matrix elements accordingly.

In some cases, extraction unit 72 begins by reading the fields precisionLevel and gainLimitPerOrder. When the flag gainLimitPerOrder is set, extraction unit 72 reads and decodes the maxGain and minGain fields separately for each HOA order. When the flag gainLimitPerOrder is not set, extraction unit 72 reads and decodes the fields maxGain and minGain once during the decoding process and applies them to all HOA orders. In some cases, the value of minGain must be between 0 dB and -69 dB. In some cases, the value of maxGain must be between 1 dB and 111 dB lower than the value of minGain. Fig. 9 is a diagram illustrating an example of HOA-order-dependent minimum and maximum gains within an HOA rendering matrix.
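The per-order versus global handling of the gain limits can be sketched as follows. The function `expand_gain_limits` and the callable `read_gain_pair` are hypothetical: the callable stands in for whatever routine reads and decodes one (maxGain, minGain) pair from the bit stream, since the field widths are not given here.

```python
from typing import Callable, List, Tuple


def expand_gain_limits(hoa_order: int,
                       gain_limit_per_order: bool,
                       read_gain_pair: Callable[[], Tuple[float, float]]
                       ) -> Tuple[List[float], List[float]]:
    """Return per-order lists (max_gain, min_gain) for orders 0..hoa_order."""
    if gain_limit_per_order:
        # One (maxGain, minGain) pair is read for every HOA order.
        pairs = [read_gain_pair() for _ in range(hoa_order + 1)]
    else:
        # A single pair is read once and applied to all HOA orders.
        pairs = [read_gain_pair()] * (hoa_order + 1)
    max_gain = [p[0] for p in pairs]
    min_gain = [p[1] for p in pairs]
    return max_gain, min_gain
```

Expanding the single global pair into per-order lists lets the rest of a decoder index the limits by HOA order regardless of how they were signaled.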

Extraction unit 72 may next read the flag isFullMatrix, which signals whether the matrix is defined as full or as partially sparse. When the matrix is defined as partially sparse, extraction unit 72 reads the next field (e.g., the firstSparseOrder syntax element), which specifies the HOA order from which the HOA rendering matrix is sparsely coded. Depending on the loudspeaker reproduction setup, an HOA rendering matrix is often dense for low orders and becomes sparse at higher orders. Fig. 10 is a diagram illustrating a partially sparse 6th-order HOA rendering matrix for 22 loudspeakers. The sparsity of the matrix shown in Fig. 10 starts at the 26th HOA coefficient (HOA order 5).
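The relationship between HOA order and coefficient position in the Fig. 10 example can be checked with a short sketch. A full representation of order N has (N + 1)^2 coefficients, so order n begins at zero-based index n^2; order 5 therefore begins at index 25, which is the 26th coefficient. The function names below are hypothetical, and treating every coefficient at or above firstSparseOrder as sparsely coded is an assumption drawn from the description above.

```c
#include <assert.h>

/* Number of HOA coefficients in a full representation of the given order. */
int hoa_coef_count(int order)
{
    assert(order >= 0);
    return (order + 1) * (order + 1);
}

/* Zero-based index of the first coefficient that belongs to the given order. */
int first_coef_of_order(int order)
{
    assert(order >= 0);
    return order * order;
}

/* HOA order to which a zero-based coefficient index belongs. */
int order_of_coef(int coef_index)
{
    assert(coef_index >= 0);
    int n = 0;
    while ((n + 1) * (n + 1) <= coef_index) {
        ++n;
    }
    return n;
}

/* Returns 1 when the coefficient falls in the sparsely coded part of the matrix. */
int is_coef_sparse(int is_full_matrix, int first_sparse_order, int coef_index)
{
    if (is_full_matrix) {
        return 0; /* a full matrix has no sparsely coded part */
    }
    return order_of_coef(coef_index) >= first_sparse_order;
}
```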

Depending on whether one or more low-frequency effects (LFE) channels are present in the loudspeaker reproduction setup (as indicated by the lfeExists syntax element), extraction unit 72 may read the field hasLfeRendering. When hasLfeRendering is not set, extraction unit 72 is configured to assume that the matrix elements related to the LFE channels are digital zeros. The next field read by extraction unit 72 is the flag zerothOrderAlwaysPositive, which signals whether the matrix elements associated with the zeroth-order coefficient are positive. When zerothOrderAlwaysPositive indicates that the zeroth-order HOA coefficients are positive, extraction unit 72 determines that the signs of the rendering matrix coefficients corresponding to the zeroth-order HOA coefficients are not coded.

Next, properties of the HOA rendering matrix may be signaled for loudspeaker pairs that are symmetric about the median plane. In some cases, there are two symmetry properties, relating to a) value symmetry and b) sign symmetry. In the case of value symmetry, the matrix elements of the left loudspeaker of a symmetric loudspeaker pair are not coded; instead, extraction unit 72 derives those elements from the decoded matrix elements of the right loudspeaker using the helper function createSymSigns, which performs the following:

pairIdx = outputConfig[j].symmetricPair->originalPosition;

hoaMatrix[i * outputCount + j] = hoaMatrix[i * outputCount + pairIdx]; and

signMatrix[i * outputCount + j] = symSigns[i] * signMatrix[i * outputCount + pairIdx].

When a loudspeaker pair is not value symmetric, the matrix elements may still be symmetric with respect to their signs. When a loudspeaker pair is sign symmetric, the signs of the matrix elements of the left loudspeaker of the pair are not coded, and extraction unit 72 derives these signs from the signs of the matrix elements associated with the right loudspeaker using the helper function createSymSigns, which performs the following:

pairIdx = outputConfig[j].symmetricPair->originalPosition;

signMatrix[i * outputCount + j] = symSigns[i] * signMatrix[i * outputCount + pairIdx];
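The two derivations above can be sketched as complete C functions. The SpeakerInfo structure is a simplified, hypothetical stand-in for SpeakerInformation, and the loop over every HOA coefficient i and the row-major matrix layout are assumptions; only the three assignment statements are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for SpeakerInformation. */
typedef struct SpeakerInfo {
    int original_position;                    /* matrix column of this loudspeaker */
    const struct SpeakerInfo *symmetric_pair; /* partner loudspeaker, or NULL */
} SpeakerInfo;

/* Value symmetry: copy the magnitude and derive the sign from the partner column. */
void derive_value_symmetric(double *hoa_matrix, double *sign_matrix,
                            const double *sym_signs, const SpeakerInfo *output_config,
                            int output_count, int num_coefs, int j)
{
    assert(output_config[j].symmetric_pair != NULL);
    int pair_idx = output_config[j].symmetric_pair->original_position;

    for (int i = 0; i < num_coefs; ++i) {
        hoa_matrix[i * output_count + j] = hoa_matrix[i * output_count + pair_idx];
        sign_matrix[i * output_count + j] =
            sym_signs[i] * sign_matrix[i * output_count + pair_idx];
    }
}

/* Sign symmetry: only the sign is derived; magnitudes are decoded separately. */
void derive_sign_symmetric(double *sign_matrix, const double *sym_signs,
                           const SpeakerInfo *output_config,
                           int output_count, int num_coefs, int j)
{
    assert(output_config[j].symmetric_pair != NULL);
    int pair_idx = output_config[j].symmetric_pair->original_position;

    for (int i = 0; i < num_coefs; ++i) {
        sign_matrix[i * output_count + j] =
            sym_signs[i] * sign_matrix[i * output_count + pair_idx];
    }
}
```

In both cases column j plays the role of the loudspeaker whose elements are not coded, and column pairIdx is its decoded partner.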

Fig. 11 is a diagram illustrating the signaling of the symmetry properties. A loudspeaker pair cannot be defined as both value symmetric and sign symmetric at the same time. The final decoding flag, hasVerticalCoef, specifies whether only the matrix elements associated with circular (i.e., 2D) HOA coefficients are coded. If hasVerticalCoef is not set, the matrix elements associated with the HOA coefficients defined by the helper function create2dBitmask are set to digital zero.
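A rough sketch of how a 2D bitmask might be built and applied follows. It assumes that the coefficients are ordered so that the coefficient of order n and degree m sits at index n*n + n + m, that the circular (2D) coefficients are those with |m| = n, and that the mask marks the 2D coefficients with 1 while all other rows are zeroed when hasVerticalCoef is not set. These are assumptions for illustration, and the function names are hypothetical rather than the definition of create2dBitmask.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Mark circular (2D) coefficients with 1 and all others with 0.
 * bitmask must hold (hoa_order + 1) * (hoa_order + 1) entries.
 */
void make_2d_bitmask(int *bitmask, int hoa_order)
{
    assert(bitmask != NULL && hoa_order >= 0);
    int k = 0;
    for (int n = 0; n <= hoa_order; ++n) {
        for (int m = -n; m <= n; ++m) {
            bitmask[k++] = (abs(m) == n);
        }
    }
}

/* When has_vertical_coef is 0, zero every matrix row that is not a 2D coefficient. */
void zero_vertical_rows(double *hoa_matrix, const int *bitmask,
                        int num_coefs, int output_count, int has_vertical_coef)
{
    if (has_vertical_coef) {
        return; /* vertical coefficients were coded; leave the matrix untouched */
    }
    for (int i = 0; i < num_coefs; ++i) {
        if (!bitmask[i]) {
            for (int j = 0; j < output_count; ++j) {
                hoa_matrix[i * output_count + j] = 0.0;
            }
        }
    }
}
```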

That is, extraction unit 72 extracts audio spatial cue 2 according to the process shown in Fig. 11. Extraction unit 72 may first read the isAllValueSymmetric syntax element from bit stream 21 (300). When the isAllValueSymmetric syntax element is set to one (in other words, Boolean true), extraction unit 72 may iterate over the number of pairs given by the numPairs syntax element, setting each element of the valueSymmetricPairs array to one (effectively indicating that all loudspeaker pairs are value symmetric) (302).

When the isAllValueSymmetric syntax element is set to zero (in other words, Boolean false), extraction unit 72 may next read the isAnyValueSymmetric syntax element (304). When the isAnyValueSymmetric syntax element is set to one (in other words, Boolean true), extraction unit 72 may iterate over the number of pairs given by the numPairs syntax element, setting each element of the valueSymmetricPairs array to a bit read sequentially from bit stream 21 (306). Extraction unit 72 may also obtain the isAnySignSymmetric syntax element for any of the pairs whose valueSymmetricPairs syntax element is set to zero (308). Extraction unit 72 may then iterate over the number of pairs again and, when valueSymmetricPairs is equal to zero, set signSymmetricPairs to a bit read from bit stream 21 (310).

When the isAnyValueSymmetric syntax element is set to zero (in other words, Boolean false), extraction unit 72 may read the isAllSignSymmetric syntax element from bit stream 21 (312). When the isAllSignSymmetric syntax element is set to one (in other words, Boolean true), extraction unit 72 may iterate over the number of pairs given by the numPairs syntax element, setting each element of the signSymmetricPairs array to one (effectively indicating that all loudspeaker pairs are sign symmetric) (314).

When the isAllSignSymmetric syntax element is set to zero (in other words, Boolean false), extraction unit 72 may read the isAnySignSymmetric syntax element from bit stream 21 (316). Extraction unit 72 may iterate over the number of pairs given by the numPairs syntax element, setting each element of the signSymmetricPairs array to a bit read sequentially from bit stream 21 (318). Bit stream generation unit 42 may perform a process reciprocal to that described above for extraction unit 72 to specify value symmetry information, sign symmetry information, or a combination of both.
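The Fig. 11 parsing flow described above can be sketched in C as follows. The BitReader type (one flag per array entry) and the function names are hypothetical, and reading the per-pair sign bits only when isAnySignSymmetric is one is an assumption that the text implies but does not state outright.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit reader: one array entry per bit, consumed in order. */
typedef struct {
    const unsigned char *bits;
    size_t count;
    size_t pos;
} BitReader;

static int read_bit(BitReader *br)
{
    assert(br->pos < br->count);
    return br->bits[br->pos++] != 0;
}

/* Fill value_sym[] and sign_sym[] (num_pairs entries each) from the bit stream. */
void read_symmetry_flags(BitReader *br, int num_pairs, int *value_sym, int *sign_sym)
{
    for (int i = 0; i < num_pairs; ++i) {
        value_sym[i] = 0;
        sign_sym[i] = 0;
    }

    if (read_bit(br)) {                       /* isAllValueSymmetric */
        for (int i = 0; i < num_pairs; ++i) value_sym[i] = 1;
        return;
    }
    if (read_bit(br)) {                       /* isAnyValueSymmetric */
        for (int i = 0; i < num_pairs; ++i) value_sym[i] = read_bit(br);
        if (read_bit(br)) {                   /* isAnySignSymmetric */
            for (int i = 0; i < num_pairs; ++i) {
                if (!value_sym[i]) sign_sym[i] = read_bit(br);
            }
        }
        return;
    }
    if (read_bit(br)) {                       /* isAllSignSymmetric */
        for (int i = 0; i < num_pairs; ++i) sign_sym[i] = 1;
        return;
    }
    if (read_bit(br)) {                       /* isAnySignSymmetric */
        for (int i = 0; i < num_pairs; ++i) sign_sym[i] = read_bit(br);
    }
}
```

Because a sign bit is read only for pairs that are not value symmetric, no pair can end up marked as both value symmetric and sign symmetric.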

Renderer reconstruction unit 81 may represent a unit configured to reconstruct a renderer based on audio spatial cue 2. That is, using the properties noted above, renderer reconstruction unit 81 may read a series of matrix element gain values. To read an absolute gain value, renderer reconstruction unit 81 may call the function DecodeGainValue(). Renderer reconstruction unit 81 may call the function ReadRange() on the alphabet index to uniformly decode the gain values. When the decoded gain value is not digital zero, renderer reconstruction unit 81 may additionally read the sign (according to Table a below). When a matrix element is associated with an HOA coefficient signaled as sparse (via isHoaCoefSparse), a hasValue flag precedes the gainValueIndex (see Table b). When the hasValue flag is zero, the element is set to digital zero and neither a gainValueIndex nor a sign is signaled.

Tables a and b: Example of the bit stream syntax for decoding matrix elements

Depending on the symmetry properties specified for a loudspeaker pair, renderer reconstruction unit 81 may derive the matrix elements associated with the left loudspeaker from those of the right loudspeaker. In this case, the audio spatial cue 2 in bit stream 21 for decoding the matrix elements of the left loudspeaker is reduced or may be omitted entirely.

In this way, audio decoding apparatus 24 may determine symmetry information to reduce the size of the audio spatial cue to be specified. In some cases, audio decoding apparatus 24 may determine symmetry information to reduce the size of the audio spatial cue to be specified, and derive at least a portion of the audio renderer based on the symmetry information.

In these and other cases, audio decoding apparatus 24 may determine value symmetry information to reduce the size of the audio spatial cue to be specified. In these and other cases, audio decoding apparatus 24 may derive at least a portion of the audio renderer based on the value symmetry information.

In these and other cases, audio decoding apparatus 24 may determine sign symmetry information to reduce the size of the audio spatial cue to be specified. In these and other cases, audio decoding apparatus 24 may derive at least a portion of the audio renderer based on the sign symmetry information.

In these and other cases, audio decoding apparatus 24 may determine sparsity information indicating the sparsity of a matrix used to render spherical harmonics coefficients to a plurality of speaker feeds.

In these and other cases, audio decoding apparatus 24 may determine the loudspeaker layout for which the matrix is to be used to render the spherical harmonics coefficients to the plurality of speaker feeds.

In this respect, audio decoding apparatus 24 may then determine the audio spatial cue 2 specified in the bit stream. Based on the signal value included in audio spatial cue 2, audio playback system 16 may render a plurality of speaker feeds 25 using one of audio renderers 22. The speaker feeds may drive loudspeakers 3. As noted above, the signal value may in some cases include a matrix used to render spherical harmonics coefficients to the plurality of speaker feeds (the matrix being decoded and provided as one of audio renderers 22). In this case, audio playback system 16 may configure one of audio renderers 22 with the matrix and use that one of audio renderers 22 to render speaker feeds 25 based on the matrix.

To extract and then decode the various encoded versions of HOA coefficients 11 so that HOA coefficients 11 can be rendered using the obtained audio renderer 22, extraction unit 72 may determine, from the syntax element noted above, whether HOA coefficients 11 were encoded via the various direction-based or vector-based versions. When direction-based encoding was performed, extraction unit 72 may extract the direction-based version of HOA coefficients 11 and the syntax elements associated with that encoded version (denoted as direction-based information 91 in the example of Fig. 4) and pass direction-based information 91 to direction-based reconstruction unit 90. Direction-based reconstruction unit 90 may represent a unit configured to reconstruct the HOA coefficients, in the form of HOA coefficients 11', based on direction-based information 91.

When the syntax element indicates that HOA coefficients 11 were encoded using a vector-based decomposition, extraction unit 72 may extract coded foreground V[k] vectors 57 (which may include coded weights 57 and/or indices 63 or scalar-quantized V-vectors), encoded ambient HOA coefficients 59, and corresponding audio objects 61 (which may also be referred to as encoded nFG signals 61). Audio objects 61 each correspond to one of vectors 57. Extraction unit 72 may pass coded foreground V[k] vectors 57 to V-vector reconstruction unit 74 and provide encoded ambient HOA coefficients 59 together with encoded nFG signals 61 to psychoacoustic decoding unit 80.

V-vector reconstruction unit 74 may represent a unit configured to reconstruct the V-vectors from encoded foreground V[k] vectors 57. V-vector reconstruction unit 74 may operate in a manner reciprocal to that of quantization unit 52.

Psychoacoustic decoding unit 80 may operate in a manner reciprocal to psychoacoustic audio coder unit 40 shown in the example of Fig. 3, decoding encoded ambient HOA coefficients 59 and encoded nFG signals 61 and thereby generating energy-compensated ambient HOA coefficients 47' and interpolated nFG signals 49' (which may also be referred to as interpolated nFG audio objects 49'). Psychoacoustic decoding unit 80 may pass energy-compensated ambient HOA coefficients 47' to fade unit 770 and nFG signals 49' to foreground formulation unit 78.

Spatio-temporal interpolation unit 76 may operate in a manner similar to that described above with respect to spatio-temporal interpolation unit 50. Spatio-temporal interpolation unit 76 may receive reduced foreground V[k] vectors 55_k and perform spatio-temporal interpolation with respect to reduced foreground V[k] vectors 55_k and reduced foreground V[k-1] vectors 55_k-1 to generate interpolated foreground V[k] vectors 55_k''. Spatio-temporal interpolation unit 76 may forward interpolated foreground V[k] vectors 55_k'' to fade unit 770.

Extraction unit 72 may also output, to fade unit 770, a signal 757 indicating when one of the ambient HOA coefficients is in transition. Fade unit 770 may then determine which of SHC_BG 47' (where SHC_BG 47' may also be denoted as "ambient HOA channels 47'" or "ambient HOA coefficients 47'") and the elements of interpolated foreground V[k] vectors 55_k'' are to be faded in or faded out. In some examples, fade unit 770 may operate in opposite ways with respect to each of ambient HOA coefficients 47' and the elements of interpolated foreground V[k] vectors 55_k''. That is, fade unit 770 may perform a fade-in, a fade-out, or both with respect to a corresponding one of ambient HOA coefficients 47', while performing a fade-in, a fade-out, or both with respect to a corresponding one of the elements of interpolated foreground V[k] vectors 55_k''. Fade unit 770 may output adjusted ambient HOA coefficients 47'' to HOA coefficient formulation unit 82 and adjusted foreground V[k] vectors 55_k''' to foreground formulation unit 78. In this respect, fade unit 770 represents a unit configured to perform a fade operation with respect to various aspects of the HOA coefficients or derivatives thereof (e.g., in the form of ambient HOA coefficients 47' and the elements of interpolated foreground V[k] vectors 55_k'').
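The complementary behavior of unit 770 can be illustrated with a small sketch. The text does not specify the shape of the fade, so the linear ramp used here is an assumption, as are the function names and the treatment of both the ambient coefficient and the foreground vector element as per-sample sequences within one frame.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FADE_IN, FADE_OUT } FadeDirection;

/* Scale a frame by a linear ramp from 0 to 1 (fade-in) or from 1 to 0 (fade-out). */
void apply_fade(double *samples, int num_samples, FadeDirection dir)
{
    assert(samples != NULL && num_samples > 1);
    for (int t = 0; t < num_samples; ++t) {
        double ramp = (double)t / (double)(num_samples - 1);
        samples[t] *= (dir == FADE_IN) ? ramp : 1.0 - ramp;
    }
}

/*
 * Fade an ambient HOA coefficient in one direction and the matching
 * foreground vector element in the opposite direction.
 */
void fade_transition(double *ambient, double *fg_element, int num_samples,
                     FadeDirection ambient_dir)
{
    apply_fade(ambient, num_samples, ambient_dir);
    apply_fade(fg_element, num_samples,
               ambient_dir == FADE_IN ? FADE_OUT : FADE_IN);
}
```

With a linear ramp the two gains sum to one at every sample, which is one simple way to hand a coefficient over between the ambient and foreground representations.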

Foreground formulation unit 78 may represent a unit configured to perform matrix multiplication with respect to adjusted foreground V[k] vectors 55_k''' and interpolated nFG signals 49' to generate foreground HOA coefficients 65. In this respect, foreground formulation unit 78 may combine audio objects 49' (which is another way of referring to interpolated nFG signals 49') with vectors 55_k''' to reconstruct the foreground (in other words, the predominant aspects) of HOA coefficients 11'. Foreground formulation unit 78 may perform a matrix multiplication of interpolated nFG signals 49' by adjusted foreground V[k] vectors 55_k'''.

HOA coefficient formulation unit 82 may represent a unit configured to combine foreground HOA coefficients 65 with adjusted ambient HOA coefficients 47'' to obtain HOA coefficients 11'. The prime notation reflects that HOA coefficients 11' may be similar to, but not identical to, HOA coefficients 11. The differences between HOA coefficients 11 and 11' may result from losses due to transmission over a lossy transmission medium, quantization, or other lossy operations.
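The operations of units 78 and 82 amount to a matrix multiplication followed by an element-wise addition, which can be sketched as follows. The row-major layouts (samples by foreground signals for the nFG signals, foreground signals by coefficients for the V vectors) and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Foreground HOA = nFG signals (num_samples x num_fg) multiplied by the
 * V vectors (num_fg x num_coefs, one vector per row).
 */
void formulate_foreground(const double *nfg, const double *v, double *fg_hoa,
                          int num_samples, int num_fg, int num_coefs)
{
    assert(nfg != NULL && v != NULL && fg_hoa != NULL);
    for (int t = 0; t < num_samples; ++t) {
        for (int c = 0; c < num_coefs; ++c) {
            double acc = 0.0;
            for (int f = 0; f < num_fg; ++f) {
                acc += nfg[t * num_fg + f] * v[f * num_coefs + c];
            }
            fg_hoa[t * num_coefs + c] = acc;
        }
    }
}

/* Reconstructed HOA = foreground HOA + adjusted ambient HOA, element by element. */
void formulate_hoa(const double *fg_hoa, const double *ambient, double *hoa,
                   int num_samples, int num_coefs)
{
    assert(fg_hoa != NULL && ambient != NULL && hoa != NULL);
    for (int k = 0; k < num_samples * num_coefs; ++k) {
        hoa[k] = fg_hoa[k] + ambient[k];
    }
}
```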

In addition, extraction unit 72 and, more generally, audio decoding apparatus 24 may also be configured to operate according to various aspects of the techniques described in the present invention to obtain a bit stream 21 that may be optimized in the manner described above, in some cases by not including various syntax elements or data fields.

In some cases, audio decoding apparatus 24 may be configured, when decompressing high-order ambiophony audio data compressed using a first compression scheme, to obtain a bit stream 21 that represents a compressed version of the high-order ambiophony audio data and that does not include bits corresponding to a second compression scheme also used to compress high-order ambiophony audio data. The first compression scheme may include a vector-based compression scheme, with the resulting vectors defined in the spherical harmonics domain and sent via bit stream 21. In some examples, the vector-based decomposition compression scheme may include a compression scheme that involves applying a singular value decomposition (or an equivalent thereof, as described in more detail with respect to the example of Fig. 3) to the high-order ambiophony audio data.

Audio decoding apparatus 24 may be configured to obtain a bit stream 21 that does not include bits corresponding to at least one syntax element used to perform the second type of compression scheme. As noted above, the second compression scheme includes a directionality-based compression scheme. More specifically, audio decoding apparatus 24 may be configured to obtain a bit stream 21 that does not include bits corresponding to the HOAPredictionInfo syntax element of the second compression scheme. In other words, when the second compression scheme includes a directionality-based compression scheme, audio decoding apparatus 24 may be configured to obtain a bit stream 21 that does not include bits corresponding to the HOAPredictionInfo syntax element of the directionality-based compression scheme. As noted above, the HOAPredictionInfo syntax element may indicate a prediction between two or more direction-based signals.

In some cases, as an alternative to or in combination with the preceding examples, audio decoding apparatus 24 may be configured, when gain correction was suppressed during compression of the high-order ambiophony audio data, to obtain a bit stream 21 that represents a compressed version of the high-order ambiophony audio data and that does not include gain correction data. In these cases, audio decoding apparatus 24 may be configured to decompress the high-order ambiophony audio data according to a vector-based synthesis decompression scheme. The compressed version of the high-order ambiophony data is generated by applying a singular value decomposition (or an equivalent thereof, as described in more detail above with respect to the example of Fig. 3) to the high-order ambiophony audio data. When the SVD or its equivalent is applied to the HOA audio data, audio coding apparatus 20 specifies in bit stream 21 at least one of the resulting vectors, or bits representing it, where the vector describes spatial characteristics of a corresponding foreground audio object (such as the width, location, and volume of the corresponding foreground audio object).

More specifically, audio decoding apparatus 24 may be configured to obtain from bit stream 21 a MaxGainCorrAmbExp syntax element having a value set to zero to indicate that gain correction is suppressed. That is, when gain correction is suppressed, audio decoding apparatus 24 may be configured to obtain a bit stream that does not include an HOAGainCorrection data field storing gain correction data. Bit stream 21 may include a MaxGainCorrAmbExp syntax element having a value of zero to indicate that gain correction is suppressed, and may not include an HOAGainCorrection data field storing gain correction data. The suppression of gain correction may occur when compression of the high-order ambiophony audio data includes applying unified speech and audio coding (USAC) to the high-order ambiophony audio data.

Fig. 5 is a flowchart illustrating example operation of an audio coding apparatus (such as audio coding apparatus 20 shown in the example of Fig. 3) in performing various aspects of the vector-based synthesis techniques described in the present invention. Initially, audio coding apparatus 20 receives HOA coefficients 11 (106). Audio coding apparatus 20 may invoke LIT unit 30, which may apply a LIT with respect to the HOA coefficients to output transformed HOA coefficients (e.g., in the case of SVD, the transformed HOA coefficients may include US[k] vectors 33 and V[k] vectors 35) (107).

Audio coding apparatus 20 may next invoke parameter calculation unit 32 to perform the analysis described above with respect to any combination of US[k] vectors 33, US[k-1] vectors 33, and V[k] and/or V[k-1] vectors 35 to identify various parameters in the manner described above. That is, parameter calculation unit 32 may determine at least one parameter based on an analysis of transformed HOA coefficients 33/35 (108).

Audio coding apparatus 20 may then invoke reorder unit 34, which may reorder the transformed HOA coefficients (which, again in the context of SVD, may refer to US[k] vectors 33 and V[k] vectors 35) based on the parameter to generate reordered transformed HOA coefficients 33'/35' (in other words, US[k] vectors 33' and V[k] vectors 35'), as described above (109). Audio coding apparatus 20 may also invoke sound field analysis unit 44 during any of the foregoing or subsequent operations. As described above, sound field analysis unit 44 may perform a sound field analysis with respect to HOA coefficients 11 and/or transformed HOA coefficients 33/35 to determine the total number of foreground channels (nFG) 45, the order of the background sound field (N_BG), and the number (nBGa) and indices (i) of additional BG HOA channels to send (which may collectively be denoted as background channel information 43 in the example of Fig. 3) (109).

Audio coding apparatus 20 may also invoke background selection unit 48. Background selection unit 48 may determine background or ambient HOA coefficients 47 based on background channel information 43 (110). Audio coding apparatus 20 may further invoke foreground selection unit 36, which may select, based on nFG 45 (which may represent one or more indices identifying the foreground vectors), those of reordered US[k] vectors 33' and reordered V[k] vectors 35' that represent the foreground or distinct components of the sound field (112).

Audio coding apparatus 20 may invoke energy compensation unit 38. Energy compensation unit 38 may perform energy compensation with respect to ambient HOA coefficients 47 to compensate for the energy loss caused by the removal of various ones of the HOA coefficients by background selection unit 48 (114), thereby generating energy-compensated ambient HOA coefficients 47'.

Audio coding apparatus 20 may also invoke spatio-temporal interpolation unit 50. Spatio-temporal interpolation unit 50 may perform spatio-temporal interpolation with respect to reordered transformed HOA coefficients 33'/35' to obtain interpolated foreground signals 49' (which may also be referred to as "interpolated nFG signals 49'") and remaining foreground directional information 53 (which may also be referred to as "V[k] vectors 53") (116). Audio coding apparatus 20 may then invoke coefficient reduction unit 46. Coefficient reduction unit 46 may perform coefficient reduction with respect to remaining foreground V[k] vectors 53 based on background channel information 43 to obtain reduced foreground directional information 55 (which may also be referred to as reduced foreground V[k] vectors 55) (118).

Audio coding apparatus 20 may then invoke quantization unit 52 to compress reduced foreground V[k] vectors 55 in the manner described above and generate coded foreground V[k] vectors 57 (120).

Audio coding apparatus 20 may also invoke psychoacoustic audio coder unit 40. Psychoacoustic audio coder unit 40 may psychoacoustically code each vector of energy-compensated ambient HOA coefficients 47' and interpolated nFG signals 49' to generate encoded ambient HOA coefficients 59 and encoded nFG signals 61. The audio coding apparatus may then invoke bit stream generation unit 42. Bit stream generation unit 42 may generate bit stream 21 based on coded foreground directional information 57, coded ambient HOA coefficients 59, coded nFG signals 61, and background channel information 43.

Fig. 6 is a flowchart illustrating example operation of an audio decoding apparatus (such as audio decoding apparatus 24 shown in the example of Fig. 4) in performing various aspects of the techniques described in the present invention. Initially, audio decoding apparatus 24 may receive bit stream 21 (130). Upon receiving the bit stream, audio decoding apparatus 24 may invoke extraction unit 72. Assuming for purposes of discussion that bit stream 21 indicates that vector-based reconstruction is to be performed, extraction unit 72 may parse the bit stream to retrieve the information noted above and pass that information to vector-based reconstruction unit 92.

In other words, extraction unit 72 may extract, from bit stream 21 in the manner described above, coded foreground directional information 57 (which, again, may also be referred to as coded foreground V[k] vectors 57), coded ambient HOA coefficients 59, and the coded foreground signals (which may also be referred to as coded foreground nFG signals 59 or coded foreground audio objects 59) (132).

Audio decoding apparatus 24 may further invoke dequantization unit 74. Dequantization unit 74 may entropy decode and dequantize coded foreground directional information 57 to obtain reduced foreground directional information 55_k (136). Audio decoding apparatus 24 may also invoke psychoacoustic decoding unit 80. Psychoacoustic decoding unit 80 may decode encoded ambient HOA coefficients 59 and encoded foreground signals 61 to obtain energy-compensated ambient HOA coefficients 47' and interpolated foreground signals 49' (138). Psychoacoustic decoding unit 80 may pass energy-compensated ambient HOA coefficients 47' to fade unit 770 and nFG signals 49' to foreground formulation unit 78.

Audio decoding apparatus 24 may next invoke spatio-temporal interpolation unit 76. Spatio-temporal interpolation unit 76 may receive reordered foreground directional information 55_k' and perform spatio-temporal interpolation with respect to reduced foreground directional information 55_k/55_k-1 to generate interpolated foreground directional information 55_k'' (140). Spatio-temporal interpolation unit 76 may forward interpolated foreground V[k] vectors 55_k'' to fade unit 770.

Audio decoding apparatus 24 may invoke fade unit 770. Fade unit 770 may receive or otherwise obtain (e.g., from extraction unit 72) a syntax element (e.g., the AmbCoeffTransition syntax element) indicating when energy-compensated ambient HOA coefficients 47' are in transition. Based on the transition syntax element and maintained transition state information, fade unit 770 may fade in or fade out energy-compensated ambient HOA coefficients 47', outputting adjusted ambient HOA coefficients 47'' to HOA coefficient formulation unit 82. Based on the syntax element and the maintained transition state information, fade unit 770 may also fade out or fade in the corresponding one or more elements of interpolated foreground V[k] vectors 55_k'', outputting adjusted foreground V[k] vectors 55_k''' to foreground formulation unit 78 (142).

Audio decoding apparatus 24 may invoke foreground formulation unit 78. Foreground formulation unit 78 may perform a matrix multiplication of nFG signals 49' by adjusted foreground directional information 55_k''' to obtain foreground HOA coefficients 65 (144). Audio decoding apparatus 24 may also invoke HOA coefficient formulation unit 82. HOA coefficient formulation unit 82 may add foreground HOA coefficients 65 to adjusted ambient HOA coefficients 47'' to obtain HOA coefficients 11' (146).

Fig. 7 is a flowchart illustrating example operation of a system, such as system 10 shown in the example of Fig. 2, in performing various aspects of the techniques described in the present invention. As discussed above, content creator device 12 may use audio editing system 18 to create or edit captured or generated audio content (shown as HOA coefficients 11 in the example of Fig. 2). Content creator device 12 may then use audio renderer 1 to render HOA coefficients 11 into generated multi-channel speaker feeds, as discussed in more detail above (200). Content creator device 12 may then play these speaker feeds using an audio playback system and determine whether further adjustment or editing is required to capture, as one example, the desired artistic intent (202). When further adjustment is desired ("Yes" 202), content creator device 12 may remix HOA coefficients 11 (204), render HOA coefficients 11 (200), and determine whether further adjustment is necessary (202). When further adjustment is not desired ("No" 202), audio coding apparatus 20 may encode the audio content in the manner described above with respect to the example of Fig. 5 to generate bit stream 21 (206). Audio coding apparatus 20 may also generate and specify audio spatial cue 2 in bit stream 21, as described in more detail above (208).

Content consumer device 14 may then obtain audio spatial cue 2 from bit stream 21 (210). Decoding apparatus 24 may then decode bit stream 21 in the manner described above with respect to the example of Fig. 6 to obtain the audio content (shown as HOA coefficients 11' in the example of Fig. 2) (211). Audio playback system 16 may then render HOA coefficients 11' based on audio spatial cue 2 in the manner described above (212) and play the rendered audio content via loudspeakers 3 (214).

The techniques described in the present invention may therefore enable, as a first example, a device that generates a bit stream representing multi-channel audio content to specify an audio spatial cue. In this first example, the device may include means for specifying the audio spatial cue, the audio spatial cue including a signal value identifying the audio renderer used when generating the multi-channel audio content.

The device of the first example, wherein the signal value includes a matrix used to render spherical harmonics coefficients to a plurality of speaker feeds.

In a second example, the device of the first example, wherein the signal value includes two or more bits defining an index, the index indicating that the bit stream includes a matrix used to render spherical harmonics coefficients to a plurality of speaker feeds.

The device of the second example, wherein the audio spatial cue further includes two or more bits defining the number of rows of the matrix included in the bit stream, and two or more bits defining the number of columns of the matrix included in the bit stream.

The device of the first example, wherein the signal value specifies a rendering algorithm used to render audio objects to a plurality of speaker feeds.

The device of the first example, wherein the signal value specifies a rendering algorithm used to render spherical harmonics coefficients to a plurality of speaker feeds.

The device of the first example, wherein the signal value includes two or more bits defining an index, the index being associated with one of a plurality of matrices used to render spherical harmonics coefficients to a plurality of speaker feeds.

The device of the first example, wherein the signal value includes two or more bits defining an index, the index being associated with one of a plurality of rendering algorithms used to render audio objects to a plurality of speaker feeds.

The device of the first example, wherein the signal value includes two or more bits defining an index, the index being associated with one of a plurality of rendering algorithms used to render spherical harmonics coefficients to a plurality of speaker feeds.

The device of the first example, wherein the means for specifying the audio spatial cue includes means for specifying the audio spatial cue in the bit stream on a per-audio-frame basis.

The device of the first example, wherein the means for specifying the audio spatial cue includes means for specifying the audio spatial cue a single time in the bit stream.

In a third example, a non-transitory computer-readable storage medium has instructions stored thereon that, when executed, cause one or more processors to specify an audio spatial cue in a bit stream, wherein the audio spatial cue identifies the sound renderer used when generating multi-channel audio content.

In a fourth example, a device for rendering multi-channel audio content from a bit stream includes: means for determining an audio spatial cue, the audio spatial cue including a signal value that identifies the sound renderer used when generating the multi-channel audio content; and means for rendering multiple speaker feeds based on the audio spatial cue specified in the bit stream.

The device of the fourth example, wherein the signal value includes a matrix for rendering spherical harmonics coefficients into multiple speaker feeds, and wherein the means for rendering multiple speaker feeds includes means for rendering the multiple speaker feeds based on the matrix.

In a fifth example, the device of the fourth example, wherein the signal value includes two or more bits that define an index, the index indicating that the bit stream includes a matrix for rendering spherical harmonics coefficients into multiple speaker feeds, wherein the device further includes means for parsing the matrix from the bit stream in response to the index, and wherein the means for rendering multiple speaker feeds includes means for rendering the multiple speaker feeds based on the parsed matrix.

The device of the fifth example, wherein the signal value further includes two or more bits that define the number of rows of the matrix contained in the bit stream, and two or more bits that define the number of columns of the matrix contained in the bit stream, and wherein the means for parsing the matrix from the bit stream includes means for parsing the matrix from the bit stream in response to the index and based on the two or more bits that define the number of rows and the two or more bits that define the number of columns.

The device of the fourth example, wherein the signal value specifies a rendering algorithm for rendering audio objects into multiple speaker feeds, and wherein the means for rendering multiple speaker feeds includes means for rendering the multiple speaker feeds from the audio objects using the specified rendering algorithm.

The device of the fourth example, wherein the signal value specifies a rendering algorithm for rendering spherical harmonics coefficients into multiple speaker feeds, and wherein the means for rendering multiple speaker feeds includes means for rendering the multiple speaker feeds from the spherical harmonics coefficients using the specified rendering algorithm.

The device of the fourth example, wherein the signal value includes two or more bits that define an index, the index being associated with one of multiple matrices for rendering spherical harmonics coefficients into multiple speaker feeds, and wherein the means for rendering multiple speaker feeds includes means for rendering the multiple speaker feeds from the spherical harmonics coefficients using the one of the multiple matrices associated with the index.

The device of the fourth example, wherein the signal value includes two or more bits that define an index, the index being associated with one of multiple rendering algorithms for rendering audio objects into multiple speaker feeds, and wherein the means for rendering multiple speaker feeds includes means for rendering the multiple speaker feeds from the audio objects using the one of the multiple rendering algorithms associated with the index.

The device of the fourth example, wherein the signal value includes two or more bits that define an index, the index being associated with one of multiple rendering algorithms for rendering spherical harmonics coefficients into multiple speaker feeds, and wherein the means for rendering multiple speaker feeds includes means for rendering the multiple speaker feeds from the spherical harmonics coefficients using the one of the multiple rendering algorithms associated with the index.

The device of the fourth example, wherein the means for determining the audio spatial cue includes means for determining the audio spatial cue from the bit stream on a per-audio-frame basis.

The device of the fourth example, wherein the means for determining the audio spatial cue includes means for determining the audio spatial cue a single time from the bit stream.

In a sixth example, a non-transitory computer-readable storage medium has instructions stored thereon that, when executed, cause one or more processors to: determine an audio spatial cue, the audio spatial cue including a signal value that identifies the sound renderer used when generating multi-channel audio content; and render multiple speaker feeds based on the audio spatial cue specified in the bit stream.

FIGS. 8A to 8D are diagrams illustrating bit streams 21A to 21D formed in accordance with the techniques described in this disclosure. In the example of FIG. 8A, bit stream 21A may represent one example of the bit stream 21 shown in FIGS. 2 to 4 above. Bit stream 21A includes audio spatial cue 2A, which includes one or more bits defining a signal value 554. This signal value 554 may represent any combination of the types of information described below. Bit stream 21A also includes audio content 558, which may represent one example of audio content 7/9.

In the example of FIG. 8B, bit stream 21B may be similar to bit stream 21A, where the signal value 554 of audio spatial cue 2B includes an index 554A, one or more bits defining a row size 554B of the signaled matrix, one or more bits defining a column size 554C of the signaled matrix, and matrix coefficients 554D. The index 554A may be defined using two to five bits, and each of the row size 554B and the column size 554C may be defined using two to sixteen bits.

Extraction unit 72 may extract the index 554A and determine whether the index signals that a matrix is included in bit stream 21B (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bit stream 21B). In the example of FIG. 8B, bit stream 21B includes an index 554A signaling that the matrix is explicitly specified in bit stream 21B. As a result, extraction unit 72 may extract the row size 554B and the column size 554C. Extraction unit 72 may be configured to compute the number of bits to parse that represent the matrix coefficients as a function of the row size 554B, the column size 554C, and a signaled (not shown in FIG. 8A) or implicit bit size of each matrix coefficient. Using the determined number of bits, extraction unit 72 may extract the matrix coefficients 554D, which audio playback system 16 may use to configure one of the sound renderers 22 as described above. Although audio spatial cue 2B is shown as being signaled a single time in bit stream 21B, it may be signaled multiple times in bit stream 21B, or at least partially or fully in a separate out-of-band channel (as optional data in some cases).
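
The parsing order described above (index, then row size, then column size, then coefficients) can be sketched as follows. This is a minimal illustration only: the field widths, the MSB-first bit order, the choice of 0000 as the "explicit matrix" index, and the names `BitReader`, `pack_bits`, and `parse_rendering_info` are all assumptions made for the sketch, since the text gives only ranges (two to five bits for the index, two to sixteen bits for each size) and leaves the coefficient bit size signaled or implicit.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object (assumed bit order)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current offset in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Hypothetical field widths chosen from the ranges given in the text.
INDEX_BITS = 4
SIZE_BITS = 5
COEFF_BITS = 8  # assumed implicit bit size of each matrix coefficient
EXPLICIT_MATRIX_INDEX = 0b0000  # assumed: this index means "matrix is in the bit stream"


def pack_bits(fields):
    """Helper for building test payloads from (value, nbits) pairs."""
    bits = "".join(format(value, f"0{nbits}b") for value, nbits in fields)
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def parse_rendering_info(data: bytes):
    """Return (index, matrix); matrix is None when no matrix is signaled."""
    reader = BitReader(data)
    index = reader.read(INDEX_BITS)
    if index != EXPLICIT_MATRIX_INDEX:
        return index, None
    rows = reader.read(SIZE_BITS)
    cols = reader.read(SIZE_BITS)
    # The coefficient payload occupies rows * cols * COEFF_BITS bits.
    matrix = [[reader.read(COEFF_BITS) for _ in range(cols)] for _ in range(rows)]
    return index, matrix
```

The coefficients are returned here as raw unsigned codes; how they map to actual rendering gains is not specified in this passage and is left out of the sketch.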

In the example of FIG. 8C, bit stream 21C may represent one example of the bit stream 21 shown in FIGS. 2 to 4 above. Bit stream 21C includes audio spatial cue 2C, which includes a signal value 554 that in this example specifies an algorithm index 554E. Bit stream 21C also includes audio content 558. The algorithm index 554E may be defined using two to five bits, as noted above, where this algorithm index 554E may identify the rendering algorithm to be used when rendering audio content 558.

Extraction unit 72 may extract the algorithm index 554E and determine whether the algorithm index 554E signals that a matrix is included in bit stream 21C (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bit stream 21C). In the example of FIG. 8C, bit stream 21C includes an algorithm index 554E signaling that the matrix is not explicitly specified in bit stream 21C. As a result, extraction unit 72 forwards the algorithm index 554E to audio playback system 16, which selects the corresponding rendering algorithm, if available (denoted as renderers 22 in the examples of FIGS. 2 to 4). Although audio spatial cue 2C is shown in the example of FIG. 8C as being signaled a single time in bit stream 21C, it may be signaled multiple times in bit stream 21C, or at least partially or fully in a separate out-of-band channel (as optional data in some cases).
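
Selecting a renderer from a forwarded index can be sketched as a simple table lookup. The table contents, the two toy renderers, and the fallback to a default renderer when the index is not available are assumptions for illustration; the text states only that the corresponding rendering algorithm is selected "if available" and does not say what happens otherwise.

```python
from typing import Callable, Dict, List

Renderer = Callable[[List[float]], List[float]]


def passthrough(samples: List[float]) -> List[float]:
    """Toy renderer: returns the input unchanged."""
    return list(samples)


def half_gain(samples: List[float]) -> List[float]:
    """Toy renderer: scales every sample by 0.5."""
    return [0.5 * s for s in samples]


# Hypothetical mapping from algorithm index to locally available renderers.
RENDERERS: Dict[int, Renderer] = {
    0b0001: passthrough,
    0b0010: half_gain,
}


def select_renderer(algorithm_index: int, default: Renderer = passthrough) -> Renderer:
    """Return the renderer for the index, or the default if none is available (assumed policy)."""
    return RENDERERS.get(algorithm_index, default)
```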

In the example of FIG. 8D, bit stream 21D may represent one example of the bit stream 21 shown in FIGS. 2 to 4 above. Bit stream 21D includes audio spatial cue 2D, which includes a signal value 554 that in this example specifies a matrix index 554F. Bit stream 21D also includes audio content 558. The matrix index 554F may be defined using two to five bits, as noted above, where this matrix index 554F may identify the rendering algorithm to be used when rendering audio content 558.

Extraction unit 72 may extract the matrix index 554F and determine whether the matrix index 554F signals that a matrix is included in bit stream 21D (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bit stream 21D). In the example of FIG. 8D, bit stream 21D includes a matrix index 554F signaling that the matrix is not explicitly specified in bit stream 21D. As a result, extraction unit 72 forwards the matrix index 554F to the audio playback device, which selects the corresponding one of the renderers 22, if available. Although audio spatial cue 2D is shown in the example of FIG. 8D as being signaled a single time in bit stream 21D, it may be signaled multiple times in bit stream 21D, or at least partially or fully in a separate out-of-band channel (as optional data in some cases).

FIGS. 8E to 8G are diagrams illustrating, in more detail, portions of the bit stream or side channel information that may specify the compressed spatial components. FIG. 8E illustrates a first example of a frame 249A′ of bit stream 21. In the example of FIG. 8E, frame 249A′ includes ChannelSideInfoData (CSID) fields 154A to 154C, an HOAGainCorrectionData (HOAGCD) field, and VVectorData fields 156A and 156B. CSID field 154A includes unitC 267, bb 266, and ba 265 together with ChannelType 269, each of which is set to the corresponding value 01, 1, 0, and 01 shown in the example of FIG. 8E. CSID field 154B includes unitC 267, bb 266, and ba 265 together with ChannelType 269, each of which is set to the corresponding value 01, 1, 0, and 01 shown in the example of FIG. 8E. CSID field 154C includes a ChannelType field 269 having a value of 3. Each of the CSID fields 154A to 154C corresponds to a respective one of transport channels 1, 2, and 3. In effect, each CSID field 154A to 154C indicates whether the corresponding payload 156A and 156B is a direction-based signal (when the corresponding ChannelType is equal to zero), a vector-based signal (when the corresponding ChannelType is equal to one), an additional ambient HOA coefficient (when the corresponding ChannelType is equal to two), or empty (when the ChannelType is equal to three).
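
The four ChannelType values listed above can be captured in a small enumeration. The enumeration and helper names below are illustrative choices for this sketch, not identifiers taken from the bit stream syntax.

```python
from enum import IntEnum


class ChannelType(IntEnum):
    """Payload kinds indicated by the two-bit ChannelType field, per the text above."""
    DIRECTION_BASED = 0
    VECTOR_BASED = 1
    ADDITIONAL_AMBIENT_HOA = 2
    EMPTY = 3


def describe_channels(channel_types):
    """Map raw ChannelType values (one per transport channel) to readable names."""
    return [ChannelType(value).name for value in channel_types]
```

For the frame of FIG. 8E, where the three transport channels carry ChannelType values 1, 1, and 3, `describe_channels([1, 1, 3])` yields two vector-based entries followed by one empty entry.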

In the example of FIG. 8E, frame 249A′ includes two vector-based signals (given that ChannelType 269 is equal to 1 in CSID fields 154A and 154B) and one empty signal (given that ChannelType 269 is equal to 3 in CSID field 154C). Based on the aforementioned HOAconfig portion (not shown for ease of illustration), audio decoding device 24 may determine that all 16 V-vector elements are encoded. Hence, VVectorData 156A and 156B each include all 16 vector elements, each of which is uniformly quantized with 8 bits.
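
As a rough illustration of what "uniformly quantized with 8 bits" means for a 16-element vector, the sketch below uses a generic uniform quantizer. The input range of [-1, 1), the midpoint reconstruction rule, and the function names are assumptions for the sketch; the actual mapping used for VVectorData is not given in this passage.

```python
NBITS = 8
LEVELS = 1 << NBITS  # 256 quantization levels


def quantize(x: float) -> int:
    """Map a value in [-1, 1) to an 8-bit code (assumed range), clamping out-of-range input."""
    code = int((x + 1.0) / 2.0 * LEVELS)
    return max(0, min(LEVELS - 1, code))


def dequantize(code: int) -> float:
    """Reconstruct at the midpoint of the quantization interval (assumed rule)."""
    return (code + 0.5) / LEVELS * 2.0 - 1.0


def quantize_vector(vector):
    """Quantize every element; 16 elements at 8 bits each occupy 128 bits."""
    return [quantize(x) for x in vector]
```

With these assumptions the step size is 2/256, so the reconstruction error for in-range values is at most 1/256.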

As further shown in the example of FIG. 8E, frame 249A′ does not include an HOAPredictionInfo field. The HOAPredictionInfo field may represent a field corresponding to a second, direction-based compression scheme, which may be removed when the vector-based compression scheme is used to compress HOA audio data in accordance with the techniques described in this disclosure.

FIG. 8F is a diagram illustrating a frame 249A″ that is substantially similar to frame 249A, except that the HOAGainCorrectionData has been removed from each transport channel stored to frame 249A″. The HOAGainCorrectionData field may be removed from frame 249A″ when gain correction is suppressed in accordance with various aspects of the techniques described above.

FIG. 8G is a diagram illustrating a frame 249A‴ that may be similar to frame 249A″, except that the HOAPredictionInfo field is removed. Frame 249A‴ represents one example in which both aspects of the techniques are applied in combination to remove various fields that may be unnecessary in some circumstances.

The foregoing techniques may be performed with respect to any number of different contexts and audio ecosystems. Several example contexts are described below, although the techniques should not be limited to these example contexts. One example audio ecosystem may include audio content, movie studios, music studios, gaming audio studios, channel-based audio content, coding engines, game audio stems, game audio coding/rendering engines, and delivery systems.

The movie studios, music studios, and gaming audio studios may receive audio content. In some examples, the audio content may represent the output of an acquisition. The movie studios may output channel-based audio content (for example, in 2.0, 5.1, and 7.1), such as by using a digital audio workstation (DAW). The music studios may output channel-based audio content (for example, in 2.0 and 5.1), such as by using a DAW. In either case, the coding engines may receive and encode the channel-based audio content based on one or more codecs (for example, AAC, AC3, Dolby TrueHD, Dolby Digital Plus, and DTS Master Audio) for output by the delivery systems. The gaming audio studios may output one or more game audio stems, such as by using a DAW. The game audio coding/rendering engines may code and/or render the audio stems into channel-based audio content for output by the delivery systems. Another example context in which the techniques may be performed includes an audio ecosystem that may include broadcast recording audio objects, professional audio systems, consumer on-device capture, the HOA audio format, on-device rendering, consumer audio, TV and accessories, and car audio systems.

The broadcast recording audio objects, the professional audio systems, and the consumer on-device capture may all code their output using the HOA audio format. In this way, the audio content may be coded using the HOA audio format into a single representation that may be played back using the on-device rendering, the consumer audio, TV and accessories, and the car audio systems. In other words, the single representation of the audio content may be played back at a generic audio playback system (for example, audio playback system 16), as opposed to one that requires a particular configuration such as 5.1 or 7.1.

Other examples of contexts in which the techniques may be performed include an audio ecosystem that may include acquisition elements and playback elements. The acquisition elements may include wired and/or wireless acquisition devices (for example, Eigen microphones), on-device surround sound capture, and mobile devices (for example, smartphones and tablet computers). In some examples, the wired and/or wireless acquisition devices may be coupled to the mobile device via wired and/or wireless communication channels.

In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a sound field. For example, the mobile device may acquire a sound field via the wired and/or wireless acquisition devices and/or the on-device surround sound capture (for example, multiple microphones integrated into the mobile device). The mobile device may then code the acquired sound field into HOA coefficients for playback by one or more of the playback elements. For example, a user of the mobile device may record a live event (for example, a meeting, a conference, a game, or a concert), thereby acquiring the sound field of the live event, and code the recording into HOA coefficients.

The mobile device may also use one or more of the playback elements to play back the HOA-coded sound field. For example, the mobile device may decode the HOA-coded sound field and output, to one or more of the playback elements, a signal that causes the one or more playback elements to recreate the sound field. As one example, the mobile device may use wired and/or wireless communication channels to output the signal to one or more speakers (for example, speaker arrays or sound bars). As another example, the mobile device may use docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (for example, sound systems in smart cars and/or homes). As another example, the mobile device may use headphone rendering to output the signal to a set of headphones, for example to create realistic binaural sound.

In some examples, a particular mobile device may both acquire a 3D sound field and play back the same 3D sound field at a later time. In some examples, the mobile device may acquire a 3D sound field, encode the 3D sound field into HOA, and transmit the encoded 3D sound field to one or more other devices (for example, other mobile devices and/or other non-mobile devices) for playback.

Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs that support editing of HOA signals. For example, the one or more DAWs may include HOA plug-ins and/or tools that may be configured to operate with (for example, work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output coded audio content to the rendering engines, which may render a sound field for playback by the delivery systems.

The techniques may also be performed with respect to exemplary audio acquisition devices. For example, the techniques may be performed with respect to an Eigen microphone that may include multiple microphones collectively configured to record a 3D sound field. In some examples, the multiple microphones of the Eigen microphone may be located on the surface of a substantially spherical ball with a radius of approximately 4 cm. In some examples, audio encoding device 20 may be integrated into the Eigen microphone so as to output bit stream 21 directly from the microphone.

Another exemplary audio acquisition context may include a production truck that may be configured to receive signals from one or more microphones (for example, one or more Eigen microphones). The production truck may also include an audio encoder, such as audio encoder 20 of FIG. 3.

In some cases, the mobile device may also include multiple microphones that are collectively configured to record a 3D sound field. In other words, the multiple microphones may have X, Y, Z diversity. In some examples, the mobile device may include a microphone that can be rotated to provide X, Y, Z diversity relative to one or more other microphones of the mobile device. The mobile device may also include an audio encoder, such as audio encoder 20 of FIG. 3.

A ruggedized video capture device may further be configured to record a 3D sound field. In some examples, the ruggedized video capture device may be attached to the helmet of a user engaged in an activity. For example, the ruggedized video capture device may be attached to the helmet of a user while the user is rafting. In this way, the ruggedized video capture device may capture a 3D sound field that represents the action all around the user (for example, water crashing behind the user, or another rafter speaking in front of the user).

The techniques may also be performed with respect to an accessory-enhanced mobile device that may be configured to record a 3D sound field. In some examples, the mobile device may be similar to the mobile devices discussed above, with the addition of one or more accessories. For example, an Eigen microphone may be attached to the mobile device noted above to form an accessory-enhanced mobile device. In this way, the accessory-enhanced mobile device may capture a higher-quality version of the 3D sound field than would be captured using only the sound capture components integral to the accessory-enhanced mobile device.

Example audio playback devices that may perform various aspects of the techniques described in this disclosure are discussed further below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D sound field. Moreover, in some examples, headphone playback devices may be coupled to decoder 24 via a wired or wireless connection. In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be used to render the sound field on any combination of speakers, sound bars, and headphone playback devices.

A number of different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For example, the following may be suitable environments for performing various aspects of the techniques described in this disclosure: a 5.1 speaker playback environment, a 2.0 (for example, stereo) speaker playback environment, a 9.1 speaker playback environment with full-height front speakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with an earbud playback environment.

In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be used to render the sound field on any of the foregoing playback environments. In addition, the techniques of this disclosure enable a renderer to render a sound field from a generic representation for playback on playback environments other than those described above. For example, if design considerations prohibit proper placement of speakers according to a 7.1 speaker playback environment (for example, if it is not possible to place a right surround speaker), the techniques of this disclosure enable the renderer to compensate with the other six speakers so that playback may be achieved on a 6.1 speaker playback environment.
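
The idea of rendering one generic representation to different speaker layouts can be sketched as a matrix multiplication, with one rendering matrix per layout. The matrix values below are toy numbers chosen only to make the arithmetic easy to follow; they are not a real decoding matrix for any layout, and the function name `render` is an assumption for this sketch.

```python
def render(matrix, hoa_frame):
    """Multiply an L x N rendering matrix by an N x T frame of HOA coefficients.

    matrix:    L rows (one per speaker), N columns (one per HOA coefficient channel)
    hoa_frame: N rows (coefficient channels), T columns (time samples)
    returns:   L rows of T samples, i.e. one feed per speaker
    """
    n = len(hoa_frame)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix column count must match the number of HOA channels")
    t = len(hoa_frame[0])
    return [
        [sum(row[k] * hoa_frame[k][s] for k in range(n)) for s in range(t)]
        for row in matrix
    ]


# Toy first-order frame: 4 coefficient channels, 2 samples each (illustrative values).
HOA_FRAME = [[1.0, 2.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

# Two hypothetical layouts rendered from the same representation.
STEREO_MATRIX = [[0.5, 0.5, 0.0, 0.0], [0.5, -0.5, 0.0, 0.0]]
MONO_MATRIX = [[1.0, 0.0, 0.0, 0.0]]
```

Only the matrix changes between layouts; the HOA frame itself is left untouched, which is the property the paragraph above relies on when a 7.1 layout has to be replaced by a 6.1 layout.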

Moreover, a user may watch a sporting event while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D sound field of the sporting event may be acquired (for example, one or more Eigen microphones may be placed in and/or around the ballpark), HOA coefficients corresponding to the 3D sound field may be obtained and transmitted to a decoder, the decoder may reconstruct the 3D sound field based on the HOA coefficients and output the reconstructed 3D sound field to a renderer, and the renderer may obtain an indication of the type of playback environment (for example, headphones) and render the reconstructed 3D sound field into signals that cause the headphones to output a representation of the 3D sound field of the sporting event.

In each of the various instances described above, it should be understood that audio encoding device 20 may perform a method, or otherwise include means for performing each step of the method that audio encoding device 20 is configured to perform. In some cases, the means may include one or more processors. In some cases, the one or more processors may represent a special-purpose processor configured by way of instructions stored to a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide a non-transitory computer-readable storage medium having instructions stored thereon that, when executed, cause the one or more processors to perform the method that audio encoding device 20 has been configured to perform.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.

Similarly, in each of the various instances described above, it should be understood that audio decoding device 24 may perform a method, or otherwise include means for performing each step of the method that audio decoding device 24 is configured to perform. In some cases, the means may include one or more processors. In some cases, the one or more processors may represent a special-purpose processor configured by way of instructions stored to a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide a non-transitory computer-readable storage medium having instructions stored thereon that, when executed, cause the one or more processors to perform the method that audio decoding device 24 has been configured to perform.

By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various aspects of the techniques have been described. These and other aspects of the techniques are within the scope of the following claims.

Claims

1. A device configured to render high-order ambiophony coefficients, the device comprising:

one or more processors configured to:

obtain, from a bit stream containing an encoded version of the high-order ambiophony coefficients, sparsity information indicating the sparsity of a matrix, the matrix being used to render the high-order ambiophony coefficients to generate multiple speaker feeds;

obtain, from the bit stream, value symmetry information indicating the value symmetry of the matrix;

obtain, from the bit stream, a reduced number of bits used to represent the matrix; and

reconstruct the matrix based on the sparsity information, the value symmetry information, and the reduced number of bits; and

a memory coupled to the one or more processors and configured to store the sparsity information.

2. the apparatus according to claim 1, wherein one or more described processors are further configured to determine and will use The matrix renders the loudspeaker layout of the multiple speaker feeds from the high-order ambiophony coefficient.

3. the apparatus according to claim 1 further comprises being configured to reproduce based on the multiple speaker feeds By the loudspeaker for the sound field that the high-order ambiophony coefficient indicates.

4. The device according to claim 1, wherein the one or more processors are further configured to obtain an audio spatial cue indicating a signal value that identifies a sound renderer used when generating the plurality of speaker feeds, and to render the plurality of speaker feeds based on the audio spatial cue.

5. The device according to claim 4,

wherein the signal value includes the matrix used to render the high-order ambiophony coefficients to generate the plurality of speaker feeds, and

wherein the one or more processors are configured to render the plurality of speaker feeds based on the matrix included in the signal value.

6. A method of rendering high-order ambiophony coefficients, the method comprising:

obtaining, from a bit stream that includes an encoded version of the high-order ambiophony coefficients, sparsity information indicating a sparsity of a matrix, the matrix being used to render the high-order ambiophony coefficients to generate a plurality of speaker feeds;

obtaining, from the bit stream, value symmetry information indicating a value symmetry of the matrix;

obtaining, from the bit stream, a reduced number of bits used to represent the matrix; and

reconstructing the matrix based on the value symmetry information, the sparsity information, and the reduced number of bits.

7. The method according to claim 6, further comprising determining a loudspeaker layout for which the matrix is to be used to render the plurality of speaker feeds from the high-order ambiophony coefficients.

8. The method according to claim 6, further comprising reproducing, based on the plurality of speaker feeds, a sound field represented by the high-order ambiophony coefficients.

9. The method according to claim 6, further comprising obtaining an audio spatial cue indicating a signal value that identifies a sound renderer used when generating the plurality of speaker feeds; and

rendering the plurality of speaker feeds based on the audio spatial cue.

10. The method according to claim 9,

wherein the method further comprises rendering the plurality of speaker feeds based on the matrix included in the signal value.

11. A device configured to generate a bit stream, the device comprising:

a memory configured to store a matrix; and

one or more processors coupled to the memory and configured to:

obtain sparsity information indicating a sparsity of the matrix, the matrix being used to render high-order ambiophony coefficients to generate a plurality of speaker feeds;

obtain value symmetry information indicating a value symmetry of the matrix;

determine, based on the value symmetry information and the sparsity information, a reduced number of bits used to represent the matrix; and

generate the bit stream such that it includes an encoded version of the high-order ambiophony coefficients, the value symmetry information, the sparsity information, and the reduced number of bits.

12. The device according to claim 11, wherein the one or more processors are further configured to determine a loudspeaker layout for which the matrix is to be used to render the plurality of speaker feeds from the high-order ambiophony coefficients.

13. The device according to claim 11, further comprising a microphone configured to capture a sound field represented by the high-order ambiophony coefficients.

14. A method of generating a bit stream, the method comprising:

obtaining sparsity information indicating a sparsity of a matrix, the matrix being used to render high-order ambiophony coefficients to generate a plurality of speaker feeds;

reducing, based on the value symmetry information and the sparsity information, a number of bits used to represent the matrix; and

generating the bit stream such that it includes an encoded version of the high-order ambiophony coefficients, the value symmetry information, the sparsity information, and the reduced number of bits.

15. The method according to claim 14, further comprising determining a loudspeaker layout for which the matrix is to be used to render the plurality of speaker feeds from the high-order ambiophony coefficients.

16. The method according to claim 14, further comprising capturing a sound field represented by the high-order ambiophony coefficients.
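For illustration only, the bit stream generation and matrix reconstruction steps recited above can be sketched as a toy round trip. This is a minimal sketch under stated assumptions and is not the syntax defined by the invention: the 16-bit word size, the mirrored-row definition of value symmetry, the 50% zero threshold for sparsity, the dictionary standing in for the bit stream, and all function names are hypothetical choices made for this example.

```python
from typing import List

Matrix = List[List[int]]
BITS_PER_VALUE = 16  # assumption: fixed word size per transmitted entry


def is_value_symmetric(m: Matrix) -> bool:
    """Toy symmetry test (assumption): row i equals row n - 1 - i."""
    return all(m[i] == m[len(m) - 1 - i] for i in range(len(m) // 2))


def is_sparse(m: Matrix, threshold: float = 0.5) -> bool:
    """Toy sparsity test (assumption): at least `threshold` of entries are zero."""
    entries = [v for row in m for v in row]
    return sum(v == 0 for v in entries) >= threshold * len(entries)


def encode(m: Matrix) -> dict:
    """Derive both flags, then count the bits needed after exploiting them."""
    symmetric = is_value_symmetric(m)
    sparse = is_sparse(m)
    # With symmetry, only the first half of the rows (rounded up) is sent.
    rows = m[: (len(m) + 1) // 2] if symmetric else m
    flat = [v for row in rows for v in row]
    if sparse:
        # One mask bit per entry, plus a full word per nonzero entry.
        mask = [int(v != 0) for v in flat]
        values = [v for v in flat if v != 0]
        reduced_bits = len(mask) + BITS_PER_VALUE * len(values)
    else:
        mask = None
        values = flat
        reduced_bits = BITS_PER_VALUE * len(values)
    return {
        "shape": (len(m), len(m[0])),
        "sparse": sparse,
        "symmetric": symmetric,
        "reduced_bits": reduced_bits,
        "mask": mask,
        "values": values,
    }


def decode(bs: dict) -> Matrix:
    """Rebuild the matrix from the flags, the bit count, and the payload."""
    n_rows, n_cols = bs["shape"]
    mask_bits = len(bs["mask"]) if bs["sparse"] else 0
    payload_bits = mask_bits + BITS_PER_VALUE * len(bs["values"])
    if payload_bits != bs["reduced_bits"]:
        raise ValueError("payload size does not match the signaled bit count")
    if bs["sparse"]:
        it = iter(bs["values"])
        flat = [next(it) if bit else 0 for bit in bs["mask"]]
    else:
        flat = list(bs["values"])
    sent_rows = (n_rows + 1) // 2 if bs["symmetric"] else n_rows
    rows = [flat[r * n_cols:(r + 1) * n_cols] for r in range(sent_rows)]
    if bs["symmetric"]:
        # Mirror the transmitted rows to restore the omitted ones.
        rows += [list(rows[n_rows - 1 - i]) for i in range(sent_rows, n_rows)]
    return rows
```

In this toy model, a 4x4 matrix that is both sparse and value symmetric needs only a fraction of the 256 bits that sending every entry as a 16-bit word would take, while a dense, asymmetric matrix falls back to the full size.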