ZA200503594B - Method for describing the composition of audio signals - Google Patents
Method for describing the composition of audio signals
- Publication number
- ZA200503594B
- Authority
- ZA
- South Africa
- Prior art keywords
- sound
- audio
- sound source
- description
- screen plane
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 17
- 230000005236 sound signal Effects 0.000 title claims description 10
- 230000033001 locomotion Effects 0.000 claims description 8
- 230000009466 transformation Effects 0.000 claims description 4
- 238000012545 processing Methods 0.000 claims description 3
- 230000000007 visual effect Effects 0.000 claims description 3
- 230000000875 corresponding effect Effects 0.000 claims 3
- 230000000694 effects Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000004091 panning Methods 0.000 description 1
- 239000013598 vector Substances 0.000 description 1
Landscapes
- Stereophonic System (AREA)
Description
Method for describing the composition of audio signals
The invention relates to a method and to an apparatus for coding and decoding a presentation description of audio signals, especially for the spatialization of MPEG-4 encoded audio signals in a 3D domain.
The MPEG-4 Audio standard as defined in ISO/IEC 14496-3:2001 and the MPEG-4 Systems standard ISO/IEC 14496-1:2001 facilitates a wide variety of applications by supporting the representation of audio objects. For the combination of the audio objects, additional information - the so-called scene description - determines the placement in space and time and is transmitted together with the coded audio objects.
For playback the audio objects are decoded separately and composed using the scene description in order to prepare a single soundtrack, which is then played to the listener.
For efficiency, the MPEG-4 Systems standard ISO/IEC 14496-1:2001 defines a way to encode the scene description in a binary representation, the so-called Binary Format for Scene Description (BIFS). Correspondingly, audio scenes are described using so-called AudioBIFS.
A scene description is structured hierarchically and can be represented as a graph, wherein leaf nodes of the graph form the separate objects and the other nodes describe the processing, e.g. positioning, scaling, effects. The appearance and behavior of the separate objects can be controlled using parameters within the scene description nodes.
WO 2004/051624 PCT/EP2003/013394
The invention is based on the recognition of the following fact. The above mentioned version of the MPEG-4 Audio standard defines a node named "Sound" which allows spatialization of audio signals in a 3D domain. A further node with the name "Sound2D" only allows spatialization on a 2D screen. The use of the "Sound" node in a 2D graphical player is not specified due to different implementations of the properties in a 2D and a 3D player. However, from games, cinema and TV applications it is known that it makes sense to provide the end user with a fully spatialized "3D sound" presentation, even if the video presentation is limited to a small flat screen in front. This is not possible with the defined "Sound" and "Sound2D" nodes.
Therefore, a problem to be solved by the invention is to overcome the above mentioned drawback. This problem is addressed by the coding method disclosed in claim 1 and the corresponding decoding method disclosed in claim 5.
In principle, the inventive coding method comprises the generation of a parametric description of a sound source including information which allows spatialization in a 2D coordinate system. The parametric description of the sound source is linked with the audio signals of said sound source. An additional 1D value is added to said parametric description which allows, in a 2D visual context, a spatialization of said sound source in a 3D domain.
Separate sound sources may be coded as separate audio objects and the arrangement of the sound sources in a sound scene may be described by a scene description having first nodes corresponding to the separate audio objects and second nodes describing the presentation of the audio objects. A field of a second node may define the 3D spatialization of a sound source.

Amended 10 April 2006
Advantageously, the 2D coordinate system corresponds to the screen plane and the 1D value corresponds to a depth information perpendicular to said screen plane.
Furthermore, a transformation of said 2D coordinate system values to said 3 dimensional positions may enable the movement of a graphical object in the screen plane to be mapped to a movement of an audio object in the depth perpendicular to said screen plane.
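A minimal sketch of such a transformation (the helper name and the choice of routing the vertical screen coordinate to the depth axis are illustrative assumptions, not prescribed by the standard):

```python
def screen_to_audio_position(x, y, depth_scale=1.0):
    """Map a 2D screen-plane position to a 3D audio position.

    Illustrative transformation: the horizontal screen coordinate is
    kept as the horizontal audio coordinate, while vertical movement on
    the screen is mapped to depth perpendicular to the screen plane.
    """
    return (x, 0.0, y * depth_scale)
```

With such a mapping, dragging a graphical object upwards on the screen would move its associated sound source away from the listener.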
The inventive decoding method comprises, in principle, the reception of an audio signal corresponding to a sound source linked with a parametric description of the sound source. The parametric description includes information which allows spatialization in a 2D coordinate system. An additional 1D value is separated from said parametric description. The sound source is spatialized in a 2D visual context in a 3D domain using said additional 1D value.
Audio objects representing separate sound sources may be separately decoded and a single soundtrack may be composed from the decoded audio objects using a scene description having first nodes corresponding to the separate audio objects and second nodes describing the processing of the audio objects. A field of a second node may define the 3D spatialization of a sound source.
Advantageously, the 2D coordinate system corresponds to the screen plane and said 1D value corresponds to a depth information perpendicular to said screen plane.
Furthermore, a transformation of said 2D coordinate system values to said 3 dimensional positions may enable the movement of a graphical object in the screen plane to be mapped to a movement of an audio object in the depth perpendicular to said screen plane.

Exemplary embodiments
The Sound2D node is defined as follows:
```
Sound2D {
  exposedField SFFloat intensity  1.0
  exposedField SFVec2f location   0,0
  exposedField SFNode  source     NULL
  field        SFBool  spatialize TRUE
}
```

and the Sound node, which is a 3D node, is defined as follows:
```
Sound {
  exposedField SFVec3f direction  0, 0, 1
  exposedField SFFloat intensity  1.0
  exposedField SFVec3f location   0, 0, 0
  exposedField SFFloat maxBack    10.0
  exposedField SFFloat maxFront   10.0
  exposedField SFFloat minBack    1.0
  exposedField SFFloat minFront   1.0
  exposedField SFFloat priority   0.0
  exposedField SFNode  source     NULL
  field        SFBool  spatialize TRUE
}
```
In the following, the general term for all sound nodes (Sound2D, Sound and DirectiveSound) will be written in lower case, e.g. 'sound nodes'.
In the simplest case the Sound or Sound2D node is connected via an AudioSource node to the decoder output. The sound nodes contain the intensity and the location information.
From the audio point of view a sound node is the final node before the loudspeaker mapping. In the case of several sound nodes, the output will be summed up. From the systems point of view the sound nodes can be seen as an entry point for the audio sub-graph. A sound node can be grouped with non-audio nodes into a Transform node that will set its original location.
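The summing of several sound nodes' outputs can be sketched as follows (a minimal illustration assuming equal-length mono sample buffers; the helper name is not taken from the standard):

```python
def mix_sound_nodes(node_outputs):
    """Sum the output buffers of several sound nodes sample by sample."""
    if not node_outputs:
        return []
    length = len(node_outputs[0])
    return [sum(buffer[i] for buffer in node_outputs) for i in range(length)]
```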
With the phaseGroup field of the AudioSource node, it is possible to mark channels that contain important phase relations, as in the case of a "stereo pair", "multichannel" etc. A mixed operation of phase related channels and non-phase related channels is allowed. A spatialize field in the sound nodes specifies whether the sound shall be spatialized or not. This only applies to channels which are not members of a phase group.
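As a sketch of this rule (a hypothetical helper; the convention that a phase-group id of 0 marks a channel outside any phase group is an assumption for illustration):

```python
def channels_to_spatialize(phase_groups, spatialize):
    """Return indices of channels that may be spatialized individually.

    phase_groups: one phase-group id per channel, where 0 marks a
    channel that is not a member of any phase group.  Only such
    channels are spatialized, and only if the node's spatialize
    field is TRUE.
    """
    if not spatialize:
        return []
    return [i for i, group in enumerate(phase_groups) if group == 0]
```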
The Sound2D node can spatialize the sound on the 2D screen. The standard states that the sound should be spatialized on a screen of size 2 m x 1.5 m at a distance of one meter. This restriction seems to be ineffective because the value of the location field is not restricted and therefore the sound can also be positioned outside the screen area.
The Sound and DirectiveSound nodes can set the location everywhere in the 3D space. The mapping to the existing loudspeaker placement can be done using simple amplitude panning or more sophisticated techniques.
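Simple amplitude panning for a stereo loudspeaker pair can be sketched as follows (one common constant-power pan law; the standard leaves the actual technique open, and the function name is illustrative):

```python
import math

def amplitude_pan(sample, azimuth):
    """Pan a mono sample to a stereo pair with constant power.

    azimuth ranges from -1.0 (full left) to +1.0 (full right).
    """
    angle = (azimuth + 1.0) * math.pi / 4.0  # map [-1, 1] to [0, pi/2]
    return (sample * math.cos(angle), sample * math.sin(angle))
```

At azimuth 0 both channels receive equal gain, and the squared gains always sum to one, which keeps the perceived loudness constant while the source moves.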
Both Sound and Sound2D can handle multichannel inputs and basically have the same functionalities, but the Sound2D node cannot spatialize a sound other than to the front.
A possibility is to add Sound and Sound2D to all scene graph profiles, i.e. add the Sound node to the SF2DNode group.
But one reason for not including the "3D" sound nodes into the 2D scene graph profiles is that a typical 2D player is not capable of handling 3D vectors (SFVec3f type), as would be required for the Sound direction and location fields.
Another reason is that the Sound node is specially designed for virtual reality scenes with moving listening points and attenuation attributes for far distance sound objects. For this, the ListeningPoint node and the Sound maxBack, maxFront, minBack and minFront fields are defined.
According to one embodiment, the old Sound2D node is extended or a new Sound2Ddepth node is defined. The Sound2Ddepth node could be similar to the Sound2D node but with an additional depth field.
```
Sound2Ddepth {
  exposedField SFFloat intensity  1.0
  exposedField SFVec2f location   0,0
  exposedField SFFloat depth      0.0
  exposedField SFNode  source     NULL
  field        SFBool  spatialize TRUE
}
```
The intensity field adjusts the loudness of the sound. Its value ranges from 0.0 to 1.0, and this value specifies a factor that is used during the playback of the sound.
The location field specifies the location of the sound in the 2D scene. The depth field specifies the depth of the sound in the 2D scene, using the same coordinate system as the location field.
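The intended combination of the two fields can be sketched as follows (an illustrative helper; the node itself only carries the fields, and how a player uses them is up to the implementation):

```python
def sound2ddepth_position(location, depth):
    """Combine a Sound2Ddepth node's 2D location with its depth field
    into a 3D position, with depth measured perpendicular to the
    screen plane in the same coordinate system as location."""
    x, y = location
    return (x, y, depth)
```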
Claims (9)
1. Method for coding a presentation description of audio signals, comprising: generating a parametric description of a sound source including information which allows spatialization in a 2D coordinate system; linking the parametric description of said sound source with the audio signals of said sound source; characterized by adding an additional 1D value to said parametric description which allows in a 2D visual context a spatialization of said sound source in a 3D domain.
2. Method according to claim 1, wherein separate sound sources are coded as separate audio objects and the arrangement of the sound sources in a sound scene is described by a scene description having first nodes corresponding to the separate audio objects and second nodes describing the presentation of the audio objects, and wherein a field of a second node defines the 3D spatialization of a sound source.
3. Method according to claim 1 or 2, wherein said 2D coordinate system corresponds to a screen plane and said 1D value corresponds to a depth information perpendicular to said screen plane.
4. Method according to claim 3, wherein a transformation of said 2D coordinate system values to 3 dimensional positions enables the movement of a graphical object in the screen plane to be mapped to a movement of an audio object in the depth perpendicular to said screen plane.
5. Method for decoding a presentation description of audio signals, comprising: receiving audio signals corresponding to a sound source linked with a parametric description of said sound source, wherein said parametric description includes information which allows spatialization in a 2D coordinate system; characterized by separating an additional 1D value from said parametric description; and spatializing in a 2D visual context said sound source in a 3D domain using said additional 1D value.
6. Method according to claim 5, wherein audio objects representing separate sound sources are separately decoded and a single soundtrack is composed from the decoded audio objects using a scene description having first nodes corresponding to the separate audio objects and second nodes describing the processing of the audio objects, and wherein a field of a second node defines the 3D spatialization of a sound source.
7. Method according to claim 5 or 6, wherein said 2D coordinate system corresponds to a screen plane and said 1D value corresponds to a depth information perpendicular to said screen plane.
8. Method according to claim 7, wherein a transformation of said 2D coordinate system values to 3 dimensional positions enables the movement of a graphical object in the screen plane to be mapped to a movement of an audio object in the depth perpendicular to said screen plane.
9. Apparatus for performing a method according to any of the preceding claims.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02026770 | 2002-12-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
ZA200503594B true ZA200503594B (en) | 2006-08-30 |
Family
ID=35822618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
ZA200503594A ZA200503594B (en) | 2002-12-02 | 2003-11-22 | Method for describing the composition of audio signals |
Country Status (1)
Country | Link |
---|---|
ZA (1) | ZA200503594B (en) |
-
2003
- 2003-11-22 ZA ZA200503594A patent/ZA200503594B/en unknown
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112154676A (en) * | 2018-01-30 | 2020-12-29 | 弗劳恩霍夫应用研究促进协会 | Apparatus for converting object position of audio object, audio stream provider, audio content generation system, audio playback apparatus, audio playback method, and computer program |
US11653162B2 (en) | 2018-01-30 | 2023-05-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatuses for converting an object position of an audio object, audio stream provider, audio content production system, audio playback apparatus, methods and computer programs |