CN117223299A - Method, apparatus and system for modeling audio objects with range

Info

Publication number: CN117223299A
Application number: CN202280031578.6A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: L. Terentiv, D. Fischer, P. Setiawan, C. Fersch
Assignee (original and current): Dolby International AB
Priority claimed from: PCT/EP2022/061331 (WO2022229319A1)
Abstract

A method of modeling extended audio objects for audio rendering in a virtual or augmented reality environment is described. The method includes obtaining a range representation indicative of a geometric form of an extended audio object and information related to one or more first audio sources associated with the extended audio object. Furthermore, the method includes obtaining a relative point on the geometric form of the extended audio object based on a user location in the virtual or augmented reality environment. The method further includes determining a range parameter of the range representation based on the user location and the relative point, and determining a location of one or more second audio sources relative to the user location for modeling the extended audio object. Additionally, the method includes outputting a modified representation of the extended audio object for modeling the extended audio object.

Description

Method, apparatus and system for modeling audio objects with range
Cross Reference to Related Applications
The present application claims priority from the following priority applications: U.S. provisional application 63/181,865 (reference number: D21045USP1), filed on 29 April 2021; U.S. provisional application 63/247,156 (reference number: D21045USP2), filed on 22 September 2021; and European application 21200055.8 (reference number: D21045EP), filed on 30 September 2021.
Technical Field
This document relates to object-based audio rendering, and more particularly to rendering audio objects having a range in a Virtual Reality (VR) environment.
Background
The new MPEG-I standard enables an auditory experience from different viewpoints and/or viewing angles or listening positions by supporting full six degrees of freedom (6DoF) in Virtual Reality (VR), Augmented Reality (AR), Mixed Reality (MR), and/or Extended Reality (XR) applications. 6DoF interaction extends the 3DoF spherical video/audio experience, which is limited to head rotation (pitch, yaw, and roll), to include translational motion (front/back, up/down, and left/right), so that roaming within the virtual environment (e.g., physically walking within a room) is allowed in addition to head rotation.
For audio rendering in VR applications, object-based approaches have been widely adopted that represent a complex auditory scene as a plurality of individual audio objects, each associated with parameters or metadata defining the position/location and trajectory of the object in the scene. An audio object is not necessarily a point audio source but may be provided with a spatial range reflecting the auditory perception obtained from the audio object. Such an audio object may emit one or more sound sources that are to be rendered in VR implementations.
To create a 6DoF experience that is natural and realistic to a listener, the listener's experience of the directionality and spatial extent of sound or audio sources (objects) is critical to 6DoF rendering, especially to enable a roaming experience through scenes and around virtual audio sources. Since 6DoF rendering additionally involves large translational changes in the listener's listening position, the complex interactions between a constantly changing listening position and the ranges of audio objects with complex structures can make 6DoF rendering difficult to implement. In particular, modeling such position-object interactions requires a large number of parameters, which makes the computational complexity of the corresponding audio processing very high.
It may be noted that available audio rendering systems, such as MPEG-H 3D audio renderers, are typically limited to rendering 3DoF (i.e., rotational movement of an audio scene caused by the head movement of a listener) without taking into account translational changes in the listener's listening position. Even 3DoF+ adds only small translational changes in the listener's listening position, without taking into account larger translational movements of the listener. Thus, prior art techniques that fail to account for large translational movements of the listener may encounter difficulties in truly immersive 6DoF sound rendering.
Therefore, there is a need for a simple way to implement 6DoF rendering of audio objects. In particular, in view of the significant user movements in 6DoF rendering, it may be desirable to simplify the modeling of the (spatial) range of audio objects.
Disclosure of Invention
According to one aspect, a (e.g., computer-implemented) method of modeling an extended audio object for audio rendering in a virtual or augmented reality environment (or, in general, a computer-mediated reality environment) is described. The method may include obtaining a range representation indicative of a geometric form of an extended audio object and information related to one or more first audio sources associated with the extended audio object. The one or more first audio sources may be captured, using an audio sensor, as recorded audio sources associated with the extended audio object. In particular, the method may include using the geometric form of the extended audio object (e.g., the range representation indicative of the geometric form) to obtain a relative point based on a user position (i.e., a listening position of a listener) in the virtual or augmented reality environment. Additionally, the method may include determining a range parameter for the range representation based on the user location and the relative point.
In particular, the range parameter may describe a spatial extension of the extended audio object perceived at the user location. Thus, it will be appreciated that such spatial extension of the extended audio object may vary depending on the user location, and that the extended audio object may be adaptively modeled for various user locations. To efficiently model the extended audio object, the method may further comprise determining the location of one or more second audio sources relative to the user location. Such one or more second audio sources may be considered virtual, reproduced audio sources for modeling the extended audio object at the corresponding user location. Furthermore, the method may include outputting a modified representation of the extended audio object for modeling the extended audio object. It may be noted that the modified representation comprises the determined range parameter and the location of the one or more second audio sources.
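For illustration only, the inputs and outputs described above might be captured by data structures like the following sketch; the class and field names are assumptions for the sketch, not part of any standardized renderer interface:

```python
# Hypothetical data structures for the method's inputs and outputs.
# Names and fields are illustrative assumptions, not an actual MPEG-I API.
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class ExtentRepresentation:
    """Range representation: geometric form plus first (recorded) sources."""
    vertices: List[Vec3]       # geometry of the extended audio object
    position: Vec3             # object position in the scene
    orientation: Vec3          # object orientation (e.g., Euler angles)
    first_sources: List[int]   # references to recorded first-source signals

@dataclass
class ModifiedRepresentation:
    """Output of the conversion: all a range-modeling tool needs."""
    extent_parameter: float    # spatial extension perceived at user position
    second_source_positions: List[Vec3] = field(default_factory=list)
```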
As configured above, the proposed method allows modeling extended audio objects with simple parameters. In particular, where the spatial extent of the extended audio object and the corresponding location of the second (virtual) audio source(s) calculated for a given user location are known, the extended audio object may be effectively modeled as having an appropriate (perceived) size corresponding to that user location, which is applicable to subsequent (e.g., 6DoF) rendering of the extended audio object. Thus, the computational complexity of the audio rendering may be reduced, since detailed information about the form/position/orientation of the audio object and the movement of the user position may not be needed.
In other words, the proposed method effectively converts 6DoF data (e.g. input audio object source, user position, range geometry of objects, range position/orientation of objects, etc.) into simple information as input to a range modeling interface/tool, which allows efficient 6DoF rendering of audio objects without having to process large amounts of data.
In an embodiment, the range representation indicating the geometric form of the extended audio object corresponds to (conforms to) the geometric form of the extended audio object. For example, for a relatively simple geometry, the geometry of the extended audio object may be used as a range representation.
In an embodiment, the range parameter may be determined further based on a position and/or orientation of the extended audio object. Moreover, the method may further include determining one or more second audio sources for modeling the extended audio object based on the one or more first audio sources. According to an embodiment, the method may further comprise determining a range angle based on the user position, the relative point, and the position and/or orientation of the extended audio object. For example, the range angle may be an arc measure indicating the spatial extension of the extended audio object perceived at the user location. Thus, the range angle may refer to a relative arc measure (i.e., a relative range angle) that depends on the relative point and on the position and/or orientation of the extended audio object. In this case, the range parameter may be determined based on the (relative) range angle.
As configured above, the proposed method provides a simplified method for obtaining an accurate estimate of the spatial extent/perceptual size of an audio object at different user locations, thereby improving the performance in modeling the audio object using simple parameters.
In an embodiment, determining the location of the one or more second audio sources may include determining a circular arc based on the user location, the relative point, and the geometric form of the extended audio object. Additionally, determining the location of the one or more second audio sources may further include positioning the determined one or more second audio sources on the circular arc. Further, the arc may subtend the (relative) range angle as a corresponding arc measure at the user position, and may be determined based on the range angle and the user position. In an embodiment, the positioning may involve equally distributing all the second audio sources over the circular arc. Moreover, the positioning may depend on a level of correlation between the second audio sources and/or on the intent of the content creator. In other words, the second audio sources may be placed on the circular arc at appropriate distance intervals determined based on the level of correlation between the second audio sources and/or the intent of the content creator.
In an embodiment, the range parameter may be further determined based on the determined number (i.e., count) of the one or more second audio sources. Notably, the determined number of one or more second audio sources may be a predetermined constant that is independent of the user's location and/or relative point. Alternatively, determining one or more second audio sources for modeling the extended audio object may comprise determining a number of the one or more second audio sources based on the (relative) range angle. In this case, the number of one or more second audio sources may increase with increasing range angle (i.e., the number may be positively correlated with the range angle). More specifically, determining one or more second audio sources for modeling the extended audio object may further comprise copying or summing a weighted mix of the one or more first audio sources and applying a decorrelation process to the copied or summed first audio sources. That is, the first audio source(s) may be replicated or their weighted mixtures may be added to obtain a determined number of second audio sources.
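As a sketch of how the source count could grow with the range angle, consider the following heuristic; the thresholds and the cap are illustrative assumptions, since the text only requires either a positive correlation with the angle or a constant count:

```python
import math

def num_second_sources(extent_angle_rad: float, max_sources: int = 5) -> int:
    """Heuristic: more virtual sources for a wider perceived extent.

    One source below ~30 degrees, then one more per additional ~30 degrees,
    capped at max_sources. The exact thresholds are assumptions.
    """
    return max(1, min(max_sources, 1 + int(extent_angle_rad // (math.pi / 6))))
```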
As configured above, by properly defining the second (virtual) audio source, the method allows modeling of audio objects in an accurate and suitable manner. In particular, modeling can be performed efficiently for various user locations, for audio objects having different input sources and forms/locations, and for the intent of the content creator.
In an embodiment, the range representation may indicate a two-dimensional or three-dimensional geometric form for representing the spatial extension of the extended audio object. Furthermore, the extended audio object may be oriented in two or three dimensions. Moreover, the spatial extension of the extended audio object perceived at the user location may be described as the perceived width, size, and/or quality of the extended audio object.
The geometric form of the extended audio object (e.g., the range representation indicative of the geometric form) may be used to obtain the relative point closest to the user location in the virtual or augmented reality environment.
Notably, the relative point may be the point on the geometric form of the extended audio object (e.g., on the range representation indicating the geometric form) that is closest to the user location.
The inventors have surprisingly found that modeling an extended audio object with the relative point closest to the user position enables better control of the attenuation of the audio level at different user positions relative to the extended audio object, and thus better modeling of the extended audio object.
In an embodiment, using the range representation indicating the geometric form of the extended audio object to obtain the relative point comprises obtaining the relative point on the geometric form itself or obtaining the relative point at a distance from the geometric form or range representation. For example, the relative point may be located on the geometric form. Alternatively, the relative point may be located at a distance from the range representation or the geometric form, for example at a distance from a boundary or origin of the range representation or geometric form.
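For a simple geometric form such as an axis-aligned box, locating the relative point geometrically reduces to a per-coordinate clamp of the user position, as in the following sketch (the box form and function name are illustrative assumptions):

```python
def closest_point_on_aabb(user, box_min, box_max):
    """Relative point: point on an axis-aligned box closest to the user.

    A box is used purely as an example of a simple geometric form; other
    forms need their own closest-point routine.
    """
    return tuple(min(max(u, lo), hi) for u, lo, hi in zip(user, box_min, box_max))

# e.g. a user in front of a unit box:
# closest_point_on_aabb((0.5, 0.5, -2.0), (0, 0, 0), (1, 1, 1)) -> (0.5, 0.5, 0.0)
```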
In an embodiment, the method may further comprise obtaining an orthogonal projection of the extended audio object on a projection plane orthogonal to a first line connecting the user location and the relative point. The method may further include determining a plurality of boundary points on the orthogonal projection that identify a projection size of the extended audio object. In this case, the (relative) range angle may be determined using the user position and the plurality of boundary points. For example, the range angle may be determined as the angle between the two straight lines connecting the respective boundary points with the user position.
In particular, determining the plurality of boundary points may comprise obtaining a second line related to a horizontal projection size of the extended audio object. Accordingly, the plurality of boundary points may include the leftmost and rightmost boundary points of the orthogonal projection on the second line. Depending on the orientation of the extended audio object, the horizontal projection size may be the maximum size of the extended audio object. In some embodiments, the extended audio object may have a complex geometry, and the orthogonal projection may comprise a simplified projection of the extended audio object having the complex geometry. In this case, the method may further comprise obtaining a simplified geometric form of the extended audio object for determining the relative point, before obtaining the relative point on the geometric form of the extended audio object.
As configured above, the method enables simplifying the estimation of spatial range/perceptual size of audio objects having various geometric forms while providing sufficient accuracy in modeling the audio objects using simple parameters.
In an embodiment, the method may further comprise rendering the extended audio object based on the modified representation of the extended audio object. The extended audio object may be rendered using the determined location of the one or more second audio sources and the range parameter. In particular, the rendering may include 6DoF audio rendering. The method may further include obtaining the user position as well as the position and/or orientation and the geometry of the extended audio object for rendering.
In an embodiment, the method may further comprise controlling the perceived size of the extended audio object using the range parameter. Accordingly, by controlling the perceived size of the extended audio object, the extended audio object may be modeled as a point source or a wide source. It may be noted that the location of the one or more second audio sources may be determined such that all second audio sources are at the same reference distance from the user location.
With the above configuration, the extended audio object can be effectively modeled using simple parameters. In particular, the spatial extent of the extended audio object and the corresponding position of the second (virtual) audio source(s) are calculated for a given user position to allow an accurate estimation of the appropriate (perceived) size of the audio object corresponding to the given user position (i.e. the size of the spatial extent that can be perceived at the user position). Because modeling may not require detailed information about the form/position/orientation of the audio object and the movement of the user's position, the processing of subsequent rendering of the audio object (e.g., 6DoF rendering) may be simplified accordingly.
In other words, the proposed method provides for automatic conversion of 6DoF data (e.g. input audio object source, user position, range geometry of objects, range position/orientation of objects, etc.) for audio range modeling, which may require simple parameters as input interface data, thereby further allowing efficient 6DoF rendering of audio objects without complex data processing.
According to another aspect, an apparatus for modeling an extended audio object for audio rendering in a virtual or augmented reality environment (or, in general, a computer-mediated reality environment) is described. The apparatus may include a processor and a memory coupled to the processor and storing instructions for the processor. The processor may be configured to obtain a range representation indicative of a geometric form of the extended audio object and information related to one or more first audio sources associated with the extended audio object. The one or more first audio sources may be captured, using an audio sensor, as recorded audio sources associated with the extended audio object. In particular, the processor may be configured to obtain a relative point on the geometric form of the extended audio object based on a user position in the virtual or augmented reality environment. Additionally, the processor may be configured to determine a range parameter for the range representation based on the user location and the relative point.
Notably, the range parameter may describe the spatial extension of the extended audio object perceived at the user location. Thus, it will be appreciated that such spatial extension of the extended audio object may vary depending on the user location, and that the extended audio object may be adaptively modeled for various user locations. Further, the processor may be configured to determine a location of one or more second audio sources relative to the user location for modeling the extended audio object. Such one or more second audio sources may be considered virtual, reproduced audio sources for modeling the extended audio object at the corresponding user location. Moreover, the processor may be configured to output a modified representation of the extended audio object for modeling the extended audio object. In particular, the modified representation may comprise the range parameter and the location of the one or more second audio sources.
As configured above, the proposed apparatus efficiently converts 6DoF data (e.g., input audio object source, user location, range geometry of objects, range location/orientation of objects, etc.) into simple information/parameters as input to a range modeling interface/tool, which allows for efficient 6DoF rendering of audio objects without the need to process large amounts of data.
In particular, where the spatial extent of the extended audio object and the corresponding location of the second (virtual) audio source(s) calculated for the given user location are known, the extended audio object may be effectively modeled as having an appropriate (perceived) size corresponding to the given user location, which may be applicable to subsequent rendering of the extended audio object (e.g., 6 DoF). Thus, the computational complexity of the audio rendering may be reduced, since detailed information about the form/position/orientation of the audio object and the movement of the user position may not be needed.
According to another aspect, a system for implementing audio rendering in a virtual or augmented reality environment (or, in general, a computer-mediated reality environment) is described. The system may comprise the proposed apparatus (e.g. as described above) and a range modeling unit. The range modeling unit may be configured to receive information from the apparatus related to the modified representation of the extended audio object as described above. In addition, the range modeling unit may be configured to further control the range size of the extended audio object based on the information related to the modified representation (e.g. range parameters included in the modified representation). In some embodiments, the system may be or may be part of a user virtual reality console (e.g., a head mounted device, a computer, a mobile phone, or any other audio rendering device for rendering audio in a virtual and/or augmented reality environment). In some embodiments, the system may be configured to transmit the information related to the modified representation and/or a controlled range size of the extended audio object to the audio output.
According to a further aspect, a computer program is described. The computer program may comprise executable instructions for performing the method steps outlined throughout the present disclosure when executed by a computing device.
According to another aspect, a computer-readable storage medium is described. The storage medium may store a computer program adapted to be executed on a processor and for performing the method steps outlined throughout the present disclosure when executed on the processor.
It should be noted that the methods and systems, including the preferred embodiments thereof, as outlined in the present patent application may be used independently or in combination with other methods and systems disclosed in the present document. Furthermore, all aspects of the methods and systems outlined in the present patent application may be combined in any combination. In particular, the features of the claims may be combined with each other in any way.
It will be appreciated that the apparatus features and method steps may be interchanged in various ways. In particular, as will be understood by the skilled person, the details of the disclosed method(s) may be implemented by the corresponding apparatus, and vice versa. Moreover, any statement above regarding method(s) (and, for example, steps thereof) should be understood to apply equally to the corresponding apparatus (and, for example, blocks, stages, units thereof), and vice versa.
Drawings
The invention is explained below by way of example with reference to the accompanying drawings, in which:
FIG. 1 illustrates a conceptual diagram of an example range modeling tool according to an embodiment of the present disclosure;
FIG. 2 (a) illustrates an example audio scene including different user locations for implementing audio rendering of extended audio objects, according to an embodiment of the disclosure;
FIG. 2 (b) illustrates a range level of corresponding user locations within the example audio scene as illustrated in FIG. 2 (a), in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates an example flowchart for implementing audio rendering of extended audio objects, according to an embodiment of this disclosure;
FIG. 4 illustrates an example block diagram for implementing audio rendering of extended audio objects, according to an embodiment of this disclosure;
FIGS. 5 (a) to 5 (c) illustrate schematic diagrams for determining a modified representation of an extended audio object as performed in method 300, according to an embodiment of the present disclosure;
FIG. 6 illustrates another schematic diagram for determining a modified representation of an extended audio object as performed in method 300, according to an embodiment of the present disclosure;
FIGS. 7 (a) and 7 (b) illustrate the definition of a reference distance for an object source having a range; and
FIGS. 8 (a) to 8 (c) illustrate respective resulting modified representations of an extended audio object for different user locations as depicted in FIG. 2, according to an embodiment of the present disclosure.
Detailed Description
As outlined above, the present disclosure relates to effectively modeling audio objects having a range for audio rendering in a virtual and/or augmented reality environment (or, in general, a computer-mediated reality environment). FIG. 1 illustrates a conceptual diagram of an example range modeling tool according to an embodiment of the present disclosure. In this context, the extended audio object 101 to be modeled is associated with one or more audio sources 102, which may be captured as recorded audio source(s) using an audio sensor (e.g., a microphone). In general, the extended audio object 101 may be considered an audio object having a range of geometric form. It may be provided with a range representation indicative of the geometry and with information related to the one or more audio sources 102. Further, the one or more audio sources 102 may, for example, comprise one or more point source signals associated with the extended audio object 101. The range modeling tool 103 can model the extended audio object 101 based on information about the form of the extended audio object 101 and about the one or more audio sources 102. For example, the location of the one or more audio sources 102 may be contained in the range representation. The one or more audio sources 102 themselves may or may not be included in the range representation.
The range modeling tool 103 may also model the extended audio object 101 based on a user location (e.g., a listener's listening position) in a virtual and/or augmented reality environment. That is, depending on the user location, the extended audio object 101 may be modeled as an audio source (e.g., a wide source or a point source) having a different range size. This may be achieved by providing a modified representation of the extended audio object 101 for a specific user position based on the (original) range representation. Accordingly, the extended audio object 101 may be effectively modeled via the respective modified representations such that different range sizes are experienced/perceived at different user locations.
Fig. 2 (a) illustrates an example audio scene including different user locations for implementing audio rendering of extended audio objects, according to an embodiment of the disclosure. As an example, the extended audio object may be a sea front with rough waves; other examples will be apparent to those skilled in the art. The example audio scene may be applied to implement 6DoF audio rendering in a virtual or augmented reality environment. In an embodiment, the object range 201 is shown as an extended audio object having a two-dimensional or three-dimensional geometric form, which may be oriented in two or three dimensions. A (raw) range representation indicating the geometric form of the object range 201 is obtained, including information about the range geometry, position, and orientation, and information about the audio source 202 associated with the object range 201. Although the audio source 202 is illustrated herein as two point sources, any number and other kinds of audio sources are possible in the context of the present disclosure. As mentioned above, the audio source 202 may be a recorded audio source captured using an audio sensor (e.g., a microphone). Furthermore, the user positions 203a, 203b, 203c (which may, for example, be relative positions with respect to the object range 201) are also obtained. In the illustrated example scenario, user 203a and user 203b are located in front of the object range 201, but at different distances from it, while user 203c is located on one side of the object range 201. However, any other position and/or orientation may also be included in the scene as a user location.
Fig. 2 (b) illustrates the range levels at the corresponding user positions within the example audio scene illustrated in fig. 2 (a), according to an embodiment of the present disclosure. Herein, the range levels 204a, 204b, 204c represent the respective perceptions of the object range 201 at the user locations 203a, 203b, 203c. In an embodiment, the range level may be defined using a range parameter describing the spatial extension of the object range (extended audio object) perceived at a particular user location. Notably, the perception of the object range 201 (and thus the range parameter) may depend on the relative user-to-range geometric position and orientation (e.g., the user position relative to the object range and/or the orientation of the range object). For example, as shown in fig. 2 (b), a user may experience a greater range level 204a at user location 203a than at user locations 203b and 203c, which have corresponding range levels 204b and 204c, respectively. Thus, it may be beneficial to associate such range levels with user positions and to simply model the extended audio objects using range parameters in order to render audio scenes where the user position may vary significantly (i.e., with large translational movements).
Fig. 3 illustrates an example flowchart for implementing audio rendering of extended audio objects, according to an embodiment of this disclosure. As illustrated, the method 300 may be performed to model an extended audio object (e.g., the extended audio object 101 or the object range 201) for audio rendering in a virtual and/or augmented reality environment. In step 301, a range representation indicative of a geometric form of the extended audio object and information related to one or more first audio sources associated with the extended audio object are obtained. In step 302, a relative point on the geometric form of the extended audio object (e.g., on the range representation indicative of the geometric form) is obtained (e.g., determined) based on a user location in the virtual or augmented reality environment. In step 303, a range parameter of the range representation (e.g., showing a range level representing the perceived/spatial extension of the extended audio object) is determined based on the user location and the relative point. As indicated above, the range parameter may describe a spatial extension of the extended audio object perceived at the user location.
In step 304, the position of one or more second audio sources relative to the user position is determined for modeling the extended audio object. Unlike the first audio source(s), which may have been captured by direct recording, the second audio source(s) may be virtual reproduction audio sources based on the first audio source(s), e.g., determined via replication and/or audio processing (including, e.g., filtering), as will be explained in detail below. Subsequently, in step 305, a modified representation of the extended audio object is output for modeling the extended audio object. Note that the modified representation may include the range parameter and the determined location of the one or more second audio sources for the given user location. Accordingly, the extended audio object may be effectively modeled for a particular user location with simple parameters that encode the spatial extent of the extended audio object and/or the corresponding location of the second audio source(s) calculated for that particular location.
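To make the flow of method 300 concrete, the following self-contained sketch works through steps 301 to 303 for a deliberately simple case: a line-segment range in 2D. The geometry, function names, and numeric values are illustrative assumptions only; the placement of the second audio sources (steps 304 and 305) follows the arc construction sketched in connection with fig. 5 (c) below.

```python
import math

# Toy example of steps 301-303 in 2D: the extended audio object is a line
# segment, the relative point is its closest point to the user, and the
# range angle is subtended at the user by the segment's endpoints.

def closest_point_on_segment(p, a, b):
    ax, ay = a; bx, by = b; px, py = p
    abx, aby = bx - ax, by - ay
    t = ((px - ax) * abx + (py - ay) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))                       # clamp onto the segment
    return (ax + t * abx, ay + t * aby)

def subtended_angle(p, a, b):
    a1 = math.atan2(a[1] - p[1], a[0] - p[0])
    a2 = math.atan2(b[1] - p[1], b[0] - p[0])
    d = abs(a1 - a2)
    return min(d, 2 * math.pi - d)

user = (0.0, -3.0)
seg_a, seg_b = (-2.0, 0.0), (2.0, 0.0)              # object range (step 301)
rel_pt = closest_point_on_segment(user, seg_a, seg_b)   # step 302
angle = subtended_angle(user, seg_a, seg_b)             # step 303
print(rel_pt, math.degrees(angle))                  # (0.0, 0.0)  ~67.4 degrees
```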
In other words, the proposed method 300 effectively converts 6DoF data (e.g., input audio object source, user location, range geometry of the object, range location/orientation of the object, etc.) into simple information (e.g., range parameters and location of the second audio source included in the modified representation) as input to a range modeling interface/tool (e.g., range modeling tool 103), which in some embodiments may be a conventional interface/tool.
Further, subsequent rendering of the extended audio object may be performed based on the modified representation of the extended audio object. In this case, the extended audio object may be rendered using the determined location and range parameters of the one or more second audio sources. In some embodiments, the rendering may be a 6DoF audio rendering of the extended audio objects. In this case, in addition to the user position, the position and/or orientation and geometry of the extended audio object may be obtained for rendering. Accordingly, the range parameter may be determined further based on the position and/or orientation of the extended audio object.
Fig. 4 illustrates an example block diagram for implementing audio rendering of extended audio objects, according to an embodiment of this disclosure. In particular, system 400 includes a means for modeling extended audio objects for audio rendering in a virtual and/or augmented reality environment. In embodiments, the system 400 may be or may be part of a user virtual reality console (e.g., a head mounted device, a computer, a mobile phone, or any other audio rendering device for rendering audio in a virtual and/or augmented reality environment).
In some embodiments, the apparatus may take the form of a parameter conversion unit 401, for example comprising a processor configured to perform all steps of the method 300 and a memory coupled to the processor and storing instructions for the processor. In particular, the parameter conversion unit 401 may be configured to receive audio scene data, such as 6DoF data, including, for example, an input audio object source, a user position 403, and information about a range geometry, position, and/or orientation of the extended audio object 402 (e.g., the extended audio object 101 or the object range 201). The parameter conversion unit 401 may further be configured to perform steps 301-305 of the method 300 described above. Accordingly, the parameter conversion unit 401 converts the received audio scene data into reduced information (e.g. as a modified representation of the extended audio object) comprising information about the second (virtual) audio source(s) 404 (e.g. object position and signal data of the second audio source (s)) and range parameters 405 showing a range level representing the perceived/spatial extension of the extended audio object experienced at a specific location. The parameter conversion unit 401 may send this (reduced) information directly or via other processing means to the audio rendering unit (e.g. inside or outside the system 400) for outputting audio to the user (or alternatively may be part of the audio rendering unit outputting audio to the user, e.g. via a suitable device speaker). Accordingly, when part of an audio rendering device, system 400 may transmit the converted parameters (e.g., the simplified information related to the modified representation described above) to an audio output of the audio rendering device.
In some embodiments, the reduced information output by the parameter conversion unit 401 may then be provided as input interface data to the range modeling unit 406. The range modeling unit 406 (also referred to as a range modeling tool, e.g., the range modeling tool 103) may control the range size of the extended audio object (e.g., as rendered) based on the range parameters included in the reduced information. For example, the range parameters may be used to control the perceived size of the extended audio object, whereby the extended audio object may be modeled as a point source or a wide source. Accordingly, by simply adjusting the range parameter (e.g., the range level), the appropriate (perceived) size corresponding to the particular user location may be provided to a subsequent rendering (e.g., 6DoF rendering) of the extended audio object. This provides a simplified system for implementing 6DoF rendering of extended audio objects. As a result, rendering/modeling may not require detailed information about the form/position/orientation of the audio object and the (translational) movement of the user position, which further allows 6DoF rendering to be performed by existing audio rendering techniques (e.g., those applicable to 3DoF rendering), and thereby also reduces the computational complexity of 6DoF audio rendering.
In other words, the proposed method 300 and/or system 400 for audio range modeling provides for automatic conversion of 6DoF scene data, which may require simple parameters as input interface data, allowing efficient 6DoF rendering of audio objects using available existing systems without complex data processing for rendering.
Fig. 5 (a) to 5 (c) illustrate schematic diagrams for determining a modified representation of an extended audio object as performed in method 300, according to an embodiment of the present disclosure. The range representation is assumed to indicate a three-dimensional geometry for representing the spatial extension of the extended audio object in a three-dimensional orientation. In the example embodiment shown in fig. 5, the range representation indicates a cuboid representing the spatial extension of the extended audio object perceived at a given user location, which may be described as the perceived width, size, and/or quality of the extended audio object. However, the range representation may also indicate other solid shapes or more complex geometries for representing the object range. An example of a first stage, converting the three-dimensional (3D) geometry into a two-dimensional (2D) user-view domain, is illustrated in fig. 5 (a). Subsequently, a second stage in the 2D user-view domain determines the range parameter (e.g., range level) for a given user location, as illustrated in fig. 5 (b), and a third stage determines the location of the one or more second audio sources from a one-dimensional (1D) view, as illustrated by the example of fig. 5 (c).
As can be seen in the example of fig. 5 (a), the user 501 is located in front of an extended audio object 503 that is oriented in three dimensions. The extended audio object 503 is represented by a 3D geometry indicating its spatial extension. According to an example embodiment, the point 502 of the extended audio object 503 geometrically closest to the user 501 may be obtained (e.g., determined) as the relative point. Further, a projection plane 504 orthogonal to a first line 505 connecting the user location 501 and the point 502 may be obtained (e.g., determined). On the projection plane 504, an orthogonal projection 506 of the extended audio object 503 may be obtained (e.g., determined). Subsequently, a second line 507 characterizing the horizontal size of the extended audio object 503 (e.g., of the projection) may be obtained (e.g., determined). The second line 507 may form a plane 508 (e.g., a viewing plane) with the first line 505. Accordingly, the first stage illustrated in fig. 5 (a) converts the 3D geometry of the extended audio object 503 into the 2D viewing plane 508.
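A minimal sketch of this projection step is given below, assuming the object geometry is available as a vertex list; the function name and the numpy representation are illustrative assumptions:

```python
import numpy as np

def project_onto_view_plane(vertices, user, rel_point):
    """Orthogonally project object vertices onto the plane through the
    relative point whose normal is the user-to-relative-point line."""
    user = np.asarray(user, dtype=float)
    rel = np.asarray(rel_point, dtype=float)
    n = rel - user
    n = n / np.linalg.norm(n)             # plane normal (the first line 505)
    V = np.asarray(vertices, dtype=float)
    return V - np.outer((V - rel) @ n, n)  # drop each vertex's component along n
```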
As can be seen in the example of fig. 5 (b), a plurality of boundary points 509, 510 can be determined on the orthogonal projection 506. In particular, the boundary points 509, 510 may comprise the leftmost and rightmost boundary points of the orthogonal projection 506 on the second line 507. Thus, the plurality of boundary points 509, 510 may identify the projection size of the extended audio object 503. Using the determined boundary points 509, 510 and the user position 501, a range angle x0 representing the range level (e.g., as a range parameter) may be calculated accordingly, e.g., by a trigonometric calculation. That is, the range angle x0 may be determined as the angle between the two lines connecting the user position 501 with the boundary points 509 and 510, respectively. As mentioned above, the range angle x0 may thus refer to a relative arc measure (i.e., a relative range angle) that depends on the relative point and on the position and/or orientation of the extended audio object. In addition, an arc 513 may also be determined based on the user location 501, the relative point 502, and the geometric form of the extended audio object 503. Notably, the arc 513 may be an arc that subtends the range angle x0 as a corresponding arc measure at the user location 501, and may be determined based on the range angle x0 and the user location 501.
As mentioned above, the location and/or orientation of the extended audio object may also be used to determine the range parameter. More specifically, the user position 501, the relative point 502, and the position and/or orientation of the extended audio object 503 may be used to determine the range angle x0, and the range parameter may be determined based on the range angle. As described above, the range angle x0 may be determined (e.g., calculated) using trigonometric operations. It is further understood that the determined range level (e.g., as a scalar) and thus the corresponding arc obtained at this stage may be used for audio source localization, as done in the third stage described below.
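The trigonometric determination of the range angle x0 from the two boundary points can be sketched as follows; names and the vector representation are assumptions for illustration:

```python
import numpy as np

def extent_angle(user, left_bp, right_bp):
    """Range angle x0: angle at the user position between the lines to the
    leftmost and rightmost boundary points (cf. fig. 5 (b))."""
    v1 = np.asarray(left_bp, dtype=float) - np.asarray(user, dtype=float)
    v2 = np.asarray(right_bp, dtype=float) - np.asarray(user, dtype=float)
    c = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))   # clamp for numeric safety
```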
Given the determined range angle x0 and the corresponding arc 513, one or more second audio sources 511 may be located on the arc 513, as illustrated in the example of fig. 5 (c). It may be noted that the audio sources located on the arc 513 may be equally loud (e.g., may be perceived as equally loud) to the user 501 and may have the same reference distance. Further, the number (count) of second audio sources 511 may be determined based on the range angle x0. For example, a single second audio source (N=1) may be applied for a small angle, while more than one second audio source (N>1) may be applied for large angles. That is, the number of second audio source(s) 511 may increase as the range angle x0 increases. Alternatively, the number N of second audio source(s) 511 may be independent of the user location 501 and/or the relative point 502 (e.g., independent of the range angle x0, the range level 512, and the length of the arc 513).
Subsequently, the range level 512 for modeling the extended audio object 503 may be set/determined according to the (relative) range angle x0 and the number N of second audio source(s) 511. In particular, the second audio source(s) 511 may be placed/positioned on the circular arc 513. In the case of more than one second audio source (i.e., N>1), the N available audio sources 511 may be located on the arc 513 such that all second audio sources 511 are equally loud to the user (i.e., perceived as equally loud) and/or have the same reference distance calculated from points on the arc 513 (e.g., the distance from the user's location to the relative point 502) for proper distance attenuation. For example, the second audio sources 511 may be equally distributed over the circular arc 513, i.e., they may be placed/positioned on the circular arc 513 in such a way that adjacent second audio sources 511 are spaced apart from each other by the same distance. In some embodiments where two or more second audio sources are considered, the positioning may depend on the level of correlation between the second audio sources 511 and/or the intent of the content creator, as shown for the case of N=2 in the example of fig. 5 (c). For example, respective pairs of adjacent second audio sources 511 may have different levels of decorrelation (e.g., according to their original recordings or (additional/manual) processing, such as copying, decorrelation filtering, etc.). The distance D2 between a pair of second audio sources 511 having a high (or higher) level of decorrelation may be larger than the distance D1 between a pair of second audio sources 511 having a low (or lower) level of decorrelation.
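As a sketch of the arc placement of fig. 5 (c): the sources are spread across the angle x0 at the radius of the relative point, so that they all share the same user distance (and hence the same reference distance). The 2D viewing-plane coordinates, names, and equal spacing are assumptions; the spacing could instead follow decorrelation levels as described above.

```python
import numpy as np

def place_sources_on_arc(user, rel_point, x0, n):
    """Place n second sources on an arc of radius |user - rel_point|,
    centred on the direction to the relative point and spanning x0."""
    user = np.asarray(user, dtype=float)
    rel = np.asarray(rel_point, dtype=float)
    radius = np.linalg.norm(rel - user)
    mid = np.arctan2(rel[1] - user[1], rel[0] - user[0])
    offs = np.zeros(1) if n == 1 else np.linspace(-x0 / 2.0, x0 / 2.0, n)
    ang = mid + offs
    return user + radius * np.stack([np.cos(ang), np.sin(ang)], axis=1)

# e.g. three sources over a 60 degree range, user at the origin:
# place_sources_on_arc((0.0, 0.0), (0.0, 2.0), np.pi / 3, 3)
```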
It may further be noted that the one or more second audio sources 511 may be determined from the (original) first audio source(s), e.g., by increasing the number of first audio sources. This may be achieved by copying one or more first audio sources and/or by summing a weighted mix of one or more first audio sources, and then applying a decorrelation process to the copied and/or summed first audio sources. For example, when only one or a few first audio sources are recorded/captured for the extended audio object, the number of audio sources may be increased by copying the one or the few first audio sources for determining the second audio sources. Alternatively, in the case of a plurality of first audio sources, the second audio sources may be determined by summing their weighted mixtures. A signal decorrelation process may subsequently be applied to obtain the final second audio sources.
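A minimal sketch of the duplication-plus-decorrelation idea follows; the random-delay decorrelator is a deliberately crude stand-in, since the text leaves the concrete decorrelation filter open (production renderers typically use all-pass decorrelation filters):

```python
import numpy as np

def derive_second_sources(first_source, n, max_delay=64, seed=0):
    """Duplicate one first-source signal into n copies and decorrelate
    each copy with a different small delay (illustrative assumption)."""
    x = np.asarray(first_source, dtype=float)
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n):
        d = int(rng.integers(0, max_delay))                 # per-copy delay
        copies.append(np.concatenate([np.zeros(d), x])[: x.size])
    return copies
```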
Fig. 6 illustrates another schematic diagram of an example for determining a modified representation of an extended audio object (e.g., as performed in method 300) in accordance with an embodiment of the present disclosure. While fig. 5 illustrates a cuboid for representing the spatial extension of the extended audio object 503, the example of fig. 6 shows a case where the extended audio object 603 may have a complex geometry (e.g., a vehicle). Similar to the embodiment of fig. 5, the user 601 is located in front of the extended audio object 603, which is oriented in three dimensions. However, in this example embodiment, a simplified range representation 605, showing for example an ellipsoid, may be obtained prior to applying step 301 of the proposed method 300 in order to simplify steps 301 and 303. In other words, for those embodiments in which the range object has a complex geometry, the method 300 may further include obtaining a simplified geometry (simplified range geometry) 605 of the extended audio object for determining the relative point, before or in order to obtain the relative point on the geometry of the extended audio object. Accordingly, a simplified orthogonal projection 606 of the extended audio object 603 may be obtained for the subsequent determination of the range angle/level 604 as explained above.
Returning to fig. 5, the second audio sources 511 may be located on the circular arc 513 so as to have the same reference distance relative to the user location 501. It will be appreciated that the reference distance specifies the (e.g., user-source) distance at which the calculated attenuation of the audio source element is minimal (e.g., 0 dB), independent of the distance attenuation law used. For object sources having a range, such a user-source distance may be measured relative to the origin of the range (e.g., with respect to its "location" attribute) or relative to the range itself, as shown in the examples of fig. 7 (a) and 7 (b), respectively. In the example of fig. 7 (a), the reference distance Dref is defined relative to the origin 702a of the object range 701a, while in the example of fig. 7 (b), the reference distance Dref is defined relative to the closest point of the object range 701b, as indicated by the relative point 502 in the example of fig. 5. Thus, referring to fig. 7 (a) and 7 (b), the relative point closest to the user's location will lie on the dashed line. In the example of fig. 7 (a), the relative point is located at the reference distance Dref from the origin 702a of the object range 701a. In the example of fig. 7 (b), the relative point is located at the reference distance Dref from the object range 701b. Referring to fig. 5 (b), since the relative point 502 (e.g., as the closest point to the user 501) is also located on the arc 513, the second audio sources placed on the arc 513 have the same reference distance as the relative point 502, at which the attenuation may be minimal.
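For illustration, one common attenuation law consistent with this definition is sketched below; the 1/r law itself is an assumption, as the text only fixes the behavior at the reference distance:

```python
def distance_gain(distance: float, ref_distance: float) -> float:
    """Inverse-distance attenuation that is minimal (0 dB, gain 1.0) at or
    inside the reference distance."""
    return min(1.0, ref_distance / max(distance, 1e-9))

# e.g. a source at twice the reference distance is attenuated by ~6 dB:
# distance_gain(2.0, 1.0) -> 0.5
```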
Fig. 8 (a) through 8 (c) illustrate examples of respective resulting modified representations of an extended audio object for different user positions as depicted in fig. 2, according to an embodiment of the present disclosure. Similar to the audio scene illustrated in fig. 2, the user 803a and the user 803b are located in front of the object range 801, but at different distances from the object range 801, while the user 803c is located on one side of the object range 801. It is to be appreciated that any other orientation may also be included in the scene as a user location. Accordingly, the resulting range levels 804a, 804b, 804c represent respective perceptions (e.g., spatial expansion) of the object range 801 at the user locations 803a, 803b, 803 c. As shown in the example of fig. 8, the resulting range level 804a at the user location 803a is greater than the resulting range level 804b at the user location 803b, which may also allow a greater number of second audio sources 802a to be determined (and placed) for modeling the object range 801. Similarly, the resulting range level 804b at the user location 803b is greater than the resulting range level 804c at the user location 803c, which may also allow a greater number of second audio sources 802b to be determined (and placed) for modeling the object range 801. In this example, the modified representation of user position 803a contains five second audio sources 802a, the modified representation of user position 803b contains two second audio sources 802b, and the modified representation of user position 803c contains only one second audio source 802c.
Interpretation of the drawings
Aspects of the systems described herein may be implemented in a suitable computer-based sound processing network environment (e.g., a server or cloud environment) to process digital or digitized audio files. Portions of the adaptive audio system may include one or more networks including any desired number of independent machines, including one or more routers (not shown) for buffering and routing data transmitted between the computers. Such a network may be built on a variety of different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.
One or more components, blocks, processes, or other functional components may be implemented by a computer program that controls the execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described in terms of behavior, register transfer, logic components, and/or other characteristics using any number of combinations of hardware, firmware, and/or data and/or instructions embodied in various machine-readable or computer-readable media. Computer-readable media that may embody such formatted data and/or instructions include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms such as optical, magnetic, or semiconductor storage media.
In particular, it should be understood that embodiments may include hardware, software, and electronic components or modules, which, for discussion purposes, may be illustrated and described as if most of the components were implemented solely in hardware. However, those skilled in the art and based on a reading of this detailed description will appreciate that in at least one embodiment, the electronic-based aspects may be implemented in software (e.g., stored on a non-transitory computer-readable medium) executable by one or more electronic processors (e.g., microprocessors and/or application specific integrated circuits ("ASICs")). It should therefore be noted that embodiments may be implemented using a number of hardware and software based devices as well as a number of different structural components. For example, the "content activity detector" described herein may include one or more electronic processors, one or more computer-readable medium modules, one or more input/output interfaces, and various connections (e.g., a system bus) connecting the various components.
While one or more implementations have been described by way of example and with respect to specific embodiments, it should be understood that one or more implementations are not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. The scope of the appended claims is, therefore, to be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having" and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless specified or limited otherwise, the terms "mounted," "connected," "supported," and "coupled" and variations thereof are used broadly and encompass both direct and indirect mountings, connections, supports, and couplings.
The exemplified embodiments are listed
Various aspects and implementations of the present disclosure may also be understood from the enumerated example embodiments (EEEs) listed below, which are not claims.
EEE 1. A method of modeling an extended audio object for audio rendering in a virtual or augmented reality environment, the method comprising: obtaining a range representation indicative of a geometric form of an extended audio object and information related to one or more first audio sources associated with the extended audio object; obtaining geometrically relative points of the extended audio object based on a user location in the virtual or augmented reality environment; determining a range parameter of the range representation based on the user location and the relative point, the range parameter describing a spatial extension of the extended audio object perceived at the user location; determining a location of one or more second audio sources relative to the user location for modeling the extended audio object; and outputting a modified representation of the extended audio object for modeling the extended audio object, the modified representation including the range parameter and the location of the one or more second audio sources.
The method of EEE 2. The method of EEE1 further comprising determining the one or more second audio sources for modeling the extended audio object based on the one or more first audio sources.
The method of EEE 3. EEE1 or EEE2 wherein the range parameter is further determined based on a position and/or orientation of the extended audio object.
The method of EEE3, further comprising determining a relative range angle based on the user location, the relative point, and a location and/or orientation of the extended audio object, wherein the range parameter is determined based on the relative range angle.
The method of any of EEE 1-EEE 4, wherein determining the location of the one or more second audio sources comprises determining a circular arc based on the user location, the opposing points, and the geometric form of the extended audio object; and positioning the determined one or more second audio sources on the circular arc.
EEE 6. The method of EEE 5, wherein the positioning involves equally distributing all of the second audio sources over the circular arc.
EEE 7. The method of EEE 5 or EEE 6, wherein the positioning depends on a level of correlation between the second audio sources and/or on content creator intent.
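As a rough illustration of EEE 5 to EEE 7, the sketch below distributes the second sources equally over a circular arc and shrinks the occupied span as their mutual correlation grows; the scalar `correlation` control standing in for correlation level or creator intent is an assumption, as are all names.

```python
import numpy as np

def arc_positions(user_pos, center_angle, arc_radius, span, n, correlation=0.0):
    """Place n second sources on a horizontal circular arc around the user.

    correlation in [0, 1] is an assumed control: highly correlated sources
    are pulled toward the arc center, uncorrelated ones use the full span.
    """
    spread = (1.0 - correlation) * span
    offsets = np.linspace(-spread / 2.0, spread / 2.0, n) if n > 1 else np.zeros(1)
    return [np.asarray(user_pos, float)
            + arc_radius * np.array([np.cos(a), np.sin(a), 0.0])
            for a in center_angle + offsets]
```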
EEE 8. The method of any one of EEE 2 to EEE 7, wherein the range parameter is further determined based on the determined number of the one or more second audio sources.
EEE 9. The method of EEE 8, wherein the determined number of the one or more second audio sources is a predetermined constant independent of the user location and/or the relative point.
EEE 10. The method of EEE 8 when dependent on EEE 4, wherein determining the one or more second audio sources for modeling the extended audio object comprises determining a number of the one or more second audio sources based on the relative range angle.
EEE 11. The method of EEE 10, wherein the number of the one or more second audio sources increases as the relative range angle increases.
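A minimal sketch of the monotone mapping in EEE 10 and EEE 11; the one-source-per-30-degrees density and the clamping bounds are assumed tuning values, not taken from this document.

```python
import math

def num_second_sources(relative_range_angle, per_source=math.radians(30.0),
                       n_min=1, n_max=8):
    """Grow the second-source count with the relative range angle (radians)."""
    return max(n_min, min(n_max, math.ceil(relative_range_angle / per_source)))
```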
EEE 12. The method of any one of EEE 2 to EEE 11, wherein determining the one or more second audio sources for modeling the extended audio object further comprises: replicating the one or more first audio sources or summing a weighted mix of the one or more first audio sources; and applying a decorrelation process to the replicated or summed first audio sources.
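EEE 12 leaves the decorrelation process open; the sketch below uses one common textbook choice, a distinct random-phase all-pass FIR per output, applied to a weighted downmix (or a plain copy) of the first audio sources. The 512-tap length and all names are assumptions.

```python
import numpy as np

def make_second_sources(first_sources, weights, n_out, n_taps=512, seed=0):
    """Downmix (or copy) the first sources, then decorrelate each output."""
    rng = np.random.default_rng(seed)
    # Weighted mix of the first audio sources; with one source and a unit
    # weight this degenerates to replication.
    mix = np.tensordot(np.asarray(weights, float),
                       np.asarray(first_sources, float), axes=1)

    outputs = []
    for _ in range(n_out):
        # Unit-magnitude, random-phase spectrum -> all-pass impulse response,
        # built Hermitian-symmetric so the inverse FFT is real.
        phase = rng.uniform(-np.pi, np.pi, n_taps // 2 - 1)
        spectrum = np.concatenate(([1.0], np.exp(1j * phase),
                                   [1.0], np.exp(-1j * phase[::-1])))
        ir = np.fft.ifft(spectrum).real
        outputs.append(np.convolve(mix, ir, mode="same"))
    return outputs
```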
EEE 13. The method of any one of EEE 1 to EEE 12, wherein the range representation indicates a two-dimensional or three-dimensional geometric form for representing a spatial extension of the extended audio object.
EEE 14. The method of any one of EEE 1 to EEE 13, wherein the extended audio object (3) is oriented in two or three dimensions.
EEE 15. The method of any one of EEE 1 to EEE 14, wherein the spatial extension of the extended audio object perceived at the user location is described as a perceived width, size, and/or quality of the extended audio object.
EEE 16. The method of any one of EEE 1 to EEE 15, wherein the relative point is the geometrically closest point of the extended audio object to the user location.
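For many simple range representations the relative point of EEE 16 has a closed form; for example, for an axis-aligned box (an assumed, illustrative geometric form) it is the user location clamped to the box:

```python
import numpy as np

def relative_point_aabb(user_pos, box_min, box_max):
    """Closest point of an axis-aligned box extent to the user location."""
    return np.clip(np.asarray(user_pos, float),
                   np.asarray(box_min, float),
                   np.asarray(box_max, float))
```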
EEE 17. The method of any one of EEE 4 to EEE 16, further comprising: obtaining an orthogonal projection of the extended audio object on a projection plane orthogonal to a first line connecting the user location and the relative point; and determining a plurality of boundary points on the orthogonal projection identifying a projection size of the extended audio object, wherein the relative range angle is determined using the user location and the plurality of boundary points.
EEE 18. The method of EEE 17, wherein determining the plurality of boundary points comprises obtaining a second line related to a horizontal projection size of the extended audio object, wherein the plurality of boundary points comprises the leftmost and rightmost boundary points of the orthogonal projection on the second line.
EEE 19. The method of EEE 18, wherein the horizontal projection size is a maximum size of the extended audio object.
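Once the leftmost and rightmost boundary points of EEE 18 are known, the relative range angle of EEE 17 is simply the angle they subtend at the user location. A sketch (names assumed; the geometry-specific computation of the boundary points themselves is omitted):

```python
import numpy as np

def relative_range_angle(user_pos, left_point, right_point):
    """Angle subtended at the user by the two extreme boundary points."""
    v_l = np.asarray(left_point, float) - np.asarray(user_pos, float)
    v_r = np.asarray(right_point, float) - np.asarray(user_pos, float)
    cos_a = np.dot(v_l, v_r) / (np.linalg.norm(v_l) * np.linalg.norm(v_r))
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))
```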
EEE 20. The method of any one of EEE 17 to EEE 19, wherein the orthogonal projection comprises a simplified projection of the extended audio object having a complex geometry.
EEE 21. The method of EEE 20, further comprising obtaining a simplified geometric form of the extended audio object for use in determining the relative point, prior to obtaining the relative point on the geometric form of the extended audio object.
EEE 22. The method of any one of EEE 1 to EEE 21, further comprising rendering the extended audio object based on the modified representation of the extended audio object, wherein the extended audio object is rendered using the determined location of the one or more second audio sources and the range parameter.
EEE 23. The method of EEE 22, wherein the rendering comprises 6DoF audio rendering, the method further comprising obtaining the user location, the position and/or orientation of the extended audio object, and a geometry for the rendering.
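In a 6DoF renderer the user location and the object pose change from frame to frame, so the modified representation of EEE 22 and EEE 23 is naturally recomputed per frame. A sketch reusing `model_extent` from the first sketch above; `scene`, `renderer`, and every attribute name are hypothetical:

```python
def render_frame(renderer, scene, extent_object):
    """Per-frame 6DoF update: obtain poses, remodel the extent, render."""
    user_pos = scene.user_position()           # translation part of 6DoF pose
    center, radius = extent_object.center, extent_object.radius
    modified = model_extent(user_pos, center, radius, extent_object.sources)
    renderer.render(modified["sources"],
                    modified["source_positions"],
                    width=modified["range_parameter"])
```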
EEE 24. The method of any one of EEE 1 to EEE 23, further comprising controlling a perceived size of the extended audio object using the range parameter.
EEE 25. The method of EEE 24, wherein the extended audio object is modeled as a point source or a wide source by controlling the perceived size of the extended audio object.
EEE 26. The method of any one of EEE 1 to EEE 25, wherein the location of the one or more second audio sources is determined such that all of the second audio sources have the same reference distance from the user location.
EEE 27. An apparatus for modeling extended audio objects for audio rendering in a virtual or augmented reality environment, the apparatus comprising a processor and a memory coupled to the processor and storing instructions for the processor, wherein the processor is configured to perform all the steps of the method according to any one of EEE 1 to EEE 26.
EEE 28. A system for implementing audio rendering in a virtual or augmented reality environment, the system comprising: an apparatus according to EEE 27; and a range modeling unit configured to receive information related to a modified representation of the extended audio object from the apparatus and to control a range size of the extended audio object based on the information related to the modified representation.
EEE 29. The system of EEE 28, wherein the system is, or is part of, a user virtual reality console.
EEE 30. The system of EEE 28 or EEE 29, wherein the system is configured to transmit the information related to the modified representation and/or a controlled range size of the extended audio object to an audio output.
EEE 31. A computer program comprising instructions which, when executed by a computing device, cause the computing device to perform all the steps of the method according to any one of EEE 1 to EEE 26.
EEE 32. A computer readable storage medium storing a computer program according to EEE 31.

Claims (31)

1. A computer-implemented method of modeling an extended audio object for audio rendering in a virtual or augmented reality environment, the method comprising:
obtaining a range representation indicative of a geometric form of an extended audio object and information related to one or more first audio sources associated with the extended audio object;
obtaining a relative point closest to a user location in the virtual or augmented reality environment using the range representation indicative of the geometric form of the extended audio object;
determining a range parameter of the range representation based on the user location and the relative point, the range parameter describing a spatial extension of the extended audio object perceived at the user location;
determining a location of one or more second audio sources relative to the user location for modeling the extended audio object; and
outputting a modified representation of the extended audio object for modeling the extended audio object, the modified representation including the range parameter and the locations of the one or more second audio sources.
2. The computer-implemented method of claim 1, further comprising rendering the extended audio object based on the modified representation of the extended audio object, wherein the extended audio object is rendered using the determined locations of the one or more second audio sources and the range parameters.
3. The computer-implemented method of claim 2, wherein the rendering comprises 6DoF audio rendering, the method further comprising obtaining the user location, the position and/or orientation of the extended audio object, and a geometry for the rendering.
4. The computer-implemented method of claim 1 or claim 2, further comprising determining the one or more second audio sources for modeling the extended audio object based on the one or more first audio sources.
5. The computer-implemented method of any of the preceding claims, wherein the range parameter is further determined based on a position and/or orientation of the extended audio object.
6. The computer-implemented method of claim 5, further comprising:
a relative range angle is determined based on the user position, the relative point, and the position and/or orientation of the extended audio object, wherein the range parameter is determined based on the relative range angle.
7. The computer-implemented method of any of the preceding claims, wherein determining the location of the one or more second audio sources comprises:
determining a circular arc based on the user location, the relative point, and the geometric form of the extended audio object; and
positioning the determined one or more second audio sources on the circular arc.
8. The computer-implemented method of claim 7, wherein the positioning involves equally distributing all of the second audio sources over the circular arc.
9. The computer-implemented method of claim 7 or 8, wherein the positioning depends on a level of correlation between the second audio sources and/or content creator intent.
10. The computer-implemented method of any of the preceding claims when dependent on claim 4, wherein the range parameter is further determined based on the determined number of one or more second audio sources.
11. The computer-implemented method of claim 10, wherein the determined number of the one or more second audio sources is a predetermined constant independent of the user location and/or the relative point.
12. The computer-implemented method of claim 10 when dependent on claim 4, wherein determining the one or more second audio sources for modeling the extended audio object comprises determining the number of the one or more second audio sources based on the relative range angle.
13. The computer-implemented method of claim 12, wherein the number of the one or more second audio sources increases as the relative range angle increases.
14. The computer-implemented method of any of the preceding claims when dependent on claim 4, wherein determining the one or more second audio sources for modeling the extended audio object further comprises:
replicating the one or more first audio sources or summing a weighted mix of the one or more first audio sources; and
applying a decorrelation process to the replicated or summed first audio sources.
15. The computer-implemented method of any of the preceding claims, wherein the range representation indicates a two-dimensional or three-dimensional geometric form for representing a spatial extension of the extended audio object.
16. The computer-implemented method of any of the preceding claims, wherein the extended audio object (3) is oriented in two or three dimensions.
17. The computer-implemented method of any of the preceding claims, wherein the spatial extension of the extended audio object perceived at the user location is described as a perceived width, size, and/or quality of the extended audio object.
18. The computer-implemented method of any of the preceding claims when dependent on claim 6, further comprising:
obtaining an orthogonal projection of the extended audio object on a projection plane orthogonal to a first line connecting the user location and the opposite point; and
determining a plurality of boundary points on the orthogonal projection identifying a projection size of the extended audio object,
wherein the relative range angle is determined using the user location and the plurality of boundary points.
19. The computer-implemented method of claim 18, wherein determining the plurality of boundary points comprises obtaining a second line related to a horizontal projection size of the extended audio object, wherein the plurality of boundary points comprises leftmost and rightmost boundary points of the orthogonal projection on the second line.
20. The computer-implemented method of claim 19, wherein the horizontal projection size is a maximum size of the extended audio object.
21. The computer-implemented method of any of claims 18 to 20, wherein the orthogonal projection comprises a simplified projection of the extended audio object having a complex geometry.
22. The computer-implemented method of any of the preceding claims, further comprising controlling a perceived size of the extended audio object using the range parameter.
23. The computer-implemented method of claim 22, wherein the extended audio object is modeled as a point source or a wide source by controlling the perceived size of the extended audio object.
24. The computer-implemented method of any of the preceding claims, wherein the locations of the one or more second audio sources are determined such that all of the second audio sources have the same reference distance from the user location.
25. The computer-implemented method of any of the preceding claims, wherein obtaining the relative point using the range representation indicative of the geometric form of the extended audio object comprises obtaining the relative point on the geometric form or obtaining the relative point at a distance from the range representation.
26. An apparatus for modeling an extended audio object for audio rendering in a virtual or augmented reality environment, the apparatus comprising a processor and a memory coupled to the processor and storing instructions for the processor, wherein the processor is configured to perform all the steps of the computer-implemented method according to any one of claims 1 to 25.
27. A system for implementing audio rendering in a virtual or augmented reality environment, the system comprising:
an apparatus for modeling an extended audio object for audio rendering in a virtual or augmented reality environment, the apparatus comprising a processor and a memory coupled to the processor and storing instructions for the processor, wherein the processor is configured to perform all the steps of the computer-implemented method according to any one of claims 1 to 25; and
a range modeling unit configured to receive information related to the modified representation of the extended audio object from the apparatus and to control a range size of the extended audio object based on the information related to the modified representation,
wherein the system is configured to transmit the information related to the modified representation and/or the controlled range size of the extended audio object to an audio output.
28. The system of claim 27, wherein the system is a user virtual reality console or is part of the user virtual reality console.
29. The system of claim 27 or claim 28, wherein the system is configured to transmit the information related to the modified representation and/or the controlled range size of the extended audio object to an audio output.
30. A computer program comprising instructions which, when executed by a computing device, cause the computing device to perform all the steps of the method according to any one of claims 1 to 25.
31. A computer readable storage medium storing a computer program according to claim 30.
CN202280031578.6A 2021-04-29 2022-04-28 Method, apparatus and system for modeling audio objects with range Pending CN117223299A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US63/181,865 2021-04-29
US202163247156P 2021-09-22 2021-09-22
US63/247,156 2021-09-22
EP21200055.8 2021-09-30
PCT/EP2022/061331 WO2022229319A1 (en) 2021-04-29 2022-04-28 Methods, apparatus and systems for modelling audio objects with extent

Publications (1)

Publication Number Publication Date
CN117223299A true CN117223299A (en) 2023-12-12

Family

ID=89035718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280031578.6A Pending CN117223299A (en) 2021-04-29 2022-04-28 Method, apparatus and system for modeling audio objects with range

Country Status (1)

Country Link
CN (1) CN117223299A (en)

Similar Documents

Publication Publication Date Title
US11968520B2 (en) Efficient spatially-heterogeneous audio elements for virtual reality
US10820097B2 (en) Method, systems and apparatus for determining audio representation(s) of one or more audio sources
US20230132745A1 (en) Rendering of audio objects with a complex shape
CN111434126B (en) Signal processing device and method, and program
US10123149B2 (en) Audio system and method
WO2021158273A1 (en) Augmented reality virtual audio source enhancement
CN117223299A (en) Method, apparatus and system for modeling audio objects with range
JP2024519458A Method, apparatus and system for modeling spacious audio objects
WO2022008595A1 (en) Seamless rendering of audio elements with both interior and exterior representations
EP4210352A1 (en) Audio apparatus and method of operation therefor
EP4132012A1 (en) Determining virtual audio source positions
WO2024121188A1 (en) Rendering of occluded audio elements
WO2023061972A1 (en) Spatial rendering of audio elements having an extent
EP4342193A1 (en) Method and system for controlling directivity of an audio source in a virtual reality environment
CN115348530A (en) Audio renderer gain calculation method, device and equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination