WO2015152661A1

WO2015152661A1 - Method and apparatus for rendering audio object

Info

Publication number: WO2015152661A1
Application number: PCT/KR2015/003326
Authority: WO
Inventors: 전상배; 김선민
Original assignee: 삼성전자 주식회사
Priority date: 2014-04-02
Filing date: 2015-04-02
Publication date: 2015-10-08

Abstract

Disclosed is a method for rendering an audio object, comprising the steps of: obtaining information related to the audio object; determining, on the basis of the obtained information related to the object, a spread parameter indicating the degree of the audio object spreading in at least one direction; determining at least one direction in which the audio object is located in accordance with the determined parameters; and rendering, on the basis of the determined direction, the audio object.

Description

Method and device for rendering audio objects

The present invention is directed to a method and apparatus for rendering an audio object.

When object rendering is performed on an audio signal that includes at least one audio object, as in Moving Picture Experts Group-H (MPEG-H), each audio object may be positioned at a virtual sound source location. When an object is mixed into multiple channels so that it can be located at the virtual sound source location, interference between channels increases, so the sound field and sense of space that the creator intended for each object may not be properly expressed.

Therefore, a method is needed for rendering objects so that the audio signal is output for each object according to the creator's intention.

The present invention relates to a method and apparatus for rendering an audio object that effectively express the tone color, feel, distance, and sense of space of a sound according to the creator's intention by determining the degree of spread of the audio object based on the type of the audio object.

According to an embodiment, the audio object may be output in an optimal state according to the producer's intention and provided to the listener.

FIG. 1 is a diagram illustrating an example of rendering an audio object in a two-dimensional space according to an embodiment.

FIG. 2 is a diagram illustrating an example of rendering an audio object in a 3D space according to an embodiment.

FIG. 3 is a diagram illustrating an example of rendering an audio object spread in a plurality of directions according to an embodiment.

FIG. 4 is a flowchart illustrating a method of determining a spread parameter based on a type of an audio object according to an exemplary embodiment.

FIG. 5 is a flowchart illustrating a method of determining a type of an audio object according to an exemplary embodiment.

FIG. 6 is a block diagram illustrating an internal structure of an apparatus for rendering an audio object according to an exemplary embodiment.

According to an embodiment, a method of rendering an audio object may include: obtaining information related to the audio object; determining, based on the obtained information, a spread parameter representing a degree to which the audio object spreads in at least one direction; determining, according to the determined spread parameter, at least one direction in which the audio object is located; and rendering the audio object based on the determined direction.

The information related to the object includes information about a type of an object, and the type of the object is classified according to whether the audio object provides a sense of space or reverberation to a listener.

The type of the audio object includes at least one of a direct type, an ambience type, an applause type, a soft decision type, and a dialog type.

The determining of the spread parameter includes determining the spread parameter to a value less than or equal to a reference value when the obtained object type is an ambience type or an applause type.

The determining of the spread parameter may include determining a spread parameter based on information indicating a degree of spatiality of the audio object when the obtained object type is a soft decision type.

The determining of the spread parameter includes determining the spread parameter based on at least one feature of the object and a rendering environment when the obtained object type is a direct type.

The determining of the spread parameter includes determining the spread parameter as a value equal to or greater than a reference value when the obtained object type is a dialog type.

According to an embodiment, an apparatus for rendering an audio object includes: a receiver configured to receive an audio signal including at least one audio object and to extract the audio object from the audio signal; a controller configured to obtain information related to the audio object, determine, based on the obtained information, a spread parameter indicating a degree to which the audio object spreads in at least one direction, determine, according to the determined spread parameter, at least one direction in which the audio object is located, and render the audio object based on the determined direction; and a sound output unit configured to output the rendered audio object.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, in the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that may obscure the subject matter of the present invention will be omitted. In addition, it should be noted that like elements are denoted by the same reference numerals as much as possible throughout the drawings.

The terms and words used in the specification and claims below should not be construed as limited to their ordinary or dictionary meanings. Based on the principle that inventors may appropriately define terms in order to explain their own invention in the best way, they should be interpreted with the meanings and concepts that correspond to the technical idea of the present invention. Therefore, the embodiments described in this specification and the configurations shown in the drawings are only the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and variations could have replaced them at the time of filing.

In the accompanying drawings, some components are exaggerated, omitted, or schematically illustrated, and the size of each component does not entirely reflect the actual size. The invention is not limited by the relative size or spacing drawn in the accompanying drawings.

When a part of the specification is said to "include" a component, this means that it may further include other components rather than excluding them, unless otherwise stated. In addition, when a part is "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element between them.

In addition, the term "unit" as used herein refers to software or a hardware component such as an FPGA or ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium and may be configured to execute on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units".

Also, in the present specification, an audio object refers to each of sound components included in an audio signal. One audio signal may include various audio objects. For example, the audio signal generated by recording the performance of an orchestra includes a plurality of audio objects generated from a plurality of musical instruments such as guitar, violin, and oboe.

In addition, in this specification, a sound image means the position at which the listener perceives a sound source to be located. The actual sound is output from the speakers, but the point at which each sound source is virtually localized is called a sound image. The size and position of the sound image may vary according to the speakers from which the sound is output. When the position of each sound source is clear and each sound source is heard distinctly by the listener, the sound image localization may be judged to be excellent.

Detailed Description

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the present invention. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. In the drawings, parts irrelevant to the description are omitted in order to clearly describe the present invention, and like reference numerals designate like parts throughout the specification.

In one embodiment, the described technique is described based on the MPEG-H standard, but is not limited thereto, and may be applied to other audio coding techniques.

Hereinafter, exemplary embodiments of the present invention will be described with reference to the accompanying drawings.

Referring to FIG. 1, the audio object 100 may be rendered to, and output through, speakers 1 and 2 (110, 120). Because the sound image of the audio object 100 is located between speaker 1 (110) and speaker 2 (120), the audio object 100 may be rendered and output to speakers 1 and 2 (110, 120).

The volume of the sound of the audio object 100 output through each of speakers 1 and 2 (110, 120) may be determined according to the position of the sound image of the audio object 100. Referring to FIG. 1, the audio object 100 is located closer to speaker 1 (110), so the volumes of speakers 1 and 2 (110, 120) may be adjusted so that the sound image of the audio object 100 is located at a position close to speaker 1 (110). The volume of the audio object 100 output through speaker 1 (110) may be greater than the volume of the audio object 100 output through speaker 2 (120).

The sound of the audio object 100 that may be output through each speaker described above may be represented by Equation 1 below.

The direction vector p indicating the sound image position of the audio object 100 can be expressed in terms of the unit direction vectors l₁ and l₂ representing the output positions of speakers 1 and 2 (110, 120). Each direction vector may be determined according to the position of the sound image or the position of the speaker with respect to the position 130 of the listener.

Equation 1

$$\mathbf{p} = g_1 \mathbf{l}_1 + g_2 \mathbf{l}_2$$

g₁ and g₂ are gain factors that can be applied to the direction vectors l₁ and l₂ of speakers 1 and 2, and are values that can be determined from the direction vectors in Equation 1. The gain factor values, which correspond to the volume of the audio object 100 output to speakers 1 and 2, may be determined according to the sound image of the audio object 100 and the positions of the speakers.
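As an illustration, the gain factors can be computed by solving the two-speaker relation directly. The sketch below assumes Equation 1 has the form p = g₁·l₁ + g₂·l₂, which is how the surrounding definitions read; the function names, the azimuth convention, and the ±30 degree speaker positions are hypothetical choices made for the example and are not taken from the specification.

```python
import math

def unit(azimuth_deg):
    """Unit direction vector, seen from the listener, for an azimuth in degrees."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))

def pan_gains_2d(p, l1, l2):
    """Solve p = g1*l1 + g2*l2 for (g1, g2) with Cramer's rule."""
    det = l1[0] * l2[1] - l2[0] * l1[1]
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    g1 = (p[0] * l2[1] - l2[0] * p[1]) / det
    g2 = (l1[0] * p[1] - p[0] * l1[1]) / det
    return g1, g2

# Hypothetical layout: speaker 1 at +30 degrees, speaker 2 at -30 degrees.
l1, l2 = unit(30.0), unit(-30.0)
p = unit(15.0)  # sound image placed nearer to speaker 1
g1, g2 = pan_gains_2d(p, l1, l2)
```

With the sound image nearer to speaker 1, `g1` comes out larger than `g2`, which matches the description of FIG. 1. A practical renderer would typically also normalize the gains to keep loudness constant; that step is omitted here.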

Referring to FIG. 2, an audio object may be output through three speakers by being rendered in channels m, n, and k. Since the position of the virtual sound source where the sound image of the audio object can be located is within a triangle formed by the channels m, n and k, the audio object may be output through the speakers of the channels m, n and k.

The direction vector p represents the position of the sound image of the audio object with respect to the position 210 of the listener. The vectors l_m, l_n, and l_k of the channels m, n, and k indicate the positions of the channels m, n, and k, respectively, with respect to the position 210 of the listener, and are unit vectors of magnitude 1.

The direction vector p of the audio object may be expressed as Equation 2, similarly to Equation 1.

Equation 2

$$\mathbf{p} = g_m \mathbf{l}_m + g_n \mathbf{l}_n + g_k \mathbf{l}_k$$

The gain factor values g_m, g_n, and g_k, which correspond to the volume of the audio object output to the speakers of channels m, n, and k, can be determined according to the sound image of the audio object and the positions of the speakers of channels m, n, and k.
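The three-channel case can be sketched in the same way. The example below assumes Equation 2 has the form p = g_m·l_m + g_n·l_n + g_k·l_k, in line with the definitions above, and solves the resulting 3x3 system with Cramer's rule. The function names and the axis-aligned speaker directions are hypothetical and serve only to keep the arithmetic easy to check.

```python
import math

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def pan_gains_3d(p, l_m, l_n, l_k):
    """Solve p = g_m*l_m + g_n*l_n + g_k*l_k for (g_m, g_n, g_k)."""
    d = det3(l_m, l_n, l_k)
    if abs(d) < 1e-12:
        raise ValueError("speaker directions must span 3D space")
    # Cramer's rule: replace one column at a time with p.
    return (det3(p, l_n, l_k) / d,
            det3(l_m, p, l_k) / d,
            det3(l_m, l_n, p) / d)

# Hypothetical unit vectors for channels m, n and k.
l_m, l_n, l_k = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
s = 1.0 / math.sqrt(3.0)
p = (s, s, s)  # sound image equidistant from the three channels
g_m, g_n, g_k = pan_gains_3d(p, l_m, l_n, l_k)
```

A sound image equidistant from the three channels yields three equal gains, and a sound image that coincides with one channel yields a gain of 1 for that channel and 0 for the others.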

Meanwhile, the sound image of the audio object 100 may be positioned at a plurality of positions as well as one position as shown in FIGS. 1 and 2. The sound image of the audio object 100 may exist in a plurality of directions spread within a predetermined range about the reference direction. This will be described in more detail with reference to FIG. 3.

Referring to FIG. 3, a sound image of an audio object may be positioned in a plurality of directions spreading over a range of an angle α about a reference direction 210 according to a multiple-direction amplitude panning (MDAP) method. Because the sound image of the audio object is positioned in a plurality of directions, the audio object may be output through more speakers, depending on the size of the spread angle, than when it is positioned in one direction.

When the audio object is output only in the reference direction 210, the audio object may be output through speakers 1, 2, and 5, which form the triangle in which the reference direction 210 is located. On the other hand, when the audio object is output in a plurality of directions as shown in FIG. 3, the audio object may be output through speakers 1, 2, 5, and 3 according to the positions of the sound images in each direction. For each direction, the audio object may be output through the three speakers forming the triangle in which the sound image for that direction is located.
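One way to find the triangle in which a direction is located is to solve for the three gains of each candidate triangle and keep the triangle whose gains are all non-negative. The sketch below illustrates that test; it is not taken from the specification. The speaker positions, the list of triangles, and all function names are hypothetical, chosen so that speakers 1, 2, 5 and speakers 2, 3, 5 form adjacent triangles as in the description of FIG. 3.

```python
import math

def direction(azimuth_deg, elevation_deg):
    """Unit vector from the listener for an azimuth/elevation pair in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def triangle_gains(p, l_a, l_b, l_c):
    """Gains solving p = g_a*l_a + g_b*l_b + g_c*l_c via scalar triple products."""
    volume = dot(l_a, cross(l_b, l_c))
    return (dot(p, cross(l_b, l_c)) / volume,
            dot(l_a, cross(p, l_c)) / volume,
            dot(l_a, cross(l_b, p)) / volume)

def find_triangle(p, speakers, triangles, eps=1e-9):
    """Return the first triangle whose three gains are all non-negative."""
    for tri in triangles:
        gains = triangle_gains(p, *(speakers[i] for i in tri))
        if all(g >= -eps for g in gains):
            return tri, gains
    return None

# Hypothetical layout: speakers 1, 2, 3 at ear level, speaker 5 elevated.
speakers = {1: direction(30, 0), 2: direction(-30, 0),
            3: direction(-90, 0), 5: direction(0, 45)}
triangles = [(1, 2, 5), (2, 3, 5)]

front, front_gains = find_triangle(direction(0, 10), speakers, triangles)
side, side_gains = find_triangle(direction(-50, 10), speakers, triangles)
```

In this layout a direction near the front falls in triangle (1, 2, 5), while a direction spread toward the side falls in triangle (2, 3, 5), so spreading the sound image brings speaker 3 into use.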

The plurality of directions p_m in which the sound image of the audio object may be positioned may be determined according to a vector p₀ representing the reference direction of the audio object and an angle α, as shown in Equation 3 below. The angle α represents a spread parameter to be described later.

Equation 3

p_m′ is a direction vector that can be determined from the value p₀ representing the reference direction and the coordinate values corresponding to p₀. p_m′ can be determined with respect to the reference position at which the sound image of the audio object is located. The index m is an integer of 0 or more, determined by the number of directions in which the sound image of the audio object may be positioned. α′ denotes the value of α restricted to the range between 0.001 and 90 degrees.

The gain factor g to be applied to each p_m may be obtained according to Equation 1 or 2 described above. The gain factor value may be determined based on the vector p_m for each direction in which the audio object is positioned and on the direction vectors of the speakers through which each p_m is output.

For example, among the directions in which the audio object may be positioned, the direction vector of the direction 330 illustrated in FIG. 3 is located in the triangle formed by speakers 2, 3, and 5. Accordingly, the gain factor values for the direction vector of the direction 330 may be determined based on the direction vectors of speakers 2, 3, and 5, which originate from the position 320 of the listener.

Equation 3 is merely an example, and the direction vector value in which the sound image of the audio object can be positioned can be obtained in various ways.

The degree of spreading between the direction vectors in which the sound image of the audio object may be positioned may be determined according to an α value that may be determined by the spread parameter. As the value of α increases, the maximum angle between the direction vectors in which the sound image may be positioned increases, so that the audio object may be output through more speakers.
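The relationship between α and the set of panning directions can be illustrated with a simplified two-dimensional sketch. It does not reproduce Equation 3, whose form is not recoverable here. Instead it assumes, purely for illustration, that the directions are spaced evenly between −α and +α around the reference azimuth. The clamp of α to the range 0.001 to 90 degrees follows the description of α′ above; the function name and the evenly spaced layout are hypothetical.

```python
def panning_directions(ref_azimuth_deg, alpha_deg, count):
    """Azimuths (degrees) spread evenly within +/- alpha of the reference direction.

    Assumption: even spacing in 2D. The specification's Equation 3 may differ.
    """
    alpha = min(max(alpha_deg, 0.001), 90.0)  # alpha' limited to 0.001..90 degrees
    if count < 2:
        return [ref_azimuth_deg]
    step = 2.0 * alpha / (count - 1)
    return [ref_azimuth_deg - alpha + i * step for i in range(count)]

narrow = panning_directions(0.0, 10.0, 5)
wide = panning_directions(0.0, 30.0, 5)
```

Here `wide` covers azimuths from −30 to +30 degrees while `narrow` covers only −10 to +10 degrees, so a larger α produces directions that can fall between more speaker pairs or triangles.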

Depending on the characteristics of the audio object and the number of channels through which it is output, the sense of space, reverberation, and similar qualities of the sound perceived by the listener may vary. The spread parameter of the audio object may be determined so as to provide the listener with the sense of space, reverberation, etc. that the producer intends. The apparatus may determine the spread parameter according to the characteristics of the audio object so that the audio object is output as the producer intends or in an optimal state.

When the audio object provides the listener with reverberation or a sense of space, the greater the number of channels through which the audio object is output, the higher the correlation between channels. Therefore, when the audio object is output to a large number of channels, its reverberation or sense of space may not be optimally conveyed to the listener.

According to an embodiment, the apparatus for rendering an audio object may determine a spread parameter according to a type of an audio object representing a characteristic of the audio object. As the spread parameter is determined according to the type of the audio object, the number of speakers to which the audio object is output may be adjusted according to the characteristics of the audio object.

In addition, the spread parameter may be determined according to an audio object type that includes a quantified value. For example, such an audio object type may include a value obtained by a soft decision method. A soft decision is a way of representing data together with information indicating the certainty of the data value, for example distinguishing a 1 that is close to 0 from a 1 that is close to 1. In other words, the data is represented using a real value rather than an integer, or with other additional information. According to an embodiment, the spread parameter may be determined according to the quantified data of an audio object belonging to the soft decision type.

Hereinafter, a method of determining a spread parameter based on information related to an object will be described in detail with reference to FIGS. 4 and 5.

FIG. 4 is a flowchart illustrating a method of determining a spread parameter based on information related to an audio object according to an exemplary embodiment.

Referring to FIG. 4, in operation S410, the apparatus may obtain information related to an audio object to be rendered. The information related to the audio object may include location information of the object, characteristic information, and the like. The characteristic information of the object may include, for example, information about the type of the object.

The type of object may be classified according to whether the audio object provides a sense of space or reverberation to the listener. In addition, the type of the object may be classified based on whether the output performance or output characteristics of the audio object change as the audio object is output through a plurality of channels. For example, a spread parameter for a type of an audio object in which a change in output performance or characteristics of an audio object is insignificant as output to a plurality of channels may be determined as a relatively large value. The type of the object is not limited thereto, and may be classified in various ways.

In addition, the type of the audio object may be obtained from information about the type of the audio object signaled through the bitstream, or the type of the audio object may be determined based on a result of analyzing the characteristics of the audio object.

For example, the types of audio objects may be classified according to whether they provide a listener with a sense of space or reverberation. In this case, the type of the audio object may be classified into a direct type, an ambience type, an applause type, a soft decision type, a dialog type, and the like.

The ambience type may include an audio object that provides a sense of space by providing reverberation to the listener.

The applause type may include transient audio objects, such as applause or the sound of rain.

The dialog type may include an audio object including a human voice, a conversation, and the like.

The direct type may include an audio object from which spread parameters may be determined based on at least one feature of the object and the rendering environment.

The soft decision type may include an audio object whose spread parameter may be determined according to artistic information about the sound, specified by the producer, that relates to the spread parameter. The information about a soft decision type audio object may include specifically quantified information representing this artistic information. The spread parameter may be determined based on the quantified information. For example, the quantified information may include a value indicating the degree of spatial sense of the sound. Alternatively, the quantified information may directly include the spread parameter value.

When the type of the audio object determined according to the characteristic analysis of the audio object does not belong to the ambience type, the applause type, or the dialog type, the type of the audio object may be determined as one of a direct type and a soft decision type. The device may determine a type more suitable for rendering the audio object of the two types, and render the audio object according to the determined object type. One of the two types may be determined as the type of the audio object based on the characteristics of the audio object or the output environment of the audio object.

In operation S420, the apparatus may determine the spread parameter based on the information about the audio object to be rendered.

For example, the spread parameter for the audio object may be determined to be a value below the reference value for the audio object of the ambience type or the applause type, which may provide a sense of space or reverberation. In the case of an audio object that can provide a sense of space or reverberation, an audio object may not properly provide a sense of space or reverberation to a listener due to interference when rendered in many channels. Accordingly, the spread parameter may be determined to be a value less than or equal to a reference value for rendering with fewer channels for audio objects belonging to the ambience type or the applause type.

The spread parameter may be determined as a specific value below the reference value for each object type according to the intention of the producer. In addition, the spread parameter may be determined as a specific value that allows the audio object to be optimally output among values below the reference value.

On the other hand, even if an audio object belonging to the dialog type is output in multiple channels, its output performance is not significantly affected, owing to its characteristics. Dialog type audio objects are hardly affected by the interference that may occur when they are output in multiple channels. Therefore, the spread parameter of the dialog type may be determined to be a value greater than or equal to the reference value.

The spread parameter of an object belonging to the direct type may be determined based on at least one feature of the object and the rendering environment.

The spread parameter of the object belonging to the soft decision type may be determined as a value to be output as intended by the producer. Spread parameters of the object may be determined based on specifically quantified information indicative of the intention of the producer.

In addition, the spread parameter may be determined based on not only the type of the object but also information related to the object, for example, location information of the object, characteristic information, and the like.

In operation S430, the device may determine at least one direction in which the sound image of the audio object is positioned based on the spread parameter determined in operation S420. A plurality of directions that may be determined in step S430 will be referred to as a panning direction below. The panning direction represents a vector value that can be determined within an angular range according to the spread parameter value about the reference direction.

In operation S440, the device may render the audio object based on the direction in which the audio object determined in operation S430 is positioned. The device may obtain a gain factor for each panning direction of the audio object. As described above, the gain factor may be determined based on a direction vector value indicating a position of each panning direction and channels forming a triangle in which each panning direction is located. The device may render the audio object into a plurality of channels based on the panning direction and the gain factor of the audio object.
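Operations S430 and S440 can be sketched together on a simplified two-dimensional layout. The example below assumes a hypothetical arc of four speakers, evenly spaced panning directions, and the common MDAP practice of summing the per-direction gains and normalizing them to unit power; none of these details is specified in the text, and the function names are illustrative only.

```python
import math

def pair_gains(az, az_a, az_b):
    """Gains for a direction lying between two adjacent speakers (2D form of Equation 1)."""
    span = math.sin(math.radians(az_b - az_a))
    g_a = math.sin(math.radians(az_b - az)) / span
    g_b = math.sin(math.radians(az - az_a)) / span
    return g_a, g_b

def render_gains(ref_az, alpha, speaker_az, count=5):
    """Per-speaker gains for an object spread by +/- alpha around ref_az (count >= 2)."""
    # S430: panning directions within the angular range set by the spread parameter.
    step = 2.0 * alpha / (count - 1)
    directions = [ref_az - alpha + i * step for i in range(count)]
    gains = [0.0] * len(speaker_az)
    # S440: gain factors per panning direction, accumulated per speaker.
    for az in directions:
        for i in range(len(speaker_az) - 1):
            a, b = speaker_az[i], speaker_az[i + 1]
            if a <= az <= b:
                g_a, g_b = pair_gains(az, a, b)
                gains[i] += g_a
                gains[i + 1] += g_b
                break
    norm = math.sqrt(sum(g * g for g in gains))
    if norm == 0.0:
        raise ValueError("no panning direction falls inside the speaker arc")
    return [g / norm for g in gains]  # assumption: normalize to unit power

speaker_az = [-90.0, -30.0, 30.0, 90.0]  # hypothetical speaker azimuths, sorted
narrow = render_gains(10.0, 5.0, speaker_az)
wide = render_gains(10.0, 35.0, speaker_az)
```

With a spread of 5 degrees only the two speakers around the reference direction receive signal, whereas a spread of 35 degrees also drives a third speaker, which is the effect the spread parameter is meant to control.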

Hereinafter, a method of determining the type of an audio object will be described in more detail with reference to FIG. 5.

Referring to FIG. 5, in operation S510, the device may acquire a type of an audio object. The object types may be classified in various ways based on the degree to which the reverberation of the sound that may be provided to the listener, the sense of space, and the like may vary according to the extent to which the audio object is spread. In addition, the type of the object may be classified based on whether the output performance or output characteristics of the audio object change as the audio object is output through a plurality of channels.

In operation S520, the device may determine whether the type of the audio object acquired in operation S510 is an ambience type or an applause type.

In step S530, if it is determined in step S520 that the type of the audio object is an ambience type or an applause type, the device may determine the spread parameter to be a value less than or equal to the reference value. Audio objects of the ambience type or the applause type may provide a listener with a sense of space or reverberation. Therefore, as the audio object belonging to the above type is rendered in a large number of channels, the interference phenomenon may increase. The apparatus may determine the spread parameter to a value below the reference value so as to minimize the interference phenomenon.

In addition, the apparatus may determine the spread parameter to a value capable of outputting the audio object optimally in consideration of the characteristics of the audio object, the output environment of the audio object, a user setting, and the like.

In operation S540, the device may determine whether the type of the audio object belongs to the direct type. In operation S550, when the type of the audio object belongs to the direct type, the apparatus may obtain a spread parameter based on at least one feature of the object and the rendering environment.

In operation S560, the device may determine whether the type of the audio object belongs to the dialog type. In operation S570, when the type of the audio object belongs to the dialog type, the device may determine the spread parameter to be a value greater than or equal to the reference value. Even if an audio object belonging to the dialog type is output in multiple channels, the output performance of the audio object is not significantly affected. Dialog type audio objects are hardly affected by the interference that may occur when they are output in multiple channels. Therefore, the spread parameter of the dialog type may be determined to be a value greater than or equal to the reference value.

The larger the spread parameter, the more directions and channels through which the audio object can be output. In addition, when the audio object moves over time, it may be output in an optimal state when it is output through many channels. Accordingly, the device may in general determine the spread parameter so that the audio object is output through many channels, but may determine it to be a value less than or equal to the reference value depending on the type of the audio object.

In operation S580, the device may determine that the audio object is a soft decision type. In this case, the spread parameter of the object may be determined based on specifically quantified information representing the intention of the producer. For example, the spread parameter may be determined based on a numerical value indicating the degree of spatiality of the object.

In operation S590, the device may render the audio object using the spread parameters determined in operations S530, S550, S570, and S580. The rendered audio object may be output through at least one rendered channel.
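The branching in operations S520 through S580 can be summarized in a short sketch. The type names come from the description, but the reference value of 30 degrees, the `preferred` argument, and the function name are hypothetical. In particular, `preferred` stands in for whatever value the device derives from the object features and rendering environment (direct type) or from the producer's quantified information (soft decision type); the specification does not fix how those values are computed.

```python
from enum import Enum

class ObjectType(Enum):
    DIRECT = "direct"
    AMBIENCE = "ambience"
    APPLAUSE = "applause"
    SOFT_DECISION = "soft_decision"
    DIALOG = "dialog"

REFERENCE_SPREAD = 30.0  # hypothetical reference value, in degrees

def determine_spread(obj_type, preferred, reference=REFERENCE_SPREAD):
    """Pick a spread parameter following the order of checks in FIG. 5."""
    if obj_type in (ObjectType.AMBIENCE, ObjectType.APPLAUSE):  # S520 -> S530
        return min(preferred, reference)  # at most the reference value
    if obj_type is ObjectType.DIRECT:                            # S540 -> S550
        return preferred  # assumed to come from object features and rendering environment
    if obj_type is ObjectType.DIALOG:                            # S560 -> S570
        return max(preferred, reference)  # at least the reference value
    return preferred                                             # S580: soft decision type

ambience_spread = determine_spread(ObjectType.AMBIENCE, 50.0)
dialog_spread = determine_spread(ObjectType.DIALOG, 10.0)
```

In this sketch an ambience object that asks for 50 degrees is limited to the reference value, and a dialog object that asks for 10 degrees is raised to it, while direct and soft decision objects keep the supplied value.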

Hereinafter, an apparatus for rendering an audio object will be described in detail with reference to FIG. 6.

According to an embodiment, an apparatus 600 for rendering an audio object may be a terminal apparatus that may be used by a user. For example, the device 600 may be a smart television, ultra high definition (UHD) TV, monitor, personal computer (PC), notebook computer, mobile phone, tablet PC, navigation terminal, smartphone, personal digital assistant (PDA), portable multimedia player (PMP), or digital broadcast receiver.

The apparatus 600 for rendering an audio object according to an embodiment may include a receiver 610, a controller 620, and a sound output unit 630.

The receiver 610 may receive an audio signal including an audio object for rendering from the outside. In addition, the receiver 610 may extract an audio object from the audio signal. The audio signal may be received in the form of a bit stream, and the receiver 610 may extract an audio object from the bit stream including the audio signal. In addition, the receiver 610 may extract information for analyzing the characteristics of the audio object or information for determining the type of the audio object from the bit stream.

The controller 620 may determine the spread parameter based on the information related to the audio object, and render the audio object according to the determined spread parameter. The information related to the audio object may include location information of the object, characteristic information, and the like. The characteristic information of the object may include, for example, information about the type of the object. The spread parameter may be determined depending on whether the audio object provides a sense of space or reverberation to the listener. In addition, the type of the object may be classified based on whether the output performance or output characteristics of the audio object change as the audio object is output through a plurality of channels. In addition, the spread parameter of the object belonging to the soft decision type of the object type may be determined based on specifically digitized information indicating the intention of the producer.

The sound output unit 630 may output the audio object rendered by the controller 620 through a plurality of channels.

The method according to some embodiments may be embodied in the form of program instructions that may be executed by various computer means and recorded on a computer readable medium. The computer readable medium may include program instructions, data files, data structures, etc. alone or in combination. Program instructions recorded on the medium may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the foregoing description has focused on the novel features of the invention as applied to various embodiments, those skilled in the art will appreciate that various deletions, substitutions, and changes in the form and details of the apparatus and method described above are possible without departing from the scope of the invention. Accordingly, the scope of the invention is defined by the appended claims rather than by the foregoing description. All modifications within the scope of equivalents of the claims are to be embraced within the scope of the present invention.

Claims

1. A method of rendering an audio object, the method comprising:

obtaining information related to the audio object;

determining, based on the obtained information related to the audio object, a spread parameter indicating a degree to which the audio object spreads in at least one direction;

determining, according to the determined spread parameter, at least one direction in which the audio object is located; and

rendering the audio object based on the determined direction.

2. The method of claim 1, wherein the information related to the audio object includes information about a type of the object, and the type of the object is determined based on whether the audio object provides a listener with a sense of space or reverberation.

3. The method of claim 2, wherein the type of the audio object includes at least one of a direct type, an ambience type, an applause type, a soft decision type, and a dialog type.

4. The method of claim 2, wherein determining the spread parameter comprises, if the obtained object type is an ambience type or an applause type, determining the spread parameter to be a value less than or equal to a reference value.

5. The method of claim 2, wherein determining the spread parameter comprises, if the obtained object type is a soft decision type, determining the spread parameter based on information indicating a degree of spatiality of the audio object.

6. The method of claim 2, wherein determining the spread parameter comprises, if the obtained object type is a direct type, determining the spread parameter based on at least one feature of the object and a rendering environment.

7. The method of claim 2, wherein determining the spread parameter comprises, if the obtained object type is a dialog type, determining the spread parameter to be a value equal to or greater than a reference value.

8. An apparatus for rendering an audio object, the apparatus comprising:

a receiver which receives an audio signal including at least one audio object and extracts the audio object from the audio signal;

a controller which obtains information related to the audio object, determines, based on the obtained information related to the audio object, a spread parameter indicating a degree to which the audio object spreads in at least one direction, determines, according to the determined spread parameter, at least one direction in which the audio object is located, and renders the audio object based on the determined direction; and

a sound output unit which outputs the rendered audio object.

9. The apparatus of claim 8, wherein the information related to the audio object includes information about a type of the object, and the type of the object is determined based on whether the audio object provides a listener with a sense of space or reverberation.

10. The apparatus of claim 9, wherein the type of the audio object includes at least one of a direct type, an ambience type, an applause type, a soft decision type, and a dialog type.

11. The apparatus of claim 9, wherein the controller determines the spread parameter to be a value less than or equal to a reference value if the obtained object type is an ambience type or an applause type.

12. The apparatus of claim 9, wherein the controller determines the spread parameter based on information indicating a degree of spatiality of the audio object if the obtained object type is a soft decision type.

13. The apparatus of claim 9, wherein the controller determines the spread parameter based on at least one feature of the object and a rendering environment if the obtained object type is a direct type.

14. The apparatus of claim 9, wherein the controller determines the spread parameter to be a value equal to or greater than a reference value if the obtained object type is a dialog type.

15. A computer-readable recording medium having recorded thereon a program for implementing the method of any one of claims 1 to 7.