WO2019147041A1

WO2019147041A1 - Method for generating binaural stereo audio and apparatus therefor

Info

Publication number: WO2019147041A1
Application number: PCT/KR2019/001019
Authority: WO
Inventors: 구본희
Original assignee: 구본희
Priority date: 2018-01-29
Filing date: 2019-01-24
Publication date: 2019-08-01
Also published as: KR102119239B1; KR20190091824A

Abstract

A method for generating binaural stereo audio and an apparatus therefor are disclosed. According to one embodiment of the present invention, the method comprises the steps of: generating a three-dimensional layer binaural output by performing three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer; generating a plane layer audio output by performing audio processing corresponding to a plane layer; and generating a binaural stereo output by mixing the three-dimensional layer binaural output with the plane layer audio output.

Description

Method and apparatus for generating binaural stereo audio

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for generating binaural stereo audio and, more particularly, to a technique for generating binaural stereo audio that can be reproduced in a general-purpose form by combining a binaural output based on a three-dimensional layer with an audio output based on a plane layer.

The present invention claims the benefit of Korean Patent Application No. 10-2018-0010874 filed on January 29, 2018, the entire contents of which are incorporated herein by reference.

With advances in multimedia technology, the use of content including multichannel audio signals with more channels than 5.1, such as 7.1-, 10.2-, 11.1-, and 22.2-channel signals, is increasing. However, since the user terminals on which such content is consumed, such as stereo speakers, headphones, and earphones, can reproduce audio signals only in stereo form, high-quality multichannel audio signals need to be converted into stereo audio signals.

In this regard, Korean Patent Laid-Open No. 10-2015-0013073 discloses a technique related to a "binaural rendering method and apparatus for multi-channel audio signal".

It is an object of the present invention to provide a method for generating binaural stereo audio capable of maximizing the binaural effect by mixing various sound elements.

It is also an object of the present invention to provide a binaural engine with which the sound elements for producing an effective binaural effect can be easily adjusted or controlled.

It is also an object of the present invention to improve compatibility with various kinds of contents based on natural upmix and downmix.

According to an aspect of the present invention, there is provided a binaural stereo audio generating method comprising: generating a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer; generating a plane layer audio output by performing audio processing corresponding to a plane layer; and combining the 3D layer binaural output and the plane layer audio output to generate a binaural stereo output.

The plane layer may include at least one of a surround layer, which performs surround layer binaural encoding to generate a surround layer binaural output and provides the generated surround layer binaural output as the plane layer audio output, and a proximity stereo layer, which generates the plane layer audio output corresponding to a stereo signal.

The three-dimensional layer binaural output may correspond to a three-dimensional vector for a binaural point located on an 8-channel-based three-dimensional cubic composed of four up channels and four down channels.

Generating the binaural stereo output may include applying a three-dimensional weight to the three-dimensional layer binaural output and applying a plane weight to the plane layer audio output, and the three-dimensional weight and the plane weight may be set independently of each other.

Generating the binaural stereo output may also include summing a subwoofer output corresponding to a subwoofer layer together with the three-dimensional layer binaural output and the plane layer audio output.

The three-dimensional cubic may be generated by changing, according to a size parameter for the 3D binaural layer, the positions of eight dynamic speakers corresponding to the vertices of the three-dimensional cubic.

The three-dimensional vector lies within the three-dimensional cubic and may be generated based on a reference listening point corresponding to the center of the two-dimensional plane that corresponds to the surround layer.

Generating the 3D layer binaural output may include applying direction information of the three-dimensional vector to the three-dimensional cubic rotated according to head tracking information, where the head tracking information may be obtained from at least one of a tracking input from a head tracking module and a user input through a user interface.

The three-dimensional cubic may be rotated according to at least one rotation parameter among pan, tilt, and roll.

The plane layer may be located between the four up channels and the four down channels.

In addition, an apparatus for generating binaural stereo audio according to an embodiment of the present invention includes: a processor that performs three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer to generate a three-dimensional layer binaural output, performs audio processing corresponding to a plane layer to generate a plane layer audio output, and sums the three-dimensional layer binaural output and the plane layer audio output to generate a binaural stereo output; and a memory that stores the 3D layer binaural output and the plane layer audio output.

The processor may apply a three-dimensional weight to the three-dimensional layer binaural output and a plane weight to the plane layer audio output, and the three-dimensional weight and the plane weight may be set independently of each other.

The processor may generate the binaural stereo output by summing a subwoofer output corresponding to a subwoofer layer with the three-dimensional layer binaural output and the plane layer audio output.

The processor may generate the three-dimensional layer binaural output by applying the direction information of the three-dimensional vector to the three-dimensional cubic rotated according to the head tracking information, where the head tracking information may be obtained from at least one of a tracking input from a head tracking module and a user input through a user interface.

According to the present invention, it is possible to provide a method for generating binaural stereo audio capable of maximizing the binaural effect by mixing various sound elements.

In addition, the present invention can provide a binaural engine with which the sound elements for producing an effective binaural effect can be easily adjusted or controlled.

In addition, the present invention can improve compatibility with various kinds of contents based on natural upmix and downmix.

FIG. 1 is a view illustrating the structure of a binaural engine according to an embodiment of the present invention.

FIG. 2 is a view showing the structure of a conventional binaural engine.

FIG. 3 is a block diagram illustrating a binaural stereo audio generating apparatus according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating a detailed structure for generating a three-dimensional layer binaural output according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating an example of an 8-channel-based three-dimensional cubic according to the present invention.

FIG. 6 is a diagram illustrating examples of three-dimensional cubics of various sizes generated by changing the positions of the dynamic speakers according to the present invention.

FIG. 7 is a diagram showing an example of a three-dimensional vector according to the present invention.

FIG. 8 is a diagram illustrating an example of applying the direction information of a three-dimensional vector to a three-dimensional cubic rotated according to head tracking information according to the present invention.

FIG. 9 is a diagram showing an example of rotation parameters according to the present invention.

FIG. 10 is a diagram illustrating a detailed structure for generating a surround layer binaural output according to an embodiment of the present invention.

FIG. 11 is a diagram illustrating an example of a 5-channel-based surround layer according to the present invention.

FIG. 12 is a diagram illustrating a detailed structure for generating a stereo signal according to an embodiment of the present invention.

FIGS. 13 and 14 are views showing examples of a proximity stereo layer according to the present invention.

FIG. 15 is a diagram illustrating an example of a plane layer positioned between the up channels and the down channels of a three-dimensional cubic according to the present invention.

FIGS. 16 and 17 are views showing examples of a proximity stereo layer used as part of the channels of the surround layer according to the present invention.

FIG. 18 is a diagram illustrating a detailed structure for generating a subwoofer output according to an embodiment of the present invention.

FIG. 19 is a view showing an example structure combining a 3D binaural layer, a plane layer, and a subwoofer layer according to the present invention.

FIG. 20 is a diagram showing an example of a sound represented by a conventional binaural engine.

FIG. 21 is a view showing an example of a sound represented by the binaural engine according to the present invention.

FIG. 22 is a flowchart illustrating a method of generating binaural stereo audio according to an embodiment of the present invention.

The present invention will now be described in detail with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of known functions and configurations that may unnecessarily obscure the gist of the present invention, will be omitted. The embodiments of the present invention are provided to describe the present invention more fully to those skilled in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a view showing a structure of a binaural engine according to an embodiment of the present invention, and FIG. 2 is a view showing a structure of a conventional binaural engine.

Referring to FIG. 2, in a conventional binaural engine, a multichannel audio file is binaurally encoded by a binaural encoder 210, and the resulting binaural output is decoded and provided through a dedicated player 220. Because binaural encoding according to the related art uses fixed speakers disposed at a fixed distance from the listening position, it is difficult to adjust the speaker positions to enlarge or shrink the spatial image.

In addition, the conventional binaural engine is specialized for content that includes both video and audio, such as surround movie content, and is difficult to apply to sources that have no spatial image, such as music content. Moreover, since binaurally encoded content can be reproduced only with the dedicated player 220, its usability may be reduced. For example, music content must deliver sufficient loudness to the listener according to its characteristics, but the binaural encoder 210 shown in FIG. 2 is limited in providing sound effects optimized for music content.

Furthermore, since the conventional binaural engine uses only a single encoder specialized for the effect mainly used for a given type of content, various effects cannot be applied during production. For example, since music content by nature often does not use a subwoofer, there have been few attempts to add a subwoofer-based bass reproduction element to music content through a conventional binaural engine.

In contrast, the binaural engine according to an embodiment of the present invention, shown in FIG. 1, mixes outputs containing various binaural sound effects with outputs from audio processing, making it possible to generate a binaural stereo output with more dramatic sound direction.

For example, as shown in FIG. 1, a binaural encoder 111 may perform binaural encoding corresponding to a multichannel three-dimensional binaural layer 110 to generate a three-dimensional layer binaural output. In addition, a binaural encoder 121 may perform binaural encoding corresponding to a surround layer 120 to generate a surround layer binaural output. A stereo bus 131 corresponding to a proximity stereo layer 130 may generate an audio output corresponding to a stereo signal, and an LFE bus 141 corresponding to a subwoofer layer 140 may generate a subwoofer output. These outputs, that is, the three-dimensional layer binaural output, the surround layer binaural output, the audio output corresponding to the stereo signal, and the subwoofer output, are then summed by a binaural mixer 150 to generate a binaural stereo output. The binaural stereo output combined by the binaural mixer 150 can be delivered to the listener or user in a form reproducible by a general-purpose decoder.

As described above, since the binaural engine according to the present invention generates binaural stereo audio by mixing the outputs of various encoders, it can be used in a general form that is not tied to specific content, and thus provides compatibility.

For example, for movie content including video and audio, a more dramatic sound production can be provided by combining a surround layer binaural output, which may be generated based on the motion of an object in the image, with a 3D layer binaural output, a stereo output, and a subwoofer output.

As another example, for music content that includes audio only, dynamic music may be provided by mixing a stereo output or a subwoofer output with a 3D layer binaural output based on the 3D binaural layer.

Referring to FIG. 3, the apparatus for generating binaural stereo audio according to an exemplary embodiment of the present invention includes a communication unit 310, a processor 320, and a memory 330.

The communication unit 310 transmits and receives information necessary for generating binaural stereo audio through a communication network. In particular, the communication unit 310 according to an embodiment of the present invention may receive the source or content to be used as input for generating binaural stereo audio, the head tracking information to be applied to binaural encoding, and information related to user input, and may provide binaural stereo audio corresponding to the generated binaural stereo output.

The processor 320 performs three-dimensional layer binaural encoding corresponding to the three-dimensional binaural layer to generate a three-dimensional layer binaural output.

Referring to FIG. 4, a binaural encoder 420 based on the three-dimensional cubic method may be used to perform three-dimensional layer binaural encoding corresponding to the plurality of channels included in the three-dimensional binaural layer.

In this case, the 3D binaural layer may include four up channels 411 and four down channels 412 corresponding to an 8-channel-based three-dimensional cubic.

Accordingly, the three-dimensional layer binaural output 430 may correspond to the output generated by binaurally encoding the 8-channel audio and may be output as two channels, as shown in FIG. The two channels of the three-dimensional layer binaural output 430 may correspond to the left channel and the right channel, respectively.
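The reduction from eight channels to two is commonly realized by treating each channel as a virtual speaker and convolving it with a head-related impulse response (HRIR) pair measured for that speaker's position, then summing per ear. The sketch below assumes that approach; the function names and any HRIR data passed in are hypothetical placeholders, and the internals of the binaural encoder 420 are not disclosed here.

```python
from typing import List, Sequence, Tuple


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def binaural_encode(
    channels: Sequence[Sequence[float]],
    hrirs: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[List[float], List[float]]:
    """Render N speaker feeds (e.g. 8 cubic channels) to a left/right pair.

    Each channel is convolved with the (left, right) HRIR of its virtual
    speaker position, and the results are summed per ear.
    """
    if len(channels) != len(hrirs):
        raise ValueError("need exactly one HRIR pair per channel")
    out_len = max(
        len(x) + max(len(hl), len(hr)) - 1 for x, (hl, hr) in zip(channels, hrirs)
    )
    left = [0.0] * out_len
    right = [0.0] * out_len
    for x, (hl, hr) in zip(channels, hrirs):
        for k, v in enumerate(convolve(x, hl)):
            left[k] += v
        for k, v in enumerate(convolve(x, hr)):
            right[k] += v
    return left, right
```

In practice the HRIRs would come from a measured dataset, and the direct convolution would be replaced by an FFT-based one for realistic filter lengths.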

Although the embodiment shown in FIG. 4 uses an 8-channel-based cubic layer as the three-dimensional binaural layer, the three-dimensional binaural layer is not limited thereto. That is, the binaural engine according to an embodiment of the present invention may use another currently available three-dimensional binaural layer or a three-dimensional binaural layer to be developed in the future.

For example, referring to FIG. 5, an 8-channel-based three-dimensional cubic may have a hexahedral structure consisting of four dynamic speakers 511 to 514 corresponding to the four up channels and four dynamic speakers 515 to 518 corresponding to the four down channels. Since the positions of the eight dynamic speakers 511 to 518 can be changed, the range of the binaural effect produced by the three-dimensional cubic can also be changed dynamically.

As another example, immersive sound may be implemented with the eight dynamic speakers by generating the three-dimensional cubic using the conventional binaural VBAP (vector base amplitude panning) scheme. That is, X, Y, and Z position values are assigned to each of the eight dynamic speakers, so that a vector-based virtual track point referenced to the center of the three-dimensional cubic can be expressed. The virtual track point can be represented according to parameter values included in the head tracking information.
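The text names VBAP but gives no formula. A minimal sketch of standard three-speaker VBAP (as described by Pulkki) follows: the source direction p is written as a weighted sum of three speaker unit vectors, the weights are solved with Cramer's rule, and the result is normalized to constant power. Which three cubic vertices form the active triplet is an assumption left to the caller, and `vbap_gains` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("zero-length direction")
    return (v[0] / n, v[1] / n, v[2] / n)


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vbap_gains(source: Sequence[float], triplet: Sequence[Sequence[float]]) -> List[float]:
    """Power-normalized VBAP gains for one speaker triplet.

    Solves g1*l1 + g2*l2 + g3*l3 = p, where p and l1..l3 are unit vectors from
    the listening point (the cubic's center) to the source and to the speakers.
    A negative gain means the source lies outside this triplet.
    """
    l1, l2, l3 = (_unit(s) for s in triplet)
    p = _unit(source)
    det = _dot(l1, _cross(l2, l3))  # determinant of the matrix [l1 l2 l3]
    if abs(det) < 1e-12:
        raise ValueError("degenerate speaker triplet")
    # Cramer's rule: replace one column of [l1 l2 l3] with p at a time.
    g = [_dot(p, _cross(l2, l3)) / det,
         _dot(l1, _cross(p, l3)) / det,
         _dot(l1, _cross(l2, p)) / det]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]
```

For a cubic, each square face can be split into two triangles, and the triangle whose gains are all non-negative is the one that contains the source direction.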

With such a three-dimensional cubic, a spatial image can be created even for music content containing only audio, and the movement of sound can be expressed, providing a more stereoscopic effect.

The three-dimensional cubic can be generated by changing the positions of the eight dynamic speakers corresponding to its vertices according to the size parameter for the 3D binaural layer. That is, rather than using a fixed speaker system, the three-dimensional cubic can be generated efficiently by freely changing the positions of the dynamic speakers of a variable system according to the size parameter.

For example, by treating the size parameter of the three-dimensional cubic as a constant and multiplying it by the binaural function, three-dimensional cubics 610, 620, and 630 having various ranges, as shown in FIG. 6, can be generated.
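One way to read the size parameter is as a uniform scale on the cubic's vertex coordinates. The helper below is a hypothetical illustration rather than the patented method: it places the four up-channel and four down-channel dynamic speakers at the corners of an axis-aligned cube whose edge length equals `size`, centered on the reference listening point.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float, float]


def cubic_speaker_positions(size: float, center: Point = (0.0, 0.0, 0.0)) -> List[Point]:
    """Return 8 dynamic-speaker positions: 4 up channels, then 4 down channels.

    `size` is the cube's edge length; scaling it enlarges or shrinks the
    spatial image while keeping the listening point at the center.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    half = size / 2.0
    cx, cy, cz = center
    corners = list(product((-1.0, 1.0), repeat=2))  # (x sign, y sign) pairs
    up = [(cx + sx * half, cy + sy * half, cz + half) for sx, sy in corners]
    down = [(cx + sx * half, cy + sy * half, cz - half) for sx, sy in corners]
    return up + down
```

Calling the helper with three different sizes yields three nested cubics sharing one center, similar in spirit to the cubics of various ranges described above.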

The three-dimensional vector lies within the three-dimensional cubic and can be generated based on the reference listening point corresponding to the center of the two-dimensional plane corresponding to the surround layer.

For example, referring to FIG. 7, a reference listening point 700, which virtually represents the position of the user or listener listening to the binaural stereo audio, lies inside a three-dimensional cubic 710 whose vertices are the eight dynamic speakers and may be located at the center of the surround layer 720. Assuming that a binaural point 730 is located on the upper surface of the three-dimensional cubic 710, as shown in FIG. 7, a three-dimensional vector 740 corresponding to the three-dimensional layer binaural output may be formed in the direction from the reference listening point 700 to the binaural point 730.

As will be described in detail with reference to FIGS. 10 and 11, the surround layer 720 is an element for creating a surround image corresponding to a surround effect. Although the surround layer 720 is shown as a plane in FIG. 7, it is not limited to a planar shape.

As shown in FIG. 7, when the binaural point 730 is located on a part of the three-dimensional cubic 710 higher than the surround layer 720 on which the reference listening point 700 lies, the output sound may be formed above the listener. Conversely, when the binaural point 730 is located on a part of the three-dimensional cubic 710 lower than the surround layer 720, the output sound may be formed below the listener.
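The above/below behavior follows from the sign of the vector's vertical component. Assuming a coordinate frame with x pointing forward, y to the left, and z up, with the surround layer at z = 0 through the reference listening point, the sketch below computes the three-dimensional vector and its azimuth and elevation; a positive elevation places the sound above the listener. The frame and the function names are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def binaural_vector(listening_point: Sequence[float],
                    binaural_point: Sequence[float]) -> Tuple[float, ...]:
    """Vector from the reference listening point to the binaural point."""
    return tuple(b - l for b, l in zip(binaural_point, listening_point))


def azimuth_elevation(v: Sequence[float]) -> Tuple[float, float]:
    """Azimuth (degrees, counterclockwise from +x) and elevation (degrees above z = 0)."""
    x, y, z = v
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```

A binaural point on the top face of the cubic gives a positive elevation, and one on the bottom face gives a negative elevation.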

As described above, the present invention can produce more diverse audio by changing the position of the binaural point 730 on the three-dimensional cubic 710 relative to the reference listening point 700.

The three-dimensional layer binaural output may be generated by applying the direction information of the three-dimensional vector to the three-dimensional cubic rotated according to the head tracking information. That is, since the binaural point is set relative to the listener's head, which corresponds to the reference listening point, the position of the binaural point on the three-dimensional cubic also changes when the position or angle of the listener's head changes.

For example, assume that the three-dimensional cubic 710 shown in FIG. 7 is rotated as shown in FIG. 8 according to the head tracking information. In this case, the direction information of the three-dimensional vector 740 shown in FIG. 7 can be applied as-is to the rotated three-dimensional cubic shown in FIG. 8 to find the position of the binaural point changed by the rotation.

The head tracking information is data obtained by tracking the head movement of the user or listener, and may be obtained from at least one of a tracking input from a separate head tracking module and a user input through the user interface.

For example, when a user or listener wearing the head tracking module moves his or her head, the head tracking module can measure the distance or angle of the head movement and generate and transmit the head tracking information.

As another example, the head tracking information may be provided manually by the user or listener through the user interface. That is, to rotate the spatial image deliberately, the user or listener may input head tracking information through the user interface regardless of whether head tracking information is being received from the head tracking module. The user or listener may input and modify the head tracking information while listening to the mixing process that generates the binaural stereo output, or to the binaural stereo output as it changes according to the input.

For example, when the listener rotates his or her head in at least one of pan, tilt, and roll, as shown in FIG. 9, the corresponding values may be obtained as rotation parameters, and the three-dimensional cubic may be rotated according to those rotation parameters.
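A rotation by pan, tilt, and roll can be sketched as three elementary rotations applied in sequence. The order used here (roll about x, then tilt about y, then pan about z) and the right-handed sign convention are assumptions, and `rotate` is a hypothetical helper. Whether the cubic is rotated by the head angles or by their negatives depends on whether the scene should stay fixed in the world or follow the head.

```python
import math
from typing import Sequence, Tuple


def rotate(v: Sequence[float], pan_deg: float, tilt_deg: float,
           roll_deg: float) -> Tuple[float, float, float]:
    """Rotate a point or direction: roll about x, then tilt about y, then pan about z.

    Angles are in degrees; positive angles follow the right-hand rule.
    """
    x, y, z = v
    r = math.radians(roll_deg)  # roll: rotate in the y-z plane
    y, z = math.cos(r) * y - math.sin(r) * z, math.sin(r) * y + math.cos(r) * z
    t = math.radians(tilt_deg)  # tilt: rotate in the z-x plane
    x, z = math.cos(t) * x + math.sin(t) * z, -math.sin(t) * x + math.cos(t) * z
    p = math.radians(pan_deg)  # pan: rotate in the x-y plane
    x, y = math.cos(p) * x - math.sin(p) * y, math.sin(p) * x + math.cos(p) * y
    return x, y, z
```

Applying `rotate` with the same angles to all eight vertex positions yields a rotated cubic, onto which the unchanged direction of the three-dimensional vector can then be applied, as described for FIG. 8.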

The effect of rotating the three-dimensional cubic, or moving it up, down, left, and right, according to the head tracking information can then be mixed with the plane layer audio output to generate the binaural stereo output. This makes it possible to produce a head-tracking-based immersive effect more efficiently than the conventional method of rotating or moving a surround layer, a proximity stereo layer, a subwoofer layer, or the like corresponding to the plane layer.

In addition, the processor 320 performs audio processing corresponding to the plane layer to produce a plane layer audio output.

At this time, the planar layer corresponds to a layer having a structure different from that of the three-dimensional binaural layer, and may correspond to an element that produces an image corresponding to a surround effect or a stereo effect.

Accordingly, the plane layer may include at least one of a surround layer, which performs surround layer binaural encoding to generate a surround layer binaural output and provides it as the plane layer audio output, and a proximity stereo layer, which generates the plane layer audio output corresponding to a stereo signal.

For example, referring to FIG. 10, a binaural encoder 1020 may be used to perform surround layer binaural encoding corresponding to a 5-channel or 7-channel (1010) surround layer. As will be described with reference to FIGS. 13 and 14, two channels corresponding to a proximity stereo layer may be included in the surround layer so that 7-channel-based surround layer binaural encoding is performed.

The surround layer may have a structure including, for example, five speakers 1111 to 1115, as shown in FIG. The surround layer binaural output 1030 may correspond to a binaural point located on the surround layer. That is, when the listener hears sound at the reference listening point located at the center of the surround layer, the surround layer binaural output 1030 can be generated by binaural encoding so that the sound appears to come from the binaural point on the surround layer.

The surround layer binaural output 1030 may be output as two channels, as shown in FIG. The two channels of the surround layer binaural output 1030 may correspond to the left channel and the right channel, respectively.

Although FIGS. 10 and 11 illustrate a surround layer with five or seven channels (1010), the number of surround layer channels is not limited to five or seven. Likewise, although the surround layer is shown as a rectangular plane in FIG. 11, it is not limited thereto and can be expressed in various forms, for example by line thickness, planar shape, or distance from the reference listening point.

For example, referring to FIG. 12, audio processing corresponding to the two-channel (1210) proximity stereo layer may be performed on a stereo bus 1220. That is, the stereo signal 1230 corresponding to the plane layer audio output may be the output produced by processing two-channel (1210) stereo audio and may be output as two channels.

At this time, the proximity stereo layer corresponds to an element for producing a stereo image corresponding to the stereo effect, and may be included as a part of the surround layer.

For example, as shown in FIGS. 13 and 14, a proximity stereo layer corresponding to two speakers (1311 and 1312, or 1411 and 1412) may be included on a surround layer based on five speakers, yielding a layer structure with a total of seven speakers.

As shown in FIG. 13, the proximity stereo layer may be disposed at a distance from the reference listening point 1300 located on the surround layer. Alternatively, as shown in FIG. 14, the proximity stereo layer may serve as the left and right side speakers of the reference listening point 1400.

At this time, the stereo signal output corresponding to the proximity stereo layer can provide a damping feeling that is difficult to produce with spatial parameters used in binaural encoding. Accordingly, the binaural stereo output according to an embodiment of the present invention may provide a damping feeling while providing an immersive effect by binaural encoding.

In this way, the plane layer audio output corresponding to the surround layer binaural output and the plane layer audio output corresponding to the stereo signal can each be used as an output containing a distinct sound effect. That is, the plane layer audio output may include various outputs other than the three-dimensional layer binaural output.

The plane layer may be located between the four up channels and the four down channels of the three-dimensional cubic.

For example, referring to FIG. 15, the plane layers 1510 to 1530 according to an embodiment of the present invention may be located between the four up channels and the four down channels included in the three-dimensional cubic corresponding to the three-dimensional binaural layer.

Here, the four up channels may correspond to the four speakers located at the top of the three-dimensional cubic, and the four down channels may correspond to the four speakers located at the bottom of the three-dimensional cubic.

That is, as shown in FIG. 15, the plane layers 1510 to 1530 may be located within the height range of the hexahedron corresponding to the three-dimensional cubic.

Accordingly, each speaker included in the surround layer or the proximity stereo layer corresponding to the plane layers 1510 to 1530 may be located between the four up channels and the four down channels of the three-dimensional cubic. Although the plane layers 1510 to 1530 are drawn as planes in FIG. 15 for convenience of explanation, the shape of the plane layer according to an embodiment of the present invention is not limited to a planar form.

FIGS. 16 and 17 each show, viewed from above, the three-dimensional cubic corresponding to the three-dimensional binaural layer and the plane layer 1610. The speakers 1621 and 1622 of the proximity stereo layer included in the plane layer 1610 are likewise located between the up channels and the down channels of the three-dimensional cubic.

As shown in FIG. 17, by arranging the speakers 1721 and 1722 of the proximity stereo layer on the left and right sides of the reference listening point 1700, the structure can be adapted to content that includes video.

In addition, the processor 320 combines the three-dimensional layer binaural output and the plane layer audio output to generate the binaural stereo output. That is, by mixing the immersive element provided by the three-dimensional layer binaural output with the near-field reproduction element and object element provided by the plane layer audio output, a binaural stereo output producing a binaural effect can be generated.

In this case, when only an immersive sound is desired, a binaural stereo output may be generated using only a three-dimensional layer binaural output.

The subwoofer output corresponding to the subwoofer layer may also be summed with the three-dimensional layer binaural output and the plane layer audio output to generate the binaural stereo output. Adding the subwoofer output maximizes the immersive effect of the binaural stereo output and produces a dynamic bass reproduction element.

For example, referring to FIG. 18, a single-channel or two-channel (1810) signal included in the subwoofer layer may be processed on an LFE (low frequency effects) bus 1820. That is, the subwoofer output 1830 may be the output produced by processing single-channel or two-channel (1810) audio, and may be a single channel or two channels, as shown in FIG.

For example, the subwoofer layer may have a single channel, as in 5.1-, 7.1-, and 11.1-channel formats, or two channels, as in 10.2- and 22.2-channel formats.

The subwoofer layer may be located separately from the three-dimensional cubic corresponding to the three-dimensional binaural layer and from the plane layer.

Referring to FIG. 19, the subwoofer layer 1940 may be located apart from the three-dimensional cubic 1910 corresponding to the three-dimensional binaural layer, the surround layer 1920, and the proximity stereo layer 1930. The structure shown in FIG. 19 is only one embodiment, and the manner in which the layers are combined is not limited thereto.

A three-dimensional weight may be applied to the three-dimensional layer binaural output and a plane weight to the plane layer audio output, and the two weights may be set independently of each other. That is, by adjusting the level of each layer's output before mixing, a more dramatic binaural stereo output can be generated and the binaural effect maximized.
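The weighted summation performed by the binaural mixer can be sketched as follows. The parameter names (`w3d`, `w_plane`, `w_sub`) are hypothetical, and routing a single-channel subwoofer feed unscaled to both ears is one possible convention, not one stated in the source.

```python
from typing import List, Optional, Sequence, Tuple

Stereo = Tuple[Sequence[float], Sequence[float]]  # (left, right) sample buffers


def binaural_mix(
    cubic_out: Stereo,
    plane_outs: Sequence[Stereo],
    w3d: float = 1.0,
    w_plane: float = 1.0,
    sub_out: Optional[Sequence[Sequence[float]]] = None,
    w_sub: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Sum weighted layer outputs into one binaural stereo pair.

    cubic_out: 3D layer binaural output (left, right).
    plane_outs: plane layer outputs, e.g. surround binaural and proximity stereo.
    sub_out: optional subwoofer output with one or two channels.
    """
    n = len(cubic_out[0])
    if len(cubic_out[1]) != n:
        raise ValueError("all layer buffers must have the same length")
    left = [w3d * s for s in cubic_out[0]]
    right = [w3d * s for s in cubic_out[1]]
    for pl, pr in plane_outs:
        if len(pl) != n or len(pr) != n:
            raise ValueError("all layer buffers must have the same length")
        for i in range(n):
            left[i] += w_plane * pl[i]
            right[i] += w_plane * pr[i]
    if sub_out is not None:
        sl = sub_out[0]
        sr = sub_out[1] if len(sub_out) > 1 else sub_out[0]  # mono LFE feeds both ears
        if len(sl) != n or len(sr) != n:
            raise ValueError("all layer buffers must have the same length")
        for i in range(n):
            left[i] += w_sub * sl[i]
            right[i] += w_sub * sr[i]
    return left, right
```

Because each weight multiplies only its own layer, the three-dimensional and plane weights can be tuned independently, as the text requires.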

In addition, since the present invention supports natural upmix and downmix functions through the processor 320 having the above-described functions, compatibility between content using various kinds of sound formats can be improved. For example, a surround image represented by the three-dimensional cubic can be downmixed into the surround layer, and the surround layer can in turn be downmixed into the proximity stereo layer. Downmixing region by region in this way preserves the sound quality more effectively.
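The source does not give downmix coefficients. As a stand-in, the sketch below uses the widely used ITU-R BS.775 stereo downmix, in which the center and surround channels are attenuated by about 3 dB (factor 1/√2) before being added to the front left and right channels. The channel keys and the function name are assumptions, and this is not presented as the method of the present invention.

```python
import math
from typing import Dict, List, Sequence, Tuple

ATT = 1.0 / math.sqrt(2.0)  # about -3 dB


def downmix_5_to_stereo(ch: Dict[str, Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Fold a 5-channel surround layer (L, R, C, Ls, Rs) into a stereo pair.

    Lo = L + ATT*C + ATT*Ls and Ro = R + ATT*C + ATT*Rs; an LFE channel,
    if present, is not included.
    """
    n = len(ch["L"])
    left = [ch["L"][i] + ATT * ch["C"][i] + ATT * ch["Ls"][i] for i in range(n)]
    right = [ch["R"][i] + ATT * ch["C"][i] + ATT * ch["Rs"][i] for i in range(n)]
    return left, right
```

A center-only signal lands equally in both outputs at about 0.707 of its level, which keeps it centered in the stereo image.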

The memory 330 stores the 3D layer binaural output and the plane layer audio output.

In addition, the memory 330 stores various information generated in the process of generating the binaural stereo audio according to an embodiment of the present invention, as described above.

According to an embodiment, the memory 330 may be configured independently of the binaural stereo audio generating apparatus to support the binaural stereo audio generation function. In this case, the memory 330 may operate as separate mass storage and may include a control function for performing operations.

Meanwhile, the binaural stereo audio generating apparatus may store information in a memory mounted in the apparatus. In one implementation, the memory is a computer-readable medium. In one implementation, the memory may be a volatile memory unit, and in other implementations, it may be a non-volatile memory unit. In one implementation, the storage device is a computer-readable medium. In various implementations, the storage device may include, for example, a hard disk device, an optical disk device, or any other mass storage device.

Such a binaural stereo audio generating apparatus can maximize the binaural effect by mixing various sound elements, and can improve compatibility with various kinds of contents through natural upmix and downmix.

FIG. 20 is a view showing an example of a sound represented by a conventional binaural engine, and FIG. 21 is a view illustrating an example of a sound represented by a binaural engine according to the present invention.

Referring to FIG. 20, binaural mixing with a conventional binaural engine is limited in its ability to express the proximity of a sound. Binaural mixing provides a spatial image of the sound, so apart from controlling the volume of the sound, it offers no way to represent how close the sound is.

Therefore, when mixing with a conventional binaural engine, even if the engineer mixes for the intended sound direction 2010, the result may be rendered along the actual sound direction 2020. In other words, a mix intended to move a sound from front to back, or from back to front, through the reference listening point 2000 is actually rendered as a path that follows the surface of the binaural engine. This may be a limitation of vector base amplitude panning (VBAP) technology.
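The limitation attributed to VBAP can be seen directly in its gain computation: the gains depend only on the direction of the source, not on its distance. The sketch below implements standard pairwise two-dimensional VBAP on a hypothetical ring of four loudspeakers at 45, 135, 225, and 315 degrees (0 degrees is front, 90 degrees is left); the layout and the function name `vbap_2d_gains` are assumptions. A source 0.1 m away and one 5 m away in the same direction receive identical gains, and a path passing beside the listener is rendered only as a sweep along the loudspeaker arc.

```python
import math

# Hypothetical horizontal loudspeaker ring, azimuth in degrees
# (0 = front, 90 = left), listed in order around the circle.
SPEAKER_AZIMUTHS = (45.0, 135.0, 225.0, 315.0)


def _unit(azimuth_deg):
    a = math.radians(azimuth_deg)
    return math.cos(a), math.sin(a)


def vbap_2d_gains(x, y):
    """Pairwise 2-D VBAP gains for a source at (x, y), listener at the origin."""
    distance = math.hypot(x, y)
    if distance == 0.0:
        raise ValueError("a source at the listening point has no direction")
    px, py = x / distance, y / distance  # only the direction survives
    n = len(SPEAKER_AZIMUTHS)
    for i in range(n):
        j = (i + 1) % n
        (ax, ay), (bx, by) = _unit(SPEAKER_AZIMUTHS[i]), _unit(SPEAKER_AZIMUTHS[j])
        det = ax * by - bx * ay
        # Solve gi * a + gj * b = p for the adjacent pair (Cramer's rule).
        gi = (px * by - bx * py) / det
        gj = (ax * py - px * ay) / det
        if gi >= -1e-9 and gj >= -1e-9:  # p lies between speakers i and j
            norm = math.hypot(gi, gj)    # constant-power normalisation
            gains = [0.0] * n
            gains[i], gains[j] = gi / norm, gj / norm
            return gains
    raise RuntimeError("no loudspeaker pair encloses the source direction")


# A path meant to pass 0.3 m to the left of the listener, from front to back:
# the gains only sweep around the ring; nothing encodes the shrinking distance.
for x in (2.0, 1.0, 0.0, -1.0, -2.0):
    print(x, [round(g, 3) for g in vbap_2d_gains(x, 0.3)])
```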

Referring to FIG. 21, in addition to the three-dimensional binaural layer, the binaural engine according to the present invention mixes in the plane layer audio output generated using the surround layer 2110 and the proximity stereo layer 2120. That is, the proximity of a sound, which a conventional binaural engine can control only through the sound's volume, can instead be controlled through the surround layer binaural output and the stereo signal.
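One way to picture proximity control through layers rather than volume is a crossfade from the three-dimensional layer toward the surround layer and then the proximity stereo layer as a source approaches the listener. The description does not specify such a mapping; the proximity parameter `p`, the breakpoint at 0.5, the sine/cosine constant-power law, and the function name `layer_weights_for_proximity` are all hypothetical.

```python
import math


def layer_weights_for_proximity(p):
    """Hypothetical mapping from proximity p in [0, 1] to per-layer gains.

    p = 0.0: distant source, carried only by the three-dimensional layer.
    p = 0.5: carried only by the surround layer.
    p = 1.0: at the listener's head, carried only by the proximity stereo layer.
    Each half is a constant-power (cosine/sine) crossfade.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proximity must lie in [0, 1]")
    if p <= 0.5:
        theta = (p / 0.5) * math.pi / 2
        return {"cubic": math.cos(theta), "surround": math.sin(theta), "stereo": 0.0}
    theta = ((p - 0.5) / 0.5) * math.pi / 2
    return {"cubic": 0.0, "surround": math.cos(theta), "stereo": math.sin(theta)}
```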

Therefore, with the binaural engine according to the present invention, the engineer can obtain an actual sound direction 2130 that matches the intended sound direction 2010 of FIG. 20. That is, by passing the sound through the reference listening point 2100, it is possible to produce a sound that seems to pass through the listener's body.

Referring to FIG. 22, a binaural stereo audio generating method according to an embodiment of the present invention performs three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer to generate a three-dimensional layer binaural output (S2210).

Three-dimensional cubics 610, 620, and 630 having various ranges can be generated, as shown in FIG. 6.
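The vertex positions of such a cubic can be sketched as a function of a size parameter, following the idea of moving the eight dynamic speakers at the vertices according to the size of the three-dimensional binaural layer. The function name `cubic_vertices`, treating the size parameter as the edge length, and the axis convention (x front, y left, z up, origin at the reference listening point) are assumptions for illustration.

```python
from itertools import product


def cubic_vertices(size, center=(0.0, 0.0, 0.0)):
    """Positions of eight virtual speakers at the vertices of a cubic.

    Assumptions: `size` is the edge length, the cube is axis-aligned and
    centred on the reference listening point, x points front, y left, z up.
    The four z > 0 vertices are the up channels and the four z < 0 vertices
    the down channels; the plane z = 0 lies between them.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    h = size / 2.0
    cx, cy, cz = center
    up, down = [], []
    for sz, sx, sy in product((1, -1), repeat=3):
        vertex = (cx + sx * h, cy + sy * h, cz + sz * h)
        (up if sz > 0 else down).append(vertex)
    return up, down
```

Calling it with different sizes yields nested cubics of different ranges around the same listening point.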

Here, the head tracking information is data obtained by tracking the head movement of the user or listener, and may be input through a separate head tracking module or through a user interface.

As another example, the head tracking information may be assigned artificially by the user or listener. That is, regardless of whether head tracking information is being received from the head tracking module, the user or listener may enter head tracking information through the user interface to rotate the spatial image artificially. The user or listener may enter and modify this information while listening to the mixing process that generates the binaural stereo output, or to the binaural stereo output as it changes according to the entered information.
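Rotating the spatial image according to head tracking information amounts to applying a rotation to the cubic. The sketch below builds a rotation from pan, tilt, and roll angles and applies it to a list of points such as the cubic's vertices. The composition order (pan about the vertical z axis, tilt about y, roll about x, composed as Rz·Ry·Rx), the axis convention, and the function names are assumptions; whether a tracker's angles or their negatives should be applied to keep the image stable depends on the tracker's convention.

```python
import math


def _matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(pan_deg=0.0, tilt_deg=0.0, roll_deg=0.0):
    """R = Rz(pan) @ Ry(tilt) @ Rx(roll); angles follow the right-hand rule."""
    p, t, r = (math.radians(a) for a in (pan_deg, tilt_deg, roll_deg))
    cz, sz = math.cos(p), math.sin(p)
    cy, sy = math.cos(t), math.sin(t)
    cx, sx = math.cos(r), math.sin(r)
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    return _matmul(rz, _matmul(ry, rx))


def rotate(points, matrix):
    """Apply a 3x3 rotation to each (x, y, z) point."""
    return [tuple(sum(matrix[i][k] * v[k] for k in range(3)) for i in range(3)) for v in points]
```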

In addition, in the binaural stereo audio generation method according to an embodiment of the present invention, audio processing corresponding to a plane layer is performed to generate a plane layer audio output (S2220).

For example, referring to FIG. 12, audio processing corresponding to the proximity stereo layer may be performed on the two-channel signal 1210 using a stereo bus 1220. That is, the stereo signal 1230, which corresponds to the plane layer audio output, is produced by processing the two-channel stereo audio 1210 and may be output as two channels.

Referring to FIG. 17, by arranging the speakers …

In addition, the binaural stereo audio generation method according to an embodiment of the present invention combines the three-dimensional layer binaural output and the plane layer audio output to generate a binaural stereo output (S2230). That is, by mixing the immersive element provided by the three-dimensional layer binaural output with the near-field playback element and the object element provided by the plane layer audio output, a binaural stereo output that produces a binaural effect can be generated.

Further, because the present invention can support natural upmix and downmix based on the functions described above, compatibility between contents in various sound formats can be improved. For example, a surround image represented by the three-dimensional cubic can be downmixed into the surround layer, and the surround layer can in turn be downmixed into the proximity stereo layer. Downmixing layer by layer in this way, according to the region each layer covers, preserves the sound quality more effectively.

Although not shown in FIG. 22, the method for generating binaural stereo audio according to an embodiment of the present invention may transmit and receive the information needed to generate binaural stereo audio through a communication network. In particular, it may receive head tracking information, information related to user input, or content to which the binaural effect is to be applied, and may provide the binaural stereo output according to an embodiment of the present invention.

Although not shown in FIG. 22, the method of generating binaural stereo audio according to an embodiment of the present invention may also store the various information generated in the process of generating the binaural stereo audio.

Embodiments of the invention may thus be implemented as a computer-implemented method or as a non-volatile computer-readable medium on which computer-executable instructions are recorded. When the computer-readable instructions are executed by a processor, they may perform at least one aspect of the invention.

As described above, the method and apparatus for generating binaural stereo audio according to the present invention are not limited to the above-described embodiments, and various modifications may be made to the embodiments. All or some of the embodiments may be selectively combined.

The present invention relates to a method and apparatus for generating binaural stereo audio. It can generate binaural stereo audio that maximizes the binaural effect by mixing various sound elements, provides a binaural engine in which the sound elements used to generate the sound can be easily adjusted, and can improve compatibility with various kinds of contents through natural upmix and downmix, thereby contributing to the development of the industry.

Claims

Performing a three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer to generate a three-dimensional layer binaural output;

Performing audio processing corresponding to a planar layer to produce a planar layer audio output; and

Generating a binaural stereo output by summing the three-dimensional layer binaural output and the planar layer audio output,

A method for generating binaural stereo audio, comprising the above steps.
The method according to claim 1,

Wherein the planar layer includes:

A surround layer that performs surround layer binaural encoding to generate a surround layer binaural output and provides the generated surround layer binaural output as the planar layer audio output; and

A proximity stereo layer that receives a stereo signal and generates the planar layer audio output corresponding to the stereo signal.
The method of claim 2,

Wherein the three-dimensional layer binaural output is generated based on a three-dimensional binaural point on an 8-channel-based three-dimensional cubic consisting of 4 up channels and 4 down channels.
The method of claim 2,

Wherein the step of generating the binaural stereo output includes applying a three-dimensional weight to the three-dimensional layer binaural output and applying a plane weight to the planar layer audio output, and wherein the three-dimensional weight and the plane weight are set independently of each other.
The method according to claim 1,

Wherein, in the step of generating the binaural stereo output, the binaural stereo output is generated by summing a sub-woofer output corresponding to a sub-woofer layer together with the three-dimensional layer binaural output and the planar layer audio output.
The method of claim 3,

Wherein the three-dimensional cubic is generated by changing the positions of eight dynamic speakers corresponding to the vertices of the three-dimensional cubic according to a size parameter of the three-dimensional binaural layer.
The method of claim 3,

Wherein the three-dimensional vector is generated based on a reference listening point that is included in the three-dimensional cubic and corresponds to the center of a two-dimensional plane corresponding to the surround layer.
The method of claim 3,

Wherein generating the three-dimensional layer binaural output comprises:

Generating the three-dimensional layer binaural output by applying the direction information of the three-dimensional vector to the three-dimensional cubic rotated according to head tracking information, wherein the head tracking information corresponds to at least one of a tracking input based on a head tracking module and a user input based on a user interface.
The method of claim 8,

Wherein the three-dimensional cubic is rotated according to a rotation parameter of at least one of pan, tilt, and roll.
The method of claim 3,

Wherein the planar layer is located between the four up channels and the four down channels.
A processor that performs three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer to generate a three-dimensional layer binaural output, performs audio processing corresponding to a planar layer to generate a planar layer audio output, and sums the three-dimensional layer binaural output and the planar layer audio output to generate a binaural stereo output; and

A memory that stores the three-dimensional layer binaural output and the planar layer audio output,

A binaural stereo audio generating apparatus, comprising the above elements.
The apparatus of claim 11,

Wherein the planar layer includes:

A surround layer that performs surround layer binaural encoding to generate a surround layer binaural output and provides the generated surround layer binaural output as the planar layer audio output; and

A proximity stereo layer that receives a stereo signal and generates the planar layer audio output corresponding to the stereo signal.
The apparatus of claim 12,

Wherein the three-dimensional layer binaural output is generated based on a three-dimensional binaural point on an 8-channel-based three-dimensional cubic consisting of 4 up channels and 4 down channels.
The apparatus of claim 12,

Wherein the processor applies a three-dimensional weight to the three-dimensional layer binaural output and applies a plane weight to the planar layer audio output, and wherein the three-dimensional weight and the plane weight are set independently of each other.
The apparatus of claim 11,

Wherein the processor generates the binaural stereo output by summing a sub-woofer output corresponding to a sub-woofer layer together with the three-dimensional layer binaural output and the planar layer audio output.
The apparatus of claim 13,

Wherein the three-dimensional cubic is generated by changing the positions of eight dynamic speakers corresponding to the vertices of the three-dimensional cubic according to a size parameter for the three-dimensional binaural layer.
The apparatus of claim 13,

Wherein the three-dimensional vector is generated based on a reference listening point that is included in the three-dimensional cubic and corresponds to the center of a two-dimensional plane corresponding to the surround layer.
The apparatus of claim 13,

Wherein the processor generates the three-dimensional layer binaural output by applying the direction information of the three-dimensional vector to the three-dimensional cubic rotated according to head tracking information, and wherein the head tracking information corresponds to at least one of a tracking input based on a head tracking module and a user input based on a user interface.
The apparatus of claim 18,

Wherein the three-dimensional cubic is rotated according to a rotation parameter of at least one of pan, tilt, and roll.
The apparatus of claim 13,

Wherein the planar layer is located between the four up channels and the four down channels.