WO2015147433A1

WO2015147433A1 - Apparatus and method for processing audio signal

Info

Publication number: WO2015147433A1
Application number: PCT/KR2015/000452
Authority: WO
Inventors: 오현오; 곽진삼; 손주형
Original assignee: 인텔렉추얼디스커버리 주식회사
Priority date: 2014-03-25
Filing date: 2015-01-15
Publication date: 2015-10-01

Abstract

An apparatus for processing an audio signal according to the present invention comprises: a speaker information input unit for receiving information regarding speakers a user can use; a reception unit for receiving an audio bit stream signal comprising a channel signal and/or an object signal; a decoding unit for decoding the channel signal or the object signal included in the audio bit stream signal; an object discernment unit for discerning whether an object corresponding to the object signal is located within a usable speaker region; a rendering unit comprising a channel renderer and an object renderer for rendering the decoded channel signal and the decoded object signal, respectively, and a rendering configuration unit for configuring a rendering method on the basis of the result of the discernment; and a compositing unit for compositing the rendered channel signal and the rendered object signal.

Description

Audio signal processing apparatus and method

The present invention relates to an audio signal processing apparatus and method.

3D audio refers collectively to the signal processing, transmission, encoding, and playback technologies that provide realistic sound in three-dimensional space by adding an axis in the height direction to the horizontal (2D) sound scene provided by conventional surround audio. In particular, providing 3D audio requires a rendering technique that forms a sound image at a virtual position where no speaker exists, whether a larger or a smaller number of speakers is used.

3D audio is expected to be an audio solution for future Ultra High Definition Television (UHDTV) applications, and is expected to be applied in various fields, including sound in vehicles that are evolving into high-quality infotainment spaces, as well as theater sound, personal 3DTV, tablets, smartphones, and cloud games.

Meanwhile, a sound source provided to 3D audio may consist only of channel-based signals or only of object-based signals. In addition, there may be a sound source in which channel-based signals and object-based signals are mixed, which can provide a user with a new listening experience.

In this case, MPEG-H 3D audio, which processes channel-based signals and object-based signals, has various problems caused by the performance difference between the channel renderer and the object renderer: the sound scene is not played as intended, and the sound source is distorted. Therefore, there is a need to solve the problems caused by the performance difference between the channel renderer and the object renderer.

In addition, in 3D audio reproduction, there may be exception channels that are difficult to reproduce with existing reproduction schemes, depending on the characteristics of the channel and the speaker environment at the reproduction stage. Likewise, an object located outside the speaker environment at the reproduction stage may be difficult to reproduce. Accordingly, there is a need for a technique for effectively reproducing an exception channel based on the speaker environment at the reproduction stage, and using a sound bar is one example.

The sound bar is an advantageous method for playing an exception channel, but has a disadvantage in that sound quality is degraded when playing a basic channel signal. Accordingly, it may be more preferable to use an audio reproducing apparatus having a structure in which a separate speaker reproducing apparatus such as a sound bar and a basic speaker apparatus are merged. Therefore, there is a need for an MPEG-H decoding method suitable for such a usage environment.

In this regard, Korean Patent Laid-Open Publication No. 2011-0002504 (title of the invention: improved coding and parameter representation of multi-channel downmixed object coding) discloses a technique for generating downmix information by distributing a plurality of audio objects to at least two downmix channels, and for generating object parameters so as to produce an encoded audio object signal.

SUMMARY OF THE INVENTION The present invention has been made to solve the above-mentioned problems of the prior art. Some embodiments of the present invention provide an audio signal processing apparatus and method that, when an exception object signal corresponding to an exception object exists, can process the exception object signal by rendering it, synthesizing the rendered exception object signal with a channel signal, and rendering the synthesized channel signal.

In addition, some embodiments of the present invention provide an audio signal processing apparatus and method that can render an input audio bit string signal in an internal renderer and an external renderer, respectively, and play the results simultaneously through a general loudspeaker and a separate playback device such as headphones or a sound bar.

As a technical means for achieving the above technical object, an audio signal processing apparatus according to a first aspect of the present invention includes: a speaker information input unit for receiving information on the speakers usable by a user; a receiver for receiving an audio bit string signal including a channel signal and/or an object signal; a decoder for decoding the channel signal or the object signal included in the audio bit string signal; an object discriminating unit for determining whether an object corresponding to the object signal is located in the usable speaker area; a renderer including a channel renderer and an object renderer for rendering the decoded channel signal and the decoded object signal, respectively, and a rendering setting unit configured to set a rendering method based on the determination result; and a synthesis unit for synthesizing the rendered channel signal and the rendered object signal.

In addition, an audio signal processing method in an audio signal processing apparatus according to a second aspect of the present invention includes: decoding a channel signal or an object signal from a received audio bit string; rendering the decoded channel signal or object signal; and synthesizing the rendered channel signal and the rendered object signal. In this case, the rendering selectively performs one of a first method, in which the decoded channel signal and the decoded object signal are each rendered and the rendered channel signal and the rendered object signal are then synthesized, and a second method, in which the rendered object signal is synthesized with the channel signal and the synthesized channel signal is rendered.

Also, an audio signal processing apparatus according to a third aspect of the present invention includes: an internal renderer and an external renderer for rendering a decoded channel signal or a decoded object signal; a distribution unit for distributing the decoded channel signal or object signal to the internal renderer and the external renderer; and a reproduction unit for reproducing the channel signal or the object signal rendered by the internal renderer and the external renderer, respectively. In this case, the channel signals or object signals rendered through the internal renderer and the external renderer are reproduced through separate playback units.

In addition, an audio signal processing method in an audio signal processing apparatus according to a fourth aspect of the present invention includes: distributing at least one of a decoded channel signal and a decoded object signal to an internal renderer and an external renderer, respectively; rendering the channel signals or object signals distributed to the internal renderer and the external renderer, respectively; and reproducing the rendered channel signals or object signals. In this case, in the distributing step, when the decoded channel signal or object signal is outside the usable speaker area, the decoded channel signal or object signal is distributed to the external renderer.

According to the above-described means for solving the problem, when the speaker corresponding to an exception channel is absent at the playback stage, the exception channel can be effectively reproduced using other speakers.

In addition, by synthesizing the object signal with the channel signal and rendering the result through the channel renderer, it is possible to prevent the distortion of the sound source caused by the performance difference between the object renderer and the channel renderer.

FIG. 1 is a view for explaining a viewing angle according to an image size at the same viewing distance.

FIG. 2 is a layout diagram of a 22.2 channel speaker as an example of a multi-channel audio environment.

FIG. 3 is a conceptual diagram illustrating positions of sound objects constituting a three-dimensional sound scene in a listening space.

FIG. 4 is a diagram illustrating the overall structure of a 3D audio decoder and a renderer including a channel or an object renderer.

FIG. 5 is a diagram in which 5.1 channels are arranged at positions according to the ITU-R Recommendation and at arbitrary positions.

FIG. 6 is a diagram illustrating a coupled structure in which an object signal decoder and a flexible speaker renderer are combined.

FIG. 7 is a block diagram of an audio signal processing apparatus according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating a process of rendering a channel signal or an object signal in an audio signal processing apparatus according to an embodiment of the present invention.

FIG. 9 is a flowchart of an audio signal processing method according to an embodiment of the present invention.

FIG. 10 is a block diagram of an audio signal processing apparatus according to another embodiment of the present invention.

FIG. 11 is a flowchart of a method of reproducing an audio signal according to another embodiment of the present invention.

FIG. 12 is a diagram illustrating an example of a device in which an audio signal processing method according to the present invention is implemented.

DETAILED DESCRIPTION Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the present invention. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. In the drawings, parts irrelevant to the description are omitted in order to clearly describe the present invention, and like reference numerals designate like parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. In addition, when a part is said to "include" a certain component, this means that it may further include other components, rather than excluding them, unless otherwise stated. As used throughout this specification, the term "step to" or "step of" does not mean "step for."

First, an environment for implementing an audio signal processing apparatus and an audio signal processing method according to the present invention will be described with reference to FIGS. 1 to 6.

FIG. 1 illustrates a viewing angle according to an image size (e.g., UHDTV and HDTV) at the same viewing distance.

As display production technology has recently been developed, the size of display images, such as UHDTVs, is becoming larger in accordance with consumer demand. As shown in FIG. 1, the UHDTV (7680 * 4320 pixel image, 110) is an image about 16 times larger than the HDTV (1920 * 1080 pixel image, 120). When the HDTV 120 is installed on the living room wall and the viewer sits on the living room couch with a certain viewing distance, the viewing angle may be about 30 degrees. When the UHDTV 110 is installed at the same viewing distance, the viewing angle reaches about 100 degrees.
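The figures quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original disclosure: it assumes a flat screen, a centered viewer, and equal pixel pitch on both displays, so that the UHDTV image is four times as wide as the HDTV image. The function names are hypothetical. Under these assumptions the geometric estimate comes out somewhat below 100 degrees, which is in line with the "about 100 degrees" stated in the text.

```python
import math


def pixel_count_ratio(w1, h1, w2, h2):
    """Ratio of total pixel counts between two image formats."""
    return (w1 * h1) / (w2 * h2)


def widened_viewing_angle(base_angle_deg, width_ratio):
    """Horizontal viewing angle after scaling the screen width by
    width_ratio at the same viewing distance (flat screen, centered viewer)."""
    half = math.radians(base_angle_deg / 2.0)
    return 2.0 * math.degrees(math.atan(width_ratio * math.tan(half)))


# UHDTV (7680 x 4320) versus HDTV (1920 x 1080)
ratio = pixel_count_ratio(7680, 4320, 1920, 1080)
# Assumption: HDTV subtends about 30 degrees, UHDTV is 4x wider
angle = widened_viewing_angle(30.0, 7680 / 1920)
print(ratio, round(angle, 1))
```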

When such a high-quality, high-resolution large screen is installed, it is desirable to provide sound with a high sense of presence and immersion befitting the large-screen content. One or two surround channel speakers may not be enough to give the viewer the feeling of almost being at the scene. Thus, there is a need for a multichannel audio environment with more speakers and channels.

As described above, a multichannel audio environment is required not only in home theater environments but also for personal 3DTV, smartphone TV, 22.2-channel audio programs, automobiles, 3D video, telepresence rooms, and cloud-based games.

The 22.2-channel layout is one example of a multichannel audio environment for enhancing the sound field, and the present invention is not limited to a specific number of channels or a specific speaker layout. Referring to FIG. 2, the top layer 210 has three speakers at the front, three in the middle, and three at the surround positions, for a total of nine channels. The middle layer 220 has five speakers at the front, two at the middle positions, and three at the surround positions, for a total of 10 speakers. In the bottom layer 230, three channels are disposed at the front, and two LFE channels 240 are provided.
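The per-layer counts described for FIG. 2 can be tabulated and summed as a consistency check. The following is an illustrative sketch only; the dictionary keys and the function name are hypothetical labels chosen here, not identifiers from any standard.

```python
# Speaker counts per layer and position, as described for FIG. 2.
LAYOUT_22_2 = {
    "top": {"front": 3, "middle": 3, "surround": 3},      # top layer 210
    "middle": {"front": 5, "middle": 2, "surround": 3},   # middle layer 220
    "bottom": {"front": 3},                               # bottom layer 230
}
LFE_CHANNELS = 2  # LFE channels 240


def count_main_channels(layout):
    """Sum the full-range channels over all layers and positions."""
    return sum(sum(positions.values()) for positions in layout.values())


main = count_main_channels(LAYOUT_22_2)
print(f"{main}.{LFE_CHANNELS}")  # main channels, then LFE channels
```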

As such, a large amount of computation is required to transmit and reproduce multichannel signals of up to several dozen channels, and a high compression ratio may be required in view of the communication environment. In addition, it is extremely rare for an ordinary home to have a multichannel speaker environment such as 22.2 channels, and many listeners have 2-channel or 5.1-channel setups. When a multichannel signal is delivered in such a case, it must be converted and reproduced so as to correspond to 2 or 5.1 channels. This not only makes communication inefficient but also requires the 22.2-channel PCM signals to be stored, which may make memory management inefficient.

In the listening space 300 where the listener 320 listens to 3D audio, the sound objects 310 constituting the three-dimensional sound scene may be distributed at various positions in the form of point sources, as shown in FIG. 3.

Meanwhile, in FIG. 3, each object is shown as a point source 310 for convenience of illustration, but in addition to point sources there may be sound sources in the form of plane waves, or ambient sound sources, which are sounds spread in all directions that convey the sense of space of the sound scene.

The decoder system illustrated in FIG. 4 may be broadly divided into a 3D audio decoder 400 and a 3D audio renderer 450.

The 3D audio decoder 400 may include an individual object decoder 410, an individual channel decoder 420, a SAOC transcoder 430, and an MPS decoder 440.

The individual object decoder 410 receives an object signal, and the individual channel decoder 420 receives a channel signal. In this case, the audio bit string may include only an object signal, only a channel signal, or both an object signal and a channel signal.

In addition, the 3D audio decoder 400 may receive a signal in which an object signal or a channel signal is waveform encoded or parametric encoded, through the SAOC transcoder 430 and the MPS decoder 440, respectively.

The 3D audio renderer 450 may include a 3DA renderer 460, and may render a channel signal, an object signal, or a parametric coded signal through the 3DA renderer 460.

The 3D audio renderer 450 receives an object signal, a channel signal, or a combination of these signals output from the 3D audio decoder 400, and outputs sound suited to the speaker environment of the listening space where the listener is located. In this case, the weights of the 3D audio decoder 400 and the 3D audio renderer 450 may be set based on the number and position information of the speakers in the listening space where the listener is located.

On the other hand, one of the technologies required for 3D audio is flexible rendering, which is one of the important tasks to be solved in order to maximize the quality of 3D audio. Reasons for flexible rendering techniques include:

It is well known that the placement of 5.1-channel speakers is very irregular, depending on the structure of the living room and the layout of the furniture. Even when the speakers are at such irregular positions, the system should be able to provide the sound scene intended by the content creator. To this end, it is necessary to know the speaker environment, which differs for each user's playback environment, and a rendering technique is required for correcting the difference between the actual positions and the positions given in the specification. That is, the role of the codec does not end with decoding the transmitted bit string according to the decoding method; a series of techniques is also required for optimizing and transforming the decoded result to suit the user's playback environment.

As shown in FIG. 5, the speakers 520 placed in an actual living room differ in both direction angle and distance from the speakers 510 placed according to the ITU-R Recommendation. That is, because the height and direction of the speakers differ from those of the speakers 510 according to the Recommendation, it is difficult to provide an ideal 3D sound scene when the original signal is reproduced as it is at the changed speaker 520 positions.

In this situation, amplitude panning, which determines the direction of a sound source between two speakers based on the magnitudes of the signals, or Vector-Based Amplitude Panning (VBAP), which is widely used to determine the direction of a sound source using three speakers in three-dimensional space, enables flexible rendering of object signals transmitted for each object. Therefore, by transmitting object signals instead of channel signals, a 3D sound scene can easily be provided even in an environment where the speaker arrangement differs.
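To make the panning idea concrete, the following is a minimal sketch of the two-speaker (horizontal-plane) vector-base formulation. It is an illustration under stated assumptions, not the renderer of the present invention: speakers and source are given by azimuth only, gains are normalized to constant power, and the function names are hypothetical. The three-speaker case in 3D space follows the same pattern with a 3x3 system instead of a 2x2 one.

```python
import math


def unit_vector(azimuth_deg):
    """Unit vector in the horizontal plane for a given azimuth."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))


def pan_between_two_speakers(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Solve p = g1*l1 + g2*l2 for the gains (Cramer's rule), then
    normalize so that g1^2 + g2^2 = 1 (constant power)."""
    p = unit_vector(source_az_deg)
    l1 = unit_vector(spk1_az_deg)
    l2 = unit_vector(spk2_az_deg)
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# A source midway between speakers at +30 and -30 degrees gets equal gains.
print(pan_between_two_speakers(0.0, 30.0, -30.0))
```

A negative gain from this function indicates that the source direction lies outside the arc spanned by the two speakers, in which case a different speaker pair would normally be chosen.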

As described with reference to FIG. 5, when an object signal is used, an object may be positioned as a sound source according to a desired sound scene. A first embodiment 600 and a second embodiment 601, in which an object signal decoder and a flexible renderer are combined to reflect these advantages, will now be described with reference to FIG. 6.

In the first embodiment 600, in which an object signal decoder and a flexible speaker renderer are combined, a mixer 620 receives an object signal from an object decoder 610, receives position information represented by a mixing matrix, and outputs the result in the form of a channel signal. That is, the positional information on the sound scene is expressed as relative information from the speakers corresponding to the output channels.
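The mixing step of the first embodiment can be pictured as a matrix applied to the decoded object signals. The sketch below is illustrative only and assumes that the mixing matrix is a plain table of real gains, with one row per output channel and one column per object; the function name and data layout are hypothetical.

```python
def mix_objects_to_channels(mixing_matrix, object_signals):
    """Apply a mixing matrix to object signals.

    mixing_matrix[c][o] is the gain from object o to output channel c.
    object_signals[o] is the list of samples of object o.
    Returns one list of samples per output channel.
    """
    num_samples = len(object_signals[0]) if object_signals else 0
    channels = []
    for row in mixing_matrix:
        if len(row) != len(object_signals):
            raise ValueError("one gain per object is required in every row")
        channel = [0.0] * num_samples
        for gain, signal in zip(row, object_signals):
            for n, sample in enumerate(signal):
                channel[n] += gain * sample
        channels.append(channel)
    return channels


# Two objects mixed to two channels: object 0 goes only to channel 0,
# while channel 1 receives an equal blend of both objects.
out = mix_objects_to_channels([[1.0, 0.0], [0.5, 0.5]],
                              [[1.0, 2.0], [3.0, 4.0]])
print(out)
```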

The output channel signal is flexibly rendered through the flexible speaker renderer 630 and output. At this time, if the actual number and positions of the speakers differ from the predetermined ones, the flexible speaker renderer 630 may receive the position information of the speakers and perform flexible rendering accordingly.

In contrast, in the second embodiment 601, when the object decoder 640 receives the audio bit string signal and decodes the object signal, the flexible speaker mixer 650 receives the decoded object signal and performs flexible rendering. At this time, the matrix updater 660 passes to the flexible speaker mixer 650 a matrix that reflects both the mixing matrix and the position information of the speakers, so that it is taken into account when the flexible rendering is performed.

Rendering the channel signal back to another type of channel signal like the first embodiment 600 is more difficult to implement than rendering the object directly to the final channel as in the second embodiment 601. This will be described in detail below.

When a channel signal is transmitted as an input and the speaker position corresponding to a channel is changed to an arbitrary position, the channel signal cannot be rendered with the same panning technique that is used for objects, so a separate channel mapping process is required. In addition, since the processes and solutions required for rendering object signals and channel signals differ, when an object signal and a channel signal are transmitted at the same time to create a sound scene in which the two are mixed, distortion due to spatial mismatch is likely to occur.

To solve this problem, flexible rendering is not performed separately on the object; instead, the object is first mixed into the channel signal, and flexible rendering is then performed on the channel signal. Rendering using an HRTF (Head Related Transfer Function) is preferably implemented in the same manner.

Hereinafter, an audio signal processing apparatus and method according to the present invention will be described in detail with reference to FIGS. 7 to 9.

FIG. 7 is a block diagram of an audio signal processing apparatus 700 according to an embodiment of the present invention.

The audio signal processing apparatus 700 according to the present invention includes a speaker information input unit 710, an audio signal receiver 720, a decoder 730, an object determining unit 740, a renderer 750, and a synthesizer 760.

The speaker information input unit 710 receives information on the speakers usable by the user.

The audio signal receiver 720 receives an audio bit string signal including a channel signal and/or an object signal. That is, the audio bit string may include only the channel signal, only the object signal, or both a channel signal and an object signal.

The decoder 730 decodes a channel signal or an object signal included in the audio bit string. In this case, the decoder 730 may decode metadata regarding the object signal. Meanwhile, the channel signal may be decoded by a core codec such as Unified Speech and Audio Coding (USAC). The object signal may be decoded by a core codec such as USAC or may be a parametric object signal decoded by a parametric codec such as SAOC (Spatial Audio Object Coding).

The object determining unit 740 determines whether an object corresponding to the object signal is located within the available speaker area. That is, the object determining unit 740 determines whether the object to be rendered is located in the speaker area based on the available speaker information received from the speaker information input unit 710. In this case, the rendering setting unit 755 to be described below sets the rendering method according to whether the object is located in the speaker area.
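The text does not specify how the usable speaker area is computed, so the following is only a sketch under a loudly stated assumption: the area is approximated by the range of elevations covered by the usable speakers, and an object whose elevation falls outside that range is treated as an exception object. The function name, the parameters, and the tolerance are all hypothetical.

```python
def is_within_speaker_area(object_elevation_deg, speaker_elevations_deg,
                           tolerance_deg=0.0):
    """Return True if the object lies inside the usable speaker area.

    ASSUMPTION (not from the source): the area is approximated by the
    elevation range spanned by the usable speakers.
    """
    if not speaker_elevations_deg:
        return False  # no usable speakers, so nothing is inside the area
    lowest = min(speaker_elevations_deg) - tolerance_deg
    highest = max(speaker_elevations_deg) + tolerance_deg
    return lowest <= object_elevation_deg <= highest


# A horizontal-only layout (all speakers at 0 degrees elevation):
horizontal_layout = [0.0, 0.0, 0.0, 0.0, 0.0]
print(is_within_speaker_area(0.0, horizontal_layout))   # object at ear height
print(is_within_speaker_area(90.0, horizontal_layout))  # overhead object
```

An object for which this check fails would be handed to the exception-object path by the rendering setting unit.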

The renderer 750 includes a channel renderer 751 that renders the decoded channel signal, an object renderer 753 that renders the decoded object signal, and a rendering setting unit 755 that sets a rendering method based on the result determined by the object determining unit 740 as to whether the object is an exception object.

In this case, when only the channel signal is included in the audio bit string signal, the renderer 750 renders the channel signal through the channel renderer 751 and transmits the rendered channel signal to the synthesizer 760. Accordingly, the synthesizer 760 outputs the rendered channel signal. The channel renderer 751 may be a format converter and may further include a spectral EQ.

In contrast, when the audio bit string signal includes only the object signal, the renderer 750 renders the object signal through the object renderer 753 and transmits the rendered object signal to the synthesizer 760. Accordingly, the synthesizer 760 outputs the rendered object signal. In this case, the object renderer 753 may render through a virtual VBAP (Vector Based Amplitude Panning) method.

Hereinafter, a rendering method for a case in which both the channel signal and the object signal are included in the received audio bit string signal will be described with reference to FIG. 8.

FIG. 8 is a diagram illustrating a process of rendering a channel signal or an object signal in the audio signal processing apparatus 700 according to an embodiment of the present invention.

When the audio bit string signal received by the receiver 720 includes both a channel signal and an object signal, the rendering setting unit 755 may set the rendering method based on the result, determined by the object determining unit 740, of whether the object is located within the usable speaker area or is an exception object.

First, when the object determining unit 740 determines that the object is an exception object located outside the usable speaker area, the rendering setting unit 755 causes the object renderer 753 to render the object signal and pass the rendered object signal to the channel renderer 751. The channel renderer 751 may synthesize the received rendered object signal with the channel signal and render the synthesized channel signal.

Meanwhile, for an exception object located outside the usable speaker area, playback using only the existing speakers cannot reproduce the sound as the content creator intended. Therefore, when an exception object exists, a virtual speaker corresponding to the location of the exception object may be generated, and rendering may be performed based on the usable speaker information and the virtual speaker.

For example, referring to FIG. 2, suppose that the speaker located at the center of the top layer 210 is absent in a 22.2-channel environment and that a signal is received for a sound such as VoG (Voice of God), which is meant to be played at the speaker located at the center of the top layer 210. In this case, the object signal corresponding to the VoG may be rendered to the pre-installed speakers of the top layer 210, and the exception object signal may be processed by downmixing the rendered object signal to the middle layer 220.

In addition, when not only the speaker at the center of the top layer 210 but also some of the speakers at the front or surround positions are absent, the exception object may be handled by creating virtual speakers at the positions of the absent front or surround speakers. That is, the exception object signal is rendered to the virtual speakers and the pre-installed speakers of the top layer 210, and the exception object may be processed by downmixing the rendered signals to the speakers of the middle layer 220 located on the same vertical lines as those virtual and pre-installed speakers of the top layer 210.

In addition, the exception object may be rendered by the VBAP rendering method between the virtual speaker and the pre-installed speakers. As such, when an exception object exists, it may be rendered using a virtual speaker; the rendering method applied is not limited to the above examples, and various methods may be used.
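The downmix from virtual top-layer speakers to the middle layer can be sketched as follows. This is an illustration under assumptions that are not in the source: only the signals of the virtual speakers are folded down, each to the middle-layer speaker on the same vertical line, with a single scalar gain, and all signals have the same length. The speaker labels and the function name are hypothetical.

```python
def downmix_virtual_top_to_middle(top_signals, middle_signals,
                                  vertical_pairs, virtual_top, gain=1.0):
    """Fold the signals of virtual top-layer speakers into the middle layer.

    top_signals / middle_signals: dict mapping speaker label -> samples.
    vertical_pairs: dict mapping a top-layer label to the middle-layer
        label on the same vertical line.
    virtual_top: set of top-layer labels that have no physical speaker.
    Returns (top_out, middle_out) without modifying the inputs.
    """
    top_out = {label: list(sig) for label, sig in top_signals.items()
               if label not in virtual_top}
    middle_out = {label: list(sig) for label, sig in middle_signals.items()}
    for label in virtual_top:
        target = vertical_pairs[label]
        for n, sample in enumerate(top_signals[label]):
            middle_out[target][n] += gain * sample
    return top_out, middle_out


top = {"top_front_left": [1.0, 1.0], "top_front_right": [2.0, 2.0]}
middle = {"front_left": [0.5, 0.5], "front_right": [0.0, 0.0]}
pairs = {"top_front_left": "front_left", "top_front_right": "front_right"}
print(downmix_virtual_top_to_middle(top, middle, pairs, {"top_front_left"}))
```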

Referring back to FIG. 8, when the object determining unit 740 determines that the object is located within the usable speaker area, the rendering setting unit 755 may select either a first method or a second method for rendering. In the first method, the channel renderer 751 renders the channel signal, the object renderer 753 renders the object signal, and the rendered channel signal and the rendered object signal are then passed to the synthesizer 760 and synthesized.

In the second method, the rendered object signal is synthesized with the channel signal, and the channel renderer 751 renders the synthesized channel signal. That is, when the object is located within the usable speaker area, it may be rendered not only by the first method but also by the rendering method that is applied when the object is determined to be an exception object.

Referring back to FIG. 7, the synthesizer 760 synthesizes the rendered channel signal and the rendered object signal. That is, the synthesizer 760 synthesizes both the rendered channel signal and the rendered object signal, and outputs the synthesized signal. In contrast, when only the channel signal or only the object signal is present, the channel signal or the object signal is output without any synthesis.

Meanwhile, when the object included in the audio bit string is a parametric object signal decoded by a parametric codec, the object may be processed by a method different from that used when an individual object signal is included in the audio bit string. That is, in the case of the parametric object signal, the object parameters are applied to the parametric downmix channel signal, which is decoded according to the input target rendering matrix. This presupposes that the output signal is a channel signal that can be mapped directly to the target flexible rendering channels. That is, when the output channels of the rendering matrix required in the parametric decoding process correspond to the flexible rendering channels, rendering may be performed directly in the target channel format, as in the case of the individual object signal. If there is a risk of mismatch when the spatial resolution of the resulting parametric object is compared with the output of the channel renderer, a rendering matrix whose output can be synthesized with the channel signal may be used first, and the result may then be rendered by applying the channel renderer 751.

Hereinafter, with reference to FIG. 9, the steps performed by each component of the audio signal processing apparatus 700 described with reference to FIGS. 7 to 8 will be described.

In the audio signal processing method in the audio signal processing apparatus 700 according to the present invention, usable speaker information of a user may be received, and an audio bit string signal including at least one of a channel signal and an object signal may also be received. That is, the audio bit string may include only the channel signal, only the object signal, or both the channel signal and the object signal.

When the audio bit string signal is received as described above, the audio signal processing method according to the present invention decodes the channel signal or the object signal from the received audio bit string signal (S110). In this case, the channel signal may be decoded by a core codec such as USAC. The object signal may be decoded with a core codec such as USAC and may also be a parametric object signal decoded with a parametric codec such as SAOC.

Next, the decoded channel signal or object signal is rendered (S120). In this case, the rendering selectively performs one of a first method, in which the decoded channel signal and the decoded object signal are each rendered and the rendered channel signal and the rendered object signal are then synthesized, and a second method, in which the rendered object signal is synthesized with the channel signal and the synthesized channel signal is rendered.

In order to select between the first method and the second method, the audio signal processing method according to the present invention may further include determining whether an object corresponding to the object signal is located within the usable speaker area. That is, it is determined whether the object is an exception object outside the speaker area, and the rendering is performed in different ways depending on the result. This will be described in detail below.

First, when determined as an exception object, the object renderer renders an object signal, and passes the rendered object signal to the channel renderer. The channel renderer may synthesize the rendered object signal and the channel signal and render the synthesized channel signal. In this case, the channel renderer may generate a virtual speaker corresponding to the location of the exception object and perform rendering based on the available speaker information and the virtual speaker. Since the method of rendering in the channel renderer has been described with reference to FIG. 8, a detailed description thereof will be omitted below.

Alternatively, when it is determined that the object is not an exception object, either the first method or the second method may be selected for rendering. In the first method, the channel renderer renders the channel signal as described above, the object renderer renders the object signal, and the rendered channel signal and the rendered object signal are then synthesized.

In the second method, the rendered object signal is synthesized with the channel signal and the channel renderer renders the synthesized channel signal. That is, an object that is not an exception object may be rendered not only by the first method but also by the method applied to exception objects. In one embodiment, whether to select the second method is decided from the rendering performance of the channel renderer. The rendering performance of the channel renderer can be predicted from the difference between the input channel format and the target speaker format. If this value is less than or equal to a preset reference value, rendering by the second method may be performed even if the object is not an exception object.

In addition, instead of selecting one of the first method and the second method for all input object signals, the object renderer may apply the first method to some object signals and the second method to other object signals.
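The selection between the two methods can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `ObjectSignal`, `select_rendering_method`, and `select_per_object` are hypothetical, and the value compared with the reference value is assumed to be a single number expressing the difference between the input channel format and the target speaker format, which the description does not define further.

```python
from dataclasses import dataclass


@dataclass
class ObjectSignal:
    name: str
    is_exception: bool  # True when the object lies outside the available speaker area


def select_rendering_method(obj, format_difference, reference_value):
    """Return "first" or "second" for one object signal.

    format_difference is an assumed scalar measure of the gap between the
    input channel format and the target speaker format.
    """
    if obj.is_exception:
        # Exception objects are always passed on to the channel renderer.
        return "second"
    if format_difference <= reference_value:
        # Small difference: the second method may be used even for a normal object.
        return "second"
    return "first"


def select_per_object(objects, format_difference, reference_value):
    """Choose a method for each object individually rather than for all at once."""
    return {
        obj.name: select_rendering_method(obj, format_difference, reference_value)
        for obj in objects
    }
```

For example, with a format difference above the reference value, an object outside the speaker area is assigned the second method while an object inside it is assigned the first.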

Next, the rendered channel signal and the object signal are synthesized (S130). That is, when both the rendered channel signal and the rendered object signal exist, they are synthesized and the synthesized signal is output. In contrast, when only the channel signal or only the object signal is present, the channel signal or the object signal is output without any synthesis.
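Step S130 can be illustrated with a short sketch. It assumes, purely for illustration, that each rendered signal is a list of speaker feeds, each feed being a list of samples, and that both signals target the same speaker layout with feeds of equal length; the function name `synthesize` is hypothetical.

```python
def synthesize(rendered_channel=None, rendered_object=None):
    """Mix the rendered channel and object signals, or pass one through.

    Each argument is either None (signal absent) or a list of speaker feeds,
    where each feed is a list of samples. Equal feed lengths are assumed.
    """
    if rendered_channel is None and rendered_object is None:
        raise ValueError("no rendered signal to output")
    if rendered_object is None:
        return rendered_channel  # only the channel signal exists
    if rendered_channel is None:
        return rendered_object  # only the object signal exists
    if len(rendered_channel) != len(rendered_object):
        raise ValueError("both signals must target the same number of speakers")
    # Sample-wise sum of the two signals for every speaker feed.
    return [
        [c + o for c, o in zip(channel_feed, object_feed)]
        for channel_feed, object_feed in zip(rendered_channel, rendered_object)
    ]
```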

Meanwhile, the audio signal processing apparatus and method according to another embodiment of the present invention can render the input audio bit string signal using an internal renderer and an external renderer, respectively, which will be described with reference to FIGS. 10 and 11.

FIG. 10 is a block diagram of an audio signal processing apparatus 1000 according to another embodiment of the present invention.

The audio signal processing apparatus 1000 according to the present invention includes an internal renderer 1030, an external renderer 1040, a distribution unit 1050, and a playback unit 1060.

First, the audio signal processing apparatus 1000 according to the present invention may further include an audio signal receiver 1010 and a decoder 1020. The audio signal receiver 1010 may receive an audio bit string signal including at least one channel signal or object signal, and the decoder 1020 may decode the channel signal or object signal included in the audio bit string. In this case, the decoder 1020 may also decode metadata regarding the plurality of object signals.

The internal renderer 1030 renders the decoded channel signal or object signal, and the external renderer 1040 also renders the decoded channel signal or object signal. In this case, the internal renderer 1030 and the external renderer 1040 may render a channel signal or an object signal based on vector based amplitude panning (VBAP) rendering.
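For readers unfamiliar with VBAP, the generic two-dimensional, pairwise formulation can be sketched as below. This is the textbook form of the technique, not necessarily the variant used by either renderer in this embodiment; the function name `vbap_2d_gains` and the use of azimuth angles in degrees are assumptions made for the example.

```python
import math


def vbap_2d_gains(source_deg, speaker_a_deg, speaker_b_deg):
    """Return gains (g_a, g_b) that place a phantom source between two speakers.

    Solves p = g_a * a + g_b * b, where p, a and b are unit vectors pointing
    to the source and the two speakers, then normalizes to unit power.
    """
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    px, py = unit(source_deg)
    ax, ay = unit(speaker_a_deg)
    bx, by = unit(speaker_b_deg)

    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")

    # Cramer's rule for the 2x2 system [a b] [g_a g_b]^T = p.
    g_a = (px * by - py * bx) / det
    g_b = (ax * py - ay * px) / det

    # Power normalization so that g_a^2 + g_b^2 = 1.
    norm = math.hypot(g_a, g_b)
    return g_a / norm, g_b / norm
```

A source midway between the two speakers receives equal gains, and a source in the direction of one speaker is reproduced by that speaker alone. A negative gain indicates that the source lies outside the arc spanned by the pair.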

The internal renderer 1030 corresponds to the standard renderer in the case of MPEG-H and may be the 3DA renderer 460 illustrated in FIG. 4. The external renderer 1040 may be a renderer included in a specific product or a separately developed renderer. A speaker environment to which the internal renderer 1030 and the external renderer 1040 are applied will be described below.

In the speaker environment of the audio signal processing apparatus 1000 according to the present invention, for example, when the speaker system includes a general loudspeaker and a separate playback system such as a sound bar, the sound source reproduced through the general loudspeaker may be rendered through the internal renderer 1030, and the sound source reproduced through the sound bar may be rendered through the external renderer 1040.

In addition, the external renderer 1040 may be a binaural renderer. Accordingly, a signal rendered by the internal renderer 1030 may be reproduced in a general loudspeaker, and a signal binaurally rendered by the external renderer 1040 may be reproduced through a speaker environment such as headphones.

Meanwhile, in the audio signal processing apparatus 1000 according to the present invention, the speaker environment to which the internal renderer 1030 and the external renderer 1040 are applied is not limited thereto, and various rendering methods and speaker environments may be applied.

In this way, signals processed independently through the two renderers may be reproduced simultaneously in the same space. For the rendered signals to be reproduced simultaneously in the same space, the internal renderer 1030 and the external renderer 1040 must be synchronized, which may be done through the delay compensator 1070 and the weight adjusting unit 1080 described below.

The distribution unit 1050 distributes the decoded channel signal or the object signal to the internal renderer 1030 and the external renderer 1040. In this case, the distribution unit 1050 distributes one or more channel signals or object signals among the decoded channel signals or object signals to the internal renderer 1030 and the external renderer 1040.

The distribution unit 1050 may distribute one or more of the decoded channel signals or object signals to both the internal renderer 1030 and the external renderer 1040 so that they overlap. For example, when receiving first to fifth channel signals, the distribution unit 1050 may distribute the first to third channel signals to the internal renderer 1030 and the third to fifth channel signals to the external renderer 1040, so that the third channel signal is distributed to both the internal renderer 1030 and the external renderer 1040. When the overlap is at its maximum, the internal renderer 1030 and the external renderer 1040 receive the same channel signals or object signals. That is, the distribution unit 1050 may distribute the first to fifth channel signals so that they are input in common to the internal renderer 1030 and the external renderer 1040.

Alternatively, the distribution unit 1050 may distribute the decoded channel signal or the object signal to the internal renderer 1030 and the external renderer 1040 so as not to overlap. For example, the first to third channel signals may be distributed to the internal renderer 1030, and the fourth to fifth channel signals may be distributed to the external renderer 1040.
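The overlapping and non-overlapping cases can be reproduced with a small sketch. The function name `distribute` and the representation of signals as integer identifiers are assumptions made for the example; how the assignment sets are chosen is left open here.

```python
def distribute(signal_ids, internal_ids, external_ids):
    """Split decoded signals between the internal and external renderers.

    Returns (to_internal, to_external, overlap), where overlap lists the
    signals sent to both renderers. Input order is preserved.
    """
    internal_set = set(internal_ids)
    external_set = set(external_ids)
    to_internal = [s for s in signal_ids if s in internal_set]
    to_external = [s for s in signal_ids if s in external_set]
    overlap = [s for s in signal_ids if s in internal_set and s in external_set]
    return to_internal, to_external, overlap


channels = [1, 2, 3, 4, 5]

# Overlapping distribution: the third channel goes to both renderers.
print(distribute(channels, [1, 2, 3], [3, 4, 5]))

# Maximum overlap: both renderers receive all five channels.
print(distribute(channels, channels, channels))

# Non-overlapping distribution.
print(distribute(channels, [1, 2, 3], [4, 5]))
```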

The playback unit 1060 reproduces the channel signal or the object signal rendered by the internal renderer 1030 and the external renderer 1040, respectively. In this case, the channel signal or the object signal rendered through the internal renderer 1030 or the external renderer 1040 is reproduced through a separate playback unit 1060.

Meanwhile, the audio signal processing apparatus 1000 according to the present invention may further include a delay compensator 1070, a weight adjuster 1080, and a speaker information input unit 1090.

The delay compensator 1070 may compensate for a time delay occurring between the internal renderer 1030 and the external renderer 1040. For example, when the external renderer 1040 introduces a longer time delay than the internal renderer 1030, the delay compensator 1070 compensates for the difference in delay time so that the internal renderer 1030 and the external renderer 1040 are synchronized.
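One simple way to realize such compensation is to delay the output of the faster renderer by the latency difference. The sketch below assumes that both latencies are known in samples and that each output is a list of samples; the function name `compensate_delay` is hypothetical, and the description does not prescribe this particular mechanism.

```python
def compensate_delay(internal_feed, external_feed, internal_latency, external_latency):
    """Align two renderer outputs by padding the faster one with silence.

    Latencies are given in samples. The output of the renderer with the
    shorter latency is delayed by the difference.
    """
    difference = external_latency - internal_latency
    internal_out = list(internal_feed)
    external_out = list(external_feed)
    if difference > 0:
        # External renderer is slower: hold back the internal output.
        internal_out = [0.0] * difference + internal_out
    elif difference < 0:
        # Internal renderer is slower: hold back the external output.
        external_out = [0.0] * (-difference) + external_out
    return internal_out, external_out
```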

The weight adjusting unit 1080 may adjust the output weight of each of the internal renderer 1030 and the external renderer 1040 to adjust the sound intensity of the channel signal or the object signal. That is, since the channel signals or object signals rendered by the internal renderer 1030 and the external renderer 1040 are reproduced in the same space, the weight adjusting unit 1080 may match the two outputs by adjusting the sound intensity of the internal renderer 1030 and the external renderer 1040.
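As an illustration of output weighting, the sketch below scales the external renderer output so that its RMS level matches that of the internal renderer output. RMS matching is only one possible policy and is an assumption of this example, as are the function names `rms`, `matching_weights`, and `apply_weight`.

```python
import math


def rms(feed):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    if not feed:
        return 0.0
    return math.sqrt(sum(x * x for x in feed) / len(feed))


def matching_weights(internal_feed, external_feed):
    """Return (internal_weight, external_weight) under the assumed RMS policy."""
    external_level = rms(external_feed)
    if external_level == 0.0:
        return 1.0, 1.0  # nothing to scale
    return 1.0, rms(internal_feed) / external_level


def apply_weight(feed, weight):
    """Scale every sample of a renderer output by its output weight."""
    return [weight * x for x in feed]
```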

The speaker information input unit 1090 may receive information on the available speakers. In this case, if, based on the speaker information input by the user, the position of the channel or object corresponding to a channel signal or object signal is outside the available speaker area, the distribution unit 1050 may distribute the decoded channel signal or object signal to the external renderer 1040, so that the external renderer 1040 renders the channel signal or object signal that falls outside the available speaker area.
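The position-based distribution can be sketched as follows. How the available speaker area is computed is not specified in this passage, so the sketch assumes, purely for illustration, that it is the azimuth and elevation range spanned by the available speakers; the names `Position`, `in_speaker_area`, and `route_by_area` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Position:
    azimuth_deg: float
    elevation_deg: float


def in_speaker_area(position, speakers):
    """Assumed test: inside the azimuth/elevation range spanned by the speakers."""
    azimuths = [s.azimuth_deg for s in speakers]
    elevations = [s.elevation_deg for s in speakers]
    return (
        min(azimuths) <= position.azimuth_deg <= max(azimuths)
        and min(elevations) <= position.elevation_deg <= max(elevations)
    )


def route_by_area(signals, speakers):
    """Send signals outside the available speaker area to the external renderer.

    signals maps a signal name to the Position of its channel or object.
    Returns (to_internal, to_external) as lists of signal names.
    """
    to_internal, to_external = [], []
    for name, position in signals.items():
        if in_speaker_area(position, speakers):
            to_internal.append(name)
        else:
            to_external.append(name)
    return to_internal, to_external
```

With two speakers at azimuths of plus and minus 30 degrees on the horizontal plane, for example, a front channel at 0 degrees stays with the internal renderer, while an object at 40 degrees of elevation is routed to the external renderer.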

Hereinafter, with reference to FIG. 11, the steps performed by each component of the audio signal processing apparatus 1000 described with reference to FIG. 10 will be described.

FIG. 11 is a flowchart of an audio signal processing method according to another embodiment of the present invention.

In the audio signal processing method of the audio signal processing apparatus 1000 according to the present invention, first, usable speaker information of a user may be input. In this case, the user's available speaker environment may include, for example, a general loudspeaker and a sound bar, or headphones that receive a rendered signal through binaural rendering instead of a sound bar. When the user inputs the speaker information available through the UI or the like, it is determined whether the position of the channel or the object corresponding to the channel signal or the object signal is out of the available speaker area based on the speaker information. If the determination result is out of the speaker region, the channel signal or the object signal is rendered through the external renderer 1040 as described below, and the rendered signal may be reproduced through a playback device such as a sound bar or headphones.

Meanwhile, the speaker environment to which the audio signal processing method according to the present invention is applied is not limited to the above-described application example, and the audio signal processing method according to the present invention may be applied in various speaker environments.

In addition, the audio signal processing method according to the present invention may receive an audio bit string signal including at least one channel signal or object signal and decode the channel signal or object signal included in the received audio bit string (S230). In this case, the metadata of the object signal may be decoded, and the decoded metadata may be distributed to the internal renderer 1030 or the external renderer 1040 based on this.

After decoding the channel signal or the object signal as described above, one or more channel or object signals of the decoded channel signal or object signal are distributed to the internal renderer 1030 and the external renderer 1040, respectively (S210). In this case, when the decoded channel signal or object signal is out of the available speaker area, the decoded channel signal or object signal is distributed to the external renderer 1040.

Meanwhile, the distribution unit 1050 may distribute the channel signals or object signals included in the audio bit stream to the internal renderer 1030 and the external renderer 1040 so that they overlap, or may distribute them so that they do not overlap. Since this has been described with reference to FIG. 10, a detailed description thereof will be omitted.

Next, the channel signals or object signals distributed to the internal renderer 1030 and the external renderer 1040 are respectively rendered (S220). In this case, the internal renderer 1030 and the external renderer 1040 may render a channel signal or an object signal based on VBAP rendering. The internal renderer 1030 corresponds to the standard renderer in the case of MPEG-H and may be the 3DA renderer 460 illustrated in FIG. 4, and the external renderer 1040 may be a renderer included in a specific product or a separately developed renderer.

Next, the rendered channel signal or object signal is reproduced (S230). In this case, the channel signals or object signals rendered through the internal renderer 1030 and the external renderer 1040 may be reproduced through separate playback units 1060. That is, the signal rendered by the internal renderer 1030 may be reproduced through a general loudspeaker, and the signal rendered by the external renderer 1040 may be reproduced through a separate playback unit 1060 such as a sound bar or headphones. As such, signals processed independently through the internal renderer 1030 and the external renderer 1040 may be reproduced simultaneously in the same space. Simultaneous reproduction in the same space requires synchronizing the internal renderer 1030 and the external renderer 1040.

Therefore, the audio signal processing method according to the present invention may further include synchronizing the internal renderer 1030 and the external renderer 1040.

Specifically, the method may further include compensating for a delay time occurring between the internal renderer 1030 and the external renderer 1040. For example, when the external renderer 1040 introduces a longer time delay than the internal renderer 1030, the two renderers may be synchronized by compensating for the time delay of the internal renderer 1030 accordingly.

The method may further include adjusting the sound intensity of the channel signal or the object signal by adjusting the output weight of each of the external renderer 1040 and the internal renderer 1030. Adjusting the output weights of the internal renderer 1030 and the external renderer 1040 adjusts the sound intensity of the speaker that reproduces the signal rendered by the internal renderer 1030 and of the speaker that reproduces the signal rendered by the external renderer 1040, which can resolve the distortion that would otherwise occur when the two are reproduced in the same space.

Meanwhile, the audio signal processing apparatus and method according to the exemplary embodiments described with reference to FIGS. 1 to 11 may be implemented by the audio reproducing apparatus 1 shown in FIG. 12, which will be described below.

FIG. 12 is a diagram illustrating an example of a device in which an audio signal processing device and method according to the present invention are implemented.

The audio reproducing apparatus 1 may include a wired / wireless communication unit 10, a user authentication unit 20, an input unit 30, a signal coding unit 40, a control unit 50, and an output unit 60.

The wired / wireless communication unit 10 receives an audio bit string signal through a wired / wireless communication method. The wired / wireless communication unit 10 may include a configuration such as an infrared communication unit, a Bluetooth unit, or a wireless LAN communication unit, and may receive an audio bit string signal through various other communication methods.

The user authentication unit 20 receives user information and performs user authentication. In this case, the user authentication unit 20 may include one or more of a fingerprint recognition unit, an iris recognition unit, a face recognition unit, and a voice recognition unit. That is, user authentication may be performed by receiving fingerprint, iris, facial outline, or voice information, converting it into user information, and determining whether it matches registered user information.

The input unit 30 is an input device for the user to input various types of commands, and may include one or more of a keypad unit, a touch pad unit, and a remote controller unit.

The signal coding unit 40 may encode or decode an audio signal, a video signal, or a combination thereof received through the wired / wireless communication unit 10 and output an audio signal of a time domain. The signal coding unit 40 may include an audio signal processing apparatus, to which the audio signal processing apparatus according to the present invention may be applied.

The control unit 50 receives an input signal from the input devices and controls all processes of the signal coding unit 40 and the output unit 60. The output unit 60 outputs an output signal generated by the signal coding unit 40, and may include components such as a speaker unit and a display unit. In this case, when the output signal is an audio signal, it may be output through the speaker, and when it is a video signal, it may be output through the display.

For reference, the components shown in FIGS. 4, 6 through 8, 10, and 12 according to an embodiment of the present invention refer to software components or hardware components such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC), and perform predetermined roles.

However, 'components' are not limited to software or hardware, and each component may be configured to reside in an addressable storage medium or may be configured to run on one or more processors.

Thus, as an example, a component may include components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables.

Components and the functionality provided within those components may be combined into a smaller number of components or further separated into additional components.

Meanwhile, an embodiment of the present invention may be implemented in the form of a recording medium including instructions executable by a computer, such as a program module executed by the computer. Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. In addition, computer readable media may include both computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Communication media typically includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transmission mechanism, and includes any information delivery media.

The foregoing description of the present invention is intended for illustration, and those skilled in the art will understand that the present invention may be easily modified into other specific forms without changing its technical spirit or essential features. Therefore, it should be understood that the embodiments described above are exemplary in all respects and not restrictive. For example, each component described as a single type may be implemented in a distributed manner, and similarly, components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the following claims rather than by the above description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as being included in the scope of the present invention.

Claims

An audio signal processing apparatus comprising:

a speaker information input unit configured to receive information on the speakers available to a user;

a receiver configured to receive an audio bit string signal including a channel signal and / or an object signal;

a decoder configured to decode the channel signal or the object signal included in the audio bit string signal;

an object discriminating unit configured to determine whether an object corresponding to the object signal is located within the available speaker area;

a rendering unit including a channel renderer and an object renderer for rendering the decoded channel signal and the decoded object signal, respectively, and a rendering setting unit configured to set a rendering method based on the determination result; and

a synthesizer configured to synthesize the rendered channel signal and the rendered object signal.
The apparatus of claim 1,

wherein the rendering setting unit is configured to,

when it is determined that the object is located outside the available speaker area,

synthesize the rendered object signal with the channel signal and render the synthesized channel signal.
The apparatus of claim 2,

wherein the object renderer creates a virtual speaker corresponding to an exception object located outside the available speaker area,

and renders the exception object based on the available speaker information and the generated virtual speaker.
The apparatus of claim 1,

wherein the rendering setting unit is configured to,

when it is determined that the object is located within the available speaker area,

selectively perform either a first method in which the channel renderer renders the channel signal and the synthesizer synthesizes the rendered channel signal and the rendered object signal,

or a second method of synthesizing the rendered object signal with the channel signal and rendering the synthesized channel signal.
The apparatus of claim 1,

wherein the decoder decodes metadata for a plurality of object signals.
An audio signal processing method in an audio signal processing apparatus, the method comprising:

decoding a channel signal or an object signal from a received audio bit stream;

rendering the decoded channel signal or object signal; and

synthesizing the rendered channel signal and the rendered object signal,

wherein the rendering step selectively performs either:

a first method of rendering the decoded channel signal and synthesizing the rendered channel signal and the rendered object signal; or

a second method of synthesizing the rendered object signal with the channel signal and rendering the synthesized channel signal.
The method of claim 6,

further comprising receiving available speaker information of a user,

wherein, when it is determined that the object is located outside the available speaker area, the object signal is rendered according to the second method.
The method of claim 6,

further comprising determining whether an object corresponding to the object signal is located within the available speaker area.
The method of claim 6,

further comprising receiving an audio bit string signal comprising at least one of the channel signal and the object signal.
An audio signal processing apparatus comprising:

an internal renderer and an external renderer for rendering a decoded channel signal or a decoded object signal;

a distribution unit which distributes the decoded channel signal or object signal to the internal renderer and the external renderer; and

a reproduction unit for reproducing the channel signal or the object signal respectively rendered by the internal renderer and the external renderer,

wherein the channel signal or object signal rendered through the internal renderer or the external renderer is reproduced through a separate playback unit.
The apparatus of claim 10,

further comprising a delay compensator for compensating for a time delay occurring between the internal renderer and the external renderer.
The apparatus of claim 10,

further comprising a weight adjuster configured to adjust the output weight of each of the external renderer and the internal renderer to adjust the sound intensity of the channel signal or the object signal.
The apparatus of claim 10,

wherein the distribution unit distributes one or more channel signals or object signals among the decoded channel signals or object signals to the internal renderer and the external renderer.
The apparatus of claim 10,

wherein the distribution unit distributes the decoded channel signal or object signal to the internal renderer and the external renderer so that they do not overlap.
The apparatus of claim 10,

further comprising a speaker information input unit for receiving available speaker information of a user,

wherein the distribution unit distributes the decoded channel signal or object signal to the external renderer when the decoded channel signal or object signal is out of the available speaker area.
The apparatus of claim 10,

wherein the decoding unit decodes metadata about the object signal.
The apparatus of claim 10,

wherein the internal renderer and the external renderer render the channel signal or the object signal based on VBAP rendering.
An audio signal processing method in an audio signal processing apparatus, the method comprising:

distributing at least one channel signal or object signal among decoded channel signals or decoded object signals to an internal renderer and an external renderer, respectively;

rendering the channel signals or object signals distributed to the internal renderer and the external renderer, respectively; and

reproducing the rendered channel signal or object signal,

wherein the distributing step comprises,

when the decoded channel signal or object signal is out of the available speaker area, distributing the decoded channel signal or object signal to the external renderer.
The method of claim 18,

further comprising compensating for a time delay occurring between the internal renderer and the external renderer.
The method of claim 18,

further comprising adjusting an output weight of each of the external renderer and the internal renderer to adjust the sound intensity of the channel signal or the object signal.
The method of claim 18,

further comprising receiving available speaker information of a user.