WO2015147433A1 - Apparatus and method for processing an audio signal - Google Patents

Apparatus and method for processing an audio signal

Info

Publication number
WO2015147433A1
WO2015147433A1 (PCT/KR2015/000452)
Authority
WO
WIPO (PCT)
Prior art keywords
signal
renderer
channel
channel signal
rendering
Prior art date
Application number
PCT/KR2015/000452
Other languages
English (en)
Korean (ko)
Inventor
오현오
곽진삼
손주형
Original Assignee
인텔렉추얼디스커버리 주식회사 (Intellectual Discovery Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020140034595A external-priority patent/KR20150111117A/ko
Priority claimed from KR1020140034597A external-priority patent/KR20150111119A/ko
Application filed by 인텔렉추얼디스커버리 주식회사 (Intellectual Discovery Co., Ltd.)
Publication of WO2015147433A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S5/005 Pseudo-stereo systems of the pseudo five- or more-channel type, e.g. virtual surround
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation

Definitions

  • the present invention relates to an audio signal processing apparatus and method.
  • 3D audio refers to a set of signal processing, transmission, encoding, and playback technologies for providing realistic sound in three-dimensional space by adding an axis in the height direction to the horizontal-plane (2D) sound scene provided by conventional surround audio.
  • to provide 3D audio, a rendering technique is required that forms a sound image at a virtual position where no speaker exists, whether a larger or a smaller number of speakers is used.
  • 3D audio is expected to be an audio solution for future Ultra High Definition Television (UHDTV) applications and to be applied in various fields, including sound in vehicles evolving into high-quality infotainment spaces, theater sound, personal 3DTV, tablets, smartphones, and cloud games.
  • a sound source provided to 3D audio may exist in the form of a channel-based signal, an object-based signal, or both.
  • MPEG-H 3D audio, which processes channel-based signals and object-based signals, suffers from various problems caused by the performance difference between the channel renderer and the object renderer: the sound scene does not play as intended and distortion of the sound source occurs. Therefore, the problems caused by the performance difference between the channel renderer and the object renderer need to be solved.
  • a sound bar is advantageous for playing exception channels, but has the disadvantage that sound quality is degraded when playing basic channel signals. Accordingly, it may be preferable to use an audio reproducing apparatus in which a separate speaker device such as a sound bar is merged with a basic speaker device, and an MPEG-H decoding method suitable for such a usage environment is therefore needed.
  • Korean Patent Laid-Open Publication No. 2011-0002504 (title of invention: improved coding and parameter representation of multi-channel downmixed object coding) discloses a technique for generating object parameters so as to produce an encoded audio object signal.
  • the present invention has been made to solve the above-mentioned problems of the prior art. Some embodiments of the present invention provide an audio signal processing apparatus and method that, when an exception object signal corresponding to an exception object exists, can process the exception object signal by rendering it, synthesizing the rendered exception object signal with a channel signal, and rendering the synthesized channel signal.
  • other embodiments provide an audio signal processing apparatus and method that can render an input audio bitstream signal in an internal renderer and an external renderer, respectively, and simultaneously reproduce the results through separate playback devices, such as a general loudspeaker together with headphones or a sound bar.
  • according to an embodiment of the present invention, an audio signal processing apparatus includes a speaker information input unit for receiving the user's usable speaker information; a receiver for receiving an audio bitstream signal including a channel signal and/or an object signal; a decoder for decoding the channel signal or object signal included in the audio bitstream signal; an object determining unit for determining whether an object corresponding to the object signal is located in the usable speaker area; a renderer including a channel renderer and an object renderer for rendering the decoded channel signal and the decoded object signal, respectively, and a rendering setting unit configured to set a rendering method based on the determination result; and a synthesis unit for synthesizing the rendered channel signal and the rendered object signal.
  • an audio signal processing method in an audio signal processing apparatus includes decoding a channel signal or an object signal from a received audio bitstream signal; rendering the decoded channel signal or object signal; and synthesizing the rendered channel signal and the rendered object signal.
  • the rendering may selectively perform either a first method of rendering the decoded channel signal and then synthesizing the rendered channel signal with the rendered object signal, or a second method of synthesizing the rendered object signal with the channel signal and then rendering the synthesized channel signal.
  • according to another embodiment, an audio signal processing apparatus includes an internal renderer and an external renderer for rendering a decoded channel signal or a decoded object signal, and a distribution unit for distributing the decoded channel signal or object signal to the internal renderer and the external renderer. The channel signal or object signal rendered through the internal renderer or the external renderer is reproduced through a separate playback unit.
  • an audio signal processing method in such an apparatus comprises distributing at least one of the decoded channel signals or object signals to the internal renderer and the external renderer, respectively; rendering the channel signals or object signals distributed to the internal renderer and the external renderer, respectively; and reproducing the rendered channel signals or object signals.
  • in the distributing step, when the decoded channel signal or object signal is outside the usable speaker area, it is distributed to the external renderer.
  • FIG. 1 is a view for explaining the viewing angle according to image size at the same viewing distance.
  • FIG. 2 is a layout diagram of a 22.2 channel speaker as an example of a multi-channel audio environment.
  • FIG. 3 is a conceptual diagram illustrating positions of sound objects constituting a three-dimensional sound scene in a listening space.
  • FIG. 4 is a diagram illustrating the overall structure of a 3D audio decoder and a renderer including a channel or an object renderer.
  • FIG. 5 is a diagram showing 5.1-channel speakers arranged at positions according to the ITU-R recommendation and at arbitrary positions.
  • FIG. 6 is a diagram illustrating a coupled structure in which an object signal decoder and a flexible speaker renderer are combined.
  • FIG. 7 is a block diagram of an audio signal processing apparatus according to an embodiment of the present invention.
  • FIG. 8 is a diagram illustrating a process of rendering a channel signal or an object signal in an audio signal processing apparatus according to an embodiment of the present invention.
  • FIG. 9 is a flowchart of an audio signal processing method according to an embodiment of the present invention.
  • FIG. 10 is a block diagram of an audio signal processing apparatus according to another embodiment of the present invention.
  • FIG. 11 is a flowchart of a method of reproducing an audio signal according to another embodiment of the present invention.
  • FIG. 12 is a diagram illustrating an example of a device in which an audio signal processing method according to the present invention is implemented.
  • an environment for implementing an audio signal processing apparatus and an audio signal processing method according to the present invention will be described with reference to FIGS. 1 to 6.
  • FIG. 1 illustrates the viewing angle according to image size (e.g., UHDTV vs. HDTV) at the same viewing distance.
  • the size of display images is becoming larger in accordance with consumer demand.
  • the UHDTV (7680 * 4320 pixel image, 110) is an image about 16 times larger than the HDTV (1920 * 1080 pixel image, 120).
  • when the HDTV 120 is viewed at the same viewing distance, the viewing angle may be about 30 degrees; when the UHDTV 110 is installed at the same viewing distance, the viewing angle reaches about 100 degrees. These figures are consistent with simple geometry, as sketched below.
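  • as a quick check, assuming the UHDTV screen has four times the linear width of the HDTV screen (following from the roughly 16-fold pixel count noted above) at the same viewing distance d, the viewing angle is:

```latex
\theta = 2\arctan\frac{W}{2d}, \qquad
\theta_{\text{HDTV}} \approx 30^{\circ}
\;\Rightarrow\;
\theta_{\text{UHDTV}} = 2\arctan\left(4\tan 15^{\circ}\right) \approx 94^{\circ}
```

  • here W is the screen width; the result is on the order of the roughly 100 degrees stated above.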
  • as the viewing angle widens, a highly realistic multichannel audio environment is required not only for UHDTV but also for personal 3DTV, smartphone TV, 22.2-channel audio programs, cars, 3D video, telepresence rooms, and cloud-based games.
  • FIG. 2 is a layout diagram of a 22.2 channel speaker as an example of a multi-channel audio environment.
  • the 22.2 channel may be an example of a multichannel audio environment for enhancing the sound field, and the present invention is not limited to a specific number of channels or a specific speaker layout.
  • a total of nine channels may be arranged in the top layer 210.
  • the middle layer 220 has five speakers in front, two in the middle position, and three in the surround position, for a total of 10 speakers.
  • in the bottom (lowest) layer, three channels are disposed at the front, and two LFE channels 240 are provided.
  • FIG. 3 is a conceptual diagram illustrating positions of sound objects constituting a three-dimensional sound scene in a listening space.
  • as shown in FIG. 3, each sound object 310 constituting the three-dimensional sound scene may be distributed at various positions in the listening space. Each object is drawn as a point source 310 for convenience of illustration, but besides point sources, a sound scene may also contain sources in the form of plane waves or diffuse, omnidirectional sound that conveys the spatial impression of the scene.
  • FIG. 4 is a diagram illustrating the overall structure of a 3D audio decoder and a renderer including a channel or an object renderer.
  • the decoder system illustrated in FIG. 4 may be broadly divided into a 3D audio decoder 400 and a 3D audio renderer 450.
  • the 3D audio decoder 400 may include an individual object decoder 410, an individual channel decoder 420, an SAOC transcoder 430, and an MPS decoder 440.
  • the individual object decoder 410 receives object signals, and the individual channel decoder 420 receives channel signals.
  • the audio bitstream may include only an object signal, only a channel signal, or both an object signal and a channel signal.
  • the 3D audio decoder 400 may receive signals in which an object signal or a channel signal has been waveform-coded or parametrically coded; parametrically coded object and channel signals are processed through the SAOC transcoder 430 and the MPS decoder 440, respectively.
  • the 3D audio renderer 450 may include a 3DA renderer 460, and may render a channel signal, an object signal, or a parametric coded signal through the 3DA renderer 460.
  • the 3D audio renderer 450 receives an object signal, a channel signal, or a combination thereof output through the 3D audio decoder 400, and outputs sound suited to the speaker environment of the listening space where the listener is located.
  • the weights of the 3D audio decoder 400 and the 3D audio renderer 450 may be set based on the number and location information of the speaker in the listening space where the listener is located.
  • FIG. 5 shows 5.1-channel speakers arranged at positions according to the ITU-R recommendation and at arbitrary positions.
  • the speakers 520 disposed in an actual living room differ from the ITU-R recommendation 510 in both direction angle and distance. As the height and direction of the speakers differ from the speakers 510 according to the recommendation, it is difficult to provide an ideal 3D sound scene when the original signal is reproduced as-is at the changed speaker 520 positions.
  • amplitude panning, which determines the direction of a sound source between two speakers from the relative magnitudes of their signals, and VBAP (Vector-Based Amplitude Panning), which is widely used to determine the direction of a sound source among three speakers in three-dimensional space, enable flexible rendering of object signals transmitted on a per-object basis. Therefore, by transmitting object signals instead of channel signals, a 3D sound scene can easily be provided even in environments with differing speaker layouts. A minimal numerical sketch of VBAP follows.
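  • the sketch below follows Pulkki's triplet formulation of VBAP; the speaker directions, the example source direction, and the power normalization are illustrative assumptions, not values from the patent:

```python
import numpy as np

def vbap_gains(source_dir, speaker_dirs):
    """Compute VBAP gains for one source over a triplet of speakers.

    source_dir: unit vector toward the desired (virtual) source position.
    speaker_dirs: 3x3 matrix whose rows are unit vectors toward speakers.
    """
    L = np.asarray(speaker_dirs, dtype=float)   # rows l1, l2, l3
    p = np.asarray(source_dir, dtype=float)
    g = p @ np.linalg.inv(L)                    # solve p = g @ L
    if np.any(g < 0):                           # negative gain: outside triplet
        raise ValueError("source lies outside this speaker triplet")
    return g / np.linalg.norm(g)                # power normalization

# Example: a source between the front, left, and top speakers.
speakers = np.array([[1.0, 0.0, 0.0],   # front
                     [0.0, 1.0, 0.0],   # left
                     [0.0, 0.0, 1.0]])  # top
source = np.array([1.0, 1.0, 0.2])
print(vbap_gains(source / np.linalg.norm(source), speakers))
```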
  • FIG. 6 is a diagram illustrating a coupled structure in which an object signal decoder and a flexible speaker renderer are combined.
  • when objects are used, each object can be positioned as a sound source according to the desired sound scene.
  • hereinafter, the first embodiment 600 and the second embodiment 601, in which an object signal decoder and a flexible renderer are combined to exploit these advantages, will be described.
  • in the first embodiment 600, a mixer 620 receives object signals from an object decoder 610 together with position information expressed as a mixing matrix, and outputs channel signals. That is, the position information of the sound scene is expressed relative to the speakers corresponding to the output channels.
  • the output channel signals are then flexibly rendered through the flexible speaker renderer 630 and output. At this time, if the actual speakers do not exist at the predetermined number and positions, the renderer can receive the actual speaker position information and perform flexible rendering accordingly.
  • in the second embodiment 601, the flexible speaker mixer 650 receives the audio bitstream signal and performs flexible rendering directly.
  • the matrix updater 660 transfers a matrix reflecting both the mixing matrix and the speaker position information to the flexible speaker mixer 650, so that the result is reflected when flexible rendering is performed.
  • rendering a channel signal into another type of channel signal, as in the first embodiment 600, is more difficult to implement than rendering objects directly to the final channels, as in the second embodiment 601. This is described in detail below.
  • when a channel signal is used, mixing into the channel signal is performed first, without separately performing flexible rendering on each object, and flexible rendering is then performed on the resulting channel signal. The same applies to rendering using an HRTF (Head-Related Transfer Function).
  • FIG. 7 is a block diagram of an audio signal processing apparatus 700 according to an embodiment of the present invention.
  • the audio signal processing apparatus 700 includes a speaker information input unit 710, an audio signal receiver 720, a decoder 730, an object determining unit 740, a renderer 750, and a synthesizer 760.
  • the speaker information input unit 710 receives the user's usable speaker information.
  • the audio signal receiver 720 receives an audio bitstream signal including a channel signal and/or an object signal. That is, the audio bitstream may include only a channel signal, only an object signal, or both a channel signal and an object signal.
  • the decoder 730 decodes the channel signal or object signal included in the audio bitstream.
  • the decoder 730 may decode metadata regarding the object signal.
  • the channel signal may be decoded by a core codec such as Unified Speech and Audio Coding (USAC).
  • the object signal may be decoded by a core codec such as USAC or may be a parametric object signal decoded by a parametric codec such as SAOC (Spatial Audio Object Coding).
  • the object determining unit 740 determines whether an object corresponding to the object signal is located within the available speaker area. That is, the object determining unit 740 determines whether the object to be rendered is located in the speaker area based on the available speaker information received from the speaker information input unit 710. In this case, the rendering setting unit 755 to be described below sets the rendering method according to whether the object is located in the speaker area.
  • the renderer 750 includes a channel renderer 751 that renders the decoded channel signal, an object renderer 753 that renders the decoded object signal, and a rendering setting unit 755 that sets a rendering method based on the object determining unit 740's determination as to whether the object is an exception object.
  • when only a channel signal is included in the audio bitstream signal, the renderer 750 renders it through the channel renderer 751 and transmits the rendered channel signal to the synthesizer 760. Accordingly, the synthesizer 760 outputs the rendered channel signal.
  • the channel renderer 751 may be a format converter and may further include a spectral EQ.
  • likewise, when only an object signal is included, the renderer 750 renders it through the object renderer 753 and transmits the rendered object signal to the synthesizer 760. Accordingly, the synthesizer 760 outputs the rendered object signal.
  • the object renderer 753 may perform rendering through a VBAP (Vector-Based Amplitude Panning) method.
  • FIG. 8 is a diagram illustrating a process of rendering a channel signal or an object signal in the audio signal processing apparatus 700 according to an embodiment of the present invention.
  • the rendering setting unit 755 sets the rendering method based on the object determining unit 740's determination of whether the object is located within the usable speaker area, that is, whether the object is an exception object.
  • when the object is determined to be an exception object, the object renderer 753 renders the object signal and passes the rendered object signal to the channel renderer 751.
  • the channel renderer 751 may synthesize the received rendered object signal with the channel signal and render the synthesized channel signal.
  • for example, suppose that in a 22.2-channel setup the speaker at the center of the top layer 210 is absent, and a sound such as VoG (Voice of God), normally played through that speaker, must be reproduced. In this case, the object signal corresponding to the VoG may be rendered to the pre-installed speakers of the top layer 210, and the rendered object signal may be downmixed to the middle layer 220 to process the exception object signal.
  • alternatively, a virtual speaker may be created at the position of the missing speaker on the front or surround plane to handle the exception object. That is, the exception object signal is rendered to the virtual speaker and the pre-installed speakers of the top layer 210, and the exception object may then be processed by downmixing the rendered signal to the middle-layer 220 speakers located on the same vertical lines as the virtual speaker and the pre-installed top-layer 210 speakers.
  • the exception object may be rendered by the VBAP rendering method between the virtual speaker and the pre-installed speaker.
  • in this way, the exception object may be rendered using the virtual speaker; the rendering method applied at this time is not limited to the above example, and various methods may be used. A simplified sketch follows.
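  • the sketch below illustrates the virtual-speaker handling described above; the speaker labels, the equal-power split standing in for VBAP, and the downmix gains are all assumptions for illustration:

```python
import numpy as np

def render_exception_object(obj_samples, installed_top, middle_targets):
    """Render an object aimed at a missing top-center (VoG) speaker.

    obj_samples:    1-D array of object audio samples.
    installed_top:  installed top-layer speaker names.
    middle_targets: middle-layer speakers on the same vertical lines,
                    which absorb the virtual speaker's feed.
    """
    outputs = {}
    # Step 1: split the object between the virtual VoG position and the
    # installed top-layer speakers (equal-power split stands in for VBAP).
    g_virtual = 1.0 / np.sqrt(2.0)
    g_top = (1.0 / np.sqrt(2.0)) / max(len(installed_top), 1)
    for spk in installed_top:
        outputs[spk] = g_top * obj_samples
    # Step 2: the virtual speaker has no physical driver, so its feed is
    # downmixed to the middle-layer speakers directly below it.
    g_down = g_virtual / max(len(middle_targets), 1)
    for spk in middle_targets:
        outputs[spk] = outputs.get(spk, 0.0) + g_down * obj_samples
    return outputs

signal = np.ones(4)
print(render_exception_object(signal, ["TpFL", "TpFR"], ["L", "R"]))
```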
  • when the object is located within the usable speaker area, the rendering setting unit 755 may select between a first method and a second method. In the first method, the channel renderer 751 renders the channel signal and the object renderer 753 renders the object signal, and the rendered channel signal and rendered object signal are then passed to the synthesizer 760 and synthesized. In the second method, the rendered object signal is synthesized with the channel signal, and the channel renderer 751 renders the synthesized channel signal. That is, when the object is located within the usable speaker area, it may be rendered not only by the first method but also by the rendering method applied when an object is determined to be an exception object.
  • the synthesizer 760 synthesizes the rendered channel signal and the rendered object signal. That is, the synthesizer 760 synthesizes both the rendered channel signal and the rendered object signal, and outputs the synthesized signal. In contrast, when only the channel signal or only the object signal is present, the channel signal or the object signal is output without any synthesis.
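  • a tiny sketch of this pass-through behavior; the sample-wise addition is an assumption for how "synthesis" combines the two rendered signals:

```python
import numpy as np

def synthesize(rendered_channel=None, rendered_object=None):
    """Mix the two rendered signals when both exist; otherwise pass through."""
    if rendered_channel is not None and rendered_object is not None:
        return rendered_channel + rendered_object   # sample-wise mix (assumed)
    return rendered_channel if rendered_channel is not None else rendered_object

print(synthesize(np.ones(4), 2 * np.ones(4)))  # both present -> mixed
print(synthesize(np.ones(4), None))            # channel only -> passed through
```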
  • when the object included in the audio bitstream is a parametric object signal decoded by a parametric codec, it may be processed differently from the case where an individual object signal is included in the audio bitstream. That is, for a parametric object signal, the object parameters are applied to the parametric downmix channel signal, which is decoded according to the input target rendering matrix.
  • in this case, the output is produced as a channel signal that can be directly mapped to the target flexible-rendering channels. That is, when the output channels of the rendering matrix required in the parametric decoding process correspond to the flexible-rendering channels, rendering may be performed directly in the target channel format, similarly to the case of an individual object signal.
  • otherwise, the parametric object signal may first be output through a rendering matrix that allows synthesis with the channel signal, and the channel renderer 751 may then be applied.
  • FIG. 9 is a flowchart of an audio signal processing method according to an embodiment of the present invention.
  • the audio signal processing method in the audio signal processing apparatus 700 may receive the user's usable speaker information, and may also receive an audio bitstream signal including at least one of a channel signal and an object signal. That is, the audio bitstream may include only a channel signal, only an object signal, or both.
  • the audio signal processing method decodes the channel signal or object signal from the received audio bitstream signal (S110).
  • the channel signal may be decoded by a core codec such as USAC.
  • the object signal may be decoded with a core codec such as USAC and may also be a parametric object signal decoded with a parametric codec such as SAOC.
  • the decoded channel signal or object signal is rendered (S120).
  • the rendering may selectively perform either a first method, in which the decoded channel signal is rendered and the rendered channel signal is synthesized with the rendered object signal, or a second method, in which the rendered object signal is synthesized with the channel signal and the synthesized channel signal is rendered.
  • the audio signal processing method may further include determining whether an object corresponding to the object signal is located within the usable speaker area. That is, it is determined whether the object is an exception object outside the speaker area, and rendering is performed differently depending on the result, as described in detail below.
  • when the object is determined to be an exception object, the object renderer renders the object signal and passes the rendered object signal to the channel renderer.
  • the channel renderer may synthesize the rendered object signal and the channel signal and render the synthesized channel signal.
  • the channel renderer may generate a virtual speaker corresponding to the location of the exception object and perform rendering based on the available speaker information and the virtual speaker. Since the method of rendering in the channel renderer has been described with reference to FIG. 8, a detailed description thereof will be omitted below.
  • when the object is not an exception object, either the first method or the second method may be selected for rendering.
  • the first method may cause the channel renderer to render the channel signal as described above, cause the object renderer to render the object signal, and then synthesize each of the rendered channel signal and the rendered object signal.
  • the second method may synthesize the rendered object signal with the channel signal and cause the channel renderer to render the synthesized channel signal. That is, even when the object is not an exception object, it may be rendered not only by the first method but also by the rendering method applied when an object is determined to be an exception object.
  • one criterion for this choice may be the rendering performance of the channel renderer. That is, the rendering performance of the channel renderer can be predicted from the difference between the input channel format and the target speaker format; if this value is less than or equal to a preset reference value, rendering by the second method may be performed even for an object that is not an exception object.
  • also, instead of applying the same choice to all input object signals, the object renderer may select the first method for some object signals and the second method for others, as in the sketch below.
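  • the sketch below illustrates this selection logic; the performance-prediction value and the threshold are assumptions standing in for whatever measure an implementation actually uses:

```python
def choose_rendering_path(is_exception, predicted_quality, threshold=0.5):
    """Return which rendering method to apply to one object signal.

    "first":  render channel and object signals separately, then mix the
              two rendered signals (the first method above).
    "second": mix the rendered object into the channel signal first, then
              render the combined channel signal (the second method).
    """
    if is_exception:
        return "second"          # exception objects go via the channel renderer
    if predicted_quality <= threshold:
        return "second"          # weak channel-rendering match: second method anyway
    return "first"

# Per-object selection: some objects may use the first method while
# others use the second, as noted above.
objects = [("dialog", False, 0.9), ("flyover", True, 0.8), ("rain", False, 0.3)]
for name, is_exc, quality in objects:
    print(name, "->", choose_rendering_path(is_exc, quality))
```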
  • the rendered channel signal and the object signal are synthesized (S130). That is, when both the rendered channel signal and the rendered object signal exist, they are synthesized and the synthesized signal is output. In contrast, when only the channel signal or only the object signal is present, the channel signal or the object signal is output without any synthesis.
  • the audio signal processing apparatus and method according to another embodiment of the present invention can render the input audio bitstream signal using an internal renderer and an external renderer, respectively. This will be described with reference to FIGS. 10 and 11.
  • FIG. 10 is a block diagram of an audio signal processing apparatus 1000 according to another embodiment of the present invention.
  • the audio signal processing apparatus 1000 includes an internal renderer 1030, an external renderer 1040, a distribution unit 1050, and a playback unit 1060.
  • the audio signal processing apparatus 1000 may further include an audio signal receiver 1010 and a decoder 1020.
  • the audio signal receiver 1010 may receive an audio bitstream signal including at least one channel signal or object signal, and the decoder 1020 may decode the channel signal or object signal included in the audio bitstream.
  • the decoder 1020 may decode metadata regarding the plurality of object signals.
  • the internal renderer 1030 renders the decoded channel signal or object signal, and the external renderer 1040 likewise renders the decoded channel signal or object signal.
  • the internal renderer 1030 and the external renderer 1040 may render a channel signal or an object signal based on VBAP (vector-based amplitude panning) rendering.
  • in the case of MPEG-H, the internal renderer 1030 corresponds to the standard renderer and may be the 3DA renderer 460 illustrated in FIG. 4, while the external renderer 1040 may be a renderer included in a specific product or a separately developed renderer.
  • a speaker environment to which the internal renderer 1030 and the external renderer 1040 are applied will be described below.
  • for example, when the speaker environment of the audio signal processing apparatus 1000 according to the present invention includes separate playback systems such as a general loudspeaker and a sound bar, the sound source reproduced through the general loudspeaker may be rendered through the internal renderer 1030, and the sound source reproduced through the sound bar may be rendered through the external renderer 1040.
  • alternatively, the external renderer 1040 may be a binaural renderer. In that case, a signal rendered by the internal renderer 1030 may be reproduced through a general loudspeaker, and a signal binaurally rendered by the external renderer 1040 may be reproduced through headphones, as sketched below.
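  • a minimal sketch of this binaural path: each rendered channel is convolved with a left/right HRIR pair and summed into a two-channel headphone feed (the HRIRs here are placeholder delay/attenuation pairs, not measured data):

```python
import numpy as np

def binaural_render(channels, hrirs):
    """channels: dict name -> samples; hrirs: dict name -> (left_ir, right_ir)."""
    length = max(len(s) for s in channels.values()) + max(
        len(ir) for pair in hrirs.values() for ir in pair) - 1
    out = np.zeros((2, length))
    for name, sig in channels.items():
        h_left, h_right = hrirs[name]
        y_left = np.convolve(sig, h_left)    # left-ear contribution
        y_right = np.convolve(sig, h_right)  # right-ear contribution
        out[0, :len(y_left)] += y_left
        out[1, :len(y_right)] += y_right
    return out

hrirs = {"L": (np.array([1.0]), np.array([0.0, 0.6])),
         "R": (np.array([0.0, 0.6]), np.array([1.0]))}
channels = {"L": np.random.randn(8), "R": np.random.randn(8)}
print(binaural_render(channels, hrirs).shape)  # (2, 9)
```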
  • the speaker environment to which the internal renderer 1030 and the external renderer 1040 are applied is not limited thereto, and various rendering methods and speaker environments may be applied.
  • the distribution unit 1050 distributes the decoded channel signal or the object signal to the internal renderer 1030 and the external renderer 1040. In this case, the distribution unit 1050 distributes one or more channel signals or object signals among the decoded channel signals or object signals to the internal renderer 1030 and the external renderer 1040.
  • in this case, the distribution unit 1050 may distribute one or more of the decoded channel signals or object signals to both the internal renderer 1030 and the external renderer 1040 so that they overlap. For example, when receiving first to fifth channel signals, the distribution unit 1050 may distribute the first to third channel signals to the internal renderer 1030 and the third to fifth channel signals to the external renderer 1040, so that the third channel signal is distributed to both the internal renderer 1030 and the external renderer 1040. When the overlap is maximal, the internal renderer 1030 and the external renderer 1040 receive the same channel signals or object signals; that is, the distribution unit 1050 may distribute the first to fifth channel signals to be commonly input to both renderers.
  • alternatively, the distribution unit 1050 may distribute the decoded channel signals or object signals to the internal renderer 1030 and the external renderer 1040 so that they do not overlap; for example, the first to third channel signals may be distributed to the internal renderer 1030, and the fourth and fifth channel signals to the external renderer 1040. A sketch of both distribution modes follows.
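  • the sketch below shows the two distribution modes described above; the channel labels are illustrative:

```python
def distribute(signals, internal_names, external_names):
    """Split decoded signals between the internal and external renderers.

    A signal name may appear in both sets (overlapping distribution) or
    in only one of them (disjoint distribution).
    """
    internal = {k: v for k, v in signals.items() if k in internal_names}
    external = {k: v for k, v in signals.items() if k in external_names}
    return internal, external

signals = {f"ch{i}": object() for i in range(1, 6)}  # ch1..ch5 stand-ins

# Overlapping: ch3 goes to both renderers, as in the example above.
int1, ext1 = distribute(signals, {"ch1", "ch2", "ch3"}, {"ch3", "ch4", "ch5"})

# Disjoint: ch1-ch3 internal, ch4-ch5 external.
int2, ext2 = distribute(signals, {"ch1", "ch2", "ch3"}, {"ch4", "ch5"})
print(sorted(int1), sorted(ext1), sorted(int2), sorted(ext2))
```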
  • the playback unit 1060 reproduces the channel signal or the object signal rendered by the internal renderer 1030 and the external renderer 1040, respectively.
  • the channel signal or the object signal rendered through the internal renderer 1030 or the external renderer 1040 is reproduced through a separate playback unit 1060.
  • the audio signal processing apparatus 1000 may further include a delay compensator 1070, a weight adjuster 1080, and a speaker information input unit 1090.
  • the delay compensator 1070 may compensate for a time delay occurring between the internal renderer 1030 and the external renderer 1040. For example, when the external renderer 1040 incurs an additional time delay relative to the internal renderer 1030, the delay compensator 1070 compensates for that delay so that the internal renderer 1030 and the external renderer 1040 remain synchronized.
  • the weight adjusting unit 1080 may adjust the output weight of each of the internal renderer 1030 and the external renderer 1040 to adjust the sound intensity of the channel signal or object signal. That is, since the channel signals or object signals rendered by the internal renderer 1030 and the external renderer 1040 are reproduced in the same space, the weight adjusting unit 1080 can balance them by adjusting the intensity of the sound from each renderer, as in the sketch below.
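  • the sketch below combines both adjustments: the faster path is zero-padded to match the slower one, and per-renderer gains balance loudness (the latency values and gains are assumptions):

```python
import numpy as np

def synchronize(internal_out, external_out,
                internal_latency, external_latency,
                w_internal=1.0, w_external=1.0):
    """Delay-compensate and weight the two renderer outputs.

    Latencies are in samples; the path with the smaller latency is padded
    at the front so both outputs stay time-aligned in the room.
    """
    lag = external_latency - internal_latency
    if lag > 0:                                   # external path is slower
        internal_out = np.concatenate([np.zeros(lag), internal_out])
    elif lag < 0:                                 # internal path is slower
        external_out = np.concatenate([np.zeros(-lag), external_out])
    return w_internal * internal_out, w_external * external_out

a, b = synchronize(np.ones(4), np.ones(4),
                   internal_latency=0, external_latency=2,
                   w_internal=0.8, w_external=1.0)
print(a, b, sep="\n")
```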
  • the speaker information input unit 1090 may receive usable speaker information. If, based on the input speaker information, the position of the channel or object corresponding to a channel signal or object signal is outside the usable speaker area, the distribution unit 1050 may distribute that decoded channel signal or object signal to the external renderer 1040, so that the external renderer 1040 renders the channel signals or object signals that fall outside the usable speaker area, as in the sketch below.
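  • the sketch below illustrates this routing rule; the elevation-based area test is a hypothetical stand-in for an implementation's actual geometry check:

```python
def route_by_speaker_area(items, speaker_elevations_deg, margin_deg=10.0):
    """items: list of (name, elevation_deg); route each to a renderer.

    Anything more than margin_deg above the highest installed speaker is
    treated as outside the usable area and sent to the external renderer
    (e.g., a sound bar or a binaural headphone path).
    """
    limit = max(speaker_elevations_deg) + margin_deg
    internal, external = [], []
    for name, elevation in items:
        (external if elevation > limit else internal).append(name)
    return internal, external

# Installed speakers only in the middle layer (0 degrees elevation):
print(route_by_speaker_area([("dialog", 0.0), ("VoG", 90.0)], [0.0]))
```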
  • FIG. 11 is a flowchart of an audio signal processing method according to another embodiment of the present invention.
  • the user's available speaker environment may include, for example, a general loudspeaker and a sound bar, or headphones that receive a rendered signal through binaural rendering instead of a sound bar.
  • when usable speaker information is received through a UI or the like, it is determined whether the position of the channel or object corresponding to the channel signal or object signal is outside the usable speaker area based on that speaker information. If it is outside the speaker area, the channel signal or object signal is rendered through the external renderer 1040 as described below, and the rendered signal may be reproduced through a playback device such as a sound bar or headphones.
  • the speaker environment to which the audio signal processing method according to the present invention is applied is not limited to the above-described application example, and the audio signal processing method according to the present invention may be applied in various speaker environments.
  • the audio signal processing method may receive an audio bitstream signal including at least one channel signal or object signal, and decode the channel signal or object signal included in the received audio bitstream.
  • also, the metadata of the object signals may be decoded, and the signals may be distributed to the internal renderer 1030 or the external renderer 1040 based on the decoded metadata.
  • one or more channel or object signals of the decoded channel signal or object signal are distributed to the internal renderer 1030 and the external renderer 1040, respectively (S210).
  • in this case, when the decoded channel signal or object signal is outside the usable speaker area, it may be distributed to the external renderer 1040.
  • the distribution unit 1050 may distribute the channel signals or object signals included in the audio bitstream so that they overlap between the internal renderer 1030 and the external renderer 1040, or may distribute them so that they do not overlap. Since this has been described with reference to FIG. 10, a detailed description thereof is omitted.
  • the channel signal or the object signal distributed to the internal renderer 1030 and the external renderer 1040 are respectively rendered (S220).
  • the internal renderer 1030 and the external renderer 1040 may render a channel signal or an object signal based on the VBAP rendering.
  • in the case of MPEG-H, the internal renderer 1030 corresponds to the standard renderer and may be the 3DA renderer 460 illustrated in FIG. 4, while the external renderer 1040 may be a renderer included in a specific product or a separately developed renderer.
  • the rendered channel signal or object signal is reproduced (S230).
  • the channel signal or object signal rendered through the internal renderer 1030 and the external renderer 1040 may be reproduced through separate playback units 1060. That is, the signal rendered by the internal renderer 1030 may be reproduced through a general loudspeaker, and the signal rendered by the external renderer 1040 through a separate playback unit 1060 such as a sound bar or headphones.
  • signals processed independently through the internal renderer 1030 and the external renderer 1040 may be simultaneously reproduced in the same space. In order to simultaneously play in the same space, a process of synchronizing the internal renderer 1030 and the external renderer 1040 is required.
  • the audio signal processing method according to the present invention may further include synchronizing the internal renderer 1030 and the external renderer 1040.
  • the method may further include compensating for a delay time occurring between the internal renderer 1030 and the external renderer 1040.
  • for example, when the external renderer 1040 incurs an additional delay, the two renderers may be synchronized by compensating the time delay of the internal renderer 1030 output accordingly.
  • the method may further include adjusting the intensity of the sound of the channel signal or the object signal by adjusting the output weight of each of the external renderer 1040 and the internal renderer 1030.
  • that is, the output weights of the internal renderer 1030 and the external renderer 1040 are adjusted to balance the sound intensity between the speaker reproducing the signal rendered by the internal renderer 1030 and the speaker reproducing the signal rendered by the external renderer 1040.
  • the audio signal processing apparatus and method according to the exemplary embodiments described with reference to FIGS. 1 to 11 may be implemented by the audio reproducing apparatus 1 shown in FIG. 12, which will be described below.
  • FIG. 12 is a diagram illustrating an example of a device in which an audio signal processing device and method according to the present invention are implemented.
  • the audio reproducing apparatus 1 may include a wired / wireless communication unit 10, a user authentication unit 20, an input unit 30, a signal coding unit 40, a control unit 50, and an output unit 60.
  • the wired / wireless communication unit 10 receives an audio bit string signal through a wired / wireless communication method.
  • the wired / wireless communication unit 10 may include a configuration such as an infrared communication unit, a Bluetooth unit, or a wireless LAN communication unit, and may receive an audio bit string signal through various other communication methods.
  • the user authentication unit 20 receives user information and performs user authentication.
  • the user authentication unit 20 may include one or more of a fingerprint recognition unit, an iris recognition unit, a face recognition unit, and a voice recognition unit. That is, user authentication may be performed by receiving fingerprint, iris, facial contour, or voice information, converting it into user information, and determining whether it matches registered user information.
  • the input unit 30 is an input device for the user to input various types of commands, and may include one or more of a keypad unit, a touch pad unit, and a remote controller unit.
  • the signal coding unit 40 may encode or decode an audio signal, a video signal, or a combination thereof received through the wire / wireless communication unit 10 and output an audio signal of a time domain.
  • the signal coding unit 40 may include an audio signal processing apparatus, and the audio signal processing apparatus according to the present invention may be applied.
  • the controller 50 receives an input signal from the input devices and controls all processes of the signal coding unit 40 and the output unit 60.
  • the output unit 60 outputs an output signal generated by the signal coding unit 40, and may include components such as a speaker unit and a display unit. In this case, when the output signal is an audio signal, the output signal may be output to the speaker, and in the case of a video signal, the output signal may be output through the display.
  • the components shown in FIGS. 4, 6 through 8, 10, and 12 may be implemented as software, or as hardware components such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), and each component performs certain roles.
  • however, 'components' are not limited to software or hardware; each component may be configured to reside in an addressable storage medium or to execute on one or more processors.
  • accordingly, a component may include software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables.
  • Components and the functionality provided within those components may be combined into a smaller number of components or further separated into additional components.
  • an embodiment of the present invention may be implemented in the form of a recording medium including instructions executable by a computer, such as a program module executed by the computer.
  • Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may include both computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Communication media typically includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transmission mechanism, and includes any information delivery media.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The present invention relates to an apparatus for processing an audio signal, the apparatus comprising: a speaker information input unit for receiving information about the speakers a user can use; a receiving unit for receiving an audio bitstream signal comprising a channel signal and/or an object signal; a decoding unit for decoding the channel signal or the object signal included in the audio bitstream signal; an object discrimination unit for discerning whether or not an object corresponding to the object signal is located in a usable speaker region; a rendering unit comprising a channel renderer and an object renderer for rendering the decoded channel signal and the decoded object signal, respectively, and a rendering configuration unit for configuring a rendering method on the basis of the discrimination result; and a synthesis unit for synthesizing the rendered channel signal and the rendered object signal.
PCT/KR2015/000452 2014-03-25 2015-01-15 Apparatus and method for processing an audio signal WO2015147433A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR1020140034595A KR20150111117A (ko) 2014-03-25 2014-03-25 Audio signal processing apparatus and method (오디오 신호 처리 장치 및 방법)
KR10-2014-0034597 2014-03-25
KR1020140034597A KR20150111119A (ko) 2014-03-25 2014-03-25 Audio signal reproducing apparatus and method (오디오 신호 재생 장치 및 방법)
KR10-2014-0034595 2014-03-25

Publications (1)

Publication Number Publication Date
WO2015147433A1 true WO2015147433A1 (fr) 2015-10-01

Family

ID=54195900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2015/000452 WO2015147433A1 (fr) 2014-03-25 2015-01-15 Apparatus and method for processing an audio signal

Country Status (1)

Country Link
WO (1) WO2015147433A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030018477A1 (en) * 2001-01-29 2003-01-23 Hinde Stephen John Audio User Interface
US20090006106A1 (en) * 2006-01-19 2009-01-01 Lg Electronics Inc. Method and Apparatus for Decoding a Signal
US20100092014A1 (en) * 2006-10-11 2010-04-15 Fraunhofer-Geselischhaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a number of loudspeaker signals for a loudspeaker array which defines a reproduction space
KR20080089308A (ko) * 2007-03-30 2008-10-06 한국전자통신연구원 다채널로 구성된 다객체 오디오 신호의 인코딩 및 디코딩장치 및 방법
WO2014014891A1 (fr) * 2012-07-16 2014-01-23 Qualcomm Incorporated Compensation de position de haut-parleur à codage audio 3d hiérarchique

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3833047A1 (fr) * 2019-12-02 2021-06-09 Samsung Electronics Co., Ltd. Appareil électronique et son procédé de commande
US11375265B2 (en) 2019-12-02 2022-06-28 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof

Similar Documents

Publication Publication Date Title
US9646620B1 (en) Method and device for processing audio signal
WO2018056780A1 (fr) Procédé et appareil de traitement de signal audio binaural
JP5174527B2 (ja) 音像定位音響メタ情報を付加した音響信号多重伝送システム、制作装置及び再生装置
WO2015147435A1 (fr) Système et procédé de traitement de signal audio
WO2014175669A1 (fr) Procédé de traitement de signaux audio pour permettre une localisation d'image sonore
JP5198567B2 (ja) ビデオ通信方法、システムおよび装置
WO2015105393A1 (fr) Procédé et appareil de reproduction d'un contenu audio tridimensionnel
WO2015147619A1 (fr) Procédé et appareil pour restituer un signal acoustique, et support lisible par ordinateur
WO2014171706A1 (fr) Procédé de traitement de signal audio utilisant la génération d'objet virtuel
WO2014175668A1 (fr) Procédé de traitement de signal audio
JP2010258604A (ja) オーディオ処理装置及びオーディオ処理方法
WO2015037905A1 (fr) Système de lecture à images multi-vues et son stéréophonique en 3d comportant un dispositif d'ajustement de son stéréophonique et procédé correspondant
WO2014175591A1 (fr) Procédé de traitement de signal audio
Jot et al. Beyond surround sound-creation, coding and reproduction of 3-D audio soundtracks
EP3494712A1 (fr) Appareil électronique, et procédé de commande associé
KR102148217B1 (ko) 위치기반 오디오 신호처리 방법
WO2019035622A1 (fr) Procédé et appareil de traitement de signal audio à l'aide d'un signal ambiophonique
WO2015147434A1 (fr) Dispositif et procédé de traitement de signal audio
JP2009260458A (ja) 音響再生装置、および、これを含む映像音声視聴システム
WO2014021586A1 (fr) Procédé et dispositif de traitement de signal audio
WO2015147433A1 (fr) Appareil et procédé pour traiter un signal audio
KR101949756B1 (ko) 오디오 신호 처리 방법 및 장치
JP2008301149A (ja) 音場制御方法、音場制御プログラム、音声再生装置
WO2013073810A1 (fr) Appareil d'encodage et appareil de décodage prenant en charge un signal audio multicanal pouvant être mis à l'échelle, et procédé pour des appareils effectuant ces encodage et décodage
US20190387272A1 (en) Display device and method of controlling display device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15768439

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15768439

Country of ref document: EP

Kind code of ref document: A1