KR20100087680A

KR20100087680A - A method and an apparatus for processing an audio signal

Info

Publication number: KR20100087680A
Application number: KR1020100007633A
Authority: KR
Inventors: 오현오; 정양원
Original assignee: 엘지전자 주식회사
Priority date: 2009-01-28
Filing date: 2010-01-27
Publication date: 2010-08-05

Abstract

PURPOSE: An audio signal processing method and apparatus are provided to determine whether an output signal is a stereo object signal using a relationship identifier and a downmix channel level difference. CONSTITUTION: A downmix signal including one or more object signals and a bitstream including object information and a downmix channel level difference are received (S110). If the downmix signal includes two or more object signals, a relationship identifier indicating whether two of the object signals are related to each other is extracted from the bitstream (S130). Whether the two object signals correspond to stereo object signals is checked using the downmix channel level difference and the relationship identifier (S150). Mix information including a first component and a second component is generated using a single user input (S165). At least one of downmix processing information and multichannel information is generated based on the object information and the mix information.

Description

Audio signal processing method and apparatus {A METHOD AND AN APPARATUS FOR PROCESSING AN AUDIO SIGNAL}

The present invention relates to a method and apparatus for processing an audio signal, and more particularly, to a method and apparatus for processing an audio signal capable of processing an audio signal received through a digital medium, a broadcast signal, and the like.

In the process of downmixing an audio signal including a plurality of objects into a mono or stereo signal to generate a downmix signal, parameters are extracted from the objects. These parameters may be used in the process of decoding the downmixed signal. The panning and gain of each object may be controlled by the user's selection in addition to the parameters.

Panning and gain of the objects included in the downmix signal may be controlled by the user's selection. However, it is cumbersome for a user to control every object signal directly, and it may be difficult for the user to reproduce the optimal state of an audio signal including a plurality of objects, compared with control by an expert.

In addition, when the user adjusts the panning and gain of the objects, it is necessary to determine whether the output signal is a stereo object signal and, in the case of a stereo object signal, to let the user control the stereo object signal using one user input.

The present invention has been made to solve the above problems, and an object of the present invention is to provide an audio signal processing method and apparatus for identifying whether a downmix signal is a stereo object signal by using a relationship identifier and downmix channel level difference information.

Another object of the present invention is to provide an audio signal processing method and apparatus capable of controlling panning and gain of objects based on a user's selection.

Another object of the present invention is to provide an audio signal processing method and apparatus that, when the panning and gain of the objects are controlled based on the user's selection and the output signal is a stereo object signal, can control the panning and gain of the object using one user input.

An audio signal processing method of the present invention includes: receiving a downmix signal including at least one object signal and a bitstream including object information and a downmix channel level difference; if the downmix signal includes at least two object signals, extracting from the bitstream a relationship identifier indicating whether two of the at least two object signals are related to each other; identifying whether the two object signals correspond to stereo object signals using the downmix channel level difference and the relationship identifier; generating mix information including a first component and a second component using a single user input; and generating at least one of downmix processing information and multichannel information based on the object information and the mix information, wherein the stereo object signals include a left object signal and a right object signal, the first component is applied to the left object signal of the stereo object signals to output a first channel, the second component is applied to the right object signal of the stereo object signals to output a second channel, and the first component and the second component are in a negative correlation.

The left object signal may be mapped to the left channel of the downmix signal, and the right object signal may be mapped to the right channel.

The identifying may include identifying whether two of the at least two object signals are related to each other based on the relationship identifier; If two object signals are related to each other, identifying whether the downmix channel level differences of the two object signals have a maximum or minimum value; And if the two object signals have the maximum value or the minimum value of the downmix channel level differences, determining that the two object signals correspond to the stereo object signals.

The first component and the second component may be used to jointly control the stereo object signals.

When the first component is large, the second component may be small, or when the first component is small, the second component may be large.

The mix information may further include a third component and a fourth component, wherein the third component is applied to the left object signal of the stereo object signal to output the second channel, the fourth component is applied to the right object signal of the stereo object signal to output the first channel, and the third component and the fourth component may be zero.

The method may further include: processing the downmix signal using the downmix processing information; and generating a multichannel signal based on the processed downmix signal and the multichannel information.

In addition, the audio signal processing apparatus of the present invention includes: a reception unit that receives a downmix signal including at least one object signal and a bitstream including object information and a downmix channel level difference and that, when the downmix signal includes at least two object signals, extracts from the bitstream a relationship identifier indicating whether two of the at least two object signals are related to each other; an identification unit that identifies whether the two object signals correspond to stereo object signals using the downmix channel level difference and the relationship identifier; a mix information generation unit that generates mix information including a first component and a second component using a single user input; and an information generation unit that generates at least one of downmix processing information and multichannel information based on the object information and the mix information, wherein the stereo object signals include a left object signal and a right object signal, the first component is applied to the left object signal of the stereo object signals to output a first channel, the second component is applied to the right object signal of the stereo object signals to output a second channel, and the first component and the second component may be in a negative correlation.

The present invention provides the following effects and advantages.

First, whether the output signal is a stereo object signal may be identified using the relationship identifier and downmix channel level difference information.

Second, the panning and gain of the objects can be adjusted based on the user's selection.

Third, when the panning and gain of the objects are adjusted and the output signal is a stereo object signal, the panning and gain of the object may be controlled using one user input.

FIG. 1 shows an object encoder according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of an audio signal processing apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram showing a configuration of an audio signal processing apparatus including a user interface according to an embodiment of the present invention.
FIG. 4 is a flowchart of an audio signal processing method according to an embodiment of the present invention.
FIG. 5 illustrates a method of displaying a user input using a user interface according to an embodiment of the present invention.
FIG. 6 illustrates an object adjusting method using a user interface according to an embodiment of the present invention in the case of a mono output.
FIG. 7 illustrates a method for displaying user input using a user interface according to an embodiment of the present invention in the case of (a) stereo, (b) binaural, and (c) multichannel output.
FIG. 8 illustrates an object control method using a user interface according to an embodiment of the present invention including an extended mode in a user interface.
FIG. 9 illustrates a user interface including an indicator capable of displaying an object level according to an embodiment of the present invention.
FIG. 10 illustrates a method of setting an initial position of a level fader in a user interface according to an embodiment of the present invention.
FIG. 11 illustrates a method of setting an initial position of a panning knob in a user interface according to an embodiment of the present invention.
FIG. 12 is a view illustrating a schematic configuration of a product in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.
FIGS. 13A and 13B are diagrams illustrating the relationship between products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Terms and words used in the specification and claims should not be construed as limited to their conventional or dictionary meanings; based on the principle that an inventor may appropriately define the concept of a term in order to best describe the invention, they should be interpreted with the meaning and concept corresponding to the technical idea of the present invention. Therefore, the embodiments described in the specification and the configurations shown in the drawings are only the most preferred embodiments of the present invention and do not represent all of its technical idea, and it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

In particular, in the present specification, information is a term encompassing values, parameters, coefficients, elements, and the like, and in some cases, the meaning may be interpreted differently. However, the present invention is not limited thereto.

FIG. 1 shows an object encoder according to an embodiment of the present invention.

Referring to FIG. 1A, an object encoder 100 according to an embodiment of the present invention receives a plurality of object signals (Object 1, Object 2, ..., Object 4) and generates a mono or stereo downmix signal (DMX).

FIG. 1B shows the object encoder 100A in the case where the plurality of object signals are, for example, vocal, piano, violin, and cello, and FIG. 1C shows the object encoder 100B in the case where two object signals (piano_L, piano_R) among the plurality of object signals are stereo object signals.

Referring to FIG. 1C, the object encoder 100B receives a plurality of object signals (vocal, piano_L, piano_R, and cello) and generates a bitstream including a relationship identifier indicating whether two object signals (piano_L, piano_R) among the plurality of object signals are related to each other, and downmix channel level difference (DCLD) information indicating the gain difference between the portions of an object distributed to the left channel and the right channel when the downmix signal is a stereo downmix signal.

The bitstream may further include object information indicating attributes of the objects. The object information may include object level information indicating the level of an object and downmix gain information (DMG) indicating the gain applied to the object when the downmix signal is generated. When the downmix signal is mono, the downmix gain information may be the gain with which a specific object is applied to the mono channel; when the downmix signal is stereo, it may correspond to the sum of the gain of the specific object for the left channel and its gain for the right channel. In contrast, the downmix channel level difference information described above may correspond to the ratio of the gain for the left channel to the gain for the right channel.
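The relation between the two per-object quantities can be sketched in a few lines. The text only says that DMG corresponds to a sum of the channel gains and DCLD to their ratio; the dB expressions below (energy sum and amplitude ratio) and the clamping to the ±150 dB example range are assumptions made for illustration, and the function names are hypothetical.

```python
import math

DCLD_LIMIT_DB = 150.0  # example extreme value quoted in the text; treated here as a clamp


def dmg_db(gain_left: float, gain_right: float) -> float:
    """Downmix gain in dB (assumed: energy sum of the two channel gains)."""
    energy = gain_left ** 2 + gain_right ** 2
    if energy == 0.0:
        raise ValueError("object is absent from both downmix channels")
    return 10.0 * math.log10(energy)


def dcld_db(gain_left: float, gain_right: float) -> float:
    """Downmix channel level difference in dB (assumed: left/right amplitude ratio)."""
    if gain_left == 0.0 and gain_right == 0.0:
        raise ValueError("object is absent from both downmix channels")
    if gain_right == 0.0:
        return DCLD_LIMIT_DB   # object only in the left channel
    if gain_left == 0.0:
        return -DCLD_LIMIT_DB  # object only in the right channel
    value = 20.0 * math.log10(gain_left / gain_right)
    return max(-DCLD_LIMIT_DB, min(DCLD_LIMIT_DB, value))
```

Under these assumptions, an object mixed only into the left channel gets the maximum DCLD, which is the property the decoder later relies on.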

FIG. 2 is a block diagram showing the configuration of an audio signal processing apparatus according to an embodiment of the present invention.

Referring to FIG. 2, an audio signal processing apparatus 200 according to an exemplary embodiment of the present invention includes a reception unit 210, an identification unit 220, a mix information generation unit 230, an information generation unit 240, a downmix processing unit 250, and a multichannel decoder 260.

The reception unit 210 receives a downmix signal including at least one object from the object encoders 100, 100A, and 100B, and a bitstream including a relationship identifier and downmix level difference information.

Although the present invention is illustrated as separately receiving the downmix signal and the bitstream, this is to help understanding of the present invention, and the downmix signal may be included in one bitstream and transmitted.

When the received downmix signal includes at least two object signals, the reception unit 210 extracts a relationship identifier and downmix channel level difference information from the bitstream and outputs them to the identification unit 220.

The relationship identifier indicates whether two object signals among at least two object signals included in the downmix signal are related to each other.

The identification unit 220 uses the relationship identifier and the downmix channel level difference information to identify whether two object signals included in the downmix signal correspond to a stereo object signal.

The relationship identifier bsrelatedTo[i][j] may correspond to information indicating whether there is a relationship between the i-th object and the j-th object, so it is extracted only when there are two or more objects. The relationship identifier may be, for example, 1-bit information. In that case, a relationship identifier of '1' may indicate that the two object signals are related to each other, and a relationship identifier of '0' may indicate that they are not.

The following table shows an example in which a relationship identifier is transmitted when there are five objects in total and the second object (i = 1) and the third object (j = 2) have a relationship with each other.

[Table 1]

bsrelatedTo[i][j] | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
j = 0             |   -   |   -   |   -   |   -   |   -
j = 1             |   0   |   -   |   -   |   -   |   -
j = 2             |   0   |   1   |   -   |   -   |   -
j = 3             |   0   |   0   |   0   |   -   |   -
j = 4             |   0   |   0   |   0   |   0   |   -

Here, i and j are object indices.

As shown in Table 1, a relationship identifier may be transmitted for i from 0 to 4 and j from i + 1 to 4. The entries with j from 0 to i are excluded because they would be duplicates.
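The transmission pattern of Table 1 can be sketched as a simple enumeration of index pairs. This is only an illustration of which (i, j) entries carry a bit; the function names are hypothetical and the actual bitstream syntax is not shown in this document.

```python
def transmitted_pairs(num_objects: int) -> list[tuple[int, int]]:
    """List the (i, j) pairs with j > i, i.e. the entries that are transmitted."""
    return [(i, j) for i in range(num_objects) for j in range(i + 1, num_objects)]


def build_relation_table(num_objects: int,
                         related_pairs: list[tuple[int, int]]) -> dict[tuple[int, int], int]:
    """Return a 1-bit relationship identifier for every transmitted pair."""
    table = {pair: 0 for pair in transmitted_pairs(num_objects)}
    for i, j in related_pairs:
        if i == j:
            raise ValueError("an object cannot be paired with itself")
        # Store each pair once, with the smaller index first, to avoid duplicates.
        table[(min(i, j), max(i, j))] = 1
    return table


# Five objects, with the second (index 1) and third (index 2) related, as in Table 1.
example_table = build_relation_table(5, [(1, 2)])
```

For five objects this yields ten transmitted entries, matching the ten 0/1 cells of Table 1.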

The stereo object signals are object signals including a left object signal and a right object signal, wherein the left object signal is mapped to the left channel and the right object signal is mapped to the right channel.

For example, if the downmix signal is a two-channel signal including object signals A and B (e.g., A may be piano_L and B may be piano_R) and these are stereo object signals, object A may be mapped to the left channel of the downmix signal and object B to the right channel of the downmix signal. Since the object signal A is mapped almost entirely to the left channel, the downmix channel level difference for the object signal A has the maximum value (e.g., +150 dB), and since the object signal B is mapped almost entirely to the right channel, the downmix channel level difference for the object signal B has the minimum value (e.g., -150 dB). (Of course, depending on the definition of DCLD, the DCLD of the object signal A may instead have the minimum value and the DCLD of the object signal B the maximum value.)

The decoder may use this property to determine, based on the transmitted DCLD value, whether an object is part of a stereo object (its left channel or its right channel). Specifically, if the downmix channel level differences of two related objects (a pair) have the maximum value (e.g., +150 dB) or the minimum value (e.g., -150 dB), the two object signals may be identified as corresponding to a stereo object signal (left object or right object). Furthermore, the object having the maximum downmix channel level difference may be identified as the left object of the stereo object, and the object having the minimum downmix channel level difference as the right object. (Of course, as mentioned earlier, depending on the definition of DCLD, the inverse may hold.)
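The identification rule described above can be sketched as follows. The ±150 dB thresholds are the example values from the text; requiring one object at the maximum and the other at the minimum is a slightly stricter reading chosen for this sketch, and the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

DCLD_MAX_DB = 150.0   # example maximum from the text
DCLD_MIN_DB = -150.0  # example minimum from the text


def identify_stereo_pair(i: int, j: int, related: bool,
                         dcld_db: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return (left_index, right_index) if objects i and j form a stereo object.

    Assumes the DCLD convention in which the left object carries the maximum
    value; with the opposite convention the roles would be swapped.
    """
    if not related:
        return None  # unrelated objects are never treated as a stereo pair
    if dcld_db[i] >= DCLD_MAX_DB and dcld_db[j] <= DCLD_MIN_DB:
        return (i, j)
    if dcld_db[j] >= DCLD_MAX_DB and dcld_db[i] <= DCLD_MIN_DB:
        return (j, i)
    return None  # related, but not hard-panned to opposite channels
```

For objects (vocal, piano_L, piano_R, cello) with DCLD values such as [0, 150, -150, 3], the pair (1, 2) would be reported with piano_L as the left object.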

When at least two object signals correspond to a stereo object signal, the mix information generation unit 230 receives a single user input for the left object and the right object as a whole, and uses that single user input to generate mix information including a first component and a second component. The single user input works as follows. The left object and the right object of a stereo object may each be displayed as independent objects, each with its own adjustable interface (see FIG. 5, described later). However, only one of the two may be adjusted at a time. Specifically, when there is a user input on the left object, the setting of the right object is determined automatically. Conversely, when there is a user input on the right object, the user cannot separately enter an input for the left object.

Because of the characteristics of a stereo object, adjusting the level (and panning) of the left object and the right object separately severely distorts the sound quality; the single user input is therefore a means of adjusting both at once.

Meanwhile, the first component and the second component are used to control the stereo object signal.

On the other hand, when at least two object signals do not correspond to a stereo object signal, the mix information generation unit 230 receives a separate user input for each of the object signals and uses each user input to generate the mix information.

The mix information is information generated based on object position information, object gain information, playback configuration information, and the like. The object position information is information input by the user to control the position or panning of each object. The object gain information is information input by the user to control the gain of each object. The playback configuration information includes the number of speakers, the positions of the speakers, and ambient information (such as the virtual positions of the speakers); it may be input by the user, stored in advance, or received from another device.

Meanwhile, referring to FIG. 2, the case where the mix information is input by the user is described as an example, but the present invention is not limited thereto. That is, the mix information may be information included in the bitstream and input to the information generator 240 or may be information separately input from the outside.

The information generation unit 240 may generate downmix processing information and multichannel information based on the bitstream received from the reception unit 210 and the mix information received from the mix information generation unit 230.

The information generation unit 240 may generate downmix processing information for preprocessing a downmix signal using the mix information and the bitstream.

Thereafter, the downmix processing information may be input to the downmix processing unit 250 to perform panning or adjust the gain of the object by changing a channel including the object included in the downmix signal.

For example, when the downmix signal is stereo and an object signal is present in both the left and right channels, the object may be panned or its gain adjusted; when the object signal is located in only one of the left and right channels, it may be repositioned to the opposite side.

Meanwhile, when the downmix signal is mono, the object gain may be adjusted.

The downmix processing unit 250 may receive the downmix signal from the reception unit 210 and the downmix processing information from the information generation unit 240, and may convert the downmix signal into a subband domain signal using a subband analysis filter bank. The downmix processing unit 250 may generate a processed downmix signal using the downmix signal and the downmix processing information. In doing so, it may preprocess the downmix signal in order to control object panning and object gain.

On the other hand, when the number of final output channels of the audio signal is greater than the number of channels of the downmix signal, the information generation unit 240 may further use the bitstream received from the reception unit 210 and the mix information received from the mix information generation unit 230 to generate multichannel information for upmixing the downmix signal.

The multichannel information may include channel level information, channel correlation information, and channel prediction coefficients.

The multichannel information is output to the multichannel decoder 260, and the multichannel decoder 260 performs upmixing using the processed downmix signal and the multichannel information to finally generate a multichannel signal.

Meanwhile, the processed downmix signal may be output directly through the speakers. For this purpose, the downmix processing unit 250 may apply a synthesis filter bank to the processed subband domain signal and output a time-domain PCM signal.

FIG. 3 is a block diagram showing a configuration of an audio signal processing apparatus including a user interface according to an embodiment of the present invention.

Referring to FIG. 3, an audio signal processing apparatus 300 according to an exemplary embodiment of the present invention may include a reception unit 310, an identification unit 320, a mix information generation unit 330, an information generation unit 340, a downmix processing unit 350, a multichannel decoder 360, and a user interface 370.

The reception unit 310, the identification unit 320, the mix information generation unit 330, the information generation unit 340, the downmix processing unit 350, and the multichannel decoder 360 of FIG. 3 have the same functions as the reception unit 210, the identification unit 220, the mix information generation unit 230, the information generation unit 240, the downmix processing unit 250, and the multichannel decoder 260 of FIG. 2, so a description thereof is omitted.

The user interface 370 receives a user input for adjusting the level of at least one object, and the user input is input to the mix information generation unit 330 to output mix information estimated by the user input.

FIG. 4 is a flowchart of an audio signal processing method according to an embodiment of the present invention.

Referring to FIG. 4, an audio signal processing method according to an embodiment of the present invention first receives a bitstream including a downmix signal, a relationship identifier, and a DCLD (S110).

Thereafter, it is checked whether the downmix signal includes at least two object signals (S120), and when the downmix signal includes at least two object signals, a relationship identifier is obtained from the received bitstream (S130).

Subsequently, it is identified whether two object signals of at least two object signals are stereo object signals using the relationship identifier and the DCLD (S140).

Subsequently, if step S140 identifies a stereo object signal, the stereo object is displayed through a user interface, a single user input for the stereo object signal is received (S160), and mix information is generated using the single user input (S165).

On the other hand, if step S140 does not identify a stereo object signal, each object is displayed through the user interface, a separate user input is received for each object signal (S170), and mix information is generated using each of the user inputs (S175).
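The branch between steps S160 and S170 can be sketched as a grouping of objects into user-input groups: one joint input per identified stereo pair and one input per remaining object. The function and argument names are hypothetical, and the stereo pairs are assumed to have been identified already in step S140.

```python
def plan_controls(object_names: list[str],
                  stereo_pairs: list[tuple[int, int]]) -> list[tuple[str, ...]]:
    """Group objects so that each group receives exactly one user input."""
    # Map every member of a stereo pair to its (left, right) index pair.
    pair_of: dict[int, tuple[int, int]] = {}
    for left, right in stereo_pairs:
        pair_of[left] = (left, right)
        pair_of[right] = (left, right)

    controls: list[tuple[str, ...]] = []
    handled: set[int] = set()
    for index, name in enumerate(object_names):
        if index in handled:
            continue  # already covered by its partner's joint control
        if index in pair_of:
            pair = pair_of[index]
            controls.append(tuple(object_names[k] for k in pair))  # single joint input
            handled.update(pair)
        else:
            controls.append((name,))  # individual input
    return controls
```

With the objects of FIG. 1C, this gives three input groups: vocal alone, piano_L together with piano_R, and cello alone.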

FIG. 5 illustrates a method of displaying a user input using a user interface according to an embodiment of the present invention.

Referring to FIG. 5, the user interface may include a panning knob for adjusting panning of objects including a stereo object and a level fader for adjusting gain of an object.

As described above with reference to FIGS. 2 and 3, the objects may include stereo objects (e.g., piano_L and piano_R). When the user adjusts the level fader (and panning knob) of one of the stereo objects (the left object or the right object), the level (and panning) of the other object is determined automatically as described above, and the user interface may show the level fader (and panning knob) of the other object moving automatically.

The user interface may display the adjusted level and/or panning of each object, to which the mix information generated from the user input has been applied, together with metadata representing the characteristics of the object.

FIG. 6 illustrates an object adjusting method using a user interface according to an embodiment of the present invention in the case of a mono output. If the output is mono, no panning knob is needed to adjust the panning of the object, so only the level of the object needs to be adjusted.

FIG. 6A illustrates adjusting the level of the object by moving the level fader up and down using the level fader, and FIG. 6B illustrates adjusting the level of the object by rotating the level knob using the level knob. Furthermore, the level fader may be implemented to move up and down (or straight line) as shown in FIG. 6A, but may of course be implemented to move on a curve or to rotate.

In FIG. 6A, for example, assume that the parameter from the level fader for the vocal object is L_i, the parameter from the panning knob is P_i, and the parameters are given on a dB scale.

In this case, in the case of the mono output, the mix information generated by the mix information generating unit 330 may be determined by Equation 1 or Equation 2.

[Equation 1]

[Equation 2]

Here, the subscript N-1 in m_{N-1,M} denotes an object. Therefore, in Equations 1 and 2, the mono output includes N objects (0, ..., N-1). In addition, since in Equation 2 the parameters exist only in the third row of the matrix, which corresponds to the center channel, and the remaining rows contain no parameters, Equation 2 represents the mix information for mono output just as Equation 1 does. The mix information m_{i,M} is obtained as in Equation 3 below.

[Equation 3]

In order to generate a multichannel signal from a downmix signal including at least one object signal, initialized mix information must be specified. The information may be input by the user, but may be provided by preset information or default settings indicating various modes selectable by the user according to the characteristics of the audio signal or the listening environment.

FIG. 7 illustrates a method for displaying user input using a user interface according to an embodiment of the present invention in the case of (a) stereo, (b) binaural, and (c) multichannel output.

In the case of a stereo output, a panning knob for adjusting the panning of an object is shown in FIG. 7A. In this case, the mix information in matrix form generated by the mix information generation unit 330 is defined by Equation 4 or Equation 5 below.

[Equation 4]

[Equation 5]

Here, N-1 represents an object and L and R represent a channel.

In addition, the mix information m_{i,L} and m_{i,R} can be obtained by Equation 6.

[Equation 6]

The binaural output is similar to the stereo output, except that the panning knob is interpreted differently. As shown in FIG. 7B, in the case of binaural output, the indicator displayed around the panning knob may include different directions corresponding to the HRTF DB. In FIG. 7B, it is assumed that the HRTF DB includes four different labels (P1, P2, P3, and P4).

In the case of binaural output, the mix information may be in the form of a matrix of L × N having the number of virtual positions L as shown in Equation (7).

[Equation 7]

Meanwhile, each value included in the matrix may be obtained as follows through Equation 8.

[Equation 8]

Here, VP_i is a predetermined panning value at the i-th virtual position.

Referring to FIG. 7C, the case of the multichannel output is similar to the case of the binaural output shown in FIG. 7B except that the predetermined indicia are 5.1 channels.

As can be inferred from FIG. 7C, in the case of multichannel output, the user is assumed to place one object at one spatial position.

However, if the user wishes to render an object (e.g., applause or background noise) so that it plays from all speakers, this is not possible with the user interface of FIG. 7C.

For example, in the case of stereo output, a panning knob can be placed at the center position so that an object can be played from all speakers.

In the case of a multichannel output, the mix information may be in a matrix form as shown in Equation (9).

[Equation 9]

Each row in the matrix represents an output channel, and each column represents an object. Accordingly, the output signal obtained through the matrix includes N objects and consists of the six channels of a 5.1 layout (Lf, Rf, C, LFE, Ls, and Rs).

On the other hand, each value included in the matrix can be obtained as follows through the equation (10).

[Equation 10]

Here, y and z are adjacent channels.

For example, assume that P_C, P_Lf, P_Rf, P_Ls, and P_Rs are 0 dB, -10 dB, 10 dB, -20 dB, and 20 dB, respectively, and that the panning value input by the user for the i-th object is 15 dB. Substituting these values into Equation 10 yields Equation 11.

[Equation 11]

Accordingly, it can be seen from Equation 11 that the user intended to render the i-th object in the middle of the right front and right surround speakers.
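The selection of the adjacent channels y and z in this example can be sketched as a lookup of the two channel panning values that bracket the user's panning value. This sketch does not reproduce Equation 10; the linear position between the two channels is only an illustrative measure, and the function name is hypothetical.

```python
# Channel panning values from the example in the text, in dB.
CHANNEL_PAN_DB = {"C": 0.0, "Lf": -10.0, "Rf": 10.0, "Ls": -20.0, "Rs": 20.0}


def bracketing_channels(pan_db: float,
                        channel_pan_db: dict[str, float] = CHANNEL_PAN_DB
                        ) -> tuple[str, str, float]:
    """Return (y, z, position) where y and z are the adjacent channels around pan_db.

    position is 0.0 at channel y and 1.0 at channel z (linear in the dB
    panning parameter, which is an assumption of this sketch).
    """
    ordered = sorted(channel_pan_db.items(), key=lambda item: item[1])
    for (y, pan_y), (z, pan_z) in zip(ordered, ordered[1:]):
        if pan_y <= pan_db <= pan_z:
            return y, z, (pan_db - pan_y) / (pan_z - pan_y)
    raise ValueError("panning value lies outside the range of the channel layout")


example = bracketing_channels(15.0)
```

For the 15 dB input of the example, the bracketing channels are Rf and Rs with a position of 0.5, in line with the statement that the object is rendered midway between the right front and right surround speakers.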

The user can adjust the objects one by one. However, as shown in FIG. 5, in the case of including stereo objects (piano_L, piano_R), the level and panning of the two objects must be adjusted jointly.

The left channel of the stereo object may be mixed into the right channel of the downmix signal in the encoding step, and the left channel of the stereo object may be rendered into the right channel of the processed output downmix signal (cross rendering). However, because each channel of a stereo object shares the same attribute, it is appropriate to limit cross rendering in most applications.

In this case, when the i-th object is a right channel object, the rendering parameters M_{i,Lf} and M_{i,Ls} are always 0 (zero), and when the j-th object is a left channel object, the rendering parameters M_{j,Rf} and M_{j,Rs} are always zero.
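The restriction on cross rendering can be sketched as zeroing the opposite-side rendering parameters of each stereo object channel. The dictionary representation of one object's rendering parameters and the function name are assumptions made for this illustration.

```python
LEFT_SIDE = ("Lf", "Ls")
RIGHT_SIDE = ("Rf", "Rs")


def restrict_cross_rendering(params: dict[str, float], side: str) -> dict[str, float]:
    """Zero the rendering parameters that would send an object to the opposite side.

    params maps output channel names to rendering parameters for one object;
    side is "left" for a left channel object and "right" for a right channel object.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    blocked = RIGHT_SIDE if side == "left" else LEFT_SIDE
    # Channels not on the opposite side (including C) are left unchanged.
    return {ch: (0.0 if ch in blocked else value) for ch, value in params.items()}
```

The input dictionary is not modified; a new dictionary with the blocked channels set to zero is returned.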

In the stereo object shown in FIG. 5, it is assumed that the level of the piano_L object is adjusted by L_i on the dB scale, and the panning of the piano_L object is adjusted by θ_i. In this case, L_i and θ_i may be mapped to a rendering parameter according to an amplitude panning law.

As a result, equation (12) holds.

[Equation 12]

Here, [symbol] is the gain ratio between two adjacent speakers obtained from θ_i.

As described above, a stereo object can be controlled through a single module of the user interface. For example, the level of the stereo object can be adjusted with the one level fader provided for the piano_L object of FIG. 5.

Considering the characteristics of Equation 12 and the stereo object, the mix information in the form of a matrix for the stereo object is shown in Equation 13.

[Equation 13]

That is, in the case of a stereo object signal, the mix information includes a first component and a second component. The first component may be applied to the left object signal of the stereo object signal to output a first channel, and the second component may be applied to the right object signal of the stereo object signal to output a second channel.

The first component and the second component are jointly used to control the stereo object signal, and the first component and the second component are in negative correlation. In other words, when the first component becomes larger, the second component becomes smaller, and when the first component becomes smaller, the second component becomes larger.

In addition, in the case of a stereo object signal, the mix information includes a third component and a fourth component. The third component may be applied to the left object signal of the stereo object signal to output the second channel, and the fourth component may be applied to the right object signal of the stereo object signal to output the first channel. The third component and the fourth component have a value of "0".

Meanwhile, the first channel may be a left channel and the second channel may be a right channel.
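
The joint control described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the single user input is taken to be a balance value in [-1, 1], and a constant-power (sine/cosine) law is assumed as the amplitude panning law. Neither choice is specified by the text, and the function name `stereo_mix_components` is hypothetical. The first and second components move in opposite directions, and the cross-rendering (third and fourth) components are fixed at zero.

```python
import math


def stereo_mix_components(balance):
    """Map one user input to the four mix components of a stereo object.

    balance: assumed single user input in [-1.0, 1.0]
             (-1 = fully toward the first channel, +1 = fully toward the second).
    Returns a 2x2 matrix [[first, fourth], [third, second]] whose rows are the
    output channels (first, second) and whose columns are the (left, right)
    object signals.
    """
    if not -1.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [-1, 1]")
    angle = (balance + 1.0) * math.pi / 4.0  # ranges from 0 to pi/2
    first = math.cos(angle)   # left object signal -> first channel
    second = math.sin(angle)  # right object signal -> second channel
    third = 0.0               # left object signal -> second channel (cross rendering)
    fourth = 0.0              # right object signal -> first channel (cross rendering)
    return [[first, fourth], [third, second]]
```

Moving the assumed balance input from -0.5 to 0.5 lowers the first component and raises the second, which is the negative correlation described above.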

FIG. 8 illustrates an object control method using a user interface that includes an extended mode, according to an embodiment of the present invention.

Referring to FIG. 8, FIG. 8A illustrates a normal mode of the user interface, and FIG. 8B illustrates an extended manual mode. When the user selects the manual portion on the user interface shown in FIG. 8A, the user can manually select the desired rendering level for each output channel, as shown in FIG. 8B.

FIG. 9 illustrates a user interface including an indicator capable of displaying an object level according to an embodiment of the present invention.

Referring to FIG. 9, a user interface according to an embodiment of the present invention includes an indicator at the upper end of a panning knob to display an object level, and the indicator may display the object level by changing color. Although displaying the object level through a color change is described here, the present invention is not limited thereto.

FIG. 10 illustrates a method of setting an initial position of a level fader in a user interface according to an embodiment of the present invention.

An initial position may be set for the level fader according to object gain information (DMG) indicating the gain applied to an object when the downmix signal is generated. FIG. 10A illustrates a method of setting the initial position to the middle of the level fader by reflecting the current level (e.g., 3 dB) of an object included in the downmix signal, and FIG. 10B illustrates a method of setting the initial position using the current level (e.g., 3 dB).

Because it is easier for the user to control the object level relative to its current level, the initial position may be set for the level fader according to the object gain information, as shown in FIGS. 10A and 10B.

In this case, the rendering parameter is calculated by reflecting the current level of the object as shown in Equation (14).

[Equation 14]
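
The idea of controlling an object relative to its current level can be illustrated as follows. This is a minimal Python sketch and not a reconstruction of Equation 14: it assumes that the fader value and the current level (derived from DMG) are both expressed in dB and that the adjustment is their difference, converted to a linear amplitude gain. The function name `relative_gain` is hypothetical.

```python
def relative_gain(fader_db, current_level_db):
    """Linear gain that moves an object from its current level to the fader level.

    fader_db:         level selected on the fader, in dB.
    current_level_db: level already applied in the downmix (from DMG), in dB.
    """
    offset_db = fader_db - current_level_db
    return 10.0 ** (offset_db / 20.0)  # dB-to-amplitude conversion


# The fader starts at the object's current level, so an untouched fader is unity gain.
initial_fader_db = 3.0
print(relative_gain(initial_fader_db, current_level_db=3.0))  # 1.0
```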

Meanwhile, when the downmix signal is a stereo downmix signal, an initial position may be set for the panning knob according to downmix channel level difference information indicating the gain difference of an object between the left channel and the right channel.

FIG. 11 illustrates a method of setting an initial position of a panning knob in a user interface according to an embodiment of the present invention.

If the downmix channel level difference is 0 dB, the initial position of the panning knob can be set to the neutral position as shown in FIG. 11A, and if the downmix channel level difference is the maximum value (e.g., 150 dB) or the minimum value (e.g., -150 dB), the initial position can be set to the left (or right) end position.
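
The initial knob placement can be sketched as follows. This is a minimal Python illustration under stated assumptions: the knob position is normalized to [0, 1] with 0.5 as the neutral position, the maximum downmix channel level difference is assumed to map to the left end, and intermediate values are assumed to map linearly. The text fixes only the neutral and end positions. The function name `initial_knob_position` is hypothetical.

```python
DCLD_MAX_DB = 150.0   # example maximum from the text
DCLD_MIN_DB = -150.0  # example minimum from the text


def initial_knob_position(dcld_db):
    """Map a downmix channel level difference (dB) to a knob position in [0, 1].

    0.0 is the left end, 0.5 the neutral position, and 1.0 the right end.
    The choice of which end matches the maximum value, and the linear mapping
    in between, are assumptions made for this sketch.
    """
    clamped = min(max(dcld_db, DCLD_MIN_DB), DCLD_MAX_DB)
    return 0.5 - clamped / (2.0 * DCLD_MAX_DB)
```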

FIG. 12 is a view showing a schematic configuration of a product in which the audio signal processing apparatus according to an embodiment of the present invention is implemented, and FIGS. 13A and 13B are diagrams showing the relationship between products in which the audio signal processing apparatus according to an embodiment of the present invention is implemented.

Referring to FIG. 12, the wired / wireless communication unit 1210 receives a bitstream through a wired / wireless communication scheme. In more detail, the wired / wireless communication unit 1210 may include at least one of a wired communication unit 1211, an infrared communication unit 1212, a Bluetooth unit 1213, and a wireless LAN communication unit 1214.

The user authentication unit 1220 receives user information and performs user authentication, and includes one or more of a fingerprint recognition unit 1221, an iris recognition unit 1222, a face recognition unit 1223, and a voice recognition unit 1224. These units receive fingerprint, iris, facial contour, and voice information, respectively, and convert it into user information. User authentication may then be performed by determining whether the user information matches previously registered user data.

The input unit 1230 is an input device through which a user inputs various types of commands, and may include one or more of a keypad unit 1231, a touch pad unit 1232, and a remote control unit 1233, but is not limited thereto.

Meanwhile, when the audio signal processing apparatus 1241 generates mix information and the mix information is displayed on the screen through the display unit 1262, the user may adjust the mix information through the input unit 1230, and the adjusted information is input to the controller 1250.

The signal decoding unit 1240 includes an audio signal processing apparatus. First, it is determined whether two object signals are stereo object signals by using the relationship identifier and the downmix channel level difference (DCLD) included in the received bitstream. If the two object signals are determined to be stereo object signals, the audio signal processing apparatus 1245 generates mix information using a single user input and, based on the generated mix information and the object information included in the bitstream, generates at least one of downmix processing information and multichannel information.
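
The stereo object determination described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the relationship identifier is modeled as a boolean, the maximum and minimum DCLD values are taken as the 150 dB and -150 dB examples given earlier, and the function name `is_stereo_object_pair` is hypothetical. Whether one object must be at the maximum and the other at the minimum is not stated in the text, so the sketch does not require it.

```python
DCLD_MAX_DB = 150.0   # example maximum from the description
DCLD_MIN_DB = -150.0  # example minimum from the description


def is_stereo_object_pair(related, dcld_a_db, dcld_b_db):
    """Decide whether two object signals form a stereo object.

    related:    relationship identifier extracted from the bitstream
                (assumed boolean: True if the two objects are related).
    dcld_a_db,
    dcld_b_db:  downmix channel level difference of each object, in dB.
    """
    if not related:
        return False
    extremes = (DCLD_MAX_DB, DCLD_MIN_DB)
    # Both objects must have a DCLD at the maximum or the minimum value.
    return dcld_a_db in extremes and dcld_b_db in extremes
```

Only when this check succeeds would the single user input be used to generate the jointly controlled mix information.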

The controller 1250 receives an input signal from the input devices and controls all processes of the signal decoding unit 1240 and the output unit 1260.

The output unit 1260 is a component that outputs an output signal generated by the signal decoding unit 1240 and the like, and may include a speaker unit 1261 and a display unit 1262. When the output signal is an audio signal, the output signal is output through the speaker unit 1261, and when the output signal is a video signal, the output signal is output through the display unit 1262.

FIGS. 13A and 13B illustrate a relationship between a terminal and a server corresponding to the product illustrated in FIG. 12. Referring to FIG. 13A, the first terminal 1310 and the second terminal 1320 can exchange data or a bitstream bidirectionally through their wired/wireless communication units. The data or bitstream communicated through the wired/wireless communication unit may be a bitstream generated in FIG. 1 of the present invention, or may be data including the relationship identifier, the DCLD, and the like of the present invention described with reference to FIGS. 1 to 12. It may also be a bitstream including only data type information. Referring to FIG. 13B, the server 1330 and the first terminal 1340 may also perform wired/wireless communication with each other.

As described above, although the present invention has been described by way of limited embodiments and drawings, the present invention is not limited thereto, and various modifications and variations can of course be made by those skilled in the art to which the present invention pertains within the scope of equivalents of the claims set forth below.

The present invention can be applied to encoding and decoding audio signals.

Claims

Receiving a downmix signal comprising at least one object signal and a bitstream comprising object information and a downmix channel level difference;
If the downmix signal includes at least two object signals, extracting a relationship identifier from the bitstream indicating whether two of the at least two object signals are related to each other;
Identifying whether the two object signals correspond to stereo object signals using the downmix channel level difference and the relationship identifier;
Generating mix information including a first component and a second component using a single user input; and
Generating at least one of downmix processing information and multichannel information based on the object information and the mix information,
wherein the stereo object signals include a left object signal and a right object signal, the first component is applied to the left object signal of the stereo object signals to output a first channel, the second component is applied to the right object signal of the stereo object signals to output a second channel, and the first component and the second component have a negative correlation.

The method of claim 1,
wherein the left object signal is mapped to the left channel of the downmix signal, and the right object signal is mapped to the right channel.

The method of claim 1,
wherein the identifying comprises: identifying, based on the relationship identifier, whether two of the at least two object signals are related to each other;
if the two object signals are related to each other, identifying whether the downmix channel level differences of the two object signals have a maximum or minimum value; and
determining that the two object signals correspond to the stereo object signals if the downmix channel level differences of the two object signals have the maximum value or the minimum value.

The method of claim 1,
wherein the first component and the second component are used to jointly control the stereo object signals.

The method of claim 1,
wherein the second component becomes smaller when the first component becomes larger, or the second component becomes larger when the first component becomes smaller.

The method of claim 1,
wherein the mix information includes a third component and a fourth component, the third component is applied to the left object signal of the stereo object signals to output the second channel, and the fourth component is applied to the right object signal of the stereo object signals to output the first channel, and
wherein the third component and the fourth component are zero.

The method of claim 1,
further comprising: processing the downmix signal using the downmix processing information; and
generating a multi-channel signal based on the processed downmix signal and the multi-channel information.

A receiving unit for receiving a downmix signal comprising at least one object signal and a bitstream comprising object information and a downmix channel level difference, and, if the downmix signal comprises at least two object signals, extracting from the bitstream a relationship identifier indicating whether two of the at least two object signals are related to each other;
An identification unit for identifying whether the two object signals correspond to stereo object signals using the downmix channel level difference and the relationship identifier;
A mix information generation unit for generating mix information including a first component and a second component using a single user input; and
An information generation unit configured to generate at least one of downmix processing information and multichannel information based on the object information and the mix information,
wherein the stereo object signals include a left object signal and a right object signal, the first component is applied to the left object signal of the stereo object signals to output a first channel, the second component is applied to the right object signal of the stereo object signals to output a second channel, and the first component and the second component have a negative correlation.

The apparatus of claim 8,
wherein the left object signal is mapped to the left channel of the downmix signal, and the right object signal is mapped to the right channel.

The apparatus of claim 8,
wherein the identification unit identifies, based on the relationship identifier, whether two object signals of the at least two object signals are related to each other, identifies, when the two object signals are related to each other, whether the downmix channel level differences of the two object signals have a maximum or minimum value, and determines that the two object signals correspond to the stereo object signals when the downmix channel level differences of the two object signals have the maximum value or the minimum value.

The apparatus of claim 8,
wherein the first component and the second component are used to jointly control the stereo object signals.

The apparatus of claim 8,
wherein the second component becomes smaller when the first component becomes larger, or the second component becomes larger when the first component becomes smaller.

The apparatus of claim 8,
wherein the mix information includes a third component and a fourth component, the third component is applied to the left object signal of the stereo object signals to output the second channel, and the fourth component is applied to the right object signal of the stereo object signals to output the first channel, and
wherein the third component and the fourth component are zero.

The apparatus of claim 8, further comprising:
A downmix processing unit for processing the downmix signal using the downmix processing information; and
A multichannel decoder configured to generate a multichannel signal based on the processed downmix signal and the multichannel information.