KR20210030886A

KR20210030886A - Encoding method and decoding method for audio signal using dynamic model parameter, audio encoding apparatus and audio decoding apparatus

Info

Publication number: KR20210030886A
Application number: KR1020200115530A
Authority: KR
Inventors: 성종모; 백승권; 이미숙; 이태진; 임우택; 최진수
Original assignee: 한국전자통신연구원
Priority date: 2019-09-10
Filing date: 2020-09-09
Publication date: 2021-03-18

Abstract

Disclosed are an audio encoding method and an audio decoding method using dynamic model parameters, an audio encoding device, and an audio decoding device. The audio encoding method may use the dynamic model parameter corresponding to each level when the dimension of the audio signal is reduced in an encoding network. Likewise, the audio decoding method may use the dynamic model parameter corresponding to each level when the dimension of the audio signal is expanded in a decoding network.

Description

Audio encoding method and audio decoding method using dynamic model parameters, audio encoding device and audio decoding device {ENCODING METHOD AND DECODING METHOD FOR AUDIO SIGNAL USING DYNAMIC MODEL PARAMETER, AUDIO ENCODING APPARATUS AND AUDIO DECODING APPARATUS}

The present invention relates to an audio encoding method, an audio decoding method, an audio encoding device, and an audio decoding device that use dynamic model parameters, and more particularly to generating dynamic model parameters at all levels of an encoding network and a decoding network.

With the recent development of deep learning, various deep learning techniques are being used to process speech, audio, language, and video. In particular, deep learning is actively used to process audio signals. Methods have also been proposed for efficiently encoding or decoding an audio signal by using a neural network, such as a network composed of a plurality of levels.

The present invention proposes a method and apparatus that apply an autoencoder to an audio signal so that the audio signal can be encoded efficiently and restored close to the original signal.

The present invention proposes a method and apparatus that can process an audio signal effectively by generating a dynamic model parameter network for each of the layers constituting the encoding network and the decoding network of an autoencoder.

An audio encoding method using dynamic model parameters according to an embodiment of the present invention may include: reducing the dimension of an audio signal through a plurality of encoding networks; outputting a code corresponding to the final-level audio signal whose dimension has been reduced in the encoding network; and quantizing the output code.

The outputting of the code may reduce the dimension of the audio signal corresponding to N levels of layers by using the dynamic model parameter of each level of the encoding network.

The dynamic model parameters may be generated for N-1 levels of the N levels of layers.

The dynamic model parameter may be determined in a dynamic model parameter generation network that is independent of the encoding network.

The dynamic model parameter used to determine the feature of the next level may be determined based on the feature of the previous level.

The encoding network may determine the feature signal of the next level using the feature signal of the previous level and the dynamic model parameter of the next level, and the feature signal of the next level may have a smaller dimension than the feature signal of the previous level.

An audio decoding method using dynamic model parameters according to an embodiment of the present invention may include: receiving a quantized code of an audio signal corresponding to a reduced dimension; expanding the dimension of the audio signal through a plurality of decoding networks by using the code of the audio signal; and outputting the final-level audio signal whose dimension has been expanded in the decoding network. The expanding of the dimension may expand the dimension of the audio signal corresponding to N levels of layers by using the dynamic model parameter of each level of the decoding network.

The dynamic model parameter may be determined in a dynamic model parameter generation network that is independent of the decoding network.

The decoding network may determine the feature signal of the next level using the feature signal of the previous level and the dynamic model parameter of the next level, and the feature signal of the next level may have a larger dimension than the feature signal of the previous level.

An audio encoding method using dynamic model parameters according to another embodiment of the present invention may include: reducing the dimension of an audio signal through a plurality of encoding networks; outputting a code corresponding to the final-level audio signal whose dimension has been reduced in the encoding network; and quantizing the output code. The outputting of the code may reduce the dimension of the audio signal corresponding to N levels of layers by using dynamic model parameters at some preset levels among the levels of the encoding network.

The outputting of the code may reduce the dimension of the audio signal using dynamic model parameters for the preset levels of the encoding network, and using fixed model parameters for the remaining levels of the encoding network.

The encoding network may determine the feature signal of the next level using the feature signal of the previous level and the dynamic or fixed model parameter of the next level, and the feature signal of the next level may have a smaller dimension than the feature signal of the previous level.

An audio decoding method using dynamic model parameters according to another embodiment of the present invention may include: receiving a quantized code of an audio signal corresponding to a reduced dimension; expanding the dimension of the audio signal through a plurality of decoding networks by using the code of the audio signal; and outputting the final-level audio signal whose dimension has been expanded in the decoding network. The dimension of the audio signal corresponding to N levels of layers may be expanded by using dynamic model parameters at some preset levels among the levels of the decoding network.

The expanding may expand the dimension of the audio signal using dynamic model parameters for the preset levels of the decoding network, and using fixed model parameters for the remaining levels of the decoding network.

According to an embodiment of the present invention, applying an autoencoder to an audio signal allows the audio signal to be encoded efficiently and restored close to the original signal.

According to an embodiment of the present invention, an audio signal can be processed effectively by generating a dynamic model parameter network for each of the layers constituting the encoding network and the decoding network of the autoencoder.

FIG. 1 is a diagram illustrating a detailed network of an autoencoder for encoding and decoding an audio signal according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an audio encoding method according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an audio decoding method according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an audio encoding method and an audio decoding method according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an audio encoding method and an audio decoding method according to another embodiment of the present invention.
FIG. 6 is a diagram illustrating a process of deriving a dynamic model parameter for each layer of an autoencoder according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a process of deriving dynamic model parameters for specific layers of an autoencoder according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a detailed network of an autoencoder for encoding and decoding an audio signal according to an embodiment of the present invention.

In the present invention, a deep autoencoder composed of a plurality of layers may be used as the network structure for encoding or decoding an audio signal. The autoencoder is a representative deep learning structure for dimension reduction and representation learning. The deep autoencoder may be composed of layers corresponding to each of a plurality of levels, and is a neural network trained so that its output layer reproduces its input layer. In the present invention, the deep autoencoder may be composed of an encoding network and a decoding network, as shown in FIG. 1.

FIG. 1 shows the basic structure of a deep autoencoder. The encoding process of the deep autoencoder encodes the audio signal input to the encoding network by reducing its dimension relative to the input layer. The code z is derived from the last layer of the encoding network.

When the input data is high-dimensional (that is, has many variables), so that the relationships between variables (or features) are difficult to express, the deep autoencoder can convert the data into a low-dimensional vector that is easy to process while still adequately representing the features of the data. The code is a new variable obtained by transforming the variables (or features) of the audio signal. The output of the deep autoencoder is its predicted value, and the decoding network is trained so that the audio signal restored through the deep autoencoder matches the audio signal input to it. Here, the code may be expressed as a latent variable. The decoding process of the deep autoencoder restores the audio signal by expanding the dimension. The layers constituting the deep autoencoder may be referred to as hidden layers, and the hidden layers for encoding and the hidden layers for decoding may be configured symmetrically.

The dimension of the audio signal input to the encoding network is reduced through the plurality of layers constituting the encoding network of the deep autoencoder, so that the signal may be expressed as a latent vector of a hidden layer. The dimension of this latent vector is then expanded through the plurality of layers constituting the decoding network of the deep autoencoder, so that an audio signal almost identical to the one input to the encoding network may be restored.

The number of layers constituting the encoding network and the number of layers constituting the decoding network may be the same or different. If the two numbers are the same, the encoding network and the decoding network have a mutually symmetric structure.

According to an embodiment of the present invention, the audio signal input to the encoding network is output as a dimension-reduced code through the plurality of layers constituting the encoding network of the deep autoencoder. The dimension of the code is then expanded through the plurality of layers constituting the decoding network of the deep autoencoder, and the code is restored to an audio signal. Parameters that minimize the error function of the deep autoencoder may be determined through a learning process so that the audio signal input to the encoding network and the audio signal restored through the decoding network are substantially the same.

The network structure of the deep autoencoder may be, for example, fully connected (FC) or a convolutional neural network (CNN), but the present invention is not limited thereto, and any structure composed of a plurality of layers may be used.

The variables shown in FIG. 1 denote the following:

the audio signal input to the encoding network
z: the latent vector (also called the code or bottleneck)
the quantized code
the audio signal restored through the decoding network
i: the layer index
the dynamic model parameter of the i-th layer
b: the layer index of the code
L: the total number of layers that make up the autoencoder network
the quantization operation

In FIG. 1, the output value of the i-th layer can be expressed in terms of the output of the preceding layer, the model parameter of the i-th layer, and a nonlinear activation function.
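As a hedged illustration of the layer output described above, the following shows one common form for a fully connected layer. The symbols $h_i$, $W_i$, $c_i$, $\theta_i$, and $\sigma$ are assumptions introduced here for illustration and are not taken from the figures; $c_i$ is used for the bias so that it does not clash with the code layer index b.

```latex
h_i = F_i(h_{i-1};\, \theta_i) = \sigma\!\left(W_i\, h_{i-1} + c_i\right),
\qquad \theta_i = \{W_i,\, c_i\},
\qquad i = 1, \dots, L
```

Under this assumed notation, $h_0$ is the input audio signal, $h_b$ is the code z, and $h_L$ is the restored audio signal.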

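According to an embodiment of the present invention, the feed-forward process of the autoencoder is as follows.

Encoding process: the input audio signal passes through the layers of the encoding network, up to layer b, to produce the code z.

Quantization: the code z is quantized to produce the quantized code.

Decoding process: the quantized code passes through the layers of the decoding network, from layer b+1 to layer L, to produce the restored audio signal.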
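The three feed-forward stages (encoding, quantization, decoding) can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the fully connected `dense` layer, the `tanh` activation, the layer sizes, the random weights, and the uniform scalar `quantize` function are all hypothetical stand-ins chosen for illustration.

```python
import math
import random

def dense(x, weights, bias):
    """One fully connected layer with tanh activation (one weight row per output unit)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def random_layer(n_in, n_out, rng):
    """Hypothetical initializer: small random weights and zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def run_layers(x, layers):
    """Pass a feature vector through a list of (weights, bias) layers."""
    for weights, bias in layers:
        x = dense(x, weights, bias)
    return x

def quantize(code, step=0.25):
    """Uniform scalar quantizer, an assumed stand-in for a learned codebook."""
    return [round(v / step) * step for v in code]

rng = random.Random(0)
encoder = [random_layer(8, 4, rng), random_layer(4, 2, rng)]  # 8 -> 4 -> 2
decoder = [random_layer(2, 4, rng), random_layer(4, 8, rng)]  # 2 -> 4 -> 8

x = [math.sin(0.3 * n) for n in range(8)]  # toy input frame
z = run_layers(x, encoder)                 # encoding: dimension reduction
z_hat = quantize(z)                        # quantization of the code
x_hat = run_layers(z_hat, decoder)         # decoding: dimension expansion
print(len(x), len(z), len(x_hat))          # prints "8 2 8"
```

With untrained random weights the restored frame does not resemble the input; the sketch only shows how the dimension shrinks to the code and grows back to the input size.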

According to an embodiment of the present invention, the learning process of the autoencoder is as follows.

The loss function (L) is the objective function. It may be expressed as a weighted sum of terms that determine the encoding and decoding performance of the autoencoder, such as the mean-squared error (MSE) and the bit rate. The basic purpose of the autoencoder is to make the audio signal restored through the decoding network almost identical to the audio signal input to the encoding network.
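A weighted sum of distortion and rate can be sketched as below. The text does not say how the rate term is computed, so the empirical entropy of the quantized symbols is used here purely as an assumed proxy, and `rate_weight` is a hypothetical trade-off factor.

```python
import math
from collections import Counter

def mse(x, x_hat):
    """Mean-squared error between the input and the restored signal."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def empirical_entropy_bits(symbols):
    """Average bits per symbol from empirical frequencies (assumed rate proxy)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def loss(x, x_hat, symbols, rate_weight=0.1):
    """Weighted sum of distortion (MSE) and rate (entropy estimate)."""
    return mse(x, x_hat) + rate_weight * empirical_entropy_bits(symbols)

# Toy values: distortion 2.0, rate 1 bit per symbol
example = loss([1.0, 2.0], [1.0, 4.0], symbols=[0, 0, 1, 1])
```

Raising `rate_weight` favors codes that are cheaper to transmit at the cost of reconstruction accuracy; lowering it does the opposite.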

To determine the dynamic model parameter of the i-th layer and the quantization parameters (e.g., a codebook), a loss function is computed from the audio signal restored through the feed-forward process and the audio signal input to the autoencoder. This loss is back-propagated from the output layer to the input layer, so that each dynamic model parameter can be updated. For this to work, the quantization of the code (z) must be carried out in a differentiable form during the training stage.
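The text does not specify which differentiable form of quantization is used. One commonly used surrogate, shown here only as an assumption, replaces the hard nearest-codeword choice with a softmax-weighted average of codebook entries during training. The function names, the `codebook` values, and the `sharpness` parameter are hypothetical.

```python
import math

def soft_quantize(value, codebook, sharpness=10.0):
    """Softmax-weighted average of codebook entries; smooth in `value`."""
    scores = [-sharpness * (value - c) ** 2 for c in codebook]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, codebook)) / total

def hard_quantize(value, codebook):
    """Nearest codebook entry; piecewise constant, so its gradient is zero almost everywhere."""
    return min(codebook, key=lambda c: abs(value - c))

codebook = [-1.0, 0.0, 1.0]
smooth = soft_quantize(0.2, codebook, sharpness=10.0)    # lies between codebook entries
sharp = soft_quantize(0.2, codebook, sharpness=1000.0)   # approaches the hard result
nearest = hard_quantize(0.2, codebook)
```

As `sharpness` grows, the soft output approaches the hard quantizer that would be used at inference time.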

Fixed model parameters obtained through a conventional learning process were frequently over-fitted or under-fitted to the training database. Because such parameters are fixed once learned, they are applied identically regardless of the characteristics of the input audio signal. Conventional fixed model parameters are therefore very limited in their ability to reflect the characteristics of diverse input audio signals. What is needed is an encoding and decoding process that reflects the characteristics of the input audio signal well, even when input audio signals with widely varying characteristics are received.

The present invention proposes a method, based on a deep autoencoder, of dynamically deriving model parameters from the layers constituting the deep autoencoder so that the characteristics of the audio signal can be reflected.

FIG. 2 is a flowchart illustrating an audio encoding method according to an embodiment of the present invention.

The audio encoding method of FIG. 2 may be performed by an audio encoding apparatus.

In step 201 of FIG. 2, the audio encoding apparatus may reduce the dimension of the audio signal through a plurality of encoding networks.

In step 202 of FIG. 2, the audio encoding apparatus may output a code corresponding to the final-level audio signal whose dimension has been reduced in the encoding network. Here, the audio encoding apparatus may reduce the dimension of the audio signal corresponding to N levels of layers by using the dynamic model parameter of each level of the encoding network. The dynamic model parameters may be generated for N-1 levels of the N levels of layers.

The dynamic model parameter may be determined in a dynamic model parameter generation network that is independent of the encoding network. The dynamic model parameter used to determine the feature of the next level may be determined based on the feature of the previous level. The encoding network may determine the feature signal of the next level using the feature signal of the previous level and the dynamic model parameter of the next level. The feature signal of the next level generally has a smaller dimension than the feature signal of the previous level.

In step 203 of FIG. 2, the audio encoding apparatus may quantize the output code.

FIG. 3 is a flowchart illustrating an audio decoding method according to an embodiment of the present invention.

The audio decoding method of FIG. 3 may be performed by an audio decoding apparatus.

In step 301 of FIG. 3, the audio decoding apparatus may receive a quantized code of an audio signal corresponding to a reduced dimension.

In step 302 of FIG. 3, the audio decoding apparatus may expand the dimension of the feature signal through a plurality of decoding networks by using the quantized code of the audio signal. The audio decoding apparatus may expand the dimension of the feature signal corresponding to N levels of layers by using the dynamic model parameter of each level of the decoding network. The dynamic model parameters may be generated for N-1 levels of the N levels of layers.

The dynamic model parameter may be determined in a dynamic model parameter generation network that is independent of the decoding network. The dynamic model parameter used to determine the feature of the next level may be determined based on the feature of the previous level. The decoding network may determine the feature signal of the next level using the feature signal of the previous level and the dynamic model parameter of the next level. The feature signal of the next level generally has a larger dimension than the feature signal of the previous level.

In step 303 of FIG. 3, the audio decoding apparatus may output the final-level audio signal whose dimension has been expanded in the decoding network.

FIG. 4 is a diagram illustrating an audio encoding method and an audio decoding method according to an embodiment of the present invention.

FIG. 4 shows a process of generating dynamic model parameters for the whole of the encoding process, which is the first audio signal processing scheme, and the decoding process, which is the second audio signal processing scheme.

As shown in FIG. 4, assume that in the deep autoencoder the encoding network, which reduces the dimension of the audio signal, consists of layer 0 through layer b, and that the decoding network, which expands the dimension, consists of layer b+1 through layer L. The encoding apparatus may then reduce the dimension of the audio signal using the encoding network. The dimension-reduced result becomes the code, which may be quantized.

According to an embodiment of the present invention, a dynamic model parameter may be generated for each layer of the encoding network. Specifically, the feature of the i-th layer (current layer) is determined based on the feature of the (i-1)-th layer (previous layer) and the dynamic model parameter of the i-th layer. Similarly, the feature of the (i+1)-th layer (next layer) is determined based on the feature of the i-th layer (current layer) and the (i+1)-th dynamic model parameter. That is, the process of generating dynamic model parameters determines, from the feature derived at the previous layer, the dynamic model parameter used to derive the feature of the current layer.
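The per-layer recursion described above can be sketched as a single step: a generator maps the feature of layer i-1 to the weights and biases of layer i, and those parameters are then applied to the same feature. This is a minimal sketch under stated assumptions. The class name `ParameterGenerator`, its single linear map, the `tanh` activation, and the sizes are all hypothetical, since the text leaves the form of the generation network open.

```python
import math
import random

def linear(x, weights, bias):
    """Affine map: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

class ParameterGenerator:
    """Hypothetical generator for layer i: maps the previous feature to the
    flattened weights and biases of layer i."""

    def __init__(self, n_in, n_out, rng):
        self.n_in, self.n_out = n_in, n_out
        n_params = n_out * n_in + n_out
        # These generator weights are what training would learn and then keep fixed.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                        for _ in range(n_params)]
        self.bias = [0.0] * n_params

    def __call__(self, prev_feature):
        flat = linear(prev_feature, self.weights, self.bias)
        split = self.n_out * self.n_in
        w = [flat[r * self.n_in:(r + 1) * self.n_in] for r in range(self.n_out)]
        return w, flat[split:]

def dynamic_layer(prev_feature, generator):
    """Feature of layer i from the feature of layer i-1 and generated parameters."""
    w, b = generator(prev_feature)
    return [math.tanh(v) for v in linear(prev_feature, w, b)]

rng = random.Random(0)
generator = ParameterGenerator(n_in=6, n_out=3, rng=rng)
h_prev = [math.sin(0.5 * n) for n in range(6)]  # toy feature of layer i-1
h_curr = dynamic_layer(h_prev, generator)       # feature of layer i, dimension 3
```

Because the layer parameters are recomputed from each input feature, two different signals pass through effectively different layers, whereas a fixed layer applies the same weights to both.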

In particular, in FIG. 4, dynamic model parameters may be applied to all levels of the network for the first audio signal processing (encoding) and the second audio signal processing (decoding). The dynamic model parameter generation network, however, may be applied to the encoding process and the decoding process independently of each other.

The dynamic model parameter of each layer may be determined based on the features of each of the plurality of layers. That is, according to an embodiment of the present invention, for deep autoencoder based audio encoding, the model parameters of the autoencoder are computed dynamically from the features of each of the layers constituting the autoencoder, which can improve the quality of restored music and sound signals.

FIG. 5 is a diagram illustrating an audio encoding method and an audio decoding method according to another embodiment of the present invention.

In FIG. 5, unlike FIG. 4, dynamic model parameters may be generated in common for the encoding process, which is the first audio signal processing scheme, and the decoding process, which is the second audio signal processing scheme. That is, the dynamic model parameter generation network may be shared between the encoding process and the decoding process. As in FIG. 4, a dynamic model parameter may be generated for each level used to encode the audio signal and for each level used to decode it.

도 6는 본 발명의 일실시예에 따라 오토 인코더의 각 레이어별로 동적 모델 파라미터를 도출하는 과정을 설명하기 위한 도면이다.6 is a diagram for describing a process of deriving a dynamic model parameter for each layer of an auto encoder according to an embodiment of the present invention.

도 6를 참고하면, 부호화 장치에 대응하는 부호화 네트워크와 복호화 장치에 대응하는 복호화 네트워크가 도시된다. 부호화 네트워크는 b+1개의 레이어로 구성되고, 복호화 네트워크는 L-b-1개의 레이어로 구성될 수 있다.Referring to FIG. 6, an encoding network corresponding to an encoding device and a decoding network corresponding to a decoding device are illustrated. The encoding network may be composed of b+1 layers, and the decoding network may be composed of L-b-1 layers.

도 6를 참고하면, 부호화 네트워크와 복호화 네트워크를 구성하는 레이어들 각각에 대응하는 동적 모델 파라미터 생성 네트워크를 Gi로 정의할 수 있다. 여기서, i는 레이어의 인덱스를 의미한다. 모델 생성 파라미터는 딥-러닝 형태로 FC, CNN 등이 사용 가능하다.Referring to FIG. 6, a dynamic model parameter generation network corresponding to each of layers constituting an encoding network and a decoding network may be defined as Gi. Here, i means the index of the layer. As for model generation parameters, FC, CNN, etc. can be used in deep-learning form.

도 6를 참고하면, 입력 레이어를 포함한 부호화 네트워크와 복호화 네트워크를 구성하는 레이어들에 대해 이전 레이어 i-1의 특징

로부터 현재 레이어 i의 특징

를 계산하기 위해 필요한 동적 모델 파라미터

가 결정될 수 있다.Referring to FIG. 6, characteristics of a previous layer i-1 for layers constituting an encoding network and a decoding network including an input layer

From the features of the current layer i

Dynamic model parameters required to calculate

Can be determined.

도 6에서, 부호화 네트워크는 입력 신호를 부호화, 양자화하고, 복호화 네트워크는 양자화된 결과를 복호화한다. 이 때, 부호화 네트워크와 복호화 네트워크에서 사용되는 동적 모델 파라미터는 부호화 네트워크와 복호화 네트워크를 구성하는 레이어들 각각의 특징에 따라 별도의 네트워크 (동적 모델 파라미터 생성 네트워크)를 통해 결정될 수 있다.In Fig. 6, the coding network encodes and quantizes the input signal, and the decoding network decodes the quantized result. In this case, the dynamic model parameters used in the encoding network and the decoding network may be determined through a separate network (dynamic model parameter generation network) according to characteristics of each of the layers constituting the encoding network and the decoding network.

As illustrated in FIG. 6, for each of the layers constituting the encoding network and the decoding network, the dynamic model parameter of the layer may be determined by the dynamic model parameter generation network. In FIG. 6, the dynamic model parameter determined at layer 0 is used to extract the features of layer 1.

FIG. 7 is a diagram illustrating a process of deriving dynamic model parameters for specific layers of an autoencoder according to another embodiment of the present invention.

As shown in FIG. 7, to reduce complexity, dynamic model parameters may be generated only for specific layers instead of for all layers of the encoding network and the decoding network.

In FIG. 7, the dynamic model parameter of layer 1 of the encoding network and the dynamic model parameter of layer L, the output layer of the decoding network, may each be determined dynamically according to the features of the respective previous layer (i = 0 and i = L-1). However, for the layers other than the input layer of the encoding network and the output layer of the decoding network, fixed model parameters may be used.
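A mixed arrangement of this kind, with generated parameters at the first and last layers and fixed parameters in between, can be sketched as below. The layer sizes, the tanh nonlinearity, the `("dynamic" | "fixed", matrix, n_out)` layer description, and the helper names are hypothetical; quantization is omitted to keep the example short, and all weights are random rather than trained.

```python
import math
import random

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def rand_matrix(rng, rows, cols, scale=0.5):
    """Untrained stand-in weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def forward(layers, feature):
    """Run a stack of layers described as (kind, matrix, n_out).

    kind == "dynamic": matrix is a generation network; the layer weights
                       are produced from the previous layer's feature.
    kind == "fixed":   matrix is the layer weight matrix itself.
    """
    for kind, matrix, n_out in layers:
        if kind == "dynamic":
            n_in = len(feature)
            flat = matvec(matrix, feature)
            weights = [flat[r * n_in:(r + 1) * n_in] for r in range(n_out)]
        else:
            weights = matrix
        feature = [math.tanh(s) for s in matvec(weights, feature)]
    return feature

rng = random.Random(2)
layers = [
    ("dynamic", rand_matrix(rng, 8 * 16, 16), 8),   # layer 1: weights from layer-0 feature
    ("fixed",   rand_matrix(rng, 4, 8), 4),         # layer 2: fixed weights
    ("fixed",   rand_matrix(rng, 8, 4), 8),         # layer L-1: fixed weights
    ("dynamic", rand_matrix(rng, 16 * 8, 8), 16),   # layer L: weights from layer-(L-1) feature
]

x0 = [math.cos(0.2 * n) for n in range(16)]
y = forward(layers, x0)
print(len(y))  # prints: 16
```

Only the two dynamic layers run a generation network per input, which is where the complexity saving relative to the all-dynamic arrangement would come from in this sketch.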

In FIG. 7, layers 2 through L-1 are set as a fixed network, but the present invention is not limited thereto. The layers included in the fixed network spanning the encoding and decoding networks may be determined differently depending on the quantization scheme and on a loss function that accounts for the signal quality and compression rate of encoding and decoding.

Consequently, according to an embodiment of the present invention, deriving dynamic model parameters for the layers constituting the encoding network and the decoding network can be expected to improve the reconstructed signal quality and the compression rate of encoding and decoding.

Meanwhile, the method according to the present invention may be written as a computer-executable program and implemented on various recording media such as magnetic storage media, optical reading media, and digital storage media.

Implementations of the various techniques described herein may be realized in digital electronic circuitry, or in computer hardware, firmware, software, or combinations thereof. Implementations may be realized as a computer program product, that is, a computer program tangibly embodied in an information carrier, for example a machine-readable storage device (computer-readable medium) or a propagated signal, for processing by, or for controlling the operation of, a data processing apparatus, for example a programmable processor, a computer, or multiple computers. A computer program, such as the computer program(s) described above, may be written in any form of programming language, including compiled or interpreted languages, and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be processed on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network.

Processors suitable for processing a computer program include, by way of example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory, a random access memory, or both. Elements of a computer may include at least one processor that executes instructions and one or more memory devices that store instructions and data. Generally, a computer may also include, or be coupled to receive data from, transmit data to, or both, one or more mass storage devices that store data, for example magnetic disks, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include, by way of example, semiconductor memory devices; magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and ROM (Read Only Memory), RAM (Random Access Memory), flash memory, EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), and the like. The processor and memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Further, the computer-readable medium may be any available medium that can be accessed by a computer, and may include both computer storage media and transmission media.

While this specification contains details of a number of specific implementations, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments, either separately or in any suitable sub-combination. Furthermore, although features may be described as acting in a particular combination and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.

Likewise, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. In addition, the separation of the various device components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and devices can generally be integrated together in a single software product or packaged into multiple software products.

Meanwhile, the embodiments of the present invention disclosed in this specification and the drawings merely present specific examples to aid understanding and are not intended to limit the scope of the present invention. It will be apparent to those of ordinary skill in the art to which the present invention pertains that other modifications based on the technical idea of the present invention may be implemented in addition to the embodiments disclosed herein.

Claims

An audio encoding method using dynamic model parameters, the method comprising:
reducing the dimension of an audio signal through a plurality of encoding networks;
outputting a code corresponding to the final-level audio signal whose dimension has been reduced in the encoding network; and
quantizing the output code,
wherein the outputting of the code reduces the dimension of the audio signal corresponding to N levels of layers by using a dynamic model parameter for each of the levels of the encoding network.

The method of claim 1,
wherein the dynamic model parameters are generated for N-1 levels of the N levels of layers.

The method of claim 1,
wherein the dynamic model parameter is determined by a dynamic model parameter generation network that is independent of the encoding network.

The method of claim 1,
wherein the dynamic model parameter is determined based on the features of the previous level in order to determine the features of the next level.

The method of claim 1,
wherein the encoding network determines the feature signal of the next level using the feature signal of the previous level and the dynamic model parameter of the next level, and
wherein the feature signal of the next level has a reduced dimension relative to the feature signal of the previous level.

An audio decoding method using dynamic model parameters, the method comprising:
receiving a quantized code of an audio signal corresponding to a reduced dimension;
extending the dimension of the audio signal through a plurality of decoding networks by using the code of the audio signal; and
outputting the final-level audio signal whose dimension has been extended in the decoding network,
wherein the extending of the dimension of the audio signal extends the dimension of the audio signal corresponding to N levels of layers by using a dynamic model parameter for each of the levels of the decoding network.

The method of claim 6,
wherein the dynamic model parameters are generated for N-1 levels of the N levels of layers.

The method of claim 6,
wherein the dynamic model parameter is determined by a dynamic model parameter generation network that is independent of the decoding network.

The method of claim 6,
wherein the dynamic model parameter is determined based on the features of the previous level in order to determine the features of the next level.

The method of claim 6,
wherein the decoding network determines the feature signal of the next level using the feature signal of the previous level and the dynamic model parameter of the next level, and
wherein the feature signal of the next level has an extended dimension relative to the feature signal of the previous level.

An audio encoding method using dynamic model parameters, the method comprising:
reducing the dimension of an audio signal through a plurality of encoding networks;
outputting a code corresponding to the final-level audio signal whose dimension has been reduced in the encoding network; and
quantizing the output code,
wherein the outputting of the code reduces the dimension of the audio signal corresponding to N levels of layers by using a dynamic model parameter at some of the levels of the encoding network.

The method of claim 11,
wherein the outputting of the code reduces the dimension of the audio signal by using a dynamic model parameter for some preset levels of the encoding network, and
by using a fixed model parameter for the remaining levels of the encoding network other than the preset levels.

The method of claim 11,
wherein the dynamic model parameter is determined by a dynamic model parameter generation network that is independent of the encoding network.

The method of claim 11,
wherein the dynamic model parameter is determined based on the features of the previous level in order to determine the features of the next level.

The method of claim 11,
wherein the encoding network determines the audio signal of the next level using the audio signal of the previous level and the dynamic model parameter of the next level, and
wherein the audio signal of the next level has a reduced dimension relative to the audio signal of the previous level.

An audio decoding method using dynamic model parameters, the method comprising:
receiving a quantized code of an audio signal corresponding to a reduced dimension;
extending the dimension of the audio signal through a plurality of decoding networks by using the code of the audio signal; and
outputting the final-level audio signal whose dimension has been extended in the decoding network,
wherein the dimension of the audio signal corresponding to N levels of layers is extended by using a dynamic model parameter at some of the levels of the decoding network.

The method of claim 16,
wherein the extending of the dimension extends the dimension of the audio signal by using a dynamic model parameter for some preset levels of the decoding network, and
by using a fixed parameter for the remaining levels of the decoding network other than the preset levels.

The method of claim 16,
wherein the dynamic model parameter is determined by a dynamic model parameter generation network that is independent of the decoding network.

The method of claim 16,
wherein the dynamic model parameter is determined based on the features of the previous level in order to determine the features of the next level.

The method of claim 16,
wherein the decoding network determines the feature signal of the next level using the feature signal of the previous level and the dynamic or fixed model parameter of the next level, and
wherein the feature signal of the next level has an extended dimension relative to the feature signal of the previous level.