RU2809646C1

RU2809646C1 - Multichannel signal generator, audio encoder and related methods based on a mixing noise signal

Info

Publication number: RU2809646C1
Application number: RU2023107800A
Authority: RU
Inventors: Emmanuel Ravelli; Jan Frederik Kiene; Guillaume Fuchs; Srikanth Korse; Markus Multrus; Eleni Fotopoulou
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2020-08-31
Filing date: 2021-06-30
Publication date: 2023-12-14

Abstract

FIELD: computer engineering.

SUBSTANCE: audio data processing. The technical result is the generation of stereo comfort noise that models the spectral characteristics of the background noise in both channels, as well as the degree of correlation between them, while keeping the average bit rate comparable to that of mono applications. The technical result is achieved by generating a first audio signal using a first audio source; generating a second audio signal using a second audio source; generating a mixing noise signal using a mixing noise source; and mixing the mixing noise signal with the first audio signal to obtain a first channel and mixing the mixing noise signal with the second audio signal to obtain a second channel. The mixing uses a first amplitude element that affects the amplitude of the first audio signal; a first adder that sums the output signal of the first amplitude element and at least a portion of the mixing noise signal; a second amplitude element that affects the amplitude of the second audio signal; and a second adder that sums the output of the second amplitude element and at least a portion of the mixing noise signal.

EFFECT: stereo comfort noise that models the spectral characteristics of the background noise in both channels, as well as the degree of correlation between them, while keeping the average bit rate comparable to that of mono applications.

49 cl, 11 dwg

Description

The present invention relates, inter alia, to comfort noise generation (CNG) for discontinuous transmission (DTX) in stereo codecs. The invention also relates to a multichannel signal generator, an audio encoder and related methods, for example based on a mixing noise signal. The invention may be implemented in a device, in a system, in a method, in a non-transitory storage unit storing instructions which, when executed by a computer (processor, controller), cause the computer (processor, controller) to carry out a particular method, and in an encoded multichannel audio signal.

Introduction

Comfort noise generators are commonly used in discontinuous transmission (DTX) of audio signals, in particular audio signals containing speech. In this mode, the audio signal is first classified into active and inactive frames by a voice activity detector (VAD). Based on the VAD result, only active speech frames are encoded and transmitted at the nominal bit rate. During long pauses in which only background noise is present, the bit rate is reduced or set to zero, and the background noise is encoded parametrically using silence insertion descriptor frames (SID frames). The average bit rate is thereby significantly reduced.

The noise is generated during inactive frames at the decoder side by a comfort noise generator (CNG). In practice, the SID frame size is very limited. Therefore, the number of parameters describing the background noise must be kept as small as possible. To this end, noise estimation is not applied directly to the output of the spectral transforms. Instead, it is applied at a lower spectral resolution by averaging the input power spectrum over groups of frequency bands, for example according to the Bark scale. The averaging can be done with arithmetic or geometric means. Unfortunately, the limited number of parameters carried in SID frames cannot capture the fine spectral structure of the background noise. Therefore, only the smoothed spectral envelope of the noise can be reproduced by the CNG. When the VAD triggers a CNG frame, the discrepancy between the smoothed spectrum of the reconstructed comfort noise and the spectrum of the actual background noise can become clearly audible at transitions between active frames (which involve regular encoding and decoding of the noisy speech portion of the signal) and CNG frames.
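The band-averaging step can be sketched as follows. This is a minimal illustration, not the EVS band layout: the band edges below are hypothetical and merely widen with frequency in a Bark-like way, and the function name `band_average` is invented for this example. It shows how many spectral bins collapse into one parameter per band, using either an arithmetic or a geometric mean.

```python
import math


def band_average(power_spectrum, band_edges, geometric=False):
    """Average a power spectrum over groups of bins (one value per band).

    Band k covers bins band_edges[k] up to, but not including, band_edges[k + 1].
    """
    averages = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = power_spectrum[lo:hi]
        if geometric:
            # Geometric mean = exp(mean of log power); requires strictly positive power.
            averages.append(math.exp(sum(math.log(p) for p in band) / len(band)))
        else:
            # Arithmetic mean of the power values in the band.
            averages.append(sum(band) / len(band))
    return averages


# Hypothetical layout: bands widen with frequency, loosely imitating a Bark-like grouping.
edges = [0, 1, 2, 4, 8]
spectrum = [4.0, 1.0, 2.0, 8.0, 1.0, 1.0, 1.0, 1.0]
print(band_average(spectrum, edges))                  # [4.0, 1.0, 5.0, 1.0]
print(band_average(spectrum, edges, geometric=True))  # approximately [4.0, 1.0, 4.0, 1.0]
```

The geometric mean is less sensitive to a single strong bin (band 2 yields 4.0 instead of 5.0), which is one reason it is an option for noise envelopes.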

Examples of CNG technologies can be found in ITU-T Recommendations G.729B [1], G.729.1C [2] and G.718 [3], and in the 3GPP specifications for AMR [4] and AMR-WB [5]. All of these technologies generate comfort noise (CN) using a linear prediction (LP) analysis/synthesis approach.

To further reduce the bit rate, the 3GPP Enhanced Voice Services (EVS) codec for LTE communication [6] is equipped with a discontinuous transmission (DTX) mode that applies comfort noise generation (CNG) to inactive frames, i.e. frames determined to consist only of background noise. For these frames, a low-bit-rate parametric representation of the signal is transmitted in silence insertion descriptor (SID) frames at most every 8 frames (160 ms). This allows the CNG in the decoder to generate an artificial noise signal that resembles the actual background noise. In EVS, CNG can be achieved using either a linear prediction scheme (LP-CNG) or a frequency-domain scheme (FD-CNG), depending on the spectral characteristics of the background noise.
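The DTX frame scheduling described above can be sketched as a mapping from per-frame VAD decisions to transmitted frame types. This is a simplified illustration under stated assumptions: 20 ms frames (so 8 frames correspond to 160 ms), a fixed SID interval, and a hypothetical function `dtx_frame_types`. Real codecs add further logic (for example, hangover handling after speech) that is not modeled here.

```python
SID_INTERVAL = 8  # frames; with 20 ms frames this is the 160 ms cited above


def dtx_frame_types(vad_flags, sid_interval=SID_INTERVAL):
    """Map per-frame VAD decisions to the frame type sent in DTX mode.

    Simplified sketch: active frames are coded normally; the first inactive
    frame after speech carries a SID, and later SIDs follow every sid_interval
    frames. Frames in between carry no data.
    """
    types = []
    since_sid = None  # None means no SID has been sent in this inactive period
    for active in vad_flags:
        if active:
            types.append("ACTIVE")
            since_sid = None
        elif since_sid is None or since_sid >= sid_interval - 1:
            types.append("SID")
            since_sid = 0
        else:
            types.append("NO_DATA")
            since_sid += 1
    return types


frames = dtx_frame_types([True] + [False] * 10)
print([i for i, t in enumerate(frames) if t == "SID"])  # [1, 9]
```

Of the ten inactive frames, only two carry a SID; the other eight transmit nothing, which is where the average bit rate saving comes from.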

The LP-CNG approach in EVS [7] is based on split-band coding, consisting of an analysis/synthesis coding stage in the low band and in the high band. Unlike in the low band, no parametric modeling of the noise spectrum is performed for the high-band signal. Only the energy of the high-band signal is encoded and transmitted to the decoder, and the high-band noise spectrum is generated entirely at the decoder side. The low-band and high-band CN are both synthesized by filtering an excitation through a synthesis filter. The low-band excitation is derived from the received low-band excitation energy and the low-band excitation frequency envelope. The low-band synthesis filter is derived from the received LP parameters in the form of line spectral frequencies (LSFs). The high-band excitation is obtained using an energy extrapolated from the low-band energy, and the high-band synthesis filter is derived from LSF interpolation at the decoder side. The high-band synthesis is spectrally flipped and added to the low-band synthesis to form the final CN signal.
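The core LP synthesis step (an excitation filtered through a synthesis filter) can be sketched as follows. This is a minimal sketch, not the EVS LP-CNG: it assumes LP coefficients are already available (the LSF decoding, interpolation, excitation envelope shaping and the high band are omitted), and the names `lp_synthesis` and `lp_cng_frame`, the 320-sample frame and the coefficient values are hypothetical.

```python
import random


def lp_synthesis(excitation, lpc):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + a1*z^-1 + ... + ap*z^-p."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]  # feedback from past output samples
        out.append(y)
    return out


def lp_cng_frame(energy, lpc, length, rng):
    """One frame of LP comfort noise: scaled white excitation through 1/A(z)."""
    exc = [rng.gauss(0.0, 1.0) for _ in range(length)]
    # Scale the excitation so its mean energy per sample equals `energy`.
    gain = (energy / (sum(v * v for v in exc) / length)) ** 0.5
    return lp_synthesis([gain * v for v in exc], lpc)


print(lp_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25, 0.125]
frame = lp_cng_frame(energy=0.01, lpc=[-0.9], length=320, rng=random.Random(0))
```

The impulse response check shows the all-pole filter imposing a spectral tilt on the flat excitation; in a real LP-CNG, the filter coefficients carry the transmitted envelope of the background noise.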

The FD-CNG approach [8], [9] uses a frequency-domain noise estimation algorithm followed by vector quantization of the smoothed spectral envelope of the background noise. The decoded envelope is refined in the decoder by running a second frequency-domain noise estimator. Since a purely parametric representation is used during inactive frames, the noise signal itself is not available at the decoder in this case. In FD-CNG, noise estimation is performed in every frame (active and inactive), at both the encoder and decoder sides, based on a minimum statistics algorithm.
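The idea behind minimum-statistics noise estimation can be sketched in a few lines: smooth the power in each bin over time and take the minimum over a sliding window, so that short speech bursts do not lift the noise floor. This is a strongly simplified, hypothetical sketch (`min_stats_noise`, `alpha` and `window` are invented for illustration); the actual algorithm used in FD-CNG includes bias compensation and adaptive smoothing that are not reproduced here.

```python
def min_stats_noise(frames_power, alpha=0.9, window=8):
    """Per-bin noise floor: minimum of recursively smoothed power over recent frames.

    Simplified sketch of the minimum-statistics idea; no bias compensation
    or adaptive smoothing.
    """
    smoothed = list(frames_power[0])
    history, estimates = [], []
    for power in frames_power:
        # First-order recursive smoothing of the power in each bin.
        smoothed = [alpha * s + (1 - alpha) * p for s, p in zip(smoothed, power)]
        history = (history + [smoothed])[-window:]
        # The noise estimate is the per-bin minimum over the last `window` frames.
        estimates.append([min(col) for col in zip(*history)])
    return estimates


# Steady noise at power 1.0 with a short speech burst in frame 10.
frames = [[1.0]] * 10 + [[100.0]] + [[1.0]] * 2
est = min_stats_noise(frames)
print(round(est[-1][0], 3))  # stays near 1.0 despite the burst
```

Because the estimate tracks a minimum rather than an average, it can run in every frame, active or inactive, as the text describes.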

A method for generating comfort noise in the case of two (or more) channels is described in [10]. [10] describes a system for stereo DTX and CNG that combines a mono SID with a per-band coherence measure computed from the two stereo input channels in the encoder. At the decoder, the mono CNG information and the coherence values are decoded from the bitstream, and the target coherence is synthesized in a number of frequency bands. To reduce the bit rate of the resulting stereo SID frame, the coherence values are encoded using a predictive scheme followed by variable-bit-rate entropy coding. Comfort noise is generated for each channel using the methods described in the preceding paragraphs, and the two CNs are then mixed per frequency band using a weighting formula based on the transmitted per-band coherence values included in the SID frame.
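To make "per-band coherence" concrete, the sketch below computes the standard magnitude-squared coherence, |S_xy|^2 / (S_xx S_yy), from complex spectra accumulated over several frames and over the bins of each band. The exact coherence measure, band layout and averaging used in [10] are not given here, so this generic estimator and the function name `band_coherence` are assumptions for illustration only.

```python
def band_coherence(left_frames, right_frames, band_edges):
    """Magnitude-squared coherence per band, estimated over several frames.

    left_frames / right_frames: lists of frames, each a list of complex bins.
    Cross- and auto-spectra are accumulated over all bins and frames of a band.
    """
    result = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sxx = syy = 0.0
        sxy = 0j
        for left, right in zip(left_frames, right_frames):
            for b in range(lo, hi):
                sxx += abs(left[b]) ** 2
                syy += abs(right[b]) ** 2
                sxy += left[b] * right[b].conjugate()
        result.append(abs(sxy) ** 2 / (sxx * syy) if sxx > 0 and syy > 0 else 0.0)
    return result


L = [[1 + 1j, 2.0], [0.5j, -1.0]]
R_same = [[2 * x for x in frame] for frame in L]  # scaled copy: fully coherent
print(band_coherence(L, R_same, [0, 2]))          # approximately [1.0]

a = [[1 + 0j], [1 + 0j]]
b = [[1 + 0j], [-1 + 0j]]  # phase flips between frames: no consistent relation
print(band_coherence(a, b, [0, 1]))               # [0.0]
```

A value near 1 indicates a narrow, highly correlated background in that band; a value near 0 indicates a wide, diffuse one.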

Background/disadvantages of the prior art

In a stereo system, generating the background noise separately for each channel results in completely decorrelated noise, which sounds unpleasant and differs significantly from the actual background noise, causing abrupt audible transitions when switching between the active-mode background and the DTX-mode background. In addition, the stereo image of the background cannot be preserved using only two fully decorrelated noise sources. Finally, if there is a background noise source and the talker moves around it with a handheld device, the spatial image of the background noise changes over time, which cannot be replicated when the background noise is reconstructed independently for each channel. Therefore, a new approach to this problem is needed for stereo signals.

This problem is also addressed in [10]; however, in the embodiments, inserting a noise source common to the two channels, to simulate correlated noise when generating the final comfort noise, plays an important role in simulating a stereo recording of background noise.

Existing speech communication codecs typically encode only mono signals. Consequently, most existing DTX systems are designed for mono CNG. Simply operating the DTX mode independently on both channels of a stereo signal seems straightforward, but raises several problems. First, this approach requires transmitting two sets of parameters describing the two background noise signals in the two channels. This would increase the data rate needed to transmit SID frames and reduce the benefit of the lower network load. Another problem is the VAD decision, which must be synchronized between the channels to avoid artifacts and distortions in the spatial image of the stereo signal, and to optimize the bit rate reduction of the system. Furthermore, if CNG is applied at the receiver independently for both channels, the two independent CNG algorithms will typically produce two random noise signals with zero or very low coherence. This results in a very wide stereo image in the generated comfort noise. On the other hand, using a single noise generator and the same comfort noise signal in both channels results in very high coherence and a very narrow stereo image. For most stereo signals, however, the stereo image and its spatial impression lie somewhere between these two extremes. Switching to or from active frames in DTX mode would therefore introduce abrupt audible transitions. Additionally, if there is a background noise source and the talker moves around it with a handheld device, the spatial image of the background noise changes over time, which cannot be replicated when the background noise is reconstructed independently for each channel. Therefore, a new approach to this problem is needed for stereo signals.
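The two extremes described above can be checked numerically: two independently generated noise signals have a normalized correlation close to 0 (very wide image), while feeding the same noise to both channels gives exactly 1 (very narrow image). The sketch assumes zero-mean Gaussian noise from Python's `random` module; the helper name `correlation` is hypothetical.

```python
import random


def correlation(x, y):
    """Normalized cross-correlation at lag 0 (zero-mean signals assumed)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n1 = [rng.gauss(0, 1) for _ in range(20000)]
n2 = [rng.gauss(0, 1) for _ in range(20000)]
print(round(correlation(n1, n2), 2))  # close to 0: independent CNG per channel, very wide image
print(round(correlation(n1, n1), 2))  # 1.0: same noise in both channels, very narrow image
```

Realistic stereo backgrounds fall between these two values, which is the gap a mixed, partially common noise has to fill.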

The system described in [10] solves these problems by transmitting the information for mono CNG together with parameter values used to resynthesize the stereo image of the background noise at the decoder. This type of DTX system is best suited for parametric stereo coders, which apply a downmix to the two input channels before encoding and transmission, and from which the mono CNG parameters can be derived. In a discrete stereo coding scheme, however, two channels are typically still coded jointly, and upmix parameters such as a fine-grained coherence measure are typically not computed. Such stereo coders therefore require a different approach.

Aspects of the Present Invention

The present examples provide efficient transmission of stereo speech signals. Stereo transmission can improve the user experience and speech intelligibility compared with (mono) transmission of a single audio channel, particularly in situations with overlapping background noise or other sounds. Stereo signals can be coded parametrically: a mono downmix of the two stereo channels is computed, and this single downmix channel is encoded and transmitted to the receiver together with side information used to approximate the original stereo signal at the decoder. Another approach is discrete stereo coding, which aims to remove inter-channel redundancy through some signal pre-processing, yielding a more compact two-channel representation of the original signal. The two processed channels are then encoded and transmitted, and the inverse processing is applied at the decoder. Side information relevant to the stereo processing may also be transmitted along with the two channels. The main difference between parametric and discrete stereo coding therefore lies in the number of transmitted channels.
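The text does not name the pre-processing used in discrete stereo coding. Mid/side (sum/difference) coding is one common example and is used here purely as an assumption, with hypothetical function names. It illustrates the point made above: redundancy is moved out of one channel, yet two channels are still transmitted, and the decoder applies the inverse processing.

```python
def ms_encode(left, right):
    """Mid/side transform: concentrates energy in `mid` when L and R are similar."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse processing applied at the decoder."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


L = [1.0, 0.5, -0.25]
R = [1.0, 0.25, -0.25]
M, S = ms_encode(L, R)
print(S)                          # [0.0, 0.125, 0.0]: little energy left in the side channel
print(ms_decode(M, S) == (L, R))  # True: perfect reconstruction before quantization
```

In a parametric coder, by contrast, only a downmix such as `M` would be transmitted, with `S` replaced by compact side parameters.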

Typically, a conversation has periods in which not all participants are actively speaking. During these periods, the input signal to the speech encoder therefore consists mainly of background noise or (near) silence. To reduce the data rate and the load on the transmission network, speech encoders attempt to distinguish between frames containing speech (active frames) and frames containing mainly background noise or silence (inactive frames). For inactive frames, the data rate can be reduced significantly: instead of coding the audio signal as in active frames, a low-bit-rate parametric description of the current background noise is derived in the form of a silence insertion descriptor (SID) frame. This SID frame is sent periodically to the decoder to update the parameters describing the background noise, while for the inactive frames in between the bit rate is reduced or no information is transmitted at all. At the decoder, the background noise is remodeled by a comfort noise generation (CNG) algorithm using the parameters carried in the SID frame. In this way, the transmission rate can be lowered or even set to zero for inactive frames without the user perceiving this as an interruption or end of the connection.

A DTX system for discretely coded stereo signals is described, consisting of a stereo SID and a CNG method that generates stereo comfort noise by modeling the spectral characteristics of the background noise in both channels, as well as the degree of correlation between them, while keeping the average bit rate comparable to that of mono applications.

Disclosure of the Invention

According to an aspect, there is provided a multichannel signal generator for generating a multichannel signal having a first channel and a second channel, the generator comprising:

- a first audio source for generating a first audio signal;

- a second audio source for generating a second audio signal;

- a mixing noise source for generating a mixing noise signal; and

- a mixer for mixing the mixing noise signal and the first audio signal to obtain the first channel, and for mixing the mixing noise signal and the second audio signal to obtain the second channel.
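The generator structure listed above, together with the amplitude elements and adders named in the abstract, can be sketched as follows. This is an illustration under assumptions, not the patented implementation: all three sources are taken to be independent Gaussian noise (one of the aspects below), the amplitude elements are modeled as plain gains, and the names `generate_two_channels`, `direct_gain` and `mix_gain` are hypothetical.

```python
import random


def generate_two_channels(n, direct_gain, mix_gain, seed=0):
    """Sketch of the claimed structure with three independent Gaussian sources.

    ch1 = direct_gain * first + mix_gain * mixing   (first amplitude element + first adder)
    ch2 = direct_gain * second + mix_gain * mixing  (second amplitude element + second adder)
    """
    rng = random.Random(seed)
    first = [rng.gauss(0.0, 1.0) for _ in range(n)]   # first audio source
    second = [rng.gauss(0.0, 1.0) for _ in range(n)]  # second audio source
    mixing = [rng.gauss(0.0, 1.0) for _ in range(n)]  # mixing noise source (shared)
    ch1 = [direct_gain * a + mix_gain * m for a, m in zip(first, mixing)]
    ch2 = [direct_gain * b + mix_gain * m for b, m in zip(second, mixing)]
    return ch1, ch2


ch1, ch2 = generate_two_channels(4, direct_gain=0.0, mix_gain=1.0)
print(ch1 == ch2)  # True: with no direct part, both channels carry only the common noise
```

The key design point is that the same `mixing` samples enter both adders; this common component is what gives the two channels a controllable degree of correlation.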

According to an aspect, the first audio source is a first noise source and the first audio signal is a first noise signal, or the second audio source is a second noise source and the second audio signal is a second noise signal,

- wherein the first noise source or the second noise source is configured to generate the first noise signal or the second noise signal such that the first noise signal or the second noise signal is decorrelated from the mixing noise signal.

According to an aspect, the mixer is configured to generate the first channel and the second channel such that the amount of the mixing noise signal in the first channel is equal to the amount of the mixing noise signal in the second channel, or lies within a range of 80 to 120 percent of the amount of the mixing noise signal in the second channel.

According to an aspect, the mixer comprises a control input for receiving a control parameter, and the mixer is configured to control the amount of the mixing noise signal in the first channel and in the second channel in response to the control parameter.
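One way to see how a single control parameter can set the inter-channel correlation is the following derivation. It assumes, as in the neighboring aspects, that the three sources are zero-mean, unit-variance and mutually uncorrelated, and it supposes (hypothetically; these weights are not taken from this text) that the mixer derives its gains from a control parameter $\gamma \in [0, 1]$ and puts the same amount of mixing noise $N_m$ into both channels:

```latex
\begin{aligned}
c_1 &= \sqrt{1-\gamma}\,N_1 + \sqrt{\gamma}\,N_m, \qquad
c_2 = \sqrt{1-\gamma}\,N_2 + \sqrt{\gamma}\,N_m,\\
\mathrm{E}[c_1 c_2] &= (1-\gamma)\,\mathrm{E}[N_1 N_2]
  + \sqrt{\gamma(1-\gamma)}\,\bigl(\mathrm{E}[N_1 N_m] + \mathrm{E}[N_m N_2]\bigr)
  + \gamma\,\mathrm{E}[N_m^2] = \gamma,\\
\mathrm{E}[c_1^2] &= \mathrm{E}[c_2^2] = (1-\gamma) + \gamma = 1
\;\Rightarrow\;
\rho_{12} = \frac{\mathrm{E}[c_1 c_2]}{\sqrt{\mathrm{E}[c_1^2]\,\mathrm{E}[c_2^2]}} = \gamma.
\end{aligned}
```

Under these assumptions, $\gamma = 0$ yields fully decorrelated channels (the very wide image) and $\gamma = 1$ yields identical channels (the very narrow image), while intermediate values cover the range in between, with the channel power held constant.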

According to an aspect, each of the first audio source, the second audio source and the mixing noise source is a Gaussian noise source.

According to an aspect, the first audio source comprises a first noise generator for generating the first audio signal as a first noise signal, wherein the second audio source comprises a decorrelator for decorrelating the first noise signal to generate the second audio signal as a second noise signal, and wherein the mixing noise source comprises a second noise generator, or

- wherein the first audio source comprises a first noise generator for generating the first audio signal as a first noise signal, the second audio source comprises a second noise generator for generating the second audio signal as a second noise signal, and the mixing noise source comprises a decorrelator for decorrelating the first noise signal or the second noise signal to generate the mixing noise signal, or

- wherein one of the first audio source, the second audio source and the mixing noise source comprises a noise generator for generating a noise signal, another one of them comprises a first decorrelator for decorrelating the noise signal, and yet another one of them comprises a second decorrelator for decorrelating the noise signal, the first decorrelator and the second decorrelator differing from each other such that their output signals are decorrelated from each other, or

- wherein the first audio source comprises a first noise generator, the second audio source comprises a second noise generator, and the mixing noise source comprises a third noise generator, the first, second and third noise generators being configured to generate mutually decorrelated noise signals.
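As a hedged illustration of the first alternative, the sketch below derives the second noise signal from the first by a circular delay, which is one simple decorrelator for white noise (white-noise samples at different time instants are uncorrelated), and draws the mixing noise from a second, independently seeded generator. The delay value, seeds and function names are hypothetical and not taken from the source; a practical codec may use a different decorrelator.

```python
import random

def white_noise(n, seed):
    """Gaussian white noise from a seeded generator (illustrative only)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def delay_decorrelator(x, delay):
    """Hypothetical decorrelator: circular delay by `delay` samples.
    For white noise, x[t] and x[t - delay] are uncorrelated."""
    return x[-delay:] + x[:-delay]

def correlation(a, b):
    """Normalized zero-lag correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

n1 = white_noise(20000, seed=1)        # first audio signal (first noise signal)
n2 = delay_decorrelator(n1, delay=37)  # second audio signal, via the decorrelator
n3 = white_noise(20000, seed=2)        # mixing noise from a second noise generator
print(round(correlation(n1, n2), 3), round(correlation(n1, n3), 3))
```

Both printed correlations should be close to zero for long frames; the residual value shrinks roughly as one over the square root of the frame length.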

According to an aspect, one of the first audio source, the second audio source and the mixing noise source comprises a pseudo-random number sequence generator configured to generate a pseudo-random number sequence in response to a seed, and at least two of the first audio source, the second audio source and the mixing noise source are configured to initialize the pseudo-random number sequence generator with different seeds.
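One simple reading of this aspect, sketched below under stated assumptions, gives each source its own seeded pseudo-random generator: the same seed reproduces the same sequence, while different seeds yield different sequences. The seed values and the `make_source` helper are hypothetical; the source does not specify them.

```python
import random

def make_source(seed):
    """Return a Gaussian noise source backed by its own seeded PRNG.
    The same seed reproduces the same sequence; different seeds differ."""
    rng = random.Random(seed)
    return lambda n: [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical seed values (not specified in the source).
SEEDS = {"first": 11, "second": 22, "mixing": 33}
sources = {name: make_source(s) for name, s in SEEDS.items()}

frame_len = 8
frames = {name: src(frame_len) for name, src in sources.items()}
```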

According to an aspect, at least one of the first audio source, the second audio source and the mixing noise source is configured to operate using a pre-stored noise table, or

- wherein at least one of the first audio source, the second audio source and the mixing noise source is configured to generate a complex spectrum for a frame using a first noise value for the real part and a second noise value for the imaginary part,

- wherein, optionally, at least one noise generator is configured to generate a complex spectral noise value for frequency bin k using, for one of the real part and the imaginary part, a first random value with index k and, for the other of the real part and the imaginary part, a second random value with index (k+M), the first noise value and the second noise value being contained in a noise array, for example obtained from a random number sequence generator, from a noise table or from a noise process, that spans the indices from a start index to an end index, the start index being less than M and the end index being equal to or less than 2M, where M and k are integers.
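The indexing scheme above can be sketched as follows, assuming the real part takes index k and the imaginary part takes index k+M (the text allows either assignment). The function name and the toy values are illustrative only.

```python
import random

def complex_noise_spectrum(noise, M, start=0):
    """Build complex noise values for bins k = start..M-1.
    Real part: noise[k]; imaginary part: noise[k + M].
    `noise` must hold at least 2*M values (highest index used is 2M - 1)."""
    if not 0 <= start < M:
        raise ValueError("start index must satisfy 0 <= start < M")
    if len(noise) < 2 * M:
        raise ValueError("noise array must contain at least 2*M values")
    return [complex(noise[k], noise[k + M]) for k in range(start, M)]

M = 4
rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 * M)]  # one noise array per frame
spectrum = complex_noise_spectrum(noise, M)
```

Drawing both parts from one array of length 2M means a single pass of the random number generator (or a single table read) fills the whole frame.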

According to an aspect, the mixer comprises:

- a first amplitude element for influencing the amplitude of the first audio signal;

- a first adder for summing the output signal of the first amplitude element and at least a portion of the mixing noise signal;

- a second amplitude element for influencing the amplitude of the second audio signal;

- a second adder for summing the output of the second amplitude element and at least a portion of the mixing noise signal,

- wherein the amount of influence exerted by the first amplitude element and the amount of influence exerted by the second amplitude element are equal to each other, or the amount of influence exerted by the second amplitude element differs by less than 20 percent from the amount of influence exerted by the first amplitude element.
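A minimal sketch of this mixer structure follows, assuming the amplitude elements are plain gains and the "portion" of the mixing noise fed to each adder is the noise scaled by a common, hypothetical gain `g_mix`. The 20 percent check mirrors the condition stated above.

```python
def mix(first, second, mixing_noise, g1, g2, g_mix=1.0):
    """Two-channel mixer sketch.
    g1, g2: gains of the first and second amplitude elements;
    g_mix: hypothetical scaling of the mixing-noise portion fed to each adder."""
    if not (len(first) == len(second) == len(mixing_noise)):
        raise ValueError("signals must have equal length")
    # The two amplitude elements act equally, or differ by less than 20 percent.
    if g1 != g2 and abs(g2 - g1) >= 0.2 * abs(g1):
        raise ValueError("second gain must be within 20 percent of the first")
    ch1 = [g1 * a + g_mix * m for a, m in zip(first, mixing_noise)]   # first adder
    ch2 = [g2 * b + g_mix * m for b, m in zip(second, mixing_noise)]  # second adder
    return ch1, ch2
```

For example, `mix([1.0, 2.0], [3.0, 4.0], [0.5, -0.5], g1=0.8, g2=0.8)` adds the same noise sample to both channels at each time index, which is what introduces inter-channel correlation.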

According to an aspect, the mixer comprises a third amplitude element for influencing the amplitude of the mixing noise signal,

- wherein the amount of influence exerted by the third amplitude element depends on the amount of influence exerted by the first amplitude element or the second amplitude element such that the amount of influence exerted by the third amplitude element increases as the amount of influence exerted by the first amplitude element or by the second amplitude element decreases.

According to an aspect, the amount of influence exerted by the third amplitude element is the square root of a value c_q, and the amount of influence exerted by each of the first amplitude element and the second amplitude element is the square root of the difference between one and c_q, i.e. sqrt(c_q) for the mixing noise and sqrt(1 - c_q) for the first and second audio signals.
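Why these gains yield a target correlation can be shown in a short derivation. It assumes (for this sketch only) that the first audio signal N_1, the second audio signal N_2 and the mixing noise N_3 are zero-mean, unit-variance and mutually uncorrelated:

```latex
\begin{aligned}
L &= \sqrt{1-c_q}\,N_1 + \sqrt{c_q}\,N_3, \qquad
R = \sqrt{1-c_q}\,N_2 + \sqrt{c_q}\,N_3,\\
\mathbb{E}[L^2] &= (1-c_q) + c_q = 1 = \mathbb{E}[R^2],\\
\mathbb{E}[LR] &= (1-c_q)\,\mathbb{E}[N_1N_2]
  + \sqrt{c_q(1-c_q)}\,\big(\mathbb{E}[N_1N_3] + \mathbb{E}[N_3N_2]\big)
  + c_q\,\mathbb{E}[N_3^2] = c_q,\\
\rho_{LR} &= \frac{\mathbb{E}[LR]}{\sqrt{\mathbb{E}[L^2]\,\mathbb{E}[R^2]}} = c_q .
\end{aligned}
```

Under these assumptions, c_q = 0 gives uncorrelated channels, c_q = 1 gives identical channels, and the power of each channel stays constant as c_q varies, consistent with the third gain rising as the first and second gains fall.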

According to an aspect, the multi-channel signal generator further comprises an input interface for receiving encoded audio data in a sequence of frames comprising an active frame and an inactive frame following the active frame; and

- an audio decoder for decoding the encoded audio data for the active frame to generate a decoded multi-channel signal for the active frame,

- wherein the first audio source, the second audio source, the mixing noise source and the mixer are active in the inactive frame to generate the multi-channel signal for the inactive frame.
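The frame routing described above can be sketched as a simple dispatcher. The frame representation (a dict with a hypothetical `"active"` key) and the two callables are assumptions standing in for the real decoder and comfort noise generator.

```python
from typing import Callable, List, Sequence, Tuple

Stereo = Tuple[List[float], List[float]]

def process_stream(frames: Sequence[dict],
                   decode: Callable[[dict], Stereo],
                   generate_comfort_noise: Callable[[dict], Stereo]) -> List[Stereo]:
    """Route each received frame: active frames go to the audio decoder,
    inactive frames to the comfort noise path (audio sources + mixer)."""
    out = []
    for frame in frames:
        if frame["active"]:
            out.append(decode(frame))
        else:
            out.append(generate_comfort_noise(frame))
    return out
```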

According to an aspect, the encoded audio signal for the active frame has a first set of coefficients describing a first number of frequency bins; and

- the encoded audio signal for the inactive frame has a second set of coefficients describing a second number of frequency bins,

- wherein the first number of frequency bins is greater than the second number of frequency bins.
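Because the inactive-frame parameters describe fewer bins, the decoder has to map them back onto the full spectral resolution. The sketch below uses nearest-band repetition over equal-width bands; both the band layout and the mapping are assumptions for illustration, since the source only states that fewer bins are described.

```python
def expand_to_bins(coarse, n_bins):
    """Map len(coarse) band values onto n_bins bins by nearest-band repetition.
    Assumes equal-width bands; real codecs typically use non-uniform bands."""
    n_bands = len(coarse)
    if n_bands == 0 or n_bins < n_bands:
        raise ValueError("need at least one band and n_bins >= number of bands")
    return [coarse[k * n_bands // n_bins] for k in range(n_bins)]
```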

According to an aspect, the encoded audio data for the inactive frame comprise silence insertion descriptor data comprising comfort noise data that indicate the signal energy for each of the two channels, or for each of a first linear combination and a second linear combination of the first and second channels, in the inactive frame, and that indicate the coherence between the first channel and the second channel in the inactive frame, and

- wherein the mixer is configured to mix the mixing noise signal and the first audio signal or the second audio signal based on the comfort noise data indicating the coherence, and

- wherein the multi-channel signal generator further comprises a signal modification module for modifying the first channel and the second channel, or the first audio signal, the second audio signal or the mixing noise signal, the signal modification module being configured to be controlled by the comfort noise data indicating the signal energies for the first audio channel and the second audio channel, or indicating the signal energies for the first linear combination and the second linear combination of the first and second channels.
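A minimal sketch of such a signal modification module scales each noise bin by the square root of the transmitted energy for that bin. It assumes the input noise bins have unit expected energy, so the expected output energy equals the target; the function name is hypothetical.

```python
import math

def shape_channel(noise_spectrum, target_energy):
    """Scale each noise bin by sqrt(target energy) for that bin.
    Assumes unit expected energy per input bin (an assumption of this sketch),
    so the expected energy of each output bin equals target_energy[k]."""
    if len(noise_spectrum) != len(target_energy):
        raise ValueError("one target energy per bin is required")
    return [x * math.sqrt(e) for x, e in zip(noise_spectrum, target_energy)]
```

The same function works for real or complex spectral values, since only a real, non-negative gain is applied per bin.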

According to an aspect, the audio data for the inactive frame comprise:

- a first silence insertion descriptor frame for the first channel and a second silence insertion descriptor frame for the second channel, wherein the first silence insertion descriptor frame contains:

- comfort noise parameter data for the first channel and/or for the first linear combination of the first and second channels, and

- comfort noise generation side information for the first channel and the second channel, and

- wherein the second silence insertion descriptor frame contains:

- comfort noise parameter data for the second channel and/or for the second linear combination of the first and second channels, and

- coherence information indicating the coherence between the first channel and the second channel in the inactive frame, and

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal in the inactive frame: using the comfort noise generation side information of the first silence insertion descriptor frame to determine a comfort noise generation mode for the first channel and the second channel and/or for the first and second linear combinations of the first and second channels; using the coherence information in the second silence insertion descriptor frame to set the coherence between the first channel and the second channel in the inactive frame; and using the comfort noise parameter data from the first silence insertion descriptor frame and the comfort noise parameter data from the second silence insertion descriptor frame to set the energy of the first channel and the energy of the second channel.
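The split of information across the two silence insertion descriptor frames can be sketched as plain data structures plus a controller that combines them. All field names are hypothetical, and for simplicity the comfort noise parameter data are treated directly as per-band energies.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SidFrame1:
    cn_params_ch1: List[float]  # comfort noise parameter data, first channel
    cng_side_info: int          # e.g. a comfort noise generation mode index

@dataclass
class SidFrame2:
    cn_params_ch2: List[float]  # comfort noise parameter data, second channel
    coherence: float            # coherence between the channels

@dataclass
class CngControl:
    mode: int
    coherence: float
    energies_ch1: List[float]
    energies_ch2: List[float]

def controller(sid1: SidFrame1, sid2: SidFrame2) -> CngControl:
    """Mode from frame 1's side information, coherence from frame 2,
    channel energies from the parameter data of both frames."""
    return CngControl(mode=sid1.cng_side_info,
                      coherence=sid2.coherence,
                      energies_ch1=list(sid1.cn_params_ch1),
                      energies_ch2=list(sid2.cn_params_ch2))
```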

According to an aspect, the audio data for the inactive frame comprise:

- at least one silence insertion descriptor frame for a first linear combination of the first and second channels and a second linear combination of the first and second channels,

- wherein the at least one silence insertion descriptor frame contains:

- comfort noise parameter data (p_noise) for the first linear combination of the first and second channels, and

- comfort noise generation side information for the second linear combination of the first and second channels,

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal in the inactive frame using the comfort noise generation side information for the first linear combination and the second linear combination of the first and second channels, using coherence information in the second silence insertion descriptor frame to set the coherence between the first channel and the second channel in the inactive frame, and using the comfort noise parameter data from the at least one silence insertion descriptor frame to set the energy of the first channel and the energy of the second channel.

According to an aspect, the multi-channel signal generator comprises a spectral-time converter for converting the resulting first channel and the resulting second channel, after spectral adjustment and coherence adjustment, into corresponding time-domain representations that are to be combined or concatenated with the time-domain representations of the corresponding channels of the decoded multi-channel signal for the active frame.
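A deliberately simplified sketch of the spectral-time conversion follows: a naive inverse DFT per comfort-noise frame, followed by plain concatenation with the decoded active-frame signal. It omits the windowing and overlap-add that a real codec would typically use for smooth transitions; the function names are hypothetical.

```python
import cmath

def inverse_dft(spectrum):
    """Naive O(N^2) inverse DFT; returns the real part of the time signal."""
    n = len(spectrum)
    return [
        (sum(X * cmath.exp(2j * cmath.pi * k * t / n)
             for k, X in enumerate(spectrum)) / n).real
        for t in range(n)
    ]

def append_cng_frame(decoded_time, cng_spectrum):
    """Convert one comfort-noise channel to the time domain and concatenate it
    after the decoded time-domain signal of the preceding active frame."""
    return decoded_time + inverse_dft(cng_spectrum)
```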

According to an aspect, the audio data for the inactive frame comprise:

- a silence insertion descriptor frame containing comfort noise parameter data for the first and second channels, comfort noise generation side information for the first channel and the second channel and/or for the first linear combination and the second linear combination of the first and second channels, and coherence information indicating the coherence between the first channel and the second channel in the inactive frame, and

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal in the inactive frame using the comfort noise generation side information of the silence insertion descriptor frame to determine a comfort noise generation mode for the first channel and the second channel, using the coherence information in the silence insertion descriptor frame to set the coherence between the first channel and the second channel in the inactive frame, and using the comfort noise parameter data from the silence insertion descriptor frame to set the energy of the first channel and the energy of the second channel.

According to an aspect, the encoded audio data for the inactive frame comprise silence insertion descriptor data comprising comfort noise data indicating the signal energy of each channel in a mid/side representation, and coherence data indicating the coherence between the first channel and the second channel in a left/right representation, wherein the multi-channel signal generator is configured to convert the mid/side representation of the signal energies into a left/right representation of the signal energies in the first channel and the second channel,

- wherein the mixer is configured to mix the mixing noise signal into the first audio signal and the second audio signal based on the coherence data to obtain the first channel and the second channel, and

- wherein the multi-channel signal generator further comprises a signal modification module configured to modify the first and second channels by shaping them based on the signal energies in the left/right domain.
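The relation between mid/side and left/right energies can be written out explicitly. The transform convention is not stated in this section; the derivation below assumes the orthonormal form:

```latex
\begin{aligned}
M &= \tfrac{1}{\sqrt{2}}(L+R), \qquad S = \tfrac{1}{\sqrt{2}}(L-R)
\;\Longrightarrow\; L = \tfrac{1}{\sqrt{2}}(M+S), \quad R = \tfrac{1}{\sqrt{2}}(M-S),\\
E_L &= \mathbb{E}\,|L|^2 = \tfrac{1}{2}\big(E_M + E_S + 2\,\mathrm{Re}\,\mathbb{E}[M S^{*}]\big),\\
E_R &= \mathbb{E}\,|R|^2 = \tfrac{1}{2}\big(E_M + E_S - 2\,\mathrm{Re}\,\mathbb{E}[M S^{*}]\big),\\
E_L + E_R &= E_M + E_S .
\end{aligned}
```

The sum of the channel energies is preserved, but the individual values E_L and E_R also depend on the mid/side cross term. If that term is neglected (an assumption), both reduce to (E_M + E_S)/2. With the non-normalized convention M = (L+R)/2, S = (L−R)/2, the same steps give L = M + S and E_L + E_R = 2(E_M + E_S).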

According to an aspect, the multi-channel signal generator is configured to set the side channel coefficients to zero if the audio data contain signaling indicating that the energy in the side channel is below a predetermined threshold.
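The effect of this signaling can be sketched as follows. The flag name and the orthonormal inverse mid/side transform are assumptions for illustration. With all side coefficients zeroed, the reconstructed left and right channels become identical.

```python
import math

def apply_side_flag(side_coeffs, side_energy_below_threshold):
    """If the bitstream signals that side-channel energy is below the
    threshold, replace all side-channel coefficients by zeros."""
    if side_energy_below_threshold:
        return [0.0] * len(side_coeffs)
    return list(side_coeffs)

def ms_to_lr(mid, side):
    """Assumed orthonormal inverse M/S transform: L=(M+S)/sqrt2, R=(M-S)/sqrt2."""
    s = 1 / math.sqrt(2)
    left = [s * (m + x) for m, x in zip(mid, side)]
    right = [s * (m - x) for m, x in zip(mid, side)]
    return left, right
```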

According to an aspect, the audio data for the inactive frame comprise:

- at least one silence insertion descriptor frame containing comfort noise parameter data for the mid and side channels, comfort noise generation side information for the mid and side channels, and coherence information indicating the coherence between the first channel and the second channel in the inactive frame, and

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal in the inactive frame using the comfort noise generation side information of the silence insertion descriptor frame to determine a comfort noise generation mode for the first channel and the second channel, using the coherence information in the silence insertion descriptor frame to set the coherence between the first channel and the second channel in the inactive frame, and using the comfort noise parameter data, or a processed version thereof, from the silence insertion descriptor frame to set the energy of the first channel and the energy of the second channel.

According to an aspect, the multi-channel signal generator is configured to scale the signal energy coefficients for the first and second channels by gain information encoded together with the comfort noise parameter data for the first and second channels.

According to an aspect, the multi-channel signal generator is configured to convert the generated multi-channel signal from a frequency-domain version into a time-domain version.

- wherein the first noise source or the second noise source is configured to generate the first noise signal or the second noise signal such that the first noise signal and the second noise signal are at least partially correlated, and

- the mixing noise source is configured to generate a mixing noise signal having a first mixing noise portion and a second mixing noise portion, the second mixing noise portion being at least partially decorrelated from the first mixing noise portion; and

- the mixer is configured to mix the first mixing noise portion of the mixing noise signal with the first audio signal to obtain the first channel, and to mix the second mixing noise portion of the mixing noise signal with the second audio signal to obtain the second channel.

According to an aspect, there is provided a method for generating a multi-channel signal having a first channel and a second channel, the method comprising:

- generating a first audio signal using a first audio source;

- generating a second audio signal using a second audio source;

- generating a mixing noise signal using a mixing noise source; and

- mixing the mixing noise signal and the first audio signal to obtain the first channel, and mixing the mixing noise signal and the second audio signal to obtain the second channel.

According to an aspect, an audio encoder is provided for generating an encoded multi-channel audio signal for a sequence of frames comprising an active frame and an inactive frame, the audio encoder comprising:

- an activity detector for analyzing the multi-channel signal to determine that a frame of the sequence of frames is an inactive frame;

- a noise parameter calculation module for calculating first parametric noise data for a first channel of the multi-channel signal and second parametric noise data for a second channel of the multi-channel signal;

- a coherence calculation module for calculating coherence data indicating the coherence situation between the first channel and the second channel in the inactive frame; and

- an output interface for generating the encoded multi-channel audio signal. This signal contains encoded audio data for the active frame. For the inactive frame, it contains the coherence data together with either the first and second parametric noise data, or a first linear combination and a second linear combination of the first and second parametric noise data.
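The following sketch shows how work is divided among the four encoder components. The container `EncodedFrame`, its field names and the callables passed to `encode_frame` are hypothetical placeholders. The sketch only shows which data end up in the encoded signal for active and inactive frames. It covers the variant that carries per-channel parametric noise data rather than linear combinations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

@dataclass
class EncodedFrame:
    """Hypothetical per-frame container; field names are illustrative only."""
    active: bool
    audio_payload: Optional[bytes] = None   # encoded audio data (active frame)
    noise_1: Optional[List[float]] = None   # first parametric noise data
    noise_2: Optional[List[float]] = None   # second parametric noise data
    coherence: Optional[float] = None       # coherence data

def encode_frame(
    ch1: Sequence[float],
    ch2: Sequence[float],
    detect_inactive: Callable[[Sequence[float], Sequence[float]], bool],
    core_encode: Callable[[Sequence[float], Sequence[float]], bytes],
    noise_params: Callable[[Sequence[float]], List[float]],
    coherence: Callable[[Sequence[float], Sequence[float]], float],
) -> EncodedFrame:
    # Activity detector: decide whether this frame is inactive.
    if not detect_inactive(ch1, ch2):
        # Active frame: regular encoded audio data only.
        return EncodedFrame(active=True, audio_payload=core_encode(ch1, ch2))
    # Inactive frame: per-channel noise parameters plus coherence data.
    return EncodedFrame(
        active=False,
        noise_1=noise_params(ch1),
        noise_2=noise_params(ch2),
        coherence=coherence(ch1, ch2),
    )
```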

According to an aspect, the coherence calculation module is configured to calculate a coherence value and quantize it to obtain a quantized coherence value. The output interface is configured to use the quantized coherence value as the coherence data in the encoded multi-channel signal.

According to an aspect, the coherence calculation module is configured to:

- calculate a real intermediate value and an imaginary intermediate value from the complex spectral values of the first channel and the second channel in the inactive frame;

- calculate a first energy value for the first channel and a second energy value for the second channel in the inactive frame; and

- calculate the coherence data using the real intermediate value, the imaginary intermediate value, the first energy value and the second energy value, or

- smooth at least one of the real intermediate value, the imaginary intermediate value, the first energy value and the second energy value, and calculate the coherence data using the at least one smoothed value.

According to an aspect, the coherence calculation module is configured to calculate the real intermediate value as a sum over the real parts of the products of the complex spectral values of corresponding frequency bins of the first channel and the second channel in the inactive frame, or

- to calculate the imaginary intermediate value as a sum over the imaginary parts of the products of the complex spectral values of corresponding frequency bins of the first channel and the second channel in the inactive frame.

According to an aspect, the coherence calculation module is configured to square the smoothed real intermediate value and the smoothed imaginary intermediate value, and to sum the squared values to obtain a first component number,

- wherein the coherence calculation module is configured to multiply the smoothed first and second energy values to obtain a second component number, and to combine the first and second component numbers to obtain a resulting number for the coherence value on which the coherence data are based.

According to an aspect, the coherence calculation module is configured to calculate the square root of the resulting number to obtain the coherence value on which the coherence data are based.
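Below is a minimal sketch of this coherence computation. It rests on three assumptions that the text does not fix:

- the per-bin product is X1·conj(X2), the usual cross-spectrum convention;
- smoothing is first-order recursive with a hypothetical factor `alpha`;
- the two component numbers are combined by dividing the first by the second.

The result is coh = sqrt((Re² + Im²) / (E1·E2)). By the Cauchy-Schwarz inequality, this value lies in [0, 1].

```python
import math

class CoherenceEstimator:
    """Sketch of a smoothed inter-channel coherence estimate (assumptions above)."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha   # hypothetical smoothing factor
        self.state = None    # smoothed (re, im, e1, e2)

    def update(self, x1, x2):
        """x1, x2: complex spectral values of the two channels for the same bins."""
        re = im = e1 = e2 = 0.0
        for a, b in zip(x1, x2):
            p = a * b.conjugate()              # per-bin cross product
            re += p.real                       # real intermediate value
            im += p.imag                       # imaginary intermediate value
            e1 += a.real ** 2 + a.imag ** 2    # first energy value
            e2 += b.real ** 2 + b.imag ** 2    # second energy value
        new = (re, im, e1, e2)
        if self.state is None:
            self.state = new                   # first inactive frame: no history
        else:
            w = self.alpha
            self.state = tuple(w * s + (1.0 - w) * v for s, v in zip(self.state, new))
        re_s, im_s, e1_s, e2_s = self.state
        first = re_s ** 2 + im_s ** 2          # first component number
        second = e1_s * e2_s                   # second component number
        if second <= 0.0:
            return 0.0
        return math.sqrt(first / second)       # coherence value

est = CoherenceEstimator()
print(est.update([1 + 1j, 2 - 1j], [1 + 1j, 2 - 1j]))  # identical spectra
```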

According to an aspect, the coherence calculation module is configured to quantize the coherence value with a uniform quantizer to obtain the quantized coherence value as n bits, which serve as the coherence data.
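Below is a sketch of a uniform n-bit quantizer for a coherence value in [0, 1]. The text specifies neither n nor the reconstruction points. Evenly spaced levels `index / (2**n - 1)` and round-half-up are assumptions made here, and the 4-bit example is illustrative only.

```python
def quantize_coherence(coh, n_bits):
    """Uniform scalar quantizer on [0, 1]; returns the index sent as n_bits bits."""
    levels = (1 << n_bits) - 1            # number of steps between 0 and 1
    coh = min(max(coh, 0.0), 1.0)         # clamp to the valid range
    return int(coh * levels + 0.5)        # round half up

def dequantize_coherence(index, n_bits):
    """Map an index back to its (assumed) evenly spaced reconstruction point."""
    levels = (1 << n_bits) - 1
    return index / levels

idx = quantize_coherence(0.5, 4)
print(idx, dequantize_coherence(idx, 4))
```

Under these assumptions, the worst-case reconstruction error is half a step, `0.5 / (2**n - 1)`.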

According to an aspect, the output interface is configured to generate a first silence insertion descriptor frame for the first channel and a second silence insertion descriptor frame for the second channel. The first silence insertion descriptor frame contains comfort noise parameter data for the first channel and comfort noise generation side information for the first channel and the second channel. The second silence insertion descriptor frame contains comfort noise parameter data for the second channel and coherence information indicating the coherence between the first channel and the second channel in the inactive frame, or

- the output interface is configured to generate a silence insertion descriptor frame containing comfort noise parameter data for the first and second channels, comfort noise generation side information for the first channel and the second channel, and coherence information indicating the coherence between the first channel and the second channel in the inactive frame,

- or the output interface is configured to generate a first silence insertion descriptor frame and a second silence insertion descriptor frame, each for the first channel and the second channel. The first silence insertion descriptor frame contains comfort noise parameter data for the first channel and the second channel and comfort noise generation side information for the first channel and the second channel. The second silence insertion descriptor frame contains comfort noise parameter data for the first channel and the second channel and coherence information indicating the coherence between the first channel and the second channel in the inactive frame.

According to an aspect, the uniform quantizer is configured to produce n bits, where n equals the number of bits occupied by the comfort noise generation side information in the first silence insertion descriptor frame.
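One plausible motivation for this rule is sketched below with hypothetical field sizes. Suppose the coherence index in the second frame takes exactly as many bits as the comfort noise generation side information in the first frame, and both frames carry the same number of noise parameters. Then the two silence insertion descriptor frames come out the same size. The bit widths `PARAM_BITS` and `SIDE_INFO_BITS` and the MSB-first packing are assumptions for illustration.

```python
def pack_fields(fields):
    """Pack (value, n_bits) pairs MSB-first into a string of '0'/'1' characters."""
    bits = []
    for value, n_bits in fields:
        if not 0 <= value < (1 << n_bits):
            raise ValueError(f"{value} does not fit in {n_bits} bits")
        bits.append(format(value, f"0{n_bits}b"))
    return "".join(bits)

PARAM_BITS = 5      # hypothetical bits per comfort noise parameter
SIDE_INFO_BITS = 3  # hypothetical size of the CNG side information

def first_sid_frame(params_ch1, side_info):
    # Comfort noise parameters of the first channel + CNG side information.
    return pack_fields([(p, PARAM_BITS) for p in params_ch1] + [(side_info, SIDE_INFO_BITS)])

def second_sid_frame(params_ch2, coherence_index):
    # Comfort noise parameters of the second channel + coherence index,
    # with n equal to SIDE_INFO_BITS so that both frames have equal length.
    return pack_fields([(p, PARAM_BITS) for p in params_ch2] + [(coherence_index, SIDE_INFO_BITS)])

f1 = first_sid_frame([3, 17, 8], side_info=5)
f2 = second_sid_frame([1, 30, 12], coherence_index=6)
print(len(f1), len(f2))  # 18 18
```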

According to an aspect, the activity detector is configured to:

- analyze the first channel of the multi-channel signal to classify the first channel as active or inactive;

- analyze the second channel of the multi-channel signal to classify the second channel as active or inactive; and

- determine that a frame of the sequence of frames is an inactive frame if both the first channel and the second channel are classified as inactive.
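The decision rule above can be sketched as follows. The per-channel classifier is a toy assumption: it compares the mean energy against a hypothetical threshold, whereas a real codec would use a proper voice activity detector. The frame is declared inactive only when both channels are classified as inactive.

```python
def classify_active(samples, threshold=1e-3):
    """Toy per-channel classifier (assumption): mean energy above a threshold."""
    energy = sum(s * s for s in samples) / max(len(samples), 1)
    return energy > threshold

def is_inactive_frame(ch1, ch2, threshold=1e-3):
    # Inactive only if BOTH channels are classified as inactive.
    return not classify_active(ch1, threshold) and not classify_active(ch2, threshold)

print(is_inactive_frame([0.0] * 8, [0.01] * 8))  # True
print(is_inactive_frame([0.0] * 8, [0.5] * 8))   # False
```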

According to an aspect, the noise parameter calculation module is configured to calculate first gain information for the first channel and second gain information for the second channel, and to provide the first gain information and the second gain information as parametric noise data.

According to an aspect, the noise parameter calculation module is configured to convert at least some of the first parametric noise data and the second parametric noise data from a left/right representation into a mid/side representation with a mid channel and a side channel.

According to an aspect, the noise parameter calculation module is configured to convert the mid/side representation of at least some of the first and second parametric noise data back into a left/right representation,

- wherein the noise parameter calculation module is configured to calculate, from the back-converted left/right representation, first gain information for the first channel and second gain information for the second channel, and to provide the first gain information as part of the first parametric noise data and the second gain information as part of the second parametric noise data.

According to an aspect, the noise parameter calculation module is configured to calculate:

- the first gain information by comparing:

- a version of the first parametric noise data for the first channel that has been converted back from the mid/side representation into the left/right representation, with

- a version of the first parametric noise data for the first channel before its conversion from the left/right representation into the mid/side representation; and/or

- the second gain information by comparing:

- a version of the second parametric noise data for the second channel that has been converted back from the mid/side representation into the left/right representation, with

- a version of the second parametric noise data for the second channel before its conversion from the left/right representation into the mid/side representation.
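A minimal sketch of this gain computation on noise parameter vectors, under stated assumptions:

- mid/side is the half-sum and half-difference of left/right;
- a hypothetical coarse rounding (`coarse`) stands in for the quantization applied in the mid/side domain;
- "comparing" is taken as an energy ratio, so each gain restores the energy the original left/right vector had before conversion.

```python
import math

def lr_to_ms(left, right):
    # Half-sum / half-difference normalization is an assumption.
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_to_lr(mid, side):
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def energy(v):
    return sum(x * x for x in v)

def gain_info(original, reconverted):
    """Gain that restores the energy of `original` when applied to `reconverted`."""
    e_rec = energy(reconverted)
    return math.sqrt(energy(original) / e_rec) if e_rec > 0 else 1.0

def coarse(v, step=0.5):
    # Hypothetical stand-in for quantizing noise parameters in the M/S domain.
    return [step * round(x / step) for x in v]

left = [1.0, 0.8, 0.6, 0.4]    # first parametric noise data (illustrative)
right = [0.9, 0.7, 0.7, 0.3]   # second parametric noise data (illustrative)
mid, side = lr_to_ms(left, right)
left_rec, right_rec = ms_to_lr(coarse(mid), coarse(side))
g1 = gain_info(left, left_rec)     # first gain information
g2 = gain_info(right, right_rec)   # second gain information
```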

According to an aspect, the noise parameter calculation module is configured to compare the energy of the second linear combination of the first parametric noise data and the second parametric noise data with a predetermined energy threshold, and:

- if the energy of the second linear combination of the first parametric noise data and the second parametric noise data is greater than the predetermined energy threshold, the coefficients of the side-channel noise shape vector are set to zero; and

- if the energy of the second linear combination of the first parametric noise data and the second parametric noise data is less than the predetermined energy threshold, the coefficients of the side-channel noise shape vector are retained.
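The rule can be sketched as follows, following the comparison exactly as stated above. The threshold value is not given in the text, so it is passed in as a parameter. The case of equality is not specified either; this sketch keeps the coefficients in that case.

```python
def apply_side_threshold(side_shape, threshold):
    """Apply the stated rule to the side-channel noise shape vector:
    zero its coefficients if their energy exceeds the threshold,
    otherwise keep them unchanged (equality: kept, an assumption)."""
    e_side = sum(c * c for c in side_shape)
    if e_side > threshold:
        return [0.0] * len(side_shape)
    return list(side_shape)

print(apply_side_threshold([0.1, 0.2], threshold=1.0))  # [0.1, 0.2]
print(apply_side_threshold([2.0, 0.0], threshold=1.0))  # [0.0, 0.0]
```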

According to an aspect, the audio encoder is configured to encode the second linear combination of the first parametric noise data and the second parametric noise data with fewer bits than the first linear combination of the first parametric noise data and the second parametric noise data.

According to an aspect, the output interface is configured to:

- generate the encoded multi-channel audio signal with encoded audio data for the active frame using a first set of coefficients for a first number of frequency bins; and

- generate the first parametric noise data, the second parametric noise data, or the first linear combination and the second linear combination of the first parametric noise data and the second parametric noise data, using a second set of coefficients describing a second number of frequency bins.
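One way to derive a coarser set of coefficients from a finer one is sketched below. It assumes that the second number of frequency bins is smaller than the first and that consecutive bins are grouped into equal-width bands and averaged. The text states neither the relation between the two numbers nor the grouping, so both are assumptions.

```python
def reduce_resolution(coeffs, n_out):
    """Average consecutive bins so that len(coeffs) values become n_out values."""
    n_in = len(coeffs)
    if not 0 < n_out <= n_in:
        raise ValueError("n_out must be between 1 and len(coeffs)")
    out = []
    for i in range(n_out):
        start = i * n_in // n_out          # first bin of band i
        stop = (i + 1) * n_in // n_out     # one past the last bin of band i
        band = coeffs[start:stop]          # never empty because n_out <= n_in
        out.append(sum(band) / len(band))
    return out

print(reduce_resolution([1.0, 3.0, 5.0, 7.0], 2))  # [2.0, 6.0]
```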

According to an aspect, an audio encoding method is provided for generating an encoded multi-channel audio signal for a sequence of frames comprising an active frame and an inactive frame, the method comprising:

- analyzing the multi-channel signal to determine that a frame of the sequence of frames is an inactive frame;

- calculating first parametric noise data for the first channel of the multi-channel signal and/or for a first linear combination of the first and second channels of the multi-channel signal, and calculating second parametric noise data for the second channel of the multi-channel signal and/or for a second linear combination of the first and second channels of the multi-channel signal;

- calculating coherence data indicating the coherence situation between the first channel and the second channel in the inactive frame; and

- generating the encoded multi-channel audio signal with encoded audio data for the active frame and, for the inactive frame, the first parametric noise data, the second parametric noise data and the coherence data.

According to an aspect, a computer program is provided for performing any of the methods described above or below when executed on a computer or processor.

According to an aspect, an encoded multi-channel audio signal is provided. It is organized in a sequence of frames comprising an active frame and an inactive frame, and it comprises:

- encoded audio data for the active frame;

- first parametric noise data for the first channel in the inactive frame;

- second parametric noise data for the second channel in the inactive frame; and

- coherence data indicating the coherence situation between the first channel and the second channel in the inactive frame.

- wherein the first noise source or the second noise source is configured to generate the first noise signal or the second noise signal such that the first noise signal or the second noise signal is decorrelated from the mixing noise signal.

According to an aspect, one of the first audio source, the second audio source and the mixing noise source comprises a pseudo-random number generator configured to generate a pseudo-random number sequence from a seed, and

- at least two of the first audio source, the second audio source and the mixing noise source are configured to initialize the pseudo-random number generator with different seeds.
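The following sketch shows seeded noise sources. Python's `random.Random` (a Mersenne Twister) stands in for the codec's own generator, which is an assumption, and the seeds are arbitrary. The same seed reproduces the same sequence. Different seeds give sequences whose sample correlation is close to zero, which is the decorrelation the sources need.

```python
import random

def make_source(seed):
    """Noise source backed by its own seeded pseudo-random generator."""
    rng = random.Random(seed)

    def draw(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    return draw

def correlation(x, y):
    """Zero-lag normalized cross-correlation, assuming zero-mean signals."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / (sxx * syy) ** 0.5

first_source = make_source(11)
mixing_source = make_source(33)

n = 10000
a = first_source(n)
m = mixing_source(n)
print(round(correlation(a, m), 3))  # near 0 for different seeds
```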

- wherein, optionally, at least one noise generator is configured to generate a complex spectral noise value for frequency bin k by taking a first random value with index k for one of the real part and the imaginary part, and a second random value with index (k+M) for the other,

- wherein the first noise value and the second noise value are contained in a noise array that spans a start index to an end index. The noise array is obtained, for example, from a random number generator, a noise table or a noise process. The start index is less than M, the end index is equal to or less than 2M, and M and k are integers.
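Below is a sketch of this indexing scheme. It assumes the real part is taken from index k and the imaginary part from index k + M (the text allows either assignment), and that the noise array holds 2M values drawn from a seeded `random.Random`. Bins k = 0..M-1 then use indices 0..2M-1, which satisfies "start index < M" and "end index ≤ 2M".

```python
import random

def complex_noise_spectrum(noise, M):
    """Return M complex noise values X[k] = noise[k] + 1j * noise[k + M]."""
    if len(noise) < 2 * M:
        raise ValueError("noise array must hold at least 2*M values")
    return [complex(noise[k], noise[k + M]) for k in range(M)]

rng = random.Random(7)           # arbitrary seed
M = 4
noise = [rng.gauss(0.0, 1.0) for _ in range(2 * M)]
spectrum = complex_noise_spectrum(noise, M)
```

A single real-valued array thus supplies both parts of every complex bin without reusing any value.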

- wherein the amount of influence exerted by the first amplitude element and the amount of influence exerted by the second amplitude element are equal, or differ by less than 20 percent of the amount exerted by the first amplitude element.

According to an aspect, the mixer comprises a third amplitude element for influencing the amplitude of the mixing noise signal. The amount of influence exerted by the third amplitude element depends on the amount exerted by the first or second amplitude element: it becomes larger when the amount exerted by the first amplitude element, or by the second amplitude element, becomes smaller.
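One mapping from a coherence value c to the three gains that has the properties above is g1 = g2 = sqrt(1 - c) and g3 = sqrt(c). This mapping is an assumption, not one given in the text. Under it:

- the first two gains are equal;
- g3 grows as g1 and g2 shrink;
- for uncorrelated unit-variance sources, each output channel keeps unit power (g1² + g3² = 1);
- the correlation coefficient between the two channels equals c.

```python
import math

def mixing_gains(coherence):
    """Map a coherence value to (g1, g2, g3) under the assumed sqrt mapping."""
    c = min(max(coherence, 0.0), 1.0)   # clamp to [0, 1]
    g_own = math.sqrt(1.0 - c)          # first and second amplitude elements
    return g_own, g_own, math.sqrt(c)   # third amplitude element (mixing noise)

print(mixing_gains(0.0))  # (1.0, 1.0, 0.0): fully independent channels
print(mixing_gains(1.0))  # (0.0, 0.0, 1.0): identical channels
```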

According to an aspect, the multi-channel signal generator further comprises:

- an input interface for receiving encoded audio data in a sequence of frames comprising an active frame and an inactive frame following the active frame; and

According to an aspect, the encoded audio data for the inactive frame comprise silence insertion descriptor data. These contain comfort noise data indicating a signal energy for each of the two channels in the inactive frame and indicating the coherence between the first channel and the second channel in the inactive frame, and

- the mixer is configured to mix the mixing noise signal with the first audio signal or the second audio signal based on the comfort noise data indicating the coherence, and the multi-channel signal generator further comprises a signal modification module for modifying the first channel and the second channel, or the first audio signal, the second audio signal or the mixing noise signal,

- the signal modification module being configured to be controlled by the comfort noise data indicating the signal energies for the first audio channel and the second audio channel.
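A hypothetical interpretation of the energy-controlled signal modification is sketched below. Each band of a complex comfort noise spectrum is scaled so that its energy matches a transmitted target. The band edges and target energies are illustrative assumptions. The text only says that the modification is controlled by comfort noise data indicating the signal energies.

```python
import math

def shape_spectrum(spectrum, band_edges, band_energies):
    """Scale complex bins so that band i, i.e. bins [band_edges[i], band_edges[i+1]),
    carries energy band_energies[i]. Silent bands are left unchanged."""
    out = list(spectrum)
    for i, target in enumerate(band_energies):
        lo, hi = band_edges[i], band_edges[i + 1]
        e = sum(abs(x) ** 2 for x in out[lo:hi])
        if e > 0.0:
            g = math.sqrt(target / e)    # per-band gain
            for k in range(lo, hi):
                out[k] *= g
    return out

shaped = shape_spectrum([1 + 1j, 2 + 0j, 1j, -1 + 0j], [0, 2, 4], [4.0, 9.0])
```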

- a first silence insertion descriptor frame for the first channel and a second silence insertion descriptor frame for the second channel. The first silence insertion descriptor frame contains comfort noise parameter data for the first channel and comfort noise generation side information for the first channel and the second channel. The second silence insertion descriptor frame contains comfort noise parameter data for the second channel and coherence information indicating the coherence between the first channel and the second channel in the inactive frame, and

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal in the inactive frame. The controller uses the comfort noise generation side information of the first silence insertion descriptor frame to determine the comfort noise generation mode for the first channel and the second channel. It uses the coherence information in the second silence insertion descriptor frame to set the coherence between the first channel and the second channel in the inactive frame. It uses the comfort noise generation data from the first silence insertion descriptor frame and the comfort noise generation parameter data from the second silence insertion descriptor frame to set the energy level of the first channel and the energy level of the second channel.
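The following sketch shows what the controller reads from each of the two silence insertion descriptor frames. The dictionary keys, the `CngConfig` container and the dequantization `index / (2**n - 1)` are hypothetical and serve only to show the data flow:

- the mode comes from the side information in the first frame;
- the coherence comes from the second frame;
- the per-channel comfort noise parameters come from both frames.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CngConfig:
    mode: int                   # CNG mode from the side information (first frame)
    coherence: float            # target inter-channel coherence (second frame)
    cn_params_ch1: List[float]  # sets the energy level of the first channel
    cn_params_ch2: List[float]  # sets the energy level of the second channel

def configure_cng(sid1, sid2, coherence_bits):
    """Combine two SID frames given as dicts with hypothetical keys."""
    levels = (1 << coherence_bits) - 1
    return CngConfig(
        mode=sid1["side_info"],
        coherence=sid2["coherence_index"] / levels,  # assumed uniform dequantization
        cn_params_ch1=list(sid1["cn_params"]),
        cn_params_ch2=list(sid2["cn_params"]),
    )

cfg = configure_cng({"side_info": 2, "cn_params": [1.0, 0.5]},
                    {"coherence_index": 15, "cn_params": [2.0, 0.25]},
                    coherence_bits=4)
```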

According to an aspect, the multi-channel signal generator further comprises a spectral-to-time converter for converting the resulting first channel and the resulting second channel, after spectral adjustment and coherence adjustment, into respective time-domain representations that are to be combined or concatenated with the time-domain representations of the corresponding channels of the decoded multi-channel signal for the active frame.

- a silence insertion descriptor frame, wherein the silence insertion descriptor frame contains comfort noise parameter data for the first and second channels, comfort noise generation side information for the first channel and the second channel, and coherence information indicating the coherence between the first channel and the second channel in the inactive frame, and

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal in the inactive frame: the controller uses the comfort noise generation side information of the silence insertion descriptor frame to determine the comfort noise generation mode for the first channel and the second channel, uses the coherence information in the silence insertion descriptor frame to set the coherence between the first channel and the second channel in the inactive frame, and uses the comfort noise generation data from the silence insertion descriptor frame to set the energy levels of the first channel and of the second channel.

- wherein the first noise source or the second noise source is configured to generate the first noise signal or the second noise signal such that the first noise signal or the second noise signal is at least partially correlated, and

- wherein the mixing noise source is configured to generate a mixing noise signal having a first mixing noise portion and a second mixing noise portion, the second mixing noise portion being at least partially decorrelated from the first mixing noise portion; and

- wherein the mixer is configured to mix the first mixing noise portion of the mixing noise signal with the first audio signal to obtain the first channel, and to mix the second mixing noise portion of the mixing noise signal with the second audio signal to obtain the second channel.

According to an aspect, a method for generating a multi-channel signal having a first channel and a second channel comprises:

According to an aspect, an audio encoder is provided for generating an encoded multi-channel audio signal for a sequence of frames comprising an active frame and an inactive frame, the audio encoder comprising:

- an output interface for generating an encoded multi-channel audio signal having encoded audio data for the active frame and, for the inactive frame, first parametric noise data, second parametric noise data, and coherence data.

According to an aspect, an audio encoder is provided wherein the coherence calculation module is configured to calculate the square root of the resulting number to obtain a coherence value on which the coherence data is based.
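The aspect above does not spell out the "resulting number" in this section. The sketch below shows one plausible way to obtain a single wideband coherence value, assuming (not stated here) that the module accumulates the cross-spectrum and the two auto-spectra of the left and right frequency-domain frames, forms the normalized squared magnitude of the cross term, and then takes its square root. The function name `broadband_coherence` and the `eps` regularizer are hypothetical.

```python
import numpy as np


def broadband_coherence(X_l: np.ndarray, X_r: np.ndarray, eps: float = 1e-12) -> float:
    """Return one wideband inter-channel coherence value in [0, 1].

    X_l, X_r: complex spectra of the left and right channels, any shape
    (e.g. (frames, bins)). Assumed formula (illustrative only):
        |sum(X_l * conj(X_r))|^2 / (sum|X_l|^2 * sum|X_r|^2),
    followed by a square root of that resulting number.
    """
    cross = np.sum(X_l * np.conj(X_r))      # accumulated cross-spectrum
    p_l = np.sum(np.abs(X_l) ** 2)          # accumulated left-channel power
    p_r = np.sum(np.abs(X_r) ** 2)          # accumulated right-channel power
    ratio = np.abs(cross) ** 2 / (p_l * p_r + eps)
    return float(np.sqrt(min(ratio, 1.0)))  # clip guards against rounding above 1
```

Identical (or merely rescaled) channels yield a value close to 1, while independent noise in the two channels yields a value close to 0.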

According to an aspect, an audio encoder is provided,

- wherein the output interface is configured to generate a first silence insertion descriptor frame for the first channel and a second silence insertion descriptor frame for the second channel, wherein the first silence insertion descriptor frame contains comfort noise parameter data for the first channel and comfort noise generation side information for the first channel and the second channel, and wherein the second silence insertion descriptor frame contains comfort noise parameter data for the second channel and coherence information indicating the coherence between the first channel and the second channel in the inactive frame, or

- wherein the output interface is configured to generate a silence insertion descriptor frame, wherein the silence insertion descriptor frame contains comfort noise parameter data for the first and second channels, comfort noise generation side information for the first channel and the second channel, and coherence information indicating the coherence between the first channel and the second channel in the inactive frame.

According to an aspect, the uniform quantizer is configured to calculate N bits such that the value of N equals the number of bits occupied by the comfort noise generation side information in the first silence insertion descriptor frame.
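A uniform N-bit quantization of the coherence value can be sketched as below. The value of N is not given in this section, so it is a parameter here; the assumption that the 2^N reconstruction levels are evenly spaced over [0, 1] and include both endpoints is illustrative. The returned index corresponds to what would be written to the bitstream, and the reconstructed value to what a decoder would use as `coh`.

```python
import math
from typing import Tuple


def quantize_coherence(c: float, n_bits: int) -> Tuple[int, float]:
    """Uniformly quantize a coherence value c in [0, 1] with n_bits bits."""
    levels = (1 << n_bits) - 1                 # number of steps between 0 and 1
    c = min(max(c, 0.0), 1.0)                  # clamp to the valid range
    c_ind = int(math.floor(c * levels + 0.5))  # round half up to the nearest level
    c_q = c_ind / levels                       # value reconstructed at the decoder
    return c_ind, c_q
```

With this layout the quantization error never exceeds half a step, 0.5 / (2^N - 1).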

According to an aspect, an audio encoding method for generating an encoded multi-channel audio signal for a sequence of frames comprising an active frame and an inactive frame comprises:

- calculating first parametric noise data for a first channel of the multi-channel signal and second parametric noise data for a second channel of the multi-channel signal;

According to an aspect, an encoded multi-channel audio signal is organized in a sequence of frames comprising an active frame and an inactive frame, the encoded multi-channel audio signal comprising:

Brief description of the drawings

Fig. 1 shows an example of encoder-side processing, in particular for classifying a frame as active or inactive.

Fig. 2 shows an example of an encoder and a decoder.

Figs. 3a-3f show examples of multi-channel signal generators that can be used in a decoder.

Fig. 4 shows an example of an encoder and a decoder.

Fig. 5 shows an example of a noise parameter quantization stage.

Fig. 6 shows an example of a noise parameter dequantization stage.

Carrying out the invention

This document describes, among other things, a new technique for DTX and CNG for discretely coded stereo signals. Instead of operating on a mono downmix of the stereo signal, noise parameters are extracted for both channels, jointly encoded, and transmitted. In the decoder (or, more generally, in a multi-channel signal generator), three independent comfort noise signals can be mixed based on a single wideband inter-channel coherence value that is transmitted, for example, together with the two sets of noise parameters. The examples may cover at least one of the following aspects:

- CNG in the decoder by mixing, for example, three independent noise signals. After decoding the stereo SID and reconstructing the noise parameters for the left and right channels, two noise signals can be generated, for example, as a mixture of correlated and decorrelated noise. To this end, one noise source common to both channels (serving as the source of correlated noise) and two separate noise sources (providing decorrelated noise) are mixed together. The mixing can be controlled by the inter-channel coherence value carried in the stereo SID. After mixing, the two mixed noise signals are spectrally shaped using the reconstructed noise parameters for the left and right channels, respectively.

- Joint coding of the noise parameters extracted from the two channels of the stereo signal. To keep the stereo SID bit rate low, the noise parameters can additionally be compressed before being encoded into the stereo SID. This can be achieved, for example, by converting the left/right channel representation of the noise parameters into a mid/side representation and encoding the side noise parameters with fewer bits than the mid noise parameters.

- A SID for two-channel DTX (stereo SID). This SID may contain the noise parameters for both channels of the stereo signal, together with a single wideband inter-channel coherence value and a flag indicating that the noise parameters are equal for both channels.
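The mid/side compression of the noise parameters described in the list above can be sketched as follows. This is a hypothetical illustration, not the codec's actual quantizer: it assumes the noise parameters are per-band levels in dB, uses mid = (L + R) / 2 and side = (L - R) / 2, quantizes the side with a coarser step (so fewer bits would be needed), and, as an assumption about how the equal-parameters flag could be used, omits the side parameters entirely when they all quantize to zero. Function names and step sizes are invented for illustration.

```python
import numpy as np


def encode_noise_params(p_l, p_r, mid_step=0.5, side_step=2.0):
    """Hypothetical joint (mid/side) coding of per-band noise parameters in dB."""
    p_l = np.asarray(p_l, dtype=float)
    p_r = np.asarray(p_r, dtype=float)
    mid = 0.5 * (p_l + p_r)
    side = 0.5 * (p_l - p_r)
    mid_idx = np.round(mid / mid_step).astype(int)
    side_idx = np.round(side / side_step).astype(int)  # coarser step: fewer bits needed
    equal_flag = not np.any(side_idx)                  # side carries no information
    return mid_idx, (None if equal_flag else side_idx), equal_flag


def decode_noise_params(mid_idx, side_idx, mid_step=0.5, side_step=2.0):
    """Reconstruct left/right parameters; a missing side means equal channels."""
    mid = np.asarray(mid_idx) * mid_step
    side = np.zeros_like(mid) if side_idx is None else np.asarray(side_idx) * side_step
    return mid + side, mid - side
```

Spending the finer step on the mid parameters reflects the idea that, for typical background noise, the two channels have similar spectra, so most of the information sits in the mid representation.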

The examples below can be implemented in devices, systems, methods, controllers, and persistent storage units storing instructions that, when executed by a processor, cause the processor to perform the disclosed techniques (e.g., methods such as sequences of operations).

In particular, at least one of the blocks described below may be controlled by a controller.

Examples

Before the aspects of the present examples are explained in detail, a brief overview of some of the most important ones is given.

1) Figs. 3a-3f show examples of multi-channel signal generators that generate a multi-channel audio signal (e.g., in a decoder) formed by at least a first audio signal or channel and a second audio signal or channel. The multi-channel audio signal (initially in the form of several decorrelated channels) may be affected (e.g., scaled) by one or more amplitude elements. The amount of scaling may be based on coherence data between the first and second audio signals estimated at the encoder. The first and second audio signals may be mixed with a common mixing signal (which may also be decorrelated and likewise scaled according to the coherence data). The weighting may be such that the first and second audio signals are scaled by a high weighting factor (e.g., at most 1 but close to 1) while the mixing signal is scaled by a low weighting factor (e.g., at least 0 but close to 0), and vice versa. In particular, a high coherence measured at the encoder causes the first and second audio signals to be scaled by a low weighting factor (e.g., close to 0), whereas a low coherence measured at the encoder causes them to be scaled by a high weighting factor (e.g., close to 1). The techniques of Figs. 3a-3f can be used to implement a comfort noise generator (CNG).

2) Figs. 1, 2 and 4 show examples of encoders. The encoder can classify an audio frame as active or inactive. If an audio frame is inactive, only some parametric noise data is encoded into the bitstream (e.g., a parametric representation of the noise shape, so that the noise signal itself does not have to be transmitted), and coherence data between the two channels may also be provided.

3) Figs. 2 and 4 show examples of decoders. The decoder can generate an audio signal (comfort noise), for example, as follows:

a. using one of the techniques shown in Figs. 3a-3f (item 1) above), in particular taking into account the coherence value provided by the encoder and applying it as a weighting factor in the amplitude element(s); and

b. shaping the generated audio signal (comfort noise) using the parametric noise data encoded in the bitstream.

In particular, the encoder does not need to provide the full audio signal for an inactive frame; a coherence value and a parametric representation of the noise shape are sufficient, which reduces the number of bits that must be encoded in the bitstream.

Signal generator (e.g., decoder side), CNG

Figs. 3a-3f show examples of a CNG or, more generally, of a multi-channel signal generator 200 for generating a multi-channel signal 204 having a first channel 201 and a second channel 203. (In the present description, the generated audio signals 221 and 223 are assumed to be noise, but other kinds of signals that are not noise are also possible.) Refer first to Fig. 3f, which shows the general case, while Figs. 3a-3e show specific examples.

The first audio source 211 may be a first noise source and is referred to here as generating a first audio signal 221, which may be a first noise signal. The mixing noise source 212 may generate the mixing noise signal 222. The second audio source 213 may generate a second audio signal 223, which may be a second noise signal. The multi-channel signal generator 200 may mix the first audio signal 221 (first noise signal) with the mixing noise signal 222, and the second audio signal 223 (second noise signal) with the mixing noise signal 222. (In addition or alternatively, the first audio signal 221 may be mixed with a version 221a of the mixing noise signal 222 and the second audio signal 223 with a version 221b of the mixing noise signal 222, where versions 221a and 221b may differ from each other, for example, by 20%; each of versions 221a and 221b may, for example, be an upscaled and/or downscaled version of the common signal 222.) Accordingly, the first channel 201 of the multi-channel signal 204 may be obtained from the first audio signal 221 (first noise signal) and the mixing noise signal 222. Likewise, the second channel 203 of the multi-channel signal 204 may be obtained from the second audio signal 223 mixed with the mixing noise signal 222. Note also that the signals here may be in the frequency domain, where k denotes a particular index or coefficient (associated with a particular frequency bin).

As can be seen from Figs. 3a-3f, the first audio signal 221, the mixing noise signal 222, and the second audio signal 223 may be mutually decorrelated. This can be achieved, for example, by decorrelating the same signal (e.g., in a decorrelator) and/or by generating the noise independently (examples are given below).

The mixer 208 may be implemented to mix the first audio signal 221 and the second audio signal 223 with the mixing noise signal 222. The mixing may be a summation of signals (e.g., in adder stages 206-1 and 206-3) after the first audio signal 221, the mixing noise signal 222, and the second audio signal 223 have been weighted by scaling (e.g., in amplitude elements 208-1, 208-2, 208-3); that is, the mixing is of the "sum after weighting" type. Figs. 3a-3f show the actual signal processing applied to generate the noise signals N_l[k] and N_r[k], where the summation element (+) denotes the sample-wise sum of two signals (k is the frequency bin index).

The amplitude elements 208-1, 208-2, and 208-3 (or weighting or scaling elements) may, for example, scale the first audio signal 221, the mixing noise signal 222, and the second audio signal 223 by suitable coefficients and output a weighted version 221' of the first audio signal 221, a weighted version 222' of the mixing noise signal 222, and a weighted version 223' of the second audio signal 223. The suitable coefficients may be sqrt(coh) and sqrt(1-coh), where sqrt denotes the square root, and may be derived, for example, from coherence information signaled in a particular descriptor frame (see also below). The coherence "coh" is explained in detail below; it may, for example, be the quantity denoted below by "c", "c_ind", or "c_q", encoded, for example, in the coherence information 404 of the bitstream 232 (see below, in combination with Figs. 2 and 4). In particular, the mixing noise signal 222 may be scaled by a weighting factor equal to the square root of the coherence value, sqrt(coh), while the first audio signal 221 and the second audio signal 223 may be scaled by a weighting factor equal to the square root of the complement of the coherence to one, sqrt(1-coh). In this way, the mixing noise signal 222 can be regarded as a common-mode signal, a portion of which is mixed into the weighted version 221' of the first audio signal 221 and into the weighted version 223' of the second audio signal 223 to obtain the first channel 201 and the second channel 203 of the multi-channel signal 204, respectively.
In some cases, the first noise source 211 or the second noise source 213 may be configured to generate the first noise signal 221 or the second noise signal 223 such that the first noise signal 221 and/or the second noise signal 223 are decorrelated with respect to the mixing noise signal 222 (see below with reference to Figs. 3b-3e).
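The "sum after weighting" mixer with the weights sqrt(coh) and sqrt(1-coh) can be sketched as below. Assumptions: the three sources are modeled as mutually independent unit-variance complex Gaussian generators, and `gain_l` / `gain_r` are hypothetical per-bin amplitude shapes standing in for the spectral shaping derived from the reconstructed noise parameters. Because the common term contributes power coh and each independent term contributes 1 - coh, both mixed channels keep unit power and their normalized cross-correlation equals coh.

```python
import numpy as np


def generate_cng_spectra(coh, gain_l, gain_r, rng: np.random.Generator):
    """Mix three independent noise spectra into two channels with coherence coh."""
    n_bins = len(gain_l)

    def complex_noise():
        # unit-variance complex Gaussian noise, one value per frequency bin k
        return (rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)) / np.sqrt(2.0)

    n1, n_mix, n2 = complex_noise(), complex_noise(), complex_noise()  # sources 211, 212, 213
    w_mix = np.sqrt(coh)         # weight for the common mixing noise 222
    w_ind = np.sqrt(1.0 - coh)   # weight for the channel-specific signals 221 and 223
    N_l = w_ind * n1 + w_mix * n_mix   # first adder: weighted 221 plus weighted 222
    N_r = w_ind * n2 + w_mix * n_mix   # second adder: weighted 223 plus weighted 222
    return gain_l * N_l, gain_r * N_r  # per-channel spectral shaping
```

With coh = 1 both channels receive only the common noise and become identical; with coh = 0 they share nothing and are mutually decorrelated.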

At least one (or each) of the first audio source 211, the second audio source 213 and the mixing noise source 212 may be a Gaussian noise source.

In the example of Fig. 3a, the first audio source 211 (here indicated by 211a) may comprise or be coupled to a first noise generator, and the second audio source 213 (213a) may comprise or be coupled to a second noise generator. The mixing noise source 212 (212a) may comprise or be coupled to a third noise generator. The first noise generator 211 (211a), the second noise generator 213 (213a) and the third noise generator 212 (212a) may generate mutually decorrelated noise signals.
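The three-generator arrangement of Fig. 3a can be sketched with three independently seeded pseudo-random generators. Assumptions: Python's `random.Random` (a Mersenne Twister) stands in for each noise generator, the seeds 1, 2 and 3 are arbitrary, and the class name `GaussianNoiseGenerator` is hypothetical.

```python
import math
import random


class GaussianNoiseGenerator:
    """Stand-in for one noise generator (cf. 211a, 212a or 213a) with its own PRNG state."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def frame(self, length):
        """Return one frame of zero-mean, unit-variance Gaussian noise."""
        return [self._rng.gauss(0.0, 1.0) for _ in range(length)]


def zero_lag_correlation(x, y):
    """Normalized zero-lag correlation; close to 0 for decorrelated noise."""
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Three generators with distinct seeds, one per source.
gen_first, gen_mix, gen_second = (GaussianNoiseGenerator(s) for s in (1, 2, 3))
N = 20_000
n1, n_mix, n2 = gen_first.frame(N), gen_mix.frame(N), gen_second.frame(N)
```

Each generator keeps its own state, so consecutive frames from one source continue the same sequence while the three sources stay mutually decorrelated.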

In examples, at least one of the first audio source 211 (211a), the second audio source 213 (213a) and the mixing noise source 212 (212a) may operate using a pre-stored noise table, which may thereby provide a random sequence.

In some examples, at least one of the first audio source 211, the second audio source 213 and the mixing noise source 212 may generate a complex spectrum for a frame using a first noise value for the real part and a second noise value for the imaginary part. Optionally, the at least one noise generator may generate a complex spectral noise value (e.g., a coefficient) for frequency bin k. For one of the real part and the imaginary part, it uses a first random value with index k; for the other, it uses a second random value with index (k+M).
The first noise value and the second noise value may be included in a noise array, for example one obtained from a random number sequence generator, from a noise table or from a noise process. The array ranges from a start index to an end index, where the start index is less than M and the end index is equal to or less than 2M (twice M). M and k are integers, with k being the index of a particular frequency bin in the frequency-domain representation of the signal.
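The indexing scheme above (real part from index k, imaginary part from index k+M, so that no index at or beyond 2M is read) can be sketched as follows. The function name `complex_noise_spectrum`, the value M = 16 and the table contents are hypothetical. The "pre-stored" table is simulated here by filling it once from a fixed seed.

```python
import random


def complex_noise_spectrum(noise, M, k_start=0):
    """Build complex noise coefficients for bins k_start..M-1.

    Real part: noise[k]; imaginary part: noise[k + M]. Only indices from
    k_start (< M) up to 2M - 1 are read, so the array needs at least 2M entries.
    """
    if not 0 <= k_start < M:
        raise ValueError("k_start must satisfy 0 <= k_start < M")
    if len(noise) < 2 * M:
        raise ValueError("noise array must hold at least 2*M values")
    return [complex(noise[k], noise[k + M]) for k in range(k_start, M)]


# Hypothetical pre-stored noise table, generated here once with a fixed seed.
M = 16
_table_rng = random.Random(7)
NOISE_TABLE = [_table_rng.gauss(0.0, 1.0) for _ in range(2 * M)]

spectrum = complex_noise_spectrum(NOISE_TABLE, M)
```

Reading the imaginary parts from the upper half of the array means that one table of 2M values serves a whole frame, with no value reused for two different coefficients.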

Each audio source 211, 212, 213 may include at least one generator (noise generator) that generates noise, for example in terms of N₁[k], N₂[k], N₃[k].

The multichannel signal generator 200 of Fig. 3a-3f can be used, for example, in a decoder 200a, 200b (200''). In particular, the multichannel signal generator 200 can be regarded as part of the comfort noise generator (CNG) 220 in Fig. 4.
The decoder 200 may generally be used to decode signals encoded by the encoder. It may also generate signals from energy information obtained from the bitstream, so as to produce an audio signal that corresponds to the original input audio signal fed to the encoder. In some examples, frames are classified as speech frames (or, generally, non-empty audio frames) or as silence insertion descriptor frames. As explained above and below, silence insertion descriptor (SID) frames are the so-called "inactive frames 308", which may be encoded, for example, as SID frames 241 and/or 243. They are generally provided at a reduced bitrate and therefore less frequently than normal speech frames (the so-called "active frames 306", see also below). In addition, the information present in SID frames (inactive frames 308) is generally limited and may essentially correspond to energy information about the signal.

Nevertheless, the content of SID frames can be augmented with the multichannel noise 204 generated by the multichannel signal generator. The audio sources 211, 212, 213 may process signals (e.g., noise) that are independent of and decorrelated from each other. Even so, the first audio signal 221, the mixing noise signal 222 and the second audio signal 223 can be scaled according to the coherence information provided by the encoder and inserted into the bitstream. As can be seen from Fig. 3a-3f, the mixing noise signal 222, scaled according to the coherence value, introduces a common-mode signal into both the first audio signal 221 and the second audio signal 223. This yields the first channel 201 and the second channel 203 of the multichannel signal 204. The coherence generally ranges from 0 to 1:

- A coherence of 0 means that the original first audio channel (e.g., L, 301) and second audio channel (e.g., R, 303) are completely decorrelated from each other. The amplitude element 208-2 then scales the mixing noise signal 222 by 0, so the first audio signal 221 and the second audio signal 223 are not mixed with any common-mode signal (the signal they are mixed with is constantly 0). The output channels 201, 203 of the multichannel signal 204 are then essentially equal to the first noise signal 221 and the second noise signal 223.

- A coherence of 1 means that the original first audio channel (e.g., L, 301) and second audio channel (e.g., R, 303) are identical. The amplitude elements 208-1 and 208-3 then scale their input signals by 0, and the first and second channels are both equal to the mixing noise signal 222, which is scaled by 1 in the amplitude element 208-2.

- Coherence values between 0 and 1 result in intermediate mixes between the two situations described above.
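A short derivation shows why the sqrt weights make the mixing coefficient behave like a coherence. It assumes, beyond what is stated here, that the three source signals x₁ (221), x_m (222) and x₂ (223) are zero-mean, mutually uncorrelated and of equal variance σ². The coherence value is written as c, and the output channels 201 and 203 are written as y₁ and y₂.

```latex
\begin{aligned}
y_1[n] &= \sqrt{1-c}\,x_1[n] + \sqrt{c}\,x_m[n], \qquad
y_2[n]  = \sqrt{1-c}\,x_2[n] + \sqrt{c}\,x_m[n] \\
\mathbb{E}\{y_1 y_2\} &= (1-c)\,\mathbb{E}\{x_1 x_2\}
  + \sqrt{c(1-c)}\,\bigl(\mathbb{E}\{x_1 x_m\} + \mathbb{E}\{x_m x_2\}\bigr)
  + c\,\mathbb{E}\{x_m^2\} = c\,\sigma^2 \\
\mathbb{E}\{y_1^2\} &= (1-c)\,\sigma^2 + c\,\sigma^2 = \sigma^2 = \mathbb{E}\{y_2^2\} \\
\rho_{12} &= \frac{\mathbb{E}\{y_1 y_2\}}{\sqrt{\mathbb{E}\{y_1^2\}\,\mathbb{E}\{y_2^2\}}} = c
\end{aligned}
```

Under these assumptions, c = 0 and c = 1 reproduce the two limiting cases listed above, and the output power of each channel does not depend on c.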

Some aspects and variants of the mixer 206 and/or the CNG 220 are explained below.

The first audio source (211) may be a first noise source and the first audio signal (221) may be a first noise signal, or the second audio source (213) may be a second noise source and the second audio signal (223) may be a second noise signal. The first noise source (211) or the second noise source (213) may be configured to generate the first noise signal (221) or the second noise signal (223) such that the first noise signal (221) or the second noise signal (223) is decorrelated with respect to the mixing noise signal (222).

The mixer (206) may be configured to generate the first channel (201) and the second channel (203) such that the amount of the mixing noise signal (222) in the first channel (201) equals the amount in the second channel (203). Alternatively, it may lie within 80-120 percent of the amount of the mixing noise signal (222) in the second channel (203). For example, the portions 221a and 221b may be within 80-120 percent of each other and of the original mixing noise signal 222.

In some cases:

- the amount of scaling applied by the first amplitude element (208-1) and the amount of scaling applied by the second amplitude element (208-3) are equal to each other (for example, when there is no distinction between the portions 221a and 221b), or

- the amount of scaling applied by the second amplitude element (208-3) differs by less than 20 percent from the amount applied by the first amplitude element (208-1) (for example, when the difference between the portions 221a and 221b is less than 20%).
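The two tolerance conditions above (the mixing-noise amount within 80-120 percent of the other channel's amount, and amplitude-element scalings differing by less than 20 percent) reduce to simple relative comparisons. The helper names below are hypothetical. The comparisons assume positive, linear-amplitude gains.

```python
def within_80_120_percent(value, reference):
    """True if value lies between 80 % and 120 % of reference (reference > 0)."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 0.8 * reference <= value <= 1.2 * reference


def differs_by_less_than_20_percent(second, first):
    """True if second differs from first by less than 20 % of first."""
    return abs(second - first) < 0.2 * abs(first)


# Example: mixing-noise gain 0.55 in the first channel versus 0.5 in the second.
print(within_80_120_percent(0.55, 0.5))
```

The first check is inclusive at both bounds, matching "within a range of 80-120 percent". The second is strict, matching "less than 20 percent".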

The mixer (206) and/or the CNG 220 may comprise a control input for receiving a control parameter (404, c). The mixer (206) may then be configured to control the amount of the mixing noise signal (222) in the first channel (201) and the second channel (203) in response to the control parameter (404, c).

Fig. 3a-3f show that the mixing noise signal 222 is scaled by a factor sqrt(coh), and the first and second audio signals 221, 223 are scaled by a factor sqrt(1-coh).

As explained above, Fig. 3a shows a CNG 220a in which the first source 211a (211), the second source 213a (213) and the mixing noise source 212a (212) comprise different generators. This is not strictly required, and several variants are possible.

To summarize:

1. First variant, CNG 220b (Fig. 3b):

a. the first audio source 211b (211) may comprise a first noise generator for generating the first audio signal (221) as a first noise signal,

b. the second audio source 213b (213) may comprise a decorrelator for decorrelating the first noise signal (221) to generate the second audio signal (223) as a second noise signal (i.e., the second audio signal is obtained from the first audio signal after decorrelation), and

c. the mixing noise source 212b (212) may comprise a second noise generator (which is initially decorrelated with respect to the first noise generator);

2. Second variant, CNG 220c (Fig. 3c):

a. the first audio source 211c (211) may comprise a first noise generator for generating the first audio signal (221) as a first noise signal,

b. the second audio source 213c (213) may comprise a second noise generator for generating the second audio signal (223) as a second noise signal (e.g., the second noise generator is initially decorrelated with respect to the first noise generator), and

c. the mixing noise source 212c (212) may comprise a decorrelator for decorrelating the first noise signal (221) or the second noise signal (223) to generate the mixing noise signal (222);

3. Third variant, CNG 220d (Fig. 3d and 3e):

a. one of the first audio source 211d or 211e (211), the second audio source 213d or 213e (213) and the mixing noise source 212d or 212e (212) may comprise a noise generator for generating a noise signal,

b. another of the first audio source 211d or 211e (211), the second audio source 213d or 213e (213) and the mixing noise source 212d or 212e (212) may comprise a first decorrelator for decorrelating the noise signal, and

c. the remaining one of the first audio source 211d or 211e (211), the second audio source 213d or 213e (213) and the mixing noise source 212d or 212e (212) may comprise a second decorrelator for decorrelating the noise signal,

d. the first decorrelator and the second decorrelator may differ from each other, so that their output signals are decorrelated from each other;

4. Fourth variant, CNG 220 (Fig. 3a):

a. the first audio source 211a (211) comprises a first noise generator,

b. the second audio source 213a (213) comprises a second noise generator,

c. the mixing noise source 212a (212) comprises a third noise generator,

d. the first noise generator, the second noise generator and the third noise generator may generate mutually decorrelated noise signals (e.g., the three generators are initially decorrelated from each other);

5. Fifth variant:

a. one of the first audio source (211), the second audio source (213) and the mixing noise source (212) may comprise a pseudo-random number sequence generator for generating a pseudo-random number sequence in response to a seed,

b. at least two of the first audio source (211), the second audio source (213) and the mixing noise source (212) may initialize the pseudo-random number sequence generator with different seeds;

6. Sixth variant:

a. at least one of the first audio source (211), the second audio source (213) and the mixing noise source (212) may operate using a pre-stored noise table,

b. optionally, at least one of the first audio source (211), the second audio source (213) and the mixing noise source (212) may generate a complex spectrum for a frame using a first noise value for the real part and a second noise value for the imaginary part,

c. optionally, the at least one noise generator may generate a complex spectral noise value for frequency bin k using a first random value with index k for one of the real part and the imaginary part, and a second random value with index (k+M) for the other. The first noise value and the second noise value are included in a noise array, for example one obtained from a random number sequence generator, from a noise table or from a noise process. The array ranges from a start index to an end index, where the start index is less than M, the end index is equal to or less than 2M, and M and k are integers.
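The first variant (Fig. 3b) can be sketched by combining it with the seeding idea of the fifth variant. One generator produces the first noise signal, a decorrelator derives the second noise signal from it, and a differently seeded generator produces the mixing noise. The text here does not specify the decorrelator design. A plain integer delay is used only because it is the simplest operation that decorrelates white noise at lag zero. All names, seeds and the delay of 37 samples are hypothetical.

```python
import math
import random


def delay_decorrelator(x, delay):
    """Hypothetical decorrelator: integer delay with zero padding at the start."""
    if not 0 < delay < len(x):
        raise ValueError("delay must satisfy 0 < delay < len(x)")
    return [0.0] * delay + x[: len(x) - delay]


def correlation(x, y):
    """Normalized zero-lag correlation of two zero-mean sequences."""
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def first_variant_sources(length, seed_first=11, seed_mix=23, delay=37):
    """Sources arranged as in the first variant (Fig. 3b).

    The first generator produces n1, the decorrelator turns n1 into n2, and a
    second generator, initialized with a different seed (cf. the fifth
    variant), produces the mixing noise.
    """
    gen_first = random.Random(seed_first)
    gen_mix = random.Random(seed_mix)
    n1 = [gen_first.gauss(0.0, 1.0) for _ in range(length)]
    n2 = delay_decorrelator(n1, delay)
    n_mix = [gen_mix.gauss(0.0, 1.0) for _ in range(length)]
    return n1, n_mix, n2


n1, n_mix, n2 = first_variant_sources(20_000)
```

Only two generators run here, yet all three signals are mutually decorrelated. A delay would not decorrelate strongly coloured noise, which is one reason practical systems may prefer other decorrelator structures.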

As can be seen from Fig. 4, the decoder 200'' (200a, 200b) may include, in addition to the CNG 220 of Fig. 3, an input interface 210 and an audio decoder. The input interface 210 receives encoded audio data in a sequence of frames comprising an active frame and an inactive frame following the active frame. The audio decoder decodes the encoded audio data of the active frame to generate a decoded multichannel signal for the active frame. The first audio source 211, the second audio source 213, the mixing noise source 212 and the mixer 206 are active in the inactive frame to generate the multichannel signal for the inactive frame.

In particular, active frames are frames classified by the encoder as containing speech (or any other kind of non-noise audio), and inactive frames are frames classified as containing silence or only noise.
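The active/inactive split in the decoder can be sketched as a per-frame dispatch. Active frames go to the audio decoder, and inactive frames are synthesized by the noise sources and the mixer. This is a hypothetical skeleton. `Frame`, `StereoDecoderSketch` and their fields are illustrative names, the real audio decoding is replaced by a pre-decoded payload, and the spectral shaping and level adjustment from SID parameters are omitted.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Stereo = Tuple[List[float], List[float]]


@dataclass
class Frame:
    active: bool                      # activity flag, e.g. from the side information
    decoded: Optional[Stereo] = None  # stand-in for the output of the real audio decoder
    coherence: float = 0.0            # coherence value assumed to come with an inactive frame


class StereoDecoderSketch:
    """Hypothetical decoder loop: pass through active frames, synthesize CNG for inactive ones."""

    def __init__(self, frame_len, seeds=(1, 2, 3)):
        self.frame_len = frame_len
        # One generator each for the first source, the mixing source and the second source.
        self.rngs = [random.Random(s) for s in seeds]

    def _noise(self, rng):
        return [rng.gauss(0.0, 1.0) for _ in range(self.frame_len)]

    def decode(self, frame: Frame) -> Stereo:
        if frame.active:
            return frame.decoded  # actual audio decoding is out of scope here
        n1, n_mix, n2 = (self._noise(r) for r in self.rngs)
        g_mix = math.sqrt(frame.coherence)
        g_own = math.sqrt(1.0 - frame.coherence)
        ch1 = [g_own * a + g_mix * m for a, m in zip(n1, n_mix)]
        ch2 = [g_own * b + g_mix * m for b, m in zip(n2, n_mix)]
        return ch1, ch2


dec = StereoDecoderSketch(frame_len=4)
speech = Frame(active=True, decoded=([0.1] * 4, [0.2] * 4))
sid = Frame(active=False, coherence=0.5)
outputs = [dec.decode(f) for f in (speech, sid)]
```

The generators keep their state across frames, so consecutive inactive frames continue the noise sequences rather than restarting them.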

Any of the example CNGs 220 (220a-220e) can be controlled by a suitable controller.

Encoder

The encoder is explained below. The encoder can encode active frames and inactive frames. For inactive frames, the encoder may encode parametric noise data (e.g., the noise shape and/or a coherence value) instead of encoding the full audio signal. The encoding of inactive audio frames may be reduced relative to active audio frames, so as to reduce the amount of information that must be encoded in the bitstream. The parametric noise data (e.g., the noise shape) for inactive frames may also carry less information per frequency band and/or use fewer bins than the data encoded in active frames.
The parametric noise data may be determined in the left/right domain or in another domain (e.g., the mid/side domain). This can be done, for example, by forming a first linear combination and a second linear combination of the parametric noise data of the first and second channels. In some cases, gain information may also be provided that is not associated with the first and second linear combinations but is determined in the left/right domain. The first and second linear combinations are generally linearly independent of each other.
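A common pair of linearly independent combinations is the mid/side transform, sketched below for per-band noise parameters. The normalization by 1/2, the function names and the example values are assumptions. The text only requires two linearly independent combinations and does not fix whether the parameters are energies, log-energies or other shape values.

```python
def lr_to_ms(params_l, params_r):
    """First linear combination -> mid, second -> side (per band)."""
    mid = [(l + r) / 2 for l, r in zip(params_l, params_r)]
    side = [(l - r) / 2 for l, r in zip(params_l, params_r)]
    return mid, side


def ms_to_lr(mid, side):
    """Inverse mapping; it exists because the two combinations are linearly independent."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Coefficient matrix of the two combinations: rows (1/2, 1/2) and (1/2, -1/2).
# Its determinant is -1/2, which is non-zero, so the combinations are independent.
DET = 0.5 * (-0.5) - 0.5 * 0.5

mid, side = lr_to_ms([1.0, 4.0], [3.0, 2.0])
```

When the two channels have similar noise shapes, the side values are small, which is one reason a mid/side representation can be cheaper to quantize than left/right.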

The encoder may include an activity detector that classifies whether a frame is active or inactive.

Fig. 1, 2 and 4 show examples of encoders 300a and 300b (also indicated as 300 where it is not necessary to distinguish between them). Each audio encoder 300 may generate an encoded multichannel audio signal 232 for a sequence of frames of the input signal 304. The input signal 304 is here considered to be divided into a first channel 301 (also referred to as the left channel, "l", uppercase "L") and a second channel 303 (the right channel, "r", uppercase "R").

The encoded multichannel audio signal 232 may be defined over a sequence of frames, which may, for example, be in the time domain. For example, each sample "n" may denote a particular instant, and the samples of one frame may form a sequence, such as the sampling sequence of the input audio signal or a sequence obtained by filtering the input audio signal.

The encoder 300 (300a, 300b) may include an activity detector 380. It is shown in Fig. 1 but not in Fig. 2 and 4, although it may be implemented there in some examples. Fig. 1 shows that each frame of the input signal 304 can be classified either as an "active frame 306" or as an "inactive frame 308". In an inactive frame 308, the signal is considered to be silence (e.g., only silence or noise is present). In an active frame 306, some non-noise audio signal (e.g., speech, music, etc.) may be detected.
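The patent text here does not describe how the activity detector 380 decides. The sketch below therefore uses a deliberately simple stand-in: a frame-energy threshold. Detectors in real speech codecs use far richer features. The threshold of -50 dB relative to full scale (1.0), the `eps` guard and the function names are hypothetical.

```python
import math

SILENCE_THRESHOLD_DB = -50.0  # hypothetical threshold, not taken from the patent


def frame_level_db(left, right, eps=1e-12):
    """Mean power of both channels in dB relative to full scale (1.0)."""
    samples = list(left) + list(right)
    power = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(power + eps)  # eps avoids log10(0) for digital silence


def classify_frame(left, right, threshold_db=SILENCE_THRESHOLD_DB):
    """Return 'active' (cf. frame 306) or 'inactive' (cf. frame 308)."""
    return "active" if frame_level_db(left, right) > threshold_db else "inactive"


print(classify_frame([0.5] * 160, [-0.5] * 160))
```

The resulting label is what the encoder would then signal to the decoder, e.g. in the side information described in the next paragraph.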

In the encoded multi-channel audio signal 232 produced (e.g., as a bitstream) by the encoder 300, information on whether a frame is an active frame 306 or a silence frame 308 may be signaled, for example, in the so-called "comfort noise generation side information 402 (p_frame)", also simply called "side information".

Fig. 1 shows a preprocessing stage 360, which may determine (e.g., classify) whether a frame is an active frame 306 or a silence frame 308. It should be noted here that the channels 301 and 303 of the input signal 304 are denoted by capital letters, L (301, left channel) and R (303, right channel), to indicate that they are in the frequency domain. As can be seen in Fig. 1, a spectral analysis stage 370 may be applied (a first spectral analysis stage 370-1 for the first channel 301, L, and a second spectral analysis stage 370-3 for the second channel 303, R). The spectral analysis stage 370 may be performed for each frame of the input signal 304 and may be based, for example, on harmonicity measurements. In particular, in some examples, the spectral analysis performed by stage 370 for the first channel 301 may be carried out separately from the spectral analysis performed for the second channel 303 of the same frame.

In some cases, the spectral analysis stage 370 may include calculating energy-related parameters, such as the average energy in each of a set of predefined frequency bands and the overall average energy.
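As an illustration of the energy-related parameters mentioned above, the following Python sketch computes the average energy of each band and the overall average energy of one channel in one frame. The band layout (`band_edges`) and the function name are hypothetical; the text does not specify which bands are used.

```python
def band_energies(spectrum, band_edges):
    """Return (per-band average energies, overall average energy) for one frame.

    spectrum:   complex (or real) spectral values of one channel in one frame
    band_edges: hypothetical bin boundaries [b0, b1, ..., bK]; band i covers
                bins b_i .. b_{i+1}-1 (the real band layout is not given here)
    """
    band_avg = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        if hi <= lo:
            raise ValueError("band edges must be strictly increasing")
        band_avg.append(sum(abs(x) ** 2 for x in spectrum[lo:hi]) / (hi - lo))
    # Overall average energy over all bins of the frame (assumed definition).
    total_avg = sum(abs(x) ** 2 for x in spectrum) / len(spectrum)
    return band_avg, total_avg
```

For example, `band_energies([1, 1j, 2, 0], [0, 2, 4])` yields per-band averages of 1.0 and 2.0 and an overall average of 1.5.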

An activity detection stage 380 may be applied (which may be regarded as voice activity detection when voice is what is being detected). A first activity detection stage 380-1 may be applied to the first channel 301 (and, in particular, to the measurements performed for the first channel), and a second activity detection stage 380-3 may be applied to the second channel 303 (and, in particular, to the measurements performed for the second channel). In examples, the activity detection stage 380 may estimate the energy of the background noise in the input signal 304 and use this estimate to calculate a signal-to-noise ratio, which is compared with a signal-to-noise ratio threshold to determine whether the frame is classified as active or inactive (i.e., a calculated signal-to-noise ratio above the threshold implies that the frame is classified as active, and a calculated signal-to-noise ratio below the threshold implies that the frame is classified as inactive). In examples, stage 380 may compare the harmonicity obtained by the spectral analysis stages 370-1 and 370-3, respectively, with one or two harmonicity thresholds (e.g., a first threshold for the first channel 301 and a second threshold for the second channel 303). In both cases, it may be possible to classify not only each frame but also each channel of each frame as an active channel or an inactive channel.
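The SNR-based classification described above can be sketched per channel as follows. This is a minimal sketch assuming that a background-noise energy estimate is already available; how that estimate is tracked, and the threshold value `snr_threshold_db`, are assumptions not given in the text.

```python
import math

def channel_is_active(frame_energy, noise_energy, snr_threshold_db=15.0, eps=1e-12):
    """Classify one channel of one frame as active (True) or inactive (False).

    frame_energy:     energy of the channel in the current frame
    noise_energy:     current background-noise energy estimate for that channel
    snr_threshold_db: hypothetical threshold; the actual value is not specified
    """
    snr_db = 10.0 * math.log10(max(frame_energy, eps) / max(noise_energy, eps))
    # SNR above the threshold -> active; otherwise -> inactive.
    return snr_db > snr_threshold_db
```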

A decision 381 may be made, on the basis of which it is determined (as represented by switch 381'') whether discrete stereo processing 306a or discontinuous transmission stereo processing 306b (stereo DTX) is to be performed. In particular, in the case of an active frame (and discrete stereo processing 306a), encoding may be performed according to any strategy, standard or processing scheme, and is therefore not discussed in more detail here. Most of the following explanation relates to stereo DTX 306b.

In particular, in the examples, a frame is classified (at stage 381) as an inactive frame only if both channels 301 and 303 are classified as inactive by stages 380-1 and 380-3, respectively. This avoids the problems with the activity detection decision explained above. In particular, it is not necessary to signal the active/inactive classification for each channel of each frame (thereby reducing signaling), and synchronization between the channels is obtained inherently. Furthermore, if the decoder is as explained herein, it is possible to exploit the coherence between the first and second channels 301 and 303 and to generate noise signals that are correlated/decorrelated according to the coherence obtained for the signal 304. The elements of the encoder 300 (300a, 300b) that are used to encode an inactive frame are explained in detail below. As explained, any other technique may be used to encode the active frames 306, which are therefore not explained here.
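The joint decision of stage 381 (a frame is inactive only if both channels are inactive) reduces to a single Boolean test. The function names and the returned mode labels below are hypothetical and only mark the two processing paths 306a and 306b.

```python
def stereo_frame_is_inactive(left_active, right_active):
    """A frame is inactive only if BOTH channels were classified inactive."""
    return not left_active and not right_active

def select_stereo_mode(left_active, right_active):
    # Mode labels are illustrative names for processing paths 306a / 306b.
    if stereo_frame_is_inactive(left_active, right_active):
        return "stereo_dtx"        # 306b: send SID / comfort noise parameters
    return "discrete_stereo"       # 306a: regular active-frame coding
```

Because one per-frame decision covers both channels, no per-channel activity flag needs to be transmitted.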

In general terms, the encoder 300a, 300b (300) may include a noise parameter calculator 3040 for calculating noise parameter data 401, 403 for the first and second channels 301, 303. The noise parameter calculator 3040 may calculate the noise parameter data 401, 403 (e.g., indices and/or gains) for the first channel 301 and the second channel 303. The noise parameter calculator 3040 may thus provide encoded audio data 232 as a sequence of frames, which may contain active frames 306 and inactive frames 308 (which may follow active frames 306). In particular, in the case of inactive frames 308, the encoded audio data 232 may be encoded as one or two silence insertion descriptor (SID) frames 241, 243. In some examples (e.g., Fig. 2), only a single SID frame is provided; in others, two SID frames are provided (e.g., Fig. 4).

An inactive frame 308 may include, in particular, at least one of the following:

- comfort noise generation side information (e.g., 402, p_frame);

- comfort noise parameter data 401 for the first channel 301, or a first linear combination of the comfort noise parameter data for the first channel 301 and the comfort noise parameter data for the second channel (v_{l,ind}, v_{m,ind}, p_noise, gain g_{l,q});

- comfort noise parameter data 403 for the second channel 303, or a second linear combination of the comfort noise parameter data for the first channel 301 and the comfort noise parameter data for the second channel (v_{r,ind}, v_{s,ind}, p_noise, gain g_{r,q});

- coherence information (c, 404) (coherence data).

In some examples, the first silence insertion descriptor frame 241 may include the first two items of the above list, and the second silence insertion descriptor frame 243 may include the last two items, in specific data fields. Different protocols may, however, provide different data fields or a different bitstream organization. In some cases (e.g., Fig. 2), only a single inactive frame may be provided for the noise parameters of both channels.
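One way to picture the fields of an inactive frame and the example two-SID-frame split (the first two list items in SID frame 241, the last two in SID frame 243) is a small record type. Field names, the 4-bit range check and the dictionary layout are illustrative assumptions, not the actual bitstream syntax.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InactiveFramePayload:
    """Hypothetical container for the inactive-frame fields listed above."""
    p_frame: int                     # CNG side information 402
    first_shape_indices: List[int]   # v_{l,ind} or v_{m,ind}
    first_gain_index: int            # g_{l,q}
    second_shape_indices: List[int]  # v_{r,ind} or v_{s,ind}
    second_gain_index: int           # g_{r,q}
    coherence_index: int             # c_ind (404)

    def __post_init__(self):
        # Assumes a 4-bit coherence field, as in the example given in the text.
        if not 0 <= self.coherence_index < 16:
            raise ValueError("coherence index must fit in 4 bits")

    def split_into_sid_frames(self):
        """Distribute the fields over two SID frames (one example layout)."""
        sid_241 = {
            "p_frame": self.p_frame,
            "first_shape_indices": self.first_shape_indices,
            "first_gain_index": self.first_gain_index,
        }
        sid_243 = {
            "second_shape_indices": self.second_shape_indices,
            "second_gain_index": self.second_gain_index,
            "coherence_index": self.coherence_index,
        }
        return sid_241, sid_243
```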

The coherence information (e.g., part of the "silence insertion descriptor") may include one single value (e.g., encoded in a small number of bits, for example four bits) that indicates the coherence (e.g., correlation data), such as the coherence between the first channel 301 and the second channel 303 of the same inactive frame 308. The comfort noise parameter data 401, 403, on the other hand, may indicate, for each channel 301, 303, the signal energy for the inactive frame 308 (e.g., they may effectively provide an envelope), or may in any case provide noise shape information. The envelope or noise shape information may take the form of several coefficients for frequency bins, plus a gain, for each channel. The noise shape information may be obtained at stage 312 (see below) from the original input channels (301, 303), after which mid/side coding is performed on the noise shape parameter vectors. In the decoder, it may be possible to generate noise channels (e.g., 201, 203, as indicated in Fig. 3) that are influenced by the coherence information 404. The noise channels 201, 203 generated by the CNG 220 (220a-220) can then be modified by the signal modification module 250, controlled by control noise data (comfort noise parameter data 401, 403, 2312) that indicate the signal energies for the first audio channel L_out and the second audio channel R_out.
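The decoder-side idea of coherence-controlled noise, as summarized in the abstract (two audio sources, a mixing noise source, amplitude elements and adders), can be sketched in the time domain as below. The weights sqrt(1 - c) and sqrt(c) are an assumption chosen so that, for independent unit-variance Gaussian sources, the correlation between the two output channels equals c; the spectral shaping performed by module 250 is omitted.

```python
import math
import random

def generate_stereo_noise(n, coherence, seed=0):
    """Two noise channels whose correlation is controlled by `coherence` (sketch)."""
    if not 0.0 <= coherence <= 1.0:
        raise ValueError("coherence must lie in [0, 1]")
    rng = random.Random(seed)
    a = math.sqrt(1.0 - coherence)  # amplitude elements for the two audio sources
    b = math.sqrt(coherence)        # portion of the mixing noise added to each channel
    left, right = [], []
    for _ in range(n):
        n1 = rng.gauss(0.0, 1.0)  # first audio source
        n2 = rng.gauss(0.0, 1.0)  # second audio source
        m = rng.gauss(0.0, 1.0)   # mixing noise source, shared by both channels
        left.append(a * n1 + b * m)   # first adder
        right.append(a * n2 + b * m)  # second adder
    return left, right

def correlation(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

With c = 1 both channels reduce to the shared mixing noise; with c = 0 they are independent.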

The audio encoder 300 (300a, 300b) may include a coherence calculator 320, which may obtain the coherence information (404) to be encoded in the bitstream (e.g., in signal 232, in frame 241 or 243). The coherence information (c, 404) may indicate the coherence situation between the first channel 301 (e.g., the left channel) and the second channel 303 (e.g., the right channel) in the inactive frame 308. Examples are explained below.

The encoder 300 (300a, 300b) may include an output interface 310 configured to generate the multi-channel audio signal 232 (bitstream) containing encoded audio data for an active frame 306 and, for an inactive frame 308, first parametric noise data 401 (p_noise, left) (parametric comfort noise data), second parametric noise data 403 (p_noise, right) and coherence data c (404). The first parametric data 401 may be parametric data of the first channel (e.g., the left channel) or of a first linear combination of the first and second channels (e.g., a mid channel). The second parametric data 403 may be parametric data of the second channel (e.g., the right channel) or of a second linear combination of the first and second channels (e.g., a side channel) that differs from the first linear combination.

The bitstream 232 may also contain side information 402 including an indicator of whether the current frame is an active frame 306 or an inactive frame 308, for example to inform the decoder of the decoding techniques to be used.

In particular, Fig. 4 shows a noise parameter calculator 3040 (noise parameter calculation stage) that includes a first noise parameter calculator stage 304-1, in which the comfort noise parameter data 401 for the first channel 301 may be calculated, and a second noise parameter calculator stage 304-3, in which the second comfort noise parameter 403 for the second channel 303 may be calculated. Fig. 2 shows an example in which the noise parameters are processed and quantized jointly. The internals (e.g., the conversion of the noise shape vectors into an M/S representation) are shown in Fig. 5. Essentially, there may be a noise shape of a first (mid) channel M and a noise shape of a second (side) channel S, which may be encoded as mid indices and side indices, while the gain for the noise shape of the left channel 301 and the gain for the noise shape of the right channel 303 may also be encoded.

The coherence calculator 320 may calculate coherence data c (404) (coherence information) indicating the coherence situation between the first channel L and the second channel R. In this case, the coherence calculator 320 may operate in the frequency domain.

As can be seen, the coherence calculator 320 may include a channel coherence calculation stage 320'' in which the coherence value c (404) is obtained. It may be followed by a uniform quantization stage 320''. A quantized version c_ind of the coherence value c can thus be obtained.
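A uniform quantizer for the coherence value, assuming c lies in [0, 1] and a 4-bit index as in the example mentioned earlier, might look like this. The number of levels and the round-to-nearest rule are assumptions.

```python
import math

def quantize_coherence(c, bits=4):
    """Uniformly quantize c in [0, 1] to an integer index with 2**bits levels."""
    max_index = (1 << bits) - 1
    c = min(max(c, 0.0), 1.0)  # clamp to the assumed valid range
    return int(math.floor(c * max_index + 0.5))

def dequantize_coherence(index, bits=4):
    """Map an index back to its reconstruction value in [0, 1]."""
    return index / ((1 << bits) - 1)
```

With 4 bits the step size is 1/15, so the reconstruction error is at most 1/30.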

Some explanations of how the coherence may be obtained and how it may be quantized are given below.

In some examples, the coherence calculator 320 may:

- calculate a real intermediate value and an imaginary intermediate value from the complex spectral values of the first channel and the second channel (303) in the inactive frame;

- calculate a first energy value for the first channel and a second energy value for the second channel (303) in the inactive frame; and

- calculate the coherence data (404, c) using the real intermediate value, the imaginary intermediate value, the first energy value and the second energy value; and/or

- smooth at least one of the real intermediate value, the imaginary intermediate value, the first energy value and the second energy value, and calculate the coherence data using the at least one smoothed value.

The coherence calculator 320 may square the smoothed real intermediate value and the smoothed imaginary intermediate value and add the squared values to obtain a first component number. The coherence calculator 320 may multiply the smoothed first and second energy values to obtain a second component number, and combine the first and second component numbers to obtain a resulting number for the coherence value on which the coherence data are based. The coherence calculator 320 may then take the square root of the resulting number to obtain the coherence value on which the coherence data are based. Example formulas are given below.
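The computation described in the list and paragraph above can be sketched as follows. Assumptions: the real and imaginary intermediate values are taken as the real and imaginary parts of the sum of L[k] * conj(R[k]) over the frame, smoothing is first-order recursive with a hypothetical factor `alpha`, and "combining" the two component numbers is taken to mean dividing the first by the second (the usual magnitude-squared coherence normalization).

```python
import math

class CoherenceEstimator:
    """Smoothed inter-channel coherence for inactive frames (sketch).

    alpha is a hypothetical first-order smoothing factor.
    """

    def __init__(self, alpha=0.9):
        self.alpha = alpha
        self.state = None  # smoothed (re, im, e_l, e_r)

    def update(self, left, right):
        # Intermediate values from the complex spectra of the two channels.
        cross = sum(l * r.conjugate() for l, r in zip(left, right))
        raw = (cross.real, cross.imag,
               sum(abs(l) ** 2 for l in left),    # first energy value
               sum(abs(r) ** 2 for r in right))   # second energy value
        if self.state is None:
            self.state = raw  # initialize smoothing with the first frame
        else:
            a = self.alpha
            self.state = tuple(a * s + (1.0 - a) * x
                               for s, x in zip(self.state, raw))
        re, im, e_l, e_r = self.state
        first = re * re + im * im   # first component number
        second = e_l * e_r          # second component number
        if second <= 0.0:
            return 0.0
        return math.sqrt(first / second)
```

By the Cauchy-Schwarz inequality the result stays within [0, 1], also after smoothing; a pure phase shift between the channels still gives a coherence of 1.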

The following explains how the noise shape (or other signal energy information) that is to be rendered in the decoder is obtained. What is essentially encoded is the shape (or other energy-related information) of the noise of the original input signal 304, which in the decoder is applied to the noise 252 to shape it, so as to render noise 252 (the output audio signal) that resembles the original noise of signal 304.

It should first be noted that the signal 304 itself is not encoded in the bitstream 232 by the encoder. However, noise information (e.g., energy information, envelope information) may be encoded in the bitstream 232 so that a noise signal having the noise shape encoded by the encoder can subsequently be generated.

The noise shape derivation block 312 may be applied to the encoder input signal 304. The "get noise shape" block 312 may compute a low-resolution parametric representation 1312 of the spectral envelope of the noise in the input signal 304. This may be done, for example, by calculating energy values in frequency bands of a frequency-domain representation of the input signal 304. The energy values may be converted to a logarithmic representation (if necessary) and may be compressed into a smaller number (N) of parameters, which are subsequently used in the decoder to generate comfort noise. These low-resolution representations of the noise are referred to herein as "noise shapes 1312". Therefore, whatever follows the "get noise shape" block 312 should be understood not as representing the input signal 304 but as representing its noise shape (parametric representations of the spectral envelopes of the noise in the respective channels). This is important because the encoder may transmit only this lower-resolution representation of the noise spectral envelope in the SID frame. Thus, in Fig. 2, the entire "noise parameter calculator" portion (3040) can be understood as operating only on these vectors of noise-related parameters (e.g., denoted v_l, v_r, v_{m,ind} and v_{s,ind}) and not on representations of the signal 304 itself.
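A minimal sketch of the noise shape derivation of block 312: band energies, conversion to dB, then reduction to N parameters. The text does not say how the band values are compressed into N parameters; averaging contiguous groups of bands is used here purely as an assumption, and all names are hypothetical.

```python
import math

def noise_shape(spectrum, band_edges, n_params, eps=1e-12):
    """Low-resolution noise shape (in dB) of one channel in one frame."""
    # 1) Average energy per frequency band, converted to dB.
    band_db = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        energy = sum(abs(x) ** 2 for x in spectrum[lo:hi]) / (hi - lo)
        band_db.append(10.0 * math.log10(energy + eps))
    n_bands = len(band_db)
    if not 0 < n_params <= n_bands:
        raise ValueError("n_params must be between 1 and the number of bands")
    # 2) Compress to n_params values by averaging contiguous groups of bands.
    shape = []
    for i in range(n_params):
        lo = i * n_bands // n_params
        hi = (i + 1) * n_bands // n_params
        group = band_db[lo:hi]
        shape.append(sum(group) / len(group))
    return shape
```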

Fig. 5 shows an example of the "noise parameter calculator" portion 3040 (joint noise shape quantization). An L/R-to-M/S converter stage 314 may be applied to obtain a mid-channel representation v_m of the noise shape 1312 (a first linear combination of the noise shapes of channels L and R) and a side-channel representation v_s of the noise shape 1312 (a second linear combination of the noise shapes of channels L and R). The way they are obtained is shown below. Accordingly, the noise shape of signal 304 may be split into two channels, v_m and v_s.
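The L/R-to-M/S conversion of stage 314 is applied to the noise shape vectors, not to the audio signal. The exact linear combinations are not reproduced in this section; the sketch below assumes the common choice v_m = (v_l + v_r)/2 and v_s = (v_l - v_r)/2.

```python
def lr_to_ms(v_l, v_r):
    """Convert left/right noise shape vectors to mid/side (assumed definition)."""
    if len(v_l) != len(v_r):
        raise ValueError("noise shape vectors must have equal length")
    v_m = [(l + r) / 2.0 for l, r in zip(v_l, v_r)]
    v_s = [(l - r) / 2.0 for l, r in zip(v_l, v_r)]
    return v_m, v_s
```

When the two channels have similar noise shapes, v_s is close to zero, which is what makes the mid/side representation cheaper to quantize.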

Then, at normalization stage 316, at least one of the mid-channel representation v_m and the side-channel representation v_s of the noise shape 1312 may be normalized to obtain a normalized version v_{m,n} of the mid-channel representation v_m and/or a normalized version v_{s,n} of the side-channel representation v_s.
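The text leaves open which normalization stage 316 applies. Purely as an assumed example, the sketch below normalizes only the mid vector by removing its mean (which could then be carried by a gain) and passes the side vector through unchanged.

```python
def normalize_shapes(v_m, v_s):
    """Assumed normalization: remove the mean of the mid vector only."""
    mean_m = sum(v_m) / len(v_m)
    v_m_n = [x - mean_m for x in v_m]
    v_s_n = list(v_s)  # side vector left unnormalized in this sketch
    return v_m_n, v_s_n, mean_m
```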

A quantization stage 318 (e.g., vector quantization, VQ) may then be applied to the normalized version of signal 1304. This produces, for example, a quantized version v_{m, ind} of the normalized mid-channel representation v_{m, n} of noise shape 1312 and a quantized version v_{s, ind} of the normalized side-channel representation v_{s, n} of noise shape 1312. Vector quantization may be used (e.g., by means of a multi-stage vector quantizer). Hence, the indices v_{m, ind}[k] (where k is the index of a particular frequency bin) may describe the mid representation of the noise shape, and the indices v_{s, ind}[k] may describe the side representation of the noise shape. The indices v_{m, ind}[k] and v_{s, ind}[k] can therefore be encoded in bitstream 232 in two ways:
- as a first linear combination of the comfort noise parameter data for the first channel and the comfort noise parameter data for the second channel;
- as a second linear combination of the comfort noise parameter data for the first channel and the comfort noise parameter data for the second channel.
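The text states only that a multi-stage vector quantizer may be used for v_{m, n} and v_{s, n}. The following is a minimal sketch of that technique. It assumes two stages and tiny hand-made codebooks; `STAGES` and the helper names are hypothetical and do not come from the source. Each stage picks the codebook entry nearest to the residual left by the previous stages. The dequantizer (as in stage 322) then sums the selected entries.

```python
from typing import List, Sequence

Vector = List[float]

# Hypothetical toy codebooks; a real codec would use much larger trained codebooks.
STAGES: List[List[Vector]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],      # stage 1: coarse
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],  # stage 2: refinement
]


def nearest(codebook: Sequence[Vector], x: Sequence[float]) -> int:
    """Return the index of the entry with the smallest squared Euclidean distance to x."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, x))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def msvq_encode(x: Sequence[float], stages: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage codes the residual of the previous ones."""
    residual = list(x)
    indices: List[int] = []
    for codebook in stages:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def msvq_decode(indices: Sequence[int], stages: Sequence[Sequence[Vector]]) -> Vector:
    """Dequantize by summing the selected entry of every stage."""
    out = [0.0] * len(stages[0][0])
    for idx, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out
```

For example, `msvq_encode([1.2, 0.8], STAGES)` selects indices `[1, 1]`, and `msvq_decode([1, 1], STAGES)` reconstructs `[1.25, 0.75]`. The second stage refines the coarse first-stage choice.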

At dequantization stage 322, two quantized versions may be dequantized: the quantized version v_{m, ind} of the normalized mid-channel representation v_{m, n} of noise shape 1312, and the quantized version v_{s, ind} of the normalized side-channel representation v_{s, n} of noise shape 1312.

The M/S-L/R converter 324 may be applied to the dequantized mid and side representations v_{m, q} and v_{s, q} of noise shape 1312. This yields the versions v'_l and v'_r of noise shape 1312 in the original (left and right) channels.
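The source does not spell out the transforms used by the L/R-M/S converter 314 and the M/S-L/R converter 324. This sketch assumes a common convention: M = (L + R)/2 and S = (L - R)/2, with the inverse L = M + S and R = M - S. The function names are hypothetical. The transform is applied bin by bin to the noise shape vectors.

```python
from typing import List, Sequence, Tuple


def lr_to_ms(v_l: Sequence[float], v_r: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Assumed L/R -> M/S convention (converter 314): mid = half-sum, side = half-difference."""
    v_m = [(l + r) / 2 for l, r in zip(v_l, v_r)]
    v_s = [(l - r) / 2 for l, r in zip(v_l, v_r)]
    return v_m, v_s


def ms_to_lr(v_m: Sequence[float], v_s: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Inverse of lr_to_ms (converter 324): L = M + S, R = M - S."""
    v_l = [m + s for m, s in zip(v_m, v_s)]
    v_r = [m - s for m, s in zip(v_m, v_s)]
    return v_l, v_r
```

Without quantization the round trip is exact. With quantization it is not, which is why the per-channel gains g_l and g_r are computed afterwards.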

The gains g_l and g_r may then be calculated in stage 326. Each gain is valid for all noise shape samples (v'_l or v'_r) of the same channel in the same inactive frame 308. The gains g_l and g_r may be obtained by taking into account all (or nearly all) frequency bins of the noise shape representations v'_l and v'_r.

The gain g_l may be obtained by comparing:

- the values of the frequency bins of the noise shape of the first channel 301 in the L/R domain (before the L/R-M/S converter 314);

- with the values of the frequency bins of noise shape 1312 of the first channel 301 after reconversion into the L/R domain (after the M/S-L/R converter 324).

Likewise, the gain g_r may be obtained by comparing:

- the values of the noise shape coefficients of the second channel 303 in the L/R domain (before the L/R-M/S converter 314);

- with the values of the coefficients of noise shape 1312 of the second channel 303 after reconversion into the L/R domain (after the M/S-L/R converter 324).

An example of how the gains may be obtained follows.

In the linear domain, the gain may be proportional to the geometric mean of a set of ratios. Each ratio is taken between a noise shape coefficient of a given channel in the L/R domain (before the L/R-M/S converter 314) and the corresponding coefficient of the same channel after reconversion into the L/R domain (after the M/S-L/R converter 324).

In the logarithmic domain, the gain for each channel may be proportional to the arithmetic mean of a set of differences. Each difference is taken between a coefficient of the FD version of the noise shape in the L/R domain (before the L/R-M/S converter 314) and the corresponding coefficient after reconversion into the L/R domain (after the M/S-L/R converter 324).

In either the logarithmic or the scalar domain, the gain relates two versions of the left- or right-channel noise shape: the version before the L/R-M/S transformation and quantization, and the version after dequantization and the inverse M/S-L/R transformation.
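For the left channel with K bins, the two formulations above can be written out explicitly. This assumes a proportionality constant of 1, which the source does not fix.

- Linear domain: g_l = (prod_k v_l[k] / v'_l[k])^(1/K).
- Logarithmic domain: g_l = (1/K) * sum_k (v_l[k] - v'_l[k]).

When v_l and v'_l hold log values, the second is the logarithm of the first. The function names in this sketch are hypothetical.

```python
import math
from typing import Sequence


def gain_log_domain(v_orig: Sequence[float], v_rec: Sequence[float]) -> float:
    """Mean of per-bin differences between log-domain noise shapes (e.g. v_l and v'_l)."""
    if len(v_orig) != len(v_rec) or not v_orig:
        raise ValueError("shapes must be non-empty and of equal length")
    return sum(o - r for o, r in zip(v_orig, v_rec)) / len(v_orig)


def gain_linear_domain(v_orig: Sequence[float], v_rec: Sequence[float]) -> float:
    """Geometric mean of per-bin ratios between strictly positive linear-domain shapes."""
    if len(v_orig) != len(v_rec) or not v_orig:
        raise ValueError("shapes must be non-empty and of equal length")
    # Summing logs avoids overflow/underflow of a long product of ratios.
    log_sum = sum(math.log(o / r) for o, r in zip(v_orig, v_rec))
    return math.exp(log_sum / len(v_orig))
```

For example, `gain_log_domain([2.0, 4.0], [1.0, 1.0])` gives 2.0, and `gain_linear_domain([2.0, 8.0], [1.0, 1.0])` gives 4.0 up to rounding.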

A quantization stage 328 may be applied to the gain g_l to obtain its quantized version g_{l, q}, and to the gain g_r to obtain its quantized version g_{r, q}. The gains g_{l, q} and g_{r, q} may be encoded in bitstream 232 (e.g., as comfort noise parameter data 401 and/or 403) to be read by a decoder.

In some examples, the energy of the side-channel noise shape vector v_s may also be compared with a predetermined energy threshold α. This comparison may take place before normalization, i.e., between stages 314 and 316. The threshold α may be a positive real value: 0.1 in this case, but another value, e.g. between 0.05 and 0.15, may also be used. Comparison block 435 thus determines whether the side representation v_s of the noise shape of the inactive frame 308 has enough energy. The binary result (the "no-side flag") is signaled in bitstream 232 as side information 402. Here, no-side = 1 if the energy of the side representation v_s is less than the energy threshold α, and no-side = 0 if it is greater than α. If the energy is exactly equal to the threshold, the flag may be set to 1 or 0, depending on the particular application.

Block 436 negates the binary value of the no-side flag. If the input of block 436 is 1, its output 436'' is 0; if the input is 0, its output 436'' is 1. Accordingly, the value 436'' is 1 if the energy of v_s is greater than the energy threshold, and 0 if it is less than the threshold. The dequantized side representation v_{s, q} may then be multiplied by the binary value 436''.

This is just one possible way of discarding a weak side representation. If the energy of v_s is less than the threshold α, the bins of the dequantized side representation v_{s, q} are artificially set to zero, so output 437'' of block 437 (a multiplier) is 0. Conversely, if the energy of v_s is large enough (> α), output 437'' is exactly equal to v_{s, q}. Consequently, when the energy of v_s is below α, the side representation of the noise shape (and in particular its dequantized version v_{s, q}) is not taken into account when deriving the left/right representations of the noise shape. In addition or alternatively, the decoder may have a similar mechanism that zeroes the coefficients of the side representation of the noise shape. The no-side flag may also be encoded in bitstream 232 as part of side information 402.
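The no-side decision of blocks 435, 436 and 437 can be sketched as follows. The text does not define "energy", so this sketch assumes the sum of squared bin values of v_s. It uses the example threshold α = 0.1. For the tie case, which the text leaves open, it returns 0 when the energy equals α. All function names are hypothetical.

```python
from typing import List, Sequence

ALPHA = 0.1  # example threshold from the text; implementation-specific in practice


def side_energy(v_s: Sequence[float]) -> float:
    """Assumed energy measure: sum of squared bin values of the side noise shape."""
    return sum(x * x for x in v_s)


def no_side_flag(v_s: Sequence[float], alpha: float = ALPHA) -> int:
    """Block 435: 1 if the side shape energy is below alpha, else 0 (ties give 0 here)."""
    return 1 if side_energy(v_s) < alpha else 0


def gate_side(v_s_q: Sequence[float], flag: int) -> List[float]:
    """Blocks 436/437: negate the flag, then multiply the dequantized side shape by it."""
    keep = 1 - flag  # output 436''
    return [keep * x for x in v_s_q]  # output 437''
```

Multiplying by the negated flag mirrors the block diagram. An implementation could equally skip the side bins with a branch.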

Note that the energy of the side representation of the noise shape is shown as measured (by block 435) before the noise shape is normalized (in block 316), and the energy itself is not normalized before being compared with the threshold. In principle, block 435 could also measure it after normalization (e.g., block 435 could receive v_{s, n} instead of v_s as input).

Regarding the threshold α used to compare the energy of the side representation of the noise shape, the value 0.1 may, in some examples, be chosen arbitrarily. In other examples, α may be selected after experimentation and fine-tuning (e.g., through calibration). In principle, any number that works for the number format (floating point or fixed point) or the precision of a particular implementation may be used. The threshold α may therefore be an implementation-specific parameter set after calibration.

Note that the output interface (310) may be configured for:

- generating an encoded multichannel audio signal (232) having encoded audio data for the active frame (306) using a first plurality of coefficients for a first number of frequency bins; and

In fact, a reduced resolution may be used for inactive frames, further reducing the number of bits used to encode the bitstream. The same applies to the decoder.

Any of the encoder examples may be controlled by a suitable controller.

Decoder

Decoders according to examples are explained below. A decoder may include, for example, the comfort noise generator 220 (220a-220e) explained above, e.g. as shown in FIGS. 3a-3f. Comfort noise 204 (a multichannel audio signal) may be shaped in signal modification module 250 to obtain output signal 252. The focus here is on the operations for generating noise in inactive frames 308, not on the operations for active frames 206.

FIG. 4 shows a first example of a decoder 200'', here indicated as 200b. Decoder 200'' includes a comfort noise generator 220, which may be a generator 220 (220a-220e) according to any of FIGS. 3a-3f. Generator 220 (220a-220e) may be followed by a signal modification module 250 (see FIG. 4), which shapes the generated multichannel noise 204 according to the energy parameters encoded in the comfort noise parameter data (401, 403).

Via decoder input interface 210, decoder 200'' may obtain the comfort noise parameter data (401, 403) from bitstream 232. These may include comfort noise parameter data describing the signal energy, either for the first and second channels or for a first and a second linear combination of the first and second channels. The two linear combinations are linearly independent of each other. Via the same interface 210, decoder 200'' may also receive coherence data 404, which indicate the coherence between the different channels.

FIG. 4 shows two different silence descriptor frames 241 and 243 provided in bitstream 232 for encoding inactive frames. More than two descriptor frames, or a single descriptor frame, may also be used. The output of decoder 200b is a multichannel output.

Referring to FIG. 2, a decoder 200'' (here indicated as 200a) is explained below. It is an example of a decoder 200 that may be used to generate an output signal 252, e.g. in the form of noise.

Decoder 200a (200'') may include an input interface 210 for receiving encoded audio data 232 (a bitstream) in a sequence of frames 306, 308 encoded by encoder 300a or 300b. Decoder 200a (200'') may, for example, be a multichannel signal generator 200, or more generally be part of one. That generator may be or include a comfort noise generator 220 (220a-220e) of any of FIGS. 3a-3f.

FIG. 2 shows a stereo comfort noise generator (CNG) 220 (220a-220e). The comfort noise generator 220 (220a-220e) may be similar to the comfort noise generator of FIGS. 3a-3f, or to one of its variants. Here, the coherence information 404 (e.g., c, or more precisely c_q, also indicated as "coh" or c_ind) obtained from encoder 300a or 300b may be used to generate the multichannel signal 204 (in channels 201, 203) explained above.

The multichannel signal 204 generated by the CNG 220 (220a-220e) may then be further modified using the comfort noise parameter data 401 and 403, e.g. the noise shape information for the first (left) and second (right) channels of the multichannel signal to be generated. FIG. 2 shows that the decoder may retrieve the following data:
- the mid indices v_{m, ind} (401) and side indices v_{s, ind} (403), generated in stage 316 and/or 318 by encoder 300a (in particular, by noise parameter calculation module 3040);
- the gains g_{l, q} and g_{r, q}, obtained in stage 326 and/or 328.
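The abstract describes the multichannel generator as two independent audio sources plus a common mixing noise source, combined through amplitude elements and adders. This section does not say how c_q sets the weights. One way to make the inter-channel correlation follow the coherence value is assumed here: weight the independent sources by sqrt(1 - c) and the shared mixing noise by sqrt(c). For independent unit-variance sources, the resulting channels then have correlation c. The sketch uses Gaussian samples from Python's `random` module and leaves open whether the values are time samples or frequency bins. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def mixed_stereo_noise(coherence: float, n: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Two channels sharing a mixing noise component; assumed weights give correlation c."""
    if not 0.0 <= coherence <= 1.0:
        raise ValueError("coherence must lie in [0, 1]")
    rng = random.Random(seed)
    a = math.sqrt(1.0 - coherence)  # first/second amplitude elements (assumed weight)
    b = math.sqrt(coherence)        # share of the mixing noise in each adder (assumed)
    ch1: List[float] = []
    ch2: List[float] = []
    for _ in range(n):
        n1 = rng.gauss(0.0, 1.0)    # first audio source
        n2 = rng.gauss(0.0, 1.0)    # second audio source
        nm = rng.gauss(0.0, 1.0)    # mixing noise source
        ch1.append(a * n1 + b * nm)  # first adder
        ch2.append(a * n2 + b * nm)  # second adder
    return ch1, ch2


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation, used here only to check the mixing result."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / math.sqrt(vx * vy)
```

With c = 1 both channels carry only the mixing noise and are identical. With c = 0 they are independent. Intermediate values give a measured correlation close to c for long signals.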

As shown in FIG. 2, side information 402 may make it possible to determine whether the current frame is an active frame 306 or an inactive frame 308. The elements of FIG. 2 relate to the processing of inactive frames 308. Any technique may be used to generate the output in active frames 306, which is therefore outside the scope of this document.

As shown in FIG. 2, several kinds of comfort noise data are obtained from bitstream 232. As explained above, the comfort noise data may include:
- coherence information 404 (data);
- parameters 401 and 403 (v_{m, ind} and v_{s, ind}) indicating the noise shape;
- and/or gains (g_{l, q} and g_{r, q}).

Stage 212-C may dequantize the quantized version c_ind of the coherence information 404 to obtain the dequantized coherence information c_q.

Stage 2120 (joint noise shape dequantization) may dequantize the other comfort noise data obtained from bitstream 232; see FIG. 6. The dequantization stage 212 is formed by further dequantization stages, indicated herein as 212-M, 212-S, 212-R and 212-L. Stage 212-M may dequantize the mid-channel noise shape parameters 401 (v_{m, ind}) to obtain the dequantized noise shape parameters v_{m, q}. Stage 212-S may provide a dequantized version v_{s, q} of the side-channel noise shape parameters 403 (v_{s, ind}).

In some examples, the no-side flag may be used to zero the output of stage 212-S. This applies when block 435 in encoder 300a found the energy of the noise shape vector v_s to be less than the predetermined threshold α. In that case, as signaled by the no-side flag, the dequantized version v_{s, q} of the noise shape vector v_s may be set to zero. FIG. 6 shows this conceptually as a multiplication by flag 536', obtained from block 536. Block 536 has the same function as encoder block 436, except that it simply reads the no-side flag encoded in the side information of bitstream 232 and performs no comparison with the threshold α.

Hence, if the encoder determined the side-channel energy to be less than α, the dequantized version v_{s, q} is artificially zeroed, and the value at output 537'' of scaling block 537 is zero. Otherwise, if the energy was greater than the threshold, output 537'' equals the dequantized version v_{s, q} of the side-channel noise shape indices 403 (v_{s, ind}). In other words, the values of the noise shape vector v_{s, ind} are disregarded if the side-channel energy is below the energy threshold α.

The M/S-L/R stage 516 performs an M/S-L/R transformation to obtain the L/R version v'_l, v'_r of the parametric data (noise shape). An amplification stage 518, formed by stages 518-L and 518-R, may then be used: in stage 518-L the channel v'_l is scaled by the gain g_{l, q}, and in stage 518-R the channel v'_r is scaled by the gain g_{r, q}. The channels v_{l, q} and v_{r, q} are thus obtained as the output of amplification stage 518.

Blocks 518-L and 518-R are drawn with a "+" because the values are assumed to be conveyed in the logarithmic domain, where scaling amounts to a summation. Either way, amplification stage 518 indicates that the reconstructed noise shape vectors v_{l, q} and v_{r, q} are scaled. These vectors are collectively indicated here as 2312. They are a reconstructed version of the noise shape 1312 originally obtained by the "get noise shape" block 312 in the encoder. Each gain is constant across all indices (coefficients) of the same channel of the same inactive frame.
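Stages 212-S (with block 537), 516 and 518 can be chained into a short decoder-side sketch. It makes two assumptions that the source does not specify. First, the noise shapes are log-domain values, so each gain is added, as the "+" in blocks 518-L and 518-R suggests. Second, the M/S convention is L = M + S, R = M - S. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def reconstruct_noise_shapes(
    v_m_q: Sequence[float],
    v_s_q: Sequence[float],
    no_side: bool,
    g_l_q: float,
    g_r_q: float,
) -> Tuple[List[float], List[float]]:
    """Return (v_l_q, v_r_q), i.e. the reconstructed noise shapes 2312 (log domain assumed)."""
    # Stage 212-S / block 537: drop the side shape when the no-side flag is set.
    keep = 0.0 if no_side else 1.0
    v_s_used = [keep * s for s in v_s_q]
    # Stage 516: M/S -> L/R under the assumed convention L = M + S, R = M - S.
    v_l = [m + s for m, s in zip(v_m_q, v_s_used)]
    v_r = [m - s for m, s in zip(v_m_q, v_s_used)]
    # Stage 518: one log-domain gain per channel, added to every bin of that channel.
    return [x + g_l_q for x in v_l], [x + g_r_q for x in v_r]
```

For example, with v_{m, q} = [2.0, 1.0], v_{s, q} = [1.0, 0.5], no-side = 0, g_{l, q} = 0.5 and g_{r, q} = -0.5, the sketch yields v_{l, q} = [3.5, 2.0] and v_{r, q} = [0.5, 0.0].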

Note that the indices v_{m, ind}, v_{s, ind} and the gains g_{l, q}, g_{r, q} are noise shape coefficients and provide information about the frame energy. They are essentially parametric data associated with input signal 304 and used to generate signal 252. They do not represent signal 304 itself, nor the signal 252 to be generated. In other words, the noise channels v_{l, q} and v_{r, q} describe an envelope to be applied to the multichannel signal 204 generated by the CNG 220.

Returning to FIG. 2, the reconstructed noise shape vectors v_{l, q} and v_{r, q} (2312) are used in signal modification module 250 to shape the noise 204 and thereby obtain the modified signal 252. The first channel 201 of the generated noise 204 may be shaped by channel v_{l, q} in stage 250-L. The second channel 203 of the generated noise 204 may be shaped by channel v_{r, q} in stage 250-R. Together they form the multichannel audio output signal 252 (L_out and R_out).

In the examples, the comfort noise signal 204 itself is not generated in the logarithmic domain: only the noise shapes may use a logarithmic representation. A conversion from the logarithmic domain to the linear domain may be performed (not shown).

Likewise, a conversion from the frequency domain to the time domain may be performed (not shown).

The decoder 200'' (200a, 200b) may also include a spectral-time converter (e.g., the signal modification module 250) for converting the resulting first channel 201 and the resulting second channel 203, after spectral and coherence adjustment, into corresponding time-domain representations. These are to be combined or concatenated with the time-domain representations of the corresponding channels of the decoded multi-channel signal for the active frames. This conversion of the generated comfort noise into a time-domain signal takes place after the signal modification block 250 in FIG. 2. "Combination or concatenation" essentially means that active frames (processed along the other path in FIG. 1) may precede or follow an inactive frame that uses one of these CNG techniques, and the frames must be concatenated correctly to produce a continuous output without interruptions, audible clicks, or similar artifacts.

In some examples:

- the encoded audio signal (232) for the active frame (306) has a first set of coefficients describing a first number of frequency bins; and

- the encoded audio signal (232) for the inactive frame (308) has a second set of coefficients describing a second number of frequency bins.

The first number of frequency bins may be greater than the second number of frequency bins.

Any of the example decoders may be controlled by a suitable controller.

Processing steps: first version

The noise parameters encoded in the two SID frames for the two channels are computed as specified in EVS [6], for example according to LP-CNG, FD-CNG, or both. The generation of the noise energy in the decoder is likewise the same as in EVS, for example according to LP-CNG, FD-CNG, or both.

In the encoder, the coherence of the two channels is additionally computed, uniformly quantized with four bits, and sent in the bitstream 232. At the decoder, the CNG operation can then be controlled by the transmitted coherence value 404. Three Gaussian noise sources N₁, N₂, N₃ (211a, 212a, 213a; 211b, 212b, 213b; 211c, 212c, 213c; 211d, 212d, 213d; 211e, 212e, 213e) may be used, as shown in FIGS. 3a-3f. When the channel coherence is high, mostly correlated noise may be added to both channels 221' and 223', whereas more decorrelated noise is added when the coherence 404 is low.

For all inactive frames 308, the parameters for generating comfort noise (noise parameters) may be continuously estimated in the encoder (e.g., 300, 300a, 300b). This may be done, for example, by applying a frequency-domain noise estimation algorithm (e.g., [8]), as described in [6], separately to both input channels (e.g., 301, 303), to compute two sets of noise parameters (e.g., 401, 403), also referred to as parametric noise data. In addition, the coherence (c, 404) of the two channels may be computed (e.g., in the coherence calculation module 320) as follows. Given the M-point DFT spectra L[k] and R[k] of the two input channels (e.g., 301, 303), four intermediate values may be computed, for example:

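$$c_r = \sum_{k=0}^{M-1} \operatorname{Re}\{L[k]\,R^*[k]\}, \qquad c_i = \sum_{k=0}^{M-1} \operatorname{Im}\{L[k]\,R^*[k]\},$$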

and the energies of the two channels:

$$E_l = \sum_{k=0}^{M-1} |L[k]|^2, \qquad E_r = \sum_{k=0}^{M-1} |R[k]|^2.$$

Here, M may be 256, Re{·} denotes the real part of a complex number, Im{·} denotes the imaginary part of a complex number, and * denotes complex conjugation. These intermediate values may then be smoothed, for example, using the corresponding values from the previous frame:

$$X_{\text{smooth}} = 0.95\,X_{\text{smooth}}^{[-1]} + 0.05\,X, \qquad X \in \{c_r, c_i, E_l, E_r\},$$

where X_smooth^{[-1]} denotes the smoothed value from the previous frame.

This step may be part of the "channel coherence calculation" block 320'' in the encoder. It performs a temporal smoothing of the internal parameters to prevent large, sudden jumps in the parameters between frames. In other words, a low-pass filter is applied to the parameters here.

Instead of the constants 0.95 and 0.05, other constants in the ranges 0.95 ± 0.03 and 0.05 ± 0.03 may be used.

Alternatively, one may define:

$$X_{\text{smooth}} = \beta\,X_{\text{smooth}}^{[-1]} + \gamma\,X,$$

where X ∈ {c_r, c_i, E_l, E_r} and β + γ = 1, for example β = 0.95 and γ = 0.05.
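For illustration only, the computation of the four intermediate values and their recursive smoothing may be sketched in Python as follows. This is a minimal sketch, not the codec implementation: it assumes the DFT spectra are given as Python lists of complex bins, and the names `ChannelStats`, `frame_stats` and `smooth`, as well as the all-zero initial smoothing state used in the usage lines, are hypothetical (the initialization is not specified here).

```python
from dataclasses import dataclass


@dataclass
class ChannelStats:
    """Intermediate values for the inter-channel coherence of one frame."""
    c_r: float  # sum over bins of Re{L[k] * conj(R[k])}
    c_i: float  # sum over bins of Im{L[k] * conj(R[k])}
    e_l: float  # energy of the left spectrum
    e_r: float  # energy of the right spectrum


def frame_stats(left, right):
    """Compute the four intermediate values from M-point DFT spectra."""
    if len(left) != len(right):
        raise ValueError("left and right spectra must both have M bins")
    cross = [l * r.conjugate() for l, r in zip(left, right)]
    return ChannelStats(
        c_r=sum(x.real for x in cross),
        c_i=sum(x.imag for x in cross),
        e_l=sum(abs(l) ** 2 for l in left),
        e_r=sum(abs(r) ** 2 for r in right),
    )


def smooth(prev, cur, beta=0.95, gamma=0.05):
    """One-pole low-pass smoothing: X_smooth = beta * X_prev + gamma * X."""
    return ChannelStats(
        c_r=beta * prev.c_r + gamma * cur.c_r,
        c_i=beta * prev.c_i + gamma * cur.c_i,
        e_l=beta * prev.e_l + gamma * cur.e_l,
        e_r=beta * prev.e_r + gamma * cur.e_r,
    )


# Usage: identical spectra in both channels, smoothing from an all-zero state.
spec = [1 + 0j, 0 + 1j]
stats = frame_stats(spec, spec)
state = smooth(ChannelStats(0.0, 0.0, 0.0, 0.0), stats)
```

With β = 0.95 and γ = 0.05, a sudden change in the per-frame values reaches the smoothed values only gradually, which is the low-pass behavior described above.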

The coherence (c, 404), which lies between 0 and 1, may then be computed (e.g., in the coherence calculation module 320) as follows:

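$$c = \frac{\sqrt{c_{r,\text{smooth}}^2 + c_{i,\text{smooth}}^2}}{\sqrt{E_{l,\text{smooth}}\,E_{r,\text{smooth}}}},$$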

and uniformly quantized (e.g., in the quantizer 320''), for example with four bits, as follows:

$$c_q = \lfloor 15\,c + 0.5 \rfloor.$$
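The coherence computation and its four-bit uniform quantization may be sketched as follows. The sketch assumes that the 16 indices map uniformly onto [0, 1] with a step of 1/15 and that the decoder inverts the mapping by dividing by 15; the function names and the treatment of a zero-energy channel are hypothetical choices, not taken from the source.

```python
import math


def coherence(c_r, c_i, e_l, e_r):
    """Coherence c in [0, 1] from (smoothed) cross terms and channel energies."""
    denom = math.sqrt(e_l * e_r)
    if denom == 0.0:
        return 0.0  # assumption: a silent channel is treated as uncorrelated
    # Cauchy-Schwarz bounds the ratio by 1; min() only guards rounding error.
    return min(1.0, math.hypot(c_r, c_i) / denom)


def quantize_coherence(c, bits=4):
    """Uniform round-to-nearest quantization to an index in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1  # 15 for four bits
    return int(math.floor(levels * c + 0.5))


def dequantize_coherence(index, bits=4):
    """Inverse mapping of quantize_coherence back to [0, 1]."""
    return index / ((1 << bits) - 1)
```

With round-to-nearest, the reconstruction error is at most half a quantization step, i.e., 1/30.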

The estimated noise parameters 1312, 2312 of the two channels may be encoded separately, for example as specified in [6]. The two SID frames 241, 243 may then be encoded and sent to the decoder. The first SID frame 241 may contain the estimated noise parameters 401 of the L channel and (e.g., four) bits of side information 402, for example as described in [6]. In the second SID frame 243, the noise parameters 403 of the R channel may be sent together with the four-bit quantized coherence value c (404); other numbers of bits may be chosen in other examples.

At the decoder (e.g., 200'', 200a, 200b), both the noise parameters (401, 403) of the SID frames and the side information 402 of the first frame may be decoded, for example as described in [6]. The coherence value 404 in the second frame may be dequantized in stage 212-C as follows:

$$c_{deq} = \frac{c_q}{15}$$

(In FIG. 2, c_{deq} is denoted by c_q.)

To generate the comfort noise (e.g., in the generator 220 or in any of the generators 220a-220e, which may include the generator of any of FIGS. 3a-3e), according to an example, three Gaussian noise sources 211, 212, 213 may be used, as shown in FIG. 3. The noise sources 211, 212, 213 may be adaptively summed (e.g., in the adder stages 206-1 and 206-3), for example based on the coherence value (c, 404). The DFT spectra N_l[k], N_r[k] of the left and right channel noise signals may be computed as follows:

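$$N_l[k] = \sqrt{c_{deq}} \times \left(N_1[2k] + j\,N_1[2k+1]\right) + \sqrt{1 - c_{deq}} \times \left(N_2[2k] + j\,N_2[2k+1]\right),$$
$$N_r[k] = \sqrt{c_{deq}} \times \left(N_1[2k] + j\,N_1[2k+1]\right) + \sqrt{1 - c_{deq}} \times \left(N_3[2k] + j\,N_3[2k+1]\right),$$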

where k = 0, 1, ..., M − 1 is the index of a particular frequency bin (each channel has M frequency bins), j² = −1 (i.e., j is the imaginary unit), and "×" denotes ordinary multiplication. Each frequency bin corresponds to one complex value in the spectra N_l and N_r, respectively. M is the length of the FFT or DFT in use, so the spectra have length M. Note that the noise inserted into the real part and the noise inserted into the imaginary part may differ. Therefore, for a spectrum of length M, each noise source must generate 2×M values (one for the real part and one for the imaginary part of each bin). In other words, N_l and N_r are complex-valued vectors of length M, whereas N₁, N₂ and N₃ are real-valued vectors of length 2×M.
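The three-source mixing may be illustrated with the following minimal sketch. It assumes unit-variance Gaussian sources and square-root weights √c and √(1 − c) on the shared and the channel-specific sources; with these weights, the expected normalized cross-correlation between N_l and N_r equals c. The function name `mix_noise` and the seeded `random.Random` generator are illustrative assumptions, not the codec's noise generator.

```python
import math
import random


def mix_noise(c, m, rng=None):
    """Return complex spectra (N_l, N_r) of length m for a coherence c in [0, 1]."""
    if rng is None:
        rng = random.Random()
    c = min(1.0, max(0.0, c))
    a, b = math.sqrt(c), math.sqrt(1.0 - c)
    # Each source yields 2*m real samples: one real and one imaginary part per bin.
    n1 = [rng.gauss(0.0, 1.0) for _ in range(2 * m)]  # shared (mixing) noise
    n2 = [rng.gauss(0.0, 1.0) for _ in range(2 * m)]  # left-only noise
    n3 = [rng.gauss(0.0, 1.0) for _ in range(2 * m)]  # right-only noise
    n_l, n_r = [], []
    for k in range(m):
        common = complex(n1[2 * k], n1[2 * k + 1])
        n_l.append(a * common + b * complex(n2[2 * k], n2[2 * k + 1]))
        n_r.append(a * common + b * complex(n3[2 * k], n3[2 * k + 1]))
    return n_l, n_r
```

For c = 1 both channels receive only the shared source and are identical; for c = 0 they are built from independent sources.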

Subsequently, the noise signal 204 in the two channels is spectrally shaped (e.g., in stages 250-L, 250-R in FIG. 2) using the corresponding noise parameters (2312) decoded from the corresponding SID frame, and then converted back to the time domain (e.g., as described in [6]) to produce the frequency-domain comfort noise.

Any of the example processing steps may be performed by a suitable controller.

Processing steps: second version

Aspects of the processing steps explained above may be combined with at least one of the following aspects. Reference is made here mainly to FIGS. 2 and 5, but also to FIG. 4.

A block diagram of the overall encoder framework is illustrated in FIG. 1. For each frame in the encoder, the current signal may be classified as active or inactive by running VAD separately on each channel, as described in [6]. The VAD decision may then be synchronized between the two channels. In the examples, a frame is classified as an inactive frame 308 only if both channels are classified as inactive. Otherwise, it is classified as active, and both channels are jointly encoded in an MDCT-based system using band-wise M/S, as described in [10]. When switching from an active frame to an inactive frame, the signals may enter the SID encoding path, as shown in FIG. 3.
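The synchronized frame classification can be sketched as follows. The per-channel VAD decisions are taken as given boolean inputs (the VAD itself, described in [6], is not shown), and the function names `classify_frames` and `sid_entry_points` are hypothetical.

```python
def classify_frames(vad_left, vad_right):
    """Return 'active' / 'inactive' labels for a sequence of frames."""
    labels = []
    for l_active, r_active in zip(vad_left, vad_right):
        # A frame is inactive only when both per-channel VAD decisions are inactive.
        labels.append("inactive" if not (l_active or r_active) else "active")
    return labels


def sid_entry_points(labels):
    """Indices of frames where the signal switches from active to inactive."""
    return [i for i in range(1, len(labels))
            if labels[i - 1] == "active" and labels[i] == "inactive"]
```

Requiring both channels to be inactive means that speech in either channel keeps the frame on the active (MDCT) path.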

The parameters (e.g., 1312, 401, 403, g_{l,q}, g_{r,q}) for generating comfort noise (noise parameters) may be continuously estimated in the encoder (e.g., 300, 300a, 300b) for both active and inactive frames (306, 308). This may be done, for example, by applying a frequency-domain noise estimation process, such as the one explained in [8] and/or described in [6], separately to both input channels 301, 303, to compute two sets of noise parameters that include the spectral noise shapes (e.g., v_l and v_r; cf. 401 and/or 403), for example in the logarithmic domain for each channel.

In addition, the coherence (404, c) of the two channels may be computed (e.g., in the coherence calculation module 320) as follows. Given the M-point DFT spectra L[k], R[k] of the two input channels, four intermediate values may be computed as:

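$$c_r = \sum_{k=0}^{M-1} \operatorname{Re}\{L[k]\,R^*[k]\}, \qquad c_i = \sum_{k=0}^{M-1} \operatorname{Im}\{L[k]\,R^*[k]\},$$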

and the energies of the two channels:

$$E_l = \sum_{k=0}^{M-1} |L[k]|^2, \qquad E_r = \sum_{k=0}^{M-1} |R[k]|^2.$$

Here, M may be 256 (other values of M may be used), Re{·} denotes the real part of a complex number, Im{·} denotes the imaginary part of a complex number, and * denotes complex conjugation. These intermediate values are then smoothed on the basis of 10 ms subframes. If X_smooth^{[-1]} denotes the corresponding smoothed value from the previous subframe, the smoothed values may be computed as follows:

$$X_{\text{smooth}} = \beta\,X_{\text{smooth}}^{[-1]} + \gamma\,X,$$

where X ∈ {c_r, c_i, E_l, E_r} and β + γ = 1, for example β = 0.95 and γ = 0.05 (β > γ, e.g., β > 3γ or β > 6γ).

The coherence may then be computed (e.g., in 320'') as follows:

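$$c = \frac{\sqrt{c_{r,\text{smooth}}^2 + c_{i,\text{smooth}}^2}}{\sqrt{E_{l,\text{smooth}}\,E_{r,\text{smooth}}}},$$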

and uniformly quantized (e.g., in 320'') with four bits (other numbers of bits are possible) as follows:

$$c_q = \lfloor 15\,c + 0.5 \rfloor,$$

where ⌊·⌋ denotes rounding down to the nearest integer (the floor function).

The estimated noise shapes of both channels may be encoded jointly. Other channels may be derived from the left (v_l) and right (v_r) channel noise shapes (e.g., through a linear combination); for example, a mid-channel noise shape v_m and a side-channel noise shape v_s may be computed (e.g., in block 314) as follows:

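$$v_m[n] = \frac{v_l[n] + v_r[n]}{2}, \qquad v_s[n] = \frac{v_l[n] - v_r[n]}{2}, \qquad n = 0, \dots, N-1,$$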

where N denotes the length of the noise shape vectors (e.g., for each inactive frame 308), for example in the frequency domain. For noise shape vectors estimated as specified in EVS [6], N may lie between 17 and 24. The noise shape vectors can be regarded as a more compact representation of the spectral envelope of the noise in the input frame or, more abstractly, as a parametric spectral description of the noise signal using N parameters. N is not related to the length of the FFT or DFT.
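The M/S conversion of the noise shapes and its inverse can be sketched as follows, assuming the normalization v_m = (v_l + v_r)/2 and v_s = (v_l − v_r)/2, so that v_l = v_m + v_s and v_r = v_m − v_s. The function names are hypothetical.

```python
def to_mid_side(v_l, v_r):
    """Mid/side noise shapes from left/right shapes (1/2 normalization assumed)."""
    if len(v_l) != len(v_r):
        raise ValueError("noise shapes must have the same length N")
    v_m = [(l + r) / 2.0 for l, r in zip(v_l, v_r)]
    v_s = [(l - r) / 2.0 for l, r in zip(v_l, v_r)]
    return v_m, v_s


def from_mid_side(v_m, v_s):
    """Inverse M/S transform: v_l = v_m + v_s, v_r = v_m - v_s."""
    v_l = [m + s for m, s in zip(v_m, v_s)]
    v_r = [m - s for m, s in zip(v_m, v_s)]
    return v_l, v_r
```

If the left and right shapes are identical (a centered noise source), v_s consists only of zeros.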

These noise shapes may then be normalized (e.g., in stage 316) and/or quantized. For example, they may be vector-quantized (e.g., in stage 318) using multi-stage vector quantizers (MSVQ); an example is described in [6, p. 442].

The MSVQ used in stage 318 to quantize the shape v_m (to obtain v_{m,ind}, 401) may have 6 stages (other numbers of stages are possible) and/or use 37 bits (other numbers of bits are possible), for example as implemented for mono channels in [6]. The MSVQ used in stage 318 to quantize the shape v_s (to obtain v_{s,ind}, 403) may instead be reduced to 4 stages (or, in any case, to fewer stages than are used for v_m) and/or may use a total of 25 bits (or, in any case, fewer bits than are used for v_m).

The MSVQ codebook indices may be carried in the bitstream (e.g., in the data 232 and, more specifically, in the comfort noise parameter data 401, 403). The indices are then dequantized, yielding the dequantized noise shapes v_{m,q} and v_{s,q}.
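The principle of multi-stage vector quantization can be illustrated with the sketch below. It uses a simple greedy (sequential) search in which each stage quantizes the residual left by the previous stages; the search actually used in [6] may differ, and the two-stage, two-dimensional `TOY_CODEBOOKS` are hypothetical placeholders, not the EVS tables.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def msvq_encode(x, codebooks):
    """Greedy multi-stage VQ: each stage quantizes the previous stage's residual."""
    residual = list(x)
    indices = []
    for cb in codebooks:
        dists = [_sq_dist(residual, cv) for cv in cb]
        best = dists.index(min(dists))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, cb[best])]
    return indices


def msvq_decode(indices, codebooks):
    """Reconstruction is the sum of the selected code vectors of all stages."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Hypothetical two-stage, two-dimensional toy codebooks (not from [6]).
TOY_CODEBOOKS = [
    [[0.0, 0.0], [4.0, 4.0], [-4.0, -4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],
]
```

Only the per-stage indices are transmitted; fewer stages (as suggested for v_s) means fewer indices and therefore fewer bits.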

If the background noise is a single noise source in the center of the stereo image, the estimated noise shapes v_l, v_r of the two channels are expected to be nearly equal or even identical. The resulting S-channel noise shape v_s should then contain only zeros. However, the vector quantizer (stage 322) used to quantize v_s in the current implementation may be unable to represent an all-zero vector, so that after dequantization the dequantized noise shape v_{s,q} may no longer be all zeros. This can lead to perceptual problems when reproducing such centered background noise. To work around this shortcoming of the VQ 322, a no_side value (no_side flag) may be computed (and may also be signaled in the bitstream) depending on the energy of the unquantized shape vector v_s (e.g., the energy of the noise shape vector v_s after stage 314 and/or before stage 316). The no_side flag may be determined as follows:

$$\text{no\_side} = \begin{cases} 1, & E_s < \alpha, \\ 0, & \text{otherwise}, \end{cases}$$

where E_s denotes the energy of the unquantized side noise shape v_s.

The energy threshold α may be, purely as an example, 0.1 or another value in the interval [0.05, 0.15]. However, the threshold α may be chosen freely, and in an implementation it may depend on the number format used (e.g., fixed-point or floating-point) and/or on any signal normalizations applied. In the examples, any positive real value may be used, depending on how strict the definition of a "silent" S channel is; α may therefore lie in the interval (0, 1).
The no_side value may be used to indicate whether the noise shape v_s should be used to reconstruct the channel noise shapes v_l and v_r (e.g., in a decoder). If no_side is 1, the dequantized shape v_{s,q} is set to zero (e.g., by scaling v_{s,q} by the value 436'' in FIG. 2, which is the logical value NOT(no_side)); no_side is transmitted (signaled) in the bitstream 232, for example as side information 402. An inverse M/S transform (e.g., stage 324) may then be applied to the dequantized noise shape vectors v_{m,q} and v_{s,q} (the latter being replaced by 0 when its energy is low, as indicated by 437'' in FIG. 2) to obtain the intermediate vectors v'_l and v'_r as follows:

$$v'_l[n] = v_{m,q}[n] + v_{s,q}[n], \qquad v'_r[n] = v_{m,q}[n] - v_{s,q}[n], \qquad n = 0, \dots, N-1.$$
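The no_side decision and the inverse M/S step can be sketched as follows. The energy is assumed here to be the plain sum of squares of v_s, and α = 0.1 is the example value given above; both the energy definition and the function names are assumptions for illustration.

```python
def compute_no_side(v_s, alpha=0.1):
    """Return 1 if the unquantized side shape is (nearly) silent, else 0."""
    energy = sum(x * x for x in v_s)  # assumption: plain sum of squares
    return 1 if energy < alpha else 0


def intermediate_vectors(v_m_q, v_s_q, no_side):
    """Inverse M/S on dequantized shapes, zeroing the side shape if no_side == 1."""
    side = [0.0] * len(v_s_q) if no_side else list(v_s_q)
    v_l_prime = [m + s for m, s in zip(v_m_q, side)]
    v_r_prime = [m - s for m, s in zip(v_m_q, side)]
    return v_l_prime, v_r_prime
```

When no_side is 1, the small nonzero side shape that the quantizer would otherwise reproduce is discarded, so both intermediate vectors equal the mid shape.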

Using these intermediate vectors v'_l and v'_r and the unquantized noise shape vectors v_l and v_r, two gain values are computed as follows:

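$$g_l = \frac{1}{N}\sum_{n=0}^{N-1}\left(v_l[n] - v'_l[n]\right), \qquad g_r = \frac{1}{N}\sum_{n=0}^{N-1}\left(v_r[n] - v'_r[n]\right).$$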
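A minimal sketch of the gain computation, assuming each gain is the mean log-domain difference between the unquantized shape and its intermediate reconstruction. The mean is the constant offset that minimizes the squared error between v_l and v'_l + g_l, which matches adding a single gain to every element at the decoder. The function name is hypothetical.

```python
def shape_gains(v_l, v_r, v_l_prime, v_r_prime):
    """Per-channel gain as the mean log-domain difference between the
    unquantized noise shape and its intermediate reconstruction."""
    n = len(v_l)
    g_l = sum(a - b for a, b in zip(v_l, v_l_prime)) / n
    g_r = sum(a - b for a, b in zip(v_r, v_r_prime)) / n
    return g_l, g_r
```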

The two gain values may then be linearly quantized (e.g., in stage 328) into the integer indices g_{l,q} and g_{r,q}; other quantizations are possible.

The quantized gains may be encoded in the SID bitstream (e.g., as part of the comfort noise parameter data 401 or 403; more specifically, g_{l,q} may be part of the first parametric noise data and g_{r,q} may be part of the second parametric noise data), for example using seven bits for the gain value g_{l,q} and/or seven bits for the gain value g_{r,q} (other bit allocations are also possible for each gain value).

At the decoder (e.g., 200'', 200a, 200b), the quantized noise shape vectors (e.g., part of the comfort noise parameter data 401 or 403 and, more specifically, of the first and second parametric noise data) may be dequantized, for example, in stage 212 (specifically, in either of the partial stages 212-M, 212-S).

The gain values may be dequantized, for example, in stage 212 (specifically, in either of the partial stages 212-L, 212-R), yielding g_{l,deq} and g_{r,deq}. The dequantization uses an offset value of 45, which depends on the quantization and may differ for other quantizations. (In FIG. 2, g_{l,d} and g_{r,d} are used instead of g_{l,deq} and g_{r,deq}.)

The coherence value 404 may be dequantized (e.g., in stage 212-C) as follows:

$$c_{deq} = \frac{c_q}{15}.$$

If the no_side flag (in the side information 402) is 1, the dequantized side noise shape v_{s,q} is set to zero (value 537'') before the intermediate vectors v'_l and v'_r are computed (e.g., in stage 516). The corresponding gain value is then added to every element of the corresponding intermediate vector to form the dequantized noise shapes v_{l,q} and v_{r,q}, collectively indicated at 522, as follows:

$$v_{l,q}[n] = v'_l[n] + g_{l,deq}, \qquad v_{r,q}[n] = v'_r[n] + g_{r,deq}, \qquad n = 0, \dots, N-1.$$

(The gain is added because the shapes are in the logarithmic domain. An addition there corresponds to multiplication by a factor in the linear domain.)
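The reconstruction step can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the codec's actual routine. The text does not give the mapping from the dequantized M/S shapes to the intermediate vectors, so the sketch assumes the plain M/S inverse v'_l = v_m + v_s and v'_r = v_m - v_s. The function name `reconstruct_noise_shapes` is hypothetical. The final lines use a base-2 logarithm, which is an arbitrary choice, to show that adding a gain in the log domain equals multiplying in the linear domain.

```python
import numpy as np


def reconstruct_noise_shapes(v_m_q, v_s_q, g_l_deq, g_r_deq, no_side):
    """Hypothetical decoder step: dequantized M/S shapes -> log-domain L/R shapes.

    Assumption (not stated in the source): v'_l = v_m + v_s and v'_r = v_m - v_s.
    """
    v_m = np.asarray(v_m_q, dtype=float)
    # When no_side == 1, the side shape is replaced by an all-zero vector.
    v_s = np.zeros_like(v_m) if no_side else np.asarray(v_s_q, dtype=float)
    v_l_prime = v_m + v_s  # intermediate left vector (assumed mapping)
    v_r_prime = v_m - v_s  # intermediate right vector (assumed mapping)
    # The scalar gain is added to every element (log domain).
    return v_l_prime + g_l_deq, v_r_prime + g_r_deq


# A log-domain offset of g equals a linear-domain factor of 2**g when the
# shapes are read as base-2 logarithms (an illustrative choice of base).
v_l, v_r = reconstruct_noise_shapes([1.0, 2.0], [0.5, -0.5], 3.0, -1.0, no_side=False)
assert np.allclose(np.exp2(v_l), np.exp2([1.5, 1.5]) * np.exp2(3.0))
```

The no_side branch substitutes zeros for the side shape. It does not skip the M/S step, so the same code path serves both cases.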

To generate the comfort noise, three Gaussian noise sources N₁, N₂, N₃ may be used, as shown in any of Figs. 3a-3f. Examples are 211a, 212a, 213a in Fig. 3a and 211b, 212b, 212c in Fig. 3b. Any of the other techniques may be used instead. When the channel coherence is high, mostly correlated noise is added to both channels. When the coherence is low, more decorrelated noise is added.

Using the three noise sources, the DFT spectra of the left and right channel noise signals N_l (201) and N_r (203) may be calculated as follows:

$$N_l[k] = \sqrt{1-c_q}\,\big(N_1[k] + j\,N_1[k+M]\big) + \sqrt{c_q}\,\big(N_2[k] + j\,N_2[k+M]\big),$$
$$N_r[k] = \sqrt{1-c_q}\,\big(N_3[k] + j\,N_3[k+M]\big) + \sqrt{c_q}\,\big(N_2[k] + j\,N_2[k+M]\big),$$

The symbols are as follows. k = 0, 1, …, M−1 is the frequency bin index. c_q is the coherence value. j² = −1. M denotes the length of the DFT block. To generate independent noise in the real and imaginary parts of the complex spectrum, each noise source must generate 2M values per frame, two for each frequency bin. N₁, N₂ and N₃ (at 211, 212 and 213, respectively, in Fig. 3f) can therefore be regarded as real-valued noise vectors of length 2M. N_l and N_r (at 201 and 203, respectively) are complex-valued vectors of length M.
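The three-source mixing can be sketched in Python with NumPy. In this sketch the left spectrum is sqrt(1 - c_q)·N₁ + sqrt(c_q)·N₂ and the right spectrum is sqrt(1 - c_q)·N₃ + sqrt(c_q)·N₂. These weights follow claim 14 below, with N₂ as the common mixing source. Each source draws 2M real Gaussian values. Element k feeds the real part of bin k and element k+M feeds its imaginary part, as in claim 13. The function names, the seeds and the use of `numpy.random.default_rng` are illustrative assumptions, not details taken from the text.

```python
import numpy as np


def stereo_noise_spectra(c_q, M, seeds=(11, 22, 33)):
    """Mix three independent Gaussian sources into left/right complex spectra.

    N_1 and N_3 are channel-specific sources; N_2 is the common (mixing) source.
    """
    if not 0.0 <= c_q <= 1.0:
        raise ValueError("coherence must lie in [0, 1]")
    # One independent generator per source, each initialised with a different seed.
    n1, n2, n3 = (np.random.default_rng(s).standard_normal(2 * M) for s in seeds)

    def to_complex(n):
        # Element k -> real part, element k + M -> imaginary part of bin k.
        return n[:M] + 1j * n[M:]

    w_own = np.sqrt(1.0 - c_q)  # weight of the channel-specific source
    w_common = np.sqrt(c_q)     # weight of the common source
    N_l = w_own * to_complex(n1) + w_common * to_complex(n2)
    N_r = w_own * to_complex(n3) + w_common * to_complex(n2)
    return N_l, N_r


def normalized_correlation(a, b):
    """Real part of the normalised inner product of two complex spectra."""
    return float(np.real(np.vdot(a, b)) / np.sqrt(np.vdot(a, a).real * np.vdot(b, b).real))
```

With c_q = 1 the two spectra are identical. With c_q = 0 they share no noise. For long frames, `normalized_correlation(N_l, N_r)` comes out close to c_q, which is the intended behaviour of steering the correlation with the transmitted coherence.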

Next, the noise signals in the two channels may be spectrally shaped (e.g., in the signal modification module 252). Each channel uses its respective noise shape (v_{l,q} or v_{r,q}), decoded from the bitstream 232. The shaped signals are then converted back from the logarithmic domain to the linear domain and from the frequency domain to the time domain, for example as described in [6], to generate a stereo comfort noise signal.
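The shaping and back-transformation for one channel can be sketched as below. The sketch rests on several assumptions that the text does not specify:

- the shape holds one base-2 log-amplitude value per frequency bin, whereas a real codec would typically interpolate band values to bins;
- the M bins are the non-negative-frequency half of the DFT of a real signal, so `numpy.fft.irfft` applies;
- the windowing and overlap-add described in [6] are omitted.

The function name `shape_and_synthesize` is hypothetical.

```python
import numpy as np


def shape_and_synthesize(N, v_q, log_base=2.0):
    """Apply a log-domain noise shape to a complex noise spectrum, return one time frame.

    Assumptions: one log-amplitude value per bin, base `log_base`; N holds the
    non-negative-frequency half of a real signal's DFT; no windowing/overlap-add.
    """
    N = np.asarray(N, dtype=complex)
    v_q = np.asarray(v_q, dtype=float)
    if N.shape != v_q.shape:
        raise ValueError("expected one shape value per frequency bin")
    gain = np.power(log_base, v_q)  # log domain -> linear amplitude
    shaped = N * gain               # spectral shaping
    # Frequency domain -> time domain for a real-valued frame of length 2(M - 1).
    return np.fft.irfft(shaped, n=2 * (len(N) - 1))
```

For example, a flat spectrum of ones with a zero shape gives a unit impulse, and a shape of all ones (base 2) doubles it.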

Some advantages

The present invention may provide a technique for generating stereo comfort noise that is particularly suitable for discrete stereo coding schemes. The noise shape parameters of both channels are jointly encoded and transmitted, so stereo CNG can be applied without a mono downmix.

The decoder receives two separate sets of noise parameters and mixes one common and two separate noise sources under the control of a single coherence value. Together, these allow the stereo image of the background noise to be reconstructed faithfully. There is no need to transmit the highly detailed stereo parameters that are typically found only in parametric audio coders. Because only this single additional parameter is transmitted, SID encoding stays simple and needs no complex compression techniques, and the SID frame remains small.

Some important aspects:

In some examples, at least one of the following aspects is achieved:

1. Generating comfort noise for a stereo signal by mixing three Gaussian noise sources: one for each channel, and a third, common source that creates correlated background noise.

2. Controlling the mixing of the noise sources with a coherence value that is transmitted with the SID frame.

3. Transmitting separate noise shape parameters for both stereo channels by jointly coding the noise shapes in M/S fashion, and lowering the bit rate of SID frames by coding the S shape with fewer bits than the M shape.
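Item 3 can be illustrated with a small Python sketch. The following are illustrative assumptions, not values from this document:

- the mapping v_m = (v_l + v_r)/2 and v_s = (v_l - v_r)/2;
- a symmetric mid-tread uniform quantizer;
- a step size of 0.5 with 6 bits for each M value, and a step size of 1.0 with 3 bits for each S value.

The sketch only shows that a coarser, lower-bit S representation still lets the decoder rebuild both channel shapes. All function names are hypothetical.

```python
import numpy as np


def quantize(x, step, bits):
    """Symmetric mid-tread uniform quantizer returning integer indices."""
    max_idx = 2 ** (bits - 1) - 1
    return np.clip(np.round(np.asarray(x, dtype=float) / step), -max_idx, max_idx).astype(int)


def encode_ms_shapes(v_l, v_r, m_step=0.5, m_bits=6, s_step=1.0, s_bits=3):
    """Hypothetical SID-side M/S coding of two log-domain noise shapes."""
    v_l = np.asarray(v_l, dtype=float)
    v_r = np.asarray(v_r, dtype=float)
    v_m = 0.5 * (v_l + v_r)  # mid shape, coded with more bits
    v_s = 0.5 * (v_l - v_r)  # side shape, coded with fewer bits
    m_idx = quantize(v_m, m_step, m_bits)
    s_idx = quantize(v_s, s_step, s_bits)
    n_bits = m_idx.size * m_bits + s_idx.size * s_bits
    return m_idx, s_idx, n_bits


def decode_ms_shapes(m_idx, s_idx, m_step=0.5, s_step=1.0):
    """Inverse mapping (assumed): v_l = v_m + v_s, v_r = v_m - v_s."""
    v_m = np.asarray(m_idx) * m_step
    v_s = np.asarray(s_idx) * s_step
    return v_m + v_s, v_m - v_s
```

Encoding the two-element shapes [4.0, 2.0] and [2.0, 2.0] costs 18 bits here. Coding both shapes at 6 bits per value would cost 24 bits, and the decoder still recovers both shapes exactly in this case.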

Other techniques

It is also possible to implement a method for generating a multi-channel signal having a first channel and a second channel, the method comprising:

An audio encoding method may also be implemented for generating an encoded multi-channel audio signal for a sequence of frames comprising an active frame and an inactive frame, the method comprising:

The invention may also be implemented in a non-transitory storage unit storing instructions which, when executed by a computer (or processor or controller), cause the computer (or processor or controller) to carry out the above method.

The invention may also be implemented in an encoded multi-channel audio signal organized into a sequence of frames comprising an active frame and an inactive frame, the encoded multi-channel audio signal comprising:

- coherence data indicating the coherence between the first channel and the second channel in the inactive frame. The encoded multi-channel audio signal may be obtained using one of the techniques disclosed above and/or below.

Advantages of the embodiments

A noise source common to both channels is inserted to simulate correlated noise when the final comfort noise is generated. This plays an important role in simulating a stereo recording of background noise.

Embodiments of the invention may also be regarded as a procedure for generating comfort noise for a stereo signal in one or more of the following ways:

- mixing three Gaussian noise sources, namely one for each channel and a third, common source that creates correlated background noise;
- additionally or alternatively, controlling the mixing of the noise sources with a coherence value that is transmitted with the SID frame;
- additionally or alternatively, proceeding as follows.

In a stereo system, generating the background noise separately for each channel results in completely decorrelated noise. This noise sounds unpleasant and differs substantially from the actual background noise. It causes abrupt audible transitions when switching between the active-mode background and the DTX-mode background. In an embodiment, the encoder calculates the coherence of the two channels in addition to the noise parameters, quantizes it uniformly and adds it to the SID frame. The decoder then controls CNG operation with the transmitted coherence value. Three Gaussian noise sources N₁, N₂, N₃ are used. When the channel coherence is high, mostly correlated noise is added to both channels. When the coherence is low, more decorrelated noise is added.
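The encoder-side coherence step could look like the following Python sketch. The text defines neither the coherence measure nor its resolution, so both are assumptions here. The measure is the broadband magnitude of the normalised cross-spectrum of one frame. The quantizer is a 4-bit uniform quantizer over [0, 1]. The function names are hypothetical. `dequantize_coherence` stands for the decoder side and yields the value c_q that would drive the noise mixing.

```python
import numpy as np


def channel_coherence(x_l, x_r):
    """Assumed measure: |sum X_l * conj(X_r)| / sqrt(E_l * E_r) for one frame."""
    X_l = np.fft.rfft(np.asarray(x_l, dtype=float))
    X_r = np.fft.rfft(np.asarray(x_r, dtype=float))
    cross = np.abs(np.vdot(X_r, X_l))  # np.vdot conjugates its first argument
    energy = np.sqrt(np.vdot(X_l, X_l).real * np.vdot(X_r, X_r).real)
    return float(cross / energy) if energy > 0.0 else 0.0


def quantize_coherence(c, bits=4):
    """Uniform quantization of a value clipped to [0, 1] to an integer index."""
    levels = 2 ** bits - 1
    return int(round(min(max(c, 0.0), 1.0) * levels))


def dequantize_coherence(index, bits=4):
    """Decoder side: index -> c_q in [0, 1]."""
    return index / (2 ** bits - 1)
```

Identical channels give a coherence of 1 and index 15. Independent noise gives a value near 0, so the decoder would add mostly decorrelated noise.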

It should be noted here that all alternatives or aspects explained above, and all aspects defined by the independent claims below, can be used individually. That is, each can be used without any alternative or object other than the contemplated alternative, object or independent claim. In other embodiments, however, two or more of the alternatives, aspects or independent claims can be combined with each other. In still other embodiments, all aspects or alternatives and all independent claims can be combined with each other.

The encoded signal according to the invention may be stored on a digital storage medium or a non-transitory storage medium. It may also be transmitted over a transmission medium such as a wireless or wired transmission medium, for example the Internet.

Although some aspects are described in the context of an apparatus, these aspects clearly also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Likewise, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.

Depending on certain implementation requirements, embodiments of the invention may be implemented in hardware or in software. The implementation may use a digital storage medium that has electronically readable control signals stored on it, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a flash memory. These control signals cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals that are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention may be implemented as a computer program product with program code. The program code performs one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier or a non-transitory storage medium.

In other words, an embodiment of the inventive method is a computer program having program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is therefore a data carrier (or a digital storage medium, or a computer-readable medium) on which the computer program for performing one of the methods described herein is recorded.

A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to perform one of the methods described herein.

A further embodiment comprises a computer on which the computer program for performing one of the methods described herein is installed.

In some embodiments, a programmable logic device (for example, a field-programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The above-described embodiments are merely illustrative of the principles of the present invention. Modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. The invention is therefore intended to be limited only by the scope of the appended claims and not by the specific details presented in the description and explanation of the embodiments herein.

Bibliography

[1] ITU-T G.729 Annex B, "A silence compression scheme for G.729 optimized for terminals conforming to ITU-T Recommendation V.70", International Telecommunication Union (ITU), Series G, 2007.

[2] ITU-T G.729.1 Annex C, "DTX/CNG scheme", International Telecommunication Union (ITU), Series G, 2008.

[3] ITU-T G.718, "Frame error robust narrow-band and wideband embedded variable bit-rate coding of speech and audio from 8-32 kbit/s", International Telecommunication Union (ITU), Series G, 2008.

[4] 3GPP TS 26.090, "Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions", 3GPP Technical Specification, 2014.

[5] 3GPP, "Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions", 2014.

[6] 3GPP TS 26.445, "Codec for Enhanced Voice Services (EVS); Detailed algorithmic description".

[7] Z. Wang et al., "Linear prediction based comfort noise generation in the EVS codec", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, 2015.

[8] A. Lombard, S. Wilde, E. Ravelli, S. Döhla, G. Fuchs and M. Dietz, "Frequency-domain Comfort Noise Generation for Discontinuous Transmission in EVS", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, 2015.

[9] A. Lombard, M. Dietz, S. Wilde, E. Ravelli, P. Setiawan and M. Multrus, "Generation of the comfort noise with high spectro-temporal resolution in discontinuous transmission of audio signals", US Patent No. 9,583,114 B2, June 19, 2015.

[10] E. Norvell and F. Jansson, "Support for Generation of Comfort Noise, and Generation of Comfort Noise", WO 2019/193149 A1, April 5, 2019.

Claims

1. Multichannel signal generator (200) for generating a multichannel signal (204) having a first channel (201) and a second channel (203), comprising:

- a first audio source (211) for generating a first audio signal (221);

- a second audio source (213) for generating a second audio signal (223);

- a mixing noise source (212) for generating a mixing noise signal (222); and

- a mixer (206) for mixing the mixing noise signal (222) and the first audio signal (221) to obtain the first channel (201), and for mixing the mixing noise signal (222) and the second audio signal (223) to obtain the second channel (203),

- in this case the mixer (206) contains:

- a first amplitude element (208-1) for influencing the amplitude of the first audio signal (221);

- a first adder (206-1) for summing the output signal (221) of the first amplitude element and at least part of the mixing noise signal (222);

- a second amplitude element (208-3) for influencing the amplitude of the second audio signal (223);

- a second adder (206-3) for summing the output (223) of the second amplitude element (208-3) and at least part of the mixing noise signal (222),

- wherein the amount of influence exerted by the first amplitude element (208-1) and the amount of influence exerted by the second amplitude element (208-3) are equal to each other, or the amount exerted by the second amplitude element (208-3) differs by less than 20 percent from the amount exerted by the first amplitude element (208-1),

- wherein the mixer (206) contains a third amplitude element (208-2) for influencing the amplitude of the mixing noise signal (222),

- wherein the amount of influence exerted by the third amplitude element (208-2) depends on the amount of influence exerted by the first amplitude element (208-1) or the second amplitude element (208-3), such that the amount exerted by the third amplitude element (208-2) becomes larger when the amount exerted by the first amplitude element (208-1) or by the second amplitude element (208-3) becomes smaller.

2. The multichannel signal generator of claim 1, wherein the first audio source (211) is a first noise source and the first audio signal (221) is a first noise signal, and/or the second audio source (213) is a second noise source and the second audio signal (223) is a second noise signal,

- wherein the first noise source (211) and/or the second noise source (213) is configured to generate the first noise signal (221) and/or the second noise signal (223) in such a way that the first noise signal (221) and/or the second noise signal (223) is decorrelated with respect to the mixing noise signal (222).

3. The multichannel signal generator according to claim 1 or 2, wherein the mixer (206) is configured to generate the first channel (201) and the second channel (203) such that the magnitude of the mixing noise signal (222) in the first channel (201) is equal to the magnitude of the mixing noise signal (222) in the second channel (203), or lies within 80-120 percent of the magnitude of the mixing noise signal (222) in the second channel (203).

4. The multichannel signal generator according to one of the preceding claims, wherein the mixer (206) comprises a control input for receiving a control parameter (404, s), and the mixer (206) is configured to control the magnitude of the mixing noise signal (222) in the first channel (201) and the second channel (203) in response to the control parameter (404, s).

5. The multi-channel signal generator as claimed in one of the preceding claims, wherein each of the first audio source (211), the second audio source (213), and the mixing noise source (212) is a Gaussian noise source.

6. Multichannel signal generator according to one of the preceding claims,

- wherein the first audio source (211) includes a first noise generator for generating the first audio signal (221) as a first noise signal, wherein the second audio source (213) includes a decorrelator for decorrelating the first noise signal (221) to generate the second audio signal (223) as a second noise signal, and wherein the mixing noise source (212) comprises a second noise generator.

7. Multichannel signal generator according to one of claims 1-5,

- wherein the first audio source (211) includes a first noise generator (211) for generating the first audio signal (221) as a first noise signal, wherein the second audio source (213) contains a second noise generator (213) for generating the second audio signal (223) as a second noise signal, and wherein the mixing noise source (212) comprises a decorrelator for decorrelating the first noise signal (221) or the second noise signal (223) to generate the mixing noise signal (222).

8. Multichannel signal generator according to one of claims 1-5,

- wherein a first one of the first audio source (211), the second audio source (213) and the mixing noise source (212) contains a noise generator for generating a noise signal, a second one of them comprises a first decorrelator for decorrelating the noise signal, and a third one of them includes a second decorrelator for decorrelating the noise signal, wherein the first decorrelator and the second decorrelator differ from each other such that the output signals of the first decorrelator and the second decorrelator are decorrelated from each other.

9. Multichannel signal generator according to one of claims 1-5, wherein the first audio source (211) includes a first noise generator, the second audio source (213) contains a second noise generator, and the mixing noise source (212) includes a third noise generator, wherein the first noise generator, the second noise generator and the third noise generator are configured to generate mutually decorrelated noise signals.

10. Multichannel signal generator according to one of the preceding claims,

- wherein one of the first audio source (211), the second audio source (213) and the mixing noise source (212) comprises a pseudo-random number sequence generator configured to generate a pseudo-random number sequence in response to a seed, and wherein at least two of the first audio source (211), the second audio source (213) and the mixing noise source (212) are configured to initialize a pseudo-random number sequence generator using different seeds.

11. Multichannel signal generator according to one of claims 1-6,

- wherein at least one of the first audio source (211), the second audio source (213) and the mixing noise source (212) is configured to operate using a previously stored noise table.

12. Multichannel signal generator according to one of claims 1-6,

- wherein at least one of the first audio source (211), the second audio source (213) and the mixing noise source (212) is configured to generate a complex spectrum for a frame using a first noise value for the real part and a second noise value for the imaginary part.

13. Multichannel signal generator according to one of claims 11 and 12, in which the at least one noise generator is configured to generate a complex spectral noise value for frequency bin k. For one of the real part and the imaginary part, it uses a first random value with index k. For the other of the real part and the imaginary part, it uses a second random value with index (k+M). The first noise value and the second noise value are included in a noise array, extracted for example from a random number sequence generator, from a noise table or from a noise process, in the range from a start index to an end index. The start index is less than M, the end index is equal to or less than 2M, and M and k are integers.

14. Multichannel signal generator according to any one of the preceding claims,

- wherein the gain applied by the third amplitude element (208-2) is the square root of a given value (c_q), and the gains applied by the first amplitude element (208-1) and by the second amplitude element (208-3) are each the square root of the difference between 1 and the given value (c_q).
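If the first noise n_1, the second noise n_2 and the mixing noise n_m are mutually uncorrelated with unit variance (as claim 9 provides for), the gains of claim 14 give E[ch_1 ch_2] = c_q E[n_m^2] = c_q and E[ch_1^2] = (1 - c_q) + c_q = 1. The normalized correlation between the two channels therefore equals c_q while each channel keeps unit power. The Python sketch below checks this numerically; the helper names, the seeds and c_q = 0.7 are illustrative assumptions.

```python
import math
import random


def mix_channels(n1, n2, n_mix, c_q):
    """Mix a shared noise into two channel noises using the claim-14 gains.

    Third amplitude element (208-2): sqrt(c_q) on the mixing noise.
    First/second amplitude elements (208-1, 208-3): sqrt(1 - c_q) on n1 / n2.
    """
    if not 0.0 <= c_q <= 1.0:
        raise ValueError("c_q must lie in [0, 1]")
    g_mix = math.sqrt(c_q)
    g_own = math.sqrt(1.0 - c_q)
    first = [g_own * a + g_mix * m for a, m in zip(n1, n_mix)]   # first adder (206-1)
    second = [g_own * b + g_mix * m for b, m in zip(n2, n_mix)]  # second adder (206-3)
    return first, second


def normalized_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length sequences."""
    num = sum(a * b for a, b in zip(x, y))
    den = (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5
    return num / den


def gaussian(seed, length):
    """Unit-variance Gaussian noise from a seeded generator (illustrative)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


n1, n2, n_mix = gaussian(1, 50000), gaussian(2, 50000), gaussian(3, 50000)
left, right = mix_channels(n1, n2, n_mix, c_q=0.7)
print(normalized_correlation(left, right))  # approximately 0.7
```

At the extremes, c_q = 0 passes the two independent noises through unchanged and c_q = 1 makes both channels equal to the mixing noise.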

15. Multichannel signal generator according to any one of the preceding claims, further comprising:

- an input interface (210) for receiving encoded audio data (232) in a sequence of frames (306, 308) comprising an active frame (306) and an inactive frame (308) following the active frame (306); and

- an audio decoder (200'', 200a, 200b) for decoding encoded audio data for the active frame (306) to generate a decoded multi-channel signal for the active frame,

wherein the first audio source (211), the second audio source (213), the mixing noise source (212) and the mixer (206) are active in the inactive frame (308) to generate a multi-channel signal (204) for the inactive frame.

16. Multichannel signal generator according to claim 15, wherein:

- the encoded audio data (232) for the active frame (306) has a first set of coefficients describing a first number of frequency bins; and

- the encoded audio data (232) for the inactive frame (308) has a second set of coefficients describing a second number of frequency bins,

- wherein the first number of frequency bins is greater than the second number of frequency bins.

17. Multichannel signal generator (200) for generating a multichannel signal (204) having a first channel (201) and a second channel (203), containing:

- a first audio source (211) for generating a first audio signal (221);

- a second audio source (213) for generating a second audio signal (223);

- a mixing noise source (212) to generate a mixing noise signal (222);

- wherein the first audio source (211), the second audio source (213), the mixing noise source (212) and the mixer (206) are active in the inactive frame (308) to generate a multi-channel signal (204) for the inactive frame,

- wherein the encoded audio data (232) for the inactive frame (308) contains silence insertion descriptor data (p_noise, c) comprising comfort noise data (c, p_noise) that indicate the signal energy (1312) of each of the two channels (301, 303), or of each of a first linear combination of the first and second channels and a second linear combination of the first and second channels, for the inactive frame, and that indicate the coherence (404, c) between the first channel (301) and the second channel (303) in the inactive frame, and

- wherein the mixer (206, 220) is configured to mix (206-1, 206-3) the mixing noise signal (222) and the first audio signal (221) or the second audio signal (223) based on the comfort noise data indicating coherence (404, c), and

- wherein the generator (200, 220, 220a-220e) of multi-channel signals further comprises a signal modification module (250) for modifying the first channel (201) and the second channel (203), or the first audio signal (221) or the second audio signal (223), or the mixing noise signal (222),

- wherein the signal modification module (250) is configured to be controlled by comfort noise data (p_noise) indicating signal energies for the first audio channel (301) and the second audio channel (303) or indicating signal energies for the first linear combination of the first and second channels and the second linear combination of the first and second channels.

18. Multichannel signal generator according to any one of claims 15-17, wherein the audio data (232) for the inactive frame contains:

- a first silence insertion descriptor frame (241) for the first channel (201) and a second silence insertion descriptor frame (243) for the second channel (203), wherein the first silence insertion descriptor frame (241) contains:

- comfort noise parameter data (p_noise) for the first channel (201) and/or for the first linear combination of the first and second channels, and

- auxiliary information (p_frame) for comfort noise generation for the first channel (201) and the second channel (203), and

- wherein the second silence insertion descriptor frame (243) contains:

- comfort noise parameter data (p_noise) for the second channel (203) and/or for the second linear combination of the first and second channels, and

- coherence information (404, c) indicating coherence between the first channel (201) and the second channel (203) in the inactive frame, and

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal (204) in the inactive frame by using the comfort noise generation auxiliary information (p_frame) of the first silence insertion descriptor frame (241) to determine the comfort noise generation mode for the first channel (201) and the second channel (203) and/or for the first linear combination of the first and second channels and the second linear combination of the first and second channels, by using the coherence information (404, c) in the second silence insertion descriptor frame (243) to set the coherence (404, c) between the first channel (201) and the second channel (203) in the inactive frame, and by using the comfort noise parameter data (p_noise) from the first silence insertion descriptor frame (241) and from the second silence insertion descriptor frame (243) to set the energy situation (v_{l,q}) of the first channel (301) and the energy situation (v_{r,q}) of the second channel (303).

19. Multichannel signal generator according to any one of claims 15-18, wherein the audio data (232) for the inactive frame contains:

- at least one silence insertion descriptor frame (241) for a first linear combination of the first and second channels and a second linear combination of the first and second channels,

- wherein the at least one silence insertion descriptor frame (241) contains:

- comfort noise parameter data (p_noise) for the first linear combination of the first and second channels, and

- auxiliary information (p_frame) for comfort noise generation for the second linear combination of the first and second channels,

- wherein the multi-channel signal generator comprises a controller for controlling the generation of the multi-channel signal (204) in the inactive frame by using the comfort noise generation auxiliary information (p_frame) for the first linear combination of the first and second channels and the second linear combination of the first and second channels, by using the coherence information (404, c) in the second silence insertion descriptor frame (243) to set the coherence (404, c) between the first channel (201) and the second channel (203) in the inactive frame, and by using the comfort noise parameter data (p_noise) from the at least one silence insertion descriptor frame (241) and from the at least one silence insertion descriptor frame (243) to set the energy situation (v_{l,q}) of the first channel (301) and the energy situation (v_{r,q}) of the second channel (303).

20. Multichannel signal generator according to any one of claims 17-19, further comprising a spectral-time converter for converting the resulting first channel and the resulting second channel, after spectral adjustment and coherence adjustment, into corresponding time-domain representations to be combined or concatenated with the time-domain representations of the corresponding channels of the decoded multi-channel signal for the active frame.

21. Multichannel signal generator according to any one of claims 15-20, wherein the audio data for the inactive frame contains:

- a silence insertion descriptor frame (241, 243), wherein the silence insertion descriptor frame (241, 243) contains comfort noise parameter data (p_noise) for the first and second channels (201, 203), auxiliary information (p_frame) for comfort noise generation for the first channel (201) and the second channel (203) and/or for the first linear combination of the first and second channels and the second linear combination of the first and second channels, and coherence information (404, c) indicating the coherence between the first channel (201) and the second channel (203) in an inactive frame, and

- wherein the generator (200) of multi-channel signals comprises a controller for controlling the generation of a multi-channel signal (202) in an inactive frame by using the comfort noise generation auxiliary information (p_frame) of the silence insertion descriptor frame (241, 243) to determine the comfort noise generation mode for the first channel (201) and the second channel (203), by using the coherence information (404, c) in the silence insertion descriptor frame (241) to set the coherence (404, c) between the first channel (201) and the second channel (203) in the inactive frame, and by using the comfort noise parameter data (p_noise) from the silence insertion descriptor frame (241, 243) to set the energy situation (v_{l,q}) of the first channel (301) and the energy situation (v_{r,q}) of the second channel (303).

22. Multichannel signal generator according to any one of claims 15-21,

- wherein the encoded audio data (232) for the inactive frame contains silence insertion descriptor data (p_noise, c) comprising comfort noise data (c, p_noise) that indicate the signal energy of each channel in a middle/side representation, and coherence data (404, c) indicating the coherence between the first channel and the second channel in a left/right representation, wherein the multi-channel signal generator is configured to convert the middle/side representation of the signal energies into a left/right representation of the signal energies in the first channel (301) and the second channel (303),

- wherein the mixer (206, 220) is configured to mix (206-1, 206-3) the mixing noise signal (222) into the first audio signal (221) and the second audio signal (223) based on the coherence data (404, c) for obtaining a first channel (201) and a second channel (203), and

- wherein the multi-channel signal generator further comprises a signal modification module (250) configured to modify the first and second channels (201, 203) by generating the first and second channels (201, 203) based on the signal energies in the left/right representation.

23. The multi-channel signal generator of claim 22, configured, if the audio data contain signaling indicating that the energy in the side channel is less than a predetermined threshold value, to set (337) the coefficients (v_{s,q}) of the side channel to zero.
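Claims 22 and 23 leave the exact middle/side convention open. The following sketch assumes L = M + S and R = M - S per frequency band, with M and S treated as uncorrelated, so that each left/right band energy is v_m + v_s. The function name and the boolean flag standing in for the signaling of claim 23 are hypothetical.

```python
def ms_to_lr_energies(v_m, v_s, side_energy_low=False):
    """Convert per-band middle/side noise energies into left/right energies.

    Assumption: L = M + S and R = M - S, with M and S uncorrelated, so that
    |L|^2 and |R|^2 both equal v_m + v_s in expectation (cross term dropped).
    side_energy_low: hypothetical flag standing in for the claim-23 signaling;
    when set, the side-channel coefficients are zeroed before the conversion.
    """
    if side_energy_low:
        v_s = [0.0] * len(v_s)                # reset side-channel coefficients
    v_l = [m + s for m, s in zip(v_m, v_s)]
    v_r = [m + s for m, s in zip(v_m, v_s)]   # identical under this assumption
    return v_l, v_r
```

In this simplified model both channels receive identical energies, which illustrates why per-channel gain information, as in claim 25, is useful for restoring a left/right energy imbalance.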

24. The multi-channel signal generator according to claim 22 or 23, wherein the audio data for the inactive frame contains:

- at least one silence insertion descriptor frame (241, 243), wherein the at least one silence insertion descriptor frame (241, 243) contains comfort noise parameter data (p_noise, v_{m,ind}, g_{l,q}, g_{r,q}, v_{s,ind}) for the middle and side channels (v_{m,q}, v_{s,q}), auxiliary information (p_frame) for comfort noise generation for the middle and side channels (v_{m,q}, v_{s,q}), and coherence information (404, c) indicating the coherence between the first channel (201) and the second channel (203) in the inactive frame, and

- wherein the generator (200) of multi-channel signals comprises a controller for controlling the generation of a multi-channel signal (202) in an inactive frame by using the comfort noise generation auxiliary information (p_frame) of the silence insertion descriptor frame (241, 243) to determine the comfort noise generation mode for the first channel (201) and the second channel (203), by using the coherence information (404, c) in the silence insertion descriptor frame (241) to set the coherence (404, c) between the first channel (201) and the second channel (203) in the inactive frame, and by using the comfort noise parameter data (p_noise), or a processed version thereof, from the silence insertion descriptor frame (241, 243) to set the energy situation (v_{l,q}) of the first channel (301) and the energy situation (v_{r,q}) of the second channel (303).

25. Multichannel signal generator according to any one of claims 15-24, further configured to scale the signal energy coefficients (1312, v'_l, v'_r) for the first and second channels by gain information (g_{l,q}, g_{r,q}) encoded in the comfort noise parameter data (401, 403) for the first and second channels.

26. Multichannel signal generator according to any one of the preceding claims, configured to convert the generated multi-channel signal (252) from a frequency-domain version to a time-domain version.

27. Multichannel signal generator according to any one of the preceding claims, wherein the first audio source (211) is a first noise source and the first audio signal (221) is a first noise signal, or the second audio source (213) is a second noise source and the second audio signal (223) is a second noise signal,

- wherein the first noise source or the second noise source is configured to generate the first noise signal (201) or the second noise signal (203) such that the first noise signal (201) or the second noise signal (203) is at least partially correlated, and

- wherein the mixing noise source (212) is configured to generate a mixing noise signal (222) with a first mixing noise part (221a) and a second mixing noise part (221b), wherein the second mixing noise part (221b) is at least partially decorrelated with respect to the first mixing noise part (221a); and

- wherein the mixer (206) is configured to mix in the first noise part (221a) when mixing the mixing noise signal (222) and the first audio signal (221) to obtain the first channel (201), and to mix in the second noise part (221b) when mixing the mixing noise signal (222) and the second audio signal (223) to obtain the second channel (203).

28. Multichannel signal generator (200) for generating a multichannel signal (204) having a first channel (201) and a second channel (203), containing:

- a first audio source (211) for generating a first audio signal (221);

- a second audio source (213) for generating a second audio signal (223);

- a mixing noise source (212) to generate a mixing noise signal (222); and

- wherein the first audio source (211) is a first noise source, and the first audio signal (221) is a first noise signal, or the second audio source (213) is a second noise source, and the second audio signal (223) is a second noise signal,

29. A method for generating a multi-channel signal having a first channel and a second channel (203), comprising the steps of:

- generating a first audio signal (221) using the first audio source (211);

- generating a second audio signal (223) using a second audio source (213);

- generating a mixing noise signal (222) using a mixing noise source (212); and

- mixing (206) the mixing noise signal (222) and the first audio signal (221) to obtain the first channel (201), and mixing the mixing noise signal (222) and the second audio signal (223) to obtain the second channel (202), wherein the method comprises:

- using a first amplitude element (208-1) influencing the amplitude of the first audio signal (221);

- using a first adder (206-1) summing the output signal (221) of the first amplitude element and at least part of the mixing noise signal (222);

- using a second amplitude element (208-3) influencing the amplitude of the second audio signal (223);

- using a second adder (206-3), summing the output (223) of the second amplitude element (208-3) and at least part of the mixing noise signal (222),

- wherein the mixing (206) uses a third amplitude element (208-2) affecting the amplitude of the mixing noise signal (222),

30. An audio encoder (300, 300a, 300b) for generating an encoded multi-channel audio signal (232) for a sequence of frames comprising an active frame (306) and an inactive frame (308), the audio encoder comprising:

- an activity detector (380) for analyzing the multi-channel signal (304) to determine (381) a frame of the sequence of frames as representing an inactive frame (308);

- a noise parameter calculation module (3040) for calculating first noise parametric data (p_noise, v_{m,ind}) for the first channel (301, 201) of the multi-channel signal (304) and for calculating second noise parametric data (p_noise, v_{s,ind}) for the second channel (303) of the multi-channel signal (304);

- a coherence calculation module (320) for calculating coherence data (404, c) indicating the coherence situation between the first channel (301, 201) and the second channel (303, 203) in the inactive frame (308); and

- an output interface (310) for generating an encoded multi-channel audio signal (232) having encoded audio data for the active frame (306) and, for the inactive frame (308), first noise parametric data (p_noise, v_{m,ind}), second noise parametric data (p_noise, v_{s,ind}) and/or a first linear combination of the first noise parametric data and the second noise parametric data and a second linear combination of the first noise parametric data and the second noise parametric data, and the coherence data (c, 404), wherein the noise parameter calculation module (3040) is configured to convert at least some of the first noise parametric data and the second noise parametric data from a left/right representation to a middle/side representation with a middle channel and a side channel.

31. The audio encoder of claim 30, wherein the noise parameter calculation module (3040) is configured to re-convert the middle/side representation (M, S) of at least some of the first noise parametric data and the second noise parametric data into a left/right representation,

- wherein the noise parameter calculation module (3040) is configured to calculate, from the re-converted left/right representation, first gain information (g_l) for the first channel (301) and second gain information (g_r) for the second channel (303), and to provide the first gain information (g_l) for the first channel (301) as part of the first noise parametric data and the second gain information (g_r) as part of the second noise parametric data.

32. Audio encoder (300) according to claim 31, wherein the noise parameter calculation module (3040) is configured to calculate:

- the first gain information (g_l) by comparing:

- a version (v'_l) of the first noise parametric data for the first channel (301) re-converted from the middle/side representation to the left/right representation

- with a version (v_l) of the first noise parametric data for the first channel (301) before conversion from the middle/side representation to the left/right representation; and/or

- the second gain information (g_r) by comparing:

- a version (v'_r) of the second noise parametric data for the second channel (303) re-converted from the middle/side representation to the left/right representation

- with a version (v_r) of the second noise parametric data for the second channel (303) before conversion from the middle/side representation to the left/right representation.
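A minimal encoder-side sketch of claims 31 and 32 follows. It assumes the convention M = (L + R)/2 and S = (L - R)/2, re-converts by adding the middle and side band energies (dropping the cross term), and forms each gain as the ratio of the original to the re-converted total energy. The claims only state that the gain is obtained by comparison, so the energy ratio, the neutral gains for a silent frame and the function name `lr_gains` are assumptions.

```python
def lr_gains(ch1, ch2):
    """Per-channel gains from comparing original and re-converted L/R energies.

    ch1, ch2: complex spectral values of the left and right channel (same length).
    Assumed convention: M = (L + R) / 2, S = (L - R) / 2; the re-converted energy
    of each channel per bin is |M|^2 + |S|^2 (the M/S cross term is dropped).
    """
    v_l = [abs(a) ** 2 for a in ch1]                         # original left energies
    v_r = [abs(b) ** 2 for b in ch2]                         # original right energies
    v_m = [abs((a + b) / 2) ** 2 for a, b in zip(ch1, ch2)]  # middle energies
    v_s = [abs((a - b) / 2) ** 2 for a, b in zip(ch1, ch2)]  # side energies
    v_prime = [m + s for m, s in zip(v_m, v_s)]              # re-converted, same for L and R
    total_prime = sum(v_prime)
    if total_prime == 0.0:
        return 1.0, 1.0                                      # silent frame: neutral gains
    g_l = sum(v_l) / total_prime
    g_r = sum(v_r) / total_prime
    return g_l, g_r
```

For a frame in which only the left channel carries energy, `lr_gains([2 + 0j, 0j], [0j, 0j])` returns `(2.0, 0.0)`: the re-converted energy is split evenly between the channels, and the gains restore the original imbalance.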

33. An audio encoder (300, 300a, 300b) for generating an encoded multi-channel audio signal (232) for a sequence of frames comprising an active frame (306) and an inactive frame (308), the audio encoder comprising:

- an output interface (310) for generating an encoded multi-channel audio signal (232) having encoded audio data for the active frame (306) and, for the inactive frame (308), first noise parametric data (p_noise, v_{m,ind}), second noise parametric data (p_noise, v_{s,ind}) and/or a first linear combination of the first noise parametric data and the second noise parametric data and a second linear combination of the first noise parametric data and the second noise parametric data, and the coherence data (c, 404), wherein the coherence calculation module (320) is configured to:

- calculating the real intermediate value and the imaginary intermediate value from the complex spectral values for the first channel and the second channel (303) in the inactive frame;

- calculating a first energy value for the first channel (301) and a second energy value for the second channel (303) in the inactive frame; and

- calculating coherence data (404, c) using a real intermediate value, an imaginary intermediate value, a first energy value and a second energy value, or

- smoothing at least one of the real intermediate value, the imaginary intermediate value, the first energy value and the second energy value and calculating coherence data using the at least one smoothed value,

- wherein the coherence calculating module (320) is configured to square the smoothed real intermediate value and square the smoothed imaginary intermediate value and sum the squared values to obtain the first component number,

- wherein the coherence calculation module (320) is configured to multiply the smoothed first and second energy values to obtain a second component number and combine the first and second component numbers to obtain a resultant number for the coherence value on which the coherence data is based.
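The coherence computation of claims 33, 36 and 37 can be sketched as follows. The products are taken as the first-channel value times the complex conjugate of the second-channel value (the usual cross-spectrum; the claims only say "products"), each of the four intermediate quantities is smoothed by a first-order recursion with a hypothetical factor alpha = 0.9, and the coherence is the square root of (Re^2 + Im^2) / (E_1 E_2). The smoothing scheme, the state dictionary and the names are assumptions.

```python
import math


def coherence(ch1, ch2, state=None, alpha=0.9):
    """Inter-channel coherence of one inactive frame.

    ch1, ch2: complex spectral values of the first and second channel per bin.
    state: optional dict holding smoothed values from previous frames
           (hypothetical); it is updated in place.
    alpha: hypothetical first-order smoothing factor.
    """
    cross = [a * b.conjugate() for a, b in zip(ch1, ch2)]
    values = {
        "re": sum(z.real for z in cross),       # real intermediate value
        "im": sum(z.imag for z in cross),       # imaginary intermediate value
        "e1": sum(abs(a) ** 2 for a in ch1),    # first energy value
        "e2": sum(abs(b) ** 2 for b in ch2),    # second energy value
    }
    if state is not None:
        for key, val in values.items():
            # Initialize with the current value, then smooth recursively.
            state[key] = val if key not in state else alpha * state[key] + (1.0 - alpha) * val
        values = {key: state[key] for key in values}
    first = values["re"] ** 2 + values["im"] ** 2   # first component number
    second = values["e1"] * values["e2"]            # second component number
    if second == 0.0:
        return 0.0
    return math.sqrt(first / second)                # claim 37: square root of the ratio
```

Identical channels give a coherence of 1 (up to rounding) and channels occupying disjoint frequency bins give 0; by the Cauchy-Schwarz inequality the value stays in [0, 1] with or without smoothing.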

34. An audio encoder (300, 300a, 300b) for generating an encoded multi-channel audio signal (232) for a sequence of frames comprising an active frame (306) and an inactive frame (308), the audio encoder comprising:

- an output interface (310) for generating an encoded multi-channel audio signal (232) having encoded audio data for the active frame (306) and, for the inactive frame (308), first noise parametric data (p_noise, v_{m,ind}), second noise parametric data (p_noise, v_{s,ind}) and/or a first linear combination of the first noise parametric data and the second noise parametric data and a second linear combination of the first noise parametric data and the second noise parametric data, and coherence data (c, 404),

- wherein the noise parameter calculation module (3040) is configured to compare the energy of the second linear combination between the first noise parameter data and the second noise parameter data with a given energy threshold value(s) and:

- in the event that the energy of the second linear combination between the first noise parametric data and the second noise parametric data is greater than a predetermined energy threshold(s), the coefficients of the side channel noise shape vector are set to zero (437); and

- in case the energy of the second linear combination between the first noise parametric data and the second noise parametric data is less than a predetermined energy threshold (ex), the coefficients of the side channel noise shape vector are stored.

35. Audio encoder according to any one of claims 30-34, wherein the coherence calculation module (320) is configured to calculate (320'') the coherence value (404, c) and to quantize (320'') the coherence value to obtain a quantized coherence value (c_ind), wherein the output interface (310) is configured to use the quantized coherence value (c_ind) as the coherence data in the encoded multi-channel signal.

36. Audio encoder according to any one of claims 30-35,

- wherein the coherence calculation module (320) is configured to calculate the real intermediate value as a sum over the real parts of the products of complex spectral values for the corresponding frequency bins of the first channel and the second channel (303) in the inactive frame, or

- to calculate an imaginary intermediate value as a sum over the imaginary parts of the products of complex spectral values for the corresponding frequency bins of the first channel and the second channel (303) in the inactive frame.

37. The audio encoder of claim 33, wherein the coherence calculation module is configured to calculate the square root of the resultant number to obtain a coherence value on which the coherence data is based.

38. Audio encoder according to one of claims 30-37,

- wherein the coherence calculation module (320) is configured to quantize the coherence value (404, c) using a uniform quantizer (320'') to obtain the quantized coherence value (c_ind), represented with n bits, as the coherence data.

39. The audio encoder of claim 38, wherein the uniform quantizer (320'') is configured to use n bits, where n is equal to the number of bits occupied by the comfort noise generation auxiliary information (p_frame) in the first silence insertion descriptor frame (241).
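A uniform quantizer for claims 38 and 39 might map a coherence value in [0, 1] onto 2^n equally spaced levels. The claims tie n to the size of the p_frame field but do not state its value here, so the 4-bit setting in the usage note, the clamping and the rounding rule are assumptions.

```python
def quantize_coherence(c, n_bits):
    """Uniformly quantize a coherence value in [0, 1] to an n-bit index."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    levels = (1 << n_bits) - 1          # highest index, e.g. 15 for 4 bits
    c = min(max(c, 0.0), 1.0)           # clamp to the valid coherence range
    return int(round(c * levels))


def dequantize_coherence(c_ind, n_bits):
    """Map an n-bit index back to a coherence value in [0, 1]."""
    return c_ind / ((1 << n_bits) - 1)
```

For example, with n = 4 a coherence of 0.4 maps to index 6, which dequantizes back to 0.4; the reconstruction error is at most half a step, 1 / (2 (2^n - 1)).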

40. Audio encoder according to one of claims 30-39, wherein the output interface (310) is configured to generate a first silence insertion descriptor frame (241) for the first channel (301, L) and a second silence insertion descriptor frame (243) for the second channel (303, R), wherein the first silence insertion descriptor frame (241) contains comfort noise parameter data (p_noise) for the first channel (301, L) and auxiliary information (p_frame) for comfort noise generation for the first channel (301, L) and the second channel (303, R), and wherein the second silence insertion descriptor frame (243) contains comfort noise parameter data (p_noise) for the second channel (303) and coherence information (404, c) indicating the coherence between the first channel and the second channel (303) in the inactive frame.

41. Audio encoder according to one of claims 30-39,

- wherein the output interface (310) is configured to generate a silence insertion descriptor frame (241, 243), wherein the silence insertion descriptor frame contains comfort noise parameter data (p_noise) for the first and second channels (301, 303), comfort noise generation auxiliary information (p_frame) for the first channel (301, L) and the second channel (303, R), and coherence information (404, c) indicating the coherence between the first channel (301, L) and the second channel (303, R) in the inactive frame.

42. Audio encoder according to one of claims 30-39,

- wherein the output interface (310) is configured to generate a first silence insertion descriptor frame (241) for the first channel (301, L) and the second channel and a second silence insertion descriptor frame (243) for the first channel and the second channel (303, R), wherein the first silence insertion descriptor frame (241) contains comfort noise parameter data (p_noise) for the first channel and the second channel and auxiliary information (p_frame) for comfort noise generation for the first channel (301, L) and the second channel (303, R), and wherein the second silence insertion descriptor frame (243) contains comfort noise parameter data (p_noise) for the first channel and the second channel (303) and coherence information (404, c) indicating the coherence between the first channel and the second channel (303) in an inactive frame.

43. Audio encoder (300) according to one of claims 30-42, wherein the activity detector (380) is configured, for at least one frame of the sequence of frames, to:

- analyze (370-1) the first channel (301, L) of the multi-channel signal (304) to classify the first channel (301, L) as active or inactive,

- analyze (370-2) the second channel (303, R) of the multi-channel signal (304) to classify the second channel (303, R) as active or inactive, and

- determine (381) the frame as inactive if both the first channel (301, L) and the second channel (303, R) are classified as inactive, and as active otherwise.
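The frame-level decision of claim 43 reduces to a logical AND of the per-channel inactivity decisions. The claims do not specify how each channel is classified, so the sketch below assumes a hypothetical per-channel detector that compares the mean sample energy with a fixed threshold; the threshold value and the function names are illustrative.

```python
def channel_is_active(samples, threshold):
    """Hypothetical per-channel classifier: mean energy above a fixed threshold."""
    if not samples:
        return False
    energy = sum(x * x for x in samples) / len(samples)
    return energy > threshold


def frame_is_inactive(left, right, threshold=1e-4):
    """A frame is inactive only if both channels are classified inactive (claim 43)."""
    return not channel_is_active(left, threshold) and not channel_is_active(right, threshold)
```

A single active channel is therefore enough to keep the whole frame active, so comfort noise is generated only when both channels are silent.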

44. Audio encoder (300) according to one of claims 30-43, wherein the noise parameter calculation module (3040) is configured to calculate first gain information (g_l) for the first channel (301) and second gain information (g_s) for the second channel (303), and to provide the noise parameter data as the first gain information (g_l) for the first channel (301) and the second gain information (g_s).

45. Audio encoder according to one of claims 30-44, configured to encode the second linear combination between the first noise parameter data and the second noise parameter data with fewer bits than the number of bits with which the first linear combination between the first noise parameter data and the second noise parameter data is encoded.

46. Audio encoder according to one of claims 30-45,

- wherein the output interface (310) is configured to:

- generate an encoded multi-channel audio signal (232) having encoded audio data for the active frame (306) using a first set of coefficients for a first number of frequency bins; and

- generating first parametric noise data, second parametric noise data, or a first linear combination of the first parametric noise data and second parametric noise data and a second linear combination of the first parametric noise data and second parametric noise data using a second set of coefficients describing a second number of frequency bins,

47. An audio encoding method for generating an encoded multi-channel audio signal for a sequence of frames containing an active frame and an inactive frame, the method comprising the steps of:

- analyzing the multi-channel signal to determine a frame of the frame sequence as representing an inactive frame;

- calculating first noise parametric data for the first channel of the multi-channel signal and/or for a first linear combination of the first and second channels of the multi-channel signal, and calculating second noise parametric data for the second channel (303) of the multi-channel signal and/or for a second linear combination of the first and second channels of the multi-channel signal;

- calculating coherence data indicating the coherence situation between the first channel and the second channel (303) in the inactive frame; and

- generating an encoded multi-channel audio signal having encoded audio data for the active frame and, for the inactive frame, first noise parametric data, second noise parametric data and coherence data,

- wherein the noise parameter calculation module (3040) is configured to convert at least some of the first noise parameter data and the second noise parameter data from a left/right view to a middle/side view with a middle channel and a side channel.

48. A permanent storage module that stores instructions that, when executed on a computer or processor, implement the method of claim 29.

49. A permanent storage module storing instructions that, when executed on a computer or processor, implement the method of claim 47.