KR101495879B1

KR101495879B1 - An apparatus for producing spatial audio in real time, and a system for playing spatial audio in real time with the apparatus

Info

Publication number: KR101495879B1
Application number: KR20140016458A
Authority: KR
Inventors: 김형국; 류상현
Original assignee: 광운대학교 산학협력단
Priority date: 2013-12-27
Filing date: 2014-02-13
Publication date: 2015-02-25

Abstract

The present invention relates to an apparatus in which a receiving end receives, in real time over an IP network, music performed by performers in different spaces, mixes it in a multi-channel manner into one natural piece of music, and generates stereophonic sound from the mixed music, and to a stereophonic sound reproduction system including the apparatus. The apparatus and the system restore music or audio packets with minimal delay by applying signal processing such as adaptive playout scheduling, packet loss concealment, and merging to the audio streams received over the IP network; mix the audio streams transmitted by multiple performers in independent spaces through multi-channel audio signal processing so that they form one natural piece of music; and generate stereophonic sound from the mixed audio streams through an arrangement of multiple speakers. The stereophonic sound generation apparatus includes a stereophonic sound generation unit.

Description

Real-time stereophonic sound generation apparatus and real-time stereophonic sound reproduction system including the same

The present invention relates to an apparatus that generates stereophonic sound in real time from multi-party independent performances over an IP network, and to a system that reproduces the stereophonic sound in real time. More specifically, the invention relates to an apparatus in which a receiving end receives, in real time over an IP network, the music that each performer plays in a separate, independent space, mixes it in a multi-channel manner into one natural piece of music, and generates stereophonic sound from the mixed music, and to a system that reproduces the stereophonic sound.

Stereophonic sound is sound to which spatial information has been added so that a listener located in a space other than the one where the sound source originated can perceive direction, spaciousness, and distance when hearing it.

Standardization of stereophonic sound is under way in MPEG and ITU-R. Technologies developed so far include MPEG-D USAC (Unified Speech and Audio Coding), MPEG Surround, MPEG-4 ALS (Audio Lossless Coding), MPEG-4 SLS (Scalable Lossless Coding), MPEG-4 IMAF (Interactive Music Application Format), and MPEG-D SAOC (Spatial Audio Object Coding). Recently, stereophonic techniques that combine MPEG Surround and MPEG SAOC have been the focus of intensive research and development. Sound image localization, sound field control, crosstalk control, and binaural and Ambisonic recording techniques are also being studied, as is stereophonic output using virtual speakers.

However, in the techniques above, multiple performers or objects are captured in the same space and produced as stereophonic sound; the produced stereophonic sound is transmitted as metadata together with the signal-processed audio signal; and the receiving end combines the received audio signal and metadata to generate stereophonic sound suited to its own environment. This differs greatly from a scheme in which the receiving end receives, over an IP network, music that each performer plays in an independent space and generates stereophonic sound from it.

Meanwhile, when audio data is transmitted over an IP-based packet network, packets travel along different paths rather than a single path, so the time at which each packet arrives at the receiving end is not constant. That is, network load can cause packets to be lost, to miss their deadline, or to arrive out of order, which degrades sound quality.

Korean Patent Laid-Open Publication No. 2007-0033860 proposes a stereophonic sound generation apparatus that reproduces a multi-channel sound input signal through two-channel speakers. However, because this apparatus cannot mix sound signals generated in different spaces into channel signals, it cannot solve the problems described above.

The present invention has been devised to solve the problems described above. Its object is to propose a real-time stereophonic sound generation apparatus that mixes audio information obtained in different independent spaces, based on the network jitter of each channel and the generation time of each piece of audio information, to generate stereophonic sound in real time, and a real-time stereophonic sound reproduction system including the apparatus.

However, the objects of the present invention are not limited to those mentioned above, and other objects not mentioned here will be clearly understood by those skilled in the art from the description below.

To achieve the above object, the present invention proposes a stereophonic sound generation apparatus comprising: a jitter estimation unit that, when audio information is received on a specific channel, estimates network jitter based on the reception time of the audio information currently received on the specific channel and the reception time of the audio information previously received on that channel; a signal processing unit that selectively processes the currently received audio information based on the network jitter; and a stereophonic sound generation unit that, from among the signal-processed audio information received from different channels, extracts the pieces of audio information generated at the same time based on the generation time of each piece, and mixes the extracted audio information to generate stereophonic sound in real time.

Preferably, the stereophonic sound generation unit extracts, as the audio information generated at the same time, audio information generated from performances by different performers or instruments in different spaces.
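The extraction of audio information generated at the same time can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the function name `mix_by_generation_time`, the `(channel, generation_time, samples)` tuple layout, and the plain averaging used as a stand-in for the multi-channel mixing, which the text does not specify at this point.

```python
from collections import defaultdict


def mix_by_generation_time(frames, num_channels):
    """Group frames by generation time and mix the complete groups.

    frames: iterable of (channel_id, generation_time, samples) tuples.
    Returns {generation_time: mixed_samples}. Averaging is an illustrative
    stand-in for the mixing step, not the method of the source.
    """
    by_time = defaultdict(dict)
    for channel_id, generation_time, samples in frames:
        by_time[generation_time][channel_id] = samples

    mixed = {}
    for generation_time, per_channel in sorted(by_time.items()):
        if len(per_channel) < num_channels:
            continue  # not every channel has delivered this instant yet
        length = min(len(s) for s in per_channel.values())
        mixed[generation_time] = [
            sum(s[i] for s in per_channel.values()) / num_channels
            for i in range(length)
        ]
    return mixed
```

Frames whose generation time has not yet been seen on every channel are simply held back in this sketch; a real receiver would need a policy for channels that never deliver.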

Preferably, the jitter estimation unit estimates the network jitter using the mean of the jitter values obtained so far, a variance calculated from the differences between each jitter value obtained so far and the jitter value obtained before it, and a predetermined weight.

Preferably, the jitter estimation unit uses, as the weight, either the previously used weight or a value computed from the difference between the currently obtained jitter and the mean of the jitter values obtained up to that point and from the previously used variance.

Preferably, the jitter estimation unit includes: a delay time calculation unit that calculates the packet delay time of the specific channel based on the reception time of the currently received audio information and the reception time of the previously received audio information; and a delay time comparison unit that estimates the network jitter by comparing the packet delay time with a reference time, estimating that a spike has occurred on the specific channel if the packet delay time is equal to or greater than the reference time, and that the specific channel is operating normally if the packet delay time is less than the reference time.

Preferably, the jitter estimation unit further includes a reference time adjustment unit that, if the packet delay time is equal to or greater than the reference time, uses the packet delay time as the new reference time, and, if the packet delay time obtained from audio information subsequently received on the specific channel is less than the reference time, restores the reference time to its original value.
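The spike test and the reference time adjustment can be sketched as a small per-channel state machine. This is one possible reading of the two paragraphs above, not a confirmed implementation: the class name `SpikeDetector`, the millisecond units, and the choice to treat the inter-arrival gap as the packet delay time are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpikeDetector:
    """Hypothetical per-channel spike detector (names and units assumed)."""

    base_reference_ms: float                 # reference time in normal operation
    reference_ms: float = 0.0                # current, possibly raised, reference
    last_arrival_ms: Optional[float] = None  # reception time of previous packet

    def __post_init__(self) -> None:
        self.reference_ms = self.base_reference_ms

    def on_packet(self, arrival_ms: float) -> str:
        """Return "spike" or "normal" for the packet that just arrived."""
        if self.last_arrival_ms is None:
            self.last_arrival_ms = arrival_ms
            return "normal"  # no previous packet to compare against
        delay_ms = arrival_ms - self.last_arrival_ms
        self.last_arrival_ms = arrival_ms
        if delay_ms >= self.reference_ms:
            self.reference_ms = delay_ms  # adopt the delay as the new reference
            return "spike"
        self.reference_ms = self.base_reference_ms  # restore the original reference
        return "normal"
```

With a base reference of 60 ms, a 120 ms gap is classified as a spike and raises the reference to 120 ms; the next gap below that value returns the channel to normal and resets the reference to 60 ms.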

Preferably, the stereophonic sound generation apparatus further includes: an audio information division unit that uses the time-domain energy of the currently received audio information to divide it into segments according to whether an audio signal is present; a division information storage unit that stores the segmented audio information and outputs the segments one after another in sequence order; and a signal processing determination unit that determines which of a compression/non-compression method, a loss concealment method, and a merging method is to be used to process each piece of audio information output in turn.
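Segmentation by time-domain energy might look like the following minimal sketch. The frame length, the energy threshold, and the function name `split_by_energy` are illustrative assumptions; the text does not give concrete values.

```python
def split_by_energy(samples, frame_len=160, threshold=1e-4):
    """Split samples into runs of frames with and without an audio signal.

    Returns a list of (is_active, start, end) tuples, where the indices
    refer to positions in `samples` and `end` is exclusive.
    frame_len and threshold are assumed values, not taken from the source.
    """
    segments = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        active = energy >= threshold
        end = start + len(frame)
        if segments and segments[-1][0] == active:
            # extend the current run instead of opening a new segment
            segments[-1] = (active, segments[-1][1], end)
        else:
            segments.append((active, start, end))
    return segments
```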

Preferably, the signal processing determination unit determines which of the compression/non-compression method, the loss concealment method, and the merging method to use for the audio information output in the current turn, based on whether there is audio information output in the current turn and whether the audio information output in the previous turn was processed by the loss concealment method.

Preferably, the signal processing unit includes: a compression/non-compression processing unit that processes the audio information output in the current turn using the compression/non-compression method if there is audio information output in the current turn and the audio information output in the previous turn was not processed by the loss concealment method; a merging processing unit that processes the audio information output in the current turn using the merging method if there is audio information output in the current turn and the audio information output in the previous turn was processed by the loss concealment method; and a loss concealment processing unit that processes the audio information output in the current turn using the loss concealment method if there is no audio information output in the current turn.
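The three-way choice described above reduces to two boolean inputs. The sketch below encodes that decision table directly; the enum and function names are hypothetical.

```python
from enum import Enum


class Method(Enum):
    COMPRESS_OR_NOT = "compression/non-compression"
    MERGE = "merging"
    CONCEAL = "loss concealment"


def choose_method(has_current: bool, previous_was_concealed: bool) -> Method:
    """Pick the processing method for the current output turn."""
    if not has_current:
        return Method.CONCEAL          # nothing to play: conceal the loss
    if previous_was_concealed:
        return Method.MERGE            # join real audio onto a concealed frame
    return Method.COMPRESS_OR_NOT      # ordinary case
```

Note that when no audio is available the previous turn's state does not matter, which is why that check comes first.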

Preferably, the compression/non-compression processing unit processes the audio information output in the current turn using either the compression method or the non-compression method, depending on whether the audio signal is present and whether a spike has occurred on the specific channel.

Preferably, when selecting between the compression method and the non-compression method, the compression/non-compression processing unit additionally uses the variance calculated from the differences between each jitter value obtained so far and the jitter value obtained before it, or the ratio between the network jitter and the segmented audio information.

Preferably, the loss concealment processing unit compares the number of times signal processing has been performed consecutively by the loss concealment method with a reference count, and processes the audio information output in the current turn using either a short-interval loss concealment method or a long-interval loss concealment method.

Preferably, when processing the audio information output in the current turn, the loss concealment processing unit additionally uses a linearly decreasing scale function to remove buzz sound.
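A rough sketch of how the consecutive-loss count and a linearly decreasing scale could interact is given below. The text does not define the short-interval and long-interval methods themselves, so repeating the last good frame is used here purely as a placeholder; `reference_count`, `fade_frames`, and the function names are likewise assumptions.

```python
def fade_scale(consecutive_losses: int, fade_frames: int = 5) -> float:
    """Linearly decreasing gain: 1.0 on the first lost frame, 0.0 after fade_frames more."""
    return max(0.0, 1.0 - (consecutive_losses - 1) / fade_frames)


def conceal_frame(last_good, consecutive_losses, reference_count=2, fade_frames=5):
    """Return (mode, substitute_frame) for a lost frame.

    mode is "short" while the run of losses is within reference_count and
    "long" afterwards. The substitute is the last good frame scaled by the
    linear fade (a placeholder for the actual concealment methods).
    """
    mode = "short" if consecutive_losses <= reference_count else "long"
    scale = fade_scale(consecutive_losses, fade_frames)
    return mode, [x * scale for x in last_good]
```

Scaling repeated frames toward silence is a common way to keep a long run of identical frames from being heard as a buzz, which appears to be the purpose of the scale function mentioned above.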

Preferably, the merging processing unit extracts pieces of sub-audio information from the audio information output in the current turn and overlaps each piece of sub-audio information onto the audio information output in the previous turn to process the audio information output in the current turn.

Preferably, the stereophonic sound generation unit determines whether the extracted audio information was received from more channels than a reference number and, if so, mixes the extracted audio information using a loss concealment method to generate the stereophonic sound.

Preferably, the stereophonic sound generation apparatus further includes: a reception information storage unit that stores the audio information received on the specific channel, combining it with its generation time and with information about the specific channel; a decoding unit that decodes audio information selected from the stored audio information based on the network jitter; and a processing information backup unit that backs up the signal-processed audio information.

Preferably, the stereophonic sound generation apparatus further includes: a first gain adjustment unit that, once the signal-processed audio information has been divided into segments according to whether an audio signal is present, adjusts a first gain value of the segmented audio information to equalize the volume; and a second gain reflection unit that calculates a second gain value of the segmented audio information containing reverberation and delay components, and applies the first gain value and the second gain value to each piece of segmented audio information.

Preferably, the first gain adjustment unit includes: a group generation unit that generates audio information groups from the segmented audio information, excluding audio information that contains silence; a gain calculation unit that calculates a peak volume value for each audio information group and calculates a first gain value from the peak value and a reference value; and a gain comparison unit that compares the first gain values of two audio information groups adjacent in time order and, if the two values differ, uses a smoothing technique to adjust the first gain value of the later group based on the first gain value of the earlier group.
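The peak-based first gain and its smoothing across adjacent groups can be illustrated as follows. The formula `reference_peak / peak` and the exponential smoothing with factor `smoothing` are assumptions chosen for the sketch; the text names only "a peak value", "a reference value", and "a smoothing technique".

```python
def first_gains(groups, reference_peak=0.5, smoothing=0.8):
    """Compute one smoothed gain per non-silent audio group.

    groups: list of sample lists, in time order, with silence already removed.
    reference_peak and smoothing are assumed parameters, not from the source.
    """
    gains = []
    for group in groups:
        peak = max(abs(x) for x in group)
        gain = reference_peak / peak if peak > 0 else 1.0
        if gains and gain != gains[-1]:
            # pull the new gain toward the previous group's gain
            gain = smoothing * gains[-1] + (1.0 - smoothing) * gain
        gains.append(gain)
    return gains
```

Smoothing limits how abruptly the gain can change between neighbouring groups, which avoids audible jumps in volume at group boundaries.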

Preferably, the stereophonic sound generation unit generates a mixing control coefficient using the energy correlation between the two audio information groups, and mixes the audio information using the mixing control coefficient.

Preferably, the second gain reflection unit calculates the second gain value using an all-pass filter and a feedback comb filter.
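For reference, the textbook difference equations of the two filter types named above are sketched below: a feedback comb filter, y[n] = x[n] + g·y[n−D], and a Schroeder all-pass filter, y[n] = −g·x[n] + x[n−D] + g·y[n−D]. How the apparatus combines them to obtain the second gain value is not stated here, so the delay `D` and coefficient `g` are illustrative parameters only.

```python
def feedback_comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        fed_back = y[n - delay] if n >= delay else 0.0
        y.append(xn + g * fed_back)
    return y


def all_pass(x, delay, g):
    """Schroeder all-pass filter: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + x_delayed + g * y_delayed)
    return y
```

Fed a unit impulse, the comb filter produces a decaying train of echoes spaced `delay` samples apart, while the all-pass filter adds echo density without colouring the magnitude spectrum; chains of the two are a classic way to synthesize reverberation.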

The present invention also proposes a stereophonic sound reproduction system comprising: a stereophonic sound generation apparatus that includes a jitter estimation unit that, when audio information is received on a specific channel, estimates network jitter based on the reception time of the audio information currently received on the specific channel and the reception time of the audio information previously received on that channel, a signal processing unit that selectively processes the currently received audio information based on the network jitter, and a stereophonic sound generation unit that, from among the signal-processed audio information received from different channels, extracts the pieces of audio information generated at the same time based on the generation time of each piece and mixes the extracted audio information to generate stereophonic sound in real time; an audio information generation unit that records performances by different performers or instruments in different spaces at the same time and generates respective audio information; an encoding unit that encodes the generated audio information; an audio information transmission unit that divides the encoded audio information into packets and transmits them to the jitter estimation unit, and notifies the stereophonic sound generation unit of the generation time of the encoded audio information; and a stereophonic sound reproduction unit that reproduces the stereophonic sound in real time.

By mixing audio information obtained in different independent spaces based on the network jitter of each channel and the generation time of each piece of audio information, and by generating and reproducing stereophonic sound in real time, the present invention achieves the following effects.

First, an optimal weight for the jitter variance minimizes network jitter estimation error, and playout scheduling based on that estimate minimizes buffering delay and packet loss.

Second, when the performance sounds of the performers transmitted over the IP network are mixed into one, stereophonic sound with improved quality can be generated by equalizing the volume of each of the N channel audio signals and creating a diffusion effect for the audio frames, mixing the N channel audio signals into M output signals with timing adjusted to match the moment of performance, removing the volume saturation that mixing can cause, and harmonizing the individual performance sounds.

FIG. 1 is a conceptual diagram schematically showing a real-time stereophonic sound generation system for IP network-based multi-party independent performance according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram showing in detail the internal configuration of the packet control unit of the system shown in FIG. 1.
FIG. 3 is a conceptual diagram showing the internal configuration of the mixing and stereophonic sound generation unit of the system shown in FIG. 1.
FIG. 4 is a block diagram schematically showing a stereophonic sound reproduction system according to a preferred embodiment of the present invention.
FIG. 5 is a block diagram schematically showing the internal configuration of the stereophonic sound generation apparatus of the stereophonic sound reproduction system of FIG. 4.
FIG. 6 is a block diagram schematically showing the internal configuration of the jitter estimation unit of the stereophonic sound generation apparatus of FIG. 5.
FIG. 7 is a block diagram schematically showing an internal configuration that can be added to the stereophonic sound generation apparatus of FIG. 5.
FIG. 8 is a block diagram schematically showing the internal configuration of the signal processing unit of the stereophonic sound generation apparatus of FIG. 5.
FIG. 9 is a block diagram schematically showing an internal configuration that can be added to the stereophonic sound generation apparatus of FIG. 5.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note first that, in assigning reference numerals to the components in each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the invention. Preferred embodiments are described below, but the technical idea of the invention is not limited or restricted to them and may be modified and practiced in various ways by those skilled in the art.

The present invention relates to a method and apparatus for musical performances involving multiple performers in which the players of the instruments perform not in the same space but in separate, independent spaces; the music they play is transmitted in real time over an IP (Internet Protocol) network; and the receiving end mixes the performances received from each performer into one natural musical performance and generates and reproduces the mixed music as stereophonic sound.

The present invention includes: restoring music or audio packets with minimal delay by applying signal processing such as adaptive playout scheduling, packet loss concealment, and merging to the audio streams of the musical performance received over the IP network; mixing the audio streams sent by multiple performers in their independent spaces through multi-channel audio signal processing so that they form one natural musical performance; and generating stereophonic sound from the mixed audio stream through an array of multiple speakers.

The present invention mixes music performed in separate independent spaces into one natural piece of music, as if the performers had played in the same space, and generates stereophonic sound and provides it to the user. The acoustic parameters used to generate the stereophonic sound are stored as metadata, so that users can listen to the stereophonic sound in whatever way they wish.

FIG. 1 is a conceptual diagram schematically showing a real-time stereophonic sound generation system for IP network-based multi-party independent performance according to an embodiment of the present invention.

In the present invention, multiple performers do not perform in the same space. Instead, each performer plays in real time in an independent space and transmits the performance over an IP network, and the receiving end must generate stereophonic sound in real time from the received performances. This gives rise to several problems.

When music or audio data is transmitted over an IP-based packet network, packets travel along different paths rather than a single path, so the time at which each packet arrives at the receiving end is not constant. That is, network load can cause packets to be lost, to miss their deadline, or to arrive out of order. In addition, under dynamically changing network conditions or abnormal conditions such as spikes, buffering delay grows and sound quality degrades severely. This degradation also occurs in current VoIP-based voice calls and music streaming services, and a method is needed to address both the increased buffering delay and the loss of sound quality.

In a musical performance involving multiple performers, when each performer plays in an independent space, the performer does not hear the other performers directly but plays while listening to a performance in which the sounds of all performers, transmitted over the IP network, have been mixed into one. However, because the environments of the participants in different spaces differ, the audio volume of the performances they transmit differs, so that one performer sounds loud and another sounds quiet. Moreover, because the IP network delay differs for each performer and the signals of several channels are mixed, timing mismatches in combining the audio signals, volume imbalance, and volume saturation occur. As a result, the performance sounds do not blend and sound quality degrades severely. A method is therefore needed that matches the mixing timing of the N channel signals and harmonizes the channels by equalizing volume.

To produce stereophonic sound, the performers' audio signals input on N channels must be mixed into M channels so that the output signal suits the user's listening environment. Generating stereophonic sound requires gain adjustment that equalizes the volume of the audio signals, gain adjustment that reflects reverberation and delay, and a mixing scheme that applies the differences in energy ratio and the correlation between channels. A way to determine whether the multi-channel sound source received in real time forms stereophonic sound is also needed.

Audio streaming and stereophonic sound generation and reproduction have so far been researched and developed independently of each other. Unlike the case where stereophonic sound is generated from a sound source performed by multiple participants in the same space, the present invention receives the music that each performer plays in a separate space, generates a single musical performance in real time, and outputs it as stereophonic sound to give the user a stereophonic effect. A method is therefore needed for solving the various problems that arise when audio streaming and stereophonic sound generation are combined to generate stereophonic sound in real time.

To estimate jitter, which causes sound quality degradation, one approach transmits test packets over the network path and estimates the jitter from the timestamp and sequence information of the test packets. Another approach computes the mean and variance of the jitter from information on the packets arriving at the receiving end, multiplies the variance by a fixed jitter-variance weight, and adds the result to the jitter mean to estimate the network jitter.

To prevent call quality from degrading because of network delay and jitter, a playout method may measure the network traffic over a fixed time interval, compare it with a threshold, and apply one of expansion, compression, or normal output to the packets held at the receiving end, thereby reducing buffering delay. Alternatively, when the network delay suddenly increases so that the jitter buffer is expected to run empty (underflow), or suddenly decreases so that the packet capacity of the jitter buffer is expected to be exceeded (overflow), expansion or compression may be performed accordingly.

For packet loss concealment, which conceals packets that are lost in the network and never arrive at the receiving end, one method predicts the speech parameters of the lost packet from the previous, normally decoded packet and decodes the lost speech packet with the predicted parameters. Another method restores a lost speech frame from the frame information immediately preceding the loss, increasing the pitch of the previous speech frame step by step.

With these methods, however, buffering delay increases and sound quality degrades severely under dynamically changing network conditions or abnormal network conditions such as spikes. Sound quality can also be damaged by the signal processing itself, such as compression, expansion, loss concealment, and merging. A method is needed for solving these problems of increased buffering delay and degraded sound quality.

To solve the above problems, the real-time stereophonic sound generation system 100 for IP-network-based multi-party independent performance according to one aspect of the present invention proposes four improved schemes for a musical performance involving multiple performers: the music that each performer plays in a separate space is received over an IP network and mixed in a multi-channel manner to generate a single musical performance, and the generated performance is reproduced as stereophonic sound.

First, buffering delay at the receiving end grows, and sound quality degrades severely, when music performed in separate spaces passes through dynamically changing IP network conditions or abnormal conditions such as spikes. To eliminate this, the information of the packets arriving at the receiving end and a dynamic optimal weighting function of the jitter variance are used to quickly detect both the transition of the network from the normal state to the spike state and the transition from the spike state back to the normal state. This reduces the jitter estimation error for the next network state and maintains a balance between buffering delay and sound quality.

Because the jitter estimate is highly accurate, packets at the receiving end are never expanded; they are only compressed or output normally, which reduces buffering delay and improves sound quality. Compression is performed only in silence intervals, never in intervals where an audio signal is present, so that the balance between buffering delay and sound quality is maintained.

When packet loss occurs, the loss is classified as short-interval or long-interval and a separate packet loss concealment scheme is applied to each, which removes the resulting buzz sound and improves sound quality.

When a previous packet that was lost and restored is merged with the packet that has now been received normally, the merge with the restored packet is performed without distorting the normally received packet, which improves sound quality.

The adaptive playout scheduling scheme and the packet loss concealment scheme are tightly coupled with the mixing unit for stereophonic music generation, so that delay is kept as small as possible and stereophonic sound is generated in real time. When the packet control unit 130 is notified by the mixing unit, which mixes the N-channel audio signals, that the audio signals of two thirds of the N channels have arrived, the packet control unit 130 performs neither compression nor merging and outputs normally, using only short-interval packet loss concealment.

Second, mixing each performer's sound transmitted over the IP network into one gives rise to problems such as mismatched mixing timing, volume imbalance, volume saturation, poor blending of the individual performances, and degraded sound quality. To eliminate these, a run of consecutive audio frames that contain an audio signal (that is, not silence) is defined as a phrase. The gain of each phrase, obtained as the ratio of a volume reference value to the maximum peak value within that phrase, is applied to the audio phrase, which removes the volume imbalance caused by the differences between the performers' separate environments.
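
As an illustration only, the phrase gain described above (volume reference value divided by the maximum peak of the phrase) might be sketched in Python as follows. The function names, the use of plain lists of float samples, and the handling of an all-zero phrase are assumptions made for this sketch and are not taken from the specification.

```python
def phrase_gain(phrase, reference_peak):
    """Gain that scales the largest absolute sample of the phrase to reference_peak."""
    peak = max((abs(s) for s in phrase), default=0.0)
    if peak == 0.0:
        # Assumption: an all-zero phrase is left unchanged.
        return 1.0
    return reference_peak / peak


def equalize_phrase(phrase, reference_peak):
    """Apply the phrase gain to every sample of the phrase."""
    gain = phrase_gain(phrase, reference_peak)
    return [gain * s for s in phrase]


# Two performers recorded at different levels end up with the same peak.
quiet = equalize_phrase([0.125, -0.25, 0.0625], reference_peak=1.0)
loud = equalize_phrase([0.5, -1.0, 0.25], reference_peak=1.0)
```

In this sketch both `quiet` and `loud` reach a peak of 1.0 after equalization, which is the sense in which the per-phrase gain removes level differences between rooms.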

To prevent the audio signal distortion that an abrupt change in audio gain would cause, the gain is changed linearly over the amount of change obtained from the difference between the previous gain and the current gain, so that volume equalization is applied stably.
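
A minimal sketch of such a linear gain transition is given below. It assumes that the ramp runs across a single frame and reaches the new gain at the last sample; the specification does not state the ramp length, and the function name is hypothetical.

```python
def ramp_gain(frame, prev_gain, new_gain):
    """Scale a frame with a gain that moves linearly from prev_gain to new_gain."""
    n = len(frame)
    if n == 0:
        return []
    # Assumption: the whole gain change is spread evenly over this one frame.
    step = (new_gain - prev_gain) / n
    return [s * (prev_gain + step * (i + 1)) for i, s in enumerate(frame)]


# Going from gain 1.0 to gain 2.0 over four samples.
ramped = ramp_gain([1.0, 1.0, 1.0, 1.0], prev_gain=1.0, new_gain=2.0)
```

With a constant input of 1.0 the output rises in equal steps instead of jumping, so no discontinuity is introduced at the frame boundary.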

The audio gain computed independently for each channel is defined as the direct gain. A diffusion gain, to which the reverberation and delay produced by a pre-processing feed-forward filter and a feedback filter are added, is applied to the volume-equalized audio phrase to create a sense of stereophonic diffusion at the speakers or headphones.

The mixing timing is aligned so that there is no performance time offset among the N channels, by taking into account the generation time at the transmitting end, the network transmission delay, the signal processing time, and so on of the audio frames of the audio phrase input on each of the N channels. The musical performance audio streams transmitted from the N channels are then mixed into M audio channels using the energy ratios and correlation ratios among the N channels.

When the audio signals input from the channels are mixed, a uniform-gain mixing coefficient for preventing volume saturation is computed, based on a volume limit threshold and the mixing coefficients, as the ratio of the volume limit to the signal magnitude. Applying this coefficient to the audio phrase prevents volume saturation.
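
The saturation guard can be illustrated with the following sketch. It assumes that the signal magnitude is the peak of the weighted sum over the phrase and that one common scale factor (limit divided by peak) is applied only when the peak exceeds the limit; the names `mix_without_saturation`, `coeffs`, and `limit` are hypothetical.

```python
def mix_without_saturation(channels, coeffs, limit=1.0):
    """Mix equal-length channels with per-channel coefficients, then scale the
    result uniformly so that its peak never exceeds the volume limit."""
    mixed = [sum(c * x for c, x in zip(coeffs, samples))
             for samples in zip(*channels)]
    peak = max((abs(v) for v in mixed), default=0.0)
    # Assumption: the uniform coefficient is limit / peak, used only on overshoot.
    scale = limit / peak if peak > limit else 1.0
    return [scale * v for v in mixed], scale


out, scale = mix_without_saturation(
    [[0.5, 1.0], [0.5, 1.0]], coeffs=[1.0, 1.0], limit=1.0)
```

Because the same factor is applied to the whole phrase, the relative balance between the channels is preserved while clipping is avoided.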

The mixed audio signal is converted into an analog signal and amplified by an amplifier. The amplified signal is sent to the speakers of the stereophonic sound environment unit.

Third, to produce stereophonic sound, new virtual speakers are generated from the front, rear, left, and right signals of the mixed M-channel output and used as speaker output signals. The audio signals output from the placed speakers are captured through a multi-channel input audio device, and a listening test is carried out in units of L phrases to evaluate whether stereophonic sound is being generated properly. If it is judged to be generated properly, the currently applied parameters are applied to the stereophonic sound rendering model. If it is judged not to be generated properly, the diffusion gain, diffusion parameters, mixing gain, mixing parameters, and so on are changed and applied to newly input audio phrases.

As described above, the present invention provides the following effects.

First, the optimal weight of the jitter variance minimizes the network jitter estimation error, and playout scheduling based on this estimate minimizes buffering delay and packet loss.

Second, during loss concealment, consecutive loss intervals are detected and handled as either short-interval or long-interval loss concealment, and an attenuation scaling function is applied to the concealed audio frames. This suppresses the buzz sound caused by long-interval packet concealment and prevents damage to sound quality. In the merging step after loss concealment, a merging method that uses a subframe of the normally received audio frame prevents the normally received audio frame from being altered, which improves sound quality.

Third, in mixing each performer's sound transmitted over the IP network into one, stereophonic sound with improved quality is generated through volume equalization of the N channel audio signals and generation of a diffusion effect for the audio frames, mixing with timing adjusted to the performance time so that the N channel audio signals fit the M output signals, removal of the volume saturation that mixing can cause, and blending of the individual performances.

Fourth, to measure the effect of stereophonic sound generation, the audio output from the placed speakers is recorded, and the degree of stereophonic sound generation, obtained from a listening test and a visual simulation of the output sound waves, is fed back to the mixing engine so that improved stereophonic sound is output.

The invention is described below with reference to the drawings.

Referring to FIG. 1, the stereophonic sound generation system 100 according to the present invention comprises a music sound recording and encoding unit 110 for N channels of musical performance, a packet generation and transmission unit 120, a packet control unit 130, and a gain-controlled mixing and stereophonic sound generation unit 140.

When operation starts, each performer plays in real time in a separate space. The played audio is recorded by the music sound recording and encoding unit 110, encoded with an audio encoding scheme, divided into packets by the packet generation and transmission unit 120, and transmitted over the IP network to the packet control unit 130. In addition to carrying in each packet the time at which that packet was generated, the packet generation and transmission unit 120 separately transmits the time at which packets were generated at each transmitting end to the stereophonic sound generation unit 140 at the receiving end. Because the system is designed so that various audio codecs can be applied, the present invention places no restriction on the audio codec.

The audio played in each separate space corresponds to one of the N channels that form the input of the receiving end, that is, of the stereophonic sound generation unit 140. Audio packets are transmitted from several channels and input to the stereophonic sound generation unit 140, and packet control and audio gain control are performed independently on the audio packets input on each channel.

The audio packets input on each channel are passed to the packet control unit 130. The packet control unit 130 estimates the jitter on the IP network path and, at the same time, converts the packets into audio signals through audio decoding. Using the estimated jitter, packet loss concealment, and adaptive playout, it overcomes the problems caused by jitter and delay with minimal buffering delay and restores the audio signal in packet units. The configuration and operation of the packet control unit 130 are described in more detail with reference to FIG. 2.

The restored packet-unit audio signal is passed to the gain-controlled mixing and stereophonic sound generation unit 140. The gain-controlled mixing unit receives the N-channel input audio signals, applies an appropriate direct gain and delay to each channel to give the audio output signal a diffusion effect, and mixes the N-channel input audio signals so that they can be combined into M channels and output. The output M channels are mapped to the placed speakers to generate stereophonic sound. The configuration and operation of the gain-controlled mixing and stereophonic sound generation unit 140 are described in more detail with reference to FIG. 3.

FIG. 2 is a conceptual diagram showing in detail the internal configuration of the packet control unit 130 of the system shown in FIG. 1. The packet control unit 130 restores packets and makes them ready for playback through adaptive playout scheduling and packet loss concealment in an IP network environment.

Referring to FIG. 2, the packet control unit 130 comprises an audio jitter buffer 210, a network jitter estimation unit 205, an audio decoder 215, a silence interval detection unit 220, an audio frame storage unit 225, an audio signal processing decision unit 230, an audio frame compression and normal output unit 235, a loss concealment unit 245, a merging unit 240, an audio frame backup unit 250, and a receiving-end output device 255.

When operation starts, the audio packets generated at the transmitting end arrive through the communication module at the receiving end, that is, at the stereophonic sound generation unit. Audio packets that arrive at the receiving end are stored in the audio jitter buffer 210. With an RTP header, an arriving audio packet contains a timestamp, a sequence number, the encoded audio information, and so on.

While a received audio packet is being stored in the jitter buffer 210, the network jitter estimation unit 205 uses the audio packet information to estimate the network jitter of the next audio packet to be received. Jitter estimation distinguishes a normal state, in which the jitter variance does not change much and the jitter stays within a certain range of its mean, from a spike state, in which the jitter variance changes greatly and the jitter moves outside that range. The normal state is the initial state. As time passes, the change in network delay is measured from the arrival times of the packet that has just arrived at the receiving end and of the previous packet. If the measured change in network delay is greater than or equal to the spike detection threshold, a spike is judged to have occurred; if it is smaller than the threshold, the normal state is judged to continue.
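
The spike test can be sketched as follows. The sketch assumes that the change in network delay between two consecutive packets is the difference between their arrival spacing and their sending spacing, with the sending times taken from the packet timestamps; this definition, the millisecond units, and the function names are assumptions for illustration.

```python
def delay_change(prev_send, prev_arrival, send, arrival):
    """Change in one-way delay between two consecutive packets (same time unit)."""
    return (arrival - prev_arrival) - (send - prev_send)


def detect_spikes(packets, spike_threshold):
    """packets: list of (send_ms, arrival_ms) in arrival order.
    Returns one flag per packet after the first: True where a spike is detected."""
    flags = []
    for (ps, pa), (s, a) in zip(packets, packets[1:]):
        flags.append(delay_change(ps, pa, s, a) >= spike_threshold)
    return flags


# 20 ms packets; the third one is held up by about 109 ms.
trace = [(0, 50), (20, 71), (40, 200), (60, 221)]
flags = detect_spikes(trace, spike_threshold=100)
```

Only the packet whose delay jumps by at least the threshold is flagged. Leaving the spike state is governed by separate rules and is not handled in this sketch.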

When a spike is detected while the network is in the normal state, the largest sequence number among the packets received in the normal state before the spike is taken as the end point of the normal state, and the jitter mean and variance obtained in the normal state before the spike are stored. The stored mean and variance are used for spike detection once the network has returned to the normal state after the spike.

To detect the change from the spike state back to the normal state, the sequence number of the packet that arrived when the spike was detected is defined as the spike end point, and the number of the packet that arrived before it is defined as the spike start point. Once every packet between the spike start point and end point has arrived, all packets that were held up by the spike are considered to have arrived, and the spike is judged to have ended and the normal state to have been restored.

If a packet from beyond the spike end point arrives, the spike end point is redefined as the sequence number of that packet, the spike interval is judged to be still in progress, and the spike interval is extended. When every packet up to the end of the extended spike interval has arrived, the spike is judged to have ended and the normal state to have been restored.

To avoid errors when packets are lost during a spike, so that the packets of the spike interval can never all arrive, a spike interval detection threshold is defined from the change in delay at spike detection, and the maximum spike duration is predicted from the range of the delay change at the start of the spike. If the normal state has not been restored by the time the maximum spike duration has elapsed, the network is treated as being in the normal state from then on.
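
The bookkeeping for the end of a spike (start point, end point, extension of the end point, and the maximum duration fallback) can be sketched as below. The class name and its interface are hypothetical, and the maximum duration is simply passed in as a number; how it is derived from the delay change is not modeled here.

```python
class SpikeTracker:
    """Tracks which packets of a spike interval are still outstanding."""

    def __init__(self, start_seq, end_seq, detected_at_ms, max_duration_ms):
        self.end_seq = end_seq
        # Packets strictly between the start point and the end point are awaited.
        self.missing = set(range(start_seq + 1, end_seq))
        self.deadline_ms = detected_at_ms + max_duration_ms

    def on_packet(self, seq, now_ms):
        """Register an arrival; return True once the spike is considered over."""
        if seq > self.end_seq:
            # A later packet arrived: the spike is still in progress, extend it.
            self.missing.update(range(self.end_seq + 1, seq))
            self.end_seq = seq
        else:
            self.missing.discard(seq)
        timed_out = now_ms >= self.deadline_ms
        return not self.missing or timed_out


tracker = SpikeTracker(start_seq=10, end_seq=14, detected_at_ms=0, max_duration_ms=500)
history = [tracker.on_packet(seq, now) for seq, now in
           [(11, 10), (16, 20), (12, 30), (13, 40), (15, 50)]]
```

In this trace the arrival of packet 16 extends the interval, and the spike ends only when packet 15, the last outstanding one, arrives. If a packet were lost for good, the deadline would end the spike instead.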

On the transition from the spike state to the normal state, the normal-state jitter mean and variance that were temporarily stored at spike detection are restored, so that normal-state jitter estimation proceeds without being affected by the jitter computed during the spike.

In the normal state, the next network jitter is estimated by adding, to the jitter mean so far, the current jitter variance (which predicts the jitter change) multiplied by the optimal weight of the jitter variance.

The optimal weight of the jitter variance, the key factor in minimizing jitter error, is the weight that minimizes the difference between the previously estimated jitter and the jitter that actually occurred. If the previous optimal weight of the jitter variance is equal to the previously computed jitter variance, the jitter is judged not to have changed and the optimal weight is set to the previous maximum weight. If the weight has changed, the ratio of the current jitter minus the previous jitter mean to the previous jitter variance is applied as the optimal weight of the jitter variance.
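
The normal-state estimate and the ratio used for the weight can be written down in a few lines. In the sketch below the fallback to the previous maximum weight is triggered by a zero variance, which is an assumption chosen to avoid a division by zero and only stands in for the "no change" test of the specification; the function names are hypothetical, and the way the mean and variance are updated is not shown.

```python
def optimal_weight(current_jitter, prev_mean, prev_var, prev_max_weight):
    """Weight for which prev_mean + weight * prev_var equals the observed jitter."""
    if prev_var == 0:
        # Assumption: fall back to the previous maximum weight.
        return prev_max_weight
    return (current_jitter - prev_mean) / prev_var


def estimate_next_jitter(mean, var, weight):
    """Normal-state estimate: jitter mean plus weighted jitter variance."""
    return mean + weight * var


w = optimal_weight(current_jitter=30.0, prev_mean=20.0, prev_var=4.0,
                   prev_max_weight=2.0)
estimate = estimate_next_jitter(mean=20.0, var=4.0, weight=w)
```

With this weight the estimate built from the previous mean and variance reproduces the jitter that was actually observed, which is the sense in which the weight minimizes the previous estimation error.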

In the spike state, the abnormal network condition causes packets to go missing or to arrive out of order. Therefore, when the normal state is restored after the spike, the jitter mean, the jitter variance, and the optimal weight of the jitter variance for the spike interval are updated so that the system adapts effectively to the network conditions.

To deliver received packets to the output device, the audio decoder 215 requests an audio packet from the jitter buffer 210. On receiving the request, the jitter buffer 210 refers to the network jitter estimated by the network jitter estimation unit 205 and to the audio packet information it holds, selects the audio packet to be delivered, and passes it to the audio decoder 215.

The audio decoder 215 decodes the audio packet it receives to generate an audio frame containing the audio signal and passes the frame to the silence interval detection unit 220.

The silence interval detection unit 220 measures the time-domain energy of the audio frame generated by audio decoding to distinguish intervals where an audio signal is present from silence intervals. The audio frame is then passed to the audio frame storage unit 225.
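
A simple energy-based silence test is sketched below for illustration. The mean-square energy measure, the default threshold, and the function names are assumptions; the specification only states that time-domain energy is measured.

```python
def frame_energy(frame):
    """Mean-square energy of a frame of float samples."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def is_silence(frame, energy_threshold=1e-4):
    """Treat a frame as silence when its energy falls below the threshold."""
    return frame_energy(frame) < energy_threshold


silent = is_silence([0.0] * 160)
active = is_silence([0.5, -0.5] * 80)
```

In practice the threshold would have to be tuned to the noise floor of each performer's room.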

From the frames it holds, the audio frame storage unit 225 passes the audio frame whose sequence number is due to be delivered to the output device to the audio signal processing decision unit 230, which decides which one of compression and normal output, loss concealment, or merging is to be performed on the frame.

If an audio frame is present in the audio frame storage unit 225 and the previous audio frame was not loss-concealed by the loss concealment unit 245, the audio frame is passed to the audio frame compression and normal output unit 235.

If, however, no audio frame to be output is present in the audio frame storage unit 225, the loss concealment unit 245 generates a loss concealment signal so that a signal can still be output.

If an audio frame is present in the audio frame storage unit 225 and the previous audio frame was loss-concealed by the loss concealment unit 245, the audio frame is passed to the merging unit 240 to remove the discontinuity between it and the loss-concealed audio frame.
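
The three-way decision made by the audio signal processing decision unit 230 can be summarized in a small function. The function name and the string labels are hypothetical and serve only to illustrate the branching.

```python
def choose_processing(frame_available, previous_was_concealed):
    """Pick the processing path for the frame that is due for output."""
    if not frame_available:
        return "conceal"             # loss concealment unit 245
    if previous_was_concealed:
        return "merge"               # merging unit 240
    return "compress_or_normal"      # compression and normal output unit 235


paths = [choose_processing(True, False),
         choose_processing(False, False),
         choose_processing(True, True)]
```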

The audio frame compression and normal output unit 235 distinguishes two cases: the IP network switching from the spike state to the normal state, and the normal state being maintained.

When the network switches from the spike state to the normal state, the audio packets that suddenly arrive after the spike would increase buffering delay. To prevent this, consecutive silence intervals stored in the jitter buffer 210 and the audio frame storage unit 225 are located and compressed.

If the decoded audio frame is a silence interval and the current state is judged not to be a transition point from the spike state to the normal state, the currently estimated jitter and jitter variance are compared with an overflow threshold to decide whether the silence interval about to be output is compressed or output normally. If the estimated jitter and jitter variance are smaller than the overflow threshold, the network is predicted to be running smoothly with little jitter fluctuation, and normal output is performed. If they are larger than the overflow threshold, the ratio of the estimated jitter to the length of the audio frames held at the receiving end is examined to decide between compression and normal output under the unstable network condition.

If the ratio of the estimated jitter to the length of the audio frames held at the receiving end is smaller than the normal output threshold, compression is performed. If it is larger than the threshold, normal output is performed.

However, if the decoded audio frame is a silence interval and the current state is judged to be a transition point from the spike state to the normal state, normal output is performed to prevent erroneous compression.

If the audio frame about to be output is not a silence interval but an interval where an audio signal is present, it is output normally without compression. This prevents the distortion of the audio signal, and the resulting loss of call quality, that compression in such intervals would cause.
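
The per-frame choice between compression and normal output can be sketched as a chain of checks. The sketch assumes that both the jitter and its variance must be below the overflow threshold for the network to count as smooth, and that an empty buffer leads to normal output; these readings, the parameter names, and the function name are assumptions. Compression of buffered silence runs at the spike-to-normal transition itself is not modeled.

```python
def playout_action(is_silence, at_transition, jitter, jitter_var,
                   buffered_ms, overflow_threshold, normal_output_threshold):
    """Return "compress" or "normal" for the frame that is due for output."""
    if not is_silence:
        return "normal"      # never compress where an audio signal is present
    if at_transition:
        return "normal"      # avoid erroneous compression at the transition point
    if jitter < overflow_threshold and jitter_var < overflow_threshold:
        return "normal"      # smooth network, little jitter fluctuation
    if buffered_ms <= 0:
        return "normal"      # assumption: nothing buffered, nothing to compress
    ratio = jitter / buffered_ms
    return "compress" if ratio < normal_output_threshold else "normal"


actions = [
    playout_action(False, False, 50, 50, 200, 10, 0.5),
    playout_action(True, True, 50, 50, 200, 10, 0.5),
    playout_action(True, False, 5, 5, 200, 10, 0.5),
    playout_action(True, False, 50, 50, 200, 10, 0.5),
    playout_action(True, False, 50, 50, 60, 10, 0.5),
]
```

Only the fourth call compresses: the frame is silent, the network is unstable, and the buffer is long relative to the estimated jitter.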

The loss concealment unit 245 performs short-term loss concealment if the number of consecutive loss concealments is smaller than the sound quality degradation threshold, and long-term loss concealment otherwise.

When short-term loss concealment is performed, the unit checks whether an audio frame exists in the audio frame storage unit 225. If no audio frame exists in the audio frame storage unit 225, a replacement signal for the lost audio frame is generated from the audio frame that was normally received before the loss and is held in the audio frame backup unit 250.

If no audio frame exists in the audio frame storage unit 225 but an audio packet to be output exists in the jitter buffer 302, the audio decoder 215 is asked to decode the packet present in the jitter buffer 210 into an audio frame, which is stored in the audio frame storage unit 225. The concealment signal is then generated from the audio frame normally received before the loss and the audio frame normally received after the loss.

When a long-term packet loss occurs, the lost interval is replaced repeatedly using only the audio frame normally received before the loss, as in the short-term case. The repeated identical or similar signal produces a buzz sound that sharply degrades sound quality. To remove this buzz sound, the present invention uses the pre-loss audio frame as it is when that frame is silence. When the frame is judged to be an audio-present interval, the loss is concealed with the previous audio frame and, at the same time, a steeply and linearly decreasing attenuation scale function is applied directly to the restored audio frame.
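As a rough illustration of the long-term case, the sketch below repeats the pre-loss frame and, for audio-present frames, multiplies it by a gain that falls linearly to zero. The function name `conceal_long_loss` and the `decay_frames` parameter are hypothetical, and the slope is an arbitrary assumption; the text does not give the actual scale function.

```python
def conceal_long_loss(prev_frame, lost_count, prev_is_silence, decay_frames=3):
    """Build `lost_count` substitute frames from the last good frame.

    Silent frames are repeated unchanged. Audio-present frames are
    repeated under a gain that decreases linearly from 1 to 0 over
    `decay_frames` frames (assumed slope), which suppresses the buzz
    caused by repeating an identical signal.
    """
    frame_len = len(prev_frame)
    total = decay_frames * frame_len      # samples until the gain reaches 0
    out = []
    for k in range(lost_count):
        if prev_is_silence:
            out.append(list(prev_frame))
            continue
        frame = []
        for n, x in enumerate(prev_frame):
            pos = k * frame_len + n       # sample index since the loss began
            gain = max(0.0, 1.0 - pos / total)
            frame.append(x * gain)
        out.append(frame)
    return out
```

Running the gain across frame boundaries, rather than restarting it per frame, keeps the attenuation continuous over the whole lost interval.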

The silence and audio-signal attenuation scale functions used in short-term and long-term packet loss concealment are buzz sound removal functions derived by measuring the listening effect on 20 listeners.

The merging unit 240 obtains the loss-concealed audio frame and, at the same time, obtains the normally received audio frame from the audio frame storage unit 225. To prevent distortion during merging, a subframe is extracted from the normally received audio frame to form a replacement signal for overlap-add. The replacement signal is used to search the loss-concealed audio signal for a similar interval, and overlap-add of the loss-concealed audio frame and the replacement signal is performed at the interval found. The audio frame whose discontinuity has been removed by this overlap-add is then joined to the normally received audio frame to produce the final merged signal.
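The merging steps can be sketched as below. This is an illustrative reading, not the patent's algorithm: the name `merge_frames`, the squared-error similarity measure, the linear cross-fade weights, and the decision to drop concealed samples after the matched interval are all assumptions.

```python
def merge_frames(concealed, received, sub_len):
    """Join a loss-concealed frame to the next normally received frame.

    Assumes len(concealed) >= sub_len and len(received) >= sub_len.
    """
    # Replacement signal: a subframe taken from the start of the good frame.
    template = received[:sub_len]

    # Search the concealed frame for the interval most similar to it.
    best_pos, best_err = 0, float("inf")
    for pos in range(len(concealed) - sub_len + 1):
        err = sum((concealed[pos + i] - template[i]) ** 2 for i in range(sub_len))
        if err < best_err:
            best_pos, best_err = pos, err

    # Keep the concealed samples before the match, then overlap-add.
    merged = list(concealed[:best_pos])
    for i in range(sub_len):
        w = (i + 1) / (sub_len + 1)       # fade-in weight for the good frame
        merged.append((1 - w) * concealed[best_pos + i] + w * template[i])

    # The rest of the good frame is appended without modification.
    merged.extend(received[sub_len:])
    return merged
```

Only the overlap region is altered, so the samples of the normally received frame after the subframe pass through unchanged.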

The adaptive playout scheduling scheme and the packet loss concealment scheme are, together with jitter estimation, closely tied to the mixing unit that generates the stereophonic music, so that the overall delay is kept as small as possible and stereophonic sound can be generated in real time. Specifically, when the packet control unit 130 is notified by the mixing unit, which mixes the N-channel audio signals, that the audio signals of two thirds of the N channels have already arrived, the packet control unit 130 skips the compression and merging processes and outputs normally using only short-term packet loss concealment. This minimizes delay and allows the mixing to proceed effectively.
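A minimal sketch of the two-thirds rule follows. The function name `channels_to_fast_path` is hypothetical, and reading "2/3" as "at least two thirds of the N channels" is an assumption.

```python
def channels_to_fast_path(arrived, n_channels):
    """Return the channels whose packet control unit should skip
    compression and merging and use short-term concealment only.

    `arrived` holds the indices of channels whose audio has already
    reached the mixing unit.
    """
    arrived = set(arrived)
    # Integer comparison avoids floating-point issues with 2/3.
    if 3 * len(arrived) >= 2 * n_channels:
        return sorted(set(range(n_channels)) - arrived)
    return []
```

For six channels, the rule triggers once four have arrived, and the two late channels are the ones told to take the low-delay path.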

An audio frame that has passed through one of the signal processing stages, namely the audio frame compression and normal output unit 235 (driven by network jitter), the loss concealment unit 245 or the merging unit 240 (driven by packet loss), is delivered to the mixing and stereophonic sound generation unit 140, which uses gain control. At the same time, the audio frame is stored in the audio frame backup unit 250.

The characteristics of the audio frame compression and normal output unit 235 are described further below.

- Problem that occurs

In live audio streaming over mobile VoIP, when a congested network suddenly improves, a large number of packets arrive at the receiving end at once and pile up, causing overflow. When a healthy network suddenly becomes congested, the audio frames available for output at the receiving end run out, causing underflow.

- Problems with the existing technology

In playout scheduling based on conventional jitter estimation, large jitter estimation errors occur when the network changes dynamically or spikes. The error is especially large in the interval where the network moves from the normal state into a spike and in the interval where it returns from the spike to the normal state, so the buffering delay for playout scheduling is not adjusted properly. In other words, a playout scheduling scheme that compresses using only an inaccurate jitter estimate and the length of the audio frames present at the receiving end cannot immediately reflect dynamic network changes, which increases buffering delay and causes packet loss due to delay.

The procedure is divided into the following two cases.

First, when returning from a spike to the normal state

Audio packets first arrive suddenly and the buffering delay increases. Compression is then performed in the non-audio intervals of the jitter buffer 210 and the output audio frame storage unit 225. If a non-audio interval is not present in the output audio frame storage unit 225 but is present in the jitter buffer 210, decoding is requested to fill the output audio frames, and the presence of audio is judged again. If the interval is non-audio, it is compressed.

Second, normal output for audio intervals

If the frame about to be output is non-audio, the jitter information is used to judge whether the network is at the transition point from a spike back to the normal state. Then, for the non-audio interval, the jitter and jitter variance are compared with the overflow threshold. If they indicate overflow, the network is judged to be unstable with large jitter changes and compression is performed; otherwise the frame is output normally.

If the ratio of the estimated jitter to the length of the frames stored as output audio frames is smaller than the normal output threshold, it is judged that the receiving end holds enough audio frames to output normally until the next packet arrives, and normal output is performed.

Next, the characteristics of the loss concealment unit 245 and the merging unit 240 are described further.

- Cause

When packets are lost because of a spike, loss concealment is performed. When a normal frame arrives after concealment, merging is performed.

- Existing problem

If the conventional method is used over a run of consecutive losses, loss-concealed packets are generated repeatedly, which produces a buzz sound and degrades sound quality.

- Procedure

Loss concealment is broadly divided into short-term packet loss concealment and long-term packet loss concealment.

1. Distinguishing short-term loss from long-term loss

If the number of consecutive loss concealments is below the threshold, the loss is classified as short-term; if it is above the threshold, as long-term.

2. Short-term loss concealment

The unit checks whether an audio frame following the loss exists in the output audio frame storage unit.

1) If no post-loss audio frame exists in the output audio frame storage unit and no normal post-loss packet exists in the jitter buffer 210 either, the only audio frame available for restoring the lost audio frame is the one normally received before the loss, held in the audio frame backup unit 250.

The lost packet is concealed using this audio frame normally received before the loss.

During concealment, the unit determines whether the normally received audio frame is voiced or unvoiced.

If it is unvoiced, the unvoiced erasure scaling function is applied. If it is voiced, the scale value is held for a set interval, after which the voiced attenuation scale function is applied to conceal the lost frame.

2) If no post-loss audio frame exists in the output audio frame storage unit 225 but an audio packet to be output exists in the jitter buffer 210, the decoder 215 is asked to decode the audio packet so that the lost audio frame can be restored.

On request, the decoder decodes the audio packet in the jitter buffer 210 and stores the result in the output audio frame storage unit.

The loss is concealed using the post-loss frame in the output audio frame storage unit and the pre-loss audio frame in the audio frame storage unit 225.

3) If an audio frame exists in the output audio frame storage unit, the loss is concealed without any request to the decoder 215, using the pre-loss normally received audio frame in the audio frame backup unit 250 and the post-loss audio frame among the output audio frames.
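The classification in step 1 and the three short-term cases in step 2 can be summarized in code. This is only a sketch of the control flow: the function names, the string labels, and the returned dictionary layout are hypothetical, chosen for the example.

```python
def classify_loss(consecutive_concealments, quality_threshold):
    """Short-term below the threshold, long-term otherwise."""
    return "short" if consecutive_concealments < quality_threshold else "long"


def short_term_sources(store_has_next, jitter_buffer_has_next):
    """Pick the frames used to conceal a short-term loss.

    The pre-loss frame always comes from the backup unit. The
    post-loss frame comes from the output frame storage when it is
    available, possibly after asking the decoder to decode a packet
    still waiting in the jitter buffer.
    """
    if store_has_next:                     # case 3: no decoder request
        return {"previous": "backup", "next": "storage", "decode_first": False}
    if jitter_buffer_has_next:             # case 2: decode, then use both
        return {"previous": "backup", "next": "storage", "decode_first": True}
    # case 1: only the pre-loss frame is available
    return {"previous": "backup", "next": None, "decode_first": False}
```

Checking the frame storage before the jitter buffer means the decoder is only invoked when a post-loss frame is not already decoded.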

FIG. 3 is a conceptual diagram showing the internal configuration of the mixing and stereophonic sound generation unit 140 of the system shown in FIG. 1. In this embodiment, the mixing and stereophonic sound generation unit 140 uses gain control.

Referring to FIG. 3, the mixing and stereophonic sound generation unit 140 comprises an audio volume gain adjustment unit 305, an audio diffusion adjustment unit 310, an audio mixing unit 325, a metadata generation unit 330, a rendering model generation unit 320, an amplifier 335, a stereophonic environment unit 345, an input device 340, and a stereophonic environment measurement unit 315.

When operation begins, audio gain adjustment is performed independently on the audio frames input on each of the N channels.

The audio volume gain adjustment unit 305 aims first of all to equalize volume, given that music signals performed in separate spaces vary in loudness. Through the silence interval detection unit 304 of the packet control unit 130, it receives information identifying the intervals in which an audio signal is present rather than silent, defines F consecutive audio frames as a phrase, and searches the samples of each audio phrase for the peak, that is, the highest value. By comparing the peak values of the previous and current audio frames it finds the maximum peak per phrase, and at the same time it stores the energy per phrase. From the maximum peak of a phrase, the gain for that phrase is calculated as the ratio of the volume reference value to the maximum peak. If the gain obtained for one phrase differs from the gain of the phrase that follows, the gains are smoothed across the phrases and the smoothed value is applied as the gain of the next phrase, which preserves the continuity of the audio signal. The adjusted gain is applied to the restored, packet-unit audio frame signal to remove the volume imbalance caused by differences between environments. Because the volume reference value is applied during equalization, volume saturation does not normally occur. Should saturation of the audio signal nevertheless occur, the audio gain is reduced in proportion to the degree of saturation. To prevent the audio distortion that an abrupt gain change would cause, the change in gain is calculated from the difference between the previous gain and the current gain. If the gain has changed, it is varied linearly across that difference as it is applied to the phrase-unit audio signal. If the gain has not changed, the current gain is applied to the audio signal as it is.
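The per-phrase gain and the linear gain transition can be sketched as follows. The function names `phrase_gain` and `apply_gain_ramp` are hypothetical, and ramping over the full length of the signal passed in is an assumption; the text only states that the gain changes linearly.

```python
def phrase_gain(phrase, reference_level):
    """Gain for one phrase: volume reference value / maximum peak.

    `phrase` is a list of frames, each a list of samples.
    """
    peak = max(abs(x) for frame in phrase for x in frame)
    return reference_level / peak if peak > 0 else 1.0


def apply_gain_ramp(samples, prev_gain, new_gain):
    """Apply a gain that moves linearly from prev_gain to new_gain.

    When the two gains are equal this reduces to a constant gain.
    """
    n = len(samples)
    if n == 0:
        return []
    step = (new_gain - prev_gain) / n
    return [x * (prev_gain + step * (i + 1)) for i, x in enumerate(samples)]
```

For example, a phrase whose largest absolute sample is 0.5 gets a gain of 2.0 against a reference level of 1.0, and moving from gain 1.0 to 2.0 over four samples applies 1.25, 1.5, 1.75 and 2.0 in turn.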

Using the energy calculated per phrase for the audio frames input on each channel, the energy ratio and the correlation among the N channels coming from the performers are calculated before gain adjustment. The resulting inter-channel energy ratio and correlation ratio are applied to the mixing in the audio mixing unit 325; how they are applied is described under the audio mixing unit 325. The audio gain used in the audio volume gain adjustment unit 305 may serve as a volume control element with which a system engineer or user adjusts the music performed in and transmitted from each separate space, or as metadata.

The audio diffusion adjustment unit 310 defines the audio gain calculated independently for each channel as the direct gain. It applies a pre-processing filter to the audio frames received from the volume gain adjustment unit 305 to remove high-frequency energy, and generates a diffusion gain to which reverberation and delay are added by an all-pass filter and a feedback comb filter. The feedback comb filter disperses the signal and produces a delay that depends on the reverberation frequency. Applying the diffusion gain generates varied reverberation and delay components in the audio signal, which produces the diffusion effect. The diffusion gain and the direct gain are applied to the audio phrase, which is then output. The diffusion gain is a deliberate audio processing element that is combined with the direct gain to give speakers or headphones a sense of stereophonic sound radiating outward. The diffusion gain used in the audio diffusion adjustment unit 310 may serve as a volume control element adjusted by a system engineer or user, or as metadata.
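For orientation, the two filter types named above have well-known textbook forms (as used in Schroeder-style reverberators), sketched below. The delay lengths, the coefficient `g`, and the way `diffuse` combines the direct and diffuse paths are assumptions for illustration; the text does not specify them, and the high-frequency pre-filter is omitted.

```python
def feedback_comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        fb = y[n - delay] if n >= delay else 0.0
        y.append(xn + g * fb)
    return y


def allpass(x, delay, g):
    """All-pass filter: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + xd + g * yd)
    return y


def diffuse(x, direct_gain, diffusion_gain, delay=2, g=0.5):
    """Mix the dry signal with a comb + all-pass processed copy."""
    wet = allpass(feedback_comb(x, delay, g), delay, g)
    return [direct_gain * d + diffusion_gain * w for d, w in zip(x, wet)]
```

Feeding a unit impulse through `feedback_comb` with a delay of 2 and `g` of 0.5 yields decaying echoes of 1, 0.5 and 0.25 spaced two samples apart, which is the repeated, delayed energy the text describes as reverberation.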

The audio mixing unit 325 uses the packet generation time received from the packet dividing and transmitting unit 120, the sender-side generation time of the audio frames of the audio phrase input on each of the N channels, the network transmission delay, the signal processing time, and similar information to align timing so that there is no performance timing error among the N channels. It then mixes the music performance audio streams transmitted on the N channels into M audio channels. If packets for two thirds of the N channels have already arrived at the audio mixing unit 325, it asks the packet control unit 130 of each channel that has not yet arrived to apply only simple short-term packet loss concealment and output normally, so that the delay is not prolonged and the N-channel audio signals can be mixed in a multi-channel manner in the mixing unit.

The audio mixing unit 325 generates mixing control coefficients that form a set of M output channels, so that the performance can be reproduced in the stereophonic environment unit 345, and mixes the audio signals of the N audio input channels accordingly. The mixing control coefficients are generated from the inter-channel energy ratio and correlation ratio obtained in the volume gain adjustment unit 305.

Because mixing the audio signals input on the channels can lead to volume saturation, a uniform-gain mixing coefficient is applied to prevent it. That is, when the mixing coefficient obtained in the k-th phrase is applied to the magnitude of the i-th audio signal and the result is below the volume limit threshold, the mixing coefficient is judged to be usable as it is. The mixing coefficient obtained in the k-th phrase is applied uniformly to the incoming audio signal until the mixing coefficient is calculated in the (k+1)-th phrase. Conversely, when applying the mixing coefficient to the magnitude of the i-th audio signal exceeds the volume limit threshold, volume saturation is judged to occur, and a uniform-gain mixing coefficient is calculated as the ratio of the volume limit to the signal magnitude and applied, which prevents the saturation.
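The saturation check reduces to a small function. The name `safe_mixing_coefficient` is hypothetical, and treating the "volume limit threshold" and the "volume limit" as one value is an assumption made for this sketch.

```python
def safe_mixing_coefficient(signal_peak, coeff, limit):
    """Return the mixing coefficient to apply to one audio signal.

    If the scaled signal stays within the limit, the coefficient
    from the current phrase is used unchanged. Otherwise it is
    replaced by limit / signal_peak so the scaled peak equals the
    limit and saturation is avoided.
    """
    if signal_peak * coeff <= limit:
        return coeff
    return limit / signal_peak
```

With a limit of 1.0 and a coefficient of 1.5, a signal peaking at 0.5 keeps its coefficient (scaled peak 0.75), while a signal peaking at 0.8 would reach 1.2 and therefore gets the reduced coefficient 1.0 / 0.8.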

To prevent the audio signal distortion caused by changes in the mixing coefficient, if the mixing coefficient applied to the (i-1)-th audio signal differs from the mixing coefficient to be applied to the i-th audio signal, the mixing coefficient is varied linearly, sample by sample, over the whole of the i-th audio signal as it is applied.

The mixing control coefficients used in the audio mixing unit 325 may serve as a volume control element adjusted by a system engineer or user, or as metadata.

The mixed audio signal is converted into an analog signal and amplified by the amplifier 335. The amplified signal is sent to the speakers of the stereophonic environment unit 345.

The stereophonic environment unit 345 is defined by the number, arrangement, and positions of the speakers. The number of M channels output from the audio mixing unit 325 need not match the actual number of speakers. New virtual speakers can be created from the front, rear, left, and right signals of the M output channels and used as speaker output signals. The number, arrangement, azimuth information, height information, level information, and similar properties of the speakers used in the stereophonic environment unit 345 may be used as metadata.

The stereophonic environment measurement unit 315 receives, through a multi-channel input audio device, the audio signals output from the speakers of the stereophonic environment unit 345. Expert listening tests are carried out on the received audio signal in units of L phrases to evaluate whether stereophonic sound is being generated properly. If it is judged to be generated properly, the parameters currently in use are applied to the stereophonic rendering model. If it is judged not to be generated properly, feedback is sent to the volume gain adjustment unit 305, the diffusion gain adjustment unit 310, the audio mixing unit 325, and so on, so that the volume gain, the diffusion gain and diffusion parameters, and the mixing gain and mixing parameters are changed and applied to newly input audio phrases.

The stereophonic rendering model organizes the correlations among the stereophonic parameters that satisfy the stereophonic listening test into an ontology map. The ontology map, together with the results of a visual sound-field simulation of the speaker output, is used to confirm whether the stereophonic model is being applied properly. The ontology map represents the optimized correlations among the generated stereophonic parameters and makes it possible to distinguish levels of stereophonic sound generation by correlation.

The metadata generation unit 330 stores as metadata the parameters generated in the mixing and stereophonic sound generation unit 140 that uses gain control. The stored metadata can be applied in a stereophonic sound reproduction system.

The features of the stereophonic sound generation system 100 described above with reference to FIGS. 1 to 3 are summarized as follows.

When the sound performed by a player in each separate space is transmitted over an IP network and stereophonic sound is generated at the receiving end, the stereophonic sound generation system 100 uses a jitter estimation scheme that applies an optimal weighting function of the jitter variance, so as to reduce buffering delay and improve audio quality entirely through signal processing on the receiving side.

The stereophonic sound generation system 100 uses a playout scheduling scheme based on jitter estimation that reduces jitter estimation error.

The stereophonic sound generation system 100 uses a structure in which the decoder and the loss concealment unit are separated, so that the method for restoring lost audio frames is applicable regardless of the decoder type, and it uses packet loss concealment and merging schemes that depend on the packet loss interval and on the result of silence interval detection on the received packets.

The stereophonic sound generation system 100 detects the spike state, in which network jitter changes rapidly, using the header information of the packets received by the receiving device and the corresponding optimal jitter variance weight, and uses the detected spike to play back the audio frames in sequence.

In a rapidly changing network environment, the stereophonic sound generation system 100 divides the network state into a spike state and a normal state, and minimizes jitter estimation error using the mean and variance of the current network jitter for each state together with the optimal jitter variance weighting function.

To detect the change of the network from the normal state to the spike state, the stereophonic sound generation system 100 stores information on the normally received packets when a spike is detected, and uses the stored packet information to detect the change of the network from the spike state back to the normal state.

When a spike is detected, the stereophonic sound generation system 100 temporarily stores the jitter mean and variance of the preceding normal state. When the network returns from the spike state to the normal state, it uses the stored mean and variance to estimate the normal-state jitter without being affected by the jitter that changed during the spike.

The stereophonic sound generation system 100 calculates the error between the estimated jitter and the actual jitter in the normal state and in the spike state of each network to obtain the optimal jitter variance weight.

The stereophonic sound generation system 100 performs playout scheduling effectively by using the jitter produced by the jitter estimation scheme together with the lengths of the audio frames and audio packets present at the receiving end.

In playout scheduling, the stereophonic sound generation system 100 decides only between compression and normal output, and excludes expansion.

When compressing, the stereophonic sound generation system 100 distinguishes the case in which the network is returning from a spike to the normal state from the case in which the normal state is being maintained, and compresses accordingly.

When compressing, the stereophonic sound generation system 100 distinguishes intervals in which an audio signal is present from silence intervals. It does not compress intervals in which an audio signal is present and compresses only intervals of silence, which prevents degradation of sound quality.

The stereophonic sound generation system 100 performs normal output at the transition point from a spike to the normal state to prevent errors there.

When compressing while the network is returning from a spike to the normal state, the stereophonic sound generation system 100 finds consecutive silence intervals in the audio frame buffer and the jitter buffer and compresses them to reduce buffering delay.

입체 음향 생성 시스템(100)은 패킷 손실 은닉시 연속된 패킷 손실을 인지하고 버즈 사운드를 제거하기 위하여 단구간 및 장구간 손실 은닉으로 구분하여 은닉을 수행한다.Stereophonic sound generation system 100 recognizes consecutive packet losses when concealing packet loss and performs concealment in such a manner as to discriminate short-term and long-term loss concealances in order to eliminate buzz sounds.

When the consecutive packet loss is short-term, the stereophonic sound generation system 100 restores the lost audio frame using only the audio frame preceding the loss if no audio frame following the lost frame exists. If an audio frame following the lost frame exists, the system restores the lost frame using both the audio frame preceding the loss and the audio frame following the loss.

When checking for the audio frame following the loss, if that frame is not present in the audio frame buffer and the mixing timer still leaves enough time, the stereophonic sound generation system 100 searches the jitter buffer for the audio frame following the loss, decodes it, and uses it for loss recovery.

During long-term loss concealment, in order to eliminate buzz sounds, the stereophonic sound generation system 100 distinguishes silent sections from sections in which an audio signal is present and applies a muting (erasure) scaling function to the sections in which an audio signal is present.

When merging, in order to prevent degradation of sound quality, the stereophonic sound generation system 100 extracts a subframe from a normally received audio frame and uses it to merge smoothly with the loss-concealed audio frame, thereby avoiding alteration of the normally arrived packet.

Because the adaptive playout scheduling method and the packet loss concealment method are closely tied to the mixing unit that generates the stereophonic music, delay is kept to a minimum so that stereophonic sound can be generated in real time. Once the signals of two thirds of the channels have arrived at the mixing unit, the packet control unit skips the compression and merging processes and performs normal output using only short-term packet loss concealment, thereby reducing the delay as much as possible.

In mixing the N channels, which carry the performance sounds of the individual players transmitted over the IP network, into M output signals, the stereophonic sound generation system 100 maintains timing alignment, uniform volume, freedom from volume saturation, and a balanced blend of the performance sounds.

The stereophonic sound generation system 100 defines F audio frames that contain an audio signal, rather than silence, as a phrase. It finds the maximum peak value of each phrase and computes the gain value of each phrase as the ratio of a volume reference value to that maximum peak value.

If the gain value obtained for one phrase differs from the gain value of the next consecutive phrase, the stereophonic sound generation system 100 smooths between the phrases and applies the result as the gain value of the next phrase, thereby maintaining the continuity of the audio signal.

If the audio signal saturates after the phrase-level gain value is applied, the stereophonic sound generation system 100 reduces the audio gain value in proportion to the degree of saturation, thereby suppressing the saturation.
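By way of non-limiting illustration, the phrase-level gain control described above may be sketched as follows. The function name `phrase_gains`, the default reference value, the blending factor `alpha` used for smoothing, and the full-scale `limit` are assumptions made for this sketch; the specification does not state the smoothing formula, so a simple weighted blend with the previous gain stands in for it.

```python
from typing import List, Optional


def phrase_gains(phrases: List[List[float]], reference: float = 0.5,
                 alpha: float = 0.5, limit: float = 1.0) -> List[float]:
    """Return one gain per phrase; each phrase is a non-empty list of samples."""
    gains: List[float] = []
    previous: Optional[float] = None
    for samples in phrases:
        peak = max(abs(s) for s in samples)
        # Gain = volume reference / maximum peak (unity gain for an all-zero phrase).
        gain = reference / peak if peak > 0.0 else 1.0
        # Assumed smoothing: blend with the gain of the preceding phrase.
        if previous is not None and gain != previous:
            gain = alpha * previous + (1.0 - alpha) * gain
        # Saturation control: reduce the gain by the overshoot ratio.
        overshoot = peak * gain / limit
        if overshoot > 1.0:
            gain /= overshoot
        gains.append(gain)
        previous = gain
    return gains
```

With these assumed defaults, a quiet phrase followed by a full-scale phrase yields a second gain that is first pulled toward the earlier gain by smoothing and then reduced so that the scaled peak does not exceed the limit.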

Using the energy computed per phrase for the audio frames input to each channel, the stereophonic sound generation system 100 computes, before gain adjustment, the packet generation time received from the packet segmentation and transmission unit of each player, the energy ratios among the N channels, and their correlation, and applies them to mixing.

The stereophonic sound generation system 100 defines the audio gain value computed independently for each channel as the direct gain value, and generates a diffuse gain value, to which reverberation and delay are added, by passing the audio frames received from the volume gain adjustment unit through a preprocessing feed-forward filter and a feedback filter.

The stereophonic sound generation system 100 combines the diffuse gain and the direct gain to give the output of the speakers or headphones a sense of stereophonic diffusion.

The stereophonic sound generation system 100 aligns timing so that there is no performance time offset among the N channels by applying the sender-side generation time, the network transmission delay, the signal processing time, and the like of the audio frames of the audio phrases input on each of the N channels, and mixes the music performance audio streams transmitted from the N channels into M audio channels.

So that the performance can be reproduced in the stereophonic environment unit, the stereophonic sound generation system 100 generates the mixing control coefficients that form the set of M output channels from the energy ratios and correlation ratios among the N channels obtained in the volume gain adjustment unit, and uses them to mix the audio signals of the N audio input channels.

When mixing the audio signals input on the respective channels, the stereophonic sound generation system 100 applies a uniform-gain mixing coefficient in order to prevent volume saturation.

The stereophonic sound generation system 100 computes the uniform-gain mixing coefficient as the ratio of the volume limit to the magnitude of the sound signal and applies it, thereby preventing volume saturation.

To prevent distortion of the audio signal caused by fluctuation of the mixing coefficient, the stereophonic sound generation system 100 varies the mixing coefficient value linearly.
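The uniform-gain mixing coefficient and its linear variation may, for example, be sketched as follows. The helper names `uniform_gain_coefficient` and `apply_ramped_coefficient`, the choice to leave unsaturated signals untouched, and the per-frame ramp length are assumptions of this sketch rather than details taken from the specification.

```python
from typing import List


def uniform_gain_coefficient(peak: float, limit: float = 1.0) -> float:
    """Ratio of the volume limit to the signal magnitude, capped at unity."""
    if peak <= limit:
        return 1.0          # no saturation, so the signal is left as is
    return limit / peak


def apply_ramped_coefficient(samples: List[float], old_coeff: float,
                             new_coeff: float) -> List[float]:
    """Move linearly from old_coeff to new_coeff across one frame."""
    n = len(samples)
    out: List[float] = []
    for i, s in enumerate(samples):
        # The coefficient reaches new_coeff exactly on the last sample.
        coeff = old_coeff + (new_coeff - old_coeff) * (i + 1) / n
        out.append(coeff * s)
    return out
```

Ramping the coefficient over a frame, instead of switching it at a frame boundary, avoids the step discontinuity that would otherwise be heard as a click.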

The stereophonic sound generation system 100 combines audio mixing with the stereophonic environment to generate an initial stereophonic sound, records the generated stereophonic sound with a multi-channel input device, and generates an improved stereophonic sound.

The stereophonic sound generation system 100 creates new virtual speakers from the front, rear, left, and right signals of the M channels output from the audio mixing unit and uses them as speaker output signals.

The stereophonic sound generation system 100 receives the audio signals output from the speakers of the stereophonic environment unit as multi-channel input audio, and evaluates whether the stereophonic sound is generated properly through expert listening tests carried out in units of L phrases.

If the stereophonic sound has been generated properly, the stereophonic sound generation system 100 applies the stereophonic parameters in use to the rendering model. If, on the other hand, the stereophonic sound is judged not to have been generated properly, a request to change the stereophonic parameters is sent to the packet control unit, the volume gain adjustment unit, the diffusion adjustment unit, and the mixing coefficient adjustment unit, and new parameters are applied to generate an improved stereophonic sound.

To build the stereophonic rendering model, the stereophonic sound generation system 100 organizes the correlations among the stereophonic parameters that satisfy the listening test into an ontology map, and uses the ontology map together with the results of a visual simulation of the sound field output from the speakers to confirm that the stereophonic model is being applied properly.

Next, a preferred embodiment of the present invention that can be inferred from the embodiment described with reference to FIGS. 1 to 3 will be described.

FIG. 4 is a block diagram schematically illustrating a stereophonic sound reproduction system according to a preferred embodiment of the present invention.

Referring to FIG. 4, the stereophonic sound reproduction system 400 includes an audio information generating unit 410, an encoding unit 420, an audio information transmitting unit 430, a stereophonic sound generating apparatus 500, and a stereophonic sound reproducing unit 440.

The audio information generating unit 410 records performances given at the same time in different spaces by different players or instruments and generates a separate piece of audio information for each performance.

The encoding unit 420 encodes the audio information generated by the audio information generating unit 410. In this embodiment, the music sound recording and encoding unit 110 of FIG. 1 may be implemented as the audio information generating unit 410 and the encoding unit 420 of FIG. 4.

The audio information transmitting unit 430 divides the audio information encoded by the encoding unit 420 into packets and transmits the packets to the jitter estimating unit of the stereophonic sound generating apparatus 500, and notifies the stereophonic sound generating unit of the stereophonic sound generating apparatus 500 of the generation time of the encoded audio information. In this embodiment, the packet generation and transmission unit 120 of FIG. 1 may be implemented as the audio information transmitting unit 430 of FIG. 4.

The stereophonic sound reproducing unit 440 reproduces, in real time, the stereophonic sound generated by the stereophonic sound generating apparatus 500.

Hereinafter, the stereophonic sound generating apparatus 500 will be described in more detail.

FIG. 5 is a block diagram schematically illustrating the internal configuration of the stereophonic sound generating apparatus included in the stereophonic sound reproduction system of FIG. 4.

Referring to FIG. 5, the stereophonic sound generating apparatus 500 includes a jitter estimating unit 510, a signal processing unit 520, a stereophonic sound generating unit 530, a power supply unit 540, and a main control unit 550.

The power supply unit 540 supplies power to each component of the stereophonic sound generating apparatus 500. The main control unit 550 controls the overall operation of each component of the stereophonic sound generating apparatus 500.

When audio information is received on a specific channel, the jitter estimating unit 510 estimates network jitter based on the reception time of the audio information currently received on that channel and the reception time of the audio information previously received on that channel. In this embodiment, the network jitter estimating unit 205 of FIG. 2 may be implemented as the jitter estimating unit 510 of FIG. 5.

The jitter estimating unit 510 may estimate the network jitter using the mean of the jitter values obtained so far, a variance computed from the differences between each jitter value obtained so far and the jitter value obtained before it, and a predetermined weight.

As the weight, the jitter estimating unit 510 may use the previously used weight, or a value computed from the difference between the currently obtained jitter and the mean of the jitter values obtained up to that point and from the previously used variance.

As shown in FIG. 6, the jitter estimating unit 510 may include a delay time calculating unit 511, a delay time comparing unit 512, and a reference time adjusting unit 513. FIG. 6 is a block diagram schematically illustrating the internal configuration of the jitter estimating unit included in the stereophonic sound generating apparatus of FIG. 5.

The delay time calculating unit 511 calculates the packet delay time of the specific channel based on the reception time of the currently received audio information and the reception time of the previously received audio information.

The delay time comparing unit 512 estimates the network jitter by comparing the packet delay time with a reference time. If the packet delay time is equal to or greater than the reference time, it estimates that a spike has occurred on the specific channel; if the packet delay time is less than the reference time, it estimates that the specific channel is operating normally.

If the packet delay time is equal to or greater than the reference time, the reference time adjusting unit 513 adopts the packet delay time as the reference time. If the packet delay time obtained from audio information subsequently received on the specific channel is less than the reference time, it restores the reference time to its original value.
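The cooperation of the delay time calculating unit 511, the delay time comparing unit 512, and the reference time adjusting unit 513 may be illustrated by the following minimal sketch. The class name `SpikeDetector`, the use of the inter-arrival interval as the packet delay time, and the millisecond units are assumptions of this sketch; the specification states only that the delay is derived from the current and previous reception times.

```python
from typing import Optional


class SpikeDetector:
    """Classifies each arriving packet on one channel as 'spike' or 'normal'."""

    def __init__(self, reference_ms: int) -> None:
        self.base_reference = reference_ms   # original reference time
        self.reference = reference_ms        # reference time currently in force
        self.previous_arrival: Optional[int] = None

    def on_packet(self, arrival_ms: int) -> Optional[str]:
        if self.previous_arrival is None:
            # The first packet has no predecessor, so no delay can be computed.
            self.previous_arrival = arrival_ms
            return None
        # Delay time calculation (assumed here to be the inter-arrival interval).
        delay = arrival_ms - self.previous_arrival
        self.previous_arrival = arrival_ms
        if delay >= self.reference:
            # Spike: the measured delay becomes the reference time.
            self.reference = delay
            return "spike"
        # Normal operation: the reference time returns to its original value.
        self.reference = self.base_reference
        return "normal"
```

For arrivals at 0, 20, 120, 200, and 220 ms with a 50 ms reference, the third packet is classified as a spike and raises the reference to 100 ms; the fourth packet, arriving 80 ms later, is below that raised reference and restores the reference to 50 ms.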

The description continues with reference to FIG. 5.

The signal processing unit 520 selectively processes the currently received audio information based on the network jitter estimated by the jitter estimating unit 510. In this embodiment, the audio signal processing determining unit 230, the audio frame compression and normal output unit 235, the merging unit 240, the loss concealment unit 245, and the like of FIG. 2 may be implemented as the signal processing unit 520 of FIG. 5.

As shown in FIG. 8, the signal processing unit 520 may include a compression/non-compression processing unit 521, a merge processing unit 522, and a loss concealment processing unit 523. FIG. 8 is a block diagram schematically illustrating the internal configuration of the signal processing unit included in the stereophonic sound generating apparatus of FIG. 5.

If there is audio information to be output in the current turn and the audio information output in the previous turn was not processed by the loss concealment method, the compression/non-compression processing unit 521 processes the audio information output in the current turn using the compression/non-compression method. In this embodiment, the audio frame compression and normal output unit 235 of FIG. 2 may be implemented as the compression/non-compression processing unit 521 of FIG. 8.

The compression/non-compression processing unit 521 may process the audio information output in the current turn using either the compression method or the non-compression method, depending on whether an audio signal is present and whether a spike has occurred on the specific channel.

When selecting between the compression method and the non-compression method, the compression/non-compression processing unit 521 may additionally use the variance computed from the differences between each jitter value obtained so far and the jitter value obtained before it, or the ratio between the network jitter and the audio information divided into sections.

If there is audio information to be output in the current turn and the audio information output in the previous turn was processed by the loss concealment method, the merge processing unit 522 processes the audio information output in the current turn using the merging method. In this embodiment, the merging unit 240 of FIG. 2 may be implemented as the merge processing unit 522 of FIG. 8.

The merge processing unit 522 may extract pieces of sub-audio information from the audio information output in the current turn and overlap each piece of sub-audio information with the audio information output in the previous turn, thereby processing the audio information output in the current turn.

If there is no audio information to be output in the current turn, the loss concealment processing unit 523 processes the audio information for the current turn using the loss concealment method. In this embodiment, the loss concealment unit 245 of FIG. 2 may be implemented as the loss concealment processing unit 523 of FIG. 8.

The loss concealment processing unit 523 may compare the number of times signal processing has been performed consecutively by the loss concealment method with a reference count and, depending on the result, process the audio information for the current turn using either the short-term loss concealment method or the long-term loss concealment method.

When processing the audio information for the current turn, the loss concealment processing unit 523 may additionally use a linearly decreasing scale function to eliminate buzz sounds.
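One way to picture the selection between short-term and long-term concealment, together with the linearly decreasing scale function, is the sketch below. It substitutes plain repetition of the last good frame for the actual reconstruction, which the specification does not detail; the function name `conceal_lost_frame`, the reference count `short_limit`, and the fade length `fade_frames` are hypothetical.

```python
from typing import List


def conceal_lost_frame(last_good: List[float], consecutive_losses: int,
                       short_limit: int = 2, fade_frames: int = 4) -> List[float]:
    """Return a substitute frame for the current turn.

    Up to `short_limit` consecutive losses are treated as short-term and the
    last good frame is repeated unchanged. Beyond that, the repeated frame is
    scaled by a factor that decreases linearly to zero over `fade_frames`.
    """
    if consecutive_losses <= short_limit:
        return list(last_good)
    steps = consecutive_losses - short_limit
    scale = max(0.0, 1.0 - steps / fade_frames)
    return [scale * s for s in last_good]
```

Fading the repeated frame is what suppresses the buzz: an unattenuated frame repeated many times in a row produces a periodic signal whose period equals the frame length.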

The description continues with reference to FIG. 5.

From the audio information received on different channels and then signal-processed, the stereophonic sound generating unit 530 extracts the pieces of audio information generated at the same time, based on the generation time of each piece, and mixes the extracted audio information to generate stereophonic sound in real time. In this embodiment, the audio mixing unit 325 of FIG. 3 may be implemented as the stereophonic sound generating unit 530 of FIG. 5.
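The extraction of audio information generated at the same time may be sketched as a grouping step followed by a mix. The tuple layout `(channel, generation_time, samples)`, the function names, and the equal-weight average into a single output channel are simplifying assumptions of this sketch; the specification mixes N channels into M output channels using mixing control coefficients.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Frame = Tuple[int, int, List[float]]  # (channel, generation_time, samples)


def group_by_generation_time(frames: Iterable[Frame]) -> Dict[int, Dict[int, List[float]]]:
    """Collect frames from all channels under their common generation time."""
    groups: Dict[int, Dict[int, List[float]]] = defaultdict(dict)
    for channel, generation_time, samples in frames:
        groups[generation_time][channel] = samples
    return dict(groups)


def mix_group(group: Dict[int, List[float]]) -> List[float]:
    """Equal-weight mix of all channels that share one generation time."""
    length = min(len(samples) for samples in group.values())
    return [sum(samples[i] for samples in group.values()) / len(group)
            for i in range(length)]
```

Keying the groups on the sender-side generation time rather than on the arrival time is what keeps performers with different network delays aligned in the mix.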

As the audio information generated at the same time, the stereophonic sound generating unit 530 may extract audio information generated from performances by different players or instruments in different spaces.

The stereophonic sound generating unit 530 determines whether the extracted audio information was received from more channels than a reference number. If it determines that the extracted audio information was received from more channels than the reference number, it may mix the extracted audio information using the loss concealment method to generate the stereophonic sound.

The stereophonic sound generating unit 530 may generate mixing control coefficients using the energy correlation between two audio information groups and mix the audio information using these mixing control coefficients.

FIGS. 7 and 9 are block diagrams schematically illustrating internal components that can be added to the stereophonic sound generating apparatus of FIG. 5.

First, referring to FIG. 7, the stereophonic sound generating apparatus 500 may further include an audio information dividing unit 610, a divided information storage unit 620, a signal processing determining unit 630, a received information storage unit 640, a decoding unit 650, a processed information backup unit 660, and the like.

The audio information dividing unit 610 uses the time-domain energy of the currently received audio information to divide it into sections according to whether an audio signal is present. In this embodiment, the silence section detecting unit 220 of FIG. 2 may be implemented as the audio information dividing unit 610 of FIG. 7.
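A minimal sketch of sectioning by time-domain energy is given below. The function name `split_by_energy`, the fixed analysis frame length, the mean-square energy measure, and the fixed threshold are assumptions of this sketch; the specification does not state how the energy is computed or thresholded.

```python
from typing import List, Tuple

Section = Tuple[bool, int, int]  # (contains_audio, start_index, end_index)


def split_by_energy(samples: List[float], frame_len: int,
                    threshold: float) -> List[Section]:
    """Split a signal into alternating audio and silent sections."""
    sections: List[Section] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Mean-square energy of this analysis frame.
        energy = sum(s * s for s in frame) / len(frame)
        contains_audio = energy > threshold
        end = start + len(frame)
        if sections and sections[-1][0] == contains_audio:
            # Same class as the previous frame: extend the open section.
            sections[-1] = (contains_audio, sections[-1][1], end)
        else:
            sections.append((contains_audio, start, end))
    return sections
```

Each returned tuple marks whether the section contains audio and gives its half-open sample range, so silent sections can be handed to compression while audio sections are left untouched.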

The divided information storage unit 620 stores the audio information divided into sections and outputs it section by section in sequence order. In this embodiment, the audio frame storage unit 225 of FIG. 2 may be implemented as the divided information storage unit 620 of FIG. 7.

The signal processing determining unit 630 determines which of the compression/non-compression method, the loss concealment method, and the merging method is to be used to process each piece of audio information output in turn. In this embodiment, the audio signal processing determining unit 230 of FIG. 2 may be implemented as the signal processing determining unit 630 of FIG. 7.

The signal processing determining unit 630 may determine which of the compression/non-compression method, the loss concealment method, and the merging method is to be used for the audio information output in the current turn based on whether there is audio information to be output in the current turn and whether the audio information output in the previous turn was processed by the loss concealment method.
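The determination made by the signal processing determining unit 630 follows directly from the conditions stated for the units 521, 522, and 523 and may be summarized as below. The enum `Method` and the function name `choose_method` are illustrative names introduced for this sketch only.

```python
from enum import Enum


class Method(Enum):
    COMPRESS_OR_NORMAL = "compression/non-compression"
    LOSS_CONCEALMENT = "loss concealment"
    MERGE = "merge"


def choose_method(frame_available: bool, previous_was_concealed: bool) -> Method:
    """Select the processing method for the current output turn."""
    if not frame_available:
        # Nothing to output in this turn: conceal the loss.
        return Method.LOSS_CONCEALMENT
    if previous_was_concealed:
        # A real frame follows a concealed one: merge to smooth the joint.
        return Method.MERGE
    # A real frame follows a real frame: compress or output normally.
    return Method.COMPRESS_OR_NORMAL
```

The choice between compression and normal output within the last branch depends on further inputs, such as spike state and silence detection, which this sketch does not model.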

The received information storage unit 640 stores the audio information received on a specific channel, combining that audio information with its generation time and with information on the specific channel. In this embodiment, the audio jitter buffer 210 of FIG. 2 may be implemented as the received information storage unit 640 of FIG. 7.

The decoding unit 650 decodes audio information selected, based on the network jitter, from the audio information stored in the received information storage unit 640. In this embodiment, the audio decoder 215 of FIG. 2 may be implemented as the decoding unit 650 of FIG. 7.

The processed information backup unit 660 backs up the signal-processed audio information. In this embodiment, the audio frame backup unit 250 of FIG. 2 may be implemented as the processed information backup unit 660 of FIG. 7.

Meanwhile, the stereophonic sound generating apparatus 500 may further include a first gain adjusting unit 670 and a second gain reflecting unit 680.

When the signal-processed audio information is divided into sections according to whether an audio signal is present, the first gain adjusting unit 670 adjusts the first gain value of the audio information divided into sections so as to make the volume uniform. In this embodiment, the audio volume gain adjusting unit 305 of FIG. 3 may be implemented as the first gain adjusting unit 670 of FIG. 9.

The first gain adjusting unit 670 may include a group generating unit 671, a gain calculating unit 672, and a gain comparing unit 673.

The group generating unit 671 generates audio information groups from the audio information divided into sections, excluding the audio information that contains silence.

The gain calculating unit 672 calculates the peak volume value of each audio information group and calculates the first gain value using the peak value and a reference value.

The gain comparing unit 673 compares the first gain values of two audio information groups that are adjacent in time and, if the two first gain values differ, uses a smoothing technique to adjust the first gain value of the later audio information group based on the first gain value of the earlier audio information group.

The second gain reflecting unit 680 calculates a second gain value of the audio information divided into sections, the second gain value including a reverberation component and a delay component, and applies the first gain value and the second gain value to each piece of the audio information divided into sections. In this embodiment, the audio diffusion adjusting unit 310 of FIG. 3 may be implemented as the second gain reflecting unit 680 of FIG. 9.

The second gain reflecting unit 680 may calculate the second gain value using an all-pass filter and a feedback comb filter.
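For reference, the two filter structures named above may be written out in their textbook form as below. The difference equations are the standard feedback comb filter, y[n] = x[n] + g·y[n−D], and the Schroeder all-pass filter, y[n] = −g·x[n] + x[n−D] + g·y[n−D]; the function names, the delay D in samples, and the coefficient g are illustrative, and how the specification derives the second gain value from the filter outputs is not stated and is not modeled here.

```python
from typing import List


def feedback_comb(x: List[float], delay: int, g: float) -> List[float]:
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y: List[float] = []
    for n, xn in enumerate(x):
        fed_back = y[n - delay] if n >= delay else 0.0
        y.append(xn + g * fed_back)
    return y


def allpass(x: List[float], delay: int, g: float) -> List[float]:
    """Schroeder all-pass: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y: List[float] = []
    for n, xn in enumerate(x):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + x_delayed + g * y_delayed)
    return y
```

The comb filter contributes decaying echoes spaced by the delay, and the all-pass filter increases echo density without coloring the magnitude spectrum, which is why the pair is commonly used to add reverberation and delay components.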

Although all the components of the embodiments of the present invention described above are described as being combined into one or as operating in combination, the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated. In addition, although each of the components may be implemented as a separate piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Such a computer program may be stored in a computer-readable recording medium such as a USB memory, a CD, or a flash memory, and read and executed by a computer to implement the embodiments of the present invention. The recording medium of the computer program may include a magnetic recording medium, an optical recording medium, a carrier wave medium, and the like.

Furthermore, all terms, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs, unless otherwise defined in the detailed description. Commonly used terms, such as those defined in dictionaries, should be interpreted as consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present invention.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention belongs may make various modifications, changes, and substitutions without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed in the present invention and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments and drawings. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

Claims

A real-time stereophonic sound generation apparatus comprising:
A jitter estimation unit that, when audio information is received on a specific channel, estimates network jitter based on a reception time of the audio information currently received on the specific channel and a reception time of audio information previously received on the specific channel, the jitter estimation unit including: a delay time calculating unit that calculates a packet delay time of the specific channel based on the reception time of the currently received audio information and the reception time of the previously received audio information; and a jitter estimator that estimates the network jitter by comparing the packet delay time with a reference time, estimating that a spike has occurred in the specific channel if the packet delay time is greater than or equal to the reference time, and otherwise estimating that the specific channel is operating normally;
A signal processing unit that selectively signal-processes the currently received audio information based on the network jitter; and
A stereophonic sound generation unit that extracts, based on the generation time of each piece of audio information, the audio information generated at the same time from among the signal-processed audio information received on different channels, and mixes the extracted audio information to generate stereophonic sound in real time.
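The spike test in this claim can be sketched as follows. This is an illustrative assumption rather than the claimed implementation: the packet delay time is taken here to be the gap between consecutive reception times on one channel, and `JitterState`, `classify_packet`, and the 60 ms reference time are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JitterState:
    reference_time: float                 # hypothetical threshold, in seconds
    last_arrival: Optional[float] = None  # reception time of the previous packet


def classify_packet(state, arrival_time):
    """Return (packet_delay, status) for one packet on one channel.

    packet_delay is assumed to be the gap between this reception time and
    the previous one; status is "spike" when the delay reaches the
    reference time and "normal" otherwise.
    """
    if state.last_arrival is None:
        # First packet on the channel: no previous reception time to compare.
        state.last_arrival = arrival_time
        return 0.0, "normal"
    delay = arrival_time - state.last_arrival
    state.last_arrival = arrival_time
    status = "spike" if delay >= state.reference_time else "normal"
    return delay, status
```

For example, with `reference_time=0.06`, packets arriving at 0.00 s, 0.02 s, and 0.10 s are classified as normal, normal, and spike, because only the last gap (about 80 ms) reaches the threshold.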

The apparatus according to claim 1,
Wherein the stereophonic sound generation unit extracts, as the audio information generated at the same time, audio information generated from performances by different performers or musical instruments in different spaces.

The apparatus according to claim 1,
Wherein the jitter estimation unit estimates the network jitter using an average value of the jitter values obtained so far, a variance value calculated from differences between the currently obtained jitter and previously obtained jitter values, and a predetermined weight.
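One common way to combine a running mean, a running spread, and a fixed weight is an exponentially weighted estimator, sketched below. This is an assumed formulation for illustration only; the claim does not give the formula, and `update_jitter`, `alpha`, and `beta` are hypothetical. The spread is tracked here as a mean absolute deviation rather than a true variance.

```python
def update_jitter(mean, spread, jitter, alpha=0.875, beta=4.0):
    """Update a running jitter estimate with one new jitter measurement.

    mean   : running average of the jitter values seen so far
    spread : running average of |jitter - mean| (stand-in for the variance)
    alpha  : predetermined weight given to the history (0 < alpha < 1)
    beta   : hypothetical safety factor applied to the spread
    Returns (new_mean, new_spread, estimate).
    """
    new_mean = alpha * mean + (1.0 - alpha) * jitter
    new_spread = alpha * spread + (1.0 - alpha) * abs(jitter - new_mean)
    estimate = new_mean + beta * new_spread
    return new_mean, new_spread, estimate
```

A larger `alpha` makes the estimate react more slowly to a single late packet, while `beta` controls how much headroom the estimate keeps above the mean.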

The apparatus according to claim 3,
Wherein the jitter estimation unit uses, as the weight, either a previously used weight or a value computed from the difference between the currently obtained jitter and the average value of previously obtained jitter values together with the previously used variance value.

delete

The apparatus according to claim 1,
Wherein the jitter estimation unit further comprises:
A reference time adjustment unit that sets the packet delay time as the reference time if the packet delay time is greater than or equal to the reference time, and restores the reference time when a packet delay time obtained using audio information subsequently received on the specific channel is less than the reference time.

The apparatus according to claim 1, further comprising:
An audio information dividing unit that divides the currently received audio information into segments according to the presence or absence of an audio signal, using the time-domain energy of the currently received audio information;
A division information storage unit that stores the segmented audio information and outputs the segmented audio information sequentially, in order; and
A signal processing decision unit that decides whether to signal-process the sequentially output audio information using a compression/non-compression method, a loss concealment method, or a merge method.
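The energy-based division in this claim can be illustrated with a short sketch that marks fixed-length frames as active or silent by mean squared amplitude and groups consecutive frames with the same label into one segment. The frame length, the threshold, and the function name `segment_by_energy` are assumptions made for the example and do not come from the patent.

```python
def segment_by_energy(samples, frame_len, threshold):
    """Split samples into (start, end, active) segments.

    A frame is 'active' when its mean squared amplitude (time-domain
    energy) is at least `threshold`; adjacent frames with the same label
    are merged into a single segment. `end` is exclusive.
    """
    segments = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        active = energy >= threshold
        end = start + len(frame)
        if segments and segments[-1][2] == active:
            # Extend the previous segment instead of opening a new one.
            segments[-1] = (segments[-1][0], end, active)
        else:
            segments.append((start, end, active))
    return segments
```

With 4-sample frames and a threshold of 0.25, a signal that is silent, then loud for two frames, then silent again yields three segments: silent, active, silent.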

The apparatus according to claim 7,
Wherein the signal processing decision unit decides whether to perform signal processing using the compression/non-compression method, the loss concealment method, or the merge method according to whether audio information to be output at the current turn is present and whether the audio information output at the previous turn was signal-processed according to the loss concealment method.

The apparatus according to claim 7,
Wherein the signal processing unit comprises:
A compression/non-compression processing unit that signal-processes the audio information output at the current turn using the compression/non-compression method if audio information to be output at the current turn is present and the audio information output at the previous turn was not signal-processed according to the loss concealment method;
A merge processing unit that signal-processes the audio information output at the current turn using the merge method if audio information to be output at the current turn is present and the audio information output at the previous turn was signal-processed according to the loss concealment method; and
A loss concealment processing unit that signal-processes the audio information output at the current turn using the loss concealment method if no audio information to be output at the current turn is present.

The apparatus according to claim 9,
Wherein the compression/non-compression processing unit signal-processes the audio information output at the current turn using either a compression method or a non-compression method, depending on whether an audio signal is present and whether a spike has occurred in the specific channel.

The apparatus according to claim 10,
Wherein the compression/non-compression processing unit, when selecting the compression method or the non-compression method, uses a value based on the difference between the currently obtained jitter and previously obtained jitter, or the ratio between the network jitter and the audio information divided into segments.

The apparatus according to claim 9,
Wherein the loss concealment processing unit compares the number of consecutive signal-processing operations performed according to the loss concealment method with a reference count, and, according to the result, signal-processes the audio information output at the current turn using either a short-term loss concealment method or a long-term loss concealment method.

The apparatus according to claim 9,
Wherein the loss concealment processing unit further uses a linearly decreasing scale function to remove buzz sound when signal-processing the audio information output at the current turn.
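A linearly decreasing scale can be sketched as below. The sketch assumes that concealment repeats the last good frame and that the scale drops by an equal step for each consecutive lost frame; the claim specifies neither, and `conceal_with_fade` and `max_losses` are hypothetical names.

```python
def conceal_with_fade(last_frame, losses_so_far, max_losses=5):
    """Return a substitute frame for a lost packet.

    The last good frame is repeated and multiplied by a scale that falls
    linearly from 1.0 (first loss) to 0.0 (after `max_losses` losses), so
    that a long run of repeated frames fades out instead of buzzing.
    """
    scale = max(0.0, 1.0 - losses_so_far / max_losses)
    return [sample * scale for sample in last_frame]
```

Calling the function with `losses_so_far` equal to 0, 1, 2, ... for successive lost packets produces progressively quieter copies of the last frame, reaching silence once `max_losses` is exceeded.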

The apparatus according to claim 9,
Wherein the merge processing unit extracts sub audio information from the audio information output at the current turn and overlaps each piece of sub audio information with the audio information output at the previous turn, thereby signal-processing the audio information output at the current turn.
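One familiar way to overlap new audio onto previously output (concealed) audio is a linear crossfade over the overlapping region, sketched here. This is an assumed technique chosen for illustration; the claim does not say how the sub audio information is selected or weighted, and `crossfade_merge` is a hypothetical name.

```python
def crossfade_merge(prev_tail, cur_head):
    """Blend the end of the previous output into the start of the current frame.

    The weight of the current frame rises linearly across the overlap, so
    the output moves smoothly from the previous audio to the current audio.
    Only the overlapping region (the shorter of the two inputs) is returned.
    """
    n = min(len(prev_tail), len(cur_head))
    merged = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the current frame, strictly between 0 and 1
        merged.append((1.0 - w) * prev_tail[i] + w * cur_head[i])
    return merged
```

Blending a tail of ones into a head of zeros over three samples gives 0.75, 0.5, 0.25, which shows the gradual handover that avoids a discontinuity at the join.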

The apparatus according to claim 1,
Wherein the stereophonic sound generation unit determines whether the extracted audio information was received from more channels than a reference number, and, if it is determined that the extracted audio information was received from more channels than the reference number, generates the stereophonic sound by mixing the extracted audio information using a concealment method.

The apparatus according to claim 7, further comprising:
A reception information storage unit that stores the audio information received on the specific channel in combination with information about the specific channel and generation time information of the audio information received on the specific channel;
A decoding unit that decodes audio information selected from among the stored audio information based on the network jitter; and
A processing information backup unit that backs up the signal-processed audio information.

The apparatus according to claim 1, further comprising:
A first gain adjustment unit that, when the signal-processed audio information is divided into segments according to whether an audio signal is present, adjusts a first gain value of the segmented audio information to equalize the volume; and
A second gain reflection unit that calculates a second gain value of the segmented audio information, the second gain value including a reverberation component and a delay component, and applies the first gain value and the second gain value to each piece of the segmented audio information.

The apparatus according to claim 17,
Wherein the first gain adjustment unit comprises:
A group generating unit that generates audio information groups from the segmented audio information, excluding audio information that contains silence;
A gain calculating unit that calculates a volume peak value for each audio information group and calculates a first gain value using the peak value and a reference value; and
A gain comparison unit that compares the first gain values of two audio information groups adjacent in time order and, if the two first gain values differ, smooths the first gain value of the later audio information group based on the first gain value of the earlier audio information group.
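The peak-based gain and the smoothing between neighbouring groups can be sketched as follows. The ratio `reference_peak / peak` and the one-pole smoothing step are assumptions made for illustration, since the claim states only that a peak value and a reference value are used and that differing gains are smoothed; `first_gain`, `smooth_gains`, and `smoothing` are hypothetical names.

```python
def first_gain(group, reference_peak):
    """Gain that would bring the group's peak amplitude to `reference_peak`."""
    peak = max((abs(s) for s in group), default=0.0)
    return reference_peak / peak if peak > 0 else 1.0


def smooth_gains(gains, smoothing=0.5):
    """Limit gain jumps between groups that are adjacent in time.

    Each gain moves only a fraction `smoothing` of the way from the
    previous (already smoothed) gain toward its own target value.
    """
    if not gains:
        return []
    smoothed = [gains[0]]
    for target in gains[1:]:
        prev = smoothed[-1]
        smoothed.append(prev + smoothing * (target - prev))
    return smoothed
```

For groups whose raw gains are 2.0, 4.0, and 4.0, smoothing with a factor of 0.5 yields 2.0, 3.0, and 3.5, so the volume rises gradually rather than doubling at a group boundary.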

The apparatus according to claim 18,
Wherein the stereophonic sound generation unit generates a mixing control coefficient using the energy correlation between the two audio information groups, and mixes the audio information using the mixing control coefficient.

The apparatus according to claim 17,
Wherein the second gain reflection unit calculates the second gain value using an all-pass filter and a feedback comb filter.

A stereophonic sound reproduction system comprising:
A real-time stereophonic sound generation apparatus including: a jitter estimation unit that, when audio information is received on a specific channel, estimates network jitter based on a reception time of the audio information currently received on the specific channel and a reception time of audio information previously received on the specific channel, the jitter estimation unit including a delay time calculating unit that calculates a packet delay time of the specific channel based on the reception time of the currently received audio information and the reception time of the previously received audio information, and a jitter estimator that estimates the network jitter by comparing the packet delay time with a reference time, estimating that a spike has occurred in the specific channel if the packet delay time is greater than or equal to the reference time, and otherwise estimating that the specific channel is operating normally; a signal processing unit that selectively signal-processes the currently received audio information based on the network jitter; and a stereophonic sound generation unit that extracts, based on the generation time of each piece of audio information, the audio information generated at the same time from among the signal-processed audio information received on different channels, and mixes the extracted audio information to generate stereophonic sound in real time;
An audio information generating unit that records performances by different performers or musical instruments in different spaces at the same time to generate respective pieces of audio information;
An encoding unit that encodes the generated audio information;
An audio information transmitting unit that divides the encoded audio information into packets, transmits the packets to the jitter estimation unit, and notifies the stereophonic sound generation unit of the generation time of the encoded audio information; and
A stereophonic sound reproducing unit that reproduces the stereophonic sound in real time.