KR20040090567A

KR20040090567A - Method for concealing Packet Loss using Information of Packets before and after Packet Loss

Info

Publication number: KR20040090567A
Application number: KR1020030024438A
Authority: KR
Inventors: 한민수; 김재현; 김학균
Original assignee: 주식회사 케이티
Priority date: 2003-04-17
Filing date: 2003-04-17
Publication date: 2004-10-26
Also published as: KR100954668B1

Abstract

PURPOSE: A packet loss concealment (PLC) method using packet information from before and after a loss is provided. It compares the phonemes on both sides of the loss using the pitch information obtained from the two packets and the spectral distance between them, and compensates for the loss interval with the preceding and following phoneme information, thereby producing a more natural synthetic voice signal. CONSTITUTION: The receiving side confirms that packet loss occurred during a Voice over Internet Protocol (VoIP) transmission (601). To conceal the packet loss, the receiving side detects pitch in the two packets before and after the loss (602). The receiving side checks whether the two packets are both voiced or both unvoiced (603). If so, it calculates the spectral distance between the voice signals of the two packets (604) and compares it with the spectral distance between adjacent packets to check whether the difference is large (605). If the difference is not large, the receiving side judges that the phonemes of the two packets are the same (606) and compensates for the entire loss interval with the packet before the loss (607). If the difference is large, it judges that the phonemes of the two packets are different (608), and compensates for two-thirds of the loss interval with the phoneme of the packet before the loss and the rest with the phoneme of the packet after the loss (609).

Description

Method for concealing Packet Loss using Information of Packets before and after Packet Loss

The present invention relates to a packet loss concealment method using packet information from before and after a loss, and to a computer-readable recording medium storing a program that implements the method. More specifically, it relates to a packet loss concealment method that uses pre- and post-loss packet information to reduce the voice quality degradation caused by packet loss in audio transmission systems such as Voice over Internet Protocol (VoIP), and to a computer-readable recording medium storing a program that implements the method.

Techniques commonly used to compensate for packet loss in VoIP include forward error correction (FEC) and packet loss concealment (PLC). FEC requires additional processing of the source at both the transmitting and receiving sides, and its effectiveness depends on the loss rate and the loss distribution. FEC also requires wide bandwidth and introduces delay from the additional processing.

Packet loss concealment, by contrast, generates from the received bit stream a synthetic speech signal that stands in for the data lost in the network. Because a speech signal is usually stationary over short intervals, the speech in a lost packet can be estimated from the preceding speech. The concealment schemes applied in VoIP codecs so far mainly either substitute the previous packet's information directly into the loss interval or fill it with silence. Various algorithms based on pattern matching or speech modeling are also being studied, and applications of them have been reported.

However, existing concealment schemes work well only for single-frame losses (4-40 ms) at low packet loss rates (<15%); they struggle to compensate for the relatively long gaps caused by consecutive packet losses.

The problems of the prior art are described in more detail below.

In real networks, packet losses often occur consecutively. Loss compensation for speech must therefore give robust results even for relatively long loss intervals. However, when the loss falls on a transition in the speech signal, or is long enough to wipe out an entire phoneme, compensation that uses only the preceding frame is inherently limited.

For example, forcibly repeating the pitch information produces unnatural artifacts, and the lost speech cannot then be said to have been compensated correctly. Both the ITU-T recommendation and the earlier patent application "Method and apparatus for performing packet loss or frame erasure concealment" (Korean Application No. 10-2000-7014272) propose gradually attenuating the generated signal to prevent this. As noted above, however, this is not sufficient to compensate for relatively long loss intervals.

The present invention was devised to solve these problems. Its object is to provide a packet loss concealment method using pre- and post-loss packet information, and a computer-readable recording medium storing a program that implements the method. For an audio transmission system such as VoIP, the method compares the phonemes on both sides of a loss using the pitch information and spectral distance of the packets before and after the loss, and fills the loss interval using the phoneme information from both sides, yielding a more natural synthetic speech signal.

FIG. 1 is an exemplary configuration diagram of an audio transmission system to which the present invention is applied.

FIG. 2 is an exemplary view of loss interval compensation in the packet loss concealment method using pre- and post-loss packet information according to the present invention.

FIG. 3 is an exemplary view of the OLA used to reduce discontinuity in the packet loss concealment method using pre- and post-loss packet information according to the present invention.

FIG. 4 is an exemplary view of amplitude adjustment of the synthesized speech in the loss interval in the packet loss concealment method using pre- and post-loss packet information according to the present invention.

FIG. 5 is a diagram explaining an embodiment of the packet loss concealment method using pre- and post-loss packet information according to the present invention.

FIGS. 6A and 6B are flowcharts of an embodiment of the packet loss concealment method using pre- and post-loss packet information according to the present invention.

*Explanation of reference numerals for the main parts of the drawings*

110: encoder
115: lost frame detector
120: decoder
130: packet loss concealment module

To achieve the above object, the present invention provides a packet loss concealment method applied to an audio transmission system, comprising: a first step of confirming the loss of a received voice packet and comparing the packets before and after the loss interval to determine whether they are different phonemes; a second step of, if the first step determines that the packets before and after the loss are the same phoneme, concealing the packet loss by compensating the entire loss interval using the voice information of the pre-loss packet; and a third step of, if the first step determines that the packets before and after the loss are different phonemes, concealing the packet loss by compensating a predetermined portion of the loss interval using the voice information of the pre-loss packet and compensating the remainder of the loss interval using the voice information of the post-loss packet.

The present invention also provides a computer-readable recording medium storing a program that causes an audio transmission system having a processor to realize: a first function of confirming the loss of a received voice packet and comparing the packets before and after the loss interval to determine whether they are different phonemes; a second function of, if the first function determines that the packets before and after the loss are the same phoneme, concealing the packet loss by compensating the entire loss interval using the voice information of the pre-loss packet; and a third function of, if the first function determines that the packets before and after the loss are different phonemes, concealing the packet loss by compensating a predetermined portion of the loss interval using the voice information of the pre-loss packet and compensating the remainder of the loss interval using the voice information of the post-loss packet.

To improve VoIP voice quality, the present invention compares the pitch information and spectral distance of the phonemes before and after a voice packet loss interval to determine whether they are the same phoneme. Depending on the result, it compensates the loss interval using only the phoneme information from before the loss, or additionally uses the phoneme information from after the loss. Here, compensating the loss interval means generating a synthetic speech signal that stands in for the missing data.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary configuration diagram of an audio transmission system to which the present invention is applied.

As shown in the figure, the audio transmission system to which the present invention is applied includes an encoder (110), a lost frame detector (115), a decoder (120), and a packet loss concealment module (130).

In FIG. 1, the encoder (110) receives input audio frames and transmits a coded bit-stream. The transmitted bit-stream is passed to the lost frame detector (115) on the receiving side, which determines whether a frame has been lost. When the lost frame detector (115) determines that a frame has been lost, it notifies the packet loss concealment module (130) so that the lost frame can be reconstructed.

The packet loss concealment module (130) compensates for the lost frame according to the given packet loss concealment scheme so that an intact audio frame can be output through the decoder (120).

The audio transmission system described above is one example of a system to which the present invention can be applied.

In the packet loss concealment method according to the present invention, pitch detection is performed on the packet before the loss and the packet after the loss. A packet in which a pitch is detected is classified as voiced; otherwise it is classified as unvoiced. If both packets are voiced or both are unvoiced, the spectral distance between the speech signals of the two packets is calculated. If this spectral distance differs substantially from the spectral distance between neighboring packets, the two packets are judged to be different phonemes; otherwise they are judged to be the same phoneme.
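The decision rule in the preceding paragraph can be sketched as follows. This is only an illustration: the function name and the `ratio` threshold are assumptions, since the description says only that the distance must differ "relatively" much from the neighboring-packet distance and gives no numeric threshold.

```python
def is_same_phoneme(voiced_before, voiced_after,
                    gap_distance, neighbor_distance, ratio=2.0):
    """Decide whether the packets on either side of a loss hold the same phoneme.

    voiced_before / voiced_after: True if a pitch was detected in that packet.
    gap_distance: spectral distance between the pre- and post-loss packets.
    neighbor_distance: typical spectral distance between adjacent packets.
    ratio: hypothetical threshold (not given in the source).
    """
    # One voiced and one unvoiced packet are treated as different phonemes.
    if voiced_before != voiced_after:
        return False
    # Same voicing: compare the distance across the gap with the baseline.
    return gap_distance <= ratio * neighbor_distance
```

With these assumptions, `is_same_phoneme(True, True, 1.0, 0.8)` treats the two packets as one phoneme, while a voiced/unvoiced mismatch is rejected regardless of the distances.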

If the phonemes at the two ends of the loss interval are the same, the entire loss interval is compensated using the phoneme information of the preceding packet. Otherwise, the first two-thirds of the loss interval is compensated with the preceding phoneme information and the remaining one-third with the phoneme information of the following packet. For a voiced phoneme, the speech of one pitch period is repeated; for an unvoiced phoneme, the entire speech content of the packet is used. The repetition of pitch-period speech for voiced sounds is outlined in FIG. 2.
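A minimal sketch of this fill rule is given below, assuming packets are lists of samples and that the pitch period is supplied in samples (or `None` for an unvoiced packet). The helper names are hypothetical, and the sketch ignores phase alignment at the joins, which the description handles separately with overlap-add.

```python
def repeat_to_length(segment, length):
    """Tile `segment` cyclically until it covers `length` samples."""
    if not segment:
        raise ValueError("segment must not be empty")
    return [segment[i % len(segment)] for i in range(length)]


def fill_loss(before, after, loss_len, same_phoneme,
              pitch_before=None, pitch_after=None):
    """Synthesize `loss_len` samples for the gap between `before` and `after`.

    pitch_before / pitch_after: pitch period in samples for a voiced packet,
    or None for an unvoiced packet (then the whole packet is reused).
    """
    # Voiced: reuse the last pitch period; unvoiced: reuse the whole packet.
    src_before = before[-pitch_before:] if pitch_before else before
    if same_phoneme:
        # Same phoneme: the whole gap comes from the pre-loss packet.
        return repeat_to_length(src_before, loss_len)
    # Different phonemes: first 2/3 from before, last 1/3 from after.
    src_after = after[:pitch_after] if pitch_after else after
    head_len = (2 * loss_len) // 3
    head = repeat_to_length(src_before, head_len)
    tail = repeat_to_length(src_after, loss_len - head_len)
    return head + tail
```

Integer division decides where the two-thirds boundary falls when the gap length is not a multiple of three; that rounding choice is an assumption of this sketch.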

When the loss interval is compensated with speech from before and after the loss, a discontinuity appears at the end or in the middle of the loss interval. As shown in FIG. 3 and described below, this discontinuity can be reduced by generating additional compensation signal for an overlap interval D (5 msec) and applying overlap-add (OLA) with a triangular window.
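The triangular-window overlap-add can be illustrated with a short cross-fade. The 8 kHz sample rate (so that D = 5 msec corresponds to 40 samples) and the exact weight formula are assumptions of this sketch; the description specifies only a triangular window over the interval D.

```python
OVERLAP_SAMPLES = 40  # D = 5 msec at an assumed 8 kHz sample rate


def overlap_add(outgoing, incoming):
    """Cross-fade two equally long segments with triangular (linear) weights.

    outgoing: the extra samples generated past the end of the earlier signal.
    incoming: the first samples of the signal that follows.
    """
    n = len(outgoing)
    if n != len(incoming):
        raise ValueError("segments must have the same length")
    mixed = []
    for i in range(n):
        # Weight of the incoming signal rises linearly; the two weights sum to 1.
        w = (i + 1) / (n + 1)
        mixed.append((1.0 - w) * outgoing[i] + w * incoming[i])
    return mixed
```

Because the two weights always sum to one, a signal that is identical on both sides of the join passes through unchanged.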

As the loss interval grows, repeating the same speech signal for a long time makes the reproduced synthetic sound unnatural. In that case, as in FIG. 4 described below, the amplitude of the speech signal in the first two-thirds of the loss interval is reduced by 20% per 10 msec, and the remaining one-third is gradually increased again to match the amplitude of the following frame. If the phonemes before and after the loss are the same, the amplitude of the compensation signal is adjusted linearly, taking into account the peak points of the preceding and following speech.
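For the different-phoneme case, the gain envelope might look like the sketch below. It assumes an 8 kHz sample rate, reads "20% per 10 msec" as a linear drop of 0.2 in gain for every 10 msec, and ramps linearly back to a target gain over the final third; none of these choices is fixed by the description.

```python
def gain_envelope(loss_len, sample_rate=8000, drop_per_10ms=0.2, end_gain=1.0):
    """Per-sample gains for a concealed gap of `loss_len` samples.

    First 2/3: gain falls linearly by `drop_per_10ms` every 10 msec (floored at 0).
    Last 1/3: gain rises linearly from the lowest point back to `end_gain`.
    """
    head_len = (2 * loss_len) // 3
    samples_per_10ms = sample_rate // 100
    gains = [max(0.0, 1.0 - drop_per_10ms * i / samples_per_10ms)
             for i in range(head_len)]
    tail_len = loss_len - head_len
    start = gains[-1] if gains else 1.0
    for j in range(tail_len):
        # Linear ramp that reaches end_gain on the last sample of the gap.
        frac = (j + 1) / tail_len
        gains.append(start + (end_gain - start) * frac)
    return gains
```

Multiplying the synthesized gap sample by sample with this envelope gives the fade-down and fade-up shape described for FIG. 4.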

FIG. 2 is an exemplary view of loss interval compensation in the packet loss concealment method using pre- and post-loss packet information according to the present invention.

FIG. 2 illustrates the characteristic features of the method. It shows the pitch-period speech from before the loss interval and the pitch-period speech from after the loss interval being applied to the loss interval. It also shows OLA being applied to reduce the discontinuity where the signals from before and after the loss meet.

FIG. 3 is an exemplary view of the OLA used to reduce discontinuity in the packet loss concealment method using pre- and post-loss packet information according to the present invention.

FIG. 3 shows the OLA used to reduce the discontinuity at the boundary that arises from using pre- and post-loss speech to synthesize the loss interval. As shown in FIG. 3, the OLA uses a triangular window.

FIG. 4 is an exemplary view of amplitude adjustment of the synthesized speech in the loss interval in the packet loss concealment method using pre- and post-loss packet information according to the present invention.

As described above, the synthesized sound can become unnatural when the loss interval is long, so its amplitude is adjusted as shown in FIG. 4. The scheme shown in FIG. 4 applies when the phonemes before and after the loss differ; when they are the same, the amplitude of the compensation signal is adjusted linearly, taking into account the peak points of the preceding and following speech, as described above.

FIG. 5 is a diagram explaining an embodiment of the packet loss concealment method using pre- and post-loss packet information according to the present invention.

When a packet loss interval occurs between received voice packets, pitch detection is performed, and the spectral distance computed with a fast Fourier transform (FFT) is compared with the baseline spectral distance between neighboring packets in order to compensate the loss interval. The amplitude of the speech waveform is also adjusted according to the length of the loss interval.
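The description does not define the spectral distance itself, so the following is one possible reading: a root-mean-square difference between log-magnitude spectra. To stay self-contained it uses a direct DFT from the standard library rather than a true FFT; the function names and the `eps` floor are assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the DFT bins 0..N/2 of a real-valued frame (direct O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_distance(frame_a, frame_b, eps=1e-9):
    """RMS difference between the log-magnitude spectra of two equal-length frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same length")
    spec_a = magnitude_spectrum(frame_a)
    spec_b = magnitude_spectrum(frame_b)
    total = sum((math.log(x + eps) - math.log(y + eps)) ** 2
                for x, y in zip(spec_a, spec_b))
    return math.sqrt(total / len(spec_a))
```

Under this definition two frames with the same spectral shape but slightly different levels are close, while frames whose energy sits at different frequencies are far apart, which is the property the phoneme comparison relies on.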

FIGS. 6A and 6B are flowcharts of an embodiment of the packet loss concealment method using pre- and post-loss packet information according to the present invention.

The receiving side confirms that packet loss has occurred during VoIP transmission (601).

To conceal the packet loss, the receiving side first performs pitch detection on the packet before the loss and the packet after the loss (602). A packet in which a pitch is detected is classified as voiced; otherwise it is classified as unvoiced.
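The description does not say how the pitch is detected. One common approach, shown here purely as an assumed stand-in, is to look for a strong peak in the autocorrelation; the lag range (20 to 160 samples, roughly 50 to 400 Hz at an assumed 8 kHz) and the 0.5 threshold are illustrative values.

```python
import math


def detect_pitch(frame, min_lag=20, max_lag=160, threshold=0.5):
    """Return the pitch period in samples, or None if the frame looks unvoiced.

    Uses autocorrelation normalized by the frame energy; a frame counts as
    voiced only if some lag scores above `threshold` (a hypothetical value).
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None  # silence: no pitch
    best_lag, best_score = None, threshold
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        score = corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage: a 200 Hz tone sampled at an assumed 8 kHz has a 40-sample period.
tone = [math.sin(2 * math.pi * i / 40) for i in range(320)]
pitch_lag = detect_pitch(tone)
```

A return value of `None` corresponds to the "unvoiced" classification in step 602, and any integer lag corresponds to "voiced".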

Next, it is checked whether the packets before and after the loss are both voiced or both unvoiced (603). If so, the spectral distance between the speech signals of the two packets is calculated (604). The calculated spectral distance is then compared with the spectral distance between neighboring packets to check whether the difference is large (605).

If the difference is not large, the packets are judged to be the same phoneme (606), and the entire loss interval is compensated with the pre-loss packet (607).

If the difference is relatively large, the packets before and after the loss are judged to be different phonemes (608). They are likewise judged to be different phonemes (608) if the check in step 603 finds that one is voiced and the other is unvoiced.

When the packets before and after the loss are judged to be different phonemes, the first two-thirds of the loss interval is compensated with the phoneme of the pre-loss packet and the remaining one-third with the phoneme of the post-loss packet (609).

When compensating the loss interval, the speech of one pitch period is repeated as in FIG. 2 if the phoneme is voiced, and the entire speech content of the packet is used if it is unvoiced.

Next, it is checked whether the compensation has produced a discontinuity (610). If so, additional compensation signal is generated for the overlap interval (D in FIG. 3) and OLA with a triangular window is applied to reduce the discontinuity (611). Specifically, when the phonemes before and after the loss are judged to be different, a discontinuity can arise in the middle of the loss interval, where the speech taken from before the loss meets the speech taken from after it. When they are judged to be the same, the loss interval is filled from the pre-loss packet alone, so a discontinuity can arise at the end of the loss interval. OLA is applied wherever such a discontinuity arises.

After OLA has been applied to any discontinuity, or if no discontinuity arose, it is checked whether the loss interval is longer than a reference value (612). The longer the loss interval, the longer the same speech signal is repeated, and the more unnatural the reproduced synthetic sound becomes. Therefore, if the loss interval is longer than the given reference value, the amplitude of the compensated speech signal is adjusted linearly (613). Specifically, when the phonemes before and after the loss differ, the amplitude over the first two-thirds of the loss interval is reduced by 20% per 10 msec and the remaining one-third is gradually increased again to match the amplitude of the following frame. When they are judged to be the same, the amplitude of the compensation signal is adjusted linearly, taking into account the peak points of the preceding and following speech.
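For the same-phoneme branch of step 613, the description says only that the amplitude is adjusted linearly with regard to the peak points before and after the loss. One way to read this, offered as an assumption, is to scale the gap so that its peak level moves in a straight line from the peak of the pre-loss packet to the peak of the post-loss packet:

```python
def peak_matched_gains(before, after, loss_len):
    """Gains that move the peak level linearly from `before` to `after`.

    The gap is assumed to be filled from `before`, so a gain of 1.0 keeps the
    pre-loss peak and a gain of peak_after / peak_before reaches the post-loss peak.
    """
    peak_before = max(abs(x) for x in before)
    peak_after = max(abs(x) for x in after)
    if peak_before == 0.0:
        return [1.0] * loss_len  # nothing to scale
    ratio = peak_after / peak_before
    # Interpolate strictly between the two packets: never exactly 1.0 or ratio.
    return [1.0 + (ratio - 1.0) * (i + 1) / (loss_len + 1)
            for i in range(loss_len)]
```

For example, with a pre-loss peak of 2.0 and a post-loss peak of 1.0, a three-sample gap gets the gains 0.875, 0.75, and 0.625.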

The method of the present invention as described above may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the present invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the present invention.

As described above, the present invention compares the phonemes on both sides of a packet loss in VoIP using the pitch information and spectral distance from before and after the loss, and compensates the loss interval with the phoneme information from both sides. It can thereby provide a more natural synthetic speech signal not only for single packet losses but also for the relatively long loss intervals caused by consecutive packet losses.

Claims

A packet loss concealment method using pre- and post-loss packet information, applied to an audio transmission system, the method comprising:

a first step of confirming the loss of a received voice packet and comparing the packets before and after the loss interval to determine whether the packets before and after the loss are different phonemes;

a second step of, if the first step determines that the packets before and after the loss are the same phoneme, concealing the packet loss by compensating the entire loss interval using the voice information of the pre-loss packet; and

a third step of, if the first step determines that the packets before and after the loss are different phonemes, concealing the packet loss by compensating a predetermined portion of the loss interval using the voice information of the pre-loss packet and compensating the remainder of the loss interval using the voice information of the post-loss packet.

The method of claim 1, further comprising:

a fourth step of reducing discontinuity by applying overlap-add (OLA) over a predetermined overlap interval to the discontinuous portion that arises in the loss interval when the loss interval is compensated using the voice information of the pre- and post-loss packets.

The method of claim 1,

wherein the first step comprises:

a fifth step in which the audio transmission system confirms a packet loss in the received voice packets;

a sixth step of determining whether the packets before and after the loss are different phonemes based on whether a pitch is detected in each of the packets before and after the loss; and

a seventh step of, when the pitch detection results are the same, obtaining the spectral distance between the voice signals of the pre- and post-loss packets and determining whether the pre- and post-loss packets are different phonemes based on whether that distance differs from the spectral distance between neighboring packets.

The method according to any one of claims 1 to 3, further comprising:

an eighth step of linearly adjusting the amplitude of the compensation voice signal reproduced from the voice information of the pre- and post-loss packets, for a loss interval that is longer than a predetermined reference value and whose synthesized sound may therefore be unnatural.

The method of claim 4,

wherein the eighth step comprises:

a ninth step of checking, for a loss interval that is longer than a predetermined reference value and whose synthesized sound may therefore be unnatural, whether the packets before and after the loss were determined to be different phonemes;

a tenth step of, if the ninth step finds that the pre- and post-loss packets were determined to be different phonemes, reducing at a predetermined rate the amplitude of the voice signal in the predetermined first portion of the loss interval, which is compensated with the voice information of the pre-loss packet, and gradually increasing the amplitude of the remainder to match the amplitude of the voice signal of the post-loss packet; and

an eleventh step of, if the ninth step finds that the pre- and post-loss packets were determined to be the same phoneme, linearly adjusting the amplitude of the compensation voice signal in consideration of the peak points of the voice information of the pre- and post-loss packets.

A computer-readable recording medium storing a program that causes an audio transmission system having a processor to realize:

a first function of confirming the loss of a received voice packet and comparing the packets before and after the loss interval to determine whether the packets before and after the loss are different phonemes;

a second function of, if the first function determines that the packets before and after the loss are the same phoneme, concealing the packet loss by compensating the entire loss interval using the voice information of the pre-loss packet; and

a third function of, if the first function determines that the packets before and after the loss are different phonemes, concealing the packet loss by compensating a predetermined portion of the loss interval using the voice information of the pre-loss packet and compensating the remainder of the loss interval using the voice information of the post-loss packet.

The recording medium of claim 6, wherein the program further realizes:

a fourth function of reducing discontinuity by applying overlap-add (OLA) over a predetermined overlap interval to the discontinuous portion that arises in the loss interval when the loss interval is compensated using the voice information of the pre- and post-loss packets.

The recording medium according to claim 6 or 7, wherein the program further realizes:

a fifth function of linearly adjusting the amplitude of the compensation voice signal reproduced from the voice information of the pre- and post-loss packets, for a loss interval that is longer than a predetermined reference value and whose synthesized sound may therefore be unnatural.