JP5335390B2

JP5335390B2 - Signal processing apparatus and signal processing method

Info

Publication number: JP5335390B2
Application number: JP2008307219A
Authority: JP
Inventors: 幸夫岡田
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2008-12-02
Filing date: 2008-12-02
Publication date: 2013-11-06
Anticipated expiration: 2028-12-02
Also published as: JP2010134013A

Abstract

PROBLEM TO BE SOLVED: To provide a signal processing device and a signal processing method that accurately calculate the fundamental period of an input signal immediately before the current time.

SOLUTION: A template setting section 51 sets, as a template TM, the voice signal over a specified time width extending into the past from the time RT at which a packet loss occurs. A period detection section 52 shifts the template TM over the voice signal from the time RT toward the past, correlates the template TM with the voice signal, and detects the fundamental period of the voice signal immediately before the time RT from the shift amount at which the correlation peak between the template TM and the voice signal is strongest. The template setting section 51 increases the time width of the template TM as the period detection section 52 increases the shift amount of the template TM.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a signal processing device and a signal processing method for detecting the fundamental period of an input signal.

In recent years, VoIP (Voice over Internet Protocol) communication, in which voice calls are carried over communication media such as PLC (power line communication) or wireless LAN, has become known. Because the transmission quality of media such as PLC and wireless LAN tends to fluctuate, degradation of voice quality due to packet loss is a problem. When a packet loss occurs, an audible click may be heard during the call.

For this reason, when a packet loss occurs, a VoIP communication apparatus performs concealment processing: it detects the fundamental period (pitch) of the audio signal immediately preceding the loss and interpolates the loss period using one fundamental period of the audio signal taken from just before the loss.

FIG. 7 is a waveform diagram of an audio signal for explaining conventional concealment processing. In FIG. 7, the vertical axis indicates the amplitude of the audio signal input to the communication apparatus, and the horizontal axis indicates time. When reception of a voice packet fails and a packet loss occurs, the communication apparatus sets the audio signal of a predetermined period immediately before the loss as a template.

Next, the template is slid over the audio signal toward the past, starting from the time at which the packet loss occurred. A correlation between the template and the audio signal is then computed, and the fundamental period of the audio signal immediately before the packet loss is detected.

Finally, one fundamental period of the audio signal, counted back from the point of the packet loss, is extracted and applied repeatedly over the loss period to interpolate it.
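The repetition step can be illustrated with a minimal Python sketch. It assumes the received samples are held in a list `history` (oldest first, last element just before the loss) and that the period has already been detected; the function name `conceal_by_repetition` is hypothetical. Practical concealers usually also cross-fade at the boundaries and attenuate long losses, which this sketch omits.

```python
def conceal_by_repetition(history, period, loss_len):
    """Fill `loss_len` lost samples by repeating the last `period` received samples.

    `history` holds the received samples in chronological order; its last
    element is the sample immediately before the loss.
    """
    if not 1 <= period <= len(history):
        raise ValueError("period must be between 1 and len(history)")
    cycle = history[-period:]  # one fundamental period ending at the loss point
    return [cycle[i % period] for i in range(loss_len)]


# With a 3-sample period, the last cycle [3, 4, 5] is tiled over 7 lost samples.
print(conceal_by_repetition([0, 1, 2, 3, 4, 5], period=3, loss_len=7))
# [3, 4, 5, 3, 4, 5, 3]
```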

The loss period is interpolated with one fundamental period of the audio signal for the following reason. When the caller utters a sound such as "a", that sound is divided into segments of about 20 msec, each sent in one voice packet, so the loss period is likely to contain a repetition of the fundamental period immediately before the packet loss.

Therefore, when performing concealment processing, it is important to detect the fundamental period of the audio signal immediately before the packet loss.

Patent Documents 1 to 4 listed below are known as conventional methods of detecting the fundamental period. Patent Document 1 discloses a technique that computes the autocorrelation function R(τ) of an input speech waveform with a plurality of different analysis window widths, takes the obtained maximum as R(τ)max, computes V = R(τ)max / R(0), and determines the pitch of the input speech waveform from the most reliable value of τ, taking into account the magnitude of V and the variation in τ.
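The ratio V can be illustrated for a single analysis window. This Python sketch is not the method of Patent Document 1: it uses one window only and simply takes the lag of the autocorrelation maximum, whereas that document combines several window widths and weighs V against the variation in τ. The function names are hypothetical.

```python
def autocorrelation(frame, tau):
    """R(tau) = sum over j of frame[j] * frame[j + tau] within one window."""
    return sum(frame[j] * frame[j + tau] for j in range(len(frame) - tau))


def pitch_and_voicing(frame, tau_min, tau_max):
    """Return the lag of max R(tau) in [tau_min, tau_max] and V = R(tau)max / R(0)."""
    r0 = autocorrelation(frame, 0)
    best_tau = max(range(tau_min, tau_max + 1), key=lambda t: autocorrelation(frame, t))
    v = autocorrelation(frame, best_tau) / r0 if r0 else 0.0
    return best_tau, v


print(pitch_and_voicing([1, -1] * 10, tau_min=1, tau_max=4))  # (2, 0.9)
```

A V close to 1 indicates a strongly periodic window; here R(2) = 18 and R(0) = 20.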

Patent Document 2 discloses a technique that computes the autocorrelation function φ(d) of a digitally sampled sound signal, extracts an approximate pitch of the signal, uses that approximate pitch as a basic interval, determines a calculation interval by dividing the signal into integer multiples of the basic interval, and obtains the pitch of the signal from the autocorrelation function φ′(d) of the determined calculation interval.

Patent Document 3 discloses a technique that, when the average pitch extracted in past frames is large, sets a long analysis window and a high decimation rate, and conversely, when the average is small, sets a short analysis window and a low decimation rate.

Patent Document 4 discloses a technique that, when obtaining the pitch of an input speech signal for each of a plurality of predetermined pitch candidates, widens the evaluation range on the time axis of the input speech signal used for period evaluation as the pitch candidate becomes longer, and narrows the sampling period of the input speech signal.

Patent Document 1: Japanese Patent No. 3219868
Patent Document 2: Japanese Patent No. 3233448
Patent Document 3: Japanese Patent No. 345941
Patent Document 4: JP 2004-317533 A

However, none of the methods of Patent Documents 1 to 4 is aimed at the concealment processing described above; each simply aims to detect the fundamental period of an input signal accurately. Consequently, they cannot accurately detect the fundamental period of the input signal immediately before a packet loss. In particular, when the fundamental period of the input signal changes just before a packet loss and this period does not match the time width of the template, the methods of Patent Documents 1 to 4 cannot detect that period accurately.

An object of the present invention is to provide a signal processing device and a signal processing method capable of accurately calculating the fundamental period of an input signal immediately before the current time.

(1) A signal processing device according to one aspect of the present invention includes: a reference signal setting unit that sets, as a reference signal, the input signal over a certain time width extending from the current time into the past; and a period detection unit that detects a fundamental period of the input signal by sliding the reference signal over the input signal from the current time toward the past and obtaining the correlation between the reference signal and the input signal. The reference signal setting unit increases the time width of the reference signal as the slide amount of the reference signal increases.

A signal processing method according to another aspect of the present invention includes: a reference signal setting step of setting, as a reference signal, the input signal over a certain time width extending from the current time into the past; and a period detection step of detecting the fundamental period by sliding the reference signal from the current input signal toward past input signals and obtaining the correlation between the reference signal and the input signal. In the period detection step, the time width of the reference signal is increased as the slide amount of the reference signal increases.

With this configuration, the input signal over a certain time width extending from the current time into the past is set as the reference signal. The reference signal is then slid over the input signal from the current time toward the past, the correlation between the reference signal and the input signal is obtained, and the fundamental period of the input signal is detected.

Here, the time width of the reference signal increases as the slide amount increases. Therefore, at a relatively early stage, while the slide amount is still small, there is a point at which the reference signal consists of roughly one fundamental period of the input signal immediately before the current time. At that point, a strong correlation peak appears between the reference signal and the input signal. When the slide amount becomes larger, the time width of the reference signal grows accordingly, and the reference signal comes to contain a plurality of frequency components. As a result, no correlation peak as strong as the one obtained at that earlier point can arise. The fundamental period of the input signal immediately before the current time can therefore be detected accurately.

Furthermore, because the reference signal is short at the relatively early stages where the slide amount is small, the amount of computation can be reduced.
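Aspect (1) can be illustrated with a minimal Python sketch. It is not the patent's Equation (1) or (2), and the following are assumptions: the received samples are a list `history` whose last element is the sample at the current time; the template is always the `width` most recent samples; the hypothetical width rule is `max(n_init, tau)`, a fixed initial width followed by growth with the slide amount; and each lag is scored with the absolute difference averaged over the template width, so that scores for different widths are comparable (smaller means stronger correlation).

```python
def growing_width(tau, n_init):
    """Hypothetical width rule: at least n_init samples, growing with the lag."""
    return max(n_init, tau)


def mean_abs_diff(history, width, tau):
    """Average |template - signal shifted back by tau| over the template width."""
    n = len(history)
    return sum(abs(history[n - width + i] - history[n - width + i - tau])
               for i in range(width)) / width


def detect_period_growing(history, n_init, tau_min, tau_max):
    """Return the lag with the smallest score, widening the template as the lag grows."""
    best_tau, best_score = None, float("inf")
    for tau in range(tau_min, tau_max + 1):
        width = growing_width(tau, n_init)
        if width + tau > len(history):
            break  # not enough past samples for this lag (or any larger one)
        score = mean_abs_diff(history, width, tau)
        if score < best_score:
            best_tau, best_score = tau, score
    return best_tau


older = [10, 20, 30, 40] * 10     # older part: period of 4 samples
recent = [0, 1, 2, 3, 4, 5] * 2   # two most recent cycles: period of 6 samples
print(detect_period_growing(older + recent, n_init=4, tau_min=3, tau_max=8))  # 6
```

The sketch returns 6, the period immediately before the current time: at τ = 6 the 6-sample template is compared only with the preceding recent cycle, while at τ = 7 and 8 the wider templates reach back into the older period-4 samples.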

(2) Preferably, the device further includes a packet loss detection unit that detects whether a packet loss has occurred in the input signal, and a concealment processing unit. The period detection unit detects the fundamental period taking as the current time the loss occurrence time at which the packet loss detection unit detected the packet loss, and the concealment processing unit extracts one fundamental period of the input signal counted back from the loss occurrence time and interpolates the loss period with the extracted input signal.

With this configuration, when a packet loss occurs, the loss occurrence time is taken as the current time, and the fundamental period of the input signal immediately before the loss occurrence time is detected. One fundamental period of the input signal, counted back from the loss occurrence time, is then extracted and used to interpolate the loss period as the concealment processing. The concealment processing can therefore be performed accurately.

(3) Preferably, the reference signal setting unit sets the time width of the reference signal to a predetermined initial time width until the slide amount of the reference signal reaches a predetermined slide reference value.

With this configuration, when the slide amount of the reference signal is relatively small, the time width of the reference signal is set to the initial time width. A minimum time width is therefore guaranteed even for small slide amounts, so the correlation between the reference signal and the input signal can be obtained more accurately. The initial time width may be, for example, about the minimum fundamental period expected for the input signal, and the slide reference value may be, for example, equal to the initial time width.

(4) Preferably, the period detection unit obtains the correlation between the reference signal and the input signal by cross-correlation.

With this configuration, since cross-correlation is used, the correlation between the reference signal and the input signal can be calculated accurately.
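Because the body of Equation (1) does not appear in this text, the following Python sketch uses a standard normalized cross-correlation instead. Dividing by the template and segment energies is an assumption, chosen so that scores for templates of different widths stay comparable; the function name is hypothetical, and the template is taken to be the `width` most recent samples.

```python
import math


def normalized_xcorr(history, width, tau):
    """Correlation between the last `width` samples and the segment tau samples earlier.

    Returns a value in [-1, 1]; larger means stronger correlation.
    """
    n = len(history)
    if width + tau > n:
        raise ValueError("history too short for this width and lag")
    template = history[n - width:]
    segment = history[n - width - tau:n - tau]
    num = sum(t * s for t, s in zip(template, segment))
    den = math.sqrt(sum(t * t for t in template) * sum(s * s for s in segment))
    return num / den if den else 0.0


samples = [1, -1, 2, -2, 0] * 4  # period of 5 samples
scores = {tau: normalized_xcorr(samples, 5, tau) for tau in range(2, 8)}
print(max(scores, key=scores.get))  # 5
```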

(5) Preferably, the period detection unit obtains the correlation between the reference signal and the input signal by AMDF (average magnitude difference function).

With this configuration, since AMDF is used, the correlation between the reference signal and the input signal can be calculated accurately with a relatively small amount of computation.
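Equation (2) is likewise not reproduced here, so this Python sketch uses the textbook AMDF, D(τ) = (1/N) Σ |x(j) − x(j−τ)|. Per sample it needs only a subtraction and an absolute value, with no multiplication, which is consistent with the lower computational cost stated above. The template is assumed to be the `width` most recent samples, and the minimum of D(τ) marks the strongest correlation.

```python
def amdf(history, width, tau):
    """Average magnitude difference between the last `width` samples and the
    segment tau samples earlier. Smaller values mean stronger correlation."""
    n = len(history)
    if width + tau > n:
        raise ValueError("history too short for this width and lag")
    return sum(abs(history[n - width + i] - history[n - width - tau + i])
               for i in range(width)) / width


samples = [3, 7, 1, 0, 5, 2] * 5  # period of 6 samples
values = {tau: amdf(samples, 6, tau) for tau in range(3, 10)}
print(min(values, key=values.get))  # 6
```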

(6) Preferably, the input signal is sampled at a predetermined sampling period, and the period detection unit performs the cross-correlation using Equation (1).

Here, φ(τ) denotes the correlation value, N the time width of the reference signal, x(j) the reference signal, x(j−τ) the input signal, k+1 the start point of the reference signal, a a predetermined coefficient, τ the slide amount of the reference signal, and j the sampling number of each sampling point of the input signal.

With this configuration, the cross-correlation is calculated using Equation (1).

(7) Preferably, the input signal is sampled at a predetermined sampling period, and the period detection unit performs the AMDF using Equation (2).

With this configuration, the AMDF is calculated using Equation (2).

(8) Preferably, the reference signal setting unit sets a to a predetermined fixed value within the range 1 ≤ a < 2 until the slide amount of the reference signal exceeds a predetermined change reference value, and, once the slide amount exceeds the change reference value, decreases a toward 1 as the slide amount approaches a predetermined maximum slide amount.

With this configuration, when the slide amount is small, the time width of the reference signal can be set somewhat larger than the slide amount, and when the slide amount is large, it can be set to roughly the slide amount. This prevents the time width of the reference signal from becoming too small when the slide amount is small.
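Aspect (8) can be sketched in Python under two assumptions that the text does not fix: that the template width is N = a·τ (rounded, and never below the initial width), and that a falls linearly from its fixed value to 1 between the change reference value and the maximum slide amount. Both rules and all function and parameter names are hypothetical.

```python
def coefficient_a(tau, a0, tau_change, tau_max):
    """Hold a at a0 up to tau_change, then lower it linearly to 1 at tau_max."""
    if not 1.0 <= a0 < 2.0:
        raise ValueError("a0 must satisfy 1 <= a0 < 2")
    if tau_max <= tau_change:
        raise ValueError("tau_max must exceed tau_change")
    if tau <= tau_change:
        return a0
    frac = min(1.0, (tau - tau_change) / (tau_max - tau_change))
    return a0 - (a0 - 1.0) * frac


def width_from_coefficient(tau, n_init, a0, tau_change, tau_max):
    """Hypothetical template width N = round(a * tau), never below n_init."""
    return max(n_init, round(coefficient_a(tau, a0, tau_change, tau_max) * tau))


for tau in (10, 40, 100, 160):
    print(tau, width_from_coefficient(tau, n_init=24, a0=1.5, tau_change=40, tau_max=160))
```

With the illustrative values a0 = 1.5, a change reference value of 40 and a maximum slide of 160, the widths printed are 24, 60, 125 and 160: about 1.5 times the lag for small lags (with the initial width as a floor), converging to the lag itself at the maximum.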

According to the present invention, at a relatively early stage, while the slide amount is still small, there is a point at which the reference signal consists of roughly one fundamental period of the input signal immediately before the current time. At that point, a strong correlation peak appears between the reference signal and the input signal. When the slide amount becomes larger, the time width of the reference signal grows accordingly, and the reference signal comes to contain a plurality of frequency components, so no correlation peak as strong as the earlier one can arise. The fundamental period of the input signal immediately before the current time can therefore be detected accurately.

The following describes, as an example, a case in which a signal processing device according to an embodiment of the present invention is applied to a communication apparatus. FIG. 1 is a block diagram showing the overall configuration of a communication apparatus to which the signal processing device according to the embodiment is applied. This communication apparatus has, for example, a VoIP call function, and communicates with other communication apparatuses connected via a predetermined communication network using a communication protocol such as SIP (Session Initiation Protocol), H.323, IPv4, or IPv6.

As shown in FIG. 1, the communication apparatus includes a packet receiving unit 1, a delay fluctuation absorbing buffer 2, a timer 3, a packet loss detection unit 4, a detection processing unit 5, a concealment processing unit 6, an audio output unit 7, and a speaker 8.

The packet receiving unit 1 receives voice packets transmitted from another communication apparatus and outputs them to the delay fluctuation absorbing buffer 2. Each voice packet conforms to, for example, RTP (Real-time Transport Protocol) and contains 20 msec of digital audio signal. The audio signal is, for example, a digital audio signal sampled at a sampling frequency of 8 kHz and encoded with PCM μ-law or the like.

The packet receiving unit 1 outputs the received voice packets to the delay fluctuation absorbing buffer 2 in chronological order according to the sequence number contained in the RTP header of each voice packet. In addition to the sequence number, the RTP header contains a timestamp and other fields. The sequence number indicates the transmission order of the voice packets, and the timestamp indicates the relative position of the audio signal in the original audio waveform.
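The RTP sequence number is a 16-bit field that wraps from 65535 back to 0 (RFC 3550), so ordering by sequence number needs modular arithmetic. A minimal Python sketch, assuming packets are represented as hypothetical `(sequence_number, payload)` tuples that all follow a known starting number `base_seq`; the function names are also hypothetical.

```python
def rtp_order_key(seq, base_seq):
    """Distance of a 16-bit RTP sequence number from base_seq, modulo 2**16."""
    return (seq - base_seq) % 65536


def reorder_by_sequence(packets, base_seq):
    """Sort (sequence_number, payload) pairs into transmission order."""
    return sorted(packets, key=lambda p: rtp_order_key(p[0], base_seq))


received = [(1, "c"), (65535, "a"), (0, "b")]  # arrived out of order, across a wrap
print([payload for _, payload in reorder_by_sequence(received, base_seq=65535)])
# ['a', 'b', 'c']
```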

The delay fluctuation absorbing buffer 2 temporarily holds the voice packets output from the packet receiving unit 1 and outputs them after a predetermined delay, thereby absorbing variation in packet delay (jitter).

The timer 3 is used by the packet loss detection unit 4 to detect packet losses. When the delay fluctuation absorbing buffer 2 outputs a voice packet to the detection processing unit 5, the packet loss detection unit 4 causes the timer 3 to start timing. If the time measured by the timer 3 exceeds a predetermined time, beyond which a packet loss is assumed to have occurred, before the buffer 2 outputs the next voice packet, the packet loss detection unit 4 determines that a packet loss has occurred.
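The timer-based rule can be sketched in Python. To keep it deterministic, the sketch passes explicit millisecond timestamps instead of running a real timer; the class name and the 30 ms threshold are hypothetical, the latter chosen only because it exceeds the 20 ms packet interval.

```python
class LossDetector:
    """Declare a packet loss when no packet is output within `timeout_ms`."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_output_ms = None  # time at which the buffer last output a packet

    def on_packet_output(self, now_ms):
        """Call when the jitter buffer outputs a packet; restarts the timer."""
        self.last_output_ms = now_ms

    def is_lost(self, now_ms):
        """True if the time since the last output exceeds the timeout."""
        if self.last_output_ms is None:
            return False  # timer not started yet
        return now_ms - self.last_output_ms > self.timeout_ms


detector = LossDetector(timeout_ms=30)
detector.on_packet_output(now_ms=0)
print(detector.is_lost(now_ms=20), detector.is_lost(now_ms=31))  # False True
```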

The detection processing unit 5 extracts the audio signal from the voice packets sequentially output from the delay fluctuation absorbing buffer 2, in accordance with their sequence numbers and timestamps. When the packet loss detection unit 4 detects a packet loss, the detection processing unit 5 performs fundamental period (pitch) detection on the extracted audio signal; when no packet loss is detected, it passes the extracted audio signal unchanged to the audio output unit 7. The detection processing unit 5 retains the audio signal for a certain period in the past.

The detection processing unit 5 includes a template setting unit 51 (an example of the reference signal setting unit) and a period detection unit 52. The template setting unit 51 sets, as a template (an example of the reference signal), the audio signal over a certain time width extending into the past from the loss occurrence time at which the packet loss occurred. The template setting unit 51 increases the time width of the template as the period detection unit 52 increases the slide amount of the template.

The period detection unit 52 slides the template set by the template setting unit 51 over the audio signal from the loss occurrence time toward the past, obtains the correlation between the template and the audio signal, and detects the fundamental period of the audio signal immediately before the loss occurrence time from the slide amount at which the correlation peak between the template and the audio signal is strongest.

FIG. 2 is a waveform diagram of an audio signal for explaining the processing of the template setting unit 51 and the period detection unit 52. In FIG. 2, the vertical axis indicates the amplitude of the audio signal, and the horizontal axis indicates time in number of samples. The template TJ shown in FIG. 2 is the template used in conventional concealment processing.

When a packet loss occurs, a conventional communication apparatus sets, for example, the audio signal over a predetermined period before the loss occurrence time RT as the template TJ. It then slides the template TJ over the audio signal from RT toward the past to obtain the correlation between the audio signal and the template TJ, and detects the fundamental period of the audio signal from the slide amount of the template TJ at which the strongest correlation peak is obtained.

FIG. 3 is a graph showing the correlation values between the conventional template TJ and the audio signal. In FIG. 3, the correlation values are calculated using AMDF. The vertical axis indicates the correlation value, and the horizontal axis indicates time in number of samples, with the loss occurrence time RT taken as 0. Because FIG. 3 shows AMDF values, a smaller value indicates a stronger correlation between the audio signal and the template TJ.

In FIG. 3, a downward (minimum) correlation peak PK1 first appears at 37 samples, then a downward correlation peak PK2 appears at 47 samples, and thereafter downward peaks recur with a period of about 37 samples. PK1 is lower than PK2, that is, it indicates a stronger correlation. The conventional method therefore detects 37 samples as the fundamental period of the audio signal.

However, as shown in FIG. 2, the fundamental period of the audio signal immediately before the loss occurrence time RT is 47 samples. The conventional method thus fails to detect this fundamental period accurately.

This is considered to occur because the time width of the template TJ is much larger than 47 samples: the template TJ contains only one period of the target signal, whose fundamental period is 47 samples, but as many as three periods of the non-target signal, whose fundamental period is 37 samples, so a strong correlation peak appears at 37 samples.

In this case, concealment is performed by extracting 37 samples of the audio signal counted back from the loss occurrence time RT and applying them repeatedly over the loss period to interpolate it.

As a result, it becomes difficult to join the waveform in the loss period smoothly to the waveform outside it, and the concealment processing cannot be performed accurately.

Conversely, if the time width of the template is smaller than 47 samples, a fundamental period of 47 samples cannot be detected.

Therefore, in this embodiment, as shown in FIG. 2, the time width of the template TM is increased as the slide amount of the template TM increases.

Consequently, once the template TM has been slid a certain amount, as with the template TM shown in the third row of FIG. 2, the template contains almost exclusively the target signal with the 47-sample period. In contrast, the template TM in the fourth row of FIG. 2 contains, in addition to the signal with the 47-sample period, a signal with a 37-sample period. The correlation between the third-row template TM and the audio signal is therefore stronger than that between the fourth-row template TM and the audio signal, so the fundamental period of the audio signal immediately before the loss occurrence time RT can be detected accurately.

The period detection unit 52 preferably uses as the correlation calculation, for example, the cross-correlation given by Equation (1) or the AMDF given by Equation (2).

Here, φ(τ) denotes the correlation value, N the time width of the template TM, x(j) the template TM, x(j−τ) the audio signal, k+1 the start point of the template TM, a a predetermined coefficient, τ the slide amount of the template TM, and j the sampling number of each sampling point of the audio signal.

Preferably, the template setting unit 51 sets the time width of the template TM to a predetermined initial time width until the slide amount of the template TM reaches a predetermined slide reference value.

In this way, when the slide amount of the template TM is relatively small, the time width of the template TM is set to the initial time width, so a minimum time width is guaranteed even for small slide amounts and the correlation between the template TM and the audio signal (input signal) can be obtained more accurately.

Furthermore, although the time width of the template TM is held at the initial time width until the slide amount reaches the slide reference value, keeping this initial time width relatively short reduces the amount of computation.

The initial time width is preferably about the minimum expected fundamental period of the audio signal. The slide reference value may be, for example, equal to the initial time width.

FIG. 4 illustrates the processing of the template setting unit 51 and the period detection unit 52. Each point on the straight line in FIG. 4 is a sampling point of the audio signal. The rightmost sampling point is the loss occurrence time RT, and points further to the left lie further in the past. The loss occurrence time RT is taken as the 0th sampling point. The fundamental period of an audio signal is about 3 msec at its shortest, which corresponds to 24 samples at a sampling frequency of 8 kHz. The initial time width may therefore be set to, for example, 24 samples; in FIG. 4, however, for convenience of explanation, the initial time width of the template TM is 4, a = 1, and the slide reference value is 5.

First, when a packet loss occurs, the period detection unit 52 sets τ = 0. Since the initial time width of the template TM is 4, it sets the fourth sampling point to the left of the loss occurrence time RT as the reference sampling point k, and assigns sampling numbers that increase by 1 from k toward RT and decrease by 1 from k toward the past.

The template setting unit 51 then sets the audio signal x(k + 1) to x(k + 4) as the template TM0.

The period detection unit 52 then calculates the correlation value φ(0) between the template TM0 and the audio signal x(j − 0) using Expression (1) or (2). In this case, the template TM0 is applied to the audio signal x(k + 1) to x(k + 4).

Next, the period detection unit 52 sets τ = 1 and, in the same way as for τ = 0, calculates the correlation value φ(1) between the template TM0 and the audio signal x(j − 1) using Expression (1) or (2). In this case, the template TM0 is applied to the audio signal x(k) to x(k + 3).

The template TM0 is then slid further into the past relative to the audio signal until τ = 4, and φ(2), φ(3), and φ(4) are calculated using Expression (1) or (2).

Next, when the period detection unit 52 sets τ = 5, the condition τ ≥ slide reference value (= 5) holds, so it sets the fifth sampling point to the left of the loss occurrence time RT as the reference sampling point k. The template setting unit 51 then sets the audio signal x(k + 1) to x(k + 5) as the template TM5, and the period detection unit 52 obtains the correlation value φ(5) between the template TM5 and the audio signal x(j − 5) using Expression (1) or (2). In this case, the template TM5 is applied to the audio signal x(k − 4) to x(k).

Next, the period detection unit 52 sets τ = 6 and sets the sixth sampling point to the left of the loss occurrence time RT as the reference sampling point k. The template setting unit 51 then sets the audio signal x(k + 1) to x(k + 6) as the template TM6, and the period detection unit 52 obtains the correlation value φ(6) between the template TM6 and the audio signal x(j − 6) using Expression (1) or (2). In this case, the template TM6 is applied to the audio signal x(k − 5) to x(k).

Thereafter, the period detection unit 52 repeats the above processing until τ reaches the maximum slide amount τmax, obtaining φ(τ) at each step. As a result, the time width of the template TM increases as the slide amount increases.
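The index bookkeeping of this walkthrough can be checked with a short Python sketch. Offsets are measured relative to RT (0 is the sample at RT, −m is m samples earlier), spans are inclusive (first, last) pairs, and the defaults reproduce FIG. 4 (initial width 4, slide reference 5, a = 1); `fig4_windows` is an illustrative name, not part of the source.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def fig4_windows(tau_max: int, initial_width: int = 4,
                 slide_reference: int = 5) -> List[Tuple[int, Span, Span]]:
    """List (tau, template span, compared span) as offsets relative to RT."""
    rows = []
    for tau in range(tau_max + 1):
        n = initial_width if tau < slide_reference else tau
        k = -n  # reference sampling point: n samples to the left of RT
        template = (k + 1, k + n)              # x(k+1) .. x(k+n)
        compared = (k + 1 - tau, k + n - tau)  # x(k+1-tau) .. x(k+n-tau)
        rows.append((tau, template, compared))
    return rows
```

For τ = 5 it returns the template span (−4, 0) and the compared span (−9, −5), that is, x(k + 1) to x(k + 5) against x(k − 4) to x(k) with k = −5, matching the text.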

FIG. 5 is a graph of the correlation value φ(τ) obtained for the audio signal shown in FIG. 2 using the method of the present embodiment. In FIG. 5, the vertical axis is the correlation value φ(τ) and the horizontal axis is time in samples. In FIG. 5, φ(τ) is calculated by AMDF, so, as in FIG. 3, the lower the value of a correlation peak, the stronger the correlation between the audio signal and the template TM.

In FIG. 5, a downward-convex correlation peak PK1 appears about 47 samples after the loss occurrence time RT (= 0); a downward-convex correlation peak PK2 appears about 37 samples after PK1, and further downward-convex peaks follow at intervals of about 37 samples. The peak values increase with time, meaning that the correlation between the template TM and the audio signal weakens. At a sampling frequency of 8 kHz, 37 samples correspond to 37 × 0.125 msec = 4.625 msec, and 47 samples correspond to 47 × 0.125 msec = 5.875 msec.

That is, of the correlation peaks shown in FIG. 5, the peak PK1, obtained when the template TM is shifted by 47 samples, has the smallest value.

The period detection unit 52 therefore detects 47 samples, the slide amount at which the minimum correlation peak PK1 appears, as the fundamental period of the audio signal immediately before the loss occurrence time RT. This shows that the period detection unit 52 correctly detects 47 samples, the fundamental period of the audio signal immediately before RT in FIG. 2.

Returning to FIG. 1, the concealment processing unit 6 extracts one fundamental period of the audio signal, as detected by the period detection unit 52, going back from the loss occurrence time RT, and performs concealment processing that interpolates the loss period, in which the packet loss occurred, with the extracted audio signal.

For example, if the audio signal shown in FIG. 2 is input and the period detection unit 52 detects 47 samples as the fundamental period, the concealment processing unit 6 extracts the 47 samples of the audio signal going back from RT, applies the extracted signal repeatedly until the end of the loss period, and thereby interpolates the loss period.
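The repetition step can be sketched in Python as follows, assuming the received samples are held in a sequence whose last element is the sample at RT; `conceal` is a hypothetical name. It performs plain periodic repetition only, as described above, with no smoothing at the boundaries.

```python
from typing import List, Sequence


def conceal(history: Sequence[float], period: int, loss_length: int) -> List[float]:
    """Samples for the loss period, built by repeating the last `period` samples.

    Assumption: history[-1] is the last sample received before the loss (time RT).
    """
    if not 1 <= period <= len(history):
        raise ValueError("period must be between 1 and len(history)")
    cycle = list(history[-period:])  # one fundamental period ending at RT
    # Apply the extracted period repeatedly until the end of the loss period.
    return [cycle[i % period] for i in range(loss_length)]
```

For example, `conceal([0, 1, 2, 3, 4, 5, 6], 3, 7)` returns `[4, 5, 6, 4, 5, 6, 4]`.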

The audio output unit 7 converts the audio signal, whether or not it has undergone concealment processing, into an analog signal and outputs it as sound from the speaker 8.

FIG. 6 is a flowchart showing the processing of the communication device shown in FIG. 1. For convenience of explanation, a = 1 in the flowchart of FIG. 6. First, when the packet loss detection unit 4 detects a packet loss in step S1 (YES in step S1), the period detection unit 52 sets τ = 0 (step S2).

Next, the template setting unit 51 sets, from the audio signal, a template TM whose time width depends on the value of τ (step S3). Specifically, if τ < slide reference value, the template setting unit 51 sets the time width of the template TM to the initial time width; if τ ≥ slide reference value, it sets the time width to N = τ.

Next, the period detection unit 52 sets the reference sampling point k so that k + 1 is the start point of the template TM, and assigns a sampling number to each sampling point (step S4).

Next, the period detection unit 52 calculates the correlation value between the template TM and the audio signal using Expression (1) or (2) (step S5).

Next, the period detection unit 52 sets τ = τ + 1 (step S6). If τ ≥ slide reference value (YES in step S7), that is, if the slide amount of the template TM has reached the slide reference value, the period detection unit 52 advances the process to step S8; if τ < slide reference value (NO in step S7), it returns the process to step S5. By repeating steps S5 to S7, the template TM of the initial time width is slid into the past relative to the audio signal until τ reaches the slide reference value.

In step S8, if τ < τmax (NO in step S8), the process returns to step S3, and steps S3 to S8 are repeated until τ ≥ τmax. As a result, the time width of the template TM increases as the slide amount τ increases.
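Steps S2 to S8 can be condensed into a single Python loop. This is a sketch under stated assumptions, not the device's implementation: a = 1, `history[-1]` is the sample at RT, and because Expressions (1) and (2) are not reproduced here, both measures use assumed forms averaged over the N template samples. Since the check in step S8 follows the increment in step S6, the last slide amount evaluated is τmax − 1. `correlation_curve` is a hypothetical name.

```python
from typing import List, Sequence


def correlation_curve(history: Sequence[float], tau_max: int, initial_width: int,
                      slide_reference: int, use_amdf: bool = True) -> List[float]:
    """phi(tau) for tau = 0 .. tau_max - 1, following steps S2-S8 of FIG. 6 (a = 1)."""
    if initial_width < 1 or slide_reference < 1:
        raise ValueError("initial_width and slide_reference must be >= 1")
    end = len(history)
    phis: List[float] = []
    # S2 starts at tau = 0; S6 increments tau; S8 stops once tau >= tau_max.
    for tau in range(tau_max):
        # S3/S7: initial width until tau reaches the slide reference, then N = tau.
        n = initial_width if tau < slide_reference else tau
        if n + tau > end:
            raise ValueError("history is too short for this tau_max")
        # S4: template x(k+1)..x(k+n) ends at RT; the compared span is tau earlier.
        template = history[end - n:end]
        segment = history[end - n - tau:end - tau]
        # S5: assumed averaged forms of Expression (2) (AMDF) or (1) (cross-correlation).
        if use_amdf:
            phi = sum(abs(t - s) for t, s in zip(template, segment)) / n
        else:
            phi = sum(t * s for t, s in zip(template, segment)) / n
        phis.append(phi)
    return phis
```

On a signal that repeats every 5 samples, the AMDF curve is 0.0 at τ = 0, 5, and 10 and positive at τ = 1 to 4; τ = 0 is a trivial match of the template with itself.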

If τ ≥ τmax in step S8 (YES in step S8), the period detection unit 52 detects correlation peaks from the correlation values calculated in step S5, identifies the slide amount of the peak showing the strongest correlation between the template TM and the audio signal, and detects the fundamental period from that slide amount (step S9). When Expression (1) is used, the correlation peak with the largest value indicates the strongest correlation between the template TM and the audio signal; when Expression (2) is used, the correlation peak with the smallest value does.
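Step S9 can be sketched in Python as below. Treating a "correlation peak" as an interior local extremum of the φ(τ) sequence is an assumption of this sketch (the source does not define peak detection); it has the useful side effect that τ = 0, where the template is compared with itself, is never selected. `pick_period` is a hypothetical name.

```python
from typing import Optional, Sequence


def pick_period(phis: Sequence[float], use_amdf: bool = True) -> Optional[int]:
    """Slide amount of the strongest correlation peak, or None if there is no peak.

    Peaks are taken to be interior local extrema: minima for AMDF
    (Expression (2)) and maxima for cross-correlation (Expression (1)).
    """
    sign = 1.0 if use_amdf else -1.0  # turn both cases into a minimum search
    v = [sign * p for p in phis]
    best_tau: Optional[int] = None
    best_val = 0.0
    for tau in range(1, len(v) - 1):
        is_peak = v[tau] < v[tau - 1] and v[tau] <= v[tau + 1]
        if is_peak and (best_tau is None or v[tau] < best_val):
            best_tau, best_val = tau, v[tau]
    return best_tau
```

For the AMDF values `[0.0, 3.0, 1.0, 2.0, 0.5, 2.5, 4.0]` the local minima are at τ = 2 and τ = 4, and the sketch returns 4, the deeper of the two.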

The period detection unit 52 may calculate the fundamental period by multiplying the identified slide amount by the sampling period of the audio signal.
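The conversion is a single multiplication; this Python sketch (hypothetical function name) reproduces the figures used for FIG. 4 and FIG. 5 at 8 kHz, where the sampling period is 0.125 msec.

```python
def slide_to_period_ms(slide_samples: int, sampling_rate_hz: float) -> float:
    """Fundamental period in msec: slide amount times the sampling period 1/fs."""
    sampling_period_ms = 1000.0 / sampling_rate_hz
    return slide_samples * sampling_period_ms
```

`slide_to_period_ms(47, 8000)` returns 5.875, `slide_to_period_ms(37, 8000)` returns 4.625, and `slide_to_period_ms(24, 8000)` returns 3.0, the shortest fundamental period assumed for FIG. 4.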

Next, the concealment processing unit 6 extracts the audio signal according to the fundamental period detected in step S9 and performs concealment processing by interpolating the loss period with the extracted audio signal (step S10).

In the description of FIG. 4, the template setting unit 51 sets a = 1, but the invention is not limited to this. The template setting unit 51 may set a to a predetermined fixed value in the range 1 ≤ a < 2 until the slide amount of the template TM exceeds a predetermined change reference value and, once the slide amount exceeds the change reference value, gradually decrease a so that it approaches 1 as the slide amount approaches the maximum slide amount (τmax). The slide reference value described above, for example, can be used as the change reference value.
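A possible schedule for the coefficient a, sketched in Python. The source only says that a is "gradually decreased" toward 1; the linear ramp between the change reference value and τmax is an assumption, as is the name `coefficient_a`. Because Expressions (1) and (2) are not reproduced here, the sketch computes only a(τ) and does not show how a enters them.

```python
def coefficient_a(tau: int, a0: float, change_reference: int, tau_max: int) -> float:
    """a(tau): fixed at a0 up to the change reference, then ramped linearly to 1.

    The linear ramp is an assumption; the source only requires a gradual decrease.
    """
    if not 1.0 <= a0 < 2.0:
        raise ValueError("a0 must satisfy 1 <= a0 < 2")
    if tau <= change_reference:
        return a0
    if tau >= tau_max:
        return 1.0
    frac = (tau - change_reference) / (tau_max - change_reference)
    return a0 - (a0 - 1.0) * frac
```

With a0 = 1.5, a change reference value of 5, and τmax = 15, the sketch gives a = 1.5 up to τ = 5, 1.25 at τ = 10, and 1.0 at τ = 15.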

In this way, when the slide amount is small, the time width of the template TM can be set somewhat larger than the slide amount, and when the slide amount is large, it can be set to about the slide amount. This prevents the loss of correlation accuracy that would result from the template TM becoming too short at small slide amounts.

As the correlation calculation, a technique such as the ASDF (average squared difference function) may be used instead of the cross-correlation of Expression (1) or the AMDF of Expression (2).
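An ASDF measure can be sketched in Python for two equal-length windows (the template and the compared segment). Averaging over the window length is an assumption of this sketch, and `asdf` is an illustrative name. As with the AMDF, smaller values indicate stronger correlation, so the strongest peak would be a minimum.

```python
from typing import Sequence


def asdf(template: Sequence[float], segment: Sequence[float]) -> float:
    """Average squared difference of two equal-length windows; 0 means identical."""
    if len(template) != len(segment) or len(template) == 0:
        raise ValueError("windows must be non-empty and of equal length")
    return sum((t - s) ** 2 for t, s in zip(template, segment)) / len(template)
```

Squaring weights large sample differences more heavily than the AMDF's absolute value does.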

As described above, in this communication device, the audio signal over a certain time width extending back from the loss occurrence time RT is set as the template TM. The template TM is slid from the current time into the past relative to the audio signal, the correlation between the template TM and the audio signal is obtained, and the fundamental period of the audio signal is detected.

Here, the time width of the template TM increases as the slide amount increases. Consequently, at a relatively early stage, while the slide amount is still small, there is a slide amount at which the template TM covers roughly one fundamental period of the audio signal immediately before the current time, and a strong correlation peak appears between the template TM and the audio signal there. As the slide amount grows, the time width of the template TM grows with it and the template comes to contain several frequency components, so no later correlation peak is as strong as the one obtained at that earlier slide amount. The fundamental period of the audio signal immediately before the current time can therefore be detected accurately.

FIG. 1 is a block diagram showing the overall configuration of a communication device to which a signal processing device according to an embodiment of the present invention is applied.
FIG. 2 is a waveform diagram of an audio signal for explaining the processing of the template setting unit and the period detection unit.
FIG. 3 is a graph of the correlation values between the template and the audio signal calculated with a conventional template.
FIG. 4 illustrates the processing of the template setting unit and the period detection unit.
FIG. 5 is a graph of the correlation values obtained for the audio signal shown in FIG. 2 using the method of the present embodiment.
FIG. 6 is a flowchart showing the processing of the communication device shown in FIG. 1.
FIG. 7 is a waveform diagram of an audio signal for explaining conventional concealment processing.

Explanation of symbols

1 Packet receiving unit
2 Delay fluctuation absorption buffer
3 Timer
4 Packet loss detection unit
5 Detection processing unit
6 Concealment processing unit
7 Audio output unit
8 Speaker
51 Template setting unit
52 Period detection unit

Claims

A signal processing device comprising:
a reference signal setting unit that sets, as a reference signal, the input signal over a time width extending from the current time into the past;
a period detection unit that detects the fundamental period of the input signal by sliding the reference signal from the current time into the past relative to the input signal and obtaining the correlation between the reference signal and the input signal; and
a packet loss detection unit that detects whether a packet loss has occurred in the input signal,
wherein the reference signal setting unit increases the time width of the reference signal as the slide amount of the reference signal increases, and
the period detection unit detects the fundamental period taking as the current time the loss occurrence time at which the packet loss detected by the packet loss detection unit occurred.

The signal processing device according to claim 1, further comprising a concealment processing unit that extracts one fundamental period of the input signal going back from the loss occurrence time and interpolates the loss period, in which the packet loss occurred, with the extracted input signal.

The signal processing device, wherein the reference signal setting unit sets the time width of the reference signal to a predetermined initial time width until the slide amount of the reference signal reaches a predetermined slide reference value.

The signal processing apparatus according to claim 1, wherein the period detection unit obtains a correlation between the reference signal and the input signal by cross-correlation.

The signal processing apparatus according to claim 1, wherein the period detection unit obtains a correlation between the reference signal and the input signal by AMDF.

The input signal is a signal sampled at a predetermined sampling period,
The signal processing apparatus according to claim 4, wherein the period detection unit performs the cross-correlation using Equation (1).

Here, φ(τ) represents the correlation value, N the time width of the reference signal, x(j) the reference signal, x(j−τ) the input signal, k + 1 the start point of the reference signal, a a predetermined coefficient, τ the slide amount of the reference signal, and j the sampling number of each sampling point of the input signal.

The input signal is a signal sampled at a predetermined sampling period,
The signal processing apparatus according to claim 5, wherein the period detection unit performs the AMDF using Equation (2).

The signal processing device according to claim 6, wherein the reference signal setting unit sets a to a predetermined fixed value in the range 1 ≤ a < 2 until the slide amount of the reference signal exceeds a predetermined change reference value and, when the slide amount of the reference signal exceeds the change reference value, decreases the value of a so that it approaches 1 as the slide amount approaches a predetermined maximum slide amount.

A signal processing method comprising:
a reference signal setting step of setting, as a reference signal, the input signal over a time width extending from the current time into the past;
a period detection step of detecting the fundamental period of the input signal by sliding the reference signal from the current time into the past relative to the input signal and obtaining the correlation between the reference signal and the input signal; and
a packet loss detection step of detecting whether a packet loss has occurred in the input signal,
wherein the reference signal setting step increases the time width of the reference signal as the slide amount of the reference signal increases, and
the period detection step detects the fundamental period taking as the current time the loss occurrence time at which the packet loss detected in the packet loss detection step occurred.