JP5346230B2

JP5346230B2 - Speaking speed converter

Info

Publication number: JP5346230B2
Application number: JP2009056958A
Authority: JP
Inventors: 哲平鷲; 恵一 ▲吉▼田
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2009-03-10
Filing date: 2009-03-10
Publication date: 2013-11-20
Anticipated expiration: 2029-03-10
Also published as: JP2010210947A

Description

The present invention relates to a speech speed conversion device that converts the speech speed of input speech and outputs the converted speech.

Conventionally, various speech speed conversion devices that convert the speech speed of input speech and output the result have been proposed. If the speech speed is simply slowed down, the output speech lags behind the input. Conventional devices therefore discriminate between sections that contain sound (sounded sections) and sections that contain no sound (silent sections), and expand the input signal in sounded sections while compressing it in silent sections so that no delay arises in the output speech (see, for example, Patent Document 1).

Patent Document 1: JP-T-2006-77626

However, in the conventional example described above, which discriminates between sounded sections and silent sections, silent sections no longer exist once noise is superimposed on the input signal. The input signal is then always expanded, and the delay caused by speech speed conversion keeps growing. One conceivable remedy is to discriminate instead between sections of the input signal that contain speech (speech sections) and sections that do not (non-speech sections), which would allow speech speed conversion without accumulating delay even for an input signal on which noise is superimposed. However, when the level of the superimposed noise is high, a speech section may be misclassified as a non-speech section, and part of the input speech may be lost because that speech section is erroneously compressed.

The present invention has been made in view of the above circumstances, and its object is to provide a speech speed conversion device capable of appropriately converting speech speed even when noise is superimposed on the input speech.

To achieve the above object, the invention of claim 1 comprises: a speech section determination unit that discriminates between speech sections, in which the input signal contains speech, and non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs it; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit indicates a speech section and to perform compression processing while it indicates a non-speech section; and a noise level determination unit that determines the level of noise contained in the input signal. The speech speed conversion processing determination unit decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit according to the noise level determined by the noise level determination unit. Even while the speech section determination unit indicates a non-speech section, it does not cause the expansion/compression processing unit to perform compression processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold.

According to the invention of claim 1, the speech speed conversion processing determination unit decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit according to the noise level determined by the noise level determination unit, so speech speed can be converted appropriately even when noise is superimposed on the input speech. Furthermore, even while the speech section determination unit indicates a non-speech section, the expansion/compression processing unit does not perform compression processing when the noise level determined by the noise level determination unit is equal to or greater than the threshold, which prevents the output speech from being interrupted by erroneous compression of the input speech.

To achieve the above object, the invention of claim 2 comprises: a speech section determination unit that discriminates between speech sections, in which the input signal contains speech, and non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs it; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit indicates a speech section and to perform compression processing while it indicates a non-speech section; and a noise level determination unit that determines the level of noise contained in the input signal. The speech speed conversion processing determination unit decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit according to the noise level determined by the noise level determination unit. Even while the speech section determination unit indicates a speech section, it does not cause the expansion/compression processing unit to perform expansion processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold.

According to the invention of claim 2, the speech speed conversion processing determination unit decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit according to the noise level determined by the noise level determination unit, so speech speed can be converted appropriately even when noise is superimposed on the input speech. Furthermore, even while the speech section determination unit indicates a speech section, the expansion/compression processing unit does not perform expansion processing when the noise level determined by the noise level determination unit is equal to or greater than the threshold, which prevents the speech from being expanded unnaturally under the influence of noise.

To achieve the above object, the invention of claim 3 comprises: a speech section determination unit that discriminates between speech sections, in which the input signal contains speech, and non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs it; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit indicates a speech section and to perform compression processing while it indicates a non-speech section; and a noise level determination unit that determines the level of noise contained in the input signal. Even while the speech section determination unit indicates a non-speech section, the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform compression processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined first threshold and less than a second threshold that is larger than the first threshold. Furthermore, even while the speech section determination unit indicates a speech section, it causes the expansion/compression processing unit to perform neither expansion processing nor compression processing when the noise level determined by the noise level determination unit is equal to or greater than the second threshold.

According to the invention of claim 3, the speech speed conversion processing determination unit decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit according to the noise level determined by the noise level determination unit, so speech speed can be converted appropriately even when noise is superimposed on the input speech. Furthermore, even while the speech section determination unit indicates a non-speech section, the expansion/compression processing unit does not perform compression processing when the noise level is determined to be equal to or greater than the first threshold and less than the second threshold, which prevents the output speech from being interrupted by erroneous compression of the input speech. In addition, even while the speech section determination unit indicates a speech section, the expansion/compression processing unit does not perform expansion processing when the noise level is determined to be equal to or greater than the second threshold, which prevents the speech from being expanded unnaturally under the influence of noise.

In the invention of claim 4, based on the invention of any one of claims 1 to 3, the noise level determination unit obtains the noise level using the absolute value of the amplitude of the input signal.

According to the invention of claim 4, the noise level can be obtained relatively easily.

In the invention of claim 5, based on the invention of any one of claims 1 to 3, the noise level determination unit obtains the noise level using the frequency components of the input signal in the non-speech band.

According to the invention of claim 5, the noise level can be obtained independently of the level of the input speech.

The invention of claim 6, based on the invention of any one of claims 1 to 5, further comprises a buffer unit that accumulates the input signal expanded or compressed by the expansion/compression processing unit. Even while the speech section determination unit indicates a speech section, the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform expansion processing when the free capacity of the buffer unit is equal to or less than a predetermined lower limit.

According to the invention of claim 6, it is possible to prevent malfunctions caused by the buffer unit overflowing and output speech being written over other memory areas.

The invention of claim 7, based on the invention of any one of claims 1 to 6, further comprises a buffer unit that accumulates the input signal expanded or compressed by the expansion/compression processing unit. Even while the speech section determination unit indicates a non-speech section, the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform compression processing when the free capacity of the buffer unit is equal to or greater than a predetermined upper limit.

According to the invention of claim 7, it is possible to prevent malfunctions caused by the buffer unit underflowing (underrunning) and output speech being written over other memory areas.

The invention of claim 8, based on the invention of any one of claims 1 to 7, further comprises a buffer unit that accumulates the input signal expanded or compressed by the expansion/compression processing unit. The speech speed conversion processing determination unit increases or decreases the expansion rate used when causing the expansion/compression processing unit to perform expansion processing according to the free capacity of the buffer unit.

According to the invention of claim 8, it is possible to prevent speech that has not undergone speech speed conversion from suddenly being output when the buffer unit runs out of free capacity.

According to the present invention, speech speed can be converted appropriately even when noise is superimposed on the input speech.

FIG. 1 is a block diagram showing an embodiment of the present invention.
FIG. 2(a) is a waveform diagram of the absolute value of input speech on which no noise is superimposed, and FIG. 2(b) is a waveform diagram of the absolute value of input speech on which noise is superimposed.
FIG. 3 is a diagram explaining the operation of the same embodiment.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

As shown in FIG. 1, the speech speed conversion device of this embodiment comprises: a speech section determination unit 1 that discriminates between speech sections, in which the input signal contains speech, and non-speech sections, in which it does not; an expansion/compression processing unit 2 that expands or compresses the input signal and outputs it; a speech speed conversion processing determination unit 3 that causes the expansion/compression processing unit 2 to perform expansion processing while the speech section determination unit 1 indicates a speech section and to perform compression processing while it indicates a non-speech section; a noise level determination unit 4 that determines the level of noise contained in the input signal; and a buffer unit 5 that accumulates the input signal expanded or compressed by the expansion/compression processing unit 2. Each of these units is realized by having a DSP (digital signal processor) execute a predetermined program. The input signal is, for example, a digital signal obtained by A/D conversion of an analog acoustic signal picked up by a microphone.

The speech section determination unit 1 is of a conventionally known type. It identifies the speech sections of the input signal and outputs the result to the speech speed conversion processing determination unit 3.

The expansion/compression processing unit 2 applies time-axis companding (compression and expansion) to the input signal to adjust the speech speed of the speech that the signal represents. For example, it executes an algorithm called PICOLA, which converts speech speed by calculating the autocorrelation of the speech while varying the frame length, regarding the frame length that gives the highest correlation as the period of the speech, and inserting or deleting waveform segments in units of that period.
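The period-based insertion and deletion described above can be illustrated with a simplified sketch. It is not the PICOLA implementation used in the device: the function names, the fixed comparison window, and the linear cross-fade are assumptions made for illustration only. The sketch estimates the period as the lag with the highest autocorrelation, then inserts or deletes exactly one period with a cross-fade so that the waveform stays continuous.

```python
def estimate_period(x, start, min_lag, max_lag, window):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation.

    Hypothetical helper: a fixed comparison window is used for every lag.
    Ties are resolved in favour of the shortest lag.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[start + i] * x[start + i + lag] for i in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def crossfade(first, second):
    """Fade linearly from `first` to `second` (lists of equal length)."""
    n = len(first)
    return [((n - i) * a + i * b) / n
            for i, (a, b) in enumerate(zip(first, second))]


def expand_one_period(x, start, period):
    """Lengthen x by one period: output A, then a blend B -> A, then B onward."""
    a = x[start:start + period]
    b = x[start + period:start + 2 * period]
    return x[:start + period] + crossfade(b, a) + x[start + period:]


def compress_one_period(x, start, period):
    """Shorten x by one period: replace A and B with a single blend A -> B."""
    a = x[start:start + period]
    b = x[start + period:start + 2 * period]
    return x[:start] + crossfade(a, b) + x[start + 2 * period:]


# Usage with a synthetic sawtooth whose period is 20 samples.
signal = [(i % 20) - 10 for i in range(200)]
period = estimate_period(signal, 0, min_lag=10, max_lag=40, window=40)
longer = expand_one_period(signal, 0, period)
shorter = compress_one_period(signal, 0, period)
```

Because whole periods are inserted or removed, the duration changes while the pitch of the waveform is preserved, which is the property the description relies on.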

The noise level determination unit 4 consists of a digital filter whose response rises gradually and falls steeply, that is, a filter with a relatively large rise time constant and a relatively small fall time constant. It uses the absolute value of the amplitude of the input signal to determine (estimate) the level of the noise that is steadily present in the input signal (the noise level). Comparing the case where the input signal is speech alone with no noise superimposed, as shown in FIG. 2(a), with the case where noise is superimposed on the speech, as shown in FIG. 2(b), the amplitude waveform of speech changes relatively abruptly whereas that of noise changes relatively slowly. The noise level can therefore be determined by using a digital filter with a relatively large rise time constant and a relatively small fall time constant to remove the speech amplitude waveform from the amplitude waveform of the input signal. Note that FIG. 2 shows the waveform of the absolute value of the input signal. Alternatively, the noise level determination unit 4 may perform a fast Fourier transform and determine the noise level using the frequency components of the input signal in the non-speech frequency band.
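A minimal sketch of such an asymmetric filter follows. The one-pole smoothing form and the coefficient values `rise_coeff` and `fall_coeff` are assumptions chosen for illustration; the description specifies only that the rise time constant is relatively large and the fall time constant relatively small.

```python
def estimate_noise_level(samples, rise_coeff=0.999, fall_coeff=0.5):
    """Track the steady noise floor of |x| with a slow-rise, fast-fall filter.

    rise_coeff close to 1 means a large rise time constant (slow attack);
    a smaller fall_coeff means a small fall time constant (fast release).
    Both values are hypothetical.
    """
    level = 0.0
    levels = []
    for x in samples:
        magnitude = abs(x)
        # Rising input is followed slowly; falling input is followed quickly.
        coeff = rise_coeff if magnitude > level else fall_coeff
        level = coeff * level + (1.0 - coeff) * magnitude
        levels.append(level)
    return levels


# Usage: steady noise of amplitude 0.1, a short loud burst standing in
# for speech, then steady noise again.
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(5000)]
burst = [1.0] * 50
levels = estimate_noise_level(noise + burst + noise[:100])
```

In this sketch the estimate settles near the steady noise amplitude, rises only slightly during the short burst, and drops back quickly once the burst ends, so brief speech peaks barely affect the estimated noise level.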

The speech speed conversion processing determination unit 3 decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit 2 according to the noise level determined by the noise level determination unit 4. Specifically, even while the speech section determination unit 1 indicates a non-speech section, the speech speed conversion processing determination unit 3 does not cause the expansion/compression processing unit 2 to perform compression processing when the noise level determined by the noise level determination unit 4 is equal to or greater than a threshold. When the noise level is equal to or greater than the threshold, the speech section determination unit 1 is more likely to misclassify a speech section as a non-speech section. Withholding compression processing in such cases allows speech speed to be converted appropriately even when noise is superimposed on the input speech; specifically, it prevents the output speech from being interrupted by erroneous compression of the input speech.

In addition, if the speech speed conversion processing determination unit 3 does not cause the expansion/compression processing unit 2 to perform expansion processing when the noise level determined by the noise level determination unit 4 is equal to or greater than a predetermined threshold, even while the speech section determination unit 1 indicates a speech section, the speech can be prevented from being expanded unnaturally under the influence of noise.

Alternatively, the speech speed conversion processing determination unit 3 may be configured so that, even while the speech section determination unit 1 indicates a non-speech section, it does not cause the expansion/compression processing unit 2 to perform compression processing when the noise level determined by the noise level determination unit 4 is equal to or greater than a predetermined first threshold and less than a second threshold that is larger than the first threshold, and so that, even while the speech section determination unit 1 indicates a speech section, it causes the expansion/compression processing unit 2 to perform neither expansion processing nor compression processing when the noise level determined by the noise level determination unit 4 is equal to or greater than the second threshold. This prevents the output speech from being interrupted by erroneous compression of the input speech and also prevents the speech from being expanded unnaturally under the influence of noise.
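The two-threshold decision can be sketched as follows. The names `Action` and `decide_action` and the threshold values in the usage lines are hypothetical. The description states the second-threshold rule for speech sections; this sketch additionally assumes that a non-speech section at or above the second threshold is also passed through unchanged, since compression is already withheld from the first threshold upward.

```python
from enum import Enum


class Action(Enum):
    EXPAND = "expand"
    COMPRESS = "compress"
    PASS_THROUGH = "pass_through"  # neither expansion nor compression


def decide_action(is_speech, noise_level, threshold_1, threshold_2):
    """Choose the processing for one section (threshold_1 < threshold_2)."""
    if noise_level >= threshold_2:
        # Very noisy: neither expand nor compress.
        # (For non-speech sections this is an assumption of the sketch.)
        return Action.PASS_THROUGH
    if is_speech:
        return Action.EXPAND
    if noise_level >= threshold_1:
        # Possibly misclassified speech: do not compress.
        return Action.PASS_THROUGH
    return Action.COMPRESS


# Usage with illustrative thresholds.
quiet_pause = decide_action(False, 0.01, threshold_1=0.05, threshold_2=0.2)
noisy_pause = decide_action(False, 0.10, threshold_1=0.05, threshold_2=0.2)
noisy_speech = decide_action(True, 0.10, threshold_1=0.05, threshold_2=0.2)
very_noisy_speech = decide_action(True, 0.30, threshold_1=0.05, threshold_2=0.2)
```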

The input signal expanded or compressed by the expansion/compression processing unit 2 is first accumulated in the buffer unit 5 and then output as a speech-speed-converted output signal. If the speech sections of the input signal become long, the buffer unit 5 may overflow, and a malfunction may occur because the output signal (speech data) is written over other memory areas. Similarly, if the speech sections of the input signal become short, the buffer unit 5 may underflow (underrun), and a malfunction may occur because the output signal (speech data) is written over other memory areas.

Therefore, the speech speed conversion processing determination unit 3 of this embodiment prevents overflow of the buffer unit 5 by not causing the expansion/compression processing unit 2 to perform expansion processing when the free capacity of the buffer unit 5 is equal to or less than a predetermined lower limit, even while the speech section determination unit 1 indicates a speech section. It likewise prevents underflow (underrun) of the buffer unit 5 by not causing the expansion/compression processing unit 2 to perform compression processing when the free capacity of the buffer unit 5 is equal to or greater than a predetermined upper limit, even while the speech section determination unit 1 indicates a non-speech section.
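The two buffer guards can be sketched as a filter applied to an already chosen action. The function name, the string labels, and the limit values in the usage lines are hypothetical; only the comparisons against the lower and upper limits come from the description.

```python
def apply_buffer_guards(action, free_capacity, lower_limit, upper_limit):
    """Veto expansion or compression based on the buffer's free capacity.

    action is one of "expand", "compress" or "pass_through" (hypothetical labels).
    """
    if action == "expand" and free_capacity <= lower_limit:
        # Buffer almost full: expanding further could overflow it.
        return "pass_through"
    if action == "compress" and free_capacity >= upper_limit:
        # Buffer almost empty: compressing further could underflow it.
        return "pass_through"
    return action


# Usage with an illustrative buffer whose limits are 100 and 900 samples.
nearly_full = apply_buffer_guards("expand", 80, lower_limit=100, upper_limit=900)
nearly_empty = apply_buffer_guards("compress", 950, lower_limit=100, upper_limit=900)
normal = apply_buffer_guards("expand", 500, lower_limit=100, upper_limit=900)
```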

If the expansion rate of the expansion/compression processing unit 2 were fixed, the free capacity of the buffer unit 5 would decrease as a speech section grew longer, and speech that has not undergone speech speed conversion might suddenly be output when the free capacity of the buffer unit 5 ran out. In contrast, the speech speed conversion processing determination unit 3 of this embodiment increases or decreases the expansion rate used when causing the expansion/compression processing unit 2 to perform expansion processing according to (for example, in proportion to) the free capacity of the buffer unit 5, which prevents unconverted speech from suddenly being output because the buffer unit 5 has run out of free capacity. For example, as shown in FIG. 3, when the input speech of the sentence "Shoenerugi wa kokorogake shidai desu" ("Saving energy depends on one's mindset") is speed-converted (expanded), the expansion rate is made large at the beginning of the sentence, where ample free capacity remains in the buffer unit 5, and is made smaller as the free capacity of the buffer unit 5 decreases toward the end of the sentence.
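A proportional rule of this kind can be sketched as follows. The function name, the convention that a rate of 1.0 means no expansion, and the maximum rate of 1.5 are assumptions for illustration; the description gives proportionality only as one example.

```python
def expansion_rate(free_capacity, total_capacity, max_rate=1.5):
    """Scale the expansion rate with the fraction of the buffer that is free.

    Returns max_rate when the buffer is empty (all capacity free) and
    1.0 (no expansion) when no free capacity remains.
    """
    fraction_free = min(max(free_capacity / total_capacity, 0.0), 1.0)
    return 1.0 + (max_rate - 1.0) * fraction_free


# Usage: the rate tapers off smoothly as the buffer fills during a long sentence.
rate_at_start = expansion_rate(1000, 1000)
rate_midway = expansion_rate(500, 1000)
rate_at_end = expansion_rate(0, 1000)
```

Because the rate approaches 1.0 gradually, the listener hears a smooth return to the original speed instead of an abrupt switch when the buffer fills.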

The speech speed conversion device of this embodiment is well suited to so-called hands-free intercom equipment. In an intercom system that provides hands-free (loudspeaker) calls between an intercom unit installed outside a dwelling (an entrance slave unit or a lobby intercom) and an intercom unit installed inside the dwelling (a master unit), outdoor noise is often superimposed on the speech picked up by the microphone of the entrance slave unit or lobby intercom. Installing the speech speed conversion device of this embodiment in the master unit therefore provides the notable effect that speech speed can be converted appropriately even when noise is superimposed on the input speech (the speech received from the entrance slave unit or lobby intercom).

Description of Symbols

1 Speech section determination unit
2 Expansion/compression processing unit
3 Speech speed conversion processing determination unit
4 Noise level determination unit
5 Buffer unit

Claims

A speech speed conversion device comprising: a speech section determination unit that discriminates between speech sections, in which an input signal contains speech, and non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs it; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit indicates a speech section and to perform compression processing while the speech section determination unit indicates a non-speech section; and a noise level determination unit that determines the level of noise contained in the input signal,
wherein the speech speed conversion processing determination unit decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit according to the noise level determined by the noise level determination unit and, even while the speech section determination unit indicates a non-speech section, does not cause the expansion/compression processing unit to perform compression processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold.

A speech speed conversion device comprising: a speech section determination unit that discriminates between speech sections, in which an input signal contains speech, and non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs it; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit indicates a speech section and to perform compression processing while the speech section determination unit indicates a non-speech section; and a noise level determination unit that determines the level of noise contained in the input signal,
wherein the speech speed conversion processing determination unit decides whether expansion processing and compression processing may be executed in the expansion/compression processing unit according to the noise level determined by the noise level determination unit and, even while the speech section determination unit indicates a speech section, does not cause the expansion/compression processing unit to perform expansion processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold.

A speech speed conversion device according to claim 1, comprising: a speech section determination unit that distinguishes, in an input signal, a speech section in which speech is included from a non-speech section in which speech is not included; an expansion/compression processing unit that expands or compresses the input signal; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing when the speech section determination unit determines a speech section, and to perform compression processing when the speech section determination unit determines a non-speech section; and a noise level determination unit that determines a noise level included in the input signal,
wherein the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform the compression processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined first threshold and less than a second threshold that is larger than the first threshold, even when the speech section determination unit determines a non-speech section, and further does not cause the expansion/compression processing unit to perform either the expansion processing or the compression processing when the noise level determined by the noise level determination unit is equal to or greater than the second threshold, even when the speech section determination unit determines a speech section.
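
The two-threshold rule recited above can be sketched as a small per-frame decision function. This is only an illustration under stated assumptions: the names `Action`, `decide_action`, `first_threshold` and `second_threshold` are hypothetical, the noise level is assumed to be a single non-negative number per frame, and a non-speech frame at or above the second threshold is assumed to be passed through unchanged as well.

```python
from enum import Enum


class Action(Enum):
    EXPAND = "expand"              # slow the frame down
    COMPRESS = "compress"          # shorten the frame
    PASS_THROUGH = "pass_through"  # neither expand nor compress


def decide_action(is_speech: bool, noise_level: float,
                  first_threshold: float, second_threshold: float) -> Action:
    """Choose the processing for one frame from its section label and noise level."""
    if second_threshold <= first_threshold:
        raise ValueError("second_threshold must be larger than first_threshold")
    if noise_level >= second_threshold:
        # Very noisy input: neither expansion nor compression is performed.
        return Action.PASS_THROUGH
    if is_speech:
        return Action.EXPAND
    if noise_level >= first_threshold:
        # Moderately noisy input: a non-speech label may be a misclassification,
        # so the frame is not compressed.
        return Action.PASS_THROUGH
    return Action.COMPRESS
```

With hypothetical thresholds 0.3 and 0.6, a non-speech frame at noise level 0.4 is passed through rather than compressed, while a speech frame at the same level is still expanded.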

The speech speed conversion device according to any one of claims 1 to 3, wherein the noise level determination unit obtains the noise level using an absolute value of an amplitude of the input signal.
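
One way to obtain a noise level from the absolute value of the amplitude is to average the absolute sample values over a frame. The claim does not say how the absolute values are combined, so the frame-wise mean and the name `noise_level_from_amplitude` are assumptions made for this sketch.

```python
def noise_level_from_amplitude(frame):
    """Return the mean absolute amplitude of one frame of samples."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return sum(abs(sample) for sample in frame) / len(frame)
```

For example, `noise_level_from_amplitude([0.5, -0.5, 1.0, -1.0])` returns `0.75`.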

The speech speed conversion device according to any one of claims 1 to 3, wherein the noise level determination unit obtains the noise level using a frequency component of a non-voice band of the input signal.
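
A noise level based on a non-voice band could be sketched as the mean power of the spectral bins that fall in a band where speech is assumed to carry little energy. The band edges, the sample rate, the naive DFT and the name `band_power` are all assumptions of this example; the claim does not specify them.

```python
import cmath
import math


def band_power(frame, sample_rate, low_hz, high_hz):
    """Mean power of the DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(frame)
    if n == 0:
        raise ValueError("frame must contain at least one sample")
    total = 0.0
    count = 0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            # Naive DFT of a single bin; an FFT would normally be used instead.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            total += abs(coeff) ** 2 / n ** 2
            count += 1
    return total / count if count else 0.0
```

Calling `band_power(frame, 16000, 5000, 8000)` would then give a value that rises when energy is present above an assumed 5 kHz edge and stays small for a signal confined to lower frequencies.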

The speech speed conversion device according to any one of claims 1 to 5, further comprising a buffer unit that stores the input signal expanded or compressed by the expansion/compression processing unit,
wherein the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform the expansion processing when the free space of the buffer unit is equal to or less than a predetermined lower limit value, even when the speech section determination unit determines a speech section.

The speech speed conversion device according to claim 1, further comprising a buffer unit that stores the input signal expanded or compressed by the expansion/compression processing unit,
wherein the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform the compression processing when the free space of the buffer unit is equal to or greater than a predetermined upper limit value, even when the speech section determination unit determines a non-speech section.
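
The two buffer conditions (no expansion when free space is at or below a lower limit, no compression when free space is at or above an upper limit) can be sketched as a filter applied to an already chosen action. The string labels, the `"pass"` fallback and the name `apply_buffer_limits` are hypothetical and used here only for illustration.

```python
def apply_buffer_limits(action, free_space, lower_limit, upper_limit):
    """Replace an action with "pass" when the buffer state rules it out."""
    if action == "expand" and free_space <= lower_limit:
        # The buffer is nearly full, so no further expanded output is produced.
        return "pass"
    if action == "compress" and free_space >= upper_limit:
        # The buffer is nearly empty, so there is no backlog to recover.
        return "pass"
    return action
```

With hypothetical limits of 16 and 900 samples, an `"expand"` request with 10 free samples and a `"compress"` request with 950 free samples would both come back as `"pass"`.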

The speech speed conversion device as described, further comprising a buffer unit that stores the input signal expanded or compressed by the expansion/compression processing unit,
wherein the speech speed conversion processing determination unit increases or decreases the expansion rate used when the expansion/compression processing unit performs the expansion processing, according to the free capacity of the buffer unit.
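
Adjusting the expansion rate to the free capacity of the buffer might look like the following linear mapping. The direction of the mapping (more free capacity allows a larger rate), the linear shape, the default `max_rate` of 1.5 and the name `expansion_rate` are assumptions of this sketch, not values taken from the claim.

```python
def expansion_rate(free_space, capacity, max_rate=1.5):
    """Map buffer free space to an expansion rate between 1.0 and max_rate."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if max_rate < 1.0:
        raise ValueError("max_rate must be at least 1.0")
    # Clamp the free fraction to [0, 1] before interpolating.
    fraction = min(max(free_space / capacity, 0.0), 1.0)
    return 1.0 + (max_rate - 1.0) * fraction
```

Under these assumptions an empty buffer of 1000 samples gives a rate of 1.5, a half-full buffer gives 1.25, and a full buffer gives 1.0, which means no expansion.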