JP5346230B2 - Speaking speed converter - Google Patents

Publication number
JP5346230B2
Authority
JP
Japan
Prior art keywords
speech
determination unit
noise level
unit
compression processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2009056958A
Other languages
Japanese (ja)
Other versions
JP2010210947A (en)
Inventor
哲平 鷲
恵一 ▲吉▼田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Corp
Panasonic Holdings Corp
Original Assignee
Panasonic Corp
Matsushita Electric Industrial Co Ltd
Application filed by Panasonic Corp and Matsushita Electric Industrial Co Ltd
Priority to JP2009056958A
Publication of JP2010210947A
Application granted
Publication of JP5346230B2
Legal status: Active

Landscapes

  • Telephone Function (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Description

The present invention relates to a speech speed conversion device that converts the speaking rate of input speech and outputs the converted speech.

Various speech speed conversion devices that convert the speaking rate of input speech and output the result have been proposed. Simply slowing the speech down would make the output lag further and further behind the input, so conventional devices distinguish sections that contain sound (sounded sections) from sections that contain none (silent sections), expanding the input signal in the sounded sections and compressing it in the silent sections so that no delay accumulates in the output speech (see, for example, Patent Document 1).

Patent Document 1: JP-T-2006-77626

However, in the conventional approach that distinguishes sounded from silent sections, no silent section remains once noise is superimposed on the input signal, so the input signal is expanded continuously and the delay introduced by speech speed conversion keeps growing. One conceivable remedy is to distinguish sections of the input signal that contain speech (speech sections) from sections that do not (non-speech sections), which would allow speech speed conversion without accumulating delay even when noise is superimposed on the input. However, when the superimposed noise is loud, a speech section may be misclassified as a non-speech section; that section is then compressed by mistake, and part of the input speech may be lost.

The present invention has been made in view of the above circumstances, and its object is to provide a speech speed conversion device that can convert speech speed appropriately even when noise is superimposed on the input speech.

To achieve the above object, a first aspect of the present invention comprises: a speech section determination unit that distinguishes speech sections, in which the input signal contains speech, from non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs the result; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion while the speech section determination unit reports a speech section and compression while it reports a non-speech section; and a noise level determination unit that determines the level of noise contained in the input signal. The speech speed conversion processing determination unit decides, according to the noise level determined by the noise level determination unit, whether the expansion/compression processing unit may perform expansion and compression, and does not let it perform compression when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold, even if the speech section determination unit reports a non-speech section.

According to the first aspect, the speech speed conversion processing determination unit decides whether expansion and compression may be performed according to the noise level determined by the noise level determination unit, so speech speed can be converted appropriately even when noise is superimposed on the input speech. Moreover, because the expansion/compression processing unit does not compress when the noise level is at or above the threshold, even if the speech section determination unit reports a non-speech section, the output speech is prevented from dropping out because input speech was compressed by mistake.

To achieve the above object, a second aspect of the present invention comprises the same speech section determination unit, expansion/compression processing unit, speech speed conversion processing determination unit and noise level determination unit as the first aspect, and the speech speed conversion processing determination unit likewise decides, according to the noise level determined by the noise level determination unit, whether expansion and compression may be performed. In this aspect, however, it does not let the expansion/compression processing unit perform expansion when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold, even if the speech section determination unit reports a speech section.

According to the second aspect, the speech speed conversion processing determination unit decides whether expansion and compression may be performed according to the noise level, so speech speed can be converted appropriately even when noise is superimposed on the input speech. Moreover, because the expansion/compression processing unit does not expand when the noise level is at or above the threshold, even if the speech section determination unit reports a speech section, speech is prevented from being unnaturally stretched under the influence of noise.

To achieve the above object, a third aspect of the present invention comprises the same speech section determination unit, expansion/compression processing unit, speech speed conversion processing determination unit and noise level determination unit as the first aspect. The speech speed conversion processing determination unit does not let the expansion/compression processing unit perform compression when the noise level determined by the noise level determination unit is equal to or greater than a predetermined first threshold and less than a second threshold that is larger than the first, even if the speech section determination unit reports a non-speech section. Furthermore, it lets the expansion/compression processing unit perform neither expansion nor compression when the noise level is equal to or greater than the second threshold, even if the speech section determination unit reports a speech section.

According to the third aspect, the speech speed conversion processing determination unit decides whether expansion and compression may be performed according to the noise level, so speech speed can be converted appropriately even when noise is superimposed on the input speech. Because the expansion/compression processing unit does not compress while the noise level is at or above the first threshold and below the second, even if the speech section determination unit reports a non-speech section, the output speech is prevented from dropping out because input speech was compressed by mistake. And because it does not expand while the noise level is at or above the second threshold, even if the speech section determination unit reports a speech section, speech is prevented from being unnaturally stretched under the influence of noise.

A fourth aspect of the present invention is the invention of any one of the first to third aspects, in which the noise level determination unit obtains the noise level from the absolute value of the amplitude of the input signal.

According to the fourth aspect, the noise level can be obtained relatively simply.

A fifth aspect of the present invention is the invention of any one of the first to third aspects, in which the noise level determination unit obtains the noise level from the frequency components of the input signal in a non-speech band.

According to the fifth aspect, the noise level can be obtained independently of the level of the input speech.
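As a hedged illustration of the fifth aspect, the sketch below measures the level of one frame inside a band assumed to carry little speech. The band limits (4 to 8 kHz at an assumed 16 kHz sampling rate), the frame length, and the function name `band_level` are assumptions; the patent does not specify them. A direct DFT keeps the example dependency-free; a practical implementation would use an FFT, as the embodiment mentions. With this normalization a sinusoid of amplitude A lying inside the band reads as A/2.

```python
import cmath
import math


def band_level(frame, fs, f_lo, f_hi):
    """Return a relative level of `frame` in the band [f_lo, f_hi) Hz.

    Sums the energy of the one-sided DFT bins whose centre frequency falls
    in the band. Direct O(N^2) DFT for clarity only.
    """
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n < f_hi:
            xk = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n)
                     for m in range(n))
            energy += abs(xk) ** 2
    return math.sqrt(energy) / n
```

A 1 kHz tone (inside the assumed speech band) leaves the 4 to 8 kHz level near zero, while an added 6 kHz component of amplitude 0.2 raises it to about 0.1, regardless of how loud the speech-band tone is.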

A sixth aspect of the present invention is the invention of any one of the first to fifth aspects, further comprising a buffer unit that stores the input signal after expansion or compression by the expansion/compression processing unit, in which the speech speed conversion processing determination unit does not let the expansion/compression processing unit perform expansion when the free capacity of the buffer unit is equal to or less than a predetermined lower limit, even if the speech section determination unit reports a speech section.

According to the sixth aspect, malfunctions caused by the buffer unit overflowing and the output speech overwriting other memory areas can be prevented.

A seventh aspect of the present invention is the invention of any one of the first to sixth aspects, further comprising a buffer unit that stores the input signal after expansion or compression by the expansion/compression processing unit, in which the speech speed conversion processing determination unit does not let the expansion/compression processing unit perform compression when the free capacity of the buffer unit is equal to or greater than a predetermined upper limit, even if the speech section determination unit reports a non-speech section.

According to the seventh aspect, malfunctions caused by the buffer unit underflowing (underrunning) and the output speech overwriting other memory areas can be prevented.

An eighth aspect of the present invention is the invention of any one of the first to seventh aspects, further comprising a buffer unit that stores the input signal after expansion or compression by the expansion/compression processing unit, in which the speech speed conversion processing determination unit increases or decreases the expansion rate used when the expansion/compression processing unit performs expansion according to the free capacity of the buffer unit.

According to the eighth aspect, speech that has not undergone speech speed conversion is prevented from being output abruptly when the buffer unit runs out of free capacity.

According to the present invention, speech speed can be converted appropriately even when noise is superimposed on the input speech.

Fig. 1 is a block diagram showing an embodiment of the present invention.
Fig. 2(a) is a waveform of the absolute value of input speech with no superimposed noise, and Fig. 2(b) is a waveform of the absolute value of input speech with superimposed noise.
Fig. 3 is a diagram explaining the operation of the embodiment.

An embodiment of the present invention is described in detail below with reference to the drawings.

As shown in Fig. 1, the speech speed conversion device of this embodiment comprises: a speech section determination unit 1 that distinguishes speech sections, in which the input signal contains speech, from non-speech sections, in which it does not; an expansion/compression processing unit 2 that expands or compresses the input signal and outputs the result; a speech speed conversion processing determination unit 3 that causes the expansion/compression processing unit 2 to perform expansion while the speech section determination unit 1 reports a speech section and compression while it reports a non-speech section; a noise level determination unit 4 that determines the level of noise contained in the input signal; and a buffer unit 5 that stores the input signal after expansion or compression by the expansion/compression processing unit 2. Each of these units is implemented by having a DSP (Digital Signal Processor) execute a predetermined program. The input signal is, for example, a digital signal obtained by A/D-converting an analog acoustic signal picked up by a microphone.

The speech section determination unit 1 is of a conventionally known type: it determines the speech sections of the input signal and outputs the result to the speech speed conversion processing determination unit 3.

The expansion/compression processing unit 2 applies time-axis companding (compression and expansion) to the input signal to adjust the speaking rate of the speech it carries. For example, it runs the algorithm known as PICOLA, which computes the autocorrelation of the speech while varying the frame length, takes the frame length giving the highest correlation as the pitch period of the speech, and converts the speaking rate by inserting or deleting waveform segments in units of that period.
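A minimal pure-Python sketch of PICOLA-style time-scale modification, assuming a mono signal given as a list of floats, a pitch-period search range of 40 to 160 samples (2.5 to 10 ms at an assumed 16 kHz sampling rate), linear cross-fades, and `rate` defined as output length divided by input length. Following the description above, the period is the lag that maximizes the normalized correlation between two adjacent frames. The names `find_period` and `picola` are hypothetical; this is an illustration of the technique, not the device's actual program.

```python
import math


def find_period(x, start, t_min, t_max):
    """Pick the lag T in [t_min, t_max] whose adjacent frames
    x[start:start+T] and x[start+T:start+2T] are most correlated."""
    best_t, best_c = t_min, -math.inf
    for t in range(t_min, t_max + 1):
        a = x[start:start + t]
        b = x[start + t:start + 2 * t]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        c = num / den if den > 0 else 0.0
        if c > best_c:
            best_t, best_c = t, c
    return best_t


def picola(x, rate, t_min=40, t_max=160):
    """Time-scale x so that len(output) / len(input) is roughly `rate`.

    Expansion (1 < rate <= 2) inserts one pitch period per step;
    compression (0.5 <= rate < 1) removes one per step.
    """
    if not 0.5 <= rate <= 2.0:
        raise ValueError("rate must be in [0.5, 2.0]")
    if rate == 1.0:
        return list(x)
    out, p, n = [], 0, len(x)
    while p + 2 * t_max <= n:
        t = find_period(x, p, t_min, t_max)
        a, b = x[p:p + t], x[p + t:p + 2 * t]
        fade_in = [i / t for i in range(t)]
        if rate > 1.0:
            # Emit period A unchanged, then one extra period that fades B out
            # and A in, so it joins A's end and B's start smoothly.
            out.extend(a)
            out.extend(bi * (1 - w) + ai * w for ai, bi, w in zip(a, b, fade_in))
            p += t
            copy = round(t * (2 - rate) / (rate - 1))
        else:
            # Replace periods A and B with one period that fades A out and B in.
            out.extend(ai * (1 - w) + bi * w for ai, bi, w in zip(a, b, fade_in))
            p += 2 * t
            copy = round(t * (2 * rate - 1) / (1 - rate))
        # Copy L samples untouched so the long-run length ratio equals `rate`.
        out.extend(x[p:p + copy])
        p += copy
    out.extend(x[p:])  # pass the remaining tail through unmodified
    return out
```

Each expansion step consumes T + L input samples and produces 2T + L output samples, so choosing L = T(2 - rate)/(rate - 1) gives the requested ratio; the compression step consumes 2T + L and produces T + L, giving L = T(2·rate - 1)/(1 - rate).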

The noise level determination unit 4 is a digital filter whose response rises slowly and falls steeply, that is, whose rise time constant is relatively large and whose fall time constant is relatively small. It uses the absolute value of the amplitude of the input signal to determine (estimate) the level of the noise steadily present in the input signal (the noise level). Comparing an input signal consisting of speech alone with no superimposed noise, as in Fig. 2(a), with one in which noise is superimposed on speech, as in Fig. 2(b), the amplitude waveform of speech changes relatively abruptly, whereas that of noise changes relatively slowly. A digital filter with a relatively large rise time constant and a relatively small fall time constant therefore removes the speech amplitude waveform from the amplitude waveform of the input signal, and the remaining output gives the noise level. (Fig. 2 shows waveforms of the absolute value of the input signal.) Alternatively, the noise level determination unit 4 may perform a fast Fourier transform and determine the noise level from the frequency components of the input signal in a non-speech frequency band.
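The asymmetric filter can be sketched as a one-pole envelope follower on |x[n]| whose smoothing coefficient depends on direction: close to 1 when the input is above the current estimate (slow rise, large time constant) and smaller when it is below (fast fall, small time constant). Short speech bursts then lift the estimate only partly, while the steady noise floor is tracked. The coefficients 0.999 and 0.9 are illustrative assumptions, not values from the patent, and `track_noise_level` is a hypothetical name.

```python
def track_noise_level(x, rise_coef=0.999, fall_coef=0.9, init=0.0):
    """Estimate the steady noise level of x sample by sample.

    rise_coef close to 1 -> slow rise (large rise time constant);
    smaller fall_coef   -> fast fall (small fall time constant).
    Returns the estimate after each input sample.
    """
    level = init
    levels = []
    for sample in x:
        mag = abs(sample)
        coef = rise_coef if mag > level else fall_coef
        level = coef * level + (1.0 - coef) * mag
        levels.append(level)
    return levels
```

With a steady noise amplitude of 0.1, a 200-sample burst at amplitude 1.0 lifts the estimate only to about 0.26 under these coefficients, and the fast fall brings it back to about 0.1 shortly after the burst ends.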

The speech speed conversion processing determination unit 3 decides, according to the noise level determined by the noise level determination unit 4, whether the expansion/compression processing unit 2 may perform expansion and compression. Specifically, the speech speed conversion processing determination unit 3 does not let the expansion/compression processing unit 2 compress when the noise level determined by the noise level determination unit 4 is at or above a threshold, even if the speech section determination unit 1 reports a non-speech section. When the noise level is at or above the threshold, the speech section determination unit 1 is more likely to misclassify a speech section as a non-speech section, so suppressing compression in that case allows speech speed to be converted appropriately even when noise is superimposed on the input speech; in particular, it prevents the output speech from dropping out because input speech was compressed by mistake.

Also, if the speech speed conversion processing determination unit 3 does not let the expansion/compression processing unit 2 expand when the noise level determined by the noise level determination unit 4 is at or above a predetermined threshold, even if the speech section determination unit 1 reports a speech section, speech can be prevented from being unnaturally stretched under the influence of noise.
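The two single-threshold rules described in the last two paragraphs reduce to a small per-frame decision. The sketch assumes the speech section determination unit supplies a boolean per frame and the noise level determination unit a scalar level; `decide_single_threshold`, its keyword flags, and the action strings are hypothetical. `PASS` means the frame is output without expansion or compression.

```python
EXPAND, COMPRESS, PASS = "expand", "compress", "pass"


def decide_single_threshold(is_speech, noise_level, threshold,
                            gate_compression=True, gate_expansion=False):
    """Choose the time-scale action for one frame.

    gate_compression: do not compress 'non-speech' frames in loud noise,
                      since they may be misclassified speech.
    gate_expansion:   do not expand speech frames in loud noise,
                      to avoid unnatural stretching.
    """
    noisy = noise_level >= threshold
    if is_speech:
        return PASS if (gate_expansion and noisy) else EXPAND
    return PASS if (gate_compression and noisy) else COMPRESS
```

With the default flags only compression is gated; passing `gate_expansion=True` enables the second rule, and both flags can be set together.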

Alternatively, the speech speed conversion processing determination unit 3 may refrain from letting the expansion/compression processing unit 2 compress when the noise level determined by the noise level determination unit 4 is at or above a predetermined first threshold and below a second threshold larger than the first, even if the speech section determination unit 1 reports a non-speech section, and may let it perform neither expansion nor compression when the noise level is at or above the second threshold, even if the speech section determination unit 1 reports a speech section. This prevents the output speech from dropping out because input speech was compressed by mistake, and also prevents speech from being unnaturally stretched under the influence of noise.
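A sketch of the two-threshold variant, under the same assumptions as a per-frame decision (boolean speech flag, scalar noise level). It reads the text as meaning that at or above the second threshold neither process runs whatever the speech section determination reports, so a non-speech frame stays uncompressed from the first threshold upwards. `decide_two_thresholds` and the action strings are hypothetical.

```python
EXPAND, COMPRESS, PASS = "expand", "compress", "pass"


def decide_two_thresholds(is_speech, noise_level, th1, th2):
    """Per-frame action with a first threshold th1 and a larger second th2."""
    if not th1 < th2:
        raise ValueError("the second threshold must exceed the first")
    if is_speech:
        # Expansion is stopped only in very loud noise (>= th2).
        return PASS if noise_level >= th2 else EXPAND
    # Compression is stopped from th1 upwards, which also covers >= th2.
    return PASS if noise_level >= th1 else COMPRESS
```

The effect is a three-band policy: below th1 the device behaves as an ordinary speech/non-speech converter, between th1 and th2 it only expands, and at or above th2 it passes the signal through unchanged.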

The input signal expanded or compressed by the expansion/compression processing unit 2 is first stored in the buffer unit 5 and then output as the speed-converted output signal. If a speech section of the input signal is long, the buffer unit 5 may overflow, causing a malfunction in which the output signal (speech data) overwrites other memory areas. Similarly, if a speech section of the input signal is short, the buffer unit 5 may underflow (underrun), likewise causing a malfunction in which the output signal (speech data) overwrites other memory areas.

The speech speed conversion processing determination unit 3 of this embodiment therefore prevents overflow of the buffer unit 5 by not letting the expansion/compression processing unit 2 expand when the free capacity of the buffer unit 5 is at or below a predetermined lower limit, even if the speech section determination unit 1 reports a speech section, and prevents underflow (underrun) of the buffer unit 5 by not letting it compress when the free capacity of the buffer unit 5 is at or above a predetermined upper limit, even if the speech section determination unit 1 reports a non-speech section.
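A sketch of the buffer-based gating just described, assuming the buffer unit is a fixed-capacity FIFO of samples and that free capacity is counted in samples. `OutputBuffer`, `gate_by_buffer`, and the limits used in the usage check are hypothetical; the patent leaves the lower and upper limits unspecified. The gate could be applied to whatever action the noise-based decision produced.

```python
from collections import deque

EXPAND, COMPRESS, PASS = "expand", "compress", "pass"


class OutputBuffer:
    """Hypothetical fixed-capacity FIFO between the expansion/compression
    stage and the audio output."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = deque()

    def free(self):
        return self.capacity - len(self.samples)

    def push(self, frame):
        if len(frame) > self.free():
            raise OverflowError("output buffer overflow")
        self.samples.extend(frame)

    def pop(self, n):
        if n > len(self.samples):
            raise BufferError("output buffer underrun")
        return [self.samples.popleft() for _ in range(n)]


def gate_by_buffer(action, buf, lower, upper):
    """Veto expansion when free space is scarce and compression when the
    buffer is nearly empty, falling back to unmodified output."""
    free = buf.free()
    if action == EXPAND and free <= lower:
        return PASS  # expanding would push the backlog past capacity
    if action == COMPRESS and free >= upper:
        return PASS  # compressing would drain an almost-empty buffer
    return action
```

Raising explicit errors in `push` and `pop` stands in for the memory overwrites the text warns about; with the gate in place those errors should not be reached in normal operation.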

Furthermore, if the expansion rate of the expansion/compression processing unit 2 were fixed, the free capacity of the buffer unit 5 would shrink as a speech section continued, and once it ran out, speech without speed conversion could suddenly be output. To prevent this, the speech speed conversion processing determination unit 3 of this embodiment increases or decreases the expansion rate used by the expansion/compression processing unit 2 according to (for example, in proportion to) the free capacity of the buffer unit 5. For example, as shown in Fig. 3, when the input sentence "shō-enerugī wa kokorogake shidai desu" ("saving energy depends on one's mindset") is converted (expanded), the expansion rate is set high at the start of the sentence, where the buffer unit 5 still has ample free capacity, and lowered as the sentence approaches its end and the free capacity decreases.
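One way to make the rate follow the free capacity, assuming the amount of stretching (rate - 1) is made proportional to the free fraction of the buffer, so the rate runs from an illustrative maximum of 1.5 with an empty buffer down to 1.0 (no expansion) with a full one. Both the linear form and `max_rate` are assumptions; the text only says the rate may, for example, be proportional to the free capacity. `expansion_rate` is a hypothetical name.

```python
def expansion_rate(free, capacity, max_rate=1.5):
    """Map buffer free capacity to an expansion rate (output/input length).

    free == capacity -> max_rate (plenty of room, stretch fully)
    free == 0        -> 1.0      (no room left, stop stretching)
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    fraction = min(max(free / capacity, 0.0), 1.0)
    return 1.0 + (max_rate - 1.0) * fraction
```

Evaluated once per frame, this yields a rate that tapers smoothly toward 1.0 over a long utterance instead of switching abruptly from full expansion to none when the buffer fills, which is the behaviour the Fig. 3 example describes.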

The speech speed conversion device of this embodiment is well suited to so-called hands-free intercom equipment. In an intercom system that provides hands-free (loudspeaking) calls between intercom equipment installed outside a dwelling unit (an entrance handset or a lobby intercom) and intercom equipment installed inside it (the master unit), outdoor noise is often superimposed on the speech picked up by the microphone of the entrance handset or lobby intercom. Mounting the speech speed conversion device of this embodiment in the master unit therefore has the notable effect of converting speech speed appropriately even when noise is superimposed on the input speech (the speech arriving from the entrance handset or lobby intercom).

Description of Reference Signs
1 Speech section determination unit
2 Expansion/compression processing unit
3 Speech speed conversion processing determination unit
4 Noise level determination unit
5 Buffer unit

Claims (8)

A speech speed conversion device comprising: a speech section determination unit that distinguishes speech sections, in which the input signal contains speech, from non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs the result; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit determines a speech section, and to perform compression processing while the speech section determination unit determines a non-speech section; and a noise level determination unit that determines the noise level contained in the input signal,
wherein the speech speed conversion processing determination unit decides, according to the noise level determined by the noise level determination unit, whether the expansion/compression processing unit may execute expansion processing and compression processing, and does not cause the expansion/compression processing unit to perform compression processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold, even if the speech section determination unit determines a non-speech section.
A speech speed conversion device comprising: a speech section determination unit that distinguishes speech sections, in which the input signal contains speech, from non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs the result; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit determines a speech section, and to perform compression processing while the speech section determination unit determines a non-speech section; and a noise level determination unit that determines the noise level contained in the input signal,
wherein the speech speed conversion processing determination unit decides, according to the noise level determined by the noise level determination unit, whether the expansion/compression processing unit may execute expansion processing and compression processing, and does not cause the expansion/compression processing unit to perform expansion processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined threshold, even if the speech section determination unit determines a speech section.
A speech speed conversion device comprising: a speech section determination unit that distinguishes speech sections, in which the input signal contains speech, from non-speech sections, in which it does not; an expansion/compression processing unit that expands or compresses the input signal and outputs the result; a speech speed conversion processing determination unit that causes the expansion/compression processing unit to perform expansion processing while the speech section determination unit determines a speech section, and to perform compression processing while the speech section determination unit determines a non-speech section; and a noise level determination unit that determines the noise level contained in the input signal,
wherein the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform compression processing when the noise level determined by the noise level determination unit is equal to or greater than a predetermined first threshold and less than a second threshold larger than the first threshold, even if the speech section determination unit determines a non-speech section. It further does not cause the expansion/compression processing unit to perform either expansion processing or compression processing when the noise level determined by the noise level determination unit is equal to or greater than the second threshold, even if the speech section determination unit determines a speech section.
The speech speed conversion device according to any one of claims 1 to 3, wherein the noise level determination unit obtains the noise level using the absolute value of the amplitude of the input signal.
The speech speed conversion device according to any one of claims 1 to 3, wherein the noise level determination unit obtains the noise level using frequency components of the input signal in a non-speech band.
The speech speed conversion device according to any one of claims 1 to 5, comprising a buffer unit that stores the input signal expanded or compressed by the expansion/compression processing unit,
wherein the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform expansion processing when the free space of the buffer unit is equal to or less than a predetermined lower limit, even if the speech section determination unit determines a speech section.
The speech speed conversion device according to any one of claims 1 to 6, comprising a buffer unit that stores the input signal expanded or compressed by the expansion/compression processing unit,
wherein the speech speed conversion processing determination unit does not cause the expansion/compression processing unit to perform compression processing when the free space of the buffer unit is equal to or greater than a predetermined upper limit, even if the speech section determination unit determines a non-speech section.
The speech speed conversion device according to any one of claims 1 to 7, comprising a buffer unit that stores the input signal expanded or compressed by the expansion/compression processing unit,
wherein the speech speed conversion processing determination unit increases or decreases the expansion rate it applies when causing the expansion/compression processing unit to perform expansion processing, according to the free space of the buffer unit.
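The claims above describe a frame-by-frame decision rule. The following Python sketch shows one way that rule could be expressed. It combines the two-threshold scheme of the third independent claim with the buffer conditions of the dependent claims. It is an illustration under stated assumptions, not the patented implementation. All names (`SpeedConversionPolicy`, `decide`, `noise_level_abs`, `noise_level_out_of_band`), every threshold value, the 300 to 3400 Hz voice band, the two noise measures, and the linear mapping from free space to expansion rate are hypothetical. The speech/non-speech flag is taken as an input rather than computed. The actual time-scale expansion and compression of the waveform is not shown.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXPAND = "expand"      # stretch the signal (slow the speech down)
    COMPRESS = "compress"  # shorten the signal
    PASS = "pass"          # output the input unchanged


def noise_level_abs(frame):
    """Noise level as the mean absolute amplitude of a frame (one reading of claim 4)."""
    return sum(abs(x) for x in frame) / len(frame) if frame else 0.0


def noise_level_out_of_band(frame, fs, voice_band=(300.0, 3400.0)):
    """Mean DFT magnitude over bins outside an assumed voice band (one reading of claim 5).

    A direct O(n^2) DFT keeps the sketch free of third-party dependencies.
    """
    n = len(frame)
    mags = []
    for k in range(1, n // 2):
        if voice_band[0] <= k * fs / n <= voice_band[1]:
            continue  # skip bins inside the assumed voice band
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(math.hypot(re, im) / n)
    return sum(mags) / len(mags) if mags else 0.0


@dataclass
class SpeedConversionPolicy:
    # Every numeric default below is a hypothetical placeholder.
    noise_th1: float = 0.05      # first noise threshold
    noise_th2: float = 0.20      # second noise threshold (must exceed noise_th1)
    buffer_capacity: int = 16_000
    free_lower: int = 800        # free-space lower limit (samples)
    free_upper: int = 15_000     # free-space upper limit (samples)
    max_expansion: float = 1.5   # expansion rate when the buffer is empty

    def decide(self, is_speech, noise, free_space):
        """Choose an action for one frame from the speech flag, noise level and buffer free space."""
        if noise >= self.noise_th2:
            return Action.PASS  # very noisy: neither expand nor compress
        if is_speech:
            if free_space <= self.free_lower:
                return Action.PASS  # buffer nearly full: do not expand
            return Action.EXPAND
        if noise >= self.noise_th1:
            return Action.PASS  # noisy non-speech section: do not compress
        if free_space >= self.free_upper:
            return Action.PASS  # buffer nearly empty: do not compress
        return Action.COMPRESS

    def expansion_rate(self, free_space):
        """Scale the expansion rate linearly with buffer free space (assumed mapping)."""
        ratio = max(0.0, min(1.0, free_space / self.buffer_capacity))
        return 1.0 + (self.max_expansion - 1.0) * ratio
```

The second independent claim is a variant in which a single threshold suppresses expansion during speech. In this sketch, that variant would replace the `noise_th2` check with a noise test inside the speech branch.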
JP2009056958A 2009-03-10 2009-03-10 Speaking speed converter Active JP5346230B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009056958A JP5346230B2 (en) 2009-03-10 2009-03-10 Speaking speed converter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2009056958A JP5346230B2 (en) 2009-03-10 2009-03-10 Speaking speed converter

Publications (2)

Publication Number Publication Date
JP2010210947A JP2010210947A (en) 2010-09-24
JP5346230B2 true JP5346230B2 (en) 2013-11-20

Family

ID=42971184

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2009056958A Active JP5346230B2 (en) 2009-03-10 2009-03-10 Speaking speed converter

Country Status (1)

Country Link
JP (1) JP5346230B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10127924B2 (en) 2016-05-31 2018-11-13 Panasonic Intellectual Property Management Co., Ltd. Communication apparatus mounted with speech speed conversion device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0315897A (en) * 1989-06-14 1991-01-24 Fujitsu Ltd Decision threshold value setting control system
JPH04168499A (en) * 1990-10-31 1992-06-16 Sanyo Electric Co Ltd Device for compressing and extending time axis
JPH09154093A (en) * 1995-11-29 1997-06-10 Sanyo Electric Co Ltd Video/audio reproducing device
JP2000250566A (en) * 1999-02-25 2000-09-14 Sanyo Electric Co Ltd Sound and soundless deciding device and speech rate converting device
JP2003216200A (en) * 2002-01-28 2003-07-30 Telecommunication Advancement Organization Of Japan System for supporting creation of writing text for caption and semi-automatic caption program production system
JP3871657B2 (en) * 2003-05-27 2007-01-24 株式会社東芝 Spoken speed conversion device, method, and program thereof
JP2006126548A (en) * 2004-10-29 2006-05-18 Matsushita Electric Works Ltd Speech synthesizer
WO2006077626A1 (en) * 2005-01-18 2006-07-27 Fujitsu Limited Speech speed changing method, and speech speed changing device
WO2009011021A1 (en) * 2007-07-13 2009-01-22 Panasonic Corporation Speaking speed converting device and speaking speed converting method

Also Published As

Publication number Publication date
JP2010210947A (en) 2010-09-24

Similar Documents

Publication Publication Date Title
JP6378274B2 (en) Voice / audio signal processing method and apparatus
KR102237718B1 (en) Device and method for reducing quantization noise in a time-domain decoder
JP4630876B2 (en) Speech speed conversion method and speech speed converter
JP5071346B2 (en) Noise suppression device and noise suppression method
WO2005117366A1 (en) Sound packet reproducing method, sound packet reproducing apparatus, sound packet reproducing program, and recording medium
US20110054889A1 (en) Enhancing Receiver Intelligibility in Voice Communication Devices
JP2006189907A (en) Method of detecting voice activity of signal and voice signal coder including device for implementing method
JP4460580B2 (en) Speed conversion device, speed conversion method and program
KR100806155B1 (en) Method and system for enabling audio speed conversion
JP6065308B2 (en) Volume correction device
JPWO2010087147A1 (en) Howling suppression device, howling suppression method, program, and integrated circuit
US20040236571A1 (en) Subband method and apparatus for determining speech pauses adapting to background noise variation
US10757514B2 (en) Method of suppressing an acoustic reverberation in an audio signal and hearing device
JP5346230B2 (en) Speaking speed converter
JP6878776B2 (en) Noise suppression device, noise suppression method and computer program for noise suppression
JP6277739B2 (en) Communication device
JP6396829B2 (en) Information processing apparatus, determination method, and computer program
JP2009229921A (en) Acoustic signal analyzing device
WO2017085815A1 (en) Perplexed state determination system, perplexed state determination method, and program
JP2009265422A (en) Information processing apparatus and information processing method
KR100744375B1 (en) Apparatus and method for processing sound signal
WO2016203753A1 (en) Noise detection device, noise suppression device, noise detection method, noise suppression method, and recording medium
KR101098763B1 (en) Method and system of suppressing noise
KR102238429B1 (en) Sporadic noise detecting apparatus
JP6451079B2 (en) Speech enhancement device and program, and speech decoding device and program

Legal Events

Date Code Title Description
RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20100714

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20111215

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A712

Effective date: 20120113

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20121213

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20121225

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20130225

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20130723

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20130816

R150 Certificate of patent or registration of utility model

Ref document number: 5346230

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150
