JP3306784B2

JP3306784B2 - Bone conduction microphone output signal reproduction device

Info

Publication number: JP3306784B2
Application number: JP21158494A
Authority: JP
Inventors: 芳夫中▲ダイ▼; 豊西野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-09-05
Filing date: 1994-09-05
Publication date: 2002-07-24
Anticipated expiration: 2017-07-24
Also published as: JPH0879868A

Abstract

PURPOSE: To provide a bone conduction microphone output signal reproduction device that corrects a voice signal phoneme by phoneme to obtain a high-quality output voice signal. CONSTITUTION: Voice signals are collected simultaneously from a bone conduction microphone 1 and an air conduction microphone 2, and pairs of voice signal waveform patterns, divided into short segments of about phoneme length, are stored in a conversion rule decision device 9. When the signal waveform pattern of a voice collected by the bone conduction microphone 1 is received, the stored bone conduction pattern closest to it is selected, the air conduction waveform pattern paired with the selected pattern is obtained, and the resulting patterns are joined and output from an output terminal 16.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a bone conduction microphone output signal reproducing apparatus.

[0002]

[Prior Art] When communication devices such as mobile telephones and transceivers, or audio recording devices such as tape recorders, are used, the speaker's voice is often picked up with a microphone that senses vibrations in the air, a so-called air conduction microphone. However, when an air conduction microphone is used to record a speaker's voice in a place where high-level noise is radiated, such as a construction site, the noise is superimposed on the voice and an S/N (voice signal-to-noise ratio) sufficient for transmission cannot be obtained. A bone conduction microphone is therefore used in such transmission environments.

[0003] The bone conduction microphone, also known as a bone-transmission microphone, is a type of vibration pickup that records, at the forehead, jaw, cheek, ear canal, or the like, the bone vibration caused by vocal cord vibration during speech, so that it can be used as a substitute signal for the actual voice. Under high-level noise the noise also vibrates the human skeleton, and this vibration is observed to be superimposed on the bone conduction microphone output voice signal. Even so, because the voice can be recorded at a position close to the sound source, that is, the vocal cord vibration, a higher S/N is obtained than with an air conduction microphone, which makes the bone conduction microphone effective as a voice input means under high noise.

[0004] Compared with an air conduction microphone, however, the bone conduction microphone has several problems in its frequency characteristics. First, a flat frequency characteristic is not obtained for the received signal, and the voice tends to have emphasized low frequencies. Second, the power of the bone conduction microphone output voice signal behaves differently from that of air-conducted voice, depending on whether the sound is produced by vocal cord vibration or via vocal organs other than the vocal cords. For example, when the bone conduction microphone is placed in the ear canal, the moraic nasal "n" (ん), which vibrates the nasal cavity, is detected at a higher level than vowels produced by vocal cord vibration because the nasal cavity is close to the ear canal, and the result sounds unnatural compared with air-conducted voice. Third, depending on the material and shape of the bone conduction microphone and on how it is worn, the microphone readily picks up unwanted sound caused by friction between the microphone unit and the skin, and this unwanted sound is superimposed on the bone conduction microphone transmission voice as noise that arises constantly as the palate opens and closes.

[0005] Accordingly, to obtain from a bone conduction microphone a voice as clear as that picked up by an ordinary microphone, a signal processing technique is needed that, for each phoneme (or each short interval comparable in length to a phoneme) of the bone conduction microphone output voice signal, flattens the frequency characteristic, adjusts the voice power, removes unwanted noise, and then resynthesizes the voice. As one such technique, attempts have conventionally been made to improve voice quality by correcting the bone conduction microphone output voice signal with an active filter.

[0006] FIG. 6 shows an example of this filter correction. Here, the speaker's voice recorded with a bone conduction microphone is called bone-conducted voice, and the voice recorded with an ordinary air conduction microphone is called air-conducted voice. First, the speaker's bone-conducted voice and air-conducted voice are recorded simultaneously using the bone conduction microphone 1 and the air conduction microphone 2, respectively, and are temporarily stored on a tape recorder 3 or the like. The long-term average spectrum of each stored voice waveform is observed, and a long-term spectrum calculation unit 4 obtains the difference in characteristics between the waveform picked up by the air conduction microphone 2 and the waveform picked up by the bone conduction microphone 1. If a filter unit 5 then implements a filter having this difference characteristic, voice subsequently picked up by the bone conduction microphone 1 passes through the filter unit 5 and is obtained at an output terminal 6 as pseudo air-conducted voice equivalent to voice picked up by the air conduction microphone 2.
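
The long-term-average correction of FIG. 6 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the circuit of the prior art: the frame length, the naive DFT, the use of a per-bin magnitude ratio as the "difference characteristic", and the names `dft_magnitudes`, `long_term_average_spectrum`, and `correction_gains` are all hypothetical choices made for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def long_term_average_spectrum(signal, frame_len):
    """Average the magnitude spectra of consecutive non-overlapping frames."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    frames = [signal[s:s + frame_len] for s in starts]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    total = [0.0] * (frame_len // 2 + 1)
    for frame in frames:
        for k, m in enumerate(dft_magnitudes(frame)):
            total[k] += m
    return [t / len(frames) for t in total]

def correction_gains(bone, air, frame_len, floor=1e-9):
    """Per-bin gain mapping the bone long-term spectrum onto the air one.

    Assumption: the difference characteristic is modelled as a simple
    magnitude ratio; `floor` avoids division by zero in empty bins.
    """
    bone_avg = long_term_average_spectrum(bone, frame_len)
    air_avg = long_term_average_spectrum(air, frame_len)
    return [a / max(b, floor) for a, b in zip(air_avg, bone_avg)]
```

Because the gains are averaged over the whole recording, the same correction is applied to every phoneme, which is the limitation the invention sets out to address.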

[0007]

[Problems to Be Solved by the Invention] However, the conventional improvement method described above improves the voice characteristics only as a long-term average and does not correct each syllable properly. For more accurate correction, a method is desired in which the bone-conducted voice is decomposed into phonemes (or short intervals comparable in length to a phoneme), a voice correction filter from bone-conducted voice to air-conducted voice, obtained in advance for each phoneme unit, is applied to each unit, and the voice is then regenerated.

[0008] As noted above, however, friction noise between the bone conduction microphone and the skin is superimposed on bone-conducted voice. Consequently, a scheme that uses the bone-conducted voice after correcting it by signal processing leaves harsh noise behind even if the bone-conducted voice is divided into phoneme units. The present invention has been made in view of these circumstances, and its object is to provide a bone conduction microphone output signal reproducing apparatus that corrects the voice in phoneme units to obtain high-quality output voice.

[0009]

[Means for Solving the Problems] The bone conduction microphone output signal reproducing apparatus according to claim 1 comprises: a bone conduction microphone; means for dividing the bone conduction microphone output signal into predetermined short-time segments; an air conduction microphone; means for dividing the air conduction microphone output signal into the predetermined short-time segments; means for obtaining the correspondence between the short-time bone conduction microphone output signals and the short-time air conduction microphone output signals and determining signal conversion rules, in units of the predetermined short time, from the bone conduction microphone output signal to the air conduction microphone output signal; means for storing the signal conversion rules; means for generating and outputting short-time pseudo air conduction microphone output signals from the short-time bone conduction microphone output signals based on the stored signal conversion rules; and means for joining the short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal. The apparatus is characterized in that a one-to-one correspondence is obtained between the voice signal waveforms recorded simultaneously from the bone conduction microphone and the air conduction microphone and divided into the predetermined short-time segments, and is stored as conversion rules from the short-time bone conduction microphone output signal to the short-time pseudo air conduction microphone output signal, and in that the short-time pseudo air conduction microphone output signals obtained from the conversion rules and the short-time bone conduction microphone output signals are joined into a long-time signal waveform, thereby reproducing the long-time pseudo air conduction microphone output signal.

[0010] The bone conduction microphone output signal reproducing apparatus according to claim 2 comprises: a bone conduction microphone; means for dividing the bone conduction microphone output signal into predetermined short-time segments; means for extracting features from the short-time bone conduction microphone output signals to derive a fundamental frequency and vocal tract feature parameters; an air conduction microphone; means for dividing the air conduction microphone output signal into the predetermined short-time segments; means for extracting features from the short-time air conduction microphone output signals to derive vocal tract feature parameters; means for obtaining the correspondence between the vocal tract feature parameters of the bone conduction microphone output signal and those of the air conduction microphone output signal and determining conversion rules, in units of the predetermined short time, from the former to the latter; means for storing the conversion rules; means for generating and outputting vocal tract feature parameters of a pseudo air conduction microphone output signal from the vocal tract feature parameters of the bone conduction microphone output signal based on the stored conversion rules; means for synthesizing the short-time pseudo air conduction microphone output signals from the vocal tract feature parameters of the pseudo air conduction microphone output signal and the fundamental frequency component of the bone conduction microphone output signal; and means for joining the short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal. The apparatus is characterized in that vocal tract feature parameters are extracted from the voice signal waveforms recorded simultaneously from the bone conduction microphone and the air conduction microphone and divided into the predetermined short-time segments, a one-to-one correspondence between them is obtained and stored as conversion rules from the vocal tract feature parameters of the bone conduction microphone output signal to those of the pseudo air conduction microphone output signal, and the short-time pseudo air conduction microphone output signals, obtained from the vocal tract feature parameters derived using the conversion rules and the vocal tract feature parameters of the bone conduction microphone output signal together with the pitch component of the bone conduction microphone output signal, are joined into a long-time signal waveform, thereby reproducing the long-time pseudo air conduction microphone output signal.

[0011] The bone conduction microphone output signal reproducing apparatus according to claim 3 comprises: a bone conduction microphone; means for dividing the bone conduction microphone output signal into predetermined short-time segments; means storing signal conversion rules for obtaining the pseudo air conduction microphone output signal corresponding to each short-time bone conduction microphone output signal; means for generating and outputting short-time pseudo air conduction microphone output signals from the short-time bone conduction microphone output signals based on the stored signal conversion rules; and means for joining the short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal. The apparatus is characterized in that the short-time pseudo air conduction microphone output signals obtained from the conversion rules and the short-time bone conduction microphone output signals are joined into a long-time signal waveform, thereby reproducing the long-time pseudo air conduction microphone output signal.

[0012] The bone conduction microphone output signal reproducing apparatus according to claim 4 comprises: a bone conduction microphone; means for dividing the bone conduction microphone output signal into predetermined short-time segments; means for extracting features from the short-time bone conduction microphone output signals to derive a fundamental frequency and vocal tract feature parameters; means storing conversion rules for obtaining the pseudo air conduction microphone output signal corresponding to each short-time bone conduction microphone output signal; means storing conversion rules for obtaining the vocal tract feature parameters of the pseudo air conduction microphone output signal corresponding to the vocal tract feature parameters of the bone conduction microphone output signal; means for generating and outputting vocal tract feature parameters of the pseudo air conduction microphone output signal from the vocal tract feature parameters of the bone conduction microphone output signal based on the stored conversion rules; means for synthesizing the short-time pseudo air conduction microphone output signals from the vocal tract feature parameters of the pseudo air conduction microphone output signal and the fundamental frequency component of the bone conduction microphone output signal; and means for joining the short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal. The apparatus is characterized in that the short-time pseudo air conduction microphone output signals, obtained from the vocal tract feature parameters derived using the conversion rules and the vocal tract feature parameters of the short-time bone conduction microphone output signal together with the pitch component of the bone conduction microphone output signal, are joined into a long-time signal waveform, thereby reproducing the long-time pseudo air conduction microphone output signal.

[0013]

[Operation] In the bone conduction microphone output signal reproducing apparatus according to claim 1, for a speaker wearing the bone conduction microphone, bone-conducted voice from the bone conduction microphone and air-conducted voice from the air conduction microphone are recorded simultaneously in advance. The resulting bone-conducted and air-conducted voice signals are each divided into short-time segments, and the correspondence between the short-time bone-conducted voice signals and air-conducted voice signals is obtained and stored as signal conversion rules. When bone-conducted voice is subsequently input from the bone conduction microphone, the resulting bone-conducted voice signal is divided into short-time segments, each segment is converted into a pseudo air-conducted voice signal according to the stored signal conversion rules, and the converted segments are joined to reproduce a long-time pseudo air-conducted voice signal.

[0014] In the bone conduction microphone output signal reproducing apparatus according to claim 2, for a speaker wearing the bone conduction microphone, bone-conducted voice from the bone conduction microphone and air-conducted voice from the air conduction microphone are acquired simultaneously in advance, and the resulting bone-conducted and air-conducted voice signals are each divided into short-time segments and analyzed. This yields short-time vocal tract feature parameters for both the bone-conducted and the air-conducted voice signals; the correspondence between the vocal tract feature parameters of the two signals is obtained and stored as conversion rules. When bone-conducted voice is subsequently input from the bone conduction microphone, the resulting bone-conducted voice signal is divided into short-time segments and analyzed, and pseudo vocal tract feature parameters for the bone-conducted voice signal are derived from the resulting vocal tract feature parameters and the stored conversion rules. Using these parameters and the fundamental frequency obtained by analyzing the original bone-conducted voice signal, short-time pseudo air-conducted voice signals are synthesized and joined to reproduce a long-time pseudo air-conducted voice signal.
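
The text does not say how the fundamental frequency is extracted from each short-time segment. As one possible illustration only, the sketch below estimates it by normalized cross-correlation; the method itself, the 60 to 400 Hz search range, and the name `estimate_f0` are assumptions made for this example and are not taken from the patent.

```python
import math

def estimate_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one short-time frame.

    Assumed method: pick the lag at which the frame best matches a delayed
    copy of itself (highest normalized cross-correlation). Returns None
    when no usable lag is found, for example for an all-zero frame.
    """
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_score = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        head = frame[:len(frame) - lag]
        tail = frame[lag:]
        energy = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        if energy == 0.0:
            continue  # silent stretch: correlation is undefined
        score = sum(x * y for x, y in zip(head, tail)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag if best_lag else None
```

In the scheme of claim 2, a value of this kind would be taken from the bone-conducted segment and combined with the converted vocal tract parameters when the segment is resynthesized.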

[0015] In the bone conduction microphone output signal reproducing apparatus according to claim 3, when bone-conducted voice is input from the bone conduction microphone, the resulting bone-conducted voice signal is divided into predetermined short-time segments, each segment is converted into a short-time pseudo air-conducted voice signal by universal signal conversion rules for obtaining the pseudo air conduction microphone output signal corresponding to the short-time bone conduction microphone output signal, and the converted segments are joined to reproduce a long-time pseudo air-conducted voice signal.

[0016] In the bone conduction microphone output signal reproducing apparatus according to claim 4, when bone-conducted voice is input from the bone conduction microphone, the resulting bone-conducted voice signal is divided into predetermined short-time segments and subjected to feature extraction. The vocal tract feature parameters obtained by applying preset universal conversion rules to the extracted vocal tract feature parameters are combined with the pitch component of the short-time bone conduction microphone output signal to generate short-time pseudo air-conducted voice signals, which are joined to reproduce a long-time pseudo air-conducted voice signal.

[0017]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a first embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a bone conduction microphone, which is worn on a part of the face such as the forehead, jaw, cheek, or ear canal and records the speaker's vocal cord vibration transmitted through bone and skin. Reference numeral 2 denotes an air conduction microphone, that is, an ordinary microphone, which records the speaker's natural voice signal propagating through the air.

[0018] Reference numerals 3 and 4 denote low-pass filters that prevent aliasing distortion in the output signals of the bone conduction microphone 1 and the air conduction microphone 2, respectively. If the pseudo air-conducted voice finally obtained is to occupy the same frequency band as the original bone-conducted voice, the low-pass filters 3 and 4 have the same cutoff frequency, for example 4 kHz. If the band of the pseudo air-conducted voice is to be wider than that of the original bone-conducted voice, the cutoff frequencies may differ, for example 4 kHz for the low-pass filter 3 and 7 kHz for the low-pass filter 4.

[0019] Reference numerals 5 and 6 denote A/D converters that apply A/D conversion to the outputs of the low-pass filters 3 and 4, respectively, to facilitate the signal processing performed in later stages. It suffices that the sampling frequency of each of the A/D converters 5 and 6 covers the frequency band in which the features of the voice clearly appear and satisfies the Nyquist sampling theorem with respect to the cutoff frequency of the corresponding low-pass filter 3 or 4. Likewise, the number of quantization bits of the A/D converters 5 and 6 need only be large enough that the features of the voice clearly appear and quantization distortion is small.

[0020] For example, when the low-pass filters 3 and 4 both have a cutoff frequency of 4 kHz, the A/D converters 5 and 6 use the same sampling frequency and number of quantization bits, for example 8 kHz sampling with 12-bit linear quantization. When the cutoff frequencies differ, such as 4 kHz for the low-pass filter 3 and 7 kHz for the low-pass filter 4, the A/D converter 5 uses, for example, 8 kHz sampling with 12-bit linear quantization, and the A/D converter 6 uses, for example, 16 kHz sampling with 16-bit linear quantization.
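
The relationship between the example cutoff frequencies and sampling rates can be checked with a few lines of code. The sketch below is only an illustration: the helper names and the `CONFIGS` table are hypothetical, and the check treats a sampling rate of exactly twice the cutoff as acceptable, matching the 4 kHz and 8 kHz example in the text.

```python
def satisfies_nyquist(sampling_hz, cutoff_hz):
    """True when the sampling rate is at least twice the low-pass cutoff."""
    return sampling_hz >= 2 * cutoff_hz

def quantization_levels(bits):
    """Number of amplitude levels of a linear quantizer with `bits` bits."""
    return 2 ** bits

# (low-pass cutoff in Hz, sampling rate in Hz, quantization bits),
# copied from the examples in the text; the dictionary keys are illustrative.
CONFIGS = {
    "adc5_bone": (4000, 8000, 12),
    "adc6_air_wideband": (7000, 16000, 16),
}

def check_configs(configs=CONFIGS):
    """Map each configuration name to (Nyquist satisfied?, quantizer levels)."""
    return {name: (satisfies_nyquist(fs, fc), quantization_levels(bits))
            for name, (fc, fs, bits) in configs.items()}
```

Sampling the 7 kHz band at 8 kHz would fail this check, which is why the wideband air-conducted path uses the higher 16 kHz rate.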

[0021] Reference numerals 7 and 8 denote short-time analyzers, which divide the bone-conducted voice signal and the air-conducted voice signal obtained from the A/D converters 5 and 6, respectively, into short-time segments. The segment length is the same in both short-time analyzers 7 and 8 and is set to a duration on the order of a phoneme, for example 32 msec. When, for example, multiplying by a window function in the smoother 13 described later would cause a loss of signal power, the short-time analyzers 7 and 8 divide the signal so that adjacent segments partly overlap, which avoids the loss caused by the window function. FIG. 2 shows an example of waveforms divided in this way: divided waveform patterns A, B, and C are generated from the original voice waveform with partial overlap.
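
The overlapping division performed by the short-time analyzers can be sketched as follows. The 32 msec segment length comes from the text; the 50% overlap and the name `split_frames` are assumptions for this example, since the text only says that part of the waveform is overlapped.

```python
def split_frames(signal, fs, frame_ms=32, overlap=0.5):
    """Split a sample sequence into fixed-length, partly overlapping frames.

    fs is the sampling frequency in Hz; `overlap` is the fraction of each
    frame shared with the next one (assumed value, not from the text).
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = max(1, int(frame_len * (1.0 - overlap)))
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

At 8 kHz sampling a 32 msec segment is 256 samples, so with the assumed 50% overlap each new segment starts 128 samples after the previous one, in the manner of patterns A, B, and C in FIG. 2.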

[0022] Returning to FIG. 1, reference numeral 9 denotes a conversion rule determiner, which learns and stores the correspondence between the short-time bone-conducted voice signals and air-conducted voice signals obtained by the short-time analyzers 7 and 8. Specifically, letting a(n) be a short-time bone-conducted voice signal and b(n) be the short-time air-conducted voice signal recorded simultaneously with a(n), the conversion rule determiner 9 determines the pair of a(n) and b(n) as a signal conversion rule. Here n is the number of the stored rule, and the number of rules is assumed to be at most 1000.
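
A minimal sketch of how the pairs a(n), b(n) might be collected is shown below. The limit of 1000 rules comes from the text; the function name `learn_rules`, the list-of-tuples representation, and the policy of dropping segments beyond the limit are assumptions for illustration.

```python
MAX_RULES = 1000  # upper bound on the number of stored rules given in the text

def learn_rules(bone_frames, air_frames, max_rules=MAX_RULES):
    """Pair each bone-conducted segment a(n) with the simultaneously
    recorded air-conducted segment b(n) and return the list of rules."""
    if len(bone_frames) != len(air_frames):
        raise ValueError("both recordings must yield the same number of segments")
    rules = []
    for a_n, b_n in zip(bone_frames, air_frames):
        if len(rules) >= max_rules:
            break  # assumption: segments beyond the limit are simply dropped
        rules.append((list(a_n), list(b_n)))
    return rules
```

The two segments of a pair need not have the same number of samples: when the air-conducted path is sampled at 16 kHz and the bone-conducted path at 8 kHz, each b(n) holds twice as many samples as its a(n) for the same 32 msec interval.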

[0023] Reference numeral 10 denotes a rule store, which stores the conversion rules determined by the conversion rule determiner 9 and supplies them to the signal converter 12 in the following stage. Reference numeral 11 denotes a switch that selects the destination of the bone-conducted voice signal output from the short-time analyzer 7. The switch 11 selects between a learning mode, in which signal conversion rules are learned, and a reproduction mode, in which the bone-conducted voice signal is converted based on the signal conversion rules.

[0024] Reference numeral 12 denotes a signal converter, which obtains short-time pseudo air-conducted voice signals from the short-time bone-conducted voice signals output by the short-time analyzer 7, based on the signal conversion rules stored in the rule store 10. A smoother 13 joins the short-time pseudo air-conducted voice signals output by the signal converter 12 in alignment with the time axis of the original bone-conducted voice signal, and applies smoothing so that discontinuities at the joins do not cause signal distortion. One smoothing method, for example, uses a Hamming window function to bring the amplitude at each join to a value close to 0.
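
The conversion and smoothing stages can be sketched together as follows. The abstract states that the stored pattern closest to the input is selected, but the distance measure is not given, so the squared Euclidean distance used here is an assumption, as are the function names and the overlap-add joining scheme; only the Hamming window is named in the text.

```python
import math

def nearest_rule(frame, rules):
    """Return the air-conducted pattern b(n) whose bone-conducted pattern
    a(n) is closest to `frame` (assumed metric: squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best_b = min(rules, key=lambda rule: sq_dist(rule[0], frame))
    return best_b

def hamming(n):
    """Hamming window of length n; its end values are 0.08, close to 0."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def overlap_add(frames, hop):
    """Window each frame and add it into the output at its time position."""
    frame_len = len(frames[0])
    window = hamming(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, frame in enumerate(frames):
        start = idx * hop
        for i, sample in enumerate(frame):
            out[start + i] += window[i] * sample
    return out

def convert(bone_frames, rules, hop):
    """Map bone-conducted frames to a joined pseudo air-conducted signal."""
    return overlap_add([nearest_rule(f, rules) for f in bone_frames], hop)
```

Here `hop` is the spacing between successive output segments in samples. Because the window tapers each segment toward its ends, overlapping segments are needed to keep the summed signal level roughly constant, which is the reason the short-time analyzers divide the waveform with overlap.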

[0025] Reference numeral 14 denotes a D/A converter, which converts the digital signal output from the smoother 13 into an analog signal. Reference numeral 15 denotes a low-pass filter, which prevents aliasing distortion in the output signal of the D/A converter 14. Here, the D/A converter 14 has the same sampling frequency and number of quantization bits as the A/D converter 6, and the low-pass filter 15 has the same cutoff frequency as the low-pass filter 4. Reference numeral 16 denotes an output terminal from which the final pseudo air-conducted voice signal is output.

[0026] The operation of the apparatus configured as described above is explained separately for the learning mode and the reproduction mode. The learning mode determines the signal conversion rules by finding the correspondence between bone-conducted voice and air-conducted voice. The reproduction mode obtains a pseudo air-conducted voice output from bone-conducted voice based on those signal conversion rules.

[0027] (1) Operation in the learning mode
In the learning mode, the switch 11 is set to the learning-mode side. In this state, the speaker first utters words or sentences in which all the features of a voice signal appear, for example the 100 Japanese city names described in Itabashi, "Common Speech Data for Speech Recognition," Proceedings of the Acoustical Society of Japan Symposium on "Standardization of Test Speech," 1985. The environment in which the speaker does this must be a room whose ambient noise level is low enough not to adversely affect the extraction of voice features.

[0028] The uttered voice is input simultaneously to the bone conduction microphone 1 and the air conduction microphone 2, and is converted into digital waveform data through the low-pass filters 3 and 4 and the A/D converters 5 and 6. The short-time analyzers 7 and 8 divide this digital waveform data into short-time units as described above and send it to the conversion rule determiner 9. As described above, the conversion rule determiner 9 pairs each short-time bone-conducted voice signal a(n) with the short-time air-conducted voice signal b(n) to form a conversion rule from a(n) to b(n).

[0029] When a large number of bone-conducted voice signals are observed, similar signal patterns recur, and similar patterns are treated as the same a(n). Specifically, let A(n) be a spectral feature, for example the LPC cepstrum coefficients, of an a(n) already stored in the conversion rule determiner 9, and let A(n') be the corresponding spectral feature of a newly input short-time bone-conducted voice signal a(n'). If the absolute value of the distance between the spectral features, |A(n) - A(n')|, is smaller than a predetermined threshold TH, a(n') is classified as the same bone-conducted voice signal pattern as a(n). In this way the conversion rule determiner 9 determines a certain number of conversion rules. The rules thus obtained are stored in the rule storage 10, and the learning mode ends once a sufficient number of conversion rules has been obtained.
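
The learning procedure of the first embodiment can be sketched as follows. This is only an illustration under stated assumptions: the distance between spectral features is taken to be Euclidean (the text does not name a metric), the first air-conducted frame seen for a pattern is the one kept, and the names `learn_rules` and `distance` are hypothetical.

```python
import math

def distance(u, v):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

def learn_rules(pairs, threshold, max_rules=1000):
    """Build rules (A(n), b(n)) from simultaneously recorded frame pairs.

    pairs     -- iterable of (bone_feature, air_frame) for each short-time unit
    threshold -- TH; a new frame closer than this to a stored A(n) is treated
                 as the same pattern and adds no new rule
    max_rules -- upper limit on the number of stored rules
    """
    rules = []
    for feature, air_frame in pairs:
        if len(rules) >= max_rules:
            break
        if any(distance(feature, stored) < threshold for stored, _ in rules):
            continue  # same bone-conducted pattern as an existing a(n)
        rules.append((feature, air_frame))
    return rules

rules = learn_rules(
    [([0.0, 0.0], "b0"), ([0.1, 0.0], "b1"), ([1.0, 1.0], "b2")],
    threshold=0.5,
)
```

In the usage above, the second frame lies within TH of the first and is merged into it, so two rules result.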

[0030] (2) Reproduction mode
The reproduction mode is used after the learning mode has ended. In the reproduction mode, the switch 11 is set to the reproduction-mode side, connecting the short-time analyzer 7 to the signal converter 12. In this mode, the air conduction microphone 2, the low-pass filter 4, the A/D converter 6, the short-time analyzer 8, and the conversion rule determiner 9 are not used.

[0031] In the reproduction mode, the speaker's voice is converted into digital waveform data through the bone conduction microphone 1, the low-pass filter 3, and the A/D converter 5, is divided into short-time units by the short-time analyzer 7, and is then sent to the signal converter 12. Let x denote the bone-conducted voice signal sent to the signal converter 12. The signal converter 12 then computes the absolute value D(n) of the distance between the spectral features of the input x and of each a(n) stored in the rule storage 10. With X and A(n) denoting the spectral features of x and a(n), respectively, D(n) = |X - A(n)|.

[0032] The a(n) for which D(n) is smallest is taken as the pseudo bone-conducted voice signal for the input signal x, and the corresponding pseudo air-conducted voice signal b(n) is derived from it. The derived b(n) is sent to the smoother 13, where the short-time segments are converted into a long-time signal. Because this signal is digital waveform data, it is converted into an analog waveform via the D/A converter 14 and the low-pass filter 15 and output from the output terminal 16 in its original analog form.
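
The rule lookup carried out by the signal converter 12 amounts to a nearest-neighbour search. The sketch below assumes Euclidean distance between feature vectors and a rule list of (A(n), b(n)) pairs; the name `convert_frame` is hypothetical.

```python
import math

def convert_frame(x_feature, rules):
    """Return the b(n) whose stored A(n) minimizes D(n) = |X - A(n)|."""
    _, best_air_frame = min(
        rules, key=lambda rule: math.dist(x_feature, rule[0])
    )
    return best_air_frame

rules = [([0.0, 0.0], [1, 1]), ([1.0, 1.0], [2, 2])]
pseudo_air = convert_frame([0.9, 0.8], rules)
```

A linear scan is adequate for the rule counts mentioned in the description (at most about 1000).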

[0033] Next, FIG. 3 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a second embodiment of the present invention. In FIG. 3, parts common to FIG. 1 carry the same reference numerals, and their description is omitted. In FIG. 3, reference numerals 17 and 18 denote LPC analyzers, which perform linear predictive analysis (LPC) on the outputs of the short-time analyzers 7 and 8, respectively, and separate the input voice into a pitch frequency and parameters representing vocal tract features, such as LPC coefficients.
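
As an illustration of how vocal tract parameters might be obtained, the sketch below estimates LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion. The description does not say which estimation method the LPC analyzers use, so this choice is an assumption, as are the function names; pitch extraction is omitted.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one short-time frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(frame, order):
    """Return a[1..order] such that s[t] is approximated by sum_k a[k] * s[t-k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break  # silent frame: nothing to predict
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:]

# A decaying exponential follows s[t] = 0.5 * s[t-1], so the first
# coefficient should be close to 0.5 and the second close to 0.
coeffs = lpc_coefficients([0.5 ** t for t in range(40)], order=2)
```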

[0034] Of the separated components, the pitch frequency of the bone-conducted voice is sent to the LPC synthesizer 19 described later, while the feature parameters of the bone-conducted voice and of the air-conducted voice are sent to the switch 11 and to the coefficient conversion rule determiner 9', respectively. The coefficient conversion rule determiner 9' determines the conversion rules from the feature parameters of bone-conducted voice to those of air-conducted voice and supplies them to the rule storage 10.

[0035] Reference numeral 12' denotes a coefficient converter, which derives the feature parameters of pseudo air-conducted voice from the feature parameters of bone-conducted voice input via the switch 11, based on the conversion rules determined by the coefficient conversion rule determiner 9' and stored in the rule storage 10. The LPC synthesizer 19 performs linear predictive (LPC) synthesis using the pitch frequency of the bone-conducted voice output from the LPC analyzer 17 and the feature parameters of the pseudo air-conducted voice output from the coefficient converter 12', and generates a short-time pseudo air-conducted voice signal waveform.
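
A simplified view of the LPC synthesis step is an all-pole filter driven by a pulse train at the pitch period. The sketch below assumes voiced excitation only, a fixed gain, and predictor coefficients in the convention s[t] = e[t] + sum_k a[k] * s[t-k]; none of these details are specified in the description, and the name `lpc_synthesize` is hypothetical.

```python
def lpc_synthesize(coeffs, pitch_hz, sample_rate, num_samples, gain=1.0):
    """Generate one short-time frame from LPC coefficients and a pitch frequency."""
    period = max(1, round(sample_rate / pitch_hz))  # pitch period in samples
    out = []
    for t in range(num_samples):
        # Excitation: one pulse per pitch period (voiced speech assumed).
        excitation = gain if t % period == 0 else 0.0
        # All-pole filter: add the prediction from previous output samples.
        predicted = sum(a * out[t - k]
                        for k, a in enumerate(coeffs, start=1) if t - k >= 0)
        out.append(excitation + predicted)
    return out

frame = lpc_synthesize([0.5], pitch_hz=2000, sample_rate=8000, num_samples=6)
```

With a single coefficient of 0.5 and a period of four samples, each pulse decays by half per sample until the next pulse arrives.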

[0036] Next, the operation of the bone conduction microphone output signal reproducing apparatus configured as shown in FIG. 3 is explained separately for the learning mode and the reproduction mode, as in the first embodiment.
(3) Learning mode
In the learning mode, the switch 11 is set to the learning-mode side. In this state, the speaker first utters words or sentences in which all the features of a voice signal appear, as in the first embodiment. This voice is input simultaneously to the bone conduction microphone 1 and the air conduction microphone 2, converted into digital waveform data through the low-pass filters 3 and 4 and the A/D converters 5 and 6, divided into short-time units by the short-time analyzers 7 and 8, and sent to the coefficient conversion rule determiner 9'.

[0037] The coefficient conversion rule determiner 9' first extracts and stores voice feature parameters x(t), such as LPC coefficients (where t is the input time), for the many frames of bone-conducted voice, and derives from them a fixed number of representative parameters p(n) that broadly cover the features of the voice. This procedure is well known as vector quantization; concrete techniques include those known as the LBG algorithm and K-means clustering. Here the final number of classified parameters, that is, the codebook size n, is set to 256, for example.
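
The derivation of the representative parameters p(n) can be illustrated with a plain K-means loop. This is a minimal sketch, not the LBG algorithm itself: it assumes Euclidean distance, initializes the codebook from the first `size` vectors, and runs a fixed number of iterations. The names `train_codebook` and `nearest` are hypothetical.

```python
import math

def nearest(vector, codebook):
    """Index n of the codebook entry closest to vector."""
    return min(range(len(codebook)),
               key=lambda n: math.dist(vector, codebook[n]))

def train_codebook(vectors, size, iterations=20):
    """Derive `size` representative vectors p(n) from the feature vectors x(t)."""
    codebook = [list(v) for v in vectors[:size]]  # assumed initialization
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for n, members in enumerate(clusters):
            if members:  # keep the old entry if a cluster is empty
                codebook[n] = [sum(c) / len(members) for c in zip(*members)]
    return codebook

codebook = train_codebook([[0.0], [10.0], [0.2], [10.2]], size=2)
```

In the description the codebook size is 256; the toy call above uses 2 so that the result is easy to check by hand.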

[0038] Next, each bone-conducted voice feature parameter x(t) mentioned above is replaced with one of these 256 representative parameters p(n). Specifically, let X(t) and P(n) be spectral features, for example LPC cepstra, of x(t) and p(n), respectively. For the absolute value of the distance between the spectral features, D(n) = |X(t) - P(n)|, x(t) is replaced by the p(n) for which D(n) is smallest.

[0039] Let y(t) be the air-conducted voice feature parameter recorded simultaneously with the bone-conducted voice feature parameter x(t), and let y(t, n) denote y(t) when x(t) has been replaced by p(n). Over all t, the y(t, n) are thus grouped by p(n), here into 256 classes; the y(t, n) in each class are collected and their arithmetic mean q(n) is computed. This operation yields the conversion rule from each bone-conducted voice feature parameter p(n) to the pseudo air-conducted voice feature parameter q(n). The coefficient conversion rule determiner 9' sends the pairs of p(n) and q(n) to the rule storage 10, where they are stored.
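
The computation of q(n) can be sketched as a per-class average. The example assumes Euclidean distance for the nearest-p(n) assignment and returns `None` for a class that received no frames; both are illustrative choices, and the name `build_mapping` is hypothetical.

```python
import math

def build_mapping(bone_features, air_features, codebook):
    """Return q, where q[n] is the mean of all y(t) whose x(t) maps to p(n)."""
    sums = [None] * len(codebook)
    counts = [0] * len(codebook)
    for x, y in zip(bone_features, air_features):
        # Replace x(t) by the nearest representative p(n).
        n = min(range(len(codebook)),
                key=lambda i: math.dist(x, codebook[i]))
        if sums[n] is None:
            sums[n] = [0.0] * len(y)
        sums[n] = [s + c for s, c in zip(sums[n], y)]
        counts[n] += 1
    # Arithmetic mean per class; None marks a class with no training frames.
    return [[s / counts[n] for s in sums[n]] if counts[n] else None
            for n in range(len(codebook))]

q = build_mapping(
    bone_features=[[0.1], [9.0], [0.3]],
    air_features=[[1.0], [5.0], [3.0]],
    codebook=[[0.0], [10.0]],
)
```

In reproduction, the same nearest-p(n) search on a new frame gives the index n, and q[n] is the parameter passed on for synthesis.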

[0040] (4) Reproduction mode
In the reproduction mode, the switch 11 is set to the reproduction-mode side, connecting the LPC analyzer 17 to the coefficient converter 12'. In this state, the speaker's voice is converted into digital waveform data through the bone conduction microphone 1, the low-pass filter 3, and the A/D converter 5, divided into short-time units by the short-time analyzer 7, and separated by the LPC analyzer 17 into pitch frequency data v and a bone-conducted voice feature parameter x.

[0041] The bone-conducted voice feature parameter x is sent to the coefficient converter 12' and replaced by one of the representative parameters p(n) calculated in advance by the coefficient conversion rule determiner 9' and stored in the rule storage 10. Specifically, with X and P(n) denoting the spectra of x and p(n), respectively, and the absolute value of the spectral distance given by D(n) = |X - P(n)|, x is replaced by the p(n) for which D(n) is smallest.

[0042] Based on the rules stored in the rule storage 10, the pseudo air-conducted voice feature parameter q(n) is then derived from p(n) and sent to the LPC synthesizer 19. The LPC synthesizer 19 generates a short-time pseudo air-conducted voice signal waveform from q(n) and the pitch frequency data v output from the LPC analyzer 17. The generated waveform is sent to the smoother 13, where the short-time segments are converted into a long-time signal. Because the output of the smoother 13 is digital waveform data, it is converted into an analog waveform via the D/A converter 14 and the low-pass filter 15 and output from the output terminal 16 in its original analog form.

[0043] If the conversion rules stored in the rule storage 10 of the first and second embodiments described above were rules that give a universal conversion result for any speaker, then the air conduction microphone, the section that processes the air-conducted voice recorded by it, and the section that calculates and determines the conversion rules would become unnecessary, and the configuration of the bone conduction microphone output signal reproducing apparatus would be simpler. An apparatus configured in this way is described below.

[0044] FIG. 4 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a third embodiment of the present invention. Parts common to FIG. 1 carry the same reference numerals, and their description is omitted. The apparatus shown in this figure is the apparatus of FIG. 1 with the air conduction microphone 2, the low-pass filter 4, the A/D converter 6, the short-time analyzer 8, the conversion rule determiner 9, and the switch 11 removed. The rule storage 10 of FIG. 4, however, holds in advance conversion rules that give a universal conversion result for any speaker.

[0045] FIG. 5 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a fourth embodiment of the present invention. Parts common to FIG. 3 carry the same reference numerals, and their description is omitted. The apparatus shown in this figure is the apparatus of FIG. 3 with the air conduction microphone 2, the low-pass filter 4, the A/D converter 6, the short-time analyzer 8, the LPC analyzer 18, the coefficient conversion rule determiner 9', and the switch 11 removed. The rule storage 10 of FIG. 5, however, holds in advance conversion rules that give a universal conversion result for any speaker.

[0046] In the bone conduction microphone output signal reproducing apparatuses according to the third and fourth embodiments described above, the conversion rules are stored in the rule storage 10 in advance, so the learning mode of the first and second embodiments does not exist. Only operation equivalent to the reproduction mode of the first and second embodiments is therefore performed.

[0047] As described above, with the configurations shown in FIGS. 1, 3, 4, and 5, voice recorded by the bone conduction microphone 1 is converted from bone-conducted voice to pseudo air-conducted voice in short-time units at the phoneme level, based on conversion rules created in advance. Consequently, the influence of noise contained in the bone-conducted voice on the pseudo air-conducted voice can be removed, and the resulting pseudo air-conducted voice has better characteristics than that of the conventional correction method, which uses a fixed filter characteristic based on time-averaged values.

[0048] Furthermore, in the configurations shown in FIGS. 1 and 3, voice can be recorded simultaneously by the bone conduction microphone 1 and the air conduction microphone 2, so a one-to-one correspondence can be established between the signals recorded by the two microphones. Because this correspondence can be obtained for each individual speaker, the conversion from bone-conducted voice to pseudo air-conducted voice can be performed with very high accuracy.

[0049] In the configurations shown in FIGS. 3 and 5, the bone-conducted and air-conducted voices are each separated by signal analysis techniques into a fundamental frequency and vocal tract feature parameters, and the conversion rules are generated using the vocal tract feature parameters. This reduces the required capacity of the rule storage 10 and improves voice quality when the voice segments divided into short phoneme-level units are joined. In addition, in the configurations shown in FIGS. 4 and 5, universal conversion rules are created in advance for an unspecified large number of speakers, so the apparatus configuration can be simplified and the user is spared the effort of creating (learning) the conversion rules.

【００５０】[0050]

[Effects of the Invention] The present invention makes it possible to reproduce accurately even high-frequency signal components that a conventional bone conduction microphone could not pick up. Because conversion is performed in short-time (phoneme) units, the optimal voice can be reproduced for each phoneme, in contrast to the conventional method, which corrected by the difference between average voice spectra. Furthermore, because the voice recorded by the bone conduction microphone is not used as the source signal of the corrected voice, unwanted noise superimposed on the voice picked up by the bone conduction microphone does not remain in the converted voice; that is, the influence of unwanted noise can be removed from the output signal. Finally, by storing in advance signal conversion rules that are universal across speakers, the learning operation for each user can be eliminated.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a first embodiment of the present invention.

FIG. 2 is a diagram for explaining short-time analysis of a voice signal.

FIG. 3 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a second embodiment of the present invention.

FIG. 4 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a third embodiment of the present invention.

FIG. 5 is a block diagram showing the schematic configuration of a bone conduction microphone output signal reproducing apparatus according to a fourth embodiment of the present invention.

FIG. 6 is a block diagram for explaining a conventional bone conduction microphone output signal reproducing apparatus.

[Explanation of symbols]

1 ... bone conduction microphone; 2 ... air conduction microphone; 3, 4, 15 ... low-pass filters; 5, 6 ... A/D converters; 7, 8 ... short-time analyzers; 9 ... conversion rule determiner; 9' ... coefficient conversion rule determiner; 10 ... rule storage; 11 ... switch; 12 ... signal converter; 12' ... coefficient converter; 13 ... smoother; 14 ... D/A converter; 16 ... output terminal; 17, 18 ... LPC analyzers; 19 ... LPC synthesizer.

Continuation of the front page
(56) References: JP-A-56-65200 (JP, A); JP-A-63-278100 (JP, A); JP-A-4-245720 (JP, A); JP-A-4-271398 (JP, A); JP-A-4-276799 (JP, A); JP-A-8-70344 (JP, A); JP-U-3-125400 (JP, U)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 21/00 - 21/04; H04R 1/00, 3/00

Claims

(57) [Claims]

1. A bone conduction microphone output signal reproducing apparatus comprising: a bone conduction microphone; means for dividing the output signal of the bone conduction microphone into predetermined short-time units; an air conduction microphone; means for dividing the output signal of the air conduction microphone into the predetermined short-time units; means for determining, in the predetermined short-time units, a signal conversion rule from the bone conduction microphone output signal to the air conduction microphone output signal by finding the correspondence between the predetermined short-time bone conduction microphone output signal and the predetermined short-time air conduction microphone output signal; means for storing the signal conversion rule; means for generating and outputting a predetermined short-time pseudo air conduction microphone output signal from the predetermined short-time bone conduction microphone output signal based on the signal conversion rule stored in the storing means; and means for joining the predetermined short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal, wherein a one-to-one correspondence is obtained for each of the voice signal waveforms recorded simultaneously from the bone conduction microphone and the air conduction microphone and divided into the predetermined short-time units, and is stored as a conversion rule for converting the predetermined short-time bone conduction microphone output signal into the predetermined short-time pseudo air conduction microphone output signal, and wherein the long-time pseudo air conduction microphone output signal is reproduced by joining, into a long-time signal waveform, the predetermined short-time pseudo air conduction microphone output signals obtained from the conversion rule and the predetermined short-time bone conduction microphone output signal.

2. A bone conduction microphone output signal reproducing apparatus comprising: a bone conduction microphone; means for dividing the output signal of the bone conduction microphone into predetermined short-time units; means for extracting features of the predetermined short-time bone conduction microphone output signal to derive a fundamental frequency and vocal tract feature parameters; an air conduction microphone; means for dividing the output signal of the air conduction microphone into the predetermined short-time units; means for extracting features of the predetermined short-time air conduction microphone output signal to derive vocal tract feature parameters; means for determining, in the predetermined short-time units, a conversion rule from the vocal tract feature parameters of the bone conduction microphone output signal to the vocal tract feature parameters of the air conduction microphone output signal by finding the correspondence between them; means for storing the conversion rule; means for generating and outputting vocal tract feature parameters of a pseudo air conduction microphone output signal from the vocal tract feature parameters of the bone conduction microphone output signal based on the conversion rule stored in the storing means; means for synthesizing the predetermined short-time pseudo air conduction microphone output signal from the vocal tract feature parameters of the pseudo air conduction microphone output signal and the fundamental frequency component of the bone conduction microphone output signal; and means for joining the predetermined short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal, wherein vocal tract feature parameters are extracted from each of the voice signal waveforms recorded simultaneously from the bone conduction microphone and the air conduction microphone and divided into the predetermined short-time units, and a one-to-one correspondence between them is obtained and stored as a conversion rule from the vocal tract feature parameters of the bone conduction microphone output signal to the vocal tract feature parameters of the pseudo air conduction microphone output signal, and wherein the long-time pseudo air conduction microphone output signal is reproduced by joining, into a long-time signal waveform, the predetermined short-time pseudo air conduction microphone output signals obtained from the pitch component of the bone conduction microphone output signal and the vocal tract feature parameters derived from the conversion rule and the vocal tract feature parameters of the bone conduction microphone output signal.

3. A bone conduction microphone output signal reproducing apparatus comprising: a bone conduction microphone; means for dividing the bone conduction microphone output signal into predetermined short-time units; means for storing a signal conversion rule for obtaining a pseudo air conduction microphone output signal corresponding to the predetermined short-time bone conduction microphone output signal; means for generating and outputting the predetermined short-time pseudo air conduction microphone output signal from the predetermined short-time bone conduction microphone output signal based on the signal conversion rule stored in the storing means; and means for joining the predetermined short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal, wherein the long-time pseudo air conduction microphone output signal is reproduced by joining, into a long-time signal waveform, the predetermined short-time pseudo air conduction microphone output signals obtained from the conversion rule and the predetermined short-time bone conduction microphone output signal.

4. A bone conduction microphone output signal reproducing apparatus comprising: a bone conduction microphone; means for dividing the bone conduction microphone output signal into predetermined short-time units; means for extracting features of the predetermined short-time bone conduction microphone output signal to derive a fundamental frequency and vocal tract feature parameters; means for storing a conversion rule for obtaining the vocal tract feature parameters of a pseudo air conduction microphone output signal corresponding to the vocal tract feature parameters of the bone conduction microphone output signal; means for generating and outputting the vocal tract feature parameters of the pseudo air conduction microphone output signal from the vocal tract feature parameters of the bone conduction microphone output signal based on the conversion rule stored in the storing means; means for synthesizing the predetermined short-time pseudo air conduction microphone output signal from the vocal tract feature parameters of the pseudo air conduction microphone output signal and the fundamental frequency component of the bone conduction microphone output signal; and means for joining the predetermined short-time pseudo air conduction microphone output signals to obtain a long-time pseudo air conduction microphone output signal, wherein the long-time pseudo air conduction microphone output signal is reproduced by joining, into a long-time signal waveform, the predetermined short-time pseudo air conduction microphone output signals obtained from the pitch component of the bone conduction microphone output signal and the vocal tract feature parameters derived from the conversion rule and the vocal tract feature parameters of the predetermined short-time bone conduction microphone output signal.