JP2016110050A

JP2016110050A - Voice processor, voice clearing device, and voice processing method

Info

Publication number: JP2016110050A
Application number: JP2015074832A
Authority: JP
Inventors: Masaji Mogami; Hideo Watanabe
Original assignee: Kotobuki Communication Instrument Co Ltd
Current assignee: Kotobuki Communication Instrument Co Ltd
Priority date: 2014-01-17
Filing date: 2015-04-01
Publication date: 2016-06-20
Anticipated expiration: 2035-04-01
Also published as: JP2015135267A; JP6548938B2

Abstract

PROBLEM TO BE SOLVED: To provide a novel voice processor, voice clarifying device, and voice processing method that make voice from speakers easy to hear, even with existing acoustic devices and broadcasting facilities. SOLUTION: A collected audio signal is branched. A low-frequency component at or below a low-order formant and a high-frequency component above a high-order formant are removed from one of the branched audio signals. The phase of the other branched audio signal is corrected to match the phase of the processed audio signal, and the two signals are combined and output. Only the roughly second to fourth formant components, which are important for language discrimination, are thus extracted from the original audio signal, added back to it, and output, so that the content of the voice can be heard clearly. SELECTED DRAWING: Figure 5

Description

The present invention relates to an audio processing device, a speech clarification device, and an audio processing method for making sound from loudspeakers, such as those of audio equipment or broadcasting facilities, clear and easy to hear.

In general, places where large numbers of people gather, such as office buildings, public halls, shopping centers, and sports facilities, are equipped with broadcasting facilities (public address systems) for giving necessary information and making emergency announcements. In such a facility, speakers are installed on, for example, the ceilings and walls of the building, and the audio signal sent from the broadcasting room is converted into sound by the speakers. Depending on conditions such as resonance of the ceiling panels or interference with sound from other speakers, however, the content of the announcement can be hard to make out.

Meanwhile, elderly people whose hearing has declined with age and people with hearing loss can sometimes hear a voice yet not understand what is being said. Human speech consists of vowels and consonants, and this is thought to happen because the vowels are relatively easy to hear while the consonants are not. Simply raising the speaker volume or the sensitivity of a hearing aid therefore does not readily solve the problem, because it makes the vowels louder along with the consonants.

For this reason, Patent Document 1 below, for example, proposes a sound correction device that corrects the frequency characteristics and level of the sound output from a receiver or the like according to the age-related change in hearing of the user of a mobile phone, television, stereo, or other audio device. Patent Documents 2 and 3 below propose methods that detect the formants of the speaker and transform them into formants that are easier to hear.

Patent Document 1: JP 2000-209698 A
Patent Document 2: JP 2008-116534 A
Patent Document 3: JP H01-93796 A

To make the sound from public address speakers easier to hear, one could replace the broadcasting equipment and speakers with higher-quality ones or rework the number and placement of the speakers, but this is very costly. Sound correction methods aimed at elderly people and people with hearing loss, on the other hand, can change the timbre (sound quality) so much that the voice sounds unnatural.

The present invention was devised to solve these problems. One object is to provide a novel audio processing device, speech clarification device, and audio processing method that make the sound from the speakers clear and easy to hear even with existing audio equipment and broadcasting facilities. Another object is to provide a novel audio processing device, speech clarification device, and audio processing method that let even elderly people with declining hearing and people with hearing loss listen to natural-sounding speech without any sense of incongruity.

When a person understands language by hearing, sound waves received at the eardrum are converted into electrical signals by the auditory organs and transmitted to the brain, where they are matched against memories built up from past experience and recognized as words. The auditory organs are extremely complex, however, and how they work is still poorly understood.

In recent years, passive devices for individual use, such as hearing aids and sound collectors that apply special assistive audio processing, have become widespread among people with hearing loss. Because these devices pick up speech and other sounds arriving from outside and must be worn, many people hesitate to use them out of concern about fatigue and appearance. Higher-performance devices are also more expensive, which is a heavy financial burden.

According to research on the mechanism of speech production, air whose strength, volume, and so on have been adjusted is emitted from the organ called the vocal cords as a vowel. This is called the first formant. Resonance produced by adjusting the airway that follows, the volume and shape of the oral cavity, the nasal cavity, the vibration and shape of the tongue, and the volume opened up by relaxing the upper and lower jaws then gives rise to several high-energy peaks in the frequency characteristic at frequencies above the first formant, and these peaks are called formants. The peaks that appear in order from the low-frequency side toward the high-frequency side are called the second formant, the third formant, the fourth formant, and so on up to the nth formant, and their combination is emitted as the voice quality unique to the individual.

Understanding of a speech signal by hearing is generally achieved by detecting the important frequency components in the flow of the voice, along with their magnitudes, and transmitting them to the brain. Research on this process is relatively advanced in linguistics, where the first to fourth formants are regarded as especially important frequency components.

Accordingly, a first invention for solving the above problems is an audio processing device in which a collected audio signal is branched, the device comprising: a band-pass filter that removes, from one of the branched audio signals, low-frequency components at or below a low-order formant and high-frequency components above a high-order formant; a phase corrector that corrects the phase of the other branched audio signal so that it matches the phase of the audio signal that has passed through the band-pass filter; and an adder that adds the audio signal that has passed through the band-pass filter to the audio signal that has passed through the phase corrector and outputs the result.

With this configuration, only the roughly second to fourth formant components, which are important for language discrimination, are extracted from the original audio signal, added to it, and output. For listeners with normal hearing, the second to fourth formant components that were attenuated by poor conditions are emphasized, so the sound can be heard clearly even through a poorly performing broadcasting facility. Listeners who have particular difficulty hearing the second to fourth formant components, such as elderly people with declining hearing and people with hearing loss, can make out the content of the speech clearly without the overall volume being raised.

The frequency band referred to in the present invention as the band from which "low-frequency components at or below a low-order formant and high-frequency components above a high-order formant are removed" is not particularly limited. It means, for example, a band containing the second to fourth formants, which are important for language recognition, and more preferably the second to fifth formants (the same applies hereinafter).

A second invention is an audio processing device comprising: a band-pass filter that removes, from the left and right audio signals of a collected stereo signal, low-frequency components at or below a low-order formant and high-frequency components above a high-order formant; a pair of phase correctors that each correct the phase of the left or right audio signal so that it matches the phase of the audio signal that has passed through the band-pass filter; and a pair of adders that each add the audio signal that has passed through the band-pass filter to the left or right audio signal that has passed through the corresponding phase corrector and output the result. With this configuration, clear and easy-to-hear sound can be output for stereo audio as well, as in the first invention.

A third invention is the audio processing device of the first or second invention, wherein the band-pass filter comprises a high-pass filter that removes low-frequency components at or below a low-order formant from one of the branched audio signals, and a low-pass filter that removes high-frequency components above a high-order formant from the audio signal that has passed through the high-pass filter. With this configuration, only the roughly second to fourth formant components, which are important for language discrimination, can be reliably extracted from the original audio signal.

A fourth invention is the audio processing device of any of the first to third inventions, further comprising an effect adjuster that adjusts the level of the audio signal that has passed through the band-pass filter, before it is combined. The attenuation of the second to fourth formant components, which are important for speech discrimination, varies with conditions and with the performance of the broadcasting facility. Among elderly people with declining hearing and people with hearing loss, the ability to hear the second to fourth formant components also differs widely between individuals. By using the effect adjuster to raise or lower the level of the second to fourth formant components extracted by the band-pass filter as appropriate, clear and natural-sounding speech can be output.

A fifth invention is a speech clarification device in which the audio processing device of any of the first to fourth inventions is housed in a portable housing, the surface of the housing being provided with an input connector to which a sound-collecting microphone is detachably connected and an output connector that outputs the audio signal processed by the audio processing device to a broadcasting facility. With this configuration, the device can easily be added to an existing broadcasting facility, so an excellent effect is obtained at low cost. Because the device is easy to carry, it can also readily be used with the broadcasting facilities of outdoor event venues and other sites.

A sixth invention is an audio processing method comprising: a first step of branching a collected audio signal; a second step of removing, from one of the audio signals branched in the first step, low-frequency components at or below the first formant and high-frequency components above a high-order formant; a third step of correcting the phase of the other audio signal branched in the first step so that it matches the phase of the audio signal processed in the second step; and a fourth step of combining the audio signal processed in the second step with the audio signal processed in the third step and outputting the result. With this configuration, clear and easy-to-hear sound can be output, as in the first invention.

According to the present invention, only the roughly second to fourth formant components, which are important for language discrimination, are extracted from the original audio signal, added to it, and output. For listeners with normal hearing, the second to fourth formant components that were attenuated by poor conditions are emphasized, so the sound can be heard clearly even through a poorly performing broadcasting facility. Listeners who have particular difficulty hearing the second to fourth formant components, such as elderly people with declining hearing and people with hearing loss, can make out the content of the speech clearly without the overall volume being raised.

FIG. 1 shows an example of a frequency analysis of speech.
FIG. 2 illustrates the process of language recognition using a sample taken from speech.
FIG. 3 shows an example in which the level of the V sound is raised by 10 dB for sensorineural hearing loss.
FIG. 4 shows the state in which the formants in region G other than the first formant h1 are extracted with a band-pass filter, amplified, and added to the main audio signal as harmonic component signals.
FIG. 5 shows one embodiment of the audio processing device 100 according to the present invention.
FIG. 6 is a circuit diagram of the phase corrector 3.
FIG. 7 shows the frequency characteristics of the input buffer 2 and the band-pass filter 30 (4, 5).
FIG. 8 shows the phase shift characteristic of the band-pass filter 30 (4, 5).
FIG. 9 shows the phase difference T when there is no phase corrector 3.
FIG. 10 shows the frequency characteristic resulting from the phase difference.
FIG. 11 shows the variable phase range of the phase corrector 3.
FIG. 12 shows the output characteristic of the effect adjuster 6.
FIG. 13 shows in-phase addition when the phase corrector 3 is present.
FIG. 14 shows the configuration of an existing wired broadcasting facility.
FIG. 15 shows the configuration of an existing wireless broadcasting facility.
FIG. 16 shows one embodiment of the speech clarification device 200 according to the present invention.
FIG. 17 shows another embodiment (stereo audio) of the audio processing device 100 according to the present invention.

Embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 1 shows an example of a frequency analysis of speech. As described above, when a person speaks, several orders of harmonics, called formants, are emitted simultaneously as a result of resonance. Of these, the fundamental h1, which mainly affects the vowels, is called the first formant; the harmonic h2 with the next higher frequency is the second formant, the next higher harmonic h3 is the third formant, the next higher harmonic h4 is the fourth formant, and so on in order of increasing frequency up to the harmonic hn, the nth formant. The level falls as the order of the formant rises, but the roughly second to fifth formant components have been found to be especially important for understanding language.

According to research in linguistics, the sense of hearing detects each formant sensitively, and the ability to perceive words is learned and memorized in early childhood. Even if part of the linguistic information obtained through hearing is missing, the gap is filled in from accumulated past experience, so it is said to cause no problem in ordinary daily conversation.

To recognize a sound as words, however, the ear must detect these formants, and if the important formant components among them cannot be detected, the sound cannot be recognized as words. For example, experiments have reportedly shown that speech becomes completely unintelligible when the band from 500 Hz to 3 kHz is removed.

Hearing loss falls broadly into three types: conductive hearing loss caused by disorders of the outer or middle ear, sensorineural hearing loss caused by disorders of the inner ear, auditory nerve, or brain, and mixed hearing loss caused by both. Conductive hearing loss, which is mild, can be dealt with by, for example, turning up the television or having the other person speak close to the ear. Sensorineural hearing loss progresses with age, and even when the incoming sound is made louder it is heard only as an indistinct "gohon, gohon" noise, so the content of the words cannot be understood. This is thought to be because the ability to detect the mid-level formants, the second to fourth, has declined, so that raising the volume does not help.

Hearing also has the property that when an excessively loud sound arrives, other quieter sounds near its frequency are hard to detect, or that one sound is obstructed by another and becomes hard to hear. This is called the masking effect and is said to result from the brain protecting itself against excessive sound. A typical example occurs in broadcasting facilities installed in large venues: the low-frequency sound pressure level of the sound emitted by speakers recessed in the ceiling can rise to about two to three times its original value because of the baffle effect of the ceiling panels.

As a result, the formant components from the second formant upward in the sound from those speakers are masked, and the broadcast often cannot be understood. In elderly listeners, whose ability to detect speech has declined, a rise in the level of the first formant, the fundamental of speech, masks the second and higher formants, and the words become unintelligible.

From the above, we consider that, to convey language accurately, these hearing problems can be solved by amplifying the harmonic components from the second formant upward so that they exceed the masking level.

FIG. 2 illustrates the process of language recognition using a sample taken from speech. The horizontal axis is frequency (Hz) and the vertical axis is sound pressure (dB); the figure shows the formant components contained in the spoken sound "V" (vee) and their levels. The first formant h1 has the lowest frequency (the vowel) but the highest sound pressure. From the second formant h2 through the nth formant hn, the sound pressure falls gradually.

In this figure, line MK indicates the masking range of a listener with normal hearing and line MK' that of a listener with hearing loss. As shown, the masking line MK of the normal-hearing listener is below all the formants, so this listener is little affected by masking and hears the V sound clearly. The masking line MK' of the listener with hearing loss is above every formant except the first formant h1, so the masking effect is pronounced and this listener cannot make out the V sound.

FIG. 3 shows the case where the level of the V sound is raised by 10 dB for a listener with sensorineural hearing loss. In the figure, h1+ to hn+ are the levels of the formants h1 to hn each raised by 10 dB. The masking line MK' of the listener with hearing loss simply moves upward in parallel, which shows that raising the volume alone does not make the speech intelligible.

FIG. 4 shows the state in which the formants in region G other than the first formant h1 (the second formant h2 to the nth formant hn), which are important for speech recognition, are extracted with a band-pass filter, amplified, and added to the main audio signal as the harmonic component signals h2++ to hn++. As shown, with this processing all the formant components exceed the masking line MK' of the listener with hearing loss, so the sound V can be understood.
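The contrast between FIG. 3 and FIG. 4 can be illustrated with a small numerical sketch. The formant levels, the 8 dB offset, and the rule that the masking line follows the level of h1 are all hypothetical values chosen for illustration; none of them come from the figures, and the sketch treats the boost as a simple level increase rather than the actual signal addition.

```python
# Hypothetical levels in dB for h1..h5 (illustrative numbers only).
FORMANTS_DB = [70.0, 58.0, 52.0, 47.0, 43.0]
# Assumed model: the masking line MK' sits a fixed offset below h1,
# so it moves up whenever h1 moves up.
MASK_OFFSET_DB = 8.0


def audible_formants(levels_db, mask_offset_db=MASK_OFFSET_DB):
    """Return the 1-based orders of formants that lie above the masking line."""
    mask_db = levels_db[0] - mask_offset_db
    return [i + 1 for i, level in enumerate(levels_db) if level > mask_db]


def uniform_gain(levels_db, gain_db):
    """Raise every formant by the same amount (the FIG. 3 case)."""
    return [level + gain_db for level in levels_db]


def selective_gain(levels_db, gain_db):
    """Raise only h2 and above, leaving h1 unchanged (the FIG. 4 case)."""
    return [levels_db[0]] + [level + gain_db for level in levels_db[1:]]


baseline = audible_formants(FORMANTS_DB)
louder = audible_formants(uniform_gain(FORMANTS_DB, 10.0))
boosted = audible_formants(selective_gain(FORMANTS_DB, 20.0))
print(baseline, louder, boosted)
```

Under these assumed numbers, turning everything up by 10 dB leaves only h1 above the masking line, whereas boosting h2 and above lifts every formant over it.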

Conventional hearing aids and sound collectors mainly gather speech produced at a distance. While the sound propagates through the air, the higher-order formant components are attenuated by ambient noise, diffusion, and the mass of the air, which severely impairs intelligibility, so even more complex audio processing is likely to be required.

The audio processing device of the present invention mainly targets speech picked up by a microphone close to the speaker's mouth, that is, clean speech with little propagation loss through the air and with the higher-order formant components that are important for understanding language still unattenuated. It was devised on the finding that if the higher-order formant components are extracted from such speech and added to the original speech, the content of the speech can be heard clearly even through a poorly performing broadcasting facility or by a listener with hearing loss.

FIG. 5 shows one embodiment of the audio processing device 100 according to the present invention. In the figure, reference numeral 1 denotes a sound-collecting microphone; 2, a buffer amplifier that amplifies the minute voltage signal generated by the microphone 1 several times so that it can be processed efficiently; 3, a phase corrector that corrects the phase of the audio signal amplified by the buffer amplifier 2; and 30, a band-pass filter that removes, from the audio signal amplified by the buffer amplifier 2, frequency components at or below a low-order formant and frequency components at or above a high-order formant.

Reference numeral 6 denotes an adjuster (effect adjuster) that adjusts the level of the audio signal that has passed through the band-pass filter 30; 7, an adder that adds the audio signal adjusted by the adjuster 6 to the audio signal corrected by the phase corrector 3; 8, an output buffer that adjusts the output of the audio signal combined by the adder 7; 9, an attenuator that brings the audio signal down to the output sensitivity of the microphone 1; and 10, an output terminal.

The band-pass filter 30 consists of a high-pass filter 4 and a low-pass filter 5. The high-pass filter 4 removes low-frequency components at or below the first formant h1 from the audio signal, and the low-pass filter 5 removes high-frequency components above the high-order formants from the signal that has passed through the high-pass filter 4, so that the frequency components of the second to fifth formants are extracted.

As shown in FIG. 6, the phase corrector 3 has a circuit configuration consisting of an input terminal 15, an input ground terminal 16, an output terminal 17, an output ground terminal 18, an operational amplifier 19, a capacitor, and resistors R, R1, and R2, and can delay the phase of the audio signal by any amount in the range of 0 to -180 degrees, as described later.
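A phase range of 0 to -180 degrees at constant amplitude is what a first-order all-pass stage provides. The following sketch assumes, without confirmation from FIG. 6, that the phase corrector behaves like such a stage with transfer function H(jw) = (1 - jwRC) / (1 + jwRC); the component values are hypothetical.

```python
import cmath
import math


def allpass_response(freq_hz, r_ohm, c_farad):
    """Complex response of an assumed first-order all-pass stage."""
    w_rc = 2.0 * math.pi * freq_hz * r_ohm * c_farad
    return complex(1.0, -w_rc) / complex(1.0, w_rc)


def phase_deg(response):
    """Phase of a complex response in degrees."""
    return math.degrees(cmath.phase(response))


# Hypothetical component values; they place the -90 degree point near 1.6 kHz.
R_OHM = 10_000.0
C_FARAD = 10e-9
F_CENTER_HZ = 1.0 / (2.0 * math.pi * R_OHM * C_FARAD)

for f in (1.0, F_CENTER_HZ, 1.0e6):
    h = allpass_response(f, R_OHM, C_FARAD)
    print(f"{f:12.1f} Hz  gain {abs(h):.3f}  phase {phase_deg(h):8.2f} deg")
```

In this model the gain stays at 1 for every frequency, and varying R shifts the frequency at which a given phase delay occurs, which is one plausible way to realise an adjustable correction.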

The operation of the audio processing device 100 configured in this way is described next. As shown in FIG. 5, speech produced at the speaker's mouth is picked up directly by the microphone 1 with almost no attenuation, converted into an electrical signal (audio signal), and amplified several times by the buffer amplifier 2. The amplified audio signal is then branched. One branch passes along the main audio path L1 to the phase corrector 3, where its phase is corrected as described later.

The audio signal sent to the sub audio path L2, meanwhile, has the frequency components of the second to fifth formants extracted by the band-pass filter 30. Passing through the high-pass filter 4 of the band-pass filter 30 suppresses the first formant and harmful low-frequency components, such as the wind noise caused by breath striking the microphone 1. Passing through the low-pass filter 5 then suppresses the harsh high-frequency noise generated by the amplifying elements themselves. This gives the path the kind of wide band-pass characteristic shown by curve na in FIG. 7 and enhances the improvement.

The high-pass filter 4 also functions as an amplifier with a gain of about 5 so that the degree of clarity can be set arbitrarily. The low-pass filter 5 additionally removes high-frequency components, including very-high-order formants that contribute little to language recognition, which improves the signal-to-noise ratio. The frequency characteristic at the output point z of the bandpass filter 30 is shown by curve bp in FIG. 7. In this embodiment, the passband of the bandpass filter 30 is 400 Hz to 7 kHz, about four octaves, a range determined through various verification tests, and both the low and high ends roll off at approximately -12 dB/oct.
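The passband figures above can be checked numerically with a minimal sketch. The source gives only the band edges (400 Hz and 7 kHz), the gain of about 5, and the slope of about -12 dB/oct; the second-order Butterworth sections used here, and all function names, are assumptions chosen for illustration and are not stated in the source.

```python
import math

SQRT2 = math.sqrt(2.0)

def highpass2(freq_hz, cutoff_hz):
    """Second-order Butterworth high-pass response (assumed alignment)."""
    s = 1j * freq_hz / cutoff_hz
    return s * s / (s * s + SQRT2 * s + 1)

def lowpass2(freq_hz, cutoff_hz):
    """Second-order Butterworth low-pass response (assumed alignment)."""
    s = 1j * freq_hz / cutoff_hz
    return 1 / (s * s + SQRT2 * s + 1)

def bandpass_response(freq_hz, low_hz=400.0, high_hz=7000.0, gain=5.0):
    """High-pass with gain followed by low-pass, as in the sub audio path."""
    return gain * highpass2(freq_hz, low_hz) * lowpass2(freq_hz, high_hz)

def level_db(response):
    """Magnitude of a complex response in decibels."""
    return 20.0 * math.log10(abs(response))

def passband_octaves(low_hz=400.0, high_hz=7000.0):
    """Width of the passband in octaves."""
    return math.log2(high_hz / low_hz)

if __name__ == "__main__":
    print("passband width [oct]:", round(passband_octaves(), 2))
    for f in (50, 100, 400, 1600, 7000, 28000, 56000):
        print(f, "Hz:", round(level_db(bandpass_response(f)), 1), "dB")
```

With these assumed sections, each octave outside the passband changes the level by roughly 12 dB, and the band from 400 Hz to 7 kHz spans a little over four octaves, which agrees with the figures quoted in the description.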

The audio signal leaving the bandpass filter 30 has its level adjusted by the effect adjuster 6 and is then added to the audio signal on the main audio path L1 by the adder/synthesizer 7. The output of the summed audio signal is adjusted as appropriate by the output buffer 8 and the attenuator 9 and then delivered from the output terminal 10 to existing broadcasting equipment or the like.

The bandpass filter 30 has the inherent property that the input signal at point x in FIG. 5 appears at point z with a phase delay T of up to about 90 degrees within the voice band, as shown in FIG. 8. Consequently, suppose the audio signal at point z were combined as it is, in the adder/synthesizer 7, with the main audio signal passing along the main audio path L1. With Vx denoting the main audio signal voltage and Vz the output voltage of the bandpass filter 30, as in FIG. 9, Vz can be regarded as -Vz because of the phase delay, so that Vx + (-Vz) = Vk, and the output of the adder/synthesizer 7 becomes the voltage difference between the two signals.

As a result, as shown in FIG. 10, a dead zone, the "deep valley" d in the frequency characteristic, appears in a certain frequency range. The level around the second and third formants, which are especially important for understanding speech, becomes extremely small, which is a serious obstacle to improving clarity. FIG. 9 illustrates this signal cancellation in the adder/synthesizer 7 when the phases of the two audio signals are taken into account.
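The cancellation described above can be illustrated with a short phasor calculation. This is a minimal sketch, assuming two sinusoids of the same frequency with amplitudes Vx and Vz and a relative phase lag between them; the function name and the example amplitudes are hypothetical and do not come from the source.

```python
import cmath
import math

def adder_output(vx, vz, lag_deg):
    """Amplitude of Vx + Vz when Vz lags Vx by lag_deg degrees.

    Both inputs are treated as phasors at one frequency (an assumption
    made for illustration; the real signals are broadband speech).
    """
    lagged = vz * cmath.exp(-1j * math.radians(lag_deg))
    return abs(vx + lagged)

if __name__ == "__main__":
    for lag in (0, 45, 90, 135, 180):
        print(lag, "deg:", round(adder_output(1.0, 0.6, lag), 3))
```

When the lag is 0 degrees the two amplitudes add; as the lag grows the sum shrinks, and at 180 degrees the output is the difference Vx - Vz, which is the case the description writes as Vx + (-Vz) = Vk.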

To eliminate this dead zone, the present invention provides the phase corrector 3 on the main audio path L1. As shown in FIG. 13, the phase corrector adds to the main audio signal a phase delay equivalent to the phase delay that arises in the bandpass filter 30 (4, 5) used for high-order formant extraction. The audio signal that has passed through the bandpass filter 30 (4, 5) is then added to the corrected signal, which avoids the dead zone.

The phase corrector 3 on the main audio path L1 therefore plays an important role in realizing the audio processing device 100 of the present invention and is one of its main features. In the circuit of the phase corrector 3 shown in FIG. 6, the angular frequency is determined by ω = 1/CR, and the phase delay can be set arbitrarily in the range of 0 to -180 degrees. By choosing the values of C and R appropriately, the phase delay can be brought arbitrarily close to 0 degrees even in a low frequency band while the gain stays constant. The phase characteristic is shown in FIG. 11.
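The role of C and R can be explored with an idealized model of a first-order all-pass stage. This is a sketch under stated assumptions, not the circuit of FIG. 6 itself: it assumes an ideal operational amplifier, the input fed to the inverting terminal through R1 with feedback R2, the capacitor C in series to the non-inverting terminal, and the resistor R from that terminal to ground. The wiring of R, the default component values, and the function names are assumptions. The sign of the angle returned by `cmath.phase` in this model need not match the source's convention of a delay between 0 and -180 degrees, so the sketch is meant to show only the 180-degree sweep and the role of ω = 1/CR.

```python
import cmath
import math

def corner_frequency_hz(r_ohm, c_farad):
    """Frequency at which omega * C * R = 1."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def allpass_response(freq_hz, r_ohm, c_farad, r1_ohm=10e3, r2_ohm=10e3):
    """Ideal-op-amp response of the assumed all-pass topology."""
    x = 2.0 * math.pi * freq_hz * r_ohm * c_farad   # omega * C * R
    v_plus = 1j * x / (1 + 1j * x)                  # divider formed by C and R
    g = r2_ohm / r1_ohm                             # inverting gain R2/R1
    return -g + (1 + g) * v_plus                    # superposition of both inputs

def phase_deg(response):
    """Phase angle of a complex response in degrees."""
    return math.degrees(cmath.phase(response))

if __name__ == "__main__":
    r, c = 10e3, 10e-9                              # hypothetical values
    f0 = corner_frequency_hz(r, c)
    for f in (f0 / 100, f0 / 10, f0, f0 * 10, f0 * 100):
        h = allpass_response(f, r, c)
        print(round(f, 1), "Hz: |H| =", round(abs(h), 3),
              "angle =", round(phase_deg(h), 1), "deg")
```

With R1 = R2 the magnitude stays at 1 for every frequency while the angle sweeps through 180 degrees, passing the midpoint at ω = 1/CR. Raising or lowering the product CR moves that midpoint, which is how the phase shift at a chosen frequency would be set in this model.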

In FIG. 6, the audio signal from the input terminal 15 is applied to the inverting terminal (-) of the operational amplifier 19, so a signal inverted in phase relative to the input appears at the output terminal 17. At the same time, the input terminal 15 is connected through the capacitor C to the non-inverting terminal (+) of the operational amplifier 19. The capacitor C is an element whose impedance varies with frequency: its impedance is Z = 1/ωC, which is inversely proportional to frequency, so its internal resistance (impedance) falls as the frequency rises.

Therefore, the higher the frequency, the larger the input signal at the non-inverting input terminal. The operational amplifier 19 operates on both signals: when the signal on the non-inverting side is large, the phase delay of the signal at the output terminal 17 is small. When the frequency is low, the impedance is high, the signal at the non-inverting input terminal is small, and a signal with a large delay appears at the output terminal 17. The amount of phase delay at a desired frequency can be set freely through the constants of the resistor R and the capacitor C. The gain of the phase corrector 3 is determined by R2/R1; it can be set as required for the purpose and is always held constant.

The high-order formant components in range G of FIG. 4, which are important for understanding speech, are applied to the adder/synthesizer through the effect adjuster 6 and set so that they exceed the masking line MK'. At the combined output point t of the output buffer 8, the frequency versus output level curve then depends on the setting of the effect adjuster 6, as shown in FIGS. 12(A) to 12(C) for the "minimum", "center", and "maximum" positions. The setting can be chosen freely to give the best effect for the place of use and the listener's degree of hearing loss.

Furthermore, because the audio processing device 100 of the present invention has the attenuator 9 on its output side, its output sensitivity can be matched to that of the microphone 1, so it can easily be incorporated into existing wired and wireless broadcasting equipment such as that shown in FIGS. 14 and 15. As a result, the existing broadcasting equipment needs no modification or improvement, and the sound quality of in-house broadcasts is improved at low cost.

In this case, as shown in FIG. 16, the audio processing device 100 of the present invention may be housed in a portable housing (metal case) 33. The surface of the housing 33 carries an input-side connection port 34, to which the sound-collecting microphone 1 is detachably connected, and an output-side connection port 35, which outputs the audio signal processed by the audio processing device 100 to the broadcasting equipment. Such a unit-type audio clarifying device 200 is not only easy to attach and remove but also easy to carry and store, so it can readily be applied to broadcasting equipment at outdoor event venues and to other broadcasting equipment.

As another embodiment of the audio processing device 100 according to the present invention, a left and right pair of main audio paths L1 may be provided as shown in FIG. 17. The audio signals of the main audio paths L1-R and L1-L are band-filtered on the sub audio path L2 in the same way as described above, and the extracted signal (second to fifth formants) is added to the audio signals of the main audio paths L1-R and L1-L in the adder/synthesizers 7a and 7b, respectively. The same effect is thus obtained for stereo audio.

Specifically, as shown in FIG. 17, the left and right audio signals forming the stereo signal, input at the pair of signal input terminals 1a and 1b, are amplified by the input buffers 2a and 2b and converted to low-impedance outputs so that the subsequent stages operate correctly. The left and right audio signals (stereo signal) leaving the input buffers 2a and 2b each branch to the main audio paths L1-R and L1-L and to the input adder 31.

The left and right audio signals (stereo signal) branched to the input adder 31 are merged there and converted to a monaural signal by the mixing buffer 32. The bandpass filter 30, comprising the high-pass filter 4 and the low-pass filter 5, amplifies this signal and extracts the high-order formants, which are sent to the effect adjuster 6 for level adjustment.

Meanwhile, the left and right audio signals (stereo signal) passing along the main audio paths L1-R and L1-L are corrected by the phase correctors 3a and 3b, as described above, by a time delay equivalent to the phase delay that inevitably arises in the bandpass filter 30. They are thereby brought into phase with the monaural audio signal whose level has been set by the effect adjuster 6 and are added to that signal in the adder/synthesizers 7a and 7b. The left and right audio signals (stereo signal) then have their impedance lowered by the output buffers 8a and 8b and are output from the signal output terminals 10a and 10b.
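The stereo signal flow of FIG. 17 can be summarized as a small sample-domain sketch. It shows only the routing: the bandpass filter and the phase correctors are passed in as callables, and the function name, the plain sum used for the input adder, and the example values are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[Sequence[float]], List[float]]

def enhance_stereo(
    left: Sequence[float],
    right: Sequence[float],
    bandpass: Stage,
    phase_correct: Stage,
    effect_level: float,
) -> Tuple[List[float], List[float]]:
    """Route a stereo pair through the main and sub audio paths.

    bandpass and phase_correct stand in for the bandpass filter 30 and
    the phase correctors 3a/3b; their implementations are not modeled.
    """
    # Input adder 31 and mixing buffer 32: merge to monaural (plain sum assumed).
    mono = [l + r for l, r in zip(left, right)]
    # Bandpass filter 30, then effect adjuster 6 sets the level.
    formants = [effect_level * s for s in bandpass(mono)]
    # Phase correctors 3a/3b on each main path, then adders 7a/7b.
    out_left = [m + f for m, f in zip(phase_correct(left), formants)]
    out_right = [m + f for m, f in zip(phase_correct(right), formants)]
    return out_left, out_right

if __name__ == "__main__":
    identity: Stage = lambda samples: list(samples)
    print(enhance_stereo([1.0, 0.0], [0.0, 2.0], identity, identity, 0.5))
```

Passing identity functions for both stages, as in the usage line, makes the routing visible on its own: each output channel is its own input plus the scaled monaural sum. Real filter stages would replace the two callables.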

In television broadcasting, speech is generally broadcast in monaural, while the spread components are separated into the left and right channels to create the sound field. The output of the input adder 31 and the mixing buffer 32, that is, the speech component, is therefore applied to the bandpass filter 30 at a level about 6 dB higher than the spread components. Clarity is achieved by extracting the second to fifth formants, which are important for language recognition, from this speech component and adding them to the original stereo signal.

As a result, by incorporating the clarified, easy-to-hear audio signal into wireless transmitters, TV audio assist devices, music players, and the like, even elderly people and people with hearing loss can hear clearly, and the device can also be used in daily life in very noisy environments. Field tests in which the device 100 of the present invention was actually incorporated into existing wireless transmitters, TV audio assist devices, music players, and similar equipment have reportedly shown that obstacles arising in daily life between people with normal hearing and people with hearing loss could be removed.

The audio processing device 100 of the present invention and the portable audio clarifying device 200 can be applied to almost any existing broadcasting equipment and readily provide an excellent speech clarifying effect. Examples include paging systems in hospitals, municipal emergency broadcast systems, telephone transmitters, radio communication equipment, speech recognition conference systems, announcement systems at event venues, television receivers, radio receivers, announcement systems in public facilities, broadcast systems in facilities for the elderly and people with disabilities, school broadcast systems, on-board announcements in trains and buses, and in-house broadcasts in places where many people gather, such as stations, shopping centers, department stores, and movie theaters.

With the audio processing device 100 and the audio clarifying device 200 of the present invention, speech that was hard to hear is not only made clear, but the speaker's individual voice quality is also preserved, so natural-sounding speech is produced.

DESCRIPTION OF SYMBOLS
100: Audio processing device
200: Audio clarifying device
1: Microphone
2: Input buffer
3: Phase corrector
4: High-pass filter
5: Low-pass filter
6: Effect adjuster
7: Adder/synthesizer
8: Output buffer
9: Attenuator
10: Audio output terminal
11: Audio processing unit
12: Power amplifier
13: Speaker
15: Input terminal
16: Input ground terminal
17: Output terminal
18: Output ground terminal
19: Operational amplifier
20: Modulator unit incorporating the device of the present invention
21: Wireless transmission unit
22: Transmitting antenna or infrared light emitting unit
23: Receiving antenna or infrared light receiving unit
24: Wireless signal receiving unit
30: Bandpass filter
31: Input adder
32: Mixing buffer
33: Housing
34: Input-side connection port
35: Output-side connection port

Claims

1. An audio processing device comprising:
a bandpass filter that, from one of the audio signals obtained by branching a collected audio signal, removes low-frequency components at or below a low-order formant and high-frequency components above a high-order formant;
a phase corrector that corrects the phase of the other branched audio signal so that it matches the phase of the audio signal that has passed through the bandpass filter; and
an adder/synthesizer that adds the audio signal that has passed through the bandpass filter to the audio signal that has passed through the phase corrector.

2. An audio processing device comprising:
a bandpass filter that removes, from left and right audio signals forming a collected stereo signal, low-frequency components at or below a low-order formant and high-frequency components above a high-order formant;
a pair of phase correctors that each correct the phase of the left or right audio signal so that it matches the phase of the audio signal that has passed through the bandpass filter; and
a pair of adder/synthesizers that each add the audio signal that has passed through the bandpass filter to the left or right audio signal that has passed through the corresponding phase corrector.

3. The audio processing device according to claim 1 or 2, wherein the bandpass filter comprises:
a high-pass filter that removes low-frequency components at or below the low-order formant from the one branched audio signal and also amplifies the signal; and
a low-pass filter that removes high-frequency components above the high-order formant from the audio signal that has passed through the high-pass filter.

4. The audio processing device according to any one of claims 1 to 3,
further comprising an effect adjuster that adjusts the level of the audio signal that has passed through the bandpass filter before it is combined.

5. An audio clarifying device in which the audio processing device according to any one of claims 1 to 4 is housed in a portable housing, the surface of the housing being provided with an input-side connection port to which a sound-collecting microphone is detachably connected and an output-side connection port that outputs the audio signal processed by the audio processing device to broadcasting equipment.

6. An audio processing method comprising:
a first step of branching a collected audio signal;
a second step of removing, from one of the audio signals branched in the first step, low-frequency components at or below a low-order formant and high-frequency components above a high-order formant;
a third step of correcting the phase of the other audio signal branched in the first step so that it matches the phase of the audio signal processed in the second step; and
a fourth step of combining and outputting the audio signal processed in the second step and the audio signal processed in the third step.