JP4840082B2

JP4840082B2 - Voice communication device

Info

Publication number: JP4840082B2
Application number: JP2006297474A
Authority: JP
Inventors: 卓也田丸; 勝一刑部
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-11-01
Filing date: 2006-11-01
Publication date: 2011-12-21
Anticipated expiration: 2026-11-01
Also published as: JP2008116534A

Description

The present invention relates to a voice communication device that transmits the sound signal collected by the device itself to a counterpart device, and that receives a sound signal for emission from the counterpart device and emits it as sound.

Various voice communication systems have been devised that connect a plurality of remote sites over a network for audio conferencing and similar purposes.

For example, Patent Document 1 discloses an audio conference system in which each site is provided with a main unit together with microphones, speakers, and RFID readers corresponding to the number of conference participants, and in which the main units are connected over a network to hold an audio conference.

In such an audio conference system, participants with a wide range of voice qualities and volumes take part, so depending on voice quality and volume a speaker may be hard for participants at the other site to hear. To address this, Patent Document 1 stores a voice correction amount for each user in advance, identifies the speaker during the conference by RFID or similar means, and applies a voice correction specific to each identified speaker. As a method of sound quality correction, Patent Document 2 discloses detecting the speaker's formants and transforming them into formants that are easier to hear.
Patent Document 1: JP 2005-80110 A
Patent Document 2: JP H01-93796 A

However, the method of Patent Document 1 requires that identification information and voice correction amounts be stored in advance for every user who might join a conference, which makes the user identification process cumbersome. In addition, whenever a new user joins a conference, identification information must be acquired for that user, which is inconvenient.

With the method of Patent Document 2, the formants are transformed completely, so the listener cannot reliably tell who is speaking.

Accordingly, an object of the present invention is to provide a voice communication device that corrects speech from a speaker whose voice quality or volume makes it hard to hear, only to a degree at which the speaker remains identifiable, so that the speech is emitted as sound that is easy for the listener to hear.

The voice communication device of the present invention is characterized by comprising: sound collecting means for collecting the speech of conference participants around the device; collected-sound feature detecting means for analyzing the voice features of the collected sound signal and detecting the matching voice feature group from among a plurality of preset voice feature groups; collected-sound correcting means for performing voice correction according to the detected voice feature group; and communication means for transmitting the corrected collected sound signal to a counterpart device.

In this configuration, the collected-sound feature detecting means detects the voice features of the sound signal collected by the sound collecting means, and the matching voice feature group is determined. The collected-sound correcting means performs voice correction according to the determined group, and the communication means transmits the corrected collected sound signal to the counterpart device. With this kind of correction, the correction amount is set per group, each group spanning a suitable range of sound quality and volume.

The voice communication device of the present invention is also characterized by comprising: communication means for receiving a sound signal for emission from a counterpart device; emission-sound feature detecting means for analyzing the voice features of the received sound signal for emission and detecting the matching voice feature group from among a plurality of preset voice feature groups; emission-sound correcting means for performing voice correction according to the detected voice feature group; and sound emitting means for emitting sound based on the corrected sound signal for emission.

In this configuration, the emission-sound feature detecting means detects the voice features of the sound signal for emission received by the communication means, and the matching voice feature group is determined. The emission-sound correcting means performs voice correction according to the determined group, and the sound emitting means emits sound based on the corrected sound signal for emission. Here too, the correction amount is set per group, each group spanning a suitable range of sound quality and volume.

In addition to the sound collection, collected-sound correction, and transmission functions described above, the voice communication device of the present invention is characterized in that the communication means receives a sound signal for emission from the counterpart device, and in that the device comprises: emission-sound feature detecting means for analyzing the voice features of the received sound signal for emission and detecting the matching voice feature group from among a plurality of preset voice feature groups; emission-sound correcting means for performing voice correction according to the detected voice feature group; and sound emitting means for emitting sound based on the corrected sound signal for emission.

This configuration yields a voice communication device that can perform both the sound quality correction of the collected sound signal and the sound quality correction of the sound signal for emission. In this case, the analysis of voice features and the identification of the voice feature group can be implemented in a single element.

The voice communication device of the present invention is further characterized in that the voice feature groups are defined by at least one of, or a combination of, classification by language, classification by gender, and classification by generation.

In this configuration, the voice feature groups are classified by criteria across which differences in voice features can be captured, such as "Japanese" and "English"; "female" and "male"; and "child", "adult (young adulthood)", and "adult (old age)". Giving each group its own voice correction characteristic ensures that, whichever group the collected speech or the received sound signal belongs to, it is corrected so as to be easy to hear.

According to the present invention, the sound signal collected by the device itself and the sound signal for emission received from the counterpart device are classified by their voice features, and a sound quality correction suited to each class is applied. As a result, any speech, including speech that would be hard to hear without correction, can be corrected into easily heard speech and emitted to the user. The user can therefore always hear speech clearly and hold a comfortable audio conference.

A voice communication device according to an embodiment of the present invention, and a voice communication system using that device, are described below with reference to the drawings.
FIG. 1 is a configuration diagram of the voice communication system of this embodiment.
FIG. 2 is a block diagram showing the main configuration of the voice communication device of this embodiment.

As shown in FIG. 1, the voice communication system of this embodiment consists of voice communication devices 111A and 111B, installed at sites a and b respectively, connected by a network 100.

The voice communication device 111A installed at site a collects the speech of conference participants A to G seated around it, generates a collected sound signal, converts it into voice data, and transmits the data via the network 100. The voice communication device 111A also obtains a sound signal for emission (the collected sound signal of the counterpart device, for example the voice communication device 111B) from voice data acquired via the network 100, and emits it so that participants A to G can hear it. Likewise, the voice communication device 111B installed at site b collects the speech of conference participants H to L seated around it, generates a collected sound signal, converts it into voice data, and transmits the data via the network 100. The voice communication device 111B also obtains a sound signal for emission (the collected sound signal of the counterpart device, for example the voice communication device 111A) from voice data received via the network 100, and emits it so that participants H to L can hear it.

The voice communication devices 111A and 111B installed at the two sites a and b have the same configuration, which is shown in FIG. 2. In the following, functions common to the voice communication devices 111A and 111B are described with reference to a voice communication device 111.

As shown in FIG. 2, the voice communication device 111 comprises a main control unit 10, a communication control unit 11, a sound emission control unit 12, D/A converters 13, sound emission amplifiers (AMP) 14, speakers SP1 to SP16, microphones MIC101 to MIC116 and MIC201 to MIC216, sound collection amplifiers (AMP) 15, A/D converters 16, a sound collection control unit 17, an echo cancellation unit 18, an audio signal correction unit 19, and an operation unit 20.

The housing of the voice communication device 111 of this embodiment is a substantially rectangular parallelepiped that is elongated in one direction. In the following description, of the four side faces of the housing, the longer faces are called the long faces and the shorter faces are called the short faces.

An operation unit 20 consisting of a plurality of buttons and a display screen is installed at one lengthwise end of the top face of the housing. The operation unit 20 is connected to the main control unit 10 inside the housing; it accepts operation input from conference participants, outputs it to the main control unit 10, and shows the operation details, the execution mode, and similar information on the display screen. Although not shown, various input/output interface terminals, such as a network connection terminal, are installed on the short face of the housing on the side where the operation unit 20 is located.

Speakers SP1 to SP16, all of the same shape, are installed on the bottom face of the housing of the voice communication device 111. The speakers SP1 to SP16 are arranged in a straight line at regular intervals along the lengthwise direction, forming a speaker array. Microphones MIC101 to MIC116, all of the same shape, are installed on one long face of the housing. The microphones MIC101 to MIC116 are arranged in a straight line at regular intervals along the lengthwise direction, forming a microphone array. Microphones MIC201 to MIC216, also of the same shape, are installed on the other long face of the housing, likewise arranged in a straight line at regular intervals along the lengthwise direction to form a second microphone array. A punched-mesh lower grille (not shown), shaped to cover the speaker array and the microphone arrays, is fitted to the bottom side of the housing. In this embodiment the speaker array has 16 speakers and each microphone array has 16 microphones, but the numbers are not limited to these and may be set as appropriate to the specification.

The main control unit 10 performs overall control of the audio conference device, handles controls entered through the operation unit 20 such as power on/off, and performs various other controls of the signal processing system. When the main control unit 10 receives an emission-sound correction instruction from a conference participant via the operation unit 20, it supplies an emission-sound correction control signal to the audio signal correction unit 19. When it receives a collected-sound correction instruction from a conference participant via the operation unit 20, it supplies a collected-sound correction control signal to the audio signal correction unit 19. When the main control unit 10 acquires speaker orientation data from the sound collection control unit 17, described later, it passes that data to the communication control unit 11.

On the receiving side, the communication control unit 11 connects to the network 100, demodulates the speaker orientation data and the sound signal for emission from the voice data received from another device via the network 100, and outputs the speaker orientation data to the main control unit 10 and the sound signal for emission to the audio signal correction unit 19.

On the receiving side, when the audio signal correction unit 19 receives the emission-sound correction control signal from the main control unit 10, it corrects the sound signal for emission from the communication control unit 11 and outputs the corrected signal to the sound emission control unit 12 via the echo cancellation unit 18. If it does not receive the emission-sound correction control signal from the main control unit 10, it passes the sound signal for emission from the communication control unit 11 unchanged to the sound emission control unit 12 via the echo cancellation unit 18. The detailed configuration and operation of the audio signal correction unit 19 are described later.

When the sound emission control unit 12 acquires the sound emission control data that the main control unit 10 has set based on the speaker orientation data from the counterpart device, it applies delay processing, amplitude processing, and the like to the input sound signal for emission, and generates individual emission signals for the speakers SP1 to SP16 so as to form a sound emission beam whose characteristic delivers sound equally to all participants seated around the audio conference device.

Each D/A converter 13 converts its input individual emission signal from digital to analog and supplies it to the corresponding sound emission amplifier 14; each sound emission amplifier 14 amplifies the analog individual emission signal and supplies it to the corresponding speaker SP1 to SP16. Each speaker SP1 to SP16 converts the input electrical individual emission signal into sound and emits it.

The microphones MIC101 to MIC116 and MIC201 to MIC216 collect ambient sound, including the speech of speakers seated around the device, convert it into electrical collected signals, and supply them to the sound collection amplifiers 15. The sound collection amplifiers 15 amplify the collected signals and supply them to the A/D converters 16, which convert the analog collected signals to digital and output them to the sound collection control unit 17.

The sound collection control unit 17 applies delay processing and the like to the collected signals of the microphones MIC101 to MIC116 and MIC201 to MIC216 to generate collection beam signals, each with strong directivity toward one of a set of predetermined directions that include the directions of the participants. The sound collection control unit 17 compares the amplitudes of the collection beam signals generated for the respective directions, selects the collection beam signal with the largest amplitude, and outputs it to the echo cancellation unit 18 as the collected sound signal. It also extracts the collection direction corresponding to the selected collection beam signal and supplies it to the main control unit 10 as the speaker orientation data.
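The beam forming and selection just described can be illustrated with a small sketch. It assumes integer-sample delay-and-sum beams and uses RMS as the measure of amplitude; the source specifies neither, and the function names, the two-microphone scene, and the direction labels are hypothetical.

```python
import math

def delay_and_sum(mic_signals, delays):
    """Delay each microphone signal by an integer number of samples and average."""
    length = len(mic_signals[0])
    beam = [0.0] * length
    for signal, delay in zip(mic_signals, delays):
        for n in range(delay, length):
            beam[n] += signal[n - delay]
    return [value / len(mic_signals) for value in beam]

def rms(samples):
    """Root-mean-square level, used here as the amplitude measure (an assumption)."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def select_beam(beams):
    """Return the direction whose beam has the largest amplitude, and that beam."""
    direction = max(beams, key=lambda d: rms(beams[d]))
    return direction, beams[direction]

# Toy scene: the sound reaches mic0 first and mic1 two samples later.
source = [0, 0, 1, -1, 2, -2, 1, 0, 0, 0, 0, 0]
mic0 = source
mic1 = [0, 0] + source[:-2]

# Each hypothetical direction has its own set of steering delays.
beams = {
    "left": delay_and_sum([mic0, mic1], [2, 0]),   # aligns the two signals
    "right": delay_and_sum([mic0, mic1], [0, 2]),  # misaligns them
}
direction, collected = select_beam(beams)
```

Here `direction` plays the role of the speaker orientation data and `collected` the role of the collected sound signal passed on for echo cancellation.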

The echo cancellation unit 18 comprises an adaptive filter and a post-processor. The adaptive filter generates a pseudo echo signal based on the sound signal for emission; this pseudo echo signal is subtracted from the collected sound signal of the sound collection control unit 17, and the result is output to the audio signal correction unit 19. This echo cancellation suppresses the sound that wraps around from the speakers SP to the microphones MIC.
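As an illustration of the adaptive-filter step, the following sketch uses a normalized LMS (NLMS) update. The source does not name the adaptation algorithm, so NLMS, the tap count, the step size, and the synthetic echo path are all assumptions, and the post-processor is omitted.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS sketch)."""
    weights = [0.0] * taps   # current estimate of the speaker-to-microphone path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate          # collected signal minus pseudo echo
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual

# Synthetic check: the microphone hears only a delayed, attenuated copy of the
# emitted signal, so the residual should shrink toward zero once adapted.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [
    0.6 * (far_end[n - 1] if n >= 1 else 0.0)
    + 0.2 * (far_end[n - 3] if n >= 3 else 0.0)
    for n in range(2000)
]
residual = nlms_echo_cancel(far_end, echo)
echo_energy = sum(v * v for v in echo[-200:])
residual_energy = sum(v * v for v in residual[-200:])
```

In a real room the microphone signal also contains near-end speech, which this toy example leaves out; handling that case (double talk) is outside the scope of the sketch.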

On the transmitting side, when the audio signal correction unit 19 receives the collected-sound correction control signal from the main control unit 10, it corrects the collected sound signal output from the echo cancellation unit 18 and outputs it to the communication control unit 11. If it does not receive the collected-sound correction control signal from the main control unit 10, it passes the collected sound signal output from the echo cancellation unit 18 unchanged to the communication control unit 11. The detailed configuration and operation of the audio signal correction unit 19 are described later.

On the transmitting side, the communication control unit 11 attaches the speaker orientation data from the main control unit 10 and device ID data to the collected sound signal from the audio signal correction unit 19, converts the result into voice data in a network communication format, and transmits it to the network 100.
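The source does not define the network format, so the following is only a hypothetical packet layout showing how a device ID and speaker orientation data could travel with the sound samples and be separated again on reception. The field sizes, the use of degrees for the orientation, and the 16-bit PCM payload are all assumptions.

```python
import struct

# Hypothetical header: device ID (unsigned 16-bit), speaker azimuth in degrees
# (signed 16-bit), sample count (unsigned 16-bit), all in network byte order.
HEADER = struct.Struct("!HhH")

def encode_packet(device_id, azimuth_deg, samples):
    """Attach device ID and speaker orientation to 16-bit PCM samples."""
    header = HEADER.pack(device_id, azimuth_deg, len(samples))
    return header + struct.pack(f"!{len(samples)}h", *samples)

def decode_packet(packet):
    """Recover device ID, speaker orientation, and samples on the receiving side."""
    device_id, azimuth_deg, count = HEADER.unpack_from(packet, 0)
    samples = list(struct.unpack_from(f"!{count}h", packet, HEADER.size))
    return device_id, azimuth_deg, samples

packet = encode_packet(111, -30, [0, 1200, -1200])
decoded = decode_packet(packet)
```

The decode step corresponds to the receiving-side function of the communication control unit 11, which separates the speaker orientation data from the sound signal for emission.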

Next, the configuration and operation of the audio signal correction unit 19 are described in more detail.
FIG. 3 is a block diagram showing the main configuration of the audio signal correction unit 19 shown in FIG. 2. FIG. 4 shows an example of the voice feature groups stored in the EQ table 194.

As shown in FIG. 3, the audio signal correction unit 19 comprises a correction control CPU 190, a formant detection unit 191, a sound collection EQ processing unit 192, a sound emission EQ processing unit 193, and an EQ table 194, which is a memory.

The EQ table 194 stores voice feature groups, classified by combinations of attributes whose formant characteristics differ. For example, as shown in FIG. 4, three classification criteria with differing formants are defined, namely "Japanese and English", "female and male", and "child, adult (young adulthood), adult (old age)", and the 12 voice feature groups G11 to G13, G21 to G23, G31 to G33, and G41 to G43 obtained from these three criteria are stored. For each voice feature group G, the table stores the group name, the formants characteristic of the group, and the EQ correction characteristic for the sound signal. For group G12, for instance, it stores the group name G12, typical formants representative of the combination "Japanese", "female", and "adult (young adulthood)", and EQ correction coefficients for converting a sound signal with those formants into a sound signal with a flat frequency characteristic that most people find easy to hear. The classification criteria are not limited to this example; finer criteria, such as "fast speech" versus "slow speech", may be added. Voice feature parameters other than formants, such as fundamental frequency, pitch, intonation, and cepstrum, may also be used.
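One way to picture the EQ table is as a mapping from group name to a record holding the attributes, representative formants, and EQ coefficients. The sketch below is an assumption-laden illustration: only the assignment of G12 to "Japanese", "female", "adult (young adulthood)" comes from the text, the numbering of the other groups is guessed from that pattern, and the formant and gain fields are left as zero placeholders because the source gives no values.

```python
from dataclasses import dataclass
from itertools import product

LANGUAGES = ["Japanese", "English"]
GENDERS = ["female", "male"]
GENERATIONS = ["child", "adult (young adulthood)", "adult (old age)"]

@dataclass(frozen=True)
class FeatureGroup:
    name: str
    language: str
    gender: str
    generation: str
    formants_hz: tuple   # representative formants (placeholder values)
    eq_gains_db: tuple   # EQ correction coefficients (placeholder values)

def build_eq_table():
    """Build the 2 x 2 x 3 = 12 groups G11..G43 (numbering scheme assumed)."""
    table = {}
    rows = product(LANGUAGES, GENDERS)
    for row, (language, gender) in enumerate(rows, start=1):
        for col, generation in enumerate(GENERATIONS, start=1):
            name = f"G{row}{col}"
            table[name] = FeatureGroup(
                name, language, gender, generation,
                formants_hz=(0.0, 0.0, 0.0),
                eq_gains_db=(0.0, 0.0, 0.0),
            )
    return table

eq_table = build_eq_table()
```

Adding a further criterion such as speaking rate would simply add another factor to the product and multiply the number of groups.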

<When correcting the sound signal for emission>
When the formant detection unit 191 receives the emission-sound correction control signal from the main control unit 10, it divides the sound signal for emission input from the communication control unit 11 into segments of a predetermined length at predetermined intervals, detects the formants, and supplies them to the correction control CPU 190.
When the correction control CPU 190 receives the emission-sound correction control signal from the main control unit 10, it performs emission-sound correction processing. Specifically, the correction control CPU 190 evaluates the similarity between the formants detected by the formant detection unit 191 and the formants of each group G stored in the EQ table 194, and selects the group with the highest similarity. The correction control CPU 190 then reads the EQ correction coefficients of the selected group G and supplies them to the sound emission EQ processing unit 193.
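The group selection can be sketched as a nearest-neighbour search over the stored formants. The source does not say how similarity is measured, so the log-frequency Euclidean distance below is an assumption, and the two table entries use made-up formant and gain values purely for illustration.

```python
import math

def formant_distance(a, b):
    """Euclidean distance between formant sets on a log-frequency scale (assumed)."""
    return math.sqrt(sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b)))

def select_group(detected_formants_hz, eq_table):
    """Return the name of the group whose stored formants are most similar."""
    return min(
        eq_table,
        key=lambda name: formant_distance(detected_formants_hz,
                                          eq_table[name]["formants_hz"]),
    )

# Hypothetical excerpt of an EQ table; all numbers are illustrative only.
EQ_TABLE = {
    "G12": {"formants_hz": (850.0, 1400.0, 2800.0), "eq_gains_db": (0.0, 2.0, 3.0)},
    "G22": {"formants_hz": (700.0, 1200.0, 2500.0), "eq_gains_db": (1.0, 0.0, 4.0)},
}

group = select_group((840.0, 1380.0, 2750.0), EQ_TABLE)
eq_gains_db = EQ_TABLE[group]["eq_gains_db"]  # handed to the EQ processing unit
```

Because the lookup is against a small fixed set of groups rather than a database of individual speakers, no per-user registration is needed.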

The sound emission EQ processing unit 193 applies equalizer processing to the sound signal for emission from the communication control unit 11 using the supplied EQ correction coefficients, converting it into, for example, a sound signal with a substantially flat frequency characteristic, and outputs it to the sound emission control unit 12.
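The equalizer step might look like the sketch below, which assumes that each EQ correction coefficient is a gain in dB at a band center frequency and realizes each band as a standard peaking biquad filter (the widely used audio EQ cookbook form). The source does not specify the filter structure, so this is one possible realization, not the patented one.

```python
import math

def peaking_coefficients(center_hz, gain_db, sample_rate, q=1.0):
    """Peaking-EQ biquad coefficients for one band."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a

def biquad(samples, b, a):
    """Direct-form I second-order filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def equalize(samples, bands, sample_rate):
    """Apply each (center_hz, gain_db) band in series."""
    for center_hz, gain_db in bands:
        b, a = peaking_coefficients(center_hz, gain_db, sample_rate)
        samples = biquad(samples, b, a)
    return samples

def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))

# A 1 kHz tone boosted by 6 dB at 1 kHz should roughly double in level.
rate = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / rate) for n in range(4000)]
boosted = equalize(tone, [(1000.0, 6.0)], rate)
ratio = rms(boosted[2000:]) / rms(tone[2000:])
```

Since only band gains are changed and the formant positions themselves are not moved, the speaker's voice character is largely preserved, which matches the stated aim of keeping the speaker identifiable.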

<When correcting the collected sound signal>
When the formant detection unit 191 receives the collected-sound correction control signal from the main control unit 10, it divides the collected sound signal output from the echo cancellation unit 18 into segments of a predetermined length at predetermined intervals, detects the formants, and supplies them to the correction control CPU 190.
When the correction control CPU 190 receives the collected-sound correction control signal from the main control unit 10, it performs collected-sound correction processing. Specifically, the correction control CPU 190 evaluates the similarity between the formants detected by the formant detection unit 191 and the formants of each group G stored in the EQ table 194, and selects the group with the highest similarity. The correction control CPU 190 then reads the EQ correction coefficients of the selected group G and supplies them to the sound collection EQ processing unit 192.

The sound collection EQ processing unit 192 applies equalizer processing to the collected sound signal from the echo cancellation unit 18 using the supplied EQ correction coefficients, converting it into, for example, a sound signal with a substantially flat frequency characteristic, and outputs it to the communication control unit 11.

Correcting the sound signal with this equalizer processing lets the listener hear the speech clearly even when the original voice is hard to hear. Furthermore, compared with directly modifying the formants as in the prior art, more of the character of the original voice is retained, so the listener can identify the speaker more easily.

In addition, the sound quality can be corrected with simpler processing than in the prior art, where the speaker is first identified and a correction characteristic is then read out and applied for each speaker. Sound quality correction can therefore be performed more reliably and more quickly.

The device described above includes both a functional unit that corrects the sound signal for emission and a functional unit that corrects the collected sound signal. However, where the collecting-side voice communication device and the emitting-side voice communication device are separate, the collecting-side device need only include the functional unit that corrects the collected sound signal, and the emitting-side device need only include the functional unit that corrects the sound signal for emission.

The device described above also corrects the sound signal for emission collectively, so that all conference participants hear the same sound quality. However, since not all participants have the same hearing characteristics, a sound quality correction characteristic may be set for each participant. In this case, in addition to the EQ correction characteristic described above, EQ correction based on an additional correction characteristic entered by the participant through the operation unit 20 or the like is performed, giving each participant an individual sound quality correction. With this configuration, each participant can hear the remarks of the participants at the counterpart device with a sound quality better suited to that participant.
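The per-participant correction can be sketched as follows. The sketch assumes that both the common EQ correction characteristic and the additional correction characteristic are per-band gains in decibels, and that they are combined by addition (equivalent to cascading two equalizers); the description does not state how the two are combined, and the participant names and values are hypothetical.

```python
def combine_corrections_db(base_gains_db, additional_gains_db):
    """Combine the common EQ gains with one participant's additional gains.

    Assumption: gains are in dB per band, so cascading two equalizers
    corresponds to adding the gains band by band.
    """
    if len(base_gains_db) != len(additional_gains_db):
        raise ValueError("band count mismatch")
    return [base + extra for base, extra in zip(base_gains_db, additional_gains_db)]


# Common correction read for the detected formants (hypothetical values).
base_gains_db = [3.0, 0.0, -2.0]

# Additional corrections entered by each participant (hypothetical values).
additional_by_participant = {
    "participant_1": [0.0, 0.0, 0.0],   # no extra adjustment
    "participant_2": [0.0, 2.0, 4.0],   # boosts the higher bands
}

gains_by_participant = {
    name: combine_corrections_db(base_gains_db, extra)
    for name, extra in additional_by_participant.items()
}
```

A participant who enters no additional correction simply receives the common characteristic unchanged.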

A configuration diagram of the voice communication system of the present invention.
A block diagram showing the main configuration of the voice communication device according to an embodiment of the present invention.
A block diagram showing the main configuration of the audio signal correction unit 19 shown in FIG. 2.
A diagram showing an example of the voice feature groups stored in the EQ table 194.

Explanation of symbols

100-network; 111, 111A, 111B-voice communication device; 10-main control unit; 11-communication control unit; 12-sound emission control unit; 13-D/A converter; 14-sound emission amplifier; 15-sound collection amplifier; 16-A/D converter; 17-sound collection control unit; 18-echo cancellation unit; 19-audio signal correction unit; 190-correction control CPU; 191-formant detection unit; 192-sound collection EQ processing unit; 193-sound emission EQ processing unit; 194-EQ table; 20-operation unit

Claims

1. A voice communication device comprising:
sound collection means for collecting the utterances of conference participants around the device;
formant detection means for detecting a formant of the collected sound signal;
storage means for storing a correction characteristic for each of a plurality of preset formants;
control means for referring to the storage means using the formant detected by the formant detection means and reading the correction characteristic of the corresponding formant;
collected sound correction means for correcting the sound signal collected by the sound collection means using the correction characteristic read by the control means; and
communication means for transmitting the collected sound signal after the correction processing to the counterpart device.

2. A voice communication device comprising:
communication means for receiving a sound signal for emission from the counterpart device;
formant detection means for detecting a formant of the received sound signal for emission;
storage means for storing a correction characteristic for each of a plurality of preset formants;
control means for referring to the storage means using the formant detected by the formant detection means and reading the correction characteristic of the corresponding formant;
sound emission correction means for correcting the received sound signal for emission using the correction characteristic read by the control means; and
sound emitting means for emitting sound based on the sound signal for emission after the correction processing.

3. The voice communication device according to claim 1, wherein
the communication means receives a sound signal for emission from the counterpart device,
the formant detection means detects a formant of the received sound signal for emission, and
the control means refers to the storage means using the formant detected by the formant detection means and reads the correction characteristic of the corresponding formant,
the device further comprising:
sound emission correction means for correcting the received sound signal for emission using the correction characteristic read by the control means; and
sound emitting means for emitting sound based on the sound signal for emission after the correction processing.

4. The voice communication device according to any one of claims 1 to 3, wherein the storage means stores the correction characteristics for at least one of, or a combination of, classification by language type, classification by gender, and classification by generation.