JP2019146055A - Telephone call device and control method therefor

Info

Publication number: JP2019146055A
Application number: JP2018029387A
Authority: JP
Inventors: Masaaki Takahashi (高橋 正明)
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2018-02-22
Filing date: 2018-02-22
Publication date: 2019-08-29
Anticipated expiration: 2038-02-22
Also published as: JP6948275B2

Abstract

PROBLEM TO BE SOLVED: To make it easy for a native speaker and a non-native speaker to talk on a call without the need for a manual switch or the like.
SOLUTION: A call device 21 determines whether a first speaker X is a native speaker or a language learner on the basis of at least the voice of the first speaker X. When the first speaker X is determined to be a language learner, the device applies slow playback processing, as predetermined voice processing for language learners, to the voice signal output to the first speaker X (the signal corresponding to the voice of a second speaker Y).
SELECTED DRAWING: Figure 1

Description

The present invention relates to a call device and a control method for the call device.

As a call device that enables a plurality of speakers to talk with one another, a conference support apparatus has been disclosed that delays the voice signal of utterances by a person speaking their native language (called a native speaker) relative to the voice signal of utterances by a person speaking a non-native language (called a non-native speaker or language learner) (see, for example, Patent Document 1). This conference support apparatus can determine whether a voice signal comes from a native speaker either by providing separate terminals for native speakers and non-native speakers in advance, or by having the speaker manually set each terminal as a native-speaker terminal or a non-native-speaker terminal using a switch or the like.

Patent Document 1: JP 2014-086832 A

However, the conventional configuration has constraints: separate terminals must be manufactured for native speakers and non-native speakers, or a manual switch is needed to set a terminal for native-speaker or non-native-speaker use. As a result, it is difficult to apply the functions of the conventional conference support apparatus to, for example, a hands-free device mounted in a vehicle or the like.
Therefore, an object of the present invention is to make it easy for a native speaker and a non-native speaker to talk with each other without using a manual switch or the like.

To achieve the above object, the present invention provides a call device that inputs and outputs voice signals corresponding to the voice of each speaker so that a plurality of speakers can talk with one another, the call device comprising: a determination unit that determines, based on at least the voice of a speaker, whether that speaker is a first utterer (equivalent to a native speaker) or a second utterer (equivalent to a language learner); and a voice processing unit that, based on the determination result of the determination unit, performs predetermined voice processing for language learners on the voice signal output to a speaker determined to be the second utterer.

In the above configuration, the determination unit may perform a first determination process in which it acquires, from the speaker's voice, predetermined frequency information from which the speaker's native language can be identified, identifies the native language based on the acquired frequency information, and uses the identified native language to determine whether the speaker is the first utterer (native-speaker equivalent) or the second utterer (language-learner equivalent).

Further, in the above configuration, the predetermined frequency information may be the 0th formant frequency, and the determination unit may identify the native language based on language-specific frequency information that associates a plurality of languages with the 0th formant frequency of persons whose native language is each of those languages.

Further, in the above configuration, a second determination process may be performed in which the language set in a user-fixed device that the call device uses for calls is used to determine whether the speaker who can be regarded as the user of that user-fixed device is the first utterer (native-speaker equivalent) or the second utterer (language-learner equivalent).

Further, in the above configuration, in the second determination process, when the language set in the user-fixed device matches the language set in an in-vehicle device connected to the call device, that language may be identified as the native language; when the two do not match, the language set in the user-fixed device may be identified as the native language; and the identified native language may then be used to determine whether the speaker is the first utterer (native-speaker equivalent) or the second utterer (language-learner equivalent).

Further, in the above configuration, the determination unit may perform a third determination process in which it acquires, from the speaker's voice, information on the silent portions of the voice and determines, based on the acquired information, whether the speaker is the first utterer (native-speaker equivalent) or the second utterer (language-learner equivalent).

Further, in the above configuration, when the determination results of the first to third determination processes differ, whether the speaker is the first utterer (native-speaker equivalent) or the second utterer (language-learner equivalent) may be determined according to a predetermined priority.

Further, in the above configuration, the call device may include a sound collection unit that collects the voice of a predetermined speaker among the plurality of speakers, and a sound emitting unit that emits the voice of another speaker toward the predetermined speaker. When the predetermined speaker is determined to be the second utterer (language-learner equivalent), the voice processing unit may perform the predetermined voice processing for language learners on the voice signal corresponding to the voice emitted by the sound emitting unit.

Further, in the above configuration, the call device may be a hands-free call device that the predetermined speaker uses for hands-free calls.
Further, in the above configuration, the call device may have a storage unit that stores the telephone number of the other speaker determined to be the second utterer (language-learner equivalent). When, at the start of a call, the telephone number of the other speaker is already stored in the storage unit, the voice processing unit may perform the predetermined voice processing for language learners on the voice signal output to the other speaker without the determination unit performing the determination.
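The stored-number shortcut described above can be sketched as follows. This is a minimal illustration only: the class name `LearnerDirectory`, its methods, and the in-memory `set` are hypothetical stand-ins for the storage unit, whose actual data layout the source does not describe.

```python
class LearnerDirectory:
    """Hypothetical stand-in for the storage unit holding learner phone numbers."""

    def __init__(self) -> None:
        self._numbers = set()

    def remember(self, phone_number: str) -> None:
        # Called once the other speaker has been determined to be a language learner.
        self._numbers.add(phone_number)

    def is_known_learner(self, phone_number: str) -> bool:
        return phone_number in self._numbers


def needs_determination(directory: LearnerDirectory, phone_number: str) -> bool:
    """At call start, run the determination only for numbers not already stored."""
    return not directory.is_known_learner(phone_number)
```

When `needs_determination` returns `False`, the voice processing for language learners would be applied from the start of the call.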

Further, the present invention provides a control method for a call device that inputs and outputs voice signals corresponding to the voice of each speaker so that a plurality of speakers can talk with one another, the method comprising: a determination step of determining, based on at least the voice of a speaker, whether that speaker is a first utterer (equivalent to a native speaker) or a second utterer (equivalent to a language learner); and a voice processing step of performing, based on the determination result of the determination step, predetermined voice processing for language learners on the voice signal output to a speaker determined to be the second utterer.

According to the present invention, a native speaker and a non-native speaker can talk with each other easily without using a manual switch or the like.

A diagram showing the configuration of a call system including a call device according to the first embodiment of the present invention.
A diagram showing an example of the relationship between a plurality of languages and the 0th formant frequency of persons whose native language is each language.
A flowchart showing the operation of the call device for the vehicle-side speaker (first speaker X).
A flowchart showing the operation of the call device for the speaker outside the vehicle (second speaker Y).
A flowchart explaining the operation of the call device according to the second embodiment.

Embodiments of the present invention will be described below with reference to the drawings.
(First embodiment)
FIG. 1 is a diagram showing the configuration of a call system 10 including a call device 21 according to the first embodiment of the present invention.
The call device 21 is a hands-free call device that is installed in a vehicle such as an automobile and is used by a user (the predetermined speaker) who is an occupant of the vehicle to make so-called hands-free calls. The call device 21 includes a communication module 22 for communicating wirelessly with devices that have a wireless communication function. Through the communication module 22, the call device 21 communicates wirelessly with a mobile phone 23 (also called a telephone terminal) owned by the user, and thereby communicates with another telephone terminal 25 via the telephone network. In this way, the voice signals of a call are input and output between the call device 21 and the other telephone terminal 25.

The communication module 22 is a communication module for short-range wireless communication in accordance with the Bluetooth (registered trademark) standard. It may instead use a short-range wireless communication method other than Bluetooth.
The mobile phone 23 has a function of communicating with other mobile phones and fixed-line phones via a mobile communication network (not shown). In addition to a built-in speaker, a built-in microphone, and a communication unit for telephone communication, it includes a communication module for short-range wireless communication with the call device 21. The mobile phone 23 and the call device 21 need not be connected wirelessly; they may also be connected by wire.

An in-vehicle device 27 is also installed in the vehicle and is communicably connected to the call device 21. The in-vehicle device 27 is a device having a navigation function, a radio reception function, an audio playback function, or the like, and the call device 21 receives various instructions addressed to it via, for example, an operation panel of the in-vehicle device 27. The call device 21 and the in-vehicle device 27 may be configured as a single unit.

Here, the mobile phone 23 and the in-vehicle device 27 each have a language set for their displays and the like (hereinafter, setting languages 23A and 27A). Usually, the setting language 23A of the mobile phone 23 is set by the owner of the mobile phone 23, and the setting language 27A of the in-vehicle device 27 is set by the owner of the vehicle. The owner of the mobile phone 23 and the owner of the vehicle may or may not be the same person.

As shown in FIG. 1, in the call device 21, the control unit 31 includes a CPU and functions as a computer that controls each unit of the call device 21. The storage unit 32 stores a control program executed by the control unit 31 and various data.
The microphone 33 functions as a sound collection unit that collects the voice of the speaker who is the user of the call device 21 (hereinafter, the first speaker X); this voice corresponds to the transmitted voice in a hands-free call. The speaker 35 functions as a sound emitting unit that emits, toward the first speaker X in the vehicle, the voice of the speaker who is the user of the other telephone terminal 25 (hereinafter, the second speaker Y). The microphone 33 and the speaker 35 may be dedicated to the call device, or may be shared with the microphone and speaker that the in-vehicle device 27 or the like uses for voice input and output.

The first detection unit 41 detects a formant frequency from the voice of the first speaker X input via the microphone 33, as a frequency from which the native language of speaker X can be identified. More specifically, it detects the lowest of the formant frequencies, the 0th formant frequency (sometimes also called the first formant frequency or the base frequency). For example, the first detection unit 41 extracts peaks by peak picking after linear predictive coding (LPC) analysis, and uses band-pass filters to detect the presence or absence of a peak in each of the bands shown in FIG. 2, described later. In this way, it detects the presence or absence of a formant frequency for each band and outputs the detection result to the first calculation unit 43 or the control unit 31. Any of a wide range of known methods can be used to detect the formant frequency.
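The peak-picking and per-band detection step can be illustrated with a short sketch. It assumes a spectral envelope (for example, one produced by LPC analysis) is already available as a list of magnitudes, and it checks band membership by direct comparison rather than with band-pass filters. The band edges in `BAND_EDGES_HZ` are placeholders: the source says only that 0 to 15 kHz is split into nine bands and does not list every edge. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Placeholder edges (Hz) for nine bands covering 0-15 kHz. Assumed values,
# not taken from FIG. 2, which is not reproduced in the text.
BAND_EDGES_HZ = [0, 250, 500, 1000, 1250, 1500, 2000, 4000, 8000, 15000]


def pick_peaks(freqs_hz: Sequence[float], envelope: Sequence[float]) -> List[float]:
    """Return the frequencies at strict local maxima of the spectral envelope."""
    peaks = []
    for i in range(1, len(envelope) - 1):
        if envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]:
            peaks.append(freqs_hz[i])
    return peaks


def bands_with_peaks(peaks_hz: Sequence[float],
                     edges: Sequence[float] = BAND_EDGES_HZ) -> List[bool]:
    """Return one flag per band: True if at least one peak lies in that band."""
    flags = [False] * (len(edges) - 1)
    for f in peaks_hz:
        for b in range(len(edges) - 1):
            if edges[b] <= f < edges[b + 1]:
                flags[b] = True
                break
    return flags


def lowest_peak(peaks_hz: Sequence[float]) -> Optional[float]:
    """Return the lowest peak frequency, or None if no peak was found."""
    return min(peaks_hz) if peaks_hz else None
```

The per-band flags correspond to the detection result passed on to the first calculation unit 43, and `lowest_peak` stands in for the lowest formant frequency.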

FIG. 2 is a diagram showing an example of the relationship between a plurality of languages and the 0th formant frequency of persons whose native language is each language (native speakers). In FIG. 2, the frequency range from 0 to 15 kHz, corresponding to the voice band, is divided into nine bands, and the bands containing the 0th formant frequency are shaded for each language.
FIG. 2 shows an example for adults. As the figure shows, the 0th formant frequency tends to fall in a relatively low range for native speakers of Japanese and in a relatively high range for native speakers of English or Italian.

Thus, language and the 0th formant frequency are correlated. For example, when a native speaker of Japanese speaks English, the 0th formant frequency of that English speech tends to fall in the low range that is the Japanese frequency band (0 to 1.25 kHz in FIG. 2). Likewise, when a native speaker of British English speaks Japanese, the 0th formant frequency of that Japanese speech tends to fall in the high range that is the British English frequency band (2 kHz to 15 kHz in FIG. 2).

This configuration exploits that correlation (the fact that the 0th formant frequency is influenced by the native language). Under the control of the control unit 31, the first calculation unit 43 performs a first determination process: it determines the native language of the first speaker X from the 0th formant frequency of the voice of the first speaker X and, based on that result, determines whether the first speaker X is a native speaker (corresponding to the first utterer) or a language learner (corresponding to the second utterer).
The storage unit 32 stores language-specific frequency information 32A, which associates a plurality of languages with the formant frequencies of persons whose native language is each language, based on the tendencies shown in FIG. 2. Using this language-specific frequency information 32A, the native language can be identified easily from the 0th formant frequency.
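As a rough sketch of how language-specific frequency information 32A might be consulted, the lookup below maps a detected 0th formant frequency to a native language. Only the two ranges quoted in the text for FIG. 2 (Japanese, British English) are included; the dictionary layout and function names are assumptions, and a real table would hold more languages with overlapping ranges.

```python
from typing import Dict, List, Optional, Tuple

# Ranges in Hz as quoted in the text for FIG. 2. The table structure itself
# is an assumption; the source does not describe how 32A is stored.
LANGUAGE_BANDS_HZ: Dict[str, Tuple[float, float]] = {
    "Japanese": (0.0, 1250.0),
    "British English": (2000.0, 15000.0),
}


def candidate_languages(f0_hz: float,
                        table: Dict[str, Tuple[float, float]] = LANGUAGE_BANDS_HZ
                        ) -> List[str]:
    """Return every language whose range contains the detected frequency."""
    return [lang for lang, (lo, hi) in table.items() if lo <= f0_hz <= hi]


def identify_native_language(f0_hz: float,
                             table: Dict[str, Tuple[float, float]] = LANGUAGE_BANDS_HZ
                             ) -> Optional[str]:
    """Return the native language only if exactly one language matches."""
    candidates = candidate_languages(f0_hz, table)
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` when zero or several languages match leaves room for the other determination processes to settle the result.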

Because it is assumed that the first speaker X always uses the mobile phone 23, the mobile phone 23 is a user-fixed device. For this reason, the setting language 23A of the mobile phone 23 is usually set to the user's native language.
The setting language 27A of the in-vehicle device 27, on the other hand, is not necessarily set to the native language. If the actual road signs are in a language other than the native language and the setting language 27A is set to the native language, the in-vehicle device 27 may be unable to correctly recognize the text on the actual road signs, which may cause problems in navigation processing or in the display of maps and the like. For this reason, the setting language 27A of the in-vehicle device 27 is often set to the language of the actual road signs.
Therefore, in this configuration, to further improve the accuracy of the native-language determination, the first calculation unit 43, under the control of the control unit 31, performs a second determination process: it identifies the native language of the first speaker X, who is the user of the mobile phone 23, based on at least the setting language 23A of the mobile phone 23 and, based on that result, determines whether the first speaker X is a native speaker or a language learner.

After calls between native speakers, calls between a native speaker and a language learner are considered to be relatively common. When a native speaker speaks, each sentence is long and there are few breaks, whereas a language learner speaks while thinking about vocabulary and grammar and therefore inevitably tends to produce more silent intervals.
Therefore, in this configuration, to further improve the accuracy of the native-language determination, the first calculation unit 43, under the control of the control unit 31, performs a third determination process: it acquires information on the silent portions of the voice of the first speaker X and, based on the acquired information, determines whether the first speaker X is a native speaker or a language learner.

The echo canceller 45 applies echo cancellation processing to the voice signal from the microphone 33, thereby cancelling the echo that arises when sound emitted from the speaker 35 is picked up by the microphone 33.
When the second speaker Y is a language learner, the control unit 31 performs the predetermined voice processing for language learners on the voice signal input from the microphone 33 (the transmitted-voice signal). Specifically, as the predetermined voice processing, the control unit 31 performs processing that plays the voice back slowly (that is, processing that reduces the playback speed). To this end, a sampling rate converter (hereinafter, SRC 47) is placed between the microphone 33 and the first calculation unit 43, and the control unit 31 multiplies the sampling frequency applied to the voice signal input from the microphone 33 by n and thins out the sampled data as appropriate, thereby converting the signal into a voice signal that is played back slowly.
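To make the slow-playback idea concrete, the sketch below stretches a block of samples so that it lasts longer when played at the unchanged output rate. It uses plain linear interpolation, which is a simplified substitute for the oversample-and-thin scheme described for SRC 47, not a reproduction of it. The function name `stretch` and its `factor` parameter are hypothetical. Note that this naive resampling also lowers the pitch; the source does not say whether or how that is handled.

```python
from typing import List, Sequence


def stretch(samples: Sequence[float], factor: float) -> List[float]:
    """Resample so the clip lasts `factor` times longer at the same output rate.

    factor = 1.0 leaves the signal unchanged; factor = 2.0 gives half-speed playback.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    if n_in < 2:
        return [float(x) for x in samples]
    n_out = int(round(n_in * factor))
    out = []
    for k in range(n_out):
        pos = k / factor          # fractional position in the input signal
        i = int(pos)
        if i >= n_in - 1:
            out.append(float(samples[-1]))  # hold the last sample at the tail
        else:
            frac = pos - i
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out
```

For example, `stretch([0.0, 1.0, 2.0, 3.0], 2.0)` yields eight samples, with interpolated values placed between the originals.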

This voice signal is transmitted to the other telephone terminal 25 via the communication module 22, and the slowed-down voice is emitted from the other telephone terminal 25. When the sampling frequency and related settings are left at their default values, the voice is emitted from the other telephone terminal 25 at its actual speed.
In this way, the SRC 47 functions as a first slow playback unit that performs the voice processing for slow playback. In this case, it is preferable to reduce the playback speed in steps so that the listener (the second speaker Y) does not find the change unnatural. The configuration used for slow playback is not limited to the SRC 47. Likewise, the predetermined voice processing for language learners need not be limited to slow playback; any voice processing that makes speech easier for a language learner to follow (voice processing for language learners) can be applied.
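Reducing the playback speed in steps could be scheduled as below. This is only one possible reading: the source does not give the number of steps or the target speed, so `target` and `steps` are hypothetical parameters, and the even spacing is an assumption.

```python
from typing import List


def ramp_factors(target: float, steps: int) -> List[float]:
    """Return `steps` stretch factors rising evenly from above 1.0 up to `target`.

    A stretch factor of 1.0 means actual speed; larger values mean slower playback.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if target < 1.0:
        raise ValueError("target must be 1.0 or greater for slow playback")
    return [1.0 + (target - 1.0) * (s + 1) / steps for s in range(steps)]
```

Each successive block of audio would then be stretched by the next factor in the list until the target is reached.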

In the call device 21, the second detection unit 51 detects a formant frequency (in this configuration, the 0th formant frequency) from the voice of the second speaker Y input via the communication module 22 (the received voice in a hands-free call), as a frequency from which the native language of speaker Y can be identified. The second detection unit 51 may be of the same kind as the first detection unit 41.
Under the control of the control unit 31, the second calculation unit 53 performs the first determination process: it determines the native language of the second speaker Y from the 0th formant frequency of the voice of the second speaker Y and, based on that result, determines whether the second speaker Y is a native speaker (corresponding to the first utterer) or a language learner (corresponding to the second utterer).
To further improve the accuracy of the native-language determination, the second calculation unit 53, under the control of the control unit 31, also performs the third determination process: it acquires information on the silent portions of the voice of the second speaker Y and, based on the acquired information, determines whether the second speaker Y is a native speaker or a language learner.

As described above, the first calculation unit 43 performs the second determination process, in which it determines the native language of the first speaker X based on the setting language 23A of the mobile phone 23 used by the first speaker X and, based on that result, determines whether the first speaker X is a native speaker or a language learner. The call device 21, however, has no way of knowing the setting language of the other telephone terminal 25 used by the second speaker Y, so the second calculation unit 53 does not perform the second determination process.
However, if communication between the call device 21 and the other telephone terminal 25 is arranged so that the setting language of the other telephone terminal 25 used by the second speaker Y becomes known, the call device 21 (the second calculation unit 53) may perform the second determination process to determine, based on that setting language, whether the second speaker Y is a native speaker or a language learner.
The second and third determination processes described above may also be performed by the control unit 31.

When the first speaker X is a language learner, the control unit 31 performs the predetermined voice processing for language learners on the voice signal of the second speaker Y input via the communication module 22. This predetermined voice processing is processing that plays the voice back slowly.
To this end, a sampling rate converter (hereinafter, SRC 57) is connected to the communication module 22, and the control unit 31 uses the SRC 57, in the same manner as described above, to selectively convert the voice signal of the second speaker Y into a voice signal that is played back slowly. Here too, it is preferable to reduce the playback speed in steps so that the listener (the first speaker X) does not find the change unnatural. In FIG. 1, reference numeral 58 denotes a low-pass filter (LPF) placed on the output side of the SRC 57.
The configuration used for slow playback is not limited to the SRC 57. Likewise, the predetermined voice processing for language learners is not limited to slow playback, and a wide range of voice processing for language learners can be applied.
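The two slow-playback paths can be summarised in a small decision sketch: the transmitted voice (SRC 47 path) is slowed when the far-end speaker Y is the learner, and the received voice (SRC 57 path) is slowed when the in-vehicle speaker X is the learner. The class and field names are hypothetical labels for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlowPlaybackPlan:
    slow_transmitted_voice: bool  # X's microphone signal sent to Y (SRC 47 path)
    slow_received_voice: bool     # Y's signal played to X through speaker 35 (SRC 57 path)


def plan_slow_playback(x_is_learner: bool, y_is_learner: bool) -> SlowPlaybackPlan:
    """Slow down whichever signal is heard by a speaker judged to be a learner."""
    return SlowPlaybackPlan(
        slow_transmitted_voice=y_is_learner,
        slow_received_voice=x_is_learner,
    )
```

The processing always targets the signal a learner listens to, never the learner's own speech.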

FIG. 3 is a flowchart showing the operation of the call device 21 for the vehicle-side speaker (first speaker X).
When the first detection unit 41 detects that speech has been input to the microphone 33 (step S1A), the call device 21 uses the first calculation unit 43 to execute the first determination process (step S2A), the second determination process (step S3A), and the third determination process (step S4A) described above.
In the first determination process, the first calculation unit 43 identifies the 0th formant frequency of the voice of the first speaker X and then identifies the native language of the first speaker X by referring to the language-specific frequency information 32A stored in the storage unit 32. It then determines from the identified native language whether the first speaker X is a native speaker or a language learner.

Various methods can be used to determine from the native language whether a speaker is a native speaker or a language learner. For example, the first speaker X may be determined to be a native speaker when the native language is the same as the official language of the country in which the call device 21 is used, and a language learner when it is a different language. Alternatively, speech recognition technology may be applied to identify the language that the first speaker X is actually speaking; the first speaker X is then determined to be a native speaker when the native language is the same as the identified language, and a language learner when it is different.
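Both methods above reduce to comparing the identified native language with a reference language: either the official language of the country of use, or the language recognized in the speech. A minimal sketch, with hypothetical label strings and function name:

```python
from typing import Optional

NATIVE = "native"    # native speaker
LEARNER = "learner"  # language learner


def classify_speaker(native_language: Optional[str],
                     reference_language: str) -> Optional[str]:
    """Compare the identified native language with a reference language.

    reference_language is either the official language of the country of use
    or the language recognized from the speech. Returns None when the native
    language could not be identified.
    """
    if native_language is None:
        return None
    return NATIVE if native_language == reference_language else LEARNER
```

How the reference language is obtained (region setting, speech recognizer) is outside this sketch.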

In the second determination process, the first calculation unit 43 compares the setting languages 23A and 27A of the mobile phone 23 and the in-vehicle device 27 to identify the native language, and determines from the identified native language whether the first speaker X is a native speaker or a language learner. When the setting languages 23A and 27A match, that language is identified as the native language; when they do not match, the setting language 23A of the mobile phone 23 is identified as the native language. The setting language 23A of the mobile phone 23 may also be identified as the native language when the setting language 27A of the in-vehicle device 27 cannot be determined. The method of determining from the identified native language whether the speaker is a native speaker or a language learner may be the same as in the first determination process or may be a different method.
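The branch structure of the second determination process can be written out directly. The function name and the use of plain strings for the setting languages are assumptions; `None` stands for an in-vehicle setting that could not be determined.

```python
from typing import Optional


def native_language_from_settings(phone_language: str,
                                  vehicle_language: Optional[str] = None) -> str:
    """Pick the native language from setting languages 23A (phone) and 27A (vehicle)."""
    if vehicle_language is None:
        # The in-vehicle setting could not be determined: fall back to the phone.
        return phone_language
    if phone_language == vehicle_language:
        # Both devices agree, so the shared language is taken as the native language.
        return vehicle_language
    # The devices disagree: the user-fixed device (the phone) takes precedence.
    return phone_language
```

Read this way, every branch resolves to the phone's setting, which matches the description of the mobile phone 23 as the user-fixed device.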

In the third determination process, the first calculation unit 43 counts the number of silent portions in the voice of the first speaker X input via the microphone 33, and determines from the count whether the first speaker X is a native speaker or a language learner. Several methods are possible here as well. For example, the speaker may be determined to be a language learner if the number of silent portions within a predetermined time is greater than a predetermined threshold, and a native speaker if it is smaller.
Alternatively, given that calls between a native speaker and a language learner are relatively common, the number of silent portions in the voice of the first speaker X may be compared with the number of silent portions in the voice of the second speaker Y input via the communication module 22, and the speaker with the larger count determined to be the language learner.
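The two variants of the third determination process can be sketched as follows. This is only an illustration under stated assumptions: the amplitude level that counts as silence, the minimum run length, the handling of a count equal to the threshold, and the handling of ties are not given in the description and are chosen here for the example.

```python
from typing import Optional, Sequence


def count_silent_portions(samples: Sequence[float], level: float, min_len: int) -> int:
    """Count runs of at least min_len consecutive samples with |amplitude| < level."""
    count = 0
    run = 0
    for s in samples:
        if abs(s) < level:
            run += 1
            if run == min_len:  # count each qualifying run exactly once
                count += 1
        else:
            run = 0
    return count


def learner_by_threshold(silent_count: int, threshold: int) -> bool:
    """Variant 1: learner if the count within the window exceeds the threshold.
    A count equal to the threshold is treated as native here (assumption)."""
    return silent_count > threshold


def learner_by_comparison(count_x: int, count_y: int) -> Optional[str]:
    """Variant 2: the speaker with more silent portions is the learner.
    Returns "X", "Y", or None on a tie (tie handling is an assumption)."""
    if count_x == count_y:
        return None
    return "X" if count_x > count_y else "Y"


voice = [0.5, 0.0, 0.0, 0.0, 0.6, 0.0, 0.7, 0.0, 0.0]
pauses = count_silent_portions(voice, level=0.1, min_len=2)
print(pauses, learner_by_threshold(pauses, threshold=1))
```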

The first calculation unit 43 performs the first to third determination processes in parallel to obtain their respective results, and then determines from these results whether the first speaker X is a native speaker or a language learner (step S5A). The processing of steps S2A to S5A corresponds to the determination step.
If the results of the first to third determination processes differ, the result with the higher predetermined priority takes precedence. In this configuration, the first determination result has the highest priority. If the first determination result cannot be obtained (including, for example, the case of a 0th formant frequency that falls within the range of more than one native language, such as 1 kHz to 1.5 kHz in FIG. 2), using the results of the second and third determination processes makes it easier to narrow the candidates down to a single native language. The priority order may be changed.
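The priority rule for step S5A can be sketched as a simple fallback chain. This is an illustrative reading, not the stated implementation: the result keys, the use of `None` for "no result obtained", and the default order of the second and third processes are assumptions made for the example.

```python
from typing import Dict, Optional, Sequence


def combine_results(results: Dict[str, Optional[bool]],
                    priority: Sequence[str] = ("first", "second", "third")) -> Optional[bool]:
    """Return the learner/native verdict of the highest-priority process that
    produced a result. True means language learner, False means native speaker,
    None means that process could not decide."""
    for name in priority:
        verdict = results.get(name)
        if verdict is not None:
            return verdict
    return None  # no process produced a result


# The first process disagrees with the others and wins by priority.
print(combine_results({"first": False, "second": True, "third": True}))
# The first process could not decide, so the second one is used.
print(combine_results({"first": None, "second": True, "third": False}))
```

Passing a different `priority` sequence corresponds to the remark that the priority order may be changed.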

If it is determined in step S5A that the first speaker X is a native speaker (that is, not a language learner) (step S5A; NO), the call device 21 ends this processing (the operation for the speaker on the vehicle side, i.e., the first speaker X) under the control of the control unit 31.
If, on the other hand, it is determined in step S5A that the first speaker X is a language learner (step S5A; YES), the call device 21 uses the SRC 57 to play back the voice signal of the second speaker Y, which is the received voice, at reduced speed (step S6A, corresponding to the voice processing step). As a result, the first speaker X, determined to be a language learner, hears the voice of the second speaker Y slowly and can follow it more easily.

The processing of steps S1A to S5A completes in a short time, on the order of a few seconds, so slow playback starts quickly. If the first speaker X is determined to be a native speaker (that is, not a language learner), step S6A is not executed, so the first speaker X hears the voice of the second speaker Y at its actual speed.
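As a rough illustration of slow playback, the sketch below stretches a block of samples by naive linear-interpolation resampling, so that the same audio takes longer to play at an unchanged output rate. This is an assumption-laden stand-in, not the behavior of the SRC 57: the function name and the `factor` parameter are hypothetical, and this simple approach also lowers the pitch, which a practical implementation would likely need to address.

```python
from typing import List, Sequence


def slow_down(samples: Sequence[float], factor: float) -> List[float]:
    """Stretch samples in time by `factor` (>= 1.0) using linear interpolation.

    Played back at the original sample rate, the output lasts about `factor`
    times longer. Pitch is lowered as a side effect of this naive method.
    """
    if factor < 1.0:
        raise ValueError("factor must be >= 1.0 for slow playback")
    if len(samples) < 2:
        return list(samples)
    out_len = int(round((len(samples) - 1) * factor)) + 1
    out: List[float] = []
    for i in range(out_len):
        pos = i / factor                        # position in the input signal
        lo = min(int(pos), len(samples) - 2)    # left neighbour index
        frac = pos - lo                         # distance from the left neighbour
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


print(slow_down([0.0, 1.0, 0.0], 2.0))
```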

FIG. 4 is a flowchart showing the operation of the call device 21 for the speaker outside the vehicle (second speaker Y).
When the second detection unit 51 detects that received voice (the voice of the second speaker Y) has been input from the mobile phone 23 via the communication module 22 (step S1B), the call device 21 causes the second calculation unit 53 to execute the first determination process described above (step S2B) and the third determination process (step S4B).
In this first determination process, the second calculation unit 53 identifies the 0th formant frequency of the voice of the second speaker Y and then identifies the native language of the second speaker Y by referring to the language-specific frequency information 32A stored in the storage unit 32. It then determines from the identified native language whether the second speaker Y is a native speaker or a language learner. This first determination process is the same as the first determination process executed by the first calculation unit 43, except that it uses the voice of the second speaker Y.

In the third determination process, the second calculation unit 53 counts the number of silent portions in the voice of the second speaker Y and determines from the count whether the second speaker Y is a native speaker or a language learner. This third determination process is the same as the third determination process executed by the first calculation unit 43, except that it uses the voice of the second speaker Y.

The second calculation unit 53 performs the first and third determination processes in parallel to obtain their respective results, and then determines from these results whether the second speaker Y is a native speaker or a language learner (step S5B). The processing of steps S2B to S5B corresponds to the determination step.
If the results of the first and third determination processes differ, the result with the higher predetermined priority takes precedence. In this configuration, the first determination result has the highest priority. The priority order may be changed.

If it is determined in step S5B that the second speaker Y is a native speaker (that is, not a language learner) (step S5B; NO), the call device 21 ends this processing (the operation for the speaker outside the vehicle, i.e., the second speaker Y) under the control of the control unit 31.
If, on the other hand, it is determined in step S5B that the second speaker Y is a language learner (step S5B; YES), the call device 21 uses the SRC 47 to play back the voice signal of the first speaker X, which is the transmitted voice, at reduced speed (step S6B, corresponding to the voice processing step). As a result, the second speaker Y, determined to be a language learner, hears the voice of the first speaker X slowly and can follow it more easily. The processing of steps S1B to S5B completes in a short time, on the order of a few seconds, so slow playback starts quickly.

If the second speaker Y is determined to be a native speaker (that is, not a language learner), step S6B is not executed, so the second speaker Y hears the voice of the first speaker X at its actual speed.
In the flowchart shown in FIG. 4, step S4B (the third determination process) may be omitted.

As described above, in this embodiment the first detection unit 41, the first calculation unit 43, and the control unit 31 constitute a determination unit that determines, based on at least the voice of the first speaker X, whether the first speaker X is a native speaker or a language learner.
The control unit 31 and the SRC 57 constitute a voice processing unit that, based on the determination result of the determination unit, performs slow playback processing (corresponding to the predetermined voice processing for language learners) on the voice signal (the voice of the second speaker Y) output toward the first speaker X determined to be a language learner.
As a result, whether the first speaker X is a native speaker or a language learner is identified automatically without the use of a manual switch or the like, and the first speaker X can talk with the second speaker Y more easily even when the first speaker X is a language learner.

Likewise, the second detection unit 51, the second calculation unit 53, and the control unit 31 constitute a determination unit that determines, based on at least the voice of the second speaker Y, whether the second speaker Y is a native speaker or a language learner.
The control unit 31 and the SRC 47 constitute a voice processing unit that, based on the determination result of the determination unit, performs slow playback processing (corresponding to the predetermined voice processing for language learners) on the voice signal (the voice of the first speaker X) output toward the second speaker Y determined to be a language learner.
As a result, whether the second speaker Y is a native speaker or a language learner is identified automatically without the use of a manual switch or the like, and the second speaker Y can talk with the first speaker X more easily even when the second speaker Y is a language learner.

In this embodiment, the term "native speaker" need not be limited to native speakers in the strict sense and may include persons whose speech resembles that of a native speaker. Similarly, the term "language learner" need not be limited to language learners in the strict sense and may include persons whose speech resembles that of a language learner.
For example, an elderly person may produce many silent sections even when speaking his or her native language, and may therefore be determined to be a language learner in the third determination process. Individual differences may also cause a native speaker to be determined to be a language learner in the first determination process and the like. In either case, the range of speakers determined to be native speakers or language learners can be adjusted by adjusting the determination criteria as appropriate.
In other words, the first to third determination processes need only determine whether the speaker is a first speaker (native-speaker equivalent) or a second speaker (language-learner equivalent).

As one method of determining whether a speaker is a first speaker (native-speaker equivalent) or a second speaker (language-learner equivalent), the first determination process is performed. That is, the 0th formant frequency, which is predetermined frequency information capable of identifying the native language of each speaker X, Y, is acquired from the voices of the first speaker X and the second speaker Y; the native language is identified based on the acquired 0th formant frequency; and the identified native language is used to determine whether the speaker is a first speaker or a second speaker. The correlation between language and 0th formant frequency thus allows the native language of each speaker X, Y to be identified with high accuracy.

Although the first determination process has been described as using the 0th formant frequency, it is not limited to the 0th formant frequency. For example, if the native language of the speakers X and Y can be identified from a formant frequency other than the 0th formant frequency, that formant frequency may be used. Furthermore, if there is frequency information other than formant frequencies that can identify the native language of the speakers X and Y, that frequency information may be used.

In this embodiment, the language-specific frequency information 32A, which associates a plurality of languages with the formant frequencies of persons whose native language is each of those languages, is stored, and the native language is identified based on this language-specific frequency information 32A. This makes the native language easy to identify.
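The lookup against the language-specific frequency information 32A can be pictured as a table of frequency bands per language. The sketch below is hypothetical: the language labels and band limits are placeholders invented for illustration and are not the values of FIG. 2, and returning `None` for an ambiguous or unmatched frequency is an assumed way of signalling that the first determination result could not be obtained.

```python
from typing import Dict, Optional, Tuple

# Placeholder bands in Hz; NOT the actual contents of the frequency information 32A.
EXAMPLE_BANDS: Dict[str, Tuple[float, float]] = {
    "language_a": (125.0, 1500.0),
    "language_b": (1000.0, 3000.0),
}


def lookup_native_language(freq_hz: float,
                           bands: Dict[str, Tuple[float, float]] = EXAMPLE_BANDS
                           ) -> Optional[str]:
    """Return the single language whose band contains freq_hz.

    Returns None when no band matches or when several bands overlap at
    freq_hz, in which case other determination processes would be needed.
    """
    matches = [lang for lang, (low, high) in bands.items() if low <= freq_hz <= high]
    return matches[0] if len(matches) == 1 else None


print(lookup_native_language(500.0))    # inside only the first placeholder band
print(lookup_native_language(1200.0))   # inside both bands, so ambiguous
```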

In this embodiment, the second determination process is also performed. That is, the set language 23A configured on the mobile phone 23, a user-fixed device that the call device 21 uses for calls, is used to determine whether the first speaker X, who can be regarded as the user of that mobile phone 23, is a first speaker (native-speaker equivalent) or a second speaker (language-learner equivalent). This makes it easier to improve the accuracy of that determination for the first speaker X.

Furthermore, in the second determination process, if the set language 23A of the mobile phone 23 matches the set language 27A of the in-vehicle device 27 connected to the call device 21, that language is identified as the native language; if they do not match, the set language 23A of the mobile phone 23 is identified as the native language. The identified native language is then used to determine whether the first speaker X is a first speaker (native-speaker equivalent) or a second speaker (language-learner equivalent). This makes it easier to improve the accuracy of the determination. Because the second determination process is performed only for the first speaker X, the determination accuracy is more readily improved for the first speaker X than for the second speaker Y.

In this embodiment, the third determination process is also performed. That is, information on the silent portions of the voice of each speaker X, Y is acquired from the voices of the first speaker X and the second speaker Y, and whether each speaker X, Y is a first speaker (native-speaker equivalent) or a second speaker (language-learner equivalent) is determined based on the acquired information. This makes it easier to further improve the accuracy of the determination.
In addition, when the results of the first to third determination processes differ, whether the speaker is a first speaker or a second speaker is determined according to a predetermined priority, which also makes it easier to improve the determination accuracy.

The call device 21 also includes the microphone 33, which functions as a sound collection unit that collects the voice of the first speaker X among the plurality of speakers X and Y, and the speaker 35, which functions as a sound emission unit that emits the voice of the second speaker Y toward the first speaker X (corresponding to the predetermined speaker). When the first speaker X is determined to be a second speaker (language-learner equivalent), slow playback processing (corresponding to the predetermined voice processing for language learners) is applied to the voice signal corresponding to the voice emitted from the speaker 35. This makes the call easier to follow for the first speaker X, who uses the microphone 33 and the speaker 35.
If the microphone 33 and the speaker 35 are separate units, the call device 21 need only include, as the sound collection unit, a voice input unit that receives voice from the microphone 33 and, as the sound emission unit, a voice output unit that outputs voice to the speaker 35.

Because the call device 21 is a hands-free call device that the first speaker X uses for hands-free calls, the call becomes easier to follow for the first speaker X, who is the direct user of the call device 21.
The description above covers the case in which both the first speaker X and the second speaker Y are checked to determine whether each is a first speaker (native-speaker equivalent) or a second speaker (language-learner equivalent), and the call is made easier to follow for each speaker X, Y found to be a second speaker. The invention is not limited to this. The determination may be made for only one of them (for example, the first speaker X), and the call made easier to follow for that speaker when he or she is found to be a second speaker.

(Second embodiment)
FIG. 5 is a flowchart explaining the operation of the call device 21 according to the second embodiment, and shows the operation of the call device 21 for the speaker outside the vehicle (second speaker Y).
The second embodiment differs from the first embodiment in that the storage unit 32 of the call device 21 stores the telephone number of a call partner (second speaker Y) determined to be a language learner, and in that, when the call partner's telephone number is stored in the storage unit 32, processing proceeds to step S6B without the determination processes being performed.
Descriptions that overlap with the first embodiment are omitted below.

As shown in FIG. 5, when received voice (the voice of the second speaker Y) is input from the mobile phone 23 (step S1B), the call device 21 determines whether the telephone number of the second speaker Y, the call partner, is stored in the storage unit 32 as the telephone number of a language learner (step S11B).
The telephone number may be obtained as follows: for an outgoing call from the call device 21 side (the mobile phone 23), the number used to place the call is obtained; for an incoming call from another telephone terminal 25, the number notified from that telephone terminal 25 via the telephone network is obtained.

Here, if this is the first call between the call device 21 and the second speaker Y, or if the second speaker Y was determined to be a native speaker in a past call with the same second speaker Y, the telephone number of the second speaker Y is not stored in the storage unit 32. The determination in step S11B is therefore negative (step S11B; NO), and processing proceeds to step S2B.

If the second speaker Y is then determined to be a language learner in step S5B (step S5B; YES), the call device 21 plays back the voice signal of the first speaker X at reduced speed (step S6B) and stores the telephone number of the second speaker Y in the storage unit 32 as the telephone number of a language learner (step S12B).
Consequently, in subsequent calls with the same second speaker Y, the determination in step S11B is affirmative (step S11B; YES) and, as shown in FIG. 5, processing proceeds to step S6B. The first and third determination processes (corresponding to the determination step) can thus be skipped, and the slow playback of step S6B can start sooner.

In step S12B, if the same telephone number is already stored in the storage unit 32, it is not stored again. This prevents the same telephone number from being registered twice in the storage unit 32.
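The flow of steps S11B, S5B, S6B, and S12B can be sketched as follows. This is a minimal illustration under assumptions: the class and function names are hypothetical, the determination processes are represented by a caller-supplied callback, and a Python set stands in for the storage unit 32 (a set also avoids double registration automatically).

```python
from typing import Callable, Set


class LearnerNumberStore:
    """Stand-in for the part of the storage unit 32 that holds learner numbers."""

    def __init__(self) -> None:
        self._numbers: Set[str] = set()

    def is_known_learner(self, number: str) -> bool:
        return number in self._numbers

    def remember(self, number: str) -> None:
        self._numbers.add(number)  # adding an existing number has no effect


def should_slow_playback(number: str,
                         store: LearnerNumberStore,
                         determine_learner: Callable[[], bool]) -> bool:
    """Decide whether to slow the voice sent to the call partner."""
    if store.is_known_learner(number):   # step S11B; YES: skip the determination
        return True
    if determine_learner():              # steps S2B to S5B
        store.remember(number)           # step S12B
        return True                      # step S6B
    return False


store = LearnerNumberStore()
print(should_slow_playback("0312345678", store, lambda: True))
# The second call with the same number does not run the determination at all.
print(should_slow_playback("0312345678", store, lambda: False))
```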

As described, in this embodiment the storage unit 32 stores the telephone number of a second speaker Y determined to be a language learner (second speaker), and when the telephone number of the second speaker Y is already stored in the storage unit 32 at the start of a call, the voice signal of the first speaker X is played back at reduced speed without the first and third determination processes being performed. Slow playback can therefore start promptly.

The embodiments described above merely illustrate one mode of carrying out the present invention, and may be modified and applied as desired without departing from the spirit of the present invention.
For example, although the present invention has been described as applied to the call device 21 shown in FIG. 1 and its control method, it is not limited to this. The present invention may be applied to call devices that are not vehicle-mounted and to their control methods. Furthermore, the present invention is not limited to the call device 21 for one-to-one calls and may be applied to call devices that allow three or more people to talk, such as the conference support device described in Patent Document 1.

Although the embodiments above describe the control program as being stored in the storage unit 32 in advance, the control program may instead be stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, or a semiconductor recording medium, and read from the recording medium and executed by a computer. The control program may also be made downloadable from a distribution server or the like on a communication network via a telecommunication line.

DESCRIPTION OF SYMBOLS
10 Call system
21 Call device
22 Communication module
23 Mobile phone
23A, 27A Set language
25 Other telephone terminal
27 In-vehicle device
31 Control unit (determination unit, voice processing unit)
32 Storage unit
32A Language-specific frequency information
33 Microphone (sound collection unit)
35 Speaker (sound emission unit)
41 First detection unit (determination unit)
43 First calculation unit (determination unit)
45 Echo canceller
47, 57 SRC (voice processing unit)
51 Second detection unit (determination unit)
53 Second calculation unit (determination unit)
58 Low-pass filter (LPF)
X First speaker
Y Second speaker

Claims

1. A call device that inputs and outputs voice signals corresponding to the voices of a plurality of speakers so that the speakers can talk to each other, the call device comprising:
a determination unit that determines, based on at least the voice of a speaker, whether the speaker is a first speaker equivalent to a native speaker or a second speaker equivalent to a language learner; and
a voice processing unit that, based on the determination result of the determination unit, performs predetermined voice processing for language learners on a voice signal output toward the speaker determined to be the second speaker.

2. The call device according to claim 1, wherein the determination unit performs a first determination process of acquiring, from the voice of the speaker, predetermined frequency information capable of identifying the speaker's native language, identifying the native language based on the acquired frequency information, and determining, using the identified native language, whether the speaker is the first speaker or the second speaker.

3. The call device according to claim 2, wherein the predetermined frequency information is a 0th formant frequency, and
the determination unit identifies the native language based on language-specific frequency information that associates a plurality of languages with the 0th formant frequencies of persons whose native language is each of those languages.

4. The call device according to claim 2, wherein a second determination process is performed that determines, using the language set in a user-fixed device that the call device uses for calls, whether the speaker who can be regarded as the user of the user-fixed device is the first speaker or the second speaker.

5. The call device according to claim 4, wherein, in the second determination process, if the language set in the user-fixed device matches the language set in an in-vehicle device connected to the call device, that language is identified as the native language; if the languages do not match, the language set in the user-fixed device is identified as the native language; and the identified native language is used to determine whether the speaker is the first speaker or the second speaker.

6. The call device according to claim 4, wherein the determination unit performs a third determination process of acquiring, from the voice of the speaker, information on silent portions of the voice and determining, based on the acquired information, whether the speaker is the first speaker or the second speaker.

7. The call device according to claim 6, wherein, when the determination results of the first to third determination processes differ, whether the speaker is the first speaker or the second speaker is determined according to a predetermined priority.

8. The call device according to any one of claims 1 to 7, further comprising: a sound collection unit that collects the voice of a predetermined speaker among the plurality of speakers; and a sound emission unit that emits the voices of the other speakers toward the predetermined speaker,
wherein, when the predetermined speaker is determined to be the second speaker, the voice processing unit performs the predetermined voice processing for language learners on the voice signal corresponding to the voice to be emitted by the sound emission unit.

9. The call device according to claim 8, wherein the call device is a hands-free call device used by the predetermined speaker for hands-free calls.

10. The call device according to claim 8 or 9, further comprising a storage unit that stores the telephone number of the other speaker determined to be the second speaker,
wherein, when the telephone number of the other speaker is already stored in the storage unit at the start of a call, the voice processing unit performs the predetermined voice processing for language learners on the voice signal output to the other speaker without the determination being made.

11. A method for controlling a call device that inputs and outputs voice signals corresponding to the voices of a plurality of speakers so that the speakers can talk to each other, the method comprising:
a determination step of determining, based on at least the voice of a speaker, whether the speaker is a first speaker equivalent to a native speaker or a second speaker equivalent to a language learner; and
a voice processing step of performing, based on the determination result of the determination step, predetermined voice processing for language learners on a voice signal output toward the speaker determined to be the second speaker.