JP4207701B2

JP4207701B2 - Call device, call method, and call system

Info

Publication number: JP4207701B2
Application number: JP2003280435A
Authority: JP
Inventors: 義之國頭; 哲川畑; 晃弘保木本; 忠幸服部
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-07-25
Filing date: 2003-07-25
Publication date: 2009-01-14
Anticipated expiration: 2023-07-25
Also published as: JP2005045742A

Description

The present invention relates to a call device and a call method that use a network such as the Internet and enable calls in a high-sound-quality environment, and in particular to a call device and a call method that transmit and receive background music (BGM) or sound effects (SE) in addition to call voice.

In Japanese Patent Application Laid-Open No. 2002-237873, the present applicant disclosed a technique for a digital mobile phone with a music data playback function. When a call arrives while the user is listening to music through earphones, the ring tone is output from the earphones superimposed on the music; music playback stops in response to operation of the on-hook/off-hook button; and operating a changeover switch on the main unit selects whether the call uses the earphones with an external microphone or the speaker and microphone of the phone body.

In recent years, telephony over the Internet has become widespread. When a personal computer, for example, is used as the call device, a handset is usually not used, so that both of the user's hands stay free for the keyboard and mouse. The speaker and microphone of the PC itself are also avoided as a measure against echo. Instead, a headset that combines headphones and a microphone is often used.

In particular, when the speaker and microphone of the PC are used, an echo canceller is required. Without one, the user's own voice returns over the connection by way of the other party's speaker and the other party's microphone, which makes it very hard to talk. The problem becomes even harder to handle for calls with stereo audio.

Patent Document 1: Japanese Patent Application Laid-Open No. 2002-237873 (JP 2002-237873 A)

If the technique disclosed in Patent Document 1 is applied to an Internet telephone, however, the ring tone cannot be heard while the headset is plugged into the PC but is not on the user's ears. If the ring tone also cannot be heard when the user has stepped away from the PC, the user will not notice an incoming call. One option is to play the ring tone from both the headset and the PC's built-in speaker. But if the volume is set so that the speaker can be heard from a distance and a call happens to arrive while the user is wearing the headset, the user is hit with an unexpectedly loud ring tone, which makes the device awkward to use.

The same is true of mobile phone devices. If a mobile phone is left some distance away with its ring tone set loud enough to be heard from there, a call that happens to arrive while the user is wearing the headset also rings at that loud volume.

To solve the above problem, a call device according to the present invention performs two-way communication for voice conversation over a network. As a transmission system, it comprises: first gain adjusting means that adjusts gain by multiplying an audio signal from audio conversion means, which converts picked-up sound into an electric signal, by a variable gain coefficient; first sound data storage means that stores sound data in units of files; first decoding means that decodes the file-unit sound data read from the first sound data storage means; second gain adjusting means that adjusts gain by multiplying the decoded output of the first decoding means by a variable gain coefficient; second sound data storage means that stores sound data in units of files; second decoding means that decodes the file-unit sound data read from the second sound data storage means; third gain adjusting means that adjusts gain by multiplying the decoded output of the second decoding means by a variable gain coefficient; combining means that combines a first output from the first gain adjusting means, a second output from the second gain adjusting means, and a third output from the third gain adjusting means; encoding means that encodes the combined output of the combining means; and transmitting means that transmits the encoded output of the encoding means to the network. As a reception system, it comprises: receiving means that receives encoded output transmitted over the network from the transmitting means of another call device; received data decoding means that decodes the encoded data received by the receiving means; fourth gain adjusting means that adjusts gain by multiplying the decoded output of the received data decoding means by a variable gain coefficient; ring tone data storage means that stores ring tone data in units of files; fifth gain adjusting means that adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient; second combining means that combines a fourth output from the fourth gain adjusting means and a fifth output from the fifth gain adjusting means; headphone reproducing means that drives headphones based on the output of the second combining means; sixth gain adjusting means that adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient; and speaker reproducing means that drives a speaker based on the output of the sixth gain adjusting means.

To solve the above problem, a call method according to the present invention performs two-way communication for voice conversation over a network. As a transmission system, it comprises: a first gain adjustment step of adjusting gain by multiplying an audio signal from audio conversion means, which converts picked-up sound into an electric signal, by a variable gain coefficient; a first decoding step of reading and decoding sound data stored in units of files in first sound data storage means; a second gain adjustment step of adjusting gain by multiplying the decoded output of the first decoding step by a variable gain coefficient; a second decoding step of reading and decoding sound data stored in units of files in second sound data storage means; a third gain adjustment step of adjusting gain by multiplying the decoded output of the second decoding step by a variable gain coefficient; a first combining step of combining a first output from the first gain adjustment step, a second output from the second gain adjustment step, and a third output from the third gain adjustment step; an encoding step of encoding the combined output of the first combining step; and a transmitting step of transmitting the encoded output of the encoding step to the network. As a reception system, it comprises: a receiving step of receiving encoded output transmitted over the network from another call device; a received data decoding step of decoding the encoded data received in the receiving step; a fourth gain adjustment step of adjusting gain by multiplying the decoded output of the received data decoding step by a variable gain coefficient; a fifth gain adjustment step of adjusting gain by multiplying ring tone data, stored in units of files in ring tone data storage means, by a variable gain coefficient; a second combining step of combining a fourth output from the fourth gain adjustment step and a fifth output from the fifth gain adjustment step; a headphone reproducing step of driving headphones based on the output of the second combining step; a sixth gain adjustment step of adjusting gain by multiplying the data from the ring tone data storage means by a variable gain coefficient; and a speaker reproducing step of driving a speaker based on the output of the sixth gain adjustment step.

According to the call device of the present invention, the first gain adjusting means adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient, and the headphone reproducing means drives the headphones based on the output of the first gain adjusting means. The second gain adjusting means adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient, and the speaker reproducing means drives the speaker based on the output of the second gain adjusting means. The ring tone volume can therefore be set optimally for the headset and for the speaker. It is also possible to raise only the speaker's ring tone so that it can be heard from a distance.

According to the call method of the present invention, the first gain adjustment step adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient, and the headphone reproducing step drives the headphones based on the output of the first gain adjustment step. The second gain adjustment step adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient, and the speaker reproducing step drives the speaker based on the output of the second gain adjustment step. The ring tone volume can therefore be set optimally for the headset and for the speaker. It is also possible to raise only the speaker's ring tone so that it can be heard from a distance.

According to the call system of the present invention, in the reception system of the call device, the first gain adjusting means adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient, and the headphone reproducing means drives the headphones based on the output of the first gain adjusting means. The second gain adjusting means adjusts gain by multiplying the data from the ring tone data storage means by a variable gain coefficient, and the speaker reproducing means drives the speaker based on the output of the second gain adjusting means. The ring tone volume can therefore be set optimally for the headset and for the speaker. It is also possible to raise only the speaker's ring tone so that it can be heard from a distance.

As the best mode for carrying out the present invention, the following describes a VoIP call system that follows the Internet telephony protocol known as Voice over IP (VoIP), and a VoIP client used in that system.

First, an outline of the VoIP call system 1 is given. In addition to call voice between VoIP clients, this VoIP call system transmits and receives background music (BGM) or sound effects (SE).

As shown in FIG. 1, a VoIP client 2 is connected to the Internet 4 through, for example, a public line 3, and performs two-way communication for voice conversation with another VoIP client 5 that is also connected to the Internet 4. A VoIP server 6 is also connected to the Internet 4 and performs functions such as VoIP-based communication control. The VoIP call system 1 is described here using a two-party call between the VoIP client 2 and the VoIP client 5 as an example, but the number of VoIP clients is of course not limited to two, and the call system can have two or more participants.

The Internet 4 is a network environment that spans the world by interconnecting communication lines, such as general public lines, and multiple information communication networks. The spread of wideband, high-speed communication lines now makes broadband transmission possible. The network is built from communication lines of 500 kbps or more using optical fiber, asymmetric digital subscriber lines (ADSL), wireless links, and the like.

The VoIP server 6 manages subscribers' IP addresses, performs authentication, and controls communication within the VoIP call system 1. It consists of a computer such as a workstation. A separate server for billing, or for processing subscriber IP addresses and other management information, may of course be provided.

The VoIP client 2 is, for example, a personal computer (PC) to which either a microphone and a speaker, or a headset 7 combining a microphone 7a and headphones 7b, is connected. The PC becomes the VoIP client 2 by executing a VoIP client program 2a implemented in software. The description below assumes that the VoIP client 2 places a call to the VoIP client 5, that is, the VoIP client 2 transmits first and the VoIP client 5 receives. The VoIP client 5 is likewise a PC that executes a VoIP client program 5a, and it operates in the same way when it is the one that transmits first.

During a VoIP call, the VoIP client 2 on the transmitting side can mix into the call voice either a background sound, such as background music (BGM) that continues for a period on the order of minutes, or a sound effect (SE) lasting on the order of seconds. The VoIP client 2 adjusts the volume levels of the background sound and the sound effect individually, in addition to that of the call voice.

When the VoIP client 2 is on the receiving side, it can adjust the ring tone volume independently for the headset 7 and for the speaker.

The configuration and operation that let the VoIP client 2 adjust the volume levels of the background sound and the sound effect individually, and those that let it adjust the ring tone volume independently for the headset 7 and the speaker, are described below with reference to FIG. 2. By executing the VoIP client program 2a, the VoIP client 2 functionally forms the transmission system and the reception system described here. First, in the transmission system 10, the user's voice is picked up by the microphone 7a and converted into an electric signal, which is captured by a microphone capture unit 11. A gain adjustment unit 12 multiplies the captured signal by a gain coefficient k1, the microphone volume level set by the user. The multiplied output of the gain adjustment unit 12 is supplied to an addition unit 13.

The VoIP client 2 also holds sound effects lasting on the order of seconds, such as machine-gun fire, thunder, applause, and laughter. Each is converted to PCM data, for example, and compressed in advance with a compression technique such as MP3 (MPEG-1 Audio Layer III), MPEG-4, or ATRAC (Adaptive Transform Acoustic Coding), and multiple files are stored in an SE file storage unit 14 as file-unit SE data. Examples of the SE file storage unit 14 include a hard disk drive (HDD), described later, a ROM, and a magneto-optical disk.

Similarly, the VoIP client 2 holds background sounds lasting on the order of minutes, such as the sound of waves, birdsong, or music of various genres. Each is converted to PCM data, for example, and compressed in advance with a compression technique such as MP3, MPEG-4, or ATRAC, and multiple files are stored in a BGM file storage unit 15 as file-unit BGM data.

When the user selects an SE file stored in the SE file storage unit 14, an SE file reading unit 16 reads it into RAM (not shown) while a decoding unit 17 decodes it into PCM data. A gain adjustment unit 18 multiplies the decoded output (PCM data) of the decoding unit 17 by a gain coefficient k2, the SE volume level set by the user. The multiplied output of the gain adjustment unit 18 is supplied to the addition unit 13.

Likewise, when the user selects a BGM file stored in the BGM file storage unit 15, a BGM file reading unit 17 reads it into RAM (not shown) while a decoding unit 20 decodes it into PCM data. A gain adjustment unit 21 multiplies the decoded output of the decoding unit 20 by a gain coefficient k3, the BGM volume level set by the user. The multiplied output of the gain adjustment unit 21 is supplied to the addition unit 13. The addition unit 13 adds the multiplied outputs of the three gain adjustment units 12, 18, and 21 with saturation processing, and supplies the sum to an encoding unit 22.
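The transmit-side mixing described above (gains k1, k2, and k3 followed by a saturating addition) can be sketched as follows. This is only an illustrative sketch: the signed 16-bit sample format, the function names `saturate` and `mix_transmit`, and the treatment of an exhausted source as silence are assumptions, not details given in the specification.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767  # assumed sample format: signed 16-bit PCM


def saturate(value: float) -> int:
    """Clamp a mixed sample to the 16-bit range instead of letting it wrap."""
    return max(INT16_MIN, min(INT16_MAX, int(round(value))))


def mix_transmit(mic: Sequence[int], se: Sequence[int], bgm: Sequence[int],
                 k1: float, k2: float, k3: float) -> List[int]:
    """Return saturate(k1*mic + k2*se + k3*bgm), sample by sample."""
    length = max(len(mic), len(se), len(bgm))

    def sample(buf: Sequence[int], i: int) -> int:
        # A source that has ended (or was never selected) contributes silence.
        return buf[i] if i < len(buf) else 0

    return [
        saturate(k1 * sample(mic, i) + k2 * sample(se, i) + k3 * sample(bgm, i))
        for i in range(length)
    ]
```

Clamping rather than wrapping is what keeps a loud mix from turning into harsh overflow noise; the second sample in a call such as `mix_transmit([1000, 30000], [2000], [4000, 20000], 1.0, 0.5, 0.25)` exceeds the range and is held at the maximum value.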

The encoding unit 22 compresses the sum output (PCM data) of the addition unit 13 to several tens of kbps, for example 64 kbps, using a compression technique such as MP3, MPEG-4, or ATRAC. These techniques are high-efficiency audio compression coding and decoding techniques applied to PCM audio data such as that used on CDs. The audio that is packetized, transmitted over the Internet, and reproduced on the receiving side can therefore be two-channel stereo and is of high sound quality.

The compressed data is supplied to an RTP packetizing unit 23, which packetizes data according to the Real-time Transport Protocol (RTP). The RTP packetizing unit 23 places the compressed data in an RTP packet and then encapsulates it in UDP and IP. Packetization according to RTP is described in detail later. The resulting packet data is sent from a transmission processing unit 24 to the Internet.

In the reception system 30, packet data transmitted from the other VoIP client 5 over the Internet 4 is received by a reception processing unit 31. The received packet data is depacketized by an RTP depacketizing unit 32. A de-jitter unit 33 corrects arrival times based on the RTP timestamp and sequence number that the RTP depacketizing unit 32 has extracted from the IP and UDP encapsulation.

A packet loss compensation unit 34 compensates for packet loss based on the RTP timestamp and sequence number, and sends the compensated data to a decoding unit 35. The decoding unit 35 decodes the compressed data, whose arrival times have been corrected and whose packet losses have been compensated, into PCM data and sends the PCM data to a gain adjustment unit 36. The gain adjustment unit 36 multiplies the PCM data by a gain coefficient k5, the playback volume level set by the user. The multiplied output of the gain adjustment unit 36 is sent to an addition unit 37. In addition, so that the user shares the transmitted audio with the other party, a gain adjustment unit 38 multiplies the transmit audio data by a gain coefficient k4, the loopback volume level set by the user. The multiplied output of the gain adjustment unit 38 is also supplied to the addition unit 37.

The VoIP client 2 also holds ring tones. Each is converted to PCM data, for example, and compressed in advance with a compression technique such as MP3, MPEG-4, or ATRAC, and multiple files are stored in a ring tone file storage unit 39 as file-unit ring tone data.

A ring tone file in the ring tone file storage unit 39 is selected in advance by the user. At the timing of an incoming call, a ring tone reading unit 40 reads it into RAM (not shown), and a decoding unit 41 decodes it into PCM data. The decoded output of the decoding unit 41 is supplied to a gain adjustment unit 42 and a gain adjustment unit 43. The gain adjustment unit 42 multiplies the decoded ring tone (PCM data) by a gain coefficient k6, the headphone ring volume level set by the user, and supplies the result to the addition unit 37. The addition unit 37 adds the multiplied output of the gain adjustment unit 36, which is the mix of the other party's call voice and background sound (PCM data), to the multiplied output of the gain adjustment unit 38, which is the PCM data of the user's own transmitted audio, and supplies the sum to a headphone reproducing unit 44. The headphone reproducing unit 44 converts the sum into an analog signal, amplifies it, and supplies it to the headphones 7b. The headphones 7b reproduce the mixed output at the user's ears.

At the timing of an incoming call from the other VoIP client 5, the addition unit 37 also supplies to the headphone reproducing unit 44 the decoded ring tone file (PCM data) read by the ring tone reading unit 40 and multiplied by the gain coefficient k6, the headphone ring volume level set by the user. The headphone reproducing unit 44 converts this ring tone data into an analog signal and supplies it to the headphones 7b. When a call arrives from the other VoIP client 5, the headphones 7b therefore sound the ring tone at the user's ears at the headphone ring volume level the user has set.

The gain adjustment unit 43 multiplies the ring tone PCM data, which is the decoded output of the decoding unit 41, by a gain coefficient k7, the speaker ring volume level set by the user, and supplies the result to a speaker reproducing unit 45. The speaker reproducing unit 45 converts the multiplied output into an analog signal, amplifies it, and supplies it to a speaker 46. The speaker 46 sounds the ring tone at the speaker ring volume level the user has set for the speaker.
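The two receive-side output paths can be sketched together: the headphone path sums the received audio (k5), the loopback of the user's own transmitted audio (k4), and the ring tone (k6), while the speaker path carries only the ring tone scaled by k7. The sketch below is a hedged illustration; the 16-bit sample format and the names `saturate` and `render_receive` are assumptions, and clamping in the headphone adder is assumed by analogy with the transmit-side adder.

```python
from typing import List, Sequence, Tuple

INT16_MIN, INT16_MAX = -32768, 32767  # assumed sample format: signed 16-bit PCM


def saturate(value: float) -> int:
    """Clamp a sample to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, int(round(value))))


def render_receive(received: Sequence[int], loopback: Sequence[int],
                   ring: Sequence[int], k4: float, k5: float,
                   k6: float, k7: float) -> Tuple[List[int], List[int]]:
    """Return (headphone_samples, speaker_samples) for one block of audio."""
    length = max(len(received), len(loopback), len(ring))

    def sample(buf: Sequence[int], i: int) -> int:
        # A source with no data for this position contributes silence.
        return buf[i] if i < len(buf) else 0

    # Headphone path: received audio, loopback, and ring tone are summed.
    headphone = [
        saturate(k5 * sample(received, i)
                 + k4 * sample(loopback, i)
                 + k6 * sample(ring, i))
        for i in range(length)
    ]
    # Speaker path: ring tone only, with its own independent gain k7.
    speaker = [saturate(k7 * s) for s in ring]
    return headphone, speaker
```

Because k6 and k7 are applied to the same decoded ring tone on separate paths, a small k6 with a large k7 gives a quiet ring in the headset and a loud one from the speaker.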

Therefore, when the VoIP client 2 is on the receiving side, it can adjust the ring tone volume independently for the headset 7 and for the speaker.

Next, packetization and depacketization based on RTP are described. RTP is a transport protocol for sending and receiving audio and video in real time over IP networks such as the Internet, and is specified in RFC 1889. RTP sits at the transport layer and is generally used over the User Datagram Protocol (UDP) together with the RTP Control Protocol (RTCP).

As shown in FIG. 3, an RTP packet consists of an IP header, a UDP header, an RTP header, and RTP data. The RTP header has fields for the version (V), padding (P), presence of an extension header (X), contributing source (CSRC) count, marker (M), payload type (PT), sequence number, RTP timestamp, synchronization source (SSRC) identifier, and contributing source (CSRC) identifiers.
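The RTP header fields listed above can be illustrated by packing and unpacking the fixed 12-byte header with Python's `struct` module. This is a minimal sketch that follows the RFC 1889 field layout; the function names are hypothetical, padding and header extensions are left unset, and the payload type 96 used in the example is an arbitrary dynamic value, not one taken from the specification. The IP and UDP headers are not built here, since an ordinary UDP socket adds them.

```python
import struct
from typing import Dict, Sequence


def pack_rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int,
                    marker: bool = False, csrcs: Sequence[int] = ()) -> bytes:
    """Build an RTP header: V=2, P=0, X=0, CC=len(csrcs)."""
    version = 2
    byte0 = (version << 6) | (len(csrcs) & 0x0F)         # V, P, X, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)   # M, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    # Optional CSRC identifiers follow the fixed header, 32 bits each.
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def unpack_rtp_header(packet: bytes) -> Dict[str, object]:
    """Parse the fixed header and CSRC list from the start of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = byte0 & 0x0F
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": cc,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        "payload": packet[12 + 4 * cc:],
    }
```

A sender would append the compressed audio frame to the bytes returned by `pack_rtp_header` and hand the result to a UDP socket; the receiver would call `unpack_rtp_header` on each datagram to recover the sequence number, timestamp, and payload.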

The RTP packetizing unit 23 in FIG. 2 packetizes the compressed data output by the encoding unit 22 according to RTP as described above. The compressed data itself is carried in the RTP payload portion shown in FIG. 3. The RTP packet is sent from the transmission processing unit 24 over the Internet 4 to another VoIP client (for example, the VoIP client 5 in FIG. 1).

In the reception system 30 of the other VoIP client 5, the reception processing unit 31 receives the RTP packet. Although this is the operation of the other VoIP client 5, it is described with reference to FIG. 2. The RTP depacketizing unit 32 separates the RTP header and RTP data from the IP header and UDP header, and sends the sequence number and timestamp stored in the RTP header to the de-jitter unit 33.

The de-jitter unit 33 corrects uneven arrival times on the basis of the sequence number and the timestamp. Because RTP packets travel over the Internet, which also carries other traffic, they are affected by congestion and do not arrive at equal intervals; on the time axis they may bunch together or spread apart. The de-jitter unit 33 therefore uses the sequence number and the timestamp to restore equal intervals.
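One common way to realize this kind of correction is a playout schedule derived from the RTP timestamps. The sketch below is an illustration only, not the method of the de-jitter unit 33: the function name `playout_schedule`, the 8000 Hz clock rate, and the fixed 60 ms buffering delay are all assumptions, and timestamp wraparound is ignored.

```python
from typing import Iterable, List, Tuple


def playout_schedule(
    packets: Iterable[Tuple[float, int, int]],
    clock_rate: int = 8000,   # assumed RTP clock rate in Hz
    delay: float = 0.06,      # assumed fixed buffering delay in seconds
) -> List[Tuple[int, float]]:
    """Map unevenly arriving packets to evenly spaced playout times.

    Each input item is (arrival_time_s, sequence_number, rtp_timestamp).
    The result is a list of (sequence_number, playout_time_s) ordered by
    sequence number. Packets that arrive after their playout time are dropped.
    """
    arrivals = sorted(packets)            # order of arrival
    if not arrivals:
        return []
    base_arrival, _, base_ts = arrivals[0]
    schedule = []
    for arrival, seq, ts in arrivals:
        # Playout time depends only on the sender's timestamp, not on arrival.
        playout = base_arrival + delay + (ts - base_ts) / clock_rate
        if arrival <= playout:
            schedule.append((seq, playout))
    schedule.sort()                       # restore sequence order
    return schedule
```

Because the playout time is computed from the sender's timestamp, packets that arrived bunched together or out of order are still played at the interval at which they were generated.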

The packet compensation unit 34 compensates for packet loss on the basis of the sequence number and the timestamp. Because RTP packets are sent and received over the Internet, packets may be dropped or may fail to be received. The packet compensation unit 34 therefore compensates for the loss by, for example, substituting a copy of the preceding or following packet for the missing packet, or by setting the missing data to 0.
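The two compensation strategies mentioned above (reusing a neighbouring packet, or zero-filling) can be sketched as follows. This is a minimal illustration under assumed data structures: received frames are held in a dictionary keyed by sequence number, and the function name `conceal_losses` and its `mode` parameter are hypothetical.

```python
from typing import Dict, List


def conceal_losses(
    frames: Dict[int, bytes],
    first_seq: int,
    last_seq: int,
    frame_size: int,
    mode: str = "repeat",
) -> List[bytes]:
    """Return one frame per sequence number, filling gaps left by lost packets.

    mode="repeat": reuse the previous frame, or the next received frame
                   when there is no previous one.
    mode="zero":   replace the missing frame with zero-valued data.
    """
    silence = bytes(frame_size)
    out: List[bytes] = []
    for seq in range(first_seq, last_seq + 1):
        frame = frames.get(seq)
        if frame is None:                     # this packet was lost
            if mode == "repeat":
                frame = out[-1] if out else frames.get(seq + 1, silence)
            else:
                frame = silence
        out.append(frame)
    return out
```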

The decoding unit 35 then decodes the mixed data of the call sound and the background sound, whose arrival times have been corrected and whose packet losses have been compensated, into PCM data.

What characterizes the VoIP client 2 with this functional configuration, when the present invention is applied, is that the volume level of the background sound, as well as that of the call sound, can be adjusted individually.

The volume level of the call sound is adjusted in the gain adjustment unit 12 by multiplying the voice data by a gain coefficient k1, which is the microphone volume level set by the user. The volume level of the sound effect or the BGM is adjusted in the gain adjustment unit 18 or the gain adjustment unit 21 by multiplying the respective audio data by a gain coefficient k2, which is the SE volume level set by the user, or by a gain coefficient k3, which is the BGM volume level set by the user.

After their volume levels have been adjusted by the gain adjustment units 12, 18, and 21, the call sound data and the audio data of the sound effect or BGM are combined by the addition unit 13, encoded by the encoding unit 22, packetized by the RTP packetizing unit 23, and transmitted from the transmission processing unit 24 to the other party's VoIP client 5.
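The gain adjustment and addition described above amount to a weighted sum of three PCM streams. The sketch below illustrates this with the coefficients k1, k2, and k3 from the text; the function name `mix`, the zero-padding of shorter streams, and the clipping to the 16-bit range are assumptions added for the example.

```python
from typing import List, Sequence


def mix(
    voice: Sequence[int],
    se: Sequence[int],
    bgm: Sequence[int],
    k1: float,
    k2: float,
    k3: float,
) -> List[int]:
    """Multiply each PCM stream by its gain coefficient and add them.

    Shorter streams are treated as silent once they run out, and the sum
    is clipped to the signed 16-bit range (both are assumptions).
    """
    def sample(buf: Sequence[int], i: int) -> int:
        return buf[i] if i < len(buf) else 0

    mixed = []
    for i in range(max(len(voice), len(se), len(bgm))):
        s = k1 * sample(voice, i) + k2 * sample(se, i) + k3 * sample(bgm, i)
        mixed.append(max(-32768, min(32767, int(round(s)))))
    return mixed
```

For example, `mix([1000, 2000], [100], [10, 10, 10], 1.0, 0.5, 2.0)` yields `[1070, 2020, 20]`; the result would then be passed to the encoder.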

The other party's VoIP client 5 receives the RTP packets transmitted via the Internet 4 in the reception processing unit 31, depacketizes them in the RTP depacketizing unit 32, corrects the arrival-time intervals in the de-jitter unit 33, compensates for packet loss in the packet compensation unit 34, and then decodes the result into PCM data in the decoding unit 35. The gain adjustment unit 36 multiplies the decoded audio data (PCM data) by a gain coefficient k5, which is the volume level set by the receiving-side user, so that the user can hear the sender's call sound mixed with the BGM or SE through the headphones 44.

The VoIP client 2 achieves the functions shown in FIG. 2 by executing software modules corresponding to the protocols of each layer of the Open System Interconnection (OSI) architecture shown in FIG. 4.

The layers in FIG. 4 are described from the lowest layer upward. First, the physical-layer functions include a Universal Serial Bus (USB) camera driver, a USB audio driver, and various other drivers. This layer matches the physical transmission conditions for the video data from the camera driver and the audio data from the audio driver. Next, the data-link-layer functions include the operating system (OS). This layer performs error-free data transfer between adjacent nodes.

The network-layer functions include the Internet Protocol (IP). The network layer selects the communication path used for sending and receiving data and performs communication control such as flow control and quality control. IP is a connectionless packet transfer protocol that does not pursue reliability; it leaves reliability assurance, flow control, and error recovery to the upper layers (the transport layer and the application layer).

The transport-layer functions include the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). The transport layer performs end-to-end transmission using IP addresses. Independently of the type of network, it performs flow control and sequence control according to the required quality class. TCP provides reliability: it assigns a sequence number to each byte of transferred data and retransmits the data if no acknowledgment (ACK) arrives from the receiving side. UDP provides datagram transmission between applications. When audio or video is streamed over an IP network, a transport protocol such as TCP, which retransmits on error, generally cannot be used. In addition, TCP is a protocol for one-to-one communication and cannot send information to multiple parties. UDP is therefore used for such applications.

UDP is designed so that an application process can transfer data to another application process on a remote machine with minimal overhead. For this reason, the UDP header contains only the source port number, the destination port number, the data length, and the checksum. Unlike TCP, it has no field for a number indicating packet order, so if packets are reordered, for example because they travel over different routes through the network, the original order cannot be restored. Moreover, neither TCP nor UDP has a field for time information such as a timestamp at the time of transmission.
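The four UDP header fields named above occupy 8 bytes in total, which a short sketch can make concrete. The function name `parse_udp_header` is hypothetical; the layout assumed is the standard one of four 16-bit big-endian fields, where the length field counts the header plus the data.

```python
import struct
from typing import Dict, Tuple


def parse_udp_header(datagram: bytes) -> Tuple[Dict[str, int], bytes]:
    """Split a UDP datagram into its four header fields and its data.

    Note that there is no sequence number and no timestamp here; RTP adds
    both in its own header, carried inside the UDP data.
    """
    if len(datagram) < 8:
        raise ValueError("the UDP header is 8 bytes long")
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    header = {
        "source_port": src,
        "destination_port": dst,
        "length": length,       # header plus data, in bytes
        "checksum": checksum,
    }
    return header, datagram[8:length]
```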

The session-layer functions include the Session Initiation Protocol (SIP) and the modules required by the software that combines the call sound with the BGM or SE, which is the main part of the present invention: hold tone generation, BGM synthesis, ring tone generation, the codec, and RTP. The session layer controls the transfer of information; it manages the dialogue mode between applications and performs control on a per-conversation basis. SIP is an application-layer signaling protocol for establishing, modifying, and terminating multimedia sessions on an IP network, and is standardized in RFC 3261.

The presentation-layer functions include VoIP call control. The presentation layer manages the representation format of the information that applications send and receive, and performs data conversion and encryption.

The application-layer functions include the graphical user interface (GUI). The application layer manages the external specifications of the communication functions used by user programs and exchanges information on that basis.

Next, the hardware configuration of the VoIP client 2 that actually executes the software modules is described. FIG. 5 shows the configuration of the VoIP client 2. In FIG. 5, a CPU (Central Processing Unit) 51 executes various processes in accordance with the programs constituting the software modules, which are either stored in a ROM (Read Only Memory) 52 or loaded from a storage unit 58 into a RAM (Random Access Memory) 53. The RAM 53 also stores, as appropriate, the data that the CPU 51 needs in order to execute these processes.

The CPU 51, the ROM 52, and the RAM 53 are connected to one another via a bus 54. An input/output interface 55 is also connected to the bus 54. Connected to the input/output interface 55 are an input unit 56 comprising a keyboard, a mouse, and the like; an output unit 57 comprising a display such as a CRT or LCD as well as headphones, speakers, and the like; a storage unit 58 comprising a hard disk or the like; and a communication unit 59 comprising a modem, a terminal adapter, or the like. The microphone 7a of the headset 7 is included in the input unit 56, and the headphone 7b is included in the output unit 57.

The communication unit 59 performs communication processing via the Internet 4. It transmits data supplied by the CPU 51, and it outputs data received from the communication partner to the CPU 51, the RAM 53, and the storage unit 58. The storage unit 58 exchanges data with the CPU 51 and stores and erases information. The communication unit 59 also performs communication processing of analog or digital signals with other clients.

A drive 60 is also connected to the input/output interface 55 as necessary. A magnetic disk 61, an optical disk 62, a magneto-optical disk 63, a semiconductor memory 64, or the like is mounted in it as appropriate, and a computer program read from such a medium is installed in the storage unit 58 as necessary.

The storage unit 58 is, for example, an HDD, and constitutes the SE file storage unit 14, the BGM file storage unit 15, and the ring tone file storage unit 39 shown in FIG. 2.

The hardware configuration described above applies to the VoIP clients 2 and 5, and also to the VoIP server 6 and to a Web server described later.

Next, the GUI (Graphical User Interface) displayed on the display of the output unit 57 is described with reference to FIG. 6. This GUI belongs to the application layer of the VoIP client. It is the interface through which the user operates the PC visually, and it handles the information the user enters by hand. From top to bottom, the GUI comprises an application control unit 71, an information display unit 72, a dial unit 73, a headset volume unit 74, a speaker volume unit 75, a sound effect (SE) selection display unit 76, an SE control unit 77, a BGM selection display unit 78, and a BGM control unit 79.

The application control unit 71 terminates the VoIP client application. The information display unit 72 displays the dialed number and information about the other party (such as a busy state). The dial unit 73 is a numeric keypad for dialing the VoIP destination. The headset volume unit 74 adjusts the volume output from the headphone 7b of the headset 7: when the user moves the slider 74a left or right with the mouse, the gain coefficient k5 in the gain adjustment unit 36 is set. The headset volume unit 74 may also be used to adjust the volume of the ring tone output from the headphone 7b; in that case, moving the slider 74a left or right with the mouse sets the gain coefficient k6 in the gain adjustment unit 42.

The speaker volume unit 75 adjusts the volume of the ring tone output from the speaker 46. When the user moves the slider 75a left or right with the mouse, the gain coefficient k7 in the gain adjustment unit 43 is set.
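The text states that moving a slider sets a gain coefficient but does not specify the mapping. As one possible illustration, the sketch below assumes a linear mapping from slider position to gain; the function name `slider_to_gain`, the 0 to 100 position range, and the maximum gain of 1.0 are all assumptions.

```python
def slider_to_gain(
    position: float,
    minimum: float = 0.0,
    maximum: float = 100.0,
    max_gain: float = 1.0,
) -> float:
    """Convert a slider position to a gain coefficient (assumed linear).

    Positions outside [minimum, maximum] are clamped to that range.
    """
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    clamped = min(max(position, minimum), maximum)
    return max_gain * (clamped - minimum) / (maximum - minimum)
```

A logarithmic (decibel-based) mapping would be an equally plausible choice, since perceived loudness is not linear in amplitude.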

The SE selection display unit 76 displays the SE sound source data files available for selection (the SE files stored in the SE file storage unit 14), presenting sound effects such as a gunshot, thunder, applause, or cheering for the user to choose from. The SE control unit 77 lets the user play and stop a sound effect and adjust its volume with the play button 77b, the stop button 77c, and the slider 77a, operated through an input unit such as a mouse.

For example, suppose the user selects a desired SE in the SE selection display unit 76 with the mouse, moves the slider 77a to a suitable position, and clicks the play button 77b. The decoding unit 17 then decodes the desired SE file read by the SE file reading unit 16, and the gain adjustment unit 18 multiplies the PCM data of the SE file by the gain coefficient k2, the SE volume level corresponding to the position of the slider 77a, and outputs the result to the addition unit 13. In this way, the user can express feelings toward the other party with various sound effects.

The BGM selection display unit 78 displays the BGM sound source data files available for selection. The BGM control unit 79 lets the user play and stop BGM and adjust its volume with the play button 79b, the stop button 79c, and the slider 79a, operated through an input unit such as a mouse. For example, suppose the user selects a desired BGM in the BGM selection display unit 78 with the mouse, moves the slider 79a to a suitable position, and clicks the play button 79b. The decoding unit 20 then decodes the desired BGM file read by the BGM file reading unit 19, and the gain adjustment unit 21 multiplies the PCM data of the BGM file by the gain coefficient k3, the BGM volume level corresponding to the position of the slider 79a, and outputs the result to the addition unit 13. As with SE, the user can thus convey his or her mood and the atmosphere of the moment to the other party with BGM that the user has selected and whose volume the user has adjusted.

Accordingly, by executing the programs constituting the software modules, the VoIP client 2 can adjust the ring tone volume independently for the headset 7 and for the speaker 46.
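The independent volume control on the receiving side can be sketched with the coefficients k5, k6, and k7 from the description: the headphone signal is the received call audio (scaled by k5) combined with the ring tone (scaled by k6), while the speaker receives the ring tone scaled by k7 only. The function name `receive_outputs` and the zero-padding of the shorter stream are assumptions for the example.

```python
from typing import List, Sequence, Tuple


def receive_outputs(
    call_pcm: Sequence[float],
    ring_pcm: Sequence[float],
    k5: float,
    k6: float,
    k7: float,
) -> Tuple[List[float], List[float]]:
    """Return (headphone_signal, speaker_signal) for the receiving side.

    headphone = k5 * call + k6 * ring   (call audio combined with ring tone)
    speaker   = k7 * ring               (ring tone only)
    """
    n = max(len(call_pcm), len(ring_pcm))
    call = list(call_pcm) + [0.0] * (n - len(call_pcm))
    ring = list(ring_pcm) + [0.0] * (n - len(ring_pcm))
    headphone = [k5 * c + k6 * r for c, r in zip(call, ring)]
    speaker = [k7 * r for r in ring_pcm]
    return headphone, speaker
```

Because k6 and k7 are separate coefficients, setting k7 to 0 silences the speaker while the ring tone remains audible in the headphones, and vice versa.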

FIG. 1 is a block diagram of the VoIP call system.
FIG. 2 is a functional block diagram of the VoIP client.
FIG. 3 is a format diagram of an RTP packet.
FIG. 4 is a diagram showing the software modules executed by the VoIP client.
FIG. 5 is a hardware block diagram of a PC serving as the VoIP client.
FIG. 6 is a diagram showing the GUI displayed on the display unit of the VoIP client.

Explanation of symbols

1: VoIP system; 2, 5: VoIP clients; 4: Internet; 6: VoIP server; 7: headset; 12: gain adjustment unit; 13: synthesis unit; 14: SE file; 15: BGM file; 17: decoding unit; 18: gain adjustment unit; 21: gain adjustment unit; 22: encoding unit; 36: gain adjustment unit; 42: gain adjustment unit; 43: gain adjustment unit

Claims

In a communication device that performs two-way communication for voice conversation over a network,
As a transmission system,
First gain adjusting means for adjusting the gain by multiplying the audio signal from the audio converting means for converting the collected sound into an electric signal by a variable gain coefficient;
First sound data storage means for storing sound data in units of files;
First decoding means for decoding sound data in units of files read from the first sound data storage means;
Second gain adjusting means for adjusting the gain by multiplying the decode output from the first decoding means by a variable gain coefficient;
Second sound data storage means for storing sound data in file units;
Second decoding means for decoding sound data in units of files read from the second sound data storage means;
Third gain adjusting means for adjusting the gain by multiplying the decode output from the second decoding means by a variable gain coefficient;
Combining means for combining the first output from the first gain adjusting means, the second output from the second gain adjusting means, and the third output from the third gain adjusting means;
Encoding means for encoding the combined output of the combining means;
Transmission means for transmitting the encoded output from the encoding means to the network,
As a receiving system,
Receiving means for receiving the encoded output transmitted from the transmission means of another communication device via the network;
Received data decoding means for decoding the encoded data received by the receiving means;
A fourth gain adjusting means for adjusting the gain by multiplying the decoded output from the received data decoding means by a variable gain coefficient;
Ringtone data storage means for storing ringtone data in units of files;
Fifth gain adjusting means for adjusting the gain by multiplying the data from the ring tone data storing means by a variable gain coefficient;
Second combining means for combining the fourth output from the fourth gain adjusting means and the fifth output from the fifth gain adjusting means;
Headphone reproducing means for driving the headphones based on the output of the second combining means;
Sixth gain adjusting means for adjusting the gain by multiplying the data from the ring tone data storing means by a variable gain coefficient;
And a speaker reproducing means for driving the speaker based on the output of the sixth gain adjusting means.

In a call method that performs two-way communication for voice conversation over a network,
As a transmission system,
A first gain adjustment step of adjusting the gain by multiplying the audio signal from the audio conversion means for converting the collected audio into an electric signal by a variable gain coefficient;
A first decoding step of reading out and decoding sound data stored in file units in the first sound data storage means;
A second gain adjustment step of adjusting the gain by multiplying the decode output from the first decoding step by a variable gain coefficient;
A second decoding step of reading out and decoding the sound data stored in units of files in the second sound data storage means;
A third gain adjustment step of adjusting the gain by multiplying the decode output from the second decoding step by a variable gain coefficient;
A first combining step of combining the first output from the first gain adjustment step, the second output from the second gain adjustment step, and the third output from the third gain adjustment step;
An encoding step of encoding the combined output of the first combining step;
A transmission step of transmitting the encoded output from the encoding step to the network,
As a receiving system,
A receiving step of receiving an encoded output transmitted from another communication device via the network;
A received data decoding step for decoding the encoded data received in the receiving step;
A fourth gain adjusting step of adjusting the gain by multiplying the decoded output from the received data decoding step by a variable gain coefficient;
A fifth gain adjustment step of adjusting the gain by multiplying the ring tone data stored in units of files in the ring tone data storage means by a variable gain coefficient;
A second combining step of combining the fourth output from the fourth gain adjustment step and the fifth output from the fifth gain adjustment step;
A headphone playback step of driving the headphones based on the output of the second combining step;
A sixth gain adjustment step of adjusting the gain by multiplying the data from the ring tone data storage means by a variable gain coefficient;
And a speaker reproduction step of driving the speaker based on the output of the sixth gain adjustment step.