JP2010056634A - Telephone communication system, voice data processor, program and method

Info

Publication number: JP2010056634A
Application number: JP2008216792A
Authority: JP
Inventors: Masahiro Hayashi (林 正博); Shu Kasahara (笠原 収)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2008-08-26
Filing date: 2008-08-26
Publication date: 2010-03-11
Anticipated expiration: 2028-08-26
Also published as: JP5200764B2

Abstract

PROBLEM TO BE SOLVED: To suppress volume for voice data of an arbitrary telephone terminal when synthesizing (mixing) voice data transmitted from a plurality of telephone terminals. SOLUTION: A telephone communication system has: a plurality of first telephone terminals; a voice data processor for mixing voice data supplied from all the first telephone terminals to create synthetic voice data; and a second telephone terminal for outputting voice of the synthetic voice data created by the voice data processor. The voice data processor has: a means for registering identification information on the respective first telephone terminals in association with designated volume value; a means for processing the voice data supplied from the respective first telephone terminals into voice data whose volume is adjusted on the basis of the designated volume value registered to the designated volume value registering means; and a means for mixing processed voice data to create synthetic voice data. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a telephone communication system and to a voice data processing device, program, and method, and can be applied to, for example, a telephone conference system.

Patent Document 1 discloses a multimedia information communication system that sends and receives voice information to and from a plurality of connection partners. It has an automatic volume control function that optimizes the volume of the voice information according to the connected party, so that calls with a plurality of parties can always be started smoothly at an appropriate volume.

Patent Document 2 discloses a volume adjustment device that adjusts the volume dynamically so that sound can be heard at a constant volume.

JP 2001-94604 A
JP 2006-140542 A

However, with the methods described in Patent Documents 1 and 2, in a telephone conference system in which a plurality of speakers converse, the receiving side cannot adjust the voice volume of an arbitrary speaker among the participants as needed. Suppose a conference is started by parties A, B, C, and D, and call relationships have been established among the speakers. When A listens to the voices of B, C, and D (that is, when the voices of B, C, and D reach A simultaneously) and A wants to adjust the volume of D's voice, this cannot be done even if the conventional methods are applied to the conference system. This is because the methods of Patent Documents 1 and 2 are intended for telephone conferences in which speakers take turns, such as PoC (Push to talk over Cellular), and cannot control the volume of an arbitrary speaker when several speakers talk at the same time, as in an ordinary face-to-face meeting.

Therefore, there is a need for a telephone communication system, and for a voice data processing device, program, and method, that can control the volume of the voice data of any given telephone terminal when synthesizing (mixing) voice data sent from a plurality of telephone terminals.

The telephone communication system of the first aspect of the present invention is (1) a telephone communication system having a plurality of first telephone terminals, a voice data processing device that synthesizes the voice data supplied from all of the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the sound of the synthesized voice data created by the voice data processing device, characterized in that (2) the voice data processing device has (2-1) designated volume value registration means for registering identification information of each first telephone terminal in association with a designated volume value, (2-2) voice data processing means for processing the voice data supplied from each first telephone terminal into voice data whose volume has been adjusted on the basis of the designated volume value registered in the designated volume value registration means, and (2-3) voice data synthesizing means for synthesizing the voice data processed by the voice data processing means to create the synthesized voice data.

The voice data processing device of the second aspect of the present invention is (1) the voice data processing device of a telephone communication system having a plurality of first telephone terminals, a voice data processing device that synthesizes the voice data supplied from all of the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the sound of the synthesized voice data created by the voice data processing device, characterized by having (2) designated volume value registration means for registering identification information of each first telephone terminal in association with a designated volume value, (3) voice data processing means for processing the voice data supplied from each first telephone terminal into voice data whose volume has been adjusted on the basis of the designated volume value registered in the designated volume value registration means, and (4) voice data synthesizing means for synthesizing the voice data processed by the voice data processing means to create the synthesized voice data.

The voice data processing program of the third aspect of the present invention is characterized in that it causes (1) a computer mounted in the voice data processing device of a telephone communication system, the system having a plurality of first telephone terminals, a voice data processing device that synthesizes the voice data supplied from all of the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the sound of the synthesized voice data created by the voice data processing device, to function as (2) designated volume value registration means for registering identification information of each first telephone terminal in association with a designated volume value, (3) voice data processing means for processing the voice data supplied from each first telephone terminal into voice data whose volume has been adjusted on the basis of the designated volume value registered in the designated volume value registration means, and (4) voice data synthesizing means for synthesizing the voice data processed by the voice data processing means to create the synthesized voice data.

The voice data processing method of the fourth aspect of the present invention is (1) a voice data processing method in a telephone communication system having a plurality of first telephone terminals, a voice data processing device that synthesizes the voice data supplied from all of the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the sound of the synthesized voice data created by the voice data processing device, characterized in that (2) designated volume value registration means, voice data processing means, and voice data synthesizing means are provided, (3) the designated volume value registration means, in the voice data processing device, registers identification information of each first telephone terminal in association with a designated volume value, (4) the voice data processing means, in the voice data processing device, processes the voice data supplied from each first telephone terminal into voice data whose volume has been adjusted on the basis of the designated volume value registered in the designated volume value registration means, and (5) the voice data synthesizing means, in the voice data processing device, synthesizes the voice data processed by the voice data processing means to create the synthesized voice data.

According to the present invention, the volume of the voice data of any given telephone terminal can be controlled when synthesizing (mixing) voice data sent from a plurality of telephone terminals.

(A) First Embodiment
Hereinafter, a first embodiment of the telephone communication system and the voice data processing device, program, and method according to the present invention will be described in detail with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the overall configuration of the telephone communication system 1 of this embodiment.

As shown in FIG. 1, the telephone communication system 1 has four telephone terminals 20, 30, 40, and 50 connected to an IP network N. In the following description, voice data is transmitted between the telephone terminals in packets in RTP format (Real-time Transport Protocol; see IETF RFC 1889), hereinafter referred to as "RTP packets".
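As an illustration of the RTP packets referred to above, the following is a minimal sketch of parsing the 12-byte fixed RTP header. The names `RtpHeader` and `parse_rtp_header` are hypothetical and do not appear in the patent; header extensions and padding are deliberately ignored for brevity.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Split an RTP packet into its fixed header fields and the payload.

    Assumption: no header extension and no padding are present.
    """
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    csrc_count = b0 & 0x0F          # each CSRC entry adds 4 bytes
    payload_type = b1 & 0x7F
    header_len = 12 + 4 * csrc_count
    return RtpHeader(version, payload_type, seq, ts, ssrc), packet[header_len:]
```

The sequence number and timestamp are the fields a receiver would typically use for reordering and jitter absorption.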

The telephone terminal 20 is a telephone terminal connected to the IP network N and has a call processing device 21. The telephone terminal 20 connects to the telephone terminals 30 to 50 and provides a telephone conference with the users of those terminals. Although not shown in FIG. 1, the telephone terminal 20 is assumed to also have the other components that a telephone terminal normally has, such as a speaker and a microphone.

The call processing device 21 is responsible, in the telephone terminal 20, for processing related to calls such as telephone conferences, and has a telephone function unit 22, a conference function unit 23, and a voice data processing device 24.

The call processing device 21 may be mounted on a telephone terminal, or may be implemented as a softphone on an information processing device such as a personal computer.

The telephone function unit 22 is responsible for voice calls in the call processing device 21.

The conference function unit 23 provides, in the call processing device 21, the conference functions needed to talk with a plurality of parties (telephone terminals 30 to 50) at the same time, for example session management with the telephone terminals participating in the conference. The conference function of an existing telephone conference system can be used as the conference function unit 23.

FIG. 2 is a block diagram showing the internal functional configuration of the voice data processing device 24.

The voice data processing device 24 is built by installing the voice data processing program and related software of the embodiment on an information processing device that has a program execution configuration such as a CPU, ROM, RAM, EEPROM, and hard disk (not limited to a single device; a plurality of devices performing distributed processing may be used). Functionally, it can be represented as in FIG. 2. For example, when the telephone terminal 20 is implemented on an information processing device such as a personal computer, the voice data processing device 24 may also be built by installing the voice data processing program and related software on that same information processing device.

The voice data processing device 24 mixes the voice data of the RTP packets received from a plurality of parties (telephone terminals 30 to 50), with a volume specified for each party, and outputs the result as a single stream of voice data. It has a control unit 243, a volume control unit 244, and packet processing units 245 (245-1 to 245-3).

The UDP/IP control unit 246 serves as a general-purpose socket that receives RTP packets.

The packet processing units 245 (245-1 to 245-3) perform processing such as buffering and source analysis on the RTP packets received by the UDP/IP control unit 246, and have packet analysis units 245a (245a-1 to 245a-3) and buffer units 245b (245b-1 to 245b-3).

In the voice data processing device 24, a packet processing unit 245 is provided for each communication partner. Since there are three partners, the telephone terminals 30 to 50, FIG. 2 shows three packet processing units 245-1 to 245-3. The number of packet processing units 245 may be fixed or may vary dynamically. For example, each packet processing unit 245 may be created as a thread in the program, with the number of threads changing dynamically. In this embodiment, it is assumed that when the UDP/IP control unit 246 receives an RTP packet, a packet processing unit 245 thread is created for each packet source. Alternatively, a plurality of packet processing units 245 may be provided in advance and only as many as there are communication partners used. Thus, the way in which the packet processing units 245 are arranged (created) and managed in the voice data processing device 24 is not limited.
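The thread-per-source arrangement described above could be sketched as follows. This is only one possible reading: the class names `PacketProcessingUnit` and `Dispatcher`, the use of a queue per source, and the `None` shutdown marker are all assumptions made for illustration, not details given in the patent.

```python
import queue
import threading


class PacketProcessingUnit(threading.Thread):
    """One worker thread per packet source (hypothetical stand-in for unit 245)."""

    def __init__(self, source_ip, sink):
        super().__init__(daemon=True)
        self.source_ip = source_ip
        self.inbox = queue.Queue()
        self.sink = sink  # shared list standing in for the next processing stage

    def run(self):
        while True:
            packet = self.inbox.get()
            if packet is None:  # shutdown marker
                break
            self.sink.append((self.source_ip, packet))


class Dispatcher:
    """Creates a processing unit the first time a source is seen."""

    def __init__(self):
        self.units = {}
        self.delivered = []

    def on_packet(self, source_ip, packet):
        unit = self.units.get(source_ip)
        if unit is None:
            unit = PacketProcessingUnit(source_ip, self.delivered)
            self.units[source_ip] = unit
            unit.start()
        unit.inbox.put(packet)

    def close(self):
        for unit in self.units.values():
            unit.inbox.put(None)
        for unit in self.units.values():
            unit.join()
```

A fixed pool created in advance, which the text also allows, would replace the on-demand creation in `on_packet` with a lookup into preallocated units.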

When RTP packets are received, the buffer units 245b-1 to 245b-3 act as jitter buffers: for each party they absorb the delay and jitter characteristic of VoIP and supply the RTP packets to the packet analysis units 245a-1 to 245a-3.
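A jitter buffer of the kind mentioned above can be sketched very simply as a reordering queue that holds back a few packets before releasing them in sequence order. The class name `JitterBuffer` and the fixed `depth` parameter are hypothetical; sequence-number wraparound and timing-based playout, which a real buffer would need, are left out.

```python
import heapq


class JitterBuffer:
    """Holds back up to `depth` packets and releases the rest in sequence order."""

    def __init__(self, depth=3):
        self.depth = depth
        self._heap = []

    def push(self, sequence, payload):
        # Packets may arrive out of order; the heap keeps the lowest sequence on top.
        heapq.heappush(self._heap, (sequence, payload))

    def pop_ready(self):
        # Release packets only while more than `depth` are buffered.
        ready = []
        while len(self._heap) > self.depth:
            ready.append(heapq.heappop(self._heap))
        return ready
```

A larger `depth` tolerates more reordering and delay variation at the cost of added latency.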

The packet analysis units 245a-1 to 245a-3 analyze the source of the RTP packets supplied from the buffer units 245b-1 to 245b-3. In this embodiment, as an example, the packet analysis units 245a-1 to 245a-3 analyze the source by extracting the source IP address from the header information of the RTP packet. The packet analysis units 245a-1 to 245a-3 then supply the RTP packet, paired with the source IP address extracted from it, to the volume control unit 244. In the following description, the IP address of the telephone terminal 30 is A, that of the telephone terminal 40 is B, and that of the telephone terminal 50 is C.

The control unit 243 provides an interface for volume adjustment to the higher-level function units (the telephone function unit 22, the conference function unit 23, and so on), holds information on the parties currently on the call, and has a volume control designation unit 243a.

The volume control designation unit 243a registers the identification information of each remote telephone terminal currently connected to the telephone conference on the telephone terminal 20 in association with a volume value for that terminal (hereinafter, the "designated volume value"). The designated volume value may be a concrete volume such as 10 dB or 20 dB, or a coefficient such as 1, 2, 3, and so on; in this embodiment it is assumed to be a coefficient. The volume control designation unit 243a may also, for example, hold a preset default designated volume value and raise or lower it from the default in response to user operations.
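The registration behavior described above might look like the following minimal sketch. The class name `DesignatedVolumeRegistry`, the default of 2, and the clamping range 0 to 5 are assumptions chosen for illustration; the patent does not specify a range.

```python
class DesignatedVolumeRegistry:
    """Maps a terminal identifier (e.g. an IP address) to a volume coefficient."""

    def __init__(self, default=2, minimum=0, maximum=5):
        self.default = default
        self.minimum = minimum
        self.maximum = maximum
        self._values = {}

    def _clamp(self, value):
        return max(self.minimum, min(self.maximum, value))

    def register(self, terminal_id, value=None):
        # A terminal registered without a value starts at the default.
        chosen = self.default if value is None else value
        self._values[terminal_id] = self._clamp(chosen)

    def adjust(self, terminal_id, step):
        # Raise or lower the value in response to a user operation.
        current = self._values.get(terminal_id, self.default)
        self._values[terminal_id] = self._clamp(current + step)
        return self._values[terminal_id]

    def lookup(self, terminal_id):
        return self._values.get(terminal_id, self.default)
```

For example, a "volume up" button for terminal A would call `adjust("A", +1)`, and the change can be made while the conference is in progress.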

The designated volume value set in the volume control designation unit 243a may be preset, or may be changed in response to user operations. For example, the telephone terminal 20 may have buttons for raising or lowering the designated volume value for each remote telephone terminal (IP address) in the conference, or a numeric keypad for entering the designated volume value. When the call processing device 21 is implemented as a softphone on an information processing device such as a personal computer, the user may enter the designated volume value for each remote telephone terminal (IP address) by mouse or keyboard operations on that device. The means by which the user enters the designated volume value is not limited. The designated volume value set in the volume control designation unit 243a may also be changed in response to user operations while the telephone terminal 20 is connected to the telephone conference.

The identification information of the remote telephone terminals to be registered in the volume control designation unit 243a may, for example, be extracted from the session information that the higher-level function units (the telephone function unit 22, the conference function unit 23, and so on) use when setting up (connecting) the telephone conference.

The volume control unit 244 performs volume control on the RTP packets supplied from the packet processing units 245-1 to 245-3, and has a gain adjustment unit 244a and a packet control unit 244b.

FIG. 3 is an explanatory diagram showing the processing of the packet control unit 244b.

As shown in FIG. 3, when the packet control unit 244b is supplied by the packet analysis units 245a-1 to 245a-3 with an RTP packet paired with its source IP address, it inserts the designated volume value registered in the volume control designation unit 243a into that RTP packet and supplies the packet to the gain adjustment unit 244a.

Specifically, the packet control unit 244b searches the information registered in the volume control designation unit 243a for the designated volume value corresponding to the source IP address, and inserts that designated volume value into the RTP packet.
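The lookup-and-insert step can be sketched as below. The patent speaks of inserting the value into the RTP packet itself without giving a layout, so this sketch makes the assumption of carrying the value alongside the untouched packet in a small record; `TaggedPacket` and `attach_designated_volume` are hypothetical names, and the fallback default of 1 is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TaggedPacket:
    source_ip: str
    rtp_packet: bytes
    designated_volume: int


def attach_designated_volume(source_ip: str, rtp_packet: bytes,
                             registered: Dict[str, int],
                             default: int = 1) -> TaggedPacket:
    """Look up the value registered for the source and pair it with the packet."""
    value = registered.get(source_ip, default)
    return TaggedPacket(source_ip, rtp_packet, value)
```

Keeping the RTP bytes unmodified avoids having to define a private header field, while still giving the gain stage everything it needs.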

FIG. 4 is an explanatory diagram showing the processing of the gain adjustment unit 244a and the mixing unit 242.

As shown in FIG. 4, the gain adjustment unit 244a extracts the voice data (for example, PCM data) from the RTP packet supplied by the packet control unit 244b, changes the volume of that voice data to the volume specified by the designated volume value inserted by the packet control unit 244b (gain adjustment), and supplies the result to the mixing unit 242. The larger the designated volume value inserted in an RTP packet, the louder the gain adjustment unit 244a makes its voice data. For example, the gain adjustment unit 244a adjusts the voice data of an RTP packet with a designated volume value of 3 to be louder than that of an RTP packet with a designated volume value of 1.

For example, when coefficients such as 1, 2, 3, and so on are assigned as designated volume values, the volume corresponding to each coefficient (for example, 10 dB for a designated volume value of 1 and 20 dB for 2) may be set in the gain adjustment unit 244a in advance, and the voice data adjusted to the corresponding volume.
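A coefficient-to-volume table of this kind could be sketched as follows. The patent does not say whether the dB figures are absolute levels or relative gains; this sketch assumes they are relative gains applied to 16-bit PCM samples in native byte order, with saturation at the 16-bit limits. `COEFF_TO_DB`, `db_to_linear`, and `apply_gain` are hypothetical names.

```python
import array

# Assumed table: designated volume coefficient -> gain in dB (values from the text).
COEFF_TO_DB = {1: 10.0, 2: 20.0}


def db_to_linear(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def apply_gain(pcm: bytes, gain: float) -> bytes:
    """Scale signed 16-bit PCM samples by `gain`, clamping to the int16 range."""
    samples = array.array("h")
    samples.frombytes(pcm)
    for i, sample in enumerate(samples):
        scaled = int(round(sample * gain))
        samples[i] = max(-32768, min(32767, scaled))
    return samples.tobytes()
```

With this reading, a coefficient of 2 corresponds to an amplitude factor of 10, so loud input will saturate; a practical table would likely use smaller steps.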

Alternatively, the gain adjustment unit 244a may set the volume of the voice data according to the ratio of the designated volume values. For example, suppose the designated volume value for RTP packets from source IP address A is 2 and the volume after gain adjustment is X, the designated volume value for source IP address B is 1 and the adjusted volume is Y, and the designated volume value for source IP address C is 1 and the adjusted volume is Z. The volumes can then be adjusted so that X : Y : Z = 2 : 1 : 1.
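The ratio-based variant can be illustrated with a small helper that turns designated volume values into linear gains. Normalizing so that the largest value maps to a gain of 1.0 is an assumption made here to avoid amplifying any stream; the patent only requires that the ratio be preserved. `gains_from_ratio` is a hypothetical name.

```python
def gains_from_ratio(designated):
    """Return per-source linear gains proportional to the designated values.

    The largest designated value is mapped to 1.0 (an assumption), so
    {"A": 2, "B": 1, "C": 1} yields gains in the ratio 2 : 1 : 1.
    """
    if not designated:
        return {}
    peak = max(designated.values())
    if peak <= 0:
        raise ValueError("at least one designated volume value must be positive")
    return {source: value / peak for source, value in designated.items()}
```

Applying these gains before mixing leaves the loudest party at its original level and attenuates the others accordingly.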

As shown in FIG. 4, when the mixing unit 242 is supplied by the gain adjustment unit 244a with the voice data for each source IP address, it mixes (synthesizes) them into a single stream of voice data and supplies it to the sound output unit 241. To synthesize the streams, the mixing unit 242 can use, for example, an existing method for combining PCM data, or the method used in an existing VoIP telephone conference system to combine the voices from a plurality of telephone terminals.
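One common way to combine PCM streams, shown here only as a sketch, is to add the samples position by position and saturate at the 16-bit limits. The patent does not prescribe this method; `mix` is a hypothetical name, and the input is assumed to be signed 16-bit PCM in native byte order at a common sample rate.

```python
import array


def mix(streams):
    """Sum signed 16-bit PCM streams sample by sample, with saturation.

    Streams of different lengths are allowed; shorter ones contribute
    silence once they run out.
    """
    decoded = []
    for pcm in streams:
        samples = array.array("h")
        samples.frombytes(pcm)
        decoded.append(samples)
    length = max((len(s) for s in decoded), default=0)
    out = array.array("h", [0]) * length
    for i in range(length):
        total = sum(s[i] for s in decoded if i < len(s))
        out[i] = max(-32768, min(32767, total))
    return out.tobytes()
```

Because the gain has already been applied per source, the mixer itself stays independent of the designated volume values.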

The sound output unit 241 outputs the voice data supplied from the mixing unit 242 through an audio output device (speaker).

(A-2) Operation of the First Embodiment
Next, the overall operation of the telephone communication system 1 of the first embodiment configured as described above (the voice data processing method of the embodiment) will be described.

FIG. 5 is a flowchart illustrating the operation of the call processing device 21 in the telephone communication system 1.

The description of FIG. 5 assumes that, in the voice data processing device 24, three packet processing units 245-1 to 245-3 (threads) have been created under the control of the UDP/IP control unit 246. The packet processing unit 245-1 was created for the telephone terminal 30, the packet processing unit 245-2 for the telephone terminal 40, and the packet processing unit 245-3 for the telephone terminal 50.

First, when the UDP/IP control unit 246 receives RTP packets from the other telephone terminals 30 to 50 (S101), the packets are supplied to the buffer units 245b-1 to 245b-3. Once a certain number of RTP packets have been buffered in the buffer units 245b-1 to 245b-3, the packet analysis units 245a-1 to 245a-3 analyze the source IP address of each RTP packet (S102).

Next, based on the designated volume values set in the control unit 243 (volume control designation unit 243a), the packet control unit 244b inserts, for each source IP address, the designated volume value into the RTP packets analyzed by the packet analysis units 245a-1 to 245a-3 (S103).

The gain adjustment unit 244a then extracts the voice data from the RTP packets into which the packet control unit 244b inserted the designated volume values, and adjusts the volume according to the designated volume value (S104).

The mixing unit 242 then mixes (synthesizes) the voice data whose volume was adjusted by the gain adjustment unit 244a, at the volume ratio set by the gain adjustment unit 244a, and generates the mixed voice data (S105).

Finally, the sound output unit 241 converts the voice data mixed by the mixing unit 242 into an appropriate output format (S106), and the speaker of the telephone terminal 20 outputs it.
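The flow from S102 to S105 can be condensed into a single function as a rough sketch. It works on already-decoded sample lists rather than RTP packets, and the names `process_frame` and `clamp16`, the fallback coefficient of 1, and the `gain_table` mapping coefficients to linear gains are all assumptions for illustration.

```python
def clamp16(value):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def process_frame(frames, designated, gain_table):
    """frames: {source_ip: [int samples]}, designated: {source_ip: coefficient},
    gain_table: {coefficient: linear gain}. Returns the mixed samples."""
    adjusted = []
    for source_ip, samples in frames.items():        # S102: identify the source
        coefficient = designated.get(source_ip, 1)   # S103: attach designated value
        gain = gain_table[coefficient]
        adjusted.append(                             # S104: gain adjustment
            [clamp16(round(s * gain)) for s in samples]
        )
    length = max((len(a) for a in adjusted), default=0)
    return [                                         # S105: mixing
        clamp16(sum(a[i] for a in adjusted if i < len(a)))
        for i in range(length)
    ]
```

Reception (S101) and output conversion (S106) are omitted because they depend on the socket and audio device in use.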

(A-3) Effects of the First Embodiment
According to the first embodiment, the following effects can be obtained.

In the voice data processing device, the larger the designated volume value, the louder the voice is mixed, so the voice of a specific party can be output clearly. In particular, when the level of background noise differs from party to party, it may be necessary to lower the voice of a party with loud background noise, and such cases can also be handled.

In addition, the packet control unit inserts into each RTP packet the designated volume value corresponding to its source IP address, and the gain adjustment unit adjusts the volume of the voice data based on the inserted designated volume value. As a result, the voice volume of the speaker at any given telephone terminal can be controlled in response to user operations, not only in telephone conferences where speakers take turns, such as PoC, but also in telephone conferences where speakers at a plurality of telephone terminals talk at the same time.

(B) Second Embodiment
A second embodiment of the telephone communication system, voice data processing device, program, and method is described in detail below with reference to the drawings.

(B-1) Configuration of the Second Embodiment
FIG. 6 is a block diagram showing the overall configuration of the telephone communication system of the second embodiment.

The first embodiment showed an example in which the call processing device 21 is mounted in the telephone terminal 20 itself. The second embodiment, as shown in FIG. 6, shows an example in which the means corresponding to the voice data processing device of the first embodiment is placed, as the voice data processing device 61, on the server 60, which is a separate device on the IP network N. The following describes how the second embodiment differs from the first.

The IP network N is the same as in the first embodiment, so a detailed description is omitted.

In the second embodiment, the telephone terminals 20A, 30A, and 40A do not exchange voice packets directly with the other telephone terminals; instead, they communicate voice packets via the server 60, on which the voice data processing device 61 is mounted. In the first embodiment, the telephone terminal 20A receives voice packets from the other telephone terminals 30A and 40A and mixes and outputs the voice carried by those packets. In the second embodiment, the server 60 (call processing device 41) receives RTP packets from the telephone terminals 20A to 40A, generates mixed voice data, and then distributes that voice data to each telephone terminal. When mixing the voice data, the server 60 (voice data processing device 61) mixes at the volume values designated at each telephone terminal (by each user).

The telephone terminal 20A has a call processing device 21A, which has a telephone function unit 22, a conference function unit 23, a multi-input voice control unit 24, and a volume control designation unit 25. The call processing device of the first embodiment had a voice data processing device; in the second embodiment it has no voice data processing device and has only the volume control designation unit 25. The telephone function unit 22 and the conference function unit 23 are almost the same as in the first embodiment, so a detailed description is omitted, but they differ in that RTP packets are not sent directly to the other party's telephone terminal and instead go through the server 60 (voice data processing device 61).

The volume control designation unit 25 is almost the same as the volume control designation unit 243a of the first embodiment, but differs in that it gives the designated volume values to the server 60, which is an external device. The designated volume values given from the volume control designation unit 25 to the server 60 (voice data processing device 61) may be held in the server 60 (voice data processing device 61). The volume control designation unit 25 may give the designated volume values to the server 60 periodically, or only when a designated volume value is changed by a user operation; the timing is not limited.
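The two notification timings mentioned above (on change, or periodically) could look like the following sketch on the terminal side. Everything here is hypothetical: the class name `VolumeNotifier`, the JSON message layout, and the `send` callback standing in for whatever transport carries the volume designation packet to the server. The specification does not define a message format.

```python
import json
from typing import Callable, Dict


class VolumeNotifier:
    """Terminal-side sketch: report designated volume values to the server."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send                     # hypothetical transport callback
        self._current: Dict[str, int] = {}    # values chosen by the user
        self._last_sent: Dict[str, int] = {}  # values the server already has

    def set_volume(self, terminal_id: str, volume: int) -> None:
        """On-change timing: notify only when the value actually differs."""
        self._current[terminal_id] = volume
        if self._last_sent.get(terminal_id) != volume:
            self._push({terminal_id: volume})

    def resend_all(self) -> None:
        """Periodic timing: call this from a timer to refresh every value."""
        if self._current:
            self._push(dict(self._current))

    def _push(self, values: Dict[str, int]) -> None:
        # The JSON layout is an assumption made for this example only.
        self._send(json.dumps(values, sort_keys=True).encode("utf-8"))
        self._last_sent.update(values)
```

Sending only on change keeps traffic low, while the periodic resend lets a server that does not persist the values recover them; which one fits depends on whether the server holds the designated values, which the text leaves open.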

FIG. 7 is a block diagram illustrating the functional configuration inside the voice data processing device of this embodiment.

The voice data processing device 61 has mixing processing units 611-1 to 611-3, a packet control unit 612, packet processing units 613 (613-1 to 613-3), and a UDP/IP control unit 614.

The UDP/IP control unit 614 serves as the communication interface that sends and receives data such as RTP packets to and from the telephone terminals 20A to 40A. When a packet relating to volume designation is given from the telephone terminals 20A to 40A, it passes that packet to the packet control unit 612.

The packet processing units 613 (613-1 to 613-3) are almost the same as the packet processing units 245 (245-1 to 245-3) of the first embodiment, so a detailed description is omitted. Because the voice data processing device 61 processes voice data for the three telephone terminals 20A to 40A, three packet processing units 613-1 to 613-3 (their threads) are created and placed, as shown in FIG. 7.

The packet control unit 612 is almost the same as the packet control unit 244b of the first embodiment. In the second embodiment, however, the voice data processing device 61 processes voice data for multiple telephone terminals, so when designated volume value data is given from each of the telephone terminals 20A to 40A, the packet control unit 612 holds that data and performs the processing relating to volume designation.

The mixing processing units 611 (611-1 to 611-3) perform mixing processing on the voice data given from the packet control unit 612, and have mixing units 611a (611a-1 to 611a-3) and volume control units 611b (611b-1 to 611b-3). The volume control units 611b (611b-1 to 611b-4) have Gain adjustment units 611c (611c-1 to 611c-4).

Like the packet processing units 613, at least as many mixing processing units 611 (611-1 to 611-3) are placed as there are telephone terminals whose voice data is to be processed. Because the voice data processing device 61 processes voice data for the three telephone terminals 20A to 40A, three mixing processing units 611-1 to 611-3 are placed, as shown in FIG. 7. The number of mixing processing units 611 in the voice data processing device 61 may be fixed or may vary dynamically. For example, each mixing processing unit 611 may be created as a single thread in the program, with the number of threads varied dynamically. This embodiment is described on the assumption that, when the UDP/IP control unit 614 receives an RTP packet, a thread of the mixing processing unit 611 is created for each packet source. Alternatively, multiple mixing processing units 611 may be placed in the voice data processing device 61 in advance and used in a number equal to the number of communication partners. Thus, the placement and management method of the mixing processing units 611 in the voice data processing device 61 are not limited.
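One way to picture the dynamic variant, in which a thread is created the first time a packet arrives from a new source, is the sketch below. It is a simplified illustration under stated assumptions: `MixingWorkerPool` and its methods are hypothetical names, and the worker body only records payloads as a placeholder for the real gain adjustment and mixing.

```python
import queue
import threading
from typing import Dict, List


class MixingWorkerPool:
    """Creates one worker thread per packet source, on first arrival."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queues: Dict[str, queue.Queue] = {}
        self._threads: Dict[str, threading.Thread] = {}
        self.processed: Dict[str, List[bytes]] = {}

    def dispatch(self, source_ip: str, payload: bytes) -> None:
        """Route a payload to its source's worker, creating it if needed."""
        with self._lock:
            if source_ip not in self._queues:
                q: queue.Queue = queue.Queue()
                self._queues[source_ip] = q
                self.processed[source_ip] = []
                t = threading.Thread(target=self._run, args=(source_ip, q), daemon=True)
                self._threads[source_ip] = t
                t.start()
        self._queues[source_ip].put(payload)

    def _run(self, source_ip: str, q: queue.Queue) -> None:
        while True:
            item = q.get()
            if item is None:  # sentinel: stop this worker
                break
            # Placeholder for gain adjustment and mixing of this source's data.
            self.processed[source_ip].append(item)

    def worker_count(self) -> int:
        return len(self._threads)

    def shutdown(self) -> None:
        """Stop all workers after they drain their queues."""
        for q in self._queues.values():
            q.put(None)
        for t in self._threads.values():
            t.join()
```

The fixed alternative mentioned in the text would instead start a set number of workers up front and assign sources to them; the dispatch interface could stay the same.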

The mixing units 611a and the Gain adjustment units 611c are almost the same as the mixing unit 242 and the Gain adjustment unit 244a of the first embodiment, so a detailed description is omitted. The second embodiment differs from the first in that the mixing processing units 611-1 to 611-3 (mixing units 611a-1 to 611a-3 and Gain adjustment units 611c-1 to 611c-3) each perform volume adjustment and mixing of the voice data for their corresponding telephone terminal among the telephone terminals 20A to 40A. In FIG. 7, the mixing processing units 611-1 to 611-3 are assumed to mix the voice data for the telephone terminals 20A to 40A, respectively. The mixing processing units 611-1 to 611-3 also distribute packets carrying the mixed voice data to the respective telephone terminals 20A to 40A via the UDP/IP control unit 614.

(B-2) Operation of the Second Embodiment
Next, the operation of the telephone communication system of the second embodiment configured as described above (the voice data processing method of this embodiment) is described.

FIG. 8 is a flowchart showing the operation of the telephone communication system 1A of the second embodiment.

The description of FIG. 8 assumes that, in the voice data processing device 61, three packet processing units 613-1 to 613-3 (their threads) have been created under the control of the UDP/IP control unit 614, and that the packet processing units 613-1 to 613-3 have been created to correspond to the telephone terminals 20A to 40A, respectively.

First, when the UDP/IP control unit 614 receives RTP packets from the telephone terminals 20A to 40A (S201), the packets are passed to the packet processing units 613-1 to 613-3, where a certain amount of RTP packets is buffered, and the packet analysis units 613a-1 to 613a-3 analyze the source of each RTP packet (S202).

Then, based on the designated volume values given from the telephone terminals 20A to 40A, the packet control unit 612 inserts, for each telephone terminal, a volume value into the RTP packets analyzed by the packet analysis units 613a-1 to 613a-3 (S203).

Next, the Gain adjustment units 611c-1 to 611c-3 change the volume according to the volume value that the packet control unit 612 added to each voice packet (S204).

Next, to combine the RTP packets whose volume was changed by the Gain adjustment units 611c-1 to 611c-3 into a single voice output, the mixing unit 242 mixes them at the volume ratio set by the Gain adjustment units 611c-1 to 611c-3 and generates mixed voice data (S205).

Next, the voice data mixed by the mixing units 611a-1 to 611a-3 is distributed via the UDP/IP control unit 614 to the respective telephone terminals 20A to 40A and output at each of them (S206).
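Steps S203 to S205 on the server can be sketched as a per-listener mix, where each listener's own table of designated volume values decides how loudly every other terminal is heard. The sketch rests on several assumptions that the text does not state: the function names are hypothetical, frames are already decoded 16-bit PCM, volume values are percentages with 100 meaning unchanged, and a listener's own voice is left out of the mix sent back to that listener.

```python
from typing import Dict, List


def clip16(value: float) -> int:
    """Clamp a mixed sample to the signed 16-bit range."""
    return max(-32768, min(32767, int(round(value))))


def mix_for_listener(listener: str,
                     frames: Dict[str, List[int]],
                     volume_tables: Dict[str, Dict[str, int]],
                     exclude_self: bool = True) -> List[int]:
    """Mix all sources for one listener using that listener's volume table."""
    table = volume_tables.get(listener, {})
    # Assumption: a listener does not hear their own voice in the returned mix.
    sources = [s for s in frames if not (exclude_self and s == listener)]
    if not sources:
        return []
    n = min(len(frames[s]) for s in sources)
    return [clip16(sum(frames[s][i] * table.get(s, 100) / 100.0 for s in sources))
            for i in range(n)]


def server_step(frames: Dict[str, List[int]],
                volume_tables: Dict[str, Dict[str, int]]) -> Dict[str, List[int]]:
    """Produce one mixed frame per terminal, ready to be distributed (S206)."""
    return {listener: mix_for_listener(listener, frames, volume_tables)
            for listener in frames}
```

For example, if terminal 20A has set 30A to 50 and left 40A at the default, only the mix returned to 20A halves 30A's samples; the mixes for 30A and 40A are unaffected by 20A's settings.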

(B-3) Effects of the Second Embodiment
According to the second embodiment, the following effects can be achieved in addition to those of the first embodiment.

In the second embodiment, the voice data processing device is built as an external device, so, unlike the first embodiment, the telephone terminal itself need not be equipped with a voice data processing device.

Because the processing for volume adjustment and mixing is performed on the server rather than on the telephone terminal, the resources required at the telephone terminal, such as processing load and storage capacity, can be reduced.

(C) Other Embodiments
The present invention is not limited to the embodiments described above, and modified embodiments such as those exemplified below are also possible.

(C-1) The above embodiments described an example in which the voice data sent from a telephone terminal uses the RTP packet format, but packets of other formats, such as RTSP (Real Time Streaming Protocol; see IETF RFC 2326), may also be used.

(C-2) The first embodiment described an example in which the voice data processing device is mounted in the telephone terminal, and the second embodiment an example in which it is placed outside the telephone terminal. A telephone communication system may, however, combine both kinds of telephone terminal: those with a built-in voice data processing device and those that rely on an externally placed one. In that case, a telephone terminal without a voice data processing device sends its RTP packets both to the server (external voice data processing device) and to the telephone terminals with a built-in voice data processing device, and a telephone terminal with a built-in voice data processing device sends its RTP packets to the other telephone terminals with a built-in voice data processing device and to the server (external voice data processing device).

As a result, the same effects as in the first and second embodiments can be achieved even when, for example, telephone terminals with different processing capabilities and functions coexist.
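The sending rule for the mixed configuration in (C-2) can be written down as a small routing function. This is a hypothetical sketch: the name `rtp_destinations`, the `"server"` label, and the capability map are all assumptions made for the example.

```python
from typing import Dict, List

SERVER = "server"  # hypothetical label for the external voice data processing device


def rtp_destinations(sender: str, has_builtin_mixer: Dict[str, bool]) -> List[str]:
    """Return where `sender` sends its RTP packets in the mixed configuration.

    Per the rule described above, every terminal sends to the server and to
    each other terminal that has a built-in voice data processing device.
    """
    builtin_peers = sorted(t for t, builtin in has_builtin_mixer.items()
                           if builtin and t != sender)
    return [SERVER] + builtin_peers
```

Written this way, the rule turns out to be the same for both kinds of terminal: what differs is only whether the sender itself also receives packets from its peers.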

FIG. 1 is a block diagram showing the overall configuration of the telephone communication system according to the first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the voice data processing device according to the first embodiment.
FIG. 3 is an explanatory diagram showing the processing of the packet control unit according to the first embodiment.
FIG. 4 is an explanatory diagram showing the processing of the Gain adjustment unit and the mixing unit according to the first embodiment.
FIG. 5 is a flowchart explaining the operation of the telephone communication system according to the first embodiment.
FIG. 6 is a block diagram showing the overall configuration of the telephone communication system according to the second embodiment.
FIG. 7 is a block diagram showing the functional configuration of the voice data processing device according to the second embodiment.
FIG. 8 is a flowchart explaining the operation of the telephone communication system according to the second embodiment.

Explanation of symbols

1 ... telephone communication system, N ... IP network, 20, 30, 40, 50 ... telephone terminal, 21 ... call processing device, 22 ... telephone function unit, 23 ... conference function unit, 24 ... voice data processing device, 241 ... sound output unit, 242 ... mixing unit, 243 ... control unit, 243a ... volume control designation unit, 244 ... volume control unit, 244a ... Gain adjustment unit, 244b ... packet control unit, 245, 245-1 to 245-3 ... packet processing unit, 245a, 245a-1 to 245a-3 ... packet analysis unit, 245b, 245b-1 to 245b-3 ... buffer unit, 246 ... UDP/IP control unit.

Claims

A telephone communication system having a plurality of first telephone terminals, a voice data processing apparatus that synthesizes voice data provided from all the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the voice of the synthesized voice data created by the voice data processing apparatus, wherein
the voice data processing apparatus comprises:
designated volume value registration means for registering the identification information of each first telephone terminal in association with a designated volume value;
voice data processing means for processing the voice data given from each of the first telephone terminals into voice data whose volume is adjusted based on the designated volume value registered in the designated volume value registration means; and
voice data synthesis means for synthesizing the voice data processed by the voice data processing means to create synthesized voice data.

The telephone communication system according to claim 1, wherein
the voice data given from each first telephone terminal to the second telephone terminal is inserted into a packet given from that first telephone terminal to the second telephone terminal,
the voice data processing apparatus further comprises designated volume value insertion means that, when a packet is given from each first telephone terminal, detects the designated volume value registered in the designated volume value registration means in correspondence with the identification information of the source first telephone terminal inserted in the packet, and inserts data of the detected designated volume value into the packet, and
the voice data processing means extracts the voice data and the designated volume value from the packet into which the designated volume value has been inserted by the designated volume value insertion means, and processes the extracted voice data into voice data whose volume is adjusted based on the extracted designated volume value.

The telephone communication system according to claim 1 or 2, wherein
the second telephone terminal further comprises a designated volume value input unit that, when a designated volume value is input by the user, takes in the data of the designated volume value and notifies the voice data processing apparatus of it, and
the voice data processing apparatus further comprises a designated volume value update registration unit that, when notified of the designated volume value by the second telephone terminal,
registers the designated volume value in the designated volume value registration means or updates the designated volume value already registered there.

A voice data processing apparatus constituting a telephone communication system that has a plurality of first telephone terminals, the voice data processing apparatus, which synthesizes voice data provided from all the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the voice of the synthesized voice data created by the voice data processing apparatus, the voice data processing apparatus comprising:
designated volume value registration means for registering the identification information of each first telephone terminal in association with a designated volume value;
voice data processing means for processing the voice data given from each of the first telephone terminals into voice data whose volume is adjusted based on the designated volume value registered in the designated volume value registration means; and
voice data synthesis means for synthesizing the voice data processed by the voice data processing means to create synthesized voice data.

A voice data processing program that causes a computer mounted in a voice data processing apparatus, the apparatus constituting a telephone communication system that has a plurality of first telephone terminals, the voice data processing apparatus, which synthesizes voice data provided from all the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the voice of the synthesized voice data created by the voice data processing apparatus, to function as:
designated volume value registration means for registering the identification information of each first telephone terminal in association with a designated volume value;
voice data processing means for processing the voice data given from each of the first telephone terminals into voice data whose volume is adjusted based on the designated volume value registered in the designated volume value registration means; and
voice data synthesis means for synthesizing the voice data processed by the voice data processing means to create synthesized voice data.

A voice data processing method in a telephone communication system having a plurality of first telephone terminals, a voice data processing apparatus that synthesizes voice data provided from all the first telephone terminals to create synthesized voice data, and a second telephone terminal that outputs the voice of the synthesized voice data created by the voice data processing apparatus, wherein
designated volume value registration means, voice data processing means, and voice data synthesis means are provided,
the designated volume value registration means registers, in the voice data processing apparatus, the identification information of each first telephone terminal in association with a designated volume value,
the voice data processing means processes, in the voice data processing apparatus, the voice data given from each of the first telephone terminals into voice data whose volume is adjusted based on the designated volume value registered in the designated volume value registration means, and
the voice data synthesis means synthesizes, in the voice data processing apparatus, the voice data processed by the voice data processing means to create synthesized voice data.