JP2022108957A

JP2022108957A - Data processing device, data processing system, and voice processing method

Info

Publication number: JP2022108957A
Application number: JP2021004208A
Authority: JP
Inventors: 真鳥越; Makoto Torigoe
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2021-01-14
Filing date: 2021-01-14
Publication date: 2022-07-27

Abstract

To allow data arrival status to be confirmed without confirmation by conversation between users. SOLUTION: A data processing device includes: a receiving unit that receives first coded data from another data processing device; a decoding unit that decodes the first coded data received by the receiving unit; an encoding unit that encodes data obtained by the decoding unit to generate second coded data; and a control unit that controls transmission of the second coded data to the other communication device when it is determined that the first coded data is data transmitted in a specific operation mode from the other communication device. SELECTED DRAWING: Figure 2

Description

本発明は、データ処理装置、データ処理システム、音声処理方法に関する。 The present invention relates to a data processing device, a data processing system, and an audio processing method.

近年、パンデミック対策として各企業においてテレワークの導入が加速している。テレワークにおいては、オフィスと在宅のテレワーカー、または在宅のテレワーカー同士がインターネット経由で結ばれ、ネットワーク対応のＴＶ会議システムまたはＰＣ上のビデオ通話ソフトを用いて、リモート会議、研修および遠隔コミュニケーションなどが行われる。 In recent years, companies have been accelerating the introduction of telework as a countermeasure against pandemics. In telework, an office and teleworkers at home, or teleworkers at home, are connected to one another via the Internet, and remote meetings, training, remote communication, and the like are conducted using a network-compatible TV conference system or video call software on a PC.

上記のＴＶ会議システムおよびビデオ通話ソフトでは、使用される装置の種別や構成によっては会議参加時に映像や音声に不具合が生じる場合がある。このため、利用者が会議に参加する前にＴＶ会議システムの動作確認を行える仕組みが知られている。例えば、特許文献１には、会議で使用する通常モードと、事前確認用のセルフチェックモードを備え、遠隔会議時に相手側に伝えられる音声品質を確認できる技術が開示されている。当該技術では、セルフチェックモードにおいて、利用者の発話音声が圧縮された後、当該利用者の端末内で圧縮された発話音声がループバックされ、さらに伸張された後に発話音声が再生される。これにより、利用者は、音声の圧縮と伸張による劣化具合を確認し得る。 In the above TV conference system and video call software, depending on the type and configuration of the device used, problems may occur in video and audio when participating in a conference. For this reason, a mechanism is known in which a user can check the operation of the TV conference system before participating in the conference. For example, Patent Literature 1 discloses a technology that has a normal mode used in a conference and a self-check mode for preliminary confirmation, and allows confirmation of the voice quality transmitted to the other party during a teleconference. In the technology, in the self-check mode, after the user's uttered voice is compressed, the compressed uttered voice is looped back in the user's terminal, and the uttered voice is reproduced after being decompressed. This allows the user to check the degree of deterioration due to compression and decompression of the voice.

特開２０１１－１９３３７４号公報JP 2011-193374 A

しかし、利用者の音声データが会議に参加する他の利用者の端末に向けて送信されても、ネットワークまたは他の利用者の端末の問題などにより、利用者の音声が他の利用者の端末から出力されないことがある。このため、会議開催の都度、参加する利用者間で「音声届いていますか？」といった会話による確認が行われることが多い。これは利用者が増える度に繰り返され、会議の途中参加の場合は逆に会議を中断するわけにもいかず、確認ができないままの場合がある。なお、特許文献１に記載の技術は、利用者の発話音声を利用者の端末内でループバックする技術であるので、特許文献１に記載の技術では利用者の音声データが会議に参加する他の利用者の端末に届くか否かを確認することは困難である。 However, even if a user's voice data is sent to the terminals of other users participating in the conference, the user's voice may not be output from those terminals because of a problem with the network or with the other users' terminals. For this reason, each time a conference is held, participating users often check by conversation, for example by asking "Are you receiving audio?" This is repeated every time a user joins, and a user who joins partway through cannot very well interrupt the conference and may therefore never get confirmation. Moreover, because the technology described in Patent Literature 1 loops back the user's uttered voice within the user's own terminal, it is difficult with that technology to confirm whether the user's voice data reaches the terminals of other users participating in the conference.

そこで、本発明は、上記問題に鑑みてなされたものであり、本発明の目的とするところは、利用者間での会話による確認無しにデータの到達状況を確認することが可能な、新規かつ改良されたデータ処理装置、データ処理システム、音声処理方法を提供することにある。 The present invention has therefore been made in view of the above problem, and an object of the present invention is to provide a new and improved data processing device, data processing system, and audio processing method that make it possible to confirm the arrival status of data without confirmation by conversation between users.

上記課題を解決するために、本発明のある観点によれば、他のデータ処理装置から第１の符号化データを受信する受信部と、前記受信部により受信された第１の符号化データを復号する復号部と、前記復号部により得られたデータを符号化して第２の符号化データを生成する符号化部と、前記第１の符号化データが前記他の通信装置から特定の動作モードで送信されたデータであると判断される場合に、前記第２の符号化データの前記他の通信装置への送信を制御する制御部と、を備える、データ処理装置が提供される。 In order to solve the above problem, according to one aspect of the present invention, there is provided a data processing device comprising: a receiving unit that receives first encoded data from another data processing device; a decoding unit that decodes the first encoded data received by the receiving unit; an encoding unit that encodes data obtained by the decoding unit to generate second encoded data; and a control unit that controls transmission of the second encoded data to the other communication device when the first encoded data is determined to be data transmitted from the other communication device in a specific operation mode.

前記データ処理装置は、前記復号部により得られた前記データに基づいて音声または映像を出力する出力部をさらに備え、前記出力部は、前記制御部により第１の符号化データが前記他のデータ処理装置から前記特定の動作モードで送信されたデータであると判断された場合、前記復号部により得られた前記データに基づいて音声および映像を出力しなくてもよい。 The data processing device may further include an output unit that outputs audio or video based on the data obtained by the decoding unit, and the output unit need not output audio and video based on the data obtained by the decoding unit when the control unit determines that the first encoded data is data transmitted from the other data processing device in the specific operation mode.

前記符号化部は、前記第１の符号化データの生成に用いられた第１の処理方式よりも品質の劣化が小さい第２の処理方式を用いて前記第２の符号化データを生成してもよい。 The encoding unit may generate the second encoded data using a second processing method that causes less quality deterioration than the first processing method used to generate the first encoded data.

前記制御部は、前記第１の符号化データの通信に用いられた第１の通信方式よりも信頼性が高い第２の通信方式で前記第２の符号化データの送信を制御してもよい。 The control unit may control transmission of the second encoded data using a second communication scheme having higher reliability than the first communication scheme used for communication of the first encoded data. .

前記制御部は、前記第１の符号化データに所定のフラグが付加されていることに基づき、前記第１の符号化データが前記他のデータ処理装置から特定の動作モードで送信されたデータであると判断してもよい。 The control unit may determine that the first encoded data is data transmitted from the other data processing device in the specific operation mode on the basis that a predetermined flag is attached to the first encoded data.
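
The receiving-side behaviour summarized above (decode, check the flag, then either re-encode and send back or output locally) can be illustrated with a minimal sketch. The names `Packet`, `test_flag`, `handle_packet` and the callback parameters are hypothetical and chosen only for illustration; the codecs and transports are passed in as plain functions rather than taken from any real library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Packet:
    payload: bytes           # first encoded data
    test_flag: bool = False  # hypothetical stand-in for the "predetermined flag"


def handle_packet(
    packet: Packet,
    decode: Callable[[bytes], bytes],
    reencode: Callable[[bytes], bytes],
    send_back: Callable[[bytes], None],
    play: Callable[[bytes], None],
) -> str:
    """Decode the packet; loop it back if flagged, otherwise output it locally."""
    data = decode(packet.payload)
    if packet.test_flag:
        # Sender is in the specific operation mode: re-encode the decoded data
        # into second encoded data and return it instead of playing it.
        send_back(reencode(data))
        return "looped_back"
    play(data)
    return "played"
```

In this sketch the flagged data is not played at all, which corresponds to the optional behaviour in which the output unit does not output audio or video for data sent in the specific operation mode.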

また、上記課題を解決するために、本発明の別の観点によれば、第１のデータ処理装置および第２のデータ処理装置を有するデータ処理システムであって、前記第１のデータ処理装置は、データを符号化して得られた第１の符号化データを前記第２のデータ処理装置に送信し、前記第２のデータ処理装置は、前記第１のデータ処理装置から第１の符号化データを受信する受信部と、前記受信部により受信された第１の符号化データを復号する復号部と、前記復号部により得られたデータを符号化して第２の符号化データを生成する符号化部と、前記第１の符号化データが前記第１のデータ処理装置から特定の動作モードで送信されたデータであると判断される場合に、前記第２の符号化データの前記第１のデータ処理装置への送信を制御する制御部と、を備える、データ処理システムが提供される。 In order to solve the above problem, according to another aspect of the present invention, there is provided a data processing system having a first data processing device and a second data processing device, wherein the first data processing device transmits first encoded data obtained by encoding data to the second data processing device, and the second data processing device comprises: a receiving unit that receives the first encoded data from the first data processing device; a decoding unit that decodes the first encoded data received by the receiving unit; an encoding unit that encodes data obtained by the decoding unit to generate second encoded data; and a control unit that controls transmission of the second encoded data to the first data processing device when the first encoded data is determined to be data transmitted from the first data processing device in a specific operation mode.

また、上記課題を解決するために、本発明の別の観点によれば、他のデータ処理装置から第１の符号化データを受信することと、前記第１の符号化データを復号することと、前記第１の符号化データの復号により得られたデータを符号化して第２の符号化データを生成することと、前記第１の符号化データが前記他のデータ処理装置から特定の動作モードで送信されたデータであると判断される場合に、前記第２の符号化データの前記他のデータ処理装置への送信を制御することと、を含む、音声処理方法が提供される。 In order to solve the above problem, according to another aspect of the present invention, there is provided an audio processing method including: receiving first encoded data from another data processing device; decoding the first encoded data; encoding data obtained by decoding the first encoded data to generate second encoded data; and controlling transmission of the second encoded data to the other data processing device when the first encoded data is determined to be data transmitted from the other data processing device in a specific operation mode.

また、上記課題を解決するために、本発明の別の観点によれば、入力されたデータを符号化して符号化データを生成する符号化部と、前記符号化データを他のデータ処理装置に送信する送信部と、前記他のデータ処理装置から第１の符号化データまたは第２の符号化データを受信する受信部と、第１の動作モードにおいては前記第１の符号化データに基づく音声または映像の出力を制御し、第２の動作モードにおいては前記第２の符号化データに基づく音声または映像の出力を制御する制御部と、を備える、データ処理装置が提供される。 In order to solve the above problem, according to another aspect of the present invention, there is provided a data processing device comprising: an encoding unit that encodes input data to generate encoded data; a transmitting unit that transmits the encoded data to another data processing device; a receiving unit that receives first encoded data or second encoded data from the other data processing device; and a control unit that controls output of audio or video based on the first encoded data in a first operation mode and controls output of audio or video based on the second encoded data in a second operation mode.

前記第１の符号化データは第１の通信方式を用いて送信されたデータであり、前記第２の符号化データは、前記第１の通信方式よりも信頼性が高い第２の通信方式で送信されたデータであり、前記受信部は、前記第１の通信方式に対応し、前記第１の符号化データを受信する第１受信部、および前記第２の通信方式に対応し、前記第２の符号化データを受信する第２受信部、を有してもよい。 The first encoded data may be data transmitted using a first communication scheme, the second encoded data may be data transmitted using a second communication scheme having higher reliability than the first communication scheme, and the receiving unit may include a first receiving unit that supports the first communication scheme and receives the first encoded data, and a second receiving unit that supports the second communication scheme and receives the second encoded data.

前記第１の符号化データは第１の処理方式を用いて生成されたデータであり、前記第２の符号化データは前記第１の処理方式よりも品質の劣化が小さい第２の処理方式を用いて生成されたデータであり、前記データ処理装置は、前記第１の処理方式に対応し、前記第１の符号化データを復号する第１復号部、および、前記第２の処理方式に対応し、前記第２の符号化データを復号する第２復号部、を有してもよい。 The first encoded data may be data generated using a first processing method, the second encoded data may be data generated using a second processing method that causes less quality deterioration than the first processing method, and the data processing device may include a first decoding unit that supports the first processing method and decodes the first encoded data, and a second decoding unit that supports the second processing method and decodes the second encoded data.

前記データ処理装置は、入力されたデータを保持する第１バッファをさらに備え、前記制御部は、前記第１バッファに保持されたデータを前記符号化部に供給してもよい。 The data processing device may further include a first buffer that holds input data, and the control section may supply the data held in the first buffer to the encoding section.

前記データ処理装置は、複数の他のデータ処理装置から受信された複数の前記第２の符号化データを復号して得られた複数のデータを保持する第２バッファをさらに備え、前記制御部は、前記第２バッファに保持された複数のデータの出力を順次に制御してもよい。 The data processing device may further include a second buffer that holds a plurality of pieces of data obtained by decoding a plurality of pieces of the second encoded data received from a plurality of other data processing devices, and the control unit may control the plurality of pieces of data held in the second buffer so that they are output sequentially.

前記送信部は、前記第２の動作モードにおいては、所定のフラグと共に前記符号化データを送信してもよい。 The transmitting section may transmit the encoded data together with a predetermined flag in the second operation mode.

また、上記課題を解決するために、本発明の別の観点によれば、入力されたデータを符号化して符号化データを生成することと、前記符号化データを他のデータ処理装置に送信することと、前記他のデータ処理装置から第１の符号化データまたは第２の符号化データを受信することと、第１の動作モードにおいては前記第１の符号化データに基づく音声または映像の出力を制御し、第２の動作モードにおいては前記第２の符号化データに基づく音声または映像の出力を制御することと、を含む、音声処理方法が提供される。 In order to solve the above problem, according to another aspect of the present invention, there is provided an audio processing method including: encoding input data to generate encoded data; transmitting the encoded data to another data processing device; receiving first encoded data or second encoded data from the other data processing device; and controlling output of audio or video based on the first encoded data in a first operation mode and controlling output of audio or video based on the second encoded data in a second operation mode.

以上説明した本発明によれば、利用者間での会話による確認無しにデータの到達状況を確認することが可能である。 According to the present invention described above, it is possible to confirm the arrival status of data without confirmation by conversation between users.

本発明の一実施形態によるデータ処理システムの構成を示す説明図である。 FIG. 1 is an explanatory diagram showing the configuration of a data processing system according to one embodiment of the present invention.
本発明の一実施形態による音声処理装置２０の構成を示す説明図である。 FIG. 2 is an explanatory diagram showing the configuration of the audio processing device 20 according to one embodiment of the present invention.
本発明の一実施形態によるデータ処理システムにおける接続シーケンスを示す説明図である。 FIG. 3 is an explanatory diagram showing a connection sequence in the data processing system according to one embodiment of the present invention.
音声処理装置２０Ａが試験モードに移行した場合の処理シーケンスを示す説明図である。 FIG. 4 is an explanatory diagram showing a processing sequence when the audio processing device 20A shifts to the test mode.
音声処理装置２０Ａおよび音声処理装置２０Ｂの通常モードにおける音声データの流れを示す説明図である。 FIG. 5 is an explanatory diagram showing the flow of audio data in the normal mode of the audio processing device 20A and the audio processing device 20B.
音声処理装置２０が試験モードで動作し、音声処理装置２０Ｂが通常モードで動作している場合の音声データの流れを示す説明図である。 FIG. 6 is an explanatory diagram showing the flow of audio data when the audio processing device 20 operates in the test mode and the audio processing device 20B operates in the normal mode.
音声処理装置２０のハードウェア構成を示したブロック図である。 FIG. 7 is a block diagram showing the hardware configuration of the audio processing device 20.

以下に添付図面を参照しながら、本発明の実施の形態について詳細に説明する。なお、本明細書及び図面において、実質的に同一の機能構成を有する構成要素については、同一の符号を付することにより重複説明を省略する。 Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In the present specification and drawings, constituent elements having substantially the same functional configuration are denoted by the same reference numerals, thereby omitting redundant description.

また、本明細書及び図面において、実質的に同一の機能構成を有する複数の構成要素を、同一の符号の後に異なるアルファベットを付して区別する場合もある。例えば、実質的に同一の機能構成または論理的意義を有する複数の構成を、必要に応じて音声処理装置２０Ａ、２０Ｂおよび２０Ｃのように区別する。ただし、実質的に同一の機能構成を有する複数の構成要素の各々を特に区別する必要がない場合、複数の構成要素の各々に同一符号のみを付する。例えば、音声処理装置２０Ａ、２０Ｂおよび２０Ｃを特に区別する必要が無い場合には、各音声処理装置を単に音声処理装置２０と称する。 In addition, in this specification and the drawings, a plurality of components having substantially the same functional configuration may be distinguished by appending different letters to the same reference numeral. For example, a plurality of components having substantially the same functional configuration or logical significance are distinguished as necessary, as in audio processing devices 20A, 20B and 20C. However, when there is no particular need to distinguish between a plurality of components having substantially the same functional configuration, only the common reference numeral is used for each of them. For example, when there is no particular need to distinguish between the audio processing devices 20A, 20B and 20C, each is simply referred to as the audio processing device 20.

＜１．データ処理システムの概要＞ <1. Outline of Data Processing System>
本発明の一実施形態は、遠隔する拠点から複数の利用者が参加する音声会議を実現するデータ処理システムに関する。まず、図１を参照し、本発明の一実施形態によるデータ処理システムの概要を説明する。 One embodiment of the present invention relates to a data processing system that realizes an audio conference in which a plurality of users participate from remote sites. First, with reference to FIG. 1, an outline of a data processing system according to one embodiment of the present invention will be described.

図１は、本発明の一実施形態によるデータ処理システムの構成を示す説明図である。図１に示したように、本発明の一実施形態によるデータ処理システムは、音声処理装置２０Ａ～２０Ｆおよび会議サーバ３０を有する。 FIG. 1 is an explanatory diagram showing the configuration of a data processing system according to one embodiment of the present invention. As shown in FIG. 1, the data processing system according to one embodiment of the present invention includes audio processing devices 20A to 20F and a conference server 30.

これら音声処理装置２０Ａ～２０Ｆおよび会議サーバ３０はネットワーク１２を介して接続されている。ネットワーク１２は、ネットワーク１２に接続されている装置から送信される情報の有線、または無線の伝送路である。例えば、ネットワーク１２は、インターネット、電話回線網、衛星通信網などの公衆回線網や、Ｅｔｈｅｒｎｅｔ（登録商標）を含む各種のＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、ＷＡＮ（ＷｉｄｅＡｒｅａＮｅｔｗｏｒｋ）などを含んでもよい。また、ネットワーク１２は、ＩＰ－ＶＰＮ（ＩｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ－ＶｉｒｔｕａｌＰｒｉｖａｔｅＮｅｔｗｏｒｋ）などの専用回線網を含んでもよい。 These audio processing devices 20A to 20F and the conference server 30 are connected via a network 12. The network 12 is a wired or wireless transmission path for information transmitted from devices connected to the network 12. For example, the network 12 may include public networks such as the Internet, a telephone network, and a satellite communication network, as well as various LANs (Local Area Networks) including Ethernet (registered trademark), WANs (Wide Area Networks), and the like. The network 12 may also include a dedicated network such as an IP-VPN (Internet Protocol-Virtual Private Network).

図１に示した例では、音声処理装置２０Ａ、音声処理装置２０Ｂおよび音声処理装置２０Ｃが同じ音声会議に参加するグループＧ１に属し、音声処理装置２０Ｄおよび音声処理装置２０Ｅが他の音声会議に参加するグループＧ２に属し、音声処理装置２０Ｆはいずれの音声会議にも参加していない。 In the example shown in FIG. 1, the audio processing devices 20A, 20B and 20C belong to a group G1 participating in the same audio conference, the audio processing devices 20D and 20E belong to a group G2 participating in another audio conference, and the audio processing device 20F is not participating in any audio conference.

また、図１に示した例では、利用者ＵＡが音声処理装置２０Ａを利用し、利用者ＵＢが音声処理装置２０Ｂを利用し、利用者ＵＣが音声処理装置２０Ｃを利用し、利用者ＵＤが音声処理装置２０Ｄを利用し、利用者ＵＥが音声処理装置２０Ｅを利用し、利用者ＵＦが音声処理装置２０Ｆを利用している。ただし、データ処理システムを構成する音声処理装置２０の数、およびデータ処理システムを利用する利用者Ｕの数は、より少なくてもよいし、より多くてもよい。 In the example shown in FIG. 1, the user UA uses the voice processing device 20A, the user UB uses the voice processing device 20B, the user UC uses the voice processing device 20C, and the user UD uses the voice processing device 20C. The voice processing device 20D is used, the user UE uses the voice processing device 20E, and the user UF uses the voice processing device 20F. However, the number of voice processing devices 20 constituting the data processing system and the number of users U using the data processing system may be smaller or larger.

（会議サーバ） (Conference server)
会議サーバ３０は、各音声処理装置２０の会議への参加と退出を管理する。例えば、会議サーバ３０は、ＷｅｂＲＴＣ（ＷｅｂＲｅａｌ－ＴｉｍｅＣｏｍｍｕｎｉｃａｔｉｏｎ）のような会議用のプロトコルを用いて各音声処理装置２０の会議への参加と退出を管理する。映像データおよび音声データの通信は、会議サーバ３０を介さずに、上記グループＧ１内およびグループＧ２内などのグループ内で行われる。なお、音声処理装置２０同士がＰｅｅｒ２Ｐｅｅｒで接続する場合には、会議サーバ３０は設けられなくてもよい。 The conference server 30 manages each audio processing device 20 joining and leaving a conference. For example, the conference server 30 manages joining and leaving using a conferencing protocol such as WebRTC (Web Real-Time Communication). Video data and audio data are communicated within a group, such as within group G1 or within group G2, without passing through the conference server 30. Note that the conference server 30 need not be provided when the audio processing devices 20 connect to one another peer-to-peer.

（音声処理装置） (Audio processing device)
音声処理装置２０は、データ処理装置の一例であり、音声処理装置２０の利用者が発した音声を示す音声データを他の音声処理装置２０に送信する。また、音声処理装置２０は、他の音声処理装置２０の利用者が発した音声を示す音声データを他の音声処理装置２０から受信し、当該音声データに基づいて他の音声処理装置２０の利用者が発した音声を出力する。 The audio processing device 20 is an example of a data processing device, and transmits audio data representing the voice uttered by its user to other audio processing devices 20. The audio processing device 20 also receives, from another audio processing device 20, audio data representing the voice uttered by the user of that other device, and outputs that voice based on the audio data.

例えば、図１に示した例では、利用者ＵＡが発した音声を示す音声データを音声処理装置２０Ａが音声処理装置２０Ｂおよび音声処理装置２０Ｃに送信し、音声処理装置２０Ｂおよび音声処理装置２０Ｃが当該音声データに基づいて利用者ＵＡが発した音声を出力する。また、利用者ＵＢが発した音声を示す音声データを音声処理装置２０Ｂが音声処理装置２０Ａに送信し、音声処理装置２０Ａが当該音声データに基づいて利用者ＵＢが発した音声を出力する。また、利用者ＵＣが発した音声を示す音声データを音声処理装置２０Ｃが音声処理装置２０Ａに送信し、音声処理装置２０Ａが当該音声データに基づいて利用者ＵＣが発した音声を出力する。かかる構成により、利用者ＵＡ、利用者ＵＢおよび利用者ＵＣが音声会議を行うことが可能である。 For example, in the example shown in FIG. 1, the voice processing device 20A transmits voice data representing voice uttered by the user UA to the voice processing devices 20B and 20C, and the voice processing devices 20B and 20C A voice uttered by the user UA is output based on the voice data. Further, the voice processing device 20B transmits voice data indicating voice uttered by the user UB to the voice processing device 20A, and the voice processing device 20A outputs the voice uttered by the user UB based on the voice data. Also, the voice processing device 20C transmits voice data indicating voice uttered by the user UC to the voice processing device 20A, and the voice processing device 20A outputs the voice uttered by the user UC based on the voice data. With this configuration, user UA, user UB, and user UC can hold a voice conference.

なお、音声処理装置２０は、音声データに加えて、映像データを他の音声処理装置２０と送受信してもよい。また、図１においては音声処理装置２０の一例としてノート型のＰＣ（ＰｅｒｓｏｎａｌＣｏｍｐｕｔｅｒ）を示しているが、音声処理装置２０は、デスクトップ型のＰＣ、スマートフォン、携帯電話またはＰＨＳ（ＰｅｒｓｏｎａｌＨａｎｄｙｐｈｏｎｅＳｙｓｔｅｍ）などの他の情報処理装置であってもよい。 Note that the audio processing device 20 may transmit and receive video data to and from other audio processing devices 20 in addition to audio data. Although FIG. 1 shows a notebook PC (Personal Computer) as an example of the audio processing device 20, the audio processing device 20 may be another information processing device such as a desktop PC, a smartphone, a mobile phone, or a PHS (Personal Handyphone System).

（背景） (Background)
このようなデータ処理システムにおいては、利用者の音声データが会議に参加する他の利用者の音声処理装置に向けて送信されても、ネットワークまたは他の利用者の音声処理装置の問題などにより、利用者の音声が他の利用者の音声処理装置から出力されないことがある。このため、会議開催の都度、参加する利用者間で「音声届いていますか？」といった会話による確認が行われることが多い。これは利用者が増える度に繰り返され、会議の途中参加の場合は逆に会議を中断するわけにもいかず、確認ができないままの場合がある。 In such a data processing system, even if a user's audio data is sent to the audio processing devices of other users participating in the conference, the user's voice may not be output from those devices because of a problem with the network or with the other users' audio processing devices. For this reason, each time a conference is held, participating users often check by conversation, for example by asking "Are you receiving audio?" This is repeated every time a user joins, and a user who joins partway through cannot very well interrupt the conference and may therefore never get confirmation.

また、場合によっては受信側の音声処理装置の出力の問題でありながら送信側の音声処理装置の問題であるかのように指摘され、トラブル解決に時間を割かれることがある。逆に、送信側の音声処理装置の障害でありながら、利用者が受信側の音声処理装置の障害であるかのように勘違いすることもある。さらに、遠隔会議はネットワーク環境に大きく左右されるので、パケットの遅延やロスの影響で音声データの品質が低下することがあるが、会議途中で動作確認を行うことが困難であった。 In some cases, the output problem of the receiving-side audio processing device may be pointed out as if it were a problem of the transmitting-side audio processing device, and time may be spent on solving the problem. Conversely, the user may mistakenly think that the problem is the speech processing device on the receiving side even though the problem is the speech processing device on the transmitting side. Furthermore, teleconferencing is greatly affected by the network environment, and packet delays and losses can degrade the quality of voice data, making it difficult to check operations during a conference.

なお、会議サーバ側で映像データと音声データをループバックして音声処理装置の動作確認を行えるように会議サーバを構成することも考えられる。しかし、図１に示したように会議中は各音声処理装置の映像データと音声データが会議サーバを介さずに送受信されるシステムにおいては、音声処理装置から他の音声処理装置への音声データの到達状況を会議途中に確認することは困難である。 It is also conceivable to configure the conference server so that the video data and the audio data can be looped back on the conference server side to check the operation of the audio processing device. However, as shown in FIG. 1, in a system in which video data and audio data of each audio processing device are transmitted and received without going through the conference server during a conference, it is difficult to transfer audio data from one audio processing device to another audio processing device. It is difficult to check the arrival status during the meeting.

本件発明者は、上記事情を一着眼点にして本発明の一実施形態を創作するに至った。本発明の一実施形態によれば、利用者間での会話による確認無しに音声データの到達状況を確認することが可能となる。結果、音声会議を円滑に進めることが可能となる。以下、このような本発明の一実施形態の構成および動作を順次詳細に説明する。 The inventor of the present invention has created an embodiment of the present invention by focusing on the above circumstances. According to one embodiment of the present invention, it is possible to check the delivery status of voice data without confirmation by conversation between users. As a result, it becomes possible to proceed with the audio conference smoothly. Hereinafter, the configuration and operation of such an embodiment of the present invention will be sequentially described in detail.

＜２．音声処理装置の概要＞ <2. Overview of Audio Processing Device>
図２は、本発明の一実施形態による音声処理装置２０の構成を示す説明図である。図２に示したように、本発明の一実施形態による音声処理装置２０は、音声入力部２２０、第１符号化部２２４、第１通信部２２８、第１復号部２３２、音声出力部２３６、操作部２３８、制御部２４０、第１バッファ２４４、回送部２４８、第２符号化部２５２、第２通信部２５６、第２復号部２６０および第２バッファ２６４を備える。 FIG. 2 is an explanatory diagram showing the configuration of the audio processing device 20 according to one embodiment of the present invention. As shown in FIG. 2, the audio processing device 20 according to one embodiment of the present invention includes an audio input unit 220, a first encoding unit 224, a first communication unit 228, a first decoding unit 232, an audio output unit 236, an operation unit 238, a control unit 240, a first buffer 244, a forwarding unit 248, a second encoding unit 252, a second communication unit 256, a second decoding unit 260, and a second buffer 264.

（音声入力部２２０） (Audio input unit 220)
音声入力部２２０は、音声処理装置２０の利用者が発した音声が入力される構成である。音声入力部２２０は、音声処理装置２０の利用者が発した音声を電気的な音声データに変換し、音声データを第１符号化部２２４および第１バッファ２４４に供給する。音声入力部２２０は、マイクデバイスで構成されてもよいし、会議に使用されるカメラデバイスに搭載されるマイクであってもよい。 The audio input unit 220 is the component to which the voice uttered by the user of the audio processing device 20 is input. The audio input unit 220 converts that voice into electrical audio data and supplies the audio data to the first encoding unit 224 and the first buffer 244. The audio input unit 220 may be a microphone device, or may be a microphone mounted on a camera device used for the conference.

（第１符号化部２２４） (First encoding unit 224)
第１符号化部２２４は、音声入力部２２０から供給された音声データを第１の処理方式を用いて符号化して符号化音声データを生成する。第１の処理方式は、非可逆の圧縮方式であってもよい。本明細書においては、第１符号化部２２４により生成された符号化音声データを第１の符号化音声データと称する場合もある。 The first encoding unit 224 encodes the audio data supplied from the audio input unit 220 using the first processing method to generate encoded audio data. The first processing method may be a lossy compression method. In this specification, the encoded audio data generated by the first encoding unit 224 may also be referred to as first encoded audio data.

（第１通信部２２８） (First communication unit 228)
第１通信部２２８は、第１符号化部２２４により生成された符号化音声データを他の音声処理装置２０に送信する送信部（第１送信部）、および他の音声処理装置２０から第１の処理方式で生成された符号化音声データを受信する受信部（第１受信部）として機能する。第１通信部２２８は、第１の通信方式として、例えばＵＤＰのような会議用のプロトコルを用いて通信する。第１通信部２２８は、他のプロトコルとしてＴＣＰを用いてもよいが、パケット遅延が大きくなった場合には遅延解消のためにパケットを破棄することが想定される。 The first communication unit 228 functions as a transmitting unit (first transmitting unit) that transmits the encoded audio data generated by the first encoding unit 224 to other audio processing devices 20, and as a receiving unit (first receiving unit) that receives, from other audio processing devices 20, encoded audio data generated by the first processing method. As the first communication scheme, the first communication unit 228 communicates using a conferencing protocol such as UDP. The first communication unit 228 may instead use TCP as another protocol, but in that case it is assumed that packets are discarded to eliminate the delay when packet delay becomes large.
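
The delay-driven packet discard mentioned for the first communication scheme can be sketched as follows. This is an illustration under stated assumptions only: `AudioPacket`, `drop_late` and the 0.2 s threshold are hypothetical, and the sender and receiver clocks are assumed to be comparable, which the source does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioPacket:
    seq: int        # sequence number
    sent_at: float  # send time in seconds (assumed comparable with receiver time)
    payload: bytes  # first encoded audio data


def drop_late(packets: List[AudioPacket], now: float,
              max_delay: float = 0.2) -> List[AudioPacket]:
    """Keep only packets whose delay does not exceed max_delay seconds."""
    return [p for p in packets if now - p.sent_at <= max_delay]
```

Discarding in this way keeps conversation latency low at the cost of gaps in the audio, which is one reason a separate, more reliable return path can be useful when the goal is to check what actually arrived.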

（第１復号部２３２） (First decoding unit 232)
第１復号部２３２は、第１通信部２２８により他の音声処理装置２０から受信された符号化音声データを第１の処理方式を用いて復号する。 The first decoding unit 232 decodes the encoded audio data received from the other audio processing device 20 by the first communication unit 228 using the first processing method.

（音声出力部２３６） (Audio output unit 236)
音声出力部２３６は、第１復号部２３２により得られた音声データに基づき、他の音声処理装置２０の利用者が発した音声を出力する。音声出力部２３６は、スピーカデバイス、マイクデバイスと一体のヘッドセット、またはマイクスピーカなどであってもよい。また、音声出力部２３６は、後述の第２バッファ２６４から供給される音声も出力する。 The audio output unit 236 outputs the voice uttered by the user of the other audio processing device 20 based on the audio data obtained by the first decoding unit 232. The audio output unit 236 may be a speaker device, a headset integrated with a microphone device, a microphone speaker, or the like. The audio output unit 236 also outputs audio supplied from a second buffer 264, which will be described later.

音声処理装置２０の動作モードには、第１の動作モードの一例である通常モード、および第２の動作モード（特定の動作モード）の一例である試験モードがある。通常モードでは、上述した音声入力部２２０、第１符号化部２２４、第１通信部２２８、第１復号部２３２および音声出力部２３６が動作することにより、音声処理装置２０の利用者間で音声会議を行うことが可能である。試験モードでは、後述する第２通信部２５６、第２復号部２６０および第２バッファ２６４などの動作により、音声処理装置２０の利用者の音声データが他の音声処理装置２０に到達したか否か、到達した場合にはどのような品質で到達したかを確認することが可能である。 The operation modes of the audio processing device 20 include a normal mode, which is an example of a first operation mode, and a test mode, which is an example of a second operation mode (specific operation mode). In the normal mode, the audio input unit 220, the first encoding unit 224, the first communication unit 228, the first decoding unit 232, and the audio output unit 236 described above operate to enable voice communication between users of the audio processing device 20. It is possible to hold a meeting. In the test mode, whether or not the voice data of the user of the voice processing device 20 has reached another voice processing device 20 by the operations of the second communication unit 256, the second decoding unit 260, the second buffer 264, etc., which will be described later. , it is possible to check the quality of the arrival.
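
As a rough illustration of the two operation modes, the sketch below maps each mode to the receive-side components that the description associates with it. The class and constant names (`Mode`, `RECEIVE_PATH`, `Controller`) are hypothetical; only the component names and reference numerals come from the text.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"  # first operation mode
    TEST = "test"      # second (specific) operation mode


# Receive-side components that operate in each mode, per the description.
RECEIVE_PATH = {
    Mode.NORMAL: ("first communication unit 228", "first decoding unit 232"),
    Mode.TEST: ("second communication unit 256", "second decoding unit 260",
                "second buffer 264"),
}


class Controller:
    """Tracks the current mode and toggles it on a user instruction."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL

    def toggle(self) -> Mode:
        self.mode = Mode.TEST if self.mode is Mode.NORMAL else Mode.NORMAL
        return self.mode

    def receive_path(self) -> tuple:
        return RECEIVE_PATH[self.mode]
```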

（操作部２３８） (Operation unit 238)
操作部２３８は、音声処理装置２０の利用者が音声処理装置２０に情報または指示などを入力するために操作する構成である。例えば、音声処理装置２０の利用者は、音声処理装置２０の動作モードを通常モードと試験モードとの間で切り替えるための指示を操作部２３８に入力する。 The operation unit 238 is the component that the user of the audio processing device 20 operates to input information or instructions to the audio processing device 20. For example, the user of the audio processing device 20 inputs to the operation unit 238 an instruction to switch the operation mode of the audio processing device 20 between the normal mode and the test mode.

（制御部２４０） (Control unit 240)
制御部２４０は、音声処理装置２０の動作全般を制御する。例えば、制御部２４０は、操作部２３８に対する利用者の指示に従い、音声処理装置２０の動作モードを通常モードと試験モードとの間で切り替える。試験モードにおいては、制御部２４０は、例えば以下に示す制御を行う。 The control unit 240 controls the overall operation of the audio processing device 20. For example, the control unit 240 switches the operation mode of the audio processing device 20 between the normal mode and the test mode in accordance with the user's instruction to the operation unit 238. In the test mode, the control unit 240 performs, for example, the following controls.

制御部２４０は、音声入力部２２０から第１符号化部２２４への音声データの供給を停止させる。代わりに、制御部２４０は、第１バッファ２４４に保持されている音声データを第１符号化部２２４に供給する。第１符号化部２２４は、第１バッファ２４４から供給された音声データを符号化して符号化音声データを生成する。利用者の指示に従い、第１バッファ２４４を使用せず音声入力部２２０から第１符号化部２２４から供給された音声データを符号化して符号化音声データを生成してもよい。 The control unit 240 stops the supply of audio data from the audio input unit 220 to the first encoding unit 224. Instead, the control unit 240 supplies the audio data held in the first buffer 244 to the first encoding unit 224. The first encoding unit 224 encodes the audio data supplied from the first buffer 244 to generate encoded audio data. Alternatively, in accordance with a user's instruction, the first encoding unit 224 may encode audio data supplied directly from the audio input unit 220, without using the first buffer 244, to generate the encoded audio data.
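
The switch of the encoder's input source in the test mode can be sketched as below. `InputRouter`, its methods and the buffer capacity are hypothetical, and the sketch assumes the first buffer simply keeps the most recent frames captured before the switch; the source does not state how much audio the first buffer holds.

```python
from collections import deque
from typing import List, Optional


class InputRouter:
    """Chooses which audio frames are handed to the first encoding unit."""

    def __init__(self, capacity: int = 50) -> None:
        # First buffer: keeps the most recent captured frames (capacity assumed).
        self.first_buffer = deque(maxlen=capacity)

    def capture(self, frame: bytes) -> bytes:
        """Normal mode: the frame goes to the encoder and is also buffered."""
        self.first_buffer.append(frame)
        return frame

    def test_frames(self, live_frame: Optional[bytes] = None,
                    use_buffer: bool = True) -> List[bytes]:
        """Test mode: replay buffered frames, or use live input on request."""
        if use_buffer:
            return list(self.first_buffer)
        return [] if live_frame is None else [live_frame]
```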

制御部２４０は、符号化音声データの送信と共に試験モードに関する情報を第１通信部２２８に送信させる。試験モードに関する情報は、所定のフラグ、および他の音声処理装置２０からのデータ待ち受け用のアドレスおよびポートを示す情報を含む。第１通信部２２８がＷｅｂＲＴＣを用いて通信を行う場合、第１通信部２２８は、例えばデータチャンネルを用いて当該試験モードに関する情報を送信してもよい。 The control unit 240 causes the first communication unit 228 to transmit information about the test mode together with transmission of the encoded audio data. Information about the test mode includes a predetermined flag, and information indicating an address and port for waiting for data from other audio processors 20 . When the first communication unit 228 communicates using WebRTC, the first communication unit 228 may transmit information regarding the test mode using, for example, a data channel.

制御部２４０は、第１通信部２２８による符号化音声データの受信、第１通信部２２８から第１復号部２３２への符号化音声データの供給、第１復号部２３２による符号化音声データの復号、または第１復号部２３２から音声出力部２３６への音声データの供給、のうちの少なくともいずれかを停止させる。 The control unit 240 stops at least one of the following: reception of encoded audio data by the first communication unit 228, supply of encoded audio data from the first communication unit 228 to the first decoding unit 232, decoding of encoded audio data by the first decoding unit 232, or supply of audio data from the first decoding unit 232 to the audio output unit 236.

第２復号部２６０および第２バッファ２６４を動作状態として、第２通信部２５６により他の音声処理装置２０から受信された符号化音声データ（第２の処理方式により生成された第２の符号化音声データ）を第２復号部２６０に復号させ、復号により得られた音声データを第２バッファ２６４に保持させ、第２バッファ２６４に保持された１または２以上の音声データを音声出力部２３６に順次出力させる。 The control unit 240 puts the second decoding unit 260 and the second buffer 264 into the operating state, causes the second decoding unit 260 to decode the encoded audio data received from another audio processing device 20 by the second communication unit 256 (the second encoded audio data generated by the second processing method), causes the second buffer 264 to hold the audio data obtained by the decoding, and causes the audio output unit 236 to sequentially output the one or more pieces of audio data held in the second buffer 264.

また、通常モードにおいては、制御部２４０は、例えば以下に示す制御を行う。 Also, in the normal mode, the control unit 240 performs, for example, the following controls.

制御部２４０は、第２復号部２６０および第２バッファ２６４を非動作状態とする。 Control unit 240 puts second decoding unit 260 and second buffer 264 in a non-operating state.

制御部２４０は、第１通信部２２８により他の音声処理装置２０から符号化音声データと共に試験モードを示す情報が受信された場合、符号化音声データが他の音声処理装置２０から試験モードで送信されたデータであると判断する。 When the first communication unit 228 receives information indicating the test mode together with encoded audio data from another audio processing device 20, the control unit 240 determines that the encoded audio data is data transmitted in the test mode from that other audio processing device 20.

符号化音声データが他の音声処理装置２０から試験モードで送信されたデータである場合、制御部２４０は、第１復号部２３２により当該符号化音声データから得られた音声データを音声出力部２３６に供給しない。代わりに、制御部２４０は、回送部２４８に第２符号化部２５２に回送させる。結果、第２符号化部２５２が当該音声データを第２の処理方式で符号化して符号化音声データを生成し、第２通信部２５６が当該符号化音声データを他の音声処理装置２０に第２の通信方式を用いて送信する。 When the encoded audio data is data transmitted in the test mode from another audio processing device 20, the control unit 240 does not supply the audio data obtained from that encoded audio data by the first decoding unit 232 to the audio output unit 236. Instead, the control unit 240 causes the forwarding unit 248 to forward the audio data to the second encoding unit 252. As a result, the second encoding unit 252 encodes the audio data using the second processing method to generate encoded audio data, and the second communication unit 256 transmits that encoded audio data to the other audio processing device 20 using the second communication method.
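The routing decisions that the control unit 240 makes for data arriving on the first communication unit 228 can be summarized in a small sketch. This is an illustrative sketch only: the names `Mode`, `Route`, and `route_first_channel` are hypothetical, and treating all first-channel audio as discarded while the local device is in the test mode is an assumption drawn from the statement that the control unit stops at least one stage of the normal receive path.

```python
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()
    TEST = auto()


class Route(Enum):
    TO_SPEAKER = auto()   # first decoding unit 232 -> audio output unit 236
    TO_LOOPBACK = auto()  # forwarding unit 248 -> second encoding unit 252
    DISCARD = auto()      # normal receive path stopped


def route_first_channel(local_mode: Mode, peer_test_flag: bool) -> Route:
    """Decide where decoded first-channel audio goes (hypothetical helper)."""
    if local_mode is Mode.TEST:
        # Assumption: in the test mode the normal receive path is stopped,
        # so nothing from the first channel reaches the speaker.
        return Route.DISCARD
    if peer_test_flag:
        # Normal mode, but the peer marked the data as test data:
        # loop it back instead of playing it.
        return Route.TO_LOOPBACK
    return Route.TO_SPEAKER
```

In this sketch the test flag received from the peer only matters while the local device is in the normal mode, which mirrors the order in which the description presents the two sets of controls.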

（第１バッファ２４４）
第１バッファ２４４は、音声入力部２２０から供給される音声データの一部を試験用の音声データとして一時的に保持する。試験モードにおいて、利用者からの操作部２３８への操作に基づいて第１バッファ２４４から第１符号化部２２４に音声データが供給される。 (first buffer 244)
The first buffer 244 temporarily holds part of the audio data supplied from the audio input section 220 as test audio data. In the test mode, audio data is supplied from the first buffer 244 to the first encoding section 224 based on the user's operation on the operation section 238 .

（回送部２４８）
回送部２４８は、第１通信部２２８により他の音声処理装置２０から符号化音声データと共に試験モードを示す情報が受信された場合、当該符号化音声データを復号して得られた音声データを第１復号部２３２から第２符号化部２５２に受け渡す。 (forwarding unit 248)
When the first communication unit 228 receives information indicating the test mode together with encoded audio data from another audio processing device 20, the forwarding unit 248 passes the audio data obtained by decoding that encoded audio data from the first decoding unit 232 to the second encoding unit 252.

（第２符号化部２５２）
第２符号化部２５２は、回送部２４８から受け取った音声データを第２の処理方式を用いて符号化して符号化音声データを生成する。第２の処理方式は、第１の処理方式よりも音声品質の劣化が小さい処理方式である。第２の処理方式は、音声データが劣化しない可逆性の符号化方式であってもよい。本明細書においては、第２符号化部２５２により生成された符号化音声データを第２の符号化音声データと称する場合もある。 (Second encoding unit 252)
The second encoding unit 252 encodes the audio data received from the forwarding unit 248 using the second processing method to generate encoded audio data. The second processing method is a processing method that causes less deterioration in voice quality than the first processing method. The second processing method may be a reversible encoding method that does not degrade audio data. In this specification, the encoded audio data generated by the second encoding unit 252 may also be referred to as second encoded audio data.
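The property required of a reversible second processing method is that decoding returns exactly the samples that were encoded. The following is a minimal sketch of that property, assuming 16-bit PCM samples and using zlib compression purely as a stand-in for a lossless audio codec; the specification does not name a codec, and the function names `encode_second` and `decode_second` are hypothetical.

```python
import struct
import zlib


def encode_second(pcm: list[int]) -> bytes:
    """Pack 16-bit samples little-endian and compress them losslessly."""
    raw = struct.pack(f"<{len(pcm)}h", *pcm)
    return zlib.compress(raw)


def decode_second(payload: bytes) -> list[int]:
    """Invert encode_second; the samples come back bit-for-bit identical."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```

Because the round trip is exact, any degradation that the user hears in the looped-back audio can be attributed to the first processing method or the first communication path rather than to the return leg.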

（第２通信部２５６）
第２通信部２５６は、第２符号化部２５２により生成された符号化音声データを、試験モードを示す情報と共に符号化音声データを送信した他の音声処理装置２０に送信する送信部（第２送信部）、および他の音声処理装置２０から第２の処理方式で生成された符号化音声データを受信する受信部（第２受信部）として機能する。第２通信部２５６は、第１の通信方式よりも信頼性が高い第２の通信方式を用いて通信を行ってもよい。そのような第２の通信方式としては、例えば、パケット遅延が起きても音声データの保全を最優先とするＴＣＰが挙げられる。なお、第２通信部２５６と第１通信部２２８とは同一のネットワークデバイスにおいて実現される機能であってもよい。 (Second communication unit 256)
The second communication unit 256 functions as a transmitting unit (second transmitting unit) that transmits the encoded audio data generated by the second encoding unit 252 to the other audio processing device 20 that sent encoded audio data together with the information indicating the test mode, and as a receiving unit (second receiving unit) that receives, from another audio processing device 20, encoded audio data generated by the second processing method. The second communication unit 256 may communicate using a second communication method that is more reliable than the first communication method. An example of such a second communication method is TCP, which gives top priority to preserving the audio data even if packet delay occurs. Note that the second communication unit 256 and the first communication unit 228 may be functions implemented in the same network device.
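Because TCP delivers a byte stream rather than discrete messages, a second communication unit built on it would need some way to delimit each piece of encoded audio data. The sketch below shows one common approach, a 4-byte length prefix. The framing format and the helper names `send_frame`, `recv_exact`, and `recv_frame` are assumptions for illustration; the specification only says that TCP may be used.

```python
import socket
import struct


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one frame: 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before frame was complete")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed frame."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

In a real device the socket would be connected to the address and port announced in the test mode information; here any connected stream socket works.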

（第２復号部２６０）
第２復号部２６０は、第２通信部２５６により他の音声処理装置２０から受信された符号化音声データを第２の処理方式を用いて復号する。 (Second decoding unit 260)
The second decoding unit 260 decodes the encoded audio data received from the other audio processing device 20 by the second communication unit 256 using the second processing method.

（第２バッファ２６４）
第２バッファ２６４は、第２復号部２６０により得られた音声データを一時的に保持する。第２バッファ２６４は、音声処理装置２０と通信する他の音声処理装置２０が複数台あり、複数の音声処理装置２０から第２通信部２５６により符号化音声データが受信された場合に、複数の符号化音声データを復号して得られた複数の音声データを保持する。第２バッファ２６４により保持された複数の音声データは、制御部２４０による制御に従って音声出力部２３６から順次に音声として出力される。なお、音声処理装置２０と通信する他の音声処理装置２０が１台である場合には第２バッファ２６４は音声データを保持せず、音声出力部２３６が当該音声データに基づいて音声を出力してもよい。 (Second buffer 264)
The second buffer 264 temporarily holds the audio data obtained by the second decoding unit 260. When there are a plurality of other audio processing devices 20 communicating with the audio processing device 20 and the second communication unit 256 receives encoded audio data from those devices, the second buffer 264 holds the plurality of pieces of audio data obtained by decoding that encoded audio data. The pieces of audio data held in the second buffer 264 are output sequentially as audio from the audio output unit 236 under the control of the control unit 240. Note that when only one other audio processing device 20 communicates with the audio processing device 20, the second buffer 264 need not hold the audio data, and the audio output unit 236 may output audio directly based on that audio data.
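The behavior of the second buffer 264 with several peers can be pictured as a simple queue that is filled as loopbacks arrive and drained one item at a time for playback. This is a sketch under assumptions: the class name `SecondBuffer`, the peer identifiers, and the first-in-first-out order are hypothetical, since the specification leaves the playback order open.

```python
from collections import deque
from typing import Iterator


class SecondBuffer:
    """Holds decoded loopback audio from each peer until it is played."""

    def __init__(self) -> None:
        self._items: deque[tuple[str, bytes]] = deque()

    def hold(self, peer_id: str, pcm: bytes) -> None:
        """Store one decoded loopback together with the peer it came from."""
        self._items.append((peer_id, pcm))

    def drain(self) -> Iterator[tuple[str, bytes]]:
        """Yield held items one by one, in arrival order, emptying the buffer."""
        while self._items:
            yield self._items.popleft()
```

Keeping the peer identifier next to each item would let the device tell the user whose loopback is currently being played, which is useful when only one of several peers has a problem.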

＜３．動作＞
以上、本発明の一実施形態による音声処理装置２０の構成を説明した。続いて、本発明の一実施形態によるデータ処理システムの動作を説明する。 <3. Operation>
The configuration of the speech processing device 20 according to one embodiment of the present invention has been described above. Next, the operation of the data processing system according to one embodiment of the present invention will be described.

（接続シーケンス）
まず、図３を参照して、本発明の一実施形態によるデータ処理システムにおける接続シーケンスを説明する。 (connection sequence)
First, referring to FIG. 3, the connection sequence in the data processing system according to one embodiment of the present invention will be described.

図３は、本発明の一実施形態によるデータ処理システムにおける接続シーケンスを示す説明図である。図３に示したように、まず、利用者ＵＡが音声処理装置２０Ａの操作部２３８に対して会議への接続操作を行う（Ｓ１）。会議への接続は、会議サーバ３０が事前に用意した会議室を指定する方法でもよいし、新たに会議室を作成する方法でもよい。音声処理装置２０Ａは、利用者ＵＡからの操作に従い、会議サーバ３０へ接続要求を送信する（Ｓ２）。 FIG. 3 is an explanatory diagram showing the connection sequence in the data processing system according to one embodiment of the present invention. As shown in FIG. 3, first, the user UA performs a conference connection operation on the operation unit 238 of the speech processing device 20A (S1). The connection to the conference may be made by designating a conference room prepared in advance by the conference server 30 or by creating a new conference room. The voice processing device 20A transmits a connection request to the conference server 30 according to the operation from the user UA (S2).

同様に、まず、利用者ＵＢが音声処理装置２０Ｂの操作部２３８に対して会議への接続操作を行うと（Ｓ３）、音声処理装置２０Ｂが会議サーバ３０へ接続要求を送信する（Ｓ４）。音声処理装置２０Ａおよび音声処理装置２０Ｂが同一の会議室を指定した場合、会議サーバ３０が音声処理装置２０Ａおよび音声処理装置２０Ｂに互いにＰｅｅｒ２Ｐｅｅｒで通信を行うように接続指示を出し（Ｓ５，Ｓ６）、音声処理装置２０Ａおよび音声処理装置２０Ｂが接続を確立する（Ｓ７，Ｓ８）。 Similarly, when the user UB operates the operation unit 238 of the audio processing device 20B to connect to the conference (S3), the audio processing device 20B transmits a connection request to the conference server 30 (S4). When the audio processing device 20A and the audio processing device 20B have specified the same conference room, the conference server 30 instructs the audio processing device 20A and the audio processing device 20B to communicate with each other peer-to-peer (S5, S6), and the audio processing device 20A and the audio processing device 20B establish a connection (S7, S8).

接続の確立後、利用者ＵＡが音声処理装置２０Ａに向かって発話すると（Ｓ９）、音声処理装置２０Ａの第１符号化部２２４が符号化音声データを生成し、符号化音声データを音声処理装置２０Ｂに送信する（Ｓ１０）。そして、音声処理装置２０Ｂの第１復号部２３２が符号化音声データを復号し、復号により得られた音声データに基づいて音声処理装置２０Ｂの音声出力部２３６が利用者ＵＡの音声を出力する（Ｓ１１）。 After the connection is established, when the user UA speaks to the audio processing device 20A (S9), the first encoding unit 224 of the audio processing device 20A generates encoded audio data, and the encoded audio data is transmitted to the audio processing device 20B (S10). Then, the first decoding unit 232 of the audio processing device 20B decodes the encoded audio data, and the audio output unit 236 of the audio processing device 20B outputs the voice of the user UA based on the audio data obtained by the decoding (S11).

同様に、利用者ＵＢが音声処理装置２０Ｂに向かって発話すると（Ｓ１２）、音声処理装置２０Ｂの第１符号化部２２４が符号化音声データを生成し、符号化音声データを音声処理装置２０Ａに送信する（Ｓ１３）。そして、音声処理装置２０Ａの第１復号部２３２が符号化音声データを復号し、復号により得られた音声データに基づいて音声処理装置２０Ａの音声出力部２３６が利用者ＵＢの音声を出力する（Ｓ１４）。この間、符号化音声データは会議サーバ３０を経由しない。会議室に参加する音声処理装置２０が増えても同様のシーケンスにより各音声処理装置２０を接続することが可能である。 Similarly, when the user UB speaks to the audio processing device 20B (S12), the first encoding unit 224 of the audio processing device 20B generates encoded audio data, and the encoded audio data is transmitted to the audio processing device 20A (S13). Then, the first decoding unit 232 of the audio processing device 20A decodes the encoded audio data, and the audio output unit 236 of the audio processing device 20A outputs the voice of the user UB based on the audio data obtained by the decoding (S14). During this exchange, the encoded audio data does not pass through the conference server 30. Even if more audio processing devices 20 join the conference room, each audio processing device 20 can be connected by the same sequence.

（試験モードでのシーケンス）
続いて、音声処理装置２０Ａおよび音声処理装置２０Ｂの接続が確立された後に、音声処理装置２０Ａが試験モードに移行した場合の処理シーケンスを説明する。 (Sequence in test mode)
Next, a processing sequence when the audio processing device 20A shifts to the test mode after the connection between the audio processing device 20A and the audio processing device 20B is established will be described.

図４は、音声処理装置２０Ａが試験モードに移行した場合の処理シーケンスを示す説明図である。利用者ＵＡが音声処理装置２０Ａに対して音声試験の開始操作を行うと（Ｓ２１）、音声処理装置２０Ａが試験モードへ移行する（Ｓ２２）。試験モードでは、後述の符号化音声データの送信において、相手装置に試験用の音声データであることを示すフラグが設定される。 FIG. 4 is an explanatory diagram showing a processing sequence when the audio processing device 20A shifts to the test mode. When the user UA performs an operation to start an audio test on the audio processing device 20A (S21), the audio processing device 20A shifts to the test mode (S22). In the test mode, when the encoded audio data is transmitted as described later, a flag is set that indicates to the partner device that the data is test audio data.

その後、利用者ＵＡは、試験モードにおいて発話を行うか、第１バッファ２４４に保持している音声データを使うかの指示を出す（Ｓ２３）。利用者ＵＡが試験用の発話を行うと（Ｓ２４）、音声処理装置２０Ａの第１バッファ２４４が一定時間の音声データを保持し（Ｓ２５：第１バッファリング）、当該保持された音声データを第１符号化部２２４が第１の処理方式を用いて符号化して符号化音声データを生成する（Ｓ２７：第１符号化）。 After that, the user UA instructs whether to speak in the test mode or to use the audio data held in the first buffer 244 (S23). When the user UA makes a test utterance (S24), the first buffer 244 of the audio processing device 20A holds audio data for a certain period of time (S25: first buffering), and the first encoding unit 224 encodes the held audio data using the first processing method to generate encoded audio data (S27: first encoding).

一方、第１バッファ２４４に保持している音声データを使う場合、第１バッファ２４４に保持されている音声データがあれば、制御部２４０は当該音声データを第１符号化部２２４に供給し（Ｓ２６）、第１符号化部２２４は当該音声データを第１の処理方式を用いて符号化して符号化音声データを生成する（Ｓ２７：第１符号化）。 On the other hand, when using the audio data held in the first buffer 244, if there is audio data held in the first buffer 244, the control unit 240 supplies the audio data to the first encoding unit 224 ( S26), the first encoding unit 224 encodes the audio data using the first processing method to generate encoded audio data (S27: first encoding).

いずれにしても、符号化音声データは、所定のフラグを含む試験モードに関する情報を伴って音声処理装置２０Ａの第１通信部２２８から音声処理装置２０Ｂに送信される（Ｓ２８）。試験モードに関する情報は、所定のフラグに加えて、音声処理装置２０Ｂからのデータ待ち受け用のアドレスとポート等を示す情報を含む。 In any case, the encoded audio data is transmitted from the first communication unit 228 of the audio processing device 20A to the audio processing device 20B along with the test mode information including the predetermined flag (S28). The information about the test mode includes, in addition to a predetermined flag, information indicating an address and port for waiting for data from the audio processing device 20B.
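The test mode information sent in S28 carries a flag plus the address and port on which the sender waits for the looped-back data. One possible encoding of that information, for example as a message on a WebRTC data channel, is sketched below. The JSON layout and the field names `test_mode`, `addr`, and `port` are assumptions; the specification does not define a wire format.

```python
import json
from typing import Optional


def build_test_info(listen_addr: str, listen_port: int) -> str:
    """Serialize the flag and the loopback listening endpoint."""
    return json.dumps(
        {"test_mode": True, "addr": listen_addr, "port": listen_port}
    )


def parse_test_info(message: str) -> Optional[tuple[str, int]]:
    """Return (addr, port) if the flag is set, otherwise None."""
    info = json.loads(message)
    if info.get("test_mode") is not True:
        return None
    return info["addr"], int(info["port"])
```

The receiving device would use the returned address and port as the destination for its second communication unit 256 when sending the re-encoded audio back.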

音声処理装置２０Ｂの第１通信部２２８が符号化音声データを受信すると、音声処理装置２０Ｂの第１復号部２３２が当該符号化音声データを第１の処理方式を用いて復号する（Ｓ２９：第１復号）。ここで、音声処理装置２０Ｂの制御部２４０が所定のフラグの設定に基づいて当該符号化音声データが試験モードで送信されたデータであると判断すると、音声処理装置２０Ｂの回送部２４８が復号により得られた音声データを第２符号化部２５２に回送し、第２符号化部２５２が当該音声データを第２の処理方式を用いて符号化する（Ｓ３０：第２符号化）。そして、音声処理装置２０Ｂの第２通信部２５６が符号化により生成された符号化音声データを音声処理装置２０Ａに送信する（Ｓ３１）。 When the first communication unit 228 of the audio processing device 20B receives the encoded audio data, the first decoding unit 232 of the audio processing device 20B decodes the encoded audio data using the first processing method (S29: first decoding). If the control unit 240 of the audio processing device 20B determines, based on the setting of the predetermined flag, that the encoded audio data was transmitted in the test mode, the forwarding unit 248 of the audio processing device 20B forwards the audio data obtained by the decoding to the second encoding unit 252, and the second encoding unit 252 encodes the audio data using the second processing method (S30: second encoding). Then, the second communication unit 256 of the audio processing device 20B transmits the encoded audio data generated by this encoding to the audio processing device 20A (S31).

音声処理装置２０Ａの第２通信部２５６が符号化音声データを受信すると、音声処理装置２０Ａの第２復号部２６０が符号化音声データを第２の処理方式を用いて復号し（Ｓ３２：第２復号）、第２バッファ２６４が復号により得られた音声データを一時的に保持する（Ｓ３３：第２バッファリング）。 When the second communication unit 256 of the audio processing device 20A receives the encoded audio data, the second decoding unit 260 of the audio processing device 20A decodes the encoded audio data using the second processing method (S32: second decoding), and the second buffer 264 temporarily holds the audio data obtained by decoding (S33: second buffering).

その後、音声出力部２３６が、第２バッファ２６４に保持されている音声データに基づいて音声を出力する（Ｓ３４）。音声処理装置２０Ａの相手先の装置が複数存在する場合、音声出力部２３６は、第２バッファ２６４に保持されている複数の音声データを任意の順番で再生する。そして、制御部２４０が試験モード終了の処理を行い、所定のフラグの設定を外す（Ｓ３５）。 After that, the audio output unit 236 outputs audio based on the audio data held in the second buffer 264 (S34). When there are a plurality of devices on the other end of the audio processing device 20A, the audio output unit 236 reproduces the plurality of audio data held in the second buffer 264 in arbitrary order. Then, the control unit 240 performs processing for terminating the test mode, and removes the setting of the predetermined flag (S35).
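The sequence S27 to S32 can be traced end to end with toy codecs. In the sketch below, the first processing method is modeled as a deliberately lossy 8-bit quantization and the second as lossless zlib compression. Both are stand-ins chosen only to make the effect visible, not the codecs of the embodiment, and all function names are hypothetical.

```python
import struct
import zlib


def first_encode(pcm: list[int]) -> bytes:
    """Lossy stand-in: keep only the high byte of each 16-bit sample."""
    return bytes((s >> 8) & 0xFF for s in pcm)


def first_decode(data: bytes) -> list[int]:
    """Rebuild 16-bit samples from the high bytes; low bits are lost."""
    return [((b - 256) if b >= 128 else b) << 8 for b in data]


def second_encode(pcm: list[int]) -> bytes:
    """Lossless stand-in for the second processing method."""
    return zlib.compress(struct.pack(f"<{len(pcm)}h", *pcm))


def second_decode(data: bytes) -> list[int]:
    raw = zlib.decompress(data)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def loopback_test(pcm: list[int]) -> list[int]:
    """Return what device 20A plays back after the round trip."""
    sent = first_encode(pcm)        # S27 on device 20A
    at_b = first_decode(sent)       # S29 on device 20B
    returned = second_encode(at_b)  # S30 on device 20B
    return second_decode(returned)  # S32 on device 20A
```

With these stand-ins, the samples played back on device 20A are exactly the samples that device 20B decoded, so the user hears the degradation of the forward path and nothing more.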

（音声データの流れ）
ここで、図５および図６を参照して、通常モードにおける音声データの流れ、および試験モードにおける音声データの流れを整理する。 (Audio data flow)
Here, with reference to FIGS. 5 and 6, the flow of audio data in the normal mode and the flow of audio data in the test mode will be organized.

図５は、音声処理装置２０Ａおよび音声処理装置２０Ｂの通常モードにおける音声データの流れを示す説明図である。図５において、実線は音声処理装置２０Ａの利用者ＵＡの音声データの流れを示し、二点鎖線は音声処理装置２０Ｂの利用者ＵＢの音声データの流れを示している。 FIG. 5 is an explanatory diagram showing the flow of audio data in the normal mode of the audio processing devices 20A and 20B. In FIG. 5, a solid line indicates the flow of voice data of the user UA of the voice processing device 20A, and a two-dot chain line indicates the flow of voice data of the user UB of the voice processing device 20B.

図５において実線で示したように、通常モードにおいては、利用者ＵＡの音声データは、音声処理装置２０Ａの音声入力部２２０、第１符号化部２２４（第１バッファ２４４を介してもよい）、第１通信部２２８、そして、音声処理装置２０Ｂの第１通信部２２８、第１復号部２３２、音声出力部２３６、という流れで処理される。同様に、図５において二点鎖線で示したように、利用者ＵＢの音声データは、音声処理装置２０Ｂの音声入力部２２０、第１符号化部２２４（第１バッファ２４４を介してもよい）、第１通信部２２８、そして、音声処理装置２０Ａの第１通信部２２８、第１復号部２３２、音声出力部２３６、という流れで処理される。 As indicated by the solid line in FIG. 5, in the normal mode, the voice data of the user UA is processed in the following order: the audio input unit 220, the first encoding unit 224 (optionally via the first buffer 244), and the first communication unit 228 of the audio processing device 20A, and then the first communication unit 228, the first decoding unit 232, and the audio output unit 236 of the audio processing device 20B. Similarly, as indicated by the two-dot chain line in FIG. 5, the voice data of the user UB is processed in the following order: the audio input unit 220, the first encoding unit 224 (optionally via the first buffer 244), and the first communication unit 228 of the audio processing device 20B, and then the first communication unit 228, the first decoding unit 232, and the audio output unit 236 of the audio processing device 20A.

図６は、音声処理装置２０が試験モードで動作し、音声処理装置２０Ｂが通常モードで動作している場合の音声データの流れを示す説明図である。図６において、実線は音声処理装置２０Ａの利用者ＵＡの音声データの流れを示し、二点鎖線は音声処理装置２０Ｂの利用者ＵＢの音声データの流れを示している。 FIG. 6 is an explanatory diagram showing the flow of audio data when the audio processing device 20A operates in the test mode and the audio processing device 20B operates in the normal mode. In FIG. 6, the solid line indicates the flow of the voice data of the user UA of the audio processing device 20A, and the two-dot chain line indicates the flow of the voice data of the user UB of the audio processing device 20B.

図６において実線で示したように、試験モードにおいては、利用者ＵＡの音声データは、音声処理装置２０Ａの音声入力部２２０、第１符号化部２２４（第１バッファ２４４を介してもよい）、第１通信部２２８、そして、音声処理装置２０Ｂの第１通信部２２８、第１復号部２３２、回送部２４８、第２符号化部２５２、第２通信部２５６という流れで処理され、第２の処理方式で生成された符号化音声データが音声処理装置２０Ａに送信される。その後、符号化音声データが音声処理装置２０Ａの第２通信部２５６、第２復号部２６０、第２バッファ２６４、音声出力部２３６という流れで処理され、音声処理装置２０Ａの音声出力部２３６から利用者ＵＡの音声が出力される。なお、音声処理装置２０Ｂにおいて、第１復号部２３２から音声出力部２３６へは音声データが供給されないので、音声処理装置２０Ｂの音声出力部２３６からは利用者ＵＡの音声データは出力されない。 As indicated by the solid line in FIG. 6, in the test mode, the voice data of the user UA is processed in the following order: the audio input unit 220, the first encoding unit 224 (optionally via the first buffer 244), and the first communication unit 228 of the audio processing device 20A, and then the first communication unit 228, the first decoding unit 232, the forwarding unit 248, the second encoding unit 252, and the second communication unit 256 of the audio processing device 20B. The encoded audio data generated by the second processing method is then transmitted to the audio processing device 20A. After that, the encoded audio data is processed by the second communication unit 256, the second decoding unit 260, the second buffer 264, and the audio output unit 236 of the audio processing device 20A in that order, and the voice of the user UA is output from the audio output unit 236 of the audio processing device 20A. Note that in the audio processing device 20B, no audio data is supplied from the first decoding unit 232 to the audio output unit 236, so the voice of the user UA is not output from the audio output unit 236 of the audio processing device 20B.

一方、図６において二点鎖線で示したように、音声処理装置２０Ａが試験モードで動作している場合には、利用者ＵＢの音声データは、音声処理装置２０Ｂの音声入力部２２０、第１符号化部２２４（第１バッファ２４４を介してもよい）、第１通信部２２８、そして、音声処理装置２０Ａの第１通信部２２８、第１復号部２３２という流れで処理される。第１復号部２３２により得られた音声データは音声出力部２３６に供給されないので、音声処理装置２０Ａの音声出力部２３６からは利用者ＵＢの音声データは出力されない。 On the other hand, as indicated by the two-dot chain line in FIG. 6, when the audio processing device 20A is operating in the test mode, the voice data of the user UB is processed in the following order: the audio input unit 220, the first encoding unit 224 (optionally via the first buffer 244), and the first communication unit 228 of the audio processing device 20B, and then the first communication unit 228 and the first decoding unit 232 of the audio processing device 20A. The audio data obtained by the first decoding unit 232 is not supplied to the audio output unit 236, so the voice of the user UB is not output from the audio output unit 236 of the audio processing device 20A.

＜４．作用効果＞
以上説明した本発明の一実施形態によれば、多様な作用効果が発揮される。例えば、本発明の一実施形態による音声処理装置２０は、試験モードにおいて、利用者の音声データを他の音声処理装置２０から折り返して受信し、当該音声データに基づいて利用者の音声を出力する。従って、利用者の音声データが他の音声処理装置２０に到達する環境であるか否かの確認を、利用者間で「音声届いていますか？」といった会話による確認無しに行うことが可能である。結果、このような会話により会議の進行が妨げられないので、会議を円滑に進行することが可能となる。 <4. Action effect>
According to the embodiment of the present invention described above, various functions and effects are obtained. For example, in the test mode, the audio processing device 20 according to one embodiment of the present invention receives the user's voice data looped back from another audio processing device 20 and outputs the user's voice based on that voice data. It is therefore possible to confirm whether the user's voice data reaches the other audio processing device 20 without the users having to check by conversation, such as asking "Can you hear me?". As a result, the conference is not interrupted by such exchanges and can proceed smoothly.

また、本発明の一実施形態による音声処理装置２０は、ネットワーク対応のＴＶ会議システムやＰＣ上のビデオ通話ソフトにおいて、通常の会議や会話の際の符号化方式（第１の処理方式）および速度重視の送受信のプロトコル（第１の通信方式）に加えて、可逆式の符号化方式（第２の処理方式）と確実性重視の送受信プロトコル（第２の通信方式）に対応している。そして、他の音声処理装置２０からの音声データの折り返しには、第２の処理方式および第２の通信方式が用いられる。従って、音声処理装置２０の利用者は、音声処理装置２０から出力される自身の音声に基づき、他の音声処理装置２０で出力されるだろう音声の品質を確認することが可能である。また、音声処理装置２０から出力された自身の音声の品質に問題がなければ、音声処理装置２０と他の音声処理装置２０の内部処理および双方のネットワークには問題がないことが分かるので、トラブル原因の追究の時間ロスを軽減することができる。 In addition, in a network-compatible TV conference system or video call software on a PC, the audio processing device 20 according to one embodiment of the present invention supports, in addition to the encoding method used for normal conferences and conversations (first processing method) and a speed-oriented transmission/reception protocol (first communication method), a reversible encoding method (second processing method) and a reliability-oriented transmission/reception protocol (second communication method). The second processing method and the second communication method are used to loop back audio data from the other audio processing device 20. Therefore, based on his or her own voice output from the audio processing device 20, the user of the audio processing device 20 can check the quality of the voice that would be output by the other audio processing device 20. Furthermore, if there is no problem with the quality of the user's own voice output from the audio processing device 20, it follows that there is no problem with the internal processing of the audio processing device 20 and the other audio processing device 20 or with the networks on both sides, which reduces the time lost in tracking down the cause of a problem.

また、本発明の一実施形態による音声処理装置２０は、試験モードにおいて、他の音声処理装置２０から受信された通常の音声データに基づく音声の出力を行わない。従って、音声処理装置２０の利用者は、他の音声処理装置２０から折り返して受信された自身の音声を明確に聞くことで、当該音声の品質をより正確に把握することが可能である。また、他の音声処理装置２０においては、試験モードで動作する音声処理装置２０から送信された音声データに基づく音声の出力を行わないので、試験用の音声により会議が妨げられることを防止できる。 Also, the audio processing device 20 according to the embodiment of the present invention does not output audio based on normal audio data received from other audio processing devices 20 in the test mode. Therefore, the user of the voice processing device 20 can clearly hear the voice that is returned and received from the other voice processing device 20, thereby more accurately grasping the quality of the voice. Further, since the other audio processing device 20 does not output audio based on the audio data transmitted from the audio processing device 20 operating in the test mode, it is possible to prevent the conference from being disturbed by the test audio.

また、本発明の一実施形態による音声処理装置２０は第２バッファ２６４を備えるので、音声処理装置２０が複数の他の音声処理装置２０と会議を行う場合でも、他の音声処理装置２０の各々から折り返して受信された自身の音声を順番に聞くことが可能である。 In addition, since the audio processing device 20 according to one embodiment of the present invention includes the second buffer 264, even when the audio processing device 20 holds a conference with a plurality of other audio processing devices 20, the user can listen, one after another, to his or her own voice looped back from each of the other audio processing devices 20.

＜５．変形例＞
以上、本発明の一実施形態を説明した。以下では、上述した実施形態の幾つかの変形例を説明する。なお、以下に説明する各変形例は、単独で上述した実施形態に適用されてもよいし、組み合わせで上述した実施形態に適用されてもよい。また、各変形例は、上述した実施形態の構成に代えて適用されてもよいし、上述した実施形態の構成に対して追加的に適用されてもよい。 <5. Variation>
An embodiment of the present invention has been described above. Several modifications of the above-described embodiment are described below. In addition, each modification described below may be applied to the above-described embodiment alone, or may be applied in combination to the above-described embodiment. Further, each modification may be applied instead of the configuration of the embodiment described above, or may be applied additionally to the configuration of the embodiment described above.

上記では第２の処理方式として、通常の会議や会話の際の符号化方式（第１の処理方式）と異なる方式を説明した。しかし、第２の処理方式の種類は第１の処理方式の種類と同じであり、第２の処理方式に適用されるパラメータが第１の処理方式に適用されるパラメータと異なってもよい。また、第２の処理方式は完全可逆の方式でなく、ニアロスレスであってもよい。 As the second processing method, a method different from the encoding method (first processing method) used in normal meetings and conversations has been described above. However, the type of the second processing scheme may be the same as the type of the first processing scheme, and the parameters applied to the second processing scheme may differ from the parameters applied to the first processing scheme. Also, the second processing method may be a near-lossless method instead of a completely reversible method.

また、音声処理装置２０が通常の会議や会話の際の符号化方式（第１の処理方式）および速度重視の送受信のプロトコル（第１の通信方式）に加えて、可逆式の符号化方式（第２の処理方式）と確実性重視の送受信プロトコル（第２の通信方式）に対応している例を説明したが、音声処理装置２０は、第２の処理方式または第２の通信方式の一方または双方に対応していなくてもよい。すなわち、他の音声処理装置２０において第１の処理方式で生成された符号化音声データが第１の通信方式で折り返されてもよい。この場合、他の音声処理装置２０に届く音声データよりも品質が劣化した音声データが音声処理装置２０に折り返されることになるが、音声処理装置２０の利用者の音声データが他の音声処理装置２０に届いているか否かの確認は可能である。 In addition, an example has been described in which the audio processing device 20 supports a reversible encoding method (second processing method) and a reliability-oriented transmission/reception protocol (second communication method) in addition to the encoding method used for normal conferences and conversations (first processing method) and the speed-oriented transmission/reception protocol (first communication method). However, the audio processing device 20 need not support one or both of the second processing method and the second communication method. That is, encoded audio data generated by the first processing method in another audio processing device 20 may be looped back using the first communication method. In this case, the audio data looped back to the audio processing device 20 is of lower quality than the audio data that reached the other audio processing device 20, but it is still possible to confirm whether the voice data of the user of the audio processing device 20 has reached the other audio processing device 20.

また、上記では音声データを折り返すための機能が音声処理装置２０に実装される例を説明したが、当該機能は会議サーバ３０にも実装されてもよい。この場合、音声処理装置２０は、会議サーバ３０および他の音声処理装置２０の双方から音声データの折り返しを受け、双方の音声データを比較することで、トラブルの原因究明を行い得る。例えば、会議サーバ３０からは正常な音声データが折り返されたが、他の音声処理装置２０からは音声データが折り返されない（または、折り返されても品質に問題がある）場合には、音声処理装置２０および音声処理装置２０側のネットワーク回線には問題が無く、他の音声処理装置２０側に何かしらの問題があることが分かる。 In addition, although an example has been described above in which the function for looping back audio data is implemented in the audio processing device 20, the function may also be implemented in the conference server 30. In this case, the audio processing device 20 receives looped-back audio data from both the conference server 30 and the other audio processing device 20 and can investigate the cause of a problem by comparing the two. For example, if normal audio data is looped back from the conference server 30 but no audio data is looped back from the other audio processing device 20 (or audio data is looped back but has a quality problem), it can be concluded that there is no problem with the audio processing device 20 or with the network line on its side, and that there is some problem on the side of the other audio processing device 20.
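The comparison of the two loopbacks can be written as a small decision function. Only the case in which the server loopback is normal and the peer loopback is missing or degraded is stated in the description. The other branches are assumptions added to make the sketch complete, and the function name `diagnose` and its messages are hypothetical.

```python
def diagnose(server_loopback_ok: bool, peer_loopback_ok: bool) -> str:
    """Narrow down where a fault lies from two loopback results."""
    if server_loopback_ok and peer_loopback_ok:
        return "no fault detected"
    if server_loopback_ok:
        # Case stated in the description: the local device and its line
        # are fine, so the problem is on the other device's side.
        return "fault on the other device's side"
    # Assumption: if even the server loopback fails, the local device
    # or the local network line is the first thing to check.
    return "check local device or local network line"
```

For example, `diagnose(True, False)` returns the message that points at the other device's side, matching the example given in the text.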

また、上記では、データの一例として音声データを説明し、符号化データの一例として符号化音声データを説明したが、映像データおよび符号化映像データにも本発明の一実施形態を適用可能である。すなわち、音声処理装置２０は、映像データの折り返しのための機能を有してもよい。この場合、図２を参照して説明した音声入力部２２０に加えてまたは代えてカメラのような映像入力部が用いられ、音声出力部２３６に加えてまたは代えてディスプレイのような映像表示部が用いられる。かかる構成によっても、音声処理装置２０の利用者の映像が他の音声処理装置２０に届くか否か、届く場合にはどのような品質で届くかを確認することが可能である。 In the above description, audio data was described as an example of data, and encoded audio data was described as an example of encoded data, but an embodiment of the present invention can also be applied to video data and encoded video data. . That is, the audio processing device 20 may have a function for folding back video data. In this case, a video input unit such as a camera is used in addition to or instead of the audio input unit 220 described with reference to FIG. 2, and a video display unit such as a display is used in addition to or instead of the audio output unit 236. Used. With this configuration as well, it is possible to confirm whether or not the video of the user of the audio processing device 20 will reach another audio processing device 20, and if so, what quality it will have.

＜６．ハードウェア構成＞
以上、本発明の一実施形態を説明した。上述した音声データの符号化および復号などの情報処理は、ソフトウェアと、以下に説明する音声処理装置２０のハードウェアとの協働により実現される。 <6. Hardware Configuration>
An embodiment of the present invention has been described above. Information processing such as encoding and decoding of the audio data described above is realized by cooperation between software and hardware of the audio processing device 20 described below.

図７は、音声処理装置２０のハードウェア構成を示したブロック図である。音声処理装置２０は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）２０１と、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）２０２と、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）２０３と、ホストバス２０４と、を備える。また、音声処理装置２０は、ブリッジ２０５と、外部バス２０６と、インターフェース２０７と、入力装置２０８と、表示装置２０９と、音声出力装置２１０と、ストレージ装置（ＨＤＤ）２１１と、ドライブ２１２と、ネットワークインターフェース２１５とを備える。 FIG. 7 is a block diagram showing the hardware configuration of the audio processing device 20. The audio processing device 20 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, and a host bus 204. The audio processing device 20 also includes a bridge 205, an external bus 206, an interface 207, an input device 208, a display device 209, an audio output device 210, a storage device (HDD) 211, a drive 212, and a network interface 215.

ＣＰＵ２０１は、演算処理装置および制御装置として機能し、各種プログラムに従って音声処理装置２０内の動作全般を制御する。また、ＣＰＵ２０１は、マイクロプロセッサであってもよい。ＲＯＭ２０２は、ＣＰＵ２０１が使用するプログラムや演算パラメータ等を記憶する。ＲＡＭ２０３は、ＣＰＵ２０１の実行において使用するプログラムや、その実行において適宜変化するパラメータ等を一時記憶する。これらはＣＰＵバスなどから構成されるホストバス２０４により相互に接続されている。これらＣＰＵ２０１、ＲＯＭ２０２およびＲＡＭ２０３とソフトウェアとの協働により、上述した第１符号化部２２４、第１復号部２３２、制御部２４０、回送部２４８、第２符号化部２５２および第２通信部２５６などの機能が実現され得る。 The CPU 201 functions as an arithmetic processing device and a control device, and controls overall operations within the sound processing device 20 according to various programs. Alternatively, the CPU 201 may be a microprocessor. The ROM 202 stores programs, calculation parameters, and the like used by the CPU 201 . The RAM 203 temporarily stores programs used in the execution of the CPU 201, parameters that change as appropriate during the execution, and the like. These are interconnected by a host bus 204 comprising a CPU bus or the like. The CPU 201, the ROM 202, the RAM 203, and the software cooperate to operate the first encoding unit 224, the first decoding unit 232, the control unit 240, the forwarding unit 248, the second encoding unit 252, the second communication unit 256, and the like. can be realized.

The host bus 204 is connected via the bridge 205 to the external bus 206, such as a PCI (Peripheral Component Interconnect/Interface) bus. The host bus 204, the bridge 205, and the external bus 206 do not necessarily have to be configured separately; their functions may be implemented on a single bus.

The input device 208 includes input means with which the user inputs information, such as a mouse, keyboard, touch panel, button, microphone, sensor, switch, and lever, and an input control circuit that generates an input signal based on the user's input and outputs it to the CPU 201. By operating the input device 208, the user of the audio processing device 20 can input various data to the audio processing device 20 and instruct it to perform processing operations.

The display device 209 includes, for example, display devices such as a liquid crystal display (LCD) device, a projector device, an OLED (Organic Light Emitting Diode) device, and a lamp. The audio output device 210 includes audio output devices such as speakers and headphones.

The storage device 211 is a data storage device configured as an example of a storage unit of the audio processing device 20 according to the present embodiment. The storage device 211 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 211 is configured by, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a memory having equivalent functions. The storage device 211 drives the storage and stores programs executed by the CPU 201 and various data.

The drive 212 is a reader/writer for storage media, and is built into or externally attached to the audio processing device 20. The drive 212 reads information recorded on an attached removable storage medium 24, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and outputs it to the RAM 203 or the storage device 211. The drive 212 can also write information to the removable storage medium 24.

The network interface 215 is, for example, a communication interface configured with a communication device or the like for connecting to the network 12. The network interface 215 may be a wireless LAN (Local Area Network) compatible communication device or a wired communication device that performs communication over a wire.

Note that the hardware configuration of the audio processing device 20 described above can also be applied to the conference server 30.

<7. Supplement>
Although the preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs can conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present invention as well.

For example, the steps in the processing of the audio processing device 20 described in this specification do not necessarily have to be processed in chronological order following the order shown in the sequence diagrams or flowcharts. The steps in the processing of the audio processing device 20 may be processed in an order different from the order shown in the flowcharts, or may be processed in parallel.
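As a loose illustration of processing steps in parallel, the following Python sketch decodes several received frames concurrently while keeping the results in their input order. It is only a sketch under stated assumptions: the function names `decode` and `decode_all` are hypothetical, and `zlib` is used as a stand-in for a real audio codec, which the specification does not name.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def decode(frame: bytes) -> bytes:
    """Hypothetical stand-in for one decoding step (zlib, not an audio codec)."""
    return zlib.decompress(frame)


def decode_all(frames: List[bytes]) -> List[bytes]:
    """Decode frames concurrently; Executor.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(decode, frames))


if __name__ == "__main__":
    # Eight dummy frames, each a compressed run of one byte value.
    frames = [zlib.compress(bytes([i]) * 100) for i in range(8)]
    decoded = decode_all(frames)
    print(len(decoded), "frames decoded")
```

Because `Executor.map` yields results in the order of its inputs, the output order stays fixed even though the individual decoding steps may finish in a different order.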

It is also possible to create a computer program that causes hardware such as the CPU, ROM, and RAM built into the audio processing device 20 to exhibit functions equivalent to those of the components of the audio processing device 20 described above. A storage medium storing the computer program is also provided.
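What such a program might look like on the receiving side can be sketched roughly as follows, in Python. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: `Packet`, `check_flag`, `first_encode`, `first_decode`, `second_encode`, and `handle_received` are hypothetical names, and `zlib` at two compression levels stands in for the first and second processing methods. Since `zlib` is lossless, the sketch shows only the data flow and not any difference in quality deterioration.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    payload: bytes            # encoded data
    check_flag: bool = False  # hypothetical flag marking the specific operation mode


def first_encode(pcm: bytes) -> bytes:
    """Stand-in for the first processing method."""
    return zlib.compress(pcm, 1)


def first_decode(data: bytes) -> bytes:
    """Stand-in for decoding the first encoded data."""
    return zlib.decompress(data)


def second_encode(pcm: bytes) -> bytes:
    """Stand-in for the second processing method."""
    return zlib.compress(pcm, 9)


def handle_received(packet: Packet, played: List[bytes]) -> Optional[Packet]:
    """Decode received data, then either output it locally or send it back."""
    pcm = first_decode(packet.payload)
    if packet.check_flag:
        # Specific operation mode: no local output; re-encode and return to sender.
        return Packet(second_encode(pcm), check_flag=True)
    played.append(pcm)  # Normal mode: output locally, nothing is returned.
    return None


if __name__ == "__main__":
    played: List[bytes] = []
    voice = bytes(range(16)) * 20  # dummy audio samples
    reply = handle_received(Packet(first_encode(voice), check_flag=True), played)
    print("returned to sender:", reply is not None, "| played locally:", len(played))
```

In this sketch the sending device would decode the returned packet and play it, which lets its user hear how the audio arrived at the other end without asking the other user.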

20 audio processing device
220 audio input unit
224 first encoding unit
228 first communication unit
232 first decoding unit
236 audio output unit
238 operation unit
240 control unit
244 first buffer
248 forwarding unit
252 second encoding unit
256 second communication unit
260 second decoding unit
264 second buffer
30 conference server

Claims

1. A data processing device comprising:
a receiving unit that receives first encoded data from another data processing device;
a decoding unit that decodes the first encoded data received by the receiving unit;
an encoding unit that encodes the data obtained by the decoding unit to generate second encoded data; and
a control unit that controls transmission of the second encoded data to the other communication device when it is determined that the first encoded data is data transmitted from the other communication device in a specific operation mode.

2. The data processing device according to claim 1, further comprising an output unit that outputs audio or video based on the data obtained by the decoding unit,
wherein the output unit does not output audio or video based on the data obtained by the decoding unit when the control unit determines that the first encoded data is data transmitted from the other data processing device in the specific operation mode.

3. The data processing device according to claim 1 or 2, wherein the encoding unit generates the second encoded data using a second processing method that causes less quality deterioration than a first processing method used to generate the first encoded data.

4. The data processing device according to any one of claims 1 to 3, wherein the control unit controls transmission of the second encoded data using a second communication scheme having higher reliability than a first communication scheme used for communication of the first encoded data.

5. The data processing device according to any one of claims 1 to 4, wherein the control unit determines that the first encoded data is data transmitted from the other data processing device in the specific operation mode, based on a predetermined flag being added to the first encoded data.

6. A data processing system comprising a first data processing device and a second data processing device, wherein
the first data processing device transmits first encoded data obtained by encoding data to the second data processing device, and
the second data processing device comprises:
a receiving unit that receives the first encoded data from the first data processing device;
a decoding unit that decodes the first encoded data received by the receiving unit;
an encoding unit that encodes the data obtained by the decoding unit to generate second encoded data; and
a control unit that controls transmission of the second encoded data to the first data processing device when it is determined that the first encoded data is data transmitted from the first data processing device in a specific operation mode.

7. An audio processing method comprising:
receiving first encoded data from another data processing device;
decoding the first encoded data;
encoding data obtained by decoding the first encoded data to generate second encoded data; and
transmitting the second encoded data to the other data processing device when it is determined that the first encoded data is data transmitted from the other data processing device in a specific operation mode.

8. A data processing device comprising:
an encoding unit that encodes input data to generate encoded data;
a transmission unit that transmits the encoded data to another data processing device;
a receiving unit that receives first encoded data or second encoded data from the other data processing device; and
a control unit that controls audio or video output based on the first encoded data in a first operation mode, and controls audio or video output based on the second encoded data in a second operation mode.

9. The data processing device according to claim 8, wherein
the first encoded data is data transmitted using a first communication scheme, and the second encoded data is data transmitted using a second communication scheme having higher reliability than the first communication scheme, and
the receiving unit comprises:
a first receiving unit that corresponds to the first communication scheme and receives the first encoded data; and
a second receiving unit that corresponds to the second communication scheme and receives the second encoded data.

10. The data processing device according to claim 8 or 9, wherein
the first encoded data is data generated using a first processing method, and the second encoded data is data generated using a second processing method that causes less quality deterioration than the first processing method, and
the data processing device comprises:
a first decoding unit that corresponds to the first processing method and decodes the first encoded data; and
a second decoding unit that corresponds to the second processing method and decodes the second encoded data.

11. The data processing device according to claim 8, further comprising a first buffer that holds input data,
wherein the control unit supplies the data held in the first buffer to the encoding unit.

12. The data processing device according to claim 8, further comprising a second buffer that holds a plurality of data obtained by decoding a plurality of the second encoded data received from a plurality of other data processing devices,
wherein the control unit controls output of the plurality of data held in the second buffer sequentially.

13. The data processing device according to any one of claims 8 to 12, wherein the transmission unit transmits the encoded data together with a predetermined flag in the second operation mode.

14. An audio processing method comprising:
encoding input data to generate encoded data;
transmitting the encoded data to another data processing device;
receiving first encoded data or second encoded data from the other data processing device; and
controlling audio or video output based on the first encoded data in a first operation mode, and controlling audio or video output based on the second encoded data in a second operation mode.