JP6951610B1

JP6951610B1 - Speech processing system, speech processor, speech processing method, and speech processing program

Info

Publication number: JP6951610B1
Application number: JP2021549976A
Authority: JP
Inventors: 篤古城; 栗原　洋介
Original assignee: Uhuru Corp
Current assignee: Uhuru Corp
Priority date: 2020-07-20
Filing date: 2020-07-20
Publication date: 2021-10-20
Anticipated expiration: 2040-07-20
Also published as: JP2022020625A; JPWO2022018786A1; WO2022018786A1

Abstract

The voice processing system includes: a microphone that detects sound emitted by a user participating in an event; a sound adjustment unit that adjusts the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and an output unit that outputs data representing the sound synthesized by the synthesis unit.

Description

The present invention relates to a voice processing system, a voice processing device, a voice processing method, and a voice processing program.

Patent Document 1 below proposes a technique that can give fans the impression that an artist is singing or playing on the spot, without the artist appearing in front of the fans. In the field of distributing content for various events such as music, entertainment, and sports, technology that can provide new experiences is desired.

Patent Document 1: Japanese Unexamined Patent Publication No. 2015-185020

According to an aspect of the present invention, there is provided a voice processing system including: a microphone that detects sound emitted by a user participating in an event; a sound adjustment unit that adjusts the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and an output unit that outputs data representing the sound synthesized by the synthesis unit, wherein the performer performs outside the predetermined area, and the data of the sound synthesized by the synthesis unit is provided to a sound output device placed at a position from which sound reaches the performer.
According to an aspect of the present invention, there is provided a voice processing system including: a microphone that detects sound emitted by a user participating in an event; a sound adjustment unit that adjusts the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and an output unit that outputs data representing the sound synthesized by the synthesis unit, wherein the sound adjustment unit adjusts a first sound detected by the microphone so that it approximates a second sound that would reach the performer position if the first sound were emitted at the audience position.
According to an aspect of the present invention, there is provided a voice processing system including: a microphone that detects sound emitted by a user participating in an event; a sound adjustment unit that adjusts the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and an output unit that outputs data representing the sound synthesized by the synthesis unit, wherein the sound adjustment unit adjusts the loudness of the sound detected by the microphone using the distance between the audience position and the performer position.
According to an aspect of the present invention, there is provided a voice processing system including: a microphone that detects sound emitted by a user participating in an event; a sound adjustment unit that adjusts the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; an output unit that outputs data representing the sound synthesized by the synthesis unit; a second sound adjustment unit that adjusts the sound detected by the microphone corresponding to the second user based on the relationship between the audience position corresponding to the first user and the audience position corresponding to the second user; and a second output unit that outputs sound data including the sound adjusted by the second sound adjustment unit to a sound output device that conveys sound to the first user.
According to an aspect of the present invention, there is provided a voice processing device including: a receiving unit that receives position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; a sound adjustment unit that adjusts, based on the position information, sound emitted by the user and detected by a microphone; and a transmitting unit that transmits data representing the sound adjusted by the sound adjustment unit, wherein the sound adjustment unit adjusts a first sound detected by the microphone so that it approximates a second sound that would reach the performer position if the first sound were emitted at the audience position.
According to an aspect of the present invention, there is provided a voice processing device including: a receiving unit that receives position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; a sound adjustment unit that adjusts, based on the position information, sound emitted by the user and detected by a microphone; and a transmitting unit that transmits data representing the sound adjusted by the sound adjustment unit, wherein the sound adjustment unit adjusts the loudness of the sound detected by the microphone using the distance between the audience position and the performer position.
According to an aspect of the present invention, there is provided a voice processing device including: a transmitting unit that transmits position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; a receiving unit that receives, from a destination of the position information, data of sound emitted by the user and adjusted based on the position information; and a synthesis unit that synthesizes the sound represented by the data received from the destination corresponding to a first user with the sound represented by the data received from the destination corresponding to a second user, who is different from the first user, wherein the performer performs outside the predetermined area, and the data of the sound synthesized by the synthesis unit is provided to a sound output device placed at a position from which sound reaches the performer.
According to an aspect of the present invention, there is provided a voice processing device including: a transmitting unit that transmits position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; a receiving unit that receives, from a destination of the position information, data of sound emitted by the user and adjusted based on the position information; a synthesis unit that synthesizes the sound represented by the data received from the destination corresponding to a first user with the sound represented by the data received from the destination corresponding to a second user, who is different from the first user; and a second sound adjustment unit that adjusts the sound detected by the microphone corresponding to the second user based on the relationship between the audience position corresponding to the first user and the audience position corresponding to the second user, wherein sound data including the sound adjusted by the second sound adjustment unit is output to a sound output device that conveys sound to the first user.
According to an aspect of the present invention, there is provided a voice processing method including: detecting, with a microphone, sound emitted by a user participating in an event; adjusting, with a sound adjustment unit, the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; synthesizing, with a synthesis unit, the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and outputting data representing the sound synthesized by the synthesis unit, wherein the performer performs outside the predetermined area, and the data of the sound synthesized by the synthesis unit is provided to a sound output device placed at a position from which sound reaches the performer.
According to an aspect of the present invention, there is provided a voice processing method including: detecting, with a microphone, sound emitted by a user participating in an event; adjusting, with a sound adjustment unit, the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; synthesizing, with a synthesis unit, the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and outputting data representing the sound synthesized by the synthesis unit, wherein the sound adjustment unit adjusts a first sound detected by the microphone so that it approximates a second sound that would reach the performer position if the first sound were emitted at the audience position.
According to an aspect of the present invention, there is provided a voice processing method including: detecting, with a microphone, sound emitted by a user participating in an event; adjusting, with a sound adjustment unit, the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; synthesizing, with a synthesis unit, the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and outputting data representing the sound synthesized by the synthesis unit, wherein the sound adjustment unit adjusts the loudness of the sound detected by the microphone using the distance between the audience position and the performer position.
According to an aspect of the present invention, there is provided a voice processing method including: detecting, with a microphone, sound emitted by a user participating in an event; adjusting, with a sound adjustment unit, the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; synthesizing, with a synthesis unit, the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; outputting data representing the sound synthesized by the synthesis unit; adjusting, with a second sound adjustment unit, the sound detected by the microphone corresponding to the second user based on the relationship between the audience position corresponding to the first user and the audience position corresponding to the second user; and outputting sound data including the sound adjusted by the second sound adjustment unit to a sound output device that conveys sound to the first user.
According to an aspect of the present invention, there is provided a voice processing program that causes a computer to execute: receiving position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; adjusting, based on the position information, sound emitted by the user and detected by a microphone; and transmitting data representing the adjusted sound, wherein the adjusting includes adjusting a first sound detected by the microphone so that it approximates a second sound that would reach the performer position if the first sound were emitted at the audience position.
According to an aspect of the present invention, there is provided a voice processing program that causes a computer to execute: receiving position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; adjusting, based on the position information, sound emitted by the user and detected by a microphone; and transmitting data representing the adjusted sound, wherein the adjusting includes adjusting the loudness of the sound detected by the microphone using the distance between the audience position and the performer position.
According to an aspect of the present invention, there is provided a voice processing program that causes a computer to execute: transmitting position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; receiving, from a destination of the position information, data of sound emitted by the user and adjusted based on the position information; and synthesizing the sound represented by the data received from the destination corresponding to a first user with the sound represented by the data received from the destination corresponding to a second user, who is different from the first user, wherein the performer performs outside the predetermined area, and the data of the synthesized sound is provided to a sound output device placed at a position from which sound reaches the performer.
According to an aspect of the present invention, there is provided a voice processing program that causes a computer to execute: transmitting position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; receiving, from a destination of the position information, data of sound emitted by the user and adjusted based on the position information; synthesizing the sound represented by the data received from the destination corresponding to a first user with the sound represented by the data received from the destination corresponding to a second user, who is different from the first user; and adjusting the sound detected by the microphone corresponding to the second user based on the relationship between the audience position corresponding to the first user and the audience position corresponding to the second user, wherein sound data including the sound adjusted for the second user is output to a sound output device that conveys sound to the first user.
According to a first aspect of the present invention, there is provided a voice processing system including: a microphone that detects sound emitted by a user participating in an event; a sound adjustment unit that adjusts the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and an output unit that outputs data representing the sound synthesized by the synthesis unit.
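The distance-based adjustment named in the aspects above can be illustrated with a minimal sketch. It assumes an inverse-distance attenuation model with a 1 m reference distance, 2-D coordinates in metres, and samples held as floats in [-1, 1]. None of these details come from the source, and the names `distance_gain` and `adjust_sound` are hypothetical.

```python
import math

REFERENCE_DISTANCE_M = 1.0  # assumption: no attenuation within 1 m of the source


def distance_gain(source_pos, listener_pos, ref=REFERENCE_DISTANCE_M):
    """Return a gain in (0, 1] that falls off as 1/distance (assumed model)."""
    d = math.dist(source_pos, listener_pos)
    return ref / max(d, ref)


def adjust_sound(samples, source_pos, listener_pos):
    """Scale microphone samples as if they were heard at listener_pos."""
    g = distance_gain(source_pos, listener_pos)
    return [s * g for s in samples]


performer_pos = (0.0, 0.0)    # performer position in the predetermined area
first_user_pos = (3.0, 4.0)   # audience position of the first user
second_user_pos = (6.0, 8.0)  # audience position of the second user

mic_second_user = [0.5, -0.5, 0.25]  # sound detected by the second user's microphone

# Role of the sound adjustment unit: adjust toward the performer position.
to_performer = adjust_sound(mic_second_user, second_user_pos, performer_pos)

# Role of the second sound adjustment unit: adjust toward the first user's position.
to_first_user = adjust_sound(mic_second_user, second_user_pos, first_user_pos)
```

The same helper serves both adjustments because each depends only on the relationship between two positions. A real system might also model propagation delay or reverberation, which this sketch omits.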

According to a second aspect of the present invention, there is provided a voice processing device including: a receiving unit that receives position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; a sound adjustment unit that adjusts, based on the position information, sound emitted by the user and detected by a microphone; and a transmitting unit that transmits data representing the sound adjusted by the sound adjustment unit.
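The receive, adjust, and transmit flow of the device in the second aspect could look roughly like the following sketch. The JSON message layout, the inverse-distance attenuation, and the 16-bit PCM payload are all assumptions made for illustration, and the function names are hypothetical; the source does not specify a message format or an adjustment formula.

```python
import json
import math
import struct


def parse_position_info(message: str):
    """Decode a hypothetical JSON position-information message."""
    info = json.loads(message)
    return tuple(info["performer"]), tuple(info["audience"])


def adjust(samples, performer_pos, audience_pos, ref=1.0):
    """Attenuate samples by an assumed inverse-distance law."""
    d = math.dist(performer_pos, audience_pos)
    gain = ref / max(d, ref)
    return [s * gain for s in samples]


def encode_pcm16(samples):
    """Pack float samples in [-1, 1] as little-endian 16-bit PCM."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


# Receiving unit: position information arrives from the server (format assumed).
message = '{"performer": [0, 0], "audience": [0, 2]}'
performer_pos, audience_pos = parse_position_info(message)

# Sound adjustment unit: adjust what the microphone detected.
adjusted = adjust([1.0, -1.0, 0.0], performer_pos, audience_pos)

# Transmitting unit: this payload would be sent back to the server.
payload = encode_pcm16(adjusted)
```

Adjusting on the user's device, as in this aspect, means the server only needs to combine streams that have already been adjusted.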

According to a third aspect of the present invention, there is provided a voice processing device including: a transmitting unit that transmits position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; a receiving unit that receives, from a destination of the position information, data of sound emitted by the user and adjusted based on the position information; and a synthesis unit that synthesizes the sound represented by the data received from the destination corresponding to a first user with the sound represented by the data received from the destination corresponding to a second user, who is different from the first user.
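As a rough illustration of the synthesis unit in the third aspect, the sketch below mixes already-adjusted sound received from two destinations by summing samples and clipping. Summation with hard clipping is an assumed mixing strategy, and the `mix` function and the user identifiers are hypothetical; the source does not state how the sounds are combined.

```python
def mix(frames):
    """Sum per-user sample frames and clip the result to [-1.0, 1.0]."""
    length = max((len(f) for f in frames), default=0)
    mixed = []
    for i in range(length):
        # Frames shorter than the longest one simply contribute nothing here.
        total = sum(f[i] for f in frames if i < len(f))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


# Adjusted sound received from the destinations of the position information,
# keyed by a hypothetical user identifier.
received = {
    "first_user": [0.25, 0.5, -0.75],
    "second_user": [0.25, 0.75, -0.5],
}

synthesized = mix(list(received.values()))
```

Hard clipping keeps the output in range but distorts loud passages; a production mixer would more likely normalize or apply a limiter.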

According to a fourth aspect of the present invention, there is provided a voice processing method including: detecting, with a microphone, sound emitted by a user participating in an event; adjusting, with a sound adjustment unit, the sound detected by the microphone based on the relationship between a performer position, which represents the position of the event's performer in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area; synthesizing, with a synthesis unit, the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is different from the first user, and adjusted by the sound adjustment unit; and outputting data representing the sound synthesized by the synthesis unit.

According to a fifth aspect of the present invention, there is provided a voice processing program that causes a computer to execute: receiving position information indicating the relationship between a performer position, which represents the position of an event's performer in a predetermined area where the event is held, and an audience position, which represents the position in the predetermined area of a user participating in the event; adjusting, based on the position information, sound emitted by the user and detected by a microphone; and transmitting data representing the adjusted sound.

According to a sixth aspect of the present invention, there is provided a voice processing program that causes a computer to execute: transmitting position information indicating the relationship between a performer position representing the position of a performer of an event in a predetermined area where the event is held and an audience position representing the position, in the predetermined area, of a user participating in the event; receiving, from the destination of the position information, data of a sound emitted by the user and adjusted based on the position information; and synthesizing the sound represented by the data received from the destination corresponding to a first user and the sound represented by the data received from the destination corresponding to a second user, who is different from the first user.

FIG. 1 is a diagram showing the voice processing system according to the first embodiment.
FIG. 2 is a diagram showing the process of setting a terminal according to the first embodiment.
FIG. 3 is a diagram showing the voice processing system according to the first embodiment.
FIG. 4 is a diagram showing an example of the sound adjustment method according to the first embodiment.
FIG. 5 is a diagram showing the voice processing method according to the first embodiment.
FIG. 6 is a diagram showing the voice processing system according to the second embodiment.
FIG. 7 is a diagram showing the voice processing method according to the second embodiment.
FIG. 8 is a diagram showing the voice processing system according to the third embodiment.
FIG. 9 is a diagram showing an example of groups according to the third embodiment.
FIG. 10 is a diagram showing the voice processing system according to the fourth embodiment.

[First Embodiment]
The first embodiment will be described. FIG. 1 is a diagram showing a voice processing system according to the first embodiment. The voice processing system 1 is, for example, an information processing system used for a service that distributes content of various events (referred to as a content distribution service as appropriate). The voice processing system 1 distributes content expressing the performance of a performer P to users U, who are the audience. The voice processing system 1 conveys the reactions of the users U viewing the content to the performer P. The performer P can carry out the subsequent performance while being aware of the reactions of the plurality of users U. The users U can, for example, view a performance that changes in response to their own reactions. The users U can experience a sense of presence even when participating in the event from a remote environment, for example. In this way, the voice processing system 1 can provide the users U with a new experience in the field of distributing content of various events. In the following description, an arbitrary user is denoted by the reference sign U as appropriate, and when users U are distinguished, they are denoted by reference signs in which letters a, b, ... are appended to the sign U, such as user Ua and user Ub.

The various events for which the voice processing system 1 is used include, for example, events relating to at least one of music, theater, entertainment, sports, esports, competitions, talks, lectures, speeches, dialogues, debates, and demonstration sales. The voice processing system 1 may be used for other events. An event is, for example, an occasion for showing the skills of the performer P to the audience. The performer P may be called an artist, player, singer, actor, entertainer, speaker, or athlete. The audience may be called listeners or viewers. The event may be one in which at least some of the audience participate for a fee, or one in which at least some of the audience participate free of charge. The event may be held in real space, in virtual space (referred to as cyberspace as appropriate), or in real space and virtual space in parallel.

Reference sign AE in FIG. 1 denotes the predetermined area where the event is held (referred to as the event venue AE as appropriate). The event venue AE may be an area in real space or an area in virtual space. The event venue AE may be a reproduction or extension in cyberspace of a facility that exists in real space. For example, the participants in the event may include audience members who attend at the venue where the performer P performs and audience members who attend at a venue that reproduces this venue in cyberspace. Reference sign P1 in FIG. 1 denotes the performer position, which represents the position of the performer in the event venue AE. The performer position P1 is, for example, a position set in advance with respect to the stage AE1 of the event venue AE. For example, the performer position P1 may be the center of the stage AE1.

Reference sign AP in FIG. 1 denotes the area where the performer P performs (referred to as the demonstration area as appropriate). The demonstration area AP is, for example, outside the event venue AE. The demonstration area AP is an arbitrary area in real space. The demonstration area AP is, for example, a studio. The demonstration area AP may be an area in the residence of the performer P, an area in a commercial facility, or some other area. The demonstration area AP may be an indoor area or an outdoor area (e.g., a park or the roof of a building).

There may be one performer P or a plurality of performers P. When there are a plurality of performers P, the demonstration area AP corresponding to a first performer P may differ from the demonstration area AP corresponding to a second performer P. When a plurality of performers P are present in one demonstration area AP, the performer position P1 may be defined for each performer P, or may be defined as a position representing the plurality of performers P. The representative position of the plurality of performers P may be obtained, for example, by expressing the position of each performer P as coordinates and taking the average of those coordinates (e.g., the centroid of the performers' positions).
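As a minimal sketch of the representative position described above, the following Python computes the coordinate-wise mean (centroid) of several performer positions. The function name `representative_position` and the use of two-dimensional (x, y) coordinates are assumptions made for illustration and do not appear in the source.

```python
from statistics import fmean


def representative_position(positions):
    """Return the centroid (coordinate-wise mean) of 2D performer positions.

    `positions` is a non-empty list of (x, y) tuples; 2D coordinates are an
    assumption made for this sketch.
    """
    if not positions:
        raise ValueError("at least one performer position is required")
    xs, ys = zip(*positions)
    return (fmean(xs), fmean(ys))


# Three performers on a hypothetical stage coordinate system.
performers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
performer_position_p1 = representative_position(performers)
```

A weighted mean or the position of a lead performer would be equally compatible with the text, which names the average only as an example.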

Note that the demonstration area AP may be an area within the event venue AE. For example, the demonstration area AP may be the stage of a venue in real space. The performer position P1 may be fixed or variable. For example, the voice processing system 1 may include a sensor that detects one or both of the position and the movement of the performer P in the demonstration area AP, and may determine the performer position P1 based on the detection result of the sensor. For example, when the performer P moves outward from the center of the demonstration area AP, the voice processing system 1 may change the performer position P1 so that it moves outward from the center of the stage AE1.

Reference sign Q in FIG. 1 denotes an audience position, which represents the position of an audience member in the event venue AE. In the following description, the audience position of an arbitrary user U is denoted by the reference sign Q as appropriate, and when audience positions are distinguished, they are denoted by reference signs in which letters a, b, ... are appended to the sign Q, such as audience position Qa and audience position Qb. The audience position Qa represents the position of the user Ua, who is an audience member. The audience position Qb represents the position of the user Ub, who is an audience member different from the user Ua. The audience position Qb is farther from the performer position P1 than the audience position Qa is. The distance La between the performer position P1 and the audience position Qa is shorter than the distance Lb between the performer position P1 and the audience position Qb.

Reference sign AU in FIG. 1 denotes the area where a user U, who is an audience member, views the content of the event (referred to as the viewing area AU as appropriate). The viewing area AU is an arbitrary area in real space. The viewing area AU is, for example, outside the event venue AE. The viewing area AU is, for example, an area in the residence of the user U. The viewing area AU may be an area other than the residence of the user U. The viewing area AU may be an area in a facility such as a karaoke box, a movie theater, a theater, or a restaurant (e.g., a sports bar). The viewing area AU may be an area in a moving body such as a train, an aircraft, a ship, or a vehicle. The viewing area AU may be an area within the event venue AE. For example, the event venue AE may be a drive-in theater, and the viewing area AU may be an area inside a vehicle parked at the event venue AE. The viewing area AU may be indoors or outdoors.

In the following description, an arbitrary viewing area is denoted by the reference sign AU as appropriate, and when viewing areas AU are distinguished, they are denoted by reference signs in which letters a, b, ... are appended to the sign AU, such as viewing area AUa and viewing area AUb. For example, the viewing area AUa is the area where the user Ua views the content, and the viewing area AUb is the area where the user Ub views the content. The number of viewers present in one viewing area AU may be one, namely the user U alone, or may be more than one, including the user U.

The voice processing system 1 includes a processing device 2, terminals 3, a camera 4, a microphone 5, and a speaker 6. The camera 4, the microphone 5, and the speaker 6 are arranged in the demonstration area AP. The camera 4, the microphone 5, and the speaker 6 are connected to the processing device 2. For example, the camera 4, the microphone 5, and the speaker 6 are connected by wire or wirelessly to an information processing device (e.g., a computer) that includes a communication unit. This information processing device is communicably connected to the processing device 2 via, for example, an Internet line. Each terminal 3 is arranged in a viewing area AU. The terminal 3 is communicably connected to the processing device 2 via, for example, an Internet line.

The camera 4 captures images of the performer P performing in the demonstration area AP. The microphone 5 detects the sound emitted by the performer P performing in the demonstration area AP. The sound emitted by the performer P includes, for example, one or both of a sound the performer P produces with the body (e.g., voice, hand clapping) and a sound the performer P produces with an implement (e.g., a musical instrument). The video data captured by the camera 4 and the audio data detected by the microphone 5 are provided to the processing device 2. The speaker 6 outputs sound to the performer P. The speaker 6 outputs sound based on audio data provided by the processing device 2. The processing device 2 generates content using data (e.g., video data, audio data) provided by the devices installed in the demonstration area AP (e.g., the camera 4 and the microphone 5). The processing device 2 provides the data of the generated content via, for example, an Internet line.

Note that the voice processing system 1 need not include at least one of the camera 4, the microphone 5, and the speaker 6. For example, at least one of the camera 4, the microphone 5, and the speaker 6 may be a device external to the voice processing system 1. The voice processing system 1 may be configured to use such an external device. The camera according to the embodiment may be called an imaging device. The microphone according to the embodiment may be called a sound detection device. The speaker according to the embodiment may be called a sound output device.

A terminal 3 is arranged, for example, in each viewing area AU. In the following description, when the terminals 3 arranged in the viewing areas AU are distinguished, they are denoted by reference signs with a letter appended, such as terminal 3a and terminal 3b. For example, the terminal 3a is the terminal arranged in the viewing area AUa, and the terminal 3b is the terminal arranged in the viewing area AUb. The terminal 3b executes, for example, the same processing as the terminal 3a. In the following description, components of the terminal 3b that are the same as those of the terminal 3a are given the same reference signs as in the terminal 3a, and descriptions of the terminal 3b that overlap with those of the terminal 3a are omitted or simplified as appropriate.

The terminal 3 acquires the data of the content provided by the processing device 2 via, for example, an Internet line. The terminal 3 uses the acquired data to present the content to the user. The terminal 3 also detects the sound emitted by the user. The sound emitted by the user includes, for example, one or both of a sound the user produces with the body (e.g., voice, hand clapping) and a sound the user produces with an implement (e.g., a noisemaker). In the following description, the sound emitted by the user is referred to as the audience sound as appropriate. The terminal 3 adjusts the audience sound based on the relationship between the performer position P1 and the audience position Q associated with the terminal itself. For example, the terminal 3 adjusts the audience sound so that the longer the distance between the performer position P1 and the audience position Q, the lower the volume of the audience sound. The terminal 3 provides the data of the adjusted audience sound via, for example, an Internet line.
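The distance-dependent adjustment described above can be sketched as follows. The text states only that the volume decreases as the distance grows; the inverse-distance law, the reference distance `ref_distance`, and the function names used here are assumptions chosen for illustration, not the adjustment specified by the source.

```python
import math


def distance_gain(performer_pos, audience_pos, ref_distance=1.0):
    """Return a gain in (0, 1] that decreases as the distance increases.

    Assumption: inverse-distance attenuation, with unity gain for any
    audience position closer than `ref_distance`.
    """
    d = math.dist(performer_pos, audience_pos)
    return ref_distance / max(d, ref_distance)


def adjust_audience_sound(samples, performer_pos, audience_pos):
    """Scale every audio sample by the distance-dependent gain."""
    g = distance_gain(performer_pos, audience_pos)
    return [s * g for s in samples]


# User Ua sits closer to the performer position P1 than user Ub does.
p1 = (0.0, 0.0)
gain_a = distance_gain(p1, (0.0, 2.0))   # distance La = 2
gain_b = distance_gain(p1, (3.0, 4.0))   # distance Lb = 5
```

With these hypothetical coordinates the gain for user Ua is larger than the gain for user Ub, which matches the relationship between the distances La and Lb in FIG. 1.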

The processing device 2 synthesizes the audience sounds supplied from the plurality of terminals 3 and provides the data of the synthesized audience sound. In the demonstration area AP, the speaker 6 outputs the audience sound to the performer P based on the audience sound data supplied from the processing device 2. The audience sound R output from the speaker 6 includes, for example, a component Ra derived from the audience sound of the user Ua and a component Rb derived from the audience sound of the user Ub. The audience position Qa corresponding to the user Ua is closer to the performer position P1 than the audience position Qb corresponding to the user Ub is, so the component Ra derived from the audience sound of the user Ua is louder than the component Rb derived from the audience sound of the user Ub.
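One simple way to picture the synthesis performed by the processing device 2 is a sample-wise sum of the already adjusted audience sounds. The sketch below assumes that each terminal supplies a list of floating-point samples in the range [-1, 1] at a common sample rate and that the sum is clipped to that range; these details and the name `mix_audience_sounds` are illustrative assumptions, since the source does not specify the synthesis method.

```python
from itertools import zip_longest


def mix_audience_sounds(streams):
    """Sum adjusted audience sounds sample by sample and clip to [-1, 1].

    Streams of different lengths are padded with silence (0.0).
    """
    mixed = []
    for frame in zip_longest(*streams, fillvalue=0.0):
        total = sum(frame)
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


# Component Ra (near seat, louder) and component Rb (far seat, quieter).
ra = [0.5, 0.25]
rb = [0.25, 0.25, 0.5]
audience_sound_r = mix_audience_sounds([ra, rb])
```

Because the adjustment is applied before mixing, the louder component Ra dominates the combined audience sound R without the mixer needing any position information.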

Because the audience sound output from the speaker 6 contains components adjusted based on the audience positions Q, it helps, for example, to convey a live atmosphere to the performer. By hearing an audience sound with a live atmosphere, the performer P can, for example, perform in a way that responds to the reactions of the plurality of users U. As a result, the users U can enjoy, for example, content of a performance that has a live atmosphere.

In the present embodiment, the user U applies to participate in the event before the event is held (referred to as a participation application as appropriate). Here, the participation application is assumed to be addressed to the event organizer. When accepting the participation of the user U, the event organizer sets up a terminal 3 for the user U. For example, the event organizer stores in the terminal 3 the various information (e.g., data, programs) used to provide the service described with reference to FIG. 1. The event organizer provides the configured terminal 3 to the user U. Using the terminal 3 provided by the event organizer, the user U can participate in the event and convey reactions to the performer P while viewing the content. In this way, the terminal 3 according to the present embodiment also serves as a ticket indicating the right to participate in the event.

FIG. 2 is a diagram showing the process of setting up a terminal according to the first embodiment. The user Ua applies to participate in the event before it is held. The user Ua operates, for example, a terminal installed in a store such as a convenience store (referred to as the reservation terminal 7 as appropriate) and enters information relating to the participation application (referred to as entry information as appropriate). The entry information includes, for example, information identifying the user Ua, information identifying the event in which the user Ua wishes to participate, and information on the seat in the event venue AE that the user Ua desires.

The information identifying the user is, for example, the user's name or identification information. In the following description, the information identifying the user is referred to as the user ID as appropriate. The information identifying the event is, for example, identification information (e.g., a number) assigned to each event or the name of the event. In the following description, the information identifying the event is referred to as the event ID as appropriate. The seat information includes, for example, at least one of information specifying the seat rank (e.g., S seat, A seat), information specifying seat conditions (e.g., second-floor seat, arena, aisle side), and information specifying the seat number.

The entry information entered into the reservation terminal 7 is provided to the terminal of the application destination (referred to as the reception terminal 8 as appropriate). For example, the reservation terminal 7 is connected to an Internet line and transmits the entry information. The reception terminal 8 receives the entry information transmitted by the reservation terminal 7.

The reception terminal 8 is, for example, a server that manages user information. The reception terminal 8 includes a storage unit 11 and a communication unit 12. The communication unit 12 is communicably connected to the reservation terminal 7 and receives the entry information transmitted by the reservation terminal 7. The reception terminal 8 includes an application acquisition unit that acquires the participation application provided by the user. Based on the received entry information, the reception terminal 8 determines the seat of the user Ua in the event venue AE. The reception terminal 8 determines the seat by, for example, a lottery. The reception terminal 8 may determine the seat by a method other than a lottery, or by a method combining a lottery with another method. For example, the reception terminal 8 may assign seats to the users U corresponding to the entry information in the order in which the entry information is received, or may determine the seat based on the seat information contained in the entry information.

The storage unit 11 of the reception terminal 8 stores user information D1. The user information D1 includes, for example, the user ID and identification information that identifies the seat determined as the seat of the user Ua. In the following description, the identification information that identifies a seat is referred to as the seat ID as appropriate. The user information D1 includes, for example, table data that associates the user ID with the seat ID for each of the plurality of users U. The user information D1 may include information other than the user ID and the seat ID. For example, the user information D1 may include a history of events in which the user Ua participated in the past. The user information D1 may include at least one of the age, gender, and other attribute information of each user U. At least part of the user information D1 stored in the storage unit 11 is provided to the processing device 2.

The reception terminal 8 may determine the seat for a first user's participation in a later event based on the history of the first user's participation in past events. For example, the user information D1 stored in the storage unit 11 includes a history of each user's participation in past events (referred to as the participation history as appropriate). The reception terminal 8 uses, for example, the participation history included in the user information D1 to determine the seat for the first user's current participation application. For example, in the content distribution service according to the embodiment, each user is given an attribute (e.g., a rank) according to the number of times the user has participated in events. For example, the user's rank (e.g., silver, gold, platinum) rises as the number of events attended increases. The higher the user's rank, the more favorable the seat that the reception terminal 8 may determine as the first user's seat for the current event. A favorable seat is, for example, a seat relatively close to the stage AE1 in the event venue AE shown in FIG. 1. The user's rank may be determined by a parameter other than the number of participations, for example, according to the amount of money spent on participating in events. The user's rank may be determined by one parameter or by a plurality of parameters, or may be determined randomly by a lottery or the like.
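The rank-based seat determination described above can be sketched as a sort followed by a pairing. The rank thresholds, the tie-breaking by user ID, and the names `rank_for` and `assign_seats` are hypothetical; the source gives only the rank names and the principle that a higher rank may receive a seat closer to the stage AE1.

```python
RANK_ORDER = ["silver", "gold", "platinum"]  # lowest to highest


def rank_for(participation_count):
    """Map a participation count to a rank; the thresholds are hypothetical."""
    if participation_count >= 10:
        return "platinum"
    if participation_count >= 5:
        return "gold"
    return "silver"


def assign_seats(participation_counts, seat_distances):
    """Give higher-ranked users the seats closest to the stage.

    participation_counts: {user_id: number of past events attended}
    seat_distances: {seat_id: distance from the seat to the stage AE1}
    Ties within a rank are broken by user ID so the result is deterministic.
    """
    users = sorted(
        participation_counts,
        key=lambda u: (-RANK_ORDER.index(rank_for(participation_counts[u])), u),
    )
    seats = sorted(seat_distances, key=lambda s: (seat_distances[s], s))
    return dict(zip(users, seats))


seats = assign_seats(
    {"Ua": 12, "Ub": 0, "Uc": 6},
    {"S-1": 5.0, "A-7": 20.0, "S-2": 8.0},
)
```

A lottery within each rank, which the text also allows, could replace the tie-breaking by user ID.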

The processing device 2 determines the audience position Q of the user Ua based on the participation application of the user Ua. The processing device 2 includes a processing unit 21, a storage unit 22, and a communication unit 23. The communication unit 23 is communicably connected to the reception terminal 8. The communication unit 23 receives the user information transmitted by the reception terminal 8. The storage unit 22 stores information D2 on the performer position P1 (see FIG. 1) in the event venue AE. The information D2 on the performer position P1 stored in the storage unit 22 is, for example, coordinates in cyberspace.

The processing unit 21 includes a position determination unit 24. Based on a user's application to participate in the event, the position determination unit 24 determines the audience position Q representing the user's position in the predetermined area where the event is held (e.g., the event venue AE in FIG. 1). The position determination unit 24 determines the audience position Q based on, for example, the participation application acquired by the reception terminal 8, which serves as the application acquisition unit. For example, the position determination unit 24 derives the coordinates in cyberspace corresponding to the seat ID determined based on the participation application, and determines the derived coordinates as the audience position Q. The position determination unit 24 derives coordinates, corresponding to the seat ID, in the same coordinate system as the performer position P1. For example, the storage unit 22 stores map data that associates seat IDs with audience positions Q. The position determination unit 24 acquires the seat ID included in the user information received by the communication unit 23 and collates this seat ID with the map data to determine the audience position Q. By determining the audience position Q of each of the plurality of users U, for example, the position determination unit 24 determines the relative positions of the plurality of audience positions Q.
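The collation of a seat ID with the map data can be pictured as a dictionary lookup. In the sketch below, the seat IDs, the coordinates, the record layout with `user_id` and `seat_id` keys, and the function names are all hypothetical; the source states only that the map data associates seat IDs with audience positions Q in the same coordinate system as the performer position P1.

```python
# Hypothetical map data: seat ID -> (x, y) in the coordinate system of P1.
SEAT_MAP = {
    "S-001": (0.0, 5.0),
    "A-120": (12.0, 30.0),
}


def determine_audience_position(user_info, seat_map=SEAT_MAP):
    """Look up the audience position Q for the seat ID in one user record."""
    seat_id = user_info["seat_id"]
    try:
        return seat_map[seat_id]
    except KeyError:
        raise ValueError(f"unknown seat ID: {seat_id}") from None


def determine_all_positions(user_infos, seat_map=SEAT_MAP):
    """Return {user ID: audience position Q} for every user record."""
    return {
        info["user_id"]: determine_audience_position(info, seat_map)
        for info in user_infos
    }


positions = determine_all_positions([
    {"user_id": "Ua", "seat_id": "S-001"},
    {"user_id": "Ub", "seat_id": "A-120"},
])
```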

The position determination unit 24 may determine the audience position Q for a first user's (e.g., the user Ua's) participation in a later event based on the history of the first user's participation in past events. For example, the position determination unit 24 may determine the audience position Q corresponding to the seat of the user Ua in the current event, that seat having been determined by the reception terminal 8 based on the participation history of the user Ua.

The processing unit 21 generates position information indicating the relationship between the performer position P1 and the audience position Q determined by the position determination unit 24. The position information includes, for example, the distance between the performer position P1 and the audience position Q (e.g., the distance La in FIG. 1). The position information may include information other than the distance from the performer position P1 to the audience position Q. For example, the position information may include the performer position P1 and the audience position Q as a pair. The position information may include information on the direction connecting the performer position P1 and the audience position Q (e.g., a vector). The position information may include information on the space between the performer position P1 and the audience position Q (e.g., information indicating the level of attenuation of sound traveling from the audience position Q to the performer position P1, or the presence or absence of obstacles). The position information need not include at least one of the following: the distance between the performer position P1 and the audience position Q, the performer position P1 and the audience position Q as a pair, the information on the direction connecting the performer position P1 and the audience position Q, and the information on the space between the performer position P1 and the audience position Q. The position information may also include information different from all of these.
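As an illustration only, the position information described above could be held in a small record from which the distance and the direction are derived. The class name `PositionInfo`, the two-dimensional coordinates, and the use of metres are assumptions made for this sketch; the description does not prescribe a data format.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PositionInfo:
    """Hypothetical container for position information (e.g., D3)."""
    performer: Tuple[float, float]  # performer position P1 as (x, y), metres assumed
    audience: Tuple[float, float]   # audience position Q as (x, y), metres assumed

    @property
    def distance(self) -> float:
        """Distance between P1 and Q (e.g., the distance La)."""
        return math.dist(self.performer, self.audience)

    @property
    def direction(self) -> Tuple[float, float]:
        """Unit vector pointing from Q towards P1; (0, 0) if the positions coincide."""
        d = self.distance
        if d == 0.0:
            return (0.0, 0.0)
        return ((self.performer[0] - self.audience[0]) / d,
                (self.performer[1] - self.audience[1]) / d)


# Example values chosen for illustration: a seat 20 m from the performer.
info = PositionInfo(performer=(0.0, 0.0), audience=(12.0, 16.0))
```

Storing the pair of positions rather than only the distance keeps the direction available, which matches the option of including a vector in the position information.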

The position information generated by the processing device 2 is supplied to the terminal 3a. The terminal 3 includes a storage unit 31 and a communication unit 32. The communication unit 32 is communicably connected to, for example, the communication unit 23 of the processing device 2. The communication unit 32 receives the position information transmitted by the communication unit 23 of the processing device 2. The terminal 3 stores the position information received by the communication unit 32 in the storage unit 31. The terminal 3a stores position information D3a indicating the relationship between the performer position P1 shown in FIG. 1 and the audience position Qa determined based on the participation application of the user Ua. The terminal 3a is provided to the user Ua by, for example, mail, and is used when the user Ua participates in the event.

端末３ｂは、端末３ａと同様の構成である。端末３ｂは、図１に示した演者位置Ｐ１と、ユーザＵｂの参加申請に基づいて決定される観客位置Ｑｂとの関係を示す位置情報Ｄ３ｂを記憶する。端末３ｂは、例えば宅配便などによってユーザＵｂに届けられ、ユーザＵｂがイベントに参加する際に利用される。以下の説明において適宜、任意のユーザＵに対応する観客位置Ｑと演者位置Ｐ１との関係を示す位置情報を符号Ｄ３で表す。 The terminal 3b has the same configuration as the terminal 3a. The terminal 3b stores the position information D3b indicating the relationship between the performer position P1 shown in FIG. 1 and the audience position Qb determined based on the participation application of the user Ub. The terminal 3b is delivered to the user Ub by, for example, a courier service, and is used when the user Ub participates in the event. In the following description, the position information indicating the relationship between the audience position Q corresponding to the arbitrary user U and the performer position P1 is represented by reference numeral D3 as appropriate.

なお、予約端末７は、音声処理システム１の一部でもよいし、音声処理システム１の外部の装置でもよい。予約端末７は、ユーザＵａが保有するスマートフォンやパーソナルコンピュータ等の端末でもよい。ユーザＵｂからのエントリー情報を受け付ける予約端末７は、ユーザＵａからのエントリー情報を受け付ける予約端末７と同じ装置でもよいし、異なる装置でもよい。エントリー情報の少なくとも一部を予約端末７へ入力する処理は、ユーザＵａと異なる者が実行してもよい。例えば、ユーザＵａは、参加申請を店舗または電話で受付担当者へ伝え、受付担当者が予約端末７にエントリー情報を入力してもよい。参加申請の申請先は、イベント開催者でなくてもよい。例えば、参加申請の申請先は、イベント開催者から委託された受託者でもよいし、イベントに関するコンテンツの配信者でもよく、イベントの管理者でもよい。 The reservation terminal 7 may be a part of the voice processing system 1 or an external device of the voice processing system 1. The reservation terminal 7 may be a terminal such as a smartphone or a personal computer owned by the user Ua. The reservation terminal 7 that receives the entry information from the user Ub may be the same device as the reservation terminal 7 that receives the entry information from the user Ua, or may be a different device. The process of inputting at least a part of the entry information to the reservation terminal 7 may be executed by a person different from the user Ua. For example, the user Ua may notify the receptionist of the participation application by the store or telephone, and the receptionist may input the entry information into the reservation terminal 7. The application destination for the participation application does not have to be the event organizer. For example, the application destination of the participation application may be a trustee entrusted by the event organizer, a distributor of content related to the event, or an event manager.

なお、ユーザＵａの席を決定する処理は、申請先の担当者によって手動で実行されてもよい。例えば、ユーザＵａは、参加申請が記入された申込用紙を申請先へ郵送し、申請先の担当者は、申込用紙に記載された情報に基づいてユーザＵａの席を決定してもよい。申請先の担当者は、ユーザＵａの席として決定された席ＩＤをユーザＩＤとを関連付けて、受付端末８に入力してもよい。受付端末８または上記担当者は、イベントに参加する権利（適宜、参加権という）をユーザＵａへ付与するか否かを決定し、その後に、ユーザＵａの席を決定してもよい。例えば、定員を超える人数のユーザＵから参加申請があった場合、又は参加申請があると予測される場合、受付端末８は、抽選によって参加権を付与するか否かを決定してもよい。 The process of determining the seat of the user Ua may be manually executed by the person in charge of the application destination. For example, the user Ua may mail the application form on which the participation application is filled in to the application destination, and the person in charge of the application destination may determine the seat of the user Ua based on the information described in the application form. The person in charge of the application destination may input the seat ID determined as the seat of the user Ua into the reception terminal 8 in association with the user ID. The reception terminal 8 or the person in charge may decide whether or not to grant the user Ua the right to participate in the event (appropriately referred to as the participation right), and then decide the seat of the user Ua. For example, when there is a participation application from a number of users U exceeding the capacity, or when it is predicted that there is a participation application, the reception terminal 8 may decide whether or not to grant the participation right by lottery.

なお、受付端末８は、音声処理システム１の一部でもよいし、音声処理システム１の外部の装置でもよい。例えば、処理装置２は、受付端末８を備えてもよい。受付端末８は、予約端末７を兼ねてもよい。処理装置２の機能は、複数の装置に分かれて実現されてもよく、これら複数の装置は、包括して処理システムと称されてもよい。 The reception terminal 8 may be a part of the voice processing system 1 or an external device of the voice processing system 1. For example, the processing device 2 may include a reception terminal 8. The reception terminal 8 may also serve as the reservation terminal 7. The function of the processing device 2 may be realized by being divided into a plurality of devices, and these plurality of devices may be collectively referred to as a processing system.

なお、端末３に位置情報を記憶させる方法は、有線又は無線による通信を用いない方法でもよい。例えば、位置情報は、ＵＳＢメモリ等の記録媒体を介して端末３に記憶されてもよい。端末３は、ユーザＵに届けられる前に、位置情報を予め記憶しなくてもよい。例えば、処理装置２は、端末３に関連付けられる宛先を記憶し、ユーザＵが端末３を受け取った後に、この宛先へ位置情報を送信してもよい。処理装置２は、上記宛先へ位置情報を送信することで、端末３に記憶される位置情報を更新させてもよい。 The method of storing the position information in the terminal 3 may be a method that does not use wired or wireless communication. For example, the position information may be stored in the terminal 3 via a recording medium such as a USB memory. The terminal 3 does not have to store the position information in advance before being delivered to the user U. For example, the processing device 2 may store the destination associated with the terminal 3 and transmit the location information to the destination after the user U receives the terminal 3. The processing device 2 may update the position information stored in the terminal 3 by transmitting the position information to the destination.

図３は、第１実施形態に係る音声処理システムを示す図である。こでは、ユーザからの音を演者へ伝える際の音声処理システム１の各部の動作を説明する。処理装置２は、例えば、サーバコンピュータ等の情報処理装置を含む。処理装置２は、音声処理システム１の各部を管理する。処理装置２は、処理部２１と、通信部２３と、入力部２５と、出力部２６とを備える。処理装置２には、カメラ４と、マイク５と、スピーカ６とが接続されている。 FIG. 3 is a diagram showing a voice processing system according to the first embodiment. Here, the operation of each part of the voice processing system 1 when transmitting the sound from the user to the performer will be described. The processing device 2 includes, for example, an information processing device such as a server computer. The processing device 2 manages each part of the voice processing system 1. The processing device 2 includes a processing unit 21, a communication unit 23, an input unit 25, and an output unit 26. A camera 4, a microphone 5, and a speaker 6 are connected to the processing device 2.

入力部２５は、処理装置２の外部から信号が入力されるインターフェスを含む。入力部２５には、カメラ４およびマイク５が接続される。処理装置２には、カメラ４によって撮影された映像のデータ、及びマイク５によって収集された音声のデータが入力される。出力部２６は、処理装置２の外部へ信号を出力するインターフェースを含む。出力部２６には、スピーカ６が接続される。カメラ４、マイク５、及びスピーカ６の少なくとも１つは、音声処理システム１の一部でもよいし、処理装置２の一部でもよい。カメラ４、マイク５、及びスピーカ６の少なくとも１つは、音声処理システム１の外部の装置でもよく、例えば図１に示した実演領域ＡＰの設備でもよい。 The input unit 25 includes an interface in which a signal is input from the outside of the processing device 2. A camera 4 and a microphone 5 are connected to the input unit 25. The video data captured by the camera 4 and the audio data collected by the microphone 5 are input to the processing device 2. The output unit 26 includes an interface that outputs a signal to the outside of the processing device 2. A speaker 6 is connected to the output unit 26. At least one of the camera 4, the microphone 5, and the speaker 6 may be a part of the voice processing system 1 or a part of the processing device 2. At least one of the camera 4, the microphone 5, and the speaker 6 may be an external device of the audio processing system 1, for example, the equipment of the demonstration area AP shown in FIG.

The processing unit 21 includes a synthesis unit 27 and a content generation unit 28. The content generation unit 28 generates content using the data input to the input unit 25. The content generation unit 28 generates, for example, content in which the performance of the performer P is expressed by video and audio. For example, the content generation unit 28 generates content to be distributed to the user U by using the video data input from the camera 4 to the input unit 25 and the audio data input from the microphone 5 to the input unit 25. The content generation unit 28, for example, synchronizes the video and the audio and generates a streaming video as the content. The content generation unit 28 may generate the content by executing at least one of video processing, various kinds of sound processing, filtering, and compression processing. The communication unit 23 transmits the data of the content generated by the content generation unit 28.

The terminal 3a includes a storage unit 31, a communication unit 32, and a processing unit 33. The communication unit 32 is communicably connected to the communication unit 23 of the processing device 2. The communication unit 32 receives the data of the content transmitted by the communication unit 23. A microphone 41, a speaker 42, and a display device 43 are connected to the terminal 3a. The microphone 41 and the speaker 42 may be a headset. The display device 43 is a device that displays images, such as a television set, a PC monitor, or a projector.

端末３ａの処理部３３は、通信部３２が受信したコンテンツのデータに基づいて、コンテンツの音声をスピーカ４２に出力させる。端末３ａの処理部３３は、通信部３２が受信したコンテンツのデータに基づいて、コンテンツの映像を表示装置４３に表示させる。ユーザＵａは、スピーカ４２から出力される音声、及び表示装置４３に表示される映像によって、演者Ｐのパフォーマンスを表現したコンテンツを視聴できる。マイク４１は、イベントの演者Ｐを視聴するユーザが発する音を検出する。例えば、マイク４１は、コンテンツを視聴するユーザＵａの歓声や拍手を検出する。マイク４１が検出した観客音のデータは、端末３ａに出力される。端末３ａの処理部３３は、音調整部３４を含む。音調整部３４は、マイク４１が検出した音を、演者位置Ｐ１と観客位置Ｑａとの関係に基づいて調整する。 The processing unit 33 of the terminal 3a causes the speaker 42 to output the sound of the content based on the data of the content received by the communication unit 32. The processing unit 33 of the terminal 3a causes the display device 43 to display the video of the content based on the data of the content received by the communication unit 32. The user Ua can view the content expressing the performance of the performer P by the sound output from the speaker 42 and the video displayed on the display device 43. The microphone 41 detects a sound emitted by a user who watches the performer P of the event. For example, the microphone 41 detects the cheers and applause of the user Ua who views the content. The audience sound data detected by the microphone 41 is output to the terminal 3a. The processing unit 33 of the terminal 3a includes a sound adjusting unit 34. The sound adjustment unit 34 adjusts the sound detected by the microphone 41 based on the relationship between the performer position P1 and the audience position Qa.

図４は、第１実施形態に係る音声の調整方法の例を示す図である。図４の符号Ｄ５は、音調整部３４に入力される音声のデータであり、符号Ｄ６は音調整部３４から出力される音声のデータである。データＤ５およびデータＤ６のそれぞれにおいて、横軸は周波数であり、縦軸は振幅である。周波数は音の高さに対応し、振幅は音量に対応する。符号Ｄ７は、データＤ５をデータＤ６へ変換することに利用される関数（例、フィルタ）である。関数Ｄ７は、距離とゲインとの関係を示す関数である。ゲインは、距離に対する音の減衰率に相当する。本例において、音調整部３４は、図１に示した観客位置Ｑａと演者位置Ｐ１との距離Ｌａを用いて、マイク４１が検出した音の大きさを調整する。ここでは、ゲインは、可聴音の周波数帯（例、２０Ｈｚ以上２０ｋＨｚ以下）において一定であるとする。 FIG. 4 is a diagram showing an example of a voice adjustment method according to the first embodiment. Reference numeral D5 in FIG. 4 is audio data input to the sound adjustment unit 34, and reference numeral D6 is audio data output from the sound adjustment unit 34. In each of the data D5 and the data D6, the horizontal axis is the frequency and the vertical axis is the amplitude. The frequency corresponds to the pitch and the amplitude corresponds to the volume. Reference numeral D7 is a function (eg, filter) used to convert data D5 to data D6. The function D7 is a function showing the relationship between the distance and the gain. The gain corresponds to the rate of sound attenuation with respect to distance. In this example, the sound adjusting unit 34 adjusts the loudness of the sound detected by the microphone 41 by using the distance La between the audience position Qa and the performer position P1 shown in FIG. Here, it is assumed that the gain is constant in the frequency band of audible sound (eg, 20 Hz or more and 20 kHz or less).

The gain is a value that is uniquely determined for a given distance. For example, for the distance La shown in FIG. 1, the gain is determined to be G1. The sound adjustment unit 34 generates the data D6 by applying the gain G1 to the amplitude of the data D5 representing the audience sound of the user Ua. For example, the sound adjustment unit 34 adjusts a first sound detected by the microphone 41 so that it approaches a second sound, namely the sound that would reach the performer position P1 if the first sound were emitted at the audience position Q. The distance between the microphone 41 and the user Ua is largely determined by, for example, the type of the viewing area AUa shown in FIG. 1 (e.g., home, movie theater), the type of device used for viewing the content (e.g., smartphone, personal computer, television set), and the arrangement of the devices.

Here, assume that the distance La between the performer position P1 and the audience position Qa shown in FIG. 1 is, for example, 20 meters. Assume also that the terminal 3a is placed in the living room of the home of the user Ua and connected to a television set, and that the distance between the microphone 41 and the user Ua is 2 meters. In this case, the distance La between the performer position P1 and the audience position Qa (e.g., 20 meters) is longer than the distance between the microphone 41 and the user Ua (e.g., 2 meters). The sound adjustment unit 34 adjusts the volume so that, for example, the audience sound detected by the microphone 41 located 2 meters from the user Ua approaches the sound that would be heard at the performer position P1, 20 meters from the audience position Qa, if the audience sound were emitted at the audience position Qa. For example, the data D6 corresponds to data of the audience sound heard at the performer position P1 when the audience sound is emitted at the audience position Qa. In the example of FIG. 4, the gain G1 corresponding to the distance La is less than 1, and the amplitude of the data D6 is smaller than that of the data D5. The sound represented by the data D6 therefore has a lower volume than the sound represented by the data D5, and corresponds to the audience sound that would be heard at the performer position P1 if the audience sound were emitted at the audience position Qa.
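The 2-meter and 20-meter example can be worked through numerically. The description does not state the form of the function D7, so the sketch below assumes a free-field inverse-distance law, under which amplitude is inversely proportional to distance; the function names `distance_gain`, `gain_db`, and `apply_gain` are hypothetical.

```python
import math


def distance_gain(mic_distance_m: float, venue_distance_m: float) -> float:
    """Gain that maps a sound captured at mic_distance_m to the level it would
    have at venue_distance_m, ASSUMING amplitude falls off as 1/distance."""
    if mic_distance_m <= 0 or venue_distance_m <= 0:
        raise ValueError("distances must be positive")
    return mic_distance_m / venue_distance_m


def gain_db(gain: float) -> float:
    """Express an amplitude gain in decibels."""
    return 20.0 * math.log10(gain)


def apply_gain(samples, gain):
    """Scale every amplitude sample by the same gain (flat across frequency)."""
    return [s * gain for s in samples]


g1 = distance_gain(2.0, 20.0)  # microphone at 2 m, seat 20 m from the performer
d5 = [0.5, -0.25, 1.0]         # illustrative input samples (data D5)
d6 = apply_gain(d5, g1)        # adjusted samples (data D6)
```

Under this assumption the gain for the example is 0.1, which is about -20 dB, consistent with the statement that G1 is less than 1 and that D6 has a smaller amplitude than D5. A real venue would not follow the free-field law exactly, so the figure is indicative only.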

The sound adjustment unit 34 may adjust the waveform representing amplitude with respect to frequency. For example, the sound adjustment unit 34 may adjust the audience sound using gains that differ by frequency band. For example, the sound adjustment unit 34 may attenuate the relatively high-frequency components of the audience sound more than the relatively low-frequency components. The sound adjustment unit 34 need not attenuate the audience sound, and may amplify it. The sound adjustment unit 34 may adjust the audience sound detected by the microphone 41 so that its volume is louder than that of the audience sound that would reach the performer position P1 if the audience sound were emitted at the audience position Q. When the audience sound detected by the microphone 41 has a volume exceeding a predetermined value, the sound adjustment unit 34 may cut the volume exceeding the predetermined value and then adjust the resulting audience sound based on the relationship between the audience position Q and the performer position P1. The sound adjustment unit 34 may adjust the audience sound so that the volume of the adjusted audience sound does not exceed a predetermined value.
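The two volume limits mentioned above can be sketched as a small pipeline: cut the input at a threshold, apply the position-based gain, then cap the result. Hard clipping of sample amplitudes is an assumption made here for simplicity; the description only says that volume exceeding a predetermined value is cut, and the function names and threshold values are hypothetical.

```python
def clip(samples, limit):
    """Hard-limit each sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


def adjust_with_limits(samples, gain, input_limit, output_limit):
    """Cut excessive input, apply the position-based gain, then cap the output."""
    clipped = clip(samples, input_limit)    # cut volume above the input threshold
    scaled = [s * gain for s in clipped]    # adjustment based on Q and P1
    return clip(scaled, output_limit)       # keep the adjusted sound under the cap


# Illustrative values only.
adjusted = adjust_with_limits([2.0, -0.5, 0.2], gain=0.5,
                              input_limit=1.0, output_limit=0.4)
```

In practice a smoother limiter or compressor would likely be preferred over hard clipping, since clipping introduces audible distortion; the sketch is meant only to show the order of the operations.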

音調整部３４は、マイク４１の位置とユーザＵの位置との関係を表す情報と、観客位置Ｑと演者位置Ｐ１との関係を表す情報とを用いて、観客音を調整してもよい。マイク４１の位置とユーザＵの位置との関係を表す情報は、例えば、マイク４１とユーザＵａとの距離である。音調整部３４は、マイク４１とユーザＵａとの距離として、センサーにより検出される値を用いてもよい。音調整部３４は、マイク４１とユーザＵａとの距離として、予め定められた値を用いてもよい。マイク４１とユーザＵａとの距離は、例えば、音声処理システム１により提供されるサービスの利用規約に定められる範囲内の値（例、推奨値）でもよい。 The sound adjustment unit 34 may adjust the audience sound by using the information indicating the relationship between the position of the microphone 41 and the position of the user U and the information indicating the relationship between the audience position Q and the performer position P1. The information representing the relationship between the position of the microphone 41 and the position of the user U is, for example, the distance between the microphone 41 and the user Ua. The sound adjusting unit 34 may use a value detected by the sensor as the distance between the microphone 41 and the user Ua. The sound adjusting unit 34 may use a predetermined value as the distance between the microphone 41 and the user Ua. The distance between the microphone 41 and the user Ua may be, for example, a value (eg, a recommended value) within a range defined in the terms of use of the service provided by the voice processing system 1.

図３の説明に戻り、本実施形態に係る音調整部３４は、記憶部３１に記憶された位置情報Ｄ３ａを用いて、観客音を調整する。位置情報Ｄ３ａは、例えば、図１に示した観客位置Ｑａと演者位置Ｐ１との距離Ｌａを含む。音調整部３４は、例えば、図４に示した関数Ｄ７を用いて、距離Ｌａに対するゲインＧ１を取得する。例えば関数Ｄ７は記憶部３１に記憶され、音調整部３４は、記憶部３１から関数Ｄ７および位置情報Ｄ３ａ（例、距離Ｌａ）を読出し、ゲインＧ１を導出する。音調整部３４は、マイク４１が検出した観客音を示すデータＤ５（図４参照）に対して、導出したゲインＧ１を作用させることで、データＤ６（図４参照）を生成する。以下の説明において適宜、音調整部３４によって調整された観客音を調整後の観客音という。データＤ６は、調整後の観客音を表すデータである。 Returning to the description of FIG. 3, the sound adjusting unit 34 according to the present embodiment adjusts the audience sound by using the position information D3a stored in the storage unit 31. The position information D3a includes, for example, the distance La between the audience position Qa and the performer position P1 shown in FIG. The sound adjusting unit 34 acquires the gain G1 with respect to the distance La by using, for example, the function D7 shown in FIG. For example, the function D7 is stored in the storage unit 31, and the sound adjustment unit 34 reads the function D7 and the position information D3a (eg, distance La) from the storage unit 31 to derive the gain G1. The sound adjustment unit 34 generates data D6 (see FIG. 4) by applying the derived gain G1 to the data D5 (see FIG. 4) indicating the audience sound detected by the microphone 41. In the following description, the audience sound adjusted by the sound adjustment unit 34 as appropriate is referred to as the adjusted audience sound. The data D6 is data representing the adjusted audience sound.

なお、位置情報Ｄ３ａは、例えばサイバー空間上の座標で表された観客位置Ｑａと演者位置Ｐ１とを一組にした情報を含んでもよい。この場合、音調整部３４は、観客位置Ｑａと演者位置Ｐ１とを用いて、観客位置Ｑａと演者位置Ｐ１との距離Ｌａを算出してもよい。関数Ｄ７は、端末３の外部に記憶されてもよく、音調整部３４は、関数Ｄ７を表す情報を、端末３の外部から通信部３２を介して取得してもよい。関数Ｄ７は、数式で表されてもよいし、テーブルデータで表されてもよい。位置情報Ｄ３ａは、観客位置Ｑａから演者位置Ｐ１へ伝わる音の変化を表す情報（例、ゲインＧ１）を含んでもよい。この場合、音調整部３４は、図４に示した関数Ｄ７を用いないで観客音を調整してもよい。 The position information D3a may include, for example, information in which the audience position Qa and the performer position P1 represented by the coordinates in cyberspace are paired. In this case, the sound adjusting unit 34 may calculate the distance La between the audience position Qa and the performer position P1 by using the audience position Qa and the performer position P1. The function D7 may be stored outside the terminal 3, and the sound adjusting unit 34 may acquire information representing the function D7 from the outside of the terminal 3 via the communication unit 32. The function D7 may be represented by a mathematical formula or table data. The position information D3a may include information (eg, gain G1) representing a change in sound transmitted from the audience position Qa to the performer position P1. In this case, the sound adjusting unit 34 may adjust the audience sound without using the function D7 shown in FIG.
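Where the function D7 is represented by table data, a gain for an arbitrary distance could be obtained by interpolating between table entries. The table values, the linear interpolation, and the clamping at both ends of the table are assumptions for this sketch, and `GAIN_TABLE` and `gain_from_table` are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical table for function D7: (distance in metres, gain), sorted by distance.
GAIN_TABLE = [(1.0, 1.0), (5.0, 0.4), (10.0, 0.2), (20.0, 0.1), (50.0, 0.04)]


def gain_from_table(distance_m, table=GAIN_TABLE):
    """Look up a gain by linear interpolation, clamping outside the table range."""
    distances = [d for d, _ in table]
    if distance_m <= distances[0]:
        return table[0][1]
    if distance_m >= distances[-1]:
        return table[-1][1]
    i = bisect_right(distances, distance_m)      # first entry beyond distance_m
    (d0, g0), (d1, g1) = table[i - 1], table[i]
    t = (distance_m - d0) / (d1 - d0)            # fractional position between entries
    return g0 + t * (g1 - g0)


gain_la = gain_from_table(20.0)   # gain for a distance La of 20 m
gain_mid = gain_from_table(15.0)  # interpolated between the 10 m and 20 m entries
```

A table makes it easy to replace the function D7 per venue without changing the terminal's code, which fits the option of obtaining the function from outside the terminal 3.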

端末３は、調整後の観客音のデータを端末３の外部へ提供する。例えば、端末３の処理部３３は、通信部３２を制御し、調整後の観客音のデータを送信させる。処理装置２は、端末３が提供した調整後の観客音のデータを取得する。例えば、処理装置２の通信部２３は、端末３の通信部３２が送信した調整後の観客音のデータを受信する。処理装置２は、ユーザＵａに対応する端末３ａから、ユーザＵａの調整後の観客音のデータを取得する。処理装置２は、ユーザＵｂに対応する端末３ｂから、ユーザＵｂの調整後の観客音のデータを取得する。 The terminal 3 provides the adjusted audience sound data to the outside of the terminal 3. For example, the processing unit 33 of the terminal 3 controls the communication unit 32 to transmit the adjusted audience sound data. The processing device 2 acquires the adjusted audience sound data provided by the terminal 3. For example, the communication unit 23 of the processing device 2 receives the adjusted audience sound data transmitted by the communication unit 32 of the terminal 3. The processing device 2 acquires the adjusted audience sound data of the user Ua from the terminal 3a corresponding to the user Ua. The processing device 2 acquires the adjusted audience sound data of the user Ub from the terminal 3b corresponding to the user Ub.

The synthesis unit 27 synthesizes the audience sound detected by the microphone 41 corresponding to the user Ua and adjusted by the sound adjustment unit 34 of the terminal 3a with the audience sound detected by the microphone 41 corresponding to the user Ub and adjusted by the sound adjustment unit 34 of the terminal 3b. The synthesis unit 27 superimposes the audience sound data acquired from the terminal 3a and the audience sound data acquired from the terminal 3b, and generates data representing the synthesized audience sound. In the following description, the adjusted audience sound synthesized by the synthesis unit 27 is referred to as the synthesized audience sound, as appropriate.
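Superimposing the adjusted audience sounds can be illustrated as a sample-by-sample sum. The sketch assumes that the streams are already time-aligned and share a sample rate, and it pads shorter streams with silence; none of this is specified in the description, and `mix` is a hypothetical name.

```python
from itertools import zip_longest


def mix(*streams):
    """Superimpose any number of sample streams by summing aligned samples.
    Shorter streams are padded with silence (0.0)."""
    return [sum(frame) for frame in zip_longest(*streams, fillvalue=0.0)]


from_terminal_3a = [0.1, 0.2, 0.3]   # adjusted audience sound of user Ua (illustrative)
from_terminal_3b = [0.05, -0.2]      # adjusted audience sound of user Ub (illustrative)
synthesized = mix(from_terminal_3a, from_terminal_3b)
```

Because each stream has already been attenuated according to its own audience position, a plain sum preserves the relative loudness of near and far seats. With many users the sum may need normalizing or limiting to avoid overload, which this sketch does not attempt.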

出力部２６は、合成部２７が合成した観客音を表すデータを出力する。例えば、出力部２６は観客音のデジタルデータを出力し、このデジタルデータはアンプによってアナログデータ（例、音声信号）へＤＡ変換されて、スピーカ６に入力される。スピーカ６は、アナログデータを振動へ変換して、合成後の観客音を出力する。上記アンプは、処理装置２が備えてもよいし、スピーカ６が備えてもよく、処理装置２とスピーカ６との間に接続される装置が備えてもよい。処理装置２が外部へ提供する合成後の観客音のデータは、デジタルデータでもよいし、アナログデータ（例、音声信号）でもよい。処理装置２は、通信部２３から通信によって観客音のデータを出力してもよい。この場合、通信部２３は、合成部２７が合成した音を表すデータを出力する出力部であってもよい。 The output unit 26 outputs data representing the audience sound synthesized by the synthesis unit 27. For example, the output unit 26 outputs digital data of audience sound, and this digital data is DA-converted into analog data (eg, audio signal) by an amplifier and input to the speaker 6. The speaker 6 converts analog data into vibration and outputs the synthesized audience sound. The amplifier may be provided by the processing device 2, a speaker 6, or a device connected between the processing device 2 and the speaker 6. The synthesized audience sound data provided by the processing device 2 to the outside may be digital data or analog data (eg, audio signal). The processing device 2 may output audience sound data from the communication unit 23 by communication. In this case, the communication unit 23 may be an output unit that outputs data representing the sound synthesized by the synthesis unit 27.

The performer P can, for example, perform while listening to the audience sound output from the speaker 6. The microphone 5 detects sound, for example, in synchronization with the output of the audience sound by the speaker 6. The microphone 5 detects the audience sound output from the speaker 6 together with the sound emitted by the performer P. For example, the camera 4 detects sound at a predetermined timing relative to the timing at which the audience sound is output from the speaker 6. The camera 4 shoots the performer P in synchronization with the detection of sound by the microphone 5. At least two of the following timings may be controlled by the processing device 2 or another device: the timing at which sound is output from the speaker 6, the timing at which the microphone 41 detects sound, and the timing at which the camera 4 shoots the performer P.

処理装置２は、マイク５によって検出された音声、及びカメラ４によって撮影された映像を取得する。コンテンツ生成部２８は、マイク５によって検出された音声、及びカメラ４によって撮影された映像を用いてコンテンツのデータを生成する。処理装置２は、コンテンツ生成部２８が生成したコンテンツを配信する。例えば、通信部２３は、コンテンツ生成部２８が生成したコンテンツのデータを送信する。コンテンツ生成部２８が生成したコンテンツに含まれる音は、スピーカ６から出力された合成後の観客音と、演者Ｐが発した音とを含む。ユーザＵは、例えば、合成された観客音と演者Ｐが発する音とを含む音を聞きながら、演者Ｐの映像を視聴することができる。このように、通信部２３は、演者Ｐが発する音と、音調整部３４によって調整された音とが合成された音のデータを出力する出力部でもよい。 The processing device 2 acquires the sound detected by the microphone 5 and the image captured by the camera 4. The content generation unit 28 generates content data using the voice detected by the microphone 5 and the video captured by the camera 4. The processing device 2 distributes the content generated by the content generation unit 28. For example, the communication unit 23 transmits the data of the content generated by the content generation unit 28. The sound included in the content generated by the content generation unit 28 includes the synthesized audience sound output from the speaker 6 and the sound emitted by the performer P. The user U can watch the video of the performer P while listening to the sound including the synthesized audience sound and the sound emitted by the performer P, for example. As described above, the communication unit 23 may be an output unit that outputs sound data in which the sound emitted by the performer P and the sound adjusted by the sound adjustment unit 34 are combined.

The microphone 5 need not detect the audience sound output from the speaker 6. For example, the speaker 6 may be earphones or headphones. The microphone 5 and the speaker 6 may be a headset. In this case, the content generation unit 28 of the processing device 2 may generate the content by synthesizing the sound detected by the microphone 5 (e.g., the sound emitted by the performer P) with the audience sound synthesized by the synthesis unit 27. The content distributed by the processing device 2 need not include the synthesized audience sound. Even in this case, the performer P can, for example, perform in response to the synthesized audience sound, and the user can experience a sense of presence by, for example, viewing the performance of the performer P reacting to the audience.

なお、コンテンツ生成部２８は、図３において合成部２７と同じ装置に設けられるが、合成部２７と別の装置に設けられてもよい。例えば、合成部２７は、複数の端末３から提供される観客音のデータを処理する第１のストリーミングサーバに設けられてもよい。コンテンツ生成部２８は、実演領域ＡＰに配置される装置（例、カメラ４、マイク５）から提供されるデータを処理する第２のストリーミングサーバに設けられてもよい。 Although the content generation unit 28 is provided in the same device as the synthesis unit 27 in FIG. 3, it may be provided in a device different from the composition unit 27. For example, the synthesis unit 27 may be provided in a first streaming server that processes audience sound data provided by a plurality of terminals 3. The content generation unit 28 may be provided in a second streaming server that processes data provided by devices (eg, camera 4, microphone 5) arranged in the demonstration area AP.

なお、端末３は、イベントの終了後にユーザからイベント開催者へ返還されてもよいし、返還されなくてもよい。端末３は、今回のイベント以降のイベントにおいて再利用されてもよい。例えば、ユーザＵａは、第１イベントの後の第２イベントへの参加申請を行う。処理装置２は、第２イベントへの参加申請に基づいて第２イベントにおける観客位置Ｑａを決定する。処理装置２は、第２イベントにおける観客位置Ｑａに対応する位置情報Ｄ３ａを端末３ａへ提供する。端末３ａの音調整部３４は、第２イベントに対応する位置情報Ｄ３ａを用いて、マイク４１が検出した観客音を調整してもよい。 The terminal 3 may or may not be returned from the user to the event organizer after the end of the event. The terminal 3 may be reused in the events after this event. For example, the user Ua applies for participation in the second event after the first event. The processing device 2 determines the spectator position Qa in the second event based on the application for participation in the second event. The processing device 2 provides the terminal 3a with the position information D3a corresponding to the spectator position Qa in the second event. The sound adjustment unit 34 of the terminal 3a may adjust the audience sound detected by the microphone 41 by using the position information D3a corresponding to the second event.

In FIGS. 1 to 3 and elsewhere, two users, user Ua and user Ub, are shown as representative users U. Although two users U appear in FIGS. 1 to 3, the number of users U is arbitrary. For example, the number of users U may be 100 or 10,000.

Next, the voice processing method according to the embodiment will be described based on the configuration of the voice processing system 1 described above. FIG. 5 is a diagram showing the voice processing method according to the first embodiment. For the configuration of the voice processing system 1, refer to FIGS. 1 to 3 as appropriate. Each process shown in FIG. 5 is executed by the respective parts of the voice processing system 1, for example, while the event is being held. As described with reference to FIG. 2, before the event is held, the position determination unit 24 determines the audience position Q based on the participation application of the user U, and the terminal 3 storing the position information corresponding to the audience position Q is delivered to the user.

図５において、第１ユーザは図１から図３のユーザＵａに相当し、第１ユーザ端末は、ユーザＵａに対応する端末３ａに相当する。第２ユーザは図１から図３のユーザＵｂに相当し、第２ユーザ端末は、ユーザＵｂに対応する端末３ｂに相当する。 In FIG. 5, the first user corresponds to the user Ua of FIGS. 1 to 3, and the first user terminal corresponds to the terminal 3a corresponding to the user Ua. The second user corresponds to the user Ub of FIGS. 1 to 3, and the second user terminal corresponds to the terminal 3b corresponding to the user Ub.

In step S1, the processing device transmits the content. For example, the communication unit 23 of the processing device 2 of FIG. 3 transmits the content data. The first user terminal receives the content data transmitted by the processing device and reproduces the content based on the received data. In step S2, the first user terminal detects the audience sound from the first user. In step S3, the first user terminal adjusts the audience sound using the position information. For example, the sound adjustment unit 34 of the terminal 3a of FIG. 3 adjusts the audience sound detected in step S2 using the position information D3a stored in the storage unit 31. In step S4, the first user terminal transmits the data of the adjusted audience sound. For example, the communication unit 32 of the terminal 3a of FIG. 3 transmits the data of the adjusted audience sound. The processing device receives the data of the adjusted audience sound transmitted by the first user terminal in step S4.

The processing of steps S5 to S7 is the same as the processing of steps S2 to S4. The second user terminal receives the content data transmitted by the processing device in step S1. In step S5, the second user terminal detects the audience sound from the second user. In step S6, the second user terminal adjusts the audience sound using the position information. In step S7, the second user terminal transmits the data of the adjusted audience sound. The processing device receives the data of the adjusted audience sound transmitted by the second user terminal in step S7.

処理装置は、ステップＳ８において、調整後の観客音を合成する。例えば、図３の処理装置２の合成部２７は、端末３ａから送信された観客音のデータと、端末３ｂから送信された観客音のデータとを合成する。処理装置は、ステップＳ９において、合成後の観客音を演者へ出力する。例えば、処理装置２は、合成後の観客音のデータをスピーカ６へ出力し、演者Ｐに対して、観客音をスピーカ６によって出力する。処理装置は、ステップＳ１０において、合成後の観客音を含むコンテンツを送信する。例えば、図３のマイク５は、演者Ｐが発する音と、スピーカ６から出力される音とを含む音を検出する。そして、処理装置２のコンテンツ生成部２８は、マイク５が検出した音を用いてコンテンツを生成する。処理装置２の通信部２３は、コンテンツ生成部２８が生成したコンテンツのデータを送信する。 The processing device synthesizes the adjusted audience sound in step S8. For example, the synthesizing unit 27 of the processing device 2 of FIG. 3 synthesizes the spectator sound data transmitted from the terminal 3a and the spectator sound data transmitted from the terminal 3b. In step S9, the processing device outputs the synthesized audience sound to the performer. For example, the processing device 2 outputs the synthesized audience sound data to the speaker 6, and outputs the audience sound to the performer P by the speaker 6. In step S10, the processing device transmits the content including the synthesized audience sound. For example, the microphone 5 of FIG. 3 detects a sound including a sound emitted by the performer P and a sound output from the speaker 6. Then, the content generation unit 28 of the processing device 2 generates content using the sound detected by the microphone 5. The communication unit 23 of the processing device 2 transmits the data of the content generated by the content generation unit 28.
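The split between terminal-side adjustment (steps S3 and S6) and processing-device-side synthesis (step S8) can be illustrated with a minimal Python sketch. This passage does not specify the adjustment rule, so the inverse-distance gain below is only an assumption, and the names `distance_gain`, `adjust`, `synthesize`, and `ref_distance` are hypothetical.

```python
import math


def distance_gain(performer_pos, audience_pos, ref_distance=1.0):
    """Assumed rule: gain is 1.0 within ref_distance, then falls off as 1/distance."""
    d = math.dist(performer_pos, audience_pos)
    return ref_distance / max(d, ref_distance)


def adjust(samples, performer_pos, audience_pos):
    """Terminal side (steps S3/S6): scale the detected audience sound by the gain."""
    g = distance_gain(performer_pos, audience_pos)
    return [g * s for s in samples]


def synthesize(streams):
    """Processing-device side (step S8): sum the adjusted streams sample by sample."""
    length = max((len(s) for s in streams), default=0)
    mixed = [0.0] * length
    for stream in streams:
        for i, s in enumerate(stream):
            mixed[i] += s
    return mixed


performer = (0.0, 0.0)                                          # performer position P1
adjusted_a = adjust([0.5, -0.5, 0.25], performer, (0.0, 2.0))   # user Ua, 2 units away
adjusted_b = adjust([0.2, 0.2, 0.2], performer, (3.0, 4.0))     # user Ub, 5 units away
mixed = synthesize([adjusted_a, adjusted_b])                    # data sent toward speaker 6
```

In this sketch each terminal needs only its own position information, so the per-user scaling runs on the terminals and the processing device performs just the summation.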

第１ユーザ端末および第２ユーザ端末は、それぞれ、処理装置が送信したコンテンツのデータを受信する。第１ユーザ端末および第２ユーザ端末は、受信したコンテンツのデータに基づいて、コンテンツを再生する。ステップＳ２からステップＳ１０の処理は、例えばイベントの開催中に繰り返し実行される。ステップＳ２からステップＳ１０の一連の処理が繰り返される周期は、任意に設定される。 The first user terminal and the second user terminal each receive the content data transmitted by the processing device. The first user terminal and the second user terminal reproduce the content based on the received content data. The processes of steps S2 to S10 are repeatedly executed, for example, during the holding of an event. The cycle in which the series of processes from step S2 to step S10 is repeated is arbitrarily set.

In the voice processing system 1 according to the above-described embodiment, the position determination unit 24 determines, based on an application from the user U for participation in the event, the audience position Q representing the position of the user U in the predetermined area (e.g., the event venue AE) where the event is held. The microphone 41 detects the sound emitted by the user U who views the performer P of the event (e.g., the content). The sound adjustment unit 34 adjusts the sound detected by the microphone 41 based on the relationship between the performer position P1, which represents the position of the performer of the event in the predetermined area, and the audience position Q. The synthesis unit 27 synthesizes the sound detected by the microphone 41 corresponding to a first user (e.g., the user Ua) among the users U and adjusted by the sound adjustment unit 34 with the sound detected by the microphone 41 corresponding to a second user (e.g., the user Ub), who is a user U different from the first user, and adjusted by the sound adjustment unit 34. The output unit 26 outputs data representing the sound synthesized by the synthesis unit 27.

この音声処理システム１は、演者ＰにユーザＵの反応を伝えることができる。演者Ｐは、例えば、ユーザＵの反応を意識したパフォーマンスを行うことができる。ユーザＵは、例えば自身の反応によって影響を受けたパフォーマンスを視聴することができる。このように、実施形態に係る音声処理システム１は、新たな体験を提供できる。 The voice processing system 1 can convey the reaction of the user U to the performer P. The performer P can perform a performance conscious of the reaction of the user U, for example. User U can, for example, view a performance influenced by his or her reaction. In this way, the voice processing system 1 according to the embodiment can provide a new experience.

本実施形態において、申請取得部（例、受付端末８）は、ユーザＵから提供される参加申請を取得する。位置決定部２４は、申請取得部が取得した参加申請に基づいて、観客位置Ｑを決定する。この場合、例えば、参加申請の取得から観客位置Ｑの決定までの処理を自動で行うことができる。なお、音声処理システム１は、申請取得部（例、受付端末８）を備えなくてもよい。例えば、処理装置２には、イベントへ参加するユーザの情報が入力され、位置決定部２４は、この情報を用いて観客位置を決定してもよい。 In the present embodiment, the application acquisition unit (eg, the reception terminal 8) acquires the participation application provided by the user U. The position determination unit 24 determines the audience position Q based on the participation application acquired by the application acquisition unit. In this case, for example, the process from the acquisition of the participation application to the determination of the spectator position Q can be automatically performed. The voice processing system 1 does not have to include an application acquisition unit (eg, a reception terminal 8). For example, information on a user participating in an event may be input to the processing device 2, and the position determination unit 24 may determine the position of the spectator using this information.

At least some of the users participating in the event (e.g., the first user) view the performer P from, for example, outside the predetermined area (e.g., the event venue AE). The microphone 41 corresponding to the first user (e.g., the user Ua) is arranged at a position (e.g., the viewing area AUa) where it detects the sound emitted by the first user. In this form, the number of users who can participate in the event is less constrained by the size of the event venue than in a form in which the event is held only in real space. For example, users can participate in the event from outside a real-space venue as well as at the venue, so an audience larger than the capacity of the real-space venue can participate. In this form, for example, more spectators can enjoy the event. This form can also, for example, give users who live far from the real-space venue an opportunity to participate from outside the venue, improving user convenience.

演者Ｐは、例えば、所定領域（例、イベント会場ＡＥ）の外でパフォーマンスを行う。合成部２７が合成した音のデータは、演者Ｐへ音が届く位置（例、実演領域ＡＰ）に配置される音声出力装置（例、スピーカ６）へ提供される。この形態によれば、演者Ｐが実空間のイベント会場でパフォーマンスを行う形態と比べて、演者Ｐが実在する位置がイベント会場の位置の制約を受けにくくなる。例えば、複数の演者Ｐが存在し、一部の演者Ｐが他の演者Ｐと離れている場合でもイベントを行うことができる。 Performer P performs, for example, outside a predetermined area (eg, event venue AE). The sound data synthesized by the synthesis unit 27 is provided to a voice output device (eg, speaker 6) arranged at a position where the sound reaches the performer P (eg, the demonstration area AP). According to this form, the position where the performer P actually exists is less likely to be restricted by the position of the event venue, as compared with the form in which the performer P performs at the event venue in the real space. For example, an event can be performed even when there are a plurality of performers P and some performers P are separated from other performers P.

In the present embodiment, the terminal 3 corresponds to a first voice processing device including the sound adjustment unit 34 and a transmission unit (e.g., the communication unit 32) that transmits data of the sound adjusted by the sound adjustment unit 34. The processing device 2 corresponds to a second voice processing device including a reception unit (e.g., the communication unit 23) that receives the data transmitted by the transmission unit of the first voice processing device, and the synthesis unit 27 that synthesizes the sounds represented by the data received by the reception unit. In this form, for example, the processing of adjusting the audience sounds of a plurality of users is distributed among a plurality of first voice processing devices, which reduces the load on the second voice processing device. The transmission unit of the second voice processing device may transmit position information indicating the relationship between the performer position and the audience position determined by the position determination unit. The reception unit of the first voice processing device may receive the position information transmitted by the transmission unit of the second voice processing device. The sound adjustment unit of the first voice processing device may adjust the sound detected by the microphone of the first voice processing device based on the position information received by the reception unit of the first voice processing device.

本実施形態において、音声処理システム１は、仮想的な情報と実体的な情報とを種々組み合わせてコンテンツを生成してもよい。例えば、イベント会場ＡＥは、実際の施設や場所（コンサートホール、球場、イベント用ステージ等）を含んでもよい。イベント会場ＡＥは、仮想的な施設や環境（仮想空間に構築されたステージや、海、山あるいは空等）を含んでもよい。音声処理システム１は、ユーザから発せられた観客音の他に、イベント用に生成された人工的な観客音と効果音を用いて、コンテンツを生成してもよい。 In the present embodiment, the voice processing system 1 may generate content by variously combining virtual information and real information. For example, the event venue AE may include an actual facility or location (concert hall, stadium, event stage, etc.). The event venue AE may include virtual facilities and environments (stages constructed in virtual space, sea, mountains, sky, etc.). The voice processing system 1 may generate content by using artificial spectator sounds and sound effects generated for the event in addition to the spectator sounds emitted from the user.

本実施形態において、端末３と処理装置２との一方または双方は、コンピュータを含む。実施形態に係るコンピュータは、例えば、スマートフォン、タブレット、若しくはノートパソコンなどの携帯型の装置、又はサーバー装置、デスクトップ型パソコン、若しくはタワー型パソコン等の据置型の装置の少なくとも１つを含む。コンピュータは、例えば、処理部と、記憶部と、通信部と、入出力インターフェースとを備える。 In this embodiment, one or both of the terminal 3 and the processing device 2 includes a computer. The computer according to the embodiment includes, for example, at least one of a portable device such as a smartphone, a tablet, or a laptop computer, or a stationary device such as a server device, a desktop personal computer, or a tower personal computer. The computer includes, for example, a processing unit, a storage unit, a communication unit, and an input / output interface.

処理部は、例えば、汎用のプロセッサー（例、ＣＰＵ）を含む。記憶部は、例えば、不揮発性の記憶装置と、揮発性の記憶装置との一方または双方を含む。不揮発性の記憶装置は、例えばハードディスク、ソリッドステートドライブ等である。揮発性の記憶装置は、例えば、ランダムアクセスメモリ、キャッシュメモリ、ワークメモリなどのである。通信部は、所定の通信規格に準拠して、有線または無線の通信を行う。通信部は、通信規格が異なる２種以上の通信を組み合わせて通信を行ってもよい。所定の通信規格は、ＬＡＮを含んでもよいし、Ｂｌｕｅｔｏｏｔｈ（登録商標）などの近距離無線通信規格を含んでもよく、赤外線通信の規格又はその他の規格を含んでもよい。 The processing unit includes, for example, a general-purpose processor (eg, CPU). The storage unit includes, for example, one or both of a non-volatile storage device and a volatile storage device. The non-volatile storage device is, for example, a hard disk, a solid state drive, or the like. Volatile storage devices include, for example, random access memory, cache memory, work memory, and the like. The communication unit performs wired or wireless communication in accordance with a predetermined communication standard. The communication unit may perform communication by combining two or more types of communication having different communication standards. The predetermined communication standard may include a LAN, a short-range wireless communication standard such as Bluetooth (registered trademark), an infrared communication standard, or other standards.

The terminal 3 may be called a processing device, an information processing device, an electronic device, or another name. At least part of the processing of the terminal 3 may be executed by a digital signal processor or by an application-specific integrated circuit. The processing device 2 may be called a terminal, an information processing device, an electronic device, or another name. At least part of the processing of the processing device 2 may be executed by a digital signal processor or by an application-specific integrated circuit.

When the terminal 3 is a computer, its processing unit reads a program stored in the storage unit and executes various processes according to the program. The program includes, for example, a voice processing program that causes the computer to execute: receiving position information indicating the relationship between an audience position, which is determined based on an application from a user for participation in an event and represents the position of the user in a predetermined area where the event is held, and a performer position, which represents the position of a performer of the event in the predetermined area; adjusting, based on the position information, a sound emitted by the user viewing the performer of the event and detected by a microphone; and transmitting data representing the sound adjusted by the sound adjustment unit to a processing device including a synthesis unit that synthesizes the sounds adjusted by the sound adjustment unit for a plurality of users participating in the event. The program may be provided recorded on a computer-readable storage medium.

When the processing device 2 is a computer, its processing unit reads a program stored in the storage unit and executes various processes according to the program. The program includes, for example, a voice processing program that causes the computer to execute: determining, based on an application from a user for participation in an event, an audience position representing the position of the user in a predetermined area where the event is held; transmitting position information indicating the relationship between a performer position, which represents the position of a performer of the event in the predetermined area, and the audience position to a processing device including a sound adjustment unit that adjusts, based on the position information, a sound emitted by the user viewing the performer of the event and detected by a microphone; receiving data that is transmitted by that processing device and represents the sound adjusted by the sound adjustment unit; and synthesizing, based on the received data, the sound detected by the microphone corresponding to a first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user different from the first user and adjusted by the sound adjustment unit, and outputting data representing the synthesized sound.

実施形態に係るプログラムは、実施形態に係る機能を、コンピュータシステムに記録されているプログラムとの組み合わせで実現できるプログラム（例、差分ファイル、差分プログラム）でもよい。実施形態に係るプログラムは、コンピュータ読み取り可能な記憶媒体に記録されて提供されてもよい。 The program according to the embodiment may be a program (eg, difference file, difference program) that can realize the function according to the embodiment in combination with the program recorded in the computer system. The program according to the embodiment may be recorded and provided on a computer-readable storage medium.

[Second Embodiment]
The second embodiment will be described. In the present embodiment, components similar to those of the above-described embodiment are designated by the same reference numerals, and their description is omitted or simplified. FIG. 6 is a diagram showing a voice processing system according to the second embodiment. In the present embodiment, the terminal 3 is a terminal associated with the user U who uses the terminal 3. For example, the terminal 3a is a terminal owned by the user Ua. The terminal 3a is, for example, a smartphone, a tablet, a personal computer, or another information processing device. The terminal 3a stores, for example, information that identifies the terminal 3a and information that identifies the user Ua in the storage unit 31. In the following description, the information that identifies the terminal 3 is referred to as a terminal ID as appropriate. The information that identifies the user includes, for example, account information for a content distribution service. The account information includes, for example, a user ID and a password.

端末３ａは、マイク４１、スピーカ４２、表示部４５、及び入力部４６を備える。表示部４５は、図３に示した表示装置４３に相当する。入力部４６は、各種情報の入力を受け付ける。入力部４６は、例えば、キーボード、マウス、及びタッチパッドの少なくとも一つを含む。入力部４６および表示部４５は、タッチパネルでもよい。ユーザＵａは、入力部４６を操作することによって、各種情報を端末３ａへ入力する。 The terminal 3a includes a microphone 41, a speaker 42, a display unit 45, and an input unit 46. The display unit 45 corresponds to the display device 43 shown in FIG. The input unit 46 accepts input of various information. The input unit 46 includes, for example, at least one of a keyboard, a mouse, and a touchpad. The input unit 46 and the display unit 45 may be a touch panel. The user Ua inputs various information to the terminal 3a by operating the input unit 46.

In the present embodiment, the terminal 3a also serves as the reservation terminal 7 described with reference to FIG. 2. The user Ua can input entry information and apply for participation by operating the input unit 46 of the terminal 3a. The entry information includes, for example, a user ID and information that identifies the event. The communication unit 32 of the terminal 3a transmits the entry information. In the present embodiment, the processing device 2 also serves as the reception terminal 8 described with reference to FIG. 2. The processing device 2 acquires the entry information transmitted by the communication unit 32 of the terminal 3a. For example, the communication unit 23 of the processing device 2 receives the entry information transmitted by the terminal 3a.

処理装置２の記憶部２２は、図２で説明したユーザ情報Ｄ１を記憶する。処理装置２の処理部２１は、受付部２９を含む。受付部２９は、図２で説明した受付端末８による処理を実行する。受付部２９は、イベントにおけるユーザの席を決定する。例えば、受付部２９は、端末３ａが送信したエントリー情報に含まれるイベントＩＤを取得する。受付部２９は、イベントＩＤとイベントの情報とを関連付けて記憶するデータベースを参照し、イベントＩＤに対応するイベントの席の情報を取得する。受付部２９は、端末３ａが送信したエントリー情報に含まれるユーザＩＤを取得する。受付部２９は、記憶部２２に記憶されたユーザ情報Ｄ１から、ユーザＩＤに対応するユーザＵａの情報（例、ユーザのランク）を取得する。受付部２９は、イベントの席の情報（例、席のランク、空席の情報）と、ユーザＵａの情報とを用いて、イベントにおけるユーザＵａの席（例、席の番号）を決定する。 The storage unit 22 of the processing device 2 stores the user information D1 described with reference to FIG. The processing unit 21 of the processing device 2 includes a reception unit 29. The reception unit 29 executes the process by the reception terminal 8 described with reference to FIG. The reception unit 29 determines the user's seat at the event. For example, the reception unit 29 acquires the event ID included in the entry information transmitted by the terminal 3a. The reception unit 29 refers to a database that stores the event ID and the event information in association with each other, and acquires the event seat information corresponding to the event ID. The reception unit 29 acquires the user ID included in the entry information transmitted by the terminal 3a. The reception unit 29 acquires user Ua information (eg, user rank) corresponding to the user ID from the user information D1 stored in the storage unit 22. The reception unit 29 determines the seat of the user Ua (eg, the seat number) in the event by using the seat information of the event (eg, the rank of the seat, the information of the vacant seat) and the information of the user Ua.
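The seat determination by the reception unit 29 can be sketched as follows. The text says only that seat information (e.g., seat rank and vacancy) and user information (e.g., user rank) are used, so the matching rule here is an assumption, and `Seat`, `assign_seat`, and the convention that a lower rank value means a better class are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Seat:
    number: str
    rank: int          # hypothetical convention: lower value = better seat class
    vacant: bool = True


def assign_seat(seats, user_rank):
    """Assumed rule: give the user the best vacant seat allowed by the user's rank."""
    candidates = [s for s in seats if s.vacant and s.rank >= user_rank]
    if not candidates:
        return None                      # no eligible vacant seat
    seat = min(candidates, key=lambda s: (s.rank, s.number))
    seat.vacant = False                  # record the seat as taken
    return seat.number


seats = [Seat("A1", 1), Seat("B1", 2), Seat("B2", 2, vacant=False)]
seat_for_ua = assign_seat(seats, user_rank=2)   # a rank-2 user is not offered A1
```

The returned seat number would then be passed to the position determination unit 24, which converts it into the audience position.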

処理装置２の位置決定部２４は、受付部２９が決定したユーザＵａの席の情報を用いて、ユーザＵａの観客位置Ｑａを決定する。処理部２１は、演者位置Ｐ１と、位置決定部２４が決定した観客位置Ｑａとの関係を示す位置情報を生成する。通信部２３は、処理部２１が生成した位置情報を送信する。ユーザＵａに対応する端末３ａの通信部３２は、処理装置２の通信部２３が送信した位置情報を受信する。端末３ａの処理部３３は、通信部３２が受信した位置情報を記憶部３１に記憶させる。 The position determination unit 24 of the processing device 2 determines the audience position Qa of the user Ua by using the information on the seat of the user Ua determined by the reception unit 29. The processing unit 21 generates position information indicating the relationship between the performer position P1 and the audience position Qa determined by the position determination unit 24. The communication unit 23 transmits the position information generated by the processing unit 21. The communication unit 32 of the terminal 3a corresponding to the user Ua receives the position information transmitted by the communication unit 23 of the processing device 2. The processing unit 33 of the terminal 3a stores the position information received by the communication unit 32 in the storage unit 31.

As described in the first embodiment, the communication unit 32 of the terminal 3a receives the content data transmitted by the processing device 2. The terminal 3a reproduces the content based on the content data received by the communication unit 32. For example, the processing unit 33 causes the display unit 45 to display the video represented by the content data. The processing unit 33 causes the speaker 42 to output the audio represented by the content data. The microphone 41 detects the audience sound emitted by the user Ua. The sound adjustment unit 34 adjusts the audience sound detected by the microphone 41 using the position information D3a stored in the storage unit 31. The communication unit 32 transmits the data of the adjusted audience sound. The communication unit 23 of the processing device 2 receives the data of the adjusted audience sound transmitted by the communication unit 32 of the terminal 3a. The synthesis unit 27 synthesizes the adjusted audience sound corresponding to the user Ua and the adjusted audience sound corresponding to the user Ub. The output unit 26 outputs the data of the synthesized audience sound and causes the speaker 42 to output the synthesized audience sound.

In the present embodiment, the user U can change the audience position Q. For example, the user Ua can apply for a change of the audience position Qa while the event is being held. By operating the terminal 3a, for example, the user Ua can have the terminal 3a transmit information requesting a change of the audience position Qa (hereinafter referred to as a change application as appropriate). The input unit 46 of the terminal 3a accepts input of a change application or of information on which a change application is based (hereinafter referred to as a change application or the like as appropriate). The change application or the like includes information on the destination seat desired by the user U (hereinafter referred to as destination seat information as appropriate).

The communication unit 32 of the terminal 3a transmits a change application based on the change application or the like input to the input unit 46. The change application includes, for example, the destination seat information and one or both of information that identifies the user Ua (e.g., a user ID) and information that identifies the terminal 3a associated with the user Ua (e.g., a terminal ID). The storage unit 31 may store one or both of the user ID and the terminal ID in advance. When applying for a seat change, the user Ua may input the destination seat information without inputting the user ID or the terminal ID to the terminal 3a. In this case, the processing unit 33 of the terminal 3a may generate the change application using one or both of the user ID and the terminal ID stored in the storage unit 31 and the destination seat information input to the input unit 46. The communication unit 32 may transmit the change application generated by the processing unit 33.
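As an illustration of how the terminal might assemble a change application from the stored identifiers and the seat the user wants to move to, consider the minimal Python sketch below. The dictionary layout, its field names, and the function `build_change_application` are assumptions made for illustration; the text does not define a message format.

```python
def build_change_application(destination_seat, stored_user_id=None, stored_terminal_id=None):
    """Combine the seat entered by the user with the ID(s) held in the storage unit."""
    if stored_user_id is None and stored_terminal_id is None:
        raise ValueError("at least one of user ID or terminal ID is required")
    application = {"destination_seat": destination_seat}
    if stored_user_id is not None:
        application["user_id"] = stored_user_id
    if stored_terminal_id is not None:
        application["terminal_id"] = stored_terminal_id
    return application


# The user types only the seat; the IDs come from storage.
application = build_change_application("A2", stored_user_id="Ua", stored_terminal_id="T-a")
```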

The communication unit 23 of the processing device 2 receives the change application transmitted by the communication unit 32 of the terminal 3a. The position determination unit 24 changes the audience position Qa corresponding to the user Ua based on the application from the user Ua to change the audience position Q. For example, the position determination unit 24 determines the new audience position Qa of the user Ua based on the change application received by the communication unit 23. For example, the position determination unit 24 determines whether the seat indicated in the destination seat information included in the change application is available. For example, if the seat indicated in the destination seat information is not assigned to any user U, the position determination unit 24 determines that the seat is available. The position determination unit 24, for example, derives the coordinates in the cyberspace of the event venue AE that correspond to the seat indicated in the destination seat information, and determines the derived coordinates as the new audience position Q (hereinafter referred to as the changed audience position as appropriate). The communication unit 23 transmits the position information corresponding to the changed audience position Q. For example, the processing unit 21 identifies the terminal 3a as the destination of the position information based on one or both of the user ID and the terminal ID included in the change application transmitted by the terminal 3a. The communication unit 23 transmits the position information to the destination identified by the processing unit 21.
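The processing-device side of a seat change (availability check, coordinate lookup, and choice of the destination terminal) can be sketched as below. The lookup tables `seat_coords`, `assignments`, and `terminal_of_user`, the request field names, and the function `handle_change_request` are all hypothetical; the sketch also assumes that a seat is held by at most one terminal.

```python
def handle_change_request(request, seat_coords, assignments, terminal_of_user):
    """Return (terminal ID, changed audience position), or None if the request is rejected.

    request          -- dict with "destination_seat" and "user_id" and/or "terminal_id"
    seat_coords      -- seat number -> coordinates in the venue's cyberspace
    assignments      -- terminal ID -> seat number currently assigned
    terminal_of_user -- user ID -> terminal ID
    """
    seat = request["destination_seat"]
    if seat not in seat_coords or seat in assignments.values():
        return None                          # unknown seat, or already assigned
    terminal_id = request.get("terminal_id") or terminal_of_user.get(request.get("user_id"))
    if terminal_id is None:
        return None                          # cannot identify where to send the result
    assignments[terminal_id] = seat          # update the assignment
    return terminal_id, seat_coords[seat]    # position information goes to this terminal


seat_coords = {"A1": (0.0, 2.0), "A2": (1.0, 2.0)}
assignments = {"T-a": "A1"}
terminal_of_user = {"Ua": "T-a", "Ub": "T-b"}
result = handle_change_request(
    {"user_id": "Ua", "destination_seat": "A2"}, seat_coords, assignments, terminal_of_user
)
```

On receipt, the terminal would overwrite its stored position information with the returned coordinates and use them for subsequent adjustment.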

端末３ａの通信部３２は、処理装置２が送信した更新後の位置情報を受信する。端末３ａの処理部３３は、記憶部３１に記憶されている位置情報Ｄ３ａを、通信部３２が受信した更新後の位置情報へ更新する。音調整部３４は、位置決定部２４が変更した観客位置Ｑａを用いて、ユーザＵａに対応するマイク４１が検出した音を調整する。例えば、記憶部３１に記憶されている位置情報Ｄ３ａが更新された後、音調整部３４は、マイク４１が検出した観客音を、更新後の位置情報Ｄ３ａを用いて調整する。 The communication unit 32 of the terminal 3a receives the updated position information transmitted by the processing device 2. The processing unit 33 of the terminal 3a updates the position information D3a stored in the storage unit 31 to the updated position information received by the communication unit 32. The sound adjustment unit 34 adjusts the sound detected by the microphone 41 corresponding to the user Ua by using the audience position Qa changed by the position determination unit 24. For example, after the position information D3a stored in the storage unit 31 is updated, the sound adjustment unit 34 adjusts the audience sound detected by the microphone 41 using the updated position information D3a.

The process of changing the audience position may be executed by consuming tokens. The tokens may include, for example, electronic money that is exchangeable with national currencies, or may include crypto assets. The tokens may be, for example, points that the user can use in the content distribution service, or points that the user can use in a service affiliated with the content distribution service. Information on the tokens owned by the user U may be included in the user information D1. The value of the tokens to be consumed may be, for example, a predetermined fixed value, a value specified by the user, or a value determined by the destination seat. The processing device 2 may determine the destination seat by, for example, an auction using token values specified by users. When tokens are consumed in the process of changing the audience position, the timing at which they are consumed is arbitrary. For example, the tokens may be consumed when the processing device 2 accepts the change request, or when the audience position is changed in response to the change request.

The position determination unit 24 determines whether a seat is available based on, for example, whether the seat specified by the user is in use, but it may determine availability based on other criteria or conditions. For example, the position determination unit 24 may determine that the seat is available when the value of the tokens held by the user is equal to or greater than the value of the tokens consumed in the process of changing the audience position. A seat in cyberspace may be assigned to only one user or to a plurality of users. For example, a plurality of users may have the same audience position, and the audience position of any user may be the same as that of another user.

The voice processing system 1 may include an action reception unit that receives, from the user U, information specifying an action directed at one or both of the performer P and the user U, and a processing unit that executes a process associated in advance with the information received by the action reception unit. The action includes an operation on the content (e.g., video, audio) in a form different from the audience sound. The action includes, for example, an effect on one or both of the video and the audio of the content. The action includes, for example, an effect that expresses the user's input through one or both of the video and the audio of the content. In the following description, information specifying an action is referred to as an action request, and the process associated in advance with the information received by the action reception unit is referred to as an action addition process, as appropriate.

The action reception unit includes, for example, the input unit 46 of the terminal 3a. The processing unit includes, for example, the content generation unit 28 of the processing device 2. The user Ua, for example, enters a comment about the event in text form into the input unit 46 as an action request. The communication unit 32 of the terminal 3a transmits the text information entered into the input unit 46. The communication unit 23 of the processing device 2 receives the text information transmitted by the communication unit 32 of the terminal 3a. The content generation unit 28 generates, for example, content whose video includes the text contained in the text information received by the communication unit 23. For example, the text entered by the user may be displayed moving across the video of the content, or may be displayed in another form.

The action reception unit may include the microphone 41 of the terminal 3a. For example, the user may input speech instead of text, and the microphone 41 may accept the speech uttered by the user as an action request. In this case, the processing unit 33 of the terminal 3a may convert the speech input to the microphone 41 into text information by speech recognition. The communication unit 32 of the terminal 3a may transmit the text information generated by the processing unit 33 through speech recognition.

The action request information may be input in a form other than text or speech. For example, the terminal 3a may display an icon associated with a predetermined action addition process on the display unit 45 and, when a user operation on this icon is detected, request execution of the action addition process associated with the icon. For example, the terminal 3a displays on the display unit 45 an icon representing the addition of an applause sound. When the terminal 3a detects a user operation on this icon, it transmits, via the communication unit 32, a request to add an applause sound to the content. The processing device 2 receives the request transmitted by the communication unit 32, and the content generation unit 28 generates, for example, content to which a pre-sampled applause sound has been added.

The action addition process may include a process of applying an additional adjustment to the audience sound adjusted by the sound adjustment unit 34. For example, the action addition process may include a process of changing the volume of the adjusted audience sound, or a process of adding an effect such as an echo. The action addition process may be a process of adding video or a process of adding a video effect. For example, the action addition process may include a process of adding fireworks footage to the video of the content. The action addition process may include a process of transferring value equivalent to a tip or gratuity to the performer. For example, the processing unit 21 of the processing device 2 may execute a process of moving tokens of a value specified by the user Ua from the account of the user Ua to the account of the performer P, or may cause a device that manages tokens to execute the transfer. When the token transfer process is executed, the content generation unit 28 may add video representing the transfer to the content. For example, the content generation unit 28 may add a tip rendered in CG to the video of the content. The icon may be a single type of icon, or may be a plurality of types of icons associated with different action addition processes.

The process associated with an action request may be executed by consuming tokens. The value of the tokens to be consumed may be, for example, a predetermined fixed value or a value specified by the user. The value of the tokens to be consumed may be determined by the type of action addition process. For example, the value of the tokens to be consumed may differ between a process of adding text to the video of the content and a process of adding a sound effect to the audio of the content.

The value of the tokens to be consumed may be determined by the level of the effect produced by the action addition process. For example, assume that the action addition process includes a process of adding an applause sound effect. When the tokens to be consumed have a first value, the content generation unit 28 may add the applause sound effect to the content at a first volume, and when the tokens to be consumed have a second value greater than the first value, it may add the applause sound effect at a second volume higher than the first volume. Likewise, assume that the action addition process includes a process of adding text to the video. When the tokens to be consumed have the first value, the content generation unit 28 may add the text to the content in a first font size, and when the tokens to be consumed have the second value greater than the first value, it may add the text in a second font size larger than the first font size.
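The relationship between consumed tokens and effect level can be sketched as a monotonic mapping. The Python functions below are a minimal sketch under assumed values: the linear steps, the base levels, and the caps are hypothetical and are not specified by the embodiment, which only requires that a larger token value yield a larger volume or font size.

```python
def applause_gain(tokens, base_gain=0.5, step=0.1, max_gain=1.0):
    """Linear gain for the applause effect (assumed mapping, capped at max_gain)."""
    if tokens <= 0:
        raise ValueError("at least one token must be consumed")
    return min(max_gain, base_gain + step * (tokens - 1))


def comment_font_size(tokens, base_pt=16, step_pt=4, max_pt=48):
    """Font size in points for a text comment (assumed mapping, capped at max_pt)."""
    if tokens <= 0:
        raise ValueError("at least one token must be consumed")
    return min(max_pt, base_pt + step_pt * (tokens - 1))
```

With these assumed parameters, a second token value greater than the first never produces a quieter applause effect or a smaller font.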

The sound adjustment unit 34 of the terminal 3a or the sound adjustment unit 34 of the terminal 3b may adjust the sound detected by the microphone corresponding to the user Ub based on the relationship between the audience position Qa corresponding to the user Ua and the audience position Qb corresponding to the user Ub. For example, assume that the audience position Qb of the user Ub is within the range from which the audience sound of the user Ub, if emitted at the audience position Qb, would reach the audience position Qa of the user Ua. The terminal 3a can execute a process of emphasizing audience sounds emitted at audience positions Q around the audience position Qa (referred to as surrounding emphasis processing, as appropriate). The terminal 3 can switch between, for example, a first operation mode in which the surrounding emphasis processing is not executed and a second operation mode in which it is executed. For example, the terminal 3 accepts a mode switching input from the user U and, when it determines that this input has been made, switches between the first operation mode and the second operation mode.

In the first operation mode, the adjusted audience sound of the user Ub is synthesized by the synthesis unit 27 of the processing device 2. The synthesized audience sound is output from the speaker 6 toward the performer P. The audience sound output from the speaker 6 is detected by the microphone 5 together with the sound emitted by the performer P. The content generation unit 28 generates content using the sound detected by the microphone 5, so the audience sound of the user Ub is included in the audio of the content.

In the second operation mode, the communication unit 23 of the processing device 2 transmits position information indicating the relationship between the audience position Qa of the user Ua and the audience position Qb of the user Ub (referred to as second position information or inter-audience position information, as appropriate). The inter-audience position information includes, for example, a gain that converts the audience sound adjusted by the sound adjustment unit 34 of the terminal 3b into the sound that would reach the audience position Qa of the user Ua if the audience sound detected by the microphone 41 corresponding to the user Ub were emitted at the audience position Qb. The communication unit 23 of the processing device 2 transmits the adjusted audience sound data corresponding to the user Ub. The communication unit 32 of the terminal 3a receives the adjusted audience sound data transmitted by the communication unit 23 of the processing device 2. Using the inter-audience position information, the processing unit 33 of the terminal 3a converts the adjusted audience sound corresponding to the user Ub into the sound that would reach the audience position Qa of the user Ua if the audience sound were emitted at the audience position Qb. The processing unit 33 synthesizes the converted audience sound of the user Ub with the audio of the content and causes the speaker 42 to output the result. In the voice processing system 1 of this form, for example, the audience sound of a second user located near a first user in cyberspace is conveyed realistically to the first user, which enhances the sense of presence at the event.
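The conversion in the second operation mode can be pictured as applying a distance-dependent gain to the neighbor's audience sound and mixing the result into the content audio. The Python sketch below assumes a simple inverse-distance attenuation model and sample values normalized to the range -1.0 to 1.0; the embodiment does not specify how the gain is computed, and the function names and the `ref_distance` parameter are hypothetical.

```python
import math


def inter_audience_gain(qa, qb, ref_distance=1.0):
    """Gain from position Qb to position Qa under an assumed inverse-distance model."""
    d = math.dist(qa, qb)
    # Within the reference distance the sound is passed through unattenuated.
    return 1.0 if d <= ref_distance else ref_distance / d


def mix_neighbor(content, neighbor, gain):
    """Add the gain-scaled neighbor sound to the content audio, clipping to [-1, 1]."""
    n = min(len(content), len(neighbor))
    return [max(-1.0, min(1.0, content[i] + gain * neighbor[i])) for i in range(n)]
```

In this sketch `inter_audience_gain` stands in for the gain carried in the inter-audience position information, and `mix_neighbor` stands in for the synthesis performed on the receiving terminal before output to its speaker.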

When switching from the first operation mode to the second operation mode, the terminal 3a notifies the processing device 2 of the switch. On receiving this notification, the processing unit 21 of the processing device 2 identifies the audience position Qa of the user Ua corresponding to the terminal 3a. The processing unit 21 extracts at least one audience position Q included in a predetermined range around the audience position Qa. The processing unit 21 generates inter-audience position information for each of the extracted audience positions Q. The processing device 2 provides the generated inter-audience position information to the terminal 3a. The processing device 2 provides the adjusted audience sound data corresponding to the inter-audience position information to the terminal 3a in association with that inter-audience position information. For example, the processing unit 21 of the processing device 2 assigns identification information to each piece of inter-audience position information and transmits the adjusted sound data paired with the identification information.
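Extracting the audience positions within a predetermined range can be sketched as a radius query around the audience position Qa. The Python function below is a minimal sketch that assumes a circular range with a hypothetical `radius` parameter and a dictionary keyed by user ID; the embodiment does not fix the shape of the range or the data layout.

```python
import math


def nearby_positions(positions, user_id, radius):
    """Return {other_user_id: position} for positions within radius of the user's position."""
    qa = positions[user_id]
    return {
        uid: q
        for uid, q in positions.items()
        if uid != user_id and math.dist(qa, q) <= radius
    }
```

Each key of the returned dictionary could serve as the identification information that pairs a piece of inter-audience position information with its adjusted sound data.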

Next, based on the configuration of the voice processing system 1 described above, the voice processing method according to the embodiment will be described. FIG. 7 is a diagram showing the voice processing method according to the second embodiment. For the configuration of the voice processing system 1, refer to FIG. 6 as appropriate. Processes similar to those in FIG. 5 are given the same reference signs as in FIG. 5, and their description is omitted or simplified.

The first user terminal accepts input of information on a participation application from the user Ua shown in FIG. 6. In step S21, the first user terminal transmits entry information. For example, the processing unit 33 of the terminal 3a in FIG. 6 causes the communication unit 32 to transmit the entry information based on the information entered into the input unit 46. The processing device receives the entry information transmitted in step S21. In step S22, the processing device determines the audience position of the first user. For example, the position determination unit 24 of the processing device 2 in FIG. 6 determines the audience position Qa of the user Ua based on the entry information corresponding to the user Ua received by the communication unit 23. In step S23, the processing device transmits position information. For example, the processing unit 21 of the processing device 2 in FIG. 6 uses the audience position Qa determined by the position determination unit 24 to generate position information indicating the relationship between the audience position Qa and the performer position P1. The communication unit 23 transmits the position information generated by the processing unit 21.

The processes of steps S24 to S26 are the same as those of steps S21 to S23, and their description is simplified. The second user terminal accepts input of information on a participation application from the user Ub shown in FIG. 6. In step S24, the second user terminal transmits entry information. The processing device receives the entry information transmitted in step S24. In step S25, the processing device determines the audience position of the second user (e.g., the user Ub in FIG. 6). In step S26, the processing device transmits position information.

The processes of steps S21 to S26 described above are executed, for example, before the event starts. At least some of the processes of steps S21 to S26 may be executed after the event starts. For example, a user may apply to participate in the event and join it after distribution of the content has begun. The processes of steps S1 to S11 following step S26 are the same as in FIG. 3. The processes of steps S1 to S11 are executed repeatedly, and the process of changing the audience position and the action addition process are each executed during the period in which steps S1 to S11 are repeated. The process of changing the audience position may also be executed before step S1. For example, after applying to participate and before distribution of the content begins, the user U may operate the terminal 3 to have the audience position change process executed.

[Third Embodiment]
The third embodiment will be described. FIG. 8 is a diagram showing a voice processing system according to the third embodiment. In this embodiment, components similar to those of the embodiments described above are given the same reference signs, and their description is omitted or simplified. The voice processing system 1 according to this embodiment includes a plurality of first voice processing devices divided into a plurality of groups. The synthesis unit includes first synthesis units, each provided for a respective one of the groups and synthesizing, within that group, the sounds adjusted by the sound adjustment units of the first voice processing devices, and a second synthesis unit that synthesizes, across the groups, the sounds synthesized by the first synthesis units.

In this embodiment, the plurality of terminals 3 are divided into a plurality of groups. In the following description, an arbitrary group is denoted by the reference sign G as appropriate, and when groups G are distinguished from one another, they are denoted by the sign G with a letter a, b, ... appended, such as group Ga and group Gb.

In this embodiment, the voice processing system 1 includes a plurality of processing devices 51 and a processing device 52. A processing device 51 is provided for each group G. In the following description, an arbitrary processing device 51 is denoted by the reference sign 51 as appropriate, and when processing devices 51 are distinguished from one another, they are denoted by the sign 51 with a letter a, b, ... appended, such as processing device 51a and processing device 51b.

The processing device 51 includes the synthesis unit 27 described with reference to FIG. 3 and other figures. The processing device 51 acquires adjusted audience sound data from the two or more terminals 3 included in the corresponding group. For example, the processing device 51a receives the adjusted audience sound data transmitted by each of the terminals 3 included in the group Ga. Using the received audience sound data, the synthesis unit 27 of the processing device 51a synthesizes the adjusted audience sounds acquired from the terminals 3 included in the group Ga. In the following description, the adjusted audience sound synthesized by a processing device 51 is referred to as the primary-synthesized audience sound, as appropriate.

The processing device 52 is provided for two or more processing devices 51. In FIG. 8, the processing device 52 corresponds to the processing device 51a and the processing device 51b. The processing device 52 acquires primary-synthesized audience sound data from each of the two or more corresponding processing devices 51. The processing device 52 includes a synthesis unit 53 and the content generation unit 28. Using the primary-synthesized audience sound data acquired from each of the two or more processing devices 51, the synthesis unit 53 synthesizes the audience sounds represented by that data. The processing of the synthesis unit 53 may be, for example, the same as that of the synthesis unit 27. The processing device 52 outputs the data of the audience sound synthesized by the synthesis unit 53. The processing device 52 causes the speaker 6 to output the audience sound synthesized by the synthesis unit 53.
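The two-stage synthesis can be sketched as a sample-wise sum performed first within each group and then across groups. The Python functions below are a minimal sketch that assumes synthesis is plain additive mixing of equal-length sample lists; the embodiment does not specify the mixing operation, and the function names are hypothetical.

```python
def mix(streams):
    """Sum sample lists position by position (zip stops at the shortest stream)."""
    return [sum(samples) for samples in zip(*streams)]


def two_stage_mix(groups):
    """Mix per group first (primary synthesis), then mix the group results."""
    primary = [mix(streams) for streams in groups.values()]
    return mix(primary)
```

Because addition is associative, this two-stage result matches a single mix over all streams for integer samples, so under this assumption the grouping changes where the work is done rather than what is produced.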

As described with reference to FIG. 3 and other figures, the content generation unit 28 generates content using the video captured by the camera 4 and the audio detected by the microphone 5. The processing device 52 provides the data of the generated content. The processing device 51 acquires the content data provided by the processing device 52. The processing device 51 provides the content data to the terminals 3 included in the corresponding group G. For example, the processing device 51a provides the content data to each of the two or more terminals 3 included in the corresponding group Ga. Each of the two or more terminals 3 included in the group Ga plays back the content using the content data provided by the processing device 51a.

As described above, in the voice processing system 1 according to this embodiment, the plurality of terminals 3 are divided into a plurality of groups, and the system includes processing devices 51 that each synthesize the adjusted audience sounds provided by the two or more terminals 3 included in a group. Because the voice processing system 1 distributes the synthesis of the adjusted audience sounds across a plurality of devices, the processing load on each device can be reduced and, for example, processing delays become less likely.

The voice processing system 1 may include a plurality of processing devices (referred to as upper processing devices, as appropriate) that each synthesize the primary-synthesized audience sounds provided by two or more processing devices 51, and the processing device 52 may acquire from the upper processing devices the audience sound synthesized by each of them (referred to as the secondary-synthesized audience sound, as appropriate), synthesize these secondary-synthesized audience sounds, and output the resulting data. For example, the devices that generate intermediate data between each terminal 3 and the processing device 52 form a single layer of processing devices 51 in FIG. 8, but they may form two layers consisting of the processing devices 51 and the upper processing devices. Thus, the number of layers of devices that generate intermediate data is arbitrary: it may be zero as in FIG. 3, one as in FIG. 8, or more than one. The content generation unit 28 may be provided in a device other than the processing device 52; for example, it may be provided in a device that generates intermediate data between the terminals 3 and the processing device 52 (e.g., a processing device 51).

FIG. 9 is a diagram showing examples of groups according to the third embodiment. The reference signs EX1 to EX3 in FIG. 9 each denote an example of grouping. In the first example EX1, each of the groups G is a set of terminals corresponding to audience positions Q whose distance to the performer position P1 in the event venue AE falls within a predetermined range. For example, the group Ga is the set of terminals corresponding to audience positions whose distance to the performer position P1 is less than L1. The group Gb is the set of terminals corresponding to audience positions whose distance to the performer position P1 is L1 or more and less than L2.

In the second example EX2, each of the groups G is a set of terminals corresponding to audience positions Q whose direction from the performer position P1 in the event venue AE falls within a predetermined range. For example, the group Ga is the group of terminals corresponding to audience positions Q whose azimuth angle relative to the performer position P1 is in the range θ1. The group Gb is the group of terminals corresponding to audience positions Q whose azimuth angle relative to the performer position P1 is in the range θ2.

The grouping rule is not limited to the first or second example and may be defined arbitrarily. For example, the third example EX3 is a grouping that combines the grouping rule of the first example with that of the second example. In the third example, the groups G are determined based on the distance from the performer position P1 and the azimuth angle relative to the performer position P1.
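The combined rule of the third example can be sketched as assigning each audience position a pair made up of a distance band and an azimuth sector. The Python function below is a minimal sketch that assumes equal-width distance bands and equal-angle sectors measured counterclockwise from the positive x axis; the embodiment leaves the band boundaries (such as L1 and L2) and the angular ranges (such as θ1 and θ2) open, and the parameter names are hypothetical.

```python
import math


def group_key(q, p1, ring_width, sectors):
    """Return (ring index, sector index) of audience position q relative to performer position p1."""
    dx, dy = q[0] - p1[0], q[1] - p1[1]
    # Distance band: ring 0 covers distances in [0, ring_width), ring 1 the next band, and so on.
    ring = int(math.hypot(dx, dy) // ring_width)
    # Azimuth normalized to [0, 2*pi), then divided into equal sectors.
    azimuth = math.atan2(dy, dx) % (2 * math.pi)
    sector = min(int(azimuth / (2 * math.pi / sectors)), sectors - 1)
    return ring, sector
```

Terminals whose audience positions share the same key would belong to the same group G; using only the first element reproduces the distance-based rule, and using only the second reproduces the direction-based rule.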

The groups G may be determined based on the order in which the audience positions Q were determined. For example, the plurality of terminals 3 may be grouped based on the order in which their audience positions Q were determined. For example, the terminals 3 included in the group Ga may be the terminals whose audience positions Q were the 1st through 100th to be determined. The terminals 3 included in the group Gb may be the terminals whose audience positions Q were the 101st through 200th to be determined.
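Order-based grouping reduces to integer division of the 1-based order number by the group size. The Python function below is a minimal sketch; the default group size of 100 mirrors the example above, while the function name and the zero-based group index (0 for Ga, 1 for Gb) are assumptions made for illustration.

```python
def group_index_by_order(order, group_size=100):
    """Map a 1-based determination order to a zero-based group index."""
    if order < 1:
        raise ValueError("order numbers start at 1")
    return (order - 1) // group_size
```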

The groups G may also be determined based on the priority of seats (e.g., seat rank) in the event venue AE. For example, the terminals 3 in group Ga may correspond to audience positions Q in first-rank seats (e.g., S seats) of the event venue AE. The terminals 3 in group Gb may correspond to audience positions Q in second-rank seats (e.g., A seats), which have a lower priority than the first rank. The first-rank seats may be seats for which the user pays a higher participation fee than for the second-rank seats, or seats assigned to preselected users (e.g., invited guests). The groups G may also be determined by randomly selecting, from the plurality of terminals 3, the terminals 3 to be included in each group G. The grouping rule described above may be a combination of two or more rules (e.g., conditions).

Among the groups G, the number of terminals 3 in each group may be the same or may differ. The number of terminals 3 in each group G may be, for example, the number of terminals 3 handled by the processing device 51 corresponding to that group. Suppose that the terminals 3 in group Ga correspond to audience positions Q in the first-rank seats and that the terminals 3 in group Gb correspond to audience positions Q in the second-rank seats. The number of terminals 3 in group Ga may be smaller than the number of terminals 3 in group Gb. In this case, the processing device 51a corresponding to group Ga bears a lighter processing load than the processing device 51b corresponding to group Gb, which reduces, for example, the likelihood of a processing delay.

Among the processing devices 51, the performance of each processing device 51 may be the same or may differ. Suppose again that the terminals 3 in group Ga correspond to audience positions Q in the first-rank seats and that the terminals 3 in group Gb correspond to audience positions Q in the second-rank seats. The processing device 51a corresponding to group Ga may have higher hardware performance (e.g., CPU processing speed, storage read/write speed, communication speed) than the processing device 51b corresponding to group Gb. In this case, the processing device 51a is, for example, less likely to suffer a processing delay than the processing device 51b.

The process of dividing the terminals 3 into groups (referred to as the grouping process where appropriate) is executed by, for example, the reception terminal 8 in FIG. 3. The grouping process may be executed by a device other than the reception terminal 8. For example, it may be executed by the processing device 2 or by another device. The grouping process may also be executed by a device external to the voice processing system 1.

[Fourth Embodiment]
A fourth embodiment will be described. FIG. 10 is a diagram showing a voice processing system according to the fourth embodiment. In this embodiment, components that are the same as in the embodiments described above are given the same reference numerals, and their description is omitted or simplified. In this embodiment, the predetermined area (e.g., the event venue AE) is divided into a plurality of subregions, each corresponding to one group, and each group includes the set of first voice processing devices corresponding to the audience positions in that group's subregion. Each subregion corresponds to one of the groups. For example, the terminals 3 (see FIG. 8) are divided into groups G. The groups G are determined based on the direction of the performer position relative to the audience position Q, as described for the second example EX2 in FIG. 9.

The voice processing system 1 in FIG. 10 includes a plurality of processing devices 2. In the following description, an arbitrary processing device is denoted by reference numeral 2 where appropriate; when individual processing devices 2 are distinguished, a letter a, b, ... is appended to the numeral, as in processing device 2a and processing device 2b. Each processing device 2 corresponds to one group G. For example, the processing device 2a corresponds to group Ga, and the processing device 2b corresponds to group Gb.

Each processing device 2 provides content data to the terminals 3 in its corresponding group G. For example, the processing device 2a provides content data to each terminal 3 in group Ga. Each processing device 2 acquires adjusted audience sound data from the terminals 3 in its corresponding group G. For example, the processing device 2a receives the adjusted audience sound data transmitted by each terminal 3 in group Ga. Each processing device 2 synthesizes the adjusted audience sounds acquired from the terminals 3 in its corresponding group G and outputs data of the synthesized audience sound. For example, the processing device 2a synthesizes the adjusted audience sounds acquired from the terminals 3 in group Ga and outputs data of the synthesized audience sound.
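The per-group synthesis can be sketched as below. The specification does not state how the sounds are combined, so this sketch assumes that synthesis means sample-wise summation of equally long sample sequences with clipping to the range -1.0 to 1.0. The names `synthesize` and `synthesize_per_group` are hypothetical.

```python
def synthesize(adjusted_sounds):
    """Mix equally long sample sequences by summing sample-wise and clipping to [-1.0, 1.0]."""
    if not adjusted_sounds:
        return []
    length = len(adjusted_sounds[0])
    if any(len(s) != length for s in adjusted_sounds):
        raise ValueError("all sounds must have the same number of samples")
    mixed = [sum(samples) for samples in zip(*adjusted_sounds)]
    return [max(-1.0, min(1.0, v)) for v in mixed]

def synthesize_per_group(group_sounds):
    """Produce one synthesized audience sound per group, as each processing device 2 would."""
    return {group: synthesize(sounds) for group, sounds in group_sounds.items()}

# Hypothetical adjusted audience sounds: two terminals in Ga, one terminal in Gb.
mixed_by_group = synthesize_per_group({
    "Ga": [[0.5, -0.25], [0.25, -0.25]],
    "Gb": [[0.125, 0.0]],
})
```

In this sketch each dictionary entry stands for the output of one processing device 2 (for example, the entry for `Ga` for processing device 2a).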

In this embodiment, the voice processing system 1 includes a plurality of voice output devices (e.g., speakers 6). In the following description, an arbitrary speaker is denoted by reference numeral 6 where appropriate; when individual speakers 6 are distinguished, a letter a, b, ... is appended to the numeral, as in speaker 6a and speaker 6b. The speakers 6 are arranged in the performance area AP. Each speaker 6 corresponds to a processing device 2. Here, the speakers 6 are assumed to be provided in one-to-one correspondence with the processing devices 2. For example, the speaker 6a corresponds to the processing device 2a, and the speaker 6b corresponds to the processing device 2b.

Each speaker 6 corresponds to the group G handled by its corresponding processing device 2. For example, the speaker 6a corresponds to group Ga, and the speaker 6b corresponds to group Gb. Here, the region over which the audience positions Q corresponding to the terminals 3 in a group G are distributed is referred to as the region of that group.

The arrangement of the performer P and the speakers 6 in the performance area AP is set so as to correspond to the arrangement of the performer position P1 and the group regions in the event venue AE. Consider, in the performance area AP, the direction of rotation about a vertical axis through the performer P. This direction of rotation corresponds to the direction of rotation about a vertical axis through the performer position P1 in the event venue AE. In the performance area AP, the speakers 6 are arranged clockwise in the order speaker 6a, speaker 6b, speaker 6c, speaker 6d, speaker 6e. In the event venue AE, the regions of the groups G are arranged clockwise in the order region of group Ga, region of group Gb, region of group Gc, region of group Gd, region of group Ge. In this way, the speakers 6 are arranged so that their azimuthal relationship to the performer P preserves the mutual relationship of the regions of the groups G relative to the performer position P1.
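One way to obtain a speaker layout that preserves this clockwise order is to place each speaker at the central azimuth of its group's region, at a fixed radius around the performer P. The sketch below assumes 2D coordinates, azimuths measured clockwise from the +y axis, and azimuth ranges that do not wrap past 360 degrees. The function name `speaker_positions`, the radius, and the 72-degree ranges are hypothetical.

```python
import math

def speaker_positions(performer, group_ranges, radius):
    """Place one speaker per group at the central azimuth of that group's region."""
    positions = {}
    for name, (start, end) in group_ranges.items():
        center = math.radians((start + end) / 2.0)
        # Clockwise azimuth from the +y axis: x uses sin, y uses cos.
        x = performer[0] + radius * math.sin(center)
        y = performer[1] + radius * math.cos(center)
        positions[name] = (x, y)
    return positions

# Hypothetical regions for groups Ga to Ge, each spanning 72 degrees.
regions = {
    name: (i * 72.0, (i + 1) * 72.0)
    for i, name in enumerate(["Ga", "Gb", "Gc", "Gd", "Ge"])
}
speakers = speaker_positions((0.0, 0.0), regions, radius=3.0)
```

Because each speaker's azimuth equals the central azimuth of its group's region, the speakers keep the same clockwise order as the regions. The same calculation could be reused for other devices that are arranged around the performer P.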

In this embodiment, the voice processing system 1 includes a plurality of cameras 4. In the following description, an arbitrary camera is denoted by reference numeral 4 where appropriate; when individual cameras 4 are distinguished, a letter a, b, ... is appended to the numeral, as in camera 4a and camera 4b. The cameras 4 are arranged in the performance area AP. Each camera 4 corresponds to a processing device 2. Here, the cameras 4 are assumed to be provided in one-to-one correspondence with the processing devices 2. For example, the camera 4a corresponds to the processing device 2a, and the camera 4b corresponds to the processing device 2b. Each camera 4 corresponds to the group G handled by its corresponding processing device 2. For example, the camera 4a corresponds to group Ga, and the camera 4b corresponds to group Gb.

The arrangement of the performer P and the cameras 4 in the performance area AP is set so as to correspond to the arrangement of the performer position P1 and the regions of the groups G in the event venue AE. Consider, in the performance area AP, the direction of rotation about a vertical axis through the performer P. This direction of rotation corresponds to the direction of rotation about a vertical axis through the performer position P1 in the event venue AE. In the performance area AP, the cameras 4 are arranged clockwise in the order camera 4a, camera 4b, camera 4c, camera 4d, camera 4e. In the event venue AE, the regions of the groups G are arranged clockwise in the order region of group Ga, region of group Gb, region of group Gc, region of group Gd, region of group Ge. In this way, the cameras 4 are arranged so that their azimuthal relationship to the performer P preserves the mutual positional relationship of the regions of the groups G relative to the performer position P1. The orientation of each camera 4 relative to the performer P is set so that, supposing the performer P were at the performer position P1, the camera captures video corresponding to the view of the performer position P1 from the region of the corresponding group G.

The processing device 2 generates content using the sound detected by the microphone 5 and the video captured by the camera 4 corresponding to that processing device 2. For example, the processing device 2a generates content using the sound detected by the microphone 5 and the video captured by the camera 4a. The processing device 2a provides the generated content to the terminals 3 in the corresponding group Ga.

In this embodiment, the performer P can hear the synthesized audience sound of a group G from a direction similar to the azimuth of that group G relative to the performer position P1. For example, in the event venue AE, group Ga occupies the region to the left of the front of the performer position P1. Suppose that, in the content, the performer P faces the region of group Ga and asks the users for a reaction such as applause. A user U viewing the content on a terminal 3 in group Ga sees the performer P turn toward them and call for a reaction. When this user U reacts, for example by applauding or cheering, the audience sound corresponding to the reaction is adjusted by the terminal 3, synthesized by the processing device 2a, and output from the speaker 6a. The performer P can then hear the audience sound corresponding to the reaction from, for example, the direction in which the performer P called on the user U to react. The user U can experience a sense of presence even when, for example, participating in the event from a place different from a real-space event venue, and the voice processing system 1 according to the embodiment can provide a new experience.

In FIG. 10, the speakers 6 are provided in one-to-one correspondence with the processing devices 2, but the number of speakers 6 corresponding to one processing device 2 may be one or more than one. The voice processing system 1 need not include at least one of the speakers 6. For example, the speakers 6 may be devices belonging to a system other than the voice processing system 1 (e.g., a sound system), and the voice processing system 1 may be a system that outputs data of the synthesized audience sound to speakers 6 arranged so as to correspond to the arrangement of the regions of the groups G.

In FIG. 10, the cameras 4 are provided in one-to-one correspondence with the processing devices 2, but the number of cameras 4 corresponding to one processing device 2 may be one or more than one. The voice processing system 1 need not include at least one of the cameras 4. For example, the cameras 4 may be devices belonging to a system other than the voice processing system 1 (e.g., an imaging system), and the voice processing system 1 may be a system that acquires video of the performer P from cameras 4 arranged so as to correspond to the arrangement of the regions of the groups G. There may be only one camera 4; for example, the voice processing system 1 may be a system that distributes common content to the groups G.

The voice processing system 1 according to the embodiment may generate content using computer graphics (referred to as CG where appropriate) of the performer P created in advance. For example, the performer P may be represented by three-dimensional CG such as polygons, and the voice processing system 1 may generate content by animating the three-dimensional CG using the movement of the performer P detected by motion capture or the like. The voice processing system 1 may generate content by combining the video acquired by the camera 4 with CG.

The voice processing system 1 according to the embodiment need not include a position determination unit that determines, based on a user's application to participate in the event, the audience position representing the user's position in the predetermined area where the event is held. For example, the audience position may be determined independently of the participation application, or may be determined by the organizer of the event. Even in such a form, the voice processing system 1 can convey the reactions of the user U to the performer P and can provide the user U with a new experience, for example because the reactions of the user U influence the performance of the performer P.

The technical scope of the present invention is not limited to the aspects described in the above embodiments and the like. One or more of the requirements described in the above embodiments and the like may be omitted. The requirements described in the above embodiments and the like may also be combined as appropriate. To the extent permitted by law, the disclosures of all documents cited in the above embodiments and the like are incorporated herein by reference.

1 voice processing system, 2 processing device, 3 terminal, 4 camera, 5 microphone, 6 speaker, 7 reservation terminal, 8 reception terminal, G group, P performer, Q audience position, R audience sound, U user, 24 position determination unit, 27 synthesis unit, 28 content generation unit, 34 sound adjustment unit

Claims

A voice processing system comprising:
a microphone that detects sound emitted by a user participating in an event;
a sound adjustment unit that adjusts the sound detected by the microphone based on a relationship between a performer position, which represents the position of a performer of the event in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area;
a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user, who is one such user, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit; and
an output unit that outputs data representing the sound synthesized by the synthesis unit,
wherein the performer performs outside the predetermined area, and
the data of the sound synthesized by the synthesis unit is provided to a voice output device arranged at a position from which sound reaches the performer.

The voice processing system according to claim 1, wherein the sound adjustment unit adjusts a first sound detected by the microphone so that it approaches a second sound that would reach the performer position if the first sound were emitted at the audience position.

A voice processing system comprising:
a microphone that detects sound emitted by a user participating in an event;
a sound adjustment unit that adjusts the sound detected by the microphone based on a relationship between a performer position, which represents the position of a performer of the event in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area;
a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user, who is one such user, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit; and
an output unit that outputs data representing the sound synthesized by the synthesis unit,
wherein the sound adjustment unit adjusts a first sound detected by the microphone so that it approaches a second sound that would reach the performer position if the first sound were emitted at the audience position.

The voice processing system according to any one of claims 1 to 3, wherein the sound adjustment unit adjusts the loudness of the sound detected by the microphone using the distance between the audience position and the performer position.

A voice processing system comprising:
a microphone that detects sound emitted by a user participating in an event;
a sound adjustment unit that adjusts the sound detected by the microphone based on a relationship between a performer position, which represents the position of a performer of the event in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area;
a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user, who is one such user, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit; and
an output unit that outputs data representing the sound synthesized by the synthesis unit,
wherein the sound adjustment unit adjusts the loudness of the sound detected by the microphone using the distance between the audience position and the performer position.

The voice processing system according to any one of claims 1 to 5, further comprising:
a second sound adjustment unit that adjusts the sound detected by the microphone corresponding to the second user based on a relationship between the audience position corresponding to the first user and the audience position corresponding to the second user; and
a second output unit that outputs data of sound including the sound adjusted by the second sound adjustment unit to a voice output device that conveys sound to the first user.

A voice processing system comprising:
a microphone that detects sound emitted by a user participating in an event;
a sound adjustment unit that adjusts the sound detected by the microphone based on a relationship between a performer position, which represents the position of a performer of the event in a predetermined area where the event is held, and an audience position, which represents the position of the user in the predetermined area;
a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user, who is one such user, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit;
an output unit that outputs data representing the sound synthesized by the synthesis unit;
a second sound adjustment unit that adjusts the sound detected by the microphone corresponding to the second user based on a relationship between the audience position corresponding to the first user and the audience position corresponding to the second user; and
a second output unit that outputs data of sound including the sound adjusted by the second sound adjustment unit to a voice output device that conveys sound to the first user.

The voice processing system according to any one of claims 1 to 7, further comprising a position determination unit that determines the audience position based on the user's application to participate in the event.

The voice processing system according to claim 8, further comprising an application acquisition unit that acquires the participation application provided by the user,
wherein the position determination unit determines the audience position based on the participation application acquired by the application acquisition unit.

The voice processing system according to any one of claims 1 to 9, further comprising a position determination unit that changes the audience position corresponding to the first user based on a request from the first user to change the audience position,
wherein the sound adjustment unit adjusts the sound detected by the microphone corresponding to the first user using the audience position changed by the position determination unit.

The voice processing system according to any one of claims 1 to 10, further comprising a position determination unit that determines, based on a history of the first user's participation in a past event, the audience position to be used when the first user participates in an event held after the past event.

The voice processing system according to any one of claims 1 to 11, comprising:
a first voice processing device including the sound adjustment unit and a first communication unit that transmits data of the sound adjusted by the sound adjustment unit; and
a second voice processing device including a second communication unit that receives the data of the sound transmitted by the first communication unit, and the synthesis unit, which synthesizes the sound represented by the data received by the second communication unit.

The voice processing system according to claim 12, wherein
the second voice processing device includes a position determination unit that determines the audience position,
the second communication unit transmits position information indicating the relationship between the performer position and the audience position determined by the position determination unit,
the first communication unit receives the position information transmitted by the second communication unit, and
the sound adjustment unit of the first voice processing device adjusts the sound detected by the microphone corresponding to the first voice processing device based on the position information received by the first communication unit.

The voice processing system according to claim 12 or 13, comprising a plurality of the first voice processing devices divided into a plurality of groups,
wherein the synthesis unit includes:
a first synthesis unit that is provided for each of the plurality of groups and synthesizes the sounds adjusted by the sound adjustment units of the first voice processing devices in that group; and
a second synthesis unit that synthesizes, across the plurality of groups, the sounds synthesized by the first synthesis units.

The voice processing system according to claim 14, wherein
the predetermined area is divided into a plurality of subregions, each corresponding to one of the groups, and
each group includes the set of first voice processing devices corresponding to the audience positions included in the subregion corresponding to that group.

The voice processing system according to any one of claims 1 to 15, further comprising:
an action reception unit that receives, from the user, information specifying an action directed at one or both of the performer and the user; and
a processing unit that executes processing associated in advance with the information received by the action reception unit.

The voice processing system according to any one of claims 1 to 16, further comprising a third output unit that outputs data of sound obtained by synthesizing the sound emitted by the performer with the sound adjusted by the sound adjustment unit.

The voice processing system according to any one of claims 1 to 17, wherein
the first user views the event outside the predetermined area, and
the microphone corresponding to the first user is arranged at a position where it detects sound emitted by the first user.

A receiver that receives position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area.
A sound adjustment unit that adjusts the sound emitted from the user and detected by the microphone based on the position information.
A transmission unit for transmitting data representing the sound adjusted by the sound adjustment unit is provided.
The sound adjustment unit adjusts the first sound detected by the microphone so that it approximates the second sound that would reach the performer position if the first sound were emitted at the audience position.
Voice processing device.
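
As a rough illustration of approximating the sound that would reach the performer position, the sketch below applies an inverse-distance gain and a propagation delay. These are assumptions of the sketch, not statements of the claimed method: free-field propagation, a speed of sound of 343 m/s, a reference distance of 1 m, and the hypothetical name `simulate_arrival`.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def simulate_arrival(samples: Sequence[float],
                     audience_pos: Tuple[float, float],
                     performer_pos: Tuple[float, float],
                     sample_rate_hz: int,
                     reference_distance_m: float = 1.0) -> List[float]:
    """Approximate the sound reaching the performer position from the audience position."""
    distance = math.dist(audience_pos, performer_pos)
    # Inverse-distance attenuation, never amplifying inside the reference distance.
    gain = reference_distance_m / max(distance, reference_distance_m)
    # Propagation delay expressed as a whole number of leading silent samples.
    delay_samples = round(distance / SPEED_OF_SOUND_M_S * sample_rate_hz)
    return [0.0] * delay_samples + [gain * s for s in samples]


# A user 3.43 m from the performer, sampled at 1 kHz for readability.
arrived = simulate_arrival([1.0, -1.0], (0.0, 0.0), (3.43, 0.0), 1000)
```

A fuller model could also account for reverberation or frequency-dependent absorption in the venue; those effects are omitted here.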

A receiver that receives position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area.
A sound adjustment unit that adjusts the sound emitted from the user and detected by the microphone based on the position information.
A transmission unit for transmitting data representing the sound adjusted by the sound adjustment unit is provided.
The sound adjusting unit adjusts the loudness of the sound detected by the microphone by using the distance between the audience position and the performer position.
Voice processing device.
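
The distance-based loudness adjustment might look like the following sketch. The inverse-distance law, the 1 m reference distance, the minimum distance clamp, and the function names are assumptions; the claim only states that the distance between the audience position and the performer position is used.

```python
import math
from typing import List, Sequence, Tuple


def distance_gain(distance_m: float,
                  reference_distance_m: float = 1.0,
                  min_distance_m: float = 0.1) -> float:
    """Linear gain under an assumed inverse-distance law (about -6 dB per doubling)."""
    return reference_distance_m / max(distance_m, min_distance_m)


def gain_db(gain: float) -> float:
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


def adjust_loudness(samples: Sequence[float],
                    audience_pos: Tuple[float, float],
                    performer_pos: Tuple[float, float]) -> List[float]:
    """Scale detected samples by the gain for the audience-to-performer distance."""
    gain = distance_gain(math.dist(audience_pos, performer_pos))
    return [gain * s for s in samples]


# A user seated 4 m from the performer is scaled to a quarter of the amplitude.
quieter = adjust_loudness([1.0, -0.5], (0.0, 0.0), (0.0, 4.0))
```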

A transmitter that transmits position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area.
A receiving unit that receives, from the transmission destination of the position information, data of sound emitted by the user and adjusted based on the position information.
A synthesis unit that synthesizes the sound represented by the data received from the destination corresponding to a first user, who is one of the users, and the sound represented by the data received from the destination corresponding to a second user, who is a user different from the first user, is provided.
The performer performs outside the predetermined area and
The sound data synthesized by the synthesis unit is provided to a voice output device arranged at a position where the sound reaches the performer.
Voice processing device.
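
On the receiving side, the synthesis of the already adjusted sounds can be sketched as a clipped sum. The padding of shorter streams with silence, the clipping limit of 1.0, and the name `synthesize` are assumptions of this sketch; the claim does not specify the synthesis method.

```python
from itertools import zip_longest
from typing import Dict, List


def synthesize(streams: Dict[str, List[float]], limit: float = 1.0) -> List[float]:
    """Mix per-user sample streams, padding short ones with silence and clipping."""
    mixed = []
    for frame in zip_longest(*streams.values(), fillvalue=0.0):
        total = sum(frame)
        # Clip so the result stays within the assumed output range [-limit, limit].
        mixed.append(max(-limit, min(limit, total)))
    return mixed


# Hypothetical adjusted sounds received from two users' devices.
received = {"user-1": [0.5, 0.5, 0.25], "user-2": [0.75, -0.25]}
to_performer = synthesize(received)
```

The resulting samples would then be passed to the voice output device placed where the performer can hear it.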

A transmitter that transmits position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area.
A receiving unit that receives, from the transmission destination of the position information, data of sound emitted by the user, detected by the microphone, and adjusted based on the position information.
A synthesis unit that synthesizes the sound represented by the data received from the destination corresponding to a first user, who is one of the users, and the sound represented by the data received from the destination corresponding to a second user, who is a user different from the first user.
A second sound adjustment unit that adjusts the sound detected by the microphone corresponding to the second user based on the relationship between the audience position corresponding to the first user and the audience position corresponding to the second user is provided.
The sound data including the sound adjusted by the second sound adjusting unit is output to the voice output device that transmits the sound to the first user.
Voice processing device.
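
The second sound adjustment, which depends on the relationship between two audience positions, can be illustrated by building a separate mix for each listener. The sketch assumes inverse-distance attenuation between seats and excludes the listener's own sound; both choices, and the name `mix_for_listener`, are assumptions rather than part of the claim.

```python
import math
from typing import Dict, List, Tuple


def mix_for_listener(listener_id: str,
                     positions: Dict[str, Tuple[float, float]],
                     sounds: Dict[str, List[float]],
                     reference_distance_m: float = 1.0) -> List[float]:
    """Mix other users' sounds, each attenuated by its distance to the listener."""
    listener_pos = positions[listener_id]
    others = {uid: s for uid, s in sounds.items() if uid != listener_id}
    mixed = [0.0] * max((len(s) for s in others.values()), default=0)
    for uid, samples in others.items():
        distance = math.dist(listener_pos, positions[uid])
        gain = reference_distance_m / max(distance, reference_distance_m)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed


# Hypothetical audience positions in meters and detected sounds.
positions = {"u1": (0.0, 0.0), "u2": (2.0, 0.0), "u3": (0.0, 4.0)}
sounds = {"u1": [1.0, 1.0], "u2": [1.0, 0.0], "u3": [0.0, 1.0]}
for_u1 = mix_for_listener("u1", positions, sounds)
```

Nearby audience members therefore sound louder to the first user than distant ones, which is the effect the position relationship is meant to convey.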

Detecting the sound emitted by the user participating in the event with a microphone,
Adjusting, by a sound adjustment unit, the sound detected by the microphone based on the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user in the predetermined area,
Synthesizing, by a synthesis unit, the sound detected by the microphone corresponding to a first user, who is one of the users, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit, and
Outputting data representing the sound synthesized by the synthesis unit are included.
The performer performs outside the predetermined area and
The sound data synthesized by the synthesis unit is provided to a voice output device arranged at a position where the sound reaches the performer.
Voice processing method.

Detecting the sound emitted by the user participating in the event with a microphone,
Adjusting, by a sound adjustment unit, the sound detected by the microphone based on the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user in the predetermined area,
Synthesizing, by a synthesis unit, the sound detected by the microphone corresponding to a first user, who is one of the users, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit, and
Outputting data representing the sound synthesized by the synthesis unit are included.
The sound adjustment unit adjusts the first sound detected by the microphone so that it approximates the second sound that would reach the performer position if the first sound were emitted at the audience position.
Voice processing method.

Detecting the sound emitted by the user participating in the event with a microphone,
Adjusting, by a sound adjustment unit, the sound detected by the microphone based on the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user in the predetermined area,
Synthesizing, by a synthesis unit, the sound detected by the microphone corresponding to a first user, who is one of the users, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit, and
Outputting data representing the sound synthesized by the synthesis unit are included.
The sound adjusting unit adjusts the loudness of the sound detected by the microphone by using the distance between the audience position and the performer position.
Voice processing method.

Detecting the sound emitted by the user participating in the event with a microphone,
Adjusting, by a sound adjustment unit, the sound detected by the microphone based on the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user in the predetermined area,
Synthesizing, by a synthesis unit, the sound detected by the microphone corresponding to a first user, who is one of the users, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit,
Outputting data representing the sound synthesized by the synthesis unit,
Adjusting, by a second sound adjustment unit, the sound detected by the microphone corresponding to the second user based on the relationship between the audience position corresponding to the first user and the audience position corresponding to the second user, and
Outputting sound data including the sound adjusted by the second sound adjustment unit to the voice output device that transmits the sound to the first user are included.
Voice processing method.

Causing a computer to execute:
Receiving position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area,
Adjusting the sound emitted from the user and detected by the microphone based on the position information, and
Sending data representing the adjusted sound.
The adjusting includes adjusting the first sound detected by the microphone so that it approximates the second sound that would reach the performer position if the first sound were emitted at the audience position.
Voice processing program.

Causing a computer to execute:
Receiving position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area,
Adjusting the sound emitted from the user and detected by the microphone based on the position information, and
Sending data representing the adjusted sound.
The adjusting includes adjusting the loudness of the sound detected by the microphone by using the distance between the audience position and the performer position.
Voice processing program.

Causing a computer to execute:
Transmitting position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area,
Receiving, from the transmission destination of the position information, data of sound emitted by the user and adjusted based on the position information, and
Synthesizing the sound represented by the data received from the destination corresponding to a first user, who is one of the users, and the sound represented by the data received from the destination corresponding to a second user, who is a user different from the first user.
The performer performs outside the predetermined area, and
The synthesized sound data is provided to a voice output device arranged at a position where the sound reaches the performer.
Voice processing program.

Causing a computer to execute:
Transmitting position information indicating the relationship between the performer position representing the position of the performer of the event in the predetermined area where the event is held and the audience position representing the position of the user participating in the event in the predetermined area,
Receiving, from the transmission destination of the position information, data of sound emitted by the user, detected by the microphone, and adjusted based on the position information,
Synthesizing the sound represented by the data received from the destination corresponding to a first user, who is one of the users, and the sound represented by the data received from the destination corresponding to a second user, who is a user different from the first user, and
Adjusting the sound detected by the microphone corresponding to the second user based on the relationship between the audience position corresponding to the first user and the audience position corresponding to the second user.
The sound data including the sound adjusted for the second user is output to the voice output device that transmits the sound to the first user.
Voice processing program.