
JP2009094701A - Information processing device and program

Info

Publication number: JP2009094701A
Application number: JP2007262116A
Authority: JP
Inventors: 剛 ▲高▼澤; Go Takazawa
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-10-05
Filing date: 2007-10-05
Publication date: 2009-04-30

Abstract

PROBLEM TO BE SOLVED: To enhance entertainment value by adding variation to the video when a session is held via a communication network.

SOLUTION: An information transmitter 110 transmits sound information representing performance sounds collected by microphones MICa to MICc, and transmits video information representing videos captured by cameras CAMa to CAMc in association with position information corresponding to that video information. An information processor 120 receives the sound information and the video information associated with the position information, processes them as appropriate, and supplies them to speakers SPa to SPc and screens SCRa to SCRc. The information processor 120 processes the video information in accordance with the position information so that the videos of the plurality of performers are presented without a sense of incongruity.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a technique for processing video information.

Techniques for holding a session via a communication network are known (see, for example, Patent Document 1). With such a technique, performers in remote locations can readily hold a session together. When a session is held via a communication network, reproducing video of the performers in addition to audio such as the performance sound heightens the sense of presence and makes the session more entertaining.
Patent Document 1: Japanese Patent No. 3846344

However, when the video of the performers is simply displayed, playback remains monotonous throughout, and the session lacks realism and interest compared with one in which the performers actually meet in person.
An object of the present invention is therefore to enhance entertainment value by adding variation to the video when a session is held via a communication network.

An information processing device according to the present invention comprises: acquisition means for acquiring, via a communication network, audio information, a plurality of pieces of video information, and a plurality of pieces of position information each associated with one of the pieces of video information; video processing means for processing the plurality of pieces of video information acquired by the acquisition means in a manner corresponding to the position information associated with each piece of video information; and output means for outputting the plurality of pieces of video information processed by the video processing means together with the audio information acquired by the acquisition means.

In the information processing device according to the present invention, the video processing means may be configured to perform processing that changes the position or size of the video displayed when the video information is output.
The video processing means may also be configured to perform processing that combines the plurality of pieces of video information.

In the information processing device according to the present invention, the position information may be information representing the distance between the subject and the imaging means that captured the video represented by the corresponding video information, and the video processing means may be configured to perform processing that changes the size of the video represented by the video information according to the distance represented by the position information.
The position information may also be information representing a shooting angle relative to a given shooting direction of the imaging means that captured the video represented by the corresponding video information, and the video processing means may be configured to perform processing that changes the display position of the video represented by the video information according to the shooting angle represented by the position information corresponding to that video information.
Alternatively, the position information may be information representing the position of the imaging means that captured the video represented by the corresponding video information, and the video processing means may be configured to perform processing that changes the display position of the video represented by the video information according to the positions represented by the plurality of pieces of position information.

In the information processing device according to the present invention, the audio information and the video information may respectively represent the voice and the video of a target person, the position information may represent the position of the target person as measured by positioning means, and the video processing means may be configured to perform processing that changes the display position of the video represented by the video information according to the positions represented by the plurality of pieces of position information.

The information processing device according to the present invention may further comprise designation means for designating a manner of processing for the video information, and the video processing means may be configured to perform both the processing in the manner corresponding to the position information and the processing in the manner designated by the designation means.

The information processing device according to the present invention may comprise position information acquisition means for acquiring position information associated with the audio information, and audio processing means for processing the audio information in a manner corresponding to the position information acquired by the position information acquisition means.
Alternatively, the information processing device according to the present invention may comprise audio processing means for processing the audio information according to the manner of processing applied to the plurality of pieces of video information by the video processing means.

The information processing device according to the present invention may further comprise synchronization means for synchronizing the audio information and the video information acquired by the acquisition means. In this configuration, the acquisition means acquires each piece of audio information and video information in association with time information representing its reproduction timing, and the synchronization means synchronizes the audio information and the video information, before or after the processing by the video processing means, based on the time information associated with each of them.

The embodiment of the present invention is not limited to the information processing device described above; it may also be a program for causing a computer to realize the functions of such an information processing device, or a recording medium storing such a program.

According to the present invention, when a session is held via a communication network, variation can be added to the video to enhance entertainment value.

[Embodiment]
FIG. 1 is a diagram schematically showing the overall configuration of a network session system according to an embodiment of the present invention. As shown in the figure, the network session system 10 has a configuration in which a first session point and a second session point are connected via a network 130. The network 130 is a communication network that enables communication between the first session point and the second session point, and is, for example, the Internet.

In this embodiment, three performers are assumed to be at the first session point. The second session point is where the audio and video recorded at the first session point are reproduced, and one performer is assumed to be there. Each of the three performers at the first session point plays the keyboard, the drums, or the guitar, and the performer at the second session point is a vocalist who sings along with the performance at the first session point.

A plurality of microphones MICa, MICb, and MICc, a plurality of cameras CAMa, CAMb, and CAMc, and an information transmission device 110 are provided at the first session point. The microphones MICa, MICb, and MICc correspond respectively to the keyboard player (hereinafter "performer a"), the drummer (hereinafter "performer b"), and the guitarist (hereinafter "performer c"), and each picks up the performance sound and singing of its performer. The cameras CAMa, CAMb, and CAMc are video cameras corresponding respectively to performers a, b, and c, and each captures video of its performer. The cameras CAMa, CAMb, and CAMc are configured so that they can measure the distance to the performer who is the subject. In a video camera with an autofocus mechanism, for example, the distance to the subject can be obtained from the distance measurement made when focusing. The information transmission device 110 acquires audio information representing the performance sounds picked up by the microphones MICa, MICb, and MICc and video information representing the video captured by the cameras CAMa, CAMb, and CAMc, applies appropriate data processing, and transmits the result to the second session point. Although the information transmission device 110 may perform its processing automatically, it is configured so that an operator other than the performers can operate it.

An information processing device 120, a plurality of screens SCRa, SCRb, and SCRc, a plurality of speakers SPa, SPb, and SPc, and a microphone MICd are provided at the second session point. The information processing device 120 acquires the audio information and video information transmitted from the information transmission device 110 and the audio information supplied from the microphone MICd, processes them by applying appropriate data processing, and outputs the result. Although the information processing device 120 may likewise perform its processing automatically, it is configured so that an operator other than the performers can operate it. The screens SCRa, SCRb, and SCRc each display the video information output from the information processing device 120. Here the screens SCRa, SCRb, and SCRc are assumed to be screens made up of display elements such as liquid crystal, but they may instead be cloths or curtains onto which video is projected by a separately provided projection device (such as a projector). In that case, the projection device is configured to acquire the video information. The speakers SPa, SPb, and SPc are each so-called multi-speakers and reproduce the audio information output from the information processing device 120 as sound. Each of the speakers SPa, SPb, and SPc is preferably a so-called array speaker. Relative to the other speakers, the speaker SPa is located on the "left", the speaker SPb in the "center", and the speaker SPc on the "right". The microphone MICd corresponds to the vocalist (hereinafter "performer d") and picks up the singing voice of performer d. In this embodiment, the position of the microphone MICd is fixed at a predetermined position.

With the above configuration, in the network session system 10 of this embodiment, the information transmission device 110 transmits audio information and video information, and the information processing device 120 processes them into a form suited to reproduction at the reproduction point and outputs them.
The information processing device 120 of this embodiment can process the information in either of two ways. In one, the video information is processed based on the position information corresponding to the acquired video information; in the other, both the video information and the audio information are processed based on the position information corresponding to the acquired video information. The former is described below as the first example and the latter as the second example. In the second example, description that overlaps with the first example is omitted as appropriate.

(1) First Example
FIG. 2 is a block diagram showing the configuration of the information transmission device 110 according to this example. As shown in the figure, the information transmission device 110 includes an input unit 111, a control unit 112, a storage unit 113, an operation unit 114, and a communication unit 115. The information transmission device 110 may be a general-purpose personal computer or a dedicated device having the configuration of FIG. 2.

The input unit 111 is an interface for inputting audio information and video information. The input unit 111 is connected to the microphones MICa, MICb, and MICc and to the cameras CAMa, CAMb, and CAMc, and acquires audio information or video information from each of them. In FIG. 2, the symbol A_a denotes the audio information corresponding to performer a, and the symbol V_a denotes the video information corresponding to performer a. Similarly, the symbols A_b, A_c, V_b, and V_c denote the audio information or video information of the performer indicated by the subscript.

Together with the video information V_a, V_b, and V_c, the input unit 111 also acquires the position information P_a, P_b, and P_c associated with each piece of video information. The position information may be, for example, information representing the relative positional relationship among the cameras CAMa to CAMc (hereinafter "relative position information") or information representing the distance from each of the cameras CAMa to CAMc to the corresponding performer (hereinafter "distance information"). In the configuration shown in FIG. 1, the relative position information indicates that the camera CAMa is on the "left", the camera CAMb is in the "center", and the camera CAMc is on the "right" relative to one another. The position information may be attached to the video information in association with it, or it may be contained within the video information.
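As an illustration of how position information might be carried alongside each video stream, the following is a minimal sketch. The patent does not specify any data layout, so the class names, field names, and the distance values are all hypothetical; only the "left", "center", "right" arrangement of cameras CAMa to CAMc comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PositionInfo:
    """Position information P_x for one video stream (hypothetical layout)."""
    relative_position: Optional[str] = None  # "left", "center", or "right"
    distance_m: Optional[float] = None       # camera-to-performer distance in metres


@dataclass(frozen=True)
class VideoStream:
    """A video stream V_x together with its associated position information."""
    performer: str          # "a", "b", or "c"
    position: PositionInfo  # P_a, P_b, or P_c


# Illustrative values only; the distances are made up for this sketch.
streams = [
    VideoStream("c", PositionInfo("right", 2.5)),
    VideoStream("a", PositionInfo("left", 2.0)),
    VideoStream("b", PositionInfo("center", 3.5)),
]

# Order the streams left to right using the relative position information.
ORDER = {"left": 0, "center": 1, "right": 2}
left_to_right = sorted(streams, key=lambda s: ORDER[s.position.relative_position])
```

Keeping both kinds of position information optional reflects the text's statement that either relative position information or distance information may be used.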

The control unit 112 includes an arithmetic unit such as a CPU (Central Processing Unit) and a memory, and controls the operation of each unit of the information transmission device 110 by executing a program stored in the storage unit 113. By executing the program, the control unit 112 performs data processing on the audio information and video information. This data processing includes encoding processing, which converts the audio information and video information into a predetermined format, and addition processing, which adds position information and time information to the audio information and adds time information to the video information.

The storage unit 113 includes a rewritable storage medium such as a hard disk and stores the program executed by the control unit 112. The operation unit 114 includes controls such as buttons and sliders (knobs) and accepts operations by the operator. When the operation unit 114 accepts an operation, it supplies data representing that operation to the control unit 112. The communication unit 115 is an interface for communicating via the network 130 and transmits the audio information and video information supplied from the control unit 112 to the information processing device 120.

FIG. 3 is a block diagram showing the configuration of the information processing device 120 according to this example. As shown in the figure, the information processing device 120 includes a communication unit 121, a control unit 122, a storage unit 123, an operation unit 124, an audio input unit 125, an audio output unit 126, and a video output unit 127. The information processing device 120 may be a general-purpose personal computer or a dedicated device having the configuration of FIG. 3.

The communication unit 121 is an interface similar to the communication unit 115; it receives the audio information and video information transmitted from the information transmission device 110 and supplies them to the control unit 122. The control unit 122 includes an arithmetic unit and a memory, and controls the operation of each unit of the information processing device 120 by executing a program stored in the storage unit 123. By executing the program, the control unit 122 performs data processing on the audio information and video information. This data processing includes decoding processing, which decodes audio information and video information encoded in a predetermined format; modification processing, which processes the video information based on the position information; and mixing processing, which mixes a plurality of pieces of audio information. The control unit 122 may instead be configured to perform the data processing with, for example, a dedicated DSP (Digital Signal Processor) for each of the audio information and the video information.

The storage unit 123 includes a rewritable storage medium and stores the program executed by the control unit 122. The operation unit 124 accepts operations by the operator through its controls and supplies data representing them to the control unit 122. The audio input unit 125 is connected to the microphone MICd and acquires audio information from it. The audio output unit 126 acquires the audio information on which the control unit 122 has performed the mixing processing and outputs it to the speakers SPa, SPb, and SPc. The video output unit 127 acquires the video information on which the control unit 122 has performed the modification processing and outputs it to the screens SCRa, SCRb, and SCRc.

The configurations of the information transmission device 110 and the information processing device 120 are as described above. The operation of each device is described next.

FIG. 4 is a flowchart showing the operation of the information transmission device 110 in this example. As shown in the figure, the control unit 112 of the information transmission device 110 first acquires, via the input unit 111, the audio information and video information corresponding to performers a, b, and c (step S11). Next, the control unit 112 determines whether to add position information to the video information (step S12). It makes this determination according to whether data has been received from the operation unit 114. In other words, whether position information is added to the video information at this point is left to the operator. When the necessary position information has not been added to the video information, or when the position information already added to the video information is to be changed, the operator can add position information as needed by operating the operation unit 114. Accordingly, when there is position information to be added to the video information, the control unit 112 adds it (step S13).

Next, the control unit 112 executes processing that adds time information to each piece of audio information and video information (step S14). Here, time information is information that allows a plurality of pieces of audio information and video information to be reproduced in synchronization. The time information is, for example, information indicating the reproduction timing of the audio information and video information. By reading out the plurality of pieces of audio information and video information at the timing indicated by this time information, the information processing device 120 can reproduce them without any temporal offset.

After adding the time information to the audio information and video information, the control unit 112 executes encoding processing that encodes the audio information and video information in a predetermined format (step S15), outputs the encoded audio information and video information to the communication unit 115, and transmits them to the information processing device 120 via the communication unit 115 (step S16).
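The transmitter-side flow of steps S12 to S15 can be sketched as follows. This is only an illustration under stated assumptions: the function name `prepare_packet` and the record fields are hypothetical, and JSON is used purely as a stand-in for the "predetermined format", which the patent does not specify.

```python
import json


def prepare_packet(kind, performer, payload, timestamp, position=None):
    """Build one encoded packet for transmission (hypothetical helper).

    kind      -- "audio" or "video"
    performer -- "a", "b", or "c"
    payload   -- media data; a plain string here so that JSON can carry it
    timestamp -- time information indicating the reproduction timing (S14)
    position  -- optional position information supplied by the operator (S12/S13)
    """
    record = {
        "kind": kind,
        "performer": performer,
        "timestamp": timestamp,
        "payload": payload,
    }
    # S12/S13: position information is attached only when there is some to add.
    if position is not None:
        record["position"] = position
    # S15: encode; JSON is an assumption standing in for the real codec.
    return json.dumps(record).encode("utf-8")


# Video packet for performer a with distance information attached.
video_packet = prepare_packet("video", "a", "frame-0", 0.0, {"distance_m": 2.0})
# Audio packet with time information only.
audio_packet = prepare_packet("audio", "a", "chunk-0", 0.0)
```

Step S16 would then hand the resulting bytes to whatever transport the communication unit 115 uses, which is outside the scope of this sketch.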

FIG. 5 is a flowchart showing the operation of the information processing device 120 in this example. When the audio information and video information are transmitted from the information transmission device 110 as described above, the information processing device 120 executes the processing shown in the figure. First, upon receiving the audio information and video information, the control unit 122 of the information processing device 120 acquires them via the communication unit 121 (step S21). While acquiring the audio information and video information of the first session point, the control unit 122 also acquires, via the audio input unit 125, the audio information from the microphone MICd, that is, the audio information of the performer at the second session point (performer d) (step S22).

Next, the control unit 122 executes processing that synchronizes the audio information and video information acquired via the communication unit 121 (step S23). The control unit 122 refers to the time information added to the audio information and video information and adjusts the reproduction timing of each piece of audio information and video information so that they are reproduced without any temporal offset.
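One simple way to realize the synchronization of step S23 is to group incoming data by its time information and release a group only once every stream has delivered data for that time. The sketch below assumes this buffering strategy; the patent only says that reproduction timing is adjusted using the time information, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def synchronize(packets, stream_ids):
    """Group packets by time information and keep only complete groups.

    packets    -- iterable of (stream_id, timestamp, payload) tuples, in any order
    stream_ids -- the streams that must all be present for a timestamp
    Returns a list of (timestamp, {stream_id: payload}) in time order.
    """
    by_time = defaultdict(dict)
    for stream_id, timestamp, payload in packets:
        by_time[timestamp][stream_id] = payload
    required = set(stream_ids)
    return [
        (t, by_time[t])
        for t in sorted(by_time)
        if required.issubset(by_time[t])  # every stream has arrived for time t
    ]


received = [
    ("V_a", 1, "video a @1"),
    ("A_a", 0, "audio a @0"),
    ("V_a", 0, "video a @0"),
    ("A_a", 2, "audio a @2"),  # V_a for time 2 has not arrived yet
    ("A_a", 1, "audio a @1"),
]
ready = synchronize(received, ["A_a", "V_a"])
```

In this example the groups for times 0 and 1 are released in order, while time 2 is held back because its video has not yet arrived.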

After synchronizing the video information, the control unit 122 executes modification processing on it (step S24). In doing so, the control unit 122 refers to the position information associated with each piece of video information and processes the video information accordingly. For example, based on the distance information contained in the position information, the control unit 122 enlarges or reduces each piece of video information so that the performers displayed on the screens SCRa to SCRc appear at approximately the same size.
The control unit 122 may additionally use information representing the distance between the microphone MICd and the screens SCRa to SCRc to change the magnification of the video information so that performers a to d appear to be gathered in one place. In this case, the information representing the distance between the microphone MICd and the screens SCRa to SCRc is preferably stored in advance in, for example, the memory of the control unit 122, but it may instead be entered by the operator.
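The description gives no formula for the enlargement or reduction. A minimal sketch, assuming a simple pinhole-camera model in which a performer's apparent size is inversely proportional to the camera-to-performer distance, is to scale each video by d / d_ref, where d is the measured distance and d_ref is a common reference distance. The function name and the example distances are hypothetical.

```python
def scale_factors(distances, reference=None):
    """Return a magnification per stream so that performers appear equal in size.

    distances -- {stream name: camera-to-performer distance}
    reference -- distance at which all performers should appear to be shot;
                 defaults to the smallest measured distance

    Assumes apparent size is proportional to 1 / distance (pinhole model),
    so a performer shot from distance d must be scaled by d / reference.
    """
    if any(d <= 0 for d in distances.values()):
        raise ValueError("distances must be positive")
    if reference is None:
        reference = min(distances.values())
    return {name: d / reference for name, d in distances.items()}


# Illustrative distances in metres; performer b is twice as far away as a.
factors = scale_factors({"a": 2.0, "b": 4.0, "c": 3.0})
```

With these values, the video of performer b is enlarged by a factor of 2 and that of performer c by 1.5, while performer a is left unchanged. Passing a different `reference` is one way the distance between the microphone MICd and the screens could be taken into account, though how that distance maps to a reference is an assumption of this sketch.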

After synchronizing the audio information, the control unit 122 executes mixing processing on it (step S25). In doing so, the control unit 122 determines the ratios at which the audio information acquired from each of the microphones MICa to MICd is distributed so that it is reproduced with an appropriate balance across the speakers SPa to SPc, and then performs the mixing. In this example, the audio information is reproduced at the second session point in a manner that matches the relative positions of the performers at the first session point: the audio information from the microphone MICa is output mainly from the speaker SPa, the audio information from the microphone MICb mainly from the speaker SPb, and so on. The ratio in which the audio information from the microphone MICd is distributed among the speakers SPa to SPc is not particularly limited.
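The distribution ratios of step S25 can be pictured as a gain table with one row per microphone and one column per speaker. The sketch below is illustrative only: the gain values are made up, the function name is hypothetical, and the patent states only that each of MICa to MICc is output mainly from the speaker on the matching side.

```python
# Gains per source for (SPa, SPb, SPc); the values are illustrative assumptions.
GAINS = {
    "MICa": (0.75, 0.25, 0.0),     # mainly the left speaker SPa
    "MICb": (0.125, 0.75, 0.125),  # mainly the center speaker SPb
    "MICc": (0.0, 0.25, 0.75),     # mainly the right speaker SPc
    "MICd": (0.5, 0.5, 0.5),       # vocalist: ratio not particularly limited
}


def mix(samples, gains=GAINS):
    """Mix one sample per source into one sample per speaker.

    samples -- {source name: sample value}
    Returns (SPa, SPb, SPc) as a tuple of floats.
    """
    out = [0.0, 0.0, 0.0]
    for source, value in samples.items():
        for i, gain in enumerate(gains[source]):
            out[i] += gain * value
    return tuple(out)


speaker_out = mix({"MICa": 1.0, "MICb": 1.0})
```

Applying `mix` to every sample of the synchronized audio yields the three speaker signals; a real implementation would operate on buffers rather than single samples.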

The control unit 122 then outputs the mixed audio information and the processed video information, supplying them to the speakers SPa to SPc and the screens SCRa to SCRc via the audio output unit 126 and the video output unit 127 (step S26). As a result, at the second session point, the performance sounds and singing of performers a to d are mixed and reproduced, and the processed video of performers a to c is displayed behind performer d.

By operating as described above, the network session system 10 of the present embodiment can synchronize the reproduction timing of multiple pieces of audio information and video information, and can change, as appropriate, the size of the video displayed from the processed video information. Therefore, the network session system 10 of the present embodiment can reproduce multiple videos shot at remote locations without a sense of incongruity and enable a realistic session with a sense of presence, as if all the performers were actually gathered together.

In the present embodiment there are two session points, but the session points may be divided among three or more locations. In such a case, it is relatively difficult to adjust the installation positions of the cameras so that the displayed videos are equal in size, so the effect of this embodiment can be said to be all the more pronounced.

(2) Second Embodiment
FIG. 6 is a block diagram showing the configuration of the information transmitting apparatus 110 according to this embodiment. In this embodiment, the information transmitting apparatus 110 differs from the first embodiment in that it associates position information with the audio information. The information processing apparatus 120 differs from the first embodiment in that it processes the audio information based on the position information associated with the audio information. The description of this embodiment therefore focuses on these points.

The control unit 112 adds position information to the audio information A_a, A_b, and A_c. The position information added to each of A_a, A_b, and A_c is the position information added to the video information of the same performer. That is, the audio information A_a is given the same position information P_a as the video information V_a, and likewise the audio information A_b is given the position information P_b and the audio information A_c the position information P_c. In other words, the control unit 112 identifies the position information associated with the video information and adds the identified position information to the audio information so that the two are associated.
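The association performed by the control unit 112 can be sketched as a lookup by performer. The `Tagged` record and the `attach_positions` function below are hypothetical names introduced only for this illustration; the document does not specify any data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """Hypothetical container for one piece of audio or video information."""
    performer: str    # "a", "b" or "c"
    kind: str         # "audio" or "video"
    position: object  # position information (P_a etc.), or None if absent


def attach_positions(video, audio):
    """Give each audio item the position information of the same performer's video."""
    position_of = {v.performer: v.position for v in video}
    return [Tagged(a.performer, a.kind, position_of[a.performer]) for a in audio]
```

For example, an audio item for performer a with no position information comes back carrying the same position value as the video item V_a.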

FIG. 7 is a flowchart showing the operation of the information transmitting apparatus 110 in this embodiment. As shown in the figure, the operation of the information transmitting apparatus 110 in this embodiment is the same as the operation in the first embodiment (see FIG. 4), except that a step of adding position information to the audio information (step S1A) is added between steps S13 and S14. In step S1A, the control unit 112 adds position information to the audio information in the manner described above. If position information entered by a user operation was added to the video information in step S13, the control unit 112 adds that position information to the audio information as well. Thereafter, the control unit 112 outputs the audio information and the video information and transmits them to the information processing apparatus 120.

FIG. 8 is a flowchart showing the operation of the information processing apparatus 120 in this embodiment. As shown in the figure, the operation of the information processing apparatus 120 in this embodiment is the same as the operation in the first embodiment (see FIG. 5), except that a step of processing the audio information (step S2A) is added between steps S24 and S25. In step S2A, the control unit 122 processes the audio information according to the manner of processing applied to the video information in step S24. For example, when the control unit 122 has enlarged the video represented by a piece of video information, it increases the volume of the performance sound represented by the audio information to which the same position information as that video information is added; when it has reduced the video, it lowers the volume of that performance sound. Thereafter, the control unit 122 mixes the synchronized and processed audio information and outputs it together with the processed video information.
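As one way to picture step S2A, the sketch below scales the audio by the same factor that was applied to the matching video. The document only states that the volume is raised when the video is enlarged and lowered when it is reduced; the proportional rule and the function names here are assumptions.

```python
def audio_gain(video_scale):
    """Gain for the audio that shares position information with a video.

    Assumption: the gain equals the video scale factor, so a video enlarged
    2x doubles the amplitude and a video reduced to 0.5x halves it.
    """
    if video_scale <= 0:
        raise ValueError("video_scale must be positive")
    return video_scale


def process_audio(samples, video_scale):
    """Apply the gain derived from the video scale to every sample."""
    gain = audio_gain(video_scale)
    return [gain * s for s in samples]
```

A practical system might prefer a gentler mapping (for example, a gain expressed in decibels), which this sketch leaves open.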

In this embodiment, the position information added to the audio information is identical to that added to the video information, so the audio information is processed in a manner similar to the processing performed on the video information. However, the audio information may instead be processed based on the position information added to it rather than on the manner of processing performed on the video information.

By operating as described above, the network session system 10 can process the performance sound according to the manner in which the reproduced video is processed. Therefore, by operating according to the second embodiment, the network session system 10 can provide a more realistic session than in the first embodiment.

[Modifications]
The present invention is not limited to the embodiment described above and may be implemented in other forms. For example, the following modifications can be applied to the present invention. The modifications described below may also be combined with one another as appropriate.

(1) Modification 1
The information processing apparatus according to the present invention may process audio information and video information based on an operator's designation. For example, the operator may designate the manner of processing via the operation unit 124 described above, and the control unit 122 may perform processing in the designated manner. In doing so, the operator decides the manner of processing the audio information and video information according to the content of the performance. For example, when the guitar performer c plays a certain part of a piece as a solo, the operator may designate that the video corresponding to performer c be displayed on the center screen SCRb instead of the right screen SCRc, or that the audio corresponding to performer c be louder than the other audio. In this case, the operator may also designate that the video corresponding to performer c be enlarged. That is, for video information the operator can designate enlargement or reduction of the video, switching of the screen on which it is displayed, and the like; for audio information the operator can designate the volume of the performance sound, switching of the speaker from which it is reproduced, and the like. For example, when the operator designates that the screens be swapped, the control unit 122 may determine the output destination of the video information from the operator's designation alone, without relying on the relative position information.

When processing audio information and video information, the information processing apparatus according to the present invention may perform both processing according to the position information and processing according to the operator's designation, or it may perform processing according to the operator's designation in place of processing according to the position information.
In the embodiment described above, the audio information and video information are processed after the synchronization processing (step S23), but the information processing apparatus according to the present invention may process them before the synchronization processing.
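The choice of output destination in Modification 1 can be sketched as follows. The string encoding of the relative position, the assignment of SCRa to the left position, and the `choose_screen` function are assumptions for illustration; the document states only that SCRb is the center screen and SCRc the right screen.

```python
from typing import Optional

# Assumed default routing from relative position information to a screen.
DEFAULT_SCREEN = {"left": "SCRa", "center": "SCRb", "right": "SCRc"}


def choose_screen(relative_position: str, operator_choice: Optional[str] = None) -> str:
    """Return the screen on which a video is displayed.

    When the operator has designated a screen, that designation alone decides
    the destination; otherwise the relative position information is used.
    """
    if operator_choice is not None:
        return operator_choice
    return DEFAULT_SCREEN[relative_position]
```

During a guitar solo, for instance, `choose_screen("right", "SCRb")` would move performer c from the right screen to the center screen.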

(2) Modification 2
In the present invention, the position information is not limited to the relative position information and distance information described above. For example, if the shooting directions of the cameras CAMa to CAMc can be rotated vertically or horizontally and the rotation angle can be measured, the position information may be information representing that rotation angle. Here, information representing the rotation angle means information representing the deviation (angle) between a given reference shooting direction and the direction in which the performer is being shot.

When such information is used as the position information, the information processing apparatus according to the present invention can control the position of the video and the localization of the performance sound based on the position information, and can process the video information and audio information so as to change them. For example, in the embodiment described above, if the guitar performer c moves toward the drum performer b (that is, from right to left) and the camera CAMc rotates to follow performer c while shooting, the position information becomes information in which the rotation angle changes over time. On acquiring such position information, the control unit 122 processes the video information so that the video of performer c displayed on the screen SCRc moves from right to left. At this time, the control unit 122 may also process the audio information so that the performance sound of performer c is perceived as moving along with the video.
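One possible way to turn a rotation angle into a display position and a sound localization is sketched below. The linear angle-to-pixel mapping, the sign convention, and the use of constant-power panning (a common audio technique, not one named in this document) are all assumptions, as are the function and parameter names.

```python
import math


def _clamp(angle_deg, max_angle_deg):
    """Limit the angle to the range the camera is assumed to cover."""
    return max(-max_angle_deg, min(max_angle_deg, angle_deg))


def horizontal_offset(angle_deg, max_angle_deg, half_width_px):
    """Map a rotation angle to a horizontal pixel offset from the screen center.

    Assumption: negative angles mean the camera has turned to the left.
    """
    return round(_clamp(angle_deg, max_angle_deg) / max_angle_deg * half_width_px)


def pan_gains(angle_deg, max_angle_deg):
    """Constant-power (left, right) gains so the sound follows the picture."""
    theta = (_clamp(angle_deg, max_angle_deg) / max_angle_deg + 1.0) * math.pi / 4
    return math.cos(theta), math.sin(theta)
```

As the angle reported for camera CAMc drifts from positive to negative, the offset moves the video of performer c from right to left and the left gain rises while the right gain falls.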

The position information may also be information representing the position of a performer measured by positioning means. In this case, the positioning means can consist of, for example, transmitting means, such as a so-called IC tag, that transmits information (radio waves or the like) and receiving means that receives this information. In this configuration, the performer carries the transmitting means so that it moves with the performer, and the receiving means can identify the performer's position by receiving the information sent from the transmitting means. In this case, the information transmitting apparatus according to the present invention acquires the position identified in this way as position information and transmits it in association with the video information.
One technology for implementing such positioning means is, for example, UWB (Ultra Wide Band).

(3) Modification 3
The information processing apparatus according to the present invention may perform processing that combines multiple pieces of video information. For example, it may process the video information so that the adjoining edges of the videos displayed on the screens are merged and the videos become a single video. This makes it possible to reproduce the video on a single screen.

When such processing is performed, it is desirable at the first session point to shoot the performers against a so-called blue back (blue screen), because this makes it easy to extract the performer's image from the video information. In this case, video information constituting the performers' background may be acquired separately from the video information of the performers, and the two may be combined. The video information corresponding to the background may be stored in the information processing apparatus or acquired from an external apparatus via a communication network.
This modification is all the more suitable when applied in combination with Modification 2 described above.
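The extraction and combination described for the blue-screen case can be illustrated with a very small chroma-key sketch. The pixel representation (rows of RGB tuples), the threshold value, and the function names are assumptions; the document does not specify how the extraction is carried out.

```python
def is_key_colour(pixel, threshold=60):
    """Treat a pixel as blue-screen background when blue clearly dominates.

    The threshold of 60 is an arbitrary value chosen for this sketch.
    """
    r, g, b = pixel
    return b - max(r, g) > threshold


def composite(foreground, background):
    """Replace blue-screen pixels of the performer frame with background pixels.

    Both frames are lists of rows of (r, g, b) tuples with equal dimensions.
    """
    return [
        [bg if is_key_colour(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]
```

Running `composite` on each performer's frame against a shared background frame yields images that can then be placed side by side as a single video.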

(4) Modification 4
The second session point may be provided with a lighting device whose direction of illumination changes according to the video. This lighting device is desirably a local light source such as a so-called spotlight. This makes it possible to produce a staging effect as if the performers in the video were actually present.

To control the direction of illumination of the lighting device, the position information associated with the video information may be used. For example, when the position information represents a rotation angle, the lighting device can change its direction of illumination in step with the video by varying the direction according to that rotation angle.
The lighting device may also be configured so that an operator controls turning it on and off.

(5) Modification 5
The second session point may be provided with a display device on which performer d can check the video. Like the display unit of a so-called karaoke machine, this display device may display the lyrics of the song that performer d sings. The display device may also display the same video as that displayed on the screens.

(6) Modification 6
In the present invention, the number of pieces of audio information and video information acquired is not limited to that in the embodiment described above. In the embodiment described above, audio information and video information corresponding to three performers are transmitted from the first session point, but there may be more performers, or only two. In the first embodiment, the performance sounds of the three performers may also be picked up by a single microphone instead of three.

The number of performers at the second session point can also be changed. For example, there may be several performers at the second session point, with their performance sounds picked up by several microphones. Alternatively, there may be no performer at the second session point, which then only reproduces the performance sound and video from the first session point.
The number of output destinations (screens and speakers) at the second session point can also be changed.

Furthermore, there may be three or more session points. Even in such a case, the information processing apparatus according to the present invention can synchronize the multiple pieces of audio information and video information by referring to the time information.
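Synchronization by time information, whatever the number of session points, can be pictured as grouping the items that carry the same time value. This is a minimal sketch under the assumption that each stream is a list of (time information, payload) pairs; the `synchronize` function and the stream names in the usage note are hypothetical.

```python
def synchronize(streams):
    """Group payloads whose time information matches across every stream.

    streams maps a stream name to a list of (time_info, payload) pairs.
    Returns a list of (time_info, {stream name: payload}) in time order,
    keeping only the time values that are present in all streams.
    """
    if not streams:
        return []
    lookup = {name: dict(frames) for name, frames in streams.items()}
    common = set.intersection(*(set(times) for times in lookup.values()))
    return [
        (t, {name: lookup[name][t] for name in streams})
        for t in sorted(common)
    ]
```

For example, given an audio stream with time values 0 and 1 and a video stream with time values 1 and 2, only the items for time 1 are output together. A real-time system would also need buffering and a policy for late or missing items, which this sketch omits.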

(7) Modification 7
A session in the present invention is not limited to one intended for singing or playing music, and may include various activities that multiple participants carry out as a group. For example, the present invention may be applied to a conference held via a communication network, or to classes at a school. In other words, the persons who are the subjects of sound pickup and shooting in the present invention are not limited to performers.

(8) Modification 8
The present invention can also be provided as a program for causing a computer to implement the functions of the control unit 122 described above. Such a program can be provided on a recording medium, such as an optical disc, on which it is stored, or in a form in which it is downloaded to a computer from a predetermined server apparatus via a communication network such as the Internet and then installed for use.

FIG. 1 is a diagram showing the configuration of the network session system of the present invention.
FIG. 2 is a block diagram showing the configuration of the information transmitting apparatus.
FIG. 3 is a block diagram showing the configuration of the information processing apparatus.
FIG. 4 is a flowchart showing the operation of the information transmitting apparatus.
FIG. 5 is a flowchart showing the operation of the information processing apparatus.
FIG. 6 is a block diagram showing the configuration of the information transmitting apparatus.
FIG. 7 is a flowchart showing the operation of the information transmitting apparatus.
FIG. 8 is a flowchart showing the operation of the information processing apparatus.

Explanation of symbols

10 ... network session system, 110 ... information transmitting apparatus, 111 ... input unit, 112 ... control unit, 113 ... storage unit, 114 ... operation unit, 115 ... communication unit, 120 ... information processing apparatus, 121 ... communication unit, 122 ... control unit, 123 ... storage unit, 124 ... operation unit, 125 ... audio input unit, 126 ... audio output unit, 127 ... video output unit, 130 ... network

Claims

An information processing apparatus comprising:
acquisition means for acquiring, via a communication network, audio information, a plurality of pieces of video information, and a plurality of pieces of position information each associated with one of the plurality of pieces of video information;
video processing means for processing the plurality of pieces of video information acquired by the acquisition means in a manner according to the position information associated with each piece of video information; and
output means for outputting the plurality of pieces of video information processed by the video processing means and the audio information acquired by the acquisition means.

The information processing apparatus according to claim 1, wherein the video processing means performs processing that changes the position or size of the video displayed when the video information is output.

The information processing apparatus according to claim 1, wherein the video processing means performs processing that combines the plurality of pieces of video information.

The information processing apparatus according to claim 1, wherein
the position information is information representing the distance between a subject and the photographing means that shot the video represented by the corresponding video information, and
the video processing means performs processing that changes the size of the video represented by the video information according to the distance represented by the position information.

The information processing apparatus according to claim 1, wherein
the position information is information representing a shooting angle relative to a reference shooting direction of the photographing means that shot the video represented by the corresponding video information, and
the video processing means performs processing that changes the display position of the video represented by the video information according to the shooting angle represented by the position information corresponding to that video information.

The information processing apparatus according to claim 1, wherein
the position information is information representing the position of the photographing means that shot the video represented by the corresponding video information, and
the video processing means performs processing that changes the display position of the video represented by the video information according to the positions represented by the plurality of pieces of position information.

The information processing apparatus according to claim 1, wherein
the audio information and the video information represent the audio and the video of a subject, respectively,
the position information represents the position of the subject measured by positioning means, and
the video processing means performs processing that changes the display position of the video represented by the video information according to the positions represented by the plurality of pieces of position information.

The information processing apparatus according to claim 1, further comprising specifying means for specifying a manner of processing for the video information, wherein
the video processing means performs processing in a manner according to the position information and processing in the manner specified by the specifying means.

The information processing apparatus according to claim 1, further comprising:
position information acquisition means for acquiring position information associated with the audio information; and
audio processing means for processing the audio information in a manner according to the position information acquired by the position information acquisition means.

The information processing apparatus according to claim 1, further comprising audio processing means for processing the audio information according to the manner of processing performed on the plurality of pieces of video information by the video processing means.

The information processing apparatus according to claim 1, further comprising synchronization means for synchronizing the audio information and the video information acquired by the acquisition means, wherein
the acquisition means acquires time information representing the respective reproduction timing in association with each of the audio information and the video information, and
the synchronization means synchronizes the audio information and the video information, before or after the processing by the video processing means, based on the time information associated with each of them.

A program for causing a computer to function as:
acquisition means for acquiring, via a communication network, audio information, a plurality of pieces of video information, and a plurality of pieces of position information each associated with one of the plurality of pieces of video information;
video processing means for processing the plurality of pieces of video information acquired by the acquisition means in a manner according to the position information associated with each piece of video information; and
output means for outputting the plurality of pieces of video information processed by the video processing means and the audio information acquired by the acquisition means.