JP5493551B2

JP5493551B2 - Information processing system, information processing apparatus, and information processing method

Info

Publication number: JP5493551B2
Application number: JP2009177718A
Authority: JP
Inventors: 晃一竹内
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2009-07-30
Filing date: 2009-07-30
Publication date: 2014-05-14
Anticipated expiration: 2029-07-30
Also published as: JP2011035524A

Description

The present invention relates to an information processing system, an information processing apparatus, and an information processing method, and more particularly to an information processing system, apparatus, and method capable of controlling audio.

In recent years, systems such as videophones and videoconferencing have come into use that connect remote sites over a network and allow communication with a distant party using video and audio.

Problems such as the following may arise, particularly when several people use such a system. For example, the distance and angle from the microphone that picks up the audio, as well as the loudness of each person's voice, differ from person to person, so a particular voice may be hard to hear. In addition, operating sounds such as fan noise from a projector or similar equipment, and sounds produced by human actions such as typing on a PC (personal computer), may be picked up louder than necessary. Audio control is therefore important in a system in which several people communicate.

Methods have therefore been proposed that acquire audio from different sound sources separately, so that each separately acquired audio stream can be controlled individually. For example, audio can be acquired per person by having each person wear a pin microphone. Patent Document 1 proposes a chair with a large number of embedded microphones, which allows audio to be acquired separately for each person without anyone wearing a pin microphone. Patent Document 2 proposes an audio conferencing system that acquires audio separately and emits sound according to the seating position.

Patent documents: JP 2007-336010 A; JP 2008-17126 A

However, even when audio can be acquired separately for each sound source, adjusting it requires identifying which microphone is picking up audio from which sound source, and this identification has been difficult.

The present invention has been made in view of the above problem, and its object is to provide a new and improved information processing apparatus and information processing method with which the audio to be controlled can be identified and controlled through an operation on the screen that displays the video.

To solve the above problem, according to one aspect of the present invention, there is provided an information processing system including: a plurality of microphones; an imaging device that captures video; an audio processing device that processes audio acquired by the plurality of microphones; and an information processing apparatus that outputs the video and the audio processed by the audio processing device, and outputs audio control information to the audio processing device. The information processing apparatus includes: a display unit that displays the video; a coordinate input unit that inputs a coordinate position on the display screen of the display unit; and a control unit. The control unit causes the display unit to display an audio adjustment interface superimposed on the video near the coordinate position input by the coordinate input unit; identifies, based on correspondence information stored in an internal or external storage unit, the microphone identifier of the microphone that acquires audio at the position in the video corresponding to the coordinate position; and outputs audio control information that includes the identified microphone identifier and audio processing information corresponding to input made from the coordinate input unit to the audio adjustment interface.

With this configuration, audio acquired by the plurality of microphones is processed by the audio processing device and output by the information processing apparatus. The information processing apparatus outputs the video on the display unit and outputs the audio processed by the audio processing device from an audio output unit. The information processing apparatus can also output audio control information to the audio processing device. When the user, while watching the video shown on the display unit, uses the coordinate input unit to indicate the target whose audio is to be controlled, the information processing apparatus displays the audio adjustment interface at that coordinate position and uses the correspondence information to identify the microphone to be controlled from the coordinate position. When the user operates the audio adjustment interface, the information processing apparatus creates audio control information from the operation and the microphone identifier of the identified microphone, and outputs it to the audio processing device. The user can thus control audio through an intuitive operation while watching the video, without having to work out which microphone an operation corresponds to.

A sound source identifier for identification may be assigned to each sound source in advance, and the correspondence information may be information generated by collecting the correspondence between the sound source identifiers and the microphone identifiers.

The plurality of microphones may be built into an audio acquisition device. The audio acquisition device has a receiver that receives the sound source identifier, and a recognition code in which an audio acquisition device identifier for identifying the device is embedded is displayed on the surface of the device. The correspondence information is generated by collecting the correspondence among the audio acquisition device identifier, the sound source identifier, a receiver identifier assigned to the receiver, and the microphone identifier. The control unit may read the audio acquisition device identifier from the recognition code in the video, and use the read identifier together with the correspondence information to identify the microphone identifier of the microphone that acquires audio at the position corresponding to the coordinate position.

The audio acquisition device may have the shape of a desk, with the receiver installed on a side surface of the device, and the receiver may receive the sound source identifier from a sound source identifier transmitter worn by a person.

The audio acquisition device may further include a transmitter that transmits its own transmitter identifier. When the device is combined with another audio acquisition device, the transmitter is positioned to face a receiver of the other device, and the receiver receives either the sound source identifier or the transmitter identifier. The system may further include a correspondence information creation device that acquires the correspondence information of the plurality of audio acquisition devices and recognizes their arrangement.

The audio processing device may include a microphone identifier assigning unit that assigns a microphone identifier to audio input from a microphone, and an audio mixer that processes the audio based on the microphone identifier assigned to the audio and on the audio control information.

The correspondence information may be information in which the correspondence between the coordinate position and the audio acquisition unit identifier is stored in advance.

To solve the above problem, according to another aspect of the present invention, there is provided an information processing apparatus that acquires and outputs video and audio acquired by a plurality of microphones. The information processing apparatus includes: a display unit that displays the video; a coordinate input unit that inputs a coordinate position on the display screen of the display unit; and a control unit. The control unit causes the display unit to display an audio adjustment interface superimposed on the video near the coordinate position input by the coordinate input unit; identifies, based on correspondence information stored in an internal or external storage unit, the microphone identifier of the microphone that acquires audio at the position in the video corresponding to the coordinate position; and outputs audio control information that includes the identified microphone identifier and audio processing information corresponding to input made from the coordinate input unit to the audio adjustment interface.

To solve the above problem, according to yet another aspect of the present invention, there is provided an information processing method performed by an information processing apparatus having a display unit that displays video, a coordinate input unit that inputs a coordinate position on the display screen of the display unit, and a control unit. In the method, the control unit causes the display unit to display an audio adjustment interface superimposed on the video near the coordinate position input by the coordinate input unit; identifies, based on correspondence information stored in an internal or external storage unit, the microphone identifier of the microphone that acquires audio at the position in the video corresponding to the coordinate position; and outputs audio control information that includes the identified microphone identifier and audio processing information corresponding to input made from the coordinate input unit to the audio adjustment interface.

As described above, according to the present invention, the audio to be controlled can be identified and controlled through an operation on the screen that displays the video.

An explanatory diagram showing an example of the audio adjustment screen of the information processing system according to an embodiment of the present invention.
An explanatory diagram showing an example of the audio adjustment screen of the information processing system according to an embodiment of the present invention.
A block diagram showing the functional configuration of the information processing system according to the first embodiment of the present invention.
An explanatory diagram describing the areas that the information processing system according to the first embodiment of the present invention uses to identify a microphone.
A table showing correspondence information consisting of the correspondence between microphone identifiers and areas.
A flowchart describing the audio adjustment operation in the information processing system according to an embodiment of the present invention.
A sub-flowchart showing the microphone identification flow for audio adjustment in the information processing system according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the information processing system according to the second embodiment of the present invention.
An explanatory diagram showing an example of correspondence information in the audio acquisition device.
An overview of the audio acquisition device seen from above.
An overview of the audio acquisition device seen from the side.
An explanatory diagram describing communication between the sound source identifier transmitter and the audio acquisition device.
An explanatory diagram describing an example of the external configuration of the badge holder.
An explanatory diagram showing an example of correspondence information acquired in the audio acquisition device.
An explanatory diagram showing an example of correspondence information acquired in the audio acquisition device.
An explanatory diagram showing an example of correspondence information acquired in the audio acquisition device.
An explanatory diagram showing an example of correspondence information acquired in the audio acquisition device.
An explanatory diagram showing an example arrangement of audio acquisition devices as recognized by the correspondence information creation device.
An explanatory diagram describing audio control in the information processing system according to the second embodiment of the present invention.
A sub-flowchart showing the microphone identification flow for audio adjustment in the information processing system according to the second embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and duplicate description is omitted.

(Overview)
First, an overview of an information processing system according to an embodiment of the present invention is given with reference to FIGS. 1 and 2. FIG. 1 and FIG. 2 are explanatory diagrams each showing an example of the audio adjustment screen of the information processing system according to an embodiment of the present invention.

FIG. 1 shows a scene in which audio is adjusted through an input operation on the display unit 115 in the information processing system according to the present embodiment. The display unit 115 is a display screen showing video of the other party in, for example, a video conference or videophone call. A touch panel is laminated on top of the display screen, so that it also functions as a coordinate input device. Suppose that a user watching the display unit 115 finds the voice of participant 40 hard to hear. The user touches the display unit 115 as if pointing at participant 40. The information processing apparatus 100 then identifies the microphone that is picking up the voice of participant 40 and displays, superimposed on the video, an operation display for adjusting the audio of that microphone. The operation display here is a volume adjustment bar 60, but it is not limited to this. It may be anything that adjusts the audio, such as volume, timbre, balance, or effects.

As shown in FIG. 2, the user then operates the operation display 60 to raise the volume. The information processing apparatus 100 generates and outputs a signal that applies control according to the user operation to the microphone identified earlier.

As described above, the information processing system according to the present embodiment lets the user control audio in the video through an intuitive operation: while watching a screen showing real-time video, the user can, for example, see on the screen who is currently speaking and point at that person. Audio can therefore be adjusted without interrupting the flow of the conversation. The user adjusting the audio also does not need to be aware of which microphone is picking up which person's voice.

(First embodiment)
Next, the functional configuration of the information processing system 100 according to the first embodiment of the present invention, which realizes the functions described above, is described with reference to FIGS. 3 to 5. FIG. 3 is a block diagram showing the functional configuration of the information processing system 100 according to the first embodiment of the present invention. FIG. 4 is an explanatory diagram describing the areas that the information processing system according to the first embodiment uses to identify a microphone. FIG. 5 is a table showing correspondence information consisting of the correspondence between microphone identifiers and areas.

(Configuration of the information processing system 100)
The information processing system 100 according to the present embodiment mainly includes an information processing apparatus 110, an audio processing device 120, microphones 130, and an imaging device 140. The information processing apparatus 110 is a terminal device that displays the video captured by the imaging device 140, outputs the audio acquired by the microphones 130, and accepts audio adjustment operations from the user. From the operation information input by the user, the information processing apparatus 110 identifies the microphone to be operated on, generates control information for that microphone, and inputs it to the audio processing device 120. The audio processing device 120 controls the audio acquired by the microphones 130 according to the control information input from the information processing apparatus 110. When, for example, two-way communication takes place between remote locations A and B and both user a at location A and user b at location B perform audio adjustment operations, an information processing system 100a for user a and an information processing system 100b for user b may be provided at locations A and B respectively.
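The interaction described above (touch a position, identify the microphone, then turn operations on the adjustment interface into control information) can be sketched roughly as follows. This is only an illustrative sketch: the `AudioControlInfo` fields, the `AdjustmentSession` class, and the callback names are assumptions made for the example and do not appear in the patent, which does not specify a message format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AudioControlInfo:
    """Hypothetical control message sent to the audio processing device 120."""
    mic_id: str      # microphone identifier of the microphone to be controlled
    parameter: str   # e.g. "volume"; timbre, balance, or effects are also possible
    value: float     # setting chosen on the audio adjustment interface


class AdjustmentSession:
    """Tracks one touch-then-adjust interaction on the display (illustrative only)."""

    def __init__(self,
                 resolve_mic: Callable[[int, int], Optional[str]],
                 send: Callable[[AudioControlInfo], None]) -> None:
        self._resolve_mic = resolve_mic  # maps a screen coordinate to a microphone identifier
        self._send = send                # delivers control information to the audio processor
        self._target: Optional[str] = None

    def on_touch(self, x: int, y: int) -> bool:
        """Identify the microphone for the touched position; True if one was found."""
        self._target = self._resolve_mic(x, y)
        return self._target is not None

    def on_volume_changed(self, volume: float) -> None:
        """Emit control information when the user moves the volume bar."""
        if self._target is None:
            return  # no microphone has been selected, so there is nothing to control
        self._send(AudioControlInfo(self._target, "volume", volume))
```

In this sketch the coordinate-to-microphone lookup is injected as a function, which keeps the session logic independent of how the correspondence information is stored.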

The microphones 130 are audio acquisition devices that acquire audio from sound sources. In the present embodiment, six fixed microphones 130a to 130f are used, each acquiring audio for one of six areas. The microphones 130 may be fixed to, for example, a ceiling, a wall, or a desk. The microphones 130 are placed in the same space as the imaging device 140. That is, the microphones 130 acquire the audio of the space captured by the imaging device 140. The microphones 130 may be sound-source-separating microphones, directional microphones, or other microphones that can isolate and acquire the desired audio.

The imaging device 140 is a device for capturing video. The imaging device 140 captures video and inputs it to the information processing apparatus 110 via a network.

(Information processing apparatus 110)
The information processing apparatus 110 mainly includes a storage unit 111, a communication unit 112, a coordinate input unit 113, a control unit 114, a display unit 115, and an audio output unit 116.

The storage unit 111 is a storage device capable of storing information. Examples include magnetic recording media such as an HDD (Hard Disk Drive), and non-volatile memories such as EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, MRAM (Magnetoresistive Random Access Memory), FeRAM (Ferroelectric Random Access Memory), and PRAM (Phase-change Random Access Memory), but the storage unit is not limited to these.

In the present embodiment, the storage unit 111 stores correspondence information 1000. The correspondence information 1000 is used to identify, from a coordinate position on the screen operated by the user, the microphone that acquires the audio at that position. It may be, for example, information that associates the microphone identifier 1002 with the area name 1004, as shown in FIG. 5. An area here is one of the regions obtained by dividing the screen operated by the user according to the range from which each microphone acquires audio. An example of the areas is shown in FIG. 4. The correspondence information 1000 is stored in the storage unit 111 in advance.
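A lookup of this kind can be sketched as two tables: one that maps screen regions to area names and one that maps area names to microphone identifiers. The sketch below is hypothetical. FIGS. 4 and 5 are not reproduced here, so the 3 x 2 grid, the 1920 x 1080 screen size, the area names "A" to "F", and the identifier values 1 to 6 are all assumptions chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Assumed layout: a 1920 x 1080 screen split into a 3 x 2 grid of areas.
# Each entry is (left, top, right, bottom) in pixels; right and bottom are exclusive.
AREAS: Dict[str, Tuple[int, int, int, int]] = {
    "A": (0, 0, 640, 540),
    "B": (640, 0, 1280, 540),
    "C": (1280, 0, 1920, 540),
    "D": (0, 540, 640, 1080),
    "E": (640, 540, 1280, 1080),
    "F": (1280, 540, 1920, 1080),
}

# Assumed correspondence information: area name -> microphone identifier.
CORRESPONDENCE: Dict[str, int] = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}


def find_area(x: int, y: int) -> Optional[str]:
    """Return the name of the area containing the screen coordinate, if any."""
    for name, (left, top, right, bottom) in AREAS.items():
        if left <= x < right and top <= y < bottom:
            return name
    return None


def mic_for_position(x: int, y: int) -> Optional[int]:
    """Return the microphone identifier for a touched coordinate, or None if off-screen."""
    area = find_area(x, y)
    return None if area is None else CORRESPONDENCE.get(area)
```

Keeping the area geometry separate from the area-to-microphone table mirrors the description above, where the stored correspondence information pairs microphone identifiers with area names rather than with raw coordinates.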

The communication unit 112 is a communication interface that supports a wired or wireless communication method. In the present embodiment, the communication unit 112 receives video from the imaging device 140 and audio from the audio processing device 120, and inputs both to the control unit 114.

The coordinate input unit 113 is an input device for inputting a coordinate position on the screen, or a connection interface to such an input device. Examples include input devices such as a touch panel, mouse, trackball, or joystick, and connection interfaces to these devices. In the present embodiment, the coordinate input unit 113 is a touch panel laid over the display unit 115, and it inputs a coordinate position on the display screen of the display unit 115 according to the user's operation. When the user touches the screen, the touch panel reads the touched position and its changes and inputs them to the information processing apparatus 110.

The control unit 114 controls the overall operation of the information processing apparatus 110. The control unit 114 realizes each function of the information processing apparatus 110 by reading, interpreting, and executing a program that describes the processing procedures of the apparatus. The control unit 114 may execute processing in response to input from the coordinate input unit 113. The control unit 114 may be implemented by, for example, a CPU (Central Processing Unit).

The display unit 115 is a display capable of showing video, or an output interface to such a display. Examples of displays include a liquid crystal display (LCD), a plasma display panel (PDP), a field emission display (FED), an organic electroluminescence display (organic EL, OELD), and a video projector. The display unit 115 displays the input images under the control of the control unit 114.

The audio output unit 116 is a device capable of outputting audio, or an output interface to such a device. For example, the audio output unit 116 is an output device such as a speaker, or an output interface to a speaker. The audio output unit 116 outputs the input audio under the control of the control unit 114.

(Audio processing device 120)
The audio processing device 120 mainly includes a communication unit 121, a microphone identifier assigning unit 122, an audio mixer 123, and a multiplexing unit 124.

The communication unit 121 is a communication interface that supports a wired or wireless communication method. In the present embodiment, the communication unit 121 is connected to the microphones 130 and to the communication unit 112 of the information processing apparatus 110. It receives the audio acquired by the microphones 130 and inputs it to the microphone identifier assigning unit 122. The communication unit 121 is also the interface through which the audio input from the multiplexing unit 124 is passed to the information processing apparatus 110. In addition, the communication unit 121 receives control information for the audio from the information processing apparatus 110 and inputs it to the microphone identifier assigning unit 122.

When control information has been received from the information processing apparatus 110, the microphone identifier assigning unit 122 assigns a microphone identifier to the audio input from the communication unit 121 and inputs it to the audio mixer 123. The microphone identifier is a number assigned in advance that is unique to each microphone 130. The microphone identifier assigning unit 122 also inputs the control information received from the communication unit 121 to the audio mixer 123 together with the audio. When no control information has been received from the information processing apparatus 110, the microphone identifier assigning unit 122 inputs the audio to the audio mixer 123 without performing any processing on it.

When the audio mixer 123 receives the audio tagged with microphone identifiers and the control information from the microphone identifier assigning unit 122, it uses the identifiers attached by the microphone identifier assigning unit 122 to pick out the audio corresponding to the microphone identifier specified in the control information, performs mixing processing on that audio in accordance with the control information, and inputs the result to the multiplexing unit 124. When no control information has been received from the information processing apparatus 110, the audio mixer 123 inputs the audio to the multiplexing unit 124 without performing any processing on it.
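The behavior of the audio mixer 123 can be illustrated with a minimal Python sketch. The description does not specify the format of the control information or the nature of the mixing processing, so the `ControlInfo` class, its `gain` field, and the representation of audio as lists of samples keyed by microphone identifier are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ControlInfo:
    """Hypothetical control information: target microphone and a linear gain."""
    mic_id: str
    gain: float


def mix(tagged_audio: Dict[str, List[float]],
        control: Optional[ControlInfo] = None) -> Dict[str, List[float]]:
    """Apply the control information only to the audio of the targeted microphone.

    tagged_audio maps a microphone identifier to its samples.
    Without control information the audio is passed through unchanged.
    """
    if control is None:
        return {mic: list(samples) for mic, samples in tagged_audio.items()}
    mixed = {}
    for mic, samples in tagged_audio.items():
        if mic == control.mic_id:
            # Only the stream whose identifier matches is adjusted.
            mixed[mic] = [s * control.gain for s in samples]
        else:
            mixed[mic] = list(samples)
    return mixed
```

In this sketch the gain is a simple multiplication; an actual mixer might instead apply filtering or level normalization, which the description leaves open.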

The multiplexing unit 124 multiplexes the audio mixed by the audio mixer 123, taking the sound source position and volume into account, and inputs the result to the communication unit 121. The multiplexed audio is transmitted to the network via the communication unit 121.

(Operation of the information processing system 100)
Next, the operation of the information processing system 100 according to the present embodiment will be described with reference to FIGS. 6 and 7. FIG. 6 is a flowchart illustrating the audio adjustment operation of the information processing system 100 according to the present embodiment. FIG. 7 is a flowchart showing the microphone identification procedure of the information processing system 100 according to the first embodiment of the present invention. FIG. 7 is a sub-flowchart of step S104 in FIG. 6.

First, the control unit 114 acquires the video captured by the imaging device via the communication unit 112 and displays it on the display unit 115, and acquires the audio picked up by the microphones 130 via the audio processing device 120 and the communication unit 112 and outputs it from the audio output unit 116. Then, for example, as shown in FIG. 4, when the voice of the participant 40 is hard to hear, the user touches the screen with a finger 50 near the participant 40, as if pointing at the participant. The control unit 114 of the information processing apparatus 110 then acquires the coordinate position X (x, y) input by the touch (S102). Next, the control unit 114 identifies the microphone identifier corresponding to the input coordinate position X (x, y) (S104).

Next, the microphone identification procedure will be described with reference to FIG. 7, which is a sub-flowchart of step S104 in FIG. 6. First, the control unit 114 identifies the area corresponding to the input coordinate position X (x, y) (S202). In the present embodiment, the corresponding area is area B. This area identification step can be realized, for example, by holding correspondence information between coordinate positions and areas in a storage unit in the information processing apparatus 110.

The control unit 114 then identifies the microphone that acquires the audio of the identified area and obtains its microphone identifier (S204). In the present embodiment, each area is defined as the area whose audio is acquired by a particular microphone. The microphone identifier can therefore be identified by creating and holding in advance correspondence information 1000 such as that shown in FIG. 5. In the present embodiment, the microphone identifier of the microphone 130 that acquires the audio of area B is 130b.
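Steps S202 and S204 amount to two table lookups, which the following Python sketch illustrates. The rectangle coordinates in `AREAS` and the entry for area A in `AREA_TO_MIC` are hypothetical; only the pairing of area B with microphone identifier 130b is taken from the description.

```python
from typing import Optional

# Hypothetical screen areas as (x_min, y_min, x_max, y_max) rectangles in pixels.
AREAS = {
    "A": (0, 0, 320, 480),
    "B": (320, 0, 640, 480),
}

# Stand-in for correspondence information 1000: area -> microphone identifier.
# The entry for area A is an assumption for illustration.
AREA_TO_MIC = {"A": "130a", "B": "130b"}


def area_for(x: int, y: int) -> Optional[str]:
    """S202: find the area that contains the touched coordinate."""
    for name, (x0, y0, x1, y1) in AREAS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def mic_for(x: int, y: int) -> Optional[str]:
    """S204: map the area to its microphone identifier, or None if outside all areas."""
    area = area_for(x, y)
    return AREA_TO_MIC.get(area) if area is not None else None
```

With these hypothetical rectangles, a touch at (400, 100) falls in area B and `mic_for(400, 100)` yields `"130b"`.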

Returning to FIG. 6, the description of the audio adjustment procedure continues. Once the microphone identifier of the adjustment target has been acquired in step S104, the control unit 114 turns on the audio adjustment function for the identified microphone (S106). The control unit 114 then displays an audio adjustment interface near the input coordinate position X (x, y) (S108).

Then, for example, as shown in FIG. 2, when the user operates the audio adjustment interface 60 with the finger 50, the control unit 114 acquires the operation information (S110).

The control unit 114 generates audio control information based on the microphone identifier identified in step S104 and the user's audio adjustment operation information acquired in step S110. The generated audio control information is output to the audio processing device 120 via the communication unit 112.
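As one possible illustration of how the audio control information might be assembled, the following Python sketch combines a microphone identifier with a slider position. The JSON encoding, the field names `mic_id` and `volume`, and the 0 to 100 slider range are assumptions; the description does not define the format of the control information.

```python
import json


def build_control_info(mic_id: str, slider_value: int, slider_max: int = 100) -> str:
    """Combine the identified microphone and the user's operation into one message.

    The slider position is normalized to a volume between 0.0 and 1.0.
    """
    if not 0 <= slider_value <= slider_max:
        raise ValueError("slider value out of range")
    message = {"mic_id": mic_id, "volume": slider_value / slider_max}
    return json.dumps(message)
```

A serialized message of this kind could then be handed to whatever transport the communication unit 112 uses.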

When completion of the audio adjustment is detected (S114), the display of the audio adjustment interface is turned off and the audio adjustment ends.

(Summary)
As described above, the information processing system 100 according to the present embodiment uses the correspondence information to automatically identify the microphone 130 that is acquiring the audio at a specific position on the screen showing the video. The user simply points at the target whose audio is to be adjusted and performs the audio adjustment operation; the user no longer needs to work out which microphone 130 is acquiring that audio, as was conventionally required. The audio can therefore be adjusted easily through an intuitive operation.

(Second Embodiment)
In the first embodiment described above, the microphone that acquires the audio at a position on the operation screen is identified from that position by means of preset correspondence information. Such a system, however, requires correspondence information between the microphones and the areas on the screen to be set in advance. The second embodiment of the present invention therefore makes it possible to identify the microphone to be operated without the user setting correspondence information in advance. In the following description of the second embodiment, parts that are the same as in the first embodiment are omitted and the differences are mainly described.

(Information processing system 200)
First, the information processing system 200 according to the present embodiment will be described with reference to FIG. 8. FIG. 8 is a block diagram showing the functional configuration of the information processing system according to the second embodiment of the present invention.

The information processing system 200 mainly includes an information processing apparatus 210, an audio processing device 220, an imaging device 240, a plurality of audio acquisition devices 250, and a correspondence information creation device 260, which are connected to one another via a network 300.

To identify the microphone that is acquiring the audio designated by the user, the information processing system 200 uses correspondence information that the correspondence information creation device 260 creates by collecting information generated using the audio acquisition devices 250 and the badge holders 800.

The information processing apparatus 210, the audio processing device 220, and the imaging device 240 are the same as in the first embodiment, and their description is omitted. Although no storage unit is illustrated for the information processing apparatus 210, it may of course include one; the figure merely indicates that the information processing apparatus 210 does not hold the correspondence information internally. The operation of the information processing apparatus 210 during volume adjustment will be described later.

(Correspondence information creation device 260)
The correspondence information creation device 260 mainly includes a communication unit 261, a control unit 262, and a storage unit 263. Although the present embodiment focuses on its function of creating correspondence information, the correspondence information creation device 260 can also recognize the spatial arrangement of the audio acquisition devices 250 and the participants by collecting the correspondence information.

(Audio acquisition device 250)
The audio acquisition device 250 is a device in which a microphone for acquiring audio is installed. In the present embodiment, the audio acquisition device 250 has, for example, the shape of a desk as shown in FIGS. 10 and 11. FIG. 8 shows two audio acquisition devices 250, namely the audio acquisition device 250a and the audio acquisition device 250b, but the number is not limited to two; a plurality of devices can be used in combination.

The audio acquisition device 250 mainly includes a storage unit 251, a control unit 252, a communication unit 253, receivers 254, transmitters 255, and microphones 230. Because the device has a plurality of receivers 254 and transmitters 255, they are given individual reference numerals to distinguish them; for example, the three receivers 254 are denoted receivers 2541 to 2543. In the following, the transmitters are denoted collectively as the transmitter 255, and likewise the receivers as the receiver 254.

The storage unit 251 is a storage device capable of storing information. Examples include magnetic recording media such as an HDD (Hard Disk Drive) and non-volatile memories such as an EEPROM (Electrically Erasable Programmable Read Only Memory), flash memory, MRAM (Magnetoresistive Random Access Memory), FeRAM (Ferroelectric Random Access Memory), and PRAM (Phase change Random Access Memory), but the storage unit is not limited to these.

In the present embodiment, the storage unit 251 stores correspondence information 2000 such as that shown in FIG. 9. FIG. 9 is an explanatory diagram showing an example of the correspondence information in an audio acquisition device. The correspondence information 2000 stores in advance the correspondence among the audio acquisition device identifier 2002, the microphone identifiers 2004, the transmitter identifiers 2006, and the receiver identifiers 2008 of each audio acquisition device 250. When a receiver 254 receives identification information, the received identification information is stored in the reception information 2010.

The control unit 252 controls the overall operation of the audio acquisition device 250. For example, when a receiver 254 receives identification information, the control unit 252 stores the received identification information in the reception information 2010 corresponding to that receiver's identifier in the correspondence information 2000 in the storage unit 251, and transmits the correspondence information 2000 to the correspondence information creation device 260 via the communication unit 253. The control unit 252 may be configured by, for example, a CPU.
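The correspondence information 2000 and the update performed by the control unit 252 can be sketched in Python as follows. The row layout mirrors the fields named in the description (device, microphone, transmitter, and receiver identifiers plus reception information); the `Row` class, the `store_received` helper, and the specific pairing of microphone, transmitter, and receiver identifiers in `table_250a` are assumptions for illustration, since FIG. 9 is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One row of correspondence information for a single side of a desk."""
    device_id: str
    mic_id: str
    transmitter_id: str
    receiver_id: str
    received: Optional[str] = None  # reception information, empty until something arrives


def store_received(table: List[Row], receiver_id: str, identifier: str) -> Row:
    """Record an identifier received by the given receiver and return the updated row."""
    for row in table:
        if row.receiver_id == receiver_id:
            row.received = identifier
            return row
    raise KeyError(receiver_id)


# Illustrative table for device 250a; the per-row pairing is assumed.
table_250a = [
    Row("250a", "230a", "2551a", "2541a"),
    Row("250a", "230b", "2552a", "2542a"),
    Row("250a", "230c", "2553a", "2543a"),
]
```

For instance, `store_received(table_250a, "2543a", "2553b")` would record that receiver 2543a has received the identifier of transmitter 2553b; sending the updated table to the correspondence information creation device 260 is left out of this sketch.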

The communication unit 253 is a communication interface through which the audio acquisition device 250 connects to the network. It may connect to a wired network or to a wireless network. The communication unit 253 transmits and receives various data under the control of the control unit 252.

Under the control of the control unit 252, the receiver 254 communicates with a transmitter 255 installed in another audio acquisition device 250, or with a sound source identifier transmitter 801 that transmits a sound source identifier, and receives the transmitter identifier or the sound source identifier. The receiver 254 may be, for example, an infrared receiver that communicates using infrared light. The physical placement of the receiver 254 will be described later.

Under the control of the control unit 252, the transmitter 255 transmits its transmitter identifier toward the receiver facing it. The transmitter 255 may be, for example, an infrared transmitter that communicates using infrared light. The communication method of the transmitter 255 is assumed to be compatible with that of the receiver 254.

The microphone 230 is a device for acquiring audio and is installed in the audio acquisition device 250. For example, the microphone 230 may be built into the audio acquisition device 250, or it may be installed on a side surface or the top surface of the audio acquisition device 250. The microphones 230 are preferably installed at positions on the desk-shaped audio acquisition device 250 that correspond to where participants sit.

(Physical configuration)
Next, the physical configuration of the audio acquisition device 250 will be described with reference to FIGS. 10 and 11. FIG. 10 is an overview of the audio acquisition device 250a and the audio acquisition device 250b as viewed from above. FIG. 11 is an overview of the audio acquisition device 250a and the audio acquisition device 250b as viewed from the side.

In the present embodiment, the audio acquisition device 250 has the shape of a desk. A receiver 254 and a transmitter 255 are arranged as a pair on the same side surface. Although the receivers 254 and the transmitters 255 are drawn as protrusions in the figures, in practice they are assumed to be recessed into the side surfaces of the audio acquisition device 250 that face the participants, for example, so that no gap is left between desks. In the present embodiment, three receivers 254 and three transmitters 255 are installed per audio acquisition device 250. In the same way that the correspondence information (FIG. 9) expresses correspondence at the level of information, the orientation of each microphone 230 is also tied to the orientation of the desk as a matter of physical configuration. For example, as shown in FIG. 10, the microphone 230a is oriented to face the side surface on which the transmitter 2551a is provided. In other words, the microphone 230a mainly acquires the voices of participants and others located on the side where the transmitter 2551a is provided.

As shown in FIG. 10, the receivers 254 and the transmitters 255 are arranged so that, when the sides of two audio acquisition devices 250 are brought into contact, the receiver 254 of one audio acquisition device 250 faces the transmitter 255 of the other. For example, in FIG. 10, when the longest sides of the desk tops are placed together, the receiver 2543a faces the transmitter 2553b and the transmitter 2553a faces the receiver 2543b, so that they can communicate.

As indicated by the dotted lines in FIG. 10, the audio acquisition device 250 has built-in microphones 230. The microphones 230 are provided in a number corresponding to the number of receivers 254; in the present embodiment, three microphones 230 are installed per audio acquisition device 250. Each microphone 230 is associated with a receiver 254 and a transmitter 255.

A recognition code 700 is embedded as a pattern in the surface of the top plate of the audio acquisition device 250. The recognition code 700 represents the audio acquisition device identifier of each audio acquisition device 250. Because the code is embedded repeatedly over the whole surface, decoding the recognition code 700 in video that shows it makes it possible to identify which audio acquisition device 250 is nearest to the point indicated by the user. Although the recognition code 700 of the present embodiment is described as representing an audio acquisition device identifier, it is not limited to this example; the recognition code 700 may instead be an identifier of a microphone set consisting of a plurality of microphones 230. For example, the microphones 230a to 230c shown in FIG. 10 can form one microphone set. In that case, one audio acquisition device 250 can be provided with a plurality of microphone sets. For example, on the surface of the audio acquisition device 250, a recognition code 700a is displayed in the region around the first microphone set, and a recognition code 700b is displayed in the region around the second microphone set.

Here, the recognition code 700 is a code that can hold the identifier of the audio acquisition device 250, and the information can be read from an image of the code. For example, it may be the QR code (registered trademark) shown in the figures, another two-dimensional code, or a barcode, which is a one-dimensional code.

Next, how the receiver 254 of the audio acquisition device 250 receives a sound source identifier will be described with reference to FIGS. 12 and 13. FIG. 12 is an explanatory diagram illustrating communication between the sound source identifier transmitter and the audio acquisition device. FIG. 13 is an explanatory diagram illustrating an example of the external configuration of the badge holder.

As shown in FIG. 12, consider the case where the participant 10 is seated at the audio acquisition device 250. The receiver 2542 is installed on the side surface of the audio acquisition device 250 on the side where the participant 10 is seated. The participant 10 is wearing the badge holder 800, and the sound source identifier transmitter 801 of the badge holder 800 transmits the sound source identifier to the receiver 2542 of the audio acquisition device 250. As shown in FIG. 12, the audio acquisition device 250 is preferably designed so that the badge holder 800 and the receiver 2542 face each other when the participant 10 is seated.

The configuration of the badge holder 800 will now be described with reference to FIG. 13. The badge holder 800 is used, for example, by the participant 10 to wear a badge or the like that identifies the participant. A sound source identifier transmitter 801 for transmitting the sound source identifier is attached to the badge holder 800, for example. The sound source identifier is a code for identifying a sound source and may be an identification number assigned to an individual, such as an employee number.

As described above, when desk-shaped audio acquisition devices 250 are combined, the information processing system 200 according to the present embodiment exchanges transmitter identifiers between the receivers 254 and the transmitters 255 provided on their side surfaces, and can thereby determine which receiver faces which transmitter. Furthermore, if the correspondence information creation device 260 holds data such as the shape of each audio acquisition device 250 in advance, the arrangement of the audio acquisition devices 250 can be reproduced in a virtual space.

In addition, a sound source identifier is assigned to each participant 10, for example, and a sound source identifier transmitter 801 is provided in the badge holder 800 or the like worn by the participant so that the receiver 254 at the place where the participant sits can receive the sound source identifier. This allows the audio acquisition device 250, and the correspondence information creation device 260 that receives the correspondence information from the audio acquisition device 250, to determine which participant is seated at which position of each audio acquisition device 250.

(Operation example)
An operation example using the information processing system 200 described above will now be explained, taking as an example a combination of four audio acquisition devices 250. When the four audio acquisition devices 250 are combined, transmitter identifiers are exchanged between each receiver 254 and the transmitter 255 facing it. When a receiver 254 receives a transmitter identifier, the received transmitter identifier is stored, under the control of the control unit 252, in the reception information of the correspondence information in the storage unit 251.

Next, consider the case where participants are seated at the audio acquisition devices 250 combined in this way. Here, four participants are assumed to be seated. When a participant sits down, the sound source identifier is transmitted from the sound source identifier transmitter of the participant's badge holder to a receiver 254 of the audio acquisition device 250. The received sound source identifier is stored, under the control of the control unit 252, in the reception information of the correspondence information in the storage unit 251.

The correspondence information collected in this way is shown in FIGS. 14 to 17. FIGS. 14 to 17 are explanatory diagrams showing the correspondence information acquired by the audio acquisition devices 250.

FIG. 14 relates to the audio acquisition device 250a. The audio acquisition device identifier 2102, the microphone identifiers 2104, the transmitter identifiers 2106, and the receiver identifiers 2108 are stored in advance, and the information received by each corresponding receiver 254 is stored in the reception information 2110.

Similarly, FIG. 15 shows the correspondence information for the audio acquisition device 250b, FIG. 16 for 250c, and FIG. 17 for 250d. The correspondence information collected in this way is transmitted to the correspondence information creation device 260.

When the correspondence information collected by each audio acquisition device 250 is received by the correspondence information creation device 260, the correspondence information creation device 260 can determine the arrangement of the audio acquisition devices 250 and the seating of the participants by using the information it holds in advance on the shape and configuration of the desks and on the sound source identifiers assigned to the participants. That is, it can determine which microphone is acquiring the audio of which sound source.

Specifically, using the correspondence information of FIGS. 14 to 17, the correspondence information creation device 260 can determine that the audio acquisition devices are arranged as shown in FIG. 18, where the participants are seated, and which microphone corresponds to each seated position. FIG. 18 is an explanatory diagram showing an example of the arrangement of audio acquisition devices recognized by the correspondence information creation device 260.
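How desk adjacency and seating might be derived from the collected correspondence information can be sketched as follows. This is an illustrative Python sketch, not the method of the embodiment: the dictionary keys, the `analyse` function, and the rule that any received identifier not matching a known transmitter is treated as a sound source identifier are assumptions, and the geometric reconstruction from desk shapes is omitted.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

RowDict = Dict[str, Optional[str]]


def analyse(rows: List[RowDict]) -> Tuple[Set[FrozenSet[str]], Dict[str, str]]:
    """Derive adjacent desk pairs and a sound-source-to-microphone map.

    Each row has the keys 'device', 'mic', 'transmitter', 'receiver', 'received'.
    """
    # Which desk owns each transmitter identifier.
    owner = {r["transmitter"]: r["device"] for r in rows}
    adjacent: Set[FrozenSet[str]] = set()
    seating: Dict[str, str] = {}
    for r in rows:
        got = r["received"]
        if got is None:
            continue
        if got in owner:
            # A transmitter identifier from another desk: the two desks touch.
            adjacent.add(frozenset((r["device"], owner[got])))
        else:
            # Otherwise assume a sound source identifier from a badge holder.
            seating[got] = r["mic"]
    return adjacent, seating
```

The resulting `seating` map answers the question of which microphone is acquiring the audio of which sound source; a fuller implementation would also validate received identifiers against the list of identifiers assigned to participants.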

The operation of controlling audio from the display screen of the information processing apparatus 210 in this situation, while viewing the video captured by the imaging device 240, will be described with reference to FIGS. 19 and 20. FIG. 19 is an explanatory diagram for explaining audio control in the information processing system according to the second embodiment of the present invention. FIG. 20 is a sub-flowchart showing the microphone identification flow for audio adjustment in the information processing system 200 according to the second embodiment of the present invention.

The overall flow of audio adjustment is the same as that described with reference to FIG. 6, and its description is omitted. Only the microphone identification flow differs, so the microphone identification flow shown in FIG. 20, which is a sub-flowchart of step S104 in FIG. 6, is described here. It is assumed that the information processing apparatus 210 obtains the correspondence information from the correspondence information creation device 260 whenever the correspondence information is updated, for example when the audio acquisition devices 250 are arranged or when a participant sits down.

In FIG. 19, the user touches the screen as if pointing at the target whose audio is to be adjusted. Here, consider the case of adjusting the audio of the participant 40. When the user touches the screen as if pointing at the participant 40, the coordinate position of the touch is acquired. The control unit 214 of the information processing apparatus 210 then acquires the image displayed at the time the coordinate position was input (S302), and searches the acquired image for the recognition code closest to the coordinate position (S304).

Next, the desk corresponding to the recognition code is identified from the correspondence information (S306). Specifically, reading the recognition code 700 found in step S304 yields the voice acquisition device identifier assigned to each desk, here "250d". Then, the receiver that has received a sound source identifier is identified from the correspondence information of the identified desk (S308). Referring to FIG. 17, the receiver in the voice acquisition device 250d that has received a sound source identifier is the receiver 2542d.

Then, the microphone corresponding to the identified receiver 2542d is identified (S310). Referring to FIG. 17, the corresponding microphone identifier is 230k.

In this way, the microphone to be adjusted can be identified from a position on the operation screen, and the desired microphone can be adjusted reliably.
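The microphone identification flow of steps S304 to S310 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the embodiment: the data structures `RecognitionCode` and `ReceiverEntry`, the function `identify_microphones`, the dictionary layout of the correspondence information, and all coordinates are assumptions introduced here. Only the identifiers "250d", "2542d", and "230k" are taken from the description.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RecognitionCode:
    """A recognition code detected in the captured frame (hypothetical structure)."""
    device_id: str  # voice acquisition device identifier decoded from the code
    x: float        # position of the code in screen coordinates
    y: float


@dataclass(frozen=True)
class ReceiverEntry:
    """One receiver row of a desk's correspondence information (hypothetical layout)."""
    receiver_id: str
    microphone_id: str
    sound_source_id: Optional[str]  # None if no sound source identifier was received


def identify_microphones(
    codes: List[RecognitionCode],
    correspondence: Dict[str, List[ReceiverEntry]],
    x: float,
    y: float,
) -> List[str]:
    """Return the microphone identifiers for the touched position (x, y)."""
    if not codes:
        return []  # no recognition code is visible in the frame
    # S304: recognition code nearest to the touched coordinate position
    code = min(codes, key=lambda c: hypot(c.x - x, c.y - y))
    # S306: desk (voice acquisition device) identified by that code
    entries = correspondence.get(code.device_id, [])
    # S308 and S310: receivers that received a sound source identifier,
    # mapped to the microphones paired with them
    return [e.microphone_id for e in entries if e.sound_source_id is not None]


# Illustrative data. Only "250d", "2542d", and "230k" come from the description;
# every other identifier and all coordinates are placeholders.
codes = [
    RecognitionCode("250d", x=420.0, y=310.0),
    RecognitionCode("desk-other", x=120.0, y=300.0),
]
correspondence = {
    "250d": [
        ReceiverEntry("rx-other", "mic-other", None),
        ReceiverEntry("2542d", "230k", "source-of-participant-40"),
    ],
    "desk-other": [ReceiverEntry("rx-x", "mic-x", None)],
}

print(identify_microphones(codes, correspondence, x=400.0, y=300.0))  # ['230k']
```

The sketch returns a list rather than a single identifier so that a desk at which several receivers have received a sound source identifier yields every corresponding microphone.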

At this time, for example, if two or more participants are seated at the identified desk (or at a microphone set consisting of a plurality of microphones 230), the control unit 214 may display the adjustment interface for each of them.

(Example of effects)
As described above, the information processing system according to the second embodiment of the present invention can automatically collect the correspondence information and identify the microphone without the user setting the correspondence information in advance. The audio of the identified microphone can then be adjusted while viewing the video.

Preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but it goes without saying that the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present invention.

For example, a fixed microphone is used in the first embodiment, but the present invention is not limited to this example. For example, a pin microphone (lapel microphone) may be used. In that case, the position of the pin microphone is determined by the seating position of the participant, so the microphone needs to be associated with the corresponding area information in advance.

Also, for example, the present embodiment has a display input unit in which a touch-panel coordinate input unit is stacked on the display unit, but the configuration is not limited to this. For example, the display unit may be an interface that connects to a projector projecting onto a screen. In this case, an input device such as a mouse may be used.

Also, for example, in the second embodiment the voice acquisition device is a desk in the shape of an isosceles triangle, but the shape is not limited to this. It suffices that a receiver, a transmitter, and a microphone form a set and are arranged at the positions on the desk where participants sit.

Also, for example, in the second embodiment the sound source identifier is transmitted from a badge holder worn by a participant, but the present invention is not limited to this example. For example, a sound source identifier may be assigned to each device that can be a sound source, such as a projector, and a sound source identifier transmitter may be provided in that device. In that case, the arrangement of the receivers of the voice acquisition device, or the communication method, needs to be devised accordingly. This can be used, for example, to lower the volume of a projector fan when its noise is loud.

Also, for example, in the second embodiment the receiver and the transmitter use infrared light, but the present invention is not limited to this. For example, signals may be transmitted and received using a Doppler sensor that uses sound waves, a ZigBee (registered trademark) node, or the like.

Also, for example, in the second embodiment the recognition code is embedded only in the top surface of the voice acquisition device, but the present invention is not limited to this. For example, it may be embedded over the entire surface, including the side surfaces and the desk legs.

Also, for example, in the second embodiment the information processing apparatus acquires the correspondence information from the correspondence information creation device each time the correspondence information is updated, but the present invention is not limited to this. For example, the correspondence information may be acquired before the microphone identification process is performed.

In the second embodiment, the recognition code 700 has been described as indicating a voice acquisition device identifier, but it is not limited to this example. For example, the recognition code 700 may be an identifier that identifies a microphone set made up of a plurality of microphones 230. One example is a microphone set consisting of the microphones 230a to 230c, as shown in FIG. 10. In that case, the voice acquisition device 250 can be provided with a plurality of microphone sets. In this case, for example, on the surface of the voice acquisition device 250, a recognition code 700a is displayed in the area around the first microphone set, and a recognition code 700b is displayed in the area around the second microphone set. The recognition code 700 may also identify each individual microphone. In that case, the microphone can be identified directly from the recognition code, without identifying the voice acquisition device from the recognition code and then linking it to a microphone.
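The three granularities of the recognition code described above (voice acquisition device, microphone set, individual microphone) could be handled by a single lookup, sketched below. The names `CodeKind` and `resolve_code`, the two dictionaries, and the set identifier "set-1" are assumptions made for illustration and do not appear in the description. The microphone identifiers 230a to 230c, 230k, and the device identifier 250d are taken from the text.

```python
from enum import Enum
from typing import Dict, List


class CodeKind(Enum):
    """What a recognition code identifies (hypothetical classification)."""
    DEVICE = "device"          # voice acquisition device identifier
    MIC_SET = "mic_set"        # microphone set identifier
    MICROPHONE = "microphone"  # identifier of one individual microphone


def resolve_code(
    kind: CodeKind,
    identifier: str,
    occupied_mics_by_device: Dict[str, List[str]],
    mics_by_set: Dict[str, List[str]],
) -> List[str]:
    """Return the microphone identifiers that a recognition code leads to."""
    if kind is CodeKind.MICROPHONE:
        # The code names the microphone directly; no correspondence lookup is needed.
        return [identifier]
    if kind is CodeKind.MIC_SET:
        # The code names a microphone set; return every microphone in that set.
        return list(mics_by_set.get(identifier, []))
    # The code names a voice acquisition device; use the microphones already
    # resolved for it through the receiver correspondence (assumed precomputed).
    return list(occupied_mics_by_device.get(identifier, []))


# Illustrative data; "set-1" is a placeholder identifier.
mics_by_set = {"set-1": ["230a", "230b", "230c"]}
occupied_mics_by_device = {"250d": ["230k"]}

print(resolve_code(CodeKind.MIC_SET, "set-1", occupied_mics_by_device, mics_by_set))
```

With a per-microphone code the lookup tables are not consulted at all, which corresponds to identifying the microphone directly from the recognition code.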

In this specification, the steps described in the flowcharts include not only processes performed in time series in the described order but also processes executed in parallel or individually without necessarily being processed in time series. It goes without saying that, even for steps processed in time series, the order can be changed as appropriate in some cases.

DESCRIPTION OF SYMBOLS
110 Information processing apparatus
113 Coordinate input unit
114 Control unit
115 Display unit
116 Audio output unit
120 Audio processing device
130 Microphone
140 Imaging device

Claims

An information processing system comprising:
a voice acquisition device comprising a plurality of microphones;
an imaging device that acquires video by shooting;
an audio processing device that processes audio acquired by the plurality of microphones; and
an information processing apparatus that outputs the video and the audio processed by the audio processing device, and outputs audio control information to the audio processing device,
wherein the voice acquisition device includes
a receiver that receives a sound source identifier,
a recognition code in which a voice acquisition device identifier for identifying the voice acquisition device is embedded is displayed on a surface of the voice acquisition device,
the information processing apparatus includes:
a display unit that displays the video;
a coordinate input unit for inputting a coordinate position on a display screen of the display unit; and
a control unit that displays an audio adjustment interface on the display unit, superimposed on the video, in the vicinity of the coordinate position input by the coordinate input unit, reads the voice acquisition device identifier from the recognition code in the video, identifies, based on the read voice acquisition device identifier and correspondence information stored in a storage unit, the microphone identifier of the microphone that acquires audio at the position in the video corresponding to the coordinate position, and outputs audio control information including the identified microphone identifier and audio processing information according to an input from the coordinate input unit to the audio adjustment interface, and
the correspondence information is generated by collecting the correspondence relationships among the voice acquisition device identifier, the sound source identifier assigned in advance to a sound source, a receiver identifier assigned to the receiver, and the microphone identifier.

The information processing system according to claim 1, wherein
the voice acquisition device has the shape of a desk,
the receiver is installed on a side surface of the voice acquisition device, and
the receiver receives the sound source identifier from a sound source identifier transmitter worn by a person.

The information processing system according to claim 2, wherein
the voice acquisition device further includes a transmitter that transmits its own transmitter identifier,
the transmitter is arranged so as to face the receiver of another voice acquisition device when the voice acquisition device is combined with that other voice acquisition device,
the receiver receives either the sound source identifier or the transmitter identifier, and
the information processing system further comprises a correspondence information creation device that acquires the correspondence information of a plurality of the voice acquisition devices and recognizes the arrangement of the plurality of voice acquisition devices.

The information processing system according to claim 1 or 2, wherein the audio processing device includes:
a microphone identifier assigning unit that assigns a microphone identifier to the audio input from each microphone; and
an audio mixer that processes the audio based on the microphone identifier assigned to the audio and the audio control information.

The information processing system according to claim 1, wherein the correspondence information is information in which a correspondence relationship between the coordinate position and the microphone identifier is stored in advance.

An information processing apparatus that acquires and outputs audio acquired by a plurality of microphones provided in a voice acquisition device and video acquired by an imaging device by shooting, the information processing apparatus comprising:
a display unit that displays the video;
a coordinate input unit for inputting a coordinate position on a display screen of the display unit; and
a control unit that displays an audio adjustment interface on the display unit, superimposed on the video, in the vicinity of the coordinate position input by the coordinate input unit, reads a voice acquisition device identifier from a recognition code displayed on a surface of the voice acquisition device in the video, identifies, based on the read voice acquisition device identifier and correspondence information stored in an internal or external storage unit, the microphone identifier of the microphone that acquires audio at the position in the video corresponding to the coordinate position, and outputs, to an audio processing device, audio control information including the identified microphone identifier and audio processing information according to an input from the coordinate input unit to the audio adjustment interface,
wherein the correspondence information is a correspondence relationship among the voice acquisition device identifier, a sound source identifier assigned in advance to a sound source, a receiver identifier assigned to a receiver included in the voice acquisition device, and the microphone identifier.

An information processing method performed by an information processing apparatus comprising a display unit that displays video, a coordinate input unit for inputting a coordinate position on a display screen of the display unit, and a control unit, wherein
the control unit
displays an audio adjustment interface on the display unit, superimposed on the video, in the vicinity of the coordinate position input by the coordinate input unit, reads a voice acquisition device identifier from a recognition code displayed on a surface of a voice acquisition device in the video, identifies, based on the read voice acquisition device identifier and correspondence information stored in an internal or external storage unit, the microphone identifier of the microphone that acquires audio at the position in the video corresponding to the coordinate position, and outputs, to an audio processing device, audio control information including the identified microphone identifier and audio processing information according to an input from the coordinate input unit to the audio adjustment interface, and
the correspondence information is a correspondence relationship among the voice acquisition device identifier, a sound source identifier assigned in advance to a sound source, a receiver identifier assigned to a receiver included in the voice acquisition device, and the microphone identifier.