JP2015022357A

JP2015022357A - Information processing system, information processing method, and information processing device

Info

Publication number: JP2015022357A
Application number: JP2013147853A
Authority: JP
Inventors: Akihito Aiba (相場亮人)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2013-07-16
Filing date: 2013-07-16
Publication date: 2015-02-02

Abstract

PROBLEM TO BE SOLVED: To provide an information processing system, an information processing method, and an information processing device capable of displaying appropriate content according to the language used by a target and the target's level of interest.

SOLUTION: A target attribute estimation unit 14 estimates an attribute of the target from an audio signal acquired with a microphone 5. A content selection acquisition unit 15 selects and acquires content based on that attribute. A display device 2 or an image projection device 3 then displays the acquired content to the target.

Description

The present invention relates to an information processing system, an information processing method, and an information processing apparatus that provide digital signage.

In recent years, network and display performance have improved. As a result, digital signage has attracted attention: it provides information to a target by showing content that combines video, audio, and the like on a display. Digital signage is more expressive than conventional posters and signboards and can show content suited to the location and time. It can therefore deliver information while making a strong impression on the target. Accordingly, techniques have been developed that use information from sensors such as cameras to observe a target's attributes (for example gender and age) and behavior. These observations are used to analyze the profile of targets who showed interest in the content, or to automatically display content likely to interest the target.

For example, Patent Documents 1 and 2 describe techniques that capture an image of a target's face and estimate attributes from its features. Patent Document 3 describes a technique that analyzes facial and motion features in video of the target and switches the language of the content accordingly.

Patent Document 4 describes a technique that performs speech recognition on a target's conversation and collects the result as customer information. Patent Document 5 describes a technique that displays guidance information while switching languages at fixed time intervals. It then selects one of those languages according to the operation the user performs in response, and displays content in that language.

However, the techniques of Patent Documents 1 and 2 estimate a target's attributes from facial features, which makes it difficult to estimate the language the target uses. For example, facial features alone can hardly distinguish a Japanese person who speaks Japanese from a Chinese person who speaks Chinese. With these techniques it has therefore been difficult to select the content language according to the target.

Further, the technique of Patent Document 3 estimates a target's attributes from video. With it, it is difficult to tell whether the target is merely glancing at the information or looking at it with interest. It has therefore been difficult to display content appropriate to the target's level of interest.

The present invention has been made in view of the above. Its object is to provide an information processing system, an information processing method, and an information processing apparatus capable of displaying content appropriate to the attributes of a target.

To solve the problems described above and achieve the object, the present invention includes:
- audio acquisition means for acquiring an audio signal containing the target's voice;
- target attribute estimation means for estimating an attribute of the target based on the audio signal acquired by the audio acquisition means;
- content selection acquisition means for selecting and acquiring content based on the attribute estimated by the target attribute estimation means; and
- display means for displaying the content acquired by the content selection acquisition means toward the target.

According to the present invention, content appropriate to the attributes of the target can be displayed.

FIG. 1 is a schematic diagram illustrating the configuration of the information processing system according to the first embodiment.
FIG. 2 is a diagram illustrating the hardware configuration of the information processing apparatus according to the first embodiment.
FIG. 3 is a diagram illustrating the hardware configuration of the image projection device according to the first embodiment.
FIG. 4 is a block diagram illustrating the functional configuration of the information processing apparatus according to the first embodiment.
FIG. 5 is a diagram illustrating the data structure of the content attribute table.
FIG. 6 is a flowchart showing the information processing procedure in the information processing apparatus.
FIG. 7 is an explanatory diagram of a method of estimating a target's attributes.
FIG. 8 is an explanatory diagram of a method of estimating the level of interest.
FIG. 9 is a diagram showing another example data structure of the content attribute table.
FIG. 10 is an explanatory diagram of adjusting the display position of content.
FIG. 11 is a block diagram illustrating the functional configuration of the information processing apparatus according to the second embodiment.
FIG. 12 is a flowchart showing the information processing procedure according to the second embodiment.
FIG. 13 is an explanatory diagram of a method of estimating the level of interest.
FIG. 14 is an explanatory diagram of adjusting the display position of content.

Embodiments of the information processing system, information processing method, and information processing apparatus are described in detail below with reference to the accompanying drawings. The present invention is not limited to these embodiments. In the drawings, identical parts are denoted by identical reference numerals.

(First embodiment)
[Configuration of information processing system]
FIG. 1 is a schematic diagram showing the configuration of the information processing system according to the first embodiment of the present invention. As shown in FIG. 1, the information processing system 10 includes an information processing apparatus 1, a display device 2, an image projection device 3, a server 4, a microphone 5, and the like. These are connected to one another via a data transmission path N. The information processing apparatus 1 may also incorporate the microphone 5, the display device 2, the image projection device 3, or the like, forming a single integrated piece of hardware.

The server 4 is a device with an arithmetic unit, a large-capacity storage device, and a server function; it corresponds to a server apparatus, a unit apparatus, or the like. In this embodiment, the server 4 stores the content to be displayed on the display device 2 and the image projection device 3, which serve as display means. The microphone 5 serves as audio acquisition means. It is installed near the display device 2 or near the projection surface of the image projection device 3 and acquires an audio signal containing the target's voice. The data transmission path N corresponds to any of various network communication paths, wired or wireless, such as a LAN (Local Area Network), an intranet, Ethernet (registered trademark), or the Internet. The data transmission path N also includes bus communication paths such as USB (Universal Serial Bus).

The information processing apparatus 1 is a device with an arithmetic unit and an information processing function; it may also be an information terminal such as a tablet. As shown in FIG. 2, the information processing apparatus 1 includes a CPU (Central Processing Unit) 101, a main storage device 102, an auxiliary storage device 103, a communication IF (interface) 104, an external IF 105, and the like. These are connected to one another via a bus communication path B.

The CPU 101 is an arithmetic unit that controls the entire information processing apparatus 1 and implements its functions. The main storage device 102 is a storage device (memory), such as a ROM (Read Only Memory) or RAM (Random Access Memory), that holds programs, data, and the like in predetermined storage areas. The auxiliary storage device 103 has a larger storage capacity than the main storage device 102. It is a non-volatile storage device such as an HDD (Hard Disk Drive) or a memory card, and also includes storage media such as flexible disks (FD), CDs (Compact Disks), and DVDs (Digital Versatile Disks). The CPU 101 controls the entire information processing apparatus 1 and implements its functions by, for example, reading programs and data from the auxiliary storage device 103 into the main storage device 102 and executing them.

The communication IF 104 is an interface that connects the information processing apparatus 1 to the data transmission path N. This lets the apparatus exchange data with the display device 2, the image projection device 3, and the server 4. The external IF 105 is an interface for exchanging data between the information processing apparatus 1 and an external device 106. Examples of the external device 106 include:
- input devices that accept operation input, such as a numeric keypad or touch panel;
- external storage devices with a large storage capacity; and
- drive devices that write to or read from various storage media.

The display device 2 corresponds to a liquid crystal display or the like. It displays various information, such as processing results of the information processing apparatus 1. It also displays toward the target the content selected by the information processing of the information processing apparatus 1 described later.

The image projection device 3 is a device with an optical projection engine and a projection function; it corresponds to a projector or the like. In this embodiment, the image projection device 3 projects onto a projection surface the same content as that displayed on the display device 2. As illustrated in FIG. 3, the image projection device 3 includes a CPU 301, a memory controller 302, a main memory 303, a host-PCI (Peripheral Component Interconnect) bridge 304, and the like. The memory controller 302 is connected to the CPU 301, the main memory 303, the host-PCI bridge 304, and the like via a host bus 311.

The CPU 301 is an arithmetic unit that controls the entire image projection device 3. The memory controller 302 is a control circuit that controls reading from and writing to the main memory 303. The main memory 303 is a semiconductor memory used, for example, to store programs and data, as a working area into which programs and data are loaded, and as drawing memory.

The host-PCI bridge 304 is a bridge circuit for connecting peripheral devices and a PCI device 305. It has the following connections:
- to a memory card 306 via an HDD I/F 312;
- to the PCI device 305 via a PCI bus 313; and
- to a communication card 307, a wireless communication card 308, a video card 309, and the like via the PCI bus 313 and PCI slots 314.

The memory card 306 is a storage medium used as the boot device for the basic software (OS: Operating System). The communication card 307 and the wireless communication card 308 are communication control devices. They connect the image projection device 3 to a network such as a LAN or to a communication line and control data communication. The video card 309 is a display control device that controls display of the image output to the projection surface. The control program executed by the image projection device 3 of this embodiment is provided pre-installed, for example in the storage area of the main memory 303.

[Configuration of information processing apparatus]
FIG. 4 is a block diagram illustrating the functional configuration of the information processing apparatus 1 of this embodiment. As illustrated in FIG. 4, the information processing apparatus 1 receives content distributed as appropriate from the server 4. It stores that content in a content storage unit 11 implemented by various memories. Each piece of content is stored in association with a content number that identifies it. The content storage unit 11 also stores a content attribute table which, as illustrated in FIG. 5, associates attributes with content numbers. Here, attributes are characteristics of the target to whom the content provides information, such as gender, age, language, and level of interest. The level of interest is the degree (high or low) of the target's interest in the information the content provides.

The information processing apparatus 1 receives information from the microphone 5, which serves as audio acquisition means. The CPU 101 reads an information processing program from the auxiliary storage device 103, loads it into the RAM, and executes it. This implements the target attribute estimation unit 14 and the content selection acquisition unit 15 on the RAM. The functions of these units are described later.

The information processing program executed by the information processing apparatus 1 of this embodiment is provided as a file in an installable or executable format. The file is recorded on a computer-readable recording medium such as a CD-ROM, flexible disk (FD), CD-R, or DVD (Digital Versatile Disk).

Alternatively, the program may be stored on a computer connected to a network such as the Internet and provided by download via the network. It may also be provided or distributed via such a network, or provided pre-installed in a ROM or the like.

The information processing program executed by the information processing apparatus 1 of this embodiment has a module configuration that includes the units described above (the target attribute estimation unit 14 and the content selection acquisition unit 15). In terms of actual hardware, the CPU 101 reads the program from the storage medium and executes it. This loads these units onto the main storage device 102, where they are generated.

[Information processing]
FIG. 6 is a flowchart showing the information processing procedure in the information processing apparatus 1. The processing in FIG. 6 starts, for example, when an operator inputs an instruction to start it, and proceeds to step S1.

In step S1, the microphone 5 acquires an audio signal containing the target's voice. Step S1 is then complete, and the processing proceeds to step S2.

In step S2, the target attribute estimation unit 14 estimates attributes of the target, such as the target's language and level of interest, based on the audio signal acquired in step S1. Step S2 is then complete, and the processing proceeds to step S3.

A method of estimating the target's attributes is now described with reference to FIG. 7. Suppose, for example, that the task is to decide which of attributes A, B, and C the target belongs to.

As shown in FIG. 7, the target attribute estimation unit 14 first extracts feature quantities from the acquired audio signal. Feature quantities include, for example, the signal energy, frequency spectrum, or MFCCs (mel-frequency cepstral coefficients) of each fixed-length segment. MFCCs are used in speech recognition and similar tasks and incorporate properties of human hearing. They are obtained as follows:
1. Take the absolute value of the frequency spectrum obtained by an FFT.
2. Pass the spectrum through a filter bank spaced at equal intervals on the mel scale, a scale of perceived pitch that reflects human hearing, and sum the spectrum in each band.
3. Take the logarithm of each band sum, apply a discrete cosine transform (DCT), and keep the low-order components.
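The MFCC steps above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the patent's implementation. The Hamming window, 512-point FFT, 26 mel filters, 13 retained coefficients, and the framing helper are assumed values chosen for the example. The magnitude (absolute-value) spectrum is used, as the text describes, although many toolkits use the power spectrum instead.

```python
import numpy as np


def hz_to_mel(f):
    # A common mel-scale formula (several variants are in use).
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    out = x @ basis.T * np.sqrt(2.0 / n)
    out[..., 0] /= np.sqrt(2.0)
    return out


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into fixed-length, possibly overlapping segments."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[t * hop: t * hop + frame_len] for t in range(n_frames)])


def mfcc(frame, sample_rate, n_fft=512, n_filters=26, n_coeffs=13):
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))              # |FFT|
    band_sums = mel_filterbank(n_filters, n_fft, sample_rate) @ magnitude
    log_bands = np.log(band_sums + 1e-10)                           # avoid log(0)
    return dct_ii(log_bands)[:n_coeffs]                             # keep low-order terms
```

For 16 kHz audio, `frame_signal(x, 400, 160)` gives 25 ms segments every 10 ms. Applying `mfcc` to each row yields one 13-dimensional feature vector per segment.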

Next, the target attribute estimation unit 14 uses the extracted feature quantities to calculate the likelihood under each attribute's model. Each attribute model captures the characteristics of that attribute using, for example, a GMM (Gaussian mixture model) or an HMM (hidden Markov model). The parameters each model uses for the likelihood calculation are learned in advance from feature quantities extracted from samples of that attribute. For a GMM, for example, they are the weight, mean, and covariance of each multidimensional Gaussian component. The likelihood indicates how plausible the observation is under a model: the higher the likelihood, the better the features match the model. The likelihood of an attribute can therefore be calculated from the parameters of that attribute's model and the extracted feature quantities. For a GMM, for example, the likelihood is given by equation (1). In that equation, L is the likelihood, x is the MFCC vector, and w_k is the k-th multidimensional Gaussian distribution.
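Equation (1) itself is missing from this text. For reference, the standard GMM likelihood is shown below, written with the parameters the paragraph names (per-component weights, means, and covariances). Two points are assumptions, not details confirmed by the source: reading w_k as the mixing weight of the k-th D-dimensional Gaussian component, and summing log-likelihoods over the T feature frames.

```latex
\begin{aligned}
L &= \sum_{k=1}^{K} w_k\, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
  \qquad \sum_{k=1}^{K} w_k = 1, \\
\mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}, \boldsymbol{\Sigma})
  &= \frac{1}{(2\pi)^{D/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
     \exp\!\left(-\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right), \\
\log L(\mathbf{x}_1,\dots,\mathbf{x}_T) &= \sum_{t=1}^{T} \log \sum_{k=1}^{K} w_k\, \mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k).
\end{aligned}
```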

The target attribute estimation unit 14 then uses these likelihoods to decide which attribute the target belongs to. For example, it assigns the target to the attribute whose model gives the maximum likelihood.
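The maximum-likelihood decision can be sketched as below. The sketch assumes each attribute is modeled by a GMM with diagonal covariances, a common simplification; the text does not say which covariance form is used. The class name `DiagonalGMM`, the function `estimate_attribute`, and the toy parameters are hypothetical. In practice the parameters would be learned from labelled samples as described above, for example with the EM algorithm.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DiagonalGMM:
    weights: np.ndarray    # (K,)   mixing weights, summing to 1
    means: np.ndarray      # (K, D) component means
    variances: np.ndarray  # (K, D) diagonal covariance entries

    def log_likelihood(self, frames: np.ndarray) -> float:
        """Sum over frames of log sum_k w_k N(x_t; mu_k, diag(var_k)); frames has shape (T, D)."""
        x = np.atleast_2d(frames)
        diff = x[:, None, :] - self.means[None, :, :]                            # (T, K, D)
        log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * self.variances), axis=1)   # (K,)
        log_exp = -0.5 * np.sum(diff ** 2 / self.variances[None, :, :], axis=2)  # (T, K)
        log_comp = np.log(self.weights)[None, :] + log_norm[None, :] + log_exp   # (T, K)
        peak = log_comp.max(axis=1, keepdims=True)                               # log-sum-exp trick
        per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
        return float(per_frame.sum())


def estimate_attribute(frames, models):
    """Return the attribute label whose model gives the highest likelihood."""
    scores = {label: model.log_likelihood(frames) for label, model in models.items()}
    return max(scores, key=scores.get)
```

Log-likelihoods are compared instead of raw likelihoods. Multiplying many per-frame densities underflows quickly, while summing their logarithms does not, and the argmax is unchanged.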

Three or more attributes, or composite attributes that combine several attributes, can be estimated in the same way. For example, the target attribute estimation unit 14 predefines attributes that pair a language (A, B, ...) with a level of interest (high, medium, low). Examples are (language A, high interest), (language A, low interest), and (language B, medium interest). The unit then calculates the likelihood of the target's feature quantities under each such attribute's model and estimates which attribute the target belongs to.
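Composite attributes amount to enumerating every combination of the individual labels and treating each pair as one class with its own model. The sketch below builds the (language, interest) label set with `itertools.product` and splits the winning pair back into its parts; its names are hypothetical. The likelihood scores are assumed to come from per-class models such as the GMMs discussed above.

```python
from itertools import product
from typing import Dict, List, Tuple

Composite = Tuple[str, str]  # (language, interest level)


def composite_labels(languages, interest_levels) -> List[Composite]:
    """Every (language, interest) pair; each pair would get its own trained model."""
    return list(product(languages, interest_levels))


def decide_composite(scores: Dict[Composite, float]) -> Dict[str, str]:
    """Pick the highest-scoring pair and split it into separate attributes."""
    language, interest = max(scores, key=scores.get)
    return {"language": language, "interest": interest}
```

With two languages and three interest levels this gives six classes. The number of models grows multiplicatively with each attribute added, which is one reason to estimate the level of interest separately, as the text goes on to describe.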

In step S3, the content selection acquisition unit 15 refers to the content attribute table shown in FIG. 5 and selects the content corresponding to the attribute estimated in step S2. It then acquires that content from the content storage unit 11. Step S3 is then complete, and the processing proceeds to step S4.
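Step S3 is essentially a two-stage lookup. The content attribute table maps an attribute to a content number, and the content storage unit 11 maps that number to content. The sketch below assumes simple in-memory dictionaries. The attribute values, content numbers, and file names are invented for illustration and are not taken from FIG. 5.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TargetAttribute:
    language: str  # e.g. "ja", "zh" (illustrative codes)
    interest: str  # "high", "medium" or "low"


# Hypothetical content attribute table: attribute -> content number.
CONTENT_ATTRIBUTE_TABLE: Dict[TargetAttribute, int] = {
    TargetAttribute("ja", "high"): 1,
    TargetAttribute("ja", "low"): 2,
    TargetAttribute("zh", "medium"): 3,
}

# Hypothetical content storage unit: content number -> stored content.
CONTENT_STORAGE: Dict[int, str] = {1: "sale_details_ja.mp4", 2: "teaser_ja.mp4", 3: "teaser_zh.mp4"}


def select_and_acquire(attribute: TargetAttribute,
                       table: Dict[TargetAttribute, int],
                       storage: Dict[int, str]) -> Optional[str]:
    """Return the content registered for the attribute, or None if nothing matches."""
    number = table.get(attribute)
    return storage.get(number) if number is not None else None
```

The frozen dataclass makes each attribute hashable, so it can serve directly as a table key. Returning `None` for an unregistered attribute leaves the fallback policy (for example, keep showing the current content) to the caller.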

In step S4, the display device 2 or the image projection device 3, serving as display means, displays the content acquired in step S3. Step S4 is then complete, and the series of processing ends.
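Steps S1 to S4 chain together into a single pass of the FIG. 6 flowchart. The sketch below wires them up with injected callables so the flow can be read and tested without real hardware. `run_signage_once` and all of its parameter names are hypothetical.

```python
from typing import Callable, Hashable, Mapping, Sequence


def run_signage_once(
    acquire_audio: Callable[[], Sequence[float]],               # S1: microphone 5
    estimate_attribute: Callable[[Sequence[float]], Hashable],  # S2: target attribute estimation unit 14
    table: Mapping[Hashable, int],                              # content attribute table
    storage: Mapping[int, object],                              # content storage unit 11
    display: Callable[[object], None],                          # S4: display device 2 or projector 3
) -> bool:
    """Run S1 to S4 once; return False if no content is registered for the attribute."""
    audio = acquire_audio()                    # S1: acquire the audio signal
    attribute = estimate_attribute(audio)      # S2: estimate the target's attribute
    number = table.get(attribute)              # S3: look up the content number...
    if number is None:
        return False
    display(storage[number])                   # ...fetch the content and show it (S4)
    return True
```

Passing the steps in as functions keeps the control flow independent of any particular microphone, estimator, or display.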

The level of interest, one of the attributes estimated in step S2, may instead be estimated by the method described below. As illustrated in FIG. 8, the target attribute estimation unit 14 first checks the acquired audio signal for two things: whether the target is speaking, and whether specific words are spoken. It then estimates the level of interest from these results. A target who talks while watching the content is considered more interested than one who watches in silence, so the presence or absence of speech serves as a measure of interest. Specific words are keywords that indicate interest, such as "want" or "cheap". The unit estimates the level of interest as follows:
- high, when both speech and a specific word are detected;
- medium, when speech is detected but no specific word is; and
- low, when neither is detected.

The specific words are registered in a word database in advance. Alternatively, they are registered automatically: when the microphone 5 acquires the target's audio signal, specific words contained in the content currently shown on the display device 2 or on the projection surface of the image projection device 3 are added to the word database. When estimating the level of interest, the target attribute estimation unit 14 refers to the word database to determine which specific words to detect.
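The FIG. 8 rule reduces to two boolean checks, sketched below. The sketch assumes that voice activity detection and speech recognition have already run upstream, so it receives a transcript, or `None` when no speech was detected. The whitespace tokenizer suits only space-delimited languages; Japanese text would need a morphological analyzer. The function names are illustrative, and the English keywords stand in for the text's examples "want" and "cheap".

```python
import string
from typing import Iterable, Optional, Set


def register_displayed_words(word_db: Set[str], displayed_words: Iterable[str]) -> None:
    """Add specific words taken from the content currently on screen to the word database."""
    word_db.update(word.lower() for word in displayed_words)


def estimate_interest(transcript: Optional[str], word_db: Set[str]) -> str:
    """Speech + specific word -> high; speech only -> medium; no speech -> low."""
    if transcript is None or not transcript.strip():
        return "low"                                   # no utterance detected
    tokens = {tok.strip(string.punctuation).lower() for tok in transcript.split()}
    return "high" if tokens & word_db else "medium"
```

For example, after `register_displayed_words(db, ["Sale"])`, the utterance "Big SALE!" is rated high because "sale" now appears in the word database.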

When the level of interest is estimated separately from the other attributes as described above, the content attribute table used in step S3 may keep the level of interest separate from the other attributes, as shown in FIG. 9.

The content attribute table registers in advance the content numbers that correspond to attributes such as the level of interest and the language. For example:
- a low level of interest is associated with the initial content;
- a medium level is associated with the initial content translated into the target's language; and
- a high level is associated with content that also includes additional information.

In this way, content suited to the target's attributes, such as level of interest and language, is shown on the display device 2 or the projection surface of the image projection device 3.
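The tiering just described can be modeled as a table keyed first by level of interest and then by language. In the sketch below the content numbers and language codes are invented. Two behaviors are assumptions, not stated in the source: the `None` language key, meaning "any language", and the fallback to the initial content when no translation exists.

```python
from typing import Dict, Optional

INITIAL_CONTENT = 100  # hypothetical number of the initial content

# Hypothetical FIG. 9-style table: interest level -> language -> content number.
TIERED_TABLE: Dict[str, Dict[Optional[str], int]] = {
    "low": {None: INITIAL_CONTENT},    # same initial content for everyone
    "medium": {"en": 101, "zh": 102},  # initial content translated
    "high": {"en": 201, "zh": 202},    # translated content plus additional information
}


def select_tiered(interest: str, language: str,
                  table: Dict[str, Dict[Optional[str], int]] = TIERED_TABLE,
                  fallback: int = INITIAL_CONTENT) -> int:
    """Look up by interest level, then language; fall back when no entry exists."""
    row = table.get(interest, {})
    return row.get(language, row.get(None, fallback))
```

Keeping level of interest as the outer key matches the idea that interest can be estimated separately, for example by the keyword rule above, while language comes from the acoustic models.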

When the audio signal acquired in step S1 is multi-channel, the information processing apparatus 1 can detect the target's position by estimating the direction of the sound source. It does so through array processing such as the DS (Delay and Sum Beamformer) method or the MUSIC (MUltiple SIgnal Classification) method. The apparatus can then adjust, according to the detected position, where the content selected in step S3 appears on the display device 2 or on the projection surface of the image projection device 3. For example, as illustrated in FIG. 10, suppose the target is detected to the left of the display device 2 or the projection surface. The newly selected content b can then be displayed superimposed on the left side of the previously displayed content a.
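A minimal frequency-domain delay-and-sum (DS) direction estimate for a linear microphone array is sketched below; MUSIC, also named in the text, is not shown. The sketch scans candidate angles, phase-aligns the channels for each one, and keeps the angle with the largest output power. The following are assumptions for illustration:
- the far-field (plane-wave) model;
- the 343 m/s speed of sound;
- the angle convention (positive toward the +x end of the array); and
- the mapping of that end to the left of the screen in `display_side`.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (room temperature)


def delay_and_sum_doa(signals, mic_x, fs, angles_deg):
    """Return the candidate angle (degrees) whose steered sum has the most power.

    signals: (M, N) array, one row per microphone.
    mic_x:   (M,) microphone positions along the array axis, in metres.
    Positive angles point toward the +x end of the array.
    """
    n_samples = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)                # (M, F)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)        # (F,)
    powers = []
    for theta in np.deg2rad(angles_deg):
        # A plane wave from theta reaches mics on the +x side earlier (negative delay).
        delays = -mic_x * np.sin(theta) / SPEED_OF_SOUND  # (M,) seconds
        # Undo each channel's delay so the wavefront adds in phase, then sum channels.
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        beam = (spectra * steer).sum(axis=0)
        powers.append(np.sum(np.abs(beam) ** 2))
    return float(angles_deg[int(np.argmax(powers))])


def display_side(angle_deg, dead_zone_deg=15.0):
    """Map a direction to a screen region, assuming the +x end of the array is the screen's left."""
    if angle_deg > dead_zone_deg:
        return "left"
    if angle_deg < -dead_zone_deg:
        return "right"
    return "center"
```

Only the correct steering angle makes every frequency bin add coherently. A broadband voice signal therefore gives a clear power peak at the true direction, even where individual high frequencies would alias.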

(Second Embodiment)
The information processing system 10 of the second embodiment has the same configuration as the information processing system 10 of the first embodiment shown in FIG. 1, except that it includes a camera 6 as video acquisition means. FIG. 11 is a diagram illustrating the functional configuration of the information processing apparatus 1 according to the second embodiment. The camera 6 is installed near the display device 2 or the projection surface of the image projection device 3 and acquires a video signal containing images of the target. In addition to the target attribute estimation unit 14 and the content selection acquisition unit 15, which are the same as in the first embodiment shown in FIG. 4, the information processing program loads a target voice extraction unit 12, a target position detection unit 13, and a display position adjustment unit 16 into the RAM. Because the information processing system 10 of this embodiment acquires a video signal in addition to the audio signal, it can detect the position of the target more accurately and, as described later, can also detect the positions of multiple targets. It can therefore display multiple pieces of content according to the respective positions of multiple targets.

Note that the information processing apparatus 1 may incorporate the microphone 5, the camera 6, the display device 2 or the image projection device 3, and the like, so that they form a single integrated piece of hardware.

FIG. 12 illustrates an example of the information processing procedure in the information processing apparatus 1 according to the second embodiment. As in the first embodiment, the information processing shown in FIG. 12 starts, for example, when an operator inputs an instruction to start information processing, and proceeds to step S11.

In step S11, as in step S1 of the first embodiment, the microphone 5 acquires an audio signal containing the target's voice. In addition, the camera 6 acquires a video signal containing images of the target. Step S11 is then complete, and the processing proceeds to step S12.

In step S12, the target position detection unit 13 detects where the target is located relative to the display device 2 or the projection surface of the image projection device 3, based on the audio signal and the video signal acquired in step S11. For example, as described above, the position of the target is detected by estimating the direction of the sound source from the multi-channel audio signal through array processing such as the DS method or the MUSIC method. In addition, the position of the target is detected by identifying the target's face or body in the video signal. In this embodiment, using the audio signal and the video signal in combination improves the accuracy of target position detection. Step S12 is then complete, and the processing proceeds to step S13.
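The description does not say how the audio-based and video-based position estimates are combined. One common option, shown here purely as an assumption, is inverse-variance weighting of two azimuth estimates; the `Estimate` type, the function name, and the variance values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Estimate:
    azimuth_deg: float  # direction of the target as seen from the signage
    variance: float     # assumed uncertainty of this estimate, in deg^2


def fuse_estimates(audio: Optional[Estimate],
                   video: Optional[Estimate]) -> Optional[Estimate]:
    """Combine audio- and video-based azimuths by inverse-variance weighting,
    falling back to whichever single estimate is available."""
    if audio is None or video is None:
        return audio if audio is not None else video
    w_audio = 1.0 / audio.variance
    w_video = 1.0 / video.variance
    total = w_audio + w_video
    return Estimate(
        azimuth_deg=(w_audio * audio.azimuth_deg + w_video * video.azimuth_deg) / total,
        variance=1.0 / total,  # smaller than either input variance
    )
```

For example, an audio estimate of 20 degrees (variance 100) fused with a video estimate of 10 degrees (variance 25) gives 12 degrees with variance 20: the less noisy video estimate dominates.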

In step S13, the target voice extraction unit 12 extracts the target's voice signal from the audio signal acquired in step S11, based on the target position detected in step S12. For example, the target voice extraction unit 12 extracts the voice signal of the target who is speaking from among the multiple people captured in the video. In addition, when the input audio signal contains noise other than the target's voice, the target voice extraction unit 12 reduces the influence of that noise and extracts only the target's voice. Specifically, when the audio signal is single-channel, the target voice extraction unit 12 uses a method such as spectral subtraction. When the audio signal has multiple channels, the target voice extraction unit 12 uses beamforming such as the DS method or the MVDR (Minimum Variance Distortionless Response) method, or blind source separation using ICA (Independent Component Analysis) or the like. When beamforming or a similar method is used, the target sound is extracted based on the target's position. By combining the audio signal and the video signal in this way, the positions and voices of multiple speaking targets captured in the video can be detected. Step S13 is then complete, and the processing proceeds to step S14.
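For the single-channel case, magnitude spectral subtraction can be sketched as follows. This is a simplified illustration under stated assumptions: non-overlapping rectangular frames, a noise spectrum averaged over the first few frames (assumed to contain no speech), the noisy phase reused for resynthesis, and a naive O(N²) DFT so that the example needs only the standard library. A practical system would use an FFT with overlapping windowed frames; the multi-channel methods (DS/MVDR beamforming, ICA) are not shown.

```python
import cmath
import math
from typing import List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def spectral_subtraction(signal: Sequence[float], frame_len: int = 64,
                         noise_frames: int = 4, floor: float = 0.0) -> List[float]:
    """Subtract an average noise magnitude spectrum from every frame.
    The first `noise_frames` frames are assumed to contain noise only."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        return []
    spectra = [dft(f) for f in frames]
    n_noise = max(1, min(noise_frames, len(spectra)))
    noise_mag = [sum(abs(s[k]) for s in spectra[:n_noise]) / n_noise
                 for k in range(frame_len)]
    out: List[float] = []
    for spec in spectra:
        cleaned = []
        for k, x in enumerate(spec):
            # Subtract the noise magnitude, never going below the spectral floor.
            mag = max(abs(x) - noise_mag[k], floor * noise_mag[k])
            cleaned.append(cmath.rect(mag, cmath.phase(x)))  # keep the noisy phase
        out.extend(idft(cleaned))
    return out
```

Any trailing samples that do not fill a whole frame are dropped in this sketch.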

In step S14, the target attribute estimation unit 14 estimates attributes of the target, such as the language used and the degree of interest, based on the target's voice signal extracted in step S13 and the video signal acquired in step S11. Step S14 is then complete, and the processing proceeds to step S15.

The attributes are estimated in the same way as in step S2. That is, the attribute estimation method shown in FIG. 7 can be applied, except that features are extracted from the video signal in addition to the acquired audio signal. In this embodiment, the interest level estimation unit 141 of the target attribute estimation unit 14 estimates the target's degree of interest, and the attribute estimation unit 142 estimates other attributes, such as the language the target uses. Because a video signal is acquired in addition to the audio signal, the degree of interest is estimated more accurately. For example, as illustrated in FIG. 13, the interest level estimation unit 141 first detects the target's face from the acquired video signal. As in the processing shown in FIG. 8, the interest level estimation unit 141 also detects, from the acquired audio signal, whether the target is speaking and whether a specific word extracted from the word database is uttered. Detecting the target's face means that the target was looking at the content displayed on the display device 2 or the projection surface of the image projection device 3 when the camera 6 acquired the video signal.
Therefore, the interest level estimation unit 141 estimates, for example, that the degree of interest is high when a face, an utterance, and a specific word are all detected. It estimates that the degree of interest is medium when a face and an utterance are detected but no specific word is detected, and low when a face is detected but no utterance is detected.

When no face is detected in the acquired video signal, the target is not looking at the content. The interest level estimation unit 141 therefore issues notification information, for example that the target is outside the scope of the system, and the subsequent processing is canceled or suspended.
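The decision rules above can be written as a small function. This sketch assumes that face detection, utterance detection, and word spotting have already produced boolean results (those detectors are not shown); the whitespace-based `contains_specific_word` helper and its word set are hypothetical simplifications of the word database lookup.

```python
from typing import Optional, Set


def estimate_interest(face: bool, utterance: bool,
                      specific_word: bool) -> Optional[str]:
    """Map detection results to an interest level following the rules above.
    Returns None when no face is found (target treated as out of scope)."""
    if not face:
        return None  # caller issues a notification and stops further processing
    if utterance and specific_word:
        return "high"
    if utterance:
        return "medium"
    return "low"


def contains_specific_word(transcript: str, word_database: Set[str]) -> bool:
    """Hypothetical word spotting: does any whitespace-separated word match?"""
    return any(word in word_database for word in transcript.lower().split())
```

For example, `estimate_interest(True, True, contains_specific_word("Where is the sale", {"sale"}))` yields `"high"`.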

In step S15, as in step S3, the content selection acquisition unit 15 refers to the content attribute table shown in FIG. 5 or FIG. 9, selects the content corresponding to the attributes estimated in step S14, and acquires it from the content storage unit 11. Step S15 is then complete, and the processing proceeds to step S16.

In step S16, the display position adjustment unit 16 adjusts the position at which the content is displayed on the display device 2 or the projection surface of the image projection device 3, as shown in FIG. 10. Moreover, because this embodiment can estimate the positions of multiple targets as described above, the display positions can also be adjusted so that different content, selected according to each target's attributes, is displayed for each of the targets, as shown for example in FIG. 14. Step S16 is then complete, and the processing proceeds to step S17.
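One possible way to turn detected target positions into display positions is sketched below. It assumes each target's horizontal position has been normalized to the range 0 (left edge) to 1 (right edge) of the screen and that every piece of content has the same pixel width; centering on the target, clamping to the screen, and shifting right to avoid overlap are illustrative choices, not rules stated in the description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Placement:
    content_id: int
    left_px: int  # x coordinate of the content's left edge on the screen


def place_contents(targets: List[Tuple[float, int]], screen_width: int,
                   content_width: int) -> List[Placement]:
    """targets: (x_norm, content_id) pairs, x_norm in [0, 1] from the left edge."""
    placements = []
    next_free = 0  # leftmost x not yet covered by an earlier placement
    for x_norm, content_id in sorted(targets):
        left = round(x_norm * screen_width - content_width / 2)  # centre on target
        left = max(left, next_free)                     # avoid overlapping earlier content
        left = min(left, screen_width - content_width)  # keep the right edge on screen
        placements.append(Placement(content_id, left))
        next_free = left + content_width
    return placements
```

With more targets than fit side by side, the final clamp can reintroduce overlap; a real layout would need a policy for that case (for example, superimposing content as in FIG. 10).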

In step S17, the display device 2 or the image projection device 3, serving as display means, displays the content acquired in step S15 at the display position adjusted in step S16. Step S17 is then complete, and the series of information processing ends.
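Steps S11 to S17 can be summarized as one processing cycle in which each functional unit is a pluggable callable. This is a structural sketch only: the callable names, their signatures, and the choice to process each detected target separately are assumptions; in a real system the callables would wrap the microphone 5, the camera 6, and units 12 to 16.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class Target:
    position: float                    # e.g. normalised horizontal position
    voice: Sequence[float]             # voice extracted for this target (S13)
    attributes: Optional[dict] = None  # e.g. {"interest": ..., "language": ...}


def run_signage_cycle(
    capture: Callable[[], Tuple[Any, Any]],
    detect_positions: Callable[[Any, Any], List[float]],
    extract_voice: Callable[[Any, float], Sequence[float]],
    estimate_attributes: Callable[[Sequence[float], Any], Optional[dict]],
    select_content: Callable[[dict], Any],
    adjust_layout: Callable[[List[Tuple[Target, Any]]], Any],
    display: Callable[[Any], None],
) -> Any:
    audio, video = capture()                                           # S11
    positions = detect_positions(audio, video)                         # S12
    targets = [Target(p, extract_voice(audio, p)) for p in positions]  # S13
    selected = []
    for target in targets:
        target.attributes = estimate_attributes(target.voice, video)   # S14
        if target.attributes is None:  # e.g. no face detected: out of scope
            continue
        selected.append((target, select_content(target.attributes)))  # S15
    layout = adjust_layout(selected)                                   # S16
    display(layout)                                                    # S17
    return layout
```

Passing the units in as callables keeps the sketch testable with stubs and mirrors the functional split of FIG. 11 without committing to any particular detector.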

As described above, according to the present invention, attributes are estimated from voice and content corresponding to the estimated attributes is selected and displayed, so appropriate content can be displayed according to the target's attributes. In addition, by detecting the position of the target, the display position of the content can be adjusted according to that position. Furthermore, acquiring a video signal in addition to the audio signal allows the target's position to be detected with high accuracy, so the display position of the content can be adjusted accurately. Acquiring a video signal in addition to the audio signal also makes it possible to detect the positions of multiple targets, so multiple pieces of content can be displayed according to the respective positions of those targets.

Description of Symbols
1 Information processing apparatus
2 Display device
3 Image projection device
4 Server
5 Microphone (voice acquisition means)
6 Camera (video acquisition means)
10 Information processing system
12 Target voice extraction unit
13 Target position detection unit
14 Target attribute estimation unit
141 Interest level estimation unit
142 Attribute estimation unit
15 Content selection acquisition unit
16 Display position adjustment unit

JP 2010-191487 A
JP 2012-185303 A
JP 2012-83925 A
JP 2012-118623 A
JP 2012-185302 A

Claims

An information processing system comprising:
voice acquisition means for acquiring an audio signal including the voice of a target;
target attribute estimation means for estimating an attribute of the target based on the audio signal acquired by the voice acquisition means;
content selection acquisition means for selecting and acquiring content based on the attribute estimated by the target attribute estimation means; and
display means for displaying the content acquired by the content selection acquisition means toward the target.

The information processing system according to claim 1, wherein the attribute includes a language used and/or a degree of interest.

The information processing system according to claim 1, further comprising video acquisition means for acquiring a video signal including video of the target, wherein the target attribute estimation means estimates the attribute of the target based on the video signal acquired by the video acquisition means and the audio signal acquired by the voice acquisition means.

The information processing system according to claim 1, wherein the target attribute estimation means estimates the degree of interest by detecting one or more of the target's face, an utterance, and a specific word.

The information processing system according to claim 4, wherein the target attribute estimation unit determines the specific word to be detected from words included in the content.

The information processing system according to any one of claims 1 to 5, further comprising:
target position detection means for detecting the position of the target based on the audio signal acquired by the voice acquisition means and/or the video signal acquired by the video acquisition means; and
display position adjusting means for adjusting the display position of the content displayed on the display means based on the position of the target detected by the target position detection means.

An information processing method comprising:
an audio acquisition step of acquiring an audio signal including the voice of a target;
a target attribute estimation step of estimating an attribute of the target based on the audio signal acquired in the audio acquisition step;
a content selection acquisition step of selecting and acquiring content based on the attribute estimated in the target attribute estimation step; and
a display step of displaying the content acquired in the content selection acquisition step toward the target.

An information processing apparatus comprising:
voice acquisition means for acquiring an audio signal including the voice of a target;
target attribute estimation means for estimating an attribute of the target based on the audio signal acquired by the voice acquisition means;
content selection acquisition means for selecting and acquiring content based on the attribute estimated by the target attribute estimation means; and
display means for displaying the content acquired by the content selection acquisition means toward the target.