JP2017059916A

JP2017059916A - Content output device, content output method, and program

Info

Publication number: JP2017059916A
Application number: JP2015181307A
Authority: JP
Inventors: 隆司川下; Takashi Kawashita; 俊彦吉田; Toshihiko Yoshida; 洋一村山; Yoichi Murayama; 剛川上; Takeshi Kawakami; 麻実麻生; Asami Aso; 和真川原; Kazuma Kawahara
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2015-09-15
Filing date: 2015-09-15
Publication date: 2017-03-23

Abstract

PROBLEM TO BE SOLVED: To enable the output of content that can attract people's interest. SOLUTION: In a digital signage device 1, a control unit 23 executes person detection processing on a captured image acquired by an imaging unit 31. When a person is detected by the person detection processing, the control unit 23 executes face recognition processing on the captured image acquired by the imaging unit 31 and determines, on the basis of the recognized face image, whether the reaction of the detected person satisfies a predetermined condition. When it determines that the condition is satisfied, the control unit 23 causes an audio output unit 33 to output audio content 331 whose content differs from that of display content 271 output by an image forming unit 27. SELECTED DRAWING: Figure 4

Description

The present invention relates to a content output device, a content output method, and a program.

Conventionally, there are known devices equipped with a video output device that project content onto a screen shaped like the outline of a person or the like appearing in the content (see, for example, Patent Document 1).

Patent Document 1: JP 2011-150221 A

However, a content output device such as the one described in Patent Document 1 merely plays predetermined content periodically or repeatedly. It is therefore difficult to attract people's interest, and even when a person comes in front of the content output device, the person may simply walk past or leave right away.

An object of the present invention is to enable content output that can attract people's interest.

In order to solve the above problem, the content output device of the invention according to claim 1 comprises:
first output means for outputting content;
second output means for outputting content in an output format different from that of the first output means;
person detection means for detecting a person; and
control means for causing the second output means to output, based on a detection result of the person detection means, content that differs in substance from the content being output by the first output means.

According to the present invention, content that can attract people's interest can be output.

FIG. 1 is a block diagram showing the functional configuration of the digital signage device in the present embodiment.
FIG. 2 is a diagram showing the schematic configuration of the screen unit of FIG. 1.
FIG. 3(a) is a diagram showing an example of basic utterance sentence data, FIG. 3(b) is a diagram showing an example of a misstatement conversion table, and FIG. 3(c) is a diagram showing an example of a word definition table.
FIG. 4 is a flowchart showing the content output processing executed by the control unit of FIG. 1.
FIG. 5(a) is a diagram showing an example of the normal content output in step S5 of FIG. 4, and FIG. 5(b) is a diagram showing an example of the content with a misstatement output in step S6 of FIG. 4.
FIG. 6 is a diagram showing examples of utterance sentences of the normal content and of the content with a misstatement.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings. In the following embodiment, a case where a digital signage device 1 is applied as the content output device according to the present invention is described as an example. The present invention is not limited to the illustrated example.

[Configuration of Digital Signage Device 1]
FIG. 1 is a block diagram showing the main control configuration of the digital signage device 1 in the present embodiment. FIG. 2 is a front view showing the schematic configuration of the screen unit 22. The digital signage device 1 is installed in, for example, a store and outputs content that describes products.

As shown in FIG. 1, the digital signage device 1 includes a projection unit 21 that emits the image light of content, and a screen unit 22 that receives the image light emitted from the projection unit 21 on its rear surface and projects it onto its front surface.

First, the projection unit 21 will be described.
The projection unit 21 includes a control unit 23, a projector 24, a storage unit 25, and a communication unit 26. The projector 24, the storage unit 25, and the communication unit 26 are connected to the control unit 23 as shown in FIG. 1.

The control unit 23 includes a CPU (Central Processing Unit) that executes the various programs stored in the storage unit 25 to perform predetermined calculations and control each unit, and a memory that serves as a work area during program execution (neither is shown). The control unit 23 functions as control means in cooperation with the programs stored in a program storage unit 251 of the storage unit 25. The control unit 23 also functions as person detection means in cooperation with an imaging unit 31.

The projector 24 is a projection device that converts image data output from the control unit 23 into image light and emits it toward the screen unit 22. As the projector 24, for example, a DLP (Digital Light Processing) (registered trademark) projector can be used. Such a projector uses a DMD (digital micromirror device), a display element in which a plurality of micromirrors arranged in an array (1024 horizontal × 768 vertical pixels in the case of XGA) are individually switched on and off at high speed by changing their tilt angles, so that the reflected light forms an optical image.
The projector 24, together with an image forming unit 27, constitutes the first output means.

The storage unit 25 is composed of an HDD (Hard Disk Drive), a nonvolatile semiconductor memory, or the like. As shown in FIG. 1, the storage unit 25 is provided with a program storage unit 251 and a content storage unit 252.

The program storage unit 251 stores the system program and various processing programs executed by the control unit 23, as well as the data required to execute these programs.

The content storage unit 252 stores information for outputting content. In the present embodiment, content consists of content in two different output formats: display content 271 and audio content 331. Specifically, the content storage unit 252 stores moving image data composed of a plurality of frame images for outputting the display content 271, and information that defines the text data for outputting the audio content 331 corresponding to each frame image (the basic utterance sentence data 252a shown in FIG. 3(a), the misstatement conversion table 252b shown in FIG. 3(b), and the word definition table 252c shown in FIG. 3(c)). The audio content 331 is the spoken voice of a character C included in the display content 271 (see FIGS. 5(a) and 5(b)).

As shown in FIG. 3(a), the basic utterance sentence data 252a is information for specifying the text data of the basic utterance sentence of the audio content 331. The basic utterance sentence data 252a contains change locations A and B, which, as shown in the record with number 0 in the misstatement conversion table 252b of FIG. 3(b), contain the words (#SENDEN_, #SHOUHIN_) defined in the word definition table 252c.

The misstatement conversion table 252b stores conditions for generating a misstatement in association with the wording to be changed when each condition is met. As shown in FIG. 3(b), the table consists of the items "number", "condition", "change location A", and "change location B". "Condition" is information that defines a condition under which a misstatement is generated during content output. "Change location A" and "change location B" indicate how change locations A and B of the basic utterance sentence data 252a are to be changed when the "condition" is met. While none of the conditions numbered 1 to 4 in the misstatement conversion table 252b is met (number 0), the text data for audio output is generated based on the basic utterance sentence data 252a. When any of the conditions numbered 1 to 4 is met, text data for audio output is generated in which the wording of change locations A and B of the basic utterance sentence data 252a has been changed according to the misstatement conversion table 252b. In the misstatement conversion table 252b, number 0 has the lowest priority; among numbers 1 to 4, a smaller number has a higher priority.

The word definition table 252c defines the common wording used in the basic utterance sentence data 252a and the misstatement conversion table 252b.
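The relationship between the basic utterance sentence data 252a, the conversion table 252b, and the word definition table 252c can be sketched as follows. This is only an illustrative sketch: the actual contents of FIG. 3 are not reproduced in this text, so the sentence template, the English wording, the entries for records 1 and 2, and all function and variable names are assumptions. Only the placeholders #SENDEN_ and #SHOUHIN_, the default record 0, and the priority rule (a smaller number wins among records 1 to 4) come from the description.

```python
# Hypothetical word definition table (cf. 252c): placeholder -> shared wording.
WORD_DEFINITIONS = {
    "#SENDEN_": "This is our shop's recommended",
    "#SHOUHIN_": "meat bun",
}

# Hypothetical basic utterance template (cf. 252a) with change locations A and B.
BASIC_UTTERANCE = "{A} {B}."

# Hypothetical conversion table (cf. 252b). Record 0 is the default wording.
# The wording of records 1 and 2 is invented for illustration.
CONVERSION_TABLE = {
    0: {"A": "#SENDEN_", "B": "#SHOUHIN_"},
    1: {"A": "#SENDEN_", "B": "gyoza... no, I mean #SHOUHIN_"},
    2: {"A": "Um, what was it again...", "B": "oh, right, #SHOUHIN_"},
}


def expand(text):
    """Replace every placeholder with the wording from the definition table."""
    for placeholder, wording in WORD_DEFINITIONS.items():
        text = text.replace(placeholder, wording)
    return text


def build_utterance(matched_numbers):
    """Build the text for speech synthesis from the matched condition numbers.

    Among the matched records other than 0, the smallest number has the
    highest priority; if nothing matched, record 0 (the default) is used.
    """
    candidates = [n for n in matched_numbers if n != 0 and n in CONVERSION_TABLE]
    number = min(candidates) if candidates else 0
    record = CONVERSION_TABLE[number]
    return BASIC_UTTERANCE.format(A=expand(record["A"]), B=expand(record["B"]))
```

Keeping the shared wording in a separate table means that a product name only has to be changed in one place, which appears to be the purpose of the word definition table in the description.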

The communication unit 26 is composed of a modem, a router, a network card, or the like, and communicates with external devices.

Next, the screen unit 22 will be described.
As shown in FIG. 2, the screen unit 22 includes an image forming unit 27 and a pedestal 28 that supports the image forming unit 27.

The image forming unit 27 is a screen in which a rear-projection film screen, on which a film-like Fresnel lens is laminated, is attached to a single translucent plate 29, such as an acrylic plate, arranged substantially perpendicular to the direction in which the image light is emitted.

The pedestal 28 is provided with an imaging unit 31, a button-type operation unit 32, and an audio output unit 33.
The imaging unit 31 captures an image of the space facing the image forming unit 27 and generates a captured image.
Although not shown, the imaging unit 31 includes a camera having an optical system and an image sensor, and an imaging control unit that controls the camera. The optical system of the camera is oriented so that it can photograph a person in front of the image forming unit 27. The image sensor is, for example, a CCD (Charge Coupled Device) or CMOS (Complementary Metal-Oxide Semiconductor) image sensor, and converts the optical image that has passed through the optical system into a two-dimensional image signal.

The audio output unit 33 includes a speaker or the like and outputs the audio content 331 under the control of the control unit 23. The audio output unit 33 functions as the second output means.
The imaging unit 31, the operation unit 32, and the audio output unit 33 are connected to the control unit 23 as shown in FIG. 1.

[Operation of Digital Signage Device 1]
Next, the operation of the digital signage device 1 will be described.
FIG. 4 shows a flowchart of the content output processing executed in the digital signage device 1. The content output processing is executed by the control unit 23 in cooperation with the program stored in the program storage unit 251.

First, the control unit 23 executes person detection processing (step S1). In step S1, the control unit 23, for example, causes the imaging unit 31 to capture an image and executes person detection processing on the captured image obtained. For the person detection processing, known image processing techniques can be used, such as a method that detects a person by taking the difference between a previously prepared background-only image and the captured image, or a method that detects the shape of a person by pattern recognition.
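The background-difference approach mentioned for step S1 can be illustrated with a minimal sketch. The description names the technique but gives no parameters, so the function name, the two thresholds, and the representation of images as nested lists of grayscale values (0 to 255) are all assumptions made for this example; a practical system would more likely rely on an image processing library.

```python
def detect_person(background, frame, pixel_threshold=30, area_ratio=0.05):
    """Return True if enough pixels differ from the background-only image.

    background, frame: equally sized 2D lists of grayscale values (0-255).
    pixel_threshold:   assumed minimum difference for a pixel to count as changed.
    area_ratio:        assumed minimum fraction of changed pixels for a detection.
    """
    changed = 0
    total = 0
    for bg_row, fr_row in zip(background, frame):
        for bg, fr in zip(bg_row, fr_row):
            total += 1
            if abs(fr - bg) > pixel_threshold:
                changed += 1
    # An empty image never yields a detection.
    return total > 0 and changed / total >= area_ratio
```

The area threshold is there so that small changes such as sensor noise do not count as a person; both values would need tuning for a real installation.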

Next, the control unit 23 determines whether a person has been detected as a result of the person detection processing (step S2). If it determines that no person has been detected (step S2; NO), the control unit 23 proceeds to step S5.

On the other hand, if it determines that a person has been detected (step S2; YES), the control unit 23 performs face recognition processing on the captured image acquired in step S1 (step S3). For the face recognition processing, a known image processing technique can be used, for example, as described in JP 2006-202049 A.

Next, based on the face image recognized by the face recognition processing, the control unit 23 determines whether the person detected in step S1 satisfies a predetermined condition (step S4).
The predetermined condition used in step S4 is a condition that triggers a misstatement; examples include conditions (1) to (4) below.
(1) A person has been in front of the digital signage device 1 for m seconds or more (for example, m = 5).
(2) The person in front of the digital signage device 1 is not smiling.
(3) The gender identification rate of the people in front of the digital signage device 1 is low.
(4) n or more people (for example, n = 100) have passed in front of the digital signage device 1.
Here, (3) is a condition indicating that many people are far from the digital signage device 1.
The predetermined condition used in step S4 is not particularly limited, as long as it is a condition from which it can be inferred that the person in front of the digital signage device 1 is showing a reaction for which it is judged preferable to give the content a change that attracts interest.

For example, the control unit 23 stores in the memory the face images recognized from the captured images acquired during the immediately preceding m seconds, and compares the face image recognized in step S3 with the stored face images to determine whether the same person has been in front of the digital signage device 1 for m seconds or more.
Also, for example, the control unit 23 determines whether the person in front of the digital signage device 1 is smiling, based on the shapes of the eyes, mouth, and other features of the face recognized in step S3.
Also, for example, the control unit 23 performs gender identification processing on the face images recognized in step S3 and calculates the proportion of people whose gender was identified by that processing among the number of people detected in step S1 (the gender identification rate). Then, by comparing the calculated rate with a predetermined threshold, it determines whether the gender identification rate of the people in front of the digital signage device 1 is lower than the threshold.
Also, for example, the control unit 23 counts the number of people who have passed in front of the digital signage device 1 by comparing the face images recognized in the previous face recognition processing with those recognized in the current one, adds the count to a counter in the memory, and determines from the resulting counter value whether a predetermined number of people or more (for example, 100 or more) have passed in front of the digital signage device 1.
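The four example conditions of step S4 could be evaluated along the following lines. This is a sketch under stated assumptions: the `Observation` structure, its field names, and the gender identification threshold of 0.5 are hypothetical, and the face recognition, smile detection, and gender identification that would fill these fields are not shown. The values m = 5 and n = 100 are the examples given in the description.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical summary of what was recognized in steps S1 and S3."""
    dwell_seconds: float    # how long the same face has been observed
    smiling: bool           # whether the recognized face is judged to be smiling
    detected_people: int    # number of people detected in step S1
    gender_identified: int  # number of faces whose gender could be identified
    passerby_count: int     # running counter of people who have passed by


def matched_conditions(obs, m=5, n=100, rate_threshold=0.5):
    """Return the numbers (1 to 4) of the example conditions that are met."""
    matched = []
    if obs.dwell_seconds >= m:                       # condition (1)
        matched.append(1)
    if not obs.smiling:                              # condition (2)
        matched.append(2)
    if obs.detected_people > 0:                      # condition (3)
        rate = obs.gender_identified / obs.detected_people
        if rate < rate_threshold:
            matched.append(3)
    if obs.passerby_count >= n:                      # condition (4)
        matched.append(4)
    return matched
```

Returning every matched number rather than a single flag leaves the choice among them to a later stage, for example a priority rule that prefers the smallest number.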

If it determines in step S4 that the person detected in step S1 does not satisfy the predetermined condition (step S4; NO), the control unit 23 outputs normal content (step S5) and proceeds to step S7. That is, the control unit 23 reads the image data of the display content 271 from the content storage unit 252, sequentially outputs its frame images to the projector 24, and causes the projector 24 to project them onto the image forming unit 27. In addition, the control unit 23 generates the text data of the basic utterance sentence based on the basic utterance sentence data 252a corresponding to the frame images and the word definition table 252c stored in the content storage unit 252, synthesizes speech from the generated text data, and causes the audio output unit 33 to output the audio content 331 of the basic utterance sentence.

FIG. 5(a) shows an example of the normal content output in step S5. In the present embodiment, the display content 271 includes the character C, and the audio content 331 is the spoken voice of the character C. In normal content, the content of the display content 271 and that of the audio content 331 match. That is, as shown in FIG. 5(a), the display content 271 of the normal content advertises meat buns, and the audio content 331 of the normal content is speech that likewise advertises meat buns, for example, "This is the meat bun our shop recommends."

On the other hand, if it determines in step S4 that the person detected in step S1 satisfies the predetermined condition (step S4; YES), the control unit 23 outputs content with a misstatement generated according to the condition that was met (step S6) and proceeds to step S7.
In step S6, the control unit 23 reads the image data of the display content 271 from the content storage unit 252, sequentially outputs its frame images to the projector 24, and causes the projector 24 to project them onto the image forming unit 27. In addition, based on the misstatement conversion table 252b stored in the content storage unit 252, the control unit 23 generates text data for audio output in which the wording of change locations A and B of the basic utterance sentence data 252a corresponding to the frame image being output has been changed. It then synthesizes speech from the generated text data and causes the audio output unit 33 to output the audio content 331 containing the misstatement.

FIG. 5(b) shows an example of the content with a misstatement output in step S6. FIG. 5(b) is an example of the content output when conditions (1) and (2) above are met. As shown in FIG. 5(b), in this content the display content 271 is the same as in the normal content, but the audio content 331 differs from that of the normal content. That is, the content of the display content 271 and that of the audio content 331 differ. For example, as shown in FIG. 5(b), while the display content 271 shows an advertisement for meat buns, the audio content 331 deliberately misstates something different from the display content 271, such as "This is the gyoza our shop recommends... no, I mean, the meat bun (sweat)," and thereby attracts people's interest.

FIG. 6 shows the utterance sentence of the audio content 331 of the normal content (Normal) and the utterance sentences of the audio content 331 when each of conditions (1) to (4) above is met. As shown in FIG. 6, in the audio content 331 of the content output in step S6 when conditions (1) to (4) are met, the character C of the display content 271 makes a slip of the tongue, forgets a line, or delivers a line that makes the viewer feel special, so that the content has a more human touch. This makes it possible to attract people's interest in the content being output.

In step S7, the control unit 23 determines whether termination (power OFF) has been instructed via the operation unit 32 (step S7). If it determines that termination has not been instructed (step S7; NO), the control unit 23 returns to step S1. If it determines that termination has been instructed (step S7; YES), the control unit 23 ends the content output processing.
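The flow of steps S1 to S7 in FIG. 4 can be summarized as a loop. The sketch below is not the program stored in the program storage unit 251; it only mirrors the branching described above. Every parameter is a hypothetical callable standing in for hardware or image processing that the description leaves to known techniques.

```python
def content_output_loop(capture, detect_person, recognize_faces,
                        matches_condition, output_normal,
                        output_misstated, stop_requested):
    """Run the content output processing until termination is instructed.

    All arguments are caller-supplied callables (assumed names):
    capture() returns a captured image, detect_person(image) returns a bool,
    recognize_faces(image) returns recognized face data,
    matches_condition(faces) returns a bool, and the two output functions
    play one unit of content each.
    """
    while True:
        image = capture()                    # S1: capture and detect
        if detect_person(image):             # S2: was a person detected?
            faces = recognize_faces(image)   # S3: face recognition
            if matches_condition(faces):     # S4: predetermined condition met?
                output_misstated()           # S6: content with a misstatement
            else:
                output_normal()              # S5: normal content
        else:
            output_normal()                  # S5: nobody detected
        if stop_requested():                 # S7: termination instructed?
            break
```

Passing the steps in as callables keeps the sketch independent of any particular camera, recognizer, or speech synthesizer, and allows the branching to be exercised with simple stubs.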

As described above, in the digital signage device 1, the control unit 23 executes person detection processing on the captured image acquired by the imaging unit 31. When a person is detected by the person detection processing, the control unit 23 executes face recognition processing on the captured image acquired by the imaging unit 31 and determines, based on the recognized face image, whether the reaction of the detected person satisfies a predetermined condition. If it determines that the predetermined condition is satisfied, the control unit 23 causes the audio output unit 33 to output audio content 331 whose content differs from that of the display content 271 output by the image forming unit 27. Specifically, it causes the audio output unit 33 to output audio content 331 in which a misstatement has been generated by changing part of the utterance sentence of the audio content 331 corresponding to the display content 271.
Accordingly, content that has a more human touch and can attract people's interest can be output.

The description of the above embodiment is one preferred example of the digital signage device according to the present invention, and the invention is not limited to it.

For example, in the above embodiment, when a person is detected, it is determined whether the detected person (the reaction of the detected person) satisfies a predetermined condition, and content with a misstatement is output if the condition is satisfied; however, the invention is not limited to this. For example, content with a misstatement may be output whenever a person is detected. This makes it possible to output content that can attract a person's interest when a person is detected.

In the above embodiment, the digital signage device 1 controls content output by itself, but the invention is not limited to this example.
For example, the moving image data for outputting the display content 271 and the information specifying the text data of the audio content 331 corresponding to each frame image may be stored in a server device (not shown) connected to the digital signage device 1 via a communication network. In that case, the captured image acquired by the imaging unit 31 may be transmitted to the server device via the communication unit 26; the server device may perform the processing of steps S1 to S4 and, based on the determination result of step S4, read the moving image data of the display content 271, generate the text data of the audio content 331 of either the normal content or the content with a misstatement, and transmit them to the digital signage device 1; and the digital signage device 1 may output the content in real time based on the data received from the server device.

In the above embodiment, the case where the display content 271 includes the character C and the audio content 331 is output as the spoken voice of the character C has been described as an example, but a person may be used instead of the character C.

In the above embodiment, the case where the present invention is applied to a digital signage device that displays images by projecting them from a projector onto a screen has been described as an example; however, the invention is not limited to this example, and the same effect can be obtained when it is applied to a content playback device having another type of display, such as a liquid crystal display or a plasma display.

In addition, the detailed configuration and detailed operation of each device constituting the digital signage device can be changed as appropriate without departing from the spirit of the invention.

Although several embodiments of the present invention have been described, the scope of the present invention is not limited to the embodiments described above and includes the scope of the invention described in the claims and its equivalents.
The inventions described in the claims originally attached to the request of this application are appended below. The claim numbers given in the appendix are those of the claims as originally attached to the request of this application.
[Appendix]
<Claim 1>
A content output device comprising:
first output means for outputting content;
second output means for outputting content in an output format different from that of the first output means;
person detection means for detecting a person; and
control means for causing the second output means to output, based on a detection result of the person detection means, content whose details differ from those of the content being output by the first output means.
<Claim 2>
The content output device according to claim 1, wherein the content output by the first output means is display content, and the content output by the second output means is audio content.
<Claim 3>
The content output device according to claim 2, wherein the display content and the audio content represent details corresponding to each other, and
the control means, based on the detection result of the person detection means, changes a part of the audio content corresponding to the display content being output by the first output means and causes the second output means to output the changed audio content.
<Claim 4>
The content output device according to claim 3, wherein the control means, based on the detection result of the person detection means, causes the second output means to output audio content in which a verbal slip has been introduced into the utterance text of the audio content corresponding to the display content being output by the first output means.
<Claim 5>
The content output device according to any one of claims 1 to 4, wherein, when a person is detected by the person detection means, the control means recognizes a reaction of the detected person and, when the recognized reaction meets a predetermined condition, causes the second output means to output content whose details differ from those of the content being output by the first output means.
<Claim 6>
A content output method comprising:
a step of outputting content by first output means;
a step of outputting, by second output means, content in an output format different from that of the first output means;
a step of detecting a person by person detection means; and
a step of causing the second output means to output, based on a detection result of the person detection means, content whose details differ from those of the content being output by the first output means.
<Claim 7>
A program for causing a computer used in a content output device, the content output device comprising first output means for outputting content, second output means for outputting content in an output format different from that of the first output means, and person detection means for detecting a person, to function as:
control means for causing the second output means to output, based on a detection result of the person detection means, content whose details differ from those of the content being output by the first output means.

DESCRIPTION OF SYMBOLS
1 Digital signage device
21 Projection part
22 Screen part
23 Control part
24 Projector
25 Storage part
251 Program storage part
252 Content storage part
26 Communication part
27 Image forming part
271 Display content
28 Base
29 Translucent plate
31 Imaging part
32 Operation part
33 Audio output part
331 Audio content

Claims

A content output device comprising:
first output means for outputting content;
second output means for outputting content in an output format different from that of the first output means;
person detection means for detecting a person; and
control means for causing the second output means to output, based on a detection result of the person detection means, content whose details differ from those of the content being output by the first output means.

The content output device according to claim 1, wherein the content output by the first output means is display content, and the content output by the second output means is audio content.

The content output device according to claim 2, wherein the display content and the audio content represent details corresponding to each other, and
the control means, based on the detection result of the person detection means, changes a part of the audio content corresponding to the display content being output by the first output means and causes the second output means to output the changed audio content.

The content output device according to claim 3, wherein the control means, based on the detection result of the person detection means, causes the second output means to output audio content in which a verbal slip has been introduced into the utterance text of the audio content corresponding to the display content being output by the first output means.

The content output device according to any one of claims 1 to 4, wherein, when a person is detected by the person detection means, the control means recognizes a reaction of the detected person and, when the recognized reaction meets a predetermined condition, causes the second output means to output content whose details differ from those of the content being output by the first output means.

A content output method comprising:
a step of outputting content by first output means;
a step of outputting, by second output means, content in an output format different from that of the first output means;
a step of detecting a person by person detection means; and
a step of causing the second output means to output, based on a detection result of the person detection means, content whose details differ from those of the content being output by the first output means.

A program for causing a computer used in a content output device, the content output device comprising first output means for outputting content, second output means for outputting content in an output format different from that of the first output means, and person detection means for detecting a person, to function as:
control means for causing the second output means to output, based on a detection result of the person detection means, content whose details differ from those of the content being output by the first output means.