JP5668017B2

JP5668017B2 - Information providing apparatus, program thereof, and information providing system

Info

Publication number: JP5668017B2
Application number: JP2012109580A
Authority: JP
Inventors: 健島崎
Original assignee: Toshiba TEC Corp
Current assignee: Toshiba TEC Corp
Priority date: 2012-05-11
Filing date: 2012-05-11
Publication date: 2015-02-12
Anticipated expiration: 2032-05-11
Also published as: JP2013238646A

Description

Embodiments of the present invention relate to an information providing apparatus that displays advertisement images for products and the like, to a program for the apparatus, and to an information providing system.

Digital signage, which displays advertisement images for products and the like on a display to inform the public, is known.

JP 2011-210238 A

Such digital signage addresses an unspecified large audience, so its information may not reach the people viewing the display accurately. In that case, a sufficient advertising effect is not obtained.

An object of the embodiments of the present invention is to provide an information providing apparatus, a program for the apparatus, and an information providing system that convey information accurately to people viewing the display and achieve a high advertising effect.

An information providing apparatus according to one embodiment includes display means, recognition means, generation means, and control means. The display means displays an advertisement image. The recognition means recognizes the language of speech uttered by a person near the display means. When the language recognized by the recognition means is not the language preset as the official language, the generation means generates a subtitle image related to the advertisement image using characters of the recognized language, and also generates speech related to the subtitle image in the official language. The control means displays the character image generated by the generation means on the display means and, while the subtitle is displayed, emits the speech in the official language from a speaker.

FIG. 1 is a block diagram showing the configuration of one embodiment.
FIG. 2 is a flowchart showing the control of the embodiment.
FIG. 3 shows an advertisement image and a Japanese subtitle image displayed on the liquid crystal display of the embodiment.
FIG. 4 shows an advertisement image and an English subtitle image displayed on the liquid crystal display of the embodiment.
FIG. 5 shows a modification of the embodiment.

Hereinafter, an embodiment will be described with reference to the drawings.
In FIG. 1, reference numeral 1 denotes a display unit. It has, on its front face, a liquid crystal display (display means) 2 for displaying advertisement images, a microphone 3 for collecting sound, and a speaker 4 for outputting sound, and it is installed on the wall of a building, the loading platform of a vehicle, or the like to advertise products. In a store that sells products, the customer-facing display of a POS terminal installed at the checkout counter may also serve as part of the display unit 1.

The microphone 3 collects sound in the vicinity of the display unit 1. Such sound includes speech uttered by people, the running noise of automobiles and bicycles, music played by stores, and sound emitted by the speaker 4.

Reference numeral 10 denotes a control unit, to which an image generation unit 11, a voice recognition unit 12, an acoustic processing unit 13, a storage unit 14, an operation unit 15, and a network interface 16 are connected. The image generation unit 11 generates an advertisement image in response to a command from the control unit 10 using known digital image processing techniques. The voice recognition unit 12 recognizes human speech, and the language of that speech, in the sound collected by the microphone 3 using known speech recognition techniques. It recognizes a plurality of languages, for example Japanese, English, Chinese, Korean, and German. The acoustic processing unit 13 generates speech and music in response to a command from the control unit 10 using known digital audio processing techniques. The storage unit 14 stores the program required for control by the control unit 10 as well as the various data required to generate advertisement images, speech, and music. The operation unit 15 includes various operating means for setting the operating conditions for outputting advertisement images, speech, and music. The network interface 16 is connected to a server 21 via a communication network 20. The server 21 transfers the data required to generate an advertisement image and the speech related to it to the control unit 10 via the communication network 20. The transferred data is stored in the storage unit 14.

The display unit 1, the control unit 10, the image generation unit 11, the voice recognition unit 12, the acoustic processing unit 13, the storage unit 14, the operation unit 15, and the network interface 16 constitute an information providing apparatus for digital signage, which generates an advertisement image together with the speech and music related to it and presents them to the public. The information providing apparatus and the server 21 together constitute an information providing system.

The information providing apparatus includes a computer, and the functions of that computer implement the control unit 10, the image generation unit 11, the voice recognition unit 12, the acoustic processing unit 13, the storage unit 14, the operation unit 15, and the network interface 16.

As main functions based on the program in the storage unit 14, the control unit 10 has the following means (1) to (7).
(1) First generation means for generating, with the image generation unit 11, an advertisement image corresponding to an advertisement designated by the server 21, or to an advertisement whose display timing has arrived among a plurality of advertisements registered in advance in the storage unit 14.

(2) Second generation means for generating, with the acoustic processing unit 13, speech and music related to the generated advertisement image.

(3) Third generation means for generating, with the image generation unit 11, a subtitle image related to any of the generated advertisement image, speech, and music, using characters of a predetermined language, when the voice recognition unit 12 recognizes no speech within a predetermined fixed time during generation of the advertisement image. The predetermined language is the language of the region where the information providing apparatus is used, or a language used as an official language in that region.

(4) Fourth generation means for generating, with the image generation unit 11, a subtitle image related to any of the generated advertisement image, speech, and music, using characters of the one recognized language, when the voice recognition unit 12 recognizes one or more voices within the predetermined fixed time during generation of the advertisement image and one language is recognized from those voices.

(5) Fifth generation means for generating, with the image generation unit 11, a plurality of subtitle images related to any of the generated advertisement image, speech, and music, one for each recognized language and each using characters of its own language, when the voice recognition unit 12 recognizes a plurality of voices within the predetermined fixed time during generation of the advertisement image and a plurality of languages are recognized from those voices.

(6) First control means for displaying the generated advertisement image on the liquid crystal display 2 and outputting the generated speech and music from the speaker 4.

(7) Second control means for displaying the generated subtitle image on the liquid crystal display 2, superimposed on the advertisement image being displayed and scrolling across the display screen.
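The choice among the third, fourth, and fifth generation means can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the use of short language codes, and the first-heard ordering of multiple languages are all assumptions made for the example.

```python
from typing import List, Sequence

DEFAULT_LANGUAGE = "ja"  # assumed: preset language of the region where the unit is installed


def select_subtitle_languages(recognized: Sequence[str],
                              default: str = DEFAULT_LANGUAGE) -> List[str]:
    """Return the subtitle languages to generate, in display order.

    `recognized` holds the language codes reported by the voice recognition
    unit within the fixed monitoring time. It may contain duplicates when
    several people speak the same language.
    """
    # Remove duplicates while keeping first-heard order (ordering is an assumption).
    unique = list(dict.fromkeys(recognized))
    if not unique:
        # Third generation means: no speech recognized, use the preset language.
        return [default]
    # Fourth generation means (one language) or fifth (several languages).
    return unique
```

With this sketch, an empty recognition result yields only the preset language, while two voices in the same language yield a single subtitle in that language.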

Next, the control executed by the control unit 10 will be described with reference to the flowchart of FIG. 2.
The image generation unit 11 generates an advertisement image corresponding to an advertisement designated by the server 21, or to an advertisement whose display timing has arrived among the plurality of advertisements registered in advance in the storage unit 14 (step 101). Along with the advertisement image, the acoustic processing unit 13 generates speech and music related to the advertisement image as necessary (step 102).

While the advertisement image and the speech and music are being generated, the control unit 10 monitors whether the voice recognition unit 12 recognizes speech within the fixed time (step 103).

If no one is near the display unit 1, or if people are nearby but no one speaks, the recognition result of the voice recognition unit 12 is "no speech" (YES in step 103). In this case, the image generation unit 11 generates a subtitle image related to the generated advertisement image, speech, or music using characters of the predetermined language, for example Japanese (step 104). The generated advertisement image is then displayed on the liquid crystal display 2, and the speech and music generated as necessary are output from the speaker 4 (step 105). In addition, the generated subtitle image is displayed on the liquid crystal display 2, superimposed on the advertisement image being displayed and scrolling across the display screen (step 106).

For example, as shown in FIG. 3, an advertisement image containing a photograph of clothing offered for sale, a store name, a brand name, and the like is generated and displayed on the liquid crystal display 2, and a subtitle image 31 is superimposed on it and scrolled from the right end of the display screen to the left end as illustrated. The subtitle image 31 is a rectangular frame, smaller in area than the advertisement image being displayed, containing the Japanese text "セール中！" ("On sale!"). It tells people viewing the liquid crystal display 2 that the products or store shown in the advertisement image are on sale. Moreover, because the subtitle image 31 moves across the display screen, viewers readily notice it even when the advertisement image uses many colors or contains complex motion.

In the example of FIG. 3, in addition to displaying the advertisement image and scrolling the Japanese subtitle image 31 on the liquid crystal display 2, English speech such as "Now On Sale!" is generated and emitted from the speaker 4 while the subtitle image 31 is scrolling. While this English speech is being emitted, the Japanese subtitle image 31 both supports the advertisement image and conveys the content of the English speech to Japanese listeners in the manner of simultaneous interpretation.

When speech is emitted from the speaker 4, it is picked up by the microphone 3 and recognized by the voice recognition unit 12. By that time, however, the monitoring in step 103 of whether the voice recognition unit 12 recognizes speech has already finished, so speech emitted from the speaker 4 is not mistaken for the speech of a person near the display unit 1.
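One way to picture this ordering is a monitoring window that closes before any speaker output starts. The sketch below is an illustration under assumptions: recognition results are modeled as hypothetical `(timestamp, language)` pairs, and `window_end` is an assumed parameter marking the end of the step 103 monitoring.

```python
from typing import Iterable, List, Tuple


def languages_in_window(events: Iterable[Tuple[float, str]],
                        window_end: float) -> List[str]:
    """Keep only recognition results that arrived before the window closed.

    `events` are hypothetical (timestamp_seconds, language_code) pairs from
    the recognizer. Anything stamped at or after `window_end`, including an
    echo of the unit's own speaker output, is ignored.
    """
    return [language for timestamp, language in events if timestamp < window_end]
```

If speaker output only begins after `window_end`, any recognition result it causes falls outside the window and does not influence the subtitle language.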

The subtitle image 31 need not scroll through only once. As long as the corresponding advertisement image is being displayed, it may scroll repeatedly, appearing and disappearing many times. The scroll direction is likewise not limited to right to left: it may be left to right, top to bottom, bottom to top, or diagonal. The direction is chosen to suit, for example, the type of product in the advertisement image, the colors of the advertisement image, or its motion.
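The scroll variants described here can be sketched as a position function. This is an illustrative sketch only: the direction labels, the pixel coordinate convention (origin at the top left), and the centering on the non-scrolling axis are assumptions, and diagonal scrolling is omitted for brevity.

```python
from typing import Tuple


def subtitle_position(progress: float,
                      screen: Tuple[int, int],
                      subtitle: Tuple[int, int],
                      direction: str = "rtl") -> Tuple[int, int]:
    """Top-left corner of the subtitle frame for a scroll progress in [0, 1].

    `screen` and `subtitle` are (width, height) in pixels. The frame travels
    between opposite edges of the screen and stays centered on the other axis.
    """
    p = min(max(progress, 0.0), 1.0)   # clamp progress to the valid range
    max_x = screen[0] - subtitle[0]    # leftmost x at which the frame touches the right edge
    max_y = screen[1] - subtitle[1]
    if direction == "rtl":             # right end to left end, as in FIG. 3
        return round(max_x * (1 - p)), max_y // 2
    if direction == "ltr":
        return round(max_x * p), max_y // 2
    if direction == "ttb":
        return max_x // 2, round(max_y * p)
    if direction == "btt":
        return max_x // 2, round(max_y * (1 - p))
    raise ValueError(f"unknown direction: {direction}")
```

Repeated scrolling can be obtained by deriving `progress` from elapsed time modulo the scroll period, for example `(elapsed % period) / period`.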

On the other hand, if a person near the display unit 1 speaks while the advertisement image and any necessary speech and music are being generated, the speech is picked up by the microphone 3, and the voice recognition unit 12 recognizes the speech contained in the collected sound and its language (NO in step 103). If exactly one language is recognized (YES in step 107), the image generation unit 11 generates a subtitle image related to the generated advertisement image, speech, or music using characters of that language (step 108). The generated advertisement image is then displayed on the liquid crystal display 2, and the generated speech and music are output from the speaker 4 (step 105). In addition, the generated subtitle image is displayed on the liquid crystal display 2, superimposed on the advertisement image being displayed and scrolling across the display screen (step 106).

For example, if the one recognized language is Japanese, the apparatus judges that a Japanese person is near the display unit 1 and able to see the liquid crystal display 2. As in the example of FIG. 3, it generates an advertisement image containing a photograph of clothing offered for sale, a store name, a brand name, and the like, displays it on the liquid crystal display 2, and scrolls the Japanese subtitle image 31 over the displayed advertisement image. Because a Japanese subtitle image 31 appears, a Japanese person viewing the liquid crystal display 2 can accurately grasp the information it conveys.

If the one recognized language is English, the apparatus judges that an English speaker is near the display unit 1 and able to see the liquid crystal display 2. As shown in FIG. 4, it generates an advertisement image containing a photograph of clothing offered for sale, a store name, a brand name, and the like, displays it on the liquid crystal display 2, and scrolls an English subtitle image 32 over the displayed advertisement image. The subtitle image 32 is a rectangular frame, smaller in area than the advertisement image being displayed, containing the English text "Now On Sale!". It tells English speakers viewing the liquid crystal display 2 that the products or store shown in the advertisement image are on sale.

In the example of FIG. 4, in addition to displaying the advertisement image and the English subtitle image 32 on the liquid crystal display 2, Japanese speech such as "セール中！" ("On sale!") is generated and emitted from the speaker 4 while the subtitle image 32 is scrolling. While this Japanese speech is being emitted, the English subtitle image 32 both supports the advertisement image and conveys the content of the Japanese speech to English-speaking listeners in the manner of simultaneous interpretation.

As with the subtitle image 31, the subtitle image 32 need not scroll through only once. As long as the corresponding advertisement image is being displayed, it may scroll repeatedly, appearing and disappearing many times. The scroll direction is likewise not limited to right to left: it may be left to right, top to bottom, bottom to top, or diagonal. The direction is chosen to suit, for example, the type of product in the advertisement image, the colors of the advertisement image, or its motion.

If the one recognized language is Chinese, the apparatus judges that a Chinese speaker is near the display unit 1 and able to see the liquid crystal display 2, generates a subtitle image using Chinese-language text, and scrolls it over the displayed advertisement image. If the one recognized language is Korean, the apparatus judges that a Korean speaker is near the display unit 1 and able to see the liquid crystal display 2, generates a subtitle image using Korean-language text, and scrolls it over the displayed advertisement image. If the one recognized language is German, the apparatus judges that a German speaker is near the display unit 1 and able to see the liquid crystal display 2, generates a subtitle image using German-language text, and scrolls it over the displayed advertisement image.

If the voice recognition unit 12 recognizes a plurality of languages within the fixed time while the advertisement image and the speech and music are being generated (NO in step 103, NO in step 107), the image generation unit 11 generates a plurality of subtitle images related to the generated advertisement image, speech, or music, one for each recognized language and each using characters of its own language (step 109). The generated advertisement image is then displayed on the liquid crystal display 2, and the speech and music generated as necessary are output from the speaker 4 (step 110). In addition, the generated subtitle images are displayed one after another on the liquid crystal display 2, each superimposed on the advertisement image being displayed and scrolling across the display screen (step 111).

For example, if Japanese and English are recognized as the plurality of languages, the apparatus judges that a Japanese person and an English speaker are near the display unit 1 and able to see the liquid crystal display 2. It generates an advertisement image containing a photograph of clothing offered for sale, a store name, a brand name, and the like, and displays it on the liquid crystal display 2. Over the displayed advertisement image, it first scrolls the Japanese subtitle image 31 reading "セール中！" as in FIG. 3 and, once that scroll has finished, scrolls the English subtitle image 32 reading "Now On Sale!" as in FIG. 4. The Japanese subtitle image 31, shown first, tells Japanese viewers that a sale is on, and the English subtitle image 32 that follows tells English-speaking viewers the same.

In addition, the English speech "Now On Sale!" and the Japanese speech "セール中！" are both generated. The English speech is emitted from the speaker 4 while the Japanese subtitle image 31 is scrolling, and the Japanese speech is emitted from the speaker 4 while the English subtitle image 32 is scrolling. While the English speech is being emitted, the Japanese subtitle image 31 both supports the advertisement image and conveys the content of the English speech to Japanese listeners in the manner of simultaneous interpretation. While the Japanese speech is being emitted, the English subtitle image 32 both supports the advertisement image and conveys the content of the Japanese speech to English-speaking listeners in the same manner.

The subtitle images 31 and 32 may scroll in the same direction or in different directions.

If Chinese and Korean are recognized as the plurality of languages, the apparatus judges that a Chinese speaker and a Korean speaker are near the display unit 1 and able to see the liquid crystal display 2. It generates a subtitle image using Chinese-language text meaning "on sale" and a subtitle image using Korean-language text meaning "on sale", scrolls the Chinese subtitle image first, and then scrolls the Korean subtitle image. In this case, it also generates Chinese speech and Korean speech meaning "on sale", emits the Chinese speech from the speaker 4 while the Korean subtitle image is scrolling, and emits the Korean speech from the speaker 4 while the Chinese subtitle image is scrolling. While the Chinese speech is being emitted, the Korean subtitle image both supports the advertisement image and conveys the content of the Chinese speech to Korean-speaking listeners in the manner of simultaneous interpretation. While the Korean speech is being emitted, the Chinese subtitle image both supports the advertisement image and conveys the content of the Korean speech to Chinese-speaking listeners in the same manner.

So that Japanese people can understand the content of the Chinese subtitle image, Japanese speech announcing the sale may be emitted from the speaker 4 while the Chinese subtitle image is scrolling. Likewise, so that Japanese people can understand the content of the Korean subtitle image, Japanese speech announcing the sale may be emitted from the speaker 4 while the Korean subtitle image is scrolling.

If three languages, Japanese, English, and German, are recognized as the plurality of languages, the apparatus generates the Japanese subtitle image 31 reading "セール中！", the English subtitle image 32 reading "Now On Sale!", and a subtitle image using German-language text meaning "on sale". It scrolls the Japanese subtitle image 31 first, the English subtitle image 32 next, and the German subtitle image last. In this case, it also generates German speech meaning "on sale", the English speech "Now On Sale!", and the Japanese speech "セール中！". The German speech and the English speech are emitted from the speaker 4 in succession while the Japanese subtitle image 31 is scrolling, and the Japanese speech is emitted from the speaker 4 while the English subtitle image 32 is scrolling and again while the German subtitle image is scrolling.

While the German speech is being emitted, the Japanese subtitle image 31 both supports the advertisement image and conveys the content of the German speech to Japanese listeners in the manner of simultaneous interpretation. While the Japanese speech is being emitted, the German subtitle image both supports the advertisement image and conveys the content of the Japanese speech to German-speaking listeners in the same manner.
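The pairing of subtitles and speech in the multi-language examples above can be sketched as a small scheduling function. The rule encoded here is an assumed generalization of those examples, not a rule stated in the source, and it covers only the case of two or more recognized languages. The function name, the language codes, and the order of speech within one subtitle are assumptions.

```python
from typing import List, Sequence, Tuple


def build_schedule(recognized: Sequence[str],
                   official: str = "ja") -> List[Tuple[str, List[str]]]:
    """Return [(subtitle_language, [speech_languages]), ...] in display order.

    Assumed rule, inferred from the examples:
    - if the official language is among the recognized languages, subtitles
      in other languages are paired with speech in the official language;
    - otherwise each subtitle is paired with speech in the other recognized
      languages.
    """
    languages = list(dict.fromkeys(recognized))  # drop duplicates, keep order
    schedule = []
    for subtitle in languages:
        if official in languages and subtitle != official:
            speech = [official]
        else:
            speech = [lang for lang in languages if lang != subtitle]
        schedule.append((subtitle, speech))
    return schedule
```

For Japanese, English, and German this pairs the Japanese subtitle with English and German speech and pairs each of the other two subtitles with Japanese speech. For Chinese and Korean it pairs each subtitle with speech in the other language. The optional Japanese speech mentioned for the Chinese and Korean case is not modeled.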

As described above, the apparatus recognizes the speech uttered by a person near the display unit 1 and the language of that speech, generates a subtitle image using characters of the recognized language, and displays the subtitle image on the liquid crystal display 2 together with the advertisement image. This conveys information about the displayed advertisement image accurately to the people viewing the liquid crystal display 2. A foreign visitor may not fully understand an advertisement from the advertisement image alone, but because a subtitle image in the language that person uses every day appears over the advertisement image, the visitor can grasp the content of the advertisement accurately.

Not only foreign viewers but also Japanese viewers can grasp the content of the advertisement more accurately when a subtitle image is displayed than when they see the advertisement image alone.

In the above embodiment, “セール中！” and “Now On Sale!” were used as examples of subtitle content, but the subtitle image may contain any of a product name, a store name, a brand name, a store's opening date and time, access information for the store, and the like. Japanese, English, Chinese, Korean, and German were described as examples of languages to be recognized, but the embodiment can be applied to other languages in the same way. Although the plurality of subtitle images were scrolled one after another, they may instead be scrolled simultaneously, arranged side by side. The display is not limited to scrolling; a subtitle image may be displayed at a fixed, predetermined position on the advertisement image.

Some of the functions and components of the information providing apparatus may be provided in an external server. Cloud computing, for example, can be used to build such a system. More specifically, the software delivery model known as SaaS (software as a service) is suitable. FIG. 5 shows a configuration that uses such a cloud system. The information providing system 200 includes a cloud 201, a plurality of terminals 202, a plurality of communication networks 203, and a plurality of servers 204 communicatively connected to one another. There may be only one each of the terminal 202, the communication network 203, and the server 204. The terminal 202 can communicate with the cloud 201 via the communication network 203. As the terminal 202, the information providing apparatus, various computers such as desktop and notebook types, a mobile phone, a personal digital assistant (PDA), a smartphone, or the like can be used as appropriate. As the communication network 203, the Internet, a private network, a next generation network (NGN), a mobile network, or the like can be used as appropriate.

In this information providing system 200, at least one of the functions and components of the information providing apparatus is provided in the server 204, and the remaining functions and components, which are not provided in the server 204, are provided in the terminal 202. The functions and components provided in the server 204 may be placed in a single server 204 or distributed across a plurality of servers 204.
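The division of functions between server and terminal can be illustrated with a short sketch. It assumes, for illustration only, that the functions being divided are the three means named elsewhere in this document (recognition, generation, and control). The name `split_means` and the dictionary layout are hypothetical, and distribution across several servers is not modeled.

```python
from typing import Dict, FrozenSet, Iterable

# Assumption: the divisible functions are the three means named in this document.
MEANS: FrozenSet[str] = frozenset({"recognition", "generation", "control"})


def split_means(on_server: Iterable[str]) -> Dict[str, FrozenSet[str]]:
    """Place the given means on the server and the remainder on the terminal."""
    server = frozenset(on_server)
    unknown = server - MEANS
    if unknown:
        raise ValueError(f"unknown means: {sorted(unknown)}")
    if not server:
        # The text requires at least one function to be provided in the server.
        raise ValueError("at least one means must be placed on the server")
    return {"server": server, "terminal": MEANS - server}


if __name__ == "__main__":
    layout = split_means(["recognition", "generation"])
    print("server:", sorted(layout["server"]))
    print("terminal:", sorted(layout["terminal"]))
```

Under these assumptions, placing recognition and generation on the server leaves only control on the terminal, which corresponds to a terminal that just displays subtitles and plays audio.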

The above embodiment and modifications are presented as examples and are not intended to limit the scope of the invention. These novel embodiments and modifications can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.
The inventions described in the claims as originally filed in the present application are appended below.
[1] An information providing apparatus comprising: display means for displaying an advertisement image; recognition means for recognizing the language of speech uttered by a person near the display means; generation means for generating a subtitle image related to the advertisement image using characters of the language recognized by the recognition means; and control means for displaying the subtitle image generated by the generation means on the display means.
[2] The information providing apparatus according to supplementary note [1], wherein the control means displays the subtitle image generated by the generation means superimposed on the advertisement image being displayed by the display means, while scrolling the subtitle image.
[3] The information providing apparatus according to supplementary note [1], wherein, when only one language is recognized by the recognition means, the generation means generates a subtitle image related to the advertisement image using characters of that one language, and, when a plurality of languages are recognized by the recognition means, the generation means generates a plurality of subtitle images related to the advertisement image, one for each recognized language, using characters of the respective language.
[4] The information providing apparatus according to supplementary note [3], wherein, when a plurality of subtitle images are generated by the generation means, the control means displays the plurality of subtitle images one after another, superimposed on the advertisement image being displayed by the display means, while scrolling them.
[5] A program for an information providing apparatus that includes display means for an advertisement image and a computer, the program causing the computer to function as: recognition means for recognizing the language of speech uttered by a person near the display means; generation means for generating a subtitle image related to the advertisement image using characters of the language recognized by the recognition means; and control means for displaying the subtitle image generated by the generation means on the display means.
[6] An information providing system including a server and an information providing apparatus that displays an advertisement image, the system comprising: recognition means for recognizing the language of speech uttered by a person near the information providing apparatus; generation means for generating a subtitle image related to the advertisement image using characters of the language recognized by the recognition means; and control means for displaying the subtitle image generated by the generation means on the display means, wherein the server includes at least one of the recognition means, the generation means, and the control means, and the information providing apparatus includes the remaining means not included in the server.

DESCRIPTION OF SYMBOLS 1 ... display unit, 2 ... liquid crystal display, 3 ... microphone, 4 ... speaker, 10 ... control unit, 11 ... image generation unit, 12 ... speech recognition unit, 13 ... acoustic processing unit, 14 ... storage unit, 15 ... operation unit, 16 ... network interface, 20 ... communication network, 21 ... server

Claims

An information providing apparatus comprising:
display means for displaying an advertisement image;
recognition means for recognizing the language of speech uttered by a person near the display means;
generation means for, when the language recognized by the recognition means is not a language preset as an official language, generating a subtitle image related to the advertisement image using characters of the language recognized by the recognition means, and generating audio related to the subtitle image in the official language; and
control means for displaying the subtitle image generated by the generation means on the display means and emitting the audio in the official language from a speaker during the subtitle display.

The information providing apparatus according to claim 1, wherein the control means displays the subtitle image generated by the generation means superimposed on the advertisement image being displayed by the display means, while scrolling the subtitle image.

The information providing apparatus according to claim 1, wherein, when the official language and a language other than the official language are recognized by the recognition means, the generation means generates a plurality of subtitle images related to the advertisement image, one for each of the recognized official language and the recognized language other than the official language, using characters of the respective language, and generates audio related to the subtitle images in the recognized official language and in the recognized language other than the official language, and
the control means, when displaying the subtitle image in the official language generated by the generation means on the display means, emits the audio in the language other than the official language from the speaker during that subtitle display, and, when displaying the subtitle image in the language other than the official language on the display means, emits the audio in the official language from the speaker during that subtitle display.

The information providing apparatus according to claim 3, wherein, when a plurality of subtitle images are generated by the generation means, the control means displays the plurality of subtitle images one after another, superimposed on the advertisement image being displayed by the display means, while scrolling them.

A program for an information providing apparatus that includes display means for an advertisement image and a computer, the program causing the computer to function as:
recognition means for recognizing the language of speech uttered by a person near the display means;
generation means for, when the language recognized by the recognition means is not a language preset as an official language, generating a subtitle image related to the advertisement image using characters of the language recognized by the recognition means, and generating audio related to the subtitle image in the official language; and
control means for displaying the subtitle image generated by the generation means on the display means and emitting the audio in the official language from a speaker during the subtitle display.

An information providing system including a server and an information providing apparatus that has display means for displaying an advertisement image, the system comprising:
recognition means for recognizing the language of speech uttered by a person near the information providing apparatus;
generation means for, when the language recognized by the recognition means is not a language preset as an official language, generating a subtitle image related to the advertisement image using characters of the language recognized by the recognition means; and
control means for displaying the subtitle image generated by the generation means on the display means and emitting audio in the official language from a speaker during the subtitle display,
wherein the server includes at least one of the recognition means, the generation means, and the control means, and the information providing apparatus includes the remaining means not included in the server.