JP2005332404A - Content providing system

Info

Publication number: JP2005332404A
Application number: JP2005146146A
Authority: JP
Inventors: Nariyuki Motoi; 成幸元井
Original assignee: MOTOI SOKEN KK
Current assignee: MOTOI SOKEN KK
Priority date: 2002-09-24
Filing date: 2005-05-19
Publication date: 2005-12-02

Abstract

PROBLEM TO BE SOLVED: To provide content matching the main topic of the target persons in real time without requiring large hardware resources.

SOLUTION: The content providing system comprises: means for recognizing preset keywords by recognizing speech captured from a microphone; means for recording a recognition time in association with the keyword or the like when the keyword is recognized, or updating the recorded recognition time to the time of the latest recognition; means for recognizing a given key set for which the elapsed time from the oldest recognition time to the latest recognition time among the plural kinds of keywords constituting the key set is within a recognition setting time; means for retrieving the content associated with the recognized key set from a content DB; and a display-type device that is installed near a table and plays back and outputs the retrieved content in substantially real time with respect to the latest recognition time.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a content providing system that captures speech from, for example, a conversation among speakers who are to be provided with content, identifies a topic based on keywords contained in the speech, and provides content such as advertisements corresponding to that topic in real time.

Patent Document 1 is a known document concerning a configuration that captures speech from a conversation of recipients or others who are provided with content, identifies a topic based on keywords contained in the speech, and provides content corresponding to the topic in real time. Patent Document 1 discloses an information providing device that performs speech recognition on a conversation such as a video conference to recognize keywords, identifies a topic based on those keywords, selects content based on the identified topic, and outputs the content to a display or the like. The device either identifies the topic corresponding to a characteristic keyword when that keyword is recognized, or acquires the appearance-frequency distribution of a plurality of keywords appearing in the conversation, determines the preset appearance-frequency distribution most similar to it, and identifies the topic corresponding to that preset distribution as the applicable topic.

Patent Document 2 is another known document. It discloses a videophone terminal that performs speech recognition on a videophone conversation to recognize keywords, retrieves the advertisement corresponding to a recognized keyword from among advertisements distributed in advance and stored in the terminal, and outputs that advertisement to a display.

Patent Document 3 is a known document related to the above configuration. It discloses an advertisement display device that detects the voices of nearby people with a speech recognition sensor, analyzes the frequency band and formants of those voices to judge the sex and age group of the people nearby, and shows advertisement information on a display according to that judgment.

Patent Document 1: JP-A-11-203295
Patent Document 2: JP 2002-271507 A
Patent Document 3: JP 2002-244606 A

However, a configuration such as those of Patent Documents 1 and 2, which provides the content corresponding to a keyword whenever that keyword is recognized, ends up providing content for keywords that appear by chance in the conversation or that appear in incidental topics. It is therefore difficult to identify the speakers' central topic and to provide content matching that central topic or interest in real time. The configuration of Patent Document 1 that acquires the appearance-frequency distribution of a plurality of keywords, determines the most similar preset distribution, and identifies the corresponding topic can identify the central topic of the conversation accurately, but computing the similarity of the distributions requires substantial hardware resources and raises cost. In addition, accumulating keyword appearance-frequency data and computing similarity impair the speed and real-time nature of content provision, so content is often provided only after the central topic or interest has already shifted. These drawbacks become very pronounced as the number of configured keywords and topics grows.

The present invention is proposed in view of the above problems. Its object is to provide a content providing system that can identify the speakers' central topic and quickly provide content matching that topic in real time. Another object is to provide a content providing system that achieves real-time content provision suited to the recipient's central topic or interest at low cost, without requiring substantial hardware resources.

The content providing system of the present invention is constituted by a computer or networked computers and comprises: means for recognizing speech captured from a microphone and recognizing preset keywords in the recognized speech; means for, when a keyword is recognized, recording the recognition time of that keyword in association with the keyword or with a key corresponding to the keyword, and updating the recorded recognition time each time the corresponding keyword is recognized again; means for recognizing a given key set for which the recognition elapsed time, from the earliest recognition time to the latest recognition time among the plural kinds of keywords or keys constituting the key set, is within a recognition setting time; means for retrieving, from a content database storing the content associated with each key set, the content associated with the recognized key set; means for playing back the retrieved content in substantially real time with respect to the latest recognition time; and means for outputting the played-back content. The means for outputting content may be a display, a speaker, or both, and content such as images, audio, or both, to be played back according to the output means, is set in the content database.

Also, the content providing system of the present invention is constituted by a computer or networked computers and comprises: means for recognizing speech captured from a microphone and recognizing preset keywords in the recognized speech; means for, when a keyword is recognized, recording the recognition time of that keyword in association with the keyword or with a key corresponding to the keyword, and updating the recorded recognition time each time the corresponding keyword is recognized again; means for recognizing a given key set for which the recognition elapsed time, from the earliest recognition time to the latest recognition time among the plural kinds of keywords or keys constituting the key set, is within a recognition setting time; means for retrieving, from a content database storing the content associated with each key set, the content associated with the recognized key set; and means for transmitting the retrieved content to a playback output device in substantially real time with respect to the latest recognition time.

The above content providing system, for example, records the recognition time of a keyword in association with that keyword or its corresponding key when the keyword is recognized, and updates the recorded recognition time when the same keyword, or a keyword corresponding to the same key, is recognized again. For key sets each composed of plural kinds of keywords or keys, it recognizes a given key set whose keywords or keys have all been recognized within the recognition setting time, or a given key set for which the recognition elapsed time from the earliest to the latest recognition time over all of its keywords or keys is within the recognition setting time. This makes it possible to track, in parallel and at any time, speakers' topics that shift in complex ways as time passes, and to identify the lively, highly focused, or central topic or interest at a given time easily, flexibly, and quickly without complex processing. Using plural kinds of keywords or keys also allows topics and interests to be identified with the accuracy that is necessary and sufficient for content provision.

Further, by retrieving the content associated with, for example, a per-topic key set in which keywords or keys appropriate to the topic are set, and playing it back and outputting it in substantially real time, or transmitting it to a playback output device, content that matches the speakers' central topic or interest and is therefore psychologically very easy to accept can be provided quickly and in real time to the speakers, to persons listening to the speakers' conversation, or to persons listening to it through a playback output device. Further, because the configuration is simple (it recognizes preset keywords, records and updates recognition times, and recognizes the key set whose keywords or keys were recognized within the recognition setting time), it can be realized at low cost without substantial hardware resources, and content providing systems can be installed ubiquitously. Even when, for example, the number of topics or keywords set in key sets grows, the speed and real-time performance of content provision can be maintained with no increase, or only a very slight increase, in required hardware resources and cost.

Further, because recognition times are recorded and updated, even when the number of key sets or keywords grows, the recognition elapsed time from the earliest to the latest recognition time can be obtained very easily without complex processing. For example, there is no need for complex processing such as starting a time measurement, on each keyword recognition, for each of the many key sets that correspond to the recognized keyword. Further, because content is provided based on the recognition of preset keywords, content is provided in moderation and excessive information provision can be prevented. Further, speakers can deliberately converse on a certain topic in order to call up and receive, or provide, the content corresponding to that topic, which improves convenience and entertainment value for the speakers and for the recipients of the content.
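The recording, updating, and key-set check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `KeySetRecognizer`, the method `on_keyword`, and the use of plain keywords in place of separate keys are all assumptions made for the example, and times are passed in as plain numbers of seconds.

```python
from typing import Dict, FrozenSet, List


class KeySetRecognizer:
    """Hypothetical sketch: one timestamp per keyword, checked per key set."""

    def __init__(self, key_sets: Dict[str, FrozenSet[str]], setting_time: float):
        self.key_sets = key_sets          # key-set name -> its keywords
        self.setting_time = setting_time  # recognition setting time (seconds)
        self.last_seen: Dict[str, float] = {}  # keyword -> recognition time

    def on_keyword(self, keyword: str, now: float) -> List[str]:
        """Record or update the recognition time, then return the key sets
        whose keywords were all recognized within the setting time."""
        self.last_seen[keyword] = now  # a repeat recognition overwrites the old time
        recognized = []
        for name, members in self.key_sets.items():
            if keyword not in members:
                continue  # only key sets containing this keyword can change state
            if not all(k in self.last_seen for k in members):
                continue  # some keyword of this key set has not been heard yet
            times = [self.last_seen[k] for k in members]
            # Elapsed time from the earliest to the latest recognition time.
            if max(times) - min(times) <= self.setting_time:
                recognized.append(name)
        return recognized
```

Because only one timestamp per keyword is stored, the memory needed grows with the number of keywords rather than with the number of key sets, which is consistent with the low-resource claim in the text.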

Further, the content providing system of the present invention is characterized by comprising means for deleting any recorded recognition time that is older than the current time by more than the recognition setting time. Deleting such records makes effective use of hardware resources. It also allows the key set whose content is to be retrieved to be recognized automatically, without special post-processing, in response to the recognition of each keyword or key belonging to the key set, or in response to the recording of a recognition time.
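A small sketch of why deletion removes the need for post-processing: once every record older than the recognition setting time is purged, all remaining records lie within one setting time of the present, so a key set is recognized simply when all of its keywords still have a record. The function names `purge_expired` and `complete_key_sets` and the dictionary layout are hypothetical choices for this example.

```python
from typing import Dict, FrozenSet, List


def purge_expired(last_seen: Dict[str, float], now: float,
                  setting_time: float) -> List[str]:
    """Delete records whose recognition time is older than the setting time.
    Returns the keywords that were removed."""
    expired = [k for k, t in last_seen.items() if now - t > setting_time]
    for k in expired:
        del last_seen[k]
    return expired


def complete_key_sets(last_seen: Dict[str, float],
                      key_sets: Dict[str, FrozenSet[str]]) -> List[str]:
    """After purging, mere presence of every keyword implies the earliest and
    latest recognition times are within the setting time of each other."""
    return [name for name, members in key_sets.items()
            if all(k in last_seen for k in members)]
```

In this sketch the purge would be called with the current time just before each check, so no per-key-set timers are needed.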

Further, the content providing system of the present invention is characterized in that the plural kinds of keywords or keys constituting a key set number three or more. The number of distinct keywords or keys constituting a key set may be any number of two or more, such as three, four, five, or six, but three or more is preferable because the topic can then be identified more accurately and content suited to the topic can be provided. The same number of keywords or keys may be set for every key set, or a suitable number may be set for each key set according to, for example, the balance between identifying the targeted topic and the substance of the content to be provided, or the likelihood that the keywords assigned to the key set will appear.

Further, the content providing system of the present invention is characterized in that the means for recognizing captured speech and recognizing preset keywords in it recognizes only the preset keywords. For example, by using word-spotting speech recognition to recognize only the preset keywords, content can be provided in real time with a fast response and the system cost can be reduced further. Because nothing other than the predetermined keywords is recognized, the privacy of the speakers whose speech is captured can also be protected.

Further, the content providing system of the present invention is characterized by comprising: means for determining, upon recognition of the given key set, whether content is being played back or transmitted; and means for playing back the content associated with that key set, or transmitting it to the playback output device, upon recognizing an end of playback or transmission for which the elapsed time from the recognition of the key set, or from the determination that content is being played back or transmitted, to the recognition of that end is within a set time, and for not playing back or transmitting that content when no end of playback or transmission is recognized within the set time. For example, suppose content corresponding to a later-recognized key set is to be played back or transmitted while content corresponding to an earlier-recognized key set is still being played back or transmitted. If the earlier content finishes within the set time from the recognition of the later key set, the later key set's content is played back or transmitted; if it does not finish within the set time, the later key set's content is not played back or transmitted. With this configuration, content is not provided after a predetermined time by which the speakers' topic or interest has probably shifted, so only content that fits the speakers' topic or interest at or just before the time of provision, and that the recipients find psychologically easy to accept, is reliably provided.
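The timing rule in this paragraph can be sketched as a small state machine. The names `TimedGate`, `on_key_set`, and `on_playback_end` are hypothetical, times are plain seconds supplied by the caller, and the handling of several key sets recognized during one playback (the newest replaces the waiting one) is an assumption of this example that the text does not specify for this variant.

```python
from typing import Optional, Tuple


class TimedGate:
    """Hypothetical sketch: a key set recognized during playback is played
    only if the current playback ends within `set_time` of its recognition."""

    def __init__(self, set_time: float):
        self.set_time = set_time
        self.playing = False
        self.waiting: Optional[Tuple[str, float]] = None  # (key set, recognized at)

    def on_key_set(self, key_set: str, now: float) -> Optional[str]:
        """Return the key set whose content should start now, if any."""
        if not self.playing:
            self.playing = True
            return key_set
        self.waiting = (key_set, now)  # assumption: newest waiting entry wins
        return None

    def on_playback_end(self, now: float) -> Optional[str]:
        """Called when playback or transmission ends; returns the next key set
        to play, or None if nothing is waiting or the wait exceeded set_time."""
        self.playing = False
        waiting, self.waiting = self.waiting, None
        if waiting is None:
            return None
        key_set, recognized_at = waiting
        if now - recognized_at <= self.set_time:
            self.playing = True
            return key_set
        return None  # too late: the topic has probably moved on
```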

Further, the content providing system of the present invention is characterized by comprising: means for determining, upon recognition of the given key set, whether content is being played back or transmitted; means for recording and holding the recognition of that key set when content is determined to be playing or transmitting, and, if a key-set recognition is already held, updating the held record to the recognition of the newly recognized key set; and means for playing back, or transmitting to the playback output device, the content associated with the held key set upon recognizing the end of playback or transmission. Key sets recognized during playback or transmission successively overwrite the held record, and the content corresponding to the most recently recognized key set, the one closest to the current time, is provided. The system thus adapts, with a simple configuration, to changes over time in the speakers' topic or interest and can provide content that fits the topic or interest at or near the time of provision and that the recipients find psychologically easy to accept. Because the held key-set recognition is updated rather than accumulated, hardware resources are also used effectively.
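A minimal sketch of this hold-and-overwrite behavior follows. The class name `LatestWinsQueue` and its methods are hypothetical; the single `held` slot stands in for the recorded key-set recognition described in the text.

```python
from typing import Optional


class LatestWinsQueue:
    """Hypothetical sketch: while content is playing, keep only the most
    recently recognized key set and play its content when playback ends."""

    def __init__(self) -> None:
        self.playing = False
        self.held: Optional[str] = None  # the single held key-set recognition

    def on_key_set(self, key_set: str) -> Optional[str]:
        """Return the key set to play immediately, or None if it was held."""
        if self.playing:
            self.held = key_set  # overwrite any earlier held recognition
            return None
        self.playing = True
        return key_set

    def on_playback_end(self) -> Optional[str]:
        """Return the held key set (if any) and start playing its content."""
        next_key_set, self.held = self.held, None
        self.playing = next_key_set is not None
        return next_key_set
```

Holding one slot instead of a queue is what keeps storage constant regardless of how many key sets are recognized during a long playback.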

Also, the content providing system of the present invention is constituted by a computer or networked computers and comprises: means for recognizing speech captured from a microphone and recognizing preset keywords in the recognized speech; means for recognizing a given key set, from among key sets each composed of plural kinds of keywords or of keys corresponding to those keywords, based on the recognition of keywords; means for determining, upon recognition of the given key set, whether content is being played back or transmitted; and means for playing back the content associated with that key set, or transmitting it to a playback output device, upon recognizing an end of playback or transmission for which the elapsed time from the recognition of the key set, or from the determination that content is being played back or transmitted, to the recognition of that end is within a set time, and for not playing back or transmitting that content when no end of playback or transmission is recognized within the set time.

By recognizing, from among key sets each composed of plural kinds of keywords or keys, a key set based on keywords recognized through speech recognition, the above content providing system can track, in parallel and at any time, speakers' topics that shift in complex ways as time passes, and can identify a highly focused or central topic or interest easily, flexibly, and quickly without complex processing, with the accuracy needed for content provision. Further, by retrieving the content associated with, for example, a per-topic key set in which keywords or keys appropriate to the topic are set, and playing it back and outputting it, or transmitting it to a playback output device, content that matches the speakers' central topic or interest and is psychologically easy to accept can be provided to the speakers, to persons listening to the speakers' conversation, or to persons listening to it through a playback output device.

Further, because the configuration is simple (it recognizes preset keywords and recognizes the key set whose keywords or keys have been recognized), it can be realized at low cost without substantial hardware resources, and content providing systems can be installed ubiquitously. Even when, for example, the number of topics or keywords set in key sets grows, the increase in required hardware resources and cost is nil or very slight. Further, content is not provided after a predetermined time by which the speakers' topic or interest has probably shifted, so only content that fits the speakers' topic or interest at or just before the time of provision, and that the recipients find psychologically easy to accept, is reliably provided. Further, because content is provided based on the recognition of preset keywords, content is provided in moderation and excessive information provision can be prevented. Further, speakers can deliberately converse on a certain topic in order to call up and receive, or provide, the content corresponding to that topic, which improves convenience and entertainment value for the speakers and for the recipients of the content.

Also, the content providing system of the present invention is constituted by a computer or networked computers and comprises: means for recognizing speech captured from a microphone and recognizing preset keywords in the recognized speech; means for recognizing a given key set, from among key sets each composed of plural kinds of keywords or of keys corresponding to those keywords, based on the recognition of keywords; means for determining, upon recognition of the given key set, whether content is being played back or transmitted; means for recording and holding the recognition of that key set when content is determined to be playing or transmitting, and, if a key-set recognition is already held, updating the held record to the recognition of the newly recognized key set; and means for playing back, or transmitting to a playback output device, the content associated with the held key set upon recognizing the end of playback or transmission.

By recognizing, from among key sets each composed of plural kinds of keywords or keys, a key set based on keywords recognized through speech recognition, the above content providing system can track, in parallel and at any time, speakers' topics that shift in complex ways as time passes, and can identify a highly focused or central topic or interest easily, flexibly, and quickly without complex processing, with the accuracy needed for content provision. Further, by retrieving the content associated with, for example, a per-topic key set in which keywords or keys appropriate to the topic are set, and playing it back and outputting it, or transmitting it to a playback output device, content that matches the speakers' central topic or interest and is psychologically easy to accept can be provided to the speakers, to persons listening to the speakers' conversation, or to persons listening to it through a playback output device.

Further, because the configuration is simple (it recognizes preset keywords and recognizes the key set whose keywords or keys have been recognized), it can be realized at low cost without substantial hardware resources, and content providing systems can be installed ubiquitously. Even when, for example, the number of topics or keywords set in key sets grows, the increase in required hardware resources and cost is nil or very slight. Further, with a simple configuration the system adapts to changes over time in the speakers' topic or interest and can provide content that fits the topic or interest at or near the time of provision and that the recipients find psychologically easy to accept, and because the held key-set recognition is updated rather than accumulated, hardware resources are used effectively. Further, because content is provided based on the recognition of preset keywords, content is provided in moderation and excessive information provision can be prevented. Further, speakers can deliberately converse on a certain topic in order to call up and receive, or provide, the content corresponding to that topic, which improves convenience and entertainment value for the speakers and for the recipients of the content.

When the content providing system of the present invention is constituted by networked computers, any suitable network capable of transmitting data, such as a LAN, the Internet, a communication network, or a broadcast network, may be used, and it may be wired or wireless, dedicated or general-purpose, internal or external. Each constituent means of the invention may be provided on any suitable computer among the plural computers connected by the network, and some constituent means may be placed on a computer in a different location, such as a remote site, from the others. For example, the content providing system may be composed of one or more terminals or devices (display-type devices, speaker-type devices, personal computers configured for real-time voice chat, videophones, mobile phones, personal digital assistants, video mobile phones, television receivers, television receivers with set-top boxes, game machines, and the like) and a remote or other server connected to them by a communication network.

Possible configurations include: one in which a terminal sends the server a content call command based on speech captured by its microphone, and the server, based on the received call command, extracts the corresponding content from its content DB and sends it to the terminal; one in which a terminal sends the server speech data captured by its microphone, or keyword data recognized from the captured speech, and the server performs the predetermined processing on the received speech data or keyword data, extracts the content corresponding to the given key set from its content DB, and sends it to the terminal; and one in which the server performs the predetermined processing on speech captured by its own microphone, extracts the content corresponding to the given key set from its content DB, and sends it to the terminal. Alternatively, all constituent means of the content providing system may be provided integrally, or in one place, in a terminal or device such as those listed above.

The means for reproducing and outputting content, and the other means, may be installed at any suitable place. For example, a device that captures voice through a microphone and reproduces and outputs images, sound, or both may be installed in shops such as restaurants; in vehicles such as taxis, buses, trains, and airplanes; in facilities such as department stores, shopping malls, and amusement facilities; or at meeting places such as those in stations. In such places the device is installed at a seat or passenger seat, near or on a table, near the meeting place, or otherwise near a space where a target person or customer stays for some time and talks with other target persons present or with someone on a mobile phone call. The device recognizes keywords from the voice of the target person's or customer's conversation and provides content corresponding to the topic of the conversation in real time. The device may also be installed in a home or carried as a portable unit. Content such as advertising information, guidance information, tourist information, and entertainment programs can thus be provided at an appropriate time, which increases entertainment value and convenience for the target person or customer and strengthens the impression made by the information; when the device is installed in a shop, vehicle, or facility, the rate of attracting customers can also be improved. In particular, when the system is built from thin display-type devices or the like, installed in large numbers or ubiquitously in public spaces such as shops, facilities, and vehicles, and used to provide advertising information as content, it can deliver the advertising at as many moments as possible when the target person is psychologically receptive to it, increasing the marketing effect.

The microphone and the means for reproducing and outputting content need not correspond one to one, as they do in a display-type device or mobile phone in which a display or speaker and a microphone are provided integrally. For example, in a shop, a microphone may be provided near each seat unit in which one or more people can sit, or near each table, while a reproduction output means having a large-screen display and a speaker is installed at a predetermined distance from those seats so that the customers in the seats can view it. The voice captured by each microphone is processed separately for key set recognition, and the reproduction output means reproduces and outputs content in response to recognition of a key set from the voice captured by a given microphone. In this configuration the microphones and the means for reproducing and outputting content in one system correspond many to one. Alternatively, for example in an amusement facility, one microphone may be provided for each seat unit seating several people, such as two to four, with a reproduction output means having a display and a speaker installed at each seat of that unit so that its occupant can view it. The reproduction output means at each seat then reproduce and output the same content based on the voice captured by the single microphone provided for the seat unit. In this configuration the microphone and the means for reproducing and outputting content correspond one to many. The microphones and the means for reproducing and outputting content in one system may also correspond many to many.

When a speaker is provided as the means for outputting audio content, it is preferable to use, for example, a speaker that outputs sound next to the ears of a target person seated in a reclining chair at a volume audible only to that person, or a directional speaker that carries the sound to the target person's ears on an ultrasonic carrier. The output of a speaker placed close to the microphone can then be excluded in advance from the voice captured by the microphone, so that speech recognition and keyword recognition can continue while the speaker is outputting sound. Any microphone may be used as long as it picks up the voice of the target person while excluding other voices; for example, a directional microphone aimed approximately at the target person's mouth, or a microphone that captures only sound at or above a predetermined volume, may be used to capture the voice of one or more nearby target persons. Furthermore, the voice of a target person at a predetermined position may be captured and recognized using existing means for locating a sound source, for example by frequency analysis of the sound acquired from a plurality of microphones.

Various devices installed in public spaces may also be provided with a sensor, such as an infrared sensor, that detects the presence of a target person at a predetermined target location. Such devices include devices having means for reproducing and outputting content, devices having a microphone, and devices having both, for example display-type or speaker-type devices placed near seats or near or on tables in shops, near seats in facilities, near seats in public vehicles, or on walls at meeting places. In response to the sensor detecting a target person at the predetermined location, a predetermined unit such as the control unit of the device cooperates with a control program to output a predetermined control command. Based on that control command, the device may reproduce and output content that is transmitted to it or designated content that it stores. Alternatively, based on that control command, a predetermined unit of the content providing system may execute the content providing process of recognizing keywords from the voice captured by the microphone, calling content based on the recognized keywords, and reproducing and outputting it.

Devices having means for reproducing and outputting content, such as display-type devices, speaker-type devices, or both, may be configured either to provide no content or to provide other suitable content when they are not providing content based on keyword recognition. For example, the reproduction processing unit of the device reproduces ordinary content that is stored or transmitted, and reproduces the content based on keyword recognition when reproduction of the ordinary content is interrupted, when it ends within a predetermined time, or after it has ended. The ordinary content is, for example, a television or radio program, advertising information, or guidance information, and also includes a menu screen on a display-type device. Alternatively, content based on keyword recognition may be provided in parallel with ordinary content. For example, the keyword-based content may be shown in part of the display; the ordinary content and the keyword-based content may be shown in a split screen; the audio of the ordinary content may be output while the image of the keyword-based content is shown on the display; or the audio of the keyword-based content may be output while the image of the ordinary content is shown on the display.

The content providing process based on recognition of keywords from voice may be executed only upon input of an execution request for that process or when an execution switch is turned ON, and not executed otherwise, which makes it possible to protect the privacy of the target person's conversation. For example, the content providing system may consist of a server and a display-type device connected over a network. The display-type device reproduces and displays, on a touch-panel display, a menu screen that it stores or that is transmitted to it. When the execution request button for the content providing process shown on the menu screen is selected, a predetermined unit of the system, such as the control unit of the display-type device or of the server, cooperates with a control program to output a predetermined execution control command. Based on that command, the predetermined unit executes the content providing process: it recognizes keywords from the voice captured by a microphone installed on or near the display-type device, calls content based on the recognized keywords, and reproduces and outputs the called content on the display-type device.

The content providing system may also be a game system or the like that provides highly entertaining content, for a fee or free of charge, for example content in which a particular character appears upon recognition of a particular combination of keywords. For example, when the favorite keywords of a particular character are recognized within a set time, content in which that character appears is provided; when keywords different from those favorite keywords are recognized within the set time, content in which another character associated with those different keywords appears is provided. Furthermore, a predetermined unit of the content providing system may reproduce and display a quiz-style guidance screen, stored in a predetermined storage unit, on the display of a display-type device, recognize keywords from the voice of the target person's answer to the quiz, and provide content, such as content in which a particular character appears, based on recognition of those keywords. When content is provided for a fee, billing information may be processed as follows, although any suitable configuration may be adopted. The content providing system consists, for example, of a server and a display-type device connected over a network. When the execution request button for the content providing process is selected, the billing processing unit of the display-type device or of the server records the start time of the keyword-based content providing process using time data from a real-time clock or the like. When the execution end button for the content providing process, shown in part of the touch-panel display while content is being provided or on the menu screen, is selected, the billing processing unit records the end time of the content providing process, obtains the usage time elapsed from the start time to the end time, and calculates the charge for the content provision by multiplying the elapsed time by a stored unit price per unit time. The billing processing unit then stores the charge, or a transmission processing unit sends the charge to a billing system that stores it in a predetermined storage unit.
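The time-based charge described above can be sketched as follows. This is a minimal illustration, not the patented billing unit: the function name `usage_charge`, the use of whole seconds, the one-minute default unit, and the rounding of a partial unit up to a whole unit are all assumptions, since the description only says that the elapsed time is multiplied by a unit price per unit time.

```python
def usage_charge(start_s: int, end_s: int, unit_price: int, unit_seconds: int = 60) -> int:
    """Charge = billable units between start and end, times the unit price.

    start_s / end_s are the recorded start and end times in whole seconds.
    Rounding a partial unit up to a full unit is an assumption of this sketch.
    """
    elapsed = end_s - start_s
    if elapsed < 0:
        raise ValueError("end time precedes start time")
    # Integer ceiling division avoids floating-point error in money amounts.
    units = -(-elapsed // unit_seconds)
    return units * unit_price


# Ten minutes of use at a hypothetical price of 10 per minute.
charge = usage_charge(start_s=0, end_s=600, unit_price=10)
```

A deployment that bills proportionally instead would drop the ceiling division and multiply the exact elapsed time by the price.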

The content providing system may also draw the attention of the person receiving the content as follows. For a predetermined key set that it has recognized, a predetermined unit identifies the keyword that was recognized last, or that has the latest recognition time, and extracts image data, audio data, or both that represent that keyword itself and are stored in a predetermined storage unit. Immediately before reproduction and output of the content corresponding to the key set begins, the reproduction processing unit reproduces and outputs, or the transmission processing unit transmits, the image data, audio data, or both for that keyword. The present invention also encompasses inventions obtained by adding matters specifying another invention to any of the inventions, by replacing matters specifying one invention with those of another, or by deleting matters specifying an invention to form a generic concept, to the extent that the partial effects of the present invention are obtained. Besides the system category, the present invention encompasses inventions of the same gist defined as methods or programs, and matters specifying an invention in one category may be used as matters specifying an invention in another category as appropriate. The predetermined means and predetermined units of the present invention are implemented as appropriate by a CPU that cooperates with an installed program, memory that stores programs and data, and the like.

The content providing system of the present invention tracks, continuously and in parallel, the topics of a speaker as they shift in complex ways over time, and can identify easily and quickly, without complex processing, the central topics and interests on which the conversation is most concentrated and lively in a given period. It identifies topics and interests with the accuracy that is necessary and sufficient for providing content, and can promptly provide, in real time, content that follows the speaker's central topics and interests and is therefore psychologically very easy to accept, whether to the speaker, to a target person listening to the speaker's conversation, or to a target person listening to that conversation through a reproduction output device. The system can be realized at low cost with a simple configuration and without large hardware resources, so the content providing system, or predetermined parts of it such as display-type devices and microphones, can be installed ubiquitously. Even when the number of topics or keywords set in the key sets grows, the promptness and real-time nature of content provision are maintained with no increase, or only a very slight increase, in the required hardware resources and cost. Because the recognition-corresponding times are recorded and updated, the elapsed recognition time from the earliest recognition-corresponding time to the latest one can be obtained very easily without complex processing, even when the number of key sets or keywords grows. Because content is provided upon recognition of preset keywords, content is provided in moderation and excessive provision of information is prevented. A speaker can also deliberately talk about a particular topic in order to call up and receive, or provide, the content corresponding to that topic, which improves convenience and entertainment value for the speaker and for the person receiving the content.

A first embodiment of the content providing system of the present invention will now be described. As shown in FIG. 1, the content providing system 100 of the first embodiment basically comprises a microphone 101, a feature extraction unit 102, a keyword recognition unit 103, a standard feature storage unit 104, a recognition management unit 105, a content calling unit 106, a content database (content DB) 107, a reproduction processing unit 108, and a display 109. Predetermined units such as the feature extraction unit 102, keyword recognition unit 103, standard feature storage unit 104, recognition management unit 105, content calling unit 106, content DB 107, and reproduction processing unit 108 are implemented by a CPU that cooperates with an installed program, memory that stores programs and data, and the like. The physical arrangement of units 101 to 109 may be chosen as appropriate. For example, the system may be a display-type device in which units 101 to 109 are provided integrally. It may instead consist of a display-type device integrating everything except the content DB 107 and a server holding the content DB 107, with the display-type device exchanging data with the server. It may also consist of a display-type device integrating the microphone 101, the reproduction processing unit 108, and the display 109, and a server providing units 102 to 107; in that case the display-type device sends voice to the server having the feature extraction unit 102, the server executes the predetermined processing, and the reproduction processing unit 108 receives content data from the server. Any suitable units, or devices containing them, may also be connected over a network to form the system.

The feature extraction unit 102 takes in the voice input from the microphone 101, performs analog-to-digital conversion, and extracts a feature quantity, such as a cepstrum, for each unit of time by acoustic analysis. The standard feature storage unit 104 stores a standard feature quantity associated with each registered keyword. The keyword recognition unit 103 compares the feature quantities extracted by the feature extraction unit 102 with the standard feature quantity of each keyword stored in the standard feature storage unit 104, for example by continuous DP matching, to calculate a similarity distance. It determines whether the calculated similarity distance is at or below a stored predetermined threshold and, if so, recognizes the keyword associated with the standard feature quantity for which that distance was calculated. Using a word-spotting speech recognition technique such as this to recognize the preset keywords is preferable for two reasons: a system that provides content in real time with a fast response can be realized at low cost, and, with no special arrangement, nothing other than the predetermined keywords is extracted, which protects the privacy of the people whose voices are captured. Nevertheless, any suitable speech recognition technique may be used to recognize the preset keywords.
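The threshold test on a DP-matching distance can be illustrated with a small sketch. It is a simplification under stated assumptions: it uses plain dynamic time warping over an already segmented window of frames rather than the continuous DP matching named in the description, it normalizes the accumulated cost by the combined sequence length, and the names `dtw_distance` and `recognize_keyword`, the one-dimensional toy features, and the threshold value are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (e.g. cepstral vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq, template):
    """Length-normalized DTW distance between two sequences of frames."""
    n, m = len(seq), len(template)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def recognize_keyword(seq, templates, threshold):
    """Return the closest registered keyword if its distance is <= threshold, else None."""
    best_kw, best_d = None, float("inf")
    for kw, tpl in templates.items():
        d = dtw_distance(seq, tpl)
        if d < best_d:
            best_kw, best_d = kw, d
    return best_kw if best_d <= threshold else None


# Toy one-dimensional "features"; real frames would be cepstral vectors.
templates = {"ramen": [[0.0], [1.0], [2.0]], "beach": [[5.0], [5.0], [6.0]]}
spoken = [[0.0], [1.0], [1.0], [2.0]]  # a time-stretched utterance of "ramen"
result = recognize_keyword(spoken, templates, threshold=0.5)
```

Because only registered templates are compared, input that matches no template within the threshold yields `None`, which mirrors the point that nothing outside the preset keywords is extracted.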

When the keyword recognition unit 103 recognizes a keyword, the recognition management unit 105 obtains the recognition time of that keyword from, for example, the time measured by a real-time clock. Using a correspondence table or the like that records the one or more recognition time recording areas corresponding to each keyword, it identifies every recognition time recording area for that keyword in the recognition management table of FIG. 2, which it stores, and records the recognition time in each identified area. If a recognition time has already been recorded in an identified area from an earlier recognition of the same keyword, it is updated to the new recognition time. The recognition management table of FIG. 2 holds a plurality of key sets, each identified by an identification ID; each key set contains a plurality of different keywords, and a recognition time recording area is provided for each keyword. In the example of FIG. 2, each key set contains three or four keywords. Setting three or more different keywords in each key set is preferable because it allows content better matched to the central topic of the input speech to be provided. The recognition management unit 105 also successively identifies and erases any recorded recognition time that is older than the current time, obtained from a real-time clock or the like, by more than the stored recognition setting time. Finally, the recognition management unit 105 identifies the identification ID of any key set in which a recognition time is recorded for every keyword, and outputs that identification ID to the content calling unit 106.
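One way to picture the recognition management table is as a mapping from key set ID to per-keyword time slots. The sketch below is an assumption-laden illustration rather than the table of FIG. 2 itself: the class name `RecognitionTable`, the keyword names, and the use of plain seconds are hypothetical, and the erasure of old recognition times is left out to keep the example short.

```python
class RecognitionTable:
    """Per-key-set recognition time slots, keyed by identification ID."""

    def __init__(self, key_sets):
        # key_sets: {set_id: [keyword, ...]}; None marks an empty time slot.
        self.times = {sid: {kw: None for kw in kws} for sid, kws in key_sets.items()}

    def record(self, keyword, now):
        """Record (or overwrite) the time in every key set that contains the keyword."""
        for slots in self.times.values():
            if keyword in slots:
                slots[keyword] = now

    def complete_sets(self):
        """IDs of key sets in which every keyword has a recorded recognition time."""
        return [sid for sid, slots in self.times.items()
                if all(t is not None for t in slots.values())]


table = RecognitionTable({"0001": ["A", "B", "C"], "0002": ["D", "E", "F", "G"]})
table.record("A", now=0)
table.record("B", now=40)
table.record("A", now=90)   # a repeat recognition updates the stored time
table.record("C", now=120)
ready = table.complete_sets()
```

Each recognition touches only the slots for one keyword, so the cost of an update does not depend on how long the conversation has run.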

On receiving an identification ID from the recognition management unit 105, the content calling unit 106 determines whether the reproduction processing unit 108 is currently reproducing content. If content is not being reproduced, it calls the content associated with that identification ID from the content stored in the content DB 107 and outputs the called content to the reproduction processing unit 108. If content is being reproduced, it starts measuring a stored call setting time using a real-time clock or the like. If it detects that reproduction has ended within the call setting time, it calls the content corresponding to the identification ID and outputs it to the reproduction processing unit 108; if it does not detect the end of reproduction within the call setting time, it does not call that content. The reproduction processing unit 108 decodes and reproduces the content called by the content calling unit 106, for example moving-image or still-image content, and the display 109 displays the images reproduced by the reproduction processing unit 108. The content is preferably advertising information, for example, but may also be guidance information for a facility or the like, an explanation of or a program about matters closely related to the keywords of the key set, a quiz, or any other suitable content related to the keywords.
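The deferral rule of the content calling unit can be sketched as a small state holder. The class and method names, the 30-second call setting time, and the content file name are hypothetical, and the behavior when a second identification ID arrives while one is already waiting (here the newer one replaces the older) is an assumption, since the description does not cover that case.

```python
class ContentCaller:
    """Calls content for a key set ID, deferring while other content is playing."""

    def __init__(self, content_db, call_timeout):
        self.content_db = content_db      # {set_id: content}
        self.call_timeout = call_timeout  # call setting time, in seconds
        self.pending = None               # (set_id, deadline) while waiting

    def on_key_set(self, set_id, now, playing):
        """Handle an identification ID arriving from the recognition manager."""
        if not playing:
            return self.content_db.get(set_id)
        # Playback is busy: start the call setting timer instead of calling.
        self.pending = (set_id, now + self.call_timeout)
        return None

    def on_playback_end(self, now):
        """Return the deferred content if playback ended before the deadline."""
        if self.pending is None:
            return None
        set_id, deadline = self.pending
        self.pending = None
        return self.content_db.get(set_id) if now <= deadline else None


caller = ContentCaller({"0001": "noodle_ad.mp4"}, call_timeout=30)
immediate = caller.on_key_set("0001", now=0, playing=False)
deferred = caller.on_key_set("0001", now=10, playing=True)
after_end = caller.on_playback_end(now=25)
```

Dropping a call whose timer has run out keeps stale content from appearing after the conversation has moved on.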

FIG. 3 shows the flow of processing by the content providing system 100 of the first embodiment. The microphone 101 of the content providing system 100, configured for example as a display-type device, picks up the speech of people nearby, and voice is input continuously (S1). The input speech is taken into the feature extraction unit 102 as it arrives and A/D converted; the feature extraction unit 102 extracts a feature quantity from the voice data for each unit of time (S2) and outputs it to the keyword recognition unit 103. The keyword recognition unit 103 continually compares the feature quantities of the input voice with the standard feature quantities stored in the standard feature storage unit 104 to calculate a similarity distance (S3) and determines whether the calculated distance is at or below the predetermined threshold (S4). If it is, the keyword recognition unit 103 recognizes the keyword associated with the standard feature quantity for which that distance was calculated (S5).

In response to recognition of a predetermined keyword by the keyword recognition unit 103, the recognition management unit 105 acquires the recognition time for that keyword (S6), identifies the recognition time recording area of the recognition management table corresponding to the keyword, and records the recognition time of the keyword in that area (S7). If a recognition time is already recorded in the recognition time recording area, it is overwritten with the new recognition time (S7). When the same keyword is set in more than one key set, such as keyword A of the key set with identification ID 0001 and keyword A of the key set with identification ID 0003 in the recognition management table of FIG. 2, the recognition time is recorded in the recognition time recording area for that keyword in every key set to which the keyword belongs, as with 12:30:05 in FIG. 2. The recognition management unit 105 then determines, as needed, for each recorded recognition time, whether the time elapsed from that recognition time to the current time exceeds the recognition setting time, for example 10 minutes (S8), and if so, erases the record of that recognition time (S9). The recognition setting time can be set as appropriate according to the number of keywords in a key set, the expected time over which multiple keywords appear in natural conversation, and so on; for example, a predetermined time of 3 to 30 minutes, preferably 5 to 15 minutes. With such a recognition setting time, the predetermined processing is performed on audio captured from natural conversation among nearby people, from a nearby person's conversation on a mobile phone, and the like.
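The recording and expiry steps S6 to S9 can be sketched as follows. The table layout (a dictionary from identification ID to a dictionary of keyword and recognition time, with None meaning "not recorded"), the use of seconds, and the function names are assumptions for illustration; FIG. 2 is not reproduced here.

```python
def record_recognition(table, keyword, now):
    """S6-S7: store `now` for `keyword` in every key set that contains it.

    An existing recognition time for the keyword is simply overwritten.
    """
    for times in table.values():
        if keyword in times:
            times[keyword] = now

def expire_old(table, now, window):
    """S8-S9: erase every recognition time older than `window` seconds."""
    for times in table.values():
        for keyword, recorded in times.items():
            if recorded is not None and now - recorded > window:
                times[keyword] = None
```

Because a shared keyword such as A is written into every key set that lists it, one utterance can advance several key sets at once.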

Further, the recognition management unit 105 recognizes that recognition times are recorded for all keywords of a given key set and that the time elapsed from the earliest of those recognition times to the current time is within the recognition setting time (S10). In response, it identifies the identification ID of that key set and outputs it to the content calling unit 106 (S11), and erases the recognition time records of all keywords in that key set to initialize them. In the example of the recognition management table of FIG. 2, identification ID 0001 is output to the content calling unit 106 once recognition times for keywords A, B, and C are recorded within the recognition setting time of, for example, 10 minutes. When the recognition management unit 105 outputs the identification ID of the key set, it leaves the recognition time records of the other key sets as they are, restarts recording of recognition times for that key set, and continues recording recognition times based on keyword recognition. The number of keywords set for each key set in the recognition management table may be the same for all key sets, or may differ between key sets, as with the three keywords A, B, and C of identification ID 0001 and the keywords D, E, F, and G of identification ID 0002 in FIG. 2. In that case too, the identification ID is output based on recognition times being recorded within the recognition setting time for all keywords set for the key set. Instead of storing a single recognition setting time applied uniformly to all key sets, the recognition management unit 105 may store a recognition setting time set separately for each key set and perform the above processing with it.
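Steps S10 and S11 can be sketched as a single check over the recognition management table. The table layout (identification ID mapped to a dictionary of keyword and recognition time in seconds, None when unrecorded) and the name `completed_key_sets` are assumptions for illustration, and a single uniform recognition setting time is used rather than per-key-set values.

```python
def completed_key_sets(table, now, window):
    """Return the identification IDs of key sets recognized at time `now`.

    S10: a key set is recognized when every keyword has a recorded time and
    the earliest of those times is no more than `window` seconds before `now`.
    S11: the records of a recognized key set are reset; other key sets are
    left untouched.
    """
    completed = []
    for set_id, times in table.items():
        recorded = list(times.values())
        if not recorded or any(t is None for t in recorded):
            continue
        if now - min(recorded) <= window:
            completed.append(set_id)
            for keyword in times:
                times[keyword] = None
    return completed
```

Resetting only the recognized key set means a partially filled key set elsewhere in the table keeps its progress.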

Based on the input of the identification ID from the recognition management unit 105, the content calling unit 106 determines whether the reproduction processing unit 108 is reproducing content corresponding to a previously recognized key set (S12). If it obtains no-reproduction data from the reproduction processing unit 108 and determines that no content is being reproduced, it calls the content associated with the identification ID from the content DB 107 (S15) and outputs it to the reproduction processing unit 108. If it obtains reproduction-in-progress data from the reproduction processing unit 108 and determines that content is being reproduced, it starts measuring the stored call setting time using a real-time clock or the like (S13) and determines whether reproduction of the content ends while the call setting time is being measured (S14). If, during the measurement, it obtains no-reproduction or reproduction-end data from the reproduction processing unit 108 and thereby determines or recognizes that reproduction has ended, it calls the content associated with the identification ID from the content DB 107 (S15), outputs it to the reproduction processing unit 108, and resets the measured time. If, on the other hand, it obtains no such data during the measurement and does not determine or recognize that reproduction has ended, then when measurement of the call setting time finishes it ends processing without calling or reproducing the content associated with the identification ID, and resets the measured time.
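The branch structure of S12 to S15 can be sketched as a polling loop with a deadline. Polling is only one possible way to detect the end of reproduction; the function name `call_content`, the `is_playing` callback, the dictionary used as the content DB, and the injectable `clock` and `sleep` parameters are all assumptions for illustration.

```python
import time

def call_content(set_id, content_db, is_playing, call_timeout,
                 poll_interval=0.1, clock=time.monotonic, sleep=time.sleep):
    """Return the content for `set_id`, or None if it must not be called.

    `is_playing` is a hypothetical callback reporting whether the
    reproduction side is busy; `call_timeout` is the call setting time
    in seconds.
    """
    if not is_playing():                   # S12: nothing being reproduced
        return content_db[set_id]          # S15: call the content at once
    deadline = clock() + call_timeout      # S13: start measuring
    while clock() < deadline:              # S14: wait for reproduction to end
        if not is_playing():
            return content_db[set_id]      # S15: ended in time
        sleep(poll_interval)
    return None                            # timed out: content is not called
```

Dropping the request after the call setting time keeps the displayed content close in time to the conversation that triggered it, at the cost of occasionally discarding a recognized key set.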

The reproduction processing unit 108 then decodes and reproduces the content, such as an advertisement video, input from the content calling unit 106, and the display 109 displays the reproduced video (S16). This reproduction and display take place substantially in real time with, or substantially within the call setting time of, recognition of the last keyword of the recognized key set, and the reproduced video is presented to the speakers whose speech was captured near the display-type device or other device constituting the content providing system 100. With the content providing system 100, it is possible, for example, to recognize the keywords "overseas travel", "summer", and "Bali" within the set time and, based on this, provide advertisement content for a summer trip to Bali corresponding to those keywords.

Alternatively, for example, the content calling unit 106 may, in response to input of an identification ID from the recognition management unit 105, call the content associated with the identification ID and output it to the reproduction processing unit 108, which stores the input content in a predetermined storage area. The reproduction processing unit 108 or a designated unit then determines whether content is being reproduced. If not, the reproduction processing unit 108 decodes and reproduces the stored content. If content is being reproduced, the reproduction processing unit 108 or the designated unit starts measuring a reproduction setting time and determines whether reproduction of the content ends during the measurement. If the end of reproduction is recognized during the measurement, the reproduction processing unit 108 decodes and reproduces the stored content, and the reproduction processing unit 108 or the designated unit resets the measured time. If the end of reproduction is not recognized during the measurement, then when the measurement finishes the reproduction processing unit 108 erases the stored content without reproducing it, and the reproduction processing unit 108 or the designated unit resets the measured time. The content calling unit 106 may also obtain reproduction-in-progress or reproduction-end data from a reproduction processing unit 108 that is installed separately and connected over a network, call content or start measuring the set time accordingly, and have the reproduction processing unit 108 reproduce the content it receives from the content calling unit 106.

Next, a second embodiment of the content providing system will be described. As shown in FIG. 4, the content providing system 100 of the second embodiment includes, as in the first embodiment, a microphone 101, a feature extraction unit 102, a keyword recognition unit 103, a standard feature storage unit 104, a recognition management unit 105, a content calling unit 106, and a content database (content DB) 107. Unlike the first embodiment, it includes a transmission processing unit 110 that transmits the content called by the content calling unit 106 to a reproduction output device 200 outside the content providing system 100. These units are implemented by a CPU operating in cooperation with an installed program, memory that stores the program and data, and the like. The functions and operations of the microphone 101, feature extraction unit 102, keyword recognition unit 103, standard feature storage unit 104, recognition management unit 105, and content DB 107 are the same as in the first embodiment.

In response to input of an identification ID from the recognition management unit 105, the content calling unit 106 determines whether the transmission processing unit 110 is transmitting content. If it determines that the transmission processing unit 110 is not transmitting content, it calls the content associated with the identification ID from the content stored in the content DB 107 and outputs the called content to the transmission processing unit 110. If it determines that the transmission processing unit 110 is transmitting content, it starts measuring the stored call setting time using a real-time clock or the like. If it recognizes the end of content transmission within the call setting time, it calls the content corresponding to the identification ID and outputs it to the transmission processing unit 110; if it does not recognize the end of transmission within the call setting time, it does not call the content corresponding to the identification ID.

The transmission processing unit 110 stores the content called and output by the content calling unit 106 and transmits the content data to the reproduction output device 200 sequentially in time series; the content data is, for example, image data displayed on part of the display of the reproduction output device 200. The reproduction output device 200 decodes, reproduces, and outputs the sequentially transmitted content data. It is, for example, a digital television receiver that receives and reproduces transmitted image and audio data and outputs them through a display and speakers, and that places the moving or still image of the content on part of the display screen using picture-in-picture or the like, displaying the content image as a sub-image alongside the main image. The content is preferably, for example, an explanation of matters highly relevant to the keywords belonging to the key set, advertisement information, a quiz, or guidance information, but may be any other appropriate content related to the keywords.

FIG. 5 shows the flow of processing by the content providing system 100 of the second embodiment. The content providing system 100 of the second embodiment is used, for example, when a live broadcast program is transmitted to a reproduction output device 200 such as a television receiver: the microphone 101 captures the performers' speech from the program being broadcast, keywords are recognized from that speech, and, based on the recognized keywords, explanations of related matters or advertisement information are transmitted as content. As shown in FIG. 5, the same processing as in the first embodiment is performed until the recognition management unit 105 outputs the identification ID of a key set to the content calling unit 106 (S1 to S11).

Then, in response to input of the identification ID of the key set from the recognition management unit 105, the content calling unit 106 determines whether the transmission processing unit 110 is transmitting content corresponding to a previously recognized key set (S17). If it obtains no-transmission data from the transmission processing unit 110 and determines that no content is being transmitted, it calls the content associated with the identification ID from the content DB 107 (S15) and outputs it to the transmission processing unit 110; the transmission processing unit 110 stores the input content in a predetermined storage area and transmits the stored content data to the reproduction output device 200 sequentially in time series (S19). If the content calling unit 106 obtains transmission-in-progress data from the transmission processing unit 110 and determines that content is being transmitted, it starts measuring the stored call setting time using a real-time clock or the like (S13) and determines whether transmission of the content ends while the call setting time is being measured (S18). If, during the measurement, it obtains no-transmission or transmission-end data from the transmission processing unit 110 and thereby determines or recognizes that transmission has ended, it calls the content associated with the identification ID from the content DB 107 (S15), outputs it to the transmission processing unit 110, and resets the measured time; the transmission processing unit 110 stores the input content in a predetermined storage area and transmits the stored content data to the reproduction output device 200 sequentially in time series (S19). If, on the other hand, the content calling unit 106 obtains no such data during the measurement and does not determine or recognize that transmission has ended, then when measurement of the call setting time finishes it ends processing without calling or transmitting the content associated with the identification ID, and resets the measured time.

The reproduction output device 200 that receives the content data receives and reproduces, for example, a program transmitted by ordinary broadcast, and its reproduction processing unit decodes and reproduces the content data as it is received, displaying the content image on, for example, part of the display. The transmission and display take place substantially in real time with, or substantially within the call setting time of, recognition of the last keyword of the recognized key set, and the reproduced content image is presented to, for example, viewers watching the program on the reproduction output device 200. With the content providing system 100, it is possible, for example, to recognize keywords from the conversation of performers on a television program and provide content such as information or explanations related to those keywords.

Alternatively, for example, the content calling unit 106 may, in response to input of an identification ID from the recognition management unit 105, call the content associated with the identification ID and output it to the transmission processing unit 110, which stores the input content in a predetermined storage area. The transmission processing unit 110 or a designated unit then determines whether content is being transmitted. If not, the transmission processing unit 110 transmits the stored content to the reproduction output device 200. If content is being transmitted, the transmission processing unit 110 or the designated unit starts measuring a transmission setting time and determines whether transmission of the content ends during the measurement. If the end of transmission is recognized during the measurement, the transmission processing unit 110 transmits the stored content, and the transmission processing unit 110 or the designated unit resets the measured time. If the end of transmission is not recognized during the measurement, then when the measurement finishes the transmission processing unit 110 erases the stored content without reproducing it, and the transmission processing unit 110 or the designated unit resets the measured time.

Next, modifications of the content providing system 100 of the first and second embodiments will be described. In the first and second embodiments the content provided is an image, but the content is not limited to this: besides a moving or still image without audio, it may be a moving or still image with audio, audio alone without an image, and so on. When audio content or content containing audio is provided, the audio is, for example, decoded and reproduced by the reproduction processing unit 108 and output from a speaker installed in the content providing system 100, or decoded and reproduced by the reproduction processing unit 108 of the reproduction output device 200 and output from a speaker installed in the reproduction output device 200, so that the audio of the content is provided to viewers of the display 109 or the reproduction output device 200.

In the first and second embodiments, a key set consists of multiple kinds of keywords, the recognition time of each keyword is recorded, and a key set is recognized based on recognition of all its keywords within the recognition setting time. However, any configuration that recognizes the key set corresponding to multiple kinds of keywords based on their recognition within the recognition setting time is acceptable. For example, a key set may consist of multiple kinds of keys associated with keywords, with the recognition time of a keyword recorded as the recognition time of its corresponding key, and the key set recognized based on recognition, or recorded recognition times, of all its keys within the recognition setting time. As shown in FIG. 6, for example, one keyword or several keywords with similar meanings (for example A1, A2, A3) are associated with a key (for example A) that represents the group of similar keywords or a representative keyword, and multiple kinds of keys (for example A, B, C) make up a key set specified by an identification ID. When a keyword (for example A1) is recognized, the recognition management unit 105 records its recognition time as the recognition time of the corresponding key (for example A); when any keyword corresponding to the same key (for example A1, A2, or A3) is recognized, it updates the recorded recognition time of that key; and it erases the recognition time recorded for any key for which the time elapsed from the recognition time to the current time exceeds the recognition setting time. When recognition times have been recorded within the recognition setting time for all the keys (for example A, B, C) making up a key set, it recognizes that key set, for example the key set specified by identification ID 0001.
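The keyword-to-key variant can be sketched as one extra lookup before the recognition time is recorded. The mapping below follows the A1, A2, A3 example of the description; the entries "B1" and "C1" and the function name `record_key` are hypothetical and added only so the example has more than one key.

```python
# Several similar keywords share one key; "B1" and "C1" are made-up entries.
KEYWORD_TO_KEY = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "C1": "C"}

def record_key(key_times, keyword_to_key, keyword, now):
    """Record `now` as the recognition time of the key that `keyword` maps to.

    Returns the key that was updated, or None for an unmapped keyword.
    A later keyword of the same key overwrites the earlier time.
    """
    key = keyword_to_key.get(keyword)
    if key is not None:
        key_times[key] = now
    return key
```

With this mapping, a key set defined over keys A, B, and C is satisfied by any one keyword per key, so synonyms do not need separate key sets.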

The configuration for recognizing a key set, namely recognizing all keywords or keys of the key set and confirming that the time elapsed from the earliest to the latest of their recognition times is within the recognition setting time, is not limited to the above configuration that erases recognition times older than the recognition setting time. For example, recognition times for keywords or keys may be recorded and updated as above without erasing expired ones; once recognition times are recorded for all keywords or keys of a key set, the elapsed time from the earliest to the latest of those recognition times is obtained and compared with the recognition setting time, and the key set is recognized if the elapsed time is within the recognition setting time.
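This erase-free variant reduces to comparing the spread of the recorded times with the recognition setting time. A minimal sketch, assuming recognition times in seconds stored per keyword or key with None for "not yet recorded"; the name `key_set_recognized` is hypothetical.

```python
def key_set_recognized(times, window):
    """Return True when every entry of `times` has a recognition time and
    the span from the earliest to the latest one is within `window`.

    No expired record is erased; stale times are simply overwritten when
    the same keyword or key is recognized again.
    """
    recorded = list(times.values())
    if not recorded or any(t is None for t in recorded):
        return False
    return max(recorded) - min(recorded) <= window
```

Compared with the erasing configuration, this check does not need a periodic sweep of the table, since it only runs when a key set has a full set of records.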

In the first and second embodiments, the time recorded for a keyword recognition is the recognition time obtained from a real-time clock or the like when the keyword is recognized, but any time that corresponds to the keyword recognition may be used; for example, the time at which the fact that the keyword or key was recognized is recorded.

Further, for example, the content calling unit 106 may, on the basis of an identification ID input from the recognition management unit 105, determine whether content corresponding to an already recognized key set is being reproduced by the reproduction processing unit 108 or transmitted by the transmission processing unit 110. When it determines that no content is being reproduced or transmitted, it calls the content set in correspondence with the identification ID from the content DB 107 and outputs it to the reproduction processing unit 108 or the transmission processing unit 110. When it determines that content is being reproduced or transmitted, it records and holds the identification ID input from the recognition management unit 105 in a predetermined storage area; if an identification ID is already held there, it overwrites the held ID with the newly input ID, so that the storage area holds the most recent ID. Then, when it determines or recognizes the end of reproduction or transmission by obtaining reproduction end data from the reproduction processing unit 108 or transmission end data from the transmission processing unit 110, it calls the content set in correspondence with the identification ID held in the predetermined storage area at that time and outputs it to the reproduction processing unit 108 or the transmission processing unit 110.
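The hold-and-overwrite behavior described in this variation can be illustrated with a minimal Python sketch. The class name `ContentCaller`, its method names, and the dictionary standing in for the content DB 107 are hypothetical; clearing the storage area after the held ID has been used is likewise an assumption, since the text does not say what happens to the held ID afterwards.

```python
class ContentCaller:
    """Hypothetical stand-in for the content calling unit 106."""

    def __init__(self, content_db):
        self.content_db = content_db  # identification ID -> content (stand-in for content DB 107)
        self.playing = None           # content currently being reproduced, if any
        self.pending_id = None        # the single "predetermined storage area"
        self.history = []             # contents handed to the player, in order

    def _start(self, identification_id):
        self.playing = self.content_db[identification_id]
        self.history.append(self.playing)

    def on_key_set(self, identification_id):
        """Called when the recognition management unit reports a key set."""
        if self.playing is None:
            self._start(identification_id)
        else:
            # Overwrite: only the newest ID survives while content is playing.
            self.pending_id = identification_id

    def on_playback_end(self):
        """Called when reproduction end data (or transmission end data) arrives."""
        self.playing = None
        if self.pending_id is not None:
            # Assumption: the storage area is cleared once its ID has been used.
            next_id, self.pending_id = self.pending_id, None
            self._start(next_id)
```

With key sets A, B and C recognized in that order while A's content is still playing, the content for B is never shown; only the content for C follows A.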

Further, for example, the same content may be set in correspondence with a plurality of different key sets or their identification IDs, and the content calling unit 106 may call that content from the content DB 107, in which it is stored, on the basis of the input of any of the identification IDs corresponding to the content.

When the content is a moving image, audio, or the like, the reproduction processing unit 108 or the transmission processing unit 110 normally reproduces and outputs, or transmits, the content once through to its end; however, reproduction or transmission in progress may be cut off partway as necessary. For example, the content calling unit 106, on the basis of the identification ID input upon recognition of a key set, calls the content set in correspondence with that identification ID from the content DB 107 and outputs it to the reproduction processing unit 108 or the transmission processing unit 110. On receiving the content, the reproduction processing unit 108 or the transmission processing unit 110 starts reproducing or transmitting it if no content is being reproduced or transmitted; if content is being reproduced or transmitted, the unit stops and ends that reproduction or transmission and then starts reproducing or transmitting the newly input content. Furthermore, when the content is a still image, the content may be reproduced or transmitted for a uniform set time stored in the reproduction processing unit 108 or the transmission processing unit 110, or the reproduction processing unit 108 or the transmission processing unit 110 may read a set time stored for each content in the content DB 107 or the like and reproduce or transmit the content for that set time.
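As an illustration of the interrupting variation and of the two ways of choosing a still-image display time, the following Python sketch may help. The names `Content`, `Player`, `still_duration` and the value of `DEFAULT_STILL_SECONDS` are all assumptions made for the example; the text gives no concrete set time.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed uniform set time for still images; the source gives no value.
DEFAULT_STILL_SECONDS = 10.0


@dataclass
class Content:
    name: str
    kind: str                                # "video", "audio" or "still"
    display_seconds: Optional[float] = None  # per-content set time, if one is stored


class Player:
    """Hypothetical stand-in for the reproduction processing unit 108."""

    def __init__(self):
        self.current = None
        self.log = []

    def play(self, content):
        # If something is already playing, stop it and switch to the new content.
        if self.current is not None:
            self.log.append(("stop", self.current.name))
        self.current = content
        self.log.append(("start", content.name))


def still_duration(content):
    """Return the display time for a still image, or None for other kinds."""
    if content.kind != "still":
        return None
    if content.display_seconds is not None:
        return content.display_seconds  # set time stored with the content
    return DEFAULT_STILL_SECONDS        # uniform set time held by the player
```

Calling `play` twice in a row logs a stop for the first content before the second one starts, which is the interrupting behavior; a content without its own `display_seconds` falls back to the uniform value.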

Further, a plurality of contents may be set in correspondence with one key set or its identification ID, each content being stored in the content DB 107 in correspondence with another condition item or a designated ID in addition to the key set or its identification ID. The content calling unit 106 then, on the basis of the recognition or input of a predetermined key set or its identification ID together with the other condition item or the like, calls the content set in correspondence with that key set or identification ID and that condition item. For example, each content is stored in the content DB 107 in correspondence with a key set identification ID and a time slot. In response to the input of the identification ID of a predetermined key set, the content calling unit 106 acquires, from the time measured by a real-time clock or the like, the current time at which the identification ID was input, compares it with the time slots held in a predetermined storage area to identify the time slot that contains the current time, and calls from the content DB 107 the content corresponding to the input identification ID and the identified time slot. In this way, different content may be provided depending on the time at which a keyword or a key set was recognized.
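The time-slot lookup can be sketched in Python as follows. The slot names and boundaries in `TIME_SLOTS`, the identification ID `"KS1"`, and the content names are invented for illustration; returning nothing when no slot or no content matches is also an assumption, since the text does not describe that case.

```python
from datetime import time

# Hypothetical time slots held in the "predetermined storage area":
# (slot name, start inclusive, end exclusive).
TIME_SLOTS = [
    ("morning", time(5, 0), time(11, 0)),
    ("daytime", time(11, 0), time(17, 0)),
    ("evening", time(17, 0), time(23, 0)),
]

# Hypothetical content DB keyed by (key set identification ID, time slot).
CONTENT_DB = {
    ("KS1", "morning"): "breakfast-menu",
    ("KS1", "evening"): "dinner-menu",
}


def slot_for(now):
    """Return the name of the time slot containing `now`, or None."""
    for name, start, end in TIME_SLOTS:
        if start <= now < end:
            return name
    return None


def call_content(identification_id, now):
    """Look up content by identification ID and the slot containing `now`."""
    return CONTENT_DB.get((identification_id, slot_for(now)))
```

For example, `call_content("KS1", time(8, 30))` selects the morning entry, while the same key set recognized at 19:00 selects the evening entry.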

Furthermore, each content may be stored in the content DB 107 in correspondence with a key set identification ID and an environmental category such as a location category or a weather category. For example, in response to the input of the identification ID of a predetermined key set, the content calling unit 106 reads the location category or weather category held in a predetermined storage area and calls from the content DB 107 the content corresponding to the input identification ID and that location or weather category, so that content suited to the place or the weather is provided. The location category can be set as appropriate, for example by the type of store in which the content reproduction output means is installed or by region. The weather category can be set as appropriate by temperature, humidity, weather, or the like. For example, a predetermined part of the content providing system, in cooperation with a control program, may read weather data input for the day, such as temperature, humidity, weather, or a combination of these, compare the weather data with the stored weather categories to identify the category that contains it, and hold that weather category in a predetermined storage area.
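A small Python sketch of the weather-category variation follows. The category names, the temperature thresholds in `classify_weather`, and the content entries are assumptions chosen only to make the example concrete; the text leaves the categories to be "set as appropriate".

```python
# Hypothetical content DB keyed by (key set identification ID, weather category).
CONTENT_DB = {
    ("KS1", "rainy"): "umbrella-ad",
    ("KS1", "hot"): "cold-drink-ad",
    ("KS1", "mild"): "generic-ad",
}

# Stand-in for the "predetermined storage area" holding today's category.
storage = {"weather_category": None}


def classify_weather(temperature_c, sky):
    """Map the day's weather data to a category (thresholds are assumptions)."""
    if sky == "rain":
        return "rainy"
    if temperature_c >= 28:
        return "hot"
    if temperature_c <= 8:
        return "cold"
    return "mild"


def register_daily_weather(temperature_c, sky):
    """Classify the weather data input for the day and hold the category."""
    storage["weather_category"] = classify_weather(temperature_c, sky)


def call_content(identification_id):
    """Look up content by identification ID and the held weather category."""
    return CONTENT_DB.get((identification_id, storage["weather_category"]))
```

The classification runs once when the day's weather data is input, so each later key set recognition only needs a dictionary lookup. A location category could be handled the same way with a fixed value per installation.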

Furthermore, for example, each content may be stored in the content DB 107 in correspondence with a key set identification ID and a designated ID. Among the plurality of contents corresponding to the identification ID of a predetermined key set, the content calling unit 106 obtains the designated ID of the content ranked next after the content that was output or reproduced, for example by adding a predetermined number such as 1 to the designation number that is the designated ID of the output or reproduced content, and holds that designated ID in a predetermined storage area in correspondence with the identification ID of the key set. Thereafter, on the basis of the input of the identification ID of that key set, it reads the designated ID held for that identification ID and calls the content corresponding to the identification ID and the designated ID. In this way, different content may be provided the second and subsequent times the same key set is recognized.
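The designated-ID rotation can be sketched in Python as below. The names `CONTENT_DB`, `next_designated` and `call_next` are hypothetical, as are starting at designation number 1 and wrapping back to 1 after the last content; the text only describes adding a predetermined number such as 1.

```python
# Hypothetical content DB keyed by (key set identification ID, designated ID).
CONTENT_DB = {
    ("KS1", 1): "ad-1",
    ("KS1", 2): "ad-2",
    ("KS1", 3): "ad-3",
}

# Stand-in for the "predetermined storage area": identification ID -> next designated ID.
next_designated = {}


def call_next(identification_id):
    """Return the content for this key set and advance its designated ID."""
    designated = next_designated.get(identification_id, 1)  # assumption: start at 1
    content = CONTENT_DB[(identification_id, designated)]
    following = designated + 1                               # add the predetermined number (1)
    if (identification_id, following) not in CONTENT_DB:
        following = 1                                        # assumption: wrap around
    next_designated[identification_id] = following
    return content
```

Four successive recognitions of the same key set would yield the first, second and third contents and then the first again under the wrap-around assumption.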

Further, for example, a plurality of contents to be reproduced or transmitted in succession may, as necessary, be set in correspondence with one key set or its identification ID and stored in the content DB 107. On the basis of the input of a predetermined key set or its identification ID, the content calling unit 106 calls the plurality of contents set in correspondence with it, and the reproduction processing unit 108 or the transmission processing unit 110 reproduces or transmits them in succession. One content or a plurality of contents may thus be provided as necessary.

The present invention can, for example in a store, capture voice from the conversation of speakers, identify a topic based on keywords contained in the voice, and provide content such as advertisements corresponding to that topic to the speakers in real time.

A block diagram showing the overall configuration of the content providing system of the first embodiment.
A diagram showing the data structure of the recognition management table.
A flowchart showing the flow of content providing processing by the content providing system of the first embodiment.
A block diagram showing the overall configuration of the content providing system of the second embodiment.
A flowchart showing the flow of content providing processing by the content providing system of the second embodiment.
A diagram showing a modification of the data structure of the recognition management table.

Explanation of symbols

100 Content providing system
101 Microphone
102 Feature extraction unit
103 Keyword recognition unit
104 Standard feature storage unit
105 Recognition management unit
106 Content calling unit
107 Content DB
108 Reproduction processing unit
109 Display
110 Transmission processing unit
200 Reproduction output device

Claims

A content providing system comprising a computer or networked computers, the system comprising: means for recognizing voice captured from a microphone and recognizing set keywords from the recognized voice; means for recording, in response to the recognition of a keyword, a recognition-corresponding time of that keyword in association with the keyword or a key corresponding to the keyword, or for updating the recorded recognition-corresponding time in response to the recognition of the keyword; means for recognizing a predetermined key set for which, among the plural kinds of keywords or keys constituting the key set, the elapsed time from the earliest recognition-corresponding time to the latest recognition-corresponding time is within a recognition setting time; means for calling, from a content database storing content set in correspondence with each key set, the content set in correspondence with the recognized predetermined key set; means for reproducing the called content substantially in real time; and means for displaying normal content, such as a transmitted program, and the content based on keyword recognition on a divided screen.
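The key set recognition recited above, in which every keyword of a key set must have been recognized and the span from the earliest to the latest recorded time must fall within the recognition setting time, can be illustrated with a short Python sketch. The class `RecognitionManager`, its method `on_keyword`, and the use of plain seconds for timestamps are assumptions for illustration and are not taken from the embodiments.

```python
class RecognitionManager:
    """Hypothetical stand-in for the recognition management unit 105."""

    def __init__(self, key_sets, window_seconds):
        self.key_sets = key_sets      # key set ID -> frozenset of keywords
        self.window = window_seconds  # recognition setting time
        self.last_seen = {}           # keyword -> latest recognition time (seconds)

    def on_keyword(self, keyword, now):
        """Record or update the keyword's time and return matched key set IDs."""
        self.last_seen[keyword] = now
        matched = []
        for key_set_id, keywords in self.key_sets.items():
            if keyword not in keywords:
                continue
            if not all(k in self.last_seen for k in keywords):
                continue  # some keyword of this key set has never been recognized
            times = [self.last_seen[k] for k in keywords]
            if max(times) - min(times) <= self.window:
                matched.append(key_set_id)
        return matched
```

Because each keyword keeps only its latest time, a keyword that is repeated later can bring a stale key set back inside the window, which matches the update-on-recognition behavior.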

A content providing system comprising a computer or networked computers, the system comprising: means for recognizing voice captured from a microphone and recognizing set keywords from the recognized voice; means for recognizing, based on the recognition of keywords, a predetermined key set from among key sets each composed of plural kinds of keywords or keys corresponding to the keywords; means for determining, based on the recognition of the predetermined key set, whether content is being reproduced or transmitted; means for reproducing the content set in correspondence with the predetermined key set, or transmitting it to a reproduction output device, based on the recognition of the predetermined key set or based on recognition of an end of reproduction or transmission that occurs within a set time from the determination that content is being reproduced or transmitted; and means for not reproducing or transmitting the content set in correspondence with the predetermined key set when no end of reproduction or transmission is recognized within the set time.
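One possible reading of the set-time condition recited above is sketched below in Python: a key set recognized while content is playing is held, and its content is sent only if the playback end arrives within the set time. The class `TimedCaller`, measuring the set time from the moment the key set was recognized, and letting a later key set replace a held one are assumptions made for this sketch.

```python
class TimedCaller:
    """Hypothetical caller that drops a held key set once the set time has passed."""

    def __init__(self, wait_seconds):
        self.wait = wait_seconds  # the "set time"
        self.playing = False
        self.pending = None       # (key set ID, time of the "playing" determination)
        self.sent = []            # key set IDs whose content was sent for playback

    def _send(self, key_set_id):
        self.playing = True
        self.sent.append(key_set_id)

    def on_key_set(self, key_set_id, now):
        if not self.playing:
            self._send(key_set_id)
        else:
            # Assumption: a newer key set replaces any key set already held.
            self.pending = (key_set_id, now)

    def on_playback_end(self, now):
        self.playing = False
        if self.pending is None:
            return
        key_set_id, determined_at = self.pending
        self.pending = None
        if now - determined_at <= self.wait:
            self._send(key_set_id)
        # Otherwise the end came too late and the held key set is discarded.
```

The effect is that content tied to a topic is not shown long after the conversation has moved on.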

A content providing system comprising a computer or networked computers, the system comprising: means for recognizing voice captured from a microphone and recognizing set keywords from the recognized voice; means for recognizing, based on the recognition of keywords, a predetermined key set from among key sets each composed of plural kinds of keywords or keys corresponding to the keywords; means for determining, based on the recognition of the predetermined key set, whether content is being reproduced or transmitted; means for recording and holding the recognition of the predetermined key set based on a determination that content is being reproduced or transmitted and, when the recognition of a key set is already recorded and held, for updating the held record to the recognition of the predetermined key set; and means for reproducing the content set in correspondence with the held predetermined key set, or transmitting it to a reproduction output device, based on recognition of the end of reproduction or transmission of the content.