JP2008170820A

JP2008170820A - Content provision system and method

Info

Publication number: JP2008170820A
Application number: JP2007005155A
Authority: JP
Inventors: Takeshi Moriyama (森山 剛)
Original assignee: Individual
Current assignee: Individual
Priority date: 2007-01-12
Filing date: 2007-01-12
Publication date: 2008-07-24

Abstract

PROBLEM TO BE SOLVED: To select the advertisement or other content to be displayed according to a speaker's emotion, in order to enhance the advertising effect.

SOLUTION: A speaker's voice is acquired (S11), the speaker's emotion is analyzed from the acquired speech (S12), the emotion analysis results are stored in a database (S13), and the screen display is changed based on those results (S14). For example, when the analysis result is "excitement", an advertisement stored in advance in association with the emotion category "excitement" is displayed on the monitor the speaker is viewing. The screen display is also changed based on a word spotting result (S15). For example, when the word "hot spring" is extracted from the conversation, an advertisement stored in advance in association with the word "hot spring" is displayed on the monitor the speaker is viewing. Clicking a banner advertisement on the screen with a pointing device such as a mouse then opens the banner's linked website (S16).

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a system and method in which, when two or more speakers converse using, for example, personal computers and the Internet, the topic and the speakers' emotions are determined from the input speech, and content such as advertisements and comments suited to that topic and emotion is displayed on the monitor of a personal computer.

A technique relating to a method and apparatus for detecting emotion from input speech has been proposed (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Application No. 2002-293926

In Patent Document 1, as described in paragraph [0032], emotion recognition is performed for a predetermined speaker; the emotion of an unspecified speaker cannot be determined.
The present invention therefore determines a speaker's emotion even when an unspecified speaker speaks unspecified content in an unspecified environment, and makes it possible to display comments or advertisements appropriate to the determined emotion.

A first feature of the present invention is a content providing system comprising: means for storing words and content in association with each other; word extraction means for extracting words from conversational speech; content reading means for reading the content stored in association with a word extracted by the word extraction means; and content transmission means for sending the read content to content reproduction means.

Content includes, for example, advertisements and comments.
The storage means includes means that store information magnetically, electrically, optically, or magneto-optically, specifically a hard disk drive (HDD), random access memory (RAM), CD drive, DVD drive, or MO drive.
The word extraction means includes the means used in the technique known as word spotting.
The content reproduction means is, for example, a personal computer or mobile phone capable of displaying GIF or JPEG images or websites written in HTML.
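The first feature amounts to a lookup from spotted words to stored content. The Python sketch below is illustrative only: real word spotting operates on the audio signal, so `spot_words` simply assumes a recognizer has already produced romanized tokens. The table `WORD_CONTENT`, the content IDs, and the `send` callback are hypothetical stand-ins for the storage, content transmission, and content reproduction means.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical word -> content table. Keys are romanized spoken forms as a
# word spotter might report them; values are placeholder content IDs.
WORD_CONTENT: Dict[str, str] = {
    "onsen": "ad:travel-company-A",
    "nabe": "ad:nabe-shop",
}

def spot_words(tokens: Iterable[str], vocabulary: Dict[str, str]) -> List[str]:
    """Stand-in for word spotting: keep tokens that are registered keywords.

    Real word spotting searches the audio itself; here a recognizer is
    assumed to have already produced a token sequence.
    """
    return [t for t in tokens if t in vocabulary]

def provide_content(tokens: Iterable[str], send: Callable[[str], None]) -> None:
    """Read the content linked to each spotted word and pass it to `send`,
    which plays the role of the content transmission means."""
    for word in spot_words(tokens, WORD_CONTENT):
        send(WORD_CONTENT[word])

sent: List[str] = []
provide_content(["kyou", "wa", "onsen", "ni", "ikitai"], sent.append)
print(sent)  # ['ad:travel-company-A']
```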

A second feature of the present invention is that, in addition to the first feature, the system further comprises means for storing a history of the content transmitted by the content transmission means; a plurality of content items are stored with priorities for a single word; and the content reading means, referring to the history of transmitted content, reads, from among the content items stored in association with the word extracted by the word extraction means, the item whose priority comes next after that of the most recently transmitted item.

"A plurality of content items are stored with priorities for a single word" means, for example, that for the word "onsen" (hot spring), travel company A's advertisement is stored at priority 1 and travel company B's advertisement at priority 2, and that for the spoken waveform of "keki" (cake), the advertisement of Ginza store A is stored at priority 1 and that of Aoyama store B at priority 2.

"The content reading means, referring to the history of transmitted content, reads, from among the content items stored in association with the word extracted by the word extraction means, the item whose priority comes next after that of the most recently transmitted item" means, for example, the following:
the word "onsen" is extracted,
and travel company A's advertisement, which has priority 1 among the content associated with "onsen", is transmitted;
the word "keki" is then extracted, and the advertisement of Ginza store A (priority 1) or Aoyama store B (priority 2) among the content associated with "keki" is transmitted;
when "onsen" is extracted again after that, travel company B's advertisement, which has priority 2 among the content associated with "onsen", is read.
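The priority rule can be checked against the "onsen"/"keki" example above. In this hypothetical sketch the history is one chronological list of transmitted content IDs shared by all words, content IDs are assumed unique across words, and wrapping back to priority 1 after the last item is an assumption the text does not address.

```python
from typing import Dict, List

def next_content(word: str, ranked: Dict[str, List[str]], history: List[str]) -> str:
    """Pick the item ranked just below the one most recently sent for `word`.

    ranked[word] lists content IDs from priority 1 downward; history is the
    chronological log of every transmitted ID, across all words.
    Assumptions: IDs are unique across words, and selection wraps back to
    priority 1 after the lowest-priority item.
    """
    candidates = ranked[word]
    for sent in reversed(history):              # newest entry first
        if sent in candidates:
            i = candidates.index(sent)
            return candidates[(i + 1) % len(candidates)]
    return candidates[0]                        # nothing sent yet for this word

ranked = {
    "onsen": ["travel-A", "travel-B"],
    "keki": ["ginza-A", "aoyama-B"],
}
history: List[str] = []
for w in ["onsen", "keki", "onsen"]:
    history.append(next_content(w, ranked, history))   # transmit and log
print(history)  # ['travel-A', 'ginza-A', 'travel-B']
```

Because the "keki" transmission does not belong to the "onsen" list, it is skipped when the second "onsen" is handled, which reproduces the travel company B outcome described in the text.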

A third feature of the present invention is a content providing system comprising: emotion type content storage means for storing emotion types and content in association with each other; feature amount calculation means for calculating feature amounts of the speech input from voice input means; emotion type determination means for determining an emotion type based on the calculated speech feature amounts; content reading means for reading the content stored in association with the determined emotion type; and transmission means for sending the read content to content reproduction means.

The feature amount calculation means calculates, for example, the mean and standard deviation of the voice power and the mean and standard deviation of the pitch, and can compute results by substituting these means and standard deviations into predetermined formulas.

A fourth feature of the present invention is that, in addition to the third feature, the system further comprises means for storing the emotion type determined by the emotion type determination means at every first predetermined interval, and emotion type extraction means for extracting one emotion type, at every second predetermined interval, from the emotion types stored at the first predetermined intervals; the content reading means reads, from the emotion type content storage means, the content stored in association with the emotion type extracted by the emotion type extraction means.
For example, the emotion type extraction means stores the emotion type every second and, every 5 seconds, extracts the emotion that appeared most frequently in the most recent 5 seconds.
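A minimal sketch of the emotion type extraction means, assuming one label per second and a 5-label window as in the example above; the class name and labels are hypothetical. `collections.Counter.most_common` lists equal counts in the order first encountered, so ties go to the label that appears earliest in the window, which is one possible choice the text leaves open.

```python
from collections import Counter, deque
from typing import Deque, Optional

class EmotionTypeExtractor:
    """Store one emotion label per first interval (assumed 1 s); after every
    `window` labels (assumed 5), emit the most frequent label among them."""

    def __init__(self, window: int = 5):
        self.window = window
        self.recent: Deque[str] = deque(maxlen=window)  # last `window` labels
        self.count = 0

    def push(self, label: str) -> Optional[str]:
        self.recent.append(label)
        self.count += 1
        if self.count % self.window == 0:
            # Ties resolve to the label encountered first in the window.
            return Counter(self.recent).most_common(1)[0][0]
        return None

ex = EmotionTypeExtractor()
labels = ["excitement", "sadness", "excitement", "relaxation", "excitement",
          "sadness", "sadness", "relaxation", "sadness", "excitement"]
out = [ex.push(l) for l in labels]
print([o for o in out if o])  # ['excitement', 'sadness']
```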

A fifth feature of the present invention is that, in addition to the fourth feature, the system further comprises means for storing a history of the content transmitted by the content transmission means; a plurality of content items are stored with priorities for a single emotion type; and the content reading means, referring to the history of transmitted content, reads, from among the content items stored in association with the emotion type determined by the emotion type determination means, the item whose priority comes next after that of the most recently transmitted item.

A sixth feature of the present invention is that, in addition to any of the third to fifth features, the speech feature amounts are the mean and standard deviation of the voice power and the mean and standard deviation of the voice pitch.

A seventh feature of the present invention is a content providing system comprising: means for storing advertisements associated with emotion types, advertisements associated with words, and comments associated with both an emotion type and a word; emotion type determination means for determining an emotion type based on input speech; word extraction means for extracting words from conversational speech; advertisement and comment reading means for reading the advertisement stored in association with the emotion type determined by the emotion type determination means, reading the advertisement stored in association with the word extracted by the word extraction means, and reading the comment stored in association with both the determined emotion type and the extracted word; and content transmission means for sending the read advertisements and comment to content reproduction means.

According to the present invention, content such as advertisements and comments can be displayed according to the words contained in a conversation and the emotions of the speakers.

The best mode for carrying out the present invention is described below. The following description is merely an example; the technical scope of the present invention is not limited to it.

[Overall image]
A voice emotion identification tool is used to acquire the speaker's emotion data and the words spoken in the conversation (word spotting). After the tool is used, a PC website and a mobile website are displayed, on which various services based on the acquired emotion data and words can be viewed. Based on the acquired emotion data, services such as "voice fortune-telling" and "voice health" can be viewed in a browser.

[Voice emotion identification tool]
The tool displays advertisements associated with the voice emotion or with words uttered during the conversation, together with comments related to those advertisements, thereby increasing the probability that an advertisement is clicked and the user jumps to the linked website associated with it.

FIG. 20 shows an example screen design of the voice emotion identification tool. The screen 27 shown in the figure includes a banner advertisement 271 associated with the voice emotion, a banner advertisement 272 associated with a word uttered during the conversation, a comment 273 associated with both the voice emotion and the uttered word, and a voice emotion barometer 274.

The banner advertisement 271 is selected based on the speaker's emotion. For example, the emotion type "excitement" is stored in association with an "advertisement to display when excited", and the emotion type "sad" is stored in association with an "advertisement to display when sad". When the speaker's emotion is judged to be "excitement", the "advertisement to display when excited" is displayed; when it is identified as "sad", the "advertisement to display when sad" is displayed.

The banner advertisement 272 is selected based on words uttered during the conversation. For example, the word "hot spring" is stored in association with an "advertisement to display when 'onsen' is pronounced", and the word "hot pot" is stored in association with an "advertisement to display when 'nabe' is pronounced". When it is determined that "onsen" was pronounced during the conversation, the "advertisement to display when 'onsen' is pronounced" is displayed; when it is determined that "nabe" was pronounced, the "advertisement to display when 'nabe' is pronounced" is displayed.

The comment 273 is selected based on both the voice emotion and the words uttered during the conversation. For example, the emotion type "sad" and the word "hot spring" are stored in association with a "comment to display when the speaker is judged sad and 'onsen' is pronounced", and the emotion type "excitement" and the word "hot spring" are stored in association with a "comment to display when the speaker is judged excited and 'onsen' is pronounced". When the speaker's emotion is identified as "sad" and "onsen" is determined to have been pronounced during the conversation, the "comment to display when the speaker is judged sad and 'onsen' is pronounced" is displayed. Likewise, when the speaker's emotion is identified as "excited" and "onsen" is determined to have been pronounced, the "comment to display when the speaker is judged excited and 'onsen' is pronounced" is displayed.
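The three selections for screen 27 are independent lookups: banner 271 keyed by emotion, banner 272 keyed by word, and comment 273 keyed by the (emotion, word) pair. A minimal sketch with hypothetical table contents and IDs mirroring the examples above; returning `None` when a key has no entry is an assumption, since the text does not say what is shown in that case.

```python
from typing import Optional, Tuple

# Hypothetical tables mirroring the examples above; values are placeholder IDs.
EMOTION_ADS = {"excitement": "ad-when-excited", "sadness": "ad-when-sad"}
WORD_ADS = {"onsen": "ad-when-onsen", "nabe": "ad-when-nabe"}
COMMENTS = {
    ("sadness", "onsen"): "comment-sad-onsen",
    ("excitement", "onsen"): "comment-excited-onsen",
}

def select_screen_items(emotion: Optional[str], word: Optional[str]
                        ) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (banner 271, banner 272, comment 273); None where nothing matches."""
    return (EMOTION_ADS.get(emotion),
            WORD_ADS.get(word),
            COMMENTS.get((emotion, word)))

print(select_screen_items("sadness", "onsen"))
# ('ad-when-sad', 'ad-when-onsen', 'comment-sad-onsen')
print(select_screen_items("excitement", "nabe"))
# ('ad-when-excited', 'ad-when-nabe', None)
```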

Displaying any one of an "advertisement selected based on voice emotion", an "advertisement selected based on words", or a "comment selected based on voice emotion and words" is preferable because it produces a strong advertising effect.

Displaying an "advertisement selected based on voice emotion", an "advertisement selected based on words", and a "comment selected based on voice emotion and words" in combination produces an even stronger advertising effect, so it is preferable to display all of them.

[Overall flow]
<When using the voice emotion identification tool>
FIG. 1 is a flowchart showing the processing flow when the voice emotion identification tool is used. As shown in FIG. 1, speech is acquired with a microphone or the like in step S11, emotion is analyzed in step S12, the emotion analysis results are accumulated in a database in step S13, the screen display is changed based on the analysis results in step S14, the screen display is changed based on the word spotting result in step S15, and in step S16 a banner advertisement or the like is clicked and the linked website is accessed.

<When viewing a personal site on a PC or mobile phone>
FIG. 2 is a flowchart showing the processing flow when a personal site is viewed on a PC or mobile phone. As shown in FIG. 2, the user logs in to the website in step S21, views various services based on the data accumulated while using the tool in step S22, and follows a banner advertisement to its link destination in step S23.

[Example overall configuration of the service providing system]
FIG. 3 is a block diagram showing an example overall configuration of the service providing system. In the example shown, the conversation of a registered member 21 is captured by a personal computer (PC) 23 through a microphone 22. The member authentication database 31 stores detailed information on the registered member 21 (member name, password, and so on). In response to a request from the voice emotion identification tool on the PC 23, the content distribution server 32 obtains from the comment server 33 and the advertisement management server 34 the comment and advertisement to display, which are determined from the voice emotion result and the member name, and sends the comment data and advertisement data to the voice emotion identification tool on the PC 23.

The comment server 33 sends to the content distribution server 32 the comment data to display, which is determined from the voice emotion result and member name received from the content distribution server 32, and keeps a per-member history of voice emotion data. The advertisement management server 34 sends to the content distribution server 32 the advertisement data to display, determined in the same way, and keeps a per-member history of advertisement data.

The emotional voice database 35 stores, for each member (speaker 21), the feature data 26 sent from the voice emotion identification tool on the PC 23. The personal database 36 stores, for each member, the voice emotion identification results sent from the tool together with the numbers of the advertisements displayed (analysis result data 25). The analysis result data 25 is sent from the personal database 36 to the site management web server 37.

The site management web server 37 provides a site environment that can be browsed from the PC 23 and the mobile phone 24. Using the member name received from the site management web server 37, the provided-service database 38 obtains the latest emotion identification result from the personal database 36 and, from that result, sends service data such as "voice fortune-telling" and "voice health" to the site management web server 37.

[Basic concept of emotion recognition]
Prosodic components such as loudness and pitch are used as simple features that do not depend on language or speaker. The basic statistics of the features over a predetermined period (for example, the past 1 second) are taken as the speaker's current manner of speaking. The degree of each emotion is obtained from how far this deviates from the steady state of speaking (for example, the basic statistics over the past 5 seconds).

[DLLs constituting the emotion identification software (tool)]
The DLLs that make up the emotion identification software (tool) include EmotionMonitorDLL.dll, WaveIn.dll, and Fft_C.dll. EmotionMonitorDLL.dll calculates voice features and measures emotions from their statistics; the voice features include input waveform data, spectral envelope data, pitch trajectory data, and power trajectory data. WaveIn.dll acquires audio from the microphone: using the multimedia API (Application Program Interface) commonly used to record sound on Windows (registered trademark) operating systems, it repeatedly stores a speech waveform of a specified length from the audio input device into a storage area (buffer) prepared by the tool, making it available for use. Fft_C.dll provides an API for the fast Fourier transform and computes the frequency spectrum.

[Overall flow from speech acquisition to voice emotion analysis]
FIG. 4 is a flowchart showing the processing flow from speech acquisition to voice emotion analysis. As shown in FIG. 4, speech is acquired from the microphone in step S41, A/D converted in step S42, and transformed by a discrete Fourier transform in step S43; voice features are calculated in step S44, emotion is measured in step S45, and the emotion is output in step S46. Steps S41 and S42 acquire the speech, and steps S43 to S46 analyze the voice emotion.

A/D (analog-to-digital) conversion samples and quantizes the speech waveform, which is an analog signal, into digital data, for example at a sampling rate of 16 kHz with 16-bit resolution.
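A software stand-in for the A/D stage, assuming the example parameters above (16 kHz, 16-bit). It samples a function of time and rounds each clipped sample to a signed 16-bit integer; a real converter does this in hardware, and the names here are hypothetical. At 16 kHz, 0.128 seconds gives exactly 2048 samples, the analysis frame length used later.

```python
import math
from typing import Callable, List

SAMPLE_RATE = 16_000   # Hz, the example rate in the text
BITS = 16              # example resolution

def sample_and_quantize(signal: Callable[[float], float], duration_s: float) -> List[int]:
    """Sample `signal` (time in s -> amplitude in [-1, 1]) at SAMPLE_RATE and
    quantize each sample to a signed BITS-bit integer."""
    full_scale = 2 ** (BITS - 1) - 1                 # 32767 for 16 bits
    n = round(duration_s * SAMPLE_RATE)              # number of samples
    out = []
    for i in range(n):
        x = max(-1.0, min(1.0, signal(i / SAMPLE_RATE)))   # clip to full scale
        out.append(round(x * full_scale))                  # quantize
    return out

def tone(t: float) -> float:
    """220 Hz test tone at half amplitude."""
    return 0.5 * math.sin(2 * math.pi * 220.0 * t)

pcm = sample_and_quantize(tone, 0.128)
print(len(pcm))  # 2048
```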

In step S44, the voice features are calculated. The voice features are the trajectory across the whole speech waveform obtained by computing the power of each buffer (analysis frame), called the power trajectory, and the trajectory obtained by computing the pitch of each buffer, called the pitch trajectory. Power is the sum of squares of the frequency components of the frequency spectrum. Pitch is the height of the voice, in Hz (hertz).

In step S45, the degrees of, for example, "excitement", "sadness", "anticipation" (wakuwaku), and "relaxation" (mattari) are each measured in real time from the statistics of the voice features every 0.128 seconds, on an 11-level scale from 0.0 to 1.0 in steps of 0.1. The emotion that occurs most often across five measurements is then judged to be the "current emotion". The "statistics of the voice features" are, for example, the mean and standard deviation of the power trajectory over the preceding 5 seconds, and the means and standard deviations of the power and pitch trajectories over the preceding 1 second. "Mean" denotes the arithmetic mean.
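The text does not state how a single 0.128-second measurement is turned into one emotion before the five-measurement vote. The sketch below assumes it is the emotion with the highest quantized degree; the function names and sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List

EMOTIONS = ("excitement", "sadness", "anticipation", "relaxation")

def quantize(degree: float) -> float:
    """Clamp to [0, 1] and snap to the 11 levels 0.0, 0.1, ..., 1.0.
    Note: Python's round() sends exact halves to the even neighbor."""
    return round(min(1.0, max(0.0, degree)) * 10) / 10

def dominant(measurement: Dict[str, float]) -> str:
    """Assumed rule: a measurement's emotion is the one with the highest
    quantized degree (ties go to the earlier entry in EMOTIONS)."""
    return max(EMOTIONS, key=lambda e: quantize(measurement[e]))

def current_emotion(last_five: List[Dict[str, float]]) -> str:
    """Most frequent per-measurement emotion over five 0.128 s measurements."""
    return Counter(dominant(m) for m in last_five).most_common(1)[0][0]

frames = [
    {"excitement": 0.72, "sadness": 0.10, "anticipation": 0.55, "relaxation": 0.0},
    {"excitement": 0.64, "sadness": 0.05, "anticipation": 0.71, "relaxation": 0.0},
    {"excitement": 0.81, "sadness": 0.00, "anticipation": 0.40, "relaxation": 0.1},
    {"excitement": 0.66, "sadness": 0.20, "anticipation": 0.62, "relaxation": 0.0},
    {"excitement": 0.30, "sadness": 0.10, "anticipation": 0.30, "relaxation": 0.9},
]
print(current_emotion(frames))  # excitement
```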

[Voice feature calculation flow]
FIG. 5 is a flowchart showing the voice feature calculation process; it details step S44 of FIG. 4. As shown in FIG. 5, the sum of squares is calculated in step S51, and in step S52 the mean and standard deviation of the power over 5 seconds and over 1 second are calculated.

In addition, a logarithmic transform is applied in step S54, a discrete Fourier transform in step S55, a peak is detected in step S56, and the mean and standard deviation of the pitch over 1 second are calculated in step S57. Step S43 together with steps S54 to S56 is called cepstrum analysis.
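A pure-Python sketch of the FIG. 5 computations for one 2048-sample frame: frame power as the sum of squared spectral magnitudes (S51), and pitch by cepstrum analysis (S43, S54 to S56). The Hann window, the relative floor inside the logarithm, and the 60 to 400 Hz search range are assumptions not taken from the text, and the recursive FFT is a stand-in for Fft_C.dll. The cepstrum here uses a forward DFT, which for the real, symmetric log spectrum yields the same peak position as the inverse DFT.

```python
import cmath
import math
from typing import List, Sequence

def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])

def frame_power(frame: Sequence[float]) -> float:
    """S51: sum of squared magnitudes of the frame's DFT (computed in S43)."""
    return sum(abs(c) ** 2 for c in fft(frame))

def cepstral_pitch(frame: Sequence[float], fs: int = 16_000,
                   fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate pitch in Hz by cepstrum analysis (S43, S54-S56)."""
    n = len(frame)
    # Assumed Hann window to limit spectral leakage (not stated in the text).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mags = [abs(c) for c in fft(windowed)]                   # S43
    floor = max(1e-6 * max(mags), 1e-12)                     # assumed log floor
    log_mag = [math.log(max(m, floor)) for m in mags]        # S54
    cep = [abs(c) for c in fft(log_mag)]                     # S55
    qmin, qmax = int(fs / fmax), int(fs / fmin)              # lag range in samples
    peak = max(range(qmin, qmax + 1), key=lambda q: cep[q])  # S56
    return fs / peak

fs = 16_000
# Synthetic voiced frame: harmonics of 200 Hz up to 7.8 kHz, 2048 samples.
frame = [sum(math.sin(2 * math.pi * k * 200.0 * i / fs) for k in range(1, 40))
         for i in range(2048)]
print(frame_power(frame), cepstral_pitch(frame))
```

Restricting the peak search to a plausible pitch range keeps the low-quefrency spectral envelope from being mistaken for the pitch peak; cepstral estimates can still make octave errors on real speech.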

[Handling unspecified speakers]
The difference in speech between speakers (speaker characteristics) is defined as "the statistics of intonation in the steady state of speaking". The short-time average power of the input speech is monitored for 5 seconds, and its basic statistics (mean and standard deviation) are identified as the speaker characteristics. Performing emotion recognition relative to the identified speaker characteristics absorbs differences in how loudly speakers talk. The short-time average power corresponds to the loudness of the speech and is computed as the sum of squares of the Fourier spectrum of a 2048-sample analysis frame (0.128 seconds at 16 kHz sampling).

[Handling unspecified acoustic environments]
The difference due to the acoustic environment is defined as "the basic statistics of the background noise power". When the emotion identification software starts, it automatically monitors the short-time average power of the input for 5 seconds while the user is not speaking, and identifies the acoustic environment by those basic statistics. Performing emotion recognition relative to the identified environment avoids mistaking ambient noise for emotional intonation in the voice. The acoustic environment can also be identified manually at any time.

[Handling unspecified utterance content]
The difference due to utterance content is defined as "short-term fluctuation of speech", and these differences are assumed to cancel each other out over a longer period. Instead of performing emotion recognition for each analysis frame, recognition uses the basic statistics of the features over the past 1 second. The short-time average power (loudness) is used as the feature; for "sadness" and "anticipation", the pitch of voiced segments is used as well. Emotion is recognized by comparison with the basic statistics of the acoustic environment identified from the past 5 seconds.

[Summary from the processing viewpoint]
To handle unspecified acoustic environments, the environment is monitored for 5 seconds after the emotion identification software starts (while no one is speaking). To handle unspecified speakers, the speech over the 5 seconds preceding each emotion identification is monitored (while speaking). To handle unspecified utterance content, the basic statistics calculated from the voice features (power and pitch) preceding each emotion identification (the power mean and standard deviation over the preceding 5 seconds; the power mean, power standard deviation, pitch mean, and pitch standard deviation over the preceding 1 second) are compared with the basic statistics of the acoustic environment.

[Flow of acoustic environment identification]
The trajectory of the short-time average power (hereafter simply "power") over the 5 seconds after the emotion identification software starts is saved. The basic statistics (mean and standard deviation) of the saved power trajectory are then calculated and taken as the "basic statistics of the acoustic environment".

[Flow of speaker characteristic identification]
The trajectory of the short-time average power over the past 5 seconds is saved continuously. The basic statistics (mean and standard deviation) of the saved power trajectory are then calculated and taken as the speaker characteristics.

[Recognition algorithm for each emotion]
FIG. 6 is a flowchart of the excitement measurement process. The degree of excitement rises when the voice stays loud (high power).
FIG. 8 is a flowchart of the sadness measurement process. The degree of sadness falls when the voice stays loud, and falls further when the voice pitch exceeds a threshold. In other words, speaking in a low, mumbling voice raises the degree of sadness.
FIG. 9 is a flowchart of the anticipation measurement process. The degree of anticipation rises when the voice is loud and high-pitched.
Equation 6 gives the formula for calculating the degree of relaxation. The degree of relaxation rises as silences in the conversation increase.

[Emotion measurement flow]
The flow of the measurement process for each emotion is described below. The voice features are denoted by the following symbols:
Pitch: f
Pitch mean over the last 1 second: μ_1f
Pitch standard deviation over the last 1 second: σ_1f
Power mean over the last 1 second: μ_1p
Power standard deviation over the last 1 second: σ_1p
Power mean over the last 5 seconds: μ_5p
Power standard deviation over the last 5 seconds: σ_5p
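A minimal sketch of how these six window statistics might be computed, assuming per-frame power and pitch tracks (newest value last) and a hypothetical, fixed number of frames per second. The text specifies neither the frame rate used for the statistics nor how unvoiced frames are handled in the pitch track, so the sketch simply treats every pitch value as valid.

```python
import numpy as np


def window_statistics(power: np.ndarray, pitch: np.ndarray,
                      frames_per_second: int) -> dict:
    """Compute mu_1f, sigma_1f, mu_1p, sigma_1p, mu_5p, sigma_5p from
    per-frame power and pitch tracks whose last element is the newest frame."""
    one_s = frames_per_second
    five_s = 5 * frames_per_second
    p1, p5 = power[-one_s:], power[-five_s:]  # last 1 s and last 5 s of power
    f1 = pitch[-one_s:]                       # last 1 s of pitch
    return {
        "mu_1f": float(np.mean(f1)), "sigma_1f": float(np.std(f1)),
        "mu_1p": float(np.mean(p1)), "sigma_1p": float(np.std(p1)),
        "mu_5p": float(np.mean(p5)), "sigma_5p": float(np.std(p5)),
    }
```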

[Excitement degree measurement flow]
FIG. 6 shows the flow of the excitement degree measurement process, and FIG. 7 shows the relationship between the power mean over the last 1 second and the excitement degree. The power mean over the last 5 seconds is regarded as environmental noise. If the power mean over the last 1 second exceeds it (YES in step S61), the process takes the YES branch, in which the excitement degree is calculated by Equation 1 (step S62). If it falls below (NO in step S61), the excitement degree is set to 0.
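Equation 1 is not reproduced in this text, so the following sketch only captures the branching of FIG. 6 and takes the formula as a caller-supplied function of the statistics (a hypothetical interface). The text does not say what happens when the two power means are exactly equal; the sketch treats that case as NO.

```python
from typing import Callable, Mapping

Stats = Mapping[str, float]


def excitement_degree(stats: Stats, equation_1: Callable[[Stats], float]) -> float:
    """Branching of FIG. 6; `stats` holds keys such as "mu_1p" and "mu_5p"."""
    # Step S61: the last-5-second power mean is treated as environmental noise.
    if stats["mu_1p"] > stats["mu_5p"]:
        return equation_1(stats)  # YES: evaluate Equation 1 (step S62)
    return 0.0                    # NO (equality assumed to count as NO)
```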

[Sadness degree measurement flow]
FIG. 8 shows the flow of the sadness degree measurement process. If the power exceeds the power mean over the last 5 seconds (YES in step S81), the process takes the YES branch, in which the sadness degree is evaluated. If the pitch is at or below μ_fT (for example, 150 Hz) (NO in step S83), only the power-based evaluation is used; if the pitch exceeds μ_fT (YES in step S83), the power-based evaluation is multiplied by the pitch-based evaluation.

That is, for YES in step S81 and NO in step S83, the sadness degree is the value of Equation 2 (step S82). For YES in step S81 and YES in step S83, the sadness degree is the value of Equation 2 multiplied by the value of Equation 3 (step S85). σ_fT in Equation 3 is, for example, 100 Hz.
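The branching of FIG. 8 can be sketched the same way. Equations 2 and 3 are not reproduced in this text, so they are passed in as hypothetical callables; the text also does not state the result for NO in step S81, and the sketch assumes 0, as in the excitement flow.

```python
from typing import Callable, Mapping

Stats = Mapping[str, float]


def sadness_degree(power: float, pitch: float, stats: Stats,
                   equation_2: Callable[[Stats], float],
                   equation_3: Callable[[float], float],
                   mu_fT: float = 150.0) -> float:
    """Branching of FIG. 8 with Equations 2 and 3 supplied by the caller."""
    # Step S81: does the power exceed the last-5-second power mean?
    if power <= stats["mu_5p"]:
        return 0.0                    # NO branch: assumed to give 0
    degree = equation_2(stats)        # power-based evaluation
    if pitch > mu_fT:                 # YES in step S83
        degree *= equation_3(pitch)   # multiply by the pitch-based evaluation (step S85)
    return degree                     # NO in step S83: Equation 2 alone (step S82)
```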

[Anticipation degree measurement flow]
FIG. 9 shows the flow of the anticipation degree measurement process. If the pitch mean over the last 1 second exceeds μ_fT (for example, 100 Hz) (YES in step S91), the anticipation degree is the product of the value of Equation 4 calculated in step S92 and the value of Equation 5 calculated in step S93. If the pitch mean over the last 1 second is at or below μ_fT, the anticipation degree is 0. σ_fT in Equation 4 is, for example, 50 Hz.
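A sketch of the FIG. 9 branching, again with the unreproduced Equations 4 and 5 supplied as hypothetical callables that receive the window statistics:

```python
from typing import Callable, Mapping

Stats = Mapping[str, float]


def anticipation_degree(stats: Stats,
                        equation_4: Callable[[Stats], float],
                        equation_5: Callable[[Stats], float],
                        mu_fT: float = 100.0) -> float:
    """Branching of FIG. 9; `stats` holds at least "mu_1f"."""
    # Step S91: compare the last-1-second pitch mean with the threshold.
    if stats["mu_1f"] > mu_fT:
        return equation_4(stats) * equation_5(stats)  # product of steps S92 and S93
    return 0.0                                        # at or below the threshold
```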

[Relaxation degree measurement flow]
The relaxation degree is calculated by Equation 6. The power mean over the last 5 seconds is regarded as environmental noise; the relaxation degree is at its maximum when the power mean over the last 1 second coincides with it, and decreases exponentially as the two move apart.
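Equation 6 itself is not reproduced in this text. The sketch below assumes a simple exponential decay, exp(-|μ_1p − μ_5p| / s), with a hypothetical scale s; it matches the qualitative description (maximum of 1 at equality, exponential fall-off with distance) but may differ from the actual formula.

```python
import math


def relaxation_degree(mu_1p: float, mu_5p: float, scale: float = 1.0) -> float:
    """Assumed form of Equation 6: 1.0 when the 1 s power mean equals the
    5 s power mean (ambient level), decaying exponentially with their distance.
    `scale` is a hypothetical decay constant."""
    return math.exp(-abs(mu_1p - mu_5p) / scale)
```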

[Emotion judgment]
The degrees of the four emotions over the last 5 frames (equivalent to 0.64 seconds at 16 kHz with a frame length of 2048 samples) are stored. In each frame, the occurrence count of the emotion with the highest degree is incremented by 1. The emotion with the largest cumulative count over the last 5 frames is taken as the current emotion.
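A minimal sketch of this majority vote, assuming the four degrees arrive once per frame in the order excitement, sadness, anticipation, relaxation. Each frame lasts 2048 / 16000 = 0.128 s, so five frames cover 0.64 s. This paragraph does not say how ties are broken; the sketch picks the lower index, mirroring the tie rule given later for step S1412. The class and method names are hypothetical.

```python
from collections import Counter, deque
from typing import Sequence

EMOTIONS = ("excitement", "sadness", "anticipation", "relaxation")


class EmotionJudge:
    """Majority vote over the top-scoring emotion of the most recent frames."""

    def __init__(self, window: int = 5) -> None:
        self.history = deque(maxlen=window)  # index of the top emotion in each recent frame

    def update(self, degrees: Sequence[float]) -> str:
        """Add one frame's four emotion degrees and return the current emotion."""
        winner = max(range(len(degrees)), key=lambda i: degrees[i])
        self.history.append(winner)
        counts = Counter(self.history)
        # Largest count wins; ties go to the lower index (assumption).
        best = min(counts, key=lambda i: (-counts[i], i))
        return EMOTIONS[best]
```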

[Emotion measurement monitor Type A screen]
FIG. 10 shows an example of the emotion measurement monitor screen. The speech spectrum has a structure in which a fine structure of periodic small ripples is superimposed on an envelope of gentle undulations.

<Cepstrum>
The result of taking the logarithm of the power spectrum of a speech waveform and then applying a Fourier transform is called the cepstrum, and its horizontal axis is called the quefrency axis. Because the horizontal axis of the spectrum has the dimension of frequency, the horizontal axis of the cepstrum obtained by Fourier-transforming it has the dimension of time. Components corresponding to the envelope appear in the low-quefrency region, and components corresponding to the fine structure appear in the high-quefrency region. The former correspond to the vocal tract characteristics (voice timbre), and the latter to the characteristics of the vocal cord source (pitch, i.e., how high the voice is).

Separating the low- and high-quefrency regions with a threshold is called liftering, a play on the word "filtering". The pitch period (voice pitch) is obtained by extracting the peak from the high-quefrency region separated by liftering. This series of operations is called cepstrum analysis.
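The cepstrum analysis described above can be sketched with NumPy as follows. Assumptions not in the text: a 16 kHz frame, a Hann window, a 50–400 Hz search range for the pitch peak, and an inverse FFT as the second transform. For the real, even log power spectrum a forward transform gives the same result up to a constant scale, so the peak location is unaffected. The function name is hypothetical.

```python
import numpy as np


def cepstral_pitch(frame: np.ndarray, fs: int = 16000,
                   fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Estimate the pitch (Hz) of one frame by cepstrum analysis."""
    windowed = frame * np.hanning(len(frame))
    power_spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    log_spectrum = np.log(power_spectrum + 1e-10)         # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum, n=len(frame))   # quefrency axis, in samples
    # Liftering: keep only the high-quefrency band of plausible pitch periods.
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / peak                                       # pitch period -> frequency
```

For example, a 2048-sample frame made of harmonics of 200 Hz should produce a cepstral peak near quefrency 80 samples (5 ms), i.e. an estimate close to 200 Hz.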

[Emotion measurement monitor Type B screen]
FIG. 11 shows another example of the emotion measurement monitor screen. As shown in the figure, the statistics of the voice features (the power mean and power standard deviation over the last 5 seconds; the power mean, power standard deviation, pitch mean, and pitch standard deviation over the last 1 second) are calculated, the degree of each emotion such as "excitement", "sadness", "anticipation", and "relaxation" is calculated from these statistics, and the emotion is determined.

[Storing emotion analysis results in the database]
FIG. 12 shows the flow up to storing the emotion analysis results in the databases. In step S131, the emotion analysis result and the voice features are acquired. In step S132, a connection to the databases is made via the Internet. In step S133, the feature data 26 is registered in the emotion voice database 35 every second. In step S134, the emotion analysis result data 25 is registered in the personal database 36 every second.
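The text does not describe the database schema. As a hedged sketch, a single in-memory SQLite database with two hypothetical tables can stand in for the emotion voice database 35 and the personal database 36; the Internet connection of step S132 is omitted, and all table and column names are illustrative.

```python
import sqlite3
from typing import Mapping


def open_databases() -> sqlite3.Connection:
    """Create hypothetical stand-ins for databases 35 and 36."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emotion_voice (ts REAL, mu_1p REAL, sigma_1p REAL, "
                 "mu_1f REAL, sigma_1f REAL, mu_5p REAL, sigma_5p REAL)")
    conn.execute("CREATE TABLE personal_emotion (personal_id TEXT, ts REAL, emotion INTEGER)")
    return conn


def register_second(conn: sqlite3.Connection, personal_id: str,
                    features: Mapping[str, float], emotion: int, ts: float) -> None:
    """Steps S133 and S134: store one second of feature data and the emotion result."""
    conn.execute("INSERT INTO emotion_voice VALUES "
                 "(:ts, :mu_1p, :sigma_1p, :mu_1f, :sigma_1f, :mu_5p, :sigma_5p)",
                 {"ts": ts, **features})
    conn.execute("INSERT INTO personal_emotion VALUES (?, ?, ?)", (personal_id, ts, emotion))
    conn.commit()
```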

[Changing the screen display based on emotion analysis results]
FIG. 13 shows the flow from the emotion analysis result to the change of the screen display. In step S141, the emotion voice analysis result is transmitted to the content distribution server 32. In step S142, the content distribution server 32 acquires the emotion history and the advertisement display history from the personal database 36 and, based on these values, acquires advertisement data from the advertisement management server 34. In step S143, the content distribution server 32 acquires comment data from the comment server 33 based on the advertisement display history acquired in step S142. In step S144, the acquired comment data and advertisement data are displayed on the screen of the PC 23.

The advertisement display history in steps S142 and S143 is the history of advertisements displayed on the basis of the emotion history, and is stored in the personal database 36. For example, when the emotion result for the last 1 second is "sadness", a hot-spring banner advertisement is displayed. Based on that display history, the next time the emotion result is "sadness", a different advertisement is displayed. The display sequence is described later.

The emotion history in step S143 is the history of per-user emotion analysis result data stored in the personal database 36 in step S134, that is, a record of which emotion ("excitement", "sadness", "anticipation", or "relaxation") was determined for each 1-second interval. For example: "excitement" from 4 to 3 seconds ago, "sadness" from 3 to 2 seconds ago, "anticipation" from 2 to 1 second ago, and "relaxation" from 1 second ago until now.

[Advertisement display interval]
It is preferable to change emotion-based advertisements at intervals of 5 to 10 seconds, and word-based advertisements at the moment a registered word is uttered.

[Emotion-based advertisement and comment display sequence]
FIG. 14 shows the tables required for displaying emotion-based advertisements and comments, and the processing flow. Tables A to C are stored in the personal database 36: table A stores the personal ID and name; table B stores the personal ID, emotion history time, and emotion result; table C stores the personal ID, display time, advertisement number, previously displayed advertisement, comment, and disclosure time. Table D is stored in the advertisement management server 34 and holds the advertisement number, advertisement data (binary data), emotion type, priority, and comment number. The table of the comment server 33 stores comment numbers and comments.

The processing required to display emotion-based advertisements and comments is as follows.
Step S1411: Emotion result data is registered in table B of the personal database 36 every second. In the example of FIG. 14, the emotion result "1" denotes "excitement", "2" denotes "sadness", "3" denotes "anticipation", and "4" denotes "relaxation".

Step S1412: Every 5 seconds, the most prominent emotion result is extracted from the emotion results in table B. In the example of FIG. 14, emotion result "1" appears three times, more often than any other result, so "1" is extracted. When several emotion results share the highest count, the one with the smaller number is extracted, for example.
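Step S1412 amounts to taking the mode with a smallest-value tie-break. A minimal sketch, assuming the five per-second results from table B are passed in as integer emotion codes (1 = excitement, ..., 4 = relaxation); the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def most_prominent_emotion(results: Iterable[int]) -> int:
    """Most frequent emotion code among the last 5 seconds of table-B results;
    ties go to the smaller code."""
    counts = Counter(results)
    return min(counts, key=lambda code: (-counts[code], code))
```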

Step S1413: Among the previously displayed advertisements stored in table C whose first element is the emotion extracted from table B, the highest value of the second element is obtained. In the example of FIG. 14, assume that at this point only the first row (the record whose "previously displayed advertisement" column is "1,1") and the second row (the record with "1,2") are stored, and that the third row (the record with "1,3") is not yet stored. In that case, the maximum second element among the previously displayed advertisements whose first element is the extracted emotion is the "2" of "1,2" in the second row.

Step S1414: The rows of table D in the advertisement management server 34 are narrowed down using the emotion type and priority as extraction conditions: the emotion type must equal the value obtained in step S1412, and the priority must equal the value obtained in step S1413 plus 1. In the example of FIG. 14, this narrows down to the row "emotion identification: 1, priority: 3". When the last priority has been reached, the sequence returns to 1. For example, for emotion type 1 the priorities run from 1 to 3, so after priority 3 the row with priority 1 is selected next.
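Steps S1413 and S1414 form a round-robin over the advertisements registered for one emotion. In the hedged sketch below, `shown` is a hypothetical list of (emotion, priority) pairs taken from the "previously displayed advertisement" column of table C in display order, and table D rows are plain dicts. The sketch takes the priority shown most recently for the emotion, which matches the claims' wording ("next to the content transmitted most recently") and gives the same result as "highest second element" in the FIG. 14 example; the two readings differ only after a wrap-around. Starting at priority 1 when nothing has been shown yet is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def next_ad_row(emotion: int, shown: List[Tuple[int, int]],
                d_table: List[Dict]) -> Optional[Dict]:
    """Pick the table-D row whose priority follows the last one shown for `emotion`."""
    # Step S1413: priority of the ad most recently shown for this emotion (0 if none).
    last = next((p for e, p in reversed(shown) if e == emotion), 0)
    candidates = {row["priority"]: row for row in d_table if row["emotion"] == emotion}
    if not candidates:
        return None
    # Step S1414: next priority, wrapping back to 1 after the last one.
    target = last + 1
    if target > max(candidates):
        target = 1
    return candidates.get(target)
```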

Step S1415: The data in every column of the row narrowed down in step S1414 is acquired from table D.

Step S1416: A comment is acquired from the comment server 33 based on the comment number in the column data acquired in step S1415 ("eje0019" in the example of FIG. 14); in that example the comment is "Winter means hot springs!".

Step S1417: The advertisement data acquired in step S1415 and the comment data acquired in step S1416 are registered in table C.

Step S1418: The data acquired in steps S1415 and S1416 is transmitted to the tool on the PC 23 via the content distribution server 32.

Step S1419: The data transmitted in step S1418 is displayed in the display area of the tool.

[Changing the screen display based on word spotting results]
FIG. 15 shows the flow of processing for changing the screen display based on the word spotting result. In step S151, the word spotting result is transmitted from the PC 23 to the content distribution server 32.

In step S152, the content distribution server 32 accesses the comment server 33 and acquires from it the comment data to be displayed based on the word spotting result.

In step S153, the content distribution server 32 accesses the advertisement management server 34 and acquires from it the advertisement data to be displayed based on the word spotting result.

In step S154, the comment data acquired in step S152 and the advertisement data acquired in step S153 are transmitted from the content distribution server 32 to the PC 23 and displayed on its screen.

"Word spotting" in step S152 is a technique that judges a specific word to have been uttered when the speech waveform of a word spoken during the conversation is similar to the speech waveform of the word to be extracted. For example, when someone says "Onsen ikitai ne!" ("I'd like to go to a hot spring!"), the speech waveform of the "onsen" portion is extracted, and it can be determined that the word "onsen" (hot spring) was uttered. In this embodiment, a banner advertisement for hot-spring trips is then displayed.
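A minimal sketch of waveform matching as described here, using normalized cross-correlation of a stored word waveform (template) against the incoming signal. This is an illustrative stand-in, not the source's method; practical word spotters usually compare spectral features (for example with dynamic time warping or HMMs) rather than raw samples. The function name and the 0.8 threshold are hypothetical, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from typing import Optional, Tuple


def spot_word(signal: np.ndarray, template: np.ndarray,
              threshold: float = 0.8) -> Optional[Tuple[int, float]]:
    """Return (start_index, score) of the best-matching window if its
    normalized correlation with `template` reaches `threshold`, else None."""
    m = len(template)
    t = (template - template.mean()) / (template.std() + 1e-12)
    windows = np.lib.stride_tricks.sliding_window_view(signal, m)  # (n - m + 1, m)
    w = windows - windows.mean(axis=1, keepdims=True)
    w /= w.std(axis=1, keepdims=True) + 1e-12
    scores = w @ t / m  # Pearson correlation of each window with the template
    best = int(np.argmax(scores))
    return (best, float(scores[best])) if scores[best] >= threshold else None
```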

[Word-based advertisement and comment display sequence]
FIG. 16 shows the tables required for displaying word-based advertisements and comments, and the processing flow. Table E is stored in the personal database 36 and holds the personal ID, display time, advertisement number, previously displayed advertisement, comment, disclosure time, and word ID. Table F is stored in the advertisement management server 34 and holds the advertisement number, advertisement data (binary data), emotion type, priority, comment number, and word ID. The table of the comment server 33 stores comment numbers and comments.

The processing required to display word-based advertisements and comments is as follows.
Step S1511: Using word spotting, the digital data of the speech waveform in the conversation is compared with the "word speech waveform data" in the word speech data, and the input is monitored for a matching waveform.

Step S1512: When a matching waveform is found, the word ID of that waveform is acquired.

Step S1513: Rows with the same word ID as the one acquired in step S1512 are extracted from table E of the personal database 36, and the highest value of the second element of the previously displayed advertisement in those rows is obtained (the "2" of "1,2" in the example of FIG. 16).

Step S1514: Rows of table F are narrowed down using word ID, emotion type, and priority as extraction conditions: the word ID must equal the value obtained in step S1512, the emotion type must equal the most prominent emotion value extracted from table B in FIG. 14, and the priority must equal the value obtained in step S1513 plus 1 (in the example of FIG. 16, the row "emotion type: 1, priority: 3, word ID: 001" is extracted). When the last priority has been reached, the sequence returns to 1.

Step S1515: The data in every column of the row narrowed down in step S1514 is acquired.

Step S1516: Comment data is acquired from the comment server 33 based on the comment number in the column data acquired in step S1515 ("oke5009" in the example of FIG. 16); in that example the comment is "Your voice sounds lively! Lift your mood with a cake from Roppongi shop C!".

Step S1517: The advertisement data acquired in step S1515 and the comment data acquired in step S1516 are registered in table E of the personal database 36.

Step S1518: The data acquired in steps S1515 and S1516 is transmitted to the tool on the PC 23 via the content distribution server 32.

Step S1519: The data transmitted to the PC 23 via the content distribution server 32 in step S1518 is displayed in the display area of the tool.

[Logging in to the Web site]
FIG. 17 shows the flow of processing up to logging in to the Web site. In step S211, the voice emotion identification tool or Internet browsing software (such as IE) is started. Step S212 applies when the voice emotion identification tool has been started: the link button shown in the tool's display area is pressed, after which the system automatically authenticates against the member authentication database 31 using the "member name" and "password" stored in the tool. Step S213 applies when the Internet browsing software has been started: the user logs in by entering the "member name" and "password" in the member login area that the browsing software displays on the screen of the PC 23. In step S214, authentication is completed and login to the Web site (personal site) is complete.

[Browsing various services based on data accumulated while using the tool]
FIG. 18 shows the flow of processing for browsing various services based on the data accumulated while using the tool. In step S221, the site operation Web server 37 accesses the personal database 36 and acquires the logged-in user's emotion voice analysis results. In step S222, advertisements to be displayed according to those analysis results are acquired from the advertisement management server 34. In step S223, the contents of various services based on those analysis results are acquired from the provided service database 38. In step S224, the acquired data is embedded in an HTML file and transmitted to the member's PC 23 or mobile phone 24.

In step S222, the emotion that was most prominent while the tool was capturing speech is extracted from the personal database 36, and advertisements matching that emotion are displayed. For example, if the conversation was sad, an advertisement such as "Hawaii trip" is displayed. The "process for judging the emotion most prominent during the conversation" is described later.

In step S223, the emotion that was most prominent while the tool was capturing speech is extracted from the personal database, and services matching that emotion are displayed. Examples include voice fortune-telling (fortune-telling based on the emotion history), a voice health check (a health level derived from the emotion history), and recommended life (a lifestyle recommendation based on the emotion history).

[Process for judging the emotion most prominent during the conversation]
The most frequent value among the first elements of the previously displayed advertisements is extracted, and the "most prominent emotion" is derived from it. When values are tied, the smaller value takes precedence.

FIG. 19 shows a second example of table C stored in the personal database 36; the "process for judging the emotion most prominent during the conversation" is explained using this figure. In the example of FIG. 19, the "first element of the previously displayed advertisement" is the first element of each entry in the "previously displayed advertisement" column, for example the "1" of "1,2". In FIG. 19 the first elements comprise six "1"s and three "2"s, so the most frequent value is "1", and the corresponding emotion (for example, "excitement") is the "most prominent emotion". If "1" and "2" appeared equally often, the smaller value "1" would be the "most prominent emotion".
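The FIG. 19 procedure can be sketched as follows, assuming the "previously displayed advertisement" entries are available as ASCII strings of the form "emotion,priority" (a hypothetical representation). The function raises `ValueError` on an empty list, since no emotion can be derived in that case.

```python
from collections import Counter
from typing import Iterable


def most_prominent_from_ads(previous_ads: Iterable[str]) -> int:
    """Count the first element of each "emotion,priority" entry and return the
    most frequent value; ties go to the smaller value."""
    firsts = [int(entry.split(",")[0]) for entry in previous_ads]
    counts = Counter(firsts)
    return min(counts, key=lambda code: (-counts[code], code))
```

With six entries starting with "1" and three starting with "2", as in FIG. 19, the function returns 1.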

FIG. 1 is a flowchart showing the flow of processing when the emotion voice identification tool is used.
FIG. 2 is a flowchart showing the flow of processing when the personal site is browsed on a personal computer or mobile phone.
FIG. 3 is a block diagram showing an example of the overall configuration of the service providing system.
FIG. 4 is a flowchart showing the flow of processing from voice acquisition to voice emotion analysis.
FIG. 5 is a flowchart showing the flow of the voice feature calculation process.
FIG. 6 is a flowchart showing the flow of the excitement degree measurement process.
FIG. 7 is a graph showing the relationship between the power mean over the last 1 second and the excitement degree.
FIG. 8 is a flowchart showing the flow of the sadness degree measurement process.
FIG. 9 is a flowchart showing the flow of the anticipation degree measurement process.
FIG. 10 is a diagram showing an example of the emotion measurement monitor screen.
FIG. 11 is a diagram showing another example of the emotion measurement monitor screen.
FIG. 12 is a diagram showing the flow up to storing emotion analysis results in the database.
FIG. 13 is a diagram showing the flow from the emotion analysis result to changing the screen display.
FIG. 14 is a diagram showing the tables required for displaying emotion-based advertisements and comments, and the processing flow.
FIG. 15 is a diagram showing the flow of processing for changing the screen display based on the word spotting result.
FIG. 16 is a diagram showing the tables required for displaying word-based advertisements and comments, and the processing flow.
FIG. 17 is a diagram showing the flow of processing up to logging in to the Web site.
FIG. 18 is a diagram showing the flow of processing for browsing various services based on the data accumulated while using the tool.
FIG. 19 is a diagram showing a second example of table C stored in the personal database 36.
FIG. 20 is a diagram showing an example of the screen design of the voice emotion identification tool.

Explanation of symbols

22 ... Microphone
23 ... Personal computer
25 ... Analysis result data
26 ... Feature data
31 ... Member authentication database
32 ... Content distribution server
33 ... Comment server
34 ... Advertisement management server
35 ... Emotion voice database
36 ... Personal database
37 ... Site operation Web server
38 ... Provided service database

Claims

Means for associating and storing words and content;
Word extraction means for extracting words from conversational speech;
Content reading means for reading the content stored in association with the word extracted by the word extracting means;
A content providing system comprising: content transmitting means for sending the read content to content reproducing means.

Means for storing a history of contents transmitted by the contents transmitting means;
A plurality of contents are stored with priority for one word,
The content reading unit refers to a history of the transmitted content from among the plurality of contents stored in association with the word extracted by the word extracting unit, and next to the content transmitted most recently. The content providing system according to claim 1, wherein content having a high priority is read.

A content providing system comprising:
emotion type content storage means for storing emotion types and content in association with each other;
feature quantity calculating means for calculating feature quantities of voice input from voice input means;
emotion type determination means for determining an emotion type based on the calculated voice feature quantities;
content reading means for reading the content stored in association with the determined emotion type; and
transmission means for sending the read content to content reproduction means.
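The claim does not specify how the emotion type is determined from the feature quantities. The Python sketch below substitutes a nearest-centroid rule (Euclidean distance) as a plainly swapped-in technique. The centroid values, the assumption that features are already normalized to [0, 1], and all names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical reference points per emotion type:
# (mean power, std of power, mean pitch, std of pitch), assumed normalized to [0, 1].
CENTROIDS: Dict[str, Tuple[float, float, float, float]] = {
    "calm": (0.2, 0.1, 0.3, 0.1),
    "excitement": (0.8, 0.4, 0.7, 0.4),
}

# Hypothetical emotion type content storage: emotion type -> content identifier.
EMOTION_CONTENT: Dict[str, str] = {
    "calm": "ad_relaxation",
    "excitement": "ad_theme_park",
}


def determine_emotion(features: Sequence[float]) -> str:
    """Pick the emotion type whose centroid is closest to the feature vector."""
    return min(CENTROIDS, key=lambda emotion: math.dist(features, CENTROIDS[emotion]))


def read_content(features: Sequence[float]) -> str:
    """Content reading step: content stored for the determined emotion type."""
    return EMOTION_CONTENT[determine_emotion(features)]
```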

The content providing system according to claim 1, further comprising:
means for storing the emotion type determined by the emotion type determination means at every first predetermined time interval; and
emotion type extraction means for extracting one emotion type per second predetermined time interval from the emotion types stored at each first predetermined time interval,
wherein the content reading means reads, from the emotion type content storage means, the content stored in association with the emotion type extracted by the emotion type extraction means.
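The claim leaves open how one emotion type is chosen for each second predetermined time interval. The following Python sketch assumes a majority vote over the first-interval labels that fall inside each second interval. Ties go to the label that appears first in the window, and a trailing partial window is still summarized. The function name and parameters are hypothetical.

```python
from collections import Counter
from typing import List


def extract_per_window(labels: List[str], first_interval_s: float,
                       second_interval_s: float) -> List[str]:
    """labels[i] is the emotion type determined for the i-th first interval.

    Returns one emotion type per second interval by majority vote (an assumption).
    """
    per_window = int(second_interval_s // first_interval_s)
    if per_window < 1:
        raise ValueError("second interval must be at least as long as the first interval")
    result = []
    for start in range(0, len(labels), per_window):
        window = labels[start:start + per_window]
        # most_common keeps first-seen order among equal counts.
        result.append(Counter(window).most_common(1)[0][0])
    return result
```

For example, with 1-second determinations and a 3-second display interval, `["calm", "excitement", "excitement", "calm", "calm", "calm"]` reduces to `["excitement", "calm"]`.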

The content providing system according to claim 4, further comprising means for storing a history of the content transmitted by the content transmitting means, wherein
a plurality of contents are stored with priorities for one emotion type, and
the content reading means refers to the history of transmitted content and reads, from among the plurality of contents stored in association with the emotion type determined by the emotion type determination means, the content whose priority is next below that of the most recently transmitted content.

The content distribution system according to any one of claims 3 to 5, wherein the voice feature quantities are the mean and standard deviation of the voice power and the mean and standard deviation of the voice pitch.
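The four feature quantities named above can be computed as in the following Python sketch. The claim names only the statistics. The framing (non-overlapping frames), power as mean-square amplitude, and pitch from a crude unnormalized autocorrelation peak between hypothetical `fmin`/`fmax` bounds are all assumptions. The pitch method in particular is a swapped-in textbook estimator, not the patent's method.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def frame_starts(n_samples: int, frame_len: int) -> range:
    # Non-overlapping frames; a trailing partial frame is dropped.
    return range(0, n_samples - frame_len + 1, frame_len)


def frame_power(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square power of each frame."""
    return [sum(x * x for x in samples[s:s + frame_len]) / frame_len
            for s in frame_starts(len(samples), frame_len)]


def frame_pitch(samples: Sequence[float], frame_len: int, rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> List[float]:
    """Crude autocorrelation pitch (Hz) per frame; frames with no positive peak are skipped."""
    lag_min = int(rate / fmax)
    lag_max = min(int(rate / fmin), frame_len - 1)
    pitches = []
    for s in frame_starts(len(samples), frame_len):
        frame = samples[s:s + frame_len]
        best_lag, best_r = 0, 0.0
        for lag in range(lag_min, lag_max + 1):
            r = sum(frame[n] * frame[n + lag] for n in range(frame_len - lag))
            if r > best_r:
                best_lag, best_r = lag, r
        if best_lag:
            pitches.append(rate / best_lag)
    return pitches


def voice_features(samples: Sequence[float], rate: int,
                   frame_len: int) -> Tuple[float, float, float, float]:
    """(mean power, std of power, mean pitch, std of pitch)."""
    power = frame_power(samples, frame_len)
    pitch = frame_pitch(samples, frame_len, rate)
    return (statistics.mean(power), statistics.pstdev(power),
            statistics.mean(pitch), statistics.pstdev(pitch))
```

A steady 200 Hz tone sampled at 16 kHz gives a mean pitch of 200 Hz with near-zero spread. Real speech would also need voiced/unvoiced detection before the pitch statistics are meaningful.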

A content providing system comprising:
means for storing advertisements associated with emotion types, advertisements associated with words, and comments associated with combinations of an emotion type and a word;
emotion type determination means for determining an emotion type based on input voice;
word extraction means for extracting words from conversational speech;
advertisement and comment reading means for reading the advertisement stored in association with the emotion type determined by the emotion type determination means, the advertisement stored in association with the word extracted by the word extraction means, and the comment stored in association with both the determined emotion type and the extracted word; and
content transmitting means for sending the read advertisements and comment to content reproducing means.
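The three stores in the claim above differ only in their keys: emotion type, word, and the (emotion type, word) pair. A minimal Python sketch with hypothetical table contents and names:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tables for the three kinds of stored items.
EMOTION_ADS: Dict[str, str] = {"excitement": "ad_theme_park"}
WORD_ADS: Dict[str, str] = {"hot spring": "ad_hot_spring_resort"}
COMMENTS: Dict[Tuple[str, str], str] = {
    ("excitement", "hot spring"): "Sounds like a fun trip!",
}


def read_ads_and_comment(emotion: str, word: str) -> Tuple[List[str], Optional[str]]:
    """Read the emotion ad, the word ad, and the comment keyed by the (emotion, word) pair."""
    ads = [ad for ad in (EMOTION_ADS.get(emotion), WORD_ADS.get(word)) if ad is not None]
    comment = COMMENTS.get((emotion, word))
    return ads, comment
```

Keying comments on the pair lets the same word yield different comments depending on the speaker's mood. The advertisements are still chosen independently by emotion type and by word.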

A content providing method comprising:
storing voice waveforms and content in association with each other in advance;
comparing the stored voice waveforms with a voice waveform input from voice input means to determine whether they are similar;
reading the content stored in association with the stored voice waveform determined to be similar to the input voice waveform; and
sending the read content to content reproduction means.
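The claim does not define the similarity test. The following Python sketch assumes zero-lag cosine similarity between equal-length (truncated) waveforms and a hypothetical threshold of 0.9, and returns the content of the best match above that threshold. Zero-lag comparison is sensitive to timing offsets, so a practical system would more likely compare spectral features with an alignment method such as dynamic time warping.

```python
import math
from typing import List, Optional, Sequence, Tuple


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two waveforms, truncated to the shorter length."""
    n = min(len(a), len(b))
    dot = sum(a[i] * b[i] for i in range(n))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(x * x for x in b[:n]))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def read_content(stored: List[Tuple[Sequence[float], str]], query: Sequence[float],
                 threshold: float = 0.9) -> Optional[str]:
    """Content of the most similar stored waveform, or None if none reaches the threshold."""
    best_score, best_content = threshold, None
    for waveform, content in stored:
        score = similarity(waveform, query)
        if score >= best_score:
            best_score, best_content = score, content
    return best_content
```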

A content providing method comprising:
storing emotion types and content in association with each other in advance;
calculating feature quantities of voice input from voice input means;
determining an emotion type based on the calculated voice feature quantities;
reading the content stored in association with the determined emotion type; and
sending the read content to content reproduction means.