JP4965322B2

JP4965322B2 - User support method, user support device, and user support program

Info

Publication number: JP4965322B2
Application number: JP2007108253A
Authority: JP
Inventors: 豪入江; 浩太日高; 隆佐藤; 行信谷口; 信弥中嶌
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-04-17
Filing date: 2007-04-17
Publication date: 2012-07-04
Anticipated expiration: 2027-04-17
Also published as: JP2008269065A

Abstract

PROBLEM TO BE SOLVED: To provide a user support device capable of associating users with one another in a timely and highly reliable manner. SOLUTION: The user support device includes an emotion estimation unit F100 that extracts an attribute of content, or a value of that attribute, using one or more of the sound information and image information of the content; a database 400 that stores the identification information of the content in association with the attribute estimated for the content and the attribute information; a user information storage unit F200 that stores, for each user, information identifying at least one of the content published by the user, the content being viewed by the user, and the content already viewed by the user as user preference information; and a user association unit F300 that, based on the user preference information, associates a first user who is the association target with one or more other users. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a user support method, a user support apparatus, and a user support program for associating a plurality of users.

Content is now increasingly viewed not only via broadcasting but also on websites and personal computers, and the types of content have become highly diverse: movies, dramas, home videos, news, documentaries, music, and so on. As communication and broadcasting converge, the number of users who enjoy viewing content can readily be expected to grow further.

Recently, content sharing sites, where users share content on the Web and communicate with one another, have become commonplace.

On a content sharing site, content accumulates as each user uploads to a server the content that he or she wants others to view. Since the accumulated content is public in principle, each user can select and receive whatever content he or she wishes to view and watch it freely.

Each user can also post comments, such as impressions and opinions, on content he or she has viewed. Since these comments are also public in principle, users can enjoy communicating with the user who published the content and with other users who viewed the same content, and can form communities with users who like similar content.

In particular, on community-forming content sharing sites, the so-called SNS (social networking service) type that has recently become mainstream, the main purpose is for users to communicate with one another, or to form communities, through content.

For communication to be active and meaningful, it matters that the users have similar sensibilities and share topics they want to talk about. In the case of communication on a content sharing site, similar sensibilities means that users have similar preferences for content and similar impressions and emotions when viewing it. Shared topics means conditions such as one user having published a piece of content that the other has viewed, or both users publishing or viewing the same or similar content.

From this viewpoint, a user support technology that mediates between users with similar preferences and emotions and gives them an opportunity to communicate would benefit users greatly, both in convenience and in the creation of new value.

As user support technologies for supporting communication, techniques that mediate between users based on the similarity of the comments they have posted have been invented. One such technique is disclosed in Patent Document 1.

The technique disclosed in Patent Document 1 starts from the observation that the places where a user posts comments are the places that interest that user, and from this derives the hypothesis that users who comment on similar places have similar sensibilities. Based on this hypothesis, it discloses a method and apparatus that let users attach comments to each content item on a scene-by-scene basis and that mediate among multiple users whose commented scenes are similar.

The following are known as techniques related to the present invention: a parameter estimation method for a probability model used to calculate speech feature appearance probabilities (see, for example, Non-Patent Documents 1 and 2); a method for extracting fundamental frequency and power (see, for example, Non-Patent Document 3); a method for extracting the temporal variation characteristics of speech rate (see, for example, Non-Patent Document 4 and Patent Document 2); a method for obtaining a generalized state space model (see, for example, Non-Patent Document 5); methods for extracting emotions and emotion levels (see, for example, Patent Documents 3, 4, 5, and 6); a method for extracting color information and editing information from video information (see, for example, Non-Patent Document 6); a method for extracting motion vector information (see, for example, Non-Patent Document 7); a parameter estimation method for a video model (see, for example, Non-Patent Documents 8 and 9 and Patent Document 7); and methods for extracting facial expression information (see, for example, Patent Documents 8 and 9).
Japanese Patent No. 3622711
Japanese Patent Laid-Open No. 2005-345496 (paragraphs [0011] to [0014], etc.)
Japanese Patent Laid-Open No. 2005-275348
Japanese Patent No. 3372532
Japanese Patent Laid-Open No. 2006-113917
Japanese Patent Laid-Open No. Hei 5-73317
Japanese Patent Laid-Open No. 2005-078555
Japanese Patent Laid-Open No. 2005-157911
Japanese Patent No. 3098276
Kenichiro Ishii, Naonori Ueda, Eisaku Maeda, Hiroshi Murase, "Easy-to-Understand Pattern Recognition", Ohmsha, 1st edition, August 1998, pp. 52-54.
Jinfang Wang, Shu Tezuka, Naonori Ueda, Masaaki Taguri, "Computational Statistics I: New Methods of Probability Calculation, Frontiers of Statistical Science 11, Chapter III, 3. EM Method, 4. Variational Bayes Method", Iwanami Shoten, June 2003, pp. 157-186.
Sadaoki Furui, "Digital Speech Processing, Chapter 4, 4.9 Pitch Extraction", Tokai University Press, September 1985, pp. 57-59.
Shigeki Sagayama, Fumitada Itakura, "Individuality Information Contained in Dynamic Measures of Speech", Proceedings of the Acoustical Society of Japan 1979 Spring Meeting, 3-2-7, 1979, pp. 589-590.
Kitagawa, G., "Non-Gaussian state-space modeling of nonstationary time series", Journal of the American Statistical Association, December 1987, pp. 1032-1063.
Yoshinobu Tonomura, "Research on a Structured Video Handling Mechanism and Video Usage Interface Based on Video Feature Indexing, Chapter 3: Video Indexing Based on Image Processing", doctoral dissertation, Kyoto University, pp. 15-23, 2006.
Hideyuki Tamura (ed.), "Computer Image Processing", Ohmsha, pp. 242-247, December 2002.
Naonori Ueda, "Computational Statistics I, Chapter III, 3. EM Method, 4. Variational Bayes Method", Iwanami Shoten, pp. 157-186, June 2003.
"Nihongo Goi-Taikei (A Japanese Lexicon)", supervised by NTT Communication Science Laboratories, edited by Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, Yoshihiko Hayashi, Iwanami Shoten, 1997.

However, with such conventional user support methods and apparatuses there was, in principle, no way to mediate between users unless two or more of them had actually viewed the content and written comments.

Consequently, a user who had never written a comment, or who had viewed content that no other user had ever viewed, could not learn which users shared his or her preferences and emotions.

Meanwhile, a user's preferences and emotions change from moment to moment. User support therefore needs to reflect, in a timely manner, the preferences and emotions the user has at the time of communicating.

Conventional user support methods and apparatuses, by contrast, share preferences and emotions on the basis of comments. Unlike chat and similar channels, comments are a means of communication with low real-time performance: the preferences and emotions a user had when posting a comment in the past can easily differ from those the user has when communicating with other users. It was therefore difficult to provide user support that reflects preferences and emotions in a timely manner.

The present invention solves the above problems. Its object is to provide a user support method, a user support apparatus, and a user support program capable of associating users with one another reliably and in a timely manner.

The present invention is a user support method, apparatus, and program that analyze, from information contained in the content, content attributes that reflect user preferences and the values of those attributes; judge that users who have published or viewed content that is similar in terms of the analyzed attributes and attribute values have similar sensibilities, such as preferences; and associate those users.

More specifically, at least one of the audio information and video information of the content each user is publishing, is viewing, or has viewed is used to extract an attribute contained in the content and the emotion level that represents the value of that attribute. If necessary, information on the user's activity while viewing the content is also acquired. This information serves as an index reflecting the user's preferences and emotions, and two or more users are associated with one another on the basis of this index.

This method gives users an opportunity to communicate based on the content they are publishing, are viewing, or have viewed, without requiring any special action of them such as writing comments. It also enables user support that simply and promptly reflects the preferences and emotions that matter in communication and community formation.

In the present invention, "preference" also covers interest, concern, and the like, and "emotion" also covers affect, atmosphere, impression, and the like. "Content" refers to video and audio content, and "audio" includes not only human speech but also singing, music, environmental sounds, and so on.

The method (claim 1) and apparatus (claim 4) according to the first invention are a user support method and apparatus for associating a plurality of users, comprising: an estimation step/means that extracts feature quantities from one or more of the audio information and video information of content and, based on the extracted feature quantities and a statistical model constructed in advance, estimates the emotion level of each of a plurality of emotions for the content; a content attribute storage step/means that stores the identification information of the content in association with the emotions estimated for the content and their emotion levels; a user information storage step/means that, for each user terminal, associates information identifying at least one of the content the user terminal is publishing, is viewing, or has viewed with information identifying the user terminal, and stores this as user preference information; and a user association step/means that, based on the user preference information and for the content associated with each user terminal, obtains for each of the plurality of emotions a weighted average of the emotion levels as that user terminal's preference value for the emotion category, and, based on the similarity between the preference values so obtained, associates a first user terminal with one or more user terminals other than the first user terminal.
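The user association step of the first invention can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the per-content weights, the use of cosine similarity, and the names `preference_vector`, `cosine_similarity`, and `associate` are all hypothetical, since the text above specifies only a weighted average of emotion levels and an unspecified similarity between preference values.

```python
import math


def preference_vector(contents, emotions):
    """Weighted average of emotion levels over one terminal's contents.

    contents: list of (weight, {emotion: level}) pairs. The weight is an
    assumption (it could reflect, say, recency or viewing count).
    """
    total = sum(weight for weight, _ in contents)
    if total == 0:
        return {e: 0.0 for e in emotions}
    return {
        e: sum(weight * levels.get(e, 0.0) for weight, levels in contents) / total
        for e in emotions
    }


def cosine_similarity(p, q):
    """Similarity of two preference vectors (assumed measure, not from the source)."""
    dot = sum(p[e] * q[e] for e in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def associate(target, prefs, top_n=1):
    """Return the top_n terminals most similar to the target terminal."""
    ranked = sorted(
        ((cosine_similarity(prefs[target], vec), terminal)
         for terminal, vec in prefs.items() if terminal != target),
        reverse=True,
    )
    return [terminal for _, terminal in ranked[:top_n]]
```

Here `prefs` maps each terminal identifier to the vector returned by `preference_vector`; a threshold on the similarity could be used instead of a fixed `top_n`.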

In the method (claim 2) and apparatus (claim 5) according to the second invention, which build on the method and apparatus of the first invention, the user information storage step/means acquires, for each user terminal and for at least one of the content the user terminal is publishing, is viewing, or has viewed, information identifying the content together with activity information consisting of at least one of motion information, voice information, and biological information of the user of the user terminal while the content is being or was viewed on that terminal, and stores these as the user preference information of the user terminal; and the user association step/means associates user terminals based on a similarity calculated from the preference values of the user terminals and the activity information.

In the method (claim 3) and apparatus (claim 6) according to the third invention, which build on the method and apparatus of the first or second invention, the user information storage step/means further stores the change over time of the user preference information.

The program according to the fourth invention (claim 7) is a program for executing the steps according to any one of claims 1 to 3.

(1) According to the inventions of claims 1 to 7, content attributes reflecting user preferences and the values of those attributes (emotions and emotion levels) can be estimated from the information contained in the content. Based on these attributes and attribute values, a user's preferences can be analyzed immediately from the content that the user is publishing or viewing, or has viewed, and users can be associated on that basis.

Rather than simply associating users who viewed the same content or users who posted comments, the invention associates users by content attributes and attribute values that reflect their preferences, which makes reliable and timely association possible. In addition, users need not take any action such as posting a comment for the association to be performed, so the barrier to use is low.
(2) According to the inventions of claims 2 and 5, a user's preferences and emotions can be analyzed from the user preference information together with the user's activity information during content viewing. This allows users to be associated more accurately.
(3) According to the inventions of claims 3 and 6, storing the change over time of the user preference information provides an index of dynamic changes in each user's preferences and emotions. Performing the association with these changes taken into account further improves its timeliness and accuracy.
(4) According to the invention of claim 7, all of the above inventions can be executed on a computer.

Embodiments of the present invention are described below with reference to the drawings, but the present invention is not limited to these embodiments. Various content attributes reflect user preferences and emotions, for example the genre of the content, its subject, shooting location, shooting time, duration, and creator. Preferably, however, an attribute is chosen that reflects the impression the user receives when viewing the content.

As one example of such an attribute, the embodiments of the present invention take the emotion contained in the content as the attribute and estimate an emotion level as its value, and describe examples in which users are associated with one another based on emotions and emotion levels. Three examples of embodiments of the present invention are described below with reference to FIGS. 1 to 19.
[First example of embodiment]
The first example of the embodiment of the present invention concerns the case where, when user support is performed, emotions and emotion levels are extracted from the audio information of content, user preference information is obtained from them, and users are associated with one another.

A user support method and a user support apparatus according to the first example of the embodiment are described. FIG. 1 is a flowchart showing the processing of the user support method according to the embodiment of the present invention, and FIG. 2 is a block diagram of the user support apparatus 100 according to the embodiment.

The user support apparatus 100 in the first example of the embodiment comprises at least user terminals 200a, 200b, ..., an information control unit 300, and a database 400 (the content attribute storage means of the present invention). These may be connected within a single terminal, or via predetermined communication means that allow them to communicate with one another, such as a LAN (Local Area Network) and/or a wide area network such as the Internet.

The information control unit 300 and the database 400 may be housed in the same device, for example with the database 400 included in the information control unit 300.

FIG. 3 is a block diagram showing the configuration of each of the user terminals 200a, 200b, .... Each user terminal includes, for example: an input unit 210 having a keyboard 211, a pointing device 212 such as a mouse, and data input means (not shown) through which the audio information (audio signal data) and video information (video signal data) of content are input; a control unit 220 made up of a CPU (Central Processing Unit) 221, a ROM (Read Only Memory) 222, and a RAM (Random Access Memory) 223; a storage unit 230 made up of an HDD (Hard Disc Drive) 231; and a display unit 240 that has a monitor screen 241, such as a liquid crystal screen, and displays information output from the control unit 220 in response to operation of the input unit 210.

The information control unit 300 is made up of a CPU 301, a ROM 302, a RAM 303, an HDD 304, and the like, connected to one another. All of the processing in the present invention is performed by this information control unit 300: the programs and data that implement the processing are all stored in memory devices such as the ROM 302 and the HDD 304, read into the RAM 303 as needed, and executed by the CPU 301.

In terms of functional units, the information control unit 300 includes an emotion estimation unit F100 (the estimation means of the present invention), a user information storage unit F200 (the user information storage means of the present invention), a user association unit F300, and, if necessary, a notification unit F400.

In step S100, when the user support apparatus 100 of this embodiment receives a content publication request from a user terminal 200a, 200b, ... and stores the content in the database 400, it saves, as shown in FIG. 4, identification information such as an ID unique to each content item and to each partial content obtained by dividing the content into predetermined partial sections, in association with the emotions and emotion levels of the content estimated by the emotion estimation unit F100. In addition, the identification information of the user terminal that published the content, the identification information of the published content, the publication date and time, and so on are stored in the user information storage unit F200 as the user preference information of that user terminal.

In step S200, content is published or distributed in response to a content publication or content distribution request from a user terminal 200a, 200b, .... At the same time, user preference information that associates the identification information of the user terminal with the identification information of the content, the publication/distribution date and time, and so on is saved in the user information storage unit F200.
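The records kept in steps S100 and S200 can be pictured with the following sketch. It is only an illustration under assumptions: the class and field names (`ContentRecord`, `PreferenceEntry`, `Store`, `action`) are hypothetical, and in-memory containers stand in for the database 400 and the user information storage unit F200.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ContentRecord:
    """One content item as held in the database (cf. FIG. 4)."""
    content_id: str
    # partial-content ID -> {emotion: emotion level}
    segments: dict = field(default_factory=dict)


@dataclass
class PreferenceEntry:
    """One row of user preference information."""
    terminal_id: str
    content_id: str
    action: str          # "published" or "viewed" (assumed labels)
    timestamp: datetime


class Store:
    def __init__(self):
        self.database = {}   # stands in for database 400
        self.user_info = []  # stands in for user information storage unit F200

    def publish(self, terminal_id, content_id, segment_emotions, when):
        """Step S100: save content with its estimated emotions, log the publisher."""
        self.database[content_id] = ContentRecord(content_id, dict(segment_emotions))
        self.user_info.append(PreferenceEntry(terminal_id, content_id, "published", when))

    def view(self, terminal_id, content_id, when):
        """Step S200: log which terminal received which content, and when."""
        self.user_info.append(PreferenceEntry(terminal_id, content_id, "viewed", when))

    def contents_of(self, terminal_id):
        """Content IDs associated with one terminal, in logged order."""
        return [e.content_id for e in self.user_info if e.terminal_id == terminal_id]
```

Keeping the timestamp on every entry is what would later allow the change over time of the user preference information to be examined.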

Next, in step S300, the user association unit F300 associates user terminals with one another based on the stored user preference information.

Finally, in step S400, the notification unit F400 transmits information on the association result to each of the user terminals 200a, 200b, ....

The processing flow in one example of the embodiment of the present invention is described in detail below with reference to FIG. 1.

First, when a content publication request is received, the emotion estimation unit F100 extracts in advance the emotions and emotion levels of the content from the audio information contained in the content.

The emotion estimation unit F100 calculates the emotions and emotion levels of the content based on one or more of the prosodic information in the audio information contained in the content and the text information obtained by speech recognition. The processing by which the emotion estimation unit F100 calculates these is preprocessing performed before user support is actually executed according to the present invention.

One example of a method for estimating emotions and emotion levels from the prosodic information contained in audio is described below.

FIG. 5 is a flowchart of the emotion detection method in this embodiment.

The data referred to in the description of this embodiment is assumed to be stored in, and accessed from, a general-purpose storage unit (for example, a memory or a hard disk device) or storage means.

First, step S110 (statistical model construction step) is performed in advance, before emotional states are actually determined by the emotion detection method of this embodiment; it constructs the statistical model used to calculate emotional state probabilities. The statistical model is embodied as a program that describes the function for calculating the statistics and parameters such as the statistics themselves. Codes representing the parameters and function type of the statistical model are stored in the storage unit, but they require relatively little storage capacity.

Next, in step S120 (speech feature extraction step), the desired speech features are calculated and extracted as a vector for each analysis frame (hereinafter simply "frame") from the audio signal data of the captured content. This speech feature vector consists of one or more of the following elements: the fundamental frequency, a sequence of temporal variation characteristics of the fundamental frequency, the power, a sequence of temporal variation characteristics of the power, and the temporal variation characteristics of the speech rate. The audio signal data is input through audio signal data input means (for example, the input unit 210 in FIG. 3). The extracted speech features are stored in the storage unit. Steps S120 to S150 are the processing that calculates the emotional state probability.
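A per-frame feature vector of the kind described in step S120 can be sketched as below. This is a simplified illustration, not the extraction method the patent relies on (it cites Non-Patent Document 3 for that): the autocorrelation peak picking, the one-frame difference used as the temporal variation, the search range of 50 to 500 Hz, and all function names are assumptions, and the speech rate element is omitted.

```python
import math


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping analysis frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def fundamental_frequency(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate F0 from the largest autocorrelation peak (assumed method)."""
    lo = int(sample_rate / f_max)
    hi = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best = 0, 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best:
            best_lag, best = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def feature_vectors(signal, sample_rate, frame_len, hop):
    """Return (F0, delta F0, power, delta power) for every frame."""
    frames = split_frames(signal, frame_len, hop)
    f0 = [fundamental_frequency(f, sample_rate) for f in frames]
    pw = [power(f) for f in frames]
    vectors = []
    for t in range(len(frames)):
        prev = max(t - 1, 0)  # the first frame has no predecessor: delta is 0
        vectors.append((f0[t], f0[t] - f0[prev], pw[t], pw[t] - pw[prev]))
    return vectors
```

A real system would normally also decide whether a frame is voiced before trusting its F0 value; that check is left out here for brevity.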

Next, in step S130 (speech feature appearance probability calculation step), based on the speech feature vectors calculated in step S120 and stored in the storage unit, the probability that the speech feature vector corresponding to an emotional state appears is calculated for each frame using the statistical model constructed in advance in step S110, and the result is taken as the speech feature appearance probability.

次に、ステップＳ１４０（感情的状態遷移確率計算処理ステップ）では、フレーム毎に、ステップＳ１１０において予め構成された統計モデルを用いて、感情的状態に対応する１つ以上の状態変数の時間方向への遷移確率を算出し、その算出結果を感情的状態遷移確率と見做す。 Next, in step S140 (emotional state transition probability calculation processing step), for each frame, in the time direction of one or more state variables corresponding to the emotional state, using the statistical model previously configured in step S110. The transition probability is calculated, and the calculation result is regarded as the emotional state transition probability.

次に、ステップＳ１５０（感情的状態確率計算処理ステップ）では、フレーム毎に、ステップＳ１３０で計算した音声特徴量出現確率及びＳ１４０で計算した感情的状態遷移確率に基づいて、感情的状態確率を計算する。 Next, in step S150 (emotional state probability calculation processing step), the emotional state probability is calculated for each frame based on the speech feature amount appearance probability calculated in step S130 and the emotional state transition probability calculated in S140. To do.

Then, in step S160 (emotional state determination step), the emotional state of each frame is determined from the emotional state probability calculated in step S150, and is output.

In step S170 (emotion and emotion level estimation step), sections each consisting of one or more frames are formed based on one or more of the emotional state probability calculated in step S150, the determined emotional state, the continuous speech and non-speech time, the continuous speech, and the continuous non-speech time. The emotion and emotion level are then estimated with each such section as the unit.

Each step of the emotion detection method is described in detail below.

First, the processing of step S110, which builds the statistical models, is described in detail with reference to FIG. 6. The statistical models are obtained by learning from training speech signal data.

First, in step S111, the training speech signal data is input. The training speech signal data may be input through the audio signal data input means, or through input means dedicated to training speech signal data (training speech signal data input means).

Next, in step S112, every frame across the whole of the training speech signal data is given the emotional state e that a human judged for that frame by actually viewing and listening to the data. An emotional state e determined by a human in this way is called a label, and the act of determining it is called labeling.

Strictly speaking, the emotional state labels need not have been given frame by frame. Labels in any form that can be converted into frame-level labels may be used after conversion. For example, a label of emotional state e may have been given to a section. In that case, each frame contained in the section is given the same label as the section, which yields a label for every frame. More specifically, when the span of the audio from time t1 to t2 is labeled with emotional state e, all frames in that span are treated as labeled e.
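The conversion from section labels to frame labels can be sketched as follows. This is a minimal illustration, assuming frames of 100 ms shifted by 50 ms (the values given later for feature extraction) and assuming that a frame belongs to the section containing its center. The function name and the `(start_ms, end_ms, label)` tuple layout are hypothetical.

```python
def frame_labels_from_sections(sections, num_frames, frame_ms=100, shift_ms=50):
    """Give each frame the label of the section that contains the frame center.

    sections: list of (start_ms, end_ms, label) tuples (hypothetical layout).
    Frames whose center falls in no section stay None.
    """
    labels = [None] * num_frames
    for t in range(num_frames):
        # Integer milliseconds avoid floating-point trouble at section borders.
        center_ms = t * shift_ms + frame_ms // 2
        for start_ms, end_ms, label in sections:
            if start_ms <= center_ms < end_ms:
                labels[t] = label
                break
    return labels


sections = [(0, 200, "joy"), (200, 400, "calm")]
print(frame_labels_from_sections(sections, num_frames=6))
```

Assigning by frame center is only one possible convention. Overlap-based voting would be an equally plausible choice when a frame straddles two sections.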

Next, in step S113, the speech feature vector x is extracted for each frame in the same way as in step S120. In what follows, the speech feature vector and the emotional state in frame F_t, the frame with frame number t, are written x_t and e_t, respectively.

Next, in step S114, the statistical model for calculating the speech feature appearance probability and the statistical model for calculating the emotional state transition probability are each built by learning.

First, an example of a learning method for the statistical model used to calculate the speech feature appearance probability is described.

The statistical model for calculating the speech feature appearance probability is a probability distribution over the space of the speech feature vector x given for each frame. For example, for frame F_t it is built as a conditional probability distribution p^A(x_t | e_t) that represents the probability that x_t appears, conditioned on the emotional state sequence e_t = {e_t, e_{t-1}, ..., e_{t-n+1}} over the span of (n-1) frames leading up to F_t. On the left-hand side and in the conditioning term, e_t denotes the sequence. Here, n may be, for example, about 2 to 3.

This conditional probability distribution p^A(x_t | e_t) may be built, for example, with a probability model such as a normal distribution or a mixture of normal distributions for each possible value of e_t. It may further be built with a probability model such as a normal distribution, a mixture of normal distributions, or a multinomial distribution for each type of speech feature. The parameters of these probability models are estimated from the training speech signal data.

Known methods can be used to estimate the parameters, for example maximum likelihood estimation, the EM algorithm, or the variational Bayes method (see, for example, Non-Patent Documents 1 and 2).
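As an illustration of the simplest of these options, the sketch below fits one normal distribution per emotional state by maximum likelihood and evaluates the log-likelihood of a feature vector. It assumes a diagonal covariance and a small variance floor, neither of which is specified in the text. The function names are hypothetical, and a label may equally be a tuple standing for an emotional state sequence.

```python
import math
from collections import defaultdict


def fit_diag_gaussians(features, labels, var_floor=1e-6):
    """Maximum-likelihood mean and per-dimension variance for each label."""
    grouped = defaultdict(list)
    for x, e in zip(features, labels):
        grouped[e].append(x)
    models = {}
    for e, xs in grouped.items():
        n, dim = len(xs), len(xs[0])
        mean = [sum(x[d] for x in xs) / n for d in range(dim)]
        # The floor keeps the variance positive when a state has few samples.
        var = [max(sum((x[d] - mean[d]) ** 2 for x in xs) / n, var_floor)
               for d in range(dim)]
        models[e] = (mean, var)
    return models


def log_likelihood(x, model):
    """log p^A(x | e) under the diagonal normal model of one state."""
    mean, var = model
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


models = fit_diag_gaussians([[0.0], [2.0], [10.0], [12.0]],
                            ["calm", "calm", "joy", "joy"])
print(log_likelihood([1.0], models["calm"]), log_likelihood([1.0], models["joy"]))
```

A mixture of normal distributions fitted with the EM algorithm would replace `fit_diag_gaussians` while leaving the likelihood interface unchanged.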

Next, the statistical model for calculating the emotional state transition probability is described.

Like the statistical model for calculating the speech feature appearance probability, the statistical model for calculating the emotional state transition probability is obtained by learning from the training speech signal data.

An example of the learning performed in step S114 is described below, on the premise that steps S111 to S113 have given every frame across the whole of the training speech signal data both an extracted speech feature vector x and the emotional state e that a human judged for that frame by actually viewing and listening. The emotional state at frame number t is written e_t.

The statistical model for calculating the emotional state transition probability is a probability distribution over the space of the emotional state sequence e_t at the t-th frame F_t. It is built as a conditional probability distribution p^B(e_t | e_{t-1}) that represents the probability that e_t appears, conditioned on the emotional state sequence e_{t-1} = {e_{t-1}, e_{t-2}, ..., e_{t-n}} over the (n-1) frames before F_t.

Since e_t is a variable representing an emotional state such as joy, anger, or sadness, and is therefore discrete, the conditional probability distribution p^B(e_t | e_{t-1}) can be built, for example, as a bi-gram histogram. In this case it is obtained from the training speech signal data by counting how many times each emotional state sequence e_t appears when e_{t-1} is given.
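The counting described here can be sketched as follows for the simplest case in which each state is a single label per frame. The function name and the optional additive smoothing are assumptions for illustration. With `smoothing=0.0` the result is the plain normalized bi-gram histogram.

```python
from collections import Counter, defaultdict


def fit_transition_probs(label_seq, states, smoothing=0.0):
    """Estimate p^B(current | previous) by counting adjacent label pairs."""
    counts = defaultdict(Counter)
    for prev, cur in zip(label_seq, label_seq[1:]):
        counts[prev][cur] += 1
    probs = {}
    for prev in states:
        total = sum(counts[prev].values()) + smoothing * len(states)
        if total > 0:
            probs[prev] = {cur: (counts[prev][cur] + smoothing) / total
                           for cur in states}
        else:
            # A previous state never seen in training: fall back to uniform.
            probs[prev] = {cur: 1.0 / len(states) for cur in states}
    return probs


p_b = fit_transition_probs(["calm", "calm", "joy", "calm"], ["calm", "joy"])
print(p_b)
```

For n greater than 2 the same counting applies when each element of `label_seq` is a tuple of the n most recent labels.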

The above is the detailed processing of step S110.

Next, in step S120, the desired speech feature vector is extracted for each frame from the audio signal data of the captured content. In the present invention, speech includes not only human conversational speech but also singing voice, music, and the like.

An example of a method for extracting the speech feature vector is described below.

First, the speech features are described. The speech features used in this embodiment are preferably ones that, compared with the speech spectrum, the cepstrum, and the like, can be obtained stably even in noisy environments and that make the determination of the emotional state less dependent on the speaker's profile.

As speech features satisfying these conditions, the fundamental frequency f0, the sequence {rf0^i} of time variation characteristics of the fundamental frequency, the power p, the sequence {rp^i} of time variation characteristics of the power, the time variation characteristic of the speech rate, and the like are extracted. Here, i is the index of the time variation characteristic.

In this embodiment, a sequence is defined as a set having one or more elements. Examples of time variation characteristics include regression coefficients; the mean, maximum, and minimum of the amount of change within the analysis frame; and the cumulative sum and the range of the absolute amount of change within the analysis frame. They may be selected as needed. In the case of regression coefficients in particular, the index can be associated with the order. Regression coefficients of any order may be used, for example orders 1 to 3. The following example describes the case in which only regression coefficients are adopted as time variation characteristics. The power p may be the amplitude of the audio signal waveform, or its absolute value, a smoothed value, or the rms value. The average power in a certain frequency band may also be used, for example the 500 Hz (hertz) to 3 kHz (kilohertz) band that humans hear easily.

Various methods exist for extracting the fundamental frequency f0 and the power p. These methods are publicly known; for details, see, for example, the method described in Non-Patent Document 3.

The time variation characteristic of the speech rate is extracted as the time variation characteristic rm of a dynamic measure m by a known method (see, for example, Non-Patent Document 4 and Patent Document 2).

For example, the speech rate may be detected by detecting the peaks of the dynamic measure and counting them. Alternatively, the time variation characteristic of the speech rate may be detected by calculating the mean and variance of the peak intervals, which correspond to the speech rhythm.

The following description uses the time variation characteristic rm of the dynamic measure based on the mean peak interval of the dynamic measure.

An example of a method for extracting regression coefficients as the sequence {rf0^i} of time variation characteristics of the fundamental frequency and the sequence {rp^i} of time variation characteristics of the power is described next.

Let t be the analysis time. The relationship between the fundamental frequency f0_t (for example, the graph indicated by the symbol δ in FIG. 7) or the power p_t extracted at time t and {rf0^i_t} or {rp^i_t} is expressed by the following approximation.
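The approximation itself is not reproduced in this text. One form consistent with the definitions here, offered as an assumption rather than as the expression of the original publication, models the values at offsets j around the analysis time t as polynomials in j:

```latex
f0_{t+j} \approx \sum_{i=0}^{I} rf0^{i}_{t}\, j^{i},
\qquad
p_{t+j} \approx \sum_{i=0}^{I} rp^{i}_{t}\, j^{i}
```

In this form the coefficient of order i = 1 is the local slope of the fundamental frequency or power contour at t.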

Here, I denotes the maximum order of the regression function. {rf0^i_t} and {rp^i_t} should be determined so that the approximation error is small in the neighborhood of t. One way to achieve this is, for example, the least squares method.

I may be any value. As an example, the case of obtaining rf0^1_t with I = 1 is described here; rp^1_t can be calculated in the same way. With t as the analysis time, the time variation characteristic rf0^1_t of the fundamental frequency at t is obtained from the sampling points around t. Here, d is the number of sampling points before and after time t that are used in the calculation, and corresponds to the neighborhood of t. For example, d = 2.
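A first-order coefficient of this kind can be sketched as an ordinary least-squares slope over the 2d+1 sampling points centered on t. For a full symmetric window this reduces to the sum of j·f0_{t+j} divided by the sum of j² for j from -d to d. That this matches the equation of the original publication is an assumption. The function name and the handling of windows truncated at the ends of the signal are also illustrative choices.

```python
def regression_slope(values, t, d=2):
    """Least-squares slope of values[t-d .. t+d] against the offset j = -d .. d.

    Near the ends of the sequence the window is truncated to the samples that
    exist, so the general (mean-centered) slope formula is used.
    """
    lo, hi = max(0, t - d), min(len(values) - 1, t + d)
    offsets = [k - t for k in range(lo, hi + 1)]
    window = [values[k] for k in range(lo, hi + 1)]
    n = len(offsets)
    if n < 2:
        return 0.0  # a single point carries no slope information
    mean_j = sum(offsets) / n
    mean_v = sum(window) / n
    num = sum((j - mean_j) * (v - mean_v) for j, v in zip(offsets, window))
    den = sum((j - mean_j) ** 2 for j in offsets)
    return num / den


f0_track = [100.0, 102.0, 104.0, 106.0, 108.0]
print(regression_slope(f0_track, t=2, d=2))
```

The same function applies unchanged to a power track to obtain the first-order time variation characteristic of the power.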

The following description covers, as an example, the case in which only rf0^1_t and rp^1_t obtained with I = 1 are used, as the time variation characteristic rf0 of the fundamental frequency and the time variation characteristic rp of the power, respectively.

An example of a method for calculating the speech features for each frame is described. The length of one frame (hereinafter, the frame length) is set to 100 ms (milliseconds), and each subsequent frame is formed by shifting the current frame by 50 ms.
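The framing can be sketched as follows, with a 100 ms frame length and a 50 ms shift as stated. The function name is hypothetical, and dropping a trailing partial frame is an assumption, since the text does not say how the end of the signal is handled.

```python
def frame_bounds(num_samples, sample_rate, frame_ms=100, shift_ms=50):
    """Return (start, end) sample indices of successive overlapping frames."""
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    bounds = []
    start = 0
    # Only full-length frames are kept; a trailing partial frame is dropped.
    while start + frame_len <= num_samples:
        bounds.append((start, start + frame_len))
        start += shift
    return bounds


bounds = frame_bounds(num_samples=16000, sample_rate=16000)
print(len(bounds), bounds[:2])
```

With these values consecutive frames overlap by half, so each sample away from the signal boundaries contributes to two frames.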

For each of these frames, the mean of each speech feature within the frame is calculated: the mean fundamental frequency f0', the mean time variation characteristic rf0' of the fundamental frequency, the mean power p', the mean time variation characteristic rp' of the power, and the mean rm' of the mean peak interval of the dynamic measure. Besides these means, the maximum, the minimum, or the range of variation of each speech feature within the frame may also be calculated and used. The following description covers the case in which only the means are used.

Each speech feature is preferably normalized in advance. For f0', for example, normalization may be performed by subtracting the mean fundamental frequency over the whole of the audio signal data to be processed, by dividing by that mean fundamental frequency, or by standardizing to mean 0 and variance 1. The other speech features can be normalized in the same way.
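The three normalization options can be sketched for one feature track as follows. The function name and the `mode` strings are hypothetical, and the population variance is assumed for the standardization.

```python
import math


def normalize(values, mode="zscore"):
    """Normalize one feature track over the whole signal.

    mode: "subtract" (remove the mean), "divide" (divide by the mean),
    or "zscore" (mean 0, variance 1).
    """
    mean = sum(values) / len(values)
    if mode == "subtract":
        return [v - mean for v in values]
    if mode == "divide":
        return [v / mean for v in values]  # requires a non-zero mean
    if mode == "zscore":
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return [(v - mean) / std if std > 0 else 0.0 for v in values]
    raise ValueError("unknown mode: " + mode)


print(normalize([100.0, 200.0, 300.0], "divide"))
```

Dividing by the mean suits strictly positive features such as the fundamental frequency, whereas features that can average to zero, such as regression coefficients, are better handled by the other two modes.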

Determining the emotional state requires capturing the temporal behavior of the speech features. In this embodiment, a speech feature vector is calculated from the speech features of one or more frames, using the speech features calculated for each frame, so that this temporal behavior is captured. The section from which the speech feature vector is extracted is called the speech feature vector extraction section (for example, the section indicated by the symbol W in FIG. 8).

An example of a method for capturing the temporal behavior of the speech features is described below with reference to FIG. 8.

Let the frame number of the current frame F be j, and write the frame as F_j. The per-frame speech features of F_j are written as follows: the fundamental frequency f0'_j, the time variation characteristic rf0'_j of the fundamental frequency, the power p'_j, the time variation characteristic rp'_j of the power, and the mean peak interval rm'_j of the dynamic measure.

One way to construct a speech feature vector from the speech features obtained above is to embed each speech feature in a delay coordinate space. That is, the speech features contained in the frames from the current frame back to the frame S frames earlier are written as a vector.

For the fundamental frequency, for example, with t as the current frame number, the speech feature vector f0 of the fundamental frequency is obtained as f0 = {f0'_t, f0'_{t-1}, ..., f0'_{t-S}}^T. The frames in FIG. 8 are frame F_{t-S}, indicated by the symbol w1; frame F_{t-1}, indicated by w2; and frame F_t, indicated by w3.

Alternatively, the inter-frame differences from the current frame back to the frame S frames earlier may be calculated for each speech feature and written as a vector.

Here, S is set to, for example, S = 5. The time variation characteristic rf0 of the fundamental frequency, the power p, the time variation characteristic rp of the power, and the time variation characteristic rm of the dynamic measure are calculated in the same way.

Let x denote the concatenation of all the speech feature vectors that were chosen in advance for determining the emotional state. For example, when all the extracted speech features are used, x = {f0^T, rf0^T, p^T, rp^T, rm^T}^T. When the time variation characteristic rf0 of the fundamental frequency, the time variation characteristic rp of the power, and the mean peak interval rm of the dynamic measure are used, x = {rf0^T, rp^T, rm^T}^T.
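The delay-coordinate construction and the concatenation into x can be sketched as follows. The function names are hypothetical, and repeating the first value for frames earlier than the start of the signal is an assumption, since the text does not describe the boundary case.

```python
def delay_embed(track, t, S=5):
    """Return [track[t], track[t-1], ..., track[t-S]] for one feature track.

    Frames before the start of the signal reuse the first value (assumption).
    """
    return [track[max(t - s, 0)] for s in range(S + 1)]


def feature_vector(tracks, t, S=5):
    """Concatenate the delay-embedded vectors of the selected feature tracks."""
    x = []
    for track in tracks:
        x.extend(delay_embed(track, t, S))
    return x


rf0 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
rp = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(feature_vector([rf0, rp], t=6, S=5))
```

With S = 5 each feature contributes six elements, so selecting rf0, rp, and rm gives an 18-dimensional vector x per frame.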

In this embodiment, the emotional state can be determined by using one or more of the above speech features. However, in utterances characteristic of emotional states, the fundamental frequency f0 itself is often difficult to extract and is frequently missing. It is therefore preferable to include the time variation characteristic rf0 of the fundamental frequency, which readily compensates for such gaps. Furthermore, to raise the determination accuracy while keeping speaker dependency low, it is preferable to also include the time variation characteristic rp of the power.

By applying the per-frame calculation of speech features and speech feature vectors described above to the frames across the whole of the content, a speech feature vector is obtained for every frame.

The above is the detailed processing of step S120.

Next, in step S130, the appearance probability of the speech feature vector in each emotional state (the speech feature appearance probability) is calculated using the speech feature vector of each frame extracted in step S120 and the statistical model built in advance in step S110.

An example of the processing executed in step S130 is described below.

An example of a method for calculating the appearance probability of the speech feature vector based on the statistical model created in step S110 is described.

Since the statistical model is a conditional probability distribution p^A(x_t | e_t) over the space of the speech feature vector x_t given for each frame, the likelihood of the input speech feature vector x_t is calculated under the statistical model p^A(x_t | e_t) created in advance in step S110. The calculated likelihood is regarded as the speech feature appearance probability with which x_t appears in each emotional state.

By performing the above processing over all frames, the speech feature appearance probability is calculated for every frame.

The above is the detailed processing of step S130.

Next, in step S140, the statistical model is used to calculate the probability of a transition to each emotional state in the current frame (that is, the emotional state transition probability), conditioned on the emotional states of one or more frames immediately preceding the frame whose emotional state is to be determined (the current frame).

An example of the processing for executing step S140 is described below.

First, an example of a method for calculating the emotional state transition probability based on the statistical model created in step S110 is described.

The statistical model is a conditional probability distribution p^B(e_t | e_{t-1}) over the space of the emotional state sequence e_t. Therefore, in step S140, provided that e_{t-1} has already been determined, the probability of each possible emotional state sequence e_t is calculated from the statistical model p^B(e_t | e_{t-1}) created in advance, for example by the method of step S110. The calculated probability is regarded as the emotional state transition probability.

Because emotional states are determined sequentially along the direction of time in the audio signal data, letting the frame number t increase monotonically along this time axis ensures that e_{t-1} has already been determined at the point where e_t is determined.

By performing the above processing over all frames, the emotional state transition probability is calculated for every frame.

The above is the detailed processing of step S140.

Next, in step S150, the emotional state probability is calculated from the speech feature appearance probability and the emotional state transition probability calculated in steps S130 and S140.

An example of the processing for calculating the emotional state probability in step S150 is described below.

Taken together, the statistical models p^A(x_t | e_t) and p^B(e_t | e_{t-1}) have a structure called a generalized state space model. They have the same causal structure as, for example, the left-to-right HMMs (Hidden Markov Models) often used in speech recognition (for example, in FIG. 9, the emotional states e_{t-1} and e_t indicated by the symbol S_t1 and the speech features x_{t-1} and x_t indicated by the symbol S_t2).

In a generalized state space model, when the observation sequence {x_t} up to time t has been obtained, the probability distribution p(e_t | {x_t}) of the internal state e_t at time t can be obtained from p^A(x_t | e_t) and p^B(e_t | e_{t-1}) by recursively calculating the following equations (see, for example, Non-Patent Document 5).
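The equations are not reproduced in this text. The standard prediction and update recursion for a state space model with a discrete state, written in the symbols used here, is given below as an assumption about their form:

```latex
p(e_t \mid \{x_{t-1}\}) = \sum_{e_{t-1} \in E} p^{B}(e_t \mid e_{t-1})\, p(e_{t-1} \mid \{x_{t-1}\})
\\[4pt]
p(e_t \mid \{x_t\}) = \frac{p^{A}(x_t \mid e_t)\, p(e_t \mid \{x_{t-1}\})}
{\sum_{e_t \in E} p^{A}(x_t \mid e_t)\, p(e_t \mid \{x_{t-1}\})}
```

The first line propagates the previous distribution through the transition model, and the second reweights it by the speech feature appearance probability and normalizes.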

Here, E is the set of all values that e_t can take.

In general, in a generalized state space model the size of E is very large, so it is difficult to obtain the value of p(e_t | {x_t}) by evaluating the above equations directly.

In this embodiment, the values that e_t can take are the emotional states handled, that is, joy, anger, sadness, fear, surprise, impatience, and so on. Letting their number be |e| and considering all possible combinations in e_t = {e_t, e_{t-1}, ..., e_{t-n+1}}, the size of E is |e|^n.

In this embodiment, the emotional states handled are assumed to be, for example, joy, anger, sadness, fear, surprise, impatience, and calm, so their number |e| is about 10. With n = 3, for example, the size of E is on the order of 10^3, a scale at which even a general-purpose computer in common use today can evaluate the above expressions directly.

Accordingly, in this embodiment the value of the probability distribution p(e_t | {x_t}) in these expressions can be calculated directly, and the emotional state probability is calculated in that way.
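Direct calculation over a small state set can be sketched as one filtering step per frame: the previous distribution is propagated through the transition probabilities, multiplied by the appearance probability of the current observation, and normalized. This is a minimal sketch of the standard recursion for a discrete state, assumed rather than taken from the publication. The function name and the dictionary layout are hypothetical, and each key may stand for a single emotional state or for a tuple representing a state sequence.

```python
def filter_step(prior, likelihood, transition):
    """One step of the recursive calculation of p(e_t | {x_t}).

    prior:      {state: p(e_{t-1} | {x_{t-1}})}
    likelihood: {state: p^A(x_t | state)}
    transition: {previous_state: {state: p^B(state | previous_state)}}
    """
    states = list(prior)
    # Prediction: sum over the previous states.
    predicted = {e: sum(transition[prev][e] * prior[prev] for prev in states)
                 for e in states}
    # Update: weight by the observation likelihood, then normalize.
    unnorm = {e: likelihood[e] * predicted[e] for e in states}
    z = sum(unnorm.values())
    if z == 0:
        return predicted  # uninformative observation (assumed fallback)
    return {e: unnorm[e] / z for e in states}


prior = {"calm": 0.5, "joy": 0.5}
transition = {"calm": {"calm": 0.9, "joy": 0.1},
              "joy": {"calm": 0.2, "joy": 0.8}}
likelihood = {"calm": 0.1, "joy": 0.4}
print(filter_step(prior, likelihood, transition))
```

The cost of one step grows with the square of the number of states, which is why a state set on the order of 10^3 remains tractable.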

Furthermore, let e'_t = {e_{t-1}, ..., e_{t-n+1}} denote the sequence e_t with its element e_t removed, and let E' denote the set of all values that e'_t can take. The emotional state probability of the emotional state e_t of each frame can then be obtained by summing the joint probability p(e_t, e'_t | {x_t}) over all e'_t in E'. FIG. 10 shows an example of emotional state probabilities when the emotional states of joy, sadness, and calm are handled. In FIG. 10, the curve indicated by the symbol L1 is the emotional state probability of joy, the curve indicated by L2 is that of calm, and the curve indicated by L3 is that of sadness.
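The per-frame probability of the current emotional state follows from the distribution over state sequences by summing out the earlier elements. A minimal sketch, assuming each sequence is stored as a tuple whose first element is the current state; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def marginal_current_state(joint):
    """Sum p(sequence | observations) over every element except the current state.

    joint: {(e_t, e_{t-1}, ...): probability}, with the current state first.
    """
    marginal = defaultdict(float)
    for sequence, p in joint.items():
        marginal[sequence[0]] += p
    return dict(marginal)


joint = {("joy", "joy"): 0.3, ("joy", "calm"): 0.2,
         ("calm", "joy"): 0.1, ("calm", "calm"): 0.4}
print(marginal_current_state(joint))
```

Curves such as those described for FIG. 10 correspond to plotting one entry of this marginal against the frame number.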

By performing the above processing over all frames, the emotional state probability is calculated for every frame.

The above is the detailed processing of step S150.

In step S160, the emotional state probability calculated in step S150 is passed to the emotion determination means, and the emotional state is determined based on the emotional state probability.

An example of the processing of step S160, which determines the emotional state, is described below. In the following description, the categories of emotional states handled are indexed in order as e^1, e^2, ..., e^|e|. For example, when the emotional states of joy, anger, sadness, fear, surprise, impatience, and calm are handled, they may be indexed as e^1: joy, e^2: anger, e^3: sadness, e^4: fear, e^5: surprise, e^6: impatience, and e^7: calm, in which case |e| = 7.

Step S150 has calculated the emotional state probability p^k_t = p(e_t = e^k | {x_t}) that the emotional state in frame F_t, the frame with frame number t, is e^k. In the simplest case, therefore, the emotional state corresponding to the e^k with the highest probability p^k_t can be determined to be the emotional state in F_t. Alternatively, one or more e^k may be selected in descending order of p^k_t, and these emotional states together taken as the determination result.

Alternatively, some emotional states tend to arise together at the same time and others do not. For example, it is easy to imagine that joy and sadness are unlikely to arise simultaneously. To take this into account, a convex combination of the emotional state probabilities {p^1_t, p^2_t, ..., p^(k-1)_t, p^(k+1)_t, ..., p^|e|_t} corresponding to the other emotional states {e^1, e^2, ..., e^(k-1), e^(k+1), ..., e^|e|} may be subtracted from the emotional state probability p^k_t corresponding to a given emotional state e^k, the result normalized and taken as the new p^k_t, and these new values compared.
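A minimal sketch of this adjustment in LaTeX notation follows. The mixing coefficients \lambda_{kj}, the clipping at zero, and the normalization by the sum over all categories are assumptions made for illustration; the description only states that a convex combination of the other probabilities is subtracted and the result normalized.

```latex
% Subtract a convex combination of the other categories' probabilities.
\tilde{p}^k_t = p^k_t - \sum_{j \neq k} \lambda_{kj}\, p^j_t,
\qquad \lambda_{kj} \ge 0, \quad \sum_{j \neq k} \lambda_{kj} = 1

% Normalize so that the adjusted values are non-negative and sum to one
% (assumed normalization; defined when at least one \tilde{p}^l_t is positive).
\hat{p}^k_t = \frac{\max(\tilde{p}^k_t,\, 0)}{\sum_{l=1}^{|e|} \max(\tilde{p}^l_t,\, 0)}
```

Giving a large \lambda_{kj} to a category j that rarely co-occurs with category k (joy and sadness, say) lowers the adjusted score of k whenever j is also probable.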

Or, more simply, a threshold may be set in advance, and every emotional state e^k whose p^k_t exceeds the threshold may be determined to be the emotional state.

By performing the above processing over all frames, the emotional state can be determined for each frame.
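The three decision rules described for step S160 (highest probability, top n in descending order, and fixed threshold) can be sketched as follows. This is an illustrative sketch only: the function names and the representation of a frame's probabilities as a dictionary from category name to p^k_t are assumptions, not part of the described device.

```python
from typing import Dict, List


def most_likely_state(probs: Dict[str, float]) -> str:
    """Return the category with the highest emotional state probability."""
    return max(probs, key=probs.get)


def top_states(probs: Dict[str, float], n: int) -> List[str]:
    """Return the n categories with the highest probabilities, in descending order."""
    return sorted(probs, key=probs.get, reverse=True)[:n]


def states_above(probs: Dict[str, float], threshold: float) -> List[str]:
    """Return every category whose probability exceeds the threshold."""
    return [state for state, p in probs.items() if p > threshold]


def judge_frames(frame_probs: List[Dict[str, float]]) -> List[str]:
    """Apply the simplest rule (highest probability) to every frame."""
    return [most_likely_state(p) for p in frame_probs]
```

The threshold rule can return zero or several categories for a frame, whereas the highest-probability rule always returns exactly one.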

This concludes the detailed processing of step S160.

Through steps S110 to S160 above, the emotional state can be determined for each frame of any content that includes audio signal data.

Then, in step S170, the emotion and emotion level are estimated and output based on the emotional state probability calculated in step S150 and the emotional state determined in step S160.

An example of the processing for estimating the emotion and emotion level is described below. In the present embodiment, a set of utterance sections regarded as continuous speech is first merged into a single section. Hereinafter, such a set of utterance sections forming continuous speech is called an audio sub-paragraph.

An example of a method for generating audio sub-paragraphs is now described.

First, sections regarded as utterance sections are extracted. One way to do this exploits the periodicity of utterance sections in the speech waveform: sections with high autocorrelation function values are regarded as utterance sections and extracted.

In practice, a section whose autocorrelation function value is higher than a threshold is regarded as an utterance section. The threshold may be given in advance as a constant. Alternatively, after the autocorrelation function values have been calculated for the entire content, the threshold may be chosen so that the resulting proportion of speech time is close to a reference given by the typical ratio of speech time (or voiced time) to non-speech time (or unvoiced time).

In the present embodiment, utterance sections are made up of frames. That is, if the audio feature vector includes the fundamental frequency, a section in which this value is higher than a threshold may be regarded as an utterance section (that is, as utterance frames).

The above processing separates the content into utterance frames and non-utterance frames on a frame-by-frame basis.
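The adaptive choice of threshold described above can be sketched as follows. The sketch assumes that one score per frame (for example an autocorrelation peak value) has already been computed, and it expresses the reference as a target fraction of utterance frames rather than as a speech to non-speech ratio; both the function names and that parameterization are illustrative assumptions.

```python
from typing import List


def threshold_for_ratio(scores: List[float], target_speech_fraction: float) -> float:
    """Pick a threshold so that about the target fraction of frames exceed it.

    Ties between equal scores are not treated specially in this sketch.
    """
    if not scores:
        raise ValueError("at least one frame score is required")
    ordered = sorted(scores, reverse=True)
    n_speech = round(len(ordered) * target_speech_fraction)
    if n_speech <= 0:
        return ordered[0]  # no score is strictly above the maximum
    if n_speech >= len(ordered):
        return float("-inf")  # every frame counts as an utterance frame
    return ordered[n_speech]  # scores strictly above this value are utterance frames


def utterance_flags(scores: List[float], threshold: float) -> List[bool]:
    """Mark each frame as an utterance frame (True) or a non-utterance frame (False)."""
    return [score > threshold for score in scores]
```

With a constant threshold, only `utterance_flags` is needed; `threshold_for_ratio` corresponds to the variant in which the threshold is derived from the whole content.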

Next, among the extracted utterance frames, sets of utterance frames regarded as continuous speech are merged into single sections to generate audio sub-paragraphs.

An example of such a method is described.

Let the set of utterance frames F' in the content be {F'_1, F'_2, ..., F'_N} in order from the earliest time, where N is the total number of utterance frames.

Next, for utterance frames F'_i and F'_(i+1) that are adjacent on the time axis, the time interval between them is calculated, that is, the difference F'_(i+1),start − F'_i,end between the start time F'_(i+1),start of the next frame F'_(i+1) and the end time F'_i,end of F'_i.

If the result is smaller than a predetermined threshold, F'_i and F'_(i+1) are regarded as continuous utterance frames and are placed in the same audio sub-paragraph.

Repeating this over all frames groups the frames regarded as continuous speech into audio sub-paragraphs.
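The gap-threshold grouping just described can be sketched as follows. Representing an utterance frame as a (start, end) pair in seconds and the name `max_gap` for the predetermined threshold are assumptions made for illustration.

```python
from typing import List, Tuple

Frame = Tuple[float, float]  # (start time, end time) of one utterance frame


def group_into_subparagraphs(frames: List[Frame], max_gap: float) -> List[List[Frame]]:
    """Group utterance frames whose gap to the previous frame is below max_gap."""
    groups: List[List[Frame]] = []
    for frame in sorted(frames):  # process in order from the earliest start time
        previous_end = groups[-1][-1][1] if groups else None
        if previous_end is not None and frame[0] - previous_end < max_gap:
            groups[-1].append(frame)  # continuous speech: same audio sub-paragraph
        else:
            groups.append([frame])  # gap too large: start a new audio sub-paragraph
    return groups
```

Each returned group corresponds to one audio sub-paragraph and always contains at least one frame.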

Another example of a method for generating audio sub-paragraphs is as follows.

First, the start time and end time of every utterance frame are obtained over the entire content, and each pair is regarded as a two-dimensional vector.

A bottom-up clustering method is then applied to these vectors so that the ratio of speech time to non-speech time over the entire content becomes approximately equal to the typical ratio of speech time to non-speech time, and audio sub-paragraphs are generated. If the genre or the like of the target content is known in advance from the content attribute information, the target ratio may be changed accordingly.

Unlike the method that fixes a threshold in advance, this alternative absorbs differences in speech rate between contents and forms audio sub-paragraphs adaptively.
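One way to read this alternative is as agglomerative merging of neighbouring segments: the pair separated by the smallest gap is merged first, and merging stops once the merged segments cover the target share of the content. The sketch below follows that reading. The stopping rule, the use of a target fraction of total duration instead of a speech to non-speech ratio, and all names are assumptions; the description does not fix the clustering details.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start time, end time)


def cluster_to_target_fraction(
    frames: List[Segment], total_duration: float, target_speech_fraction: float
) -> List[Segment]:
    """Merge nearest neighbouring segments until they cover the target fraction.

    total_duration is assumed to be positive.
    """
    segments = [list(f) for f in sorted(frames)]

    def covered() -> float:
        # Time spanned by the current segments, including absorbed gaps.
        return sum(end - start for start, end in segments)

    while len(segments) > 1 and covered() / total_duration < target_speech_fraction:
        # Index of the adjacent pair separated by the smallest silent gap.
        i = min(range(len(segments) - 1),
                key=lambda j: segments[j + 1][0] - segments[j][1])
        segments[i][1] = segments[i + 1][1]  # absorb the gap and the next segment
        del segments[i + 1]
    return [(start, end) for start, end in segments]
```

Because the stopping point depends on the content's own gap distribution, fast and slow speakers end up with different effective gap thresholds, which is the adaptivity the description refers to.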

Through the above processing, every audio sub-paragraph contains either a single frame or a set of consecutive frames, and all utterance frames in the content are grouped into a number of audio sub-paragraphs.

Next, the emotion level, which represents the emotional state of each audio sub-paragraph, is calculated. An example of a method for calculating the emotion level is described below with reference to FIG. 11.

Let the set of audio sub-paragraphs S' in the content be {S_1, S_2, ..., S_NS} in order from the earliest time. For example, FIG. 11 shows the audio sub-paragraph S_(i-1) denoted by reference sign v1, the audio sub-paragraph S_i denoted by v2, and the audio sub-paragraph S_(i+1) denoted by v3.

Here, NS is the total number of audio sub-paragraphs. Let the utterance frames contained in an audio sub-paragraph S_i be {f_1, f_2, ..., f_NFi}, where NFi is the number of utterance frames contained in S_i.

Each utterance frame f_t has been given an emotional state probability p(e_t | {x_t}) by the emotional state probability calculation means. The emotion level p_Si(e = e^k), that is, the degree to which the emotional state e of the audio sub-paragraph S_i is e^k, may be calculated, for example, as the average of p(e_t = e^k | {x_t}) over the utterance frames in S_i.

Alternatively, it may be calculated as the maximum of these values over the utterance frames in S_i.
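The average and maximum can be written out as follows. These expressions are reconstructed from the definitions above rather than quoted; t is assumed to index the utterance frames f_1, ..., f_NFi of S_i.

```latex
% Average over the utterance frames of S_i
p_{S_i}(e = e^k) = \frac{1}{NF_i} \sum_{t=1}^{NF_i} p(e_t = e^k \mid \{x_t\})

% Maximum over the utterance frames of S_i
p_{S_i}(e = e^k) = \max_{t = 1, \ldots, NF_i} p(e_t = e^k \mid \{x_t\})
```

Both forms stay between 0 and 1 because each frame-level value is a probability, so emotion levels computed this way are directly comparable between audio sub-paragraphs.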

Various other methods are conceivable, such as applying a window within the audio sub-paragraph before calculating the emotion level. However, because emotion levels may be compared between audio sub-paragraphs, the emotion level preferably falls within a fixed range, for example between 0 and 1.

The emotion levels in FIG. 11 are those denoted by reference signs H1 to H3.

By performing the above calculation over all audio sub-paragraphs, emotion levels for all emotional states can be assigned to every audio sub-paragraph.

To extract emotions and emotion levels from text information obtained by speech recognition, speech recognition is first performed on the audio information, for example using the technique of Patent Document 3, to obtain text information. Words and the like corresponding to emotional expressions are then extracted from that text, for example using the technique of Patent Document 4.

Emotion levels can be extracted by holding and consulting a dictionary that associates words corresponding to emotional expressions with emotions and emotion levels, for example "word: 'laughable' → emotion: 'fun', emotion level: 0.8" or "word: 'infuriating' → emotion: 'anger', emotion level: 0.7". For a word not registered in the dictionary, the similarity between it and the registered words is calculated, for example by the method described in Patent Document 5, and the average emotion level of one or more highly similar registered words is used as the emotion level of the unregistered word.

For a word that is hard to assign to a single emotion category, such as "bizarre", the processing may assign it to multiple emotions and emotion levels, for example "word: 'bizarre' → emotion: scary, emotion level: 0.3 AND emotion: mysterious, emotion level: 0.3".

Relative expressions such as "very" can be quantified, for example using the technique of Patent Document 6, and the emotion level may be adjusted accordingly. The emotion level is calculated as a numerical value and is preferably kept within a fixed range bounded by an upper and a lower limit, for example 1 and 0.

The emotion level of the whole content, or of a part of it, may be defined by calculating the sum, average, maximum, or the like of the per-word emotion levels over the whole content or that part.
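A dictionary lookup of this kind can be sketched as follows. The three entries reuse the example words and values given above; the data layout, the function name, and the choice to skip unregistered words (instead of estimating them from similar registered words) are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# word -> list of (emotion, emotion level); a word may map to several emotions
EmotionDict = Dict[str, List[Tuple[str, float]]]

EXAMPLE_DICT: EmotionDict = {
    "laughable": [("fun", 0.8)],
    "infuriating": [("anger", 0.7)],
    "bizarre": [("scary", 0.3), ("mysterious", 0.3)],
}


def content_emotion_levels(
    words: Iterable[str],
    dictionary: EmotionDict,
    reduce: Callable[[List[float]], float] = max,
) -> Dict[str, float]:
    """Collect per-word emotion levels and reduce them to one level per emotion."""
    per_emotion: Dict[str, List[float]] = {}
    for word in words:
        # Unregistered words are skipped in this sketch.
        for emotion, level in dictionary.get(word, []):
            per_emotion.setdefault(emotion, []).append(level)
    return {emotion: reduce(levels) for emotion, levels in per_emotion.items()}
```

Passing `sum` or an averaging function as `reduce` gives the other aggregations mentioned above; note that a sum can exceed 1 and would need rescaling if a fixed range is required.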

The dictionary used here may be designed in advance by the designer, or may be built by learning so as to reflect users' actual subjective impressions.

The emotion categories extracted using audio information may use an index set such as "fun", "sad", "scary", "intense", "cool", "cute", "exciting", "passionate", "romantic", "violent", "gentle", "soothing", "warm", "cold", and "eerie".

The range over which emotions and emotion levels are extracted may be the entire content or partial contents obtained by dividing the content into predetermined sections.

Multiple emotion levels may be extracted for a single emotion. In that case, the sum, average, maximum, or the like of the emotion levels is calculated for each emotion and used as the emotion level of that emotion.

The emotions and emotion levels of the content estimated in this way are stored in the database 400 in association with the content identification information. The database 400 consists of a predetermined storage device and content information, including the content itself and its emotions and emotion levels. For relatively small-scale use, such as by an individual or within a home, the storage device may be an HDD in the user terminal, an HDD in an HDD recorder, an HDD in a server device connected to the user terminal via a LAN or the like, or a removable external storage medium such as a DVD. The examples of the embodiment of the present invention, however, deal with relatively large-scale use in which emotions are shared among multiple user terminals connected via a wide-area communication network such as the Internet. In this case, the storage device may be, for example, one attached to a server device connected via the wide-area communication network.

At the same time as the emotions and emotion levels of the published content are estimated, the information control unit 300 stores the identification information of the user terminal that published the content, the identification information of the published content, the publication date and time of the content, and so on in the user information storage unit F200 as user preference information, as shown in FIG. 12.

As for the identification information of the user terminal, on a content sharing site or the like where users register before use, identification information can be assigned to each user, and this may be used as the identification information of the user terminal.

Even when no user registration is performed, if the service is used over a network such as the Internet, identification information assigned to each terminal for communication, such as an IP (Internet Protocol) address, may be acquired wherever possible.

In the following, unless otherwise noted, these are not distinguished and are both called the identification information of the user terminal.

Next, in step S200, when a content distribution request is received from a user terminal, the identification information of that user terminal is associated with the identification information of the content, the distribution date and time of the content, and so on, and stored in the user information storage unit F200 as user preference information.

The content distribution requests entered at the user terminals 200a, 200b, ... are transmitted to the information control unit 300 by predetermined communication means. The information control unit 300 associates each of the user terminals 200a, 200b, ... with identification information such as an ID and identifies them as distinct terminals.

On receiving a content distribution request, the information control unit 300 searches the content stored in the database 400 for the requested content and distributes it to the user terminals 200a, 200b, .... At this time, as shown in FIG. 12, the identification information of the destination user terminal, the identification information of the distributed content, the distribution date and time, and so on are stored in the user information storage unit F200 as user preference information.

In addition to the distribution date and time, various time information such as the viewing start time and viewing end time may be stored, together with temporal changes in the preference information and/or operation information associated with that time information. This makes it possible to take dynamic changes in the user's emotions into account when user terminals are associated as described later.

Next, the user association unit F300 associates the user terminals with one another based on the stored user preference information. In the first example of the embodiment of the present invention, the degree to which the emotions and emotion levels of the content included in the user preference information resemble each other is calculated as a similarity, and the association is made based on this similarity.

As for the timing of the association, on a content sharing site, for example, it may be computed when at least a predetermined proportion of the user terminals accessing the site have received content at least once and a predetermined time has elapsed since the previous association. It may also be performed when a user terminal requests it.

The association described below is assumed to cover only user terminals that have published content, or received distributed content, at least once. That is, it covers only user terminals for which at least one item of user preference information, consisting of the terminal identification information, the content publication or distribution date and time, and the content identification information, is stored.

An example of the flow of the user association processing in step S300, executed by the user association unit F300, is described with reference to the flowchart of FIG. 13.

First, in step S310, a preference value serving as a reference value of the user preference information is calculated for each user terminal.

An example of a method for calculating the preference value is described below. For the user terminal 200a, for example, let r_i(e^k) denote the emotion level of the k-th category e^k of the i-th viewed content, and let N denote the number of contents viewed. Each r_i(e^k) is preferably normalized to, for example, a value between 0 and 1. The preference value R_a(e^k) of the user terminal 200a for the k-th emotion category e^k may then be calculated, for example, by combining the emotion levels r_i(e^k) of the N viewed contents using weights w_i.

Here, w_i is a weight. For example, it may be made larger the more recently the content was viewed. A forgetting model incorporating findings from psychology, such as the Ebbinghaus forgetting curve, may also be introduced, and the weights determined by relating it to the times at which past contents were viewed.

This weighting yields a preference value that takes changes in the user's preferences into account. For example, if the user of the user terminal 200a used to prefer content with a "fun" impression but has recently preferred content with a "tear-jerking" impression, the terminal can be associated with user terminals showing a similar recent tendency, which makes the association more timely and accurate.
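A recency-weighted preference value can be sketched as follows. Two things here are assumptions rather than the formula of the description: the weights follow a simple exponential decay with a hypothetical `half_life_days` parameter (a stand-in for a forgetting model), and the weighted sum is divided by the sum of the weights so that the result stays between 0 and 1.

```python
from typing import List


def recency_weights(ages_in_days: List[float], half_life_days: float = 30.0) -> List[float]:
    """Weight w_i for each viewed content; more recent viewings weigh more."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def preference_value(levels: List[float], weights: List[float]) -> float:
    """Weighted average of the emotion levels r_i(e^k) for one category e^k."""
    total = sum(weights)
    if total == 0:
        return 0.0  # no usable viewing history for this category
    return sum(w * r for w, r in zip(weights, levels)) / total
```

For a user who watched a "fun" content 30 days ago and a content with no "fun" component today, the "fun" preference value drops to one third with the default half-life, instead of one half under equal weights.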

Similarly, for the user terminals 200b, 200c, ..., the preference values R_b(e^k), R_c(e^k), ... are calculated for each emotion category e^k.

Next, in step S320, the similarity between user terminals is calculated based on the preference values of the user terminals 200a, 200b, .... The similarity indicates, for example, how closely the preference values of the other user terminals 200b, 200c, ... resemble those of the user terminal 200a. An example of a method for calculating the similarity between the user terminal 200a and the user terminal 200b is described below.

For example, the similarity fs(a, b) between the user terminal 200a and the user terminal 200b can be calculated by Equation 9, where K is the index set of emotion categories and #(K) is the number of its elements. The calculation is not limited to Equation 9. For example, its denominator may be replaced by the p-th order mean norm of the preference values of the users concerned.

Furthermore, past preference values may be stored and used to correct the newly calculated preference value. Taking past preference values into account makes it possible to capture both the stability of a user's preferences and large changes in them.

Similarly, the similarities between the user terminal 200a and the user terminals 200c, 200d, ... are calculated as fs(a, c), fs(a, d), ..., and for the user terminals 200b, 200c, ... the similarities fs(b, c), fs(b, d), ... to the other terminals are calculated in turn. In this way, the similarity between every pair of target user terminals can be calculated.
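The pairwise computation can be sketched as follows. The similarity used here, one minus the mean absolute difference of the preference values over the categories in K, is an illustrative stand-in chosen because it uses #(K) as a denominator; it should not be read as Equation 9. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Preferences = Dict[str, float]  # emotion category -> preference value in [0, 1]


def similarity(pref_a: Preferences, pref_b: Preferences, categories: List[str]) -> float:
    """Stand-in similarity: 1 minus the mean absolute difference over the categories."""
    diff = sum(abs(pref_a[k] - pref_b[k]) for k in categories)
    return 1.0 - diff / len(categories)


def all_pair_similarities(
    prefs: Dict[str, Preferences], categories: List[str]
) -> Dict[Tuple[str, str], float]:
    """Compute the similarity for every unordered pair of user terminals."""
    return {
        (a, b): similarity(prefs[a], prefs[b], categories)
        for a, b in combinations(sorted(prefs), 2)
    }
```

With preference values in the range 0 to 1, this stand-in also lies between 0 and 1, and the number of pairs grows quadratically with the number of terminals, which is one reason to restrict the computation to terminals that have stored preference information.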

The calculation of preference values and similarities described above need not use all of the user preference information. For example, the preference value may be calculated only from the content a user has published, or only from the content a user has viewed.

For example, a preference value calculated only from the content viewed on the user terminal 200a is an index of that terminal's viewing preferences (viewing preference value), and a preference value calculated only from the content published by the user terminal 200b is an index of the tendency of the content that terminal publishes (publishing preference value).

Using these values, if for example the similarity between the viewing preference value of the user terminal 200a and the publishing preference value of the user terminal 200b is high, the user terminal 200a can be notified, in the notification process described later, of information on the user terminal 200b as a user who publishes content matching its preferences.

In addition, the sum or the like of the similarities between one user and one or more other users gives an index (total similarity) of how many users that user resembles and how strongly.

For example, suppose that the total similarity between the publishing preference value of the user terminal 200a and the viewing preference values of the other user terminals 200b, 200c, ... is higher than the total similarities calculated in the same way for the other user terminals. On this basis, other users can be notified that the user terminal 200a publishes content strongly preferred by many users.

Finally, the notification unit F400 transmits results based on the similarity to the user terminals 200a, 200b, ....

The notification unit F400 notifies each user terminal of results based on the similarity calculated by the user association unit F300. Step S400, which executes this processing, is described below. As the notification method, a user terminal may be notified of information about user terminals with high similarity to it. On a content sharing site where users register before use, for example, the notified results may include user information such as the profile information that each user entered at registration and has permitted to be disclosed, and the preference values of the user terminal.

Other information that may be notified includes information about content distributed to the terminals of highly similar users (similar users), such as content attribute information, the content's emotions and emotion levels, thumbnails, and summaries.

The number of contents notified for each similar user terminal may be fixed in advance as a constant, for example the one or two most recently distributed, or may be varied according to the similarity ranking.

Content attribute information may include the content title, creator, keywords, outline, creation date and time, format, and attribute information of related content. Such information can easily be attached when the content follows an XML (eXtensible Markup Language) based description format such as MPEG-7.

An example of a method for creating summary content is described below.

Summary content is created in units of audio paragraphs, each of which is composed of audio sub-paragraphs. Audio paragraphs may be formed, for example, by setting a target duration such as 5 s or 10 s and merging audio sub-paragraphs so that each audio paragraph approaches it.

Alternatively, audio sub-paragraphs may be merged into an audio paragraph when the unvoiced interval between them is at or below a certain threshold.

Because emotion levels are assigned per audio sub-paragraph, each audio paragraph contains one or more emotion levels. For example, the audio paragraph VC1 in FIG. 14 contains the audio sub-paragraphs v11, v12, and v13, and the audio sub-paragraph v11 in turn contains the emotion levels H11 to H13.

In practice, the cumulative emotion level of each audio paragraph is calculated from these emotion levels, for example as the average, weighted average, or maximum of the emotion levels of its audio sub-paragraphs.

The summary content may be created by ranking the audio paragraphs in descending order of cumulative emotion level and extracting them from the top so that the compression ratio, that is, the ratio of the summary to the whole content, approaches a given target value.

The viewer may enter the desired target value. Furthermore, the viewer may enter a reference value for the cumulative emotion level, and the summary may be created by preferentially extracting the audio paragraphs that match it.
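The ranking and extraction step can be sketched as a greedy selection. Representing an audio paragraph as a (start, end, cumulative emotion level) tuple, treating the target compression ratio as an upper bound on summary length, and skipping paragraphs that do not fit are assumptions of this sketch.

```python
from typing import List, Tuple

Paragraph = Tuple[float, float, float]  # (start, end, cumulative emotion level)


def select_summary(
    paragraphs: List[Paragraph], total_duration: float, target_ratio: float
) -> List[Paragraph]:
    """Pick high-emotion audio paragraphs until the target summary length is reached."""
    budget = total_duration * target_ratio  # maximum summary duration
    chosen: List[Paragraph] = []
    used = 0.0
    # Rank in descending order of cumulative emotion level.
    for paragraph in sorted(paragraphs, key=lambda p: p[2], reverse=True):
        length = paragraph[1] - paragraph[0]
        if used + length <= budget:
            chosen.append(paragraph)
            used += length
    return sorted(chosen)  # restore chronological order for playback
```

A viewer-supplied reference value could be supported by ranking on the distance between each paragraph's cumulative emotion level and that value instead of on the level itself.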

要約コンテンツは、例えば、要約が提示されているモニタ２４１上の領域を、ユーザがポインティングデバイス２１２等を用いてポイントする等の操作によって再生可能なものとする。 The summary content can be reproduced by, for example, an operation in which the user points the area on the monitor 241 where the summary is presented using the pointing device 212 or the like.

サムネイルについては、前述の要約のうち所定の時間箇所、例えば、要約コンテンツの先頭の画像、中央の画像等を静止画として抽出し、提示する、といった方法がある。また、当該コンテンツ中、最も優勢に表れたと判定される感情において、感情的状態確率の最も高いフレームの画像をサムネイルとしてもよい。 As for thumbnails, there is a method of extracting and presenting a predetermined time portion of the above-mentioned summary, for example, the top image, the center image, etc. of the summary content as a still image. In addition, in the emotion determined to be most dominant in the content, an image of a frame having the highest emotional state probability may be used as a thumbnail.

Through notification of results based on this similarity, each user gains opportunities to communicate with users whose preferences and emotions are likely to be similar to their own, and obtains information on content judged to match their own preferences and emotions.

Furthermore, by storing the history of the actions that users take in response to these notifications, secondary user support that does not depend on similarity alone can also be provided.

For example, suppose that, as a result of the previous similarity calculation, user terminal 200b was introduced to user terminal 200a, and user terminal 200c was introduced to user terminal 200b, as user terminals with high similarity. If it has been recorded that user terminal 200a and user terminal 200b then communicated with each other more than a certain number of times, the preferences and emotions of user terminals 200a and 200b are judged to be even more similar, and user terminal 200c is newly introduced to user terminal 200a.
[Second Example of Embodiment]
The second example of the embodiment of the present invention concerns the case where, when user support is performed, emotions and emotion levels are extracted from the audio information and video information of the content, user preference information is obtained from this information, and users are associated with one another.

The processing flow and an example of the specific device configuration for the second example of the embodiment may be the same as in the first example, to the extent shown in the flowchart of FIG. 1 and the block diagram of FIG. 2, respectively.

The difference from the first example is that the emotion estimation unit F100 uses video information in addition to audio information to obtain user preference information. An example of the method of estimating emotions and emotion levels in step S100, executed by the emotion estimation unit F100, is described below. The rest of the processing flow and the specific device configuration may all be the same as in the first example of the embodiment.

The emotion estimation unit F100 calculates the emotions and emotion levels of the content based on at least one of prosodic information and text information obtained by speech recognition, taken from the audio information in the content, and at least one of color information, editing information, and motion vector information, taken from the video information. The audio information may be handled exactly as described in the first example of the embodiment.

Color information in the video information includes, for example: hue values calculated for each single pixel and/or each region composed of one or more pixels, and their temporal variation characteristics; luminance values calculated in the same way, and their temporal variation characteristics; the hue histogram and its temporal variation characteristics; and the luminance histogram and its temporal variation characteristics. This information can be extracted using, for example, the method described in Non-Patent Document 6.
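As a rough illustration of two of these features, the sketch below computes a normalized hue histogram for one frame and a simple measure of its change between frames. It is not the method of Non-Patent Document 6; the bin count, the L1 distance, and the function names are assumptions.

```python
import colorsys


def hue_histogram(pixels, bins=12):
    """Normalized hue histogram of one frame.

    pixels -- iterable of (r, g, b) tuples with components in 0..255
    """
    hist = [0] * bins
    count = 0
    for r, g, b in pixels:
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(hue * bins), bins - 1)] += 1
        count += 1
    return [c / count for c in hist] if count else hist


def histogram_change(prev_hist, curr_hist):
    """L1 distance between two histograms, used here as an assumed
    stand-in for the 'temporal variation characteristic'."""
    return sum(abs(a - b) for a, b in zip(prev_hist, curr_hist))
```

A luminance histogram could be built the same way from the third component returned by `colorsys.rgb_to_hsv`, or from a weighted sum of the RGB components.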

Editing information includes, for example, cut point frequency, shot length, and information on transition effects between shots such as wipes and dissolves. This information can be extracted using, for example, the method described in Non-Patent Document 6.

Motion vector information includes camera work information, information on the motion of the imaged subject, and the like. It may be extracted for the entire image and/or a specific region using, for example, the method described in Non-Patent Document 7.

One way to extract emotions and emotion levels from this information is to use a statistical model constructed by learning. The video information used for learning is assumed to consist of the extracted video information together with manually assigned labels corresponding to the emotion categories.

By associating each emotion with a statistical model, a statistical model for calculating the probability of each emotion is obtained. The statistical model may be, for example, a normal distribution, a mixture of normal distributions, a hidden Markov model, or a generalized state space model. Preferably, a time-series model that can represent temporal transitions of emotion, such as a hidden Markov model or a generalized state space model, is adopted.

Known methods for estimating the parameters of these video models include, for example, maximum likelihood estimation, the EM algorithm, and the variational Bayes method, and any of these can be used. For details, see Non-Patent Document 8, Non-Patent Document 9, and the like. The emotion level may be taken to be, for example, the calculated probability of each emotion, or the average or maximum of the probabilities within a predetermined section.
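Turning per-frame emotion probabilities into a per-section emotion level can be sketched as below. The input layout (one dictionary of probabilities per frame) and the function name `emotion_levels` are assumptions; the statistical model that produces the probabilities is outside the scope of the sketch.

```python
def emotion_levels(frame_probs, method="mean"):
    """Reduce per-frame emotion probabilities to one level per emotion.

    frame_probs -- list of {emotion: probability} dicts, one per frame
                   in the section (all dicts share the same keys)
    method      -- "mean" or "max"
    """
    if not frame_probs:
        return {}
    if method == "max":
        reduce = max
    else:
        reduce = lambda values: sum(values) / len(values)
    return {e: reduce([p[e] for p in frame_probs]) for e in frame_probs[0]}
```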

In addition, information obtained by recognizing the imaged subject, for example using the technique of Patent Document 7, may also be taken into account. In particular, when the imaged subject is a person or the like, facial expression information may be considered as well.

In this case, emotions and emotion levels can be extracted by holding a dictionary that, according to the category of the imaged subject, associates each subject that can be linked to an emotion with that emotion and an emotion level, and by referring to this dictionary. Example entries are: imaged subject "stuffed toy" → emotion "cute", emotion level 0.6; imaged subject "flower" → emotion "beautiful", emotion level 0.4. This dictionary may be built by learning so that it reflects users' actual subjective impressions.
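The dictionary lookup can be sketched as follows, using the two entries given above. The names `OBJECT_EMOTIONS` and `emotions_for_objects` are hypothetical, and keeping the larger level when two subjects map to the same emotion is an assumption.

```python
# Dictionary from imaged subject to (emotion, emotion level); the two
# entries are the examples given in the text.
OBJECT_EMOTIONS = {
    "stuffed toy": ("cute", 0.6),
    "flower": ("beautiful", 0.4),
}


def emotions_for_objects(recognized_objects, dictionary=OBJECT_EMOTIONS):
    """Look up emotions for the subjects recognized in the video.

    Subjects without a dictionary entry are ignored.
    """
    result = {}
    for obj in recognized_objects:
        if obj in dictionary:
            emotion, level = dictionary[obj]
            # Assumption: keep the larger level if an emotion repeats.
            result[emotion] = max(level, result.get(emotion, 0.0))
    return result
```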

Facial expression information can be extracted by, for example, detecting the region of the video judged to be a face using the method disclosed in Patent Document 8, and then applying the method disclosed in Patent Document 9.

As for the emotion level, the emotion level of the emotion corresponding to the recognition result can be increased by a predetermined value, and/or the emotion levels of the other emotions can be decreased by a predetermined value. For example, when the facial expression is recognized as "fun", the emotion level for the "fun" emotion is increased by a predetermined value.
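A minimal sketch of this adjustment is given below. The step size `delta`, the clamping to the range 0 to 1, and the function name `adjust_levels` are assumptions; the text only says that a predetermined value is added or subtracted.

```python
def adjust_levels(levels, recognized, delta=0.1):
    """Raise the level of the recognized emotion and lower the others.

    levels     -- {emotion: emotion level}
    recognized -- emotion reported by the facial expression recognizer
    delta      -- the 'predetermined value' (0.1 is an assumed default)
    """
    adjusted = {}
    for emotion, level in levels.items():
        change = delta if emotion == recognized else -delta
        # Assumption: keep levels within the range 0..1.
        adjusted[emotion] = min(1.0, max(0.0, level + change))
    return adjusted
```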

The emotion categories extracted from the video information may be the same as those extracted from the audio information. Emotions and emotion levels may be extracted for the entire content or for each partial content obtained by dividing the content into predetermined sections.

Multiple emotion levels may be extracted for a single emotion. In that case, the average, maximum, or a similar statistic of the emotion levels is calculated for each emotion and used as the emotion level of that emotion.

The emotions and emotion levels of the content estimated in this way are stored in the database 400 in association with the identification information of the content.

The processing from step S200 onward may be executed in the same way as in the first example of the embodiment.
[Third Example of Embodiment]
In the third example of the embodiment of the present invention, activity information of the user during content viewing is additionally acquired by sensing and stored together with the user preference information obtained and stored in the first and second examples.

The processing flow of the third example of the embodiment is shown in the flowchart of FIG. 15, and the specific device configuration is shown in the block diagram of FIG. 16.

The user support device 100 in the third example of the embodiment is configured by connecting at least the user terminals 200a, 200b, ..., the information control unit 300, and the database 400 via predetermined communication means so that they can communicate with one another.

FIG. 17 is a block diagram illustrating the configuration of each of the user terminals 200a, 200b, .... In addition to the configuration of the first and second examples, each user terminal includes a body sensor 250 as means for acquiring the user's motion information. Examples of the body sensor include an imaging device, a sound collection device, a pressure sensor, an acceleration sensor, a temperature sensor, a pulse sensor, a myoelectric sensor, a perspiration sensor, and an electroencephalogram sensor.

The information control unit 300 includes, as functional units, the emotion estimation unit F100, the user information storage unit F200, the user association unit F300, the notification unit F400, and the activity information recognition unit F500.

The emotion estimation unit F100 extracts the emotions and emotion levels of the content in advance, and when content is distributed to a user terminal, the identification information of the user terminal is associated with the identification information of the content and stored in the user information storage unit F200 as user preference information. All of this processing may be exactly the same as in the first or second example of the embodiment.

In this example, the activity information recognition unit F500 additionally recognizes the user's activity information during content viewing after the content has been distributed to the user terminal. The resulting activity information is stored in the user information storage unit F200 in addition to the user preference information obtained by analyzing the content.

An example of the processing flow of the third example of the embodiment is described in detail below with reference to the flowchart of FIG. 15.

Steps S100B and S200B may perform exactly the same processing as steps S100 and S200 of the first or second example of the embodiment.

In step S300B, after content has been distributed to each user terminal, the activity information recognition unit F500 acquires activity information during content viewing through the body sensor 250 in the user terminal 200. This activity information can consist of, for example, at least one of motion information, voice information, and biological information.

Motion information includes, for example, facial expression information, posture information, and information on the frequency and amount of operation of the keyboard 211, the pointing device 212, and the like.

Of these, facial expression information can be extracted numerically as a feature vector characterizing the facial expression by, for example, detecting the region judged to be a face in the image information acquired by the imaging device using the method disclosed in Patent Document 6, and then applying the method disclosed in Patent Document 7.

Posture information can be acquired by, for example, an imaging device, an acceleration sensor, or a myoelectric sensor.

When it is acquired by an imaging device, for example, the region of the person in the video is detected, and posture can be extracted as numerical values such as the inclination of that region from the vertical axis and the size of the region.

Hand movement and the like may also be measured. For example, when the user wears an acceleration sensor on the wrist, the amount of movement per unit time can be calculated by integration, and this can be extracted numerically as a velocity vector and its norm, the speed.
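The integration step can be sketched as below. This assumes evenly spaced samples, a sensor that starts at rest, and readings already compensated for gravity and drift; the function name `velocity_from_acceleration` is hypothetical.

```python
import math


def velocity_from_acceleration(samples, dt):
    """Integrate acceleration samples into a velocity vector and speed.

    samples -- list of (ax, ay, az) in m/s^2, one every dt seconds
    dt      -- sampling interval in seconds
    Uses simple rectangular integration from an initial velocity of zero.
    """
    vx = vy = vz = 0.0
    for ax, ay, az in samples:
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)  # norm of the velocity
    return (vx, vy, vz), speed
```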

Operation frequency and operation amount information can be obtained as numerical values such as the number of key presses and releases on the keyboard 211 per unit time, and the amount of pointer movement and number of clicks made with the pointing device 212.

Voice information includes prosodic information of the user's speech and text information obtained by speech recognition of the utterances. These can be quantified as the emotion level of each emotion by the same method that the emotion estimation unit F100 applies to content in step S100 of the first example of the embodiment.

Biological information includes, for example, body temperature information, pulse information, perspiration information, and electroencephalogram information.

These can be acquired as numerical values using various biological sensors such as a temperature sensor, a pulse sensor, a myoelectric sensor, a perspiration sensor, and an electroencephalogram sensor.

The activity information may be extracted, for example, at regular intervals from the time the content is distributed to the user terminal until playback of the content on the user terminal ends.

Alternatively, based on the playback position (time) of the distributed content and the emotion levels estimated in step S100B, extraction may be performed at the times when portions with relatively high emotion levels are being played.
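One way to choose such extraction times is sketched below. Interpreting "relatively high" as the top fraction of the emotion level timeline is an assumption, as are the function name `sampling_times` and its parameters.

```python
def sampling_times(emotion_timeline, top_fraction=0.25):
    """Choose playback times at which to sample activity information.

    emotion_timeline -- list of (playback_time_sec, emotion_level)
    top_fraction     -- share of positions treated as 'relatively high'
    Returns the chosen playback times in chronological order.
    """
    if not emotion_timeline:
        return []
    count = max(1, int(len(emotion_timeline) * top_fraction))
    ranked = sorted(emotion_timeline, key=lambda item: item[1], reverse=True)
    return sorted(t for t, _level in ranked[:count])
```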

Each piece of activity information obtained in this way is stored in the user information storage unit F200 in a form that appends it to the user preference information, as shown in FIG. 18.

Next, the user association unit F300 associates the user terminals with one another based on the stored user preference information. In the third example of the embodiment, the degree to which the user preference information is alike is calculated as a similarity, and the association is made based on this similarity.

The association described below covers only user terminals that have received content distribution at least once, that is, user terminals for which at least one distribution record consisting of the user terminal identification information, the content distribution date and time, and the content identification information is stored.

The description follows FIG. 19, a flowchart showing an example of the processing flow of step S400B, the user association processing executed by the user association unit F300.

First, in step S410B, as in the first and second examples of the embodiment, a preference value based on the emotions and emotion levels estimated by analyzing the content is calculated for each user terminal. This preference value is hereinafter called the content preference value. The content preference value may be calculated exactly as described for step S310 of the first example of the embodiment.

Next, in step S420B, a preference value based on the user preference information obtained from the activity information is calculated for each user terminal. This preference value is hereinafter called the activity preference value.

An example of a method for calculating the activity preference value is described below. For user terminal 200a, let q_i(m^k) denote the calculated value of the k-th activity information m^k during viewing of the i-th viewed content, and let N denote the number of contents viewed. Each q_i(m^k) is preferably normalized to a numerical range such as 0 to 1. The activity preference value Q_a(m^k) of user terminal 200a for the k-th activity information m^k can then be calculated, for example, as a weighted average of the values q_i(m^k) over the N viewed contents with weights v_i. The weight v_i may, for example, be made larger the more recently the content was viewed. A forgetting model incorporating findings from psychology, such as the Ebbinghaus forgetting curve, may also be introduced, with the weights determined by relating the model to the times at which past content was viewed.

Such weighting makes it possible to calculate preference values that take changes in the user's preferences into account, and thus to perform more timely association.
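A recency-weighted average of this kind can be sketched as below. The normalization by the sum of the weights, the exponential decay used as a simple stand-in for a forgetting model, the half-life of seven days, and both function names are assumptions made for illustration.

```python
import math


def recency_weight(elapsed_days, half_life_days=7.0):
    """Weight v_i that halves every half_life_days (assumed decay model)."""
    return math.exp(-math.log(2) * elapsed_days / half_life_days)


def activity_preference(values, elapsed_days, half_life_days=7.0):
    """Weighted average of q_i(m^k) for one activity type m^k.

    values       -- q_i(m^k) for the N viewed contents, each in 0..1
    elapsed_days -- days since each content was viewed, same order
    """
    if not values or len(values) != len(elapsed_days):
        raise ValueError("values and elapsed_days must be non-empty and aligned")
    weights = [recency_weight(d, half_life_days) for d in elapsed_days]
    return sum(v * q for v, q in zip(weights, values)) / sum(weights)
```

With a short half-life the value tracks the user's latest reactions; with a long one it approaches the plain average over all viewed contents.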

Similarly, for user terminals 200b, 200c, ..., the activity preference values for each activity information m^k are calculated as Q_b(m^k), Q_c(m^k), ....

The content preference values and activity preference values obtained by the above calculations are together treated as the preference values, and based on them, step S430B calculates the similarity between the user terminals 200a, 200b, .... The similarity expresses, for example, how similar the preference values of the other user terminals 200b, 200c, ... are to those of user terminal 200a. An example of a method for calculating the similarity between user terminal 200a and user terminal 200b is described below.

For example, letting K be the index set of the emotion categories, #(K) the number of its elements, L the index set of the activity information, and #(L) the number of its elements, the similarity fs(a, b) between user terminal 200a and user terminal 200b can be calculated by Equation 11.

The formula for calculating the similarity is not limited to Equation 11 above. For example, the denominator may be replaced by the sum of the p-th order mean norms of the content preference values and of the activity preference values of the users concerned.

Furthermore, past preference values may be stored and used to correct the calculated preference values. Taking past preference values into account makes it possible to capture both the constancy of a user's preferences and large changes in them. When calculating the similarity, the values may also be weighted based on criteria such as the reliability of each preference value. For example, when some user preferences or emotions are strongly reflected in the activity preference values, increasing the weight on the activity preference values allows the similarity to be calculated more accurately.

Similarly, the similarities between user terminal 200a and user terminals 200c, 200d, ... are calculated as fs(a, c), fs(a, d), ..., and for user terminals 200b, 200c, ..., the similarities to the other terminals fs(b, c), fs(b, d), ... are calculated in turn. In this way, the similarity between every pair of target user terminals can be calculated.
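The pairwise calculation can be sketched as below. The sketch does not reproduce Equation 11: it uses an assumed stand-in, one minus the mean absolute difference over the #(K) content preference values and #(L) activity preference values, with all values taken to lie between 0 and 1. The function names and data layout are hypothetical.

```python
from itertools import combinations


def similarity(content_a, activity_a, content_b, activity_b):
    """Assumed similarity between two terminals, in the range 0..1.

    content_*  -- {emotion category index: content preference value}
    activity_* -- {activity index: activity preference value}
    """
    keys_k = content_a.keys() & content_b.keys()
    keys_l = activity_a.keys() & activity_b.keys()
    n = len(keys_k) + len(keys_l)  # plays the role of #(K) + #(L)
    if n == 0:
        return 0.0
    diff = sum(abs(content_a[k] - content_b[k]) for k in keys_k)
    diff += sum(abs(activity_a[l] - activity_b[l]) for l in keys_l)
    return 1.0 - diff / n


def all_pair_similarities(users):
    """users -- {terminal_id: (content_prefs, activity_prefs)}.

    Returns {(id_a, id_b): similarity} with each unordered pair once.
    """
    return {
        (a, b): similarity(*users[a], *users[b])
        for a, b in combinations(sorted(users), 2)
    }
```

Per-value weights, as described for reliability-based weighting, could be added by multiplying each absolute difference by a weight and dividing by the sum of the weights instead of `n`.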

Step S500B in the third example of the embodiment may then be exactly the same as step S400 in the first or second example of the embodiment.

An example of the user support method according to the present invention has been described in detail above as an embodiment. Embodiments other than the examples shown here may also be adopted, with appropriate modifications, within the range of embodiments possible under the principle of the present invention.

Embodiments in which user support is performed according to the present invention are described below. In these embodiments, the similarity between the user terminals 200a, 200b, ..., which are connected via the Internet to a server device 500 that includes the information control unit 300 and the database 400 storing content, is calculated, and user support through shared emotions is provided on that basis.

As preprocessing, the emotion estimation unit F100 is assumed to hold, as a statistical model for calculating emotion probabilities, at least an audio model that calculates emotion probabilities from audio information.

Each user accesses a predetermined content sharing site provided by the server device 500, which includes the information control unit 300, and by logging in to this site is assigned unique identification information (a user ID).
[First Embodiment]
In this embodiment, the emotion estimation unit F100 extracts the emotions and emotion levels of the content from the audio information in the content, and user preference information is obtained and association is performed on that basis. FIG. 20 shows the specific device configuration of the first embodiment. The procedure is as follows.
[Procedure 1] When the server device 500 receives a request to publish content from any of the user terminals 200a, 200b, ..., it assigns the content an ID number as unique identification information. The emotion estimation unit F100 estimates the emotions and emotion levels of the content from the audio information in the content and stores them, together with the content identification information, on the HDD 304 in the information control unit 300. In addition, for the user who published the content, the identification information of the published content and the publication date and time are stored in the user information storage unit F200 under that user's ID.
[Procedure 2] Each user sends a distribution request for the content he or she wishes to view to the server device 500 through the user terminal, and the server device 500 distributes the content to the corresponding user terminal 200a, 200b, .... At this time, the server device 500 stores, for each user ID, the distribution date and time and the identification information of the distributed content in the user information storage unit F200 as user preference information.
[Procedure 3] When 75% of all user terminals have published or viewed content at least once and 15 minutes or more have passed since the previous similarity calculation, the user association unit F300 performs association for the users (user IDs) who have received content distribution at least once.
[Procedure 4] As a result of the similarity calculation, the notification unit F400 distributes to each user (user ID), for each of the three users (user IDs) with the highest similarity, user information such as the user name and the content information of the two contents that the similar user viewed most recently.
[Second Embodiment]
In this embodiment, the emotion estimation unit F100 extracts the emotions and emotion levels of the content from the audio information and video information in the content, and user preference information is obtained and association is performed on that basis. FIG. 20 shows the specific device configuration of the second embodiment. The procedure is as follows.
[Procedure 1] When the server device 500 receives a request to publish content from any of the user terminals 200a, 200b, ..., it assigns the content an ID number as unique identification information. The emotion estimation unit F100 then estimates the emotions and emotion levels of the content from the audio information and video information in the content and stores them, together with the content identification information, on the HDD 304 in the information control unit 300. In addition, for the user who published the content, the identification information of the published content, the emotion and emotion level information, and the publication date and time are stored in the user information storage unit F200 under that user's ID.

Subsequent [Procedure 2] to [Procedure 4] are exactly the same as in the first embodiment.
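How the stored emotions and emotion levels might feed the association can be sketched as follows. The claims describe a weighted average of emotion levels per emotion category as the preference value, and association by similarity of those values; the weights and the similarity measure are not specified in this section, so the uniform default weights and the cosine similarity below are assumptions, as are all function names. The three emotion categories are taken from the example in the figure descriptions (joy, sorrow, calm).

```python
import math

EMOTIONS = ("joy", "sorrow", "calm")  # illustrative emotion categories


def preference_values(records, weights=None):
    """Weighted average of emotion levels per emotion category.

    records: one dict per content item, mapping emotion -> emotion level.
    weights: one non-negative weight per record; uniform if omitted
             (assumption: the actual weighting scheme is not given here).
    """
    if not records:
        return {e: 0.0 for e in EMOTIONS}
    if weights is None:
        weights = [1.0] * len(records)
    if len(weights) != len(records):
        raise ValueError("weights and records must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        e: sum(w * r.get(e, 0.0) for w, r in zip(weights, records)) / total
        for e in EMOTIONS
    }


def cosine_similarity(p, q):
    """Cosine similarity between two preference vectors (assumed measure)."""
    dot = sum(p[e] * q[e] for e in EMOTIONS)
    norm_p = math.sqrt(sum(p[e] ** 2 for e in EMOTIONS))
    norm_q = math.sqrt(sum(q[e] ** 2 for e in EMOTIONS))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)
```

A weight could, for example, favor recently viewed content over older content, but that choice is likewise hypothetical.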
[Third embodiment]
In this embodiment, the emotion estimation unit F100 extracts the emotions and emotion levels of the content from the audio information and video information in the content, the activity information recognition unit F500 additionally extracts the user's activity information, and the user preference information is obtained from both and used for the association. FIG. 21 shows a specific device configuration of the third embodiment. The implementation procedure is as follows.
[Procedure 1] to [Procedure 2] are exactly the same as in the second embodiment.
[Procedure 4] Immediately after distributing content to a user terminal, the server device 500 transmits an operation information acquisition request to that user terminal. On receiving the request, the user terminal acquires the user's activity information with the body sensor 250 and transmits it to the server device 500 (activity information recognition unit F500). The activity information recognition unit F500 stores this activity information, together with the information stored in [Procedure 3], in association with the content identification information for each user ID.
[Procedure 5] When 75% of all user terminals have viewed content at least once and 15 minutes or more have passed since the previous similarity calculation, the user association unit F300 executes association for the users (user IDs) who have received content distribution at least once.
[Procedure 6] As a result of the similarity calculation, the notification unit F400 distributes and notifies, to each user (user ID), user information such as the user names of the top three users (user IDs) with the highest similarity, together with content information of the two items of content that each of those similar users viewed most recently.
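One way the activity information could enter the similarity calculation is sketched below. The specification states only that association is based on the preference values and on activity information; the reduction of sensor data to a single score in the range 0 to 1, the per-content comparison, the linear blend, and every name in the code are hypothetical and chosen purely for illustration.

```python
class ActivityStore:
    """Holds one activity score per (user ID, content ID) pair.

    The score is a hypothetical summary in [0, 1] of the activity
    information acquired when the user viewed the content.
    """

    def __init__(self):
        self._records = {}  # user_id -> {content_id: activity score}

    def record(self, user_id, content_id, activity_score):
        self._records.setdefault(user_id, {})[content_id] = activity_score

    def activity_similarity(self, user_a, user_b):
        """1 minus the mean absolute score difference over shared content.

        Returns 0.0 when the two users have no content in common.
        """
        a = self._records.get(user_a, {})
        b = self._records.get(user_b, {})
        shared = a.keys() & b.keys()
        if not shared:
            return 0.0
        mean_diff = sum(abs(a[c] - b[c]) for c in shared) / len(shared)
        return 1.0 - mean_diff


def combined_similarity(preference_sim, activity_sim, alpha=0.5):
    """Blend the two similarities; alpha is an assumed mixing parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * preference_sim + (1.0 - alpha) * activity_sim
```

With alpha set to 1.0 the blend reduces to the preference-only similarity of the second embodiment.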

It goes without saying that some or all of the functions of the means in the user support device of this embodiment can be implemented as a computer program and the present invention realized by executing that program on a computer, and that the procedures of the user support device of this embodiment can likewise be implemented as a computer program and executed by a computer. A program for realizing these functions on a computer can be recorded on a computer-readable recording medium, such as an FD (Floppy (registered trademark) Disk), MO (Magneto-Optical disk), ROM (Read Only Memory), memory card, CD (Compact Disk)-ROM, DVD (Digital Versatile Disk)-ROM, CD-R, CD-RW, HDD, or removable disk, and stored or distributed in that form. The program can also be provided through a network, such as the Internet or electronic mail.

A flowchart explaining the processing flow of the method in the first and second examples of the embodiment of the present invention.
A block diagram explaining the configuration of the device in the first and second examples of the embodiment of the present invention.
A block diagram explaining the configuration of the user terminal in the first and second examples of the embodiment of the present invention.
A diagram explaining information that associates content identification information with emotions and emotion levels in the embodiment of the present invention.
A flowchart explaining the emotion detection method in the embodiment of the present invention.
A flowchart showing the processing details of the step of constructing the statistical model in the embodiment of the present invention.
A conceptual diagram of a method of obtaining the temporal variation characteristics of the fundamental frequency.
A conceptual diagram showing the temporal behavior of audio feature amounts.
A conceptual diagram of a generalized state space model.
A diagram showing an example of emotional state probabilities when the emotional states of joy, sorrow, and calm are handled.
A conceptual diagram showing the relationship between audio sub-paragraphs and emotion levels.
A diagram explaining user preference information in the embodiment of the present invention.
A flowchart explaining step S300 in the embodiment of the present invention.
A conceptual diagram showing the relationship between audio paragraphs and emotion levels.
A flowchart explaining the processing flow of the method in the third example of the embodiment of the present invention.
A block diagram explaining the configuration of the device in the third example of the embodiment of the present invention.
A block diagram explaining the configuration of the user terminal in the third example of the embodiment of the present invention.
A diagram explaining user preference information including activity information in the embodiment of the present invention.
A flowchart explaining step S400B in the embodiment of the present invention.
A block diagram showing the specific configuration of the user support device 100 in the first and second embodiments of the present invention.
A block diagram showing the specific configuration of the user support device 100 in the third embodiment of the present invention.

Explanation of symbols

100 ... User support device
200, 200a, 200b ... User terminal
210 ... Input unit
220 ... Control unit
230 ... Storage unit
240 ... Display unit
300 ... Information control unit
400 ... Database
500 ... Server device
F100 ... Emotion estimation unit
F200 ... User information storage unit
F300 ... User association unit
F400 ... Notification unit
F500 ... Activity information recognition unit

Claims

A user support method for associating a plurality of users, comprising:
an estimation step in which estimation means extracts a feature amount from one or more of the audio information and video information of content and, based on the extracted feature amount and a pre-configured statistical model, estimates the emotion level of each of a plurality of emotions of the content;
a content attribute storage step in which content attribute storage means stores the identification information of the content in association with the emotions estimated for the content and their emotion levels;
a user information storage step in which user information storage means, for each user terminal, associates information identifying at least one of content published by the user terminal, content being viewed, and content already viewed with information identifying the user terminal, and stores the result as user preference information; and
a user association step in which user association means, based on the user preference information, obtains for each user terminal, for each of the plurality of emotions of the content associated with that user terminal, a weighted average of the emotion levels as a preference value of the emotion category, and associates one or more user terminals other than a first user terminal with the first user terminal based on the similarity of the obtained preference values of the user terminals.

The method of claim 1, wherein
the user information storage step acquires, for each user terminal, information identifying at least one of content published by the user terminal, content being viewed, and content already viewed, together with activity information consisting of at least one of operation information, audio information, and biometric information of the user of the user terminal at the time the content is published or viewed, and stores these as the user preference information of the user terminal, and
the user association step associates user terminals based on a similarity calculated from the preference values of the user terminals and the activity information.

3. The method according to claim 1, wherein the user information storing step further stores a time change of the user preference information.

A user support device that associates a plurality of users, comprising:
estimation means for extracting a feature amount from one or more of the audio information and video information of content and, based on the extracted feature amount and a pre-configured statistical model, estimating the emotion level of each of a plurality of emotions of the content;
content attribute storage means for storing the identification information of the content in association with the emotions estimated for the content and their emotion levels;
user information storage means for associating, for each user terminal, information identifying at least one of content published by the user terminal, content being viewed, and content already viewed with information identifying the user terminal, and storing the result as user preference information; and
user association means for obtaining, based on the user preference information, for each user terminal and for each of the plurality of emotions of the content associated with that user terminal, a weighted average of the emotion levels as a preference value of the emotion category, and associating one or more user terminals other than a first user terminal with the first user terminal based on the similarity of the obtained preference values of the user terminals.

The apparatus according to claim 4, wherein
the user information storage means acquires, for each user terminal, information identifying at least one of content published by the user terminal, content being viewed, and content already viewed, together with activity information consisting of at least one of operation information, audio information, and biometric information of the user of the user terminal at the time the content is published or viewed, and stores these as the user preference information of the user terminal, and
the user association means associates user terminals based on a similarity calculated from the preference values of the user terminals and the activity information.

6. The user support apparatus according to claim 4, wherein the user information storage unit further stores a time change of the user preference information.

A user support program for causing a computer to execute each step of the method according to any one of claims 1 to 3.