JP2007027918A - Real world communication management apparatus

Info

Publication number: JP2007027918A
Application number: JP2005203863A
Authority: JP
Inventors: Hirotaka Ueda; 宏高上田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2005-07-13
Filing date: 2005-07-13
Publication date: 2007-02-01

Abstract

PROBLEM TO BE SOLVED: To provide a method of managing a user's behavior in the real world associated with call origination.

SOLUTION: A communication recording management section 2804 stores and manages communication recordings. A user name acquisition means plays the role of uniformly converting the accounts recorded in the communication recording management section into a form easily understandable by a user. Means for handling various communications such as electronic mail, phone calls, and chats are connected to the communication recording management section, which handles the communication recordings in an integrated manner. The stored communication recordings are presented to the user via a user interface section 2805 and operated on by the user. When an operation such as a response to an existing communication recording is performed, the required communication utilizing means is called. This yields an environment in which the communication recording management means integrates the communication recordings into one.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a technique for monitoring and recording interpersonal communication in the real world.

Research on recording human behavior in the real world has a long history. For example, many systems have been devised that track a user's position information acquired with GPS (Global Positioning System) and record it on a map or the like. Such information is used later, for example when looking back on the day's activities. Where date and time matter, such as when writing a diary, it is convenient to be able to use time and position (spatiotemporal) information showing when and where the user was.

Non-Patent Document 1: Feiner, Steven, Blair MacIntyre, Marcus Haupt, and Eliot Solomon. Windows on the World: 2D Windows for 3D Augmented Reality. Proceedings of UIST '93 (Atlanta, GA, 3-5 November 1993), pp. 145-155.
Non-Patent Document 2: Mike Addlesee, Rupert Curwen, Steve Hodges, Joe Newman, Pete Steggles, Andy Ward, Andy Hopper. Implementing a Sentient Computing System. IEEE Computer Magazine, Vol. 34, No. 8, August 2001, pp. 50-56.
Patent Document 1: Japanese Patent No. 3104204
Patent Document 2: JP 2002-7358

From the standpoint of recording human behavior in the real world, it is desirable to acquire the context known as 5W1H: When, Where, Who, What, Why, and How. Frameworks for acquiring When and Where already exist, but no framework is currently available for the remaining elements. Context awareness methods, which aim to make computers recognize these contexts accurately, are being actively studied in the field of computer science.

As for Who in 5W1H, when the behavior of a particular user is tracked, the actor is always the monitored user, so it might seem that nothing needs to be acquired. In practice, however, information about the communication partner, that is, whom the user met, is just as important as the actor. Being able to acquire it would be highly significant for recording user behavior.

However, at present there is generally no reliable framework for acquiring and recording information about whom the user has met. It is true that information on whom the user happened to be with can be acquired. For example, in the field of augmented reality (AR), a system has been devised in which each user wears an identification tag with a wireless communication function around the neck. When the user meets someone, a computer recognizes the tag the other person is wearing and presents attribute information such as that person's name to the user (see Non-Patent Document 1 above). In the system of Non-Patent Document 1, when a user wearing an HMD (Head Mounted Display) looks at a target, various information such as the target's name is superimposed on the user's field of view.

A system has also been devised in which each user carries a wireless transmitter and a large number of position sensors placed in the environment continuously record the user's position (see Non-Patent Document 2 above). The system of Non-Patent Document 2, called ActiveBAT (preceded by an earlier system called Active Badge), is one of the classic studies in the ubiquitous computing field. In Japan as well, general contractors, for example, had built several similar systems before this.

Today, even without special equipment, each individual's position information acquired by the GPS (Global Positioning System) built into existing mobile phones, together with mobile phone base station information, can be aggregated. By picking out the people who shared the same time and space, it is possible to establish the fact that those people were present together.

However, the systems above cannot distinguish between a person who merely happened to be nearby, for example by passing the user on the street, and a person with whom the user actually communicated, for example by talking. In many cases, information about people who were merely present is not very important. Information about whom the user communicated with (for example, talked with) matters more. Knowing with whom, where, and about what the user talked during the day is useful for reflecting on one's own actions later. Conversely, knowing only who was nearby during the day merely produces a long list of people, and in many cases this is likely to be a meaningless enumeration of little use.

As methods of recording communication partners, many patent applications have been filed on, for example, the electronic exchange of business card data. Patent Document 1 above describes an electronic notebook in which business card data recorded in advance can be exchanged between notebooks with a simple operation. Patent Document 2 above describes a portable electronic device that exchanges business cards without contact. Several systems have also been devised that pass a weak current through the human body and use the body as a transmission cable, so that business card data is exchanged at the moment of a handshake. However, these techniques cover only formal communication that calls for a business card exchange and cannot cover everyday conversation. They also require a special device operation or physical contact to exchange the data, so they are limited in scope and too cumbersome to apply to everyday communication.

An object of the present invention is to provide a new method for confirming user behavior in the real world.

Whom the user communicated with: Communication in the real world can be said to consist of utterances. According to Mehrabian's communication model, nonverbal elements account for 93 percent of human communication. Nevertheless, communication that involves no utterance at all is rare. For example, even when two people share the same time and space, it is reasonable to regard a case with active conversation as involving more important communication than a case with no conversation at all. Therefore, by focusing on user utterances and automatically recording who communicated with whom, user behavior in the real world can be confirmed.

According to one aspect of the present invention, there are provided: communication record management means for managing utterance records, each consisting of a speaker identifier that identifies the speaker and the time at which the utterance occurred; utterance integration means for grouping temporally adjacent utterance records managed by the communication record management means into dialogue records; and communication record presenting means for presenting some or all of the dialogue records generated by the utterance integration means to the user, arranged in time series based on the dialogue occurrence time.
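The grouping of temporally adjacent utterance records into dialogue records can be sketched as follows. This is only an illustration under stated assumptions: the `Utterance` class, the function name, and the 30-second `max_gap` threshold are hypothetical and do not appear in the specification, which leaves the notion of "temporally adjacent" open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Utterance:
    speaker_id: str        # speaker identifier
    occurred_at: datetime  # utterance occurrence time


def group_into_dialogues(utterances, max_gap=timedelta(seconds=30)):
    """Group temporally adjacent utterances into dialogue records.

    max_gap is an assumed threshold; the source does not give a value.
    """
    dialogues = []
    for u in sorted(utterances, key=lambda x: x.occurred_at):
        if dialogues and u.occurred_at - dialogues[-1][-1].occurred_at <= max_gap:
            dialogues[-1].append(u)   # close in time: same dialogue
        else:
            dialogues.append([u])     # gap too large: start a new dialogue
    return dialogues


t0 = datetime(2005, 7, 13, 9, 0, 0)
records = [
    Utterance("A", t0),
    Utterance("D", t0 + timedelta(seconds=5)),
    Utterance("A", t0 + timedelta(seconds=12)),
    Utterance("C", t0 + timedelta(minutes=10)),
]
dialogues = group_into_dialogues(records)
# Present the dialogue records in time series by dialogue occurrence time.
for d in dialogues:
    print(d[0].occurred_at.time(), [u.speaker_id for u in d])
```

Here the first three utterances fall into one dialogue record and the utterance ten minutes later starts a second one.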

Preferably, the apparatus comprises utterance record presenting means for presenting some or all of the utterance records recorded in the utterance record management means to the user, arranged in time series. Utterance records are managed in the same way as messages in a mail client. A PC is generally used, but the utterance notification terminal (information terminal) may also include this function.

Because the utterance information arranged in time series may be filtered in some way (for example, viewing only today's utterance records, or only the utterance records of a particular person), the wording "some or all" is used.

Preferably, the apparatus has user name acquisition means for acquiring the user name specified by the speaker identifier that is included in an utterance record recorded in the communication record management means, and the communication record presenting means presents the user name corresponding to each utterance. Utterance records are not displayed as they are. They are processed in some way before display (the apparatus has means for replacing a user ID with a user name).

The communication record presenting means groups temporally adjacent utterances among the utterance records recorded in the communication record management means and presents them as a group. That is, individual utterances can be grouped and handled as a dialogue.

Preferably, the communication record presenting means presents, among the speakers in a grouped dialogue, those who spoke frequently within the group as the main speakers of that dialogue group. This is because a person who speaks frequently is considered to carry more weight as a participant. When assigning user names to a group, the user's own name may be left out for the time being.
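Picking the main speakers of a dialogue group by utterance frequency might look like the sketch below. The function name, the `own_id` parameter for leaving out the user's own name, and the `top_n` cutoff are illustrative assumptions, not terms from the specification.

```python
from collections import Counter


def main_speakers(speaker_ids, own_id=None, top_n=1):
    """Return the most frequent speakers in one dialogue group.

    own_id, if given, is excluded so the user's own name is left out.
    """
    counts = Counter(s for s in speaker_ids if s != own_id)
    return [speaker for speaker, _ in counts.most_common(top_n)]


# Speaker IDs of the utterances in one grouped dialogue, in time order.
dialogue = ["me", "D", "me", "D", "B", "D", "me", "me"]
print(main_speakers(dialogue, own_id="me"))
```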

In addition, by further acquiring receiver information indicating who heard each utterance, dialogue participants who do not speak can also be handled. By acquiring information about the place where a dialogue occurred, the important attribute of where the dialogue took place can be handled as well. Because human memory is often tied to places, information about where a dialogue occurred is very important when accessing dialogue records.

Preferably, a dialogue record includes utterance content that records what was said in each utterance making up the dialogue. This may be, for example, an audio recording of the dialogue or a summary of its content. Human utterances often carry emotion, and emotion can be estimated to some extent by measuring the tone and pitch of the voice. Preferably, an utterance record further includes estimated information on the speaker's mental state, and the dialogue record is presented together with that estimated information.

Preferably, the apparatus further comprises dialogue duration calculating means for calculating the duration of the grouped utterances, and the communication record presenting means presents the duration calculated by the dialogue duration calculating means in association with the speakers. Calculating the duration of a dialogue yields information such as how important the dialogue was. Because the duration is used only to estimate importance and the like, it does not need to be exact.

Preferably, the communication record presenting means filters the grouped dialogue records and presents only those whose duration, as calculated by the dialogue duration calculating means, is equal to or greater than a certain threshold. Longer dialogues are presumed to be more important, so filtering by dialogue length reveals the character of each dialogue.

Preferably, the communication record presenting means presents the records in ascending or descending order of the duration calculated by the dialogue duration calculating means. Longer dialogues are presumed to be more important, so sorting them in ascending or descending order lets the user see their importance at a glance.
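The duration calculation, threshold filtering, and sorting described above can be sketched together. This is a minimal sketch, assuming a dialogue is represented simply by the list of its utterance times and that duration is approximated as last time minus first time (the text notes the value need not be exact). The function names and the sample labels are hypothetical.

```python
from datetime import datetime, timedelta


def dialogue_duration(utterance_times):
    """Approximate a dialogue's duration as last utterance time minus first."""
    return max(utterance_times) - min(utterance_times)


def present(dialogues, min_duration=timedelta(0), descending=True):
    """Keep dialogues at or above the threshold, sorted by duration."""
    kept = {name: times for name, times in dialogues.items()
            if dialogue_duration(times) >= min_duration}
    return sorted(kept, key=lambda name: dialogue_duration(kept[name]),
                  reverse=descending)


t0 = datetime(2005, 7, 13, 9, 0)
dialogues = {
    "hallway chat": [t0, t0 + timedelta(seconds=20)],
    "project meeting": [t0 + timedelta(hours=1),
                        t0 + timedelta(hours=1, minutes=40)],
    "lunch": [t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=15)],
}
# Only dialogues of five minutes or more, longest first.
print(present(dialogues, min_duration=timedelta(minutes=5)))
```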

Preferably, the apparatus has user search means for retrieving, from the utterance records recorded by the communication record management means, only the utterance records or dialogue records that include a specific speaker designated by the user. Finding only the records of communication with a specific person is an extremely important function that is also commonplace in e-mail.

According to another aspect of the present invention, there is provided an information processing apparatus in which the communication record management means manages e-mail records in addition to utterance records, and the communication record presenting means presents some or all of the e-mail records and dialogue records managed by the communication record management means to the user, arranged in time series based on the e-mail transmission time and the dialogue occurrence time. There is also provided an information processing apparatus having e-mail management means for managing e-mail, and e-mail linkage means for associating an e-mail account managed by the e-mail management means with an ID that identifies a speaker recorded in the utterance record management means, in which the utterance record presenting means integrates the e-mail managed by the e-mail management means with the utterance records recorded in the utterance record management means and presents them to the user. Using the records in integration with e-mail makes it easy to go on to handle them alongside other communication tools such as telephone and chat.
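Presenting e-mail records and dialogue records on a single timeline amounts to merging the two lists by their timestamps. The sketch below assumes each record is a dictionary with hypothetical `kind`, `time`, and `summary` fields, where `time` holds the e-mail transmission time or the dialogue occurrence time. None of these field names come from the specification.

```python
from datetime import datetime

emails = [
    {"kind": "email", "time": datetime(2005, 7, 13, 9, 30), "summary": "Re: schedule"},
    {"kind": "email", "time": datetime(2005, 7, 13, 15, 0), "summary": "Minutes"},
]
dialogue_records = [
    {"kind": "dialogue", "time": datetime(2005, 7, 13, 10, 0), "summary": "with D"},
    {"kind": "dialogue", "time": datetime(2005, 7, 13, 8, 45), "summary": "with B"},
]


def unified_timeline(*record_lists):
    """Merge any number of record lists into one list ordered by time."""
    merged = [record for records in record_lists for record in records]
    return sorted(merged, key=lambda record: record["time"])


timeline = unified_timeline(emails, dialogue_records)
for record in timeline:
    print(record["time"].strftime("%H:%M"), record["kind"], record["summary"])
```

The same function would accept further lists, for example chat or telephone records, as long as each record carries a time.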

Preferably, the apparatus has account management means that associates and manages the information for identifying a speaker included in the utterance records managed by the communication record management means with the e-mail addresses included in the e-mail records managed by the communication record management means. When replying to an utterance record by e-mail, the e-mail address corresponding to the speaker of that utterance record is obtained from the account management means and set as the destination of the reply. This makes it possible to send an e-mail as a reply to an utterance record, so that utterances and e-mail, which are different means of communication, can be used in cooperation with each other.

Similarly, when replying to a dialogue by e-mail, the e-mail addresses of all participants in the dialogue, or of its main participants, are obtained and set as the destination. This makes it possible to send an e-mail as a reply to a dialogue record.

Preferably, an ID identifying the relevant utterance record or dialogue record is added to the In-Reply-To header of the reply e-mail, which indicates the identifier of the message being replied to, and to the References header, which indicates the identifiers of referenced messages. Such IDs are very useful when communication records are managed in integration with e-mail.
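A reply to an utterance or dialogue record could be assembled with Python's standard `email` package as sketched below. The account table, the addresses, and the record ID `dlg-0001@records.example.invalid` are hypothetical. The specification does not define the format of the record ID, so a Message-ID-like string in angle brackets is assumed here.

```python
from email.message import EmailMessage

# Hypothetical account table: speaker ID -> e-mail address.
accounts = {"D": "d@example.com", "B": "b@example.com"}


def build_reply(record_id, speaker_ids, accounts, sender, body):
    """Build an e-mail replying to an utterance or dialogue record."""
    msg = EmailMessage()
    msg["From"] = sender
    # Destination: addresses of the speakers involved in the record.
    msg["To"] = ", ".join(accounts[s] for s in speaker_ids)
    msg["Subject"] = "Re: our conversation"
    # Record ID placed where a normal reply carries the parent Message-ID.
    msg["In-Reply-To"] = f"<{record_id}>"
    msg["References"] = f"<{record_id}>"
    msg.set_content(body)
    return msg


reply = build_reply(
    record_id="dlg-0001@records.example.invalid",
    speaker_ids=["D", "B"],
    accounts=accounts,
    sender="me@example.com",
    body="Following up on what we discussed this morning.",
)
print(reply["To"])
print(reply["In-Reply-To"])
```

Because the record ID sits in the same headers a mail client uses for threading, a client that stores dialogue records alongside messages could thread the reply under the original dialogue.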

Extracting from the communication records only those that involve a particular person is a very important function in making use of the records. It is preferable to have user search means that retrieves, across e-mail records and utterance or dialogue records, only the records in which the same person is involved.
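A search for one person across both kinds of record needs the speaker ID and the e-mail address to be treated as the same identity. The sketch below assumes a hypothetical account table mapping speaker IDs to addresses and records that name their counterpart in a hypothetical `party` field.

```python
# Hypothetical account table: speaker ID -> e-mail address.
accounts = {"spk-7F3K": "d@example.com", "spk-99QZ": "b@example.com"}

records = [
    {"kind": "dialogue", "party": "spk-7F3K", "summary": "morning chat"},
    {"kind": "email", "party": "b@example.com", "summary": "Re: budget"},
    {"kind": "email", "party": "d@example.com", "summary": "Re: schedule"},
    {"kind": "utterance", "party": "spk-99QZ", "summary": "greeting"},
]


def search_by_person(records, speaker_id, accounts):
    """Return records involving one person, matched by speaker ID or address."""
    identities = {speaker_id}
    if speaker_id in accounts:
        identities.add(accounts[speaker_id])
    return [r for r in records if r["party"] in identities]


hits = search_by_person(records, "spk-7F3K", accounts)
print([r["summary"] for r in hits])
```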

Preferably, not only e-mail records but also the various other communications conducted over networks, such as chat records, telephone records, and videophone records, can be integrated and handled in the same way. As a corresponding measure on the sending side, it is preferable to control utterance notification so that it is sent only to terminals with known (registered) IDs. In this way, only the utterances of registered users are recorded, which protects privacy and at the same time reduces noise in the utterance records (records of conversations by unrelated people). A framework in which heterogeneous communication records can be used through a single interface without distinction is useful for supporting human communication.

According to the present invention, it is possible to record whom the user communicated with, so communication in the real world can be handled in the same way as communication in the virtual world. Reviewing the communication records afterwards therefore makes it possible to confirm user behavior in the real world.

Real-world communication management techniques according to the embodiments of the present invention are described below with reference to the drawings. In this specification, "information processing apparatus" is a broad concept that includes personal computers (PCs), mobile terminals (including mobile phones), and the like.

First, the real-world communication management technique according to the first embodiment of the present invention is described.

The information processing apparatus of the present embodiment, which is an utterance notification apparatus, comprises utterance detection means for detecting a user's utterance and utterance notification means for notifying other nearby terminals of the utterance when the utterance detection means detects it. The information notified by the utterance notification means includes a speaker identifier that identifies the speaker. Because the speaker identifier is included, the receiving apparatus can recognize by whom its user was spoken to.

Each user carries the information processing apparatus at all times. When the monitored user carrying the apparatus speaks, the utterance detection means detects the utterance. Various methods of actual utterance detection are conceivable. A typical, well-known one is to detect the monitored user's voice with a microphone. Separating out noise such as other people's utterances and environmental sound can be a problem, but with appropriate filtering, for example by volume level, the monitored user's utterances can be detected with high accuracy. Simply having the monitored user wear a lapel microphone or the like is enough to detect utterances with practical accuracy.

The accuracy of utterance detection can be raised further by combining this with camera images of the area around the mouth, detection of vocal cord vibration, voiceprint analysis, and the like. Alternatively, the user may notify the system of an utterance by, for example, pressing a button when speaking. The methods above are merely examples, and various known utterance detection methods can be used.

When the information processing apparatus detects an utterance, it notifies nearby users (in practice, the mobile terminals they carry) that the utterance has occurred. Ideally, the notification would go only to the user whom the speaker is addressing. However, it is usually difficult for a computer to identify the conversation partner, and the content of an utterance can in any case be overheard by anyone else who is present. In the present embodiment, therefore, the utterance is notified to all users near the speaking user. In this specification, "near" refers to the range within which the sound of the user's utterance can reach (and be recorded).

Various methods of utterance notification are conceivable. For example, short-range wireless communication can be used to send a message announcing the utterance to any terminal within communication range. Since the notification only needs to reach the conversation partner, a communication radius of about 1 to 2 m, the average distance in ordinary conversation, is appropriate. However, when the volume is raised by a loudspeaker or the like, the notification area should be wider, and for a whisper it should be narrower. The size of the notification area is therefore variable in relation to the utterance volume level. In practice, a mechanism may be provided that detects the utterance volume level and changes the size of the notification area accordingly.

FIG. 2 is a conceptual diagram of utterance notification in the present embodiment when an omnidirectional short-range wireless communication system is used. As shown in FIG. 2, consider a case in which six terminals, A to F, are present. Terminal A 201 and terminal D 204 are in conversation, as are terminal C 203 and terminal F 206. As the arrows indicate, terminals B 202, C 203, and F 206 are moving toward the top of FIG. 2. Suppose the user of terminal A 201 speaks. Terminal A 201 then notifies the terminals within its own utterance notification area of the utterance. Terminal B 202 and terminal D 204, which are inside the area, receive the notification and learn that terminal A 201 has spoken. Terminals C 203, E 205, and F 206 are outside the utterance notification area of terminal A 201 and therefore cannot recognize its utterance.

FIG. 3 illustrates the concept when a directional short-range wireless communication system is used for utterance notification in the present embodiment. In this case, only terminal D 304 lies within the utterance notification area of terminal A 301, and terminal B 302 is outside it. Given that people usually face their conversation partner when speaking, it is desirable for the utterance notification to be sent in the direction the user is facing. The utterance notification area AR2 is then fan-shaped, as shown in the figure.

Furthermore, if the utterance notification radius can be adjusted according to the voice level as described above, it becomes possible to notify only the communication partner while avoiding, as far as possible, notifying terminals that are not communication partners.
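The omnidirectional area of FIG. 2, the fan-shaped area of FIG. 3, and the volume-dependent radius can be expressed with one geometric test. The following is a rough sketch under stated assumptions: the linear mapping from volume to radius, every numeric constant, and the terminal coordinates are hypothetical, since the specification gives only the 1 to 2 m guideline and no figures' coordinates.

```python
import math


def notification_radius(volume_db, base_radius_m=1.5, base_db=60.0,
                        m_per_db=0.05, min_m=0.3, max_m=10.0):
    """Map utterance volume to a notification radius (assumed linear rule)."""
    radius = base_radius_m + (volume_db - base_db) * m_per_db
    return max(min_m, min(max_m, radius))


def in_notification_area(speaker_xy, facing_deg, target_xy, radius_m,
                         half_angle_deg=180.0):
    """True if target lies within the (possibly fan-shaped) notification area.

    half_angle_deg=180 gives the omnidirectional case; smaller values
    give a fan centered on the direction the speaker is facing.
    """
    dx = target_xy[0] - speaker_xy[0]
    dy = target_xy[1] - speaker_xy[1]
    if math.hypot(dx, dy) > radius_m:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angle between facing direction and target, in [-180, 180).
    diff = (bearing - facing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_angle_deg


a = (0.0, 0.0)  # speaking terminal A, facing along +x (0 degrees)
terminals = {"B": (-1.0, 0.2), "C": (3.0, 3.0), "D": (1.0, 0.0)}
radius = notification_radius(60.0)

omni = [n for n, xy in terminals.items()
        if in_notification_area(a, 0.0, xy, radius)]
fan = [n for n, xy in terminals.items()
       if in_notification_area(a, 0.0, xy, radius, half_angle_deg=45.0)]
print(omni, fan)
```

With these assumed coordinates, the omnidirectional test reaches B and D while the fan-shaped test reaches only D, mirroring the situations drawn in FIG. 2 and FIG. 3.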

When the user is speaking at a distance over a communication channel, the utterance notification containing the speaker identifier must also be sent to the receiver over a communication channel. The notification may be sent over the same channel that carries the utterance (FIG. 25A) or over a different channel (FIG. 25B). For example, when the speaker and receiver are talking over a telephone line, the speaker's utterance reaches the receiver through the telephone line, but the utterance notification may be sent either through the telephone line or through another network such as the Internet. If another channel can be used, the notification can be delivered to the receiver even when the channel carrying the utterance does not permit transmission of utterance notifications. With long-distance utterances over a communication channel, the notification may, depending on the state of the channel, arrive later than the utterance itself. To associate an utterance with its notification, it is therefore desirable for the notification to include information that identifies the utterance, such as the utterance occurrence time.
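On the receiving side, a late notification can be tied back to the utterance it describes by comparing the occurrence time it carries with the times at which utterances were heard. This sketch assumes the two terminals' clocks are roughly synchronized and uses a hypothetical 2-second tolerance. Neither assumption is stated in the specification.

```python
from datetime import datetime, timedelta


def match_notification(heard_times, notified_time,
                       tolerance=timedelta(seconds=2)):
    """Return the index of the heard utterance closest to notified_time.

    Returns None when nothing was heard within the tolerance.
    """
    if not heard_times:
        return None
    best = min(range(len(heard_times)),
               key=lambda i: abs(heard_times[i] - notified_time))
    if abs(heard_times[best] - notified_time) > tolerance:
        return None
    return best


t0 = datetime(2005, 7, 13, 9, 0, 0)
# Times at which the receiver's terminal heard speech over the voice channel.
heard = [t0, t0 + timedelta(seconds=10), t0 + timedelta(seconds=25)]
# A notification arriving late, but stamped with the occurrence time.
index = match_notification(heard, t0 + timedelta(seconds=10, milliseconds=300))
print(index)
```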

A speaker identifier (speaker ID) that identifies the speaker is attached to the packet that carries the utterance notification. Because it only has to identify the speaker, a name or nickname may be used, but from the standpoint of privacy protection a mechanical ID such as a string of alphanumeric characters is preferable. In that case each user can keep using the same speaker ID for a certain period, and can also obtain a new speaker ID at any time. Using the same speaker ID for a long time increases the likelihood that a third party can track the user's behavior, so it is preferable to refresh the ID periodically, like a cookie on the Internet. Of course, it is also possible to keep using a fixed speaker ID over a long period.
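The periodic refresh of the speaker ID described above can be sketched as follows. This is a minimal illustration and not part of the original disclosure: the class name `SpeakerIdentity`, the `S` prefix, the 8-hex-digit length, and the `max_age` parameter are all assumptions.

```python
import secrets
from datetime import datetime, timedelta


class SpeakerIdentity:
    """Holds a random speaker ID and replaces it once it is older than max_age."""

    def __init__(self, max_age: timedelta, now: datetime):
        self.max_age = max_age
        self._issue(now)

    def _issue(self, now: datetime) -> None:
        # Hypothetical format: "S" followed by 8 random hex digits.
        self.speaker_id = "S" + secrets.token_hex(4).upper()
        self.issued_at = now

    def current_id(self, now: datetime) -> str:
        # Refresh the ID when it has been in use for max_age or longer.
        if now - self.issued_at >= self.max_age:
            self._issue(now)
        return self.speaker_id
```

A user who prefers a fixed ID could simply pass a very large `max_age`.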

In this way, in synchronization with the speaker's utterance, the occurrence of the utterance event and the speaker ID indicating the speaker can be notified to the receiver listening to the utterance (more precisely, to the terminal the receiver owns). The receiver (the receiver's terminal) can therefore recognize by whom he or she is being spoken to.

Next, a real-world communication management technique according to the second embodiment of the present invention will be described. The information processing apparatus according to this embodiment has utterance identifier assigning means for assigning an utterance identifier (utterance ID) that identifies an utterance, and is characterized in that the information notified by the utterance notification means includes this utterance ID. The utterance ID may contain the speaker identifier (speaker ID) described above (a specific example is given later).

An utterance ID is assigned uniquely to each continuous utterance. A continuous utterance here means temporally continuous speech by the same speaker. One way to detect it is as follows: if utterance detection is interrupted and no new utterance is detected for at least a threshold time (for example, 0.5 seconds), the series of utterances is judged to be complete at that point. In this specification, unless a distinction is needed, a "continuous utterance" is written simply as an "utterance".
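The pause-threshold rule above can be sketched as follows. This is only an illustration under assumptions: the function name `segment_utterances` and the representation of detected speech as `(start, end)` pairs in seconds are hypothetical, and the 0.5-second default is the example value from the text.

```python
def segment_utterances(voiced, max_pause=0.5):
    """Merge detected speech intervals of ONE speaker into continuous utterances.

    voiced: list of (start, end) times in seconds, sorted by start time.
    A pause of max_pause seconds or more ends the current utterance.
    """
    utterances = []
    for start, end in voiced:
        if utterances and start - utterances[-1][1] < max_pause:
            # Pause is shorter than the threshold: extend the current utterance.
            prev_start, prev_end = utterances[-1]
            utterances[-1] = (prev_start, max(prev_end, end))
        else:
            # First interval, or the pause reached the threshold: new utterance.
            utterances.append((start, end))
    return utterances
```

Each resulting interval would then receive one utterance ID. Making `max_pause` very small splits speech into finer units.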

The notion of a continuous utterance is intended so that, for example, when someone says "Ohayou gozaimasu" ("Good morning"), an utterance ID is not assigned to each phoneme "O-HA-YO-U-GO-ZA-I-MA-SU" but a single utterance ID is assigned to the whole. An ID could of course be assigned to each phoneme individually (making the threshold in the example above extremely small produces finely divided utterance IDs), but to avoid complexity it is preferable to assign one utterance ID to the whole. In other words, an utterance ID is an ID assigned to each coherent unit of speech, and what constitutes a unit may be chosen freely.

A dialogue is established when one speaker makes an utterance and another speaker responds with an utterance. An utterance is typically about as long as "Good morning, Mr. So-and-so." A dialogue, by contrast, can be defined as a coherent sequence such as "Good morning, Mr. So-and-so." "Good morning. Nice weather again today, isn't it?" "Yes, it is. I'm glad the laundry dries well." and so on. While one utterance is in progress, another person's utterance may cut in and interrupt it, or the utterance may continue without interruption despite the other person cutting in. Particularly in a dialogue among many people, the participants' utterances often overlap in time. If an utterance making up the dialogue ends and no utterance resumes for at least a certain threshold time (for example, 3 seconds), the dialogue can be judged to be complete at that point.

FIG. 8 shows an example of a dialogue among users A, B, and C. The horizontal axis is time, and each rectangle labeled as an utterance on this axis indicates that the user was speaking during that period. An utterance ID is attached to each utterance. In this example the utterance ID contains the speaker ID, so the utterance ID shows which user made the utterance. For example, ID: A01 indicates an utterance by user A.

First, user A starts the utterance identified by utterance ID: A01. Δt1 after A01 ends, users B and C simultaneously start utterances B01 and C01, respectively. User C, noticing that the utterances have collided, breaks off, and only user B continues. Δt2 after B01 ends, user C starts again with utterance C02. The pauses Δt1, Δt2, and Δt3 do not exceed the threshold used to judge a break between dialogues; only Δt4 exceeds it. Utterances A01, B01, C01, C02, and A02 can therefore be treated as one group of utterances, that is, one dialogue. Utterance C03 and the utterances that follow it (not shown) are classified as a separate dialogue. In this example dialogues are delimited simply by the size of the time gap, but ideally they would be delimited on the basis of the utterance content. Breaks between dialogues can also be determined from changes in the participants, movement of their positions, and the like.
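The time-gap grouping of FIG. 8 can be sketched as follows. The function name `group_dialogues`, the tuple layout, and the start and end times in `FIG8_EXAMPLE` are hypothetical values chosen only so that the first three gaps stay below the example threshold of 3 seconds and the last one exceeds it; the figure itself gives no numeric times.

```python
def group_dialogues(utterances, max_gap=3.0):
    """Group utterances into dialogues by the silence between them.

    utterances: list of (utterance_id, start, end) in seconds, any speaker.
    A silence of max_gap seconds or more starts a new dialogue.
    """
    dialogues = []
    latest_end = None  # latest end time seen in the current dialogue
    for utt_id, start, end in sorted(utterances, key=lambda u: u[1]):
        if latest_end is None or start - latest_end >= max_gap:
            dialogues.append([])
            latest_end = end
        else:
            # Overlapping utterances are handled by tracking the latest end.
            latest_end = max(latest_end, end)
        dialogues[-1].append(utt_id)
    return dialogues


# Hypothetical timings loosely following FIG. 8.
FIG8_EXAMPLE = [
    ("A01", 0.0, 2.0),
    ("B01", 2.5, 5.0),   # starts 0.5 s after A01 ends
    ("C01", 2.5, 3.0),   # collides with B01 and is broken off
    ("C02", 5.8, 7.0),   # starts 0.8 s after B01 ends
    ("A02", 7.5, 9.0),
    ("C03", 14.0, 15.0), # starts 5 s after A02 ends: new dialogue
]
```

With these timings, `group_dialogues(FIG8_EXAMPLE)` places A01, B01, C01, C02, and A02 in one dialogue and C03 in a second one.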

Because an utterance ID is attached to each series of utterances, the ID can be used to refer to the utterance at a later date. For example, suppose a user says, "I promise to pay this year's bonus at ten months' salary," and utterance ID: KJ3002-200503241923 is attached to that utterance. Later, an employee angry that only two months' worth of bonus was actually paid can cite utterance ID: KJ3002-200503241923 and argue, "You said this the other day; what became of that promise?" Since the utterance ID was notified to nearby users when the utterance occurred, anyone who heard the utterance can identify it. If the utterance is recorded in association with its utterance ID, the recording can also be presented together with the ID when it is cited.
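One way an ID such as KJ3002-200503241923 could be composed is sketched below. The text states only that an utterance ID may contain the speaker ID; reading the trailing digits as a `YYYYMMDDhhmm` timestamp is an assumption made here for illustration, and both function names are hypothetical.

```python
from datetime import datetime


def make_utterance_id(speaker_id: str, started_at: datetime) -> str:
    # Assumed layout: "<speaker ID>-<YYYYMMDDhhmm>".
    return f"{speaker_id}-{started_at:%Y%m%d%H%M}"


def split_utterance_id(utterance_id: str):
    # Split on the last hyphen so speaker IDs containing hyphens still work.
    speaker_id, stamp = utterance_id.rsplit("-", 1)
    return speaker_id, datetime.strptime(stamp, "%Y%m%d%H%M")
```

A minute-resolution timestamp cannot distinguish two utterances by the same speaker within one minute, so a practical scheme would presumably need a finer timestamp or a counter.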

Next, a real-world communication management technique according to the third embodiment of the present invention will be described. In the information processing apparatus according to this embodiment, the utterance notification means issues utterance notifications intermittently while the utterance continues. Because an utterance continues over time, the utterance notification normally also needs to continue over time. FIG. 13 shows the timing of utterance continuation and utterance notification.

FIG. 13(A) shows an example in which an utterance notification is sent when the user starts speaking. Here the user with speaker ID: S125753 speaks from time t_s to time t_e, and utterance ID: 200503241923 is attached to the utterance. With this scheme, in which the notification is sent only once at the start of the utterance, no notification reaches a person who joins the conversation partway through the utterance (a person who moves into the utterance notification area). Such a late participant (that is, the participant's terminal) cannot detect the utterance. This is not a serious problem when each utterance ends quickly, but it can become one for long utterances. Although not shown, the notification may instead be sent only once at the end of the utterance. Compared with sending at the start, this has the advantage that the duration of the utterance can be sent along with it.

FIG. 13(B) shows an example in which an utterance start notification and an utterance end notification are sent at the start and the end of the utterance, respectively. In this example the end notification is sent at the moment the utterance ends, but if the end is judged by an interruption lasting a certain time or longer, a lag of at least that waiting time arises. A person who receives both the start and end notifications can be regarded as having heard the utterance (here, the utterance with utterance ID: 200503241923). A person who receives only one of them can be regarded as having heard part of it. As in the previous example, however, for a long utterance a person who joins partway through and leaves before it ends receives no notification.

FIG. 13(C) shows an example in which utterance notifications are sent at fixed intervals while the utterance continues. In this case a person who joins partway through also receives a notification. If a serial number is attached to each notification sent during an utterance, it is possible to know what proportion of the utterance was heard. FIG. 13(C) assumes that the end of the utterance is recognized when the notifications stop, but an utterance end notification may also be sent at the end, as shown in FIG. 13(D). This prevents utterances from going undetected, but increases the amount of communication. The notification interval t₀ needs to be adjusted appropriately in light of the average utterance duration. When the user is not moving, t₀ can be lengthened to reduce traffic. The speaker identifier also need not be included in every notification; it can be thinned out as appropriate (for example, sent only once every two notifications).
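The periodic scheme with serial numbers can be sketched as follows. The function names, the tuple layout, and the idea of computing the proportion from the full list of sent notifications are assumptions for illustration; the text does not specify how a receiver learns the total number of notifications.

```python
def notification_times(t_start, t_end, interval):
    """Return (serial_number, time) pairs: one notification at the start of
    the utterance, then one every `interval` while it continues."""
    notifications = []
    k = 0
    while t_start + k * interval < t_end:
        notifications.append((k, t_start + k * interval))
        k += 1
    return notifications


def fraction_heard(notifications, arrive, leave):
    """Proportion of notifications sent while a listener was within range."""
    heard = [seq for seq, t in notifications if arrive <= t <= leave]
    return len(heard) / len(notifications)
```

For a 10-second utterance with a 2-second interval, five notifications are sent; a listener present from second 3 to second 7 receives two of them. If the interval is longer than the utterance, only the single notification at the start is sent.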

Note that FIG. 13(A) can be regarded as the case of FIG. 13(C) in which t₀ is longer than the utterance duration, and FIG. 13(B) as the case of FIG. 13(D) in which t₀ is longer than the utterance duration.

As shown here, there are various patterns of utterance notification. It is preferable to choose among them adaptively, taking into account the length of utterances (when long utterances are common the schemes of FIG. 13(C) and FIG. 13(D) are desirable, whereas when short utterances are common those of FIG. 13(A) and FIG. 13(B) are desirable) and whether users move (if users do not move, the schemes of FIG. 13(A) and FIG. 13(B) cause no problem).

Next, a real-world communication management technique according to the fourth embodiment of the present invention will be described. The information processing apparatus according to this embodiment may further include utterance receiving means for receiving utterance notifications sent from other terminals, and communication record management means for recording the utterance notifications received by the utterance receiving means. Recording the utterance notifications makes it possible to record by whom (Who) the user was spoken to.

Furthermore, by also recording in the communication record management means the user's own utterance notifications issued by the utterance notification means, it is possible to record to whom (Who) the user spoke.

The information processing apparatus may further include reception notification means for returning a reception notification to the terminal that issued an utterance notification, and the communication record management means may record the reception notifications returned in response to the apparatus's own utterance notification in association with that utterance notification.

If, on receiving an utterance notification, the receiving terminal returns a reception notification containing its user ID (the same as the speaker ID; because it indicates the receiver here, it is also called the receiver ID), the speaker can also learn by whom the utterance was heard. However, since in an ordinary dialogue utterances are made by both parties, the dialogue partner can be identified to a certain extent from utterance notifications alone, without introducing reception notifications.

In the information processing apparatus, the utterance records managed by the communication record management means can be managed in association with the utterance time. Including time information allows the machine to recognize when (When) the user was spoken to (or spoke). Of course, even if the notified information contains no time information, the time at which the notification was received can be used instead.

In the information processing apparatus, the utterance records managed by the communication record management means can also be managed in association with the utterance location. Including location information allows the machine to recognize where (Where) the user was spoken to (or spoke). Even if the notified information contains no location information, a receiving terminal that has means for acquiring its current position can use that information. For example, the receiving terminal can acquire its position using GPS or the like, or by receiving position information from civil markers or similar devices placed in the environment.

The information processing apparatus may further include utterance content recording means for recording the content of utterances, and the communication record management means may record the utterance content captured by the utterance content recording means in association with the utterance notification. With a microphone that records the conversation and a camera that captures the scene, it is possible to accumulate what (What) was said to the user (or by the user) and how (How). This information can later be retrieved using the utterance ID or utterance time described above as a key, so that the content of a conversation can be reconfirmed or quoted.

However, the audio or video data need not be kept as is. If speech recognition or semantic analysis is available, the utterance content can be converted to text, summarized, or tagged with keywords. For example, merely extracting the keywords that appear frequently in an utterance is useful for later searches.

On the other hand, there is research suggesting that the amount of data one person processes in a lifetime is at most on the order of petabits (P denoting 10¹⁵), so compressing all utterance records and storing them on a computer is not impossible, and may well be commonplace in ten-odd years.

As of 2005, the largest HDD capacity in a personally owned PC is thought to be about 1 TB (= 1,000 GB). Since 1 PB is 1,000 TB, a 1,000-fold increase in storage capacity would allow every event of a lifetime to be recorded. HDD capacity is growing at a pace of a little under double each year, so if the same pace is maintained, the required capacity will be available in ten-odd years. With appropriate compression, everything could probably be recorded in a smaller capacity.
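The estimate above can be checked with a short calculation. The function name is hypothetical, and the yearly growth factor of 1.8 used as "a little under double" is an assumed value, not a figure from the text.

```python
import math


def years_to_reach(current_tb: float, target_tb: float, yearly_factor: float) -> float:
    """Years needed for capacity to grow from current_tb to target_tb
    when it is multiplied by yearly_factor every year."""
    return math.log(target_tb / current_tb) / math.log(yearly_factor)


# 1 TB -> 1,000 TB (1 PB)
years_if_doubling = years_to_reach(1, 1000, 2.0)     # just under 10 years
years_if_slower = years_to_reach(1, 1000, 1.8)       # between 11 and 12 years
```

Exact doubling gives slightly under ten years, and the assumed factor of 1.8 gives between eleven and twelve years, which is consistent with the "ten-odd years" stated in the text.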

FIG. 15 shows an example of a table for managing utterance records. For each utterance, an utterance ID 1501, a speaker 1502, a receiver 1503, an utterance time 1504, an utterance location 1505, an utterance duration 1506, and an audio file 1507 containing the utterance content are recorded in association with one another. The utterance ID 1501 is a unique ID mechanically assigned to the utterance. The speaker 1502 is known from the speaker ID contained in the utterance notification. In FIG. 15, for known IDs the corresponding user name is obtained by user name acquisition means (described later) and shown in place of the ID. The utterance time 1504 and utterance location 1505 may be given in the utterance notification as described above, or may be obtained by other means provided in the terminal. The utterance duration 1506 can be computed from the utterance start and end times. The audio file contains the sound recorded by a microphone that operates for the duration of the utterance. Details are given in the examples described later.
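One row of the table in FIG. 15 could be represented as in the sketch below. The field names, types, and sample values are assumptions chosen to mirror the columns 1501 to 1507; the actual table layout is defined only by the figure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class UtteranceRecord:
    utterance_id: str                                     # 1501
    speaker: str                                          # 1502: speaker ID or resolved user name
    receivers: List[str] = field(default_factory=list)    # 1503: known only via reception notifications
    start: Optional[datetime] = None                      # 1504: utterance time
    end: Optional[datetime] = None
    place: Optional[str] = None                           # 1505
    audio_file: Optional[str] = None                      # 1507

    @property
    def duration(self) -> Optional[timedelta]:            # 1506: derived from start and end
        if self.start is None or self.end is None:
            return None
        return self.end - self.start


# Hypothetical sample row.
sample = UtteranceRecord(
    utterance_id="KJ3002-200503241923",
    speaker="KJ3002",
    receivers=["S125753"],
    start=datetime(2005, 3, 24, 19, 23, 0),
    end=datetime(2005, 3, 24, 19, 23, 12),
    place="meeting room",
    audio_file="KJ3002-200503241923.wav",
)
```

Keeping the duration as a derived property avoids storing a value that could disagree with the start and end times.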

The receiver can be obtained here because the reception notification described above has been introduced; without reception notifications, the receiver cannot be obtained directly. It is nevertheless possible to infer the receiver by learning, from utterance notifications, who speaks while the dialogue continues.

Next, a real-world communication management technique according to the fifth embodiment of the present invention will be described. The information processing apparatus according to this embodiment has electronic signature means for applying an electronic signature to utterance records. That is, to give a quoted utterance evidentiary value, an electronic signature by the participants, akin to a jointly affixed seal, can be added.

FIG. 14 shows an example of electronic signatures attached to a record of a dialogue between user A 1411 and user B 1415. The dialogue record 1403 contains the dialogue time, the dialogue location, the dialogue participants, the utterance IDs included in the dialogue, and so on. A dialogue record is formed by integrating the utterance records that make up the dialogue. The dialogue start time is the start time of the first utterance in the dialogue, and the dialogue end time is the end time of the last utterance. The dialogue location integrates the locations of the constituent utterances; if the participants were moving while they talked, it is expressed as a movement trajectory. The participants are the speakers of the constituent utterances, but a user who only listens without speaking at all may also be treated as a participant (recognizing such users requires the reception notification described above).

In addition to the above information, a binary file recording the utterance content may be attached to the dialogue record. If the recording is included, who made each utterance can be identified with very high accuracy. Attaching it is strongly recommended whenever evidentiary value is required.

The participants confirm with one another that the dialogue took place at that time and place, and then apply their electronic signatures. A dialogue record bearing the electronic signatures of both user A 1411 and user B 1415, held by both of them, serves as a simple proof.

For greater rigor, it is desirable to have a time stamp applied by a third-party time-stamping authority (TSA). The dialogue record bearing the electronic signatures of the participants (user A 1411 and user B 1415 in FIG. 14) is sent to the TSA 1417. The TSA 1417 attaches a time stamp (1422) to the hash value of the dialogue record (1421), applies a digital signature (1423), and thereby issues an electronic receipt (1424). The time stamp desirably conforms to, for example, RFC 3161. The dialogue record with the electronic receipt attached is returned to the users, and each of them keeps it (the original of the receipt is kept by the TSA 1417 (1425)). This makes it possible to prove when (When), where (Where), by whom (Who), what (What), and how (How) the dialogue took place.
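The hashing step (1421) can be sketched as follows. This covers only the computation of a digest over a dialogue record; the signatures and the time-stamp request itself are not shown. The function name, the use of SHA-256, the JSON serialization, and the record fields are all assumptions for illustration.

```python
import hashlib
import json


def record_digest(dialogue_record: dict) -> str:
    """SHA-256 digest of a dialogue record, as hex.

    Keys are sorted so that the same record always serializes to the same
    bytes regardless of the order in which its fields were added.
    """
    canonical = json.dumps(dialogue_record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical dialogue record.
record = {
    "dialogue_id": "TALK01",
    "participants": ["A", "B"],
    "utterance_ids": ["A01", "B01"],
    "start": "2005-03-24T19:23:00",
    "end": "2005-03-24T19:23:40",
}
digest = record_digest(record)
```

Because any change to the record changes the digest, a time stamp issued over the digest binds the stamped time to exactly this content.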

By providing the information processing apparatus with utterance record output means for outputting the utterance records held by the communication record management means to the outside, a large amount of information can be accumulated externally even if the apparatus itself lacks sufficient storage. The output may go over a network such as the Internet or the telephone network, or via a storage medium such as a USB memory or memory card that uses nonvolatile memory.

The description so far has basically concerned an utterance notification apparatus that detects the utterances of a human speaker and issues utterance notifications. According to the fifth embodiment of the present invention, a content playback apparatus that plays back content containing utterances can issue utterance notifications in the same way as the configuration described above.

For example, when a performer is speaking in content played on a television or radio, the television or radio can issue the utterance notification corresponding to that utterance. The speaker identification needed for the notification may be performed by the content playback apparatus, but it is preferable that information identifying the speaker be embedded in the content in advance and that the apparatus issue notifications based on it. The utterance notification itself, in a predetermined format, may be embedded in the content or distributed in the same manner as the content. Whether the utterance comes from a real person or from a content playback apparatus such as a television, the same kind of utterance notification is sent, so the resulting records can be handled without distinction.

Furthermore, if the content playback apparatus receives the reception notifications sent in response to its utterance notifications and forwards them to the speaker or the content creator, the speaker or content creator can learn, even from a remote location, by whom the utterance was heard. The forwarding address is written in the reception notification, and it is desirable that this address be the one specified in the original utterance notification.

The apparatus here merely forwards the reception notification and does nothing special. Because the reception notification contains the destination address, it could equally be sent by other means (such as the Internet) without passing through the content playback apparatus. It is nevertheless preferable to give the function of forwarding reception notifications to a nearby content playback apparatus that is connected to a network and able to communicate. The characteristics of the utterance notifications and reception notifications exchanged are the same as in the configuration described above, so their description is omitted.

Next, a real-world communication management technique according to the sixth embodiment of the present invention will be described. The information processing apparatus according to this embodiment has communication record management means for managing utterance records, each containing a speaker identifier that identifies the speaker and the time at which the utterance occurred, and utterance integration means for grouping utterance records managed by the communication record management means that are close in time into a dialogue record. It is characterized in that part or all of the dialogue records generated by the utterance integration means are presented to the user arranged in chronological order according to the time at which each dialogue occurred.

In the information processing apparatus of this embodiment, the communication record presentation means groups temporally adjacent utterances in the utterance records held by the communication record management means into dialogues. In interpersonal communication by conversation, a dialogue consists of an exchange of several utterances. A dialogue has one or more topics, and the participants take turns speaking on those topics. To avoid unnecessary complexity, it is desirable to manage utterance records not by individual utterance but by dialogue, which is a semantically and temporally coherent unit.

FIG. 19 is a passage from Natsume Soseki's "Kokoro", and FIG. 22 is the corresponding dialogue record (information not found in the original work, such as utterance times, has been added as an example for the purpose of explanation). The quoted passage contains a total of nine utterances by "I" and "Sensei". A dialogue is formed from a combination of utterances. If dialogues are recognized simply by splitting wherever the interval between utterances exceeds a threshold (for example, 5 seconds), the following three dialogues are recognized:
1) Dialogue 1 (dialogue ID: TALK01): "I" utterance 1 + Sensei utterance 1
2) Dialogue 2 (dialogue ID: TALK02): "I" utterance 2 + Sensei utterance 2 + "I" utterance 3 + Sensei utterance 3
3) Dialogue 3 (dialogue ID: TALK03): Sensei utterance 4 + "I" utterance 4 + Sensei utterance 5
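The threshold-based splitting described above can be sketched as follows. This is only an illustration under stated assumptions: the `Utterance` class, the function name `group_into_dialogues`, the IDs `U1` to `U9`, and all timings are made up for the example and do not come from FIG. 22.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utterance_id: str  # hypothetical ID, not taken from FIG. 22
    speaker: str
    start: float       # seconds from an arbitrary origin
    end: float


def group_into_dialogues(utterances, max_gap=5.0):
    """Start a new dialogue whenever the silence since the previous
    utterance exceeds max_gap seconds (5 s is the example threshold)."""
    dialogues = []
    for u in sorted(utterances, key=lambda x: x.start):
        if dialogues and u.start - dialogues[-1][-1].end <= max_gap:
            dialogues[-1].append(u)   # still the same dialogue
        else:
            dialogues.append([u])     # gap too long: open a new dialogue
    return dialogues


# Made-up timings that reproduce the 2 + 4 + 3 split of the example.
sample = [
    Utterance("U1", "I", 0, 2), Utterance("U2", "Sensei", 3, 4),
    Utterance("U3", "I", 20, 22), Utterance("U4", "Sensei", 23, 24),
    Utterance("U5", "I", 26, 28), Utterance("U6", "Sensei", 30, 33),
    Utterance("U7", "Sensei", 60, 62), Utterance("U8", "I", 64, 66),
    Utterance("U9", "Sensei", 68, 70),
]
print([len(d) for d in group_into_dialogues(sample)])
```

The gap is measured from the end of the previous utterance to the start of the next one; measuring start to start would be an equally plausible reading of "interval between utterances".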

FIG. 23(A) shows the utterance records of FIG. 22 expressed in dialogue units. Each entry in FIG. 23(A) contains a dialogue ID 2301, main participants 2302, a dialogue time 2303, a dialogue location 2304, a dialogue duration 2305, and audio 2306.

The dialogue duration 2305 is calculated by subtracting the start time of the first utterance in the dialogue identified by a dialogue ID 2301 from the end time of the last utterance in that dialogue. Alternatively, the durations of the individual utterances may be summed.

Looking back later at a single utterance in FIG. 22 (for example, Sensei's utterance "No", ID: T002) does not convey its meaning, but referring to Dialogue 2, which contains ID: T002 (ID: TALK02 in FIG. 23(A)), makes the content of the conversation understandable. Likewise, when consulting utterance records, the content is easier to grasp when utterances are grouped and displayed by dialogue as in FIG. 23 than when a large number of individual utterances are listed as in FIG. 22. There are many possible ways to delimit dialogues. Reference numerals 2311 to 2316 in FIG. 23(B) show an example in which the dialogues first classified as in FIG. 23(A) are further combined into large dialogues when they have the same participants and are temporally continuous (the interval between dialogues is less than 5 minutes and no other dialogue comes between them).
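The merging of dialogues into large dialogues might be sketched as below. The `Dialogue` class, the function name `merge_into_large_dialogues`, and the sample values are assumptions made for illustration; only the two conditions (same participants, interval under 5 minutes with no other dialogue in between) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Dialogue:
    dialogue_id: str         # hypothetical IDs, not those of FIG. 23
    participants: frozenset
    start: float             # seconds
    end: float


def merge_into_large_dialogues(dialogues, max_gap=300.0):
    """Chain a dialogue onto the immediately preceding one when the
    participants match and the gap is under max_gap seconds. Comparing
    only with the immediate predecessor enforces the condition that no
    other dialogue lies in between."""
    merged = []
    for d in sorted(dialogues, key=lambda x: x.start):
        prev = merged[-1][-1] if merged else None
        if (prev is not None
                and prev.participants == d.participants
                and d.start - prev.end < max_gap):
            merged[-1].append(d)
        else:
            merged.append([d])
    return merged


pair = frozenset({"I", "Sensei"})
sample_dialogues = [
    Dialogue("D1", pair, 0, 10),
    Dialogue("D2", pair, 100, 120),
    Dialogue("D3", frozenset({"I", "Friend"}), 200, 210),
    Dialogue("D4", pair, 250, 260),
]
large = merge_into_large_dialogues(sample_dialogues)
print([[d.dialogue_id for d in group] for group in large])
```

In this made-up data, D4 is not merged with D2 even though the participants match, because D3 comes between them.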

Arranging the dialogue records in chronological order further lets the user browse them intuitively. Dividing the display by day, week, or month is also effective. The information processing apparatus may also have user name acquisition means for obtaining the user name corresponding to the speaker ID that identifies the speaker in each utterance record held by the communication record management means, so that the communication record presentation means can present the user name for each utterance.

As noted above, an ID is not necessarily written so that a person can easily identify the individual behind it, so a known ID should be displayed as a name or nickname the user can understand. In an environment without privacy concerns (such as an office where all members know one another), a full name can be used as the ID provided there is no risk of duplication. In a public environment, however, broadcasting one's name to strangers nearby is a serious problem. It is therefore desirable for the ID to be a mechanical string of alphanumeric characters from which others cannot identify the individual.

Consequently, when two people know each other and both agree, each needs to tell the other the correspondence between their ID and their name. The information processing apparatus holds an ID conversion table consisting of IDs 1801 and names 1802, as shown in FIG. 18, and converts known (registered) IDs into names before presenting them to the user. When IDs are changed periodically to protect privacy, an expiration date 1803 is set for each ID. IDs past their expiration date should be deleted and reacquired. The IDs of public figures, however, are publicly known even to people who are not acquainted with them, so in some cases they need not be deleted. ID information is preferably handed over together with the exchange of electronic business cards. Alternatively, the ID information of all members of a department can be compiled and distributed to everyone.
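A minimal sketch of such an ID conversion table follows, assuming a simple in-memory mapping. The class name `IdTable`, its methods, the name "Yamada", and the choice to treat an ID as valid through its expiration date are illustrative assumptions; the ID and date are the example values from the business card of FIG. 7.

```python
from datetime import date


class IdTable:
    """ID -> (name, expiration date) table, loosely modelled on FIG. 18."""

    def __init__(self):
        self._entries = {}

    def register(self, speaker_id, name, expires=None):
        # expires=None models an ID that is never purged (e.g. a public figure).
        self._entries[speaker_id] = (name, expires)

    def purge_expired(self, today):
        # Assumption: an ID remains valid on its expiration date itself.
        self._entries = {
            sid: (name, exp)
            for sid, (name, exp) in self._entries.items()
            if exp is None or exp >= today
        }

    def display_name(self, speaker_id, today):
        """Return the registered name, or the raw ID if unknown or expired."""
        entry = self._entries.get(speaker_id)
        if entry is None:
            return speaker_id
        name, expires = entry
        if expires is not None and expires < today:
            return speaker_id
        return name


table = IdTable()
table.register("AC2658703552", "Yamada", date(2005, 3, 12))  # name is made up
print(table.display_name("AC2658703552", date(2005, 3, 1)))
```

Falling back to the raw ID for unknown or expired entries keeps the presentation layer working even before a fresh ID has been reacquired.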

FIG. 7 is an example of electronic business card data written in vCard, the most widely used electronic business card format. In addition to ordinary business card data such as name and affiliation, the fifth line gives the ID currently in use (AC2658703552) and its expiration date (20050312). Here only the single ID currently in use is communicated, but if a list of IDs used in the past is also handed over, the two parties may discover that they once met somewhere unexpected. A conversation such as the following might then take place:
"Oh, you're the person I asked for directions when I traveled to Australia ten years ago, aren't you?"
"Well, so I was. It's a small world, isn't it? (laughs)"

When ID information must be trustworthy, authentication by a third-party organization is required. Each user distributes the ID they use with a certificate from a certification authority attached. Because the certificate guarantees that the ID was in fact used by that user, an utterance record bearing that ID is guaranteed with high reliability to be an utterance by that user.

For privacy protection, on the other hand, the receiving side can record only utterance notifications bearing known (registered) IDs and refuse those bearing unknown IDs, or the sending side can send utterance notifications only to terminals with known (registered) IDs. Either way, only the utterances of registered users are recorded. This protects privacy and at the same time reduces noise in the utterance records (records of conversations by unrelated third parties).

Next, a real-world communication management technique according to the seventh embodiment of the present invention is described. In the information processing apparatus of this embodiment, the communication record presentation means can present, as the representative speakers of a dialogue, those participants who spoke most within the group. When utterance records are managed in dialogue units, a label that characterizes each dialogue makes access easier. The names of the people who spoke actively in a dialogue can serve as one indicator of its character. Each dialogue is given spatio-temporal information and expressed, for example, as follows:
"March 24, 2005, 12:23 to 12:34 @ Cafeteria: Hieda, Shudo, Ueda (and 4 others)"

Representative speakers can be determined by computing each speaker's total speaking time in the dialogue and taking the required number of user names in descending order of speaking time. Even when fewer than the required number have been selected, a speaker may be left out if their total speaking time (or its share of the dialogue duration) falls below a specified value. The number of utterances may also be taken into account as a parameter in addition to speaking time.
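One way to sketch this selection is shown below. The function name `representative_speakers`, the tuple layout, and the sample timings are assumptions for illustration; the share is taken relative to the dialogue duration (first start to last end), which is one of the two variants the text mentions.

```python
from collections import defaultdict


def representative_speakers(utterances, top_n=3, min_share=0.0):
    """utterances: iterable of (speaker, start, end) tuples in seconds.
    Returns up to top_n speakers ordered by total speaking time,
    dropping anyone whose share of the dialogue duration is below
    min_share."""
    utterances = list(utterances)
    if not utterances:
        return []
    totals = defaultdict(float)
    for speaker, start, end in utterances:
        totals[speaker] += end - start
    duration = (max(end for _, _, end in utterances)
                - min(start for _, start, _ in utterances))
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [s for s, t in ranked[:top_n]
            if duration > 0 and t / duration >= min_share]


# Made-up timings for a lunch conversation.
lunch = [
    ("Hieda", 0, 30), ("Shudo", 30, 50), ("Ueda", 50, 60),
    ("Hieda", 60, 80), ("Guest", 80, 82),
]
print(representative_speakers(lunch))
print(representative_speakers(lunch, min_share=0.2))
```

Weighting by utterance count, as the text also suggests, could be added by combining the totals with a per-speaker counter.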

Next, a real-world communication management technique according to the eighth embodiment of the present invention is described. In the information processing apparatus of this embodiment, the utterance records managed by the communication record management means further include information indicating the listeners of each utterance, and the dialogue records generated by the utterance integration means are presented together with those listeners.

In communication records it is often important to know who heard an utterance. If only speakers are recorded, users who were present at a dialogue but did not speak are missing from the record. Recording such users as well avoids this problem.

Next, a real-world communication management technique according to the ninth embodiment of the present invention is described. In the information processing apparatus of this embodiment, the utterance records managed by the communication record management means further include information indicating where each utterance occurred, and the dialogue records generated by the utterance integration means are presented together with the location of the dialogue.

In communication records it is often important to know where an utterance (dialogue) took place. Acquiring and recording the location of each utterance makes it possible to present where the dialogue occurred. When the participants converse while moving, the locations may be spread over a wide area. In that case the transition between representative locations can be shown, such as location A → location B → location C.

Next, a real-world communication management technique according to the tenth embodiment of the present invention is described. In the information processing apparatus of this embodiment, the utterance records managed by the communication record management means further include the utterance content, and the dialogue records generated by the utterance integration means are presented together with the dialogue content composed of that utterance content. The utterance content can be captured by recording it with a microphone, and storing the recording in association with the dialogue record allows the content to be recalled later.

Next, a real-world communication management technique according to the eleventh embodiment of the present invention is described. In the information processing apparatus of this embodiment, the utterance records managed by the communication record management means further include estimated information about the speaker's mental state, and the dialogue records generated by the utterance integration means are presented together with that estimated information.

Utterances carry a variety of human emotions. The rate at which humans recognize emotion in ordinary natural speech is said to be 60% to 70%, and machines can now recognize emotion at nearly the same level by quantitatively analyzing speaking rate, volume, tone of voice, and the like. For example, the mean fundamental frequency of the speech is high for happiness and surprise and low for disgust. A certain relationship is also known to exist between the standard deviation of the fundamental frequency and the type of emotion. Thus if the apparatus of FIG. 4 is, for example, a mobile phone, its voice recording microphone can be used to analyze speaking rate, volume, tone of voice, and the like quantitatively. The analysis result can also be recorded in association with the utterance content.

Emotion recognition is performed when the utterance content is recorded, and the emotion information for each utterance is presented along with the utterance record, giving the user a cue when browsing. For example, utterances charged with anger are shown in red and utterances charged with sadness in blue. This is a strong cue that helps the user find the utterance record they need. When browsing the records at a later date, the user may also discover that someone felt an emotion during an utterance that the listener did not expect. Because emotion information of this kind is hard to exploit in text messages such as e-mail, it is a major advantage of utterance records.

Similarly, automatically extracting keywords associated with an utterance keeps the record concise. The keywords can also be used as search keys for later retrieval.

Next, a real-world communication management technique according to the twelfth embodiment of the present invention is described. The information processing apparatus of this embodiment further comprises dialogue duration calculation means for estimating the duration of a group of utterances, and the communication record presentation means also presents the duration estimated by the dialogue duration calculation means.

How long each dialogue lasted is an important indicator for managing dialogues, because a dialogue that lasted longer is more likely to be important. A dialogue that ends after a word or two, such as a greeting, often carries almost no meaning, whereas a dialogue that lasted, say, an hour is likely to be meaningful in its own way whatever its content.

The dialogue duration can be calculated as the time from the start of the first utterance in the dialogue to the end of the last utterance in the dialogue. The start and end times may each contain some error, but this is not a problem because high accuracy is not required. For ease of recognition, it is also effective to present dialogues in different display colors or font sizes according to roughly three classes: under 1 minute, 1 minute to under 10 minutes, and 10 minutes or more. The communication record presentation means may also present only those groups of utterances whose duration, as calculated by the dialogue duration calculation means, is at or above a specified threshold. People hold a large number of brief dialogues every day. If all of them are presented, there is too much information and finding the desired dialogue becomes difficult. Filtering out dialogues lasting one minute or less, for example, can be expected to reduce the number of remaining dialogues considerably. The filtering threshold should be changeable by the user.
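The duration calculation, the three-class banding, and the threshold filter can be sketched together as follows. The function names, the band labels, and the sample data are assumptions for illustration; only the 1-minute and 10-minute boundaries come from the text.

```python
def dialogue_duration(utterances):
    """utterances: list of (start, end) pairs in seconds.
    Time from the earliest start to the latest end."""
    return max(e for _, e in utterances) - min(s for s, _ in utterances)


def duration_band(seconds):
    """Classify into the three presentation classes; the labels are
    made up and would map to display colors or font sizes."""
    if seconds < 60:
        return "short"    # under 1 minute
    if seconds < 600:
        return "medium"   # 1 minute to under 10 minutes
    return "long"         # 10 minutes or more


def select_dialogues(dialogues, min_seconds=60.0, longest_first=True):
    """dialogues: dict of dialogue ID -> list of (start, end).
    Keep dialogues at or above the threshold, ordered by duration."""
    durations = {d: dialogue_duration(u) for d, u in dialogues.items()}
    kept = [d for d, t in durations.items() if t >= min_seconds]
    return sorted(kept, key=lambda d: durations[d], reverse=longest_first)


# Made-up data: a greeting, a meeting, and a short chat.
sample_log = {
    "A": [(0, 5), (7, 12)],
    "B": [(100, 400), (410, 800)],
    "C": [(1000, 1090)],
}
print(select_dialogues(sample_log))
```

Passing `min_seconds` as a parameter reflects the point that the user should be able to change the filtering threshold.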

The communication record presentation means can also present the groups of utterances in ascending or descending order of the duration calculated by the dialogue duration calculation means. If dialogues are presented longest first, for example, the dialogue records can be expected to appear roughly in order of importance.

Next, a real-world communication management technique according to the thirteenth embodiment of the present invention is described. In the information processing apparatus of this embodiment, the communication record presentation means has user search means for retrieving, from the utterance records held by the communication record management means, only those records that involve a specific speaker designated by the user. Dialogue takes place between people, so the user may want to extract only the dialogues held with a particular person. Providing the user search means makes it easy to extract and recall dialogues with a specific person or within a group.

Next, a real-world communication management technique according to the fourteenth embodiment of the present invention is described. In the information processing terminal of this embodiment, the communication record management means manages e-mail records in addition to utterance records, and the communication record presentation means presents some or all of the e-mail records and dialogue records managed by the communication record management means to the user in chronological order, based on the e-mail transmission time and the dialogue occurrence time.

Users manage the mail they send and receive every day with mail management software, that is, a mailer. A typical mailer can list mail chronologically and search by attributes such as sender. If the mail managed by such a mailer and the utterance records can be handled together on the same screen, the user no longer needs to deal with the two kinds of communication, mail and real-world dialogue, through separate tools. Since handling them separately is sometimes more convenient, it is desirable to offer both a mode that separates the display by means of communication and a mode that integrates it.

In terms of operation as well, it is desirable that both kinds of communication record can be handled without distinction, for example by creating a new e-mail as a reply to an utterance record and quoting that record. This requires, at a minimum, account management means that associates the e-mail account information in the mailer's address book with the speaker IDs in the utterance records (reception records). For example, to reply to a speaker by e-mail, the e-mail address corresponding to that speaker must be obtained and a message created with that address as the destination. This is solved simply by holding a table that maps speaker identifiers to e-mail addresses.

When replying by e-mail to a dialogue, on the other hand, the e-mail addresses must be obtained from the user IDs of the dialogue participants (speaker IDs and listener IDs) and a message created with those addresses as destinations. Because a dialogue usually has many participants, the user may need to remove unnecessary addresses. This selection is easier if the e-mail addresses are sorted, for example, by speaking time (the same processing as the extraction of representative speakers described above can be used).

When quoting or replying, writing down the ID associated with a communication record such as an utterance or a mail is very useful for computer processing. E-mail has a mechanism called Message-ID, defined in RFC2822. When replying to a mail, the Message-ID of the original mail is written in the In-Reply-To header, so the mailer that receives the reply knows it is a response to the mail with that Message-ID and can, for example, display the two as a thread. Likewise, when part of a mail is quoted, the Message-ID of the quoted mail is written in the References header, so the receiving mailer knows which mail is being quoted and can associate the two mails.

The processing standardized for e-mail above becomes possible for the utterances described in the embodiments of the present invention by using utterance IDs. Because the utterance ID is known to everyone who heard the utterance, presenting the utterance ID at a later date identifies which utterance is meant. In a dialogue, an utterance that follows another is often a response to it. If information equivalent to the In-Reply-To header (the utterance ID of the original utterance) can be included in the utterance notification, the connections between utterances can be seen when the utterance records are consulted later.

Likewise, when a past utterance is recalled and commented on, it is useful to include information equivalent to the References header (the utterance ID of the quoted utterance) in the utterance notification. When a user speaks immediately after playing back the utterance content contained in an utterance record stored in the information processing apparatus described in the embodiments of the present invention, information equivalent to the References header can be included in the utterance notification.

The method described here handles utterance records in utterance units, but it goes without saying that the same processing is possible in dialogue units by using dialogue IDs.

A typical e-mail also has the elements of sender address, recipient address, date sent, subject, and body. These correspond to the speaker identifier, listener ID, utterance time, topic, and utterance content of the utterances addressed by the present invention. All of these except the topic, which corresponds to the e-mail subject, can be acquired and managed with the configuration described above. The topic can also be estimated to some extent by applying speech recognition, keyword extraction, and the like to the utterance content, and in many cases its absence causes no problem.

As explained above, e-mail and utterances share common elements, so once the storage formats of their records are unified or converted, e-mail and utterances can be handled through a single interface. It helps to imagine utterance records listed among the mail in the mailer one currently uses (described later). The communication records can be displayed separately, for instance on different pages, or mixed together on a single page.

To list the utterances and e-mails of a given user, both that user's speaker identifier and e-mail address must be known, and the machine should recognize that these accounts belong to the same person. For example, a search for the communication records of "Mr. Yamada" uses his e-mail address yamada@dokodemo.net and his speaker identifier Y094897870 as search keys, but the machine should retrieve the appropriate accounts and perform the search without the user having to know each account.
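A person-based search over mixed records might look like the sketch below. The `Record` class, the `ACCOUNTS` table layout, the function name `search_by_person`, and the sample records are assumptions for illustration; the address and speaker identifier are the example values from the text.

```python
from dataclasses import dataclass


@dataclass
class Record:
    kind: str       # "mail" or "utterance"
    account: str    # e-mail address or speaker identifier
    timestamp: str  # ISO 8601 text, which sorts chronologically
    summary: str


# Hypothetical table tying one person to all of their accounts.
ACCOUNTS = {"Yamada": {"yamada@dokodemo.net", "Y094897870"}}


def search_by_person(records, person, accounts=ACCOUNTS):
    """Return every record sent or spoken by the person, oldest first,
    without the caller needing to know the individual accounts."""
    keys = accounts.get(person, set())
    hits = [r for r in records if r.account in keys]
    return sorted(hits, key=lambda r: r.timestamp)


# Made-up records mixing both communication channels.
records = [
    Record("mail", "yamada@dokodemo.net", "2005-03-24T15:00:00", "Minutes"),
    Record("utterance", "Y094897870", "2005-03-24T12:23:00", "Lunch talk"),
    Record("utterance", "X000000001", "2005-03-24T12:24:00", "Someone else"),
]
for r in search_by_person(records, "Yamada"):
    print(r.timestamp, r.kind, r.summary)
```

Sorting the merged hits by timestamp also gives the single chronological view in which mail and utterances appear side by side.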

The user does not need to be aware of the differences between means of communication. For example, it is easy to compose an e-mail as a reply to an utterance record (in which case the utterance ID of that record is written in the In-Reply-To header). When a reply to an utterance with a given utterance ID is sent by e-mail, the corresponding e-mail address should be assigned automatically. As described above, using e-mail records and utterance records together requires a framework for converting between the respective accounts and treating them as the same. This can be solved by holding a correspondence table.
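Replying to an utterance by e-mail could be sketched with Python's standard `email` package as below. The correspondence table `SPEAKER_TO_EMAIL`, the function name, the sender address, and especially the way the utterance ID is wrapped into a Message-ID-like token (`<...@utterance.invalid>`) are assumptions for illustration; the text does not specify how an utterance ID should be encoded in the header.

```python
from email.message import EmailMessage

# Hypothetical correspondence table: speaker identifier -> e-mail address.
SPEAKER_TO_EMAIL = {"Y094897870": "yamada@dokodemo.net"}


def reply_to_utterance(utterance_id, speaker_id, sender, body,
                       table=SPEAKER_TO_EMAIL):
    """Build an e-mail replying to an utterance record.
    The recipient is looked up from the speaker identifier, and the
    utterance ID goes into In-Reply-To (wrapped in an assumed format)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = table[speaker_id]  # raises KeyError for unknown speakers
    msg["Subject"] = "Re: our conversation"
    msg["In-Reply-To"] = f"<{utterance_id}@utterance.invalid>"
    msg.set_content(body)
    return msg


reply = reply_to_utterance("T002", "Y094897870", "me@example.org",
                           "Following up on what you said earlier.")
print(reply["To"], reply["In-Reply-To"])
```

Raising an error for an unregistered speaker is a design choice of this sketch; an interface could instead prompt the user for an address.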

Next, real-world communication management technology according to the fifteenth embodiment of the present invention will be described. The information processing terminal according to this embodiment is characterized in that the communication record management means manages records of communication performed over a network in addition to utterance records, and the communication record presentation means presents some or all of the communication records and dialogue records managed by the communication record management means to the user, arranged in time series by communication start time or dialogue occurrence time. In other words, if other communication means such as telephone, chat, and video conferencing can be integrated in addition to e-mail, the various communications a user carries out can be handled uniformly. The information processing terminal according to this embodiment records the user's communications in both the virtual and the real world and gives access to every communication record through an integrated user interface. A record of a person's communications can be expected to capture much of that person's life. Even after the person has died, much of that life can be learned by going through the communication records.
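A time-series presentation of mixed records could be sketched as below. The record type `CommRecord`, its field names, and the function `unified_timeline` are hypothetical names chosen for this illustration; the description only requires that records be ordered by communication start time or dialogue occurrence time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommRecord:
    """Minimal common shape for any communication record (assumed fields)."""
    record_id: str
    kind: str        # e.g. "utterance", "email", "phone", "chat"
    account: str     # speaker identifier, e-mail address, phone number, ...
    start: datetime  # communication start time or dialogue occurrence time


def unified_timeline(*sources: list[CommRecord]) -> list[CommRecord]:
    """Merge records from every communication means and sort them by start time."""
    merged = [record for source in sources for record in source]
    return sorted(merged, key=lambda record: record.start)


utterances = [CommRecord("A001", "utterance", "Y094897870",
                         datetime(2005, 4, 1, 10, 5))]
emails = [CommRecord("msg-1", "email", "yamada@dokodemo.net",
                     datetime(2005, 4, 1, 9, 30)),
          CommRecord("msg-2", "email", "yamada@dokodemo.net",
                     datetime(2005, 4, 1, 11, 0))]
timeline = unified_timeline(utterances, emails)
```

Because every record carries the same few fields, the user interface can list them in one view without caring which communication means produced them.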

Consider integrating telephone call records. Outgoing and incoming call logs (caller, callee, date and time of the call, call duration) are already available, and call content can also be recorded if it is captured during the call. Call records can therefore be handled through the same integrated interface as the utterances and e-mails described above. To improve interoperability, each call must be given a unique ID, and the user account (here, the telephone number) must be mapped to the accounts of the other communication means (for example, an e-mail address or an utterance speaker ID).

As described above, each embodiment of the present invention can record whom the user communicated with, so real-world communication can be handled in the same way as communication in the virtual world, and the user's real-world behavior can be confirmed by reviewing the communication records later.
A more detailed and specific description follows.

Utterance notifications that are sent and received must be recorded. It is not necessary to record every transmitted and received packet as is; only the necessary parts need to be stored, in a format that is easy to manage.

This example is an information processing apparatus comprising utterance detection means for detecting an utterance of the monitored user, and utterance notification means for notifying other nearby terminals of the user's utterance when the utterance detection means detects it, the notified information including an ID that identifies the speaker. The apparatus further comprises utterance reception means for receiving utterance notifications sent from other terminals, and communication record management means for recording both its own utterance notifications sent by the utterance notification means and the other utterance notifications received by the utterance reception means.

FIG. 4(A) shows an overview of the information processing apparatus according to this example. The information processing terminal 400 includes a wireless LAN communication module 405 and can perform ad hoc wireless communication with other nearby terminals. It also includes a microphone 406 and can record the content of a dialogue. The recorded content is stored in a storage device 408 housed inside the device and can be output to an external terminal via the external output connector 404 or the wireless LAN communication module 405.

As shown in FIG. 4(B), the user attaches an utterance detection pad 407, which senses vocal cord vibration, to the throat. When the pad 407 detects vocal cord vibration, the information processing terminal 400 generates a message called speech.msg to notify nearby terminals of the utterance and sends it to them via the wireless LAN communication module 405. speech.msg is sent at fixed intervals while the utterance continues and stops when the utterance ends. While the utterance continues, the microphone 406 captures its content, which is recorded in the storage device 408 in association with the speech.msg information. As shown in the figure, the display screen 401 shows the speaker identifier (EA9206454592), the speaker name (Sacchan), the utterance duration (02'13"), and the recording level. When the user is the one speaking, the speaker shown is the user.

Meanwhile, the information processing terminal 400 can receive speech.msg sent by nearby terminals via the wireless LAN communication module 405. Because speech.msg contains an ID that identifies the speaker, the terminal can learn who is speaking by observing speech.msg. The terminal holds a conversion table between known IDs and names in its internal storage device, and a known ID is converted to a name and shown to the user, as in FIG. 4. Furthermore, while the microphone 406 picks up an utterance by someone other than the user and speech.msg is being received, the recorded utterance content is stored in the storage device 408 in association with the speech.msg information.

Utterance records stored in the storage device 408 can also be shown on the display screen 401, so the user can select a desired utterance record by operating the input device 403 and listen to the recorded content through the speaker 402.

FIG. 20(a) shows an example in which the utterance records made on Friday, April 1, 2005 are sorted by utterance start time and shown on the display screen 2001. From the left of the screen, the columns are utterance start time (TIME), speaker identifier or name (NAME), and utterance duration (DUR). By operating the input device 403 shown in FIG. 4, the user moves the cursor line (shown in reverse video) up and down to go back or forward in time, selects a desired utterance record, and listens to its content.

In this example, utterances by users whose speaker identifiers have no corresponding name on the terminal (ID:340087B and ID:129086C) are also displayed. The terminal can instead be set not to display utterances with such unknown IDs, or not to record them in the first place. This filters out information that is not useful and improves the S/N ratio. Utterances shorter than a fixed duration can likewise be hidden or deleted. Utterances that finish in a few seconds, such as greetings, are often unimportant, so hiding them makes important utterances easier to find. The sort here covers one day of utterance records, but because each utterance record is stored with its start date and time, the same presentation can be made for a week, a month, or any other period.
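The filtering and time-ordered listing described above might look like the following sketch. The type `UtteranceRecord`, the function `visible_records`, and the 5-second threshold are assumptions made for illustration; the description only says that unknown IDs and utterances shorter than a fixed duration may be hidden.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class UtteranceRecord:
    speaker_id: str
    start: datetime
    duration: timedelta


def visible_records(records, names, min_duration=timedelta(seconds=5),
                    hide_unknown=True):
    """Drop unknown speakers and very short utterances, then sort by start time.

    `names` maps known speaker identifiers to display names.
    The 5-second default is an assumed threshold, not a value from the text.
    """
    kept = []
    for record in records:
        if hide_unknown and record.speaker_id not in names:
            continue  # no name on this terminal for the speaker identifier
        if record.duration < min_duration:
            continue  # e.g. a greeting that finished in a few seconds
        kept.append(record)
    return sorted(kept, key=lambda record: record.start)


names = {"EA9206454592": "Sacchan"}
day = datetime(2005, 4, 1)
records = [
    UtteranceRecord("340087B", day.replace(hour=9), timedelta(seconds=40)),
    UtteranceRecord("EA9206454592", day.replace(hour=13), timedelta(minutes=2)),
    UtteranceRecord("EA9206454592", day.replace(hour=10), timedelta(seconds=2)),
    UtteranceRecord("EA9206454592", day.replace(hour=11), timedelta(seconds=30)),
]
shown = visible_records(records, names)
```

Sorting by `record.duration` with `reverse=True` instead of by `record.start` would give a listing ordered by utterance duration.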

The display screen 2002 in FIG. 20(b) shows an example in which the utterance records made on April 1 are sorted by utterance duration. More important utterances tend to last longer, so sorting by duration makes important utterances easier to find. From the left of the screen, the display screen 2002 shows utterance start time (TIME), speaker identifier or name (NAME), and utterance duration (DUR), sorted in descending order of duration. The sort here covers one day of utterance records, but the same presentation can be made for a week, a month, or any other period.

FIG. 21(c) shows a display example 2101 in which the utterance records made on April 1 are aggregated per user and sorted by total utterance time. A person who speaks for a long time is presumed to be important, so sorting by total utterance time makes important utterances and important people easier to find. The sort here covers one day of utterance records, but the same presentation can be made for a week, a month, or any other period.
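The per-user aggregation can be sketched in a few lines. The function name `total_time_by_speaker` and the input shape (pairs of speaker and duration) are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import timedelta


def total_time_by_speaker(records):
    """Sum utterance durations per speaker and sort by total, longest first.

    `records` is an iterable of (speaker, duration) pairs.
    """
    totals = defaultdict(timedelta)  # missing speakers start at zero duration
    for speaker, duration in records:
        totals[speaker] += duration
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = total_time_by_speaker([
    ("Sacchan", timedelta(minutes=2, seconds=13)),
    ("Yamada", timedelta(minutes=5)),
    ("Sacchan", timedelta(minutes=4)),
])
```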

FIG. 21(d) shows a display example 2102 in which the utterances of one particular user (here, "Sacchan") are retrieved from the utterance records and displayed. They are sorted in time series here but may instead be sorted by utterance duration. The search here covers several days of utterance records, but the same presentation can be made for a week, a month, or any other period.

FIG. 1 is a functional block diagram showing a configuration example of the information processing apparatus according to the embodiments of the present invention. As shown in FIG. 1, the apparatus has: utterance notification means including a communication unit 101a and an utterance notification unit 102; utterance detection means including an utterance state detection unit 103; notification reception means including a dialogue partner acquisition unit 104 and a communication unit 101b; reception notification means including a reception notification unit 110 and a communication unit 101d; communication record management means, which also serves as utterance identifier assignment means, including a dialogue management unit 105 and a dialogue recording unit 108; user name acquisition means including a communication unit 101c, a name management unit 106, and a name resolution unit 107; utterance content recording means including a dialogue acquisition unit 110; communication record output means including a communication unit 101; and a user interface unit 109.

The user always carries an information processing apparatus A that includes the units shown in FIG. 1. The utterance detection means, which includes the utterance state detection unit 103, monitors the user's speech and, on detecting an utterance, reports the detection to the utterance notification unit 102, which is the utterance notification means. To notify other nearby terminals of the utterance, the utterance notification unit 102 generates the message (speech.msg) shown in FIG. 5 and sends it to nearby terminals via the communication unit 101a. At the same time, the utterance state detection unit 103 reports the detection to the dialogue management unit 105, which is the communication record management means. The dialogue management unit 105 records the occurrence of the utterance, together with the time, in the dialogue recording unit 108.

Meanwhile, the dialogue partner acquisition unit 104, which corresponds to the notification reception means, receives speech.msg sent from other terminals via the communication unit 101b and passes the speaker information contained in speech.msg, such as the speaker identifier, to the dialogue management unit 105. The dialogue management unit 105 records the occurrence of the other person's utterance in the dialogue recording unit 108 together with the time and the speaker information. The dialogue recording unit 108 is connected to the communication unit 101, which is the utterance record output means, and can output the stored dialogue records to an external terminal via the communication unit 101. Here, the dialogue acquisition unit 101 is the utterance content recording means and captures audio with the microphone in synchronization with the sending and receiving of utterance notifications. The dialogue management unit 105 associates the captured audio with the utterance notification and stores it in the dialogue recording unit 108.

In addition, the correspondence between user IDs and names obtained via the communication unit 101 is managed by the name resolution unit, which associates a name with each known user ID contained in the dialogue records handled by the dialogue management unit 105 and the dialogue recording unit 108 (user name acquisition means). The dialogue records handled by the dialogue management unit 105 and the dialogue recording unit 108 are presented to the user via the user interface 109.

FIG. 5 shows an example of the data structure of speech.msg, which is used as the utterance notification. speech.msg consists of an utterance ID, an utterance No, a speaker identifier, and other information. The utterance ID 501 is a unique ID assigned to one continuous series of speech. For a long utterance, several speech.msg messages may be sent while it continues, and in that case they all carry the same utterance ID. When the speech breaks off, the next utterance is given a new utterance ID.

The utterance No 502 is the serial number of a speech.msg within the series that shares the same utterance ID 501; it starts at 1 and is incremented. The utterance No 502 keeps increasing while the user speaks continuously and returns to 1 once the speech is interrupted. The speaker identifier 503 is an ID that identifies the speaker. As noted earlier, the speaker identifier 503 may be embedded in the utterance ID 501.

The other information 504 includes the following.
1) Utterance start time: the time at which the series of speech started.
2) Utterance duration: how long the series of speech has continued.
3) Utterance notification interval: the time interval at which utterance notifications are sent. If an utterance notification has been received and the next one does not arrive at the expected time, the utterance is regarded as finished.
4) Size of the utterance notification area: the size of the area over which the utterance notification is sent.
5) Utterance place: the place where the utterance is being made.
6) Reception notification request: whether a reception notification is required.
7) Reception notification destination address: the address to which the reception notification should be sent.
8) Listener ID: the user ID that the speaking terminal recognizes as a listener. It lists the user IDs from which reception notifications were received for earlier speech.msg messages in the same series of speech.
9) In-Reply-To: when a dialogue has formed, the utterance ID of the immediately preceding utterance. If the utterance is a reply to an e-mail, the Message-ID of that e-mail.
10) References: the utterance IDs being cited. When an e-mail is cited, the Message-ID of that e-mail.
11) Subject: the topic, corresponding to the subject of an e-mail. When replying to an e-mail, a subject can be generated automatically by adding the prefix "Re:", but generating a subject automatically for ordinary conversation is difficult. The user can, however, set the subject manually. For example, to find the right moment to talk to a popular person surrounded by many people, a user can state the subject explicitly when addressing that person. The popular person's terminal holds subjects of interest set in advance and notifies its user when it detects an utterance with a matching subject.
12) Keyword: handled in the same way as the subject; keywords may also be extracted automatically from the utterance by speech recognition.
13) Related URL: a pointer to related information, which works as a link attached to the utterance. When an utterance gives only an outline and more detail is wanted, the listener can follow the link set on the utterance and view a Web page with the detailed information.
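The fields above could be collected into a single record type, sketched here as a Python dataclass. The attribute names, types, and units (for example milliseconds) are assumptions for illustration; the description lists the fields but does not specify an encoding for speech.msg.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpeechMsg:
    """Illustrative in-memory form of speech.msg (field names are hypothetical)."""
    utterance_id: str                          # 501: shared by one continuous utterance
    utterance_no: int                          # 502: serial number starting at 1
    speaker_id: str                            # 503: identifies the speaker
    # Other information 504 (all optional in this sketch)
    start_time: Optional[str] = None           # utterance start time
    duration_ms: Optional[int] = None          # utterance duration so far
    notify_interval_ms: Optional[int] = None   # interval between notifications
    notify_area: Optional[str] = None          # size of the notification area
    place: Optional[str] = None                # utterance place
    ack_requested: bool = False                # reception notification request
    ack_address: Optional[str] = None          # reception notification destination
    listener_ids: list[str] = field(default_factory=list)
    in_reply_to: Optional[str] = None          # utterance ID or e-mail Message-ID
    references: list[str] = field(default_factory=list)
    subject: Optional[str] = None
    keywords: list[str] = field(default_factory=list)
    related_url: Optional[str] = None

    def same_utterance(self, other: "SpeechMsg") -> bool:
        """True when both messages belong to the same continuous utterance."""
        return self.utterance_id == other.utterance_id


first = SpeechMsg("D032", 1, "D", in_reply_to="A001", notify_interval_ms=200)
second = SpeechMsg("D032", 2, "D", in_reply_to="A001", notify_interval_ms=200)
```

Reusing the header names In-Reply-To, References, and Subject keeps the record close to e-mail, which is what allows utterances and e-mails to be threaded together.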

FIGS. 10 and 11 show the exchange of utterance notifications between terminals A and D in FIG. 2. While its user is speaking (utterance state), each terminal sends an utterance notification at fixed intervals to the terminals within its utterance notification area. Here the notification is made by distributing speech.msg within the utterance notification area. Terminals A and D are within each other's utterance notification areas, so each receives the utterance notifications sent by the other. User A's two utterances are assigned utterance IDs A001 and A002, and user D's utterance is assigned utterance ID D032. speech.msg is sent periodically while an utterance continues; here terminal A receives the speech.msg sent by terminal D five times, and terminal D receives the speech.msg sent by terminal A ten times. Each speech.msg carries an utterance No (the number in parentheses), so messages with the same utterance ID can still be told apart.

FIG. 6 is a flowchart of the procedure for sending speech.msg. Processing starts in step S1 (Start). When the utterance detection means detects an utterance (step S2), the terminal determines whether it is a new utterance or a continuation of an ongoing series of speech (step S3). For example, if utterances by the same speaker are separated by 3 seconds or more, the later one is judged to be a new utterance. For a new utterance (Yes), an utterance ID is generated (step S4) and the utterance No is set to 1 (step S5). Otherwise (No), the previous utterance ID continues to be used (step S8) and the utterance No is incremented by 1 (step S9).

If another utterance notification was received immediately before the utterance (here, within 3 seconds) (Yes in step S6), the received utterance ID is written in In-Reply-To (step S10). In the example of FIG. 11, D's utterance ID:D032 was made in response to A's utterance ID:A001, so ID:A001 is written in its In-Reply-To field. Similarly, In-Reply-To: D032 is written in A's utterance ID:A002.

Furthermore, if the user was playing back an utterance record immediately before the utterance (here, within 3 seconds) (Yes), the utterance ID of that record is written in References (step S11). The other information described above is then added to create speech.msg (step S12). speech.msg is sent to the terminals within the communication area (step S13), and the terminal waits for a specified time (here, 200 ms) (step S14). If no utterance is detected in step S2 (No), processing also proceeds to step S14. From step S14, processing returns to step S2 and repeats.
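The numbering and reply logic of steps S3 to S10 can be sketched as follows. The class `UtteranceNumbering`, the helper `reply_target`, and the ID format (speaker identifier plus a three-digit counter, modeled on the A001 and A002 examples) are assumptions for illustration; times are plain seconds supplied by the caller so that the sketch does not depend on a real clock or radio.

```python
import itertools

NEW_UTTERANCE_GAP = 3.0  # seconds; the "3 seconds" example value from the text


class UtteranceNumbering:
    """Assigns the utterance ID and utterance No for each speech.msg (S3-S5, S8-S9)."""

    def __init__(self, speaker_id: str) -> None:
        self.speaker_id = speaker_id
        self._counter = itertools.count(1)
        self._utterance_id = None
        self._utterance_no = 0
        self._last_voiced = None  # time of the previous detected speech

    def next_header(self, now: float) -> tuple[str, int]:
        is_new = (self._last_voiced is None
                  or now - self._last_voiced >= NEW_UTTERANCE_GAP)
        if is_new:
            # S4/S5: generate a fresh ID and restart numbering at 1.
            self._utterance_id = f"{self.speaker_id}{next(self._counter):03d}"
            self._utterance_no = 1
        else:
            # S8/S9: keep the ID and increment the number.
            self._utterance_no += 1
        self._last_voiced = now
        return self._utterance_id, self._utterance_no


def reply_target(now: float, last_received):
    """S6/S10: ID of an utterance received just before speaking, else None.

    `last_received` is a (utterance_id, received_at) pair or None.
    """
    if last_received is None:
        return None
    utterance_id, received_at = last_received
    return utterance_id if now - received_at <= NEW_UTTERANCE_GAP else None


numbering = UtteranceNumbering("A")
headers = [numbering.next_header(t) for t in (0.0, 0.2, 0.4, 10.0)]
```

A sender loop would call `next_header` each time speech is detected, fill in In-Reply-To from `reply_target`, send the message, and then wait 200 ms before checking again.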

Here an utterance notification is sent every 200 ms while the utterance continues, so each received notification can be counted as 200 ms of listening to that user. The utterance No shows how long the utterance has been going on, and if reception begins at an utterance No partway through the series, the terminal can tell that it came within range in the middle of the utterance.
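This estimate can be written out as a small calculation. The function `summarize_reception` and the keys of its result are hypothetical names for this sketch; the 200 ms interval is the example value from the text.

```python
NOTIFY_INTERVAL_MS = 200  # example notification interval from the text


def summarize_reception(utterance_nos):
    """Estimate listening time from the utterance Nos received for one utterance ID."""
    if not utterance_nos:
        return {"heard_ms": 0, "joined_midway": False, "elapsed_ms": 0}
    distinct = set(utterance_nos)  # ignore duplicate deliveries
    return {
        # Each distinct notification counts as one interval of listening.
        "heard_ms": len(distinct) * NOTIFY_INTERVAL_MS,
        # A first number above 1 means reception began partway through.
        "joined_midway": min(distinct) > 1,
        # The highest number shows how long the utterance had been going on.
        "elapsed_ms": max(distinct) * NOTIFY_INTERVAL_MS,
    }


summary = summarize_reception([4, 5, 6])
```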

In FIG. 10 the receiving side makes no response to speech.msg, so the notification is one-way and the speaker cannot know which terminals heard the utterance. However, since a dialogue normally consists of utterances in both directions, the listener of an utterance can be expected to speak shortly before or after it, and the listener can therefore be inferred with high accuracy. In the example of FIG. 10, both terminal A and terminal D can recognize that a dialogue took place between terminals A and D.

When listeners need to be tracked more strictly, for example to keep track of people who only listen, each terminal can be made to return a reception notification containing the relevant utterance ID and its user ID whenever it receives an utterance notification. FIG. 11 shows the message exchange when an ACK, which is the reception notification, is returned to the speaking terminal (ACKs are shown as dotted arrows). With reception notifications, the speaker's terminal can track exactly which terminals received each utterance.

FIG. 12 shows the exchange of speech.msg between user A and an unrelated user F who, during the dialogue between users A and D in FIG. 2, passes nearby while talking with another user E. Utterance notifications exchanged with user F are noise, so filtering out such information as far as possible is desirable for improving the S/N ratio. Methods for doing so are given below.
1) If user F's user ID is unknown to user A, the utterance notification can simply be ignored.
2) If the series of utterance Nos does not start at 1 but begins partway through, the terminal can judge that it came within range in the middle of the utterance and filter it out.
3) As shown in FIG. 12, if the utterance timings overlap for a long time, the terminal can judge that the two users are each talking to someone else and filter the utterance out.
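The three filtering rules could be combined as in the sketch below. The function names, the representation of an utterance as a (start, end) pair in seconds, and the 2-second overlap threshold are assumptions made for illustration; the text does not say how long an overlap must be to count as "a long time".

```python
def overlap_seconds(a, b):
    """Length of the overlap between two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def is_noise(speaker_id, first_no, their_span, my_spans, known_ids,
             max_overlap=2.0):
    """Apply rules 1) to 3) to one received utterance.

    speaker_id  : speaker identifier taken from speech.msg
    first_no    : first utterance No that was received for this utterance
    their_span  : (start, end) of the received utterance
    my_spans    : list of (start, end) for the user's own utterances
    max_overlap : assumed threshold for a long overlap
    """
    if speaker_id not in known_ids:
        return True   # rule 1: unknown user ID
    if first_no != 1:
        return True   # rule 2: reception began in the middle of the utterance
    overlapped = sum(overlap_seconds(their_span, span) for span in my_spans)
    return overlapped > max_overlap  # rule 3: both were speaking at once


known = {"D"}
checks = [
    is_noise("F", 1, (0.0, 5.0), [], known),             # unknown speaker
    is_noise("D", 3, (0.0, 5.0), [], known),             # joined midway
    is_noise("D", 1, (0.0, 10.0), [(1.0, 9.0)], known),  # long overlap
    is_noise("D", 1, (0.0, 5.0), [(5.5, 8.0)], known),   # normal turn-taking
]
```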

FIG. 30 shows the flow for sending a reception notification. Processing starts (step S31: START), and a terminal that receives an utterance notification in step S32 records it in step S33 and determines in step S34 whether a reception notification is required. If the terminal's settings permit sending reception notifications and the reception notification request in the utterance notification is true (Yes), the terminal generates a reception notification containing the utterance ID of the utterance and the listener ID (step S35) and sends it (step S36). The reception notification is normally sent to the terminal that sent the utterance notification, but if the utterance notification specifies a different destination, that destination is used. If no reception notification is required (step S34: No), processing ends (step S37).

FIG. 31 shows the flow when a reception notification is received. Processing starts (step S41: START). When a reception notification is received in step S42, the terminal reads the utterance ID and the listener ID it contains and identifies the utterance and the listener (step S43). It then records the listener in the utterance record for that utterance ID (step S44), and processing ends (step S45).
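The two flows in FIGS. 30 and 31 can be sketched as a pair of functions. Messages are plain dictionaries here, and every key name (`ack_requested`, `ack_address`, `sender_address`, `listeners`) is an assumption for illustration, since the description does not define a wire format.

```python
def build_ack(msg, my_user_id, acks_allowed):
    """FIG. 30, S34-S35: build a reception notification, or None if none is due."""
    if not (acks_allowed and msg.get("ack_requested")):
        return None
    return {
        "utterance_id": msg["utterance_id"],
        "listener_id": my_user_id,
        # Use the destination named in the notification, else reply to the sender.
        "to": msg.get("ack_address") or msg["sender_address"],
    }


def record_listener(utterance_log, ack):
    """FIG. 31, S43-S44: add the listener to the matching utterance record."""
    entry = utterance_log.get(ack["utterance_id"])
    if entry is None:
        return False  # reception notification for an utterance we did not make
    entry.setdefault("listeners", set()).add(ack["listener_id"])
    return True


# Terminal D hears A001 and answers; terminal A then records D as a listener.
notification = {"utterance_id": "A001", "speaker_id": "A",
                "ack_requested": True, "sender_address": "terminal-A"}
ack = build_ack(notification, my_user_id="D", acks_allowed=True)
log_on_a = {"A001": {"speaker_id": "A"}}
recorded = record_listener(log_on_a, ack)
```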

As described above, FIG. 15 shows an example of stored utterance records. Here user A 1508, user B 1509, and user C 1510 are talking around table No. 6 1511 in the cafe Famille, and an unknown store clerk 1511 passes near table No. 6 1511 partway through. Because reception notifications for utterance notifications are used, as shown in FIG. 11, listener information is also obtained for each utterance. Although not shown in the table of FIG. 15, the various items of information contained in speech.msg described above are managed in the same way. As the table shows, managing records utterance by utterance makes the record very large, so grouping utterances to some extent and managing them as dialogues is easier for the user to understand. FIG. 16 shows an example in which utterances are grouped and displayed as dialogues. FIG. 16 is a table corresponding to FIG. 15, organized by dialogue ID 1601, main participants 1602, dialogue time 1603, dialogue place 1604, dialogue duration 1605, and audio 1606. The dialogue at the cafe Famille in FIG. 15 is grouped into two dialogues, TALK0001 and TALK0002. Which of FIG. 15 and FIG. 16 is easier to understand depends on the purpose, but the typical use is to look at FIG. 16 first and refer to FIG. 15 when the details of one of its entries are wanted.
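One simple way to group utterances into dialogues is to split the time-ordered utterances wherever the silence between them exceeds a threshold. This is only a sketch under that assumption: the description does not state the grouping rule, and the 60-second gap, the function name, and the dictionary keys are all hypothetical. The dialogue ID format follows the TALK0001 example.

```python
def group_into_dialogues(utterances, max_gap=60.0):
    """Group utterances into dialogues by the silence between them.

    Each utterance is a dict with "id", "speaker", "start" and "duration"
    (times in seconds). A gap longer than `max_gap` starts a new dialogue.
    """
    dialogues = []
    for u in sorted(utterances, key=lambda u: u["start"]):
        end = u["start"] + u["duration"]
        if dialogues and u["start"] - dialogues[-1]["end"] <= max_gap:
            current = dialogues[-1]
            current["end"] = max(current["end"], end)
            current["speakers"].add(u["speaker"])
            current["utterance_ids"].append(u["id"])
        else:
            dialogues.append({
                "dialogue_id": f"TALK{len(dialogues) + 1:04d}",
                "start": u["start"],
                "end": end,
                "speakers": {u["speaker"]},
                "utterance_ids": [u["id"]],
            })
    return dialogues


dialogues = group_into_dialogues([
    {"id": "A001", "speaker": "A", "start": 0.0, "duration": 5.0},
    {"id": "B001", "speaker": "B", "start": 6.0, "duration": 4.0},
    {"id": "C001", "speaker": "C", "start": 500.0, "duration": 3.0},
])
```

Keeping the utterance IDs inside each dialogue lets the interface drill down from the summary view to the individual utterance records.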

FIG. 26 is a functional block diagram of a content reproduction apparatus that issues utterance notifications when reproducing content that includes utterance records. It has content reproduction means comprising a content management unit 2605 and a content reproduction unit 2604, together with the utterance detection means 2603 described above and utterance notification means (communication unit 2601a and utterance notification unit 2602). Because detecting utterances within content is usually difficult, it is desirable either to describe in the content, in advance, information identifying each utterance and its speaker, or to generate the utterance notifications in advance and pass them to the content reproduction apparatus. The content reproduction apparatus issues the appropriate utterance notifications in synchronization with content reproduction.

The lower part of FIG. 26 shows reception notification accepting means (including a reception notification receiving unit 2606 and a communication unit 2601c) and reception notification transfer means (including a reception notification transfer unit 2607 and a communication unit 2601) for transferring a reception notification corresponding to an utterance notification to a predetermined destination. Because these are completely independent of the upper part, they need not be on the same terminal, but in terms of device configuration it is desirable for one device to serve both roles.

FIG. 27 shows the processing flow of the content reproduction apparatus. Processing starts (step S21: START). If an utterance in the content is detected (YES in step S23) during content reproduction (step S22), the speaker is identified (step S24), and an utterance notification is generated and transmitted (steps S25 and S26). In step S27 it is determined whether the content has ended; reproduction either continues (step S22) or ends (step S28). In this flow the content reproduction apparatus generates the utterance notification, but if utterance notifications are embedded in the content or distributed together with it, the apparatus no longer needs to generate them and only has to forward them. Introducing such a content reproduction apparatus makes it possible to record and manage not only utterances by actual people but also utterances delivered through machines, using the same communication recording method.
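The loop of FIG. 27 might look like the following sketch. It assumes the variant in which speaker information is described in the content in advance; the segment dictionaries, the `speaker` and `start` keys, the notification fields, and the `send_notification` callback are hypothetical names introduced for illustration only.

```python
def play_content(segments, send_notification):
    """Walk through content segments and notify on each annotated utterance.

    A segment with a "speaker" key is treated as an utterance whose speaker
    was described in the content in advance (an assumed content format).
    """
    for index, segment in enumerate(segments):   # S22: reproduce the content
        speaker = segment.get("speaker")         # S23: is this an utterance?
        if speaker is not None:                  # S24: speaker identified
            notification = {                     # S25: generate the notification
                "utterance_id": f"content-{index}",
                "speaker_id": speaker,
                "start": segment["start"],
            }
            send_notification(notification)      # S26: transmit it
    # S27/S28: the loop ends when the content ends


sent = []
play_content(
    [
        {"start": 0.0},
        {"start": 2.5, "speaker": "narrator"},
        {"start": 7.0, "speaker": "guest"},
    ],
    sent.append,
)
```

Passing the transmitter in as a callback keeps the playback loop independent of the communication unit; the same loop could forward notifications that were embedded in the content instead of building them.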

FIG. 24 is a display example of a communication client that uses utterance records, e-mail records, and telephone call records in an integrated manner. Because utterance records are managed in a table as described in the first embodiment, managing them is basically the same as managing mail in a mailer. Telephone calls and dialogues may, for example, have no subject line, but this is not a major problem. As shown in FIG. 24, the display 2401 shows the window of the communication client 2402. The left column 2403 shows entry points for mail browsing, divided into items such as sent, received, trash, and company, university, and friends. Reference numeral 2404 indicates a menu with items such as All, Mail, Utterance, Telephone, TV conference, and Chat. Here "All" is selected, and the type, subject, other party, and date are displayed. In addition to mail, items such as telephone and dialogue have been added as types (2409). When an utterance is selected, field 2410 displays, for example, the utterance record shown in FIG. 15.

FIG. 29 shows an external view of a PC on which the communication client runs. As shown in FIG. 29, the PC 2901 on which the communication client runs has a PC main body 2903 containing a CPU, memory, HDD, and the like, a display 2905 with a display screen 2907, an input unit 2911 such as a keyboard, and a slot 1913 into which a recording medium can be inserted and removed. Although not shown, it can also access the Internet through, for example, a wireless communication unit. Adding the functions described above to this apparatus, in hardware or software, allows it to be used as the communication recording apparatus described in this specification.

FIG. 28 is a functional block diagram showing a configuration example of the apparatus of FIG. 29. Communication records acquired through the communication unit 2802b, which serves as communication recording means, are accumulated and managed by the communication record management unit 2804, which serves as communication record management means. The user name acquisition means consists of a communication unit 2801a, an account management unit 2802, and an account resolution unit 2803. As described above, it converts the various accounts contained in the communication records held by the communication record management unit into a unified form that the user can easily understand. Communication means for handling various kinds of communication, such as e-mail (2808), telephone (2807), and chat (2806), are connected to the communication record management unit, which handles their communication records in an integrated manner as well. The communication records accumulated in the communication record management unit 2804 are presented to the user, and operated on by the user, through the user interface unit 2805. When an operation such as replying to an existing communication record is performed, the communication record management unit 2804 works with the account resolution unit 2803 to call the required communication means. The various records of human communication, including utterance records (dialogue records), are thus integrated by the communication record management means into a single environment where they can be handled without distinction.

What is done in e-mail management can be done in almost the same way in an environment that integrates utterance records and call records. To handle e-mail management together with the management of utterance records and call records, the accounts used in each kind of communication must be converted. Preparing a conversion table allows one account to be translated into another as needed.
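A conversion table of this kind could be as simple as the sketch below. The table layout, the channel names (`utterance_id`, `email`, `phone`), the sample entries, and the `convert_account` helper are all hypothetical; the specification only says that a conversion table is prepared.

```python
# Hypothetical conversion table: one entry per person, one account per channel.
ACCOUNTS = [
    {"name": "User A", "utterance_id": "ID-0001",
     "email": "a@example.com", "phone": "000-0001"},
    {"name": "User B", "utterance_id": "ID-0002",
     "email": "b@example.com", "phone": "000-0002"},
]


def convert_account(value, source, target):
    """Translate an account from one communication channel to another.

    Returns None when the table has no matching entry.
    """
    for entry in ACCOUNTS:
        if entry.get(source) == value:
            return entry.get(target)
    return None


# Replying by e-mail to an utterance: speaker ID -> e-mail address.
reply_to = convert_account("ID-0002", "utterance_id", "email")
# Showing a readable name for an e-mail sender: address -> name.
display_name = convert_account("a@example.com", "email", "name")
```

Keeping every account of one person in a single entry lets the same lookup serve both directions: resolving a destination when replying, and resolving a display name when listing records.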

Next, a fourth embodiment of the present invention will be described. FIG. 17 is a display example of a communication client that handles utterance records together with e-mail records, schedule information, position information, digital camera photographs, and the like. This communication client runs on a PC; it imports the utterance records acquired by the information processing apparatus of the first embodiment into the PC over a network and manages them. With the calendar 1701, the user can select the date whose communication records are to be viewed. The communication records for the selected date are mapped in time series. The names 1706 of the main speakers in each dialogue are mapped and displayed at positions corresponding to the utterance start times. In FIG. 17, the longer a user's utterance duration within the dialogue, the larger that user's name is displayed. In addition, the user's position information 1704, such as office or cafeteria, is displayed along the time series, so the user can tell intuitively when and where a dialogue took place. Clicking a user name plays back the corresponding dialogue. Displaying schedule information 1705, such as meetings and welcome parties, alongside gives meaning to the times and places.

Photograph 1702 is a photograph taken at that time and place, mapped onto the time series by its shooting time. Because the IDs of the people in a photograph can be obtained from an utterance such as "Say cheese," this is also useful for managing photographs. For example, if the apparatus of FIG. 4 is a mobile phone, photographs and utterances can be managed, using its still or video camera and a microphone for voice recording, by associating the photo data with the "other information" of the format shown in FIG. 5 in the manner described above. This information is also a useful record from the standpoint of communication recording. Of course, other media besides photographs, such as video recordings, can also be mapped onto the same time axis.

E-mail 1703 is a reply e-mail to an utterance contained in the dialogue represented by the user name "神之門司". Enlarging the time axis to view individual utterances shows which utterance the e-mail actually responds to.

The rectangle 1707 shows the range that the user has selected with the mouse pointer. An e-mail can be created with the users contained in this selection as its destinations. For example, to send the minutes of a meeting to its participants (in practice, the speakers), the users who were in the meeting room at that time can be selected in the same way and sent an e-mail. In other words, a tool can be provided for sending e-mail and the like to the participants in a given dialogue or to a given speaker. Using a management tool such as that of FIG. 17 for this has the advantage that no recipient is forgotten or left out. This can also be regarded as an e-mail addressed to an arbitrary region of time and space. If a future time and space, rather than a past one, is specified with the calendar 1701 and the time axis, the e-mail is sent to the people who are in the specified space when the specified time arrives. This can be used, for example, as a reminder or as a tool for scheduling mail transmission.
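Addressing an e-mail to a region of time and space, as with the rectangle 1707, amounts to an interval-overlap query. The sketch below illustrates that idea only; the `Presence` class, its fields, the minutes-from-midnight time unit, and the `address_book` mapping are assumptions rather than anything the specification defines.

```python
from dataclasses import dataclass


@dataclass
class Presence:
    """Hypothetical record of one user being at one place for a time span."""
    user: str
    place: str
    start: float  # arrival time, in minutes from midnight
    end: float    # departure time, in minutes from midnight


def recipients_in_range(presences, place, start, end, address_book):
    """Return sorted e-mail addresses of users at `place` during [start, end]."""
    users = {
        p.user
        for p in presences
        # Two intervals overlap when each one starts before the other ends.
        if p.place == place and p.start <= end and p.end >= start
    }
    return sorted(address_book[u] for u in users if u in address_book)


presences = [
    Presence("A", "meeting room", 600, 660),
    Presence("B", "meeting room", 630, 700),
    Presence("C", "cafeteria", 600, 660),
    Presence("D", "meeting room", 700, 720),
]
address_book = {"A": "a@example.com", "B": "b@example.com", "C": "c@example.com"}
to = recipients_in_range(presences, "meeting room", 600, 660, address_book)
```

For a future time range the same query would be evaluated when the specified time arrives, against whoever is then present, which is what makes the reminder use possible.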

Next, a fifth embodiment of the present invention will be described. The fifth embodiment concerns the categorization of dialogues. FIG. 9 is a schematic representation of spoken communication between users A and B. In FIG. 9(A), user A is speaking to user B. Because user A is speaking to user B one-sidedly, this communication can be called information transmission. FIG. 9(B) shows user B responding with an utterance to user A's utterance. Here a dialogue has been established, and it can be inferred that the communication is closer than the information transmission shown in FIG. 9(A). FIG. 9(C) shows a case where the dialogue continues, indicating closer communication than the simple dialogue of FIG. 9(B). The closeness of communication can be estimated from the amount (duration) of dialogue, and it is reasonable to regard communication as closer the more dialogue takes place. Dialogues can further be categorized as the information-transmission type (FIG. 9(D)) when the amounts spoken by the two sides are unequal, and as the discussion type (FIG. 9(C)) when they are balanced. When utterance records are presented to the user, presenting this information together with them gives the user a clue when browsing the information.
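The categorization described for FIG. 9 can be sketched from each side's total speaking time. The specification gives no numeric criterion for "biased" versus "balanced", so the `balance_threshold` of 0.3 below is an arbitrary illustrative value, and the function names and category strings are likewise hypothetical.

```python
def categorize_dialog(talk_time_a, talk_time_b, balance_threshold=0.3):
    """Classify a two-person exchange from each side's total speaking time.

    The 0.3 threshold is an arbitrary illustrative value, not from the source.
    """
    total = talk_time_a + talk_time_b
    if total == 0:
        return "none"
    if talk_time_a == 0 or talk_time_b == 0:
        return "information transmission"       # one-sided, as in FIG. 9(A)
    minority_share = min(talk_time_a, talk_time_b) / total
    if minority_share < balance_threshold:
        return "information-transmission type"  # biased, as in FIG. 9(D)
    return "discussion type"                    # balanced, as in FIG. 9(C)


def closeness(talk_time_a, talk_time_b):
    """Estimate closeness of communication by the total amount of dialogue."""
    return talk_time_a + talk_time_b
```

For example, `categorize_dialog(90, 10)` falls in the information-transmission type because the quieter side contributes only a tenth of the speaking time, while `categorize_dialog(50, 40)` is classified as the discussion type.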

The present invention can be used in various information processing apparatuses related to utterances.

FIG. 1 is a functional block diagram showing a configuration example of the information processing apparatus according to each embodiment of the present invention.
FIG. 2 is a conceptual diagram of utterance notification according to the present embodiment when an omnidirectional short-range wireless communication system is used.
FIG. 3 is a diagram showing the concept of utterance notification in the present embodiment when a directional short-range wireless communication system is used.
FIG. 4 is a diagram showing an external view of the information processing apparatus according to the present example.
FIG. 5 is a diagram showing an example of the data structure of the speech.msg used as a transmission notification.
FIG. 6 is a flowchart showing the transmission procedure for speech.msg.
FIG. 7 is an example of electronic business card data written in vCard, the most widely used electronic business card format.
FIG. 8 is a diagram showing an example of a dialogue among users A, B, and C; the horizontal axis is the time axis.
FIG. 9 is a schematic representation of spoken communication between users A and B.
FIG. 10 is a diagram showing the exchange of utterance notifications between terminals A to D in FIG. 2.
FIG. 11 is a diagram showing the exchange of utterance notifications between terminals A to D in FIG. 2.
FIG. 12 shows the exchange of speech.msg between user A and an unrelated user F who passes nearby while talking with another user E, during the dialogue of users A to D in FIG. 2.
FIG. 13 is a diagram showing the timing of the continuation of a series of utterances and of utterance notifications.
FIG. 14 is a diagram showing an example of an electronic signature added to the record of a dialogue between user A and user B.
FIG. 15 is a diagram showing an example of a table that manages utterance records.
FIG. 16 is a diagram showing an example in which utterances are grouped and displayed as dialogues.
FIG. 17 is a display example of a communication client that handles utterance records together with e-mail records, schedule information, position information, digital camera photographs, and the like.
FIG. 18 is an ID conversion table held by the information processing apparatus, consisting of ID 1801 and name 1802.
FIG. 19 is a passage from Natsume Soseki's "Kokoro".
FIG. 20 is an example in which recorded utterance records are sorted by utterance start time and displayed on the display screen.
FIG. 21 is an example in which recorded utterance records are sorted by utterance start time and displayed on the display screen.
FIG. 22 is a dialogue record corresponding to FIG. 19.
FIG. 23 is an example in which the utterance records shown in FIG. 22 are expressed in units of dialogues.
FIG. 24 is a display example of a communication client that uses utterance records, e-mail records, and telephone call records in an integrated manner.
FIG. 25 shows, for a user making a long-distance utterance over a communication path, the utterance notification being sent over the same communication path used to convey the utterance (FIG. 25A) and over a different communication path (FIG. 25B).
FIG. 26 is a functional block diagram of a content reproduction apparatus that issues utterance notifications when reproducing content that includes utterance records.
FIG. 27 is a diagram showing the processing flow of the content reproduction apparatus.
FIG. 28 is a functional block diagram showing an example of the internal configuration of a PC on which the communication client runs.
FIG. 29 is a diagram showing an external view of a PC on which the communication client runs.
FIG. 30 is a diagram showing the flow for transmitting a reception notification.
FIG. 31 is a diagram showing the flow when a reception notification is received.

Explanation of symbols

A ... information processing apparatus, 101a, 101b, 101c ... communication unit, 102 ... utterance notification unit, 103 ... utterance state detection unit, 104 ... dialogue partner acquisition unit, 105 ... dialogue management unit, 106 ... name management unit, 107 ... name resolution unit, 108 ... dialogue recording unit, 109 ... user interface unit.

Claims

A communication record management means for managing an utterance record including an utterer identifier for identifying an utterer and an utterance occurrence time;
Utterance integration means for grouping utterance records close in time managed by the communication record management means as dialog records,
An information processing apparatus comprising communication record presenting means for presenting a part or all of a conversation record generated by the utterance integration means to a user in a time series based on a conversation occurrence time.

A representative extracting means for extracting a representative speaker in the dialogue based on the utterance time or the number of utterances of each utterer among the utterers of the utterance included in the dialogue record generated by the utterance integration unit;
The information processing apparatus according to claim 1, wherein the dialogue record generated by the utterance integration unit is presented together with the representative extracted by the representative extraction unit.

The utterance record managed by the communication record management means further includes information indicating the utterance recipient,
The information processing apparatus according to claim 1, wherein the dialogue record generated by the utterance integration unit and the utterance receiver are presented together.

The information processing apparatus according to any one of claims 1 to 3, further comprising an ID-name conversion mechanism that converts an identifier corresponding to a speaker or a receiver of the utterance into an actual name or a nickname.

The utterance record managed by the communication record management means further includes information indicating an utterance occurrence location,
The information processing apparatus according to claim 1, wherein the dialogue record generated by the utterance integration unit is presented together with a place where the dialogue occurs.

The utterance record managed by the communication record management means further includes utterance contents,
The information processing apparatus according to claim 1, wherein the dialogue record generated by the utterance integration unit is presented together with the dialogue content including the utterance content.

The utterance record managed by the communication record management means further includes estimated information on the mental state of the speaker,
The information processing apparatus according to claim 1, wherein the dialogue record generated by the utterance integration unit is presented together with the estimation information.

A dialog duration calculating means for calculating a duration of each dialog with respect to the dialog record generated by the utterance integration means;
The information processing apparatus according to claim 1, wherein the dialogue record is presented together with a duration calculated by the dialogue duration calculation unit.

The information processing apparatus according to claim 8, wherein the communication record presenting means presents only those dialogs whose duration, as calculated by the dialog duration calculating means, is equal to or greater than a predetermined threshold value.

The information processing apparatus according to claim 9, wherein the communication record presenting means presents the dialogs in ascending or descending order of the dialog duration calculated by the dialog duration calculating means.

The information processing apparatus according to any one of claims, further comprising user search means for searching the utterance records recorded by the communication record management means for only those utterance records or dialog records that include a specific utterance designated by a user.

The communication record management means manages the email record in addition to the utterance record,
The communication record presenting means presents a part or all of the electronic mail record and the conversation record managed by the communication record management means in a time series based on the electronic mail transmission time and the conversation occurrence time, and presents it to the user. The information processing apparatus according to claim 1.

Information for identifying a speaker included in the utterance record managed by the communication record management unit and an e-mail address included in the e-mail record managed by the communication record management unit are managed in association with each other. Have account management means,
When replying to an utterance record by e-mail, an e-mail address corresponding to the utterer of the utterance record is acquired by the account management means, and the acquired e-mail address is set as a reply mail destination The information processing apparatus according to claim 12.

When replying to a dialogue record by email, the account management means obtains email addresses corresponding to some or all of the participants in the dialogue, and uses the obtained email address as the reply email destination. The information processing apparatus according to claim 13, wherein the information processing apparatus is set.

The information processing apparatus according to claim 13, wherein an ID for identifying the utterance record or dialogue record being replied to is described in the reply mail.

The information processing apparatus according to any one of claims, wherein, when an utterance record or dialogue record is cited in an e-mail, an ID for identifying the cited utterance record or dialogue record is described in the citing e-mail.

User search means for searching only an e-mail record, a utterance record or a dialogue record including a specific user designated by the user from the e-mail record and the utterance record recorded by the communication record management means, The information processing apparatus according to any one of claims 12 to 15.

The communication record management means manages the communication record performed through the network in addition to the utterance record,
The communication record presenting means presents a part or all of the communication record and dialog record managed by the communication record management means in a time series based on the communication start time or dialog occurrence time, and presents it to the user. The information processing apparatus according to claim 1.

An information management method using a computer,
Recording an utterance record that includes a speaker identifier identifying the speaker and the time of occurrence of the utterance;
Grouping utterance records that are close in time managed by the communication record management means as dialogue records;
And a step of presenting a part or all of the grouped dialogue records in time series to the user based on the dialogue occurrence time.

A program for causing a computer to execute the steps according to claim 19.

A recording medium recording a program for causing a computer to execute the steps according to claim 19.