JP2021015362A

JP2021015362A - Interactive device, information processing method, and information processing program

Info

Publication number: JP2021015362A
Application number: JP2019128549A
Authority: JP
Inventors: 祐一郎小上; Yuichiro Ogami
Original assignee: Fujitsu Connected Technologies Ltd
Current assignee: Fujitsu Connected Technologies Ltd
Priority date: 2019-07-10
Filing date: 2019-07-10
Publication date: 2021-02-12

Abstract

To reduce the burden on a user when updating the teacher data of a model used for estimating a person. SOLUTION: An interactive device estimates a user captured by a camera and interacts with the estimated user. The device includes: a person estimation model generated by using, as teacher data, a plurality of pieces of imaging data obtained by photographing persons under a plurality of photographing conditions; an estimation unit that estimates the user captured by the camera by using the person estimation model; an interaction unit that outputs an inquiry containing information indicating the estimated person; and an updating unit that, when the response to the inquiry indicates that the user and the estimated person are different, replaces one piece of the teacher data with imaging data captured when the user was photographed by the camera, such that the variation in the appearance frequency of the teacher imaging data across the photographing conditions decreases. SELECTED DRAWING: Figure 2

Description

The present invention relates to a dialogue device, an information processing method, and an information processing program.

Information processing devices that estimate a person by using a model generated with imaging data of photographed persons as teacher data are in use (see, for example, Patent Documents 1 and 2).

Patent Document 1: Japanese Unexamined Patent Publication No. 2010-061465
Patent Document 2: Japanese Unexamined Patent Publication No. 2008-282089

Person estimation is used in various situations. For example, dialogue devices that hold a dialogue with the estimated person have been proposed. To improve the accuracy of person estimation in such a dialogue device, it is preferable to generate the model used for person estimation with teacher data consisting of imaging data captured under various shooting conditions, such as posture, direction, clothing, and the brightness around the person, and to update the teacher data of the generated model as appropriate. However, updating the teacher data requires photographing the user under various shooting conditions, which has been a burden on the user.

One aspect of the disclosed technique aims to provide a dialogue device, an information processing method, and an information processing program that can reduce the burden on the user when updating the teacher data of a model used for estimating a person.

One aspect of the disclosed technique is exemplified by the following dialogue device. This dialogue device estimates a user captured by a camera and holds a dialogue with the estimated user. The dialogue device includes: a person estimation model generated by using, as teacher data, a plurality of pieces of imaging data obtained by photographing persons under a plurality of shooting conditions; an estimation unit that estimates the user captured by the camera by using the person estimation model; a dialogue unit that outputs a question including information indicating the estimated person; and an update unit that, when the response to the question indicates that the user and the estimated person are different, replaces one piece of the teacher data with the imaging data acquired when the user was captured by the camera, such that the variation in the appearance frequency of the teacher imaging data for each shooting condition decreases.

The disclosed technique can reduce the burden on the user when updating the teacher data of a model used for estimating a person.

FIG. 1 is a diagram showing an example of the hardware configuration of the communication device according to the embodiment.
FIG. 2 is a diagram showing an example of the processing blocks of the communication device according to the embodiment.
FIG. 3 is a diagram showing an example of the shooting condition management table stored in the management database in the embodiment.
FIG. 4 is a diagram showing an example of the shooting conditions of face image data captured by the photographing unit in the embodiment.
FIG. 5 is a diagram showing an example of the variance when teacher data is replaced.
FIG. 6 is a diagram showing the processing flow of the initial construction of the determination database in the embodiment.
FIG. 7 is a first diagram showing an example of the processing flow of the update processing of the determination data model in the embodiment.
FIG. 8 is a second diagram showing an example of the processing flow of the update processing of the determination data model in the embodiment.
FIG. 9 is a diagram showing an example of the processing blocks of the communication device according to the first modification.

The dialogue device according to the embodiment estimates a user captured by a camera and holds a dialogue with the estimated user. The dialogue device has, for example, the following configuration:
a person estimation model generated by using, as teacher data, a plurality of pieces of imaging data obtained by photographing persons under a plurality of shooting conditions;
an estimation unit that estimates the user captured by the camera by using the person estimation model;
a dialogue unit that outputs a question including information indicating the estimated person; and
an update unit that, when the response to the question indicates that the user and the estimated person are different, replaces one piece of the teacher data with the imaging data acquired when the user was captured by the camera, such that the variation in the appearance frequency of the teacher imaging data for each shooting condition decreases.

The dialogue device communicates with the user by voice, for example. The dialogue device includes a person estimation model generated by using, as teacher data, a plurality of pieces of imaging data obtained by photographing persons under a plurality of shooting conditions. The person estimation model is generated by machine learning such as deep learning. The camera is, for example, a digital camera equipped with a Charge Coupled Device (CCD) sensor or a Complementary Metal Oxide Semiconductor (CMOS) sensor. The camera generates imaging data, that is, a digital image of the captured user.

The estimation unit estimates the user captured by the camera by using the person estimation model. As the result of the estimation, the estimation unit acquires, for example, information indicating the estimated person. Examples of such information include a full name, a surname, a given name, and a nickname.

The dialogue unit outputs a question including information indicating the estimated person. The question is, for example, a greeting or a schedule notification. Specific examples include phrases such as "Hello, Mr. A" and "Mr. A, there is an XX event tomorrow".

A response indicating that the user and the estimated person are different is, for example, a response that denies the information indicating the estimated person included in the question. For example, the response "No, I am B" to the question "Hello, Mr. A" denies the "Mr. A" included in the question, and thereby indicates that the user and the estimated person are different. When the user and the estimated person differ in this way, the accuracy of estimation with the person estimation model is considered to be low, so the update unit replaces teacher data of the person estimation model.
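The mismatch check described above can be sketched in Python as follows. This is only a minimal illustration and not the method of the embodiment: it assumes that voice analysis has already extracted a name from the response, and the function name `is_mismatch` and the rule that a response without a name counts as "no mismatch" are assumptions made for the example.

```python
from typing import Optional


def is_mismatch(estimated_name: str, response_name: Optional[str]) -> bool:
    """Return True when the response names someone other than the estimate.

    Assumption: a response that contains no name (None) is not treated
    as a denial of the estimated person.
    """
    if response_name is None:
        return False
    # Compare names while ignoring surrounding spaces and letter case.
    return response_name.strip().casefold() != estimated_name.strip().casefold()


# "Hello, Mr. A" answered with "No, I am B" yields a mismatch.
mismatch = is_mismatch("A", "B")
```

A real device would presumably need a more robust comparison (nicknames, honorifics, recognition errors), which is outside the scope of this sketch.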

By replacing teacher data so that the variation in the appearance frequency for each shooting condition decreases, the update unit 16 ensures that the imaging data used to generate the person estimation model has been captured under a variety of shooting conditions. As a result, the dialogue device can improve the accuracy of person estimation under various conditions. In addition, because the dialogue device replaces teacher data by using imaging data acquired during the dialogue with the user, it does not need to instruct the user to take a particular posture or the like, which reduces the burden on the user when teacher data is replaced.

In the dialogue device, the dialogue unit may output the question including information indicating the estimated person as a response to an instruction from the user. That is, the first utterance in the dialogue may come from the user rather than from the dialogue device.

In the dialogue device, the estimation unit may have the camera photograph the user who responded to the question again, and may further estimate the user by using the newly captured imaging data. A user who has responded to the question is likely to be in a different posture or position than before responding. By estimating the user after such a change in posture or position and updating the teacher data based on the estimation result, teacher data captured under various shooting conditions can be used to generate the person estimation model.

In the dialogue device, when there are multiple pieces of teacher data whose replacement with the imaging data captured by the camera would reduce the variation in the appearance frequency for each shooting condition, the update unit may preferentially replace the piece of teacher data that yields the largest reduction in that variation. With this feature, imaging data collected in a well-balanced manner across the shooting conditions can be used as the teacher data of the person estimation model.
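As a rough illustration of this priority rule, the Python sketch below tries every possible swap and keeps the one that lowers the summed variance the most. It is not taken from the patent: the condition names, the helper names `total_variance` and `best_swap`, the use of population variance, and the representation of each teacher image as a dictionary are all assumptions made for the example.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical condition names for three shooting-condition categories.
CONDITIONS = {
    "direction": ("front", "left", "right", "up", "down"),
    "distance": ("far", "medium", "near"),
    "brightness": ("bright", "medium", "dark"),
}


def total_variance(records):
    """Sum, over the categories, the variance of the per-condition counts."""
    total = 0.0
    for category, options in CONDITIONS.items():
        counts = Counter(record[category] for record in records)
        total += pvariance([counts[option] for option in options])
    return total


def best_swap(records, new_record):
    """Return the index whose replacement reduces the variance most, or None."""
    base = total_variance(records)
    best_index, best_reduction = None, 0.0
    for i in range(len(records)):
        trial = records[:i] + [new_record] + records[i + 1:]
        reduction = base - total_variance(trial)
        if reduction > best_reduction:  # strict: ties keep the earlier index
            best_index, best_reduction = i, reduction
    return best_index


teacher = [
    {"direction": "front", "distance": "medium", "brightness": "medium"},
    {"direction": "front", "distance": "medium", "brightness": "medium"},
    {"direction": "front", "distance": "medium", "brightness": "medium"},
    {"direction": "left", "distance": "far", "brightness": "bright"},
]
new_image = {"direction": "right", "distance": "near", "brightness": "dark"}
index_to_replace = best_swap(teacher, new_image)
```

In this example, replacing one of the three identical "front" images evens out the counts, whereas replacing the single "left" image leaves the variance unchanged, so one of the "front" images is selected.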

In the dialogue device, when there are multiple pieces of teacher data whose replacement with the imaging data captured by the camera would reduce the variation in the appearance frequency for each shooting condition, the update unit may preferentially replace the teacher data that was captured at an older time. Because a user's face changes over time, old imaging data is considered less effective as teacher data. With this feature, old teacher data can be replaced with new imaging data, making the person estimation model suitable for estimating the user as they currently appear.
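The age-based priority can be sketched as below, under the assumption that the variance reduction for each candidate has already been computed elsewhere. The `Candidate` class, its field names, and `pick_oldest` are hypothetical names introduced only for this illustration; how this rule combines with other priority rules is not specified here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    pid: int           # ID of a teacher face image
    shot_date: date    # date on which the image was taken
    reduction: float   # decrease in variance if this image is swapped out


def pick_oldest(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Among candidates that reduce the variance, return the oldest one."""
    useful = [c for c in candidates if c.reduction > 0]
    return min(useful, key=lambda c: c.shot_date, default=None)


chosen = pick_oldest([
    Candidate(pid=1, shot_date=date(2019, 5, 1), reduction=0.0),
    Candidate(pid=2, shot_date=date(2019, 6, 1), reduction=0.4),
    Candidate(pid=3, shot_date=date(2019, 7, 1), reduction=0.8),
])
```

Here the image with PID 1 is the oldest but does not reduce the variance, so the oldest useful candidate, PID 2, is chosen.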

In the dialogue device, even when the response to the question indicates that the user and the estimated person match, the update unit may, if a predetermined period has elapsed since the previous replacement of teacher data, replace one piece of the teacher data with the imaging data captured by the camera such that the variation in the appearance frequency of the teacher imaging data for each shooting condition decreases. This feature also allows old teacher data to be replaced with new imaging data, making the person estimation model suitable for estimating the user as they currently appear.
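The resulting trigger condition can be written as a small predicate, sketched here in Python. The function name `should_update` and the 30-day default are assumptions for illustration only; the text above speaks only of a "predetermined period" without giving a value.

```python
from datetime import datetime, timedelta


def should_update(
    mismatch: bool,
    last_update: datetime,
    now: datetime,
    period: timedelta = timedelta(days=30),  # assumed value, not from the source
) -> bool:
    """Replace teacher data on a mismatch, or when the period has elapsed."""
    return mismatch or (now - last_update) >= period


# A correct estimate still triggers an update once enough time has passed.
stale = should_update(
    mismatch=False,
    last_update=datetime(2019, 7, 10),
    now=datetime(2019, 9, 1),
)
```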

The embodiment can also be understood as an information processing method and an information processing program executed by the dialogue device.

Hereinafter, the embodiment is further described with reference to the drawings. The configurations of the embodiment shown below are examples, and the disclosed technique is not limited to them.

<Embodiment>
The embodiment describes a communication device that estimates a user captured by a camera and holds a dialogue with the estimated user. FIG. 1 is a diagram showing an example of the hardware configuration of the communication device according to the embodiment. The communication device 1 illustrated in FIG. 1 is an information processing device that includes a Central Processing Unit (CPU) 101, a main storage unit 102, an auxiliary storage unit 103, a camera 104, a microphone 105, a speaker 106, a timekeeping unit 107, a sensor 108, and a connection bus B1. The CPU 101, the main storage unit 102, the auxiliary storage unit 103, the camera 104, the microphone 105, the speaker 106, the timekeeping unit 107, and the sensor 108 are connected to one another by the connection bus B1. The communication device 1 is an example of a "dialogue device".

The CPU 101 is also called a microprocessor unit (MPU) or a processor. The CPU 101 is not limited to a single processor and may have a multiprocessor configuration. A single CPU 101 connected by a single socket may also have a multi-core configuration. At least part of the processing executed by the CPU 101 may be performed by a processor other than the CPU 101, for example, a dedicated processor such as a Digital Signal Processor (DSP), a Graphics Processing Unit (GPU), a numerical arithmetic processor, a vector processor, or an image processing processor. At least part of the processing executed by the CPU 101 may also be executed by an integrated circuit (IC) or another digital circuit. At least part of the CPU 101 may include an analog circuit. Integrated circuits include Large Scale Integrated circuits (LSIs), Application Specific Integrated Circuits (ASICs), and programmable logic devices (PLDs). PLDs include, for example, Field-Programmable Gate Arrays (FPGAs). The CPU 101 may be a combination of a processor and an integrated circuit. Such a combination is called, for example, a microcontroller unit (MCU), a System-on-a-chip (SoC), a system LSI, or a chipset. In the communication device 1, the CPU 101 loads a program stored in the auxiliary storage unit 103 into the work area of the main storage unit 102 and controls the peripheral devices through execution of the program. The communication device 1 can thereby execute processing that serves a predetermined purpose. The main storage unit 102 and the auxiliary storage unit 103 are recording media readable by the communication device 1.

The main storage unit 102 is exemplified as a storage unit that is directly accessed by the CPU 101. The main storage unit 102 includes Random Access Memory (RAM) and Read Only Memory (ROM).

The auxiliary storage unit 103 stores various programs and various data on a recording medium in a readable and writable manner. The auxiliary storage unit 103 is also called an external storage device. The auxiliary storage unit 103 stores an operating system (OS), various programs, various tables, and the like. External devices include, for example, other information processing devices and external storage devices connected via a computer network or the like.

The auxiliary storage unit 103 is, for example, an Erasable Programmable ROM (EPROM), an Embedded MultiMediaCard (eMMC), a solid state drive (SSD), or a hard disk drive (HDD). The auxiliary storage unit 103 may also be, for example, a Compact Disc (CD) drive device, a Digital Versatile Disc (DVD) drive device, or a Blu-ray (registered trademark) Disc (BD) drive device.

The camera 104 is an imaging device capable of capturing at least one of moving images and still images. The camera 104 is, for example, a digital camera including a CCD sensor or a CMOS sensor. The camera 104 may store, for example, the shooting date and time acquired from the timekeeping unit 107 in the Exchangeable Image File Format (Exif) data of the captured imaging data. The camera 104 is arranged, for example, on the front side of the communication device 1 (the side facing the user).

The microphone 105 is an input unit that receives voice operation instructions and the like from the user. Voice input through the microphone 105 is subjected to voice analysis to extract information such as a name. The communication device 1 may include another input unit in addition to or in place of the microphone 105. Examples of other input units include a keyboard, a pointing device, a touch panel, and an acceleration sensor.

The speaker 106 is an output unit that outputs audio. The communication device 1 may include another output unit in addition to or in place of the speaker 106. Examples of other output units include output devices such as a Cathode Ray Tube (CRT) display, a Liquid Crystal Display (LCD), a Plasma Display Panel (PDP), an Electroluminescence (EL) panel, and an organic EL panel. By including the microphone 105 and the speaker 106, the communication device 1 according to the present embodiment enables voice communication (dialogue) with the user.

The timekeeping unit 107 is a circuit that generates time information. The timekeeping unit 107 is, for example, a clock built into the communication device 1.

The sensor 108 detects the approach of a person. An example of the sensor 108 is a motion sensor. A motion sensor detects the approach of a person by using, for example, infrared rays, ultrasonic waves, or visible light. When visible light is used to detect the approach of a person, an illuminance sensor may be adopted as the sensor 108.

<Processing blocks of communication device 1>
FIG. 2 is a diagram showing an example of the processing blocks of the communication device according to the embodiment. The communication device 1 includes a photographing unit 11, an estimation unit 12, a dialogue unit 13, a determination unit 14, a calculation unit 15, an update unit 16, an initial construction unit 17, a determination data model 18, and a management database 19. In the communication device 1, the CPU 101 executes a computer program loaded in executable form into the main storage unit 102, and thereby performs the processing of each of these units, namely the photographing unit 11, the estimation unit 12, the dialogue unit 13, the determination unit 14, the calculation unit 15, the update unit 16, the initial construction unit 17, the determination data model 18, and the management database 19.

The determination data model 18 is a learning model constructed by machine learning such as deep learning, using as its set of teacher data (teacher data set) face image data obtained by extracting the face region from imaging data of a photographed user. The determination data model 18 is constructed, for example, by inputting the teacher data set into a classifier such as a support vector machine (SVM). The determination data model 18 is used for user estimation by the estimation unit 12, which is described later. The determination data model 18 is constructed, for example, for each user, and the user's name can be used as the model name. The user's name is obtained by collating imaging data acquired by photographing the user against the determination data model 18. The determination data model 18 is an example of a "person estimation model".

The management database 19 is a database that manages the face image data used as teacher data of the determination data model 18. FIG. 3 is a diagram showing an example of the shooting condition management table stored in the management database in the embodiment. The shooting condition management table 191 illustrated in FIG. 3 is created, for example, for each photographed user. Each shooting condition management table 191 can be labeled with, for example, the user's name, and is identified by that label. The label is not limited to the user's name. Any information that is uniquely determined for each user, such as a numerical value or a character string other than the user's name, may be used. The shooting condition management table 191 manages the shooting conditions of each piece of face image data that has been used as teacher data of the determination data model 18, out of the face image data extracted from the imaging data acquired by the photographing unit 11.

The shooting condition management table 191 illustrated in FIG. 3 includes the items "PID", "shooting date", and "shooting condition". "PID" stores ID information that uniquely identifies the face image data extracted from the imaging data acquired when the camera 104 photographed the user. "Shooting date" stores the date on which the shooting was performed, that is, the date on which the imaging data from which the face image data was extracted was generated. "Shooting condition" stores information indicating the conditions under which the shooting was performed. In the example of FIG. 3, "shooting condition" includes three categories: "direction", "distance", and "brightness". "Direction" stores information indicating the orientation of the photographed user's face. In FIG. 3, five directions are exemplified: "front", "left", "right", "up", and "down". "Distance" stores information indicating the distance between the communication device 1 and the user. In FIG. 3, three levels are exemplified: "far", "medium", and "near". "Brightness" stores information indicating the brightness around the user at the time of shooting. In FIG. 3, three levels are exemplified: "bright", "medium", and "dark". The criteria for the five directions, the three distance levels, and the three brightness levels may be determined as appropriate according to characteristics of the camera 104 such as its f-number and focal length. In FIG. 3, for each of "direction", "distance", and "brightness", the bit of the item that applies to the imaging data is turned on. For example, the imaging data whose "PID" is 1 has the face direction "front", the distance between the communication device 1 and the user "medium", and the brightness around the user "medium".
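One possible in-memory form of a row of this table is sketched below in Python, with one bit list per category in which only the applicable condition is set to 1. The dictionary layout, the function name `to_row`, and the English condition names are assumptions made for illustration. The shooting date in the example is a placeholder, since no actual value from FIG. 3 is reproduced here.

```python
# Illustrative condition names corresponding to the three categories.
CONDITIONS = {
    "direction": ("front", "left", "right", "up", "down"),
    "distance": ("far", "medium", "near"),
    "brightness": ("bright", "medium", "dark"),
}


def to_row(pid, shot_date, direction, distance, brightness):
    """Build one table row with a one-hot bit list for each category."""
    chosen = {"direction": direction, "distance": distance, "brightness": brightness}
    row = {"pid": pid, "shot_date": shot_date}
    for category, options in CONDITIONS.items():
        if chosen[category] not in options:
            raise ValueError(f"unknown {category}: {chosen[category]}")
        row[category] = [1 if option == chosen[category] else 0 for option in options]
    return row


# PID 1: face direction "front", distance "medium", brightness "medium".
row = to_row(1, "2019-07-10", "front", "medium", "medium")  # date is a placeholder
```

Summing these bit lists column by column over all rows gives the number of teacher images for each condition.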

The shooting condition management table 191 further includes the items "total", "variance", and "total variance". "Total" stores, for each shooting-condition item, the total number of pieces of imaging data to which that item applies. "Variance" stores information (an index) indicating the variation in the appearance frequency across shooting conditions. Examples of such an index include the variance, the standard deviation, and the coefficient of variation. Either the unbiased variance or the sample variance may be used. The present embodiment adopts the variance as the index of variation across shooting conditions, but other indexes may be used. In the example of FIG. 3, "variance" holds the variance for "direction", the variance for "distance", and the variance for "brightness", corresponding to the three categories included in "shooting condition". "Total variance" stores the sum of these variances. The smaller the "total variance", the more evenly the teacher data are distributed over the conditions of the "direction", "distance", and "brightness" categories taken together. Likewise, the smaller the "variance" of a category, the closer to equal the numbers of teacher data belonging to each condition within that category (for example, "far", "medium", and "near" in the "distance" category).
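The "total", "variance", and "total variance" items can be reproduced with a few lines of Python, as in the sketch below. It assumes each teacher image is stored as a dictionary of condition names (a simplification of the bit columns), uses the population variance from the standard library as one of the permitted choices, and uses sample rows invented for the example rather than values from FIG. 3. The name `summarize` is hypothetical.

```python
from collections import Counter
from statistics import pvariance

# Illustrative condition names corresponding to the three categories.
CONDITIONS = {
    "direction": ("front", "left", "right", "up", "down"),
    "distance": ("far", "medium", "near"),
    "brightness": ("bright", "medium", "dark"),
}


def summarize(rows):
    """Return per-condition totals, per-category variances, and their sum."""
    totals, variances = {}, {}
    for category, options in CONDITIONS.items():
        counts = Counter(row[category] for row in rows)
        totals[category] = {option: counts[option] for option in options}
        variances[category] = pvariance(list(totals[category].values()))
    return totals, variances, sum(variances.values())


# Invented sample data: four teacher images for one user.
rows = [
    {"direction": "front", "distance": "medium", "brightness": "medium"},
    {"direction": "left", "distance": "far", "brightness": "bright"},
    {"direction": "front", "distance": "near", "brightness": "dark"},
    {"direction": "right", "distance": "medium", "brightness": "medium"},
]
totals, variances, total_variance = summarize(rows)
```

For these rows the direction counts are 2, 1, 1, 0, 0, whose population variance is 0.56, while the distance and brightness counts are each 1, 2, 1 with variance 2/9, so the total variance is about 1.004.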

When the sensor 108 detects the approach of a user, the photographing unit 11 photographs the user with the camera 104. The photographing unit 11 detects the face region in the imaging data captured by the camera 104 and acquires face image data by extracting that region. The face region is a region including at least the user's face (head) and may also include the area around the face. The photographing unit 11 stores the acquired face image data in the auxiliary storage unit 103 in association with a PID that uniquely identifies the face image data.

The estimation unit 12 uses the determination data model 18 to estimate the person photographed by the photographing unit 11 with the camera 104. For example, the estimation unit 12 collates the face image data acquired by the photographing unit 11 against the determination data model 18 and estimates the person's name. The estimation unit 12 is an example of an "estimation unit".

The dialogue unit 13 interacts with the user estimated by the estimation unit 12. For example, the dialogue unit 13 outputs an inquiry that includes the name of the person estimated by the estimation unit 12, such as a greeting containing the user's name output from the speaker 106. The dialogue unit 13 acquires the user's response to the inquiry via the microphone 105. For example, the dialogue unit 13 performs voice analysis on the acquired response and obtains the name included in it. The dialogue unit 13 is an example of a "dialogue unit".

The determination unit 14 determines, based on the user's response acquired by the dialogue unit 13, whether the responding user matches the person estimated by the estimation unit 12. When the name included in the inquiry does not match the name acquired by the dialogue unit 13, the determination unit 14 determines that the responding user does not match the estimated person. When the two names match, the determination unit 14 determines that the responding user matches the estimated person. The determination unit 14 may also determine that the responding user matches the estimated person when the dialogue unit 13 receives a response that does not deny the name included in the inquiry.

As a specific example of a mismatch between the responding user and the person estimated by the estimation unit 12, suppose the dialogue unit 13 outputs the inquiry "Good morning, Mr. A-yama B-taro" and the user responds "No, I am C-da D-o". In this case, the name "C-da D-o" included in the user's response does not match the name "A-yama B-taro" included in the inquiry. In other words, the user's response includes a name different from the name estimated by the estimation unit 12. The determination unit 14 therefore determines that the responding user does not match the person estimated by the estimation unit 12.

As a specific example of a match between the responding user and the person estimated by the estimation unit 12, suppose the dialogue unit 13 outputs the inquiry "Good morning, Mr. A-yama B-taro" and the user responds "Yes, good morning" or "Yes, good morning, I am A-yama B-taro". In this case, the user's response does not deny the name included in the inquiry. The determination unit 14 therefore determines that the responding user matches the person estimated by the estimation unit 12.
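The match decision in the two examples above can be sketched in a few lines. This is a hedged illustration under stated assumptions: the function name, the `answered_name` value (the name extracted by voice analysis, or `None` when the response contains no name), and the `denied` flag are all hypothetical, and how a denial is detected in speech is outside the sketch.

```python
from typing import Optional

def is_same_person(estimated_name: str,
                   answered_name: Optional[str],
                   denied: bool = False) -> bool:
    """Return True when the response is consistent with the estimated name."""
    if answered_name is not None:
        # A name was spoken: compare it with the name used in the inquiry.
        return answered_name == estimated_name
    # No name was spoken: treat the estimate as correct unless it was denied.
    return not denied

# "No, I am C-da D-o" in reply to an inquiry naming A-yama B-taro.
print(is_same_person("A-yama B-taro", "C-da D-o"))
# "Yes, good morning" contains no name and no denial.
print(is_same_person("A-yama B-taro", None))
```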

The calculation unit 15 refers to the shooting condition management table 191 and calculates an index indicating the variation in appearance frequency, for each shooting condition, of the face image data used as teacher data for the determination data model 18. As described above, examples of such an index include the variance, the standard deviation, and the coefficient of variation; the present embodiment adopts the variance. The calculation unit 15 stores the calculated variance in the shooting condition management table 191.

When the determination unit 14 determines that the responding user does not match the person estimated by the estimation unit 12, the update unit 16 updates the teacher data used to construct the determination data model 18. The update unit 16 acquires the name included in the user's response and reads the determination data model 18 corresponding to that name from the auxiliary storage unit 103. Using the face image data acquired by the photographing unit 11, the update unit 16 updates the teacher data of the read determination data model 18 so that the variance of the shooting conditions of the teacher data becomes smaller. In other words, the update unit 16 updates the teacher data so that the determination data model 18 is constructed from teacher data photographed under a variety of shooting conditions. The teacher data is updated, for example, by replacing one piece of the teacher data of the determination data model 18 with the face image data acquired by the photographing unit 11.

The update unit 16 specifies which piece of the teacher data of the determination data model 18 read from the auxiliary storage unit 103 is to be replaced. For example, the update unit 16 causes the calculation unit 15 to calculate, for each piece of teacher data of the determination data model 18, the variance of the shooting conditions that would result if that piece were replaced with the face image data acquired by the photographing unit 11. The update unit 16 specifies, as the replacement target, teacher data for which the calculated variance is smaller than the variance before replacement. Preferably, the update unit 16 specifies as the replacement target the teacher data whose replacement with the acquired face image data yields the smallest variance of the shooting conditions. The update unit 16 creates a new teacher data set by replacing the specified teacher data with the face image data acquired by the photographing unit 11, and updates the determination data model 18 by inputting the new teacher data set into a classifier such as an SVM. If replacing any piece of the teacher data used to construct the determination data model 18 with the acquired face image data would not reduce the variance of the shooting conditions, the update unit 16 need not replace any teacher data.
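The search for a replacement target can be sketched as a simple trial of every candidate. This is a minimal sketch under assumptions, not the method of the embodiment as implemented: the record layout, the condition labels, and the function names are hypothetical, the population variance is used, and the strict "smaller than before" criterion from the paragraph above is applied.

```python
from statistics import pvariance

# Assumed condition labels for the three categories.
CATEGORIES = {
    "direction": ["front", "left", "right", "up", "down"],
    "distance": ["far", "medium", "near"],
    "brightness": ["bright", "medium", "dark"],
}

def total_variance(records):
    """Sum over categories of the variance of per-condition image counts."""
    total = 0.0
    for cat, conditions in CATEGORIES.items():
        counts = [sum(1 for r in records if r[cat] == c) for c in conditions]
        total += pvariance(counts)
    return total

def find_replacement(records, new_record):
    """Index of the record whose replacement minimizes the total variance.

    Returns None when no replacement makes the total variance smaller.
    """
    best_index, best_total = None, total_variance(records)
    for i in range(len(records)):
        trial = records[:i] + [new_record] + records[i + 1:]
        trial_total = total_variance(trial)
        if trial_total < best_total:  # strict improvement only
            best_index, best_total = i, trial_total
    return best_index

front = {"direction": "front", "distance": "medium", "brightness": "medium"}
left = {"direction": "left", "distance": "near", "brightness": "bright"}
teacher = [dict(front), dict(front), dict(front), dict(left)]
new_image = {"direction": "right", "distance": "far", "brightness": "dark"}
print(find_replacement(teacher, new_image))
```

Because only strict improvements are accepted, ties between equally good candidates resolve to the first one found; the embodiment describes a different tie-break based on shooting date.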

A specific example of how the update unit 16 specifies the teacher data to be replaced is described here. FIG. 4 is a diagram showing an example of the shooting conditions of the face image data acquired by the photographing unit in the embodiment. As in FIG. 3, FIG. 4 includes the items "PID", "shooting date", and "shooting conditions", and the "shooting conditions" item includes the items "direction", "distance", and "brightness". These items are the same as in FIG. 3, so their description is omitted. For each piece of imaging data (that is, teacher data) registered in the shooting condition management table 191, the update unit 16 causes the calculation unit 15 to calculate the variance that would result if that piece were replaced with the face image data acquired by the photographing unit 11.

FIG. 5 is a diagram showing an example of the variances obtained when teacher data is replaced. For explanation, FIG. 5 shows, together with the shooting condition management table 191, the total variance that results when each piece of teacher data is replaced with the face image data acquired by the photographing unit 11, labeled "post-update variance". FIG. 5 shows that replacing any piece of teacher data improves on the current total variance of "42.7" (that is, the total variance becomes smaller). The update unit 16 may therefore specify any piece of the teacher data as the update target.

Further, referring to FIG. 5, when the teacher data of PID "1" or PID "16", for example, is replaced with the face image data acquired by the photographing unit 11, the total variance becomes "42.7", which is the smallest total variance obtainable after replacement. That is, this replacement minimizes the variation in appearance frequency among the shooting conditions. It is therefore more preferable to specify the teacher data of PID "1" or PID "16" as the replacement target.

When multiple pieces of teacher data minimize the total variance after replacement, as with the teacher data of PID "1" and PID "16", the update unit 16 may, for example, specify the teacher data with the oldest shooting date and time as the replacement target. In the case of FIG. 5, the teacher data of PID "1" was shot on July 9, 2017, and the teacher data of PID "16" was shot on July 7, 2017. Because the teacher data of PID "16" is older than that of PID "1", the update unit 16 specifies the teacher data of PID "16" as the replacement target. The update unit 16 is an example of an "update unit".
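The tie-break by shooting date can be sketched as follows, using the two dates given for FIG. 5. The function name and the `(pid, shooting_date)` tuple layout are assumptions made for illustration.

```python
from datetime import date

def pick_oldest(candidates):
    """Among equally good candidates, return the PID with the oldest shooting date.

    candidates: list of (pid, shooting_date) pairs.
    """
    return min(candidates, key=lambda c: c[1])[0]

# PID 1 was shot on July 9, 2017 and PID 16 on July 7, 2017.
tied = [(1, date(2017, 7, 9)), (16, date(2017, 7, 7))]
print(pick_oldest(tied))  # prints 16
```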

The initial construction unit 17 performs the initial construction of the determination data model 18, using as a teacher data set a plurality of pieces of imaging data captured by the camera 104 under various shooting conditions. The initial construction unit 17 starts the initial construction of the determination data model 18 when, for example, the user instructs it to start. The initial construction of the determination data model 18 includes, for example, newly constructing a model for estimating a user. The initial construction unit 17 acquires imaging data under various shooting conditions by, for example, outputting instructions such as "Please turn to the right" and "Please come closer" to the user from the speaker 106 and then photographing the user. The initial construction unit 17 performs face recognition processing on the imaging data acquired by having the camera 104 photograph the user, and extracts from the imaging data the face image data in which the user's face appears. The initial construction unit 17 stores the extracted face image data in the auxiliary storage unit 103 in association with a PID that uniquely identifies the face image data. The initial construction unit 17 then inputs the plurality of pieces of face image data stored in the auxiliary storage unit 103 into a classifier such as an SVM as a teacher data set, thereby performing the initial construction of the determination data model 18.

The initial construction unit 17 stores the constructed determination data model 18 in the auxiliary storage unit 103. Because the storage capacity of the auxiliary storage unit 103 is finite, the capacity available to the determination data model 18 is limited, and so is the number of teacher data that can be used to construct the determination data model 18. The number of teacher data used to construct the determination data model 18 may be determined as appropriate according to the capacity of the auxiliary storage unit 103. In the present embodiment, the number of teacher data used to construct the determination data model 18 is, for example, 20.

The initial construction unit 17 further determines the shooting conditions of each piece of face image data. Examples of the shooting conditions are the "direction", "distance", and "brightness" illustrated for the shooting condition management table 191 with reference to FIG. 3. For the shooting condition "direction", the initial construction unit 17 determines, for example, whether the face is facing front, left, right, up, or down from the positional relationship between the face contour and the eyes, nose, and mouth in the face image data. For the shooting condition "distance", the initial construction unit 17 determines, for example, which of "far", "medium", and "near" applies based on the image size (number of pixels) of the face image data. For the shooting condition "brightness", the initial construction unit 17 determines, for example, which of "bright", "medium", and "dark" applies using the luminance distribution of the face image data. The initial construction unit 17 stores the PID of the face image data, the shooting date and time of the imaging data, and the determined shooting conditions in association with one another in the shooting condition management table 191.
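The "distance" and "brightness" judgments above can be illustrated with simple threshold rules. This is only a sketch under assumptions: the threshold values and function names are hypothetical (the text leaves the criteria to the characteristics of the camera), and the mean luminance stands in for the luminance distribution as a simplification.

```python
def classify_distance(width, height, near_min=160 * 160, far_max=64 * 64):
    """Classify distance from the pixel count of the extracted face image.

    A larger face image is assumed to mean the user is closer to the camera.
    """
    pixels = width * height
    if pixels >= near_min:
        return "near"
    if pixels <= far_max:
        return "far"
    return "medium"

def classify_brightness(luma, bright_min=170.0, dark_max=85.0):
    """Classify brightness from 8-bit luminance samples of the face image."""
    mean = sum(luma) / len(luma)
    if mean >= bright_min:
        return "bright"
    if mean <= dark_max:
        return "dark"
    return "medium"

print(classify_distance(100, 100), classify_brightness([120, 130]))
```

The "direction" judgment needs facial landmark positions and is omitted here.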

<Processing flow>
The processing flow of the communication device 1 according to the embodiment described above is explained below with reference to the drawings.

(Initial construction of determination data model 18)
FIG. 6 is a diagram showing the processing flow of the initial construction of the determination data model in the embodiment. The processing flow of FIG. 6 starts, for example, when the user instructs the start of initial construction. The processing flow of the initial construction of the determination data model 18 in the embodiment is described below with reference to FIG. 6.

In T1, the initial construction unit 17 outputs an instruction to the user (for example, "Please turn to the right" or "Please come closer") from the speaker 106 and causes the camera 104 to photograph the user. In T2, the initial construction unit 17 detects a face region in the imaging data captured in T1. In T3, the initial construction unit 17 determines whether the imaging data captured in T1 includes a face, that is, whether a face region was detected in T2. If the imaging data includes a face (YES in T3), the process proceeds to T4. If the imaging data does not include a face (NO in T3), the process returns to T1.

In T4, the initial construction unit 17 extracts face image data from the imaging data captured in T1. In T5, the initial construction unit 17 extracts the shooting conditions of the face image data extracted in T4. For example, the initial construction unit 17 analyzes the face image data extracted in T4 and extracts shooting conditions such as the orientation of the user's face, the distance between the communication device 1 and the user, and the brightness (illuminance) around the user.

In T6, the initial construction unit 17 acquires the shooting date and time of the imaging data captured in T1, for example from the Exif data of the imaging data. In T7, the initial construction unit 17 stores the face image data extracted in T4 in the auxiliary storage unit 103 in association with a PID that uniquely identifies the face image data.

In T8, the initial construction unit 17 updates the shooting condition management table 191 by associating the shooting conditions extracted in T5, the shooting date and time acquired in T6, and the PID that uniquely identifies the face image data. The processing from T1 to T8 is repeated, for example, 20 times, so that 20 pieces of face image data photographed under various shooting conditions are collected.

In T9, the initial construction unit 17 constructs the determination data model 18 by inputting the 20 pieces of face image data stored in the auxiliary storage unit 103 in T6 into a classifier such as an SVM as a teacher data set.

(Update of determination data model 18)
FIGS. 7 and 8 are diagrams showing an example of the processing flow of the update processing of the determination data model in the embodiment. "A" in FIG. 7 connects to "A" in FIG. 8, and "B" in FIG. 7 connects to "B" in FIG. 8. An example of the processing flow of the update processing of the determination data model 18 is described below with reference to FIGS. 7 and 8.

The processing from T1 to T4 is the same as the processing from T1 to T3 in FIG. 6, except that it is performed not by the initial construction unit 17 but by the photographing unit 11 after the sensor 108 has detected the approach of the user. Its description is therefore omitted.

In T11, the estimation unit 12 estimates the user's name from the face image data extracted in T4, using the determination data model 18. The determination data model 18 has already been constructed, for example, by the initial construction processing described with reference to FIG. 6.

In T12, the dialogue unit 13 outputs from the speaker 106 an inquiry that includes the user's name estimated in T11. The inquiry may be, for example, a greeting. For example, the dialogue unit 13 outputs the spoken inquiry "Good morning, Mr. A-yama B-taro" from the speaker 106. The dialogue unit 13 receives the user's response to the inquiry via the microphone 105 and acquires the user's name from the received response.

In T13, the determination unit 14 determines, based on the response received by the dialogue unit 13 in T12, whether the user's name estimated by the estimation unit 12 in T11 was correct. If it was correct (YES in T13), the process ends without updating the determination data model 18. If it was not correct (NO in T13), the process proceeds to T4.

The processing from T5 to T7 is the same as the processing from T4 to T6 in FIG. 6, except that it is performed by the update unit 16 instead of the initial construction unit 17, so its description is omitted. In T14, the update unit 16 specifies the shooting condition management table 191 labeled with the user's name acquired in T12. For each piece of teacher data registered in the specified shooting condition management table 191, the update unit 16 causes the calculation unit 15 to calculate the total variance of the shooting conditions that would result if that piece were replaced with the face image data extracted in T4. The update unit 16 specifies, as the teacher data to be replaced, teacher data for which the total variance of the shooting conditions after replacement with the face image data extracted in T2 is equal to or less than the total variance before replacement.

If replacing any piece of teacher data would make the total variance of the shooting conditions larger than before replacement, the update unit 16 may determine that there is no teacher data to be updated. If there is teacher data to be updated (YES in T14), the process proceeds to T15. If there is no teacher data to be updated (NO in T14), the process ends without updating the determination data model 18.

In T15, the update unit 16 determines a new teacher data set by replacing the teacher data specified in T14, among the teacher data of the determination data model 18 whose model name is the user's name acquired in T12, with the face image data extracted in T4. In T16, the update unit 16 inputs the teacher data set determined in T15 into the SVM and updates the determination data model 18.
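The control flow from T11 to T16 can be summarized in a short sketch. Everything here is a hypothetical illustration: the function name, the callables passed in (`estimate`, `ask`, `find_target`, `retrain`), the `teacher_sets` dictionary keyed by user name, and the returned status strings are assumptions, and the handling of a name with no registered model is not described in the text.

```python
def update_flow(face_record, estimate, ask, find_target, retrain, teacher_sets):
    """Run one pass of the update flow and return a short status string."""
    estimated_name = estimate(face_record)       # T11: estimate the user's name
    answered_name = ask(estimated_name)          # T12: None means "not denied"
    if answered_name is None or answered_name == estimated_name:
        return "match"                           # T13 YES: no update needed
    records = teacher_sets.get(answered_name)
    if records is None:
        return "unknown user"                    # assumption: no model for this name
    index = find_target(records, face_record)    # T14: pick the replacement target
    if index is None:
        return "no target"                       # T14 NO: end without updating
    records[index] = face_record                 # T15: swap in the new face image
    retrain(answered_name, records)              # T16: rebuild the model
    return "updated"

# Stub callables standing in for the estimation, dialogue, and update units.
sets = {"C-da D-o": ["old1", "old2"]}
trained = []
status = update_flow(
    "new",
    estimate=lambda face: "A-yama B-taro",
    ask=lambda name: "C-da D-o",
    find_target=lambda records, face: 1,
    retrain=lambda name, records: trained.append((name, list(records))),
    teacher_sets=sets,
)
print(status, sets)
```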

<Effects of the embodiment>
In the embodiment, the communication device 1 estimates a user based on the imaging data (face image data) captured by the photographing unit 11 and outputs an inquiry that includes the estimated user's name. When the response to the inquiry indicates that the estimated person and the user are different, the communication device 1 replaces one piece of the teacher data with the imaging data captured by the photographing unit 11 so that the total variance of the shooting conditions of the teacher data used for the determination data model 18 is equal to or less than the total variance before replacement. That is, according to the embodiment, the determination data model 18 can be updated in the course of a dialogue with the user without giving the user instructions about shooting conditions, which reduces the burden on the user when the determination data model 18 is updated.

The communication device 1 according to the embodiment can reduce the burden on the user when the determination data model 18 is updated even if the device has no display. If the communication device 1 has a display, the user can use it to check whether the face image data used as teacher data for the determination data model 18 was photographed under a variety of shooting conditions (that is, whether the variance of the shooting conditions is small) and can identify which shooting conditions are lacking in the teacher data. If the communication device 1 has no display, however, it is difficult for the user to check the face image data used as teacher data. According to the embodiment, the update unit 16 performs updates so that face image data photographed under a variety of shooting conditions becomes the teacher data, so a determination data model 18 suitable for estimating a person can be constructed without the user checking or correcting the teacher data.

Because the variance indicates the variation of the teacher data among the shooting conditions, the update unit 16 can also be said to replace one piece of the teacher data with the imaging data captured by the photographing unit 11 so as to reduce the variation in appearance frequency among the shooting conditions. By replacing teacher data in this way, the determination data model 18 can use imaging data (face image data) photographed under a variety of shooting conditions as teacher data. Because the determination data model 18 is constructed from such teacher data, the communication device 1 can improve the accuracy of user estimation under various conditions.

In addition, because teacher data is replaced as described above, even if the number of teacher data is limited to a predetermined number (20 in the embodiment), the update unit 16 can construct the determination data model 18 using, within that limited number, imaging data photographed under a variety of shooting conditions as teacher data. According to the embodiment, the accuracy of user estimation can therefore be improved even when the number of teacher data is limited. Furthermore, because the determination data model 18 is constructed from a limited number of teacher data, the main storage unit 102 and the auxiliary storage unit 103 are used more efficiently than when no limit is placed on the number of teacher data.

In the embodiment, when multiple pieces of teacher data yield the same variance after being replaced with the face image data acquired in T2 of FIG. 5, the update unit 16 specifies the teacher data with the oldest shooting date and time as the replacement target. A user's face changes over time through growth, aging, and the like. By preferentially replacing teacher data with older shooting dates, the embodiment can improve the accuracy with which the communication device 1 estimates a person.

<First modification>
In the embodiment, the face image data acquired in T4 of FIG. 7 and the teacher data are swapped based on the total variance of the shooting conditions. In the first modification, they are swapped based on the variance calculated for each of the categories of the shooting conditions. Components identical to those of the embodiment are given the same reference numerals, and their description is omitted. The first modification is described below with reference to the drawings.

FIG. 9 is a diagram showing an example of the processing blocks of the communication device according to the first modification. The communication device 1a illustrated in FIG. 9 differs from the communication device 1 according to the embodiment in that it includes an update unit 16a in place of the update unit 16.

The update unit 16a refers to the shooting condition management table 191 of the management database 19 and identifies, among the shooting condition categories ("direction", "distance", and "brightness" in the example of FIG. 3), the category with the largest variance. The update unit 16a then identifies teacher data whose replacement with the face image data acquired in T2 of FIG. 5 would make the variance of the identified category equal to or less than its value before the replacement. The update unit 16a replaces the identified teacher data with the face image data acquired in T2 of FIG. 5 to determine a new teacher data set. The update unit 16a inputs the new teacher data set into the SVM and updates the determination data model 18.
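The selection performed by the update unit 16a can be illustrated with the following sketch. It rests on several assumptions that do not come from the source: the level names in `LEVELS` are hypothetical (the text only names the categories and the number of levels), the variance is taken to be the population variance of the appearance counts of each level, and when several items qualify the sketch picks the one that leaves the lowest variance. The SVM retraining step is omitted.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical level names; only the category names and level counts come from the text.
LEVELS = {
    "direction": ["front", "left", "right", "up", "down"],
    "distance": ["near", "middle", "far"],
    "brightness": ["bright", "normal", "dark"],
}


def category_variance(items, category):
    """Population variance of how often each level of one category appears."""
    counts = Counter(item[category] for item in items)
    return pvariance([counts[level] for level in LEVELS[category]])


def choose_replacement(teacher, new_item):
    """Return the index of the teacher item to replace, or None if none qualifies."""
    # Step 1: find the category whose appearance counts are most uneven.
    worst = max(LEVELS, key=lambda c: category_variance(teacher, c))
    before = category_variance(teacher, worst)
    # Step 2: try swapping each teacher item for the new image and keep the
    # swap that leaves the variance of that category lowest (and not higher).
    best_index, best_after = None, None
    for i in range(len(teacher)):
        trial = teacher[:i] + [new_item] + teacher[i + 1:]
        after = category_variance(trial, worst)
        if after <= before and (best_after is None or after < best_after):
            best_index, best_after = i, after
    return best_index


teacher = [
    {"direction": "front", "distance": "near", "brightness": "bright"},
    {"direction": "front", "distance": "middle", "brightness": "normal"},
    {"direction": "front", "distance": "far", "brightness": "dark"},
    {"direction": "left", "distance": "near", "brightness": "bright"},
]
new_item = {"direction": "right", "distance": "middle", "brightness": "normal"}
index_to_replace = choose_replacement(teacher, new_item)
```

In this example "direction" is the most uneven category (three of four images face front), so a front-facing item is the one selected for replacement by the right-facing image.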

According to the first modification, selecting the teacher data to be replaced based on the variance of each shooting condition category makes it possible to improve estimation accuracy in a category where the accuracy is lower than in the other categories.

<Other modifications>
In the embodiment, a greeting that includes the user's full name, such as "Good morning, Mr. A-yama B-taro", was given as an example of a question output by the dialogue unit 13, but the question is not limited to a greeting. The dialogue unit 13 may, for example, call out "Mr. A-yama B-taro, here is your health data from yesterday." Further, the imaging unit 11 may photograph the user who has turned toward the communication device 1 in response to the call, and the estimation unit 12 may estimate the user once more. If the result of this second estimation is "C-da D-o" rather than "A-yama B-taro", the update unit 16 may update the teacher data of the determination data model 18 corresponding to "C-da D-o".

The communication device 1 may hold information such as the user's schedule and hobbies in the auxiliary storage unit 103 and ask the user a question based on that information. For example, when the auxiliary storage unit 103 holds the fact that the user "A-yama B-taro" went to a concert yesterday, the communication device 1 outputs the question "Did you enjoy yesterday's concert?" from the speaker 106. When it receives a response that does not deny having gone to the concert, for example "Yes, it was fun", it can determine that the estimate that the user is "A-yama B-taro" was valid. When it receives a response that denies having gone to the concert, for example "No, I didn't go to a concert", it can determine that the estimate that the user is "A-yama B-taro" was not valid.

Another example of using information such as schedules and hobbies is the case where the person estimated by the estimation unit 12 took 10,000 steps yesterday. In this case, the dialogue unit 13 outputs the question "You walked a lot yesterday, 10,000 steps." When the response "Yes, it was good exercise" is received, the determination unit 14 can determine that the estimation by the estimation unit 12 is valid. When the response "No, I didn't walk that much" is received, the determination unit 14 can determine that the estimation by the estimation unit 12 was not valid.

A further example of using information such as schedules and hobbies is the case where an event about XX, a favorite of the user estimated by the estimation unit 12 (A-yama B-taro), is to be held tomorrow. In this case, the dialogue unit 13 outputs the question "An event for XX, which you love, Mr. A-yama B-taro, is being held tomorrow." When a response such as "I'll skip it this time" is received, which does not deny that "A-yama B-taro" likes XX, the determination unit 14 can determine that the estimation by the estimation unit 12 is valid. When a response such as "I don't like XX" is received, which denies that "A-yama B-taro" likes XX, the determination unit 14 can determine that the estimation by the estimation unit 12 was not valid. Information such as the date of an event related to XX may be obtained by having the communication device 1 search an information source such as the Internet as appropriate.

The communication device 1 may also include the question in its response to an instruction from the user. For example, in response to the instruction "Turn on the lights", the communication device 1 may reply "Yes, Mr. A-yama B-taro, understood." In this case, when a response denying that the user is A-yama B-taro is received, such as "No, I am C-da D-o", it can be determined that the estimate that the user is "A-yama B-taro" was not valid. When the user gives no response within a predetermined time, the user has not denied the estimated name, so the communication device 1 may determine that the estimate that the user is "A-yama B-taro" is valid. As a similar example, on receiving the instruction "Tell me yesterday's step count" from the user, the communication device 1 may reply "Yes. Here is the step count for Mr. A-yama B-taro."
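The validity decision described above, including the case of no response within the predetermined time, might be sketched as follows. This is a sketch under stated assumptions: the function names are hypothetical, a missing response is represented by `None`, and the check for a denial is left to a caller-supplied predicate because the source does not describe how a denial is recognized. The `toy_is_denial` predicate is a deliberately simplistic stand-in for illustration only.

```python
from typing import Callable, Optional


def estimation_is_valid(response: Optional[str],
                        is_denial: Callable[[str], bool]) -> bool:
    """Decide whether the person estimate is treated as valid.

    response is None when the user did not answer within the predetermined time;
    silence does not deny the estimated name, so the estimate is treated as valid.
    """
    if response is None:
        return True
    return not is_denial(response)


def toy_is_denial(text: str) -> bool:
    """Toy predicate (assumption): a reply whose first clause is 'No' is a denial."""
    return text.split(",")[0].strip().lower() == "no"


no_reply = estimation_is_valid(None, toy_is_denial)
denied = estimation_is_valid("No, I am C-da D-o", toy_is_denial)
accepted = estimation_is_valid("Yes, it was fun", toy_is_denial)
```

Passing the denial check in as a parameter keeps the decision rule separate from whatever speech or language processing a real device would use.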

To acquire face image data under various shooting conditions, the communication device 1 may prompt the user to do an exercise. Examples of such prompts include "Let's do a little exercise. Slowly roll your neck around three times" and "Let's do a little exercise. Close your eyes tightly for 10 seconds."

Even when the estimation by the estimation unit 12 is valid, the communication device 1 may replace the teacher data if the teacher data of the determination data model 18 has not been replaced for a predetermined period (for example, one year). As described above, a user's face changes over time due to growth, aging, and the like, so replacing the teacher data after a certain period has elapsed is expected to keep the estimation accuracy of the estimation unit 12 high.
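The condition for triggering a replacement, combining the invalid-estimate case with the elapsed-period case above, can be sketched as follows. The function name and parameters are hypothetical, and the one-year period is taken from the example in the text and approximated here as 365 days.

```python
from datetime import datetime, timedelta

# "For example, one year" in the text; 365 days is an approximation chosen for this sketch.
REFRESH_PERIOD = timedelta(days=365)


def should_replace(estimation_valid: bool,
                   last_replaced: datetime,
                   now: datetime) -> bool:
    """Replace teacher data when the estimate was wrong, or when the data is stale."""
    if not estimation_valid:
        return True
    return now - last_replaced >= REFRESH_PERIOD


stale = should_replace(True, datetime(2019, 1, 1), datetime(2020, 6, 1))
fresh = should_replace(True, datetime(2020, 1, 1), datetime(2020, 6, 1))
wrong = should_replace(False, datetime(2020, 1, 1), datetime(2020, 1, 2))
```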

In the embodiment, the dialogue unit 13 outputs a question that includes the full name of the person estimated by the estimation unit 12. However, the question output by the dialogue unit 13 may include, instead of the user's full name, other information indicating the estimated person. Examples of such information include the user's family name, given name, and nickname.

In the embodiment, three shooting condition categories were given, namely "direction", "distance", and "brightness", but other categories may be included in the shooting conditions in place of, or in addition to, these. Examples of other categories include "presence or absence of a hat", "presence or absence of eyeglasses", and "facial expression". Further, in the embodiment, "direction" was divided into five directions, "distance" into three levels, and "brightness" into three levels, but each shooting condition category may be divided more finely or more coarsely.

The embodiment and modifications disclosed above can be combined with one another.

<<Computer-readable recording medium>>
An information processing program that causes a computer or other machine or device (hereinafter, a computer or the like) to realize any of the above functions can be recorded on a recording medium readable by the computer or the like. The function can then be provided by having the computer or the like read and execute the program on this recording medium.

Here, a recording medium readable by a computer or the like is a recording medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and from which the computer or the like can read that information. Among such recording media, those removable from the computer or the like include, for example, a flexible disk, a magneto-optical disk, a Compact Disc Read Only Memory (CD-ROM), a Compact Disc-Recordable (CD-R), a Compact Disc-ReWritable (CD-RW), a Digital Versatile Disc (DVD), a Blu-ray Disc (BD), a Digital Audio Tape (DAT), an 8 mm tape, and memory cards such as flash memory. Recording media fixed to the computer or the like include a hard disk and a ROM.

1, 1a ... Communication device
11 ... Imaging unit
12 ... Estimation unit
13 ... Dialogue unit
14 ... Determination unit
15 ... Calculation unit
16, 16a ... Update unit
17 ... Initial construction unit
18 ... Determination data model
19 ... Management database
191 ... Shooting condition management table
101 ... CPU
102 ... Main storage unit
103 ... Auxiliary storage unit
104 ... Camera
105 ... Microphone
106 ... Speaker
107 ... Timekeeping unit
108 ... Sensor

Claims

A dialogue device that estimates a user photographed by a camera and conducts a dialogue with the estimated user, the dialogue device comprising:
a person estimation model generated by using, as teacher data, a plurality of pieces of imaging data obtained by photographing persons under a plurality of shooting conditions;
an estimation unit that estimates the user photographed by the camera by using the person estimation model;
a dialogue unit that outputs a question including information indicating the estimated person; and
an update unit that, when a response to the question indicates that the user and the estimated person are different, replaces one piece of the teacher data with imaging data acquired when the user was photographed by the camera, such that variation in the appearance frequency, for each shooting condition, of the imaging data serving as the teacher data is reduced.

The dialogue device according to claim 1, wherein the dialogue unit outputs the question including the information indicating the estimated person as a response to an instruction from the user.

The dialogue device according to claim 1 or 2, wherein the estimation unit causes the camera to photograph again the user who responded to the question, and estimates the user again by using the imaging data thus captured.

The dialogue device according to any one of claims 1 to 3, wherein, when there are a plurality of pieces of teacher data whose replacement with the imaging data captured by the camera can reduce the variation in the appearance frequency for each shooting condition, the update unit preferentially selects, as the piece to be replaced with the imaging data captured by the camera, the piece of teacher data that yields the larger reduction in the variation.

The dialogue device according to any one of claims 1 to 4, wherein, when there are a plurality of pieces of teacher data whose replacement with the imaging data captured by the camera can reduce the variation in the appearance frequency for each shooting condition, the update unit preferentially selects, as the piece to be replaced with the imaging data captured by the camera, the piece of teacher data captured at the older time.

The dialogue device according to any one of claims 1 to 5, wherein, even when the response to the question indicates that the user and the estimated person are the same, the update unit replaces teacher data with the imaging data captured by the camera, if a predetermined period has elapsed since the teacher data was last replaced, such that the variation in the appearance frequency, for each shooting condition, of the imaging data serving as the teacher data is reduced.

An information processing method in which a computer that estimates a user photographed by a camera and conducts a dialogue with the estimated user:
estimates the user photographed by the camera by using a person estimation model generated by using, as teacher data, a plurality of pieces of imaging data obtained by photographing persons under a plurality of shooting conditions;
outputs a question including information indicating the estimated person; and
when a response to the question indicates that the user and the estimated person are different, replaces one piece of the teacher data with imaging data acquired when the user was photographed by the camera, such that variation in the appearance frequency, for each shooting condition, of the imaging data serving as the teacher data is reduced.

An information processing program that causes a computer that estimates a user photographed by a camera and conducts a dialogue with the estimated user to execute a process comprising:
estimating the user photographed by the camera by using a person estimation model generated by using, as teacher data, a plurality of pieces of imaging data obtained by photographing persons under a plurality of shooting conditions;
outputting a question including information indicating the estimated person; and
when a response to the question indicates that the user and the estimated person are different, replacing one piece of the teacher data with imaging data acquired when the user was photographed by the camera, such that variation in the appearance frequency, for each shooting condition, of the imaging data serving as the teacher data is reduced.