JP2019125305A

JP2019125305A - Support device for creating teacher data

Info

Publication number: JP2019125305A
Application number: JP2018007187A
Authority: JP
Inventors: 克繁中田; Katsushige Nakada
Original assignee: Aisin Seiki Co Ltd
Current assignee: Aisin Corp
Priority date: 2018-01-19
Filing date: 2018-01-19
Publication date: 2019-07-25

Abstract

To provide a teacher data creation support device that simplifies the work of accurately creating teacher data containing the correspondence between an image of a human face and the expression of that face. SOLUTION: In a teacher data creation support device, a support screen 300 displayed by a display processing unit includes: a first region 310 in which a plurality of first images extracted from a plurality of still images are displayed; a second region 320 in which, when a user's operation for selecting one of the first images is received, a plurality of second images that lie, among the plurality of still images, within a predetermined range in the time series relative to the selected first image are displayed; and a third region 330 in which a plurality of candidate expressions that can be associated with the human face shown in each of the second images are displayed. SELECTED DRAWING: Figure 3

Description

Embodiments of the present invention relate to a teacher data creation support device.

Learned models that output some information (a label) in response to an input image are conventionally known, and various techniques have been studied for supporting the creation of the teacher data needed to generate such a learned model by supervised learning.

JP 2016-76073 A
JP 2010-91401 A

However, to create such teacher data accurately, a user must ultimately look at each image, decide which label to associate with it, and manually associate the image with the label based on that decision. The work of creating teacher data is therefore generally cumbersome.

In particular, when associating an image showing a human face with the expression of that face, it is necessary to consider not only a single image but also the other images before and after it in the time series, to visually analyze changes in expression that are difficult to extract by general image processing, and to decide what association to make. The work of accurately creating teacher data in this case is therefore very cumbersome.

Accordingly, one object of the embodiment is to provide a teacher data creation support device that can simplify the work of accurately creating teacher data including the correspondence between an image showing a human face and the expression of that face.

A teacher data creation support device as an example of the embodiment is a device for supporting the creation of teacher data used for supervised learning, and includes: an acquisition unit that acquires a moving image showing a human face; an extraction unit that extracts a plurality of still images from the moving image, frame by frame; a display processing unit that displays, on a display unit, a support screen for supporting the association between the human face shown in each of the plurality of still images and the expression of that face; an operation reception unit that receives a user's operation on the support screen via an operation input unit; and an association processing unit that performs the association in response to the user's operation on the support screen. The support screen includes: a first region in which a plurality of first images extracted from the plurality of still images are displayed; a second region in which, when a user's operation for selecting one of the plurality of first images is received, a plurality of second images that lie, among the plurality of still images, within a predetermined range in the time series relative to the selected first image are displayed; and a third region in which a plurality of candidate expressions that can be associated with the human face shown in each of the plurality of second images are displayed.

According to the teacher data creation support device described above, the support screen makes it possible to simplify the work of accurately creating teacher data including the correspondence between an image showing a human face and the expression of that face.

In the teacher data creation support device described above, the plurality of first images are images at a plurality of time points at which, when the plurality of still images are viewed in time series, the estimated expression changed by a predetermined amount or more. They are extracted based on the result of applying, to the plurality of still images, processing that uses an expression recognition library, which outputs an estimate of the expression of a human face when an image showing that face is input. With this configuration, the work of associating images with expressions can be carried out using, as reference points, images in which the expression is likely to have changed.

In the teacher data creation support device described above, the plurality of first images may also be a plurality of images spaced at a predetermined time interval when the plurality of still images are viewed in time series. With this configuration, the first images can be extracted easily.

In the teacher data creation support device described above, the display processing unit is configured so that the predetermined range from which the plurality of second images are extracted can be changed for each selected first image. With this configuration, for example, the number of second images extracted to check changes in expression, including the time series before and after the time point corresponding to the first image, can be varied as needed, so images and expressions can be associated efficiently.

In the teacher data creation support device described above, when one of the plurality of candidates displayed in the third region is selected by a user's operation while one or more of the plurality of second images displayed in the second region are selected by a user's operation, the association processing unit collectively associates that candidate with the one or more second images. With this configuration, the workload can be reduced compared with, for example, performing the association individually for each of the plurality of second images.

In the teacher data creation support device described above, the acquisition unit acquires, together with the moving image, biological information of the person shown in the moving image at each time point of the moving image, and the support screen further includes a fourth region in which changes in the biological information within the predetermined range are displayed. With this configuration, the expression can be determined more accurately by additionally taking changes in the biological information into account.

In the teacher data creation support device described above, the acquisition unit acquires video of the interior of a vehicle in which a person is riding as the moving image, and also acquires vehicle information including the traveling state of the vehicle at each time point of the video, and the support screen further includes a fifth region in which changes in the vehicle information within the predetermined range are displayed. With this configuration, the expression can be determined more accurately by additionally taking changes in the vehicle information into account.

FIG. 1 is an exemplary and schematic block diagram showing the hardware configuration of the teacher data creation support device according to the embodiment.
FIG. 2 is an exemplary and schematic block diagram showing the functional configuration of the teacher data creation support device according to the embodiment.
FIG. 3 is an exemplary and schematic diagram showing the screen configuration of the support screen provided by the teacher data creation support device according to the embodiment.
FIG. 4 is an exemplary and schematic flowchart showing the flow of processing when the teacher data creation support device according to the embodiment displays the support screen on the display unit.
FIG. 5 is an exemplary and schematic diagram showing the flow of processing when the teacher data creation support device according to the embodiment displays detail images on the support screen.
FIG. 6 is an exemplary and schematic diagram showing the flow of processing when the teacher data creation support device according to the embodiment performs the association in response to an operation on the support screen.
FIG. 7 is an exemplary and schematic block diagram showing the configuration of a vehicle control system realized by using teacher data created by the teacher data creation support device according to the embodiment.

Embodiments are described below with reference to the drawings. The configurations of the embodiments described below, and the operations and results (effects) brought about by those configurations, are merely examples, and the invention is not limited to what is described below.

FIG. 1 is an exemplary and schematic block diagram showing the hardware configuration of the teacher data creation support device 100 according to the embodiment. The teacher data creation support device 100 is a device for supporting the creation of teacher data used for supervised learning. As described below, the teacher data creation support device 100 has the same hardware configuration as an ordinary computer such as a PC (personal computer).

As shown in FIG. 1, the teacher data creation support device 100 has, as its hardware configuration, a processor 101, a memory 102, an input/output interface (I/F) 103, and a storage 104. These hardware components are communicably connected to one another via a bus 150.

The processor 101 is, for example, a CPU (central processing unit), and comprehensively controls the operation of each part of the teacher data creation support device 100. The memory 102 includes, for example, a ROM (read-only memory) and a RAM (random access memory); it stores the data needed for the various processes executed by the processor 101 and provides a work area for the processor 101.

The input/output interface 103 is an interface to which an input device for inputting information to the teacher data creation support device 100 and an output device for outputting information from the teacher data creation support device 100 can be connected. The input device is, for example, an operation input unit 151 such as a mouse or a keyboard, and the output device is, for example, a display unit 152 such as an LCD (liquid crystal display) or an OELD (organic electroluminescent display). Needless to say, the components connectable to the input/output interface 103 are not limited to the operation input unit 151 and the display unit 152.

The storage 104 is an auxiliary storage device such as an HDD (hard disk drive) or an SSD (solid state drive).

In the embodiment, teacher data is created when a user operates the teacher data creation support device 100 having the hardware configuration described above.

The teacher data according to the embodiment is used for supervised learning to generate a learned model (a specific example is described later) that, when information including an image showing the face of a person riding in a vehicle is input, outputs the expression of that face (the emotion shown in the expression).

Incidentally, learned models that output some information (a label) associated with an image in response to the input of that image are conventionally known, and various techniques have been studied for supporting the creation of the teacher data needed to generate such a learned model by supervised learning.

In particular, when an image showing a human face is associated with the expression of that face, as in the embodiment, it is necessary to consider not only a single image but also the other images before and after it in the time series, to visually analyze changes in expression that are difficult to extract by general image processing, and to decide what association to make. The work of accurately creating teacher data in this case is therefore very cumbersome.

The embodiment therefore gives the teacher data creation support device 100 the following functions, thereby simplifying the work of accurately creating teacher data including the correspondence between an image showing a human face and the expression of that face.

FIG. 2 is an exemplary and schematic block diagram showing the functional configuration of the teacher data creation support device 100 according to the embodiment. The group of functional modules shown in FIG. 2 is realized as a result of the processor 101 executing software (a control program) stored in the memory 102. In the embodiment, some or all of the functional modules shown in FIG. 2 may instead be realized by dedicated hardware (circuitry).

As shown in FIG. 2, the teacher data creation support device 100 has, as its functional configuration, an acquisition unit 201, an extraction unit 202, a display processing unit 203, an operation reception unit 204, and an association processing unit 205.

The acquisition unit 201 acquires a moving image showing the face of a person riding in a vehicle, from which the teacher data is created. In the embodiment, the acquisition unit 201 can also acquire biological information of the person shown in the moving image at the time of capture (heart rate, perspiration, body temperature, center of gravity, body weight, and so on), as well as vehicle information including the traveling state of the vehicle at the time of capture (speed, acceleration, accelerator pedal opening, brake pedal force, time, position, and so on). The vehicle information may also include the imaging environment inside the vehicle (such as the amount of ambient light) and the situation around the vehicle at the time of capture (such as traffic conditions and weather). These data are input to the teacher data creation support device 100 from outside by, for example, wired or wireless communication.

The extraction unit 202 extracts a plurality of still images, frame by frame, from the moving image acquired by the acquisition unit 201. That is, because a moving image is a collection of still images that are consecutive in time, the extraction unit 202 extracts from the moving image acquired by the acquisition unit 201 the time-consecutive still images that make it up.

The display processing unit 203 controls the content displayed on the display unit 152 (see FIG. 1). For example, the display processing unit 203 displays, on the display unit 152 (see FIG. 1), a support screen 300 (see FIG. 3, described later) for supporting the association between the human face shown in each of the still images extracted by the extraction unit 202 and the expression of that face. As described in detail later, the support screen 300 provides various GUI (graphical user interface) elements that give the user an environment in which accurate associations can be made with simple operations.

The operation reception unit 204 receives user operations via the operation input unit 151 (see FIG. 1). For example, the operation reception unit 204 receives user operations on the support screen 300.

The association processing unit 205 associates the human face shown in each of the still images with the expression of that face in response to the user's operations on the support screen 300. In the embodiment, teacher data is created based on the results of the associations made by the association processing unit 205.

The support screen 300 provided in the embodiment is now described concretely using an example of its screen configuration.

FIG. 3 is an exemplary and schematic diagram showing the screen configuration of the support screen 300 provided by the teacher data creation support device 100 according to the embodiment.

As shown in FIG. 3, the support screen 300 according to the embodiment includes a playback display area 301 in which the moving image is played back, a section display 302 for identifying the playback section (time) of the moving image being played in the playback display area 301, and a play/pause button 303 for playing or pausing the moving image in the playback display area 301. With these screen elements, the user can make the desired associations (between images and expressions) while viewing the moving image acquired by the acquisition unit 201.

The support screen 300 according to the embodiment also includes an overview display area 310 that displays a plurality of overview images, which are further extracted coarsely (at intervals) according to a predetermined criterion from the still images extracted by the extraction unit 202. The overview images represent the coarse changes of the still images along the time series. The overview image is an example of the "first image", and the overview display area 310 is an example of the "first region".

For example, in the example shown in FIG. 3, of the overview images further extracted according to the predetermined criterion from the still images extracted by the extraction unit 202, three images 311 to 312 serving as the overview images corresponding to a section 302a are displayed side by side in time series in the overview display area 310. In the embodiment, the section 302a, which corresponds to the range of overview images displayed in the overview display area 310, can be switched manually or automatically.

In the embodiment, the overview images are, for example, images at a plurality of time points at which the estimated expression changed when the still images are viewed in time series. They are extracted based on the result of applying, to the still images extracted by the extraction unit 202, processing that uses an expression recognition library, which outputs an estimate of the expression of a human face when an image showing that face is input. With this configuration, the work of associating images with expressions can be carried out using, as reference points, images in which the expression is likely to have changed.
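The change-point selection described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the per-frame estimates are already available as a list of labels, and the function name `change_points` and the sample labels are hypothetical.

```python
def change_points(estimates):
    """Return indices of frames whose estimated label differs from the previous frame's."""
    return [
        i for i in range(1, len(estimates))
        if estimates[i] != estimates[i - 1]
    ]


# Hypothetical per-frame estimates, as an expression recognizer might return them.
estimates = ["neutral", "neutral", "happiness", "happiness", "sadness", "neutral"]
overview_indices = change_points(estimates)
```

Here a frame counts as a change point whenever its label differs from the preceding one; a variant that requires a change of a predetermined size would need numeric scores instead of labels.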

The expression recognition library used in the embodiment may be a known one. The embodiment may also adopt a configuration that extracts the overview images according to a fixed criterion without using an expression recognition library. In that case, the overview images may be, for example, images spaced at a predetermined time interval when the still images extracted by the extraction unit 202 are viewed in time series.
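As a rough sketch of the fixed-interval alternative, the following picks one frame per interval from a list of frame timestamps. The function name `sample_at_interval` and the use of integer millisecond timestamps are assumptions made for illustration.

```python
def sample_at_interval(timestamps_ms, interval_ms):
    """Return indices of frames spaced at least interval_ms apart, starting with the first."""
    selected = []
    next_time = None
    for index, t in enumerate(timestamps_ms):
        if next_time is None or t >= next_time:
            selected.append(index)
            next_time = t + interval_ms
    return selected


# Hypothetical timestamps: one frame every 100 ms for one second.
timestamps_ms = list(range(0, 1000, 100))
overview_indices = sample_at_interval(timestamps_ms, 300)
```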

In the embodiment, the overview images may also be extracted, without using an expression recognition library, by calculating the amount of change in the feature value of each of a plurality of feature points predetermined for a human face. In this configuration, for example, the following method may be used: for two still images at two adjacent time points in the time series, the difference in feature value is taken between corresponding feature points; whether the expression changed between the two still images is determined by checking whether at least a certain number of feature points have a difference value at or above a threshold; and a still image determined to show a change in expression is extracted as an overview image. The number of feature points compared can be set to any number, for example 100, as long as it is more than one.
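The feature-point comparison can be sketched as below. This is an illustration under stated assumptions: each frame is represented by a list of scalar feature values (one per feature point), the later of the two compared frames is the one extracted, and the names `expression_changed`, `select_by_feature_change`, `diff_threshold`, and `min_count` are hypothetical.

```python
def expression_changed(prev_features, curr_features, diff_threshold, min_count):
    """True if at least min_count feature points changed by diff_threshold or more."""
    changed = sum(
        1 for p, c in zip(prev_features, curr_features)
        if abs(c - p) >= diff_threshold
    )
    return changed >= min_count


def select_by_feature_change(frames, diff_threshold, min_count):
    """Return indices of frames judged to differ in expression from the preceding frame."""
    return [
        i for i in range(1, len(frames))
        if expression_changed(frames[i - 1], frames[i], diff_threshold, min_count)
    ]


# Hypothetical feature values for four frames with three feature points each.
frames = [[0, 0, 0], [0, 0, 1], [5, 5, 1], [5, 5, 1]]
overview_indices = select_by_feature_change(frames, diff_threshold=2, min_count=2)
```

In practice the feature values might be vectors (for example landmark coordinates), in which case the absolute difference would be replaced by a suitable distance.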

Furthermore, in the embodiment, when the extracted overview images do not all fit in the overview display area 310, a configuration may be adopted in which a cursor is superimposed on the region of the section display 302 corresponding to the section 302a, and the range of overview images displayed in the overview display area 310 is switched as the cursor moves.

The support screen 300 according to the embodiment also includes screen elements for checking in more detail the situation before and after, in the time series, each of the overview images displayed in the overview display area 310. More specifically, the support screen 300 includes a detail display area 320 that, when the operation reception unit 204 receives a user's operation selecting one of the overview images, displays a plurality of detail images, namely those of the still images extracted by the extraction unit 202 that lie within a predetermined range in the time series relative to the selected overview image. The detail image is an example of the "second image", and the detail display area 320 is an example of the "second region".

For example, in the example shown in FIG. 3, nine images 321 to 329 located before and after, in the time series, the image 311 displayed as an overview image in the overview display area 310 are displayed as detail images in the detail display area 320. By viewing the image 311 displayed as the overview image together with the nine images 321 to 329 displayed as detail images, the user can check how the expression changes (transitions) over the predetermined range of the time series around the image 311 and, based on that check, identify the expression accurately. In the embodiment, the number of detail images displayed in the detail display area 320 is not limited to nine as in the example shown in FIG. 3.
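Choosing the detail images around a selected overview image amounts to taking a window of frame indices. The sketch below is one possible reading, not the method fixed by the text: the function name `detail_window`, the `before` and `after` parameters, and the decision to include the selected frame itself in the window are all assumptions.

```python
def detail_window(num_frames, selected_index, before, after):
    """Return frame indices within [selected - before, selected + after], clipped to the video."""
    start = max(0, selected_index - before)
    end = min(num_frames, selected_index + after + 1)
    return list(range(start, end))


# Hypothetical example: 100 frames, overview image at frame 50, four frames either side.
window = detail_window(100, 50, before=4, after=4)
```

Passing different `before` and `after` values per selection is one way to vary the range for each overview image.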

In the embodiment, the predetermined range from which the detail images are extracted may differ for each selected overview image. For example, the range may be changed according to the expression estimate corresponding to the selected overview image. With this configuration, for example, more detail images can be extracted the greater the need to check in detail the changes in expression, including the time series before and after, so images and expressions can be associated efficiently.

The support screen 300 according to the embodiment also includes screen elements for receiving the user's association operations. More specifically, the support screen 300 includes a candidate display area 330 that displays a plurality of candidate expressions that can be associated with the human face shown in each of the detail images displayed in the detail display area 320. The candidate display area 330 is an example of the "third region".

For example, in the example shown in FIG. 3, the candidate display area 330 displays, as screen elements representing the candidates, a button 331 corresponding to an indistinct emotion (neutral), a button 332 corresponding to the emotion of joy (happiness), and a button 333 corresponding to the emotion of sorrow (sadness). Needless to say, buttons representing other emotions may also be displayed in the candidate display area 330.

In the embodiment, when one of the candidates displayed in the candidate display area 330 is selected by a user's operation while one or more of the detail images displayed in the detail display area 320 are selected by a user's operation, the selected candidate and the selected one or more detail images are associated collectively.

For example, in the embodiment, the detail images displayed in the detail display area 320 are all selected by default. The user selects any detail image that should not receive the same association as the others, which excludes it from the association targets, and then selects one candidate in the candidate display area 330, causing the association processing unit 205 to perform the batch association. With this configuration, the workload is reduced compared with, for example, associating each detail image individually.
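The default-selected, exclude-then-label interaction described above can be sketched as follows. This is only an illustration: the class `DetailSelection`, the function `associate_batch`, and the behaviour that selecting an excluded image a second time re-includes it are assumptions, not details given in the description.

```python
from dataclasses import dataclass, field


@dataclass
class DetailSelection:
    """Tracks which displayed detail images are still association targets."""

    frame_ids: list[int]                              # every image starts selected
    excluded: set[int] = field(default_factory=set)

    def toggle(self, frame_id: int) -> None:
        """Exclude an image; toggling it again re-includes it (assumed behaviour)."""
        if frame_id not in self.frame_ids:
            raise KeyError(frame_id)
        self.excluded ^= {frame_id}

    def selected(self) -> list[int]:
        """Return the images that have not been excluded, in display order."""
        return [f for f in self.frame_ids if f not in self.excluded]


def associate_batch(selection: DetailSelection, label: str) -> dict[int, str]:
    """Associate one expression candidate with every still-selected image."""
    return {frame_id: label for frame_id in selection.selected()}
```

With four displayed images and one excluded, a single click on a candidate button yields three image-label pairs, which is the reduction in effort the paragraph describes.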

Furthermore, the support screen 300 according to the embodiment includes a screen element for displaying information that assists in judging the expression. More specifically, the support screen 300 includes a reference information display area 340 that displays changes in the biological information and the vehicle information (traveling state, etc.) acquired by the acquisition unit 201. The reference information display area 340 is an example of both the "fourth area" and the "fifth area".

For example, in the example shown in FIG. 3, the reference information display area 340 displays a graph 341 showing changes in heart rate, as an example of biological information, during the period (predetermined range) corresponding to the detail images displayed in the detail display area 320, and a graph 342 showing changes in vehicle speed, as an example of vehicle information (traveling state, etc.), during that period. With this configuration, the expression of the human face shown in the detail images can be judged more accurately by additionally taking changes in the biological information and the vehicle information into account.

In the example shown in FIG. 3, the change in biological information (graph 341) and the change in vehicle information (graph 342) are both displayed in the same area (the reference information display area 340), but in the embodiment they may be displayed in separate areas. In addition, as described above, the acquisition unit 201 can also acquire, as vehicle information, the imaging environment inside the vehicle and the situation around the vehicle when the moving image was captured, so this information can also be displayed on the support screen 300.

Based on the above configuration, the teacher data creation support device 100 according to the embodiment executes the following processing to support the user in creating teacher data.

FIG. 4 is an exemplary and schematic flowchart showing the flow of processing when the teacher data creation support device 100 according to the embodiment displays the support screen 300 on the display unit 152. The processing flow shown in FIG. 4 is executed, for example, when the user inputs to the teacher data creation support device 100, via the operation input unit 151, an operation that calls up the support screen 300 in order to create teacher data.

In the processing flow shown in FIG. 4, first, in S401, the teacher data creation support device 100 (for example, the acquisition unit 201) acquires the material from which teacher data is created: a moving image showing the face of a person riding in a vehicle, biological information on the person shown in the moving image, vehicle information including the traveling state of the vehicle when the moving image was captured, and so on.

Next, in S402, the teacher data creation support device 100 (for example, the extraction unit 202) extracts a plurality of still images, one per frame, from the moving image acquired in S401.

Next, in S403, the teacher data creation support device 100 (for example, the display processing unit 203) extracts, according to a predetermined criterion, the schematic images to be displayed in the schematic display area 310 of the support screen 300 from the still images extracted in S402. As described above, the schematic images may be extracted based on the expression estimation results produced by the expression recognition library, or based on a fixed criterion unrelated to those estimation results.

Next, in S404, the teacher data creation support device 100 (for example, the display processing unit 203) displays on the display unit 152 the support screen 300 including the schematic display area 310 showing the schematic images extracted in S403. The process then ends.

FIG. 5 is an exemplary and schematic diagram showing the flow of processing when the teacher data creation support device 100 according to the embodiment displays detail images on the support screen 300. The processing flow shown in FIG. 5 is executed, for example, after the support screen 300 has been displayed on the display unit 152 by the series of processes shown in FIG. 4.

In the processing flow shown in FIG. 5, first, in S501, the teacher data creation support device 100 (for example, the display processing unit 203) determines whether one of the schematic images displayed in the schematic display area 310 has been selected, according to the user operation (an operation on the schematic display area 310 of the support screen 300) received by the operation receiving unit 204.

If it is determined in S501 that none of the schematic images has been selected, the process ends. If it is determined in S501 that one of the schematic images has been selected, the process proceeds to S502.

Next, in S502, the teacher data creation support device 100 (for example, the display processing unit 203) extracts, from the still images constituting the moving image, a plurality of detail images that lie within a predetermined range in time series of the schematic image selected in S501.
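S502 amounts to picking the frames that surround the selected one. A minimal sketch follows, assuming the predetermined range is a symmetric number of frames on each side of the selected frame and that frames are addressed by index; neither assumption comes from the description, and `detail_indices` is a hypothetical name.

```python
def detail_indices(num_frames: int, selected: int, half_window: int) -> range:
    """Return the frame indices within half_window frames of the selected frame.

    The range is clipped at the start and end of the video, so a selection
    near either edge simply yields fewer detail images.
    """
    if not 0 <= selected < num_frames:
        raise IndexError("selected frame is outside the video")
    start = max(0, selected - half_window)
    stop = min(num_frames, selected + half_window + 1)
    return range(start, stop)
```

For a 100-frame video, `detail_indices(100, 50, 2)` covers frames 48 through 52.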

Next, in S503, the teacher data creation support device 100 (for example, the display processing unit 203) displays the detail images extracted in S502 in the detail display area 320 of the support screen 300.

Next, in S504, the teacher data creation support device 100 (for example, the display processing unit 203) displays, in the reference information display area 340 of the support screen 300, the changes in the biological information and the vehicle information during the period corresponding to the detail images shown in the detail display area 320 in S503. The process then ends.
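Selecting the portion of a sensor log that matches the displayed detail images can be sketched as below. The sketch assumes each signal (heart rate, vehicle speed, and so on) is stored as two parallel lists, with timestamps sorted in ascending order; the function name `slice_series` and this storage layout are hypothetical.

```python
from bisect import bisect_left, bisect_right


def slice_series(timestamps, values, t_start, t_end):
    """Return the (timestamp, value) samples with t_start <= timestamp <= t_end.

    Assumes timestamps is sorted ascending and values has the same length.
    """
    lo = bisect_left(timestamps, t_start)   # first sample at or after t_start
    hi = bisect_right(timestamps, t_end)    # one past the last sample at or before t_end
    return list(zip(timestamps[lo:hi], values[lo:hi]))
```

Calling it once per signal, with the timestamps of the first and last detail image as the bounds, gives the data for each graph in the reference information display area.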

FIG. 6 is an exemplary and schematic diagram showing the flow of processing when the teacher data creation support device 100 according to the embodiment performs the association in response to an operation on the support screen 300. The processing flow shown in FIG. 6 is executed, for example, after the detail images have been displayed in the detail display area 320 by the series of processes shown in FIG. 5.

In the processing flow shown in FIG. 6, first, in S601, the teacher data creation support device 100 (for example, the association processing unit 205) determines whether an expression candidate has been decided (selected), according to the user operation (an operation on the candidate display area 330 of the support screen 300) received by the operation receiving unit 204.

If it is determined in S601 that no expression candidate has been decided, the process ends. If it is determined in S601 that an expression candidate has been decided, the process proceeds to S602.

Next, in S602, the teacher data creation support device 100 (for example, the association processing unit 205) performs a batch association on the currently selected detail images based on the expression candidate decided in S601. As described above, the currently selected detail images are those detail images displayed in the detail display area 320 that the user has not excluded from the association targets. In the embodiment, teacher data is created based on the result of the association in S602. The process then ends.

As described above, the teacher data creation support device 100 according to the embodiment has a display processing unit 203 that displays on the display unit 152 a support screen 300 including at least the following screen elements. The support screen 300 includes: a schematic display area 310 that displays a plurality of schematic images extracted according to a predetermined criterion from the still images constituting the moving image; a detail display area 320 that, when a user operation selecting one of the schematic images is received, displays a plurality of detail images, namely those still images that lie within a predetermined range in time series of the selected schematic image; and a candidate display area 330 that displays a plurality of candidate expressions that can be associated with the human face shown in each of the detail images.

With the above configuration, the support screen 300 simplifies the work of accurately creating teacher data that contains the correspondence between an image showing the face of a person (inside a vehicle) and the expression of that face.

In the embodiment, the schematic images are extracted based on the result of running an expression recognition library on the still images constituting the moving image; they are the images at the time points at which, viewing the still images in time series, the expression estimation result of the library changed by a predetermined amount or more. With this configuration, the work of associating images with expressions can be carried out using images in which the expression is likely to have changed as reference points.
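The description does not define how the change in the estimation result is measured. As one hedged illustration, the sketch below assumes the library returns a score per expression for each frame and measures the change between consecutive frames as the sum of absolute score differences. The function name `change_points`, the score format, and the distance measure are assumptions.

```python
def change_points(scores: list[dict[str, float]], threshold: float) -> list[int]:
    """Return indices of frames whose estimate differs from the previous frame's.

    A frame is picked when the sum of absolute per-expression score differences
    from the preceding frame is at least threshold (an assumed measure).
    """
    picked = []
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        labels = prev.keys() | cur.keys()   # tolerate expressions missing from a frame
        change = sum(abs(cur.get(k, 0.0) - prev.get(k, 0.0)) for k in labels)
        if change >= threshold:
            picked.append(i)
    return picked
```

The frames at the returned indices would then serve as the schematic images shown in the schematic display area 310.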

In the embodiment, the schematic images may instead be images taken at predetermined time intervals when the still images are viewed in time series. With this configuration, the schematic images can be extracted easily.
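Fixed-interval extraction reduces to stepping through the frame indices. The sketch below assumes a constant frame rate and an interval given in seconds; `interval_indices` and its parameters are hypothetical names.

```python
def interval_indices(num_frames: int, fps: float, interval_s: float) -> list[int]:
    """Return the indices of frames spaced roughly interval_s seconds apart.

    Assumes a constant frame rate; the step is never smaller than one frame.
    """
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = max(1, round(fps * interval_s))
    return list(range(0, num_frames, step))
```

For a 100-frame clip at 30 frames per second, a one-second interval selects frames 0, 30, 60, and 90.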

In the embodiment, the display processing unit 203 is configured to be able to change, for each selected schematic image, the predetermined range from which the detail images are extracted. For example, the range may be changed according to the expression estimation result corresponding to the selected schematic image. With this configuration, the number of detail images extracted for checking how the expression changes, including the time series before and after the time point of the schematic image, can be varied as needed. For example, the greater the need to check the change in expression around a schematic image in detail, the more detail images can be extracted with that schematic image as the reference. As a result, images can be associated with expressions efficiently.

In the embodiment, when one of the candidates displayed in the candidate display area 330 is selected by a user operation while one or more of the detail images displayed in the detail display area 320 are selected by a user operation, the association processing unit 205 associates that candidate with the one or more detail images in a single batch. With this configuration, the workload is reduced compared with, for example, associating each detail image individually.

In the embodiment, the support screen 300 further includes a reference information display area 340 that displays changes in the biological information and the vehicle information within the predetermined range corresponding to the detail images. With this configuration, the expression can be judged more accurately by additionally taking changes in the biological information and the vehicle information into account.

A technique that uses the teacher data created by the teacher data creation support device 100 according to the embodiment is briefly described below. Using the teacher data created in the embodiment, a vehicle control system 700 such as the one shown in FIG. 7 can be configured.

FIG. 7 is an exemplary and schematic block diagram showing the configuration of a vehicle control system 700 realized by using the teacher data created by the teacher data creation support device 100 according to the embodiment. The vehicle control system 700 is mounted on a vehicle.

As shown in FIG. 7, the vehicle control system 700 has a vehicle control device 710, a sensor group 720, and a control target 730.

The vehicle control device 710 is, for example, an ECU (electronic control unit) mounted on the vehicle. An ECU is generally configured as a microcomputer with the same kind of hardware as an ordinary computer, such as a processor and memory.

The sensor group 720 is a collection of various sensors provided in the vehicle. In the example shown in FIG. 7, the sensor group 720 includes an in-vehicle camera 721 that images the situation inside the vehicle, a biological information sensor 722 that detects biological information on a person inside the vehicle, and a traveling state sensor 723 that detects the traveling state of the vehicle.

The control target 730 consists of the various vehicle equipment controlled by the vehicle control device 710. The control target 730 can include not only systems related to driving the vehicle, such as an acceleration system including the accelerator, a braking system including the brakes, and a steering system including the steering wheel, but also systems for providing various in-vehicle services (air conditioning, music, etc.).

Here, the vehicle control device 710 has an information acquisition unit 711, an expression determination unit 712, and a control unit 713. These components may be realized as a result of the processor executing software (a control program) stored in memory, or may be realized by dedicated hardware (circuits).

The information acquisition unit 711 acquires information from the sensor group 720. The expression determination unit 712 then determines the facial expression of the person inside the vehicle based on the information acquired by the information acquisition unit 711.

More specifically, the expression determination unit 712 has a learned model 712a generated by supervised learning based on the teacher data created by the teacher data creation support device 100 according to the embodiment. Because the learned model 712a makes use of the technique according to the embodiment, it outputs (an estimate of) the facial expression of the person currently inside the vehicle in response to input of the video (moving or still images) acquired by the in-vehicle camera 721, the biological information acquired by the biological information sensor 722, and the traveling state acquired by the traveling state sensor 723.

The control unit 713 then controls the control target 730 according to the determination result of the expression determination unit 712 so that driving and services suited to the facial expression of the person currently inside the vehicle are provided. This makes it possible to provide a comfortable vehicle that is in tune with human emotions.

In the example shown in FIG. 7, the sensor group 720 may include components for detecting vehicle information other than the traveling state, such as the imaging environment inside the vehicle and the situation around the vehicle when the moving image is captured. The expression determination unit 712 may then be configured to determine the expression in consideration of all of this vehicle information.

An embodiment of the present invention has been described above, but the embodiment is merely an example and is not intended to limit the scope of the invention. The novel embodiment described above can be implemented in various forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. The embodiment and its modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

100 Teacher data creation support device
151 Operation input unit
152 Display unit
201 Acquisition unit
202 Extraction unit
203 Display processing unit
204 Operation receiving unit
205 Association processing unit
300 Support screen
310 Schematic display area (first area)
320 Detail display area (second area)
330 Candidate display area (third area)
340 Reference information display area (fourth area, fifth area)

Claims

A teacher data creation support device that supports creation of teacher data used for supervised learning, comprising:
an acquisition unit that acquires a moving image showing a human face;
an extraction unit that extracts a plurality of still images from the moving image, one per frame;
a display processing unit that displays, on a display unit, a support screen for supporting association between the human face shown in each of the plurality of still images and an expression of the face;
an operation receiving unit that receives a user's operation on the support screen via an operation input unit; and
an association processing unit that performs the association in accordance with the user's operation on the support screen,
wherein the support screen includes:
a first area in which a plurality of first images extracted from the plurality of still images are displayed;
a second area in which, when the user's operation selecting one of the plurality of first images is received, a plurality of second images are displayed, the second images being those of the plurality of still images that lie within a predetermined range in time series of the selected first image; and
a third area in which a plurality of candidates for the expression that can be associated with the human face shown in each of the plurality of second images are displayed.

The teacher data creation support device according to claim 1, wherein the plurality of first images are extracted based on a result of processing the plurality of still images with an expression recognition library that, on receiving an image including the human face, outputs an estimation result of the expression of the face, and are a plurality of images at a plurality of time points at which, viewing the plurality of still images in time series, the estimation result of the expression changed by a predetermined amount or more.

The teacher data creation support device according to claim 1, wherein the plurality of first images are a plurality of images taken at predetermined time intervals when the plurality of still images are viewed in time series.

The teacher data creation support device according to any one of claims 1 to 3, wherein the display processing unit is configured to be able to change, for each selected first image, the predetermined range from which the plurality of second images are extracted.

The teacher data creation support device according to any one of claims 1 to 4, wherein, when one of the plurality of candidates displayed in the third area is selected in accordance with the user's operation while one or more of the plurality of second images displayed in the second area are selected in accordance with the user's operation, the association processing unit collectively performs the association between the one candidate and the one or more second images.

The teacher data creation support device according to any one of claims 1 to 5, wherein the acquisition unit acquires, together with the moving image, biological information on the human shown in the moving image at each time point of the moving image, and
the support screen further includes a fourth area in which a change in the biological information within the predetermined range is displayed.

The teacher data creation support device according to any one of claims 1 to 6, wherein the acquisition unit acquires, as the moving image, video of the interior of a vehicle in which the human is riding, and acquires vehicle information including a traveling state of the vehicle at each time point of the video, and
the support screen further includes a fifth area in which a change in the vehicle information within the predetermined range is displayed.