JP2020027982A

JP2020027982A - Teacher data preparation device, teacher data preparation method, teacher data preparation program, learning device, and imaging device

Info

Publication number: JP2020027982A
Application number: JP2018150749A
Authority: JP
Inventors: Nobuyuki Shima; Nao Yoneyama; Kazuo Kanda; Kazuhiko Shimura; Osamu Nonaka
Original assignee: Olympus Corp
Current assignee: Olympus Corp
Priority date: 2018-08-09
Filing date: 2018-08-09
Publication date: 2020-02-20
Anticipated expiration: 2038-08-09
Also published as: JP7166837B2

Abstract

PROBLEM TO BE SOLVED: To predict a desired timing from an image of a subject by machine learning. SOLUTION: A teacher data preparation device includes: a subject image determination section that detects, from a series of images having time information based on time of shooting, a specific state image, which is an image of a specific subject in a specific state; a time determination section that determines, for each image of the series, the time difference between the time of shooting of that image and the time of shooting of the image in the series that includes the specific state image; and a control section that pairs each image with the time difference data obtained for that image as teacher data. SELECTED DRAWING: Figure 1

Description

The present invention relates to a teacher data creation device, a teacher data creation method, a teacher data creation program, a learning device, and an imaging device for machine learning.

In recent years, portable devices with a photographing function (photographing devices), such as digital cameras, have become widespread. Some of these devices automate various shooting settings. For example, some digital cameras have an autofocus (AF) function that automates focusing and an automatic exposure (AE) function that automates exposure. Photographing devices with a so-called continuous shooting function, which captures images in succession, have also become widespread.

Meanwhile, techniques have been developed for obtaining a desired inference result by applying machine learning to captured images acquired by such photographing devices. Machine learning learns the features, time-series information, spatial information, and the like of known input information and performs inference based on the learning result, thereby obtaining an inference result about an unknown matter. That is, machine learning first obtains a learned model that makes it possible to infer a determinable output result from specific input information.

To obtain inference results with high reliability, a large amount of information whose input-output relationship is known is used as learning data when generating a learned model. In deep learning, for example, a network is designed using a large amount of learning data so that the expected output is obtained for a known input. The learned model obtained by such a process (hereinafter also referred to as an inference model) can be used independently of the network that performed the learning.

For example, Patent Document 1 discloses a technique intended to prevent learning accuracy from deteriorating even when the amount of learning data is small. The technique includes a generation unit that generates a new second learner using part of a first learner that has deep-learned the relationship held by a pair of a first content and a second content of a different type from the first content, and a learning unit that causes the second learner generated by the generation unit to deep-learn the relationship held by a pair of the first content and a third content of a different type from the second content.

Patent Document 1: Japanese Patent No. 6151404

Conventionally, no device has been developed that performs machine learning to predict a desired timing from images of a subject that moves in an unknown manner.

An object of the present invention is to provide a teacher data creation device, a teacher data creation method, a teacher data creation program, a learning device, and an imaging device that make it possible to predict a desired timing from an image of a subject by machine learning.

A teacher data creation device according to one aspect of the present invention includes: an object image determination unit that detects, from a series of images having time information based on shooting times, a specific state image, which is an image of a specific object in a specific state; a time determination unit that determines, for each image in the series, the time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and a control unit that pairs each image with the time difference data obtained for that image to form teacher data.

A learning device according to one aspect of the present invention includes an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device, an inference model that infers from an input image the time at which a predetermined object will enter the specific state.

An imaging device according to one aspect of the present invention includes: an inference engine that implements the inference model generated by the learning device; an imaging unit; and a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result of the time until the predetermined object in the captured image enters the specific state.

A teacher data creation method according to one aspect of the present invention includes: a detection step of detecting, from a series of images having time information based on shooting times, a specific state image, which is an image of a specific object in a specific state; a step of determining, for each image in the series, the time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and a generation step of pairing each image with the time difference data obtained for that image to generate teacher data.

A teacher data creation program according to one aspect of the present invention causes a computer to execute: a detection step of detecting, from a series of images having time information based on shooting times, a specific state image, which is an image of a specific object in a specific state; a step of determining, for each image in the series, the time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and a generation step of pairing each image with the time difference data obtained for that image to generate teacher data.

According to the present invention, machine learning makes it possible to predict a desired timing from an image of a subject.

A block diagram showing a learning device and an imaging device according to a first embodiment of the present invention.
An explanatory diagram of the network 12a of the inference engine 12.
An explanatory diagram showing an example of capturing each image of the image group 34a.
An explanatory diagram showing the relationship between each image of the image group 34a and the shooting time.
A flowchart of the method by which the population creation unit 31a creates teacher data.
An explanatory diagram of the operation of the first embodiment.
An explanatory diagram of the operation of the first embodiment.
A flowchart showing the operation of the imaging device 20.
A flowchart showing the operation of the external device 30.
An explanatory diagram of the operation of the first embodiment.
A flowchart showing the operation flow employed in a second embodiment of the present invention.
An explanatory diagram showing an example of a continuous image group taken into the population creation unit 31a from the external image DB 32.
An explanatory diagram of a method of generating the network 12a.
An explanatory diagram showing a display example of an image displayed on the display screen of the display unit 15.
A flowchart showing the operation flow employed in a third embodiment of the present invention.
A flowchart showing the control performed by the control unit 11 of the imaging device 20.
An explanatory diagram showing a display example of an image displayed on the display screen of the display unit 15.
An explanatory diagram of a fourth embodiment of the present invention.
An explanatory diagram of the fourth embodiment of the present invention.
An explanatory diagram of the fourth embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

(First Embodiment)
FIG. 1 is a block diagram showing a learning device and an imaging device according to the first embodiment of the present invention. In the present embodiment, images having time information are used as learning data to realize machine learning that predicts the time until a predetermined moment (hereinafter also referred to as the decisive moment) is reached. As a specific example, an inference model that predicts the moment a bird takes off is built by machine learning, and the inference model is used to display, from a live view image, a prediction of the moment the bird will take off.

The imaging device 20 in FIG. 1 records images obtained by imaging a subject. The imaging device 20 may be a digital camera or a video camera, or a camera built into a smartphone or tablet terminal. As described later, the imaging device 20 can use an inference model during live view display. The imaging device 20 may use an inference model installed in advance or may acquire an inference model from the external device 30.

The imaging device 20 includes a control unit 11 and an imaging unit 22. The control unit 11 may be configured by a processor using a CPU (Central Processing Unit) or the like and operate according to a program stored in a memory (not shown) to control each unit, or it may realize some or all of its functions with hardware electronic circuits.

The imaging unit 22 has an image sensor 22a and an optical system 22b. The optical system 22b includes lenses for zooming and focusing, an aperture, and the like (not shown), as well as a zoom (variable magnification) mechanism, a focus mechanism, and an aperture mechanism (not shown) that drive them.

The image sensor 22a is configured by a CCD, a CMOS sensor, or the like, and the optical system 22b guides an optical image of the subject onto the imaging surface of the image sensor 22a. The image sensor 22a photoelectrically converts the optical image of the subject to obtain a captured image (imaging signal) of the subject.

The imaging control unit 11a of the control unit 11 drives and controls the zoom mechanism, focus mechanism, and aperture mechanism of the optical system 22b to adjust the zoom, aperture, and focus. The imaging unit 22 performs imaging under the control of the imaging control unit 11a and outputs imaging signals of captured images (moving images and still images) to the control unit 11.

The imaging device 20 is provided with an operation unit 13. The operation unit 13 includes a release button, function buttons, various switches for shooting mode setting, parameter operation, and the like, dials, ring members, and so on (not shown), and outputs operation signals based on user operations to the control unit 11. The control unit 11 controls each unit based on the operation signals from the operation unit 13.

The control unit 11 takes in captured images (moving images and still images) from the imaging unit 22. The image processing unit 11b of the control unit 11 performs predetermined signal processing on the captured images, for example color adjustment, matrix conversion, noise removal, and various other kinds of signal processing.

The imaging device 20 is provided with a display unit 15, and the control unit 11 is provided with a display control unit 11f. The display unit 15 has a display screen such as an LCD (liquid crystal display), which is provided, for example, on the back of the housing of the imaging device 20. The display control unit 11f causes the display unit 15 to display the captured image that has undergone signal processing by the image processing unit 11b. The display control unit 11f can also cause the display unit 15 to display various menus, warnings, and the like of the imaging device 20.

The imaging device 20 is provided with a communication unit 14, and the control unit 11 is provided with a communication control unit 11e. Under the control of the communication control unit 11e, the communication unit 14 can exchange information with the external device 30. The communication unit 14 can perform short-range wireless communication such as Bluetooth (registered trademark) and wireless LAN communication such as Wi-Fi (registered trademark). The communication unit 14 is not limited to Bluetooth and Wi-Fi and may employ various other communication methods. The communication control unit 11e can receive inference model information from the external device 30 via the communication unit 14.

The control unit 11 is provided with a recording control unit 11c. The recording control unit 11c can compress the captured image after signal processing and supply the compressed image to the recording unit 16 for recording. The recording unit 16 is configured by a predetermined recording medium; it records information supplied from the control unit 11 and can output recorded information to the control unit 11. The recording unit 16 may also be a card interface, for example, in which case it can record image data on a recording medium such as a memory card.

The recording unit 16 has an image data recording area 16a, and the recording control unit 11c records image data in the image data recording area 16a. The recording control unit 11c can also read out and reproduce information recorded in the recording unit 16.

The recording unit 16 also has a setting data recording area 16b, in which setting information of the inference model is recorded.

In the present embodiment, the imaging device 20 is provided with an inference engine 12 serving as an inference unit. The inference engine 12 has a network 12a. The network 12a is constructed using the setting values recorded in the recording unit 16 and constitutes the network obtained when learning in machine learning is completed, that is, an inference model.

The recording control unit 11c may receive, via the communication unit 14, information for configuring the inference model from the learning unit 31 of the external device 30 and record it as setting information in the setting data recording area 16b of the recording unit 16.

FIGS. 2 to 4 are explanatory diagrams for the network 12a of the inference engine 12. In FIG. 2, a large data set 31G of corresponding inputs and outputs is given to a predetermined network N1 as teacher data. The network design of the network N1 is thereby determined so that the output corresponding to each input is obtained. In the present embodiment, an image is used as the input, and the estimated time until the decisive moment is obtained as the output, together with reliability information (a degree of reliability). Information on the determined network design of the network N1 is recorded as setting information in the setting data recording area 16b.
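The text states only that the output is an estimated time together with a reliability value; it does not say how either is encoded. As one possible sketch, assume the network emits one raw score per time bin, and take the most probable bin as the estimate and its softmax probability as the reliability. The names `TIME_BINS` and `estimate`, the bin values, and the scores are all hypothetical illustrations, not part of the disclosure.

```python
import math

# Hypothetical time bins: seconds remaining until the decisive moment.
TIME_BINS = [0, 5, 10, 15, 20]

def softmax(scores):
    """Convert raw network scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def estimate(scores):
    """Return (estimated seconds, reliability) for one input image.

    Assumption: the reliability is the probability of the winning bin.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TIME_BINS[best], probs[best]

# Made-up scores standing in for the output of network N1.
seconds, reliability = estimate([0.1, 2.0, 0.3, -1.0, -2.0])
print(f"about {seconds} s to the decisive moment, reliability {reliability:.2f}")
```

A regression output with a separately predicted variance would be an equally plausible design; the binning here just makes the reliability value easy to read off.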

Note that "deep learning" is a multilayered form of the "machine learning" process that uses a neural network. A typical example is a "feedforward neural network," which passes information from front to back to make a determination. In its simplest form, three layers suffice: an input layer of N1 neurons, an intermediate layer of N2 neurons given by a parameter, and an output layer of N3 neurons corresponding to the number of classes to be discriminated. The neurons of the input and intermediate layers, and of the intermediate and output layers, are connected by connection weights, and bias values are added in the intermediate and output layers, which makes it easy to form logic gates. Three layers may be enough for simple discrimination, but with many intermediate layers the network can also learn how to combine multiple feature quantities during machine learning. In recent years, networks with 9 to 152 layers have become practical in terms of training time, determination accuracy, and energy consumption.
Various known networks may be used as the network N1 for machine learning. For example, R-CNN (Regions with CNN features), which uses a CNN (Convolutional Neural Network), or an FCN (Fully Convolutional Network) may be used. These involve a process called "convolution" that compresses the feature quantities of an image, run with minimal processing, and are strong at pattern recognition. A "recurrent neural network" (fully connected recurrent neural network), in which information flows in both directions, may also be used; it can handle more complex information and supports the analysis of information whose meaning changes with order or sequence.
To implement these techniques, conventional general-purpose arithmetic processing circuits such as CPUs and FPGAs (Field Programmable Gate Arrays) may be used. However, because much of neural network processing is matrix multiplication, a GPU (Graphics Processing Unit) or a Tensor Processing Unit (TPU), which specialize in matrix calculation, may be used instead. In recent years, dedicated artificial intelligence (AI) hardware of this kind, the "neural network processing unit (NPU)," has been designed so that it can be integrated with other circuits such as a CPU, and in some cases it forms part of the processing circuit.
The inference model may also be obtained by known machine learning techniques other than deep learning, such as support vector machines and support vector regression. Learning here means calculating the weights, filter coefficients, and offsets of a classifier; methods using logistic regression are also available. To have a machine determine something, a human must teach the machine how to make the determination. This embodiment derives the image determination by machine learning, but a rule-based method, which applies rules that humans have acquired through experience and heuristics, may be used instead.
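The remark that a three-layer network with connection weights and bias values readily forms logic gates can be illustrated with a small sketch. The weights below are set by hand rather than learned, and the step activation, the layer sizes (2 input, 2 intermediate, 1 output neuron), and all names are assumptions chosen for illustration; the document does not prescribe them.

```python
def step(x):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Intermediate layer: neuron 0 acts as OR, neuron 1 acts as AND.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]
# Output layer: fires when OR is true and AND is false, which is XOR.
OUT_W = [[1.0, -1.0]]
OUT_B = [-0.5]

def xor_net(x1, x2):
    """Forward pass through input, intermediate, and output layers."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUT_W, OUT_B)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

XOR is a useful test case because no single neuron can compute it; the intermediate layer is what makes it possible. In actual training the weights and biases would be fitted from teacher data instead of being written down.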

The external device 30 has a learning unit 31 that determines such a network design and an external image database (DB) 32 in which a large amount of learning data is recorded. The learning unit 31 has a communication unit 31b, and the external image DB 32 has a communication unit 33. The communication units 31b and 33 can communicate with each other. A communication unit 31c of the learning unit 31 can also communicate with the communication unit 14.

The learning unit 31 has a control unit 31g. The control unit 31g may be configured by a processor using a CPU or the like and operate according to a program stored in a memory (not shown) to control each unit, or it may realize some or all of its functions with hardware electronic circuits. The learning unit 31 as a whole may likewise be configured by a processor using a CPU, a GPU (Graphics Processing Unit), an FPGA, or the like and operate according to a program stored in a memory (not shown) to control learning, or it may realize some or all of its functions with hardware electronic circuits.

The external image DB 32 includes an image classification recording unit 34. The image classification recording unit 34 is configured by a recording medium (not shown) such as a hard disk or a memory medium, and records a plurality of images classified by the type of object contained in each image. In the example of FIG. 1, the image classification recording unit 34 records only an image group 34a of object type A, but the number of types used for classification can be set as appropriate.

In the present embodiment, a group of bird images, for example, is recorded as the image group 34a. FIGS. 3 and 4 illustrate the images of the image group 34a. FIG. 3 is an explanatory diagram showing an example of capturing each image of the image group 34a, and FIG. 4 is an explanatory diagram showing the relationship between each image of the image group 34a and the shooting time.

FIG. 3 shows a bird 46 perched on a tree branch 45 being photographed by a camera 40. A display screen 42 configured by an LCD or the like is provided on the back of the camera 40, and an image of the bird 46 is displayed on it as a live view. The bird 46 can be photographed by operating the shutter button 41. In the present embodiment, the series of states up to the moment the bird 46 takes off from the branch 45 is imaged. For example, if the camera 40 has a continuous shooting function, that function may be used to capture, at predetermined time intervals, the series of states from the bird 46 perching on the branch 45 to its taking off from the branch 45, yielding a series of images. If the camera 40 has a moving image capturing function, a moving image of the same series of states may be acquired. Alternatively, the shutter button 41 of the camera 40 may be operated at predetermined time intervals to capture the same series of states as discrete still images.

FIG. 4 shows six images P1 to P6 acquired by the camera 40, arranged in chronological order. Each of the images P1 to P6 was obtained by photographing the bird 46 in FIG. 3 and includes time information; the number below each image in FIG. 4 indicates its acquisition time relative to the image P5. In the example of FIG. 4, with the image P5 as the time reference, the images P1 to P6 were captured at -20 seconds (20 seconds before), -15 seconds, -10 seconds, -5 seconds, 0 seconds, and +5 seconds. In general, the camera 40 has a built-in timer that adds time-of-day information to each of the images P1 to P6. Alternatively, time information relative to the acquisition time of one of the images, for example the image at the start of continuous shooting, may be added to each of the images P1 to P6. In the example of FIG. 4, the shooting time of the image P1 is 15:30:27, and the acquisition times of the images P2 to P6 relative to this time may be held as time information.
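The relationship between absolute shooting times and times relative to a reference image can be worked through in a few lines. The sketch below reproduces the FIG. 4 example, assuming a 5-second interval between frames (implied by the listed offsets) and an arbitrary placeholder date, since the text gives only the time of day 15:30:27 for P1. The function name `relative_times` is hypothetical.

```python
from datetime import datetime, timedelta

def relative_times(capture_times, reference_index):
    """Return each capture time in seconds relative to the reference image."""
    reference = capture_times[reference_index]
    return [(t - reference).total_seconds() for t in capture_times]

# P1 is shot at 15:30:27; the date is a placeholder, not from the document.
p1_time = datetime(2000, 1, 1, 15, 30, 27)
# P1..P6 at an assumed 5-second interval.
captures = [p1_time + timedelta(seconds=5 * i) for i in range(6)]

# Use P5 (index 4) as the time reference, as in FIG. 4.
offsets = relative_times(captures, reference_index=4)
print(offsets)  # [-20.0, -15.0, -10.0, -5.0, 0.0, 5.0]

# The same frames relative to P1, the start of continuous shooting.
from_start = relative_times(captures, reference_index=0)
print(from_start)  # [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
```

Either representation carries the same information; changing the reference image only shifts every value by a constant.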

For example, suppose the image the user most wants to capture is the image P5, which includes an image of the very moment the bird 46 is about to take off from the branch 45 (hereinafter also referred to as the specific state image). This moment is taken as the decisive moment, and the image P5 is called the decisive moment image. In the present embodiment, a series of images having time information as shown in FIG. 4 is recorded in the image classification recording unit 34 as the image group 34a. The image classification recording unit 34 may record images of birds of the same species, or groups of images of birds classified as being of approximately the same size. In addition to these, groups of images of birds of different species or classified into different sizes may be recorded.

As shown in FIG. 4, a bird may, a certain time before taking off, make a preparatory motion in which it straightens its bent legs and begins to spread its wings. The preparatory motion is likely to differ somewhat depending on, for example, the species of bird, its size, and whether or not it is aiming at prey. Nevertheless, if learning is performed on a very large amount of image data of birds taking off, it should be possible to predict the time of takeoff from the bird's appearance before takeoff.

The image group 34a holds, for example, a very large body of data consisting of series of images with time information as shown in FIG. 4, obtained by shooting as in FIG. 3. The population creation unit 31a of the learning unit 31 reads images from the external image DB 32 and creates the population on which learning is based.

The learning data given to the learning unit 31 can also be acquired from the imaging device 20. In this case, the imaging device 20 adds time information from a timer (not shown) built into the control unit 11 to each captured image acquired by the imaging unit 22, and transmits the image to the learning unit 31 via the communication unit 14.

In the present embodiment, the population creation unit 31a includes a time determination unit 31a1 and an object image determination unit 31a2. Under the control of the control unit 31g, the population creation unit 31a creates the teacher data used to generate the inference model. The object image determination unit 31a2 determines the image portion of the object for which the decisive moment is to be determined (hereinafter referred to as the object image), and also determines whether the object image has become the image at the decisive moment, that is, the image of the object in a specific state (the specific state image). The time determination unit 31a1 uses the time information added to each image to determine the time from that image until the decisive moment (the specific state). That is, for each image in the series, the time determination unit 31a1 determines the time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image at the decisive moment. The control unit 31g pairs each image with the time difference determined for it and uses the pairs as teacher data.

The term "specific state" here covers not only a state in which the shape of the tracked object itself has changed over time into a specific posture or orientation, but also changes in the color or size of the object, as well as the object taking on a particular shape, position, or orientation within the imaging screen. The state may also include a change over time in the sound emitted by the object. Furthermore, images and sound may be judged together so that the user can set or designate, for anything from a performance such as music or dance to some kind of work, the start or end of an event, or its climax or highlight. Possible designation methods include text input, voice input, item selection, and input of similar information. Even without explicit designation each time, moments that many people would regard as decisive may be determined automatically, and such a moment need not be a single timing but may occur at a plurality of timings. Accordingly, the time difference may take a plurality of numerical values.

FIG. 5 is a flowchart for explaining how the population creation unit 31a creates teacher data. In step S1 of FIG. 5, the population creation unit 31a acquires a series of images to which time information has been added, for example the images P1 to P6 in FIG. 4. The object image determination unit 31a2 of the population creation unit 31a then determines the object image in each image of the acquired series.

First, the object image determination unit 31a2 determines whether manual selection has been instructed. Manual selection is performed by the user designating the object image in an image. The learning unit 31 is provided with a display unit 31f such as an LCD, and the display unit 31f is provided with a touch panel (not shown) for accepting user operations. The object image determination unit 31a2 displays the series of images on the display unit 31f (step S9) and determines the subject designated by the user's touch operation to be the manually selected object image. When the user has made a manual selection, the series of images is treated as a teacher data candidate (step S10).

If manual selection has not been specified, the object image determination unit 31a2 determines the object image in step S3. For example, the object image determination unit 31a2 may treat a subject located at the center of the image with at least a predetermined size as the object, and determine that image portion to be the object image. Alternatively, the object may be specified in advance by a user operation. For example, when a bird is specified as the object, the object image determination unit 31a2 may identify the bird by applying known recognition processing to the captured image.

In the next step S4, the object image determination unit 31a2 excludes images that do not include the object image from the series of images, and determines whether the number of remaining images is equal to or greater than a predetermined number (step S5). If the number of images including the object image is smaller than the predetermined number, determining the decisive moment and the time to the decisive moment is likely to be difficult, so such an image group is excluded from the teacher data candidates in step S11.

If the number of images including the object image is equal to or greater than the predetermined number, the object image determination unit 31a2 proceeds to step S6 and selects the group of images including the object image, and in step S7 nominates candidates for the decisive-moment image. In the example of FIG. 4, the decisive moment is the moment the bird takes off. For example, the object image determination unit 31a2 uses image analysis to detect, as the specific state image, the object image that extends furthest vertically and horizontally within the image, and takes the image including that specific state image as the candidate for the decisive-moment image. In the example of FIG. 4, the image P5 is the candidate.

In the next step S8, the object image determination unit 31a2 determines whether there are at least a predetermined number of images before the decisive-moment image candidate. If, among the images including the object image, the number of images acquired earlier than the decisive-moment image is smaller than the predetermined number, the time to the decisive moment is too short to be useful, so such an image group is excluded from the teacher data candidates in step S11.

If there are at least the predetermined number of images including the object image before the decisive-moment image, the object image determination unit 31a2 proceeds to step S12, confirms the candidate as the decisive-moment image, and sets its acquisition time as the time reference.

In the next step S13, the time determination unit 31a1 records each image in the series that includes the object image together with its time relative to the acquisition time of the decisive-moment image. In this way, as in FIG. 4, a series of images, each tagged with its time difference relative to the decisive-moment image P5, is recorded in the teacher data recording unit 31e as teacher data.
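
The flow of FIG. 5 (steps S4 to S13) can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the `Frame` record, the thresholds `min_frames` and `min_before`, and the use of bounding-box area as a stand-in for "extends furthest vertically and horizontally" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    name: str
    time_s: float  # capture time in seconds
    box: Optional[Tuple[int, int]]  # (width, height) of the object image; None if absent


def make_teacher_data(frames: List[Frame], min_frames: int = 4,
                      min_before: int = 3) -> Optional[List[Tuple[str, float]]]:
    """Return (image name, seconds relative to the decisive moment) pairs, or None."""
    # S4: drop images that do not contain the object image.
    kept = sorted((f for f in frames if f.box is not None), key=lambda f: f.time_s)
    # S5 / S11: too few images left, so reject the whole series.
    if len(kept) < min_frames:
        return None
    # S7: candidate = image whose object box is largest (assumed proxy).
    candidate = max(kept, key=lambda f: f.box[0] * f.box[1])
    # S8 / S11: too few images before the candidate, so reject the series.
    if sum(f.time_s < candidate.time_s for f in kept) < min_before:
        return None
    # S12 / S13: take the candidate's time as the reference and pair each
    # image with its time difference.
    return [(f.name, f.time_s - candidate.time_s) for f in kept]


# Hypothetical series mimicking FIG. 4: the box grows until P5, then shrinks.
boxes = [(40, 60), (42, 60), (50, 62), (70, 70), (120, 90), (80, 50)]
series = [Frame(f"P{i + 1}", 5.0 * i, box) for i, box in enumerate(boxes)]
teacher = make_teacher_data(series)
```

Here `teacher` pairs P1 with −20 seconds through P6 with +5 seconds, matching the labels under the images in FIG. 4. A series that fails either count check yields `None`, which corresponds to exclusion in step S11.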

The input/output modeling unit 31d, which serves as the inference model generation unit, reads the teacher data created by the population creation unit 31a from the teacher data recording unit 31e and, for example by the method shown in FIG. 2, obtains a learned model (inference model) that has learned the relationship between an image and the time until the decisive-moment image is obtained, that is, the network 12a and its setting information.

When requested by the control unit 11 of the imaging device 20, the learning unit 31 transmits the generated inference model to the imaging device 20 via the communication units 31c and 14. The control unit 11 records the setting information acquired via the communication unit 14 in the setting data recording area 16b and uses it to configure the network 12a of the inference engine 12. The inference model generated by the learning unit 31 thus becomes available in the imaging device 20.

The control unit 11 is provided with a setting control unit 11d, which can control the inference engine 12 to perform inference. That is, when a live view image is acquired by the imaging unit 22, the setting control unit 11d gives the live view image to the inference engine 12 and causes it to execute inference that yields the time until the decisive moment (hereinafter referred to as image time inference). When information on the time until the decisive moment is obtained from the inference engine 12, the setting control unit 11d can control the display control unit 11f to display the inference result on the display screen of the display unit 15. In this case, the time until the decisive moment is displayed superimposed on the live view image.

The setting control unit 11d is not limited to display and may present the inference result of the inference engine 12 to the user in various other ways. For example, it may present the inference result by sound, or by mechanical control of a drive unit.

Next, the operation of the embodiment configured as described above will be described with reference to FIGS. 6 to 10. FIGS. 6, 7, and 10 are explanatory diagrams for explaining the operation of the first embodiment. FIGS. 8 and 9 are flowcharts for explaining the operation of the first embodiment: FIG. 8 shows the operation of the imaging device 20, and FIG. 9 shows the operation of the external device 30.

FIG. 6 shows a subject being imaged by the imaging device 20 of FIG. 1. The parts of the imaging device 20 in FIG. 1 are housed in the housing 20a in FIG. 6. A display screen 15a constituting the display unit 15 is provided on the back of the housing 20a. A lens (not shown) constituting the optical system 22b is provided on the front of the housing 20a, and a shutter button 13a constituting the operation unit 13 is provided on its top.

FIG. 6 shows an example of photographing, as the subject, a bird 46 perched on a tree branch 45. The user 47, for example, holds the housing 20a with the right hand 48 and, while watching the display screen 15a of the display unit 15 with the bird 46 within the field of view, presses the shutter button 13a with a finger of the right hand 48 to take the picture.

In the present embodiment, an inference model is used to determine the decisive moment, which is the photo opportunity. That is, the inference engine 12 implements an inference model that infers, for an image (live view image), the predicted time until the decisive moment arrives. Such an inference model can be generated by the external device 30.

FIG. 9 shows the operation of the external device 30. In step S41 of FIG. 9, the external device 30 determines whether a learning request has been made, and waits until one arrives. When a learning request occurs, the external device 30, in step S42, reads learning data from, for example, the external image DB 32 and creates teacher data. The teacher data creation of step S42 may be carried out according to the flow of FIG. 5.

Once the teacher data has been created and recorded in the teacher data recording unit 31e, the input/output modeling unit 31d, in step S43, reads the teacher data from the teacher data recording unit 31e, performs learning, and creates an inference model. In the next step S44, the input/output modeling unit 31d sets practice problems and verifies the created inference model. In step S45, it determines whether, as a result of the verification using the practice problems, the reliability of the inference is equal to or greater than a predetermined value. If so, the input/output modeling unit 31d judges that the inference model has been generated correctly and transmits it to the imaging device 20 via the communication unit 31c (step S49).

If the reliability is below the predetermined value, the input/output modeling unit 31d proceeds from step S45 to step S46, resets the teacher data or makes similar adjustments, and then determines in step S47 whether the resetting has been performed a predetermined number of times or more. If not, the input/output modeling unit 31d returns to step S43. If the resetting has been performed the predetermined number of times or more, it proceeds from step S47 to step S48, judges that the object image is a weak image, that is, one unsuited to inference, transmits weak image information to the imaging device 20, and then proceeds to step S49.
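
The train, verify, and retry loop of FIG. 9 (steps S43 to S49) can be sketched as follows. This is a schematic under assumptions: the callbacks `train`, `verify`, and `reset_teacher_data`, and the values of `min_reliability` and `max_resets`, are hypothetical placeholders for the learning, the verification with practice problems, and the "predetermined" value and count mentioned above.

```python
from typing import Any, Callable, Tuple


def build_inference_model(train: Callable[[], Any],
                          verify: Callable[[Any], float],
                          reset_teacher_data: Callable[[], None],
                          min_reliability: float = 0.8,
                          max_resets: int = 3) -> Tuple[Any, bool]:
    """Return (model, weak); weak is True when the retry budget ran out."""
    resets = 0
    while True:
        model = train()                       # S43: learn from the teacher data
        if verify(model) >= min_reliability:  # S44, S45: check the reliability
            return model, False               # S49: send the model
        reset_teacher_data()                  # S46: adjust the teacher data
        resets += 1
        if resets >= max_resets:              # S47: retry budget exhausted
            return model, True                # S48: flag as weak, then S49


# Stub callbacks: the reliability improves on each retraining round.
scores = iter([0.5, 0.7, 0.9])
model, weak = build_inference_model(
    train=lambda: "model",
    verify=lambda m: next(scores),
    reset_teacher_data=lambda: None,
)
```

With these stubs the third round passes the check, so `weak` is `False`. If the reliability never reached the threshold, the function would return after three resets with `weak` set to `True`, corresponding to the weak image information of step S48.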

Meanwhile, in step S21 of FIG. 8, the control unit 11 of the imaging device 20 determines whether the shooting mode is specified. If the shooting mode is specified, the control unit 11 performs image input and display in step S22. That is, the imaging unit 22 images the subject, and the control unit 11 takes in the captured image from the imaging unit 22 and, as shown in FIG. 3, gives it to the display unit 15 to be displayed as a live view image.

Next, in step S23, the setting control unit 11d causes the inference engine 12 to execute image time inference so that the time until the decisive moment can be displayed. Using the inference model implemented by the network 12a, the inference engine 12 infers the time until each live view image being captured turns into the decisive-moment image, and outputs the result to the control unit 11. The inference result includes the time until the displayed live view image changes into the decisive-moment image, together with information on its reliability.

In step S24, the setting control unit 11d determines whether the inference engine 12 has an inference model relevant to the live view image. For example, if the reliability of the inference result from the inference engine 12 is lower than a predetermined first threshold, the setting control unit 11d may judge that the inference engine 12 has no inference model relevant to the live view image. Alternatively, the setting control unit 11d may recognize the subject in the live view image by known recognition processing and determine whether an inference model exists for the recognized subject.

If there is no relevant inference model, the setting control unit 11d proceeds to step S29. If it determines that there is a relevant inference model, then in the next step S25 it displays an indication that an inference model relevant to the current live view image exists.

Next, the setting control unit 11d determines whether the reliability of the inference result from the inference engine 12 is sufficiently high, for example whether it is equal to or greater than a predetermined second threshold. If the reliability is equal to or greater than the second threshold, the setting control unit 11d proceeds to step S27 and displays a highly reliable time difference; if the reliability is below the second threshold, it proceeds to step S28 and displays a time difference range of relatively high reliability.

FIG. 7 is an explanatory diagram showing captured images displayed on the display screen 15a of the display unit 15. As described above, the user 47 is trying to photograph the bird 46 on the branch 45. In particular, the user 47 regards the moment the bird 46 takes off from the branch 45 as the decisive moment and wishes to capture it. The images P11 to P14 in FIG. 7 are live view images at given times, with time advancing in the order P11 to P14.

The image 46a in the image P11 shows the bird 46 perched on the branch. The image P11 is displayed on the display screen 15a as a live view image. In the image P12, a live view image acquired a predetermined time after the image P11, a circular indication 51 is displayed to show that an inference model relevant to the subject in the image P12 exists. The image 46b of the bird 46 in the image P12 shows the bird 46 just about to take off. The image P12 also contains a time difference range display 52b indicating that, according to the image time inference by the inference engine 12, the time until the decisive moment is between 5 seconds and 2 seconds. Because the reliability of the inference result is not sufficiently high, the time difference range display 52b presents the result as a range; for example, it shows the minimum and maximum of a plurality of inference results of relatively high reliability (for example, 65 to 84%).

In contrast, the image P13, a live view image acquired a predetermined time after the image P12, contains an image 46c of the bird 46 immediately before takeoff. The image P13 also contains a time difference display 52c indicating that, according to the image time inference by the inference engine 12, the time until the decisive moment is 1 second. The time difference display 52c is shown when the reliability of the inference result is sufficiently high (for example, 85% or more) and presents the single inference result with the highest reliability. The time difference display 52c in the image P13 indicates that the bird 46 is likely to take off 1 second after the display 52c first appears.
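
The switch between the range display 52b and the single-value display 52c (steps S24 to S28) can be sketched as follows. This is a sketch under assumptions: the 85% value for the second threshold comes from the example above, while reusing 65% as the first threshold, the `(seconds, reliability)` result format, and the function name `choose_time_display` are hypothetical.

```python
from typing import List, Optional, Tuple


def choose_time_display(results: List[Tuple[float, float]],
                        first_threshold: float = 0.65,
                        second_threshold: float = 0.85) -> Optional[str]:
    """results holds (seconds to the decisive moment, reliability) pairs."""
    usable = [(t, r) for t, r in results if r >= first_threshold]
    if not usable:
        return None  # S24: treated as "no relevant inference model"
    best_time, best_rel = max(usable, key=lambda p: p[1])
    if best_rel >= second_threshold:
        return f"{best_time:g} s"  # S27: single time difference display
    times = [t for t, _ in usable]
    return f"{max(times):g} to {min(times):g} s"  # S28: range display


print(choose_time_display([(5, 0.70), (2, 0.80), (9, 0.30)]))  # 5 to 2 s
print(choose_time_display([(1, 0.90), (2, 0.70)]))             # 1 s
```

The first call reproduces the "5 seconds to 2 seconds" range shown for the image P12, and the second the "1 second" display shown for the image P13.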

By pressing the shutter button 13a 1 second after the time difference display 52c appears, the user 47 is likely to capture the decisive moment at which the bird 46 takes off. In step S29, the control unit 11 determines whether a moving image or still image shooting operation has been performed. If no shooting operation has been performed, the control unit 11 returns to step S21; if one has been performed, it executes shooting and recording in step S30 and then returns to step S21. That is, the control unit 11 causes the recording control unit 11c to record the captured image acquired by the imaging unit 22 in the image data recording area 16a of the recording unit 16. During moving image recording, the moving image file is recorded in the image data recording area 16a when the shooting end operation is performed.

FIG. 10 is an explanatory diagram of the images captured in this way and shows an example of rec view images displayed on the display screen 15a. The left side of FIG. 10 shows rec view images 55 displayed on the display screen 15a during continuous shooting. For example, the rec view images 55 are obtained when the user starts continuous shooting after checking the time difference range display 52b or the time difference display 52c. The thick frame indicates that one of the continuously shot images is the decisive-moment image 55a.

The right side of FIG. 10 shows a rec view image 57 displayed on the display screen 15a during single shooting. For example, when the user checks the time difference display 52c and presses the shutter button 13a after the displayed time has elapsed, the decisive-moment image shown as the rec view image 57 is obtained.

If the control unit 11 determines in step S21 that the shooting mode is not specified, it proceeds to step S31 and determines whether acquisition of an inference model is specified. If acquisition of an inference model is not specified, the control unit 11 returns to step S21; if it is specified, the control unit 11 sets the object and the relearning target in step S32.

For example, the control unit 11 may cause the display control unit 11f to display a dictionary setting menu on the display screen 15a and, in response to user operations, display an object setting screen and a relearning target setting screen so that the user can designate the object and the relearning target. In step S33, the control unit 11 issues to the external device 30 a learning request or relearning request for the object or relearning target designated by the user.

In step S34, the control unit 11 receives from the learning unit 31, via the communication unit 14, either an inference model whose reliability is equal to or greater than the predetermined value or an inference model corresponding to weak image information. The control unit 11 sets the received inference model in the inference engine 12 and records the weak image information in the recording unit 16.

The description of FIG. 8 covered an example in which the display is switched between the time difference display and the time difference range display depending on whether the reliability of the image time inference result is sufficiently high, but various display forms are possible. For example, the time difference may be displayed together with a numerical value or color coding indicating its reliability, or the display may be made darker as the reliability increases. Alternatively, either the time difference display or the time difference range display may always be used regardless of reliability.

In the above description the shooting operation is performed manually by the user, but the imaging control unit may instead control shooting so that it is performed automatically at the decisive moment. In continuous shooting, the timing of the burst may also be synchronized with the decisive moment so that a frame is reliably captured at that moment.
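
One conceivable way to synchronize a burst with the predicted decisive moment is to delay the first frame so that a later frame lands exactly on the prediction. The sketch below is only an illustration of that arithmetic: the function name `burst_schedule`, the millisecond units, and the fixed frame interval are assumptions, and the text above does not specify how the synchronization is actually performed.

```python
from typing import List


def burst_schedule(time_to_moment_ms: int, interval_ms: int) -> List[int]:
    """Frame times (ms from now) for a burst with one frame on the predicted moment."""
    if time_to_moment_ms < 0 or interval_ms <= 0:
        raise ValueError("need a non-negative time and a positive interval")
    # Waiting for the remainder makes the predicted moment a whole number of
    # frame intervals after the first frame.
    delay = time_to_moment_ms % interval_ms
    return list(range(delay, time_to_moment_ms + 1, interval_ms))


print(burst_schedule(1000, 300))  # [100, 400, 700, 1000]
```

With a predicted 1 second to the decisive moment and a frame every 300 ms, the burst starts after 100 ms and its fourth frame coincides with the prediction.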

As described above, the present embodiment realizes machine learning that, using images having time information as learning data, performs image time inference to predict the time until a given moment (the decisive moment) is reached. By applying the inference model obtained by this machine learning to, for example, an imaging device, image time inference can be performed on the constantly changing live view image to predict and present the time remaining until a decisive moment such as a bird taking off. By operating the shutter button, for example, with the presented time in mind, the user can easily capture the decisive moment at which the bird takes off. Moreover, images having time information, which serve as the learning data, are very easy to obtain, and teacher data can be derived from them by relatively simple processing, so an inference model that enables image time inference can be created easily.

(Second Embodiment)
FIG. 11 is a flowchart showing an operation flow employed in the second embodiment of the present invention. The hardware configuration of this embodiment is the same as that of the first embodiment. In FIG. 11, steps identical to those in FIG. 9 are denoted by the same reference numerals, and their description is omitted.

The first embodiment described an example in which images having time information are used as learning data, teacher data is created from this learning data, and an inference model that performs image-time inference, predicting for each image the time until the decisive moment, is generated and used. In contrast, the present embodiment is an example in which images having time information are used as learning data, sets of images separated by a predetermined time interval are created from this learning data as teacher data, and an inference model that performs image-image inference, predicting for each image the image a predetermined time later, is generated and used.

The flow of FIG. 11 differs from the flow of FIG. 5 in how the teacher data is created. Specifically, in step S51 of FIG. 11, the population creation unit 31a of the external device 30 acquires a group of continuous images of similar objects from the external image DB 32 or the like. In step S52, the population creation unit 31a supplies two images separated by a specific time difference to the network as teacher data for training.

FIG. 12 is an explanatory diagram showing an example of a continuous image group taken into the population creation unit 31a from the external image DB 32. As in FIG. 4, images P21 to P29 were obtained by shooting a series of states before and after a bird takes off and are arranged in chronological order. These images P21 to P29 may be acquired using a continuous shooting function or a moving image function, or by single-shot photography at predetermined time intervals.

Using the time determination unit 31a1 and the object image determination unit 31a2, the population creation unit 31a selects pairs of two images separated by a predetermined time as teacher data. For example, each arrow in FIG. 12 points to the image N seconds later, indicating that images P21 and P24, images P22 and P25, images P23 and P26, and images P24 and P27 form pairs. The population creation unit 31a records the created teacher data in the teacher data recording unit 31e.
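
The selection of image pairs separated by N seconds might be sketched as follows. The function name `make_image_pairs`, the tolerance parameter, and the one-second frame spacing used in the example are assumptions made for illustration; the embodiment does not state the spacing of images P21 to P29.

```python
from typing import List, Tuple


def make_image_pairs(times: List[float], interval: float,
                     tol: float = 1e-6) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where frame j was shot `interval` s after frame i."""
    pairs = []
    for i, t in enumerate(times):
        for j in range(i + 1, len(times)):
            if abs(times[j] - t - interval) <= tol:
                pairs.append((i, j))
                break  # at most one partner per earlier frame
    return pairs


# Nine frames P21..P29 at an assumed 1-second spacing, with N = 3 seconds.
names = [f"P{21 + k}" for k in range(9)]
times = [float(k) for k in range(9)]
pairs = [(names[i], names[j]) for i, j in make_image_pairs(times, 3.0)]
print(pairs)
```

Under these assumed timings the first four pairs match the pairs indicated by the arrows in FIG. 12; frames near the end of the series have no partner and are simply skipped.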

FIG. 13 is an explanatory diagram illustrating, in the same manner as FIG. 2, a method of generating the network 12a. In FIG. 13, the large learning data set input to the predetermined network N1 consists of the image pairs read from the teacher data recording unit 31e. In the present embodiment, the network design is determined so that, when an image is given to the network N1 as input, the image N seconds later is obtained as output. The network design information determined in this way is transmitted from the communication unit 31c to the imaging device 20 and is recorded as setting information in the setting data recording area 16b by the setting control unit 11d.

Furthermore, in the present embodiment, when the learning unit 31 determines in step S45 that the reliability is equal to or greater than a predetermined value, it decides in the next step S53 which image to output as the output image inferred from the input image, that is, which of the images acquired a predetermined time after the acquisition time of the input image. As described later, this image is recorded on a recording medium (not shown) so that it can be used as a representative image for composite display. When transmitting the network design information, the learning unit 31 also transmits the representative image to the imaging device 20. The recording control unit 11c of the imaging device 20 records the representative image in the image data recording area 16a.

Next, the operation of the imaging device 20 in the embodiment configured as described above will be described with reference to FIG. 14. FIG. 14 is an explanatory diagram showing display examples of images displayed on the display screen of the display unit 15.

In the present embodiment, the inference engine 12 constitutes an inference model that performs the image-image inference described above, outputting for a given input image the predicted image a predetermined time later. In place of steps S25 to S28 in FIG. 8, the control unit 11 of the imaging device 20 causes the inference engine 12 to execute image-image inference and performs display based on the inference result.

Assume now that, as in FIG. 6, the user 47 is imaging the bird 46 perched on the branch 45. Images P31a to P31d in FIG. 14 all show live view images immediately before the bird 46 takes off. The setting control unit 11d supplies the live view image from the imaging unit 22 to the inference engine 12 and causes it to execute image-image inference. As the result of the image-image inference, the inference engine 12 predicts the image that is expected to be captured a predetermined time after the shooting time of the input live view image and outputs the prediction result to the setting control unit 11d.

Based on the prediction result, the setting control unit 11d reads the representative image stored in the image data recording area 16a and supplies it to the display control unit 11f. The display control unit 11f then superimposes, on the current live view image, the representative image that is expected to be captured after the predetermined time. Image P31a in FIG. 14 shows one display example in this case. In addition to the image portion 61 of the bird 46 immediately before takeoff contained in the live view image, image P31a shows a representative image 62a predicted as the image 2 seconds later and a representative image 63a predicted as the image 3 seconds later. Near these images 62a and 63a, the display control unit 11f also displays time displays 62b and 63b indicating that, relative to the acquisition time of the preview image, these images are expected to be acquired 2 seconds and 3 seconds later, respectively.

As described above, the representative image is an image determined and recorded by the external device 30 and transferred from the external device 30 to the imaging device 20. A representative image may therefore not always be available. In that case, a copy of the image portion 61 may be displayed at the display position of the representative image. Image P31b in FIG. 14 shows a display example in this case. In addition to the image portion 61 of the bird 46 immediately before takeoff contained in the live view image, image P31b shows an image 64a generated by copying the image portion 61, in place of the representative image predicted as the image 2 seconds later. Near this image 64a, the display control unit 11f also displays a time display 64b indicating that, relative to the acquisition time of the preview image, the representative image is expected to be acquired 2 seconds later.

Image P31c in FIG. 14 shows an example in which the display control unit 11f displays time displays 65b and 66b in place of the time displays 62b and 63b of image P31a. The time displays 65b and 66b, which imitate the hands of a clock, together with arrows indicate that the representative images 62a and 63a are expected to be acquired 2 seconds and 3 seconds, respectively, after the shooting time of the preview image.

Image P31d in FIG. 14 shows an example in which a plurality of representative images that are expected to be captured at the same time are displayed simultaneously. The description of step S53 in FIG. 11 above dealt with an example in which only one representative image is selected, but a plurality of images may be selected and recorded as representative images. In this case, the setting control unit 11d records the plurality of representative images transferred from the external device 30 in the image data recording area 16a.

Based on the prediction result of the inference engine 12, the setting control unit 11d reads the representative images recorded in the image data recording area 16a and supplies them to the display control unit 11f. When there are a plurality of representative images, the display control unit 11f superimposes all of them on the display. Image P31d shows one display example in this case. In addition to the image portion 61 of the bird 46 immediately before takeoff contained in the live view image, image P31d shows representative images 67a to 67c predicted as images 2 seconds later. Near these images 67a to 67c, the display control unit 11f also displays a time display 67d indicating that, relative to the acquisition time of the preview image, these images are expected to be acquired 2 seconds later.

As described above, the present embodiment realizes machine learning that uses images having time information as learning data and performs image-image inference, which predicts the image a predetermined time later. By applying the inference model obtained through this machine learning to, for example, an imaging device, image-image inference can be performed on a constantly changing live view image to predict and present, for example, how a bird will appear when photographed after a predetermined time. By performing a shooting operation, for example, with the presented image in mind, the user can easily capture the bird in flight. In addition, images having time information for use as learning data are very easy to obtain, teacher data can be derived from this learning data by relatively simple processing, and an inference model enabling image-image inference can therefore be created easily.

(Third Embodiment)
FIG. 15 is a flowchart showing an operation flow employed in the third embodiment of the present invention. The hardware configuration of this embodiment is the same as that of the first embodiment. In FIG. 15, steps identical to those in FIG. 11 are denoted by the same reference numerals, and their description is omitted.

The second embodiment described an example of generating an inference model for image-image inference. The present embodiment is an example in which images having time information are used as learning data, sets consisting of images separated by a predetermined time interval and their position difference are created from this learning data as teacher data, and an inference model that performs image-position inference, predicting for each image the image position a predetermined time later, is generated and used.

The flow of FIG. 15 differs from that of FIG. 11 in how the teacher data is created: step S61 is adopted in place of step S52, and the processing of step S53 is omitted. As in the second embodiment, the external image DB 32 is assumed to hold, for example, the continuous image group shown in FIG. 12.

As in the second embodiment, the population creation unit 31a uses the time determination unit 31a1 and the object image determination unit 31a2 to take pairs of two images separated by a predetermined time as the basis for teacher data. For example, in FIG. 12, images P21 and P24, images P22 and P25, images P23 and P26, and images P24 and P27 form pairs. The population creation unit 31a obtains, for the object image, information on its position within each image. For example, when the object image is the bird in FIG. 12, the position of the bird's face or of the tip of its wing may be obtained. The population creation unit 31a then determines the difference between the positions of the object images in the paired images, that is, the position difference of the object in the image acquired later (hereinafter, the subsequent image) relative to the position of the object in the image acquired earlier (hereinafter, the previous image). The population creation unit 31a records the relationship between the previous image and the position difference as teacher data in the teacher data recording unit 31e.
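
The position-difference teacher data described above might be sketched as follows. The pixel coordinates, the dictionary layout, and the function names are hypothetical and serve only to illustrate the calculation; how the object position is detected is outside the scope of this sketch.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position of the object in an image


def position_difference(prev_pos: Point, next_pos: Point) -> Point:
    """Position of the object in the subsequent image relative to the previous image."""
    return (next_pos[0] - prev_pos[0], next_pos[1] - prev_pos[1])


def make_position_teacher_data(
    pairs: List[Tuple[str, str]], positions: Dict[str, Point]
) -> List[Tuple[str, Point]]:
    """Pair each previous image with the position difference to its subsequent image."""
    return [
        (prev, position_difference(positions[prev], positions[nxt]))
        for prev, nxt in pairs
    ]


# Hypothetical detected positions (for example, of the bird's face) in pixels.
positions = {
    "P21": (100.0, 200.0), "P24": (130.0, 160.0),
    "P22": (102.0, 198.0), "P25": (150.0, 140.0),
}
teacher = make_position_teacher_data([("P21", "P24"), ("P22", "P25")], positions)
print(teacher)
```

Storing the difference rather than the absolute position of the subsequent image matches the description that the previous image's object position serves as the reference.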

The method of generating the network 12a in this case is the same as in the second embodiment, except that the network design is determined so that, instead of the output image in FIG. 13 described above, the position difference of the subsequent image relative to the position in the previous image is obtained. That is, in the present embodiment, an inference model is obtained that takes the previous image as input and performs image-position inference, predicting the position of the object in the subsequent image. The network design information determined in this way is transmitted from the communication unit 31c to the imaging device 20 and is recorded as setting information in the setting data recording area 16b by the setting control unit 11d.

Next, the operation of the imaging device 20 in the embodiment configured as described above will be described with reference to FIGS. 16 and 17. FIG. 16 is a flowchart showing the control performed by the control unit 11 of the imaging device 20, and FIG. 17 is an explanatory diagram showing display examples of images displayed on the display screen of the display unit 15. The control flow of the control unit 11 shown in FIG. 16 is substantially the same as that of FIG. 8 and differs in that steps S65 and S66 are adopted in place of steps S27 and S28 of FIG. 8, respectively.

In the present embodiment, the inference engine 12 constitutes an inference model that performs the image-position inference described above, outputting for a given input image the predicted position a predetermined time later.

Assume now that, as in FIG. 6, the user 47 is imaging the bird 46 perched on the branch 45. Images P32a to P32d in FIG. 17 all show live view images immediately before the bird 46 takes off. The setting control unit 11d supplies the live view image from the imaging unit 22 to the inference engine 12 and causes it to execute image-position inference. As the result of the image-position inference, the inference engine 12 predicts the position difference from the input image, that is, the position of the object image in the image expected to be captured a predetermined time after the shooting time of the live view image, relative to the position of the object image in the input live view image, and outputs the prediction result to the setting control unit 11d.

When the control unit 11 of the imaging device 20 determines in step S26 of FIG. 16 that the reliability of the prediction result from the inference engine 12 is sufficiently high, it displays, in the next step S65 and based on the inference result, an indication of the position of the object image in the image expected to be captured after the predetermined time.

Image P32a in FIG. 17 shows one display example in this case. In image P32a, in addition to the image (object image) portion 61 of the bird 46 immediately before takeoff contained in the live view image, the display control unit 11f displays a position display 64a indicating the position predicted as the image position of the object image 2 seconds later. The position display 64a is an image obtained by copying the image portion 61. Near this image 64a, the display control unit 11f also displays a time display 64b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired 2 seconds later.

Image P32b in FIG. 17 shows an example in which the display control unit 11f displays a position display 71a in place of the position display 64a of image P32a. Near this display 71a, the display control unit 11f also displays a time display 71b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired 2 seconds later. The position display 71a uses a curved line to indicate the range of positions in the image that the bird 46 is expected to reach 2 seconds after the shooting time of the preview image.

When the reliability of the inference is determined in step S26 not to be sufficiently high, the control unit 11 displays, in step S66, a range of positions for which the reliability is relatively high. Image P32c in FIG. 17 shows a display example in this case. In image P32c, the display control unit 11f displays a position range display 72a formed by two curved lines to indicate the range in which the object of the subsequent image is expected to be located 2 seconds later. Near the position range display 72a, the display control unit 11f also displays a time display 72b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired 2 seconds later. When the reliability of the image-position inference result cannot be regarded as sufficiently high, the position range display 72a indicates the range between the nearest and the farthest positions among a plurality of inference results having relatively high reliability (for example, 65 to 84%).
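
The range shown by the position range display might be computed along the following lines. This is a sketch under stated assumptions: "nearest" and "farthest" are measured here as Euclidean distance from the object's current position, which the embodiment does not specify, and the function name and data layout are hypothetical. The 65 to 84% band is the example given in the description.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def position_range(origin: Point, candidates: List[Tuple[Point, float]],
                   low: float = 0.65, high: float = 0.84
                   ) -> Optional[Tuple[float, float]]:
    """Nearest and farthest distances among moderately reliable predictions.

    `candidates` holds (predicted position, reliability) tuples.
    """
    dists = [math.dist(origin, pos) for pos, rel in candidates if low <= rel <= high]
    if not dists:
        return None  # nothing falls in the reliability band
    return (min(dists), max(dists))


candidates = [((3.0, 4.0), 0.70), ((6.0, 8.0), 0.80), ((30.0, 40.0), 0.50)]
print(position_range((0.0, 0.0), candidates))
```

The two returned distances could then be drawn as the two curved lines of a display such as 72a, or as the bounds of a circular region.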

Image P32d shows another display example for step S66. In image P32d, the display control unit 11f displays a circular position range display 73a to indicate the range in which the object of the subsequent image is expected to be located 2 seconds later. Near the position range display 73a, the display control unit 11f also displays a time display 73b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired 2 seconds later.

As described above, the present embodiment realizes machine learning that uses images having time information as learning data and performs image-position inference, which predicts the position of the object image a predetermined time later. By applying the inference model obtained through this machine learning to, for example, an imaging device, image-position inference can be performed on a constantly changing live view image to predict, for example, where in the frame a bird will be captured after a predetermined time, and to present that position. By performing a shooting operation, for example, with the presented position in mind, the user can easily capture the bird in flight. In addition, images having time information for use as learning data are very easy to obtain, teacher data can be derived from this learning data by relatively simple processing, and an inference model enabling image-position inference can therefore be created easily.

(Fourth Embodiment)
FIGS. 18 to 20 are explanatory diagrams for describing the fourth embodiment of the present invention. The hardware configuration of this embodiment is the same as that of the first embodiment. FIG. 18 is an explanatory diagram showing a shooting scene in the present embodiment.

FIG. 18 shows the user 47 photographing the bird 46 perched on the branch 45 of a tree. A river 81 flows between the user 47 and the tree, and a fish 49 is swimming in the river 81. The bird 46 perched on the branch 45 may fly up into the sky, move to a branch of an adjacent tree, or glide toward the fish 49 in the river 81. In FIG. 18, these states are indicated by reference numerals 46h, 46i, and 46j, respectively. If the direction in which the bird 46 will fly off could be predicted in advance, the bird 46 could be photographed more effectively. The example of FIG. 18 indicates that the probability of the bird 46 entering the state 46h and the probability of it entering the state 46i are 15% each, and the probability of it entering the state 46j is 70%. The present embodiment makes such prediction possible.

The second embodiment was an example in which images having time information are used as learning data, sets of images separated by a predetermined time interval are created from this learning data as teacher data, and an inference model that performs image-image inference, predicting for each image (previous image) the image a predetermined time later (subsequent image), is generated and used. The present embodiment is an example of generating and using an inference model that performs image-image-direction inference, which predicts, for a previous image, the subsequent images after a plurality of time intervals together with the moving direction.

In the present embodiment, the continuous image group shown in FIG. 12, for example, can be used as learning data. The population creation unit 31a generates teacher data that pairs a previous image with the subsequent images at a plurality of predetermined times after the acquisition time of the previous image, and stores it in the teacher data recording unit 31e. For each predetermined time, the input/output modeling unit 31d obtains the subsequent image corresponding to the previous image and its moving direction, and selects a representative image to be used as the subsequent image. In this case, a plurality of representative images are selected according to the moving direction. For example, a representative image may be selected for every 15 degrees of moving direction. The network design information and representative images generated in this way are transmitted to the imaging device 20, and the recording control unit 11c of the imaging device 20 records the network design information in the setting data recording area 16b and the representative images in the image data recording area 16a.
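
Selecting one representative image per 15 degrees of moving direction might be sketched as follows. The sector numbering, the use of the first image encountered in each sector as its representative, and the convention that the angle is measured counterclockwise from the positive x axis are all assumptions made for this illustration.

```python
import math
from typing import Dict, List, Tuple


def direction_bin(dx: float, dy: float, bin_deg: float = 15.0) -> int:
    """Map a movement vector to a direction sector index (0 to 23 for 15 degrees)."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // bin_deg)


def pick_representatives(
    samples: List[Tuple[str, Tuple[float, float]]]
) -> Dict[int, str]:
    """Keep the first image seen in each direction sector (a hypothetical rule)."""
    reps: Dict[int, str] = {}
    for name, (dx, dy) in samples:
        reps.setdefault(direction_bin(dx, dy), name)
    return reps


samples = [("A", (1.0, 0.1)), ("B", (1.0, 0.2)), ("C", (0.1, 1.0))]
reps = pick_representatives(samples)
print(reps)
```

In image coordinates the y axis usually points downward, so a real implementation would need to fix the sign convention before binning.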

Next, the embodiment configured as described above will be described with reference to FIGS. 18 to 20. FIGS. 19 and 20 are explanatory diagrams showing display examples of images displayed on the display screen of the display unit 15.

In the present embodiment, the inference engine 12 constitutes an inference model that performs the image-image-direction inference described above, outputting for a given input image the predicted images at a plurality of predetermined times later and their moving directions. The user 47 is attempting to photograph the bird 46 after it has taken off from the branch 45. Images P41 to P44 in FIG. 19 show live view images at successive times, with time elapsing in the order of images P41 to P44.

The image 46a in image P41 shows the bird 46 perched on the branch. Image P41 is displayed on the display screen 15a as a live view image. Image P42, a live view image acquired a predetermined time after image P41, includes a circular mark display 51 indicating that an inference model related to the subject in image P42 exists. The image 46b of the bird 46 in image P42 shows the bird 46 about to take off shortly. Suppose, for example, that the reliability of the image-image-direction inference at this point is not sufficiently high. In this case, as a result of the image-image-direction inference by the inference engine 12, the display control unit 11f displays in image P42 a time display 88b, indicating that the prediction is for images 2 to 5 seconds later, and displays indicating the predicted moving direction of the bird 46.

In the example of FIG. 19, image P42 contains a probability display 85hp indicating a 15% probability that the bird 46 will fly upward, with the representative image 85h for that case; a probability display 85ip indicating a 15% probability that it will hop to the neighboring branch, with the representative image 85i for that case; and a probability display 85jp indicating a 70% probability that it will glide horizontally or downward, with the representative image 85j for that case. The probability shown in each probability display is obtained from the reliability value for each direction produced by the inference.
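The relation between the per-direction reliability values and the displayed percentages could be sketched as follows. This is only a minimal illustration: the text states that the probabilities are based on the reliability values but does not give the mapping, so the proportional normalization, the function name `to_percentages`, the direction labels, and the sample scores are all assumptions.

```python
def to_percentages(reliabilities):
    """Convert non-negative per-direction reliability scores to integer percentages.

    Assumption: simple proportional normalization. The source does not
    specify how reliability values map to the displayed probabilities.
    """
    total = sum(reliabilities.values())
    if total <= 0:
        raise ValueError("at least one reliability value must be positive")
    return {direction: round(100 * score / total)
            for direction, score in reliabilities.items()}


# Hypothetical reliability scores for the three directions shown in image P42.
scores = {"upward": 1.5, "next_branch": 1.5, "glide": 7.0}
print(to_percentages(scores))
```

Because each percentage is rounded independently, the displayed values may not always sum to exactly 100; a real display would need a rule for that case.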

Furthermore, image P43, a live view image acquired a predetermined time after image P42, also includes the circular mark display 51 indicating that an inference model related to the subject in image P43 exists. Image 46c of the bird 46 in image P43 shows the bird 46 immediately before takeoff. Suppose, for example, that the reliability of the image image direction inference at this point is sufficiently high. In this case, as a result of the image image direction inference by the inference engine 12, the display control unit 11f displays in image P43 a display indicating the moving direction of the bird 46 one second later.

In the example of FIG. 19, image P43 contains a probability display 86ip indicating a 5% probability that the bird 46 will hop to the neighboring branch, with the representative image 86i for that case, and a probability display 86jp indicating a 95% probability that it will glide horizontally or downward, with the representative image 86j for that case. A time display 88c, indicating that the prediction is for one second from the present, is also displayed.
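The difference between time display 88b (a range, shown when reliability is not sufficiently high) and time display 88c (a single value, shown when reliability is sufficiently high) could be sketched as below. This is an illustration under stated assumptions: the function name `format_time_display`, the reliability scale of 0 to 1, the threshold of 0.8, and the way the range is widened around the predicted time are all hypothetical and not taken from the source.

```python
def format_time_display(predicted_s, reliability, threshold=0.8, spread_s=1.5):
    """Return the text of the time display for a predicted time in seconds.

    Assumptions: reliability lies in [0, 1]; threshold and spread_s are
    illustrative values that the source does not specify.
    """
    if reliability >= threshold:
        # Reliable enough: show a single predicted time.
        return f"{predicted_s:g} s"
    # Not reliable enough: show a time range around the prediction.
    low = max(0.0, predicted_s - spread_s)
    high = predicted_s + spread_s
    return f"{low:g}-{high:g} s"


print(format_time_display(3.5, 0.5))   # low reliability, range display
print(format_time_display(1.0, 0.95))  # high reliability, single value
```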

By checking image P43 displayed on the display screen 15a of the display unit 15, the user 47 can predict that the bird 46 will glide in a substantially horizontal direction after taking off from the branch 45.

For example, by checking image P43 and anticipating the moving direction of the bird 46, the user 47 can keep the bird 46 within the shooting range relatively easily. As a result, the user 47 can capture the desired decisive moment, such as the moment the bird 46 catches the fish 49.

FIG. 20 shows the display on the display screen 15a in this case, in which an image 49p of the fish and an image 46p of the bird holding the fish in its beak are displayed.

Image P44 in FIG. 19 shows the live view image one second after the shooting time of image P43. It displays image 46d of the actual bird 46, the representative image 87j of the prediction result, and a probability display 87jp indicating that the probability of that moving direction is 100%.

As described above, the present embodiment realizes machine learning that uses images having time information as learning data and performs image image direction inference, which predicts the images after a plurality of predetermined times and their moving directions. By applying the inference model obtained by this machine learning to, for example, an imaging device, image image direction inference can be performed on a continuously changing live view image to predict and present, for example, in which direction a bird will be captured after a predetermined time. By taking the presented images into account when performing a shooting operation, for example, the user can easily capture a bird in flight. Moreover, images having time information, which serve as the learning data, are very easy to acquire, teacher data can be obtained from this learning data by relatively simple processing, and an inference model that enables image image direction inference can therefore be created easily.
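The "relatively simple processing" that turns time-stamped images into teacher data for several prediction horizons could look roughly like the following sketch. It assumes that each frame already carries a shooting time and a detected position of the target object; the `Frame` type, the function names, the horizons of 1 and 2 seconds, the matching tolerance, and the use of an angle in image coordinates as the "moving direction" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import atan2, degrees


@dataclass
class Frame:
    name: str
    t: float  # shooting time in seconds
    x: float  # detected position of the target object in the image (pixels)
    y: float


def frame_at(frames, t, tol=0.05):
    """Return the frame whose shooting time is closest to t, or None if none is within tol."""
    best = min(frames, key=lambda f: abs(f.t - t))
    return best if abs(best.t - t) <= tol else None


def make_teacher_data(frames, horizons=(1.0, 2.0)):
    """Pair each frame with the later frame and moving direction at each horizon.

    Frames for which any horizon has no matching later frame are skipped.
    """
    samples = []
    for f in frames:
        targets = {}
        for h in horizons:
            later = frame_at(frames, f.t + h)
            if later is None:
                break  # incomplete sample, skip this frame
            # Direction of movement as an angle in image coordinates (degrees).
            angle = degrees(atan2(later.y - f.y, later.x - f.x))
            targets[h] = (later.name, angle)
        else:
            samples.append((f.name, targets))
    return samples


frames = [Frame(f"f{i}", float(i), 10.0 * i, 0.0) for i in range(4)]
for sample in make_teacher_data(frames):
    print(sample)
```

Each resulting pair (input image, images and directions at the later times) corresponds to one item of teacher data in the sense used above.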

In each of the above embodiments, only examples in which the target object is a bird have been described. However, the target object may be anything, and the decisive moment may likewise be anything that can be predicted by learning from the images before and after it. For example, the moment a bird enters the water, the moment a fish jumps out of the water, or the moment a cat or dog turns around may be predicted.

The prediction is not limited to animals: the moment a milk crown forms or the moment a firework bursts may be predicted. The moment of impact of a swing in golf or the like, which is relatively easy to predict, may also be predicted. Furthermore, the moment of cell division, egg cleavage, emergence, hatching, and the like may be predicted. The moment of cell division is relatively difficult to observe, so being able to predict it to within minutes would be extremely useful. It is also possible, for example, to predict the moment to turn off the heat by imaging the state of cooking.

Furthermore, in each of the above embodiments, examples have been described in which time, image, position, and direction are inferred from images, but these inferences can also be made based on sound. For example, courtship behavior can be predicted from an animal's courtship calls or the like. It is also possible to predict, from an image of an animal, the moment of the call that precedes courtship behavior; that is, the timing of sound generation can be predicted from an image.
As described above, the concept of the present invention can be applied to any obtainable information, such as sound and images, and, in addition to predicting one from the other, learning may be performed using both kinds of data for a comprehensive judgment. For example, learning may be performed by attaching to each image frame the fragment of sound information from the corresponding acquisition time. Besides courtship behavior, there are other decisive moments such as egg laying, emergence, and hatching. Medical applications are also possible, in which a disease that a patient is likely to develop later is predicted by inference based on heart sounds, breath sounds, bowel sounds, and the like. Wheezing and breathing patterns in conditions such as asthma change as the condition worsens, which makes early detection easier. This can also be useful for the early treatment of infants, people with disabilities, and others who cannot express their condition in words.
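Attaching a fragment of sound to each image frame, as mentioned above, could be sketched as follows. This is only an illustration: it assumes the audio is available as a flat list of samples recorded from time zero at a known sample rate, and the function names and the window length are hypothetical.

```python
def audio_fragment(samples, sample_rate, t, window_s=0.1):
    """Return the audio samples recorded in [t, t + window_s).

    Assumption: samples[0] was recorded at time 0 and the sample rate is constant.
    """
    start = int(round(t * sample_rate))
    stop = start + int(round(window_s * sample_rate))
    return samples[start:stop]


def pair_frames_with_audio(frame_times, samples, sample_rate, window_s=0.1):
    """Pair each frame acquisition time with the sound fragment from the same time."""
    return [(t, audio_fragment(samples, sample_rate, t, window_s))
            for t in frame_times]


# Toy data: 10 samples per second, 10 seconds of audio, frames at 0 s and 1 s.
audio = list(range(100))
for t, fragment in pair_frames_with_audio([0.0, 1.0], audio, 10, window_s=0.5):
    print(t, fragment)
```

Each (frame, fragment) pair could then be used as a single learning input in place of the image alone.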

In the above embodiments, the imaging device requests an external device to create and transfer the inference model. However, the inference model may be created on any device; for example, a computer in the cloud may be used.

In the above embodiments, a digital camera has been described as the device for imaging. The camera may be a digital single-lens reflex camera or a compact digital camera, may be a camera for moving images such as a video camera or a movie camera, and may of course be a camera built into a mobile phone, a smartphone, or another portable information terminal (PDA: Personal Digital Assistant). The imaging unit may also be separate from the imaging device.

The present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some of the constituent elements shown in an embodiment may be deleted. Furthermore, constituent elements from different embodiments may be combined as appropriate.

Even where the operation flows in the claims, the specification, and the drawings are described using words such as "first" and "next" for convenience, this does not mean that the steps must be carried out in that order. It also goes without saying that any step in these operation flows that does not affect the essence of the invention may be omitted as appropriate.

Of the techniques described here, the control described mainly with flowcharts can in many cases be set by a program, which may be stored in a recording medium or a recording unit. The program may be recorded in the recording medium or recording unit at the time of product shipment, may be supplied on a distributed recording medium, or may be downloaded via the Internet.

In the embodiments, each part described as a "unit" (section or unit) may be configured as a dedicated circuit or as a combination of a plurality of general-purpose circuits and, where necessary, may be configured by combining a microcomputer or a processor such as a CPU that operates according to pre-programmed software, or a sequencer such as an FPGA. A design in which an external device takes over part or all of the control is also possible; in this case, a wired or wireless communication circuit is interposed. Communication may be performed via Bluetooth, WiFi, a telephone line, or the like, and may also be performed via USB or the like. Dedicated circuits, general-purpose circuits, and the control unit may be integrated into an ASIC. A moving unit or the like is composed of various actuators and, where necessary, a coupling mechanism for movement, and the actuators are operated by a driver circuit. This driver circuit is likewise controlled by a microcomputer, an ASIC, or the like according to a specific program. Such control may be corrected and adjusted in detail based on information output by various sensors and their peripheral circuits. Although the embodiments have been described as making judgments based on learning results determined by artificial intelligence, referred to as an inference model or a learned model, this may in some cases be replaced by a simple flowchart, conditional branching, or a numerical judgment involving computation. In addition, the machine learning itself may be carried out within the imaging device, as the computing power of the camera's control circuit improves or by narrowing the scope to a specific situation or target object.

[Appendix 1]
A teacher data creation device comprising:
a time determination unit that obtains, for each image in a series of images having time information based on shooting time, an image after a predetermined time;
a target image determination unit that detects, for each image in the series of images, an image state of a specific target in that image after the predetermined time; and
a control unit that pairs each image with the data of the image state, after the predetermined time, of the specific target obtained for that image to form teacher data.

[Appendix 2]
The teacher data creation device according to Appendix 2, wherein the control unit selects, from the series of images, an image similar to the image state of the specific target after the predetermined time and uses it as a representative image.

[Appendix 3]
A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to Appendix 1, an inference model for inferring from an input image the image state of a predetermined target after a predetermined time.

[Appendix 4]
An imaging device comprising:
an inference engine that implements the inference model generated by the learning device according to Appendix 3;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result of the image state, after a predetermined time, of the predetermined target in the captured image.

[Appendix 5]
A teacher data creation device comprising:
a time determination unit that obtains, for each image in a series of images having time information based on shooting time, an image after a predetermined time;
a target image determination unit that detects, for each image in the series of images, an image position of a specific target in that image after the predetermined time; and
a control unit that pairs each image with the data of the image position, after the predetermined time, of the specific target obtained for that image to form teacher data.

[Appendix 6]
A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to Appendix 5, an inference model for inferring from an input image the image position of a predetermined target after a predetermined time.

[Appendix 7]
An imaging device comprising:
an inference engine that implements the inference model generated by the learning device according to Appendix 6;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result of the image position, after a predetermined time, of the predetermined target in the captured image.

[Appendix 8]
A teacher data creation device comprising:
a time determination unit that obtains, for each image in a series of images having time information based on shooting time, images after a plurality of predetermined times;
a target image determination unit that detects, for each image in the series of images, the image position and moving direction of a specific target in that image after the plurality of predetermined times; and
a control unit that pairs each image with the data of the image position and moving direction, after the plurality of predetermined times, of the specific target obtained for that image to form teacher data.

[Appendix 9]
A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to Appendix 8, an inference model for inferring from an input image the image position and moving direction of a predetermined target after a plurality of predetermined times.

[Appendix 10]
An imaging device comprising:
an inference engine that implements the inference model generated by the learning device according to Appendix 9;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result of the image position and moving direction, after a plurality of predetermined times, of the predetermined target in the captured image.

11: control unit, 11a: imaging control unit, 11b: image processing unit, 11c: recording control unit, 11d: setting control unit, 11e: communication control unit, 11f: display control unit, 12: inference engine, 12a: network, 13: operation unit, 14, 31b, 31c, 33: communication unit, 15: display unit, 16: recording unit, 16a: image data recording area, 16b: setting data recording area, 20: imaging device, 22: imaging unit, 22a: imaging element, 22b: optical system, 30: external device, 31: learning unit, 31a: population creation unit, 31a1: time determination unit, 31a2: target image determination unit, 31d: input/output modeling unit, 31e: teacher data recording unit, 31f: display unit, 32: external image DB.

Claims

1. A teacher data creation device comprising:
a target image determination unit that detects, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific target in a specific state;
a time determination unit that determines, for each image of the series of images, a time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and
a control unit that pairs each image with the time difference data obtained for that image to form teacher data.

2. A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to claim 1, an inference model for inferring from an input image the time at which a predetermined target enters the specific state.

3. An imaging device comprising:
an inference engine that implements the inference model generated by the learning device according to claim 2;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result of the time until the predetermined target in the captured image reaches the specific state.

4. The imaging device according to claim 3, further comprising a display control unit that performs display control for displaying the inference result of the time on a display unit.

5. The imaging device according to claim 4, wherein the captured image is a live view image, and the display control unit displays the inference result of the time on the live view image displayed on the display unit.

6. The imaging device according to claim 3, wherein the setting control unit acquires reliability information of the inference result together with the inference result of the time, and the display control unit changes the display mode of the inference result of the time based on the reliability information.

7. The imaging device according to claim 6, wherein the display control unit displays the time when the reliability of the inference result is equal to or greater than a predetermined threshold, and displays a time range when the reliability of the inference result is less than the predetermined threshold.

8. A teacher data creation method comprising:
a detection step of detecting, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific target in a specific state;
a step of determining, for each image of the series of images, a time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and
a generation step of pairing each image with the time difference data obtained for that image to form teacher data.

9. The teacher data creation method according to claim 8, wherein the detection step detects the specific target by manual operation or recognition processing, and detects the specific state by image analysis processing.

10. The teacher data creation method according to claim 8, wherein the generation step excludes the series of images from the teacher data when the number of images including the specific target is less than a predetermined number, and excludes images not including the specific target from the teacher data.

11. A teacher data creation program causing a computer to execute:
a detection step of detecting, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific target in a specific state;
a step of determining, for each image of the series of images, a time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and
a generation step of pairing each image with the time difference data obtained for that image to form teacher data.
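For illustration only, the steps recited in the method claims (detecting the specific state image, computing each image's time difference to it, and pairing image and time difference, with the exclusion rule for series containing too few target images) could be sketched as below. This sketch is not part of the claims. The tuple layout, the function name, the default minimum of three images, and the choice of the first specific state image as the reference are all assumptions.

```python
def make_time_teacher_data(series, min_target_images=3):
    """Build (image name, time difference) pairs from one series of images.

    series: list of (name, shooting_time_s, has_target, is_specific_state).
    Assumptions: detection results are already attached to each entry, and
    the first image flagged as the specific state is used as the reference.
    """
    # Images that do not include the specific target are excluded.
    with_target = [entry for entry in series if entry[2]]
    # The whole series is excluded when too few images include the target.
    if len(with_target) < min_target_images:
        return []
    state_times = [t for _, t, _, is_state in with_target if is_state]
    if not state_times:
        return []  # no specific state image was detected in this series
    t_state = state_times[0]
    # Time difference from each image to the specific state image.
    return [(name, t_state - t) for name, t, _, _ in with_target]


series = [
    ("a", 0.0, True, False),
    ("b", 1.0, False, False),  # target not in frame, excluded
    ("c", 2.0, True, False),
    ("d", 3.0, True, True),    # specific state image
]
print(make_time_teacher_data(series))
```

Images taken after the specific state image would receive a negative time difference under this sketch; whether such images should be kept is a design decision the sketch does not settle.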