JP7166837B2

JP7166837B2 - Teacher data creation device, teacher data creation method, teacher data creation program, learning device, and imaging device

Info

Publication number: JP7166837B2
Application number: JP2018150749A
Authority: JP
Inventors: 伸之志摩; 尚米山; 和男神田; 和彦志村; 修野中
Original assignee: Olympus Corp
Current assignee: Olympus Corp
Priority date: 2018-08-09
Filing date: 2018-08-09
Publication date: 2022-11-08
Anticipated expiration: 2038-08-09
Also published as: JP2020027982A

Description

本発明は、機械学習のための教師データ作成装置、教師データ作成方法、教師データ作成プログラム、学習装置及び撮像装置に関する。 The present invention relates to a teacher data creation device, a teacher data creation method, a teacher data creation program, a learning device, and an imaging device for machine learning.

近年、デジタルカメラなどの撮影機能付き携帯機器（撮影機器）が普及している。この種の撮影機器においては、撮影時の種々の設定が自動化されたものがある。例えば、デジタルカメラ等には、ピント合わせを自動化したＡＦ機能や、露出を自動化した自動露出（ＡＥ）機能を搭載したものがある。また、撮影を連続的に行う、所謂連写機能を備えた撮影機器も普及している。 In recent years, portable devices with photographing functions (photographing devices), such as digital cameras, have become widespread. Some devices of this type automate various settings at the time of shooting. For example, some digital cameras have an autofocus (AF) function that automates focusing and an automatic exposure (AE) function that automates exposure. Photographing devices with a so-called continuous shooting function, which takes photographs in succession, have also become widespread.

ところで、このような撮影機器によって取得した撮像画像に対する機械学習によって所望の推論結果を得る手法が開発されている。機械学習は、既知の入力情報についてその特徴、時系列情報、空間情報等を学習し、学習結果に基づいて推論を行うことで、未知の事柄についての推論結果を得るものである。即ち、機械学習では、先ず特定の入力情報から、判定可能な出力結果を推論可能にするための学習済みモデルを得る。 By the way, a technique has been developed for obtaining a desired inference result by machine learning for a captured image acquired by such a photographing device. Machine learning obtains inference results about unknown matters by learning the characteristics, time-series information, spatial information, etc. of known input information and inferring based on the learning results. That is, in machine learning, first, a trained model is obtained from specific input information so that a determinable output result can be inferred.

高い信頼性で推論結果が得られるように、学習済みモデルの生成に際して、入力と出力との関係が既知の大量の情報が学習用データとして用いられる。例えば、深層学習においては、大量の学習用データを用いて既知の入力に対して期待される出力が得られるようにネットワークのデザイン設計が行われる。このようなプロセスで得られた学習済モデル（以下、推論モデルともいう）は、学習を行ったネットワークから独立して利用可能である。 A large amount of information with known relationships between inputs and outputs is used as learning data when generating a trained model so that inference results can be obtained with high reliability. For example, in deep learning, a network is designed using a large amount of training data so that an expected output can be obtained for a known input. A trained model obtained by such a process (hereinafter also referred to as an inference model) can be used independently of the network that performed the training.

例えば、特許文献１においては、学習データの数が少ない場合にも学習精度の悪化を防ぐことを目的として、第１コンテンツと当該第１コンテンツとは種別が異なる第２コンテンツとの組が有する関係性を深層学習した第１学習器の一部を用いて、新たな第２学習器を生成する生成部と、前記生成部が生成した前記第２学習器に、第１コンテンツと、前記第２コンテンツとは異なる種別の第３コンテンツとの組が有する関係性を深層学習させる学習部とを備えた技術が開示されている。 For example, Patent Literature 1 discloses a technique intended to prevent deterioration of learning accuracy even when the amount of learning data is small. The technique includes a generation unit that generates a new second learner using part of a first learner that has deep-learned the relationship held by a pair of first content and second content of a type different from the first content, and a learning unit that causes the second learner generated by the generation unit to deep-learn the relationship held by a pair of the first content and third content of a type different from the second content.

特許第６１５１４０４号公報 Japanese Patent No. 6151404

従来、未知の動きをする被写体の画像を元に、所望のタイミングを予測する機械学習を行う装置は開発されていない。 Conventionally, no device has been developed that performs machine learning for predicting a desired timing based on an image of an object that moves in an unknown manner.

本発明は、機械学習により、被写体の画像から所望のタイミングを予測することを可能にすることができる教師データ作成装置、教師データ作成方法、教師データ作成プログラム、学習装置及び撮像装置を提供することを目的とする。 An object of the present invention is to provide a teacher data creation device, a teacher data creation method, a teacher data creation program, a learning device, and an imaging device that make it possible to predict a desired timing from an image of a subject by machine learning.

本発明の一態様による教師データ作成装置は、撮影時刻に基づく時間情報を有する一連の画像から、特定の対象物の特定の状態における画像である特定状態画像を検出する対象物画像判定部と、上記一連の画像の各画像について上記各画像の撮影時刻と、上記一連の画像のうち上記特定状態画像を含む画像の撮影時刻との時間差を判定する時間判定部と、上記各画像と上記各画像について求めた時間差のデータとを組にして教師データとする制御部とを具備する。 A teacher data creation device according to one aspect of the present invention includes: an object image determination unit that detects, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific object in a specific state; a time determination unit that determines, for each image in the series, the time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and a control unit that pairs each image with the time difference data obtained for that image to form teacher data.

本発明の一態様による学習装置は、上記教師データ作成装置によって作成された教師データを用いた機械学習により、入力された画像から所定の対象物が上記特定の状態となる時間を推論する推論モデルを生成する推論モデル生成部を具備する。 A learning device according to one aspect of the present invention includes an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device, an inference model that infers from an input image the time at which a predetermined object enters the specific state.

本発明の一態様による撮像装置は、上記学習装置によって生成された推論モデルを実現する推論エンジンと、撮像部と、上記撮像部による撮像画像を上記推論エンジンに与えて、上記撮像画像中の上記所定の対象物が上記特定の状態となるまでの時間の推論結果を得る設定制御部とを具備する。 An imaging device according to one aspect of the present invention includes: an inference engine that implements the inference model generated by the learning device; an imaging unit; and a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result of the time until the predetermined object in the captured image enters the specific state.

本発明の一態様による教師データ作成方法は、撮影時刻に基づく時間情報を有する一連の画像から、特定の対象物の特定の状態における画像である特定状態画像を検出する検出ステップと、上記一連の画像の各画像について上記各画像の撮影時刻と、上記一連の画像のうち上記特定状態画像を含む画像の撮影時刻との時間差を判定するステップと、上記各画像と上記各画像について求めた時間差のデータとを組にして教師データとして生成する生成ステップとを具備する。 A teacher data creation method according to one aspect of the present invention includes: a detection step of detecting, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific object in a specific state; a step of determining, for each image in the series, the time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and a generation step of pairing each image with the time difference data obtained for that image to generate teacher data.

本発明の一態様による教師データ作成プログラムは、コンピュータに、撮影時刻に基づく時間情報を有する一連の画像から、特定の対象物の特定の状態における画像である特定状態画像を検出する検出ステップと、上記一連の画像の各画像について上記各画像の撮影時刻と、上記一連の画像のうち上記特定状態画像を含む画像の撮影時刻との時間差を判定するステップと、上記各画像と上記各画像について求めた時間差のデータとを組にして教師データとして生成する生成ステップとを実行させる。 A teacher data creation program according to one aspect of the present invention causes a computer to execute: a detection step of detecting, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific object in a specific state; a step of determining, for each image in the series, the time difference between the shooting time of that image and the shooting time of the image in the series that includes the specific state image; and a generation step of pairing each image with the time difference data obtained for that image to generate teacher data.

本発明によれば、機械学習により、被写体の画像から所望のタイミングを予測することを可能にすることができるという効果を有する。 According to the present invention, it is possible to predict a desired timing from an image of a subject by machine learning.

本発明の第１の実施の形態に係る学習装置及び撮像装置を示すブロック図。 A block diagram showing a learning device and an imaging device according to a first embodiment of the present invention.
推論エンジン１２のネットワーク１２ａを説明するための説明図。 An explanatory diagram for explaining the network 12a of the inference engine 12.
画像群３４ａの各画像を撮像する一例を示す説明図。 An explanatory diagram showing an example of capturing each image of the image group 34a.
画像群３４ａの各画像と撮影時間との関係を示す説明図。 An explanatory diagram showing the relationship between each image in the image group 34a and the shooting time.
母集合作成部３１ａによる教師データの作成方法を説明するためのフローチャート。 A flowchart for explaining a method of creating teacher data by the mother set creation unit 31a.
第１の実施の形態の動作を説明するための説明図。 An explanatory diagram for explaining the operation of the first embodiment.
第１の実施の形態の動作を説明するための説明図。 An explanatory diagram for explaining the operation of the first embodiment.
撮像装置２０の動作を示すフローチャート。 A flowchart showing the operation of the imaging device 20.
外部機器３０の動作を示すフローチャート。 A flowchart showing the operation of the external device 30.
第１の実施の形態の動作を説明するための説明図。 An explanatory diagram for explaining the operation of the first embodiment.
本発明の第２の実施の形態において採用される動作フローを示すフローチャート。 A flowchart showing an operation flow adopted in the second embodiment of the present invention.
外部画像ＤＢ３２から母集合作成部３１ａに取り込まれる連続画像群の一例を示す説明図。 An explanatory diagram showing an example of a continuous image group taken in from the external image DB 32 by the mother set creation unit 31a.
ネットワーク１２ａを生成する手法を説明するための説明図。 An explanatory diagram for explaining a method of generating the network 12a.
表示部１５の表示画面に表示される画像の表示例を示す説明図。 An explanatory diagram showing a display example of an image displayed on the display screen of the display unit 15.
本発明の第３の実施の形態において採用される動作フローを示すフローチャート。 A flowchart showing an operation flow adopted in the third embodiment of the present invention.
撮像装置２０の制御部１１の制御を示すフローチャート。 A flowchart showing control by the control unit 11 of the imaging device 20.
表示部１５の表示画面に表示される画像の表示例を示す説明図。 An explanatory diagram showing a display example of an image displayed on the display screen of the display unit 15.
本発明の第４の実施の形態を説明するための説明図。 An explanatory diagram for explaining the fourth embodiment of the present invention.
本発明の第４の実施の形態を説明するための説明図。 An explanatory diagram for explaining the fourth embodiment of the present invention.
本発明の第４の実施の形態を説明するための説明図。 An explanatory diagram for explaining the fourth embodiment of the present invention.

以下、図面を参照して本発明の実施の形態について詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

（第１の実施の形態） (First embodiment)
図１は本発明の第１の実施の形態に係る学習装置及び撮像装置を示すブロック図である。本実施の形態においては、時間情報を有する画像を学習用データとして、所定の瞬間（以下、決定的瞬間ともいう）に到達するまでの時間を予測する機械学習を実現する。具体例として、鳥が飛び立つ瞬間を機械学習により予測する推論モデルを構築すると共に、当該推論モデルを用いて、ライブビュー画像から鳥が飛び立つ瞬間の予測結果を表示することを可能にする。 FIG. 1 is a block diagram showing a learning device and an imaging device according to the first embodiment of the present invention. In the present embodiment, images having time information are used as learning data to realize machine learning that predicts the time remaining until a predetermined moment (hereinafter also referred to as a decisive moment) is reached. As a specific example, an inference model that predicts the moment at which a bird takes off is built by machine learning, and this inference model makes it possible to display, based on a live view image, the predicted moment at which the bird will take off.

図１の撮像装置２０は、被写体を撮像して得た画像を記録する。撮像装置２０としては、デジタルカメラやビデオカメラだけでなく、スマートフォンやタブレット端末に内蔵されるカメラを採用してもよい。撮像装置２０は、後述するように、ライブビュー表示時に推論モデルを利用することができるようになっているが、撮像装置２０は予め搭載されている推論モデルを用いてもよく、また、外部機器３０から推論モデルを取得するようになっていてもよい。 The imaging device 20 in FIG. 1 records an image obtained by imaging a subject. The imaging device 20 may be not only a digital camera or a video camera but also a camera built into a smartphone or a tablet terminal. As will be described later, the imaging device 20 can use an inference model during live view display; it may use an inference model installed in advance, or it may acquire an inference model from the external device 30.

撮像装置２０は、制御部１１及び撮像部２２を備えている。制御部１１は、ＣＰＵ（Central Processing Unit）等を用いたプロセッサによって構成されて、図示しないメモリに記憶されたプログラムに従って動作して各部を制御するものであってもよいし、ハードウェアの電子回路で機能の一部又は全部を実現するものであってもよい。 The imaging device 20 includes a control unit 11 and an imaging unit 22. The control unit 11 may be configured by a processor using a CPU (Central Processing Unit) or the like and operate according to a program stored in a memory (not shown) to control each unit, or it may implement some or all of its functions with hardware electronic circuits.

撮像部２２は、撮像素子２２ａ及び光学系２２ｂを有している。光学系２２ｂは、ズームやフォーカシングのための図示しないレンズや絞り等を備えている。光学系２２ｂは、これらのレンズを駆動する図示しないズーム（変倍）機構、ピント及び絞り機構を備えている。 The imaging unit 22 has an imaging element 22a and an optical system 22b. The optical system 22b includes a lens and a diaphragm (not shown) for zooming and focusing. The optical system 22b includes a zoom (variable magnification) mechanism, focus and diaphragm mechanism (not shown) for driving these lenses.

撮像素子２２ａは、ＣＣＤやＣＭＯＳセンサ等によって構成されており、光学系２２ｂによって被写体光学像が撮像素子２２ａの撮像面に導かれるようになっている。撮像素子２２ａは、被写体光学像を光電変換して被写体の撮像画像（撮像信号）を取得する。 The imaging device 22a is composed of a CCD, CMOS sensor, or the like, and an optical image of a subject is guided to an imaging surface of the imaging device 22a by an optical system 22b. The imaging device 22a photoelectrically converts the subject optical image to obtain a captured image (image capturing signal) of the subject.

制御部１１の撮像制御部１１ａは、光学系２２ｂのズーム機構、ピント機構及び絞り機構を駆動制御して、ズーム、絞り及びピントを調節することができるようになっている。撮像部２２は、撮像制御部１１ａに制御されて撮像を行い、撮像画像（動画像及び静止画像）の撮像信号を制御部１１に出力する。 The imaging control unit 11a of the control unit 11 can drive and control the zoom mechanism, focus mechanism, and aperture mechanism of the optical system 22b to adjust zoom, aperture, and focus. The imaging unit 22 performs imaging under the control of the imaging control unit 11a and outputs imaging signals of the captured images (moving images and still images) to the control unit 11.

撮像装置２０には操作部１３が設けられている。操作部１３は、図示しないレリーズボタン、ファンクションボタン、撮影モード設定、パラメータ操作等の各種スイッチ、ダイヤル、リング部材等を含み、ユーザ操作に基づく操作信号を制御部１１に出力する。制御部１１は、操作部１３からの操作信号に基づいて、各部を制御するようになっている。 The imaging device 20 is provided with an operation unit 13. The operation unit 13 includes a release button, function buttons, various switches for shooting mode setting, parameter operation, and the like, dials, ring members, and so on (none shown), and outputs operation signals based on user operations to the control unit 11. The control unit 11 controls each unit based on the operation signals from the operation unit 13.

制御部１１は、撮像部２２からの撮像画像（動画像及び静止画像）を取込む。制御部１１の画像処理部１１ｂは、取込んだ撮像画像に対して、所定の信号処理、例えば、色調整処理、マトリックス変換処理、ノイズ除去処理、その他各種の信号処理を行う。 The control unit 11 takes in captured images (moving images and still images) from the imaging unit 22. The image processing unit 11b of the control unit 11 performs predetermined signal processing on the captured images, such as color adjustment processing, matrix conversion processing, noise removal processing, and other various kinds of signal processing.

撮像装置２０には表示部１５が設けられており、制御部１１には、表示制御部１１ｆが設けられている。表示部１５は、例えば、ＬＣＤ（液晶表示装置）等の表示画面を有しており、この表示画面は撮像装置２０の例えば筐体背面等に設けられる。表示制御部１１ｆは、画像処理部１１ｂによって信号処理された撮像画像を表示部１５に表示させるようになっている。また、表示制御部１１ｆは、撮像装置２０の各種メニュー表示や警告表示等を表示部１５に表示させることもできるようになっている。 The imaging device 20 is provided with a display unit 15, and the control unit 11 is provided with a display control unit 11f. The display unit 15 has a display screen such as an LCD (liquid crystal display), and this display screen is provided, for example, on the rear surface of the housing of the imaging device 20. The display control unit 11f causes the display unit 15 to display the captured image that has undergone signal processing by the image processing unit 11b. The display control unit 11f can also cause the display unit 15 to display various menus, warnings, and the like of the imaging device 20.

撮像装置２０には通信部１４が設けられており、制御部１１には、通信制御部１１ｅが設けられている。通信部１４は、通信制御部１１ｅに制御されて、外部機器３０との間で情報を送受することができるようになっている。通信部１４は、例えば、ブルートゥース（登録商標）等の近距離無線による通信及び例えば、Ｗｉ－Ｆｉ（登録商標）等の無線ＬＡＮによる通信が可能である。なお、通信部１４は、ブルートゥースやＷｉ－Ｆｉに限らず、各種通信方式での通信を採用することが可能である。通信制御部１１ｅは、通信部１４を介して、外部機器３０から推論モデルの情報を受信することができる。 The imaging device 20 is provided with a communication unit 14, and the control unit 11 is provided with a communication control unit 11e. Under the control of the communication control unit 11e, the communication unit 14 can send and receive information to and from the external device 30. The communication unit 14 is capable of short-range wireless communication such as Bluetooth (registered trademark) and wireless LAN communication such as Wi-Fi (registered trademark). The communication unit 14 is not limited to Bluetooth and Wi-Fi and can employ various other communication methods. The communication control unit 11e can receive inference model information from the external device 30 via the communication unit 14.

制御部１１には記録制御部１１ｃが設けられている。記録制御部１１ｃは、信号処理後の撮像画像を圧縮処理し、圧縮後の画像を記録部１６に与えて記録させることができる。記録部１６は、所定の記録媒体によって構成されて、制御部１１から与えられた情報を記録すると共に、記録されている情報を制御部１１に出力することができる。また、記録部１６としては、例えばカードインターフェースを採用してもよく、この場合には記録部１６はメモリカード等の記録媒体に画像データを記録可能である。 The control unit 11 is provided with a recording control unit 11c. The recording control unit 11c can compress the captured image after signal processing and supply the compressed image to the recording unit 16 for recording. The recording unit 16 is composed of a predetermined recording medium; it records information supplied from the control unit 11 and can output the recorded information to the control unit 11. A card interface, for example, may be used as the recording unit 16, in which case the recording unit 16 can record image data on a recording medium such as a memory card.

記録部１６は、画像データ記録領域１６ａを有しており、記録制御部１１ｃは、画像データを画像データ記録領域１６ａに記録するようになっている。また、記録制御部１１ｃは、記録部１６に記録されている情報を読み出して再生することも可能である。 The recording unit 16 has an image data recording area 16a, and the recording control unit 11c records image data in the image data recording area 16a. The recording control unit 11c can also read out and reproduce information recorded in the recording unit 16.

なお、記録部１６は、設定データ記録領域１６ｂを有している。設定データ記録領域１６ｂには推論モデルの設定情報が記録されるようになっている。 Note that the recording unit 16 has a setting data recording area 16b. Setting information of the inference model is recorded in the setting data recording area 16b.

本実施の形態においては、撮像装置２０には、推論部としての推論エンジン１２が設けられている。推論エンジン１２は、ネットワーク１２ａを有している。ネットワーク１２ａは、記録部１６に記録されている設定値を用いて構築されており、機械学習における学習が完了することによって得られるネットワーク、即ち、推論モデルを構成する。 In this embodiment, the imaging device 20 is provided with an inference engine 12 as an inference unit. The inference engine 12 has a network 12a. The network 12a is constructed using setting values recorded in the recording unit 16, and constitutes a network obtained by completing learning in machine learning, that is, an inference model.

記録制御部１１ｃは、通信部１４を介して、外部機器３０である学習部３１から推論モデルを構成するための情報を受信して、記録部１６の設定データ記録領域１６ｂに設定情報を記録することができるようになっていてもよい。 The recording control unit 11c may be configured to receive, via the communication unit 14, information for configuring an inference model from the learning unit 31, which is the external device 30, and to record it as setting information in the setting data recording area 16b of the recording unit 16.

図２から図４は推論エンジン１２のネットワーク１２ａを説明するための説明図である。図２において、所定のネットワークＮ１には入力及び出力に対応する大量のデータセット３１Ｇが教師データとして与えられる。これにより、ネットワークＮ１は、入力に対応する出力が得られるように、ネットワークデザインが決定される。本実施の形態においては、入力として画像が用いられ、出力として決定的瞬間までの推定時間が信頼性の情報（信頼度）と共に得られる。ネットワークＮ１の決定されたネットワークデザインの情報が設定データ記録領域１６ｂに設定情報として記録される。 FIGS. 2 to 4 are explanatory diagrams for explaining the network 12a of the inference engine 12. In FIG. 2, a large data set 31G of corresponding inputs and outputs is given to a predetermined network N1 as teacher data. The network design of the network N1 is thereby determined so that the output corresponding to each input is obtained. In the present embodiment, an image is used as the input, and the estimated time until the decisive moment is obtained as the output together with reliability information (a degree of reliability). Information on the network design determined for the network N1 is recorded as setting information in the setting data recording area 16b.

なお、深層学習（ディープ・ラーニング）」は、ニューラル・ネットワークを用いた「機械学習」の過程を多層構造化したものである。情報を前から後ろに送って判定を行う「順伝搬型ニューラル・ネットワーク」が代表的なものである。これは、最も単純なものでは、Ｎ１個のニューロンで構成される入力層、パラメータで与えられるＮ２個のニューロンで構成される中間層、判別するクラスの数に対応するＮ３個のニューロンで構成される出力層の３層があればよい。そして、入力層と中間層、中間層と出力層の各ニューロンはそれぞれが結合加重で結ばれ、中間層と出力層はバイアス値が加えられることで、論理ゲートの形成が容易である。簡単な判別なら３層でもよいが、中間層を多数にすれば、機械学習の過程において複数の特徴量の組み合わせ方を学習することも可能となる。近年では、９層～１５２層のものが、学習にかかる時間や判定精度、消費エネルギーの関係から実用的になっている。
機械学習に採用するネットワークＮ１としては、公知の種々のネットワークを採用してもよい。例えば、ＣＮＮ（Convolution Neural Network）を利用したＲ－ＣＮＮ（Regions with CNN features）やＦＣＮ（Fully Convolutional Networks）等を用いてもよい。これは、画像の特徴量を圧縮する、「畳み込み」と呼ばれる処理を伴い、最小限処理で動き、パターン認識に強い。また、より複雑な情報を扱え、順番や順序によって意味合いが変わる情報分析に対応して、情報を双方向に流れる「再帰型ニューラル・ネットワーク」（全結合リカレントニューラルネット）を利用してもよい。
これらの技術の実現のためには、ＣＰＵやＦＰＧＡ（Field Programmable Gate Array）といったこれまでの汎用的な演算処理回路などを使ってもよいが、ニューラル・ネットワークの処理の多くが行列の掛け算であることから、行列計算に特化したGPU（Graphic Processing Unit）やTensor Processing Unit（TPU）と呼ばれるものが利用される場合もある。近年ではこうした人工知能（ＡＩ）専用ハードの「ニューラル・ネットワーク・プロセッシング・ユニット（ＮＰＵ）」がＣＰＵなどその他の回路とともに集積して組み込み可能に設計され、処理回路の一部になっている場合もある。
また、深層学習に限らず、公知の各種機械学習の手法を採用して推論モデルを取得してもよい。例えば、サポートベクトルマシン、サポートベクトル回帰という手法もある。ここでの学習は、識別器の重み、フィルター係数、オフセットを算出するもので、他には、ロジスティック回帰処理を利用する手法もある。機械に何かを判定させる場合、人間が機械に判定の仕方を教える必要があり、今回の実施例では、画像の判定を、機械学習により導出する手法を採用したが、そのほか、特定の判断を人間が経験則・ヒューリスティクスによって獲得したルールを適応するルールベースの手法を応用して用いてもよい。 "Deep learning" is a multi-layered form of the "machine learning" process that uses neural networks. A typical example is the "feedforward neural network", which passes information from front to back to make a determination. In the simplest case, three layers suffice: an input layer composed of N1 neurons, an intermediate layer composed of N2 neurons given by parameters, and an output layer composed of N3 neurons corresponding to the number of classes to be discriminated. The neurons of the input layer and the intermediate layer, and those of the intermediate layer and the output layer, are connected by connection weights, and bias values are added in the intermediate layer and the output layer, which makes it easy to form logic gates. Three layers may be enough for simple discrimination, but with many intermediate layers it also becomes possible to learn, in the course of machine learning, how to combine multiple feature quantities. In recent years, networks with 9 to 152 layers have become practical in view of the time required for learning, determination accuracy, and energy consumption.
Various known networks may be employed as the network N1 used for machine learning. For example, R-CNN (Regions with CNN features), which uses a CNN (Convolutional Neural Network), or FCN (Fully Convolutional Networks) may be used. These involve a process called "convolution" that compresses image feature quantities; they run with minimal processing and are strong at pattern recognition. A "recurrent neural network" (fully connected recurrent neural network), in which information flows in both directions, may also be used; it can handle more complex information and supports the analysis of information whose meaning changes with order or sequence.
To realize these techniques, conventional general-purpose arithmetic processing circuits such as CPUs and FPGAs (Field Programmable Gate Arrays) may be used. However, because much of the processing in a neural network is matrix multiplication, processors specialized for matrix calculation, such as GPUs (Graphics Processing Units) and Tensor Processing Units (TPUs), are sometimes used. In recent years, such dedicated artificial intelligence (AI) hardware, the "neural network processing unit (NPU)", has been designed so that it can be integrated with other circuits such as a CPU, and in some cases it forms part of the processing circuit.
The inference model may also be obtained by various known machine learning techniques other than deep learning. Examples include support vector machines and support vector regression. Learning in these cases calculates the weights, filter coefficients, and offsets of a classifier; there are also techniques that use logistic regression. To have a machine judge something, a human must teach the machine how to judge. This embodiment adopts a technique in which the image judgment is derived by machine learning, but a rule-based technique, which applies rules that humans have acquired through experience and heuristics for a specific judgment, may also be applied.
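As an illustration of the simplest three-layer feedforward network described above (an input layer of N1 neurons, an intermediate layer of N2 neurons, and an output layer of N3 neurons joined by connection weights and bias values), the following is a minimal sketch in plain Python. It is not the network of the embodiment: the layer sizes, the random initial weights, the sigmoid activation, and all function names are assumptions chosen only for illustration, and no learning step is shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one fully connected layer: a weight row per output neuron plus biases.

    The small random weights are an illustrative assumption; learning would adjust them.
    """
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """For each neuron: weighted sum of the inputs, plus bias, then activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_layer, output_layer):
    """Pass information from front to back: input -> intermediate -> output."""
    h = dense(x, *hidden_layer)
    return dense(h, *output_layer)


# Hypothetical sizes: 4 input neurons, 3 intermediate neurons, 2 output classes.
n_input, n_hidden, n_output = 4, 3, 2
rng = random.Random(0)
hidden_layer = make_layer(n_input, n_hidden, rng)
output_layer = make_layer(n_hidden, n_output, rng)

scores = forward([0.1, 0.2, 0.3, 0.4], hidden_layer, output_layer)
print(scores)  # one score per output neuron
```

Adding more intermediate layers would amount to chaining further `dense` calls between the input and the output layer.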

外部機器３０は、このようなネットワークデザインの決定を行う学習部３１と大量の学習用データを記録した外部画像データベース（ＤＢ）３２を有している。学習部３１は通信部３１ｂを有しており、外部画像ＤＢ３２は通信部３３を有している。通信部３１ｂ，３３は相互に通信が可能である。なお、学習部３１の通信部３１ｃは通信部１４の間でも通信が可能である。 The external device 30 has a learning unit 31 that determines such a network design and an external image database (DB) 32 that records a large amount of learning data. The learning unit 31 has a communication unit 31b, and the external image DB 32 has a communication unit 33. The communication units 31b and 33 can communicate with each other. Note that the communication unit 31c of the learning unit 31 can also communicate with the communication unit 14.

学習部３１は、制御部３１ｇを有しており、制御部３１ｇは、ＣＰＵ等を用いたプロセッサによって構成されて、図示しないメモリに記憶されたプログラムに従って動作して各部を制御するものであってもよいし、ハードウェアの電子回路で機能の一部又は全部を実現するものであってもよい。なお、学習部３１全体が、ＣＰＵ、ＧＰＵ（Graphics Processing Unit）、ＦＰＧＡ等を用いたプロセッサによって構成されて、図示しないメモリに記憶されたプログラムに従って動作して学習を制御するものであってもよいし、ハードウェアの電子回路で機能の一部又は全部を実現するものであってもよい。 The learning unit 31 has a control unit 31g. The control unit 31g is configured by a processor using a CPU or the like, and operates according to a program stored in a memory (not shown) to control each unit. Alternatively, a part or all of the functions may be realized by an electronic circuit of hardware. Note that the entire learning unit 31 may be configured by a processor using a CPU, a GPU (Graphics Processing Unit), an FPGA, or the like, and may operate according to a program stored in a memory (not shown) to control learning. However, a part or all of the functions may be realized by an electronic circuit of hardware.

外部画像ＤＢ３２は、画像分類記録部３４を備えている。画像分類記録部３４は、ハードディスクやメモリ媒体等の図示しない記録媒体により構成されており、複数の画像を画像中に含まれる対象物の種類毎に分類して記録する。図１の例では、画像分類記録部３４は、対象物種類Ａの画像群３４ａのみを記録する例を示しているが、分類する種類の数は適宜設定可能である。 The external image DB 32 has an image classification recording section 34 . The image classification recording unit 34 is composed of a recording medium (not shown) such as a hard disk or a memory medium, and classifies and records a plurality of images according to the types of objects included in the images. In the example of FIG. 1, the image classification recording unit 34 records only the image group 34a of the object type A, but the number of types to be classified can be set appropriately.

本実施の形態においては、画像群３４ａとして、例えば鳥の画像群が記録されている。図３及び図４は画像群３４ａの各画像を説明するためのものである。図３は画像群３４ａの各画像を撮像する一例を示す説明図であり、図４は画像群３４ａの各画像と撮影時間との関係を示す説明図である。 In this embodiment, for example, an image group of birds is recorded as the image group 34a. 3 and 4 are for explaining each image of the image group 34a. FIG. 3 is an explanatory diagram showing an example of capturing each image of the image group 34a, and FIG. 4 is an explanatory diagram showing the relationship between each image of the image group 34a and the shooting time.

図３は樹木の枝４５に止まっている鳥４６をカメラ４０によって撮影している様子を示している。カメラ４０の背面にはＬＣＤ等により構成された表示画面４２が設けられており、鳥４６の画像がライブビューとして表示されていることを示している。シャッタボタン４１を操作することで、鳥４６の撮影が可能である。本実施の形態においては、鳥４６が枝４５から飛び立つまでの一連の様子を撮像する。例えば、カメラ４０が連写機能を有している場合には連写機能を用いて、鳥４６が枝４５に止まっている状態から枝４５から飛び立つまでの一連の様子を所定の時間間隔で撮像して一連の画像として取得してもよい。また、カメラ４０が動画の撮像機能を有している場合には、鳥４６が枝４５に止まっている状態から枝４５から飛び立つまでの一連の様子を動画撮影した動画像を取得してもよい。また、カメラ４０のシャッタボタン４１を所定の時間間隔で操作することで、鳥４６が枝４５に止まっている状態から枝４５から飛び立つまでの一連の様子を所定の時間間隔で撮像して離散的に静止画像を取得してもよい。 FIG. 3 shows a bird 46 perched on a branch 45 of a tree being photographed by a camera 40. A display screen 42 composed of an LCD or the like is provided on the back of the camera 40, and the figure shows an image of the bird 46 displayed on it as a live view. The bird 46 can be photographed by operating the shutter button 41. In the present embodiment, the series of states up to the bird 46 taking off from the branch 45 is captured. For example, if the camera 40 has a continuous shooting function, that function may be used to capture, at predetermined time intervals, the series of states from the bird 46 resting on the branch 45 to its taking off from the branch 45, and the result may be acquired as a series of images. If the camera 40 has a moving image capturing function, a moving image of the same series of states may be acquired. Alternatively, by operating the shutter button 41 of the camera 40 at predetermined time intervals, the same series of states may be captured at those intervals to acquire discrete still images.

図４はカメラ４０が取得した６枚の画像Ｐ１～Ｐ６を時間順に配置して示している。各画像Ｐ１～Ｐ６は図３の鳥４６を撮影して取得された画像である。各画像Ｐ１～Ｐ６はそれぞれ時間情報を含んでおり、図４の各画像下の数字は画像Ｐ５を基準として取得された時間を示している。図４の例では、画像Ｐ１～Ｐ６は画像Ｐ５を時間基準として、－２０秒（２秒前）、－１５秒、－１０秒、－５秒、０秒、＋５秒に撮影されたものである。なお、一般的には、カメラ４０にはタイマが内蔵されており、タイマによって時刻の情報が各画像Ｐ１～Ｐ６に付加されるが、いずれかの画像、例えば、連写開始時の画像の取得時刻を基準に相対的な時間の情報を各画像Ｐ１～Ｐ６を付加してもよい。例えば、図４の例では、画像Ｐ１の撮影時刻は１５時３０分２７秒であり、この時間を基準に画像Ｐ２～Ｐ６の相対的な取得時間を時間情報として保持していてもよい。 FIG. 4 shows six images P1 to P6 acquired by the camera 40, arranged in chronological order. Each of the images P1 to P6 was obtained by photographing the bird 46 in FIG. 3. Each of the images P1 to P6 contains time information, and the number under each image in FIG. 4 indicates its acquisition time relative to the image P5. In the example of FIG. 4, the images P1 to P6 were taken at −20 seconds (2 seconds before), −15 seconds, −10 seconds, −5 seconds, 0 seconds, and +5 seconds, with the image P5 as the time reference. In general, the camera 40 has a built-in timer that adds time-of-day information to each of the images P1 to P6; alternatively, relative time information referenced to the acquisition time of one of the images, for example the image at the start of continuous shooting, may be added to each of the images P1 to P6. For example, in FIG. 4 the image P1 was captured at 15:30:27, and the acquisition times of the images P2 to P6 relative to this time may be held as the time information.
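The conversion from absolute capture times to times relative to a reference image, as described for FIG. 4, can be sketched as follows. This is only an illustration under stated assumptions: the calendar date is arbitrary (the text gives only the time of day 15:30:27 for P1), the 5-second spacing is taken from the offsets as printed in the text, and the function name `relative_times` is hypothetical.

```python
from datetime import datetime, timedelta


def relative_times(capture_times, reference_index):
    """Return each capture time in seconds relative to the reference image."""
    reference = capture_times[reference_index]
    return [(t - reference).total_seconds() for t in capture_times]


# P1 is captured at 15:30:27; the date itself is an arbitrary placeholder.
p1_time = datetime(2018, 8, 9, 15, 30, 27)

# Assumed spacing of 5 seconds between consecutive images P1..P6.
capture_times = [p1_time + timedelta(seconds=5 * i) for i in range(6)]

# Relative to P5 (index 4), the decisive moment image.
offsets_from_p5 = relative_times(capture_times, reference_index=4)

# Relative to P1 (index 0), the start of continuous shooting.
offsets_from_p1 = relative_times(capture_times, reference_index=0)

for name, offset in zip(["P1", "P2", "P3", "P4", "P5", "P6"], offsets_from_p5):
    print(f"{name}: {offset:+.0f} s")
```

Either reference gives the same spacing between images, so relative times stored against P1 can be re-based to the decisive moment image once that image has been identified.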

例えば、ユーザが最も撮影したい画像が、鳥４６が枝４５から正に飛び立とうとしている瞬間の画像（以下、特定状態画像ともいう）を含む画像Ｐ５であるものとする。この瞬間を決定的瞬間として、画像Ｐ５を決定的瞬間画像というものとする。本実施の形態においては、図４に示すような時間情報を有する一連の画像を画像分類記録部３４に画像群３４ａとして記録するようになっている。なお、画像分類記録部３４には、同一種類の鳥の画像や、サイズが略同様のサイズであると分類された鳥の画像群を記録するようにしてもよい。また、これらの画像の外に、種類が異なる画像や異なるサイズに分類される鳥の画像群を記録してもよい。 For example, it is assumed that the image that the user most wants to shoot is the image P5 including the image of the moment when the bird 46 is about to take off from the branch 45 (hereinafter also referred to as the specific state image). This moment is defined as a decisive moment, and the image P5 is called a decisive moment image. In this embodiment, a series of images having time information as shown in FIG. 4 are recorded in the image classification recording unit 34 as an image group 34a. The image classification recording unit 34 may record images of birds of the same type, or images of birds classified as having approximately the same size. In addition to these images, images of different types and image groups of birds classified into different sizes may be recorded.

図４に示すように、鳥は、飛び立つ前の所定時間に、曲げていた足を伸ばし、翼を広げようとする予備動作を行うことがある。例えば、種類が異なる鳥の場合、サイズが異なる鳥の場合、或いは、獲物を狙っているか否か等に応じて、予備動作の仕方は多少異なるものと考えられるが、鳥の飛び立ちに関する膨大な画像データについて学習を行えば、飛び立つ前の様子から飛び立つ瞬間の時間を予測することが可能であると考えられる。 As shown in FIG. 4, a bird may make a preparatory motion to straighten its bent legs and spread its wings at a predetermined time before taking off. For example, in the case of birds of different types, in the case of birds of different sizes, or depending on whether they are aiming at prey or not, the method of preparatory movement is considered to be somewhat different. By learning data, it is possible to predict the moment of flight from the state before flight.

The image group 34a holds, for example, a vast data set of series of images that have time information as shown in FIG. 4 and were obtained by photographing as shown in FIG. 3. The population creation unit 31a of the learning unit 31 reads images from the external image DB 32 and creates a population that serves as the basis for learning.

The learning data supplied to the learning unit 31 can also be acquired from the imaging device 20. In this case, the imaging device 20 adds time information from a timer (not shown) built into the control unit 11 to the captured image acquired by the imaging unit 22, and transmits the result to the learning unit 31 via the communication unit 14.

In the present embodiment, the population creation unit 31a has a time determination unit 31a1 and an object image determination unit 31a2. Under the control of the control unit 31g, the population creation unit 31a creates the teacher data used to generate an inference model. The object image determination unit 31a2 determines the image portion of the object that is the target of the decisive-moment determination (hereinafter referred to as the object image), and also determines whether or not the object image has become the image at the decisive moment, that is, when the object is in a specific state (the specific state image). The time determination unit 31a1 uses the time information added to each image to determine the time from that image until the decisive moment (specific state). That is, for each image in a series of images, the time determination unit 31a1 determines the time difference between the capture time of that image and the capture time of the image in the series that includes the specific state image at the decisive moment. The control unit 31g pairs each image with its determined time difference to form the teacher data.
Here, the "specific state" covers not only a state in which the shape of the object being imaged and tracked has changed over time into a specific posture or orientation, but also changes in the color, size, and the like of the object, as well as the object having reached a particular shape, position, or orientation within the imaging screen. The state may also include a temporal change in the sound emitted by the object. Furthermore, images and sound may be evaluated together so that the user can set or designate, for example, the start or end of an event, or its climax or highlight, for events ranging from performances such as music and dance to some kind of work. Conceivable designation methods include character input, voice input, item selection, and input of similar information. Even without explicit designation each time, moments that many people perceive as decisive may be determined automatically, and such a moment is not limited to one specific timing but may occur at a plurality of timings. Accordingly, there may be a plurality of numerical values for the time difference.

FIG. 5 is a flowchart for explaining how the population creation unit 31a creates teacher data. In step S1 of FIG. 5, the population creation unit 31a acquires a series of images to which time information has been added. For example, the images P1 to P6 of FIG. 4 are acquired. The object image determination unit 31a2 of the population creation unit 31a determines the object image in each image of the acquired series.

First, the object image determination unit 31a2 determines whether or not manual selection has been instructed. Manual selection is performed by the user designating an object image within an image. The learning unit 31 is provided with a display unit 31f constituted by an LCD or the like, and the display unit 31f is provided with a touch panel (not shown) for accepting user operations. The object image determination unit 31a2 displays the series of images on the display unit 31f (step S9) and determines the subject designated by the user's touch operation to be the manually selected object image. When the user has made a manual selection, the series of images is treated as a candidate for teacher data (step S10).

When manual selection has not been designated, the object image determination unit 31a2 determines the object image in step S3. For example, the object image determination unit 31a2 may treat a subject located at the center of the image with at least a predetermined size as the object and determine that image portion to be the object image. Alternatively, what is to be treated as the object may be designated in advance by a user operation. For example, when a bird is designated as the object, the object image determination unit 31a2 may identify the bird by applying a known recognition process to the captured image.

In the next step S4, the object image determination unit 31a2 excludes images that do not include the object image from the series of images, and determines whether or not the number of remaining images is equal to or greater than a predetermined number (step S5). If the number of images in the series that include the object image is smaller than the predetermined number, it may be difficult to determine the decisive moment or the time until the decisive moment, so such a series is excluded from the candidates for teacher data in step S11.

When the number of images including the object image is equal to or greater than the predetermined number, the object image determination unit 31a2 proceeds to step S6 and selects the group of images including the object image, and in step S7 nominates candidate images from which the decisive moment image is to be selected. FIG. 4 shows an example in which the moment a bird takes off is the decisive moment. In this case, for example, the object image determination unit 31a2 uses image analysis to detect, as the specific state image, the object image that is most widely spread vertically and horizontally within the image, and treats the image including that specific state image as a candidate for the decisive moment image. In the example of FIG. 4, the image P5 is the candidate for the decisive moment image.
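
One way the candidate selection of step S7 might be realized is sketched below. The sketch assumes that each frame already has a bounding box for the object image and uses the box area as the measure of how far the object is spread vertically and horizontally; the function name `decisive_candidate`, the box format, and the sample coordinates are illustrative assumptions, not part of the described apparatus.

```python
def decisive_candidate(boxes):
    """Return the index of the frame whose object bounding box is largest.

    boxes: one (x0, y0, x1, y1) tuple per frame, in pixels (assumed format).
    """
    def area(box):
        x0, y0, x1, y1 = box
        return max(0, x1 - x0) * max(0, y1 - y0)

    return max(range(len(boxes)), key=lambda i: area(boxes[i]))

# Hypothetical boxes for P1 to P6: the bird is compact while perched,
# widest with its wings spread in P5, and smaller again once it flies away.
boxes = [
    (100, 100, 140, 150),  # P1
    (100, 100, 140, 150),  # P2
    (98, 95, 145, 150),    # P3
    (90, 90, 160, 150),    # P4
    (60, 70, 200, 160),    # P5
    (150, 40, 230, 90),    # P6
]
print(decisive_candidate(boxes))  # index of the candidate frame
```

Other spread measures, such as the silhouette area after segmentation, could be substituted without changing the selection logic.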

In the next step S8, the object image determination unit 31a2 determines whether or not there are at least a predetermined number of images before the candidate for the decisive moment image. If, among the images including the object image, the number of images acquired earlier than the decisive moment image is smaller than the predetermined number, the time until the decisive moment is too short for the series to be useful, so such a series is excluded from the candidates for teacher data in step S11.

When there are at least the predetermined number of images including the object image before the decisive moment image, the object image determination unit 31a2 proceeds to step S12, confirms the candidate as the decisive moment image, and sets the acquisition time of the decisive moment image as the time reference.

In the next step S13, the time determination unit 31a1 records each image in the series that includes the object image together with its time relative to the acquisition time of the decisive moment image. In this way, as shown in FIG. 4, a series of images, each annotated with its time difference relative to the image P5 serving as the decisive moment image, is recorded in the teacher data recording unit 31e as teacher data.
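
The automatic path of the FIG. 5 flow (steps S4 to S8 and S11 to S13) can be sketched as below. This is a minimal sketch under stated assumptions: the `Frame` record, the `build_teacher_data` function, the `pick_decisive` callback, and the two threshold values are hypothetical stand-ins for the "predetermined numbers" in the text, and the manual-selection branch (steps S9 and S10) is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Frame:
    name: str
    time_s: float      # capture time in seconds
    has_object: bool   # outcome of the object image determination (S3)

def build_teacher_data(
    frames: List[Frame],
    pick_decisive: Callable[[List[Frame]], int],
    min_frames: int = 4,   # assumed threshold for S5
    min_before: int = 3,   # assumed threshold for S8
) -> Optional[List[Tuple[str, float]]]:
    """Return (image name, time difference) pairs, or None if the series is rejected."""
    # S4: drop frames that do not contain the object image.
    kept = [f for f in frames if f.has_object]
    # S5: too few remaining frames, so the series is rejected (S11).
    if len(kept) < min_frames:
        return None
    # S6/S7: choose the decisive moment candidate among the kept frames.
    idx = pick_decisive(kept)
    # S8: too few frames before the candidate, so the series is rejected (S11).
    if idx < min_before:
        return None
    # S12/S13: use the candidate's capture time as the reference.
    reference = kept[idx].time_s
    return [(f.name, f.time_s - reference) for f in kept]

# P1 to P6 captured 5 s apart (frames are assumed to be sorted by time).
frames = [Frame(f"P{i + 1}", 5.0 * i, True) for i in range(6)]
print(build_teacher_data(frames, pick_decisive=lambda kept: 4))
```

Passing the candidate selection in as a callback keeps the filtering logic independent of whichever image analysis is used to find the specific state image.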

The input/output modeling unit 31d, serving as an inference model generation unit, reads the teacher data created by the population creation unit 31a from the teacher data recording unit 31e and, for example by the method shown in FIG. 2, obtains a learning model (inference model) that has learned the relationship between an image and the time until the decisive moment image is obtained, that is, the network 12a and its setting information.

When requested by the control unit 11 of the imaging device 20, the learning unit 31 transmits the generated inference model to the imaging device 20 via the communication units 31c and 14. The control unit 11 records the setting information acquired via the communication unit 14 in the setting data recording area 16b and uses it to configure the network 12a of the inference engine 12. The inference model generated by the learning unit 31 thus becomes available in the imaging device 20.

The control unit 11 is provided with a setting control unit 11d, which can control the inference engine 12 to perform inference. That is, when the imaging unit 22 acquires a live view image, the setting control unit 11d supplies the live view image to the inference engine 12 and causes it to execute inference that yields the time until the decisive moment (hereinafter referred to as image time inference). When information on the time until the decisive moment is obtained from the inference engine 12 as a result, the setting control unit 11d can control the display control unit 11f to display the inference result on the display screen of the display unit 15. In this case, the time until the decisive moment is displayed superimposed on the live view image.

The setting control unit 11d is not limited to display and may present the inference result of the inference engine 12 to the user in various ways. For example, the setting control unit 11d may present the inference result by sound, or by mechanical control of a drive unit.

Next, the operation of the embodiment configured as described above will be described with reference to FIGS. 6 to 10. FIGS. 6, 7, and 10 are explanatory diagrams for explaining the operation of the first embodiment. FIGS. 8 and 9 are flowcharts for explaining the operation of the first embodiment; FIG. 8 shows the operation of the imaging device 20, and FIG. 9 shows the operation of the external device 30.

FIG. 6 shows how a subject is imaged by the imaging device 20 of FIG. 1. The components of the imaging device 20 of FIG. 1 are housed in the housing 20a of FIG. 6. A display screen 15a constituting the display unit 15 is provided on the rear surface of the housing 20a. A lens (not shown) constituting the optical system 22b is provided on the front surface of the housing 20a, and a shutter button 13a constituting the operation unit 13 is provided on the top surface of the housing 20a.

FIG. 6 shows an example in which a bird 46 perched on a branch 45 of a tree is photographed as the subject. The user 47, for example, holds the housing 20a with the right hand 48 and, while looking at the display screen 15a of the display unit 15 with the bird 46 within the field of view, presses the shutter button 13a with a finger of the right hand 48 to take a picture.

In the present embodiment, an inference model is used to determine the decisive moment, which is the photo opportunity. That is, the inference engine 12 implements an inference model that infers, for an image (live view image), the predicted time until the decisive moment arrives. Such an inference model can be generated by the external device 30.

FIG. 9 shows the operation of the external device 30. In step S41 of FIG. 9, the external device 30 determines whether or not a learning request has been made, and waits until one is made. When a learning request occurs, the external device 30, in step S42, reads learning data from, for example, the external image DB 32 and creates teacher data. The teacher data creation of step S42 may be carried out according to the flow of FIG. 5.

When the teacher data has been created and recorded in the teacher data recording unit 31e, the input/output modeling unit 31d, in step S43, reads the teacher data from the teacher data recording unit 31e, performs learning, and creates an inference model. In the next step S44, the input/output modeling unit 31d sets practice problems and verifies the created inference model. In step S45, the input/output modeling unit 31d determines whether or not, as a result of the verification with the practice problems, the reliability of the inference is equal to or greater than a predetermined value. If so, the input/output modeling unit 31d judges that the inference model has been generated correctly and transmits it to the imaging device 20 via the communication unit 31c (step S49).

When the reliability is below the predetermined value, the input/output modeling unit 31d proceeds from step S45 to step S46, where it resets the teacher data or the like, and then determines in step S47 whether or not resetting has been performed a predetermined number of times or more. If not, the input/output modeling unit 31d returns to step S43. If resetting has been performed the predetermined number of times or more, the input/output modeling unit 31d proceeds from step S47 to step S48, judges that the object image is a difficult image unsuited to inference, transmits difficult image information to the imaging device 20, and then proceeds to step S49.
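
The learn, verify, and reset loop of steps S43 to S49 can be sketched as follows. This is a sketch only: the function `learn_with_retries`, its callback parameters, and the default values 0.8 and 3 are hypothetical placeholders for the learning procedure and the "predetermined" reliability and reset count, none of which the description specifies.

```python
def learn_with_retries(train, verify, reset_teacher_data, teacher_data,
                       min_reliability=0.8, max_resets=3):
    """Return (model, is_difficult) following the S43 to S49 control flow."""
    resets = 0
    while True:
        model = train(teacher_data)              # S43: learn from teacher data
        reliability = verify(model)              # S44: verify with practice problems
        if reliability >= min_reliability:       # S45: reliable enough?
            return model, False                  # S49: send the model
        teacher_data = reset_teacher_data(teacher_data)  # S46: reset teacher data
        resets += 1
        if resets >= max_resets:                 # S47: reset limit reached?
            return model, True                   # S48: flag as difficult, then S49

# Stub callbacks standing in for real training and verification.
scores = iter([0.5, 0.9])
model, is_difficult = learn_with_retries(
    train=lambda data: f"model trained on {len(data)} samples",
    verify=lambda m: next(scores),
    reset_teacher_data=lambda data: data + ["extra sample"],
    teacher_data=["sample"],
)
print(model, is_difficult)
```

Returning the difficult-image flag alongside the model mirrors the text, in which the difficult image information is transmitted before the flow reaches step S49.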

Meanwhile, in step S21 of FIG. 8, the control unit 11 of the imaging device 20 determines whether or not the shooting mode is designated. When the shooting mode is designated, the control unit 11 performs image input and display in step S22. That is, the imaging unit 22 images the subject, and the control unit 11 takes in the captured image from the imaging unit 22 and, as shown in FIG. 3, supplies it to the display unit 15 for display as a live view image.

Next, in step S23, the setting control unit 11d causes the inference engine 12 to execute image time inference so that the time until the decisive moment can be displayed. Using the inference model implemented by the network 12a, the inference engine 12 infers the time until each live view image being captured turns into the decisive moment image, and outputs the inference result to the control unit 11. The inference result includes the time until the displayed live view image changes into the decisive moment image, together with information on the reliability of that time.

In step S24, the setting control unit 11d determines whether or not the inference engine 12 has an inference model relevant to the live view image. For example, when the reliability of the inference result from the inference engine 12 is lower than a predetermined first threshold, the setting control unit 11d may judge that the inference engine 12 has no inference model relevant to the live view image. Alternatively, the setting control unit 11d may recognize the subject in the live view image by a known recognition process and determine whether or not an inference model for the recognized subject exists.

When there is no relevant inference model, the setting control unit 11d proceeds to step S29. When it determines that a relevant inference model exists, then in the next step S25 it displays an indication that an inference model relevant to the current live view image exists.

Next, the setting control unit 11d determines whether or not the reliability of the inference result from the inference engine 12 is sufficiently high, for example whether or not it is at or above a predetermined second threshold. When the reliability is equal to or greater than the second threshold, the setting control unit 11d proceeds to step S27 and displays a high-reliability time difference display; when the reliability is below the second threshold, it proceeds to step S28 and displays a time difference width display, that is, a range of times with relatively high reliability.
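
The two-threshold branching of steps S24 to S28 can be sketched as a small routing function. The function name `choose_display` and the returned labels are hypothetical; the threshold values 0.65 and 0.85 are assumptions borrowed from the percentage examples given for FIG. 7, since the description does not state the thresholds themselves.

```python
def choose_display(reliability, first_threshold=0.65, second_threshold=0.85):
    """Decide what to overlay on the live view image for one inference result."""
    if reliability < first_threshold:
        # S24: treated as having no relevant inference model; go on to S29.
        return []
    # S25: indicate that a relevant inference model exists.
    overlays = ["model_available_mark"]
    if reliability >= second_threshold:
        overlays.append("time_difference")        # S27: single predicted time
    else:
        overlays.append("time_difference_width")  # S28: range of predicted times
    return overlays

for value in (0.40, 0.70, 0.90):
    print(value, choose_display(value))
```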

FIG. 7 is an explanatory diagram showing captured images displayed on the display screen 15a of the display unit 15. As described above, the user 47 is trying to photograph the bird 46 on the branch 45. In particular, assume that the user 47 regards the moment the bird 46 takes off from the branch 45 as the decisive moment and wishes to capture it. The images P11 to P14 of FIG. 7 show live view images at given times, with time passing in the order of the images P11 to P14.

The image 46a in the image P11 shows the bird 46 perched on the branch. The image P11 is displayed on the display screen 15a as a live view image. In the image P12, a live view image acquired a predetermined time after the image P11, a circular mark display 51 indicates that an inference model relevant to the subject in the image P12 exists. The image 46b of the bird 46 in the image P12 shows the bird 46 shortly before takeoff. The image P12 also contains a time difference width display 52b indicating that, as a result of the image time inference by the inference engine 12, the time until the decisive moment is between 5 seconds and 2 seconds. Because the reliability of the inference result cannot be called sufficiently high, the time difference width display 52b presents the inference result with a certain width; for example, it shows the minimum and maximum of a plurality of inference results of relatively high reliability (for example, 65 to 84%).

In contrast, the image P13, a live view image acquired a predetermined time after the image P12, shows an image 46c of the bird 46 immediately before takeoff. The image P13 also contains a time difference display 52c indicating that, as a result of the image time inference by the inference engine 12, the time until the decisive moment is 1 second. The time difference display 52c is shown when the reliability of the inference result is sufficiently high (for example, 85% or more) and presents the single inference result with the highest reliability. The time difference display 52c in the image P13 thus indicates that the bird 46, the subject, is likely to take off 1 second after the time difference display 52c first appears.
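
How the displayed text might be derived from several inference results is sketched below. The sketch assumes that the inference engine yields a list of (predicted seconds, reliability) pairs; that data shape, the function name `format_time_display`, and the output strings are illustrative assumptions, while the 65% and 85% boundaries follow the examples in the text.

```python
def format_time_display(results, low=0.65, high=0.85):
    """Build the overlay text from (predicted_seconds, reliability) pairs."""
    best_time, best_reliability = max(results, key=lambda r: r[1])
    if best_reliability >= high:
        # Time difference display: the single most reliable prediction.
        return f"{best_time:g} s"
    # Time difference width display: min and max of the fairly reliable predictions.
    band = [t for t, reliability in results if low <= reliability < high]
    if not band:
        return None  # nothing reliable enough to show
    return f"{min(band):g}-{max(band):g} s"

# Comparable to the image P12: only moderately reliable predictions are available.
print(format_time_display([(5, 0.70), (2, 0.66), (9, 0.30)]))
# Comparable to the image P13: one prediction is highly reliable.
print(format_time_display([(1, 0.90), (3, 0.70)]))
```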

By pressing the shutter button 13a 1 second after the time difference display 52c appears, the user 47 is likely to be able to capture the decisive moment at which the bird 46 takes off. In step S29, the control unit 11 determines whether or not a moving image or still image shooting operation has been performed. When no shooting operation has been performed, the control unit 11 returns to step S21; when a shooting operation has been performed, it executes shooting and recording in step S30 and then returns to step S21. That is, the control unit 11 causes the recording control unit 11c to record the captured image acquired by the imaging unit 22 in the image data recording area 16a of the recording unit 16. In moving image recording, the moving image file is recorded in the image data recording area 16a when the operation to end shooting is performed.

FIG. 10 is an explanatory diagram for explaining the images captured in this way and shows an example of rec view images displayed on the display screen 15a. The left side of FIG. 10 shows the rec view images 55 displayed on the display screen 15a in continuous shooting. For example, the rec view images 55 are obtained when the user starts continuous shooting after checking the time difference width display 52b or the time difference display 52c. The thick frame indicates that one of the continuously shot images is the decisive moment image 55a.

The right side of FIG. 10 shows the rec view image 57 displayed on the display screen 15a in single shooting. For example, when the user checks the time difference display 52c and presses the shutter button 13a after the displayed time has elapsed, the decisive moment image shown as the rec view image 57 is obtained.

When the control unit 11 determines in step S21 that the shooting mode is not designated, it proceeds to step S31 and determines whether or not acquisition of an inference model is designated. When acquisition of an inference model is not designated, the control unit 11 returns to step S21; when it is designated, the control unit 11 sets the object and the object to be relearned in step S32.

For example, the control unit 11 may cause the display control unit 11f to display a menu for dictionary setting on the display screen 15a and, in response to user operations, display an object setting screen and a relearning object setting screen so that the user can designate the object and the object to be relearned. In step S33, the control unit 11 issues to the external device 30 a learning request or relearning request for the object or relearning object designated by the user.

In step S34, the control unit 11 receives from the learning unit 31, via the communication unit 14, an inference model whose reliability is equal to or greater than the predetermined value, or an inference model corresponding to difficult image information. The control unit 11 sets the received inference model in the inference engine 12 and records the difficult image information in the recording unit 16.

In the description of FIG. 8, an example was given in which the display is switched between the time difference display and the time difference width display depending on whether or not the reliability of the image time inference result is sufficiently high, but various display forms are conceivable. For example, the time difference of the inference result may be displayed together with a numerical value or color coding indicating its reliability, or the display may be rendered with stronger shading the higher the reliability of the inference result. The time difference display or the time difference width display may also always be used regardless of the reliability.

In the above description, the shooting operation is performed manually by the user, but the imaging control unit can also perform control so that shooting takes place automatically at the decisive moment. In continuous shooting, control can also be performed to synchronize the continuous shooting timing with the decisive moment so that a frame is reliably captured at the decisive moment.
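
One possible way to synchronize continuous shooting with the decisive moment is sketched below. The description does not specify a synchronization method, so this is purely an assumed scheme: the burst start is delayed by the remainder of the predicted time divided by the frame interval, which places one frame exactly on the predicted moment. The function name `burst_schedule` and its parameters are hypothetical, and times are integer milliseconds to avoid floating-point remainders.

```python
def burst_schedule(time_to_moment_ms, interval_ms, frames_after=1):
    """Return shutter times (ms from now) so that one frame hits the predicted moment."""
    # Delay the first frame so that a whole number of intervals remains.
    start_delay = time_to_moment_ms % interval_ms
    # Frames up to and including the decisive moment, plus a few afterwards.
    count = time_to_moment_ms // interval_ms + 1 + frames_after
    return [start_delay + k * interval_ms for k in range(count)]

# Predicted decisive moment in 1000 ms, burst interval of 300 ms.
schedule = burst_schedule(1000, 300)
print(schedule)
```

In practice the prediction would be refreshed with each live view frame, so a schedule of this kind would need to be recomputed as the estimate changes.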

As described above, the present embodiment realizes machine learning that, using images having time information as learning data, performs image time inference to predict the time until a predetermined moment (decisive moment) is reached. By applying the inference model obtained by this machine learning to, for example, an imaging device, image time inference can be performed on a live view image that changes from moment to moment, and the time remaining until a decisive moment, for example a bird taking off, can be predicted and presented. By operating the shutter button, for example, with the presented time in mind, the user can easily capture the decisive moment at which the bird takes off. Moreover, images having time information for use as learning data are very easy to acquire, teacher data can be obtained from this learning data by relatively simple processing, and an inference model that enables image time inference can therefore be created easily.

(Second Embodiment)
FIG. 11 is a flowchart showing the operation flow adopted in the second embodiment of the present invention. The hardware configuration of the present embodiment is the same as that of the first embodiment. In FIG. 11, steps identical to those in FIG. 9 are given the same reference signs, and their description is omitted.

第１の実施の形態においては、時間情報を有する画像を学習用データとして用い、この学習用データを用いて教師データを作成することで、各画像と決定的瞬間までの時間を予測する画像時間推論を行う推論モデルを生成して利用する例を説明した。これに対し、本実施の形態は、時間情報を有する画像を学習用データとして用い、この学習用データを用いて所定の時間間隔を有する複数の画像の組を教師データとして作成することで、各画像と所定の時間後の画像を予測する画像画像推論を行う推論モデルを生成して利用する例である。 In the first embodiment, an example was described in which images having time information are used as learning data, teacher data is created from this learning data, and an inference model that performs image time inference, which predicts from each image the time remaining until the decisive moment, is generated and used. In contrast, the present embodiment is an example in which images having time information are used as learning data, sets of a plurality of images separated by a predetermined time interval are created from this learning data as teacher data, and an inference model that performs image image inference, which predicts from each image the image after a predetermined time, is generated and used.

図１１のフローは、教師データの作成方法が図５のフローと異なる。即ち、外部機器３０の母集合作成部３１ａは、図１１のステップＳ５１において、類似対象物の連続画像群を外部画像ＤＢ３２等から取得する。母集合作成部３１ａは、ステップＳ５２において、特定時間差の２画像を教師データとしてネットワークに与えて学習させる。 The flow in FIG. 11 differs from the flow in FIG. 5 in the method of creating teacher data. That is, the mother set creation unit 31a of the external device 30 acquires a continuous image group of similar objects from the external image DB 32 or the like in step S51 of FIG. In step S52, the mother set creation unit 31a gives the two images with a specific time difference as teacher data to the network for learning.

図１２は外部画像ＤＢ３２から母集合作成部３１ａに取り込まれる連続画像群の一例を示す説明図である。画像Ｐ２１～Ｐ２９は、図４と同様に、鳥が飛び立つ前後の一連の様子を撮影して得られた画像を時間順に配置したものである。なお、これらの画像Ｐ２１～Ｐ２９は、連写機能や動画機能を利用して取得してもよく、また、所定の時間間隔で単写撮影して取得してもよい。 FIG. 12 is an explanatory diagram showing an example of a continuous image group taken in from the external image DB 32 to the mother set creation unit 31a. Images P21 to P29 are, as in FIG. 4, arranged in chronological order images obtained by photographing a series of situations before and after the bird takes off. Note that these images P21 to P29 may be acquired using a continuous shooting function or moving image function, or may be acquired by single shooting at predetermined time intervals.

母集合作成部３１ａは、時間判定部３１ａ１及び対象物画像判定部３１ａ２によって、所定の時間前後の２つの画像の組を教師データとして選択する。例えば、図１２の矢印はＮ秒後の画像を示しており、画像Ｐ２１とＰ２４、画像Ｐ２２とＰ２５、画像Ｐ２３とＰ２６、画像Ｐ２４とＰ２７とが組であることを示している。母集合作成部３１ａは作成した教師データを教師データ記録部３１ｅに記録する。 The mother set creation unit 31a uses the time determination unit 31a1 and the object image determination unit 31a2 to select, as teacher data, sets of two images separated by a predetermined time. For example, each arrow in FIG. 12 points to the image N seconds later, indicating that images P21 and P24, images P22 and P25, images P23 and P26, and images P24 and P27 form sets. The mother set creation unit 31a records the created teacher data in the teacher data recording unit 31e.
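
The pairing described above can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: the function name `make_image_pairs`, the tolerance parameter, the one-second frame spacing and the choice N = 3 are all assumptions made for the example.

```python
from bisect import bisect_left

def make_image_pairs(frames, interval_s, tolerance_s=0.05):
    """Pair each frame with the frame taken interval_s seconds later.

    frames: list of (timestamp_seconds, image_id), sorted by timestamp.
    Returns a list of (earlier_id, later_id) tuples. The tolerance is a
    hypothetical allowance for jitter in the capture times.
    """
    times = [t for t, _ in frames]
    pairs = []
    for t, image_id in frames:
        target = t + interval_s
        # First frame whose timestamp is not earlier than target - tolerance.
        i = bisect_left(times, target - tolerance_s)
        if i < len(frames) and abs(times[i] - target) <= tolerance_s:
            pairs.append((image_id, frames[i][1]))
    return pairs

# Hypothetical data: images P21..P29 captured one second apart, N = 3.
frames = [(float(k), f"P2{k + 1}") for k in range(9)]
pairs = make_image_pairs(frames, interval_s=3.0)
```

With these assumed timestamps the first four pairs match the sets named in the text (P21/P24, P22/P25, P23/P26, P24/P27); frames with no partner N seconds later are simply skipped.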

図１３は、図２と同様の記載方法によって、ネットワーク１２ａを生成する手法を説明するための説明図である。図１３においては、所定のネットワークＮ１には入力される大量の学習用データセットは、教師データ記録部３１ｅから読出される画像の組である。本実施の形態においては、入力として画像がネットワークＮ１に与えられると、出力としてＮ秒後の画像が得られるように、ネットワークデザインが決定される。こうして決定されたネットワークデザインの情報は、通信部３１ｃから撮像装置２０に伝送され、設定制御部１１ｄによって、設定データ記録領域１６ｂに設定情報として記録される。 FIG. 13 is an explanatory diagram, drawn in the same manner as FIG. 2, for explaining a technique for generating the network 12a. In FIG. 13, the large learning data set input to the predetermined network N1 consists of the image sets read from the teacher data recording unit 31e. In the present embodiment, the network design is determined so that when an image is given to the network N1 as an input, the image N seconds later is obtained as an output. The network design information determined in this way is transmitted from the communication unit 31c to the imaging device 20 and recorded as setting information in the setting data recording area 16b by the setting control unit 11d.

更に、本実施の形態においては、学習部３１は、ステップＳ４５において信頼性が所定の値以上であると判定した場合には、次のステップＳ５３において、入力画像から推測される出力画像、即ち、入力画像の取得時間から所定の時間経過後に取得された画像のうち、いずれの画像を出力するかを決定する。この画像は、後述するように、合成表示に用いる代表画像として用いるために、図示しない記録媒体に記録する。学習部３１は、ネットワークデザインの情報の送信時に、代表画像についても撮像装置２０に送信する。撮像装置２０の記録制御部１１ｃは代表画像を画像データ記録領域１６ａに記録するようになっている。 Furthermore, in the present embodiment, when the learning unit 31 determines in step S45 that the reliability is equal to or higher than the predetermined value, it decides in the next step S53 which image to output as the output image inferred from the input image, that is, which of the images acquired after the predetermined time has elapsed from the acquisition time of the input image. This image is recorded on a recording medium (not shown) for use as a representative image in the composite display described later. When transmitting the network design information, the learning unit 31 also transmits the representative image to the imaging device 20. The recording control unit 11c of the imaging device 20 records the representative image in the image data recording area 16a.

次に、このように構成された実施の形態について撮像装置２０における動作を図１４を参照して説明する。図１４は表示部１５の表示画面に表示される画像の表示例を示す説明図である。 Next, the operation of the imaging device 20 in the embodiment configured as described above will be described with reference to FIG. 14. FIG. 14 is an explanatory diagram showing display examples of images displayed on the display screen of the display unit 15.

本実施の形態においては、推論エンジン１２は、所定の画像入力に対して所定時間後の予測画像を出力する上述した画像画像推論を行う推論モデルを構成する。また、撮像装置２０の制御部１１は、図８のステップＳ２５～Ｓ２８に代えて、推論エンジン１２による画像画像推論を実行させ、推論結果に基づく表示を行う。 In the present embodiment, the inference engine 12 configures an inference model that performs the above-described image inference for outputting a predicted image after a predetermined time from a predetermined image input. Also, instead of steps S25 to S28 in FIG. 8, the control unit 11 of the imaging device 20 causes the inference engine 12 to perform image image inference, and displays based on the inference result.

いま、図６と同様に、ユーザ４７が枝４５に止まっている鳥４６を撮像するものとする。図１４の画像Ｐ３１ａ～Ｐ３１ｄはいずれも鳥４６が飛び立つ直前のライブビュー画像を示している。設定制御部１１ｄは、撮像部２２からのライブビュー画像を推論エンジン１２に与えて、画像画像推論を実行させる。推論エンジン１２は、画像画像推論の結果として、入力されたライブビュー画像の撮影時刻から所定時間後に撮像されるであろう画像を予測して予測結果を設定制御部１１ｄに出力する。 Now, as in FIG. 6, it is assumed that the user 47 takes an image of a bird 46 perched on a branch 45 . Images P31a to P31d in FIG. 14 all show live view images immediately before the bird 46 takes off. The setting control unit 11d gives the live view image from the imaging unit 22 to the inference engine 12 to execute image inference. As a result of the image inference, the inference engine 12 predicts an image that will be captured after a predetermined time from the shooting time of the input live view image, and outputs the prediction result to the setting control unit 11d.

設定制御部１１ｄは、予測結果に基づいて、画像データ記録領域１６ａに記憶されている代表画像を読出して表示制御部１１ｆに与える。こうして、表示制御部１１ｆは、現在のライブビュー画像上に、所定時間後に撮像されるであろう代表画像を重畳して表示させる。図１４の画像Ｐ３１ａは、この場合の一表示例を示しており、画像Ｐ３１ａ中には、ライブビュー画像中に含まれる鳥４６の飛び立つ直前の画像部分６１の外に、２秒後の画像として予測された代表画像６２ａと３秒後の画像として予測された代表画像６３ａとが表示される。また、表示制御部１１ｆは、これらの画像６２ａ，６３ａの近傍に、プレビュー画像の取得時間を基準にして、これらの画像が取得されるであろう時間が２秒後又は３秒後であることを示す時間表示６２ｂ，６３ｂを表示している。 Based on the prediction result, the setting control unit 11d reads out the representative image stored in the image data recording area 16a and supplies it to the display control unit 11f. The display control unit 11f then superimposes, on the current live view image, the representative image expected to be captured after the predetermined time. Image P31a in FIG. 14 shows one display example of this case: in addition to the image portion 61 of the bird 46 immediately before takeoff, which is contained in the live view image, image P31a displays a representative image 62a predicted as the image two seconds later and a representative image 63a predicted as the image three seconds later. Near these images 62a and 63a, the display control unit 11f also displays time displays 62b and 63b indicating that, relative to the acquisition time of the preview image, these images are expected to be acquired two seconds and three seconds later, respectively.

なお、上述したように、代表画像は、外部機器３０によって決定されて記録された画像であり、外部機器３０から撮像装置２０に転送された画像である。このため、必ずしも代表画像が存在しない可能性もある。そこで、この場合には、代表画像の表示位置に、画像部分６１をコピーして表示することも考えられる。図１４の画像Ｐ３１ｂはこの場合の表示例を示しており、画像Ｐ３１ｂ中には、ライブビュー画像中に含まれる鳥４６の飛び立つ直前の画像部分６１の外に、２秒後の画像として予測された代表画像に代えて画像部分６１をコピーして生成した画像６４ａが表示される。また、表示制御部１１ｆは、この画像６４ａの近傍に、プレビュー画像の取得時間を基準にして、代表画像が取得されるであろう時間が２秒後であることを示す時間表示６４ｂを表示している。 As described above, the representative image is an image determined and recorded by the external device 30 and transferred from the external device 30 to the imaging device 20. A representative image therefore may not always be available. In that case, the image portion 61 may be copied and displayed at the display position of the representative image. Image P31b in FIG. 14 shows a display example of this case: in addition to the image portion 61 of the bird 46 immediately before takeoff, which is contained in the live view image, image P31b displays an image 64a generated by copying the image portion 61 in place of the representative image predicted as the image two seconds later. Near this image 64a, the display control unit 11f also displays a time display 64b indicating that, relative to the acquisition time of the preview image, the representative image is expected to be acquired two seconds later.

また、図１４の画像Ｐ３１ｃは、表示制御部１１ｆが、画像Ｐ３１ａの時間表示６２ｂ，６３ｂに代えて時間表示６５ｂ，６６ｂを表示した例を示している。時計の針の形状を模した時間表示６５ｂ，６６ｂ及び矢印の表示によって、代表画像６２ａ，６３ａの予想取得時間が、それぞれプレビュー画像の撮影時刻から２秒後又は３秒後であることを示している。 Image P31c in FIG. 14 shows an example in which the display control unit 11f displays time displays 65b and 66b in place of the time displays 62b and 63b of image P31a. The time displays 65b and 66b, which imitate the hands of a clock, together with arrows indicate that the expected acquisition times of the representative images 62a and 63a are two seconds and three seconds, respectively, after the shooting time of the preview image.

図１４の画像Ｐ３１ｄは、同一の時間に撮像されるであろう代表画像を複数同時に表示する例を示している。上述した図１１のステップＳ５３の説明では、１枚の代表画像のみを選択する例について説明したが、代表画像として複数の画像を選択して記録するようにしてもよい。この場合には、設定制御部１１ｄは、外部機器３０から転送された複数の代表画像を画像データ記録領域１６ａに記録させる。 An image P31d in FIG. 14 shows an example of simultaneously displaying a plurality of representative images that are likely to be captured at the same time. In the above description of step S53 in FIG. 11, an example of selecting only one representative image has been described, but a plurality of images may be selected and recorded as the representative image. In this case, the setting control section 11d causes the plurality of representative images transferred from the external device 30 to be recorded in the image data recording area 16a.

設定制御部１１ｄは、推論エンジン１２の予測結果に基づいて、画像データ記録領域１６ａに記録されている代表画像を読み出して表示制御部１１ｆに与える。表示制御部１１ｆは、代表画像が複数の場合には、複数の代表画像を重ねて表示する。画像Ｐ３１ｄは、この場合の一表示例を示しており、画像Ｐ３１ｄ中には、ライブビュー画像中に含まれる鳥４６の飛び立つ直前の画像部分６１の外に、２秒後の画像として予測された代表画像６７ａ～６７ｃが表示される。また、表示制御部１１ｆは、これらの画像６７ａ～６７ｃの近傍に、プレビュー画像の取得時間を基準にして、これらの画像が取得されるであろう時間が２秒後であることを示す時間表示６７ｄを表示している。 Based on the prediction result of the inference engine 12, the setting control section 11d reads the representative image recorded in the image data recording area 16a and supplies it to the display control section 11f. When there are multiple representative images, the display control unit 11f displays the multiple representative images in an overlapping manner. An image P31d shows a display example in this case, and in the image P31d, in addition to the image portion 61 immediately before the flight of the bird 46 included in the live view image, an image after two seconds is predicted. Representative images 67a to 67c are displayed. In addition, the display control unit 11f displays a time display near these images 67a to 67c indicating that these images will be acquired two seconds after the acquisition time of the preview image. 67d is displayed.

このように本実施の形態においては、時間情報を有する画像を学習用データとして、所定の時間後の画像を予測する画像画像推論を行う機械学習を実現する。この機械学習によって得た推論モデルを例えば撮像装置に適用することにより、時々刻々変化するライブビュー画像に対して画像画像推論を行って、例えば鳥が所定の時間後にどの様に撮影されるかを予測して、提示することができる。ユーザは、提示された画像を考慮して、例えば撮影操作を行うことで、簡単に鳥が飛ぶ様子を捉えた撮影が可能である。また、学習用データとして用いる時間情報を有する画像は極めて容易に取得することができるものであり、この学習用データから比較的簡単な処理によって教師データを取得することができ、画像画像推論を可能にする推論モデルを簡単に作成することができる。 As described above, the present embodiment realizes machine learning that performs image image inference, which predicts the image after a predetermined time, using images having time information as learning data. By applying the inference model obtained by this machine learning to, for example, an imaging device, image image inference can be performed on live view images that change from moment to moment, so that, for example, how a bird will appear when captured after a predetermined time can be predicted and presented. By performing a shooting operation, for example, with the presented image in mind, the user can easily capture the bird in flight. In addition, images having time information for use as learning data can be obtained very easily, teacher data can be obtained from this learning data by relatively simple processing, and an inference model that enables image image inference can be created easily.

（第３の実施の形態） (Third Embodiment)
図１５は本発明の第３の実施の形態において採用される動作フローを示すフローチャートである。本実施の形態のハードウェア構成は第１の実施の形態と同様である。図１５において図１１と同一の手順には同一符号を付して説明を省略する。 FIG. 15 is a flowchart showing the operation flow employed in the third embodiment of the present invention. The hardware configuration of this embodiment is the same as that of the first embodiment. In FIG. 15, steps identical to those in FIG. 11 are denoted by the same reference numerals, and their description is omitted.

第２の実施の形態は画像画像推論の推論モデルを生成する例を説明したが、本実施の形態は、時間情報を有する画像を学習用データとして用い、この学習用データを用いて所定の時間間隔を有する複数の画像とその位置差の組を教師データとして作成することで、各画像と所定の時間後の画像位置を予測する画像位置推論を行う推論モデルを生成して利用する例である。 The second embodiment described an example of generating an inference model for image image inference. In contrast, the present embodiment is an example in which images having time information are used as learning data, sets consisting of a plurality of images separated by a predetermined time interval and their positional difference are created from this learning data as teacher data, and an inference model that performs image position inference, which predicts from each image the image position after a predetermined time, is generated and used.

図１５のフローは、教師データの作成方法が図１１と異なり、ステップＳ５２に代えてステップＳ６１を採用すると共にステップＳ５３の処理を省略したものである。なお、外部画像ＤＢ３２には、第２の実施の形態と同様に、例えば、図１２等に示す連続画像群が記録されているものとする。 The flow of FIG. 15 differs from that of FIG. 11 in the method of creating teacher data, adopting step S61 instead of step S52 and omitting the processing of step S53. It should be noted that, as in the second embodiment, the external image DB 32 stores, for example, a continuous image group shown in FIG. 12 and the like.

母集合作成部３１ａは、時間判定部３１ａ１及び対象物画像判定部３１ａ２によって、第２の実施の形態と同様に、所定の時間前後の２つの画像の組を教師データに用いる。例えば、図１２の画像Ｐ２１とＰ２４、画像Ｐ２２とＰ２５、画像Ｐ２３とＰ２６、画像Ｐ２４とＰ２７とが組であることを示している。母集合作成部３１ａは、対象物画像について、画像中の位置の情報を求める。例えば、対象物画像が図１２の鳥の画像である場合には、鳥の顔の位置やつば先の先端の位置を求めてもよい。そして、母集合作成部３１ａは、組の画像中の各対象物画像同士の位置の差を求め、時間的に前に取得された画像（以下、前画像という）中の対象物の位置を基準に、後に取得された画像（以下、後画像という）中の対象物の位置差を求める。母集合作成部３１ａは前画像と位置差との関係を教師データとして、教師データ記録部３１ｅに記録する。 As in the second embodiment, the mother set creation unit 31a uses the time determination unit 31a1 and the object image determination unit 31a2 to obtain sets of two images separated by a predetermined time for use as teacher data. For example, images P21 and P24, images P22 and P25, images P23 and P26, and images P24 and P27 in FIG. 12 form sets. The mother set creation unit 31a obtains, for the object image, information on its position within the image. For example, when the object image is the bird image of FIG. 12, the position of the bird's face or of a wing tip may be obtained. The mother set creation unit 31a then obtains the difference between the positions of the object images in the images of each set, expressed as the positional difference of the object in the image acquired later (hereinafter, the post-image) relative to the position of the object in the image acquired earlier (hereinafter, the previous image). The mother set creation unit 31a records the relationship between the previous image and the positional difference as teacher data in the teacher data recording unit 31e.
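
As a rough illustration of the teacher data described above, the following Python sketch pairs each previous image with the displacement of the object in the post-image. The names `PositionSample` and `make_position_samples` and the pixel coordinates are hypothetical; the text does not specify how the object position is detected, so the positions are assumed to be given.

```python
from typing import NamedTuple, Tuple

class PositionSample(NamedTuple):
    previous_image: str          # identifier of the earlier image (input)
    offset: Tuple[float, float]  # (dx, dy) of the object in the later image (label)

def make_position_samples(pairs, positions):
    """Build (previous image, positional difference) teacher samples.

    pairs: list of (previous_id, post_id) image pairs.
    positions: dict mapping image id to the (x, y) position of the object,
               for example the position of the bird's face (assumed known).
    """
    samples = []
    for prev_id, post_id in pairs:
        px, py = positions[prev_id]
        qx, qy = positions[post_id]
        # Difference is measured relative to the position in the previous image.
        samples.append(PositionSample(prev_id, (qx - px, qy - py)))
    return samples

# Hypothetical pixel positions for one pair from FIG. 12.
positions = {"P21": (100, 200), "P24": (130, 170)}
samples = make_position_samples([("P21", "P24")], positions)
```

Storing the offset rather than the absolute post-image position mirrors the text, where the position in the previous image serves as the reference.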

この場合におけるネットワーク１２ａの生成方法は、第２の実施の形態と同様であり、上述した図１３における出力の画像に代えて前画像の位置を基準にした後画像の位置差が得られるように、ネットワークデザインが決定される。即ち、本実施の形態においては、前画像を入力して、後画像の位置を予測する画像位置推論を行う推論モデルを得る。こうして決定されたネットワークデザインの情報は、通信部３１ｃから撮像装置２０に伝送され、設定制御部１１ｄによって、設定データ記録領域１６ｂに設定情報として記録される。 The method of generating the network 12a in this case is the same as in the second embodiment: the network design is determined so that, instead of the output image in FIG. 13 described above, the positional difference of the post-image relative to the position in the previous image is obtained. That is, the present embodiment obtains an inference model that takes a previous image as input and performs image position inference to predict the position of the post-image. The network design information determined in this way is transmitted from the communication unit 31c to the imaging device 20 and recorded as setting information in the setting data recording area 16b by the setting control unit 11d.

次に、このように構成された実施の形態について撮像装置２０における動作を図１６及び図１７を参照して説明する。図１６は撮像装置２０の制御部１１の制御を示すフローチャートであり、図１７は表示部１５の表示画面に表示される画像の表示例を示す説明図である。なお、図１６に示す制御部１１の制御フローは、図８と略同様であり、図８のステップＳ２７，Ｓ２８にそれぞれ代えてステップＳ６５，Ｓ６６を採用した点が異なる。 Next, the operation of the imaging device 20 in the embodiment configured as described above will be described with reference to FIGS. 16 and 17. FIG. 16 is a flowchart showing the control performed by the control unit 11 of the imaging device 20, and FIG. 17 is an explanatory diagram showing display examples of images displayed on the display screen of the display unit 15. The control flow of the control unit 11 shown in FIG. 16 is substantially the same as that of FIG. 8, differing in that steps S65 and S66 are employed in place of steps S27 and S28 of FIG. 8, respectively.

本実施の形態においては、推論エンジン１２は、所定の画像入力に対して所定時間後の予測位置を出力する上述した画像位置推論を行う推論モデルを構成する。 In this embodiment, the inference engine 12 constructs an inference model that performs the above-described image position inference that outputs a predicted position after a predetermined time from a given image input.

いま、図６と同様に、ユーザ４７が枝４５に止まっている鳥４６を撮像するものとする。図１７の画像Ｐ３２ａ～Ｐ３２ｄはいずれも鳥４６が飛び立つ直前のライブビュー画像を示している。設定制御部１１ｄは、撮像部２２からのライブビュー画像を推論エンジン１２に与えて、画像位置推論を実行させる。推論エンジン１２は、画像位置推論の結果として、入力画像との位置差、即ち、入力されたライブビュー画像中の対象物画像の位置を基準として、ライブビュー画像の撮影時刻から所定時間後に撮像されるであろう画像中の対象物画像の位置を予測して予測結果を設定制御部１１ｄに出力する。 Now, as in FIG. 6, it is assumed that the user 47 takes an image of a bird 46 perched on a branch 45 . Images P32a to P32d in FIG. 17 all show live view images immediately before the bird 46 takes off. The setting control unit 11d gives the live view image from the imaging unit 22 to the inference engine 12 to execute image position inference. As a result of the image position inference, the inference engine 12 uses the position difference from the input image, that is, the position of the object image in the input live view image as a reference, and the image is captured after a predetermined time from the time when the live view image was captured. It predicts the position of the object image in the image that is likely to be there, and outputs the prediction result to the setting control section 11d.

撮像装置２０の制御部１１は、図１６のステップＳ２６において、推論エンジン１２による予測結果の信頼性が十分に高いと判定した場合には、次のステップＳ６４において、推論結果に基づいて、所定時間後に撮像されるであろう画像中の対象物画像の位置を示す表示を行う。 When the control unit 11 of the imaging device 20 determines in step S26 of FIG. 16 that the reliability of the prediction result from the inference engine 12 is sufficiently high, it displays, in the next step S64, an indication of the position of the object image in the image expected to be captured after the predetermined time, based on the inference result.

図１７の画像Ｐ３２ａは、この場合の一表示例を示しており、表示制御部１１ｆは、画像Ｐ３２ａ中に、ライブビュー画像中に含まれる鳥４６の飛び立つ直前の画像（対象物画像）部分６１の外に、対象物画像の２秒後の画像位置として予測された位置を示す位置表示６４ａを表示している。なお、位置表示６４ａは、画像部分６１をコピーして得られた画像である。また、表示制御部１１ｆは、この画像６４ａの近傍に、プレビュー画像の取得時間を基準にして、この画像が取得されるであろう時間が２秒後であることを示す時間表示６４ｂを表示している。 Image P32a in FIG. 17 shows one display example of this case: in addition to the image (object image) portion 61 of the bird 46 immediately before takeoff, which is contained in the live view image, the display control unit 11f displays in image P32a a position display 64a indicating the position predicted as the image position of the object image two seconds later. The position display 64a is an image obtained by copying the image portion 61. Near this image 64a, the display control unit 11f also displays a time display 64b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired two seconds later.

また、図１７の画像Ｐ３２ｂは、表示制御部１１ｆが、画像Ｐ３２ａの時間表示６４ａに代えて時間表示７１ａを表示した例を示している。また、表示制御部１１ｆは、この画像７１ａの近傍に、プレビュー画像の取得時間を基準にして、この画像が取得されるであろう時間が２秒後であることを示す時間表示７１ｂを表示している。時間表示７１ａは、曲線形状によって、鳥４６がプレビュー画像の撮影時刻から２秒後において到達するであろう画像中の位置の範囲を示している。 Image P32b in FIG. 17 shows an example in which the display control unit 11f displays a time display 71a in place of the time display 64a of image P32a. Near this image 71a, the display control unit 11f also displays a time display 71b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired two seconds later. By its curved shape, the time display 71a indicates the range of positions in the image that the bird 46 is expected to reach two seconds after the shooting time of the preview image.

また、制御部１１は、ステップＳ４５において推論の信頼性が十分に高くない場合には、ステップＳ６６において、信頼性が比較的高い位置の範囲を表示する。図１７の画像Ｐ３２ｃはこの場合の表示例を示しており、表示制御部１１ｆは、画像Ｐ３２ｃ中に、２つの曲線による位置範囲表示７２ａを表示して、後画像が２秒後に存在する範囲を示している。また、表示制御部１１ｆは、位置範囲表示７２ａの近傍に、プレビュー画像の取得時間を基準にして、この画像が取得されるであろう時間が２秒後であることを示す時間表示７２ｂを表示している。なお、位置範囲表示７２ａは、画像位置推論の推論結果の信頼性が十分に高いとはいえない場合に、比較的高い信頼性（例えば、６５～８４％）の複数の推論結果のうち最も近い位置と最も遠い位置との範囲を示すものである。 When the reliability of the inference is not sufficiently high in step S45, the control unit 11 displays, in step S66, the range of positions with relatively high reliability. Image P32c in FIG. 17 shows a display example of this case: the display control unit 11f displays in image P32c a position range display 72a formed by two curves, indicating the range in which the post-image will lie two seconds later. Near the position range display 72a, the display control unit 11f also displays a time display 72b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired two seconds later. When the reliability of the image position inference result cannot be considered sufficiently high, the position range display 72a indicates the range between the nearest and farthest positions among a plurality of inference results with relatively high reliability (for example, 65 to 84%).
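
The range selection described above might be sketched as follows. This is a minimal sketch under stated assumptions: the function name `position_range`, the representation of candidates as position/reliability pairs, and the use of Euclidean distance from the current object position to decide "nearest" and "farthest" are all assumptions; only the 65 to 84% band comes from the text.

```python
import math

def position_range(origin, candidates, low=0.65, high=0.84):
    """Return (nearest, farthest) predicted positions of moderate reliability.

    origin: (x, y) of the object in the current live view image.
    candidates: list of ((x, y), reliability) inference results.
    Only results whose reliability lies within [low, high] are considered.
    Returns None when no candidate falls in that band.
    """
    selected = [pos for pos, r in candidates if low <= r <= high]
    if not selected:
        return None
    nearest = min(selected, key=lambda p: math.dist(origin, p))
    farthest = max(selected, key=lambda p: math.dist(origin, p))
    return nearest, farthest

# Hypothetical inference results: two fall inside the 65-84% band.
candidates = [((3, 4), 0.70), ((6, 8), 0.80), ((30, 40), 0.50), ((1, 0), 0.90)]
result = position_range((0, 0), candidates)
```

The two returned positions could then bound a display such as the two curves of the position range display 72a.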

また、画像Ｐ３２ｄは、ステップＳ６６における他の表示例を示している。表示制御部１１ｆは、画像Ｐ３２ｄ中に、円による位置範囲表示７３ａを表示して、後画像が２秒後に存在する範囲を示している。また、表示制御部１１ｆは、位置範囲表示７３ａの近傍に、プレビュー画像の取得時間を基準にして、この画像が取得されるであろう時間が２秒後であることを示す時間表示７３ｂを表示している。 Image P32d shows another display example for step S66. The display control unit 11f displays in image P32d a circular position range display 73a indicating the range in which the post-image will lie two seconds later. Near the position range display 73a, the display control unit 11f also displays a time display 73b indicating that, relative to the acquisition time of the preview image, this image is expected to be acquired two seconds later.

このように本実施の形態においては、時間情報を有する画像を学習用データとして、所定の時間後の画像の位置を予測する画像位置推論を行う機械学習を実現する。この機械学習によって得た推論モデルを例えば撮像装置に適用することにより、時々刻々変化するライブビュー画像に対して画像位置推論を行って、例えば鳥が所定の時間後にどの位置に撮影されるかを予測して、その位置を提示することができる。ユーザは、提示された位置を考慮して、例えば撮影操作を行うことで、鳥が飛ぶ様子を簡単に捉えた撮影が可能である。また、学習用データとして用いる時間情報を有する画像は極めて容易に取得することができるものであり、この学習用データから比較的簡単な処理によって教師データを取得することができ、画像位置推論を可能にする推論モデルを簡単に作成することができる。 As described above, the present embodiment realizes machine learning that performs image position inference, which predicts the position of the image after a predetermined time, using images having time information as learning data. By applying the inference model obtained by this machine learning to, for example, an imaging device, image position inference can be performed on live view images that change from moment to moment, so that, for example, the position at which a bird will be captured after a predetermined time can be predicted and presented. By performing a shooting operation, for example, with the presented position in mind, the user can easily capture the bird in flight. In addition, images having time information for use as learning data can be obtained very easily, teacher data can be obtained from this learning data by relatively simple processing, and an inference model that enables image position inference can be created easily.

（第４の実施の形態） (Fourth Embodiment)
図１８乃至図２０は本発明の第４の実施の形態を説明するための説明図である。本実施の形態のハードウェア構成は第１の実施の形態と同様である。図１８は本実施の形態における撮影シーンを示す説明図である。 FIGS. 18 to 20 are explanatory diagrams for explaining the fourth embodiment of the present invention. The hardware configuration of this embodiment is the same as that of the first embodiment. FIG. 18 is an explanatory diagram showing a shooting scene in this embodiment.

図１８はユーザ４７が樹木の枝４５に止まっている鳥４６の撮影を行う様子を示している。ユーザ４７と樹木との間には川８１が流れており、川８１では魚４９が泳いでいる。枝４５に止まっている鳥４６は、上空に飛び立ったり、隣の樹木の枝に飛び移ったり、川８１の魚４９に目がけて滑空したりすることが考えられる。図１８ではこれらの状態を符号４６ｈ，４６ｉ，４６ｊでそれぞれ示している。この場合において、鳥４６がいずれの方向に飛び去るかを事前に予測できれば、鳥４６の効果的な撮影が可能となることが考えられる。図１８の例では、鳥４６が符号４６ｈ又は４６ｉの状態になる確率がそれぞれ１５％で、符号４６ｊの状態になる確率が７０％であることを示している。本実施の形態はこのような予測を可能にするものである。 FIG. 18 shows the user 47 photographing a bird 46 perched on a branch 45 of a tree. A river 81 flows between the user 47 and the tree, and fish 49 are swimming in the river 81 . A bird 46 perched on a branch 45 may take off into the sky, jump to a branch of an adjacent tree, or glide toward a fish 49 in a river 81 . In FIG. 18, these states are indicated by reference numerals 46h, 46i and 46j, respectively. In this case, if it is possible to predict in advance in which direction the bird 46 will fly away, it is possible to effectively photograph the bird 46 . The example of FIG. 18 indicates that the bird 46 has a 15% probability of being in the state 46h or 46i and a 70% probability of being in the state 46j. The present embodiment enables such prediction.

第２の実施の形態においては、時間情報を有する画像を学習用データとして用い、この学習用データを用いて所定の時間間隔を有する複数の画像の組を教師データとして作成することで、各画像（前画像）と所定の時間後の画像（後画像）を予測する画像画像推論を行う推論モデルを生成して利用する例であった。本実施の形態は、前画像に対して複数の時間後の後画像と移動方向を予測する画像画像方向推論を行う推論モデルを生成して利用する例である。 In the second embodiment, an image having time information is used as learning data, and a set of a plurality of images having a predetermined time interval is created as training data by using this learning data. This is an example of generating and using an inference model that performs image inference for predicting a (previous image) and an image after a predetermined time (post-image). The present embodiment is an example of generating and using an inference model that performs image image direction inference for predicting a later image and a moving direction after a plurality of times with respect to a previous image.

本実施の形態において、例えば、図１２に示す連続画像群を学習用データとして用いることができる。母集合作成部３１ａは、前画像と、前画像の取得時間から複数の所定時間後における後画像とを組にした教師データを生成して教師データ記録部３１ｅに格納する。入出力モデル化部３１ｄは、各所定時間毎に、前画像に対応する後画像とその移動方向を求めると共に、後画像として用いる代表画像を選択する。この場合には、代表画像は移動方向に応じて複数選択される。例えば、移動方向の１５度毎に代表画像を選択するようにしてもよい。こうして、生成されたネットワークデザインの情報及び代表画像は撮像装置２０に送信され、撮像装置２０の記録制御部１１ｃはネットワークデザインの情報を設定データ記録領域１６ｂに記録し、代表画像を画像データ記録領域１６ａに記録するようになっている。 In the present embodiment, for example, the continuous image group shown in FIG. 12 can be used as learning data. The mother set creation unit 31a generates teacher data that pairs a previous image with the post-images at a plurality of predetermined times after the acquisition time of the previous image, and stores the teacher data in the teacher data recording unit 31e. For each predetermined time, the input/output modeling unit 31d obtains the post-image corresponding to the previous image and its moving direction, and selects a representative image to be used as the post-image. In this case, a plurality of representative images are selected according to the moving direction. For example, a representative image may be selected for every 15 degrees of moving direction. The network design information and representative images generated in this way are transmitted to the imaging device 20, and the recording control unit 11c of the imaging device 20 records the network design information in the setting data recording area 16b and the representative images in the image data recording area 16a.
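
One way to picture the selection of a representative image for every 15 degrees of moving direction is the Python sketch below. It is an assumption-laden illustration: the helper names `direction_bin` and `pick_representatives`, the angle convention (counter-clockwise from the positive x axis), and the rule of keeping the first image seen in each bin are all hypothetical, since the text does not state how the representative is chosen within a direction.

```python
import math

def direction_bin(dx, dy, bin_deg=15):
    """Map a displacement (dx, dy) to the index of its direction bin.

    The angle is measured counter-clockwise from the +x axis and wrapped
    into [0, 360); with bin_deg=15 there are 24 bins, numbered 0..23.
    """
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // bin_deg)

def pick_representatives(samples, bin_deg=15):
    """Keep one representative image per direction bin.

    samples: list of (image_id, dx, dy), where (dx, dy) is the movement of
             the object from the previous image to this post-image.
    The first image encountered in each bin is kept (an assumed rule).
    """
    representatives = {}
    for image_id, dx, dy in samples:
        representatives.setdefault(direction_bin(dx, dy, bin_deg), image_id)
    return representatives

# Hypothetical post-images with their movement vectors.
reps = pick_representatives([("a", 1.0, 0.1), ("b", 1.0, 0.2), ("c", 1.0, 2.0)])
```

Here images "a" and "b" fall in the same bin, so only "a" is kept, while "c" represents a different direction.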

次に、このように構成された実施の形態について図１８乃至図２０を参照して説明する。図１９及び図２０は表示部１５の表示画面に表示される画像の表示例を示す説明図である。 Next, the embodiment configured as described above will be described with reference to FIGS. 18 to 20. FIGS. 19 and 20 are explanatory diagrams showing display examples of images displayed on the display screen of the display unit 15.

本実施の形態においては、推論エンジン１２は、所定の画像入力に対して複数の所定時間後の予測画像及びその移動方向を出力する上述した画像画像方向推論を行う推論モデルを構成する。ユーザ４７は、枝４５から飛び立った後の鳥４６の撮影を試みようとしている。図１９の画像Ｐ４１～Ｐ４４は、所定の時刻におけるライブビュー画像を示しており、画像Ｐ４１～Ｐ４４の順で時刻が経過している。 In the present embodiment, the inference engine 12 constructs an inference model that performs the above-described image image direction inference for outputting a plurality of predicted images after a predetermined time and their moving directions for a predetermined image input. User 47 is attempting to photograph bird 46 after it has taken off from branch 45 . Images P41 to P44 in FIG. 19 show live view images at a predetermined time, and time passes in the order of images P41 to P44.

画像Ｐ４１中の画像４６ａは、枝に止まっている鳥４６を示している。この画像Ｐ４１は、表示画面１５ａ上にライブビュー画像として表示されている。画像Ｐ４１から所定時間後に取得されたライブビュー画像である画像Ｐ４２は、画像Ｐ４２中の被写体に関連する推論モデルが存在することを示す丸印の表示５１が表示されている。画像Ｐ４２中の鳥４６の画像４６ｂは、もう少しで鳥４６が飛び立とうとしている様子を示している。例えば、この時点における画像画像方向推論の信頼性は十分に高くはないものとする。この場合には、画像Ｐ４２中には、推論エンジン１２による画像画像方向推論の結果、表示制御部１１ｆにより、５秒間から２秒間後の画像予測であることを示す時間表示８８ｂと鳥４６の移動方向の予測を示す表示が表示される。 An image 46a in the image P41 shows a bird 46 perched on a branch. This image P41 is displayed as a live view image on the display screen 15a. An image P42, which is a live view image acquired after a predetermined time from the image P41, is displayed with a circle indication 51 indicating that there is an inference model related to the subject in the image P42. An image 46b of the bird 46 in the image P42 shows the bird 46 about to take off. For example, assume that image orientation inference is not reliable enough at this point. In this case, in the image P42, as a result of the image image direction inference by the inference engine 12, the display control unit 11f displays a time display 88b indicating that the image is predicted after 5 seconds to 2 seconds, and the movement of the bird 46. A display showing the predicted direction is displayed.

図１９の例では、画像Ｐ４２中には、鳥４６が、上方に飛び立つ可能性が１５％であることを示す確率表示８５ｈｐとその場合の代表画像８５ｈ、隣の枝に飛び移る可能性が１５％であることを示す確率表示８５ｉｐとその場合の代表画像８５ｉ及び水平又は下方に滑空する可能性が７０％であることを示す確率表示８５ｊｐとその場合の代表画像８５ｊが含まれている。なお、各確率表示によって示す確率は、推論の結果得られる各方向の信頼性の値に基づいて得られるものである。 In the example of FIG. 19, the image P42 contains a probability display 85hp indicating that there is a 15% chance that the bird 46 will fly upwards, a representative image 85h in that case, and a 15% possibility that the bird 46 will fly to the next branch. % and a representative image 85i in that case, and a probability display 85jp showing that the possibility of glide horizontally or downward is 70% and a representative image 85j in that case. The probability indicated by each probability display is obtained based on the reliability value of each direction obtained as a result of inference.
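
The text states only that the displayed probabilities are derived from the reliability values of each direction. One simple possibility, shown purely as an assumption, is to normalize the reliabilities so that they sum to 100%; the function name `direction_probabilities` and the raw reliability values are hypothetical.

```python
def direction_probabilities(reliabilities):
    """Convert per-direction reliability values into percentages.

    reliabilities: dict mapping a direction label to a non-negative score.
    The scores are normalized to sum to 100 and rounded to one decimal place.
    Normalization is an assumed rule; the source does not specify one.
    """
    total = sum(reliabilities.values())
    if total <= 0:
        return {label: 0.0 for label in reliabilities}
    return {
        label: round(100.0 * score / total, 1)
        for label, score in reliabilities.items()
    }

# Hypothetical raw scores chosen to reproduce the 15% / 15% / 70% split.
probabilities = direction_probabilities(
    {"up": 0.3, "next_branch": 0.3, "glide": 1.4}
)
```

With these assumed scores the result corresponds to the probability displays 85hp, 85ip and 85jp in image P42.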

更に、画像Ｐ４２から所定時間後に取得されたライブビュー画像である画像Ｐ４３は、画像Ｐ４３中の被写体に関連する推論モデルが存在することを示す丸印の表示５１が表示されている。画像Ｐ４３中の鳥４６の画像４６ｃは、鳥４６が飛び立つ直前の様子を示している。例えば、この時点における画像画像方向推論の信頼性は十分に高いものとする。この場合には、画像Ｐ４３中には、推論エンジン１２による画像画像方向推論の結果、表示制御部１１ｆにより、１秒間後の鳥４６の移動方向を示す表示が表示される。 Further, an image P43, which is a live view image acquired after a predetermined time from the image P42, is displayed with a circle indication 51 indicating that there is an inference model related to the subject in the image P43. An image 46c of the bird 46 in the image P43 shows the state just before the bird 46 takes off. For example, assume that the reliability of image orientation inference at this point is sufficiently high. In this case, in the image P43, as a result of the image image direction inference by the inference engine 12, a display indicating the moving direction of the bird 46 one second later is displayed by the display control unit 11f.

In the example of FIG. 19, image P43 includes a probability display 86ip indicating a 5% probability that the bird 46 will hop to the neighboring branch, with a representative image 86i for that case, and a probability display 86jp indicating a 95% probability that it will glide horizontally or downward, with a representative image 86j for that case. A time display 88c, indicating that the prediction is for one second from the present, is also displayed.
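The switch between the two display forms described for images P42 and P43 can be sketched roughly as below. This is a minimal illustration, not the device's actual control logic: the threshold value, the default time values, the rule for hiding zero-probability candidates, and all identifiers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str        # e.g. "glide horizontally or downward"
    probability: int  # whole percent, as shown by a probability display


def build_overlay(candidates, reliability, threshold=0.8,
                  point_time_s=1.0, time_range_s=(2.0, 5.0)):
    """Choose what to superimpose on the live view image.

    Assumed rule: at or above `threshold` a single time is shown (as with
    time display 88c); below it a time range is shown (as with 88b).
    """
    if reliability >= threshold:
        time_label = f"in {point_time_s:g} s"
    else:
        low, high = time_range_s
        time_label = f"in {low:g}-{high:g} s"

    # Hide candidates that round to 0% and list the most likely one first.
    shown = sorted((c for c in candidates if c.probability > 0),
                   key=lambda c: c.probability, reverse=True)
    return {
        "time_label": time_label,
        "candidates": [(c.label, f"{c.probability}%") for c in shown],
    }


if __name__ == "__main__":
    p43 = [Candidate("next branch", 5), Candidate("glide", 95)]
    print(build_overlay(p43, reliability=0.9))
```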

By checking image P43 displayed on the display screen 15a of the display unit 15, the user 47 can predict that the bird 46 will glide in a substantially horizontal direction after taking off from the branch 45.

For example, by checking image P43 and estimating the moving direction of the bird 46, the user 47 can keep the bird 46 within the shooting range with relative ease. As a result, the user 47 can capture the desired decisive moment, such as the moment the bird 46 catches the fish 49.

FIG. 20 shows the display on the display screen 15a in this case: an image 49p of the fish and an image 46p of the bird holding the fish in its beak are displayed.

Image P44 in FIG. 19 shows the live view image one second after the capture time of image P43. It displays the actual image 46d of the bird 46, the representative image 87j of the prediction result, and a probability display 87jp indicating that the probability of that moving direction is 100%.

As described above, the present embodiment uses images having time information as learning data to realize machine learning that performs image-and-direction inference, that is, prediction of the image after each of a plurality of predetermined times and of the moving direction of the object. By applying the inference model obtained through this machine learning to, for example, an imaging device, image-and-direction inference can be performed on the constantly changing live view image, so that the device can predict and present, for example, the direction in which a bird will be captured after a predetermined time. By taking the presented images into account when performing, for example, a shooting operation, the user can easily photograph a bird in flight. In addition, images having time information, which serve as the learning data, are extremely easy to acquire, and teacher data can be obtained from this learning data by relatively simple processing, so an inference model that enables image-and-direction inference can be created easily.
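The "relatively simple processing" that turns time-stamped images into teacher data could look something like the following sketch. It assumes the object's position in every frame has already been detected, and it defines the moving direction as the angle of the displacement from the current position to the future position; both choices, like the tolerance and all names, are assumptions made for illustration.

```python
import math
from bisect import bisect_left


def nearest_index(times, target):
    """Index of the timestamp closest to `target` in an ascending list."""
    i = bisect_left(times, target)
    neighbours = [j for j in (i - 1, i) if 0 <= j < len(times)]
    return min(neighbours, key=lambda j: abs(times[j] - target))


def make_teacher_data(frames, horizons_s=(1.0, 2.0, 5.0), tolerance_s=0.1):
    """Pair each frame with the object's position and direction at later times.

    frames: list of (timestamp_s, image_id, (x, y)) sorted by timestamp, where
            (x, y) is the detected object position in image coordinates.
    Returns a list of (image_id, {horizon_s: {"position", "direction_deg"}}).
    """
    times = [f[0] for f in frames]
    samples = []
    for t, image_id, (x, y) in frames:
        labels = {}
        for h in horizons_s:
            j = nearest_index(times, t + h)
            if abs(times[j] - (t + h)) > tolerance_s:
                continue  # no frame close enough to this horizon
            fx, fy = frames[j][2]
            # Angle of the displacement; note that image y usually grows downward.
            direction = math.degrees(math.atan2(fy - y, fx - x))
            labels[h] = {"position": (fx, fy), "direction_deg": direction}
        if labels:
            samples.append((image_id, labels))
    return samples
```

Frames near the end of a series have no frame at the requested horizon and are simply left without that label, which keeps the pairing step free of extrapolation.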

In each of the above embodiments, only examples in which the object is a bird have been described. However, the object may be anything, and the decisive moment may likewise be anything that can be predicted by learning from the images before and after it. For example, the moment a bird enters the water, the moment a fish jumps out of the water, or the moment a cat or dog turns around may be predicted.

Nor is the object limited to animals: the moment a milk crown forms or the moment a firework bursts may be predicted. The moment of impact in a golf swing or similar motion, which is relatively easy to predict, may also be predicted. Furthermore, the moment of cell division, cleavage of an egg, eclosion, hatching, and the like may be predicted. Observing the moment of cell division is relatively difficult, so being able to predict that moment to within minutes would be extremely useful. It is also possible, for example, to image the state of food being cooked and predict the moment at which the heat should be turned off.

Furthermore, although each of the above embodiments describes inferring time, image, position, and direction from images, these inferences can also be made on the basis of sound. For example, courtship behavior can be predicted from the calls with which an animal solicits courtship. It is also possible to predict, from images of an animal, the moment of the call that precedes courtship behavior; in other words, the timing at which a sound will occur can be predicted from images.

As described above, the concept of the present invention can be applied to any information that can be acquired, such as sound and images. Not only may one be predicted from the other, but learning may also use both kinds of data so that a comprehensive judgment is made. For example, for each frame of the images, a fragment of the sound at the corresponding acquisition time may be included in the learning. Decisive moments are not limited to courtship behavior; they also include egg laying, eclosion, and hatching. Medical applications are also possible, in which inference based on heart sounds, breath sounds, bowel sounds, and the like predicts a disease that a patient is likely to develop later. Wheezing and breathing patterns in conditions such as asthma change as the condition worsens, which makes early detection easier. The technique can also support early treatment of infants and people with disabilities who cannot describe their symptoms in words.
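Attaching to each frame a fragment of the sound at the corresponding acquisition time might be done along the following lines. This is a sketch under stated assumptions: the audio is a single-channel sample sequence that starts at time zero on the same clock as the frames, and the window length and all names are hypothetical.

```python
def pair_frames_with_audio(frame_times_s, audio, sample_rate, window_s=0.1):
    """Return (frame_time, audio_fragment) pairs for joint image/sound learning.

    frame_times_s: acquisition time of each frame, in seconds.
    audio:         sequence of samples whose first sample is at time 0.
    sample_rate:   samples per second.
    window_s:      length of the fragment centred on each frame time (assumed).
    """
    half = int(round(window_s * sample_rate / 2))
    pairs = []
    for t in frame_times_s:
        centre = int(round(t * sample_rate))
        start = max(0, centre - half)        # clip at the start of the recording
        end = min(len(audio), centre + half)  # clip at the end of the recording
        pairs.append((t, audio[start:end]))
    return pairs
```

Fragments near either end of the recording come out shorter than the window, so a learning pipeline built on this would need to pad or discard them.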

In the above embodiments, the imaging device requests an external device to create and transfer the inference model. However, the inference model may be created on any device; for example, a computer in the cloud may be used.

In the above embodiments, a digital camera has been described as the device used for imaging. The camera may be a digital single-lens reflex camera or a compact digital camera, a camera for moving images such as a video camera or movie camera, or, of course, a camera built into a personal digital assistant (PDA) such as a mobile phone or smartphone. The imaging unit may also be separate from the imaging device.

The present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements may be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiments. For example, some of the constituent elements shown in an embodiment may be omitted. Constituent elements from different embodiments may also be combined as appropriate.

Even where the operation flows in the claims, the specification, and the drawings are described using terms such as "first" and "next" for convenience, this does not mean that the operations must be performed in that order. It also goes without saying that any step of these operation flows that does not affect the essence of the invention may be omitted as appropriate.

Of the techniques described here, the control described mainly with flowcharts can in many cases be set by a program, and such a program may be stored in a recording medium or recording unit. The program may be recorded on the recording medium or in the recording unit at the time of product shipment, may be supplied on a distributed recording medium, or may be downloaded via the Internet.

The portions described as "units" (sections or units) in the embodiments may be implemented as dedicated circuits or as combinations of general-purpose circuits and, where necessary, may be implemented by combining processors such as microcomputers and CPUs that operate according to pre-programmed software, or sequencers such as FPGAs. A design in which an external device takes over part or all of the control is also possible, in which case a wired or wireless communication circuit is interposed. Communication may use Bluetooth, WiFi, a telephone line, or the like, or may use USB or the like. Dedicated circuits, general-purpose circuits, and the control unit may be integrated into an ASIC. A moving unit or the like is made up of various actuators and, where necessary, a linkage mechanism for movement, and the actuators are operated by a driver circuit. The driver circuit is in turn controlled by a microcomputer, ASIC, or the like according to a specific program. Such control may be finely corrected or adjusted using information output by various sensors and their peripheral circuits. Although the embodiments have been described using the terms "inference model" and "trained model", with decisions based on learning results produced by artificial intelligence, these may in some cases be replaced by a simple flowchart, conditional branching, or numerical evaluation involving computation. Furthermore, as the computing power of camera control circuits improves, or by narrowing the scope to a particular situation or object, the machine learning itself may be carried out within the imaging device.

[Appendix 1]
A teacher data creation device comprising:
a time determination unit that obtains, for each image in a series of images having time information based on shooting time, the image a predetermined time later;
an object image determination unit that detects, for each image in the series of images, the image state of a specific object in that image after the predetermined time; and
a control unit that pairs each image with the data on the image state of the specific object after the predetermined time obtained for that image to form teacher data.

[Appendix 2]
The teacher data creation device according to Appendix 2, wherein the control unit selects, from the series of images, an image similar to the image state of the specific object after the predetermined time and uses it as a representative image.

[Appendix 3]
A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to Appendix 1, an inference model for inferring from an input image the image state of a predetermined object after a predetermined time.

[Appendix 4]
An imaging device comprising:
an inference engine that implements the inference model generated by the learning device of Appendix 3;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result for the image state of the predetermined object in the captured image after the predetermined time.

[Appendix 5]
A teacher data creation device comprising:
a time determination unit that obtains, for each image in a series of images having time information based on shooting time, the image a predetermined time later;
an object image determination unit that detects, for each image in the series of images, the image position of a specific object in that image after the predetermined time; and
a control unit that pairs each image with the data on the image position of the specific object after the predetermined time obtained for that image to form teacher data.

[Appendix 6]
A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to Appendix 5, an inference model for inferring from an input image the image position of a predetermined object after a predetermined time.

[Appendix 7]
An imaging device comprising:
an inference engine that implements the inference model generated by the learning device of Appendix 6;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result for the image position of the predetermined object in the captured image after the predetermined time.

[Appendix 8]
A teacher data creation device comprising:
a time determination unit that obtains, for each image in a series of images having time information based on shooting time, the images a plurality of predetermined times later;
an object image determination unit that detects, for each image in the series of images, the image position and moving direction of a specific object in that image after each of the plurality of predetermined times; and
a control unit that pairs each image with the data on the image position and moving direction of the specific object after each of the plurality of predetermined times obtained for that image to form teacher data.

[Appendix 9]
A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to Appendix 8, an inference model for inferring from an input image the image position and moving direction of a predetermined object after each of a plurality of predetermined times.

[Appendix 10]
An imaging device comprising:
an inference engine that implements the inference model generated by the learning device of Appendix 9;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result for the image position and moving direction of the predetermined object in the captured image after each of the plurality of predetermined times.

Description of Symbols: 11: control unit; 11a: imaging control unit; 11b: image processing unit; 11c: recording control unit; 11d: setting control unit; 11e: communication control unit; 11f: display control unit; 12: inference engine; 12a: network; 13: operation unit; 14, 31b, 31c, 33: communication units; 15: display unit; 16: recording unit; 16a: image data recording area; 16b: setting data recording area; 20: imaging device; 22: imaging unit; 22a: image sensor; 22b: optical system; 30: external device; 31: learning unit; 31a: population creation unit; 31a1: time determination unit; 31a2: object image determination unit; 31d: input/output modeling unit; 31e: teacher data recording unit; 31f: display unit; 32: external image DB.

Claims

1. A teacher data creation device comprising:
an object image determination unit that detects, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific object in a specific state;
a time determination unit that determines the time difference between the shooting time of each image in the series of images and the shooting time of the image in the series that contains the specific state image; and
a control unit that pairs each image with the time difference data obtained for that image to form teacher data.

2. A learning device comprising an inference model generation unit that generates, by machine learning using the teacher data created by the teacher data creation device according to claim 1, an inference model for inferring from an input image the time at which a predetermined object will be in the specific state.

3. An imaging device comprising:
an inference engine that implements the inference model generated by the learning device according to claim 2;
an imaging unit; and
a setting control unit that supplies an image captured by the imaging unit to the inference engine and obtains an inference result for the time required for the predetermined object in the captured image to reach the specific state.

4. The imaging device according to claim 3, further comprising a display control unit that performs display control for displaying the inference result for the time on a display unit.

5. The imaging device according to claim 4, wherein the captured image is a live view image, and the display control unit displays the inference result for the time superimposed on the live view image displayed on the display unit.

6. The imaging device according to claim 3, wherein the setting control unit acquires reliability information for the inference result together with the inference result for the time, and the display control unit changes the display form of the inference result for the time on the basis of the reliability information.

7. The imaging device according to claim 6, wherein the display control unit displays the time when the reliability of the inference result is equal to or higher than a predetermined threshold, and displays a time range when the reliability of the inference result is lower than the predetermined threshold.

8. A teacher data creation method comprising:
a detection step of detecting, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific object in a specific state;
a determination step of determining the time difference between the shooting time of each image in the series of images and the shooting time of the image in the series that contains the specific state image; and
a generation step of generating teacher data by pairing each image with the time difference data obtained for that image.

9. The teacher data creation method according to claim 8, wherein the detection step detects the specific object by manual operation or recognition processing, and detects the specific state by image analysis processing.

10. The teacher data creation method according to claim 8, wherein the generation step excludes the series of images from the teacher data if the number of images containing the specific object is less than a predetermined number, and excludes images that do not contain the specific object from the teacher data.

11. A teacher data creation program for causing a computer to execute:
a detection step of detecting, from a series of images having time information based on shooting time, a specific state image, which is an image of a specific object in a specific state;
a determination step of determining the time difference between the shooting time of each image in the series of images and the shooting time of the image in the series that contains the specific state image; and
a generation step of generating teacher data by pairing each image with the time difference data obtained for that image.
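For illustration only, the time-difference and generation steps recited above, together with the exclusion rules of the dependent method claim, might be sketched as follows. The sketch assumes the detection step has already annotated each frame; the record layout, the minimum frame count, the choice of the earliest specific-state frame as the reference, and all names are assumptions, not the claimed implementation.

```python
def make_time_difference_teacher_data(series, min_object_frames=3):
    """Pair each image with its time difference to the specific-state image.

    series: list of dicts with the hypothetical keys
            "time_s", "image_id", "has_object", "in_specific_state".
    Returns a list of (image_id, seconds_until_specific_state); a negative
    value means the frame was taken after the specific state occurred.
    """
    # Exclude images that do not contain the specific object.
    with_object = [f for f in series if f["has_object"]]

    # Exclude the whole series if too few images contain the object.
    if len(with_object) < min_object_frames:
        return []

    # Assumption: the earliest frame in the specific state is the reference.
    state_times = [f["time_s"] for f in with_object if f["in_specific_state"]]
    if not state_times:
        return []
    reference = min(state_times)

    return [(f["image_id"], reference - f["time_s"]) for f in with_object]
```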