JP2021190921A - Information processing device and method, program, and storage medium

Info

Publication number: JP2021190921A
Application number: JP2020096335A
Authority: JP
Inventors: 誠治高橋; Seiji Takahashi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-06-02
Filing date: 2020-06-02
Publication date: 2021-12-13

Abstract

To provide an information processing device that can control the relationship between the reliability of subject recognition and the processing load so as to match a user's intention. SOLUTION: An information processing device includes an acquisition unit that acquires a first video image, an extraction unit that extracts some of the images from the first video image, a recognition unit that performs recognition processing to recognize a subject included in the first video image, and a control unit that controls the recognition unit to perform the recognition processing using the extracted images if the processing load required for the recognition processing is to be suppressed. SELECTED DRAWING: Figure 4

Description

The present invention relates to a technique for recognizing a subject from an image.

In recent years, AI technology centered on machine learning has been actively developed. For example, technology is being developed for automatically recognizing information about a subject (for example, person name, object name, gender, age) included in a moving image or a still image taken by a camera. It is conceivable to use such a technique to add tags to a moving image taken by a camera.

Patent Document 1 discloses a method of reducing the processing load required for subject recognition by performing the recognition on an image whose data amount has been reduced relative to the image targeted for recognition.

Patent Document 1: Japanese Patent No. 5989781

However, because the system described in Patent Document 1 uses an image with a reduced amount of data for subject recognition, it does not always yield a highly reliable recognition result. On the other hand, obtaining a highly reliable recognition result requires using an image whose data amount has not been reduced, which increases the processing load in terms of the time required for subject recognition, battery consumption, and communication amount. Given this trade-off between the reliability of subject recognition and the processing load, there are likely to be cases where the user wants to prioritize reducing the processing load and cases where the user wants to prioritize obtaining a highly reliable recognition result.

For example, when the user is displaying or searching tags on the camera, the user presumably wants to obtain the tags (subject recognition results) immediately. When the remaining battery level of the camera is low, the user presumably wants to keep battery consumption as low as possible. Further, assuming that the user pays a communication fee according to the communication amount of the camera, the user presumably wants to keep the communication amount as low as possible when the cumulative communication amount is large. In these cases, the user is considered to want to prioritize reducing the processing load.

On the other hand, if none of the above conditions is met, the user is considered to want to prioritize obtaining a more reliable recognition result.

However, the conventional technology has not been able to control the relationship between the reliability of subject recognition and the processing load so as to suit the user's intention.

The present invention has been made in view of the above-mentioned problems, and an object thereof is to provide an information processing apparatus capable of controlling the relationship between the reliability of subject recognition and the processing load so as to meet the intention of the user.

The information processing apparatus according to the present invention comprises: an acquisition means for acquiring a first moving image; an extraction means for extracting some images from the first moving image; a recognition means for performing recognition processing that recognizes a subject included in the first moving image; and a control means for controlling the recognition means to perform the recognition processing using the extracted images when the processing load required for the recognition processing should be suppressed.

According to the present invention, it is possible to control the relationship between the reliability of subject recognition and the processing load so as to meet the intention of the user.

FIG. 1 is a block diagram showing the configuration of a system including an information processing apparatus and a server device according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram showing an example of management information stored in the information processing apparatus.
FIG. 3 is a sequence diagram illustrating an outline of the processing of the information processing apparatus and the server device.
FIG. 4 is a flowchart showing the operation of the information processing apparatus.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of these features are essential to the invention, and the features may be combined arbitrarily. Further, in the accompanying drawings, the same or similar configurations are given the same reference numbers, and duplicate explanations are omitted.

<System configuration>
FIG. 1 is a block diagram showing the configuration of a system including an information processing apparatus and a server device according to an embodiment of the present invention.

In FIG. 1, the information processing apparatus 100 is a device having an image pickup function and a communication function, such as a digital camera, a smartphone terminal, a tablet terminal, or a game device.

The control unit 101 controls each unit of the information processing apparatus 100 according to input signals and programs, and is implemented by, for example, a Central Processing Unit (CPU). Instead of the control unit 101 controlling the entire information processing apparatus 100, a plurality of pieces of hardware may share the processing to control the entire apparatus.

The memory 102 is used as a buffer memory for temporarily holding various data, as a work area of the control unit 101, and the like. The non-volatile memory 103 is an electrically erasable and recordable non-volatile memory, and stores programs and the like executed by the control unit 101.

The operation unit 104 is used to receive instructions for the information processing apparatus 100 from the user. The operation unit 104 includes, for example, a power button with which the user turns the power of the information processing apparatus 100 on and off, and an operation button with which the user turns the communication function on and off. The operation unit 104 also includes a touch panel formed on the display unit 105, which will be described later.

The display unit 105 displays a GUI (Graphical User Interface) for interactive operation. The display unit 105 does not necessarily have to be built into the information processing apparatus 100. The information processing apparatus 100 need only have at least a display control function for controlling the display contents.

The storage medium 106 can store various data. The storage medium 106 may be configured to be detachable from the information processing apparatus 100, or may be built into it. That is, the information processing apparatus 100 need only have at least a means for accessing the storage medium 106.

The image pickup unit 107 includes, for example, an optical lens unit having an optical system and a drive control unit that drives and controls the aperture, zoom, focus, and the like, and an image pickup element that converts the light (image) introduced through the optical lens unit into an electrical image signal. Under the control of the control unit 101, the image pickup unit 107 converts the subject image formed by its optical lens unit into an electric signal with the image pickup element, performs noise reduction processing and the like, and outputs digital image data.

The communication unit 110 is a communication unit for realizing communication with other devices. The communication unit 110 is composed of, for example, an antenna for wireless communication and a communication controller for processing wireless signals, and realizes public wireless communication in accordance with standards such as W-CDMA (UMTS) and LTE (Long Term Evolution). The control unit 101 controls the communication unit 110 and communicates with the server device 120 via the public line 140.

The server device 120 is a device having a subject recognition function and a communication function, such as a personal computer.

The control unit 121 controls each unit of the server device 120 according to input signals and programs, and is implemented by, for example, a Central Processing Unit (CPU). Instead of the control unit 121 controlling the entire server device 120, a plurality of pieces of hardware may share the processing to control the entire device. The control unit 121 has a recognition processing function for recognizing subject information (for example, person name, action, emotion) included in a moving image or a still image received via the communication unit 130 described later.

The memory 122 is used as a buffer memory for temporarily holding various data, as a work area of the control unit 121, and the like. The non-volatile memory 123 is an electrically erasable and recordable non-volatile memory, and stores programs and the like executed by the control unit 121.

The operation unit 124 is used to receive instructions for the server device 120 from the user. The operation unit 124 includes, for example, a power button with which the user turns the power of the server device 120 on and off. The operation unit 124 does not necessarily have to be built into the server device 120. The server device 120 need only have at least a control function for controlling the operation content.

The display unit 125 displays a GUI (Graphical User Interface) for interactive operation. The display unit 125 does not necessarily have to be built into the server device 120. The server device 120 need only have at least a display control function for controlling the display contents.

The storage medium 126 can store various data. The storage medium 126 may be configured to be detachable from the server device 120, or may be built into it. That is, the server device 120 need only have at least a means for accessing the storage medium 126.

The communication unit 130 is a communication unit for realizing communication with other devices. The communication unit 130 is composed of, for example, a communication controller for processing communication signals, and realizes wired communication in accordance with the IEEE 802.3 standard. The control unit 121 controls the communication unit 130 and communicates with the information processing apparatus 100 via the public line 140.

<Overview of system operation>
Next, an outline of the operation of the system in this embodiment will be described. FIG. 2 is a diagram showing an example of the information that the information processing apparatus 100 stores in the storage medium 106 in order to realize the system of the present embodiment.

The management information 200 is information about a moving image taken by the information processing apparatus 100, and is composed of at least a combination of a file name 201, a face detection time stamp 202, a motion detection time stamp 203, a tag 204, and a recognition method 205. The management information 200 is stored in association with the captured moving image.

The file name 201 is the file name of a moving image taken by the information processing apparatus 100. The face detection time stamp 202 is a time stamp indicating a position in the moving image where a face was detected. The motion detection time stamp 203 is a time stamp indicating a position in the moving image where motion was detected. The tag 204 is information about a subject included in the moving image. The recognition method 205 is information indicating the subject recognition method used to acquire the tag 204.
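As an illustration of the record described above, the following is a minimal sketch of one management information entry. The class name, field names, field types, and the use of seconds for time stamps are assumptions made for the example; the publication only lists the five items 201 to 205.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ManagementInfo:
    """One record per moving image, mirroring items 201 to 205 (types are assumed)."""
    file_name: str                                                 # file name 201
    face_timestamps: List[float] = field(default_factory=list)    # 202, in seconds (assumed unit)
    motion_timestamps: List[float] = field(default_factory=list)  # 203, in seconds (assumed unit)
    tags: List[str] = field(default_factory=list)                 # tag 204
    recognition_method: Optional[str] = None                      # 205: None, "partial", or "whole"


# A freshly recorded moving image has time stamps but no tags yet.
info = ManagementInfo(file_name="MVI_0001.MP4")  # hypothetical file name
info.face_timestamps.append(1.5)
info.motion_timestamps.append(4.0)
```

Leaving `recognition_method` unset until a recognition result arrives makes it easy to tell untagged moving images from ones that were tagged by either method.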

FIG. 3 is a sequence diagram showing the flow of processing in the system of the present embodiment. In FIG. 3, it is assumed that the information processing apparatus 100 and the server device 120 have established a connection with each other via the communication unit 110 and the communication unit 130. All of the communication processing shown in FIG. 3 is performed via the communication unit 110 and the communication unit 130 of the respective devices. Further, the processes shown in steps S309 to S312 of FIG. 3 are assumed to impose a smaller load (time, battery consumption, communication amount) than the processes shown in steps S315 to S317.

When the control unit 101 of the information processing apparatus 100 receives an operation to start moving image shooting via the operation unit 104 in step S301, it stores the image data acquired by the image pickup unit 107 in the memory 102 in step S302.

After that, when the control unit 101 of the information processing apparatus 100 detects a face in the image data stored in step S302, it stores a face detection time stamp in the memory 102 in step S303. Likewise, when the control unit 101 detects motion in the image data stored in step S302, it stores a motion detection time stamp in the memory 102 in step S304.

The control unit 101 of the information processing apparatus 100 repeats the processes of steps S302 to S304 until it receives, in step S305, an operation to end the moving image shooting via the operation unit 104.

When the control unit 101 of the information processing apparatus 100 receives the operation to end the moving image shooting via the operation unit 104 in step S305, it converts, in step S306, the plurality of image data stored in step S302 into a predetermined moving image format and saves the result in the storage medium 106.
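The recording loop of steps S302 to S305 can be sketched as follows. This is only an illustration: the function name `record`, the representation of a frame as `bytes`, and the two detector callbacks are assumptions, and the publication does not specify how faces or motion are detected.

```python
from typing import Callable, Iterable, List, Tuple

Frame = bytes  # stand-in for one frame of image data


def record(frames: Iterable[Tuple[float, Frame]],
           has_face: Callable[[Frame], bool],
           has_motion: Callable[[Frame], bool]):
    """Buffer frames and note when a face or motion is detected (steps S302 to S304)."""
    buffered: List[Frame] = []
    face_ts: List[float] = []
    motion_ts: List[float] = []
    for ts, frame in frames:        # the input ends when shooting stops (S305)
        buffered.append(frame)      # S302: keep the image data in memory
        if has_face(frame):
            face_ts.append(ts)      # S303: face detection time stamp
        if has_motion(frame):
            motion_ts.append(ts)    # S304: motion detection time stamp
    return buffered, face_ts, motion_ts


# Toy input: frames carry letters that the stub detectors look for.
frames = [(0.0, b""), (0.5, b"F"), (1.0, b"FM"), (1.5, b"M")]
buffered, face_ts, motion_ts = record(frames, lambda f: b"F" in f, lambda f: b"M" in f)
```

The buffered frames would then be encoded and saved (S306), and the two time stamp lists copied into the management information (S307).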

Subsequently, the control unit 101 of the information processing apparatus 100 stores the following information in the management information 200 in step S307.
- File name 201: the file name saved in step S306
- Face detection time stamp 202: the face detection time stamp stored in step S303
- Motion detection time stamp 203: the motion detection time stamp stored in step S304
Next, when the control unit 101 of the information processing apparatus 100 receives, in step S308, an operation via the operation unit 104 requesting tag display for the moving image saved in step S306, it extracts partial moving images from that moving image in step S309. Here, the control unit 101 refers to the management information 200 and generates moving images by extracting several frames before and after the face detection time stamp 202 and the motion detection time stamp 203 from the moving image.
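The extraction in step S309 can be pictured as computing frame ranges around each detection. The sketch below assumes that time stamps are expressed as frame indices, that "several frames" means a fixed `margin`, and that overlapping ranges are merged into one clip; none of these details is stated in the publication, and `clip_ranges` is a hypothetical helper.

```python
from typing import List, Tuple


def clip_ranges(timestamps: List[int], total_frames: int, margin: int = 3) -> List[Tuple[int, int]]:
    """Return merged, inclusive frame ranges covering `margin` frames around each time stamp."""
    ranges: List[Tuple[int, int]] = []
    for t in sorted(timestamps):
        start = max(0, t - margin)
        end = min(total_frames - 1, t + margin)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlaps or touches the previous range: extend it instead of adding a new clip.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


face_frames = [10, 12]   # frame indices of face detections (example values)
motion_frames = [50]     # frame indices of motion detections (example values)
print(clip_ranges(face_frames + motion_frames, total_frames=100))
# prints [(7, 15), (47, 53)]
```

Each resulting range corresponds to one partial moving image, so the number of recognition requests sent in the subsequent steps equals the number of ranges.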

The control unit 101 of the information processing apparatus 100 repeats the processes shown in steps S310 to S312 as many times as the number of moving images extracted in step S309. In step S310, the control unit 101 transmits a subject recognition request to the server device 120. The control unit 101 includes the moving image extracted in step S309 in the subject recognition request.

When the control unit 121 of the server device 120 receives the subject recognition request of step S310, it recognizes, in step S311, the subject information (for example, person name, action, emotion) included in the received moving image. After that, in step S312, the control unit 121 transmits a subject recognition response to the information processing apparatus 100. The control unit 121 includes the subject recognition result obtained in step S311 in the subject recognition response.

Upon receiving the subject recognition response of step S312, the control unit 101 of the information processing apparatus 100 additionally stores the following information in the management information 200 in step S313.
- Tag 204: all subject recognition results received in step S312
- Recognition method 205: "partial"
Subsequently, in step S314, the control unit 101 of the information processing apparatus 100 displays the tags of the subject recognition results received in step S312 on the display unit 105.

In this way, upon accepting the tag display operation in step S308, the control unit 101 of the information processing apparatus 100 determines that the processing load (here, time) should be suppressed, and uses moving images consisting of predetermined portions extracted from the moving image for subject recognition. This shortens the time until the user obtains the subject recognition result.
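The partial recognition path (steps S310 to S313) can be sketched as a loop that sends each extracted clip and accumulates the returned tags. The function names, the dictionary layout, and the removal of duplicate tags are assumptions for illustration; `fake_server` merely stands in for the server device 120 and returns canned results.

```python
from typing import Callable, Dict, List


def tag_from_clips(clips: List[bytes], recognize: Callable[[bytes], List[str]]) -> Dict[str, object]:
    """Recognize each clip and collect the results to be stored as tag 204 and method 205."""
    tags: List[str] = []
    for clip in clips:
        for tag in recognize(clip):   # one request and response per clip (S310 to S312)
            if tag not in tags:       # dropping duplicates is an assumption of this sketch
                tags.append(tag)
    return {"tags": tags, "recognition_method": "partial"}  # stored in S313


def fake_server(clip: bytes) -> List[str]:
    """Stand-in for the server device 120; returns canned recognition results."""
    canned = {b"clip-a": ["Alice", "smiling"], b"clip-b": ["Alice", "running"]}
    return canned.get(clip, [])


result = tag_from_clips([b"clip-a", b"clip-b"], fake_server)
```

Recording the method as "partial" is what later allows the apparatus to identify moving images that are still candidates for recognition on the whole moving image.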

Subsequently, when the control unit 101 of the information processing apparatus 100 receives, in step S315, an operation via the operation unit 104 to end the tag display of the moving image, it transmits a subject recognition request to the server device 120 in step S316. The control unit 101 includes the moving image saved in step S306 in the subject recognition request.

When the control unit 121 of the server device 120 receives the subject recognition request of step S316, it recognizes, in step S317, the subject information (for example, person name, action, emotion) included in the received moving image. After that, in step S318, the control unit 121 transmits a subject recognition response to the information processing apparatus 100. The control unit 121 includes the subject recognition result of step S317 in the subject recognition response.

Upon receiving the subject recognition response of step S318, the control unit 101 of the information processing apparatus 100 additionally stores the following information in the management information 200 in step S319.
- Tag 204: the subject recognition result received in step S318
- Recognition method 205: "whole"
In this way, upon accepting the tag display end operation, the control unit 101 of the information processing apparatus 100 determines that the processing load (here, time) does not need to be suppressed, and uses the entire moving image for subject recognition. As a result, the user can obtain a more reliable subject recognition result.

<Operation of each device>
Next, the detailed operation of the information processing apparatus 100 for realizing the above operation will be described with reference to FIG. 4.

FIG. 4 is a flowchart showing the operation of the information processing apparatus 100 of the present embodiment. The processing shown in this flowchart is realized by the control unit 101 of the information processing apparatus 100 controlling each unit of the information processing apparatus 100 according to input signals and programs. It is assumed that the information processing apparatus 100 has established a connection with the server device 120 via the communication unit 110.

In step S401, the control unit 101 determines whether or not an operation to end the operation of the information processing apparatus 100 (for example, a power-off operation) has been accepted via the operation unit 104. If the control unit 101 determines that such an operation has been accepted, it ends the processing of this flowchart; otherwise, it advances the processing to step S402.

In step S402, the control unit 101 determines whether or not an operation to start moving image shooting has been accepted via the operation unit 104. If the control unit 101 determines that the operation has been accepted, it advances the processing to step S403; otherwise, it advances the processing to step S411.

In step S403, the control unit 101 stores the image data acquired by the image pickup unit 107 in the memory 102. This step corresponds to the process of step S302 in FIG. 3.

In step S404, the control unit 101 determines whether or not the image data stored in step S403 includes a face. If the control unit 101 determines that a face is included, it advances the processing to step S405; otherwise, it advances the processing to step S406.

In step S405, the control unit 101 stores the time stamp recorded in step S404 in the memory 102. This step corresponds to the process of step S303 in FIG. 3.

In step S406, the control unit 101 determines whether or not the image data stored in step S403 includes motion. If the control unit 101 determines that motion is included, it advances the processing to step S407; otherwise, it advances the processing to step S408.

In step S407, the control unit 101 stores the time stamp recorded in step S406 in the memory 102. This step corresponds to the process of step S304 in FIG. 3.

In step S408, the control unit 101 determines whether or not an operation to end the moving image shooting has been accepted via the operation unit 104. If the control unit 101 determines that the operation has been accepted, it advances the processing to step S409; otherwise, it returns the processing to step S403.

In step S409, the control unit 101 converts the plurality of image data stored in step S403 into a predetermined moving image format and saves the result in the storage medium 106. This step corresponds to the process of step S306 in FIG. 3.

In step S410, the control unit 101 stores the following information in the management information 200.
- File name 201: the file name saved in step S409
- Face detection time stamp 202: the time stamp stored in step S405
- Motion detection time stamp 203: the time stamp stored in step S407
This step corresponds to the process of step S307 in FIG. 3.

In step S411, the control unit 101 determines whether or not the processing load should be suppressed. The control unit 101 determines that the processing load should be suppressed when any of the following conditions is met.
- An operation to display or search the tag 204 has been accepted via the operation unit 104
- The remaining battery level is at or below a predetermined value
- The cumulative amount of communication via the communication unit 110 is at or above a predetermined amount
For example, when the user is displaying or searching tags on the information processing apparatus 100, the user presumably wants to obtain the tags (subject recognition results) immediately. When the remaining battery level of the information processing apparatus 100 is low, the user presumably wants to keep battery consumption as low as possible. Further, assuming that the user pays a communication fee according to the communication amount of the information processing apparatus 100, the user presumably wants to keep the communication amount as low as possible when the cumulative communication amount is large. On the other hand, when none of the above conditions is met, the user is considered to want to obtain a more reliable subject recognition result.
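The determination in step S411 is a logical OR of the three conditions. A minimal sketch follows; the `DeviceState` fields, the units, and the default threshold values are assumptions, since the publication only refers to "a predetermined value" and "a predetermined amount".

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    tag_operation_requested: bool   # user is displaying or searching tags
    battery_percent: float          # remaining battery level (assumed unit: percent)
    cumulative_traffic_mb: float    # cumulative communication amount (assumed unit: MB)


def should_suppress_load(state: DeviceState,
                         battery_threshold: float = 20.0,        # assumed threshold
                         traffic_threshold_mb: float = 1000.0    # assumed threshold
                         ) -> bool:
    """Return True when any of the three conditions of step S411 holds."""
    return (state.tag_operation_requested
            or state.battery_percent <= battery_threshold
            or state.cumulative_traffic_mb >= traffic_threshold_mb)


idle = DeviceState(tag_operation_requested=False, battery_percent=80.0, cumulative_traffic_mb=100.0)
browsing = DeviceState(tag_operation_requested=True, battery_percent=80.0, cumulative_traffic_mb=100.0)
print(should_suppress_load(idle), should_suppress_load(browsing))
# prints False True
```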

制御部１０１は、処理負荷を抑えるべきと判定した場合は、タグ２０４を未付与の動画について、ステップＳ４１２〜Ｓ４１５の処理を繰り返す。そうでない場合、制御部１０１は、認識方法２０５が「全体」ではない動画について、ステップＳ４１６〜Ｓ４１８の処理を繰り返す。なお、前者の処理は、後者の処理と比べて、処理に要する負荷（時間、バッテリ消費、通信量）が小さいものとする。 When the control unit 101 determines that the processing load should be suppressed, it repeats the processing of steps S412 to S415 for each moving image to which the tag 204 has not been attached. Otherwise, the control unit 101 repeats the processing of steps S416 to S418 for each moving image whose recognition method 205 is not “whole”. Note that the former processing is assumed to impose a smaller load (time, battery consumption, communication amount) than the latter.

ステップＳ４１２では、制御部１０１は、動画から部分動画を抽出する。ここで制御部１０１は、管理情報２００を参照し、該当する動画から顔検出タイムスタンプ２０２および動き検出タイムスタンプ２０３の前後数フレームを抽出した動画を生成する。本ステップは、図３のステップＳ３０９の処理に相当する。 In step S412, the control unit 101 extracts a partial moving image from the moving image. Here, the control unit 101 refers to the management information 200 and generates a moving image by extracting, from the corresponding moving image, several frames before and after the face detection time stamp 202 and the motion detection time stamp 203. This step corresponds to the process of step S309 in FIG. 3.
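As a rough sketch of the extraction in step S412, the frame ranges to cut out can be computed from the stored time stamps. The function name `frame_ranges`, the frame rate parameter, and the margin of three frames are assumptions made for illustration. Merging overlapping ranges into a single clip is also an assumption; this description does not state how nearby time stamps are handled.

```python
from typing import List, Tuple


def frame_ranges(
    timestamps_s: List[float],
    fps: float,
    total_frames: int,
    margin_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges around each time stamp.

    Ranges are clamped to the video length, and ranges that overlap or
    touch are merged so that the same frame is not extracted twice.
    """
    ranges: List[Tuple[int, int]] = []
    for t in sorted(timestamps_s):
        center = round(t * fps)
        start = max(0, center - margin_frames)
        end = min(total_frames - 1, center + margin_frames)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges
```

In this sketch the face detection time stamps 202 and the motion detection time stamps 203 would simply be concatenated into `timestamps_s` before the call.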

制御部１０１は、ステップＳ４１２で抽出した動画について、ステップＳ４１３〜Ｓ４１４の処理を繰り返す。 The control unit 101 repeats the processes of steps S413 to S414 for the moving image extracted in step S412.

ステップＳ４１３では、制御部１０１は、通信部１１０を介して、サーバ装置１２０に対して、被写体認識要求を送信する。制御部１０１は、その被写体認識要求に、ステップＳ４１２で抽出した動画を含める。本ステップは、図３のステップＳ３１０の処理に相当する。 In step S413, the control unit 101 transmits a subject recognition request to the server device 120 via the communication unit 110. The control unit 101 includes the moving image extracted in step S412 in the subject recognition request. This step corresponds to the process of step S310 in FIG. 3.

ステップＳ４１４では、制御部１０１は、通信部１１０を介して、サーバ装置１２０から、被写体認識応答を受信したか否かを判定する。制御部１０１は、受信したと判定した場合は、次のステップ（未認識の動画があればステップＳ４１３、なければステップＳ４１５）に処理を進め、そうでないと判定した場合は、再びステップＳ４１４の処理を繰り返す。 In step S414, the control unit 101 determines whether or not a subject recognition response has been received from the server device 120 via the communication unit 110. If the control unit 101 determines that the response has been received, it proceeds to the next step (step S413 if there is an unrecognized moving image, step S415 if not); otherwise, it repeats the process of step S414.

ステップＳ４１５では、制御部１０１は、被写体認識対象の動画について、下記の情報を管理情報２００に追加記憶する。
・タグ２０４：ステップＳ４１４で受信した全ての被写体認識結果
・認識方法２０５：「部分」
本ステップは、図３のステップＳ３１３の処理に相当する。 In step S415, the control unit 101 additionally stores the following information in the management information 200 for the moving image to be recognized as the subject.
- Tag 204: All subject recognition results received in step S414
- Recognition method 205: "Part"
This step corresponds to the process of step S313 in FIG. 3.

このように、情報処理装置１００の制御部１０１は、処理負荷（時間、バッテリ消費、通信量など）を抑えるべきと判定した場合に、動画から所定の部分を抽出した動画を被写体認識に用いる。これにより、ユーザが被写体認識結果を取得するまでの時間を短縮し、バッテリ消費や通信量を抑えることができる。 As described above, when the control unit 101 of the information processing apparatus 100 determines that the processing load (time, battery consumption, communication amount, etc.) should be suppressed, the control unit 101 uses a moving image obtained by extracting a predetermined portion from the moving image for subject recognition. As a result, the time until the user acquires the subject recognition result can be shortened, and the battery consumption and the communication amount can be suppressed.

ステップＳ４１６では、制御部１０１は、通信部１１０を介して、サーバ装置１２０に対して、被写体認識要求を送信する。制御部１０１は、その被写体認識要求に、被写体認識対象の動画を含める。本ステップは、図３のステップＳ３１６の処理に相当する。 In step S416, the control unit 101 transmits a subject recognition request to the server device 120 via the communication unit 110. The control unit 101 includes the moving image to be subjected to subject recognition in the subject recognition request. This step corresponds to the process of step S316 in FIG. 3.

ステップＳ４１７では、制御部１０１は、通信部１１０を介して、サーバ装置１２０から、被写体認識応答を受信したか否かを判定する。制御部１０１は、受信したと判定した場合は、ステップＳ４１８に処理を進め、そうでないと判定した場合は、再びステップＳ４１７の処理を繰り返す。 In step S417, the control unit 101 determines whether or not the subject recognition response has been received from the server device 120 via the communication unit 110. If the control unit 101 determines that the signal has been received, the process proceeds to step S418, and if it is determined that the signal has not been received, the control unit 101 repeats the process of step S417 again.

ステップＳ４１８では、制御部１０１は、被写体認識対象の動画について、下記の情報を管理情報２００に追加記憶する。
・タグ２０４：ステップＳ４１７で受信した被写体認識結果
・認識方法２０５：「全体」
本ステップは、図３のステップＳ３１９の処理に相当する。 In step S418, the control unit 101 additionally stores the following information in the management information 200 for the moving image to be recognized as the subject.
- Tag 204: Subject recognition result received in step S417
- Recognition method 205: "Whole"
This step corresponds to the process of step S319 in FIG. 3.

このように、情報処理装置１００の制御部１０１は、処理負荷（時間、バッテリ消費、通信量など）を抑えなくてもよいと判定した場合に、動画全体を被写体認識に用いる。これにより、ユーザはより信頼度の高い被写体認識結果を得ることができる。 As described above, the control unit 101 of the information processing apparatus 100 uses the entire moving image for subject recognition when it is determined that the processing load (time, battery consumption, communication amount, etc.) does not have to be suppressed. As a result, the user can obtain a more reliable subject recognition result.

サーバ装置１２０の動作については、情報処理装置１００から被写体認識要求を受信すると、その被写体認識要求に含まれる動画に対して被写体認識を実行し、情報処理装置１００に被写体認識応答を送信する、という簡易な内容であるため、図示を省略する。 Regarding the operation of the server device 120, when a subject recognition request is received from the information processing device 100, the subject recognition is executed for the moving image included in the subject recognition request, and the subject recognition response is transmitted to the information processing device 100. Since the content is simple, the illustration is omitted.

以上説明したように、本実施形態の情報処理装置は、被写体認識に要する処理負荷を抑えるべきか否かを判定して、被写体認識に動画の一部分を抽出した画像を用いるか、動画全体を用いるかを切り替える。これにより、被写体認識に要する処理負荷を抑えるべき場合には処理負荷を抑えることを優先し、そうでない場合には信頼度の高い認識結果を得ることを優先できる。 As described above, the information processing apparatus of the present embodiment determines whether or not the processing load required for subject recognition should be suppressed, and uses an image obtained by extracting a part of the moving image for subject recognition, or uses the entire moving image. Switch between. As a result, when the processing load required for subject recognition should be suppressed, it is possible to give priority to suppressing the processing load, and when not, it is possible to give priority to obtaining a highly reliable recognition result.
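The switching described above can be sketched as follows. `VideoRecord`, `process`, and the `recognize` callable are hypothetical stand-ins for the management information 200, the loop of steps S412 to S418, and the request/response exchange with the server device 120. Treating a record whose recognition method is unset as "tag not attached", and replacing the tags from partial recognition when whole-video recognition is later performed, are assumptions rather than details stated in this description.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VideoRecord:
    """Hypothetical counterpart of one entry of the management information."""
    file_name: str
    clips: List[str]                           # stand-ins for extracted partial videos
    tags: List[str] = field(default_factory=list)
    recognition_method: Optional[str] = None   # None, "part", or "whole"


def process(
    records: List[VideoRecord],
    suppress_load: bool,
    recognize: Callable[[str], List[str]],
) -> None:
    """Tag each record using partial clips or the whole video."""
    for rec in records:
        if suppress_load:
            if rec.recognition_method is not None:
                continue  # already tagged; nothing to do on the low-load path
            results: List[str] = []
            for clip in rec.clips:             # corresponds to S413 and S414
                results.extend(recognize(clip))
            rec.tags = sorted(set(results))    # corresponds to S415
            rec.recognition_method = "part"
        else:
            if rec.recognition_method == "whole":
                continue  # whole-video result already stored
            rec.tags = sorted(set(recognize(rec.file_name)))  # S416 to S418
            rec.recognition_method = "whole"
```

With this structure, a video first tagged from partial clips is picked up again once the load no longer needs to be suppressed, which matches the behavior recited for the recognition method 205.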

（実施形態の変形例）
なお、上述の実施形態では、外部のサーバ装置から被写体認識結果を取得する構成を例に挙げたが、情報処理装置の内部に被写体認識機能を有する構成でもよい。すなわち、処理負荷（時間、バッテリ消費など）を抑えるべき場合には、部分抽出した動画を用いて被写体認識を実行し、そうでない場合は動画全体を用いて被写体認識を実行する構成でもよい。 (Modified example of the embodiment)
In the above-described embodiment, the configuration for acquiring the subject recognition result from the external server device is given as an example, but the configuration may have a subject recognition function inside the information processing apparatus. That is, if the processing load (time, battery consumption, etc.) should be suppressed, the subject recognition may be executed using the partially extracted moving image, and if not, the subject recognition may be executed using the entire moving image.

また、上述の実施形態では、処理負荷を抑える場合に部分抽出した動画を用いる構成を例に挙げたが、静止画（静止画像）を抽出する構成でもよい。 Further, in the above-described embodiment, the configuration using the partially extracted moving image is given as an example when the processing load is suppressed, but a configuration for extracting a still image (still image) may also be used.

また、上述の実施形態では、部分抽出した全ての動画について、同一のサーバ装置で被写体認識する構成を例に挙げたが、顔を検出した箇所の動画と、動きを検出した箇所の動画に対して、それぞれ別のサーバ装置で被写体認識を行う構成でもよい。例えば、サーバ装置によって被写体認識の得手不得手がある場合には、得意なサーバ装置を選択することにより、より信頼度の高い被写体認識結果を得ることができる。 Further, in the above-described embodiment, the configuration in which the subject is recognized by the same server device for all the partially extracted moving images is given as an example, but for the moving image of the part where the face is detected and the moving image of the part where the movement is detected. Therefore, the subject may be recognized by different server devices. For example, when the server device has strengths and weaknesses in subject recognition, it is possible to obtain a more reliable subject recognition result by selecting a server device that is good at it.
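A minimal sketch of this modification is shown below, assuming that each extracted clip is labeled with the kind of detection that produced it. The labels "face" and "motion", the function `recognize_by_kind`, and the callables standing in for the different server devices are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Recognizer = Callable[[str], List[str]]


def recognize_by_kind(
    clips: List[Tuple[str, str]],
    recognizers: Dict[str, Recognizer],
) -> List[str]:
    """Send each (kind, clip) pair to the recognizer registered for its kind.

    Tags are returned in first-seen order without duplicates.
    """
    tags: List[str] = []
    for kind, clip in clips:
        for tag in recognizers[kind](clip):
            if tag not in tags:
                tags.append(tag)
    return tags
```

A clip whose kind has no registered recognizer raises `KeyError` in this sketch; falling back to a default server device would be an equally plausible design.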

また、上述の実施形態では被写体認識のアルゴリズムについて、特に限定せずに説明したが、機械学習を用いて被写体を認識する機能を実装してもよい。このようにすることで、より精度よく被写体を認識することができる場合がある。機械学習の具体的なアルゴリズムとしては、最近傍法、ナイーブベイズ法、決定木、サポートベクターマシンなどが挙げられる。また、ニューラルネットワークを利用して、学習するための特徴量、結合重み付け係数を自ら生成する深層学習（ディープラーニング）も挙げられる。適宜、上記アルゴリズムのうち利用できるものを用いて本実施形態に適用することができる。 Further, in the above-described embodiment, the subject recognition algorithm has been described without particular limitation, but a function of recognizing a subject may be implemented by using machine learning. By doing so, it may be possible to recognize the subject with higher accuracy. Specific algorithms for machine learning include the nearest neighbor method, the naive Bayes method, the decision tree, and the support vector machine. In addition, deep learning (deep learning) in which features for learning and coupling weighting coefficients are generated by themselves using a neural network can also be mentioned. As appropriate, any of the above algorithms that can be used can be applied to this embodiment.

例えば、ディープラーニングのような学習モデルを用いて被写体を認識する機能を実装場合、例えば人物の写った画像を入力データとし、その人物を正しく認識した場合の認識結果を教師データとして学習を行うことにより学習済みモデルをあらかじめ作成しておく。この学習モデルをサーバ装置に搭載し、部分抽出した動画を入力として学習済みモデルにより推論処理を行い、認識結果を出力として得ることができる。 For example, when a function of recognizing a subject is implemented using a learning model such as deep learning, a trained model is created in advance by performing learning with, for example, an image of a person as input data and the recognition result obtained when that person is correctly recognized as teacher data. By mounting this model on the server device and performing inference processing with the trained model using a partially extracted moving image as input, the recognition result can be obtained as output.

また上述のとおり、顔を検出した箇所の動画を入力する学習済みモデルと、動きを検出した箇所の動画を入力する学習済みモデルとを個別に用意してもよい。この場合、動画のうち顔を検出した箇所の動画を入力データとし、顔を検出した箇所の動画から人物を正しく認識した場合の認識結果を教師データとして学習して、学習済みモデルをあらかじめ生成しておく。同様に、動きを検出した箇所の動画を入力データとし、動きを検出した箇所の動画から人物を正しく認識した場合の認識結果を教師データとして学習して、学習済みモデルをあらかじめ生成しておく。このようにすることで、より精度よく被写体を認識することができる。 Further, as described above, a trained model that takes as input the moving image of a portion where a face is detected and a trained model that takes as input the moving image of a portion where motion is detected may be prepared separately. In this case, a trained model is generated in advance by learning with the moving image of the portion where a face is detected as input data and the recognition result obtained when a person is correctly recognized from that moving image as teacher data. Similarly, a trained model is generated in advance by learning with the moving image of the portion where motion is detected as input data and the recognition result obtained when a person is correctly recognized from that moving image as teacher data. By doing so, the subject can be recognized more accurately.

なお、ディープラーニングのような学習モデルを実装する場合には、データをより多く並列処理する必要があるため、ＣＰＵだけでなくＧＰＵを用いてもよい。このようにすれば、より効率的な処理が可能である。具体的には、学習モデルを含む学習プログラムを実行する場合に、ＣＰＵとＧＰＵが協働して演算を行うことで学習を行う。なお、学習の処理はＧＰＵのみにより演算が行われてもよい。また、同様に推論の処理もＧＰＵを用いてもよい。 When implementing a learning model such as deep learning, it is necessary to process more data in parallel, so not only the CPU but also the GPU may be used. By doing so, more efficient processing is possible. Specifically, when a learning program including a learning model is executed, learning is performed by the CPU and the GPU collaborating to perform calculations. The learning process may be performed only by the GPU. Similarly, the GPU may be used for inference processing.

なお、上述の説明では被写体認識機能を機械学習された学習済みモデルを用いて処理を実行したが、ルックアップテーブル（ＬＵＴ）等のルールベースの処理を行ってもよい。その場合には、例えば、学習済みモデルの入力データと出力データとの関係をあらかじめＬＵＴとして作成する。そして、この作成したＬＵＴを装置のメモリに格納しておくとよい。被写体認識の処理を行う場合には、この格納されたＬＵＴを参照して、出力データを取得することができる。つまりＬＵＴは、前記学習済みモデルと同等の処理をするためのプログラムとして、ＣＰＵあるいはＧＰＵなどと協働で動作することにより、被写体認識の処理を行う。 In the above description, the subject recognition function is processed using a machine-learned trained model, but a rule-based process such as a look-up table (LUT) may be performed. In that case, for example, the relationship between the input data and the output data of the trained model is created in advance as a LUT. Then, it is advisable to store the created LUT in the memory of the device. When the subject recognition process is performed, the output data can be acquired by referring to the stored LUT. That is, the LUT performs subject recognition processing by operating in collaboration with a CPU, GPU, or the like as a program for performing the same processing as the trained model.

また、上述の実施形態では、動画全体に対して一様にタグを付与する構成を例に挙げたが、タイムスタンプに対応させてタグを付与する構成でもよい。 Further, in the above-described embodiment, the configuration in which the tag is uniformly attached to the entire moving image is given as an example, but the configuration in which the tag is attached corresponding to the time stamp may be used.

また、上述の実施形態では、動画撮影完了後に被写体認識を実行する構成を例に挙げたが、動画撮影中に被写体認識を実行する構成でもよい。すなわち、動画撮影中に顔や動きを検出した際に、該当フレームを含む数フレームの動画を生成し、被写体認識を実行する構成でもよい。 Further, in the above-described embodiment, the configuration in which the subject recognition is executed after the completion of the moving image shooting is given as an example, but the configuration in which the subject recognition is executed during the moving image shooting may be used. That is, when a face or movement is detected during moving image shooting, a moving image of several frames including the corresponding frame may be generated and subject recognition may be executed.

（他の実施形態）
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワークまたは記憶媒体を介してシステムまたは装置に供給し、そのシステムまたは装置のコンピュータがプログラムを読出し実行する処理でも実現可能である。コンピュータは、１または複数のプロセッサーまたは回路を有し、コンピュータ実行可能命令を読み出し実行するために、分離した複数のコンピュータまたは分離した複数のプロセッサーまたは回路のネットワークを含みうる。 (Other embodiments)
The present invention can also be realized by a process in which a program that realizes one or more functions of the above-described embodiment is supplied to a system or a device via a network or a storage medium, and a computer of the system or the device reads and executes the program. A computer may have one or more processors or circuits and may include a network of separate computers or separate processors or circuits for reading and executing computer-executable instructions.

プロセッサーまたは回路は、中央演算処理装置（ＣＰＵ）、マイクロプロセッシングユニット（ＭＰＵ）、グラフィクスプロセッシングユニット（ＧＰＵ）、特定用途向け集積回路（ＡＳＩＣ）、フィールドプログラマブルゲートウェイ（ＦＰＧＡ）を含みうる。また、プロセッサーまたは回路は、デジタルシグナルプロセッサ（ＤＳＰ）、データフロープロセッサ（ＤＦＰ）、またはニューラルプロセッシングユニット（ＮＰＵ）を含みうる。 The processor or circuit may include a central processing unit (CPU), a microprocessing unit (MPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The processor or circuit may also include a digital signal processor (DSP), a data flow processor (DFP), or a neural processing unit (NPU).

発明は上記実施形態に制限されるものではなく、発明の精神及び範囲から離脱することなく、様々な変更及び変形が可能である。従って、発明の範囲を公にするために請求項を添付する。 The invention is not limited to the above embodiment, and various changes and modifications can be made without departing from the spirit and scope of the invention. Therefore, the claims are appended to make the scope of the invention public.

１００：情報処理装置、１０１：制御部（抽出手段）、１０２：メモリ、１０３：不揮発性メモリ、１０４：操作部、１０５：表示部、１０６：記憶媒体、１０７：撮像部、１１０：通信部 100: Information processing device, 101: Control unit (extraction means), 102: Memory, 103: Non-volatile memory, 104: Operation unit, 105: Display unit, 106: Storage medium, 107: Imaging unit, 110: Communication unit

Claims

An information processing apparatus comprising:
an acquisition means for acquiring a first moving image;
an extraction means for extracting a partial image from the first moving image;
a recognition means for performing a recognition process for recognizing a subject included in the first moving image; and
a control means for controlling the recognition means to perform the recognition process using the partial image when the processing load required for the recognition process should be suppressed.

The information processing apparatus according to claim 1, wherein the control means controls the recognition means to perform the recognition process using the first moving image, except when the processing load required for the recognition process should be suppressed.

The information processing apparatus according to claim 1 or 2, wherein the partial image is a moving image or a still image.

The information processing apparatus according to any one of claims 1 to 3, wherein the processing load required for the recognition process using the partial image is smaller than the processing load required for the recognition process using the first moving image.

The information processing apparatus according to any one of claims 1 to 4, wherein the recognition means transmits an image used for the recognition process to an external server and receives the result of the recognition process from the server.

The information processing apparatus according to any one of claims 1 to 5, wherein the control means determines that the processing load required for the recognition process should be suppressed when an operation using the result of the recognition process is accepted.

The information processing apparatus according to any one of claims 1 to 6, wherein the control means determines that the processing load required for the recognition process should be suppressed when the remaining amount of the battery is equal to or less than a predetermined value.

The information processing apparatus according to any one of claims 1 to 7, wherein the control means determines that the processing load required for the recognition process should be suppressed when the cumulative communication amount of the information processing apparatus is equal to or more than a predetermined amount.

The information processing apparatus according to any one of claims 1 to 8, wherein the partial image is an image including a face or a portion where movement is detected in the first moving image.

The information processing apparatus according to any one of claims 1 to 9, further comprising a storage means for storing the result of the recognition process obtained by the recognition means in association with the first moving image.

The information processing apparatus according to any one of claims 1 to 10, wherein, when the control means determines, after causing the recognition means to perform the recognition process using the partial image, that the processing load required for the recognition process should no longer be suppressed, the control means controls the recognition means to perform the recognition process using the first moving image.

The information processing apparatus according to any one of claims 1 to 11, wherein the acquisition means is an image pickup means for photographing a subject.

An information processing method comprising:
an acquisition step of acquiring a first moving image;
an extraction step of extracting a partial image from the first moving image;
a recognition step of performing a recognition process for recognizing a subject included in the first moving image; and
a control step of controlling the recognition step so that the recognition process is performed using the partial image when the processing load required for the recognition process should be suppressed.

A program for making a computer function as each means of the information processing apparatus according to any one of claims 1 to 12.

A storage medium readable by a computer that stores a program for causing the computer to function as each means of the information processing apparatus according to any one of claims 1 to 12.