JP7433782B2 - Information processing device, information processing method, and program

Info

Publication number: JP7433782B2
Application number: JP2019110963A
Authority: JP
Inventors: 俊介佐藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-06-14
Filing date: 2019-06-14
Publication date: 2024-02-20
Anticipated expiration: 2039-06-14
Also published as: JP2020204812A

Description

The present invention relates to information processing technology for learning-based recognition.

In learning-based recognition devices, so-called domain adaptation is a known technique: the device additionally learns from data obtained during operation, which raises recognition accuracy for patterns specific to the environment in which it is installed.
In recent years, to make additional learning more effective, methods have been proposed for screening training data so that inappropriate samples are not mixed in. For example, Patent Document 1 proposes deciding whether to adopt input data as training data according to whether it matches the morpheme structure of the training data collected so far. Patent Document 2 proposes judging whether overfitting has occurred from the transition of the evaluation value of the additional learning data, and retraining if it has.

Patent Document 1: Japanese Patent Application Publication No. 2010-198189
Patent Document 2: Japanese Patent Application Publication No. 2010-257140

However, even with the screening of training data and the retraining after overfitting described in the above patent documents, it is difficult to reliably prevent inappropriate training data from being mixed in or overfitting from occurring. Particularly in long-term operation, it is unrealistic to expect that inappropriate additional learning will never reduce recognition accuracy. Moreover, the effect of such a reduction does not necessarily become apparent immediately; it is noticed only once a misrecognition or a missed recognition becomes a problem. Once the effect does appear, it is not easy to analyze which additional learning caused the reduction. It is therefore desirable to be able to appropriately eliminate the influence of inappropriate additional learning that reduces recognition accuracy.

Accordingly, an object of the present invention is to make it possible to appropriately eliminate the influence of additional learning that reduces recognition accuracy, and thereby to suppress the reduction in recognition accuracy.

The information processing device of the present invention comprises: recognition means for recognizing features from an image based on a model; learning means for performing additional learning of the model; management means for managing information on the history of additional learning of the model; evaluation means for evaluating the accuracy of the recognition by the model; and selection means for selecting a model, based on the accuracy of the recognition, from among the models whose history information is managed. The evaluation means holds an evaluation set that records a set of information on recognition targets. The selection means selects an additionally learned model based on the result of evaluating accuracy with a subset of the recognition-target information extracted from the evaluation set. The recognition means uses the model selected by the selection means for the recognition.

According to the present invention, the influence of additional learning that reduces recognition accuracy can be appropriately eliminated, and the reduction in recognition accuracy can be suppressed.

A diagram illustrating a configuration example of the information processing device in the first embodiment.
A diagram showing an example of the display screen of the display unit.
A flowchart of information processing in the first embodiment.
A diagram showing an example of data managed by the management unit of the first embodiment.
A flowchart of the evaluation set creation process performed by the evaluation unit.
A schematic diagram showing an example of an evaluation set.
A flowchart of the search process performed by the selection unit of the first embodiment.
A diagram illustrating an example of the transition of performance values created by the selection unit.
A diagram showing an example of a selection screen displayed by the display unit.
A flowchart of information processing in the second embodiment.
A flowchart of the performance value calculation process in the evaluation unit.
A diagram illustrating an example of data managed by the management unit of the third embodiment.
A flowchart of the score calculation process performed by the selection unit of the third embodiment.
A diagram showing an example of data managed by the management unit of the fourth embodiment.
A flowchart of the score calculation process performed by the selection unit of the fourth embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that the configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.
<First embodiment>
FIG. 1 is a diagram showing a schematic configuration example of an information processing device according to the first embodiment.
The information processing device of this embodiment includes an imaging unit 101, a feature calculation unit 102, a recognition unit 103, a learning unit 104, a storage unit 105, a management unit 106, an evaluation unit 107, a selection unit 108, a display unit 109, and an operation unit 110.

The imaging unit 101 is a camera device that includes a lens, an image sensor, a lens drive motor, and an MPU or the like that controls the image sensor and the lens drive motor. The imaging unit 101 captures, for example, a video and outputs the video data.
The feature calculation unit 102 is configured by an MPU or the like. The feature calculation unit 102 calculates feature values from the images contained in the video captured by the imaging unit 101.

The recognition unit 103 is configured by an MPU or the like. The recognition unit 103 recognizes target objects that appear in the images of the video, based on the feature values calculated by the feature calculation unit 102. As described in detail later, for each feature value calculated by the feature calculation unit 102, the recognition unit 103 calculates a recognition score used for recognizing the target object in the image, based on the statistical model selected by the selection unit 108 (described later). The recognition unit 103 then recognizes, among the target objects recognized in the captured video, those that have moved abnormally. In this embodiment, the recognition unit 103 recognizes persons as target objects and further recognizes, among those persons, any person who has behaved abnormally.

The learning unit 104 is configured by an MPU or the like. The learning unit 104 adds the feature values that the feature calculation unit 102 calculated from the training data to the model used by the recognition unit 103 and retrains it, thereby creating a model that has additionally learned the features of the training data.

The storage unit 105 is configured by a recording medium such as a hard disk, an MPU, and the like. The storage unit 105 stores the video data captured by the imaging unit 101, the feature values calculated by the feature calculation unit 102, information on the results of recognition by the recognition unit 103, the models created by the learning unit 104, information expressing the relationships among these, and metadata such as creation times. Note that, instead of a recording medium, the storage unit 105 may use network storage such as NAS (Network Attached Storage), a SAN (Storage Area Network), or a cloud service.

The management unit 106 is configured by an MPU or the like. The management unit 106 records together each model created by the learning unit 104, the date and time the model was created, the training data used for learning, and the model's usage status in the recognition unit 103, and thereby manages the history of learned models. Details of the information managed by the management unit 106 will be described later.
The evaluation unit 107 is configured by an MPU or the like. The evaluation unit 107 evaluates the performance of a specified model. Details of the performance evaluation will be described later.
The selection unit 108 is configured by an MPU or the like. From among the models managed by the management unit 106, the selection unit 108 selects the model to be used by the recognition unit 103, based on the evaluation results of the evaluation unit 107 and the like. Details of model selection will be described later.
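The history kept by the management unit can be pictured as a list of records, one per additionally learned model. The following is a minimal sketch under assumptions: the class names `ModelRecord` and `ModelHistory`, the field names, and the use of string identifiers are all hypothetical, since the actual data layout is only described later in the specification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ModelRecord:
    """One history entry (hypothetical layout, not the patent's format)."""
    model_id: str
    created_at: datetime            # date and time the model was created
    training_data_ids: List[str]    # training data used for this update
    in_use: bool = False            # usage status in the recognition unit


@dataclass
class ModelHistory:
    records: List[ModelRecord] = field(default_factory=list)

    def register(self, record: ModelRecord) -> None:
        """Add a newly learned model and keep the history in time order."""
        self.records.append(record)
        self.records.sort(key=lambda r: r.created_at)

    def select(self, model_id: str) -> ModelRecord:
        """Mark exactly one managed model as the one used for recognition."""
        chosen = next((r for r in self.records if r.model_id == model_id), None)
        if chosen is None:
            raise KeyError(model_id)
        for r in self.records:
            r.in_use = r is chosen
        return chosen

    def current(self) -> Optional[ModelRecord]:
        """Return the model currently in use, if any."""
        return next((r for r in self.records if r.in_use), None)
```

Keeping the usage flag inside the history, rather than in the recognition unit, mirrors the text's statement that the usage status is recorded together with each model.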

The display unit 109 is configured by a liquid crystal screen and an MPU or the like that controls it. The display unit 109 presents various information to the user. The display unit 109 also creates and displays the user interface (UI) screens that the user uses to input instructions. Details of the UI screens will be described later.
The operation unit 110 includes switches, a touch panel, and the like. The operation unit 110 senses the user's operations and inputs them to the information processing device. Note that the operation unit 110 may have another pointing device, such as a mouse or a trackball, instead of the touch panel.

Note that although this embodiment is described using, as an example, a device that recognizes abnormal behavior of a person as the target object in captured video, the recognition target is not limited to this. The recognition target may be, for example, a specific person, a specific vehicle, or a specific vehicle model, or it may be an event, a time period, or the like. Besides video, the data to be recognized may also be audio, documents, or a combination of these.

FIGS. 2(A) to 2(D) are diagrams used to explain an example of the operation of the display unit 109 and of the user's operations in this embodiment. Video 201 in FIG. 2(A), video 204 in FIG. 2(B), video 207 in FIG. 2(C), and video 211 in FIG. 2(D) each show an example of how the video captured by the imaging unit 101 is displayed on the screen of the display unit 109.

Video 201 in FIG. 2(A) shows an example of the video during normal operation. The imaging unit 101 is installed as a surveillance camera at the location to be monitored. The display unit 109 displays the video being captured by the imaging unit 101 as live video. The user watches the live video on the display unit 109 to check whether any abnormality has occurred at the monitored location. When a person appears at the monitored location, the recognition unit 103 recognizes that person and notifies the display unit 109. The display unit 109 superimposes a predetermined object (for example, a rectangle enclosing the person) on the image of that person in the live video. Video 201 in FIG. 2(A) shows an example in which rectangular objects 202 and 203 are superimposed on the persons recognized in the video of the monitored location.

The display unit 109 also displays a time bar 209 on the screen of video 201. The user can operate the time bar 209 via the operation unit 110. The time bar 209 is associated with the capture times of the video stored in the storage unit 105. For example, a user who wants to see video that was displayed in the past can request it by operating the time bar 209 via the operation unit 110. When the display unit 109 is notified by the operation unit 110 that the time bar 209 has been operated, it reads from the storage unit 105 the video data for the capture time corresponding to the operated position of the time bar 209 and displays the video for that capture time on the screen.

In addition to recognizing persons who appear at the monitored location as described above, the recognition unit 103 also recognizes abnormal behavior by those persons. Video 204 in FIG. 2(B) shows, as an example of abnormal behavior, a case in which a person 205 has fallen. When the person 205 falls, the recognition unit 103 recognizes that an abnormality differing from normal walking has occurred and notifies the display unit 109. On receiving this notification, the display unit 109 displays, near the person 205, an alarm object 206 indicating that an abnormality has occurred in the person's behavior. This lets the user notice that an abnormality has occurred, check its state, and take the necessary measures (for example, helping the person who has fallen). Note that although the alarm object 206 is displayed on the screen in the example of FIG. 2(B), the alarm is not limited to an on-screen display; it may be, for example, an alarm sound, or an email or short message sent to the terminal of the user or a security guard.

Although the recognition unit 103 can recognize a person's abnormal behavior as described above, it may, for example, misrecognize a person who is walking normally as having behaved abnormally. Video 207 in FIG. 2(C) shows an example in which an alarm object 208 is displayed on the screen because the recognition unit 103 misrecognized abnormal behavior even though the person was walking normally. In this case the user judges, by looking at the video on the screen, whether the alarm is erroneous (that is, a false alarm).

Suppose that the user operates the time bar 209 to select a past capture time and then selects the location where the alarm object 208 is displayed. The selection of the alarm object 208 is, as one example, an operation such as tapping its position. When these selections are made, the recognition unit 103 recognizes that the user has judged the alarm represented by the alarm object 208 to be a false alarm, and notifies the display unit 109. The display unit 109 then displays a dialog box 210 for canceling the alarm. The dialog box 210 contains, for example, the message "Do you want to cancel?" and "OK" and "Cancel" buttons. The user taps the "OK" button to cancel the alarm, and the "Cancel" button to leave the alarm in place. When the display unit 109 receives information from the operation unit 110 indicating that the user has selected, for example, the "OK" button, it determines that the user has instructed cancellation of the alarm and removes the alarm object 208 from the screen.
The management unit 106 then causes the storage unit 105 to store information about the canceled false alarm. This information includes the feature values of the image region of the person whose behavior the recognition unit 103 recognized as abnormal, the capture time, and the past capture time that the user selected by operating the time bar 209. As described in detail later, the false alarm information stored in the storage unit 105 is used when the evaluation unit 107 evaluates the performance of a model.

While the monitoring described above is in operation, the learning unit 104 of the information processing device performs additional learning on the recognition results of the recognition unit 103 as background processing. For each person who appears in the video, the learning unit 104 accumulates in the storage unit 105, as information about that person, the recognition result produced by the recognition unit 103, the capture time of the person, the image region of the person, and the feature values obtained by the feature calculation unit 102. The learning unit 104 then performs additional learning, for example once a day, using the information accumulated in the storage unit 105 as training data. The recognition unit 103 uses the resulting model for recognizing persons and abnormal behavior at the monitored location. This is intended to improve the recognition accuracy of the recognition unit 103. As long as the recognition accuracy of the recognition unit 103 is maintained, the user does not need to be aware of the additional learning, and no information about it is displayed on the display unit 109.

However, even when additional learning is performed as described above, recognition accuracy may fail to improve or may even decline. One conceivable cause is that a person's normal behavior is misrecognized as abnormal, producing a false alarm that the user overlooks, and the recognition result behind that false alarm is later used as training data for additional learning. Another is that an unusual event is held at the monitored location and a large amount of training data reflecting people's behavior during the event, that is, behavior different from usual, is additionally learned. If a low-quality model resulting from such learning is used for recognition, the recognition unit 103 may, for example, fail to recognize many occurrences of abnormal behavior and overlook them. Once recognition accuracy has declined in this way, monitoring with the information processing device becomes difficult to operate. It is therefore desirable to be able to appropriately remove inappropriate additional learning that reduces recognition accuracy, and thereby suppress the decline.
In this embodiment, the model is returned to its state before the decline in recognition accuracy occurred, that is, to the high-accuracy state that existed before the inappropriate additional learning was performed.

For this purpose, in the information processing device of this embodiment, the evaluation unit 107 evaluates the performance of models produced by additional learning. If, as a result of evaluating a model, the evaluation unit 107 detects that the performance of the recognition function declines when that model is used, it notifies the display unit 109. On receiving this notification, the display unit 109 displays information indicating that the performance of the recognition function has declined. This lets the user know that the decline has occurred.

Video 211 in FIG. 2(D) shows an example of what the display unit 109 displays when a decline in the performance of the recognition function is detected. The display unit 109 displays the dialog box 212 shown in FIG. 2(D) within video 211. The dialog box 212 contains a message for the user and "OK", "Cancel", and "Check details" buttons. The message consists of a statement that the recognition function has declined and a question asking whether to return to the state of the model before the decline. Note that the model before the decline is preferably, for example, the model that was saved in the storage unit 105 before the decline occurred and at the date and time closest to the present.

In this embodiment, the models that the learning unit 104 creates by learning and saves in the storage unit 105 are managed by the management unit 106. The model that the recognition unit 103 uses for recognition is selected by the selection unit 108 from among the models managed by the management unit 106. When the evaluation unit 107 detects a decline in the performance of the recognition function, the selection unit 108 selects, from among the models managed by the management unit 106, a model from before the decline as the rollback candidate, that is, the candidate to which the model state may be returned. The selection unit 108 then notifies the display unit 109 of the rollback candidate model. Based on the date and time at which the rollback candidate model was saved in the storage unit 105, the display unit 109 generates and displays a message, as shown in FIG. 2(D), asking the user whether to return to the state of the model before the decline. With the dialog box 212 displayed, the user can learn both that recognition accuracy has declined because the model has deteriorated and which point in time the rollback candidate model, from before the decline, corresponds to.

When the selection unit 108 is notified by the operation unit 110 that the user has, for example, tapped the "OK" button in the dialog box 212, it selects the rollback candidate model as the model to be used by the recognition unit 103. The recognition unit 103 then performs recognition using the rollback candidate model, that is, the model from before recognition accuracy declined. When the selection unit 108 is instead notified that the user has tapped the "Cancel" button, it selects the current model as the model to be used by the recognition unit 103, and the recognition unit 103 continues to perform recognition using the current model. When the user taps the "Check details" button, the operation unit 110 notifies the display unit 109, which reads detailed information on the rollback candidate model from the storage unit 105 and displays it. The detailed information is what the user needs in order to decide whether to select the rollback candidate model. It may include, for example, the date and time at which the rollback candidate model was trained, the results of recognition that the recognition unit 103 performed using this model, and the images of persons used in training this model together with the feature values of their image regions.
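The rollback flow described above can be sketched as two small functions. This is only an illustration under assumptions: the names `pick_rollback_candidate` and `handle_dialog`, the `(model_id, saved_at)` tuple layout, and the idea that the time of the decline is already known as `degraded_at` are all hypothetical; how the decline is located is part of the search process described later in the specification.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (model_id, saved_at): a hypothetical stand-in for the managed history.
SavedModel = Tuple[str, datetime]


def pick_rollback_candidate(models: List[SavedModel],
                            degraded_at: datetime) -> Optional[SavedModel]:
    """Return the most recently saved model from strictly before the decline."""
    earlier = [m for m in models if m[1] < degraded_at]
    if not earlier:
        return None  # no model predates the decline
    return max(earlier, key=lambda m: m[1])


def handle_dialog(choice: str, current_id: str, candidate_id: str) -> str:
    """Map the dialog button to the model the recognition unit should use."""
    if choice == "ok":        # user accepts the rollback
        return candidate_id
    if choice == "cancel":    # user keeps the current model
        return current_id
    raise ValueError(f"unsupported choice: {choice!r}")
```

The "Check details" button is left out of `handle_dialog` because, in the text, it only displays information and does not itself choose a model.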

FIG. 3 is a flowchart showing the flow of information processing in the information processing device of this embodiment as described above. The information processing device executes the processing of the flowchart in FIG. 3 each time one frame of video is input from the imaging unit 101. Note that the frames processed by the flowchart in FIG. 3 are not limited to frames of video from the imaging unit 101. For example, they may be frames of an individual video file acquired earlier, or frames of video retrieved, by specifying a date and time range, from video files accumulated in a database, a video management system (VMS), or the like. They may also be, for example, frames of video data that were recorded sequentially in the storage unit 105 from a specified camera video stream.

まずステップＳ３０１において、特徴量算出部１０２は、撮影部１０１が現在撮影している動画のフレームから画像の特徴量を抽出する。特徴量算出部１０２は、フレームの画像から人物の領域を公知の人体検出の手法を用いて検出し、その人体の画像領域ごとにＨＯＧ（ＨｉｓｔｏｇｒａｍＯｆＧｒａｄｉｅｎｔ）を計算して実数値ベクトルを得ることによって特徴量を取得する。なお、特徴量の抽出方法はこれに限るものではない。 First, in step S301, the feature amount calculation unit 102 extracts image feature amounts from the frame of the moving image currently being captured by the imaging unit 101. The feature amount calculation unit 102 detects person regions in the frame image using a known human body detection method, and obtains a feature amount for each human body image region by computing its HOG (Histogram of Oriented Gradients) as a real-valued vector. Note that the feature extraction method is not limited to this.

次にステップＳ３０２において、認識部１０３は、ステップＳ３０１で人物ごとに取得された特徴量のそれぞれについて、選択部１０８によって選択されているモデルを用いた認識処理を行って認識スコアを取得する。本実施形態の場合、モデルは混合ガウス分布モデルを用いる。認識部１０３は、混合ガウス分布モデルにおける分布関数に特徴量の値を代入して得られる値を認識スコアとして取得する。例えば、認識部１０３は、モデルＭを用いた際の特徴量ｘの認識スコアＳ（ｘ；Ｍ）を以下の式（１）により算出する。 Next, in step S302, the recognition unit 103 performs recognition processing using the model selected by the selection unit 108 for each of the feature amounts acquired for each person in step S301, and acquires a recognition score. In the case of this embodiment, a Gaussian mixture distribution model is used as the model. The recognition unit 103 obtains, as a recognition score, a value obtained by substituting the value of the feature amount into the distribution function in the Gaussian mixture distribution model. For example, the recognition unit 103 calculates the recognition score S(x;M) of the feature x using the model M using the following equation (1).

$$S(x; M) = \sum_{i=1}^{K} \omega_i \, N(x; \mu_i, \sigma_i) \qquad (1)$$

式（１）中のＫは混合数、Ｎ（ｘ；μ，σ）は平均μおよび分散共分散行列σの多変量ガウス分布（正規分布）でのｘの値、ω_iはモデルＭのｉ番目の分布の重み、μ_iはモデルＭのｉ番目の分布の平均、σ_iはモデルＭのｉ番目の分布の分散共分散行列である。重みは正の実数値を取り、ω１からωＫの総計は１である。認識スコアの値域は［０，１］の範囲の実数値であり、１に近いほど正常を表し、０に近いほど異常を表す。なおこの例に限らず、モデルは、例えばニューラルネットワークモデルや最近傍モデルでもよい。最近傍モデルについては後述する第３の実施形態において説明する。 In equation (1), K is the number of mixture components, N(x; μ, σ) is the value at x of the multivariate Gaussian (normal) distribution with mean μ and variance-covariance matrix σ, ω_i is the weight of the i-th distribution of model M, μ_i is the mean of the i-th distribution of model M, and σ_i is the variance-covariance matrix of the i-th distribution of model M. The weights are positive real numbers, and ω_1 through ω_K sum to 1. The recognition score is a real value in the range [0, 1]; the closer it is to 1, the more normal the behavior, and the closer it is to 0, the more abnormal. The model is not limited to this example and may be, for example, a neural network model or a nearest neighbor model. The nearest neighbor model is described in the third embodiment below.
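The recognition score of step S302 can be sketched in Python as a weighted sum of Gaussian densities, S(x; M) = Σ ω_i N(x; μ_i, σ_i). This is only an illustration: the function names are hypothetical, and the sketch assumes diagonal covariance matrices (one variance per dimension) to stay within the standard library, whereas the description allows full variance-covariance matrices.

```python
import math
from typing import Sequence


def gaussian_density(x: Sequence[float], mean: Sequence[float],
                     var: Sequence[float]) -> float:
    """Density at x of a Gaussian with diagonal covariance (an assumption)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        # Sum the per-dimension log densities, then exponentiate once.
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def recognition_score(x: Sequence[float],
                      weights: Sequence[float],
                      means: Sequence[Sequence[float]],
                      variances: Sequence[Sequence[float]]) -> float:
    """S(x; M): mixture of K Gaussians evaluated at the feature amount x."""
    return sum(
        w * gaussian_density(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )
```

A low return value corresponds to a feature amount that the model considers unlikely, which step S303 treats as abnormal once it falls below the threshold.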

次にステップＳ３０３において、認識部１０３は、ステップＳ３０２で算出した認識スコアに基づいた処理を行う。認識部１０３は、認識スコアが所定の閾値よりも低い場合に人物の行動が異常行動であると判定する。そして、表示部１０９は、図２（Ｂ）の映像２０４に示したような警報オブジェクト２０６を表示する。
次に、ステップＳ３０４において、記憶部１０５は、ステップＳ３０１で抽出された特徴量の情報を保存する。 Next, in step S303, the recognition unit 103 performs processing based on the recognition score calculated in step S302. The recognition unit 103 determines that the person's behavior is abnormal behavior when the recognition score is lower than a predetermined threshold. The display unit 109 then displays an alarm object 206 as shown in the video 204 of FIG. 2(B).
Next, in step S304, the storage unit 105 stores information on the feature amount extracted in step S301.

次にステップＳ３０５において、学習部１０４は、追加学習を実行する条件が満たされたかどうかを判定する。学習部１０４は、条件が満たされていると判定した場合にはステップＳ３０６に処理を進める。一方、ステップＳ３０５において、条件が満たされていないと判定した場合、情報処理装置は、図３のフローチャートの処理を終了する。本実施形態において、追加学習を実行する条件は、図３のフローチャートの処理が例えば当日の０時を過ぎて初めて実行されたかどうかである。すなわち追加学習は、当日の０時を過ぎて初めて図３のフローチャートの処理が実行されたという条件が満たされた場合に限って実行される。これにより、追加学習は、毎日１回ずつ、その日に得られた特徴量を用いて行われることになる。 Next, in step S305, the learning unit 104 determines whether the condition for executing additional learning is satisfied. If it determines that the condition is satisfied, the processing proceeds to step S306. If it determines in step S305 that the condition is not satisfied, the information processing apparatus ends the processing of the flowchart of FIG. 3. In this embodiment, the condition for executing additional learning is, for example, that the processing of the flowchart of FIG. 3 is being executed for the first time after 0:00 on the current day. That is, additional learning is executed only when this condition is satisfied. As a result, additional learning is performed once a day using the feature amounts obtained on that day.

追加学習を実行する条件は前述した条件に限らない。例えば、追加学習の実行条件としての期間は、１日ごとだけでなく、半日ごとであってもよいし、１週間ごとであってもよい。また追加学習の実行条件は、ステップＳ３０４で保存された特徴量が所定個数以上蓄積されたタイミングであってもよい。所定個数が例えば１０００個である場合、学習部１０４は、ステップＳ３０４で保存された特徴量が１０００個以上蓄積されたタイミングごとに追加学習を実行する。また追加学習の実行条件は、例えば特徴量が得られた時点であってもよい。この場合、学習部１０４は、特徴量が得られるたびに常に追加学習を実行する。また追加学習の実行条件は、利用者が指示したタイミングであってもよい。さらに追加学習の実行条件は、前述した各条件の二つ以上を組み合わせた条件であってもよい。 The conditions for performing additional learning are not limited to the conditions described above. For example, the period as a condition for performing additional learning may be not only every day, but also every half day, or every week. Further, the execution condition for additional learning may be the timing at which a predetermined number or more of the feature amounts saved in step S304 are accumulated. If the predetermined number is, for example, 1000, the learning unit 104 performs additional learning at each timing when 1000 or more feature quantities saved in step S304 are accumulated. Further, the execution condition for additional learning may be, for example, the time point when the feature amount is obtained. In this case, the learning unit 104 always performs additional learning every time a feature amount is obtained. Further, the execution condition for additional learning may be the timing instructed by the user. Furthermore, the execution condition for additional learning may be a combination of two or more of the above-mentioned conditions.
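The execution conditions for additional learning described above might be checked as in the following Python sketch. The class and function names are hypothetical. The sketch also assumes that the very first frame after startup does not count as "the first execution after 0:00", which the description does not specify.

```python
from datetime import date, datetime
from typing import Optional


class DailyTrigger:
    """Fires on the first frame processed after 0:00 of each new day."""

    def __init__(self) -> None:
        self._last_seen: Optional[date] = None

    def check(self, now: datetime) -> bool:
        today = now.date()
        # Assumption: no trigger on the very first call after startup.
        fired = self._last_seen is not None and today != self._last_seen
        self._last_seen = today
        return fired


def count_trigger(num_stored: int, threshold: int = 1000) -> bool:
    """Fires once at least `threshold` feature amounts have accumulated."""
    return num_stored >= threshold
```

Combining conditions, as the description permits, then amounts to joining the individual checks with `and` or `or`.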

次にステップＳ３０６において、学習部１０４は、選択部１０８によって選択されているモデルについて、記憶部１０５に記憶された特徴量を用いて、ＥＭアルゴリズムによる追加学習を行う。
次にステップＳ３０７において、評価部１０７は、認識部１０３が認識する認識対象の集合を、性能評価に用いる評価セットとして作成して保持する。評価セットは、特徴量の集合、特徴量ごとの正解ラベル、および、その特徴量を重視する度合いである重みの情報を含む。評価セットの作成方法は後述する。
次にステップＳ３０８において、評価部１０７は、ステップＳ３０７において作成した評価セットに基づき、選択部１０８によって選択されているモデルが劣化しているかどうかを判定する。
このときの評価部１０７は、ステップＳ３０７で作成した評価セットに含まれる特徴量を認識部１０３で認識した時の認識スコアを、追加学習の前後のモデルでそれぞれ算出して、重み付き平均を取る。モデルＭの性能値Ｓ（Ｍ）は、以下の式（２）で算出する。 Next, in step S306, the learning unit 104 performs additional learning using the EM algorithm for the model selected by the selection unit 108, using the feature quantities stored in the storage unit 105.
Next, in step S307, the evaluation unit 107 creates and holds a set of recognition targets recognized by the recognition unit 103 as an evaluation set used for performance evaluation. The evaluation set includes a set of feature quantities, a correct label for each feature quantity, and information on weight, which is the degree to which the feature quantity is emphasized. The method for creating the evaluation set will be described later.
Next, in step S308, the evaluation unit 107 determines whether the model selected by the selection unit 108 is degraded based on the evaluation set created in step S307.
At this time, the evaluation unit 107 has the recognition unit 103 compute recognition scores for the feature amounts in the evaluation set created in step S307, using the models before and after the additional learning respectively, and takes the weighted average for each model. The performance value S(M) of the model M is calculated using the following equation (2).

$$S(M) = \frac{1}{\#T} \sum_{x \in T} w_x \, p_x \, S(x; M) \qquad (2)$$

式（２）中のＴは評価セットの特徴量集合であり、＃ＴはＴの要素数である。ｗ_xは特徴量ｘの重みである。ｐ_xはｘの正解ラベルが正常ならば＋１、異常ならば－１である。すなわちモデルの性能値Ｓ（Ｍ）は、人物の行動が正常で高く、異常で低いスコアを算出するほど大きくなる。例えば、追加学習後のモデルの性能値が、追加学習前のモデルの性能値の０．９８倍未満であった場合、評価部１０７は、モデルが劣化したと判定し、そうでない場合は劣化していないと判定する。なお、本実施形態の場合、追加学習におけるぶれを吸収するために２％程度性能値が下がっても劣化としないこととしているが、これは１．００倍など他の数値でもよい。また、モデルの性能値は、追加学習の進行度合いに応じて変えてもよく、追加学習の履歴が少ない場合はより性能の向上が見込めるため１．０２倍とし、履歴が十分に蓄積した後は０．９８倍にするなどしてもよい。また、モデルの劣化判定は、これらの例に限らず、モデルの性能値の差によって行ってもよい。また、比較対象は、追加学習前のモデルの性能値ではなく、モデルの履歴で最大の性能値を示すモデルなどにしてもよい。 In equation (2), T is the set of feature amounts in the evaluation set, and #T is the number of elements of T. w_x is the weight of the feature amount x. p_x is +1 if the correct label of x is normal and -1 if it is abnormal. That is, the performance value S(M) of the model is larger the more the model gives high scores to normal behavior and low scores to abnormal behavior. For example, if the performance value of the model after additional learning is less than 0.98 times that of the model before additional learning, the evaluation unit 107 determines that the model has deteriorated; otherwise it determines that the model has not deteriorated. In this embodiment, a drop of about 2% in the performance value is not treated as deterioration in order to absorb fluctuations in additional learning, but another factor such as 1.00 may be used. The factor may also be changed according to the progress of additional learning: for example, 1.02 while the additional learning history is short, since further performance improvement can be expected, and 0.98 once enough history has accumulated. The deterioration determination is not limited to these examples and may instead be based on the difference between the performance values of the models.
Further, the comparison target may be a model showing the maximum performance value in the model history, instead of the performance value of the model before additional learning.
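As a rough Python illustration of the deterioration check in step S308, the sketch below assumes that the performance value is the weighted mean (1/#T) Σ w_x p_x S(x; M), which is what the definitions of T, #T, w_x, and p_x suggest. The type alias and function names are hypothetical, and `score` stands in for the recognition unit evaluated with one particular model.

```python
from typing import Callable, Sequence, Tuple

# (feature amount x, weight w_x, label p_x); p_x is +1 for normal, -1 for abnormal.
EvalItem = Tuple[Sequence[float], float, int]


def performance_value(eval_set: Sequence[EvalItem],
                      score: Callable[[Sequence[float]], float]) -> float:
    """Assumed form of S(M): mean over T of w_x * p_x * S(x; M)."""
    return sum(w * p * score(x) for x, w, p in eval_set) / len(eval_set)


def is_degraded(before: float, after: float, ratio: float = 0.98) -> bool:
    """True if the new performance value is below `ratio` times the old one."""
    return after < ratio * before
```

Note that a ratio test like this only behaves as intended while the reference performance value is positive; the difference-based comparison mentioned in the description avoids that restriction.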

次にステップＳ３２０において、評価部１０７は、ステップＳ３０８での判定結果が劣化かそうでないかを判定し、劣化していると判定していた場合はステップＳ３０９に進み、そうでなければステップＳ３１３に進む。
ステップＳ３２０において劣化していなかったと判定された場合、ステップＳ３１３において、管理部１０６は、ステップＳ３０６で追加学習されたモデルをモデル履歴に追加する。そして次のステップＳ３１４において、選択部１０８は、ステップＳ３１３において追加されたモデルを現在のモデルとして選択する。このステップ３１４の後、情報処理装置は、図３のフローチャートの処理を終了する。 Next, in step S320, the evaluation unit 107 checks the determination result of step S308. If the model was determined to have deteriorated, the processing proceeds to step S309; otherwise, it proceeds to step S313.
If it is determined in step S320 that the model has not deteriorated, in step S313 the management unit 106 adds the model additionally learned in step S306 to the model history. Then, in step S314, the selection unit 108 selects the model added in step S313 as the current model. After step S314, the information processing apparatus ends the processing of the flowchart of FIG. 3.

図４は、管理部１０６が管理するモデル履歴データの模式的に示した図である。
図４に示すように、モデル履歴データは、モデル情報レコードの列からなる。モデル情報レコードは、モデルＩＤ、モデルデータ、元モデルＩＤ、教師データ、および追加学習日時を含む。
モデルＩＤはモデル情報レコードを識別する通し番号である。
モデルデータは、図４では図示を省略しているが、認識部１０３が用いるモデルの本体であり、例えばバイナリやＸＭＬといった形式で格納される。
元モデルＩＤは、学習部１０４が追加学習を行う際に元となったモデルのモデルＩＤである。モデルＩＤは、出荷時に格納されたモデル、初期設定において作成されたモデルなど、元になったモデルがない場合には「なし」となる。
教師データは、学習部１０４が追加学習を行う際に用いられた教師データの情報である。教師データは、記憶部１０５に格納された特徴量であり、モデル情報レコードではファイル名やｉｎｏｄｅ番号などの参照情報の集合を持つ。元になったモデルがない場合、教師データは「なし」となる。
追加学習日時は、学習部１０４が追加学習を行った日時である。 FIG. 4 is a diagram schematically showing model history data managed by the management unit 106.
As shown in FIG. 4, the model history data consists of a sequence of model information records. Each model information record includes a model ID, model data, an original model ID, teacher data, and an additional learning date and time.
The model ID is a serial number that identifies the model information record.
Although not shown in FIG. 4, the model data is the main body of the model used by the recognition unit 103, and is stored in a format such as binary or XML.
The original model ID is the model ID of the model from which the learning unit 104 started when performing additional learning. The original model ID is "none" when there is no original model, as with a model stored at the time of shipment or a model created during initial setup.
The teacher data is information on the teacher data used when the learning unit 104 performed additional learning. The teacher data consists of feature amounts stored in the storage unit 105, and the model information record holds a set of references to them, such as file names or inode numbers. If there is no original model, the teacher data is "none".
The additional learning date and time is the date and time when the learning unit 104 performed additional learning.
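The model information record of FIG. 4 maps naturally onto a small data structure. The Python sketch below is one possible layout, not the layout used by the apparatus: the class names, field names, and the `lineage` helper are hypothetical, and `None` is used to stand for "none".

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ModelInfoRecord:
    model_id: int                      # serial number of the record
    model_data: bytes                  # serialized model body (e.g. binary or XML)
    original_model_id: Optional[int]   # None when there is no original model
    teacher_data: List[str] = field(default_factory=list)  # references to stored features
    learned_at: Optional[datetime] = None                  # additional learning date and time


class ModelHistory:
    def __init__(self) -> None:
        self.records: List[ModelInfoRecord] = []

    def add(self, model_data: bytes, original_model_id: Optional[int],
            teacher_data: List[str], learned_at: Optional[datetime]) -> ModelInfoRecord:
        """Append a record, assigning the next serial number as its model ID."""
        record = ModelInfoRecord(len(self.records) + 1, model_data,
                                 original_model_id, teacher_data, learned_at)
        self.records.append(record)
        return record

    def lineage(self, model_id: int) -> List[int]:
        """Follow original model IDs back to a model that has no original."""
        by_id = {r.model_id: r for r in self.records}
        chain: List[int] = []
        current: Optional[int] = model_id
        while current is not None:
            chain.append(current)
            current = by_id[current].original_model_id
        return chain
```

Keeping the original model ID in every record is what makes it possible to trace which additional learning steps led to a given model.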

選択部１０８は、これらのモデル情報レコードのうち１つを、認識部１０３が用いるモデルとして選択する。
なお、ステップＳ３０８において、追加学習したモデルが劣化していないと判定される限りにおいて、管理部１０６はモデル履歴に追加学習したモデルを追加し、選択部１０８は追加されたモデルを選択することを続けることになる。すなわち、最新の追加学習が行われたモデルが、認識部１０３において使用され続ける。 The selection unit 108 selects one of these model information records as the model used by the recognition unit 103.
Note that as long as it is determined in step S308 that the additionally learned model has not deteriorated, the management unit 106 keeps adding the additionally learned model to the model history, and the selection unit 108 keeps selecting the added model. That is, the model with the latest additional learning continues to be used by the recognition unit 103.

一方、ステップＳ３２０において、追加学習したモデルが劣化していると評価部１０７が判定してステップＳ３０９に進むと、選択部１０８は、管理部１０６が管理しているモデルの中から、差し戻し候補モデルを探索する。差し戻し候補モデルの探索方法は後述する。 On the other hand, in step S320, when the evaluation unit 107 determines that the additionally learned model is degraded and proceeds to step S309, the selection unit 108 selects a remand candidate model from among the models managed by the management unit 106. Explore. A method of searching for a return candidate model will be described later.

次にステップＳ３１０において、表示部１０９は、ステップＳ３０８において探索した差し戻し候補モデルの情報を表示することで、利用者に確認を促す。このときの利用者は、表示された差し戻し候補モデルに戻すかどうかを判断し、その判断の結果を、操作部１１０を介して入力する。 Next, in step S310, the display unit 109 prompts the user for confirmation by displaying information on the remand candidate model found by the search in step S309. The user decides whether to revert to the displayed remand candidate model and inputs the decision via the operation unit 110.

次にステップＳ３１１において、選択部１０８は、ステップＳ３１０で提示した差し戻し候補モデルに戻すことの指示が利用者から入力されたかどうかを判定する。そして、差し戻し候補モデルに戻すことの指示が入力された場合、選択部１０８は、ステップＳ３１２において、その差し戻し候補モデルを、認識部１０３が使用するモデルとして選択する。 Next, in step S311, the selection unit 108 determines whether the user has input an instruction to return to the remand candidate model presented in step S310. If an instruction to return to the remand candidate model is input, the selection unit 108 selects the remand candidate model as the model to be used by the recognition unit 103 in step S312.

一方、ステップＳ３１１において差し戻し候補モデルに戻さないことの指示が利用者から入力された場合、本実施形態の情報処理装置は図３のフローチャートの処理を終了する。この場合、ステップＳ３０６で追加学習を行う以前に選択されていたモデルが、認識部１０３でそのまま使用される。そして、差し戻し候補モデルをそのまま使うことで、その差し戻し候補モデルが得られた時点の状態に戻ることが保証される。なお、差し戻し候補モデルをそのまま使うのではなく、差し戻し候補モデルに対して、差し戻しの指示が入力された時点以降に行われた追加学習をやり直すようにしてもよい。すなわち、差し戻し候補モデルに戻してそのまま使用する場合、モデル履歴をそのまま辿って先ほどステップＳ３０６で作られたモデルと同じになってしまう。このため、学習部１０４は、例えば特許文献１に記載の手法などを用いて教師データの再選別を行ってから追加学習をやり直す。これにより、差し戻しの指示が入力された時点以降に得られた教師データを活用することができることになる。追加学習をやり直すか、差し戻し候補モデルをそのまま使うかは、利用者の操作によって決めるようにしてもよい。 On the other hand, if the user inputs an instruction not to return to the remand candidate model in step S311, the information processing apparatus of this embodiment ends the process of the flowchart of FIG. 3. In this case, the model selected before performing additional learning in step S306 is used as is by the recognition unit 103. By using the returned candidate model as is, it is guaranteed that the state returns to the state at the time the returned candidate model was obtained. Note that, instead of using the remand candidate model as is, the additional learning performed after the time when the remand instruction was input may be redone for the remand candidate model. That is, when returning to the returned candidate model and using it as is, the model history is followed as is and the model becomes the same as the model created earlier in step S306. Therefore, the learning unit 104 re-sorts the teacher data using the method described in Patent Document 1, for example, and then performs additional learning again. This makes it possible to utilize the teacher data obtained after the point in time when the remand instruction was input. Whether to perform additional learning again or use the returned candidate model as is may be determined by the user's operation.

図５は、ステップＳ３０７において評価部１０７が性能評価に用いるための評価セットを作成する処理の流れを示すフローチャートである。
評価部１０７は、一般的な人物画像から作られた初期評価セットを出荷時に保持している。評価部１０７は、この初期評価セットを監視対象箇所へ適応させるために、運用時に得られた情報を用いて随時更新することによって、評価セットを作成する。 FIG. 5 is a flowchart showing the flow of processing in which the evaluation unit 107 creates an evaluation set for use in performance evaluation in step S307.
The evaluation unit 107 holds an initial evaluation set made from general human images at the time of shipment. The evaluation unit 107 creates an evaluation set by updating the initial evaluation set as needed using information obtained during operation, in order to adapt the initial evaluation set to the monitoring target location.

図６は、評価セットの一例を示した図である。図６に示すように、評価セットは、評価データレコードの列からなる。評価データレコードは、評価データＩＤ、特徴量、重み、正解ラベル、およびスコア履歴を含む。評価ＩＤは評価データレコードを識別する通し番号である。重みは、評価データとしての特徴量の重要度を示し、性能値Ｓ（Ｍ）の計算に用いる。正解ラベルは、特徴量が示す特徴の正解の情報であり、「正常」または「異常」を示す情報となされる。スコア履歴は、モデルＩＤとスコアの値のペアの列から成り、モデルＩＤに対応するモデルを用いて認識部１０３が特徴量を認識した際のスコアの記録を含む。スコア履歴の内容は後述する図７のステップＳ７０２において作られる。 FIG. 6 is a diagram showing an example of an evaluation set. As shown in FIG. 6, the evaluation set consists of a sequence of evaluation data records. Each evaluation data record includes an evaluation data ID, a feature amount, a weight, a correct label, and a score history. The evaluation data ID is a serial number that identifies the evaluation data record. The weight indicates the importance of the feature amount as evaluation data and is used in calculating the performance value S(M). The correct label is the ground-truth information for the feature amount and indicates either "normal" or "abnormal". The score history is a sequence of pairs of a model ID and a score value, recording the score obtained when the recognition unit 103 recognized the feature amount using the model with that model ID. The contents of the score history are created in step S702 of FIG. 7, described later.

図５のフローチャートのステップＳ５０１において、評価部１０７は、まず優先評価データを決定して取得する。優先評価データは、記憶部１０５に記憶された特徴量のうち、利用者が操作部１１０を用いて警報の取り消しを行ったことがある特徴量の集合である。これらの特徴量は、異常という誤報が発生した上で、利用者が直接正常であると判断して指示した特徴量であるから、利用者にとっての正常度が高く、かつ誤報が発生した経緯があることから間違えやすい特徴量であるため、重要な特徴量であると推定される。 In step S501 of the flowchart of FIG. 5, the evaluation unit 107 first determines and acquires priority evaluation data. The priority evaluation data is the set of feature amounts, among those stored in the storage unit 105, for which the user has cancelled an alarm using the operation unit 110. These are feature amounts that triggered a false alarm of abnormality and that the user then directly judged to be normal. They are therefore highly normal from the user's point of view and, given that a false alarm did occur, easy to misrecognize, so they are presumed to be important feature amounts.

次にステップＳ５０２において、評価部１０７は、優先評価データのいずれかに類似した特徴量を、記憶部１０５に含まれる特徴量から検索する。特徴量の類似度の算出方法は限定しないが、例えば内積を使用することができる。優先評価データの量は一般に少量であるから、評価の正確性を保つため、優先評価データに類似した性質のデータを追加することによって、評価セットが偏らないようにする。なお、評価部１０７は、特徴量の類似度を直接判断する代わりに、画像から類似性を判断してもよい。例えば、評価部１０７は、人体の姿勢を推定して、同じような姿勢を取っているかどうかを人体パーツの配置から判断してもよい。また、評価部１０７は、例えば、一般物体認識を行って同じカテゴリに属せば類似していると判断してもよい。 Next, in step S502, the evaluation unit 107 searches the feature quantities included in the storage unit 105 for a feature quantity similar to any of the priority evaluation data. Although the method for calculating the similarity of feature amounts is not limited, for example, an inner product can be used. Since the amount of priority evaluation data is generally small, in order to maintain the accuracy of evaluation, data with similar characteristics to the priority evaluation data is added to prevent the evaluation set from being biased. Note that the evaluation unit 107 may determine the similarity from the images instead of directly determining the degree of similarity of the feature amounts. For example, the evaluation unit 107 may estimate the posture of a human body and determine whether the postures are similar based on the arrangement of human body parts. Furthermore, the evaluation unit 107 may perform general object recognition and determine that objects are similar if they belong to the same category, for example.

次にステップＳ５０３において、評価部１０７は、ステップＳ５０１とステップＳ５０２で得られた類似データを評価セットに追加する。また、優先評価データの重みは例えば１０など大きい値とし、類似データの重みは例えば２など優先評価データよりも低い値とする。なお、初期評価セットに含まれるデータの重みは例えば１などさらに低い値になされてもよい。 Next, in step S503, the evaluation unit 107 adds the priority evaluation data obtained in step S501 and the similar data obtained in step S502 to the evaluation set. The weight of the priority evaluation data is set to a large value such as 10, and the weight of the similar data is set to a value lower than that of the priority evaluation data, such as 2. The weight of the data included in the initial evaluation set may be set to an even lower value, such as 1.
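Steps S501 to S503 can be pictured with the following Python sketch, which collects the priority evaluation data, searches the stored feature amounts for similar ones using the inner product, and assigns the example weights 10 and 2. The function names and the similarity threshold are assumptions made for illustration; the description does not fix how large an inner product counts as "similar". Correct labels are left out of the sketch.

```python
from typing import List, Sequence, Tuple

Feature = Sequence[float]


def inner_product(a: Feature, b: Feature) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_additions(priority: List[Feature],
                    stored: List[Feature],
                    threshold: float,
                    priority_weight: float = 10.0,
                    similar_weight: float = 2.0) -> List[Tuple[Feature, float]]:
    """Return (feature amount, weight) pairs to add to the evaluation set."""
    additions = [(p, priority_weight) for p in priority]
    for s in stored:
        if s in priority:
            continue  # already added as priority evaluation data
        # Similar if the inner product with any priority item reaches the
        # (hypothetical) threshold.
        if any(inner_product(s, p) >= threshold for p in priority):
            additions.append((s, similar_weight))
    return additions
```

The inner product is only meaningful as a similarity when the feature vectors have comparable norms, so normalizing them beforehand would be a reasonable precaution.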

次にステップＳ５０４において、評価部１０７は、評価セットに含まれるデータから、類似したスコア履歴のデータを削減する。つまり、評価セットにデータを追加し続けるとデータ量が肥大化するため、評価部１０７は、スコア履歴の類似度を用いて重複性の高いデータの削減を行う。 Next, in step S504, the evaluation unit 107 reduces data with similar score history from the data included in the evaluation set. In other words, since the amount of data increases if data is continuously added to the evaluation set, the evaluation unit 107 uses the similarity of the score history to reduce highly redundant data.

ここで、モデル履歴の評価の過程で得られたスコア履歴の推移が類似している２つのデータは、異なるモデルにおいても似たようなスコア変化の挙動を示してきていることになる。したがって、これらのデータは、以降の追加学習で作られたモデルでも似たようなスコアを出し続けることが推定される。すなわち、これらのデータは、モデルの評価という目的においては重複性の強いデータであると考えられる。２つのスコア履歴の類似度は、例えば同一モデルＩＤのスコア間の差の絶対値についての２乗平均の値などとして定める。評価部１０７は、類似度が例えば０．０１以下であれば類似しているものとして片方を削除する。例えば評価部１０７は、重みが異なる場合は重みの小さい方を削除し、重みが同じ場合はランダムに決めた方を削除する。すなわち評価部１０７は、類似している少なくとも一つを用いる。なお、評価部１０７は、例えば評価セットが５００個以下など十分に小さい場合、あるいはモデル履歴が１０個以下など類似度を信頼できるほど履歴が蓄積していない場合などでは、ステップＳ５０４の削除処理を行わないようにしてもよい。 Here, two data items whose score histories, obtained in the course of evaluating the model history, show similar transitions have behaved similarly across different models. It is therefore presumed that they will continue to yield similar scores for models created by subsequent additional learning. In other words, such data is considered highly redundant for the purpose of model evaluation. The similarity between two score histories is defined, for example, as the mean of the squared absolute differences between the scores for the same model ID, so that a smaller value means more similar histories. If this value is, for example, 0.01 or less, the evaluation unit 107 regards the two data items as similar and deletes one of them. For example, if their weights differ, the one with the smaller weight is deleted; if their weights are equal, one chosen at random is deleted. That is, the evaluation unit 107 keeps at least one of the similar data items. The evaluation unit 107 may skip the deletion processing of step S504 when the evaluation set is sufficiently small, for example 500 items or fewer, or when too little history has accumulated for the similarity to be reliable, for example 10 or fewer models in the model history.
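The reduction of step S504 could look like the Python sketch below. It takes the similarity to be the mean squared difference of the scores recorded for the same model IDs and keeps one item of each similar pair, preferring the larger weight. The record layout (dictionaries with `weight` and `scores` keys), the function names, and the greedy single pass are all assumptions for illustration.

```python
import random
from typing import Dict, List, Optional


def history_distance(a: Dict[int, float], b: Dict[int, float]) -> Optional[float]:
    """Mean squared score difference over model IDs present in both histories."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # nothing to compare
    return sum((a[m] - b[m]) ** 2 for m in shared) / len(shared)


def prune_similar(records: List[dict], threshold: float = 0.01,
                  rng: Optional[random.Random] = None) -> List[dict]:
    """Greedy pass that keeps one record from each group of similar histories."""
    rng = rng or random.Random()
    kept: List[dict] = []
    for rec in records:
        for i, other in enumerate(kept):
            d = history_distance(rec["scores"], other["scores"])
            if d is not None and d <= threshold:
                # Keep the larger weight; break ties at random.
                if rec["weight"] > other["weight"] or (
                    rec["weight"] == other["weight"] and rng.random() < 0.5
                ):
                    kept[i] = rec
                break
        else:
            kept.append(rec)
    return kept
```

A caller would skip this function entirely when the evaluation set or the model history is still small, as the description suggests.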

評価部１０７は、以上のようにして、評価セットを作成する。図３のステップＳ３０８における劣化の判定は、この評価セットに基づいて行われる。またステップＳ３０９における差し戻し候補モデルの探索処理は、この評価セットに基づいて行われる。 The evaluation unit 107 creates an evaluation set as described above. Deterioration determination in step S308 in FIG. 3 is performed based on this evaluation set. Further, the search process for a return candidate model in step S309 is performed based on this evaluation set.

図７は、図３のステップＳ３０９において選択部１０８がモデル履歴から差し戻し候補のモデルを探索する処理の流れを示したフローチャートである。
まずステップＳ７０１において、選択部１０８は、ステップＳ３０７で作成した評価セットについて、モデル履歴に含まれるそれぞれのモデルを基に性能値Ｓ（Ｍ）を計算する。
次にステップＳ７０２において、評価部１０７は、ステップＳ７０１における性能値のＳ（Ｍ）の計算過程において計算された、それぞれの評価データについての認識スコアを、評価データレコードのスコア履歴に追加する。 FIG. 7 is a flowchart showing the flow of processing in which the selection unit 108 searches for a model to be sent back from the model history in step S309 of FIG.
First, in step S701, the selection unit 108 calculates a performance value S(M) for the evaluation set created in step S307 based on each model included in the model history.
Next, in step S702, the evaluation unit 107 adds the recognition score for each evaluation data calculated in the process of calculating the performance value S(M) in step S701 to the score history of the evaluation data record.

次にステップＳ７０３において、選択部１０８は、ステップＳ７０１で計算した性能値の値から、性能値が極大となるモデルの候補を探索する。性能値が極大となるモデルは、モデル履歴の中で最も性能値の高いモデルＭ_maxをまず探索する。次いで選択部１０８は、そのモデルＭ_maxの性能値Ｓ（Ｍ_max）との差が、所定の閾値より小さいモデルがあれば、それらも極大候補モデルとして選び出す。ただし、選択部１０８は、モデルＭ_maxが最初のモデル、すなわち追加学習を一度も行っていない初期モデルだった場合、追加学習がまったく効果がなかったことになるため、極大モデルの候補がないものとして、初期モデルの再作成を促す。 Next, in step S703, the selection unit 108 searches for candidate models with a maximum performance value, based on the performance values calculated in step S701. The selection unit 108 first finds the model M_max with the highest performance value in the model history. If there are models whose performance value differs from S(M_max) by less than a predetermined threshold, the selection unit 108 also selects them as maximum candidate models. However, if the model M_max is the first model, that is, the initial model on which no additional learning has been performed, additional learning has had no effect at all; the selection unit 108 therefore treats this as there being no maximum candidate model and prompts re-creation of the initial model.
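The candidate search of step S703 can be sketched in Python as follows. The function name, the dictionary mapping model IDs to performance values, and the `margin` parameter (standing for the predetermined threshold) are hypothetical names introduced for this illustration.

```python
from typing import Dict, List


def find_maximum_candidates(performance: Dict[int, float],
                            initial_model_id: int,
                            margin: float) -> List[int]:
    """Return model IDs of maximum candidate models, or [] if M_max is the initial model."""
    best_id = max(performance, key=lambda model_id: performance[model_id])
    if best_id == initial_model_id:
        # Additional learning never helped: no candidates, prompt re-creation.
        return []
    best = performance[best_id]
    return [
        model_id
        for model_id, value in performance.items()
        if model_id == best_id or best - value < margin
    ]
```

The length of the returned list then drives the branch in step S704: one candidate is taken as is, none leads to the re-creation dialog, and several lead to the selection screen.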

図８（Ａ）～図８（Ｃ）は性能値の推移の例を示した図である。
例えば図８（Ａ）に示すように、最も性能値の高いモデルＭ_maxの性能値Ｓ（Ｍ_max）が高く、それに近い性能値のモデルがモデル履歴にない場合、選択部１０８は、モデルＭ_maxをそのまま選択する。
例えば図８（Ｂ）に示すように、モデルＭ_maxの性能値Ｓ（Ｍ_max）に近い性能値のモデルが、例えば点線８０４で示す枠内のように複数存在した場合にはモデルＭ_maxを単純に選択することが有効であるとは自明には言えない。このため、表示部１０９は、性能値の推移の変化を利用者に提示する。
例えば図８（Ｃ）に示すように、モデルＭ_maxが初期モデルである場合は、そもそも追加学習の効果が一切現れていない。この場合、初期モデルに何らかの異常がある可能性が考えられる。 FIGS. 8(A) to 8(C) are diagrams showing examples of changes in performance values.
For example, as shown in FIG. 8(A), when the performance value S(M_max) of the model M_max with the highest performance value is high and no model in the model history has a performance value close to it, the selection unit 108 selects the model M_max as is.
For example, as shown in FIG. 8(B), when multiple models have performance values close to the performance value S(M_max) of the model M_max, as in the frame indicated by the dotted line 804, it is not self-evident that simply selecting the model M_max is effective. The display unit 109 therefore presents the transition of the performance values to the user.
For example, as shown in FIG. 8(C), if the model M_max is the initial model, additional learning has had no effect at all in the first place. In this case, the initial model itself may have some abnormality.

図７のフローチャートの説明に戻り、ステップＳ７０４において、選択部１０８は、性能値が極大となる候補のモデルの数を判定する。選択部１０８は、性能値が極大となる候補のモデル数が１の場合には、ステップＳ７０５において、その極大のモデルを選択候補モデルとしてそのまま選んで処理を終了する。一方、選択部１０８は、性能値が極大となる候補のモデル数が０個、すなわちモデルＭ_maxが初期モデルだった場合には、表示部１０９が実行するステップＳ７０６に処理を進める。また選択部１０８は、性能値が極大となる候補のモデルが複数あった場合には、表示部１０９が実行するステップ７０９に処理を進める。 Returning to the flowchart of FIG. 7, in step S704 the selection unit 108 determines the number of candidate models with a maximum performance value. If there is exactly one such candidate, the selection unit 108 selects it as the selection candidate model in step S705 and ends the processing. If there are no such candidates, that is, if the model M_max is the initial model, the selection unit 108 advances the processing to step S706, which is executed by the display unit 109. If there are multiple such candidates, the selection unit 108 advances the processing to step S709, which is executed by the display unit 109.

ステップＳ７０６に進むと、表示部１０９は、利用者にモデルの再作成を促すためのダイアログを表示する。そして次のステップＳ７０７において、選択部１０８は、利用者が操作部１１０を介してモデルの再作成を指示したか、あるいは、初期モデルに差し戻すことを指示したかを判定する。選択部１０８は、利用者の指示を基にモデルの再作成が行われた場合にはステップ７０８の処理に進み、再生成が行われていない場合つまり初期モデルへの差し戻しが指示された場合にはステップＳ７０５の処理に進む。 Proceeding to step S706, the display unit 109 displays a dialog prompting the user to re-create the model. Then, in step S707, the selection unit 108 determines whether the user has instructed, via the operation unit 110, re-creation of the model or reversion to the initial model. If the model has been re-created in accordance with the user's instruction, the processing proceeds to step S708; if it has not been re-created, that is, if reversion to the initial model has been instructed, the processing proceeds to step S705.

選択部１０８は、ステップＳ７０７においてモデルの再作成が行われたと判定してステップＳ７０８に進むと、その再作成されたモデルを選択候補モデルとして処理を終了する。一方、選択部１０８は、再作成が行われず、初期モデルへ差し戻されてステップＳ７０５に進むと、極大のモデルＭ_maxすなわち初期モデルを選択候補モデルとして処理を終了する。 When the selection unit 108 determines in step S707 that the model has been re-created and proceeds to step S708, the selection unit 108 sets the re-created model as the selection candidate model and ends the process. On the other hand, if the selection unit 108 does not perform re-creation and returns to the initial model and proceeds to step S705, the selection unit 108 ends the process with the maximum model M _max , that is, the initial model, as the selection candidate model.

またステップＳ７０４において、性能値が極大となる候補のモデルが複数あってステップＳ７０９に進むと、表示部１０９は、極大候補モデルからの選択画面を表示する。
次にステップＳ７１０において、選択部１０８は、ステップＳ７０９で表示された選択画面を介して利用者が選択した極大候補モデルを、選択候補モデルとして処理を終了する。 Further, in step S704, if there are multiple candidate models with maximum performance values and the process proceeds to step S709, the display unit 109 displays a selection screen from the maximum candidate models.
Next, in step S710, the selection unit 108 ends the process with the maximum candidate model selected by the user via the selection screen displayed in step S709 as the selected candidate model.

図９は、ステップＳ７０９において表示部１０９が表示する極大候補からの選択画面の例を示した図である。
図９の選択画面において、コンボボックス９０１は、極大候補モデルの作成日時を項目として有する。利用者は、コンボボックス９０１の作成日時の項目を選択することで、極大候補モデルから１つを選択することができる。サムネイルエリア９０２および９０３は、評価セットの中で、現在の認識部１０３で用いるように選択されているモデルでの認識結果と異なる評価データのサムネイル画像が表示されるエリアである。サムネイルエリア９０２は、現在のモデルでは異常となるが極大候補モデルでは正常となるデータのサムネイル画像が表示されるエリアである。一方、サムネイルエリア９０３は、現在のモデルでは正常となるが極大候補モデルでは異常となるデータのサムネイル画像が表示されるエリアである。サムネイルエリア９０２および９０３に表示されるサムネイル画像は、コンボボックス９０１の選択を利用者が操作部１１０を用いて切り替えることによって変化する。利用者はこれらを見て、どのように結果が変化するかを確認しながらより良いと考えられる極大候補モデルを選択することができる。ボタン９０４は、利用者がコンボボックス９０１から選択した作成日時の極大候補モデルを、さらに利用者が選択候補モデルとして決定する際にタップ等されるボタンである。このようにして選択された選択候補モデルは、ステップＳ３１０での利用者の確認を経て、差し戻し候補モデルとして以降の認識部１０３において用いられることになる。 FIG. 9 is a diagram showing an example of a selection screen from maximum candidates displayed on the display unit 109 in step S709.
In the selection screen of FIG. 9, a combo box 901 lists the creation dates and times of the maximum candidate models. By selecting a creation date and time in the combo box 901, the user selects one of the maximum candidate models. Thumbnail areas 902 and 903 display thumbnail images of the evaluation data in the evaluation set whose recognition result under the selected maximum candidate model differs from that under the model currently selected for use by the recognition unit 103. The thumbnail area 902 displays data that is abnormal under the current model but normal under the maximum candidate model. The thumbnail area 903 displays data that is normal under the current model but abnormal under the maximum candidate model. The thumbnail images displayed in the thumbnail areas 902 and 903 change as the user switches the selection in the combo box 901 using the operation unit 110. By viewing them, the user can check how the results would change and select the maximum candidate model considered better. The button 904 is tapped or otherwise operated when the user confirms the maximum candidate model with the creation date and time selected in the combo box 901 as the selection candidate model. The selection candidate model chosen in this way is, after confirmation by the user in step S310, used by the recognition unit 103 thereafter as the remand candidate model.
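The contents of the two thumbnail areas can be derived by comparing, for each evaluation data item, the recognition result under the current model with the result under the candidate selected in the combo box. The Python sketch below assumes that a score below the alarm threshold means "abnormal", as in step S303; the function name and the dictionaries keyed by evaluation data ID are hypothetical.

```python
from typing import Dict, List, Tuple


def split_changed_results(current_scores: Dict[int, float],
                          candidate_scores: Dict[int, float],
                          threshold: float) -> Tuple[List[int], List[int]]:
    """Return (IDs for area 902, IDs for area 903).

    Area 902: abnormal under the current model, normal under the candidate.
    Area 903: normal under the current model, abnormal under the candidate.
    """
    area_902: List[int] = []
    area_903: List[int] = []
    for data_id, current in current_scores.items():
        candidate = candidate_scores.get(data_id)
        if candidate is None:
            continue  # no score recorded for this candidate model
        current_abnormal = current < threshold
        candidate_abnormal = candidate < threshold
        if current_abnormal and not candidate_abnormal:
            area_902.append(data_id)
        elif not current_abnormal and candidate_abnormal:
            area_903.append(data_id)
    return area_902, area_903
```

Re-running this function whenever the combo box selection changes would reproduce the behavior in which the thumbnails follow the selected candidate.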

As described above, when additional learning is found to have degraded recognition performance, the information processing apparatus of the first embodiment can roll back, using the stored history, to the state at a point in time when accuracy was high. That is, the information processing apparatus of the first embodiment can appropriately remove additional learning that lowers recognition accuracy, and can thereby suppress a decrease in recognition accuracy.

<Second embodiment>
In the first embodiment, an example was described in which an evaluation set is created and used to search for a rollback candidate model. However, evaluating the entire evaluation set against every model in the model history may require a large amount of computation.
The second embodiment describes a method of evaluating models by reusing the recognition results obtained during normal operation. The configuration of the information processing apparatus of the second embodiment is the same as that in FIG. 1, so its illustration and description are omitted. In the following, only the parts of the second embodiment that are added to or changed from the first embodiment are described, and the description of the parts in common with the first embodiment is omitted.

FIG. 10 is a flowchart showing the flow of processing in the information processing apparatus of the second embodiment. Steps that perform the same processing as in the flowchart of FIG. 3 described above are given the same reference numerals as in FIG. 3, and their description is omitted. As in FIG. 3, the processing of the flowchart of FIG. 10 is executed each time one frame of image data of a moving image is input from the imaging unit 101. In the flowchart of FIG. 10, the processing of step S307 of FIG. 3 is omitted, while the processing of step S1001 is added between step S305 and step S306.

When it is determined in step S305 that the condition for performing additional learning is satisfied, the information processing apparatus of the second embodiment proceeds to step S1001. In step S1001, the evaluation unit 107 calculates the performance value of the model currently used by the recognition unit 103 by executing the processing of the flowchart of FIG. 11, described later. After step S1001, the processing proceeds to step S306, where additional learning is performed as described above. After step S306, the processing proceeds to step S308.

In the second embodiment, the processing of step S307 of FIG. 3 is omitted. Consequently, in the subsequent step S308, the evaluation unit 107 always uses the initial evaluation set of the first embodiment to determine model deterioration. Step S307 may instead be provided, as in the first embodiment, so that the evaluation set is updated; in that case, however, the evaluation set is used only in step S308.

FIG. 11 is a flowchart showing the flow of processing in which, in step S1001 of FIG. 10, the evaluation unit 107 calculates the performance value S(M) of the model M currently used by the recognition unit 103.
First, in step S1101, the evaluation unit 107 acquires from the storage unit 105 the features for which recognition scores were calculated in step S302, and clusters them. The clustering may be a process of classifying the features based on attributes of the recognition target. For example, when the recognition target is a person, the clustering may detect posture as an attribute of the person and classify the features by posture category, such as walking, running, or sitting. The clustering may also classify the features by other categories of person attributes, such as walking direction, walking speed, distance from the camera, height, age, gender, clothing, belongings, or person density. Furthermore, the clustering may classify the features by a combination of two or more closely related attributes, such as gender and clothing, clothing and belongings, or age, gender, and walking speed. Alternatively, the clustering may classify the features without prior knowledge, for example by using the k-means method.
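As a minimal sketch of the attribute-based clustering of step S1101, the grouping can be expressed as follows. The sample dictionaries, the attribute names, and the function `cluster_by_attributes` are hypothetical; the embodiment does not prescribe any particular data layout.

```python
from collections import defaultdict


def cluster_by_attributes(samples, attributes):
    """Group samples by the tuple of values they hold for `attributes`."""
    clusters = defaultdict(list)
    for sample in samples:
        key = tuple(sample[name] for name in attributes)
        clusters[key].append(sample)
    return dict(clusters)


# Hypothetical samples: each carries detected person attributes.
samples = [
    {"id": 1, "posture": "walking", "belongings": "bag"},
    {"id": 2, "posture": "running", "belongings": "none"},
    {"id": 3, "posture": "walking", "belongings": "none"},
]
by_posture = cluster_by_attributes(samples, ["posture"])
by_pair = cluster_by_attributes(samples, ["posture", "belongings"])
```

Passing a single attribute gives the per-category clustering, and passing several gives the clustering by a combination of attributes.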

Next, in step S1102, the evaluation unit 107 determines an important cluster from among the clusters obtained in step S1101. For example, the evaluation unit 107 selects the cluster with the largest size as the important cluster. If, for example, the cluster of running persons is large, the monitored location is considered to be an environment in which many running persons appear. In that case, it is reasonable to take running persons as the basis for model evaluation when determining the important cluster. The way of choosing the important cluster used for evaluation is not limited to this. For example, if the user judges from prior knowledge that sitting persons are important, the evaluation unit 107 determines the important cluster from among the clusters of sitting persons. A cluster in which abnormalities are likely to occur may also be chosen as the important cluster. A plurality of such clusters may also be combined.
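The selection rule of step S1102 can be sketched as follows, assuming the clusters are held in a dictionary keyed by category name. The function `pick_important_clusters` and its `user_choice` parameter are illustrative names, not part of the embodiment.

```python
def pick_important_clusters(clusters, user_choice=None):
    """Return the keys of the important clusters.

    clusters: dict mapping a cluster key to the list of its members.
    user_choice: optional keys chosen from prior knowledge; when given,
    they take precedence over the largest-cluster rule.
    """
    if user_choice:
        return [key for key in user_choice if key in clusters]
    # Default rule: the cluster with the most members.
    largest = max(clusters, key=lambda key: len(clusters[key]))
    return [largest]


clusters = {"walking": [1, 2], "running": [3, 4, 5], "sitting": [6]}
default_pick = pick_important_clusters(clusters)
user_pick = pick_important_clusters(clusters, ["sitting"])
```

Returning a list leaves room for combining several clusters, as the description allows.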

Next, in step S1103, the evaluation unit 107 calculates the performance value S(M) using the important cluster determined in step S1102 and the recognition scores calculated in step S302 for the features contained in it. The correct label used in calculating the performance value is set to normal for a feature whose alarm the user has cancelled using the operation unit 110; for any other feature, it is set to abnormal if an alarm was raised and to normal otherwise.
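The labelling rule of step S1103 can be sketched as below. The performance metric shown here (the share of samples whose thresholded score agrees with the derived label) and the threshold of 0.5 are assumptions for illustration; the actual definition of S(M) is the one given for the first embodiment.

```python
def derive_label(alarm_raised, alarm_cancelled):
    """Correct label from operation history: a cancelled alarm means normal."""
    if alarm_cancelled:
        return "normal"
    return "abnormal" if alarm_raised else "normal"


def performance_value(samples, threshold=0.5):
    """Assumed metric: fraction of samples whose thresholded recognition
    score matches the derived label (larger score = more normal)."""
    hits = 0
    for s in samples:
        predicted = "normal" if s["score"] >= threshold else "abnormal"
        if predicted == derive_label(s["alarm"], s["cancelled"]):
            hits += 1
    return hits / len(samples)


# Hypothetical members of an important cluster with their stored scores.
samples = [
    {"score": 0.9, "alarm": False, "cancelled": False},
    {"score": 0.2, "alarm": True, "cancelled": False},
    {"score": 0.3, "alarm": True, "cancelled": True},   # false alarm
    {"score": 0.8, "alarm": False, "cancelled": False},
]
value = performance_value(samples)
```

Only the third sample, whose alarm was cancelled by the user, counts against the model in this example.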

Next, in step S1104, the evaluation unit 107 stores the performance value of the current model calculated in step S1103 in the storage unit 105 in association with the model history record.
In the second embodiment, the performance value of the model currently used by the recognition unit 103 is calculated as described above.

In the information processing apparatus of this embodiment, the performance value of each model has already been calculated in step S1001. The apparatus therefore performs the processing of the flowchart of FIG. 7 described above with steps S701 and S702 omitted. That is, the selection unit 108 starts the processing of the flowchart at step S703 and performs step S703 and the subsequent steps.

In this way, the information processing apparatus of the second embodiment does not use an evaluation set; at model selection time, it determines the selected candidate model based on performance on the important cluster. A performance comparison using important clusters is not a comparison on identical data, as one using an evaluation set would be, but clustering can be expected to gather data of reasonably similar nature. The recognition scores of the important cluster are those calculated in step S302, that is, the recognition results of normal operation. The information processing apparatus of the second embodiment therefore does not need the recognition unit 103 to perform new calculations at model selection time, which makes it useful in environments with tight constraints on computation.

Although the second embodiment describes a method that does not use the evaluation set of the first embodiment for model selection, an evaluation set may be used in part. For example, the information processing apparatus may separately obtain a performance value from a small evaluation set and combine it with the performance value from the important cluster, or may restrict the contents of the evaluation set to data similar to the important cluster. In these cases, the amount of computation for the evaluation set can be reduced.

<Third embodiment>
The first embodiment was described using a Gaussian mixture distribution as an example of the model. The third embodiment describes an example in which using a nearest neighbor model reduces both the storage capacity needed for the model history and the cost of searching for a rollback candidate model. The configuration of the information processing apparatus of the third embodiment is the same as that in FIG. 1, so its illustration and description are omitted. In the following, only the parts of the third embodiment that are added to or changed from the first embodiment are described, and the description of the parts in common with the first embodiment is omitted. The method described in the third embodiment is also applicable to the second embodiment described above.

In the third embodiment, the model is a set of points in the feature space, and the recognition score is the distance to the point nearest to the feature (the nearest neighbor). The recognition score S(x; M) of a feature x under a model M is calculated by the following equation (3).

Here, K in equation (3) is the number of points in the model, y_i is the vector value of a point included in the model, and d(y, x) is the Euclidean distance between y and x. The recognition score takes a real value in the range (0, 1]; larger values indicate normal and smaller values indicate abnormal. The range of d is [0, ∞). Because the result is handled as a score, d is transformed here so that the value falls between 0 and 1, but the transformation is not limited to the one shown. Nor is d limited to the Euclidean distance: any relation that satisfies the axioms of a pseudometric, such as the Manhattan distance or the Chebyshev distance, may be used. The method of this embodiment can likewise be used with an approximation by Hamming distance obtained with a technique such as LSH (Locality-Sensitive Hashing). In the third embodiment, the generation number of additional learning is also held as metadata for each point of the model. In the initial model, all points have generation number 0. The additional learning in step S306 of FIG. 3 is performed by adding to the model the points of features whose correct label is normal. The generation number is incremented each time step S306 is executed and is attached as metadata to the points added by that additional learning.
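A minimal sketch of a nearest neighbor score and of additive learning with generation metadata is given below. The transform `exp(-d)` is an assumption chosen only because it maps [0, ∞) onto (0, 1] with larger values for nearer points; it is not necessarily the transform of equation (3). The list-of-pairs model layout and the function names are likewise illustrative.

```python
import math


def recognition_score(x, model):
    """Score of feature x under a nearest neighbor model.

    model: list of (vector, generation) pairs.
    The exp(-d) transform is an assumed stand-in for equation (3).
    """
    d_min = min(math.dist(vec, x) for vec, _ in model)
    return math.exp(-d_min)


def add_normal_points(model, normal_features, generation):
    """Additional learning by addition: append the normal-labelled features,
    tagged with the incremented generation number, and return that number."""
    new_generation = generation + 1
    model.extend((tuple(f), new_generation) for f in normal_features)
    return new_generation


model = [((0.0, 0.0), 0), ((3.0, 4.0), 0)]   # initial model: generation 0
before = recognition_score((1.0, 1.0), model)
generation = add_normal_points(model, [(1.0, 1.0)], 0)
after = recognition_score((1.0, 1.0), model)
```

After the point (1.0, 1.0) is learned as normal, the same feature scores 1.0 because its nearest neighbor is at distance zero.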

The management unit 106 of the third embodiment does not hold model data in the model information record; it holds a generation number instead. The management unit 106 holds the model data of only a single model, corresponding to the latest model, and reproduces the model data of the other models in the history from the model data of the latest model.

FIG. 12 is a diagram illustrating the history management method of the management unit 106 in the third embodiment.
As shown in FIG. 12, the model history data managed by the management unit 106 consists of a sequence of model information records 1201. Each model information record 1201 includes a generation number, an original model ID, training data, and the date and time of additional learning. That is, the model history data of the third embodiment does not include the model data contained in the model history data shown in FIG. 4, and includes a generation number instead.
The generation number is a number that is incremented with each additional learning. Note, however, that the generation number does not necessarily match the model ID if history entries have been deleted or the history has branched.

The management unit 106 of the third embodiment separately manages the model data 1202 of the latest model. The model data 1202 of the latest model contains the vector values representing the points of the model, and a generation number is attached as metadata to each of these vector values. Specific vector values are omitted in FIG. 12. The vector values that share a generation number form the set of vector values added in that generation.

For example, to obtain the model data of generation number 99, the management unit 106 extracts from the model data of the latest model only the vector values whose generation number is 99 or less. This yields the set of vector values added by additional learning from the initial model through generation number 99, that is, the model data of generation number 99. In the example of FIG. 12, the vector values with IDs 9501 through 10000 were added by the additional learning of generation number 100. Therefore, to obtain the model of generation number 99, the management unit 106 takes the vector values with IDs 1 through 9500, excluding those whose generation number is 100.
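The reconstruction of an earlier model from the latest model data amounts to a filter on the generation metadata. The sketch below assumes the latest model is a list of (vector, generation) pairs; the name `model_at_generation` is illustrative.

```python
def model_at_generation(latest_points, g):
    """Return the vectors of the model of generation g.

    latest_points: list of (vector, generation) pairs of the latest model.
    """
    return [vec for vec, generation in latest_points if generation <= g]


# Hypothetical latest model of generation 100.
latest = [((0.0, 0.0), 0), ((1.0, 0.0), 42), ((2.0, 0.0), 99), ((3.0, 0.0), 100)]
model_99 = model_at_generation(latest, 99)
```

Only one copy of the point data is stored, and every historical model is a prefix of it in generation order, which is what keeps the history small.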

FIG. 13 is a flowchart showing the flow of processing when the information processing apparatus of the third embodiment calculates the performance value S(M) of a model. In the third embodiment, instead of calculating the performance values of the models of the entire model history as described for step S701 of FIG. 7, the information processing apparatus performs the processing of the flowchart shown in FIG. 13 and thereby calculates the model performance values more efficiently. The method by which the information processing apparatus of the third embodiment calculates the performance value of a model is described below with reference to the flowchart of FIG. 13. In the following description, a generation number is denoted by g, and the model of generation number g is denoted by M_g.

First, in step S1301, the selection unit 108 acquires the generation number of the latest model from the management unit 106 and sets it as the generation number G.
Next, in step S1302, the selection unit 108 acquires the model M_G of generation number G from the management unit 106. That is, as described above, the selection unit 108 acquires from the management unit 106 the set of vector values whose generation number is G or less and takes it as the model M_G.
Next, in step S1303, the evaluation unit 107 calculates the recognition score S(x; M_G) using the formula given above. The evaluation unit 107 also stores the nearest neighbor y found during that calculation, that is, the point y of the model M_G that minimizes d(y, x).

Next, in step S1304, the selection unit 108 acquires the generation number of the nearest neighbor y obtained in step S1303 and sets it as the generation number G'.
Next, in step S1305, the selection unit 108 stores the value of the recognition score S(x; M_G) obtained in step S1303 as the recognition score of x for each model from generation number G' to generation number G. That is, the recognition score S(x; M_g) of generation number g takes the value given by the following equation (4).

$$S(x; M_g) = S(x; M_G), \qquad G' \le g \le G \tag{4}$$

The nearest neighbor y is a vector value added by the additional learning of generation number G', and it is still the nearest neighbor in the model of generation number G. It follows that no vector value closer to x than y was added in the additional learning of generation numbers G'+1 through G. Therefore, the nearest neighbor in every model from generation number G' to generation number G is y, and all of these models give the same value S(x; M_G).

Next, in step S1306, the selection unit 108 determines whether the generation number G' is 0. If G' = 0, recognition scores have already been calculated for all models back to the initial model, so the selection unit 108 ends the processing of the flowchart of FIG. 13. If G' is not 0, the selection unit 108 advances the processing to step S1307. In step S1307, the selection unit 108 sets the generation number G to G'-1 and returns to step S1302 to calculate the recognition scores for the models of generation number G'-1 and earlier.

In this way, the information processing apparatus of the third embodiment can calculate the recognition score S(x; M_g) for each of the models included in the model history. Because the calculation for the models of generation numbers G' through G-1 can be skipped in step S1305, the calculation is more efficient than calculating sequentially for each model in the model history as in the embodiments described above. Even in the worst case, in which a nearest neighbor with generation number G' = G is selected every time in step S1305, the calculation efficiency is equal to that of the sequential calculation.
By performing the above processing, the information processing apparatus of the third embodiment can, when a nearest neighbor model is used, reduce the data size of the model history and improve the efficiency of calculating model performance.
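The loop of steps S1301 to S1307 can be sketched as follows for a single feature x. As before, the `exp(-d)` transform, the list-of-pairs model layout, and the function name are assumptions for illustration; precomputing the distances once is also a choice of this sketch rather than something the flowchart specifies.

```python
import math


def scores_for_all_generations(x, latest_points, latest_generation):
    """Return {g: S(x; M_g)} for g = 0..latest_generation.

    latest_points: list of (vector, generation) pairs of the latest model.
    Assumes generation 0 (the initial model) holds at least one point.
    """
    dist = [(math.dist(vec, x), gen) for vec, gen in latest_points]
    scores = {}
    G = latest_generation                                   # S1301
    while True:
        # S1302 to S1304: nearest neighbor within M_G and the generation G'
        # that added it (ties resolve to the earlier generation).
        d_min, g_added = min(pair for pair in dist if pair[1] <= G)
        score = math.exp(-d_min)          # assumed transform of the distance
        for g in range(g_added, G + 1):                     # S1305, eq. (4)
            scores[g] = score
        if g_added == 0:                                    # S1306
            return scores
        G = g_added - 1                                     # S1307


points = [((4.0, 0.0), 0), ((9.0, 0.0), 1), ((2.0, 0.0), 2),
          ((1.5, 0.0), 3), ((8.0, 0.0), 4)]
scores = scores_for_all_generations((1.0, 0.0), points, 4)
```

In this example the nearest neighbor search runs three times (for G = 4, 2, and 1) rather than once per generation, because generations 3 and 4 share a nearest neighbor, as do generations 0 and 1.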

<Fourth embodiment>
In the third embodiment described above, the additional learning in step S306 adds training data whose correct label is normal to the model, thereby increasing the normal data that can become a nearest neighbor. In contrast, the information processing apparatus of the fourth embodiment uses training data whose correct label is abnormal to remove normal data from the model, which enables additional learning that lowers the score in the vicinity of that training data. The model of the fourth embodiment thus works in the direction of making previously unrecognized abnormalities easier to recognize. The following describes how additional learning that removes normal data is added in the fourth embodiment. The configuration of the information processing apparatus of the fourth embodiment is the same as that in FIG. 1, so its illustration and description are omitted. In the following, only the parts of the fourth embodiment that are added to or changed from the embodiments described above are described, and the description of the parts in common with them is omitted.

In the fourth embodiment, in step S306, the learning unit 104 first adds to the model the points of features whose correct label is normal, as in the third embodiment. Then, if the training data contains features whose correct label is abnormal, the learning unit 104 calculates, for each such feature, the distance d to the vector value of each point of the model, and deletes every model point for which any of these distances is smaller than a predetermined threshold t. In other words, the learning unit 104 deletes the model points contained in a hypersphere of radius t centered on the abnormal training data.

For example, the learning unit 104 performs the deletion by additionally attaching the generation number at the time of deletion to the model point as metadata. The learning unit 104 increments the generation number so that the generation number attached as the deletion generation is the generation following that of the separate additional learning that adds data. That is, the additional learning of any single generation number consists either entirely of additions or entirely of deletions.
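A minimal sketch of this two-phase additional learning is shown below. The point records with `added` and `deleted` fields, the function name, and the choice to skip a phase when it has no data are assumptions of the sketch, not details stated in the embodiment.

```python
import math


def learn_with_deletion(points, normal_feats, abnormal_feats, generation, t):
    """Apply one round of step S306 and return the resulting generation number.

    points: list of dicts {"vec", "added", "deleted"}; "deleted" is None
    while the point is still part of the model.
    """
    if normal_feats:
        generation += 1                      # one generation: additions only
        points.extend({"vec": tuple(f), "added": generation, "deleted": None}
                      for f in normal_feats)
    if abnormal_feats:
        generation += 1                      # next generation: deletions only
        for p in points:
            # Mark points inside the radius-t hypersphere of any abnormal sample.
            if p["deleted"] is None and any(
                    math.dist(p["vec"], a) < t for a in abnormal_feats):
                p["deleted"] = generation
    return generation


points = [{"vec": (0.0, 0.0), "added": 0, "deleted": None},
          {"vec": (5.0, 5.0), "added": 0, "deleted": None}]
generation = learn_with_deletion(points, [(1.0, 1.0)], [(0.9, 0.9)], 0, t=0.5)
```

Marking a point with its deletion generation, rather than physically removing it, is what allows earlier models to be reproduced from the single stored model.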

FIG. 14 is a diagram illustrating the history management method of the management unit 106 in the fourth embodiment.
The model history data managed by the management unit 106 consists of a sequence of model information records 1201. The model information record 1201 is the same as that described in the third embodiment. In the model data 1401, a generation number and a generation number at the time of deletion are attached as metadata to each vector value representing a point of the model. For a vector value that has not been deleted, the generation number at the time of deletion is "none". In the example of FIG. 14, the vector values with IDs 2 and 4890 were deleted by the additional learning (by deletion) of generation number 101. In other words, the vector value with ID 4890, for example, is data that was added in generation number 55 and deleted in generation number 101. That is, the vector value with ID 4890 exists in the models from generation number 55 through generation number 100.
In this embodiment, to obtain from this model data the model data of, for example, generation number 99, it suffices to extract only the vector values whose generation number is 99 or less and whose generation number at the time of deletion is either "none" or greater than 99.
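The extraction rule can be sketched as a filter over both metadata fields. The record layout and the function name below are illustrative assumptions; `None` stands for the "none" deletion generation.

```python
def model_at_generation_with_deletions(points, g):
    """Return the vectors that exist in the model of generation g.

    points: list of dicts {"vec", "added", "deleted"}; "deleted" is None
    for a point that has never been deleted.
    """
    return [p["vec"] for p in points
            if p["added"] <= g and (p["deleted"] is None or p["deleted"] > g)]


# Hypothetical records; the second mirrors the ID 4890 example of FIG. 14.
points = [{"vec": (0.0, 0.0), "added": 0, "deleted": None},
          {"vec": (1.0, 2.0), "added": 55, "deleted": 101},
          {"vec": (3.0, 3.0), "added": 100, "deleted": None}]
model_99 = model_at_generation_with_deletions(points, 99)
```

A point added in generation 55 and deleted in generation 101 is therefore present for generations 55 through 100 and absent otherwise.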

FIG. 15 is a flowchart showing the flow of processing when the information processing apparatus of the fourth embodiment calculates the performance value S(M) of a model. With reference to the flowchart of FIG. 15, the following describes how the information processing apparatus of the fourth embodiment calculates, for one feature x of the evaluation set, the recognition score S(x; M_g) for every model M_g included in the model history. Only the processing that differs from the flowchart shown in FIG. 13 is described.

In the information processing apparatus of the fourth embodiment, step S1304 is followed by step S1501. In the third embodiment described above, the nearest neighbor y found in step S1303 remains the nearest neighbor throughout, from generation number G', at which y was added, to generation number G. In the fourth embodiment, by contrast, a point deleted between generation number G' and generation number G may be closer than the nearest neighbor y, so the selection unit 108 searches for such a point. In step S1501, the selection unit 108 sets the generation number H to G'. Thereafter, the generation number H is a variable that starts at G' and increases up to G over the sequence of steps S1501 to S1507.

Next, in step S1502, the evaluation unit 107 searches the model M_H for a nearest neighbor z that is closer to x than y, that is, among the points z satisfying d(z, x) < d(y, x), the one with the smallest d(z, x), and stores it. If such a nearest neighbor z exists, the evaluation unit 107 calculates the recognition score S(x; M_H) using the formula given above. The nearest neighbor z is closer to x than y, yet it does not exist at generation number G (since y, which is farther than z, is the nearest neighbor there), so z is a point that was deleted between generation numbers H and G. Conversely, the candidates for z are limited to such points. The nearest neighbor z can therefore be searched for efficiently by calculating distances only for the points deleted between generation numbers H and G, that is, the points whose generation number at the time of deletion is H or more and G or less.

Next, in step S1503, the selection unit 108 determines whether a nearest neighbor z was found in step S1502. If no nearest neighbor z was found, the selection unit 108 advances the processing to step S1504.
If no nearest neighbor z was found, none of the data deleted between generation number H and generation number G is closer than y, so the nearest neighbor in every model from generation number H to generation number G is y. In step S1504, the selection unit 108 therefore stores the value of the recognition score S(x; M_G) obtained in step S1303 as the recognition score of x for each model from generation number H to generation number G. The selection unit 108 then proceeds to step S1306.

On the other hand, if the nearest neighbor z is found in step S1503, the selection unit 108 advances the process to step S1505. In step S1505, the selection unit 108 obtains the generation number at which the nearest neighbor z was deleted and sets it as generation number H'. It follows that z is the nearest neighbor for every generation from generation number H to generation number H'-1, the generation immediately before the deletion.

Next, in step S1506, the selection unit 108 stores the value of the recognition score S(x; M_H) obtained in step S1502 as the recognition score of x for the models from generation number H to generation number H'-1.
Next, in step S1507, the selection unit 108 sets the generation number H to H' and then returns to step S1502.
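Steps S1501 to S1507 as a whole can be sketched as the loop below. This is a sketch under stated assumptions rather than the embodiment's implementation: the recognition score is taken to be the distance to the nearest neighbor (the actual formula is defined earlier in the description and is not reproduced here), d is taken to be Euclidean distance, and the function name `scores_over_window` and the `(feature, added, deleted)` tuples are hypothetical. Like the steps it mirrors, the loop searches only M_H, so it assumes that no closer point is both added and deleted strictly inside the window.

```python
from math import dist, inf


def scores_over_window(points, x, y, g_start, g_end):
    """Determine the score of x for every generation g_start..g_end.

    points  : iterable of (feature, added_gen, deleted_gen or None)
    x       : feature vector being evaluated
    y       : feature of the nearest neighbor of x in the model of g_end
    g_start : generation in which y was added (G' in the text)
    g_end   : latest generation of the window (G in the text)
    """
    scores = {}
    d_y = dist(y, x)
    h = g_start                                    # S1501
    while True:
        # S1502: closest point of M_h that beats y and is deleted by g_end.
        best_d, h_next = inf, None
        for feature, added, deleted in points:
            if deleted is None or added > h or not (h < deleted <= g_end):
                continue
            d = dist(feature, x)
            if d < d_y and d < best_d:
                best_d, h_next = d, deleted        # S1505: remember H'
        if h_next is None:                         # S1503: no such z
            for gen in range(h, g_end + 1):        # S1504: y is nearest
                scores[gen] = d_y
            return scores
        for gen in range(h, h_next):               # S1506: z is nearest
            scores[gen] = best_d
        h = h_next                                 # S1507
```

For example, with x at the origin, y at (3, 0) added in generation 2, and two closer points at (1, 0) and (2, 0) deleted in generations 4 and 6, calling `scores_over_window(points, x, y, 2, 7)` yields 1.0 for generations 2 and 3, 2.0 for generations 4 and 5, and 3.0 for generations 6 and 7. Because H' is always greater than H, the loop advances on every pass and ends once no deleted candidate remains.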

In the fourth embodiment, the series of processes described above determines all recognition scores of the feature amount x for the models from generation number G' to generation number G. This is the same effect as that of step S1305.
According to the fourth embodiment, additional learning by deletion is thus added to the processing of the third embodiment, while the processing remains as efficient as in the third embodiment described above.

<Other embodiments>
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
The above-described embodiments are merely specific examples of implementing the present invention, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

101: photographing unit, 102: feature amount calculation unit, 103: recognition unit, 104: learning unit, 105: storage unit, 106: management unit, 107: evaluation unit, 108: selection unit, 109: display unit, 110: operation unit

Claims

a recognition means for recognizing features from an image based on the model;
learning means for additionally learning the model;
a management means for managing information on the history of additional learning of the model;
evaluation means for evaluating the accuracy of the recognition by the model;
a selection means for selecting a model based on the recognition accuracy from among the models for which the history information is managed;
The evaluation means holds an evaluation set that records a collection of recognition target information recognized by the recognition means,
The selection means selects an additionally learned model based on a result of evaluating accuracy using a subset of information on the recognition target extracted from the evaluation set;
The information processing apparatus is characterized in that the recognition means uses the model selected by the selection means for the recognition.

The evaluation means holds, as an evaluation set, a set of correspondences among a feature amount of the recognition target recognized by the recognition means, an ID of the model, and a score regarding the degree of the feature of the recognition target obtained as a result of recognizing the recognition target using the model,
The selection means, based on the results of evaluating the accuracy of recognition of the evaluation set by the recognition means using each model for which the history information is managed, selects an additionally learned model using an evaluation set generated from a subset of recognition targets that excludes any of a plurality of recognition targets, included in the evaluation set, whose model IDs and scores are similar to each other. The information processing device according to claim 1.

further comprising display control means for displaying the image on a display device,
3. The information processing apparatus according to claim 1, wherein the display control means displays information on the recognition target included in the evaluation set.

a recognition means for recognizing features from an image based on the model;
learning means for additionally learning the model;
a management means for managing information on the history of additional learning of the model;
evaluation means for evaluating the accuracy of the recognition by the model;
a selection means for selecting a model based on the recognition accuracy from among the models for which the history information is managed;
The selection means selects a part of the recognition target recognized by the recognition means,
The evaluation means evaluates the accuracy of the recognition based on the recognition results for a part of the selected recognition target,
The information processing apparatus is characterized in that the recognition means uses the model selected by the selection means for the recognition.

The selection means clusters the recognition target recognized by the recognition means into a plurality of clusters, and selects at least one or more clusters from the plurality of clusters,
5. The information processing apparatus according to claim 4, wherein the evaluation means evaluates the accuracy of the recognition based on a recognition result of a recognition target belonging to the selected cluster.

The information processing apparatus according to any one of claims 1 to 5, wherein the selection means selects a model with maximum accuracy in the accuracy transition in the history.

further comprising a selection input means for a user to select a model;
The selection means presents the user with a plurality of models determined based on changes in accuracy in the history, and the user selects a model based on the result input by the selection input means. The information processing device according to any one of claims 1 to 5.

8. The information processing apparatus according to claim 1, wherein the selection means determines whether it is necessary to recreate the model based on a change in accuracy in the history.

9. The information processing apparatus according to claim 5, wherein the selection means clusters the recognition targets based on attributes of the recognition targets.

The learning means performs additional learning by adding or deleting a recognition target to the model,
The management means manages the history of the model by assigning a generation number representing a generation in which the additional learning has been performed to the recognition target,
The recognition means determines a subset of data included in the model based on the generation number, and sets the subset as a model included in the history. The information processing device according to any one of claims 1 to 9.

11. The information processing apparatus according to claim 1, wherein the model includes a nearest neighbor model.

The model is a collection of data indicating the recognition target in a feature space,
The recognition accuracy is a score indicating the distance to the point closest to the data of the predetermined recognition target when using the model with the generation number,
The selection means selects, based on the data of the recognition target included in each model from the generation number of the latest model to the initial model and on the predetermined recognition target, the model containing the data of the recognition target that minimizes the score,
If the generation number of the selected model is different from the generation number of the initial model, the evaluation means evaluates the score based on the predetermined recognition target and on the data of the recognition target excluding the recognition target data included in the models after the generation number of the selected model, and when the generation number of the selected model is the generation number of the initial model, the evaluation means terminates the evaluation. The information processing device according to claim 10.

further comprising display control means for displaying the image on a display device,
13. The display control means displays information indicating objects recognized as normal and objects recognized as abnormal by the recognition means, respectively. The information processing device described in .

The display control means further displays information for instructing the user to cancel the information, out of the information indicating the object recognized as having an abnormality by the recognition means. 14. The information processing device according to claim 13.

When there are a plurality of models selected by the selection means, the display control means includes information indicating the time when the plurality of models were generated, the recognition result of the current model, and the recognition of each of the plurality of models. The information processing apparatus according to claim 13 or 14 , wherein an image of an object having a different result is displayed.

An information processing method executed by an information processing device, the method comprising:
a recognition step of recognizing features from the image based on the model;
a learning step of performing additional learning of the model;
a management step of managing information on the history of additional learning of the model;
an evaluation step of evaluating the recognition accuracy of the model;
a selection step of selecting a model based on the recognition accuracy from among the models for which the history information is managed;
The evaluation step includes maintaining an evaluation set that records a collection of information to be recognized;
The selection step selects an additionally learned model based on a result of evaluating accuracy using a subset of information on the recognition target extracted from the evaluation set;
The information processing method is characterized in that the recognition step uses the model selected in the selection step for the recognition.

An information processing method executed by an information processing device, the method comprising:
a recognition step of recognizing features from the image based on the model;
a learning step of performing additional learning of the model;
a management step of managing information on the history of additional learning of the model;
an evaluation step of evaluating the accuracy of the recognition by the model;
a selection step of selecting a model based on the recognition accuracy from among the models for which the history information is managed;
The selection step selects a part of the recognition target to be recognized by the recognition step,
The evaluation step evaluates the accuracy of the recognition based on the recognition results for a part of the selected recognition target,
The information processing method is characterized in that the recognition step uses the model selected in the selection step for the recognition.

A program for causing a computer to function as each means of the information processing apparatus according to claim 1 .