JP7222209B2

JP7222209B2 - DEEP LEARNING NETWORK USED FOR EVENT DETECTION, TRAINING DEVICE AND METHOD FOR THE NETWORK

Info

Publication number: JP7222209B2
Application number: JP2018177357A
Authority: JP
Inventors: イヌ・ルォイ; タヌ・ジミン; バイ・シアンホォイ
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-11-15
Filing date: 2018-09-21
Publication date: 2023-02-15
Anticipated expiration: 2038-09-21
Also published as: CN109784487B; CN109784487A; JP2019091421A

Description

The present invention relates to the field of information technology, and in particular to a deep learning network used for event detection and to a training device and a training method for the network.

In recent years, deep learning has been widely applied in the field of computer vision. Deep learning is shifting the direction of computer vision research from image classification toward video analysis, such as event detection. Compared with image classification, video analysis faces more complex scenes, and event detection requires models that learn higher-level logical judgments.

FIG. 1 is a diagram showing detection results of a conventional event detection model. For the surveillance video frame shown in FIG. 1, the occurrence probabilities detected by the conventional event detection model are: Normal 0.03, Accident 0.46, Jam (traffic congestion) 0.41, Park 0.08, and Reverse (wrong-way driving) 0.02.

It should be noted that the above description of the technical background is provided only to allow a clear and complete understanding of the technical solution of the present invention and to aid the understanding of those skilled in the art. These technical solutions should not be regarded as well known to those skilled in the art merely because they are described in the background section of the present invention.

According to the inventors' findings, in the case of FIG. 1 the ideal output would be both Accident and Jam. However, if the output condition of the conventional event detection model is set to a probability of 0.5 or higher, no output result is obtained; and if the output condition is set to output the event with the maximum probability, the output is Accident only. That is, the conventional event detection model can distinguish only mutually exclusive events and cannot output multiple events as its detection result, so it cannot ensure the accuracy and completeness of the detection result. In addition, because the conventional event detection model is a multi-class classifier model, its training time is long.
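The limitation described above can be reproduced with the FIG. 1 probabilities. The following is a minimal Python sketch and is not part of the patent; the function names `events_above` and `most_likely` are illustrative assumptions.

```python
# Probabilities reported for FIG. 1 by the conventional (mutually exclusive) model.
probs = {"Normal": 0.03, "Accident": 0.46, "Jam": 0.41, "Park": 0.08, "Reverse": 0.02}

def events_above(probs, threshold=0.5):
    """Output rule 1: report every event whose probability reaches the threshold."""
    return [name for name, p in probs.items() if p >= threshold]

def most_likely(probs):
    """Output rule 2: report only the single most probable event."""
    return max(probs, key=probs.get)

print(events_above(probs))  # no event reaches 0.5, so nothing is reported
print(most_likely(probs))   # only one of the two concurrent events is reported
```

Because the five probabilities share a total of 1.0, two events that occur together split the probability mass between them, and neither output rule reports both.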

Embodiments of the present invention provide a deep learning network used for event detection, and a training device and a training method for the network. The deep learning network has at least two mutually independent event classifiers that detect different events. Because each event classifier performs detection independently and outputs its own detection result, the accuracy and completeness of the event detection results can be ensured. In addition, because each event classifier needs to detect only one type of event, the time required to train the deep learning network is short and the detection accuracy of the trained network is high.

A first aspect of an embodiment of the present invention provides a deep learning network used for event detection, including: a data layer that reads input data; a convolutional layer that extracts features from the input data read by the data layer; and at least two event classifiers that, based on the features extracted by the convolutional layer, detect different events independently of each other and each output a detection result for its event.

A second aspect of an embodiment of the present invention provides a training device for the deep learning network according to the first aspect, including: first training means for training the parameters of the convolutional layer of the deep learning network; and second training means for training the parameters of the at least two event classifiers of the deep learning network while keeping the parameters of the convolutional layer unchanged.

A third aspect of an embodiment of the present invention provides a training method for the deep learning network according to the first aspect, including: training the parameters of the convolutional layer of the deep learning network; and training the parameters of the at least two event classifiers of the deep learning network while keeping the parameters of the convolutional layer unchanged.

As advantageous effects of the present invention, the deep learning network has at least two mutually independent event classifiers that detect different events; because each event classifier performs detection independently and outputs its own detection result, the accuracy and completeness of the event detection results can be ensured. In addition, because each event classifier needs to detect only one type of event, the time required to train the deep learning network is short and the detection accuracy of the trained network is high.

Specific embodiments of the present invention are disclosed in detail in the following description and drawings, which indicate the ways in which the principles of the present invention may be employed. The embodiments of the present invention are not limited in scope; they include various changes, modifications, and equivalents within the spirit and terms of the appended claims.

Features described and/or shown for one embodiment may be used in the same or a similar manner in one or more other embodiments, may be combined with features of other embodiments, or may replace features of other embodiments.

It should be noted that the term "comprise/include", as used herein, specifies the presence of a feature, element, step, or component, and does not exclude the presence or addition of one or more other features, elements, steps, or components.

The drawings included herein are provided to aid understanding of the embodiments of the present invention. They constitute a part of this specification, illustrate embodiments of the present invention, and together with the written description explain the principles of the present invention. The drawings described here merely illustrate embodiments of the present invention, and those skilled in the art can readily obtain other drawings based on them.
FIG. 1 is a diagram showing detection results of a conventional event detection model.
FIG. 2 is a diagram showing the deep learning network used for event detection in Example 1 of the present invention.
FIG. 3 is a diagram showing detection results of the deep learning network of Example 1 of the present invention.
FIG. 4 is a diagram showing the event classifier 203 of Example 1 of the present invention.
FIG. 5 is a diagram showing the training device of Example 2 of the present invention.
FIG. 6 is a diagram showing the electronic device of Example 3 of the present invention.
FIG. 7 is a block diagram showing the system configuration of the electronic device of Example 3 of the present invention.
FIG. 8 is a diagram showing the training method of Example 4 of the present invention.

These and other features of the present invention can be understood from the drawings and the following description. The specification and drawings disclose specific embodiments of the present invention, namely some of the embodiments that follow the principles of the present invention. The present invention is not limited to the described embodiments and includes all modifications, variations, and equivalents falling within the scope of the claims.

<Example 1>
This example of the present invention provides a deep learning network used for event detection. FIG. 2 is a diagram showing the deep learning network used for event detection in Example 1 of the present invention. As shown in FIG. 2, the deep learning network 200 includes a data layer 201, a convolutional layer 202, and at least two event classifiers 203.

The data layer 201 reads input data.

The convolutional layer 202 extracts features from the input data read by the data layer.

The at least two event classifiers 203 detect different events independently of each other, based on the features extracted by the convolutional layer, and each output a detection result for its event.

According to the above example, the deep learning network has at least two mutually independent event classifiers that detect different events; because each event classifier performs detection independently and outputs its own detection result, the accuracy and completeness of the event detection results can be ensured. In addition, because each event classifier needs to detect only one type of event, the time required to train the deep learning network is short and the detection accuracy of the trained network is high.
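The data flow of FIG. 2 can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: `extract_features` is a hypothetical stand-in for the convolutional layer 202, `BinaryHead` is a hypothetical single-neuron stand-in for an event classifier 203, and the weights are random rather than trained.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def extract_features(frame):
    """Stand-in for convolutional layer 202: two simple statistics of the pixel values."""
    return [sum(frame) / len(frame), max(frame) - min(frame)]

class BinaryHead:
    """Stand-in for one event classifier 203: an independent yes/no detector."""
    def __init__(self, n_features, rng):
        self.w = [rng.uniform(-1, 1) for _ in range(n_features)]
        self.b = 0.0

    def __call__(self, features):
        z = sum(w * f for w, f in zip(self.w, features)) + self.b
        return sigmoid(z)

EVENTS = ["Normal", "Accident", "Jam", "Park", "Reverse"]
rng = random.Random(0)
heads = {name: BinaryHead(2, rng) for name in EVENTS}

def detect(frame):
    """Extract features once, then let every head score its own event independently."""
    features = extract_features(frame)
    return {name: head(features) for name, head in heads.items()}

scores = detect([0.1, 0.7, 0.4, 0.9])  # one probability per event
```

Each head produces its own probability in (0, 1), so the scores are not forced to sum to 1 and several events can score high at the same time.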

In this example, the data layer 201 reads input data. For example, the data layer 201 processes surveillance video to obtain the input data.

For example, the input data may be at least one frame of the surveillance video, and the surveillance video may be captured by a surveillance camera installed above a road.

In this example, the convolutional layer 202 extracts features from the input data read by the data layer. The convolutional layer 202 may use an existing structure; for example, it may be a conventional AlexNet network structure.

In this example, the features may be features of the surveillance video image serving as the input data, such as contours, texture, and brightness.

In this example, the at least two event classifiers 203 detect different events independently of each other, based on the features extracted by the convolutional layer 202, and each output a detection result for its event.

In this example, each event classifier 203 detects a different event, and each event classifier 203 detects only one type of event; that is, every event classifier 203 is a binary classifier.

In this example, the number of event classifiers 203 may be set according to actual needs, for example according to the number of event types to be detected.

For example, as shown in FIG. 2, the deep learning network 200 may include five event classifiers 203 that respectively detect the events Normal, Accident, Jam, Park, and Reverse.

In this example, the detection results output by the at least two event classifiers 203 may be displayed. For example, the detection results for the different events output by the at least two event classifiers 203 may be displayed together on the surveillance video screen.

FIG. 3 is a diagram showing detection results of the deep learning network of Example 1 of the present invention. As shown in FIG. 3, for the same surveillance video frame and the same input video as in FIG. 1, the detection results obtained by the deep learning network 200 are: Normal 0.01, Accident 0.96, Jam 0.89, Park 0.31, and Reverse 0.10. In this way, the deep learning network 200 has five event classifiers 203 that respectively detect Normal, Accident, Jam, Park, and Reverse, and because each event classifier 203 detects its event independently, the accuracy and completeness of the event detection results can be ensured.
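Applying a per-classifier threshold to the FIG. 3 values shows how both concurrent events are reported. This is a minimal sketch; the 0.5 threshold is borrowed from the background discussion as an assumption, since the patent does not state an output threshold for the new network, and `detected_events` is an illustrative name.

```python
# Independent per-event probabilities reported for FIG. 3.
fig3 = {"Normal": 0.01, "Accident": 0.96, "Jam": 0.89, "Park": 0.31, "Reverse": 0.10}

def detected_events(scores, threshold=0.5):
    """Report every event whose own classifier reaches the (assumed) threshold."""
    return [name for name, p in scores.items() if p >= threshold]

print(detected_events(fig3))  # ['Accident', 'Jam']
print(sum(fig3.values()))     # well above 1.0: the scores are not mutually exclusive
```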

In this example, the event classifiers 203 may have the same structure or different structures. Event classifiers 203 having the same structure are described here as an example.

FIG. 4 is a diagram showing the event classifier 203 of Example 1 of the present invention. As shown in FIG. 4, the event classifier 203 includes a first fully connected layer 401, a second fully connected layer 402, and a long short-term memory (LSTM) layer 403 provided between the first fully connected layer 401 and the second fully connected layer 402.

In this example, providing an LSTM layer in the event classifier makes use of its ability to retain useful information over time and forget useless information, so that high detection accuracy can be obtained.

In this example, the event classifier 203 may further include an output layer 404 that outputs the occurrence probability of the event detected by the event classifier 203.

In this example, the first fully connected layer 401, the second fully connected layer 402, the LSTM layer 403, and the output layer 404 may all use existing structures.
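The layer order of FIG. 4 (fully connected, LSTM, fully connected, output) can be sketched in plain Python. This is an untrained toy under stated assumptions: the layer sizes, the random initialization, the omission of gate biases, and the use of a sigmoid as the output layer 404 are all choices made for illustration and are not specified by the patent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class EventClassifier:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.fc1_w = rand_matrix(rng, n_hidden, n_in)   # first fully connected layer 401
        self.fc1_b = [0.0] * n_hidden
        # LSTM layer 403: one weight matrix per gate, applied to [input, previous hidden state].
        self.gates = {g: rand_matrix(rng, n_hidden, 2 * n_hidden) for g in "ifog"}
        self.fc2_w = rand_matrix(rng, 1, n_hidden)      # second fully connected layer 402
        self.fc2_b = [0.0]

    def lstm_step(self, x, h, c):
        xh = x + h                      # concatenate input and previous hidden state
        zeros = [0.0] * self.n_hidden   # gate biases omitted for brevity
        i = [sigmoid(v) for v in dense(self.gates["i"], zeros, xh)]    # input gate
        f = [sigmoid(v) for v in dense(self.gates["f"], zeros, xh)]    # forget gate
        o = [sigmoid(v) for v in dense(self.gates["o"], zeros, xh)]    # output gate
        g = [math.tanh(v) for v in dense(self.gates["g"], zeros, xh)]  # candidate state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def __call__(self, feature_sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for features in feature_sequence:   # one feature vector per video frame
            x = dense(self.fc1_w, self.fc1_b, features)
            h, c = self.lstm_step(x, h, c)
        logit = dense(self.fc2_w, self.fc2_b, h)[0]
        return sigmoid(logit)               # output layer 404: occurrence probability

clf = EventClassifier(n_in=3, n_hidden=4)
p = clf([[0.2, 0.1, 0.5], [0.3, 0.0, 0.4]])
```

The cell state `c` is what carries information across frames: the forget gate scales what is kept from earlier frames and the input gate scales what the current frame adds.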

In this example, each event classifier 203 can be trained independently and/or have its parameters adjusted independently. The deep learning network 200 can therefore be trained and/or adjusted flexibly, and the time for training and/or adjustment can be reduced effectively.

In this example, an event classifier 203 can be independently added to or removed from the deep learning network 200.

For example, when a new event needs to be detected, an event classifier for detecting the new event may be independently added to the deep learning network 200. When detection of an event is no longer needed, the event classifier for detecting that event may be removed from the deep learning network 200.

In this way, the deep learning network can be flexibly extended and reduced, and the event classifiers in the deep learning network can be added or removed according to actual needs.
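Adding and removing classifiers can be sketched as a registry keyed by event name. The class `EventDetector`, its method names, and the lambda stand-ins below are hypothetical and serve only to illustrate the idea.

```python
class EventDetector:
    """Shared feature extractor plus a registry of independent event classifiers."""

    def __init__(self, extract_features):
        self.extract_features = extract_features
        self.classifiers = {}

    def add_classifier(self, event, classifier):
        # Existing classifiers are left untouched.
        self.classifiers[event] = classifier

    def remove_classifier(self, event):
        del self.classifiers[event]

    def detect(self, frame):
        features = self.extract_features(frame)
        return {event: clf(features) for event, clf in self.classifiers.items()}

# Stand-in feature extractor and classifiers, for illustration only.
detector = EventDetector(lambda frame: [sum(frame)])
detector.add_classifier("Accident", lambda features: 0.9)
detector.add_classifier("Jam", lambda features: 0.2)
detector.remove_classifier("Jam")
print(detector.detect([1, 2, 3]))  # {'Accident': 0.9}
```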

<Example 2>
This example of the present invention further provides a training device for the deep learning network used for event detection described in Example 1. The configuration of the deep learning network is as shown in FIG. 2: the deep learning network 200 includes a data layer 201, a convolutional layer 202, and at least two event classifiers 203.

FIG. 5 is a diagram showing the training device of Example 2 of the present invention. As shown in FIG. 5, the training device 500 includes a first training unit 501 and a second training unit 502.

The first training unit 501 trains the parameters of the convolutional layer 202 of the deep learning network 200.

The second training unit 502 trains the parameters of the at least two event classifiers 203 of the deep learning network 200 while keeping the parameters of the convolutional layer 202 of the deep learning network 200 unchanged.

Because each event classifier needs to detect only one type of event, that is, because every event classifier is a binary classifier, the required training time is short.

In this example, the first training unit 501 trains the parameters of the convolutional layer 202 of the deep learning network 200.

For example, the parameters of the convolutional layer 202 may be trained on a public dataset. Because the public dataset contains more than one million images, it provides rich features for training the model's parameters, and the trained model generalizes well.

In this example, training may be performed in Caffe (Convolutional Architecture for Fast Feature Embedding), a convolutional neural network framework. To complete the training of the parameters of the convolutional layer 202, two fully connected layers, one accuracy layer, and one loss layer need to be appended to an ordinary AlexNet network. During training, whether the model has converged is judged from the output values of the accuracy layer and the loss layer, and training ends once the model has converged. After training is complete, the two added fully connected layers, the accuracy layer, and the loss layer are removed to obtain the trained convolutional layer 202.
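The patent does not specify how convergence is judged from the accuracy and loss outputs. One simple possibility, shown here purely as an assumption, is to stop when the most recent loss values have stopped changing; the function name, the window size, and the tolerance are all illustrative.

```python
def has_converged(losses, window=3, tolerance=1e-3):
    """Return True when the last `window` loss values lie within `tolerance` of each other."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tolerance

print(has_converged([1.0, 0.5, 0.3]))                 # False: loss is still falling
print(has_converged([1.0, 0.2001, 0.2000, 0.2002]))   # True: loss has plateaued
```

The same check could be applied to the accuracy-layer output, or both conditions could be required together.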

In this example, after training of the convolutional layer 202 is complete, the second training unit 502 trains the parameters of the at least two event classifiers 203 of the deep learning network 200 while keeping the parameters of the convolutional layer 202 unchanged.

For example, training may be performed using acquired surveillance video data. For this training, one accuracy layer and one loss layer need to be added. During training, the learning rate of the convolutional layer 202 is set to 0, so that the parameters of the convolutional layer 202 remain unchanged. The learning rate of any event classifier 203 that does not need training may also be set to 0. Whether the model has converged is judged from the output values of the accuracy layer and the loss layer, and training ends once the model has converged. After training is complete, the added accuracy layer and loss layer are removed to obtain the trained event classifiers 203.
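The effect of a zero learning rate can be seen in a single gradient-descent update with one learning rate per parameter group. This is a minimal sketch with made-up parameter and gradient values; the group names and `sgd_step` are assumptions for illustration and do not reflect Caffe's actual API.

```python
def sgd_step(params, grads, learning_rates):
    """Apply one gradient-descent update with a separate learning rate per parameter group."""
    return {
        group: [p - learning_rates[group] * g for p, g in zip(params[group], grads[group])]
        for group in params
    }

# Illustrative values only.
params = {"conv": [0.5, -0.2], "accident_head": [0.1, 0.3], "jam_head": [0.0, 0.4]}
grads = {"conv": [1.0, 1.0], "accident_head": [0.5, -0.5], "jam_head": [0.2, 0.2]}

# Learning rate 0 freezes the convolutional layer and the head that needs no training.
learning_rates = {"conv": 0.0, "accident_head": 0.1, "jam_head": 0.0}

updated = sgd_step(params, grads, learning_rates)
print(updated["conv"] == params["conv"])          # True: frozen
print(updated["jam_head"] == params["jam_head"])  # True: frozen
```

Only `accident_head` moves, even though gradients were computed for every group.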

In this example, the second training unit 502 may train the parameters of the at least two event classifiers using at least two labels, each expressed as a binary value and each corresponding to one of the at least two event classifiers.

For example, a label of "1" may indicate that an event has occurred and a label of "0" that it has not. The labels are arranged in order and correspond respectively to the event classifiers 203 for detecting Normal, Accident, Jam, Park, and Reverse. For the surveillance video frame shown in FIG. 1, for example, the labels corresponding to the event classifiers 203 may be written as "01100".
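The ordered label string can be split into one binary target per event classifier. A minimal sketch follows; the function name `decode_labels` is an illustrative assumption.

```python
EVENTS = ["Normal", "Accident", "Jam", "Park", "Reverse"]

def decode_labels(label_string, events=EVENTS):
    """Map an ordered string of 0/1 labels to one binary target per event classifier."""
    if len(label_string) != len(events) or set(label_string) - {"0", "1"}:
        raise ValueError(f"expected {len(events)} characters, each '0' or '1'")
    return {event: int(bit) for event, bit in zip(events, label_string)}

targets = decode_labels("01100")  # label for the FIG. 1 frame
print(targets)  # {'Normal': 0, 'Accident': 1, 'Jam': 1, 'Park': 0, 'Reverse': 0}
```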

In this example, the second training unit 502 may train the parameters of the at least two event classifiers 203 simultaneously, or may train the parameters of each of the at least two event classifiers 203 separately.

Training the parameters of all event classifiers 203 simultaneously further reduces the training time, whereas training the parameters of each event classifier 203 separately allows training to be carried out flexibly according to the actual situation.
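One way to see why simultaneous and separate training are both possible: if each classifier is scored with its own binary loss, a joint objective is simply the sum of per-classifier terms. The sketch below assumes binary cross-entropy, which the patent does not name, and reuses the FIG. 3 scores with the "01100" labels as example inputs.

```python
import math

def binary_cross_entropy(p, y):
    """Loss of one binary classifier with predicted probability p and target y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def per_classifier_losses(scores, targets):
    return {event: binary_cross_entropy(scores[event], targets[event]) for event in targets}

scores = {"Normal": 0.01, "Accident": 0.96, "Jam": 0.89, "Park": 0.31, "Reverse": 0.10}
targets = {"Normal": 0, "Accident": 1, "Jam": 1, "Park": 0, "Reverse": 0}

losses = per_classifier_losses(scores, targets)
joint_loss = sum(losses.values())  # objective when all classifiers are trained together
```

Each term depends only on its own classifier's output, so minimizing a single term trains one classifier without affecting the others, provided the shared features stay fixed.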

In this example, an event classifier 203 can be independently added to or removed from the deep learning network 200. For this purpose, the training device 500 may further include a third training unit 503.

When a new event classifier 203 is added to the deep learning network 200, the third training unit 503 trains only the parameters of the newly added event classifier 203 while keeping the parameters of the convolutional layer 202 and of the original at least two event classifiers 203 unchanged. The specific training method may follow that of the original event classifiers, and its description is omitted here.

Accordingly, when a new detection requirement arises, there is no need to retrain the convolutional layer 202 or the original at least two event classifiers 203; only the parameters of the new event classifier 203 need to be trained. The training time can therefore be reduced effectively, and the new detection requirement can be met quickly. In addition, because training the new event classifier does not affect the original event classifiers, the detection accuracy of the original event classifiers is preserved.
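Training a newly added classifier in isolation can be sketched with a single-neuron head fitted on fixed feature vectors. Everything here is an illustrative assumption: the logistic-regression head, the feature vectors standing in for frozen convolutional outputs, the learning rate, and the placeholder event name `NewEvent`.

```python
import copy
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(head, features):
    z = sum(w * f for w, f in zip(head["w"], features)) + head["b"]
    return sigmoid(z)

def train_head(head, samples, learning_rate=0.5, epochs=200):
    """Fit one binary head by gradient descent on logistic loss; nothing else is touched."""
    for _ in range(epochs):
        for features, label in samples:
            error = predict(head, features) - label  # gradient of the loss w.r.t. z
            head["w"] = [w - learning_rate * error * f for w, f in zip(head["w"], features)]
            head["b"] -= learning_rate * error
    return head

# Existing heads (made-up parameters) and a snapshot to compare against afterwards.
heads = {"Accident": {"w": [1.0, -1.0], "b": 0.0}, "Jam": {"w": [0.5, 0.5], "b": -0.2}}
before = copy.deepcopy(heads)

# Feature vectors as they would come from the frozen convolutional layer, with binary labels.
samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.2, 0.8], 0)]
heads["NewEvent"] = train_head({"w": [0.0, 0.0], "b": 0.0}, samples)

unchanged = all(heads[name] == before[name] for name in before)
```

After training, `unchanged` is true: the original heads hold exactly the parameters they had before the new head was added.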

In this example, the training device 500 may further include an adjustment unit 504.

When one or more of the at least two event classifiers 203 do not satisfy a predetermined condition, the adjustment unit 504 independently adjusts the parameters of those one or more event classifiers.

In this example, the predetermined condition is, for example, that the detection accuracy of the event classifier reaches a predetermined threshold. For example, while the deep learning network is being used for event detection, the features input to an event classifier 203 may include features that were not seen during earlier training, causing the detection accuracy of one or more event classifiers 203 to fall below the predetermined threshold. In that case, the adjustment unit 504 independently adjusts the parameters of those one or more event classifiers 203, so that the other event classifiers, which need no adjustment, are unaffected and adjustment can be carried out flexibly and quickly in a variety of situations.

For example, in the adjustment process, the previously unseen features are added to the original training data, and the one or more event classifiers 203 that need adjustment are trained independently. When this training is complete, the adjustment of the parameters of those one or more event classifiers 203 is complete.
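Selecting which classifiers need adjustment reduces to comparing each classifier's measured accuracy against the threshold. A minimal sketch follows; the accuracy values and the 0.9 threshold are made-up illustrative numbers, and `classifiers_to_adjust` is a hypothetical name.

```python
def classifiers_to_adjust(accuracies, threshold):
    """Return the events whose classifier accuracy has fallen below the threshold."""
    return [event for event, accuracy in accuracies.items() if accuracy < threshold]

# Illustrative accuracies, not measured results.
accuracies = {"Accident": 0.95, "Jam": 0.80, "Park": 0.92}
print(classifiers_to_adjust(accuracies, threshold=0.9))  # ['Jam']
```

Only the classifiers in the returned list would be retrained on the extended training data; the rest keep their parameters.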

According to the above example, because each event classifier needs to detect only one type of event, the time required to train the deep learning network is short and the detection accuracy of the trained network is high.

<Example 3>
This example of the present invention further provides an electronic device. FIG. 6 is a diagram showing the electronic device of Example 3 of the present invention. As shown in FIG. 6, the electronic device 600 includes a training device 601, which trains the deep learning network described in Example 1. The configuration and functions of the training device 601 are the same as those described in Example 2, and their description is omitted here.

FIG. 7 is a block diagram showing the system configuration of the electronic device of Example 3 of the present invention. As shown in FIG. 7, the electronic device 700 may include a central processing unit (central control unit) 701 and a storage device 702, the storage device 702 being connected to the central processing unit 701. The figure is merely illustrative; other types of structures may be used to supplement or replace this structure in order to implement telecommunication functions or other functions.

As shown in FIG. 7, the electronic device 700 may further include an input unit 703, a display 704, and a power supply 705.

１つの態様では、実施例２の訓練装置の機能は中央処理装置７０１に統合されてもよい。ここで、中央処理装置７０１は、深層学習ネットワークの前記畳み込み層のパラメータを訓練し、該深層学習ネットワークの前記畳み込み層のパラメータを維持したまま、該深層学習ネットワークの少なくとも２つのイベント分類器のパラメータを訓練するように構成されてもよい。 In one aspect, the functionality of the training device of Example 2 may be integrated into central processing unit 701 . Here, the central processing unit 701 trains the parameters of the convolutional layers of the deep learning network, and while maintaining the parameters of the convolutional layers of the deep learning network, trains the parameters of at least two event classifiers of the deep learning network. may be configured to train the

例えば、該少なくとも２つのイベント分類器のパラメータを訓練するステップは、該少なくとも２つのイベント分類器のパラメータを同時に訓練し、或いは該少なくとも２つのイベント分類器の各イベント分類器のパラメータをそれぞれ訓練するステップ、を含んでもよい。 For example, training parameters of the at least two event classifiers includes training parameters of the at least two event classifiers simultaneously or training parameters of each event classifier of the at least two event classifiers respectively. step.

例えば、中央処理装置７０１は、該深層学習ネットワークにイベント分類器が追加された場合、該畳み込み層及び該少なくとも２つのイベント分類器のパラメータを維持したまま、該深層学習ネットワークに追加された該イベント分類器のパラメータを単独で訓練するように構成されてもよい。 For example, when an event classifier is added to the deep learning network, the central processing unit 701 maintains the parameters of the convolutional layer and the at least two event classifiers while maintaining the event classifier added to the deep learning network. It may be configured to train the classifier parameters alone.

例えば、中央処理装置７０１は、該少なくとも２つのイベント分類器のうち１つ又は複数のイベント分類器が所定の条件を満たさない場合、該１つ又は複数のイベント分類器のパラメータを独立して調整するように構成されてもよい。 For example, central processing unit 701 may independently adjust parameters of one or more event classifiers of the at least two event classifiers if one or more event classifiers do not meet a predetermined condition. may be configured to

For example, the step of training the parameters of the at least two event classifiers may include training those parameters using at least two labels, each expressed as a binarized value and each corresponding to one of the at least two event classifiers.
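
One way to read this is that every training sample carries one 0/1 label per event classifier. The sketch below assumes that encoding; the event names are taken from the FIG. 1 example, while the function name and the input format (a list of the events present in a sample) are hypothetical.

```python
EVENTS = ["Normal", "Accident", "Jam", "Park", "Reverse"]


def binarize_labels(events_present, event_names=EVENTS):
    """Return one 0/1 label per event classifier for a single sample."""
    present = set(events_present)
    unknown = present - set(event_names)
    if unknown:
        raise ValueError(f"unknown events: {sorted(unknown)}")
    return {name: int(name in present) for name in event_names}


labels = binarize_labels(["Accident", "Jam"])
print(labels)
```

With this encoding a sample in which an accident and a jam occur together simply has two labels set to 1, and each classifier is trained against its own entry.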

In another aspect, the training device described in Example 2 may be configured separately from the central processing unit 701. For example, the training device may be a chip connected to the central processing unit 701, and the functions of the training device may be realized under the control of the central processing unit 701.

The electronic device 700 in this example does not need to include all the components shown in FIG. 7.

As shown in FIG. 7, the central processing unit 701, also referred to as a controller or operation control unit, may include a microprocessor or other processing device and/or logic device. The central processing unit 701 receives input and controls the operation of each part of the electronic device 700.

The storage device 702 may be, for example, one or more of a buffer, a flash memory, a hard disk, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. The central processing unit 701 may execute a program stored in the storage device 702 to store or process information. The other components are similar to those of the prior art, so their description is omitted here. Each part of the electronic device 700 may be implemented by dedicated hardware, firmware, software, or a combination thereof without departing from the scope of the present invention.

According to this example, each event classifier only needs to detect one type of event, so the time required to train the deep learning network is short and the detection accuracy of the trained deep learning network is high.

<Example 4>
An example of the present invention further provides a training method for the deep learning network used for event detection of Example 1; the training method corresponds to the training device of Example 2. FIG. 8 is a diagram showing the training method according to Example 4 of the present invention. As shown in FIG. 8, the method includes the following steps.

Step 801: Train the parameters of the convolutional layer of the deep learning network.

Step 802: While keeping the parameters of the convolutional layer of the deep learning network unchanged, train the parameters of the at least two event classifiers of the deep learning network.

In this example, the method may further include the following steps.

Step 803: When an event classifier is added to the deep learning network, train the parameters of the added event classifier alone while keeping the parameters of the convolutional layer and of the at least two event classifiers unchanged.

Step 804: When one or more of the at least two event classifiers do not satisfy a predetermined condition, independently adjust the parameters of those one or more event classifiers.

In this example, the specific implementation of each of the above steps is the same as that described in Example 2, and its description is omitted here.
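
The order of steps 801 to 804 can be summarized as a bookkeeping sketch that records which parameter groups are updated in each step. This is only an illustration under stated assumptions: the group names, the `ParamGroup` class, and the `run_phase` helper are hypothetical, the actual optimisation is replaced by a counter, and step 804 assumes that the "Jam" classifier is the one that failed the predetermined condition.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    name: str
    trainable: bool = False
    phases_trained: int = 0  # how many phases updated this group


def run_phase(groups, trainable_names):
    """Freeze everything except the named groups, then 'train' those once."""
    for group in groups.values():
        group.trainable = group.name in trainable_names
        if group.trainable:
            group.phases_trained += 1  # stand-in for a real optimisation loop


groups = {name: ParamGroup(name) for name in ("conv", "head:Accident", "head:Jam")}

run_phase(groups, {"conv"})                       # step 801
run_phase(groups, {"head:Accident", "head:Jam"})  # step 802: conv stays fixed
groups["head:Park"] = ParamGroup("head:Park")     # a classifier is added
run_phase(groups, {"head:Park"})                  # step 803: only the new one
run_phase(groups, {"head:Jam"})                   # step 804: assumes Jam failed

summary = {name: group.phases_trained for name, group in groups.items()}
trainable_now = [name for name, group in groups.items() if group.trainable]
print(summary)
print(trainable_now)
```

In this sketch the convolutional parameters are updated exactly once, in step 801, and every later phase touches only the classifiers it names.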

An example of the present invention further provides a computer-readable program that, when executed in a training device for a deep learning network or in an electronic device, causes a computer to execute, in that training device or electronic device, the training method for the deep learning network described in Example 4.

An example of the present invention further provides a storage medium storing a computer-readable program, the program causing a computer to execute, in a training device for a deep learning network or in an electronic device, the training method for the deep learning network described in Example 4.

The training method for the deep learning network executed in the training device or electronic device described with reference to the examples of the present invention may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIG. 5, or one or more combinations of those functional blocks, may correspond to software modules of a computer program flow or to hardware modules. The software modules may correspond respectively to the steps shown in FIG. 8. The hardware modules may be realized, for example, by implementing the software modules in hardware using a field-programmable gate array (FPGA).

A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known to those skilled in the art. The storage medium may be connected to the processor so that the processor can read information from and write information to the storage medium, or the storage medium may be a component of the processor. The processor and the storage medium may be located in an ASIC. The software module may be stored in the memory of a mobile terminal or on a memory card inserted into the mobile terminal. For example, if a device (such as a mobile terminal) uses a relatively large-capacity MEGA-SIM card or a large-capacity flash memory device, the software module may be stored on that MEGA-SIM card or flash memory device.

One or more of the functional blocks shown in FIG. 5, and/or one or more combinations of those functional blocks, may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof, for performing the functions described in this application. They may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a combination of multiple microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.

The present invention has been described above with reference to specific embodiments, but the above description is merely illustrative and does not limit the scope of protection of the present invention. Various variations and modifications may be made to the present invention without departing from its spirit and principle, and such variations and modifications also fall within the scope of the present invention.

In addition, the following supplementary notes are further disclosed with respect to the embodiments including the above-described examples.
(Supplementary Note 1)
A deep learning network used for event detection, comprising:
a data layer that reads input data;
a convolutional layer that extracts features from the input data read by the data layer; and
at least two event classifiers that, based on the features extracted by the convolutional layer, detect different events independently of each other and respectively output detection results for the different events.
(Supplementary Note 2)
The deep learning network according to Supplementary Note 1, wherein the at least two event classifiers have the same structure or different structures.
(Supplementary Note 3)
The deep learning network according to Supplementary Note 2, wherein the at least two event classifiers have the same structure, and
the event classifier includes a first fully connected layer, a second fully connected layer, and a long short-term memory (LSTM) layer provided between the first fully connected layer and the second fully connected layer.
(Supplementary Note 4)
The deep learning network according to Supplementary Note 1, wherein each of the event classifiers can be trained independently and/or can have its parameters adjusted independently.
(Supplementary Note 5)
The deep learning network according to Supplementary Note 1, wherein the event classifiers can be independently added to or deleted from the deep learning network.
(Supplementary Note 6)
A training device for the deep learning network according to Supplementary Note 1, comprising:
first training means for training the parameters of the convolutional layer of the deep learning network; and
second training means for training the parameters of the at least two event classifiers of the deep learning network while keeping the parameters of the convolutional layer of the deep learning network unchanged.
(Supplementary Note 7)
The device according to Supplementary Note 6, wherein the second training means trains the parameters of the at least two event classifiers simultaneously, or trains the parameters of each of the at least two event classifiers separately.
(Supplementary Note 8)
The device according to Supplementary Note 6, further comprising third training means for, when an event classifier is added to the deep learning network, training the parameters of the added event classifier alone while keeping the parameters of the convolutional layer and of the at least two event classifiers unchanged.
(Supplementary Note 9)
The device according to Supplementary Note 6, further comprising adjusting means for, when one or more of the at least two event classifiers do not satisfy a predetermined condition, independently adjusting the parameters of those one or more event classifiers.
(Supplementary Note 10)
The device according to Supplementary Note 6, wherein the second training means trains the parameters of the at least two event classifiers using at least two labels, each expressed as a binarized value and each corresponding to one of the at least two event classifiers.
(Supplementary Note 11)
A training method for the deep learning network according to Supplementary Note 1, comprising:
training the parameters of the convolutional layer of the deep learning network; and
training the parameters of the at least two event classifiers of the deep learning network while keeping the parameters of the convolutional layer of the deep learning network unchanged.
(Supplementary Note 12)
The method according to Supplementary Note 11, wherein the step of training the parameters of the at least two event classifiers includes
training the parameters of the at least two event classifiers simultaneously, or training the parameters of each of the at least two event classifiers separately.
(Supplementary Note 13)
The method according to Supplementary Note 11, further comprising, when an event classifier is added to the deep learning network, training the parameters of the added event classifier alone while keeping the parameters of the convolutional layer and of the at least two event classifiers unchanged.
(Supplementary Note 14)
The method according to Supplementary Note 11, further comprising, when one or more of the at least two event classifiers do not satisfy a predetermined condition, independently adjusting the parameters of those one or more event classifiers.
(Supplementary Note 15)
The method according to Supplementary Note 11, wherein the step of training the parameters of the at least two event classifiers includes
training the parameters of the at least two event classifiers using at least two labels, each expressed as a binarized value and each corresponding to one of the at least two event classifiers.
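
The structure in the first and fifth notes above (a data layer, a shared convolutional layer, and independently attachable event classifiers) can be sketched as follows. This is a structural illustration only: the class and function names are hypothetical, the data layer and convolutional layer are replaced by trivial stand-in functions, and each classifier is a toy logistic unit rather than the fully connected and LSTM layers described in the third note.

```python
import math


class EventDetectionNetwork:
    """Data layer -> shared feature extractor -> one classifier per event."""

    def __init__(self, data_layer, conv_layer):
        self.data_layer = data_layer
        self.conv_layer = conv_layer
        self.classifiers = {}

    def add_classifier(self, event, classifier):
        self.classifiers[event] = classifier

    def remove_classifier(self, event):
        del self.classifiers[event]

    def detect(self, raw_input):
        features = self.conv_layer(self.data_layer(raw_input))
        # Each classifier sees the same features and reports only its own event.
        return {event: clf(features) for event, clf in self.classifiers.items()}


def make_classifier(weights, bias):
    """Toy stand-in for an event classifier: a logistic unit over the features."""
    def classify(features):
        z = sum(w * f for w, f in zip(weights, features)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return classify


net = EventDetectionNetwork(
    data_layer=lambda raw: [float(v) for v in raw],   # stand-in reader
    conv_layer=lambda x: [sum(x) / len(x), max(x)],   # stand-in feature extractor
)
net.add_classifier("Accident", make_classifier([1.0, -0.5], 0.0))
net.add_classifier("Jam", make_classifier([-0.5, 1.0], 0.2))

results = net.detect([1, 2, 3])
net.remove_classifier("Jam")
results_after_removal = net.detect([1, 2, 3])
print(results)
print(results_after_removal)
```

Since each classifier produces its own score, the outputs are not forced to sum to 1, and removing one classifier leaves the output of the others unchanged.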

Claims

1. A deep learning network used for event detection, comprising:
a data layer that reads input data;
a convolutional layer that extracts features from the input data read by the data layer; and
at least two event classifiers that, based on the features extracted by the convolutional layer, detect different events independently of each other and respectively output detection results for the different events,
wherein each of the event classifiers detects only one event.

2. The deep learning network according to claim 1, wherein the at least two event classifiers have the same structure or different structures.

3. The deep learning network according to claim 2, wherein the at least two event classifiers have the same structure, and
the event classifier includes a first fully connected layer, a second fully connected layer, and a long short-term memory (LSTM) layer provided between the first fully connected layer and the second fully connected layer.

4. The deep learning network according to claim 1, wherein each of the event classifiers can be trained independently and/or can have its parameters adjusted independently.

5. The deep learning network according to claim 1, wherein the event classifiers can be independently added to or removed from the deep learning network.

6. A training device for the deep learning network according to claim 1, comprising:
first training means for training the parameters of the convolutional layer of the deep learning network; and
second training means for training the parameters of the at least two event classifiers of the deep learning network while keeping the parameters of the convolutional layer of the deep learning network unchanged.

7. The device according to claim 6, wherein the second training means trains the parameters of the at least two event classifiers simultaneously, or trains the parameters of each of the at least two event classifiers separately.

8. The device according to claim 6, further comprising third training means for, when an event classifier is added to the deep learning network, training the parameters of the added event classifier alone while keeping the parameters of the convolutional layer and of the at least two event classifiers unchanged.

9. The device according to claim 6, further comprising adjusting means for, when one or more of the at least two event classifiers do not satisfy a predetermined condition, independently adjusting the parameters of those one or more event classifiers.

10. The device according to claim 6, wherein the second training means trains the parameters of the at least two event classifiers using at least two labels, each expressed as a binarized value and each corresponding to one of the at least two event classifiers.