JP2023546189A

JP2023546189A - Classification device, control device, classification method, control method and program

Info

Publication number: JP2023546189A
Application number: JP2023523666A
Authority: JP
Inventors: アレクサンダーフィーヴァイダー
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2020-10-29
Filing date: 2020-10-29
Publication date: 2023-11-01
Anticipated expiration: 2040-10-29
Also published as: WO2022091304A1

Abstract

An object of the present disclosure is to provide a classification device capable of providing assistance to humans. The classification device includes: a generation means (11) for determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region; a classification means (12) for classifying the partial video data generated by the generation means; and a modification means (13) for modifying the predetermined algorithm based on an evaluation of the classification performed by the classification means. [Selected drawing] FIG. 1

Description

The present disclosure relates to a classification device, a control device, a classification method, a control method, and a non-transitory computer-readable medium.

Image analysis and video analysis techniques have developed rapidly.

For example, Patent Document 1 discloses a display control device that can create thumbnails of scenes. Specifically, the display control device produces a clustering result by clustering each frame of the content, and displays thumbnails. The scene classification unit 612 of the display control device classifies the frames belonging to a cluster of interest into scenes, each scene having a frame group of one or more frames. The thumbnail creation unit 613 of the display control device creates a thumbnail for each scene based on the scene information from the scene classification unit 612.

Japanese Patent No. 5533861

In recent years, techniques have been developed for supporting human activities with machines (e.g., computers and support robots). In such techniques, it is important for the machine to detect and classify human motion sequences in order to provide the assistance the human wants.

An object of the present disclosure is to provide a classification device, a control device, a classification method, a control method, and a non-transitory computer-readable medium capable of providing human assistance (to be understood as assistance to humans).

In a first exemplary aspect, a classification device includes: a generation means for determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region; a classification means for classifying the partial video data generated by the generation means; and a modification means for modifying the predetermined algorithm based on an evaluation of the classification performed by the classification means.

In a second exemplary aspect, a control device includes: a recognition means for recognizing video data that includes an operation and thereby determining the operation; and a controller for determining a motion of a machine in response to the determined operation and controlling the machine in accordance with the determined operation.

In a third exemplary aspect, a classification method includes: determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region; classifying the partial video data; and modifying the predetermined algorithm based on an evaluation of the classification.

In a fourth exemplary aspect, a control method includes: recognizing video data that includes an operation and thereby determining the operation; and determining a motion of a machine in response to the determined operation and controlling the machine in accordance with the determined operation.

A fifth exemplary aspect is a non-transitory computer-readable medium storing a program that causes a computer to execute: determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region; classifying the partial video data; and modifying the predetermined algorithm based on an evaluation of the classification.

A sixth exemplary aspect is a non-transitory computer-readable medium storing a program that causes a computer to execute: recognizing video data that includes an operation and thereby determining the operation; and determining a motion of a machine in response to the determined operation and controlling the machine in accordance with the determined operation.

According to the present disclosure, it is possible to provide a classification device, a control device, a classification method, a control method, and a non-transitory computer-readable medium capable of providing assistance to humans.

FIG. 1 is a block diagram of a classification device according to Embodiment 1.
FIG. 2 is a flowchart showing a video data classification method according to Embodiment 1.
FIG. 3 is a block diagram of a control device according to Embodiment 2.
FIG. 4 is a flowchart showing a method for controlling a machine according to Embodiment 2.
FIG. 5 is a block diagram of a classification system according to Embodiment 3.
FIG. 6 is a block diagram of a generation unit according to Embodiment 3.
FIG. 7 is a graph showing an example of an intensity signal of video data according to Embodiment 3.
FIG. 8A is a diagram showing an example of the human motion in each subsequence according to Embodiment 3.
FIG. 8B is a table showing examples of classifications and category labels corresponding to subsequences according to Embodiment 3.
FIG. 9 is a graph showing an example of an intensity signal of video data according to Embodiment 3.
FIG. 10 is a table showing examples of classifications and category labels corresponding to subsequences according to Embodiment 3.
FIG. 11 is a schematic diagram of feedback processing according to Embodiment 3.
FIG. 12 is a graph showing an example of how the number of reasonable classification solutions changes according to Embodiment 3.
FIG. 13 is a block diagram of an intention detection system according to Embodiment 4.
FIG. 14 is a block diagram of a machine including the intention detection system according to Embodiment 5.
FIG. 15 is a diagram showing an example of a picking robot including the intention detection system according to Embodiment 5.
FIG. 16A is a diagram showing an example of processing by the picking robot as instructed by a human gesture according to Embodiment 5.
FIG. 16B is a diagram showing another example of processing by the picking robot as instructed by a human gesture according to Embodiment 5.
FIG. 17 is a configuration diagram of an information processing device according to the embodiments.

(Embodiment 1)
Embodiment 1 of the present disclosure will be described below with reference to the drawings. Referring to FIG. 1, the classification device 10 includes a generation unit 11, a classification unit 12, and a modification unit 13. The classification device 10 may be applied to various computers or machines capable of handling video data. For example, the classification device 10 may be implemented as a personal computer, a video recorder, a robot, a machine, a television, a mobile phone, or the like.

The generation unit 11 determines a specific time region of video data based on a predetermined algorithm and generates partial video data by extracting the video data in that time region. The video data has a certain duration, and the specific time region lies within that duration. The video data may be a sequence of image data; that is, the video data may have a plurality of frames. The generation unit 11 may analyze the content of the video data using the predetermined algorithm and set the specific time region. The video data may be stored in a memory within the classification device 10 or may be input to the generation unit 11 from outside the classification device 10. The predetermined algorithm may likewise be stored in a memory within the classification device 10.

The classification unit 12 classifies the partial video data generated by the generation unit 11. The classification can be expressed using numbers, text, or the like. The classification may relate to human motions, such as gestures or specific scenes of television programs or movies, but is not limited to these.

The modification unit 13 modifies the predetermined algorithm based on an evaluation of the classification performed by the classification unit 12. The evaluation may be processed by a component within the classification device 10 or by a device outside the classification device 10.

FIG. 2 is a flowchart showing an example of the processing executed by the classification device 10 according to Embodiment 1. The processing executed by the classification device 10 is described below.

First, the generation unit 11 determines a specific time region of the video data based on the predetermined algorithm (step S11). Next, the generation unit 11 generates partial video data by extracting the video data in the specific time region (step S12). This partial video data may represent one scene and show one kind of human motion, but is not limited to this.

Next, the classification unit 12 classifies the partial video data generated by the generation unit 11 (step S13). Through this processing, the classification unit 12 may sort various pieces of partial video data into a plurality of categories.

Thereafter, the modification unit 13 modifies the predetermined algorithm as necessary, based on the evaluation of the classification performed by the classification unit 12 (step S14). As a result of modifying the predetermined algorithm, the specific time region may change in accordance with the evaluation result. The generation unit 11 can therefore generate partial video data that contains the scene to be classified more accurately. For example, if the purpose of the classification is to classify human motion subsequences, the classification device 10 can determine an appropriate time region for partial video data that represents a single human motion subsequence. Because the partial video data then shows exactly one human motion subsequence, the classification unit 12 can classify the partial video data more accurately.
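Steps S11 to S14 form a feedback loop, which can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not specify the predetermined algorithm, the classifier, or the evaluation, so the fixed-length window (`WindowAlgorithm`), the majority-vote classifier, the agreement score, and the "shift by one frame" modification are all hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class WindowAlgorithm:
    """Hypothetical stand-in for the predetermined algorithm: a fixed window."""
    start: int
    length: int


def generate(video: Sequence[int], algo: WindowAlgorithm) -> List[int]:
    """Steps S11-S12: choose the time region and extract the partial video data."""
    return list(video[algo.start:algo.start + algo.length])


def classify(partial: Sequence[int]) -> str:
    """Step S13 (toy): 'motion' if at least half of the frames are active."""
    return "motion" if sum(partial) / len(partial) >= 0.5 else "rest"


def evaluate(partial: Sequence[int], label: str) -> float:
    """Toy evaluation: share of frames that agree with the assigned label."""
    wanted = 1 if label == "motion" else 0
    return sum(1 for frame in partial if frame == wanted) / len(partial)


def modify(algo: WindowAlgorithm) -> WindowAlgorithm:
    """Step S14 (toy): shift the window one frame later."""
    return replace(algo, start=algo.start + 1)


def run(video: Sequence[int], algo: WindowAlgorithm,
        threshold: float = 0.9, max_rounds: int = 10) -> Tuple[WindowAlgorithm, str, float]:
    """Repeat generate -> classify -> evaluate, modifying the algorithm when needed."""
    for _ in range(max_rounds):
        partial = generate(video, algo)
        label = classify(partial)
        score = evaluate(partial, label)
        if score >= threshold:
            break
        algo = modify(algo)
    return algo, label, score


# Made-up data: 1 marks a frame that contains the motion of interest.
video = [0, 0, 0, 1, 1, 1, 1, 0, 0]
final_algo, label, score = run(video, WindowAlgorithm(start=0, length=4))
```

With this made-up input, the window moves from frame 0 to frame 3, where it covers exactly the four active frames and the toy evaluation reaches 1.0.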

(Embodiment 2)
Embodiment 2 of the present disclosure will be described below with reference to the drawings. Referring to FIG. 3, the control device 14 includes a recognition unit 15 and a controller 16. The control device 14 may be applied to devices mounted on various computers or machines, for example a robot that assists humans.

The recognition unit 15 recognizes video data that includes an operation and thereby determines the operation. The video data may show a human motion, and the human motion may be an operation on an object. For example, the operation may be grasping an object or placing an object. Such a gesture can instruct a robot to perform some processing, and may be implicit or explicit. The video data may be classified as described in Embodiment 1.

The controller 16 determines a motion of the machine in response to the determined operation and controls the machine in accordance with the determined operation. The machine may be, but is not limited to, one that includes the control device 14.

FIG. 4 is a flowchart showing an example of the processing executed by the control device 14 according to Embodiment 2. The processing executed by the control device 14 is described below.

First, the recognition unit 15 recognizes video data that includes an operation (step S15). As noted above, the operation may be a human motion. Next, the recognition unit 15 determines the operation by recognizing the video data (step S16).

The controller 16 then determines a motion of the machine in response to the determined operation (step S17). Thereafter, the controller 16 controls the machine in accordance with the determined operation (step S18). For example, when a user performs an operation, the recognition unit 15 understands what the user wants the machine to do, and the controller 16 can control the machine as instructed by the user or by other input. In other words, through this processing the control device 14 can control the machine by recognizing the human's intention.
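The flow of steps S15 to S18 can be sketched in Python as below. The sketch rests on several assumptions not found in the source: per-frame operation labels are taken as already available, recognition is a simple majority vote, and the operation-to-motion table (`MOTION_FOR_OPERATION`), the motion names, and the `LoggingMachine` class are hypothetical placeholders for a real recognizer and a real machine interface.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical lookup table from a recognized human operation to a machine motion.
MOTION_FOR_OPERATION: Dict[str, str] = {
    "grasp_object": "release_object_to_user",
    "place_object": "pick_up_placed_object",
}


def recognize(frame_labels: List[str]) -> str:
    """Steps S15-S16 (toy): take the most frequent per-frame label as the operation."""
    return Counter(frame_labels).most_common(1)[0][0]


def decide_motion(operation: str) -> str:
    """Step S17: map the operation to a machine motion, defaulting to standing still."""
    return MOTION_FOR_OPERATION.get(operation, "hold_position")


class LoggingMachine:
    """Stand-in for a robot: it only records the commands it receives."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def execute(self, motion: str) -> None:
        self.commands.append(motion)


def control(frame_labels: List[str], machine: LoggingMachine) -> Tuple[str, str]:
    """Steps S15-S18 in sequence: recognize, decide, and command the machine."""
    operation = recognize(frame_labels)
    motion = decide_motion(operation)
    machine.execute(motion)  # step S18
    return operation, motion


machine = LoggingMachine()
result = control(["grasp_object", "grasp_object", "place_object"], machine)
```

Keeping the decision in a plain table separates "what the human did" from "what the machine should do", so either side can be replaced without touching the other.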

The control device 14 according to Embodiment 2 can reduce the system integration functions required in machines such as robots and computers.

The recognition unit 15 can be realized by the functions of the classification unit 12 and/or the modification unit 13 in FIG. 1. The recognition unit 15 may also be realized by the functions of the preprocessing unit 21, the generation unit 22, the classification unit 23, the mapping unit 24, and/or the modification unit 25 in FIG. 5. The recognition unit 15 may be realized by the functions of the calculation unit 26, the signal analysis unit 27, the determination unit 28, and/or the subsequence generation unit 29 in FIG. 6. Furthermore, the recognition unit 15 may be realized by the functions of the human-object analysis unit 31 and/or the intention detection unit 32 in FIG. 13. The recognition unit 15 may be realized by a pattern recognition algorithm and/or an image recognition algorithm from the field of computer vision. The controller 16 can be realized by the functions of the signal generator 41 and/or the optimizer controller 42 in FIG. 14. Details of FIGS. 5, 6, 13, and 14 are described later.

(Embodiment 3)
Embodiment 3 of the present disclosure will be described below with reference to the drawings. Embodiment 3 is a specific example of Embodiment 1.

First, the configuration and processing of the classification system according to Embodiment 3 are described. Referring to FIG. 5, the classification system 20 includes a preprocessing unit 21, a generation unit 22, a classification unit 23, a mapping unit 24, a modification unit 25, and a database (DB). The classification system 20 may be provided, for example, as a module of a machine or robot. The classification system 20 may receive raw video data from a sensory input or from an imaging section (not shown in FIG. 5), for example a video camera. The imaging section can capture frames of a person at regular intervals.

The preprocessing unit 21 receives the raw video data and preprocesses it. Specifically, the preprocessing unit 21 reduces the information contained in the raw data and generates preprocessed video data (hereinafter simply called video data) that contains the information relevant to the classification, which is performed by the classification unit 23. For example, the preprocessing unit 21 can reduce a sequence of irregularly sampled high-resolution frames to frames with a small number of data points that carry the relevant information. The relevant information may include characteristic body points of the person being filmed. It may also include the relationship between the person and an object that the person is working on or that is located near the person.

The preprocessing unit 21 outputs the video data to the generation unit 22. The preprocessing unit 21 may be realized by a combination of preprocessing software and a processor in the classification system 20.

The generation unit 22 receives the video data from the preprocessing unit 21 and generates subsequences (partial video data) by extracting the video data in specific time regions. To do so, the generation unit 22 determines the specific time regions of the video data based on a predetermined algorithm. In other words, the generation unit 22 may act as a segmentation generator that divides the video into a plurality of subsequences.

FIG. 6 is a block diagram of the generation unit 22. The generation unit 22 includes a calculation unit 26, a signal analysis unit 27, a determination unit 28, and a subsequence generation unit 29. The detailed processing in the generation unit 22 is described below.

The calculation unit 26 calculates an intensity signal of the video data, which is used to determine the length and position of each subsequence within the video data (that is, the specific time region). The intensity signal indicates the motion of the person. More precisely, the intensity signal condenses the dynamic motion of the moving person into a scalar signal, on which feature points are then determined. The calculation unit 26 calculates the intensity signal using the predetermined algorithm, which may be expressed as formulas and/or rules. The calculation unit 26 outputs the intensity signal to the signal analysis unit 27.

The signal analysis unit 27 analyzes the intensity signal and identifies candidate points in the intensity. The candidate points are candidates for the feature points used to determine the length and position of the subsequences in the video data.

The determination unit 28 determines the feature points from the candidate points identified by the signal analysis unit 27. The signal analysis unit 27 and the determination unit 28 perform this processing using a rule base included in the predetermined algorithm. In this way, the feature points in the video data are derived.

The subsequence generation unit 29 uses the feature points determined by the determination unit 28 to determine the length and position of each subsequence in the video data. The subsequence generation unit 29 determines these using generation rules included in the predetermined algorithm. The subsequence generation unit 29 then generates the subsequences from the sequence of frames (that is, the video data).

In summary, the generation unit 22 can generate a series of subsequences from the sequence of frames of the video data, based on a predetermined algorithm that includes generation rules and a suitable rule base. The data of each subsequence is used by the classification unit 23 as a candidate motion subsequence. The generation unit 22 outputs the generated subsequences to the classification unit 23.
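The roles of the calculation unit 26, the signal analysis unit 27, and the determination unit 28 can be illustrated with the Python sketch below. The source does not give the intensity formula or the rule base, so everything concrete here is an assumption: the intensity d_k is taken as the summed displacement of the body points between consecutive frames, candidate points are interior local minima, and the rule base keeps candidates whose intensity is at most `max_intensity` and that lie at least `min_gap` frames apart.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Frame = Sequence[Point]  # body points of the person in one frame


def intensity_signal(frames: Sequence[Frame]) -> List[float]:
    """Calculation unit 26 (assumed formula): d_k is the summed displacement of
    all body points between frame k-1 and frame k, with d_0 set to 0."""
    signal = [0.0]
    for previous, current in zip(frames, frames[1:]):
        signal.append(sum(math.dist(p, q) for p, q in zip(previous, current)))
    return signal


def candidate_points(signal: Sequence[float]) -> List[int]:
    """Signal analysis unit 27 (assumed rule): interior local minima."""
    return [
        k for k in range(1, len(signal) - 1)
        if signal[k] < signal[k - 1] and signal[k] <= signal[k + 1]
    ]


def feature_points(signal: Sequence[float], candidates: Sequence[int],
                   max_intensity: float, min_gap: int) -> List[int]:
    """Determination unit 28 (assumed rule base): keep candidates that are quiet
    enough and far enough from the previously accepted feature point."""
    chosen: List[int] = []
    for k in candidates:
        quiet = signal[k] <= max_intensity
        separated = not chosen or k - chosen[-1] >= min_gap
        if quiet and separated:
            chosen.append(k)
    return chosen


# Made-up intensity values standing in for a real recording.
d = [0.0, 3.0, 5.0, 1.0, 4.0, 6.0, 3.5, 5.0, 0.5, 2.0]
candidates = candidate_points(d)
features = feature_points(d, candidates, max_intensity=2.0, min_gap=2)
```

In this form, feedback from the modification unit could act on the parameters `max_intensity` and `min_gap` or on the intensity formula itself, which changes where the subsequences start and end.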

Note also that the predetermined algorithm can be modified through feedback from the modification unit 25. When the predetermined algorithm is modified, the way the calculation unit 26 calculates the intensity signal changes, and/or the way at least one of the signal analysis unit 27 and the determination unit 28 determines the feature points changes. As a result, the length and/or position of the subsequences within the video data is adjusted so that a more accurate classification is obtained. This modification process is described in detail below.

The classification unit 23 receives the subsequences and classifies each subsequence (partial video data) as a human motion. The classification unit 23 assigns a classification number to each classified subsequence. The classification unit 23 can also assign a text label to each classified subsequence by accessing the mapping unit 24 and/or the DB. The subsequences are classified into clusters of human motions. The classification unit 23 outputs the subsequences together with their classification numbers and text labels for further processing.

Furthermore, to treat a single subsequence as a candidate for classification, the classification unit 23 may derive one (or, temporarily, more than one) classification solution. The classification unit 23 performs this processing for each subsequence; the classification solution is what the classification unit 23 needs in order to classify the subsequences.

The DB functions as a library and stores the classification solutions and classification numbers generated by the classification unit 23. The classification unit 23 can access the DB and classify subsequences using the stored classification solutions and classification numbers.

The mapping unit 24 obtains text information related to the classification, such as documents, from a database and/or the Internet. The mapping unit 24 also obtains text information provided, in particular, by a user of the classification system 20. The mapping unit 24 processes the text information and generates a mapping to descriptions of the vocabulary used for the classification. In addition to a processor and a memory, the mapping unit 24 may include an input unit and/or a network interface. The classification unit 23 can access the mapping unit 24 and refer to the mapping in order to improve the accuracy with which subsequences and categories are determined. In other words, the classification process performed by the classification unit 23 is supported by mapping the categories into the language domain generated by the mapping unit 24. More specifically, for categories that have already been identified but so far labeled only with classification numbers, the classification unit 23 assigns to the subsequences human-understandable text labels that describe the motion pattern as accurately as possible. This is also useful when the classification unit 23 cannot determine the category using the text labels of adjacent subsequences.
The main purpose of using text information is to strengthen the classification capability and to make the system's reasoning understandable when erroneous results of the categorizer require additional tuning or correction.
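The interplay of the classification unit 23, the DB, and the mapping unit 24 can be pictured with the Python sketch below. It is a sketch under assumptions only: the source does not say how a classification solution is represented, so each category is modeled here as a stored reference vector (nearest-centroid classification), the two-dimensional feature vectors are invented, and the text labels are illustrative examples of human-readable labels.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Tuple[float, float]

# Hypothetical library (the DB): classification number -> stored reference vector.
LIBRARY: Dict[int, Vector] = {
    1: (1.0, 0.0),
    2: (0.0, 1.0),
    3: (0.0, 0.0),
}

# Hypothetical mapping (mapping unit 24): classification number -> text label.
TEXT_LABELS: Dict[int, str] = {
    1: "raise left arm",
    2: "pass object",
}


def classify(feature: Sequence[float], library: Dict[int, Vector]) -> int:
    """Return the classification number whose reference vector is closest."""
    return min(library, key=lambda number: math.dist(feature, library[number]))


def text_label(number: int, mapping: Dict[int, str]) -> Optional[str]:
    """Look up the human-readable label; None means only the number is known."""
    return mapping.get(number)


number = classify((0.9, 0.1), LIBRARY)
label = text_label(number, TEXT_LABELS)
unlabeled = text_label(classify((0.1, 0.1), LIBRARY), TEXT_LABELS)
```

Keeping numbers and text labels in separate tables mirrors the description: a category can exist with only a classification number, and a text label can be attached later once the mapping provides one.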

The modification unit 25 determines an evaluation value for a given classification solution. The evaluation value of a classification solution may indicate how suitable the corresponding classification is for the processing steps that follow classification. One example of such a subsequent processing step is intention detection. The evaluation value may indicate how suitable the corresponding classification is for predicting the action or event that follows the human motion shown by the corresponding subsequence.

The evaluation value of an obtained classification solution can be judged using one or more indices. A first example of an index is how well the classification solution (that is, the classification unit 23) classifies elements that are already known to belong to the same category as members of the same category. A second example is an index that describes the deviation from a predetermined number of categories for the defined problem. That is, this index shows how far the solution deviates from the optimal number of categories, which is assumed to be known for the defined problem. For example, the less appropriate the obtained classification solution, the larger this index becomes. A third example is an index that describes how well an overall system that includes the classification system 20 accomplishes its overall task using the classification solution. This is one of the most important indices, and also the most difficult one to use for improving the classification. An example of the overall system is described later.

The modification unit 25 evaluates the classification using at least one of these indices. The indices are not limited to these examples, however. An index may have various parameters for defining the correctness or appropriateness of a classification solution. If the evaluation value of the current classification solution does not satisfy a predetermined criterion for the index (for example, if the current classification solution is not good enough for the task under consideration), the modification unit 25 gives the generation unit 22 an appropriate instruction (feedback) to change the predetermined algorithm. Specifically, if the modification unit 25, taking the evaluation of the classification into account, judges that a certain category is not appropriate, it sends an instruction indicating that the part of the predetermined algorithm corresponding to that category should be modified. The predetermined algorithm is modified based on the instruction, which changes at least one of the calculation method of the calculation unit 26, the analysis method of the signal analysis unit 27, the determination method of the determination unit 28, and the generation method of the subsequence generation unit 29. As a result, the length and position of the subsequences in the video data can be changed.
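The first two indices and the feedback decision can be sketched in Python as follows. The source describes the indices only in words, so the concrete definitions are assumptions: the first index is computed as the share of known same-category pairs that receive the same classification number, the second as the absolute difference between the number of categories found and the expected number, and the thresholds `min_agreement` and `max_deviation` are hypothetical criteria.

```python
from itertools import combinations
from typing import Sequence


def same_category_agreement(assigned: Sequence[int],
                            known_groups: Sequence[Sequence[int]]) -> float:
    """Index 1 (assumed form): share of element pairs known to belong together
    that the classification solution puts into the same category."""
    pairs = [pair for group in known_groups for pair in combinations(group, 2)]
    if not pairs:
        return 1.0
    agreeing = sum(1 for i, j in pairs if assigned[i] == assigned[j])
    return agreeing / len(pairs)


def category_count_deviation(assigned: Sequence[int], expected_categories: int) -> int:
    """Index 2 (assumed form): distance from the number of categories that is
    assumed to be optimal for the defined problem."""
    return abs(len(set(assigned)) - expected_categories)


def needs_modification(assigned: Sequence[int],
                       known_groups: Sequence[Sequence[int]],
                       expected_categories: int,
                       min_agreement: float = 0.8,
                       max_deviation: int = 0) -> bool:
    """Feedback decision: request a change to the predetermined algorithm when
    either index misses its (hypothetical) criterion."""
    return (
        same_category_agreement(assigned, known_groups) < min_agreement
        or category_count_deviation(assigned, expected_categories) > max_deviation
    )


# Classification numbers assigned to subsequences 0..n-1 (made-up examples).
good = [1, 1, 2, 2, 3]
poor = [1, 2, 2, 2, 3, 4]
groups = [[0, 1], [2, 3]]  # subsequences known to show the same motion
```

The third index, how well the overall system performs its task, is not covered here because it depends on the downstream system rather than on the classification output alone.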

Next, a specific example of human motion and the processing executed by the classification system 20 will be described with reference to FIGS. 7 to 8B. FIG. 7 shows an example of the intensity signal of video data. The time axis in FIG. 7 runs from frame number 0 to k_c, and there are two feature points, at frames k_A and k_B. As shown in FIG. 7, at each feature point the signal has an inflection point, in particular a local minimum, in the intensity d_k.

In the classification system 20, the calculation unit 26 derives the graph of FIG. 7. The signal analysis unit 27 analyzes this graph, finds the two feature points k_A and k_B, and sets them as candidate points. The determination unit 28 then determines the two points k_A and k_B to be feature points. The subsequence generation unit 29 uses the two determined points k_A and k_B to determine the length and position of each subsequence in the video data. In this example, the subsequence generation unit 29 generates subsequences (1), (2), and (3): subsequence (1) runs from frame number 0 to k_A, subsequence (2) from frame number k_A to k_B, and subsequence (3) from frame number k_B to k_c. As described above, the subsequences are defined by the two feature points k_A and k_B.
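The segmentation just described can be sketched as follows. This is only an illustration under simple assumptions: feature points are taken to be strict local minima of the intensity signal, and the sample signal and function names are hypothetical rather than taken from the disclosure.

```python
def local_minima(signal):
    """Frame indices k where signal[k] is strictly below both neighbours."""
    return [k for k in range(1, len(signal) - 1)
            if signal[k] < signal[k - 1] and signal[k] < signal[k + 1]]

def split_at(feature_points, last_frame):
    """(start, end) frame pairs delimited by frame 0, the feature points,
    and the last frame k_c."""
    bounds = [0, *feature_points, last_frame]
    return list(zip(bounds[:-1], bounds[1:]))

# Hypothetical intensity values d_k for frames 0..11 (k_c = 11).
d = [1.0, 3.0, 4.0, 2.0, 0.5, 2.5, 4.0, 3.0, 1.0, 2.0, 3.5, 2.0]

points = local_minima(d)                      # [4, 8], i.e. k_A = 4 and k_B = 8
subsequences = split_at(points, len(d) - 1)   # [(0, 4), (4, 8), (8, 11)]
```

The three resulting pairs play the role of subsequences (1), (2), and (3). Adjacent subsequences share their boundary frame, as in the description above.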

FIG. 8A shows an example of the human motion in each subsequence. As shown in FIG. 8A, subsequence (1) shows person P "raising the left arm," subsequence (2) shows person P "passing an object" (object O), and subsequence (3) shows person P "relaxing." The characteristic body points of these human motions are represented by the intensity signal in FIG. 7.

FIG. 8B shows an example of the categories and category labels corresponding to subsequences (1) to (3). The category of subsequence (1) is "mp31," that of subsequence (2) is "mp76," and that of subsequence (3) is "mp21." The classification unit 23 sets these category numbers using the DB. Likewise, the category label of subsequence (1) is "raise left arm," that of subsequence (2) is "pass the object," and that of subsequence (3) is "relax." The classification unit 23 sets these category labels using the text information generated by the mapping unit 24. In this way, the classification system 20 defines labels for the subsequences.

Next, an example in which the classification system 20 fails to classify a subsequence will be described with reference to FIGS. 9 and 10. The graph in FIG. 9 is the same as the graph in FIG. 7. However, because information that would help locate the feature points is lacking, the classification system 20 mistakenly determines the false points k_A' and k_B' to be feature points. As a result, the subsequence generation unit 29 generates subsequences (1)', (2)', and (3)': subsequence (1)' runs from frame number 0 to k_A', subsequence (2)' from frame number k_A' to k_B', and subsequence (3)' from frame number k_B' to k_c.

FIG. 10 shows an example of the categories and category labels corresponding to subsequences (1)' to (3)'. The classification unit 23 can correctly determine the categories and category labels of subsequences (1)' and (3)', but it cannot determine the category of subsequence (2)', so subsequence (2)' cannot be classified by the classification unit 23. In such a case, text inference supports the classification process and makes classification possible even when it would not be possible otherwise.

FIG. 11 is a schematic diagram of the feedback processing by the modification unit 25 in this situation. The modification unit 25 evaluates the classification result using the indicators and sends feedback to the generation unit 22. The feedback indicates that the predetermined algorithm needs to be modified with respect to the determination of feature points. On receiving the feedback, the generation unit 22 uses a feature point re-evaluation algorithm and adjusts the determination of the feature points according to the result of the re-evaluation. The generation unit 22 thereby moves the points k_A' and k_B' away from their original positions shown in FIG. 9 and sets them at the correct positions shown in FIG. 7.
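The feedback loop can be illustrated with a toy sketch. The disclosure does not specify the feature point re-evaluation algorithm, so everything here is an assumption: feature points are strict minima within a window of frames, a subsequence shorter than `min_length` stands in for one the classifier cannot classify, and the feedback simply widens the window until no unclassifiable subsequence remains.

```python
def feature_points(signal, window):
    """Frames holding the strict minimum within +/- window frames."""
    points = []
    for k in range(window, len(signal) - window):
        neighbours = signal[k - window:k] + signal[k + 1:k + window + 1]
        if all(signal[k] < v for v in neighbours):
            points.append(k)
    return points

def unclassified(points, last_frame, min_length):
    """Toy stand-in for the classifier and its evaluation: counts the
    subsequences that are too short to be classified."""
    bounds = [0, *points, last_frame]
    return sum(1 for a, b in zip(bounds[:-1], bounds[1:]) if b - a < min_length)

def refine(signal, min_length, max_window):
    """Feedback loop: modify the feature point determination (here, the
    window size) until every subsequence can be classified."""
    points = []
    for window in range(1, max_window + 1):
        points = feature_points(signal, window)
        if unclassified(points, len(signal) - 1, min_length) == 0:
            return window, points
    return max_window, points

# Hypothetical intensity signal with a shallow false minimum at frame 5.
d = [5, 4, 3, 1, 3, 2.5, 3, 4, 5, 4, 2, 0.5, 2, 4, 5]
window, points = refine(d, min_length=3, max_window=3)
# window == 2 and points == [3, 11]; the false point at frame 5 is dropped.
```

With `window=1` the false point at frame 5 yields a subsequence of length 2 that the toy classifier rejects. After one round of feedback, only the two deeper minima remain as feature points.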

For example, the modification unit 25 may process a feature space containing the pair of features 1 and 2 and classify the data points into a plurality of groups in different ways. Note, however, that the feature space is not limited to two dimensions.

FIG. 12 shows an example of how the number of reasonable classification solutions changes over time. The classification unit 23 determines which classification solutions are reasonable. At the start of FIG. 12, there is one reasonable classification solution. As time passes, the number becomes 2, 3, 2, 1, 2, and 1 in that order. In summary, the number may temporarily exceed one, but it converges to one over time. Because ambiguity in the classification should be reduced so that post-classification processing can be performed, it is desirable for the classification unit 23 to limit the number of classification solutions it uses to one.

To detect human motion, the related art may determine categories of human motion. However, when attempting to determine categories automatically from preprocessed data obtained by analyzing the frames of a movie showing human motion (for example, a person performing a specific task), the following problems can arise.

The first problem is that of deriving meaningful categories that describe the subsequences while using minimal information or no information at all. The categories should make sense from a practical point of view, meaning that the resulting classification is useful for the overall purpose of the technical system. This problem arises even when the correct lengths of the subsequences are known exactly. Valid criteria that describe the evaluation value of a classification solution need to be established.

The second problem is that of finding valid subsequences that can be mapped to categories and improving the determination of subsequences over time. This problem arises because, when there is little or no information, it is difficult to derive the length of even a single subsequence using only the preprocessed data. Furthermore, no guidance is available on how to generate the subsequences.

The third problem is that of using text information, such as databases or documents obtained from the Internet or, in particular, provided by a user, to improve the determination of subsequences and categories.

The classification system 20 can solve the problems described above. The first problem is solved by setting indicators for evaluating the classification performed by the classification unit 23. In the related art, when little relevant information is available and the classification system provides input to a further processing system (such as an intention detection system), the effect of the classification cannot be estimated directly. That is, evaluation is inherently difficult with respect to how well local intent can be detected with a given motion pattern classification system. By introducing an extended set of indicators, however, the classification system 20 can evaluate the characteristics of the classification.

Furthermore, based on the evaluation of the classification, the classification system 20 can modify the predetermined algorithm as needed to change how subsequences are generated (how candidate points are selected). That is, based on the evaluation value of the obtained classification solution, the calculation of the subsequence lengths may be adapted, for example by changing how the intensity signal is calculated or how the feature points are determined (for example, by a rule base). Accordingly, the classification solution is modified in accordance with the modification by the classification unit 23, and the modified classification solution is stored in the DB.

The second problem is solved by calculating the intensity signal in an adaptive manner and deriving the feature points of this signal, so that the relevant lengths of the subsequences are determined based on a predetermined adaptive algorithm.

The third problem is solved by introducing the DB and the mapping unit 24 into the classification system 20. These units enable the classification system 20 to generate categories and category labels using appropriate numbers and text information. In particular, the mapping unit 24 can generate mapping information by acquiring information on human motion from a database and/or the Internet, and the classification unit 23 can use the mapping information to improve the accuracy of the classification.

The categories can be learned automatically by the classification system 20. Even when a new motion subsequence is performed, a new category for the new motion can be determined without a large amount of data.

As described above, the classification system 20 can modify the predetermined algorithm based on the evaluation of the classification performed by the classification unit 23. The classification system 20 can therefore classify subsequences more accurately.

In addition, the preprocessing unit 21 can reduce the information contained in the raw video data and generate video data containing the information relevant to the classification. Processing related to the classification can therefore be performed in less time, and the accuracy of the classification can be increased.

Furthermore, the modification unit 25 can evaluate the classification using at least one of the following indicators: an indicator of how well the classification means classifies elements already known to belong to the same category as parts of that category, an indicator of the deviation from a predetermined number of categories for the defined problem, and an indicator of how well the system accomplishes its overall task. The classification system 20 can therefore evaluate the classification in a practical manner. Here, the system includes the classification device.

Furthermore, the classification unit 23 can classify a subsequence (partial video data) as a type of human motion. As a result, the classification system 20 can be used to detect human motion.

In particular, the generation unit 22 can calculate the intensity signal of the video data to determine the specific time region. Here, the intensity signal represents human motion. Because the characteristics of human motion can be defined as a simple intensity signal, the generation unit 22 can easily grasp those characteristics.

Furthermore, the classification unit 23 can assign the classified subsequences (partial video data) to text labels. A user of the classification system 20 can therefore easily recognize the classification results.

(Embodiment 4)
Embodiment 4 of the present disclosure will be described below with reference to the drawings.

FIG. 13 shows an intention detection system 30. The intention detection system 30 includes the classification system 20, a human object analysis unit 31, and an intention detection unit 32. In summary, the intention detection system 30 is the classification system coupled with an intention detection inference module. The processing of the units from the preprocessing unit 21 to the modification unit 25 is the same as that described in Embodiment 3, so its description is omitted. The human object analysis unit 31 and the intention detection unit 32 correspond to an example of the recognition unit 15 in Embodiment 2.

The human object analysis unit 31 analyzes the video data input by the preprocessing unit 21 and the subsequences generated by the generation unit 22, and detects various kinds of human body parts in the subsequences. The detected body parts are, for example, the head, the right or left arm, and the right or left leg. Preferably, the human object analysis unit 31 can detect the parts used for gestures that indicate an instruction. The human object analysis unit 31 outputs the detection result to the classification unit 23. The classification unit 23 uses the detection result to classify the subsequences, which improves the accuracy of the classification.

The intention detection unit 32 receives the classification result from the classification unit 23 and uses it to detect the intention of a person in the video data. In the present disclosure, an "intention" can represent an operation on a certain object. Such operations include, for example, grasping an object and putting down an object. When the intention detection system 30 is installed in a factory, the intention detection unit 32 can detect a worker's intention (for example, "expressing a wish to grasp a certain object," "expressing a wish to receive attention," or "expressing a wish to put down an object"). An "intention" can also represent an instruction for the operation of a machine. The operation of a machine can include, for example, moving, operating a part of the machine, or stopping these operations. The intention detection unit 32 outputs the result of the intention detection. Examples of the output include the estimated activity and/or gesture of the subject for the analyzed subsequence. The intention detection unit 32 may also predict the person's next action and/or gesture and output the prediction.

In this case, the intention detection unit 32 can detect human intention using the classified subsequences (partial video data). The intention detection system 30 can therefore be applied to systems that support human activity in various fields, such as the industrial and/or medical fields.

(Embodiment 5)
Embodiment 5 of the present disclosure will be described below with reference to the drawings. This embodiment describes a specific application of the intention detection system 30.

FIG. 14 shows a machine that includes the intention detection system 30. Specifically, the machine 40 includes the intention detection system 30, a sensor S, a signal generator 41, and an optimizer controller 42. The processing of the intention detection system 30 is the same as that described in Embodiment 4, so its description is omitted. An example of the machine 40 is a robot.

The sensor S acquires raw video data and inputs it to the preprocessing unit 21 in the intention detection system 30. For example, the sensor S may be a video sensor.

The signal generator 41 receives the output of the intention detection unit 32 in the intention detection system 30 and, taking this output into account, generates a control signal that controls the operation of the machine 40. For example, the signal generator 41 can determine the operation of the machine 40 according to the task determined by the intention detection unit 32 and control the machine 40 according to the determined task. As shown in FIG. 14, the signal generator 41 may also receive other input signals from other sensors and/or parts of the machine and take them into account when generating the control signal. The signal generator 41 functions as a controller of the machine 40. For example, if the machine can move on the ground, the signal generator 41 can function as a trajectory planner and generate a movement control signal together with the planned trajectory. The signal generator 41 can also receive a signal from a part of the machine 40 and generate a reference signal to control that part. The signal generator 41 outputs the generated signal to the optimizer controller 42. The optimizer controller 42 receives the control signal and processes it as an optimizer. This is how the machine 40 plans and controls its operation.

FIG. 15 shows a specific application in which the machine 40 is a picking robot. The picking robot R contains the intention detection system 30 and also includes a suction mechanism AM and a storage space. The suction mechanism AM sucks up items, and the items are stored in the storage space in accordance with the internal control of the picking robot R.

FIGS. 16A and 16B show examples of processing by the picking robot R when instructed by human gestures. They show a situation in a warehouse or factory in which a worker W wants to give instructions and commands to the picking robot R. The picking robot R can monitor the worker W, acquire video data, and recognize the worker's gestures. Through the processing described in Embodiments 3 and 4, the picking robot R classifies the worker's gesture and detects the worker's intention based on the classification. Using the result of the intention detection, the picking robot R can perform the desired task. The picking robot R may store the correspondence between detected gestures (i.e., instructions) of the worker W and the tasks the picking robot R performs. On detecting a gesture, the picking robot R may perform the desired task based on the stored correspondence.

For example, in FIG. 16A, the worker W extends the right arm toward a shelf S. FIG. 16A also shows that there are many different items on the shelf S. Before the worker W makes the gesture, the picking robot R does not collect the items on the shelf S. When the worker W makes the gesture, however, the picking robot R classifies it and determines that it corresponds to the process of sucking up the items on the shelf S. The signal generator 41 of the picking robot R then generates control signals that move the picking robot R to a position near the shelf S and cause the suction mechanism AM to suck up and collect the items on the shelf S.

As another example, in FIG. 16B, the worker W moves the left arm from the right side to the left side of the figure. The picking robot R classifies this gesture and determines that it corresponds to the process of stopping work and moving away from the shelf S. The signal generator 41 in the picking robot R then generates the control signals for performing these operations.
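The stored correspondence between gestures and tasks can be sketched as a simple lookup table. The gesture names, task names, and the dictionary-shaped control signal below are hypothetical placeholders chosen for illustration; the disclosure does not define a concrete data format.

```python
from typing import Optional

# Hypothetical stored correspondence: classified gesture -> robot task.
GESTURE_TO_TASK = {
    "extend_right_arm_toward_shelf": "collect_items_from_shelf",
    "sweep_left_arm_right_to_left": "stop_and_leave_shelf",
}

def task_for(gesture: str) -> Optional[str]:
    """Look up the task stored for a classified gesture, if any."""
    return GESTURE_TO_TASK.get(gesture)

def control_signal(gesture: str) -> dict:
    """Toy stand-in for the signal generator 41: turn the detected
    intention into a command, or stay idle for an unknown gesture."""
    task = task_for(gesture)
    if task is None:
        return {"command": "idle"}
    return {"command": task}

signal_a = control_signal("extend_right_arm_toward_shelf")
signal_b = control_signal("sweep_left_arm_right_to_left")
```

Falling back to an idle command for an unrecognized gesture is a design assumption of this sketch, in line with the robot not acting before the worker gestures.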

In the related art, markers are often required for giving instructions to a machine, even though attaching markers to a person can be burdensome. The present disclosure, however, describes an advanced machine learning system that is applicable to various machines and can provide a "markerless solution." The burden of attaching markers to a person can therefore be avoided.

In addition, the signal generator 41 (controller) controls the operation of the machine 40 based on the human intention detected by the intention detection unit 32. The machine 40 can therefore support the worker's work.

Note that the present invention is not limited to the above embodiments and may be modified as appropriate without departing from the spirit of the invention. For example, instead of the modification unit 25, another unit in the classification system 20 or a device outside the classification system 20 may evaluate the classification performed by the classification unit 23.

Because different human motions can overlap in time, the plurality of pieces of partial video data (or subsequences) generated in Embodiments 1 and 2 may overlap one another in time.

FIG. 8A shows the examples of person P "raising the left arm," "passing an object," and "relaxing." Needless to say, examples of human motion are not limited to these. For example, the human motions to be detected may include "raising the left arm near an object," "raising the right arm," "pointing with the index finger," and "making a special gesture with the hand."

The present disclosure can be applied to uses in which the main information of a data frame is summarized in a small number of specific points that are related in some way, lie in a two- or three-dimensional space, and change position in that space, and in which images of these points are given at specific time steps.

The present disclosure relates to classification systems, methods, and programs for various purposes that can classify motion patterns obtained from point data calculated from a sequence of regularly or irregularly sampled movie frames. This technical system helps to determine the motion patterns of a person performing an action and to classify the motion patterns accordingly. It may be applied to intention detection systems, in which correctly classified and labeled motion subsequences play an important role in further processing, such as planning assistance for a human. Specifically, it can be used in a variety of settings, such as factories, shopping malls, warehouses, cafeteria kitchens, and construction sites. It can also be used to analyze human motion in sports or other activities, and it can be applied to the characterization of very general dynamic patterns. However, the applications of the present disclosure are not necessarily limited to these fields.

Next, an example of the hardware configuration of the devices described in the above embodiments will be described with reference to FIG. 17.

FIG. 17 is a block diagram showing a configuration example of an information processing device. As shown in FIG. 17, the information processing device 90 includes a network interface 91, a processor 92, and a memory 93. The network interface 91 can transmit data to and receive data from other devices by wireless communication.

The processor 92 loads software (a computer program) from the memory 93 and executes it, thereby performing the processing of the information processing device 90 described with reference to the sequence diagrams and flowcharts in the above embodiments. The processor 92 may be, for example, a microprocessor, an MPU (Micro Processing Unit), or a CPU (Central Processing Unit). The processor 92 may include a plurality of processors.

The memory 93 is composed of a combination of volatile memory and nonvolatile memory. The memory 93 may include storage located apart from the processor 92. In that case, the processor 92 may access the memory 93 via an I/O interface (not shown).

In the example shown in FIG. 17, the memory 93 is used to store a group of software modules. The processor 92 reads the software modules from the memory 93 and executes them, and can thereby perform the processing of the information processing device described in the above embodiments.

As described above with reference to FIG. 17, each processor included in the information processing device of the above embodiments executes one or more programs containing a group of instructions that cause a computer to execute the algorithms described above with reference to the drawings.

Furthermore, the information processing device 90 may include a network interface. The network interface is used to communicate with other network node devices that make up a communication system. The network interface may include, for example, a network interface card (NIC) compliant with the IEEE 802.3 series. The information processing device 90 may use the network interface to receive an input feature map or transmit an output feature map.

In the above examples, the program can be stored and provided to a computer using any type of non-transitory computer-readable medium. Non-transitory computer-readable media include any type of tangible storage medium. Examples of non-transitory computer-readable media include magnetic storage media (e.g., floppy disks, magnetic tape, hard disk drives), magneto-optical storage media (e.g., magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), and semiconductor memories (e.g., mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory)). The program may also be provided to a computer using any type of transitory computer-readable medium. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can provide the program to a computer via a wired communication line (e.g., electric wires, optical fibers) or a wireless communication line.

Some or all of the above embodiments can be described as in the following supplementary notes, but the present disclosure is not limited thereto.
(Supplementary Note 1)
A classification device comprising:
generating means for determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region;
classification means for classifying the partial video data generated by the generating means; and
modification means for modifying the predetermined algorithm based on an evaluation of the classification performed by the classification means.
(Supplementary Note 2)
The classification device according to Supplementary Note 1, further comprising preprocessing means for reducing information contained in raw data and generating the video data including information relevant to the classification.
(Supplementary Note 3)
The classification device according to Supplementary Note 1 or 2, wherein the modification means evaluates the classification using at least one of: an index indicating how well the classification means classifies elements already known to belong to the same category as part of that category; an index indicating a deviation from a predetermined number of categories for a defined problem; and an index indicating how well a system including the classification device accomplishes an overall task.
(Supplementary Note 4)
The classification device according to any one of Supplementary Notes 1 to 3, wherein the classification means classifies the partial video data as a type of human motion.
(Supplementary Note 5)
The classification device according to Supplementary Note 4, wherein the generating means determines the specific time region by calculating an intensity signal of the video data, the intensity signal indicating human motion.
(Supplementary Note 6)
The classification device according to Supplementary Note 4 or 5, wherein the classification means assigns the classified partial video data to a text label.
(Supplementary Note 7)
The classification device according to any one of Supplementary Notes 4 to 6, further comprising intention detection means for detecting a human intention using the classified partial video data.
(Supplementary Note 8)
The classification device according to Supplementary Note 7, further comprising a controller that controls operation of a machine based on the human intention detected by the intention detection means.
(Supplementary Note 9)
A control device comprising:
recognition means for recognizing video data including a task and thereby determining the task; and
a controller that determines an operation of a machine according to the determined task and controls the machine according to the determined task.
(Supplementary Note 10)
The control device according to Supplementary Note 9, further comprising classification means for classifying the video data and inputting the classified video data to the recognition means.
(Supplementary Note 11)
The control device according to Supplementary Note 10, further comprising:
generating means for determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region; and
modification means for modifying the predetermined algorithm based on an evaluation of the classification performed by the classification means,
wherein the partial video data is recognized by the recognition means.
(Supplementary Note 12)
The control device according to Supplementary Note 11, wherein the modification means evaluates the classification using at least one of: an index indicating how well the classification means classifies elements already known to belong to the same category as part of that category; an index indicating a deviation from a predetermined number of categories for a defined problem; and an index indicating how well a system including the classification device accomplishes an overall task.
(Supplementary Note 13)
The control device according to any one of Supplementary Notes 10 to 12, wherein the classification means classifies the video data as a type of human motion.
(Supplementary Note 14)
A classification method comprising:
determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region;
classifying the partial video data; and
modifying the predetermined algorithm based on an evaluation of the classification.
(Supplementary Note 15)
A control method comprising:
recognizing video data including a task and thereby determining the task; and
determining an operation of a machine according to the determined task and controlling the machine according to the determined task.
(Supplementary Note 16)
A non-transitory computer-readable medium storing a program that causes a computer to execute:
determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region;
classifying the partial video data; and
modifying the predetermined algorithm based on an evaluation of the classification.
(Supplementary Note 17)
A non-transitory computer-readable medium storing a program that causes a computer to execute:
recognizing video data including a task and thereby determining the task; and
determining an operation of a machine according to the determined task and controlling the machine according to the determined task.
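
As an illustration only, the classification method in the notes above (determine a time region, extract partial video data, classify it, evaluate the classification, modify the algorithm) can be sketched as follows. Everything concrete here is an assumption made for the sketch and is not taken from the disclosure: the "predetermined algorithm" is reduced to a single intensity threshold, the classifier is a toy rule on mean intensity, the evaluation uses only the deviation from an expected number of categories, and the function names `find_regions`, `classify`, `category_deviation`, and `tune_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # half-open frame range [start, end)


def find_regions(intensity: Sequence[float], threshold: float) -> List[Region]:
    """Return maximal runs of frames whose intensity exceeds the threshold.

    Assumption: the "predetermined algorithm" is a simple threshold test.
    """
    regions: List[Region] = []
    start = None
    for i, value in enumerate(intensity):
        if value > threshold and start is None:
            start = i  # a region begins
        elif value <= threshold and start is not None:
            regions.append((start, i))  # the region ends before frame i
            start = None
    if start is not None:
        regions.append((start, len(intensity)))
    return regions


def classify(segment: Sequence[float]) -> str:
    """Toy placeholder classifier: label a segment by its mean intensity."""
    mean = sum(segment) / len(segment)
    return "large_motion" if mean >= 0.7 else "small_motion"


def category_deviation(labels: Sequence[str], expected_categories: int) -> int:
    """Evaluation index: deviation from the expected number of categories."""
    return abs(len(set(labels)) - expected_categories)


def tune_threshold(
    intensity: Sequence[float],
    candidates: Sequence[float],
    expected_categories: int,
) -> Tuple[float, List[Region], List[str]]:
    """Modify the algorithm: keep the candidate threshold with the best score."""
    best = None
    for threshold in candidates:
        regions = find_regions(intensity, threshold)
        labels = [classify(intensity[s:e]) for s, e in regions]
        score = category_deviation(labels, expected_categories)
        if best is None or score < best[0]:
            best = (score, threshold, regions, labels)
    return best[1], best[2], best[3]
```

For example, calling `tune_threshold([0.0, 0.2, 0.9, 0.8, 0.1, 0.4, 0.5, 0.1], [0.6, 0.3], 2)` selects the threshold 0.3, because only that threshold yields two regions with two distinct categories.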

Those skilled in the art will appreciate that many variations and/or modifications may be made to the present disclosure, as shown in the specific embodiments, without departing from the spirit or scope of the disclosure as broadly described. Accordingly, the present embodiments are to be considered in all respects illustrative and not restrictive.

10 Classification device
11 Generation unit
12 Classification unit
13 Modification unit
14 Control device
15 Recognition unit
16 Controller
20 Classification system
21 Preprocessing unit
22 Generation unit
23 Classification unit
24 Mapping unit
25 Modification unit
26 Calculation unit
27 Signal analysis unit
28 Determination unit
29 Subsequence generation unit
30 Intention detection system
31 Person/object analysis unit
32 Intention detection unit
40 Machine
41 Signal generator
42 Optimizer controller

FIG. 8B shows examples of the categories and category labels corresponding to subsequences (1) to (3). The category of subsequence (1) is "mp31", the category of subsequence (2) is "mp76", and the category of subsequence (3) is "mp21". The classification unit 23 sets these category numbers using the DB. Further, the category label of subsequence (1) is "raise left arm", the category label of subsequence (2) is "pass the object", and the category label of subsequence (3) is "relax". The classification unit 23 sets these category labels using the text information generated by the mapping unit 24. In this way, the classification system 20 defines labels for the subsequences.
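
The labeling in FIG. 8B can be pictured as two lookups: subsequence to category number, then category number to text label. The sketch below is a minimal illustration under assumptions. The dictionaries stand in for the DB and for the text information from the mapping unit 24, and the names `LabeledSubsequence`, `CATEGORY_DB`, `LABEL_MAP`, and `label_subsequences` are hypothetical. Only the category numbers and labels come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LabeledSubsequence:
    index: int      # subsequence number, e.g. 1 for subsequence (1)
    category: str   # category number, e.g. "mp31"
    label: str      # human-readable category label


# Assumed stand-in for the DB used by the classification unit 23.
CATEGORY_DB: Dict[int, str] = {1: "mp31", 2: "mp76", 3: "mp21"}

# Assumed stand-in for the text information generated by the mapping unit 24.
LABEL_MAP: Dict[str, str] = {
    "mp31": "raise left arm",
    "mp76": "pass the object",
    "mp21": "relax",
}


def label_subsequences(
    category_by_subsequence: Dict[int, str],
    label_by_category: Dict[str, str],
) -> List[LabeledSubsequence]:
    """Attach a text label to each subsequence via its category number."""
    return [
        LabeledSubsequence(index, category, label_by_category.get(category, "unknown"))
        for index, category in sorted(category_by_subsequence.items())
    ]
```

Calling `label_subsequences(CATEGORY_DB, LABEL_MAP)` returns the three rows of FIG. 8B in subsequence order. A category with no entry in the label map falls back to "unknown", which is a design choice of this sketch and not something the description specifies.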

Claims

A classification device comprising:
generating means for determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region;
classification means for classifying the partial video data generated by the generating means; and
modification means for modifying the predetermined algorithm based on an evaluation of the classification performed by the classification means.

The classification device according to claim 1, further comprising preprocessing means for reducing information contained in raw data and generating the video data including information relevant to the classification.

The classification device according to claim 1 or 2, wherein the modification means evaluates the classification using at least one of: an index indicating how well the classification means classifies elements already known to belong to the same category as part of that category; an index indicating a deviation from a predetermined number of categories for a defined problem; and an index indicating how well a system including the classification device accomplishes an overall task.

The classification device according to any one of claims 1 to 3, wherein the classification means classifies the partial video data as a type of human motion.

The classification device according to claim 4, wherein the generating means determines the specific time region by calculating an intensity signal of the video data, the intensity signal indicating human motion.

The classification device according to claim 4 or 5, wherein the classification means assigns the classified partial video data to a text label.

The classification device according to any one of claims 4 to 6, further comprising intention detection means for detecting a human intention using the classified partial video data.

The classification device according to claim 7, further comprising a controller that controls operation of a machine based on the human intention detected by the intention detection means.

A control device comprising:
recognition means for recognizing video data including a task and thereby determining the task; and
a controller that determines an operation of a machine according to the determined task and controls the machine according to the determined task.

The control device according to claim 9, further comprising classification means for classifying the video data and inputting the classified video data to the recognition means.

The control device according to claim 10, further comprising:
generating means for determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region; and
modification means for modifying the predetermined algorithm based on an evaluation of the classification performed by the classification means,
wherein the partial video data is recognized by the recognition means.

The control device according to claim 11, wherein the modification means evaluates the classification using at least one of: an index indicating how well the classification means classifies elements already known to belong to the same category as part of that category; an index indicating a deviation from a predetermined number of categories for a defined problem; and an index indicating how well a system including the control device accomplishes an overall task.

The control device according to any one of claims 10 to 12, wherein the classification means classifies the video data as a type of human motion.

A classification method comprising:
determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region;
classifying the partial video data; and
modifying the predetermined algorithm based on an evaluation of the classification.

A control method comprising:
recognizing video data including a task and thereby determining the task; and
determining an operation of a machine according to the determined task and controlling the machine according to the determined task.

A non-transitory computer-readable medium storing a program that causes a computer to execute:
determining a specific time region of video data based on a predetermined algorithm and generating partial video data by extracting the video data in the specific time region;
classifying the partial video data; and
modifying the predetermined algorithm based on an evaluation of the classification.

A non-transitory computer-readable medium storing a program that causes a computer to execute:
recognizing video data including a task and thereby determining the task; and
determining an operation of a machine according to the determined task and controlling the machine according to the determined task.