JP2023127176A

JP2023127176A - Instructor-side apparatus, method, and program

Info

Publication number: JP2023127176A
Application number: JP2022030795A
Authority: JP
Inventors: 正睦渕上; Masamutsu Fuchigami
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2022-03-01
Filing date: 2022-03-01
Publication date: 2023-09-13

Abstract

PROBLEM TO BE SOLVED: To provide a technique that enables an instructor to give instructions to a worker more effectively. SOLUTION: The instructor-side apparatus includes a recognition unit that recognizes a specific input pattern from first sensor data; a switching signal output unit that outputs, to a worker-side apparatus, a switching signal between a still image and a moving image based on the recognition of the input pattern; and an image display control unit that controls display of a gesture on a display based on gesture information of an instructor, and controls display, on the display, of the still image or moving image transmitted from the worker-side apparatus based on the switching signal. SELECTED DRAWING: Figure 1

Description

The present invention relates to an instructor-side device, a method, and a program.

Known conventional remote work support techniques include displaying instructions on a small display and displaying instructions on a transmissive display. With either technique, however, conventional remote work support generally displays instructions two-dimensionally.

In recent years, on the other hand, the development of AR (Augmented Reality) technology and the like has brought work manuals that can be displayed three-dimensionally. Such a work manual is created in advance and can be displayed three-dimensionally using AR technology. This is making it possible for workers to understand their work in three dimensions. However, even when a work manual is displayed three-dimensionally using AR technology, the manual itself was created in advance, so it is difficult to provide work support suited to the work situation at that moment.

Meanwhile, in remote work support, it has already been shown that combining output of the voice input by the instructor with hand gesture display is effective (see, for example, Non-Patent Document 1). However, applying gesture display to AR technology as-is leaves points that need improvement.

For example, even when a gesture is displayed three-dimensionally using AR technology, if the gesture does not change as the worker moves and the worker can only see the gesture from the same viewpoint, the worker may find it difficult to understand the instruction given by the gesture. On the other hand, if the gesture always changes as the worker moves, the viewpoint from which the instructor sees the gesture and the viewpoint from which the worker sees the gesture will frequently fail to match, so the instructor may find it difficult to give instructions by gesture.

Non-Patent Document 1: Shunsuke Ichihara, Yusuke Suzuki, "Development and issues of a remote work support system with hand gesture transmission function", Information Processing Society of Japan, Interaction 2019, 2B-36

It is therefore desirable to provide a technique that enables an instructor to give instructions to a worker more effectively.

In order to solve the above problem, according to one aspect of the present invention, there is provided an instructor-side device comprising: a recognition unit that recognizes a specific input pattern from first sensor data; a switching signal output unit that outputs, to a worker-side device, a switching signal between a still image and a moving image based on the recognition of the input pattern; and an image display control unit that controls display of a gesture on a display based on gesture information of an instructor, and controls display, on the display, of the still image or moving image transmitted from the worker-side device based on the switching signal.

The first sensor data may include a gesture, and the specific input pattern may include a first gesture pattern.

The first sensor data may include voice, and the specific input pattern may include a specific voice pattern.

The recognition unit may recognize a second gesture pattern from the instructor's gesture, and when the second gesture pattern is recognized, the switching signal output unit need not output the switching signal to the worker-side device even if the specific voice pattern is recognized.

The recognition unit may recognize a third gesture pattern from the instructor's gesture, and when the third gesture pattern is not recognized, the switching signal output unit need not output the switching signal to the worker-side device even if the specific voice pattern is recognized.

The switching signal output unit may output a switching signal from a still image to a moving image to the worker-side device based on recognition of an input pattern indicating switching from a still image to a moving image.

The switching signal output unit may output a switching signal from a moving image to a still image to the worker-side device based on recognition of an input pattern indicating switching from a moving image to a still image.

When the worker's behavior recognized from second sensor data indicates that the worker is moving, the switching signal output unit need not output a switching signal from a moving image to a still image to the worker-side device even if an input pattern indicating switching from a moving image to a still image is recognized.
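The optional conditions above can be read as a small gating function in front of the switching signal output. The following is a minimal sketch only, not the claimed implementation: the function name, the enum, and every parameter are hypothetical, and combining all of the optional conditions in one function is an assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class Switch(Enum):
    """Direction of the switching signal (names are illustrative)."""
    TO_MOVING = "still_to_moving"
    TO_STILL = "moving_to_still"


def decide_switch(
    requested: Optional[Switch],
    *,
    from_voice: bool = False,
    second_gesture: bool = False,
    third_gesture: bool = False,
    require_third_gesture: bool = False,
    worker_moving: bool = False,
) -> Optional[Switch]:
    """Return the switching signal to output, or None to output nothing.

    requested: switch indicated by the recognized input pattern, if any.
    from_voice: the input pattern was a specific voice pattern.
    second_gesture: the second gesture pattern was recognized.
    third_gesture: the third gesture pattern was recognized.
    require_third_gesture: assumed option enabling the third-gesture condition.
    worker_moving: second sensor data indicates the worker is moving.
    """
    if requested is None:
        return None
    # The second gesture pattern suppresses a voice-triggered switch.
    if from_voice and second_gesture:
        return None
    # A voice-triggered switch is ignored unless the third gesture pattern is present.
    if from_voice and require_third_gesture and not third_gesture:
        return None
    # Do not switch from moving image to still image while the worker is moving.
    if requested is Switch.TO_STILL and worker_moving:
        return None
    return requested
```

In this sketch a gesture-triggered switch is affected only by the worker-moving condition, because the text ties the second and third gesture patterns to the voice pattern.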

According to another aspect of the present invention, there is provided a method comprising: recognizing a specific input pattern from first sensor data; outputting, to a worker-side device, a switching signal between a still image and a moving image based on the recognition of the input pattern; and controlling display of a gesture on a display based on gesture information of an instructor, and controlling display, on the display, of the still image or moving image transmitted from the worker-side device based on the switching signal.

According to another aspect of the present invention, there is provided a program that causes a computer to function as an instructor-side device comprising: a recognition unit that recognizes a specific input pattern from first sensor data; a switching signal output unit that outputs, to a worker-side device, a switching signal between a still image and a moving image based on the recognition of the input pattern; and an image display control unit that controls display of a gesture on a display based on gesture information of an instructor, and controls display, on the display, of the still image or moving image transmitted from the worker-side device based on the switching signal.

As described above, the present invention provides a technique that enables an instructor to give instructions to a worker more effectively.

FIG. 1 is a diagram showing an example of the functional configuration of a remote work support system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the operation of switching the remote work support system from the still image state to the moving image state.
FIG. 3 is a flowchart showing an example of the operation of switching the remote work support system from the moving image state to the still image state.
FIG. 4 is a diagram for explaining the coordinate system to which gesture information is converted according to Comparative Example 1.
FIG. 5 is a diagram for explaining the coordinate system to which gesture information is converted according to Comparative Example 2.
FIG. 6 is a diagram for explaining the coordinate system to which gesture information is converted according to the embodiment of the present invention.
FIG. 7 is a diagram showing the hardware configuration of an information processing device as an example of the instructor-side system according to the embodiment of the present invention.

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and redundant explanation is omitted.

In this specification and the drawings, a plurality of components having substantially the same functional configuration may be distinguished by appending different numbers to the same reference numeral. However, when there is no particular need to distinguish such components from one another, only the same reference numeral is given. Similar components in different embodiments may be distinguished by appending different letters to the same reference numeral. However, when there is no particular need to distinguish similar components of different embodiments from one another, only the same reference numeral is given.

(1. Details of embodiment)
Details of the embodiment of the present invention will be described.

(1-1. Configuration of remote work support system)
First, a configuration example of the remote work support system according to the embodiment of the present invention will be described. FIG. 1 is a diagram showing an example of the functional configuration of the remote work support system according to the embodiment of the present invention. As shown in FIG. 1, the remote work support system 1 includes a worker-side system and an instructor-side system.

The worker-side system is the part of the remote work support system 1 that is used by the worker. The worker performs work at a location away from the instructor who gives instructions to the worker (that is, at a remote location). The instructor-side system is the part of the remote work support system 1 that is used by the instructor. The worker-side system and the instructor-side system are connected to a network 30 and are configured to be able to communicate via the network 30.

In remote work support, voice calls and the like are generally conducted between the instructor and the worker. However, the configuration required for voice calls and the like is not relevant to the description of the configuration of the remote work support system 1 according to the embodiment of the present invention, so its description is omitted.

As shown in FIG. 1, the worker-side system includes an AR display 11, a camera 12, a position and orientation measurement unit 13, an arithmetic processing unit 14, a speaker 15, and an audio processing unit 16. The instructor-side system, also shown in FIG. 1, includes a display 21, a gesture input device 22, an arithmetic processing unit 24, a microphone 25, a gesture recognition device 26, and a voice recognition device 27.

(AR display 11)
The AR display 11 displays gestures. More specifically, the AR display 11 displays the gesture superimposed on the worker's field of view. The AR display 11 is worn by the worker. For example, the AR display 11 may be a head-mounted display worn on the worker's head. However, the type of the AR display 11 is not limited to a head-mounted display; it may be a display other than a head-mounted display.

(Camera 12)
The camera 12 obtains a moving image (hereinafter also simply referred to as "video") by capturing the worker's environment. The camera 12 is desirably oriented in the same direction as the worker's line of sight. The camera 12 is therefore desirably integrated with the AR display 11. However, the camera 12 may be configured as hardware separate from the AR display 11.

(Position and orientation measurement unit 13)
The position and orientation measurement unit 13 measures the position and direction of the camera 12. For example, when a sensor is built into the camera 12, the position and orientation measurement unit 13 may measure the position and direction of the camera 12 based on sensor data detected by that sensor. The sensor may be an acceleration sensor, a gyro sensor, or the like, but the type of sensor is not particularly limited.

For example, the position and orientation measurement unit 13 may include a sensor that measures a two-dimensional marker installed in the worker's environment. In this case, the position and orientation measurement unit 13 may measure the position and direction of the camera 12 based on the shape of the measured two-dimensional marker. Alternatively, the position and orientation measurement unit 13 may combine the two methods: measuring the position and direction of the camera 12 based on sensor data detected by the sensor built into the camera 12, and measuring them based on the shape of the measured two-dimensional marker.

(Arithmetic processing unit 14)
The arithmetic processing unit 14 is implemented by a computer and functions as a worker-side device that performs various kinds of arithmetic processing. For example, the arithmetic processing unit 14 can function as a gesture acquisition unit that acquires gesture information from the arithmetic processing unit 24, a signal acquisition unit that acquires a switching signal from the arithmetic processing unit 24, and a presentation control unit that controls the AR display 11 so that the gesture is displayed in the worker's field of view.

The arithmetic processing unit 14 also performs processing such as controlling communication, via a communication interface, with the arithmetic processing unit 24 in the instructor-side system. For example, the arithmetic processing unit 14 can function as a transmission control unit that controls transmission, to the arithmetic processing unit 24, of the video or a still image (hereinafter also simply referred to as a "still") obtained by the camera 12.

For example, the arithmetic processing unit 14 includes an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and its functions can be realized when the arithmetic device loads a program stored in a ROM (Read Only Memory) into a RAM and executes it. A computer-readable recording medium on which the program is recorded may also be provided.

(Speaker 15)
The speaker 15 outputs the instruction voice input by the instructor without passing it through the audio processing unit 16. Alternatively, the speaker 15 outputs the instruction voice input by the instructor through the audio processing unit 16. The speaker 15 is configured as a stereo speaker or an array speaker so that the virtual output position of the instruction voice output from the speaker 15 can be controlled.

(Audio processing unit 16)
The audio processing unit 16 acquires position data and audio data, and performs processing that makes the worker hear the audio as if it were being output from that position. A head-related transfer function (HRTF) may typically be used for this processing (non-patent document: Kazuhiro Iida, "Fundamentals of head-related transfer functions and applications to three-dimensional acoustic systems", edited by the Acoustical Society of Japan, Corona Publishing). More specifically, the audio processing unit 16 acquires the instruction voice from the arithmetic processing unit 14 and controls output of the acquired instruction voice by the speaker 15 from a virtual output position. The instruction voice may also be output from the speaker 15 as-is, without processing by the audio processing unit 16.

(Display 21)
The display 21 can display the video obtained by the camera 12. The display 21 can also display a snapshot based on the video obtained by the camera 12 as a still image. This allows the instructor to see the worker's environment. Furthermore, the display 21 can display the instructor's gesture. This allows the instructor to confirm what gesture is being conveyed to the worker.

(Gesture input device 22)
The gesture input device 22 is an input device that accepts gestures input by the instructor. The gesture input device 22 outputs the accepted gesture to the arithmetic processing unit 24. It also outputs the accepted gesture to the gesture recognition device 26. The gesture input device 22 can be regarded as an example of a sensor. That is, the gesture input device 22 detects a gesture as an example of sensor data (first sensor data).

For example, the gesture input device 22 may be realized by an optical device. Leap Motion (registered trademark), developed by Ultraleap, or the like can be used as such an optical device. Leap Motion (registered trademark) is a technology that tracks hand movement based on detection, by an infrared stereo camera, of infrared light projected onto the hand by a plurality of LEDs (Light Emitting Diodes). Alternatively, the gesture input device 22 may be realized by an input device worn by the instructor (for example, a sensor glove).

The gesture may be represented in any format, as long as the format can express the gesture as a structure containing a plurality of three-dimensional coordinates and the gesture can be reproduced in three-dimensional space. For example, a gesture may be represented by end point data of a skeleton, or by mesh data of the skin surface.
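The two representation examples above can be sketched as simple data structures. This is an illustrative sketch only: the type names, field names, and the helper `gesture_points` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point3D = Tuple[float, float, float]


@dataclass
class SkeletonGesture:
    """Gesture as skeleton end point data: each bone is a (start, end) pair."""
    bones: List[Tuple[Point3D, Point3D]]


@dataclass
class MeshGesture:
    """Gesture as surface mesh data: vertices plus triangles indexing into them."""
    vertices: List[Point3D]
    triangles: List[Tuple[int, int, int]]


def gesture_points(gesture: Union[SkeletonGesture, MeshGesture]) -> List[Point3D]:
    """Return every three-dimensional coordinate contained in the gesture."""
    if isinstance(gesture, SkeletonGesture):
        return [point for bone in gesture.bones for point in bone]
    return list(gesture.vertices)
```

Either structure holds a plurality of three-dimensional coordinates, which is the only requirement the text places on the format.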

(Microphone 25)
The microphone 25 accepts voice input by the instructor. The microphone 25 outputs the accepted voice to the arithmetic processing unit 24 as the instruction voice. It also outputs the accepted voice to the voice recognition device 27. The microphone 25 can be regarded as an example of a sensor. That is, the microphone 25 detects voice as an example of sensor data (first sensor data).

(Gesture recognition device 26)
The gesture recognition device 26 performs gesture recognition on the gesture accepted by the gesture input device 22 and attempts to recognize a specific input pattern from the gesture. For example, the specific input pattern may be a predetermined gesture pattern (for example, one or more predetermined gesture motions). The predetermined gesture motions may be, for example, one or more gesture motions indicating switching.

The gesture recognition device 26 is implemented by a computer and functions as an instructor-side device. For example, the gesture recognition device 26 includes an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and its functions can be realized when the arithmetic device loads a program stored in a ROM (Read Only Memory) into a RAM and executes it. A computer-readable recording medium on which the program is recorded may also be provided.

(Voice recognition device 27)
The voice recognition device 27 performs voice recognition on the voice accepted by the microphone 25 and attempts to recognize a specific input pattern from the voice. For example, the specific input pattern may be a predetermined voice pattern (for example, one or more predetermined phrases). The predetermined phrases may be, for example, one or more phrases indicating switching.
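As a rough illustration of matching predetermined phrases, the sketch below looks for switching phrases in a transcript. It assumes that some speech recognizer has already produced a text transcript; the phrases, the dictionary, and the function name are hypothetical and are not taken from the source.

```python
from typing import Dict, Optional

# Hypothetical phrases indicating switching, mapped to the switch they indicate.
SWITCH_PHRASES: Dict[str, str] = {
    "freeze": "moving_to_still",
    "resume video": "still_to_moving",
}


def match_voice_pattern(transcript: str) -> Optional[str]:
    """Return the switch indicated by the transcript, or None if no phrase matches."""
    # Normalize case and collapse runs of whitespace before matching.
    text = " ".join(transcript.lower().split())
    for phrase, switch in SWITCH_PHRASES.items():
        if phrase in text:
            return switch
    return None
```

Substring matching is the simplest possible choice and would also match longer words that contain a phrase; a real recognizer would likely match on word boundaries or use keyword spotting.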

The voice recognition device 27 is implemented by a computer and functions as an instructor-side device. For example, the voice recognition device 27 includes an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and its functions can be realized when the arithmetic device loads a program stored in a ROM (Read Only Memory) into a RAM and executes it. A computer-readable recording medium on which the program is recorded may also be provided.

(Arithmetic processing unit 24)
The arithmetic processing unit 24 is implemented by a computer and functions as an instructor-side device that performs various kinds of arithmetic processing. For example, the arithmetic processing unit 24 can function as a switching signal output unit that outputs a switching signal to the arithmetic processing unit 14 based on recognition of a specific input pattern by a recognition unit (for example, the gesture recognition device 26 or the voice recognition device 27).

The switching signal is a signal indicating switching from one of two states to the other. The embodiment of the present invention mainly assumes that the switching signal indicates one of two states: a state in which the moving image obtained by the camera 12 is displayed on the display 21 (hereinafter also referred to as the "moving image state"), and a state in which a snapshot based on the moving image obtained by the camera 12 is displayed on the display 21 as a still image (hereinafter also referred to as the "still image state").

The arithmetic processing unit 24 can also function as an image display control unit that controls display of the gesture on the display 21 and controls display of the still image or moving image on the display 21. In addition, the arithmetic processing unit 24 performs processing such as controlling communication, via a communication interface, with the arithmetic processing unit 14 in the worker-side system.

For example, the arithmetic processing unit 24 includes an arithmetic device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and its functions can be realized when the arithmetic device loads a program stored in a ROM (Read Only Memory) into a RAM and executes it. A computer-readable recording medium on which the program is recorded may also be provided.

(Network 30)
The network 30 connects the worker-side system and the instructor-side system, and can function as a communication path between them.

The configuration example of the remote work support system 1 according to the embodiment of the present invention has been described above.

(1-2. Operation of remote work support system)
Next, an example of the operation of the remote work support system 1 according to the embodiment of the present invention will be described with reference to FIGS. 1 to 3.

Operations such as voice calls are not relevant to the description of the operation of the remote work support system 1 according to the embodiment of the present invention, so their description is omitted. Communication delays and the like are also assumed to be negligibly small.

As described above, the remote work support system 1 can be in either the moving image state or the still image state. The state of the remote work support system 1 is switched based on the switching signal. In the moving image state, the moving image obtained by the camera 12 is displayed on the display 21. In the still image state, a snapshot of the moving image at the time of switching is displayed on the display 21 as a still image.
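The two states and their effect on what is displayed can be sketched as a small state holder. This is a sketch under stated assumptions: the class and method names are hypothetical, the initial state is assumed to be the moving image state, and the sketch models only what the display shows, not which device captures or transmits the snapshot.

```python
from typing import Any, Optional


class DisplayState:
    """Tracks the moving image / still image state and the content to display."""

    def __init__(self) -> None:
        self.state = "moving"  # assumed initial state
        self._latest_frame: Optional[Any] = None
        self._snapshot: Optional[Any] = None

    def on_frame(self, frame: Any) -> None:
        """Record the newest frame of the moving image."""
        self._latest_frame = frame

    def on_switching_signal(self) -> None:
        """Switch from one of the two states to the other."""
        if self.state == "moving":
            # Keep the frame at the time of switching as the snapshot.
            self._snapshot = self._latest_frame
            self.state = "still"
        else:
            self._snapshot = None
            self.state = "moving"

    def displayed(self) -> Optional[Any]:
        """Return the snapshot in the still image state, else the newest frame."""
        return self._snapshot if self.state == "still" else self._latest_frame
```

Frames that arrive during the still image state are still recorded, so the display resumes from the newest frame after switching back.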

FIG. 2 is a flowchart illustrating an example of the operation of switching the remote work support system 1 from the still image state to the moving image state. FIG. 3 is a flowchart illustrating an example of the operation of switching the remote work support system 1 from the moving image state to the still image state. First, the switching from the still image state to the moving image state is described mainly with reference to FIG. 2; the switching from the moving image state to the still image state is then described mainly with reference to FIG. 3.

Here, the gesture information that the gesture input device 22 receives from the instructor is transmitted from the instructor-side system to the worker-side system. The gesture information may include information indicating the displacement of the gesture (the displacement of each feature point of the hand) from the reference position.

作業者側システムにおいて、演算処理部１４は、基準位置および基準方向とジェスチャ情報とに基づいてジェスチャが配置されるようにＡＲディスプレイ１１を制御する。一方、指示者側システムにおいて、演算処理部２４は、ジェスチャ情報に基づいてジェスチャが表示されるようにディスプレイ２１を制御する。 In the worker side system, the arithmetic processing unit 14 controls the AR display 11 so that gestures are arranged based on the reference position, reference direction, and gesture information. On the other hand, in the instructor side system, the arithmetic processing unit 24 controls the display 21 so that gestures are displayed based on the gesture information.

さらに、指示者側システムにおいては、マイクロフォン２５によって指示音声が受け付けられ、演算処理部２４に出力される。そして、演算処理部２４から作業者側システムにおける演算処理部１４に、指示音声がネットワーク３０を介して常時送信される。演算処理部１４は、指示音声を常時取得する。 Furthermore, in the system on the instructor's side, an instruction voice is received by the microphone 25 and output to the arithmetic processing section 24 . Then, the instruction voice is constantly transmitted from the arithmetic processing unit 24 to the arithmetic processing unit 14 in the worker side system via the network 30. The arithmetic processing unit 14 always acquires the instruction voice.

(Switching operation from the still image state to the moving image state)
In the still image state, the reference position and reference direction at which gestures are placed do not follow the position and direction of the camera 12. That is, in the still image state, the reference position and reference direction at which gestures are placed in the worker-side system are fixed to the position and direction of the camera 12 recorded at the timing of switching from the moving image state to the still image state.

In the instructor-side system, the gesture recognition device 26 attempts to recognize a first gesture pattern (a gesture pattern indicating switching from the still image state to the moving image state) from the instructor's gesture received by the gesture input device 22. The specific gesture pattern may be any gesture pattern. Note that the gesture pattern indicating switching from the still image state to the moving image state and the gesture pattern indicating switching from the moving image state to the still image state may be the same, but they are preferably different so that the two switching instructions can be recognized separately.

Alternatively, the voice recognition device 27 attempts to recognize a specific voice pattern (a voice pattern indicating switching from the still image state to the moving image state) from the instructor's voice received by the microphone 25. The specific voice pattern may be any voice pattern. Note that the voice pattern indicating switching from the still image state to the moving image state and the voice pattern indicating switching from the moving image state to the still image state may be the same, but they are preferably different so that the two switching instructions can be recognized separately.

演算処理部２４は、ジェスチャ認識装置２６によって第１のジェスチャパターンが認識されたか否かによって、指示者から静止画状態から動画状態への切り替え指示が入力されたか否かを判定する（Ｓ１１）。あるいは、演算処理部２４は、音声認識装置２７によって特定の音声パターンが認識されたか否かによって、指示者から静止画状態から動画状態への切り替え指示が入力されたか否かを判定する。 The arithmetic processing unit 24 determines whether an instruction to switch from the still image state to the moving image state has been input from the instructor based on whether the first gesture pattern has been recognized by the gesture recognition device 26 (S11). Alternatively, the arithmetic processing unit 24 determines whether an instruction to switch from the still image state to the moving image state has been input from the instructor, depending on whether a specific voice pattern has been recognized by the voice recognition device 27.

As shown in FIG. 2, if no instruction to switch from the still image state to the moving image state has been input ("NO" in S11), the arithmetic processing unit 24 returns to S11. On the other hand, if an instruction to switch from the still image state to the moving image state has been input ("YES" in S11), the arithmetic processing unit 24 notifies the arithmetic processing unit 14 in the worker-side system, via the network 30, of a switching signal from the still image state to the moving image state (S12).

In the worker-side system, upon receiving the switching signal from the still image state to the moving image state, the arithmetic processing unit 14 makes the reference position at which the AR display 11 places the gesture follow the position of the camera 12 measured by the position and orientation measurement unit 13, and makes the reference direction in which the AR display 11 places the gesture follow the direction of the camera 12 measured by the position and orientation measurement unit 13.

さらに、演算処理部１４は、音声処理部１６を無効化し（すなわち、音声処理部１６の動作を停止し）、取得した指示音声をそのままスピーカ１５から出力させる（Ｓ３１）。 Furthermore, the arithmetic processing unit 14 disables the audio processing unit 16 (that is, stops the operation of the audio processing unit 16), and causes the acquired instruction voice to be output as is from the speaker 15 (S31).

さらに、演算処理部２４は、ディスプレイ２１によって表示されている静止画があれば、その静止画を消去する（Ｓ１３）。そして、演算処理部２４は、カメラ１２によって得られた動画が演算処理部１４から送信されると、送信された動画のディスプレイ２１による表示が再開されるようにディスプレイ２１を制御する（Ｓ１４）。 Further, if there is a still image displayed on the display 21, the arithmetic processing unit 24 erases the still image (S13). Then, when the video obtained by the camera 12 is transmitted from the calculation processing unit 14, the calculation processing unit 24 controls the display 21 so that the display 21 resumes displaying the transmitted video (S14).

(Switching operation from the moving image state to the still image state)
In the moving image state, the reference position and reference direction at which gestures are placed follow the position and direction of the camera 12.

In the instructor-side system, the gesture recognition device 26 attempts to recognize a first gesture pattern (a gesture pattern indicating switching from the moving image state to the still image state) from the instructor's gesture received by the gesture input device 22. Note that, as described above, the gesture pattern indicating switching from the still image state to the moving image state and the gesture pattern indicating switching from the moving image state to the still image state may be the same, but they are preferably different so that the two switching instructions can be recognized separately.

Alternatively, the voice recognition device 27 attempts to recognize a specific voice pattern (a voice pattern indicating switching from the moving image state to the still image state) from the instructor's voice received by the microphone 25. Note that, as described above, the voice pattern indicating switching from the moving image state to the still image state and the voice pattern indicating switching from the still image state to the moving image state may be the same, but they are preferably different so that the two switching instructions can be recognized separately.

演算処理部２４は、ジェスチャ認識装置２６によって第１のジェスチャパターンが認識されたか否かによって、指示者から動画状態から静止画状態への切り替え指示が入力されたか否かを判定する（Ｓ２１）。あるいは、演算処理部２４は、音声認識装置２７によって特定の音声パターンが認識されたか否かによって、指示者から動画状態から静止画状態への切り替え指示が入力されたか否かを判定する。 The arithmetic processing unit 24 determines whether an instruction to switch from the moving image state to the still image state has been input from the instructor based on whether the first gesture pattern has been recognized by the gesture recognition device 26 (S21). Alternatively, the arithmetic processing unit 24 determines whether an instruction to switch from the moving image state to the still image state has been input from the instructor, depending on whether a specific voice pattern has been recognized by the voice recognition device 27.

If no instruction to switch from the moving image state to the still image state has been input ("NO" in S21), the arithmetic processing unit 24 returns to S21. On the other hand, if an instruction to switch from the moving image state to the still image state has been input ("YES" in S21), the arithmetic processing unit 24 notifies the arithmetic processing unit 14 in the worker-side system, via the network 30, of a switching signal from the moving image state to the still image state (S22).
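The determinations in S11 and S21 can be sketched as a single decision function on the instructor side. This is a minimal sketch under stated assumptions: the pattern labels `TO_MOVING` and `TO_STILL`, the state strings, and the function name `switching_signal` are hypothetical, and the recognizers (gesture recognition device 26, voice recognition device 27) are assumed to report a label or `None`.

```python
# Hypothetical labels for the recognized input patterns.
TO_MOVING = "to_moving"  # pattern indicating still image -> moving image
TO_STILL = "to_still"    # pattern indicating moving image -> still image


def switching_signal(state, gesture_pattern=None, voice_pattern=None):
    """Return the target state to notify to the worker side, or None.

    `state` is "still" or "moving". Either a recognized gesture pattern or a
    recognized voice pattern is enough to trigger the switch.
    """
    recognized = {gesture_pattern, voice_pattern}
    if state == "still" and TO_MOVING in recognized:
        return "moving"  # corresponds to "YES" in S11, then S12
    if state == "moving" and TO_STILL in recognized:
        return "still"   # corresponds to "YES" in S21, then S22
    return None          # "NO": keep polling
```

Using distinct labels for the two directions mirrors the preference stated above that the two patterns differ; if a single shared pattern were used instead, the function would simply toggle the current state.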

In the worker-side system, upon receiving the switching signal from the moving image state to the still image state, the arithmetic processing unit 14 records the position and direction of the camera 12 at the timing of switching from the moving image state to the still image state (S26), and fixes the reference position and reference direction used by the AR display 11 for placing gestures to the recorded position and direction of the camera 12.

Furthermore, the arithmetic processing unit 14 enables the audio processing unit 16 (that is, starts the operation of the audio processing unit 16) (S41). As a result, the audio processing unit 16 processes the instruction voice (for example, using an HRTF) so that the speaker 15 outputs the instruction voice as if it were being emitted from the reference position where the gesture is placed.

さらに、演算処理部２４は、カメラ１２によって得られた動画のスナップショットを静止画として作成する（Ｓ２３）。 Furthermore, the arithmetic processing unit 24 creates a snapshot of the moving image obtained by the camera 12 as a still image (S23).

演算処理部２４は、ディスプレイ２１によって表示されている動画があれば、その動画を消去する（Ｓ２４）。そして、演算処理部２４は、生成した静止画のディスプレイ２１による表示が再開されるようにディスプレイ２１を制御する（Ｓ２５）。 If there is a moving image displayed on the display 21, the arithmetic processing unit 24 erases the moving image (S24). Then, the arithmetic processing unit 24 controls the display 21 so that the display 21 resumes displaying the generated still image (S25).
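The worker-side handling in the two switching flows (S31 on one side, S26 and S41 on the other) can be summarized in a short sketch. The class `WorkerSide` and its members are hypothetical names, and a pose is represented here as an opaque value; this illustrates the described behaviour, not the actual implementation.

```python
class WorkerSide:
    """Hypothetical model of the arithmetic processing unit 14's state handling."""

    def __init__(self, camera_pose):
        self.camera_pose = camera_pose      # latest pose measured for the camera 12
        self.fixed_reference = None         # None means "follow the camera"
        self.spatial_audio_enabled = False  # stands in for the audio processing unit 16

    def on_camera_pose(self, pose):
        self.camera_pose = pose

    def on_switching_signal(self, to_still):
        if to_still:
            # Record the camera pose at the switching timing (S26) and
            # enable the audio processing unit (S41).
            self.fixed_reference = self.camera_pose
            self.spatial_audio_enabled = True
        else:
            # Let the reference follow the camera again and
            # disable the audio processing unit (S31).
            self.fixed_reference = None
            self.spatial_audio_enabled = False

    def reference_pose(self):
        # Moving image state: follows the camera; still image state: fixed.
        if self.fixed_reference is None:
            return self.camera_pose
        return self.fixed_reference
```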

以上、静止画状態から動画状態への切り替え動作例、および、動画状態から静止画状態への切り替え動作例について説明した。続いて、ジェスチャが配置される基準位置および基準方向についてさらに詳細に説明する。 The example of the switching operation from the still image state to the moving image state and the example of the switching operation from the moving image state to the still image state have been described above. Next, the reference position and reference direction in which gestures are placed will be described in more detail.

(1-3. Reference position and reference direction at which gestures are placed)
The gesture information received from the instructor by the gesture input device 22 is expressed by, for example, a position and direction in a coordinate system specific to the instructor's environment. To display a gesture on the AR display 11 in the worker's environment based on such gesture information, the gesture information must be converted from the instructor-side coordinate system (the source coordinate system) to the worker-side coordinate system (the destination coordinate system).
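As a rough illustration of such a conversion, the sketch below maps a feature-point displacement, given relative to a reference pose, into the worker-side coordinate system. It is a simplified two-dimensional example: the function name `to_worker_coords`, the use of a single yaw angle for the reference direction, and the tuple layout are all assumptions made for illustration, and the actual embodiment would work with three-dimensional positions and orientations.

```python
import math


def to_worker_coords(offset, ref_position, ref_yaw):
    """Map a 2D displacement (relative to the reference pose) into worker coordinates.

    offset:       (dx, dy) displacement of a feature point from the reference position
    ref_position: (x, y) reference position in the worker-side coordinate system
    ref_yaw:      reference direction as a counterclockwise angle in radians
    """
    dx, dy = offset
    c, s = math.cos(ref_yaw), math.sin(ref_yaw)
    # Rotate the displacement into the reference direction, then translate.
    return (ref_position[0] + c * dx - s * dy,
            ref_position[1] + s * dx + c * dy)
```

For example, a displacement of one unit "forward" with a reference position of (2, 3) and a reference direction rotated by 90 degrees lands approximately at (2, 4).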

Here, Comparative Example 1 (world coordinate system), Comparative Example 2 (camera coordinate system), and the coordinate system according to the present embodiment are described in order as examples of the destination coordinate system. To simplify the explanation, the position of the gesture is assumed to be fixed.

(Comparative Example 1 (world coordinate system))
FIG. 4 is a diagram for explaining the destination coordinate system of the gesture information according to Comparative Example 1. Referring to FIG. 4, a worker environment E11 in an initial state is shown. A work target R1 exists in the worker environment E11 in the initial state. The work target R1 is the object on which the worker performs work.

Here, for ease of viewing, the work target R1 is shown as a square plane having a "front" and a "back". However, the shape of the work target R1 is not limited. Typically, the work target R1 is assumed to be a machine such as an ATM (Automated Teller Machine) or a printer, but it need not be a machine and may be any object used in the work.

さらに、初期状態における作業者環境Ｅ１１には、カメラ１２が存在しており、カメラ１２によって得られた動画が指示者側の画面１２１ａに表示される。指示者側の画面１２１ａに表示されている動画には作業対象Ｒ１が写っている。指示者は、初期状態における作業者環境Ｅ１１の動画を見ながら、ジェスチャ情報を入力する。指示者側の画面１２１ａには、ジェスチャ情報に基づいてジェスチャＪ１が表示されている。 Furthermore, a camera 12 is present in the worker environment E11 in the initial state, and a moving image obtained by the camera 12 is displayed on the screen 121a on the instructor's side. The work target R1 is shown in the video displayed on the screen 121a on the instructor's side. The instructor inputs gesture information while watching a video of the worker environment E11 in the initial state. Gesture J1 is displayed on the screen 121a on the instructor's side based on the gesture information.

初期状態における作業者環境Ｅ１１を参照すると、ワールド座標系Ｃ１およびカメラ座標系Ｃ２が示されている。指示者によって入力されたジェスチャ情報に基づいて、ワールド座標系Ｃ１にジェスチャＪ１が配置されるように作業者側のＡＲディスプレイが制御される。作業者は、ワールド座標系Ｃ１に配置されたジェスチャＪ１を見ながら、作業対象Ｒ１に対して作業を行うことができる。 Referring to the worker environment E11 in the initial state, a world coordinate system C1 and a camera coordinate system C2 are shown. Based on the gesture information input by the instructor, the AR display on the worker's side is controlled so that the gesture J1 is placed in the world coordinate system C1. The worker can work on the work target R1 while looking at the gesture J1 arranged in the world coordinate system C1.

ここで、作業者環境Ｅ１２に示されるように、作業者が作業対象Ｒ１の背面に移動したとする。ワールド座標系Ｃ１におけるジェスチャＪ１の位置および方向は固定されている。 Here, it is assumed that the worker moves to the back side of the work target R1, as shown in the worker environment E12. The position and direction of gesture J1 in world coordinate system C1 are fixed.

At this time, the "back" of the work target R1 and the gesture J1 are displayed on the instructor-side screen 122a. However, the position and direction of the displayed gesture J1 also change according to the amount and direction of the worker's movement. The instructor therefore has to input the gesture as it was before the position and direction changed (before the 180-degree rotation) relative to the displayed gesture J1, which makes gesture input difficult.

On the other hand, suppose that the worker moves to the far side of the work target R1 (beyond it), as shown in the worker environment E13. The position and direction of the gesture J1 in the world coordinate system C1 are fixed.

このとき、ワールド座標系Ｃ１におけるジェスチャＪ１の位置および方向は固定されてしまっているため、ジェスチャＪ１が作業者の後方に位置してしまい、作業者からはジェスチャＪ１が見えなくなってしまう。例えば、次の作業位置へ向かうためのジェスチャ情報を指示者が入力したとしても、作業者はそのジェスチャを見ることができずに、次の作業位置を知ることができなくなってしまう。 At this time, since the position and direction of the gesture J1 in the world coordinate system C1 are fixed, the gesture J1 is located behind the worker, and the gesture J1 is no longer visible to the worker. For example, even if the instructor inputs gesture information for heading to the next work position, the worker will not be able to see the gesture and will not be able to know the next work position.

以上により、比較例１は、あらかじめ作成された作業マニュアルなどを表示する場合などには好適である。しかし、比較例１では、作業者の移動に伴って指示者によるジェスチャが見えなくなってしまうことがあるため、リアルタイムに作業指示を行う必要がある場合などには好適ではないと言える。 As described above, Comparative Example 1 is suitable for displaying a work manual created in advance. However, in Comparative Example 1, the gestures made by the instructor may become invisible as the worker moves, so it may not be suitable when it is necessary to give work instructions in real time.

以上、比較例１に係るジェスチャ情報の変換先の座標系について説明した。 The coordinate system to which gesture information is converted according to Comparative Example 1 has been described above.

(Comparative Example 2 (camera coordinate system))
FIG. 5 is a diagram for explaining the destination coordinate system of the gesture information according to Comparative Example 2. Referring to FIG. 5, as in the example shown in FIG. 4, a worker environment E21 in an initial state is shown. The work target R1 exists in the worker environment E21 in the initial state.

Furthermore, the moving image obtained by the camera 12 is displayed on the instructor-side screen 123a. The work target R1 appears in the moving image displayed on the instructor-side screen 123a. The instructor inputs gesture information while watching the moving image of the worker environment E21 in the initial state. The gesture J1 is displayed on the instructor-side screen 123a based on the gesture information.

比較例２においては、指示者によって入力されたジェスチャ情報に基づいて、カメラ座標系Ｃ２にジェスチャＪ１が配置されるように作業者側のＡＲディスプレイが制御される。作業者は、カメラ座標系Ｃ２に配置されたジェスチャＪ１を見ながら、作業対象Ｒ１に対して作業を行うことができる。 In Comparative Example 2, the AR display on the worker's side is controlled so that gesture J1 is placed in camera coordinate system C2 based on gesture information input by the instructor. The worker can work on the work target R1 while looking at the gesture J1 arranged in the camera coordinate system C2.

ここで、作業者環境Ｅ２２に示されるように、作業者が作業対象Ｒ１の背面に移動したとする。比較例１と異なり、ワールド座標系Ｃ１におけるジェスチャＪ１の位置および方向は変化するが、カメラ座標系Ｃ２におけるジェスチャＪ１の位置および方向は固定されている。 Here, it is assumed that the worker moves to the back side of the work target R1, as shown in the worker environment E22. Unlike Comparative Example 1, the position and direction of the gesture J1 in the world coordinate system C1 change, but the position and direction of the gesture J1 in the camera coordinate system C2 are fixed.

このとき、作業対象Ｒ１の「裏」およびジェスチャＪ１が指示者側の画面１２４ａに表示される。このとき、作業者が移動しているにも関わらず、画面１２４ａに表示されるジェスチャＪ１の位置および方向は固定される。そのため、指示者は、表示されるジェスチャＪ１と同じ位置および方向によってジェスチャを入力すれば済むため、指示者はジェスチャ入力をしやすい。 At this time, the "back" of the work target R1 and the gesture J1 are displayed on the screen 124a on the instructor's side. At this time, although the worker is moving, the position and direction of gesture J1 displayed on screen 124a are fixed. Therefore, the instructor only needs to input a gesture at the same position and direction as the displayed gesture J1, making it easy for the instructor to input the gesture.

しかし、カメラ座標系Ｃ２におけるジェスチャＪ１の位置および方向は固定されるため、作業者から見えるジェスチャＪ１の位置および方向は一定である。そのため、作業者は、ジェスチャＪ１の見える位置または角度を変更することができなくなってしまう（例えば、ジェスチャＪ１を上方から俯瞰的に見ることができなくなってしまう）。 However, since the position and direction of gesture J1 in camera coordinate system C2 are fixed, the position and direction of gesture J1 visible to the operator are constant. Therefore, the operator is no longer able to change the position or angle at which the gesture J1 is viewed (for example, the operator is no longer able to view the gesture J1 from above).

On the other hand, suppose that the worker moves to the far side of the work target R1 (beyond it), as shown in the worker environment E23. At this time, the position and direction of the gesture J1 in the world coordinate system C1 change, but the position and direction of the gesture J1 in the camera coordinate system C2 are fixed. Accordingly, the position and direction of the gesture J1 displayed on the instructor-side screen 125a are fixed.

さらに、カメラ座標系Ｃ２におけるジェスチャＪ１の位置は固定されているため、ジェスチャＪ１が作業者の前方に位置し続け、作業者からはジェスチャＪ１が見えなくなってしまうことがなくなる。例えば、次の作業位置へ向かうためのジェスチャ情報を指示者が入力した場合に、作業者はそのジェスチャを見て、次の作業位置を知ることができる。 Furthermore, since the position of the gesture J1 in the camera coordinate system C2 is fixed, the gesture J1 continues to be located in front of the worker, and the gesture J1 will not become invisible to the worker. For example, when an instructor inputs gesture information for heading to the next work position, the worker can see the gesture and know the next work position.

以上により、比較例２は、作業者の移動に伴って指示者によるジェスチャが見えなくなってしまうことがなくなるため、リアルタイムに作業指示を行う必要がある場合などには好適である。しかし、比較例２では、作業者から見えるジェスチャの位置および角度が変更されないため、ＡＲ技術を用いてジェスチャを表示する利点が損なわれてしまうと言える。 As described above, Comparative Example 2 is suitable for cases where it is necessary to give work instructions in real time because the gestures made by the instructor do not become invisible as the worker moves. However, in Comparative Example 2, the position and angle of the gesture visible to the worker are not changed, so it can be said that the advantage of displaying the gesture using AR technology is lost.

以上、比較例２に係るジェスチャ情報の変換先の座標系について説明した。 The coordinate system to which gesture information is converted according to Comparative Example 2 has been described above.

(Coordinate system according to the present embodiment)
FIG. 6 is a diagram for explaining the destination coordinate system of the gesture information according to the embodiment of the present invention. Referring to FIG. 6, as in the example shown in FIG. 4, a worker environment E31 in an initial state is shown. The work target R1 exists in the worker environment E31 in the initial state. In the initial state, the remote work support system 1 is assumed to be in the moving image state. As an example, in the moving image state, the instructor indicates the work position to the worker by gesture.

なお、マイクロフォン２５によって受け付けられた指示音声が、演算処理部２４からネットワーク３０を介して演算処理部１４に常時送信される。 Note that the instruction voice received by the microphone 25 is always transmitted from the arithmetic processing section 24 to the arithmetic processing section 14 via the network 30.

遠隔作業支援システム１の状態が動画状態である場合においては、演算処理部２４は、カメラ１２によって得られた動画が指示者側の画面１２６ａに表示されるように制御する。指示者側の画面１２６ａに表示されている動画には作業対象Ｒ１が写っている。指示者は、初期状態における作業者環境Ｅ３１の動画を見ながら、ジェスチャ情報を入力する。指示者側の画面１２６ａには、ジェスチャ情報に基づいてジェスチャＪ１が表示されている。 When the remote work support system 1 is in the moving image state, the arithmetic processing unit 24 controls so that the moving image obtained by the camera 12 is displayed on the instructor's screen 126a. The work target R1 is shown in the video displayed on the screen 126a on the instructor's side. The instructor inputs gesture information while watching a video of the worker environment E31 in the initial state. Gesture J1 is displayed on the screen 126a on the instructor's side based on the gesture information.

演算処理部１４は、ジェスチャ情報を演算処理部２４から取得する。遠隔作業支援システム１の状態が動画状態である場合において、演算処理部１４は、カメラ１２の位置および方向を基準位置および基準方向（第２の基準位置および第２の基準方向）としたカメラ座標系Ｃ２（第２の座標系）にジェスチャＪ１が配置されるようにＡＲディスプレイ１１を制御する手法（第２の手法）を採用する。作業者は、カメラ座標系Ｃ２に配置されたジェスチャＪ１を見ながら、作業位置に向かうことができる。 The arithmetic processing unit 14 acquires gesture information from the arithmetic processing unit 24 . When the state of the remote work support system 1 is a moving image state, the arithmetic processing unit 14 calculates camera coordinates with the position and direction of the camera 12 as a reference position and a reference direction (a second reference position and a second reference direction). A method (second method) is adopted in which the AR display 11 is controlled so that the gesture J1 is placed in the system C2 (second coordinate system). The worker can head to the work position while looking at the gesture J1 arranged in the camera coordinate system C2.

さらに、遠隔作業支援システム１の状態が動画状態である場合においては、作業者の視点と指示者の視点とが同じである。そこで、遠隔作業支援システム１の状態が動画状態である場合において、作業者環境Ｅ３１に示されるように、演算処理部１４は、指示者側システムにおける演算処理部２４から受信した指示音声を音声処理部１６に処理させずに、そのままスピーカ１５から出力させる。 Furthermore, when the remote work support system 1 is in the moving image state, the worker's viewpoint and the instructor's viewpoint are the same. Therefore, when the state of the remote work support system 1 is in the video state, as shown in the worker environment E31, the arithmetic processing unit 14 performs audio processing on the instruction voice received from the arithmetic processing unit 24 in the instructor side system. The output is output from the speaker 15 as it is without being processed by the unit 16.

一例として、指示者が作業位置を指示するのを終了し、作業対象Ｒ１への作業内容をジェスチャによって指示し始める場合を想定する。このとき、指示者は、動画状態から静止画状態への切り替え（すなわち、第２の手法から第１の手法への切り替え）を示す切り替え指示をジェスチャパターンまたは音声パターンによって入力する。かかる切り替え指示が入力されると、動画状態から静止画状態への切り替えを示す切り替え信号が、演算処理部２４から演算処理部１４に通知される。 As an example, assume that the instructor finishes instructing the work position and starts instructing the work content to be performed on the work target R1 by gesture. At this time, the instructor inputs a switching instruction indicating switching from the moving image state to the still image state (that is, switching from the second method to the first method) using a gesture pattern or a voice pattern. When such a switching instruction is input, a switching signal indicating switching from a moving image state to a still image state is notified from the arithmetic processing unit 24 to the arithmetic processing unit 14.

Upon acquiring the switching signal indicating switching from the moving image state to the still image state, the arithmetic processing unit 14 records the position and direction of the camera 12 at the timing at which the switching signal was acquired as the reference position C3 and reference direction (first reference position and first reference direction) used for gesture placement. Note that the timing at which the switching signal is acquired is an example of a predetermined timing.

作業者環境Ｅ３２に示されるように、作業者が作業対象Ｒ１の背面に移動したとする。位置姿勢計測部１３は、演算処理部１４によって記録された基準位置および基準方向を基準とした作業者の移動量および移動方向を計測する。 Assume that the worker moves to the back of the work target R1, as shown in the worker environment E32. The position and orientation measurement unit 13 measures the movement amount and movement direction of the worker based on the reference position and reference direction recorded by the arithmetic processing unit 14.

The arithmetic processing unit 14 adopts a method (first method) of controlling the AR display 11 so that the gesture J1 is placed in a coordinate system (first coordinate system) referenced to the recorded reference position and reference direction, based on the amount and direction of the worker's movement measured by the position and orientation measurement unit 13. In other words, the arithmetic processing unit 14 places the gesture J1 at the reference position and in the reference direction in the AR space.
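The difference between the second method (gesture tied to the camera coordinate system C2) and the first method (gesture tied to the recorded reference) can be illustrated with two-dimensional rigid transforms. This is a sketch under simplifying assumptions: poses are `(x, y, yaw)` tuples in world coordinates with the local x axis pointing forward, and the function names are hypothetical rather than taken from the embodiment.

```python
import math


def to_world(pose, point):
    """Express a point given in the local frame of `pose` in world coordinates."""
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * point[0] - s * point[1], y + s * point[0] + c * point[1])


def to_local(pose, point):
    """Express a world-coordinate point in the local frame of `pose`."""
    x, y, yaw = pose
    dx, dy = point[0] - x, point[1] - y
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


def gesture_in_camera_frame(offset, camera_pose, recorded_reference=None):
    """Where the gesture appears relative to the current camera pose.

    recorded_reference is None -> second method: the gesture rides with the camera.
    recorded_reference given   -> first method: the gesture stays where the camera
                                  was at the switching timing, so its apparent
                                  position changes as the worker moves.
    """
    if recorded_reference is None:
        return offset
    return to_local(camera_pose, to_world(recorded_reference, offset))
```

With the reference recorded at the origin facing along x and the gesture one unit ahead, a worker who walks to (2, 0) and turns around by 180 degrees still sees the gesture about one unit ahead, but now from the opposite side, which matches the situation shown for the worker environment E32.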

At this time, as shown on the screen 127a, the position and direction of the gesture J1 in the world coordinate system C1 remain fixed, and the AR display 11 displays the gesture J1 as it appears from a viewpoint that has moved and rotated from the recorded reference position and reference direction by the amount and direction of the worker's movement. Note that the "back" of the work target R1 on the screen 127a may be a through image or the actual object.

演算処理部１４は、記録した基準位置および基準方向を基準とする座標系に、位置姿勢計測部１３によって計測された作業者の移動量および移動方向に基づいて、指示音声の仮想的な出力位置および仮想的な出力方向を決定する。換言すると、演算処理部１４は、ＡＲ空間における基準位置および基準方向を指示音声の仮想的な出力位置および出力方向として決定する。 The arithmetic processing unit 14 calculates a virtual output position of the instruction voice in a coordinate system based on the recorded reference position and reference direction, based on the movement amount and movement direction of the worker measured by the position and orientation measurement unit 13. and determine the virtual output direction. In other words, the arithmetic processing unit 14 determines the reference position and reference direction in the AR space as the virtual output position and output direction of the instruction voice.

音声処理部１６は、演算処理部１４によって決定された仮想的な出力位置から仮想的な出力方向に出力されているかのように（仮想的なスピーカＫ１から出力されているかのように）スピーカ１５による指示音声の出力を制御する。この仮想的な出力位置は、指示者の視点に該当し得る。したがって、作業者は、指示音声によって指示者の視点を認識することができ、臨場感を体感しながら指示を理解することが可能となる。 The audio processing unit 16 controls the output of the instruction voice by the speaker 15 so that the voice sounds as if it were output from the virtual output position, in the virtual output direction, determined by the arithmetic processing unit 14 (as if it were output from a virtual speaker K1). This virtual output position may correspond to the viewpoint of the instructor. Therefore, the worker can recognize the instructor's viewpoint from the instruction voice, and can understand the instructions with a sense of presence.

このとき、作業者環境Ｅ３２に示されるように、基準位置および基準方向はワールド座標系Ｃ１に固定されながら、基準位置および基準方向から、作業者の移動量および移動方向だけ移動および回転した位置および方向から見える基準位置および基準方向から、指示音声が出力されているかのように指示音声が出力される。 At this time, as shown in the worker environment E32, the reference position and reference direction remain fixed in the world coordinate system C1, and the instruction voice is output as if it came from the reference position and reference direction as perceived from a position and direction that have moved and rotated from the reference position and reference direction by the worker's movement amount and movement direction.
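The document does not specify how the audio processing unit 16 renders the virtual speaker K1. As one possible sketch, assuming a two-channel speaker and a simple constant-power pan law with distance attenuation (both assumptions, as are the function name and parameters), the left and right gains could be derived from the worker's current pose relative to the fixed virtual output position:

```python
import math


def stereo_gains(source_xy, listener_xy, listener_yaw):
    """Return (left, right) gains for a virtual source heard by the listener.

    Assumed conventions: world x/y in metres, yaw in radians, and the
    listener's left side lies at +90 degrees from the facing direction.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    # Bearing of the source relative to the facing direction (positive = left).
    azimuth = math.atan2(dy, dx) - listener_yaw
    pan = -math.sin(azimuth)                 # -1 = fully left, +1 = fully right
    theta = (pan + 1.0) * math.pi / 4.0      # constant-power pan law
    attenuation = 1.0 / max(distance, 1.0)   # no boost closer than 1 m
    return attenuation * math.cos(theta), attenuation * math.sin(theta)


# Hypothetical case: the virtual speaker stays at the recorded reference
# position (origin) while the worker has moved 1 m to the right of it.
left, right = stereo_gains((0.0, 0.0), (0.0, -1.0), 0.0)
```

This simple pan law cannot distinguish a source in front from one behind; a real system would more likely use head-related transfer functions or a spatial audio library, which the source text neither names nor requires.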

一方、指示者側システムにおいて、演算処理部２４は、動画状態から静止画状態への切り替えを示す切り替え信号が入力されたタイミングにおける動画のスナップショットを静止画として生成する。そして、画面１２７ｂに示されるように、演算処理部２４は、生成した静止画がディスプレイ２１に表示されるように制御する。指示者側の画面１２７ｂには、ジェスチャ情報に基づいてジェスチャＪ１が表示されている。 On the other hand, in the instructor side system, the arithmetic processing unit 24 generates a snapshot of the moving image at the timing when a switching signal indicating switching from the moving image state to the still image state is input as a still image. Then, as shown on the screen 127b, the arithmetic processing unit 24 controls the generated still image to be displayed on the display 21. Gesture J1 is displayed on the screen 127b on the instructor's side based on the gesture information.
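The snapshot behaviour on the instructor side can be sketched as follows. The class and method names are hypothetical, and frames are treated as opaque immutable objects (for example `bytes`); the sketch only shows that the frame current at the time of the switching signal is frozen and displayed until the state returns to the moving image state.

```python
class InstructorView:
    """Tracks what the instructor-side display shows (illustrative only)."""

    def __init__(self):
        self.latest_frame = None  # most recent frame received from the worker side
        self.still = None         # snapshot held while in the still image state

    def on_frame(self, frame):
        # Frames keep arriving in both states; only the latest one is stored.
        self.latest_frame = frame

    def on_switching_signal(self, to_still: bool):
        # Freeze the current frame when entering the still image state,
        # and discard the snapshot when returning to the moving image state.
        self.still = self.latest_frame if to_still else None

    def displayed(self):
        return self.still if self.still is not None else self.latest_frame
```

For example, if a frame arrives, the state switches to still, and a second frame arrives, `displayed()` keeps returning the first frame until a switch back to the moving image state.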

このとき、作業者が移動しているにも関わらず、画面１２７ｂに表示されるジェスチャＪ１の位置および方向は固定される。そのため、指示者は、表示されるジェスチャＪ１と同じ位置および方向によってジェスチャを入力すれば済むため、指示者はジェスチャ入力をしやすい。 At this time, although the worker is moving, the position and direction of gesture J1 displayed on screen 127b are fixed. Therefore, the instructor only needs to input a gesture at the same position and direction as the displayed gesture J1, making it easy for the instructor to input the gesture.

さらに、作業者の移動に伴って作業者から見えるジェスチャＪ１の位置および方向は変化し得る。そのため、作業者は、ジェスチャＪ１の見える位置または角度を変更することができるようになる（例えば、ジェスチャＪ１を上方から俯瞰的に見ることができるようになる）。 Furthermore, as the worker moves, the position and direction of the gesture J1 as seen by the worker may change. Therefore, the worker can change the position or angle from which the gesture J1 is viewed (for example, the worker can view the gesture J1 from above).

一方、作業者環境Ｅ３３に示されるように、作業者が作業対象Ｒ１の奥側に移動したとする。このとき、ワールド座標系Ｃ１におけるジェスチャＪ１の位置および方向は変化するが、カメラ座標系Ｃ２におけるジェスチャＪ１の位置および方向は固定されている。このとき、指示者側の画面１２８ａに表示されるジェスチャＪ１の位置および方向は固定される。 On the other hand, suppose that the worker moves to the back side of the work target R1, as shown in the worker environment E33. At this time, the position and direction of the gesture J1 in the world coordinate system C1 change, but the position and direction of the gesture J1 in the camera coordinate system C2 are fixed. At this time, the position and direction of the gesture J1 displayed on the screen 128a on the instructor side are fixed.

したがって、比較例２と同様に、ジェスチャＪ１が作業者の前方に位置し続け、作業者からはジェスチャＪ１が見えなくなってしまうことがなくなる。例えば、次の作業位置へ向かうためのジェスチャ情報を指示者が入力した場合に、作業者はそのジェスチャを見て、次の作業位置を知ることができる。 Therefore, similarly to Comparative Example 2, the gesture J1 continues to be located in front of the worker, and the gesture J1 will not become invisible to the worker. For example, when an instructor inputs gesture information for heading to the next work position, the worker can see the gesture and know the next work position.

図６には示されていないが、指示者が、作業対象Ｒ１への作業内容をジェスチャによって指示するのを終了し、次の作業位置を指示し始める場合を想定する。このとき、指示者は、静止画状態から動画状態への切り替え（すなわち、第１の手法から第２の手法への切り替え）を示す切り替え指示をジェスチャパターンまたは音声パターンによって入力する。かかる切り替え指示が入力されると、静止画状態から動画状態への切り替えを示す切り替え信号が、演算処理部２４から演算処理部１４に通知される。 Although not shown in FIG. 6, it is assumed that the instructor finishes instructing the work to be performed on the work target R1 by gesture and starts instructing the next work position. At this time, the instructor inputs a switching instruction indicating switching from the still image state to the moving image state (that is, switching from the first method to the second method) using a gesture pattern or a voice pattern. When such a switching instruction is input, a switching signal indicating switching from a still image state to a moving image state is notified from the arithmetic processing unit 24 to the arithmetic processing unit 14.

演算処理部１４は、静止画状態から動画状態への切り替えを示す切り替え信号を取得すると、かかる切り替え信号を取得したことに基づいて、カメラ１２の位置および方向を基準位置および基準方向としたカメラ座標系Ｃ２にジェスチャＪ１が配置されるようにＡＲディスプレイ１１を制御する。作業者は、カメラ座標系Ｃ２に配置されたジェスチャＪ１を見ながら、作業位置に向かうことができる。 Upon acquiring a switching signal indicating switching from the still image state to the moving image state, the arithmetic processing unit 14 controls the AR display 11, based on the acquisition of the switching signal, so that the gesture J1 is placed in the camera coordinate system C2, whose reference position and reference direction are the position and direction of the camera 12. The worker can head to the work position while looking at the gesture J1 placed in the camera coordinate system C2.
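The switch between the two placement methods on the worker side can be summarized as a small state holder. This is a simplified sketch under stated assumptions: the names `Mode` and `WorkerSideAnchor` are hypothetical, positions are plain 3-tuples, and rotation is ignored so that only the translational part of the behaviour is shown.

```python
from enum import Enum


class Mode(Enum):
    MOVING = "moving"  # gesture follows the camera coordinate system C2
    STILL = "still"    # gesture is pinned to a recorded reference position


class WorkerSideAnchor:
    """Decides where the gesture sits relative to the camera (translation only)."""

    def __init__(self):
        self.mode = Mode.MOVING
        self.reference = None  # camera position recorded when entering STILL

    def on_switching_signal(self, target: Mode, camera_position):
        # Entering the still image state records the camera position;
        # returning to the moving image state discards it.
        self.reference = tuple(camera_position) if target is Mode.STILL else None
        self.mode = target

    def gesture_offset_from_camera(self, camera_position):
        if self.mode is Mode.MOVING:
            # Camera-fixed: the gesture never moves relative to the camera.
            return (0.0, 0.0, 0.0)
        # World-fixed: the offset grows as the worker walks away from the reference.
        return tuple(r - c for r, c in zip(self.reference, camera_position))
```

In the moving image state the offset is always zero, so the gesture remains in front of the worker while walking; in the still image state the offset reflects how far the worker has moved from the recorded reference position.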

さらに、遠隔作業支援システム１の状態が動画状態である場合においては、作業者の視点と指示者の視点とが同じである。そこで、遠隔作業支援システム１の状態が動画状態である場合において、作業者環境Ｅ３１にされるように、演算処理部１４は、指示者側システムにおける演算処理部２４から受信した指示音声を音声処理部１６に処理させずに、そのままスピーカ１５から出力させる。 Furthermore, when the remote work support system 1 is in the moving image state, the worker's viewpoint and the instructor's viewpoint are the same. Therefore, when the remote work support system 1 is in the moving image state, as shown in the worker environment E31, the arithmetic processing unit 14 causes the instruction voice received from the arithmetic processing unit 24 in the instructor-side system to be output from the speaker 15 as it is, without having the audio processing unit 16 process it.

以上、ジェスチャが配置される基準位置および基準方向について詳細に説明した。 The reference positions and reference directions in which gestures are placed have been described above in detail.

（１－４．効果） (1-4. Effect)
以上により、本発明の実施形態によれば、作業者から見えるジェスチャの位置および角度が変更され得るため、ＡＲ技術を用いてジェスチャを表示する利点が損なわれずに済む。さらに、本発明の実施形態によれば、作業者の移動に伴って指示者によるジェスチャが見えなくなってしまうことがなくなる。そのため、本発明の実施形態に係る技術は、リアルタイムに作業指示を行う必要がある場合などに好適である。 As described above, according to the embodiment of the present invention, the position and angle of the gesture visible to the worker can be changed, so the advantages of displaying the gesture using AR technology are not lost. Furthermore, according to the embodiment of the present invention, the gestures made by the instructor will not become invisible as the worker moves. Therefore, the technology according to the embodiment of the present invention is suitable for cases where it is necessary to issue work instructions in real time.

さらに、本発明の実施形態によれば、遠隔作業支援システム１の状態が静止画状態である場合において、指示音声の仮想的な出力位置が指示者の視点となるように制御されるため、作業者は、指示音声によって指示者の視点を把握することが可能である。 Further, according to the embodiment of the present invention, when the remote work support system 1 is in the still image state, the virtual output position of the instruction voice is controlled so as to coincide with the viewpoint of the instructor, so the worker can grasp the instructor's viewpoint from the instruction voice.

また、本発明の実施形態によれば、動画状態と静止画状態との間の切り替え指示が音声またはジェスチャによって行われ得る。そのため、指示者は、動画状態と静止画状態との間の切り替え指示を行いたいときに、切り替えスイッチに手を伸ばす必要がなくなる。これによって、切り替え指示の度に、指示者によるジェスチャ指示が途切れてしまうことがなくなる（あるいは、指示者によるジェスチャ指示が途切れてしまう時間を短くすることが可能となる）。 Further, according to an embodiment of the present invention, an instruction to switch between a moving image state and a still image state may be given by voice or gesture. Therefore, the instructor does not need to reach for the changeover switch when instructing switching between the moving image state and the still image state. This prevents the instructor's gesture instruction from being interrupted every time the instructor issues a switching instruction (or it becomes possible to shorten the time period during which the instructor's gesture instruction is interrupted).

以上、本発明の実施形態に係る遠隔作業支援システム１が奏する効果について説明した。 The effects of the remote work support system 1 according to the embodiment of the present invention have been described above.

（２．ハードウェア構成例） (2. Hardware configuration example)
続いて、本発明の実施形態に係る作業者側システムのハードウェア構成例について説明する。 Next, an example of the hardware configuration of the worker-side system according to the embodiment of the present invention will be described.

以下では、本発明の実施形態に係る指示者側システムのハードウェア構成例として、情報処理装置９００のハードウェア構成例について説明する。なお、以下に説明する情報処理装置９００のハードウェア構成例は、指示者側システムのハードウェア構成の一例に過ぎない。したがって、指示者側システムのハードウェア構成は、以下に説明する情報処理装置９００のハードウェア構成から不要な構成が削除されてもよいし、新たな構成が追加されてもよい。なお、作業者側システムのハードウェア構成も、指示者側システムのハードウェア構成と同様に実現され得る。 An example of the hardware configuration of the information processing device 900 will be described below as an example of the hardware configuration of the instructor-side system according to the embodiment of the present invention. Note that the hardware configuration example of the information processing device 900 described below is only an example of the hardware configuration of the instructor side system. Therefore, for the hardware configuration of the instructor-side system, unnecessary configurations may be deleted from the hardware configuration of the information processing apparatus 900 described below, or new configurations may be added. Note that the hardware configuration of the worker-side system can also be realized in the same way as the hardware configuration of the instructor-side system.

図７は、本発明の実施形態に係る指示者側システムの例としての情報処理装置９００のハードウェア構成を示す図である。情報処理装置９００は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）９０１と、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）９０２と、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）９０３と、ホストバス９０４と、ブリッジ９０５と、外部バス９０６と、インタフェース９０７と、入力装置９０８と、出力装置９０９と、ストレージ装置９１０と、通信装置９１１と、を備える。 FIG. 7 is a diagram showing a hardware configuration of an information processing device 900 as an example of an instructor-side system according to an embodiment of the present invention. The information processing device 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, a host bus 904, a bridge 905, an external bus 906, an interface 907, an input device 908, an output device 909, a storage device 910, and a communication device 911.

ＣＰＵ９０１は、演算処理装置および制御装置として機能し、各種プログラムに従って情報処理装置９００内の動作全般を制御する。また、ＣＰＵ９０１は、マイクロプロセッサであってもよい。ＲＯＭ９０２は、ＣＰＵ９０１が使用するプログラムや演算パラメータ等を記憶する。ＲＡＭ９０３は、ＣＰＵ９０１の実行において使用するプログラムや、その実行において適宜変化するパラメータ等を一時記憶する。これらはＣＰＵバス等から構成されるホストバス９０４により相互に接続されている。 The CPU 901 functions as an arithmetic processing device and a control device, and controls overall operations within the information processing device 900 according to various programs. Further, the CPU 901 may be a microprocessor. The ROM 902 stores programs used by the CPU 901, calculation parameters, and the like. The RAM 903 temporarily stores programs used in the execution of the CPU 901 and parameters that change as appropriate during the execution. These are interconnected by a host bus 904 composed of a CPU bus and the like.

ホストバス９０４は、ブリッジ９０５を介して、ＰＣＩ（ＰｅｒｉｐｈｅｒａｌＣｏｍｐｏｎｅｎｔＩｎｔｅｒｃｏｎｎｅｃｔ／Ｉｎｔｅｒｆａｃｅ）バス等の外部バス９０６に接続されている。なお、必ずしもホストバス９０４、ブリッジ９０５および外部バス９０６を分離構成する必要はなく、１つのバスにこれらの機能を実装してもよい。 The host bus 904 is connected via a bridge 905 to an external bus 906 such as a PCI (Peripheral Component Interconnect/Interface) bus. Note that the host bus 904, bridge 905, and external bus 906 do not necessarily need to be configured separately, and these functions may be implemented in one bus.

入力装置９０８は、マウス、キーボード、タッチパネル、ボタン、マイクロフォン、スイッチおよびレバー等ユーザが情報を入力するための入力手段と、ユーザによる入力に基づいて入力信号を生成し、ＣＰＵ９０１に出力する入力制御回路等から構成されている。情報処理装置９００を操作するユーザは、この入力装置９０８を操作することにより、情報処理装置９００に対して各種のデータを入力したり処理動作を指示したりすることができる。 The input device 908 is composed of input means for the user to input information, such as a mouse, keyboard, touch panel, button, microphone, switch, and lever, and an input control circuit or the like that generates an input signal based on the user's input and outputs it to the CPU 901. By operating the input device 908, a user operating the information processing device 900 can input various data to the information processing device 900 and instruct it to perform processing operations.

出力装置９０９は、例えば、ＣＲＴ（ＣａｔｈｏｄｅＲａｙＴｕｂｅ）ディスプレイ装置、液晶ディスプレイ（ＬＣＤ）装置、ＯＬＥＤ（ＯｒｇａｎｉｃＬｉｇｈｔＥｍｉｔｔｉｎｇＤｉｏｄｅ）装置、ランプ等の表示装置およびスピーカ等の音声出力装置を含む。 The output device 909 includes, for example, display devices such as a CRT (Cathode Ray Tube) display device, a liquid crystal display (LCD) device, an OLED (Organic Light Emitting Diode) device, and a lamp, as well as audio output devices such as a speaker.

ストレージ装置９１０は、データ格納用の装置である。ストレージ装置９１０は、記憶媒体、記憶媒体にデータを記録する記録装置、記憶媒体からデータを読み出す読出し装置および記憶媒体に記録されたデータを削除する削除装置等を含んでもよい。ストレージ装置９１０は、例えば、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）で構成される。このストレージ装置９１０は、ハードディスクを駆動し、ＣＰＵ９０１が実行するプログラムや各種データを格納する。 The storage device 910 is a device for storing data. The storage device 910 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 910 is configured with, for example, an HDD (Hard Disk Drive). This storage device 910 drives a hard disk and stores programs executed by the CPU 901 and various data.

通信装置９１１は、例えば、ネットワークに接続するための通信デバイス等で構成された通信インタフェースである。また、通信装置９１１は、無線通信または有線通信のどちらに対応してもよい。 The communication device 911 is, for example, a communication interface configured with a communication device for connecting to a network. Further, the communication device 911 may support either wireless communication or wired communication.

以上、本発明の実施形態に係る指示者側システムのハードウェア構成例について説明した。 The example hardware configuration of the instructor-side system according to the embodiment of the present invention has been described above.

（３．まとめ） (3. Summary)
以上、添付図面を参照しながら本発明の好適な実施形態について詳細に説明したが、本発明はかかる例に限定されない。本発明の属する技術の分野における通常の知識を有する者であれば、特許請求の範囲に記載された技術的思想の範疇内において、各種の変更例または修正例に想到し得ることは明らかであり、これらについても、当然に本発明の技術的範囲に属するものと了解される。 Although preferred embodiments of the present invention have been described above in detail with reference to the accompanying drawings, the present invention is not limited to such examples. It is clear that a person with ordinary knowledge in the technical field to which the present invention pertains can come up with various changes or modifications within the scope of the technical idea stated in the claims. It is understood that these also naturally fall within the technical scope of the present invention.

例えば、上記では、指示者および作業者それぞれが一人ずつである場合について説明した。しかし、指示者は複数存在してもよい。かかる場合には、複数の指示者それぞれによるジェスチャが扱われてもよい。あるいは、作業者は複数存在してもよい。かかる場合には、ジェスチャが配置されるカメラ座標系に対応するカメラが指定されることによって、指定されたカメラの作業者に対してジェスチャが提示されてもよい。 For example, the above description covers the case where there is one instructor and one worker. However, there may be multiple instructors. In such a case, gestures by each of the plurality of instructors may be handled. Alternatively, there may be multiple workers. In such a case, by specifying the camera corresponding to the camera coordinate system in which the gesture is placed, the gesture may be presented to the worker associated with the specified camera.

また、上記では、切り替え信号に基づいて、ジェスチャが配置される基準位置および基準方向が切り替えられる場合について主に説明した。しかし、切り替え信号に基づいて、ジェスチャ以外の対象（例えば、指示書きなど）が配置される基準位置および基準方向も、ジェスチャが配置される基準位置および基準方向と同様にして切り替えられてもよい。 Furthermore, the above description has mainly been given to the case where the reference position and reference direction in which gestures are placed are switched based on the switching signal. However, based on the switching signal, the reference position and reference direction where objects other than gestures (for example, written instructions) are placed may also be switched in the same manner as the reference position and reference direction where gestures are placed.

また、上記では、切り替え信号が動画像から静止画への切り替えを示す場合に、指示者側システムにおける演算処理部２４が、動画からスナップショットを作成し、作成したスナップショットが静止画としてディスプレイ２１によって表示されるように制御する場合について主に説明した。しかし、演算処理部２４は、静止画とともに動画をディスプレイ２１に表示させてもよい。このとき、静止画と動画とは、並置されてもよい。 Furthermore, the above description has mainly covered the case where, when the switching signal indicates switching from a moving image to a still image, the arithmetic processing unit 24 in the instructor-side system creates a snapshot from the moving image and performs control so that the created snapshot is displayed as a still image on the display 21. However, the arithmetic processing unit 24 may display the moving image on the display 21 together with the still image. At this time, the still image and the moving image may be juxtaposed.

あるいは、静止画状態においては、静止画と動画とは並置されなくてもよい。このとき、動画からスナップショットを作成する処理は、指示者側システムにおける演算処理部２４の代わりに、作業者側システムにおける演算処理部１４によって行われてもよい。かかる場合には、作業者側システムにおいては、カメラ１２による動画撮影が停止され、作業者側システムから指示者側システムには、動画の送信が停止されてもよい。これによって、カメラ１２による消費電力が低減され得る他、通信帯域が低減され得る。 Alternatively, in the still image state, the still image and the moving image may not be juxtaposed. At this time, the process of creating a snapshot from the video may be performed by the arithmetic processing section 14 in the worker's system instead of the arithmetic processing section 24 in the instructor's system. In such a case, the camera 12 may stop capturing a moving image in the worker's system, and may stop transmitting the moving image from the worker's system to the instructor's system. As a result, not only the power consumption by the camera 12 can be reduced, but also the communication band can be reduced.

上記では、仮想的な指示音声が配置される基準位置が、動画状態から静止画状態への切り替えを示す切り替え信号が取得されたタイミングにおけるカメラ１２の位置である場合を主に説明した。しかし、仮想的な指示音声が配置される基準位置は、切り替え信号が取得されたタイミングにおけるカメラ１２の位置からずれた位置（オフセットされた位置）とされてもよい。 In the above, the case where the reference position where the virtual instruction voice is placed is the position of the camera 12 at the timing when the switching signal indicating switching from the moving image state to the still image state is acquired has been mainly described. However, the reference position where the virtual instruction voice is placed may be a position shifted (offset position) from the position of the camera 12 at the timing when the switching signal was acquired.

また、上記では、指示者のジェスチャおよび指示者の音声が、静止画状態と動画状態との間の切り替えのために別々に用いられる場合について主に想定した。しかし、静止画状態と動画状態との間の切り替えには、指示者のジェスチャおよび指示者の音声の組み合わせが用いられてもよい。 Further, in the above description, the case where the gesture of the instructor and the voice of the instructor are used separately for switching between the still image state and the moving image state has been mainly assumed. However, a combination of the instructor's gesture and the instructor's voice may be used to switch between the still image state and the moving image state.

例えば、ジェスチャ認識装置２６は、ジェスチャ入力装置２２によって受け付けられた指示者のジェスチャから、切り替えの禁止を示すジェスチャパターン（第２のジェスチャパターン）の認識を試みてもよい。切り替えの禁止を示すジェスチャパターンは、具体的にどのようなジェスチャパターンであってもよい。 For example, the gesture recognition device 26 may attempt to recognize a gesture pattern (second gesture pattern) indicating prohibition of switching from the gestures of the instructor received by the gesture input device 22. The gesture pattern indicating prohibition of switching may be any specific gesture pattern.

そして、演算処理部２４は、切り替えの禁止を示すジェスチャパターンが認識されたときには、音声認識装置２７によって特定の音声パターンが認識されたとしても、切り替え信号を演算処理部１４に出力しなくてもよい。すなわち、演算処理部２４は、音声認識装置２７によって特定の音声パターンが認識された場合に、切り替えの禁止を示すジェスチャパターンが認識されないときにのみ、切り替え信号を演算処理部１４に出力してもよい。 Then, when the gesture pattern indicating prohibition of switching is recognized, the arithmetic processing unit 24 need not output the switching signal to the arithmetic processing unit 14 even if the specific voice pattern is recognized by the voice recognition device 27. That is, when the specific voice pattern is recognized by the voice recognition device 27, the arithmetic processing unit 24 may output the switching signal to the arithmetic processing unit 14 only if the gesture pattern indicating prohibition of switching is not recognized.

あるいは、ジェスチャ認識装置２６は、ジェスチャ入力装置２２によって受け付けられた指示者のジェスチャから、切り替えの実行を示すジェスチャパターン（第３のジェスチャパターン）の認識を試みてもよい。切り替えの実行を示すジェスチャパターンは、具体的にどのようなジェスチャパターンであってもよい。 Alternatively, the gesture recognition device 26 may attempt to recognize a gesture pattern (third gesture pattern) indicating execution of switching from the gestures of the instructor received by the gesture input device 22. The gesture pattern indicating execution of switching may be any specific gesture pattern.

そして、演算処理部２４は、切り替えの実行を示すジェスチャパターンが認識されないときには、音声認識装置２７によって特定の音声パターンが認識されたとしても、切り替え信号を演算処理部１４に出力しなくてもよい。すなわち、演算処理部２４は、音声認識装置２７によって特定の音声パターンが認識された場合に、切り替えの実行を示すジェスチャパターンが認識されたときにのみ、切り替え信号を演算処理部１４に出力してもよい。 Then, when the gesture pattern indicating execution of switching is not recognized, the arithmetic processing unit 24 need not output the switching signal to the arithmetic processing unit 14 even if the specific voice pattern is recognized by the voice recognition device 27. That is, when the specific voice pattern is recognized by the voice recognition device 27, the arithmetic processing unit 24 may output the switching signal to the arithmetic processing unit 14 only if the gesture pattern indicating execution of switching is also recognized.
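The two combined voice-and-gesture variants described above reduce to simple boolean conditions. The sketch below uses hypothetical function and parameter names and assumes the recognizers have already produced true/false results for the current moment; how recognition itself is performed is outside its scope.

```python
def allow_switch_with_prohibit_gesture(voice_recognized: bool,
                                       prohibit_gesture_recognized: bool) -> bool:
    """Variant with the second gesture pattern: the gesture acts as a veto."""
    return voice_recognized and not prohibit_gesture_recognized


def allow_switch_with_execute_gesture(voice_recognized: bool,
                                      execute_gesture_recognized: bool) -> bool:
    """Variant with the third gesture pattern: the gesture acts as a confirmation."""
    return voice_recognized and execute_gesture_recognized
```

The difference lies in the default: in the veto variant a voice pattern alone triggers the switching signal, whereas in the confirmation variant a voice pattern alone never does, which may reduce accidental switching when the trigger phrase occurs in ordinary speech.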

上記では、ジェスチャ認識装置２６によって第１のジェスチャパターンが認識された場合、または、音声認識装置２７によって特定の音声パターンが認識された場合に、演算処理部２４が、静止画像と動画像との間における切り替え信号を演算処理部１４に出力する場合を主に想定した。しかし、切り替え信号を演算処理部１４に出力するか否かは、追加的な条件も加味されて判定されてもよい。 The above description has mainly assumed the case where the arithmetic processing unit 24 outputs a switching signal between a still image and a moving image to the arithmetic processing unit 14 when the first gesture pattern is recognized by the gesture recognition device 26 or when the specific voice pattern is recognized by the voice recognition device 27. However, whether or not to output the switching signal to the arithmetic processing unit 14 may be determined by also taking additional conditions into account.

例えば、作業者の状態を検出するセンサ（例えば、加速度センサ、振動センサ、ジャイロセンサなど）と、かかるセンサデータ（第２のセンサデータ）から作業者の行動を認識する行動認識装置とが設けられていてもよい。例えば、センサは、作業者の身体に付されていてもよい（例えば、ＡＲディスプレイ１１に付されていてもよい）。行動認識装置は、作業者側システムに設けられてもよいし、指示者側システムに設けられてもよいし、作業者側システムおよび指示者側システムの外部に設けられてもよい。 For example, a sensor that detects the state of the worker (for example, an acceleration sensor, a vibration sensor, a gyro sensor, etc.) and a behavior recognition device that recognizes the worker's behavior from the data of that sensor (second sensor data) may be provided. For example, the sensor may be attached to the worker's body (for example, it may be attached to the AR display 11). The behavior recognition device may be provided in the worker-side system, in the instructor-side system, or outside both the worker-side system and the instructor-side system.

例えば、センサデータから認識される作業者の行動が作業者の移動中（例えば、歩行中など）を示す場合には、作業を行っていないことが想定される。したがって、かかる場合には、演算処理部２４は、動画像から静止画像への切り替えを示す入力パターンがジェスチャ認識装置２６または音声認識装置２７によって認識されたとしても、動画像から静止画像への切り替え信号を演算処理部１４に出力しなくてもよい。 For example, when the worker's behavior recognized from the sensor data indicates that the worker is moving (for example, walking), it is assumed that the worker is not working. Therefore, in such a case, even if the gesture recognition device 26 or the voice recognition device 27 recognizes the input pattern indicating switching from a moving image to a still image, the arithmetic processing unit 24 will not be able to switch from a moving image to a still image. It is not necessary to output the signal to the arithmetic processing section 14.
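The additional movement condition can be sketched as a filter applied to a recognized switching request before it is sent to the worker side. The request strings, the behavior labels in `MOVING_BEHAVIORS`, and the function name are all hypothetical; the source only states that a switch to the still image state may be suppressed while the worker is moving.

```python
# Hypothetical labels produced by a behavior recognition device.
MOVING_BEHAVIORS = {"walking", "running"}


def filter_switch_request(request, worker_behavior):
    """Return the switching request to send, or None if it should be suppressed.

    request: "to_still", "to_moving", or None when no input pattern was recognized.
    worker_behavior: label recognized from the second sensor data.
    """
    if request == "to_still" and worker_behavior in MOVING_BEHAVIORS:
        # The worker is presumably not working, so freezing the image is skipped.
        return None
    return request
```

Only the transition into the still image state is gated here; a request to return to the moving image state passes through regardless of the worker's behavior, which follows the example given in the text.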

また、上記では、動画状態と静止画状態との間の切り替えのために、ジェスチャ認識装置２６および音声認識装置２７が専用に設けられる場合について主に想定した。しかし、動画状態と静止画状態との間の切り替え以外の機能（例えば、通信の切断、音量の変更など）を実現するために設けられているジェスチャ認識装置および音声認識装置が、動画状態と静止画状態との間の切り替えを行うためのジェスチャ認識装置２６および音声認識装置２７に流用されてもよい。 Further, the above description has mainly assumed the case where the gesture recognition device 26 and the voice recognition device 27 are provided exclusively for switching between the moving image state and the still image state. However, a gesture recognition device and a voice recognition device that are provided to realize functions other than switching between the moving image state and the still image state (for example, disconnecting communication, changing the volume, etc.) may also be reused as the gesture recognition device 26 and the voice recognition device 27 for switching between the moving image state and the still image state.

１遠隔作業支援システム
１１ＡＲディスプレイ
１２カメラ
１３位置姿勢計測部
１４演算処理部
１５スピーカ
１６音声処理部
２１ディスプレイ
２２ジェスチャ入力装置
２４演算処理部
２５マイクロフォン
２６ジェスチャ認識装置
２７音声認識装置
３０ネットワーク

1 Remote work support system
11 AR display
12 Camera
13 Position and orientation measurement unit
14 Arithmetic processing unit
15 Speaker
16 Audio processing unit
21 Display
22 Gesture input device
24 Arithmetic processing unit
25 Microphone
26 Gesture recognition device
27 Voice recognition device
30 Network

Claims

An instructor-side device comprising:
a recognition unit that recognizes a specific input pattern from first sensor data;
a switching signal output unit that outputs a switching signal between a still image and a moving image to a worker-side device based on the recognition of the input pattern; and
an image display control unit that controls display of a gesture on a display based on gesture information of an instructor, and controls display, on the display, of the still image or the moving image transmitted from the worker-side device based on the switching signal.

The instructor-side device according to claim 1, wherein
the first sensor data includes a gesture, and
the specific input pattern includes a first gesture pattern.

The instructor-side device according to claim 1 or 2, wherein
the first sensor data includes voice, and
the specific input pattern includes a specific voice pattern.

The instructor-side device according to claim 3, wherein
the recognition unit recognizes a second gesture pattern from the gesture of the instructor, and
the switching signal output unit does not output the switching signal to the worker-side device when the second gesture pattern is recognized, even if the specific voice pattern is recognized.

The instructor-side device according to claim 3, wherein
the recognition unit recognizes a third gesture pattern from the gesture of the instructor, and
the switching signal output unit does not output the switching signal to the worker-side device when the third gesture pattern is not recognized, even if the specific voice pattern is recognized.

The instructor-side device according to any one of claims 1 to 5, wherein
the switching signal output unit outputs a switching signal from a still image to a moving image to the worker-side device based on recognition of an input pattern indicating switching from a still image to a moving image.

The instructor-side device according to any one of claims 1 to 6, wherein
the switching signal output unit outputs a switching signal from a moving image to a still image to the worker-side device based on recognition of an input pattern indicating switching from a moving image to a still image.

The instructor-side device according to claim 7, wherein,
when the worker's behavior recognized from second sensor data indicates that the worker is moving, the switching signal output unit does not output the switching signal from a moving image to a still image to the worker-side device even if the input pattern indicating switching from a moving image to a still image is recognized.

A method comprising:
recognizing a specific input pattern from first sensor data;
outputting a switching signal between a still image and a moving image to a worker-side device based on the recognition of the input pattern; and
controlling display of a gesture on a display based on gesture information of an instructor, and controlling display, on the display, of the still image or the moving image transmitted from the worker-side device based on the switching signal.

A program for causing a computer to function as an instructor-side device comprising:
a recognition unit that recognizes a specific input pattern from first sensor data;
a switching signal output unit that outputs a switching signal between a still image and a moving image to a worker-side device based on the recognition of the input pattern; and
an image display control unit that controls display of a gesture on a display based on gesture information of an instructor, and controls display, on the display, of the still image or the moving image transmitted from the worker-side device based on the switching signal.