JP2015115724A

JP2015115724A - Video instruction method capable of superposing instruction picture on imaged moving picture, system, terminal, and program

Info

Publication number: JP2015115724A
Application number: JP2013255497A
Authority: JP
Inventors: Daisuke Arai; Tomohiko Ogishi; Tatsuya Kobayashi; Toshihiro Tsuji; Haruhisa Kato
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-12-10
Filing date: 2013-12-10
Publication date: 2015-06-22
Anticipated expiration: 2033-12-10
Also published as: JP6156930B2

Abstract

PROBLEM TO BE SOLVED: To provide a video instruction method and the like with which an instruction can be given in picture form, from one terminal, on video captured by the camera of another terminal, without using a marker or a registered object picture. SOLUTION: A first terminal sequentially transmits a captured moving picture. Next, a second terminal has a user designate a second predetermined range on the received captured moving picture and has the user write an instruction still picture. The second terminal then calculates a coordinate relative value between a first predetermined range, which includes the instruction still picture written by the user, and the second predetermined range, and transmits to the first terminal the coordinate relative value, the instruction still picture, and a captured still picture obtained by trimming the captured moving picture to the second predetermined range as a still picture. The first terminal detects a third predetermined range in which the moving picture being captured matches the captured still picture, derives a fourth predetermined range from the third predetermined range by using the coordinate relative value, and superposes the instruction still picture on the fourth predetermined range for display.

Description

The present invention relates to technology for online video services between terminals.

In recent years, with the spread of terminals such as smartphones and tablets, online video services have been provided over networks between geographically distant terminals (see, for example, Non-Patent Document 1). In field work, for example, such a service allows video captured by a field worker's terminal to be transmitted in real time to a remote work manager. The work manager can then grasp the situation at the work site from the video and give instructions by voice.

FIG. 1 is a system configuration diagram of an online video service.

In the system of FIG. 1, a terminal such as a mobile phone or smartphone streams captured video data to the other terminal in real time via a network. In recent years, even portable devices such as mobile terminals have become able to capture HD (High-Definition) class video.

According to FIG. 1, the terminal 1 is carried by a field worker (the instructed person), and video is captured by its built-in camera. The terminal 2, on the other hand, is carried by a work manager (the instructor). The terminal 1 transmits the video data to the terminal 2 in real time via an access network and the Internet. By playing back the received video data on its display, the terminal 2 lets the work manager see the field worker's situation.

However, a work manager often cannot give clear instructions to the field worker by voice alone. For example, it is desirable for the work manager to be able to indicate to the field worker, on the video, the positions of the wide variety of devices and operating parts at the site.

Conventionally, there is a technique in which a field worker sends a still image captured with his or her own terminal to the work manager's terminal, and the work manager sends back to the field worker's terminal a still image on which instruction information is superimposed (see, for example, Non-Patent Document 2). This allows the work manager to instruct the field worker with a still image rather than by voice alone.

Augmented reality (AR) technology can also be applied to identify a predetermined position in the video (see, for example, Non-Patent Documents 3 and 4). The position is identified by recognizing an AR marker in the video. Alternatively, without using an AR marker, a markerless object recognition method can be used, which detects the object appearing in the video from among a large number of object images.

Non-Patent Document 1: "Skype", [online], [searched November 13, 2013], Internet <URL: http://www.skype.com/ja/>
Non-Patent Document 2: Kozo Keikaku Engineering (構造計画研究所), "Remote Guideware", [online], [searched November 13, 2013], Internet <http://www4.kke.co.jp/guideware/>
Non-Patent Document 3: Fujitsu, "Work support technology using AR", [online], [searched November 13, 2013], Internet <http://jp.fujitsu.com/solutions/industry/nextvalue/technology/tec_ar.html>
Non-Patent Document 4: NTT Giken, "Equipment management business system using AR", [online], [searched November 13, 2013], Internet <http://www.ntt.co.jp/journal/1302/files/jn201302042.pdf>
Non-Patent Document 5: "Affine transformation and projective transformation", [online], [searched December 10, 2013], Internet <http://www58.atwiki.jp/dooooornob/pages/43.html>
Non-Patent Document 6: "Image Processing Solution", [online], [searched December 10, 2013], Internet <http://imagingsolution.blog107.fc2.com/blog-entry-86.html>
Non-Patent Document 7: "Matrix calculation", [online], [searched December 10, 2013], Internet <http://www.cg.info.hiroshima-cu.ac.jp/~miyazaki/knowledge/tech07.html>
Non-Patent Document 8: "Camera calibration and 3D reconstruction", [online], [searched December 10, 2013], Internet <http://opencv.jp/opencv-2svn/cpp/camera_calibration_and_3d_reconstruction.html>
Non-Patent Document 9: "3D Geometric Analysis", [online], [searched December 10, 2013], Internet <http://www.ieice-hbkb.org/files/02/02gun_02hen_03.pdf>

However, the technique described in Non-Patent Document 2 requires the camera mounted on the field worker's terminal to be kept fixed. If the shooting position moves, the view becomes misaligned with the still image sent by the work manager, and the field worker may be unable to tell which position is being indicated among densely arranged devices or operating parts.

The techniques described in Non-Patent Documents 3 and 4 require an AR marker printed with a special pattern in order to identify the position in the video at which the instruction image is to be superimposed. Attaching AR markers to devices and operating parts in advance is extremely laborious.

The markerless object recognition technique, for its part, requires a large number of object images to be registered in advance. Moreover, if an object appearing in the video is similar in shape to a registered object image, the wrong object image may be associated with it.

Accordingly, an object of the present invention is to provide a video instruction method, system, terminal, and program with which an instruction can be given in image form, from one terminal, on the video captured by the camera of another terminal, without using markers or registered object images.

According to the present invention, a video instruction method in a system in which a first terminal having a display and a camera and a second terminal having a display are connected via a network comprises:
a first step in which the first terminal sequentially transmits a moving image captured by the camera to the second terminal;
a second step in which the second terminal displays the received captured moving image on its display, has the user designate a second predetermined range on the captured moving image, and has the user write an instruction still image onto the captured moving image;
a third step in which the second terminal calculates a first coordinate relative value between a first predetermined range, which includes the instruction still image written by the user, and the second predetermined range, and transmits to the first terminal the first coordinate relative value, the instruction still image, and a captured still image obtained by trimming the captured moving image to the second predetermined range as a still image; and
a fourth step in which the first terminal detects a third predetermined range in which the moving image captured by the camera matches the captured still image (second predetermined range), calculates a second coordinate relative value between the second predetermined range and the third predetermined range, and displays on its display the instruction still image superimposed on a fourth predetermined range obtained by applying the first coordinate relative value and the second coordinate relative value to the captured still image (second predetermined range).

According to another embodiment of the video instruction method of the present invention, it is also preferable that
the first coordinate relative value is a translation matrix, a combination of a translation matrix and a scaling matrix, a Euclidean transformation matrix, or an affine transformation matrix, and
the second coordinate relative value is a projective transformation matrix or a pose transformation matrix.
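For reference, the matrix families named in this embodiment can be written in homogeneous coordinates as follows. This is the standard textbook form, not notation taken from the patent; the symbols $t_x$, $t_y$, $s_x$, $s_y$, $\theta$, $a_{ij}$, and $h_{ij}$ are introduced here only for illustration, and the pose transformation matrix is left out because this section does not define it.

```latex
\[
T = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
TS = \begin{pmatrix} s_x & 0 & t_x \\ 0 & s_y & t_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
E = \begin{pmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix}
\]
\[
A = \begin{pmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix},
\qquad
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \frac{1}{h_{31}x + h_{32}y + h_{33}}
\begin{pmatrix} h_{11}x + h_{12}y + h_{13} \\ h_{21}x + h_{22}y + h_{23} \end{pmatrix}
\]
```

Here $T$ is a translation, $TS$ a scaling followed by a translation, $E$ a Euclidean (rotation plus translation) transformation, $A$ a general affine transformation, and $H$ a projective transformation, which is defined only up to a common scale factor.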

According to another embodiment of the video instruction method of the present invention, it is also preferable that, in the first step, the first terminal transmits to the second terminal only frames obtained by thinning out the captured moving image at a predetermined time interval.

According to another embodiment of the video instruction method of the present invention, it is also preferable that, in the first step, only the I (Intra-picture) frames of the captured moving image, which serve as the reference for motion-compensated inter-frame prediction, are transmitted to the second terminal.

According to another embodiment of the video instruction method of the present invention, it is also preferable that, in the first step, the first terminal sets the data rate of the I frame to a relatively high rate that does not exceed the data rate of one GOP (Group Of Pictures).

According to another embodiment of the video instruction method of the present invention, it is also preferable that
the captured still image is a feature image for matching or a resolution-compressed image for reducing data volume, and
the instruction still image is a resolution-compressed image for reducing data volume.

According to another embodiment of the video instruction method of the present invention, it is also preferable that
the display mounted on the second terminal is a touch panel display, and
in the second step, the second terminal uses an image drawn by the user with a finger on the touch panel display as the instruction still image.

According to another embodiment of the video instruction method of the present invention, it is also preferable that
the second terminal is further connected to a touch pen input device with which the user draws on the captured moving image shown on the display, and
in the second step, the second terminal uses an image drawn by the user with the touch pen as the instruction still image.

According to another embodiment of the video instruction method of the present invention, it is also preferable that, in the fourth step, the first terminal applies a markerless object recognition method of AR (Augmented Reality).

According to the present invention, in a video instruction system in which a first terminal having a display and a camera and a second terminal having a display are connected via a network,
the first terminal comprises:
captured moving image transmission means for sequentially transmitting a moving image captured by the camera to the second terminal; and
video display control means for detecting a third predetermined range in which the moving image captured by the camera matches the captured still image (second predetermined range) received from the second terminal, calculating a second coordinate relative value between the second predetermined range and the third predetermined range, and displaying on the display the instruction still image superimposed on a fourth predetermined range obtained by applying the first coordinate relative value and the second coordinate relative value to the captured still image (second predetermined range), and
the second terminal comprises:
instruction still image input means for displaying the received captured moving image on the display, having the user designate a second predetermined range on the captured moving image, and having the user write an instruction still image onto the captured moving image; and
instruction still image transmission means for calculating a first coordinate relative value between a first predetermined range, which includes the instruction still image written by the user, and the second predetermined range, and transmitting to the first terminal the first coordinate relative value, the instruction still image, and a captured still image obtained by trimming the captured moving image to the second predetermined range as a still image.

According to the present invention, a terminal equipped with a display and a camera comprises:
captured moving image transmission means for sequentially transmitting a moving image captured by the camera to a counterpart terminal;
instruction still image input means for displaying a captured moving image received from the counterpart terminal on the display, having the user designate a second predetermined range on the captured moving image, and having the user write an instruction still image onto the captured moving image;
instruction still image transmission means for calculating a first coordinate relative value between a first predetermined range, which includes the instruction still image written by the user, and the second predetermined range, and transmitting to the counterpart terminal the first coordinate relative value, the instruction still image, and a captured still image obtained by trimming the captured moving image to the second predetermined range as a still image; and
video display control means for detecting a third predetermined range in which the moving image captured by the camera matches the captured still image (second predetermined range) received from the counterpart terminal, calculating a second coordinate relative value between the second predetermined range and the third predetermined range, and displaying on the display the instruction still image superimposed on a fourth predetermined range obtained by applying the first coordinate relative value and the second coordinate relative value to the captured still image (second predetermined range).

According to the present invention, a program for a computer installed in a terminal equipped with a display and a camera causes the computer to function as:
captured moving image transmission means for sequentially transmitting a moving image captured by the camera to a counterpart terminal;
instruction still image input means for displaying a captured moving image received from the counterpart terminal on the display, having the user designate a second predetermined range on the captured moving image, and having the user write an instruction still image onto the captured moving image;
instruction still image transmission means for calculating a first coordinate relative value between a first predetermined range, which includes the instruction still image written by the user, and the second predetermined range, and transmitting to the counterpart terminal the first coordinate relative value, the instruction still image, and a captured still image obtained by trimming the captured moving image to the second predetermined range as a still image; and
video display control means for detecting a third predetermined range in which the moving image captured by the camera matches the captured still image (second predetermined range) received from the counterpart terminal, calculating a second coordinate relative value between the second predetermined range and the third predetermined range, and displaying on the display the instruction still image superimposed on a fourth predetermined range obtained by applying the first coordinate relative value and the second coordinate relative value to the captured still image (second predetermined range).

According to the video instruction method, system, terminal, and program of the present invention, an instruction can be given in image form, from one terminal, on the video captured by the camera of another terminal, without using markers or registered object images.

FIG. 1 is a system configuration diagram of an online video service.
FIG. 2 is a sequence diagram of the present invention.
FIG. 3 is an explanatory diagram showing frames of a captured moving image in the present invention.
FIG. 4 is a screen view in which video captured by the first terminal is displayed on the display of the second terminal.
FIG. 5 is a screen view in which the instructor is writing an instruction on the second terminal.
FIG. 6 is an explanatory diagram showing an instruction still image and a captured still image.
FIG. 7 is a screen view of the first terminal in which the instruction still image is displayed superimposed on the part corresponding to the captured still image.
FIG. 8 is a screen view of the first terminal when, relative to FIG. 7, the shooting position with respect to the target object has undergone a projective movement.
FIG. 9 is a functional configuration diagram of the first terminal and the second terminal.
FIG. 10 is a functional configuration diagram of a dual-purpose terminal equipped with both the transmitting-side and receiving-side functions.

Embodiments of the present invention are described in detail below with reference to the drawings.

In the present invention, the AR markerless object recognition method is applied in such a way that the user freely sets the "captured still image" (second predetermined range) that serves as the key for matching.

FIG. 2 is a sequence diagram of the present invention.

According to FIG. 2, a terminal 1 (the instructed-side terminal) having a display and a camera and a terminal 2 (the instructing-side terminal) having at least a display are connected via a network. The display and camera may be built into the terminal or connected to it externally.

[First Step S1]
The terminal 1 sequentially transmits the moving image captured by its camera to the terminal 2. For example, the terminal 1, operated by a field worker (the instructed person), captures the work situation (the target object) as a moving image (video). Here, it is preferable to transmit as the "captured moving image" only frames thinned out at a predetermined time interval. In other words, turning the moving image into a sequence of "flip-book images" makes the captured moving image easier for the instructor operating the terminal 2 to recognize.

FIG. 3 is an explanatory diagram showing frames of a captured moving image in the present invention.

FIG. 3(a) shows the case of, for example, Motion JPEG: every frame of the captured moving image is JPEG-compressed, and frames are simply thinned out at a predetermined time interval.

FIG. 3(b) shows the case of, for example, motion-compensated inter-frame prediction, in which the frames are organized in units of GOPs (Groups Of Pictures). A GOP generally consists of one I (Intra-picture) frame and a plurality of P (Predictive-picture) frames and B (Bidirectionally predictive-picture) frames. In the present invention, only the I frames are extracted as the captured moving image. That is, only frames in which the entire image is encoded are transmitted as flip-book images.
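The two thinning strategies of FIG. 3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Frame` record, its `timestamp_ms` and `kind` fields, and both function names are hypothetical, and a real encoder or container library would supply the frame types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    timestamp_ms: int  # capture time in milliseconds (hypothetical field)
    kind: str          # "I", "P", or "B"; Motion JPEG frames are all intra-coded


def thin_by_interval(frames: List[Frame], interval_ms: int) -> List[Frame]:
    """Keep the first frame, then each frame at least interval_ms after the last kept one."""
    kept: List[Frame] = []
    for frame in frames:
        if not kept or frame.timestamp_ms - kept[-1].timestamp_ms >= interval_ms:
            kept.append(frame)
    return kept


def keep_i_frames(frames: List[Frame]) -> List[Frame]:
    """Keep only intra-coded frames, which decode without reference to other frames."""
    return [frame for frame in frames if frame.kind == "I"]


# Two GOPs of nine frames each, captured at 100 ms intervals (made-up data).
pattern = "IBBPBBPBB" * 2
stream = [Frame(timestamp_ms=i * 100, kind=k) for i, k in enumerate(pattern)]

thinned = thin_by_interval(stream, interval_ms=500)  # FIG. 3(a) style
intra_only = keep_i_frames(stream)                   # FIG. 3(b) style
```

Interval-based thinning suits streams in which every frame is independently decodable, such as Motion JPEG. In an inter-frame-coded stream, P and B frames depend on other frames, which is why the I-frame filter is the natural choice there.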

It is also preferable to set the data rate of the I frame to a relatively high rate that does not exceed the data rate of one GOP. For example, the data rate of a single I frame can be made equal to the data rate of the GOP. This raises the resolution of each flip-book image in the captured moving image and makes it easier for the instructor operating the terminal 2 to recognize the captured moving image in fine detail.

[Second Step S2]
(S21) The terminal 2 displays the received captured moving image on its display and has the user designate a second predetermined range on the captured moving image.

FIG. 4 is a screen view in which video captured by the first terminal is displayed on the display of the second terminal.

According to FIG. 4, the terminal 2 carried by the work manager (the instructor) displays, as a moving image (flip-book images), the on-site situation captured by the terminal 1 operated by the field worker (the instructed person). A "Write instruction" button is shown at the upper right of the display of the terminal 2. While the captured moving image advances as flip-book images, the instructor can press the "Write instruction" button to freeze the display on a single image.

FIG. 5 is a screen view in which the instructor is writing an instruction on the second terminal.

According to FIG. 5, the user first designates the "second predetermined range" with a finger on the touch panel display of the terminal 2. Here, the part showing the smartphone, not the keyboard, is designated as the second predetermined range. The "second predetermined range" is the range of the "captured still image" used for matching. Of course, the second predetermined range may be designated as a frame around the instruction still image that the user is about to draw with a finger, or it may designate a part entirely separate from the instruction still image to be drawn.

(S22) The terminal 2 has the user write an instruction still image onto the captured moving image. Here, the user has drawn "← here" pointing at the [R] key of the keyboard.

In another embodiment, the terminal 2 may further be connected to a touch pen input device with which the user draws on the captured moving image shown on the display. In this case, the terminal 2 can use the image drawn by the user with the touch pen as the instruction still image.

[Third Step S3]
The terminal 2 transmits the following two still images and the first coordinate relative value to the terminal 1.
"Instruction still image": the still image of the first predetermined range written by the user
"Captured still image": the still image obtained by trimming the captured moving image to the second predetermined range
"First coordinate relative value": the coordinate relative value between the first predetermined range and the second predetermined range
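The three items sent in step S3 could be bundled as in the sketch below. The `InstructionMessage` class, its field names, the `crop` helper, and the list-of-rows image type are all hypothetical choices for illustration; the patent does not prescribe a message layout or an image representation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]            # grayscale rows, purely for illustration
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive
Matrix3x3 = List[List[float]]


def crop(frame: Image, rect: Rect) -> Image:
    """Trim a frame to the given rectangle."""
    left, top, right, bottom = rect
    return [row[left:right] for row in frame[top:bottom]]


@dataclass
class InstructionMessage:
    instruction_still_image: Image              # what the user drew (first predetermined range)
    captured_still_image: Image                 # frame trimmed to the second predetermined range
    first_coordinate_relative_value: Matrix3x3  # relation between the two ranges


# Made-up 6x4 frame whose pixel value encodes its own position (10 * y + x).
frame = [[10 * y + x for x in range(6)] for y in range(4)]
second_range = (1, 1, 4, 3)

message = InstructionMessage(
    instruction_still_image=[[255, 0], [0, 255]],
    captured_still_image=crop(frame, second_range),
    first_coordinate_relative_value=[[1.0, 0.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
)
```

In practice the two images would more likely travel as compressed bytes (for example JPEG) than as raw pixel rows.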

(S31) The terminal 2 determines the first predetermined range, which includes the instruction still image written by the user. The "first predetermined range" is set automatically, for example as a rectangular range, so as to contain the instruction still image.
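One simple way to set such a rectangle automatically is to take the axis-aligned bounding box of the points the user touched. The sketch below assumes the drawing is available as a list of (x, y) pixel coordinates; the function name, the `margin` parameter, and the clamping to the frame size are illustrative assumptions, not details from the patent.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive pixel coordinates


def bounding_rect(points: Iterable[Point], width: int, height: int, margin: int = 0) -> Rect:
    """Smallest axis-aligned rectangle containing all points, padded and clamped to the frame."""
    pts = list(points)
    if not pts:
        raise ValueError("no stroke points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin, width - 1)
    bottom = min(max(ys) + margin, height - 1)
    return (left, top, right, bottom)


# Made-up touch points from a short stroke on a 640x480 frame.
stroke = [(120, 80), (135, 82), (150, 95), (128, 110)]
first_range = bounding_rect(stroke, width=640, height=480, margin=4)
```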

As described later, the "captured still image" is used as the "key image" for image matching. It therefore need not be the image itself and may instead be a feature image for matching. A feature image consists of feature values computed from local regions of the image, extracted for example from local regions such as edges and corners. Typical examples are SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features). Binary features, which have a low computational cost, can also be used. It may also be a locally cropped image (patch) for matching by SSD (Sum of Squared Differences) or normalized cross-correlation (NCC).
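To make the patch-based option concrete, the sketch below scores a patch against every window of a larger image with SSD and zero-mean NCC and returns the best NCC position. It is a brute-force illustration in pure Python with hypothetical function names. It is not SIFT or SURF, it handles translation only, and it is not claimed to be the matching procedure used in the patent.

```python
import math
from typing import List, Tuple

Gray = List[List[float]]  # grayscale image as rows of pixel values


def window(image: Gray, x: int, y: int, w: int, h: int) -> List[float]:
    """Flatten the w-by-h window whose top-left corner is (x, y)."""
    return [v for row in image[y:y + h] for v in row[x:x + w]]


def ssd(a: List[float], b: List[float]) -> float:
    """Sum of squared differences: 0 for identical windows, larger is worse."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def ncc(a: List[float], b: List[float]) -> float:
    """Zero-mean normalized cross-correlation in [-1, 1]; 0.0 if either window is flat."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [p - mean_a for p in a]
    db = [q - mean_b for q in b]
    denom = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    if denom == 0:
        return 0.0
    return sum(p * q for p, q in zip(da, db)) / denom


def find_best_match(image: Gray, patch: Gray) -> Tuple[int, int, float]:
    """Slide the patch over the image and return (x, y, score) of the highest NCC."""
    ph, pw = len(patch), len(patch[0])
    key = window(patch, 0, 0, pw, ph)
    best = (0, 0, -2.0)
    for y in range(len(image) - ph + 1):
        for x in range(len(image[0]) - pw + 1):
            score = ncc(window(image, x, y, pw, ph), key)
            if score > best[2]:
                best = (x, y, score)
    return best


# Made-up 6x6 image with a 2x2 patch embedded at x=3, y=2.
patch = [[1.0, 2.0], [3.0, 4.0]]
image = [[0.0] * 6 for _ in range(6)]
for dy, row in enumerate(patch):
    for dx, value in enumerate(row):
        image[2 + dy][3 + dx] = value

match_x, match_y, match_score = find_best_match(image, patch)
```

NCC is insensitive to uniform brightness and contrast changes between the key image and the live frame, whereas SSD is cheaper but assumes similar exposure. Neither copes with rotation or perspective change, which is where feature-based methods such as SIFT come in.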

Furthermore, the "captured still image" and the "instruction still image" may be reduced-resolution images to lower the data amount. These images need not be in bitmap format and may be compressed images such as JPEG.

FIG. 6 is an explanatory diagram showing the instruction still image and the captured still image.

FIG. 6 shows the relationship between the first predetermined range and the second predetermined range.
First predetermined range: the range containing the instruction still image written by the user at the instruction-side terminal
Second predetermined range: the range containing the captured still image for matching, designated by the user at the instruction-side terminal
The predetermined ranges are related by the following transformation.
Between the second predetermined range and the first predetermined range -> Euclidean transformation

(S32) The terminal 2 calculates a first coordinate relative value between the "first predetermined range" of the instruction still image and the "second predetermined range" (see the matrix Ha in FIG. 6). The first coordinate relative value may be, for example, a translation matrix, a combination of a translation matrix and a scaling matrix, a Euclidean transformation matrix, or an affine transformation matrix (see, for example, Non-Patent Documents 5 to 7). These are transformations consisting of translation only, or of translation combined with a linear transformation (scaling, shearing, rotation, and so on). For rectangular ranges, for example, the value expresses the displacement between the four vertex coordinates of the first predetermined range and the four vertex coordinates of the second predetermined range.
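For the "translation plus scaling" case of step S32, the first coordinate relative value can be sketched as a 3 × 3 matrix that maps the corners of the second predetermined range onto those of the first. The helper names `rect_to_rect_matrix` and `apply_matrix`, and the sample rectangles, are assumptions made for illustration only.

```python
def rect_to_rect_matrix(src, dst):
    """3x3 matrix (translation + per-axis scaling) mapping rectangle src onto dst.

    Rectangles are (left, top, right, bottom) tuples.
    """
    sl, st, sr, sb = src
    dl, dt, dr, db = dst
    if sr == sl or sb == st:
        raise ValueError("source rectangle has zero width or height")
    sx = (dr - dl) / (sr - sl)  # horizontal scale
    sy = (db - dt) / (sb - st)  # vertical scale
    return [[sx, 0.0, dl - sx * sl],
            [0.0, sy, dt - sy * st],
            [0.0, 0.0, 1.0]]


def apply_matrix(m, x, y):
    """Apply a 3x3 matrix to the point (x, y) in homogeneous coordinates."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


# Hypothetical ranges in display coordinates of the terminal 2.
second_range = (100.0, 50.0, 300.0, 250.0)   # matching region chosen by the user
first_range = (150.0, 100.0, 250.0, 150.0)   # box around the drawn instruction
Ha = rect_to_rect_matrix(second_range, first_range)
```

Only the six non-trivial entries of `Ha` would need to be transmitted; a rotation or shear would require the Euclidean or affine forms instead.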

In the case of an affine transformation between the first predetermined range and the second predetermined range, the linear transformation is expressed by multiplying the coordinates by a 2 × 2 matrix, and the translation is expressed by adding a two-dimensional vector.

x, y: x-coordinate and y-coordinate in the second predetermined range
x', y': x-coordinate and y-coordinate in the first predetermined range
h₁₁, h₁₂, h₂₁, h₂₂: linear transformation elements
h₁₃, h₂₃: translation elements
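The equation itself is not reproduced in this text. One form consistent with the symbol definitions above, written in homogeneous coordinates, would be the following; the fixed bottom row (0, 0, 1) is an assumption of the standard affine form.

```latex
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
0      & 0      & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
```

Expanded, this reads x' = h₁₁x + h₁₂y + h₁₃ and y' = h₂₁x + h₂₂y + h₂₃, that is, a 2 × 2 linear part followed by the addition of a two-dimensional translation vector.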

In the case of a Euclidean transformation, the relationship is expressed as follows.

x, y: x-coordinate and y-coordinate in the second predetermined range
x', y': x-coordinate and y-coordinate in the first predetermined range
θ: rotation element
tx, ty: translation elements
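The equation is again missing from this text. A form consistent with the symbols above is the usual rotation by θ followed by a translation (tx, ty); the counterclockwise sign convention for θ is an assumption.

```latex
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\cos\theta & -\sin\theta & t_x \\
\sin\theta & \cos\theta  & t_y \\
0          & 0           & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
```

This is the special case of the affine form in which the 2 × 2 linear part is a pure rotation, so only three parameters (θ, tx, ty) are needed.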

(S33) As shown in FIG. 5, an "instruction transmission" button is displayed at the upper right of the display of the terminal 2. After writing the instruction still image, the user presses the "instruction transmission" button. The terminal 2 then transmits to the instructed terminal 1 the first coordinate relative value (the parameter elements of the matrix Ha in FIG. 6), the instruction still image, and the captured still image obtained by trimming the captured moving image as a still image to the second predetermined range.

[Fourth Step S4]
FIG. 6 further shows the relationship between the third predetermined range and the fourth predetermined range.
Third predetermined range: the range in the instructed-side terminal that matches the captured still image
Fourth predetermined range: the range in the instructed-side terminal on which the instruction still image is to be superimposed
The predetermined ranges are related by the following transformations.
Between the second predetermined range and the third predetermined range -> projective transformation / posture transformation
Between the first predetermined range and the fourth predetermined range -> projective transformation / posture transformation

The terminal 1 superimposes the instruction still image on the captured moving image and displays it in the following steps.
(S41) The terminal 1 detects a third predetermined range in which the "captured moving image" captured by the camera (the camera preview video) matches the "captured still image" (second predetermined range) received from the terminal 2. It is also preferable to match the captured still image against the captured moving image while applying a projective transformation or posture transformation to it. Because the captured moving image is constantly changing, the tracking process that matches it against the captured still image runs continuously.

(S42) The terminal 1 calculates a second coordinate relative value between the second predetermined range and the third predetermined range (see the matrix Hp in FIG. 6). The second coordinate relative value is, for example, either a projective transformation matrix or a posture transformation matrix (see, for example, Non-Patent Documents 8 and 9).

"Projective transformation" adds to translation and rotation a projection that expresses the perspective of a plane. It is expressed, for example, by the following matrix equation.

x, y: x-coordinate and y-coordinate in the captured still image
x', y': x-coordinate and y-coordinate of the matching destination
h₁₁ to h₃₃: parameters
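The matrix equation is not reproduced in this text. A standard homography form consistent with the parameters h₁₁ to h₃₃ would be as follows; the scale factor s is an assumption introduced here to express equality up to scale.

```latex
s \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}},
\quad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
```

The division by the third row is what distinguishes this from the affine case and produces the perspective effect.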

"Posture transformation" is a rigid-body motion in three-dimensional space, expressed by a posture matrix with six degrees of freedom. The "posture matrix" belongs to the three-dimensional special Euclidean group SE(3) and consists of a three-dimensional rotation matrix with three degrees of freedom and a three-dimensional translation vector. It is expressed, for example, by the following matrix equation.

A: internal parameters of the camera
It is desirable to obtain these in advance by camera calibration. However, even if they deviate from the actual values, the deviation is ultimately cancelled out against the posture matrix, so the position of the superimposed display is not affected. For the use envisaged in the present invention, typical camera values can therefore be substituted.
R (r11 to r33): parameters representing rotation in three-dimensional space
These can be expressed by three parameters using a representation such as Euler angles.
t (t1 to t3): parameters representing translation in three-dimensional space
x, y: x-coordinate and y-coordinate in the captured still image
x', y': x-coordinate and y-coordinate of the matching destination
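The matrix equation is missing from this text and cannot be recovered exactly. One plausible form, assuming the usual pinhole camera model with the captured still image lying on the plane z = 0 (both the scale factor s and the planar assumption are introduced here, not taken from the source), is:

```latex
s \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
A
\begin{pmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 0 \\ 1 \end{pmatrix}
```

Under that assumption the third column of R drops out, and the product reduces to a 3 × 3 matrix of the same shape as the projective transformation.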

(S43) The terminal 1 superimposes the "instruction still image" on a fourth predetermined range, obtained by reflecting the first coordinate relative value and the second coordinate relative value in the captured still image (second predetermined range), and displays it on the display (see Hp·Ha in FIG. 6). Specifically, this applies the markerless, object-recognition approach of AR.
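The composition Hp·Ha in step S43 can be sketched by multiplying the two 3 × 3 matrices and mapping the four corners of the captured still image. The matrix values below are invented for illustration (Hp is a pure translation here, although in general it would be a projective or posture-derived matrix), and `matmul3` and `apply_matrix` are hypothetical helpers.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_matrix(m, x, y):
    """Apply a 3x3 matrix to (x, y), including the perspective division."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


# Ha: second range -> first range (received from the terminal 2); sample values.
Ha = [[0.5, 0.0, 50.0],
      [0.0, 0.25, 50.0],
      [0.0, 0.0, 1.0]]
# Hp: captured still image -> live frame (estimated in S42); sample values.
Hp = [[1.0, 0.0, 10.0],
      [0.0, 1.0, 20.0],
      [0.0, 0.0, 1.0]]

H = matmul3(Hp, Ha)  # Ha is applied first, then Hp

# Corners of a hypothetical 200 x 200 captured still image, in its own coordinates.
corners = [(0.0, 0.0), (200.0, 0.0), (200.0, 200.0), (0.0, 200.0)]
fourth_range = [apply_matrix(H, x, y) for x, y in corners]
```

The instruction still image would then be warped onto the quadrilateral `fourth_range` in each preview frame, with `Hp` re-estimated as the camera moves.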

FIG. 7 is a screen view of the first terminal when the shooting position relative to the subject has been translated and rotated.
FIG. 8 is a screen view of the first terminal when, starting from FIG. 7, the shooting position relative to the subject has moved projectively.

FIG. 9 is a functional configuration diagram of the first terminal and the second terminal.

The terminal 1, serving as the instructed-side terminal, is connected to the network and has a display 13 and a camera 14. The terminal 1 also includes a captured moving image transmission unit 11 and a video display control unit 12. These functional components are realized by executing a program that causes a computer mounted on the terminal 1 to function.
The captured moving image transmission unit 11 sequentially transmits the moving image captured by the camera 14 to the counterpart terminal 2 (as in S1 of FIG. 2).
The video display control unit 12 detects a third predetermined range in which the captured moving image captured by the camera 14 matches the captured still image (second predetermined range) received from the counterpart terminal 2. It then calculates a second coordinate relative value between the second predetermined range and the third predetermined range. Finally, it superimposes the instruction still image on a fourth predetermined range, obtained by reflecting the first coordinate relative value and the second coordinate relative value in the captured still image (second predetermined range), and displays it on the display 13 (as in S41 to S43 of FIG. 2).

The terminal 2, serving as the instruction-side terminal, is connected to the network and has a touch panel display 23. The terminal 2 also includes an instruction still image input unit 21 and an instruction still image transmission unit 22. These functional components are realized by executing a program that causes a computer mounted on the terminal 2 to function.
According to the second embodiment, the instruction still image input unit 21 causes the user to designate a second predetermined range for the captured moving image and causes the user to write an instruction still image for the captured moving image (as in S21 and S22 of FIG. 2).
According to the second embodiment, the instruction still image transmission unit 22 calculates the first coordinate relative value between the first predetermined range, which includes the instruction still image written by the user, and the second predetermined range. It then transmits to the terminal 1 the first coordinate relative value, the instruction still image, and the captured still image obtained by trimming the captured moving image as a still image to the second predetermined range (as in S31 to S33 of FIG. 2).

FIG. 10 is a functional configuration diagram of a dual-purpose terminal equipped with both the transmitting-side and receiving-side functions.

As shown in FIG. 10, the functional components of the dual-purpose terminal 3 are exactly the same as those of the instructed-side terminal 1 and the instruction-side terminal 2 in FIG. 9. These functional components are likewise realized by executing a program that causes a computer mounted on the terminal 3 to function.

As described in detail above, the video instruction method, system, terminal, and program of the present invention make it possible to give image-based instructions from one terminal on the video captured by the camera of another terminal, without using a marker or a registered object image.

For the various embodiments of the present invention described above, those skilled in the art can readily make various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be restrictive. The present invention is limited only as defined by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Instructed-side terminal
11 Captured moving image transmission unit
12 Video display control unit
13 Display
14 Camera
2 Instruction-side terminal
21 Instruction still image input unit
22 Instruction still image transmission unit
23 Touch panel display
3 Dual-purpose terminal

Claims

1. A video instruction method in a system in which a first terminal having a display and a camera and a second terminal having a display are connected via a network, the method comprising:
a first step in which the first terminal sequentially transmits a moving image captured by the camera to the second terminal;
a second step in which the second terminal displays the received captured moving image on the display, causes a user to designate a second predetermined range for the captured moving image, and causes the user to write an instruction still image for the captured moving image;
a third step in which the second terminal calculates a first coordinate relative value between a first predetermined range including the instruction still image written by the user and the second predetermined range, and transmits to the first terminal the first coordinate relative value, the instruction still image, and a captured still image obtained by trimming the captured moving image as a still image to the second predetermined range; and
a fourth step in which the first terminal detects a third predetermined range in which the captured moving image captured by the camera matches the captured still image (second predetermined range), calculates a second coordinate relative value between the second predetermined range and the third predetermined range, and superimposes the instruction still image on a fourth predetermined range, obtained by reflecting the first coordinate relative value and the second coordinate relative value in the captured still image (second predetermined range), for display on the display.

2. The video instruction method according to claim 1, wherein the first coordinate relative value is a translation matrix, a combination of a translation matrix and a scaling matrix, a Euclidean transformation matrix, or an affine transformation matrix, and
the second coordinate relative value is a projective transformation matrix or a posture transformation matrix.

3. The video instruction method according to claim 1, wherein the first terminal transmits to the second terminal only frames obtained by thinning the captured moving image at a predetermined time interval.

4. The video instruction method according to claim 3, wherein, in the first step, only the I (Intra-picture) frames of the captured moving image, which serve as the reference for motion-compensated inter-frame prediction, are transmitted to the second terminal.

5. The first terminal sets the data rate of the I frame to a relatively high rate that is equal to or lower than the data rate of one GOP (Group Of Pictures) with respect to the first step. The video instruction method described in 1.

6. The video instruction method according to claim 1, wherein the captured still image is a feature amount image for the matching or a reduced-resolution image for lowering the data amount, and
the instruction still image is a reduced-resolution image for lowering the data amount.

7. The video instruction method according to any one of claims 1 to 6, wherein the display mounted on the second terminal is a touch panel display, and
in the second step, the second terminal uses an image drawn by the user's finger on the touch panel display as the instruction still image.

8. The video instruction method according to claim 1, wherein the second terminal is further connected to a touch pen input device with which the user can draw on the captured moving image displayed on the display, and
the second terminal uses an image drawn by the user with the touch pen as the instruction still image.

9. The video instruction method according to claim 1, wherein, in the fourth step, the first terminal applies the markerless, object-recognition approach of AR (Augmented Reality).

10. A video instruction system in which a first terminal having a display and a camera and a second terminal having a display are connected via a network, wherein
the first terminal comprises:
captured moving image transmission means for sequentially transmitting a moving image captured by the camera to the second terminal; and
video display control means for detecting a third predetermined range in which the captured moving image captured by the camera matches a captured still image (second predetermined range) received from the second terminal, calculating a second coordinate relative value between the second predetermined range and the third predetermined range, and superimposing an instruction still image on a fourth predetermined range, obtained by reflecting a first coordinate relative value and the second coordinate relative value in the captured still image (second predetermined range), for display on the display, and
the second terminal comprises:
instruction still image input means for displaying the received captured moving image on the display, causing a user to designate the second predetermined range for the captured moving image, and causing the user to write the instruction still image for the captured moving image; and
instruction still image transmission means for calculating the first coordinate relative value between a first predetermined range including the instruction still image written by the user and the second predetermined range, and transmitting to the first terminal the first coordinate relative value, the instruction still image, and the captured still image obtained by trimming the captured moving image as a still image to the second predetermined range.

11. A terminal equipped with a display and a camera, comprising:
captured moving image transmission means for sequentially transmitting a moving image captured by the camera to a counterpart terminal;
instruction still image input means for displaying a captured moving image received from the counterpart terminal on the display, causing a user to designate a second predetermined range for the captured moving image, and causing the user to write an instruction still image for the captured moving image;
instruction still image transmission means for calculating a first coordinate relative value between a first predetermined range including the instruction still image written by the user and the second predetermined range, and transmitting to the counterpart terminal the first coordinate relative value, the instruction still image, and a captured still image obtained by trimming the captured moving image as a still image to the second predetermined range; and
video display control means for detecting a third predetermined range in which the captured moving image captured by the camera matches a captured still image (second predetermined range) received from the counterpart terminal, calculating a second coordinate relative value between the second predetermined range and the third predetermined range, and superimposing the instruction still image on a fourth predetermined range, obtained by reflecting the first coordinate relative value and the second coordinate relative value in the captured still image (second predetermined range), for display on the display.

12. A program for a terminal, causing a computer mounted on a terminal equipped with a display and a camera to function as:
captured moving image transmission means for sequentially transmitting a moving image captured by the camera to a counterpart terminal;
instruction still image input means for displaying a captured moving image received from the counterpart terminal on the display, causing a user to designate a second predetermined range for the captured moving image, and causing the user to write an instruction still image for the captured moving image;
instruction still image transmission means for calculating a first coordinate relative value between a first predetermined range including the instruction still image written by the user and the second predetermined range, and transmitting to the counterpart terminal the first coordinate relative value, the instruction still image, and a captured still image obtained by trimming the captured moving image as a still image to the second predetermined range; and
video display control means for detecting a third predetermined range in which the captured moving image captured by the camera matches a captured still image (second predetermined range) received from the counterpart terminal, calculating a second coordinate relative value between the second predetermined range and the third predetermined range, and superimposing the instruction still image on a fourth predetermined range, obtained by reflecting the first coordinate relative value and the second coordinate relative value in the captured still image (second predetermined range), for display on the display.