JP7360158B2

JP7360158B2 - Control system and control program

Info

Publication number: JP7360158B2
Application number: JP2019225747A
Authority: JP
Inventors: 修福田; ウナンイ; 隆介窪園
Original assignee: Saga University
Current assignee: Saga University
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2023-10-12
Anticipated expiration: 2039-12-13
Also published as: JP2021094085A

Description

Application of Article 30, Paragraph 2 of the Patent Act:
(1) IEEE Access, Volume 7, 2019, pages 54542 to 54549, "Development of Distributed Control System for Vision-Based Myoelectric Prosthetic Hand". Publication date: April 18, 2019.
(2) Proceedings of the Robotics and Mechatronics Conference 2019, "Estimation of Operation Intention of Myoelectric Prosthetic Hand Using Onboard Camera". Publication date: June 5, 2019.
(3) Proceedings of the Robotics and Mechatronics Conference 2019, "Prosthetic Hand Control System Based on Object Matching and Tracking". Publication date: June 5, 2019.
(4) Proceedings of the 72nd Joint Conference of the Kyushu Branches of Electrical and Information-Related Societies, page 522, 06-2P-05, "Object Recognition and Grasp Target Selection Using a Camera Mounted on a Prosthetic Hand". Publication date: September 19, 2019.
(5) Proceedings of the 37th Annual Conference of the Robotics Society of Japan, "Smart Hand: Grasp Objects Based on Spatial Awareness". Publication date: September 3, 2019.

The present invention relates to a control system and the like that drives and controls an execution unit so that it performs an operation matching a person's action.

Techniques for the drive control of a prosthetic hand are disclosed in, for example, Patent Documents 1 to 3. The technique of Patent Document 1 comprises a first artificial finger 4 corresponding to the index finger, a second artificial finger 5 corresponding to the middle finger, a third artificial finger 6 corresponding to the ring finger, a fourth artificial finger 7 corresponding to the little finger, a fifth artificial finger 8 corresponding to the thumb, a base 2 that supports the artificial fingers, and an arm that supports the base 2. A plurality of motors that move the artificial fingers are attached to the base 2, and a plurality of switches are provided, corresponding to the motors, to control each motor's direction of rotation and activation.

The technique of Patent Document 2 is a method for controlling a powered prosthetic limb applied to a human body that has an affected limb, in which part or all of an upper or lower limb is missing, and joints that can be moved voluntarily. The method comprises: a step of measuring, by a measuring means, the displacement of one or more voluntarily movable joints as a first joint displacement; a step of computing, by a calculation means, an appropriate mapping from the generalized coordinate space of the first joint displacement to the generalized coordinate space of the joint displacement of the powered prosthetic limb, and calculating the result as a second joint displacement; and a step of controlling, by a control means, the displacement of one or more joints of the powered prosthetic limb applied to the affected limb, with the second joint displacement as the target value.

The technique of Patent Document 3 is a movable finger comprising a base disposed between the back of the hand and the palm, a first intermediate part connected to the base, a second intermediate part connected to the first intermediate part, and a fingertip part connected to the second intermediate part. It includes three mechanisms. In the first intermediate part bending mechanism, an actuator disposed in the base pulls a first wire to rotate an eccentric member, the eccentric member pulls the first intermediate part via a first intermediate part tension spring to bend it, and a first intermediate part extension spring straightens it. In the second intermediate part bending mechanism, the eccentric member pulls a second wire wound around a pulley arranged as a movable pulley in the first intermediate part, a third wire connected to the pulley pulls the second intermediate part to bend it, and a second intermediate part extension spring straightens it. In the fingertip bending mechanism, a fourth wire connected to the first intermediate part pulls the fingertip part to bend it, and a fifth wire, connected to the first intermediate part and routed along a different path from the fourth wire, pulls the fingertip part to straighten it.

As a technique relating to a cyber-physical system developed by the inventors, the technique of Non-Patent Document 1, for example, has been disclosed. Non-Patent Document 1 discloses that mounting a camera and artificial intelligence (AI) on a prosthetic hand allows the hand to operate autonomously to some extent.

Patent Document 1: Japanese Patent Application Publication No. 2012-250048
Patent Document 2: Japanese Patent Application Publication No. 2010-213873
Patent Document 3: Japanese Patent No. 3086452

Non-Patent Document 1: Osamu Fukuda, "A cyborg prosthetic hand that sees, thinks, and moves," June 16, 2018, Internet <https://talk.yumenavi.info/archives/2218?site=p>

The techniques of Patent Documents 1 to 3 all control the movement of a prosthetic hand or the like according to input information from the user, such as manual operation, voice, or myoelectric information. However, the patterns of input information are limited, and even if detailed input information were supplied, it would be extremely difficult to make the prosthetic hand or the like perform correspondingly fine movements.

The technique of Non-Patent Document 1 uses artificial intelligence so that the prosthetic hand can act autonomously with respect to an object. However, the artificial intelligence cannot cover what kind of action is to be taken toward the object, so the technique is insufficient for accurately realizing the action the user intends toward the object.

The present invention provides a control system and a control method that identify the content of a person's action from information detected by sensors as the person acts, and that drive and control an execution unit so that its operation matches the identified action.

The control system according to the present invention comprises: an execution unit that is worn on a human body and performs an operation on an object; detection information acquisition means for acquiring detection information detected by one or more sensors along with a person's action toward the object; measurement information acquisition means for acquiring measurement information measured on the object by one or more sensors; motion specifying means for specifying, based on the detection information and the measurement information, a motion that matches the content of the action toward the object; and drive control means for driving and controlling the execution unit in accordance with the specified motion.

Thus, in the control system according to the present invention, a motion that matches the content of the person's action toward the object is specified based on the detection information, detected by one or more sensors along with the person's action toward the object, and the measurement information, measured on the object by one or more sensors, and the execution unit worn on the human body is driven and controlled in accordance with the specified motion. The execution unit can therefore be driven in a way that suits the person's purpose with respect to the object, and the person and the execution unit can cooperate to achieve that purpose.

FIG. 1 is a system configuration diagram of the control system according to the first embodiment.
FIG. 2 is a functional block diagram of the prosthetic limb device in the control system according to the first embodiment.
FIG. 3 is a functional block diagram of the server in the control system according to the first embodiment.
FIG. 4 is a flowchart showing the operation of the control system according to the first embodiment.
FIG. 5 is a diagram showing an example of the hardware configuration of the control system according to the first embodiment.
FIG. 6 is a diagram showing the test platform environment of the control system according to the first embodiment.
FIG. 7 is a diagram showing the process from EMG acquisition to trigger signal generation in the control system according to the first embodiment.
FIG. 8 is a diagram showing the architecture that detects bounding boxes and class probabilities from an image in the CNN in the control system according to the first embodiment.
FIG. 9 is a diagram showing training samples for the object detection module in the control system according to the first embodiment.
FIG. 10 is a diagram showing an outline of an experiment with the control system according to the first embodiment.
FIG. 11 is a diagram plotting the object class probabilities of the detected objects and the target object probabilities in the control system according to the first embodiment.
FIG. 12 is a diagram showing sensor data and intermediate results in another experiment with the control system according to the first embodiment.
FIG. 13 is a diagram outlining the strategy for estimating the user's motion intention from camera video in the control system according to the second embodiment.
FIG. 14 is a diagram showing, for each object, the relationship between the size of the detection rectangle and the distance to the camera in the control system according to the second embodiment.
FIG. 15 is a diagram showing the results of an experiment on estimating the intended selection of an object in the control system according to the second embodiment.
FIG. 16 is a diagram showing the positional relationship tracking unit tracking the spatial relationship between the camera and the object in the control system according to the third embodiment.
FIG. 17 is a diagram showing the spatial relationship between the camera coordinate system (CS) and the object-local CS in the control system according to the third embodiment.
FIG. 18 is a diagram showing the mechanism that estimates the position and orientation of an object by comparison with features of the object scanned in advance (the reference) in the control system according to the third embodiment.
FIG. 19 is a diagram showing demonstration results for the control system according to the third embodiment.
FIG. 20 is a diagram showing how the posture of an object is tracked using markers in the control system according to the fourth embodiment.
FIG. 21 is a diagram showing the camera's z-axis passing through the cup and the bottle, respectively, in two scenes in the control system according to the fourth embodiment.
FIG. 22 is a diagram showing the smart hand being aimed from a distance, with the posture information held until the end of the approach phase, in the control system according to the fourth embodiment.

Embodiments of the present invention are described below. The same elements are given the same reference numerals throughout the embodiments.

(First embodiment of the present invention)
The control system according to this embodiment is described with reference to FIGS. 1 to 12. The control system estimates a person's action toward an object from sensor information associated with the person's action and sensor information measured on the object, and operates the execution unit so that it matches the estimated action.

FIG. 1 is a system configuration diagram of the control system according to this embodiment. The control system 1 includes one or more prosthetic limb devices 10 worn on a human body and a server 20 that communicates with the prosthetic limb devices 10 via the Internet. A prosthetic limb device 10 is, for example, a prosthetic hand or a prosthetic leg that a person with a physical disability wears in daily life. Each prosthetic limb device 10 operates independently and is individually connected to the server 20.

Various sensors 11 are installed on, and communicatively connected to, the prosthetic limb device 10, which receives information from the sensors 11 and transmits it to the server 20. The sensors 11 may also be sensors attached to the user wearing the prosthetic limb device 10. The server 20 executes arithmetic processing based on the information from the sensors 11 and the myoelectric information transmitted from the prosthetic limb device 10. This processing is, for example, information processing by an AI function: from the transmitted information it identifies the type of the object on which the prosthetic limb device 10 is to act, and computes how the prosthetic limb device 10 should be driven with respect to that object. Specifically, the computation can be based on recognition algorithms such as threshold judgment, linear discriminants, conditional expressions, neural networks, Bayesian networks, and reinforcement learning, combined with data collected from multiple prosthetic limb devices 10, information retrieval on the Internet, and the like.

The result computed by the server 20 (for example, drive control information for the execution unit) is returned to the prosthetic limb device 10, and the execution unit of the prosthetic limb device 10 operates based on the returned information. Because the server 20 gathers information from all the prosthetic limb devices 10 it can communicate with, the information obtained from each device can be recorded and used for learning.

In this embodiment, the sensing information from the sensors 11 is of two types. One is information measured on the object (hereinafter, measurement information), for example image information obtained by imaging the object with a camera, or the distance to the object measured with an ultrasonic sensor. The other is information detected along with a person's action (hereinafter, detection information), for example the person's voice detected by a microphone, information from an acceleration sensor or gyro sensor that identifies the direction and position of the prosthetic limb device 10, or information from a myoelectric sensor that acquires myoelectric signals. Depending on its type, a sensor 11 may provide information that has aspects of both. For example, image information captured by a camera serves as measurement information for identifying the type of the object, and also as detection information, detected along with the person's action, that shows how the prosthetic limb device 10 is moving relative to the object (approaching, moving away, moving right, moving left, and so on). Similarly, the distance from the prosthetic limb device 10 to the object obtained by an ultrasonic sensor is measurement information in that it is simply a distance, and detection information in that it shows the person approaching the object in order to act on it (for example, to grasp, hold, knock over, set upright, push, or pull it). In other words, the measurement information measured on the object and the detection information detected along with the person's action are not necessarily separate pieces of information obtained from different sensors 11; they may be obtained from a common sensor 11.
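The dual role of a single sensor described above can be sketched as a small data model. This is only an illustration: the names `InfoRole`, `SensorReading`, `DEFAULT_ROLES`, `tag`, and `split` are hypothetical and do not appear in the patent, and the role assigned to each sensor simply follows the examples given in the text.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Any, List, Tuple


class InfoRole(Flag):
    """Role a sensor reading plays; one reading may play both roles."""
    MEASUREMENT = auto()  # information measured on the object
    DETECTION = auto()    # information detected along with the person's action


@dataclass
class SensorReading:
    sensor: str      # e.g. "camera", "ultrasonic", "microphone"
    payload: Any     # raw data; its format is not specified in the text
    roles: InfoRole


# Hypothetical default mapping, taken from the examples in the description.
DEFAULT_ROLES = {
    "camera": InfoRole.MEASUREMENT | InfoRole.DETECTION,
    "ultrasonic": InfoRole.MEASUREMENT | InfoRole.DETECTION,
    "microphone": InfoRole.DETECTION,
    "accelerometer": InfoRole.DETECTION,
    "gyro": InfoRole.DETECTION,
    "emg": InfoRole.DETECTION,
}


def tag(sensor: str, payload: Any) -> SensorReading:
    """Attach the default role(s) for this sensor type to a raw payload."""
    return SensorReading(sensor, payload, DEFAULT_ROLES[sensor])


def split(readings: List[SensorReading]) -> Tuple[List[SensorReading], List[SensorReading]]:
    """Return (measurement info, detection info); a reading may appear in both."""
    measurement = [r for r in readings if InfoRole.MEASUREMENT in r.roles]
    detection = [r for r in readings if InfoRole.DETECTION in r.roles]
    return measurement, detection
```

With this model a camera frame lands in both lists, while a microphone sample lands only in the detection list, which mirrors the point that the two kinds of information need not come from different sensors.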

FIGS. 2 and 3 are functional block diagrams of the control system according to this embodiment. FIG. 2 is a functional block diagram of the prosthetic limb device, and FIG. 3 is a functional block diagram of the server. In FIG. 2, the prosthetic limb device 10 includes: an acquisition unit 110 that acquires sensing information from the sensors 11 installed directly on the prosthetic limb device 10 and/or the sensors 11 attached to its user; a transmission unit 120 that transmits the acquired sensing information to the server 20; an execution unit 130 that is worn on the human body and performs an operation on an object; and a drive control unit 140 that receives from the server 20 the drive control information for operating the execution unit 130 in accordance with the person's action, and actually drives the execution unit 130.

In FIG. 3, the server 20 includes: a sensor information acquisition unit 210 that acquires the sensing information of the sensors 11 transmitted from the transmission unit 120 of the prosthetic limb device 10; an object identification unit 220 that identifies the type of the object based on the measurement information acquired as sensor information; a behavior estimation unit 230 that estimates the user's action based on the measurement information and detection information acquired as sensor information and on the identified type of the object; and a drive control information generation unit 240 that generates drive control information for operating the execution unit 130 of the prosthetic limb device 10, based on the sensing information acquired by the sensor information acquisition unit 210 and the action estimated by the behavior estimation unit 230, and transmits that drive control information to the prosthetic limb device 10.

As the sensors 11, information is acquired from one or a combination of, for example, a camera, distance sensor, acceleration sensor, gyro sensor, geomagnetic sensor, electromagnetic sensor, microphone, thermograph, near-infrared sensor, barcode, and chemical sensor. The type of the object is identified from the measurement information and detection information acquired from these sensors 11, the action of the user wearing the prosthetic limb is estimated, and the execution unit 130 is driven so as to match that action.

As one example, consider drinking a beverage by grasping a cup with a prosthetic limb device 10 that has a microphone and a camera as sensors 11. When the user's utterance "take that cup" is input from the microphone, it is transmitted to the server 20 as detection information. In addition, when the user moves the execution unit 130 of the prosthetic limb (here, the fingertips) toward the cup in order to take it, the image or video captured by the camera attached to the prosthetic limb device 10 is input and transmitted to the server 20 as measurement information and detection information. The server 20 identifies the image region that corresponds to the object, identifies that the object in that region is a cup, and, if the cup has a handle, identifies the handle's orientation, shape, size, and so on. It also determines from which direction and toward which direction the fingertips of the execution unit 130 are moving, determines from the voice information that the purpose is to grasp the cup, and finally generates the drive information for the motion that the execution unit 130 must perform to grasp the handle, and returns it to the prosthetic limb device 10. The execution unit 130 of the prosthetic limb device 10 is driven according to this drive information, so the user can grasp the cup and drink the beverage. Specific examples of drive information include angle, angular velocity, torque, and mechanical compliance.

In other words, to control the prosthetic limb device 10 the user only has to speak into the microphone the high-level content that is the purpose of the action (in the example above, "take the cup"). The subsequent fine, low-level motions (in the example above, the direction, angle, extension and flexion, force, and so on of each finger) are performed automatically by the prosthetic limb device 10 based on the imaging information. The person and the device thus work together, and motions that used to be difficult can be performed comfortably.

The server 20, or a database server connected to the server 20, may include management means (not shown) for managing the sensor information transmitted from each prosthetic limb device 10, the processing results of the object identification unit 220, the processing results of the behavior estimation unit 230, the outcome when the execution unit 130 was actually driven (feedback information such as whether the drive control succeeded or failed), and the like. This makes it possible to improve the accuracy of the processing in the server 20 on the basis of more information from the prosthetic limb devices 10.

Next, the operation of the control system according to this embodiment is described. FIG. 4 is a flowchart showing the operation of the control system. First, the sensors 11 perform sensing continuously, periodically, or irregularly, and the acquisition unit 110 acquires the sensor information of the sensors 11 (S1). The transmission unit 120 transmits the acquired sensor information to the server 20 (S2). The object identification unit 220 identifies the type of the object based on the sensor information (S3), and the behavior estimation unit 230 estimates the action of the user of the prosthetic limb device 10 based on the sensor information (S4). Based on the type of the object and the user's action, the drive control information generation unit 240 generates control information for operating the execution unit 130 so that it matches the user's action (S5) and transmits it to the prosthetic limb device 10 (S6). The drive control unit 140 of the prosthetic limb device 10 drives the execution unit 130 based on the control information transmitted from the server 20 (S7), and processing returns to S1. Steps S1 to S7 are repeated for as long as the prosthetic limb device 10 is in use.
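The S1 to S7 cycle can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the `Server` class stands in for server 20 with trivial placeholder rules instead of the AI processing, the network round trip (S2, S6) is replaced by a direct method call, and the names `DriveCommand`, `Server`, and `run_cycle` as well as the dictionary keys and numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DriveCommand:
    # Two of the drive-information examples named in the text; values are illustrative.
    angle_deg: float
    torque: float


class Server:
    """Stand-in for server 20; simple rules replace the recognition algorithms."""

    def identify_object(self, info: Dict) -> str:              # S3
        return info.get("label", "unknown")

    def estimate_action(self, info: Dict, obj: str) -> str:    # S4
        return "grasp" if info.get("approaching") and obj != "unknown" else "idle"

    def make_command(self, obj: str, action: str) -> DriveCommand:  # S5
        if action == "grasp":
            return DriveCommand(angle_deg=45.0, torque=0.5)
        return DriveCommand(angle_deg=0.0, torque=0.0)

    def handle(self, info: Dict) -> DriveCommand:               # S3 to S5
        obj = self.identify_object(info)
        action = self.estimate_action(info, obj)
        return self.make_command(obj, action)


def run_cycle(read_sensors: Callable[[], Dict], server: Server,
              drive: Callable[[DriveCommand], None]) -> DriveCommand:
    """One pass through S1 to S7; the caller repeats it while the device is in use."""
    info = read_sensors()           # S1: acquire sensor information
    command = server.handle(info)   # S2 and S6: send and receive (a direct call here)
    drive(command)                  # S7: drive the execution unit
    return command
```

A caller would invoke `run_cycle` in a loop, passing a function that reads the sensors 11 and a function that actuates the execution unit 130.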

A more specific hardware configuration, the processing, and the results of prototype experiments are described below. FIG. 5 shows the hardware configuration. As described above, the system is divided into two main parts, a client and a server. The client is installed on the prosthetic hand and moves with the user. The client's processing unit connects several sensors so that the prosthetic hand can sense its immediate environment. Among these sensors, the image sensor plays the most important role in acquiring external information. EMG electrodes are used to record muscle activity in real time. Because a prosthetic hand is an extension of the human body, its control must reflect the user's intention, and the EMG signals help the user express that intention to the control system (that is, they provide detection information). An inertial measurement unit (IMU) sensor is an auxiliary input source that provides the position and orientation of the camera for use in image analysis. The data from these sensors is collected, synchronized, and sent to the server for processing. The server analyzes the received data and then returns motor commands.
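The text says only that the sensor data is "collected, synchronized" before transmission and does not state how. One common approach, shown here purely as an assumption, is to pair each camera frame with the EMG and IMU samples whose timestamps are nearest to it. The function names `nearest_index` and `synchronize` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Index of the timestamp closest to t; timestamps must be sorted and non-empty."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def synchronize(frame_times: Sequence[float],
                emg_times: Sequence[float],
                imu_times: Sequence[float]) -> List[Tuple[int, int, int]]:
    """For each camera frame, return (frame index, nearest EMG index, nearest IMU index)."""
    return [(k, nearest_index(emg_times, t), nearest_index(imu_times, t))
            for k, t in enumerate(frame_times)]
```

Each resulting triple identifies one bundle of samples that the client could send to the server together; the actual packet format is not described in the text.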

The server is on the same local network as the client or on another network that can be reached via the Internet. It has multiple GPU compute nodes and a database; each GPU node performs parallel processing to speed up data processing, and processing tasks can be assigned to multiple GPU nodes. The server holds a set of algorithms used to process the received images and other auxiliary inputs. In the control system according to this embodiment, a deep learning algorithm is applied to the received images: the images are fed to a CNN, and the output is the information needed to perform a grasp. That information is either the properties of the target object (size, shape, category, and so on) or the relative spatial relationship between the prosthetic hand and the object (direction, distance, and so on). Another function of the server is data management: it manages the motor commands, the deep learning models, and the properties of target objects.

In the control system according to this embodiment, the server processes data and generates motor commands, while the client senses the situation and drives the motors. The control system works in the same way a person does when grasping an object: look at the object, plan how to grasp it, and then direct the hand toward it. The server is the brain, storing information and processing data. The client is the nervous system, sensing the environment, sending data, and executing commands.

A test platform was constructed based on FIG. 5. On the client side, a Raspberry Pi exchanges data with the server: it collects sensor data and receives motor commands. It receives EMG signals from a Myo armband (Thalmic Labs) and an image sequence from a web camera. The Myo armband is a wearable device with eight electrodes for measuring forearm EMG signals, and it communicates with the Raspberry Pi over a Bluetooth (registered trademark) interface. The collected data is packaged and sent over a socket to the server for further processing. The server is on the same Wi-Fi network as the client. The myoelectric control module processes the EMG signals to estimate the user's intention, and the estimated intention triggers certain specific actions. The object detection module recognizes and localizes all potential candidate objects in the image sequence and proposes the object most likely to be the target object. A predefined grasping posture is selected based on the identified object. The trigger signal from the myoelectric control module and the posture selected by the object detection module together determine the motor control command, which is sent back to the client. An Arduino Uno functions as the stepping motor controller. It outputs PWM signals according to the generated motor commands to control a 3-DoF gripper. The three motors correspond to six forearm movements: supination/pronation, wrist flexion/extension, and hand opening/closing. The gripper stands in for the prosthetic hand. FIG. 6 shows the environment of the test platform. Details of the myoelectric control module and the object detection module are given below.

The myoelectric control module estimates the user's intention and generates trigger signals. The user's intention mainly consists of requests to grasp an object, to abort a grasp partway, and to close the hand once the hand is properly oriented relative to the object. The user can be trained to control the muscle contraction level, so that the contraction level reflects the intention.

The sum of the EMG signals from the eight channels (rectified and filtered), E_sum, is defined as the muscle contraction level. The myoelectric control module first requires an initialization to find the maximum E_sum: the user is asked to contract the forearm muscles as strongly as possible within 10 seconds. The maximum E_sum during this period is denoted E_max. E_max is a constant used to normalize E_sum to the range [0, 1] by the following formula.
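The formula referred to here does not survive in this text. A reconstruction consistent with the surrounding definitions (E_sum as the sum over the eight rectified and filtered channels, E_max as the calibration maximum, L as the normalized level) would be the following. The equation number (1) and the exact notation are assumptions.

```latex
E_{\mathrm{sum}}(t) = \sum_{d=1}^{8} E_d(t), \qquad
L(t) = \frac{E_{\mathrm{sum}}(t)}{E_{\max}} \tag{1}
```

Because E_max is the largest E_sum observed during calibration, L(t) stays within [0, 1] as long as later contractions do not exceed the calibration maximum.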

Here, L is the normalized contraction level, E_d is the EMG signal sampled at electrode channel d, and t is the sampling time. In operation, the raw EMG signal measured by the Myo armband is first rectified and passed through a moving average filter. The normalized contraction level is then calculated and compared with thresholds to determine whether the user is triggering a control action. Two thresholds, θ_a and θ_b (θ_a < θ_b), are set to trigger two actions. When the contraction level x (that is, the normalized level L) exceeds θ_a, the control system activates the object detection module and starts receiving the image sequence. When x exceeds θ_b, the control system activates the servomechanism that performs the grasp. FIG. 7 shows the process from raw EMG acquisition to trigger signal generation.
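The processing chain described above (rectification, moving average, normalization, two-threshold comparison) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the window length, the threshold values, and the names `ContractionLevel` and `trigger` are hypothetical, and clipping the level at 1.0 is an added safeguard.

```python
from collections import deque

THETA_A = 0.3  # hypothetical value; the text only requires THETA_A < THETA_B
THETA_B = 0.7  # hypothetical value


class ContractionLevel:
    """Turn 8-channel EMG samples into a normalized level in [0, 1]."""

    def __init__(self, e_max, window=5):
        self.e_max = e_max                   # maximum E_sum found during calibration
        self.history = deque(maxlen=window)  # recent rectified sums

    def update(self, sample):
        # Rectify each channel and sum over channels. Averaging the sums is
        # equivalent to summing per-channel moving averages (both are linear).
        self.history.append(sum(abs(v) for v in sample))
        e_sum = sum(self.history) / len(self.history)
        return min(e_sum / self.e_max, 1.0)


def trigger(level, theta_a=THETA_A, theta_b=THETA_B):
    """Map a normalized contraction level to a trigger name."""
    if level > theta_b:
        return "grasp"    # activate the servomechanism
    if level > theta_a:
        return "detect"   # start object detection on the image sequence
    return "none"
```

A caller would feed each new eight-channel sample to `update` and pass the returned level to `trigger`.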

The object detection module accepts an image sequence as input and outputs a proposed grasping posture. In the first step, the module detects and localizes all potential objects in the image. The object detector used in the test platform is a one-stage method: it predicts bounding boxes and class probabilities directly from the full image within a single CNN. FIG. 8 shows its architecture, which mainly consists of 3×3 convolution filters and pooling layers. The network first uses a convolutional architecture (backbone) to extract features from the entire image, and these features are then used for bounding box recognition, including coordinate regression and class classification. The architecture is adapted from the YOLO network, a real-time object detection network (Reference 1: J. Redmon and A. Farhadi. (2018). 'YOLOv3: An incremental improvement.' [Online]. Available: https://arxiv.org/abs/1804.02767). Because YOLO is designed to detect a large number of object classes, it uses a very deep backbone to extract more complex and abstract features for bounding box recognition. In this case, however, the object detection module of the test platform only needs to detect a small number of specific objects rather than broad classes, so a deep backbone is unnecessary. Therefore, whereas the original YOLO network has 53 convolutional layers, the backbone in FIG. 8 has 13 convolutional layers.

Five common everyday objects are used as grasp targets: a cup, a bottle, a hole punch, a spray bottle, and a ball. They were chosen because they call for different grasping postures. To train and evaluate the object detection module, a dataset containing approximately 120 images per object was created. The images were taken at different viewing angles, distances, and backgrounds, and each image contains one or more objects at random. Some samples are shown in FIG. 9. The dataset was split into 80% for training and 20% for validation. Because the network uses a relatively simple backbone, it converges easily and the backbone does not need to be pre-trained on a separate image dataset. In the training phase, the input images are randomly scaled and resized to 416×416 to match the network's input requirements. For augmentation, hue jitter and saturation jitter are randomly added to the training images. Batch normalization is used to speed up training and reduce overfitting. The training policy follows the rules of the cited reference. Two GPU compute nodes (GeForce GTX Titan X and GeForce GTX 970) are installed to accelerate the training process.

The mAP (mean average precision) metric is used to evaluate model performance. With the IoU (intersection over union) threshold set to 0.5, the network achieves 95.76% mAP. The network also achieves 28.6 frames per second (FPS) when detecting objects in real time from a web camera on a GTX Titan X GPU. Compared with the original YOLO, the network's inference is faster while its accuracy on this dataset is about the same. The network model is also lightweight, consuming only 2.2 GB of GPU memory during image inference.
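For readers unfamiliar with the IoU criterion used in the mAP figure above, the following sketch shows how a predicted box is matched against a ground-truth box at a threshold of 0.5. The box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for illustration; the text does not describe the evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(predicted, ground_truth, threshold=0.5):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold
```

Average precision is then computed per class from the ranked true and false positives, and mAP is the mean over the classes.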

In the next step, the grasp target is selected from among the object candidates, since multiple objects may be recognized in the first step. A simple rule is used to determine the target: among all predicted bounding boxes, the one closest to the geometric center of the image is taken as the grasp target. The Euclidean distance between the center of the bounding box and the center of the image is calculated for every detected object. Objects that are not detected are regarded as being infinitely far away, so their Euclidean distance is infinite. The network architecture used for object detection sorts the detected objects according to distance. To make this process more general and intuitive, a new concept called the target object probability is introduced to describe how likely an object is to be the target. It is defined as follows.

Here, d_i is the distance of the i-th object, p_i is the probability that the i-th object is the grasp target, and n is the number of objects that the model can detect. Clearly, p_i satisfies the following equation.
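Neither the definition of p_i nor the equation it satisfies survives in this text. Since p_i is described as a probability over the n detectable objects, the second equation is presumably the normalization condition below. For the definition itself, one form consistent with the stated properties (closer objects score higher, undetected objects at infinite distance contribute zero) is a softmax over negative distances; this is an assumption, and the equation numbers are likewise assumed.

```latex
% Assumed form of the definition (not confirmed by the source)
p_i = \frac{e^{-d_i}}{\sum_{j=1}^{n} e^{-d_j}} \tag{2}

% Normalization implied by p_i being a probability
\sum_{i=1}^{n} p_i = 1, \qquad 0 \le p_i \le 1 \tag{3}
```

Under the assumed form, an undetected object has d_i = \infty and therefore p_i = 0, and the condition (3) holds automatically.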

The detected object with the highest target object probability becomes the grasp target. The final step is to determine the grasping posture. Grasping postures are predefined and designed according to the shapes of the objects. Once the target object is determined, the corresponding grasping posture is selected from the posture candidates.
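The selection procedure (distance to the image center, target object probability, posture lookup) can be sketched as below. Several parts are assumptions for illustration only: the softmax-style probability, the normalized image coordinates, the posture names in `GRASP_POSES`, and the function names. The text confirms only that the closest detected object wins and that postures are predefined per object.

```python
import math

# Hypothetical posture names; the text states only that postures are predefined.
GRASP_POSES = {
    "cup": "side_grasp",
    "bottle": "side_grasp",
    "hole punch": "top_pinch",
    "spray bottle": "trigger_grip",
    "ball": "top_grasp",
}


def target_probabilities(distances):
    """Assumed form: softmax over negative distances (infinite distance gives 0)."""
    weights = [math.exp(-d) for d in distances]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(distances)
    return [w / total for w in weights]


def choose_target(detections, image_center=(0.5, 0.5)):
    """detections maps a class name to its box center (x, y), or None if undetected."""
    names = list(detections)
    if not names:
        return None, None
    distances = [
        math.dist(detections[n], image_center) if detections[n] is not None else math.inf
        for n in names
    ]
    probs = target_probabilities(distances)
    best = max(range(len(names)), key=probs.__getitem__)
    if probs[best] == 0.0:
        return None, None  # nothing was detected
    return names[best], GRASP_POSES[names[best]]
```

Any monotonically decreasing function of distance would select the same object; the probability form matters only for how the ranking is displayed.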

The following experiment was conducted. A cup, a bottle, a spray bottle, a ball, and a hole punch (the objects in the dataset) were placed in a row on a table, not far from one another. The user moved the hand-held gripper from left to right and grasped the last object (the hole punch). The orientation and position of the web camera relative to the gripper were adjusted so that the gripper appears at the bottom center of the image. The camera continuously sends live video over the network to the server, which processes each video frame in real time. FIG. 10 shows an overview of the experiment. The camera view and the environment view are displayed at the top and bottom, respectively. FIG. 10 shows that all objects in the camera view are successfully localized with bounding boxes. The predicted class and the corresponding probability are labeled at the top of each bounding box. In an environment with a plain background and sufficient light, the class prediction probability is about 99%. Detected objects are cropped from the original image for further processing. The cropped objects are sorted by target probability and listed in the environment view in real time, with the object having the highest target probability listed first. The system processes live video at an average of 28.6 FPS. At this frame rate, continuous automatic adjustment for controlling the prosthetic hand is possible.

The object class probability and the target object probability of the detected objects in each frame are plotted in FIG. 11. The X-axis is the frame number. The upper part of FIG. 11 plots the object class probabilities and shows that the objects in the scene are detected with high probability. The lower part shows the target object probabilities and gives an intuitive visualization of how an object becomes the grasp target during the approach phase. A higher probability means the object is more likely to be the grasp target. Two sample video frames are selected and shown in the figure. As the gripper moves toward an object from the side, the target probability of that object increases. When the gripper points directly at the object or comes close to it, the target probability approaches its maximum, which appears as a peak in the plot. The five peaks in the plot correspond to the five times an object appeared at the center of the camera view. From the target probability plot, it can be inferred whether the prosthetic hand is approaching or moving away from an object. Thanks to the distributed control scheme, processing speed and accuracy are greatly improved.

Another experiment was conducted to test the overall functionality of the developed system. A sample session of grasping two objects in succession was carried out. The sensor data and intermediate results collected in this session are plotted in the time domain in FIG. 12. From top to bottom, the plots show the muscle contraction level, the object recognition result, the system state, and the selected grasping posture. The user operates the whole system by controlling the muscle contraction level, which generates the trigger signals that switch the system state. The system states include feeding the image sequence and executing the grasping action.

First, the system started in the standby state and detected the ball as the grasp target when the EMG signal level exceeded the threshold θ_a. Next, the grasping posture was determined so that the ball would be grasped from above, and the stepping motors were driven to adjust the gripper to the selected grasping posture. The user then moved the gripper and held it at an appropriate grasping position and orientation relative to the object. Finally, the user triggered another signal by increasing the muscle contraction level until it exceeded the second threshold θ_b. The system state changed to the grasping action, and the stepping motors were driven to perform the grasp and hold the ball. When the EMG level dropped, the system returned to the standby state. The system then waited for the user's next trigger signal. The same control flow was executed to grasp the spray bottle.
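The control flow of this session can be read as a small state machine driven by the normalized contraction level. The sketch below is one possible reading: the state names, the function `next_state`, and the rule that a grasp can only be triggered after detection has started are assumptions, not details given in the text.

```python
STANDBY, DETECTING, GRASPING = "standby", "detecting", "grasping"


def next_state(state, level, theta_a, theta_b):
    """Advance the system state for one new contraction level (theta_a < theta_b)."""
    if level <= theta_a:
        return STANDBY        # EMG level dropped: return to standby
    if state == STANDBY:
        return DETECTING      # first threshold crossed: start object detection
    if state == DETECTING and level > theta_b:
        return GRASPING       # second threshold crossed: execute the grasp
    return state              # otherwise remain in the current state
```

For example, with thresholds 0.3 and 0.7, the level sequence 0.1, 0.4, 0.8, 0.2 walks through standby, detecting, grasping, and back to standby.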

With the distributed control scheme, control of object grasping with a vision-based prosthetic hand has become much smoother than before. From the standpoint of user experience, recognition accuracy and processing speed are improved. The benefits of the distributed control scheme go further: the system performs object detection and target object proposal on a live video feed rather than on still images, which realizes semi-automatic continuous prosthesis control.

As described above, the control system according to this embodiment identifies an operation that matches the content of a person's action toward an object, based on detection information detected by one or more sensors worn on the human body as the person acts on the object and on measurement information measured for the object by one or more sensors, and drives and controls the execution unit 130 in accordance with the identified operation. The execution unit 130 can therefore be driven so as to suit the person's purpose with respect to the object, and the purpose can be achieved through cooperation between the person and the execution unit 130.

In addition, control information for controlling the specific operation performed by the execution unit 130 is generated based on detection information that includes the action to be performed on the object. The action intended by the user is thereby identified, and the execution unit 130 can carry out the specific operation.

Although the control system according to this embodiment is configured as a distributed processing system in which the server 20 performs processing such as identifying the type of the object and estimating the user's action, it may instead be configured as a centralized processing system in which the prosthetic limb device 10 includes a computing unit and the processing of the server 20 is performed within the prosthetic limb device 10.

(Second embodiment of the present invention)
The control system according to this embodiment will be described with reference to FIGS. 13 to 15. This embodiment gives a concrete example of the processing of the behavior estimation unit 230 in the control system according to the first embodiment. Descriptions that overlap with the first embodiment are omitted.

In this embodiment, a camera is used as the sensor 11 and a prosthetic hand is used as the prosthetic limb device 10. FIG. 13 outlines the strategy for estimating the user's motion intention from the camera video. The strategy consists of three stages. (1) A distance calculation unit measures the distance to each object detected in the image, and the candidate identification unit excludes objects that are far away to narrow down the candidates for the object of the action. (2) Assuming that an object closer to the center of the image is more likely to be detected, the purpose identification unit selects one object. (3) Assuming that remaining stopped in front of that object for a long time (a predetermined time or longer) indicates an intention to grasp it, the purpose identification unit determines that object as the grasp target.

Specifically, a real-time object detection algorithm, YOLO (You Only Look Once), is first used to recognize what each object is. Nearby objects are then selected based on the size of each detection rectangle. Next, the distance between the center of the image and the center of each rectangle is calculated, and the object closest to the image center is selected. An accelerometer and a gyro sensor are attached to the camera to measure how long the camera views an object while stationary. An algorithm that determines the object of interest is built from this information. Images are captured with a USB web camera (LifeCam Studio for Business 5WH-00003) connected to a Raspberry Pi single-board computer. YOLO is used to implement real-time object detection, and the Sense Hat, an add-on sensor board for the Raspberry Pi, is used as the accelerometer and gyro sensor.

An experiment was conducted to verify the algorithm of this embodiment and its validity. An example using four types of objects is presented here: a bottle, a cup, scissors, and a spoon. First, the distance between an object and the camera is estimated from the size of the object in the image. The area of the detection rectangle is calculated from the height and width of the rectangle generated when YOLO detects the object. FIG. 14 shows the relationship between the size of the detection rectangle and the distance to the camera for each object. For the four objects, eight images each were taken while varying the distance to the camera from 30 cm to 100 cm in 10 cm steps and varying the angle, and the mean and standard deviation of the rectangle area are shown. Although the area varies to some extent with differences in angle and the like, the graph shows that the approximate distance between a recognized object and the camera can be read from it. Based on this result, a threshold on the detection rectangle size is set for each object so that objects farther than a certain distance are not detected. In this embodiment, the distance threshold is set to 40 cm in view of the length of the forearm, and the rectangle size thresholds are set to bottle=0.075, cup=0.068, scissors=0.083, and spoon=0.035.
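The per-object size thresholds given above can be applied as a simple filter, sketched below. The threshold values come from the text; the assumption that they are rectangle areas in normalized image units (width and height each in 0 to 1), the tuple layout, and the function name `within_reach` are illustrative only.

```python
# Rectangle-size thresholds from the text, corresponding to a distance of about 40 cm.
AREA_THRESHOLDS = {"bottle": 0.075, "cup": 0.068, "scissors": 0.083, "spoon": 0.035}


def within_reach(detections):
    """Keep detections whose rectangle area meets the threshold for their class.

    detections is a list of (label, width, height) tuples in normalized image
    units (an assumption); a larger rectangle means the object is closer.
    """
    return [
        (label, w, h)
        for label, w, h in detections
        if label in AREA_THRESHOLDS and w * h >= AREA_THRESHOLDS[label]
    ]
```

Objects of unknown classes are dropped in this sketch because no distance can be inferred for them without a calibrated threshold.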

Next, the distance between the center coordinates of the captured image and the center coordinates of the detection rectangle is calculated. If the image size is normalized to the range 0 to 1, the center coordinates of the image are (0.5, 0.5). If the center coordinates of the object in the image are (X, Y), the distance between the two points is obtained as follows.
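The equation itself is missing from this text. From the definitions just given (image center at (0.5, 0.5), object center at (X, Y)), it is the ordinary Euclidean distance; the symbol d is an assumed name, and the number (4) follows the later reference to equation (4).

```latex
d = \sqrt{(X - 0.5)^2 + (Y - 0.5)^2} \tag{4}
```

For example, an object centered at (0.8, 0.9) gives d = \sqrt{0.3^2 + 0.4^2} = 0.5.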

When multiple objects are present in one image, the object with the shortest distance given by equation (4) is taken as the selection candidate.

Next, the duration of the camera's gaze is measured. On the assumption that the longer the camera gazes at one object, the stronger the intention to select that object, gaze time measurement is included in the selection algorithm. The movement of the camera is measured with the Raspberry Pi and the Sense Hat. Movement of the Raspberry Pi and the web camera can be read from the accelerometer values. Since the gravitational acceleration g always acts in a fixed direction, letting Ax, Ay, and Az be the accelerations in the x, y, and z directions, the following relation holds in a stationary state:

[equation not reproduced in the source text]

Using this relation, the camera is regarded as stationary while the following condition holds:

[equation not reproduced in the source text]

In this embodiment, k = 0.07 is provisionally set. For the gyro sensor, an additional condition for the camera to be stationary is that the angular velocity v_θ about each of the pitch, roll, and yaw axes satisfies v_θ < m. Here, the camera is regarded as stationary when v_θ < 6°. The gyro sensor is configured to take the orientation at the start of measurement as the reference angle, and the rotation of the camera is calculated within the range of -180° to 180°.
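The two acceleration equations are missing from this text. A reconstruction consistent with the description (gravity is the only acceleration at rest, and k is a tolerance on the deviation from it) would be the following. The equation numbers and the use of an absolute value are assumptions.

```latex
% Stationary state: the measured acceleration magnitude equals gravity
\sqrt{A_x^2 + A_y^2 + A_z^2} = g \tag{5}

% Stationarity test with tolerance k (k = 0.07 in this embodiment)
\left| \sqrt{A_x^2 + A_y^2 + A_z^2} - g \right| < k \tag{6}
```

If the accelerometer reports values in units of g, then g = 1 in these expressions and k = 0.07 corresponds to a 7% deviation from gravity.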

An experiment on estimating the intention to select an object was carried out based on the three-stage processing described above. Capture started from more than 100 cm away so that all objects were within the field of view, and the camera was brought closer until a grasping motion was finally performed on the bottle. The algorithm was implemented so that gaze time is measured only while the camera is stationary. The results are shown in FIG. 15. From top to bottom, the plots show the gaze time, the object selected based on its distance from the center of the captured image, the acceleration, and the angular velocity. It can also be observed that objects placed farther back (positioned 40 cm away from the camera) are not detected. Time measurement starts when an object is selected and is reset when the change in acceleration is 0.07 or more or the change in angular velocity is 6° or more. From FIG. 15, it can be inferred that the user ultimately intends to perform a grasping motion on the bottle.
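The gaze timer with its reset conditions can be sketched as follows. The tolerances 0.07 and 6 come from the text; everything else is an assumption made for illustration: accelerometer readings in units of g, angular velocity in degrees per second, the dwell time of 2 seconds, and the names `is_stationary` and `GazeTimer`.

```python
import math

G = 1.0        # assumes accelerometer readings in units of g
K = 0.07       # acceleration tolerance from the text
M_DEG = 6.0    # angular velocity limit from the text (unit assumed: deg/s)


def is_stationary(accel, gyro, k=K, m=M_DEG):
    """True when the acceleration magnitude is near g and all rotation rates are small."""
    ax, ay, az = accel
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    return abs(magnitude - G) < k and all(abs(v) < m for v in gyro)


class GazeTimer:
    """Accumulate gaze time on one object; reset on movement or a change of object."""

    def __init__(self, dwell_seconds=2.0):  # dwell time is a hypothetical value
        self.dwell_seconds = dwell_seconds
        self.label = None
        self.elapsed = 0.0

    def update(self, label, accel, gyro, dt):
        """Return True once the same object has been gazed at long enough."""
        if label is None or label != self.label or not is_stationary(accel, gyro):
            self.label = label
            self.elapsed = 0.0
        else:
            self.elapsed += dt
        return self.label is not None and self.elapsed >= self.dwell_seconds
```

Here `label` would be the object selected by the size and center-distance stages for the current frame, and `dt` the time since the previous frame.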

In this embodiment, an algorithm was constructed that estimates, from images taken by a camera mounted on the prosthetic hand, which object in the image the user intends to grasp, and the operation of the system was verified using several everyday objects. As a result, it was possible to construct an algorithm that selects candidate grasp targets using the object recognition results, the position and area of the detection rectangles, and the gaze time obtained from the accelerometer and gyro sensor.

As described above, in the control system according to this embodiment, the measurement information includes at least imaging information acquired from an imaging sensor. In response to the acquired detection information, the type of each object included in the imaging information is identified, and a distance calculation unit calculates the distance to the object based on the identified type and the imaging information. Information on each object and its distance can thus be obtained, which makes it possible to distinguish nearby objects from distant ones and narrow down the operation.

In addition, a candidate identification unit identifies, based on the distance information calculated by the distance calculation unit, which of a plurality of objects are candidates for the target of the action. This makes it possible, for example, to exclude distant objects and keep only nearby objects as candidates, which eliminates unnecessary processing and improves processing efficiency.

Furthermore, a purpose identification unit identifies an object as the target object of the action when the object imaged closest to the center of the image in the imaging information has been imaged in a stationary state for a predetermined time or longer. When there are multiple objects, the object the user actually intends can therefore be identified accurately.

(Third embodiment of the present invention)
The control system according to this embodiment will be described with reference to FIGS. 16 to 19. This embodiment gives a specific example of the processing for object matching and tracking in the control systems of the preceding embodiments. Descriptions that overlap with the preceding embodiments are omitted.

In the control system according to this embodiment, tracking spatial information such as the position and posture of an object requires constructing an equation that describes the spatial relationship between the camera and the object, and then tracking that relationship until the sequence of motions ends. A simple way to coordinate the camera and the prosthetic hand is to estimate the camera-object spatial relationship from every frame captured during the grasping motion. However, the complexity of the algorithm makes real-time execution very difficult, particularly because a grasping sequence is completed in a very short time. Furthermore, the camera's field of view becomes very narrow toward the end of the grasping motion, which makes it difficult to extract spatial information from the image. To solve this problem, this embodiment introduces an inertial measurement unit (IMU) sensor to track the spatial relationship. An IMU sensor combines an accelerometer and a gyroscope on one chip.

The IMU sensor can estimate the camera movement at all times during a grasping sequence, and once the camera movement is known, the spatial relationship between the camera and the object is easy to track. Compared with estimating the spatial relationship at every frame of the image sequence, tracking it with the IMU sensor requires far fewer computational resources and is not limited by the camera's field of view. Introducing an IMU sensor into a vision-based prosthetic hand control system (that is, one in which the sensor 11 is a camera) solves the camera-prosthesis coordination problem described above.

The control method is described below. The control scheme for object tracking basically has two stages: (1) the object is recognized and the spatial calculation unit estimates the posture of the object relative to the camera; (2) the positional relationship tracking unit tracks the spatial relationship between the camera and the object (see FIG. 16). Specifically, the camera images the scene in which the user is about to grasp the object, and the spatial calculation unit immediately constructs the spatial relationship between the camera and the object. The spatial relationship includes information on whether the object is standing or lying and on the orientation of the object. It is essentially a 4x4 transformation matrix used to project points from the camera coordinate system (CS) to the object local CS (the world CS) and vice versa (Reference 2: Fang W, Zheng L, Deng H, Zhang H. Real-Time Motion Tracking for Mobile Augmented/Virtual Reality Using Adaptive Visual-Inertial Fusion. Sensors (Basel). 2017;17(5):1037. Published 2017 May 5.). The positional relationship tracking unit then tracks this spatial relationship (matrix) using the IMU sensor. Wherever the prosthetic hand moves, the control system keeps track of the spatial relationship between the two CSs. However, IMU sensor noise and drift make it difficult to maintain this tracking over long periods. The main problems this control scheme must solve are therefore estimating the transformation matrix and updating it during the grasping sequence. Each is described below.

The transformation matrix represents the spatial relationship between the camera CS and the object local CS. Finding corresponding points on the same object in the two CSs is the key to estimating the transformation matrix. Referring to FIG. 17, given a point on the object in the camera frame, P_c = (x_c; y_c; z_c; 1)^T, and the corresponding point in the world frame, P_w = (x_w; y_w; z_w; 1)^T, the following equation projects a point from the world CS to the camera CS.
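The equation referred to here is not reproduced in this text. A standard homogeneous form that is consistent with the definitions of P_c and P_w given here is shown below. It is offered as an assumption, not as the patent's own equation (7); the symbols T, R and t are introduced only for this illustration.

```latex
P_c = T\,P_w,
\qquad
T = \begin{pmatrix} R & t \\ 0^{\mathsf{T}} & 1 \end{pmatrix},
\quad R \in SO(3),\; t \in \mathbb{R}^{3}
```

Under this form, R is the 3x3 rotation and t the translation from the world CS to the camera CS, and the inverse matrix maps camera coordinates back to world coordinates.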

When multiple corresponding points are available, the transformation matrix can be calculated using equation (7).

Feature matching can find corresponding points on the same object in two scenes. Texture features on the surface of the object are detected and compared with the features of a previously scanned object (the reference) to estimate the position and orientation of the object. FIG. 18 illustrates the mechanism. The pre-scanned object has 3D feature points (P_1; P_2; P_3; P_4; ...), and a set of 3D feature points of the same object (P_1^0; P_2^0; P_3^0; P_4^0; ...) is detected at runtime. The correspondences are found by matching features, giving the pairs (P_1; P_1^0), (P_2; P_2^0), (P_3; P_3^0), ..., (P_n; P_n^0). If the origin of the CS in which the pre-scanned feature points are defined is placed at the origin of the camera CS and the axes are aligned, the pre-scanned feature points are expressed in the camera CS. The detected feature points (the transformed points) are also in the camera CS. With the i-th pre-scanned feature point defined as P_i = (P_ix; P_iy; P_iz; 1)^T and the corresponding feature point detected at runtime as P_i^0 = (P_ix^0; P_iy^0; P_iz^0; 1)^T, the following equation expresses their relationship.

The transformation matrix is therefore calculated by substituting the list of matched feature points into equation (8).
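Equation (8) is not reproduced in this text, so the exact solution method used by the inventors is not known. One common way to recover a rigid 4x4 transform from matched 3D points is the SVD-based Kabsch method, sketched below with NumPy. The choice of this method and the function name `estimate_rigid_transform` are assumptions made for illustration.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform T (4x4) such that dst ~= R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points, N >= 3 and not collinear.
    Uses the Kabsch method; this is an assumed stand-in for equation (8).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)

    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean

    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

Here `src` would hold the pre-scanned feature points P_i and `dst` the matched runtime points P_i^0. In practice, mismatched features are usually rejected first (for example with RANSAC), which this sketch omits.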

As the prosthetic hand moves, the transformation matrix is updated so that the spatial relationship between the camera frame and the world frame continues to be tracked. If the translation and rotation of the camera can be tracked, the translation and rotation vectors can be used to update the transformation matrix. The camera movement is expressed as (α, β, γ, T_x; T_y; T_z), where α, β and γ are the roll, pitch and yaw, and T_x, T_y and T_z are the translations along the three axes.
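A minimal sketch of this update step follows. The text does not state the rotation order or the direction of the motion, so both are assumptions here: the rotation is composed as Rz(γ)·Ry(β)·Rx(α), and the motion matrix M is taken to express the new camera pose in the old camera frame. The function names are hypothetical.

```python
import numpy as np


def motion_to_matrix(alpha, beta, gamma, tx, ty, tz):
    """4x4 matrix for a camera motion; angles in radians.

    Assumed convention: roll about x, pitch about y, yaw about z,
    composed as R = Rz(gamma) @ Ry(beta) @ Rx(alpha).
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    M = np.eye(4)
    M[:3, :3] = Rz @ Ry @ Rx
    M[:3, 3] = [tx, ty, tz]
    return M


def update_world_to_camera(T_cw, M):
    """World-to-camera matrix after the camera has moved by M.

    M is assumed to satisfy P_old = M @ P_new, so a world point maps to
    the new camera frame as P_new = inv(M) @ T_cw @ P_w.
    """
    return np.linalg.inv(M) @ T_cw
```

For example, if the camera starts at the world origin and moves 0.4 m forward along its z-axis, a point 1 m ahead ends up 0.6 m ahead in the updated camera frame.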

There are several ways to track camera movement, such as camera-based self-position and posture estimation (visual odometry), self-position and posture estimation based on a camera and inertial sensors (visual-inertial odometry), or an IMU sensor alone. The simplest is to use an IMU sensor, which usually contains an accelerometer and a gyroscope. The accelerometer measures acceleration, and the gyroscope measures orientation or angular velocity. Each compensates for the noise and drift errors of the other, giving more complete and accurate motion tracking. The camera movement is calculated by integrating the acceleration and angular velocity using equations (9) and (10). Because of the noise introduced by the IMU sensor, the tracking accuracy is acceptable only over a relatively short period.

Equations (9) and (10) together give the final expression for the camera pose, equation (11).
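Equations (9) to (11) are not reproduced in this text. The following is a generic dead-reckoning sketch of the idea they describe (integrating angular velocity for orientation and acceleration for position), not a reconstruction of the patent's equations. The function names, the gravity vector and the assumption that the sensor starts at rest with its z-axis up are all illustrative.

```python
import numpy as np


def rodrigues(rotvec):
    """Rotation matrix for a rotation vector (unit axis times angle in radians)."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rotvec / angle
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)


def integrate_imu(samples, dt, gravity=(0.0, 0.0, -9.81)):
    """Dead-reckon a pose from (gyro, accel) samples taken every dt seconds.

    gyro:  angular velocity in the sensor frame (rad/s).
    accel: specific force in the sensor frame (m/s^2), as an accelerometer
           reports it (a sensor at rest reads +9.81 on its up axis).
    Returns (R, p): orientation and position relative to the starting frame.
    """
    R = np.eye(3)
    v = np.zeros(3)
    p = np.zeros(3)
    g = np.asarray(gravity, dtype=float)
    for gyro, accel in samples:
        # Rotate the reading into the starting frame and remove gravity.
        a = R @ np.asarray(accel, dtype=float) + g
        p = p + v * dt + 0.5 * a * dt ** 2
        v = v + a * dt
        # Apply the incremental rotation measured over this time step.
        R = R @ rodrigues(np.asarray(gyro, dtype=float) * dt)
    return R, p
```

Because every step adds sensor noise to the running sums, the error of this kind of integration grows with time, which is consistent with the remark that tracking accuracy is acceptable only in the short term.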

The following experiment was conducted on the above method. Most recent smartphones are equipped with a camera and an IMU sensor, so a smartphone is a convenient platform for a simple demonstration. The basic concept of the method is also very similar to that of augmented reality (AR) applications. Apple and Google have released the ARKit and ARCore libraries for developers of AR applications, and these can be used to verify the method.

On the iPhone (registered trademark), ARKit performs feature matching to estimate the transformation matrix and uses visual-inertial odometry to track camera movement. Visual-inertial odometry first uses the iPhone camera to identify distinctive feature points and tracks how those points move over time. Combining the movement of these points with the readings from the iPhone's inertial sensors determines both the position and the orientation of the iPhone as it moves through space.

To verify the method described above, 3D models of a bottle and a cup were first created from two real objects. The transformation matrix between the camera CS and the object local CS is estimated, and each 3D model is projected into the real world so that it is positioned and oriented like the actual object. Camera movement is tracked by the IMU sensor, and the transformation matrix is updated at every frame to reconstruct the AR scene.

The demonstration results are shown in FIG. 19. To show that orientation is also recognized, the bottle was placed on top of the cup. The white areas in the photographs are the 3D models created in advance. After the transformation matrix is estimated, the 3D models of the cup and the bottle are both projected into the 3D world scene, where they coincide with the corresponding real objects. The 3D world scene is then projected back onto the 2D image, as displayed in FIG. 19. The figure shows that both the position and the orientation of the cup and the bottle are estimated well. Next, the viewing angle was changed and the camera was moved away from the objects to check that camera movement could be tracked. Once the rendered 3D models are placed in the real world, their scale and perspective in the 2D image change with the camera movement, which shows that the camera is being tracked properly. During testing, however, orientation and position could not always be estimated accurately, and the 3D model sometimes did not coincide exactly with the real object in the AR scene.

This experiment shows that the position and orientation of an object can be estimated and then tracked in real time using sensor fusion. The transformation matrix represents the spatial relationship between the camera and the object, and estimating and tracking it is the key to building that relationship. The relationship helps the prosthetic hand understand the grasping scene comprehensively and provides more evidence for controlling the hand. This has several advantages. First, if the shape of the object and its 6D pose are known, control can be performed very finely. Second, whereas grasp timing is conventionally determined by estimating the person's intention from EMG signals, in this embodiment the grasp success rate can be inferred from the transformation matrix representing the spatial relationship. Third, the prosthetic hand can be controlled semi-automatically: knowing the spatial relationship gives the prosthetic hand the ability to adjust itself.

As described above, the control system according to this embodiment controls the prosthetic hand by constructing and tracking the spatial relationship between the hand and the object. The spatial relationship is represented by a matrix, which provides more evidence for fine control of the prosthetic hand.

In addition, the system includes a spatial calculation unit that calculates the spatial positional relationship between the camera and the object included in the imaging information, and a positional relationship tracking unit that tracks the relative spatial positional relationship between the object and the imaging sensor based on the calculated spatial positional relationship and on movement information of the imaging sensor acquired as detection information. Therefore, even when the object and/or the user moves or the angle changes, the position and posture relationship between the object and the imaging sensor can be tracked continuously, and drive control information for the execution unit can be generated accurately according to the state at that time.

(Fourth embodiment of the present invention)
The control system according to this embodiment will be described with reference to FIGS. 20 to 22. In this embodiment, the control system of the preceding embodiments uses AR markers, after the object has been recognized, to calculate the movement of the prosthetic hand and keep tracking the object. Descriptions that overlap with the preceding embodiments are omitted.

The control system according to this embodiment uses markers to track the posture of the object. As shown in FIG. 20, the markers are used to construct a coordinate system in which the postures of all objects in the scene are recorded. That is, when the prosthetic hand of this embodiment (referred to as the smart hand) detects a marker, the coordinate system is constructed, and the postures of the objects within that coordinate system are updated accordingly. The coordinate system is constructed from multiple markers (a marker map) rather than from a single marker. A coordinate system estimated from multiple markers makes the posture tracking more robust. A single marker may go undetected because of occlusion or poor lighting, whereas multiple markers allow the coordinate system to be set up from most camera viewpoints.

FIG. 20 shows a demonstration of the smart hand's spatial awareness. The smart hand moves around three objects: a cup, a bottle and a ball. It detects the objects in its field of view, estimates their postures, and uses the markers to track those postures in real time. The posture tracking is robust to movement of the smart hand.

Controlling the smart hand differs from most other control problems in robotics. Human factors, that is, the sensor information detected as the person acts toward the object, play a major role. First, the smart hand is an extension of the human body, and its movement must reflect the user's intention. Second, the user's body, especially the arm, must cooperate with the smart hand to complete the whole grasping motion by directing the prosthetic hand toward the object. The spatial awareness module of this embodiment has three important functions that help the user grasp objects.

First, in a scene where several objects are placed on a table (objects that could be the target, called candidate objects), the smart hand must predict which object the user intends to grasp (called the target object). The camera embedded in the smart hand can be regarded as the eye of the smart hand, so the target object can be determined by analyzing which candidate object the camera is looking at. More specifically, the object through which the camera's z-axis (the axis of the imaging direction) passes is determined to be the target object. In FIG. 21, the camera's z-axis passes through the cup in one scene and the bottle in the other, and the target objects are as shown in the figure. Compared with finding the object closest to the center of the 2D image as described earlier, extending the selection to 3D using the smart hand's spatial awareness allows the target object to be predicted more intuitively and more accurately.
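The z-axis test described above can be sketched as a ray test against each candidate object. The text does not say how object extent is represented, so this sketch assumes, purely for illustration, that each object is approximated by a bounding sphere expressed in camera coordinates with +z as the imaging direction. The function name and the tuple layout are hypothetical.

```python
import math


def target_on_optical_axis(objects):
    """Pick the nearest object whose bounding sphere the camera z-axis passes through.

    objects: list of (label, (x, y, z), radius) in camera coordinates (metres),
             with +z pointing along the imaging direction.
    Returns the label of the selected object, or None if the axis hits nothing.
    """
    hits = []
    for label, (x, y, z), radius in objects:
        # Distance from the sphere center to the z-axis is hypot(x, y);
        # only objects in front of the camera (z > 0) are considered.
        if z > 0 and math.hypot(x, y) <= radius:
            hits.append((z, label))
    return min(hits)[1] if hits else None
```

With a cup slightly off-axis at 0.5 m, a bottle well to the side, and a ball on-axis at 0.9 m, the function selects the cup, because the axis passes through it and it is the nearer of the two objects hit.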

Second, as the arm directs the smart hand toward the object, the camera's field of view generally becomes progressively narrower. Once the camera is within a certain distance of the object, it can no longer extract information from the image. In this embodiment, however, markers are used to track the posture information that has already been extracted. As long as a marker can be detected, the posture information is retained and updated. Markers have small, distinct patterns and are easy to detect even at relatively short distances from the camera. The detection range of the spatial awareness module is therefore wide. In FIG. 22, the smart hand is directed toward the cup from a distance in order to grasp it, and the posture information is retained until the end of the approach phase.

Third, the smart hand grasps the object in cooperation with the user, so its processing speed must keep up with the speed of the arm movement. If it does not, the control system lags. Estimating the posture of the object from every frame of live video takes a long time and slows the control system down. For this reason, this embodiment tracks the posture information using the IMU sensor and the markers. A tracking-based system estimates the posture at the start of a grasping session and then tracks it until the session ends. The marker tracking system that was developed achieves a processing speed of 60 frames per second or more.

As described above, this embodiment includes a spatial awareness module that estimates the posture only once, at the start of a grasping session, and then tracks it to the end using markers. With the three functions described above, the module allows the user to control the smart hand easily.

Each of the above embodiments may also include a function for giving feedback to the user. Specifically, presenting the computer's calculation results and planned operations (including interpretation results and control plans) to the user through, for example, AR glasses, a head-mounted display, a projection mapping device or a laser pointer makes it possible to provide a system that supports efficient work and error avoidance.

1 Control system
10 Prosthetic limb device
11 Sensor
20 Server
110 Acquisition unit
120 Transmission unit
130 Execution unit
140 Drive control unit
210 Sensor information acquisition unit
220 Object identification unit
230 Behavior estimation unit
240 Drive control information generation unit

Claims

A control system comprising:
an execution unit that is attached to a human body and executes an action on a target object;
detection information acquisition means for acquiring detection information including at least a myoelectric signal detected by one or more sensors in accordance with a person's action toward the object;
measurement information acquisition means for acquiring measurement information including at least imaging information measured by one or more sensors on the object;
motion specifying means for executing, based on the detection information and the measurement information and when the myoelectric signal exceeds a first threshold, an estimation process of specifying the type of the object included in the imaging information, specifying the target object that is the purpose of the action, determining a behavioral posture toward the target object, and adjusting the execution unit to the behavioral posture; and
drive control means for controlling the drive of the execution unit so that the execution unit performs a direct action on the target object when the myoelectric signal exceeds a second threshold that is larger than the first threshold while the person wearing the execution unit holds the execution unit in a position and direction in which the behavioral posture is appropriate.

The control system according to claim 1, wherein
the execution unit is a prosthetic hand having a camera whose imaging direction is the direction in which the wearer points its tip, and
the motion specifying means specifies an object as the target object of the action when that object, imaged at the position closest to the center of the image in the imaging information captured by the camera, is imaged in a stationary state for a predetermined period of time or more.

The control system according to claim 1 or 2, wherein
the execution unit is a prosthetic hand having a camera whose imaging direction is the direction in which the wearer points its tip,
the motion specifying means comprises:
a spatial calculation unit that calculates a spatial positional relationship between the camera and the object included in the imaging information; and
a positional relationship tracking unit that tracks the relative spatial positional relationship between the object and the camera based on the calculated spatial positional relationship and movement information of the camera acquired as the detection information, and
the spatial calculation unit and the positional relationship tracking unit determine the posture of the object within the scene based on the imaging information acquired by the measurement information acquisition means by imaging, with the camera, an imaging range in which markers are arranged in advance.

A control program that controls an execution unit that is worn on a human body and executes an action on a target object, the program causing a computer to function as:
detection information acquisition means for acquiring detection information including at least a myoelectric signal detected by one or more sensors in accordance with a person's action toward the object;
measurement information acquisition means for acquiring measurement information including at least imaging information measured by one or more sensors on the object;
motion specifying means for executing, based on the detection information and the measurement information and when the myoelectric signal exceeds a first threshold, an estimation process of specifying the type of the object included in the imaging information, specifying the target object that is the purpose of the action, determining a behavioral posture toward the target object, and adjusting the execution unit to the behavioral posture; and
drive control means for controlling the drive of the execution unit so that the execution unit performs a direct action on the target object when the myoelectric signal exceeds a second threshold that is larger than the first threshold while the person wearing the execution unit holds the execution unit in a position and direction in which the behavioral posture is appropriate.