WO2021112107A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2021112107A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
gesture recognition
orientation
recognition target
posture
Application number
PCT/JP2020/044771
Other languages
French (fr)
Japanese (ja)
Inventor
ダニエル誠 徳永
Original Assignee
ソニーグループ株式会社
Application filed by ソニーグループ株式会社
Publication of WO2021112107A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/70: Determining position or orientation of objects or cameras

Definitions

  • This technology relates to an information processing device, an information processing method, and a program.
  • Patent Document 1 describes a user interface device that recognizes gestures of a user's hand using a camera attached to the ceiling.
  • Because that device uses a camera fixed to the ceiling, the accuracy of gesture recognition may decrease depending on the position of the user or the orientation of the hand.
  • The purpose of this technology is to improve the accuracy of gesture recognition of a gesture recognition target.
  • The concept of this technology is an information processing device including: an information acquisition unit that acquires first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device;
  • an information receiving unit that receives, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device;
  • and an information processing unit that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information of the gesture recognition target, and the second position/posture information of the gesture recognition target, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
  • In this technology, the information acquisition unit acquires the first position/posture information of the gesture recognition target, referenced to the position/posture of the first device, based on the sensor output of the first device.
  • The information receiving unit receives, from the second device, the second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device.
  • The information processing unit determines the position/posture of the gesture recognition target based on the relative relationship information between the positions/postures of the first device and the second device, the first position/posture information, and the second position/posture information, and then recognizes the gesture of the gesture recognition target based on the determined position/posture.
  • For example, the information acquisition unit may further acquire position/posture information of the second device, referenced to the position/posture of the first device, based on the sensor output of the first device, and the relative relationship information may include this position/posture information of the second device referenced to the first device.
  • In this case, for example, the information acquisition unit may acquire the position/posture information of the second device based on recognition marker information displayed on the second device and contained in the sensor output of the first device.
  • Also, for example, the information receiving unit may further receive, from the second device, position/posture information of the first device referenced to the position/posture of the second device and acquired based on the sensor output of the second device, and the relative relationship information may include this position/posture information of the first device referenced to the second device.
  • Also, for example, the information processing unit may spatially synchronize the first position/posture of the gesture recognition target and the second position/posture of the gesture recognition target based on the relative relationship information between the positions/postures of the first device and the second device, time-synchronize them by a prediction process based on the time stamp information added to the first and second position/posture information, and integrate the spatially and temporally synchronized first and second positions/postures to determine the position/posture of the gesture recognition target.
  • In this case, for example, identification information may be added to the first position/posture information of the gesture recognition target acquired by the information acquisition unit and to the second position/posture information received by the information receiving unit, and the information processing unit may integrate or separate the identification information based on each piece of position/posture information and integrate the first and second positions/postures of the gesture recognition target that are associated with the same identification information.
  • Also, for example, the information receiving unit may transmit an application start request to the second device, then transmit an information transmission request, and receive the second position/posture information of the gesture recognition target from the second device.
  • In this case, for example, the second device that has received the application start request may update the second position/posture information of the gesture recognition target at any time.
  • Also, for example, the first device may be an augmented reality display device, and a display control unit may further be provided that controls the augmented reality display in the augmented reality display device based on the gesture recognition information.
  • In this case, for example, the gesture recognition target may be located between the first device and the second device, and the augmented reality display may be performed at a position corresponding to the second device.
  • Also, for example, the first device may be a head-mounted display having a transmissive display,
  • and the second device may be a mobile device having a non-transmissive display.
  • In this way, in the present technology, the gesture of the gesture recognition target is recognized based on the position/posture of the gesture recognition target determined from the first position/posture information referenced to the first device and the second position/posture information referenced to the second device. This makes it possible to improve the accuracy of gesture recognition of the gesture recognition target. Furthermore, in the present technology, the first device and the second device can be moved freely.
  • FIG. 1 shows a configuration example of an AR (Augmented Reality) display system 10 as an embodiment.
  • The AR display system 10 includes an HMD (Head Mounted Display) 100 having a transmissive display as an AR display device, and a mobile device 200, such as a smartphone or a tablet, having a non-transmissive display.
  • The HMD 100 constitutes a first device and an information processing device,
  • and the mobile device 200 constitutes a second device.
  • The HMD 100 is attached to the head of the user 300 so that the transmissive display is located at the eye position, and the mobile device 200 is held in the right hand of the user 300.
  • The HMD 100 recognizes the mobile device 200 based on the output of a sensor such as a camera and, as an AR display (AR superimposed object), displays a virtual book 400 at a position corresponding to the mobile device 200, superimposed on the mobile device 200 in the illustrated example.
  • Based on the sensor output of its camera or the like, the HMD 100 repeatedly acquires its self-position/posture information, the position/posture information of the mobile device 200 referenced to the self-position/posture, and the position/posture information of the left hand of the user 300 as the gesture recognition target. Likewise, based on the sensor output of its camera or the like, the mobile device 200 repeatedly acquires its self-position/posture information, the position/posture information of the HMD 100 referenced to the self-position/posture, and the position/posture information of the left hand of the user 300 as the gesture recognition target.
  • The HMD 100 receives the information acquired by the mobile device 200 from the mobile device 200. Based on the information it acquired itself and the information received from the mobile device 200, the HMD 100 spatially and temporally synchronizes the position/posture information of the left hand of the user 300 acquired by the HMD 100 and by the mobile device 200, and then integrates it to determine the position/posture of the left hand of the user 300.
  • The HMD 100 recognizes the gesture of the left hand of the user 300 based on the position/posture of the left hand determined as described above, or further on its temporal change, and controls the AR display based on the gesture recognition information. For example, when the gesture of the left hand of the user 300 is an operation of turning the pages of the virtual book 400, the display of the book 400 is changed so that its pages are turned.
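  • As an illustration of how such a gesture could be derived from the temporal change of the determined hand position, the following minimal sketch flags a page-turn when the integrated hand position sweeps horizontally within a short time window. The 0.15 m sweep, the 0.5 s window, and the use of the x coordinate alone are illustrative assumptions, not details from the publication.

```python
# Hypothetical sketch: detect a page-turn swipe from the time series of the
# integrated hand position. Thresholds and the 0.5 s window are illustrative.
from collections import deque

class PageTurnDetector:
    def __init__(self, min_sweep_m=0.15, window_s=0.5):
        self.min_sweep_m = min_sweep_m
        self.window_s = window_s
        self.history = deque()  # (timestamp, x_position) pairs

    def update(self, timestamp, hand_x):
        self.history.append((timestamp, hand_x))
        # keep only samples inside the time window
        while self.history and timestamp - self.history[0][0] > self.window_s:
            self.history.popleft()
        sweep = self.history[-1][1] - self.history[0][1]
        if sweep <= -self.min_sweep_m:
            return "turn_page_forward"   # right-to-left sweep
        if sweep >= self.min_sweep_m:
            return "turn_page_backward"  # left-to-right sweep
        return None

detector = PageTurnDetector()
print(detector.update(0.00, 0.30))  # None
print(detector.update(0.25, 0.10))  # "turn_page_forward"
```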
  • In this way, the gesture (pose) of the left hand of the user 300 is recognized based on the result of integrating the position/posture information of the left hand referenced to the position/posture of the HMD 100 and the position/posture information of the left hand referenced to the position/posture of the mobile device 200. This makes it possible to improve the accuracy of gesture recognition of the left hand of the user 300.
  • FIG. 2A shows an image captured by the camera mounted on the HMD 100,
  • and FIG. 2B shows an image captured by the camera mounted on the mobile device 200.
  • In the camera image of the mobile device 200, the left hand of the user 300 is captured from the palm side, so fine poses of the fingers can be observed.
  • Note that the HMD 100 and the mobile device 200 can be moved freely.
  • FIG. 3 shows a configuration example of the HMD 100.
  • The HMD 100 includes a camera 101, an IMU (Inertial Measurement Unit) 102, an information processing unit 103, a communication unit 104, a transmissive display 105, and an application/recognition information storage 106.
  • The camera 101 is composed of a lens and an image sensor such as a CCD or CMOS image sensor. Two cameras 101 are provided, for example, on the outer surface of the front portion of the HMD 100 and capture images of objects (subjects) ahead in the user's line-of-sight direction.
  • The IMU 102 acquires information on the acceleration and angular acceleration of the HMD 100.
  • The information processing unit 103 is composed of a CPU (Central Processing Unit) and the like,
  • and performs various processes based on various programs stored in a storage unit (not shown).
  • The information processing unit 103 includes a self-position estimation processing unit 131, an other-person position estimation processing unit 132, a hand recognition processing unit 133, a main recognition integrated processing unit 134, and an image generation/application processing unit 135.
  • The self-position estimation processing unit 131 uses an algorithm such as SLAM (Simultaneous Localization and Mapping) to estimate the position/posture of the HMD 100 based on the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The self-position estimation processing unit 131 can also estimate information such as the error of the position/posture estimation.
  • The other-person position estimation processing unit 132 estimates the position/posture of the mobile device 200, referenced to the position/posture of the HMD 100, by a method such as object recognition (for example, marker recognition) or tracking based on the image obtained by the camera 101.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The other-person position estimation processing unit 132 can also estimate information such as the error of the position/posture estimation.
  • The position/posture of the mobile device 200 referenced to the position/posture of the HMD 100 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • The hand recognition processing unit 133 recognizes a hand based on the image obtained by the camera 101 and estimates the position/posture and speed of the hand referenced to the position/posture of the HMD 100. The hand recognition processing unit 133 also estimates the pose of the hand; as the pose estimation, for example, the position and speed of each joint are estimated.
  • The estimated hand position/posture information includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system), and the information on the position of each joint includes, for example, three-dimensional information (x, y, z in a Cartesian coordinate system). It is also conceivable to estimate the angle of each finger instead of the position of each joint as the pose estimation; in the following description, estimating the position of each joint is used as an example.
  • The hand recognition processing unit 133 can also estimate information such as the error of the hand position/posture estimation and the error of the estimation of each joint position.
  • The speed of the hand and the speed of each joint can be estimated from the temporal change of their positions.
  • Here, the hand recognition result is the position/posture of the hand and the position of each joint, but other information may be used as long as the pose of the hand can be restored from it;
  • for example, the rotation of each joint in relative coordinates, the rotation in absolute coordinates, or the position of each joint in world coordinates can be considered.
  • The self-position estimation processing unit 131, the other-person position estimation processing unit 132, and the hand recognition processing unit 133 perform their processing as needed and update their information.
  • A time stamp indicating the time at which the information was acquired (the observation time) is added to each piece of information obtained by these processing units.
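  • The outputs described above (self pose, other-device pose, and hand pose with joints, each carrying an estimation error and an observation-time stamp) could be carried in containers such as the following sketch. The field names and structure are illustrative assumptions, not taken from the publication.

```python
# Illustrative containers for the per-frame estimation results.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Pose6D:
    x: float = 0.0; y: float = 0.0; z: float = 0.0            # position
    pitch: float = 0.0; yaw: float = 0.0; roll: float = 0.0   # orientation

@dataclass
class HandObservation:
    pose: Pose6D                          # hand position/posture in the device frame
    joint_positions: List[Tuple[float, float, float]]   # (x, y, z) per joint
    joint_velocities: List[Tuple[float, float, float]]  # from temporal change of positions
    pose_error: float                     # estimation error of the hand pose
    joint_errors: List[float]             # estimation error per joint
    timestamp: float                      # observation time (time stamp)
    hand_id: int = 0                      # identification ID used later for ID merge/split

@dataclass
class FrameEstimate:
    self_pose: Pose6D                     # self-position/posture (SLAM)
    self_pose_error: float
    other_pose: Pose6D                    # pose of the other device in this device's frame
    other_pose_error: float
    hands: List[HandObservation] = field(default_factory=list)
    timestamp: float = 0.0
```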
  • The main recognition integrated processing unit 134 acquires the information obtained at any time by the self-position estimation processing unit 131, the other-person position estimation processing unit 132, and the hand recognition processing unit 133, and also acquires the corresponding information from the mobile device 200 through the communication unit 104.
  • It then spatially and temporally synchronizes the left-hand recognition results of the user 300 (the position/posture of the hand and the position of each joint) acquired by the HMD 100 and by the mobile device 200, and integrates them. The processing of the main recognition integrated processing unit 134 will be described in more detail later.
  • The image generation/application processing unit 135 performs the processing necessary for the operation of the application and performs rendering for displaying the virtual book 400 as an AR display. It also receives the integration result of the main recognition integrated processing unit 134, recognizes the gesture of the left hand of the user 300 based on the left-hand recognition result, and performs the interaction processing that controls the AR display based on this gesture recognition information. For example, when the gesture of the left hand of the user 300 turns a page of the book 400, the processing of turning the page of the book 400 is performed.
  • The communication unit 104 communicates with the mobile device 200 wirelessly (for example, via Wi-Fi (Wireless Fidelity) or Li-Fi (Light Fidelity)) or by wire.
  • The transmissive display 105 performs the AR display based on the image data supplied from the image generation/application processing unit 135.
  • The application/recognition information storage 106 holds the information necessary for the application, as well as the information necessary for recognition and the like. Examples of the information held for recognition are a localization map for SLAM, marker recognition information, and hand recognition information. In the example of FIG. 3, the application/recognition information storage 106 is connected only to the main recognition integrated processing unit 134 and the image generation/application processing unit 135, but it may also be connected to other processing units.
  • FIG. 4 shows a configuration example of the mobile device 200.
  • The mobile device 200 has a camera 201, an IMU 202, an information processing unit 203, a communication unit 204, a non-transmissive display 205, and a recognition information storage 206.
  • The camera 201 is composed of a lens and an image sensor such as a CCD or CMOS image sensor.
  • The camera 201 is a stereo camera; it is provided on the display-surface side of the mobile device 200 and captures images of objects (subjects) on the display-surface side.
  • The IMU 202 acquires information on the acceleration and angular acceleration of the mobile device 200.
  • The information processing unit 203 is composed of a CPU (Central Processing Unit) and the like,
  • and performs various processes based on various programs stored in a storage unit (not shown).
  • The information processing unit 203 includes a self-position estimation processing unit 231, an other-person position estimation processing unit 232, a hand recognition processing unit 233, a sub-recognition integrated processing unit 234, and a recognition marker image generation unit 235.
  • The self-position estimation processing unit 231 estimates the position/posture of the mobile device 200 based on the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202, using an algorithm such as SLAM.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The self-position estimation processing unit 231 can also estimate information such as the error of the position/posture estimation.
  • The other-person position estimation processing unit 232 estimates the position/posture of the HMD 100, referenced to the position/posture of the mobile device 200, by a method such as object recognition or tracking based on the image obtained by the camera 201.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The other-person position estimation processing unit 232 can also estimate information such as the error of the position/posture estimation.
  • The position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • The hand recognition processing unit 233 recognizes a hand based on the image obtained by the camera 201 and estimates the position/posture and speed of the hand referenced to the position/posture of the mobile device 200. The hand recognition processing unit 233 also estimates the pose of the hand; as the pose estimation, for example, the position and speed of each joint are estimated.
  • The estimated hand position/posture information includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system), and the information on the position of each joint includes, for example, three-dimensional information (x, y, z in a Cartesian coordinate system).
  • The hand recognition processing unit 233 can also estimate information such as the error of the hand position/posture estimation and the error of the estimation of each joint position.
  • The speed of the hand and the speed of each joint can be estimated from the temporal change of their positions.
  • The self-position estimation processing unit 231, the other-person position estimation processing unit 232, and the hand recognition processing unit 233 perform their processing at any time and update their information.
  • A time stamp indicating the time at which the information was acquired (the observation time) is added to each piece of information obtained by these processing units.
  • The sub-recognition integrated processing unit 234 transmits the information obtained at any time by the self-position estimation processing unit 231, the other-person position estimation processing unit 232, and the hand recognition processing unit 233 to the HMD 100 through the communication unit 204, in response to an information transmission request sent from the HMD 100 through the communication unit 204. The processing of the sub-recognition integrated processing unit 234 will be described in more detail later.
  • The recognition marker image generation unit 235 acquires the image data of the recognition marker from the recognition information storage 206 and supplies it to the non-transmissive display 205 to display the recognition marker.
  • The recognition marker is displayed according to an instruction from the sub-recognition integrated processing unit 234, which is based on an application start request received from the HMD 100 via the communication unit 204.
  • The communication unit 204 communicates with the HMD 100 wirelessly or by wire.
  • The non-transmissive display 205 displays the recognition marker based on the image data supplied from the recognition marker image generation unit 235.
  • The recognition information storage 206 holds the information necessary for recognition and the like. Examples of the information held for recognition include the above-mentioned image data of the recognition marker, a localization map for SLAM, and hand recognition information. In the example of FIG. 4, the recognition information storage 206 is connected only to the sub-recognition integrated processing unit 234 and the recognition marker image generation unit 235, but it may also be connected to other processing units.
  • The HMD 100 starts the application in step ST1.
  • Next, in step ST2, the HMD 100 requests the mobile device 200, which is the sub device, to start the application.
  • The mobile device 200 starts the application in step ST11 and displays the recognition marker on the non-transmissive display 205.
  • The mobile device 200 then performs the recognition process (sub) in step ST12. The details of this recognition process (sub) will be described later.
  • After the process of step ST2, the HMD 100 performs the recognition process of the mobile device 200, which is the sub device, in step ST3. In this case, the HMD 100 estimates the position/posture of the mobile device 200 based on the recognition marker displayed on the non-transmissive display 205 of the mobile device 200.
  • After the position/posture is estimated in step ST3, the HMD 100, in step ST4, performs rendering in which the virtual book 400 is superimposed and displayed at the position of the mobile device 200 as an AR display, and starts the interaction, that is, the control of the AR display based on the gesture recognition information of the left hand of the user 300. The HMD 100 then performs the recognition process (main) in step ST5. The details of this recognition process (main) will be described later.
  • In step ST6, the HMD 100 updates the state of the virtual book 400 as necessary based on the gesture recognition information of the left hand of the user 300. For example, when the gesture recognition information indicates that a page of the book 400 is turned, the state is updated so that the page of the book 400 is turned.
  • The HMD 100 performs the application end processing in step ST7.
  • In this case, a termination signal is transmitted to the mobile device 200, which is the sub device, and the recognition process (main) is terminated.
  • The mobile device 200 receives the end signal from the HMD 100 and performs the application end processing in step ST13.
  • In this case, the recognition process (sub) is terminated and the marker display is stopped.
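  • The message exchange between the main device (HMD 100) and the sub device (mobile device 200) outlined in these steps can be sketched as follows. The message names and the in-process queues standing in for the Wi-Fi/Li-Fi link are illustrative assumptions, not part of the publication.

```python
# Hypothetical sketch of the main/sub exchange, using in-process queues in
# place of the wireless link.
import queue
import threading

to_sub, to_main = queue.Queue(), queue.Queue()

def run_sub():
    while True:
        msg = to_sub.get()
        if msg == "start_app":            # ST11: start the app and show the recognition marker
            print("sub: marker displayed")
        elif msg == "request_info":       # ST53: send the latest estimates to the main device
            to_main.put({"hand_pose": (0.1, 0.2, 0.3), "timestamp": 0.0})
        elif msg == "terminate":          # ST13: end processing, stop the marker display
            print("sub: terminated")
            break

def run_main(frames=2):
    to_sub.put("start_app")               # ST2: request the sub device to start the application
    for _ in range(frames):               # ST5: recognition process (main), repeated per frame
        to_sub.put("request_info")
        sub_info = to_main.get()
        print("main: received", sub_info)  # would be integrated with the HMD's own estimates
    to_sub.put("terminate")               # ST7: application end processing

worker = threading.Thread(target=run_sub)
worker.start()
run_main()
worker.join()
```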
  • In the recognition process (main), the self-position estimation processing unit 131 of the HMD 100 first estimates the position/posture of the HMD 100 based on the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102, using an algorithm such as SLAM. In this case, the error of the position/posture estimation is also estimated.
  • Next, the other-person position estimation processing unit 132 of the HMD 100 estimates the position/posture of the mobile device 200, the other device, referenced to the position/posture of the HMD 100, by a method such as object recognition (for example, marker recognition) or tracking based on the image obtained by the camera 101.
  • In this case, the error of the position/posture estimation is also estimated.
  • The position/posture of the mobile device 200 referenced to the position/posture of the HMD 100 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • In step ST23, the hand recognition processing unit 133 recognizes a hand based on the image obtained by the camera 101 and estimates the position/posture and speed of the hand referenced to the position/posture of the HMD 100. Furthermore, the position and speed of each joint are estimated as the pose estimation. In this case, information such as the error of the hand position/posture estimation and the error of the estimation of each joint position is also estimated.
  • In step ST24, the main recognition integrated processing unit 134 of the HMD 100 sends an information transmission request to the mobile device 200, the other device, through the communication unit 104, and receives the information from the mobile device 200.
  • The information received in this way includes the information acquired by the self-position estimation processing unit 231, the other-person position estimation processing unit 232, and the hand recognition processing unit 233 of the mobile device 200:
  • that is, the position/posture of the mobile device 200 and its estimation error, the position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 and its estimation error,
  • and the position/posture and speed of the hand referenced to the position/posture of the mobile device 200 with their estimation error, together with the position and speed of each joint and their estimation errors.
  • The position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 constitutes relative relationship information between the positions/postures of the mobile device 200 and the HMD 100.
  • Next, in the main recognition integrated processing unit 134, the HMD 100 spatially and temporally synchronizes the hand recognition results (including the position/posture of the hand and the position of each joint) estimated by the HMD 100 itself and by the other device, the mobile device 200.
  • Specifically, in step ST31, the HMD 100 aligns the position/posture of the mobile device 200 estimated by the mobile device 200 itself with the position/posture of the mobile device 200 estimated with the position/posture of the HMD 100 as the reference.
  • As a result, the position/posture of the mobile device 200 is organized in the world coordinate system seen from the HMD 100.
  • The HMD 100 then converts the hand recognition result estimated with the position/posture of the mobile device 200 as the reference into the world coordinate system seen from the HMD 100, based on the aligned position/posture of the mobile device 200.
  • As a result, the hand position/posture estimated with the HMD 100 as the reference (including the position of each joint) and the hand position/posture estimated with the mobile device 200 as the reference (including the position of each joint) are organized in the world coordinate system seen from the HMD 100 and are thereby spatially synchronized.
  • The method is not limited to this; it is also conceivable to use the position/posture of the HMD 100 estimated with the position/posture of the mobile device 200 as the reference and to organize everything in the world coordinate system seen from the mobile device 200.
  • In the above, the position/posture of the mobile device 200 referenced to the position/posture of the HMD 100, or the position/posture of the HMD 100 referenced to the position/posture of the mobile device 200, is used as the relative relationship information between the HMD 100 and the mobile device 200.
  • However, observation information of the same environment or the same object may also be used as the relative relationship information between the positions/postures of the HMD 100 and the mobile device 200;
  • for example, observation of the same environment (initialization by a SLAM map),
  • or observation information of the same object (hand recognition, special markers, or anything else that can be observed and identified as the same object).
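  • As an illustration of the spatial synchronization described above, the following minimal sketch re-expresses a hand position estimated in the mobile device's frame in the world coordinate system seen from the HMD, using the mobile device pose estimated relative to the HMD. NumPy, the Z-Y-X yaw/pitch/roll convention, and the numeric poses are assumptions made for the example.

```python
# Minimal sketch of the spatial synchronization step using homogeneous transforms.
import numpy as np

def rot(yaw, pitch, roll):
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def to_matrix(position, yaw, pitch, roll):
    T = np.eye(4)
    T[:3, :3] = rot(yaw, pitch, roll)
    T[:3, 3] = position
    return T

# Pose of the HMD in the world frame (from SLAM) and pose of the mobile device
# relative to the HMD (from marker recognition) -- the relative relationship info.
T_world_hmd = to_matrix([0.0, 1.6, 0.0], yaw=0.0, pitch=0.0, roll=0.0)
T_hmd_mobile = to_matrix([0.0, -0.3, 0.5], yaw=np.pi, pitch=0.0, roll=0.0)

# Hand position observed in the mobile device frame (homogeneous coordinates).
p_hand_mobile = np.array([0.1, 0.0, 0.4, 1.0])

# Chain the transforms: world <- HMD <- mobile device <- hand.
p_hand_world = T_world_hmd @ T_hmd_mobile @ p_hand_mobile
print(p_hand_world[:3])  # hand position in the world frame seen from the HMD
```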
  • Next, the HMD 100 predicts the hand recognition results estimated by itself and by the other device at the current time. This prediction is based on information such as the hand speed, the joint speeds, and the observation times, and may be a linear interpolation or an interpolation by curve fitting or machine learning. As a result, the hand recognition result estimated with the HMD 100 as the reference and the hand recognition result estimated with the mobile device 200 as the reference are time-synchronized.
  • Because the hand recognition results estimated by the HMD 100 and the mobile device 200 are time-synchronized in this way, the observations of the HMD 100 and the mobile device 200 themselves do not need to be synchronized; however, the internal clocks of the HMD 100 and the mobile device 200 need to be aligned.
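  • A minimal sketch of this time synchronization by prediction is shown below: each joint position, observed at its own time stamp, is linearly extrapolated to a common current time using the estimated joint velocity. Linear prediction is one of the options named above; the numbers are illustrative.

```python
# Sketch of time synchronization by linear prediction from velocity and time stamps.
import numpy as np

def predict(position, velocity, observation_time, current_time):
    dt = current_time - observation_time
    return np.asarray(position) + np.asarray(velocity) * dt

# The same joint observed by the HMD at t = 10.00 s and by the mobile device at
# t = 10.03 s, both extrapolated to a common current time t = 10.05 s.
joint_hmd    = predict([0.10, 0.95, 0.40], [0.2, 0.0, -0.1], 10.00, 10.05)
joint_mobile = predict([0.11, 0.94, 0.41], [0.2, 0.0, -0.1], 10.03, 10.05)
print(joint_hmd, joint_mobile)  # both now refer to t = 10.05 s
```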
  • FIG. 8A shows, with a solid line, the hand recognition result estimated by the HMD 100 in the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line.
  • The position of each circle indicates the position of a joint, and the color of each circle indicates the observation error: points that are far from the camera, difficult to observe, and have large errors are shown in black, while points that are easy to observe and have small errors are shown in white.
  • FIG. 8B shows, with a solid line, the hand recognition result estimated by the mobile device 200 and converted into the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line.
  • Here too, the position of each circle indicates the position of a joint, and the color of each circle indicates the observation error, with the same convention as in FIG. 8A.
  • FIG. 8C shows the predictions of FIGS. 8A and 8B to the current time; that is, FIG. 8C shows the hand recognition results estimated by the HMD 100 and the mobile device 200 after spatial and temporal synchronization.
  • Note that, instead of the future prediction from the observation time to the current time, a conversion from the observation times to an earlier time may be used.
  • In that case, the observation times of the hand positions/postures (including the positions of each joint) estimated by the own device and the other device are aligned to the older time;
  • such interpolation can also be performed by storing time-series information.
  • In step ST26, the main recognition integrated processing unit 134 integrates the spatially and temporally synchronized hand recognition results of the own device and the other device (including the position/posture of the hand and the position of each joint).
  • Specifically, in step ST41, the HMD 100 continuously determines the identity of the hand recognition results that have the same ID (the same identifier).
  • That is, it verifies whether the identity is maintained based on the pose and the position/posture of the hands having the same ID.
  • This verification is calculated from the distance between the observed hand positions/postures and the distance between the poses; if the distance exceeds a certain threshold, it is judged that the identity is not maintained.
  • The pose distance can be calculated from the distances between the positions of corresponding joints and the differences in joint rotations.
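  • A minimal sketch of this identity check is given below. The specific metric (Euclidean distance of the hand positions plus the mean per-joint distance) and the thresholds are illustrative assumptions; the joint-rotation term mentioned above is omitted for brevity.

```python
# Sketch of the identity check used for ID separation/integration.
import numpy as np

def pose_distance(joints_a, joints_b):
    a, b = np.asarray(joints_a), np.asarray(joints_b)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))  # mean per-joint distance

def same_hand(pos_a, joints_a, pos_b, joints_b,
              pos_threshold=0.05, pose_threshold=0.03):
    pos_dist = float(np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)))
    return pos_dist <= pos_threshold and pose_distance(joints_a, joints_b) <= pose_threshold

# Two observations of what may be the same left hand.
joints_1 = [[0.10, 0.95, 0.40], [0.12, 0.96, 0.41]]
joints_2 = [[0.11, 0.95, 0.40], [0.13, 0.96, 0.41]]
if same_hand([0.10, 0.95, 0.40], joints_1, [0.11, 0.95, 0.40], joints_2):
    print("integrate IDs (same hand)")
else:
    print("separate IDs (different hands)")
```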
  • In step ST42, the HMD 100 refers to the determination in step ST41 and determines whether there is a hand recognition result for which the identity is not maintained and the ID needs to be separated.
  • If there is, the HMD 100 assigns a different ID to each such hand recognition result in step ST43 to separate the IDs, and then proceeds to the process of step ST44.
  • If there is not, the HMD 100 immediately proceeds to the process of step ST44.
  • In step ST44, the HMD 100 makes an integration judgment for hand recognition results having different IDs. This judgment is the opposite of the above-mentioned identity judgment for ID separation: when the distance between the hand positions/postures and between the poses is equal to or less than a certain threshold, the results are judged to be the same hand and are integrated.
  • In step ST45, the HMD 100 refers to the determination in step ST44 and determines whether there is a hand recognition result that requires ID integration.
  • If there is, the HMD 100 integrates the IDs in step ST46 and then proceeds to the process of step ST47.
  • If there is not, the HMD 100 immediately proceeds to the process of step ST47.
  • Note that the ID may be assigned to the hand recognition result based on an identification ID of each individual's hand. In that case, the ID is assigned at the time of hand recognition. This processing is considered effective even when a hand that was being tracked is lost and then reappears.
  • ID separation is necessary, for example, when observed hands appear to overlap depending on the camera position, are mistakenly recognized as one hand, and are later found to be two hands.
  • ID integration is necessary, for example, when the same hand is treated as if it were observed in different places in space due to a misestimated camera position/posture and is recognized as two hands, and is later observed at the correct position after the camera position/posture is corrected and recognized as the same hand.
  • FIG. 10 shows an example in which ID separation of the hand recognition result is required.
  • When two hands overlap as in FIG. 10A, they may be recognized as the same single hand by misidentification.
  • As in FIG. 10B, if the two hands can later be recognized as two separate hands, they need to be registered with different IDs.
  • FIG. 11 shows an example in which ID integration is required.
  • As in FIG. 11A, when the self position/posture of the camera is misestimated, the same hand may be recognized as hands in different places in space.
  • As in FIG. 11B, when the self position/posture is corrected and the correct position is recognized, the hands registered as separate hands turn out to be the same; in that case, the two hand recognition results need to be integrated as one hand.
  • Through the above processing, hand recognition results whose identity has been reconciled and synchronized are obtained.
  • Then, in step ST47, the HMD 100 integrates the synchronized hand recognition results judged to be identical.
  • First, the HMD 100 integrates the position/posture of the hand in the hand recognition results. This integration is performed, for example, by using an extended Kalman filter, a normal Kalman filter, or a particle filter, or, for example, by taking a weighted average or a simple average of the positions.
  • The estimation error of the hand position/posture can be used as an input to those filters or as a weight to improve the accuracy of the integration.
  • The HMD 100 also integrates the positions of each joint in the hand recognition results.
  • This integration is likewise performed using, for example, an extended Kalman filter, a normal Kalman filter, or a particle filter,
  • or, for example, by taking a weighted average or a simple average of the positions.
  • The estimation error of the position of each joint can be used as an input to those filters or as a weight to improve the accuracy of the integration.
  • Here, the integration processing is performed by first integrating the position/posture of the hand and then integrating the positions of each joint,
  • but the positions of the joints may also be integrated from the beginning.
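  • A minimal sketch of the weighted-average option is shown below, using the estimation errors as inverse-variance weights; a Kalman or particle filter could be used instead, as noted above. The inverse-variance weighting and the numeric values are illustrative assumptions.

```python
# Sketch of integration by error-weighted averaging of two position estimates.
import numpy as np

def integrate(estimate_hmd, error_hmd, estimate_mobile, error_mobile):
    # inverse-variance style weighting: smaller error -> larger weight
    w_hmd, w_mobile = 1.0 / error_hmd**2, 1.0 / error_mobile**2
    fused = (w_hmd * np.asarray(estimate_hmd)
             + w_mobile * np.asarray(estimate_mobile)) / (w_hmd + w_mobile)
    fused_error = (1.0 / (w_hmd + w_mobile)) ** 0.5
    return fused, fused_error

# A fingertip position observed by the HMD (larger error, seen from the back of
# the hand) and by the mobile device (smaller error, seen from the palm side).
fused, err = integrate([0.10, 0.95, 0.40], 0.04, [0.12, 0.95, 0.41], 0.01)
print(fused, err)
```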
  • After the processing of step ST26, the HMD 100 feeds back the integration result in step ST27.
  • This feedback includes feedback to the own device itself and feedback to the mobile device 200, the other device.
  • The feedback to the mobile device 200 is performed by transmitting the integration result to the mobile device 200 through the communication unit 104.
  • In the recognition process (sub), in step ST51, the self-position estimation processing unit 231 of the mobile device 200 estimates the position/posture of the mobile device 200 based on the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202, using an algorithm such as SLAM.
  • In this case, the error of the position/posture estimation is also estimated.
  • The other-person position estimation processing unit 232 of the mobile device 200 estimates the position/posture of the HMD 100, the other device, referenced to the position/posture of the mobile device 200, by a method such as object recognition or tracking based on the image obtained by the camera 201.
  • In this case, the error of the position/posture estimation is also estimated.
  • The position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • Furthermore, the hand recognition processing unit 233 recognizes a hand based on the image obtained by the camera 201 and estimates the position/posture and speed of the hand referenced to the position/posture of the mobile device 200, as well as the position and speed of each joint as the pose estimation. In this case, information such as the error of the hand position/posture estimation and the error of the estimation of each joint position is also estimated.
  • In step ST52, the sub-recognition integrated processing unit 234 of the mobile device 200 determines whether there is an information transmission request from the HMD 100, the other device.
  • If there is, the sub-recognition integrated processing unit 234 of the mobile device 200 transmits the information estimated in step ST51 to the HMD 100 through the communication unit 204 in step ST53, and then proceeds to step ST54.
  • If there is not, the mobile device 200 immediately proceeds to the process of step ST54.
  • In step ST54, the sub-recognition integrated processing unit 234 of the mobile device 200 determines whether an integration result has been received from the other device.
  • If it has, the sub-recognition integrated processing unit 234 of the mobile device 200 integrates the received information with its past estimated information in step ST55 and updates the estimated information used in hand recognition/hand pose estimation.
  • If it has not, the mobile device 200 ends the process; in this case, the past estimated information used in hand recognition/hand pose estimation is not updated.
  • Note that the order of the processes in the flowchart of FIG. 12 is not limited to this;
  • the processes of self-position estimation, other-person position estimation, and hand recognition/hand pose estimation, and the reception and transmission processes, may be performed in parallel.
  • In that case, the mobile device 200 transmits its latest estimation results to the HMD 100,
  • and integrates the received integration result with its latest estimation results.
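  • One way the sub device might fold the fed-back integration result into its own estimate is sketched below; the publication only states that the received result is integrated with the past estimated information, so the simple blend and its factor are assumptions for illustration.

```python
# Sketch of the sub-device update when the integration result is fed back:
# the local hand estimate is nudged toward the integrated result and reused
# as the prior for the next hand recognition cycle.
import numpy as np

def apply_feedback(local_estimate, integrated_result, blend=0.5):
    local = np.asarray(local_estimate, dtype=float)
    fused = np.asarray(integrated_result, dtype=float)
    return (1.0 - blend) * local + blend * fused

prior = apply_feedback([0.12, 0.95, 0.41], [0.11, 0.95, 0.405])
print(prior)  # used as the prior/initial value for the next recognition cycle
```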
  • As described above, the hand gesture is recognized based on the hand position/posture determined from the hand recognition result referenced to the position/posture of the HMD 100 and the hand recognition result referenced to the position/posture of the mobile device 200. This makes it possible to improve the accuracy of hand gesture recognition. Furthermore, in the present technology, the HMD 100 and the mobile device 200 can be moved freely.
  • In the above embodiment, the gesture recognition target is a hand,
  • but other objects that can be modeled can also be assumed in addition to a hand;
  • for example, objects premised on being rigid bodies, such as pens, markers, and boxes, and objects known to deform, such as books, faces, paper, human bodies, and cars, can be considered as variations.
  • In the above embodiment, the self position/posture estimation, the other-person position/posture estimation, and the hand recognition have all been described on the premise of inputs estimated in 3D space. This basically presupposes a sensor, such as a stereo camera, that can acquire such information. However, this premise can be relaxed.
  • For example, when each estimation device uses a monocular camera,
  • some estimations or estimation results are in an indefinite-scale state (a state in which the size of an object is not determined).
  • Even in that case, processing can be performed by simultaneously estimating the alignment (synchronization) of the camera poses and the adjustment of the scale when integrating the information of the specific example described above.
  • Also, if there is information from which the scale can be estimated, the scale can be estimated from that information, and other recognition results with an indefinite scale can then be estimated based on it; therefore, from this information as well, the processing of this technology can be applied to a system composed of monocular cameras.
  • Furthermore, the present technology can be extended by performing the integration processing based on the recognition results of a device that can obtain 3D information. Cooperation between such a device and monocular cameras is also conceivable; in this case, it is conceivable to interpolate the information of the monocular camera from the 3D information and the information of the device whose scale can be estimated.
  • In the above embodiment, cooperation between the HMD 100 and the mobile device 200 was described, but this technology can also be used, for example, for detecting the pose of a human body with a plurality of movable cameras.
  • In that case, each camera recognizes the others as other devices and, by integrating each other's observations, the system can be treated as a motion capture system composed of movable cameras.
  • In the above embodiment, the HMD 100 constitutes the first device and the information processing device, and the mobile device 200 constitutes the second device, but the present technology is not limited to this.
  • For example, the mobile device 200 may constitute the first device and the information processing device, and the HMD 100 may constitute the second device.
  • In that case, the AR display control is performed either by transmitting an AR display control signal based on the gesture recognition information from the mobile device 200 to the HMD 100 by communication,
  • or by transmitting the gesture recognition information from the mobile device 200 to the HMD 100 and performing the AR display control in the HMD 100 based on that gesture recognition information.
  • Further, while in the above the HMD 100 constitutes the first device and the mobile device 200 constitutes the second device, or the mobile device 200 constitutes the first device and the HMD 100 constitutes the second device,
  • another device, such as an external server connected to the mobile device 200 via a network, may constitute the information processing device.
  • In that case, the AR display control is performed either by transmitting an AR display control signal based on the gesture recognition information from the other device to the HMD 100 by communication,
  • or by transmitting the gesture recognition information from the other device to the HMD 100 and performing the AR display control in the HMD 100 based on that gesture recognition information.
  • The present technology can also have the following configurations.
  • (1) An information processing apparatus including: an information acquisition unit that acquires first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; an information receiving unit that receives, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and an information processing unit that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information of the gesture recognition target, and the second position/posture information of the gesture recognition target, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
  • (2) The information processing apparatus according to (1), wherein the information acquisition unit further acquires position/posture information of the second device, referenced to the position/posture of the first device, based on the sensor output of the first device, and the relative relationship information includes the position/posture information of the second device referenced to the position/posture of the first device.
  • (3) The information processing apparatus according to (2), wherein the information acquisition unit acquires the position/posture information of the second device based on recognition marker information displayed on the second device and contained in the sensor output of the first device.
  • (4) The information processing apparatus according to (1), wherein the information receiving unit further receives, from the second device, position/posture information of the first device referenced to the position/posture of the second device and acquired based on the sensor output of the second device, and the relative relationship information includes the position/posture information of the first device referenced to the position/posture of the second device.
  • (5) The information processing apparatus according to any one of (1) to (4), wherein the information processing unit spatially synchronizes the first position/posture of the gesture recognition target and the second position/posture of the gesture recognition target based on the relative relationship information between the positions/postures of the first device and the second device, time-synchronizes the first position/posture and the second position/posture of the gesture recognition target by a prediction process based on time stamp information added to the first position/posture information and the second position/posture information, and integrates the spatially and temporally synchronized first position/posture and second position/posture to determine the position/posture of the gesture recognition target.
  • (6) The information processing apparatus according to (5), wherein identification information is added to the first position/posture information of the gesture recognition target acquired by the information acquisition unit and to the second position/posture information of the gesture recognition target received by the information receiving unit, and the information processing unit integrates or separates the identification information based on each piece of position/posture information of the gesture recognition target and integrates the first position/posture and the second position/posture of the gesture recognition target associated with the same identification information.
  • (7) The information processing apparatus according to any one of (1) to (6), wherein the information receiving unit transmits an application start request to the second device, then transmits an information transmission request, and receives the second position/posture information of the gesture recognition target from the second device.
  • (8) The information processing apparatus according to (7), wherein the second device that has received the application start request updates the second position/posture information of the gesture recognition target at any time.
  • (9) The information processing apparatus according to any one of (1) to (8), wherein the first device is an augmented reality display device, and the apparatus further includes a display control unit that controls augmented reality display in the augmented reality display device based on the gesture recognition information.
  • (10) The information processing apparatus according to (9), wherein the gesture recognition target is located between the first device and the second device, and the augmented reality display is performed at a position corresponding to the second device.
  • (11) The information processing apparatus according to (10), wherein the first device is a head-mounted display having a transmissive display and the second device is a mobile device having a non-transmissive display.
  • (12) An information processing method including: a procedure of acquiring first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; a procedure of receiving, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and a procedure of determining the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information, and the second position/posture information, and recognizing the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
  • (13) A program that causes a computer to function as: an information acquisition means for acquiring first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; an information receiving means for receiving, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and an information processing means that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information, and the second position/posture information, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

According to the present invention, the accuracy of gesture recognition of a gesture recognition target is improved. First position and orientation information about the gesture recognition target is acquired on the basis of sensor outputs of a first apparatus with the position and orientation of the first apparatus as a reference. Second position and orientation information about the gesture recognition target is received from a second apparatus with the position and orientation of the second apparatus, which are acquired on the basis of sensor outputs of the second apparatus, as a reference. The position and orientation of the gesture recognition target are determined on the basis of relative relationship information about the positions and orientations of the first and second apparatuses, the first position and orientation information and the second position and orientation information about the gesture recognition target, and a gesture of the gesture recognition target is recognized on the basis of the determined position and orientation of the gesture recognition target.

Description

Information processing device, information processing method, and program
 This technology relates to an information processing device, an information processing method, and a program.
 Conventionally, for example, Patent Document 1 describes a user interface device that recognizes gestures of a user's hand using a camera attached to the ceiling. Because this user interface device uses a camera fixed to the ceiling, the accuracy of gesture recognition may decrease depending on the position of the user or the orientation of the hand.
Japanese Unexamined Patent Publication No. 2017-211960
 The purpose of this technology is to improve the accuracy of gesture recognition of a gesture recognition target.
 The concept of this technology is an information processing device including: an information acquisition unit that acquires first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; an information receiving unit that receives, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and an information processing unit that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information of the gesture recognition target, and the second position/posture information of the gesture recognition target, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
In the present technology, the information acquisition unit acquires the first position and orientation information of the gesture recognition target, referenced to the position and orientation of the first device, on the basis of the sensor output of the first device. In addition, the information receiving unit receives, from the second device, the second position and orientation information of the gesture recognition target, referenced to the position and orientation of the second device and acquired on the basis of the sensor output of the second device.
Further, the information processing unit determines the position and orientation of the gesture recognition target on the basis of the relative relationship information on the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target. Then, the information processing unit recognizes the gesture of the gesture recognition target on the basis of the determined position and orientation of the gesture recognition target.
For example, the information acquisition unit may further acquire position and orientation information of the second device referenced to the position and orientation of the first device on the basis of the sensor output of the first device, and the relative relationship information may include the position and orientation information of the second device referenced to the position and orientation of the first device. In this case, for example, the information acquisition unit may acquire the position and orientation information of the second device on the basis of recognition marker information displayed on the second device and contained in the sensor output of the first device. Further, for example, the information receiving unit may further receive, from the second device, position and orientation information of the first device referenced to the position and orientation of the second device and acquired on the basis of the sensor output of the second device, and the relative relationship information may include the position and orientation information of the first device referenced to the position and orientation of the second device.
Further, for example, the information processing unit may spatially synchronize the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target on the basis of the relative relationship information on the positions and orientations of the first device and the second device, temporally synchronize the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target by prediction processing on the basis of time stamp information added to each of the first position and orientation information and the second position and orientation information of the gesture recognition target, and integrate the spatially and temporally synchronized first position and orientation and second position and orientation of the gesture recognition target to determine the position and orientation of the gesture recognition target.
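As a rough illustration only, the following is a minimal Python sketch of this flow, reduced to hand positions: the second observation is brought into the first device's frame, both observations are linearly predicted to a common time, and the results are averaged with inverse-variance weights. The function and field names are hypothetical and do not appear in the embodiment, and the arithmetic is deliberately simplified.

    import numpy as np

    def synchronize_and_integrate(obs_first, obs_second, rel_rotation, rel_translation, t_now):
        # obs_*: dicts with 'position' (3,), 'velocity' (3,), 'timestamp' (float), 'error' (variance).
        # rel_rotation (3x3), rel_translation (3,): pose of the second device in the first device's frame.

        # Spatial synchronization: express the second observation in the first device's frame.
        p2 = np.asarray(rel_rotation) @ np.asarray(obs_second['position']) + np.asarray(rel_translation)
        v2 = np.asarray(rel_rotation) @ np.asarray(obs_second['velocity'])

        # Temporal synchronization: linear prediction of both observations to the common time t_now.
        p1_now = np.asarray(obs_first['position']) + np.asarray(obs_first['velocity']) * (t_now - obs_first['timestamp'])
        p2_now = p2 + v2 * (t_now - obs_second['timestamp'])

        # Integration: inverse-variance weighted average of the two synchronized estimates.
        w1, w2 = 1.0 / obs_first['error'], 1.0 / obs_second['error']
        return (w1 * p1_now + w2 * p2_now) / (w1 + w2)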
In this case, for example, identification information may be assigned to each of the first position and orientation information of the gesture recognition target acquired by the information acquisition unit and the second position and orientation information of the gesture recognition target received by the information receiving unit, and the information processing unit may integrate or separate the identification information on the basis of each piece of position and orientation information of the gesture recognition target and integrate the first position and orientation and the second position and orientation of the gesture recognition target associated with the same identification information.
Further, for example, the information receiving unit may transmit an application activation request to the second device, then transmit an information transmission request, and receive the second position and orientation information of the gesture recognition target from the second device. In this case, for example, the second device that has received the application activation request may update the second position and orientation information of the gesture recognition target as needed.
Further, for example, the first device may be an augmented reality display device, and a display control unit that controls the augmented reality display on the augmented reality display device on the basis of the gesture recognition information may further be provided. In this case, for example, the gesture recognition target may be located between the first device and the second device, and the augmented reality display may be performed at a position corresponding to the second device. Further, in this case, for example, the first device may be a head mounted display having a transmissive display, and the second device may be a mobile device having a non-transmissive display.
As described above, in the present technology, a gesture of the gesture recognition target is recognized on the basis of the position and orientation of the gesture recognition target determined on the basis of the first position and orientation information of the gesture recognition target referenced to the position and orientation of the first device and the second position and orientation information of the gesture recognition target referenced to the position and orientation of the second device. Therefore, it is possible to improve the accuracy of gesture recognition of the gesture recognition target. Further, in the present technology, the first device and the second device can move freely.
FIG. 1 is a block diagram showing a configuration example of an AR display system as an embodiment.
FIG. 2 is a diagram showing an example of camera images of the HMD and the mobile device.
FIG. 3 is a block diagram showing a configuration example of the HMD.
FIG. 4 is a block diagram showing a configuration example of the mobile device.
FIG. 5 is a flowchart for explaining an outline of the processing of the HMD and the mobile device in the AR display system.
FIG. 6 is a flowchart for explaining the recognition processing (main) in the HMD.
FIG. 7 is a flowchart for explaining the details of the synchronization processing.
FIG. 8 is a diagram for explaining the spatial and temporal synchronization processing of the hand recognition results of the HMD and the mobile device.
FIG. 9 is a flowchart for explaining the details of the integration processing.
FIG. 10 is a diagram for explaining an example in which ID separation of hand recognition results is necessary.
FIG. 11 is a diagram for explaining an example in which ID integration of hand recognition results is necessary.
FIG. 12 is a flowchart for explaining the recognition processing (sub) in the mobile device.
Hereinafter, modes for carrying out the invention (hereinafter referred to as "embodiments") will be described. The description will be given in the following order.
1. Embodiment
2. Modification example
<1. Embodiment>
"AR display system"
FIG. 1 shows a configuration example of an AR (Augmented Reality) display system 10 as an embodiment. The AR display system 10 includes an HMD (Head Mounted Display) 100 having a transmissive display as an AR display device, and a mobile device 200 having a non-transmissive display, such as a smartphone or a tablet. In this embodiment, the HMD 100 constitutes a first device and an information processing device, and the mobile device 200 constitutes a second device.
The HMD 100 is worn on the head of a user 300 so that the transmissive display is located at the eye position. The mobile device 200 is held in the right hand of the user 300. The HMD 100 recognizes the mobile device 200 on the basis of the output of a sensor such as a camera, and displays a virtual book 400 as an AR display (AR superimposed object) at a position corresponding to the mobile device 200, in the illustrated example so as to be superimposed on the mobile device 200.
The HMD 100 repeatedly acquires, on the basis of the output of a sensor such as a camera, its own position and orientation information, position and orientation information of the mobile device 200 referenced to its own position and orientation, and position and orientation information of the left hand of the user 300 as a gesture recognition target. Similarly, the mobile device 200 repeatedly acquires, on the basis of the output of a sensor such as a camera, its own position and orientation information, position and orientation information of the HMD 100 referenced to its own position and orientation, and position and orientation information of the left hand of the user 300 as a gesture recognition target.
The HMD 100 receives from the mobile device 200 the information acquired by the mobile device 200. Then, on the basis of the information acquired by itself and the information received from the mobile device 200, the HMD 100 spatially and temporally synchronizes the position and orientation information of the left hand of the user 300 acquired by the HMD 100 and by the mobile device 200, and then integrates them to determine the position and orientation of the left hand of the user 300.
The HMD 100 recognizes a gesture of the left hand of the user 300 on the basis of the position and orientation of the left hand determined as described above, or further on the basis of its temporal change, and controls the AR display on the basis of the recognition information of this gesture. For example, when the gesture of the left hand of the user 300 is a motion of turning a page of the virtual book 400, the display of the book 400 is changed so that the page is turned.
In this way, the gesture (pose) of the left hand of the user 300 is recognized on the basis of the result of integrating the position and orientation information of the left hand referenced to the position and orientation of the HMD 100 and the position and orientation information of the left hand referenced to the position and orientation of the mobile device 200. Therefore, it is possible to improve the accuracy of gesture recognition of the left hand of the user 300.
For example, consider a case where the gesture of the left hand of the user 300 is a motion of turning a page of the virtual book 400. FIG. 2(a) shows an image of the camera mounted on the HMD 100. In this camera image of the HMD 100, the left hand of the user 300 is captured from the back of the hand, so fine finger poses cannot be observed. FIG. 2(b) shows an image of the camera mounted on the mobile device 200. In this camera image of the mobile device 200, the left hand of the user 300 is captured from the palm side, so fine finger poses can also be observed.
Therefore, by recognizing the gesture of the left hand of the user 300 on the basis of the result of integrating the position and orientation information of the left hand obtained by the HMD 100 and the mobile device 200, it is possible to improve the recognition accuracy, and recognition that is difficult with observation from a single camera image becomes possible. Further, in the AR display system 10, the HMD 100 and the mobile device 200 can move freely.
"Configuration example of HMD"
FIG. 3 shows a configuration example of the HMD 100. The HMD 100 includes a camera 101, an IMU (Inertial Measurement Unit) 102, an information processing unit 103, a communication unit 104, a transmissive display 105, and an application/recognition information storage 106.
The camera 101 includes a lens and an image sensor such as a CCD image sensor or a CMOS image sensor. For example, two cameras 101 are provided on the outer surface of the front portion of the HMD 100, and capture an object (subject) existing ahead in the user's line-of-sight direction. The IMU 102 acquires acceleration and angular acceleration information of the HMD 100.
The information processing unit 103 includes a CPU (Central Processing Unit) and the like, and performs various kinds of processing on the basis of various programs stored in a storage unit (not shown). The information processing unit 103 includes a self-position estimation processing unit 131, an other-device position estimation processing unit 132, a hand recognition processing unit 133, a main recognition integration processing unit 134, and an image generation/application processing unit 135.
The self-position estimation processing unit 131 uses an algorithm such as SLAM (Simultaneous Localization and Mapping) to estimate the position and orientation of the HMD 100 on the basis of the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the self-position estimation processing unit 131 can also estimate information such as the error of the position and orientation estimation.
The other-device position estimation processing unit 132 estimates the position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100 by techniques such as object recognition (for example, marker recognition) and tracking, on the basis of the image obtained by the camera 101. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the other-device position estimation processing unit 132 can also estimate information such as the error of the position and orientation estimation. The position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100 constitutes relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200.
The hand recognition processing unit 133 recognizes a hand on the basis of the image obtained by the camera 101, and estimates the position, orientation, and velocity of the hand referenced to the position and orientation of the HMD 100. The hand recognition processing unit 133 also estimates the pose, for example the position and velocity of each joint. The hand position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system), and the position information of each joint includes, for example, three-dimensional information (x, y, z in an orthogonal coordinate system). Note that, as pose estimation, the angle of each finger may be estimated instead of the position of each joint. In the following description, an example of estimating the position of each joint as pose estimation will be described.
In this case, the hand recognition processing unit 133 can also estimate information such as the error of the hand position and orientation estimation and the error of the position estimation of each joint. The velocity of the hand and the velocity of each joint can be estimated from temporal changes in their positions, and the like.
Note that the hand recognition result here consists of the position and orientation of the hand and the position of each joint, but other information may be used as long as the hand pose can be restored from it. For example, information such as the rotation of each joint in relative coordinates, the rotation in absolute coordinates, or the position of each joint in world coordinates is also conceivable.
The self-position estimation processing unit 131, the other-device position estimation processing unit 132, and the hand recognition processing unit 133 perform processing as needed and update their information. The information obtained from these processing units is summarized below. The time at which the information was acquired (observation time) is added to the information as a time stamp.
(1) Observation time
(2) Position, orientation, and velocity of the hand, and estimation error
(3) Position and velocity of each joint, and estimation error
(4) Position and orientation of the own device (HMD 100), and estimation error
(5) Position and orientation of the other device (mobile device 200), and estimation error
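The items (1) to (5) above can be thought of as one time-stamped observation record; the same structure applies on the mobile device 200 side with the roles of (4) and (5) swapped. The following is a minimal Python sketch of such a record, where the class and field names are hypothetical and only illustrate the structure, not an actual interface of the device.

    from dataclasses import dataclass
    from typing import List, Tuple

    Vec3 = Tuple[float, float, float]
    Pose6D = Tuple[float, float, float, float, float, float]  # x, y, z, pitch, yaw, roll

    @dataclass
    class HandObservation:
        timestamp: float                 # (1) observation time
        hand_pose: Pose6D                # (2) position and orientation of the hand
        hand_velocity: Pose6D            #     velocity of the hand
        hand_pose_error: float           #     estimation error
        joint_positions: List[Vec3]      # (3) position of each joint
        joint_velocities: List[Vec3]     #     velocity of each joint
        joint_errors: List[float]        #     estimation error of each joint
        self_pose: Pose6D                # (4) position and orientation of the own device (HMD 100)
        self_pose_error: float
        other_pose: Pose6D               # (5) position and orientation of the other device (mobile device 200)
        other_pose_error: float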
The main recognition integration processing unit 134 takes the information obtained as needed by the self-position estimation processing unit 131, the other-device position estimation processing unit 132, and the hand recognition processing unit 133, acquires the same kind of information from the mobile device 200 through the communication unit 104, and integrates the hand recognition results (hand position and orientation, position of each joint) of the left hand of the user 300 acquired by the HMD 100 and the mobile device 200 after synchronizing them spatially and temporally. The processing of the main recognition integration processing unit 134 will be described further later.
The image generation/application processing unit 135 performs the processing necessary for the operation of the application, and performs rendering processing for displaying the virtual book 400 as an AR display. In addition, the image generation/application processing unit 135 receives the integration result of the main recognition integration processing unit 134, recognizes the gesture of the left hand of the user 300 on the basis of the hand recognition result of the left hand, and performs interaction processing that controls the AR display on the basis of the recognition information of this gesture. For example, when the gesture of the left hand of the user 300 is turning a page of the book 400, processing for turning the page of the book 400 is performed.
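For illustration only, the interaction processing could look roughly like the following sketch, where a page-turn gesture is inferred from the integrated hand state and the book state is updated. The gesture criterion (a sideways swipe above a speed threshold), the state layout, and all names are hypothetical and not taken from the embodiment.

    def interaction_step(book_state, hand_state, swipe_speed_threshold=0.5):
        # hand_state: integrated result with 'velocity' (x, y, z) in the HMD world frame.
        vx = hand_state['velocity'][0]
        if vx > swipe_speed_threshold:
            book_state['page'] = max(0, book_state['page'] - 1)   # turn back one page
        elif vx < -swipe_speed_threshold:
            book_state['page'] = book_state['page'] + 1           # turn forward one page
        return book_state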
The communication unit 104 communicates with the mobile device 200 wirelessly (for example, Wi-Fi (Wireless Fidelity) or Li-Fi (Light Fidelity)) or by wire. The transmissive display 105 performs AR display on the basis of the image data supplied from the image generation/application processing unit 135.
The application/recognition information storage 106 holds information necessary for the application. It also holds information necessary for recognition and the like. Examples of the information held for recognition are a localization map for SLAM, information for marker recognition, and information for hand recognition. In the example of FIG. 3, the application/recognition information storage 106 is connected only to the main recognition integration processing unit 134 and the image generation/application processing unit 135, but it may be connected to other processing units.
"Configuration example of mobile device"
FIG. 4 shows a configuration example of the mobile device 200. The mobile device 200 includes a camera 201, an IMU 202, an information processing unit 203, a communication unit 204, a non-transmissive display 205, and a recognition information storage 206.
The camera 201 includes a lens and an image sensor such as a CCD image sensor or a CMOS image sensor. The camera 201 is a stereo camera, is provided on the display surface side of the mobile device 200, and captures an object (subject) existing on the display surface side. The IMU 202 acquires acceleration and angular acceleration information of the mobile device 200.
The information processing unit 203 includes a CPU (Central Processing Unit) and the like, and performs various kinds of processing on the basis of various programs stored in a storage unit (not shown). The information processing unit 203 includes a self-position estimation processing unit 231, an other-device position estimation processing unit 232, a hand recognition processing unit 233, a sub recognition integration processing unit 234, and a recognition marker image generation unit 235.
The self-position estimation processing unit 231 uses an algorithm such as SLAM to estimate the position and orientation of the mobile device 200 on the basis of the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the self-position estimation processing unit 231 can also estimate information such as the error of the position and orientation estimation.
The other-device position estimation processing unit 232 estimates the position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 by techniques such as object recognition and tracking, on the basis of the image obtained by the camera 201. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the other-device position estimation processing unit 232 can also estimate information such as the error of the position and orientation estimation. The position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 constitutes relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200.
The hand recognition processing unit 233 recognizes a hand on the basis of the image obtained by the camera 201, and estimates the position, orientation, and velocity of the hand referenced to the position and orientation of the mobile device 200. The hand recognition processing unit 233 also estimates the pose, for example the position and velocity of each joint. The hand position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system), and the position information of each joint includes, for example, three-dimensional information (x, y, z in an orthogonal coordinate system).
In this case, the hand recognition processing unit 233 can also estimate information such as the error of the hand position and orientation estimation and the error of the position estimation of each joint. The velocity of the hand and the velocity of each joint can be estimated from temporal changes in their positions, and the like.
The self-position estimation processing unit 231, the other-device position estimation processing unit 232, and the hand recognition processing unit 233 perform processing as needed and update their information. The information obtained from these processing units is summarized below. The time at which the information was acquired (observation time) is added to the information as a time stamp.
(1) Observation time
(2) Position, orientation, and velocity of the hand, and estimation error
(3) Position and velocity of each joint, and estimation error
(4) Position and orientation of the own device (mobile device 200), and estimation error
(5) Position and orientation of the other device (HMD 100), and estimation error
The sub recognition integration processing unit 234 transmits the information obtained as needed by the self-position estimation processing unit 231, the other-device position estimation processing unit 232, and the hand recognition processing unit 233 to the HMD 100 through the communication unit 204, in response to an information transmission request sent from the HMD 100 through the communication unit 204. The processing of the sub recognition integration processing unit 234 will be described further later.
The recognition marker image generation unit 235 acquires the image data of the recognition marker from the recognition information storage 206 and supplies it to the non-transmissive display 205 to display the recognition marker. The recognition marker is displayed in response to an instruction from the sub recognition integration processing unit 234, based on the application activation request received from the HMD 100 via the communication unit 204.
The communication unit 204 communicates with the HMD 100 wirelessly or by wire. The non-transmissive display 205 displays the recognition marker on the basis of the image data supplied from the recognition marker image generation unit 235.
The recognition information storage 206 holds information necessary for recognition and the like. Examples of the information held for recognition are the image data of the recognition marker described above, as well as a localization map for SLAM and information for hand recognition. In the example of FIG. 4, the recognition information storage 206 is connected only to the sub recognition integration processing unit 234 and the recognition marker image generation unit 235, but it may be connected to other processing units.
"Processing of HMD and mobile device in AR display system"
The outline of the processing of the HMD 100 and the mobile device 200 in the AR display system 10 shown in FIG. 1 will be described with reference to the flowchart of FIG. 5.
In step ST1, the HMD 100 starts the application. Next, in step ST2, the HMD 100 requests the mobile device 200, which is the sub device, to start the application. In response to the application activation request from the HMD 100, the mobile device 200 starts the application in step ST11 and displays the recognition marker on the non-transmissive display 205. The mobile device 200 then performs the recognition processing (sub) in step ST12. The details of this recognition processing (sub) will be described later.
After the processing of step ST2, the HMD 100 performs recognition processing of the mobile device 200, the sub device, in step ST3. In this case, the HMD 100 estimates the position and orientation of the mobile device 200 on the basis of the recognition marker displayed on the non-transmissive display 205 of the mobile device 200.
After the position and orientation are estimated in step ST3, in step ST4 the HMD 100 starts rendering, in which the virtual book 400 is superimposed and displayed at the position of the mobile device 200 as an AR display, and interaction, which is control of the AR display based on the recognition information of the gesture of the left hand of the user 300. The HMD 100 then performs the recognition processing (main) in step ST5. The details of this recognition processing (main) will be described later.
In parallel with the recognition processing (main) in step ST5, in step ST6 the HMD 100 updates the state of the virtual book 400 as necessary on the basis of the recognition information of the gesture of the left hand of the user 300. For example, when the gesture recognition information indicates that a page of the book 400 is being turned, the state is updated so that the page of the book 400 is turned.
After that, when the end of the application is instructed, for example, by an operation of the user 300, the HMD 100 performs application end processing in step ST7. In this processing, an end signal is transmitted to the mobile device 200, the sub device, and the recognition processing (main) is ended. The mobile device 200 receives the end signal from the HMD 100 and performs application end processing in step ST13. In this processing, the recognition processing (sub) is ended and the marker display is stopped.
"Recognition processing (main)"
With reference to the flowchart of FIG. 6, the recognition processing (main) in the HMD 100, that is, the details of the processing of step ST5 in the flowchart of FIG. 5, will be described. The HMD 100 repeatedly executes the processing of the flowchart of FIG. 6.
First, in step ST21, the HMD 100 uses, in the self-position estimation processing unit 131, an algorithm such as SLAM to estimate the position and orientation of the HMD 100 itself on the basis of the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102. In this case, the error of the position and orientation estimation is also estimated.
Next, in step ST22, the HMD 100 estimates, in the other-device position estimation processing unit 132, the position and orientation of the mobile device 200, the other device, referenced to the position and orientation of the HMD 100 by techniques such as object recognition (for example, marker recognition) and tracking on the basis of the image obtained by the camera 101. In this case, the error of the position and orientation estimation is also estimated. The position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100 constitutes relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200.
Next, in step ST23, the HMD 100 recognizes, in the hand recognition processing unit 133, a hand on the basis of the image obtained by the camera 101, estimates the position, orientation, and velocity of the hand referenced to the position and orientation of the HMD 100, and further estimates the position and velocity of each joint as pose estimation. In this case, information such as the error of the hand position and orientation estimation and the error of the position estimation of each joint is also estimated.
Next, in step ST24, the HMD 100 sends, in the main recognition integration processing unit 134, an information transmission request to the mobile device 200, the other device, through the communication unit 104, and receives the information from the mobile device 200. The information received in this way includes the information acquired in the mobile device 200 by the self-position estimation processing unit 231, the other-device position estimation processing unit 232, and the hand recognition processing unit 233.
That is, the received information includes the position and orientation of the mobile device 200 and its estimation error, the position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 and its estimation error, the position, orientation, and velocity of the hand referenced to the position and orientation of the mobile device 200 and their estimation error, and the position and velocity of each joint and their estimation error. The position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 constitutes relative relationship information on the positions and orientations of the mobile device 200 and the HMD 100.
Next, in step ST25, the HMD 100, in the main recognition integration processing unit 134, spatially and temporally synchronizes (initializes) the hand recognition results (including the position and orientation of the hand and the position of each joint) estimated by the HMD 100 itself and by the mobile device 200, the other device.
The flowchart of FIG. 7 shows the synchronization processing in step ST25 in more detail. In step ST31, the HMD 100 aligns the position and orientation of the mobile device 200 estimated by the mobile device 200 itself with the position and orientation of the mobile device 200 estimated with the position and orientation of the HMD 100 as a reference. As a result, the position and orientation of the mobile device 200 are unified in the world coordinate system seen from the HMD 100.
Next, in step ST32, on the basis of the aligned position and orientation of the mobile device 200, the HMD 100 transforms the hand recognition result estimated with the position and orientation of the mobile device 200 as a reference into the world coordinate system seen from the HMD 100. As a result, the hand position and orientation estimated with the HMD 100 as a reference (including the position of each joint) and the hand position and orientation estimated with the mobile device 200 as a reference (including the position of each joint) are unified in the world coordinate system seen from the HMD 100 and become spatially synchronized.
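A minimal sketch of this spatial alignment, assuming the pose is represented as a rotation matrix and a translation vector and limiting the hand recognition result to joint positions, is shown below; the function and variable names are hypothetical.

    import numpy as np

    def transform_hand_to_hmd_world(R_hmd_mobile, t_hmd_mobile, joints_in_mobile):
        # R_hmd_mobile (3x3), t_hmd_mobile (3,): position and orientation of the mobile device
        # expressed in the world coordinate system seen from the HMD (step ST31).
        R = np.asarray(R_hmd_mobile)
        t = np.asarray(t_hmd_mobile)
        joints = np.asarray(joints_in_mobile)  # (N, 3) joint positions referenced to the mobile device
        # Step ST32: bring the hand recognition result into the world coordinate system seen from the HMD.
        return (R @ joints.T).T + t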
In the above description, an example has been shown in which the position and orientation of the mobile device 200 estimated with the position and orientation of the HMD 100 as a reference are used to unify the results in the world coordinate system seen from the HMD 100. However, the present technology is not limited to this, and it is also conceivable to use the position and orientation of the HMD 100 estimated with the position and orientation of the mobile device 200 as a reference and to unify the results in the world coordinate system seen from the mobile device 200.
It has also been explained above that the position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100, or the position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200, is used as the relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200. It is also conceivable to use observation information of an identical object as the relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200. Examples are observation of the same environment (initialization using a SLAM map) and observation information of an identical object that can be observed in common (hand recognition, a special marker, or another object or process for which it can be confirmed that the observed object is identical).
Next, in step ST33, the HMD 100 predicts, at the current time, the hand recognition results estimated by itself and by the other device. This prediction is performed on the basis of information such as the velocity of the hand, the velocity of each joint, and the observation time. The prediction may be prediction by linear interpolation, or interpolation by curve fitting or machine learning. As a result, the hand recognition result estimated with the HMD 100 as a reference and the hand recognition result estimated with the mobile device 200 as a reference become temporally synchronized.
As described above, in the present technology, the hand recognition results estimated by the HMD 100 and the mobile device 200 are temporally synchronized, so synchronization of the observations of the HMD 100 and the mobile device 200 is not required; however, alignment of the internal clocks of the HMD 100 and the mobile device 200 is required.
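In the simplest linear case, the prediction of step ST33 could be sketched as follows; the names are hypothetical, and curve interpolation or a learned model would replace the linear term in more elaborate variants.

    import numpy as np

    def predict_to_time(joint_positions, joint_velocities, t_observed, t_now):
        # Linear prediction of each joint position from its observed velocity,
        # moving the observation from t_observed to the common time t_now.
        dt = t_now - t_observed
        return np.asarray(joint_positions) + np.asarray(joint_velocities) * dt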
FIG. 8(a) shows, with a solid line, the hand recognition result estimated by the HMD 100 in the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line. In the illustrated example, the position of each circle indicates the position of a joint, and the color of each circle indicates the observation error. Points that are far from the camera, difficult to observe, and have a large error are shown in black; conversely, points that are easy to observe and have a small error are shown in white.
FIG. 8(b) shows, with a solid line, the hand recognition result estimated by the mobile device 200 and transformed into the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line. In the illustrated example, the position of each circle indicates the position of a joint, and the color of each circle indicates the observation error. Points that are far from the camera, difficult to observe, and have a large error are shown in black; conversely, points that are easy to observe and have a small error are shown in white.
FIG. 8(c) shows the predictions to the current time of FIGS. 8(a) and 8(b) superimposed. That is, FIG. 8(c) shows the hand recognition results estimated by the HMD 100 and the mobile device 200 after spatial and temporal synchronization.
Note that instead of future prediction from the observation time to the current time, conversion from the observation time to an earlier time may be used. In this case, the hand positions and orientations (including the positions of the joints) estimated by the own device and the other device are aligned to the older of the observation times. In that case, interpolation can also be performed by storing time-series information.
Returning to the description of the flowchart of FIG. 6, after the processing of step ST25, in step ST26 the HMD 100, in the main recognition integration processing unit 134, integrates the spatially and temporally synchronized hand recognition results (including the position and orientation of the hand and the position of each joint) of the own device and the other device.
The flowchart of FIG. 9 shows the integration processing in step ST26 in more detail. In step ST41, the HMD 100 determines whether the identity of the hand recognition results having the same ID (same identifier) is maintained.
In this case, whether the identity is maintained is verified on the basis of the poses and the positions and orientations of the hands having the same ID. This verification is computed from the distance between the observed hand positions and orientations and the distance between the poses; if these exceed a certain threshold, it is determined that the identity is not maintained. The pose distance can be calculated from the distances between the positions of the joints or from the differences in joint rotations.
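A minimal sketch of this identity check, assuming a joint-distance-based pose distance and hypothetical names and thresholds, is shown below.

    import numpy as np

    def identity_maintained(result_a, result_b, position_threshold=0.05, pose_threshold=0.03):
        # result_*: dicts with 'position' (3,) of the hand and 'joints' (N, 3), already synchronized.
        pa, pb = np.asarray(result_a['position']), np.asarray(result_b['position'])
        ja, jb = np.asarray(result_a['joints']), np.asarray(result_b['joints'])
        position_distance = np.linalg.norm(pa - pb)
        pose_distance = np.mean(np.linalg.norm(ja - jb, axis=1))
        # Identity is regarded as maintained only while both distances stay below their thresholds.
        return position_distance < position_threshold and pose_distance < pose_threshold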
Next, in step ST42, the HMD 100 refers to the determination in step ST41 and determines whether there is a hand recognition result for which the identity is not maintained and ID separation is necessary. When it is determined that there is a hand recognition result that requires ID separation, the HMD 100 assigns a separate ID to each hand recognition result in step ST43 to separate the IDs, and then proceeds to the processing of step ST44. On the other hand, when it is determined in step ST42 that there is no hand recognition result that requires ID separation, the HMD 100 immediately proceeds to the processing of step ST44.
In step ST44, the HMD 100 determines whether hand recognition results having different IDs should be integrated. This determination is the reverse of the identity determination for ID separation described above: when the distance between the hand positions and orientations and the distance between the poses are equal to or less than a certain threshold, the results are determined to be the same hand and to be integrated.
Next, in step ST45, the HMD 100 refers to the determination in step ST44 and determines whether there is a hand recognition result that requires ID integration. When it is determined that there is a hand recognition result that requires ID integration, the HMD 100 integrates the IDs in step ST46, and then proceeds to the processing of step ST47. On the other hand, when it is determined in step ST45 that there is no hand recognition result that requires ID integration, the HMD 100 immediately proceeds to the processing of step ST47.
When a hand without an ID is observed (a case where, apart from the tracked hand, a hand is suddenly recognized at a distant position, when the first hand recognition processing starts running, or when a new hand appears within the angle of view of the camera), an unused, unique ID is assigned to the hand recognition result of such an unknown hand.
Further, IDs may be assigned to hand recognition results on the basis of the identification ID of each individual's hand. In that case, the ID is assigned at the time of hand recognition. This processing is considered to be effective also when a tracked hand is lost and then reappears.
ID separation is necessary when observed hands appear to overlap depending on the camera position and are erroneously recognized as one hand, and it later turns out that there are two hands. Conversely, ID integration is necessary in a case where, due to erroneous pose recognition, the same hand is treated as having been observed at different places in space and is recognized as two hands, and thereafter the position and orientation of the camera are corrected to the right position, it is recognized that the hands are the same, and they are integrated.
FIG. 10 shows an example in which ID separation of hand recognition results is necessary. In the state where two hands overlap as in FIG. 10(a), they may be erroneously recognized as the same single hand. However, as shown in FIG. 10(b), when subsequent recognition makes it possible to recognize that they are two separate hands, they need to be registered with separate IDs.
FIG. 11 shows an example in which ID integration is necessary. As shown in FIG. 11(a), even if the same hand is recognized while the self-position and orientation estimation of the camera is erroneous, it is recognized as hands in different spaces. However, as shown in FIG. 11(b), when the self-position and orientation are corrected and recognized at the right position, it turns out that the hands registered as separate hands are the same. In that case, the two hand recognition results need to be integrated as one hand.
 ステップST47の処理の時点では、同一性と同期が取れた手認識結果が得られる。このステップST47において、HMD100は、同期・同一性がとれた手認識結果を統合する。まず、HMD100は、手認識結果のうち、手の位置姿勢の統合を行う。この統合は、例えば、拡張カルマンフィルタ(Extended Kalman Filter)、あるいは通常のカルマンフィルタ(Kalman Filter)やパーティクルフィルタ(Particle Filter)を用いて行われる。あるいは、この統合は、例えば、重み付き平均や、単純な位置の平均を求めることで行われる。手の位置姿勢の推定誤差は、それらフィルタの入力として、または重みとして使用が可能であり、統合の精度を高めることができる。 At the time of processing in step ST47, a hand recognition result synchronized with the identity can be obtained. In this step ST47, the HMD 100 integrates synchronized and identical hand recognition results. First, the HMD 100 integrates the position and posture of the hand among the hand recognition results. This integration is performed, for example, by using an extended Kalman filter, a normal Kalman filter, or a particle filter. Alternatively, this integration is done, for example, by finding a weighted average or a simple position average. The hand position / orientation estimation error can be used as an input for those filters or as a weight to improve the accuracy of integration.
 Next, the HMD 100 integrates the positions of the individual joints among the hand recognition results. This integration is likewise performed using, for example, an extended Kalman filter, an ordinary Kalman filter, or a particle filter, or by computing a weighted average or a simple average of the positions. The estimation error of each joint position can be used as a filter input or as a weight, which improves the accuracy of the integration.
 In the description above, the integration processing first integrates the hand position and orientation and then integrates the positions of the individual joints. However, the joint positions may instead be integrated from the beginning.
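 As an illustration of the weighted-average option mentioned above (and only as an illustration, not as the disclosed implementation), the following sketch fuses the hand position and the joint positions reported by the two devices using inverse-variance weights derived from the reported estimation errors; the record layout and the use of a single scalar standard deviation per estimate are assumptions of the sketch, and a Kalman or particle filter could be used instead, as described.

```python
import numpy as np

# Hypothetical sketch: fuse two synchronized hand recognition results by
# inverse-variance weighting. Each estimate carries a 3D position and a scalar
# standard deviation describing its estimation error; a smaller error gives a
# larger weight in the fused result.

def fuse_positions(pos_a, sigma_a, pos_b, sigma_b):
    pos_a, pos_b = np.asarray(pos_a, float), np.asarray(pos_b, float)
    w_a, w_b = 1.0 / sigma_a ** 2, 1.0 / sigma_b ** 2
    fused = (w_a * pos_a + w_b * pos_b) / (w_a + w_b)
    fused_sigma = (1.0 / (w_a + w_b)) ** 0.5     # error of the fused estimate
    return fused, fused_sigma

def fuse_hand_results(hand_a, hand_b):
    """hand_a / hand_b: dicts with 'position', 'sigma', and 'joints'
    ({joint_name: (position, sigma)}). Assumes both results report the same
    joint set after identity matching and synchronization."""
    fused_pos, fused_sigma = fuse_positions(hand_a["position"], hand_a["sigma"],
                                            hand_b["position"], hand_b["sigma"])
    fused_joints = {}
    for name in hand_a["joints"]:
        (pa, sa), (pb, sb) = hand_a["joints"][name], hand_b["joints"][name]
        fused_joints[name] = fuse_positions(pa, sa, pb, sb)
    return {"position": fused_pos, "sigma": fused_sigma, "joints": fused_joints}
```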
 Returning to the flowchart of FIG. 6, after the processing of step ST26, the HMD 100 feeds back the integration result in step ST27. This feedback includes feedback to the HMD 100 itself and feedback to the mobile device 200, which is the other device. Feedback to the mobile device 200 is performed by transmitting the integration result to the mobile device 200 through the communication unit 104. Feeding back the integration result improves the accuracy of hand recognition (estimation of the hand position and orientation and estimation of each joint position).
 "Recognition processing (sub)"
 The recognition processing (sub) in the mobile device 200, that is, the details of the processing of step ST12 in the flowchart of FIG. 5, will be described with reference to the flowchart of FIG. 12. The mobile device 200 repeatedly executes the processing of the flowchart of FIG. 12.
 First, in step ST51, the mobile device 200 uses the self-position estimation processing unit 231 to estimate the position and orientation of the mobile device 200 itself, using an algorithm such as SLAM, based on the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202. In this case, the error of the position and orientation estimation is also estimated.
 Also in step ST51, the mobile device 200 uses the other-device position estimation processing unit 232 to estimate, based on the image obtained by the camera 201 and by techniques such as object recognition and tracking, the position and orientation of the HMD 100, which is the other device, with the position and orientation of the mobile device 200 as the reference. In this case, the error of the position and orientation estimation is also estimated. The position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 constitute relative relationship information between the positions and orientations of the HMD 100 and the mobile device 200.
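 By way of illustration only, the relative relationship information can be pictured as a rigid transform between the two device frames. The sketch below assumes poses represented as 4x4 homogeneous matrices, composes the mobile device's self position and orientation (world from mobile) with the observed HMD pose (mobile from HMD), and re-expresses a hand position given in the mobile frame in the world or HMD frame; the matrix convention and helper names are assumptions of the sketch.

```python
import numpy as np

# Hypothetical sketch: poses as 4x4 homogeneous transforms T_parent_child,
# i.e. a point p expressed in the child frame maps to T_parent_child @ p.

def compose(T_a_b, T_b_c):
    return T_a_b @ T_b_c                      # yields T_a_c

def invert(T_a_b):
    R, t = T_a_b[:3, :3], T_a_b[:3, 3]
    T_b_a = np.eye(4)
    T_b_a[:3, :3] = R.T
    T_b_a[:3, 3] = -R.T @ t
    return T_b_a

def hand_in_world(T_world_mobile, hand_pos_mobile):
    """Re-express a hand position from the mobile frame in the world frame."""
    p = np.append(np.asarray(hand_pos_mobile, float), 1.0)   # homogeneous point
    return (T_world_mobile @ p)[:3]

def hand_in_hmd(T_mobile_hmd, hand_pos_mobile):
    """T_mobile_hmd: HMD pose observed from the mobile device (the relative
    relationship information). Re-expresses a hand position in the HMD frame."""
    p = np.append(np.asarray(hand_pos_mobile, float), 1.0)
    return (invert(T_mobile_hmd) @ p)[:3]
```

 For example, compose(T_world_mobile, T_mobile_hmd) gives the HMD pose in the world frame, which is the kind of quantity the spatial synchronization step can work with.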
 Also in step ST51, the mobile device 200 uses the hand recognition processing unit 233 to recognize the hand based on the image obtained by the camera 201, to estimate the position, orientation, and velocity of the hand with the position and orientation of the mobile device 200 as the reference, and to estimate the position and velocity of each joint as pose estimation. In this case, information such as the estimation error of the hand position and orientation and the estimation error of each joint position is also estimated.
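 One simple way to obtain the velocity estimates mentioned here, shown only as an illustrative sketch, is a finite difference between two timestamped recognition results; the record layout is an assumption of the sketch.

```python
import numpy as np

# Hypothetical sketch: estimate hand and joint velocities from two timestamped
# hand recognition results by finite differences.

def estimate_velocities(prev, curr):
    """prev / curr: dicts with 't' (seconds), 'position' (3D) and
    'joints' {name: 3D position}. Returns (hand_velocity, joint_velocities)."""
    dt = curr["t"] - prev["t"]
    if dt <= 0.0:
        raise ValueError("timestamps must be increasing")
    hand_vel = (np.asarray(curr["position"], float) -
                np.asarray(prev["position"], float)) / dt
    joint_vel = {
        name: (np.asarray(curr["joints"][name], float) -
               np.asarray(prev["joints"][name], float)) / dt
        for name in curr["joints"] if name in prev["joints"]
    }
    return hand_vel, joint_vel
```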
 Next, in step ST52, the mobile device 200 determines, in the sub recognition integration processing unit 234, whether there is a request for information transmission from the HMD 100, which is the other device. When there is a request for information transmission, the mobile device 200, in step ST53, transmits the information estimated in step ST51 from the sub recognition integration processing unit 234 to the HMD 100 through the communication unit 204, and then proceeds to the processing of step ST54. On the other hand, when there is no request for information transmission in step ST52, the mobile device 200 immediately proceeds to the processing of step ST54.
 In step ST54, the mobile device 200 determines, in the sub recognition integration processing unit 234, whether an integration result has been received from the other device. When an integration result has been received, the mobile device 200, in step ST55, integrates the received information into the past estimation information in the sub recognition integration processing unit 234 and updates the past estimation information used for hand recognition and hand pose estimation. Updating the past estimation information used for hand recognition and hand pose estimation with the received information (the integration result from the other device) in this way improves the accuracy of hand recognition and hand pose estimation. On the other hand, when no integration result has been received from the other device in step ST54, the mobile device 200 ends the processing. In this case, the past estimation information used for hand recognition and hand pose estimation is not updated.
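 How the received integration result might be folded into the locally stored past estimation can be pictured with the following sketch, which blends the two with a fixed trust factor; the gain value and the choice of a simple blend rather than a filter are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical sketch: update the locally stored past estimate with the
# integration result received from the other device, trusting the received
# result by a fixed gain between 0 and 1.

def update_past_estimate(past_position, received_position, gain=0.7):
    """Blend the received integration result into the stored past estimate.
    gain is an assumed trust factor for the received result."""
    past = np.asarray(past_position, dtype=float)
    received = np.asarray(received_position, dtype=float)
    return (1.0 - gain) * past + gain * received
```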
 The order of the processes in the flowchart of FIG. 12 is not limited to the order described above. For example, the self-position estimation, other-device position estimation, and hand recognition / hand pose estimation processes may be performed in parallel with the reception and transmission processes. In that case, when there is a request for information transmission, the mobile device 200 transmits the latest estimation result to the HMD 100, and when an integration result is received from the HMD 100, the mobile device 200 integrates the integration result (received information) with the latest estimation result.
 As described above, in the present technology, the hand gesture is recognized based on a hand recognition result that is determined from the hand recognition result referenced to the position and orientation of the HMD 100 and the hand recognition result referenced to the position and orientation of the mobile device 200. Therefore, the accuracy of hand gesture recognition can be improved. Furthermore, in the present technology, the HMD 100 and the mobile device 200 can move freely.
 <2. Modification examples>
 In the embodiment described above, an example was shown in which the user 300 holds the mobile device 200 in the right hand and makes a gesture with the left hand. However, an example in which, for instance, the mobile device 200 is placed on a table or the like and the user 300 makes gestures with both the right hand and the left hand can be considered in the same way.
 Further, in the embodiment described above, an example was shown in which the gesture recognition target is a hand. However, besides a hand, other objects that can be modeled can also be assumed as the gesture recognition target. For example, objects that can be assumed to be rigid bodies, such as pens, markers, and boxes, as well as objects whose deformations are known, such as books, faces, paper, the human body, and cars, are conceivable as modifications.
 Further, the embodiment described above is explained on the premise that the self position and orientation, the position and orientation of the other device, and the hand recognition are all estimated in 3D space as inputs. A sensor from which such information can be obtained, such as a stereo camera, is therefore basically assumed. However, it is considered possible to relax this premise.
 When each estimating device uses a monocular camera, some estimations and estimation results are obtained in a scale-indeterminate state (a state in which the size of an object is not determined; with a monocular camera, the self position, the hand recognition, and so on can be assumed to be in this state). Even in that state, however, the integration of information proposed by the present technology is considered possible. In that case, the processing becomes possible by also estimating the scale adjustment at the same time as the camera pose alignment (synchronization) and the integration of the information in the specific example described above.
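 By way of illustration only, the joint estimation of scale and pose alignment can be pictured with the following sketch, which fits a similarity transform (scale, rotation, translation) between two sets of corresponding 3D points, for example joint positions observed by the two devices, in the style of the Umeyama method; the availability of established point correspondences is an assumption of the sketch.

```python
import numpy as np

# Hypothetical sketch: estimate scale s, rotation R and translation t so that
# s * R @ src + t best matches dst in a least-squares sense (Umeyama-style),
# which resolves the scale ambiguity of a monocular estimate.

def similarity_alignment(src, dst):
    """src, dst: (N, 3) arrays of corresponding points. Returns (s, R, t)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / src.shape[0]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # keep a proper rotation (det = +1)
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src      # estimated scale factor
    t = mu_dst - s * R @ mu_src
    return s, R, t
```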
 Also, when estimating the position and orientation of the other device, if the size of the object regarded as the other device is known, the scale can be estimated from that information, so other recognition results whose scale is indeterminate can be estimated based on it. For this reason as well, the processing of the present technology can be applied to a system composed of monocular cameras.
 The use of devices capable of estimating distance information, such as a ToF camera, a pattern stereo system, or a structured light system, can also be assumed. In that case, the present technology can be extended by performing the integration processing based on the recognition results of the devices from which 3D information can be obtained. Cooperation between such devices and monocular cameras can also be considered. In this case, it is also conceivable to interpolate the information of the monocular camera from the 3D information and from the information of devices whose scale can be estimated.
 Further, in the embodiment described above, an example was shown in which the HMD 100 performs the main processing and the mobile device 200 performs the sub processing. However, it is also conceivable to swap the processing between the HMD 100 and the mobile device 200 at a predetermined cycle. This makes it possible to eliminate processing bottlenecks and imbalances in the amount of processing, and also to suppress extreme power consumption on one terminal.
 Further, although the embodiment described above deals with the cooperation between the HMD 100 and the mobile device 200, the present technology can also be used, for example, for pose detection of the human body with a plurality of movable cameras. In this case, each camera is recognized as the other device, and by integrating their mutual observations, the system can also be treated as a motion capture system built from movable cameras.
 Further, in the embodiment described above, an example was shown in which the HMD 100 constitutes the first device and the information processing apparatus and the mobile device 200 constitutes the second device, but the present technology is not limited to this.
 For example, the mobile device 200 may constitute the first device and the information processing apparatus, and the HMD 100 may constitute the second device. In this case, AR display control is performed either by the mobile device 200 transmitting an AR display control signal based on the gesture recognition information to the HMD 100 by communication, or by the mobile device 200 transmitting the gesture recognition information to the HMD 100 by communication and the HMD 100 performing AR display control based on that gesture recognition information.
 Further, for example, the HMD 100 may constitute the first device and the mobile device 200 the second device, or the mobile device 200 may constitute the first device and the HMD 100 the second device, and another device such as an external server connected to the HMD 100 and the mobile device 200 via a network may constitute the information processing apparatus. In this case, AR display control is performed either by the other device transmitting an AR display control signal based on the gesture recognition information to the HMD 100 by communication, or by the other device transmitting the gesture recognition information to the HMD 100 by communication and the HMD 100 performing AR display control based on that gesture recognition information.
 Although the preferred embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to these examples. It is obvious that a person having ordinary knowledge in the technical field of the present disclosure can conceive of various changes and modifications within the scope of the technical ideas described in the claims, and it is understood that these naturally also belong to the technical scope of the present disclosure.
 The effects described in the present specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure may exhibit other effects that are apparent to those skilled in the art from the description of the present specification, in addition to or instead of the above effects.
 The present technology can also have the following configurations.
 (1) An information processing apparatus including:
 an information acquisition unit that acquires first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
 an information receiving unit that receives, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
 an information processing unit that determines a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizes a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
 (2) The information processing apparatus according to (1), in which
 the information acquisition unit further acquires position and orientation information of the second device referenced to the position and orientation of the first device, based on the sensor output of the first device, and
 the relative relationship information includes the position and orientation information of the second device referenced to the position and orientation of the first device.
 (3) The information processing apparatus according to (2), in which the information acquisition unit acquires the position and orientation information of the second device based on recognition marker information displayed on the second device and included in the sensor output of the first device.
 (4) The information processing apparatus according to (1), in which
 the information receiving unit further receives, from the second device, position and orientation information of the first device referenced to the position and orientation of the second device, acquired based on the sensor output of the second device, and
 the relative relationship information includes the position and orientation information of the first device referenced to the position and orientation of the second device.
 (5) The information processing apparatus according to any one of (1) to (4), in which the information processing unit
 spatially synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target based on the relative relationship information between the positions and orientations of the first device and the second device,
 temporally synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target by prediction processing, based on time stamp information added to each of the first position and orientation information and the second position and orientation information of the gesture recognition target, and
 integrates the spatially and temporally synchronized first position and orientation of the gesture recognition target and second position and orientation of the gesture recognition target to determine the position and orientation of the gesture recognition target.
 (6) The information processing apparatus according to (5), in which
 identification information is given to each of the first position and orientation information of the gesture recognition target acquired by the information acquisition unit and the second position and orientation information of the gesture recognition target received by the information receiving unit, and
 the information processing unit integrates or separates the identification information based on each piece of position and orientation information of the gesture recognition target, and integrates the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target associated with the same identification information.
 (7) The information processing apparatus according to any one of (1) to (6), in which the information receiving unit transmits an application activation request to the second device, then transmits a request for information transmission, and receives the second position and orientation information of the gesture recognition target from the second device.
 (8) The information processing apparatus according to (7), in which the second device that has received the application activation request updates the second position and orientation information of the gesture recognition target as needed.
 (9) The information processing apparatus according to any one of (1) to (8), in which the first device is an augmented reality display device, and the information processing apparatus further includes a display control unit that controls augmented reality display on the augmented reality display device based on the gesture recognition information.
 (10) The information processing apparatus according to (9), in which the gesture recognition target is located between the first device and the second device, and the augmented reality display is performed at a position corresponding to the second device.
 (11) The information processing apparatus according to (10), in which the first device is a head mounted display having a transmissive display, and the second device is a mobile device having a non-transmissive display.
 (12) An information processing method including:
 a procedure of acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
 a procedure of receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
 a procedure of determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
 (13) A program that causes a computer to function as:
 information acquisition means for acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
 information receiving means for receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
 information processing means for determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
 10: AR display system
 100: HMD
 101: Camera
 102: IMU
 103: Information processing unit
 104: Communication unit
 105: Transmissive display
 106: Application / recognition information storage
 131: Self-position estimation processing unit
 132: Other-device position estimation processing unit
 133: Hand recognition processing unit
 134: Main recognition integration processing unit
 135: Image generation / application processing unit
 200: Mobile device
 201: Camera
 202: IMU
 203: Information processing unit
 204: Communication unit
 205: Non-transmissive display
 206: Recognition information storage
 231: Self-position estimation processing unit
 232: Other-device position estimation processing unit
 233: Hand recognition processing unit
 234: Sub recognition integration processing unit
 235: Recognition marker image generation unit
 300: User
 400: Virtual book

Claims (13)

  1.  An information processing apparatus comprising:
     an information acquisition unit that acquires first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
     an information receiving unit that receives, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
     an information processing unit that determines a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizes a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
  2.  The information processing apparatus according to claim 1, wherein
     the information acquisition unit further acquires position and orientation information of the second device referenced to the position and orientation of the first device, based on the sensor output of the first device, and
     the relative relationship information includes the position and orientation information of the second device referenced to the position and orientation of the first device.
  3.  The information processing apparatus according to claim 2, wherein the information acquisition unit acquires the position and orientation information of the second device based on recognition marker information displayed on the second device and included in the sensor output of the first device.
  4.  The information processing apparatus according to claim 1, wherein
     the information receiving unit further receives, from the second device, position and orientation information of the first device referenced to the position and orientation of the second device, acquired based on the sensor output of the second device, and
     the relative relationship information includes the position and orientation information of the first device referenced to the position and orientation of the second device.
  5.  The information processing apparatus according to claim 1, wherein the information processing unit
     spatially synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target based on the relative relationship information between the positions and orientations of the first device and the second device,
     temporally synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target by prediction processing, based on time stamp information added to each of the first position and orientation information and the second position and orientation information of the gesture recognition target, and
     integrates the spatially and temporally synchronized first position and orientation of the gesture recognition target and second position and orientation of the gesture recognition target to determine the position and orientation of the gesture recognition target.
  6.  The information processing apparatus according to claim 5, wherein
     identification information is given to each of the first position and orientation information of the gesture recognition target acquired by the information acquisition unit and the second position and orientation information of the gesture recognition target received by the information receiving unit, and
     the information processing unit integrates or separates the identification information based on each piece of position and orientation information of the gesture recognition target, and integrates the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target associated with the same identification information.
  7.  The information processing apparatus according to claim 1, wherein the information receiving unit transmits an application activation request to the second device, then transmits a request for information transmission, and receives the second position and orientation information of the gesture recognition target from the second device.
  8.  The information processing apparatus according to claim 7, wherein the second device that has received the application activation request updates the second position and orientation information of the gesture recognition target as needed.
  9.  The information processing apparatus according to claim 1, wherein the first device is an augmented reality display device, and the information processing apparatus further comprises a display control unit that controls augmented reality display on the augmented reality display device based on the gesture recognition information.
  10.  The information processing apparatus according to claim 9, wherein the gesture recognition target is located between the first device and the second device, and the augmented reality display is performed at a position corresponding to the second device.
  11.  The information processing apparatus according to claim 10, wherein the first device is a head mounted display having a transmissive display, and the second device is a mobile device having a non-transmissive display.
  12.  An information processing method comprising:
     a procedure of acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
     a procedure of receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
     a procedure of determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
  13.  A program that causes a computer to function as:
     information acquisition means for acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
     information receiving means for receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
     information processing means for determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
PCT/JP2020/044771 2019-12-04 2020-12-02 Information processing device, information processing method, and program WO2021112107A1 (en)

Applications Claiming Priority (2)

Application Number: JP2019-219835, Priority Date: 2019-12-04
Application Number: JP2019219835, Priority Date: 2019-12-04

Publications (1)

Publication Number: WO2021112107A1 (en)

Family

ID: 76221700

Family Applications (1)

Application Number: PCT/JP2020/044771, Title: Information processing device, information processing method, and program, Priority Date: 2019-12-04, Filing Date: 2020-12-02, Publication: WO2021112107A1 (en)

Country Status (1)

WO: WO2021112107A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018516399A (en) * 2015-04-15 2018-06-21 株式会社ソニー・インタラクティブエンタテインメント Pinch and hold gesture navigation on head mounted display
JP2018530797A (en) * 2015-07-07 2018-10-18 グーグル エルエルシー System for tracking handheld electronic devices in virtual reality
JP6293110B2 (en) * 2015-12-07 2018-03-14 株式会社Hielero Point cloud data acquisition system and method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7022250B1 (en) 2021-10-04 2022-02-17 株式会社メディアドゥ Virtual reality or augmented reality reading systems, 3D display control programs for books and images, and information processing methods
JP2023054522A (en) * 2021-10-04 2023-04-14 株式会社メディアドゥ Virtual reality or augmented reality reading system, 3d display control program of book and image, and information processing method

Legal Events

121: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20896566; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122: PCT application non-entry in European phase (Ref document number: 20896566; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: JP)