WO2021112107A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2021112107A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
gesture recognition
orientation
recognition target
posture
Application number
PCT/JP2020/044771
Other languages
French (fr)
Japanese (ja)
Inventor
ダニエル誠 徳永
Original Assignee
ソニーグループ株式会社
Application filed by ソニーグループ株式会社
Publication of WO2021112107A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/70: Determining position or orientation of objects or cameras

Definitions

  • This technology relates to an information processing device, an information processing method, and a program.
  • Patent Document 1 describes a user interface device that recognizes gestures of a user's hand using a camera attached to the ceiling.
  • Because that device uses a camera fixed to the ceiling, the accuracy of gesture recognition may decrease depending on the position of the user or the orientation of the hand.
  • The purpose of this technology is to improve the accuracy of gesture recognition of a gesture recognition target.
  • The concept of this technology is an information processing device including: an information acquisition unit that acquires first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device;
  • an information receiving unit that receives, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device;
  • and an information processing unit that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information of the gesture recognition target, and the second position/posture information of the gesture recognition target, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
  • In this technology, the information acquisition unit acquires the first position/posture information of the gesture recognition target, referenced to the position/posture of the first device, based on the sensor output of the first device.
  • The information receiving unit receives, from the second device, the second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device.
  • The information processing unit determines the position/posture of the gesture recognition target based on the relative relationship information between the positions/postures of the first device and the second device, the first position/posture information, and the second position/posture information, and then recognizes the gesture of the gesture recognition target based on the determined position/posture.
  • For example, the information acquisition unit may further acquire position/posture information of the second device, referenced to the position/posture of the first device, based on the sensor output of the first device, and the relative relationship information may include this position/posture information of the second device referenced to the first device.
  • In this case, for example, the information acquisition unit may acquire the position/posture information of the second device based on recognition marker information displayed on the second device and contained in the sensor output of the first device.
  • Also, for example, the information receiving unit may further receive, from the second device, position/posture information of the first device referenced to the position/posture of the second device and acquired based on the sensor output of the second device, and the relative relationship information may include this position/posture information of the first device referenced to the second device.
  • Also, for example, the information processing unit may spatially synchronize the first position/posture of the gesture recognition target and the second position/posture of the gesture recognition target based on the relative relationship information between the positions/postures of the first device and the second device, time-synchronize them by a prediction process based on the time stamp information added to the first and second position/posture information, and integrate the spatially and temporally synchronized first and second positions/postures to determine the position/posture of the gesture recognition target.
  • In this case, for example, identification information may be added to the first position/posture information of the gesture recognition target acquired by the information acquisition unit and to the second position/posture information received by the information receiving unit, and the information processing unit may integrate or separate the identification information based on each piece of position/posture information and integrate the first and second positions/postures of the gesture recognition target that are associated with the same identification information.
  • Also, for example, the information receiving unit may transmit an application start request to the second device, then transmit an information transmission request, and receive the second position/posture information of the gesture recognition target from the second device.
  • In this case, for example, the second device that has received the application start request may update the second position/posture information of the gesture recognition target at any time.
  • Also, for example, the first device may be an augmented reality display device, and a display control unit may further be provided that controls the augmented reality display in the augmented reality display device based on the gesture recognition information.
  • In this case, for example, the gesture recognition target may be located between the first device and the second device, and the augmented reality display may be performed at a position corresponding to the second device.
  • Also, for example, the first device may be a head-mounted display having a transmissive display,
  • and the second device may be a mobile device having a non-transmissive display.
  • In this way, in the present technology, the gesture of the gesture recognition target is recognized based on the position/posture of the gesture recognition target determined from the first position/posture information referenced to the first device and the second position/posture information referenced to the second device. This makes it possible to improve the accuracy of gesture recognition of the gesture recognition target. Furthermore, in the present technology, the first device and the second device can be moved freely.
  • FIG. 1 shows a configuration example of an AR (Augmented Reality) display system 10 as an embodiment.
  • The AR display system 10 includes an HMD (Head Mounted Display) 100 having a transmissive display as an AR display device, and a mobile device 200, such as a smartphone or a tablet, having a non-transmissive display.
  • The HMD 100 constitutes a first device and an information processing device,
  • and the mobile device 200 constitutes a second device.
  • The HMD 100 is attached to the head of the user 300 so that the transmissive display is located at the eye position, and the mobile device 200 is held in the right hand of the user 300.
  • The HMD 100 recognizes the mobile device 200 based on the output of a sensor such as a camera and, as an AR display (AR superimposed object), displays a virtual book 400 at a position corresponding to the mobile device 200, superimposed on the mobile device 200 in the illustrated example.
  • Based on the sensor output of its camera or the like, the HMD 100 repeatedly acquires its self-position/posture information, the position/posture information of the mobile device 200 referenced to the self-position/posture, and the position/posture information of the left hand of the user 300 as the gesture recognition target. Likewise, based on the sensor output of its camera or the like, the mobile device 200 repeatedly acquires its self-position/posture information, the position/posture information of the HMD 100 referenced to the self-position/posture, and the position/posture information of the left hand of the user 300 as the gesture recognition target.
  • The HMD 100 receives the information acquired by the mobile device 200 from the mobile device 200. Based on the information it acquired itself and the information received from the mobile device 200, the HMD 100 spatially and temporally synchronizes the position/posture information of the left hand of the user 300 acquired by the HMD 100 and by the mobile device 200, and then integrates it to determine the position/posture of the left hand of the user 300.
  • The HMD 100 recognizes the gesture of the left hand of the user 300 based on the position/posture of the left hand determined as described above, or further on its temporal change, and controls the AR display based on the gesture recognition information. For example, when the gesture of the left hand of the user 300 is an operation of turning the pages of the virtual book 400, the display of the book 400 is changed so that its pages are turned.
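  • As an illustration of how such a gesture could be derived from the temporal change of the determined hand position, the following minimal sketch flags a page-turn when the integrated hand position sweeps horizontally within a short time window. The 0.15 m sweep, the 0.5 s window, and the use of the x coordinate alone are illustrative assumptions, not details from the publication.

```python
# Hypothetical sketch: detect a page-turn swipe from the time series of the
# integrated hand position. Thresholds and the 0.5 s window are illustrative.
from collections import deque

class PageTurnDetector:
    def __init__(self, min_sweep_m=0.15, window_s=0.5):
        self.min_sweep_m = min_sweep_m
        self.window_s = window_s
        self.history = deque()  # (timestamp, x_position) pairs

    def update(self, timestamp, hand_x):
        self.history.append((timestamp, hand_x))
        # keep only samples inside the time window
        while self.history and timestamp - self.history[0][0] > self.window_s:
            self.history.popleft()
        sweep = self.history[-1][1] - self.history[0][1]
        if sweep <= -self.min_sweep_m:
            return "turn_page_forward"   # right-to-left sweep
        if sweep >= self.min_sweep_m:
            return "turn_page_backward"  # left-to-right sweep
        return None

detector = PageTurnDetector()
print(detector.update(0.00, 0.30))  # None
print(detector.update(0.25, 0.10))  # "turn_page_forward"
```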
  • In this way, the gesture (pose) of the left hand of the user 300 is recognized based on the result of integrating the position/posture information of the left hand referenced to the position/posture of the HMD 100 and the position/posture information of the left hand referenced to the position/posture of the mobile device 200. This makes it possible to improve the accuracy of gesture recognition of the left hand of the user 300.
  • FIG. 2A shows an image captured by the camera mounted on the HMD 100,
  • and FIG. 2B shows an image captured by the camera mounted on the mobile device 200.
  • In the camera image of the mobile device 200, the left hand of the user 300 is captured from the palm side, so fine poses of the fingers can be observed.
  • Note that the HMD 100 and the mobile device 200 can be moved freely.
  • FIG. 3 shows a configuration example of the HMD 100.
  • The HMD 100 includes a camera 101, an IMU (Inertial Measurement Unit) 102, an information processing unit 103, a communication unit 104, a transmissive display 105, and an application/recognition information storage 106.
  • The camera 101 is composed of a lens and an image sensor such as a CCD or CMOS image sensor. Two cameras 101 are provided, for example, on the outer surface of the front portion of the HMD 100 and capture images of objects (subjects) ahead in the user's line-of-sight direction.
  • The IMU 102 acquires information on the acceleration and angular acceleration of the HMD 100.
  • The information processing unit 103 is composed of a CPU (Central Processing Unit) and the like,
  • and performs various processes based on various programs stored in a storage unit (not shown).
  • The information processing unit 103 includes a self-position estimation processing unit 131, an other-person position estimation processing unit 132, a hand recognition processing unit 133, a main recognition integrated processing unit 134, and an image generation/application processing unit 135.
  • The self-position estimation processing unit 131 uses an algorithm such as SLAM (Simultaneous Localization and Mapping) to estimate the position/posture of the HMD 100 based on the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The self-position estimation processing unit 131 can also estimate information such as the error of the position/posture estimation.
  • The other-person position estimation processing unit 132 estimates the position/posture of the mobile device 200, referenced to the position/posture of the HMD 100, by a method such as object recognition (for example, marker recognition) or tracking based on the image obtained by the camera 101.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The other-person position estimation processing unit 132 can also estimate information such as the error of the position/posture estimation.
  • The position/posture of the mobile device 200 referenced to the position/posture of the HMD 100 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • The hand recognition processing unit 133 recognizes a hand based on the image obtained by the camera 101 and estimates the position/posture and speed of the hand referenced to the position/posture of the HMD 100. The hand recognition processing unit 133 also estimates the pose of the hand; as the pose estimation, for example, the position and speed of each joint are estimated.
  • The estimated hand position/posture information includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system), and the information on the position of each joint includes, for example, three-dimensional information (x, y, z in a Cartesian coordinate system). It is also conceivable to estimate the angle of each finger instead of the position of each joint as the pose estimation; in the following description, estimating the position of each joint is used as an example.
  • The hand recognition processing unit 133 can also estimate information such as the error of the hand position/posture estimation and the error of the estimation of each joint position.
  • The speed of the hand and the speed of each joint can be estimated from the temporal change of their positions.
  • Here, the hand recognition result is the position/posture of the hand and the position of each joint, but other information may be used as long as the pose of the hand can be restored from it;
  • for example, the rotation of each joint in relative coordinates, the rotation in absolute coordinates, or the position of each joint in world coordinates can be considered.
  • The self-position estimation processing unit 131, the other-person position estimation processing unit 132, and the hand recognition processing unit 133 perform their processing as needed and update their information.
  • A time stamp indicating the time at which the information was acquired (the observation time) is added to each piece of information obtained by these processing units.
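  • The outputs described above (self pose, other-device pose, and hand pose with joints, each carrying an estimation error and an observation-time stamp) could be carried in containers such as the following sketch. The field names and structure are illustrative assumptions, not taken from the publication.

```python
# Illustrative containers for the per-frame estimation results.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Pose6D:
    x: float = 0.0; y: float = 0.0; z: float = 0.0            # position
    pitch: float = 0.0; yaw: float = 0.0; roll: float = 0.0   # orientation

@dataclass
class HandObservation:
    pose: Pose6D                          # hand position/posture in the device frame
    joint_positions: List[Tuple[float, float, float]]   # (x, y, z) per joint
    joint_velocities: List[Tuple[float, float, float]]  # from temporal change of positions
    pose_error: float                     # estimation error of the hand pose
    joint_errors: List[float]             # estimation error per joint
    timestamp: float                      # observation time (time stamp)
    hand_id: int = 0                      # identification ID used later for ID merge/split

@dataclass
class FrameEstimate:
    self_pose: Pose6D                     # self-position/posture (SLAM)
    self_pose_error: float
    other_pose: Pose6D                    # pose of the other device in this device's frame
    other_pose_error: float
    hands: List[HandObservation] = field(default_factory=list)
    timestamp: float = 0.0
```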
  • The main recognition integrated processing unit 134 acquires the information obtained at any time by the self-position estimation processing unit 131, the other-person position estimation processing unit 132, and the hand recognition processing unit 133, and also acquires the corresponding information from the mobile device 200 through the communication unit 104.
  • It then spatially and temporally synchronizes the left-hand recognition results of the user 300 (the position/posture of the hand and the position of each joint) acquired by the HMD 100 and by the mobile device 200, and integrates them. The processing of the main recognition integrated processing unit 134 will be described in more detail later.
  • The image generation/application processing unit 135 performs the processing necessary for the operation of the application and performs rendering for displaying the virtual book 400 as an AR display. It also receives the integration result of the main recognition integrated processing unit 134, recognizes the gesture of the left hand of the user 300 based on the left-hand recognition result, and performs the interaction processing that controls the AR display based on this gesture recognition information. For example, when the gesture of the left hand of the user 300 turns a page of the book 400, the processing of turning the page of the book 400 is performed.
  • The communication unit 104 communicates with the mobile device 200 wirelessly (for example, via Wi-Fi (Wireless Fidelity) or Li-Fi (Light Fidelity)) or by wire.
  • The transmissive display 105 performs the AR display based on the image data supplied from the image generation/application processing unit 135.
  • The application/recognition information storage 106 holds the information necessary for the application, as well as the information necessary for recognition and the like. Examples of the information held for recognition are a localization map for SLAM, marker recognition information, and hand recognition information. In the example of FIG. 3, the application/recognition information storage 106 is connected only to the main recognition integrated processing unit 134 and the image generation/application processing unit 135, but it may also be connected to other processing units.
  • FIG. 4 shows a configuration example of the mobile device 200.
  • The mobile device 200 has a camera 201, an IMU 202, an information processing unit 203, a communication unit 204, a non-transmissive display 205, and a recognition information storage 206.
  • The camera 201 is composed of a lens and an image sensor such as a CCD or CMOS image sensor.
  • The camera 201 is a stereo camera; it is provided on the display-surface side of the mobile device 200 and captures images of objects (subjects) on the display-surface side.
  • The IMU 202 acquires information on the acceleration and angular acceleration of the mobile device 200.
  • The information processing unit 203 is composed of a CPU (Central Processing Unit) and the like,
  • and performs various processes based on various programs stored in a storage unit (not shown).
  • The information processing unit 203 includes a self-position estimation processing unit 231, an other-person position estimation processing unit 232, a hand recognition processing unit 233, a sub-recognition integrated processing unit 234, and a recognition marker image generation unit 235.
  • The self-position estimation processing unit 231 estimates the position/posture of the mobile device 200 based on the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202, using an algorithm such as SLAM.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The self-position estimation processing unit 231 can also estimate information such as the error of the position/posture estimation.
  • The other-person position estimation processing unit 232 estimates the position/posture of the HMD 100, referenced to the position/posture of the mobile device 200, by a method such as object recognition or tracking based on the image obtained by the camera 201.
  • The position/posture information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system).
  • The other-person position estimation processing unit 232 can also estimate information such as the error of the position/posture estimation.
  • The position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • The hand recognition processing unit 233 recognizes a hand based on the image obtained by the camera 201 and estimates the position/posture and speed of the hand referenced to the position/posture of the mobile device 200. The hand recognition processing unit 233 also estimates the pose of the hand; as the pose estimation, for example, the position and speed of each joint are estimated.
  • The estimated hand position/posture information includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in a Cartesian coordinate system), and the information on the position of each joint includes, for example, three-dimensional information (x, y, z in a Cartesian coordinate system).
  • The hand recognition processing unit 233 can also estimate information such as the error of the hand position/posture estimation and the error of the estimation of each joint position.
  • The speed of the hand and the speed of each joint can be estimated from the temporal change of their positions.
  • The self-position estimation processing unit 231, the other-person position estimation processing unit 232, and the hand recognition processing unit 233 perform their processing at any time and update their information.
  • A time stamp indicating the time at which the information was acquired (the observation time) is added to each piece of information obtained by these processing units.
  • The sub-recognition integrated processing unit 234 transmits the information obtained at any time by the self-position estimation processing unit 231, the other-person position estimation processing unit 232, and the hand recognition processing unit 233 to the HMD 100 through the communication unit 204, in response to an information transmission request sent from the HMD 100 through the communication unit 204. The processing of the sub-recognition integrated processing unit 234 will be described in more detail later.
  • The recognition marker image generation unit 235 acquires the image data of the recognition marker from the recognition information storage 206 and supplies it to the non-transmissive display 205 to display the recognition marker.
  • The recognition marker is displayed according to an instruction from the sub-recognition integrated processing unit 234, which is based on an application start request received from the HMD 100 via the communication unit 204.
  • The communication unit 204 communicates with the HMD 100 wirelessly or by wire.
  • The non-transmissive display 205 displays the recognition marker based on the image data supplied from the recognition marker image generation unit 235.
  • The recognition information storage 206 holds the information necessary for recognition and the like. Examples of the information held for recognition include the above-mentioned image data of the recognition marker, a localization map for SLAM, and hand recognition information. In the example of FIG. 4, the recognition information storage 206 is connected only to the sub-recognition integrated processing unit 234 and the recognition marker image generation unit 235, but it may also be connected to other processing units.
  • The HMD 100 starts the application in step ST1.
  • Next, in step ST2, the HMD 100 requests the mobile device 200, which is the sub device, to start the application.
  • The mobile device 200 starts the application in step ST11 and displays the recognition marker on the non-transmissive display 205.
  • The mobile device 200 then performs the recognition process (sub) in step ST12. The details of this recognition process (sub) will be described later.
  • After the process of step ST2, the HMD 100 performs the recognition process of the mobile device 200, which is the sub device, in step ST3. In this case, the HMD 100 estimates the position/posture of the mobile device 200 based on the recognition marker displayed on the non-transmissive display 205 of the mobile device 200.
  • After the position/posture is estimated in step ST3, the HMD 100, in step ST4, performs rendering in which the virtual book 400 is superimposed and displayed at the position of the mobile device 200 as an AR display, and starts the interaction, that is, the control of the AR display based on the gesture recognition information of the left hand of the user 300. The HMD 100 then performs the recognition process (main) in step ST5. The details of this recognition process (main) will be described later.
  • In step ST6, the HMD 100 updates the state of the virtual book 400 as necessary based on the gesture recognition information of the left hand of the user 300. For example, when the gesture recognition information indicates that a page of the book 400 is turned, the state is updated so that the page of the book 400 is turned.
  • The HMD 100 performs the application end processing in step ST7.
  • In this case, a termination signal is transmitted to the mobile device 200, which is the sub device, and the recognition process (main) is terminated.
  • The mobile device 200 receives the end signal from the HMD 100 and performs the application end processing in step ST13.
  • In this case, the recognition process (sub) is terminated and the marker display is stopped.
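  • The message exchange between the main device (HMD 100) and the sub device (mobile device 200) outlined in these steps can be sketched as follows. The message names and the in-process queues standing in for the Wi-Fi/Li-Fi link are illustrative assumptions, not part of the publication.

```python
# Hypothetical sketch of the main/sub exchange, using in-process queues in
# place of the wireless link.
import queue
import threading

to_sub, to_main = queue.Queue(), queue.Queue()

def run_sub():
    while True:
        msg = to_sub.get()
        if msg == "start_app":            # ST11: start the app and show the recognition marker
            print("sub: marker displayed")
        elif msg == "request_info":       # ST53: send the latest estimates to the main device
            to_main.put({"hand_pose": (0.1, 0.2, 0.3), "timestamp": 0.0})
        elif msg == "terminate":          # ST13: end processing, stop the marker display
            print("sub: terminated")
            break

def run_main(frames=2):
    to_sub.put("start_app")               # ST2: request the sub device to start the application
    for _ in range(frames):               # ST5: recognition process (main), repeated per frame
        to_sub.put("request_info")
        sub_info = to_main.get()
        print("main: received", sub_info)  # would be integrated with the HMD's own estimates
    to_sub.put("terminate")               # ST7: application end processing

worker = threading.Thread(target=run_sub)
worker.start()
run_main()
worker.join()
```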
  • In the recognition process (main), the self-position estimation processing unit 131 of the HMD 100 first estimates the position/posture of the HMD 100 based on the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102, using an algorithm such as SLAM. In this case, the error of the position/posture estimation is also estimated.
  • Next, the other-person position estimation processing unit 132 of the HMD 100 estimates the position/posture of the mobile device 200, the other device, referenced to the position/posture of the HMD 100, by a method such as object recognition (for example, marker recognition) or tracking based on the image obtained by the camera 101.
  • In this case, the error of the position/posture estimation is also estimated.
  • The position/posture of the mobile device 200 referenced to the position/posture of the HMD 100 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • In step ST23, the hand recognition processing unit 133 recognizes a hand based on the image obtained by the camera 101 and estimates the position/posture and speed of the hand referenced to the position/posture of the HMD 100. Furthermore, the position and speed of each joint are estimated as the pose estimation. In this case, information such as the error of the hand position/posture estimation and the error of the estimation of each joint position is also estimated.
  • In step ST24, the main recognition integrated processing unit 134 of the HMD 100 sends an information transmission request to the mobile device 200, the other device, through the communication unit 104, and receives the information from the mobile device 200.
  • The information received in this way includes the information acquired by the self-position estimation processing unit 231, the other-person position estimation processing unit 232, and the hand recognition processing unit 233 of the mobile device 200:
  • that is, the position/posture of the mobile device 200 and its estimation error, the position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 and its estimation error,
  • and the position/posture and speed of the hand referenced to the position/posture of the mobile device 200 with their estimation error, together with the position and speed of each joint and their estimation errors.
  • The position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 constitutes relative relationship information between the positions/postures of the mobile device 200 and the HMD 100.
  • Next, in the main recognition integrated processing unit 134, the HMD 100 spatially and temporally synchronizes the hand recognition results (including the position/posture of the hand and the position of each joint) estimated by the HMD 100 itself and by the other device, the mobile device 200.
  • Specifically, in step ST31, the HMD 100 aligns the position/posture of the mobile device 200 estimated by the mobile device 200 itself with the position/posture of the mobile device 200 estimated with the position/posture of the HMD 100 as the reference.
  • As a result, the position/posture of the mobile device 200 is organized in the world coordinate system seen from the HMD 100.
  • The HMD 100 then converts the hand recognition result estimated with the position/posture of the mobile device 200 as the reference into the world coordinate system seen from the HMD 100, based on the aligned position/posture of the mobile device 200.
  • As a result, the hand position/posture estimated with the HMD 100 as the reference (including the position of each joint) and the hand position/posture estimated with the mobile device 200 as the reference (including the position of each joint) are organized in the world coordinate system seen from the HMD 100 and are thereby spatially synchronized.
  • The method is not limited to this; it is also conceivable to use the position/posture of the HMD 100 estimated with the position/posture of the mobile device 200 as the reference and to organize everything in the world coordinate system seen from the mobile device 200.
  • In the above, the position/posture of the mobile device 200 referenced to the position/posture of the HMD 100, or the position/posture of the HMD 100 referenced to the position/posture of the mobile device 200, is used as the relative relationship information between the HMD 100 and the mobile device 200.
  • However, observation information of the same environment or the same object may also be used as the relative relationship information between the positions/postures of the HMD 100 and the mobile device 200;
  • for example, observation of the same environment (initialization by a SLAM map),
  • or observation information of the same object (hand recognition, special markers, or anything else that can be observed and identified as the same object).
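  • As an illustration of the spatial synchronization described above, the following minimal sketch re-expresses a hand position estimated in the mobile device's frame in the world coordinate system seen from the HMD, using the mobile device pose estimated relative to the HMD. NumPy, the Z-Y-X yaw/pitch/roll convention, and the numeric poses are assumptions made for the example.

```python
# Minimal sketch of the spatial synchronization step using homogeneous transforms.
import numpy as np

def rot(yaw, pitch, roll):
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def to_matrix(position, yaw, pitch, roll):
    T = np.eye(4)
    T[:3, :3] = rot(yaw, pitch, roll)
    T[:3, 3] = position
    return T

# Pose of the HMD in the world frame (from SLAM) and pose of the mobile device
# relative to the HMD (from marker recognition) -- the relative relationship info.
T_world_hmd = to_matrix([0.0, 1.6, 0.0], yaw=0.0, pitch=0.0, roll=0.0)
T_hmd_mobile = to_matrix([0.0, -0.3, 0.5], yaw=np.pi, pitch=0.0, roll=0.0)

# Hand position observed in the mobile device frame (homogeneous coordinates).
p_hand_mobile = np.array([0.1, 0.0, 0.4, 1.0])

# Chain the transforms: world <- HMD <- mobile device <- hand.
p_hand_world = T_world_hmd @ T_hmd_mobile @ p_hand_mobile
print(p_hand_world[:3])  # hand position in the world frame seen from the HMD
```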
  • Next, the HMD 100 predicts the hand recognition results estimated by itself and by the other device at the current time. This prediction is based on information such as the hand speed, the joint speeds, and the observation times, and may be a linear interpolation or an interpolation by curve fitting or machine learning. As a result, the hand recognition result estimated with the HMD 100 as the reference and the hand recognition result estimated with the mobile device 200 as the reference are time-synchronized.
  • Because the hand recognition results estimated by the HMD 100 and the mobile device 200 are time-synchronized in this way, the observations of the HMD 100 and the mobile device 200 themselves do not need to be synchronized; however, the internal clocks of the HMD 100 and the mobile device 200 need to be aligned.
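  • A minimal sketch of this time synchronization by prediction is shown below: each joint position, observed at its own time stamp, is linearly extrapolated to a common current time using the estimated joint velocity. Linear prediction is one of the options named above; the numbers are illustrative.

```python
# Sketch of time synchronization by linear prediction from velocity and time stamps.
import numpy as np

def predict(position, velocity, observation_time, current_time):
    dt = current_time - observation_time
    return np.asarray(position) + np.asarray(velocity) * dt

# The same joint observed by the HMD at t = 10.00 s and by the mobile device at
# t = 10.03 s, both extrapolated to a common current time t = 10.05 s.
joint_hmd    = predict([0.10, 0.95, 0.40], [0.2, 0.0, -0.1], 10.00, 10.05)
joint_mobile = predict([0.11, 0.94, 0.41], [0.2, 0.0, -0.1], 10.03, 10.05)
print(joint_hmd, joint_mobile)  # both now refer to t = 10.05 s
```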
  • FIG. 8A shows, with a solid line, the hand recognition result estimated by the HMD 100 in the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line.
  • The position of each circle indicates the position of a joint, and the color of each circle indicates the observation error: points that are far from the camera, difficult to observe, and have large errors are shown in black, while points that are easy to observe and have small errors are shown in white.
  • FIG. 8B shows, with a solid line, the hand recognition result estimated by the mobile device 200 and converted into the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line.
  • Here too, the position of each circle indicates the position of a joint, and the color of each circle indicates the observation error, with the same convention as in FIG. 8A.
  • FIG. 8C shows the predictions of FIGS. 8A and 8B to the current time; that is, FIG. 8C shows the hand recognition results estimated by the HMD 100 and the mobile device 200 after spatial and temporal synchronization.
  • Note that, instead of the future prediction from the observation time to the current time, a conversion from the observation times to an earlier time may be used.
  • In that case, the observation times of the hand positions/postures (including the positions of each joint) estimated by the own device and the other device are aligned to the older time;
  • such interpolation can also be performed by storing time-series information.
  • In step ST26, the main recognition integrated processing unit 134 integrates the spatially and temporally synchronized hand recognition results of the own device and the other device (including the position/posture of the hand and the position of each joint).
  • Specifically, in step ST41, the HMD 100 continuously determines the identity of the hand recognition results that have the same ID (the same identifier).
  • That is, it verifies whether the identity is maintained based on the pose and the position/posture of the hands having the same ID.
  • This verification is calculated from the distance between the observed hand positions/postures and the distance between the poses; if the distance exceeds a certain threshold, it is judged that the identity is not maintained.
  • The pose distance can be calculated from the distances between the positions of corresponding joints and the differences in joint rotations.
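  • A minimal sketch of this identity check is given below. The specific metric (Euclidean distance of the hand positions plus the mean per-joint distance) and the thresholds are illustrative assumptions; the joint-rotation term mentioned above is omitted for brevity.

```python
# Sketch of the identity check used for ID separation/integration.
import numpy as np

def pose_distance(joints_a, joints_b):
    a, b = np.asarray(joints_a), np.asarray(joints_b)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))  # mean per-joint distance

def same_hand(pos_a, joints_a, pos_b, joints_b,
              pos_threshold=0.05, pose_threshold=0.03):
    pos_dist = float(np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)))
    return pos_dist <= pos_threshold and pose_distance(joints_a, joints_b) <= pose_threshold

# Two observations of what may be the same left hand.
joints_1 = [[0.10, 0.95, 0.40], [0.12, 0.96, 0.41]]
joints_2 = [[0.11, 0.95, 0.40], [0.13, 0.96, 0.41]]
if same_hand([0.10, 0.95, 0.40], joints_1, [0.11, 0.95, 0.40], joints_2):
    print("integrate IDs (same hand)")
else:
    print("separate IDs (different hands)")
```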
  • In step ST42, the HMD 100 refers to the determination in step ST41 and determines whether there is a hand recognition result for which the identity is not maintained and the ID needs to be separated.
  • If there is, the HMD 100 assigns a different ID to each such hand recognition result in step ST43 to separate the IDs, and then proceeds to the process of step ST44.
  • If there is not, the HMD 100 immediately proceeds to the process of step ST44.
  • In step ST44, the HMD 100 makes an integration judgment for hand recognition results having different IDs. This judgment is the opposite of the above-mentioned identity judgment for ID separation: when the distance between the hand positions/postures and between the poses is equal to or less than a certain threshold, the results are judged to be the same hand and are integrated.
  • In step ST45, the HMD 100 refers to the determination in step ST44 and determines whether there is a hand recognition result that requires ID integration.
  • If there is, the HMD 100 integrates the IDs in step ST46 and then proceeds to the process of step ST47.
  • If there is not, the HMD 100 immediately proceeds to the process of step ST47.
  • Note that the ID may be assigned to the hand recognition result based on an identification ID of each individual's hand. In that case, the ID is assigned at the time of hand recognition. This processing is considered effective even when a hand that was being tracked is lost and then reappears.
  • ID separation is necessary, for example, when observed hands appear to overlap depending on the camera position, are mistakenly recognized as one hand, and are later found to be two hands.
  • ID integration is necessary, for example, when the same hand is treated as if it were observed in different places in space due to a misestimated camera position/posture and is recognized as two hands, and is later observed at the correct position after the camera position/posture is corrected and recognized as the same hand.
  • FIG. 10 shows an example in which ID separation of the hand recognition result is required.
  • When two hands overlap as in FIG. 10A, they may be recognized as the same single hand by misidentification.
  • As in FIG. 10B, if the two hands can later be recognized as two separate hands, they need to be registered with different IDs.
  • FIG. 11 shows an example in which ID integration is required.
  • As in FIG. 11A, when the self position/posture of the camera is misestimated, the same hand may be recognized as hands in different places in space.
  • As in FIG. 11B, when the self position/posture is corrected and the correct position is recognized, the hands registered as separate hands turn out to be the same; in that case, the two hand recognition results need to be integrated as one hand.
  • Through the above processing, hand recognition results whose identity has been reconciled and synchronized are obtained.
  • Then, in step ST47, the HMD 100 integrates the synchronized hand recognition results judged to be identical.
  • First, the HMD 100 integrates the position/posture of the hand in the hand recognition results. This integration is performed, for example, by using an extended Kalman filter, a normal Kalman filter, or a particle filter, or, for example, by taking a weighted average or a simple average of the positions.
  • The estimation error of the hand position/posture can be used as an input to those filters or as a weight to improve the accuracy of the integration.
  • The HMD 100 also integrates the positions of each joint in the hand recognition results.
  • This integration is likewise performed using, for example, an extended Kalman filter, a normal Kalman filter, or a particle filter,
  • or, for example, by taking a weighted average or a simple average of the positions.
  • The estimation error of the position of each joint can be used as an input to those filters or as a weight to improve the accuracy of the integration.
  • Here, the integration processing is performed by first integrating the position/posture of the hand and then integrating the positions of each joint,
  • but the positions of the joints may also be integrated from the beginning.
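  • A minimal sketch of the weighted-average option is shown below, using the estimation errors as inverse-variance weights; a Kalman or particle filter could be used instead, as noted above. The inverse-variance weighting and the numeric values are illustrative assumptions.

```python
# Sketch of integration by error-weighted averaging of two position estimates.
import numpy as np

def integrate(estimate_hmd, error_hmd, estimate_mobile, error_mobile):
    # inverse-variance style weighting: smaller error -> larger weight
    w_hmd, w_mobile = 1.0 / error_hmd**2, 1.0 / error_mobile**2
    fused = (w_hmd * np.asarray(estimate_hmd)
             + w_mobile * np.asarray(estimate_mobile)) / (w_hmd + w_mobile)
    fused_error = (1.0 / (w_hmd + w_mobile)) ** 0.5
    return fused, fused_error

# A fingertip position observed by the HMD (larger error, seen from the back of
# the hand) and by the mobile device (smaller error, seen from the palm side).
fused, err = integrate([0.10, 0.95, 0.40], 0.04, [0.12, 0.95, 0.41], 0.01)
print(fused, err)
```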
  • After the processing of step ST26, the HMD 100 feeds back the integration result in step ST27.
  • This feedback includes feedback to the own device itself and feedback to the mobile device 200, the other device.
  • The feedback to the mobile device 200 is performed by transmitting the integration result to the mobile device 200 through the communication unit 104.
  • In the recognition process (sub), in step ST51, the self-position estimation processing unit 231 of the mobile device 200 estimates the position/posture of the mobile device 200 based on the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202, using an algorithm such as SLAM.
  • In this case, the error of the position/posture estimation is also estimated.
  • The other-person position estimation processing unit 232 of the mobile device 200 estimates the position/posture of the HMD 100, the other device, referenced to the position/posture of the mobile device 200, by a method such as object recognition or tracking based on the image obtained by the camera 201.
  • In this case, the error of the position/posture estimation is also estimated.
  • The position/posture of the HMD 100 referenced to the position/posture of the mobile device 200 constitutes relative relationship information between the positions/postures of the HMD 100 and the mobile device 200.
  • Furthermore, the hand recognition processing unit 233 recognizes a hand based on the image obtained by the camera 201 and estimates the position/posture and speed of the hand referenced to the position/posture of the mobile device 200, as well as the position and speed of each joint as the pose estimation. In this case, information such as the error of the hand position/posture estimation and the error of the estimation of each joint position is also estimated.
  • In step ST52, the sub-recognition integrated processing unit 234 of the mobile device 200 determines whether there is an information transmission request from the HMD 100, the other device.
  • If there is, the sub-recognition integrated processing unit 234 of the mobile device 200 transmits the information estimated in step ST51 to the HMD 100 through the communication unit 204 in step ST53, and then proceeds to step ST54.
  • If there is not, the mobile device 200 immediately proceeds to the process of step ST54.
  • In step ST54, the sub-recognition integrated processing unit 234 of the mobile device 200 determines whether an integration result has been received from the other device.
  • If it has, the sub-recognition integrated processing unit 234 of the mobile device 200 integrates the received information with its past estimated information in step ST55 and updates the estimated information used in hand recognition/hand pose estimation.
  • If it has not, the mobile device 200 ends the process; in this case, the past estimated information used in hand recognition/hand pose estimation is not updated.
  • Note that the order of the processes in the flowchart of FIG. 12 is not limited to this;
  • the processes of self-position estimation, other-person position estimation, and hand recognition/hand pose estimation, and the reception and transmission processes, may be performed in parallel.
  • In that case, the mobile device 200 transmits its latest estimation results to the HMD 100,
  • and integrates the received integration result with its latest estimation results.
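  • One way the sub device might fold the fed-back integration result into its own estimate is sketched below; the publication only states that the received result is integrated with the past estimated information, so the simple blend and its factor are assumptions for illustration.

```python
# Sketch of the sub-device update when the integration result is fed back:
# the local hand estimate is nudged toward the integrated result and reused
# as the prior for the next hand recognition cycle.
import numpy as np

def apply_feedback(local_estimate, integrated_result, blend=0.5):
    local = np.asarray(local_estimate, dtype=float)
    fused = np.asarray(integrated_result, dtype=float)
    return (1.0 - blend) * local + blend * fused

prior = apply_feedback([0.12, 0.95, 0.41], [0.11, 0.95, 0.405])
print(prior)  # used as the prior/initial value for the next recognition cycle
```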
  • As described above, the hand gesture is recognized based on the hand position/posture determined from the hand recognition result referenced to the position/posture of the HMD 100 and the hand recognition result referenced to the position/posture of the mobile device 200. This makes it possible to improve the accuracy of hand gesture recognition. Furthermore, in the present technology, the HMD 100 and the mobile device 200 can be moved freely.
  • In the above embodiment, the gesture recognition target is a hand,
  • but other objects that can be modeled can also be assumed in addition to a hand;
  • for example, objects premised on being rigid bodies, such as pens, markers, and boxes, and objects known to deform, such as books, faces, paper, human bodies, and cars, can be considered as variations.
  • In the above embodiment, the self position/posture estimation, the other-person position/posture estimation, and the hand recognition have all been described on the premise of inputs estimated in 3D space. This basically presupposes a sensor, such as a stereo camera, that can acquire such information. However, this premise can be relaxed.
  • For example, when each estimation device uses a monocular camera,
  • some estimations or estimation results are in an indefinite-scale state (a state in which the size of an object is not determined).
  • Even in that case, processing can be performed by simultaneously estimating the alignment (synchronization) of the camera poses and the adjustment of the scale when integrating the information of the specific example described above.
  • Also, if there is information from which the scale can be estimated, the scale can be estimated from that information, and other recognition results with an indefinite scale can then be estimated based on it; therefore, from this information as well, the processing of this technology can be applied to a system composed of monocular cameras.
  • Furthermore, the present technology can be extended by performing the integration processing based on the recognition results of a device that can obtain 3D information. Cooperation between such a device and monocular cameras is also conceivable; in this case, it is conceivable to interpolate the information of the monocular camera from the 3D information and the information of the device whose scale can be estimated.
  • In the above embodiment, cooperation between the HMD 100 and the mobile device 200 was described, but this technology can also be used, for example, for detecting the pose of a human body with a plurality of movable cameras.
  • In that case, each camera recognizes the others as other devices and, by integrating each other's observations, the system can be treated as a motion capture system composed of movable cameras.
  • In the above embodiment, the HMD 100 constitutes the first device and the information processing device, and the mobile device 200 constitutes the second device, but the present technology is not limited to this.
  • For example, the mobile device 200 may constitute the first device and the information processing device, and the HMD 100 may constitute the second device.
  • In that case, the AR display control is performed either by transmitting an AR display control signal based on the gesture recognition information from the mobile device 200 to the HMD 100 by communication,
  • or by transmitting the gesture recognition information from the mobile device 200 to the HMD 100 and performing the AR display control in the HMD 100 based on that gesture recognition information.
  • Further, while in the above the HMD 100 constitutes the first device and the mobile device 200 constitutes the second device, or the mobile device 200 constitutes the first device and the HMD 100 constitutes the second device,
  • another device, such as an external server connected to the mobile device 200 via a network, may constitute the information processing device.
  • In that case, the AR display control is performed either by transmitting an AR display control signal based on the gesture recognition information from the other device to the HMD 100 by communication,
  • or by transmitting the gesture recognition information from the other device to the HMD 100 and performing the AR display control in the HMD 100 based on that gesture recognition information.
  • The present technology can also have the following configurations.
  • (1) An information processing apparatus including: an information acquisition unit that acquires first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; an information receiving unit that receives, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and an information processing unit that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information of the gesture recognition target, and the second position/posture information of the gesture recognition target, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
  • (2) The information processing apparatus according to (1), wherein the information acquisition unit further acquires position/posture information of the second device, referenced to the position/posture of the first device, based on the sensor output of the first device, and the relative relationship information includes the position/posture information of the second device referenced to the position/posture of the first device.
  • (3) The information processing apparatus according to (2), wherein the information acquisition unit acquires the position/posture information of the second device based on recognition marker information displayed on the second device and contained in the sensor output of the first device.
  • (4) The information processing apparatus according to (1), wherein the information receiving unit further receives, from the second device, position/posture information of the first device referenced to the position/posture of the second device and acquired based on the sensor output of the second device, and the relative relationship information includes the position/posture information of the first device referenced to the position/posture of the second device.
  • (5) The information processing apparatus according to any one of (1) to (4), wherein the information processing unit spatially synchronizes the first position/posture of the gesture recognition target and the second position/posture of the gesture recognition target based on the relative relationship information between the positions/postures of the first device and the second device, time-synchronizes the first position/posture and the second position/posture of the gesture recognition target by a prediction process based on time stamp information added to the first position/posture information and the second position/posture information, and integrates the spatially and temporally synchronized first position/posture and second position/posture to determine the position/posture of the gesture recognition target.
  • (6) The information processing apparatus according to (5), wherein identification information is added to the first position/posture information of the gesture recognition target acquired by the information acquisition unit and to the second position/posture information of the gesture recognition target received by the information receiving unit, and the information processing unit integrates or separates the identification information based on each piece of position/posture information of the gesture recognition target and integrates the first position/posture and the second position/posture of the gesture recognition target associated with the same identification information.
  • (7) The information processing apparatus according to any one of (1) to (6), wherein the information receiving unit transmits an application start request to the second device, then transmits an information transmission request, and receives the second position/posture information of the gesture recognition target from the second device.
  • (8) The information processing apparatus according to (7), wherein the second device that has received the application start request updates the second position/posture information of the gesture recognition target at any time.
  • (9) The information processing apparatus according to any one of (1) to (8), wherein the first device is an augmented reality display device, and the apparatus further includes a display control unit that controls augmented reality display in the augmented reality display device based on the gesture recognition information.
  • (10) The information processing apparatus according to (9), wherein the gesture recognition target is located between the first device and the second device, and the augmented reality display is performed at a position corresponding to the second device.
  • (11) The information processing apparatus according to (10), wherein the first device is a head-mounted display having a transmissive display and the second device is a mobile device having a non-transmissive display.
  • (12) An information processing method including: a procedure of acquiring first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; a procedure of receiving, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and a procedure of determining the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information, and the second position/posture information, and recognizing the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
  • (13) A program that causes a computer to function as: an information acquisition means for acquiring first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; an information receiving means for receiving, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and an information processing means that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information, and the second position/posture information, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

According to the present invention, the accuracy of gesture recognition of a gesture recognition target is improved. First position and orientation information about the gesture recognition target is acquired on the basis of sensor outputs of a first apparatus with the position and orientation of the first apparatus as a reference. Second position and orientation information about the gesture recognition target is received from a second apparatus with the position and orientation of the second apparatus, which are acquired on the basis of sensor outputs of the second apparatus, as a reference. The position and orientation of the gesture recognition target are determined on the basis of relative relationship information about the positions and orientations of the first and second apparatuses, the first position and orientation information and the second position and orientation information about the gesture recognition target, and a gesture of the gesture recognition target is recognized on the basis of the determined position and orientation of the gesture recognition target.

Description

Information processing device, information processing method, and program
 This technology relates to an information processing device, an information processing method, and a program.
 Conventionally, for example, Patent Document 1 describes a user interface device that recognizes gestures of a user's hand using a camera attached to the ceiling. Because this user interface device uses a camera fixed to the ceiling, the accuracy of gesture recognition may decrease depending on the position of the user or the orientation of the hand.
Japanese Unexamined Patent Publication No. 2017-211960
 The purpose of this technology is to improve the accuracy of gesture recognition of a gesture recognition target.
 The concept of this technology is an information processing device including: an information acquisition unit that acquires first position/posture information of a gesture recognition target, referenced to the position/posture of a first device, based on the sensor output of the first device; an information receiving unit that receives, from a second device, second position/posture information of the gesture recognition target, referenced to the position/posture of the second device and acquired based on the sensor output of the second device; and an information processing unit that determines the position/posture of the gesture recognition target based on relative relationship information between the positions/postures of the first device and the second device, the first position/posture information of the gesture recognition target, and the second position/posture information of the gesture recognition target, and recognizes the gesture of the gesture recognition target based on the determined position/posture of the gesture recognition target.
In the present technology, the information acquisition unit acquires the first position and orientation information of the gesture recognition target, referenced to the position and orientation of the first device, on the basis of the sensor output of the first device. In addition, the information receiving unit receives, from the second device, the second position and orientation information of the gesture recognition target, referenced to the position and orientation of the second device and acquired on the basis of the sensor output of the second device.
Further, the information processing unit determines the position and orientation of the gesture recognition target on the basis of the relative relationship information on the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target. Then, the information processing unit recognizes the gesture of the gesture recognition target on the basis of the determined position and orientation of the gesture recognition target.
For example, the information acquisition unit may further acquire position and orientation information of the second device referenced to the position and orientation of the first device on the basis of the sensor output of the first device, and the relative relationship information may include the position and orientation information of the second device referenced to the position and orientation of the first device. In this case, for example, the information acquisition unit may acquire the position and orientation information of the second device on the basis of recognition marker information displayed on the second device and contained in the sensor output of the first device. Further, for example, the information receiving unit may further receive, from the second device, position and orientation information of the first device referenced to the position and orientation of the second device and acquired on the basis of the sensor output of the second device, and the relative relationship information may include the position and orientation information of the first device referenced to the position and orientation of the second device.
Further, for example, the information processing unit may spatially synchronize the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target on the basis of the relative relationship information on the positions and orientations of the first device and the second device, temporally synchronize the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target by prediction processing on the basis of time stamp information added to each of the first position and orientation information and the second position and orientation information of the gesture recognition target, and integrate the spatially and temporally synchronized first position and orientation and second position and orientation of the gesture recognition target to determine the position and orientation of the gesture recognition target.
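As a rough illustration only, the following is a minimal Python sketch of this flow, reduced to hand positions: the second observation is brought into the first device's frame, both observations are linearly predicted to a common time, and the results are averaged with inverse-variance weights. The function and field names are hypothetical and do not appear in the embodiment, and the arithmetic is deliberately simplified.

    import numpy as np

    def synchronize_and_integrate(obs_first, obs_second, rel_rotation, rel_translation, t_now):
        # obs_*: dicts with 'position' (3,), 'velocity' (3,), 'timestamp' (float), 'error' (variance).
        # rel_rotation (3x3), rel_translation (3,): pose of the second device in the first device's frame.

        # Spatial synchronization: express the second observation in the first device's frame.
        p2 = np.asarray(rel_rotation) @ np.asarray(obs_second['position']) + np.asarray(rel_translation)
        v2 = np.asarray(rel_rotation) @ np.asarray(obs_second['velocity'])

        # Temporal synchronization: linear prediction of both observations to the common time t_now.
        p1_now = np.asarray(obs_first['position']) + np.asarray(obs_first['velocity']) * (t_now - obs_first['timestamp'])
        p2_now = p2 + v2 * (t_now - obs_second['timestamp'])

        # Integration: inverse-variance weighted average of the two synchronized estimates.
        w1, w2 = 1.0 / obs_first['error'], 1.0 / obs_second['error']
        return (w1 * p1_now + w2 * p2_now) / (w1 + w2)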
In this case, for example, identification information may be assigned to each of the first position and orientation information of the gesture recognition target acquired by the information acquisition unit and the second position and orientation information of the gesture recognition target received by the information receiving unit, and the information processing unit may integrate or separate the identification information on the basis of each piece of position and orientation information of the gesture recognition target and integrate the first position and orientation and the second position and orientation of the gesture recognition target associated with the same identification information.
Further, for example, the information receiving unit may transmit an application activation request to the second device, then transmit an information transmission request, and receive the second position and orientation information of the gesture recognition target from the second device. In this case, for example, the second device that has received the application activation request may update the second position and orientation information of the gesture recognition target as needed.
Further, for example, the first device may be an augmented reality display device, and a display control unit that controls the augmented reality display on the augmented reality display device on the basis of the gesture recognition information may further be provided. In this case, for example, the gesture recognition target may be located between the first device and the second device, and the augmented reality display may be performed at a position corresponding to the second device. Further, in this case, for example, the first device may be a head mounted display having a transmissive display, and the second device may be a mobile device having a non-transmissive display.
As described above, in the present technology, a gesture of the gesture recognition target is recognized on the basis of the position and orientation of the gesture recognition target determined on the basis of the first position and orientation information of the gesture recognition target referenced to the position and orientation of the first device and the second position and orientation information of the gesture recognition target referenced to the position and orientation of the second device. Therefore, it is possible to improve the accuracy of gesture recognition of the gesture recognition target. Further, in the present technology, the first device and the second device can move freely.
FIG. 1 is a block diagram showing a configuration example of an AR display system as an embodiment.
FIG. 2 is a diagram showing an example of camera images of the HMD and the mobile device.
FIG. 3 is a block diagram showing a configuration example of the HMD.
FIG. 4 is a block diagram showing a configuration example of the mobile device.
FIG. 5 is a flowchart for explaining an outline of the processing of the HMD and the mobile device in the AR display system.
FIG. 6 is a flowchart for explaining the recognition processing (main) in the HMD.
FIG. 7 is a flowchart for explaining the details of the synchronization processing.
FIG. 8 is a diagram for explaining the spatial and temporal synchronization processing of the hand recognition results of the HMD and the mobile device.
FIG. 9 is a flowchart for explaining the details of the integration processing.
FIG. 10 is a diagram for explaining an example in which ID separation of hand recognition results is necessary.
FIG. 11 is a diagram for explaining an example in which ID integration of hand recognition results is necessary.
FIG. 12 is a flowchart for explaining the recognition processing (sub) in the mobile device.
Hereinafter, modes for carrying out the invention (hereinafter referred to as "embodiments") will be described. The description will be given in the following order.
1. Embodiment
2. Modification example
<1. Embodiment>
"AR display system"
FIG. 1 shows a configuration example of an AR (Augmented Reality) display system 10 as an embodiment. The AR display system 10 includes an HMD (Head Mounted Display) 100 having a transmissive display as an AR display device, and a mobile device 200 having a non-transmissive display, such as a smartphone or a tablet. In this embodiment, the HMD 100 constitutes a first device and an information processing device, and the mobile device 200 constitutes a second device.
The HMD 100 is worn on the head of a user 300 so that the transmissive display is located at the eye position. The mobile device 200 is held in the right hand of the user 300. The HMD 100 recognizes the mobile device 200 on the basis of the output of a sensor such as a camera, and displays a virtual book 400 as an AR display (AR superimposed object) at a position corresponding to the mobile device 200, in the illustrated example so as to be superimposed on the mobile device 200.
The HMD 100 repeatedly acquires, on the basis of the output of a sensor such as a camera, its own position and orientation information, position and orientation information of the mobile device 200 referenced to its own position and orientation, and position and orientation information of the left hand of the user 300 as a gesture recognition target. Similarly, the mobile device 200 repeatedly acquires, on the basis of the output of a sensor such as a camera, its own position and orientation information, position and orientation information of the HMD 100 referenced to its own position and orientation, and position and orientation information of the left hand of the user 300 as a gesture recognition target.
The HMD 100 receives from the mobile device 200 the information acquired by the mobile device 200. Then, on the basis of the information acquired by itself and the information received from the mobile device 200, the HMD 100 spatially and temporally synchronizes the position and orientation information of the left hand of the user 300 acquired by the HMD 100 and by the mobile device 200, and then integrates them to determine the position and orientation of the left hand of the user 300.
The HMD 100 recognizes a gesture of the left hand of the user 300 on the basis of the position and orientation of the left hand determined as described above, or further on the basis of its temporal change, and controls the AR display on the basis of the recognition information of this gesture. For example, when the gesture of the left hand of the user 300 is a motion of turning a page of the virtual book 400, the display of the book 400 is changed so that the page is turned.
In this way, the gesture (pose) of the left hand of the user 300 is recognized on the basis of the result of integrating the position and orientation information of the left hand referenced to the position and orientation of the HMD 100 and the position and orientation information of the left hand referenced to the position and orientation of the mobile device 200. Therefore, it is possible to improve the accuracy of gesture recognition of the left hand of the user 300.
For example, consider a case where the gesture of the left hand of the user 300 is a motion of turning a page of the virtual book 400. FIG. 2(a) shows an image of the camera mounted on the HMD 100. In this camera image of the HMD 100, the left hand of the user 300 is captured from the back of the hand, so fine finger poses cannot be observed. FIG. 2(b) shows an image of the camera mounted on the mobile device 200. In this camera image of the mobile device 200, the left hand of the user 300 is captured from the palm side, so fine finger poses can also be observed.
Therefore, by recognizing the gesture of the left hand of the user 300 on the basis of the result of integrating the position and orientation information of the left hand obtained by the HMD 100 and the mobile device 200, it is possible to improve the recognition accuracy, and recognition that is difficult with observation from a single camera image becomes possible. Further, in the AR display system 10, the HMD 100 and the mobile device 200 can move freely.
"Configuration example of HMD"
FIG. 3 shows a configuration example of the HMD 100. The HMD 100 includes a camera 101, an IMU (Inertial Measurement Unit) 102, an information processing unit 103, a communication unit 104, a transmissive display 105, and an application/recognition information storage 106.
The camera 101 includes a lens and an image sensor such as a CCD image sensor or a CMOS image sensor. For example, two cameras 101 are provided on the outer surface of the front portion of the HMD 100, and capture an object (subject) existing ahead in the user's line-of-sight direction. The IMU 102 acquires acceleration and angular acceleration information of the HMD 100.
The information processing unit 103 includes a CPU (Central Processing Unit) and the like, and performs various kinds of processing on the basis of various programs stored in a storage unit (not shown). The information processing unit 103 includes a self-position estimation processing unit 131, an other-device position estimation processing unit 132, a hand recognition processing unit 133, a main recognition integration processing unit 134, and an image generation/application processing unit 135.
The self-position estimation processing unit 131 uses an algorithm such as SLAM (Simultaneous Localization and Mapping) to estimate the position and orientation of the HMD 100 on the basis of the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the self-position estimation processing unit 131 can also estimate information such as the error of the position and orientation estimation.
The other-device position estimation processing unit 132 estimates the position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100 by techniques such as object recognition (for example, marker recognition) and tracking, on the basis of the image obtained by the camera 101. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the other-device position estimation processing unit 132 can also estimate information such as the error of the position and orientation estimation. The position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100 constitutes relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200.
The hand recognition processing unit 133 recognizes a hand on the basis of the image obtained by the camera 101, and estimates the position, orientation, and velocity of the hand referenced to the position and orientation of the HMD 100. The hand recognition processing unit 133 also estimates the pose, for example the position and velocity of each joint. The hand position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system), and the position information of each joint includes, for example, three-dimensional information (x, y, z in an orthogonal coordinate system). Note that, as pose estimation, the angle of each finger may be estimated instead of the position of each joint. In the following description, an example of estimating the position of each joint as pose estimation will be described.
In this case, the hand recognition processing unit 133 can also estimate information such as the error of the hand position and orientation estimation and the error of the position estimation of each joint. The velocity of the hand and the velocity of each joint can be estimated from temporal changes in their positions, and the like.
Note that the hand recognition result here consists of the position and orientation of the hand and the position of each joint, but other information may be used as long as the hand pose can be restored from it. For example, information such as the rotation of each joint in relative coordinates, the rotation in absolute coordinates, or the position of each joint in world coordinates is also conceivable.
The self-position estimation processing unit 131, the other-device position estimation processing unit 132, and the hand recognition processing unit 133 perform processing as needed and update their information. The information obtained from these processing units is summarized below. The time at which the information was acquired (observation time) is added to the information as a time stamp.
(1) Observation time
(2) Position, orientation, and velocity of the hand, and estimation error
(3) Position and velocity of each joint, and estimation error
(4) Position and orientation of the own device (HMD 100), and estimation error
(5) Position and orientation of the other device (mobile device 200), and estimation error
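The items (1) to (5) above can be thought of as one time-stamped observation record; the same structure applies on the mobile device 200 side with the roles of (4) and (5) swapped. The following is a minimal Python sketch of such a record, where the class and field names are hypothetical and only illustrate the structure, not an actual interface of the device.

    from dataclasses import dataclass
    from typing import List, Tuple

    Vec3 = Tuple[float, float, float]
    Pose6D = Tuple[float, float, float, float, float, float]  # x, y, z, pitch, yaw, roll

    @dataclass
    class HandObservation:
        timestamp: float                 # (1) observation time
        hand_pose: Pose6D                # (2) position and orientation of the hand
        hand_velocity: Pose6D            #     velocity of the hand
        hand_pose_error: float           #     estimation error
        joint_positions: List[Vec3]      # (3) position of each joint
        joint_velocities: List[Vec3]     #     velocity of each joint
        joint_errors: List[float]        #     estimation error of each joint
        self_pose: Pose6D                # (4) position and orientation of the own device (HMD 100)
        self_pose_error: float
        other_pose: Pose6D               # (5) position and orientation of the other device (mobile device 200)
        other_pose_error: float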
The main recognition integration processing unit 134 takes the information obtained as needed by the self-position estimation processing unit 131, the other-device position estimation processing unit 132, and the hand recognition processing unit 133, acquires the same kind of information from the mobile device 200 through the communication unit 104, and integrates the hand recognition results (hand position and orientation, position of each joint) of the left hand of the user 300 acquired by the HMD 100 and the mobile device 200 after synchronizing them spatially and temporally. The processing of the main recognition integration processing unit 134 will be described further later.
The image generation/application processing unit 135 performs the processing necessary for the operation of the application, and performs rendering processing for displaying the virtual book 400 as an AR display. In addition, the image generation/application processing unit 135 receives the integration result of the main recognition integration processing unit 134, recognizes the gesture of the left hand of the user 300 on the basis of the hand recognition result of the left hand, and performs interaction processing that controls the AR display on the basis of the recognition information of this gesture. For example, when the gesture of the left hand of the user 300 is turning a page of the book 400, processing for turning the page of the book 400 is performed.
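For illustration only, the interaction processing could look roughly like the following sketch, where a page-turn gesture is inferred from the integrated hand state and the book state is updated. The gesture criterion (a sideways swipe above a speed threshold), the state layout, and all names are hypothetical and not taken from the embodiment.

    def interaction_step(book_state, hand_state, swipe_speed_threshold=0.5):
        # hand_state: integrated result with 'velocity' (x, y, z) in the HMD world frame.
        vx = hand_state['velocity'][0]
        if vx > swipe_speed_threshold:
            book_state['page'] = max(0, book_state['page'] - 1)   # turn back one page
        elif vx < -swipe_speed_threshold:
            book_state['page'] = book_state['page'] + 1           # turn forward one page
        return book_state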
The communication unit 104 communicates with the mobile device 200 wirelessly (for example, Wi-Fi (Wireless Fidelity) or Li-Fi (Light Fidelity)) or by wire. The transmissive display 105 performs AR display on the basis of the image data supplied from the image generation/application processing unit 135.
The application/recognition information storage 106 holds information necessary for the application. It also holds information necessary for recognition and the like. Examples of the information held for recognition are a localization map for SLAM, information for marker recognition, and information for hand recognition. In the example of FIG. 3, the application/recognition information storage 106 is connected only to the main recognition integration processing unit 134 and the image generation/application processing unit 135, but it may be connected to other processing units.
"Configuration example of mobile device"
FIG. 4 shows a configuration example of the mobile device 200. The mobile device 200 includes a camera 201, an IMU 202, an information processing unit 203, a communication unit 204, a non-transmissive display 205, and a recognition information storage 206.
The camera 201 includes a lens and an image sensor such as a CCD image sensor or a CMOS image sensor. The camera 201 is a stereo camera, is provided on the display surface side of the mobile device 200, and captures an object (subject) existing on the display surface side. The IMU 202 acquires acceleration and angular acceleration information of the mobile device 200.
The information processing unit 203 includes a CPU (Central Processing Unit) and the like, and performs various kinds of processing on the basis of various programs stored in a storage unit (not shown). The information processing unit 203 includes a self-position estimation processing unit 231, an other-device position estimation processing unit 232, a hand recognition processing unit 233, a sub recognition integration processing unit 234, and a recognition marker image generation unit 235.
The self-position estimation processing unit 231 uses an algorithm such as SLAM to estimate the position and orientation of the mobile device 200 on the basis of the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the self-position estimation processing unit 231 can also estimate information such as the error of the position and orientation estimation.
The other-device position estimation processing unit 232 estimates the position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 by techniques such as object recognition and tracking, on the basis of the image obtained by the camera 201. The position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system). In this case, the other-device position estimation processing unit 232 can also estimate information such as the error of the position and orientation estimation. The position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 constitutes relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200.
The hand recognition processing unit 233 recognizes a hand on the basis of the image obtained by the camera 201, and estimates the position, orientation, and velocity of the hand referenced to the position and orientation of the mobile device 200. The hand recognition processing unit 233 also estimates the pose, for example the position and velocity of each joint. The hand position and orientation information estimated in this way includes, for example, six-dimensional information (x, y, z, pitch, yaw, roll in an orthogonal coordinate system), and the position information of each joint includes, for example, three-dimensional information (x, y, z in an orthogonal coordinate system).
In this case, the hand recognition processing unit 233 can also estimate information such as the error of the hand position and orientation estimation and the error of the position estimation of each joint. The velocity of the hand and the velocity of each joint can be estimated from temporal changes in their positions, and the like.
The self-position estimation processing unit 231, the other-device position estimation processing unit 232, and the hand recognition processing unit 233 perform processing as needed and update their information. The information obtained from these processing units is summarized below. The time at which the information was acquired (observation time) is added to the information as a time stamp.
(1) Observation time
(2) Position, orientation, and velocity of the hand, and estimation error
(3) Position and velocity of each joint, and estimation error
(4) Position and orientation of the own device (mobile device 200), and estimation error
(5) Position and orientation of the other device (HMD 100), and estimation error
The sub recognition integration processing unit 234 transmits the information obtained as needed by the self-position estimation processing unit 231, the other-device position estimation processing unit 232, and the hand recognition processing unit 233 to the HMD 100 through the communication unit 204, in response to an information transmission request sent from the HMD 100 through the communication unit 204. The processing of the sub recognition integration processing unit 234 will be described further later.
The recognition marker image generation unit 235 acquires the image data of the recognition marker from the recognition information storage 206 and supplies it to the non-transmissive display 205 to display the recognition marker. The recognition marker is displayed in response to an instruction from the sub recognition integration processing unit 234, based on the application activation request received from the HMD 100 via the communication unit 204.
The communication unit 204 communicates with the HMD 100 wirelessly or by wire. The non-transmissive display 205 displays the recognition marker on the basis of the image data supplied from the recognition marker image generation unit 235.
The recognition information storage 206 holds information necessary for recognition and the like. Examples of the information held for recognition are the image data of the recognition marker described above, as well as a localization map for SLAM and information for hand recognition. In the example of FIG. 4, the recognition information storage 206 is connected only to the sub recognition integration processing unit 234 and the recognition marker image generation unit 235, but it may be connected to other processing units.
"Processing of HMD and mobile device in AR display system"
The outline of the processing of the HMD 100 and the mobile device 200 in the AR display system 10 shown in FIG. 1 will be described with reference to the flowchart of FIG. 5.
In step ST1, the HMD 100 starts the application. Next, in step ST2, the HMD 100 requests the mobile device 200, which is the sub device, to start the application. In response to the application activation request from the HMD 100, the mobile device 200 starts the application in step ST11 and displays the recognition marker on the non-transmissive display 205. The mobile device 200 then performs the recognition processing (sub) in step ST12. The details of this recognition processing (sub) will be described later.
After the processing of step ST2, the HMD 100 performs recognition processing of the mobile device 200, the sub device, in step ST3. In this case, the HMD 100 estimates the position and orientation of the mobile device 200 on the basis of the recognition marker displayed on the non-transmissive display 205 of the mobile device 200.
After the position and orientation are estimated in step ST3, in step ST4 the HMD 100 starts rendering, in which the virtual book 400 is superimposed and displayed at the position of the mobile device 200 as an AR display, and interaction, which is control of the AR display based on the recognition information of the gesture of the left hand of the user 300. The HMD 100 then performs the recognition processing (main) in step ST5. The details of this recognition processing (main) will be described later.
In parallel with the recognition processing (main) in step ST5, in step ST6 the HMD 100 updates the state of the virtual book 400 as necessary on the basis of the recognition information of the gesture of the left hand of the user 300. For example, when the gesture recognition information indicates that a page of the book 400 is being turned, the state is updated so that the page of the book 400 is turned.
After that, when the end of the application is instructed, for example, by an operation of the user 300, the HMD 100 performs application end processing in step ST7. In this processing, an end signal is transmitted to the mobile device 200, the sub device, and the recognition processing (main) is ended. The mobile device 200 receives the end signal from the HMD 100 and performs application end processing in step ST13. In this processing, the recognition processing (sub) is ended and the marker display is stopped.
"Recognition processing (main)"
With reference to the flowchart of FIG. 6, the recognition processing (main) in the HMD 100, that is, the details of the processing of step ST5 in the flowchart of FIG. 5, will be described. The HMD 100 repeatedly executes the processing of the flowchart of FIG. 6.
First, in step ST21, the HMD 100 uses, in the self-position estimation processing unit 131, an algorithm such as SLAM to estimate the position and orientation of the HMD 100 itself on the basis of the image obtained by the camera 101 and the information (acceleration, angular acceleration) obtained by the IMU 102. In this case, the error of the position and orientation estimation is also estimated.
Next, in step ST22, the HMD 100 estimates, in the other-device position estimation processing unit 132, the position and orientation of the mobile device 200, the other device, referenced to the position and orientation of the HMD 100 by techniques such as object recognition (for example, marker recognition) and tracking on the basis of the image obtained by the camera 101. In this case, the error of the position and orientation estimation is also estimated. The position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100 constitutes relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200.
Next, in step ST23, the HMD 100 recognizes, in the hand recognition processing unit 133, a hand on the basis of the image obtained by the camera 101, estimates the position, orientation, and velocity of the hand referenced to the position and orientation of the HMD 100, and further estimates the position and velocity of each joint as pose estimation. In this case, information such as the error of the hand position and orientation estimation and the error of the position estimation of each joint is also estimated.
Next, in step ST24, the HMD 100 sends, in the main recognition integration processing unit 134, an information transmission request to the mobile device 200, the other device, through the communication unit 104, and receives the information from the mobile device 200. The information received in this way includes the information acquired in the mobile device 200 by the self-position estimation processing unit 231, the other-device position estimation processing unit 232, and the hand recognition processing unit 233.
That is, the received information includes the position and orientation of the mobile device 200 and its estimation error, the position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 and its estimation error, the position, orientation, and velocity of the hand referenced to the position and orientation of the mobile device 200 and their estimation error, and the position and velocity of each joint and their estimation error. The position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 constitutes relative relationship information on the positions and orientations of the mobile device 200 and the HMD 100.
Next, in step ST25, the HMD 100, in the main recognition integration processing unit 134, spatially and temporally synchronizes (initializes) the hand recognition results (including the position and orientation of the hand and the position of each joint) estimated by the HMD 100 itself and by the mobile device 200, the other device.
The flowchart of FIG. 7 shows the synchronization processing in step ST25 in more detail. In step ST31, the HMD 100 aligns the position and orientation of the mobile device 200 estimated by the mobile device 200 itself with the position and orientation of the mobile device 200 estimated with the position and orientation of the HMD 100 as a reference. As a result, the position and orientation of the mobile device 200 are unified in the world coordinate system seen from the HMD 100.
Next, in step ST32, on the basis of the aligned position and orientation of the mobile device 200, the HMD 100 transforms the hand recognition result estimated with the position and orientation of the mobile device 200 as a reference into the world coordinate system seen from the HMD 100. As a result, the hand position and orientation estimated with the HMD 100 as a reference (including the position of each joint) and the hand position and orientation estimated with the mobile device 200 as a reference (including the position of each joint) are unified in the world coordinate system seen from the HMD 100 and become spatially synchronized.
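A minimal sketch of this spatial alignment, assuming the pose is represented as a rotation matrix and a translation vector and limiting the hand recognition result to joint positions, is shown below; the function and variable names are hypothetical.

    import numpy as np

    def transform_hand_to_hmd_world(R_hmd_mobile, t_hmd_mobile, joints_in_mobile):
        # R_hmd_mobile (3x3), t_hmd_mobile (3,): position and orientation of the mobile device
        # expressed in the world coordinate system seen from the HMD (step ST31).
        R = np.asarray(R_hmd_mobile)
        t = np.asarray(t_hmd_mobile)
        joints = np.asarray(joints_in_mobile)  # (N, 3) joint positions referenced to the mobile device
        # Step ST32: bring the hand recognition result into the world coordinate system seen from the HMD.
        return (R @ joints.T).T + t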
In the above description, an example has been shown in which the position and orientation of the mobile device 200 estimated with the position and orientation of the HMD 100 as a reference are used to unify the results in the world coordinate system seen from the HMD 100. However, the present technology is not limited to this, and it is also conceivable to use the position and orientation of the HMD 100 estimated with the position and orientation of the mobile device 200 as a reference and to unify the results in the world coordinate system seen from the mobile device 200.
It has also been explained above that the position and orientation of the mobile device 200 referenced to the position and orientation of the HMD 100, or the position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200, is used as the relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200. It is also conceivable to use observation information of an identical object as the relative relationship information on the positions and orientations of the HMD 100 and the mobile device 200. Examples are observation of the same environment (initialization using a SLAM map) and observation information of an identical object that can be observed in common (hand recognition, a special marker, or another object or process for which it can be confirmed that the observed object is identical).
Next, in step ST33, the HMD 100 predicts, at the current time, the hand recognition results estimated by itself and by the other device. This prediction is performed on the basis of information such as the velocity of the hand, the velocity of each joint, and the observation time. The prediction may be prediction by linear interpolation, or interpolation by curve fitting or machine learning. As a result, the hand recognition result estimated with the HMD 100 as a reference and the hand recognition result estimated with the mobile device 200 as a reference become temporally synchronized.
As described above, in the present technology, the hand recognition results estimated by the HMD 100 and the mobile device 200 are temporally synchronized, so synchronization of the observations of the HMD 100 and the mobile device 200 is not required; however, alignment of the internal clocks of the HMD 100 and the mobile device 200 is required.
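In the simplest linear case, the prediction of step ST33 could be sketched as follows; the names are hypothetical, and curve interpolation or a learned model would replace the linear term in more elaborate variants.

    import numpy as np

    def predict_to_time(joint_positions, joint_velocities, t_observed, t_now):
        # Linear prediction of each joint position from its observed velocity,
        # moving the observation from t_observed to the common time t_now.
        dt = t_now - t_observed
        return np.asarray(joint_positions) + np.asarray(joint_velocities) * dt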
FIG. 8(a) shows, with a solid line, the hand recognition result estimated by the HMD 100 in the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line. In the illustrated example, the position of each circle indicates the position of a joint, and the color of each circle indicates the observation error. Points that are far from the camera, difficult to observe, and have a large error are shown in black; conversely, points that are easy to observe and have a small error are shown in white.
FIG. 8(b) shows, with a solid line, the hand recognition result estimated by the mobile device 200 and transformed into the world coordinate system seen from the HMD 100, and shows the prediction to the current time with a broken line. In the illustrated example, the position of each circle indicates the position of a joint, and the color of each circle indicates the observation error. Points that are far from the camera, difficult to observe, and have a large error are shown in black; conversely, points that are easy to observe and have a small error are shown in white.
FIG. 8(c) shows the predictions to the current time of FIGS. 8(a) and 8(b) superimposed. That is, FIG. 8(c) shows the hand recognition results estimated by the HMD 100 and the mobile device 200 after spatial and temporal synchronization.
Note that instead of future prediction from the observation time to the current time, conversion from the observation time to an earlier time may be used. In this case, the hand positions and orientations (including the positions of the joints) estimated by the own device and the other device are aligned to the older of the observation times. In that case, interpolation can also be performed by storing time-series information.
Returning to the description of the flowchart of FIG. 6, after the processing of step ST25, in step ST26 the HMD 100, in the main recognition integration processing unit 134, integrates the spatially and temporally synchronized hand recognition results (including the position and orientation of the hand and the position of each joint) of the own device and the other device.
The flowchart of FIG. 9 shows the integration processing in step ST26 in more detail. In step ST41, the HMD 100 determines whether the identity of the hand recognition results having the same ID (same identifier) is maintained.
In this case, whether the identity is maintained is verified on the basis of the poses and the positions and orientations of the hands having the same ID. This verification is computed from the distance between the observed hand positions and orientations and the distance between the poses; if these exceed a certain threshold, it is determined that the identity is not maintained. The pose distance can be calculated from the distances between the positions of the joints or from the differences in joint rotations.
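A minimal sketch of this identity check, assuming a joint-distance-based pose distance and hypothetical names and thresholds, is shown below.

    import numpy as np

    def identity_maintained(result_a, result_b, position_threshold=0.05, pose_threshold=0.03):
        # result_*: dicts with 'position' (3,) of the hand and 'joints' (N, 3), already synchronized.
        pa, pb = np.asarray(result_a['position']), np.asarray(result_b['position'])
        ja, jb = np.asarray(result_a['joints']), np.asarray(result_b['joints'])
        position_distance = np.linalg.norm(pa - pb)
        pose_distance = np.mean(np.linalg.norm(ja - jb, axis=1))
        # Identity is regarded as maintained only while both distances stay below their thresholds.
        return position_distance < position_threshold and pose_distance < pose_threshold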
Next, in step ST42, the HMD 100 refers to the determination in step ST41 and determines whether there is a hand recognition result for which the identity is not maintained and ID separation is necessary. When it is determined that there is a hand recognition result that requires ID separation, the HMD 100 assigns a separate ID to each hand recognition result in step ST43 to separate the IDs, and then proceeds to the processing of step ST44. On the other hand, when it is determined in step ST42 that there is no hand recognition result that requires ID separation, the HMD 100 immediately proceeds to the processing of step ST44.
In step ST44, the HMD 100 determines whether hand recognition results having different IDs should be integrated. This determination is the reverse of the identity determination for ID separation described above: when the distance between the hand positions and orientations and the distance between the poses are equal to or less than a certain threshold, the results are determined to be the same hand and to be integrated.
Next, in step ST45, the HMD 100 refers to the determination in step ST44 and determines whether there is a hand recognition result that requires ID integration. When it is determined that there is a hand recognition result that requires ID integration, the HMD 100 integrates the IDs in step ST46, and then proceeds to the processing of step ST47. On the other hand, when it is determined in step ST45 that there is no hand recognition result that requires ID integration, the HMD 100 immediately proceeds to the processing of step ST47.
When a hand without an ID is observed (a case where, apart from the tracked hand, a hand is suddenly recognized at a distant position, when the first hand recognition processing starts running, or when a new hand appears within the angle of view of the camera), an unused, unique ID is assigned to the hand recognition result of such an unknown hand.
Further, IDs may be assigned to hand recognition results on the basis of the identification ID of each individual's hand. In that case, the ID is assigned at the time of hand recognition. This processing is considered to be effective also when a tracked hand is lost and then reappears.
ID separation is necessary when observed hands appear to overlap depending on the camera position and are erroneously recognized as one hand, and it later turns out that there are two hands. Conversely, ID integration is necessary in a case where, due to erroneous pose recognition, the same hand is treated as having been observed at different places in space and is recognized as two hands, and thereafter the position and orientation of the camera are corrected to the right position, it is recognized that the hands are the same, and they are integrated.
FIG. 10 shows an example in which ID separation of hand recognition results is necessary. In the state where two hands overlap as in FIG. 10(a), they may be erroneously recognized as the same single hand. However, as shown in FIG. 10(b), when subsequent recognition makes it possible to recognize that they are two separate hands, they need to be registered with separate IDs.
FIG. 11 shows an example in which ID integration is necessary. As shown in FIG. 11(a), even if the same hand is recognized while the self-position and orientation estimation of the camera is erroneous, it is recognized as hands in different spaces. However, as shown in FIG. 11(b), when the self-position and orientation are corrected and recognized at the right position, it turns out that the hands registered as separate hands are the same. In that case, the two hand recognition results need to be integrated as one hand.
 ステップST47の処理の時点では、同一性と同期が取れた手認識結果が得られる。このステップST47において、HMD100は、同期・同一性がとれた手認識結果を統合する。まず、HMD100は、手認識結果のうち、手の位置姿勢の統合を行う。この統合は、例えば、拡張カルマンフィルタ(Extended Kalman Filter)、あるいは通常のカルマンフィルタ(Kalman Filter)やパーティクルフィルタ(Particle Filter)を用いて行われる。あるいは、この統合は、例えば、重み付き平均や、単純な位置の平均を求めることで行われる。手の位置姿勢の推定誤差は、それらフィルタの入力として、または重みとして使用が可能であり、統合の精度を高めることができる。 At the time of processing in step ST47, a hand recognition result synchronized with the identity can be obtained. In this step ST47, the HMD 100 integrates synchronized and identical hand recognition results. First, the HMD 100 integrates the position and posture of the hand among the hand recognition results. This integration is performed, for example, by using an extended Kalman filter, a normal Kalman filter, or a particle filter. Alternatively, this integration is done, for example, by finding a weighted average or a simple position average. The hand position / orientation estimation error can be used as an input for those filters or as a weight to improve the accuracy of integration.
 Next, the HMD 100 integrates the positions of the individual joints among the hand recognition results. This integration is likewise performed using, for example, an extended Kalman filter, an ordinary Kalman filter, or a particle filter, or by computing a weighted average or a simple average of the positions. The estimation error of each joint position can be used as a filter input or as a weight, which improves the accuracy of the integration.
 In the description above, the integration processing first integrates the hand position and orientation and then integrates the positions of the individual joints. However, the joint positions may instead be integrated from the beginning.
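 As an illustration of the weighted-average option mentioned above (and only as an illustration, not as the disclosed implementation), the following sketch fuses the hand position and the joint positions reported by the two devices using inverse-variance weights derived from the reported estimation errors; the record layout and the use of a single scalar standard deviation per estimate are assumptions of the sketch, and a Kalman or particle filter could be used instead, as described.

```python
import numpy as np

# Hypothetical sketch: fuse two synchronized hand recognition results by
# inverse-variance weighting. Each estimate carries a 3D position and a scalar
# standard deviation describing its estimation error; a smaller error gives a
# larger weight in the fused result.

def fuse_positions(pos_a, sigma_a, pos_b, sigma_b):
    pos_a, pos_b = np.asarray(pos_a, float), np.asarray(pos_b, float)
    w_a, w_b = 1.0 / sigma_a ** 2, 1.0 / sigma_b ** 2
    fused = (w_a * pos_a + w_b * pos_b) / (w_a + w_b)
    fused_sigma = (1.0 / (w_a + w_b)) ** 0.5     # error of the fused estimate
    return fused, fused_sigma

def fuse_hand_results(hand_a, hand_b):
    """hand_a / hand_b: dicts with 'position', 'sigma', and 'joints'
    ({joint_name: (position, sigma)}). Assumes both results report the same
    joint set after identity matching and synchronization."""
    fused_pos, fused_sigma = fuse_positions(hand_a["position"], hand_a["sigma"],
                                            hand_b["position"], hand_b["sigma"])
    fused_joints = {}
    for name in hand_a["joints"]:
        (pa, sa), (pb, sb) = hand_a["joints"][name], hand_b["joints"][name]
        fused_joints[name] = fuse_positions(pa, sa, pb, sb)
    return {"position": fused_pos, "sigma": fused_sigma, "joints": fused_joints}
```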
 Returning to the flowchart of FIG. 6, after the processing of step ST26, the HMD 100 feeds back the integration result in step ST27. This feedback includes feedback to the HMD 100 itself and feedback to the mobile device 200, which is the other device. Feedback to the mobile device 200 is performed by transmitting the integration result to the mobile device 200 through the communication unit 104. Feeding back the integration result improves the accuracy of hand recognition (estimation of the hand position and orientation and estimation of each joint position).
 "Recognition processing (sub)"
 The recognition processing (sub) in the mobile device 200, that is, the details of the processing of step ST12 in the flowchart of FIG. 5, will be described with reference to the flowchart of FIG. 12. The mobile device 200 repeatedly executes the processing of the flowchart of FIG. 12.
 First, in step ST51, the mobile device 200 uses the self-position estimation processing unit 231 to estimate the position and orientation of the mobile device 200 itself, using an algorithm such as SLAM, based on the image obtained by the camera 201 and the information (acceleration, angular acceleration) obtained by the IMU 202. In this case, the error of the position and orientation estimation is also estimated.
 Also in step ST51, the mobile device 200 uses the other-device position estimation processing unit 232 to estimate, based on the image obtained by the camera 201 and by techniques such as object recognition and tracking, the position and orientation of the HMD 100, which is the other device, with the position and orientation of the mobile device 200 as the reference. In this case, the error of the position and orientation estimation is also estimated. The position and orientation of the HMD 100 referenced to the position and orientation of the mobile device 200 constitute relative relationship information between the positions and orientations of the HMD 100 and the mobile device 200.
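 By way of illustration only, the relative relationship information can be pictured as a rigid transform between the two device frames. The sketch below assumes poses represented as 4x4 homogeneous matrices, composes the mobile device's self position and orientation (world from mobile) with the observed HMD pose (mobile from HMD), and re-expresses a hand position given in the mobile frame in the world or HMD frame; the matrix convention and helper names are assumptions of the sketch.

```python
import numpy as np

# Hypothetical sketch: poses as 4x4 homogeneous transforms T_parent_child,
# i.e. a point p expressed in the child frame maps to T_parent_child @ p.

def compose(T_a_b, T_b_c):
    return T_a_b @ T_b_c                      # yields T_a_c

def invert(T_a_b):
    R, t = T_a_b[:3, :3], T_a_b[:3, 3]
    T_b_a = np.eye(4)
    T_b_a[:3, :3] = R.T
    T_b_a[:3, 3] = -R.T @ t
    return T_b_a

def hand_in_world(T_world_mobile, hand_pos_mobile):
    """Re-express a hand position from the mobile frame in the world frame."""
    p = np.append(np.asarray(hand_pos_mobile, float), 1.0)   # homogeneous point
    return (T_world_mobile @ p)[:3]

def hand_in_hmd(T_mobile_hmd, hand_pos_mobile):
    """T_mobile_hmd: HMD pose observed from the mobile device (the relative
    relationship information). Re-expresses a hand position in the HMD frame."""
    p = np.append(np.asarray(hand_pos_mobile, float), 1.0)
    return (invert(T_mobile_hmd) @ p)[:3]
```

 For example, compose(T_world_mobile, T_mobile_hmd) gives the HMD pose in the world frame, which is the kind of quantity the spatial synchronization step can work with.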
 Also in step ST51, the mobile device 200 uses the hand recognition processing unit 233 to recognize the hand based on the image obtained by the camera 201, to estimate the position, orientation, and velocity of the hand with the position and orientation of the mobile device 200 as the reference, and to estimate the position and velocity of each joint as pose estimation. In this case, information such as the estimation error of the hand position and orientation and the estimation error of each joint position is also estimated.
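 One simple way to obtain the velocity estimates mentioned here, shown only as an illustrative sketch, is a finite difference between two timestamped recognition results; the record layout is an assumption of the sketch.

```python
import numpy as np

# Hypothetical sketch: estimate hand and joint velocities from two timestamped
# hand recognition results by finite differences.

def estimate_velocities(prev, curr):
    """prev / curr: dicts with 't' (seconds), 'position' (3D) and
    'joints' {name: 3D position}. Returns (hand_velocity, joint_velocities)."""
    dt = curr["t"] - prev["t"]
    if dt <= 0.0:
        raise ValueError("timestamps must be increasing")
    hand_vel = (np.asarray(curr["position"], float) -
                np.asarray(prev["position"], float)) / dt
    joint_vel = {
        name: (np.asarray(curr["joints"][name], float) -
               np.asarray(prev["joints"][name], float)) / dt
        for name in curr["joints"] if name in prev["joints"]
    }
    return hand_vel, joint_vel
```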
 Next, in step ST52, the mobile device 200 determines, in the sub recognition integration processing unit 234, whether there is a request for information transmission from the HMD 100, which is the other device. When there is a request for information transmission, the mobile device 200, in step ST53, transmits the information estimated in step ST51 from the sub recognition integration processing unit 234 to the HMD 100 through the communication unit 204, and then proceeds to the processing of step ST54. On the other hand, when there is no request for information transmission in step ST52, the mobile device 200 immediately proceeds to the processing of step ST54.
 In step ST54, the mobile device 200 determines, in the sub recognition integration processing unit 234, whether an integration result has been received from the other device. When an integration result has been received, the mobile device 200, in step ST55, integrates the received information into the past estimation information in the sub recognition integration processing unit 234 and updates the past estimation information used for hand recognition and hand pose estimation. Updating the past estimation information used for hand recognition and hand pose estimation with the received information (the integration result from the other device) in this way improves the accuracy of hand recognition and hand pose estimation. On the other hand, when no integration result has been received from the other device in step ST54, the mobile device 200 ends the processing. In this case, the past estimation information used for hand recognition and hand pose estimation is not updated.
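 How the received integration result might be folded into the locally stored past estimation can be pictured with the following sketch, which blends the two with a fixed trust factor; the gain value and the choice of a simple blend rather than a filter are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical sketch: update the locally stored past estimate with the
# integration result received from the other device, trusting the received
# result by a fixed gain between 0 and 1.

def update_past_estimate(past_position, received_position, gain=0.7):
    """Blend the received integration result into the stored past estimate.
    gain is an assumed trust factor for the received result."""
    past = np.asarray(past_position, dtype=float)
    received = np.asarray(received_position, dtype=float)
    return (1.0 - gain) * past + gain * received
```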
 The order of the processes in the flowchart of FIG. 12 is not limited to the order described above. For example, the self-position estimation, other-device position estimation, and hand recognition / hand pose estimation processes may be performed in parallel with the reception and transmission processes. In that case, when there is a request for information transmission, the mobile device 200 transmits the latest estimation result to the HMD 100, and when an integration result is received from the HMD 100, the mobile device 200 integrates the integration result (received information) with the latest estimation result.
 As described above, in the present technology, the hand gesture is recognized based on a hand recognition result that is determined from the hand recognition result referenced to the position and orientation of the HMD 100 and the hand recognition result referenced to the position and orientation of the mobile device 200. Therefore, the accuracy of hand gesture recognition can be improved. Furthermore, in the present technology, the HMD 100 and the mobile device 200 can move freely.
 <2. Modification examples>
 In the embodiment described above, an example was shown in which the user 300 holds the mobile device 200 in the right hand and makes a gesture with the left hand. However, an example in which, for instance, the mobile device 200 is placed on a table or the like and the user 300 makes gestures with both the right hand and the left hand can be considered in the same way.
 Further, in the embodiment described above, an example was shown in which the gesture recognition target is a hand. However, besides a hand, other objects that can be modeled can also be assumed as the gesture recognition target. For example, objects that can be assumed to be rigid bodies, such as pens, markers, and boxes, as well as objects whose deformations are known, such as books, faces, paper, the human body, and cars, are conceivable as modifications.
 Further, the embodiment described above is explained on the premise that the self position and orientation, the position and orientation of the other device, and the hand recognition are all estimated in 3D space as inputs. A sensor from which such information can be obtained, such as a stereo camera, is therefore basically assumed. However, it is considered possible to relax this premise.
 When each estimating device uses a monocular camera, some estimations and estimation results are obtained in a scale-indeterminate state (a state in which the size of an object is not determined; with a monocular camera, the self position, the hand recognition, and so on can be assumed to be in this state). Even in that state, however, the integration of information proposed by the present technology is considered possible. In that case, the processing becomes possible by also estimating the scale adjustment at the same time as the camera pose alignment (synchronization) and the integration of the information in the specific example described above.
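 By way of illustration only, the joint estimation of scale and pose alignment can be pictured with the following sketch, which fits a similarity transform (scale, rotation, translation) between two sets of corresponding 3D points, for example joint positions observed by the two devices, in the style of the Umeyama method; the availability of established point correspondences is an assumption of the sketch.

```python
import numpy as np

# Hypothetical sketch: estimate scale s, rotation R and translation t so that
# s * R @ src + t best matches dst in a least-squares sense (Umeyama-style),
# which resolves the scale ambiguity of a monocular estimate.

def similarity_alignment(src, dst):
    """src, dst: (N, 3) arrays of corresponding points. Returns (s, R, t)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / src.shape[0]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # keep a proper rotation (det = +1)
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src      # estimated scale factor
    t = mu_dst - s * R @ mu_src
    return s, R, t
```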
 Also, when estimating the position and orientation of the other device, if the size of the object regarded as the other device is known, the scale can be estimated from that information, so other recognition results whose scale is indeterminate can be estimated based on it. For this reason as well, the processing of the present technology can be applied to a system composed of monocular cameras.
 The use of devices capable of estimating distance information, such as a ToF camera, a pattern stereo system, or a structured light system, can also be assumed. In that case, the present technology can be extended by performing the integration processing based on the recognition results of the devices from which 3D information can be obtained. Cooperation between such devices and monocular cameras can also be considered. In this case, it is also conceivable to interpolate the information of the monocular camera from the 3D information and from the information of devices whose scale can be estimated.
 Further, in the embodiment described above, an example was shown in which the HMD 100 performs the main processing and the mobile device 200 performs the sub processing. However, it is also conceivable to swap the processing between the HMD 100 and the mobile device 200 at a predetermined cycle. This makes it possible to eliminate processing bottlenecks and imbalances in the amount of processing, and also to suppress extreme power consumption on one terminal.
 Further, although the embodiment described above deals with the cooperation between the HMD 100 and the mobile device 200, the present technology can also be used, for example, for pose detection of the human body with a plurality of movable cameras. In this case, each camera is recognized as the other device, and by integrating their mutual observations, the system can also be treated as a motion capture system built from movable cameras.
 Further, in the embodiment described above, an example was shown in which the HMD 100 constitutes the first device and the information processing apparatus and the mobile device 200 constitutes the second device, but the present technology is not limited to this.
 For example, the mobile device 200 may constitute the first device and the information processing apparatus, and the HMD 100 may constitute the second device. In this case, AR display control is performed either by the mobile device 200 transmitting an AR display control signal based on the gesture recognition information to the HMD 100 by communication, or by the mobile device 200 transmitting the gesture recognition information to the HMD 100 by communication and the HMD 100 performing AR display control based on that gesture recognition information.
 Further, for example, the HMD 100 may constitute the first device and the mobile device 200 the second device, or the mobile device 200 may constitute the first device and the HMD 100 the second device, and another device such as an external server connected to the HMD 100 and the mobile device 200 via a network may constitute the information processing apparatus. In this case, AR display control is performed either by the other device transmitting an AR display control signal based on the gesture recognition information to the HMD 100 by communication, or by the other device transmitting the gesture recognition information to the HMD 100 by communication and the HMD 100 performing AR display control based on that gesture recognition information.
 Although the preferred embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to these examples. It is obvious that a person having ordinary knowledge in the technical field of the present disclosure can conceive of various changes and modifications within the scope of the technical ideas described in the claims, and it is understood that these naturally also belong to the technical scope of the present disclosure.
 The effects described in the present specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure may exhibit other effects that are apparent to those skilled in the art from the description of the present specification, in addition to or instead of the above effects.
 The present technology can also have the following configurations.
 (1) An information processing apparatus including:
 an information acquisition unit that acquires first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
 an information receiving unit that receives, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
 an information processing unit that determines a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizes a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
 (2) The information processing apparatus according to (1), in which
 the information acquisition unit further acquires position and orientation information of the second device referenced to the position and orientation of the first device, based on the sensor output of the first device, and
 the relative relationship information includes the position and orientation information of the second device referenced to the position and orientation of the first device.
 (3) The information processing apparatus according to (2), in which the information acquisition unit acquires the position and orientation information of the second device based on recognition marker information displayed on the second device and included in the sensor output of the first device.
 (4) The information processing apparatus according to (1), in which
 the information receiving unit further receives, from the second device, position and orientation information of the first device referenced to the position and orientation of the second device, acquired based on the sensor output of the second device, and
 the relative relationship information includes the position and orientation information of the first device referenced to the position and orientation of the second device.
 (5) The information processing apparatus according to any one of (1) to (4), in which the information processing unit
 spatially synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target based on the relative relationship information between the positions and orientations of the first device and the second device,
 temporally synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target by prediction processing, based on time stamp information added to each of the first position and orientation information and the second position and orientation information of the gesture recognition target, and
 integrates the spatially and temporally synchronized first position and orientation of the gesture recognition target and second position and orientation of the gesture recognition target to determine the position and orientation of the gesture recognition target.
 (6) The information processing apparatus according to (5), in which
 identification information is given to each of the first position and orientation information of the gesture recognition target acquired by the information acquisition unit and the second position and orientation information of the gesture recognition target received by the information receiving unit, and
 the information processing unit integrates or separates the identification information based on each piece of position and orientation information of the gesture recognition target, and integrates the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target associated with the same identification information.
 (7) The information processing apparatus according to any one of (1) to (6), in which the information receiving unit transmits an application activation request to the second device, then transmits a request for information transmission, and receives the second position and orientation information of the gesture recognition target from the second device.
 (8) The information processing apparatus according to (7), in which the second device that has received the application activation request updates the second position and orientation information of the gesture recognition target as needed.
 (9) The information processing apparatus according to any one of (1) to (8), in which the first device is an augmented reality display device, and the information processing apparatus further includes a display control unit that controls augmented reality display on the augmented reality display device based on the gesture recognition information.
 (10) The information processing apparatus according to (9), in which the gesture recognition target is located between the first device and the second device, and the augmented reality display is performed at a position corresponding to the second device.
 (11) The information processing apparatus according to (10), in which the first device is a head mounted display having a transmissive display, and the second device is a mobile device having a non-transmissive display.
 (12) An information processing method including:
 a procedure of acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
 a procedure of receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
 a procedure of determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
 (13) A program that causes a computer to function as:
 information acquisition means for acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
 information receiving means for receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
 information processing means for determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
 10: AR display system
 100: HMD
 101: Camera
 102: IMU
 103: Information processing unit
 104: Communication unit
 105: Transmissive display
 106: Application / recognition information storage
 131: Self-position estimation processing unit
 132: Other-device position estimation processing unit
 133: Hand recognition processing unit
 134: Main recognition integration processing unit
 135: Image generation / application processing unit
 200: Mobile device
 201: Camera
 202: IMU
 203: Information processing unit
 204: Communication unit
 205: Non-transmissive display
 206: Recognition information storage
 231: Self-position estimation processing unit
 232: Other-device position estimation processing unit
 233: Hand recognition processing unit
 234: Sub recognition integration processing unit
 235: Recognition marker image generation unit
 300: User
 400: Virtual book

Claims (13)

  1.  An information processing apparatus comprising:
     an information acquisition unit that acquires first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
     an information receiving unit that receives, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
     an information processing unit that determines a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizes a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
  2.  The information processing apparatus according to claim 1, wherein
     the information acquisition unit further acquires position and orientation information of the second device referenced to the position and orientation of the first device, based on the sensor output of the first device, and
     the relative relationship information includes the position and orientation information of the second device referenced to the position and orientation of the first device.
  3.  The information processing apparatus according to claim 2, wherein the information acquisition unit acquires the position and orientation information of the second device based on recognition marker information displayed on the second device and included in the sensor output of the first device.
  4.  The information processing apparatus according to claim 1, wherein
     the information receiving unit further receives, from the second device, position and orientation information of the first device referenced to the position and orientation of the second device, acquired based on the sensor output of the second device, and
     the relative relationship information includes the position and orientation information of the first device referenced to the position and orientation of the second device.
  5.  The information processing apparatus according to claim 1, wherein the information processing unit
     spatially synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target based on the relative relationship information between the positions and orientations of the first device and the second device,
     temporally synchronizes the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target by prediction processing, based on time stamp information added to each of the first position and orientation information and the second position and orientation information of the gesture recognition target, and
     integrates the spatially and temporally synchronized first position and orientation of the gesture recognition target and second position and orientation of the gesture recognition target to determine the position and orientation of the gesture recognition target.
  6.  The information processing apparatus according to claim 5, wherein
     identification information is given to each of the first position and orientation information of the gesture recognition target acquired by the information acquisition unit and the second position and orientation information of the gesture recognition target received by the information receiving unit, and
     the information processing unit integrates or separates the identification information based on each piece of position and orientation information of the gesture recognition target, and integrates the first position and orientation of the gesture recognition target and the second position and orientation of the gesture recognition target associated with the same identification information.
  7.  The information processing apparatus according to claim 1, wherein the information receiving unit transmits an application activation request to the second device, then transmits a request for information transmission, and receives the second position and orientation information of the gesture recognition target from the second device.
  8.  The information processing apparatus according to claim 7, wherein the second device that has received the application activation request updates the second position and orientation information of the gesture recognition target as needed.
  9.  The information processing apparatus according to claim 1, wherein the first device is an augmented reality display device, and the information processing apparatus further comprises a display control unit that controls augmented reality display on the augmented reality display device based on the gesture recognition information.
  10.  The information processing apparatus according to claim 9, wherein the gesture recognition target is located between the first device and the second device, and the augmented reality display is performed at a position corresponding to the second device.
  11.  The information processing apparatus according to claim 10, wherein the first device is a head mounted display having a transmissive display, and the second device is a mobile device having a non-transmissive display.
  12.  An information processing method comprising:
     a procedure of acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
     a procedure of receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
     a procedure of determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
  13.  A program that causes a computer to function as:
     information acquisition means for acquiring first position and orientation information of a gesture recognition target referenced to a position and orientation of a first device, based on a sensor output of the first device;
     information receiving means for receiving, from a second device, second position and orientation information of the gesture recognition target referenced to a position and orientation of the second device, acquired based on a sensor output of the second device; and
     information processing means for determining a position and orientation of the gesture recognition target based on relative relationship information between the positions and orientations of the first device and the second device, the first position and orientation information of the gesture recognition target, and the second position and orientation information of the gesture recognition target, and recognizing a gesture of the gesture recognition target based on the determined position and orientation of the gesture recognition target.
PCT/JP2020/044771 2019-12-04 2020-12-02 Information processing device, information processing method, and program WO2021112107A1 (en)

Applications Claiming Priority (2)

Application Number: JP2019-219835, Priority Date: 2019-12-04
Application Number: JP2019219835, Priority Date: 2019-12-04

Publications (1)

Publication Number: WO2021112107A1 (en)

Family

ID: 76221700

Family Applications (1)

Application Number: PCT/JP2020/044771, Title: Information processing device, information processing method, and program, Priority Date: 2019-12-04, Filing Date: 2020-12-02, Publication: WO2021112107A1 (en)

Country Status (1)

WO: WO2021112107A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018516399A (en) * 2015-04-15 2018-06-21 株式会社ソニー・インタラクティブエンタテインメント Pinch and hold gesture navigation on head mounted display
JP2018530797A (en) * 2015-07-07 2018-10-18 グーグル エルエルシー System for tracking handheld electronic devices in virtual reality
JP6293110B2 (en) * 2015-12-07 2018-03-14 株式会社Hielero Point cloud data acquisition system and method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7022250B1 (en) 2021-10-04 2022-02-17 株式会社メディアドゥ Virtual reality or augmented reality reading systems, 3D display control programs for books and images, and information processing methods
JP2023054522A (en) * 2021-10-04 2023-04-14 株式会社メディアドゥ Virtual reality or augmented reality reading system, 3d display control program of book and image, and information processing method

Legal Events

121: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20896566; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122: PCT application non-entry in European phase (Ref document number: 20896566; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: JP)