WO2021161769A1

WO2021161769A1 - Information processing device, information processing method, and program

Info

Publication number: WO2021161769A1
Application number: PCT/JP2021/002501
Authority: WO
Inventors: 洋祐加治; 哲男池田; 淳入江; 英佑藤縄; 誠史友永; 忠義村上; 健志後藤
Original assignee: ソニーグループ株式会社
Priority date: 2020-02-10
Filing date: 2021-01-25
Publication date: 2021-08-19

Abstract

Provided is an information processing device or the like that solves the problem about the usability of an information processing system that provides a function which depends on a registered gesture. The information processing device according to an aspect of the present disclosure is provided with a first analysis unit, a second analysis unit, a determination unit, and an execution unit. The first analysis unit detects a specific part of a user shown in an input image, and analyzes the operation of the specific part. The second analysis unit detects a specific object shown in the input image, and analyzes the operation of the specific object. On the basis of the operation of the specific part and the operation of the specific object, the determination unit determines whether a gesture related to the specific part and the specific object has been executed. When it is determined that the gesture has been executed, the execution unit executes a function corresponding to the gesture.

Description

Information processing equipment, information processing methods, and programs

This disclosure relates to information processing devices, information processing methods, and programs.

In an AR (Augmented Reality) system or the like, when the execution of a registered gesture is detected, a function corresponding to the registered gesture is provided. For example, a system has been proposed in which the display of CG (Commuter Graphics) created by an information processing system is changed according to the movement of a user's hand. Further, a system has been proposed in which it detects whether or not there is a specific object near the user and displays different CG depending on the presence or absence of the specific object even if the same gesture is performed.

Japanese Unexamined Patent Publication No. 2019-060963 JP-A-2018-000941

With the development of the AR system, the functions that the AR system can provide are increasing more and more. Therefore, the method of selecting the provided functions is important. For example, if a large number of functions are to be recognized only by the gestures of the user's hand, a large number of gestures are required, which is not preferable from the viewpoint of usability. Therefore, it is preferable to make it possible to select a large number of functions by making a combination of a specific object and a human gesture, that is, a gesture using the specific object available.

From the viewpoint of usability, it is preferable that a function intuitively imagined by the user from a specific object is associated with the specific object. Therefore, it is preferable that a plurality of specific objects are available. However, in that case, detection such as whether or not there is a specific object near the user cannot recognize which specific object is used when a plurality of specific objects are near the user. ..

Also, if only one function can be assigned to one specific object, and you want to call another function, you have to change the specific object. This is not preferable from the viewpoint of usability. Therefore, it is desired that a plurality of functions can be assigned to each of the plurality of specific objects. However, in that case, it is necessary to accurately distinguish which of the multiple functions the gesture is for.

The present disclosure provides an information processing device or the like that solves a problem related to usability of an information processing system that provides functions according to registered gestures.

The information processing device on one aspect of the present disclosure includes a first analysis unit, a second analysis unit, a determination unit, and an execution unit. The first analysis unit detects a specific part of the user captured in the input image and analyzes the operation of the specific part. The second analysis unit detects the specific object captured in the input image and analyzes the operation of the specific object. The determination unit determines whether or not the gesture related to the specific portion and the specific object has been executed based on the operation of the specific portion and the operation of the specific object. When it is determined that the gesture has been executed, the execution unit executes a function corresponding to the gesture.

Further, the second analysis unit determines the success or failure of the analysis of the motion of the specific object at the first time point, and when it is determined that the analysis of the motion of the specific object at the first time point has failed, the first analysis unit. The analysis result of the operation at the time point may be modified.

In addition, the second analysis unit uses the analysis result of the operation at the first time point as the analysis result of the operation at the second time point before the first time point and the third time point after the first time point. The analysis result of the operation at the first time point may be modified based on at least one of the analysis results of the operation in.

Further, the second analysis unit predicts the operation at the first time point based on the analysis result of the operation at the time point before the first time point, and predicts the operation at the first time point and the first time point. The success or failure of the analysis of the motion at the first time point may be determined based on the analysis result of the motion at the first time point.

Further, when it is determined that the analysis of the motion of the specific object at the first time point has failed, the second analysis unit corrects the analysis result of the motion at the first time point based on the prediction result. You may.

Further, the execution unit may output an image including an image object corresponding to the executed gesture together with the execution of the function.

Further, the second analysis unit may predict the first region of the detected object at the first time point based on the transition of the position of the detected object, and the execution unit may predict the first region of the detected object. At the time point, the image object corresponding to the executed gesture may be displayed in an area other than the first area.

Further, the second analysis unit may predict the second region at the first time point of the detected specific part based on the transition of the position of the detected specific part, and the execution part may predict the second region at the first time point. At the first time point, the image object corresponding to the executed gesture may be displayed in an area other than the second area.

Further, the execution unit may adjust the display position of the image object corresponding to the executed gesture according to the movement of the user.

Further, the first analysis unit may estimate the position of the user at the first time point based on the transition of the position of the detected specific part, and the execution unit corresponds to the executed gesture. The image object to be displayed may be displayed in the correct vertical orientation when viewed from the user's estimated position.

Further, the execution unit may adjust the display direction of the image object corresponding to the executed gesture according to the movement of the user.

Further, the execution unit may display each image related to the plurality of gestures and each image related to the plurality of functions when the determination unit determines that a predetermined gesture has been executed. , The determination unit recognizes the selected gesture and function based on the transition of the specific part and the position of each displayed image, and when it is determined that the selected gesture has been executed, You may perform the selected function.

Further, the execution unit may indicate a registration area for registering a new gesture when the determination unit determines that a predetermined gesture has been executed, and the first analysis unit may indicate the registration area. The specific site included in the above may be detected, and the operation of the detected specific site may be one of the plurality of gestures.

In another aspect of the disclosure,
A step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
An information processing method is provided.

In another aspect of the disclosure,
A step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
A program that runs on a computer is provided.

In addition, a step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
A storage medium in which a program executed on a computer is stored is provided.

The figure which shows the structural example of the information processing system which concerns on embodiment of this disclosure. The figure explaining the usage form of an information processing system. The block diagram which shows the internal structure example of an information processing apparatus. The figure explaining the flow of analysis of a specific part. The figure explaining the registration content of the registration gesture. The figure which shows the registration gesture using the specific object of a pen shape. The figure which shows the registration gesture and the corresponding function using the specific object of a pen shape. The figure which shows another registration gesture and corresponding function using a specific object of a pen shape. The figure which shows the registration gesture using the specific object of the rectangular parallelepiped. The figure which shows the registration gesture and the corresponding function using the specific object of a rectangular parallelepiped. The figure which shows another registration gesture and corresponding function using a specific object of a rectangular parallelepiped. The figure which shows the registration gesture using the specific object of a cylinder. The figure which shows the registration gesture and the corresponding function using the specific object of a cylinder. The figure which shows the preferable display example of an image object. Flowchart of function call processing by gesture. Flowchart of analysis process. Flowchart of image display processing.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.

(One Embodiment of the present invention)
FIG. 1 is a diagram showing a configuration example of an information processing system according to an embodiment of the present disclosure. In the example of FIG. 1, the information processing system 1000 is composed of a distance image generation device 100, an information processing device 200, a projector 300, and a projected object 400.

The information processing system 1000 of the present embodiment is a system that recognizes the execution of the registered gesture and provides a function according to the registered gesture. In the present disclosure, as a typical example, an example in which the information processing system 1000 is an AR (Augmented Reality) system for displaying an image will be described. However, the function to be executed does not have to be related to image processing. The control of the electric device managed by the information processing system 1000 (for example, power on / off, etc.) may be simply executed.

In the example of FIG. 1, a table is shown as the projected object 400, and an image is projected on the upper surface of the table. The entire image displayed by the information processing system 1000 is referred to as the entire image 500. The surface of the projected object 400 on which the entire image 500 is displayed is referred to as a projected surface.

In this disclosure, the term "image" is a concept that includes both still images and moving images. Therefore, the "image" in the present disclosure may be replaced with a still image or a moving image if there is no particular problem. That is, the image displayed by the information processing system 1000 may be a moving image or a still image. The concept of "image" is also included in "image". Further, the whole image 500 may be an image called a stereoscopic image, a 3D image, or the like, which can make the viewer feel three-dimensional.

Gestures and functions corresponding to the gestures are registered in advance in the information processing system 1000. Further, the user who uses the information processing system 1000 recognizes the registered gesture and the corresponding function, and conveys the function to be called to the information processing system 1000 by the gesture. The information processing system 1000 recognizes the gesture of the user and changes the display content of the entire image 500 according to the recognized gesture.

FIG. 2 is a diagram illustrating a usage pattern of the information processing system 1000. FIG. 2 shows

users

601, 602, and 603 who use the information processing system 1000. Further, a pen-shaped object 701 held by the user 601 is shown. Also shown are a rectangular parallelepiped 702, a cylinder 703, and a laptop PC 704 resting on the projected object 400. In addition, a plurality of depictions are displayed in the entire image 500. In the present disclosure, a depiction in the whole image 500, in other words, a partial image displayed in a part of the whole image 500, is referred to as an image object. In the example of FIG. 2, image objects 501, 502, and 503 such as memo paper and an image object 504 representing a time of "3:47" are shown. The image object 504 representing the time is projected on the upper surface of the cylinder 703.

User 602 is touching the image object 503 with a finger. In this way, the action of touching the image object of the entire image 500 is also recognized as a gesture. Then, the pre-registered process corresponding to the gesture is performed. For example, it is possible to execute processing on the touched image object. For example, it is possible to execute a process such as enlarging, reducing, or erasing the touched image object 503, or changing the image content related to the image object 503. Alternatively, it is possible to execute processing on an image object other than the touched image object. For example, it is possible to execute a process such as displaying a menu image of the information processing system 1000. In this way, the entire image 500 can be used as a so-called virtual touch screen.

Some gestures are represented only by specific parts such as the user's hand and fingers, but in the present embodiment, gestures using a specific object registered in advance in the information processing system 1000 can also be used. For example, the user 601 is holding the pen-shaped object 701, and such a posture of the user can also be recognized as a gesture. For example, a process such as displaying a solid line corresponding to the transition of the position of the tip of the object 701 (in other words, the locus of the tip) on the image object 501 can be executed.

In general, the human body is also included in the object, but in order to distinguish between a gesture using only a specific part of the user and a gesture using a specific object, the specific object is not included in the specific object in this disclosure. It shall be. That is, a gesture using a specific object is a gesture using an object other than the human body, and a gesture using only the hand, a gesture using a combination of eyes and fingers, etc. are not included in the gesture using the specific object. .. The specific object is not particularly limited except for the human body, and its shape is not limited.

Further, the information processing system 1000 of the present embodiment can assign a plurality of instruction contents to a specific object. That is, it is possible to register a plurality of gestures using the same specific object. For example, when a user grasps and moves a pen-shaped object, if the tip of the object is directed toward the entire image 500, a solid line is drawn in the overall image 500, and the rear end of the object is directed toward the entire image 500. If so, a process of erasing the solid line in the entire image 500 may be performed according to the transition of the position of the rear end. That is, a gesture using a pen-shaped object can provide a pen function of drawing a solid line and an eraser function of erasing the drawn solid line. In this way, even gestures using the same specific object can call different functions. From the viewpoint of usability, it is preferable that a function intuitively imagined by the user from a specific object is associated with the specific object, and this embodiment makes it possible.

The above implementation method will be explained. First, the role of each device shown in FIG. 1 will be described. The distance image generation device 100 captures a region in which the gesture is performed and generates a distance image (depth map) related to the region. As the distance image generation device 100, a known device may be used. For example, it includes an image sensor, a distance measuring sensor, and the like, and generates a distance image from these images. The image sensor includes an RGB camera and the like. Examples of the distance measuring sensor include a stereo camera, a TOF (Time of Flight) camera, and a structured light camera.

The information processing device 200 recognizes the gesture by the user based on the distance image generated by the distance image generation device 100. As described above, the recognized gesture may include a gesture using an object. Then, the whole image 500 is generated based on the recognized gesture. In the present embodiment, the display position of the image object displayed on the entire image 500 is adjusted in order to further improve the usability. As a result, the entire image 500 can be easily viewed by the user. Details will be described later together with the components of the information processing apparatus 200.

The projector 300 outputs the entire image 500 generated by the information processing device 200. In the example of FIG. 1, the projector 300 is installed above the table, an image is projected downward from the projector 300, and the entire image 500 is displayed on the upper surface of the table.

Although the AR system for projecting the entire image 500 is shown in the example of FIG. 1, the display destination of the entire image 500 is not particularly limited. For example, the information processing apparatus 200 may transmit the entire image 500 to the laptop 704 of FIG. 2 so that the entire image 500 may be displayed on the laptop 704. Further, the entire image 500 may be output to an image display device such as an AR glass or a head-mounted display. That is, instead of the projector 300 and the projected object 400, an image display device may be included in the information processing system 1000.

In this disclosure, in order to clarify the processing performed by the information processing system 1000, an example in which the information processing system 1000 is composed of the above devices is shown, but these devices may be integrated. And may be further dispersed.

The internal configuration of the information processing device 200 will be described. FIG. 3 is a block diagram showing an example of the internal configuration of the information processing apparatus 200. In the example of FIG. 3, the information processing apparatus 200 includes a distance image acquisition unit 210, a user analysis unit 220, an object analysis unit 230, a determination unit 240, a function execution unit 250, and a user analysis data storage unit 261. A data storage unit 262 for object analysis, a registration gesture storage unit 263, and an image storage unit 264 are provided. The user analysis unit 220 includes a user detection unit 221 and a user motion analysis unit 222. The object analysis unit 230 includes an object detection unit 231 and an object motion analysis unit 232. The function execution unit 250 includes an image acquisition unit 251, a display position determination unit 252, a display direction determination unit 253, and an overall image generation unit 254.

The above-mentioned components of the information processing apparatus 200 may be aggregated or further dispersed. For example, in FIG. 3, in order to clarify the data used in each process, a plurality of storage units (user analysis data storage unit 261 and object analysis data storage unit 261) that store the data used by each component are stored. Although 262, the registered gesture storage unit 263, and the image storage unit 264) have been described, these storage units may be composed of one or more memories or storages, or a combination thereof. In addition, components and functions not shown or described in the present disclosure may also be present in the information processing apparatus 200.

The distance image acquisition unit 210 acquires a distance image from the distance image generation device 100 and transmits it to the user analysis unit 220 and the object analysis unit 230. The distance image acquisition unit 210 may perform preprocessing such as threshold processing on the distance image in order to improve the accuracy of the detection executed by the user analysis unit 220 and the object analysis unit 230.

The user analysis unit 220 detects a specific part of the user captured in the distance image and analyzes the operation of the specific part. The data used for the processing of the user analysis unit 220 is stored in advance in the user analysis data storage unit 261. For example, data for detecting a specific part such as a template image of a specific part is stored in the user analysis data storage unit 261. The specific part may be a part that can be used for gestures such as a finger, a hand, an arm, a face, and an eye, and is not particularly limited.

The specific part detection unit of the user analysis unit 220 detects the user's specific part and its area captured in the distance image. As the detection method, a known method may be used. For example, the area of the specific part can be detected by performing block matching between the template image related to the specific part stored in the user analysis data storage unit 261 and the distance image.

The user movement analysis unit 222 of the user analysis unit 220 obtains the position, posture, and transition of the specific part as the detected movement of the specific part.

FIG. 4 is a diagram for explaining the flow of analysis of a specific part. In the example of FIG. 4, the user's hand is shown as the specific site 611. As shown in the upper part of FIG. 4, the user motion analysis unit 222 plots a plurality of points 621 in the region of the specific portion. The points plotted in the area of the specific part are described as feature points.

The method of plotting feature points is not particularly limited. For example, it may be determined based on a skeleton model generation method generally used in the technique of gesture recognition. In addition, features of specific parts such as nails, wrinkles, moles, hairs, and joints may be plotted as feature points 621.

The user motion analysis unit 222 detects each position of the feature point 621. By using the distance image, the position in the depth direction with respect to the projection surface can also be detected. That is, the three-dimensional position of each feature point 621 can be obtained.

The user motion analysis unit 222 performs plane fitting on the obtained feature points 621. For example, a method such as the least squares method or the RANSAC method can be used. As a result, as shown in the center of FIG. 4, the plane 631 related to the specific portion is obtained.

The posture of the specific part is obtained based on the plane 631 related to the specific part. For example, a projection plane or the like is defined in advance as a reference plane, and the inclination of the plane 631 with respect to the reference plane is set as the posture of the specific portion. The inclination of the plane is, for example, the angle difference between the three-dimensional axis of the plane 631 as shown on the lower side of FIG. 4 and the three-dimensional axis based on the reference plane, such as pitch, yaw, and yaw. It can be represented by a roll.

In this way, the user motion analysis unit 222 calculates the position and posture of the specific part for each distance image (for each frame if the distance image is a moving image). Then, these differences before and after the time series are calculated. That is, the transition is obtained from the difference between the analysis result based on the distance image at the first time point and the analysis result based on the distance image at the second time point after the first time point. If the feature points cannot be distinguished, the correspondence between the feature points before and after the time series may be estimated by using a search method or the like, or the position may be determined based on the time interval in which the distance image was taken. Feature points with little transition may be associated with each other.

If features of a specific part such as nails, wrinkles, moles, and hair are plotted as feature points, the plotted feature points will be different for each user. The user analysis unit 220 stores the provided feature points as a history in the user analysis data storage unit 261, and based on the history, confirms whether the newly detected specific site has been previously analyzed. May be good. In this way, user identification may be performed based on the arrangement of feature points.

The object analysis unit 230 detects the object captured in the distance image and analyzes the operation of the object. The data used for the processing of the object analysis unit 230 is stored in advance in the object analysis data storage unit 262. For example, data for detecting an object, such as a template image of an object, is stored in the object analysis data storage unit 262.

Note that the object analysis unit 230 may analyze only the specific object used for the gesture, or may analyze an object other than the specific object. For example, the specific object related to the registered gesture can be recognized based on the data related to the registered gesture stored in the registered gesture storage unit 263. The object analysis unit 230 may match only a specific object related to the registered gesture among the objects that can be analyzed by itself. As a result, effects such as reduction of processing load and increase of processing speed can be obtained. Further, as will be described later, in order to determine the display position of the image object, not only the specific object related to the registered gesture but also all the objects that can be analyzed by itself may be analyzed.

The object detection unit 231 of the object analysis unit 230 detects the object and its region captured in the distance image in the same manner as the specific part detection unit. It is possible that the object captured in the distance image is not recognized. For example, only the shape of an object such as a pen, a rectangular parallelepiped, or a cylinder may be recognized.

The object motion analysis unit 232 of the object analysis unit 230 obtains the position, posture, and transition of the detected object as the motion of the detected object in the same manner as the user motion analysis unit 222.

In this way, the movement of the specific part of the user and the movement of the specific object are analyzed, but these movement analysis may fail. For example, in the middle of the gesture, the user's hand may hide the specific object, and the specific object may not be shown on the distance image. In that case, the specific object is not detected, and it is erroneously recognized that the specific object has ended even though the operation of the specific object is actually continuing.

In preparation for such a case, the user motion analysis unit 222 and the object motion analysis unit 232 may perform verification of the analysis result, reanalysis of the motion, and the like. That is, the success or failure of the analysis of the motion of the specific object at a certain point in time may be determined, and when it is determined that the analysis of the motion of the specific object at that time has failed, the analysis result of the motion at that time may be modified.

For example, the analysis result of the motion at the first time point is at least one of the analysis result of the motion at the second time point before the first time point and the analysis result of the motion at the third time point after the first time point. May be modified based on. For example, when the position and the posture fluctuate abruptly, it may be determined that the analysis has failed, and the suddenly fluctuated position and the posture may be corrected by complementation based on the values before and after the time series.

Further, the user motion analysis unit 222 and the object motion analysis unit 232 may predict the future position and posture based on the transition of the position and posture so far. For example, the position of the feature point at the first N + 1 time point may be predicted based on the position of the feature point at the first to Nth time points (N is an integer of 1 or more). The prediction result is compared with the estimation result based on the actual detection described above, and if the error is larger than the predetermined threshold value, it can be determined that the estimation by the detection has failed. If it is determined that the estimation has failed, the prediction result may be used, or as described above, correction may be made based on the preceding and succeeding estimation results.

The prediction method may be, for example, calculating the velocity and acceleration of the feature points and making a prediction based on the calculated speeds and accelerations. Alternatively, the position of the feature point at the first to Nth time points may be input to the estimation model based on the neural network, and the position of the feature point at the N + 1th time point may be output. The estimation model is known for deep learning based on input data for learning indicating the positions of feature points at the first to Nth time points and correct answer data indicating the actual positions of the feature points at the time points of N + 1. Can be generated by performing.

The determination unit 240 determines whether or not the gesture related to at least one of the specific part and the specific object has been executed based on at least one of the movement of the specific part and the movement of the specific object. The data used for the determination is stored in advance in the registered gesture storage unit 263. For example, the determination unit 240 compares the operation of the specific part related to the registered gesture with the analyzed operation of the specific part, and calculates the matching rate. Similarly, the motion of the specific object related to the registered gesture is compared with the motion of the analyzed specific object, and the matching rate is calculated. Then, it may be determined whether or not the gesture has been executed based on each match rate. For example, it may be determined that the registration gesture has been performed when each match rate exceeds the respective threshold value.

FIG. 5 is a diagram for explaining the registered contents of the registered gesture. The specific object to be used, the functional classification on the AR system, the operation of the specific part constituting the registered gesture, the operation of the specific object, and the function to be called are shown. In addition, even in the example of FIG. 5, the specific part is made by hand. Further, in the movement of the specific part and the movement of the specific object, numerical values indicating the transition of the position and the posture are actually registered.

6 to 13 are diagrams showing the registration gesture shown in FIG. FIG. 6 shows the details of the first to third registered gestures of FIG. FIG. 6 (A) shows the first registered gesture of FIG. The hand 611, which is a specific part, maintains a state in which the pen is held, which is called a pen holding posture, and the pen-shaped object 701, which is a specific object, is maintained in a state where its tip is close to the projection surface, that is, on the lower side. There is. When this gesture is recognized, as shown in FIG. 5, the function of "drawing a line" of the function classification of "electronic pen" is called. FIG. 6B shows the second registered gesture of FIG. When the gesture is recognized, as shown in FIG. 5, the function of erasing the line in the whole image 500 is called. FIG. 6C shows the third registered gesture of FIG. 5, and when the gesture is recognized, it is shown that the pointer 505, which is an image object, is displayed in the entire image 500.

FIG. 7 shows the fourth registered gesture of FIG. When the rear end of the pen-shaped object 701 is within a predetermined distance from the thumb (that is, in a close state) and the thumb is bent and stretched, each time the thumb is bent and stretched, an image object having a different color is included in the entire image 500. It is shown that 506A, 506B, and 506C are displayed in order (that is, the color of the image object changes). For example, the color of the line in the above-mentioned function of "drawing a line" can be changed.

FIG. 8 shows the fifth registered gesture of FIG. Each time the rear end of the pen-shaped object 701 rotates while the finger rotates, image objects 507A and 507B having different sizes are displayed in order in the entire image 500 (that is, the size of the image object changes). ) Is shown. For example, the line thickness in the above-mentioned "drawing line" function can be changed.

FIG. 9 shows the sixth registered gesture of FIG. A gesture is shown in which the hand 611 moves parallel to the projection plane and one corner of the rectangular parallelepiped 702 moves parallel to the projection plane while maintaining a state closer to the projection plane than the other corners. When the gesture is executed, as shown in FIG. 5, the function of erasing the line in the whole image 500 is called.

FIG. 10 (A) shows the seventh registered gesture of FIG. A gesture is shown in which the hand 611 moves parallel to the projection plane and one side of the rectangular parallelepiped 702 moves parallel to the projection plane while maintaining a state close to the projection plane. When the gesture is recognized, it is shown that the size (scale) of the entire image 500 is changed as shown in FIG. 10 (B).

FIG. 11 (A) shows the eighth registered gesture of FIG. The gesture that the contact surface of the rectangular parallelepiped 702 with the projection surface changes regardless of the movement of the specific part is shown. In this way, a gesture that does not include either the movement of a specific part or the movement of a specific object may be registered. When the gesture is recognized, the entire image 500 is switched as shown in FIG. 11 (B).

FIG. 12 shows the 9th to 11th registered gestures of FIG. FIG. 12A shows the ninth registered gesture, and when this gesture is recognized, an image object called a stamp is displayed in the entire image 500. In FIG. 12B, when this gesture is recognized, an image object 504 called a timer representing the time as shown in FIG. 2 is displayed in the entire image 500. FIG. 12C shows the eleventh registered gesture, and when this gesture is recognized, the stamp is switched to another stamp.

FIG. 13 (A) shows the twelfth registered gesture of FIG. When the gesture is executed, the whole image 500 is rotated, and for example, as shown in FIG. 13B, the orientation of the image object displayed in the whole image 500 is turned upside down.

However, when a rotationally symmetric specific object such as the cylinder 703 performs a rotational motion as shown in FIGS. 12 (C) and 13 (A), the motion is recognized based on the transition of the feature points. It is difficult. Therefore, the motion analysis unit may recognize the motion of the specific object based on the analyzed motion of the specific part.

For example, when it is analyzed that a specific part rotates about a vertical line passing through the center of the upper surface of the cylinder 703, the motion analysis unit analyzes the analysis result even if the cylinder 703 is analyzed to be stationary. Overturning, the cylinder 703 may be regarded as rotated. Further, it may be determined whether or not the specific portion is in contact with the cylinder 703, and it may be determined that the cylinder 703 has rotated in the same manner as the specific portion only when the specific portion is in contact with the cylinder 703. In this way, the movement of the specific object may be recognized based on the movement of the specific part.

Note that the contact between the specific part and the specific object can be recognized by whether the plane related to the specific part intersects any surface of the specific object.

Note that the user motion analysis unit 222 may also recognize the motion of the specific part based on the motion of the specific object. Further, the determination unit 240 may recertify the movement of the other based on at least one of the movement of the specific object and the movement of the specific portion.

In this way, the movement of the specific part and the movement of the specific object constituting the registered gesture are predetermined and stored as data, and the determination unit 240 detects the movement of the specific part and the specific object in the data. Search for registration gestures that match the combination of behaviors in. Then, when the registered gesture is detected, the function corresponding to the registered gesture is found.

When it is determined that the gesture has been executed, the function execution unit 250 executes the function corresponding to the executed gesture.

The image acquisition unit 251 acquires an image object corresponding to the function to be executed from the image storage unit 264. The whole image generation unit 254 generates the whole image 500 including the acquired image object.

Note that, as in the example of FIG. 2, a plurality of image objects may be displayed in the entire image 500. In such a case, it is preferable to determine the display position so that the plurality of displayed image objects do not overlap.

It is also preferable to consider the orientation of the image object. For example, when an image object contains characters and the like, if the characters are displayed in the opposite direction to the user, usability is lowered. Therefore, in the present embodiment, an appropriate display position and display direction of the image object are determined.

The display position determination unit 252 detects an area (that is, an empty space) in which the image is not displayed in the current overall image 500. The detection may be performed by recording the display position of the image object and detecting based on the recording. The detected empty space is a candidate for the display position of the image object. The display position determination unit 252 selects one of the detected empty spaces based on the size of the newly displayed image object, the position of the specific portion related to the gesture, and the like. As a result, the display position of the newly displayed image object is determined.

Further, the display position determination unit 252 may detect a space that cannot be actually used among the empty spaces, and may exclude the space that cannot be actually used from the candidates for the display position of the image object. For example, when the projected object 400 is a table as in the example of FIG. 2, it is conceivable that the object placed on the table overlaps with the entire image 500. In such a case, if the image object is projected in the same place as the object placed on the table, the image object becomes difficult to see. Therefore, the display position determination unit 252 may further narrow down the candidates for the display position of the image object based on the detected position of the object.

Also, the image object captured in the distance image is not always stationary. For example, when user A and user B are using the information processing system 1000 and try to display an image object called by the gesture of user A in an empty space, when the image object is displayed, the image object is displayed. An object moved by user B may come to the empty space. In order to avoid such a situation, the display position determining unit 252 may further narrow down the display position candidates of the image object based on the detected specific part and the predicted position of the object at the time when the image object is displayed. good. As a result, it is possible to prevent a situation in which the functions in the entire image 500 overlap with the user's hand, an object placed on the projection surface, or the like.

Further, there may be a case where the image object is always displayed on the specific object like the clock object 504 in the example of FIG. In such a case, the display position determination unit 252 may use the predicted position of the specific object as the display position of the image object.

The display direction determination unit 253 determines the display direction of the image object. For example, when the user analysis unit 220 recognizes the user's face captured in the distance image, the display direction of the image object is determined based on the position and orientation of the face. For example, the correct direction (for example, up and down) is determined in advance for the image object, and the display direction determination unit 253 displays the image object in the correct up and down direction when viewed from a certain position of the user's face. As a result, when displaying characters and the like, the characters can be displayed in a direction that is easy for the user to see.

Further, the position and orientation of the user's face may be estimated. For example, the distance image may show the user's hand but not the user's face. In such a case, the position of the user's face outside the range of the distance image is estimated from a part of the user's hand or the like. For example, when the user analysis unit 220 analyzes the movement of the user's hand, the direction in which the hand is inserted into the image area can be specified. The display direction determination unit 253 may recognize the direction in which the hand is inserted into the image area as the position of the face. Alternatively, the user analysis unit 220 may generate a skeleton model of the hand and estimate the position of the face in consideration of the orientation of the fingers specified by the skeleton model.

In this way, the image object to be displayed, its display position, and its display direction are determined, and the overall image generation unit 254 generates the entire image 500 according to these determination items. The generation of the entire image 500 may be the same as that of a general CG (Computer Graphics) manufacturing method.

FIG. 14 is a diagram showing a preferable display example of the image object. A gesture is performed in which the hand 611, which is a specific part, touches the cylinder 703, and the menu image 508 of the AR system is displayed by the gesture. Other image objects are also displayed in the whole image 500. In addition, the

hands

611, 612, and 613 of each user are present in the sky above the entire image 500. In addition, objects 705, 706 and 707 are present on the projection surface. The menu image 508 is displayed so as not to overlap with these. Further, although the character is shown in the menu image 508, it is shown so that the user of the hand 611 who performed the gesture can read the character. It is preferable that the entire image 500 having excellent usability is generated by the processing of the display position determination unit 252 and the display direction determination unit 253.

In the above, the correspondence between gestures and functions was predetermined, but from the viewpoint of usability, it is preferable to be able to customize the correspondence. Therefore, it is preferable that a gesture that calls a function for changing the correspondence is registered. For example, the menu image 508 shown in FIG. 14 includes an icon indicating a registered gesture and an icon indicating a callable function. Then, the user's tap may be detected and the gesture and function related to the tapped icon may be associated with each other.

Also, a gesture that calls the function for registering a new gesture may be registered. For example, the pointer 505 as shown in FIG. 6C may be displayed, and while the pointer 505 is displayed, a gesture performed in the sky above the pointer 505 may be newly registered. The gesture performed in the sky above the pointer 505 is analyzed by the user analysis unit 220 and the object analysis unit 230, and the analyzed motion of the specific part and the motion of the object are the contents of the new registered gesture, the registered gesture storage unit 263. It should be memorized in.

As mentioned above, if the user can be identified by arranging the feature points, the registered gesture can be customized for each user.

Next, the flow of processing by each component will be described. FIG. 15 is a flowchart of a function call process by a gesture.

The image acquisition unit 251 acquires a distance image (S101) and transmits it to the user analysis unit 220 and the object analysis unit 230. The user analysis unit 220 detects a specific part of the user reflected in the distance image and analyzes the operation of the specific part (S102). On the other hand, the object analysis unit 230 also detects the object reflected in the distance image and analyzes the motion of the object (S103). The internal flow of these analysis processing processes will be described later.

The determination unit 240 determines whether or not the registration gesture has been executed based on the analyzed movements of the specific part and the object (S104). If the registration gesture is not executed (NO in S105), the function execution unit 250 maintains the current function (S106). As a result, the flow ends and the entire image 500 does not change. On the other hand, when the registered gesture is executed, the function execution unit 250 executes the function related to the executed gesture (S107). That is, the function is switched, and the image object related to the function of the entire image 500 changes.

Next, the internal flow of analysis processing will be described. FIG. 16 is a flowchart of the analysis process. Since the flow is the same in the user analysis unit 220 and the object analysis unit 230, the user detection unit 221 and the object detection unit 231 are collectively referred to as a detection unit, and the user motion analysis unit 222 and the object motion analysis are collectively described. Described as motion analysis unit.

The detection unit attempts to detect a specific part or object captured in a distance image at a plurality of predetermined points in time. When the detection and analysis are not successful at all time points (NO in S202), the analysis unit estimates the operation at the time of failure based on the operation at other time points and the like. When the estimation is completed or successful at all points (YES in S203), the motion analysis unit determines the detected position and posture transition, the previous analysis result, and the other motion (that is, in the case of user analysis). Analyzes the motion based on the motion of the object, the motion of a specific part in the case of object analysis), and the like (S204). Thereby, even if the detection and analysis fail at some of the plurality of time points, the operation at the plurality of time points can be estimated. It is also possible to detect the rotational movement of a specific rotationally symmetric object. This prevents false detection of operation.

Next, the flow of image display processing associated with function execution will be described. FIG. 17 is a flowchart of the image display process. This flow can be performed in S106 and S107 of the flow of the function call processing by the gesture.

The function execution unit 250 acquires information such as an image object corresponding to a gesture, a detected specific part, and a detected object (S301). The information of the image object may be transmitted from the determination unit 240, or the information indicating the gesture executed may be transmitted from the determination unit 240, and the function execution unit 250 may acquire the information based on the information.

Based on the image object information, the image acquisition unit 251 acquires the image object from the image storage unit 264 (S302). If the gesture is not changed, the image object has already been acquired, and the process may be omitted.

On the other hand, the display position determination unit 252 first predicts the positions of the user and the object at the time of output of the image in order to determine the position to display the image (S303). As described above, the user motion analysis unit 222 and the object motion analysis unit 232 may perform the prediction. The display position determination unit 252 confirms the free space of the entire image 500 currently displayed, and further detects the free space after the present time based on the predicted positions of the user and the object (S304). The display position determination unit 252 determines the display position of the image object based on the free space after the present time and the size of the image object (S305). When displaying on a specific object, the position of the specific object is detected instead of the empty space, and the position is determined as the display position.

On the other hand, the display direction determination unit 253 confirms whether the position of the user's face is detected in order to determine the direction in which the image is displayed, and if the position of the face is not detected (NO in S306), The position of the user's face is estimated based on the detected specific part (S307). When the estimation is completed or the position of the face is detected (NO in S306), the display direction determination unit 253 determines the display direction of the image object based on the position of the user's face (S308). ..

Since the image object, the display position of the image, and the display direction of the image are known, the whole image generation unit 254 generates and outputs the whole image 500 by fitting the image object into the whole image 500 as determined. (S309). When the flow shown in FIG. 15 is performed every time a distance image is received from the distance image generator 100, the image object corresponding to the gesture can be continuously displayed so as to be easily seen by the user by this flow. For example, the user who performed the gesture may move after the gesture is executed and the image object related to the gesture is displayed. Even in such a case, the display position and display direction of the image object are optimized.

Note that the flowchart of the present disclosure is an example, and each process does not necessarily have to be performed according to the above flow. For example, the processes of S102 and S103 are performed in parallel, but may be performed in order.

As described above, according to the present embodiment, a plurality of functions that can be provided in the information processing system 1000 can be assigned to a plurality of gestures using one specific object. The information processing system 1000 recognizes the executed gesture by analyzing the movement of the specific object and the user, and executes the function assigned to the recognized gesture. This eliminates the complexity of switching to another specific object when it is desired to switch the function to be executed, and improves the convenience of the user.

Also, the position of the user's face is estimated, and the position, orientation, etc. of the image object to be displayed are adjusted according to the estimated face position. As a result, for example, it is possible to prevent a problem that the characters displayed in the information processing system 1000 are displayed diagonally to the user and the displayed characters are difficult to read.

In addition, the position and movement of the user and the object existing in the whole image 500 are also estimated, and even if the object in the display space and the user move, the image object is always placed in the empty space or the moving object. It is also possible to display it.

The processing of the device according to the embodiment of the present disclosure can be realized by software (program) executed by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like. It should be noted that, instead of executing all the processes of the device by software, some processes may be executed by hardware such as a dedicated circuit.

Note that the above-described embodiment shows an example for embodying the present disclosure, and the present disclosure can be implemented in various other forms. For example, various modifications, substitutions, omissions, or combinations thereof are possible without departing from the gist of the present disclosure. The forms in which such modifications, substitutions, omissions, etc. are made are also included in the scope of the invention described in the claims and the equivalent scope thereof, as are included in the scope of the present disclosure.

The present disclosure may also have the following structure.
[1]
A first analysis unit that detects a specific part of the user captured in the input image and analyzes the operation of the specific part.
A second analysis unit that detects a specific object captured in the input image and analyzes the operation of the specific object.
A determination unit that determines whether or not any of the specific portion and the plurality of gestures related to the specific object has been executed based on the operation of the specific portion and the operation of the specific object.
When it is determined that any of the plurality of gestures has been executed, an execution unit that executes a function corresponding to the executed gesture, and an execution unit.
Information processing device equipped with.
[2]
The second analysis unit
Judgment is made as to whether or not the motion analysis of the specific object at the first time point is successful.
The information processing apparatus according to the above [1], wherein when it is determined that the analysis of the motion of the specific object at the first time point has failed, the analysis result of the motion at the first time point is corrected.
[3]
The second analysis unit uses the analysis result of the operation at the first time point, the analysis result of the operation at the second time point before the first time point, and the operation at the third time point after the first time point. The information processing apparatus according to the above [2], wherein the analysis result of the operation at the first time point is modified based on at least one of the analysis results of the above.
[4]
The second analysis unit
Based on the analysis result of the operation at the time point before the first time point, the operation at the first time point is predicted.
The information processing apparatus according to the above [2], which determines the success or failure of the analysis of the operation at the first time point based on the prediction result of the operation at the first time point and the analysis result of the operation at the first time point.
[5]
When it is determined that the analysis of the motion of the specific object at the first time point has failed, the second analysis unit corrects the analysis result of the motion at the first time point based on the prediction result. 4] The information processing apparatus according to the above.
[6]
The information processing device according to any one of [1] to [5] above, wherein the execution unit outputs an image including an image object corresponding to the executed gesture together with the execution of the function.
[7]
The second analysis unit predicts the first region of the detected object at the first time point based on the transition of the position of the detected object.
The information processing device according to the above [6], wherein the execution unit displays an image object corresponding to the executed gesture in an area other than the first area at the first time point.
[8]
The second analysis unit predicts the second region of the detected specific part at the first time point based on the transition of the position of the detected specific part.
The information processing device according to the above [6], wherein the execution unit displays an image object corresponding to the executed gesture in an area other than the second area at the first time point.
[9]
The information processing device according to any one of [6] to [8] above, wherein the execution unit adjusts the display position of the image object corresponding to the executed gesture according to the movement of the user.
[10]
The first analysis unit estimates the position of the user at the first time point based on the transition of the position of the detected specific part.
The information processing according to any one of [6] to [9] above, wherein the execution unit displays the image object corresponding to the executed gesture in the correct vertical direction when viewed from the estimated position of the user. Device.
[11]
The information processing device according to any one of [6] to [10] above, wherein the execution unit adjusts the display direction of the image object corresponding to the executed gesture according to the movement of the user.
[12]
When the determination unit determines that a predetermined gesture has been executed, the execution unit displays each image related to the plurality of gestures and each image related to the plurality of functions.
The determination unit
Recognize the selected gesture and function based on the transition of the specific part and the position of each displayed image.
The information processing apparatus according to any one of [1] to [11] above, which executes the selected function when it is determined that the selected gesture has been executed.
[13]
The execution unit indicates a registration area for registering a new gesture when the determination unit determines that a predetermined gesture has been executed.
The first analysis unit detects a specific part included in the registration area, and sets the operation of the detected specific part as one of the plurality of gestures. The description in any one of [1] to [12] above. Information processing device.
[14]
A step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
Information processing method including.
[15]
A step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
A program that runs on your computer.
[16]
A step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
A storage medium in which a program executed on a computer is stored.

1000 Information processing system 100 Distance image generator 200 Information processing device 210 Distance image acquisition unit 220 User analysis unit 221 User detection unit 222 User motion analysis unit 230 Object analysis unit 231 Object detection unit 232 Object motion analysis unit 240 Judgment unit 250 Function execution 251 Image acquisition unit 252 Display position determination unit 253 Display direction determination unit 254 Overall image generation unit 261 User analysis data storage unit 262 Object analysis data storage unit 263 Registered gesture storage unit 264 Image storage unit 300 Projector 400 Projected object 500

Overall image

501, 502, 503, 504 Image object 505

Pointer image object

506A, 506B, 506C Different

color image object

507A, 507B Different

size image object

601, 602, 603

User

611, 612, 613 Specific part 621 Feature point 631

Plane

701, 702, 703, 704, 705, 706, 707 Object

Claims

A first analysis unit that detects a specific part of the user captured in the input image and analyzes the operation of the specific part.
A second analysis unit that detects a specific object captured in the input image and analyzes the operation of the specific object.
A determination unit that determines whether or not any of the specific portion and the plurality of gestures related to the specific object has been executed based on the operation of the specific portion and the operation of the specific object.
When it is determined that any of the plurality of gestures has been executed, an execution unit that executes a function corresponding to the executed gesture, and an execution unit.
Information processing device equipped with.
The second analysis unit
Judgment is made as to whether or not the motion analysis of the specific object at the first time point is successful.
The information processing apparatus according to claim 1, wherein when it is determined that the analysis of the motion of the specific object at the first time point has failed, the analysis result of the motion at the first time point is corrected.
The second analysis unit uses the analysis result of the operation at the first time point, the analysis result of the operation at the second time point before the first time point, and the operation at the third time point after the first time point. The information processing apparatus according to claim 2, wherein the analysis result of the operation at the first time point is modified based on at least one of the analysis results of the above.
The second analysis unit
Based on the analysis result of the operation at the time point before the first time point, the operation at the first time point is predicted.
The information processing apparatus according to claim 2, wherein the success or failure of the analysis of the operation at the first time point is determined based on the prediction result of the operation at the first time point and the analysis result of the operation at the first time point.
Claim that the second analysis unit corrects the analysis result of the operation at the first time point based on the prediction result when it is determined that the analysis of the operation of the specific object at the first time point has failed. The information processing apparatus according to 4.
The information processing device according to claim 1, wherein the execution unit outputs an image including an image object corresponding to the executed gesture together with the execution of the function.
The second analysis unit predicts the first region of the detected object at the first time point based on the transition of the position of the detected object.
The information processing device according to claim 6, wherein the execution unit displays an image object corresponding to the executed gesture in an area other than the first area at the first time point.
The second analysis unit predicts the second region of the detected specific part at the first time point based on the transition of the position of the detected specific part.
The information processing device according to claim 6, wherein the execution unit displays an image object corresponding to the executed gesture in an area other than the second area at the first time point.
The information processing device according to claim 6, wherein the execution unit adjusts the display position of the image object corresponding to the executed gesture according to the movement of the user.
The first analysis unit estimates the position of the user at the first time point based on the transition of the position of the detected specific part.
The information processing device according to claim 6, wherein the execution unit displays an image object corresponding to the executed gesture in the correct vertical direction when viewed from the estimated position of the user.
The information processing device according to claim 10, wherein the execution unit adjusts the display direction of the image object corresponding to the executed gesture according to the movement of the user.
When the determination unit determines that a predetermined gesture has been executed, the execution unit displays each image related to the plurality of gestures and each image related to the plurality of functions.
The determination unit
Recognize the selected gesture and function based on the transition of the specific part and the position of each displayed image.
The information processing apparatus according to claim 1, wherein when it is determined that the selected gesture has been executed, the selected function is executed.
The execution unit indicates a registration area for registering a new gesture when the determination unit determines that a predetermined gesture has been executed.
The information processing apparatus according to claim 1, wherein the first analysis unit detects a specific part included in the registration area and sets the operation of the detected specific part as one of the plurality of gestures.
A step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
Information processing method including.
A step of detecting a specific part of the user captured in the input image and analyzing the operation of the specific part, and
A step of detecting a specific object captured in the input image and analyzing the operation of the specific object, and
Based on the movement of the specific part and the movement of the specific object, a step of determining whether or not any one of the specific part and the plurality of gestures related to the specific object has been executed is performed.
When it is determined that any of the plurality of gestures has been executed, the step of executing the function corresponding to the executed gesture, and
A program that runs on your computer.