WO2021161769A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2021161769A1
WO2021161769A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
gesture
time point
specific
executed
Prior art date
Application number
PCT/JP2021/002501
Other languages
French (fr)
Japanese (ja)
Inventor
洋祐 加治
哲男 池田
淳 入江
英佑 藤縄
誠史 友永
忠義 村上
健志 後藤
Original Assignee
ソニーグループ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーグループ株式会社 filed Critical ソニーグループ株式会社
Publication of WO2021161769A1 publication Critical patent/WO2021161769A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion

Definitions

  • This disclosure relates to information processing devices, information processing methods, and programs.
  • In an AR (Augmented Reality) system or the like, when the execution of a registered gesture is detected, a function corresponding to the registered gesture is provided. For example, a system has been proposed in which the display of CG (Computer Graphics) created by an information processing system is changed according to the movement of the user's hand.
  • With the development of AR systems, the functions that an AR system can provide keep increasing, so the method of selecting among the provided functions is important. For example, if a large number of functions were to be recognized only from gestures of the user's hand, a correspondingly large number of gestures would be required, which is not preferable from the viewpoint of usability. It is therefore preferable to make a large number of functions selectable by combining a specific object with a human gesture, that is, by making gestures that use a specific object available.
  • Therefore, the present disclosure provides an information processing device and the like that address a usability problem of an information processing system that provides functions according to registered gestures.
  • The information processing device according to one aspect of the present disclosure includes a first analysis unit, a second analysis unit, a determination unit, and an execution unit.
  • The first analysis unit detects a specific part of the user captured in the input image and analyzes the motion of the specific part.
  • The second analysis unit detects a specific object captured in the input image and analyzes the motion of the specific object.
  • The determination unit determines, based on the motion of the specific part and the motion of the specific object, whether or not a gesture related to the specific part and the specific object has been executed. When it is determined that the gesture has been executed, the execution unit executes a function corresponding to the gesture.
  • The second analysis unit may determine the success or failure of the analysis of the motion of the specific object at a first time point, and, when it is determined that the analysis of the motion of the specific object at the first time point has failed, may modify the analysis result of the motion at the first time point.
  • The second analysis unit may modify the analysis result of the motion at the first time point based on at least one of the analysis result of the motion at a second time point before the first time point and the analysis result of the motion at a third time point after the first time point.
  • The second analysis unit may predict the motion at the first time point based on the analysis result of the motion at a time point before the first time point, and may determine the success or failure of the analysis of the motion at the first time point based on the prediction result of the motion at the first time point and the analysis result of the motion at the first time point.
  • The second analysis unit may correct the analysis result of the motion at the first time point based on the prediction result.
  • The execution unit may output an image including an image object corresponding to the executed gesture together with the execution of the function.
  • The second analysis unit may predict a first region of the detected object at the first time point based on the transition of the position of the detected object, and the execution unit may display the image object corresponding to the executed gesture in an area other than the first region at the first time point.
  • The second analysis unit may predict a second region of the detected specific part at the first time point based on the transition of the position of the detected specific part, and the execution unit may display the image object corresponding to the executed gesture in an area other than the second region at the first time point.
  • The execution unit may adjust the display position of the image object corresponding to the executed gesture according to the movement of the user.
  • The first analysis unit may estimate the position of the user at the first time point based on the transition of the position of the detected specific part, and the execution unit may display the image object corresponding to the executed gesture in the correct vertical orientation when viewed from the user's estimated position.
  • The execution unit may adjust the display direction of the image object corresponding to the executed gesture according to the movement of the user.
  • The execution unit may display images related to the plurality of gestures and images related to the plurality of functions when the determination unit determines that a predetermined gesture has been executed.
  • The determination unit may recognize the selected gesture and function based on the transition of the specific part and the positions of the displayed images, and the selected function may be executed when it is determined that the selected gesture has been executed.
  • The execution unit may indicate a registration area for registering a new gesture when the determination unit determines that a predetermined gesture has been executed, and the first analysis unit may detect a specific part included in the registration area, the motion of the detected specific part being set as one of the plurality of gestures.
  • An information processing method is provided that includes: a step of detecting a specific part of the user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • A program is provided that causes a computer to execute the above steps.
  • A storage medium is provided in which a program that causes a computer to execute the above steps is stored.
  • A diagram explaining the registered contents of registered gestures. A diagram showing a registered gesture using a pen-shaped specific object.
  • A diagram showing another registered gesture and corresponding function using a rectangular-parallelepiped specific object. A diagram showing a registered gesture using a cylindrical specific object.
  • A flowchart of the function call processing by a gesture. A flowchart of the analysis process. A flowchart of the image display process.
  • FIG. 1 is a diagram showing a configuration example of an information processing system according to an embodiment of the present disclosure.
  • the information processing system 1000 is composed of a distance image generation device 100, an information processing device 200, a projector 300, and a projected object 400.
  • the information processing system 1000 of the present embodiment is a system that recognizes the execution of the registered gesture and provides a function according to the registered gesture.
  • the information processing system 1000 is an AR (Augmented Reality) system for displaying an image.
  • However, the function to be executed does not have to be related to image processing; for example, control of an electric device managed by the information processing system 1000 may simply be executed.
  • a table is shown as the projected object 400, and an image is projected on the upper surface of the table.
  • the entire image displayed by the information processing system 1000 is referred to as the entire image 500.
  • the surface of the projected object 400 on which the entire image 500 is displayed is referred to as a projected surface.
  • In the present disclosure, the term "image" is a concept that includes both still images and moving images. Therefore, "image" in the present disclosure may be replaced with a still image or a moving image if there is no particular problem; that is, the image displayed by the information processing system 1000 may be a moving image or a still image. The concept of "video" is also included in "image". Further, the whole image 500 may be an image called a stereoscopic image, a 3D image, or the like, which can give the viewer a sense of three dimensions.
  • Gestures and functions corresponding to the gestures are registered in advance in the information processing system 1000. Further, the user who uses the information processing system 1000 recognizes the registered gesture and the corresponding function, and conveys the function to be called to the information processing system 1000 by the gesture. The information processing system 1000 recognizes the gesture of the user and changes the display content of the entire image 500 according to the recognized gesture.
  • FIG. 2 is a diagram illustrating a usage pattern of the information processing system 1000.
  • FIG. 2 shows users 601, 602, and 603 who use the information processing system 1000. Further, a pen-shaped object 701 held by the user 601 is shown. Also shown are a rectangular parallelepiped 702, a cylinder 703, and a laptop PC 704 resting on the projected object 400.
  • a plurality of depictions are displayed in the entire image 500.
  • A depiction in the whole image 500, in other words a partial image displayed in a part of the whole image 500, is referred to as an image object.
  • image objects 501, 502, and 503 such as memo paper and an image object 504 representing a time of "3:47" are shown.
  • the image object 504 representing the time is projected on the upper surface of the cylinder 703.
  • the action of touching the image object of the entire image 500 is also recognized as a gesture.
  • the pre-registered process corresponding to the gesture is performed.
  • It is possible to execute processing on the touched image object. For example, a process such as enlarging, reducing, or erasing the touched image object 503, or changing the image content related to the image object 503, can be executed.
  • it is possible to execute a process such as displaying a menu image of the information processing system 1000. In this way, the entire image 500 can be used as a so-called virtual touch screen.
  • Generally, gestures are represented only by specific parts such as the user's hands and fingers, but in the present embodiment, gestures using a specific object registered in advance in the information processing system 1000 can also be used.
  • the user 601 is holding the pen-shaped object 701, and such a posture of the user can also be recognized as a gesture.
  • a process such as displaying a solid line corresponding to the transition of the position of the tip of the object 701 (in other words, the locus of the tip) on the image object 501 can be executed.
  • Strictly speaking, the human body is also an object, but in order to distinguish a gesture using only a specific part of the user from a gesture using a specific object, the human body is not included in the "specific object" in the present disclosure. That is, a gesture using a specific object is a gesture using an object other than the human body; a gesture using only the hand, a gesture using a combination of the eyes and fingers, and the like are not included among gestures using a specific object.
  • the specific object is not particularly limited except for the human body, and its shape is not limited.
  • the information processing system 1000 of the present embodiment can assign a plurality of instruction contents to a specific object. That is, it is possible to register a plurality of gestures using the same specific object. For example, when a user grasps and moves a pen-shaped object, if the tip of the object is directed toward the entire image 500, a solid line is drawn in the overall image 500, and the rear end of the object is directed toward the entire image 500. If so, a process of erasing the solid line in the entire image 500 may be performed according to the transition of the position of the rear end. That is, a gesture using a pen-shaped object can provide a pen function of drawing a solid line and an eraser function of erasing the drawn solid line. In this way, even gestures using the same specific object can call different functions. From the viewpoint of usability, it is preferable that a function intuitively imagined by the user from a specific object is associated with the specific object, and this embodiment makes it possible.
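
The paragraph above describes registering multiple gestures, and thus multiple functions, for the same specific object. The following is a minimal sketch of how such a gesture registry could be structured; all names and the data format are illustrative assumptions, not the format used by the disclosed system.

```python
# Hypothetical gesture registry: the same pen-shaped specific object appears
# twice with different object motions, so it can call both a "draw" function
# and an "erase" function.
from dataclasses import dataclass
from typing import Callable

@dataclass
class RegisteredGesture:
    specific_object: str              # e.g. "pen", "cuboid", "cylinder"
    part_motion: str                  # expected motion of the user's specific part
    object_motion: str                # expected motion of the specific object
    function: Callable[[], None]      # function called when the gesture is recognized

GESTURE_REGISTRY = [
    RegisteredGesture("pen", "pen-holding posture", "tip toward image",
                      lambda: print("draw a solid line")),
    RegisteredGesture("pen", "pen-holding posture", "rear end toward image",
                      lambda: print("erase the drawn line")),
]
```
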
  • the distance image generation device 100 captures a region in which the gesture is performed and generates a distance image (depth map) related to the region.
  • As the distance image generation device 100, a known device may be used. For example, it includes an image sensor, a distance measuring sensor, and the like, and generates a distance image from their outputs.
  • the image sensor includes an RGB camera and the like.
  • Examples of the distance measuring sensor include a stereo camera, a TOF (Time of Flight) camera, and a structured light camera.
  • the information processing device 200 recognizes the gesture by the user based on the distance image generated by the distance image generation device 100.
  • the recognized gesture may include a gesture using an object.
  • the whole image 500 is generated based on the recognized gesture.
  • the display position of the image object displayed on the entire image 500 is adjusted in order to further improve the usability. As a result, the entire image 500 can be easily viewed by the user. Details will be described later together with the components of the information processing apparatus 200.
  • the projector 300 outputs the entire image 500 generated by the information processing device 200.
  • the projector 300 is installed above the table, an image is projected downward from the projector 300, and the entire image 500 is displayed on the upper surface of the table.
  • the display destination of the entire image 500 is not particularly limited.
  • the information processing apparatus 200 may transmit the entire image 500 to the laptop 704 of FIG. 2 so that the entire image 500 may be displayed on the laptop 704.
  • the entire image 500 may be output to an image display device such as an AR glass or a head-mounted display. That is, instead of the projector 300 and the projected object 400, an image display device may be included in the information processing system 1000.
  • In the present disclosure, in order to clarify the processing performed by the information processing system 1000, an example in which the information processing system 1000 is composed of the above devices is shown; however, these devices may be integrated or further distributed.
  • FIG. 3 is a block diagram showing an example of the internal configuration of the information processing apparatus 200.
  • The information processing apparatus 200 includes a distance image acquisition unit 210, a user analysis unit 220, an object analysis unit 230, a determination unit 240, a function execution unit 250, a user analysis data storage unit 261, an object analysis data storage unit 262, a registered gesture storage unit 263, and an image storage unit 264.
  • the user analysis unit 220 includes a user detection unit 221 and a user motion analysis unit 222.
  • the object analysis unit 230 includes an object detection unit 231 and an object motion analysis unit 232.
  • the function execution unit 250 includes an image acquisition unit 251, a display position determination unit 252, a display direction determination unit 253, and an overall image generation unit 254.
  • the above-mentioned components of the information processing apparatus 200 may be aggregated or further dispersed.
  • Although a plurality of storage units that store the data used by each component (the user analysis data storage unit 261, the object analysis data storage unit 262, the registered gesture storage unit 263, and the image storage unit 264) have been described, these storage units may be composed of one or more memories or storages, or a combination thereof.
  • components and functions not shown or described in the present disclosure may also be present in the information processing apparatus 200.
  • the distance image acquisition unit 210 acquires a distance image from the distance image generation device 100 and transmits it to the user analysis unit 220 and the object analysis unit 230.
  • the distance image acquisition unit 210 may perform preprocessing such as threshold processing on the distance image in order to improve the accuracy of the detection executed by the user analysis unit 220 and the object analysis unit 230.
  • the user analysis unit 220 detects a specific part of the user captured in the distance image and analyzes the operation of the specific part.
  • the data used for the processing of the user analysis unit 220 is stored in advance in the user analysis data storage unit 261.
  • data for detecting a specific part such as a template image of a specific part is stored in the user analysis data storage unit 261.
  • the specific part may be a part that can be used for gestures such as a finger, a hand, an arm, a face, and an eye, and is not particularly limited.
  • The user detection unit 221 of the user analysis unit 220 detects the user's specific part and its region captured in the distance image.
  • As the detection method, a known method may be used.
  • the area of the specific part can be detected by performing block matching between the template image related to the specific part stored in the user analysis data storage unit 261 and the distance image.
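
As a concrete illustration of the block matching mentioned above, the following sketch matches a stored template against the input image and returns the detected region. It is a simplified example using OpenCV's template matching, and the threshold value is an assumption.

```python
import cv2
import numpy as np

def detect_region(image: np.ndarray, template: np.ndarray, threshold: float = 0.8):
    """Return (x, y, w, h) of the best template match, or None if matching fails."""
    result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None                   # the specific part or object was not detected in this frame
    h, w = template.shape[:2]
    return (max_loc[0], max_loc[1], w, h)
```
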
  • the user movement analysis unit 222 of the user analysis unit 220 obtains the position, posture, and transition of the specific part as the detected movement of the specific part.
  • FIG. 4 is a diagram for explaining the flow of analysis of a specific part.
  • the user's hand is shown as the specific site 611.
  • the user motion analysis unit 222 plots a plurality of points 621 in the region of the specific portion. The points plotted in the area of the specific part are described as feature points.
  • the method of plotting feature points is not particularly limited. For example, it may be determined based on a skeleton model generation method generally used in the technique of gesture recognition. In addition, features of specific parts such as nails, wrinkles, moles, hairs, and joints may be plotted as feature points 621.
  • the user motion analysis unit 222 detects each position of the feature point 621. By using the distance image, the position in the depth direction with respect to the projection surface can also be detected. That is, the three-dimensional position of each feature point 621 can be obtained.
  • the user motion analysis unit 222 performs plane fitting on the obtained feature points 621.
  • a method such as the least squares method or the RANSAC method can be used.
  • the plane 631 related to the specific portion is obtained.
  • the posture of the specific part is obtained based on the plane 631 related to the specific part.
  • a projection plane or the like is defined in advance as a reference plane, and the inclination of the plane 631 with respect to the reference plane is set as the posture of the specific portion.
  • The inclination of the plane can be represented, for example, by the angle differences (pitch, yaw, and roll) between the three-dimensional axes of the plane 631 shown in the lower part of FIG. 4 and the three-dimensional axes based on the reference plane.
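
The plane fitting and posture estimation described above can be sketched as follows. This is a minimal least-squares (SVD) example under the assumption that the reference plane is z = 0, and only the tilt of the fitted normal is computed.

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """points: (N, 3) feature-point positions. Returns (centroid, unit normal) of the fitted plane."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                          # direction of least variance = plane normal
    return centroid, normal / np.linalg.norm(normal)

def plane_tilt_deg(normal: np.ndarray):
    """Tilt of the fitted plane relative to the reference plane z = 0, as two angles in degrees."""
    nx, ny, nz = normal
    pitch = np.degrees(np.arctan2(nx, nz))
    roll = np.degrees(np.arctan2(ny, nz))
    return pitch, roll
```
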
  • The user motion analysis unit 222 calculates the position and posture of the specific part for each distance image (for each frame, if the distance image is a moving image), and then calculates the differences between consecutive results in the time series. That is, the transition is obtained from the difference between the analysis result based on the distance image at a first time point and the analysis result based on the distance image at a second time point after the first time point. If the feature points cannot be distinguished from one another, the correspondence between feature points before and after in the time series may be estimated using a search method or the like, or, based on the time interval at which the distance images were taken, feature points whose positions change little may be associated with each other.
  • The user analysis unit 220 may store the obtained feature points as a history in the user analysis data storage unit 261 and, based on the history, confirm whether the newly detected specific part has been analyzed before. In this way, user identification may be performed based on the arrangement of the feature points.
  • the object analysis unit 230 detects the object captured in the distance image and analyzes the operation of the object.
  • the data used for the processing of the object analysis unit 230 is stored in advance in the object analysis data storage unit 262.
  • data for detecting an object such as a template image of an object, is stored in the object analysis data storage unit 262.
  • the object analysis unit 230 may analyze only the specific object used for the gesture, or may analyze an object other than the specific object.
  • the specific object related to the registered gesture can be recognized based on the data related to the registered gesture stored in the registered gesture storage unit 263.
  • the object analysis unit 230 may match only a specific object related to the registered gesture among the objects that can be analyzed by itself. As a result, effects such as reduction of processing load and increase of processing speed can be obtained. Further, as will be described later, in order to determine the display position of the image object, not only the specific object related to the registered gesture but also all the objects that can be analyzed by itself may be analyzed.
  • The object detection unit 231 of the object analysis unit 230 detects the object and its region captured in the distance image in the same manner as the user detection unit 221. The object captured in the distance image does not have to be identified; for example, only the shape of the object, such as a pen, a rectangular parallelepiped, or a cylinder, may be recognized.
  • the object motion analysis unit 232 of the object analysis unit 230 obtains the position, posture, and transition of the detected object as the motion of the detected object in the same manner as the user motion analysis unit 222.
  • As described above, the motion of the specific part of the user and the motion of the specific object are analyzed, but these motion analyses may fail.
  • For example, the user's hand may hide the specific object so that the specific object does not appear in the distance image. In that case, the specific object is not detected, and it is erroneously recognized that the motion of the specific object has ended even though it is actually continuing.
  • the user motion analysis unit 222 and the object motion analysis unit 232 may perform verification of the analysis result, reanalysis of the motion, and the like. That is, the success or failure of the analysis of the motion of the specific object at a certain point in time may be determined, and when it is determined that the analysis of the motion of the specific object at that time has failed, the analysis result of the motion at that time may be modified.
  • The analysis result of the motion at the first time point may be modified based on at least one of the analysis result of the motion at a second time point before the first time point and the analysis result of the motion at a third time point after the first time point. For example, when the position and posture fluctuate abruptly, it may be determined that the analysis has failed, and the abruptly fluctuating position and posture may be corrected by interpolation based on the values before and after in the time series.
  • Further, the user motion analysis unit 222 and the object motion analysis unit 232 may predict future positions and postures based on the transitions of the positions and postures so far. For example, the position of a feature point at an (N+1)-th time point may be predicted based on the positions of the feature point at the first to N-th time points (N is an integer of 1 or more).
  • the prediction result is compared with the estimation result based on the actual detection described above, and if the error is larger than the predetermined threshold value, it can be determined that the estimation by the detection has failed. If it is determined that the estimation has failed, the prediction result may be used, or as described above, correction may be made based on the preceding and succeeding estimation results.
  • the prediction method may be, for example, calculating the velocity and acceleration of the feature points and making a prediction based on the calculated speeds and accelerations.
  • the position of the feature point at the first to Nth time points may be input to the estimation model based on the neural network, and the position of the feature point at the N + 1th time point may be output.
  • The estimation model can be generated by performing known deep learning based on training input data indicating the positions of the feature points at the first to N-th time points and ground-truth data indicating the actual positions of the feature points at the (N+1)-th time point.
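
As a concrete illustration of the verification described above, the following sketch predicts the current position from the two preceding frames under a constant-velocity assumption and falls back to the prediction when the detected position deviates too much; the threshold value is an illustrative assumption.

```python
import numpy as np

def verify_position(prev2: np.ndarray, prev1: np.ndarray,
                    detected: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Return the accepted position for the current frame (detected or predicted)."""
    predicted = prev1 + (prev1 - prev2)        # constant-velocity prediction
    if np.linalg.norm(detected - predicted) > threshold:
        return predicted                       # analysis judged to have failed; use the prediction
    return detected                            # detection accepted
```
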
  • the determination unit 240 determines whether or not the gesture related to at least one of the specific part and the specific object has been executed based on at least one of the movement of the specific part and the movement of the specific object.
  • the data used for the determination is stored in advance in the registered gesture storage unit 263.
  • For example, the determination unit 240 compares the motion of the specific part related to a registered gesture with the analyzed motion of the specific part and calculates a match rate.
  • Similarly, the motion of the specific object related to the registered gesture is compared with the analyzed motion of the specific object, and a match rate is calculated. Whether or not the gesture has been executed may then be determined based on the two match rates; for example, it may be determined that the registered gesture has been performed when each match rate exceeds its respective threshold value.
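
The determination step just described can be sketched as follows. The similarity function is left abstract, the registry entries are assumed to expose part_motion and object_motion attributes, and the threshold values are assumptions; the sketch only illustrates the two-threshold decision.

```python
def find_executed_gesture(part_motion, object_motion, registry,
                          similarity, part_th: float = 0.7, obj_th: float = 0.7):
    """Return the first registered gesture whose part and object match rates both exceed their thresholds."""
    for gesture in registry:
        part_rate = similarity(part_motion, gesture.part_motion)
        obj_rate = similarity(object_motion, gesture.object_motion)
        if part_rate >= part_th and obj_rate >= obj_th:
            return gesture                 # registered gesture judged to have been executed
    return None                            # no registered gesture matched
```
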
  • FIG. 5 is a diagram for explaining the registered contents of the registered gesture.
  • the specific object to be used, the functional classification on the AR system, the operation of the specific part constituting the registered gesture, the operation of the specific object, and the function to be called are shown.
  • In the example of FIG. 5, the specific part is the hand.
  • In practice, numerical values indicating the transitions of the position and posture are registered.
  • FIG. 6 shows the details of the first to third registered gestures of FIG. 5.
  • FIG. 6(A) shows the first registered gesture of FIG. 5.
  • The hand 611, which is the specific part, maintains a pen-holding posture, that is, a state in which a pen is held, and the pen-shaped object 701, which is the specific object, is maintained in a state where its tip is close to the projection surface, that is, faces downward.
  • FIG. 6(B) shows the second registered gesture of FIG. 5.
  • FIG. 6C shows the third registered gesture of FIG. 5, and when the gesture is recognized, it is shown that the pointer 505, which is an image object, is displayed in the entire image 500.
  • FIG. 7 shows the fourth registered gesture of FIG. 5.
  • When the rear end of the pen-shaped object 701 is within a predetermined distance from the thumb (that is, in a close state) and the thumb is bent and stretched, image objects 506A, 506B, and 506C having different colors are displayed in the entire image 500 in order each time the thumb is bent and stretched (that is, the color of the image object changes). For example, the color of the line in the above-mentioned "drawing a line" function can be changed.
  • FIG. 8 shows the fifth registered gesture of FIG. 5.
  • It is shown that image objects 507A and 507B having different sizes are displayed in order in the entire image 500 (that is, the size of the image object changes).
  • the line thickness in the above-mentioned "drawing line” function can be changed.
  • FIG. 9 shows the sixth registered gesture of FIG. 5. A gesture is shown in which the hand 611 moves parallel to the projection plane and one corner of the rectangular parallelepiped 702 moves parallel to the projection plane while being kept closer to the projection plane than the other corners.
  • When the gesture is executed, as shown in FIG. 5, the function of erasing a line in the whole image 500 is called.
  • FIG. 10(A) shows the seventh registered gesture of FIG. 5. A gesture is shown in which the hand 611 moves parallel to the projection plane and one side of the rectangular parallelepiped 702 moves parallel to the projection plane while being kept close to the projection plane.
  • When the gesture is recognized, the size (scale) of the entire image 500 is changed as shown in FIG. 10(B).
  • FIG. 11(A) shows the eighth registered gesture of FIG. 5.
  • A gesture is shown in which the contact surface of the rectangular parallelepiped 702 with the projection surface changes regardless of the motion of the specific part.
  • In this way, a gesture that does not include one of the motion of a specific part and the motion of a specific object may also be registered.
  • When the gesture is recognized, the entire image 500 is switched as shown in FIG. 11(B).
  • FIG. 12 shows the 9th to 11th registered gestures of FIG. 5.
  • FIG. 12(A) shows the ninth registered gesture; when this gesture is recognized, an image object called a stamp is displayed in the entire image 500.
  • FIG. 12(B) shows the tenth registered gesture; when this gesture is recognized, an image object 504 called a timer, representing the time as shown in FIG. 2, is displayed in the entire image 500.
  • FIG. 12C shows the eleventh registered gesture, and when this gesture is recognized, the stamp is switched to another stamp.
  • FIG. 13(A) shows the twelfth registered gesture of FIG. 5.
  • When the gesture is recognized, the whole image 500 is rotated; for example, as shown in FIG. 13(B), the orientation of the image objects displayed in the whole image 500 is turned upside down.
  • the motion analysis unit may recognize the motion of the specific object based on the analyzed motion of the specific part.
  • For example, when the specific part is analyzed to have rotated, the object motion analysis unit 232 may override its own analysis result and regard the cylinder 703 as having rotated even if the cylinder 703 was analyzed to be stationary. Further, it may be determined whether or not the specific part is in contact with the cylinder 703, and only when the specific part is in contact may the cylinder 703 be determined to have rotated in the same manner as the specific part. In this way, the motion of the specific object may be recognized based on the motion of the specific part.
  • the contact between the specific part and the specific object can be recognized by whether the plane related to the specific part intersects any surface of the specific object.
  • Conversely, the user motion analysis unit 222 may recognize the motion of the specific part based on the motion of the specific object. Further, the determination unit 240 may re-recognize the motion of one of the specific object and the specific part based on the motion of the other.
  • In this way, the motion of the specific part and the motion of the specific object that constitute a registered gesture are determined in advance and stored as data, and the determination unit 240 searches the data for a registered gesture that matches the combination of the detected motion of the specific part and the detected motion of the specific object. When a registered gesture is found, the function corresponding to that registered gesture is identified.
  • the function execution unit 250 executes the function corresponding to the executed gesture.
  • the image acquisition unit 251 acquires an image object corresponding to the function to be executed from the image storage unit 264.
  • the whole image generation unit 254 generates the whole image 500 including the acquired image object.
  • a plurality of image objects may be displayed in the entire image 500. In such a case, it is preferable to determine the display position so that the plurality of displayed image objects do not overlap.
  • an appropriate display position and display direction of the image object are determined.
  • the display position determination unit 252 detects an area (that is, an empty space) in which the image is not displayed in the current overall image 500. The detection may be performed by recording the display position of the image object and detecting based on the recording. The detected empty space is a candidate for the display position of the image object. The display position determination unit 252 selects one of the detected empty spaces based on the size of the newly displayed image object, the position of the specific portion related to the gesture, and the like. As a result, the display position of the newly displayed image object is determined.
  • the display position determination unit 252 may detect a space that cannot be actually used among the empty spaces, and may exclude the space that cannot be actually used from the candidates for the display position of the image object. For example, when the projected object 400 is a table as in the example of FIG. 2, it is conceivable that the object placed on the table overlaps with the entire image 500. In such a case, if the image object is projected in the same place as the object placed on the table, the image object becomes difficult to see. Therefore, the display position determination unit 252 may further narrow down the candidates for the display position of the image object based on the detected position of the object.
  • Further, the object captured in the distance image is not always stationary.
  • Therefore, the display position determination unit 252 may further narrow down the display position candidates of the image object based on the predicted positions of the detected specific part and the detected object at the time when the image object is displayed. As a result, it is possible to prevent a situation in which image objects in the entire image 500 overlap with the user's hand, an object placed on the projection surface, or the like.
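
A minimal sketch of this free-space search follows, assuming the whole image is divided into a coarse grid and that the current image objects and the predicted regions of hands and objects are supplied as occupied boxes; the grid representation is an assumption for illustration.

```python
import numpy as np

def choose_display_position(grid_shape, occupied_boxes, obj_w, obj_h):
    """occupied_boxes: iterable of (x, y, w, h) in grid cells. Returns a free (x, y) or None."""
    occupied = np.zeros(grid_shape, dtype=bool)          # (rows, cols), True = cannot be used
    for x, y, w, h in occupied_boxes:
        occupied[y:y + h, x:x + w] = True
    rows, cols = grid_shape
    for y in range(rows - obj_h + 1):
        for x in range(cols - obj_w + 1):
            if not occupied[y:y + obj_h, x:x + obj_w].any():
                return x, y                               # first empty space large enough
    return None                                           # no suitable empty space
```
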
  • the display position determination unit 252 may use the predicted position of the specific object as the display position of the image object.
  • the display direction determination unit 253 determines the display direction of the image object. For example, when the user analysis unit 220 recognizes the user's face captured in the distance image, the display direction of the image object is determined based on the position and orientation of the face. For example, the correct direction (for example, up and down) is determined in advance for the image object, and the display direction determination unit 253 displays the image object in the correct up and down direction when viewed from a certain position of the user's face. As a result, when displaying characters and the like, the characters can be displayed in a direction that is easy for the user to see.
  • the position and orientation of the user's face may be estimated.
  • the distance image may show the user's hand but not the user's face.
  • the position of the user's face outside the range of the distance image is estimated from a part of the user's hand or the like.
  • Since the user analysis unit 220 analyzes the motion of the user's hand, the direction in which the hand is inserted into the image area can be specified.
  • Therefore, the display direction determination unit 253 may regard the user's face as being located in the direction in which the hand is inserted into the image area.
  • the user analysis unit 220 may generate a skeleton model of the hand and estimate the position of the face in consideration of the orientation of the fingers specified by the skeleton model.
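
As an illustration of determining the display direction from the estimated face position, the following sketch computes the rotation that makes an image object read upright from the user's viewpoint; the 2-D coordinate convention is an assumption.

```python
import numpy as np

def display_rotation_deg(object_pos: np.ndarray, face_pos: np.ndarray) -> float:
    """Rotation (degrees) so that the image object's bottom edge faces the user's estimated face position."""
    away_from_user = object_pos - face_pos        # the object's "up" direction should point away from the user
    return float(np.degrees(np.arctan2(away_from_user[0], away_from_user[1])))
```
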
  • the image object to be displayed, its display position, and its display direction are determined, and the overall image generation unit 254 generates the entire image 500 according to these determination items.
  • The generation of the entire image 500 may be performed in the same manner as a general CG (Computer Graphics) production method.
  • FIG. 14 is a diagram showing a preferable display example of the image object.
  • a gesture is performed in which the hand 611, which is a specific part, touches the cylinder 703, and the menu image 508 of the AR system is displayed by the gesture.
  • Other image objects are also displayed in the whole image 500.
  • The hands 611, 612, and 613 of the users are present in the air above the entire image 500.
  • objects 705, 706 and 707 are present on the projection surface.
  • the menu image 508 is displayed so as not to overlap with these.
  • Although characters are shown in the menu image 508, they are displayed so that the user of the hand 611 who performed the gesture can read them. In this way, an entire image 500 with excellent usability is generated by the processing of the display position determination unit 252 and the display direction determination unit 253.
  • the correspondence between gestures and functions was predetermined, but from the viewpoint of usability, it is preferable to be able to customize the correspondence. Therefore, it is preferable that a gesture that calls a function for changing the correspondence is registered.
  • the menu image 508 shown in FIG. 14 includes an icon indicating a registered gesture and an icon indicating a callable function. Then, the user's tap may be detected and the gesture and function related to the tapped icon may be associated with each other.
  • a gesture that calls the function for registering a new gesture may be registered.
  • For example, the pointer 505 shown in FIG. 6(C) may be displayed, and while the pointer 505 is displayed, a gesture performed in the air above the pointer 505 may be newly registered.
  • That is, the gesture performed in the air above the pointer 505 is analyzed by the user analysis unit 220 and the object analysis unit 230, and the analyzed motion of the specific part and motion of the object may be stored in the registered gesture storage unit 263 as the contents of the new registered gesture.
  • the registered gesture can be customized for each user.
  • FIG. 15 is a flowchart of a function call process by a gesture.
  • The distance image acquisition unit 210 acquires a distance image (S101) and transmits it to the user analysis unit 220 and the object analysis unit 230.
  • the user analysis unit 220 detects a specific part of the user reflected in the distance image and analyzes the operation of the specific part (S102).
  • the object analysis unit 230 also detects the object reflected in the distance image and analyzes the motion of the object (S103). The internal flow of these analysis processing processes will be described later.
  • The determination unit 240 determines whether or not a registered gesture has been executed based on the analyzed motions of the specific part and the object (S104). If no registered gesture has been executed (NO in S105), the function execution unit 250 maintains the current function (S106); the flow then ends and the entire image 500 does not change. On the other hand, when a registered gesture has been executed, the function execution unit 250 executes the function related to the executed gesture (S107). That is, the function is switched, and the image object related to the function in the entire image 500 changes.
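
A hedged sketch of this per-frame flow is shown below, using illustrative component names modeled on the units described above rather than any actual API of the system.

```python
def process_frame(distance_image, user_analyzer, object_analyzer, determiner, executor):
    part_motion = user_analyzer.analyze(distance_image)         # S102: specific-part motion
    object_motion = object_analyzer.analyze(distance_image)     # S103: object motion
    gesture = determiner.determine(part_motion, object_motion)  # S104: registered gesture or None
    if gesture is None:
        executor.maintain_current_function()                    # S106: whole image unchanged
    else:
        executor.execute(gesture)                               # S107: switch to the gesture's function
```
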
  • FIG. 16 is a flowchart of the analysis process. Since the flow is the same in the user analysis unit 220 and the object analysis unit 230, the user detection unit 221 and the object detection unit 231 are collectively referred to as the detection unit, and the user motion analysis unit 222 and the object motion analysis unit 232 are collectively referred to as the motion analysis unit.
  • the detection unit attempts to detect a specific part or object captured in a distance image at a plurality of predetermined points in time.
  • If detection fails at some time point, the motion analysis unit estimates the motion at that time point based on the motion at other time points and the like.
  • That is, the motion analysis unit analyzes the motion based on the transition of the detected position and posture, the previous analysis results, the motion of the other target (that is, the motion of the object in the case of user analysis, and the motion of the specific part in the case of object analysis), and the like (S204). Thereby, even if detection or analysis fails at some of the plurality of time points, the motion at the plurality of time points can be estimated. It is also possible to detect the rotational motion of a rotationally symmetric specific object. This prevents erroneous detection of motion.
  • FIG. 17 is a flowchart of the image display process. This flow can be performed in S106 and S107 of the flow of the function call processing by the gesture.
  • the function execution unit 250 acquires information such as an image object corresponding to a gesture, a detected specific part, and a detected object (S301).
  • The information on the image object may be transmitted from the determination unit 240, or information indicating the executed gesture may be transmitted from the determination unit 240 and the function execution unit 250 may acquire the image object information based on it.
  • the image acquisition unit 251 acquires the image object from the image storage unit 264 (S302). If the gesture is not changed, the image object has already been acquired, and the process may be omitted.
  • the display position determination unit 252 first predicts the positions of the user and the object at the time of output of the image in order to determine the position to display the image (S303). As described above, the user motion analysis unit 222 and the object motion analysis unit 232 may perform the prediction. The display position determination unit 252 confirms the free space of the entire image 500 currently displayed, and further detects the free space after the present time based on the predicted positions of the user and the object (S304). The display position determination unit 252 determines the display position of the image object based on the free space after the present time and the size of the image object (S305). When displaying on a specific object, the position of the specific object is detected instead of the empty space, and the position is determined as the display position.
  • The display direction determination unit 253 confirms whether the position of the user's face has been detected in order to determine the direction in which the image is displayed, and if the position of the face has not been detected (NO in S306), the position of the user's face is estimated based on the detected specific part (S307). When the estimation is completed, or when the position of the face has been detected (YES in S306), the display direction determination unit 253 determines the display direction of the image object based on the position of the user's face (S308).
  • Since the image object, its display position, and its display direction are now known, the whole image generation unit 254 generates and outputs the whole image 500 by fitting the image object into the whole image 500 as determined (S309).
  • the image object corresponding to the gesture can be continuously displayed so as to be easily seen by the user by this flow.
  • the user who performed the gesture may move after the gesture is executed and the image object related to the gesture is displayed. Even in such a case, the display position and display direction of the image object are optimized.
  • a plurality of functions that can be provided in the information processing system 1000 can be assigned to a plurality of gestures using one specific object.
  • the information processing system 1000 recognizes the executed gesture by analyzing the movement of the specific object and the user, and executes the function assigned to the recognized gesture. This eliminates the complexity of switching to another specific object when it is desired to switch the function to be executed, and improves the convenience of the user.
  • the position of the user's face is estimated, and the position, orientation, etc. of the image object to be displayed are adjusted according to the estimated face position. As a result, for example, it is possible to prevent a problem that the characters displayed in the information processing system 1000 are displayed diagonally to the user and the displayed characters are difficult to read.
  • Further, the positions and movements of the user and the objects existing in the whole image 500 are also estimated, so that even if objects in the display space and the user move, the image object can always be displayed in an empty space or on a moving object.
  • the processing of the device according to the embodiment of the present disclosure can be realized by software (program) executed by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like. It should be noted that, instead of executing all the processes of the device by software, some processes may be executed by hardware such as a dedicated circuit.
  • the present disclosure may also have the following structure.
  • [1] An information processing device comprising: a first analysis unit that detects a specific part of the user captured in an input image and analyzes the motion of the specific part; a second analysis unit that detects a specific object captured in the input image and analyzes the motion of the specific object; a determination unit that determines, based on the motion of the specific part and the motion of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and an execution unit that, when it is determined that any of the plurality of gestures has been executed, executes a function corresponding to the executed gesture.
  • [2] The information processing device according to the above [1], wherein the second analysis unit determines the success or failure of the analysis of the motion of the specific object at a first time point, and, when it is determined that the analysis of the motion of the specific object at the first time point has failed, modifies the analysis result of the motion at the first time point.
  • [3] The information processing device according to the above [2], wherein the second analysis unit modifies the analysis result of the motion at the first time point based on at least one of the analysis result of the motion at a second time point before the first time point and the analysis result of the motion at a third time point after the first time point.
  • [4] The information processing device, wherein the second analysis unit predicts the motion at the first time point based on the analysis result of the motion at a time point before the first time point, and determines the success or failure of the analysis of the motion at the first time point based on the prediction result of the motion at the first time point and the analysis result of the motion at the first time point.
  • [5] The information processing device according to the above [4], wherein the second analysis unit corrects the analysis result of the motion at the first time point based on the prediction result.
  • [6] The information processing device according to any one of [1] to [5] above, wherein the execution unit outputs an image including an image object corresponding to the executed gesture together with the execution of the function.
  • [7] The information processing device according to the above [6], wherein the second analysis unit predicts a first region of the detected object at the first time point based on the transition of the position of the detected object, and the execution unit displays the image object corresponding to the executed gesture in an area other than the first region at the first time point.
  • [8] The information processing device according to the above [6], wherein the second analysis unit predicts a second region of the detected specific part at the first time point based on the transition of the position of the detected specific part, and the execution unit displays the image object corresponding to the executed gesture in an area other than the second region at the first time point.
  • [12] The information processing device according to any one of [1] to [11] above, wherein the execution unit displays images related to the plurality of gestures and images related to the plurality of functions, the determination unit recognizes the selected gesture and function based on the transition of the specific part and the positions of the displayed images, and the selected function is executed when it is determined that the selected gesture has been executed.
  • [13] The information processing device according to any one of [1] to [12] above, wherein the execution unit indicates a registration area for registering a new gesture when the determination unit determines that a predetermined gesture has been executed, and the first analysis unit detects a specific part included in the registration area and sets the motion of the detected specific part as one of the plurality of gestures.
  • An information processing method including: a step of detecting a specific part of the user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • A program that causes a computer to execute: a step of detecting a specific part of the user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • A storage medium storing a program that causes a computer to execute: a step of detecting a specific part of the user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • Information processing system 100 Distance image generator 200 Information processing device 210 Distance image acquisition unit 220 User analysis unit 221 User detection unit 222 User motion analysis unit 230 Object analysis unit 231 Object detection unit 232 Object motion analysis unit 240 Judgment unit 250 Function execution 251 Image acquisition unit 252 Display position determination unit 253 Display direction determination unit 254 Overall image generation unit 261 User analysis data storage unit 262 Object analysis data storage unit 263 Registered gesture storage unit 264 Image storage unit 300 Projector 400 Projected object 500 Overall image 501, 502, 503, 504 Image object 505 Pointer image object 506A, 506B, 506C Different color image object 507A, 507B Different size image object 601, 602, 603 User 611, 612, 613 Specific part 621 Feature point 631 Plane 701, 702, 703, 704, 705, 706, 707 Object

Abstract

Provided is an information processing device or the like that addresses problems relating to the usability of an information processing system that provides functions corresponding to registered gestures. The information processing device according to an aspect of the present disclosure is provided with a first analysis unit, a second analysis unit, a determination unit, and an execution unit. The first analysis unit detects a specific part of a user shown in an input image and analyzes the operation of the specific part. The second analysis unit detects a specific object shown in the input image and analyzes the operation of the specific object. On the basis of the operation of the specific part and the operation of the specific object, the determination unit determines whether a gesture related to the specific part and the specific object has been executed. When it is determined that the gesture has been executed, the execution unit executes a function corresponding to the gesture.

Description

Information processing device, information processing method, and program
 The present disclosure relates to an information processing device, an information processing method, and a program.
 In an AR (Augmented Reality) system or the like, when the execution of a registered gesture is detected, a function corresponding to the registered gesture is provided. For example, a system has been proposed in which the display of CG (Computer Graphics) created by an information processing system is changed according to the movement of a user's hand. A system has also been proposed that detects whether or not a specific object is near the user and displays different CG depending on the presence or absence of the specific object even when the same gesture is performed.
Japanese Unexamined Patent Publication No. 2019-060963; Japanese Unexamined Patent Publication No. 2018-000941
 With the development of AR systems, the functions that an AR system can provide are ever increasing. Therefore, the method of selecting the provided function becomes important. For example, if a large number of functions were to be recognized only from gestures of the user's hand, a large number of gestures would be required, which is not preferable from the viewpoint of usability. It is therefore preferable to make a large number of functions selectable by combining a specific object with a human gesture, that is, by making gestures that use a specific object available.
 From the viewpoint of usability, it is preferable that a function that the user intuitively associates with a specific object is assigned to that specific object. It is therefore preferable that a plurality of specific objects are available. In that case, however, detection of whether or not a specific object is near the user cannot recognize which specific object a gesture uses when a plurality of specific objects are near the user.
 Further, if only one function can be assigned to one specific object, the user has to switch to another specific object in order to call another function. This is not preferable from the viewpoint of usability. It is therefore desirable that a plurality of functions can be assigned to each of a plurality of specific objects. In that case, however, it is necessary to accurately distinguish which of the plurality of functions a gesture relates to.
 The present disclosure provides an information processing device and the like that address problems relating to the usability of an information processing system that provides functions corresponding to registered gestures.
 An information processing device according to one aspect of the present disclosure includes a first analysis unit, a second analysis unit, a determination unit, and an execution unit. The first analysis unit detects a specific part of a user captured in an input image and analyzes the operation of the specific part. The second analysis unit detects a specific object captured in the input image and analyzes the operation of the specific object. The determination unit determines, based on the operation of the specific part and the operation of the specific object, whether or not a gesture related to the specific part and the specific object has been executed. When it is determined that the gesture has been executed, the execution unit executes a function corresponding to the gesture.
 Further, the second analysis unit may determine whether the analysis of the operation of the specific object at a first time point has succeeded or failed, and may correct the analysis result of the operation at the first time point when it is determined that the analysis of the operation of the specific object at the first time point has failed.
 Further, the second analysis unit may correct the analysis result of the operation at the first time point based on at least one of the analysis result of the operation at a second time point before the first time point and the analysis result of the operation at a third time point after the first time point.
 Further, the second analysis unit may predict the operation at the first time point based on the analysis result of the operation at a time point before the first time point, and may determine whether the analysis of the operation at the first time point has succeeded or failed based on the prediction result of the operation at the first time point and the analysis result of the operation at the first time point.
 Further, when it is determined that the analysis of the operation of the specific object at the first time point has failed, the second analysis unit may correct the analysis result of the operation at the first time point based on the prediction result.
 Further, the execution unit may output an image including an image object corresponding to the executed gesture together with the execution of the function.
 Further, the second analysis unit may predict a first region of the detected object at a first time point based on the transition of the position of the detected object, and the execution unit may display, at the first time point, the image object corresponding to the executed gesture in a region other than the first region.
 Further, the second analysis unit may predict a second region of the detected specific part at the first time point based on the transition of the position of the detected specific part, and the execution unit may display, at the first time point, the image object corresponding to the executed gesture in a region other than the second region.
 Further, the execution unit may adjust the display position of the image object corresponding to the executed gesture according to the movement of the user.
 Further, the first analysis unit may estimate the position of the user at the first time point based on the transition of the position of the detected specific part, and the execution unit may display the image object corresponding to the executed gesture in a vertically correct orientation as viewed from the estimated position of the user.
 Further, the execution unit may adjust the display direction of the image object corresponding to the executed gesture according to the movement of the user.
 Further, when the determination unit determines that a predetermined gesture has been executed, the execution unit may display each image related to the plurality of gestures and each image related to the plurality of functions, and the determination unit may recognize the selected gesture and function based on the transition of the specific part and the position of each displayed image; when it is determined that the selected gesture has been executed, the selected function may be executed.
 Further, when the determination unit determines that a predetermined gesture has been executed, the execution unit may indicate a registration area for registering a new gesture, and the first analysis unit may detect a specific part included in the registration area and set the operation of the detected specific part as one of the plurality of gestures.
 According to another aspect of the present disclosure, there is provided an information processing method including:
 a step of detecting a specific part of a user captured in an input image and analyzing the operation of the specific part;
 a step of detecting a specific object captured in the input image and analyzing the operation of the specific object;
 a step of determining, based on the operation of the specific part and the operation of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
 a step of executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
 According to another aspect of the present disclosure, there is provided a program executed on a computer, the program including:
 a step of detecting a specific part of a user captured in an input image and analyzing the operation of the specific part;
 a step of detecting a specific object captured in the input image and analyzing the operation of the specific object;
 a step of determining, based on the operation of the specific part and the operation of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
 a step of executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
 There is also provided a storage medium storing a program executed on a computer, the program including:
 a step of detecting a specific part of a user captured in an input image and analyzing the operation of the specific part;
 a step of detecting a specific object captured in the input image and analyzing the operation of the specific object;
 a step of determining, based on the operation of the specific part and the operation of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
 a step of executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
FIG. 1 is a diagram showing a configuration example of an information processing system according to an embodiment of the present disclosure.
FIG. 2 is a diagram explaining a usage form of the information processing system.
FIG. 3 is a block diagram showing an example of the internal configuration of the information processing device.
FIG. 4 is a diagram explaining the flow of analysis of a specific part.
FIG. 5 is a diagram explaining the registered contents of registered gestures.
FIG. 6 is a diagram showing registered gestures using a pen-shaped specific object.
FIG. 7 is a diagram showing a registered gesture using a pen-shaped specific object and the corresponding function.
FIG. 8 is a diagram showing another registered gesture using a pen-shaped specific object and the corresponding function.
FIG. 9 is a diagram showing a registered gesture using a rectangular parallelepiped specific object.
FIG. 10 is a diagram showing a registered gesture using a rectangular parallelepiped specific object and the corresponding function.
FIG. 11 is a diagram showing another registered gesture using a rectangular parallelepiped specific object and the corresponding function.
FIG. 12 is a diagram showing registered gestures using a cylindrical specific object.
FIG. 13 is a diagram showing a registered gesture using a cylindrical specific object and the corresponding function.
FIG. 14 is a diagram showing a preferable display example of image objects.
FIG. 15 is a flowchart of function call processing by gestures.
FIG. 16 is a flowchart of analysis processing.
FIG. 17 is a flowchart of image display processing.
 Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.
(One Embodiment of the Present Invention)
 FIG. 1 is a diagram showing a configuration example of an information processing system according to an embodiment of the present disclosure. In the example of FIG. 1, the information processing system 1000 includes a distance image generation device 100, an information processing device 200, a projector 300, and a projected object 400.
 The information processing system 1000 of the present embodiment is a system that recognizes the execution of a registered gesture and provides a function corresponding to the registered gesture. In the present disclosure, as a typical example, an example in which the information processing system 1000 is an AR (Augmented Reality) system that displays images will be described. However, the function to be executed does not have to be related to image processing. Control of an electric device managed by the information processing system 1000 (for example, turning a power supply on or off) may simply be executed.
 In the example of FIG. 1, a table is shown as the projected object 400, and an image is projected onto the upper surface of the table. The entire image displayed by the information processing system 1000 is referred to as the entire image 500. The surface of the projected object 400 on which the entire image 500 is displayed is referred to as the projection surface.
 In the present disclosure, the term "image" is a concept that encompasses both still images and moving images. Therefore, "image" in the present disclosure may be replaced with a still image or a moving image as long as no particular problem arises. That is, the image displayed by the information processing system 1000 may be a moving image or a still image. The concept of "video" is also included in "image". Further, the entire image 500 may be an image called a stereoscopic image, a 3D image, or the like, which can make the viewer perceive it as three-dimensional.
 Gestures and the functions corresponding to the gestures are registered in advance in the information processing system 1000. A user who uses the information processing system 1000 knows the registered gestures and the corresponding functions, and conveys the function to be called to the information processing system 1000 by a gesture. The information processing system 1000 recognizes the user's gesture and changes the display content of the entire image 500 according to the recognized gesture.
 FIG. 2 is a diagram explaining a usage form of the information processing system 1000. FIG. 2 shows users 601, 602, and 603 who use the information processing system 1000. A pen-shaped object 701 held by the user 601 is also shown, as are a rectangular parallelepiped 702, a cylinder 703, and a laptop PC 704 placed on the projected object 400. In addition, a plurality of depictions are displayed in the entire image 500. In the present disclosure, a depiction in the entire image 500, in other words, a partial image displayed in a part of the entire image 500, is referred to as an image object. In the example of FIG. 2, image objects 501, 502, and 503 resembling memo paper and an image object 504 representing the time "3:47" are shown. The image object 504 representing the time is projected onto the upper surface of the cylinder 703.
 The user 602 is touching the image object 503 with a finger. The action of touching an image object in the entire image 500 is also recognized as a gesture in this way, and pre-registered processing corresponding to the gesture is performed. For example, processing on the touched image object can be executed, such as enlarging, reducing, or erasing the touched image object 503, or changing the image content of the image object 503. Alternatively, processing on an image object other than the touched one can be executed; for example, a menu image of the information processing system 1000 can be displayed. In this way, the entire image 500 can also be used as a so-called virtual touch screen.
 Some gestures are expressed only by a specific part such as the user's hand or fingers, but in the present embodiment, gestures using a specific object registered in advance in the information processing system 1000 can also be used. For example, the user 601 is holding the pen-shaped object 701, and such a posture of the user can also be recognized as a gesture. For example, processing such as displaying, on the image object 501, a solid line corresponding to the transition of the position of the tip of the object 701 (in other words, the locus of the tip) can be executed.
 In general, the human body is also an object, but in order to distinguish between a gesture using only a specific part of the user and a gesture using a specific object, the human body is not included in the specific object in the present disclosure. That is, a gesture using a specific object is a gesture that also uses an object other than the human body; a gesture using only the hand, a gesture combining the eyes and fingers, and the like are not included in gestures using a specific object. As long as it is not the human body, the specific object is not particularly limited, and neither is its shape.
 Further, the information processing system 1000 of the present embodiment can assign a plurality of instruction contents to one specific object. That is, a plurality of gestures using the same specific object can be registered. For example, when the user grasps and moves a pen-shaped object, a solid line may be drawn in the entire image 500 if the tip of the object faces the entire image 500, and solid lines in the entire image 500 may be erased according to the transition of the position of the rear end if the rear end of the object faces the entire image 500. That is, gestures using a pen-shaped object can provide a pen function of drawing a solid line and an eraser function of erasing a drawn solid line. In this way, different functions can be called even by gestures using the same specific object. From the viewpoint of usability, it is preferable that functions that the user intuitively associates with a specific object are assigned to that specific object, and the present embodiment makes this possible.
 A method of realizing the above will be described. First, the role of each device shown in FIG. 1 will be described. The distance image generation device 100 captures a region in which gestures are performed and generates a distance image (depth map) of the region. A known device may be used as the distance image generation device 100. For example, it includes an image sensor, a distance measuring sensor, and the like, and generates a distance image from their outputs. Examples of the image sensor include an RGB camera. Examples of the distance measuring sensor include a stereo camera, a TOF (Time of Flight) camera, and a structured light camera.
 The information processing device 200 recognizes a gesture by the user based on the distance image generated by the distance image generation device 100. As described above, the recognized gestures may include gestures using an object. The entire image 500 is then generated based on the recognized gesture. In the present embodiment, the display position of an image object displayed in the entire image 500 is adjusted in order to further improve usability. As a result, the entire image 500 can be made easy for the user to view. Details will be described later together with the components of the information processing device 200.
 The projector 300 outputs the entire image 500 generated by the information processing device 200. In the example of FIG. 1, the projector 300 is installed above the table, an image is projected downward from the projector 300, and the entire image 500 is displayed on the upper surface of the table.
 Although the example of FIG. 1 shows an AR system in which the entire image 500 is projected, the display destination of the entire image 500 is not particularly limited. For example, the information processing device 200 may transmit the entire image 500 to the laptop PC 704 of FIG. 2 so that the entire image 500 is displayed on the laptop PC 704. The entire image 500 may also be output to an image display device such as AR glasses or a head-mounted display. That is, an image display device may be included in the information processing system 1000 instead of the projector 300 and the projected object 400.
 In the present disclosure, in order to clarify the processing performed by the information processing system 1000, an example in which the information processing system 1000 is composed of the above devices is shown, but these devices may be integrated or further distributed.
 The internal configuration of the information processing device 200 will be described. FIG. 3 is a block diagram showing an example of the internal configuration of the information processing device 200. In the example of FIG. 3, the information processing device 200 includes a distance image acquisition unit 210, a user analysis unit 220, an object analysis unit 230, a determination unit 240, a function execution unit 250, a user analysis data storage unit 261, an object analysis data storage unit 262, a registered gesture storage unit 263, and an image storage unit 264. The user analysis unit 220 includes a user detection unit 221 and a user motion analysis unit 222. The object analysis unit 230 includes an object detection unit 231 and an object motion analysis unit 232. The function execution unit 250 includes an image acquisition unit 251, a display position determination unit 252, a display direction determination unit 253, and an entire image generation unit 254.
 The above components of the information processing device 200 may be integrated or further distributed. For example, in FIG. 3, in order to clarify the data used in each process, a plurality of storage units that store the data used by the components (the user analysis data storage unit 261, the object analysis data storage unit 262, the registered gesture storage unit 263, and the image storage unit 264) are shown, but these storage units may be composed of one or more memories or storages, or a combination thereof. In addition, components and functions that are not shown or described in the present disclosure may also exist in the information processing device 200.
 The distance image acquisition unit 210 acquires a distance image from the distance image generation device 100 and transmits it to the user analysis unit 220 and the object analysis unit 230. The distance image acquisition unit 210 may perform preprocessing such as threshold processing on the distance image in order to improve the accuracy of the detection executed by the user analysis unit 220 and the object analysis unit 230.
 The user analysis unit 220 detects a specific part of the user captured in the distance image and analyzes the operation of the specific part. The data used for the processing of the user analysis unit 220 is stored in advance in the user analysis data storage unit 261. For example, data for detecting a specific part, such as a template image of the specific part, is stored in the user analysis data storage unit 261. The specific part may be any part that can be used for gestures, such as a finger, hand, arm, face, or eye, and is not particularly limited.
 The specific part detection unit of the user analysis unit 220 detects the specific part of the user captured in the distance image and its region. A known detection method may be used. For example, the region of the specific part can be detected by performing block matching between a template image of the specific part stored in the user analysis data storage unit 261 and the distance image.
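 For reference, the following is a minimal sketch, under assumed names (detect_region, depth, template), of the kind of block matching mentioned above; it is an illustrative brute-force search, not the specific implementation of the present disclosure.

```python
import numpy as np

def detect_region(depth: np.ndarray, template: np.ndarray, threshold: float = 5.0):
    """Return (row, col) of the best template match in the depth map, or None.

    Brute-force sum-of-absolute-differences search; a practical system would
    use image pyramids or a dedicated matcher, but the principle is the same.
    """
    th, tw = template.shape
    best_score, best_pos = np.inf, None
    for r in range(depth.shape[0] - th + 1):
        for c in range(depth.shape[1] - tw + 1):
            score = np.abs(depth[r:r + th, c:c + tw] - template).mean()
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos if best_score < threshold else None
```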
 The user motion analysis unit 222 of the user analysis unit 220 obtains the position and posture of the specific part and their transitions as the operation of the detected specific part.
 FIG. 4 is a diagram explaining the flow of analysis of a specific part. In the example of FIG. 4, the user's hand is shown as the specific part 611. As shown in the upper part of FIG. 4, the user motion analysis unit 222 plots a plurality of points 621 in the region of the specific part. The points plotted in the region of the specific part are referred to as feature points.
 The method of plotting the feature points is not particularly limited. For example, they may be determined based on a skeleton model generation method generally used in gesture recognition techniques. Features of the specific part, such as nails, wrinkles, moles, hairs, and joints, may also be plotted as feature points 621.
 The user motion analysis unit 222 detects the position of each feature point 621. By using the distance image, the position in the depth direction with respect to the projection surface can also be detected. That is, the three-dimensional position of each feature point 621 can be obtained.
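 A minimal sketch of how a pixel plus its depth value might be converted to a three-dimensional position, assuming a pinhole camera model; the intrinsic parameters fx, fy, cx, cy are not given in the present disclosure and would come from calibration of the distance sensor.

```python
def pixel_to_3d(u: float, v: float, depth_value: float,
                fx: float, fy: float, cx: float, cy: float):
    """Back-project pixel (u, v) with its depth into camera coordinates."""
    x = (u - cx) * depth_value / fx
    y = (v - cy) * depth_value / fy
    return (x, y, depth_value)
```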
 The user motion analysis unit 222 performs plane fitting on the obtained feature points 621. For example, a method such as the least squares method or the RANSAC method can be used. As a result, as shown in the middle of FIG. 4, a plane 631 related to the specific part is obtained.
 The posture of the specific part is obtained based on the plane 631 related to the specific part. For example, the projection surface or the like is defined in advance as a reference plane, and the inclination of the plane 631 with respect to the reference plane is taken as the posture of the specific part. The inclination of the plane can be expressed by, for example, pitch, yaw, and roll, which are the angle differences between the three-dimensional axes of the plane 631 as shown in the lower part of FIG. 4 and the three-dimensional axes based on the reference plane.
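 A minimal sketch of the least-squares plane fitting and tilt computation described above, assuming the feature points are given as an N x 3 array and the reference plane is z = 0 (the projection surface); the function names are illustrative and yaw, which needs an in-plane reference axis, is omitted here.

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """Least-squares plane fit. Returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def tilt_angles(normal: np.ndarray):
    """Pitch and roll of the fitted plane relative to the reference plane z = 0."""
    nx, ny, nz = normal
    pitch = np.degrees(np.arctan2(ny, nz))
    roll = np.degrees(np.arctan2(nx, nz))
    return pitch, roll
```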
 In this way, the user motion analysis unit 222 calculates the position and posture of the specific part for each distance image (for each frame if the distance images form a moving image), and then calculates the differences between consecutive time points. That is, the transition is obtained from the difference between the analysis result based on the distance image at a first time point and the analysis result based on the distance image at a second time point after the first time point. If the feature points cannot be distinguished from one another, the correspondence between feature points before and after in the time series may be estimated using a search method or the like, or, considering the time interval at which the distance images are captured, feature points whose positions have changed little may be associated with each other.
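 One simple way to realize the idea of associating feature points whose positions have changed little is a nearest-neighbour assignment between consecutive frames; the following is a sketch under that assumption, with illustrative names and a greedy matching strategy rather than an optimal one.

```python
import numpy as np

def match_feature_points(prev_pts: np.ndarray, curr_pts: np.ndarray,
                         max_move: float = 20.0):
    """Greedy nearest-neighbour association of feature points between frames.

    Returns (prev_index, curr_index) pairs whose displacement is below
    max_move, in the same units as the point coordinates.
    """
    pairs, used = [], set()
    for i, p in enumerate(prev_pts):
        dists = np.linalg.norm(curr_pts - p, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= max_move and j not in used:
            pairs.append((i, j))
            used.add(j)
    return pairs
```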
 When features of the specific part such as nails, wrinkles, moles, and hairs are plotted as feature points, the plotted feature points differ for each user. The user analysis unit 220 may store the assigned feature points as a history in the user analysis data storage unit 261 and, based on the history, confirm whether a newly detected specific part has been analyzed before. In this way, user identification based on the arrangement of feature points may be performed.
 The object analysis unit 230 detects an object captured in the distance image and analyzes the operation of the object. The data used for the processing of the object analysis unit 230 is stored in advance in the object analysis data storage unit 262. For example, data for detecting an object, such as a template image of the object, is stored in the object analysis data storage unit 262.
 The object analysis unit 230 may analyze only the specific objects used for gestures, or may also analyze objects other than the specific objects. For example, the specific objects related to registered gestures can be recognized based on the data on registered gestures stored in the registered gesture storage unit 263. The object analysis unit 230 may perform matching only for the specific objects related to registered gestures among the objects it is capable of analyzing. This provides effects such as a reduced processing load and an increased processing speed. As will be described later, in order to determine the display position of an image object, the object analysis unit 230 may also analyze not only the specific objects related to registered gestures but all objects it is capable of analyzing.
 The object detection unit 231 of the object analysis unit 230 detects an object captured in the distance image and its region in the same manner as the specific part detection unit. Note that what the object captured in the distance image is may not be recognized; for example, only the shape of the object, such as a pen shape, a rectangular parallelepiped, or a cylinder, may be recognized.
 The object motion analysis unit 232 of the object analysis unit 230 obtains the position and posture of the detected object and their transitions as the operation of the detected object, in the same manner as the user motion analysis unit 222.
 In this way, the operation of the user's specific part and the operation of the specific object are analyzed, but these analyses may fail. For example, in the middle of a gesture, the user's hand may hide the specific object so that the specific object is not captured in the distance image. In that case, the specific object is not detected, and its operation is erroneously recognized as having ended even though it is actually continuing.
 In preparation for such a case, the user motion analysis unit 222 and the object motion analysis unit 232 may verify the analysis results, reanalyze the operation, and so on. That is, they may determine whether the analysis of the operation of the specific object at a certain time point has succeeded or failed and, when it is determined that the analysis of the operation at that time point has failed, correct the analysis result of the operation at that time point.
 For example, the analysis result of the operation at a first time point may be corrected based on at least one of the analysis result of the operation at a second time point before the first time point and the analysis result of the operation at a third time point after the first time point. For example, when the position and posture fluctuate abruptly, it may be determined that the analysis has failed, and the abruptly fluctuating position and posture may be corrected by interpolation based on the values before and after in the time series.
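 A minimal sketch of the interpolation-based correction described above, assuming positions are stored per time point as 3-D vectors and that a failed time point is simply replaced by the midpoint of its temporal neighbours; the names and the failure criterion are illustrative assumptions.

```python
import numpy as np

def correct_by_interpolation(positions: list, failed_index: int):
    """Replace the position at failed_index with the average of its neighbours.

    positions: list of np.ndarray (3,) positions ordered in time.
    Falls back to the nearest valid neighbour at the sequence boundaries.
    """
    before = positions[failed_index - 1] if failed_index > 0 else None
    after = positions[failed_index + 1] if failed_index + 1 < len(positions) else None
    if before is not None and after is not None:
        positions[failed_index] = (before + after) / 2.0
    elif before is not None:
        positions[failed_index] = before.copy()
    elif after is not None:
        positions[failed_index] = after.copy()
    return positions
```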
 The user motion analysis unit 222 and the object motion analysis unit 232 may also predict future positions and postures based on the transitions of the positions and postures so far. For example, the positions of the feature points at an (N+1)-th time point may be predicted based on the positions of the feature points at the first to N-th time points (N is an integer of 1 or more). The prediction result is compared with the estimation result based on the actual detection described above, and if the error is larger than a predetermined threshold value, it can be determined that the estimation by detection has failed. When it is determined that the estimation has failed, the prediction result may be used, or correction may be performed based on the preceding and succeeding estimation results as described above.
 As the prediction method, for example, the velocities and accelerations of the feature points may be calculated and the prediction may be based on them. Alternatively, the positions of the feature points at the first to N-th time points may be input to an estimation model based on a neural network, which outputs the positions of the feature points at the (N+1)-th time point. Such an estimation model can be generated by performing known deep learning based on training input data indicating the positions of the feature points at the first to N-th time points and correct answer data indicating the actual positions of the feature points at the (N+1)-th time point.
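 The velocity/acceleration-based prediction and the threshold check can be sketched as follows, assuming uniformly spaced frames, at least three past positions, and an illustrative threshold value.

```python
import numpy as np

def predict_next(history: np.ndarray) -> np.ndarray:
    """Predict the next position from the last three positions (constant acceleration).

    history: array of shape (N, 3) with N >= 3, ordered in time.
    """
    p0, p1, p2 = history[-3], history[-2], history[-1]
    velocity = p2 - p1
    acceleration = (p2 - p1) - (p1 - p0)
    return p2 + velocity + acceleration

def detection_failed(predicted: np.ndarray, detected: np.ndarray,
                     threshold: float = 30.0) -> bool:
    """Judge the detection as failed when it deviates too far from the prediction."""
    return float(np.linalg.norm(predicted - detected)) > threshold
```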
 The determination unit 240 determines whether or not a gesture related to at least one of the specific part and the specific object has been executed, based on at least one of the operation of the specific part and the operation of the specific object. The data used for the determination is stored in advance in the registered gesture storage unit 263. For example, the determination unit 240 compares the operation of the specific part related to a registered gesture with the analyzed operation of the specific part and calculates their matching rate. Similarly, it compares the operation of the specific object related to the registered gesture with the analyzed operation of the specific object and calculates their matching rate. Whether the gesture has been executed may then be determined based on the matching rates. For example, it may be determined that the registered gesture has been performed when each matching rate exceeds its respective threshold value.
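 As an illustration of the matching-rate determination, the sketch below scores an observed trajectory against a registered one with a simple similarity measure and requires both the specific-part score and the specific-object score to exceed their thresholds. The similarity measure, data layout, and threshold values are assumptions for illustration, not the specific measure used in the present disclosure.

```python
import numpy as np

def matching_rate(observed: np.ndarray, registered: np.ndarray) -> float:
    """Similarity in [0, 1] between two equally sampled trajectories of shape (N, 3)."""
    error = np.linalg.norm(observed - registered, axis=1).mean()
    return 1.0 / (1.0 + error)  # 1.0 means identical trajectories

def gesture_executed(part_obs, part_reg, obj_obs, obj_reg,
                     part_threshold=0.8, obj_threshold=0.8) -> bool:
    """The registered gesture is judged executed when both rates exceed their thresholds."""
    return (matching_rate(part_obs, part_reg) > part_threshold and
            matching_rate(obj_obs, obj_reg) > obj_threshold)
```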
 FIG. 5 is a diagram explaining the registered contents of registered gestures. The specific object to be used, the functional classification in the AR system, the operation of the specific part and the operation of the specific object constituting the registered gesture, and the function to be called are shown. In the example of FIG. 5 as well, the specific part is the hand. For the operation of the specific part and the operation of the specific object, numerical values indicating the transitions of the position and posture are actually registered.
 FIGS. 6 to 13 are diagrams showing the registered gestures shown in FIG. 5. FIG. 6 shows the details of the first to third registered gestures of FIG. 5. FIG. 6(A) shows the first registered gesture of FIG. 5. The hand 611, which is the specific part, maintains a state of gripping the pen, called the pen-holding posture, and the pen-shaped object 701, which is the specific object, is maintained with its tip close to the projection surface, that is, kept on the lower side. When this gesture is recognized, as shown in FIG. 5, the function "draw a line" in the functional classification "electronic pen" is called. FIG. 6(B) shows the second registered gesture of FIG. 5. When this gesture is recognized, as shown in FIG. 5, the function of erasing lines in the entire image 500 is called. FIG. 6(C) shows the third registered gesture of FIG. 5; when this gesture is recognized, a pointer 505, which is an image object, is displayed in the entire image 500.
 FIG. 7 shows the fourth registered gesture of FIG. 5. When the thumb is bent and stretched while the rear end of the pen-shaped object 701 is within a predetermined distance from the thumb (that is, close to it), image objects 506A, 506B, and 506C of different colors are displayed in turn in the entire image 500 each time the thumb is bent and stretched (that is, the color of the image object changes). For example, the color of the line in the above-described "draw a line" function can be changed.
 FIG. 8 shows the fifth registered gesture of FIG. 5. Each time the rear end portion of the pen-shaped object 701 is rotated by the fingers, image objects 507A and 507B of different sizes are displayed in turn in the entire image 500 (that is, the size of the image object changes). For example, the thickness of the line in the above-described "draw a line" function can be changed.
 FIG. 9 shows the sixth registered gesture of FIG. 5. A gesture is shown in which the hand 611 moves parallel to the projection surface and the rectangular parallelepiped 702 moves parallel to the projection surface while one of its corners is kept closer to the projection surface than the other corners. When this gesture is executed, as shown in FIG. 5, the function of erasing lines in the entire image 500 is called.
 FIG. 10(A) shows the seventh registered gesture of FIG. 5. A gesture is shown in which the hand 611 moves parallel to the projection surface and the rectangular parallelepiped 702 moves parallel to the projection surface while one of its side faces is kept close to the projection surface. When this gesture is recognized, the size (scale) of the entire image 500 is changed, as shown in FIG. 10(B).
 FIG. 11(A) shows the eighth registered gesture of FIG. 5. A gesture is shown in which the face of the rectangular parallelepiped 702 in contact with the projection surface changes, regardless of the operation of the specific part. A gesture that does not include one of the operation of the specific part and the operation of the specific object may be registered in this way. When this gesture is recognized, the entire image 500 is switched, as shown in FIG. 11(B).
 FIG. 12 shows the ninth to eleventh registered gestures of FIG. 5. FIG. 12(A) shows the ninth registered gesture; when this gesture is recognized, an image object called a stamp is displayed in the entire image 500. FIG. 12(B) shows the tenth registered gesture; when this gesture is recognized, an image object 504 called a timer, representing the time as shown in FIG. 2, is displayed in the entire image 500. FIG. 12(C) shows the eleventh registered gesture; when this gesture is recognized, the stamp is switched to another stamp.
 FIG. 13(A) shows the twelfth registered gesture of FIG. 5. When this gesture is executed, the entire image 500 is rotated; for example, as shown in FIG. 13(B), the orientation of the image objects displayed in the entire image 500 is turned upside down.
 However, when a rotationally symmetric specific object such as the cylinder 703 performs a rotational operation as shown in FIGS. 12(C) and 13(A), it is difficult to recognize the operation based on the transition of the feature points. Therefore, the motion analysis unit may recognize the operation of the specific object based on the analyzed operation of the specific part.
 For example, when it is analyzed that the specific part has rotated about a vertical line passing through the center of the upper surface of the cylinder 703, the motion analysis unit may overturn its own analysis result and regard the cylinder 703 as having rotated, even if the cylinder 703 was analyzed to be stationary. It may also be determined whether the specific part was in contact with the cylinder 703, and only when it was in contact may the cylinder 703 be determined to have rotated together with the specific part. In this way, the operation of the specific object may be recognized based on the operation of the specific part.
 Note that contact between the specific part and the specific object can be recognized by whether the plane related to the specific part intersects any of the faces of the specific object.
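 The plane-versus-face test can be approximated as in the following sketch, which assumes a face is represented by its corner vertices and declares an intersection when the corners lie on both sides of the fitted plane; this representation and the tolerance are assumptions for illustration only.

```python
import numpy as np

def plane_intersects_face(plane_point: np.ndarray, plane_normal: np.ndarray,
                          face_vertices: np.ndarray, tolerance: float = 1e-6) -> bool:
    """Return True if the plane crosses the polygonal face.

    plane_point, plane_normal: a point on the fitted plane and its unit normal.
    face_vertices: (M, 3) corner coordinates of one face of the specific object.
    """
    signed = (face_vertices - plane_point) @ plane_normal
    return bool(signed.min() < -tolerance and signed.max() > tolerance)
```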
 The user motion analysis unit 222 may likewise recognize the operation of the specific part based on the operation of the specific object. The determination unit 240 may also re-evaluate one of the operation of the specific object and the operation of the specific part based at least on the other.
 In this way, the operations of the specific part and the specific object constituting a registered gesture are predetermined and stored as data, and the determination unit 240 searches the data for a registered gesture that matches the combination of the detected operation of the specific part and the detected operation of the specific object. When a registered gesture is detected, the function corresponding to the registered gesture is identified.
 When it is determined that a gesture has been executed, the function execution unit 250 executes the function corresponding to the executed gesture.
 The image acquisition unit 251 acquires the image object corresponding to the function to be executed from the image storage unit 264. The entire image generation unit 254 generates the entire image 500 including the acquired image object.
 As in the example of FIG. 2, a plurality of image objects may be displayed in the entire image 500. In such a case, it is preferable to determine the display positions so that the displayed image objects do not overlap.
 It is also preferable to consider the orientation of the image object. For example, when an image object contains characters and the characters are displayed upside down for the user, usability is lowered. Therefore, in the present embodiment, an appropriate display position and display direction of the image object are determined.
 The display position determination unit 252 detects regions in the current entire image 500 where no image is displayed (that is, empty spaces). The detection may be performed by recording the display positions of the image objects and detecting based on the records. The detected empty spaces become candidates for the display position of the image object. The display position determination unit 252 selects one of the detected empty spaces based on the size of the image object to be newly displayed, the position of the specific part related to the gesture, and the like. The display position of the newly displayed image object is thereby determined.
 The display position determination unit 252 may also detect, among the empty spaces, spaces that cannot actually be used, and exclude them from the candidates for the display position of the image object. For example, when the projected object 400 is a table as in the example of FIG. 2, an object placed on the table may overlap the entire image 500. In such a case, if an image object is projected onto the same place as the object placed on the table, the image object becomes difficult to see. Therefore, the display position determination unit 252 may further narrow down the candidates for the display position of the image object based on the detected positions of objects.
 Moreover, the objects captured in the distance image are not necessarily stationary. For example, when user A and user B are both using the information processing system 1000, an image object called up by a gesture of user A might be about to be displayed in an empty space just as an object moved by user B arrives at that space. To avoid such a situation, the display position determination unit 252 may further narrow down the candidates for the display position of the image object based on the predicted positions, at the time the image object will be displayed, of the detected specific parts and objects. This prevents a function shown in the whole image 500 from ending up overlapping a user's hand, an object placed on the projection surface, or the like.
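 As a rough picture of how the candidate narrowing described in the last three paragraphs could be combined, the sketch below treats every region as an axis-aligned rectangle; the helper names, the rectangle representation, and the tie-breaking rule are assumptions made for illustration only.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def overlaps(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def choose_display_position(empty_spaces: List[Rect],
                            object_size: Tuple[float, float],
                            predicted_obstacles: List[Rect]) -> Rect:
    """Pick an empty space that fits the image object and is not expected to be
    covered by a hand or a moving object at the time the image is displayed."""
    width, height = object_size
    candidates = [space for space in empty_spaces
                  if space[2] >= width and space[3] >= height
                  and not any(overlaps(space, obstacle)
                              for obstacle in predicted_obstacles)]
    if not candidates:
        # Fall back to the largest empty space when every candidate is obstructed.
        return max(empty_spaces, key=lambda space: space[2] * space[3])
    # Prefer the smallest space that still fits, keeping large areas free.
    return min(candidates, key=lambda space: space[2] * space[3])
```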
 There may also be cases where an image object should always be displayed on a specific object, like the clock object 504 in the example of FIG. 2. In such cases, the display position determination unit 252 may use the predicted position of the specific object as the display position of the image object.
 The display direction determination unit 253 determines the display direction of the image object. For example, when the user analysis unit 220 has recognized the user's face captured in the distance image, the display direction of the image object is determined based on the position and orientation of that face. For example, a correct orientation (for example, up and down) is defined for each image object in advance, and the display direction determination unit 253 displays the image object so that it appears upright when viewed from the position of the user's face. As a result, characters and the like can be displayed in a direction that is easy for the user to read.
 The position and orientation of the user's face may also be estimated. For example, the distance image may show the user's hand but not the user's face. In such a case, the position of the user's face, which lies outside the range of the distance image, is estimated from a part such as the user's hand. For example, when the user analysis unit 220 analyzes the movement of the user's hand, the direction from which the hand entered the image area can be identified, and the display direction determination unit 253 may regard that direction as the position of the face. Alternatively, the user analysis unit 220 may generate a skeleton model of the hand and estimate the position of the face while also taking into account the orientation of the fingers identified by the skeleton model.
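 A minimal sketch of this direction decision, assuming 2D image coordinates with y growing downward and a face position approximated from the edge where the hand entered the image; the function names and the rotation convention are illustrative assumptions rather than the disclosed processing of the display direction determination unit 253.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def estimate_face_position(hand_entry_edge: str, image_size: Tuple[int, int]) -> Point:
    """Assume the face lies just outside the edge from which the hand entered."""
    width, height = image_size
    return {"left": (-1.0, height / 2), "right": (width + 1.0, height / 2),
            "top": (width / 2, -1.0), "bottom": (width / 2, height + 1.0)}[hand_entry_edge]


def display_rotation(object_center: Point, face_position: Point) -> float:
    """Rotation (in degrees) that makes the image object appear upright when
    viewed from the estimated face position."""
    dx = object_center[0] - face_position[0]
    dy = object_center[1] - face_position[1]
    # The unrotated 'up' vector of the object is (0, -1) in y-down image
    # coordinates; rotating it by atan2(dy, dx) + 90 degrees points it away
    # from the viewer, so the bottom edge of the object faces the viewer.
    return math.degrees(math.atan2(dy, dx)) + 90.0
```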
 In this way, the image object to be displayed, its display position, and its display direction are determined, and the whole image generation unit 254 generates the whole image 500 according to these decisions. The whole image 500 can be generated with a general CG (Computer Graphics) generation technique.
 FIG. 14 is a diagram showing a preferable display example of image objects. A gesture in which the hand 611, which is a specific part, touches the cylinder 703 has been performed, and the menu image 508 of the AR system is displayed by that gesture. Other image objects are also displayed in the whole image 500. In addition, the hands 611, 612, and 613 of the respective users are present above the whole image 500, and objects 705, 706, and 707 are present on the projection surface. The menu image 508 is displayed so as not to overlap any of these. The menu image 508 also contains characters, which are shown so that the user of the hand 611 that performed the gesture can read them. It is preferable that a whole image 500 with such good usability be generated through the processing of the display position determination unit 252 and the display direction determination unit 253.
 In the description above, the correspondence between gestures and functions was defined in advance, but from the viewpoint of usability it is preferable that this correspondence can be customized. It is therefore preferable that a gesture that calls up a function for changing the correspondence be registered. For example, the menu image 508 shown in FIG. 14 may include icons indicating the registered gestures and icons indicating the callable functions. The user's tap may then be detected, and the gesture and the function associated with the tapped icons may be linked to each other.
 A gesture that calls up a function for registering a new gesture may also be registered. For example, the pointer 505 shown in FIG. 6(C) may be displayed, and a gesture performed above the pointer 505 while it is displayed may be newly registered. The gesture performed above the pointer 505 is analyzed by the user analysis unit 220 and the object analysis unit 230, and the analyzed movement of the specific part and movement of the object are stored in the registered gesture storage unit 263 as the content of the new registered gesture.
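 Continuing the lookup sketch given earlier, registering a newly demonstrated combination could then amount to appending one more entry; the names remain placeholders and are not from the disclosure.

```python
def register_new_gesture(part_motion: str, object_motion: str, function_id: str) -> None:
    """Store the combination demonstrated above the pointer as a new registered gesture."""
    REGISTERED_GESTURES.append(RegisteredGesture(part_motion, object_motion, function_id))
```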
 As mentioned above, when the user can be identified from the arrangement of the feature points, the registered gestures can also be customized for each user.
 Next, the flow of processing by each component will be described. FIG. 15 is a flowchart of the process of calling a function by a gesture.
 The image acquisition unit 251 acquires a distance image (S101) and transmits it to the user analysis unit 220 and the object analysis unit 230. The user analysis unit 220 detects the specific part of the user captured in the distance image and analyzes the movement of the specific part (S102). Meanwhile, the object analysis unit 230 detects the objects captured in the distance image and analyzes their movements (S103). The internal flow of these analysis processes will be described later.
 The determination unit 240 determines whether a registered gesture has been executed based on the analyzed movements of the specific part and the object (S104). If no registered gesture has been executed (NO in S105), the function execution unit 250 maintains the current function (S106). The flow then ends and the whole image 500 does not change. If, on the other hand, a registered gesture has been executed, the function execution unit 250 executes the function associated with the executed gesture (S107). That is, the function is switched, and the image objects related to that function in the whole image 500 change.
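 The flow of FIG. 15 can be restated compactly as follows; the object and method names are placeholders standing in for the units described above, not actual interfaces of the information processing device 200.

```python
def process_frame(distance_image, user_analyzer, object_analyzer, judge, executor) -> None:
    """One pass of the gesture-to-function flow of FIG. 15 (S101 to S107)."""
    part_motion = user_analyzer.analyze(distance_image)      # S102
    object_motion = object_analyzer.analyze(distance_image)  # S103 (may run in parallel)

    gesture = judge.match(part_motion, object_motion)        # S104: search registered gestures
    if gesture is None:                                      # S105: no registered gesture executed
        executor.keep_current_function()                     # S106: whole image 500 unchanged
    else:
        executor.execute(gesture)                            # S107: switch to the matched function
```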
 Next, the internal flow of the analysis processing will be described. FIG. 16 is a flowchart of the analysis processing. Since the flow is the same for the user analysis unit 220 and the object analysis unit 230, the user detection unit 221 and the object detection unit 231 are collectively referred to below as the detection unit, and the user motion analysis unit 222 and the object motion analysis unit 232 are collectively referred to as the motion analysis unit.
 The detection unit attempts to detect the specific part or the object captured in the distance images at a plurality of predetermined time points. When detection and analysis did not succeed at every time point (NO in S202), the analysis unit estimates the movement at the failed time points based on, for example, the movement at the other time points. When this estimation is complete, or when detection succeeded at every time point (YES in S203), the motion analysis unit analyzes the movement based on the transition of the detected positions and postures, the previous analysis result, the other movement (that is, the movement of the object in the case of user analysis, or the movement of the specific part in the case of object analysis), and so on (S204). In this way, even if detection and analysis fail at some of the time points, the movement over those time points can be estimated. The rotational movement of a rotationally symmetric specific object can also be detected. This prevents erroneous detection of movements.
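 One way to realize the estimation for failed time points, assuming each successful detection yields a simple pose tuple, is plain linear interpolation between the nearest successful detections. The disclosure leaves the estimation method open, so the sketch below is only one possible choice.

```python
from typing import List, Optional, Tuple

Pose = Tuple[float, float, float]  # e.g. (x, y, rotation) of the part or the object


def fill_failed_detections(track: List[Optional[Pose]]) -> List[Pose]:
    """Estimate the pose at time points where detection failed, interpolating
    linearly between the nearest time points where detection succeeded."""
    valid = [i for i, pose in enumerate(track) if pose is not None]
    if not valid:
        raise ValueError("detection failed at every time point")
    filled: List[Pose] = []
    for i, pose in enumerate(track):
        if pose is not None:
            filled.append(pose)
            continue
        earlier = [j for j in valid if j < i]
        later = [j for j in valid if j > i]
        if not earlier:                 # failure at the start: copy the next success
            filled.append(track[later[0]])
        elif not later:                 # failure at the end: copy the previous success
            filled.append(track[earlier[-1]])
        else:                           # failure in the middle: interpolate
            a, b = earlier[-1], later[0]
            t = (i - a) / (b - a)
            filled.append(tuple(pa + (pb - pa) * t
                                for pa, pb in zip(track[a], track[b])))
    return filled
```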
 Next, the flow of the image display processing that accompanies execution of a function will be described. FIG. 17 is a flowchart of the image display processing. This flow can be performed in S106 and S107 of the flow of the function call processing by a gesture.
 The function execution unit 250 acquires information such as the image object corresponding to the gesture, the detected specific part, and the detected object (S301). The information on the image object may be transmitted from the determination unit 240, or information indicating the executed gesture may be transmitted from the determination unit 240 and the function execution unit 250 may obtain the image object information based on that information.
 Based on the image object information, the image acquisition unit 251 acquires the image object from the image storage unit 264 (S302). If the gesture has not changed, the image object has already been acquired, and this step may be omitted.
 Meanwhile, in order to determine the position at which to display the image, the display position determination unit 252 first predicts the positions of the users and the objects at the time the image will be output (S303). As described above, this prediction may instead be performed by the user motion analysis unit 222 and the object motion analysis unit 232. The display position determination unit 252 checks the empty spaces of the currently displayed whole image 500 and, based on the predicted positions of the users and the objects, detects the empty spaces available from the present time onward (S304). The display position determination unit 252 then determines the display position of the image object based on those empty spaces and the size of the image object (S305). When the image object is to be displayed on a specific object, the position of that specific object is detected instead of an empty space, and that position is determined to be the display position.
 Meanwhile, in order to determine the direction in which to display the image, the display direction determination unit 253 checks whether the position of the user's face has been detected, and if it has not been detected (NO in S306), estimates the position of the user's face based on the detected specific part (S307). When this estimation is complete, or when the position of the face has been detected (YES in S306), the display direction determination unit 253 determines the display direction of the image object based on the position of the user's face (S308).
 Since the image object, its display position, and its display direction are now known, the whole image generation unit 254 generates and outputs the whole image 500 by fitting the image object into the whole image 500 as determined (S309). If the flow shown in FIG. 15 is performed every time a distance image is received from the distance image generator 100, this flow makes it possible to keep displaying the image object corresponding to the gesture in a way that is easy for the user to see. For example, the user who performed a gesture may move after the gesture has been executed and the corresponding image object has been displayed. Even in such a case, the display position and display direction of the image object are kept appropriate.
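 Putting the earlier sketches together, the display flow of FIG. 17 could be orchestrated roughly as follows; every object, attribute, and method name here is a placeholder, and the step comments map only loosely onto S301 to S309.

```python
def render_for_gesture(gesture_info, image_store, scene, renderer) -> None:
    """Condensed restatement of FIG. 17 using the helper sketches above."""
    image_obj = image_store.get(gesture_info.image_id)                       # S301-S302
    obstacles = scene.predict_positions(renderer.next_output_time)           # S303
    spaces = scene.free_spaces()                                             # S304
    x, y, w, h = choose_display_position(spaces, image_obj.size, obstacles)  # S305
    face = scene.face_position()                                             # S306
    if face is None:
        face = estimate_face_position(scene.hand_entry_edge(), scene.image_size)  # S307
    rotation = display_rotation((x + w / 2, y + h / 2), face)                # S308
    renderer.compose(image_obj, position=(x, y), rotation=rotation)          # S309
```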
 Note that the flowcharts of the present disclosure are examples, and each process does not necessarily have to be performed exactly as in the flows above. For example, the processes of S102 and S103 are performed in parallel, but they may instead be performed sequentially.
 As described above, according to the present embodiment, a plurality of functions that can be provided by the information processing system 1000 can each be assigned to one of a plurality of gestures that use a single specific object. The information processing system 1000 recognizes the executed gesture by analyzing the movements of the specific object and the user, and executes the function assigned to the recognized gesture. This eliminates the inconvenience of having to switch to a different specific object whenever the function to be executed is to be changed, and improves user convenience.
 In addition, the position of the user's face and the like are estimated, and the position, orientation, and so on of the displayed image object are adjusted according to the estimated face position. This prevents problems such as characters displayed by the information processing system 1000 appearing slanted to the user and being difficult to read.
 Furthermore, the positions and movements of the users and objects present in the whole image 500 are also estimated, so that even when objects in the display space or users move, an image object can be kept displayed in an empty space or on a moving object.
 The processing of the device according to the embodiment of the present disclosure can be realized by software (a program) executed by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like. Instead of executing all of the processing of the device in software, part of the processing may be executed by hardware such as a dedicated circuit.
 The above-described embodiment is an example for embodying the present disclosure, and the present disclosure can be implemented in various other forms. For example, various modifications, substitutions, omissions, or combinations thereof are possible without departing from the gist of the present disclosure. Forms in which such modifications, substitutions, omissions, and the like have been made are included in the scope of the present disclosure, just as they are included in the scope of the invention described in the claims and its equivalents.
 The present disclosure may also take the following configurations.
 [1]
 An information processing device comprising:
 a first analysis unit that detects a specific part of a user captured in an input image and analyzes a movement of the specific part;
 a second analysis unit that detects a specific object captured in the input image and analyzes a movement of the specific object;
 a determination unit that determines, based on the movement of the specific part and the movement of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
 an execution unit that, when it is determined that any of the plurality of gestures has been executed, executes a function corresponding to the executed gesture.
 [2]
 The information processing device according to [1], wherein the second analysis unit
 determines whether or not analysis of the movement of the specific object at a first time point has succeeded, and
 corrects an analysis result of the movement at the first time point when it is determined that the analysis of the movement of the specific object at the first time point has failed.
 [3]
 The information processing device according to [2], wherein the second analysis unit corrects the analysis result of the movement at the first time point based on at least one of an analysis result of the movement at a second time point before the first time point and an analysis result of the movement at a third time point after the first time point.
 [4]
 The information processing device according to [2], wherein the second analysis unit
 predicts the movement at the first time point based on an analysis result of the movement at a time point before the first time point, and
 determines whether or not the analysis of the movement at the first time point has succeeded based on the prediction result of the movement at the first time point and the analysis result of the movement at the first time point.
 [5]
 The information processing device according to [4], wherein the second analysis unit corrects the analysis result of the movement at the first time point based on the prediction result when it is determined that the analysis of the movement of the specific object at the first time point has failed.
 [6]
 The information processing device according to any one of [1] to [5], wherein the execution unit outputs, along with execution of the function, an image including an image object corresponding to the executed gesture.
 [7]
 The information processing device according to [6], wherein
 the second analysis unit predicts a first region of the detected object at a first time point based on a transition of a position of the detected object, and
 the execution unit displays, at the first time point, the image object corresponding to the executed gesture in a region other than the first region.
 [8]
 The information processing device according to [6], wherein
 the second analysis unit predicts a second region of the detected specific part at a first time point based on a transition of a position of the detected specific part, and
 the execution unit displays, at the first time point, the image object corresponding to the executed gesture in a region other than the second region.
 [9]
 The information processing device according to any one of [6] to [8], wherein the execution unit adjusts a display position of the image object corresponding to the executed gesture according to movement of the user.
 [10]
 The information processing device according to any one of [6] to [9], wherein
 the first analysis unit estimates a position of the user at a first time point based on a transition of a position of the detected specific part, and
 the execution unit displays the image object corresponding to the executed gesture in an orientation that is upright as viewed from the estimated position of the user.
 [11]
 The information processing device according to any one of [6] to [10], wherein the execution unit adjusts a display direction of the image object corresponding to the executed gesture according to movement of the user.
 [12]
 The information processing device according to any one of [1] to [11], wherein
 the execution unit displays, when the determination unit determines that a predetermined gesture has been executed, images related to the plurality of gestures and images related to the plurality of functions,
 the determination unit recognizes a selected gesture and a selected function based on a transition of the specific part and positions of the displayed images, and
 the selected function is executed when it is determined that the selected gesture has been executed.
 [13]
 The information processing device according to any one of [1] to [12], wherein
 the execution unit indicates, when the determination unit determines that a predetermined gesture has been executed, a registration area for registering a new gesture, and
 the first analysis unit detects a specific part included in the registration area and sets a movement of the detected specific part as one of the plurality of gestures.
 [14]
 An information processing method comprising:
 detecting a specific part of a user captured in an input image and analyzing a movement of the specific part;
 detecting a specific object captured in the input image and analyzing a movement of the specific object;
 determining, based on the movement of the specific part and the movement of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
 executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
 [15]
 A program executed by a computer, the program causing the computer to perform:
 detecting a specific part of a user captured in an input image and analyzing a movement of the specific part;
 detecting a specific object captured in the input image and analyzing a movement of the specific object;
 determining, based on the movement of the specific part and the movement of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
 executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
 [16]
 A storage medium storing a program executed by a computer, the program causing the computer to perform:
 detecting a specific part of a user captured in an input image and analyzing a movement of the specific part;
 detecting a specific object captured in the input image and analyzing a movement of the specific object;
 determining, based on the movement of the specific part and the movement of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
 executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
 1000 Information processing system
 100 Distance image generator
 200 Information processing device
 210 Distance image acquisition unit
 220 User analysis unit
 221 User detection unit
 222 User motion analysis unit
 230 Object analysis unit
 231 Object detection unit
 232 Object motion analysis unit
 240 Determination unit
 250 Function execution unit
 251 Image acquisition unit
 252 Display position determination unit
 253 Display direction determination unit
 254 Whole image generation unit
 261 User analysis data storage unit
 262 Object analysis data storage unit
 263 Registered gesture storage unit
 264 Image storage unit
 300 Projector
 400 Projected object
 500 Whole image
 501, 502, 503, 504 Image object
 505 Pointer image object
 506A, 506B, 506C Image objects of different colors
 507A, 507B Image objects of different sizes
 601, 602, 603 User
 611, 612, 613 Specific part
 621 Feature point
 631 Plane
 701, 702, 703, 704, 705, 706, 707 Object

Claims (15)

  1. An information processing device comprising:
     a first analysis unit that detects a specific part of a user captured in an input image and analyzes a movement of the specific part;
     a second analysis unit that detects a specific object captured in the input image and analyzes a movement of the specific object;
     a determination unit that determines, based on the movement of the specific part and the movement of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
     an execution unit that, when it is determined that any of the plurality of gestures has been executed, executes a function corresponding to the executed gesture.
  2. The information processing device according to claim 1, wherein the second analysis unit
     determines whether or not analysis of the movement of the specific object at a first time point has succeeded, and
     corrects an analysis result of the movement at the first time point when it is determined that the analysis of the movement of the specific object at the first time point has failed.
  3. The information processing device according to claim 2, wherein the second analysis unit corrects the analysis result of the movement at the first time point based on at least one of an analysis result of the movement at a second time point before the first time point and an analysis result of the movement at a third time point after the first time point.
  4. The information processing device according to claim 2, wherein the second analysis unit
     predicts the movement at the first time point based on an analysis result of the movement at a time point before the first time point, and
     determines whether or not the analysis of the movement at the first time point has succeeded based on the prediction result of the movement at the first time point and the analysis result of the movement at the first time point.
  5. The information processing device according to claim 4, wherein the second analysis unit corrects the analysis result of the movement at the first time point based on the prediction result when it is determined that the analysis of the movement of the specific object at the first time point has failed.
  6. The information processing device according to claim 1, wherein the execution unit outputs, along with execution of the function, an image including an image object corresponding to the executed gesture.
  7. The information processing device according to claim 6, wherein
     the second analysis unit predicts a first region of the detected object at a first time point based on a transition of a position of the detected object, and
     the execution unit displays, at the first time point, the image object corresponding to the executed gesture in a region other than the first region.
  8. The information processing device according to claim 6, wherein
     the second analysis unit predicts a second region of the detected specific part at a first time point based on a transition of a position of the detected specific part, and
     the execution unit displays, at the first time point, the image object corresponding to the executed gesture in a region other than the second region.
  9. The information processing device according to claim 6, wherein the execution unit adjusts a display position of the image object corresponding to the executed gesture according to movement of the user.
  10. The information processing device according to claim 6, wherein
     the first analysis unit estimates a position of the user at a first time point based on a transition of a position of the detected specific part, and
     the execution unit displays the image object corresponding to the executed gesture in an orientation that is upright as viewed from the estimated position of the user.
  11. The information processing device according to claim 10, wherein the execution unit adjusts a display direction of the image object corresponding to the executed gesture according to movement of the user.
  12. The information processing device according to claim 1, wherein
     the execution unit displays, when the determination unit determines that a predetermined gesture has been executed, images related to the plurality of gestures and images related to the plurality of functions,
     the determination unit recognizes a selected gesture and a selected function based on a transition of the specific part and positions of the displayed images, and
     the selected function is executed when it is determined that the selected gesture has been executed.
  13. The information processing device according to claim 1, wherein
     the execution unit indicates, when the determination unit determines that a predetermined gesture has been executed, a registration area for registering a new gesture, and
     the first analysis unit detects a specific part included in the registration area and sets a movement of the detected specific part as one of the plurality of gestures.
  14. An information processing method comprising:
     detecting a specific part of a user captured in an input image and analyzing a movement of the specific part;
     detecting a specific object captured in the input image and analyzing a movement of the specific object;
     determining, based on the movement of the specific part and the movement of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
     executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
  15. A program executed by a computer, the program causing the computer to perform:
     detecting a specific part of a user captured in an input image and analyzing a movement of the specific part;
     detecting a specific object captured in the input image and analyzing a movement of the specific object;
     determining, based on the movement of the specific part and the movement of the specific object, whether or not any of a plurality of gestures related to the specific part and the specific object has been executed; and
     executing, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture.
PCT/JP2021/002501 2020-02-10 2021-01-25 Information processing device, information processing method, and program WO2021161769A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-020938 2020-02-10
JP2020020938 2020-02-10

Publications (1)

Publication Number Publication Date
WO2021161769A1 true WO2021161769A1 (en) 2021-08-19

Family

ID=77292382

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/002501 WO2021161769A1 (en) 2020-02-10 2021-01-25 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2021161769A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009519553A (en) * 2005-12-12 2009-05-14 株式会社ソニー・コンピュータエンタテインメント Method and system enabling depth and direction detection when interfacing with a computer program
WO2017217050A1 (en) * 2016-06-16 2017-12-21 ソニー株式会社 Information processing device, information processing method and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009519553A (en) * 2005-12-12 2009-05-14 株式会社ソニー・コンピュータエンタテインメント Method and system enabling depth and direction detection when interfacing with a computer program
WO2017217050A1 (en) * 2016-06-16 2017-12-21 ソニー株式会社 Information processing device, information processing method and storage medium

Similar Documents

Publication Publication Date Title
US11314335B2 (en) Systems and methods of direct pointing detection for interaction with a digital device
US20220382379A1 (en) Touch Free User Interface
US20210096651A1 (en) Vehicle systems and methods for interaction detection
US10732725B2 (en) Method and apparatus of interactive display based on gesture recognition
JP5802667B2 (en) Gesture input device and gesture input method
KR101947034B1 (en) Apparatus and method for inputting of portable device
EP2972669B1 (en) Depth-based user interface gesture control
US20180292907A1 (en) Gesture control system and method for smart home
US20150084859A1 (en) System and Method for Recognition and Response to Gesture Based Input
US9477874B2 (en) Method using a touchpad for controlling a computerized system with epidermal print information
US20180150186A1 (en) Interface control system, interface control apparatus, interface control method, and program
EP2752740A1 (en) Drawing control method, apparatus and mobile terminal
US9544556B2 (en) Projection control apparatus and projection control method
US20150363038A1 (en) Method for orienting a hand on a touchpad of a computerized system
WO2014127697A1 (en) Method and terminal for triggering application programs and application program functions
US20140362002A1 (en) Display control device, display control method, and computer program product
US10621766B2 (en) Character input method and device using a background image portion as a control region
Matlani et al. Virtual mouse using hand gestures
JP6033061B2 (en) Input device and program
WO2021161769A1 (en) Information processing device, information processing method, and program
KR101559424B1 (en) A virtual keyboard based on hand recognition and implementing method thereof
JP2016071824A (en) Interface device, finger tracking method, and program
CN110162251A (en) Image-scaling method and device, storage medium, electronic equipment
EP3059664A1 (en) A method for controlling a device by gestures and a system for controlling a device by gestures
KR101327963B1 (en) Character input apparatus based on rotating user interface using depth information of hand gesture and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21753553

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21753553

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP