WO2021161769A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2021161769A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
gesture
time point
specific
executed
Prior art date
Application number
PCT/JP2021/002501
Other languages
English (en)
Japanese (ja)
Inventor
洋祐 加治
哲男 池田
淳 入江
英佑 藤縄
誠史 友永
忠義 村上
健志 後藤
Original Assignee
ソニーグループ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーグループ株式会社 filed Critical ソニーグループ株式会社
Publication of WO2021161769A1 publication Critical patent/WO2021161769A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion

Definitions

  • This disclosure relates to information processing devices, information processing methods, and programs.
  • As AR systems that display CG (Computer Graphics) become more widespread, the functions that an AR system can provide keep increasing, and the method of selecting among the provided functions therefore becomes important. For example, if a large number of functions had to be recognized only from gestures of the user's hand, a correspondingly large number of gestures would be required, which is undesirable from the viewpoint of usability. It is therefore preferable to make a large number of functions selectable through the combination of a specific object and a human gesture, that is, by making gestures that use a specific object available.
  • the present disclosure provides an information processing device or the like that solves a problem related to usability of an information processing system that provides functions according to registered gestures.
  • The information processing device according to one aspect of the present disclosure includes a first analysis unit, a second analysis unit, a determination unit, and an execution unit.
  • the first analysis unit detects a specific part of the user captured in the input image and analyzes the operation of the specific part.
  • the second analysis unit detects the specific object captured in the input image and analyzes the operation of the specific object.
  • the determination unit determines whether or not the gesture related to the specific portion and the specific object has been executed based on the operation of the specific portion and the operation of the specific object. When it is determined that the gesture has been executed, the execution unit executes a function corresponding to the gesture.
  • The second analysis unit may determine the success or failure of the analysis of the motion of the specific object at a first time point and, when it is determined that the analysis of the motion of the specific object at the first time point has failed, modify the analysis result of the motion at the first time point.
  • The second analysis unit may modify the analysis result of the motion at the first time point based on at least one of the analysis result of the motion at a second time point before the first time point and the analysis result of the motion at a third time point after the first time point.
  • The second analysis unit may predict the motion at the first time point based on the analysis result of the motion at a time point before the first time point, and determine the success or failure of the analysis of the motion at the first time point based on the prediction result of the motion at the first time point and the analysis result of the motion at the first time point.
  • Further, the second analysis unit may correct the analysis result of the motion at the first time point based on the prediction result.
  • the execution unit may output an image including an image object corresponding to the executed gesture together with the execution of the function.
  • The second analysis unit may predict a first region where the detected object will be at the first time point based on the transition of the position of the detected object, and the execution unit may display the image object corresponding to the executed gesture in an area other than the first region.
  • The second analysis unit may predict a second region where the detected specific part will be at the first time point based on the transition of the position of the detected specific part, and the execution unit may display the image object corresponding to the executed gesture in an area other than the second region.
  • the execution unit may adjust the display position of the image object corresponding to the executed gesture according to the movement of the user.
  • The first analysis unit may estimate the position of the user at the first time point based on the transition of the position of the detected specific part, and the execution unit may display the image object corresponding to the executed gesture in the correct vertical orientation when viewed from the user's estimated position.
  • the execution unit may adjust the display direction of the image object corresponding to the executed gesture according to the movement of the user.
  • the execution unit may display each image related to the plurality of gestures and each image related to the plurality of functions when the determination unit determines that a predetermined gesture has been executed.
  • The determination unit may recognize the selected gesture and function based on the transition of the specific part and the positions of the displayed images, and the selected function may be executed when it is determined that the selected gesture has been executed.
  • The execution unit may indicate a registration area for registering a new gesture when the determination unit determines that a predetermined gesture has been executed, and the first analysis unit may detect a specific part included in the registration area and set the motion of the detected specific part as one of the plurality of gestures.
  • There is also provided an information processing method including: a step of detecting a specific part of a user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • There is also provided a program that causes a computer to execute: a step of detecting a specific part of a user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • There is also provided a storage medium storing a program that causes a computer to execute: a step of detecting a specific part of a user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • A diagram explaining the registered contents of the registered gestures. Diagrams showing registered gestures that use a pen-shaped specific object.
  • Diagrams showing other registered gestures and corresponding functions that use a rectangular-parallelepiped specific object. Diagrams showing registered gestures that use a cylindrical specific object.
  • A flowchart of the function call processing by a gesture. A flowchart of the analysis processing. A flowchart of the image display processing.
  • FIG. 1 is a diagram showing a configuration example of an information processing system according to an embodiment of the present disclosure.
  • the information processing system 1000 is composed of a distance image generation device 100, an information processing device 200, a projector 300, and a projected object 400.
  • the information processing system 1000 of the present embodiment is a system that recognizes the execution of the registered gesture and provides a function according to the registered gesture.
  • the information processing system 1000 is an AR (Augmented Reality) system for displaying an image.
  • the function to be executed does not have to be related to image processing.
  • the control of the electric device managed by the information processing system 1000 may be simply executed.
  • a table is shown as the projected object 400, and an image is projected on the upper surface of the table.
  • the entire image displayed by the information processing system 1000 is referred to as the entire image 500.
  • the surface of the projected object 400 on which the entire image 500 is displayed is referred to as a projected surface.
  • In the present disclosure, the term "image" is a concept that includes both still images and moving images. Therefore, "image" in the present disclosure may be replaced with a still image or a moving image if there is no particular problem. That is, the image displayed by the information processing system 1000 may be a moving image or a still image. The concept of "video" is also included in "image". Further, the whole image 500 may be an image called a stereoscopic image, a 3D image, or the like, which gives the viewer a sense of three-dimensionality.
  • Gestures and functions corresponding to the gestures are registered in advance in the information processing system 1000. Further, the user who uses the information processing system 1000 recognizes the registered gesture and the corresponding function, and conveys the function to be called to the information processing system 1000 by the gesture. The information processing system 1000 recognizes the gesture of the user and changes the display content of the entire image 500 according to the recognized gesture.
  • FIG. 2 is a diagram illustrating a usage pattern of the information processing system 1000.
  • FIG. 2 shows users 601, 602, and 603 who use the information processing system 1000. Further, a pen-shaped object 701 held by the user 601 is shown. Also shown are a rectangular parallelepiped 702, a cylinder 703, and a laptop PC 704 resting on the projected object 400.
  • a plurality of depictions are displayed in the entire image 500.
  • A depiction in the whole image 500, in other words a partial image displayed in a part of the whole image 500, is referred to as an image object.
  • image objects 501, 502, and 503 such as memo paper and an image object 504 representing a time of "3:47" are shown.
  • the image object 504 representing the time is projected on the upper surface of the cylinder 703.
  • the action of touching the image object of the entire image 500 is also recognized as a gesture.
  • the pre-registered process corresponding to the gesture is performed.
  • It is possible to execute processing on the touched image object. For example, it is possible to execute a process such as enlarging, reducing, or erasing the touched image object 503, or changing the image content related to the image object 503.
  • it is possible to execute a process such as displaying a menu image of the information processing system 1000. In this way, the entire image 500 can be used as a so-called virtual touch screen.
  • gestures are represented only by specific parts such as the user's hand and fingers, but in the present embodiment, gestures using a specific object registered in advance in the information processing system 1000 can also be used.
  • the user 601 is holding the pen-shaped object 701, and such a posture of the user can also be recognized as a gesture.
  • a process such as displaying a solid line corresponding to the transition of the position of the tip of the object 701 (in other words, the locus of the tip) on the image object 501 can be executed.
  • Strictly speaking, the human body is also an object, but in order to distinguish between a gesture using only a specific part of the user and a gesture using a specific object, the human body is not included in the specific object in this disclosure. That is, a gesture using a specific object is a gesture using an object other than the human body; a gesture using only the hand, a gesture using a combination of eyes and fingers, and the like are not included in the gestures using a specific object.
  • the specific object is not particularly limited except for the human body, and its shape is not limited.
  • The information processing system 1000 of the present embodiment can assign a plurality of instruction contents to one specific object. That is, it is possible to register a plurality of gestures using the same specific object. For example, when the user grasps and moves a pen-shaped object, a solid line may be drawn in the whole image 500 along the transition of the position of the tip if the tip of the object is directed toward the whole image 500, and a process of erasing the solid line in the whole image 500 may be performed along the transition of the position of the rear end if the rear end of the object is directed toward the whole image 500. That is, gestures using a pen-shaped object can provide both a pen function of drawing a solid line and an eraser function of erasing the drawn line. In this way, gestures using the same specific object can call different functions. From the viewpoint of usability, it is preferable that functions the user intuitively associates with a specific object are assigned to that object, and this embodiment makes that possible.
  • the distance image generation device 100 captures a region in which the gesture is performed and generates a distance image (depth map) related to the region.
  • a known device may be used.
  • it includes an image sensor, a distance measuring sensor, and the like, and generates a distance image from these images.
  • the image sensor includes an RGB camera and the like.
  • Examples of the distance measuring sensor include a stereo camera, a TOF (Time of Flight) camera, and a structured light camera.
  • the information processing device 200 recognizes the gesture by the user based on the distance image generated by the distance image generation device 100.
  • the recognized gesture may include a gesture using an object.
  • the whole image 500 is generated based on the recognized gesture.
  • the display position of the image object displayed on the entire image 500 is adjusted in order to further improve the usability. As a result, the entire image 500 can be easily viewed by the user. Details will be described later together with the components of the information processing apparatus 200.
  • the projector 300 outputs the entire image 500 generated by the information processing device 200.
  • the projector 300 is installed above the table, an image is projected downward from the projector 300, and the entire image 500 is displayed on the upper surface of the table.
  • the display destination of the entire image 500 is not particularly limited.
  • the information processing apparatus 200 may transmit the entire image 500 to the laptop 704 of FIG. 2 so that the entire image 500 may be displayed on the laptop 704.
  • Alternatively, the entire image 500 may be output to an image display device such as AR glasses or a head-mounted display. That is, instead of the projector 300 and the projected object 400, an image display device may be included in the information processing system 1000.
  • In order to clarify the processing performed by the information processing system 1000, an example in which the information processing system 1000 is composed of the above devices has been shown; however, these devices may be integrated, or their functions may be further distributed.
  • FIG. 3 is a block diagram showing an example of the internal configuration of the information processing apparatus 200.
  • The information processing apparatus 200 includes a distance image acquisition unit 210, a user analysis unit 220, an object analysis unit 230, a determination unit 240, a function execution unit 250, a user analysis data storage unit 261, an object analysis data storage unit 262, a registered gesture storage unit 263, and an image storage unit 264.
  • the user analysis unit 220 includes a user detection unit 221 and a user motion analysis unit 222.
  • the object analysis unit 230 includes an object detection unit 231 and an object motion analysis unit 232.
  • the function execution unit 250 includes an image acquisition unit 251, a display position determination unit 252, a display direction determination unit 253, and an overall image generation unit 254.
  • the above-mentioned components of the information processing apparatus 200 may be aggregated or further dispersed.
  • Although a plurality of storage units that store the data used by each component (the user analysis data storage unit 261, the object analysis data storage unit 262, the registered gesture storage unit 263, and the image storage unit 264) have been described, these storage units may be composed of one or more memories or storages, or a combination thereof.
  • components and functions not shown or described in the present disclosure may also be present in the information processing apparatus 200.
  • the distance image acquisition unit 210 acquires a distance image from the distance image generation device 100 and transmits it to the user analysis unit 220 and the object analysis unit 230.
  • the distance image acquisition unit 210 may perform preprocessing such as threshold processing on the distance image in order to improve the accuracy of the detection executed by the user analysis unit 220 and the object analysis unit 230.
  • the user analysis unit 220 detects a specific part of the user captured in the distance image and analyzes the operation of the specific part.
  • the data used for the processing of the user analysis unit 220 is stored in advance in the user analysis data storage unit 261.
  • data for detecting a specific part such as a template image of a specific part is stored in the user analysis data storage unit 261.
  • the specific part may be a part that can be used for gestures such as a finger, a hand, an arm, a face, and an eye, and is not particularly limited.
  • The user detection unit 221 of the user analysis unit 220 detects the specific part of the user and its region captured in the distance image.
  • As the detection method, a known method may be used.
  • the area of the specific part can be detected by performing block matching between the template image related to the specific part stored in the user analysis data storage unit 261 and the distance image.
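  • As an illustration of the block-matching step above, the following is a minimal sketch in Python using OpenCV template matching. The function name detect_region, the 0.8 threshold, and the use of normalized cross-correlation are assumptions for this sketch, not part of the disclosure.

```python
# Minimal sketch: detect the region of a specific part by matching a stored
# template against the distance image. Names and threshold are illustrative.
import cv2
import numpy as np

def detect_region(distance_image: np.ndarray, template: np.ndarray,
                  threshold: float = 0.8):
    """Return (x, y, w, h) of the best template match, or None if not found."""
    result = cv2.matchTemplate(distance_image.astype(np.float32),
                               template.astype(np.float32),
                               cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None                        # specific part not detected in this frame
    h, w = template.shape[:2]
    return (max_loc[0], max_loc[1], w, h)  # candidate region of the specific part
```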
  • the user movement analysis unit 222 of the user analysis unit 220 obtains the position, posture, and transition of the specific part as the detected movement of the specific part.
  • FIG. 4 is a diagram for explaining the flow of analysis of a specific part.
  • the user's hand is shown as the specific site 611.
  • the user motion analysis unit 222 plots a plurality of points 621 in the region of the specific portion. The points plotted in the area of the specific part are described as feature points.
  • the method of plotting feature points is not particularly limited. For example, it may be determined based on a skeleton model generation method generally used in the technique of gesture recognition. In addition, features of specific parts such as nails, wrinkles, moles, hairs, and joints may be plotted as feature points 621.
  • the user motion analysis unit 222 detects each position of the feature point 621. By using the distance image, the position in the depth direction with respect to the projection surface can also be detected. That is, the three-dimensional position of each feature point 621 can be obtained.
  • the user motion analysis unit 222 performs plane fitting on the obtained feature points 621.
  • a method such as the least squares method or the RANSAC method can be used.
  • the plane 631 related to the specific portion is obtained.
  • the posture of the specific part is obtained based on the plane 631 related to the specific part.
  • a projection plane or the like is defined in advance as a reference plane, and the inclination of the plane 631 with respect to the reference plane is set as the posture of the specific portion.
  • The inclination of the plane can be represented, for example, by the angle differences, such as pitch, yaw, and roll, between the three-dimensional axes of the plane 631 shown on the lower side of FIG. 4 and the three-dimensional axes based on the reference plane.
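  • A minimal sketch of the plane fitting and tilt computation described above, assuming the feature points are given as an (N, 3) array. The least-squares fit via SVD and the pitch/roll sign conventions are assumptions of this sketch; note that yaw cannot be recovered from the plane normal alone and would need an additional in-plane reference axis.

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """Least-squares plane fit to 3D feature points (shape (N, 3)).
    Returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def plane_tilt(normal: np.ndarray):
    """Tilt of the fitted plane relative to a horizontal reference plane
    (z axis pointing away from the projection surface), returned as pitch
    and roll angles in degrees. Sign conventions are illustrative."""
    nx, ny, nz = normal
    pitch = np.degrees(np.arctan2(nx, nz))
    roll = np.degrees(np.arctan2(ny, nz))
    return pitch, roll
```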
  • The user motion analysis unit 222 calculates the position and posture of the specific part for each distance image (for each frame if the distance image is a moving image), and then calculates the differences between time points. That is, the transition is obtained from the difference between the analysis result based on the distance image at a first time point and the analysis result based on the distance image at a second time point after the first time point. If the feature points cannot be distinguished from one another, the correspondence between the feature points before and after in the time series may be estimated using a search method or the like, or feature points whose positions change little over the time interval at which the distance images were captured may be associated with each other.
  • The user analysis unit 220 may store the obtained feature points as a history in the user analysis data storage unit 261 and, based on the history, confirm whether a newly detected specific part has been analyzed before. In this way, user identification may be performed based on the arrangement of feature points.
  • the object analysis unit 230 detects the object captured in the distance image and analyzes the operation of the object.
  • the data used for the processing of the object analysis unit 230 is stored in advance in the object analysis data storage unit 262.
  • data for detecting an object such as a template image of an object, is stored in the object analysis data storage unit 262.
  • the object analysis unit 230 may analyze only the specific object used for the gesture, or may analyze an object other than the specific object.
  • the specific object related to the registered gesture can be recognized based on the data related to the registered gesture stored in the registered gesture storage unit 263.
  • the object analysis unit 230 may match only a specific object related to the registered gesture among the objects that can be analyzed by itself. As a result, effects such as reduction of processing load and increase of processing speed can be obtained. Further, as will be described later, in order to determine the display position of the image object, not only the specific object related to the registered gesture but also all the objects that can be analyzed by itself may be analyzed.
  • The object detection unit 231 of the object analysis unit 230 detects the object and its region captured in the distance image in the same manner as the user detection unit 221. The object captured in the distance image does not necessarily have to be identified; for example, only the shape of the object, such as a pen, a rectangular parallelepiped, or a cylinder, may be recognized.
  • the object motion analysis unit 232 of the object analysis unit 230 obtains the position, posture, and transition of the detected object as the motion of the detected object in the same manner as the user motion analysis unit 222.
  • As described above, the motion of the specific part of the user and the motion of the specific object are analyzed, but these motion analyses may fail.
  • For example, the user's hand may hide the specific object so that the specific object does not appear in the distance image. In that case, the specific object is not detected, and it may be erroneously recognized that the motion of the specific object has ended even though the motion is actually continuing.
  • the user motion analysis unit 222 and the object motion analysis unit 232 may perform verification of the analysis result, reanalysis of the motion, and the like. That is, the success or failure of the analysis of the motion of the specific object at a certain point in time may be determined, and when it is determined that the analysis of the motion of the specific object at that time has failed, the analysis result of the motion at that time may be modified.
  • For example, the analysis result of the motion at the first time point may be modified based on at least one of the analysis result of the motion at a second time point before the first time point and the analysis result of the motion at a third time point after the first time point. For example, when the position and the posture fluctuate abruptly, it may be determined that the analysis has failed, and the abruptly fluctuating position and posture may be corrected by interpolation based on the values before and after in the time series.
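  • A minimal sketch of the correction by interpolation described above. The jump threshold and the treatment of missing samples (None) are assumptions for illustration.

```python
import numpy as np

def repair_track(positions, jump_threshold=0.05):
    """positions: list of per-frame 3D positions; None marks a failed detection.
    A sample that jumps abruptly (or is missing) is treated as a failed
    analysis and replaced by interpolating its temporal neighbours."""
    fixed = list(positions)
    for t in range(1, len(fixed) - 1):
        prev_p, cur_p, next_p = fixed[t - 1], fixed[t], fixed[t + 1]
        if prev_p is None or next_p is None:
            continue
        interpolated = 0.5 * (np.asarray(prev_p) + np.asarray(next_p))
        if cur_p is None or np.linalg.norm(np.asarray(cur_p) - interpolated) > jump_threshold:
            fixed[t] = interpolated
    return fixed
```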
  • Further, the user motion analysis unit 222 and the object motion analysis unit 232 may predict future positions and postures based on the transitions of the positions and postures so far. For example, the position of a feature point at the (N+1)-th time point may be predicted based on the positions of the feature point at the first to N-th time points (N is an integer of 1 or more).
  • the prediction result is compared with the estimation result based on the actual detection described above, and if the error is larger than the predetermined threshold value, it can be determined that the estimation by the detection has failed. If it is determined that the estimation has failed, the prediction result may be used, or as described above, correction may be made based on the preceding and succeeding estimation results.
  • The prediction method may be, for example, to calculate the velocity and acceleration of the feature points and to make a prediction based on the calculated velocities and accelerations.
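  • The following is a minimal sketch of a constant-velocity prediction and the threshold comparison described above. The threshold value and the per-frame velocity estimate are assumptions for illustration.

```python
import numpy as np

def predict_next(history):
    """Constant-velocity prediction of a feature point at the (N+1)-th time
    point from its positions at the (N-1)-th and N-th time points."""
    p_prev, p_last = np.asarray(history[-2]), np.asarray(history[-1])
    return p_last + (p_last - p_prev)     # last position plus per-frame velocity

def analysis_succeeded(predicted, measured, threshold=0.03):
    """Compare the prediction with the position estimated from the distance
    image; an error above the threshold is treated as a failed estimation."""
    return np.linalg.norm(np.asarray(predicted) - np.asarray(measured)) <= threshold
```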
  • Alternatively, the positions of the feature point at the first to N-th time points may be input to an estimation model based on a neural network, and the position of the feature point at the (N+1)-th time point may be output.
  • The estimation model can be generated by performing known deep learning based on training input data indicating the positions of the feature points at the first to N-th time points and correct-answer data indicating the actual positions of the feature points at the (N+1)-th time point.
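  • A minimal sketch of such an estimation model, written with PyTorch as an assumed framework. The network size, the value of N, and the mean-squared-error training objective are illustrative choices, not part of the disclosure.

```python
import torch
import torch.nn as nn

N = 8  # number of past time points fed to the model (assumed value)

# The positions of a feature point at time points 1..N are flattened into one
# input vector; the model outputs the predicted position at time point N+1.
model = nn.Sequential(
    nn.Linear(N * 3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),
)

def train_step(model, optimizer, past_positions, next_position):
    """past_positions: (batch, N, 3); next_position: (batch, 3) correct answers."""
    optimizer.zero_grad()
    prediction = model(past_positions.flatten(start_dim=1))
    loss = nn.functional.mse_loss(prediction, next_position)
    loss.backward()
    optimizer.step()
    return loss.item()
```

  • In use, the optimizer could be, for example, torch.optim.Adam(model.parameters()); the trained model can then replace or complement the kinematic prediction sketched above.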
  • the determination unit 240 determines whether or not the gesture related to at least one of the specific part and the specific object has been executed based on at least one of the movement of the specific part and the movement of the specific object.
  • the data used for the determination is stored in advance in the registered gesture storage unit 263.
  • the determination unit 240 compares the operation of the specific part related to the registered gesture with the analyzed operation of the specific part, and calculates the matching rate.
  • the motion of the specific object related to the registered gesture is compared with the motion of the analyzed specific object, and the matching rate is calculated. Then, it may be determined whether or not the gesture has been executed based on each match rate. For example, it may be determined that the registration gesture has been performed when each match rate exceeds the respective threshold value.
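  • A minimal sketch of the match-rate computation and the two-threshold decision described above. The similarity metric (mean frame-wise error mapped to [0, 1]), the threshold values, and the dictionary keys are assumptions for illustration.

```python
import numpy as np

def match_rate(observed, registered):
    """Similarity between an analysed motion and a registered motion, both
    given as (T, D) arrays of position/posture values per time step."""
    observed = np.asarray(observed, dtype=float)
    registered = np.asarray(registered, dtype=float)
    T = min(len(observed), len(registered))
    error = np.linalg.norm(observed[:T] - registered[:T], axis=1).mean()
    return 1.0 / (1.0 + error)            # maps zero error to 1.0

def gesture_executed(part_motion, object_motion, gesture,
                     part_threshold=0.7, object_threshold=0.7):
    """gesture: dict holding the registered 'part_motion' and 'object_motion'."""
    return (match_rate(part_motion, gesture["part_motion"]) >= part_threshold and
            match_rate(object_motion, gesture["object_motion"]) >= object_threshold)
```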
  • FIG. 5 is a diagram for explaining the registered contents of the registered gesture.
  • the specific object to be used, the functional classification on the AR system, the operation of the specific part constituting the registered gesture, the operation of the specific object, and the function to be called are shown.
  • In each of these examples, the specific part is the hand.
  • In practice, numerical values indicating the transitions of the position and the posture are registered.
  • FIG. 6 shows the details of the first to third registered gestures of FIG.
  • FIG. 6 (A) shows the first registered gesture of FIG.
  • The hand 611, which is a specific part, maintains a pen-holding posture, and the pen-shaped object 701, which is a specific object, is maintained in a state where its tip is close to the projection surface, that is, points downward.
  • FIG. 6B shows the second registered gesture of FIG.
  • FIG. 6C shows the third registered gesture of FIG. 5, and when the gesture is recognized, it is shown that the pointer 505, which is an image object, is displayed in the entire image 500.
  • FIG. 7 shows the fourth registered gesture of FIG.
  • When the rear end of the pen-shaped object 701 is within a predetermined distance from the thumb (that is, in a close state) and the thumb is bent and stretched, image objects 506A, 506B, and 506C having different colors are displayed in the entire image 500 in order each time the thumb is bent and stretched (that is, the color of the image object changes). For example, the color of the line in the above-mentioned "drawing a line" function can be changed.
  • FIG. 8 shows the fifth registered gesture of FIG.
  • It is shown that image objects 507A and 507B having different sizes are displayed in order in the entire image 500 (that is, the size of the image object changes).
  • the line thickness in the above-mentioned "drawing line” function can be changed.
  • FIG. 9 shows the sixth registered gesture of FIG. A gesture is shown in which the hand 611 moves parallel to the projection plane and one corner of the rectangular parallelepiped 702 moves parallel to the projection plane while maintaining a state closer to the projection plane than the other corners.
  • When this gesture is executed, as shown in FIG. 5, the function of erasing the line in the whole image 500 is called.
  • FIG. 10 (A) shows the seventh registered gesture of FIG. A gesture is shown in which the hand 611 moves parallel to the projection plane and one side of the rectangular parallelepiped 702 moves parallel to the projection plane while maintaining a state close to the projection plane.
  • When this gesture is recognized, the size (scale) of the entire image 500 is changed as shown in FIG. 10 (B).
  • FIG. 11 (A) shows the eighth registered gesture of FIG.
  • A gesture in which the contact surface of the rectangular parallelepiped 702 with the projection surface changes, regardless of the movement of the specific part, is shown.
  • In this way, a gesture that does not include one of the movement of a specific part and the movement of a specific object may also be registered.
  • When this gesture is recognized, the entire image 500 is switched as shown in FIG. 11 (B).
  • FIG. 12 shows the 9th to 11th registered gestures of FIG.
  • FIG. 12A shows the ninth registered gesture, and when this gesture is recognized, an image object called a stamp is displayed in the entire image 500.
  • FIG. 12 (B) shows the tenth registered gesture; when this gesture is recognized, an image object 504 called a timer, representing the time as shown in FIG. 2, is displayed in the entire image 500.
  • FIG. 12C shows the eleventh registered gesture, and when this gesture is recognized, the stamp is switched to another stamp.
  • FIG. 13 (A) shows the twelfth registered gesture of FIG.
  • When this gesture is recognized, the whole image 500 is rotated; for example, as shown in FIG. 13 (B), the orientation of the image objects displayed in the whole image 500 is turned upside down.
  • the motion analysis unit may recognize the motion of the specific object based on the analyzed motion of the specific part.
  • For example, even if the cylinder 703 is analyzed to be stationary, the motion analysis unit may override that analysis result and regard the cylinder 703 as having rotated. Further, it may be determined whether or not the specific part is in contact with the cylinder 703, and it may be determined that the cylinder 703 has rotated together with the specific part only when the specific part is in contact with the cylinder 703. In this way, the motion of the specific object may be recognized based on the motion of the specific part.
  • the contact between the specific part and the specific object can be recognized by whether the plane related to the specific part intersects any surface of the specific object.
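  • A minimal sketch of that contact test, assuming the specific object is approximated by an axis-aligned bounding box; the plane intersects the box exactly when the box corners lie on both sides of the plane (or on it). The AABB representation is an assumption for this sketch.

```python
import numpy as np
from itertools import product

def plane_intersects_box(plane_point, plane_normal, box_min, box_max):
    """True if the plane fitted to the specific part crosses the axis-aligned
    bounding box of the specific object (used here as a proxy for contact)."""
    normal = np.asarray(plane_normal, dtype=float)
    corners = np.array(list(product(*zip(box_min, box_max))))  # the 8 box corners
    signed = (corners - np.asarray(plane_point, dtype=float)) @ normal
    return signed.min() <= 0.0 <= signed.max()
```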
  • Conversely, the user motion analysis unit 222 may also recognize the motion of the specific part based on the motion of the specific object. Further, the determination unit 240 may re-recognize the motion of one of the specific object and the specific part based on the motion of the other.
  • As described above, the motion of the specific part and the motion of the specific object constituting each registered gesture are predetermined and stored as data, and the determination unit 240 searches the data for a registered gesture that matches the combination of the detected motion of the specific part and the detected motion of the specific object. Then, when a registered gesture is detected, the function corresponding to the registered gesture is found.
  • the function execution unit 250 executes the function corresponding to the executed gesture.
  • the image acquisition unit 251 acquires an image object corresponding to the function to be executed from the image storage unit 264.
  • the whole image generation unit 254 generates the whole image 500 including the acquired image object.
  • a plurality of image objects may be displayed in the entire image 500. In such a case, it is preferable to determine the display position so that the plurality of displayed image objects do not overlap.
  • an appropriate display position and display direction of the image object are determined.
  • the display position determination unit 252 detects an area (that is, an empty space) in which the image is not displayed in the current overall image 500. The detection may be performed by recording the display position of the image object and detecting based on the recording. The detected empty space is a candidate for the display position of the image object. The display position determination unit 252 selects one of the detected empty spaces based on the size of the newly displayed image object, the position of the specific portion related to the gesture, and the like. As a result, the display position of the newly displayed image object is determined.
  • the display position determination unit 252 may detect a space that cannot be actually used among the empty spaces, and may exclude the space that cannot be actually used from the candidates for the display position of the image object. For example, when the projected object 400 is a table as in the example of FIG. 2, it is conceivable that the object placed on the table overlaps with the entire image 500. In such a case, if the image object is projected in the same place as the object placed on the table, the image object becomes difficult to see. Therefore, the display position determination unit 252 may further narrow down the candidates for the display position of the image object based on the detected position of the object.
  • Further, the specific part and the object captured in the distance image are not always stationary.
  • Therefore, the display position determination unit 252 may further narrow down the display position candidates of the image object based on the predicted positions of the detected specific part and the detected object at the time when the image object is displayed. As a result, it is possible to prevent a situation in which image objects in the entire image 500 overlap with the user's hand, an object placed on the projection surface, or the like.
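  • A minimal sketch of the display position search, assuming the occupied regions (already-displayed image objects plus the predicted regions of hands and objects at display time) are rasterized into a boolean mask. The brute-force window scan and the stride value are illustrative simplifications.

```python
import numpy as np

def find_display_position(occupied_mask: np.ndarray, obj_h: int, obj_w: int):
    """occupied_mask: (H, W) boolean array, True where the whole image is
    already used or predicted to be covered. Returns the top-left corner of
    the first empty window that fits the new image object, or None."""
    H, W = occupied_mask.shape
    for y in range(0, H - obj_h + 1, 8):          # coarse stride for speed
        for x in range(0, W - obj_w + 1, 8):
            if not occupied_mask[y:y + obj_h, x:x + obj_w].any():
                return (x, y)
    return None                                    # no free space large enough
```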
  • the display position determination unit 252 may use the predicted position of the specific object as the display position of the image object.
  • the display direction determination unit 253 determines the display direction of the image object. For example, when the user analysis unit 220 recognizes the user's face captured in the distance image, the display direction of the image object is determined based on the position and orientation of the face. For example, the correct direction (for example, up and down) is determined in advance for the image object, and the display direction determination unit 253 displays the image object in the correct up and down direction when viewed from a certain position of the user's face. As a result, when displaying characters and the like, the characters can be displayed in a direction that is easy for the user to see.
  • the position and orientation of the user's face may be estimated.
  • the distance image may show the user's hand but not the user's face.
  • the position of the user's face outside the range of the distance image is estimated from a part of the user's hand or the like.
  • When the user analysis unit 220 analyzes the movement of the user's hand, the direction from which the hand is inserted into the image area can be specified.
  • The display direction determination unit 253 may therefore treat the direction from which the hand is inserted into the image area as the direction in which the face is located.
  • the user analysis unit 220 may generate a skeleton model of the hand and estimate the position of the face in consideration of the orientation of the fingers specified by the skeleton model.
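  • A minimal sketch of estimating the user's direction from the hand and rotating the image object so that its text reads upright from that direction. The wrist-to-fingertip heuristic, the assumed reach distance, the angle conventions, and the default orientation of the image object are assumptions of this sketch.

```python
import numpy as np

def estimate_user_position(wrist_xy, fingertip_xy, reach=400.0):
    """Rough estimate of where the user is in whole-image coordinates: the hand
    points away from the body, so the user is assumed to lie behind the wrist
    along the fingertip-to-wrist direction (reach is an assumed distance)."""
    wrist = np.asarray(wrist_xy, dtype=float)
    direction = wrist - np.asarray(fingertip_xy, dtype=float)
    return wrist + reach * direction / np.linalg.norm(direction)

def display_rotation(image_center_xy, user_position_xy, default_bottom_deg=-90.0):
    """Rotation (degrees, counter-clockwise) that turns the image object so its
    bottom edge faces the estimated user position, making characters read
    upright from there. The default bottom direction (-y axis) is assumed."""
    dx, dy = np.asarray(user_position_xy, float) - np.asarray(image_center_xy, float)
    user_bearing = np.degrees(np.arctan2(dy, dx))
    return user_bearing - default_bottom_deg
```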
  • the image object to be displayed, its display position, and its display direction are determined, and the overall image generation unit 254 generates the entire image 500 according to these determination items.
  • The generation of the entire image 500 may be performed in the same manner as a general CG (Computer Graphics) production method.
  • FIG. 14 is a diagram showing a preferable display example of the image object.
  • a gesture is performed in which the hand 611, which is a specific part, touches the cylinder 703, and the menu image 508 of the AR system is displayed by the gesture.
  • Other image objects are also displayed in the whole image 500.
  • The hands 611, 612, and 613 of the users are present above the entire image 500.
  • objects 705, 706 and 707 are present on the projection surface.
  • the menu image 508 is displayed so as not to overlap with these.
  • Characters in the menu image 508 are displayed so that the user of the hand 611 who performed the gesture can read them. In this way, an entire image 500 with excellent usability is generated by the processing of the display position determination unit 252 and the display direction determination unit 253.
  • the correspondence between gestures and functions was predetermined, but from the viewpoint of usability, it is preferable to be able to customize the correspondence. Therefore, it is preferable that a gesture that calls a function for changing the correspondence is registered.
  • the menu image 508 shown in FIG. 14 includes an icon indicating a registered gesture and an icon indicating a callable function. Then, the user's tap may be detected and the gesture and function related to the tapped icon may be associated with each other.
  • a gesture that calls the function for registering a new gesture may be registered.
  • For example, the pointer 505 as shown in FIG. 6 (C) may be displayed, and while the pointer 505 is displayed, a gesture performed above the pointer 505 may be newly registered.
  • The gesture performed above the pointer 505 is analyzed by the user analysis unit 220 and the object analysis unit 230, and the analyzed motion of the specific part and the motion of the object may be stored in the registered gesture storage unit 263 as the contents of the new registered gesture.
  • the registered gesture can be customized for each user.
  • FIG. 15 is a flowchart of a function call process by a gesture.
  • First, the distance image acquisition unit 210 acquires a distance image (S101) and transmits it to the user analysis unit 220 and the object analysis unit 230.
  • the user analysis unit 220 detects a specific part of the user reflected in the distance image and analyzes the operation of the specific part (S102).
  • the object analysis unit 230 also detects the object reflected in the distance image and analyzes the motion of the object (S103). The internal flow of these analysis processing processes will be described later.
  • the determination unit 240 determines whether or not the registration gesture has been executed based on the analyzed movements of the specific part and the object (S104). If the registration gesture is not executed (NO in S105), the function execution unit 250 maintains the current function (S106). As a result, the flow ends and the entire image 500 does not change. On the other hand, when the registered gesture is executed, the function execution unit 250 executes the function related to the executed gesture (S107). That is, the function is switched, and the image object related to the function of the entire image 500 changes.
  • FIG. 16 is a flowchart of the analysis process. Since the flow is the same in the user analysis unit 220 and the object analysis unit 230, the user detection unit 221 and the object detection unit 231 are collectively referred to as the detection unit, and the user motion analysis unit 222 and the object motion analysis unit 232 are collectively referred to as the motion analysis unit.
  • the detection unit attempts to detect a specific part or object captured in a distance image at a plurality of predetermined points in time.
  • When detection fails at some time point, the motion analysis unit estimates the motion at that time point based on the motion at other time points and the like.
  • The motion analysis unit analyzes the motion based on the detected transitions of the position and posture, the previous analysis results, the motion of the other target (that is, the motion of the object in the case of the user analysis, or the motion of the specific part in the case of the object analysis), and the like (S204). Thereby, even if detection and analysis fail at some of the plurality of time points, the motion at the plurality of time points can be estimated. It is also possible to detect the rotational movement of a rotationally symmetric specific object. This prevents erroneous detection of the motion.
  • FIG. 17 is a flowchart of the image display process. This flow can be performed in S106 and S107 of the flow of the function call processing by the gesture.
  • the function execution unit 250 acquires information such as an image object corresponding to a gesture, a detected specific part, and a detected object (S301).
  • The information on the image object may be transmitted from the determination unit 240, or information indicating the executed gesture may be transmitted from the determination unit 240 and the function execution unit 250 may acquire the information on the image object based on it.
  • the image acquisition unit 251 acquires the image object from the image storage unit 264 (S302). If the gesture is not changed, the image object has already been acquired, and the process may be omitted.
  • the display position determination unit 252 first predicts the positions of the user and the object at the time of output of the image in order to determine the position to display the image (S303). As described above, the user motion analysis unit 222 and the object motion analysis unit 232 may perform the prediction. The display position determination unit 252 confirms the free space of the entire image 500 currently displayed, and further detects the free space after the present time based on the predicted positions of the user and the object (S304). The display position determination unit 252 determines the display position of the image object based on the free space after the present time and the size of the image object (S305). When displaying on a specific object, the position of the specific object is detected instead of the empty space, and the position is determined as the display position.
  • The display direction determination unit 253 confirms whether the position of the user's face has been detected in order to determine the direction in which the image is displayed. If the position of the face has not been detected (NO in S306), the position of the user's face is estimated based on the detected specific part (S307). When the estimation is completed or the position of the face has been detected (YES in S306), the display direction determination unit 253 determines the display direction of the image object based on the position of the user's face (S308).
  • Since the image object, its display position, and its display direction are now known, the whole image generation unit 254 generates and outputs the whole image 500 by fitting the image object into the whole image 500 as determined (S309).
  • Through this flow, the image object corresponding to the gesture can be continuously displayed so that it is easy for the user to see.
  • the user who performed the gesture may move after the gesture is executed and the image object related to the gesture is displayed. Even in such a case, the display position and display direction of the image object are optimized.
  • a plurality of functions that can be provided in the information processing system 1000 can be assigned to a plurality of gestures using one specific object.
  • the information processing system 1000 recognizes the executed gesture by analyzing the movement of the specific object and the user, and executes the function assigned to the recognized gesture. This eliminates the complexity of switching to another specific object when it is desired to switch the function to be executed, and improves the convenience of the user.
  • the position of the user's face is estimated, and the position, orientation, etc. of the image object to be displayed are adjusted according to the estimated face position. As a result, for example, it is possible to prevent a problem that the characters displayed in the information processing system 1000 are displayed diagonally to the user and the displayed characters are difficult to read.
  • The positions and movements of the user and of the objects present in the whole image 500 are also estimated, so that even if the objects in the display space and the user move, the image object can always be displayed in an empty space or on a moving object.
  • the processing of the device according to the embodiment of the present disclosure can be realized by software (program) executed by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like. It should be noted that, instead of executing all the processes of the device by software, some processes may be executed by hardware such as a dedicated circuit.
  • the present disclosure may also have the following structure.
  • a first analysis unit that detects a specific part of the user captured in the input image and analyzes the operation of the specific part.
  • a second analysis unit that detects a specific object captured in the input image and analyzes the operation of the specific object.
  • a determination unit that determines, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and an execution unit that executes, when it is determined that any of the plurality of gestures has been executed, a function corresponding to the executed gesture. An information processing device provided with the above units.
  • The information processing device, wherein the second analysis unit determines whether or not the analysis of the motion of the specific object at a first time point has succeeded and, when it is determined that the analysis of the motion of the specific object at the first time point has failed, corrects the analysis result of the motion at the first time point.
  • The information processing device according to the above [2], wherein the second analysis unit corrects the analysis result of the motion at the first time point based on at least one of the analysis result of the motion at a second time point before the first time point and the analysis result of the motion at a third time point after the first time point.
  • The information processing device, wherein the second analysis unit predicts the motion at the first time point based on the analysis result of the motion at a time point before the first time point, and determines the success or failure of the analysis of the motion at the first time point based on the prediction result of the motion at the first time point and the analysis result of the motion at the first time point.
  • The information processing device according to the above [4], wherein the second analysis unit corrects the analysis result of the motion at the first time point based on the prediction result. [6] The information processing device according to any one of [1] to [5] above, wherein the execution unit outputs an image including an image object corresponding to the executed gesture together with the execution of the function.
  • The information processing device according to the above [6], wherein the second analysis unit predicts a first region of the detected object at a first time point based on the transition of the position of the detected object, and the execution unit displays the image object corresponding to the executed gesture in an area other than the first region at the first time point.
  • The information processing device according to the above [6], wherein the second analysis unit predicts a second region of the detected specific part at a first time point based on the transition of the position of the detected specific part, and the execution unit displays the image object corresponding to the executed gesture in an area other than the second region at the first time point.
  • The information processing device according to any one of [1] to [11] above, wherein the execution unit displays each image related to the plurality of gestures and each image related to the plurality of functions, the determination unit recognizes the selected gesture and function based on the transition of the specific part and the positions of the displayed images, and the selected function is executed when it is determined that the selected gesture has been executed.
  • The information processing device according to any one of [1] to [12] above, wherein the execution unit indicates a registration area for registering a new gesture when the determination unit determines that a predetermined gesture has been executed, and the first analysis unit detects a specific part included in the registration area and sets the motion of the detected specific part as one of the plurality of gestures.
  • An information processing method including: a step of detecting a specific part of a user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • A program that causes a computer to execute: a step of detecting a specific part of a user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • A storage medium storing a program that causes a computer to execute: a step of detecting a specific part of a user captured in an input image and analyzing the motion of the specific part; a step of detecting a specific object captured in the input image and analyzing the motion of the specific object; a step of determining, based on the motion of the specific part and the motion of the specific object, whether or not any one of a plurality of gestures related to the specific part and the specific object has been executed; and a step of executing, when it is determined that any of the plurality of gestures has been executed, the function corresponding to the executed gesture.
  • Reference signs: 1000 Information processing system; 100 Distance image generation device; 200 Information processing device; 210 Distance image acquisition unit; 220 User analysis unit; 221 User detection unit; 222 User motion analysis unit; 230 Object analysis unit; 231 Object detection unit; 232 Object motion analysis unit; 240 Determination unit; 250 Function execution unit; 251 Image acquisition unit; 252 Display position determination unit; 253 Display direction determination unit; 254 Whole image generation unit; 261 User analysis data storage unit; 262 Object analysis data storage unit; 263 Registered gesture storage unit; 264 Image storage unit; 300 Projector; 400 Projected object; 500 Whole image; 501, 502, 503, 504 Image object; 505 Pointer image object; 506A, 506B, 506C Image objects of different colors; 507A, 507B Image objects of different sizes; 601, 602, 603 User; 611, 612, 613 Specific part; 621 Feature point; 631 Plane; 701, 702, 703, 704, 705, 706, 707 Object

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An information processing device or the like is provided that solves a problem concerning the usability of an information processing system that provides a function corresponding to a registered gesture. The information processing device according to one aspect of the present invention is provided with a first analysis unit, a second analysis unit, a determination unit, and an execution unit. The first analysis unit detects a specific part of a user captured in an input image and analyzes the motion of the specific part. The second analysis unit detects a specific object captured in the input image and analyzes the motion of the specific object. Based on the motion of the specific part and the motion of the specific object, the determination unit determines whether a gesture related to the specific part and the specific object has been executed. When it is determined that the gesture has been executed, the execution unit executes a function corresponding to the gesture.
PCT/JP2021/002501 2020-02-10 2021-01-25 Dispositif de traitement d'informations, procédé de traitement d'informations et programme WO2021161769A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020020938 2020-02-10
JP2020-020938 2020-02-10

Publications (1)

Publication Number Publication Date
WO2021161769A1 true WO2021161769A1 (fr) 2021-08-19

Family

ID=77292382

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/002501 WO2021161769A1 (fr) 2020-02-10 2021-01-25 Dispositif de traitement d'informations, procédé de traitement d'informations et programme

Country Status (1)

Country Link
WO (1) WO2021161769A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009519553A (ja) * 2005-12-12 2009-05-14 株式会社ソニー・コンピュータエンタテインメント コンピュータプログラムとのインタフェース時に深さと方向の検出を可能とする方法およびシステム
WO2017217050A1 (fr) * 2016-06-16 2017-12-21 ソニー株式会社 Dispositif de traitement d'informations, procédé de traitement d'informations et support de stockage


Similar Documents

Publication Publication Date Title
US11314335B2 (en) Systems and methods of direct pointing detection for interaction with a digital device
US20220382379A1 (en) Touch Free User Interface
US20210096651A1 (en) Vehicle systems and methods for interaction detection
US10732725B2 (en) Method and apparatus of interactive display based on gesture recognition
JP5802667B2 (ja) ジェスチャ入力装置およびジェスチャ入力方法
KR101947034B1 (ko) 휴대 기기의 입력 장치 및 방법
US20180292907A1 (en) Gesture control system and method for smart home
US20150084859A1 (en) System and Method for Recognition and Response to Gesture Based Input
US20180150186A1 (en) Interface control system, interface control apparatus, interface control method, and program
US9477874B2 (en) Method using a touchpad for controlling a computerized system with epidermal print information
EP2752740A1 (fr) Procédé, appareil et terminal mobile de commande de dessin
US9544556B2 (en) Projection control apparatus and projection control method
US20150363038A1 (en) Method for orienting a hand on a touchpad of a computerized system
WO2014127697A1 (fr) Procédé et terminal permettant de déclencher des programmes d'application et des fonctions de programmes d'application
US20140362002A1 (en) Display control device, display control method, and computer program product
US10621766B2 (en) Character input method and device using a background image portion as a control region
Matlani et al. Virtual mouse using hand gestures
KR101559424B1 (ko) 손 인식에 기반한 가상 키보드 및 그 구현 방법
JP6033061B2 (ja) 入力装置およびプログラム
WO2021161769A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations et programme
JP2016071824A (ja) インターフェース装置、手指追跡方法、及び、プログラム
CN110162251A (zh) 图像缩放方法及装置、存储介质、电子设备
US20240160294A1 (en) Detection processing device, detection processing method, information processing system
EP3059664A1 (fr) Procédé pour commander un dispositif par des gestes et système permettant de commander un dispositif par des gestes
KR101327963B1 (ko) 깊이 값을 이용한 회전 인터페이스 기반 문자 입력 장치 및 이를 이용한 문자 입력 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21753553

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21753553

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP