WO2020078319A1 - Gesture-based manipulation method and terminal device - Google Patents

Gesture-based manipulation method and terminal device

Info

Publication number
WO2020078319A1
WO2020078319A1 (PCT/CN2019/111035)
Authority
WO
WIPO (PCT)
Prior art keywords
hand
group
image
spatial position
gesture
Prior art date
Application number
PCT/CN2019/111035
Other languages
English (en)
French (fr)
Inventor
王星宇
赵磊
吴航
张启龙
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP19872386.8A (published as EP3859489A4)
Publication of WO2020078319A1
Priority to US17/230,067 (published as US20210232232A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/016Input arrangements with force or tactile feedback as computer generated output to the user
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/113Recognition of static hand signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • the present application relates to the field of human-computer interaction, in particular to a gesture-based manipulation method and terminal device.
  • Virtual reality technology and augmented reality technology are emerging multimedia technologies in recent years.
  • Virtual reality technology is an immersive interactive environment created based on multimedia computer technology, sensing technology, and simulation technology. Specifically, it uses computer technology to generate a virtual environment that integrates realistic visual, auditory, and tactile sensations within a specific range. Using the necessary equipment, the user interacts with objects in the virtual environment in a natural manner, producing a sense of presence and an experience equivalent to being in a real environment.
  • Augmented reality technology is a technology that calculates the position and angle of camera images in real time and adds corresponding images. The goal of this technology is to "seamlessly" integrate real-world information and virtual-world information so that real environments and virtual objects are superimposed on the same screen or space in real time. In this way, the real world is supplemented so that people can enhance their experience of the real world in terms of sight, hearing, and touch.
  • Gesture recognition is one of the core technologies of vision-based human-computer interaction. Users can interact with virtual objects through human-computer interaction methods based on gesture recognition. In virtual reality or augmented reality scenes, users can interact with augmented objects (virtual objects) in three-dimensional space through gestures, enhancing immersion. With this way of interaction, users no longer need to manipulate virtual objects through external devices such as keyboards, mice, and handheld controllers, nor are they limited to clicking virtual objects on the touch screen.
  • Gesture interaction technology is more complex, but it has at least the following advantages: (1) it requires no physical contact, enabling remote control; (2) interactive actions are richer and more natural, since different operations can use different gestures rather than being limited to a few common operations such as taps and swipes; and (3) it interferes less with user activities, as gesture operations can be resumed at any time.
  • A currently used gesture-based interaction method obtains the shape features or motion trajectories of gestures through gesture analysis, identifies the corresponding gestures based on those shape features or trajectories, and then performs the corresponding control operations.
  • However, the terminal device is configured with only a limited number of gesture shape features or motion trajectories, so it can recognize only those gestures, which results in poor scalability and a low recognition rate.
  • Embodiments of the present application provide a gesture-based manipulation method and terminal device, which can cover all natural gestures and continuous hand movements, make operation more efficient and natural, and help improve the user's human-computer interaction experience.
  • a first aspect of an embodiment of the present application provides a gesture-based manipulation method.
  • the method is executed by a terminal device, and the method includes: displaying a target screen, where the target screen includes a virtual object manipulated through a detected gesture or detected hand motion; acquiring F groups of hand images; recognizing, based on the F groups of hand images, the positions of the hand joint points of the hand in the F groups of hand images, thereby obtaining F groups of hand joint point spatial positions, where the spatial position of any group of hand joint points is the spatial position of the hand joint points of the hand in one group of hand images, and F is an integer greater than 0; and performing, according to the F groups of hand joint point spatial positions, a control operation corresponding to the F groups of hand joint point spatial positions, where the control operation is used to adjust the position and/or form of the virtual object in the target screen.
  • the F groups of hand images may be images collected by a depth camera and a color camera on the terminal device, or may include only depth images collected by the depth camera.
  • a group of hand images may include depth images and color images (for example, RGB images) obtained by a depth camera and a color camera simultaneously shooting the same scene, or may include only depth images collected by the depth camera.
  • the simultaneous shooting of the same scene by the depth camera and the color camera means that the time interval between the depth camera and the color camera shooting the same scene is less than the time threshold.
  • the time threshold may be 1 ms, 5 ms, 10 ms, etc.
  • the color image and the depth image included in any group of hand images are images obtained by shooting the same scene at the same time.
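  • A small illustration of the synchronization criterion above: a depth frame and a color frame are grouped only if their capture times differ by less than the time threshold. The frame representation below is an assumption; only the example threshold values (1 ms, 5 ms, 10 ms) come from the description.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Frame:
    timestamp_ms: float  # capture time reported by the camera driver
    pixels: Any          # image payload, e.g. a numpy array

def form_hand_image_group(depth: Frame, color: Frame,
                          time_threshold_ms: float = 10.0) -> Optional[dict]:
    """Pair a depth frame and a color frame into one 'group of hand images'
    only if they were captured within the time threshold (the description
    gives 1 ms, 5 ms and 10 ms as example thresholds)."""
    if abs(depth.timestamp_ms - color.timestamp_ms) < time_threshold_ms:
        return {"depth": depth, "color": color}
    return None  # not regarded as shooting the same scene at the same time
```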
  • the F groups of hand images may also be F groups of hand images obtained by the terminal device performing hand segmentation on F groups of original images collected by the camera.
  • the spatial position of a hand joint point refers to the three-dimensional coordinates of that joint point.
  • the above-mentioned virtual objects may be virtual objects displayed on the terminal device that can be manipulated by the user through gestures, such as virtual characters, virtual animals, and virtual items.
  • the terminal device is preset with a trained recognition network, and the F groups of hand images may be input to the recognition network in sequence to obtain the spatial positions of a group of joint points corresponding to each group of hand images. In this way, the spatial position of each hand joint point in each group of hand images can be determined quickly and accurately.
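  • A hedged sketch of running such a preset recognition network over the F groups of hand images; the callable network interface and the (21, 3) output shape (the description later mentions 21 joint points, each with three-dimensional coordinates) are assumptions about how the network is wrapped, not a definitive implementation.

```python
import numpy as np

NUM_JOINTS = 21  # the description later mentions 21 hand joint points per hand

def recognize_joint_positions(recognition_network, hand_image_groups):
    """Feed the F groups of hand images to a preset, trained recognition network
    in acquisition order, collecting one set of joint spatial positions per group.
    `recognition_network` is assumed to be a callable returning a (21, 3) array
    of three-dimensional joint coordinates for one group of hand images."""
    joint_groups = []
    for group in hand_image_groups:               # keep the acquisition order
        joints_3d = np.asarray(recognition_network(group), dtype=float)
        assert joints_3d.shape == (NUM_JOINTS, 3)
        joint_groups.append(joints_3d)
    return joint_groups                           # F sets of spatial positions
```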
  • After acquiring a group of hand images, the terminal device recognizes the hand joint points in that group of hand images, saves the obtained spatial positions of the group of joint points, and determines the gesture corresponding to the spatial positions of the group of joint points.
  • the terminal device may sequentially recognize hand joint points in each group of hand images in the order in which each group of hand images is obtained. It can be understood that any natural gesture or any hand movement can be represented as the spatial position of one or more sets of joint points. In turn, any natural gesture or any hand movement can be determined by the spatial position of one or more sets of joint points. Natural gesture refers to any gesture, that is, any gesture that the user can make.
  • the terminal device determines the user's control operation from the spatial positions of the hand joint points, which can cover all natural gestures and continuous hand movements and makes operation more efficient and natural, thereby improving the user's human-computer interaction experience.
  • the performing of the control operation corresponding to the F groups of hand joint point spatial positions includes: determining M gesture types according to the F groups of hand joint point spatial positions, where M is less than or equal to F and M is a positive integer; and performing the control operation corresponding to the M gesture types.
  • the terminal device determines one or more gesture types according to the spatial position of one or more sets of finger joint points, and then executes the control operation corresponding to the one or more gesture types to accurately recognize various gestures.
  • the determining of the M gesture types corresponding to the F groups of hand joint point spatial positions includes: calculating, according to the spatial positions of a group of hand joint points among the F groups of hand joint points, the angles between the hand joint points in that group; and determining, according to the angles between the hand joint points, a gesture type corresponding to the spatial positions of that group of hand joint points.
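  • As an illustration of the angle-based gesture determination described above, the sketch below computes bending angles at finger joints and maps them to a gesture type. The 21-joint layout, the joint indices, and the 120-degree threshold are assumptions for illustration and are not specified by the patent.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def gesture_from_joint_positions(joints):
    """Rough fist / open-hand decision from one group of 21 x 3 joint positions.
    The joint layout (wrist = 0, four joints per finger) and the 120-degree bend
    threshold are illustrative assumptions."""
    joints = np.asarray(joints, dtype=float)
    finger_triples = [(1, 2, 3), (5, 6, 7), (9, 10, 11), (13, 14, 15), (17, 18, 19)]
    bend = [joint_angle(joints[a], joints[b], joints[c]) for a, b, c in finger_triples]
    curled = sum(angle < 120.0 for angle in bend)   # small angle => finger curled
    if curled >= 4:
        return "fist"
    if curled == 0:
        return "open_hand"
    return "other"
```

  • Under a rule of this kind, joint positions like those shown in FIG. 5 would map to a fist gesture and those in FIG. 6 to an open gesture.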
  • the determining of the M gesture types corresponding to the F groups of hand joint point spatial positions includes: determining at least two gesture types corresponding to the F groups of hand joint point spatial positions, where F is greater than 1; and the performing of the control operation corresponding to the M gesture types includes: performing, according to the change in gesture type among the at least two gesture types, the control operation corresponding to the at least two gesture types.
  • the performing of the control operation corresponding to the F groups of hand joint point spatial positions includes: determining, according to the F groups of hand joint point spatial positions, M gesture types corresponding to the F groups of hand joint point spatial positions, where F is greater than 1 and M is less than or equal to F; and performing the control operation according to the F groups of hand joint point spatial positions and the M gesture types.
  • the performing of the control operation according to the F groups of hand joint point spatial positions and the M gesture types includes: determining, according to the F groups of hand joint point spatial positions, the change in the spatial positions of the hand joint points between the groups, and performing the control operation according to the M gesture types and the spatial position change; or determining, according to the F groups of hand joint point spatial positions, the change in the spatial positions of the hand joint points between the groups, and performing the control operation according to the change in gesture type among the M gesture types and the spatial position change; or performing the control operation according to the change in gesture type among the M gesture types and the F groups of hand joint point spatial positions.
  • the performing of the control operation according to the M gesture types and the spatial position change includes: when at least one of the M gesture types is a target gesture type, determining, according to the F groups of hand joint point spatial positions, the change in the spatial positions of the hand joint points between the groups, where the target gesture type is used to adjust the position of the virtual object in the target screen.
  • the performing of the control operation corresponding to the F groups of hand joint point spatial positions includes: determining, according to the F groups of hand joint point spatial positions, the change in the spatial positions of the hand joint points between the groups, where F is greater than 1; and performing the control operation according to the spatial position change.
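  • As a minimal sketch of driving a control operation purely from the spatial position change, the code below treats the mean joint position as a rough palm centre and translates the virtual object by the hand's displacement between two groups; the palm-centre choice and the 1:1 mapping are assumptions, not requirements of the patent.

```python
import numpy as np

def hand_translation(prev_joints, curr_joints, scale=1.0):
    """Displacement of the mean joint position (a rough palm centre) between
    two consecutive groups of hand joint points."""
    prev_joints = np.asarray(prev_joints, dtype=float)
    curr_joints = np.asarray(curr_joints, dtype=float)
    return scale * (curr_joints.mean(axis=0) - prev_joints.mean(axis=0))

def drag_virtual_object(object_position, prev_joints, curr_joints):
    """One possible control operation: move the virtual object by the same
    three-dimensional offset as the hand."""
    return np.asarray(object_position, dtype=float) + hand_translation(prev_joints, curr_joints)
```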
  • the method further includes: when the number of hand joint points in each group of hand images in K groups of hand images is less than a number threshold, prompting that the gesture operation exceeds the manipulation range, where the K groups of hand images are included in the F groups of hand images, K is less than or equal to F, and K is a positive integer.
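  • A hedged sketch of the out-of-range prompt described above; the threshold value of 15 joint points and the use of a simple text prompt are illustrative assumptions.

```python
def check_manipulation_range(joint_groups_k, count_threshold=15, prompt=print):
    """If every one of the K groups of hand images yields fewer recognised joint
    points than the threshold, prompt that the gesture operation exceeds the
    manipulation range. The threshold of 15 and the text prompt are assumptions."""
    if joint_groups_k and all(len(joints) < count_threshold for joints in joint_groups_k):
        prompt("Gesture operation exceeds the manipulation range")
        return False
    return True
```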
  • the recognizing of the positions of the hand joint points in the F groups of hand images to obtain the F groups of hand joint point spatial positions includes: detecting, according to at least one of a color image and a depth image included in any group of hand images in the F groups of hand images, the position area where the hand is located in that group of hand images; and recognizing, according to at least one of the color image and the depth image, the positions of the hand joint points of the hand in that position area.
  • the performing of the control operation according to the spatial position change includes: determining a movement trajectory of the hand according to the spatial position change; and moving the virtual object according to the movement trajectory of the hand while vibrating, where the intensity of the vibration is positively or negatively correlated with the distance from the hand to the terminal device.
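  • One way the distance-dependent vibration intensity could be computed is sketched below; the distance range and the linear mapping are assumptions, since the description only states that intensity is positively or negatively correlated with the hand-to-device distance.

```python
def vibration_intensity(hand_distance_m, min_d=0.2, max_d=1.0, positive=True):
    """Map the hand-to-device distance to a vibration intensity in [0, 1].
    The 0.2-1.0 m range and the linear mapping are assumptions; the description
    only says intensity is positively or negatively correlated with distance."""
    ratio = (hand_distance_m - min_d) / (max_d - min_d)
    ratio = min(max(ratio, 0.0), 1.0)
    return ratio if positive else 1.0 - ratio
```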
  • the performing of the control operation according to the spatial position change includes: determining the hand movement according to the spatial position change; and performing an adjustment operation corresponding to the hand movement, where the adjustment operation is used to adjust the form of the virtual object.
  • the detecting, according to at least one of a color image and a depth image included in any group of hand images in the F groups of hand images, of the position area where the hand is located in that group of hand images includes: detecting, according to the color image included in a target group of hand images, a first position area where the hand is located in that color image, where the target group of hand images is any group of images in the F groups of hand images; and the recognizing, according to at least one of the color image and the depth image, of the positions of the hand joint points of the hand in the position area includes: recognizing, according to the depth image included in the target group of hand images, the positions of the hand joint points of the hand in a second position area of the depth image to obtain the group of hand joint point spatial positions corresponding to the target group of hand images, where the second position area is the area in the depth image corresponding to the first position area, and the depth image and the color image are images obtained by simultaneously shooting the same scene.
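  • A sketch of the color-then-depth recognition flow above, assuming the color and depth images are already spatially aligned and that `hand_detector` and `joint_recognizer` are placeholder interfaces (both are assumptions, not components named by the patent).

```python
import numpy as np

def joints_from_color_then_depth(color_img, depth_img, hand_detector, joint_recognizer):
    """Detect the first position area of the hand in the color image, then recognise
    the joint points inside the corresponding second position area of the spatially
    aligned depth image. `hand_detector` returns an (x, y, w, h) box and
    `joint_recognizer` returns per-joint (u, v, depth) values relative to the ROI;
    both interfaces are assumptions."""
    x, y, w, h = hand_detector(color_img)            # first position area
    depth_roi = depth_img[y:y + h, x:x + w]          # second position area
    joints = np.asarray(joint_recognizer(depth_roi), dtype=float)   # (21, 3)
    joints[:, 0] += x                                # back to full-image pixel coords
    joints[:, 1] += y
    return joints
```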
  • the detecting, according to at least one of a color image and a depth image included in any group of hand images in the F groups of hand images, of the position area where the hand is located in that group of hand images includes: detecting, according to the color image included in a target group of hand images, a first position area where the hand is located in that color image, where the target group of hand images is any group of images in the F groups of hand images; and the recognizing, according to at least one of the color image and the depth image, of the positions of the hand joint points of the hand in the position area includes: recognizing, according to the color image, the positions of the hand joint points of the hand in the first position area to obtain the spatial positions of a first group of hand joint points; recognizing, according to the depth image included in the target group of hand images, the positions of the hand joint points of the hand in a second position area of the depth image to obtain the spatial positions of a second group of hand joint points, where the second position area is the area in the depth image corresponding to the first position area, and the depth image and the color image are images obtained by synchronously shooting the same scene; and fusing the spatial positions of the first group of hand joint points and the second group of hand joint points to obtain the group of hand joint point spatial positions corresponding to the target group of hand images.
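  • The patent does not fix how the two groups of joint positions are fused; a per-joint weighted average is one plausible strategy, sketched below under that assumption.

```python
import numpy as np

def fuse_joint_sets(joints_from_color, joints_from_depth, weight_color=0.5):
    """Fuse the first group (from the color image) and the second group (from the
    depth image) of hand joint spatial positions into one group. A per-joint
    weighted average is just one plausible fusion rule; the patent does not fix it."""
    w = float(weight_color)
    return w * np.asarray(joints_from_color, dtype=float) + \
           (1.0 - w) * np.asarray(joints_from_depth, dtype=float)
```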
  • before the identifying of the hand joint points in the F groups of hand images to obtain the F groups of hand joint point spatial positions, the method further includes: using a color sensor and a depth sensor to simultaneously shoot the same scene to obtain an original color image and an original depth image; spatially aligning the original color image and the original depth image; and performing hand segmentation on the aligned original color image and the aligned original depth image respectively to obtain the target group of hand images.
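  • A rough sketch of the alignment-and-segmentation step; the homography-based warp and the mask-based hand segmentation are illustrative stand-ins for the calibrated camera alignment and segmentation method the device would actually use.

```python
import cv2
import numpy as np

def align_and_segment(raw_color, raw_depth, depth_to_color_homography, hand_mask_fn):
    """Spatially align the original depth image to the original color image, then
    segment the hand from both to produce one target group of hand images.
    The homography warp and `hand_mask_fn` (a boolean hand-mask function) are
    assumptions; real systems typically use the two cameras' calibrated
    intrinsics/extrinsics and a learned segmentation model."""
    h, w = raw_color.shape[:2]
    aligned_depth = cv2.warpPerspective(raw_depth, depth_to_color_homography, (w, h))
    mask = hand_mask_fn(raw_color)                     # boolean hand mask (H, W)
    hand_color = np.where(mask[..., None], raw_color, 0)
    hand_depth = np.where(mask, aligned_depth, 0)
    return {"color": hand_color, "depth": hand_depth}  # the "target group hand image"
```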
  • the recognizing of the positions of the hand joint points in the F groups of hand images to obtain the F groups of hand joint point spatial positions includes: detecting, according to the depth image included in a target group of hand images, the position area where the hand is located in the depth image, where the target group of hand images is any group of images in the F groups of hand images; and recognizing, according to the depth image, the positions of the hand joint points of the hand in that position area of the depth image to obtain the group of hand joint point spatial positions corresponding to the target group of hand images.
  • a second aspect of an embodiment of the present application provides a terminal device.
  • the terminal device includes: a display unit, configured to display a target screen including a virtual object manipulated through a detected gesture or detected hand motion; an acquisition unit, configured to acquire F groups of hand images; a recognition unit, configured to recognize, based on the F groups of hand images, the positions of the hand joint points of the hand in the F groups of hand images, thereby obtaining F groups of hand joint point spatial positions, where the spatial position of any group of hand joint points is the spatial position of the hand joint points of the hand in one group of hand images; and a processing unit, configured to perform, according to the F groups of hand joint point spatial positions, a control operation corresponding to the F groups of hand joint point spatial positions, where the control operation is used to adjust the position and/or form of the virtual object in the target screen.
  • the processing unit is configured to determine at least one gesture corresponding to the spatial position of the joint point of the group F; execute the control operation corresponding to the at least one gesture.
  • the processing unit is configured to calculate a hand joint in the set of hand joint points according to the spatial position of the set of hand joint points in the group F hand joint points The angle between the points; according to the angle between the hand joint points, determine a gesture type corresponding to the spatial position of the set of hand joint points.
  • the processing unit is configured to determine at least two gesture types corresponding to the F groups of hand joint point spatial positions, where F is greater than 1, and to perform, according to the change in gesture type among the at least two gesture types, the control operation corresponding to the at least two gesture types.
  • the processing unit is configured to determine the M gesture types corresponding to the spatial positions of the hand joint points of the group F according to the spatial positions of the hand joint points of the group F, where F is greater than 1. M is less than or equal to F; perform the control operation according to the spatial position of the joint point of the hand of the group F and the M types of gestures.
  • the processing unit is configured to determine the change in the spatial position of the hand joint point between the hand joint point groups according to the spatial position of the hand joint point in the group F; M types of gestures and changes in the spatial position, perform the control operation;
  • the processing unit is configured to determine the change in the spatial position of the hand joint point between the hand joint point groups according to the spatial position of the hand joint point in the group F; the gesture type according to the M gesture types Change and the spatial position change, execute the control operation;
  • the processing unit is configured to execute the control operation according to the gesture type change of the M gesture types and the spatial position of the joint point of the hand of the group F.
  • the processing unit is configured to determine a change in the spatial position of the hand joint point between the hand joint point groups according to the spatial position of the hand joint point in the F group, F is greater than 1 ; Perform the control operation according to the change of the spatial position.
  • the processing unit is configured to prompt that the gesture operation exceeds the manipulation range when the number of hand joint points in each group of hand images in K groups of hand images is less than a number threshold, where the K groups of hand images are included in the F groups of hand images, K is less than or equal to F, and K is a positive integer.
  • the recognition unit is configured to detect any group of hands based on at least one of a color image and a depth image included in any group of hand images in the group F hand images The location area of the hand in the image; based on at least one of the color image and the depth image, the position of the hand joint point of the hand in the location area is identified.
  • the recognition unit is configured to detect, according to the color image included in a target group of hand images, a first position area where the hand is located in that color image, where the target group of hand images is any group of images in the F groups of hand images, and to recognize, according to the depth image included in the target group of hand images, the positions of the hand joint points of the hand in a second position area of the depth image to obtain the group of hand joint point spatial positions corresponding to the target group of hand images, where the second position area is the area in the depth image corresponding to the first position area, and the depth image and the color image are images obtained by simultaneously shooting the same scene.
  • the recognition unit is configured to: detect, according to the color image included in a target group of hand images, a first position area where the hand is located in that color image, where the target group of hand images is any group of images in the F groups of hand images; recognize, according to the color image, the positions of the hand joint points of the hand in the first position area to obtain the spatial positions of a first group of hand joint points; recognize, according to the depth image included in the target group of hand images, the positions of the hand joint points of the hand in a second position area of the depth image to obtain the spatial positions of a second group of hand joint points, where the second position area is the area in the depth image corresponding to the first position area, and the depth image and the color image are images obtained by synchronously shooting the same scene; and fuse the spatial positions of the first group of hand joint points and the second group of hand joint points to obtain the group of hand joint point spatial positions corresponding to the target group of hand images.
  • the acquisition unit includes: a color sensor, configured to shoot the same scene to obtain an original color image; a depth sensor, configured to shoot the same scene to obtain an original depth image; an alignment subunit, configured to spatially align the original color image and the original depth image; and a segmentation subunit, configured to separately segment the aligned original color image and the aligned original depth image to obtain the target group of hand images.
  • a third aspect of the embodiments of the present application provides a computer-readable storage medium that stores a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to perform the method described in the first aspect or any optional implementation manner thereof.
  • a fourth aspect of an embodiment of the present application provides a terminal device, where the terminal device includes a processor and a memory; the memory is used to store code, and the processor reads the code stored in the memory to perform the method provided in the first aspect.
  • a fifth aspect of an embodiment of the present application provides a computer program product, which, when the computer program product runs on a computer, causes the computer to perform part or all of the steps of any one of the methods of the first aspect.
  • FIG. 1 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a terminal device according to an embodiment of this application.
  • FIG. 3 is a schematic diagram of a logical structure of a terminal 300 provided by an embodiment of this application;
  • FIG. 4 is a schematic diagram of a logical structure of an acquisition unit provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of determining a fist gesture from the spatial position of a group of hand joint points provided by an embodiment of the present application
  • FIG. 6 is a schematic diagram of an open gesture determined by the spatial position of a group of hand joint points provided by an embodiment of the present application
  • FIG. 7 is a flowchart of a gesture-based manipulation method provided by an embodiment of the present application.
  • FIG. 8 is a flowchart of another gesture-based manipulation method according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a bullet released by an open gesture provided by an embodiment of the present application.
  • FIG. 10 is a flowchart of another gesture-based manipulation method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a screen displayed by a terminal device provided by an embodiment of this application.
  • FIG. 12 is a schematic diagram of a hand movement process provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the relationship between the vibration intensity and the distance from the hand to the terminal device according to an embodiment of the present application
  • FIG. 14 is a schematic diagram of a hardware structure of a terminal device provided by an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a terminal device 100 according to an embodiment of the present application.
  • the terminal device 100 may be, but not limited to, a mobile phone, a tablet computer, a laptop computer, a smart watch, a TV, AR glasses, VR glasses, and other electronic devices with display screens.
  • the terminal device 100 may support multiple applications, such as one or more of the following: drawing applications, word processing applications, website browsing applications, spreadsheet applications, office software applications, game applications, Phone apps, video conferencing apps, email apps, instant messaging apps, health management apps, photo management apps, digital camera apps, digital video camera apps, vibration management apps, digital music player apps Programs and digital video player applications.
  • Each application program executed on the terminal device 100 optionally obtains a user input instruction through at least one hardware interface device, such as but not limited to a touch display screen 136, a depth camera 156, and a color camera 158.
  • the terminal device 100 may include a memory 108 (which may include one or more computer-readable storage media) and one or more processing units 120 (including at least one of, but not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), a digital signal processor (DSP), and a field-programmable gate array (FPGA)).
  • the terminal device 100 may also include at least one of a memory controller 104, a peripheral device interface 106, an RF circuit 126, an audio circuit 128, a speaker 130, a touch display screen 136, a microphone 132, an input/output (I/O) subsystem 134, other input or control devices 116, and an external port 156.
  • the terminal device 100 may also include one or more optical sensors 142.
  • the terminal device 100 may further include one or more intensity sensors 146 for detecting the intensity of touches on the touch display screen 136, where "intensity" refers to the pressure or force of a touch (for example, a finger touch) on the touch display screen 136.
  • the terminal device 100 may also include a display screen that does not have a function of sensing the user's touch, thereby replacing the touch display screen 136.
  • the terminal device 100 may further include a depth camera 156 and a color camera 158.
  • the depth camera 156 is used to collect depth images.
  • the color camera 158 is used to collect color images, such as RGB images.
  • the depth camera 156 and the color camera 158 can simultaneously capture images of the same scene under the control of one or more processing units in the terminal device 100.
  • the terminal device 100 may further include a vibration circuit 160 for providing one or more vibration modes, so that the terminal device 100 achieves different vibration strengths or vibration effects.
  • the vibration circuit 160 may include devices such as a vibration motor.
  • terminal device 100 is only an example, and the terminal device 100 may have more or fewer components than shown, optionally combining two or more components.
  • the various components shown in FIG. 1 are implemented in hardware, software, or a combination of both hardware and software, and may also include at least one of an integrated circuit for signal processing and an integrated circuit for application programs.
  • the memory 108 may include high-speed random access memory, and optionally also non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Access to the memory 108 by other components of the terminal device 100 (for example, the CPU 120 and peripheral devices 118) is optionally controlled by the memory controller 104.
  • the peripheral device interface 106 may be used to couple the input peripheral device and the output peripheral device of the terminal device 100 to the processing unit 102 and the memory 108.
  • the one or more processing units 102 run or execute various software programs and/or instruction sets stored in the memory 108 to perform various functions of the device 100 and to process data.
  • the peripheral device interface 106, the processing unit 102 and the memory controller 104 may be implemented on a single chip such as the chip 104.
  • the peripheral device interface 106, the processing unit 102, and the memory controller 104 may also be implemented on separate chips.
  • a radio frequency (RF) circuit 126 is used to receive and transmit RF signals, also called electromagnetic signals.
  • the RF circuit 126 converts an electrical signal into an electromagnetic signal or converts an electromagnetic signal into an electrical signal, and communicates with a communication network and other communication devices via the electromagnetic signal.
  • the RF circuit 126 may include circuits for performing the above functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec chipset, and so on.
  • the RF circuit 126 may communicate with a network and other devices through wireless communication, and the network may be, for example, the Internet (also known as the World Wide Web (WWW)), intranet, wireless local area network (LAN), or metropolitan area network (MAN).
  • Wireless communication can use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), High-Speed Downlink Packet Access (HSDPA), High-Speed Uplink Packet Access (HSUPA), Evolution-Data Optimized (EV-DO), HSPA, HSPA+, Dual-Cell HSPA (DC-HSDPA), Long-Term Evolution (LTE), Near-Field Communication (NFC), Wideband Code Division Multiple Access (WCDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (for example, IEEE 802.11a, IEEE 802.11b, or IEEE 802.11g), Voice over Internet Protocol (VoIP), Wi-MAX, email protocols (for example, Internet Message Access Protocol (IMAP) and/or Post Office Protocol (POP)), instant messaging (for example, Extensible Messaging and Presence Protocol (XMPP), the extended Session Initiation Protocol (SIMPLE), Instant Messaging and Presence Service (IMPS)), and Short Message Service (SMS).
  • the audio circuit 128, the speaker 130, and the microphone 132 provide an audio interface between the user and the device 100.
  • the audio circuit 128 receives audio data from the peripheral device interface 118, converts the audio data into electrical signals, and transmits the electrical signals to the speaker 130.
  • the speaker 130 converts the electrical signal into sound waves audible to the human ear.
  • the audio circuit 128 also receives electrical signals obtained by the microphone 132 converting sound waves.
  • the audio circuit 128 converts the electrical signal into audio data, and transmits the audio data to the peripheral device interface 106 for processing.
  • the audio data may be transmitted by the peripheral device interface 106 to the memory 108, the processing unit 102, or the RF circuit 126.
  • the audio circuit 128 may also include a headset jack.
  • the headset jack provides an interface between the audio circuit 128 and a removable audio input / output peripheral device.
  • the peripheral device may be an output-only headset, or a headset with both output (for example, a single-ear or dual-ear headphone) and input (for example, a microphone).
  • the I / O subsystem 134 couples input / output peripheral devices on the terminal device 100 such as the touch screen 136 and other input control devices 152 to the peripheral device interface 106.
  • I / O subsystem 134 may include display controller 134, optical sensor controller 140, intensity sensor controller 144, or other input controller 154 for other input control devices 116.
  • the other input controller 154 receives electrical signals from other input control devices 116 or sends electrical signals to other input control devices 116.
  • Other input control devices 116 optionally include physical buttons (eg, push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click dials, and so on.
  • Other input controllers 154 may also be optionally coupled to any of the following: a keyboard, an infrared port, a USB port, and a pointing device (such as a mouse).
  • the physical buttons may also include volume up or volume down buttons for volume control of the speaker 130, headphones, or headset.
  • the physical button may also include a push button for turning on and off the terminal device 100 and locking the terminal device 100.
  • the touch display 136 provides an input interface and an output interface between the terminal device 100 and the user.
  • the display controller 134 receives the electric signal from the touch display screen 136 or sends the electric signal to the touch screen 112.
  • the touch display 136 displays visual output to the user.
  • the visual output optionally includes graphics, text, icons, dynamic pictures, video, and any combination thereof.
  • the touch display screen 136 may have a sensor or sensor group that receives input from the user based on haptic and/or tactile contact.
  • the touch display screen 136 and the display controller 134 (together with any associated modules or instruction sets in the memory 108) detect touches on the touch display screen 136 (and any movement or interruption of the touch), and convert the detected touches into interaction with user interface objects (for example, one or more virtual buttons, icons, web pages, graphics, or images) displayed on the touch display screen 136.
  • the touch point between the touch display screen 136 and the user may correspond to the user's finger or the stylus.
  • the touch display 136 may use LCD (liquid crystal display) technology, LPD (light emitting polymer display) technology, or LED (light emitting diode) technology.
  • the touch display 136 and the display controller 134 may use any of a variety of touch sensing technologies that are now known or will be developed in the future, including, but not limited to, capacitive, resistive, infrared Or surface acoustic wave touch sensing technology. In the specific implementation process, projected mutual capacitance sensing technology can be used.
  • the touch display 136 may have a video resolution of more than 100dpi or other video resolutions.
  • the user optionally uses any suitable object or attachment such as a stylus, finger, etc. to touch the touch display 136.
  • the user interface may be designed to interact with the user based on finger touches and gestures, which may not be as accurate as stylus-based input due to the large touch area of the finger on the touch display 136.
  • the terminal device 100 translates the rough finger-based input into a precise pointer / cursor position or command to perform the action desired by the user.
  • the terminal device 100 converts the user's gesture or hand motion into a control operation for manipulating a virtual object or other operable object displayed on the touch screen 136.
  • the terminal device 100 may further include a touch pad for activating or deactivating a specific function through user's touch.
  • the touchpad area and the touch display screen 136 area are different areas, and the two areas may or may not be adjacent. The touchpad does not display visual output.
  • the terminal device 100 may also include a power system 138 for powering various components.
  • the power system 138 may include a power management system, one or more power sources (for example, batteries or alternating current (AC)), a recharging system, a power failure detection circuit, a power converter or inverter, power status indicators (for example, light-emitting diodes (LEDs)), and any other components associated with the generation, management, and distribution of power.
  • the power system may further include a wireless charging receiver for receiving electrical energy through wireless charging, so as to charge the terminal device 100.
  • the terminal device 100 may also include one or more optical sensors 142 coupled to the optical sensor controller 140 in the I / O subsystem 134.
  • the optical sensor 142 may include a charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the optical sensor 142 receives light projected through one or more lenses from the environment, and converts the light into data representing an image.
  • the terminal device 100 may also include a touch intensity sensor 146 coupled to the intensity sensor controller 144 in the I / O subsystem 134.
  • the touch intensity sensor 146 may include one or more capacitive force sensors, power sensors, piezoelectric sensors, optical force sensors, or other intensity sensors.
  • the touch intensity sensor 146 is used to receive touch intensity information from the environment.
  • the terminal device 100 may also include one or more proximity sensors 148 coupled to the peripheral device interface 106.
  • the proximity sensor 148 is coupled to the input controller 160 in the I / O subsystem 134.
  • when the terminal device 100 is placed near the user's ear (for example, when the user is making a phone call), the proximity sensor turns off and disables the touch display screen 136.
  • the terminal device 100 may also include one or more accelerometers 150 coupled to the peripheral device interface 106.
  • the accelerometer 150 is optionally coupled to the input controller 160 in the I / O subsystem 134.
  • the terminal device 100 may include a GPS (or GLONASS or other global navigation system) receiver in addition to one or more accelerometers 150 for acquiring location information about the device 100.
  • the memory 108 may include an operating system 110 and at least one of the following modules: a communication module (or instruction set) 112, a touch/motion module (or instruction set) 114, a graphics module (or instruction set) 116, a telephone module 118, a recording module 120, a video and music player module 122, and an online audio/video module 124. The above modules are software code, and the processing unit 102 reads the corresponding code in the memory 108 to implement the functions of the modules.
  • the operating system 110 (for example, Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, an embedded operating system such as VxWorks, Android, iOS, Windows Phone, Symbian, BlackBerry OS, or Windows Mobile) includes various software components and/or drivers for controlling and managing general system tasks (for example, memory management, storage device control, and power management).
  • the communication module 112 is used to communicate with other devices through one or more external ports 156 (for example, Universal Serial Bus (USB)), and also includes various software components for processing data received by the RF circuit 126 and/or the external port 156.
  • the external port 156 may be a charging interface connected to the power system 138.
  • the charging interface is used to connect to the charging line to obtain external power through the charging line.
  • the external port 156 may also be a data interface. It is used to connect with the data line to obtain external data through the data line.
  • the external port 156 may have the functions of a data interface and a charging interface.
  • the data line and the charging line may also be the same line.
  • the touch / motion module 114 may be used to detect touches with the touch display 136 (in conjunction with the display controller 134) and other touch devices (eg, touch pad).
  • the touch/motion module 114 may include various software components for performing operations related to touch detection, such as determining whether a touch has occurred (for example, detecting a finger-down event), determining the intensity of the touch (for example, the pressure or force of the touch), determining whether the touch is moving and tracking the movement across the surface of the touch display screen 136 (for example, detecting one or more finger-drag events), and determining whether the touch has stopped (for example, detecting a finger-lift event or a touch interruption).
  • the touch / motion module 114 receives touch data from the surface of the touch display 136.
  • Determining the movement of the touch point may include determining the speed (magnitude), velocity (magnitude and direction), or acceleration (a change in magnitude and/or direction) of the touch point; the movement of the touch point is represented by a series of touch data. These operations can be applied to a single touch (for example, a one-finger touch) or to multiple simultaneous touches (for example, "multi-touch"/multi-finger touches).
  • the touch / motion module 114 and the display controller 134 detect touches on the touchpad.
  • the touch / motion module 114 may use a set of one or more intensity thresholds to determine whether the operation has been performed by the user (e.g., to determine whether the user has "clicked" the icon).
  • the touch / motion module 114 may detect the user's gesture input. Different gestures on the surface of the touch display screen 136 have different touch patterns (eg, different movements or intensities of detected touches). Therefore, the gesture can be detected by detecting a specific touch pattern. For example, detecting a finger tap gesture includes detecting a finger press event, and then detecting a finger lift (lift off) event at the same position (or substantially the same position) as the finger press event (eg, at the icon position) . As another example, detecting a finger swipe gesture on the surface of the touch display screen 136 includes detecting a finger press event, then detecting one or more finger drag events, and then detecting a finger lift (lift off) event.
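  • A toy sketch of the touch-pattern idea above, classifying an event sequence as a tap or a swipe; the event dictionary format and the 10-pixel tolerance are assumptions made for illustration.

```python
def classify_touch_gesture(events, move_tolerance_px=10):
    """Classify a touch-event sequence as a tap or a swipe following the pattern
    above: a tap is a finger-down followed by a finger-up at roughly the same
    position; drag events in between indicate a swipe. The event format and the
    10-pixel tolerance are assumptions."""
    downs = [e for e in events if e["type"] == "down"]
    ups = [e for e in events if e["type"] == "up"]
    moves = [e for e in events if e["type"] == "move"]
    if not downs or not ups:
        return "incomplete"
    dx = ups[-1]["x"] - downs[0]["x"]
    dy = ups[-1]["y"] - downs[0]["y"]
    if not moves and abs(dx) <= move_tolerance_px and abs(dy) <= move_tolerance_px:
        return "tap"
    return "swipe"
```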
  • the graphics module 116 may include various software components for rendering and displaying graphics on the touch display screen 136 or another display, including components for changing the visual properties (for example, brightness, transparency, saturation, contrast, texture, or other visual characteristics) of the displayed graphics.
  • graphics includes any objects that can be displayed to the user, such as but not limited to text, web pages, icons (such as user interface objects including soft keys), digital images, videos, animations, and so on.
  • the graphics module 116 may store data representing graphics to be displayed. Each graphic can be assigned a corresponding code.
  • the graphics module 116 receives one or more codes specifying the graphics to be displayed, together with coordinate data and other graphic attribute data if necessary, and then generates corresponding image data to output to the display controller 134 for display on the touch display screen 136.
  • the telephone module 118 can be used to make calls, answer incoming calls, conduct conversations, and disconnect or hang up when a conversation ends.
  • wireless communication can use any of a variety of communication standards, protocols, and technologies.
  • the recording module 120 can be used for recording, for handling user interactions during the recording process such as start, pause, resume, and finish, and for storing the recorded audio data.
  • the video and music playback module 122 includes executable files that allow the user to acquire and play audio/video data and other audio/video files stored in one or more file formats (such as MP3 or AAC files), and to display, present, or otherwise play back audio/video (for example, on the touch screen or on an external display connected via the external port 156).
  • the device 100 may optionally include an audio / video player.
  • the video and music playback module 122 may include a video playback module and a music playback module.
  • the online audio/video module 124 is used to access, receive (for example, by streaming and/or downloading), play back (for example, on the touch screen or on an external display connected via the external port), and otherwise manage online audio/video data in one or more file formats (such as H.264/H.265, AMR-WB, or EVS).
  • the online audio / video module 124 may include an online audio module and an online video module.
  • the memory 108 may also include a video conference module, an email client module, an instant messaging module, a camera module for still or video images, a word processing application module, an image editing module, a drawing module, a JAVA enabling module, an encryption module, a digital rights management module, a voice recognition module, or a voice replication module.
  • each of the above modules (that is, instruction sets) and application programs can be used to execute the method described in this application, and can also serve as a module corresponding to the method described in this application.
  • the memory 108 optionally stores a subset of the aforementioned modules.
  • the above-mentioned modules and application programs in the memory can also be implemented by means of integrated circuits or a combination of software and hardware.
  • the memory 108 optionally stores additional modules and data structures not described above.
  • FIG. 2 is a schematic structural diagram of a terminal device 100 according to an embodiment of the present application.
  • the terminal device may include a touch display screen 136, a depth camera 156 and a color camera 158.
  • the depth camera 156 is used to collect depth images.
  • the color camera 158 is used to collect color images, such as RGB images.
  • the touch display screen 136 is used to display a target screen including a virtual object for manipulation through a detected gesture or a detected hand motion.
  • the depth camera 156, the color camera 158, and the touch display 136 are located on the same side of the terminal device 100. When the user is viewing the touch display screen 136, the depth camera 156 and the color camera 158 can capture an image of the user's hand.
  • the depth camera 156 is adjacent to the color camera 158, and the scene shot by the depth camera 156 and the color camera 158 may be regarded as the same scene.
  • FIG. 3 is a schematic diagram of a logical structure of a terminal 300 provided by an embodiment of the present application.
  • the terminal 300 includes a display unit 302, an acquisition unit 304, an identification unit 306, and a processing unit 308.
  • the unit in the terminal 300 may be implemented by software programming, or may be implemented by a hardware circuit, or some units may be implemented by software programming and another part of the unit may be implemented by a hardware circuit.
  • the terminal 300 may be the terminal device 100 in FIG. 1.
  • the screen of the terminal 300 may be the touch display screen 136.
  • the functions of the units in the terminal 300 are introduced below.
  • the display unit 302 is used to display a target screen including a virtual object for manipulation through the detected gesture or the detected hand motion.
  • the obtaining unit 304 is used to obtain the group F hand image.
  • the recognition unit 306 is used to sequentially recognize the positions of the hand joint points in the F groups of hand images to obtain F groups of hand joint point spatial positions, where each group of hand joint point spatial positions corresponds to the hand joint point spatial positions of one group of hand images, and F is an integer greater than 0.
  • the spatial positions of the hand joint points corresponding to a group of hand images may include the two-dimensional position and the depth of each hand joint point in that group of hand images, which together form a three-dimensional position.
  • the spatial position of a group of hand joint points includes the three-dimensional positions of 21 joint points of the hand.
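  • as an illustration only (not part of the original disclosure), a group of hand joint point spatial positions could be represented as 21 (x, y, depth) triples; the following minimal Python sketch assumes NumPy and the 21-joint hand model mentioned above.

```python
# Hypothetical sketch: one group of hand joint spatial positions as 21
# (x, y, depth) triples. NumPy and the 21-joint layout are assumptions.
import numpy as np

NUM_JOINTS = 21  # joint count described in the text

def make_joint_group(xy, depth):
    """Combine 2D joint coordinates with per-joint depth into 3D positions.

    xy:    array-like of shape (21, 2), pixel coordinates of each joint
    depth: array-like of shape (21,), depth value of each joint
    returns an array of shape (21, 3)
    """
    xy = np.asarray(xy, dtype=np.float32)
    d = np.asarray(depth, dtype=np.float32).reshape(NUM_JOINTS, 1)
    return np.concatenate([xy, d], axis=1)
```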
  • a group of hand images may be one frame of depth image, one frame of depth image together with one frame of color image, or one frame of color image.
  • the processing unit 308 is configured to perform a control operation corresponding to the spatial position of the joint points of the F groups of hands, and the control operation is used to adjust the position and / or shape of the virtual object in the target picture.
  • the display unit 302 is also used to display the adjusted target screen.
  • the display unit 302 can display a dynamic screen obtained by changing the position and / or form of the virtual object in the target screen, can display an image collected by a rear camera or a front camera, can display a virtual hand image corresponding to the user's hand, and can also display other images.
  • the processing unit 308 may determine the control operation corresponding to the spatial position of the hand joint point of the group F, and adjust the screen displayed by the display unit 302 according to the control operation. It can be understood that the processing unit 308 may control the display unit 302 to display different screens according to different control operations.
  • the display unit 302 may be the touch display 136 in FIG. 1 or a non-touch display, which is not limited in the embodiments of the present application.
  • the user can manipulate the virtual objects in the screen displayed by the display unit 302 through gestures or hand movements.
  • the terminal 300 may recognize the user's gesture or hand motion, and then convert the recognized gesture or hand motion into a control operation for adjusting the screen displayed by the display unit 302.
  • the function of the obtaining unit 304 can be realized by the depth camera 156 and the color camera 158 in FIG. 1.
  • the depth camera 156 and the color camera 158 can simultaneously capture images, and once captured, a set of hand images can be obtained.
  • both the recognition unit 306 and the processing unit 308 may be implemented by the processing unit 120 in FIG. 1.
  • a set of hand images may include a depth image and a color image (such as an RGB image) obtained by the depth camera and the color camera simultaneously shooting the same scene, or only a depth image captured by the depth camera, or only a color image captured by the color camera.
  • the simultaneous shooting of the same scene by the depth camera and the color camera means that the time interval between the depth camera and the color camera shooting the same scene is less than the time threshold.
  • the time threshold may be 1 ms, 5 ms, 10 ms, etc.
  • the color image and the depth image included in any group of hand images are images obtained by shooting the same scene at the same time.
  • the group of hand images can be transmitted to the recognition unit 306; the recognition unit 306 recognizes the positions of the hand joint points in the group of hand images to obtain the spatial positions of a group of hand joint points and transmits them to the processing unit 308; the processing unit 308 can obtain the spatial positions of the group of hand joint points, determine the gesture type corresponding to them, and then execute the control operation corresponding to that gesture type. That is to say, the recognition unit 306 can obtain a set of hand images and recognize the positions of the hand joint points in that set, and the processing unit 308 then determines the gesture type corresponding to that set of hand images.
  • the recognition unit 306 and the processing unit 308 may be different units, or may be the same unit (the processing unit 120 in FIG. 1). It can be understood that in the embodiment of the present application, the recognition unit 306 may use a set of hand images (including a frame of depth image) to obtain the spatial positions of a set of hand joint points, and the processing unit 308 then determines the corresponding gesture type and the spatial location of the user's hand.
  • these sets of hand images can be transmitted to the recognition unit 306; the recognition unit 306 recognizes the positions of the hand joint points in these sets of hand images to obtain the spatial positions of multiple sets of hand joint points.
  • multiple groups means two or more groups. It can be understood that the acquisition unit 304 may send one set of hand images to the recognition unit 306 each time, that is, send a set of hand images as soon as it is obtained; or it may send multiple sets of hand images at a time, that is, after acquiring multiple sets of hand images, send these sets to the recognition unit 306 together.
  • these sets of hand images can be transmitted to the recognition unit 306; the recognition unit 306 recognizes the positions of the hand joint points in these sets of hand images to obtain the spatial positions of multiple sets of hand joint points. Each group of hand images captured by the color camera 158 is one frame of color image.
  • the recognition unit 306 can use multiple frames of color images to recognize the positions of the hand joint points in these frames and obtain the spatial positions of multiple sets of hand joint points.
  • the recognition unit 306 recognizes the positions of the hand joint points in the group of hand images, and can obtain the spatial positions (three-dimensional positions) of the group of hand joint points.
  • the recognition unit 306 recognizes the positions of hand joint points in at least two sets of hand images, and can obtain the spatial positions (three-dimensional positions) of at least one set of hand joint points.
  • the recognition unit 306 may recognize the position of the hand joint point in the group F hand image according to the group F hand image, thereby obtaining the spatial position of the group F hand joint point.
  • F may be equal to 1.
  • F is at least 2.
  • any natural gesture can be expressed as the spatial position of a set of hand joint points, and any type of hand motion can be expressed as the spatial position of multiple sets of hand joint points.
  • any kind of natural gesture can be determined by the spatial position of a group of hand joint points, and any kind of hand movement can be determined by the spatial position of multiple sets of hand joint points.
  • a natural gesture refers to any gesture that the user can make. It can be understood that the terminal 300 may determine a gesture according to the spatial positions of a group of hand joint points, or may determine a gesture sequence or hand motion based on the spatial positions of multiple sets of hand joint points, and then execute the control operation corresponding to that gesture or hand motion.
  • the control operation corresponding to the spatial position of the hand joint point of group F is the control operation corresponding to the gesture or hand movement determined by the spatial position of the hand joint point of group F.
  • the terminal 300 may be preset with a correspondence relationship between the spatial position of a group of hand joint points and control actions, or may be preset with a correspondence relationship between a combination of spatial positions of the group F hand joint points and control actions.
  • the user can manipulate the virtual objects in the screen displayed by the display unit 302 through gestures and hand movements. For example, the position or form of the virtual object in the screen displayed by the display unit 302 is adjusted by gestures or hand movements.
  • the terminal device determines the user's control operation from the spatial positions of the hand joint points, which can cover all natural gestures and hand movements; the operation is more efficient and more natural, thereby improving the user's human-computer interaction experience.
  • the acquisition unit 304 includes: a color sensor 3042, a depth sensor 3044, an alignment subunit 3046, and a segmentation subunit 3048.
  • the sub-units in the acquisition unit 304 may be implemented by software, by hardware circuits, or partly by software and partly by hardware circuits.
  • the acquisition unit 304 in FIG. 4 may not include the color sensor 3042. In other words, the color sensor is optional, not necessary.
  • the color sensor 3042 is used to capture original color images.
  • the depth sensor 3044 is used to capture the original depth image.
  • the alignment subunit 3046 is used for spatial alignment of the original color image and the original depth image.
  • the segmentation subunit 3048 is configured to perform hand segmentation on the original color image and the original depth image to obtain a set of hand images.
  • the functions of the alignment subunit 3046 and the segmentation subunit 3048 can be implemented by the processing unit 120 in FIG. 1.
  • the color sensor 3042 may be a sensor in the color camera 158 in FIG. 1, and the depth sensor 3044 may be a sensor in the depth camera 156 in FIG. 1.
  • the color sensor 3042 and the depth sensor 3044 can simultaneously capture images, the color sensor obtains the original color image, and the depth sensor obtains the original depth image.
  • the resolution of the color sensor 3042 and the depth sensor 3044 may be the same or different. That is to say, the resolution of the original depth image and the original color image may be the same or different.
  • the alignment subunit 3046 may use an image scaling algorithm (such as a bilinear interpolation method) to adjust the two images to the same size. For example, the resolution of the color image is 800 * 1280, and the resolution of the depth image is 400 * 640.
  • the alignment subunit 3046 uses bilinear interpolation to reduce the color image to 400 * 640.
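  • a minimal sketch of this resizing step, assuming OpenCV (cv2) is used and that the color image is simply scaled down to the depth image's resolution with bilinear interpolation:

```python
# Illustrative only: shrink the color image to the depth image's resolution
# using bilinear interpolation. OpenCV (cv2) is an assumed choice of library.
import cv2

def match_resolution(color_img, depth_img):
    h, w = depth_img.shape[:2]          # target size taken from the depth image
    # cv2.resize expects the target size as (width, height)
    return cv2.resize(color_img, (w, h), interpolation=cv2.INTER_LINEAR)
```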
  • the coordinate origin of the original color image is the color camera (RGB camera), while that of the original depth image is the infrared camera, so there is a corresponding error between the two. Therefore, the original color image and the original depth image need to be spatially aligned.
  • the spatial alignment of the original color image and the original depth image may keep the original color image unchanged and adjust the original depth image so that it is spatially aligned with the original color image; or it may keep the original depth image unchanged and adjust the original color image so that the two are spatially aligned.
  • the alignment subunit 3046 may rotate and translate the original depth image according to the calibrated rotation-translation matrix between the depth sensor and the color sensor, so as to align it with the original color image.
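  • the following simplified Python sketch illustrates this kind of depth-to-color alignment; the intrinsic matrices K_depth and K_color, the rotation-translation pair (R, t), and the assumption that both images already share the same size are illustrative, and real pipelines also handle occlusion and invalid depth values.

```python
# Simplified alignment sketch: back-project each depth pixel to 3D, apply the
# calibrated rotation-translation, and re-project into the color image.
# K_depth, K_color, R, t and the equal image sizes are assumptions.
import numpy as np

def align_depth_to_color(depth, K_depth, K_color, R, t):
    h, w = depth.shape
    aligned = np.zeros_like(depth)
    fx, fy = K_depth[0, 0], K_depth[1, 1]
    cx, cy = K_depth[0, 2], K_depth[1, 2]
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z == 0:
                continue                      # no valid depth at this pixel
            p = np.array([(u - cx) * z / fx,  # 3D point in the depth camera frame
                          (v - cy) * z / fy,
                          z])
            q = R @ p + t                     # 3D point in the color camera frame
            uv = K_color @ q                  # project into the color image plane
            uc, vc = int(round(uv[0] / uv[2])), int(round(uv[1] / uv[2]))
            if 0 <= uc < w and 0 <= vc < h:
                aligned[vc, uc] = q[2]
    return aligned
```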
  • the alignment subunit 3046 or the processing unit 308 first adjusts the two images to the same size and then performs spatial alignment.
  • the alignment subunit 3046 directly performs spatial alignment on the two images.
  • the segmentation subunit 3048 is optional, and the role of the segmentation subunit 3048 is to extract the image area where the hand is located from the original color image and extract the image area where the hand is located from the original depth image.
  • the obtaining unit 304 may not include the segmentation subunit 3048. That is to say, the acquiring unit 304 can send the spatially aligned original color image and original depth image as the target group of hand images to the recognition unit 306.
  • spatially aligning the original color image and the original depth image can ensure that the positions of the hand joint points in the original color image are consistent with those in the original depth image, and the implementation is simple.
  • the terminal 300 further includes: a detection unit 310, configured to detect the location area of the hand based on at least one of a color image and a depth image included in the target group of hand images; the recognition unit 306 is used to recognize the positions of the hand joint points of the hand in that location area according to at least one of the color image and the depth image, to obtain the spatial positions of a group of hand joint points.
  • the terminal 300 further includes:
  • the detecting unit 310 is configured to detect the first position area where the hand is located in the color image included in the hand image of the target group.
  • the recognition unit 306 is used to recognize the positions of the hand joint points of the hand in a second position area in the depth image included in the target group of hand images, to obtain the spatial positions of the group of hand joint points, where the second position area is the area in the depth image corresponding to the first position area. Considering that the resolutions of the depth image and the color image may be different, the second position area and the first position area may be the same or may be related by a certain ratio.
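  • a small sketch of mapping the first position area (a bounding box detected in the color image) to the second position area in the depth image; the simple per-axis scale factor is an assumption for illustration.

```python
# Illustrative only: scale a (x, y, w, h) bounding box from color-image pixels
# to depth-image pixels when the two resolutions differ by a constant ratio.
def map_box_to_depth(box, color_shape, depth_shape):
    x, y, w, h = box
    sy = depth_shape[0] / color_shape[0]   # vertical scale (rows)
    sx = depth_shape[1] / color_shape[1]   # horizontal scale (columns)
    return (int(x * sx), int(y * sy), int(w * sx), int(h * sy))

# e.g. a box found in an 800x1280 color image mapped into a 400x640 depth image
second_area = map_box_to_depth((320, 200, 160, 240), (800, 1280), (400, 640))
```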
  • the target group hand image includes the color image and the depth image, and the target group hand image is any group image of the above-mentioned group F hand image.
  • the depth image and the color image are images obtained by simultaneously shooting the same scene.
  • the detection unit 310 may be the processing unit 120 in FIG. 1.
  • the detection unit 310 may use a trained detection network to detect the location area of the hand in the color image, and input the detected first location area (hand position result) to the recognition unit 306.
  • the first position area may be represented by a set of coordinates.
  • the detection network is a network obtained by the processing unit 308 using deep learning methods and training using a large number of color image (RGB image) samples.
  • the recognition unit 306 may recognize the position of the hand joint point in the depth image by using the trained recognition network to obtain the spatial position of a group of hand joint points.
  • the recognition unit 306 determines the second location area in the depth image corresponding to the first location area, and uses the trained recognition network to perform hand joint point regression on the image of the second location area to obtain the spatial positions of the group of hand joint points.
  • the detection unit 310 determines that the first location area corresponds to the second location area in the depth image, and sends the image of the second location area to the recognition unit 306.
  • the recognition unit 306 uses the trained recognition network to perform hand joint point regression on the image of the second position area to obtain the spatial position of the group of hand joint points.
  • the recognition network is a network obtained by the processing unit 308 using deep learning methods and training with a large number of depth image samples. In practical applications, the recognition unit 306 recognizes a depth image to obtain the three-dimensional spatial positions of 21 hand joint points.
  • the detection unit 310 is configured to detect the first position area where the hand is located in the color image included in the hand image of the target group.
  • the recognition unit 306 is used to recognize the positions of the hand joint points of the hand in the first position area to obtain the spatial positions of a first group of hand joint points; recognize the positions of the hand joint points of the hand in a second position area in the depth image included in the target group of hand images to obtain the spatial positions of a second group of hand joint points, the second position area being the area in the depth image corresponding to the first position area; and merge the spatial positions of the first group of hand joint points with those of the second group of hand joint points to obtain the spatial positions of the group of hand joint points corresponding to the target group of hand images.
  • the target group hand image includes the color image and the depth image, and the target group hand image is any group image of the above-mentioned group F hand image.
  • the spatial positions of the hand joint points obtained from the color image can be the pixel coordinates of each hand joint point (two-dimensional coordinates), and the spatial positions of the hand joint points obtained from the depth image can be the pixel coordinates of each hand joint point together with its depth in the scene (three-dimensional coordinates).
  • the recognition unit 306 can separately recognize the color image and the depth image, and merge the two recognition results (the spatial positions of two sets of hand joint points) to obtain the spatial positions of one set of hand joint points.
  • the spatial positions of the first group of hand joint points include 21 two-dimensional coordinates, which sequentially represent the positions of the first to the 21st hand joint points; the spatial positions of the second group of hand joint points include 21 three-dimensional coordinates, which sequentially represent the three-dimensional positions of the first to the 21st hand joint points.
  • the spatial position of the first group of hand joint points and the spatial position of the second group of hand joint points can be the two-dimensional coordinates of the spatial position of the two sets of hand joint points.
  • the recognition unit 306 determines the second location area in the depth image corresponding to the first location area, and uses the trained recognition network to perform hand joint point regression on the image of the second location area to obtain the spatial positions of 21 hand joint points and 21 confidence levels.
  • the 21 confidence levels correspond to the spatial positions of the 21 hand joint points.
  • the recognition unit 306 uses the trained recognition network to perform hand joint point regression on the image of the first position area, to obtain the spatial position of 21 hand joint points and 21 confidence levels .
  • the 21 confidence levels correspond to the spatial positions of the 21 hand joint points.
  • each hand joint point corresponds to two spatial positions, one is the three-dimensional spatial position obtained by the recognition unit 306 recognizing the depth image, and the other is the two-dimensional spatial position obtained by the recognition unit 306 recognizing the color image.
  • the recognition unit 306 may merge the two spatial positions corresponding to the same hand joint point into one spatial position.
  • for example, the two spatial positions corresponding to a hand joint point are (A, B) and (C, D, E); if the confidence of (A, B) is higher than that of (C, D, E), the two spatial positions are merged into (A, B, E); otherwise, they are merged into (C, D, E).
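  • a minimal sketch of this confidence-based merge, directly following the (A, B) / (C, D, E) rule described above:

```python
# Merge the 2D result from the color image with the 3D result from the depth
# image per joint: keep the 2D x, y when its confidence is higher, otherwise
# keep the 3D x, y; the depth value E is used either way.
def merge_joint(pos_2d, conf_2d, pos_3d, conf_3d):
    a, b = pos_2d
    c, d, e = pos_3d
    return (a, b, e) if conf_2d > conf_3d else (c, d, e)

def merge_groups(joints_2d, confs_2d, joints_3d, confs_3d):
    return [merge_joint(p2, c2, p3, c3)
            for p2, c2, p3, c3 in zip(joints_2d, confs_2d, joints_3d, confs_3d)]
```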
  • the spatial position of the hand joint point can be determined more accurately.
  • the detection unit 310 is used to detect the position area of the hand in the depth image; the recognition unit 306 is used to recognize the positions of the hand joint points of the hand in that position area to obtain the spatial positions of a group of hand joint points.
  • the depth image is the target group of hand images, that is, any one group of the above-mentioned F groups of hand images.
  • the detection unit 310 may use a trained detection network to detect the area where the hand is located in the depth image, and input the detected position area (hand position result) to the recognition unit 306.
  • the detection network may be a network obtained by the processing unit 308 using deep learning methods and using a large number of depth image samples for training.
  • the recognition unit 306 may use the trained recognition network to recognize the position of the hand joint in the depth image to obtain the spatial position of a group of hand joints.
  • the trained recognition network is used to perform hand joint point regression on the image of the position area to obtain the spatial positions of the group of hand joint points.
  • the detection unit 310 sends the image of the location area to the recognition unit 306.
  • the recognition unit 306 uses the trained recognition network to perform hand joint point regression on the image of the location area to obtain the spatial position of the group of hand joint points.
  • the depth image can be used to quickly determine the spatial position of the hand joint points in each set of hand images.
  • the recognition unit 306 uses multiple frames of color images to recognize the position of the joint points of the hand to obtain the spatial position of one or more sets of joint points.
  • the terminal 300 may include only the color sensor and not the depth sensor.
  • the processing unit 308 obtains the spatial positions of a group of hand joint points from the recognition unit 306; calculates, according to these spatial positions, the angles between the 4 hand joint points on each finger of the hand; determines the bending state of each finger according to the angles between the 4 hand joint points on that finger; determines, according to the bending state of each finger of the hand, the gesture type corresponding to the spatial positions of the group of hand joint points; and executes the control operation corresponding to that gesture type.
  • the processing unit 308 may determine the bending state of each finger of the hand according to the spatial positions of a group of hand joint points, then determine the gesture type corresponding to the spatial positions of that group of hand joint points by combining the bending states of all fingers, and then execute the control operation corresponding to the gesture type.
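  • a hedged Python sketch of this finger-bending idea; the 21-joint indexing (wrist at index 0, four joints per finger) and the angle threshold are assumptions for illustration, not values taken from the original disclosure.

```python
# Sketch: per finger, sum the angles between consecutive segments of its four
# joint points; a finger far from a straight line is treated as bent, and the
# combination of bent fingers gives a coarse gesture type (fist / open).
import numpy as np

FINGER_JOINTS = {                      # assumed 21-joint layout, wrist = 0
    "thumb": [1, 2, 3, 4], "index": [5, 6, 7, 8], "middle": [9, 10, 11, 12],
    "ring": [13, 14, 15, 16], "pinky": [17, 18, 19, 20],
}

def segment_angle(p0, p1, p2):
    v1, v2 = p1 - p0, p2 - p1
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def finger_is_bent(joints, idx, threshold_deg=60.0):
    a, b, c, d = (joints[i] for i in idx)
    return segment_angle(a, b, c) + segment_angle(b, c, d) > threshold_deg

def classify_gesture(joints):
    """joints: array of shape (21, 3); returns 'fist', 'open' or 'other'."""
    joints = np.asarray(joints, dtype=np.float32)
    bent = [finger_is_bent(joints, idx) for idx in FINGER_JOINTS.values()]
    if all(bent):
        return "fist"
    if not any(bent):
        return "open"
    return "other"
```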
  • FIG. 5 is a schematic diagram of determining a fist gesture by the spatial position of a group of hand joint points provided by an embodiment of the present application, where each dot represents a hand joint point. As shown in FIG. 5, the processing unit 308 can determine the fist gesture according to the spatial position of the set of hand joint points in FIG. 5.
  • FIG. 6 is a schematic diagram of an open gesture determined by the spatial position of a group of hand joint points provided by an embodiment of the present application, where each dot represents a hand joint point.
  • the processing unit 308 can determine the opening gesture according to the spatial position of the set of hand joint points in FIG. 6. After determining the control operation, the processing unit 308 may adjust the screen displayed by the display unit 302 accordingly. For example, the position and / or form of the virtual object in the screen currently displayed by the display unit 302 is adjusted.
  • the processing unit 308 obtains the spatial position of the group F hand joint points from the recognition unit 306; determines the M gesture types corresponding to the spatial position of the group F hand joint points, and executes the control operations corresponding to the M gesture types.
  • F is equal to M, and M is greater than 1.
  • the processing unit 308 can determine a gesture type according to the spatial position of a group of hand joint points.
  • the spatial position of a group of hand joint points may not correspond to any one of the gesture types. Therefore, the number of gesture types determined by the processing unit 308 according to the spatial position of the F group of hand joint points may be less than F.
  • M may be less than F, where M is an integer greater than 0 and F is an integer greater than 1.
  • for example, the transition of the hand from the fist gesture to the open gesture corresponds to the spatial positions of 20 sets of hand joint points, and the processing unit 308 determines only the fist gesture and the open gesture according to the spatial positions of these 20 sets of hand joint points.
  • the processing unit 308 may obtain the spatial position of a group of hand joint points from the recognition unit 306 each time, and determine a gesture type corresponding to the spatial position of the group of hand joint points.
  • the specific implementation of the processing unit 308 for determining a gesture type according to the spatial position of each group of hand joint points is the same as the implementation in the first embodiment.
  • the processing unit 308 may obtain the spatial positions of multiple sets of hand joint points from the recognition unit 306 at a time, and determine one or more gesture types corresponding to the spatial positions of these sets of hand joint points.
  • control operations corresponding to the M gesture types are not related to the change in the gesture types of the M gesture types.
  • control operations corresponding to the M gesture types are independent of the order in which the gesture types of the M gesture types are determined.
  • the gesture type changes of the M gesture types refer to the order in which the gesture types of the M gesture types are determined. Since the hand images of each group in the group F hand images are obtained in a certain order, the determination of the M gesture types also has a certain order.
  • for example, the processing unit 308 sequentially determines the first gesture type, the second gesture type, and the third gesture type, and the three gesture types correspond to the target control operation; the processing unit 308 sequentially determines the second gesture type, the first gesture type, and the third gesture type, and the three gesture types also correspond to the target control operation.
  • control operations corresponding to the M gesture types are related to the change in the gesture types of the M gesture types.
  • the processing unit 308 executing the control operations corresponding to the M gesture types may be determining the control operations corresponding to the M gesture types according to the change of the gesture types of the M gesture types, where M is greater than 1.
  • for example, the processing unit 308 determines the first gesture type, the second gesture type, and the third gesture type in sequence, and determines, according to this change in gesture types, that the three gesture types correspond to the first control operation; the processing unit 308 determines the second gesture type, the first gesture type, and the third gesture type in sequence, and determines, according to this change in gesture types, that the three gesture types correspond to the second control operation, which is different from the first control operation. It can be understood that the processing unit 308 may determine at least two gesture types corresponding to the spatial positions of the F groups of hand joint points, and perform the control operation corresponding to the at least two gesture types according to the change in their gesture types.
  • the processing unit 308 may preset the correspondence between gesture sequences and control operations. After the processing unit 308 determines a gesture type, it can combine that gesture type with one or more gesture types obtained previously to obtain a gesture sequence; if the gesture sequence corresponds to a certain control operation, the control operation is executed; if the gesture sequence does not correspond to any control operation, the gesture type corresponding to the spatial positions of the next group of hand joint points is determined and the previous operation is repeated. It can be seen that the processing unit 308 can acquire the spatial positions of a group of hand joint points each time and determine the gesture type corresponding to them.
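  • a minimal sketch of this gesture-sequence matching, assuming gesture types arrive one at a time; the sequence table and operation names are illustrative assumptions.

```python
# Keep a short history of recognized gesture types and look it up against a
# preset table of gesture sequences; returns the control operation to execute
# or None if the current history matches nothing yet.
from collections import deque

GESTURE_SEQUENCES = {                     # assumed example sequences
    ("fist", "open"): "release_object",   # e.g. fist-to-open releases the bullet
}

class GestureSequenceMatcher:
    def __init__(self, max_len=3):
        self.history = deque(maxlen=max_len)

    def push(self, gesture_type):
        if self.history and self.history[-1] == gesture_type:
            return None                   # ignore repeats of the same gesture
        self.history.append(gesture_type)
        for n in range(len(self.history), 1, -1):   # longest suffix first
            key = tuple(list(self.history)[-n:])
            if key in GESTURE_SEQUENCES:
                self.history.clear()
                return GESTURE_SEQUENCES[key]
        return None
```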
  • the processing unit 308 can obtain the spatial positions of multiple sets of hand joint points each time, and determine a gesture sequence according to the spatial positions of the multiple sets of hand joint points.
  • the user can perform a certain control operation through multiple consecutive gestures. For example, the user can use a continuous gesture (gesture sequence) from a fist gesture to an open gesture to realize a certain control operation on the virtual object.
  • the processing unit 308 obtains the spatial positions of the F groups of hand joint points from the recognition unit 306; determines, according to these spatial positions, the spatial position change of the hand joint points between groups, where F is an integer greater than 1; and adjusts the position of the virtual object in the screen displayed by the display unit 302 according to the spatial position change, and / or adjusts the form of the virtual object in that screen according to the spatial position change.
  • the method of adjusting the position of the virtual object in the screen displayed by the display unit 302 according to the spatial position change is as follows: the movement trajectory of the hand is determined according to the spatial position change; the virtual object is then moved according to the movement trajectory of the hand, with the movement trajectory of the virtual object consistent with that of the hand.
  • the movement trajectory of the virtual object coincides with the movement trajectory of the hand, which means that the movement trajectory of the virtual object is the same in shape and proportional to the movement trajectory of the hand.
  • the user's hand moves 20 cm to the right, and the virtual object moves 5 cm to the right; the user's hand moves 30 cm to the left, and the virtual object moves 7.5 cm to the left.
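  • a short sketch of this proportional mapping; the 0.25 scale factor simply reproduces the 20 cm to 5 cm example above and is otherwise an assumption.

```python
# Move the virtual object along the hand trajectory, scaled down by a fixed
# ratio so the shapes of the two trajectories stay the same.
HAND_TO_OBJECT_SCALE = 0.25   # 20 cm of hand motion -> 5 cm of object motion

def move_virtual_object(object_pos, hand_displacement, scale=HAND_TO_OBJECT_SCALE):
    """object_pos and hand_displacement are (x, y, z) tuples in the same units."""
    return tuple(p + scale * d for p, d in zip(object_pos, hand_displacement))
```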
  • the spatial position of the hand corresponds to the position of the virtual object, that is, the spatial position of the hand is mapped to the spatial position of the virtual object, so that the user feels that the hand directly contacts the virtual object.
  • after the processing unit 308 obtains the spatial positions of a group of hand joint points, it can calculate the spatial position change (hand displacement) of these hand joint points relative to the spatial positions of a previously obtained group of hand joint points, and adjust the form of the virtual object in the screen displayed by the display unit 302 according to the spatial position change.
  • the terminal 300 further includes a vibration unit 312 for performing vibration.
  • the vibration intensity of the vibration unit 312 is positively or negatively related to the distance from the hand to the terminal 300.
  • for example, the farther the hand is from the terminal 300, the weaker the vibration intensity of the vibration unit 312.
  • the vibration unit 312 can adjust its vibration intensity according to the distance from the hand to the terminal 300.
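  • a hedged sketch of mapping the hand-to-terminal distance to a vibration intensity; the distance range and the choice of a negative correlation are assumptions for illustration.

```python
# Map distance to an intensity in [0, 1] that weakens as the hand moves away
# from the terminal (a positively correlated variant would invert this).
def vibration_intensity(distance_m, min_d=0.1, max_d=1.0):
    d = min(max(distance_m, min_d), max_d)           # clamp to the valid range
    return 1.0 - (d - min_d) / (max_d - min_d)
```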
  • the method of adjusting the form of the virtual object in the screen displayed by the display unit 302 according to the spatial position change is as follows: the hand movement is determined according to the spatial position change, and the adjustment operation corresponding to the hand movement is performed; this adjustment operation is used to adjust the form of the virtual object in the screen displayed by the display unit 302.
  • the display unit 302 displays the image (virtual hand) corresponding to the user's gesture or hand motion in the screen it displays. Adjusting the form of the virtual object in the screen displayed by the display unit 302 may be adjusting the direction, size, shape, etc. of the virtual object.
  • the screen displayed by the display unit 302 includes the user's hand mapped to the virtual hand in the screen.
  • the user can treat the virtual hand as his own hand and use it to operate the virtual object in the screen accordingly. That is to say, by moving his own hand the user moves the virtual hand, and the movement of the virtual hand is consistent with the movement of the user's hand.
  • the terminal 300 can obtain the three-dimensional spatial positions of the hand joint points from a single frame of image, obtain the spatial position changes of the hand joint points from consecutive multi-frame images, and control the virtual object to change its position and form according to the control operation determined by the spatial position changes.
  • the processing unit 308 obtains the spatial positions of the F groups of hand joint points from the recognition unit 306, determines the M gesture types corresponding to these spatial positions, where F is greater than 1 and M is less than or equal to F and at least 1, and performs control operations according to the spatial positions of the F groups of hand joint points and the M gesture types.
  • the following describes several specific implementation methods for performing control operations according to the spatial position of the hand joint points of the F group and the M gesture types.
  • the recognition unit 306 determines the target gesture type according to the spatial positions of one group of hand joint points among the F groups of hand joint point spatial positions; determines, according to the F groups of hand joint point spatial positions, the spatial position change of the hand joint points between groups; and performs a control operation according to the target gesture type and the spatial position change.
  • the target gesture type is used to adjust the position of the virtual object in the display screen of the display unit 302.
  • the target gesture may be a fist gesture, an open gesture, or the like.
  • when the recognition unit 306 determines the target gesture type according to the spatial positions of a group of hand joint points, and the gesture types determined according to the spatial positions of one or more groups of hand joint points before that group are all the target gesture type, the spatial position change of the hand joint points between groups is determined from the spatial positions of that group of hand joint points and the spatial positions of the one or more groups of hand joint points, and a control operation is performed according to the spatial position change.
  • the recognition unit 306 determines the spatial position change of the hand joint points between groups according to the spatial positions of the F groups of hand joint points; a control operation is performed according to the gesture type changes of the M gesture types and the spatial position change.
  • the processing unit 308 may preset a correspondence between the combination of the gesture type change and the spatial position change and the control operation, and the processing unit 308 may determine the control operation corresponding to the combination of different gesture type changes and the spatial position change according to the corresponding relationship, and then adjust The position and / or form of the virtual object in the screen displayed by the display unit 302.
  • the recognition unit 306 performs a control operation according to the gesture type changes of the M gesture types and the spatial position of the hand joint points in group F.
  • the processing unit 308 may preset a correspondence between combinations of gesture type changes and the spatial positions of the F groups of hand joint points, on the one hand, and control operations, on the other; according to this correspondence, the processing unit 308 may determine the control operation corresponding to a given combination of gesture type change and F groups of hand joint point spatial positions, and then adjust the position and / or form of the virtual object in the screen displayed by the display unit 302.
  • the terminal 300 can recognize complex and continuous motions by combining the gesture types recognized from successive sets of hand images with the spatial position changes of the hand joint points determined from those sets of hand images, and then manipulate the virtual object so that it changes continuously.
  • the terminal 300 can determine various complicated hand movements of the user by combining the gesture type and the spatial position change of the hand joint points to meet the requirements of different application scenarios.
  • the terminal 300 can respond to the user's gesture or hand motion by vibrating or playing specific music, thereby enhancing the user's sense of operation.
  • the terminal 300 in FIG. 3 may further include a vibration unit 312 and an audio unit 314.
  • the vibration unit 312 may be the vibration circuit 160 in FIG. 1.
  • the audio unit 314 may be the audio circuit 128 in FIG. 1.
  • the vibration unit 312 is used to provide a vibration effect corresponding to the control operation determined by the processing unit 308.
  • the audio unit 314 is used to provide an audio effect corresponding to the control operation determined by the processing unit 308.
  • a control operation may correspond to only one vibration effect, or only one audio special effect, or may correspond to both a vibration effect and an audio special effect. For example, after the terminal device determines the control operation corresponding to the gun gesture, the terminal device imitates the vibration effect of the recoil of the real pistol and emits a shooting sound (audio special effect).
  • the processing unit 308 may preset the correspondence between the control operation and the vibration effect, or may preset the correspondence between the control operation and the audio effect, and may also preset the correspondence between the control operation and the combination of the vibration effect and the audio effect. In a specific implementation, after the processing unit 308 determines a certain control operation, it may determine the vibration effect corresponding to the control operation according to the corresponding relationship between the control operation and the vibration effect, and control the vibration unit 312 to vibrate to achieve the vibration effect. In a specific implementation, after the processing unit 308 determines a certain control operation, it can determine the audio effect corresponding to the control operation according to the corresponding relationship between the control operation and the audio effect, and control the audio unit 314 to play the corresponding music to achieve the audio effect.
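  • an illustrative sketch of such a correspondence table; the operation names, effect names, and the vibrate/play methods are hypothetical, not APIs from the original disclosure.

```python
# Look up the feedback effects preset for a control operation and trigger them
# on the vibration unit and audio unit (both assumed to expose simple methods).
FEEDBACK_TABLE = {
    "fire_gun": {"vibration": "recoil_pattern", "audio": "gunshot.wav"},
    "release_object": {"vibration": "short_pulse", "audio": None},
}

def apply_feedback(control_op, vibration_unit, audio_unit):
    effects = FEEDBACK_TABLE.get(control_op, {})
    if effects.get("vibration"):
        vibration_unit.vibrate(effects["vibration"])   # hypothetical API
    if effects.get("audio"):
        audio_unit.play(effects["audio"])              # hypothetical API
```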
  • the processing unit 308 can determine the audio special effect and vibration effect corresponding to the control operation according to the corresponding relationship between the control operation and the combination of the vibration effect and the audio special effect, and control the audio unit 314 to play the corresponding Music, achieve the audio special effect, and control the vibration unit 312 to vibrate to achieve the vibration effect.
  • the terminal may provide different vibration feedback forms or audio special effects according to the detected gesture type.
  • triggering the corresponding vibration effect and music special effect can enhance the user's immersion and improve the user experience.
  • the user can manipulate the position or shape of the virtual object in the screen displayed by the terminal 300 through gestures or hand movements.
  • this requires that the camera on the terminal 300 can capture the user's hand image, that is, that the user's gesture operation is within a reasonable control range, so that the user's gesture or hand motion can be recognized from the hand image and the position or form of the virtual object in the screen displayed by the terminal 300 can then be adjusted.
  • the following describes how to prompt the user that the gesture operation is beyond the control range.
  • the camera of the terminal 300 captures the hand image; the recognition unit 306 recognizes the position of the hand joint in the hand image to obtain the spatial position of one or more sets of hand joints; processing The unit 308 determines the control operation corresponding to the spatial position of the hand joint point of the group or groups, and executes the control operation.
  • the necessary condition for the processing unit 308 to obtain the control operation is that the recognition unit 306 can recognize a certain number of hand joint points from the hand image captured by the camera.
  • the user may be prompted that the gesture operation is beyond the control range, so as to prompt the user to operate again within a reasonable control range.
  • the terminal 300 prompts that the gesture operation exceeds the manipulation range.
  • the hand image of the reference group is any one of the hand images in the above group F
  • the number threshold may be 10, 12, 15, etc.
  • the recognition unit 306 recognizes the positions of the hand joint points in the reference group of hand images, obtains the spatial positions of a group of hand joint points, and transmits them to the processing unit 308; after determining that the number of hand joint point spatial positions included in that group is less than the number threshold, the processing unit 308 controls the vibration unit 312 to prompt with a certain vibration effect and / or controls the audio unit 314 to play a certain audio special effect for prompting.
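  • a small sketch of this out-of-range check; the threshold value and the prompt calls are illustrative assumptions.

```python
# If fewer hand joint points than the number threshold were recognized in the
# current group, prompt the user that the gesture is beyond the control range.
JOINT_COUNT_THRESHOLD = 10   # example value from the text (10, 12, 15, ...)

def check_control_range(joint_positions, vibration_unit=None, audio_unit=None):
    if len(joint_positions) < JOINT_COUNT_THRESHOLD:
        if vibration_unit is not None:
            vibration_unit.vibrate("out_of_range")        # hypothetical API
        if audio_unit is not None:
            audio_unit.play("out_of_range_prompt.wav")    # hypothetical API
        return False   # gesture operation is beyond the control range
    return True
```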
  • when the recognition unit 306 recognizes the positions of the hand joint points in the F groups of hand images and the number of hand joint points recognized in each of K groups of hand images is less than the number threshold, the user is prompted that the gesture operation is beyond the control range.
  • the group K hand image is included in the group F hand image. K is less than or equal to F, and is at least 1.
  • when the detection unit 310 detects the position area of the hand in each group of the F groups of hand images and no hand or no complete hand is detected in K groups of hand images, the user is prompted that the gesture operation is beyond the control range.
  • the group K hand image is included in the group F hand image. K is less than or equal to F, and is at least 1.
  • the detection unit 310 detects the hand in a set of hand images, and notifies the processing unit 308 when no hand or no complete hand is detected; the processing unit 308 then controls the vibration unit 312 to prompt with a certain vibration effect and / or controls the audio unit 314 to play a certain audio special effect for prompting.
  • after the terminal device detects that the current group of hand images does not contain a hand or does not contain a complete hand, it prompts the user that the gesture operation is beyond the control range, so that the user can be promptly reminded to operate again.
  • an embodiment of the present application provides a gesture-based manipulation method. As shown in FIG. 7, the method may include:
  • the terminal device displays a target screen.
  • the target picture includes a virtual object for manipulation through the detected gesture or the detected hand movement.
  • the target picture may include an image collected by the rear camera of the terminal device.
  • the terminal device acquires the hand image of group F.
  • the front camera (depth camera and / or color camera) of the terminal device can collect the user's hand image, so as to determine the user's gesture or hand movement.
  • the terminal device can use the depth camera and the color camera to simultaneously photograph the user's hand to obtain the F groups of hand images, or use only the depth camera to photograph the user's hand to obtain the F groups of hand images, or use only the color camera to photograph the user's hand to obtain F frames of color images.
  • F is an integer greater than 0.
  • the group F hand image may be an F frame depth image, or an F frame color image, or may be an F image combination, and each image combination includes a frame depth image and a frame color image.
  • F can be equal to 1 or greater than 1.
  • F is at least 2.
  • the terminal device recognizes the position of the hand joint point in the hand image of the group F, and obtains the spatial position of the hand joint point of the group F.
  • the spatial position of any group of hand joint points in the spatial position of the group F hand joint points is the spatial position of the hand joint points of the hand in the set of hand images.
  • the terminal device performs the control operation corresponding to the spatial positions of the F groups of hand joint points; the control operation is used to adjust the position and / or form of the virtual object in the target picture.
  • the terminal device determines the user's control operation from the spatial positions of the hand joint points, which can cover all natural gestures and continuous hand movements; the operation is more efficient and more natural, thereby improving the user's human-computer interaction experience.
  • an embodiment of the present application provides another gesture-based manipulation method. As shown in FIG. 8, the method may include:
  • the terminal device displays a target screen.
  • the target picture includes a virtual object for manipulation through the detected gesture or the detected hand movement.
  • the target picture may include an image obtained by the terminal device through a rear camera.
  • the target screen may be a screen of an application, for example, an AR game or a VR game.
  • the terminal device uses the depth camera and the color camera to simultaneously shoot the same scene to obtain a set of hand images.
  • the depth camera can obtain a depth image by shooting the scene, and the color camera can obtain a color image by shooting the scene.
  • the terminal device performs noise reduction and hand segmentation on the set of hand images.
  • the terminal device spatially aligns the depth image and the color image included in the set of hand images.
  • the terminal device detects whether the group of hand images includes hands.
  • the terminal device recognizes the position of the hand joint points in the set of hand images, and obtains the spatial position of the set of hand joint points.
  • the foregoing embodiment describes how to recognize the position of the hand joint point in the hand image, which will not be described in detail here.
  • the terminal device performs a control operation corresponding to the spatial position of the joint point of the hand.
  • the control operation is used to adjust the position and / or shape of the virtual object in the target picture.
  • the terminal device performing the control operation corresponding to the spatial position of the hand joint point of the group may be the terminal device determining the gesture type corresponding to the spatial position of the hand joint point of the group, and performing the control operation corresponding to the gesture type.
  • the terminal device may be preset with a correspondence between gesture types and control operations, and the corresponding relationships may be used to determine control operations corresponding to various gesture types.
  • the method for determining the gesture type corresponding to the spatial position of a group of hand joint points by the terminal device is the same as that in the foregoing embodiment, and will not be described in detail here.
  • FIG. 9 is a schematic diagram of a bullet released by an open gesture provided by an embodiment of the present application.
  • the spatial positions of the hand joint points in the figure are the spatial positions of a group of hand joint points obtained by the terminal device from a set of hand images; the open gesture in the figure is the gesture type determined by the terminal device according to the spatial positions of that group of hand joint points; 901 in the figure represents the bullet (virtual object) in the screen displayed by the terminal device, 902 represents the slingshot in the screen, and the broken line represents the movement trajectory of the bullet.
  • the bullet displayed in the screen of the terminal device is in a state ready to be fired, and the terminal device photographs the user's hand to obtain a set of hand images; after determining the open gesture based on that set of hand images, the terminal device displays a screen in which the bullet is fired and moves along the movement trajectory in FIG. 9.
  • the terminal device determines the gesture type corresponding to the set of spatial positions, and combines the gesture type with one or more gesture types obtained previously to obtain a gesture sequence; in the gesture When the sequence corresponds to a certain control operation, the control operation is performed; when the gesture sequence does not correspond to any control operation, the gesture type corresponding to the spatial position of the next group of hand joint points is determined, and the previous operation is repeated.
  • the terminal device may preset a correspondence between at least one gesture sequence and a control operation, and use the correspondence to determine a control operation corresponding to various gesture sequences.
  • the terminal device executes the vibration effect and / or audio special effect corresponding to the control operation.
  • a control operation may correspond to only one vibration effect, or only one audio effect, or may correspond to both a vibration effect and an audio special effect.
  • the terminal device may be preset with a correspondence between control operations and vibration effects, a correspondence between control operations and audio effects, or a correspondence between control operations and combinations of vibration effects and audio effects. For example, after the terminal device determines the control operation corresponding to the gun gesture, it imitates the vibration effect of the recoil of a real pistol and emits a shooting sound (audio special effect).
  • the terminal device vibrates to indicate that the gesture operation is beyond the control range.
  • the terminal device may determine the user's gesture type according to the spatial position of the joint point of the hand, and then perform the corresponding control operation, and the gesture recognition rate is high.
  • an embodiment of the present application provides another gesture-based manipulation method. As shown in FIG. 10, the method may include:
  • the terminal device acquires the hand image of group F.
  • the terminal device uses the depth camera and the color camera to simultaneously shoot the same scene to obtain a group of hand images, and continuously and simultaneously shoots F times to obtain the F group of hand images.
  • F is an integer greater than 1.
  • the terminal device recognizes the position of the hand joint point in each group of hand images in the group F hand image to obtain the spatial position of the group F hand joint point.
  • the terminal device obtains a set of hand images and recognizes the positions of hand joint points in a frame of images to obtain a set of hand joint point spatial positions (three-dimensional spatial position).
  • the terminal device starts to recognize the position of the hand joint point in the group F image to obtain the spatial position of the hand joint point of group F.
  • the terminal device determines the change of the spatial position of the hand joint point between the hand joint point groups according to the spatial position of the hand joint point of the group F.
  • the spatial positions of the hand joint points included in the spatial positions of the set of hand joint points may be the spatial positions of the first hand joint point to the 21st hand joint point in sequence.
  • the terminal device calculates the spatial position change of the hand joint points according to the spatial positions of two sets of hand joint points; this can be obtained by subtracting the spatial positions of one set of hand joint points from the spatial positions of the other set, yielding the spatial position change of each hand joint point.
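  • a minimal sketch of this subtraction, assuming each group of hand joint point spatial positions is a NumPy array of shape (21, 3):

```python
# Per-joint spatial position change between two groups of hand joint points.
import numpy as np

def joint_position_change(prev_joints, curr_joints):
    prev = np.asarray(prev_joints, dtype=np.float32)   # shape (21, 3)
    curr = np.asarray(curr_joints, dtype=np.float32)   # shape (21, 3)
    return curr - prev                                 # displacement per joint
```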
  • the terminal device adjusts the position and / or shape of the virtual object in the displayed screen according to the change in the spatial position.
  • the screen displayed on the display screen or display of the terminal device includes at least one virtual object, and the user can adjust the at least one virtual object in the screen using gestures or hand movements.
  • the terminal device adjusts the position of the virtual object in the displayed screen according to the spatial position change as follows: according to the spatial position change, the movement trajectory of the hand is determined; the virtual object is moved according to the movement trajectory of the hand, The movement trajectory of the virtual object coincides with the movement trajectory of the hand.
  • the movement trajectory of the virtual object coincides with the movement trajectory of the hand, which means that the movement trajectory of the virtual object is the same in shape and proportional to the movement trajectory of the hand.
  • the user's hand moves 20 cm to the right, and the virtual object moves 5 cm to the right; the user's hand moves 30 cm to the left, and the virtual object moves 7.5 cm to the left.
  • the spatial position of the hand corresponds to the position of the virtual object, that is, the spatial position of the hand is mapped to the spatial position of the virtual object, so that the user feels that the hand directly contacts the virtual object.
  • the terminal device maps the movement trajectory of the hand to the screen to obtain a screen where the virtual hand moves.
  • the user can treat the virtual hand as his own hand and use it to operate the virtual object in the screen accordingly. That is to say, by moving his own hand the user moves the virtual hand, and the movement of the virtual hand is consistent with the movement of the user's hand.
  • after the terminal device obtains the spatial positions of a group of hand joint points, it can calculate the spatial position change (hand displacement) of that group relative to the spatial positions of a previously obtained group of hand joint points, and move the virtual object correspondingly according to the spatial position change.
  • the method of adjusting the form of the virtual object in the displayed screen according to the spatial position change is as follows: the hand movement is determined according to the spatial position change, and the adjustment operation corresponding to the hand movement is performed; this adjustment operation is used to adjust the form of the virtual object in the screen displayed by the terminal device. Adjusting the form of the virtual object may be adjusting its direction, size, shape, etc. in that screen.
  • the terminal device maps the hand movement to the screen to obtain a virtual hand movement consistent with the hand movement. The user can use the virtual hand as his own hand to operate the virtual object in the screen accordingly.
  • the terminal device can obtain the three-dimensional spatial positions of the hand joint points from a single frame of image, obtain the spatial position change of the hand joint points from consecutive frames, and control the virtual object to change position and shape accordingly through the control operation determined by that spatial position change.
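One possible per-frame flow for the single-frame / multi-frame behaviour described above; `estimate_joints` and `apply_control` are hypothetical placeholders for the recognition network and the screen-adjustment logic:

```python
def control_loop(frames, estimate_joints, apply_control):
    """Sketch of the frame-by-frame control flow described above.

    estimate_joints(frame) is assumed to return a (21, 3) array of 3D joint
    positions for one frame, or None if no hand is found; apply_control(change)
    is assumed to adjust the virtual object from the per-joint position change.
    """
    prev = None
    for frame in frames:
        joints = estimate_joints(frame)      # 3D positions from a single frame
        if joints is None:
            prev = None                      # hand left the field of view
            continue
        if prev is not None:
            apply_control(joints - prev)     # change across consecutive frames
        prev = joints
```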
  • the terminal device determines M gesture types corresponding to the spatial positions of the F groups of hand joint points, and adjusts the position and/or shape of the virtual object in the displayed screen according to the M gesture types and the spatial positions of the F groups of hand joint points.
  • the terminal device determines the target gesture type according to the spatial position of one group of hand joint points among the F groups, determines the spatial position change of the hand joint points between groups according to the spatial positions of the F groups, and performs the control operation according to the target gesture type and the spatial position change.
  • the target gesture type is used to adjust the position of the virtual object in the display screen of the display unit 302.
  • the target gesture may be a fist gesture, an open gesture, or the like.
  • after the terminal device determines the target gesture type from the spatial position of a group of hand joint points, and the gesture types determined from one or more preceding groups of hand joint points are all the target gesture, the terminal device determines the spatial position change of the hand joint points between groups from the spatial position of that group and the spatial positions of the preceding group or groups, and performs the control operation according to the spatial position change.
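A sketch of this gating logic, where the object only moves while the recognized gesture stays the target type (for example a fist); the list-based history is an assumed data structure, not the original implementation:

```python
def gated_hand_displacement(gesture_history, joint_history, target="fist"):
    """Return the hand displacement only while the target gesture is held.

    gesture_history is a list of gesture labels (one per group of joint points)
    and joint_history the matching list of (21, 3) numpy arrays; both are
    assumed, illustrative structures.
    """
    if len(joint_history) < 2 or len(gesture_history) < 2:
        return None
    if not all(label == target for label in gesture_history[-2:]):
        return None                          # gesture changed: do not move the object
    change = joint_history[-1] - joint_history[-2]
    return change.mean(axis=0)               # average joint displacement as the hand movement
```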
  • the terminal device determines the spatial position change of the hand joint points between groups according to the spatial positions of the F groups of hand joint points, and performs the control operation according to the gesture type change of the M gesture types and the spatial position change.
  • the processing unit 308 may preset a correspondence between combinations of gesture type change and spatial position change, on the one hand, and control operations, on the other; according to this correspondence, the processing unit 308 can determine the control operation corresponding to a given combination and then adjust the position and/or form of the virtual object in the screen displayed by the display unit 302.
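A minimal sketch of such a preset correspondence table; the keys, direction labels, and operation names are hypothetical examples, not values from the original text:

```python
# (gesture type change, coarse direction of the spatial position change) -> control operation
CONTROL_TABLE = {
    ("fist->fist", "left"):  "move_object_left",
    ("fist->fist", "right"): "move_object_right",
    ("fist->open", "any"):   "release_object",
}

def lookup_control(gesture_change, direction):
    """Look up the control operation for a combination, falling back to 'any' direction."""
    return (CONTROL_TABLE.get((gesture_change, direction))
            or CONTROL_TABLE.get((gesture_change, "any")))
```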
  • the terminal device performs a control operation according to the gesture type change of the M gesture types and the spatial positions of the F groups of hand joint points.
  • the processing unit 308 may preset a correspondence between combinations of gesture type change and the spatial positions of the F groups of hand joint points, on the one hand, and control operations, on the other; according to this correspondence, the processing unit 308 can determine the control operation corresponding to a given combination and adjust the position and/or shape of the virtual object in the screen displayed by the display unit 302.
  • the terminal device may determine the user's hand motion based on consecutive sets of hand images, and then execute the control operation corresponding to that hand motion.
  • the following describes how the terminal device adjusts the displayed screen according to the user's gesture or hand movement in combination with specific application scenarios.
  • after receiving the instruction sent by the user to start the AR archery game application, the terminal device starts the AR archery game application.
  • after the user clicks a target icon on the touch display screen of the terminal device, the terminal device starts the AR archery game application; the target icon is the icon of the AR archery game application.
  • the terminal device uses the rear camera to shoot the real scene, superimposes the virtual objects on the captured real image, and displays the result on the display screen.
  • the display screen displays the real images taken by the rear camera and the superimposed virtual objects.
  • 1101 represents a slingshot
  • 1102 represents a bullet
  • 1103 represents a shooting target
  • 1104 represents a real image, where 1101, 1102, and 1103 are virtual objects superimposed on the real image 1104.
  • the terminal device plays background music of the AR archery game application.
  • the terminal device uses the front camera (depth camera and color camera) to capture the user's hand image, and detects whether a hand is included in the hand image.
  • if no hand is detected, the user's hand is outside the camera's field of view, and the terminal device vibrates to prompt the user to adjust the position of the hand and prevent the operation from failing. If the hand image includes a hand, the next step is performed.
  • the terminal device recognizes the first gesture based on the captured hand image, and converts the first gesture into a first interaction instruction.
  • the first gesture may be a fist gesture, and the first interaction instruction may be to pinch a bullet on the slingshot.
  • while the terminal device runs the AR archery game, if a fist gesture is detected, the terminal device displays a picture of a bullet being pinched on the slingshot.
  • when the user's hand moves, the camera acquires successive sets of hand images, and each set is recognized to obtain the spatial position change of the hand joint points.
  • the specific implementation is as follows: when the user's hand moves up, down, left, or right, the continuously changing three-dimensional positions of the hand joint points are recognized, and the bullet can be controlled to move up, down, left, or right in the screen; when the user's palm deflects, the orientation of the palm can be obtained from the change in the spatial positions of the hand joint points, and the firing direction of the bullet can be adjusted according to the palm orientation; when the user's hand moves forward or backward, the z coordinate of the hand joint points changes, which can control the launch force of the bullet.
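One way to estimate the palm orientation from a single group of joint positions is a cross product over the palm plane; the joint indices assume a common 21-point layout (wrist at 0, index-finger base at 5, little-finger base at 17), which is an assumption rather than something specified in the text:

```python
import numpy as np

def palm_normal(joints: np.ndarray) -> np.ndarray:
    """Estimate the palm orientation (unit normal) from a (21, 3) joint array."""
    v1 = joints[5] - joints[0]               # wrist -> index-finger base
    v2 = joints[17] - joints[0]              # wrist -> little-finger base
    n = np.cross(v1, v2)                     # perpendicular to the palm plane
    return n / np.linalg.norm(n)             # sign flips between left and right hands

# The bullet's firing direction can then be set along this unit vector.
```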
  • FIG. 12 is a schematic diagram of hand movement provided by an embodiment of the present application. As shown in FIG. 12, the hand can move up and down, left and right, and back and forth, and the palm orientation can be adjusted. Specifically, the terminal device may determine the relative positions and distances of the hand joint points from the spatial position of a group of hand joint points, and thereby determine the palm orientation. In practical applications, each kind of hand movement causes a corresponding change to the bullet in the screen displayed by the terminal device; specifically, the movement trajectory of the hand is consistent with the movement trajectory of the bullet in the screen.
  • when the terminal device detects that the user's hand moves, its vibration unit operates. As shown in FIG. 13, the farther the hand is from the camera, the stronger the vibration intensity, representing a greater slingshot pulling force and reflecting a 3D vibration effect. It can be understood that when the user's hand moves while keeping the fist state, the gesture recognition result remains unchanged, and the bullet remains pinched on the slingshot.
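A sketch of mapping the hand-to-camera distance to a vibration amplitude, so that a farther hand produces a stronger vibration; the working distance range is an assumed value, not one taken from the original text:

```python
def vibration_amplitude(hand_distance_m, d_min=0.2, d_max=0.8):
    """Map hand-to-camera distance (metres) to a vibration amplitude in [0, 1].

    The farther the hand is from the camera, the stronger the vibration,
    standing in for a greater slingshot pulling force.
    """
    t = (hand_distance_m - d_min) / (d_max - d_min)
    return min(max(t, 0.0), 1.0)
```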
  • the terminal device recognizes the second gesture based on the captured hand image, and converts the second gesture into a second interaction instruction.
  • the second gesture may be an open gesture
  • the second interaction instruction may be a bullet release.
  • while the terminal device runs the AR archery game, if an open gesture is detected, the terminal device displays a screen in which the bullet is released. It can be understood that the user loads the bullet onto the slingshot in the screen with a fist gesture, controls the movement of the bullet by moving the hand, and releases the bullet in the screen with an open gesture.
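The fist / move / open interaction can be summarised as a small state machine; the gesture labels and return values are illustrative assumptions:

```python
class SlingshotInteraction:
    """Minimal sketch of the fist (load), move (aim), open (release) flow."""

    def __init__(self):
        self.loaded = False

    def on_update(self, gesture, hand_displacement=None):
        if gesture == "fist" and not self.loaded:
            self.loaded = True                       # pinch a bullet onto the slingshot
            return "load_bullet"
        if gesture == "fist" and self.loaded and hand_displacement is not None:
            return ("aim", hand_displacement)        # keep aiming while the fist is held
        if gesture == "open" and self.loaded:
            self.loaded = False
            return "release_bullet"                  # fire the bullet
        return None
```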
  • the terminal device triggers vibration effects and music special effects.
  • when a bullet hits a virtual object in the screen of the terminal device, the terminal device triggers a vibration effect and a music special effect to enhance the user's immersion.
  • when playing the AR archery game, the user can manipulate the slingshot in the screen with gestures to complete operations such as loading a bullet onto the slingshot, adjusting the bullet's launch direction, adjusting the slingshot's pulling force, adjusting the bullet's position, and releasing the bullet.
  • the user can manipulate the slingshot in the screen through gestures or hand movements, as if the user's own hand were directly controlling the slingshot in the screen.
  • the following uses an interactive AR escape-room game as an example to introduce the application of gesture and hand motion recognition.
  • a series of hand movements such as turning screws, turning keys, pushing doors, and pulling drawers are involved in the AR escape-room game.
  • natural gesture recognition tracks all joint points of the hand and captures and recognizes all behaviors to determine the user's various gestures and hand movements, and the screen played by the terminal device is then adjusted accordingly.
  • the terminal device can extract each hand joint point of the hand, and then analyze the hand movement and position in real time. In this way, any hand movements and gestures of the user can be recognized, and the corresponding control operations can be completed.
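As an example of turning tracked joint points into a recognized gesture, a fist versus an open hand can be distinguished from the bend angle at each finger's middle joint, in the spirit of the per-finger angle computation described earlier in this application; the joint index layout and the 120° threshold are assumptions for illustration:

```python
import numpy as np

# Assumed 21-point layout: four joint indices per finger, from base to tip.
FINGERS = {
    "thumb":  [1, 2, 3, 4],
    "index":  [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring":   [13, 14, 15, 16],
    "little": [17, 18, 19, 20],
}

def finger_is_bent(joints, idx, angle_threshold_deg=120.0):
    """Treat a finger as bent when the angle at its middle joint is small."""
    a, b, c = joints[idx[0]], joints[idx[1]], joints[idx[2]]
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angle < angle_threshold_deg

def classify_gesture(joints):
    """Very coarse classifier: most fingers bent -> fist, none bent -> open hand."""
    bent = sum(finger_is_bent(joints, idx) for idx in FINGERS.values())
    if bent >= 4:
        return "fist"
    if bent == 0:
        return "open"
    return "other"
```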
  • the terminal device 1400 can be used as an implementation manner of the terminal 300.
  • the terminal device 1400 includes a processor 1402, a memory 1404, a camera 1406, a display screen 1408, and a bus 1410. Among them, the processor 1402, the memory 1404, the camera 1406, and the display screen 1408 communicate with each other through the bus 1410.
  • the processor 1402 can implement the recognition-related functions shown in FIG. 3.
  • the processor 1402 may be a general-purpose CPU, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for executing relevant programs, so as to implement the technical solutions provided by the embodiments of the present invention.
  • the memory 1404 may be a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 1404 may store an operating system and other application programs.
  • program code for implementing, through software or firmware, the functions required by the modules and components included in the terminal 300 provided by the embodiments of the present invention, or for implementing the above methods provided by the method embodiments of the present invention, is stored in the memory 1404, and the processor 1402 reads the code in the memory 1404 to execute the operations required by the modules and components included in the terminal 300, or to execute the above methods provided by the embodiments of the present application.
  • the display screen 1408 is used to display various images and dynamic pictures described in the implementation of the present application.
  • the bus 1410 may include a path for transferring information between various components of the terminal device 1400 (for example, the processor 1402, the memory 1404, the camera 1406, and the display screen 1408).
  • the camera 1406 may be a depth camera for capturing depth images.
  • the camera 1406 may be a color camera, such as an RGB camera, for capturing color images.
  • the camera 1406 may include a color camera and a depth camera.
  • the color camera is used to capture a color image.
  • the depth camera is used to capture a depth image.
  • the color camera and the depth camera may simultaneously capture the same scene.
  • although the terminal device 1400 shown in FIG. 14 only shows the processor 1402, the memory 1404, the camera 1406, the display screen 1408, and the bus 1410, those skilled in the art should understand that in a specific implementation the terminal device 1400 also contains other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the terminal device 1400 may further include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the terminal device 1400 may also include only the devices necessary to implement the embodiments of the present invention, and need not include all the devices shown in FIG. 14.
  • the above program may be stored in a computer-readable storage medium, and when the program is executed, the processes of the above method embodiments may be included.
  • the above storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM: Read-Only Memory), a random access memory (RAM: Random Access Memory), or the like.
  • the computer program can be stored/distributed in a suitable medium, for example an optical storage medium or a solid-state medium, supplied together with or as part of other hardware, and may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
  • a suitable medium for example, an optical storage medium or a solid-state medium, provided with other hardware or as part of the hardware, and can also adopt other distribution forms, such as via the Internet or other wired or wireless telecommunication systems.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

本申请公开了一种基于手势的操控方法及终端设备,该方法包括:显示目标画面,所述目标画面中包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体;获取F组手部图像;根据所述F组手部图像,识别所述F组手部图像中手部的手部关节点位置,从而得到F组手部关节点的空间位置,任一组手部关节点的空间位置为一组手部图像中的手部的手部关节点的空间位置,F为大于0的整数;根据所述F组手部关节点的空间位置,执行所述F组手部关节点的空间位置对应的控制操作,所述控制操作用于调节所述目标画面中所述虚拟物体的位置和/或形态。

Description

基于手势的操控方法及终端设备 技术领域
本申请涉及人机交互领域,尤其涉及一种基于手势的操控方法及终端设备。
背景技术
虚拟现实技术和增强现实技术是近年来新兴的多媒体技术。虚拟现实技术是一种基于多媒体计算机技术、传感技术、仿真技术创建的沉浸式交互环境。具体地说,就是采用计算机技术生成逼真的视、听、触觉一体化的特定范围的虚拟环境,用户借助必要的设备以自然的方式与虚拟环境中的对象进行交互作用、相互影响,从而产生亲临等同真实环境的感受和体验。增强现实技术是一种实时地计算摄影机影像的位置及角度并加上相应图像的技术。这种技术的目标是将真实世界信息和虚拟世界信息“无缝”集成,使得真实的环境和虚拟的物体实时地叠加到了同一个画面或空间。通过这种方式对现实世界进行补充,使人们在视觉、听觉、触觉等方面增强对现实世界的体验。
手势识别是基于视觉的人机交互的核心技术之一。用户可以采用基于手势识别的人机交互方法与虚拟对象进行交互。在虚拟现实或增强现实场景中,用户可以与三维空间中的增强对象(虚拟对象)通过手势进行交互,增强了沉浸感。在这种交互方式中,用户不再需要通过外接设备,例如键盘、鼠标、手柄等对虚拟对象进行操控,也不仅仅是在触摸屏上点击虚拟对象。与其他交互方式相比,手势交互的技术更复杂,但同时至少具备如下优点:(1)可以脱离实体接触,实现远距离控制;(2)交互动作更加丰富和自然,不同的操作有不同的手势,不局限于点击和滑动等几种常用的操作;(3)对用户活动影响较少,可以随时继续手势操作。
当前采用的一种基于手势的交互方法是通过手势分析获得手势的形状特征或运动轨迹,根据手势的形状特征或运动轨迹识别出相应的手势,进而执行相应的控制操作。在这种方案中,终端设备仅配置了有限几种手势的形状特征或运动轨迹,导致该终端设备只能识别这几种手势,拓展性差、识别率较低。
发明内容
本申请实施例提供了一种基于手势的操控方法及终端设备,能够覆盖所有的自然手势以及连续的手部动作,操作效率更高、更自然,以便于提高用户的人机交互体验。
本申请实施例第一方面提供了一种基于手势的操控方法,该方法由终端设备执行,该方法包括:显示目标画面,所述目标画面中包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体;获取F组手部图像;根据所述F组手部图像,识别所述F组手部图像中手部的手部关节点位置,从而得到F组手部关节点的空间位置,任一组手部关节点的空间位置为一组手部图像中的手部的手部关节点的空间位置,F为大于0的整数;根据所述F组手部关节点的空间位置,执行所述F组手部关节点的空间位置对应的控制操作, 所述控制操作用于调节所述目标画面中所述虚拟物体的位置和/或形态。
上述F组手部图像可以是终端设备上的深度摄像头和彩色摄像头采集的图像,也可以仅是深度摄像头采集的深度图像。一组手部图像可以包括深度摄像头和彩色摄像头同步拍摄同一场景分别得到的深度图像和彩色图像(例如RGB图像),也可以仅包括深度摄像头采集的深度图像。深度摄像头和彩色摄像头同步拍摄同一场景是指该深度摄像头和该彩色摄像头拍摄同一场景的时间间隔小于时间阈值。该时间阈值可以是1毫秒、5毫秒、10毫秒等。可选的,任一组手部图像包括的彩色图像和深度图像是同一时刻拍摄同一场景得到的图像。上述F组手部图像还可以是终端3对摄像头采集的F组原始图像做手部分割,得到的F组手部图像。关节点的空间位置是指关节点的三维坐标。上述虚拟物体可以是终端设备上显示的用户可通过手势操控的虚拟物体,例如虚拟人物、虚拟动物、虚拟物品等。可选的,终端设备预设有训练好的识别网络,可以依次将上述F组手部图像输入到该识别网络,得到每一组手部图像对应的一组关节点的空间位置。通过这种方式可以快速、准确地确定各组手部图像中各手部关节点的空间位置。在实际应用中,终端设备获取到一组手部图像后,识别该组手部图像中的手部关节点,将得到的一组关节点的空间位置进行保存以及确定该组关节点的空间位置对应的手势。终端设备可以按照获得各组手部图像的顺序依次识别各组手部图像中的手部关节点。可以理解,任一种自然手势或任一种手部动作均可以表示为一组或多组关节点的空间位置。反过来,通过一组或多组关节点的空间位置可以确定任一种自然手势或任一种手部动作。自然手势是指任意手势,即用户可以做到的任意手势。本申请实施例中,终端设备通过手指关节点的空间位置确定用户的控制操作,能够覆盖所有的自然手势以及连续的手部动作,操作效率更高、更自然,以便于提高用户的人机交互体验。
在一个可选的实现方式中,所述执行所述F组手部关节点的空间位置对应的控制操作包括:根据所述F组手部关节点的空间位置,确定M个手势类型,M小于或等于F,M为正整数;执行所述M个手势类型对应的所述控制操作。该实现方式中,终端设备根据一组或多组手指关节点的空间位置确定一个或多个手势类型,进而执行该一个或多个手势类型对应的控制操作,可以准确识别各种手势。
在一个可选的实现方式中,所述确定所述F组手部关节点的空间位置对应的M个手势类型包括:根据所述F组手部关节点中一组手部关节点的空间位置,计算所述一组手部关节点中的手部关节点之间的角度;根据所述手部关节点之间的角度,确定所述一组手部关节点的空间位置对应的一个手势类型。
在一个可选的实现方式中,所述确定所述F组手部关节点的空间位置对应的M个手势类型包括:确定所述F组手部关节点的空间位置对应的至少两个手势类型,F大于1;所述执行所述M个手势类型对应的所述控制操作包括:根据所述至少两个手势类型的手势类型变化,执行所述至少两个手势类型对应的所述控制操作。
在一个可选的实现方式中,所述执行所述F组手部关节点的空间位置对应的控制操作包括:根据所述F组手部关节点的空间位置,确定所述F组手部关节点的空间位置对应的M个手势类型,F大于1,M小于或等于F;根据所述F组手部关节点的空间位置和所述M个手势类型,执行所述控制操作。
在一个可选的实现方式中,所述根据所述F组手部关节点的空间位置和所述M个手势类型,执行所述控制操作包括:根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型和所述空间位置变化,执行所述控制操作;
或者,根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型的手势类型变化和所述空间位置变化,执行所述控制操作;
或者,根据所述M个手势类型的手势类型变化和所述F组手部关节点的空间位置,执行所述控制操作。
在一个可选的实现方式中,所述根据所述M个手势类型和所述空间位置变化,执行所述控制操作包括:在所述M个手势类型中的至少一个为目标手势类型的情况下,根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化,所述目标手势类型用于调整所述目标画面中所述虚拟物体的位置。
在一个可选的实现方式中,所述执行所述F组手部关节点的空间位置对应的控制操作包括:根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化,F大于1;根据所述空间位置变化,执行所述控制操作。
在一个可选的实现方式中,所述方法还包括:在K组手部图像中的每组手部图像中的手部关节点的数量均小于数量阈值的情况下,提示手势操作超出操控范围,所述K组手部图像包含于所述F组手部图像,K小于或等于F,K为正整数。
在一个可选的实现方式中,所述识别所述F组手部图像中的手部关节点位置,从而得到F组手部关节点的空间位置包括:
根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域;
根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置。
在一个可选的实现方式中,所述根据所述空间位置变化,执行所述控制操作包括:根据所述空间位置变化,确定手部的移动轨迹;按照所述手部的移动轨迹移动所述虚拟物体,并进行震动;所述震动的震动强度与所述手部到终端设备的距离正相关或负相关。
在一个可选的实现方式中,所述根据所述空间位置变化,执行所述控制操作包括:根据所述空间位置变化,确定手部的动作;执行与所述手部的动作相对应的调整操作,所述调整操作用于调整所述虚拟物体的形态。
在一个可选的实现方式中,所述根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域包括:根据目标组手部图像包括的彩色图像,检测所述目标组手部图像包括的彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;所述根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置包括:
根据所述目标组手部图像包括的深度图像,识别所深度图像中第二位置区域中的手部的手部关节点位置,得到所述目标组手部图像对应的一组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色图像为同步拍摄同一场景得到的图像。
在一个可选的实现方式中,所述根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域包括:根据目标组手部图像包括的彩色图像,检测所述彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;
所述根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置包括:根据所述彩色图像,识别所述第一位置区域中的手部的手部关节点位置,得到第一组手部关节点的空间位置;根据所述目标组手部图像包括的深度图像,识别所述深度图像中第二位置区域中的手部的手部关节点位置,得到第二组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色图像同步为拍摄同一场景得到的图像;合并第一组手部关节点的空间位置和所述第二组手部关节点的空间位置,得到所述目标组手部图像对应的一组手部关节点的空间位置。
在一个可选的实现方式中,所述识别所述F组手部图像中的手部关节点,得到F组手部关节点的空间位置之前,所述方法还包括:通过彩色传感器和深度传感器同步拍摄所述同一场景,得到原始彩色图像和原始深度图像;对所述原始彩色图像和所述原始深度图像做空间对齐;分别对对齐后的原始彩色图像和对齐后的原始深度图像做手部分割,得到所述目标组手部图像。
在一个可选的实现方式中,所述识别所述F组手部图像中的手部关节点位置,得到F组手部关节点的空间位置包括:根据所述目标组手部图像包括的深度图像,检测所述深度图像中的手部所处的位置区域,所述目标组手部图像为所述F组手部图像中任一组图像;根据所述深度图像,识别所述深度图像中所述位置区域中的手部的手部关节点位置,得到所述目标组手部图像对应的一组手部关节点的空间位置。
本申请实施例第二方面提供了一种终端设备,该终端设备包括;显示单元,用于显示目标画面,所述目标画面中包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体;获取单元,用于获取F组手部图像;识别单元,用于根据所述F组手部图像,识别所述F组手部图像中手部的手部关节点位置,从而得到F组手部关节点的空间位置,任一组手部关节点的空间位置为一组手部图像中的手部的手部关节点的空间位置,F为大于0的整数;处理单元,用于根据所述F组手部关节点的空间位置,执行所述F组手部关节点的空间位置对应的控制操作,所述控制操作用于调节所述目标画面中所述虚拟物体的位置和/或形态。
在一个可选的实现方式中,所述处理单元,用于确定所述F组关节点的空间位置对应的至少一个手势;执行所述至少一个手势对应的所述控制操作。
在一个可选的实现方式中,所述处理单元,用于根据所述F组手部关节点中一组手部关节点的空间位置,计算所述一组手部关节点中的手部关节点之间的角度;根据所述手部 关节点之间的角度,确定所述一组手部关节点的空间位置对应的一个手势类型。
在一个可选的实现方式中,所述处理单元,用于确定所述F组手部关节点的空间位置对应的至少两个手势类型,F大于1;根据所述至少两个手势类型的手势类型变化,执行所述至少两个手势类型对应的所述控制操作。
在一个可选的实现方式中,所述处理单元,用于根据所述F组手部关节点的空间位置,确定所述F组手部关节点的空间位置对应的M个手势类型,F大于1,M小于或等于F;根据所述F组手部关节点的空间位置和所述M个手势类型,执行所述控制操作。
在一个可选的实现方式中,所述处理单元,用于根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型和所述空间位置变化,执行所述控制操作;
或者,所述处理单元,用于根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型的手势类型变化和所述空间位置变化,执行所述控制操作;
或者,所述处理单元,用于根据所述M个手势类型的手势类型变化和所述F组手部关节点的空间位置,执行所述控制操作。
在一个可选的实现方式中,所述处理单元,用于根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化,F大于1;根据所述空间位置变化,执行所述控制操作。
在一个可选的实现方式中,所述处理单元,用于在K组手部图像中的每组手部图像中的手部关节点的数量均小于数量阈值的情况下,提示手势操作超出操控范围,所述K组手部图像包含于所述F组手部图像,K小于或等于F,K为正整数。
在一个可选的实现方式中,所述识别单元,用于根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域;根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置。
在一个可选的实现方式中,所述识别单元,用于根据目标组手部图像包括的彩色图像,检测所述目标组手部图像包括的彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;根据所述目标组手部图像包括的深度图像,识别所深度图像中第二位置区域中的手部的手部关节点位置,得到所述目标组手部图像对应的一组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色图像为同步拍摄同一场景得到的图像。
在一个可选的实现方式中,所述识别单元,用于根据目标组手部图像包括的彩色图像,检测所述彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;根据所述彩色图像,识别所述第一位置区域中的手部的手部关节点位置,得到第一组手部关节点的空间位置;根据所述目标组手部图像包括的深度图像,识别所述深度图像中第二位置区域中的手部的手部关节点位置,得到第二组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色图像同步为拍摄同一场景得到的图像;合并第一组手部关节点的空间位置和所述第 二组手部关节点的空间位置,得到所述目标组手部图像对应的一组手部关节点的空间位置。
在一个可选的实现方式中,所述获取单元包括:彩色传感器,用于拍摄拍摄所述同一场景,得到原始彩色图像;深度传感器,用于拍摄拍摄所述同一场景,得到原原始深度图像;对齐子单元,用于对所述原始彩色图像和所述原始深度图像做空间对齐;分割子单元,用于分别对对齐后的原始彩色图像和对齐后的原始深度图像做手部分割,得到所述目标组手部图像。
本申请实施例第三方面提供了一种计算机可读存储介质,所述计算机存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时使所述处理器执行上述第一方面以及任一项可选的实现方式所述的方法。
本申请实施例第四方面提供了一种终端设备,该终端设备包括处理器和存储器;该存储器用于存储代码;该处理器通过读取该存储器中存储的该代码,以用于执行第一方面提供的方法。
本申请实施例第五方面提供了一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行第一方面的任意一种方法的部分或全部步骤。
附图说明
图1是本申请实施例提供的一种终端设备的结构示意图;
图2为本申请实施例提供的一种终端设备的结构示意图;
图3为本申请实施例提供的一种终端300的逻辑结构示意图;
图4为本申请实施例提供的一种获取单元的逻辑结构示意图;
图5为本申请实施例提供的由一组手部关节点的空间位置确定握拳手势的示意图;
图6为本申请实施例提供的由一组手部关节点的空间位置确定张开手势的示意图;
图7为本申请实施例提供的一种基于手势的操控方法流程图;
图8为本申请实施例提供了另一种基于手势的操控方法流程图;
图9为本申请实施例提供的一种通过张开手势释放子弹的示意图;
图10为本申请实施例提供的另一种基于手势的操控方法流程图;
图11为本申请实施例提供的终端设备显示的画面的示意图;
图12为本申请实施例提供的一种手部移动过程的示意图;
图13为本申请实施例提供的一种震动强度与手部至终端设备的距离的关系示意图;
图14是本发明实施例提供的终端设备的硬件结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本申请实施例方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚地描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。
本申请的说明书实施例和权利要求书及上述附图中的术语“第一”、“第二”、和“第三”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。此外,术语“包括”和“具 有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元。方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。“和/或”用于表示在其所连接的两个对象之间选择一个或全部。例如“A和/或B”表示A、B或A+B。
图1是本申请实施例提供的一种终端设备100的结构示意图。终端设备100可以是但不限于:手机,平板电脑,笔记本电脑,智能手表,电视机,AR眼镜,VR眼镜以及其他具有显示屏的电子设备。终端设备100可以支持多个应用程序,诸如以下各项中的一者或多者:绘图应用程序、文字处理应用程序、网站浏览应用程序、电子表格应用程序、办公软件应用程序、游戏应用程序、电话应用程序、视频会议应用程序、电子邮件应用程序、即时消息应用程序、健康管理应用程序、照片管理应用程序、数字相机应用程序、数字视频相机应用程序、震动管理应用程序、数字音乐播放器应用程序和数字视频播放器应用程序。在终端设备100上执行的各个应用程序任选地通过至少一个硬件接口设备获得用户输入的指令,例如但不限于触摸显示屏136、深度摄像头156以及彩色摄像头158。
终端设备100可以包括存储器108(可以包括一个或多个计算机可读存储介质)、一个或多个处理单元(例如但不限于中央处理器(Central Processing Unit,CPU)、图形处理器(Graphics Processing Unit,GPU)、神经网络处理器(Neural-network Processing Unit,NPU)、数字信号处理器(Digital Signal Processing,DSP)和现场可编程门阵列(Field-Programmable Gate ArrayFPGA)中至少一种)120。终端设备100还可以包括存储器控制器104、外围设备接口106、RF电路126、音频电路128、扬声器130、触摸显示屏136、麦克风132、输入/输出(I/O)子系统106、其它输入或控制设备116和外部端口156中的至少一种。终端设备100还可以包括一个或多个光学传感器142。终端设备100还可以包括用于检测触摸显示屏136上的触摸的强度的一个或多个强度传感器146(例如,用于检测触摸显示屏136上的触摸的强度,其中,"强度"是指触摸显示屏136上的触摸(例如,手指触摸)的压力或者压强)。这些部件任选地通过一根或多根总线或信号线通信。应理解,终端设备100也可以包括不具有对用户的触摸进行感知功能的显示屏,从而替换触摸显示屏136。终端设备100还可以包括深度摄像头156以及彩色摄像头158。深度摄像头156用于采集深度图像。彩色摄像头158用于采集彩色图像,例如RGB图像。深度摄像头156以及彩色摄像头158可以在终端设备100中的一个或多个处理单元的控制下,同步拍摄同一场景的图像。由于深度摄像头156在终端设备100上的位置与彩色摄像头158在终端设备100上的位置相同或相近,深度摄像头156和彩色摄像头158拍摄的场景可以视为同一场景。终端设备100还可以包括震动电路160,该震动电路160用于提供一种或多种震动方式,以使得终端设备100达到不同的震动强度或震动效果。震动电路160可以包括震动马达等器件。
应当理解,终端设备100只是一个示例,并且终端设备100可以具有比所示出的更多或更少的部件,任选地组合两个或更多个部件。图1中所示的各种部件以硬件、软件、或硬件与软件两者的组合来实现,还可以包括信号处理专用集成电路和应用程序专用集成电路至少一种。
存储器108可以包括高速随机存取存储器,并且还任选地包括非易失性存储器,诸如一个或多个磁盘存储设备、闪存存储器设备、或其它非易失性固态存储器设备。终端设备 100的其它部件(例如CPU 120和外围设备接118)对存储器108的访问任选地由存储器的控制器104来控制。
外围设备接口106可用于将终端设备100的输入外围设备和输出外围设备耦接到处理单元102和存储器108。该一个或多个处理单元102运行或执行存储在存储器108中的各个软件程序和/或指令集以执行设备100的各种功能并处理数据。其中,外围设备接口106、处理单元102和存储器控制器104可以被实现在单个芯片诸如芯片104上。外围设备接口106、处理单元102和存储器控制器104也可以被实现在独立的芯片上。
射频(Radio Frequency,RF)电路126用于接收和发送RF信号,也被叫做电磁信号。RF电路126将电信号转换为电磁信号或将电磁信号转换为电信号,并且经由电磁信号与通信网络及其他通信设备进行通信。RF电路126可以包括用于执行上述功能的电路,包括但不限于天线系统、RF收发器、一个或多个放大器、调谐器、一个或多个振荡器、数字信号处理器、编解码芯片组等等。RF电路126可以通过无线通信与网络以及其它设备通信,网络例如可以是互联网(也被称为万维网(WWW))、内联网、无线局域网(LAN)或者城域网(MAN))。无线通信可以包括多种通信标准、协议和技术中的任何一种,包括但不限于全球移动通信系统(GSM)、增强数据GSM环境(EDGE)、高速下行链路分组接入(HSDPA)、高速上行链路分组接入(HSUPA)、演进、纯数据(EV-DO)、HSPA、HSPA+、双小区HSPA(DC-HSPDA)、长期演进(LTE)、近场通信(NFC)、宽带码分多址(WCDM)、码分多址(CDM)、时分多址(TDM)、蓝牙、无线保真(Wi-Fi)(例如,IEEE802.11a、IEEE 802.lib、IEEE 802.Ilg或IEEE 802.Iln)、互联网协议语音技术(VoIP)、Wi-MAX、电子邮件协议(例如,互联网消息访问协议(IMAP)和/或邮局协议(POP))、即时消息(例如,可扩展消息处理和到场协议(XMPP)、用于即时消息和到场利用扩展的会话发起协议(SMPLE)、即时消息和到场服务(MPS))或短消息服务(SMS),以及无线通信也包括在本文档提交日期还未开发出的通信协议。
音频电路128、扬声器130和麦克风132提供用户和设备100之间的音频接口。音频电路128从外围设备接口118接收音频数据,将音频数据转换为电信号,并将电信号传输到扬声器130。扬声器130将电信号转换为人耳可听见的声波。音频电路128还接收由麦克风132将声波进行转换得到电信号。音频电路128将电信号转换为音频数据,并将音频数据传输到外围设备接口106以用于处理。音频数据可以由外围设备接口106传输至存储器108、处理单元102或RF电路126。音频电路128还可以包括耳麦插孔。耳麦插孔提供了音频电路128与可移除的音频输入/输出外围设备之间的接口,该外围设备可以是仅输出的耳机,该外围设备也可以是具有输出(例如,单耳耳机或双耳耳机)和输入(例如,麦克风)两者的耳麦。
I/O子系统134将终端设备100上的输入/输出外围设备诸如触摸显示屏136和其他输入控制设备152耦接到外围设备接口106。I/O子系统134可以包括显示控制器134、光学传感器控制器140、强度传感器控制器144或用于其它输入控制设备116的其他输入控制器154。该其他输入控制器154从其它输入控制设备116接收电信号或发送电信号到其它输入控制设备116。其它输入控制设备116任选地包括实体按钮(例如,按压按钮、摇臂按钮等)、拨号盘、滑动开关、操纵杆、点击式转盘等等。其他输入控制器154也可以任选地耦接到 以下各项中的任一者:键盘、红外端口、USB端口和指向设备(例如鼠标)。实体按钮也可以包括用于扬声器130、耳机或者耳麦的音量控制的增音量按钮或者减音量按钮。实体按钮也可以包括用于对终端设备100开关机和锁定终端设备100的按压按钮。
触摸显示屏136提供终端设备100与用户之间的输入接口和输出接口。显示控制器134从触摸显示屏136接收电信号或者将电信号发送至触摸屏112。触摸显示屏136向用户显示视觉输出。视觉输出任选地包括图形、文本、图标、动态画面、视频和它们的任何组合。
触摸显示屏136可以具有基于触觉或触感触摸来接收来自用户的输入的传感器或传感器组。触摸显示屏136和显示控制器134(与存储器108中的任何相关联的模块或指令集一起)检测触摸显示屏136上的触摸(和该触摸的任何移动或中断),并且将所检测到的触摸转换为与显示在触摸显示屏136上的用户界面对象(例如,一个或多个虚拟按钮、图标、网页、图形或图像)的交互。其中,触摸显示屏136和用户之间的触摸点可以对应于用户的手指,也可以对应于触笔。
触摸显示屏136可以使用IXD(液晶显示器)技术、LPD(发光聚合物显示器)技术或LED(发光二极管)技术。触摸显示屏136和显示控制器134可以使用现在已知的或以后将开发出的多种触摸感测技术中的任何技术,该触摸感测技术包括但不限于电容性的、电阻性的、红外线的或表面声波触摸感测技术。在具体实现过程中,可以使用投射式互电容感测技术。
触摸显示屏136可以具有超过100dpi的视频分辨率或者其他视频分辨率。用户任选地使用任何合适的物体或附加物诸如触笔、手指等等与触摸显示屏136触摸。在一些实施例中,用户界面可以被设计为基于手指的触摸和手势与用户进行交互,由于手指在触摸显示屏136上的触摸区域较大,所以这可能不如基于触笔的输入精确。在一些实施例中,终端设备100将基于手指的粗略输入转译为精确的指针/光标位置或命令以执行用户所期望的动作。在一些实施例中,终端设备100将用户的手势或手部动作转换为用于操控触摸显示屏136显示的虚拟物体或其他可操作对象的控制操作。
在一些实施例中,除了触摸屏之外,终端设备100还可以包括用于通过用户的触摸激活或去激活特定功能的触摸板。在一些实施例中,触摸板区域与触摸显示屏136区域为不同区域,两个区域可以相邻,也可以不相邻。触摸板不显示视觉输出。
终端设备100还可以包括用于为各种部件供电的电力系统138。电力系统138可以包括电力管理系统、一个或多个电源(例如,电池、交流电(AC))、再充电系统、电力故障检测电路、功率转换器或逆变器、电力状态指示器(例如,发光二极管(LED))和任何其它与电力的生成、管理和分配相关联的部件。电力系统还可以包括无线充电接收器,用于通过无线充电的方式接收电能,从而为终端设备100充电。
终端设备100还可以包括耦接到I/O子系统134中的光学传感器控制器140一个或多个光学传感器142。光学传感器142可以包括电荷耦合元件(CCD)或互补金属氧化物半导体(CMOS)光电晶体管。光学传感器142从环境接收通过一个或多个透镜而投射的光,并且将光转换为表示图像的数据。
终端设备100还可以包括耦接到I/O子系统134中强度传感器控制器144的触摸强度传感器146。触摸强度传感器146可以包括一个或多个电容式力传感器、电力传感器、压 电力传感器、光学力传感器或其它强度传感器。触摸强度传感器146用于从环境接收触摸强度信息。
终端设备100还可以包括耦接到外围设备接口106一个或多个接近传感器148。作为另外一种选择,接近传感器148耦接至I/O子系统134中的输入控制器160。在一些实施例中,当终端设备100被置于用户的耳朵附近时(例如,当用户正在进行电话呼叫时),接近传感器关闭并且禁用触摸显示屏136。
终端设备100还可以包括耦接到外围设备接口106的一个或多个加速度计150。作为另外一种选择,加速度计150任选地耦接至I/O子系统134中的输入控制器160。在一些实施例中,基于对来自该一个或多个加速度计的数据的分析,从而在触摸屏显示器上以纵向视图或横向视图被显示。终端设备100除了一个或多个加速度计150之外还可以包括GPS(或GLONASS或其他全球导航系统)接收器,以用于获取关于设备100的位置信息。
在一些实施例中,存储在存储器108中可以包括操作系统110,以及包括以下模块中至少一种:通信模块(或指令集)112、触摸/运动模块(或指令集)114、图形模块(或指令集)116、电话模块118、录音模块120、视频和音乐播放模块122以及在线音频/视频模块124,上述模块为软件代码,处理单元102通过读取存储器108中相应的代码,从而实现相应的模块的动能。
操作系统110(例如,Darwin、RTXC、LINUX、UNIX、OS X、WINDOWS、嵌入式操作系统诸(例如VxWorks)、Android、iOS、windows phone、Symbian、BlackBerry OS或windows mobile)包括用于控制和管理一般系统任务(例如,存储器管理、存储设备控制、功率管理等)的各种软件部件和/或驱动器,并且用于各种硬件部件和软件部件之间的通信。
通信模块112用于通过一个或多个外部端口156与其他设备进行通信,并且还包括用于处理由RF电路126和/或外部端口156所接收的数据的各种软件部件。外部端口156(例如,通用串行总线(USB)等)用于直接连接到其他设备或者间接地通过网络(例如,互联网、无线LAN等)进行连接。在一些实施例中,外部端口156可以是与电力系统138连接的充电接口,充电接口用于与充电线连接,从而通过充电线获取来自外部的电能,外部端口156也可以是数据接口,数据接口用于与数据线连接,从而通过数据线获取来自外部的数据,应理解,外部端口156可以同时具有数据接口和充电接口的功能,相应的,数据线和充电线也可以是同一根线。
触摸/运动模块114可以用于检测与触摸显示屏136(结合显示控制器134)和其它触摸设备(例如,触摸板)的触摸。触摸/运动模块114可以包括用于执行与触摸的检测相关的各种操作的各种软件部件,诸如确定是否已经发生了触摸(例如,检测手指按下事件)、确定触摸的强度(例如,触摸的压力或压强)、确定是否存在触摸的移动并且在触摸显示屏136表面上跟踪该移动(例如,检测一个或多个手指拖动事件)、以及确定触摸是否已经停止(例如,检测手指抬起事件或者触摸中断)。触摸/运动模块114从触摸显示屏136表面接收触摸数据。确定触摸点的移动可以包括确定触摸点的速率(量值)、速度(量值和方向)或者加速度(量值和/或方向的改变),触摸点的移动由一系列触摸数据来表示。这些操作可以被应用于单点触摸(例如,单指触摸)或者多点同时触摸(例如,"多点触摸"/多个手指触摸)。在一些实施例中,触摸/运动模块114和显示控制器134检测触摸板上的触摸。在一 些实施例中,触摸/运动模块114可以使用一个或多个强度阈值的集合来确定操作是否已经由用户执行(例如,确定用户是否已经"点击"图标)。触摸/运动模块114可以检测用户的手势输入。触摸显示屏136表面上的不同手势具有不同触摸图案(例如,所检测到的触摸的不同移动或强度)。因此,可以通过检测具体触摸图案来检测手势。例如,检测手指轻击手势包括检测手指按下事件,然后在与手指按下事件相同的位置(或基本上相同的位置)处(例如,在图标位置处)检测手指抬起(抬离)事件。又如,在触摸显示屏136表面上检测手指轻扫手势包括检测手指按下事件、然后检测一个或多个手指拖动事件、并且随后检测手指抬起(抬离)事件。
图形模块116可以包括用于在触摸显示屏136或其它显示器上呈现和显示图形的各种软件部件,包括用于改变所显示的图形的视觉冲击(例如,亮度、透明度、饱和度、对比度、材质或其它视觉特性)的部件。在本申请中,术语"图形"包括可被显示给用户的任何对象,例如但不限于文本、网页、图标(诸如包括软键的用户界面对象)、数字图像、视频、动画等等。
在一些实施例中,图形模块116可以存储表示待显示图形的数据。每个图形可以被分配有对应的代码。图形模块116接收指定待显示图形的一个或多个代码,在必要的情况下还可以接收坐标数据和其他图形属性数据,然后生成相应的图像数据以输出至显示控制器134,以用于显示在触摸显示屏136上。
结合RF电路126、音频电路128、扬声器130、麦克风132、触摸显示屏136、显示控制器156、触摸/运动模块114和图形模块116中至少一种,电话模块118可以被用于进行呼叫、接听来电、进行会话以及在会话结束时断开或挂断。如上所述,无线通信可以使用多种通信标准、协议和技术中的任一种。
结合音频电路128、扬声器130、麦克风132、触摸显示屏136、显示控制器156、触摸/运动模块114和图形模块116中至少一种,录音模块120可以被用于录音,并执行录音过程中的开启、暂定、继续、完成等与用户交互的动作,以及存储录入的音频数据。
结合触摸显示屏136、显示系统控制器156、触摸/运动模块114、图形模块116、音频电路128、扬声器130、外部端口156和RF电路126中至少一种,视频和音乐播放模块122包括允许用户获取和播放以一种或多种文件格式(诸如MP3或AAC文件)存储的音频/视频数据和其它音频/视频文件的可执行文件,以及用于显示、呈现或以其它方式回放音频/视频(例如,在触摸屏112上或在经由外部端口156连接的外部显示器上)。在一些实施例中,设备100可以任选地包括音频/视频播放器。其中,视频和音乐播放模块122可以包括视频播放模块和音乐播放模块。
结合触摸显示屏136、显示系统控制器156、触摸/运动模块114、图形模块116、音频电路128、扬声器130、外部端口156和RF电路126中至少一种,在线音频/视频模块124用于访问、接收(例如,通过流式传输和/或下载)、回放(例如在触摸屏上或在经由外部端口124所连接的外部显示器上)以及以其他方式管理一种或多种文件格式(诸如H.264/H.265,AMR-WB或者EVS)的在线音频/视频数据。其中,在线音频/视频模块124可以包括在线音频模块和在线视频模块。
存储器108也可以包括视频会议模块、电子邮件客户端模块、即时消息模块、用于静态图像或视频图像的相机模块、文字处理应用模块、图像编辑模块、绘图模块、JAVA启用模块、加密模块、数字权限管理模块、声音识别模块或声音复制模块。
上述每个模块和应用程序可以用于执行本申请中所描述的方法,也可以作为本申请所描述的方法对应的模块。这些模块(即指令集)不必被实现为各自独立的软件程序、过程或模块,因此这些模块的子组可以任选地在各个实施例中被组合或以其它方式重新布置。在一些实施例中,存储器108任选地存储上述模块的子组。存储器中的上述模块和应用程序同样可以通过集成电路或者软硬件结合的方式实现,此外,存储器108任选地存储上面未描述的附加模块和数据结构。
图2为本申请实施例提供的一种终端设备100的结构示意图。如图2所示,终端设备可以包括触摸显示屏136、深度摄像头156以及彩色摄像头158。深度摄像头156用于采集深度图像。彩色摄像头158用于采集彩色图像,例如RGB图像。触摸显示屏136用于显示目标画面,该目标画面包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体。深度摄像头156、彩色摄像头158以及触摸显示屏136位于终端设备100的同一侧。当用户在观看触摸显示屏136时,深度摄像头156以及彩色摄像头158可以拍摄用户的手部图像。如图2所示,深度摄像头156与彩色摄像头158相邻,深度摄像头156和彩色摄像头158同步拍摄的场景可以视为同一场景。
图3为本申请实施例提供的一种终端300的逻辑结构示意图。终端300包括:显示单元302、获取单元304、识别单元306以及处理单元308。应理解,终端300中的单元可以通过软件编程实现,也可以通过硬件电路实现,也可以部分单元通过软件编程实现而另一部分单元通过硬件电路实现。应理解,终端300可以是图1中的终端设备100,进一步的,终端300的屏幕可以是触摸显示屏136。下面介绍一下终端300中各单元的功能。
显示单元302,用于显示目标画面,该目标画面包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体。获取单元304,用于获取F组手部图像。识别单元306,用于依次识别这F组手部图像中的手部关节点位置,得到F组手部关节点的空间位置,每组手部关节点的空间位置为一组手部图像对应的手部关节点的空间位置,F为大于0的整数。一组手部图像对应的手部关节点的空间位置可以包括各手部关节点在该组手部图像中的二维位置以及深度信息,从而组成了三维位置。例如,一组手部关节点的空间位置包括手部21个关节点的三维位置。一组手部图像可以为一帧深度图像,也可以包括一帧深度图像和一帧彩色图像,还可以为一帧彩色图像。处理单元308,用于执行这F组手部关节点的空间位置对应的控制操作,该控制操作用于调节该目标画面中该虚拟物体的位置和/或形态。显示单元302,还用于显示调整后的目标画面。
显示单元302可以显示目标画面中该虚拟物体发生位置和/或形态改变得到的动态画面,也可以显示通过后置摄像头或前置摄像头采集的图像,还可以显示与用户的手部相对应的虚拟手部图像,还可以显示其他图像。处理单元308可以确定该F组手部关节点的空间位置对应的控制操作,并根据该控制操作调整显示单元302显示的画面。可以理解,处理单元308可以根据不同的控制操作控制显示单元302显示不同的画面。显示单元302可以是图1中的触摸显示屏136,也可以是非触摸显示屏,本申请实施例不作限定。在实际应用 中,用户可以通过手势或手部动作操控显示单元302显示的画面中的虚拟物体。终端300可以识别用户的手势或手部动作,进而将识别到手势或手部动作转换为调节显示单元302显示的画面的控制操作。获取单元304的功能可以由图1中的深度摄像头156以及彩色摄像头158实现。在实际应用中,深度摄像头156以及彩色摄像头158可以同步拍摄图像,拍摄一次,可以得到一组手部图像。识别单元306和处理单元308均可以是图1中的处理单元120。一组手部图像可以包括深度摄像头和彩色摄像头同步拍摄同一场景分别得到的一帧深度图像和一帧彩色图像(例如RGB图像),也可以仅包括深度摄像头拍摄的一帧深度图像,还可以仅包括彩色摄像头拍摄的一帧彩色图像。深度摄像头和彩色摄像头同步拍摄同一场景是指该深度摄像头和该彩色摄像头拍摄同一场景的时间间隔小于时间阈值。该时间阈值可以是1毫秒、5毫秒、10毫秒等。可选的,任一组手部图像包括的彩色图像和深度图像是同一时刻拍摄同一场景得到的图像。
可选的,深度摄像头156以及彩色摄像头158拍摄到一组手部图像后,可以将该组手部图像传输至识别单元306;该识别单元306识别该组手部图像中的手部关节点位置,得到一组手部关节点的空间位置,并传输至处理单元308;处理单元308可以得到该组手部关节点的空间位置,就确定该组手部关节点的空间位置对应的手势类型,进而执行该手势类型对应的控制操作。也就是说,识别单元306可以获得一组手部图像后就识别该组手部图像中的手部关节点位置,进而由处理单元308确定该组手部图像对应的手势类型。识别单元306和处理单元308可以是不同的单元,也可以是同一个单元(图1中的处理单元120)。可以理解,本申请实施例中,识别单元306可以利用一组手部图像(包括一帧深度图像)得到一组手部关节点的空间位置,进而由处理单元308确定该组手部图像对应的手势类型以及用户的手部当前所处的空间位置。
可选的,深度摄像头156以及彩色摄像头158拍摄多组手部图像后,可以将这些组手部图像传输至识别单元306;该识别单元306识别这些组手部图像中的手部关节点位置,得到多组手部关节点的空间位置。多组是指两组或两组以上。可以理解,获取单元304可以每次向识别单元306发送一组手部图像,即获得一组手部图像就向识别单元306发送一组手部图像;也可以每次向识别单元306发送多组手部图像,即获取多组手部图像后,将这些组手部图像一起向识别单元306发送。
可选的,彩色摄像头158拍摄多组手部图像后,可以将这些组手部图像传输至识别单元306;识别单元306识别这些组手部图像中的手部关节点位置,得到多组手部关节点的空间位置。彩色摄像头158拍摄的每组手部图像为一帧彩色图像。识别单元306可以利用多帧彩色图像识别这些帧手部图像中的手部关节点位置,得到多组手部关节点的空间位置。
可以理解,当一组手部图像包括一帧深度图像的情况下,识别单元306识别该组手部图像中的手部关节点位置,可以得到一组手部关节点的空间位置(三维位置)。当一组手部图像不包括深度图像的情况下,识别单元306识别至少两组手部图像中的手部关节点位置,可以得到至少一组手部关节点的空间位置(三维位置)。识别单元306可以根据F组手部图像,识别该F组手部图像中的手部关节点位置,从而得到F组手部关节点的空间位置。当一组手部图像包括深度图像时,F可以等于1。当一组手部图像部包括深度图像时,F至少为2。
可以理解,任一种自然手势均可以表示为一组手部关节点的空间位置,任一种手部动作均可以表示为多组手部关节点的空间位置。反过来,通过一组手部关节点的空间位置可以确定任一种自然手势,通过多组手部关节点的空间位置可以确定任一种手部动作。自然手势是指任意手势,即用户可以做到的任意手势。可以理解,终端300可以根据一组手部关节点的空间位置确定一个手势,也可以根据多组手部关节点的空间位置确定一个手势序列或手部动作,进而执行该手势或该手部动作对应的控制操作。F组手部关节点的空间位置对应的控制操作就是由F组手部关节点的空间位置确定的手势或手部动作对应的控制操作。终端300可以预设有一组手部关节点的空间位置与控制动作的对应关系,也可以预设有F组手部关节点的空间位置的组合与控制动作的对应关系。在实际应用中,用户可以通过手势以及手部动作操控显示单元302显示的画面中的虚拟物体。例如通过手势或手部动作调整显示单元202显示的画面中虚拟物体的位置或形态。
本申请实施例中,终端设备通过手部关节点的空间位置确定用户的控制操作,能够覆盖所有的自然手势以及手部动作,操作效率更高、更自然,以便于提高用户的人机交互体验。
在图3中未详细介绍获取单元304如何得到F组手部图像,下面做一下具体介绍。在一个可选的实现方式中,如图4所示,获取单元304包括:彩色传感器3042、深度传感器3044、对齐子单元3046以及分割子单元3048。获取单元304中的子单元可以通过软件编程实现,也可以通过硬件电路实现,也可以部分单元通过软件编程实现而另一部分单元通过硬件电路实现。图4中的获取单元304可以不包括彩色传感器3042。也就是说,彩色传感器是可选的,而不是必须的。
彩色传感器3042,用于拍摄原始彩色图像。深度传感器3044,用于拍摄原始深度图像。对齐子单元3046,用于对该原始彩色图像和该原始深度图像做空间对齐。分割子单元3048,用于分别对该原始彩色图像和该原始深度图像做手部分割,得到一组手部图像。对齐子单元3046以及分割子单元3048的功能可以由图1中的处理单元120实现。彩色传感器3042可以是图1中彩色摄像头158中的传感器,深度传感器3044可以是图1中深度摄像头158中的传感器。彩色传感器3042和深度传感器3044可以同步拍摄图像,该彩色传感器得到该原始彩色图像,该深度传感器得到该原始深度图像。彩色传感器3042和深度传感器3044分辨率可以相同,也可以不同。也就是说,该原始深度图像和该原始彩色图像的分辨率可以相同,也可以不同。当该原始深度图像和该原始彩色图像的分辨率不同时,对齐子单元3046可以采用图像缩放算法(例如双线性插值方法)将这两个图像调整到大小一致。举例来说,彩色图像的分辨率是800*1280,深度图像的分辨率是400*640,对齐子单元3046采用双线性插值,将该彩色图像缩小到400*640。
由于彩色图像数据与深度图像数据的空间坐标系是不同的,前者的原点是彩色摄像头(RGB摄像头),后者的原点是红外摄像头,两者会有相应的误差。因此,需要对该原始彩色图像和该原始深度图像做空间对齐。对原始彩色图像和原始深度图像做空间对齐可以是该原始彩色图像保持不变,调整该原始深度图像,使得该原始深度图像与该原始彩色图像实现空间对齐;也可以是该原始深度图像保持不变,调整该原始彩色图像,使得该原始深度图像与该原始彩色图像实现空间对齐。可选的,对齐子单元3046可以依据深度传感器 和彩色传感器标定的两摄像头之间的旋转平移矩阵,对该原始深度图像进行旋转平移,以便与该原始彩色图像对齐。当该原始深度图像与该原始彩色图像的分辨率不同时,对齐子单元3046或处理单元308先将这两个图像调整到大小一致再进行空间对齐。当该原始深度图像与该原始彩色图像的分辨率相同时,对齐子单元3046对这两个图像直接做空间对齐。分割子单元3048是可选的,分割子单元3048的作用是从该原始彩色图像中提取出手部所处的图像区域以及从该原始深度图像中提取出手部所处的图像区域。获取单元304也可以不包括分割子单元3048。也就是说,获取单元304可以将实现空间对齐后的原始彩色图像和原始深度图像作为目标组手部图像发送给识别单元306。
在该实现方式中,对原始彩色图像和原始深度图像做空间对齐,可以保证该彩色原始图像中手部关节点的位置与该原始深度图像中的手部关节点的位置一致,实现简单。
在图3中未详细介绍识别单元306如何识别手部图像中的手部关节点位置,下面做一下具体介绍。
在一个可选的实现方式中,终端300还包括:检测单元310,用于根据目标组手部图像包括的彩色图像和深度图像中的至少一项检测手部所处的位置区域;识别单元306,用于根据该彩色图像和该深度图像中的至少一项识别该位置区域中的手部的手部关节点位置,得到一组手部关节点的空间位置。
在一个可选的实现方式中,终端300还包括:
检测单元310,用于检测目标组手部图像包括的彩色图像中的手部所处的第一位置区域。识别单元306,用于识别该目标组手部图像包括的深度图像中第二位置区域中的手部的手部关节点位置,得到一组手部关节点的空间位置,该第二位置区域为该第一位置区域对应在该深度图像中的区域。考虑到深度图像和彩色图像的分辨率可能不同,该第二位置区域和该第一位置区域可以相同,也可以成一定比例。
该目标组手部图像包括该彩色图像和该深度图像,该目标组手部图像为上述F组手部图像中任一组图像。该深度图像和该彩色图像为同步拍摄同一场景得到的图像。检测单元310可以是图1中的处理单元120。检测单元310可以采用训练好的检测网络对彩色图像中的手部所处的位置区域进行检测,将检测到的第一位置区域(手的位置结果)输入到识别单元306。该第一位置区域可以用一组坐标表示。该检测网络是处理单元308采用深度学习方法,利用大量彩色图像(RGB图像)样本进行训练得到的网络。识别单元306可以采用训练好的识别网络识别深度图像中的手部关节点位置,得到一组手部关节点的空间位置。可选的,识别单元306获取到第一位置区域和深度图像后,确定该第一位置区域对应在该深度图像中的第二位置区域,并利用训练好的识别网络对该第二位置区域的图像做手部关节点回归,得到该组手部关节点的空间位置。可选的,检测单元310在检测到第一位置区域后,确定该第一位置区域对应在该深度图像中的第二位置区域,并将该第二位置区域的图像发送给识别单元306。识别单元306利用训练好的识别网络对该第二位置区域的图像做手部关节点回归,得到该组手部关节点的空间位置。识别网络是处理单元308采用深度学习方法,利用大量深度样本进行训练得到的网络。在实际应用中,识别单元306识别一个深度图像可以得到21个手部关节点的三维空间位置。
在一个可选的实现方式中,检测单元310,用于检测目标组手部图像包括的彩色图像 中的手部所处的第一位置区域。识别单元306,用于识别该第一位置区域中的手部的手部关节点位置,得到第一组手部关节点的空间位置;识别该目标组手部图像包括的深度图像中第二位置区域中的手部的手部关节点位置,得到第二组手部关节点的空间位置,该第二位置区域为该第一位置区域对应在该深度图像中的区域;合并该第一组手部关节点的空间位置和该第二组手部关节点的空间位置,得到该目标组手部图像对应的一组手部关节点的空间位置。
该目标组手部图像包括该彩色图像和该深度图像,该目标组手部图像为上述F组手部图像中任一组图像。从彩色图像得到的手部关节点的空间位置可以是各像素点的坐标(二维坐标),从深度图像得到的手部关节点的空间位置可以是各像素点的坐标以及场景中各手部关节点到深度传感器的距离(深度信息)。可以理解,从深度图像得到的手部关节点的空间位置包括手部在该深度图像中的二维位置以及深度信息,从而组成了三维位置。在实际应用中,识别单元306可以分别对彩色图像和深度图像进行识别,合并两个识别结果(两组手部关节点的空间位置),得到一组手部关节点的二维空间位置。可选的,该第一组关节的空间位置包括21个二维坐标且依次表示第1手部关节点至第21手部关节点的空间位置;该第二组关节的空间位置包括21个三维坐标且依次表示该第1手部关节点至该第21手部关节点的三维空间位置。合并该第一组手部关节点的空间位置和该第二组手部关节点的空间位置可以是合并这两组手部关节点的空间位置的二维坐标,每个手部关节点的第三维坐标保持不变,第三维坐标可以是场景中手部关节点到深度传感器的距离。可选的,识别单元306获取到第一位置区域和深度图像后,确定该第一位置区域对应在该深度图像中的第二位置区域,并利用训练好的识别网络对该第二位置区域的图像做手部关节点回归,得到21个手部关节点的空间位置以及21个置信度。这21个置信度与这21个手部关节点的空间位置一一对应。识别单元306获取到第一位置区域和彩色图像后,利用训练好的识别网络对该第一位置区域的图像做手部关节点回归,得到21个手部关节点的空间位置以及21个置信度。这21个置信度与这21个手部关节点的空间位置一一对应。一个手部关节点的空间位置对应的置信度越高,该手部关节点的空间位置越准确。可以理解,每个手部关节点对应两个空间位置,一个是识别单元306识别深度图像得到的三维空间位置,另一个是识别单元306识别彩色图像得到的二维空间位置。识别单元306可以将与同一个手部关节点相对应的两个手部关节点的空间位置合并为一个手部关节点的空间位置。举例来说,一个手部关节点对应的两个空间位置为分别为(A,B)和(C,D,E);若该(A,B)的置信度高于(C,D,E)的置信度,则将这两个空间位置合并为(A,B,E);否则,将这两个空间位置合并为(C,D,E)。
在该实现方式中,通过合并从深度图像得到的手部关节点的空间位置以及从彩色图像得到的手部关节点的空间位置,可以更准确地确定手部关节点的空间位置。
在一个可选的实现方式中,检测单元308,用于检测深度图像的手部所处的位置区域;识别单元306,用于识别该位置区域中的手部的手部关节点位置,得到一组手部关节点的空间位置。该深度图像为目标组手部图像,即上述F组手部图像中的任一组手部图像。
检测单元308可以采用训练好的检测网络对深度图像中的手部所处的区域进行检测,将检测到的位置区域(手的位置结果)输入到识别单元306。检测网络可以是处理单元308 采用深度学习方法,利用大量深度图像样本进行训练得到的网络。识别单元306可以采用训练好的识别网络是识别深度图像中的手部关节点位置,得到一组手部关节点的空间位置。可选的,识别单元306获取到深度图像中的手部所处的位置区域和深度图像后,利用训练好的识别网络对该位置区域的图像做手部关节点回归,得到该组手部关节点的空间位置。可选的,检测单元308在检测到深度图像中的手部所处的位置区域后,将该位置区域的图像发送给识别单元306。识别单元306利用训练好的识别网络对该位置区域的图像做手部关节点回归,得到该组手部关节点的空间位置。
在该实现方式中,利用深度图像可以快速地确定每组手部图像中的手部关节点的空间位置。
在一个可选的实现方式中,识别单元306利用多帧彩色图像识别手部关节点的位置,得到一组或多组关节点的空间位置。在该实现方式中,终端300可以仅包括彩色传感器而不包括深度传感器。
在图3中未详细介绍处理单元308如何确定F组手部关节点的空间位置对应的控制操作,下面通过几个实施例做一个详细介绍。
实施例一
处理单元308获得来自识别单元306的一组手部关节点的空间位置;根据该组手部关节点的空间位置,计算手部每根手指上4个手部关节点之间的角度;根据每根手指上4个手部关节点之间的角度,确定每根手指的弯曲状态;根据手部每根手指的弯曲状态,确定该组手部关节点的空间位置对应的一个手势类型;执行该手势类型对应的控制操作。在该实施例中,处理单元308可以根据一组手部关节点的空间位置,分别确定手部每根手指的弯曲状态,再综合每根手指的弯曲状态确定该组手部关节点的空间位置对应的手势类型,进而执行该手势类型对应的控制操作。图5为本申请实施例提供的由一组手部关节点的空间位置确定握拳手势的示意图,其中,每个圆点表示一个手部关节点。如图5所示,处理单元308可以根据图5中的这一组手部关节点的空间位置确定握拳手势。图6为本申请实施例提供的由一组手部关节点的空间位置确定张开手势的示意图,其中,每个圆点表示一个手部关节点。如图6所示,处理单元308可以根据图6中的这一组手部关节点的空间位置确定张开手势。处理单元308在确定该控制操作后,可以相应地调整显示单元302显示的画面。例如,调整显示单元302当前显示的画面中虚拟物体的位置和/或形态。
实施例二
处理单元308获得来自识别单元306的F组手部关节点的空间位置;确定该F组手部关节点的空间位置对应的M个手势类型,执行该M个手势类型对应的控制操作。可选的,F等于M,M大于1。也就是说,处理单元308可以根据一组手部关节点的空间位置,确定一个手势类型。一组手部关节点的空间位置可以不对应任一种手势类型,因此,处理单元308根据该F组手部关节点的空间位置确定的手势类型的个数可能少于F。可选的,M小于F,M为大于0的整数,F为大于1的整数。举例来说,手部从握拳手势转换到张开手势对应20组手部关节点的空间位置,处理单元308根据这20组手部关节点的空间位置仅确定了该握拳手势和该张开手势。处理单元308可以每次获得来自识别单元306的一组手部关节点的空间位置,并确定该组手部关节点的空间位置对应的一个手势类型。处理单 元308根据每组手部关节点的空间位置确定一个手势类型的具体实现方式与实施例一中的实现方式相同。处理单元308可以每次获得来自识别单元306的多组手部关节点的空间位置,并确定这些组手部关节点的空间位置对应的一个或多个手势类型。
可选的,M个手势类型对应的控制操作与该M个手势类型的手势类型变化无关。也就是说,M个手势类型对应的控制操作与该M个手势类型中各手势类型被确定的顺序无关。M个手势类型的手势类型变化是指该M个手势类型中各手势类型被确定的顺序。由于F组手部图像中各组手部图像的获得有一定的先后顺序,所以这M个手势类型的确定也有一定的先后顺序。举例来说,处理单元308依次确定了第一手势类型、第二手势类型以及第三手势类型,这三个手势类型对应目标控制操作;处理单元308依次确定了第二手势类型、第一手势类型以及第三手势类型,这三个手势类型也对应该目标控制操作。
可选的,M个手势类型对应的控制操作与该M个手势类型的手势类型变化相关。处理单元308执行该M个手势类型对应的控制操作可以是根据该M个手势类型的手势类型变化,确定该M个手势类型对应的控制操作,M大于1。举例来说,处理单元308依次确定了第一手势类型、第二手势类型以及第三手势类型,处理单元308根据这三个手势类型的手势类型变化,确定这三个手势类型对应第一控制操作;处理单元308依次确定了第二手势类型、第一手势类型以及第三手势类型,处理单元308根据这三个手势类型的手势类型变化,确定这三个手势类型对第二控制操作,该第二控制操作与该第一控制操作不同。可以理解,处理单元308可以确定F组手部关节点的空间位置对应的至少两个手势类型,根据该至少两个手势类型的手势类型变化,执行该至少两个手势类型对应的控制操作。
处理单元308可以预设有手势序列与控制操作的对应关系。处理单元308确定一个手势类型后,可以将该手势类型与之前得到的1个或多个手势类型组合起来,得到一个手势序列;在该手势序列对应某个控制操作的情况下,执行该控制操作;在该手势序列不对应任一控制操作的情况下,确定下一组手部关节点的空间位置对应的手势类型,重复之前的操作。可见,处理单元308可以每次获取一组手部关节点的空间位置,并确定该组手部关节点的空间位置对应的手势类型。另外,处理单元308每次可以获得多组手部关节点的空间位置,并根据该多组手部关节点的空间位置确定一个手势序列。在实际应用中,用户可以通过多个连续的手势执行某种控制操作。例如,用户可以利用从握拳手势到张开手势之间的连续手势(手势序列),实现对虚拟物体的某个控制操作。
实施例三
处理单元308获得来自识别单元306的F组手部关节点的空间位置;根据这F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化,F为大于1的整数;根据该空间位置变化调整显示单元302显示的画面中虚拟物体的位置,和/或,根据该空间位置变化调整显示单元302显示的画面中虚拟物体的形态。具体的,根据该空间位置变化调整显示单元302显示的画面中虚拟物体的位置的方法如下:根据该空间位置变化,确定手部的移动轨迹;按照该手部的移动轨迹移动该虚拟物体,该虚拟物体的移动轨迹与该手部的移动轨迹一致。该虚拟物体的移动轨迹与该手部的移动轨迹一致是指该虚拟物体的移动轨迹与该手部的移动轨迹的形状相同且大小成比例。例如,用户的手部向右移动20厘米,虚拟物体向右5厘米;该用户的手部向左移动30厘米,该虚拟物体向左移动7.5厘 米。手部的空间位置与该虚拟物体的位置相对应,即将手部的空间位置映射到虚拟物体的空间位置,使用户感觉手部直接与虚拟物体相接触。
处理单元308获得一组手部关节点的空间位置后,可以将计算该组手部关节点的空间位置相对于之前得到的某组手部关节点的空间位置的空间位置变化(手部位移);根据该空间位置变化调整显示单元302显示的画面中虚拟物体的形态。可选的,终端300还包括震动单元312,用于进行震动,该震动单元312的震动强度与手部至终端300的距离正相关或负相关。可选的,手部距离终端300越远,震动单元312的震动强度越弱。可选的,手部距离终端300越近,震动单元312的震动强度越弱。在实际应用中,震动单元312可以根据手部至终端300的距离调整其震动强度。具体的,根据该该空间位置变化调整显示单元302显示的画面中虚拟物体的形态的方法如下:根据该空间位置变化,确定手部的动作;执行与该手部的动作相对应的调整操作,该调整操作用于调整显示单元302显示的画面中虚拟物体的形态。可选的,显示单元302将用户的手势或手部动作对应的图像(虚拟手部)显示到其显示的画面中。调整显示单元302显示的画面中虚拟物体的形态可以是调整显示单元302显示的画面中虚拟物体的方向、大小、形状等。可选的,显示单元302显示的画面中包括用户的手部映射到该画面中的虚拟手部。用户可以将该虚拟手部当作自己的手部相应的操作该画面中的虚拟物体。也就是说,用户操作自己的手部就是操作该虚拟手部,即该虚拟手部的动作与用户的手部动作一致。终端300可以利用单帧图像得到手部关节点的三维空间位置,由连续多帧图像可以得到手部关节点的空间位置变化,并根据该空间位置变化确定的控制操作来操控虚拟物体发生位置和形态变化。
实施例四
处理单元308获得来自识别单元306的F组手部关节点的空间位置,确定该F组手部关节点的空间位置对应的M个手势类型,F大于1,M小于或等于F,且至少为1;根据该F组手部关节点的空间位置和该M个手势类型,执行控制操作。下面介绍几种根据该F组手部关节点的空间位置和该M个手势类型,执行控制操作的具体实现方式。
可选的,识别单元306根据F组手部关节点的空间位置中的一组手部关节点的空间位置确定目标手势类型后,根据该F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据该目标手势类型和该空间位置变化,执行控制操作。其中,该目标手势类型用于调整显示单元302显示画面中虚拟物体的位置。该目标手势可以是握拳手势、张开手势等。具体的,识别单元306根据一组手部关节点的空间位置确定目标手势类型后,在根据该组之前的一组或多组手部关节点的空间位置确定的手势类型均为该目标手势时,根据该组手部关节点的空间位置和该一组或多组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据该空间位置变化,执行控制操作。
可选的,识别单元306根据F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据M个手势类型的手势类型变化和该空间位置变化,执行控制操作。处理单元308可以预设有手势类型变化与空间位置变化的组合与控制操作的对应关系,处理单元308可以根据该对应关系确定不同的手势类型变化与空间位置变化的组合对应的控制操作,进而调整显示单元302显示的画面中虚拟物体的位置和/或形态。
可选的,识别单元306根据M个手势类型的手势类型变化和F组手部关节点的空间位 置,执行控制操作。处理单元308可以预设有手势类型变化和F组手部关节点的空间位置的组合与控制操作的对应关系,处理单元308可以根据该对应关系确定不同的手势类型变化和F组手部关节点的空间位置的组合对应的控制操作,进而调整显示单元302显示的画面中虚拟物体的位置和/或形态。
可以理解,终端300可以利用从连续多组手部图像中识别到的手势类型以及根据该多组手部图像确定的手部关节点的空间位置变化,得到复杂连续的动作识别,进而操控虚拟物体发生连续变化。在实际应用中,终端300可以结合手势类型和手部关节点的空间位置变化来确定用户的各种复杂的手部动作,以满足不同应用场景的需求。
在实际应用中,终端300在根据用户的手势或手部动作确定某种控制操作后,可以通过震动或播放特定音乐来与该用户的手势或手部动作实现交互,即通过震动或播放特定音乐来响应用户的手势或手部动作,进而增强该用户的操作感。下面介绍一下具体的实现方式。
图3中的终端300还可以包括震动单元312和音频单元314。震动单元312可以是图1中的震动电路160。音频单元314可以是图1中的音频电路128。震动单元312用于提供与处理单元308确定的控制操作相对应的震动效果。音频单元314用于提供与处理单元308确定的控制操作相对应的音频效果。一个控制操作可以仅对应一种震动效果,也可以仅对应一种音频特效,还可以既对应一种震动效果又对应一种音频特效。举例来说,终端设备确定比枪手势对应的控制操作后,该终端设备模仿真实手枪后坐力的震动效果以及发出射击声(音频特效)。处理单元308可以预设有控制操作与震动效果的对应关系,也可以预设有控制操作与音频特效的对应关系,还可以预设有控制操作与震动效果和音频特效的组合的对应关系。在具体实现中,处理单元308确定某个控制操作后,可以根据控制操作与震动效果的对应关系确定该控制操作对应的震动效果,并控制震动单元312进行震动,达到该震动效果。在具体实现中,处理单元308确定某个控制操作后,可以根据控制操作与音频特效的对应关系确定该控制操作对应的音频特效,并控制音频单元314播放相应的音乐,达到该音频特效。在具体实现中,处理单元308确定某个控制操作后,可以根据控制操作与震动效果和音频特效的组合的对应关系确定该控制操作对应的音频特效和震动效果,并控制音频单元314播放相应的音乐,达到该音频特效,以及控制震动单元312进行震动,达到该震动效果。在实际应用中,终端可以根据检测到的手势类型,提供不同的震动反馈形式或音频特效。
在实现方式中,检测到用户的手势后,触发对应的震动效果和音乐特效可以增强用户的沉浸感,提高用户体验。
在实际应用中,用户可以通过手势或手部动作操控终端300显示的画面中虚拟物体的位置或形态。这就要求终端300上的摄像头可以拍摄到用户的手部图像,即用户的手势操作在合理的操控范围内,以便于根据该手部图像识别出用户的手势或手部动作,进而调整终端300显示的画面中虚拟物体的位置或形态。下面介绍一下如何提示用户其手势操作超出操控范围的方式。
在具体实现中,终端300的摄像头(深度摄像头和彩色摄像头)拍摄手部图像;识别单元306识别手部图像中手部关节点位置,得到一组或多组手部关节点的空间位置;处理 单元308确定这一组或多组手部关节点的空间位置对应的控制操作,并执行该控制操作。当从一组手部图像中不能检测到手部时,表明用户的手势操作超出操控范围。可以理解,处理单元308得到控制操作的必要条件是识别单元306从摄像头拍摄的手部图像中可以识别到一定数量的手部关节点。可选的,当识别单元306从摄像头拍摄的一组手部图像中识别到的手部关节点的数量较少或未识别到手部关节点时,可以提示用户其手势操作超出操控范围,以便于提示该用户在合理的操控范围内重新进行操控。终端300在参考组手部图像中识别到的手部关节点的数量小于数量阈值的情况下,提示手势操作超出操控范围,该参考组手部图像为上述F组手部图像中的任一组图像,即获取单元304获取到的任一组手部图像。该数量阈值可以是10、12、15等。在具体实现中,识别单元306识别该参考组手部图像中手部关节点位置,得到一组手部关节点的空间位置,并传输至处理单元308;处理单元308确定该组手部关节点的空间位置包括的手部关节点的空间位置的数量小于该数量阈值后,控制震动单元312通过某种震动效果进行提示和/或控制音频单元314播放某种音频特效进行提示。
可选的,当识别单元306识别F组手部图像中的手部关节点位置时,在K组手部图像中的每组手部图像中的手部关节点的数量均小于数量阈值的情况下,提示用户其手势操作超出操控范围。该K组手部图像包含于该F组手部图像。K小于或等于F,且至少为1。
可选的,当检测单元310检测F组手部图像中每组手部图像中的手部所处的位置区域时,在K组手部图像中未检测到手部或未检测到完整的手部的情况下,提示用户其手势操作超出操控范围。该K组手部图像包含于该F组手部图像。K小于或等于F,且至少为1。在具体实现中,检测单元310检测一组手部图像中的手部,在未检测到手部或未检测到完整的手部的时通知处理单元308其未检测到手部;处理单元308控制震动单元312通过某种震动效果进行提示和/或控制音频单元314播放某种音频特效进行提示。
在该实现方式中,终端设备检测到当前组手部图像未包含手部图像或该组手部图像未包含完整的手部图像后,提示用户手势操作超出操作范围,可以及时提示该用户重新进行操控。
基于图1中终端设备100,本申请实施例提供了一种基于手势的操控方法,如图7所示,该方法可包括:
701、终端设备显示目标画面。
该目标画面中包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体。该目标画面可以包括该终端设备的后置摄像头采集的图像。
702、该终端设备获取F组手部图像。
该终端设备的前置摄像头(深度摄像头和/或彩色摄像头)可采集用户的手部图像,以便于确定该用户的手势或手部动作。该终端设备可以利用深度摄像头和彩色摄像头同步拍摄用户的手部得到F组手部图像,也可以仅利用深度摄像头拍摄用户的手部得到F组手部图像,还可以仅利用彩色摄像头拍摄用户的手部得到F帧彩色图像。F为大于0的整数。该F组手部图像可以是F帧深度图像,也可以是F帧彩色图像,还可以是F个图像组合,每个图像组合包括一帧深度图像和一帧彩色图像。当一组手部图像包括一帧深度图像时,F可以等于1,也可以大于1。当一组手部图像不包括深度图像时,F至少为2。
703、该终端设备识别该F组手部图像中的手部关节点位置,得到F组手部关节点的空间位置。
该F组手部关节点的空间位置中任一组手部关节点的空间位置为一组手部图像中的手部的手部关节点的空间位置。
704、执行该F组手部关节点的空间位置对应的控制操作,该控制操作用于调节该目标画面中该虚拟物体的位置和/或形态。
本申请实施例中,终端设备通过手部关节点的空间位置确定用户的控制操作,能够覆盖所有的自然手势以及连续的手部动作,操作效率更高、更自然,以便于提高用户的人机交互体验。
基于图1中终端设备100,本申请实施例提供了另一种基于手势的操控方法,如图8所示,该方法可包括:
801、终端设备显示目标画面。
该目标画面中包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体。该目标画面可以包括该终端设备通过后置摄像头获取的图像。该目标画面可以是某个应用的画面,例如AR游戏或VR游戏的画面。
802、该终端设备利用深度摄像头和彩色摄像头同步拍摄同一场景得到一组手部图像。
该深度摄像头拍摄该场景可以得到深度图像,该彩色摄像头拍摄该场景可以得到彩色图像。可选的,终端设备得到这组手部图像后,对这组手部图像进行降噪、手部分割等处理。可选的,终端设备得到这组手部图像后,对这组手部图像包括的深度图像和彩色图像进行空间对齐。
803、该终端设备检测该组手部图像中是否包括手部。
若是,执行804;若否,执行806。
804、该终端设备识别该组手部图像中的手部关节点位置,得到一组手部关节点的空间位置。
前述实施例介绍了如何识别手部图像中的手部关节点位置,这里不再详述。
805、该终端设备执行该组手部关节点的空间位置对应的控制操作。
该控制操作用于调节该目标画面中该虚拟物体的位置和/或形态。该终端设备执行该组手部关节点的空间位置对应的控制操作可以是该终端设备确定该组手部关节点的空间位置对应的手势类型,执行该手势类型对应的控制操作。该终端设备可以预设有手势类型与控制操作的对应关系,利用该对应关系可以确定各种手势类型对应的控制操作。该终端设备确定一组手部关节点的空间位置对应的手势类型的方法与前述实施例中相同,这里不再详述。图9为本申请实施例提供的一种通过张开手势释放子弹的示意图。如图9所示,图中的手部关节点的空间位置为终端设备根据一组手部图像得到的一组手部关节点的空间位置,图中的张开手势为该终端设备根据该组手部关节点的空间位置确定的手势类型,图中的901表示该终端设备显示的画面中的子弹(虚拟物体),图中的902表示该画面中的弹弓,虚线表示该子弹的移动轨迹。举例来说,终端设备显示的画面中子弹处于待发射状态,该终端设备拍摄用户的手部得到一组手部图像;在根据该组手部图像确定张开手势后,该终端设备显示该子弹发射的画面,即显示该子弹按照图9中的移动轨迹移动的画面。
可选的,805可以由如下操作代替:该终端设备确定该组空间位置对应的手势类型,将该手势类型与之前得到的1个或多个手势类型组合起来,得到一个手势序列;在该手势序列对应某个控制操作的情况下,执行该控制操作;在该手势序列不对应任一控制操作的情况下,确定下一组手部关节点的空间位置对应的手势类型,重复之前的操作。终端设备可以预设有至少一个手势序列与控制操作的对应关系,利用该对应关系可以确定各种手势序列对应的控制操作。
可选的,终端设备确定该控制操作后,执行与该控制操作相对应的震动效果和/音频特效。一个控制操作可以仅对应一种震动效果,也可以仅对应一种音频效果,还可以既对应一种震动效果又对应一种音频特效。该终端设备可以预设有控制操作与震动效果的对应关系,也可以预设有控制操作与音频特效的对应关系,还可以预设有控制操作与震动效果和音频特征的组合的对应关系。举例来说,终端设备确定比枪手势对应的控制操作后,该终端设备模仿真实手枪后坐力的震动效果以及发出射击声(音频特效)。
806、该终端设备进行震动,以提示手势操作超出操控范围。
本申请实施例中,终端设备可以根据手部关节点的空间位置确定用户的手势类型,进而执行相应的控制操作,手势识别率高。
基于图1中终端设备100,本申请实施例提供了另一种基于手势的操控方法,如图10所示,该方法可包括:
1001、终端设备获取F组手部图像。
该终端设备利用深度摄像头和彩色摄像头同步拍摄同一场景得到一组手部图像,连续同步拍摄F次,得到该F组手部图像。F为大于1的整数。
1002、该终端设备识别该F组手部图像中每组手部图像中的手部关节点位置,得到F组手部关节点的空间位置。
可选的,该终端设备获得一组手部图像就识别一帧图像中的手部关节点位置,得到一组手部关节点的空间位置(三维空间位置)。可选的,该终端设备获得F组手部图像后开始识别该F组图像中的手部关节点位置,得到F组手部关节点的空间位置。
1003、该终端设备根据该F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化。
一组手部关节点的空间位置包括的手部关节点的空间位置可以依次为第1手部关节点的空间位置至第21手部关节点的空间位置。该终端设备根据两组手部关节点的空间位置,计算手部关节点的空间位置变化可以是用一组手部关节点的空间位置与减去另一组手部关节点的空间位置,得到每个手部关节点的空间位置变化。
1004、该终端设备根据该空间位置变化调整其显示的画面中虚拟物体的位置和/或形态。
该终端设备的显示屏或显示器显示的画面中包括至少一个虚拟物体,用户可以利用手势或手部动作调整该画面中的该至少一个虚拟物体。具体的,该终端设备根据该空间位置变化调整其显示的画面中虚拟物体的位置的方法如下:根据该空间位置变化,确定手部的移动轨迹;按照该手部的移动轨迹移动该虚拟物体,该虚拟物体的移动轨迹与该手部的移动轨迹一致。该虚拟物体的移动轨迹与该手部的移动轨迹一致是指该虚拟物体的移动轨迹与该手部的移动轨迹的形状相同且大小成比例。例如,用户的手部向右移动20厘米,虚拟 物体向右5厘米;该用户的手部向左移动30厘米,该虚拟物体向左移动7.5厘米。手部的空间位置与该虚拟物体的位置相对应,即将手部的空间位置映射到虚拟物体的空间位置,使用户感觉手部直接与虚拟物体相接触。可选的,该终端设备将该手部的移动轨迹映射到该画面中,得到一个虚拟手部进行移动的画面。用户可以将该虚拟手部当作自己的手部相应的操作该画面中的虚拟物体。也就是说,用户操作自己的手部就是操作该虚拟手部,即该虚拟手部的动作与用户的手部动作一致。终端设备获得一组手部关节点的空间位置后,可以将计算该组手部关节点的空间位置相对于之前得到的某组手部关节点的空间位置的空间位置变化(手部位移);根据该空间位置变化相应的移动该虚拟物体。
具体的,根据该空间位置变化调整其显示的画面中虚拟物体的形态的方法如下:根据该空间位置变化,确定手部的动作;执行与该手部的动作相对应的调整操作,该调整操作用于调整该终端设备显示的画面中虚拟物体的形态。调整该终端设备显示的画面中虚拟物体的形态可以是调整该终端设备显示的画面中虚拟物体的方向、大小、形状等。可选的,该终端设备将该手部的动作映射到该画面中,得到一个与该手部的动作一致的虚拟手部的动作。用户可以将该虚拟手部当作自己的手部相应的操作该画面中的虚拟物体。也就是说,用户操作自己的手部就是操作该虚拟手部,即该虚拟手部的动作与用户的手部动作一致。在实际应用中,终端设备可以利用单帧图像得到手部关节点的三维空间位置,由连续多帧图像可以得到手部关节点的空间位置变化,并根据该空间位置变化确定的控制操作来操控虚拟物体发生相应的位置和形态变化。
1103和1004可以由如下操作取代:终端设备确定该F组手部关节点的空间位置对应M个手势类型;该终端设备根据该M个手势类型和F组手部关节点的空间位置,调整其显示的画面中虚拟物体的位置和/或形态。下面介绍一下具体的实现方式。
在一个可选的实现方式中,终端设备根据F组手部关节点的空间位置中的一组手部关节点的空间位置确定目标手势类型后,根据该F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据该目标手势类型和该空间位置变化,执行控制操作。其中,该目标手势类型用于调整显示单元302显示画面中虚拟物体的位置。该目标手势可以是握拳手势、张开手势等。具体的,终端设备根据一组手部关节点的空间位置确定目标手势类型后,在确定根据该组之前的一组或多组手部关节点的空间位置确定的手势类型均为该目标手势时,根据该组手部关节点的空间位置和该一组或多组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据该空间位置变化,执行控制操作。
在一个可选的实现方式中,终端设备根据F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据M个手势类型的手势类型变化和该空间位置变化,执行控制操作。处理单元308可以预设有手势类型变化与空间位置变化的组合与控制操作的对应关系,处理单元308可以根据该对应关系确定不同的手势类型变化与空间位置变化的组合对应的控制操作,进而调整显示单元302显示的画面中虚拟物体的位置和/或形态。
在一个可选的实现方式中,终端设备根据M个手势类型的手势类型变化和F组手部关节点的空间位置,执行控制操作。处理单元308可以预设有手势类型变化和F组手部关节 点的空间位置的组合与控制操作的对应关系,处理单元308可以根据该对应关系确定不同的手势类型变化和F组手部关节点的空间位置的组合对应的控制操作,进而调整显示单元302显示的画面中虚拟物体的位置和/或形态。
本申请实施例中,终端设备可以根据连续多组手部图像确定用户的手部动作,进而执行该手部动作相应的控制操作。
下面结合具体的应用场景介绍一下终端设备如何根据用户的手势或手部动作调整其显示的画面。
以一个交互类的AR射箭游戏为例,具体的交互方法如下:
(1)、用户启动终端设备上的AR射箭游戏应用。
该终端设备在接收到用户发送的启动该AR射箭游戏应用的指令后,启动该AR射箭游戏应用。在实际应用中,该用户点击该终端设备的触摸显示屏上的目标图标后,该终端设备启动该AR射箭游戏应用,该目标图标为该AR射箭游戏应用的图标。
(2)该终端设备利用后置摄像头拍摄真实场景,将虚拟物体叠加到拍摄得到的真实图像上,并在显示屏上显示。
该显示屏显示后置摄像头拍摄得到的真实图像以及叠加的虚拟物体。如图11所示,1101表示弹弓、1102表示子弹、1103表示射击目标、1104表示真实图像,其中,1101、1102以及1103为叠加在真实图像1104上的虚拟物体。可选的,该终端设备播放该AR射箭游戏应用的背景音乐。
(3)、该终端设备利用前置摄像头(深度摄像头和彩色摄像头)拍摄用户的手部图像,检测该手部图像中是否包括手部。
如果检测到该手部图像中不包括手部,则确定该用户的手部不在摄像头的视野内,该终端设备进行震动,以提示该用户调整手部的位置,防止操作失灵。如果该手部图像包括手部,执行下一步骤。
(4)该终端设备根据拍摄的手部图像识别出第一手势,将该第一手势转换为第一交互指令。
该第一手势可以是握拳手势,该第一交互指令可以是捏住弹弓上子弹。在实际应用中,在终端设备运行AR射箭游戏的过程中,若检测到握拳手势时,该终端设备显示捏住弹弓上子弹的画面。
当用户手部发生移动时,摄像头获取连续组手部图像,并对每一组手部图像进行识别,可以得到手部关节点的空间位置变化。具体实施为:当用户手部上下左右移动时,识别到手部关节点的三维空间位置发生连续的空间变化,可以控制画面中子弹连续上下左右移动;当用户手部掌心发生偏转时,依据手部关节点的空间位置变化,可以得到手部掌心的朝向,根据该手部掌心的朝向调整子弹的发射方向,该子弹的发射方向与该手部掌心朝向相同;当用户手前后移动时,手部关节点的空间z坐标发生变化,可以控制子弹发射力度。图12为本申请实施例提供的一种手部移动的示意图。如图12所示,手部可以上下移动,也可以左右移动,还可以前后移动,还可以调整手部掌心朝向。具体的,终端设备可以根据一组手部关节点的空间位置确定各手部关节点的位置距离,进而确定手部掌心朝向。在实际应用中,手部的每种移动方式都可以使得终端设备显示的画面中的子弹发生相应的改变。具 体的,手部的移动轨迹与画面中子弹的移动轨迹一致。另外,终端设备检测到用户的手部发生移动时,其震动单元工作。如图13所示,手部离摄像头越远,震动强度越强,代表弹弓拉力越大,体现3D震动效果。可以理解,当用户手部保持握拳状态进行移动时,手势识别结果不变,依然保持捏住弹弓上子弹的状态。
(5)该终端设备根据拍摄的手部图像识别出第二手势,将该第一手势转换为第二交互指令。
该第二手势可以是张开手势,该第二交互指令可以是释放子弹。在实际应用中,在终端设备运行AR射箭游戏的过程中,若检测到张开手势时,该终端设备显示释放子弹的画面。可以理解,用户可以通过握拳手势使该画面中的弹弓上子弹,通过手部的移动可以控制子弹的移动,通过张开手势释放该画面中的子弹。
(6)该终端设备触发震动效果和音乐特效。
当该终端设备画面中,子弹击中虚拟物体时,该终端设备触发震动效果和音乐特效,增强用户的沉浸感。
用户在玩该AR射箭游戏时,可以通过手势操控画面中的弹弓,完成弹弓上子弹、调整子弹的发射方向、调整弹弓拉力、调整子弹位置、释放子弹等操作。也就是说,用户可以通过手势或手部动作操控画面中的弹弓,就感觉是自己的手部直接操控画面中的弹弓一样。
下面以交互类的密室逃脱类AR游戏为例,介绍一下手势和手部动作识别的应用。在密室逃脱类AR游戏过程中涉及到了拧螺丝,旋转钥匙,推门,拉抽屉等一系列手部动作。自然手势识别对所有手部关节点进行跟踪,以及对所有行为都进行捕捉、识别,以确定用户的各种手势以及手部动作,进而调整终端设备播放的画面。在实际应用中,终端设备可以对手部的各个手部关节点进行提取,然后实时的分析手部动作以及位置。这样可以识别出用户的任意手部动作以及手势,进而完成相应的控制操作。
图14是本发明实施例提供的终端设备1400的硬件结构示意图。如图14所示,终端设备1400可以作为终端300的一种实现方式,终端设备1400包括处理器1402、存储器1404、摄像头1406、显示屏1408和总线1410。其中,处理器1402、存储器1404、摄像头1406、显示屏1408通过总线1410实现彼此之间的通信连接。处理器1402可以实现图3中识别
处理器1402可以采用通用的CPU,微处理器,应用专用集成电路(Application Specific Integrated Circuit,ASIC),或者一个或多个集成电路,用于执行相关程序,以实现本发明实施例所提供的技术方案。
存储器1404可以是只读存储器(Read Only Memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(Random Access Memory,RAM)。存储器1404可以存储操作系统、以及其他应用程序。用于通过软件或者固件来实现本发明实施例提供的终端300包括的模块以及部件所需执行的功能,或者用于实现本发明方法实施例提供的上述方法的程序代码存储在存储器1404中,并由处理器1402读取存储器1404中的代码来执行终端300包括的模块以及部件所需执行的操作,或者执行本申请实施例提供的上述方法。
显示屏1408用于显示本申请实施里中描述的各种图像以及动态画面。
总线1410可包括在终端设备1400各个部件(例如处理器1402、存储器1404、摄像头 1406、显示屏1408)之间传送信息的通路。
摄像头1406可以是深度摄像头,用于拍摄深度图像。摄像头1406可以是彩色摄像头,例如RGB摄像头,用于拍摄深度图像。摄像头1406可以包括彩色摄像头和深度摄像头,该彩色摄像头用于拍摄彩色图像,该深度摄像头用于拍摄深度图像,该彩色摄像头和该深度摄像头可以同步拍摄同一场景。
应注意,尽管图14所示的终端设备1400仅仅示出了处理器1402、存储器1404、摄像头1406、显示屏1408以及总线1410,但是在具体实现过程中,本领域的技术人员应当明白,终端设备1400还包含实现正常运行所必须的其他器件。同时,根据具体需要,本领域的技术人员应当明白,终端设备1400还可包含实现其他附加功能的硬件器件。此外,本领域的技术人员应当明白,终端设备1400也可仅仅包含实现本发明实施例所必须的器件,而不必包含图14中所示的全部器件。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所涉及的动作和模块并不一定是本发明所必须的。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,上述的程序可存储于一种计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,上述的存储介质可为磁碟、光盘、只读存储记忆体(ROM:Read-Only Memory)或随机存储记忆体(RAM:Random Access Memory)等。
尽管在此结合各实施例对本发明进行了描述,然而,在实施所要保护的本发明的过程中,本领域技术人员通过查看该附图、公开内容、以及所附权利要求书,可理解并实现该公开实施例的其它变化。在权利要求中,“包括”(comprising)一词不排除其它组成部分或步骤,“一”或“一个”不排除多个的可能性。单个处理器或其它模块可以实现权利要求中列举的若干项功能。互相不同的从属权利要求中记载了某些措施,但这并不代表这些措施不能组合起来产生良好的效果。计算机程序可以存储/分布在合适的介质中,例如:光存储介质或固态介质,与其它硬件一起提供或作为硬件的一部分,也可以采用其它分布形式,如通过Internet或其它有线或无线电信系统。
本文中应用了具体个例对本发明的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本发明的方法及其思想;同时,对于本领域的一般技术人员,依据本发明的思想,在具体实施方式及应用范围上均会有改变之处,综上该,本说明书内容不应理解为对本发明的限制。

Claims (25)

  1. 一种基于手势的操控方法,其特征在于,包括:
    显示目标画面,所述目标画面中包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体;
    获取F组手部图像;
    根据所述F组手部图像,识别所述F组手部图像中手部的手部关节点位置,从而得到F组手部关节点的空间位置,任一组手部关节点的空间位置为一组手部图像中的手部的手部关节点的空间位置,F为大于0的整数;
    根据所述F组手部关节点的空间位置,执行所述F组手部关节点的空间位置对应的控制操作,所述控制操作用于调节所述目标画面中所述虚拟物体的位置和/或形态。
  2. 根据权利要求1所述的方法,其特征在于,所述执行所述F组手部关节点的空间位置对应的控制操作包括:
    根据所述F组手部关节点的空间位置,确定M个手势类型,M小于或等于F,M为正整数;
    执行所述M个手势类型对应的所述控制操作。
  3. 根据权利要求2所述的方法,其特征在于,所述确定所述F组手部关节点的空间位置对应的M个手势类型包括:
    根据所述F组手部关节点中一组手部关节点的空间位置,计算所述一组手部关节点中的手部关节点之间的角度;
    根据所述手部关节点之间的角度,确定所述一组手部关节点的空间位置对应的一个手势类型。
  4. 根据权利要求2或3所述的方法,其特征在于,所述确定所述F组手部关节点的空间位置对应的M个手势类型包括:
    确定所述F组手部关节点的空间位置对应的至少两个手势类型,F大于1;
    所述执行所述M个手势类型对应的所述控制操作包括:
    根据所述至少两个手势类型的手势类型变化,执行所述至少两个手势类型对应的所述控制操作。
  5. 根据权利要求1所述的方法,其特征在于,所述执行所述F组手部关节点的空间位置对应的控制操作包括:
    根据所述F组手部关节点的空间位置,确定所述F组手部关节点的空间位置对应的M个手势类型,F大于1,M小于或等于F;
    根据所述F组手部关节点的空间位置和所述M个手势类型,执行所述控制操作。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述F组手部关节点的空间位 置和所述M个手势类型,执行所述控制操作包括:
    根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型和所述空间位置变化,执行所述控制操作;
    或者,
    根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型的手势类型变化和所述空间位置变化,执行所述控制操作;
    或者,
    根据所述M个手势类型的手势类型变化和所述F组手部关节点的空间位置,执行所述控制操作。
  7. 根据权利要求1所述的方法,其特征在于,所述执行所述F组手部关节点的空间位置对应的控制操作包括:
    根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化,F大于1;
    根据所述空间位置变化,执行所述控制操作。
  8. 根据权利要求1至7任意一项所述的方法,其特征在于,所述方法还包括:
    在K组手部图像中的每组手部图像中的手部关节点的数量均小于数量阈值的情况下,提示手势操作超出操控范围,所述K组手部图像包含于所述F组手部图像,K小于或等于F,K为正整数。
  9. 根据权利要求1至7任意一项所述的方法,其特征在于,所述识别所述F组手部图像中的手部关节点位置,从而得到F组手部关节点的空间位置包括:
    根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域;
    根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置。
  10. 根据权利要求9所述的方法,其特征在于,所述根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域包括:
    根据目标组手部图像包括的彩色图像,检测所述目标组手部图像包括的彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;
    所述根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置包括:
    根据所述目标组手部图像包括的深度图像,识别所深度图像中第二位置区域中的手部的手部关节点位置,得到所述目标组手部图像对应的一组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色 图像为同步拍摄同一场景得到的图像。
  11. 根据权利要求9所述的方法,其特征在于,所述根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域包括:
    根据目标组手部图像包括的彩色图像,检测所述彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;
    所述根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置包括:
    根据所述彩色图像,识别所述第一位置区域中的手部的手部关节点位置,得到第一组手部关节点的空间位置;
    根据所述目标组手部图像包括的深度图像,识别所述深度图像中第二位置区域中的手部的手部关节点位置,得到第二组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色图像同步为拍摄同一场景得到的图像;
    合并第一组手部关节点的空间位置和所述第二组手部关节点的空间位置,得到所述目标组手部图像对应的一组手部关节点的空间位置。
  12. 根据权利要求10或11所述的方法,其特征在于,所述识别所述F组手部图像中的手部关节点,得到F组手部关节点的空间位置之前,所述方法还包括:
    通过彩色传感器和深度传感器同步拍摄所述同一场景,得到原始彩色图像和原始深度图像;
    对所述原始彩色图像和所述原始深度图像做空间对齐;
    分别对对齐后的原始彩色图像和对齐后的原始深度图像做手部分割,得到所述目标组手部图像。
  13. 一种终端设备,其特征在于,包括:
    显示单元,用于显示目标画面,所述目标画面中包括用于通过检测到的手势或检测到的手部动作进行操控的虚拟物体;
    获取单元,用于获取F组手部图像;
    识别单元,用于根据所述F组手部图像,识别所述F组手部图像中手部的手部关节点位置,从而得到F组手部关节点的空间位置,任一组手部关节点的空间位置为一组手部图像中的手部的手部关节点的空间位置,F为大于0的整数;
    处理单元,用于根据所述F组手部关节点的空间位置,执行所述F组手部关节点的空间位置对应的控制操作,所述控制操作用于调节所述目标画面中所述虚拟物体的位置和/或形态。
  14. 根据权利要求13所述的终端设备,其特征在于,所述处理单元,用于确定所述F 组关节点的空间位置对应的至少一个手势;执行所述至少一个手势对应的所述控制操作。
  15. 根据权利要求14所述的终端设备,其特征在于,所述处理单元,用于根据所述F组手部关节点中一组手部关节点的空间位置,计算所述一组手部关节点中的手部关节点之间的角度;根据所述手部关节点之间的角度,确定所述一组手部关节点的空间位置对应的一个手势类型。
  16. 根据权利要求14或15所述的终端设备,其特征在于,所述处理单元,用于确定所述F组手部关节点的空间位置对应的至少两个手势类型,F大于1;根据所述至少两个手势类型的手势类型变化,执行所述至少两个手势类型对应的所述控制操作。
  17. 根据权利要求13所述的终端设备,其特征在于,所述处理单元,用于根据所述F组手部关节点的空间位置,确定所述F组手部关节点的空间位置对应的M个手势类型,F大于1,M小于或等于F;根据所述F组手部关节点的空间位置和所述M个手势类型,执行所述控制操作。
  18. 根据权利要求17所述的终端设备,其特征在于,所述处理单元,用于根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型和所述空间位置变化,执行所述控制操作;
    或者,所述处理单元,用于根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化;根据所述M个手势类型的手势类型变化和所述空间位置变化,执行所述控制操作;
    或者,所述处理单元,用于根据所述M个手势类型的手势类型变化和所述F组手部关节点的空间位置,执行所述控制操作。
  19. 根据权利要求13所述的终端设备,其特征在于,所述处理单元,用于根据所述F组手部关节点的空间位置,确定手部关节点组之间的手部关节点的空间位置变化,F大于1;根据所述空间位置变化,执行所述控制操作。
  20. 根据权利要求13至18任意一项所述的终端设备,其特征在于,所述处理单元,用于在K组手部图像中的每组手部图像中的手部关节点的数量均小于数量阈值的情况下,提示手势操作超出操控范围,所述K组手部图像包含于所述F组手部图像,K小于或等于F,K为正整数。
  21. 根据权利要求13至18任意一项所述的终端设备,其特征在于,所述识别单元,用于根据所述F组手部图像中任一组手部图像包括的彩色图像和深度图像中的至少一项检测所述任一组手部图像中的手部所处的位置区域;根据所述彩色图像和所述深度图像中的至少一项,识别得到所述位置区域中的手部的手部关节点位置。
  22. 根据权利要求21所述的终端设备,其特征在于,所述识别单元,用于根据目标组手部图像包括的彩色图像,检测所述目标组手部图像包括的彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;根据所述目标组手部图像包括的深度图像,识别所深度图像中第二位置区域中的手部的手部关节点位置,得到所述目标组手部图像对应的一组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色图像为同步拍摄同一场景得到的图像。
  23. 根据权利要求21所述的终端设备,其特征在于,所述识别单元,用于根据目标组手部图像包括的彩色图像,检测所述彩色图像中的手部所处的第一位置区域;所述目标组手部图像为所述F组手部图像中任一组图像;根据所述彩色图像,识别所述第一位置区域中的手部的手部关节点位置,得到第一组手部关节点的空间位置;根据所述目标组手部图像包括的深度图像,识别所述深度图像中第二位置区域中的手部的手部关节点位置,得到第二组手部关节点的空间位置,所述第二位置区域为所述第一位置区域对应在所述深度图像中的区域,所述深度图像和所述彩色图像同步为拍摄同一场景得到的图像;合并第一组手部关节点的空间位置和所述第二组手部关节点的空间位置,得到所述目标组手部图像对应的一组手部关节点的空间位置。
  24. The terminal device according to claim 22 or 23, wherein the obtaining unit comprises:
    a color sensor, configured to photograph the same scene to obtain an original color image;
    a depth sensor, configured to photograph the same scene to obtain an original depth image;
    an alignment subunit, configured to spatially align the original color image with the original depth image; and
    a segmentation subunit, configured to separately perform hand segmentation on the aligned original color image and the aligned original depth image, to obtain the target group of hand images.
  25. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and when the program instructions are executed by a processor, the processor is enabled to perform the method according to any one of claims 1 to 12.
PCT/CN2019/111035 2018-10-15 2019-10-14 Gesture-based manipulation method and terminal device WO2020078319A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19872386.8A EP3859489A4 (en) 2018-10-15 2019-10-14 GESTURE BASED HANDLING PROCESS AND TERMINAL DEVICE
US17/230,067 US20210232232A1 (en) 2018-10-15 2021-04-14 Gesture-based manipulation method and terminal device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811198807.5 2018-10-15
CN201811198807.5A CN111045511B (zh) 2018-10-15 2018-10-15 Gesture-based manipulation method and terminal device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/230,067 Continuation US20210232232A1 (en) 2018-10-15 2021-04-14 Gesture-based manipulation method and terminal device

Publications (1)

Publication Number Publication Date
WO2020078319A1 true WO2020078319A1 (zh) 2020-04-23

Family

ID=70230379

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/111035 WO2020078319A1 (zh) 2018-10-15 2019-10-14 Gesture-based manipulation method and terminal device

Country Status (4)

Country Link
US (1) US20210232232A1 (zh)
EP (1) EP3859489A4 (zh)
CN (1) CN111045511B (zh)
WO (1) WO2020078319A1 (zh)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625102A (zh) * 2020-06-03 2020-09-04 上海商汤智能科技有限公司 Building display method and apparatus
CN111897430B (zh) * 2020-07-30 2023-05-23 深圳创维-Rgb电子有限公司 Application control method, display terminal, and computer-readable storage medium
CN112433605B (zh) * 2020-11-10 2022-10-04 深圳市瑞立视多媒体科技有限公司 Method for implementing screw tightening in a virtual reality environment
CN112286363B (zh) * 2020-11-19 2023-05-16 网易(杭州)网络有限公司 Virtual subject form change method and apparatus, storage medium, and electronic device
CN113238650B (zh) * 2021-04-15 2023-04-07 青岛小鸟看看科技有限公司 Gesture recognition and control method and apparatus, and virtual reality device
US20230145728A1 (en) * 2021-11-05 2023-05-11 Htc Corporation Method and system for detecting hand gesture, and computer readable storage medium
CN117492555A (zh) * 2022-07-25 2024-02-02 北京字跳网络技术有限公司 Object movement control method, apparatus, and device


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9477303B2 (en) * 2012-04-09 2016-10-25 Intel Corporation System and method for combining three-dimensional tracking with a three-dimensional display for a user interface
US9696795B2 (en) * 2015-02-13 2017-07-04 Leap Motion, Inc. Systems and methods of creating a realistic grab experience in virtual reality/augmented reality environments
CN104808788B (zh) * 2015-03-18 2017-09-01 北京工业大学 Method for contactless gesture-based control of a user interface
KR101711736B1 (ko) * 2015-05-26 2017-03-02 이화여자대학교 산학협력단 Feature point extraction method for motion recognition in images and user motion recognition method using skeleton information
CN106886741A (zh) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 Gesture recognition method based on finger recognition
US20170193289A1 (en) * 2015-12-31 2017-07-06 Microsoft Technology Licensing, Llc Transform lightweight skeleton and using inverse kinematics to produce articulate skeleton
CN106886284A (zh) * 2017-01-20 2017-06-23 西安电子科技大学 Kinect-based museum cultural relic interaction system
CN107423698B (zh) * 2017-07-14 2019-11-22 华中科技大学 Gesture estimation method based on parallel convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3101511A1 (en) * 2015-06-03 2016-12-07 Nokia Technologies Oy Monitoring
CN105160323A (zh) * 2015-09-07 2015-12-16 哈尔滨市一舍科技有限公司 Gesture recognition method
WO2017140569A1 (de) * 2016-02-19 2017-08-24 Audi Ag Motor vehicle operating device and method for operating an operating device in order to effect an interaction between a virtual presentation plane and a hand
CN107885317A (zh) * 2016-09-29 2018-04-06 阿里巴巴集团控股有限公司 Gesture-based interaction method and apparatus
CN107578023A (zh) * 2017-09-13 2018-01-12 华中师范大学 Human-computer interaction gesture recognition method, apparatus, and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3859489A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719416A (zh) * 2023-08-07 2023-09-08 海马云(天津)信息技术有限公司 Gesture correction method and apparatus for a virtual digital human, electronic device, and storage medium
CN116719416B (zh) * 2023-08-07 2023-12-15 海马云(天津)信息技术有限公司 Gesture correction method and apparatus for a virtual digital human, electronic device, and storage medium

Also Published As

Publication number Publication date
EP3859489A1 (en) 2021-08-04
EP3859489A4 (en) 2021-12-01
CN111045511A (zh) 2020-04-21
CN111045511B (zh) 2022-06-07
US20210232232A1 (en) 2021-07-29

Similar Documents

Publication Publication Date Title
WO2020078319A1 (zh) Gesture-based manipulation method and terminal device
US11967039B2 (en) Automatic cropping of video content
US11112956B2 (en) Device, method, and graphical user interface for switching between camera interfaces
TWI545496B (zh) Device, method, and graphical user interface for adjusting the appearance of a control
US11941764B2 (en) Systems, methods, and graphical user interfaces for adding effects in augmented reality environments
KR102114377B1 (ko) Method for previewing images captured by an electronic device and electronic device therefor
WO2015188614A1 (zh) Method and apparatus for operating a computer and a mobile phone in a virtual world, and glasses using same
US11615595B2 (en) Systems, methods, and graphical user interfaces for sharing augmented reality environments
US11250636B2 (en) Information processing device, information processing method, and program
US10572017B2 (en) Systems and methods for providing dynamic haptic playback for an augmented or virtual reality environments
WO2021227918A1 (zh) Interaction method and augmented reality device
WO2018230160A1 (ja) Information processing system, information processing method, and program
US20170357389A1 (en) Device, Method, and Graphical User Interface for Media Playback in an Accessibility Mode
US20240153219A1 (en) Systems, Methods, and Graphical User Interfaces for Adding Effects in Augmented Reality Environments
KR20190035373A (ko) System for implementing a virtual mobile terminal in mixed reality and control method therefor
US20180063283A1 (en) Information processing apparatus, information processing method, and program
WO2022151687A1 (zh) Group photo image generation method, apparatus, device, storage medium, computer program, and product
US20150215530A1 (en) Universal capture
WO2012007034A1 (en) Sending and receiving information
US20230333651A1 (en) Multi-Finger Gesture based on Finger Manipulation Data and Extremity Tracking Data
CN116802589A (zh) Object engagement based on finger manipulation data and untethered inputs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19872386

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019872386

Country of ref document: EP

Effective date: 20210429