CN109960406B - Intelligent electronic device gesture capture and recognition technology based on actions between the fingers of two hands


Info

Publication number
CN109960406B
Application number
CN201910154813.9A
Authority
CN (China)
Prior art keywords
electronic device, user, hand, gesture, intelligent electronic
Other languages
Chinese (zh)
Other versions
CN109960406A
Inventors
史元春, 喻纯, 韦笑颖
Assignee
Tsinghua University
Legal status
Active
Application filed by Tsinghua University; priority to CN201910154813.9A
Publication of CN109960406A
Application granted; publication of CN109960406B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, e.g. input of commands through traced gestures


Abstract

An intelligent electronic device is provided with a sensor that can sense the user's hand above the screen while the device is in use. The device processes the data sensed by the sensor and recognizes actions between the fingers of the two hands; these actions serve as the user's gesture input to the device, and the corresponding control operations are carried out. The sensor may be a camera system installed on the device that captures images of the hands above the screen during use. This enriches the input modes of the mobile phone, making human-computer interaction friendlier and more convenient.

Description

Intelligent electronic device gesture capture and recognition technology based on actions between the fingers of two hands
Technical Field
The present invention relates generally to input and interaction technology for intelligent portable electronic devices, and more particularly to technology that enables intelligent electronic devices such as mobile phones to capture new gestures and to construct a binocular vision system by adding a mirror device.
Background
Current mobile-phone cameras have two shortcomings: (1) the field of view is limited and fixed; a front camera can only obtain image information within a spatial range of roughly 60 to 80 degrees directly above the phone; (2) current phone cameras are basically monocular, and a monocular camera can only obtain RGB information about objects in its field of view; it cannot obtain their three-dimensional information.
Because of these shortcomings, a mobile phone cannot capture the user's hands as the phone is used naturally, and much hand information that is valuable for interaction with the phone is ignored.
At present, input channels on a mobile phone are limited: most information is produced from data read by the phone's capacitive screen, i.e., from direct contact between the user's hand and the touch screen, so some operations become complex or unnatural.
Disclosure of Invention
The present invention has been made in view of the above circumstances.
According to one aspect of the invention, an intelligent electronic device is provided on which a camera is deployed that can capture images of the hand with which the user holds the device.
Optionally, recognizing the gesture of the holding hand comprises tracking the positions of the holding fingers around the electronic device, and taking the actions of different holding fingers lifting, moving, or tapping on the device as interactive information input.
Optionally, the holding gesture model includes a gesture of taking a picture of the handheld intelligent electronic device, which is called a picture taking gesture, and when the intelligent electronic device recognizes the picture taking gesture, the intelligent electronic device automatically starts the picture taking application and automatically takes a picture.
Optionally, the intelligent electronic device is further operable to: identify which hand the user is holding it with; and, based on the recognition result, adjust the graphical user interface layout such that the target component is easier for the fingers of the holding hand to click than before the adjustment.
Optionally, the camera is at the edge of the intelligent electronic device.
Optionally, the camera is a fisheye camera below the screen.
Optionally, the intelligent electronic device further comprises infrared illumination and an infrared filter used in conjunction with the camera to increase the signal-to-noise ratio.
Optionally, the camera is a depth camera.
Optionally, the intelligent electronic device further comprises a light reflection device arranged obliquely to its screen. The reflection device redirects light traveling parallel to the screen surface into the camera, so that when the user holds the device, the camera can capture an image of the user's hand above the screen during use; the device then recognizes the user's hand motions and/or gestures from the captured images, treats them as user input, and interacts with the user accordingly.
Optionally, the light reflection device is one of a plane mirror, a prism, a convex mirror, and a multi-faced mirror, or a combination thereof.
Optionally, the camera is a wide-angle camera.
Optionally, the wide-angle camera is placed at the center of the screen with its optical axis perpendicular to the touch screen of the electronic device, and has a viewing angle of 170 to 190 degrees.
Optionally, the camera is a liftable camera and/or an adjustable angle camera.
Optionally, the field of view of the liftable and/or angle-adjustable camera covers at least -40 degrees to 40 degrees in the plane normal to the transverse axis of the device plane, where the zero-degree line is the line of sight parallel to the longitudinal axis of the device plane.
Optionally, the intelligent electronic device is any one of a smart phone, an intelligent vehicle-mounted electronic device, and an intelligent tablet computer.
According to another aspect of the present invention, there is provided a method for human-computer interaction in which an intelligent electronic device is deployed with a camera capable of capturing an image of the user's holding hand, the method comprising: recognizing the gesture of the holding hand based on the captured image, and performing, by the electronic device, the corresponding control operation based on the recognized gesture.
Optionally, the intelligent electronic device has a camera and a light reflection device arranged obliquely to the screen surface, and the human-computer interaction method comprises: while the user holds the device, the camera system captures an image of the user's hand above the screen; the device recognizes the user's hand motions and/or gestures from the captured image, treats them as user input, and interacts with the user based on that input.
According to another aspect of the present invention, an intelligent electronic device is provided that has a sensor capable of sensing the user's hand above the screen while the device is in use, wherein the device processes the data sensed by the sensor, recognizes actions between the fingers of the two hands as the user's gesture input to the device, and performs the corresponding control operation.
Optionally, the sensor is a camera system installed on the intelligent electronic device, and can capture an image of a hand of a user when using the electronic device above a screen.
Optionally, the device further recognizes, based on the captured image of the user's hand while holding the electronic device, the user's gesture in the holding state as user input and interacts with the user, performing at least one of the following recognitions: recognizing a one-handed holding gesture and recognizing touch actions of the other hand on the holding hand as gesture input for interaction with the user; and recognizing a two-handed holding gesture and recognizing actions between the thumbs of the two hands as gesture input for interaction with the user.
Optionally, the camera system comprises a front camera and a mirror or prism arranged obliquely to the screen surface of the electronic device. The mirror or prism reflects light parallel to the screen surface so that the front camera captures it; the front camera thus captures an image of the user's hand above the screen while the user holds the device. The intelligent electronic device recognizes the user's gesture in the holding state from the captured image, treats it as user input, and interacts with the user, performing at least one of the following recognitions: recognizing a one-handed holding gesture and recognizing touch actions of the other hand on the holding hand as gesture input for interaction with the user; and recognizing a two-handed holding gesture and recognizing actions between the thumbs of the two hands as gesture input for interaction with the user.
Optionally, the intelligent electronic device recognizes a single-handed holding gesture by recognizing four fingers protruding laterally and one thumb present above or to the side of the electronic device.
Optionally, the smart electronic device recognizes the two-handed holding gesture by recognizing the thumb roots appearing at both sides of the device and the two thumb tips appearing above the screen.
Optionally, in the case of one-handed holding, the touch actions of the other hand on the holding hand include: a finger button gesture, in which a finger of the holding hand is used as a button that the other hand taps; and a finger slider gesture, in which a finger of the holding hand is used as a slider on which the other hand slides or clicks.
Optionally, the actions between the thumbs of the two hands comprise one or more of: the thumbs touching each other; the thumbs rotating around each other; each thumb moving along a certain path; and the thumbs touching and then moving along a certain path.
According to another aspect of the present invention, there is provided a human-computer interaction method for an intelligent electronic device having a sensor capable of capturing an image of the user's hand above the screen while the device is in use, wherein the method comprises: recognizing, by the intelligent electronic device, actions between the fingers of the two hands as the user's gesture input, and performing the corresponding control operation.
Optionally, the sensor is a camera system installed on the intelligent electronic device, and can capture a hand image of the user when using the electronic device above the screen.
Optionally, the human-computer interaction method further includes recognizing, based on the captured image of the user's hand while holding the electronic device, the user's gesture in the holding state as user input, and interacting with the user, where recognizing the gesture includes at least one of: recognizing a one-handed holding gesture and recognizing touch actions of the other hand on the holding hand as gesture input for interaction with the user; and recognizing a two-handed holding gesture and recognizing actions between the thumbs of the two hands as gesture input for interaction with the user.
Optionally, the camera system comprises a front camera and a mirror or prism arranged obliquely to the screen surface of the intelligent electronic device; the mirror or prism reflects light parallel to the screen surface so that the front camera captures it, thereby capturing an image of the user's hand above the screen while the user holds the device.
Optionally, the smart electronic device recognizes the single-handed holding gesture by recognizing four fingers protruding laterally and one thumb present above or to the side of the smart electronic device.
Optionally, the smart electronic device recognizes the two-handed holding gesture by recognizing the thumb roots appearing at both sides of the device and the two thumb tips appearing above the screen.
Optionally, in the case of one-handed holding, the touch actions of the other hand on the holding hand include: a finger button gesture, in which a finger of the holding hand is used as a button that the other hand taps; and a finger slider gesture, in which a finger of the holding hand is used as a slider on which the other hand slides or clicks.
Optionally, the actions between the thumbs of the two hands comprise one or more of: the thumbs touching each other; the thumbs rotating around each other; each thumb moving along a certain path; and the thumbs touching and then moving along a certain path.
According to another aspect of the invention, an intelligent electronic device is provided comprising a front camera and a light reflection device arranged obliquely to the screen surface. The reflection device redirects light traveling parallel to the screen into the front camera; light emitted from a point on an object enters the camera through two optical paths, via reflection by the prism and by the device screen, producing two virtual cameras. A virtual binocular camera is thus built, from which spatial three-dimensional information of the object is obtained.
Optionally, the intelligent electronic device is further equipped with an infrared light emitting device, and the camera is an infrared camera.
Optionally, the light reflecting means is a triangular prism.
Optionally, the light reflecting means is a flat mirror or a convex mirror.
Optionally, the binocular camera can capture images of objects within a range of at least 5 centimeters in each of the left and right lateral directions and at least 10 centimeters in the longitudinal direction from the bottom of the electronic device.
Optionally, one of the two optical paths is that light emitted from one point of the object directly enters the light reflection device, and then enters the front camera after being reflected by the light reflection device; the other light path is that the light emitted by the point of the object is reflected by the screen of the electronic equipment, then emitted to the light reflection device, reflected by the light reflection device and then enters the front camera.
Optionally, there is a band of relatively dark areas, referred to as dark bands, in the image captured by the front camera, and the position of the prism relative to the front camera can be adjusted to reduce the extent of the dark bands.
Optionally, there is a darker area, called dark band, relative to other areas in the image captured by the front-facing camera, and the intelligent electronic device uses a brightness compensation method to remove the dark band before calculating the stereo vision information.
Optionally, the intelligent electronic device identifies an object in the image based on the obtained image, and performs interaction based on the identification result.
Optionally, in the case that the object is recognized as a pen, an angle between the pen body and the surface of the electronic screen is estimated, and the interactive operation of the intelligent electronic device is controlled based on the estimated angle.
Optionally, for the captured binocular RGB images, the following processing is performed for gesture recognition (a minimal sketch follows the list):
(1) performing correction processing to obtain standardized monocular RGB images;
(2) obtaining a hand skin mask image using a skin color segmentation method;
(3) computing depth pixel by pixel from the two standardized monocular RGB images to obtain a depth map;
(4) combining the skin mask image and the depth map to obtain a segmented image of the hand region;
(5) performing gesture recognition based on the segmented image of the hand region.
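A minimal Python sketch of this five-step flow. The helpers rectify(), skin_mask(), compute_depth(), and classify_gesture() are hypothetical stand-ins for the numbered steps, not names from the patent:

```python
import numpy as np

def recognize_gesture(left_raw, right_raw, R):
    left = rectify(left_raw, R)           # (1) standardized monocular RGB image
    right = rectify(right_raw, R)
    mask = skin_mask(left)                # (2) hand skin mask image
    depth = compute_depth(left, right)    # (3) pixel-by-pixel depth map
    hand = np.where(mask > 0, depth, 0)   # (4) segmented image of the hand region
    return classify_gesture(hand)         # (5) gesture recognition
```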
Optionally, the correction processing comprises performing color correction using the pixel color equation Output = Input × R + L, where Output is the color of each pixel in the output image, Input is the color of each pixel in the input image, R is a reflection factor that depends only on the physical properties of the screen surface, and L is the self-luminescence of the electronic device.
Optionally, L is set to zero, and the parameter R is then obtained by capturing an image of a white wall surface with the intelligent electronic device and fitting.
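A sketch of that calibration under the stated assumption L = 0; the uniform white level used as the reference Input is an assumed constant, not a value from the patent:

```python
import numpy as np

def fit_reflection_factor(wall_image, white_level=255.0):
    # With L = 0, Output = Input * R reduces to R = Output / Input; a uniformly lit
    # white wall provides a known, spatially constant Input (white_level).
    return wall_image.astype(np.float32) / white_level
```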
Optionally, the skin color segmentation algorithm comprises two modules: one uses hue and saturation thresholds in the image to segment skin regions, and the other dynamically recalibrates these thresholds every predetermined number of frames.
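A sketch of the two-module segmenter; all threshold values and the recalibration rule are assumptions for illustration, not figures from the patent:

```python
import cv2
import numpy as np

class SkinSegmenter:
    """Module 1 thresholds hue/saturation; module 2 recalibrates every `period` frames."""
    def __init__(self, period=30):
        self.h = (0, 25)       # assumed hue window
        self.s = (40, 255)     # assumed saturation window
        self.period = period
        self.frames = 0

    def segment(self, rgb):
        hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
        lower = np.array([self.h[0], self.s[0], 30], np.uint8)
        upper = np.array([self.h[1], self.s[1], 255], np.uint8)
        mask = cv2.inRange(hsv, lower, upper)
        self.frames += 1
        if self.frames % self.period == 0:
            self._recalibrate(hsv, mask)
        return mask

    def _recalibrate(self, hsv, mask):
        skin = hsv[mask > 0]
        if skin.shape[0] > 100:  # enough skin pixels to trust the statistics
            h_med, s_med = int(np.median(skin[:, 0])), int(np.median(skin[:, 1]))
            self.h = (max(0, h_med - 15), min(179, h_med + 15))
            self.s = (max(0, s_med - 60), min(255, s_med + 60))
```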
According to another aspect of the invention, a human-computer interaction method for an intelligent portable electronic device is provided. The device has a front camera and a light reflection device arranged obliquely to the screen surface, the reflection device being a plane mirror or a triangular prism. In the method, the reflection device redirects light parallel to the device screen into the front camera; light emitted from a point on an object enters the camera through two optical paths, via reflection by the prism and by the device screen; the device thereby obtains two plane images, processes them to obtain depth information, performs object recognition combined with the depth information, and carries out human-computer interaction based on the recognized object.
Optionally, the intelligent electronic portable device is further equipped with an infrared light emitting device, and the camera is an infrared camera.
Optionally, the light reflecting means is a triangular prism.
Optionally, the light reflection device is a plane mirror or a convex mirror.
Optionally, the binocular camera can capture images of objects within a range of at least 5 centimeters in each of the left and right lateral directions and at least 10 centimeters in the longitudinal direction from the bottom of the electronic device.
Optionally, one of the two optical paths is that light emitted from one point of the object directly enters the light reflection device, and then enters the front camera after being reflected by the light reflection device; the other light path is that the light emitted by the point of the object is reflected by the screen of the electronic equipment, then emitted to the light reflection device, reflected by the light reflection device and then enters the front camera.
Optionally, there is a band of relatively dark areas, referred to as dark bands, in the image captured by the front camera, and the position of the prism relative to the front camera can be adjusted to reduce the extent of the dark bands.
Optionally, there is a darker area, called dark band, in the image captured by the front camera relative to the other areas, and the intelligent electronic portable device uses a brightness compensation method to remove the dark band before calculating the stereo vision information.
Optionally, the intelligent electronic device identifies an object in the image based on the obtained image, and performs interaction based on the identification result.
Optionally, in the case that the object is recognized as a pen, an angle between the pen body and the surface of the electronic screen is estimated, and the interactive operation of the intelligent electronic device is controlled based on the estimated angle.
Optionally, the human-computer interaction method further comprises recognizing hand motions and/or gestures of the user in combination with the depth information as input information of the user, and interacting with the user based on the input information.
Optionally, where the captured images are monocular RGB images, the method further comprises performing the following processing on the two captured monocular RGB images for gesture recognition: (1) performing correction processing to obtain standardized monocular RGB images; (2) obtaining a hand skin mask image using a skin color segmentation method; (3) computing depth pixel by pixel from the two standardized monocular RGB images to obtain a depth map; (4) combining the skin mask image and the depth map to obtain a segmented image of the hand region; (5) performing gesture recognition based on the segmented image of the hand region.
Optionally, the correction processing comprises performing color correction using the pixel color equation Output = Input × R + L, where Output is the color of each pixel in the output image, Input is the color of each pixel in the input image, R is a reflection factor that depends only on the physical properties of the screen surface, and L is the self-luminescence of the electronic device.
Optionally, L is set to zero, and the parameter R is then obtained by capturing an image of a white wall surface with the intelligent electronic device and fitting.
Optionally, the skin color segmentation algorithm comprises two modules: one uses hue and saturation thresholds in the image to segment skin regions, and the other dynamically recalibrates these thresholds every predetermined number of frames.
According to another aspect of the invention, an intelligent electronic device comprises a touchable surface and a sensor capable of capturing an image of a hand in contact with the touchable surface, wherein the device identifies the state of the hand when it touches the surface, including one or more of: identifying the specific part of the hand that touches the surface, identifying which finger is touching the surface, and identifying the angle of the finger relative to the surface.
Optionally, identifying the specific part of the hand comprises identifying one or more of a fingertip, a finger pad, a thenar, and a knuckle; the same action performed with different parts of the finger at the same position on the touchable surface indicates a corresponding operation on different objects.
Optionally, the sensor comprises a camera, and detection of the specific hand part is accomplished by shape-based image processing of the image signal captured by the camera.
Optionally, identifying which finger is involved comprises identifying one or more of the thumb, index finger, middle finger, ring finger, and little finger; the same action by different fingers on the same interface object represents a different operation on that object.
Optionally, identification of which finger is used is performed with a deep neural network.
Optionally, identifying the angle of the finger comprises: recognizing angles of the finger relative to the touchable surface throughout the range of 0 to 90 degrees; clicks at different finger angles, and adjustments of the finger angle after a click, are all used as information input.
Optionally, when the finger is detected touching the volume/brightness adjustment button, the volume/brightness level is adjusted based on the detected change in the angle of the finger relative to the screen.
Optionally, identifying the angle of the finger relative to the screen comprises: determining the position of the hand's click point in the image by coordinate transformation from the position of the screen capacitance signal; determining a click region in the image, the click region comprising the click position and a specific area above it; fitting the point cloud of the fingertip region in the depth map using a linear-regression prediction method; and thereby determining the angle of the clicking finger.
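A sketch of that estimate. The coordinate transform screen_to_image() and the window size are hypothetical, and the slope-to-angle step assumes depth and pixel units are commensurable (a real implementation would rescale):

```python
import numpy as np

def finger_angle(touch_xy, depth_map, win=20):
    u, v = screen_to_image(touch_xy)  # capacitance coords -> image coords (assumed)
    top, left = max(0, v - win), max(0, u - win // 2)
    region = depth_map[top:v, left:u + win // 2]  # click position plus area above it
    ys, xs = np.nonzero(region > 0)               # fingertip point cloud
    if ys.size < 2:
        return None
    # Linear regression of depth against image row approximates the finger axis.
    slope, _ = np.polyfit(ys.astype(np.float32),
                          region[ys, xs].astype(np.float32), 1)
    return float(np.degrees(np.arctan(abs(slope))))  # angle in 0..90 degrees
```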
Optionally, detection of the specific hand part is accomplished by shape-based image processing combining the screen capacitance signal with the image signal captured by the camera.
Optionally, the electronic device is one of a smartphone, a touch panel, and an in-vehicle device.
The scheme of one embodiment of the invention changes the field of view of the phone's existing camera using a single mirror (a plane mirror, a triangular prism, a convex mirror, etc.). The field of view of a camera on a traditional phone is limited and fixed; for example, a front camera can only acquire image information within a spatial range of about 60 to 80 degrees directly above the phone, so hand information from natural phone use cannot be captured, and much hand information valuable for interaction with the phone is ignored. By adding a low-cost mirror accessory and a corresponding algorithm, this scheme changes the field of view of the front camera: through the mirror's reflection, the existing camera can capture images of the user's hands during natural phone use, greatly widening the field of view of the phone's camera system.
The scheme of another embodiment provides the phone with a low-cost binocular vision system (only one mirror is needed): the two optical paths produced by the mirror in cooperation with the phone's existing camera construct a virtual binocular camera, restoring the three-dimensional information of objects in space.
The invention further provides interaction gestures based on the holding hand. Compared with the traditional touch-screen input mode, gesture interaction based on the holding hand enriches the phone's input modes and makes some interactions more convenient. Moreover, because of the body's proprioception, users can easily locate parts of their own body, so they can interact with a specific part of the holding hand without visual attention. The interaction mode of this embodiment better matches users' habits during natural phone use; the user does not need to deliberately lift a hand to make unnatural mid-air gestures. User experiments show that these interaction designs are well accepted, and at the same time engaging, easy to learn, and convenient.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
Figs. 1 and 2 are schematic diagrams showing that, using a mirror placed obliquely to the phone screen, the camera's capture range is changed to cover the space parallel and in close proximity to the phone screen, according to an embodiment of the present invention.
Fig. 3 shows an exemplary operation process of a man-machine interaction method of a mobile phone with a tilted mirror placed therein according to an embodiment of the present invention.
Fig. 4 shows a schematic view of a mobile phone in which a wide-angle camera is placed at the center of a screen, and fig. 5 shows a schematic view of a field of view in this case.
Fig. 6 is a schematic diagram showing a case that a liftable camera is mounted on the top of a mobile phone.
FIG. 7 is a schematic diagram showing an adjustable camera mounted on the edge of a mobile phone.
Fig. 8 is a schematic view showing how the camera of fig. 6 and 7 is mounted in comparison with the field of view of a conventional front camera of a cellular phone.
FIG. 9 is a diagram illustrating a scenario where a user clicks the side of the phone with the index finger and a cursor appears on the screen to facilitate the user's clicking on a distant object, according to an embodiment of the present invention.
Fig. 10 shows a schematic view of a scene in which, when the user picks up the phone with a picture-taking gesture, the camera app automatically opens and takes a picture.
FIG. 11 shows a schematic diagram of adjusting the UI layout, after identifying which hand the user is holding the phone with, so that the user can more easily click the target component.
FIG. 12 shows a schematic view of the application of the finger slider and finger button.
FIG. 13 shows a schematic diagram of thumb gestures of two hands holding a cell phone.
Fig. 14 shows an optical principle schematic diagram of a mirror matched with an existing camera of a mobile phone to build a virtual binocular camera.
Figure 15 shows a schematic of the relatively dark band of regions in an image captured by a phone's front camera when a triangular prism is used.
Detailed Description
So that those skilled in the art may better understand the present invention, the invention is described in detail below in conjunction with the accompanying drawings and specific embodiments.
Before the introduction, the meaning of several terms used in this text is explained.
Mirror: used here in a broad sense for any device with a light-reflecting function, such as a plane mirror, a triangular prism, or a convex mirror.
Hand skin mask image: an image containing only the hand region, with all other background removed; when combined with other images, it distinguishes and masks out unwanted parts.
"The user's hand above the screen while using the electronic device" refers to the hand within about 5 cm above the screen during use; such a hand is essentially never photographed by the front camera of a conventional mobile phone.
1. Human-computer interaction based on the holding hand
Gesture interaction on existing phones does not consider the user's hand posture while using the phone; the user must deliberately raise a hand and make unnatural gestures in mid-air.
The inventors observe that the holding hand is the body part closest to the phone; interaction based on the holding hand better matches the user's natural behavior while using the phone, can enrich the phone's input modes, and opens up new, convenient, and natural interaction possibilities.
The inventors further realized that conventional sensors are not well suited to capturing the holding hand. The prior art mostly uses the phone's built-in sensors, such as acceleration and gravity sensors, to sense hand information, and some research attaches an external ultrasonic sensor to the phone. These methods obtain only specific, simple hand information, such as whether a hand is near the phone, the direction of hand movement, or whether the phone is being shaken; they cannot obtain information about the holding hand.
According to one embodiment of the invention, a holding-hand interaction technique is provided: a camera capable of capturing images of the user's holding hand during phone use is installed on the phone; the images are captured, and gesture information about the holding hand is obtained using computer vision, so that not only can the holding gesture be recognized, but the positions of the holding fingers around the phone can also be tracked. Using this information, the phone executes corresponding control operations. This holding-hand interaction mode enriches the phone's input modes and makes many operations smarter and faster.
Regarding the form of the camera, it may be an existing camera equipped with an additionally configured mirror, or a fisheye camera.
According to one embodiment of the invention, the capture space of the existing camera is changed by a mirror placed obliquely above the phone's existing camera. Through the mirror's reflection, the camera obtains images of the user's hands during natural phone use; its capture range is changed to cover the space parallel and close to the phone screen, as shown in figs. 1 and 2. The front camera of an existing phone mainly captures objects in front of (or behind) the phone and cannot capture hand operations on the phone itself, such as those of the holding hand; its field of view is about 80 degrees around the axis perpendicular to the phone, as shown in the upper diagram of fig. 5. In contrast, in this embodiment the obliquely placed mirror reflects light parallel to the screen surface into the camera, so that when the user holds the phone, the camera captures images of the hand above the screen carried by light traveling parallel to the screen; the user's gesture is recognized from these images, and feedback is given based on the gesture.
An exemplary operation process of the human-computer interaction method for a phone with an obliquely placed mirror is shown in fig. 3 (and sketched in code below): in step 110, a hand image of the user holding the phone is captured; in step 120, the gesture is recognized using a computer vision algorithm; in step 130, the phone performs the corresponding operation.
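A minimal sketch of that loop; capture_frame(), recognize(), and the actions mapping are hypothetical stand-ins for steps 110, 120, and 130:

```python
def interaction_loop(camera, actions):
    while True:
        frame = camera.capture_frame()   # step 110: image of the holding hand
        gesture = recognize(frame)       # step 120: computer-vision gesture recognition
        if gesture in actions:
            actions[gesture]()           # step 130: phone executes the mapped operation
```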
Preferably, the camera is located at the edge of the phone (including the top and sides). Alternatively, the camera may be at various positions within the phone's surface; the ultimate goal is to capture image information of both of the user's hands during phone use. The camera's field of view may be fixed and dedicated to the user's hands, or adjustable so it can be aimed at the hands when needed. Several possible configurations are listed below:
(1) A camera installed in the phone screen; a wide-angle camera there can capture most of the user's hand actions during phone use:
the wide angle camera is placed in the center of the screen as shown in fig. 4, with the field of view in the range of 0 to 180 degrees in the normal plane to the transverse axis of the handset plane, with the line of sight parallel to the longitudinal axis of the handset plane being the zero degree line, as shown in the lower diagram in fig. 5.
(2) A liftable camera mounted on the phone; fig. 6 shows a schematic diagram of a liftable camera mounted on the top of the phone.
(3) An adjustable camera mounted on the edge of the phone; fig. 7 shows a schematic view of an adjustable camera mounted on the top of the phone.
Fig. 8 is a schematic diagram showing the comparison between the above installation manners (2) and (3) and the view range of the conventional front-facing camera of the mobile phone, wherein the upper diagram in fig. 8 is a schematic view of the view range of the conventional front-facing camera of the mobile phone, and the lower diagram in fig. 8 is a schematic view of the view ranges of the installation manners (2) and (3) according to the embodiment of the present invention.
In one example, an infrared illumination and infrared filter are also configured on the smart electronic portable device for use in conjunction with a camera to increase signal-to-noise ratio.
In one example, a depth camera is further disposed on the smart electronic portable device, so that depth information can be acquired, and a method for configuring the depth camera to acquire the depth information will be described in detail later.
In one example, the intelligent portable electronic device is provided with a light reflection device arranged obliquely to its screen. The reflection device redirects light parallel to the screen surface into the camera, so that when the user holds the device, the camera can capture an image of the hand above the screen during use; the device recognizes the user's hand motions and/or gestures from the captured image, treats them as user input, and interacts with the user.
Optionally, the light reflecting device is one of a lens, a prism, a convex mirror and a multi-lens or a combination thereof.
Optionally, the camera is a wide-angle camera.
Optionally, the wide-angle camera is placed in the center of the screen, and the optical axis direction of the wide-angle camera is perpendicular to the touch screen of the electronic device, and has a viewing angle of 170 degrees to 190 degrees.
Optionally, the camera is a liftable camera and/or an adjustable angle camera.
Optionally, the field of view of the liftable and/or angle-adjustable camera covers at least -40 degrees to 40 degrees in the plane normal to the transverse axis of the device plane, where the zero-degree line is the line of sight parallel to the longitudinal axis of the device plane.
Optionally, the intelligent electronic portable device is any one of a smart phone, an intelligent vehicle-mounted electronic device, and an intelligent tablet computer.
From the hand information captured by the camera, the device can recognize not only the holding gesture but also instantaneous or sustained holding actions, including tracking the positions of the holding fingers around the phone, which enables the following applications:
(1) Obtaining the positions and actions of the holding fingers. When a user holds the phone with one hand, the holding fingers still have room to move and remain flexible; lifting, moving, or tapping actions by different fingers can be used as interactive input. For example: the user clicks the side of the phone with the index finger, and a cursor appears on the screen to help the user click a distant object, as shown in fig. 9.
(2) Identifying the holding gesture. The holding gesture provides many valuable inputs to the phone, for example quick opening of apps: when the user picks up the phone with a picture-taking gesture, the camera app automatically opens and takes a picture, as shown in fig. 10.
(3) Recognizing which hand is holding the phone. Current smartphone screens are generally too large for one-handed use, especially when clicking content far away on the screen. After identifying which hand the user is holding the phone with, the UI layout can be adjusted so the user can more easily click the target component, as in FIG. 11.
2. New gestures based on actions between the fingers of two hands
According to one embodiment of the invention, the intelligent electronic device is provided with a sensor that can sense the user's hand while the device is in use. The device processes the data sensed by the sensor, recognizes actions between the fingers of the two hands as the user's gesture input to the portable device, and carries out the corresponding control operation.
The sensor here is capable of detecting the finger position, and is, for example, an image sensor (camera) or a capacitance sensor.
In one example, the sensor is a camera system mounted on the intelligent electronic device capable of capturing an image of the user's hand above the screen while using the electronic device.
It should be noted that the sensor is not limited to an image sensor (camera), but may be other types of sensors, such as a capacitive sensor on a touch screen.
According to one embodiment of the invention, a gesture interaction mode is designed based on the characteristics of the holding action of the hand on the mobile phone.
The holding action is divided into one-handed holding and two-handed holding.
When the phone is held in one hand, four fingers naturally protrude from the side and the thumb appears above or at the side of the phone; these exposed parts of the holding hand serve as sensable touch parts, and the user interacts with the phone by touching them. FIG. 12 shows the application of finger sliders and finger buttons: one hand holds the phone while the other touches it, a typical posture for phone interaction. In this case the holding hand itself can be treated as a touch interface. In the left diagram of fig. 12, a finger of the holding hand acts as a button, called a FingerButton; it lets the user tap a finger of the holding hand as an interaction for manipulating an application (for example, switching the brush color). The right diagram of fig. 12 shows a finger of the holding hand acting as a slider bar, which we call a FingerBar; it lets the user slide along the thumb of the holding hand to provide input to a one-dimensional control (e.g., controlling volume), as sketched below.
Both the FingerButton and FingerBar technologies reduce the operation steps required by the original interaction mode, and increase the media available for mobile phone input, thereby improving the interaction efficiency.
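A sketch of the FingerBar mapping from a slide position along the tracked holding finger to a one-dimensional control such as volume; the coordinates and ranges are assumptions:

```python
def fingerbar_to_volume(contact_pos, finger_start, finger_end, max_volume=100):
    # Normalize the touch position along the tracked finger segment to 0..1,
    # then scale to the control's range.
    span = max(finger_end - finger_start, 1e-6)
    t = min(max((contact_pos - finger_start) / span, 0.0), 1.0)
    return int(round(t * max_volume))
```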
When the phone is held in both hands, the two thumbs can appear above the phone screen at the same time; touch (e.g., thumbs touching) and motion (e.g., thumbs rotating) between the two thumbs serve as a way of interacting with the phone, called Thumb-to-Thumb gestures, used for mode switching or as simple shortcuts that trigger a second view. As shown in FIG. 13, an example enhances the typing experience: while filling in important information in a mail, the user may wish to consult another application page for a phone number or address. Currently the user must switch to the other application, try to memorize the key strings, and return to the input page, a cumbersome process. With Thumb-to-Thumb gestures, once contact between the two thumbs is detected, the system moves the screen of the previous application to a layer above the current one so the user can easily consult its content; after memorizing it, the user releases the two thumbs to return to the current page and continue text entry (a sketch follows). This provides an efficient, lightweight method of mode switching on a smartphone.
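A sketch of that switch, assuming a hypothetical thumb_tips() detector and UI overlay calls; the contact threshold is an assumed value:

```python
CONTACT_THRESHOLD = 30  # pixels; assumed thumb-contact distance

def update(frame, ui):
    tips = thumb_tips(frame)  # image positions of both thumb tips, when visible
    touching = (len(tips) == 2 and
                ((tips[0][0] - tips[1][0]) ** 2 +
                 (tips[0][1] - tips[1][1]) ** 2) ** 0.5 < CONTACT_THRESHOLD)
    if touching:
        ui.show_previous_app_overlay()   # consult the previous application's page
    else:
        ui.hide_previous_app_overlay()   # released: continue text entry
```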
Compared with a touch screen, holding-hand gesture interaction can enrich the phone's input modes and make some interactions more convenient. In addition, because of the body's proprioception, users can easily locate parts of their own body, so they can interact with a specific part of the holding hand without visual attention.
Compared with gesture interaction on the mobile phone, the interaction mode is more in line with the use habit of the user when the user uses the mobile phone naturally, the user does not need to lift the hand deliberately, and unnatural gestures are made in the space.
In addition, the holding gesture, finger actions while holding, and the like can be recognized: basic gestures such as clicking, sliding, zooming, and selecting on the plane extending from the phone; and, above the phone screen, fingers touching each other, making a fist, opening the hand, or moving along a specific path (such as drawing a circle with a finger).
In one example, a camera system in the intelligent portable device includes a front camera and a mirror or prism arranged obliquely to the screen surface. The mirror or prism reflects light parallel to the screen surface so the front camera captures it; the front camera thus captures an image of the user's hand above the screen while the user holds the device. The device recognizes the user's gesture in the holding state from the captured image, treats it as user input, and interacts with the user, performing at least one of the following recognitions: recognizing a one-handed holding gesture and recognizing touch actions of the other hand on the holding hand as gesture input for interaction with the user; and recognizing a two-handed holding gesture and recognizing actions between the thumbs of the two hands as gesture input for interaction with the user.
In one example, the electronic portable device recognizes a single-handed holding gesture by recognizing four fingers protruding laterally and one thumb present above or to the side of the electronic device.
In one example, considering the positions of the two thumbs on the device when it is held with both hands, the portable device recognizes a two-handed holding gesture by recognizing the thumb roots appearing at both sides of the device and the two thumb tips appearing above the screen.
In one example, in the case of one-handed holding, the touch actions of the other hand on the holding hand include: a finger button gesture, in which a finger of the holding hand is used as a button that the other hand taps; and a finger slider gesture, in which a finger of the holding hand is used as a slider on which the other hand slides or clicks.
In one example, the actions between the thumbs of the two hands include one or more of: the thumbs touching each other; the thumbs rotating around each other; each thumb moving along a certain path; and the thumbs touching and then moving along a certain path. The predetermined path may involve, for example, bending the thumb.
3. Mobile phone binocular system based on virtual cameras
A light reflection device such as a mirror surface (a lens, prism, convex mirror, multi-faced mirror combination, etc.) is placed above the phone's camera. The combination of camera and mirror provides the phone with a low-cost binocular vision system (only one mirror is needed): the two optical paths produced by the mirror surface and the phone's existing camera construct a virtual binocular camera (virtual camera 1 and virtual camera 2), from which three-dimensional information of objects in the captured space is restored.
In a typical stereoscopic vision system, there are typically two cameras shooting the same scene. The mirror is matched with the existing camera of the mobile phone, and a virtual binocular camera is built by using the optical principle illustrated in fig. 14, so that three-dimensional information of an object in the visual field range of the camera is restored.
The mirror may be a plane mirror, a triangular prism, a convex mirror, etc. A triangular prism is preferred because it supports total internal reflection, so the imaging quality obtained with a triangular prism is higher than with a plane mirror.
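For reference (a standard optics fact, not a figure from the patent): total internal reflection at the glass-air interface occurs beyond the critical angle, which for a typical prism glass with an assumed refractive index n ≈ 1.5 is about 42 degrees:

\theta_c = \arcsin\frac{n_{\text{air}}}{n_{\text{glass}}} = \arcsin\frac{1}{1.5} \approx 41.8^\circ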
As shown in fig. 14, the existing camera of the mobile phone forms two virtual cameras by reflection of the mirror surface and the mobile phone screen, and light emitted by an object enters the existing cameras through two optical paths respectively.
Optical path 1: light from the object enters the camera directly via one mirror reflection. Optical path 2: light from the object is first reflected by the phone screen, then strikes the mirror surface, and finally enters the camera. Accordingly, virtual camera 1 sits slightly above the touch screen, the result of the single reflection at the prism's sloped mirror face (path 1), while virtual camera 2 is produced by twice-reflected light (path 2), whose first reflection occurs at the bottom of the triangular prism or on the phone screen before the light reflects off the prism's sloped face. The two virtual cameras are parallel to the phone screen and together form a stereoscopic vision system.
The two different optical paths give the phone a binocular view and construct a binocular system, so the depth of objects in space can be computed by a computer vision algorithm (a sketch follows) and the three-dimensional information of objects in space can be acquired.
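A sketch of that depth computation using the standard stereo relation Z = f·B/d (focal length f in pixels, baseline B, disparity d), assuming the virtual pair has already been rectified; the focal length, baseline, and matcher parameters here are assumed values:

```python
import cv2
import numpy as np

def depth_from_virtual_pair(left_gray, right_gray, focal_px=600.0, baseline_m=0.01):
    stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
    disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0
    valid = disparity > 0
    depth = np.where(valid, focal_px * baseline_m / np.maximum(disparity, 1e-6), 0.0)
    return depth  # metres; 0 where the disparity is invalid
```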
In one example, the intelligent electronic device identifies an object based on the obtained image and interacts based on the result of the identification.
For example, when the device recognizes the object as a pen, it estimates the angle between the pen body and the screen surface and controls interaction based on the estimated angle; or it tracks the trajectory of the pen tip moving on the desktop and controls interaction based on the recognized trajectory, e.g., a pen-tip click at a specific desktop position, or movement along a certain trajectory, indicates a click at the corresponding screen position or a slide along the corresponding trajectory.
The intelligent electronic device can also judge the user's identity from the recognized body shape or clothing and control interaction with the user accordingly. For example, if the device identifies the user as the owner or manager, more authorization is granted; if the user is identified as not a regular user of the device, fewer functions are enabled or the owner is notified.
In one example, the intelligent electronic device also identifies the user's surroundings and adjusts its settings to the situation. For example, when the user is in a meeting room or a movie theater, the device judges that the user is in a meeting-like environment and can automatically lower its volume or mute itself so as not to disturb others; if the user is in a driving environment, the electronic device closes all entertainment software, and so on.
According to one embodiment of the invention, the virtual binocular camera can capture images of objects within a range of at least 5 centimeters to the left and right, and at least 10 centimeters upward in the longitudinal direction, taking the bottom of the mobile phone as the starting point.
The embodiment of the invention designs an optical path structure suited to a mobile phone by combining a single prism with the phone's conventional front-facing camera: the single prism lens, the phone screen, and the front camera together provide stereoscopic vision, and the front camera's view is effectively redirected to shoot parallel to the screen.
Two virtual cameras are thereby created, but the resulting image quality is not the same. The light entering virtual camera 1 (optical path 1) is totally internally reflected by the prism's inclined face, so the image quality is indistinguishable from an image captured directly by the camera.
The situation of virtual camera 2 is more complicated. The first reflection of this optical path occurs either at the base of the prism or on the phone screen. At the prism base, because the refractive index of the prism glass is higher than that of air, "total internal reflection" occurs: light can hardly pass through the contact surface between prism and screen (air is necessarily present at this interface) and is instead reflected entirely inside the prism. This portion therefore produces a high-quality image with the same brightness and sharpness as the image in virtual camera 1. In the latter case, where the first reflection occurs on the phone screen, the light is attenuated because the screen's reflectivity is low. This produces a relatively dark band in the image captured by the front camera, as shown in fig. 15. By adjusting the relative position of the prism and the camera, the width of the dark band can be minimized. Below we describe how a luminance compensation function removes the dark portions before the stereo vision algorithm is run.
According to one embodiment of the present invention, for a captured binocular image (two monocular RGB images obtained simultaneously), the following processing may be performed:
(1) performing a correction process in which a luminance compensation function removes the dark portions, resulting in standardized monocular RGB images;
(2) obtaining a hand-skin mask image using a skin-color segmentation method;
(3) computing depth pixel by pixel from the two standardized monocular RGB images to obtain a depth map;
(4) combining the skin mask image and the depth map to obtain a segmented image of the hand region;
(5) performing gesture recognition based on the segmented image of the hand region.
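For orientation, the five steps can be wired together as in the sketch below; the step implementations are passed in as callables because the patent does not fix concrete models for them (the stereo and skin-detection sketches elsewhere in this description are possible choices):

from typing import Callable
import numpy as np

def process_binocular_frame(
    raw_left: np.ndarray,
    raw_right: np.ndarray,
    correct: Callable,        # step (1): luminance compensation
    segment_skin: Callable,   # step (2): skin-color segmentation
    compute_depth: Callable,  # step (3): stereo depth estimation
    classify: Callable,       # step (5): gesture classifier
):
    left, right = correct(raw_left), correct(raw_right)      # (1) standardize
    skin_mask = segment_skin(left)                           # (2) hand-skin mask
    depth_map = compute_depth(left, right)                   # (3) depth map
    hand_depth = np.where(skin_mask > 0, depth_map, np.nan)  # (4) hand region only
    return classify(hand_depth)                              # (5) recognize gesture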
In one example, the correction process may include:
(1) color correction is performed using the per-pixel color equation Output = Input × R + L,
(2) where Output is the color of each pixel in the output image, Input is the color of each pixel in the input image, R is a reflection factor that depends only on the physical properties of the screen surface, and L is the self-luminescence of the mobile phone.
In one example, L is set to zero (a reasonable setting since the camera is very close to the phone's surface, so the value of L is almost zero under normal lighting conditions), and the parameter R is obtained by fitting, using images of a white wall captured with the intelligent electronic device. In a more specific example, several images of white walls were acquired with a prototype, and for each pixel the parameter R was fitted with a least-squares model and used to remove the dark areas in each frame.
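A minimal sketch of this fitting procedure follows, under the stated assumption L = 0. One reading of the model is then observed = scene × R per pixel, so the fitted R can be divided out; this inversion direction, the reference white level, and the array shapes are assumptions introduced for illustration:

import numpy as np

def fit_reflection_factor(wall_frames, white_level=255.0):
    """Least-squares fit of the per-pixel reflection factor R.

    wall_frames: list of float images of a white wall, each of shape (H, W, 3).
    For observed = white_level * R, least squares over N frames gives
    R = mean(observed) / white_level at every pixel.
    """
    stack = np.stack(wall_frames).astype(np.float32)   # (N, H, W, 3)
    return stack.mean(axis=0) / white_level            # per-pixel R, (H, W, 3)

def compensate(frame, R, eps=1e-3):
    """Undo the dark band: with L = 0, corrected = observed / R."""
    corrected = frame.astype(np.float32) / np.maximum(R, eps)
    return np.clip(corrected, 0, 255).astype(np.uint8)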
Regarding skin segmentation, the task is to eliminate as much of the background area as possible in the acquired image, under different illumination conditions and across a range of hues and saturations. Although extensive prior research exists on skin detection, existing solutions (e.g., the color-based pixel classification studied in Document 1) do not meet the requirements of this particular scenario: their goal is to detect all possible human skin colors in the same image, whereas here only the hand skin of a single user needs to be detected. Furthermore, the skin detection module must be computationally efficient enough for real-time interaction.
Based on these considerations, the inventors propose a skin detection algorithm divided into two modules: one module segments skin regions simply by thresholding hue and saturation in the image, and the other dynamically recalibrates these thresholds every few frames. Specifically, a convolutional neural network such as that described in Document 5 is trained, using the datasets provided in Documents 1 to 4, to identify the pixels of the user's hand skin as a semantic segmentation task. The skin detection algorithm of this embodiment can efficiently detect the user's hand skin in real time under different illumination conditions and within a certain range of hue and saturation, eliminating the background area of the acquired image as far as possible.
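The threshold-based module might look like the sketch below; the hue and saturation bounds are illustrative starting values that the CNN-based module would re-estimate every few frames:

import cv2
import numpy as np

def segment_skin(bgr_frame, h_range=(0, 25), s_range=(40, 255)):
    """Segment likely skin pixels by hue/saturation thresholds (OpenCV HSV)."""
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    lower = np.array([h_range[0], s_range[0], 0], dtype=np.uint8)
    upper = np.array([h_range[1], s_range[1], 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Morphological opening removes isolated background pixels that happen
    # to fall inside the thresholds.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)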
The list of cited documents is as follows:
Document 1: S. L. Phung, A. Bouzerdoum, and D. Chai. 2005. Skin segmentation using color pixel classification: Analysis and comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1 (Jan 2005), 148-154. https://doi.org/10.1109/TPAMI.2005.17
Document 2: Tomasz Grzejszczak, Michal Kawulok, and Adam Galuszka. 2016. Hand landmarks detection and localization in color images. Multimedia Tools and Applications 75, 23 (2016), 16363-16387. https://doi.org/10.1007/s11042-015-
Document 3: Michal Kawulok, Jolanta Kawulok, Jakub Nalepa, and Bogdan Smolka. 2014. Self-adaptive algorithm for segmenting skin regions. EURASIP Journal on Advances in Signal Processing 2014, 170 (2014), 1-22. https://doi.org/10.1186/1687-
Document 4: Jakub Nalepa and Michal Kawulok. 2014. Fast and Accurate Hand Shape Classification. In Beyond Databases, Architectures, and Structures, Stanislaw Kozielski, Dariusz Mrozek, Pawel Kasprowski, Bozena Malysiak-Mrozek, and Daniel Kostrzewa (Eds.). Communications in Computer and Information Science, Vol. 424. Springer, 364-373. https://doi.org/10.1007/978-3-319-
Document 5: ahundt, aurora95, unixnme, and Pavlos Melissinos. 2018. Keras-TensorFlow implementation of Fully Convolutional Networks for Semantic Segmentation. https://github.com/aurora95/Keras-FCN
An intelligent electronic portable device with depth-information acquisition and processing can combine the depth information to effectively recognize the user's hand movements and/or gestures, for example: the holding gesture used to grip the electronic device; finger motion while holding the electronic device; the part of the hand used to click the screen; the hand gesture used to click the screen; clicking, sliding, zooming, and selecting with the hand on the plane extending from the electronic device; and actions of the fingers above the screen such as the fingers touching each other, making a fist, opening the hand, and moving along a specific path.
In one example, the smart electronic portable device recognizes a single-handed holding gesture and recognizes a touch action of the other hand on the holding hand as a gesture input to the portable device for interaction with the user.
In one example, the smart electronic portable device recognizes a single-handed holding gesture by recognizing four fingers protruding laterally and one thumb present above or to the side of the electronic device.
In one example, the portable device recognizes a two-handed holding gesture and recognizes the motion between the two thumbs as a gesture input to the portable device for interaction with the user.
The action between the thumbs of the two hands may include one or more of the following: the thumbs touch each other; the thumbs rotate around each other in a wheel-like motion; each thumb moves along a certain path; the thumbs touch at the pads and then move along a certain path.
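By way of illustration, once the two thumb tips have been located in three dimensions (for example from the depth map), the "thumbs touch each other" action reduces to a simple distance test; the contact threshold below is an assumption:

import numpy as np

def thumbs_touching(tip_left, tip_right, threshold_m=0.01):
    """tip_left, tip_right: 3-D thumb-tip positions in meters."""
    gap = np.linalg.norm(np.asarray(tip_left) - np.asarray(tip_right))
    return gap < threshold_m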
Fourth, an intelligent electronic device that identifies the hand operating the screen
According to one embodiment of the present invention, an intelligent electronic device is provided that comprises a touchable surface (such as a touch screen or a touch pad, e.g., the touch pad of a notebook computer) and has sensors capable of capturing images of the user's hand above the screen while the electronic device is in use, wherein the intelligent electronic device identifies the state of the hand touching the screen, including one or more of: identifying the specific part of the hand involved, identifying which finger is involved, and identifying the direction of the finger.
The term "hand image above the screen when the user uses the electronic device" refers to that when the traditional electronic device is operating the screen of the electronic device with the hand, the camera arranged on the electronic device cannot capture the image of the hand operating the screen, and the camera system (the system here may include the traditional front camera and the auxiliary device arranged, such as a plane mirror, a triangular prism, etc.) arranged in the embodiment of the present invention can capture the image of the hand operating the screen.
The sensor here may be an ordinary camera (i.e., a monocular camera), a binocular camera, or an infrared camera.
The sensors here may be on the side, below, or above the touchable surface.
Identifying a particular part of the hand may include identifying the fingertip, the finger pad, the thenar, and the knuckles. The same action performed at the same position by different parts of the hand represents the corresponding operation applied to different objects.
As an example, a fingertip touch represents an operation on a fine-grained target such as text or a brush: sliding a fingertip over text indicates selecting the text in that area for an operation; using the brush tool with the fingertip switches the current brush to a small brush.
By way of example, finger-pad touches represent ordinary user operations, such as manipulating icons or menu options on the device screen to select commands, call up files, launch programs, or perform other everyday tasks.
As an example, a thenar touch represents an operation at the application level, such as: long-pressing the thenar on the screen to return to the home page; sliding the thenar left and right on the screen to switch the current application.
As an example, knuckle touches represent screenshot operations: double-clicking the screen with a knuckle captures the current screen; drawing a circle on the screen with a knuckle captures the image inside the circle.
As an example, detecting the specific part of the hand combines the screen's capacitance signal with the image signal captured by the camera, using conventional image processing techniques such as morphological detection. Specifically, the procedure may be as follows: first, the position of the hand's click point in the image is determined by coordinate transformation from the position of the screen capacitance signal; a click region (comprising the click position and a small region above it) is then determined in the image; and the specific part of the hand doing the clicking is determined from multi-modal information such as the geometric features of the fingers in that region (depth information such as inclination angle, joint positions, and joint bending direction) and the features of the capacitance signal (such as contact area and the inclination of the contact ellipse).
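A hedged sketch of this multi-modal fusion is shown below; the homography H, the crop sizes, and the classifier interface are all assumptions introduced for illustration, since the patent does not fix a concrete model:

import numpy as np

def classify_touch_part(touch_xy, image, depth_map, H, classifier):
    """Decide which hand part (fingertip, finger pad, thenar, knuckle) touched.

    touch_xy: touch coordinate reported by the capacitive screen.
    H: 3x3 homography mapping screen coordinates to image coordinates
       (obtained from a one-time calibration).
    classifier: any trained model with a predict(patch, depth_patch) method.
    """
    # (1) screen coordinates -> image coordinates via the homography
    p = H @ np.array([touch_xy[0], touch_xy[1], 1.0])
    u, v = int(p[0] / p[2]), int(p[1] / p[2])
    # (2) crop the click region: the touch point plus a small region above it
    patch = image[max(v - 48, 0):v + 16, max(u - 32, 0):u + 32]
    depth_patch = depth_map[max(v - 48, 0):v + 16, max(u - 32, 0):u + 32]
    # (3) classify the touching part from geometric and depth features
    return classifier.predict(patch, depth_patch)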
As an example, identifying which finger is involved may include distinguishing the thumb, index finger, middle finger, ring finger, and little finger.
As an example, the same action performed by different fingers on the same object may represent different operations on that object: a middle-finger click on a file may represent copy and a ring-finger click paste; an index-finger click on the WeChat icon opens WeChat, while a middle-finger click on the same icon opens the scan function within WeChat.
For example, long-pressing the home screen with different fingers may act as shortcut keys that open corresponding applications: a long press with the index finger may open WeChat, a long press with the middle finger may open a payment application, and so on.
Furthermore, different fingers may represent different tools, for example the index finger for a brush and the middle finger for an eraser.
As an example, identifying the angle of the finger may include recognizing any angle of the finger relative to the screen from 0 to 90 degrees. Clicks at different angles, or adjusting the finger's angle after a click, can serve as information input. For example, after the finger touches the volume/brightness adjustment button, the volume/brightness is adjusted by changing the angle of the finger relative to the screen.
As an example, identifying the angle of the finger may be performed as follows: determine the click point's position in the image by coordinate transformation from the position of the screen capacitance signal; determine a click region in the image (comprising the click position and a small region above it); fit the point cloud of the fingertip region in the depth map using a linear regression method; and from the fit, determine the angle of the clicking finger.
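As one concrete reading of "fit the fingertip point cloud with linear regression", the sketch below fits the cloud's principal axis (via a singular value decomposition, one way to realize the regression) and measures its inclination against the screen plane, assumed here to be the plane z = 0:

import numpy as np

def finger_angle_deg(points):
    """points: (N, 3) fingertip point cloud; returns the angle to the screen in degrees."""
    centered = points - points.mean(axis=0)
    # The first right-singular vector is the dominant direction of the cloud,
    # i.e. the axis of the finger.
    _, _, vt = np.linalg.svd(centered.astype(np.float64), full_matrices=False)
    direction = vt[0]   # unit vector
    # The angle between a line and the plane z = 0 is arcsin(|d_z| / |d|).
    return float(np.degrees(np.arcsin(abs(direction[2]))))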
The scheme of the invention can change the field of view of the phone's existing camera using a single mirror surface (such as a lens or prism). The field of view of a conventional phone camera is limited and fixed: a front-facing camera, for example, can only acquire image information within a spatial range of roughly 60 to 80 degrees directly above the phone, so it cannot capture the user's hands during natural phone use, and much hand information valuable for interaction is lost. The present scheme changes the front camera's field of view by adding a low-cost mirror accessory and a corresponding algorithm, so that through mirror reflection the existing camera can capture images of the user's hands during natural use, greatly widening the field of view of the phone's camera system.
The scheme of the invention provides the phone with a low-cost binocular vision system (only one lens is needed): a virtual binocular camera is constructed from the two optical paths generated by the lens working with the phone's existing camera, restoring the three-dimensional information of objects in space.
The invention also provides interaction gestures based on the holding hand. Compared with the traditional touch-screen input of a mobile phone, gesture interaction based on the holding hand enriches the phone's input modes and makes some interactions more convenient. Moreover, because of human proprioception, users can easily locate parts of their own body, so they can touch a specific part of the holding hand to interact without visual attention. The interaction mode of this embodiment better matches users' habits during natural phone use, without requiring them to deliberately raise a hand and make unnatural gestures in the air. User experience studies show that these interaction designs are well accepted by users and are at the same time engaging, easy to learn, and convenient.
In the foregoing, a mobile phone is taken as the example of the smart electronic portable device, but other smart electronic portable products that can be held in the hand, such as smart vehicle-mounted electronic devices and smart tablet computers, may also be used.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. An intelligent electronic device having a sensor capable of sensing information on the user's hand above the screen while the user uses the electronic device, wherein
the intelligent electronic device processes the data sensed by the sensor, recognizes actions between the fingers of the two hands, uses them as the user's gesture input to the intelligent electronic device, and performs the corresponding control operation,
the intelligent electronic device also identifies, based on the obtained image of the user's hand while holding the electronic device, the user's gesture in the current holding state, which serves as the user's input information for interaction,
wherein the intelligent electronic device performs at least one of the following gesture recognitions:
the intelligent electronic device recognizes a single-handed holding gesture and recognizes a touch action of the other hand on the holding hand as a gesture input to the intelligent electronic device for interaction with the user;
the intelligent electronic device recognizes a two-handed holding gesture and recognizes an action between the two thumbs as a gesture input to the intelligent electronic device for interaction with the user.
2. The intelligent electronic device of claim 1, wherein the sensor is a camera system mounted on the intelligent electronic device, capable of capturing an image of the user's hand above the screen while the user uses the electronic device.
3. The intelligent electronic device according to claim 2, wherein the camera system comprises a front-facing camera and a lens or triangular prism arranged at an incline to the surface of the electronic device screen,
the lens or triangular prism reflects light parallel to the surface of the electronic device screen so that it is captured by the front-facing camera,
the front-facing camera captures the light reflected by the lens or triangular prism, thereby capturing an image of the user's hand above the screen while the user holds and uses the electronic device;
the intelligent electronic device identifies, based on the acquired image of the user's hand while holding the electronic device, the user's gesture in the current holding state, which serves as the user's input information for interaction,
wherein the intelligent electronic device performs at least one of the following gesture recognitions:
the intelligent electronic device recognizes a single-handed holding gesture and recognizes a touch action of the other hand on the holding hand as a gesture input to the intelligent electronic device for interaction with the user;
the intelligent electronic device recognizes a two-handed holding gesture and recognizes an action between the two thumbs as a gesture input to the intelligent electronic device for interaction with the user.
4. The intelligent electronic device of claim 3, the intelligent electronic device recognizing a single-handed holding gesture by recognizing four fingers protruding laterally and one thumb present above or to the side of the electronic device.
5. The intelligent electronic device of claim 3, the intelligent electronic device recognizing a two-handed holding gesture by recognizing thumb roots appearing on both sides of the intelligent electronic device and two thumb heads appearing above the screen.
6. The intelligent electronic device according to claim 3, wherein in the case of one-handed holding, the touch action of the other hand on the holding hand comprises: a finger-button gesture, in which a finger of the holding hand serves as a button that the other hand touches, and a finger-slider gesture, in which a finger of the holding hand serves as a slider on which the other hand slides or clicks.
7. The intelligent electronic device of claim 3, the action between the thumbs of the two hands comprising one or more of:
the thumbs touch each other;
thumb wheel rotation;
the thumbs respectively act according to a certain path;
the thumb touches the pad and then moves along a certain path.
8. A human-computer interaction method for an intelligent electronic device having a sensor capable of capturing an image of the user's hand above the screen while the user uses the electronic device, wherein the human-computer interaction method comprises:
the intelligent electronic device recognizes actions between the fingers of the two hands, uses them as the user's gesture input to the intelligent electronic device, and performs the corresponding control operation,
wherein the intelligent electronic device identifies, based on the obtained image of the hand while the user holds the electronic device, the user's gesture in the current holding state, which serves as the user's input information for interaction,
wherein recognizing the user's gesture in the current holding state includes performing at least one of the following gesture recognitions:
recognizing a single-handed holding gesture and recognizing a touch action of the other hand on the holding hand as a gesture input to the intelligent electronic device for interaction with the user;
recognizing a two-handed holding gesture and recognizing an action between the two thumbs as a gesture input to the intelligent electronic device for interaction with the user.
9. The human-computer interaction method of claim 8, wherein the sensor is a camera system mounted on the intelligent electronic device, and the camera system is capable of capturing an image of a user's hand above a screen when the user is using the electronic device.
10. The human-computer interaction method according to claim 9, wherein the camera system comprises a front-facing camera and a lens or triangular prism arranged at an incline to the surface of the intelligent electronic device screen,
the lens or triangular prism reflects light parallel to the surface of the electronic device screen so that it is captured by the front-facing camera,
the front-facing camera captures the light reflected by the lens or triangular prism, thereby capturing an image of the user's hand above the screen while the user holds and uses the electronic device.
11. The human-computer interaction method of claim 10, wherein the intelligent electronic device recognizes a single-handed holding gesture by recognizing four fingers protruding laterally and one thumb present above or to the side of the intelligent electronic device.
12. The human-computer interaction method of claim 10, wherein the intelligent electronic device recognizes the two-handed holding gesture by recognizing the thumb roots appearing on both sides of the intelligent electronic device and the two thumb heads appearing above the screen.
13. The human-computer interaction method according to claim 10, wherein in the case of one-handed holding, the touch action of the other hand on the holding hand comprises: a finger-button gesture, in which a finger of the holding hand serves as a button that the other hand touches, and a finger-slider gesture, in which a finger of the holding hand serves as a slider on which the other hand slides or clicks.
14. The human-computer interaction method of claim 8, said action between the thumbs of the hands comprising one or more of:
the thumbs touch each other;
thumb wheel rotation;
the thumbs respectively act according to a certain path;
the thumb touches the pad and then moves along a certain path.
CN201910154813.9A 2019-03-01 2019-03-01 Intelligent electronic equipment gesture capturing and recognizing technology based on action between fingers of two hands Active CN109960406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910154813.9A CN109960406B (en) 2019-03-01 2019-03-01 Intelligent electronic equipment gesture capturing and recognizing technology based on action between fingers of two hands

Publications (2)

Publication Number Publication Date
CN109960406A (en) 2019-07-02
CN109960406B (en) 2020-12-08

Family

ID=67023933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910154813.9A Active CN109960406B (en) 2019-03-01 2019-03-01 Intelligent electronic equipment gesture capturing and recognizing technology based on action between fingers of two hands

Country Status (1)

Country Link
CN (1) CN109960406B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111107271B (en) * 2019-12-31 2022-02-08 维沃移动通信有限公司 Shooting method and electronic equipment
CN111258427A (en) * 2020-01-17 2020-06-09 哈尔滨拓博科技有限公司 Blackboard control method and control system based on binocular camera gesture interaction

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201035553Y (en) * 2007-04-10 2008-03-12 北京汇冠新技术有限公司 Light path structure of touch panel using camera and reflector
US8237807B2 (en) * 2008-07-24 2012-08-07 Apple Inc. Image capturing device with touch screen for adjusting camera settings
CN101770317B (en) * 2010-03-16 2015-11-25 南京方瑞科技有限公司 Touch electronic whiteboard
CN103838500A (en) * 2012-11-20 2014-06-04 联想(北京)有限公司 Operand set displaying method and electronic equipment
CN103164160A (en) * 2013-03-20 2013-06-19 华为技术有限公司 Left hand and right hand interaction device and method
CN103955275B (en) * 2014-04-21 2017-07-14 小米科技有限责任公司 Application control method and apparatus
US9419971B2 (en) * 2014-07-08 2016-08-16 International Business Machines Corporation Securely unlocking a device using a combination of hold placement and gesture

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102522049A (en) * 2011-11-22 2012-06-27 苏州佳世达电通有限公司 Flexible display device
CN104951052A (en) * 2014-03-24 2015-09-30 联想(北京)有限公司 Information processing method and electronic equipment
CN106815504A (en) * 2015-11-27 2017-06-09 北京奇虎科技有限公司 The method and terminal of a kind of unlock terminal
CN206323415U (en) * 2016-10-25 2017-07-11 深圳奥比中光科技有限公司 A kind of electronic equipment with rotatable camera
CN106775405A (en) * 2016-12-16 2017-05-31 广东欧珀移动通信有限公司 A kind of touch-screen false-touch prevention method of mobile terminal, device and mobile terminal
CN106873895A (en) * 2016-12-30 2017-06-20 努比亚技术有限公司 The implementation method and device of a kind of edge gesture operation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on User-Experience-Oriented Smartphone Software Interface Design; Zhang Dongmeng; China Master's Theses Full-text Database, Information Science and Technology; 2016-06-15 (No. 6); pp. I138-924 *

Also Published As

Publication number Publication date
CN109960406A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
CN110929651B (en) Image processing method, image processing device, electronic equipment and storage medium
WO2021135601A1 (en) Auxiliary photographing method and apparatus, terminal device, and storage medium
CN109683716B (en) Visibility improvement method based on eye tracking and electronic device
US8666115B2 (en) Computer vision gesture based control of a device
US10477090B2 (en) Wearable device, control method and non-transitory storage medium
WO2021035646A1 (en) Wearable device and control method therefor, gesture recognition method, and control system
CN109947243B (en) Intelligent electronic equipment gesture capturing and recognizing technology based on touch hand detection
US20180240213A1 (en) Information processing system, information processing method, and program
CN107368810A (en) Method for detecting human face and device
CN110035218B (en) Image processing method, image processing device and photographing equipment
US20140053115A1 (en) Computer vision gesture based control of a device
EP3617851B1 (en) Information processing device, information processing method, and recording medium
CN110188658A (en) Personal identification method, device, electronic equipment and storage medium
CN109960406B (en) Intelligent electronic equipment gesture capturing and recognizing technology based on action between fingers of two hands
CN109993059B (en) Binocular vision and object recognition technology based on single camera on intelligent electronic equipment
JP5569973B2 (en) Information terminal device, method and program
KR102118421B1 (en) Camera cursor system
CN111988522B (en) Shooting control method and device, electronic equipment and storage medium
KR100749033B1 (en) A method for manipulating a terminal using user's glint, and an apparatus
CN110442242B (en) Intelligent mirror system based on binocular space gesture interaction and control method
JP6161244B2 (en) Portable terminal device, program, and input method
JP2023179345A (en) Information input method, information input device, electronic equipment, and storage medium
WO2018150757A1 (en) Information processing system, information processing method, and program
CN109559280B (en) Image processing method and terminal
CN109963034A (en) The capture of intelligent electronic device gesture and identification technology based on grip hand detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant