CN115061577A - Hand projection interaction method, system and storage medium


Info

Publication number
CN115061577A
Authority
CN
China
Prior art keywords
hand
projection
user
picture
bone joint
Prior art date
Legal status
Granted
Application number
CN202210958436.6A
Other languages
Chinese (zh)
Other versions
CN115061577B (en)
Inventor
冯翀
陈铁昊
王雪灿
王宇轩
张梓航
张梦遥
郭嘉伟
Current Assignee
Beijing Shenguang Technology Co ltd
Original Assignee
Beijing Shenguang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Shenguang Technology Co ltd
Priority to CN202210958436.6A
Publication of CN115061577A
Application granted
Publication of CN115061577B
Legal status: Active

Classifications

    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06N 3/02, G06N 3/08: Neural networks; learning methods
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/34: Smoothing or thinning of the pattern; morphological operations; skeletonisation
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/764: Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 40/107: Static hand or arm
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30196: Human being; person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a hand projection interaction method, system and storage medium. The method comprises: acquiring pictures of a user's hand; identifying and tracking a plurality of key bone joint points of the hand; calculating projection conversion parameters and adjusting the projected picture according to them so that the projected picture is anchored to the hand; when a fingertip clicks the projected picture, acquiring a first longitudinal depth value corresponding to the coordinates of the fingertip's key bone joint points, a second longitudinal depth value corresponding to an operation button in the projected picture, and a first transverse distance between the coordinates of the fingertip's key bone joint points and the coordinates of the operation button; when the difference between the longitudinal depth values reaches a first threshold and the first transverse distance reaches a second threshold, determining that the user has performed a click event and issuing a projection change instruction according to the operation button clicked by the user; and updating the projection content according to the projection change instruction to complete the interaction with the user. The invention solves the problem of poor projection interaction experience in narrow spaces.

Description

Hand projection interaction method, system and storage medium
Technical Field
The application relates to the technical field of human-computer interaction, in particular to a hand projection interaction method, a hand projection interaction system and a storage medium.
Background
Common interaction technologies generally fall into purely virtual, purely real and virtual-real schemes:
Purely virtual: the user performs interactive behaviors entirely within the design of electronic devices such as acquisition devices and tablets; both the operation and the feedback are virtual.
Purely real: the user interacts with physical items by following certain rules, as in a traditional board game.
Virtual-real interaction: general interaction in which the user clicks buttons on an electronic device; this is not flexible and requires the support of the electronic device.
Traditional hand interaction: interaction methods such as hand-worn smartwatches, in which interaction is generally performed through a component mounted on the user's hand, effectively requiring both hands to interact with the component; there are also interaction modes based on gesture control, which require the user to perform specific actions to interact.
These prior art techniques have a number of drawbacks:
1. Purely virtual interaction is limited by the screen of the electronic device, and interaction based on an electronic device has a certain impact on the user's eyes and is not healthy enough.
2. Purely real interaction has no eye-strain problem, but because it is based on real objects, realizing special effects or expanding the interaction content usually requires purchasing valuable physical items.
3. Virtual-real interaction schemes break through the limits of purely virtual and purely real technologies, but still need a certain platform as assistance, for example the projector in traditional virtual-real interaction or the projection device in upgraded virtual-real interaction, which to some extent obstructs the interaction: a large space is required for support.
4. Although traditional hand interaction technology makes use of the space of the hand, it still has problems: two-handed operation is required, which is not feasible in some scenarios; the operable space is small, amounting only to the small screen worn on the hand; and gesture control is not accurate enough, allows only qualitative control, and is not flexible enough.
No effective solution has yet been proposed for the technical problems of limited user interaction scenarios and poor experience in the prior art.
Disclosure of Invention
The embodiments of the present application provide a hand projection interaction method, system and storage medium, which at least solve the technical problem in the prior art of poor interaction experience in arbitrary spaces, especially narrow spaces.
According to one aspect of the embodiments of the present application, a hand projection interaction method, a hand projection interaction system and a storage medium are provided.
A hand projection interaction method, comprising the steps of:
s01, acquiring hand pictures of a user, wherein the hand pictures comprise a first hand picture and a second hand picture;
s02, identifying a plurality of key bone joint points of the hand, and tracking a plurality of key bone joint points near the finger tip of the second hand;
s03, calculating projection conversion parameters according to the plurality of key bone joint points identified and tracked in the step S02, and adjusting a projection picture according to the projection conversion parameters to realize anchoring of the projection picture and the first hand;
s04, when the user clicks the projection picture of the first hand by using the finger tip of the second hand, starting to obtain: a first longitudinal depth value corresponding to the coordinates of a plurality of key bone joint points near the second hand finger tip, a second longitudinal depth value corresponding to the coordinates of an operation button in the projection picture, and a first transverse distance between the coordinates of a plurality of key bone joint points near the second hand finger tip and the coordinates of the operation button;
s05, when the difference between the first longitudinal depth value and the second longitudinal depth value reaches a first threshold value and the first transverse distance reaches a second threshold value, determining that the user has performed a click event, and issuing a projection change instruction according to the operation button clicked by the user;
and S06, updating projection contents according to the projection change instruction, and finishing interaction with a user.
Further, step S01 further includes the steps of:
carrying out projection focusing and keystone correction, and checking coincidence and calibration of the picture signals until the projected picture is clear;
displaying a ChArUco marker containing coded information in the projected picture.
Further, the step S02 specifically includes:
separating a palm area of the first hand through palm detection;
analyzing 21 key bone joint points of the first hand by a hand bone joint point calibration algorithm, and storing coordinates of the 21 key bone joint points;
expanding the 21 key bone joint points according to user setting through an interpolation algorithm to obtain more key bone joint point coordinates of a first hand palm area and store the coordinates;
tracking five key bone joint points of the second hand fingertip, and updating the coordinates of the five key bone joint points in real time.
Further, the step S03 specifically includes:
according to the plurality of key bone joint points identified and tracked in the step S02 and the ChArUco mark in the projection picture, the corresponding vertex coordinates of the projection picture in the image space are calculated, meanwhile, the depth value of the projection picture at the moment is recorded, the projection conversion parameter is calculated and stored according to the depth value of the projection picture and the initial projection conversion parameter, and the projection picture is adjusted according to the projection conversion parameter, so that the anchoring between the projection picture and the first hand is realized.
Further, the step S04 further includes a click determination step:
and when the distance between the fingertip of the user's second hand and the projected picture on the first hand comes within approximately 1 cm, determining that the user is about to click, and continuously acquiring the first longitudinal depth value, the second longitudinal depth value and the first transverse distance.
Further, in step S05, determining that the user has made a click event, and issuing a projection change instruction according to the operation button clicked by the user specifically includes:
after the user click event is determined, acquiring the current projection content, analyzing the user's state and action intention in combination with multi-frame click information, and taking the analysis result as source data for the next analysis;
if the function corresponding to the user's click is a call to a certain event, executing the corresponding call and recording the call information; if it is a marking function, displaying the mark and recording the mark information.
Further, recognizing the first hand and the second hand by detecting hand gestures of the user comprises detecting pre-designated hand gestures to determine the first hand and the second hand, and realizing switching of the first hand and the second hand through gestures with specific semantics; and the second hand is used as a touch initiator to perform touch operation on the projection picture of the first hand.
Further, the invention also provides a hand projection interaction system, which comprises an information acquisition unit, a projection unit and a calculation and analysis unit, wherein the calculation and analysis unit is respectively in communication connection with the information acquisition unit and the projection unit, and the hand projection interaction system comprises:
the information acquisition unit is used for acquiring a first hand picture and a second hand picture of a user;
the calculation analysis unit is used for analyzing and identifying a plurality of key bone joint points of the hand after receiving the hand picture of the user, and tracking a plurality of key bone joint points near the finger tip of the second hand;
the calculation analysis unit is used for calculating projection conversion parameters according to the identified key bone joint points, and the projection unit is used for adjusting a projection picture according to the projection conversion parameters to realize the anchoring of the projection picture and the first hand;
the calculation and analysis unit is further configured to obtain: a first longitudinal depth value corresponding to the coordinates of the plurality of key bone joint points near the second hand fingertip, a second longitudinal depth value corresponding to the coordinates of an operation button in the projection picture, and a first transverse distance between the coordinates of the plurality of key bone joint points near the second hand fingertip and the coordinates of the operation button; when the difference between the first longitudinal depth value and the second longitudinal depth value reaches a first threshold value and the first transverse distance reaches a second threshold value, the calculation and analysis unit determines that the user has performed a click event and sends a projection change instruction to the projection unit according to the operation button clicked by the user;
the projection unit is also used for updating projection content according to the projection change instruction and finishing interaction with a user.
Further, the information acquisition unit comprises a combination of an RGB camera and a TOF camera, a combination of an RGB camera and a 3D structured light camera, or a binocular camera; the projection unit is a projector.
Further, the present application also provides a storage medium, where the storage medium includes a stored program, where when the program runs, the device where the storage medium is located is controlled to execute the aforementioned hand projection interaction method.
The technical problem mainly solved by the embodiments of the present application is to make projection interaction more flexible: it is no longer limited by the size of the space or occasion, and hand interaction no longer requires electronic equipment such as a hand-worn smartwatch. Only the camera of the interaction device is needed to capture scene information; after the scene information is processed, projection onto the user's palm realizes user interaction. In addition, the site limitation of virtual-real interaction technology is overcome, for example the inability to interact effectively in a small space and the need to arrange equipment within the space: the interaction plane can be projected onto the palm of the user's hand, and control interaction with the projection content is achieved. Compared with traditional hand interaction technology, this technique enables one-handed operation, and the operation interface is effectively expanded to the whole palm, giving the user more operating space. Moreover, in many scenarios, for example at home or in a car, users often need to operate smart furniture, seats, windows or screens; these devices are sometimes near the user and sometimes far away, and operating them at a distance is tiring and may even affect driving safety. With the scheme of the embodiments of the present application, the user only needs to raise a hand for the projection content to be displayed and the relevant operation to be performed in the palm, which is very convenient and fast. This is equivalent to combining smart-device control with the interaction capability of a small space, thereby realizing a very convenient and friendly human-computer interaction characteristic and solving the problem that the original operation requires multiple devices and a large space.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic structural diagram of a hand projection interaction system according to an embodiment of the present application;
FIG. 2 is a flow chart of a hand projection interaction method according to an embodiment of the application;
FIG. 3 is a schematic diagram of key bony joint points of a user's hand according to an embodiment of the application;
FIG. 4 is a schematic diagram of a user's hand key bone joint interpolation method according to an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating a process for determining whether a valid click event is generated for a user according to an embodiment of the present application;
FIG. 6a is one of effect diagrams of a user touch hand interacting with a projection hand according to an embodiment of the present application;
fig. 6b is a diagram of still another effect of the user touch hand interacting with the projection hand according to the embodiment of the application.
In figs. 6a and 6b, A is the first hand and B is the second hand.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The terms in the examples of the present application are defined as follows unless otherwise specified:
hand key bone joint points (key nodes for short): the point sets used to form the nodes of each key part of the hand are mainly used to detect the position of the hand and to determine the general projection area.
RGB graph: an image in which the distance, after being identified by the camera, is represented by color information.
RGB camera: used for acquiring color images.
Depth camera: acquires depth images to assist the hand's key bone joint points in achieving a more accurate projection.
Binocular camera: can perform the functions of both the RGB camera and the depth camera, i.e. acquires color and depth images simultaneously.
TOF: abbreviation of time-of-flight; depth is determined from the "time of flight".
Example 1
In accordance with an embodiment of the present application, there is provided a hand projection interaction method embodiment, it being noted that the steps of the method may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is shown in a step flow diagram, in some cases, the steps shown or described may be performed in an order different than here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hand projection interaction system schematic diagram for implementing a hand projection interaction method. The hand projection interactive system shown in fig. 1 comprises an information acquisition unit, a projection unit and a calculation and analysis unit, wherein the calculation and analysis unit is respectively in communication connection with the information acquisition unit and the projection unit.
Fig. 2 is a flowchart of a hand projection interaction method according to an embodiment of the present application, where the method is applied to a mobile terminal, a computer, a projector, and other devices, which may all be implemented based on the hardware structure of fig. 1.
Referring to fig. 2, the hand projection interaction method may include:
s01, acquiring a first hand picture and a second hand picture of a user;
s02, identifying a plurality of key bone joint points of the hand, and tracking a plurality of key bone joint points near the finger tip of the second hand;
s03, calculating projection conversion parameters according to the plurality of key bone joint points identified and tracked in the step S02, and adjusting a projection picture according to the projection conversion parameters to realize anchoring of the projection picture and the first hand;
s04, when the user clicks the projection picture of the first hand by using the finger tip of the second hand, starting to obtain: the first longitudinal depth value corresponding to the coordinates of a plurality of key bone joint points near the second hand finger tip, the second longitudinal depth value corresponding to the coordinates of the operation button in the projection picture, and the first transverse distance between the coordinates of a plurality of key bone joint points near the second hand finger tip and the coordinates of the operation button;
s05, when the difference between the first longitudinal depth value and the second longitudinal depth value reaches a first threshold value and the first transverse distance reaches a second threshold value, determining that the user has performed a click event, and issuing a projection change instruction according to the operation button clicked by the user;
and S06, updating projection contents according to the projection change instruction, and finishing interaction with a user.
The embodiment of the invention anchors the projection picture to the user's hand and uses a curved or irregular surface, for example the user's hand and especially the palm, as the carrier of the projected picture, that is, as the projection screen. Projection interaction is therefore no longer limited to a flat plane, and the touch recognition interaction technique suits more scenes; in particular, it can also achieve excellent recognition on a desktop that is curved or cluttered with objects, and the user can customize the interaction plane without being limited to a desktop or a palm, which is one of the inventive points of the invention. Of course, when the user places a book on the desktop, it is also feasible to perform click operations with the plane of the book as the interaction surface.
In this embodiment, a plurality of key bone joint points of the hand are identified and a plurality of key bone joint points near the fingertip of the second hand are tracked; the recognition result of the hand's key bone joint points is creatively combined with the depth map to judge the user's touch behavior, and far-field touch interaction is realized by combining the visible-light image with the depth image, which is one of the important inventive points of the invention.
The touch detection method based on depth comparison in this embodiment can adopt either of two schemes, an integral modeling method and a local point set method. The integral modeling method models the whole fixed interaction plane; the user is considered to have clicked only when directly contacting that fixed plane, and this scheme is mainly aimed at desktops. The local point set method works on the partial contact surface around the user's fingertips, so the user can customize the interaction surface to a certain extent.
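For illustration only (not part of the original disclosure), a minimal sketch of the local point set idea is given below: depth samples are taken in a circular neighborhood around the fingertip, outliers are removed, and the mean is used as the fingertip depth. The helper name and the outlier rule are assumptions.

```python
import numpy as np

def local_fingertip_depth(depth_map, cx, cy, radius):
    """Hypothetical helper for the 'local point set' method: sample the depth map in a
    circular neighborhood of the fingertip, drop invalid pixels and outliers, average."""
    ys, xs = np.ogrid[:depth_map.shape[0], :depth_map.shape[1]]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    samples = depth_map[mask].astype(np.float32)
    samples = samples[samples > 0]                       # drop invalid (zero) depth pixels
    if samples.size == 0:
        return None
    med = np.median(samples)
    keep = samples[np.abs(samples - med) < 3 * samples.std() + 1e-6]  # crude outlier removal
    return float(keep.mean()) if keep.size else float(med)
```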
Preferably, the step S01 further includes the steps of:
carrying out projection focusing and keystone correction, and checking coincidence and calibration of the picture signals until the projected picture is clear;
displaying a ChArUco marker containing coded information in the projected picture.
Preferably, step S02 specifically includes:
separating a palm area of the first hand through palm detection;
analyzing 21 key bone joint points of the first hand by a hand bone joint point calibration algorithm, and storing coordinates of the 21 key bone joint points;
expanding 21 key bone joint points according to user setting through an interpolation algorithm to obtain more key bone joint point coordinates of a first hand palm area and storing the coordinates;
and tracking five key bone joint points of the second hand fingertip, and updating the coordinates of the five key bone joint points in real time.
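As a stand-in illustration (the patent describes its own MobileNetV2-like model, not this library), the open-source MediaPipe Hands module follows the same palm-detection plus 21-keypoint structure and can be used to sketch step S02:

```python
import cv2
import mediapipe as mp

# Sketch only: MediaPipe Hands as a stand-in for the palm detection + 21 key bone
# joint point calibration pipeline described in the text.
hands = mp.solutions.hands.Hands(static_image_mode=False,
                                 max_num_hands=2,
                                 min_detection_confidence=0.5)

def hand_keypoints(bgr_frame):
    """Return a list of hands; each hand is a list of 21 (x, y) pixel coordinates."""
    h, w = bgr_frame.shape[:2]
    result = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    out = []
    for lm in result.multi_hand_landmarks or []:
        out.append([(int(p.x * w), int(p.y * h)) for p in lm.landmark])
    return out
```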
In this embodiment, the calibration method for the hand bone joint points comprises two steps: palm detection, and key bone joint point calibration and tracking. Palm detection uses Fast R-CNN for single-class detection, comprising four steps of convolutional feature extraction, candidate region extraction, candidate region pooling and classification; after the palm is detected, an optical-flow-based palm tracking algorithm can be used to raise the frame rate of palm tracking. Calibration and tracking of the hand's key bone joint points is performed end to end, i.e. a deep learning model similar to MobileNetV2 directly computes the coordinates of the hand's key bone joint points from the hand image. The specific flow is as follows:
The bone joint point calibration model outputs three types of data: first, the x and y coordinates of the 21 key bone joint points; second, a value marking the probability that a hand is present; and third, an identifier of whether the hand is a left or right hand. The 21 key points are shown in fig. 3.
The algorithm uses a CPM (Convolutional Pose Machine) based model to output the coordinates of the 21 key bone joint points of the hand, and uses a multi-scale approach to enlarge the receptive field. The model also judges quickly, in real time, whether a reasonable hand structure exists within the palm detection frame given by the palm detection model of the first step, and outputs a probability; if the probability is less than a certain threshold, the palm detection model of the first step is restarted.
Given a specific formula for joint point prediction, the user's hand gesture can be obtained as follows.
Formula I: using the label \(Y_k\) obtained with the hourglass framework, a thermodynamic diagram (heat map) of hand joint point \(k\) is generated (the heat map is a probability map with the same pixel layout as the picture, but the value at each pixel position is the probability that the current pixel is a certain joint; further joint information is analyzed based on these probabilities):
\[ H_k^{*}(p) = \exp\!\left(-\frac{\lVert p - Y_k \rVert^{2}}{\sigma^{2}}\right) \]
where \(H_k^{*}\) is the accurate thermodynamic diagram identifying hand joint point \(k\) at each pixel region; \(p\) is the data corresponding to each pixel point obtained after the picture is processed by the hourglass framework, i.e. the predicted position in the image; and \(\sigma^{2}\) is the control parameter of the heat map's local action range, set to the square of the bandwidth parameter of the Gaussian kernel function.
Formula II: from the thermodynamic diagram \(H_k\) predicted according to formula I, the position \(P_k\) of key bone joint point \(k\) in the image is further obtained (a further correction based on the predicted position, giving more accurate position information):
\[ P_k = \arg\max_{p} H_k(p) \]
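A minimal numerical sketch of the two formulas above (the notation is a reconstruction and the function names are illustrative, not taken from the patent):

```python
import numpy as np

def gaussian_heatmap(h, w, yk, sigma2):
    """Formula I: H*_k(p) = exp(-||p - Y_k||^2 / sigma^2) for a labelled joint yk = (x, y)."""
    ys, xs = np.mgrid[:h, :w]
    d2 = (xs - yk[0]) ** 2 + (ys - yk[1]) ** 2
    return np.exp(-d2 / sigma2)

def decode_joint(heatmap):
    """Formula II: P_k = argmax_p H_k(p); returns the (x, y) pixel position of the peak."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)
```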
Then, to obtain the specific motion, gestures are first classified; for each class a position area is given for each joint, and the current motion is determined as long as every joint falls within its corresponding area.
Finally, the original 21 key bone joint points can be expanded by interpolation and selected or discarded according to the actual application scene. For example, because the original key bone joint points lack a joint point at the center of the palm, the midpoints (Mp) of points 1 and 5, 0 and 9, and 0 and 17 among the original key bone joint points can be taken as new key bone joint points for anchoring the projection information and the touch buttons, as shown in fig. 4.
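For illustration (the function name is hypothetical), the midpoint expansion of fig. 4 can be sketched as:

```python
# Add palm-center anchor points as the midpoints of joint pairs (1, 5), (0, 9) and (0, 17)
# of the 21 standard keypoints.
def expand_keypoints(kp21):
    """kp21: list of 21 (x, y) tuples; returns the list extended with the midpoints (Mp)."""
    def midpoint(a, b):
        return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    extra = [midpoint(kp21[1], kp21[5]),
             midpoint(kp21[0], kp21[9]),
             midpoint(kp21[0], kp21[17])]
    return list(kp21) + extra
```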
Preferably, step S03 specifically includes:
according to the plurality of key bone joint points identified and tracked in step S02 and the ChArUco mark in the projection picture, the corresponding vertex coordinates of the projection picture in image space are calculated while the depth value of the projection picture at that moment is recorded; the projection conversion parameters can then be calculated, for example by a quadrilateral mapping method, from the depth value of the projection picture and the initial projection conversion parameters, and stored; the projection picture is adjusted according to the projection conversion parameters so as to anchor the projection picture to the first hand.
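One possible concrete form of this quadrilateral mapping is sketched below under stated assumptions: the ChArUco marker is detected with the pre-4.7 cv2.aruco API, the projected picture's quadrilateral is approximated by the bounding rectangle of the detected corners (a simplification), and palm_quad stands for four palm anchor points derived from the key bone joint points.

```python
import cv2
import numpy as np

aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)

def palm_anchoring_warp(cam_bgr, palm_quad, proj_frame):
    """Sketch: detect the projected ChArUco markers, estimate a perspective mapping to
    the palm quadrilateral, and warp the next projection frame accordingly."""
    gray = cv2.cvtColor(cam_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is None or len(ids) < 1:
        return proj_frame                       # marker not visible, keep the previous frame
    pts = np.concatenate([c.reshape(-1, 2) for c in corners])
    x, y, w, h = cv2.boundingRect(pts.astype(np.float32))   # simplified current quadrilateral
    src = np.float32([[x, y], [x + w, y], [x + w, y + h], [x, y + h]])
    dst = np.float32(palm_quad)                 # four target anchor points on the palm
    m = cv2.getPerspectiveTransform(src, dst)   # the stored "projection conversion parameter"
    return cv2.warpPerspective(proj_frame, m, (proj_frame.shape[1], proj_frame.shape[0]))
```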
Preferably, step S04 further includes a click determination step:
when the distance between the fingertip of the user's second hand and the projected picture on the first hand falls to about 6-10 mm, preferably 1 cm, it is judged that the user is about to click; at this point, depth information of objects within the projection range starts to be continuously collected so as to obtain the first longitudinal depth value, the second longitudinal depth value and the first transverse distance.
Preferably, in step S05, to determine whether the user has made a click event and whether it is a valid touch event, the touch determination flow shown in fig. 5 is adopted:
s501: judging to start;
s502: calculating Euclidean distances between the coordinates of the key bone joint points of the index finger tip of the touch hand, namely the second hand, and the coordinates of each key bone joint point of the projection hand, namely the first hand, according to the coordinate information (image space) of the key bone joint points of the hand;
s503: sorting the results, taking the minimum distance, judging whether the minimum distance is smaller than a third threshold value, preferably taking the threshold value as 6mm-10mm, further preferably 10mm,
s504: if the minimum distance is smaller than the third threshold value, putting the coordinates of the key bone joint points of the index finger tip of the touch hand, namely the second hand, corresponding to the minimum distance into a sliding window, and entering step S505; if not, determining that no touch exists, and returning to the step S502;
s505: calculating the average coordinate value of the coordinates in the sliding window, and calculating the distance between the average coordinate value and the coordinates of the current key bone joint point;
s506: judging whether the calculated result is smaller than a fourth threshold value; if so, proceeding to S507; otherwise, no touch is considered to have occurred and the flow returns to the start;
s507: dynamically acquiring an area around the fingertip according to the coordinates of the second hand's fingertip and the length of the finger's second joint, removing abnormal values from the point set of this circular area (as shown in fig. 6b) and taking the mean as the depth value at the current fingertip coordinate, namely the first longitudinal depth value; in the same manner, calculating the depth value at the key bone joint point coordinate of the projection hand closest to the touching fingertip obtained in step S502, namely the second longitudinal depth value; then taking the absolute value of the difference between the two depth values;
s508: judging whether the depth difference is smaller than a first threshold, if so, judging that the user triggers a click event, entering S509, and if not, returning to the step S502;
s509: a click event is initiated.
The touch hand and the projection hand are shown in fig. 6 a.
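Putting the S501 to S509 flow together, a compact sketch follows; the threshold values are illustrative rather than the patent's exact ones, and local_fingertip_depth refers to the earlier local point set sketch.

```python
from collections import deque
import numpy as np

THIRD_THRESHOLD_PX = 10      # lateral gate (the patent's 6-10 mm, here assumed in pixels)
FOURTH_THRESHOLD_PX = 4      # stability gate on the sliding-window mean (assumed value)
FIRST_THRESHOLD_MM = 5       # depth-difference gate for a confirmed press (assumed, depth in mm)

window = deque(maxlen=5)     # sliding window of recent fingertip coordinates

def is_click(fingertip_xy, proj_hand_kps, depth_map, radius):
    # S502/S503: nearest projection-hand keypoint to the touching fingertip.
    dists = [np.hypot(fingertip_xy[0] - k[0], fingertip_xy[1] - k[1]) for k in proj_hand_kps]
    nearest = int(np.argmin(dists))
    if dists[nearest] >= THIRD_THRESHOLD_PX:
        return False
    # S504/S505/S506: fingertip must be stable around the sliding-window mean.
    window.append(fingertip_xy)
    mean = np.mean(window, axis=0)
    if np.hypot(mean[0] - fingertip_xy[0], mean[1] - fingertip_xy[1]) >= FOURTH_THRESHOLD_PX:
        return False
    # S507/S508: compare robust local depths at the fingertip and the nearest keypoint.
    d_tip = local_fingertip_depth(depth_map, *fingertip_xy, radius)
    d_btn = local_fingertip_depth(depth_map, *proj_hand_kps[nearest], radius)
    if d_tip is None or d_btn is None:
        return False
    return abs(d_tip - d_btn) < FIRST_THRESHOLD_MM   # S509: click event
```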
Preferably, in step S05, the determining that the user has made a click event, and issuing a projection change instruction according to the operation button clicked by the user specifically includes:
after the user click event is determined, acquiring the current projection content, analyzing the user's state and action intention in combination with multi-frame click information, and taking the analysis result as source data for the next analysis;
if the function corresponding to the user's click is a call to a certain event, executing the corresponding call and recording the call information; if it is a marking function, displaying the mark and recording the mark information.
Preferably, the first hand and the second hand are recognized by detecting the user's hand gestures: pre-designated hand gestures determine which hand is the first hand and which is the second hand, and the two can be switched by a gesture with a specific semantic meaning. In general, the second hand acts as the touch initiator, i.e. the touch hand, and performs touch operations on the projected picture of the first hand (the projection hand). For example, a hand with the palm open and all five fingers straight can be recognized as the first hand, and the projected picture is focused and anchored on its palm; a hand clenched into a fist with only one finger straightened is recognized as the second hand, and that straightened finger serves as a stylus to perform touch operations on the projected picture of the first hand.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the hand projection interaction method according to the foregoing embodiments can be implemented by software plus a necessary general hardware platform, and certainly can be implemented by hardware, but the former is a better implementation in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method of the embodiments of the present application.
Example 2
According to the embodiment of the application, a hand projection interaction system for implementing the hand projection interaction method is further provided, and the system is implemented in a mobile terminal, a computer, a projector and other devices in a software or hardware mode.
As shown in fig. 1, the hand projection interactive system includes:
the information acquisition unit 302, the projection unit 304 and the calculation analysis unit 306, the calculation analysis unit 306 is connected with the information acquisition unit 302 and the projection unit 304 respectively in a communication manner, wherein:
the information acquisition unit is used for acquiring a first hand picture and a second hand picture of a user;
the calculation and analysis unit is used for analyzing and identifying a plurality of key bone joint points of the hand after receiving the hand picture of the user, and tracking a plurality of key bone joint points near the finger tip of the second hand;
the calculation analysis unit is used for calculating projection conversion parameters according to the identified key bone joint points, and the projection unit is used for adjusting a projection picture according to the projection conversion parameters to realize the anchoring of the projection picture and the first hand;
the calculation and analysis unit is further configured to obtain: a first longitudinal depth value corresponding to the coordinates of the plurality of key bone joint points near the second hand fingertip, a second longitudinal depth value corresponding to the coordinates of an operation button in the projection picture, and a first transverse distance between the coordinates of the plurality of key bone joint points near the second hand fingertip and the coordinates of the operation button; when the difference between the first longitudinal depth value and the second longitudinal depth value reaches a first threshold value and the first transverse distance reaches a second threshold value, the calculation and analysis unit determines that the user has performed a click event and sends a projection change instruction to the projection unit according to the operation button clicked by the user;
the projection unit is also used for updating projection content according to the projection change instruction and finishing interaction with a user.
Preferably, the information acquisition unit may be a combination of an RGB camera and a TOF camera, a combination of an RGB camera and a 3D structured light camera, or a binocular camera.
When selecting the information acquisition unit, apart from directly using an RGB camera together with a depth camera that identifies depth through 3D structured light or TOF technology, such cameras are limited in that they cannot identify the depth information of the current scene well at short range, for example within 40 cm. A binocular camera can therefore be used as an extension in the three-dimensional touch recognition method based on the RGB map. Although the binocular camera involves a large amount of computation, limiting the recognition range to no more than 40 cm allows effective recognition of depth information at relatively close range while also outputting an RGB image, and the touch behavior is judged jointly with the key-node prediction result. The binocular camera can acquire RGB and depth images simultaneously (the depth image must be calculated from the RGB images), and the image matching is simple and fast.
Preferably, initialization of information acquisition units such as the RGB camera and the depth camera is realized through calls to OpenCV and OpenNI.
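A minimal sketch of such an initialization through OpenCV's OpenNI2 backend is given below; whether this backend is available depends on the OpenCV build, so this is an assumption rather than the patent's required setup.

```python
import cv2

cap = cv2.VideoCapture(cv2.CAP_OPENNI2)                # open an RGB-D camera via OpenNI2
if not cap.isOpened():
    raise RuntimeError("depth camera not reachable via the OpenNI2 backend")

for _ in range(300):                                    # a few hundred frames, for illustration
    if not cap.grab():
        break
    ok_d, depth = cap.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)   # 16-bit depth map (mm)
    ok_c, color = cap.retrieve(flag=cv2.CAP_OPENNI_BGR_IMAGE)   # registered color frame
    if ok_d and ok_c:
        pass                                            # hand detection / touch judgement runs here
cap.release()
```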
In another embodiment of the present invention, the binocular camera based hand projection interaction method includes the following detailed procedures:
preferably, the computational analysis unit includes, but is not limited to, a computing pad, which may provide computing capabilities.
S1: initializing a binocular camera, acquiring hand posture information of a current user in real time, and performing hand delineation by using a computing board;
s1 specifically includes:
the first step is as follows: shooting a current scene by using a binocular camera, acquiring left and right pictures of the current scene, such as a first hand picture and a second hand picture, and preprocessing and slightly correcting information;
the second step: after the processed hand pictures are obtained, the positions of all key bone joint points of the hand are predicted, identified and tracked using a palm detection algorithm and a hand key bone joint point calibration algorithm based on a convolutional neural network, so that the current hand posture of the user is obtained; the posture is stored and awaits the next step;
s2: when the computing board analyzes a certain frame and determines that the user's action is pressing a touch point (such as a touch button) of the hand projection picture, the computing board retrieves the user's action information from the preceding frames in storage and further analyzes the user's action;
s2 specifically includes:
the first step: calculating the depth of scene objects in real time from the hand pictures acquired by the binocular camera (a sketch of this depth computation is given after this flow); if the distance difference between a key bone joint point of the hand and a touch point of the projection picture is judged to be less than 5 mm, the user's action is judged to be pressing a touch point of the projection picture;
the second step: after the user's press or click event is determined, acquiring the current projection content, analyzing the user's state and action intention in combination with multi-frame click information, and taking the analysis result as source data for the next analysis;
s3: the computing board further calculates according to the current user state and action to obtain a method called by the current user, and simultaneously transmits changes needed to be made by the projector to the projector;
s3 specifically includes:
the first step is as follows: the computing board analyzes the specific hand action of the user by utilizing the multi-frame state information and further obtains the track information of the plane pressing part of the user;
the second step: acquiring the projection content of the current projector and, in combination with the track information, judging the function associated with that position; the judgment is made by acquiring the position of the pressing event determined by the computing board (generally the fingertip) and then identifying the function at that position;
the third step: once the related function is known, if the function is a call to a certain event, the call is executed and the call information is recorded; if the function purely adds a mark, the mark information is recorded;
the fourth step: the computing board transmits the call information or mark information generated by the user's action to the projector in the form of instructions, and the projector updates the projection content according to the projection change instructions of the computing board, completing the interaction with the user.
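As referenced in the first step of S2 above, a sketch of the binocular depth computation is given below; the focal length, baseline and SGBM parameters are placeholders for the actual calibrated values of the camera, not values from the patent.

```python
import cv2
import numpy as np

# Sketch: compute a disparity map from the left/right hand pictures and convert it to
# metric depth, to be compared against the <5 mm press criterion described above.
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=7)

def binocular_depth(left_bgr, right_bgr, f_px, baseline_m):
    gl = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    gr = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    disp = stereo.compute(gl, gr).astype(np.float32) / 16.0   # SGBM returns fixed-point disparity
    depth = np.zeros_like(disp)
    valid = disp > 0
    depth[valid] = f_px * baseline_m / disp[valid]            # Z = f * B / d
    return depth
```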
The hand projection interaction system streamlines the equipment needed for touch recognition: at minimum only an RGB camera and a depth camera (or a binocular camera) are required, so once the algorithm is written into the computing board, user touch can be recognized; the system is therefore more flexible to use and applicable to a wider range of occasions.
Example 3
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program codes executed by the hand projection interaction method.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Further, in this embodiment, the storage medium is configured to store the program code for executing any one of the method steps listed in embodiment 1, which is not described in detail herein for brevity.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is only a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A hand projection interaction method is characterized by comprising the following steps:
s01, acquiring hand pictures of a user, wherein the hand pictures comprise a first hand picture and a second hand picture;
s02, identifying a plurality of key bone joint points of the hand, and tracking a plurality of key bone joint points near the finger tip of the second hand;
s03, calculating projection conversion parameters according to the plurality of key bone joint points identified and tracked in the step S02, and adjusting a projection picture according to the projection conversion parameters to realize anchoring of the projection picture and the first hand;
s04, when the user clicks the projection picture of the first hand by using the finger tip of the second hand, starting to obtain: a first longitudinal depth value corresponding to the coordinates of a plurality of key bone joint points near the second hand finger tip, a second longitudinal depth value corresponding to the coordinates of an operation button in the projection picture, and a first transverse distance between the coordinates of a plurality of key bone joint points near the second hand finger tip and the coordinates of the operation button;
s05, when the difference between the first longitudinal depth value and the second longitudinal depth value reaches a first threshold value and the first transverse distance reaches a second threshold value, judging that a user clicks an event, and sending a projection change instruction according to the operation button clicked by the user;
and S06, updating projection contents according to the projection change instruction, and finishing interaction with a user.
2. The method according to claim 1, wherein step S01 further comprises the steps of:
carrying out projection focusing and keystone correction, and checking coincidence and calibration of the picture signals until the projected picture is clear;
displaying a ChArUco marker containing coded information in the projected picture.
3. The method according to claim 1, wherein the step S02 specifically includes:
separating a palm area of the first hand through palm detection;
analyzing 21 key bone joint points of the first hand by a hand bone joint point calibration algorithm, and storing coordinates of the 21 key bone joint points;
expanding the 21 key bone joint points according to user setting through an interpolation algorithm to obtain more key bone joint point coordinates of a first hand palm area and store the coordinates;
and tracking five key bone joint points of the second hand fingertip, and updating the coordinates of the five key bone joint points in real time.
4. The method according to claim 2, wherein the step S03 specifically includes:
according to the plurality of key bone joint points identified and tracked in the step S02 and the ChArUco mark in the projection picture, the corresponding vertex coordinates of the projection picture in the image space are calculated, meanwhile, the depth value of the projection picture at the moment is recorded, the projection conversion parameter is calculated and stored according to the depth value of the projection picture and the initial projection conversion parameter, and the projection picture is adjusted according to the projection conversion parameter, so that the anchoring between the projection picture and the first hand is realized.
5. The method according to claim 4, wherein the step S04 further comprises a click determination step of:
when the distance between the fingertip of the second hand and the projection picture on the first hand approaches 1 cm, determining that the user is about to click, and continuously acquiring the first longitudinal depth value, the second longitudinal depth value and the first transverse distance.
6. The method according to claim 1, wherein, in step S05, determining that the user has made a click event and sending a projection change instruction according to the operation button clicked by the user specifically includes:
after the click event has been determined, acquiring the current projection content, analyzing the user's state and action intention in combination with multi-frame click information, and using the analysis result as source data for the next analysis;
if the function corresponding to the user's click is a call to a certain event, executing the corresponding call and recording the call information; if the user's click is a marking function, displaying the mark and recording the mark information.
7. The method of claim 1,
identifying the first hand and the second hand by detecting hand gestures of the user, including detecting pre-specified hand gestures to determine the first hand and the second hand, and enabling switching between the first hand and the second hand through gestures with specific semantics; and the second hand serves as the touch initiator to perform touch operations on the projection picture of the first hand.
8. A hand projection interaction system, comprising an information acquisition unit, a projection unit and a calculation and analysis unit, the calculation and analysis unit being in communication connection with the information acquisition unit and the projection unit respectively, characterized in that:
the information acquisition unit is used for acquiring a first hand picture and a second hand picture of a user;
the calculation and analysis unit is used for analyzing and identifying a plurality of key bone joint points of the hand after receiving the hand pictures of the user, and tracking a plurality of key bone joint points near the fingertip of the second hand;
the calculation and analysis unit is also used for calculating projection conversion parameters according to the identified key bone joint points; the projection unit is used for adjusting a projection picture according to the projection conversion parameters to realize the anchoring of the projection picture to the first hand;
the calculation and analysis unit is further configured to obtain: a first longitudinal depth value corresponding to the coordinates of the plurality of key bone joint points near the fingertip of the second hand, a second longitudinal depth value corresponding to the coordinates of an operation button in the projection picture, and a first transverse distance between the coordinates of the plurality of key bone joint points near the fingertip of the second hand and the coordinates of the operation button; when the difference between the first longitudinal depth value and the second longitudinal depth value reaches a first threshold value and the first transverse distance reaches a second threshold value, the calculation and analysis unit determines that the user has made a click event, and sends a projection change instruction to the projection unit according to the operation button clicked by the user;
the projection unit is also used for updating the projection content according to the projection change instruction and completing the interaction with the user.
9. The system of claim 8,
the information acquisition unit comprises a combination of an RGB camera and a TOF camera, a combination of an RGB camera and a 3D structured-light camera, or a binocular camera; and the projection unit is a projector.
10. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium is located is controlled to perform the method according to any one of claims 1 to 7.
CN202210958436.6A 2022-08-11 2022-08-11 Hand projection interaction method, system and storage medium Active CN115061577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210958436.6A CN115061577B (en) 2022-08-11 2022-08-11 Hand projection interaction method, system and storage medium

Publications (2)

Publication Number Publication Date
CN115061577A true CN115061577A (en) 2022-09-16
CN115061577B CN115061577B (en) 2022-11-11

Family

ID=83208325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210958436.6A Active CN115061577B (en) 2022-08-11 2022-08-11 Hand projection interaction method, system and storage medium

Country Status (1)

Country Link
CN (1) CN115061577B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080312663A1 (en) * 2007-06-15 2008-12-18 Martin Haimerl Computer-assisted joint analysis using surface projection
CN106055091A (en) * 2016-05-16 2016-10-26 电子科技大学 Hand posture estimation method based on depth information and calibration method
CN112657176A (en) * 2020-12-31 2021-04-16 华南理工大学 Binocular projection man-machine interaction method combined with portrait behavior information

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472189A (en) * 2023-12-27 2024-01-30 大连三通科技发展有限公司 Typing or touch control realization method with physical sense
CN117472189B (en) * 2023-12-27 2024-04-09 大连三通科技发展有限公司 Typing or touch control realization method with physical sense

Also Published As

Publication number Publication date
CN115061577B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
JP6079832B2 (en) Human computer interaction system, hand-to-hand pointing point positioning method, and finger gesture determination method
US8768006B2 (en) Hand gesture recognition
KR20130099317A (en) System for implementing interactive augmented reality and method for the same
CN106201173B (en) A kind of interaction control method and system of user's interactive icons based on projection
CN112506340B (en) Equipment control method, device, electronic equipment and storage medium
CN114138121B (en) User gesture recognition method, device and system, storage medium and computing equipment
US20120163661A1 (en) Apparatus and method for recognizing multi-user interactions
Leite et al. Hand gesture recognition from depth and infrared Kinect data for CAVE applications interaction
CN115061577B (en) Hand projection interaction method, system and storage medium
Jiang et al. independent hand gesture recognition with Kinect
CN113282164A (en) Processing method and device
Molyneaux et al. Cooperative augmentation of mobile smart objects with projected displays
Abdallah et al. An overview of gesture recognition
CN114581535B (en) Method, device, storage medium and equipment for marking key points of user bones in image
Pansare et al. Gestuelle: A system to recognize dynamic hand gestures using hidden Markov model to control windows applications
CN115497094A (en) Image processing method and device, electronic equipment and storage medium
Siam et al. Human computer interaction using marker based hand gesture recognition
Hannuksela et al. Face tracking for spatially aware mobile user interfaces
KR101998786B1 (en) Non-contact Finger Input Device and Method in Virtual Space
CN111640185A (en) Virtual building display method and device
Wang Hand Tracking and its Pattern Recognition in a Network of Calibrated Cameras
Kale et al. Epipolar constrained user pushbutton selection in projected interfaces
Hiremath et al. Gesture Recognition System
Hannuksela Camera based motion estimation and recognition for human-computer interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant