CN114860072A - Gesture recognition interaction equipment based on monocular camera - Google Patents

Gesture recognition interaction equipment based on monocular camera

Info

Publication number
CN114860072A
Authority
CN
China
Prior art keywords
palm
detector
detecting
gesture recognition
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210404958.1A
Other languages
Chinese (zh)
Inventor
时沐朗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202210404958.1A priority Critical patent/CN114860072A/en
Publication of CN114860072A publication Critical patent/CN114860072A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a gesture recognition interaction device based on a monocular camera, comprising a camera, a detector and a feature extractor used for background scene awareness. The interaction method comprises the following steps: 1) establishing a palm three-dimensional coordinate image library through a three-dimensional palm feature point model; 2) displaying an intelligent interactive picture; 3) running the detector over the image and calculating the palm position; 4) controlling playback of the intelligent interactive picture according to the user's palm movement data. In daily use, with this technical scheme, the feature extractor operates over the whole image and calculates the hand positions, and the three-dimensional hand feature point model operates on the regions cropped by the detector and predicts an approximate three-dimensional surface through regression. After the palm is accurately cropped, the need for common data augmentation is greatly reduced, resource consumption is low, and the required hardware threshold is low. Compatibility is strong and the system can run across platforms. Universality is strong: the device suits cameras and operating platforms of various types.

Description

Gesture recognition interaction equipment based on monocular camera
Technical Field
The invention relates to gesture recognition interaction equipment based on a monocular camera.
Background
In online PC-based live-broadcast teaching, interaction devices are lacking: digital tablets and similar interaction devices are not widely adopted, and because of the limitations of the live-broadcast equipment (PC teaching), teachers rely on traditional interaction devices such as the mouse, which degrades the experience and can even affect teaching quality. An improvement is therefore needed.
Disclosure of Invention
The present invention is directed to solving one of the technical problems of the prior art.
The application provides a gesture recognition interaction device based on a monocular camera, comprising:
a camera for acquiring an image;
a feature extractor for locating the rough range in the image where the palm is located;
a detector for accurately cropping the image within that rough range to obtain a palm image and/or to operate the intelligent interactive picture;
and a marker for identifying the palm joint feature points in the palm image to locate the palm.
Also disclosed is an interaction method based on the gesture recognition interaction device, comprising the following steps:
1) establishing a palm three-dimensional coordinate image library through a three-dimensional palm feature point model;
2) displaying an intelligent interactive picture;
3) running the detector over the image and calculating the palm position;
4) controlling playback of the intelligent interactive picture according to the user's palm movement data.
The palm three-dimensional coordinate image library is established as follows:
1) the ML Pipeline is formed by two deep neural network models working together in real time;
2) the detector runs over the intelligent interactive picture and calculates the hand position;
3) the three-dimensional palm feature point model operates on these positions and predicts an approximate three-dimensional surface through regression;
4) the 21 three-dimensional finger-joint coordinates in the detected hand region are predicted directly through regression, with the model learning a consistent internal hand-pose representation;
5) real images are manually annotated with the 21 three-dimensional finger-joint coordinates to build the palm three-dimensional coordinate image library.
The method further comprises:
training the detector, modeling the palm with a square bounding box and ignoring other aspect ratios;
wherein the focal loss is minimized during training.
This allows the deep neural network to devote most of its computational power to the accuracy of coordinate prediction.
The marker is generated from the palm feature points identified in the previous frame, and the detector is invoked to relocate the palm only when the feature point model can no longer identify the presence of the palm.
The intelligent interactive picture is operated as follows:
1) initializing the environment and starting the MediaPipe and OpenCV frameworks;
2) initializing the Hand Detector class;
3) initializing the main program of the drawing board and calling the Hand Detector class;
4) judging whether the camera has opened successfully, and starting the gesture detection algorithm once it has;
5) selection mode: when two or more fingers are raised or the palm is open, the tool selection mode is entered;
6) drawing mode: when only the index finger is raised, the currently selected drawing tool is determined and strokes in that tool's color are superimposed on the picture.
The Hand Detector class includes the following members:
1) detecting, segmenting and marking the palm and joints;
2) detecting the specific positions of the palm and of each fingertip;
3) detecting the finger gesture;
4) detecting the finger positions;
5) detecting and calculating the distance between fingers;
wherein detecting the finger gesture includes determining which fingers are raised.
The main program of the drawing board is initialized as follows:
1) importing external libraries and modules;
2) setting the brush size;
3) opening the Library folder, loading the designed interactive pictures and initializing the UI (user interface);
4) opening the camera with the OpenCV framework;
5) setting the size of the software window.
The gesture detection process is as follows:
1) calling the Hand Detector;
2) creating an img variable and storing the picture captured by the camera in real time into it;
3) detecting the fingers, and determining and detecting the positions of the index finger and the middle finger;
4) detecting the palm posture;
wherein detecting the palm posture includes determining which fingers are raised.
The invention has the following beneficial effects:
1. low resource consumption and a low hardware threshold;
2. strong compatibility: the system can run across platforms;
3. strong universality: suitable for cameras and operating platforms of various models;
4. a wide range of applications.
Drawings
Fig. 1 is a schematic diagram illustrating a principle of a gesture recognition interaction device and an interaction method based on a monocular camera in an embodiment of the present application;
FIG. 2 is a schematic diagram of positions of 21 three-dimensional palm feature points in the embodiment of the present application;
FIG. 3 is a flowchart of an interaction method of the gesture recognition interaction device in the embodiment of the present application;
FIG. 4 is a flowchart of a step of creating a palm three-dimensional coordinate image library in the embodiment of the present application;
FIG. 5 is a flowchart illustrating the operation steps of an intelligent interactive screen in the embodiment of the present application;
FIG. 6 is a flowchart of the initialization procedure of the main program of the drawing board in the embodiment of the present application;
FIG. 7 is a flowchart illustrating a main program initialization procedure of a drawing board according to an embodiment of the present application;
FIG. 8 is a flow chart of gesture detection in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application; obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to fall within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be appreciated that data so used may be interchanged under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. The terms "first", "second", etc. generally denote a class of objects and do not limit their number; for example, a first object may be one or more. Further, in the specification and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and succeeding associated objects.
The device provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings, by way of specific embodiments and their application scenarios.
As shown in Figs. 1 to 7, an embodiment of the present application provides a gesture recognition interaction device based on a monocular camera, comprising: a camera for acquiring an image; a feature extractor for locating the rough range in the image where the palm is located; a detector for accurately cropping the image within that rough range to obtain a palm image and/or to operate the intelligent interactive picture; and a marker for identifying the palm joint feature points in the palm image to locate the palm.
Further, the interaction method comprises the following steps:
1) establishing a palm three-dimensional coordinate image library through a three-dimensional palm feature point model;
2) displaying an intelligent interactive picture;
3) running the detector over the image and calculating the palm position;
4) controlling playback of the intelligent interactive picture according to the user's palm movement data.
Further, the palm three-dimensional coordinate image library is established by the following steps:
1) forming the ML Pipeline from two deep neural network models working together in real time;
2) running the detector over the intelligent interactive picture and calculating the hand position;
3) operating on these positions with the three-dimensional palm feature point model and predicting an approximate three-dimensional surface through regression;
4) directly predicting the 21 three-dimensional finger-joint coordinates in the detected hand region through regression, with the model learning a consistent internal hand-pose representation;
5) manually annotating real images with the 21 three-dimensional finger-joint coordinates to build the palm three-dimensional coordinate image library.
Further, the method also comprises: training the detector, modeling the palm with a square bounding box, ignoring other aspect ratios, and minimizing the focal loss during training.
Preferably: this allows the deep neural network to devote most of its computational power to the accuracy of coordinate prediction.
Preferably: the marker is generated from the palm feature points identified in the previous frame, and the detector is invoked to relocate the palm only when the feature point model can no longer identify the presence of the palm.
Further, the intelligent interactive picture is operated by the following steps:
1) initializing the environment and starting the MediaPipe and OpenCV frameworks;
2) initializing the Hand Detector class;
3) initializing the main program of the drawing board and calling the Hand Detector class;
4) judging whether the camera has opened successfully, and starting the gesture detection algorithm once it has;
5) selection mode: when two or more fingers are raised or the palm is open, the tool selection mode is entered;
6) drawing mode: when only the index finger is raised, the currently selected drawing tool is determined and strokes in that tool's color are superimposed on the picture.
Further, the Hand Detector class includes the following members:
1) detecting, segmenting and marking the palm and joints;
2) detecting the specific positions of the palm and of each fingertip;
3) detecting the finger gesture;
4) detecting the finger positions;
5) detecting and calculating the distance between fingers;
wherein detecting the finger gesture includes determining which fingers are raised (a sketch of such a class follows below).
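A minimal Python sketch of what such a Hand Detector class could look like, assuming the MediaPipe Hands and OpenCV libraries; the class name, method names and confidence thresholds are illustrative assumptions, not the exact implementation of the invention.

```python
import math
import cv2
import mediapipe as mp


class HandDetector:
    """Detects, segments and marks the palm and joints, and exposes helpers
    for fingertip positions, raised fingers and inter-finger distances."""

    TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky tip ids

    def __init__(self, max_hands=1, detection_conf=0.5, tracking_conf=0.5):
        self.hands = mp.solutions.hands.Hands(
            max_num_hands=max_hands,
            min_detection_confidence=detection_conf,
            min_tracking_confidence=tracking_conf)
        self.drawer = mp.solutions.drawing_utils
        self.results = None

    def find_hands(self, img, draw=True):
        # member 1: detect, segment and mark the palm and joints
        self.results = self.hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        if draw and self.results.multi_hand_landmarks:
            for lm in self.results.multi_hand_landmarks:
                self.drawer.draw_landmarks(
                    img, lm, mp.solutions.hands.HAND_CONNECTIONS)
        return img

    def find_positions(self, img, hand_no=0):
        # members 2 and 4: pixel positions of the palm joints and fingertips
        points = []
        if self.results and self.results.multi_hand_landmarks:
            h, w = img.shape[:2]
            hand = self.results.multi_hand_landmarks[hand_no]
            for idx, lm in enumerate(hand.landmark):
                points.append((idx, int(lm.x * w), int(lm.y * h)))
        return points

    def fingers_up(self, points):
        # member 3: finger gesture, i.e. which fingers are raised
        if not points:
            return []
        # thumb compared on the x axis (assumes a right hand facing the camera)
        up = [1 if points[4][1] > points[3][1] else 0]
        for tip in self.TIP_IDS[1:]:                    # other fingers on the y axis
            up.append(1 if points[tip][2] < points[tip - 2][2] else 0)
        return up

    def find_distance(self, points, p1, p2):
        # member 5: distance between two landmarks, e.g. index (8) and middle (12) tips
        _, x1, y1 = points[p1]
        _, x2, y2 = points[p2]
        return math.hypot(x2 - x1, y2 - y1)
```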
Further, the main program of the drawing board is initialized by the following steps (see the sketch after this list):
1) importing external libraries and modules;
2) setting the brush size;
3) opening the Library folder, loading the designed interactive pictures and initializing the UI (user interface);
4) opening the camera with the OpenCV framework;
5) setting the size of the software window.
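A sketch of this initialization under assumed names and sizes (a "Library" folder of tool-icon images, a 1280x720 capture window, illustrative brush thicknesses); the folder layout and constants are assumptions for illustration.

```python
import os
import cv2
import numpy as np

# 1) external libraries and modules are imported above (os, cv2, numpy)

# 2) brush size
BRUSH_THICKNESS = 15
ERASER_THICKNESS = 60

# 3) open the Library folder, load the designed interactive pictures, init the UI
LIBRARY_DIR = "Library"
header_files = sorted(os.listdir(LIBRARY_DIR))
headers = [cv2.imread(os.path.join(LIBRARY_DIR, f)) for f in header_files]
current_header = headers[0]                       # initial tool-selection UI

# 4) open the camera with the OpenCV framework
cap = cv2.VideoCapture(0)

# 5) set the size of the software window (capture resolution)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

# blank canvas that the brush strokes will be drawn onto
canvas = np.zeros((720, 1280, 3), np.uint8)
```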
Further, the gesture detection process is as follows (a sketch follows the list):
1) calling the Hand Detector;
2) creating an img variable and storing the picture captured by the camera in real time into it;
3) detecting the fingers, and determining and detecting the positions of the index finger and the middle finger;
4) detecting the palm posture;
wherein detecting the palm posture includes determining which fingers are raised.
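Building on the two sketches above, the gesture detection loop could look roughly as follows; the mode logic (two fingers raised enters selection mode, index finger only enters drawing mode) follows the steps in this description, while the variable names and the mirroring of the frame are assumptions.

```python
detector = HandDetector(max_hands=1)              # 1) call the Hand Detector
prev_x, prev_y = 0, 0
draw_color = (255, 0, 255)                        # current brush color (BGR)

while True:
    ok, img = cap.read()                          # 2) store the live frame in img
    if not ok:
        break
    img = cv2.flip(img, 1)                        # mirror for natural interaction
    img = detector.find_hands(img)
    points = detector.find_positions(img)

    if points:
        x1, y1 = points[8][1:]                    # 3) index fingertip position
        x2, y2 = points[12][1:]                   #    middle fingertip position
        fingers = detector.fingers_up(points)     # 4) which fingers are raised

        if fingers[1] and fingers[2]:             # selection mode: two fingers up
            prev_x, prev_y = 0, 0
            # tool/color choice based on x1 would go here
        elif fingers[1] and not fingers[2]:       # drawing mode: index only
            if prev_x == 0 and prev_y == 0:
                prev_x, prev_y = x1, y1
            cv2.line(canvas, (prev_x, prev_y), (x1, y1), draw_color, BRUSH_THICKNESS)
            prev_x, prev_y = x1, y1

    cv2.imshow("Gesture Paint", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
```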
In the embodiment of the application, MediaPipe Hands is a high-fidelity hand and finger tracking solution. It uses machine learning (ML) to infer 21 three-dimensional feature points of a hand. Most current state-of-the-art methods rely mainly on powerful desktop environments for inference, whereas the MediaPipe approach achieves real-time recognition on mobile phones and can even be extended to recognize multiple hands simultaneously.
MediaPipe ML Pipeline (machine learning pipeline):
MediaPipe Hands uses an ML pipeline composed of multiple models working together. The palm detection model operates over the entire image and returns an oriented palm bounding box.
The detector adopts a palm detector;
the hand feature point model can operate on a cut image region defined by the palm detector and return high-fidelity 3D hand key points, and the strategy is as follows:
The ML Pipeline consists of two deep neural network models working together in real time. The detector operates on the entire image and calculates the hand positions, and the three-dimensional hand feature point model operates on these positions and predicts the approximate three-dimensional surface through regression. After the palm is accurately cropped, the need for common data augmentation (such as affine transformations composed of rotation, translation and scaling) is greatly reduced, which allows the network to devote most of its computational power to the accuracy of coordinate prediction. Furthermore, in our pipeline, the markers can also be generated from the hand feature points identified in the previous frame, and the palm detector is invoked to relocate the palm only when the feature point model can no longer identify the presence of a hand.
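In the MediaPipe Python API, this detector-plus-tracker strategy is exposed through the detection and tracking confidence parameters: the palm detector runs over the full frame until a hand is found, after which the landmark model tracks from the previous frame and the detector is re-invoked only when tracking fails. A minimal sketch with illustrative threshold values:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False,        # video mode: track from the previous frame
    max_num_hands=1,
    min_detection_confidence=0.7,   # palm detector threshold (full-image pass)
    min_tracking_confidence=0.5)    # below this, the palm detector is re-invoked

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # 21 landmarks with normalized x, y and relative depth z
        wrist = results.multi_hand_landmarks[0].landmark[0]
        print(f"wrist at ({wrist.x:.2f}, {wrist.y:.2f}, z={wrist.z:.2f})")
    if cv2.waitKey(1) & 0xFF == 27:
        break
```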
The pipeline is implemented as a MediaPipe graph that uses the hand landmark tracking subgraph from the hand landmark module and renders with a dedicated hand renderer subgraph. The hand landmark tracking subgraph internally uses the hand landmark subgraph from the same module and the palm detection subgraph from the palm detection module. First, we train a palm detector rather than a hand detector, because estimating the bounding box of a rigid object such as the palm or fist is much simpler than detecting a hand with articulated fingers. In addition, since the palm is a small object, the non-maximum suppression algorithm works well even in two-hand self-occlusion cases such as handshakes. Furthermore, the palm can be modeled with square bounding boxes (called anchors in ML), ignoring other aspect ratios, which reduces the number of anchors by a factor of 3 to 5. Second, a feature extractor consisting of an encoder-decoder structure is used for larger scene context awareness, even for small objects (similar to the RetinaNet approach). Finally, we minimize the focal loss during training to support the large number of anchors resulting from the high scale variance.
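Focal loss down-weights the many easy background anchors so that training is not dominated by the class imbalance that a large anchor count creates. A generic sketch of the standard binary focal loss follows (not MediaPipe's internal training code; the alpha and gamma values are assumed):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p : predicted probability that the anchor contains a palm
    y : ground-truth label (1 = palm, 0 = background)
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# an easy negative (p=0.1, y=0) contributes far less than under plain cross-entropy
print(focal_loss(np.array([0.1, 0.9]), np.array([0, 1])))
```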
With the above techniques, we achieve an average precision of 95.7% in palm detection. With ordinary cross-entropy loss and no decoder, the baseline is only 86.22%.
After palm detection over the whole image, the subsequent hand feature point model performs precise keypoint localization of the 21 three-dimensional finger-joint coordinates in the detected hand region through regression, i.e., direct coordinate prediction. The model learns a consistent internal hand-pose representation and remains robust even when the hand is only partially visible or self-occluded.
To obtain real data, we manually annotated about 3 million real images with the 21 three-dimensional coordinates shown in Fig. 2 (taking the Z value from the image depth map where one existed for the corresponding coordinate). To better cover the possible hand gestures and provide additional supervision on hand geometry, we also rendered a high-quality synthetic hand model over various backgrounds and mapped it onto the corresponding three-dimensional coordinates.
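One possible way to store the 21 three-dimensional coordinates per image when assembling such a library is sketched below; the JSON layout, field names, and the use of the model's own landmarks in place of a manual annotation tool are assumptions for illustration.

```python
import json
import cv2
import mediapipe as mp

def annotate_image(path, depth_map=None):
    """Return the 21 (x, y, z) palm coordinates for one real image.
    z falls back to MediaPipe's relative depth when no depth map exists."""
    img = cv2.imread(path)
    h, w = img.shape[:2]
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        res = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    if not res.multi_hand_landmarks:
        return None
    coords = []
    for lm in res.multi_hand_landmarks[0].landmark:
        x, y = int(lm.x * w), int(lm.y * h)
        z = float(depth_map[y, x]) if depth_map is not None else lm.z
        coords.append([x, y, z])
    return coords

# one library entry: image path plus its 21 joint coordinates
entry = {"image": "real/hand_0001.jpg", "joints": annotate_image("real/hand_0001.jpg")}
print(json.dumps(entry)[:120])
```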
The pipeline is implemented as a MediaPipe graph that uses the hand landmark tracking subgraph from the hand landmark module and renders with a dedicated hand renderer subgraph; the hand landmark tracking subgraph internally uses the hand landmark subgraph from the same module and the palm detection subgraph from the palm detection module.
The gesture recognition and interaction algorithm can be applied to various scenarios to realize different functions, for example gesture control of a mouse (see the sketch below).
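As one such application, the index fingertip can drive the system cursor; the sketch below assumes the pyautogui library and the HandDetector class sketched earlier, and maps camera-pixel coordinates to screen coordinates with a simple linear interpolation.

```python
import numpy as np
import pyautogui

screen_w, screen_h = pyautogui.size()
CAM_W, CAM_H = 1280, 720                 # camera resolution assumed above

def move_mouse(points, fingers):
    """Index finger up: move the cursor; index and middle up: click."""
    if not points:
        return
    x, y = points[8][1:]                 # index fingertip in camera pixels
    sx = np.interp(x, (0, CAM_W), (0, screen_w))
    sy = np.interp(y, (0, CAM_H), (0, screen_h))
    if fingers[1] and not fingers[2]:
        pyautogui.moveTo(sx, sy)
    elif fingers[1] and fingers[2]:
        pyautogui.click()
```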
When detecting the finger position to determine which drawing tool is selected (see the sketch after this list):
1. if the finger lies between x coordinates 160 and 360, the brush color is set to purple-red (255, 0, 255);
2. if the finger lies between x coordinates 500 and 700, the brush color is set to blue (255, 125, 0);
3. if the finger lies between x coordinates 815 and 965, the brush color is set to green (0, 255, 0);
4. if the finger lies between x coordinates 1050 and 1250, the brush is set to the eraser, i.e. colorless (0, 0, 0).
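A direct sketch of that mapping, with the coordinate bands taken from the description above (the color triples are treated as OpenCV-style BGR values, which is an assumption about the channel order):

```python
def select_tool(x, current_color):
    """Map the index-finger x coordinate (1280-pixel-wide frame) to a brush color."""
    if 160 <= x <= 360:
        return (255, 0, 255)      # 1. purple-red brush
    if 500 <= x <= 700:
        return (255, 125, 0)      # 2. blue brush
    if 815 <= x <= 965:
        return (0, 255, 0)        # 3. green brush
    if 1050 <= x <= 1250:
        return (0, 0, 0)          # 4. eraser (colorless)
    return current_color          # outside all bands: keep the current tool
```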
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a/an" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element. Further, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; functions may be performed in a substantially simultaneous manner or in reverse order depending on the functionality involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
While the present embodiments have been described with reference to the accompanying drawings, the invention is not limited to the precise embodiments described above, which are illustrative rather than restrictive; those skilled in the art may make various changes without departing from the scope of the invention as defined by the appended claims.

Claims (10)

1. A gesture recognition interaction device based on a monocular camera, characterized by comprising:
a camera for acquiring an image;
a feature extractor for locating the rough range in the image where the palm is located;
a detector for accurately cropping the image within that rough range to obtain a palm image and/or to operate an intelligent interactive picture;
and a marker for identifying the palm joint feature points in the palm image to locate the palm.
2. An interaction method based on the gesture recognition interaction device of claim 1, characterized by comprising the following steps:
1) establishing a palm three-dimensional coordinate image library through a three-dimensional palm feature point model;
2) displaying an intelligent interactive picture;
3) running the detector over the image and calculating the palm position;
4) controlling playback of the intelligent interactive picture according to the user's palm movement data.
3. The interaction method of the gesture recognition interaction device according to claim 2, wherein the palm three-dimensional coordinate image library is established by the following steps:
1) forming an ML Pipeline from two deep neural network models working together in real time;
2) running the detector over the intelligent interactive picture and calculating the hand position;
3) operating on these positions with the three-dimensional palm feature point model and predicting an approximate three-dimensional surface through regression;
4) directly predicting the 21 three-dimensional finger-joint coordinates in the detected hand region through regression, with the model learning a consistent internal hand-pose representation;
5) manually annotating real images with the 21 three-dimensional finger-joint coordinates to build the palm three-dimensional coordinate image library.
4. The interaction method of the gesture recognition interaction device according to claim 3, further comprising:
training the detector, modeling the palm with a square bounding box and ignoring other aspect ratios;
wherein the focal loss is minimized during training.
5. The interaction method of the gesture recognition interaction device according to claim 2, characterized in that:
the deep neural network is allowed to devote most of its computational power to the accuracy of coordinate prediction.
6. The interaction method of the gesture recognition interaction device according to claim 2, characterized in that:
the marker is generated from the palm feature points identified in the previous frame, and the detector is invoked to relocate the palm only when the feature point model can no longer identify the presence of the palm.
7. The interaction method of the gesture recognition interaction device according to any one of claims 1 to 6, wherein the intelligent interactive picture is operated by the following steps:
1) initializing the environment and starting the MediaPipe and OpenCV frameworks;
2) initializing the Hand Detector class;
3) initializing the main program of the drawing board and calling the Hand Detector class;
4) judging whether the camera has opened successfully, and starting the gesture detection algorithm once it has;
5) selection mode: when two or more fingers are raised or the palm is open, the tool selection mode is entered;
6) drawing mode: when only the index finger is raised, the currently selected drawing tool is determined and strokes in that tool's color are superimposed on the picture.
8. The interaction method of the gesture recognition interaction device according to claim 7, wherein the Hand Detector class comprises the following members:
1) detecting, segmenting and marking the palm and joints;
2) detecting the specific positions of the palm and of each fingertip;
3) detecting the finger gesture;
4) detecting the finger positions;
5) detecting and calculating the distance between fingers;
wherein detecting the finger gesture includes determining which fingers are raised.
9. The interaction method of the gesture recognition interaction device according to claim 7, wherein the main program of the drawing board is initialized by the following steps:
1) importing external libraries and modules;
2) setting the brush size;
3) opening the Library folder, loading the designed interactive pictures and initializing the UI (user interface);
4) opening the camera with the OpenCV framework;
5) setting the size of the software window.
10. The interaction method of the gesture recognition interaction device according to claim 7, wherein the gesture detection process is as follows:
1) calling the Hand Detector;
2) creating an img variable and storing the picture captured by the camera in real time into it;
3) detecting the fingers, and determining and detecting the positions of the index finger and the middle finger;
4) detecting the palm posture;
wherein detecting the palm posture includes determining which fingers are raised.
CN202210404958.1A 2022-04-18 2022-04-18 Gesture recognition interaction equipment based on monocular camera Pending CN114860072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210404958.1A CN114860072A (en) 2022-04-18 2022-04-18 Gesture recognition interaction equipment based on monocular camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210404958.1A CN114860072A (en) 2022-04-18 2022-04-18 Gesture recognition interaction equipment based on monocular camera

Publications (1)

Publication Number Publication Date
CN114860072A true CN114860072A (en) 2022-08-05

Family

ID=82631443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210404958.1A Pending CN114860072A (en) 2022-04-18 2022-04-18 Gesture recognition interaction equipment based on monocular camera

Country Status (1)

Country Link
CN (1) CN114860072A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258655A (en) * 2022-12-13 2023-06-13 合肥工业大学 Real-time image enhancement method and system based on gesture interaction
CN116258655B (en) * 2022-12-13 2024-03-12 合肥工业大学 Real-time image enhancement method and system based on gesture interaction

Similar Documents

Publication Publication Date Title
JP6079832B2 (en) Human computer interaction system, hand-to-hand pointing point positioning method, and finger gesture determination method
CN108776773B (en) Three-dimensional gesture recognition method and interaction system based on depth image
TWI654539B (en) Virtual reality interaction method, device and system
US11308655B2 (en) Image synthesis method and apparatus
Beyeler OpenCV with Python blueprints
CN111178170B (en) Gesture recognition method and electronic equipment
CN109240494B (en) Control method, computer-readable storage medium and control system for electronic display panel
CN109839827B (en) Gesture recognition intelligent household control system based on full-space position information
CN105046249B (en) A kind of man-machine interaction method
CN113506377A (en) Teaching training method based on virtual roaming technology
CN114327064A (en) Plotting method, system, equipment and storage medium based on gesture control
CN114860072A (en) Gesture recognition interaction equipment based on monocular camera
Inoue et al. Tracking Robustness and Green View Index Estimation of Augmented and Diminished Reality for Environmental Design
CN114373050A (en) Chemistry experiment teaching system and method based on HoloLens
CN104239119A (en) Method and system for realizing electric power training simulation upon kinect
US20160342831A1 (en) Apparatus and method for neck and shoulder landmark detection
JP6174277B1 (en) Image processing system, image processing apparatus, image processing method, and program
CN110442242B (en) Intelligent mirror system based on binocular space gesture interaction and control method
CN111383343B (en) Home decoration design-oriented augmented reality image rendering coloring method based on generation countermeasure network technology
KR20190027287A (en) The method of mimesis for keyboard and mouse function using finger movement and mouth shape
CN104699243B (en) A kind of incorporeity virtual mouse method based on monocular vision
CN115061577A (en) Hand projection interaction method, system and storage medium
Reza et al. Real time mouse cursor control based on bare finger movement using webcam to improve HCI
Patil et al. Gesture Recognition for Media Interaction: A Streamlit Implementation with OpenCV and MediaPipe
Vysocky et al. Generating synthetic depth image dataset for industrial applications of hand localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination