CN115167666A - Gesture-interactive AR smart helmet device - Google Patents

Gesture-interactive AR smart helmet device

Info

Publication number
CN115167666A
CN115167666A (application CN202210723856.6A)
Authority
CN
China
Prior art keywords
gesture
module
gesture recognition
soc
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210723856.6A
Other languages
Chinese (zh)
Inventor
童飞飞
葛晨阳
李辉
杨亚林
张小娜
王骞
卫倩倩
王辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Costar Group Co Ltd
Original Assignee
Henan Costar Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Costar Group Co Ltd
Priority to CN202210723856.6A
Publication of CN115167666A
Legal status: Pending

Classifications

    • G06F 3/017: Gesture-based interaction, e.g. based on a set of recognized hand gestures
    • G02B 27/017: Head-up displays; head-mounted
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G02B 2027/0138: Head-up displays characterised by optical features comprising image capture systems, e.g. camera
    • G02B 2027/014: Head-up displays characterised by optical features comprising information/image processing systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Optics & Photonics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an AR smart helmet device capable of gesture interaction, comprising an AR helmet and a gesture recognition module. The AR helmet comprises a multi-camera module, an SoC (system-on-chip) computing board, a binocular micro-display and an optical engine module. The multi-camera module captures gesture images; gesture recognition running on the SoC computing board, which is based on an ARM + NPU (neural processing unit) architecture, recognizes gesture control commands for function selection, clicking, exiting and page turning, and the results are displayed synchronously on the binocular micro-display and the optical engine module. The gesture recognition module runs on the SoC computing board and performs hand detection and static gesture recognition on the gesture images captured by the multi-camera module, with hand detection using a RetinaHand-based hand detection network. Compared with the prior art, the device realizes gesture-interactive recognition with low latency, accurate hand-motion recognition and real-time interaction, which is of great significance for communication between people, between people and machines, and even between human-like intelligent machines and machines.

Description

Gesture-interactive AR smart helmet device
Technical Field
The invention relates to the technical field of VR/AR and natural interaction, and in particular to an AR smart helmet device capable of gesture interaction.
Background
Gestures are a natural, innate mode of human interaction and an important bridge for communication between people, between people and machines, and even between human-like intelligent machines and machines; they are in urgent demand in many fields, such as communication for the deaf and mute, smart homes, robotics, medicine and national defense. How to achieve high-precision, high-accuracy gesture recognition has become a key issue in gesture-interaction research. As a portable large-screen display device, the large-field-of-view, highly immersive AR helmet can meet display requirements for a large field of view, high immersion and high resolution, and is currently developing toward multi-sensor integration, fusion of virtual and real images, and superposition of digital information. Existing AR helmets generally rely on buttons, voice control or a connected mobile phone for interaction. If an AR helmet could also support gesture interaction, this would be of great significance for communication between people, between people and machines, and even between human-like intelligent machines and machines.
Disclosure of Invention
To address these technical shortcomings, the invention aims to provide an AR smart helmet device capable of gesture interaction, which realizes gesture-interactive recognition with low latency, accurate hand-motion recognition and real-time interaction, and is of great significance for communication between people, between people and machines, and even between human-like intelligent machines and machines.
To achieve this purpose, the invention adopts the following technical scheme: an AR smart helmet device capable of gesture interaction comprises an AR helmet and a gesture recognition module. The AR helmet comprises a multi-camera module, an SoC (system-on-chip) computing board, a binocular micro-display and an optical engine module; the multi-camera module captures gesture images, gesture recognition running on the SoC computing board based on an ARM + NPU (neural processing unit) architecture recognizes gesture control commands for function selection, clicking, exiting and page turning, and the results are displayed synchronously on the binocular micro-display and the optical engine module;
the gesture recognition module runs on the SoC computing board and performs hand detection and static gesture recognition on the gesture images captured by the multi-camera module, where hand detection uses a RetinaHand-based hand detection network and static gesture recognition uses a static gesture classification network.
Furthermore, the multi-camera module comprises an RGB high-definition camera and an IR detection camera, and the video images they capture are transmitted to the SoC computing board through a MIPI interface.
Alternatively, the multi-camera module comprises two low-light high-definition cameras and an IR detection camera; the two low-light cameras are combined to extend the field of view (FoV), and the captured video images are transmitted to the SoC computing board through a MIPI interface.
The SoC computing board carries an SoC chip with an ARM core + NPU core architecture. The multiple video signals from the multi-camera module are input through the MIPI interface and, after ISP processing, undergo image fusion of RGB + IR or of the two low-light channels + IR; the purpose of fusion is to make targets under different illumination conditions more salient and easier to recognize. The NPU core runs the target detection and gesture recognition algorithms. The fused image, the target detection result and the gesture recognition result are output synchronously over MIPI to the binocular micro-display and the optical engine module for display.
The binocular micro-display adopts an OLED micro-display or an LCoS micro-display; the optical engine module is a near-eye optical system or an optical waveguide diffraction device, used for near-eye AR augmented display.
Hand detection and static gesture recognition run mainly on the NPU core of the SoC on the SoC computing board. The RetinaHand-based hand detection network comprises a backbone network for feature extraction, a feature-processing fusion module (FPN), and a regression head module that regresses the specific class and coordinate information of the target from the features processed by the FPN.
The static gesture recognition result is used to control function and menu selection, APP clicking, exiting and page turning on the SoC computing board. The static gesture classification network that implements static gesture recognition comprises a feature extraction module and a normalized exponential (softmax) function; the feature extraction module comprises fully connected layers, batch normalization layers and non-linear activation layers. For a detected hand region, the static gesture classification network outputs a C-dimensional feature representing the probabilities that the static gesture belongs to each of C categories; the normalized exponential function normalizes these probabilities to [0, 1].
Through the above technical scheme, an AR smart helmet device capable of gesture interaction is realized based on the AR helmet and the gesture recognition module. It features low latency, accurate hand-motion recognition and support for real-time interaction, and improves the natural-interaction capability of the AR smart helmet.
Drawings
Fig. 1 is a schematic block diagram of the invention.
Fig. 2 is a schematic diagram of the structure of the RetinaHand-based hand detection network in the gesture recognition module of the invention.
Fig. 3 is a flow chart of the Attention-FPN in the invention.
Fig. 4 is a schematic diagram of the static gesture classification network structure of the invention.
Detailed Description
Referring to fig. 1, an embodiment of the invention discloses a gesture-interactive AR smart helmet device, which includes an AR helmet and a gesture recognition module. The AR helmet comprises a multi-camera module, an SoC (system-on-chip) computing board, a binocular micro-display and an optical engine module. Gesture images are captured by the multi-camera module; gesture recognition is performed on the SoC computing board based on an ARM + NPU architecture; gesture control commands are recognized for function selection, clicking, exiting and page turning, and the results are displayed synchronously on the binocular micro-display and the optical engine module.
The multi-camera module comprises an RGB high-definition camera and an IR detection camera and transmits the captured video images to the SoC computing board through a MIPI interface;
the multi-camera module can also comprise two low-light-level high-definition cameras and an IR detection camera, the two low-light-level high-definition cameras are combined to expand the view angle FoV, and the acquired video image is sent to the SoC computing board through a mipi interface;
the SoC computing board generally comprises an SoC chip, the chip adopts an ARM core + NPU core architecture, a plurality of paths of video signals of a plurality of camera modules are input through a mipi interface, image fusion processing of RGB + IR or two paths of low-light level + IR is carried out after the processing of ISP, and the fusion target is that the target under different illumination conditions can be more dominant and is convenient to identify. The NPU core is used for running a target detection and gesture recognition algorithm. And the fused image, the target detection result and the gesture recognition result are synchronously output to a binocular micro-display screen and an optical machine module through mipi for display.
The binocular micro-display is generally an OLED micro-display or an LCoS micro-display. The optical engine module is a near-eye optical system, which may be an optical waveguide diffraction device, used for near-eye AR augmented display.
The gesture recognition module runs on the SoC computing board and performs hand detection and static gesture classification on the gesture images captured by the multi-camera module; hand detection uses a RetinaHand-based hand detection network, and gesture recognition uses a static gesture classification network. The static gesture recognition result is used to control function and menu selection, APP clicking, exiting, page turning and similar actions on the SoC computing board.
Hand detection and classification run mainly on the NPU core of the SoC on the SoC computing board.
The RetinaHand detection network is a single-stage object detection network. Taking the RetinaFace framework as a structural reference, it improves and upgrades several modules: it introduces a more lightweight backbone network, improves the feature pyramid network (FPN), changes the positive/negative sample generation strategy, simplifies the neck, and experiments with different loss functions.
The RetinaHand-based hand detection network follows the classic Backbone-Neck-Head design of object detection algorithms; its network structure is shown in fig. 2. The network consists of three main parts (see the sketch after this list):
1) The backbone network used for feature extraction, commonly referred to as the backbone.
2) The feature-processing fusion module FPN, also called the neck of the network.
3) The regression head, generally called the head module, which regresses the specific class, coordinates and other information of the target from the features processed by the neck.
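A minimal PyTorch sketch of this backbone-neck-head composition is given below, with feature sizes matching the embodiment described later (64/128/256 channels at strides 8/16/32 for a 224x224 input). Every module is reduced to a placeholder: TinyBackbone, SimpleNeck, Head and RetinaHandSketch, as well as all layer choices, are assumptions for illustration rather than the patent's actual network.

    import torch
    import torch.nn as nn

    class TinyBackbone(nn.Module):
        """Placeholder backbone: three conv stages at strides 8, 16 and 32."""
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=8, padding=1), nn.ReLU())
            self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
            self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())

        def forward(self, x):
            c3 = self.stage1(x)   # 64  x H/8  x W/8
            c4 = self.stage2(c3)  # 128 x H/16 x W/16
            c5 = self.stage3(c4)  # 256 x H/32 x W/32
            return [c3, c4, c5]

    class SimpleNeck(nn.Module):
        """Placeholder for the FPN (neck): align all levels to 128 channels."""
        def __init__(self, in_channels=(64, 128, 256), out_channels=128):
            super().__init__()
            self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)

        def forward(self, feats):
            return [conv(f) for conv, f in zip(self.lateral, feats)]

    class Head(nn.Module):
        """Per-level head: 2 class scores and 4 box offsets for each of k anchors per cell."""
        def __init__(self, channels=128, k=3):
            super().__init__()
            self.cls = nn.Conv2d(channels, 2 * k, 1)
            self.reg = nn.Conv2d(channels, 4 * k, 1)

        def forward(self, f):
            return self.cls(f), self.reg(f)

    class RetinaHandSketch(nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = TinyBackbone()
            self.neck = SimpleNeck()
            self.heads = nn.ModuleList(Head() for _ in range(3))

        def forward(self, x):
            feats = self.neck(self.backbone(x))
            return [head(f) for head, f in zip(self.heads, feats)]

    outputs = RetinaHandSketch()(torch.randn(1, 3, 224, 224))
    for cls_out, reg_out in outputs:
        print(cls_out.shape, reg_out.shape)   # 28x28, 14x14 and 7x7 levels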
The processing of the RetinaHand-based hand detection network comprises three steps.
the first step is as follows: the generation of an a priori anchor frame (anchor) and the matching of the anchor frame and a target frame (GT). The basic principle of all single-stage target detection algorithms based on the prior anchor frame can be summarized into classification and regression after dense sampling for the original image, so that the generation of the anchor frame is an essential step, and although the geometric meaning of the anchor frame is relative to the original image, the specific generation of the anchor frame needs to be performed by combining a feature map. In the Retina-hand, three layers of feature maps in the network are reserved, and the down-sampling ratios to the original image are 1/8, 1/16 and 1/32 respectively.
Considering the characteristics of the infrared gesture image dataset and the need for speed, in one example the input infrared image is set to 224x224, so the three feature maps have sizes 28x28, 14x14 and 7x7, and each pixel of each feature map corresponds to an 8x8, 16x16 or 32x32 region of the original image. Traditional algorithms such as Faster R-CNN, SSD and RetinaNet generate k anchors of different scales and aspect ratios at each feature-map pixel, where typically k = 9 (3 scales x 3 aspect ratios). Because the infrared gesture data of the invention are close to square, the anchor design can be simplified by considering only scale and ignoring aspect ratio. Likewise, when processing the dataset, all labels can be forced to squares by lengthening the shorter edge. The sketch below illustrates this square-anchor generation.
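A minimal NumPy sketch of square-anchor generation for a 224x224 input with strides 8/16/32; the per-level anchor sizes in sizes_per_level and the function name make_square_anchors are assumptions, since the patent only states that the aspect ratio is ignored and only the scale is varied.

    import numpy as np

    def make_square_anchors(image_size=224, strides=(8, 16, 32),
                            sizes_per_level=((16, 32, 64), (64, 96, 128), (128, 192, 256))):
        """Generate square prior anchors (cx, cy, w, h) densely over each feature map.

        Aspect ratio is fixed to 1:1, so only the scale varies per level
        (sizes_per_level are illustrative values, not taken from the patent).
        """
        anchors = []
        for stride, sizes in zip(strides, sizes_per_level):
            fm = image_size // stride                          # 28, 14, 7 for a 224x224 input
            ys, xs = np.meshgrid(np.arange(fm), np.arange(fm), indexing="ij")
            cx = (xs.ravel() + 0.5) * stride                   # anchor centres in image coordinates
            cy = (ys.ravel() + 0.5) * stride
            for s in sizes:
                anchors.append(np.stack([cx, cy,
                                         np.full_like(cx, s),
                                         np.full_like(cy, s)], axis=1))
        return np.concatenate(anchors, axis=0).astype(np.float32)

    anchors = make_square_anchors()
    print(anchors.shape)   # (3087, 4): 3 * (28*28 + 14*14 + 7*7) anchors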
After the anchors are generated, only the dense sampling of the original image is complete; a supervision target must then be constructed for each sample, namely the position of the ground-truth box relative to the anchor and the class of each anchor. That is, each anchor is labelled as foreground or background, and if it is foreground, a specific position must be assigned to it, expressed as the offset of the ground-truth box relative to the anchor. The offset has two parts: the offset of the ground-truth box centre relative to the anchor centre, and the transformation of the ground-truth width and height relative to the anchor width and height, where the transformation is the scale ratio between ground-truth box and anchor after a logarithmic transform.
Note that, to remove the influence of anchor size so that all anchors are treated equally, the centre offsets of the ground-truth box must be normalized by the anchor's width and height. Without normalization, large anchors would tolerate large deviations while small anchors would be very sensitive to them, which hinders training; converting the regression from absolute scale to relative scale solves this. Another important step is converting the ratio of the ground-truth width and height to the anchor width and height into log space: without it, the model could only output positive values for width and height, which raises the demands on the model and makes optimization harder; the log-space transform resolves this. The encoding and its inverse are sketched below.
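A minimal sketch of this offset encoding and its inverse, assuming boxes and anchors are given in (cx, cy, w, h) form; the function names encode and decode and the example values are illustrative.

    import numpy as np

    def encode(gt, anchor):
        """Encode a target box relative to its matched anchor (both as cx, cy, w, h).

        Centre offsets are normalised by the anchor's width/height so that large and
        small anchors are treated equally; width/height ratios go to log space so the
        network is not forced to output strictly positive values.
        """
        tx = (gt[0] - anchor[0]) / anchor[2]
        ty = (gt[1] - anchor[1]) / anchor[3]
        tw = np.log(gt[2] / anchor[2])
        th = np.log(gt[3] / anchor[3])
        return np.array([tx, ty, tw, th], dtype=np.float32)

    def decode(offsets, anchor):
        """Invert the encoding to recover an absolute box from network outputs."""
        cx = offsets[0] * anchor[2] + anchor[0]
        cy = offsets[1] * anchor[3] + anchor[1]
        w = anchor[2] * np.exp(offsets[2])
        h = anchor[3] * np.exp(offsets[3])
        return np.array([cx, cy, w, h], dtype=np.float32)

    anchor = np.array([112.0, 112.0, 64.0, 64.0])   # a square anchor
    gt = np.array([120.0, 100.0, 80.0, 80.0])       # a (square) ground-truth hand box
    t = encode(gt, anchor)
    print(t, decode(t, anchor))                     # decode(encode(gt)) recovers gt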
The second step: the mapping from input to output of the whole network. A 3x224x224 input image first passes through a backbone composed of stacked convolutional layers for feature extraction; intermediate features of the network are extracted and sent to the subsequent FPN for processing. In total, the last three levels of the backbone are extracted; for MobileNetV1 x0.25 as the backbone, the three feature maps have sizes 64x28x28, 128x14x14 and 256x7x7.
Three levels of features are obtained after FPN fusion, and each level carries a large number of prior anchors. To improve the expressive power of the features, the feature maps are further processed by a feature-refinement module composed of large convolution kernels, enlarging their receptive field.
The FPN here is an Attention-FPN. The feature pyramid is an indispensable component of current mainstream detection models and effectively improves localization of targets at different scales. For hand detection, the hand size varies drastically with the distance and orientation of the subject relative to the camera in real scenes: a nearby target can reach 400x400 pixels, while the farthest target may be only 20x20. This drastic scale variation requires the detection network to detect both large and small targets well. A traditional FPN simply adds upsampled high-level features to low-level features element-wise; the design of the invention implements an improved FPN that incorporates the attention idea.
Inspired by MobileViT, the invention extends the self-attention mechanism and introduces it into the FPN module, where Query, Key and Value no longer come from the same input: Query comes from a non-linear transform of the shallow feature map, while Key and Value come from a linear transform of the upsampled deep feature map. The element-wise addition of the original FPN is replaced by fusion with an attention mechanism. From the viewpoint of the attention mechanism, this operation expresses each pixel of the shallow feature map as a weighted sum of all pixels of the deep feature map. Representing the shallow level with a deep attention mechanism effectively injects global information into every pixel of the shallow feature map, while convolution focuses more on local information, so the fused feature map retains both global and local information, which benefits model learning. Finally, a new feature map fusing shallow and deep features via attention is obtained, and attention is applied once more to further transform it and improve its expressive power.
Specifically: the deeper feature map is upsampled (7x7 to 14x14) and its channel count is aligned with the shallower level using a 1x1 convolution (256 mapped to 128), giving 128x14x14; the resulting feature map is then used for the attention operation. Following MobileViT, the feature map is first sliced into patches, self-attention is applied over all pixels within each patch, and the result is inverse-transformed back to the shape of the original input feature map, completing the attention computation. Fig. 3 shows the complete implementation flow of the Attention-FPN.
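A minimal PyTorch sketch of the fusion just described: the deep 256x7x7 map is upsampled to 14x14 and projected to 128 channels with a 1x1 convolution, then attended to by the shallow 128x14x14 map, with Query taken from a non-linear transform of the shallow features and Key/Value from a linear transform of the upsampled deep features. The MobileViT patch-slicing step is omitted here (attention is applied over all spatial positions at once), and the projection layers, head count and class name AttentionFusion are assumptions, so this illustrates the idea rather than reproducing the patent's exact module.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionFusion(nn.Module):
        """Fuse a shallow feature map with an upsampled deep feature map via attention.

        Query comes from a non-linear transform of the shallow map; Key and Value come
        from a linear transform of the upsampled deep map, so every shallow pixel is
        re-expressed as a weighted sum of deep-map pixels (global context).
        """
        def __init__(self, shallow_ch=128, deep_ch=256, dim=128, heads=4):
            super().__init__()
            self.align = nn.Conv2d(deep_ch, dim, kernel_size=1)                     # 256 -> 128 channels
            self.q_proj = nn.Sequential(nn.Conv2d(shallow_ch, dim, 1), nn.GELU())   # non-linear
            self.kv_proj = nn.Conv2d(dim, dim, kernel_size=1)                       # linear
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, shallow, deep):
            deep = self.align(F.interpolate(deep, size=shallow.shape[-2:], mode="nearest"))
            b, c, h, w = shallow.shape
            q = self.q_proj(shallow).flatten(2).transpose(1, 2)   # (B, H*W, C)
            kv = self.kv_proj(deep).flatten(2).transpose(1, 2)    # (B, H*W, C)
            fused, _ = self.attn(q, kv, kv)
            return fused.transpose(1, 2).reshape(b, c, h, w)      # back to (B, C, H, W)

    shallow = torch.randn(1, 128, 14, 14)   # stride-16 feature map
    deep = torch.randn(1, 256, 7, 7)        # stride-32 feature map
    print(AttentionFusion()(shallow, deep).shape)   # torch.Size([1, 128, 14, 14])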
The third step: each feature map passes through a bounding-box regression branch and a confidence classification branch to obtain the final coordinates and the foreground/background probability. In the invention, if N denotes the total number of anchors, the final output of the classification branch is 2N and the final output of the box regression branch is 4N, representing, for each anchor, the probabilities of foreground and background and, if foreground, the offset of the target centre relative to the anchor and the log-transformed ratio of the target's width and height to the anchor's width and height.
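To make these output shapes concrete, the sketch below flattens per-level head outputs into one (N, 2) classification tensor and one (N, 4) regression tensor, where N is the total anchor count; the tensor layout and the function name flatten_head_outputs are assumptions.

    import torch

    def flatten_head_outputs(level_outputs):
        """Concatenate per-level head outputs into (B, N, 2) scores and (B, N, 4) offsets.

        level_outputs: list of (cls, reg) pairs with shapes (B, 2k, H, W) and (B, 4k, H, W).
        """
        cls_all, reg_all = [], []
        for cls_map, reg_map in level_outputs:
            b = cls_map.shape[0]
            cls_all.append(cls_map.permute(0, 2, 3, 1).reshape(b, -1, 2))  # (B, H*W*k, 2)
            reg_all.append(reg_map.permute(0, 2, 3, 1).reshape(b, -1, 4))  # (B, H*W*k, 4)
        return torch.cat(cls_all, dim=1), torch.cat(reg_all, dim=1)

    # Shapes matching a 224x224 input with k = 3 anchors per cell.
    levels = [(torch.randn(1, 6, 28, 28), torch.randn(1, 12, 28, 28)),
              (torch.randn(1, 6, 14, 14), torch.randn(1, 12, 14, 14)),
              (torch.randn(1, 6, 7, 7), torch.randn(1, 12, 7, 7))]
    scores, offsets = flatten_head_outputs(levels)
    print(scores.shape, offsets.shape)   # torch.Size([1, 3087, 2]) torch.Size([1, 3087, 4])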
To improve localization accuracy, the loss for regressing the target-box coordinates replaces the mean absolute error loss with an IoU loss. When the distance between output and target is measured by absolute error, the regressed quantities are independent of each other and lack their inherent geometric constraints. Directly optimizing the intersection-over-union between the predicted box and the ground-truth box models these geometric relations, and can also be seen as directly optimizing the evaluation metric.
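A minimal sketch of an IoU loss (1 - IoU) for axis-aligned boxes in (x1, y1, x2, y2) form; this is a generic formulation for illustration and not necessarily the exact variant used in the patent.

    import torch

    def iou_loss(pred, target, eps=1e-7):
        """IoU loss (1 - IoU) for boxes given as (x1, y1, x2, y2), shape (N, 4).

        Optimising IoU directly couples the four regressed coordinates through their
        geometric overlap, unlike an element-wise absolute-error loss.
        """
        x1 = torch.max(pred[:, 0], target[:, 0])
        y1 = torch.max(pred[:, 1], target[:, 1])
        x2 = torch.min(pred[:, 2], target[:, 2])
        y2 = torch.min(pred[:, 3], target[:, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_p = (pred[:, 2] - pred[:, 0]).clamp(min=0) * (pred[:, 3] - pred[:, 1]).clamp(min=0)
        area_t = (target[:, 2] - target[:, 0]).clamp(min=0) * (target[:, 3] - target[:, 1]).clamp(min=0)
        iou = inter / (area_p + area_t - inter + eps)
        return (1.0 - iou).mean()

    pred = torch.tensor([[10.0, 10.0, 60.0, 60.0]])
    gt = torch.tensor([[20.0, 20.0, 70.0, 70.0]])
    print(iou_loss(pred, gt))   # smaller when the boxes overlap more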
In another embodiment, the static gesture classification network comprises a feature extraction module and a normalized exponential function; the feature extraction module comprises fully connected layers, batch normalization layers and non-linear activation layers. For a detected hand region, the static gesture classification network outputs a C-dimensional feature representing the probabilities that the static gesture belongs to each of C categories; the normalized exponential function normalizes these probabilities to [0, 1].
In this embodiment, the static gesture classification network, shown in fig. 4, mainly comprises a feature extraction module and a normalized exponential function. The feature extraction module is formed by stacking fully connected, batch normalization and non-linear activation layers. The network's input is a sequence of K groups of keypoint positions and its output is a C-dimensional feature (C is the number of classes) representing the probabilities that the gesture belongs to each of the C classes; to compare the maximum output probability with a set threshold, a normalized exponential function is applied to normalize the probabilities to [0, 1].
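A minimal PyTorch sketch of such a classifier: stacked fully connected + batch normalization + non-linear activation blocks, a C-dimensional output, softmax normalization and a confidence threshold on the maximum probability. The input dimension, hidden widths, class count and threshold value are assumptions.

    import torch
    import torch.nn as nn

    class StaticGestureClassifier(nn.Module):
        """FC + BatchNorm + ReLU feature extractor followed by a C-way classifier."""
        def __init__(self, in_dim=42, hidden=(128, 64), num_classes=6):
            super().__init__()
            layers, prev = [], in_dim
            for h in hidden:
                layers += [nn.Linear(prev, h), nn.BatchNorm1d(h), nn.ReLU()]
                prev = h
            layers.append(nn.Linear(prev, num_classes))   # C-dimensional output
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)

    model = StaticGestureClassifier().eval()
    features = torch.randn(1, 42)                     # e.g. flattened hand keypoints/features
    probs = torch.softmax(model(features), dim=1)     # normalised exponential -> [0, 1]
    conf, cls = probs.max(dim=1)
    threshold = 0.5                                   # illustrative confidence threshold
    print(cls.item() if conf.item() >= threshold else "rejected: below threshold")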
The embodiments described above are only some of the embodiments of the invention, and the concept and scope of the invention are not limited to the details of these exemplary embodiments. Various changes and modifications that do not depart from the design concept of the invention shall fall within the scope of the appended claims.

Claims (7)

1. A gesture-interactive AR smart helmet device, characterized in that it comprises an AR helmet and a gesture recognition module; the AR helmet comprises a multi-camera module, an SoC (system-on-chip) computing board, a binocular micro-display and an optical engine module; the multi-camera module captures gesture images, gesture recognition running on the SoC computing board based on an ARM + NPU (neural processing unit) architecture recognizes gesture control commands for function selection, clicking, exiting and page turning, and the results are displayed synchronously on the binocular micro-display and the optical engine module;
the gesture recognition module runs on the SoC computing board and performs hand detection and static gesture recognition on the gesture images captured by the multi-camera module, wherein hand detection uses a RetinaHand-based hand detection network and static gesture recognition uses a static gesture classification network.
2. The gesture-interactive AR smart helmet device of claim 1, wherein: the multi-camera module comprises an RGB high-definition camera and an IR detection camera, and the video images they capture are transmitted to the SoC computing board through a MIPI interface.
3. The gesture-interactive AR smart helmet device of claim 1, wherein: the multi-camera module comprises two low-light high-definition cameras and an IR detection camera, the two low-light cameras are combined to extend the field of view (FoV), and the captured video images are transmitted to the SoC computing board through a MIPI interface.
4. The gesture-interactive AR smart helmet device of claim 1, wherein: the SoC computing board carries an SoC chip with an ARM core + NPU core architecture; the multiple video signals from the multi-camera module are input through the MIPI interface and, after ISP processing, undergo image fusion of RGB + IR or of the two low-light channels + IR, making targets under different illumination conditions more salient and easier to recognize; the NPU core runs the target detection and gesture recognition algorithms; and the fused image, the target detection result and the gesture recognition result are output synchronously over MIPI to the binocular micro-display and the optical engine module for display.
5. The gesture-interactive AR smart helmet device of claim 4, wherein: the binocular micro-display adopts an OLED micro-display or an LCoS micro-display; the optical engine module is a near-eye optical system or an optical waveguide diffraction device, used for near-eye AR augmented display.
6. The gesture-interactive AR smart helmet device of claim 1, wherein: hand detection and static gesture recognition run mainly on the NPU core of the SoC on the SoC computing board; the RetinaHand-based hand detection network comprises a backbone network for feature extraction, a feature-processing fusion module FPN, and a regression head module that regresses the specific class and coordinate information of the target from the features processed by the FPN.
7. The gesture-interactive AR smart helmet device of claim 6, wherein: the static gesture recognition result is used to control function and menu selection, APP clicking, exiting and page turning on the SoC computing board; the static gesture classification network that implements static gesture recognition comprises a feature extraction module and a normalized exponential function, the feature extraction module comprising fully connected layers, batch normalization layers and non-linear activation layers; for a detected hand region, the static gesture classification network outputs a C-dimensional feature representing the probabilities that the static gesture belongs to each of C categories; and the normalized exponential function normalizes these probabilities to [0, 1].
CN202210723856.6A 2022-06-24 2022-06-24 Gesture-interactive AR smart helmet device Pending CN115167666A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210723856.6A CN115167666A (en) 2022-06-24 2022-06-24 Gesture-interactive AR smart helmet device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210723856.6A CN115167666A (en) 2022-06-24 2022-06-24 Gesture-interactive AR smart helmet device

Publications (1)

Publication Number Publication Date
CN115167666A (en) 2022-10-11

Family

ID=83486901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210723856.6A Pending CN115167666A (en) 2022-06-24 2022-06-24 Gesture-interactive AR smart helmet device

Country Status (1)

Country Link
CN (1) CN115167666A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination