CN117392759B - Action recognition method based on AR teaching aid - Google Patents

Action recognition method based on AR teaching aid

Info

Publication number: CN117392759B (application CN202311685269.3A)
Authority: CN (China)
Prior art keywords: gesture, color label, layer, representing, convolution layer
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN117392759A (en)
Inventors: 凌艳, 陆海燕
Assignee (original and current): Chengdu Aeronautic Polytechnic
Application filed by Chengdu Aeronautic Polytechnic
Priority to CN202311685269.3A
Publication of application CN117392759A, publication of grant CN117392759B
Application granted

Classifications

    • G (Physics); G06 (Computing; Calculating or Counting); G06V (Image or Video Recognition or Understanding)
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06V10/273: Segmentation of patterns in the image field; removing elements interfering with the pattern to be recognised
    • G06V10/454: Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/56: Extraction of image or video features relating to colour
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Arrangements for image or video recognition or understanding using neural networks

Abstract

The invention discloses an action recognition method based on an AR teaching aid, belonging to the technical field of action recognition, comprising the following steps: S1, acquiring a gesture global image of a user on the AR teaching aid, cutting the gesture global image, and generating a gesture local image; S2, constructing a gesture recognition model; S3, inputting the gesture local image into the gesture recognition model and determining the gesture action of the user. The method precisely cuts the global image so that the generated local image contains only the hand, which allows the gesture action to be extracted quickly and accurately in the subsequent steps. The invention also builds a gesture recognition model that performs feature extraction and feature fusion on the local image, so that the user's gesture can be recognized accurately, making it easy for the user to control the AR teaching aid, improving the user experience and reducing the interaction latency of the AR teaching aid.

Description

Action recognition method based on AR teaching aid
Technical Field
The invention belongs to the technical field of action recognition, and particularly relates to an action recognition method based on an AR teaching aid.
Background
With the progress of technology, AR technology is developing rapidly and its applications are increasingly accepted. The AR sand table is being widely used as a new type of AR educational tool. It combines three-dimensional virtual content with the real environment, so that students experience and learn various contents through multiple senses such as vision, hearing and touch. In education, the AR sand table is fused with a solid model by means of acoustic, optical, electrical, image, three-dimensional animation and computer programming techniques. Students and teachers operate the AR sand table through gesture actions: the gesture graphics or image information is converted into data and fed into the head-mounted device equipped with the AR sand table, where it is matched with the three-dimensional spatial information of the sand table so that the sand table can be controlled. However, the existing AR sand table is not accurate enough and responds slowly when recognizing user actions, so the invention provides an action recognition method based on an AR teaching aid.
Disclosure of Invention
In order to solve the above problems, the invention provides an action recognition method based on an AR teaching aid.
The technical scheme of the invention is as follows: an action recognition method based on an AR teaching aid comprises the following steps:
s1, acquiring a gesture global image of a user on an AR teaching aid, cutting the gesture global image, and generating a gesture local image;
s2, constructing a gesture recognition model;
s3, inputting the gesture local image into a gesture recognition model, and determining gesture actions of a user.
Further, S1 comprises the following sub-steps:
s11, calculating color label values of all pixel points in the gesture global image;
s12, taking the pixel point with the largest color label value as a standard pixel point;
s13, calculating the difference between the color label values of the rest pixel points in the gesture global image and the color label values of the standard pixel points to obtain a color label difference set;
s14, determining invalid pixel points in the gesture global image according to the color label difference value set;
s15, removing the invalid pixel points from the gesture global image to generate a gesture local image.
The beneficial effects of the above further scheme are: when a student or teacher uses the digital sand table, operations are completed by waving gestures above it, so the invention collects an image at the moment the gesture is made. However, this image may contain redundant background that interferes with gesture recognition, so the invention first cuts the gesture global image and determines a local image containing only the hand. The three-channel color values of the local image are generally similar, so pixel points belonging to background noise are screened out by computing the color label values of the pixel points and are removed from the gesture global image. A gesture local image containing only the hand is thus obtained, which makes gesture recognition in the subsequent steps fast and convenient and improves recognition efficiency and accuracy.
Further, in S11, the color label value C_{x,y} of the pixel point with abscissa x and ordinate y in the gesture global image is computed from the pixel's RGB channel values and those of the image-center pixel, wherein x_0 represents the abscissa of the pixel point at the center of the gesture global image, y_0 represents its ordinate, R_{x,y}, G_{x,y} and B_{x,y} represent the red, green and blue channel values of the pixel point with abscissa x and ordinate y, R_{x_0,y_0}, G_{x_0,y_0} and B_{x_0,y_0} represent the red, green and blue channel values of the pixel point with abscissa x_0 and ordinate y_0, and log(·) represents a logarithmic function.
Further, S14 includes the sub-steps of:
s141, sorting all color label differences of the color label difference set from small to large, and taking the leading portion of the sorted differences, whose count is obtained from L with an upward rounding (ceiling) function, as a first color label difference subset; wherein L represents the number of color label differences in the color label difference set;
s142, randomly dividing the rest color label difference values except the first color label difference value subset in the color label difference value set into a second color label difference value subset and a third color label difference value subset;
s143, determining a color label threshold according to the first color label difference value subset, the second color label difference value subset and the third color label difference value subset;
s144, eliminating pixel points corresponding to the color label difference values larger than the color label threshold value from the gesture global image, and generating a gesture local image.
Further, in S143, the color label threshold σ is computed from the three subsets, wherein u_m represents the m-th color label difference in the first color label difference subset, v_n represents the n-th color label difference in the second color label difference subset, w_k represents the k-th color label difference in the third color label difference subset, max(·) represents the maximum function, min(·) represents the minimum function, v_ave represents the average of all color label differences in the second color label difference subset, w_ave represents the average of all color label differences in the third color label difference subset, and e represents the base of the exponential function.
Further, in S2, the gesture recognition model includes an input layer, a first feature convolution layer, a second feature convolution layer, an operator, a fully connected layer, and an output layer;
the input end of the input layer is used as the input end of the gesture recognition model, the first output end of the input layer is connected with the input end of the first characteristic convolution layer, and the second output layer of the input layer is connected with the first input end of the second characteristic convolution layer; the first output end of the first characteristic convolution layer is connected with the first input end of the arithmetic unit, and the second output end of the first characteristic convolution layer is connected with the second input end of the second characteristic convolution layer; the output end of the second characteristic convolution layer is connected with the second input end of the arithmetic unit; the output end of the arithmetic unit is connected with the input end of the full-connection layer; the output end of the full-connection layer is connected with the input end of the output layer; the output end of the output layer is used as the output end of the gesture recognition model.
The beneficial effects of the above further scheme are: the input layer feeds the gesture local image into the gesture recognition model. The first feature convolution layer extracts feature information of the gesture local image from the pixel values of its pixel points; the second feature convolution layer fuses the feature information extracted by the first feature convolution layer with the pixel values of the pixel points in the gesture local image, which increases the richness of the features. The fully connected layer fuses the feature information extracted by the first feature convolution layer and by the second feature convolution layer once more through an addition operation, raising the feature dimension, and the recognition result is finally output through the output layer.
Further, the expression of the first feature convolution layer is defined in terms of the following quantities: G represents the output of the first feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, ..., z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows of the gesture local image, J represents the number of pixel columns of the gesture local image, w_p represents the weight of the p-th convolution kernel in the first feature convolution layer, o_p represents the offset of the p-th convolution kernel in the first feature convolution layer, α_p represents the step size of the p-th convolution kernel in the first feature convolution layer, b_p represents the number of channels of the p-th convolution kernel in the first feature convolution layer, and P represents the number of convolution kernels of the first feature convolution layer.
Further, the expression of the second feature convolution layer is defined in terms of the following quantities: H represents the output of the second feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, ..., z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows of the gesture local image, J represents the number of pixel columns of the gesture local image, W_q represents the weight of the q-th convolution kernel in the second feature convolution layer, O_q represents the offset of the q-th convolution kernel in the second feature convolution layer, β_q represents the step size of the q-th convolution kernel in the second feature convolution layer, B_q represents the number of channels of the q-th convolution kernel in the second feature convolution layer, P represents the number of convolution kernels of the first feature convolution layer, and Q represents the number of convolution kernels of the second feature convolution layer.
Further, the expression of the fully connected layer is defined in terms of the following quantities: T represents the output of the fully connected layer, a bias is associated with the k-th neuron in the fully connected layer, K represents the number of neurons of the fully connected layer, P represents the number of convolution kernels of the first feature convolution layer, Q represents the number of convolution kernels of the second feature convolution layer, G represents the output of the first feature convolution layer, and H represents the output of the second feature convolution layer.
The beneficial effects of the invention are as follows: the invention discloses an action recognition method based on an AR teaching aid. By recognizing the global image of the gesture the user makes over the AR teaching aid (i.e. the AR sand table) and taking the background noise of the global image into account, the global image is precisely cut so that the generated local image contains only the hand, which allows the gesture action to be extracted quickly and accurately in the subsequent steps. The invention also builds a gesture recognition model that performs feature extraction and feature fusion on the local image, so that the user's gesture can be recognized accurately, making it easy for the user to control the AR teaching aid, improving the user experience and reducing the interaction latency of the AR teaching aid.
Drawings
FIG. 1 is a flow chart of an AR teaching aid based motion recognition method;
fig. 2 is a schematic diagram of a gesture recognition model.
Detailed Description
Embodiments of the present invention are further described below with reference to the accompanying drawings.
As shown in fig. 1, the invention provides an action recognition method based on an AR teaching aid, which comprises the following steps:
s1, acquiring a gesture global image of a user on an AR teaching aid, cutting the gesture global image, and generating a gesture local image;
s2, constructing a gesture recognition model;
s3, inputting the gesture local image into a gesture recognition model, and determining gesture actions of a user.
In an embodiment of the present invention, S1 comprises the following sub-steps:
s11, calculating color label values of all pixel points in the gesture global image;
s12, taking the pixel point with the largest color label value as a standard pixel point;
s13, calculating the difference between the color label values of the rest pixel points in the gesture global image and the color label values of the standard pixel points to obtain a color label difference set;
s14, determining invalid pixel points in the gesture global image according to the color label difference value set;
s15, removing the invalid pixel points from the gesture global image to generate a gesture local image.
In the invention, when a student or teacher uses the digital sand table, operations are completed by waving gestures above it, so the invention collects an image at the moment the gesture is made. However, this image may contain redundant background that interferes with gesture recognition, so the invention first cuts the gesture global image and determines a local image containing only the hand. The three-channel color values of the local image are generally similar, so pixel points belonging to background noise are screened out by computing the color label values of the pixel points and are removed from the gesture global image. A gesture local image containing only the hand is thus obtained, which makes gesture recognition in the subsequent steps fast and convenient and improves recognition efficiency and accuracy.
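To make S11 to S15 concrete, the following minimal NumPy sketch walks through the cropping pipeline. It is only an illustration: the patent expresses the color label value and the threshold through formulas that are not reproduced in this text, so the color_label function below uses a placeholder log ratio and the threshold is supplied by the caller; the function names and the zero-filling of removed pixels are assumptions made for the sketch, not the patent's own implementation.

    import numpy as np

    def color_label(img, x, y, x0, y0):
        # Placeholder for the patent's color label value C_{x,y}: the real
        # formula combines the RGB channel values of pixel (x, y) and of the
        # image-center pixel (x0, y0) through a logarithmic function.
        r, g, b = img[y, x].astype(np.float64)
        r0, g0, b0 = img[y0, x0].astype(np.float64)
        return np.log(1.0 + r + g + b) - np.log(1.0 + r0 + g0 + b0)

    def crop_gesture(global_img, threshold):
        # S11-S15: keep only the pixels whose color label difference from the
        # standard (maximum-label) pixel does not exceed the threshold.
        h, w, _ = global_img.shape
        y0, x0 = h // 2, w // 2                                     # image-center pixel
        labels = np.array([[color_label(global_img, x, y, x0, y0)
                            for x in range(w)] for y in range(h)])  # S11
        std = labels.max()                                          # S12: standard pixel value
        diffs = std - labels                                        # S13: difference set
        valid = diffs <= threshold                                  # S14: invalid pixels fail this test
        local_img = global_img.copy()
        local_img[~valid] = 0                                       # S15: remove invalid pixels
        return local_img

In this sketch the removal of invalid pixels is realized by zeroing them out so that the returned array keeps its original shape; an actual implementation could equally crop to the bounding box of the valid region.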
In the embodiment of the present invention, in S11, the color label value C_{x,y} of the pixel point with abscissa x and ordinate y in the gesture global image is computed from the pixel's RGB channel values and those of the image-center pixel, wherein x_0 represents the abscissa of the pixel point at the center of the gesture global image, y_0 represents its ordinate, R_{x,y}, G_{x,y} and B_{x,y} represent the red, green and blue channel values of the pixel point with abscissa x and ordinate y, R_{x_0,y_0}, G_{x_0,y_0} and B_{x_0,y_0} represent the red, green and blue channel values of the pixel point with abscissa x_0 and ordinate y_0, and log(·) represents a logarithmic function.
In an embodiment of the present invention, S14 includes the sub-steps of:
s141, sorting all color label differences of the color label difference set from small to large, and taking the leading portion of the sorted differences, whose count is obtained from L with an upward rounding (ceiling) function, as a first color label difference subset; wherein L represents the number of color label differences in the color label difference set;
s142, randomly dividing the rest color label difference values except the first color label difference value subset in the color label difference value set into a second color label difference value subset and a third color label difference value subset;
s143, determining a color label threshold according to the first color label difference value subset, the second color label difference value subset and the third color label difference value subset;
s144, eliminating pixel points corresponding to the color label difference values larger than the color label threshold value from the gesture global image, and generating a gesture local image.
In the embodiment of the present invention, in S143, the color label threshold σ is computed from the three subsets, wherein u_m represents the m-th color label difference in the first color label difference subset, v_n represents the n-th color label difference in the second color label difference subset, w_k represents the k-th color label difference in the third color label difference subset, max(·) represents the maximum function, min(·) represents the minimum function, v_ave represents the average of all color label differences in the second color label difference subset, w_ave represents the average of all color label differences in the third color label difference subset, and e represents the base of the exponential function.
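A small Python sketch of S141 to S144 is given below for orientation. The fraction taken for the first subset and the closed-form expression for the threshold σ appear in the patent only as formulas that are not reproduced here, so a one-third split and a simple combination of the subset statistics stand in for them; both are assumptions of the sketch, not the patented formula.

    import math
    import random

    def color_label_threshold(diff_set, first_fraction=1.0 / 3.0):
        # S141: sort the color label differences from small to large and take
        # the leading, upward-rounded portion as the first subset.
        diffs = sorted(diff_set)
        cut = math.ceil(len(diffs) * first_fraction)   # stand-in for the patented count
        first, rest = diffs[:cut], diffs[cut:]
        # S142: randomly split the remaining differences into the second and
        # third subsets.
        random.shuffle(rest)
        half = len(rest) // 2
        second, third = rest[:half], rest[half:]
        # S143 (placeholder): combine the extreme value of the first subset with
        # the averages v_ave and w_ave of the second and third subsets.
        v_ave = sum(second) / len(second) if second else 0.0
        w_ave = sum(third) / len(third) if third else 0.0
        return 0.5 * (max(first) + 0.5 * (v_ave + w_ave))

    def invalid_flags(diffs, sigma):
        # S144: differences above the threshold mark invalid pixel points.
        return [d > sigma for d in diffs]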
In the embodiment of the present invention, as shown in fig. 2, in S2, the gesture recognition model includes an input layer, a first feature convolution layer, a second feature convolution layer, an operator, a fully connected layer, and an output layer;
the input end of the input layer serves as the input end of the gesture recognition model; the first output end of the input layer is connected with the input end of the first feature convolution layer, and the second output end of the input layer is connected with the first input end of the second feature convolution layer; the first output end of the first feature convolution layer is connected with the first input end of the operator, and the second output end of the first feature convolution layer is connected with the second input end of the second feature convolution layer; the output end of the second feature convolution layer is connected with the second input end of the operator; the output end of the operator is connected with the input end of the fully connected layer; the output end of the fully connected layer is connected with the input end of the output layer; and the output end of the output layer serves as the output end of the gesture recognition model.
In the invention, the input layer feeds the gesture local image into the gesture recognition model. The first feature convolution layer extracts feature information of the gesture local image from the pixel values of its pixel points; the second feature convolution layer fuses the feature information extracted by the first feature convolution layer with the pixel values of the pixel points in the gesture local image, which increases the richness of the features. The fully connected layer fuses the feature information extracted by the first feature convolution layer and by the second feature convolution layer once more through an addition operation, raising the feature dimension, and the recognition result is finally output through the output layer.
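As an aid to reading fig. 2, the following PyTorch sketch wires up the same connection scheme: one branch convolves the pixel values directly, a second branch fuses those features with the pixel values, an addition combines the two branches, and a fully connected stage produces the output. Kernel sizes, channel counts, pooling and the number of gesture classes are not specified above and are therefore arbitrary assumptions; the standard Conv2d and Linear modules only approximate the patent's own layer expressions.

    import torch
    import torch.nn as nn

    class GestureRecognitionModel(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            # First feature convolution layer: operates on the pixel values Z.
            self.first_conv = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU())
            # Second feature convolution layer: fuses the input image with the
            # features of the first layer (3 + 16 input channels).
            self.second_conv = nn.Sequential(
                nn.Conv2d(3 + 16, 16, kernel_size=3, padding=1), nn.ReLU())
            self.pool = nn.AdaptiveAvgPool2d(1)
            # Fully connected layer followed by the output layer.
            self.fc = nn.Linear(16, 64)
            self.out = nn.Linear(64, num_classes)

        def forward(self, z):                  # z: (batch, 3, H, W) gesture local image
            g = self.first_conv(z)             # output G of the first branch
            h = self.second_conv(torch.cat([z, g], dim=1))   # output H, fused with pixel values
            fused = self.pool(g + h).flatten(1)              # operator: addition of G and H
            return self.out(torch.relu(self.fc(fused)))      # fully connected + output layer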
In the embodiment of the present invention, the expression of the first feature convolution layer is defined in terms of the following quantities: G represents the output of the first feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, ..., z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows of the gesture local image, J represents the number of pixel columns of the gesture local image, w_p represents the weight of the p-th convolution kernel in the first feature convolution layer, o_p represents the offset of the p-th convolution kernel in the first feature convolution layer, α_p represents the step size of the p-th convolution kernel in the first feature convolution layer, b_p represents the number of channels of the p-th convolution kernel in the first feature convolution layer, and P represents the number of convolution kernels of the first feature convolution layer.
In the embodiment of the present invention, the expression of the second feature convolution layer is defined in terms of the following quantities: H represents the output of the second feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, ..., z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows of the gesture local image, J represents the number of pixel columns of the gesture local image, W_q represents the weight of the q-th convolution kernel in the second feature convolution layer, O_q represents the offset of the q-th convolution kernel in the second feature convolution layer, β_q represents the step size of the q-th convolution kernel in the second feature convolution layer, B_q represents the number of channels of the q-th convolution kernel in the second feature convolution layer, P represents the number of convolution kernels of the first feature convolution layer, and Q represents the number of convolution kernels of the second feature convolution layer.
In the embodiment of the invention, the expression of the fully connected layer is defined in terms of the following quantities: T represents the output of the fully connected layer, a bias is associated with the k-th neuron in the fully connected layer, K represents the number of neurons of the fully connected layer, P represents the number of convolution kernels of the first feature convolution layer, Q represents the number of convolution kernels of the second feature convolution layer, G represents the output of the first feature convolution layer, and H represents the output of the second feature convolution layer.
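Putting the two sketches together, a hypothetical end-to-end call corresponding to S1 and S3 could look like the snippet below; crop_gesture and GestureRecognitionModel are the illustrative helpers defined earlier, and the image size, threshold value and number of classes are arbitrary choices for the example.

    import numpy as np
    import torch

    global_img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in for a captured frame
    local_img = crop_gesture(global_img, threshold=0.5)                    # S1: gesture local image
    x = torch.from_numpy(local_img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    model = GestureRecognitionModel(num_classes=10)
    gesture_id = model(x).argmax(dim=1).item()                             # S3: predicted gesture action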
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the present invention, and it should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the scope of the present disclosure.

Claims (5)

1. An action recognition method based on an AR teaching aid is characterized by comprising the following steps:
s1, acquiring a gesture global image of a user on an AR teaching aid, cutting the gesture global image, and generating a gesture local image;
s2, constructing a gesture recognition model;
s3, inputting the gesture local image into a gesture recognition model, and determining gesture actions of a user;
the step S1 comprises the following substeps:
s11, calculating color label values of all pixel points in the gesture global image;
s12, taking the pixel point with the largest color label value as a standard pixel point;
s13, calculating the difference between the color label values of the rest pixel points in the gesture global image and the color label values of the standard pixel points to obtain a color label difference set;
s14, determining invalid pixel points in the gesture global image according to the color label difference value set;
s15, removing invalid pixel points from the gesture global image to generate a gesture local image;
in S11, the color label value C_{x,y} of the pixel point with abscissa x and ordinate y in the gesture global image is computed from the pixel's RGB channel values and those of the image-center pixel, wherein x_0 represents the abscissa of the pixel point at the center of the gesture global image, y_0 represents its ordinate, R_{x,y}, G_{x,y} and B_{x,y} represent the red, green and blue channel values of the pixel point with abscissa x and ordinate y, R_{x_0,y_0}, G_{x_0,y_0} and B_{x_0,y_0} represent the red, green and blue channel values of the pixel point with abscissa x_0 and ordinate y_0, and log(·) represents a logarithmic function;
the step S14 includes the sub-steps of:
s141, sorting all color label differences of the color label difference set from small to large, and taking the leading portion of the sorted differences, whose count is obtained from L with an upward rounding (ceiling) function, as a first color label difference subset; wherein L represents the number of color label differences in the color label difference set;
s142, randomly dividing the rest color label difference values except the first color label difference value subset in the color label difference value set into a second color label difference value subset and a third color label difference value subset;
s143, determining a color label threshold according to the first color label difference value subset, the second color label difference value subset and the third color label difference value subset;
s144, removing pixel points corresponding to the color label difference values larger than the color label threshold value from the gesture global image to generate a gesture local image;
in S143, the color label threshold σ is computed from the three subsets, wherein u_m represents the m-th color label difference in the first color label difference subset, v_n represents the n-th color label difference in the second color label difference subset, w_k represents the k-th color label difference in the third color label difference subset, max(·) represents the maximum function, min(·) represents the minimum function, v_ave represents the average of all color label differences in the second color label difference subset, w_ave represents the average of all color label differences in the third color label difference subset, and e represents the base of the exponential function.
2. The AR teaching aid-based motion recognition method according to claim 1, wherein in S2, the gesture recognition model includes an input layer, a first feature convolution layer, a second feature convolution layer, an operator, a fully connected layer, and an output layer;
the input end of the input layer serves as the input end of the gesture recognition model; the first output end of the input layer is connected with the input end of the first feature convolution layer, and the second output end of the input layer is connected with the first input end of the second feature convolution layer; the first output end of the first feature convolution layer is connected with the first input end of the operator, and the second output end of the first feature convolution layer is connected with the second input end of the second feature convolution layer; the output end of the second feature convolution layer is connected with the second input end of the operator; the output end of the operator is connected with the input end of the fully connected layer; the output end of the fully connected layer is connected with the input end of the output layer; and the output end of the output layer serves as the output end of the gesture recognition model.
3. The AR teaching aid based motion recognition method according to claim 2, wherein the expression of the first feature convolution layer is defined in terms of the following quantities: G represents the output of the first feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, ..., z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows of the gesture local image, J represents the number of pixel columns of the gesture local image, w_p represents the weight of the p-th convolution kernel in the first feature convolution layer, o_p represents the offset of the p-th convolution kernel in the first feature convolution layer, α_p represents the step size of the p-th convolution kernel in the first feature convolution layer, b_p represents the number of channels of the p-th convolution kernel in the first feature convolution layer, and P represents the number of convolution kernels of the first feature convolution layer.
4. The AR teaching aid based motion recognition method according to claim 2, wherein the expression of the second feature convolution layer is defined in terms of the following quantities: H represents the output of the second feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, ..., z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows of the gesture local image, J represents the number of pixel columns of the gesture local image, W_q represents the weight of the q-th convolution kernel in the second feature convolution layer, O_q represents the offset of the q-th convolution kernel in the second feature convolution layer, β_q represents the step size of the q-th convolution kernel in the second feature convolution layer, B_q represents the number of channels of the q-th convolution kernel in the second feature convolution layer, P represents the number of convolution kernels of the first feature convolution layer, and Q represents the number of convolution kernels of the second feature convolution layer.
5. The action recognition method based on the AR teaching aid according to claim 2, wherein the expression of the fully connected layer is defined in terms of the following quantities: T represents the output of the fully connected layer, a bias is associated with the k-th neuron in the fully connected layer, K represents the number of neurons of the fully connected layer, P represents the number of convolution kernels of the first feature convolution layer, Q represents the number of convolution kernels of the second feature convolution layer, G represents the output of the first feature convolution layer, and H represents the output of the second feature convolution layer.
CN202311685269.3A 2023-12-11 2023-12-11 Action recognition method based on AR teaching aid Active CN117392759B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311685269.3A (CN117392759B) | 2023-12-11 | 2023-12-11 | Action recognition method based on AR teaching aid

Publications (2)

Publication Number | Publication Date
CN117392759A (en) | 2024-01-12
CN117392759B (en) | 2024-03-12

Family

ID=89463428

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202311685269.3A (Active, CN117392759B) | Action recognition method based on AR teaching aid | 2023-12-11 | 2023-12-11

Country Status (1)

Country Link
CN (1) CN117392759B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345867A (en) * 2018-03-09 2018-07-31 南京邮电大学 Gesture identification method towards Intelligent household scene
CN108983980A (en) * 2018-07-27 2018-12-11 河南科技大学 A kind of mobile robot basic exercise gestural control method
CN112749664A (en) * 2021-01-15 2021-05-04 广东工贸职业技术学院 Gesture recognition method, device, equipment, system and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HSV Brightness Factor Matching for Gesture Recognition System; Mokhtar M. Hasan et al.; International Journal of Image Processing; 2010-12-31; Vol. 4, No. 5; pp. 456-467 *
A dynamic gesture recognition method based on image feature fusion; 陈茜 et al.; 《计算机与数字工程》; 2023-06-20; Vol. 51, No. 6; pp. 1381-1386 *
Gesture segmentation based on the YCbCr color space; 杨红玲, 宣士斌, 莫愿斌, 赵洪; 广西民族大学学报(自然科学版); 2017-08-15; No. 03; pp. 66-71 *
Design of a natural hand interaction rehabilitation system based on augmented reality; 卞方舟; 《中国优秀硕士学位论文全文数据库 (医药卫生科技辑)》; 2023-01-15; No. 1; E060-1237 *
Research on a gesture recognition system based on computer vision; 周航; 《中国优秀硕士学位论文全文数据库 (信息科技辑)》; 2008-05-15; No. 5; I138-13 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant