CN117392759B - Action recognition method based on AR teaching aid - Google Patents
Action recognition method based on AR teaching aid
- Publication number
- CN117392759B (application CN202311685269.3A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- color label
- layer
- representing
- convolution layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/273—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion removing elements interfering with the pattern to be recognised
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses an action recognition method based on an AR teaching aid, belonging to the technical field of action recognition and comprising the following steps: S1, acquiring a gesture global image of a user on the AR teaching aid, cutting the gesture global image, and generating a gesture local image; S2, constructing a gesture recognition model; S3, inputting the gesture local image into the gesture recognition model, and determining the gesture action of the user. The method precisely cuts the global image so that the generated local image contains only the hand, allowing the gesture action to be extracted quickly and accurately in the subsequent steps; meanwhile, the constructed gesture recognition model performs feature extraction and feature fusion on the local image, so that the user's gesture can be accurately recognized, the user can conveniently control the AR teaching aid, the user experience is improved, and the interaction time of the AR teaching aid is reduced.
Description
Technical Field
The invention belongs to the technical field of action recognition, and particularly relates to an action recognition method based on an AR teaching aid.
Background
With the progress of technology, AR technology is developing rapidly and its applications are increasingly accepted. The AR sand table, a new type of AR educational tool, is being widely used. It combines three-dimensional virtual content with the real environment, so that students experience and learn various contents through multiple senses such as vision, hearing, and touch. In education, the AR sand table fuses a physical model with acoustic, optical, electrical, imaging, three-dimensional animation, and computer programming techniques. Students and teachers can operate the AR sand table through gesture actions: the gesture graphics or image information is converted into data, input into the head-mounted device equipped with the AR sand table, and matched with the three-dimensional spatial information of the sand table, thereby controlling the AR sand table. However, the existing AR sand table is not accurate enough and responds slowly when recognizing user actions, so the invention provides an action recognition method based on an AR teaching aid.
Disclosure of Invention
To solve the above problems, the invention provides an action recognition method based on an AR teaching aid.
The technical scheme of the invention is as follows: an action recognition method based on an AR teaching aid comprises the following steps:
s1, acquiring a gesture global image of a user on an AR teaching aid, cutting the gesture global image, and generating a gesture local image;
s2, constructing a gesture recognition model;
s3, inputting the gesture local image into a gesture recognition model, and determining gesture actions of a user.
Further, S1 comprises the following sub-steps:
s11, calculating color label values of all pixel points in the gesture global image;
s12, taking the pixel point with the largest color label value as a standard pixel point;
s13, calculating the difference between the color label values of the rest pixel points in the gesture global image and the color label values of the standard pixel points to obtain a color label difference set;
s14, determining invalid pixel points in the gesture global image according to the color label difference value set;
s15, removing the invalid pixel points from the gesture global image to generate a gesture local image.
The beneficial effects of the above further scheme are: when a student or teacher uses the digital sand table, operations are completed by waving gestures above it, so the invention collects an image at the moment of the gesture. This image may, however, contain redundant background that interferes with gesture recognition, so the invention first cuts the gesture global image to determine a local image containing only the hand. The three-channel color values within the hand region are generally similar, so pixel points belonging to background noise are screened out by calculating the color label values of the pixel points and removing those points from the gesture global image. The resulting gesture local image contains only the hand, which facilitates rapid gesture recognition in the subsequent steps and improves recognition efficiency and accuracy.
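As a rough illustration of S11–S15, the sketch below screens background pixels by a per-pixel color label. The patent's actual label formula and threshold rule are given by formula images not reproduced in the source, so the `color_label` function and the quantile cutoff here are illustrative stand-ins, not the patent's verified computation:

```python
import numpy as np

def color_label(img, x0, y0):
    # Illustrative stand-in for S11: compare each pixel's RGB channels
    # against the image-centre pixel (the variables the patent lists),
    # compressing the per-channel offsets with a logarithm.
    center = img[y0, x0].astype(np.float64)           # (R0, G0, B0)
    diff = img.astype(np.float64) - center            # per-channel offset
    return np.log1p(np.abs(diff)).sum(axis=2)         # one label per pixel

def crop_gesture(img, keep_ratio=0.5):
    # S11-S15 sketch: keep_ratio (assumed) plays the role of the
    # color label threshold derived in S14.
    h, w, _ = img.shape
    labels = color_label(img, w // 2, h // 2)         # S11
    standard = labels.max()                           # S12: standard pixel
    diffs = standard - labels                         # S13: difference set
    threshold = np.quantile(diffs, keep_ratio)        # S14: illustrative cutoff
    mask = diffs <= threshold                         # pixels treated as hand
    local = img * mask[..., None]                     # S15: remove invalid pixels
    return local, mask
```

The mask zeroes out the "invalid" background pixels while leaving the image geometry intact, which matches the patent's notion of a local image generated from the global one.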
Further, in S11, the color label value C_{x,y} of the pixel point with abscissa x and ordinate y in the gesture global image is calculated as follows (formula not reproduced in the source text); wherein x_0 represents the abscissa and y_0 represents the ordinate of the pixel point at the center of the gesture global image; R_{x,y}, G_{x,y}, and B_{x,y} represent the red, green, and blue channel values of the pixel point with abscissa x and ordinate y; log(·) represents a logarithmic function; and R_{x_0,y_0}, G_{x_0,y_0}, and B_{x_0,y_0} represent the red, green, and blue channel values of the pixel point with abscissa x_0 and ordinate y_0.
Further, S14 includes the sub-steps of:
S141, sorting all color label difference values of the color label difference value set from small to large, and taking the first several color label difference values, up to a cutoff given by a formula not reproduced in the source text, as a first color label difference value subset; wherein L represents the number of color label difference values in the color label difference value set, and ⌈·⌉ represents an upward rounding function;
s142, randomly dividing the rest color label difference values except the first color label difference value subset in the color label difference value set into a second color label difference value subset and a third color label difference value subset;
s143, determining a color label threshold according to the first color label difference value subset, the second color label difference value subset and the third color label difference value subset;
s144, eliminating pixel points corresponding to the color label difference values larger than the color label threshold value from the gesture global image, and generating a gesture local image.
Further, in S143, the color label threshold σ is calculated as follows (formula not reproduced in the source text); wherein u_m represents the m-th color label difference value in the first color label difference value subset, v_n represents the n-th color label difference value in the second color label difference value subset, and w_k represents the k-th color label difference value in the third color label difference value subset; max(·) represents the maximum function and min(·) represents the minimum function; v_ave and w_ave represent the averages of all color label difference values in the second and third color label difference value subsets, respectively; and e represents the exponential base.
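The subset construction in S141–S143 can be sketched as follows. The cutoff position and the final combination of the max/min functions and subset averages are assumptions (the patent's formula images are not reproduced in the source), so this shows only the shape of the procedure:

```python
import numpy as np

def color_label_threshold(diffs, rng=None):
    # S141-S143 sketch; cutoff (ceil(L/3)) and the final blend are assumed.
    rng = np.random.default_rng() if rng is None else rng
    d = np.sort(np.asarray(diffs, dtype=np.float64))   # S141: ascending sort
    cut = int(np.ceil(len(d) / 3))                     # assumed cutoff position
    first, rest = d[:cut], d[cut:]
    rest = rng.permutation(rest)                       # S142: random split
    second = rest[: len(rest) // 2]
    third = rest[len(rest) // 2 :]
    # S143 (illustrative): combine the first subset's upper bound with the
    # averages of the second and third subsets via max/min.
    sigma = max(first.max(), min(second.mean(), third.mean()))
    return sigma
```

Pixels whose color label difference exceeds the returned σ would then be rejected in S144.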
Further, in S2, the gesture recognition model includes an input layer, a first feature convolution layer, a second feature convolution layer, an operator, a full connection layer, and an output layer;
the input end of the input layer is used as the input end of the gesture recognition model, the first output end of the input layer is connected with the input end of the first characteristic convolution layer, and the second output layer of the input layer is connected with the first input end of the second characteristic convolution layer; the first output end of the first characteristic convolution layer is connected with the first input end of the arithmetic unit, and the second output end of the first characteristic convolution layer is connected with the second input end of the second characteristic convolution layer; the output end of the second characteristic convolution layer is connected with the second input end of the arithmetic unit; the output end of the arithmetic unit is connected with the input end of the full-connection layer; the output end of the full-connection layer is connected with the input end of the output layer; the output end of the output layer is used as the output end of the gesture recognition model.
The beneficial effects of the above further scheme are: in the invention, the input layer inputs the gesture local image into the gesture recognition model. The first feature convolution layer extracts feature information of the gesture local image according to the pixel values of the pixel points in the gesture local image; the second feature convolution layer fuses the feature information extracted by the first feature convolution layer with the pixel values of the pixel points in the gesture local image, thereby increasing feature richness. The full-connection layer fuses the feature information extracted by the first and second feature convolution layers again through an addition operation, raising the feature dimension, and finally the recognition result is output through the output layer.
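The layer wiring just described — the input feeding both branches, the first branch's features also feeding the second branch, an element-wise adder, then a fully connected head — can be sketched in plain NumPy. Everything concrete below (3×3 kernels, ReLU as the activation σ, mean-feature fusion into the second branch, a softmax output) is an illustrative assumption, not the patent's specified configuration:

```python
import numpy as np

def conv2d_same(x, k):
    # 3x3 correlation with zero padding so the output matches the input size.
    xp = np.pad(x, 1)
    return np.array([[np.sum(xp[i:i+3, j:j+3] * k)
                      for j in range(x.shape[1])]
                     for i in range(x.shape[0])])

def gesture_model(z, k1s, k2s, w_fc, b_fc):
    # Dataflow sketch of the two-branch model described above.
    relu = lambda a: np.maximum(a, 0.0)
    g = np.stack([relu(conv2d_same(z, k)) for k in k1s])       # branch 1
    fused_in = z + g.mean(axis=0)     # branch 2 sees pixels + branch-1 features
    h = np.stack([relu(conv2d_same(fused_in, k)) for k in k2s])
    f = g.sum(axis=0) + h.sum(axis=0)  # operator: element-wise addition
    t = w_fc @ f.ravel() + b_fc        # fully connected layer
    exp = np.exp(t - t.max())
    return exp / exp.sum()             # output layer: class probabilities
```

The point of the sketch is the topology: the second branch consumes both the raw pixels and the first branch's features, and the operator recombines the two branches before classification, matching the connectivity listed in the paragraph above.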
Further, the expression of the first feature convolution layer is as follows (formula not reproduced in the source text); where G represents the output of the first feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, …, z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows and J represents the number of pixel columns of the gesture local image, w_p represents the weight, o_p the bias, α_p the stride, and b_p the number of channels of the p-th convolution kernel in the first feature convolution layer, and P represents the number of convolution kernels of the first feature convolution layer.
Further, the expression of the second feature convolution layer is as follows (formula not reproduced in the source text); where H represents the output of the second feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, …, z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows and J represents the number of pixel columns of the gesture local image, W_q represents the weight, O_q the bias, β_q the stride, and B_q the number of channels of the q-th convolution kernel in the second feature convolution layer, P represents the number of convolution kernels of the first feature convolution layer, and Q represents the number of convolution kernels of the second feature convolution layer.
Further, the expression of the fully connected layer is as follows (formula not reproduced in the source text); where T represents the output of the fully connected layer, b_k represents the bias of the k-th neuron in the fully connected layer, K represents the number of neurons of the fully connected layer, P represents the number of convolution kernels of the first feature convolution layer, Q represents the number of convolution kernels of the second feature convolution layer, G represents the output of the first feature convolution layer, and H represents the output of the second feature convolution layer.
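Because the fully connected layer's formula image is not reproduced in the source, the "addition operation" fusion it describes can only be sketched. Under the symbols listed above, one hedged reconstruction (an assumption, not the patent's verified formula) is:

```latex
% Hedged reconstruction: each neuron k sums per-kernel components of the
% two branch outputs, adds its bias b_k, and the K activations are pooled.
T \;=\; \sum_{k=1}^{K} \sigma\!\left( \sum_{p=1}^{P} G_p \;+\; \sum_{q=1}^{Q} H_q \;+\; b_k \right)
```

Here G_p and H_q denote per-kernel components of the branch outputs G and H; the placement of the activation σ is likewise assumed.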
The beneficial effects of the invention are as follows: the invention discloses an action recognition method based on an AR teaching aid. By recognizing the global image of a gesture waved by the user over the AR teaching aid (namely, an AR sand table), and in consideration of the background noise in the global image, the global image is accurately cut so that the generated local image contains only the hand, allowing the gesture action to be extracted quickly and accurately in the subsequent steps; meanwhile, the invention also constructs a gesture recognition model that performs feature extraction and feature fusion on the local image, so that the user's gesture can be accurately recognized, the user can conveniently control the AR teaching aid, the user experience is improved, and the interaction time of the AR teaching aid is reduced.
Drawings
FIG. 1 is a flow chart of the action recognition method based on an AR teaching aid;
fig. 2 is a schematic diagram of a gesture recognition model.
Detailed Description
Embodiments of the present invention are further described below with reference to the accompanying drawings.
As shown in fig. 1, the invention provides an action recognition method based on an AR teaching aid, which comprises the following steps:
s1, acquiring a gesture global image of a user on an AR teaching aid, cutting the gesture global image, and generating a gesture local image;
s2, constructing a gesture recognition model;
s3, inputting the gesture local image into a gesture recognition model, and determining gesture actions of a user.
In an embodiment of the present invention, S1 comprises the following sub-steps:
s11, calculating color label values of all pixel points in the gesture global image;
s12, taking the pixel point with the largest color label value as a standard pixel point;
s13, calculating the difference between the color label values of the rest pixel points in the gesture global image and the color label values of the standard pixel points to obtain a color label difference set;
s14, determining invalid pixel points in the gesture global image according to the color label difference value set;
s15, removing the invalid pixel points from the gesture global image to generate a gesture local image.
In the invention, when a student or teacher uses the digital sand table, operations are completed by waving gestures above it, so the invention collects an image at the moment of the gesture. This image may, however, contain redundant background that interferes with gesture recognition, so the invention first cuts the gesture global image to determine a local image containing only the hand. The three-channel color values within the hand region are generally similar, so pixel points belonging to background noise are screened out by calculating the color label values of the pixel points and removing those points from the gesture global image. The resulting gesture local image contains only the hand, which facilitates rapid gesture recognition in the subsequent steps and improves recognition efficiency and accuracy.
In the embodiment of the present invention, in S11, the color label value C_{x,y} of the pixel point with abscissa x and ordinate y in the gesture global image is calculated as follows (formula not reproduced in the source text); wherein x_0 represents the abscissa and y_0 represents the ordinate of the pixel point at the center of the gesture global image; R_{x,y}, G_{x,y}, and B_{x,y} represent the red, green, and blue channel values of the pixel point with abscissa x and ordinate y; log(·) represents a logarithmic function; and R_{x_0,y_0}, G_{x_0,y_0}, and B_{x_0,y_0} represent the red, green, and blue channel values of the pixel point with abscissa x_0 and ordinate y_0.
In an embodiment of the present invention, S14 includes the sub-steps of:
S141, sorting all color label difference values of the color label difference value set from small to large, and taking the first several color label difference values, up to a cutoff given by a formula not reproduced in the source text, as a first color label difference value subset; wherein L represents the number of color label difference values in the color label difference value set, and ⌈·⌉ represents an upward rounding function;
s142, randomly dividing the rest color label difference values except the first color label difference value subset in the color label difference value set into a second color label difference value subset and a third color label difference value subset;
s143, determining a color label threshold according to the first color label difference value subset, the second color label difference value subset and the third color label difference value subset;
s144, eliminating pixel points corresponding to the color label difference values larger than the color label threshold value from the gesture global image, and generating a gesture local image.
In the embodiment of the present invention, in S143, the color label threshold σ is calculated as follows (formula not reproduced in the source text); wherein u_m represents the m-th color label difference value in the first color label difference value subset, v_n represents the n-th color label difference value in the second color label difference value subset, and w_k represents the k-th color label difference value in the third color label difference value subset; max(·) represents the maximum function and min(·) represents the minimum function; v_ave and w_ave represent the averages of all color label difference values in the second and third color label difference value subsets, respectively; and e represents the exponential base.
In the embodiment of the present invention, as shown in fig. 2, in S2, the gesture recognition model includes an input layer, a first feature convolution layer, a second feature convolution layer, an operator, a full connection layer, and an output layer;
the input end of the input layer is used as the input end of the gesture recognition model, the first output end of the input layer is connected with the input end of the first characteristic convolution layer, and the second output layer of the input layer is connected with the first input end of the second characteristic convolution layer; the first output end of the first characteristic convolution layer is connected with the first input end of the arithmetic unit, and the second output end of the first characteristic convolution layer is connected with the second input end of the second characteristic convolution layer; the output end of the second characteristic convolution layer is connected with the second input end of the arithmetic unit; the output end of the arithmetic unit is connected with the input end of the full-connection layer; the output end of the full-connection layer is connected with the input end of the output layer; the output end of the output layer is used as the output end of the gesture recognition model.
In the invention, the input layer inputs the gesture local image into the gesture recognition model. The first feature convolution layer extracts feature information of the gesture local image according to the pixel values of the pixel points in the gesture local image; the second feature convolution layer fuses the feature information extracted by the first feature convolution layer with the pixel values of the pixel points in the gesture local image, thereby increasing feature richness. The full-connection layer fuses the feature information extracted by the first and second feature convolution layers again through an addition operation, raising the feature dimension, and finally the recognition result is output through the output layer.
In the embodiment of the present invention, the expression of the first feature convolution layer is as follows (formula not reproduced in the source text); where G represents the output of the first feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, …, z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows and J represents the number of pixel columns of the gesture local image, w_p represents the weight, o_p the bias, α_p the stride, and b_p the number of channels of the p-th convolution kernel in the first feature convolution layer, and P represents the number of convolution kernels of the first feature convolution layer.
In the embodiment of the present invention, the expression of the second feature convolution layer is as follows (formula not reproduced in the source text); where H represents the output of the second feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, …, z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of pixel rows and J represents the number of pixel columns of the gesture local image, W_q represents the weight, O_q the bias, β_q the stride, and B_q the number of channels of the q-th convolution kernel in the second feature convolution layer, P represents the number of convolution kernels of the first feature convolution layer, and Q represents the number of convolution kernels of the second feature convolution layer.
In the embodiment of the invention, the expression of the full-connection layer is as follows (formula not reproduced in the source text); where T represents the output of the fully connected layer, b_k represents the bias of the k-th neuron in the fully connected layer, K represents the number of neurons of the fully connected layer, P represents the number of convolution kernels of the first feature convolution layer, Q represents the number of convolution kernels of the second feature convolution layer, G represents the output of the first feature convolution layer, and H represents the output of the second feature convolution layer.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the present invention, and it should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the scope of the present disclosure.
Claims (5)
1. An action recognition method based on an AR teaching aid is characterized by comprising the following steps:
s1, acquiring a gesture global image of a user on an AR teaching aid, cutting the gesture global image, and generating a gesture local image;
s2, constructing a gesture recognition model;
s3, inputting the gesture local image into a gesture recognition model, and determining gesture actions of a user;
the step S1 comprises the following substeps:
s11, calculating color label values of all pixel points in the gesture global image;
s12, taking the pixel point with the largest color label value as a standard pixel point;
s13, calculating the difference between the color label values of the rest pixel points in the gesture global image and the color label values of the standard pixel points to obtain a color label difference set;
s14, determining invalid pixel points in the gesture global image according to the color label difference value set;
s15, removing invalid pixel points from the gesture global image to generate a gesture local image;
in S11, the color label value C_{x,y} of the pixel point with abscissa x and ordinate y in the gesture global image is calculated as follows; where x_0 represents the abscissa of the pixel point at the center of the gesture global image, y_0 represents the ordinate of the pixel point at the center of the gesture global image, R_{x,y}, G_{x,y} and B_{x,y} represent the red, green and blue channel values of the pixel point with abscissa x and ordinate y, log(·) represents a logarithmic function, and R_{x_0,y_0}, G_{x_0,y_0} and B_{x_0,y_0} represent the red, green and blue channel values of the pixel point with abscissa x_0 and ordinate y_0;
the step S14 includes the sub-steps of:
s141, sorting all color label difference values of the color label difference value set from small to large, and sorting before sortingThe color label differences are used as a first color label difference subset; wherein L represents the number of color label differences of the color label difference set, < >>Representing an upward rounding function;
s142, randomly dividing the rest color label difference values except the first color label difference value subset in the color label difference value set into a second color label difference value subset and a third color label difference value subset;
s143, determining a color label threshold according to the first color label difference value subset, the second color label difference value subset and the third color label difference value subset;
s144, removing pixel points corresponding to the color label difference values larger than the color label threshold value from the gesture global image to generate a gesture local image;
in S143, the calculation formula of the color label threshold σ is:the method comprises the steps of carrying out a first treatment on the surface of the Wherein u is m Representing the mth color label difference, v, in the first subset of color label differences n Representing the nth color label difference, w, in the second subset of color label differences k Represents the kth color label difference in the third subset of color label differences, max (·) represents the maximum function, min (·) represents the minimum function, v ave Representing the average value, w, of all color label differences in the second subset of color label differences ave Represents the average of all color label differences in the third subset of color label differences, e represents the index.
2. The AR teaching aid-based action recognition method according to claim 1, wherein in S2 the gesture recognition model comprises an input layer, a first feature convolution layer, a second feature convolution layer, an operator, a fully connected layer, and an output layer;
the input end of the input layer is used as the input end of the gesture recognition model, the first output end of the input layer is connected with the input end of the first characteristic convolution layer, and the second output layer of the input layer is connected with the first input end of the second characteristic convolution layer; the first output end of the first characteristic convolution layer is connected with the first input end of the arithmetic unit, and the second output end of the first characteristic convolution layer is connected with the second input end of the second characteristic convolution layer; the output end of the second characteristic convolution layer is connected with the second input end of the arithmetic unit; the output end of the arithmetic unit is connected with the input end of the full-connection layer; the output end of the full-connection layer is connected with the input end of the output layer; the output end of the output layer is used as the output end of the gesture recognition model.
3. The AR teaching aid-based action recognition method according to claim 2, wherein the expression of the first feature convolution layer is as follows; where G represents the output of the first feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, …, z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of rows of pixel points in the gesture local image, J represents the number of columns of pixel points in the gesture local image, w_p represents the weight of the p-th convolution kernel in the first feature convolution layer, o_p represents the bias of the p-th convolution kernel in the first feature convolution layer, α_p represents the step size of the p-th convolution kernel in the first feature convolution layer, b_p represents the number of channels of the p-th convolution kernel in the first feature convolution layer, and P represents the number of convolution kernels of the first feature convolution layer.
4. The AR teaching aid-based action recognition method according to claim 2, wherein the expression of the second feature convolution layer is as follows; where H represents the output of the second feature convolution layer, σ(·) represents the activation function, Z represents the matrix of pixel values, z_{1,1}, …, z_{I,J} represent the pixel values of the pixel points in the gesture local image, I represents the number of rows of pixel points in the gesture local image, J represents the number of columns of pixel points in the gesture local image, W_q represents the weight of the q-th convolution kernel in the second feature convolution layer, O_q represents the bias of the q-th convolution kernel in the second feature convolution layer, β_q represents the step size of the q-th convolution kernel in the second feature convolution layer, B_q represents the number of channels of the q-th convolution kernel in the second feature convolution layer, P represents the number of convolution kernels of the first feature convolution layer, and Q represents the number of convolution kernels of the second feature convolution layer.
5. The AR teaching aid-based action recognition method according to claim 2, wherein the expression of the fully connected layer is as follows; where T represents the output of the fully connected layer, the k-th neuron of the fully connected layer has its own bias term, K represents the number of neurons in the fully connected layer, P represents the number of convolution kernels of the first feature convolution layer, Q represents the number of convolution kernels of the second feature convolution layer, G represents the output of the first feature convolution layer, and H represents the output of the second feature convolution layer.
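Claim 5's fully connected expression is published only as an image. Under the assumed reading that each of the K neurons weights all P + Q feature responses from G and H and adds its own bias (the weight matrix W and bias vector b below are illustrative, not from the source), a minimal numeric sketch is:

```python
import numpy as np

rng = np.random.default_rng(2)
P, Q, K = 4, 4, 3   # kernel counts of the two conv layers, neuron count (toy)
G = rng.standard_normal(P)   # outputs of the first feature convolution layer
H = rng.standard_normal(Q)   # outputs of the second feature convolution layer
W = rng.standard_normal((K, P + Q))  # assumed per-neuron weights
b = rng.standard_normal(K)           # bias of the k-th neuron

# Each neuron combines all P + Q feature responses and adds its own bias.
T = W @ np.concatenate([G, H]) + b
print(T.shape)
```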
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311685269.3A CN117392759B (en) | 2023-12-11 | 2023-12-11 | Action recognition method based on AR teaching aid |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117392759A CN117392759A (en) | 2024-01-12 |
CN117392759B true CN117392759B (en) | 2024-03-12 |
Family
ID=89463428
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311685269.3A Active CN117392759B (en) | 2023-12-11 | 2023-12-11 | Action recognition method based on AR teaching aid |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345867A (en) * | 2018-03-09 | 2018-07-31 | 南京邮电大学 | Gesture identification method towards Intelligent household scene |
CN108983980A (en) * | 2018-07-27 | 2018-12-11 | 河南科技大学 | A kind of mobile robot basic exercise gestural control method |
CN112749664A (en) * | 2021-01-15 | 2021-05-04 | 广东工贸职业技术学院 | Gesture recognition method, device, equipment, system and storage medium |
Non-Patent Citations (5)
Title |
---|
HSV Brightness Factor Matching for Gesture Recognition System; Mokhtar M. Hasan et al.; International Journal of Image Processing; 2010-12-31; Vol. 4, No. 5; 456-467 *
A dynamic gesture recognition method based on image feature fusion; Chen Qian et al.; Computer & Digital Engineering; 2023-06-20; Vol. 51, No. 6; 1381-1386 *
Gesture segmentation based on the YCbCr color space; Yang Hongling, Xuan Shibin, Mo Yuanbin, Zhao Hong; Journal of Guangxi University for Nationalities (Natural Science Edition); 2017-08-15; No. 3; 66-71 *
Design of a natural hand-interaction rehabilitation system based on augmented reality; Bian Fangzhou; China Master's Theses Full-text Database (Medicine & Health Sciences); 2023-01-15; No. 1; E060-1237 *
Research on a gesture recognition system based on computer vision; Zhou Hang; China Master's Theses Full-text Database (Information Science & Technology); 2008-05-15; No. 5; I138-13 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |