CN115016641A - Conference control method, device, conference system and medium based on gesture recognition - Google Patents

Conference control method, device, conference system and medium based on gesture recognition

Info

Publication number
CN115016641A
CN115016641A
Authority
CN
China
Prior art keywords
gesture recognition
gesture
recognition model
conference control
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210617660.9A
Other languages
Chinese (zh)
Inventor
黄鑫
陈龙
蒋海洋
马澎家
杨望宇
蔡俊
张子恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Institute of Technology filed Critical Wuhan Institute of Technology
Priority to CN202210617660.9A priority Critical patent/CN115016641A/en
Publication of CN115016641A publication Critical patent/CN115016641A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a conference control method, an apparatus, a conference system and a medium based on gesture recognition. The method comprises the following steps: acquiring a fully trained gesture recognition model, wherein the gesture recognition model comprises a feature extraction module and an inference library module; acquiring video stream information to be detected; performing feature extraction on the video stream information based on the feature extraction module to obtain video frames to be recognized; performing gesture action recognition on the video frames to be recognized based on the inference library module to obtain gesture action recognition information; and generating a corresponding conference control instruction according to the gesture action recognition information. The invention can recognize the gesture actions of conference participants using only an ordinary camera and data processing equipment, determines gesture actions by tracking hand key-point information and generates corresponding conference control instructions, completes human-computer interaction more conveniently and quickly, improves the speed and accuracy of recognizing participants' gesture actions, and provides technical support for online conferences with multi-person collaborative interaction.

Description

Conference control method, device, conference system and medium based on gesture recognition
Technical Field
The invention relates to the technical field of human-computer interaction, in particular to a conference control method and device based on gesture recognition, a conference system and a computer readable storage medium.
Background
With the development of artificial intelligence technology, great progress has been made in various computer fields, including human behavior recognition, target detection, target tracking, speech recognition and the like. Human-computer interaction has always been a research hotspot: from early punched paper tape to mouse and keyboard operation, and from today's touch-screen technology to speech-recognition interaction, the way humans communicate with machines has become increasingly natural and humanized. The recent rise of virtual reality and augmented reality technologies has also driven the development of gesture-recognition interaction technology.
Traditional gesture recognition algorithms mainly include threshold segmentation, edge image segmentation, region-based segmentation and the like. In the daily operation of a company, multiple departments often need to discuss a project together. However, the traditional approach only allows a single speaker to present, does not allow other participants to comment, makes real-time discussion difficult, and is not conducive to efficient collaborative consultation. In addition, traditional gesture recognition algorithms have drawbacks, such as: high speed but low accuracy; and, for the adaptive series of multi-threshold segmentation, a large amount of computation, results that are sensitive to the threshold, high resource usage, and the like.
Therefore, a conference control method based on gesture recognition is needed to solve the problems of slow gesture recognition and high resource usage in existing conference systems that control multi-person collaborative interaction through gestures.
Disclosure of Invention
In view of this, it is necessary to provide a conference control method, apparatus, conference system and computer-readable storage medium based on gesture recognition, so as to solve the problems in the prior art of slow gesture recognition and high resource usage when controlling multi-person collaborative interaction through gestures.
In order to solve the above problem, the present invention provides a conference control method based on gesture recognition, including:
acquiring a gesture recognition model which is trained completely, wherein the gesture recognition model comprises a feature extraction module and an inference library module;
acquiring video stream information to be detected;
performing feature extraction on the video stream information based on the feature extraction module to obtain a video frame to be recognized;
performing gesture action recognition on the video frame to be recognized based on the inference library module to obtain gesture action recognition information;
and generating a corresponding conference control instruction according to the gesture action identification information.
Further, the obtaining of the fully trained gesture recognition model includes:
creating an initial gesture recognition model;
acquiring a gesture video sample data set, and dividing the sample data set into a training set and a verification set;
training the initial gesture recognition model by using the training set to obtain a trained gesture recognition model;
and performing performance evaluation on the trained gesture recognition model by using the verification set, and obtaining the fully trained gesture recognition model when the trained gesture recognition model reaches a preset performance standard.
Further, the gesture recognition model includes a plurality of channel-separable convolution blocks;
the channel-separable convolution block includes a plurality of convolution layers and an SE channel attention layer;
the SE channel attention layer is connected to the plurality of convolution layers.
Further, the activation function of the channel-separable convolution block is the Swish function;
the Swish function is applied to the output data of the plurality of convolution layers.
Further, the network parameters of the gesture recognition model include depth, width and resolution;
and the scaling weights corresponding to the depth, the width and the picture size are adjusted using the MnasNet grid search method.
Further, training the initial gesture recognition model using the training set includes:
carrying out first-stage training on the initial gesture recognition model by using a random data enhancement training mode to obtain a gesture recognition model after preliminary optimization;
and performing second-stage training on the preliminarily optimized gesture recognition model by using an adversarial-sample training mode to obtain the trained gesture recognition model.
Further, the first stage training includes adjusting network parameters of the model; the second stage training includes adjusting network parameters and scale of the model.
The invention also provides a conference control device based on gesture recognition, which comprises:
the model acquisition module is used for acquiring a gesture recognition model which is completely trained, and the gesture recognition model comprises a feature extraction module and an inference library module;
the video information acquisition module is used for acquiring video stream information to be detected;
the extraction module is used for performing feature extraction on the video stream information based on the feature extraction module to obtain a video frame to be recognized;
the recognition module is used for performing gesture action recognition on the video frame to be recognized based on the inference library module to obtain gesture action recognition information;
and the instruction generating module is used for generating a corresponding conference control instruction according to the gesture action identification information.
The invention also provides a conference system, which comprises a processor and a memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the conference control method based on gesture recognition in any of the above technical solutions is implemented.
The invention also provides a computer-readable storage medium, wherein the storage medium stores computer program instructions, and when the computer program instructions are executed by a computer, the computer is caused to execute any one of the above conference control methods based on gesture recognition.
Compared with the prior art, the invention has the following beneficial effects. Firstly, a fully trained gesture recognition model is established; secondly, video stream information to be detected is acquired; thirdly, gesture actions in the video stream information are recognized through the gesture recognition model; and finally, corresponding conference control instructions are generated according to the gesture action recognition information. The invention can recognize the gesture actions of conference participants using only an ordinary camera and data processing equipment, without dedicated matching hardware; it determines gesture actions by tracking hand key-point information and generates corresponding conference control instructions, completes human-computer interaction more conveniently and quickly, improves the speed and accuracy of recognizing participants' gesture actions, and provides technical support for online conferences with multi-person collaborative interaction.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a conference control method based on gesture recognition according to the present invention;
FIG. 2 is a schematic flowchart illustrating an embodiment of a method for obtaining a fully trained gesture recognition model according to the present invention;
FIG. 3 is a schematic structural diagram of an initial gesture recognition model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an embodiment of an MBConv volume block provided by the present invention;
fig. 5 is a schematic structural diagram of an embodiment of a conference control apparatus based on gesture recognition according to the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
The invention provides a conference control method based on gesture recognition, which comprises the following steps:
step S101: acquiring a gesture recognition model which is trained completely, wherein the gesture recognition model comprises a feature extraction module and an inference library module;
step S102: acquiring video stream information to be detected;
step S103: performing feature extraction on the video stream information based on the feature extraction module to obtain a video frame to be recognized;
step S104: performing gesture action recognition on the video frame to be recognized based on the inference library module to obtain gesture action recognition information;
step S105: and generating a corresponding conference control instruction according to the gesture action identification information.
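The flow of steps S101 to S105 can be sketched as a short pipeline. The stubbed frames, gesture labels and the gesture-to-command map below are illustrative assumptions, not part of the patent; they only show how recognized gestures would be turned into conference control instructions.

```python
# Minimal sketch of steps S101-S105, with stubbed components standing in
# for the trained model; all names here are illustrative, not from the patent.

GESTURE_TO_COMMAND = {          # hypothetical gesture -> conference command map
    "pan_right": "NEXT_SLIDE",
    "pan_left":  "PREV_SLIDE",
    "zoom_in":   "MAGNIFY",
}

def extract_frames(video_stream):
    """S103: the feature extraction module yields frames to be recognized."""
    return list(video_stream)   # stub: treat the stream as an iterable of frames

def recognize_gesture(frame):
    """S104: the inference library module labels one frame (stubbed)."""
    return frame.get("gesture")  # stub: the frame already carries its label

def control_conference(video_stream):
    """S101-S105 end to end: frames -> gesture labels -> control commands."""
    commands = []
    for frame in extract_frames(video_stream):
        gesture = recognize_gesture(frame)
        if gesture in GESTURE_TO_COMMAND:       # S105: map gesture to command
            commands.append(GESTURE_TO_COMMAND[gesture])
    return commands

print(control_conference([{"gesture": "pan_right"}, {"gesture": "fist"}]))
```

In a real system the two stubs would be replaced by the trained feature extraction and inference modules; only the final mapping to conference instructions would stay this simple.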
As a specific embodiment, the feature extraction module may use the EfficientNet model framework or the MobileNet network model framework.
As a specific embodiment, the inference library module is a strong-classification neural network, Realtimenet. In current applications of Realtimenet, the neural network architecture can identify which gesture an action is, can detect abnormal behaviors such as unusual movement and fighting, and can also estimate calories consumed. Thus, it can be used in the following scenarios: 1. gesture control (in smart home devices, smart kiosks, automobiles); 2. human action recognition (in smart home devices, automobiles, public places, video calls); 3. fitness tracking; 4. human-computer interaction (e.g., determining whether the user is talking to the system or to someone else); 5. AR (recognizing gestures from the first-person perspective); 6. interaction between digital or virtual humans and users.
In the conference control method based on gesture recognition provided by this embodiment, first, a fully trained gesture recognition model is established; secondly, video stream information to be detected is acquired; thirdly, gesture actions in the video stream information are recognized through the gesture recognition model; and finally, corresponding conference control instructions are generated according to the gesture action recognition information. The method can recognize the gesture actions of conference participants using only an ordinary camera and data processing equipment, without dedicated matching hardware; it determines gesture actions by tracking hand key-point information and generates corresponding conference control instructions, completes human-computer interaction more conveniently and quickly, improves the speed and accuracy of recognizing participants' gesture actions, and provides technical support for online conferences with multi-person collaborative interaction.
As a preferred embodiment, in step S101, as shown in fig. 2, acquiring a fully trained gesture recognition model includes:
step S201: creating an initial gesture recognition model;
step S202: acquiring a gesture video sample data set, and dividing the sample data set into a training set and a verification set;
step S203: training the initial gesture recognition model by using the training set to obtain a trained gesture recognition model;
step S204: performing performance evaluation on the trained gesture recognition model by using the verification set, and obtaining the fully trained gesture recognition model when the trained gesture recognition model reaches a preset performance standard.
As a specific embodiment, in step S202, the videos and classification labels in the gesture video sample data set are first converted into images (video frames) and corresponding classification labels; alternatively, a short video can be labeled with a classification category as a whole, without per-frame labeling. Specifically: video frames (as pictures) are cut from each video (both training and test videos) at a certain FPS (the number of frames per second) and stored as the training set and test set, and the classification performance on the images is taken as the classification performance of the corresponding videos;
after training is finished, the model is loaded to check all video frames in the test set, and the top-five results by accuracy on the full test set are output.
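The frame-sampling step described above (cutting frames from each video at a fixed FPS) can be sketched as follows. The index arithmetic is the testable core; the commented OpenCV calls show one plausible way to wire it to a real video file, and all names are illustrative assumptions.

```python
# Sketch of sampling video frames at a fixed FPS for the training/test sets.

def sample_indices(total_frames, src_fps, target_fps):
    """Indices of frames to keep when resampling src_fps video at target_fps."""
    step = src_fps / target_fps          # e.g. 30 fps source, 5 fps target -> every 6th
    kept, next_keep = [], 0.0
    for i in range(total_frames):
        if i >= next_keep:
            kept.append(i)
            next_keep += step
    return kept

# With OpenCV this would drive the actual extraction (not executed here):
#   cap = cv2.VideoCapture("gesture.mp4")
#   for i in sample_indices(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
#                           cap.get(cv2.CAP_PROP_FPS), 5):
#       cap.set(cv2.CAP_PROP_POS_FRAMES, i)
#       ok, frame = cap.read()
#       if ok:
#           cv2.imwrite(f"frames/{i:06d}.png", frame)

print(sample_indices(12, 30, 5))  # 30/5 -> keep every 6th frame
```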
As a preferred embodiment, the network parameters of the gesture recognition model include depth, width and resolution size;
and adjusting the weights corresponding to the depth, the width and the picture size by using a MnasNet grid searching method.
As a specific example, when the initial gesture recognition model is created based on the EfficientNet model, the baseline model EfficientNet-B0 is generated using the MnasNet method implemented with a reinforcement learning algorithm. The models in the EfficientNet series range from EfficientNet-B0 to EfficientNet-L2, with increasingly higher accuracy.
Using a compound scaling method, under preset memory and computation constraints, the depth, width (number of channels of the feature map) and picture size of the EfficientNet-B0 model are scaled simultaneously; the scaling ratios of the three dimensions are obtained by grid search, and finally the initial gesture recognition model established based on EfficientNet is output, as shown in fig. 3. The meanings of the subgraphs in fig. 3 are:
(a) the baseline model.
(b) Width scaling on the basis of the baseline model, i.e., increasing the number of channels.
(c) Depth scaling on the basis of the baseline model, i.e., increasing the number of network layers.
(d) Scaling the picture size on the basis of the baseline model.
(e) Scaling the depth, width and picture size simultaneously on the basis of the baseline model.
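A rough sketch of the compound scaling idea shown in fig. 3(e): a single coefficient scales depth, width and picture size together. The base ratios below (alpha=1.2, beta=1.1, gamma=1.15, with alpha*beta^2*gamma^2 ≈ 2) are the ones reported for EfficientNet and are used here purely for illustration.

```python
# Sketch of EfficientNet-style compound scaling: depth, width, and input
# resolution are scaled jointly by one coefficient phi.

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution ratios

def compound_scale(phi, base_depth=1.0, base_width=1.0, base_res=224):
    depth = base_depth * ALPHA ** phi      # more layers
    width = base_width * BETA ** phi       # more channels per feature map
    res = round(base_res * GAMMA ** phi)   # larger input pictures
    return depth, width, res

d, w, r = compound_scale(1)
print(round(d, 2), round(w, 2), r)
```

Increasing phi grows all three dimensions in a fixed ratio, which is what keeps the grid search over scaling factors tractable compared with searching each dimension independently.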
As a preferred embodiment, the gesture recognition model comprises a plurality of channel-separable convolution blocks;
the channel-separable convolution block includes a plurality of convolution layers and an SE channel attention layer;
the SE channel attention layer is connected to the plurality of convolution layers.
As a preferred embodiment, the activation function of the channel-separable convolution block is the Swish function;
the Swish function is applied to the output data of the plurality of convolution layers.
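For reference, the Swish activation named above is simply x·sigmoid(x) — a smooth, non-monotonic replacement for ReLU. A minimal sketch:

```python
import math

# Swish activation: x * sigmoid(x). Unlike ReLU it is smooth at zero and
# can take small negative values for negative inputs.

def swish(x):
    return x * (1.0 / (1.0 + math.exp(-x)))

print(round(swish(1.0), 4))   # vs ReLU(1.0) = 1.0
```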
As a specific embodiment, as shown in fig. 4, the interior of the EfficientNet model is implemented by a plurality of MBConv convolution blocks, and the specific structure of each MBConv block is shown in fig. 4. The ReLU activation function of the MBConv block is replaced by the Swish activation function. The MBConv block also uses a structure similar to a residual connection, except that an SE layer is used in the shortcut-connection part.
In addition, the DropConnect method is used instead of the conventional Dropout method. DropConnect differs from Dropout in that, when training the neural network model, instead of randomly dropping the outputs of hidden nodes, it randomly drops their inputs (i.e., individual connection weights). Both DropConnect and Dropout serve to prevent the model from overfitting in deep neural networks; in comparison, DropConnect generally performs better.
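The Dropout/DropConnect contrast described above can be illustrated on a toy two-unit linear layer. Fixed masks replace random sampling so the difference is visible; all names and numbers are illustrative assumptions.

```python
# Toy contrast between Dropout and DropConnect on a 2-unit linear layer:
# Dropout zeroes an input for EVERY output unit, DropConnect zeroes
# individual weights (connections) independently.

def dropout_layer(x, W, input_mask):
    # Dropout: a masked input disappears for every output unit
    return [sum(w * xi * m for w, xi, m in zip(row, x, input_mask))
            for row in W]

def dropconnect_layer(x, W, weight_masks):
    # DropConnect: each weight gets its own keep/drop mask
    return [sum(w * m * xi for w, m, xi in zip(row, mask, x))
            for row, mask in zip(W, weight_masks)]

x = [1.0, 2.0]
W = [[1.0, 1.0], [1.0, 1.0]]
print(dropout_layer(x, W, [0, 1]))                # input 0 dropped everywhere
print(dropconnect_layer(x, W, [[0, 1], [1, 1]]))  # only one connection dropped
```

With the same drop budget, DropConnect's per-connection masking gives a much larger family of thinned sub-networks, which is one intuition for why it can regularize better.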
As a preferred embodiment, training the initial gesture recognition model by using the training set includes:
carrying out first-stage training on the initial gesture recognition model by using a random data enhancement training mode to obtain a gesture recognition model after preliminary optimization;
and performing second-stage training on the preliminarily optimized gesture recognition model using an adversarial-sample training mode to obtain the trained gesture recognition model.
As a specific example, the gesture recognition model of the present invention is considered from two aspects.
(1) Aspect of model structure scale:
from the EfficientNet-B0 version to the EfficientNet-L2 version in the EfficientNet series models, the models are higher and higher in precision and larger in scale, and the requirement for the memory is increased accordingly.
The scale of the model is mainly determined by the scaling parameters of three dimensions: width, depth and resolution. These three dimensions are not independent of one another: for higher-resolution input pictures, a deeper network is required to obtain a larger receptive field; likewise, higher-resolution pictures need more channels to capture more fine-grained features.
The scaling parameters for each version are shown in table 1, and it can be seen that as the scaling parameters of the model become larger, the drop rate parameter of dropout also increases. This is because the more parameters in the model, the stronger the fitting effect of the model, and the easier it is to generate overfitting.
TABLE 1
Version           Width scaling   Depth scaling   Resolution   Dropout rate
EfficientNet-B0   1.0             1.0             224          0.2
EfficientNet-B1   1.0             1.1             240          0.2
EfficientNet-B2   1.1             1.2             260          0.3
EfficientNet-B3   1.2             1.4             300          0.3
EfficientNet-B4   1.4             1.8             380          0.4
EfficientNet-B5   1.6             2.2             456          0.4
EfficientNet-B6   1.8             2.6             528          0.5
EfficientNet-B7   2.0             3.1             600          0.5
EfficientNet-B8   2.2             3.6             672          0.5
EfficientNet-L2   4.3             8.3             800          0.5
To avoid the over-fitting problem, increasing the drop rate of dropout alone is not enough. There is also a need to improve the generalization capability of the model by means of an improvement in the training mode.
(2) Training aspects of the model:
before the EfficientNet-B7 version, the EfficientNet series model mainly improves the precision by adjusting the scaling parameters and increasing the network scale. After the EfficientNet-B7 version, the model precision is improved mainly by improving the training mode and increasing the network size 2 methods in parallel. The main training method is as follows:
1) Random data enhancement, called RandAugment, a more efficient data enhancement method. It is used in the EfficientNet-B7 version.
2) Training the model with adversarial samples: applied in the EfficientNet-B8 and EfficientNet-L2 versions; hereinafter this training approach for versions B8 through L2 is referred to as AdvProp.
Random data enhancement directly replaces the original AutoAugment method in the original training framework, while AdvProp and Noisy Student are training methods that apply random data enhancement within a new training framework.
3) Using a self-training framework: the application is in the Noisy Student version.
The random data enhancement method in this embodiment is a new data enhancement method that is simpler and more effective than the AutoAugment method.
In this embodiment, overfitting is reduced by a training method that uses adversarial samples. In implementation, a separate auxiliary batch normalization is used to process the adversarial samples, i.e., an additional auxiliary BN acts on the adversarial samples alone. Adversarial samples are those generated by adding an imperceptible perturbation to an image, which may cause convolutional neural networks (ConvNets) to make erroneous predictions.
In the AdvProp model, three adversarial-sample generation algorithms are used: PGD, I-FGSM and GD.
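As a hedged illustration of how such adversarial samples are generated, here is a single FGSM step (the building block of the I-FGSM and PGD generators named above) on a toy linear model with a squared-error loss. The model, loss and numbers are assumptions for illustration, not the patent's implementation.

```python
# One FGSM step: nudge the input by eps in the direction of the sign of the
# loss gradient with respect to the input, which raises the loss.

def loss(x, w, y):
    """Squared error of a linear model: (w . x - y)^2."""
    return (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2

def fgsm_step(x, w, y, eps):
    """Return the adversarially perturbed input x + eps * sign(dL/dx)."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    grad = [2 * (pred - y) * wi for wi in w]       # dL/dx_i for the loss above
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

x, w, y = [1.0, -1.0], [0.5, 0.5], 1.0
x_adv = fgsm_step(x, w, y, eps=0.1)
print(loss(x, w, y), loss(x_adv, w, y))   # the perturbation raises the loss
```

I-FGSM and PGD iterate this step several times (PGD additionally projecting back into an eps-ball), which is why a single step is the natural unit to sketch.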
Based on the foundational and applied research results of existing gesture recognition technology related to computer vision, software libraries such as the MediaPipe Hands framework and OpenCV are integrated, and a gesture recognition method that can be used on a laptop with a camera is designed. Gestures are determined by tracking hand key-point information and corresponding decisions are made, so that human-computer interaction is completed more conveniently and quickly.
As a specific embodiment, the gesture actions that can be recognized by this embodiment are: clicking, translating, zooming, grabbing and rotating.
The click gesture action includes: the index finger of one hand is extended and the other four fingers are closed to form a clicking shape; the index finger locates the mouse coordinates, and the thumb controls whether the mouse is in a pressed (clicked) state, so the gesture can be used to click a file, a hyperlink and the like.
The panning gesture action includes: the five fingers of one hand are opened and moved in parallel, used for advancing to the next PPT slide or returning to the previous one; translating the five fingers in the positive direction (from left to right) advances to the next slide.
The zoom gesture action includes: the five fingers of the two hands are opened, and simultaneously move outwards in an expanding way or move inwards in a contracting way, so that the specific areas of pictures, texts and the like can be enlarged or reduced according to a specific scale.
The grab gesture action includes: the index finger and middle finger of one hand are extended and the other three fingers are closed to locate an object to grab; the thumb is then extended to enter the grabbing state, which can be used to drag a text box, text, pictures and the like to a specified position.
The rotation gesture action includes: the five fingers of both hands are opened and moved clockwise a certain distance simultaneously; the rotation angle is recognized from the positions to which the two hands move, and can be used to implement functions such as rotating a picture.
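As an illustrative sketch (not the patent's implementation), the two-hand zoom gesture above can be decided from tracked hand key points by comparing the spread between the two hands' palm centers across frames. The landmark layout (e.g., MediaPipe's 21 points per hand) and the threshold value are assumptions.

```python
import math

# Decide zoom-in/zoom-out from the change in spread between two hands'
# palm centers over consecutive frames. Landmarks are (x, y) tuples.

def palm_center(landmarks):
    """Mean (x, y) of one hand's key points."""
    xs, ys = zip(*landmarks)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def zoom_gesture(prev_hands, cur_hands, threshold=0.1):
    """Return ('zoom_in' | 'zoom_out' | None, scale) from the spread change."""
    def spread(hands):
        (x1, y1), (x2, y2) = palm_center(hands[0]), palm_center(hands[1])
        return math.hypot(x2 - x1, y2 - y1)
    scale = spread(cur_hands) / spread(prev_hands)
    if scale > 1 + threshold:
        return "zoom_in", scale       # hands moving apart -> magnify
    if scale < 1 - threshold:
        return "zoom_out", scale      # hands moving together -> shrink
    return None, scale
```

The returned scale factor could directly drive the proportional enlargement or reduction of the selected picture or text region described above.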
In a preferred embodiment, the first stage training includes adjusting network parameters of the model; the second stage training includes adjusting network parameters and scale of the model.
The present invention also provides a conference control device based on gesture recognition, a block diagram of which is shown in fig. 5, and the conference control device 500 based on gesture recognition includes:
the model acquisition module 501 is used for acquiring a fully trained gesture recognition model, and the gesture recognition model comprises a feature extraction module and an inference library module;
a video information obtaining module 502, configured to obtain video stream information to be detected;
an extracting module 503, configured to perform feature extraction on the video stream information based on the feature extraction module to obtain a video frame to be recognized;
the recognition module 504, configured to perform gesture action recognition on the video frame to be recognized based on the inference library module to obtain gesture action recognition information;
and the instruction generating module 505 is configured to generate a corresponding conference control instruction according to the gesture motion recognition information.
The invention also correspondingly provides a conference system, which comprises a processor and a memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the conference control method based on gesture recognition in any of the above technical solutions is implemented.
The present embodiment also provides a computer-readable storage medium, where the computer-readable storage medium stores computer program instructions, and when the computer program instructions are executed by a computer, the computer is caused to execute any one of the above-mentioned conference control methods based on gesture recognition.
According to the computer-readable storage medium and the computing device provided by the above embodiments of the present invention, the content specifically described for implementing the conference control method based on gesture recognition according to the present invention can be referred to, and the beneficial effects similar to those of the conference control method based on gesture recognition as described above are obtained, and are not repeated herein.
The invention discloses a conference control method, apparatus, conference system and computer-readable storage medium based on gesture recognition. Firstly, a fully trained gesture recognition model is established; secondly, video stream information to be detected is acquired; thirdly, gesture actions in the video stream information are recognized through the gesture recognition model; and finally, corresponding conference control instructions are generated according to the gesture action recognition information. The invention can recognize the gesture actions of conference participants using only an ordinary camera and data processing equipment, without dedicated matching hardware; it determines gesture actions by tracking hand key-point information and generates corresponding conference control instructions, completes human-computer interaction more conveniently and quickly, improves the speed and accuracy of recognizing participants' gesture actions, and provides technical support for online conferences with multi-person collaborative interaction.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A conference control method based on gesture recognition is characterized by comprising the following steps:
acquiring a fully trained gesture recognition model, wherein the gesture recognition model comprises a feature extraction module and an inference library module;
acquiring video stream information to be detected;
extracting the characteristics of the video stream information based on the characteristic extraction module to obtain a video frame to be identified;
performing gesture action recognition on the video frame to be recognized based on the inference library module to obtain gesture action recognition information;
and generating a corresponding conference control instruction according to the gesture action recognition information.
2. The conference control method based on gesture recognition according to claim 1, wherein acquiring the fully trained gesture recognition model comprises:
creating an initial gesture recognition model;
acquiring a gesture video sample data set, and dividing the sample data set into a training set and a verification set;
training the initial gesture recognition model by using the training set to obtain a trained gesture recognition model;
and performing performance evaluation on the trained gesture recognition model by using the verification set, and taking the trained gesture recognition model as the fully trained gesture recognition model when it reaches a preset performance standard.
3. The conference control method based on gesture recognition according to claim 1, wherein the gesture recognition model comprises a plurality of channel-separable convolution blocks;
each channel-separable convolution block comprises a plurality of convolution layers and an SE channel attention layer;
the SE channel attention layer is connected to the plurality of convolution layers.
4. The conference control method based on gesture recognition according to claim 3, wherein the activation function of the channel-separable convolution block is a Swish function;
the Swish function is used to activate the output data of the plurality of convolution layers.
5. The conference control method based on gesture recognition according to claim 1, wherein the network parameters of the gesture recognition model include depth, width, and resolution;
and the weights corresponding to the depth, the width, and the resolution are adjusted by using a MnasNet grid search method.
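Jointly scaling depth, width, and resolution as in claim 5 resembles EfficientNet-style compound scaling, where a grid search (MnasNet-style) fixes per-dimension factors and a single coefficient scales all three. The factor values and base sizes below are illustrative assumptions:

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15,
                   base_depth=10, base_width=16, base_resolution=224):
    """Scale network depth, width, and input resolution together.

    phi is the compound coefficient; alpha/beta/gamma are per-dimension
    factors (values here are illustrative, as a grid search might find
    them under a fixed compute budget).
    """
    depth = round(base_depth * alpha ** phi)
    width = round(base_width * beta ** phi)
    resolution = round(base_resolution * gamma ** phi)
    return depth, width, resolution
```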
6. The conference control method based on gesture recognition according to claim 5, wherein training the initial gesture recognition model by using the training set comprises:
performing a first-stage training on the initial gesture recognition model by using a random data enhancement training mode to obtain a gesture recognition model after preliminary optimization;
and performing second-stage training on the preliminarily optimized gesture recognition model by using an adversarial-sample training mode to obtain the trained gesture recognition model.
7. The conference control method based on gesture recognition according to claim 6, wherein the first stage training includes adjusting network parameters of a model; the second stage training includes adjusting network parameters and scale of the model.
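Adversarial-sample training, as in the second stage of claims 6–7, typically perturbs each input a small step along the sign of the loss gradient (FGSM-style) and trains on the perturbed input. A toy sketch on a 1-D logistic model (model, weights, and epsilon are illustrative, not from the disclosure):

```python
import math

def fgsm_perturb(x, y, w, b, epsilon=0.1):
    """FGSM-style adversarial example for a 1-D logistic model.

    Moves input x by epsilon in the direction that increases the
    binary cross-entropy loss (sign of the input gradient).
    """
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
    grad_x = (p - y) * w                      # d(loss)/dx for BCE loss
    sign = 1.0 if grad_x > 0 else (-1.0 if grad_x < 0 else 0.0)
    return x + epsilon * sign
```

During second-stage training, the model would then be updated on these perturbed samples alongside (or instead of) the clean ones.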
8. A conference control apparatus based on gesture recognition, comprising:
the model acquisition module is used for acquiring a fully trained gesture recognition model, the gesture recognition model comprising a feature extraction module and an inference library module;
the video information acquisition module is used for acquiring video stream information to be detected;
the extraction module is used for extracting features of the video stream information based on the feature extraction module to obtain a video frame to be recognized;
the recognition module is used for performing gesture action recognition on the video frame to be recognized based on the inference library module to obtain gesture action recognition information;
and the instruction generation module is used for generating a corresponding conference control instruction according to the gesture action recognition information.
9. A conferencing system comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements a gesture recognition based conference control method according to any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions which, when executed by a computer, cause the computer to perform the conference control method based on gesture recognition according to any one of claims 1-7.
CN202210617660.9A 2022-06-01 2022-06-01 Conference control method, device, conference system and medium based on gesture recognition Pending CN115016641A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210617660.9A CN115016641A (en) 2022-06-01 2022-06-01 Conference control method, device, conference system and medium based on gesture recognition


Publications (1)

Publication Number Publication Date
CN115016641A true CN115016641A (en) 2022-09-06

Family

ID=83072640



Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671592A (en) * 2023-12-08 2024-03-08 中化现代农业有限公司 Dangerous behavior detection method, dangerous behavior detection device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination