CN114564104A - Conference demonstration system based on dynamic gesture control in video - Google Patents

Conference demonstration system based on dynamic gesture control in video

Info

Publication number
CN114564104A
Authority
CN
China
Prior art keywords
gesture
video
module
redundancy
conference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210145445.3A
Other languages
Chinese (zh)
Inventor
苗启广
宋建锋
史媛媛
刘如意
苗凯彬
李宇楠
刘向增
葛道辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210145445.3A
Publication of CN114564104A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Abstract

The invention discloses a conference presentation system based on dynamic gesture control in video, composed of a real-time video acquisition module, a continuous gesture segmentation module, a video redundancy removal module, a gesture recognition module, and a conference presentation system response module. The real-time video acquisition module continuously captures real-time video stream data; the continuous gesture segmentation module splits the continuous gestures in the video stream; the video redundancy removal module removes the redundant regions of each single-gesture video clip; the gesture recognition module recognizes each received independent single-gesture video; and the conference presentation system response module converts the gesture signal into a control instruction for the conference system and invokes the corresponding instruction function to open the presentation, start the slide show, and turn pages. The system frees the presenter from the mouse, keyboard, and page-turning pen, enhances the interactivity of conference presentations, and improves the fluency of the presentation process.

Description

Conference demonstration system based on dynamic gesture control in video
Technical Field
The invention belongs to the technical field of computer applications, relates to a presentation control system for office, teaching, and conference use, and particularly relates to a conference presentation system based on dynamic gesture control in video.
Background
With the spread of computer technology and equipment, projection devices are now commonly used for presentations in office and teaching settings, and the tools that operate them are mostly mice, keyboards, or page-turning pens. Office and teaching scenarios emphasize interaction between the presenter and the audience, and these devices impose real limitations. For example, when a presenter has walked off the platform to interact with the audience and then needs to show a slide or turn a page, the presenter must break off the interaction and return to the platform to operate the conference system, which destroys the fluency of the exchange. As another example, a remote-control device interrupts communication whenever the presenter greets a participant with a handshake or a signature. All of these problems cause considerable inconvenience in practice.
Disclosure of Invention
To solve the technical problem that, during a conference presentation, the slides can only be controlled through a keyboard, a mouse, a page-turning pen, or similar devices, which is neither convenient nor fast, the invention aims to provide a contactless conference presentation system based on dynamic gesture control in video, in which the presenter, unconstrained by distance and needing no external control equipment, can freely and smoothly open the conference system, start the slide show, turn pages, and so on.
To accomplish this task, the invention adopts the following technical solution:
the utility model provides a meeting presentation system based on gesture control in video which characterized in that, comprises real-time video acquisition module, continuous gesture segmentation module, video redundancy module, gesture recognition module and the meeting presentation system response module that connect gradually, wherein:
the real-time video acquisition module is used for acquiring the current video stream in real time with a camera;
the continuous gesture segmentation module is used for splitting the continuous gestures in the video stream, segmenting them into independent gesture clips, and sending the independent gesture video clips to the video redundancy removal module;
the video redundancy removal module is used for removing the redundant regions of each single-gesture video clip, screening the effective information in the clip through a coarse redundancy removal unit followed by a fine redundancy removal unit, and sending the condensed independent gesture video clips to the gesture recognition module;
the gesture recognition module is used for recognizing each received independent single-gesture video: a gesture recognition model is trained on a pre-recorded data set, the resulting gesture feature model is used to predict and classify the detected hand video, and the prediction result is finally sent to the conference presentation system response module;
the conference presentation system response module is used for converting the received gesture category prediction into a control instruction and sending that instruction to the processor to open the presentation, start the slide show, and turn pages.
According to the invention, the continuous gesture segmentation module uses a hand discriminator algorithm to judge whether a hand is continuously visible in the presentation area, segments the continuous gestures into independent gesture clips according to the degree of hand visibility, and sends the independent gesture video clips to the video redundancy removal module.
Furthermore, the video redundancy removal module screens the effective information in each video clip through two units, coarse redundancy removal and fine redundancy removal, wherein:
the coarse redundancy removal unit is used for screening out and deleting the irrelevant gesture segments at the start and end of the video;
the fine redundancy removal unit is designed to filter similar frames within the video, condensing the video information in order to speed up the gesture recognition module.
Specifically, the gesture recognition module comprises a recorded data set unit, a gesture recognition model unit, and a gesture category prediction unit, wherein:
the gesture recognition model unit is used for training on the constructed gesture data set, learning the feature information of the different gesture categories, and saving it as a gesture feature model;
the gesture category prediction unit is used for predicting the gesture category of a hand motion video;
the recorded data set unit is obtained by collecting, recording, and organizing data from 18 demonstrators against a white-wall background under normal indoor illumination; each demonstrator sits 1 m from the camera and performs three gesture actions: click, grab, and translate.
Each gesture is a one-handed action and can be performed with either the left or the right hand.
The camera in the real-time video acquisition module is an ordinary camera.
The conference presentation system based on dynamic gesture control in video can be widely applied in office, teaching, and similar environments; it frees the presenter from keyboard, mouse, and page-turning pen, removes spatial constraints, allows real-time control of the system, enhances the interactivity of conference presentations, and improves the fluency of the presentation.
Drawings
Fig. 1 is a schematic diagram of the overall structure of the conference presentation system based on gesture control in video according to the invention.
Fig. 2 is a flow diagram of the continuous gesture segmentation module based on sliding-window detection.
Fig. 3 is a schematic diagram of the sliding-window segmentation algorithm in the continuous gesture segmentation module.
Fig. 4 is a plot of similarity between successive frames in a video stream.
Fig. 5 is a performance plot of the trained gesture recognition model.
The invention is explained in more detail below with reference to the figures and examples.
Detailed Description
It should be noted that the conference presentation system based on dynamic gesture control in video provided by this embodiment is built on a PC, so that the PC can conveniently control the entire conference system. Moreover, the system is controlled by dynamic gestures in video, not by static gestures in images: the gesture recognition model unit trains a three-dimensional convolutional neural network directly on dynamic gesture videos, which is closer to real-life scenarios and favors practical adoption. The camera in the real-time video acquisition module is an ordinary camera, which eases wide deployment of the system.
In the design, the continuous gesture segmentation module addresses the streaming-video scenarios of real life. Earlier presentation systems generally dealt with still pictures of gestures, which can be treated as isolated: each picture contains only one gesture to analyze and recognize. In a real human-computer interaction scenario, by contrast, the first problem of gesture recognition is to separate and extract individual gestures from a continuously acquired gesture video stream. This module therefore implements a continuous gesture segmentation method based on sliding-window detection to split the continuous gestures in the camera's video stream. At the same time, because practical applications place heavy demands on real-time performance, a multi-threaded framework is adopted alongside the continuous gesture segmentation.
Furthermore, the video redundancy removal module consists of two units: coarse redundancy removal and fine redundancy removal. First, the coarse redundancy removal unit applies an adaptive inter-frame similarity criterion to adaptively screen out and delete the irrelevant gesture information at the start and end of each video clip; then, to further speed up the gesture recognition module, the fine redundancy removal unit applies a uniform proportional sampling algorithm to thin out the remaining redundancy and condense the video information.
The recorded data set was collected, recorded, and organized by the applicant. It records 18 demonstrators against a white-wall background under normal indoor illumination. Each demonstrator sits 1 m from the camera and performs three gesture actions: click, grab, and translate.
To enlarge the data volume, each demonstrator repeated each gesture 5 times, giving 90 samples per gesture and 270 gesture videos in total. The data set has 3 classes (click, grab, translate), with 200 videos used as the training set and 70 as the test set.
The control gestures and detailed information are as follows:
[Table of control gestures and their detailed descriptions; reproduced as an image in the original publication.]
Each gesture is a one-handed motion and can be performed with either the left or the right hand.
As shown in fig. 1, the present embodiment provides a conference presentation system based on dynamic gesture control in video, comprising a real-time video acquisition module, a continuous gesture segmentation module, a video redundancy removal module, a gesture recognition module, and a conference presentation system response module.
The real-time video acquisition module mainly acquires the current video stream in real time with a camera.
The continuous gesture segmentation module mainly splits the continuous gestures in the video stream into independent single gestures and sends each independent gesture video to the video redundancy removal module.
In this embodiment, the continuous gesture segmentation module uses a hand discriminator algorithm to judge whether a hand is continuously visible in the presentation area and segments the continuous gestures into individual independent gesture clips according to the degree of hand visibility.
The video redundancy removal module mainly removes the redundant regions of each single-gesture video clip: it screens the effective information in the clip through a coarse redundancy removal unit followed by a fine redundancy removal unit and sends the condensed independent gesture video clips to the gesture recognition module, wherein:
the coarse redundancy removal unit screens out and deletes the irrelevant gesture segments at the start and end of the video;
the fine redundancy removal unit filters similar frames within the video, condensing the video information in order to speed up the gesture recognition module.
The gesture recognition module mainly covers acquiring the hand video, training a model on dynamic gesture videos, and loading the model for prediction. The conference presentation system response module converts the gesture signal into a control instruction for the conference system and invokes the corresponding instruction function to open the presentation, start the slide show, and turn pages.
Specifically, the gesture recognition module comprises a recorded data set unit, a gesture recognition model unit, and a gesture category prediction unit, wherein:
the gesture recognition model unit trains on the constructed gesture data set, learns the feature information of the different gesture categories, and saves it as a gesture feature model;
the gesture category prediction unit predicts the gesture category of a hand motion video;
the recorded data set unit is obtained by collecting, recording, and organizing data from 18 demonstrators against a white-wall background under normal indoor illumination; each demonstrator sits 1 m from the camera and performs three gesture actions: click, grab, and translate.
The conference presentation system response module converts the received gesture category prediction into a control instruction and sends that instruction to the processor to open the presentation, start the slide show, and turn pages.
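Taken together, the five modules form a linear pipeline. The following minimal Python sketch shows one way the data flow might be wired up; every name in it is an illustrative assumption, since the filing specifies modules rather than code:

```python
from typing import Any, Iterable, Iterator, List

Frame = Any  # stand-in for an image array (e.g., a numpy.ndarray from OpenCV)

def acquire_video_stream() -> Iterator[Frame]:
    """Real-time video acquisition module (stub)."""
    return iter(())  # a real implementation would yield camera frames

def segment_continuous_gestures(stream: Iterable[Frame]) -> Iterator[List[Frame]]:
    """Continuous gesture segmentation module (stub): yields 100-frame clips."""
    return iter(())

def remove_redundancy(clip: List[Frame]) -> List[Frame]:
    """Video redundancy removal module (stub): coarse trim, then fine sampling."""
    return clip

def recognize_gesture(clip: List[Frame]) -> str:
    """Gesture recognition module (stub): returns a predicted class label."""
    return "pan"

def dispatch_command(gesture: str) -> None:
    """Conference presentation system response module (stub)."""
    print(f"issue control instruction for gesture: {gesture}")

def run_pipeline() -> None:
    # The five modules connected in sequence, as in claim 1.
    for clip in segment_continuous_gestures(acquire_video_stream()):
        dispatch_command(recognize_gesture(remove_redundancy(clip)))

if __name__ == "__main__":
    run_pipeline()
```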
Referring to fig. 1, the conference presentation system based on dynamic gesture control in video of the present embodiment operates in the following steps:
Step 1: open the camera; the presenter performs continuous gesture actions while the camera captures the real-time video stream.
Step 2: segment the continuously captured dynamic gesture videos into independent single-gesture videos.
in this embodiment, a continuous gesture segmentation method based on sliding window detection is designed, so as to segment a continuous gesture of a video stream acquired by a camera. As shown in fig. 2, in the present embodiment, a multi-thread processing method based on sliding window segmentation is designed, so as to implement segmented sampling of a video stream acquired by a camera. The whole process is realized by the cooperation of the two threads, and by the method, the time delay accumulation caused by the gesture recognition process can be avoided, the whole processing efficiency is further improved, and the real-time performance of the human-computer interaction system is ensured.
Thread 1 is primarily responsible for video capture. It maintains a sliding detection window of length n that runs a detection pass every t seconds; if a hand is detected in n consecutive frames, the next 100 frames are judged to be valid gesture action information. To keep recognition robust, a sampling queue of 100 frames is also maintained, into which the frame sequence of the sliding window is placed (a sequence of 100 frames is read in the first time). Thread 1 sends an activation signal to thread 2 each time it completes a read. (The 100-frame sample length is a threshold determined from experimental statistics of the time and number of video frames needed to perform one independent gesture.)
Thread 2 is primarily responsible for data processing and gesture prediction. Once the sample queue is full, the frame sequence in it is sent to the video redundancy removal module.
Fig. 3 shows the sampling details of thread 1. A sliding-window detection unit is placed over the real-time video stream: if a hand is detected in 10 consecutive frames, the following 100 frames are judged to be valid gesture information and the video redundancy removal stage begins; if no hand appears in 10 consecutive frames, no gesture action is considered to have started, this window is discarded, and sliding-window detection continues in the next round. Each time a hand is detected for 10 consecutive frames, the following 100-frame segment is passed to the next module. This sliding-window detection scheme splits out the independent gestures.
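A minimal sketch of this two-thread sliding-window scheme, assuming OpenCV capture; `detect_hand` is a placeholder for the hand discriminator (the filing does not specify its algorithm), and the queue stands in for the activation signal between the threads:

```python
import collections
import queue
import threading

import cv2
import numpy as np

WINDOW_LEN = 10   # n: consecutive frames that must contain a hand
SAMPLE_LEN = 100  # frames treated as one complete gesture clip

def detect_hand(frame: np.ndarray) -> bool:
    """Placeholder hand discriminator; a real one might use skin color or a
    detector network. Here it trivially returns False."""
    return False

def capture_thread(sample_queue: "queue.Queue[list]") -> None:
    """Thread 1: slide a detection window over the live stream; once a hand is
    seen in WINDOW_LEN consecutive frames, collect SAMPLE_LEN frames as one
    gesture clip and hand it to thread 2 via the queue."""
    cap = cv2.VideoCapture(0)
    window = collections.deque(maxlen=WINDOW_LEN)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            window.append(detect_hand(frame))
            if len(window) == WINDOW_LEN and all(window):
                clip = [frame]
                while len(clip) < SAMPLE_LEN:
                    ok, f = cap.read()
                    if not ok:
                        break
                    clip.append(f)
                sample_queue.put(clip)  # acts as the activation signal
                window.clear()          # start the next round of detection
    finally:
        cap.release()

def recognition_thread(sample_queue: "queue.Queue[list]") -> None:
    """Thread 2: consume full clips; redundancy removal and recognition
    would run here."""
    while True:
        clip = sample_queue.get()
        print(f"received gesture clip of {len(clip)} frames")

if __name__ == "__main__":
    q: "queue.Queue[list]" = queue.Queue()
    threading.Thread(target=recognition_thread, args=(q,), daemon=True).start()
    capture_thread(q)
```

Decoupling capture from recognition through the queue is what prevents the latency of recognition from accumulating in the capture loop.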
Step 3: perform redundancy removal on each independent gesture video clip; a coarse redundancy removal unit and a fine redundancy removal unit screen the effective information in the clip.
Further, the coarse redundancy removal unit screens out and deletes the irrelevant gesture segments at the start and end of the video. The independent gesture clips are unified to 100 frames, but analysis of single-gesture videos shows that they still contain redundant information: the gesture action is concentrated in the middle of the clip, while the head and tail carry essentially no useful gesture information. Statistical analysis of the clips (fig. 4) shows that, of the 100 frames, a single gesture performance takes about 2 s (60 frames); for roughly the first 10 frames the presenter is waiting and has not begun to move, and by roughly the last 10 frames the gesture performance is essentially finished. Using inter-frame similarity, an adaptive frame-sampling threshold is set to screen the frames and select the middle 60, reducing the information redundancy.
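As a sketch of this coarse stage: score adjacent frames by a similarity measure and trim the near-static head and tail. The histogram-correlation metric and the fixed threshold below are assumptions, since the filing does not spell out its adaptive criterion:

```python
import cv2
import numpy as np

def frame_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Histogram correlation between two frames (one possible similarity
    measure; the filing's adaptive criterion is not specified)."""
    ha = cv2.calcHist([cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    hb = cv2.calcHist([cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)

def trim_static_ends(frames: list, sim_thresh: float = 0.98) -> list:
    """Coarse redundancy removal: frames at the head/tail whose similarity to
    their neighbour exceeds sim_thresh are treated as 'waiting' frames and
    dropped, leaving the middle span (~60 of 100 frames) with real motion."""
    sims = [frame_similarity(a, b) for a, b in zip(frames, frames[1:])]
    start = 0
    while start < len(sims) and sims[start] > sim_thresh:
        start += 1
    end = len(frames) - 1
    while end > start and sims[end - 1] > sim_thresh:
        end -= 1
    return frames[start:end + 1]
```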
Furthermore, to speed up the gesture recognition module, the fine redundancy removal unit uses uniform sampling at equal intervals to filter similar frames and condense the video information: one frame is kept out of every m frames and the remaining m-1 are deleted, yielding a video with the standard frame count. m is obtained from the following formula:
m = ⌊total / s⌋ (1)

where s is the standard frame count and total is the actual frame count of the original video.
For a video with fewer frames than the standard, the ratio between the standard and actual frame counts is computed:

ratio = ⌊s / total⌋ (2)
then, for each frame in the video, it is replicated ratio times, interpolated after the position of the frame. The difference between the video frame number and the standard frame number at this time is:
dif = s - total × ratio (3)
if dif is greater than 0, then the dif frame is randomly selected from the original total frame to be copied once, and the dif frame is sequentially placed behind the position of the random frame. At this point, the video with the frame number smaller than the standard completes the frame expansion/completion operation.
Step 4: the gesture data built from the recorded data set are fed into a three-dimensional neural network and trained for 20 iterations to obtain the corresponding network model. The I3D network is trained for 20 epochs, and the training program is set to save the gesture recognition model once per epoch. The learning rate uses exponential decay, computed by the following formula:
l_n = l_o × γ^epoch

where l_n denotes the newly updated learning rate, l_o denotes the learning rate before the update, and γ is a decay parameter. γ is set to 0.1, and through exponential decay the learning rate finally converges to 0.001. The exponential decay schedule accelerates convergence of the network, helping it converge to a better, near-optimal solution.
Repeated experiments by the inventors determined that setting batch_size to 8 improves video-memory utilization and the parallel efficiency of the large matrix multiplications, reduces the number of iterations needed for training, and thus speeds up training on the same data volume.
During training, to assess the model as a whole, the test data is evaluated after every epoch to identify the epochs with higher accuracy (as shown in fig. 5), and further analysis and comparison pick out the numerically best model. The model from the 12th epoch is finally taken as the optimal model.
Further, four models — the dual-stream network, 3DRes-18, Yolov3+Res-18, and I3D — are each trained for the same number of epochs, and the same test data is used to obtain each model's accuracy and recognition time. The performance evaluation covers the models' recognition ability, recognition speed, and robustness.
Detection accuracy and speed comparison of the four models:

Model                 Acc    Time/ms
Dual-stream network   0.85   146
3DRes-18              0.69   129
Yolov3+Res-18         0.95   210
I3D                   0.92   130
At this stage, the dual-stream network requires optical-flow data to be extracted from the RGB video in advance, which means a gesture recognition model built on it cannot run in real time. Training 3DRes-18 on the video data shows low recognition accuracy with much room for improvement. Training Yolov3+Res-18 shows good accuracy, but gesture detection takes too long for good real-time response. Training I3D shows that the I3D model performs well in both recognition speed and accuracy; the test-set accuracy and recognition time of each model are listed in the table above. Comparing all aspects, this embodiment adopts I3D as the base model for further training and result optimization.
Step 5: load the gesture recognition model to predict and classify the hand motion videos.
Step 6: convert the recognized gesture category into the corresponding control function of the conference system, so that gestures control opening the presentation, starting the slide show, and turning pages.
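Step 6 amounts to a small dispatch table from gesture class to presentation control. A sketch assuming the `pyautogui` library drives the slide software with key presses; the gesture-to-key mapping below is an illustrative assumption, since the filing's mapping table is only an image:

```python
import pyautogui

# Hypothetical mapping from predicted gesture class to a presentation action.
GESTURE_ACTIONS = {
    "click": lambda: pyautogui.press("f5"),     # e.g., start the slide show
    "grab":  lambda: pyautogui.press("esc"),    # e.g., exit the slide show
    "pan":   lambda: pyautogui.press("right"),  # e.g., turn to the next page
}

def dispatch_command(gesture: str) -> None:
    """Conference presentation system response module: turn a predicted
    gesture label into a control instruction for the slide software."""
    action = GESTURE_ACTIONS.get(gesture)
    if action is not None:
        action()

dispatch_command("pan")  # example: advance one slide
```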
With this gesture-recognition-based presentation control system, the presenter needs no external device: an ordinary camera on the PC captures the dynamic gesture video, the corresponding control instruction is issued, and the convenience and fluency of the presentation are improved.
It should be understood that the above embodiment is a preferred example of the invention, and the invention is not limited to it; those skilled in the art may add or substitute technical features without departing from the technical solution of the invention, and the resulting technical solutions also fall within the scope of protection of the invention.

Claims (6)

1. A conference presentation system based on dynamic gesture control in video, characterized in that it consists of a real-time video acquisition module, a continuous gesture segmentation module, a video redundancy removal module, a gesture recognition module, and a conference presentation system response module connected in sequence, wherein:
the real-time video acquisition module is used for acquiring the current video stream in real time with a camera;
the continuous gesture segmentation module is used for splitting the continuous gestures in the video stream, segmenting them into independent gesture clips, and sending the independent gesture video clips to the video redundancy removal module;
the video redundancy removal module is used for removing the redundant regions of each single-gesture video clip, screening the effective information in the clip through a coarse redundancy removal unit followed by a fine redundancy removal unit, and sending the condensed independent gesture video clips to the gesture recognition module;
the gesture recognition module is used for recognizing each received independent single-gesture video: a gesture recognition model is trained on a pre-recorded data set, the resulting gesture feature model is used to predict and classify the detected hand video, and the prediction result is finally sent to the conference presentation system response module;
the conference presentation system response module is used for converting the received gesture category prediction into a control instruction and sending that instruction to the processor to open the presentation, start the slide show, and turn pages.
2. The conference presentation system based on dynamic gesture control in video according to claim 1, characterized in that the continuous gesture segmentation module uses a hand discriminator algorithm to judge whether a hand is continuously visible in the presentation area, segments the continuous gestures into independent gesture clips according to the degree of hand visibility, and sends the independent gesture video clips to the video redundancy removal module.
3. The conference presentation system based on dynamic gesture control in video according to claim 1, characterized in that the video redundancy removal module screens the effective information in each video clip through two units, coarse redundancy removal and fine redundancy removal, wherein:
the coarse redundancy removal unit is used for screening out and deleting the irrelevant gesture segments at the start and end of the video;
the fine redundancy removal unit is designed to filter similar frames within the video, condensing the video information in order to speed up the gesture recognition module.
4. The conference presentation system based on dynamic gesture control in video according to claim 1, characterized in that the gesture recognition module comprises a recorded data set unit, a gesture recognition model unit, and a gesture category prediction unit, wherein:
the gesture recognition model unit is used for training on the constructed gesture data set, learning the feature information of the different gesture categories, and saving it as a gesture feature model;
the gesture category prediction unit is used for predicting the gesture category of a hand motion video;
the recorded data set unit is obtained by collecting, recording, and organizing data from 18 demonstrators against a white-wall background under normal indoor illumination; each demonstrator sits 1 m from the camera and performs three gesture actions: click, grab, and translate.
5. The conference presentation system based on dynamic gesture control in video according to claim 4, characterized in that each gesture is a one-handed action performed with either the left or the right hand.
6. The conference presentation system based on dynamic gesture control in video according to claim 1, characterized in that the camera in the real-time video acquisition module is an ordinary camera.
CN202210145445.3A 2022-02-17 2022-02-17 Conference demonstration system based on dynamic gesture control in video Pending CN114564104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210145445.3A CN114564104A (en) 2022-02-17 2022-02-17 Conference demonstration system based on dynamic gesture control in video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210145445.3A CN114564104A (en) 2022-02-17 2022-02-17 Conference demonstration system based on dynamic gesture control in video

Publications (1)

Publication Number Publication Date
CN114564104A true CN114564104A (en) 2022-05-31

Family

ID=81713262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210145445.3A Pending CN114564104A (en) 2022-02-17 2022-02-17 Conference demonstration system based on dynamic gesture control in video

Country Status (1)

Country Link
CN (1) CN114564104A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968178A (en) * 2012-11-07 2013-03-13 电子科技大学 Gesture-based PPT (Power Point) control system
JP2018124801A (en) * 2017-02-01 2018-08-09 株式会社エクスビジョン Gesture recognition device and gesture recognition program
CN107092349A (en) * 2017-03-20 2017-08-25 重庆邮电大学 A kind of sign Language Recognition and method based on RealSense
WO2019023921A1 (en) * 2017-08-01 2019-02-07 华为技术有限公司 Gesture recognition method, apparatus, and device
WO2021012513A1 (en) * 2019-07-19 2021-01-28 平安科技(深圳)有限公司 Gesture operation method and apparatus, and computer device
CN113408328A (en) * 2020-03-16 2021-09-17 哈尔滨工业大学(威海) Gesture segmentation and recognition algorithm based on millimeter wave radar

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Nagashree R N, Stafford Michahial, Aishwarya G N, Beebi Hajira Azeez, Jayalakshmi M R, R Krupa Rani: "Hand gesture recognition using support vector machine", The International Journal of Engineering and Science (IJES), vol. 4, no. 6
姬晓飞; 王治博; 王昱: "Design and implementation of an interactive demonstration system for video gesture recognition", Journal of Shenyang Aerospace University, no. 02

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980683A (en) * 2023-09-25 2023-10-31 深圳硬之城信息技术有限公司 Slide show method, device and storage medium based on video
CN116980683B (en) * 2023-09-25 2024-04-16 深圳硬之城信息技术有限公司 Slide show method, device and storage medium based on video

Similar Documents

Publication Publication Date Title
CN108352174B (en) Electronic device, storage device, and method for image processing
US10424341B2 (en) Dynamic video summarization
JP4499380B2 (en) System and method for whiteboard and audio capture
JP4258090B2 (en) Video frame classification method, segmentation method, and computer-readable storage medium
CN111488791A (en) On-device classification of fingertip movement patterns as gestures in real time
JP2012506589A (en) Method, system and related modules, and software components for providing an image sensor human machine interface
Chatila et al. Integrated planning and execution control of autonomous robot actions
CN110334753B (en) Video classification method and device, electronic equipment and storage medium
JP2000298498A (en) Segmenting method of audio visual recording substance, computer storage medium and computer system
CN110619284B (en) Video scene division method, device, equipment and medium
CN110942011A (en) Video event identification method, system, electronic equipment and medium
CN110708606A (en) Method for intelligently editing video
CN114564104A (en) Conference demonstration system based on dynamic gesture control in video
Kota et al. Automated detection of handwritten whiteboard content in lecture videos for summarization
CN108377407B (en) Panoramic video processing method and device and electronic equipment
Xu et al. Content extraction from lecture video via speaker action classification based on pose information
JP4110323B2 (en) Information output method and apparatus, program, and computer-readable storage medium storing information output program
CN112115131A (en) Data denoising method, device and equipment and computer readable storage medium
CN114245232B (en) Video abstract generation method and device, storage medium and electronic equipment
CN111062284A (en) Visual understanding and diagnosing method of interactive video abstract model
CN114245032B (en) Automatic switching method and system for video framing, video player and storage medium
CN111860086A (en) Gesture recognition method, device and system based on deep neural network
Ye et al. Vics: A modular vision-based hci framework
CN114666503A (en) Photographing method and device, storage medium and electronic equipment
CN114333056A (en) Gesture control method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination