WO2017000764A1 - Gesture detection and recognition method and system - Google Patents

Gesture detection and recognition method and system

Info

Publication number
WO2017000764A1
WO2017000764A1 · PCT/CN2016/085625 · CN2016085625W
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
target
detection
image
skin color
Prior art date
Application number
PCT/CN2016/085625
Other languages
English (en)
French (fr)
Inventor
张宏鑫
Original Assignee
芋头科技(杭州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 芋头科技(杭州)有限公司
Priority to EP16817128.8A (patent EP3318955A4)
Priority to JP2017567753A (patent JP6608465B2)
Priority to US15/739,274 (patent US10318800B2)
Publication of WO2017000764A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2431 - Multiple classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 - Detection arrangements using opto-electronic means

Definitions

  • the invention relates to the field of human-computer interaction, and in particular to a gesture detection and recognition method and system based on a robot system.
  • Gesture detection and state recognition technologies generally use 2D or 3D techniques. Since the hand is an elastic object, instances of the same gesture can differ considerably, different gestures may look similar, and different people perform the same gesture differently; gestures also carry a great deal of redundant information, since people unconsciously produce very many gestures. Recognition technology therefore demands high computing power and high recognition accuracy.
  • the existing recognition techniques cannot quickly recognize multi-gesture changes; their recognition accuracy is low and their real-time performance poor. They are also sensitive to light: illumination of different intensities and directions (such as polarized light or an uncompensated light source) produces different shadows that directly affect recognition accuracy, making it impossible to extract the target region of interest against a complex background.
  • a gesture detection and recognition method includes the following steps:
  • A1. Collect images and store them;
  • A2. Use a plurality of preset classifiers for detecting different gestures to detect each frame of the image in a preset order, alternating between classifiers from frame to frame, to obtain a gesture target;
  • A3. Build a skin color model based on the pixel distribution of the gesture target region; then obtain, from the skin color model, the gesture frequencies of the two states before and after the gesture target, match them against the preset gesture states to obtain the gesture transition state, and output it.
  • the image is pre-processed prior to performing step A2.
  • each of the classifiers performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
  • after the gesture target is acquired, the window is enlarged fourfold to detect the gesture target.
  • the classifier employs a cascade classifier.
  • a gesture detection and recognition system includes:
  • an acquisition unit for collecting images;
  • a storage unit, connected to the acquisition unit, for storing the image;
  • a plurality of classifiers for detecting different gestures, each connected to the storage unit, for detecting each frame of the image in a preset order, alternating between classifiers from frame to frame, to obtain a gesture target;
  • a skin color modeling unit connected to the storage unit, to establish a skin color model based on a pixel distribution of the gesture target area;
  • a decision unit, connected to the plurality of classifiers and to the skin color modeling unit, which obtains from the skin color model the gesture frequencies of the two states before and after the gesture target and matches them against the preset gesture states to obtain and output the gesture transition state.
  • the acquisition unit uses a camera.
  • the classifier uses a cascade classifier.
  • the classifier performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
  • after acquiring the gesture target, the classifier enlarges the window fourfold to detect the gesture target.
  • the gesture detection and recognition method performs real-time skin color modeling from the pixel distribution of the detected gesture target region, so that skin color can be extracted in a specific scene and the effect of drastic illumination changes can be gradually eliminated, thereby extracting the gesture transition state.
  • the gesture detection and recognition system can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
  • FIG. 1 is a block diagram of an embodiment of a gesture detection and recognition system according to the present invention.
  • Figure 2 is a graph of gesture frequency for the fist-to-palm and palm-to-fist transitions;
  • Figure 3 is a schematic diagram of a gesture music control system.
  • a gesture detection and recognition method includes the following steps:
  • A1. Collect images and store them
  • the gesture frequencies of the two states before and after the gesture target are obtained, matched against the preset gesture states to obtain the gesture transition state, and output.
  • the gesture detection and recognition method performs real-time skin color modeling from the pixel distribution of the detected gesture target region, so that skin color can be extracted in a specific scene and the effect of drastic illumination changes can be gradually eliminated, thereby extracting the gesture transition state.
  • the gesture detection and recognition method can be applied to a robot system: the robot can capture gestures of various postures appearing anywhere in its field of view under all kinds of illumination, including polarized light or uncompensated light sources, and can acquire the gesture transition state in real time.
  • the color space of the detected gesture target region image can be converted to YUV (a color encoding used by European television systems; it is the color space adopted by the PAL and SECAM analog color television standards), and the Y component is removed to eliminate the effect of illumination. Since the skin color pixels in this region follow a Gaussian distribution, the mean and variance of the U and V values in the region are computed to update the overall skin color mean and variance, so the skin color model can be built in real time, removing the background and improving accuracy.
  • the image is pre-processed prior to performing step A2.
  • the pre-processing in this embodiment can use histogram equalization, which adjusts gray values with a cumulative distribution function to enhance contrast, removing the effect of illumination and widening the dynamic range of pixel gray values, thereby enhancing the overall contrast of the image.
  • each classifier performs multi-scale target detection on the image through a predetermined sliding window to acquire a gesture target.
  • the classifier is trained using the AdaBoost algorithm.
  • AdaBoost is an iterative algorithm whose main idea is to train a number of different weak classifiers on a training set and then combine them into a strong classifier. It sets each sample's weight according to whether the sample was classified correctly in each round and the accuracy of the previous overall classification, and the next-level classifier is trained on the re-weighted data.
  • the resulting cascade classifier is a weighted combination of the classifiers obtained in each round of training.
  • the classifier can be trained using an LBP feature (Local Binary Pattern).
  • the LBP feature is an operator that describes the local texture of an image; it has notable advantages such as rotation invariance and grayscale invariance.
  • multi-scale target detection is performed on the image using a sliding window of the same size as the training images.
  • after the gesture target is acquired, the window is enlarged fourfold to detect the gesture target.
  • the detection window can be enlarged to serve as a prediction of the gesture target's position in the next frame, and only the image inside this window is taken from the next input frame, increasing detection speed.
  • the length and the width of the original window can each be doubled.
  • the classifier employs a cascade classifier.
  • the cascade classifier can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
  • a gesture detection and recognition system includes:
  • the acquisition unit 1 is configured to collect images;
  • the storage unit 2 is connected to the acquisition unit 1 for storing the images;
  • a plurality of classifiers 3 for detecting different gestures, each connected to the storage unit 2, detect each frame of the image in a preset order, alternating between classifiers from frame to frame, to obtain a gesture target;
  • the skin color modeling unit 4 is connected to the storage unit 2 for building a skin color model based on the pixel distribution of the gesture target region;
  • the decision unit 5 is connected to the plurality of classifiers 3 and to the skin color modeling unit 4; it obtains from the skin color model the gesture frequencies of the two states before and after the gesture target and matches them against the preset gesture states to obtain and output the gesture transition state.
  • the classifiers 3 in the gesture detection and recognition system can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
  • the skin color modeling unit 4 can perform real-time skin color modeling from the pixel distribution of the detected gesture target region; it can extract skin color for a specific scene and gradually eliminate the effect of drastic illumination changes.
  • the skin color modeling unit 4 can convert the color space of the detected gesture target region image to YUV and remove the Y component to eliminate the effect of illumination. Since the skin color pixels in this region follow a Gaussian distribution, the mean and variance of the U and V values in the region are computed to update the overall skin color mean and variance, so the skin color model can be built in real time, removing the background and improving accuracy.
  • the acquisition unit 1 employs a video camera.
  • the camera can adopt a high-definition camera with an acquisition speed of 30 frames/second.
  • the classifier 3 employs a cascade classifier.
  • the cascade classifiers 3 can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
  • the classifier 3 performs multi-scale object detection on the image through a preset sliding window to acquire a gesture target.
  • the classifiers 3 are trained using the AdaBoost algorithm.
  • AdaBoost is an iterative algorithm whose main idea is to train a number of different weak classifiers on a training set and then combine them into a strong classifier. It sets each sample's weight according to whether the sample was classified correctly in each round and the accuracy of the previous overall classification, and the next-level classifier is trained on the re-weighted data. The resulting cascade classifier is a weighted combination of the classifiers obtained in each round of training.
  • the classifier 3 can be trained using an LBP feature (Local Binary Pattern).
  • the LBP feature is an operator that describes the local texture of an image; it has notable advantages such as rotation invariance and grayscale invariance.
  • multi-scale target detection is performed on the image using a sliding window of the same size as the training images.
  • after acquiring the gesture target, the classifier 3 enlarges the window fourfold to detect it.
  • the detection window can be enlarged to serve as a prediction of the gesture target's position in the next frame, and only the image inside this window is taken from the next input frame, increasing detection speed.
  • the length and the width of the original window can each be doubled.
  • for each different gesture, a corresponding classifier can be trained.
  • under ideal conditions, the fist-to-palm and palm-to-fist gesture frequency curves should match those shown in Figure 2.
  • the intersection of the two curves marks the gesture state change.
  • once a gesture is detected, the nearby region is selected as the detection window for the next frame to increase detection speed and reduce the false detection rate.
  • a short sliding window is used when computing the gesture frequency F, its length related to the gesture change time. Since the abscissa of the intersection of the two frequencies f1 and f2 is not necessarily an integer, a threshold T is set: when the absolute difference between f1 and f2 falls within T, a state change is considered to have occurred.
  • this threshold T has a large influence on response speed and accuracy.
  • the fist-to-palm and palm-to-fist changes usually occur within 0.5 seconds, so a sliding window 15 frames long can be chosen.
  • detection and recognition speed can be increased and the false detection rate reduced.
  • the defined frequency function smooths false-detection noise, and the corresponding state change is recognized from the change in frequency, quickly and accurately.
  • the response speed can be kept within 100 ms.
  • the gesture detection and recognition technique can be applied to gesture-based music control: a high-definition camera can be connected to the robot's embedded system through a MIPI or USB interface.
  • the robot's embedded computing system can include hardware and software operating environments, and the system includes an image acquisition unit, a gesture detection and recognition unit, and a music playback unit.
  • the specific control flow of the gesture music control system is: while playing music, the robot sends a request to the image acquisition unit; the driver software accepts the request and passes the images captured by the camera to the gesture detection and recognition unit, which detects and identifies the specific gesture; the computed result is sent to the music playback unit, which then executes the pre-assigned command.
  • when the user makes a fist (palm-to-fist), the music pauses; when the user opens all five fingers (fist-to-palm), the music resumes.
  • the advantages of the present invention are that the pre-built skin color models used by existing recognition techniques are unsuitable for certain specific scenes, whereas the real-time skin color model used by the present invention adapts to the scene at hand and can eliminate the effect of drastic illumination changes;
  • this technical solution can be embedded in a robot system, so LBP features are used; they involve only integer arithmetic and, compared with the Histogram of Oriented Gradients (HOG), greatly reduce the amount of computation, making the system faster;
  • the invention predicts the position of the gesture target region from the previous frame, thereby shrinking the image area, which greatly increases running speed, eliminates part of the background, and lowers the false detection rate; using different gesture classifiers on alternating frames increases detection speed; the gesture frequency smooths false-detection noise, and a short sliding window responds to gesture state changes in real time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gesture detection and recognition method and system. The method is: capture images and store them; use a plurality of preset classifiers for detecting different gestures to detect each frame of the image in a preset order, alternating between classifiers from frame to frame, so as to acquire a gesture target; build a skin color model based on the pixel distribution of the gesture target region; obtain, from the skin color model, the gesture frequencies of the two states before and after the gesture target, and match the gesture frequencies against preset gesture states to obtain and output the gesture transition state. The gesture detection and recognition method can extract skin color in a specific scene and gradually eliminate the effect of drastic illumination changes, thereby extracting the gesture transition state. The gesture detection and recognition system can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.

Description

Gesture detection and recognition method and system
Technical Field
The present invention relates to the field of human-computer interaction, and in particular to a gesture detection and recognition method and system based on a robot system.
Background Art
With the development of computer technology, the processing power of computers keeps growing, traditional human-computer interaction techniques increasingly fail to meet users' needs, and people have begun to look for more natural and intelligent ways of interacting. Gesture detection and state recognition technologies generally use 2D or 3D techniques. Since the hand is an elastic object, instances of the same gesture can differ considerably, different gestures may look similar, different people perform the same gesture differently, and gestures carry a great deal of redundant information: people unconsciously produce very many gestures. Recognition technology therefore demands high computing power and high recognition accuracy. Existing recognition techniques, however, cannot quickly recognize multi-gesture changes; their accuracy is low and their real-time performance poor. They are also sensitive to light: illumination of different intensities and directions (such as polarized light or an uncompensated light source) produces different shadows that directly affect recognition accuracy, making it impossible to extract the hand region of interest against a complex background.
Summary of the Invention
In view of the above problems with existing recognition techniques, a gesture detection and recognition method and system are now provided, aiming to recognize gesture changes quickly under polarized or uncompensated light sources.
The specific technical solution is as follows:
A gesture detection and recognition method comprises the following steps:
A1. Capture images and store them;
A2. Use a plurality of preset classifiers for detecting different gestures to detect each frame of the image in a preset order, alternating between classifiers from frame to frame, so as to acquire a gesture target;
A3. Build a skin color model based on the pixel distribution of the gesture target region;
A4. Obtain, from the skin color model, the gesture frequencies of the two states before and after the gesture target, and match the gesture frequencies against preset gesture states to obtain and output the gesture transition state.
Preferably, the image is pre-processed before step A2 is performed.
Preferably, each classifier performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
Preferably, after the gesture target is acquired, the window is enlarged fourfold to detect the gesture target.
Preferably, the classifier is a cascade classifier.
A gesture detection and recognition system comprises:
an acquisition unit for capturing images;
a storage unit, connected to the acquisition unit, for storing the images;
a plurality of classifiers for detecting different gestures, each connected to the storage unit, for detecting each frame of the image in a preset order, alternating between classifiers from frame to frame, so as to acquire a gesture target;
a skin color modeling unit, connected to the storage unit, for building a skin color model based on the pixel distribution of the gesture target region;
a decision unit, connected to the plurality of classifiers and to the skin color modeling unit, for obtaining from the skin color model the gesture frequencies of the two states before and after the gesture target, and matching the gesture frequencies against preset gesture states to obtain and output the gesture transition state.
The acquisition unit is a camera.
The classifier is a cascade classifier.
Each classifier performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
After acquiring the gesture target, the classifier enlarges the window fourfold to detect the gesture target.
The beneficial effects of the above technical solution are:
In this technical solution, the gesture detection and recognition method performs real-time skin color modeling from the pixel distribution of the detected gesture target region, so that skin color can be extracted in a specific scene and the effect of drastic illumination changes can be gradually eliminated, thereby extracting the gesture transition state. The gesture detection and recognition system can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
Brief Description of the Drawings
Figure 1 is a block diagram of an embodiment of the gesture detection and recognition system of the present invention;
Figure 2 is a graph of gesture frequency for the fist-to-palm and palm-to-fist transitions;
Figure 3 is a schematic diagram of the gesture music control system.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without creative effort fall within the scope of protection of the invention.
It should be noted that, where no conflict arises, the embodiments of the invention and the features in the embodiments may be combined with one another.
The invention is further described below with reference to the drawings and specific embodiments, which are not to be taken as limiting it.
A gesture detection and recognition method comprises the following steps:
A1. Capture images and store them;
A2. Use a plurality of preset classifiers for detecting different gestures to detect each frame of the image in a preset order, alternating between classifiers from frame to frame, so as to acquire a gesture target;
A3. Build a skin color model based on the pixel distribution of the gesture target region;
A4. Obtain, from the skin color model, the gesture frequencies of the two states before and after the gesture target, and match the gesture frequencies against preset gesture states to obtain and output the gesture transition state.
In this embodiment, the gesture detection and recognition method performs real-time skin color modeling from the pixel distribution of the detected gesture target region, so that skin color can be extracted in a specific scene and the effect of drastic illumination changes can be gradually eliminated, thereby extracting the gesture transition state. The method can be applied in a robot system: the robot can capture gestures of various postures appearing anywhere in its field of view under all kinds of illumination, including polarized light or uncompensated light sources, and can acquire the gesture transition state in real time.
When building the skin color model, the color space of the detected gesture target region image can be converted to YUV (a color encoding used by European television systems; it is the color space adopted by the PAL and SECAM analog color television standards), and the Y component is removed to eliminate the effect of illumination. Since the skin color pixels in this region follow a Gaussian distribution, the mean and variance of the U and V values in the region are computed to update the overall skin color mean and variance, so the skin color model can be built in real time, removing the background and improving accuracy.
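As a concrete illustration, the following minimal Python/OpenCV sketch maintains such a Gaussian UV skin model. The class name, the exponential blending rate alpha, and the 2-sigma mask test are assumptions made for illustration; the text only specifies that the regional mean and variance update the overall skin color statistics.

```python
import cv2
import numpy as np

class SkinModel:
    """Running Gaussian skin-color model over the U and V channels."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # blending rate (assumed value)
        self.mean = None        # (2,) running mean of U, V
        self.var = None         # (2,) running variance of U, V

    def update(self, bgr_roi):
        """Update the model from the detected gesture target region."""
        yuv = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2YUV)
        uv = yuv[:, :, 1:3].reshape(-1, 2).astype(np.float64)  # drop Y
        m, v = uv.mean(axis=0), uv.var(axis=0)
        if self.mean is None:
            self.mean, self.var = m, v
        else:   # blend regional statistics into the overall model
            self.mean = (1 - self.alpha) * self.mean + self.alpha * m
            self.var = (1 - self.alpha) * self.var + self.alpha * v

    def mask(self, bgr):
        """Binary mask of pixels whose UV values lie near the mean
        (roughly within 2 sigma); used to remove the background."""
        yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV).astype(np.float64)
        d = (yuv[:, :, 1:3] - self.mean) ** 2 / (self.var + 1e-6)
        return (d.sum(axis=2) < 4.0).astype(np.uint8) * 255
```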
In a preferred embodiment, the image is pre-processed before step A2 is performed.
The pre-processing in this embodiment can use histogram equalization, which adjusts gray values with a cumulative distribution function to enhance contrast, removing the effect of illumination and widening the dynamic range of pixel gray values, thereby enhancing the overall contrast of the image.
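A minimal sketch of this step, assuming the equalization is applied to the luma (Y) channel of each color frame (the text does not fix the channel):

```python
import cv2

def preprocess(bgr):
    """Histogram-equalize the luma channel to enhance overall contrast."""
    yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)
    yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])  # cumulative-function remap
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)
```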
In a preferred embodiment, each classifier performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
The classifiers are trained with the AdaBoost algorithm. AdaBoost is an iterative algorithm whose main idea is to train a number of different weak classifiers on a training set and then combine them into a strong classifier. It sets each sample's weight according to whether the sample was classified correctly in each round and the accuracy of the previous overall classification, and the next-level classifier is trained on the re-weighted data. The final cascade classifier is a weighted combination of the classifiers obtained in each round of training.
Further, the classifiers can be trained on LBP (Local Binary Pattern) features. The LBP feature is an operator that describes the local texture of an image; it has notable advantages such as rotation invariance and grayscale invariance.
In this embodiment, a sliding window of the same size as the training images is used to perform multi-scale target detection on the image.
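A detection sketch using OpenCV's cascade machinery, which implements this kind of boosted LBP cascade with a multi-scale sliding-window search; the cascade file names and the detection parameters below are assumptions (the cascades themselves would be trained offline as described above):

```python
import cv2

# Hypothetical cascades, one per gesture (fist and palm).
fist_cascade = cv2.CascadeClassifier("fist.xml")
palm_cascade = cv2.CascadeClassifier("palm.xml")

def detect(gray, cascade):
    """Slide the training-size window over an image pyramid."""
    hits = cascade.detectMultiScale(
        gray,
        scaleFactor=1.1,    # pyramid step between scales (assumed)
        minNeighbors=3,     # merge threshold for overlapping windows
        minSize=(24, 24))   # training window size is an assumption
    return tuple(hits[0]) if len(hits) else None  # (x, y, w, h) or None
```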
在优选的实施例中,获取手势目标后将窗口扩大4倍对手势目标进行检测。
由于在每帧图像之间手部运动变化距离并不大,为了提高速度,每当检测到手势目标后,可通过扩大检测窗口作为下一帧手势目标存在位置的预判,下一帧输入图像只取此窗口图像部分,以提高检测速度。
进一步地,可将原窗口的长度与宽度各扩大2倍的。
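Doubling each side gives a region four times the original area. A sketch of this prediction step (the helper name and the clipping policy are assumptions):

```python
def expand_roi(box, frame_shape):
    """Double the width and height of the last detection window,
    keeping it centered and clipped to the frame bounds."""
    x, y, w, h = box
    H, W = frame_shape[:2]
    nx, ny = max(0, x - w // 2), max(0, y - h // 2)
    nw, nh = min(W - nx, 2 * w), min(H - ny, 2 * h)
    return nx, ny, nw, nh

# Usage: search only gray[ny:ny+nh, nx:nx+nw] in the next frame and
# offset any hit by (nx, ny) back into full-frame coordinates.
```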
In a preferred embodiment, the classifier is a cascade classifier.
In this embodiment, the cascade classifier can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
As shown in Figure 1, a gesture detection and recognition system comprises:
an acquisition unit 1 for capturing images;
a storage unit 2, connected to the acquisition unit 1, for storing the images;
a plurality of classifiers 3 for detecting different gestures, each connected to the storage unit 2, for detecting each frame of the image in a preset order, alternating between classifiers from frame to frame, so as to acquire a gesture target;
a skin color modeling unit 4, connected to the storage unit 2, for building a skin color model based on the pixel distribution of the gesture target region;
a decision unit 5, connected to the plurality of classifiers 3 and to the skin color modeling unit 4, for obtaining from the skin color model the gesture frequencies of the two states before and after the gesture target, and matching the gesture frequencies against preset gesture states to obtain and output the gesture transition state.
In this embodiment, the classifiers 3 in the gesture detection and recognition system can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
The skin color modeling unit 4 performs real-time skin color modeling from the pixel distribution of the detected gesture target region; it can extract skin color for a specific scene and gradually eliminate the effect of drastic illumination changes. The unit can convert the color space of the detected gesture target region image to YUV and remove the Y component to eliminate the effect of illumination. Since the skin color pixels in this region follow a Gaussian distribution, the mean and variance of the U and V values in the region are computed to update the overall skin color mean and variance, so the skin color model can be built in real time, removing the background and improving accuracy.
In a preferred embodiment, the acquisition unit 1 is a video camera.
Further, the camera can be a high-definition camera capturing at 30 frames per second.
In a preferred embodiment, the classifiers 3 are cascade classifiers.
In this embodiment, the cascade classifiers 3 can detect gestures under different lighting, shooting angles, sizes and skin colors, with a recognition accuracy above 90%.
In a preferred embodiment, each classifier 3 performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
The classifiers 3 are trained with the AdaBoost algorithm. AdaBoost is an iterative algorithm whose main idea is to train a number of different weak classifiers on a training set and then combine them into a strong classifier. It sets each sample's weight according to whether the sample was classified correctly in each round and the accuracy of the previous overall classification, and the next-level classifier is trained on the re-weighted data. The final cascade classifier is a weighted combination of the classifiers obtained in each round of training.
Further, the classifiers 3 can be trained on LBP (Local Binary Pattern) features. The LBP feature is an operator that describes the local texture of an image; it has notable advantages such as rotation invariance and grayscale invariance.
In this embodiment, a sliding window of the same size as the training images is used to perform multi-scale target detection on the image.
In a preferred embodiment, after acquiring the gesture target, the classifier 3 enlarges the window fourfold to detect it.
Since the hand does not move far between consecutive frames, to increase speed, whenever a gesture target is detected the detection window can be enlarged to serve as a prediction of where the gesture target will be in the next frame, and only the image inside this window is taken from the next input frame, increasing detection speed.
Further, the length and the width of the original window can each be doubled.
For different gestures, a classifier corresponding to each gesture can be trained. Take, as an example of gesture detection and recognition, a fist classifier and a palm classifier trained for a specific fist-palm pair. To increase computation speed, the different classifiers can be applied on alternating frames: in real life a gesture can remain constant for some time, so if one classifier detects a gesture in one frame and the other classifier detects nothing in the next frame, the previous gesture state can be assumed to persist. To recognize state changes, a gesture frequency F(gesture) = gesture presence time / detection time is defined; it smooths false detections and reduces their interference with state recognition. Under ideal conditions the fist-to-palm and palm-to-fist gesture frequency curves should match Figure 2, and the intersection of the two curves marks the gesture state change. In practice, once a gesture is detected, the region around it is chosen as the detection window for the next frame, increasing detection speed and lowering the false detection rate. To respond quickly to gesture changes, a short sliding window is used when computing the gesture frequency F, its length related to the gesture change time. Since the abscissa of the intersection of the two frequencies f1 and f2 is not necessarily an integer, a threshold T is set: when the absolute difference between f1 and f2 falls within T, a state change is considered to have occurred. This threshold T has a large influence on response speed and accuracy. Observing the frequency curves shows that in going from state B to state C, f1 falls and f2 rises; the two computed gesture frequencies therefore reveal whether the change is fist-to-palm or palm-to-fist.
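A sketch of the alternating-frame scheme, reusing the detect helper and the hypothetical fist/palm cascades from the earlier sketch; the even/odd frame assignment and the state variable are illustrative assumptions:

```python
from collections import deque

hist_fist = deque(maxlen=15)   # 15-frame sliding windows (see below)
hist_palm = deque(maxlen=15)
state = None                   # last gesture seen ("fist" or "palm")

def step(frame_idx, gray):
    """Run one classifier per frame; a miss keeps the previous state."""
    global state
    if frame_idx % 2 == 0:                 # even frames: fist cascade
        if detect(gray, fist_cascade) is not None:
            state = "fist"
    else:                                  # odd frames: palm cascade
        if detect(gray, palm_cascade) is not None:
            state = "palm"
    # Record per-gesture presence for the frequency estimate F.
    hist_fist.append(1 if state == "fist" else 0)
    hist_palm.append(1 if state == "palm" else 0)
```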
In the gesture detection and recognition process, the fist-to-palm and palm-to-fist changes usually occur within 0.5 seconds, so a sliding window 15 frames long can be chosen. Alternating the classifiers between frames and shrinking the detection range increase detection and recognition speed while lowering the false detection rate; the defined frequency function smooths false-detection noise, and the corresponding state change is recognized from the change in frequency, quickly and accurately, keeping the response time within 100 ms.
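Continuing the sketch above, the decision step computes the two frequencies over the 15-frame window (about 0.5 s at 30 fps) and reports a transition when the curves cross; the numeric threshold T is an assumed value, since the text fixes none:

```python
T = 0.15  # assumed crossing threshold; tune for speed vs. accuracy

def transition(prev):
    """Return "fist-palm", "palm-fist", or None for the current window."""
    f1 = sum(hist_fist) / max(len(hist_fist), 1)  # fist frequency
    f2 = sum(hist_palm) / max(len(hist_palm), 1)  # palm frequency
    if abs(f1 - f2) < T:                 # the two curves are crossing
        if prev == "fist" and f2 > f1:   # f1 falling, f2 rising
            return "fist-palm"
        if prev == "palm" and f1 > f2:
            return "palm-fist"
    return None
```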
Building on the above technical solution, and as shown in Figure 3, the gesture detection and recognition technique can be applied to gesture-based music control: a high-definition camera can be connected to the robot's embedded system through a MIPI or USB interface; the robot's embedded computing system can comprise hardware and software operating environments, and the system includes an image acquisition unit, a gesture detection and recognition unit, and a music playback unit.
The specific control flow of the gesture music control system is: while playing music, the robot sends a request to the image acquisition unit; the driver software accepts the request and passes the images captured by the camera to the gesture detection and recognition unit, which detects and identifies the specific gesture; the computed result is sent to the music playback unit, which then executes the pre-assigned command. For example: when the user makes a fist (palm-to-fist), the music pauses; when the user opens all five fingers (fist-to-palm), the music resumes.
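The mapping from a recognized transition to a playback command could look like the following sketch; player.pause() and player.play() stand in for whatever playback API the robot exposes and are placeholders, not a real interface:

```python
COMMANDS = {"palm-fist": "pause", "fist-palm": "play"}

def on_transition(change, player):
    """Dispatch a recognized gesture transition to the music player."""
    cmd = COMMANDS.get(change)
    if cmd == "pause":
        player.pause()   # user closed the hand into a fist
    elif cmd == "play":
        player.play()    # user opened the hand into a palm
```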
The advantages of the present invention are: the pre-built skin color models used by existing recognition techniques are unsuitable for certain specific scenes, whereas the real-time skin color model used by the present invention adapts to the scene at hand and can eliminate the effect of drastic illumination changes; this technical solution can be embedded in a robot system, so LBP features are used, which involve only integer arithmetic and, compared with the Histogram of Oriented Gradients (HOG), greatly reduce the amount of computation, making the system faster; the invention predicts the position of the gesture target region from the previous frame, thereby shrinking the image area, which greatly increases running speed, eliminates part of the background, and lowers the false detection rate; using different gesture classifiers on alternating frames increases detection speed; the gesture frequency smooths false-detection noise, and a short sliding window responds to gesture state changes in real time.
The above are only preferred embodiments of the present invention and do not thereby limit its embodiments or scope of protection. Those skilled in the art should appreciate that all schemes obtained by equivalent substitution and obvious variation based on the specification and drawings of the present invention fall within the scope of protection of the invention.

Claims (10)

  1. A gesture detection and recognition method, characterized by comprising the following steps:
    A1. Capture images and store them;
    A2. Use a plurality of preset classifiers for detecting different gestures to detect each frame of the image in a preset order, alternating between classifiers from frame to frame, so as to acquire a gesture target;
    A3. Build a skin color model based on the pixel distribution of the gesture target region;
    A4. Obtain, from the skin color model, the gesture frequencies of the two states before and after the gesture target, and match the gesture frequencies against preset gesture states to obtain and output the gesture transition state.
  2. The gesture detection and recognition method of claim 1, characterized in that the image is pre-processed before step A2 is performed.
  3. The gesture detection and recognition method of claim 1, characterized in that each classifier performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
  4. The gesture detection and recognition method of claim 3, characterized in that after the gesture target is acquired, the window is enlarged fourfold to detect the gesture target.
  5. The gesture detection and recognition method of claim 1, characterized in that the classifier is a cascade classifier.
  6. A gesture detection and recognition system, characterized by comprising:
    an acquisition unit for capturing images;
    a storage unit, connected to the acquisition unit, for storing the images;
    a plurality of classifiers for detecting different gestures, each connected to the storage unit, for detecting each frame of the image in a preset order, alternating between classifiers from frame to frame, so as to acquire a gesture target;
    a skin color modeling unit, connected to the storage unit, for building a skin color model based on the pixel distribution of the gesture target region;
    a decision unit, connected to the plurality of classifiers and to the skin color modeling unit, for obtaining from the skin color model the gesture frequencies of the two states before and after the gesture target, and matching the gesture frequencies against preset gesture states to obtain and output the gesture transition state.
  7. The gesture detection and recognition system of claim 6, characterized in that the acquisition unit is a camera.
  8. The gesture detection and recognition system of claim 6, characterized in that the classifier is a cascade classifier.
  9. The gesture detection and recognition system of claim 6, characterized in that each classifier performs multi-scale target detection on the image through a preset sliding window to acquire the gesture target.
  10. The gesture detection and recognition system of claim 9, characterized in that after acquiring the gesture target, the classifier enlarges the window fourfold to detect the gesture target.
PCT/CN2016/085625 2015-06-30 2016-06-13 Gesture detection and recognition method and system WO2017000764A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP16817128.8A EP3318955A4 (en) 2015-06-30 2016-06-13 Gesture detection and recognition method and system
JP2017567753A JP6608465B2 (ja) 2015-06-30 2016-06-13 ジェスチャーの検知識別の方法及びシステム
US15/739,274 US10318800B2 (en) 2015-06-30 2016-06-13 Gesture detection and recognition method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510381721.6A 2015-06-30 2015-06-30 Gesture detection and recognition method and system
CN201510381721.6 2015-06-30

Publications (1)

Publication Number Publication Date
WO2017000764A1 (zh) 2017-01-05

Family

ID=57607761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/085625 WO2017000764A1 (zh) 2015-06-30 2016-06-13 Gesture detection and recognition method and system

Country Status (7)

Country Link
US (1) US10318800B2 (zh)
EP (1) EP3318955A4 (zh)
JP (1) JP6608465B2 (zh)
CN (1) CN106325485B (zh)
HK (1) HK1231590A1 (zh)
TW (1) TW201701187A (zh)
WO (1) WO2017000764A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490125A * 2019-08-15 2019-11-22 成都睿晓科技有限公司 Service quality detection system for a fueling area based on automatic gesture detection
CN110728185A * 2019-09-10 2020-01-24 西安工业大学 Detection method for determining whether a driver is holding a mobile phone in a call
US11169614B2 (en) * 2017-10-24 2021-11-09 Boe Technology Group Co., Ltd. Gesture detection method, gesture processing device, and computer readable storage medium
US11792189B1 (en) * 2017-01-09 2023-10-17 United Services Automobile Association (Usaa) Systems and methods for authenticating a user using an image capture device

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562207A * 2017-09-21 2018-01-09 深圳市晟达机械设计有限公司 Intelligent medical system based on gesture recognition control
CN108038452B * 2017-12-15 2020-11-03 厦门瑞为信息技术有限公司 Fast household-appliance gesture detection and recognition method based on local image enhancement
CN108255308A * 2018-02-11 2018-07-06 北京光年无限科技有限公司 Virtual-human-based gesture interaction method and system
CN109961016B * 2019-02-26 2022-10-14 南京邮电大学 Accurate multi-gesture segmentation method for smart home scenarios
CN111652017B * 2019-03-27 2023-06-23 上海铼锶信息技术有限公司 Dynamic gesture recognition method and system
CN110751082B * 2019-10-17 2023-12-12 烟台艾易新能源有限公司 Gesture command recognition method for a smart home entertainment system
CN112686169A * 2020-12-31 2021-04-20 深圳市火乐科技发展有限公司 Gesture recognition control method and apparatus, electronic device and storage medium
CN114967905A * 2021-02-26 2022-08-30 广州视享科技有限公司 Gesture control method and apparatus, computer-readable storage medium and electronic device
CN113297956B * 2021-05-22 2023-12-08 温州大学 Vision-based gesture recognition method and system
CN113609976B * 2021-08-04 2023-07-21 燕山大学 Direction-sensitive multi-gesture recognition system and method based on WiFi devices

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763515A * 2009-09-23 2010-06-30 中国科学院自动化研究所 Real-time gesture interaction method based on computer vision
CN102508547A * 2011-11-04 2012-06-20 哈尔滨工业大学深圳研究生院 Method and system for constructing a gesture input method based on computer vision
CN103376890A * 2012-04-16 2013-10-30 富士通株式会社 Vision-based gesture remote control system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7095401B2 (en) * 2000-11-02 2006-08-22 Siemens Corporate Research, Inc. System and method for gesture interface
TW201123031A (en) * 2009-12-24 2011-07-01 Univ Nat Taiwan Science Tech Robot and method for recognizing human faces and gestures thereof
US20120169860A1 (en) * 2010-06-30 2012-07-05 Guan Lian Method for detection of a body part gesture to initiate a web application
US20130110804A1 (en) * 2011-10-31 2013-05-02 Elwha LLC, a limited liability company of the State of Delaware Context-sensitive query enrichment
US9335826B2 (en) * 2012-02-29 2016-05-10 Robert Bosch Gmbh Method of fusing multiple information sources in image-based gesture recognition system
US9111135B2 (en) * 2012-06-25 2015-08-18 Aquifi, Inc. Systems and methods for tracking human hands using parts based template matching using corresponding pixels in bounded regions of a sequence of frames that are a specified distance interval from a reference camera
US9697418B2 (en) * 2012-07-09 2017-07-04 Qualcomm Incorporated Unsupervised movement detection and gesture recognition
US9129155B2 (en) * 2013-01-30 2015-09-08 Aquifi, Inc. Systems and methods for initializing motion tracking of human hands using template matching within bounded regions determined using a depth map

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763515A * 2009-09-23 2010-06-30 中国科学院自动化研究所 Real-time gesture interaction method based on computer vision
CN102508547A * 2011-11-04 2012-06-20 哈尔滨工业大学深圳研究生院 Method and system for constructing a gesture input method based on computer vision
CN103376890A * 2012-04-16 2013-10-30 富士通株式会社 Vision-based gesture remote control system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3318955A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11792189B1 (en) * 2017-01-09 2023-10-17 United Services Automobile Association (Usaa) Systems and methods for authenticating a user using an image capture device
US11169614B2 (en) * 2017-10-24 2021-11-09 Boe Technology Group Co., Ltd. Gesture detection method, gesture processing device, and computer readable storage medium
CN110490125A (zh) * 2019-08-15 2019-11-22 成都睿晓科技有限公司 Service quality detection system for a fueling area based on automatic gesture detection
CN110490125B (zh) * 2019-08-15 2023-04-18 成都睿晓科技有限公司 Service quality detection system for a fueling area based on automatic gesture detection
CN110728185A (zh) * 2019-09-10 2020-01-24 西安工业大学 Detection method for determining whether a driver is holding a mobile phone in a call

Also Published As

Publication number Publication date
US20180293433A1 (en) 2018-10-11
US10318800B2 (en) 2019-06-11
JP2018524726A (ja) 2018-08-30
EP3318955A4 (en) 2018-06-20
EP3318955A1 (en) 2018-05-09
HK1231590A1 (zh) 2017-12-22
CN106325485A (zh) 2017-01-11
JP6608465B2 (ja) 2019-11-20
TW201701187A (zh) 2017-01-01
CN106325485B (zh) 2019-09-10

Similar Documents

Publication Publication Date Title
WO2017000764A1 (zh) 2017-01-05 Gesture detection and recognition method and system
CA3000127C (en) System and method for appearance search
US9036917B2 (en) Image recognition based on patterns of local regions
Granger et al. A comparison of CNN-based face and head detectors for real-time video surveillance applications
US20120274755A1 (en) System and method for human detection and counting using background modeling, hog and haar features
Deore et al. Study of masked face detection approach in video analytics
KR102399017B1 (ko) 이미지 생성 방법 및 장치
WO2011007390A1 (ja) 画像処理装置、及びインターフェース装置
KR20070016849A (ko) 얼굴 검출과 피부 영역 검출을 적용하여 피부의 선호색변환을 수행하는 방법 및 장치
JP6157165B2 (ja) 視線検出装置及び撮像装置
Elleuch et al. A static hand gesture recognition system for real time mobile device monitoring
Ahlvers et al. Model-free face detection and head tracking with morphological hole mapping
Ji et al. Spatio-temporal cuboid pyramid for action recognition using depth motion sequences
Radwan et al. Regression based pose estimation with automatic occlusion detection and rectification
CN104751144A (zh) 一种面向视频监控的正面人脸快速评价方法
Duan et al. Detection of hand-raising gestures based on body silhouette analysis
Wang et al. Adaptive visual tracking based on discriminative feature selection for mobile robot
Zamuner et al. A pose-adaptive constrained local model for accurate head pose tracking
Ullah et al. Hand gesture recognition for automatic tap system
Balasubramanian et al. Fovea intensity comparison code for person identification and verification
Niju Robust Human Tracking Using Sparse Collaborative Model in Surveillance Videos
Kim et al. A fast and accurate face tracking scheme by using depth information in addition to texture information
Xie et al. Information Technology in Multi-Object Tracking Based on Bilateral Structure Tensor Corner Detection for Mutual Occlusion
Guan et al. Efficient and robust face detection from color images
Hemanth et al. Analysis of Daubechies Wavelet Transform Based Human Detection Approaches in Digital Videos

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16817128

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017567753

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 15739274

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2016817128

Country of ref document: EP