CN112101208A - Feature series fusion gesture recognition method and device for elderly people - Google Patents


Info

Publication number
CN112101208A
CN112101208A (application CN202010965987.6A)
Authority
CN
China
Prior art keywords
gesture
image
points
point
feature
Prior art date
Legal status
Pending
Application number
CN202010965987.6A
Other languages
Chinese (zh)
Inventor
罗晓君
杨金水
罗湘喜
孙瑜
Current Assignee
Jiangsu Huiming Science And Technology Co ltd
Original Assignee
Jiangsu Huiming Science And Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Huiming Science And Technology Co ltd filed Critical Jiangsu Huiming Science And Technology Co ltd
Priority to CN202010965987.6A
Publication of CN112101208A
Legal status: Pending

Classifications

    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/253 Fusion techniques of extracted features
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V40/113 Recognition of static hand signs
    • G06V40/161 Human faces: detection; localisation; normalisation
    • G06V40/168 Human faces: feature extraction; face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a feature series fusion gesture recognition method and device for elderly people. A feature series fusion gesture recognition method for elderly people comprises the following steps: converting the RGB image into a YCbCr image by using a color space model, and segmenting the image by using an ellipse model in a skin color model to obtain a gesture part; performing series fusion on HOG and LBP characteristics by using a series characteristic fusion method, describing gesture characteristics from two angles of edges and textures, and adopting gesture classification recognition based on an SVM; and judging whether the recognized gesture is effective or not by adopting a face verification method and a head posture estimation method, if so, recognizing the user requirement by utilizing the gesture, and otherwise, judging that the recognized gesture is ineffective. The invention judges the current demand of the old through the hand action of the old, converts the care demand into different hand actions, indirectly solves the problem that the old cannot clearly express the care demand by language, and simultaneously provides a simple and easy expression mode for the old.

Description

Feature series fusion gesture recognition method and device for elderly people
Technical Field
The invention relates to the technical field of gesture recognition, and in particular to a feature series fusion gesture recognition method and device for elderly people.
Background
As people age, their physical functions degrade, and because of slurred speech elderly people often cannot clearly and intuitively express daily living care needs such as defecation, eating and taking medication.
The gesture is an important way for people to communicate information, and rich semantic information can be expressed through hand motions. Gesture recognition is the process of tracking and recognizing performed gestures and converting them into words or sentences that express semantic information; it is mainly divided into static gesture recognition and dynamic gesture recognition. In gesture recognition research at home and abroad, many early works depended on various hardware devices to acquire specific gesture information.
With breakthroughs in computer hardware and image processing theory, gesture recognition based on computer vision has gradually become mainstream: the whole gesture recognition process can be completed with only a camera, a common hardware device. For example, one approach combines the distance between the fingertips and the palm centroid with a linear ANN to recognize several typical gestures. Another applies an Extreme Learning Machine (ELM) pattern recognition algorithm to recognize monocular-vision-based LIBRAS signs and obtains a high recognition rate, although its accuracy may change under the influence of illumination. Another is a gesture learning system based on two-dimensional image sampling and splicing, which classifies and identifies gestures by sampling and concatenating gesture demonstration videos to construct training data. Yet another is a gesture recognition system designed and studied on the Kinect, which recognizes 12 letters using the length and direction vector of the fingers and the vector angle between the fingers and the palm; although the system achieves a high recognition rate, only open gesture shapes were studied.
Disclosure of Invention
The invention aims to provide a feature series fusion gesture recognition method for elderly people that establishes a gesture recognition algorithm based on elliptical-model gesture segmentation and fused HOG and LBP feature extraction, introduces a face verification solution to judge the validity of a gesture, and judges the care requirement of the elderly by using the finally obtained valid gesture. Based on this purpose, the technical scheme adopted by the invention is as follows:
a feature series fusion gesture recognition method for elderly people is characterized by comprising the following steps:
step S1, gesture image segmentation: converting the RGB image into a YCbCr image by using a color space model, and then segmenting the image by using an ellipse model in a skin color model to obtain a gesture part;
step S2, extracting gesture features: performing series fusion on HOG and LBP characteristics by using a series characteristic fusion method, describing gesture characteristics from two angles of edges and textures, and adopting gesture classification recognition based on an SVM;
step S3, effective gesture recognition: and judging whether the recognized gesture is effective or not by adopting a face verification method and a head posture estimation method, if so, recognizing the user requirement by utilizing the gesture, and otherwise, judging that the recognized gesture is ineffective.
Further, the step S1 includes a step S11, in which a monocular camera is used to collect images for face detection;
step S12, firstly, analyzing the color space and mapping to the proper color space; then, analyzing a skin color model, and segmenting a skin color area; then, analyzing noise interference, and excluding skin color and skin color-like connected domains outside the hand region; finally, the hand area is intercepted.
Further, in step S12, after converting the RGB color space into the YCbCr color space, the Y luminance component and the CbCr color component are separately processed, and the expression for converting from the RGB color space into the YCbCr color space is as follows:
$$\begin{bmatrix} Y\\ C_b\\ C_r \end{bmatrix}=\begin{bmatrix} 0.299 & 0.587 & 0.114\\ -0.169 & -0.331 & 0.500\\ 0.500 & -0.419 & -0.081 \end{bmatrix}\begin{bmatrix} R\\ G\\ B \end{bmatrix}+\begin{bmatrix} 0\\ 128\\ 128 \end{bmatrix}$$
modeling the skin color by adopting the elliptical model to obtain a segmented binary image, wherein the elliptical model is described by the following formula:
$$\frac{\big[(x-c_x)\cos\theta+(y-c_y)\sin\theta\big]^2}{a^2}+\frac{\big[-(x-c_x)\sin\theta+(y-c_y)\cos\theta\big]^2}{b^2}=1$$
where (x, y) is a boundary point of the ellipse, (c_x, c_y) is the center of the ellipse, a is the major axis of the ellipse, b is the minor axis of the ellipse, and θ is the rotation angle of the ellipse;
searching the binary image to obtain the outline of each skin color and skin color-like connected domain, and eliminating the interference of the skin color-like region by using an area operator; and finally, removing the interference of the face region by filtering a connected domain containing a face rectangular frame, and only leaving a pure hand region.
Further, in step S2, the serial feature fusion method is as follows: suppose a gesture image img_i generates a feature vector a_i after HOG feature extraction, and a_i is mapped by PCA dimensionality reduction to a feature vector A_i; after circular-neighbourhood, rotation-invariant, uniform LBP feature extraction, img_i generates a feature vector B_i; the final fused feature vector of the image is then represented as:
C_i = [A_i  B_i]   (2.1).
further, the face verification technology adopts face recognition verification based on ResNet; the face recognition firstly needs to extract features, namely, images are collected through a camera, then a face area is positioned through an SSD face detection network, then face key points are searched through CLM feature point positioning, then the face area is aligned, and finally the feature description of the face is obtained through a feature extraction network; the face alignment adopts affine transformation and a bilinear interpolation method, the affine transformation is linear transformation from two-dimensional coordinates to two-dimensional coordinates, and the mathematical model is as follows:
$$\begin{cases} x' = a_{00}x + a_{01}y + b_{00} \\ y' = a_{10}x + a_{11}y + b_{01} \end{cases} \qquad (3.1)$$
where (x ', y') is the mapped point of (x, y) after affine transformation, the homogeneous coordinate representation of the transformation is:
$$\begin{bmatrix} x'\\ y'\\ 1 \end{bmatrix}=M\begin{bmatrix} x\\ y\\ 1 \end{bmatrix},\qquad M=\begin{bmatrix} a_{00} & a_{01} & b_{00}\\ a_{10} & a_{11} & b_{01}\\ 0 & 0 & 1 \end{bmatrix} \qquad (3.2)$$
where M is the affine transformation matrix and contains 6 unknown variables: (a_00, a_01, a_10, a_11) are the linear transformation parameters and (b_00, b_01) are the translation parameters; the affine transformation matrix formula is X' = XH, where X' is the known matrix formed by the reference points of the standard frontal image, X is the known matrix formed by the reference points of the image to be aligned, and H is the unknown affine transformation parameter matrix, which can be solved as:
$$H = (X^{T}X)^{-1}X^{T}X' \qquad (3.4)$$
after H is solved, carrying out affine transformation on each pixel point of the whole image to be aligned, and combining to obtain a corrected result image;
the bilinear interpolation method mainly aims to solve the problem of deformation and distortion caused by image size conversion, i.e. enlargement or reduction; it calculates the pixel value of the point to be solved by finding the four integer pixel points closest to the corresponding coordinate (i, j) and then carrying out linear interpolation in each of the two directions;
suppose the pixel values of the points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1) and Q22(x2, y2) in the source image are known, and the pixel value of a point P(x, y) is to be found; first, linear interpolation is carried out in the x direction between Q11(x1, y1) and Q21(x2, y1) to obtain the pixel value I(R_1), and between Q12(x1, y2) and Q22(x2, y2) to obtain the pixel value I(R_2), namely:
$$I(R_1)\approx\frac{x_2-x}{x_2-x_1}I(Q_{11})+\frac{x-x_1}{x_2-x_1}I(Q_{21}),\qquad I(R_2)\approx\frac{x_2-x}{x_2-x_1}I(Q_{12})+\frac{x-x_1}{x_2-x_1}I(Q_{22}) \qquad (3.5)$$
where point R_1 has coordinates (x, y1) and point R_2 has coordinates (x, y2);
then linear interpolation is carried out in the y direction between R_1 and R_2 to calculate the pixel value I(P) at position (x, y), namely:
$$I(P)\approx\frac{y_2-y}{y_2-y_1}I(R_1)+\frac{y-y_1}{y_2-y_1}I(R_2) \qquad (3.6)$$
after the pixel values of the 4 integer coordinate points adjacent to any point (x, y) of the image are known, the pixel value at the coordinate (x, y) can be obtained by bilinear interpolation as follows:
$$I(x,y)\approx\frac{1}{(x_2-x_1)(y_2-y_1)}\Big[I(Q_{11})(x_2-x)(y_2-y)+I(Q_{21})(x-x_1)(y_2-y)+I(Q_{12})(x_2-x)(y-y_1)+I(Q_{22})(x-x_1)(y-y_1)\Big] \qquad (3.7)$$
the pixel value of each point of the target image is equal to the pixel value of the corresponding position in the source image, and the image size transformation can be completed;
after the human faces are aligned, processing the images by using a ResNet residual convolution neural network, outputting 128-dimensional characteristic vectors, measuring the similarity between the characteristic vectors of the human face images to be recognized and the standard characteristic vectors by using cosine distances, and judging whether the personnel in the images to be recognized are the designated users or not according to the size of the similarity.
Further, the head pose estimation method is EPnP-based head pose estimation. In the EPnP algorithm, the three-dimensional coordinates of all feature points in the world coordinate system are represented by a weighted sum of the coordinates of 4 virtual control points, which must not be coplanar; by solving for the coordinates of these 4 control points in the camera coordinate system, the transformation between the two coordinate systems can be obtained, and the pose information of the head is then calculated from this transformation, specifically as follows:
the coordinates of the n feature points in the world coordinate system are denoted $p_i^w,\ i=1,\dots,n$, and the coordinates of the 4 virtual control points are denoted $c_j^w,\ j=1,\dots,4$; projected into the camera coordinate system, the feature-point coordinates become $p_i^c$ and the virtual control points become $c_j^c$; each feature point in the two coordinate systems is represented by a weighted sum of the corresponding 4 virtual control points, namely:
$$p_i^w=\sum_{j=1}^{4}\alpha_{ij}c_j^w,\qquad \sum_{j=1}^{4}\alpha_{ij}=1 \qquad (3.8a)$$
$$p_i^c=\sum_{j=1}^{4}\alpha_{ij}c_j^c \qquad (3.8b)$$
According to equation (3.8a), on the premise that $p_i^w$ and $c_j^w$ are known, the weight parameters $\alpha_{ij}$ can be solved; the coordinates of the 4 virtual control points in the camera coordinate system then still need to be obtained. According to the projection imaging model from 3D points to 2D points, the known image points $U_i$ and the spatial points $p_i^c=\sum_{j=1}^{4}\alpha_{ij}c_j^c$ are substituted and expanded to obtain:
$$\lambda_i\begin{bmatrix}u_i\\ v_i\\ 1\end{bmatrix}=K\sum_{j=1}^{4}\alpha_{ij}c_j^c \qquad (3.9)$$
where $\lambda_i$ is the scale factor to be determined; the pixel coordinates $U_i=(u_i,v_i)$ of the feature points, the intrinsic matrix $K$ of the camera and the weight parameters $\alpha_{ij}$ are known, and the coordinates $c_j^c$ of the control points in the camera coordinate system are to be solved. Writing the control points in the camera coordinate system as $c_j^c=[x_j^c,\ y_j^c,\ z_j^c]^T$ and further expanding equation (3.9) gives:
$$\lambda_i\begin{bmatrix}u_i\\ v_i\\ 1\end{bmatrix}=\begin{bmatrix}f_u & 0 & c_u\\ 0 & f_v & c_v\\ 0 & 0 & 1\end{bmatrix}\sum_{j=1}^{4}\alpha_{ij}\begin{bmatrix}x_j^c\\ y_j^c\\ z_j^c\end{bmatrix} \qquad (3.10)$$
Two linear equations can be obtained from the above equation:
$$\sum_{j=1}^{4}\alpha_{ij}\big(f_u x_j^c+(c_u-u_i)z_j^c\big)=0,\qquad \sum_{j=1}^{4}\alpha_{ij}\big(f_v y_j^c+(c_v-v_i)z_j^c\big)=0 \qquad (3.11)$$
On the premise that the weights $\alpha_{ij}$, the 2D feature points $(u_i,v_i)$ and the intrinsic parameters $(f_u,f_v)$, $(c_u,c_v)$ are known, the specific values of $c_j^c$ can be solved. After the coordinates of the 4 control points in the camera coordinate system are obtained through the above steps, substituting them into equation (3.8b), i.e. $p_i^c=\sum_{j=1}^{4}\alpha_{ij}c_j^c$, gives the coordinates of the 3D feature points in the camera coordinate system. With the coordinates of the feature points in the camera coordinate system solved, the rotation matrix and translation matrix are then calculated from the correspondence between the points in the world coordinate system and those in the camera coordinate system.
Further, the rules of the valid gesture in step S3 include: the rules of "valid gesture must look at person" based on identity information, "valid gesture must be in place" based on location information, "valid gesture must be focused" based on gesture information, and "valid gesture must last" based on statistical information.
Furthermore, (1) the realization of "an effective gesture must see the person" depends on the face verification technology: only when the specified user is present in the monitored picture is the system allowed to detect the gesture area in the image img, extract gesture features and recognize the gesture category;
(2) "an effective gesture must be in place" means that only a gesture specified in advance is valid, and no other gesture will be responded to by the system;
(3) "an effective gesture must be focused" means that the user must gaze at the poster bearing the gesture marks when making the corresponding gesture, with the yaw angle θ_y of the user's head pose while gazing greater than 30 degrees; only a gesture satisfying this head rotation angle is considered effective;
(4) "an effective gesture must last" means that a gesture, once made, must last at least 3 s, i.e. 90 frames;
if one of the above four items is not satisfied, the gesture is considered to be invalid.
The invention also aims to provide a feature series fusion gesture recognition device for elderly people, which comprises a gesture image segmentation module, a gesture recognition module and an effective gesture recognition module;
the gesture image segmentation module comprises video acquisition equipment, a face detection module and a gesture image segmentation submodule; the video acquisition equipment acquires images; the face detection module detects a face in an image acquired by the video acquisition equipment; the gesture image segmentation submodule segments a hand area in an image acquired by the video acquisition equipment;
the gesture recognition module comprises a gesture feature extraction module and a gesture recognition sub-module; the gesture feature extraction module extracts gestures; the gesture recognition sub-module compares and recognizes gestures according to preset gesture classification and extracted gestures;
the effective gesture recognition module recognizes an effective gesture according to a preset rule.
Compared with the prior art, the invention has the following beneficial effects: (1) The invention introduces face verification and head pose estimation technologies to judge whether a recognized gesture is effective; if so, the user requirement is recognized from the gesture, otherwise the recognized gesture is judged invalid. The feature series fusion gesture recognition technology for the elderly can quickly and accurately capture the user's hand movements, so that the care needs of the elderly are recognized in real time; the added gesture-validity techniques greatly improve the safety of gesture recognition and reduce the probability of misrecognition. (2) The serial HOG and LBP fusion method provided by the invention better captures the key characteristics that describe a gesture and greatly improves the quality of the gesture description operator, so that the subsequently trained SVM classifier achieves better efficiency and higher precision.
The characteristic series fusion gesture recognition technology for the elderly provided by the invention can judge the current demand of the elderly through the hand action of the elderly, convert the care demand into different hand actions, indirectly solve the problem that the elderly cannot clearly express the care demand by language, and simultaneously provide a simple and easy expression mode for the elderly.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the SSD network architecture;
FIG. 3 is a diagram of SSD face detection results;
FIG. 4 is a schematic diagram of color space conversion, (4a) RGB image; (4b) a YCbCr image;
FIG. 5 is a schematic diagram of skin color segmentation of an elliptical model, (5a) YCbCr image, and (5b) elliptical model segmentation;
fig. 6 is a schematic diagram of a noise interference elimination process, (6a) an area operator filter image, (6b) a face filter image, (6c) a hand binarization image, and (6d) a hand RGB original image;
FIG. 7 shows gesture image acquisition, (7a) image without margin, (7b) image with margin, (7c) RGB image with margin, (7d) grayscale image, and (7e) size-normalized image;
FIG. 8 is a schematic diagram of tandem fusion features;
FIG. 9 is a feature fusion gesture diagram;
FIG. 10 is a sample diagram of a gesture;
FIG. 11 is a schematic diagram of the positioning result of the CLM, (11a) the original image, and (11b) the image of the positioning result of the CLM key points;
FIG. 12 is a schematic diagram of bilinear interpolation;
fig. 13 is a result diagram of a face verification section, (13a) a face verification image 1, and (13b) a face verification image 2;
FIG. 14 is a diagram showing the results of head pose estimation, (14a) head right tilt, (14b) head left tilt;
Detailed Description
The invention is further described below with reference to examples and figures.
Example 1
As shown in fig. 1, a feature series fusion gesture recognition method for elderly people is characterized by comprising the following steps:
step S1, gesture image segmentation: converting the RGB image into a YCbCr image by using a color space model, and then segmenting the image by using an ellipse model in a skin color model to obtain a gesture part;
step S2, extracting gesture features: performing series fusion on HOG and LBP characteristics by using a series characteristic fusion method, describing gesture characteristics from two angles of edges and textures, and adopting gesture classification recognition based on an SVM;
step S3, effective gesture recognition: and judging whether the recognized gesture is effective or not by adopting a face verification method and a head posture estimation method, if so, recognizing the user requirement by utilizing the gesture, and otherwise, judging that the recognized gesture is ineffective.
Specifically, the step S1 includes a step S11, in which a monocular camera is used to perform image acquisition for face detection;
An ordinary monocular camera is used as the video acquisition device; its parameters are: frame rate 30 FPS, image size 640 × 480. After the image is obtained, face detection is performed, laying the foundation for subsequent face verification. The invention adopts an SSD (Single Shot MultiBox Detector) network to detect faces. The SSD network retains the end-to-end characteristic of YOLO and encapsulates the whole computation in a single network; as a one-stage model, it is one of the target detection networks with the best overall performance at present, simple and convenient to train, and accurate and fast in prediction.
The general architecture of the SSD network is shown in fig. 2, which uses the basic network to mine the feature information of the input image, generates the position coordinates and the confidence score of the face region through the prediction network, and finally sets the confidence score threshold by itself, and only the face region larger than the threshold is retained. The face detection results based on the SSD network are shown in fig. 3.
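A minimal sketch of this detection step is given below, using OpenCV's DNN module with a Caffe SSD face model; the model file names, input size and confidence threshold are illustrative assumptions rather than values taken from this patent.

```python
# Hedged sketch of SSD-based face detection with OpenCV's DNN module.
# "deploy.prototxt" / "res10_300x300_ssd.caffemodel" and the 0.6 threshold are assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd.caffemodel")

def detect_faces(frame, conf_threshold=0.6):
    h, w = frame.shape[:2]
    # SSD expects a fixed-size input blob; 300x300 with mean subtraction is typical
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()                     # shape: (1, 1, N, 7)
    boxes = []
    for i in range(detections.shape[2]):
        score = float(detections[0, 0, i, 2])
        if score >= conf_threshold:                # keep only confident face regions
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
            boxes.append((x1, y1, x2, y2, score))
    return boxes
```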
Step S12, firstly, analyzing the color space and mapping to the proper color space; then, analyzing a skin color model, and segmenting a skin color area; then, analyzing noise interference, and excluding skin color and skin color-like connected domains outside the hand region; finally, the hand area is intercepted.
Step S12 is explained in detail as follows:
at present, the gesture image segmentation algorithm mainly comprises a texture segmentation method based on a static image, a matching method, a threshold value method, a difference method based on a dynamic sequence, an optical flow method and the like. The invention uses a gesture segmentation method based on a skin color model: firstly, analyzing a color space and mapping the color space to a proper color space; then, analyzing a skin color model, and segmenting a skin color area; then, analyzing noise interference, and excluding skin color and skin color-like connected domains outside the hand region; finally intercepting the hand region color space is a mathematical description of the intuitive visual perception of the image. The skin color shows good clustering performance in a YCbCr color space, and is more suitable for a gesture image segmentation task. After converting the RGB color space into the YCbCr color space, the Y luminance component and the CbCr color component may be processed separately, or even the Y luminance component may be completely discarded. The expression for converting from RGB color space to YCbCr color space is as follows:
$$\begin{bmatrix} Y\\ C_b\\ C_r \end{bmatrix}=\begin{bmatrix} 0.299 & 0.587 & 0.114\\ -0.169 & -0.331 & 0.500\\ 0.500 & -0.419 & -0.081 \end{bmatrix}\begin{bmatrix} R\\ G\\ B \end{bmatrix}+\begin{bmatrix} 0\\ 128\\ 128 \end{bmatrix}$$
as shown in fig. 4 below, the RGB image is converted into a YCbCr image, and the skin color region appears more prominent and compact. Fig. 4a shows an RGB image, and fig. 4b shows a YCbCr image.
The skin color model mainly comprises the following components: single gaussian model, mixed gaussian model, elliptical model, etc. By projecting the samples from the YCbCr color space to the CrCb color plane, the skin color points can be found to be gathered in an elliptical area, the calculation loss of the elliptical model is very low, and the model is quite visual. Therefore, the invention adopts the ellipse model to model the skin color. The elliptical model can be described by the following equation:
$$\frac{\big[(x-c_x)\cos\theta+(y-c_y)\sin\theta\big]^2}{a^2}+\frac{\big[-(x-c_x)\sin\theta+(y-c_y)\cos\theta\big]^2}{b^2}=1$$
where (x, y) is a boundary point of the ellipse, (c_x, c_y) is the center of the ellipse, a is the major axis of the ellipse, b is the minor axis of the ellipse, and θ is the rotation angle of the ellipse.
The skin color segmentation operation of the elliptical model is performed on the image in the YCbCr color space, and the result is shown in fig. 5, where fig. 5a shows the YCbCr image and fig. 5b the ellipse-model segmentation. As seen from fig. 5, after skin color segmentation with the elliptical model the image still contains noise regions such as holes, burrs and faces; this noise mainly comes from interference by skin-color-like regions and by the face region. To remove these interferences, the invention provides an improved interference elimination strategy, which operates as follows: the collected RGB original image img_rgb is input into the SSD face detection network introduced in section 1.1 to obtain the coordinates (x1, y1) of the top-left vertex and (x2, y2) of the bottom-right vertex of the face rectangle; then the original image img_rgb is converted from the RGB color space to the YCbCr color space to obtain the image img_YCbCr, and the skin-color elliptical model is used to segment img_YCbCr into a binary image; the binary image is searched to obtain the contour of each skin-color and skin-color-like connected domain, and an area operator is used to eliminate the interference of skin-color-like regions; finally, the connected domain containing the face rectangle is filtered out, eliminating the interference of the face region and leaving only a pure hand region. The effect is shown in fig. 6: (6a) area-operator filtered image, (6b) face-filtered image, (6c) hand binary image, (6d) hand RGB original image.
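The segmentation and interference-elimination strategy described above can be sketched as follows; the ellipse parameters and the area threshold are assumptions chosen for illustration, since the patent does not list numeric values.

```python
# Hedged sketch of the elliptical skin-colour segmentation and interference elimination.
# The ellipse parameters and the 1500-pixel area threshold are illustrative assumptions.
import cv2
import numpy as np

def segment_hand(img_rgb, face_box=None, area_thresh=1500):
    ycrcb = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2YCrCb)       # channels: Y, Cr, Cb
    cr = ycrcb[:, :, 1].astype(np.float32)
    cb = ycrcb[:, :, 2].astype(np.float32)

    # Elliptical skin model in the CbCr plane (assumed parameters: centre, axes, rotation)
    cx, cy, a, b, theta = 109.38, 152.02, 25.39, 14.03, 2.53
    u = (cb - cx) * np.cos(theta) + (cr - cy) * np.sin(theta)
    v = -(cb - cx) * np.sin(theta) + (cr - cy) * np.cos(theta)
    mask = (((u / a) ** 2 + (v / b) ** 2) <= 1.0).astype(np.uint8) * 255

    # Area operator: drop small skin-like blobs; then drop the blob containing the face box
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    clean = np.zeros_like(mask)
    for c in contours:
        if cv2.contourArea(c) < area_thresh:
            continue
        bx, by, bw, bh = cv2.boundingRect(c)
        if face_box is not None:
            fx1, fy1, fx2, fy2 = face_box
            if bx <= fx1 and by <= fy1 and bx + bw >= fx2 and by + bh >= fy2:
                continue                                      # this blob is the face region
        cv2.drawContours(clean, [c], -1, 255, thickness=cv2.FILLED)
    return clean                                              # binary image of the hand region
```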
After the two-step interference elimination of the noise analysis stage, the contour of the hand region is obtained. Cropping the hand image exactly along this contour would be too aggressive, so a margin is appropriately left around the region of interest during segmentation to avoid unexpected cases: when the circumscribed rectangle of the hand is drawn, its side lengths are extended outwards by 15 pixels, and when the rectangular area is cropped out, the side lengths are extended by 10 pixels. The resulting gesture image is shown in fig. 7b, and cropping the rectangular area from the RGB original image gives fig. 7c. For computational convenience the image is converted to grayscale, and the gesture image size is then normalized to 128 × 96 using bilinear interpolation, giving the "final" gesture image shown in fig. 7e.
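A small sketch of this cropping and normalization step; the 10-pixel margin and the 128 × 96 target size come from the text, while the helper name and array conventions are assumptions.

```python
# Hedged sketch of cropping the hand rectangle with a 10-pixel margin, converting to
# grayscale and resizing to 128 x 96 with bilinear interpolation. crop_hand is hypothetical.
import cv2

def crop_hand(img_rgb, hand_contour, margin=10, out_size=(128, 96)):
    x, y, w, h = cv2.boundingRect(hand_contour)
    H, W = img_rgb.shape[:2]
    x1, y1 = max(x - margin, 0), max(y - margin, 0)
    x2, y2 = min(x + w + margin, W), min(y + h + margin, H)
    roi = cv2.cvtColor(img_rgb[y1:y2, x1:x2], cv2.COLOR_RGB2GRAY)
    # cv2.resize takes (width, height); INTER_LINEAR is bilinear interpolation
    return cv2.resize(roi, out_size, interpolation=cv2.INTER_LINEAR)
```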
Specifically, the gesture recognition in step S2 includes gesture feature extraction and gesture classification recognition.
Histogram of Oriented Gradients (HOG) describes the Gradient direction and Gradient intensity distribution characteristics of a local part of an image. The HOG is realized mainly by dividing the image into a plurality of regions with the same size, then respectively calculating a directional gradient histogram in each region, and finally connecting and combining all the histograms, namely, a HOG feature description operator of the detected image, wherein the small-size region is called a cell unit (cell).
Local Binary Patterns (LBP) [13] describe the local texture of an image. LBP is computed as follows: to obtain the LBP response of a point, a neighborhood is drawn around it according to a given rule and the center pixel value is compared one by one with the pixel values of the other points in the neighborhood; a position is marked 1 when its pixel value is greater than that of the center point and 0 when it is smaller than or equal to it. The marks around the center position are collected in order and concatenated into a binary number, which is taken as the LBP response value of the center position.
Therefore, the HOG and the LBP can both capture the characteristics with identification degree of the key of the gesture, wherein the HOG focuses more on the edge information of the gesture, the extracted characteristics have excellent effect under the condition of simple gesture and single background, but the HOG has poor effect when the background is complicated or the hierarchy information of the gesture cannot be ignored; LBP focuses more on texture information of gestures, and can capture complex gestures such as overlapping between fingers or between fingers and palm, but the extracted features have a moderate effect. There are inevitable limitations to the individual features described.
According to the method, HOG and LBP features are fused so that the gesture features are described from the two angles of edge and texture respectively, improving the quality of the final gesture description operator and laying the foundation for the subsequent high-precision gesture classification and recognition. The invention uses a serial feature fusion method: suppose a gesture image img_i generates a feature vector a_i after HOG feature extraction, and a_i is mapped by PCA dimensionality reduction to a feature vector A_i; after circular-neighbourhood, rotation-invariant, uniform LBP feature extraction, img_i generates a feature vector B_i. The final fused feature vector of the image is then represented as:
C_i = [A_i  B_i]   (2.1)
in the invention, a 396-dimensional HOG + PCA feature description operator is connected in series with a 108-dimensional circular neighborhood + rotation invariance + unified LBP feature description operator to obtain a 504-dimensional final fusion feature vector. A schematic diagram of a feature fusion process based on the tandem approach is shown in fig. 8.
HOG and improved-LBP feature extraction with the parameters designed herein is carried out on the gesture image of fig. 7e, and the resulting LBP feature vector is appended to the PCA-reduced HOG feature vector to obtain the fused feature vector. This vector serves as the mathematical description of the image for the subsequent classification and recognition.
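The serial fusion of Eq. (2.1) might be sketched as follows with scikit-image and scikit-learn; the HOG cell/block sizes, the LBP block layout and the PCA dimensionality are assumptions and will not necessarily reproduce the 396- and 108-dimensional operators mentioned above.

```python
# Hedged sketch of the serial HOG + LBP fusion of Eq. (2.1) using scikit-image/scikit-learn.
# Cell/block sizes, the LBP block grid and the PCA dimensionality are assumptions.
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.decomposition import PCA

def lbp_histogram(block, P=8, R=1):
    # 'uniform' in scikit-image is the rotation-invariant uniform LBP (P + 2 bins)
    lbp = local_binary_pattern(block, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
    return hist

def fused_feature(img_gray, pca):
    """img_gray: 128x96 gesture image; pca: a PCA instance already fit on training HOG vectors."""
    a_i = hog(img_gray, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    A_i = pca.transform(a_i.reshape(1, -1))[0]          # HOG descriptor reduced by PCA
    blocks = [img_gray[r:r + 32, c:c + 32]              # block-wise LBP histograms
              for r in range(0, img_gray.shape[0], 32)
              for c in range(0, img_gray.shape[1], 32)]
    B_i = np.concatenate([lbp_histogram(b) for b in blocks])
    return np.concatenate([A_i, B_i])                   # C_i = [A_i  B_i]
```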
SVM-based gesture classification recognition
According to the method, a Support Vector Machine (SVM) which is excellent in performance under a small sample data set is selected as a classifier, and the classifier can be put into use after an SVM model parameter file is generated by training of the established gesture data set, so that the final purpose of gesture prediction is achieved.
To train the SVM classifier and verify the effectiveness of gesture recognition, a gesture image data set is established herein. By way of example, the invention designs 8 different gestures with a certain degree of distinctiveness. 5 volunteers were invited to make the 8 gestures in front of the camera, and for each gesture 100 frames of images were collected per person under different scenes and different illumination. This large amount of image data helps ensure that the subsequently trained SVM model file has a degree of generalization and robustness, and can meet the practical application of accurately recognizing gestures so as to control the posture of the bed body.
As described above, each gesture therefore has 5 × 100 = 500 images, giving 8 × 500 = 4000 gesture images for the 8 gestures in total. The original 640 × 480 panoramic images acquired by the camera contain too much background noise, so the gesture area is extracted with the gesture segmentation algorithm, yielding 4000 gesture images of different sizes, and the original 4000 panoramic images are deleted. The 500 images of each gesture are further divided into a training set (400) and a test set (100) at a ratio of 8:2; some gesture sample images from the data set are shown in fig. 10.
The multi-classification strategy of the SVM is mainly implemented in two ways: one-vs-rest and one-vs-one. The one-vs-one method has higher precision than the one-vs-rest method, and its training complexity and test-time real-time performance remain acceptable, so the one-vs-one implementation is used to solve the 8-gesture multi-classification problem. The SVM class provided by the machine learning module (ml) of the OpenCV 3.4.0 library is used here, with the positive sample label set to 1 and the negative sample label set to 0. When creating the SVM classifier, the SVM type (the default C_SVC is used here), the kernel function type and the related hyper-parameters need to be set; the RBF kernel is selected here, and its hyper-parameters are gamma and C. Because the choice of the SVM hyper-parameters is very important and directly affects the generalization and robustness of the SVM model, the invention uses five-fold cross validation to select the hyper-parameters, so that a relatively good SVM model can be obtained and the gesture classification effect reaches a high level.
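A sketch of this training stage is shown below using scikit-learn's SVC and GridSearchCV in place of the OpenCV ml SVM class mentioned above; the parameter grids and split seed are assumptions.

```python
# Hedged sketch of the classifier stage: RBF-kernel SVM with C and gamma chosen by
# five-fold cross validation (scikit-learn used in place of OpenCV's ml::SVM; grids assumed).
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def train_gesture_svm(features, labels):
    # 8:2 split into training and test sets, as in the data-set description above
    X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2,
                                              stratify=labels, random_state=0)
    grid = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                        param_grid={"C": [0.1, 1, 10, 100],
                                    "gamma": [1e-3, 1e-2, 1e-1, 1]},
                        cv=5)                            # five-fold cross validation
    grid.fit(X_tr, y_tr)
    print("best hyper-parameters:", grid.best_params_)
    print("held-out test accuracy:", grid.best_estimator_.score(X_te, y_te))
    return grid.best_estimator_
```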
Because gesture motion is random, unconscious movements are sometimes recognized by the machine as valid gestures, which affects the user experience. Relatively mature face detection, face verification and head pose estimation are therefore introduced as auxiliary technologies of the system; they help realize the judgment of gesture validity, and this validity constraint guarantees the accuracy of recognizing the care needs of the elderly.
In this embodiment, face recognition verification based on ResNet is adopted.
The face recognition needs to extract features firstly, namely, images are collected through a camera, then a face area is located through an SSD face detection network, face key points are located and searched through CLM feature point locating, then the face area is aligned, and finally feature description of a face is obtained through a feature extraction network.
The CLM can quite accurately search out key feature points of a human face, such as eyes, mouth and the like, and mainly comprises three parts: shape model, local model, and fitting optimization. In the embodiment, a 68 characteristic point calibration method of a 300-W face data set is adopted to describe the structural shape of a face, and after a face rectangular region is obtained through an SSD face detection algorithm, the exact position of a characteristic point is found by utilizing a fitting optimization strategy to search in the characteristic point field in the face region in combination with the constraints of a shape model and a local model. The CLM positioning result is shown in fig. 11, (11a) the original image (11b) the CLM key point positioning result image.
Face alignment is mainly to solve two problems: firstly, the angles of the face images caused by the rotation of the head are different, and secondly, the sizes of the face images caused by the shooting distance are different. The embodiment adopts affine transformation and bilinear interpolation to realize face alignment. The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates, and the mathematical model of the affine transformation is as follows:
$$\begin{cases} x' = a_{00}x + a_{01}y + b_{00} \\ y' = a_{10}x + a_{11}y + b_{01} \end{cases} \qquad (3.1)$$
where (x ', y') is the mapped point of (x, y) after affine transformation, the homogeneous coordinate representation of the transformation is:
$$\begin{bmatrix} x'\\ y'\\ 1 \end{bmatrix}=M\begin{bmatrix} x\\ y\\ 1 \end{bmatrix},\qquad M=\begin{bmatrix} a_{00} & a_{01} & b_{00}\\ a_{10} & a_{11} & b_{01}\\ 0 & 0 & 1 \end{bmatrix} \qquad (3.2)$$
where M is the affine transformation matrix and contains 6 unknown variables: (a_00, a_01, a_10, a_11) are the linear transformation parameters and (b_00, b_01) are the translation parameters. At least 3 pairs of points are required to obtain the affine transformation matrix; here the 2 eye points and 1 nose point are selected. Suppose the 3 points of the image to be aligned are (x_1, y_1), (x_2, y_2), (x_3, y_3) and the corresponding 3 points of the standard frontal face image are (x_1', y_1'), (x_2', y_2'), (x_3', y_3'); then equation (3.2) can be transformed into:
$$\begin{bmatrix} x_1' & y_1' & 1\\ x_2' & y_2' & 1\\ x_3' & y_3' & 1 \end{bmatrix}=\begin{bmatrix} x_1 & y_1 & 1\\ x_2 & y_2 & 1\\ x_3 & y_3 & 1 \end{bmatrix}H \qquad (3.3)$$
Formula (3.3) can be abbreviated as X' = XH, where X' is the known matrix formed by the reference points of the standard frontal image, X is the known matrix formed by the reference points of the image to be aligned, and H is the unknown affine transformation parameter matrix, which can be solved as:
$$H = (X^{T}X)^{-1}X^{T}X' \qquad (3.4)$$
and after H is solved, carrying out affine transformation on each pixel point of the whole image to be aligned, and combining to obtain a corrected result image.
The bilinear interpolation method mainly aims to solve the problem of deformation and distortion caused by image size conversion, i.e. enlargement or reduction; it calculates the pixel value of the point to be solved by finding the four integer pixel points closest to the corresponding coordinate (i, j) and then carrying out linear interpolation in each of the two directions.
As shown in fig. 12, suppose the pixel values of the points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1) and Q22(x2, y2) in the source image are known, and the pixel value of a point P(x, y) is to be found. First, linear interpolation is carried out in the x direction between Q11(x1, y1) and Q21(x2, y1) to obtain the pixel value I(R_1), and between Q12(x1, y2) and Q22(x2, y2) to obtain the pixel value I(R_2), namely:
$$I(R_1)\approx\frac{x_2-x}{x_2-x_1}I(Q_{11})+\frac{x-x_1}{x_2-x_1}I(Q_{21}),\qquad I(R_2)\approx\frac{x_2-x}{x_2-x_1}I(Q_{12})+\frac{x-x_1}{x_2-x_1}I(Q_{22}) \qquad (3.5)$$
where point R_1 has coordinates (x, y1) and point R_2 has coordinates (x, y2).
Then linear interpolation is carried out in the y direction between R_1 and R_2 to calculate the pixel value I(P) at position (x, y), namely:
$$I(P)\approx\frac{y_2-y}{y_2-y_1}I(R_1)+\frac{y-y_1}{y_2-y_1}I(R_2) \qquad (3.6)$$
After the pixel values of the 4 integer coordinate points adjacent to any point (x, y) of the image are known, the pixel value at the coordinate (x, y) can be obtained by bilinear interpolation as follows:
$$I(x,y)\approx\frac{1}{(x_2-x_1)(y_2-y_1)}\Big[I(Q_{11})(x_2-x)(y_2-y)+I(Q_{21})(x-x_1)(y_2-y)+I(Q_{12})(x_2-x)(y-y_1)+I(Q_{22})(x-x_1)(y-y_1)\Big] \qquad (3.7)$$
Making the pixel value of each point of the target image equal to the pixel value of the corresponding position in the source image completes the image size transformation.
After the human faces are aligned, processing the images by using a ResNet residual convolution neural network, outputting 128-dimensional characteristic vectors, measuring the similarity between the characteristic vectors of the human face images to be recognized and the standard characteristic vectors by using cosine distances, and judging whether the personnel in the images to be recognized are the designated users or not according to the size of the similarity. The final face verification result graph is shown in fig. 13.
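The verification decision itself reduces to a cosine-similarity comparison of 128-dimensional embeddings; a sketch, where embed_face stands in for the ResNet feature-extraction network and the threshold value is an assumption:

```python
# Hedged sketch of the verification decision via cosine similarity of 128-D embeddings.
# embed_face stands in for the ResNet feature extractor; the 0.75 threshold is an assumption.
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_designated_user(aligned_face, user_embedding, embed_face, threshold=0.75):
    probe = embed_face(aligned_face)            # 128-dimensional feature vector
    return cosine_similarity(probe, user_embedding) >= threshold
```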
The valid gesture rules are determined as follows:
the embodiment defines rule definitions of 'effective gesture must look at people' based on identity information, 'effective gesture must be in place' based on position information, 'effective gesture must be concentrated' based on posture information and 'effective gesture must be continuous' based on statistical information, and designs an effective gesture judgment module. The judgment rules of the 4 valid gestures are as follows:
(1) The realization that "an effective gesture must see the person" depends on the face verification technology: only when the designated user is present in the monitored picture is the system allowed to detect the gesture area in the image img, extract gesture features and recognize the gesture category;
(2) "an effective gesture must be in place" means that only the specifically defined gestures are valid, and no other gesture will be responded to by the system;
(3) "an effective gesture must be focused" means that the user must gaze at the poster bearing the gesture marks when making the corresponding gesture, with the yaw angle θ_y of the user's head pose while gazing greater than 30 degrees; only a gesture satisfying this head rotation angle is considered effective;
(4) "an effective gesture must last" means that a gesture, once made, must last at least 3 s, i.e. 90 frames.
If one of the above four items is not satisfied, the gesture is regarded as an invalid gesture, so that the probability of false recognition is greatly reduced.
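The four rules can be combined into a simple per-frame checker, sketched below; the 30-degree and 90-frame constants follow the text, while the class name and gesture labels are assumptions.

```python
# Hedged sketch of the four validity rules as a per-frame checker.
# The 30-degree and 90-frame constants follow the text; names and gesture labels are assumptions.
ALLOWED_GESTURES = {"a", "b", "c", "d", "e", "f", "g", "h"}   # the 8 predefined gestures
REQUIRED_FRAMES = 90                                           # ~3 s at 30 FPS

class ValidGestureChecker:
    def __init__(self):
        self.count, self.last = 0, None

    def update(self, user_verified, gesture, yaw_deg):
        # (1) designated user present, (2) predefined gesture, (3) head yaw greater than 30 deg
        if not (user_verified and gesture in ALLOWED_GESTURES and yaw_deg > 30):
            self.count, self.last = 0, None
            return None
        # (4) the same gesture must persist for at least 90 consecutive frames
        self.count = self.count + 1 if gesture == self.last else 1
        self.last = gesture
        return gesture if self.count >= REQUIRED_FRAMES else None
```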
Example 2
The present embodiment is different from embodiment 1 in that the head pose estimation method in the present embodiment is based on EPnP head pose estimation.
In the EPnP algorithm, the three-dimensional coordinates of all feature points in the world coordinate system are represented by a weighted sum of the coordinates of 4 virtual control points, which must not be coplanar; by solving for the coordinates of these 4 control points in the camera coordinate system, the transformation between the two coordinate systems can be obtained, and the pose information of the head is then calculated from this transformation.
The coordinates of the n feature points in the world coordinate system are denoted $p_i^w,\ i=1,\dots,n$, and the coordinates of the 4 virtual control points are denoted $c_j^w,\ j=1,\dots,4$. Projected into the camera coordinate system, the feature-point coordinates become $p_i^c$ and the virtual control points become $c_j^c$. Each feature point in the two coordinate systems is represented by a weighted sum of the corresponding 4 virtual control points, namely:
$$p_i^w=\sum_{j=1}^{4}\alpha_{ij}c_j^w,\qquad \sum_{j=1}^{4}\alpha_{ij}=1 \qquad (3.8a)$$
$$p_i^c=\sum_{j=1}^{4}\alpha_{ij}c_j^c \qquad (3.8b)$$
According to equation (3.8a), on the premise that $p_i^w$ and $c_j^w$ are known, the weight parameters $\alpha_{ij}$ can be solved; the coordinates of the 4 virtual control points in the camera coordinate system then still need to be obtained. According to the projection imaging model from 3D points to 2D points, the known image points $U_i$ and the spatial points $p_i^c=\sum_{j=1}^{4}\alpha_{ij}c_j^c$ are substituted and expanded to obtain:
$$\lambda_i\begin{bmatrix}u_i\\ v_i\\ 1\end{bmatrix}=K\sum_{j=1}^{4}\alpha_{ij}c_j^c \qquad (3.9)$$
where $\lambda_i$ is the scale factor to be determined; the pixel coordinates $U_i=(u_i,v_i)$ of the feature points, the intrinsic matrix $K$ of the camera and the weight parameters $\alpha_{ij}$ are known, and the coordinates $c_j^c$ of the control points in the camera coordinate system are to be solved. Writing the control points in the camera coordinate system as $c_j^c=[x_j^c,\ y_j^c,\ z_j^c]^T$ and further expanding equation (3.9) gives:
$$\lambda_i\begin{bmatrix}u_i\\ v_i\\ 1\end{bmatrix}=\begin{bmatrix}f_u & 0 & c_u\\ 0 & f_v & c_v\\ 0 & 0 & 1\end{bmatrix}\sum_{j=1}^{4}\alpha_{ij}\begin{bmatrix}x_j^c\\ y_j^c\\ z_j^c\end{bmatrix} \qquad (3.10)$$
Two linear equations can be obtained from the above equation:
$$\sum_{j=1}^{4}\alpha_{ij}\big(f_u x_j^c+(c_u-u_i)z_j^c\big)=0,\qquad \sum_{j=1}^{4}\alpha_{ij}\big(f_v y_j^c+(c_v-v_i)z_j^c\big)=0 \qquad (3.11)$$
On the premise that the weights $\alpha_{ij}$, the 2D feature points $(u_i,v_i)$ and the intrinsic parameters $(f_u,f_v)$, $(c_u,c_v)$ are known, the specific values of $c_j^c$ can be solved. After the coordinates of the 4 control points in the camera coordinate system are obtained through the above steps, substituting them into equation (3.8b), i.e. $p_i^c=\sum_{j=1}^{4}\alpha_{ij}c_j^c$, gives the coordinates of the 3D feature points in the camera coordinate system. With the coordinates of the feature points in the camera coordinate system solved, the rotation matrix and translation matrix are then calculated from the correspondence between the points in the world coordinate system and those in the camera coordinate system.
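In practice this EPnP step can be carried out with OpenCV's solvePnP; the sketch below is one possible implementation under stated assumptions (rough focal-length guess, one particular Euler-angle convention, hypothetical variable names).

```python
# Hedged sketch of EPnP head-pose estimation with OpenCV; the focal-length guess and the
# Euler-angle convention are assumptions, and model/image points are numpy arrays (N x 3, N x 2).
import cv2
import numpy as np

def head_pose(model_points_3d, image_points_2d, frame_size):
    h, w = frame_size
    K = np.array([[w, 0, w / 2.0],            # rough intrinsic matrix (focal length ~ width)
                  [0, w, h / 2.0],
                  [0, 0, 1.0]])
    ok, rvec, tvec = cv2.solvePnP(model_points_3d.astype(np.float64),
                                  image_points_2d.astype(np.float64),
                                  K, None, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)                # rotation matrix of the head
    # yaw / pitch / roll under one common decomposition of R
    yaw = np.degrees(np.arctan2(-R[2, 0], np.sqrt(R[2, 1] ** 2 + R[2, 2] ** 2)))
    pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return yaw, pitch, roll, rvec, tvec
```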
The standard 3D model used in this embodiment is derived from the frontal-head three-dimensional data model of the Institute of Systems and Robotics of the University of Coimbra. In theory, the more face feature points are selected, the more accurate the pose calculation; in practice only a few key feature points are required. This embodiment uses 14 points, whose numbers on the feature-point calibration chart of the standard 3D model are 33, 29, 34, 38, 13, 17, 25, 21, 55, 49, 43, 39, 45 and 6 respectively. The final head pose estimation result is shown in fig. 14.
In order to verify the gesture recognition effect, fusion features are extracted from the test images of the 8 gestures (100 test images per gesture) to test the classifier. Each gesture image yields a 504-dimensional serial feature vector; the class of each test image is determined from the SVM multi-classifier model parameters obtained in the training stage and compared with its true class. The decisions of the SVM multi-classifier are counted and the confusion matrix of gesture recognition is drawn, as shown in Table 1 below.
TABLE 1 test sample gesture recognition results
It can be seen that, except for gesture h, which is never misclassified into another category, the remaining gestures all produce a small amount of misrecognition; the samples of gesture f are the hardest to classify, because that gesture is relatively complex and has low distinctiveness. Overall, most of the test samples are correctly classified and the gesture recognition effect is good.
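A sketch of how such a confusion matrix could be tabulated with scikit-learn (variable names are assumptions):

```python
# Hedged sketch of tabulating the confusion matrix over the 8 x 100 test images.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate(svm_model, test_features, test_labels, class_names):
    pred = svm_model.predict(test_features)
    cm = confusion_matrix(test_labels, pred, labels=class_names)   # rows: true, cols: predicted
    print("overall accuracy:", accuracy_score(test_labels, pred))
    print(np.array2string(cm))
    return cm
```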
Example 3
A feature series fusion gesture recognition device for elderly people comprises a gesture image segmentation module, a gesture recognition module and an effective gesture recognition module;
the gesture image segmentation module comprises video acquisition equipment, a face detection module and a gesture image segmentation submodule; the video acquisition equipment acquires images; the face detection module detects a face in an image acquired by the video acquisition equipment; the gesture image segmentation submodule segments a hand area in an image acquired by the video acquisition equipment;
the gesture recognition module comprises a gesture feature extraction module and a gesture recognition sub-module; the gesture feature extraction module extracts gestures; the gesture recognition sub-module compares and recognizes gestures according to preset gesture classification and extracted gestures;
the effective gesture recognition module recognizes an effective gesture according to a preset rule.
Finally, it should be noted that: the above embodiments are only used to illustrate the present invention and do not limit the technical solutions described in the present invention; thus, while the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted; all such modifications and variations are intended to be included herein within the scope of this disclosure and the present invention and protected by the following claims.

Claims (9)

1. A feature series fusion gesture recognition method for elderly people is characterized by comprising the following steps:
step S1, gesture image segmentation: converting the RGB image into a YCbCr image by using a color space model, and then segmenting the image by using an ellipse model in a skin color model to obtain a gesture part;
step S2, extracting gesture features: performing series fusion on HOG and LBP characteristics by using a series characteristic fusion method, describing gesture characteristics from two angles of edges and textures, and adopting gesture classification recognition based on an SVM;
step S3, effective gesture recognition: and judging whether the recognized gesture is effective or not by adopting a face verification method and a head posture estimation method, if so, recognizing the user requirement by utilizing the gesture, and otherwise, judging that the recognized gesture is ineffective.
2. The method for recognizing the characteristic series-connection fusion gesture of the elderly according to claim 1, wherein the step S1 comprises a step S11 of performing image acquisition by using a monocular camera to perform face detection;
step S12, firstly, analyzing the color space and mapping to the proper color space; then, analyzing a skin color model, and segmenting a skin color area; then, analyzing noise interference, and excluding skin color and skin color-like connected domains outside the hand region; finally, the hand area is intercepted.
3. The feature series fusion gesture recognition method for elderly people according to claim 2, wherein in step S12, after converting the RGB color space into the YCbCr color space, the Y luminance component and the CbCr color components are processed separately, and the expression for converting from the RGB color space to the YCbCr color space is as follows:
$$\begin{bmatrix} Y\\ C_b\\ C_r \end{bmatrix}=\begin{bmatrix} 0.299 & 0.587 & 0.114\\ -0.169 & -0.331 & 0.500\\ 0.500 & -0.419 & -0.081 \end{bmatrix}\begin{bmatrix} R\\ G\\ B \end{bmatrix}+\begin{bmatrix} 0\\ 128\\ 128 \end{bmatrix}$$
modeling the skin color by adopting the elliptical model to obtain a segmented binary image, wherein the elliptical model is described by the following formula:
$$\frac{\big[(x-c_x)\cos\theta+(y-c_y)\sin\theta\big]^2}{a^2}+\frac{\big[-(x-c_x)\sin\theta+(y-c_y)\cos\theta\big]^2}{b^2}=1$$
where (x, y) is a boundary point of the ellipse, (c_x, c_y) is the center of the ellipse, a is the major axis of the ellipse, b is the minor axis of the ellipse, and θ is the rotation angle of the ellipse;
searching the binary image to obtain the outline of each skin color and skin color-like connected domain, and eliminating the interference of the skin color-like region by using an area operator; and finally, removing the interference of the face region by filtering a connected domain containing a face rectangular frame, and only leaving a pure hand region.
4. The feature series fusion gesture recognition method for elderly people according to claim 3, wherein in step S2 the serial feature fusion method is as follows: suppose a gesture image img_i generates a feature vector a_i after HOG feature extraction, and a_i is mapped by PCA dimensionality reduction to a feature vector A_i; after circular-neighbourhood, rotation-invariant, uniform LBP feature extraction, img_i generates a feature vector B_i; the final fused feature vector of the image is then represented as:
C_i = [A_i  B_i]   (2.1).
5. the method for recognizing the serially-connected feature fusion gesture of the elderly according to claim 1, wherein the face verification technology adopts face recognition verification based on ResNet; the face recognition firstly needs to extract features, namely, images are collected through a camera, then a face area is positioned through an SSD face detection network, then face key points are searched through CLM feature point positioning, then the face area is aligned, and finally the feature description of the face is obtained through a feature extraction network; the face alignment adopts affine transformation and a bilinear interpolation method, the affine transformation is linear transformation from two-dimensional coordinates to two-dimensional coordinates, and the mathematical model is as follows:
$$\begin{cases} x' = a_{00}x + a_{01}y + b_{00} \\ y' = a_{10}x + a_{11}y + b_{01} \end{cases} \qquad (3.1)$$
where (x ', y') is the mapped point of (x, y) after affine transformation, the homogeneous coordinate representation of the transformation is:
Figure FDA0002682331100000022
wherein M is represented as an affine transformation matrix and comprises 6 unknown variables, (a)00,a01,a10,a11) Represents linear transformation parameters, (b)00,b01) Representing a translation parameter; the affine transformation matrix formula is: x '═ XH, where X' is constructed from standard frontal image reference pointsAnd forming a known matrix, wherein X is a known matrix formed by reference points of the images to be aligned, H is an unknown affine transformation parameter matrix, and the known matrix can be obtained by the following steps:
H = (X^T X)^(-1) X^T X′    (3.4)
after H is solved, the affine transformation is applied to every pixel of the image to be aligned, and the results are combined to obtain the corrected image;
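A sketch of this alignment step using the normal-equation solution (3.4), assuming NumPy and OpenCV; solve_affine and align_face are hypothetical helper names, and the landmark correspondences are assumed to come from the preceding CLM step:

```python
import numpy as np
import cv2

def solve_affine(src_pts, dst_pts):
    """Least-squares solution of X' = X H for the 6 affine parameters (equation 3.4)."""
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])      # reference points of the image to be aligned
    Xp = np.hstack([dst_pts, np.ones((n, 1))])     # reference points of the standard frontal image
    return np.linalg.inv(X.T @ X) @ X.T @ Xp       # H = (X^T X)^(-1) X^T X'

def align_face(img, src_pts, dst_pts, out_size=(112, 112)):
    """Warp the face with the solved affine matrix; OpenCV applies bilinear interpolation."""
    H = solve_affine(np.asarray(src_pts, float), np.asarray(dst_pts, float))
    M = H.T[:2, :]   # repack the row-vector solution as the 2x3 matrix cv2.warpAffine expects
    return cv2.warpAffine(img, M, out_size, flags=cv2.INTER_LINEAR)
```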
the bilinear interpolation method mainly solves the deformation and distortion caused by image size conversion, i.e. enlargement or reduction; it calculates the pixel value of the point to be solved by finding the four integer pixel points closest to the corresponding coordinate (i, j) and then performing linear interpolation in each of the two directions;
suppose the pixel values of the source image points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1) and Q22(x2, y2) are known and the pixel value of a point P(x, y) is to be obtained; first, linear interpolation in the x direction is performed between the two points Q11(x1, y1) and Q21(x2, y1) to obtain the pixel value I(R1), and between the two points Q12(x1, y2) and Q22(x2, y2) to obtain the pixel value I(R2), namely:
I(R1) ≈ ((x2 - x)/(x2 - x1)) I(Q11) + ((x - x1)/(x2 - x1)) I(Q21)
I(R2) ≈ ((x2 - x)/(x2 - x1)) I(Q12) + ((x - x1)/(x2 - x1)) I(Q22)
where the coordinates of point R1 are (x, y1) and the coordinates of point R2 are (x, y2);
then linear interpolation is performed in the y direction between R1 and R2 to calculate the pixel value I(P) at the position (x, y), namely:
I(P) ≈ ((y2 - y)/(y2 - y1)) I(R1) + ((y - y1)/(y2 - y1)) I(R2)
after the pixel values of the 4 integer coordinate points adjacent to any point (x, y) of the image are known, the pixel value at the (x, y) coordinate can be obtained through the bilinear interpolation method as follows:
I(x, y) ≈ [I(Q11)(x2 - x)(y2 - y) + I(Q21)(x - x1)(y2 - y) + I(Q12)(x2 - x)(y - y1) + I(Q22)(x - x1)(y - y1)] / ((x2 - x1)(y2 - y1))
the pixel value of each point of the target image is set to the interpolated pixel value at the corresponding position in the source image, and the image size transformation is thereby completed;
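A sketch of the resizing step written directly from the two-direction interpolation formulas above; in practice a call such as cv2.resize would normally be used, so this explicit loop is only illustrative:

```python
import numpy as np

def bilinear_resize(src, out_h, out_w):
    """Resize a grayscale image: each target pixel takes the value interpolated
    from the four nearest integer neighbours in the source image."""
    in_h, in_w = src.shape
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for j in range(out_h):
        for i in range(out_w):
            # Corresponding (possibly fractional) coordinate in the source image.
            x = i * (in_w - 1) / max(out_w - 1, 1)
            y = j * (in_h - 1) / max(out_h - 1, 1)
            x1, y1 = int(np.floor(x)), int(np.floor(y))
            x2, y2 = min(x1 + 1, in_w - 1), min(y1 + 1, in_h - 1)
            dx, dy = x - x1, y - y1
            # Interpolate in x between the row neighbours, then in y between the results.
            r1 = (1 - dx) * src[y1, x1] + dx * src[y1, x2]   # I(R1)
            r2 = (1 - dx) * src[y2, x1] + dx * src[y2, x2]   # I(R2)
            out[j, i] = (1 - dy) * r1 + dy * r2              # I(P)
    return out
```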
after the faces are aligned, the images are processed by a ResNet residual convolutional neural network, which outputs 128-dimensional feature vectors; the cosine distance is used to measure the similarity between the feature vector of the face image to be recognized and the standard feature vector, and whether the person in the image to be recognized is the designated user is judged according to the magnitude of the similarity.
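A sketch of the final verification decision, assuming the 128-dimensional embeddings have already been produced by the ResNet network; the 0.6 acceptance threshold is an assumed placeholder, not a value stated in the patent:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_designated_user(query_embedding, enrolled_embedding, threshold=0.6):
    """Accept the face if its 128-D embedding is close enough to the enrolled user's embedding.
    The threshold value is an illustrative assumption."""
    return cosine_similarity(query_embedding, enrolled_embedding) >= threshold
```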
6. The elderly feature series fusion gesture recognition method according to claim 1, wherein the head pose estimation method is EPnP-based head pose estimation; the EPnP algorithm represents the three-dimensional coordinates of all feature points in the world coordinate system as a weighted sum of the coordinates of 4 virtual control points, which must not be coplanar; by solving the coordinates of the 4 control points in the camera coordinate system, the transformation relationship between the two coordinate systems can be obtained, and the head pose information is then calculated from this transformation, specifically as follows:
the coordinates of the n feature points in the world coordinate system are recorded as p_i^w (i = 1, ..., n), and the coordinates of the 4 virtual control points are recorded as c_j^w (j = 1, ..., 4); projected into the camera coordinate system, the feature point coordinates become p_i^c and the virtual control points become c_j^c; each feature point in the two coordinate systems is represented by the weighted sum of the 4 corresponding virtual control points, namely:

p_i^w = Σ_{j=1}^{4} α_ij c_j^w    (3.8a)

p_i^c = Σ_{j=1}^{4} α_ij c_j^c,  with  Σ_{j=1}^{4} α_ij = 1    (3.8b)

according to equation (3.8a), on the premise that p_i^w and c_j^w are known, the weight parameters α_ij can be obtained by solving; next, the coordinates of the 4 virtual control points in the camera coordinate system need to be obtained; according to the projection imaging model from 3D points to 2D points, substituting the known image points u_i and the spatial points p_i^c and expanding gives:

λ_i [u_i, v_i, 1]^T = K p_i^c = K Σ_{j=1}^{4} α_ij c_j^c    (3.9)

where λ_i is the scale factor to be solved; the pixel coordinates u_i of the feature points, the camera intrinsic matrix K and the weight parameters α_ij are known, and the coordinates of the control points in the camera coordinate system are to be obtained; setting the control points to be solved in the camera coordinate system as c_j^c = [x_j^c, y_j^c, z_j^c]^T and further expanding equation (3.9) gives:

λ_i [u_i, v_i, 1]^T = [f_u, 0, c_u; 0, f_v, c_v; 0, 0, 1] Σ_{j=1}^{4} α_ij [x_j^c, y_j^c, z_j^c]^T

two linear equations can be obtained from the above equation:

Σ_{j=1}^{4} α_ij f_u x_j^c + α_ij (c_u - u_i) z_j^c = 0

Σ_{j=1}^{4} α_ij f_v y_j^c + α_ij (c_v - v_i) z_j^c = 0

on the premise that the weights α_ij, the 2D feature points (u_i, v_i) and the intrinsic parameters (f_u, f_v), (c_u, c_v) are known, the specific values of the control point coordinates c_j^c can be solved;

after the coordinates of the 4 control points in the camera coordinate system are obtained through the above steps, they are substituted into formula (3.8b) to obtain the coordinates of the 3D feature points in the camera coordinate system; once the feature point coordinates in the camera coordinate system are solved, the rotation matrix and the translation matrix are calculated from the correspondence between the points in the world coordinate system and the camera coordinate system.
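A sketch of EPnP-based head pose estimation using OpenCV's built-in solver rather than a hand-written EPnP implementation; the rough focal-length guess, the zero-distortion assumption and the Euler-angle convention used to extract yaw are all assumptions for illustration:

```python
import cv2
import numpy as np

def head_pose_yaw(landmarks_2d, model_points_3d, frame_size):
    """Estimate head pose with OpenCV's EPnP solver and return the yaw angle in degrees.
    model_points_3d: 3D facial landmarks in a head/world frame; landmarks_2d: their 2D image positions."""
    h, w = frame_size
    focal = w   # rough focal-length guess; a calibrated intrinsic matrix would normally be used
    K = np.array([[focal, 0, w / 2],
                  [0, focal, h / 2],
                  [0, 0, 1]], dtype=np.float64)
    dist = np.zeros(4)   # assume no lens distortion

    ok, rvec, tvec = cv2.solvePnP(model_points_3d, landmarks_2d, K, dist,
                                  flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None

    R, _ = cv2.Rodrigues(rvec)   # rotation matrix between world and camera frames
    # Rotation about the Y axis, interpreted here as head yaw (axis conventions may differ per rig).
    yaw = np.degrees(np.arctan2(-R[2, 0], np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)))
    return yaw
```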
7. The elderly feature series fusion gesture recognition method according to claim 1, wherein the rules for valid gestures in step S3 include: the rule "a valid gesture must see the person", based on identity information; the rule "a valid gesture must be in place", based on location information; the rule "a valid gesture must be focused", based on pose information; and the rule "a valid gesture must last", based on statistical information.
8. The elderly feature series fusion gesture recognition method according to claim 7, wherein:
(1) the rule that a valid gesture must "see the person" relies on the face verification technology: only when the designated user appears in the monitoring picture is the system allowed to detect the gesture region on img, extract gesture features, and identify the gesture category;
(2) "a valid gesture must be in place" means that only gestures specified in advance are valid, and the system does not respond to any other gesture;
(3) "a valid gesture must be focused" means that the user must watch the poster bearing the gesture mark while making the corresponding gesture; the yaw angle θ_y of the user's head pose while watching must be greater than 30 degrees, and only a gesture satisfying this head rotation angle is considered valid;
(4) "a valid gesture must last" means that the gesture must be held for at least 3 s, i.e. for 90 frames;
if any one of the above four conditions is not satisfied, the gesture is considered invalid (see the sketch below).
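A minimal sketch of the four validity rules, assuming per-frame observations produced by the earlier modules; the gesture set is a hypothetical placeholder, while the 30-degree yaw and 90-frame duration thresholds come from the claim text:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FrameObservation:
    designated_user_present: bool   # result of the face verification step
    gesture_label: Optional[str]    # recognized gesture category, or None
    head_yaw_deg: float             # yaw angle from the head pose estimation

ALLOWED_GESTURES = {"palm", "fist", "ok"}   # hypothetical pre-specified gesture set
MIN_YAW_DEG = 30.0                          # rule (3): head rotation threshold
MIN_DURATION_FRAMES = 90                    # rule (4): about 3 s at 30 fps

def frame_is_valid(obs: FrameObservation) -> bool:
    """Per-frame check of rules (1)-(3)."""
    return (obs.designated_user_present                 # (1) designated user must be in the picture
            and obs.gesture_label in ALLOWED_GESTURES   # (2) only pre-specified gestures count
            and abs(obs.head_yaw_deg) > MIN_YAW_DEG)    # (3) head turned toward the gesture poster

def gesture_is_valid(history: List[FrameObservation]) -> bool:
    """Rule (4): the same valid gesture must persist for at least 90 consecutive frames."""
    if len(history) < MIN_DURATION_FRAMES:
        return False
    window = history[-MIN_DURATION_FRAMES:]
    return (len({obs.gesture_label for obs in window}) == 1
            and all(frame_is_valid(obs) for obs in window))
```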
9. A feature series fusion gesture recognition device for elderly people, characterized by comprising a gesture image segmentation module, a gesture recognition module and a valid gesture recognition module;
the gesture image segmentation module comprises video acquisition equipment, a face detection module and a gesture image segmentation submodule; the video acquisition equipment acquires images; the face detection module detects a face in an image acquired by the video acquisition equipment; the gesture image segmentation submodule segments a hand area in an image acquired by the video acquisition equipment;
the gesture recognition module comprises a gesture feature extraction module and a gesture recognition sub-module; the gesture feature extraction module extracts gesture features; the gesture recognition sub-module compares the extracted features against the preset gesture categories and recognizes the gesture;
the valid gesture recognition module recognizes a valid gesture according to the preset rules.
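A skeleton showing how the claimed modules could be wired together; every class and method name here is a hypothetical placeholder used only to illustrate the data flow between the three modules:

```python
class GestureRecognitionDevice:
    """Hypothetical wiring of the three-module structure described in claim 9."""

    def __init__(self, capture, face_detector, hand_segmenter,
                 feature_extractor, gesture_classifier, rule_checker):
        self.capture = capture                        # video acquisition equipment
        self.face_detector = face_detector            # face detection module
        self.hand_segmenter = hand_segmenter          # gesture image segmentation submodule
        self.feature_extractor = feature_extractor    # gesture feature extraction module
        self.gesture_classifier = gesture_classifier  # gesture recognition sub-module
        self.rule_checker = rule_checker              # valid gesture recognition module

    def process_frame(self):
        img = self.capture.read()
        face = self.face_detector.detect(img)
        hand = self.hand_segmenter.segment(img, face)
        features = self.feature_extractor.extract(hand)
        gesture = self.gesture_classifier.classify(features)
        return self.rule_checker.check(gesture, face)
```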
CN202010965987.6A 2020-09-15 2020-09-15 Feature series fusion gesture recognition method and device for elderly people Pending CN112101208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010965987.6A CN112101208A (en) 2020-09-15 2020-09-15 Feature series fusion gesture recognition method and device for elderly people

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010965987.6A CN112101208A (en) 2020-09-15 2020-09-15 Feature series fusion gesture recognition method and device for elderly people

Publications (1)

Publication Number Publication Date
CN112101208A true CN112101208A (en) 2020-12-18

Family

ID=73758595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010965987.6A Pending CN112101208A (en) 2020-09-15 2020-09-15 Feature series fusion gesture recognition method and device for elderly people

Country Status (1)

Country Link
CN (1) CN112101208A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378773A (en) * 2021-06-29 2021-09-10 北京百度网讯科技有限公司 Gesture recognition method, device, equipment, storage medium and program product
CN113657346A (en) * 2021-08-31 2021-11-16 深圳市比一比网络科技有限公司 Driver action recognition method based on combination of target detection and key point detection
CN113899675A (en) * 2021-10-13 2022-01-07 淮阴工学院 Automatic concrete impermeability detection method and device based on machine vision
CN114564100A (en) * 2021-11-05 2022-05-31 南京大学 Free stereoscopic display hand-eye interaction method based on infrared guidance
CN115713998A (en) * 2023-01-10 2023-02-24 华南师范大学 Intelligent medicine box
CN116484035A (en) * 2023-05-23 2023-07-25 武汉威克睿特科技有限公司 Resume index system and method based on face recognition figure

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846359A (en) * 2018-06-13 2018-11-20 新疆大学科学技术学院 It is a kind of to divide the gesture identification method blended with machine learning algorithm and its application based on skin-coloured regions
CN109086589A (en) * 2018-08-02 2018-12-25 东北大学 A kind of intelligent terminal face unlocking method of combination gesture identification
CN109614922A (en) * 2018-12-07 2019-04-12 南京富士通南大软件技术有限公司 A kind of dynamic static gesture identification method and system
CN110689889A (en) * 2019-10-11 2020-01-14 深圳追一科技有限公司 Man-machine interaction method and device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846359A (en) * 2018-06-13 2018-11-20 新疆大学科学技术学院 It is a kind of to divide the gesture identification method blended with machine learning algorithm and its application based on skin-coloured regions
CN109086589A (en) * 2018-08-02 2018-12-25 东北大学 A kind of intelligent terminal face unlocking method of combination gesture identification
CN109614922A (en) * 2018-12-07 2019-04-12 南京富士通南大软件技术有限公司 A kind of dynamic static gesture identification method and system
CN110689889A (en) * 2019-10-11 2020-01-14 深圳追一科技有限公司 Man-machine interaction method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
缑新科 (GOU Xinke): "Static Gesture Recognition Based on Feature Fusion" (《基于特征融合的静态手势识别》), Computer and Digital Engineering (《计算机与数字工程》), pages 1336-1340 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378773A (en) * 2021-06-29 2021-09-10 北京百度网讯科技有限公司 Gesture recognition method, device, equipment, storage medium and program product
CN113378773B (en) * 2021-06-29 2023-08-08 北京百度网讯科技有限公司 Gesture recognition method, gesture recognition device, gesture recognition apparatus, gesture recognition storage medium, and gesture recognition program product
CN113657346A (en) * 2021-08-31 2021-11-16 深圳市比一比网络科技有限公司 Driver action recognition method based on combination of target detection and key point detection
CN113899675A (en) * 2021-10-13 2022-01-07 淮阴工学院 Automatic concrete impermeability detection method and device based on machine vision
CN114564100A (en) * 2021-11-05 2022-05-31 南京大学 Free stereoscopic display hand-eye interaction method based on infrared guidance
CN114564100B (en) * 2021-11-05 2023-12-12 南京大学 Infrared guiding-based hand-eye interaction method for auto-stereoscopic display
CN115713998A (en) * 2023-01-10 2023-02-24 华南师范大学 Intelligent medicine box
CN116484035A (en) * 2023-05-23 2023-07-25 武汉威克睿特科技有限公司 Resume index system and method based on face recognition figure
CN116484035B (en) * 2023-05-23 2023-12-01 武汉威克睿特科技有限公司 Resume index system and method based on face recognition figure

Similar Documents

Publication Publication Date Title
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
CN112101208A (en) Feature series fusion gesture recognition method and device for elderly people
Sirohey et al. Eye detection in a face image using linear and nonlinear filters
Huang et al. Unsupervised joint alignment of complex images
Zhang et al. Adaptive facial point detection and emotion recognition for a humanoid robot
Metaxas et al. A review of motion analysis methods for human nonverbal communication computing
Pandey et al. Hand gesture recognition for sign language recognition: A review
Kheirkhah et al. A hybrid face detection approach in color images with complex background
Galiyawala et al. Person retrieval in surveillance video using height, color and gender
Gürel Development of a face recognition system
Dalka et al. Human-Computer Interface Based on Visual Lip Movement and Gesture Recognition.
Hoque et al. Computer vision based gesture recognition for desktop object manipulation
Chang et al. Automatic hand-pose trajectory tracking system using video sequences
Thabet et al. Algorithm of local features fusion and modified covariance-matrix technique for hand motion position estimation and hand gesture trajectory tracking approach
Vezzetti et al. Application of geometry to rgb images for facial landmark localisation-a preliminary approach
CN112149598A (en) Side face evaluation method and device, electronic equipment and storage medium
Rabba et al. Discriminative robust gaze estimation using kernel-dmcca fusion
Naji et al. Detecting faces in colored images using multi-skin color models and neural network with texture analysis
Bottino et al. A fast and robust method for the identification of face landmarks in profile images
Paul et al. Extraction of facial feature points using cumulative distribution function by varying single threshold group
Kapse et al. Eye-referenced dynamic bounding box for face recognition using light convolutional neural network
Kathura et al. Hand gesture recognition by using logical heuristics
Hatem et al. Human facial features detection and tracking in images and video
Dhote et al. Overview and an Approach to Real Time Face Detection and Recognition
Zhao et al. The Development of an Identification Photo Booth System based on a Deep Learning Automatic Image Capturing Method.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination