CN108521594B - Free viewpoint video playing method based on motion sensing camera gesture recognition - Google Patents

Free viewpoint video playing method based on motion sensing camera gesture recognition Download PDF

Info

Publication number
CN108521594B
CN108521594B (application CN201810246578.3A)
Authority
CN
China
Prior art keywords
gesture
viewpoint
palm
current frame
switching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810246578.3A
Other languages
Chinese (zh)
Other versions
CN108521594A (en)
Inventor
朱威
宋广振
王义锋
陈朋
郑雅羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810246578.3A priority Critical patent/CN108521594B/en
Publication of CN108521594A publication Critical patent/CN108521594A/en
Application granted granted Critical
Publication of CN108521594B publication Critical patent/CN108521594B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a free viewpoint video playing method based on somatosensory camera gesture recognition, which comprises the steps of acquiring a frame of depth image and human skeleton data, preprocessing the depth image with a bilateral filtering method, calculating the right palm center position and the right arm position, obtaining the right palm contour, extracting the right palm contour features and the right arm position features to judge the gesture, and converting the recognized gesture into the corresponding viewpoint playing operation. Because the depth image is preprocessed with bilateral filtering, the generated depth image better retains the edge contour information in the image. A simple and fast gesture recognition method is designed, which effectively reduces the stutter experienced when switching viewpoints through user gestures. A new method for locally zooming the free viewpoint video picture in and out and a new method for switching viewpoints are designed: the viewpoint picture is locally zoomed in or out and the viewpoint is switched by recognizing the user's gesture, which gives the user a strong sense of immersion when watching free viewpoint video.

Description

Free viewpoint video playing method based on motion sensing camera gesture recognition
Technical Field
The invention relates to the field of free viewpoint video playing, in particular to a free viewpoint video playing method based on somatosensory camera gesture recognition.
Background
With the development of information technology, people's requirements for video playing have become more and more demanding: black-and-white video images have developed into today's color video images, and the original standard definition and high definition have developed into today's ultra and ultra-high definition. However, human vision has physiological limits, and the requirements on resolution and chroma are approaching saturation. High definition and stereoscopy are the two major development directions of video and images in the twenty-first century; high definition has basically been achieved, and stereoscopy is the future development direction. According to the mainstream classification, the display modes of stereoscopic video can be divided into naked-eye 3D display and VR display. Naked-eye 3D display can in turn be divided into two modes: the traditional mode uses a dedicated 3D display, while the other directly uses an ordinary display for free viewpoint playing. The display mode used in the invention plays free viewpoint video with an ordinary display and a somatosensory (motion sensing) camera. At present there are many developers of motion sensing cameras, and the mainstream manufacturers include Microsoft, Intel and Leap Motion. Intel has developed the RealSense motion sensing camera, which, compared with Microsoft's Kinect camera, has a relatively short recognition distance and focuses on short-range applications. Leap Motion developed the Leap Motion Controller, which is very small compared with the former two cameras and is only suitable for close-range gesture recognition and finger tracking. Microsoft's Kinect cameras comprise Kinect v1 and Kinect v2; Kinect v2 is a motion sensing camera based on the ToF (Time of Flight) principle, generates depth images of high quality, and provides human skeleton tracking, basic gesture operation and face tracking functions. The camera provides an SDK (software development kit) for development on the Windows platform, whose API can be called directly for skeleton tracking to obtain the three-dimensional coordinate information of the corresponding postures and skeleton points.
From the technical point of view, gesture recognition can be divided into vision-based gesture recognition and data-glove gesture recognition based on various sensors. The recognition accuracy of the data-glove technique is higher than that of the vision-based technique, but the special data-glove equipment is too expensive to be popularized in daily life. Vision-based gesture recognition can be completed with only a camera, is convenient to use and moderately priced, so the invention performs gesture recognition through a motion sensing camera. The general workflow of a vision-based gesture recognition system (see the survey of vision-based gesture recognition research [J], Acta Electronica Sinica, 2000(02): 118-) is as follows: first, a video data stream is acquired by one or more cameras; then the system detects, according to the interaction model of the gesture input, whether a gesture occurs in the data stream, and if so, the gesture is segmented from the video signal; next, a gesture model is selected for gesture analysis, which includes feature detection and model parameter estimation; in the recognition stage, the gestures are classified according to the model parameters and gesture descriptions are generated as required; finally, the system drives the specific application according to the generated description. The patent with application number 201710263069.7 mainly obtains the palm center and the arm center through a Kinect camera to extract the hand contour, then extracts the position features of the fingertips and finger roots, compares these features with those in a gesture template library to recognize the gesture, and finally puts the recognized gesture into a queue and compares it with the recognition results of the first four frames, taking the gesture that occurs most often as the final recognition result. The final recognition result of this method is accurate, but its processing delay is large, so it is not suitable for real-time switching of free viewpoints. The patent with application number 201610711131.X obtains the depth image data and skeleton data in a gesture video through a Kinect camera, performs preliminary processing on the obtained depth image with median filtering, then performs gesture detection and hand position tracking through the HandTracker class in the NITE library, extracts the contour information of the palm through edge detection, and finally recognizes the gesture by matching against multiple gestures in a decision-tree gesture library. This gesture recognition method has complicated steps and high computational complexity, which is not conducive to real-time switching of the free viewpoint.
Disclosure of Invention
In order to enhance the user's interactivity and sense of participation while a free viewpoint video is playing, and thereby bring the user a better 3D viewing experience, the invention provides a free viewpoint video playing method based on somatosensory camera gesture recognition, which locally zooms the viewpoint picture in and out and switches viewpoints by recognizing the user's gestures. The method comprises the following steps:
(1) acquiring a frame of depth image and human skeleton data
A frame of depth image data I1 and the corresponding human skeleton data S1 are acquired in real time through the motion sensing camera, where I1 is w pixels wide and h pixels high, and S1 contains the three-dimensional skeleton coordinate data of three points: the right arm (elbow) joint, the right wrist joint and the right palm. The acquired frame data are the current frame data.
(2) Depth image preprocessing using bilateral filtering
The depth image I1 obtained in step (1) is preprocessed with a bilateral filtering method, and the bilaterally filtered depth image is denoted I'1. The bilateral filter has the characteristic of better retaining the edge information of the depth image while filtering out noise.
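As an illustration of this preprocessing step, the following minimal C++/OpenCV sketch applies a bilateral filter to an 8-bit depth map; the kernel diameter and the two sigma values are example parameters assumed here, not values specified by the invention.

#include <opencv2/opencv.hpp>

// Minimal sketch: bilateral filtering of a depth frame (assumed already
// converted from the 16-bit Kinect depth map to an 8-bit gray image).
cv::Mat preprocessDepth(const cv::Mat& depth8u)
{
    cv::Mat filtered;
    // d = 9: neighbourhood diameter; sigmaColor = 50: how strongly pixels with
    // differing gray values are mixed; sigmaSpace = 50: spatial extent.
    cv::bilateralFilter(depth8u, filtered, 9, 50.0, 50.0);
    return filtered;   // corresponds to I'1 in the description
}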
(3) Calculating right palm center position and right arm position using human skeletal data
Because the invention switches viewpoints by recognizing user gestures while the free viewpoint video is playing, and the existing methods for calculating the palm center position and the arm position are too complex, a fast calculation method for the palm center position and the arm position is used. The invention directly takes the three-dimensional coordinates of the right palm joint in S1 as the right palm center position coordinates Hc(x1, y1, z1); the right arm position is given by the line connecting the right arm (elbow) joint point Ar(x2, y2, z2) and the right wrist joint point Wr(x3, y3, z3).
(4) Obtaining the outline of the right palm according to the center position of the right palm and the preprocessed depth image
Hc(x1, y1, z1) is mapped to the corresponding position in the depth image I'1, giving H'c(x'1, y'1, z'1). In the practical application scenario of gesture recognition, the hand is closest to the motion sensing camera, i.e. the gray value of the hand area is smaller than that of the other areas and the difference is obvious. Taking the gray value G1 of H'c(x'1, y'1, z'1) as reference, a rectangular area of the depth image I'1 centered on H'c(x'1, y'1, z'1), with width w' pixels and height h' pixels, where w' = w/4 and h' = h/3, is searched; within this rectangular area, the pixels whose gray values lie in [G1 - 10, G1 + 10] are extracted, and this area is marked H. Finally, the gray value of area H is set to 255 and the gray values of the remaining background areas are set to 0 to obtain the right palm contour.
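A minimal sketch of this region extraction is shown below, assuming the palm center has already been mapped to pixel (cx, cy) of the filtered 8-bit depth image; the function name and intermediate variables are introduced here for illustration only.

#include <algorithm>
#include <opencv2/opencv.hpp>

// Sketch of step (4): threshold a w/4 x h/3 window around the mapped palm
// centre to the gray range [G1-10, G1+10] and binarise it (palm = 255).
cv::Mat extractPalmMask(const cv::Mat& depth, int cx, int cy)
{
    const int w = depth.cols, h = depth.rows;
    cv::Rect roi(cx - w / 8, cy - h / 6, w / 4, h / 3);   // w' = w/4, h' = h/3
    roi &= cv::Rect(0, 0, w, h);                          // clip to the image

    const int g = depth.at<uchar>(cy, cx);                // reference gray value G1
    const int lo = std::max(g - 10, 0), hi = std::min(g + 10, 255);

    cv::Mat mask = cv::Mat::zeros(depth.size(), CV_8UC1), roiMask;
    cv::inRange(depth(roi), cv::Scalar(lo), cv::Scalar(hi), roiMask);
    roiMask.copyTo(mask(roi));                            // region H = 255, rest = 0
    return mask;
}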
(5) Extracting the outline characteristic of the right palm and the position characteristic of the right arm
(5-1) Features of region H are extracted. Convex hull detection is performed on region H; if region H is a polygonal area with five convex points and four concave points, processing jumps to step (5-2) for further processing, otherwise it jumps to step (5-3).
(5-2) Region H is further examined to discriminate the fingertips. All convex and concave points in region H are numbered alternately in sequence as z[n], n = 0, 1, ..., 8, giving a single path from z[0] to z[8]. The convex points (fingertips) and concave points (finger roots) in the right palm contour are distinguished by judging whether the path from one point to the next runs downward or upward: if the direction from the current point to the next point is downward, the point is a fingertip; if it is upward, the point is a finger root. All fingertip positions, together with the previously determined right palm center position, are taken as the right palm contour features. The Euclidean distance xn from the first right-hand fingertip to the right palm center and the Euclidean distance zn from the second right-hand fingertip to the right palm center are calculated, and processing jumps to step (6-1) to further judge whether the gesture is a zoom-in or zoom-out gesture.
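The following sketch illustrates one plausible reading of this fingertip/finger-root discrimination, assuming the nine alternating hull points are already ordered along the contour and that image y coordinates grow downward (so a downward path means increasing y); the structure and function names are assumptions made for illustration.

#include <vector>
#include <opencv2/opencv.hpp>

struct PalmFeature { std::vector<cv::Point> tips, roots; };

// Classify each hull point z[i] (i = 0..7) by the direction of the path to
// z[i+1]: path going down -> fingertip, path going up -> finger root.  The
// last point z[8] is not classified by this loop.
PalmFeature classifyHullPoints(const std::vector<cv::Point>& z)
{
    PalmFeature f;
    for (size_t i = 0; i + 1 < z.size(); ++i) {
        if (z[i + 1].y > z[i].y) f.tips.push_back(z[i]);   // downward path
        else                     f.roots.push_back(z[i]);  // upward path
    }
    return f;
}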
(5-3) Using the current-frame point Ar(x2, y2, z2) and point Wr(x3, y3, z3), the angle θ between the right arm and the spatial X axis representing the horizontal rightward direction is extracted. The Euclidean distance d between the two points, i.e. their actual distance in three-dimensional space, is obtained with formula (1):

d = √((x3 − x2)² + (y3 − y2)² + (z3 − z2)²)    (1)

Then, from the distance |x3 − x2| of points Wr(x3, y3, z3) and Ar(x2, y2, z2) along the spatial X axis and the Euclidean distance d, the angle θ between the right arm of the current frame and the spatial X axis representing the horizontal rightward direction is solved, and processing jumps to step (6-2) for further judgment.
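A small sketch of this computation under stated assumptions: d is the three-dimensional distance of formula (1), and θ is taken as the angle between the elbow-to-wrist vector and the horizontal X axis. The signed difference x3 − x2 (rather than its absolute value) is used here so that θ can span 0° to 180°, which the later thresholds [100°, 160°] appear to require; this reading is an assumption, not wording taken from the description.

#include <cmath>

struct Point3 { double x, y, z; };

// Angle (in degrees) between the elbow->wrist vector and the horizontal X axis.
// Assumes the two points are distinct (d > 0).
double armAngleDeg(const Point3& Ar, const Point3& Wr)
{
    const double dx = Wr.x - Ar.x, dy = Wr.y - Ar.y, dz = Wr.z - Ar.z;
    const double d  = std::sqrt(dx * dx + dy * dy + dz * dz);  // formula (1)
    const double pi = std::acos(-1.0);
    return std::acos(dx / d) * 180.0 / pi;                     // 0..180 degrees
}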
(6) Judging the gesture by using the outline characteristic of the right palm and the position characteristic of the right arm
(6-1) The xn and zn of the current frame are compared with the Euclidean distance xm1 from the first right-hand fingertip to the right palm center and the Euclidean distance zm1 from the second right-hand fingertip to the right palm center in the frame m1 frames earlier, where m1 takes a value in the range [10, 60], to recognize the zoom-in and zoom-out gestures: if xn > xm1 and zn > zm1, the current-frame output gesture is judged to be a zoom-in gesture and processing jumps to step (7); if xn < xm1 and zn < zm1, the current-frame output gesture is judged to be a zoom-out gesture and processing jumps to step (7); if neither case holds, or the first and second right-hand fingertip data do not exist in the frame m1 frames earlier, it is judged that the current frame contains no zoom gesture and processing of the current frame ends.
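A compact sketch of this comparison, assuming the fingertip-to-palm distances of the most recent frames are kept in a small history buffer; the buffer, the Gesture enumeration and the default m1 value are assumptions introduced for illustration.

#include <deque>
#include <optional>

enum class Gesture { ZoomIn, ZoomOut };

struct FingerDist { double x, z; };   // distances of first/second fingertip to palm centre

// Compare the current distances with those recorded m1 frames earlier.
std::optional<Gesture> detectZoom(std::deque<FingerDist>& history,
                                  FingerDist current, std::size_t m1 = 30)
{
    std::optional<Gesture> g;
    if (history.size() >= m1) {
        const FingerDist& past = history[history.size() - m1];
        if (current.x > past.x && current.z > past.z)      g = Gesture::ZoomIn;
        else if (current.x < past.x && current.z < past.z) g = Gesture::ZoomOut;
    }
    history.push_back(current);
    if (history.size() > 60) history.pop_front();   // 60 = upper bound of m1
    return g;                                       // empty: no zoom gesture this frame
}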
(6-2) It is judged whether θ of the current frame belongs to [85°, 95°]; if so, processing jumps to step (6-3) to further judge the leftward and rightward viewpoint-switching gestures, otherwise processing of the current frame ends.
(6-3) The angle θm2 between the right arm and the spatial X axis representing the horizontal rightward direction in the frame m2 frames earlier is examined to recognize the leftward and rightward viewpoint-switching gestures, where m2 takes a value in the range [10, 60]. If θm2 satisfies the relation of formula (2), i.e. θm2 belongs to [20°, 80°],

20° ≤ θm2 ≤ 80°    (2)

the current-frame output gesture is judged to be a leftward viewpoint-switching gesture, the viewpoint switching span Vleft of the corresponding number of units is calculated, and processing jumps to step (7). If θm2 satisfies the relation of formula (3), i.e. θm2 belongs to [100°, 160°],

100° ≤ θm2 ≤ 160°    (3)

the current-frame output gesture is judged to be a rightward viewpoint-switching gesture, the viewpoint switching span Vright of the corresponding number of units is calculated, and processing jumps to step (7). If θm2 satisfies neither of the above cases, or θm2 does not exist in the frame m2 frames earlier, it is judged that the current frame contains no viewpoint-switching gesture and processing of the current frame data ends.
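A minimal sketch of this decision, assuming the arm angle of each processed frame is pushed into a history buffer by the caller; the SwitchGesture enumeration, the history container and the default m2 value are assumptions introduced for illustration.

#include <deque>
#include <optional>

enum class SwitchGesture { Left, Right };

// The current frame must hold the arm near vertical (theta in [85, 95] deg);
// the frame m2 frames earlier decides the switching direction.
std::optional<SwitchGesture> detectSwitch(const std::deque<double>& thetaHistory,
                                          double thetaNow, std::size_t m2 = 30)
{
    if (thetaNow < 85.0 || thetaNow > 95.0 || thetaHistory.size() < m2)
        return std::nullopt;                              // no switch gesture
    const double past = thetaHistory[thetaHistory.size() - m2];
    if (past >= 20.0  && past <= 80.0)  return SwitchGesture::Left;   // formula (2)
    if (past >= 100.0 && past <= 160.0) return SwitchGesture::Right;  // formula (3)
    return std::nullopt;
}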
(7) Converting the recognized gesture into corresponding viewpoint playing operation
(7-1) If the recognized gesture is a local-picture zoom-in gesture, the playing operation is realized as follows: the currently playing picture is paused, the spatial coordinates of the right palm center of the current gesture are mapped to the specific position in the image, the picture at that position is locally zoomed in by the corresponding functional module, and finally the zoomed-in picture is output to the display to complete the operation.
(7-2) If the recognized gesture is a local-picture zoom-out gesture, the playing operation is realized as follows: the currently playing picture is paused, the spatial coordinates of the right palm center of the current gesture are mapped to the specific position in the image, the picture at that position is locally zoomed out by the corresponding functional module, and finally the zoomed-out picture is output to the display to complete the operation.
(7-3) If the currently recognized gesture is a leftward viewpoint-switching gesture, a leftward switching signal is sent, a new viewpoint is generated by the free viewpoint synthesis technique, and the new viewpoint is finally output to the display to complete the operation.
(7-4) If the currently recognized gesture is a rightward viewpoint-switching gesture, a rightward switching signal is sent, a new viewpoint is generated by the free viewpoint synthesis technique, and the new viewpoint is finally output to the display to complete the operation.
(7-5) Processing of the current frame data ends.
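As a simple illustration of how the recognized gestures could be dispatched to the playing operations above, the following sketch maps each gesture to a player command; the Player interface and the zoom factors are assumptions introduced here and are not defined by the invention.

#include <iostream>

enum class Action { ZoomIn, ZoomOut, ViewLeft, ViewRight };

struct Player {
    void zoomAt(double px, double py, double factor) {            // pause + local zoom
        std::cout << "zoom x" << factor << " at (" << px << ", " << py << ")\n";
    }
    void switchViewpoint(int units, bool toLeft) {                 // free-viewpoint synthesis
        std::cout << "switch " << units << (toLeft ? " unit(s) left\n" : " unit(s) right\n");
    }
};

void dispatch(Player& p, Action a, double px, double py, int units)
{
    switch (a) {
    case Action::ZoomIn:    p.zoomAt(px, py, 2.0); break;          // example factor
    case Action::ZoomOut:   p.zoomAt(px, py, 0.5); break;
    case Action::ViewLeft:  p.switchViewpoint(units, true);  break;
    case Action::ViewRight: p.switchViewpoint(units, false); break;
    }
}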
The technical conception of the invention is as follows: first, considering that the quality of the depth image acquired by an ordinary motion sensing camera is poor, a bilateral filtering method is used to perform preliminary processing on the depth image after it is acquired, so as to obtain a better edge contour image; then the right palm center position and the right arm position are determined from the skeleton data of the motion sensing camera, the right palm contour is obtained from the right palm center position and the preprocessed depth image, and the right palm contour features and the right arm position features are further extracted; next, the zoom-in and zoom-out gestures for the free viewpoint video are judged from the right palm contour features, and the leftward and rightward viewpoint-switching gestures are judged from the angle between the line connecting the right arm positions and the horizontal rightward direction; finally, the recognized gesture signals are sent to the corresponding modules to realize the corresponding playing functions.
Compared with the prior art, the invention has the following beneficial effects:
the invention uses the bilateral filtering method to preprocess the depth image, so that the generated depth image better retains the edge contour information in the image. A simple and quick gesture recognition method is designed, and the blockage of the viewpoint switching by the user gesture is effectively reduced. A brand-new method for amplifying and reducing the local picture of the free viewpoint video and a method for switching viewpoints are designed, the local amplification and reduction of the viewpoint picture and the viewpoint switching are carried out by recognizing the gesture of a user, and a good blending feeling can be brought to the user when the user watches the free viewpoint video.
Drawings
FIG. 1 is a block flow diagram of the present invention;
FIG. 2 is a block diagram of a decision process of the present invention;
FIG. 3 is a schematic diagram of the rightward viewpoint-switching gesture in the present invention, which is formed by moving from the gesture in the left diagram to the gesture in the right diagram;
FIG. 4 is a schematic diagram of the leftward viewpoint-switching gesture in the present invention, which is formed by moving from the gesture in the left diagram to the gesture in the right diagram.
Detailed Description
The present invention will be described in detail below with reference to examples and drawings, but the present invention is not limited thereto.
The invention adopts a second-generation Microsoft Kinect motion sensing camera and a PC with an Intel Core i5-6500 @ 2.30 GHz processor as the hardware platform. The embodiment of the invention is a free viewpoint video playing method based on Kinect camera gesture recognition, which, as shown in FIG. 1, specifically comprises the following steps:
(1) acquiring a frame of depth image and human skeleton data;
(2) preprocessing the depth image by adopting a bilateral filtering method;
(3) calculating the center position of the right palm and the position of the right arm by using the human skeleton data;
(4) obtaining a right palm outline according to the center position of the right palm and the preprocessed depth image;
(5) extracting the outline characteristic of the right palm and the position characteristic of the right arm;
(6) judging and recognizing the gesture by using the outline characteristic of the right palm and the position characteristic of the right arm;
(7) and converting the recognized gesture into a corresponding viewpoint playing operation.
The step (1) specifically comprises the following steps:
(1-1) The relevant header files of the Kinect camera SDK are called together with OpenCV to obtain the video stream data in real time, and one frame of current depth image data I1 is obtained directly through the Depth function interface, where I1 is w pixels wide and h pixels high.
(1-2) The skeleton point data S1 are obtained through the drawSkeleton function; specifically, the three-dimensional skeleton coordinate data of the ELBOW_RIGHT, WRIST_RIGHT and HAND_RIGHT points of the human body are obtained in real time through the whichone variable.
The step (2) specifically comprises the following steps:
First, the depth image I1 is filtered with a bilateral filter: for each pixel, the gray values of its neighbouring points and their Euclidean distances to the central point are weighted and combined, and the filtered value of that pixel is obtained from this weighted combination. The depth image optimized by bilateral filtering is denoted I'1 and is used in the subsequent steps to acquire the right palm contour and the right arm position.
The step (3) specifically comprises the following steps:
the bone data S1The three-dimensional coordinates of the HAND _ RIGHT skeleton point in (1) are taken as the coordinates H of the center position of the RIGHT palmcPassing through ELBOW _ RIGHT point Ar(x2,y2,z2) And WRIST _ RIGHT point Wr(x3,y3,z3) The right arm position is calculated by connecting the two points.
The step (4) specifically comprises the following steps:
taking H'c(x’1,y’1,z’1) Gray value G of1For reference, search for matching depth image I'1H 'of'c(x’1,y’1,z’1) The method comprises the steps of obtaining a rectangular area with the center width of w 'pixel points and the center height of h' pixel points, wherein w 'is w/4, h' is h/3, and extracting a gray value in [ G ] from the rectangular area1-10,G1+10]The inner pixel area, this area is marked H. And finally setting the gray value of the H area as 255 and the gray values of the rest background areas as 0 to obtain the right palm profile.
The step (5) specifically comprises the following steps:
(5-1) Features of region H are extracted. Convex hull detection is performed on region H; if region H is a polygonal area with five convex points and four concave points, processing jumps to step (5-2) for further processing, otherwise it jumps to step (5-3).
(5-2) Region H is further examined to discriminate the fingertips. All convex and concave points in region H are numbered alternately in sequence as z[n], n = 0, 1, ..., 8, giving a single path from z[0] to z[8]. The convex points (fingertips) and concave points (finger roots) in the right palm contour are distinguished by judging whether the path from one point to the next runs downward or upward: if the direction from the current point to the next point is downward, the point is a fingertip; if it is upward, the point is a finger root. All fingertip positions, together with the previously determined right palm center position, are taken as the right palm contour features. The Euclidean distance xn from the first right-hand fingertip to the right palm center and the Euclidean distance zn from the second right-hand fingertip to the right palm center are calculated, and processing jumps to step (6-1) to further judge whether the gesture is a zoom-in or zoom-out gesture.
(5-3) Using the current-frame point Ar(x2, y2, z2) and point Wr(x3, y3, z3), the angle θ between the right arm and the spatial X axis representing the horizontal rightward direction is extracted. The Euclidean distance d between the two points, i.e. their actual distance in three-dimensional space, is obtained with formula (1):

d = √((x3 − x2)² + (y3 − y2)² + (z3 − z2)²)    (1)

Then, from the distance |x3 − x2| of points Wr(x3, y3, z3) and Ar(x2, y2, z2) along the spatial X axis and the Euclidean distance d, the angle θ between the right arm of the current frame and the spatial X axis representing the horizontal rightward direction is solved, and processing jumps to step (6-2) for further judgment.
(6) Judging the gesture by using the outline characteristic of the right palm and the position characteristic of the right arm
(6-1) The xn and zn of the current frame are compared with the Euclidean distance xm1 from the first right-hand fingertip to the right palm center and the Euclidean distance zm1 from the second right-hand fingertip to the right palm center in the frame m1 frames earlier, where m1 takes a value in the range [10, 60], to recognize the zoom-in and zoom-out gestures: if xn > xm1 and zn > zm1, the current-frame output gesture is judged to be a zoom-in gesture and processing jumps to step (7); if xn < xm1 and zn < zm1, the current-frame output gesture is judged to be a zoom-out gesture and processing jumps to step (7); if neither case holds, or the first and second right-hand fingertip data do not exist in the frame m1 frames earlier, it is judged that the current frame contains no zoom gesture and processing of the current frame ends.
(6-2) It is judged whether θ of the current frame belongs to [85°, 95°]; if so, processing jumps to step (6-3) to further judge the leftward and rightward viewpoint-switching gestures, otherwise processing of the current frame ends.
(6-3) The angle θm2 between the right arm and the spatial X axis representing the horizontal rightward direction in the frame m2 frames earlier is examined to recognize the leftward and rightward viewpoint-switching gestures, where m2 takes a value in the range [10, 60]. If θm2 satisfies the relation of formula (2),

20° ≤ θm2 ≤ 80°    (2)

the current-frame output gesture is judged to be a leftward viewpoint-switching gesture, the viewpoint switching span Vleft of the corresponding number of units is calculated, and processing jumps to step (7). If θm2 satisfies the relation of formula (3),

100° ≤ θm2 ≤ 160°    (3)

the current-frame output gesture is judged to be a rightward viewpoint-switching gesture, the viewpoint switching span Vright of the corresponding number of units is calculated, and processing jumps to step (7). If neither relation holds, or θm2 does not exist in the frame m2 frames earlier, it is judged that the current frame contains no viewpoint-switching gesture and processing of the current frame data ends.
The step (7) specifically comprises:
(7-1) If a zoom-in gesture is detected in the current frame, the currently playing picture is paused, the spatial coordinates of the right palm center position of the current gesture are mapped to the specific position in the free viewpoint image, the picture at that position is locally zoomed in through the resize() function of OpenCV, and finally the zoomed-in picture is output to the display to complete the zoom-in operation.
(7-2) If a zoom-out gesture is detected in the current frame, the currently playing picture is paused, the spatial coordinates of the right palm center position of the current gesture are mapped to the specific position in the free viewpoint image, the picture at that position is locally zoomed out through the resize() function of OpenCV, and finally the zoomed-out picture is output to the display to complete the zoom-out operation.
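One way such a local zoom could be sketched with OpenCV is shown below; the window size, the zoom factor and the paste-back scheme are assumptions made for illustration, the embodiment only states that the resize() function is used.

#include <opencv2/opencv.hpp>

// Crop a window around the mapped palm position, rescale it with cv::resize
// and paste it back onto a copy of the frame (factor > 1 zooms in, < 1 out).
cv::Mat localZoom(const cv::Mat& frame, cv::Point centre, double factor)
{
    const int win = 200;                                       // example window size
    cv::Rect roi(centre.x - win / 2, centre.y - win / 2, win, win);
    roi &= cv::Rect(0, 0, frame.cols, frame.rows);

    cv::Mat zoomed;
    cv::resize(frame(roi), zoomed, cv::Size(), factor, factor, cv::INTER_LINEAR);

    cv::Mat out = frame.clone();
    cv::Rect dst(centre.x - zoomed.cols / 2, centre.y - zoomed.rows / 2,
                 zoomed.cols, zoomed.rows);
    dst &= cv::Rect(0, 0, out.cols, out.rows);
    zoomed(cv::Rect(0, 0, dst.width, dst.height)).copyTo(out(dst));
    return out;
}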
(7-3) If a leftward viewpoint-switching gesture is detected in the current frame, the corresponding signal is triggered and sent to the free viewpoint synthesis end, where a new viewpoint is generated by the free viewpoint synthesis technique; the generated new viewpoint lies the corresponding number of units to the right of the current viewpoint, and switching the viewpoint to the left by one unit corresponds to an actual viewpoint shift of 30 mm to the left. The new viewpoint is then sent to the display to complete the viewpoint-switching process.
(7-4) If a rightward viewpoint-switching gesture is detected in the current frame, the corresponding signal is triggered and sent to the free viewpoint synthesis end, where a new viewpoint is generated by the free viewpoint synthesis technique; the generated new viewpoint lies the corresponding number of units to the left of the current viewpoint, and switching the viewpoint to the right by one unit corresponds to an actual viewpoint shift of 30 mm to the right. The new viewpoint is then sent to the display to complete the viewpoint-switching process.
(7-5) Processing of the current frame data ends.

Claims (5)

1. A free viewpoint video playing method based on somatosensory camera gesture recognition is characterized by comprising the following steps:
(1) acquiring a frame of depth image and human skeleton data:
a frame of depth image I1 and the corresponding human skeleton data S1 are acquired in real time through the motion sensing camera, where I1 is w pixels wide and h pixels high, and S1 contains the three-dimensional skeleton coordinate data of three points: the right arm joint, the right wrist joint and the right palm;
(2) the depth image is preprocessed by adopting a bilateral filtering method:
the depth image I1 is filtered with a bilateral filtering method to eliminate noise interference in the depth image and optimize the edge contour information in the depth image, and the bilaterally filtered depth image is denoted I'1;
(3) Calculating a right palm center position and a right arm position using human skeletal data:
the three-dimensional coordinates of the right palm joint point in S1 are taken as the right palm center position coordinates Hc(x1, y1, z1), and the right arm position is calculated from the line connecting the arm joint point Ar(x2, y2, z2) and the wrist joint point Wr(x3, y3, z3);
(4) obtaining a right palm outline according to the center position of the right palm and the preprocessed depth image:
Hc(x1, y1, z1) is mapped to the depth image I'1 to obtain a corresponding point; taking the gray value of this point as reference, the region of the depth image I'1 adjacent in gray value is matched to obtain a region H, and region H and the gray values of the other areas of I'1 are binarized to obtain a palm contour map;
(5) extracting the outline characteristic of the right palm and the position characteristic of the right arm:
it is first judged whether region H is a polygonal area with five convex points and four concave points; if so, the fingertips and finger roots are further judged, and the Euclidean distance xn from the first right-hand fingertip to the right palm center and the Euclidean distance zn from the second right-hand fingertip to the right palm center are obtained; if not, the angle θ between the right arm of the current frame and the spatial X axis representing the horizontal rightward direction is obtained from Ar(x2, y2, z2) and Wr(x3, y3, z3);
(6) judging and recognizing the gesture by utilizing the palm contour characteristics and the arm position characteristics:
if the Euclidean distance xn from the first right-hand fingertip to the right palm center of the current frame and the Euclidean distance zn from the second right-hand fingertip to the right palm center are obtained in step (5), xn and zn are compared with the Euclidean distance xm1 from the first right-hand fingertip to the right palm center and the Euclidean distance zm1 from the second right-hand fingertip to the right palm center in the frame m1 frames earlier, where m1 takes a value in the range [10, 60], to recognize a zoom-in gesture or a zoom-out gesture; if neither the zoom-in nor the zoom-out gesture condition is met, processing of the current frame ends;
if the angle θ between the right arm of the current frame and the spatial X axis representing the horizontal rightward direction is obtained in step (5), it is judged whether θ of the current frame belongs to [85°, 95°]; if so, the angle θm2 between the right arm and the spatial X axis representing the horizontal rightward direction in the frame m2 frames earlier is judged to recognize a leftward viewpoint-switching gesture or a rightward viewpoint-switching gesture, where m2 takes a value in the range [10, 60]; if the viewpoint-switching gesture condition is not met, processing of the current frame ends;
(7) converting the recognized gesture into a corresponding operation:
converting the recognized zooming-in gesture or zooming-out gesture into zooming-in or zooming-out operation of a video picture; converting the recognized gesture for switching the viewpoint leftwards or switching the viewpoint rightwards into leftward switching operation of the viewpoint or rightward switching operation of the viewpoint; the processing of the current frame data is ended.
2. The free viewpoint video playing method based on somatosensory camera gesture recognition according to claim 1, wherein in step (4), Hc(x1, y1, z1) is mapped to the depth image I'1 to obtain H'c(x'1, y'1, z'1); taking the gray value G1 of H'c(x'1, y'1, z'1) as reference, a rectangular area of the depth image I'1 centered on H'c(x'1, y'1, z'1), with width w' pixels and height h' pixels, where w' = w/4 and h' = h/3, is searched; within this rectangular area, the pixels whose gray values lie in [G1 - 10, G1 + 10] are extracted, and this area is marked H; finally, the gray value of area H is set to 255 and the gray values of the remaining background areas are set to 0 to obtain the right palm contour.
3. The method for playing the free viewpoint video based on the somatosensory camera gesture recognition as claimed in claim 1, wherein the step (5) comprises the following substeps:
(5-1) convex hull detection is performed on region H; if region H is a polygonal area with five convex points and four concave points, all the convex and concave points of region H are numbered alternately in sequence as z[n], n = 0, 1, ..., 8, giving a path from z[0] to z[8]; the convex points (fingertips) and concave points (finger roots) of the palm contour are distinguished by judging whether the path from one point to the next runs downward or upward: if the direction from the current point to the next point is downward, the point is a fingertip, and if it is upward, the point is a finger root; all the fingertip positions, together with the previously determined right palm center position, are taken as the right palm contour features; the Euclidean distance xn from the first right-hand fingertip to the right palm center and the Euclidean distance zn from the second right-hand fingertip to the right palm center are calculated;
(5-2) if region H does not have five convex points and four concave points, the angle θ between the right arm and the spatial X axis representing the horizontal rightward direction is extracted from the current-frame point Ar(x2, y2, z2) and point Wr(x3, y3, z3): the Euclidean distance d between the two points, i.e. their actual distance in three-dimensional space, is obtained with formula (1),

d = √((x3 − x2)² + (y3 − y2)² + (z3 − z2)²)    (1)

and then, from the distance |x3 − x2| of points Wr(x3, y3, z3) and Ar(x2, y2, z2) along the spatial X axis and the Euclidean distance d, the angle θ between the right arm of the current frame and the spatial X axis representing the horizontal rightward direction is calculated.
4. The method for playing the free viewpoint video based on the somatosensory camera gesture recognition as claimed in claim 1, wherein the step (6) comprises the following sub-steps:
(6-1) if the Euclidean distance xn from the first right-hand fingertip to the right palm center of the current frame and the Euclidean distance zn from the second right-hand fingertip to the right palm center are obtained in step (5), it is judged whether the zoom-in or zoom-out gesture condition is met: if xn > xm1 and zn > zm1, the current-frame output gesture is judged to be a zoom-in gesture; if xn < xm1 and zn < zm1, the current-frame output gesture is judged to be a zoom-out gesture; if neither case holds, or the first and second right-hand fingertip data do not exist in the frame m1 frames earlier, it is judged that the current frame contains no zoom gesture, and processing of the current frame ends;
(6-2) if the angle θ between the right arm of the current frame and the spatial X axis representing the horizontal rightward direction is obtained in step (5), it is judged whether θ belongs to [85°, 95°]; if so, whether the gesture is a leftward or rightward viewpoint-switching gesture is further judged through step (6-3), otherwise processing of the current frame ends;
(6-3) the angle θm2 between the right arm and the spatial X axis representing the horizontal rightward direction in the frame m2 frames earlier is taken: if θm2 belongs to [20°, 80°], i.e. satisfies formula (2),

20° ≤ θm2 ≤ 80°    (2)

the current-frame output gesture is judged to be a leftward viewpoint-switching gesture and the viewpoint switching span Vleft of the corresponding number of units is calculated; if θm2 belongs to [100°, 160°], i.e. satisfies formula (3),

100° ≤ θm2 ≤ 160°    (3)

the current-frame output gesture is judged to be a rightward viewpoint-switching gesture and the viewpoint switching span Vright of the corresponding number of units is calculated; if θm2 satisfies neither of the above cases, or θm2 does not exist in the frame m2 frames earlier, it is judged that the current frame contains no viewpoint-switching gesture, and processing of the current frame data ends.
5. The method for playing the free viewpoint video based on the somatosensory camera gesture recognition as claimed in claim 1, wherein the step (7) comprises the following sub-steps:
(7-1) if the recognized gesture is a local screen zooming-in gesture, realizing a playing operation in the following way: pausing a current playing picture, mapping the space coordinate of the center of the right palm of the current gesture to a specific position in the image, then locally amplifying the picture at the position through a corresponding functional module, and finally outputting the amplified picture to a display to finish the operation;
(7-2) if the recognized gesture is a zoom-out gesture of the local screen, implementing a play operation by: pausing a current playing picture, mapping the space coordinate of the center of the right palm of the current gesture to a specific position in the image, then locally reducing the picture of the position through a corresponding functional module, and finally outputting the reduced picture to a display to finish the operation;
(7-3) if the currently recognized gesture is a gesture for switching the viewpoint leftwards, sending a gesture signal for switching leftwards, generating a new viewpoint through a free viewpoint synthesis technology, and finally outputting the new viewpoint to a display to complete the operation;
(7-4) if the currently recognized gesture is a gesture for switching the viewpoint to the right, sending a gesture signal for switching the viewpoint to the right, generating a new viewpoint through a free viewpoint synthesis technology, and finally outputting the new viewpoint to a display to complete the operation;
and (7-5) ending the processing of the current frame data.
CN201810246578.3A 2018-03-23 2018-03-23 Free viewpoint video playing method based on motion sensing camera gesture recognition Active CN108521594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810246578.3A CN108521594B (en) 2018-03-23 2018-03-23 Free viewpoint video playing method based on motion sensing camera gesture recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810246578.3A CN108521594B (en) 2018-03-23 2018-03-23 Free viewpoint video playing method based on motion sensing camera gesture recognition

Publications (2)

Publication Number Publication Date
CN108521594A CN108521594A (en) 2018-09-11
CN108521594B true CN108521594B (en) 2020-08-11

Family

ID=63434053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810246578.3A Active CN108521594B (en) 2018-03-23 2018-03-23 Free viewpoint video playing method based on motion sensing camera gesture recognition

Country Status (1)

Country Link
CN (1) CN108521594B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110989844A (en) * 2019-12-16 2020-04-10 广东小天才科技有限公司 Input method, watch, system and storage medium based on ultrasonic waves
CN112346642B (en) * 2020-11-10 2022-03-08 交控科技股份有限公司 Train information display method and device, electronic equipment and system
CN114041767A (en) * 2021-10-11 2022-02-15 宁波春建电子科技有限公司 Heart rate detection method based on depth camera and millimeter wave radar

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102520791A (en) * 2011-11-28 2012-06-27 北京盈胜泰科技术有限公司 Wireless gesture recognition device
CN102622083A (en) * 2012-02-24 2012-08-01 北京盈胜泰科技术有限公司 Hand gesture recognition method and hand gesture recognition device
CN103167230A (en) * 2011-12-17 2013-06-19 富泰华工业(深圳)有限公司 Electronic equipment and method controlling shooting according to gestures thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100027845A1 (en) * 2008-07-31 2010-02-04 Samsung Electronics Co., Ltd. System and method for motion detection based on object trajectory
US9063573B2 (en) * 2011-02-17 2015-06-23 The Board Of Trustees Of The Leland Stanford Junior University Method and system for touch-free control of devices

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102520791A (en) * 2011-11-28 2012-06-27 北京盈胜泰科技术有限公司 Wireless gesture recognition device
CN103167230A (en) * 2011-12-17 2013-06-19 富泰华工业(深圳)有限公司 Electronic equipment and method controlling shooting according to gestures thereof
CN102622083A (en) * 2012-02-24 2012-08-01 北京盈胜泰科技术有限公司 Hand gesture recognition method and hand gesture recognition device

Also Published As

Publication number Publication date
CN108521594A (en) 2018-09-11

Similar Documents

Publication Publication Date Title
Li et al. Delving into egocentric actions
CN108388882B (en) Gesture recognition method based on global-local RGB-D multi-mode
WO2019228473A1 (en) Method and apparatus for beautifying face image
US8615108B1 (en) Systems and methods for initializing motion tracking of human hands
US9092665B2 (en) Systems and methods for initializing motion tracking of human hands
CN108062525B (en) Deep learning hand detection method based on hand region prediction
JP4768196B2 (en) Apparatus and method for pointing a target by image processing without performing three-dimensional modeling
US6204852B1 (en) Video hand image three-dimensional computer interface
US20110025834A1 (en) Method and apparatus of identifying human body posture
CN102096471B (en) Human-computer interaction method based on machine vision
CN108521594B (en) Free viewpoint video playing method based on motion sensing camera gesture recognition
JP5438601B2 (en) Human motion determination device and program thereof
CN104813258A (en) Data input device
KR101642402B1 (en) Apparatus and method for capturing digital image for guiding photo composition
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
TW201514830A (en) Interactive operation method of electronic apparatus
CN102880865A (en) Dynamic gesture recognition method based on complexion and morphological characteristics
CN103995595A (en) Game somatosensory control method based on hand gestures
JP6331761B2 (en) Determination device, determination method, and determination program
JP2010057105A (en) Three-dimensional object tracking method and system
JP2014029656A (en) Image processor and image processing method
CN106484108A (en) Chinese characters recognition method based on double vision point gesture identification
Mohatta et al. Robust hand gestural interaction for smartphone based AR/VR applications
Ramirez-Giraldo et al. Kernel based hand gesture recognition using kinect sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant