CN112657176A - Binocular projection man-machine interaction method combined with portrait behavior information - Google Patents

Binocular projection man-machine interaction method combined with portrait behavior information

Info

Publication number
CN112657176A
CN112657176A (application CN202011642041.2A)
Authority
CN
China
Prior art keywords
straight line
image
coordinate system
binocular
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011642041.2A
Other languages
Chinese (zh)
Inventor
谢巍
许练濠
吴伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202011642041.2A priority Critical patent/CN112657176A/en
Publication of CN112657176A publication Critical patent/CN112657176A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a binocular projection man-machine interaction method combining portrait behavior information, which comprises the following steps: carrying out edge detection on the camera capture area; carrying out straight-line detection on the edge-detected image; solving the mapping relation under view-angle transformation through a homography transformation matrix; identifying the interactive object with the YOLOv3 target detection algorithm, and mapping the projection-area coordinates to game coordinates in the Unity3D development scene; obtaining the depth information of the interactive object through binocular camera ranging; obtaining the positions of the skeletal key points of the interactor through a Kinect camera, and virtualizing a character object and the corresponding interactive actions in Unity3D software. The invention performs human-computer interaction with a deep learning method and, by combining the depth information of the interactive object measured by the binocular camera with the portrait behavior information of the interactor, greatly improves the interactivity of the projection human-computer interaction system.

Description

Binocular projection man-machine interaction method combined with portrait behavior information
Technical Field
The invention relates to the technical fields of image processing, feature extraction, feature analysis, computer vision, convolutional neural networks, target detection, human-computer interaction and the like, in particular to a binocular projection human-computer interaction method combining portrait behavior information.
Background
With the development of science and technology, human-computer interaction technology has diversified: people are no longer satisfied with merely presenting virtual scenes and have begun to explore ways of interacting with the virtual world, so more and more novel human-computer interaction technologies have emerged. Human-computer interaction techniques fall into several categories: traditional interaction technology that takes the keyboard and mouse as input; interaction technologies based on touch-screen devices, such as smartphones and tablet computers; and non-contact interaction technologies based on machine vision and image processing, such as virtual keyboards and gesture interaction systems.
The keyboard and mouse are currently the most mature interactive devices, and human-computer interaction based on them was the earliest form applied to computer operation. This interaction mode is stable and responsive and is widely used in daily office work. Its disadvantages, however, are that a complete human-computer interaction process requires a keyboard, a mouse, a screen for displaying a graphical interface and other devices, which are numerous and bulky. With the development of touch-screen technology and mobile devices, the display and interaction functions of new mobile products such as smartphones have been integrated into a single screen; their portability and ease of operation led to rapid, widespread adoption and changed people's way of life. The diversified demands on human-computer interaction are prompting the search for more natural, less constrained interaction modes.
In the literature (Goto H, Takemura D, Kawasaki Y, et al. Development of an Information Projection Interface Using a Projector-Camera System [J]. Electronics and Communications in Japan, 2013, 96(11): 70-81), Hiroki Goto et al. studied a camera-projection interaction system based on the frame-difference method and hand skin-color extraction: the hand is first separated from the scene using the clustering characteristics of hand skin color in the HSV and YCbCr spaces, and the fingertip position is then detected on the separated foreground image by template matching, thereby realizing projection interaction between a user and a computer or home television. In the literature (Fitriani, Goh W. B. Interacting with projected media on deformable surfaces [C]. Rio de Janeiro, Brazil: IEEE International Conference on Computer Vision, 2007: 1-6.), Fitriani et al. propose a human-computer interaction system based on a deformable projection surface, which projects a virtual scene onto the surface of a deformable object, detects the deformation produced when a user touches the projection screen, and analyzes the interaction information through an image processing algorithm and a deformation model of the object. These schemes based on machine vision and image processing algorithms share the following problems: the diversity of the projection scene cannot be guaranteed, and the dependence on peripherals is large. In an interaction system based on hand skin color, for example, the hand foreground separation algorithm fails when the projected scene is similar in color to the hand.
Disclosure of Invention
The invention discloses a binocular projection human-computer interaction method combined with portrait behavior information, and aims to perform human-computer interaction with a deep learning method while combining the depth information of the interactive object measured by a binocular camera with the portrait behavior information of the interactor, so as to greatly improve the interactivity of the projection human-computer interaction system.
The invention is realized by at least one of the following technical schemes.
A binocular projection human-computer interaction method combined with portrait behavior information comprises the following steps:
acquiring image data by using a camera, and carrying out edge detection on a camera capturing area;
carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
solving the mapping relation under the view angle transformation through the homography transformation matrix;
identifying the interactive objects by using a YOLOv3 target detection algorithm, and obtaining game coordinates mapped to the Unity3D development scene from the projection area coordinates according to the solved homography transformation matrix;
obtaining depth information of an interactive object through binocular camera ranging;
and virtualizing the character object by obtaining the positions of the skeleton key points of the interactive person, and generating corresponding interactive actions according to the distribution of the character joint points.
Preferably, the camera image is converted to grayscale before edge detection; the edges are the set of points in the image where the brightness changes markedly, and a large-scale Canny filter operator is adopted to detect the image edges.
Preferably, a gaussian filter is used to smooth the image and filter out noise when edge detection is performed;
A Gaussian kernel of size (2k+1) × (2k+1) is generated by the following formula:

$$H_{ij}=\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{\bigl(i-(k+1)\bigr)^{2}+\bigl(j-(k+1)\bigr)^{2}}{2\sigma^{2}}\right),\qquad i,j\in[1,\,2k+1]$$

where k is a positive integer and σ² is the variance of the Gaussian function. Taking σ = 1.4 and k = 1 yields the corresponding 3 × 3 Gaussian convolution kernel.

The Gaussian kernel is convolved with the grayscale image to obtain a smoothed image.

The gradient strength and direction of each pixel in the image are then calculated using the Sobel operators in the horizontal and vertical directions:

$$S_{x}=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix},\qquad S_{y}=\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}$$

where S_x is the horizontal Sobel operator and S_y is the vertical Sobel operator; each is convolved with the smoothed image to obtain the first derivatives G_x and G_y of each pixel in the two directions, from which the gradient of the pixel is calculated:

$$G=\sqrt{G_{x}^{2}+G_{y}^{2}},\qquad \theta=\arctan\frac{G_{y}}{G_{x}}$$
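By way of illustration only (not part of the original disclosure), the following sketch builds the Gaussian kernel of the above formula with σ = 1.4 and k = 1 and computes the Sobel gradient strength and direction; the helper names and the use of NumPy/SciPy are assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(k=1, sigma=1.4):
    """(2k+1)x(2k+1) Gaussian kernel from the formula above, normalised to sum to 1."""
    size = 2 * k + 1
    i, j = np.mgrid[1:size + 1, 1:size + 1]            # indices i, j in [1, 2k+1]
    h = np.exp(-((i - (k + 1)) ** 2 + (j - (k + 1)) ** 2) / (2 * sigma ** 2))
    h /= 2 * np.pi * sigma ** 2
    return h / h.sum()

def gradient(gray):
    """Gradient strength and direction of the Gaussian-smoothed grayscale image."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # horizontal Sobel
    sy = sx.T                                                          # vertical Sobel
    smooth = convolve2d(gray, gaussian_kernel(), mode="same", boundary="symm")
    gx = convolve2d(smooth, sx, mode="same", boundary="symm")
    gy = convolve2d(smooth, sy, mode="same", boundary="symm")
    return np.hypot(gx, gy), np.arctan2(gy, gx)
```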
preferably, the non-maxima suppression comprises the steps of:
1) comparing the gradient strength of the current pixel with that of the two pixels along the positive and negative gradient directions;
2) if the gradient strength of the current pixel is the largest of the three, keeping the pixel as an edge point; otherwise suppressing it.
Preferably, performing straight-line detection on the edge-detected image through Hough line detection to locate the projection area specifically comprises:
For a straight line y = kx + b in the Cartesian coordinate system, where (x, y) denotes a coordinate point, k the slope of the line and b its intercept, the line can be rewritten as b = y - xk. Defining the abscissa of Hough space as k and the ordinate as b, b = y - xk is a straight line in Hough space with slope -x and intercept y. Several points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) lying on one straight line in the Cartesian coordinate system therefore correspond to several straight lines in Hough space, and the common intersection point (k, b) of those lines gives the slope and intercept of that straight line in the Cartesian coordinate system;
the Hough transform is performed in polar form: a straight line is expressed by the polar equation ρ = x·cos θ + y·sin θ, where ρ is the polar distance, i.e. the distance from the origin to the line in polar space, and θ is the polar angle, i.e. the angle between the x-axis and the line segment through the origin perpendicular to the line. Defining the abscissa of Hough space as θ and the ordinate as ρ, several points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) on one straight line in the polar coordinate system correspond to several curves in Hough space, and the common intersection point (θ, ρ) of those curves gives the polar angle and polar distance of that straight line in the polar coordinate system;
the four boundary lines of the projection region are intersected pairwise to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
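As an illustration of this pairwise-intersection step, the small helper below solves the 2 × 2 linear system formed by two boundary lines given in polar form (θ, ρ); the function name and inputs are hypothetical.

```python
import numpy as np

def intersect_polar_lines(line1, line2):
    """Each line is (theta, rho) with x*cos(theta) + y*sin(theta) = rho."""
    (t1, r1), (t2, r2) = line1, line2
    a = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    b = np.array([r1, r2])
    x, y = np.linalg.solve(a, b)      # raises LinAlgError if the lines are parallel
    return x, y

# e.g. top-left vertex = intersection of the detected top and left boundary lines:
# (x_lt, y_lt) = intersect_polar_lines(top_line, left_line)
```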
Preferably, the solving of the mapping relationship under the view angle transformation through the homography transformation matrix specifically includes:
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1); that is, a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography matrix H:

$$\begin{bmatrix}X\\Y\\Z\end{bmatrix}=H\begin{bmatrix}x\\y\\1\end{bmatrix},\qquad H=\begin{bmatrix}h_{1}&h_{2}&h_{3}\\h_{4}&h_{5}&h_{6}\\h_{7}&h_{8}&h_{9}\end{bmatrix}$$

where h_1~h_9 are the 9 transformation parameters of the homography matrix. The mapping from the x-y plane coordinate system to the x'-y' plane coordinate system then follows:

$$x'=\frac{X}{Z}=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}},\qquad y'=\frac{Y}{Z}=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+h_{9}}$$

The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom: multiplying H by a scaling factor k gives

$$x'=\frac{k(h_{1}x+h_{2}y+h_{3})}{k(h_{7}x+h_{8}y+h_{9})}=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}}$$

i.e. kH and H represent the same mapping, so H has only 8 degrees of freedom. The homography matrix H is therefore solved either by adding a constraint on its modulus or by setting h_9 = 1.
Preferably, h_9 is set to 1, and the equations to be solved are:

$$x'=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+1},\qquad y'=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+1}$$

Alternatively, the homography matrix H is constrained to have modulus 1:

$$\lVert H\rVert=\sqrt{h_{1}^{2}+h_{2}^{2}+\cdots+h_{9}^{2}}=1$$

and the equations to be solved are then:

$$x'=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}},\qquad y'=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+h_{9}},\qquad \text{s.t.}\ \lVert H\rVert=1$$

The target coordinate points, in the projection scene coordinate system, of the four projection-area vertices obtained in the pixel coordinate system are defined, and substituting these four point correspondences solves the H matrix:

$$\begin{bmatrix}x'_{i}\\y'_{i}\\1\end{bmatrix}\sim H\begin{bmatrix}x_{i}\\y_{i}\\1\end{bmatrix},\qquad i=1,2,3,4$$
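A minimal sketch of one way to solve H from the four vertex correspondences under the unit-modulus constraint, using the direct linear transform and SVD; this is an assumed implementation for illustration, not code from the patent.

```python
import numpy as np

def solve_homography(src_pts, dst_pts):
    """src_pts, dst_pts: 4x2 arrays of (x, y) and (x', y') correspondences."""
    rows = []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    a = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(a)
    h = vt[-1]                          # null-space vector with ||h|| = 1
    return (h / h[-1]).reshape(3, 3)    # optionally rescale so that h9 = 1
```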
preferably, the identifying the interactive object by using the YOLOv3 target detection algorithm, and obtaining the game coordinate mapped from the projection area coordinate to the Unity3D development scene according to the solved homography transformation matrix specifically includes:
The loss function of YOLOv3 is as follows:

$$\begin{aligned}Loss={}&\lambda_{coord}\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\bigl\lVert r_{ij}-\hat r_{ij}\bigr\rVert^{2}+\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\bigl(C_{ij}-\hat C_{ij}\bigr)^{2}\\&+\lambda_{noobj}\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{noobj}\bigl(C_{ij}-\hat C_{ij}\bigr)^{2}+\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\sum_{c=1}^{classes}\bigl(p_{ij}(c)-\hat p_{ij}(c)\bigr)^{2}\end{aligned}$$

where the first term is the coordinate error loss and λ_coord is the coordinate loss coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of bounding boxes in one grid; 1_ij^obj indicates whether the j-th box of the i-th grid contains an object, taking the value 1 when it does and 0 when it does not; x and y denote the center coordinates of a box and w and h its width and height, and r_ij and r̂_ij denote the (x, y, w, h) of the predicted box and of the real box, respectively. The second and third terms are the confidence loss: 1_ij^noobj indicates whether the j-th box of the i-th grid contains no object, taking the value 1 when it does not and 0 when it does; λ_noobj balances the loss weights of grids with and without objects, the goal being to reduce the confidence loss of boxes that contain no object; C_ij and Ĉ_ij denote the predicted and real confidences of the j-th box of the i-th grid. The last term is the classification loss: classes denotes the number of categories, and p_ij(c) and p̂_ij(c) denote the predicted and real probabilities that the j-th box of the i-th grid belongs to an object of class c.
Preferably, the obtaining of the depth information of the interactive object through binocular camera ranging specifically includes:
correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
matching pixel points of the two corrected images to obtain a disparity map;
and calculating the depth of each pixel according to the matching result, thereby obtaining a depth map.
Preferably, the obtaining of the skeletal key point positions of the interactor through the Kinect camera and the virtualization of the character object and the corresponding interactive actions in the Unity3D software specifically include:
identifying the skeleton structure of the target person by using the skeleton tracking function of the Kinect somatosensory device, acquiring the depth data of the target person and the related information of the color image, displaying the obtained skeleton structure information of the person in real time and saving the joint point data of the person; the interactivity of the human-computer interaction system is improved by virtualizing character objects in the Unity3D software to imitate the interactive actions of the interactor.
Compared with the prior art, the invention has the beneficial effects that: the target detection is carried out by a deep learning method, so that the interference of environmental factors can be avoided, the diversity of interactive scenes is ensured, and meanwhile, the interactivity of a human-computer interaction system is improved by combining the depth information of an interactive object acquired by a binocular camera and the human behavior information of an interactive person acquired by a Kinect camera.
Drawings
Fig. 1 is a hardware schematic diagram of a binocular projection human-computer interaction method combining with portrait behavior information according to the embodiment;
fig. 2 is a flowchart of a binocular projection human-computer interaction method combining with portrait behavior information according to the present embodiment;
FIG. 3 is a schematic diagram of detection of a Cartesian coordinate Hough line according to the embodiment;
fig. 4 is a schematic diagram of a hough line detection algorithm of the polar coordinate system in this embodiment;
FIG. 5 is a schematic diagram of homography transformation of the present embodiment;
fig. 6 is a frame diagram of an actual binocular system according to the present embodiment;
fig. 7 is a frame diagram of an ideal binocular vision system of the present embodiment.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
As shown in fig. 2, a binocular projection human-computer interaction method combining portrait behavior information includes the following steps:
s1, performing edge detection on the camera capturing area through a large-scale canny filtering operator, and performing non-maximum suppression;
carrying out graying processing on the camera image;
The edges are the set of points in the image where the brightness changes markedly, and the gradient numerically reflects this rate of change; based on the principle that the boundary of the projection area must not be missed, a large-scale Canny filter operator is adopted to detect the image edges;
using a Gaussian filter to smooth the image and filter out noise;
A Gaussian kernel of size (2k+1) × (2k+1) is generated by the following formula:

$$H_{ij}=\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{\bigl(i-(k+1)\bigr)^{2}+\bigl(j-(k+1)\bigr)^{2}}{2\sigma^{2}}\right),\qquad i,j\in[1,\,2k+1]$$

where k is a positive integer and σ² is the variance of the Gaussian function. Taking σ = 1.4 and k = 1 yields the corresponding 3 × 3 Gaussian convolution kernel.

The Gaussian kernel is convolved with the grayscale image to obtain a smoothed image.

The gradient strength and direction of each pixel in the image are then calculated using the Sobel operators in the horizontal and vertical directions:

$$S_{x}=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix},\qquad S_{y}=\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}$$

where S_x is the horizontal Sobel operator and S_y is the vertical Sobel operator; each is convolved with the smoothed image to obtain the first derivatives G_x and G_y of each pixel in the two directions, from which the gradient of the pixel is calculated:

$$G=\sqrt{G_{x}^{2}+G_{y}^{2}},\qquad \theta=\arctan\frac{G_{y}}{G_{x}}$$
non-maxima suppression is applied to eliminate spurious responses due to edge detection:
For each pixel in the gradient image, keeping or discarding the point cannot be decided by a single threshold alone, and the final edge image is expected to describe the contours of the source image accurately, so non-maximum suppression is required:
1) the gradient strength of the current pixel is compared with that of the two pixels along the positive and negative gradient directions;
2) if the gradient strength of the current pixel is the largest of the three, the pixel is kept as an edge point; otherwise it is suppressed.
The larger the convolution kernel scale, the more pronounced the detected edges; based on the principle that the boundary of the projection area must not be missed, a large-scale Canny operator is adopted to detect the image edges.
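A minimal sketch of step S1 with OpenCV follows; the input file name, the thresholds and the 7 × 7 Sobel aperture are assumptions chosen so that the projection-area boundary is unlikely to be missed, and OpenCV's Canny applies non-maximum suppression internally.

```python
import cv2

frame = cv2.imread("capture.png")                        # camera capture (hypothetical file)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)           # grayscale conversion
smooth = cv2.GaussianBlur(gray, (3, 3), sigmaX=1.4)      # Gaussian smoothing, sigma = 1.4
edges = cv2.Canny(smooth, threshold1=50, threshold2=150, apertureSize=7)  # large-scale Canny
```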
S2, carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
hough line detection maps each point on a cartesian coordinate system to a straight line in Hough space by using the principle of point-line duality of the cartesian coordinate system and Hough space, so that an intersection passing through a plurality of straight lines in Hough space corresponds to a straight line passing through a plurality of points in the cartesian coordinate system, as shown in fig. 3.
Specifically, for a straight line y = kx + b in the Cartesian coordinate system, where (x, y) denotes a coordinate point, k the slope of the line and b its intercept, the line can be rewritten as b = y - xk. Defining the abscissa of Hough space as k and the ordinate as b, b = y - xk is a straight line in Hough space with slope -x and intercept y. Several points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) lying on one straight line in the Cartesian coordinate system therefore correspond to several straight lines in Hough space, and the common intersection point (k, b) of those lines gives the slope and intercept of that straight line in the Cartesian coordinate system.
The slope of a vertical line in the image cannot be computed, so the Hough transform is usually performed in polar form. Specifically, a straight line is expressed by the polar equation ρ = x·cos θ + y·sin θ, where ρ is the polar distance, i.e. the distance from the origin to the line in polar space, and θ is the polar angle, i.e. the angle between the x-axis and the line segment through the origin perpendicular to the line. Defining the abscissa of Hough space as θ and the ordinate as ρ, several points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) on one straight line correspond to several curves in Hough space, and the common intersection point (θ, ρ) of those curves gives the polar angle and polar distance of that straight line in the polar coordinate system; a schematic is shown in FIG. 4.
The four boundary lines of the projection region are intersected pairwise to obtain the coordinates of the four vertices of the projection region, i.e. the top-left, bottom-left, bottom-right and top-right points (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
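An illustrative sketch of step S2 with OpenCV follows; the Hough threshold and the rule used to pick the four boundary lines are assumptions made for the sketch.

```python
import numpy as np
import cv2

edges = cv2.Canny(cv2.imread("capture.png", cv2.IMREAD_GRAYSCALE), 50, 150)   # edge image from step S1
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=150)        # array of (rho, theta) pairs
boundary = [tuple(l[0]) for l in lines[:4]]   # assume the four strongest lines bound the projection area

def intersect(l1, l2):
    (r1, t1), (r2, t2) = l1, l2
    a = np.array([[np.cos(t1), np.sin(t1)], [np.cos(t2), np.sin(t2)]])
    return np.linalg.solve(a, np.array([r1, r2]))

# intersect near-horizontal lines with near-vertical lines to obtain the four vertices
horizontal = [l for l in boundary if abs(np.sin(l[1])) > 0.7]
vertical = [l for l in boundary if abs(np.sin(l[1])) <= 0.7]
vertices = [intersect(h, v) for h in horizontal for v in vertical]
```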
S3, solving the mapping relation under the view angle transformation through the homography transformation matrix; the homography transformation diagram is shown in figure 5.
Homography transformations reflect the process of mapping from one two-dimensional plane to three-dimensional space, and then from three-dimensional space to another two-dimensional plane. The homography transformation describes nonlinear transformation between two coordinate systems, so that the homography transformation has wide application in the fields of image splicing, image correction, augmented reality and the like.
X-Y-Z is a three-dimensional space coordinate system, which can be understood as the world coordinate system; x-y is the pixel plane coordinate system; and x'-y' is the plane coordinate system of the projection scene. The homography transform can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system that passes through the origin and through the spatial point associated with (x, y).
the straight line intersects the x '-y' coordinate system plane at point (x ', y'), and the process from point (x, y) to point (x ', y') is referred to as a homography transformation.
The solving process of the homography transformation is as follows:
let the X '-Y' plane be perpendicular to the Z-axis of the X-Y-Z space coordinate system and intersect the Z-axis at point (0,0,1), i.e., point (X ', Y') in the X '-Y' plane coordinate system is point (X ', Y', 1) in the X-Y-Z space coordinate system. And describing the mapping relation between the X-Y plane coordinate system and the X-Y-Z space coordinate system by using a homography matrix H:
$$\begin{bmatrix}X\\Y\\Z\end{bmatrix}=H\begin{bmatrix}x\\y\\1\end{bmatrix},\qquad H=\begin{bmatrix}h_{1}&h_{2}&h_{3}\\h_{4}&h_{5}&h_{6}\\h_{7}&h_{8}&h_{9}\end{bmatrix}$$

where h_1~h_9 are the 9 transformation parameters of the homography matrix. The mapping from the x-y plane coordinate system to the x'-y' plane coordinate system then follows:

$$x'=\frac{X}{Z}=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}},\qquad y'=\frac{Y}{Z}=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+h_{9}}$$

The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom: multiplying H by a scaling factor k gives

$$x'=\frac{k(h_{1}x+h_{2}y+h_{3})}{k(h_{7}x+h_{8}y+h_{9})}=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}}$$
i.e. kH and H actually represent the same mapping relationship, so H has only 8 degrees of freedom. One way is to set h_9 to 1, in which case the equations to be solved are:

$$x'=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+1},\qquad y'=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+1}$$

Another way is to add a constraint to the homography matrix H so that its modulus is 1:

$$\lVert H\rVert=\sqrt{h_{1}^{2}+h_{2}^{2}+\cdots+h_{9}^{2}}=1$$

The equations to be solved are then:

$$x'=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}},\qquad y'=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+h_{9}},\qquad \text{s.t.}\ \lVert H\rVert=1$$

The target coordinate points, in the projection scene coordinate system, of the four projection-area vertices obtained in the pixel coordinate system are defined, and substituting these four point correspondences solves the H matrix:

$$\begin{bmatrix}x'_{i}\\y'_{i}\\1\end{bmatrix}\sim H\begin{bmatrix}x_{i}\\y_{i}\\1\end{bmatrix},\qquad i=1,2,3,4$$
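For illustration, the same homography can be obtained with OpenCV from the four vertex correspondences; the example vertex values and the 1920 × 1080 scene size below are assumptions.

```python
import numpy as np
import cv2

# four projection-area vertices in the pixel coordinate system (example values from step S2)
pixel_vertices = np.float32([[102, 85], [538, 90], [530, 401], [96, 396]])   # tl, tr, br, bl
# their target coordinates in the projection (Unity3D) scene; the scene size is assumed
scene_corners = np.float32([[0, 0], [1920, 0], [1920, 1080], [0, 1080]])

H = cv2.getPerspectiveTransform(pixel_vertices, scene_corners)

def pixel_to_scene(pt, H):
    p = np.float32([[pt]])                        # shape (1, 1, 2) as expected by OpenCV
    return cv2.perspectiveTransform(p, H)[0, 0]   # (x', y') in scene coordinates
```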
S4, identifying the interactive objects by means of convolutional neural network technology, using the YOLOv3 target detection algorithm, and obtaining the game coordinates mapped from the projection area coordinates into the Unity3D development scene according to the solved homography transformation matrix;
S5, obtaining the depth information of the interactive object through binocular camera ranging;
further, interactive objects are identified by using a YOLOv3 target detection algorithm, and a game coordinate process of mapping the projection area coordinates to the Unity3D development scene is obtained according to the solved homography transformation matrix.
Coordinate positioning of the arrow position in the image is a typical object detection problem. The current research on target detection in the field of convolutional neural networks is mainly divided into two algorithms, namely a two-stage algorithm and a one-stage algorithm: the two-stage target detection algorithm calculates an input image to obtain a candidate region, and then classifies and corrects the candidate region through a convolution neural network. Representative of such algorithms are the R-CNN series and SPP-Net, etc.; the one stage target detection algorithm converts the detection and positioning of the target position into a regression problem, and the target detection algorithm from the input end to the output end is directly realized through a single convolutional neural network. one stage algorithm is represented by the YOLO series, SSD, Retina-Net, etc. The two types of target detection algorithms have respective advantages and disadvantages, the two types of target detection algorithms are superior to the one type of target detection algorithms in accuracy and precision, and the one type of target detection algorithms have great advantages in speed.
The YOLOv3 loss function is designed as follows:

$$\begin{aligned}Loss={}&\lambda_{coord}\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\bigl\lVert r_{ij}-\hat r_{ij}\bigr\rVert^{2}+\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\bigl(C_{ij}-\hat C_{ij}\bigr)^{2}\\&+\lambda_{noobj}\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{noobj}\bigl(C_{ij}-\hat C_{ij}\bigr)^{2}+\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\sum_{c=1}^{classes}\bigl(p_{ij}(c)-\hat p_{ij}(c)\bigr)^{2}\end{aligned}$$

where the first term is the coordinate error loss and λ_coord is the coordinate loss coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of bounding boxes in one grid; 1_ij^obj indicates whether the j-th box of the i-th grid contains an object, taking the value 1 when it does and 0 when it does not; x and y denote the center coordinates of a box and w and h its width and height, and r_ij and r̂_ij denote the (x, y, w, h) of the predicted box and of the real box, respectively. The second and third terms are the confidence loss: 1_ij^noobj indicates whether the j-th box of the i-th grid contains no object, taking the value 1 when it does not and 0 when it does; λ_noobj balances the loss weights of grids with and without objects, the goal being to reduce the confidence loss of boxes that contain no object; C_ij and Ĉ_ij denote the predicted and real confidences of the j-th box of the i-th grid. The last term is the classification loss: classes denotes the number of categories, and p_ij(c) and p̂_ij(c) denote the predicted and real probabilities that the j-th box of the i-th grid belongs to an object of class c.
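A hedged sketch of the detection-and-mapping path of steps S4 and S5 using OpenCV's DNN module with a trained YOLOv3 model; the cfg/weights file names, the 416 × 416 input size and the 0.5 confidence threshold are assumptions, and the patent does not prescribe this particular runtime.

```python
import numpy as np
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")   # assumed trained model files
layer_names = net.getUnconnectedOutLayersNames()

def detect_centres(frame, conf_thresh=0.5):
    """Return the pixel-space centres of detected interactive objects."""
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    h, w = frame.shape[:2]
    centres = []
    for output in net.forward(layer_names):
        for det in output:                     # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            if det[4] * scores.max() > conf_thresh:
                centres.append((det[0] * w, det[1] * h))
    return centres

# each centre can then be mapped into the Unity3D scene with the solved homography H:
# scene_pt = cv2.perspectiveTransform(np.float32([[centre]]), H)[0, 0]
```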
S6, obtaining the positions of the skeletal key points of the interactor through a Kinect camera, virtualizing the character object in the Unity3D software, and generating the corresponding interactive actions according to the distribution of the character's joint points.
Calibrating the binocular cameras to obtain internal and external parameters and distortion coefficients of the two cameras;
the purpose of camera calibration is as follows: first, to restore the real world position of the object imaged by the camera, it is necessary to know how the world object is transformed into the computer image plane, i.e. to solve the internal and external parameter matrix.
Second, the perspective projection of the camera has a significant problem — distortion. Another purpose of camera calibration is to solve distortion coefficients and then use them for image rectification.
Correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
the main task of the binocular camera system is distance measurement, and the parallax distance measurement formula is derived under the ideal condition of the binocular system, but in the real binocular stereo vision system, two camera image planes which are completely aligned in a coplanar line do not exist, as shown in the attached figure 6, wherein p is a certain point on an object to be measured, and O is a certain point on the object to be measured1And O2The optical centers of the two cameras are respectively, so that the three-dimensional correction is carried out, namely, two images which are not in coplanar line alignment in practice are corrected into coplanar line alignment (the coplanar line alignment is that when two camera image planes are on the same plane and the same point is projected to the two camera image planes, the same line of two pixel coordinate systems is needed), the actual binocular system is corrected into an ideal binocular system, as shown in figure 7, wherein p is a certain point on an object to be measured, and O is a certain point on the object to be measuredRAnd OTThe optical centers of the two cameras are respectively, the imaging points of the point P on the photoreceptors of the two cameras are P and P', f is the focal length of the cameras, B is the center distance of the two cameras, and X isRAnd XTThe distances from the imaging points on the image planes of the left camera and the right camera to the left edge of the image plane respectively, and z is the required depth information.
The pixel points of the two rectified images are matched, i.e. the corresponding image points of the same scene in the left and right views are matched, to obtain the disparity map;
calculating the depth of each pixel according to the matching result, thereby obtaining a depth map;
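The following sketch outlines the rectification, matching and depth steps with OpenCV under assumed calibration results (K1, D1, K2, D2, R, T); the matcher parameters are illustrative only.

```python
import numpy as np
import cv2

def depth_map(img_l, img_r, K1, D1, K2, D2, R, T):
    """img_l, img_r: grayscale left/right images; calibration inputs from stereo calibration."""
    size = img_l.shape[1], img_l.shape[0]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    map1l, map2l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    map1r, map2r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map1l, map2l, cv2.INTER_LINEAR)   # rectified, row-aligned pair
    rect_r = cv2.remap(img_r, map1r, map2r, cv2.INTER_LINEAR)

    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = matcher.compute(rect_l, rect_r).astype(np.float32) / 16.0   # SGBM returns fixed-point values

    f, baseline = P1[0, 0], float(abs(T[0]))      # focal length (px) and camera centre distance
    with np.errstate(divide="ignore"):
        return f * baseline / disparity           # depth z = f*B/(X_R - X_T)
```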
further, the obtaining of the skeletal key point position of the interactor through the Kinect camera, and then the virtualization of the character object and the corresponding interactive action process in the Uinty3D software specifically include:
and identifying the skeleton structure of the target person by utilizing the skeleton tracking function of the Kinect somatosensory instrument. The depth data of the target person and the related information of the color image are directly acquired from the sensor of the Kinect, and the obtained skeletal structure information of the person is displayed in real time and 20 joint point data of the person are saved.
The interactivity of the man-machine interaction system is improved by virtualizing character objects in the Unity3D software and, according to the character joint distribution information, designing a number of corresponding actions, such as squatting, standing and shooting postures, so as to imitate the interactive actions of the interactor.
The invention uses the projector 2 to project a mobile phone interface through the mobile phone end 1, the interactive object starts to perform interactive action, the binocular camera 3 and the Kinect camera 4 transmit images to the cloud server 5 through the network for data processing and analysis, and the result is transmitted back to the mobile phone end 1 for displaying the game scene.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Claims (10)

1. A binocular projection human-computer interaction method combined with portrait behavior information is characterized by comprising the following steps:
acquiring image data by using a camera, and carrying out edge detection on a camera capturing area;
carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
solving the mapping relation under the view angle transformation through the homography transformation matrix;
identifying the interactive objects by using a YOLOv3 target detection algorithm, and obtaining game coordinates mapped to the Unity3D development scene from the projection area coordinates according to the solved homography transformation matrix;
obtaining depth information of an interactive object through binocular camera ranging;
and virtualizing the character object by obtaining the positions of the skeleton key points of the interactive person, and generating corresponding interactive actions according to the distribution of the character joint points.
2. A binocular projection human-computer interaction method in combination with portrait behavior information according to claim 1, wherein the camera image is converted to grayscale before edge detection, the edges are the set of points in the image where the brightness changes markedly, and a large-scale Canny filter operator is used to detect the image edges.
3. The binocular projection human-computer interaction method combined with the portrait behavior information as claimed in claim 2, wherein a gaussian filter is used to smooth the image and filter out noise when performing edge detection;
A Gaussian kernel of size (2k+1) × (2k+1) is generated by the following formula:

$$H_{ij}=\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{\bigl(i-(k+1)\bigr)^{2}+\bigl(j-(k+1)\bigr)^{2}}{2\sigma^{2}}\right),\qquad i,j\in[1,\,2k+1]$$

where k is a positive integer and σ² is the variance of the Gaussian function; taking σ = 1.4 and k = 1 yields the corresponding 3 × 3 Gaussian convolution kernel;

the Gaussian kernel is convolved with the grayscale image to obtain a smoothed image;

the gradient strength and direction of each pixel in the image are calculated using the Sobel operators in the horizontal and vertical directions:

$$S_{x}=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix},\qquad S_{y}=\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}$$

where S_x is the horizontal Sobel operator and S_y is the vertical Sobel operator; each is convolved with the smoothed image to obtain the first derivatives G_x and G_y of each pixel in the two directions, from which the gradient of the pixel is calculated:

$$G=\sqrt{G_{x}^{2}+G_{y}^{2}},\qquad \theta=\arctan\frac{G_{y}}{G_{x}}$$
4. a binocular projection human-computer interaction method in combination with portrait behavior information according to claim 3, wherein the non-maximum suppression comprises the steps of:
1) comparing the gradient strength of the current pixel with two pixels along the positive and negative gradient directions;
2) if the gradient intensity of the current pixel is maximum compared with the other two pixels, the pixel point is reserved as an edge point, otherwise, the pixel point is restrained.
5. The binocular projection human-computer interaction method combined with the portrait behavior information according to claim 4, wherein the image subjected to edge detection is subjected to straight line detection through Hough straight line detection to realize an area positioning process of a projection area, specifically comprising:
For a straight line y = kx + b in the Cartesian coordinate system, where (x, y) denotes a coordinate point, k the slope of the line and b its intercept, the line can be rewritten as b = y - xk; defining the abscissa of Hough space as k and the ordinate as b, b = y - xk is a straight line in Hough space with slope -x and intercept y; several points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) lying on one straight line in the Cartesian coordinate system correspond to several straight lines in Hough space, and the common intersection point (k, b) of those lines gives the slope and intercept of that straight line in the Cartesian coordinate system;
the Hough transform is performed in polar form: a straight line is expressed by the polar equation ρ = x·cos θ + y·sin θ, where ρ is the polar distance, i.e. the distance from the origin to the line in polar space, and θ is the polar angle, i.e. the angle between the x-axis and the line segment through the origin perpendicular to the line; defining the abscissa of Hough space as θ and the ordinate as ρ, several points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) on one straight line in the polar coordinate system correspond to several curves in Hough space, and the common intersection point (θ, ρ) of those curves gives the polar angle and polar distance of that straight line in the polar coordinate system;
the four boundary lines of the projection region are intersected pairwise to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
6. The binocular projection human-computer interaction method combined with the portrait behavior information according to claim 5, wherein the solving of the mapping relationship under the view angle transformation through the homography transformation matrix specifically comprises:
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1); that is, a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography matrix H:

$$\begin{bmatrix}X\\Y\\Z\end{bmatrix}=H\begin{bmatrix}x\\y\\1\end{bmatrix},\qquad H=\begin{bmatrix}h_{1}&h_{2}&h_{3}\\h_{4}&h_{5}&h_{6}\\h_{7}&h_{8}&h_{9}\end{bmatrix}$$

where h_1~h_9 are the 9 transformation parameters of the homography matrix. The mapping from the x-y plane coordinate system to the x'-y' plane coordinate system then follows:

$$x'=\frac{X}{Z}=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}},\qquad y'=\frac{Y}{Z}=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+h_{9}}$$

The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom: multiplying H by a scaling factor k gives

$$x'=\frac{k(h_{1}x+h_{2}y+h_{3})}{k(h_{7}x+h_{8}y+h_{9})}=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}}$$

i.e. kH and H represent the same mapping, so H has only 8 degrees of freedom; the homography matrix H is solved either by adding a constraint on its modulus or by setting h_9 = 1.
7. The binocular projection human-computer interaction method combining portrait behavior information as claimed in claim 6, wherein h_9 is set to 1, and the equations to be solved are:

$$x'=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+1},\qquad y'=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+1}$$

or the homography matrix H is constrained to have modulus 1:

$$\lVert H\rVert=\sqrt{h_{1}^{2}+h_{2}^{2}+\cdots+h_{9}^{2}}=1$$

and the equations to be solved are then:

$$x'=\frac{h_{1}x+h_{2}y+h_{3}}{h_{7}x+h_{8}y+h_{9}},\qquad y'=\frac{h_{4}x+h_{5}y+h_{6}}{h_{7}x+h_{8}y+h_{9}},\qquad \text{s.t.}\ \lVert H\rVert=1$$

The target coordinate points, in the projection scene coordinate system, of the four projection-area vertices obtained in the pixel coordinate system are defined, and substituting these four point correspondences solves the H matrix:

$$\begin{bmatrix}x'_{i}\\y'_{i}\\1\end{bmatrix}\sim H\begin{bmatrix}x_{i}\\y_{i}\\1\end{bmatrix},\qquad i=1,2,3,4$$
8. The binocular projection human-computer interaction method combined with portrait behavior information as claimed in claim 7, wherein identifying the interactive object with the YOLOv3 target detection algorithm and obtaining, according to the solved homography transformation matrix, the game coordinates mapped from the projection area coordinates into the Unity3D development scene specifically comprises:

The loss function of YOLOv3 is as follows:

$$\begin{aligned}Loss={}&\lambda_{coord}\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\bigl\lVert r_{ij}-\hat r_{ij}\bigr\rVert^{2}+\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\bigl(C_{ij}-\hat C_{ij}\bigr)^{2}\\&+\lambda_{noobj}\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{noobj}\bigl(C_{ij}-\hat C_{ij}\bigr)^{2}+\sum_{i=1}^{S\times S}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\sum_{c=1}^{classes}\bigl(p_{ij}(c)-\hat p_{ij}(c)\bigr)^{2}\end{aligned}$$

where the first term is the coordinate error loss and λ_coord is the coordinate loss coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of bounding boxes in one grid; 1_ij^obj indicates whether the j-th box of the i-th grid contains an object, taking the value 1 when it does and 0 when it does not; x and y denote the center coordinates of a box and w and h its width and height, and r_ij and r̂_ij denote the (x, y, w, h) of the predicted box and of the real box, respectively. The second and third terms are the confidence loss: 1_ij^noobj indicates whether the j-th box of the i-th grid contains no object, taking the value 1 when it does not and 0 when it does; λ_noobj balances the loss weights of grids with and without objects, the goal being to reduce the confidence loss of boxes that contain no object; C_ij and Ĉ_ij denote the predicted and real confidences of the j-th box of the i-th grid. The last term is the classification loss: classes denotes the number of categories, and p_ij(c) and p̂_ij(c) denote the predicted and real probabilities that the j-th box of the i-th grid belongs to an object of class c.
9. The binocular projection human-computer interaction method combined with the portrait behavior information according to claim 8, wherein the obtaining of the depth information of the interaction object through binocular camera ranging specifically comprises:
correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
matching pixel points of the two corrected images to obtain a disparity map;
and calculating the depth of each pixel according to the matching result, thereby obtaining a depth map.
10. The binocular projection human-computer interaction method combined with portrait behavior information as claimed in claim 9, wherein the obtaining of the skeletal key point positions of the interactor through a Kinect camera and the virtualization of the character object and the corresponding interactive actions in the Unity3D software specifically comprises:
identifying the skeleton structure of a target person by utilizing the skeleton tracking function of the Kinect somatosensory instrument, acquiring depth data of the target person and related information of a color image, and displaying the obtained skeleton structure information of the person in real time to store joint point data of the person; the interaction of the interactors is mimicked by virtualizing character objects in Unity3D software.
CN202011642041.2A 2020-12-31 2020-12-31 Binocular projection man-machine interaction method combined with portrait behavior information Pending CN112657176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011642041.2A CN112657176A (en) 2020-12-31 2020-12-31 Binocular projection man-machine interaction method combined with portrait behavior information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011642041.2A CN112657176A (en) 2020-12-31 2020-12-31 Binocular projection man-machine interaction method combined with portrait behavior information

Publications (1)

Publication Number Publication Date
CN112657176A true CN112657176A (en) 2021-04-16

Family

ID=75412210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011642041.2A Pending CN112657176A (en) 2020-12-31 2020-12-31 Binocular projection man-machine interaction method combined with portrait behavior information

Country Status (1)

Country Link
CN (1) CN112657176A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015024407A1 (en) * 2013-08-19 2015-02-26 国家电网公司 Power robot based binocular vision navigation system and method based on
CN107481267A (en) * 2017-08-14 2017-12-15 华南理工大学 A kind of shooting projection interactive system and method based on binocular vision
CN111354007A (en) * 2020-02-29 2020-06-30 华南理工大学 Projection interaction method based on pure machine vision positioning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Pengyan et al.: "Kinect protagonist position detection and somatosensory interaction in Unity3D", Journal of Shenyang Ligong University *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113487668A (en) * 2021-05-25 2021-10-08 北京工业大学 Radius-unlimited learnable cylindrical surface back projection method
WO2022252239A1 (en) * 2021-05-31 2022-12-08 浙江大学 Computer vision-based mobile terminal application control identification method
CN113506210A (en) * 2021-08-10 2021-10-15 深圳市前海动竞体育科技有限公司 Method for automatically generating point maps of athletes in basketball game and video shooting device
CN115061577A (en) * 2022-08-11 2022-09-16 北京深光科技有限公司 Hand projection interaction method, system and storage medium
CN115061577B (en) * 2022-08-11 2022-11-11 北京深光科技有限公司 Hand projection interaction method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210416)