CN112657176A - Binocular projection man-machine interaction method combined with portrait behavior information - Google Patents
Binocular projection man-machine interaction method combined with portrait behavior information
- Publication number
- CN112657176A (application number CN202011642041.2A)
- Authority
- CN
- China
- Prior art keywords
- straight line
- image
- coordinate system
- binocular
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a binocular projection man-machine interaction method combining portrait behavior information, which comprises the following steps: carrying out edge detection on the camera capturing area; carrying out straight line detection on the image after edge detection; solving the mapping relation under the view angle transformation through a homography transformation matrix; identifying the interactive object by using the YOLOv3 target detection algorithm, and obtaining the game coordinates mapped from the projection area coordinates to the Unity3D development scene; obtaining depth information of the interactive object through binocular camera ranging; obtaining the positions of the skeleton key points of the interactor through a Kinect camera, and virtualizing a character object and the corresponding interactive actions in Unity3D software. The invention can carry out human-computer interaction by means of a deep learning method and, by combining the depth information of the interactive object measured by the binocular camera with the portrait behavior information of the interactor, can greatly improve the interactivity of the projection human-computer interaction system.
Description
Technical Field
The invention relates to the technical fields of image processing, feature extraction, feature analysis, computer vision, convolutional neural networks, target detection, human-computer interaction and the like, in particular to a binocular projection human-computer interaction method combining portrait behavior information.
Background
With the development of science and technology, man-machine interaction technology has become diversified: people are no longer satisfied with simply presenting virtual scenes and have begun to explore ways of interacting with the virtual world, so more and more novel man-machine interaction technologies have emerged. Human-computer interaction techniques fall into several categories: traditional interaction technology with the keyboard and mouse as input; interaction technologies based on touch-screen devices, such as smart phones and tablet computers; and non-contact interaction technologies based on machine vision and image processing, such as virtual keyboards and gesture interaction systems.
The keyboard and mouse are the most mature interactive devices at present, and man-machine interaction based on them was the earliest form of computer operation. This interaction mode is stable and responsive and is widely used in daily office work. However, it has drawbacks: a complete human-computer interaction process requires a keyboard, a mouse, a screen for displaying a graphical interface and other devices, which are numerous and bulky. With the development of touch-screen technology and mobile equipment, the display and interaction functions of novel mobile products such as smart phones have been integrated into a single screen; their portability and ease of operation have led to rapid, widespread adoption and changed people's way of life. The diversified demands on human-computer interaction are prompting the search for more natural and less constrained interaction modes.
In the literature (Goto H, Takemura D, Kawasaki Y, et al. Development of an Information Projection Using a Projector-Camera System [J]. Electronics and Communications in Japan, 2013, 96(11): 70-81), Hiroki Goto et al. studied a camera-projection interaction system based on the frame-difference method and hand skin color extraction: the hands are first separated from the scene based on the clustering characteristics of hand skin color in the HSV and YCbCr spaces, and the fingertip positions are then detected on the separated foreground image with a template matching method, realizing projection interaction between the user and a computer or home television. In the literature (Fitriani, Goh W.B. Interacting with projected media on deformable surfaces [C]. Rio de Janeiro, Brazil: IEEE International Conference on Computer Vision, 2007: 1-6), Fitriani et al. propose a human-computer interaction system based on a deformable projection surface, which projects a virtual scene onto the surface of a deformable object, detects the deformation produced when the user touches the projection screen, and analyzes the interaction information through an image processing algorithm and a deformation model of the object. The above schemes based on machine vision technology and image processing algorithms share the following problems: the diversity of the projection scene cannot be guaranteed, and the dependence on peripherals is heavy. For example, in an interactive system based on hand skin color, when the projected scene is similar in color to the hand skin, the hand foreground separation algorithm fails.
Disclosure of Invention
The invention discloses a binocular projection human-computer interaction method combined with portrait behavior information, and aims to carry out human-computer interaction by means of a deep learning method while combining the depth information of the interactive object measured by a binocular camera with the portrait behavior information of the interactor, so as to improve the interactivity of the projection human-computer interaction system to a great extent.
The invention is realized by at least one of the following technical schemes.
A binocular projection human-computer interaction method combined with portrait behavior information comprises the following steps:
acquiring image data by using a camera, and carrying out edge detection on a camera capturing area;
carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
solving the mapping relation under the view angle transformation through the homography transformation matrix;
identifying the interactive objects by using a YOLOv3 target detection algorithm, and obtaining game coordinates mapped to the Unity3D development scene from the projection area coordinates according to the solved homography transformation matrix;
obtaining depth information of an interactive object through binocular camera ranging;
and virtualizing the character object by obtaining the positions of the skeleton key points of the interactive person, and generating corresponding interactive actions according to the distribution of the character joint points.
Preferably, the camera image is converted to grayscale before edge detection; an edge is a set of points in the image where the brightness changes significantly, and a large-scale Canny filter operator is adopted to detect the image edges.
Preferably, a gaussian filter is used to smooth the image and filter out noise when edge detection is performed;
a gaussian kernel of size (2k +1) × (2k +1) is set by the following formula:
K(i, j) = (1 / (2πσ²)) · exp(-((i - k - 1)² + (j - k - 1)²) / (2σ²))
where k is a positive integer, i, j ∈ [1, 2k+1], and σ² is the variance of the Gaussian function; letting σ = 1.4 and k = 1 yields a 3 × 3 Gaussian convolution kernel.
convolving the Gaussian kernel with a gray image to obtain a smooth image;
calculating the gradient strength and direction of each pixel point in the image, and utilizing Sobel operators in the horizontal direction and the vertical direction:
Sx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],  Sy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
where Sx is the horizontal Sobel operator and Sy is the vertical Sobel operator; each is convolved with the smoothed image to obtain the first derivatives Gx and Gy of each pixel in the two directions, from which the gradient of the pixel is calculated:
G = sqrt(Gx² + Gy²),  θ = arctan(Gy / Gx)
preferably, the non-maxima suppression comprises the steps of:
1) comparing the gradient strength of the current pixel with the two pixels along the positive and negative gradient directions;
2) if the gradient strength of the current pixel is the maximum of the three, retaining the pixel point as an edge point; otherwise, suppressing the pixel point.
Preferably, performing straight line detection on the edge-detected image through Hough line detection to realize area positioning of the projection area specifically comprises:
for a straight line y = kx + b on a Cartesian coordinate system, where (x, y) denotes a coordinate point in that coordinate system, k denotes the slope of the line and b denotes its intercept, the line is rewritten as b = y - xk; defining the abscissa in Hough space as k and the ordinate as b, b = y - xk is a straight line in Hough space with slope -x and intercept y; several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the Cartesian coordinate system correspond to several straight lines in Hough space, and the common intersection point (k, b) of these lines is the slope and intercept of that straight line in the Cartesian coordinate system;
performing the Hough transform in polar form, specifically, expressing a straight line by the polar equation ρ = x·cosθ + y·sinθ, where ρ is the polar distance, i.e. the distance from the origin to the straight line in polar space, and θ is the polar angle, i.e. the angle between the x axis and the line segment that passes through the origin perpendicular to the straight line; defining the abscissa in Hough space as θ and the ordinate as ρ, several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the polar coordinate system correspond to several curves in Hough space, and the common intersection point (θ, ρ) of these curves is the polar angle and polar distance of that straight line in the polar coordinate system;
calculating the pairwise intersection points of the four boundary straight lines of the projection region to obtain the four vertex coordinates (xlt, ylt), (xlb, ylb), (xrb, yrb), (xrt, yrt).
Preferably, the solving of the mapping relationship under the view angle transformation through the homography transformation matrix specifically includes:
setting the x'-y' plane to be perpendicular to the Z axis of the X-Y-Z space coordinate system and to intersect the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system; describing the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system by a homography matrix H:
[X, Y, Z]^T = H · [x, y, 1]^T,  H = [[h1, h2, h3], [h4, h5, h6], [h7, h8, h9]]
where h1–h9 are the 9 transformation parameters of the homography matrix; the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is then obtained as:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + h9),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
the H matrix has 9 transformation parameters but in fact only 8 degrees of freedom: multiplying the H matrix by a scaling factor k, k·H and H actually represent the same mapping relation, so H has only 8 degrees of freedom, and the homography matrix H is solved either by adding a constraint to H or by setting h9 = 1.
Preferably, h9 is set to 1 and the remaining 8 parameters are solved from the resulting linear equations; alternatively, the homography matrix H is constrained to have modulus 1 and the corresponding constrained equations are solved; by defining, in the projection scene coordinate system, the target coordinate points of the four projection-area vertices obtained in the pixel coordinate system, the H matrix is solved.
preferably, the identifying the interactive object by using the YOLOv3 target detection algorithm, and obtaining the game coordinate mapped from the projection area coordinate to the Unity3D development scene according to the solved homography transformation matrix specifically includes:
the loss function of YOLOv3 is as follows:
where the first term is the coordinate error loss and λcoord is the coordinate loss coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of bounding boxes contained in one grid; an indicator function denotes whether the j-th box of the i-th grid contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h denote its width and height; rij and r̂ij denote the x, y, w, h of the predicted box and of the ground-truth box, respectively; the second and third terms are the confidence losses, with an indicator function denoting whether the j-th box of the i-th grid contains no object, taking the value 1 if it contains none and 0 otherwise; λnoobj balances the loss weights of grids with and without objects, aiming to reduce the confidence loss of box predictions in grids without objects; Cij and Ĉij denote the predicted and ground-truth confidences of the j-th box of the i-th grid; classes denotes the number of categories; pij(c) and p̂ij(c) denote the predicted and true probabilities that the j-th box of the i-th grid belongs to a class-c object.
Preferably, the obtaining of the depth information of the interactive object through binocular camera ranging specifically includes:
correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
matching pixel points of the two corrected images to obtain a disparity map;
and calculating the depth of each pixel according to the matching result, thereby obtaining a depth map.
Preferably, obtaining the skeletal key point positions of the interactor through the Kinect camera and virtualizing the character object and the corresponding interactive actions in the Unity3D software specifically comprises:
identifying the skeleton structure of the target person by using the skeleton tracking function of the Kinect somatosensory device, acquiring the depth data of the target person and the related information of the color image, displaying the obtained skeleton structure information of the person in real time and saving the joint point data of the person; the interactivity of the man-machine interaction system is improved by virtualizing a character object in the Unity3D software to imitate the interactive actions of the interactor.
Compared with the prior art, the invention has the beneficial effects that: the target detection is carried out by a deep learning method, so that the interference of environmental factors can be avoided, the diversity of interactive scenes is ensured, and meanwhile, the interactivity of a human-computer interaction system is improved by combining the depth information of an interactive object acquired by a binocular camera and the human behavior information of an interactive person acquired by a Kinect camera.
Drawings
Fig. 1 is a hardware schematic diagram of a binocular projection human-computer interaction method combining with portrait behavior information according to the embodiment;
fig. 2 is a flowchart of a binocular projection human-computer interaction method combining with portrait behavior information according to the present embodiment;
FIG. 3 is a schematic diagram of detection of a Cartesian coordinate Hough line according to the embodiment;
fig. 4 is a schematic diagram of a hough line detection algorithm of the polar coordinate system in this embodiment;
FIG. 5 is a schematic diagram of homography transformation of the present embodiment;
fig. 6 is a frame diagram of an actual binocular system according to the present embodiment;
fig. 7 is a frame diagram of an ideal binocular vision system of the present embodiment.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
As shown in fig. 2, a binocular projection human-computer interaction method combining portrait behavior information includes the following steps:
s1, performing edge detection on the camera capturing area through a large-scale canny filtering operator, and performing non-maximum suppression;
carrying out graying processing on the camera image;
the edge is a set of points with obvious brightness change in the image, the gradient can reflect the change speed in numerical value, and the edge of the image is detected by adopting a large-scale canny filter operator based on the principle of no leakage of the boundary of a projection area;
using a Gaussian filter to smooth the image and filter out noise;
a gaussian kernel of size (2k +1) × (2k +1) is set by the following formula:
K(i, j) = (1 / (2πσ²)) · exp(-((i - k - 1)² + (j - k - 1)²) / (2σ²))
where k is a positive integer, i, j ∈ [1, 2k+1], and σ² is the variance of the Gaussian function; letting σ = 1.4 and k = 1 yields a 3 × 3 Gaussian convolution kernel.
convolving the Gaussian kernel with a gray image to obtain a smooth image;
calculating the gradient strength and direction of each pixel point in the image, and utilizing Sobel operators in the horizontal direction and the vertical direction:
Sx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],  Sy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
where Sx is the horizontal Sobel operator and Sy is the vertical Sobel operator; each is convolved with the smoothed image to obtain the first derivatives Gx and Gy of each pixel in the two directions, from which the gradient of the pixel is calculated:
G = sqrt(Gx² + Gy²),  θ = arctan(Gy / Gx)
non-maxima suppression is applied to eliminate spurious responses due to edge detection:
For each pixel point of the obtained gradient image, whether the point is kept or eliminated cannot be decided by a single threshold alone, and the final edge image is expected to describe the source image contour accurately, so non-maximum suppression is required:
1) comparing the gradient strength of the current pixel with the two pixels along the positive and negative gradient directions;
2) if the gradient strength of the current pixel is the maximum of the three, retaining the pixel point as an edge point; otherwise, suppressing the pixel point;
As the convolution kernel scale increases, the detected edges become more pronounced; based on the principle that the boundary of the projection area must not be missed, a large-scale Canny operator is adopted to detect the image edges.
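As a concrete reference for this step, the following minimal Python/OpenCV sketch reproduces the graying, Gaussian smoothing, Sobel gradient computation and Canny edge detection described above; the image file name, kernel size and hysteresis thresholds are illustrative assumptions rather than values prescribed by the invention.

```python
import cv2
import numpy as np

# Minimal sketch of step S1, assuming the camera frame has been saved as
# "frame.png"; kernel size and thresholds are illustrative choices.
frame = cv2.imread("frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Gaussian smoothing with a (2k+1) x (2k+1) kernel, here k = 1 and sigma = 1.4.
smooth = cv2.GaussianBlur(gray, (3, 3), 1.4)

# Horizontal / vertical Sobel derivatives Gx, Gy and the gradient magnitude.
gx = cv2.Sobel(smooth, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(smooth, cv2.CV_32F, 0, 1, ksize=3)
magnitude = np.sqrt(gx ** 2 + gy ** 2)

# cv2.Canny combines the gradient computation, non-maximum suppression and
# double-threshold hysteresis in one call; a larger aperture approximates the
# "large-scale" operator mentioned above.
edges = cv2.Canny(smooth, 50, 150, apertureSize=3)
```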
S2, carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
hough line detection maps each point on a cartesian coordinate system to a straight line in Hough space by using the principle of point-line duality of the cartesian coordinate system and Hough space, so that an intersection passing through a plurality of straight lines in Hough space corresponds to a straight line passing through a plurality of points in the cartesian coordinate system, as shown in fig. 3.
Specifically, for a straight line y = kx + b on a Cartesian coordinate system, where (x, y) denotes a coordinate point in that coordinate system, k the slope of the line and b its intercept, the line is rewritten as b = y - xk. Defining the abscissa in Hough space as k and the ordinate as b, b = y - xk is a straight line in Hough space with slope -x and intercept y. Several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the Cartesian coordinate system correspond to several straight lines in Hough space, and their common intersection point (k, b) is the slope and intercept of that straight line in the Cartesian coordinate system.
Since the slope of a vertical line in the image cannot be calculated, the Hough transform is typically performed in polar form. Specifically, a straight line is expressed by the polar equation ρ = x·cosθ + y·sinθ, where ρ is the polar distance, i.e. the distance from the origin to the straight line in polar space, and θ is the polar angle, i.e. the angle between the x axis and the line segment that passes through the origin perpendicular to the straight line. Defining the abscissa in Hough space as θ and the ordinate as ρ, several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the polar coordinate system correspond to several curves in Hough space, and their common intersection point (θ, ρ) is the polar angle and polar distance of that straight line in the polar coordinate system, as illustrated in Fig. 4.
The pairwise intersection points of the four boundary straight lines of the projection region are calculated to obtain the coordinates of the four vertices of the projection region, i.e. the top-left, bottom-left, bottom-right and top-right points (xlt, ylt), (xlb, ylb), (xrb, yrb), (xrt, yrt).
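The line detection and vertex extraction can be sketched as follows in Python/OpenCV; the accumulator threshold and the naive selection of the first four lines are illustrative assumptions, and in practice the four dominant, mutually non-parallel boundary lines would be selected before intersecting them.

```python
import cv2
import numpy as np

# Sketch of step S2, assuming `edges` is the binary edge map from step S1;
# the accumulator threshold (200 votes) is an illustrative assumption.
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)   # each entry is (rho, theta)

def intersection(l1, l2):
    """Intersection of two lines given in polar (rho, theta) form."""
    (r1, t1), (r2, t2) = l1, l2
    a = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    b = np.array([r1, r2])
    x, y = np.linalg.solve(a, b)        # solves x*cos(theta) + y*sin(theta) = rho
    return float(x), float(y)

# Here the first four lines are intersected pairwise; the four projection-area
# vertices are among the resulting points (near-parallel pairs would be skipped
# in a full implementation).
boundary = [tuple(l[0]) for l in lines[:4]]
corners = [intersection(boundary[i], boundary[j])
           for i in range(4) for j in range(i + 1, 4)]
```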
S3, solving the mapping relation under the view angle transformation through the homography transformation matrix; the homography transformation diagram is shown in figure 5.
Homography transformations reflect the process of mapping from one two-dimensional plane to three-dimensional space, and then from three-dimensional space to another two-dimensional plane. The homography transformation describes nonlinear transformation between two coordinate systems, so that the homography transformation has wide application in the fields of image splicing, image correction, augmented reality and the like.
X-Y-Z is the three-dimensional space coordinate system, which can be understood as the world coordinate system; x-y is the pixel plane coordinate system; x'-y' is the plane coordinate system of the target projection plane. The homography transform can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system passing through the origin and that point; this line intersects the x'-y' coordinate plane at the point (x', y'), and the process from point (x, y) to point (x', y') is referred to as a homography transformation.
The solving process of the homography transformation is as follows:
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by a homography matrix H:
[X, Y, Z]^T = H · [x, y, 1]^T,  H = [[h1, h2, h3], [h4, h5, h6], [h7, h8, h9]]
where h1–h9 are the 9 transformation parameters of the homography matrix; the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is then:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + h9),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom: multiplying the H matrix by a scaling factor k,
that is, k·H and H actually represent the same mapping relation, so H has only 8 degrees of freedom. One way is to set h9 to 1 and solve the remaining 8 parameters from the resulting linear equations.
Another approach is to add a constraint to the homography matrix H so that its modulus equals 1, and to solve the corresponding constrained equations.
By defining, in the projection scene coordinate system, the target coordinate points of the four projection-area vertices obtained in the pixel coordinate system, the H matrix can be solved.
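A minimal sketch of this homography solution in Python/OpenCV is given below; the pixel vertex values and the 1280 × 720 projection-scene size are illustrative assumptions.

```python
import cv2
import numpy as np

# Sketch of step S3, assuming `vertices_px` holds the four projection-area
# vertices detected in the pixel coordinate system (top-left, bottom-left,
# bottom-right, top-right) and `vertices_scene` the corresponding target
# coordinates in the projection scene; the values are illustrative.
vertices_px = np.float32([[102, 87], [95, 598], [1180, 611], [1175, 92]])
vertices_scene = np.float32([[0, 0], [0, 720], [1280, 720], [1280, 0]])

# With exactly four correspondences the 8 degrees of freedom of H are fixed;
# getPerspectiveTransform solves them with the h9 = 1 normalisation.
H = cv2.getPerspectiveTransform(vertices_px, vertices_scene)

# findHomography accepts more than four points and a robust estimator,
# which is useful when the line detection is noisy.
H_robust, _ = cv2.findHomography(vertices_px, vertices_scene, cv2.RANSAC)
```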
S4, based on convolutional neural network technology, identifying the interactive object with the YOLOv3 target detection algorithm, and obtaining the game coordinates mapped from the projection area coordinates to the Unity3D development scene according to the solved homography transformation matrix;
S5, obtaining depth information of the interactive object through binocular camera ranging;
further, interactive objects are identified by using a YOLOv3 target detection algorithm, and a game coordinate process of mapping the projection area coordinates to the Unity3D development scene is obtained according to the solved homography transformation matrix.
Coordinate positioning of the arrow position in the image is a typical object detection problem. Current target detection research in the convolutional neural network field is mainly divided into two classes of algorithms, two stage and one stage: a two stage target detection algorithm first computes candidate regions from the input image and then classifies and refines the candidate regions through a convolutional neural network; representatives of this class are the R-CNN series, SPP-Net, etc. A one stage target detection algorithm converts detection and localization of the target position into a regression problem and realizes end-to-end target detection directly through a single convolutional neural network; representatives of this class are the YOLO series, SSD, Retina-Net, etc. The two classes of algorithms have their respective advantages and disadvantages: two stage algorithms are superior in accuracy and precision, while one stage algorithms have a great advantage in speed.
The YOLOv3 loss function is designed as follows:
where the first term is the coordinate error loss and λcoord is the coordinate loss coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of bounding boxes contained in one grid; an indicator function denotes whether the j-th box of the i-th grid contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h denote its width and height; rij and r̂ij denote the x, y, w, h of the predicted box and of the ground-truth box, respectively; the second and third terms are the confidence losses, with an indicator function denoting whether the j-th box of the i-th grid contains no object, taking the value 1 if it contains none and 0 otherwise; λnoobj balances the loss weights of grids with and without objects, aiming to reduce the confidence loss of box predictions in grids without objects; Cij and Ĉij denote the predicted and ground-truth confidences of the j-th box of the i-th grid; classes denotes the number of categories; pij(c) and p̂ij(c) denote the predicted and true probabilities that the j-th box of the i-th grid belongs to a class-c object.
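The coordinate mapping that follows the detection can be sketched as below in Python/OpenCV; the bounding-box values are illustrative, and the trained YOLOv3 detector itself is assumed to be available separately.

```python
import cv2
import numpy as np

# Sketch of step S4's coordinate mapping, assuming `H` is the homography from
# step S3 and that a YOLOv3 detector has returned a bounding box (x, y, w, h)
# for the interactive object in pixel coordinates; the values are illustrative.
x, y, w, h = 640.0, 360.0, 40.0, 40.0          # predicted box, pixel frame
center_px = np.float32([[[x, y]]])             # shape (1, 1, 2) for OpenCV

# Apply the homography to obtain the point in the projection-scene frame,
# which is then forwarded to the Unity3D development scene as game coordinates.
center_scene = cv2.perspectiveTransform(center_px, H)[0, 0]
game_x, game_y = float(center_scene[0]), float(center_scene[1])
```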
S6, obtaining the positions of the skeleton key points of the interactor through a Kinect camera, virtualizing the character object in the Unity3D software, and generating corresponding interactive actions according to the distribution of the character joint points.
Calibrating the binocular cameras to obtain internal and external parameters and distortion coefficients of the two cameras;
the purpose of camera calibration is as follows: first, to restore the real world position of the object imaged by the camera, it is necessary to know how the world object is transformed into the computer image plane, i.e. to solve the internal and external parameter matrix.
Second, the perspective projection of the camera has a significant problem — distortion. Another purpose of camera calibration is to solve distortion coefficients and then use them for image rectification.
Correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
the main task of the binocular camera system is distance measurement, and the parallax distance measurement formula is derived under the ideal condition of the binocular system, but in the real binocular stereo vision system, two camera image planes which are completely aligned in a coplanar line do not exist, as shown in the attached figure 6, wherein p is a certain point on an object to be measured, and O is a certain point on the object to be measured1And O2The optical centers of the two cameras are respectively, so that the three-dimensional correction is carried out, namely, two images which are not in coplanar line alignment in practice are corrected into coplanar line alignment (the coplanar line alignment is that when two camera image planes are on the same plane and the same point is projected to the two camera image planes, the same line of two pixel coordinate systems is needed), the actual binocular system is corrected into an ideal binocular system, as shown in figure 7, wherein p is a certain point on an object to be measured, and O is a certain point on the object to be measuredRAnd OTThe optical centers of the two cameras are respectively, the imaging points of the point P on the photoreceptors of the two cameras are P and P', f is the focal length of the cameras, B is the center distance of the two cameras, and X isRAnd XTThe distances from the imaging points on the image planes of the left camera and the right camera to the left edge of the image plane respectively, and z is the required depth information.
Matching the pixel points of the two corrected images, i.e. matching the corresponding image points of the same scene in the left and right views, to obtain a disparity map;
calculating the depth of each pixel according to the matching result, thereby obtaining a depth map;
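A minimal Python/OpenCV sketch of the disparity and depth computation is shown below, assuming the two images have already been rectified; the SGBM matching parameters, focal length and baseline values are illustrative assumptions.

```python
import cv2
import numpy as np

# Sketch of step S5, assuming `left_rect.png` and `right_rect.png` are rectified
# grayscale images from the calibrated binocular camera.
left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # sub-pixel

f, B = 700.0, 0.06                       # focal length (px) and baseline (m)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]  # z = f * B / (XR - XT)
```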
Further, obtaining the skeletal key point positions of the interactor through the Kinect camera and then virtualizing the character object and the corresponding interactive actions in the Unity3D software specifically comprises:
The skeleton structure of the target person is identified by using the skeleton tracking function of the Kinect somatosensory device. The depth data of the target person and the related information of the color image are acquired directly from the Kinect sensor, the obtained skeleton structure information of the person is displayed in real time, and the 20 joint point data of the person are saved.
The interactivity of the man-machine interaction system is improved by virtualizing a character object in the Unity3D software and designing a number of corresponding actions, such as squatting, standing and shooting postures, according to the character joint distribution information, so as to imitate the interactive actions of the interactor. A sketch of such an action mapping is given below.
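The following Python sketch shows one possible mapping from joint positions to an action label; the joint names, the 0.25 threshold and the dictionary-based input format are hypothetical assumptions and not part of the Kinect SDK interface described above.

```python
# Sketch of step S6's action mapping, assuming the Kinect skeleton data have
# already been read out (e.g. via the Kinect skeleton tracking function) into a
# dict mapping joint name -> (x, y, z) camera-space coordinates with y pointing up.
def classify_posture(joints):
    head_y = joints["head"][1]
    hip_y = (joints["hip_left"][1] + joints["hip_right"][1]) / 2.0
    knee_y = (joints["knee_left"][1] + joints["knee_right"][1]) / 2.0
    # Squatting shortens the vertical hip-to-knee distance relative to the
    # overall head-to-knee height.
    if (hip_y - knee_y) < 0.25 * (head_y - knee_y):
        return "squat"
    return "stand"

# The returned label would drive the corresponding animation of the virtual
# character object in the Unity3D scene.
example = {"head": (0.0, 1.6, 2.0), "hip_left": (-0.1, 0.9, 2.0),
           "hip_right": (0.1, 0.9, 2.0), "knee_left": (-0.1, 0.5, 2.0),
           "knee_right": (0.1, 0.5, 2.0)}
print(classify_posture(example))   # -> "stand"
```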
The invention uses the projector 2 to project a mobile phone interface through the mobile phone end 1, the interactive object starts to perform interactive action, the binocular camera 3 and the Kinect camera 4 transmit images to the cloud server 5 through the network for data processing and analysis, and the result is transmitted back to the mobile phone end 1 for displaying the game scene.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Claims (10)
1. A binocular projection human-computer interaction method combined with portrait behavior information is characterized by comprising the following steps:
acquiring image data by using a camera, and carrying out edge detection on a camera capturing area;
carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
solving the mapping relation under the view angle transformation through the homography transformation matrix;
identifying the interactive objects by using a YOLOv3 target detection algorithm, and obtaining game coordinates mapped to the Unity3D development scene from the projection area coordinates according to the solved homography transformation matrix;
obtaining depth information of an interactive object through binocular camera ranging;
and virtualizing the character object by obtaining the positions of the skeleton key points of the interactive person, and generating corresponding interactive actions according to the distribution of the character joint points.
2. A binocular projection human-computer interaction method in combination with portrait behavior information according to claim 1, wherein the camera image is grayed before edge detection, the edge is a set of points in the image where brightness change is significant, and the large-scale canny filter operator is used to detect the image edge.
3. The binocular projection human-computer interaction method combined with the portrait behavior information as claimed in claim 2, wherein a gaussian filter is used to smooth the image and filter out noise when performing edge detection;
a gaussian kernel of size (2k +1) × (2k +1) is set by the following formula:
K(i, j) = (1 / (2πσ²)) · exp(-((i - k - 1)² + (j - k - 1)²) / (2σ²))
where k is a positive integer, i, j ∈ [1, 2k+1], and σ² is the variance of the Gaussian function; letting σ = 1.4 and k = 1 yields a 3 × 3 Gaussian convolution kernel.
convolving the Gaussian kernel with a gray image to obtain a smooth image;
calculating the gradient strength and direction of each pixel point in the image, and utilizing Sobel operators in the horizontal direction and the vertical direction:
Sx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],  Sy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
where Sx is the horizontal Sobel operator and Sy is the vertical Sobel operator; each is convolved with the smoothed image to obtain the first derivatives Gx and Gy of each pixel in the two directions, from which the gradient of the pixel is calculated:
G = sqrt(Gx² + Gy²),  θ = arctan(Gy / Gx)
4. a binocular projection human-computer interaction method in combination with portrait behavior information according to claim 3, wherein the non-maximum suppression comprises the steps of:
1) comparing the gradient strength of the current pixel with two pixels along the positive and negative gradient directions;
2) if the gradient strength of the current pixel is the maximum of the three, retaining the pixel point as an edge point; otherwise, suppressing the pixel point.
5. The binocular projection human-computer interaction method combined with the portrait behavior information according to claim 4, wherein the image subjected to edge detection is subjected to straight line detection through Hough straight line detection to realize an area positioning process of a projection area, specifically comprising:
for a straight line y = kx + b on a Cartesian coordinate system, where (x, y) denotes a coordinate point in that coordinate system, k denotes the slope of the line and b denotes its intercept, the line is rewritten as b = y - xk; defining the abscissa in Hough space as k and the ordinate as b, b = y - xk is a straight line in Hough space with slope -x and intercept y; several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the Cartesian coordinate system correspond to several straight lines in Hough space, and the common intersection point (k, b) of these lines is the slope and intercept of that straight line in the Cartesian coordinate system;
performing the Hough transform in polar form, specifically, expressing a straight line by the polar equation ρ = x·cosθ + y·sinθ, where ρ is the polar distance, i.e. the distance from the origin to the straight line in polar space, and θ is the polar angle, i.e. the angle between the x axis and the line segment that passes through the origin perpendicular to the straight line; defining the abscissa in Hough space as θ and the ordinate as ρ, several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the polar coordinate system correspond to several curves in Hough space, and the common intersection point (θ, ρ) of these curves is the polar angle and polar distance of that straight line in the polar coordinate system;
calculating the pairwise intersection points of the four boundary straight lines of the projection region to obtain the four vertex coordinates (xlt, ylt), (xlb, ylb), (xrb, yrb), (xrt, yrt).
6. The binocular projection human-computer interaction method combined with the portrait behavior information according to claim 5, wherein the solving of the mapping relationship under the view angle transformation through the homography transformation matrix specifically comprises:
setting the x'-y' plane to be perpendicular to the Z axis of the X-Y-Z space coordinate system and to intersect the Z axis at the point (0, 0, 1), i.e. a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system; describing the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system by a homography matrix H:
[X, Y, Z]^T = H · [x, y, 1]^T,  H = [[h1, h2, h3], [h4, h5, h6], [h7, h8, h9]]
where h1–h9 are the 9 transformation parameters of the homography matrix; the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is then obtained as:
x' = (h1·x + h2·y + h3) / (h7·x + h8·y + h9),  y' = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
the H matrix has 9 transformation parameters but in fact only 8 degrees of freedom: multiplying the H matrix by a scaling factor k, k·H and H actually represent the same mapping relation, so H has only 8 degrees of freedom, and the homography matrix H is solved either by adding a constraint to H or by setting h9 = 1.
7. The binocular projection human-computer interaction method combining portrait behavior information as claimed in claim 6, wherein either h9 is set to 1 and the remaining 8 parameters are solved from the resulting linear equations, or the homography matrix H is constrained to have modulus 1 and the corresponding constrained equations are solved; by defining, in the projection scene coordinate system, the target coordinate points of the four projection-area vertices obtained in the pixel coordinate system, the H matrix is solved.
8. The binocular projection human-computer interaction method combined with portrait behavior information as claimed in claim 7, wherein identifying the interactive object by using the YOLOv3 target detection algorithm and obtaining the game coordinates mapped from the projection area coordinates to the Unity3D development scene according to the solved homography transformation matrix specifically comprises:
the loss function of YOLOv3 is as follows:
where the first term is the coordinate error loss and λcoord is the coordinate loss coefficient; S denotes that the input image is divided into S × S grids; B denotes the number of bounding boxes contained in one grid; an indicator function denotes whether the j-th box of the i-th grid contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h denote its width and height; rij and r̂ij denote the x, y, w, h of the predicted box and of the ground-truth box, respectively; the second and third terms are the confidence losses, with an indicator function denoting whether the j-th box of the i-th grid contains no object, taking the value 1 if it contains none and 0 otherwise; λnoobj balances the loss weights of grids with and without objects, aiming to reduce the confidence loss of box predictions in grids without objects; Cij and Ĉij denote the predicted and ground-truth confidences of the j-th box of the i-th grid; classes denotes the number of categories; pij(c) and p̂ij(c) denote the predicted and true probabilities that the j-th box of the i-th grid belongs to a class-c object.
9. The binocular projection human-computer interaction method combined with the portrait behavior information according to claim 8, wherein the obtaining of the depth information of the interaction object through binocular camera ranging specifically comprises:
correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
matching pixel points of the two corrected images to obtain a disparity map;
and calculating the depth of each pixel according to the matching result, thereby obtaining a depth map.
10. The binocular projection human-computer interaction method combined with the portrait behavior information as claimed in claim 9, wherein obtaining the skeletal key point positions of the interactor through a Kinect camera and virtualizing the character object and the corresponding interaction actions in the Unity3D software specifically comprises:
identifying the skeleton structure of a target person by utilizing the skeleton tracking function of the Kinect somatosensory instrument, acquiring depth data of the target person and related information of a color image, and displaying the obtained skeleton structure information of the person in real time to store joint point data of the person; the interaction of the interactors is mimicked by virtualizing character objects in Unity3D software.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011642041.2A CN112657176A (en) | 2020-12-31 | 2020-12-31 | Binocular projection man-machine interaction method combined with portrait behavior information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011642041.2A CN112657176A (en) | 2020-12-31 | 2020-12-31 | Binocular projection man-machine interaction method combined with portrait behavior information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112657176A true CN112657176A (en) | 2021-04-16 |
Family
ID=75412210
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011642041.2A Pending CN112657176A (en) | 2020-12-31 | 2020-12-31 | Binocular projection man-machine interaction method combined with portrait behavior information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112657176A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015024407A1 (en) * | 2013-08-19 | 2015-02-26 | 国家电网公司 | Power robot based binocular vision navigation system and method |
CN107481267A (en) * | 2017-08-14 | 2017-12-15 | 华南理工大学 | A kind of shooting projection interactive system and method based on binocular vision |
CN111354007A (en) * | 2020-02-29 | 2020-06-30 | 华南理工大学 | Projection interaction method based on pure machine vision positioning |
Non-Patent Citations (1)
Title |
---|
- CHEN Pengyan et al.: "Kinect main-character position detection and somatosensory interaction in Unity3D", Journal of Shenyang Ligong University *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113487668A (en) * | 2021-05-25 | 2021-10-08 | 北京工业大学 | Radius-unlimited learnable cylindrical surface back projection method |
WO2022252239A1 (en) * | 2021-05-31 | 2022-12-08 | 浙江大学 | Computer vision-based mobile terminal application control identification method |
CN113506210A (en) * | 2021-08-10 | 2021-10-15 | 深圳市前海动竞体育科技有限公司 | Method for automatically generating point maps of athletes in basketball game and video shooting device |
CN115061577A (en) * | 2022-08-11 | 2022-09-16 | 北京深光科技有限公司 | Hand projection interaction method, system and storage medium |
CN115061577B (en) * | 2022-08-11 | 2022-11-11 | 北京深光科技有限公司 | Hand projection interaction method, system and storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210416