Disclosure of Invention
The invention discloses a binocular projection human-computer interaction method combined with portrait behavior information. It aims to carry out human-computer interaction using a deep learning method while combining the depth information of the interactive object measured by a binocular camera with the portrait behavior information of the interactor, so as to greatly improve the interactivity of a projection human-computer interaction system.
The invention is realized by at least one of the following technical schemes.
A binocular projection human-computer interaction method combined with portrait behavior information comprises the following steps:
acquiring image data by using a camera, and carrying out edge detection on a camera capturing area;
carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
solving the mapping relation under the view angle transformation through the homography transformation matrix;
identifying the interactive objects by using a YOLOv3 target detection algorithm, and obtaining game coordinates mapped to the Unity3D development scene from the projection area coordinates according to the solved homography transformation matrix;
obtaining depth information of an interactive object through binocular camera ranging;
and obtaining the positions of the skeletal key points of the interactor, virtualizing a character object, and generating corresponding interactive actions according to the distribution of the character's joint points.
Preferably, the camera image is converted to grayscale before edge detection; an edge is a set of points in the image whose brightness changes markedly, and a large-scale Canny filter operator is adopted to detect the edges of the image.
Preferably, when edge detection is performed, a Gaussian filter is first used to smooth the image and filter out noise;
a Gaussian kernel of size (2k+1) × (2k+1) is set by the following formula:
where k is a positive integer, i, j ∈ [1, 2k+1], and σ² is the variance of the Gaussian function; letting σ = 1.4 and k = 1 yields the Gaussian convolution kernel:
convolving the Gaussian kernel with a gray image to obtain a smooth image;
calculating the gradient strength and direction of each pixel point in the image, and utilizing Sobel operators in the horizontal direction and the vertical direction:
where Sx is the Sobel operator in the horizontal direction and Sy is the Sobel operator in the vertical direction; each is convolved with the smoothed image to obtain the first derivatives Gx and Gy of each pixel point in the two directions, from which the gradient of the pixel point is calculated:
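The formulas referenced above are not reproduced in this text. For reference, the standard forms assumed here (a sketch of the usual Canny pipeline, not quoted from the original) are the Gaussian kernel, the horizontal and vertical Sobel operators, and the gradient magnitude and direction:

$$
H_{ij}=\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{(i-k-1)^{2}+(j-k-1)^{2}}{2\sigma^{2}}\right),\qquad i,j\in[1,2k+1]
$$

$$
S_x=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix},\qquad
S_y=\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}
$$

$$
G=\sqrt{G_x^{2}+G_y^{2}},\qquad \theta=\arctan\!\left(\frac{G_y}{G_x}\right)
$$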
preferably, the non-maxima suppression comprises the steps of:
1) comparing the gradient strength of the current pixel with that of the two pixels along the positive and negative gradient directions;
2) if the gradient strength of the current pixel is the largest of the three, retaining the pixel as an edge point; otherwise, suppressing it.
Preferably, performing line detection on the edge-detected image through Hough line detection to locate the projection area specifically comprises:
for a straight line y = kx + b on a Cartesian coordinate system, where (x, y) denotes a coordinate point in that coordinate system, k denotes the slope of the line and b its intercept, the line is rewritten as b = y − xk; defining the abscissa in Hough space as k and the ordinate as b, then b = y − xk is a straight line in Hough space with slope −x and intercept y. Several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the Cartesian coordinate system correspond to several straight lines in Hough space, and the common intersection point (k, b) of these lines gives the slope and intercept of that straight line in the Cartesian coordinate system;
performing the Hough transform in polar form: a straight line is expressed by the polar equation ρ = x cos θ + y sin θ, where ρ is the polar distance, i.e., the distance from the origin to the line in the polar coordinate space, and θ is the polar angle, i.e., the angle between the x axis and the line segment through the origin perpendicular to the line. Defining the abscissa in Hough space as θ and the ordinate as ρ, the coordinates (x1, y1), (x2, y2), …, (xn, yn) of several points on the same straight line in the polar coordinate system correspond to several curves in Hough space, and the common intersection point (θ, ρ) of these curves gives the polar angle and polar distance of that straight line in the polar coordinate system;
calculating the pairwise intersection points of the four longest boundary lines of the projection region to obtain the four vertex coordinates (xlt, ylt), (xlb, ylb), (xrb, yrb), (xrt, yrt).
Preferably, the solving of the mapping relationship under the view angle transformation through the homography transformation matrix specifically includes:
setting the x'-y' plane to be perpendicular to the Z axis of the X-Y-Z space coordinate system and to intersect the Z axis at the point (0, 0, 1); that is, a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by a homography matrix H:
where h1 to h9 are the 9 transformation parameters of the homography matrix; the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is then obtained as:
the H matrix has 9 transformation parameters but in fact only 8 degrees of freedom; multiplying the H matrix by a scaling factor k:
that is, k·H and H represent the same mapping relation, so H has only 8 degrees of freedom, and the homography matrix H is solved either by adding a norm constraint to H or by setting h9 = 1.
Preferably, h9 is set to 1, and the equation to be solved is as follows:
Alternatively, the homography matrix H is constrained to have unit norm, as follows:
the equation to be solved is then:
The target coordinates, in the projection scene coordinate system, of the four projection-area vertices obtained in the pixel coordinate system are defined, and the H matrix is solved accordingly:
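The homography matrix and mapping equations referenced above are likewise not reproduced in this text. The standard forms consistent with the description (a sketch, with s denoting the projective scale factor) are:

$$
H=\begin{bmatrix}h_1&h_2&h_3\\h_4&h_5&h_6\\h_7&h_8&h_9\end{bmatrix},\qquad
s\begin{bmatrix}x'\\y'\\1\end{bmatrix}=H\begin{bmatrix}x\\y\\1\end{bmatrix},
$$

so that

$$
x'=\frac{h_1x+h_2y+h_3}{h_7x+h_8y+h_9},\qquad
y'=\frac{h_4x+h_5y+h_6}{h_7x+h_8y+h_9}.
$$

Each of the four vertex correspondences contributes two such equations, giving eight equations for the eight unknowns once h9 = 1 (or the unit-norm constraint) is imposed.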
preferably, the identifying the interactive object by using the YOLOv3 target detection algorithm, and obtaining the game coordinate mapped from the projection area coordinate to the Unity3D development scene according to the solved homography transformation matrix specifically includes:
the loss function of YOLOv3 is as follows:
where the first term is the coordinate error loss, λ
coordIs a coordinate loss function coefficient; s denotes dividing an input image intoS × S grids; b represents the number of frames included in one mesh;
whether the jth frame of the ith grid contains an object or not is represented, the containing time value is 1, and the non-containing time value is 0; x and y respectively represent the center coordinates of the frame; w and h respectively represent the length and width of the frame; r is
ij、
X, y, w, h representing the prediction box and the real box, respectively; the second term and the third term are confidence loss,
whether the jth frame of the ith grid does not contain an object or not is represented, the value of the non-containing time is 1, and the value of the containing time is 0; lambda [ alpha ]
noobjTo balance the loss weights of object-bearing and object-free meshes, the goal is to reduce the confidence loss of the mesh borders without objects; c
ijAnd
respectively representing the predicted and real confidence coefficients of the jth frame of the ith grid; classes represents the number of categories; p is a radical of
ij(c),
And the prediction probability and the real probability of the jth frame of the ith grid belonging to the class c object are shown.
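The loss formula itself is not reproduced in this text. A representative squared-error form consistent with the term-by-term description above (a hedged reconstruction, not quoted from the original) is:

$$
\begin{aligned}
L={}&\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}+(w_{ij}-\hat{w}_{ij})^{2}+(h_{ij}-\hat{h}_{ij})^{2}\right]\\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\,(C_{ij}-\hat{C}_{ij})^{2}
+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\,(C_{ij}-\hat{C}_{ij})^{2}\\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\sum_{c\in classes}\left(p_{ij}(c)-\hat{p}_{ij}(c)\right)^{2}
\end{aligned}
$$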
Preferably, the obtaining of the depth information of the interactive object through binocular camera ranging specifically includes:
correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
matching pixel points of the two corrected images to obtain a disparity map;
and calculating the depth of each pixel according to the matching result, thereby obtaining a depth map.
Preferably, obtaining the positions of the interactor's skeletal key points through the Kinect camera, and virtualizing the character object and the corresponding interaction actions in the Unity3D software, specifically include:
identifying the skeletal structure of the target person by means of the skeleton tracking function of the Kinect motion-sensing device, acquiring the depth data of the target person and the related information of the color image, displaying the obtained skeletal structure information in real time, and saving the joint point data of the person; the interactivity of the human-computer interaction system is improved by virtualizing a character object in the Unity3D software to simulate the interactive actions of the interactor.
Compared with the prior art, the invention has the beneficial effects that: the target detection is carried out by a deep learning method, so that the interference of environmental factors can be avoided, the diversity of interactive scenes is ensured, and meanwhile, the interactivity of a human-computer interaction system is improved by combining the depth information of an interactive object acquired by a binocular camera and the human behavior information of an interactive person acquired by a Kinect camera.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
As shown in fig. 2, a binocular projection human-computer interaction method combining portrait behavior information includes the following steps:
S1, performing edge detection on the camera capturing area through a large-scale Canny filter operator, and performing non-maximum suppression;
carrying out graying processing on the camera image;
an edge is a set of points in the image whose brightness changes markedly, and the gradient numerically reflects how fast this change is; on the principle that no part of the projection-area boundary may be missed, a large-scale Canny filter operator is adopted to detect the edges of the image;
using a Gaussian filter to smooth the image and filter out noise;
a Gaussian kernel of size (2k+1) × (2k+1) is set by the following formula:
where k is a positive integer, i, j ∈ [1, 2k+1], and σ² is the variance of the Gaussian function; letting σ = 1.4 and k = 1 yields the Gaussian convolution kernel:
convolving the Gaussian kernel with a gray image to obtain a smooth image;
calculating the gradient strength and direction of each pixel point in the image, and utilizing Sobel operators in the horizontal direction and the vertical direction:
where Sx is the Sobel operator in the horizontal direction and Sy is the Sobel operator in the vertical direction; each is convolved with the smoothed image to obtain the first derivatives Gx and Gy of each pixel point in the two directions, from which the gradient of the pixel point is calculated:
non-maxima suppression is applied to eliminate spurious responses due to edge detection:
for each pixel on the obtained gradient image, whether the point should be kept or eliminated cannot be decided by a single threshold alone; since the final edge image is expected to describe the source image contours accurately, non-maximum suppression is required:
1) comparing the gradient strength of the current pixel with that of the two pixels along the positive and negative gradient directions;
2) if the gradient strength of the current pixel is the largest of the three, retaining the pixel as an edge point; otherwise, suppressing it;
as the convolution kernel scale increases, the detected edges become more pronounced; on the principle that the boundary of the projection area must not be missed, a large-scale Canny operator is adopted to detect the edges of the image.
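A minimal sketch of this edge-detection step, assuming OpenCV is used on the Python side; the file name, thresholds, and aperture size are illustrative assumptions, not values from the original:

```python
import cv2

# Load the camera frame and convert it to grayscale.
frame = cv2.imread("capture.png")          # illustrative file name
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Smooth with a Gaussian filter before edge detection (3x3 kernel and
# sigma = 1.4 correspond to the k = 1, sigma = 1.4 setting above).
blurred = cv2.GaussianBlur(gray, (3, 3), 1.4)

# Canny edge detection; cv2.Canny applies Sobel gradients, non-maximum
# suppression and double thresholding internally. A larger aperture
# (Sobel kernel) size plays the role of the "large-scale" operator.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150, apertureSize=5)

cv2.imwrite("edges.png", edges)
```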
S2, carrying out straight line detection on the image after edge detection through Hough straight line detection to realize area positioning of a projection area;
Hough line detection maps each point on the Cartesian coordinate system to a straight line in Hough space by using the point-line duality between the Cartesian coordinate system and Hough space, so that an intersection point shared by several straight lines in Hough space corresponds to a straight line passing through several points in the Cartesian coordinate system, as shown in fig. 3.
Specifically, for a straight line y = kx + b on a Cartesian coordinate system, where (x, y) denotes a coordinate point in that coordinate system, k denotes the slope of the line and b its intercept, the line is rewritten as b = y − xk. Defining the abscissa in Hough space as k and the ordinate as b, then b = y − xk is a straight line in Hough space with slope −x and intercept y. Several points (x1, y1), (x2, y2), …, (xn, yn) on the same straight line in the Cartesian coordinate system correspond to several straight lines in Hough space, and the common intersection point (k, b) of these lines gives the slope and intercept of that straight line in the Cartesian coordinate system.
Because the slope of a vertical line in the image cannot be calculated, the Hough transform is typically performed in polar form. Specifically, a straight line is expressed by the polar equation ρ = x cos θ + y sin θ, where ρ is the polar distance, i.e., the distance from the origin to the line in the polar coordinate space, and θ is the polar angle, i.e., the angle between the x axis and the line segment through the origin perpendicular to the line. Defining the abscissa in Hough space as θ and the ordinate as ρ, the coordinates (x1, y1), (x2, y2), …, (xn, yn) of several points on the same straight line in the polar coordinate system correspond to several curves in Hough space, and the common intersection point (θ, ρ) of these curves gives the polar angle and polar distance of that straight line in the polar coordinate system, as shown schematically in fig. 4.
Calculating the pairwise intersection points of the four longest boundary lines of the projection region yields the four vertex coordinates of the projection area, namely the top-left, bottom-left, bottom-right and top-right points (xlt, ylt), (xlb, ylb), (xrb, yrb), (xrt, yrt).
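A sketch of this Hough-based location of the projection area, assuming OpenCV and reusing the edge image from the previous sketch; the assumption that the four strongest lines are the borders, and the threshold values, are simplifications for illustration:

```python
import cv2
import numpy as np

def line_intersection(l1, l2):
    """Intersection of two lines given in polar form (rho, theta)."""
    r1, t1 = l1
    r2, t2 = l2
    A = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    b = np.array([r1, r2])
    x, y = np.linalg.solve(A, b)   # solves x*cos(t) + y*sin(t) = rho for both lines
    return x, y

edges = cv2.imread("edges.png", cv2.IMREAD_GRAYSCALE)

# Standard Hough transform in polar form; returns (rho, theta) pairs.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=120)
borders = [l[0] for l in lines[:4]]   # assume the 4 strongest lines are the borders

# Pairwise intersections of non-parallel border lines give the four vertices.
vertices = []
for i in range(4):
    for j in range(i + 1, 4):
        if abs(borders[i][1] - borders[j][1]) > np.pi / 6:   # skip near-parallel pairs
            vertices.append(line_intersection(borders[i], borders[j]))
print(vertices)   # roughly (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt)
```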
S3, solving the mapping relation under the view angle transformation through the homography transformation matrix; the homography transformation diagram is shown in figure 5.
Homography transformations reflect the process of mapping from one two-dimensional plane to three-dimensional space, and then from three-dimensional space to another two-dimensional plane. The homography transformation describes nonlinear transformation between two coordinate systems, so that the homography transformation has wide application in the fields of image splicing, image correction, augmented reality and the like.
X-Y-Z is a three-dimensional space coordinate system, which can be understood as the world coordinate system; x-y is the pixel plane coordinate system; and x'-y' is the plane coordinate system of the projection scene. The homography transform can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l passing through the origin and that point in the X-Y-Z coordinate system:
the straight line intersects the x '-y' coordinate system plane at point (x ', y'), and the process from point (x, y) to point (x ', y') is referred to as a homography transformation.
The solving process of the homography transformation is as follows:
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect the Z axis at the point (0, 0, 1); that is, a point (x', y') in the x'-y' plane coordinate system is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by a homography matrix H:
where h1 to h9 are the 9 transformation parameters of the homography matrix; the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is then obtained as:
The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom; multiplying the H matrix by a scaling factor k:
that is, k·H and H represent the same mapping relation, so H has only 8 degrees of freedom. One way is to set h9 to 1, and the equation to be solved is:
Another approach is to constrain the homography matrix H to have unit norm, as follows:
the equation to be solved is then:
The target coordinates, in the projection scene coordinate system, of the four projection-area vertices obtained in the pixel coordinate system are defined, and the H matrix is solved accordingly:
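A sketch of solving the homography from the four vertex correspondences, assuming OpenCV; the vertex and target coordinates shown are illustrative values, not from the original:

```python
import cv2
import numpy as np

# Four projection-area vertices in the pixel coordinate system (illustrative).
src = np.float32([[102, 85], [98, 612], [873, 620], [880, 90]])
# Their target coordinates in the projection scene coordinate system (illustrative).
dst = np.float32([[0, 0], [0, 720], [1280, 720], [1280, 0]])

# With exactly four correspondences the 8 unknowns of H (h9 normalised to 1)
# are determined directly.
H = cv2.getPerspectiveTransform(src, dst)

# Map an arbitrary pixel coordinate into the projection scene coordinate system.
point = np.float32([[[400, 300]]])
mapped = cv2.perspectiveTransform(point, H)
print(H, mapped)
```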
S4, identifying the interactive objects by means of convolutional neural network technology, using the YOLOv3 target detection algorithm, and obtaining the game coordinates mapped from the projection area coordinates to the Unity3D development scene according to the solved homography transformation matrix;
S5, obtaining depth information of the interactive object through binocular camera ranging;
further, interactive objects are identified by using a YOLOv3 target detection algorithm, and a game coordinate process of mapping the projection area coordinates to the Unity3D development scene is obtained according to the solved homography transformation matrix.
Locating the coordinates of the arrow position in the image is a typical object detection problem. Current target detection research in the convolutional neural network field falls mainly into two classes of algorithms, two-stage and one-stage: a two-stage target detection algorithm first computes candidate regions from the input image and then classifies and refines the candidate regions with a convolutional neural network; representatives of this class are the R-CNN series, SPP-Net, and the like. A one-stage target detection algorithm casts the detection and localization of the target as a regression problem and realizes end-to-end target detection directly with a single convolutional neural network; representatives are the YOLO series, SSD, Retina-Net, and the like. The two classes have their respective advantages and disadvantages: two-stage algorithms are superior in accuracy and precision, while one-stage algorithms have a clear advantage in speed.
The YOLOv3 loss function is designed as follows:
where the first term is the coordinate error loss; λcoord is the coordinate loss coefficient; S denotes that the input image is divided into S × S grid cells; B denotes the number of bounding boxes contained in one grid cell; the indicator for the j-th bounding box of the i-th grid cell takes the value 1 if the box contains an object and 0 otherwise; x and y denote the center coordinates of a bounding box, and w and h its width and height, with the predicted box and the ground-truth box each contributing their own x, y, w, h. The second and third terms are the confidence loss; the complementary indicator takes the value 1 if the j-th bounding box of the i-th grid cell contains no object and 0 if it does; λnoobj balances the loss weights of grid cells with and without objects, its purpose being to reduce the confidence loss contributed by boxes that contain no object; Cij and its ground-truth counterpart denote the predicted and real confidences of the j-th bounding box of the i-th grid cell. classes denotes the number of categories, and pij(c) and its ground-truth counterpart denote the predicted and real probabilities that the object in the j-th bounding box of the i-th grid cell belongs to class c.
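A sketch of how a detected bounding box, as produced by the YOLOv3 detector, could be mapped through the solved homography into Unity3D game coordinates; the normalisation to scene units and the bottom-left axis convention are assumptions about the game scene setup, not part of the original:

```python
import numpy as np
import cv2

def to_game_coords(box, H, scene_w=1280, scene_h=720):
    """Map the center of a detected bounding box (x_min, y_min, x_max, y_max),
    given in camera pixel coordinates, through the homography H into the
    projection scene, then normalise to Unity3D scene units (assumed convention)."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    pt = np.float32([[[cx, cy]]])
    sx, sy = cv2.perspectiveTransform(pt, H)[0, 0]
    # Normalise to [0, 1] and flip y so the origin matches Unity3D's
    # bottom-left convention (an assumption, not stated in the original).
    return sx / scene_w, 1.0 - sy / scene_h

# Example: a box reported by the detector, mapped with an identity homography.
H = np.eye(3)
print(to_game_coords((380, 250, 420, 300), H))
```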
S6, obtaining the positions of the interactor's skeletal key points through a Kinect camera, virtualizing the character object in the Unity3D software, and generating corresponding interactive actions according to the distribution of the character's joint points.
Calibrating the binocular cameras to obtain internal and external parameters and distortion coefficients of the two cameras;
the purpose of camera calibration is as follows: first, to restore the real world position of the object imaged by the camera, it is necessary to know how the world object is transformed into the computer image plane, i.e. to solve the internal and external parameter matrix.
Second, the perspective projection of the camera has a significant problem — distortion. Another purpose of camera calibration is to solve distortion coefficients and then use them for image rectification.
Correcting the original image according to the calibration result, wherein the two corrected images are positioned on the same plane and are parallel to each other;
the main task of the binocular camera system is distance measurement, and the parallax distance measurement formula is derived under the ideal condition of the binocular system, but in the real binocular stereo vision system, two camera image planes which are completely aligned in a coplanar line do not exist, as shown in the attached figure 6, wherein p is a certain point on an object to be measured, and O is a certain point on the object to be measured1And O2The optical centers of the two cameras are respectively, so that the three-dimensional correction is carried out, namely, two images which are not in coplanar line alignment in practice are corrected into coplanar line alignment (the coplanar line alignment is that when two camera image planes are on the same plane and the same point is projected to the two camera image planes, the same line of two pixel coordinate systems is needed), the actual binocular system is corrected into an ideal binocular system, as shown in figure 7, wherein p is a certain point on an object to be measured, and O is a certain point on the object to be measuredRAnd OTThe optical centers of the two cameras are respectively, the imaging points of the point P on the photoreceptors of the two cameras are P and P', f is the focal length of the cameras, B is the center distance of the two cameras, and X isRAnd XTThe distances from the imaging points on the image planes of the left camera and the right camera to the left edge of the image plane respectively, and z is the required depth information.
Matching pixel points between the two corrected images, i.e., finding the corresponding image points of the same scene point in the left and right views, to obtain a disparity map;
calculating the depth of each pixel according to the matching result, thereby obtaining a depth map;
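A sketch of the rectification, matching, and depth steps, assuming OpenCV; all calibration values, image files, and matcher parameters below are illustrative placeholders, not taken from the original:

```python
import cv2
import numpy as np

# Illustrative calibration values; in practice these come from stereo calibration.
image_size = (1280, 720)
K1 = K2 = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])
d1 = d2 = np.zeros(5)
R = np.eye(3)                          # relative rotation between the cameras
T = np.array([[-60.0], [0.0], [0.0]])  # baseline of 60 mm (assumed units)

left_img = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right_img = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Stereo rectification: makes the two image planes coplanar and row-aligned.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, image_size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)
left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)

# Stereo matching: semi-global block matching produces a disparity map.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0

# Depth from disparity: z = f * B / d, or equivalently reproject with the
# Q matrix returned by stereoRectify to obtain a full depth map.
depth = cv2.reprojectImageTo3D(disparity, Q)[:, :, 2]
```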
further, the obtaining of the skeletal key point position of the interactor through the Kinect camera, and then the virtualization of the character object and the corresponding interactive action process in the Uinty3D software specifically include:
identifying the skeletal structure of the target person using the skeleton tracking function of the Kinect motion-sensing device; the depth data of the target person and the related information of the color image are acquired directly from the Kinect sensor, the obtained skeletal structure information of the person is displayed in real time, and the 20 joint point data of the person are saved.
The interactivity of the human-computer interaction system is improved by virtualizing a character object in the Unity3D software and designing a number of corresponding actions, such as squatting, standing, and shooting postures, according to the character joint distribution information, so as to simulate the interactive actions of the interactor.
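A sketch of how the tracked joint positions could be turned into discrete interactive actions for the virtual character; the joint names follow the 20-joint Kinect skeleton, while the threshold values and the action set (squat, stand, shoot) are assumptions made for illustration:

```python
def classify_action(joints):
    """joints: dict mapping joint names to (x, y, z) positions in meters,
    as delivered by the Kinect skeleton tracker (y pointing up).
    Returns one of the action labels used to drive the virtual character."""
    head_y = joints["Head"][1]
    hip_y = joints["HipCenter"][1]
    hand_r = joints["HandRight"]
    shoulder_r = joints["ShoulderRight"]

    # A raised right hand extended forward at roughly shoulder height is
    # treated as a shooting posture (simplified heuristic).
    if hand_r[1] > shoulder_r[1] - 0.05 and abs(hand_r[2] - shoulder_r[2]) > 0.35:
        return "shoot"
    # A small head-to-hip height difference indicates a squat.
    if head_y - hip_y < 0.45:
        return "squat"
    return "stand"

# Example frame of joint data (illustrative values, in meters).
frame = {
    "Head": (0.0, 1.65, 2.0),
    "HipCenter": (0.0, 0.95, 2.0),
    "ShoulderRight": (0.2, 1.45, 2.0),
    "HandRight": (0.25, 1.5, 1.55),
}
print(classify_action(frame))   # the resulting label would then drive the Unity3D character
```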
The invention uses the projector 2 to project a mobile phone interface through the mobile phone end 1, the interactive object starts to perform interactive action, the binocular camera 3 and the Kinect camera 4 transmit images to the cloud server 5 through the network for data processing and analysis, and the result is transmitted back to the mobile phone end 1 for displaying the game scene.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.