CN112363629B - A new non-contact human-computer interaction method and system
- Publication number
- CN112363629B (application CN202011395956.8A)
- Authority
- CN
- China
- Prior art keywords
- point
- computer interaction
- target
- display screen
- depth camera
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The present invention relates to the technical field of human-computer interaction and provides a new non-contact human-computer interaction method and system. The method includes the steps of: S100, automatically detecting, in the two-dimensional image of a depth camera, the three vertices A, B, C of the display screen and a target point F; S200, converting the plane pixel coordinates of the vertices A, B, C and the target point F into three-dimensional coordinates in the depth camera coordinate system using the intrinsic parameters of the depth camera; S300, calculating the three-dimensional coordinates, in the depth camera coordinate system, of the projection point F' of the target point F on the display screen; S400, calculating the plane pixel coordinates of the projection point F' on the display screen; S500, recognizing the action of the target point F and calling the relevant system mouse interface to trigger mouse events. The invention solves the problem of controlling on-screen content with a detectable target object such as a fingertip in a mouse-free environment; it requires no calibration and has the advantages of a small amount of calculation and simple hardware.
Description
Technical Field
The invention belongs to the technical field of human-computer interaction, and in particular relates to a new non-contact human-computer interaction method and system.
Background
At present, non-contact human-computer interaction technology is widely used in interactive games, interactive museums, interactive exhibition halls, VR/AR (virtual reality / augmented reality), and other fields. Standing in front of a computer monitor, smart TV, or other display area, a user can operate the electronic product with various non-contact gestures, for example adjusting the volume by hand. Such mouse-free human-computer interaction technology has great market and economic value.
The patent application published as CN102841679A, entitled "A non-contact human-computer interaction method and device", discloses a method and device that rely mainly on the camera calibration principle: the position and orientation of the camera must be obtained to compute the calibration result, and a gravity-sensing module and a sliding resistance-measurement module must also be introduced, so the hardware design is relatively complicated.
With the emergence of depth cameras and the development of deep learning in computer vision, human-computer interaction systems based on depth cameras have become increasingly common. A depth camera differs from an ordinary camera in that, besides capturing a plane image, it also acquires the depth of the photographed object, that is, its three-dimensional position and size, so that the computing system can obtain three-dimensional data about the environment and the objects in it. Depth cameras are capable of three-dimensional sensing and recognition and, with further processing, can support applications such as three-dimensional modeling.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide a new non-contact human-computer interaction method, aiming to solve the prior-art problems that non-contact human-computer interaction based on the camera calibration principle involves a complicated calculation process and complicated hardware.
The embodiments of the present invention provide a new non-contact human-computer interaction method that includes the following steps:
S100. In the two-dimensional image of the depth camera, use a deep learning system to automatically detect the three vertices A, B, C of the display screen and the target point F, obtaining the plane pixel coordinates of each point;
S200. Using the intrinsic parameters of the depth camera, convert the plane pixel coordinates of the vertices A, B, C and the target point F into three-dimensional coordinates in the depth camera coordinate system;
S300. Calculate the three-dimensional coordinates, in the depth camera coordinate system, of the projection point F' of the target point F on the display screen;
S400. Calculate the plane pixel coordinates of the projection point F' on the display screen;
S500. Recognize the action of the target point F and call the relevant system mouse interface to trigger mouse events.
Further, step S100 includes the following sub-steps:
S110. Automatically detect the vertices A, B, C of the display screen with an object detection algorithm based on deep learning, obtaining the plane pixel coordinates of each point;
S120. Build a two-stage object detection deep neural network structure to automatically detect the target point F, including:
S121. Build a target object detection deep neural network that detects the target object in the two-dimensional image, expands the detected target object region, and locates the target object within the expanded region;
S122. Build a target point detection deep neural network that detects and locates the target point of the target object, obtaining the plane pixel coordinates of the target point F.
Further, step S120 also includes: building a neural network with a dual-channel attention mechanism to improve the localization accuracy of the detected target point.
Further, step S300 includes:
From the vertices A, B, C, calculate the normal vector n of the screen plane through vertex A, and calculate the three-dimensional coordinates of the projection point F' from the constraints that segment FF' is parallel to n and segment AF' is perpendicular to n.
Further, step S400 includes:
Calculating the abscissa u and the ordinate v of the plane pixel coordinates of the projection point F' as fractions of the screen, and combining them with the screen resolution of the display screen to obtain the plane pixel coordinates of F'.
Further, step S400 includes:
Calculating the distance D_FAB from the projection point F' to segment AB and the distance D_FAC from the projection point F' to segment AC, which give the abscissa u and the ordinate v of the plane pixel coordinates of F' as fractions of the screen; combining them with the screen resolution of the display screen, u = (D_FAB/|AC|) * horizontal screen pixels and v = (D_FAC/|AB|) * vertical screen pixels yield the plane pixel coordinates F'(u, v).
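As a rough sketch of this mapping (an illustration, not part of the patent text), and assuming NumPy plus the convention that A and B span one screen edge while A and C span the adjacent edge:

```python
import numpy as np

def point_to_line_distance(p, a, b):
    """Distance from 3D point p to the line through a and b."""
    ab = b - a
    return np.linalg.norm(np.cross(ab, p - a)) / np.linalg.norm(ab)

def projection_to_screen_pixels(f_proj, a, b, c, width_px, height_px):
    """Map F' (camera coordinates) to screen pixels via the patent's
    formulas u = (D_FAB/|AC|) * width, v = (D_FAC/|AB|) * height."""
    d_fab = point_to_line_distance(f_proj, a, b)  # distance F' -> line AB
    d_fac = point_to_line_distance(f_proj, a, c)  # distance F' -> line AC
    u = d_fab / np.linalg.norm(c - a) * width_px
    v = d_fac / np.linalg.norm(b - a) * height_px
    return u, v
```

Because both distances are computed in the camera coordinate system, no camera-to-screen calibration is needed; the detected screen vertices carry all the geometry.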
Another object of the embodiments of the present invention is to provide a new non-contact human-computer interaction system that includes a computer, a depth camera, and a display screen, the computer being connected to the depth camera and to the display screen; the system performs human-computer interaction with the new non-contact method described above.
A further object of the embodiments of the present invention is to provide a computer-readable storage medium storing a program for electronic data exchange, the program executing the new non-contact human-computer interaction method described above.
Compared with the prior art, the beneficial effect of the new non-contact human-computer interaction method and system provided by the present invention is that they solve the problem of controlling on-screen content with a detectable target object, such as a fingertip, in a mouse-free environment. The new method requires no calibration and has the advantages of a small amount of calculation and simple hardware. Based on deep learning, the invention obtains two-dimensional plane information and three-dimensional depth information through a depth camera, calculates the coordinates of the projection of the target object on the display screen, and triggers mouse events, thereby realizing non-contact human-computer interaction.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings required by the technical description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the new non-contact human-computer interaction method provided by an embodiment of the present invention.
Fig. 2 is a two-dimensional image captured by the depth camera in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the two-stage object detection deep neural network structure in an embodiment of the present invention.
Fig. 4 is a schematic diagram of the positional relationship between the depth camera, the fingertip, and the display screen in an embodiment of the present invention.
Fig. 5 is a flowchart of calculating the three-dimensional coordinates of the projection point F' in the depth camera coordinate system in an embodiment of the present invention.
Fig. 6 is a schematic diagram of the relationship between the current position and the original position of the fingertip when a mouse double-click event is triggered in an embodiment of the present invention.
Fig. 7 is a flowchart of triggering a mouse double-click event in an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of the new non-contact human-computer interaction system provided by an embodiment of the present invention.
Detailed Description
To make the technical problems, technical solutions, and beneficial effects addressed by the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.
Referring to Fig. 1, a flowchart of the new non-contact human-computer interaction method provided by an embodiment of the present invention, the method includes the following steps:
S100. In the two-dimensional image of the depth camera, use a deep learning system to automatically detect the three vertices A, B, C of the display screen and the target point F, obtaining the plane pixel coordinates of each point: A(ua, va), B(ub, vb), C(uc, vc), F(u0, v0);
S200. Using the intrinsic parameters of the depth camera, convert the plane pixel coordinates of each point into three-dimensional coordinates in the depth camera coordinate system: A(x1, y1, z1), B(x2, y2, z2), C(x3, y3, z3), F(x0, y0, z0) (a back-projection sketch follows this step list);
S300. Calculate the three-dimensional coordinates F'(x', y', z') of the projection point F' of the target point F on the display screen, in the depth camera coordinate system;
S400. Calculate the plane pixel coordinates F'(u, v) of the projection point F' on the display screen, including: calculating the distance D_FAB from F' to segment AB and the distance D_FAC from F' to segment AC, which give the abscissa u and the ordinate v as fractions of the screen; combining them with the screen resolution of the display screen (horizontal pixels * vertical pixels), u = (D_FAB/|AC|) * horizontal screen pixels and v = (D_FAC/|AB|) * vertical screen pixels yield the plane pixel coordinates F'(u, v);
S500. Recognize the action of the target point F, display a mouse marker at the position of the projection point F', and call the relevant system mouse interface to trigger mouse events that operate the display screen.
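The patent does not spell out the pixel-to-3D conversion of step S200; under the standard pinhole camera model that depth cameras expose through their intrinsic parameters (focal lengths fx, fy and principal point cx, cy), a minimal back-projection sketch, offered here as an assumption, is:

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with its measured depth into the depth
    camera coordinate system (standard pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```

Applying this to A, B, C, and F with their measured depths yields the four three-dimensional points used in the remaining steps.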
Fig. 2 is a two-dimensional image captured by the depth camera in an embodiment of the present invention; it shows a rectangular display screen and a hand with the index finger extended. In this embodiment the fingertip is taken as the target point; in other embodiments, other detectable target objects can be chosen as the target point.
Referring to Fig. 2, the above step S100 includes the following sub-steps:
S110. Automatically detect the three vertices A, B, C of the display screen with an object detection algorithm based on deep learning (such as YOLO or SSD); the three vertices A, B, C define the screen plane, and vertex A is taken as the origin of the pixel coordinate system of the screen plane, giving the plane pixel coordinates A(ua, va), B(ub, vb), C(uc, vc);
S120. Referring to Fig. 3, build a two-stage object detection deep neural network structure to automatically detect the fingertip F, including:
S121. Build a finger detection deep neural network that detects the finger in the two-dimensional image, expands the detected finger region, and locates the finger within the expanded region;
S122. Build a fingertip detection deep neural network that detects and locates the fingertip of the finger, obtaining the plane pixel coordinates F(u0, v0) of the fingertip F.
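A minimal sketch of this two-stage inference flow is given below; finger_net and fingertip_net stand in for the two trained networks of S121 and S122, and their detect interfaces are illustrative assumptions rather than APIs defined by the patent.

```python
def locate_fingertip(image, finger_net, fingertip_net, margin=0.2):
    """Two-stage fingertip localization (S121-S122): detect the finger,
    expand its bounding box, then detect the fingertip inside the crop."""
    x, y, w, h = finger_net.detect(image)        # stage 1: finger box
    dx, dy = int(w * margin), int(h * margin)    # expand the detected region
    x0, y0 = max(0, x - dx), max(0, y - dy)
    crop = image[y0:y + h + dy, x0:x + w + dx]
    u, v = fingertip_net.detect(crop)            # stage 2: fingertip in crop
    return u + x0, v + y0                        # back to full-image coordinates
```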
To improve the localization accuracy of the detected fingertip, the embodiment of the present invention also builds a neural network with a dual-channel attention mechanism. The attention mechanism is a model that imitates the attention of the human brain: it uses limited attention to quickly filter out the important information, improving the efficiency and accuracy of the brain in processing visual information.
Referring to Fig. 4, a schematic diagram of the positions of the depth camera, the fingertip, and the display screen in an embodiment of the present invention, the figure shows the projection point F' of the fingertip F on the display screen.
Referring to Fig. 5, the above step S300 includes the following sub-steps:
S310. From the vertices A, B, C, calculate the normal vector n = (a, b, c) of the screen plane through vertex A, where:
a=y1(z2-z3)+y2(z3-z1)+y3(z1-z2),a=y 1 (z 2 -z 3 )+y 2 (z 3 -z 1 )+y 3 (z 1 -z 2 ),
b=z1(x2-x3)+z2(x3-x1)+z3(x1-x2),b=z 1 (x 2 -x 3 )+z 2 (x 3 -x 1 )+z 3 (x 1 -x 2 ),
c=x1(y2-y3)+x2(y3-y1)+x3(y1-y2);c=x 1 (y 2 -y 3 )+x 2 (y 3 -y 1 )+x 3 (y 1 -y 2 );
S320. Let the three-dimensional coordinates of the projection point F' in the depth camera coordinate system be F'(x', y', z'). Since segment FF' is parallel to the normal vector n, there is a scalar t such that
x' = x0 + a·t, y' = y0 + b·t, z' = z0 + c·t; (*)
S330. Since segment AF' lies in the screen plane and is therefore perpendicular to the normal vector n, the equation
a(x' - x1) + b(y' - y1) + c(z' - z1) = 0 (**)
is obtained;
S340. Solving the system (*) together with equation (**) gives t; substituting t back into (*) yields the three-dimensional coordinates F'(x', y', z') of the projection point F'.
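Assuming NumPy arrays for the four three-dimensional points, steps S310 to S340 reduce to a few lines; the cross product below expands to exactly the components a, b, c given in S310. This is a sketch under those assumptions, not the patent's reference implementation.

```python
import numpy as np

def project_onto_screen_plane(f, A, B, C):
    """Project target point F onto the screen plane through A, B, C
    (steps S310-S340): F' = F + t * n, with n the plane normal."""
    n = np.cross(B - A, C - A)            # normal vector n = (a, b, c)
    t = np.dot(A - f, n) / np.dot(n, n)   # solve (*) and (**) for t
    return f + t * n                      # 3D coordinates of F'
```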
Specifically, recognizing the action of the target point F in step S500 includes:
While the vertices A, B, C remain fixed, the user moves the finger, and steps S100 to S200 above are used to recompute the three-dimensional coordinates of the fingertip F. The actions of the target point F include clicking, double-clicking, holding the pressed state, and being released after pressing; the correspondingly triggered mouse events include the mouse click event (click), the mouse double-click event (dbclick), the event triggered when the mouse button is pressed (mousedown), and the event triggered when the mouse button is released (mouseup).
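The patent only states that "the relevant system mouse interface" is called, without naming one; as one concrete possibility, offered purely as an assumption, the cross-platform pyautogui package could dispatch the recognized actions:

```python
import pyautogui  # assumed stand-in for "the relevant system mouse interface"

def trigger_mouse_event(action, u, v):
    """Move the cursor to the projection point F'(u, v) and fire the
    mouse event corresponding to the recognized action of F."""
    pyautogui.moveTo(u, v)
    if action == "click":
        pyautogui.click()
    elif action == "dbclick":
        pyautogui.doubleClick()
    elif action == "mousedown":
        pyautogui.mouseDown()
    elif action == "mouseup":
        pyautogui.mouseUp()
```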
The triggering of a mouse double-click event is described below as an example.
Fig. 6 is a schematic diagram of the relationship between the current position of the fingertip and its original position when a mouse double-click event is triggered; in the figure, D1 and D2 are distances from the current position of the fingertip to the original position, with D1 > D2. Referring to Fig. 6, a double-click action is defined in this embodiment as the fingertip leaving its original position and then returning to its vicinity, where leaving the original position means that the distance between the fingertip and the original position is greater than D1, and returning to the vicinity of the original position means that this distance is less than D2.
Fig. 7 is a flowchart of triggering a mouse double-click event in an embodiment of the present invention. Referring to Fig. 7, since a mouse double-click event generally finishes within 2 seconds, the click action occurs within a sequence of 50 image frames. In this embodiment the fingertip coordinates are detected every 5 frames; the flag FAR = TRUE means that in frame tn the distance between the fingertip coordinates and the original position is greater than D1, and the flag NEAR = TRUE means that in frame tm (0 < tn < tm < 50) this distance is less than D2. When FAR = TRUE and NEAR = TRUE, the mouse double-click event is triggered and the display screen is operated. The specific flow is as follows:
First read the image and check whether 50 frames have passed since FAR = TRUE without NEAR = TRUE becoming true. If so, set FAR = FALSE and NEAR = FALSE; that is, if the fingertip left the original position in frame tn and has not returned to its vicinity by frame (tn + 50), the fingertip state of frame tn is reset. If not, detect the fingertip coordinates every 5 frames; once the detected coordinates satisfy FAR = TRUE, keep detecting until they also satisfy NEAR = TRUE, then trigger the mouse double-click event, set FAR = FALSE and NEAR = FALSE, and repeat the above steps.
Let the three-dimensional coordinates of the fingertip at the original position be (x0, y0, z0), the fingertip coordinates in frame tn be (x_tn, y_tn, z_tn), and the fingertip coordinates in frame tm be (x_tm, y_tm, z_tm). The mouse double-click event is triggered when both
(x_tn - x0)² + (y_tn - y0)² + (z_tn - z0)² > D1² and
(x_tm - x0)² + (y_tm - y0)² + (z_tm - z0)² < D2²
are satisfied.
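Combining the FAR/NEAR flow of Fig. 7 with the two distance conditions above, a minimal sketch of the double-click detector might look as follows; the list-of-fingertip-coordinates interface is an assumption for illustration:

```python
import numpy as np

def detect_double_click(fingertips, origin, D1, D2, step=5, window=50):
    """Return True if, within a 50-frame window sampled every 5 frames,
    the fingertip first moves farther than D1 from its original position
    (FAR = TRUE at frame tn) and later returns to within D2 of it
    (NEAR = TRUE at frame tm > tn)."""
    origin = np.asarray(origin, dtype=float)
    far = False
    for t in range(0, min(len(fingertips), window), step):
        dist_sq = np.sum((np.asarray(fingertips[t], dtype=float) - origin) ** 2)
        if not far and dist_sq > D1 ** 2:
            far = True                 # FAR = TRUE
        elif far and dist_sq < D2 ** 2:
            return True                # NEAR = TRUE: trigger the double-click
    return False                       # window elapsed: reset FAR/NEAR
```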
The embodiment of the present invention provides a non-contact human-computer interaction method for controlling on-screen content with a detectable target object, such as a fingertip, in a mouse-free environment. This new non-contact method requires no calibration; instead, based on deep learning, it obtains two-dimensional plane information and three-dimensional depth information through a depth camera, calculates the coordinates of the projection of the target object on the display screen, and triggers mouse events, realizing non-contact human-computer interaction with a small amount of calculation and simple hardware.
Referring to Fig. 8, an embodiment of the present invention also provides a new non-contact human-computer interaction system that includes a computer, a depth camera, and a display screen, the computer being connected to the depth camera and to the display screen. The system performs human-computer interaction with the method of the above embodiment: the depth camera captures the position information of the vertices of the display screen and of the target point, and the computer calculates from this the position of the projection of the target point on the display screen and calls the relevant system mouse interface to trigger mouse events that operate the display screen.
An embodiment of the present invention also provides a computer-readable storage medium storing a program for electronic data exchange, the program executing the new non-contact human-computer interaction method of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the principles of the present invention shall fall within the protection scope of the present invention.
Claims (8)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011395956.8A CN112363629B (en) | 2020-12-03 | 2020-12-03 | A new non-contact human-computer interaction method and system |
PCT/CN2020/137285 WO2022116281A1 (en) | 2020-12-03 | 2020-12-17 | New non-contact human-computer interaction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011395956.8A CN112363629B (en) | 2020-12-03 | 2020-12-03 | A new non-contact human-computer interaction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112363629A CN112363629A (en) | 2021-02-12 |
CN112363629B true CN112363629B (en) | 2021-05-28 |
Family
ID=74536668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011395956.8A Expired - Fee Related CN112363629B (en) | 2020-12-03 | 2020-12-03 | A new non-contact human-computer interaction method and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112363629B (en) |
WO (1) | WO2022116281A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113095243B (en) * | 2021-04-16 | 2022-02-15 | 推想医疗科技股份有限公司 | Mouse control method and device, computer equipment and medium |
CN115885238A (en) * | 2021-07-26 | 2023-03-31 | 广州视源电子科技股份有限公司 | Implementation method and system of fingertip mouse |
CN113807191B (en) * | 2021-08-23 | 2024-06-14 | 南京航空航天大学 | Non-invasive visual test script automatic recording method |
CN114647361B (en) * | 2022-03-02 | 2025-04-01 | 北京当红齐天国际文化科技发展集团有限公司 | A touch screen object positioning method and device based on artificial intelligence |
CN115617178B (en) * | 2022-11-08 | 2023-04-25 | 润芯微科技(江苏)有限公司 | Method for completing key and function triggering by no contact between finger and vehicle |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102354345A (en) * | 2011-10-21 | 2012-02-15 | 北京理工大学 | Medical image browse device with somatosensory interaction mode |
CN102749991A (en) * | 2012-04-12 | 2012-10-24 | 广东百泰科技有限公司 | Non-contact free space eye-gaze tracking method suitable for man-machine interaction |
CN103345301A (en) * | 2013-06-18 | 2013-10-09 | 华为技术有限公司 | Depth information acquisition method and device |
CN103914152A (en) * | 2014-04-11 | 2014-07-09 | 周光磊 | Recognition method and system for multi-point touch and gesture movement capturing in three-dimensional space |
CN109683699A (en) * | 2019-01-07 | 2019-04-26 | 深圳增强现实技术有限公司 | The method, device and mobile terminal of augmented reality are realized based on deep learning |
CN110782532A (en) * | 2019-10-23 | 2020-02-11 | 北京达佳互联信息技术有限公司 | Image generation method, image generation device, electronic device, and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011081480A (en) * | 2009-10-05 | 2011-04-21 | Seiko Epson Corp | Image input system |
US20120242806A1 (en) * | 2011-03-23 | 2012-09-27 | Tk Holdings Inc. | Dynamic stereo camera calibration system and method |
CN102968222A (en) * | 2012-11-07 | 2013-03-13 | 电子科技大学 | Multi-point touch equipment based on depth camera |
CN103207709A (en) * | 2013-04-07 | 2013-07-17 | 布法罗机器人科技(苏州)有限公司 | Multi-touch system and method |
CN103793060B (en) * | 2014-02-14 | 2017-07-28 | 杨智 | A kind of user interactive system and method |
AU2019308228B2 (en) * | 2018-07-16 | 2021-06-03 | Accel Robotics Corporation | Autonomous store tracking system |
- 2020-12-03 CN CN202011395956.8A patent/CN112363629B/en not_active Expired - Fee Related
- 2020-12-17 WO PCT/CN2020/137285 patent/WO2022116281A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2022116281A1 (en) | 2022-06-09 |
CN112363629A (en) | 2021-02-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20210528 |