CN108829233B - Interaction method and device - Google Patents

Interaction method and device

Info

Publication number
CN108829233B
CN108829233B (application number CN201810387822.8A)
Authority
CN
China
Prior art keywords
target person
key point
image
human skeleton
skeleton key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810387822.8A
Other languages
Chinese (zh)
Other versions
CN108829233A (en)
Inventor
陈圆
黄亮
彭中兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN TONGWEI COMMUNICATION TECHNOLOGY Co.,Ltd.
Original Assignee
Shenzhen Tongwei Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tongwei Communication Technology Co ltd filed Critical Shenzhen Tongwei Communication Technology Co ltd
Priority to CN201810387822.8A
Publication of CN108829233A
Application granted
Publication of CN108829233B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of human-computer interaction and discloses an interaction method and device. The method includes the following steps: identifying the human skeleton key point coordinate set of all people in an image captured by a camera; acquiring the coordinate data of a target person from the human skeleton key point coordinate set; and tracking the target person and interacting according to the target person's real-time coordinate data. The deep-learning-based convolutional neural network algorithm improves the speed and accuracy of human-computer interaction and automatically tracks the target person; only an ordinary camera is needed to capture images, so the cost is low and compatibility is high.

Description

Interaction method and device
Technical Field
The invention relates to the technical field of human-computer interaction, in particular to an interaction method and device.
Background
A motion sensing game is a novel type of game in which the player interacts with a smart device through body movements. Compared with traditional games that rely on keys or touch for interaction, motion sensing games increase player engagement and are increasingly accepted by players and the market.
With the rapid development of artificial intelligence, new technologies and applications emerge constantly. Deep learning, a popular research direction in artificial intelligence, enables machines to mimic the learning mechanisms of the human brain when processing data such as images and sound; for images in particular, algorithms based on deep learning clearly outperform traditional image processing algorithms.
The core technology of a motion sensing game is how the computer acquires the player's body motion information. There are currently two main implementations. One is Microsoft's Kinect camera, which captures the player's body motion information directly; it recognizes motion well, but the hardware is expensive and the equipment configuration is complex. The other captures images of the player with an ordinary camera and then extracts the player's motion information with a deep-learning body motion recognition algorithm; the difference between the two is that an ordinary camera can only capture two-dimensional images, whereas a Kinect camera captures three-dimensional images. This second approach has low hardware cost, but the player's experience is poor and easily disturbed by the environment: for example, after the player's body is briefly occluded, the motion sensing game cannot re-identify the player's body motion information when the player reappears in the image.
Disclosure of Invention
The invention mainly aims to provide an interaction method and an interaction device that improve the speed and accuracy of human-computer interaction and automatically track a target person through a deep-learning-based convolutional neural network algorithm, while requiring only an ordinary camera to capture images, so that the cost is low and compatibility is high.
In order to achieve the above object, an interaction method provided by the present invention includes:
identifying a human skeleton key point coordinate set of all people in the image collected by the camera;
acquiring coordinate data of a target person from the human skeleton key point coordinate set;
and tracking the target person, and performing interaction according to the real-time coordinate data of the target person.
Optionally, identifying the human skeleton key point coordinate set of all people in the image captured by the camera includes:
acquiring an image acquired by a camera;
and identifying a human skeleton key point coordinate set of all people in the image by using a human skeleton key point identification algorithm.
Optionally, tracking the target person and interacting according to the real-time coordinate data of the target person includes:
acquiring an image of the rectangular area where the target person is located, and determining an initial tracking area;
estimating, through a target tracking algorithm, the rectangular area of the target person in the image captured by the camera at a given moment;
calculating the real-time coordinate data of the target person through the human skeleton key point identification algorithm;
calculating the intersection-over-union (IoU) of the rectangular area corresponding to the real-time coordinate data and the estimated rectangular area;
judging whether the IoU is greater than a preset threshold, and if so, using the real-time coordinate data as control data for interaction between the target person and the system;
otherwise, recording the action of the target person at that moment as an abnormal action, and accumulating the abnormal count.
Optionally, after recording the action of the target person at that moment as an abnormal action and accumulating the abnormal count, the method further includes:
judging whether the abnormal count is greater than a preset count threshold;
if so, acquiring the human skeleton key point coordinate set of all people in the image currently captured by the camera, and calculating the similarity between each person's rectangular area and the initial tracking area;
and taking the person with the highest similarity as the target person.
Optionally, the target person is a person who completes a preset action in the image.
As another aspect of the present invention, there is provided an interactive apparatus, including:
the identification module is used for identifying a human skeleton key point coordinate set of all people in the image acquired by the camera;
the interaction module is used for acquiring coordinate data of a target person from the human skeleton key point coordinate set;
and the tracking module is used for tracking the target person and carrying out interaction according to the real-time coordinate data of the target person.
Optionally, the identification module comprises:
the image acquisition unit is used for acquiring images acquired by the camera;
and the coordinate calculation unit is used for identifying the human skeleton key point coordinate set of all people in the image by using a human skeleton key point identification algorithm.
Optionally, the tracking module comprises:
the initial unit is used for acquiring an image of a rectangular area where a target person is located and determining an initial tracking area;
the estimation unit is used for estimating a rectangular area of the target person in the image acquired by the camera at a certain moment through a target tracking algorithm;
the real-time coordinate calculation unit is used for calculating real-time coordinate data of the target person through the human skeleton key point identification algorithm;
the IoU calculation unit, used for calculating the intersection-over-union (IoU) of the rectangular area corresponding to the real-time coordinate data and the estimated rectangular area;
the first judgment unit, used for judging whether the IoU is greater than a preset threshold, and if so, using the real-time coordinate data as control data for interaction between the target person and the system; otherwise, recording the action of the target person at that moment as an abnormal action and accumulating the abnormal count.
Optionally, the tracking module further comprises:
the second judgment unit, used for judging whether the abnormal count is greater than a preset count threshold;
the similarity calculation unit, used for, when the abnormal count is greater than the preset count threshold, acquiring the human skeleton key point coordinate set of all people in the image currently captured by the camera and calculating the similarity between each person's rectangular area and the initial tracking area;
and the target re-acquisition unit, used for taking the person with the highest similarity as the target person.
Optionally, the target person is a person who completes a preset action in the image.
The invention provides an interaction method and an interaction device. The method includes: identifying the human skeleton key point coordinate set of all people in an image captured by a camera; acquiring the coordinate data of a target person from the human skeleton key point coordinate set; and tracking the target person and interacting according to the target person's real-time coordinate data. The deep-learning-based convolutional neural network algorithm improves the speed and accuracy of human-computer interaction and automatically tracks the target person; only an ordinary camera is needed to capture images, so the cost is low and compatibility is high.
Drawings
Fig. 1 is a flowchart of an interaction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of the method of step S10 in FIG. 1;
FIG. 3 is a flowchart of a method of step S30 of FIG. 1;
FIG. 4 is a flowchart of another method of step S30 of FIG. 1;
fig. 5 is a block diagram illustrating an exemplary structure of an interactive apparatus according to a second embodiment of the present invention;
FIG. 6 is a block diagram of an exemplary structure of the identification module of FIG. 5;
FIG. 7 is a block diagram of an exemplary structure of the tracking module of FIG. 5;
fig. 8 is a block diagram of another exemplary structure of the tracking module of fig. 5.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
Example one
As shown in fig. 1, in this embodiment, an interaction method includes:
s10, recognizing a human skeleton key point coordinate set of all people in the image collected by the camera;
s20, acquiring coordinate data of a target person from the human skeleton key point coordinate set;
and S30, tracking the target person, and performing interaction according to the real-time coordinate data of the target person.
In this embodiment, a deep-learning-based convolutional neural network algorithm improves the speed and accuracy of human-computer interaction and automatically tracks the target person; only an ordinary camera is needed to capture images, so the cost is low and compatibility is high.
In this embodiment, the human skeleton key points generally refer to key points such as the eyes, nose, neck, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, left knee, left ankle, right hip, right knee, and right ankle. The coordinate data of these key points is relative to the captured image: for an image of size H × W, take the top-left vertex of the image as the origin (0,0), the direction from the origin toward the top-right vertex as the X axis, and the direction from the origin toward the bottom-left vertex as the Y axis; a key point's coordinate is then (x, y), where 0 ≤ x ≤ W and 0 ≤ y ≤ H. Zero or more people may appear in an image, and the human skeleton key point coordinate data set collects the coordinate data of those zero or more people into one set.
In this embodiment, the target person is a person who completes a preset action in the image.
When the motion sensing game starts, the player is prompted to perform a preset designated action to start the game. Designated actions such as raising a hand or standing have distinguishable characteristics when mapped to human skeleton key point coordinates, so the target player's action can be selected from the human skeleton key point coordinate data set, as illustrated by the sketch below.
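As a concrete illustration, the following minimal Python sketch selects the target player from the key point coordinate data set by testing for a raised hand. The joint names ("nose", "left_wrist", "right_wrist") and the wrist-above-nose rule are assumptions made for illustration, not details fixed by this embodiment.

```python
# Minimal sketch (illustrative, not part of the embodiment): pick the
# target player as the first person whose wrist key point lies above the
# nose key point. Keypoints are assumed to be dicts of joint -> (x, y).

def is_hand_raised(keypoints):
    """Return True if either wrist is above the nose in image coordinates."""
    nose = keypoints.get("nose")
    if nose is None:
        return False
    for joint in ("left_wrist", "right_wrist"):
        wrist = keypoints.get(joint)
        if wrist is not None and wrist[1] < nose[1]:  # Y grows downward
            return True
    return False

def select_target(coordinate_set):
    """coordinate_set: list of per-person keypoint dicts {P1, ..., Pn}."""
    for person in coordinate_set:
        if is_hand_raised(person):
            return person  # Pk, the target player's coordinate data
    return None  # nobody has performed the designated action yet
```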
As shown in fig. 2, in the present embodiment, the step S10 includes:
s11, acquiring an image acquired by the camera;
and S12, identifying the human skeleton key point coordinate set of all people in the image by using a human skeleton key point identification algorithm.
In the present embodiment, the human skeleton key point identification algorithm includes the deep-learning-based OpenPose algorithm.
In this embodiment, an ordinary camera is used as the image acquisition device, so the hardware cost is low, compatibility is high, and the system is easy to popularize. The ordinary camera captures an image Ft, and the human skeleton key point identification algorithm identifies the human skeleton key point coordinate data set {P1, P2, …, Pn} of all people in Ft. Ft denotes the image captured at the current moment: the camera's frame rate determines the time unit, and its resolution determines the size of Ft. Pn denotes the human skeleton key point coordinate data of the nth person. The human skeleton key point coordinate data Pk of the target player, who performed the designated action, is selected from {P1, P2, …, Pn}; designated actions include raising a hand and standing, but are not limited to specific actions.
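The per-frame acquisition step can be sketched as follows, assuming an ordinary webcam read through OpenCV. The function identify_keypoints is a hypothetical placeholder for the skeleton key point detector (e.g. an OpenPose-style network), not a real library call.

```python
import cv2

def identify_keypoints(image):
    """Hypothetical placeholder for the human skeleton key point
    identification algorithm (e.g. an OpenPose-style network). A real
    implementation returns the per-person keypoint dicts {P1, ..., Pn}."""
    return []  # stub so the sketch runs end to end

cap = cv2.VideoCapture(0)    # an ordinary camera is sufficient
ok, frame_t = cap.read()     # image Ft at the current moment
if ok:
    people = identify_keypoints(frame_t)   # coordinate set {P1, P2, ..., Pn}
cap.release()
```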
As shown in fig. 3, in the present embodiment, the step S30 includes:
s31, acquiring an image of a rectangular area where the target person is located, and determining an initial tracking area;
According to the human skeleton key point coordinate data Pk, the rectangular area O where target player k is located is cropped from image Ft and saved, and the image of this rectangular area is used as the initialization parameter of the target tracking algorithm to initialize the tracker.
S32, estimating a rectangular area of the target person in the image collected by the camera at a certain moment through a target tracking algorithm;
In the present embodiment, the target tracking algorithm includes the deep-learning-based GOTURN algorithm.
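A sketch of steps S31 and S32 under the assumptions above: region O is taken as the axis-aligned bounding box of the target's key points (an illustrative choice), and the tracker is OpenCV's GOTURN implementation, which requires the opencv-contrib build together with the goturn.prototxt and goturn.caffemodel files.

```python
import cv2
import numpy as np

def bounding_box(keypoints):
    """Axis-aligned rectangle (x, y, w, h) around one person's key points,
    where keypoints maps joint name -> (x, y) image coordinates."""
    pts = np.array(list(keypoints.values()), dtype=np.float32)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    return (int(x_min), int(y_min), int(x_max - x_min), int(y_max - y_min))

def init_tracking(frame_t, target_keypoints):
    """S31: crop region O around the target player (coordinate data Pk)
    and initialize the target tracking algorithm on it."""
    region_o = bounding_box(target_keypoints)
    tracker = cv2.TrackerGOTURN_create()  # needs opencv-contrib + GOTURN model files
    tracker.init(frame_t, region_o)
    x, y, w, h = region_o
    region_o_image = frame_t[y:y + h, x:x + w].copy()  # saved for re-acquisition
    return tracker, region_o_image

def estimate_region(tracker, frame_t1):
    """S32: estimate rectangle M of the target player in image Ft+1."""
    ok, rect_m = tracker.update(frame_t1)
    return rect_m if ok else None
```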
S33, calculating real-time coordinate data of the target person through the human skeleton key point identification algorithm;
s34, calculating the intersection ratio of the rectangular area corresponding to the real-time coordinate data and the rectangular area obtained by estimation;
the camera acquires an image Ft +1, a target tracking algorithm estimates a rectangle M of a region where a target player is located in the image Ft +1, a human bone key point identification algorithm identifies a human bone key point coordinate data set { P1, P2, …, Pn } of all people in the image Ft +1, a region { N1, N2, …, Nn } where a human body corresponding to { P1, P2, …, Pn } is located is calculated, then { IoU1, IoU2, …, IoUn } is calculated, and finally IoUk is taken as max { IoU1, IoU2, …, IoUn }, wherein IoU (Intersection over Unit) is defined as:
IoU(M, N) = Area(M ∩ N) / Area(M ∪ N)
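In code, the IoU of two axis-aligned rectangles in (x, y, w, h) form follows directly from this definition:

```python
def iou(box_m, box_n):
    """Intersection over Union of two (x, y, w, h) rectangles."""
    mx, my, mw, mh = box_m
    nx, ny, nw, nh = box_n
    ix = max(mx, nx)
    iy = max(my, ny)
    iw = max(0, min(mx + mw, nx + nw) - ix)
    ih = max(0, min(my + mh, ny + nh) - iy)
    intersection = iw * ih
    union = mw * mh + nw * nh - intersection
    return intersection / union if union > 0 else 0.0
```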
S35, judging whether the IoU is greater than a preset threshold; if so, S36, using the real-time coordinate data as control data for interaction between the target person and the system;
otherwise, S37, recording the action of the target person at that moment as an abnormal action, and accumulating the abnormal count.
If IoUk ≥ T1, where T1 is a preset threshold, Pk is output as the control data for the actual interaction between the target player and the motion sensing game; if IoUk < T1, the occurrence is recorded and the abnormal count C of IoUk < T1 is accumulated.
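Steps S34 to S37 can then be sketched as one per-frame decision. T1 = 0.5 is an illustrative value only, the embodiment leaves the threshold to be preset, and iou() is the helper from the previous sketch.

```python
def match_target(tracked_box, person_boxes, coordinate_sets, t1=0.5):
    """Return Pk as control data when IoUk = max{IoU1..IoUn} clears the
    preset threshold T1, else None (an abnormal action for this frame).
    person_boxes are the regions {N1..Nn}; iou() is defined above."""
    if not person_boxes:
        return None
    ious = [iou(tracked_box, box) for box in person_boxes]
    k = max(range(len(ious)), key=ious.__getitem__)
    return coordinate_sets[k] if ious[k] >= t1 else None

# Caller side: accumulate the abnormal count C when no match is returned.
# control = match_target(rect_m, regions_n, keypoint_sets)
# if control is None:
#     abnormal_count += 1
```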
As shown in fig. 4, in this embodiment, after step S37, the method further includes:
S38, judging whether the abnormal count is greater than a preset count threshold;
if so, S39, acquiring the human skeleton key point coordinate set of all people in the image currently captured by the camera, and calculating the similarity between each person's rectangular area and the initial tracking area; otherwise, going to step S310 and continuing to accumulate the abnormal count;
and S311, taking the person with the highest similarity as the target person.
The camera captures the current image Ft′ again. The human skeleton key point identification algorithm identifies the human skeleton key point coordinate data set {P1, P2, …, Pn} of all people in Ft′, and the regions {N1, N2, …, Nn} where the corresponding human bodies are located are calculated. Then {S1, S2, …, Sn} is calculated, where Sm denotes the similarity between the image of region O and the image of region Nm, m ∈ {1, 2, …, n}. {S1, S2, …, Sn} is traversed and an Sk > T2 is selected, where T2 is a preset threshold; the image of region Nk is then used as the initialization parameter of the target tracking algorithm.
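The embodiment does not fix a particular similarity measure for Sm. The sketch below uses HSV colour-histogram correlation from OpenCV as one plausible choice, with T2 = 0.7 purely as an illustrative threshold; the index returned identifies region Nk, whose image then re-initializes the tracker as described above.

```python
import cv2

def region_similarity(image_o, image_n):
    """Similarity Sm between the saved region O image and a candidate
    region Nm image, via HSV histogram correlation (an assumed measure)."""
    hists = []
    for img in (image_o, image_n):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        hists.append(hist)
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)

def reacquire_target(frame, region_o_image, person_boxes, t2=0.7):
    """Among the regions {N1..Nn}, pick the one most similar to region O,
    provided Sk exceeds the preset threshold T2 (0.7 is illustrative)."""
    best_k, best_s = None, t2
    for k, (x, y, w, h) in enumerate(person_boxes):
        s = region_similarity(region_o_image, frame[y:y + h, x:x + w])
        if s > best_s:
            best_k, best_s = k, s
    return best_k  # index of the new target person, or None if none qualify
```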
The method of this embodiment automatically tracks the target player and effectively solves the problem of the tracked target being lost after the player's body is briefly occluded during the game: the target player is quickly re-tracked, which greatly improves the player's sense of participation in the motion sensing game and gives the player a better game experience.
Example two
As shown in fig. 5, in the present embodiment, an interaction apparatus includes:
the identification module 10 is used for identifying a human skeleton key point coordinate set of all people in the image acquired by the camera;
the interaction module 20 is used for acquiring coordinate data of a target person from the human skeleton key point coordinate set;
and the tracking module 30 is used for tracking the target person and performing interaction according to the real-time coordinate data of the target person.
In this embodiment, a deep-learning-based convolutional neural network algorithm improves the speed and accuracy of human-computer interaction and automatically tracks the target person; only an ordinary camera is needed to capture images, so the cost is low and compatibility is high.
In this embodiment, the human skeleton key points generally refer to key points such as the eyes, nose, neck, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, left knee, left ankle, right hip, right knee, and right ankle. The coordinate data of these key points is relative to the captured image: for an image of size H × W, take the top-left vertex of the image as the origin (0,0), the direction from the origin toward the top-right vertex as the X axis, and the direction from the origin toward the bottom-left vertex as the Y axis; a key point's coordinate is then (x, y), where 0 ≤ x ≤ W and 0 ≤ y ≤ H. Zero or more people may appear in an image, and the human skeleton key point coordinate data set collects the coordinate data of those zero or more people into one set.
In this embodiment, the target person is a person who completes a preset action in the image.
When the motion sensing game starts, the player is prompted to perform a preset designated action to start the game. Designated actions such as raising a hand or standing have distinguishable characteristics when mapped to human skeleton key point coordinates, so the target player's action can be selected from the human skeleton key point coordinate data set.
As shown in fig. 6, in this embodiment, the identification module includes:
the image acquisition unit 11 is used for acquiring images acquired by the camera;
and the coordinate calculation unit 12 is used for identifying the human skeleton key point coordinate set of all people in the image by using a human skeleton key point identification algorithm.
In the present embodiment, the human skeleton key point identification algorithm includes the deep-learning-based OpenPose algorithm.
In this embodiment, an ordinary camera is used as the image acquisition device, so the hardware cost is low, compatibility is high, and the system is easy to popularize. The ordinary camera captures an image Ft, and the human skeleton key point identification algorithm identifies the human skeleton key point coordinate data set {P1, P2, …, Pn} of all people in Ft. Ft denotes the image captured at the current moment: the camera's frame rate determines the time unit, and its resolution determines the size of Ft. Pn denotes the human skeleton key point coordinate data of the nth person. The human skeleton key point coordinate data Pk of the target player, who performed the designated action, is selected from {P1, P2, …, Pn}; designated actions include raising a hand and standing, but are not limited to specific actions.
As shown in fig. 7, in this embodiment, the tracking module includes:
an initial unit 31, configured to acquire an image of a rectangular area where a target person is located, and determine an initial tracking area;
According to the human skeleton key point coordinate data Pk, the rectangular area O where target player k is located is cropped from image Ft and saved, and the image of this rectangular area is used as the initialization parameter of the target tracking algorithm to initialize the tracker.
The estimation unit 32 is used for estimating a rectangular area of the target person in the image acquired by the camera at a certain moment through a target tracking algorithm;
In the present embodiment, the target tracking algorithm includes the deep-learning-based GOTURN algorithm.
The real-time coordinate calculation unit 33 is used for calculating real-time coordinate data of the target person through the human skeleton key point identification algorithm;
an IoU calculation unit 34, configured to calculate the intersection-over-union (IoU) between the rectangular region corresponding to the real-time coordinate data and the estimated rectangular region;
the camera acquires an image Ft +1, a target tracking algorithm estimates a rectangle M of a region where a target player is located in the image Ft +1, a human bone key point identification algorithm identifies a human bone key point coordinate data set { P1, P2, …, Pn } of all people in the image Ft +1, a region { N1, N2, …, Nn } where a human body corresponding to { P1, P2, …, Pn } is located is calculated, then { IoU1, IoU2, …, IoUn } is calculated, and finally IoUk is taken as max { IoU1, IoU2, …, IoUn }, wherein IoU (Intersection over Unit) is defined as:
IoU(M, N) = Area(M ∩ N) / Area(M ∪ N)
the first judging unit 35 is configured to judge whether the intersection ratio is greater than a preset threshold, and if so, use the real-time coordinate data as control data for interaction between the target person and the system; otherwise, recording the action of the target person at a certain moment as abnormal action, and accumulating abnormal times.
If IoUk ≥ T1, where T1 is a preset threshold, Pk is output as the control data for the actual interaction between the target player and the motion sensing game; if IoUk < T1, the occurrence is recorded and the abnormal count C of IoUk < T1 is accumulated.
As shown in fig. 8, in this embodiment, the tracking module further includes:
a second judgment unit 36, configured to judge whether the abnormal count is greater than a preset count threshold;
a similarity calculation unit 37, configured to, when the abnormal count is greater than the preset count threshold, acquire the human skeleton key point coordinate set of all people in the image currently captured by the camera and calculate the similarity between each person's rectangular area and the initial tracking area;
and a target re-acquisition unit 38, configured to take the person with the highest similarity as the target person.
The camera captures the current image Ft′ again. The human skeleton key point identification algorithm identifies the human skeleton key point coordinate data set {P1, P2, …, Pn} of all people in Ft′, and the regions {N1, N2, …, Nn} where the corresponding human bodies are located are calculated. Then {S1, S2, …, Sn} is calculated, where Sm denotes the similarity between the image of region O and the image of region Nm, m ∈ {1, 2, …, n}. {S1, S2, …, Sn} is traversed and an Sk > T2 is selected, where T2 is a preset threshold; the image of region Nk is then used as the initialization parameter of the target tracking algorithm.
The device of this embodiment automatically tracks the target player and effectively solves the problem of the tracked target being lost after the player's body is briefly occluded during the game: the target player is quickly re-tracked, which greatly improves the player's sense of participation in the motion sensing game and gives the player a better game experience.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a/an" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the invention. Any equivalent structural or process transformation made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims (8)

1. An interaction method, comprising:
identifying a human skeleton key point coordinate set of all people in the image collected by the camera;
acquiring coordinate data of a target person from the human skeleton key point coordinate set;
tracking the target person, and performing interaction according to the real-time coordinate data of the target person;
the step of tracking the target person and interacting according to the real-time coordinate data of the target person comprises the following steps:
acquiring an image of a rectangular area where a target person is located, and determining an initial tracking area;
estimating a rectangular area of the target person in the image collected by the camera at a certain moment through a target tracking algorithm;
calculating the real-time coordinate data of the target person through a human skeleton key point identification algorithm;
calculating the intersection-over-union (IoU) of the rectangular area corresponding to the real-time coordinate data and the estimated rectangular area;
judging whether the IoU is greater than a preset threshold, and if so, using the real-time coordinate data as control data for interaction between the target person and the system;
otherwise, recording the action of the target person at that moment as an abnormal action, and accumulating the abnormal count.
2. The interaction method according to claim 1, wherein the identifying the set of human skeleton key point coordinates of all people in the image captured by the camera comprises:
acquiring an image acquired by a camera;
and identifying a human skeleton key point coordinate set of all people in the image by using a human skeleton key point identification algorithm.
3. The interaction method according to claim 1, wherein after recording the action of the target person at that moment as an abnormal action and accumulating the abnormal count, the method further comprises:
judging whether the abnormal count is greater than a preset count threshold;
if so, acquiring the human skeleton key point coordinate set of all people in the image currently captured by the camera, and calculating the similarity between each person's rectangular area and the initial tracking area;
and taking the person with the highest similarity as the target person.
4. The interaction method of claim 1, wherein the target person is a person in the image who has completed a preset action.
5. An interactive apparatus, comprising:
the identification module is used for identifying a human skeleton key point coordinate set of all people in the image acquired by the camera;
the interaction module is used for acquiring coordinate data of a target person from the human skeleton key point coordinate set;
the tracking module is used for tracking the target person and carrying out interaction according to the real-time coordinate data of the target person;
wherein the tracking module comprises:
the initial unit is used for acquiring an image of a rectangular area where a target person is located and determining an initial tracking area;
the estimation unit is used for estimating a rectangular area of the target person in the image acquired by the camera at a certain moment through a target tracking algorithm;
the real-time coordinate calculation unit is used for calculating real-time coordinate data of the target person through the human skeleton key point identification algorithm;
the IoU calculation unit, used for calculating the intersection-over-union (IoU) of the rectangular area corresponding to the real-time coordinate data and the estimated rectangular area;
the first judgment unit, used for judging whether the IoU is greater than a preset threshold, and if so, using the real-time coordinate data as control data for interaction between the target person and the system; otherwise, recording the action of the target person at that moment as an abnormal action and accumulating the abnormal count.
6. The interaction device of claim 5, wherein the identification module comprises:
the image acquisition unit is used for acquiring images acquired by the camera;
and the coordinate calculation unit is used for identifying the human skeleton key point coordinate set of all people in the image by using a human skeleton key point identification algorithm.
7. The interaction device of claim 5, wherein the tracking module further comprises:
the second judgment unit, used for judging whether the abnormal count is greater than a preset count threshold;
the similarity calculation unit, used for, when the abnormal count is greater than the preset count threshold, acquiring the human skeleton key point coordinate set of all people in the image currently captured by the camera and calculating the similarity between each person's rectangular area and the initial tracking area;
and the target re-acquisition unit, used for taking the person with the highest similarity as the target person.
8. The interactive device as claimed in claim 5, wherein the target person is a person in the image who completes a preset action.
CN201810387822.8A 2018-04-26 2018-04-26 Interaction method and device Active CN108829233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810387822.8A CN108829233B (en) 2018-04-26 2018-04-26 Interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810387822.8A CN108829233B (en) 2018-04-26 2018-04-26 Interaction method and device

Publications (2)

Publication Number Publication Date
CN108829233A CN108829233A (en) 2018-11-16
CN108829233B true CN108829233B (en) 2021-06-15

Family

ID=64154163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810387822.8A Active CN108829233B (en) 2018-04-26 2018-04-26 Interaction method and device

Country Status (1)

Country Link
CN (1) CN108829233B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502986A (en) * 2019-07-12 2019-11-26 平安科技(深圳)有限公司 Identify character positions method, apparatus, computer equipment and storage medium in image
CN110555404A (en) * 2019-08-29 2019-12-10 西北工业大学 Flying wing unmanned aerial vehicle ground station interaction device and method based on human body posture recognition
CN113450534A (en) * 2020-03-27 2021-09-28 海信集团有限公司 Device and method for detecting approach of children to dangerous goods
CN113362324B (en) * 2021-07-21 2023-02-24 上海脊合医疗科技有限公司 Bone health detection method and system based on video image
CN115955603B (en) * 2022-12-06 2024-05-03 广州紫为云科技有限公司 Intelligent camera device based on intelligent screen somatosensory interaction and implementation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129293A (en) * 2010-01-15 2011-07-20 微软公司 Tracking groups of users in motion capture system
CN103559491A (en) * 2013-10-11 2014-02-05 北京邮电大学 Human body motion capture and posture analysis system
CN104978029A (en) * 2015-06-30 2015-10-14 北京嘿哈科技有限公司 Screen manipulation method and apparatus
CN105469113A (en) * 2015-11-19 2016-04-06 广州新节奏智能科技有限公司 Human body bone point tracking method and system in two-dimensional video stream

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012181688A (en) * 2011-03-01 2012-09-20 Sony Corp Information processing device, information processing method, information processing system, and program


Also Published As

Publication number Publication date
CN108829233A (en) 2018-11-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200629

Address after: Building 1, No.2, Danzi North Road, Kengzi street, Pingshan District, Shenzhen City, Guangdong Province

Applicant after: SHENZHEN TONGWEI COMMUNICATION TECHNOLOGY Co.,Ltd.

Address before: 518000 A 305-307, Nanshan medical instrument Park, 1019 Nanhai Road, Nanshan District merchants street, Shenzhen, Guangdong.

Applicant before: SHENZHEN DEEPCONV TECHNOLOGIES Co.,Ltd.

GR01 Patent grant