CN112651325A - Interaction method and device of performer and virtual object and computer equipment - Google Patents
- Publication number: CN112651325A (application CN202011526981.5A)
- Authority: CN (China)
- Prior art keywords: data, performer, angle, list structure, position data
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/23: Recognition of whole body movements, e.g. for sport training (G: Physics; G06: Computing, calculating or counting; G06V: Image or video recognition or understanding; G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data; G06V40/20: Movements or behaviour, e.g. gesture recognition)
- G06T19/006: Mixed reality (G: Physics; G06: Computing, calculating or counting; G06T: Image data processing or generation, in general; G06T19/00: Manipulating 3D models or images for computer graphics)
Abstract
The invention provides an interaction method and apparatus for a performer and a virtual object, and computer equipment. The method comprises: acquiring depth video data of a performer through real-sense technology; identifying each frame of image of the depth video data to obtain position data of the performer's skeletal joints; storing the position data sequentially into a List structure; traversing the List structure and calculating the motion data of the performer's skeletal joints; constructing a virtual human body model; constructing a correspondence between the skeletal joints of the virtual human body model and the skeletal joints of the performer; and mapping the motion data onto the skeletal joints of the virtual human body model based on the correspondence, so that the virtual human body model reads the performer's actions in real time and generates corresponding behaviors, thereby realizing interaction between the performer and the virtual human body model.
Description
Technical Field
The invention relates to the technical field of human-computer interaction, and in particular to an interaction method and apparatus for a performer and a virtual object, and to computer equipment.
Background
The rapid development of modern information technology has brought great changes to the way human beings produce and live. With the continuing development and breakthroughs of computer technology and the emergence of new devices, people's requirements for traditional media keep rising.
Owing to the rapid development of information technology and the large-scale adoption of new devices, people have also begun to expect new human-computer interaction modes in traditional stage performance. From the advent of the computer to the present day, the mouse, keyboard and display have remained the most common means of human-computer interaction. These interaction modes have various limitations: a handheld controller, for example, is constrained by its limited number of buttons, and VR equipment is heavy and expensive, so people cannot interact with a stage scene as naturally as they would like when using a computer, and higher-level demands cannot be met. Therefore, how to combine the performer with virtual objects and scenes to improve the stage effect has attracted more and more attention.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device and a storage medium for interaction between a performer and a virtual object.
In one embodiment, a method of interaction of a performer with a virtual object includes:
acquiring depth video data of a performer through real-sense technology;
identifying each frame of image of the depth video data to obtain position data of the performer's skeletal joints;
storing the position data into a List structure in sequence;
traversing the List structure body, and calculating to obtain the motion data of the skeletal joints of the performer;
constructing a virtual human body model;
constructing a corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
mapping the motion data into skeletal joints of the virtual human body model based on the correspondence.
In one embodiment, the step of traversing the List structure to calculate motion data of skeletal joints of the performer comprises:
traversing the List structure, and calculating the movement direction and the movement distance of the bone joint according to the variation of the position data identified by two adjacent frames of images;
acquiring the time interval of two adjacent frames of images;
and calculating the movement speed of the bone joint according to the time interval and the movement distance.
In one embodiment, the step of traversing the List structure to calculate motion data of skeletal joints of the performer comprises:
removing the position data which do not meet preset conditions in the List structure body to obtain the removed List structure body;
and traversing the removed List structure body, and calculating to obtain the motion data.
In one embodiment, the location data comprises: vector data and angle data;
the step of removing the position data which do not meet the preset condition from the List structure body to obtain a removed List structure body includes:
removing the position data of which the initial angle does not meet a preset angle range in the List structure;
and removing the position data of which the difference value between the angle data of the current frame and the angle data of the previous frame in the List structure exceeds a preset threshold value to obtain the removed List structure.
In one embodiment, the position data comprises vector data and angle data;
the step of traversing the List structure and calculating motion data of skeletal joints of the performer comprises:
creating a window;
initializing the window to 0;
continuously adding the currently traversed position data into the window in the process of traversing the List structure;
recording angle data for a first of said bone joints as a starting angle value;
judging whether the initial angle value is within a preset angle range or not;
when the initial angle value exceeds the preset angle range, eliminating the position data of the current frame;
when the initial angle value is within the preset angle range, carrying out difference processing on the angle data of the current frame and the angle data of the previous frame to obtain an angle difference value;
judging whether the angle difference value meets the determined bone joint action recognition state characteristics or not;
when the angle difference value meets the determined bone joint motion recognition state characteristics, dividing the angle difference value by the size of the window, and calculating to obtain the motion speed of the bone joint;
when the angle difference does not meet the determined bone joint motion recognition state characteristics, the position data are removed;
identifying the variable quantity of the vector data according to the current frame image and the vector data identified by the previous frame image, and calculating to obtain the motion direction and the motion distance of the bone joint;
traversing the List structure; and obtaining the motion data.
In one embodiment, the method for the performer to interact with the virtual object further comprises:
acquiring three-primary-color (RGB) video data of the performer through real-sense technology;
aligning the three primary color video data with the same frame of image in the depth video data to obtain synthesized video data;
the step of identifying each frame of image of the depth video data to obtain position data of skeletal joints of the performer comprises:
and identifying each frame of image of the synthetic video data to obtain the position data of the bone joint.
In one embodiment, the position data is obtained using the following formulas:
x' = RawImage.x - Proj.x * RawImage.x
y' = RawImage.y - Proj.y * RawImage.y
where x' is the x-axis coordinate of the bone joint, y' is the y-axis coordinate of the bone joint, RawImage.x is the length of the image, RawImage.y is the width of the image, Proj.x is the recognized x-axis coordinate value of the bone joint, and Proj.y is the recognized y-axis coordinate value of the bone joint.
In one embodiment, an apparatus for interaction of a performer with a virtual object is provided, the apparatus comprising:
the video module is used for acquiring depth video data of a performer through real-sense technology;
the identification module is used for identifying each frame of image of the depth video data to obtain the position data of the skeletal joint of the performer;
the storage module is used for sequentially storing the position data into a List structure body;
the traversal module is used for traversing the List structure body and calculating motion data of skeletal joints of the performer;
the building module is used for building a virtual human body model;
the corresponding module is used for constructing the corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
and the mapping module is used for mapping the motion data into the bone joints of the virtual human body model based on the corresponding relation.
In one embodiment, a computer device comprises a memory storing a computer program and a processor implementing the steps of the method of any of the above embodiments when the processor executes the computer program.
In one of the embodiments, a computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.
According to the interaction method of the performer and the virtual object, depth video data of the performer are obtained through real-sense technology; the depth video data are identified to obtain the position data of the performer's skeletal joints; the identified position data are correspondingly stored in a List structure; the List structure is traversed to obtain the motion data of the skeletal joints; and the motion data are mapped into the virtual human body model, so that the virtual human body model reads the performer's actions in real time and generates corresponding behaviors, thereby realizing interaction between the performer and the virtual human body model.
Drawings
FIG. 1 is a schematic flow diagram of a method for an actor to interact with a virtual object in one embodiment;
FIG. 2 is a schematic flow chart of a method for interaction of an actor with a virtual object in another embodiment;
FIG. 3 is a schematic diagram illustrating the processing of bone joint data according to one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The application provides a performer and virtual object interaction method, which comprises the following steps:
acquiring depth video data of a performer through real-sense technology;
identifying each frame of image of the depth video data to obtain position data of the performer's skeletal joints;
storing the position data into a List structure in sequence;
traversing the List structure body, and calculating to obtain the motion data of the skeletal joints of the performer;
constructing a virtual human body model;
constructing a corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
mapping the motion data into skeletal joints of the virtual human body model based on the correspondence.
According to the interaction method of the performer and the virtual object, depth video data of the performer are obtained through real-sense technology; the depth video data are identified to obtain the position data of the performer's skeletal joints; the identified position data are correspondingly stored in a List structure; the List structure is traversed to obtain the motion data of the skeletal joints; and the motion data are mapped into the virtual human body model, so that the virtual human body model reads the performer's actions in real time and generates corresponding behaviors, thereby realizing interaction between the performer and the virtual human body model.
Referring to fig. 1, a method for interaction between a performer and a virtual object is provided, the method comprising:
S110, acquiring depth video data of a performer through real-sense technology;
Specifically, real-sense technology, namely RealSense, captures the performer with a depth-sensing camera to obtain the depth video data of the performer.
S120, identifying each frame of image of the depth video data to obtain position data of the performer's skeletal joints;
Specifically, depth images (depth maps) are widely used as a general representation of three-dimensional scene information. The gray value of each pixel in a depth image represents the distance between a point in the scene and the camera. Thus, by identifying each frame of image of the depth video data, the position data of the performer's skeletal joints can be obtained.
And S130, sequentially storing the position data into a List structure body.
Specifically, in order to obtain the performer's limb state over a period of time, the skeletal information obtained during that period needs to be stored. The List data structure is chosen as the storage container for the skeletal information. The benefits of using a List structure as the storage container are:
(1) It is simple and convenient. After the joint information is stored, it naturally needs to be iterated over. A List can be iterated directly with a for loop, and its data can also be accessed simply by index.
(2) It is easy to clean. In Unity3D, 60 frames run per second, i.e., one second of video yields 60 picture frames; if joint information is stored for every frame, the container length reaches 60 after 1 second and 3600 after 1 minute. Joint information therefore requires not only real-time storage but also real-time maintenance and cleaning. The List structure in C# has a built-in Remove function, with which simple data cleaning can be realized.
Specifically, the position data is sequentially stored in the List structure, that is, the position data identified by each frame of image in the depth video data is sequentially stored in the List structure.
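As an illustration only (not part of the claimed method), the following C# sketch shows one way such per-frame joint samples could be buffered in a List and cleaned in real time; the JointSample type and the 60-sample cap are assumptions introduced here for the 60 fps case discussed above.

using System.Collections.Generic;
using UnityEngine;

// Hypothetical per-frame sample of one recognized skeletal joint.
public struct JointSample
{
    public Vector3 Position;   // recognized joint position in the current frame
    public float Angle;        // arm angle against the x-axis, in degrees
    public float Timestamp;    // capture time of the frame
}

public class JointBuffer
{
    private const int MaxSamples = 60;                       // about one second at 60 frames per second
    private readonly List<JointSample> samples = new List<JointSample>();

    public IReadOnlyList<JointSample> Samples => samples;    // sequential (chronological) access by index

    public void Add(JointSample sample)
    {
        samples.Add(sample);                                 // store the position data of each frame in order
        if (samples.Count > MaxSamples)
            samples.RemoveAt(0);                             // real-time cleaning keeps the container bounded
    }

    public void Clear() => samples.Clear();                  // wholesale cleanup, e.g. after a recognition run
}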
And S140, traversing the List structure body, and calculating to obtain the motion data of the skeletal joints of the performer.
Specifically, the List structure is traversed, that is, the position data identified by each frame image is traversed, and since the position data is stored sequentially, that is, in time sequence, the position data of each frame image is processed in time sequence during the traversal, and the motion data of the skeletal joint of the performer, such as the motion direction and the motion speed, is calculated according to the position data.
S150, constructing a virtual human body model;
specifically, the virtual human body model can be downloaded from Asset Store of Unity3D and imported
S160, constructing a corresponding relation between the bone joints of the virtual human body model and the bone joints of the performer;
specifically, the joints of the virtual human body model need to correspond to the skeletal joints of the performer, so that the virtual human body model can transmit corresponding motions with the limb motions of the performer.
Specifically, at model initialization, the model needs to be dragged into the scene and posed in a T-pose to obtain the correct joint matching. Likewise, in order to drive the virtual human body model, the corresponding bone-processing function also needs to be started once the performer is detected. In the indirect mapping, the position of the whole model is determined from the position of the torso joint. Depending on how the scene is built, this position may need to be rotated or mirrored; otherwise the model may move in the opposite direction. In order to set up a one-to-one correspondence between the virtual human body model and the performer's skeletal joints, a list of stored joints needs to be created in the script that processes the skeletal joint data.
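The indirect-mapping step described here could be sketched in Unity C# roughly as follows; the component name, the mirroring flag and the yaw offset are hypothetical and only illustrate placing the model root from an already-recognized torso position.

using UnityEngine;

public class ModelRootFollower : MonoBehaviour
{
    public Transform modelRoot;          // root transform of the virtual human body model
    public bool mirrorX = true;          // flip left/right when the projection is mirrored
    public float yawOffsetDegrees = 0f;  // extra rotation to match how the scene is built

    // torsoWorldPosition is assumed to be the recognized torso joint, already in Unity world space.
    public void ApplyTorsoPosition(Vector3 torsoWorldPosition)
    {
        if (mirrorX)
            torsoWorldPosition.x = -torsoWorldPosition.x;    // symmetric operation so the model does not drift the other way

        modelRoot.position = torsoWorldPosition;
        modelRoot.rotation = Quaternion.Euler(0f, yawOffsetDegrees, 0f);
    }
}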
S170, mapping the motion data to bone joints of the virtual human body model based on the corresponding relation.
Specifically, the motion data of each bone joint of the performer is obtained through calculation in the above steps, and the motion data obtained through calculation is mapped into the virtual human body model, that is, the motion data is assigned to the bone joint corresponding to the virtual human body model, so that the bone joint of the virtual human body model moves along with the motion of the performer, and the virtual human body model can follow the motion of the performer. That is, when the performer raises the right hand, the virtual mannequin will also raise its right hand accordingly.
Specifically, in this embodiment, the depth video data is captured with RealSense, the performer's skeletal joints are identified with Nuitrack, and the virtual human body model is constructed in Unity3D.
According to the interaction method of the performer and the virtual object, depth video data of the performer are obtained through real-sense technology; the depth video data are identified to obtain the position data of the performer's skeletal joints; the identified position data are correspondingly stored in a List structure; the List structure is traversed to obtain the motion data of the skeletal joints; and the motion data are mapped into the virtual human body model, so that the virtual human body model reads the performer's actions in real time and generates corresponding behaviors, thereby realizing interaction between the performer and the virtual human body model.
Referring to fig. 2, in one embodiment, the step of traversing the List structure to calculate the motion data of the skeletal joints of the performer includes:
s141, traversing the List structure, and calculating the movement direction and the movement distance of the bone joint according to the variation of the position data identified by the two adjacent frames of images;
s142, acquiring the time interval between two adjacent frames of images;
and S143, calculating the movement speed of the bone joint according to the time interval and the movement distance.
Specifically, the position data includes vector data and angle data, and the movement direction and movement distance of the bone joint are calculated from the change in the vector data and angle data identified in two adjacent frames of images, from which the direction of the arm swing is obtained. First, the arm needs to be described using the obtained skeletal joint information; for example, the vector from the elbow joint to the hand joint describes the arm. The included angle between the arm and the x-axis can then be calculated from this vector. The vector information gives the rough position of the arm, i.e., whether it is raised or lowered, while the calculated angle gives the swing direction: if the calculated angle increases gradually over the iteration of the List, it can be determined that the performer's arm is swinging inward; likewise, if the angle decreases gradually, it can be judged that the arm is swinging outward.
During the traversal, the program compares the angle of the currently traversed bone joint with the angle of the bone joint in the previous frame. If the angle is decreasing, the performer is deemed to be performing an arm "outward" movement; likewise, if the angle increases, the performer is recorded as performing an arm "inward" movement. After traversing the first two entries of the List, the program knows which of the 8 limb movements the performer intends to perform; the 8 limb movements are shown in Table 1, where x is the x-axis coordinate of the bone joint and y is the y-axis coordinate of the bone joint.
Table 1: The 8 limb movements
In this way, the movement direction and the movement distance of the bone joint can be calculated according to the vector data and the variation of the angle data identified by the two adjacent frames of images.
Specifically, when the depth video is captured, the frame rate is fixed, i.e., the number of images captured per second is fixed, so the time interval between two adjacent frames, i.e., between the current frame and the previous frame, is known; for example, when the camera captures 60 frames per second, the interval between two adjacent frames is about 16.7 milliseconds. Dividing the movement distance by this time interval gives the current movement speed of the bone joint. Once the movement direction, movement distance and movement speed of the skeletal joints are determined, the motion of the performer's skeleton is obtained.
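The per-frame calculation just described can be sketched in C# as follows; the helper names are hypothetical, Unity's Vector2/Mathf are used for brevity, and the fixed 60 fps interval mirrors the example above rather than any particular camera setting.

using UnityEngine;

public static class ArmMotion
{
    public const float FrameInterval = 1f / 60f;             // about 16.7 milliseconds between adjacent frames

    // The arm is described by the vector from the elbow joint to the hand joint;
    // its included angle with the x-axis is taken with Atan2.
    public static float ArmAngle(Vector2 elbow, Vector2 hand)
    {
        Vector2 arm = hand - elbow;
        return Mathf.Atan2(arm.y, arm.x) * Mathf.Rad2Deg;
    }

    // Compares two adjacent frames: the change of the position data gives the
    // movement distance, distance over the time interval gives the speed, and
    // the sign of the angle change gives the swing direction.
    public static void Compare(Vector2 prevHand, Vector2 currHand,
                               float prevAngle, float currAngle,
                               out float distance, out float speed, out bool swingInward)
    {
        distance = Vector2.Distance(prevHand, currHand);
        speed = distance / FrameInterval;
        swingInward = currAngle > prevAngle;                 // increasing angle is read as an "inward" swing
    }
}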
In one embodiment, the step of traversing the List structure to calculate motion data of skeletal joints of the performer comprises:
removing the position data which do not meet preset conditions in the List structure body to obtain the removed List structure body;
and traversing the removed List structure body, and calculating to obtain the motion data.
Specifically, the removal step may be performed before the List structure is traversed, or in real time during the traversal. When the position data are stored in the List structure, the captured depth video data and the recognized position data may contain anomalies; therefore, position data in the List structure that do not meet the preset conditions are removed, so that the obtained motion data conform to human body constraints. For example, position data that do not match the characteristics of the determined limb-movement recognition state are removed, as are position data whose starting angle does not fall within the preset angle range.
In one embodiment, the location data comprises: vector data and angle data;
the step of removing the position data which do not meet the preset condition from the List structure body to obtain a removed List structure body includes:
removing the position data of which the initial angle does not meet a preset angle range in the List structure;
and removing the position data of which the difference value between the angle data of the current frame and the angle data of the previous frame in the List structure exceeds a preset threshold value to obtain the removed List structure.
Specifically, the program calculates the angle data of the bone joints, and at the start of the traversal it records the angle value of the first bone joint as the starting angle value. The starting angle is then checked: if it does not fall within the preset angle range, the data in the window are cleared directly, the loop is exited, and no further traversal judgment is made in the current frame.
Specifically, if the difference between the bone joint angle of the current frame and that of the previous frame exceeds the preset threshold, the motion state of the bone joint does not satisfy the characteristics of the determined limb-movement recognition state; for example, if the arm angle of the current frame is smaller than that of the previous frame while the arm is in the "inward" swing state, the recognition process for that state has ended, and the position data of that frame are removed.
In one embodiment, the position data comprises vector data and angle data;
the step of traversing the List structure and calculating motion data of skeletal joints of the performer comprises:
creating a window;
initializing the window to 0;
continuously adding the currently traversed position data into the window in the process of traversing the List structure;
recording angle data for a first of said bone joints as a starting angle value;
judging whether the initial angle value is within a preset angle range or not;
when the initial angle value exceeds the preset angle range, eliminating the position data of the current frame;
when the initial angle value is within the preset angle range, carrying out difference processing on the angle data of the current frame and the angle data of the previous frame to obtain an angle difference value;
judging whether the angle difference value meets the determined bone joint action recognition state characteristics or not;
when the angle difference value meets the determined bone joint motion recognition state characteristics, dividing the angle difference value by the size of the window, and calculating to obtain the motion speed of the bone joint;
when the angle difference does not meet the determined bone joint motion recognition state characteristics, the position data are removed;
identifying the variable quantity of the vector data according to the current frame image and the vector data identified by the previous frame image, and calculating to obtain the motion direction and the motion distance of the bone joint;
traversing the List structure; and obtaining the motion data.
Specifically, for a better understanding of this embodiment, take the arm as an example. When the traversal of the List begins, the program creates a window of size 0. During the traversal, the position data of the currently traversed arm are continuously added to the window. After the program obtains the position data of the bone joints, it first determines which quadrant of fig. 3 the arm lies in, and on that basis first determines the "up" or "down" component of the 8 limb movements.
The program then calculates the arm angle information and, at the beginning of the traversal, records the angle value of the first skeletal joint as the starting angle value. The starting angle is then checked: if it does not fall within the preset angle range, the data in the window are cleared directly, the loop is exited, and no traversal judgment is made in the current frame. If the starting angle satisfies the preset angle range, the program compares the currently traversed angle data with the angle data of the previous frame. If the angle is decreasing, the program considers that the performer is performing an arm "outward" movement; likewise, if the angle increases, the program records that the performer is performing an arm "inward" movement. The angle in this embodiment can be understood as an included angle.
After traversing the first two entries of the List, the program knows which of the 8 limb movements in Table 1 the performer intends to perform. The remaining traversal determines whether recognition succeeds. If the difference between the arm angle of a frame and that of the previous frame does not satisfy the characteristics of the determined limb-movement recognition state (for example, if the arm angle of the current frame is smaller than that of the previous frame while the arm is in the "inward" swing state), the recognition process for that state has ended. At this point all of the traversed data have been recorded in the window, and the joint angle information in the window is monotonic. The program then judges the last joint angle in the window: if it satisfies the preset angle range, the limb-movement recognition succeeds; the program takes the difference between the recorded starting angle and ending angle, divides it by the window size to obtain the movement speed of the arm, clears the window data, sets the recognition result, and stops the loop. Otherwise the recognition fails, the window data are cleared, and the loop exits.
To facilitate recording the recognition state of a bone joint, in one embodiment the calculated motion data are assigned to the successfully recognized bone joint during the recognition process, while an unsuccessfully recognized bone joint is given a recognition-failure identifier and its motion data are set to 0. Specifically, when the iterative recognition process finishes, there are two possible results: recognition failure and recognition success. On failure, the program sets the state variable of the corresponding bone joint to Default and its movement speed variable to 0, indicating that the bone joint is not currently performing any recognized operation. On success, the program records the recognition state and the calculated swing speed of the corresponding bone joint and assigns them to variables for external access.
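One possible reading of this window-based traversal, written as a self-contained C# sketch: the class name, the angle-range parameters and the simplification to a single arm with one angle sample per frame are assumptions made for illustration, not the patent's own code.

using System;
using System.Collections.Generic;

public enum SwingState { Default, Inward, Outward }

public class SwingRecognizer
{
    private readonly List<float> window = new List<float>();   // created with size 0

    public SwingState State { get; private set; } = SwingState.Default;
    public float Speed { get; private set; }                    // stays 0 while nothing is recognized

    public void Traverse(IReadOnlyList<float> angles,
                         float startMin, float startMax,
                         float endMin, float endMax)
    {
        window.Clear();
        State = SwingState.Default;
        Speed = 0f;
        if (angles.Count == 0) return;

        float startAngle = angles[0];                           // first angle is the starting angle value
        if (startAngle < startMin || startAngle > startMax)
            return;                                             // starting angle rejected: clean and exit the loop

        window.Add(startAngle);
        bool inward = false;

        for (int i = 1; i < angles.Count; i++)
        {
            float diff = angles[i] - angles[i - 1];
            if (i == 1) inward = diff > 0f;                     // first difference fixes inward/outward

            bool monotonic = inward ? diff > 0f : diff < 0f;
            if (!monotonic) break;                              // trend broken: recognition of this state ends

            window.Add(angles[i]);
        }

        float endAngle = window[window.Count - 1];
        if (endAngle < endMin || endAngle > endMax)
        {
            window.Clear();                                     // end angle rejected: recognition failed
            return;
        }

        State = inward ? SwingState.Inward : SwingState.Outward;
        Speed = Math.Abs(endAngle - startAngle) / window.Count; // angle difference divided by window size
        window.Clear();                                         // clear the window after a successful recognition
    }
}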
In one embodiment, the method for the performer to interact with the virtual object further comprises:
acquiring three-primary-color (RGB) video data of the performer through real-sense technology;
aligning the three primary color video data with the same frame of image in the depth video data to obtain synthesized video data;
the step of identifying each frame of image of the depth video data to obtain position data of skeletal joints of the performer comprises:
and identifying each frame of image of the synthetic video data to obtain the position data of the bone joint.
Specifically, the RGB video data allow the user to visually check Nuitrack's bone recognition and the recognized bone positions. First, since the depth image obtained by RealSense does not correspond exactly to the RGB image, the RealSense depth image must be aligned with the RGB image using a Nuitrack plug-in, in order to eliminate the offset and obtain an accurate recognition result. After acquiring the RGB image and depth image captured by RealSense, Nuitrack processes the images and identifies the performer; if the performer is identified, it automatically processes and stores the skeletal data. The selected skeletal nodes are then added using Unity3D's instantiation mechanism: when the performer's skeletal joints are detected, prepared joint objects in Unity3D are instantiated at the positions of the recognized skeletal joints, and the stage objects of the corresponding joints are destroyed when the performer's joints lose focus.
In one embodiment, the position data is obtained using the following formulas:
x' = RawImage.x - Proj.x * RawImage.x
y' = RawImage.y - Proj.y * RawImage.y
where x' is the x-axis coordinate of the bone joint, y' is the y-axis coordinate of the bone joint, RawImage.x is the length of the image, RawImage.y is the width of the image, Proj.x is the recognized x-axis coordinate value of the bone joint, and Proj.y is the recognized y-axis coordinate value of the bone joint.
Specifically, using Nuitrack's Joint structure, each piece of bone joint information recognized in the frame can be obtained by traversing the captured bone joints through the interface provided by Nuitrack; the joint information is stored in the form of the Joint structure. The Joint structure includes a Confidence attribute, a JointType attribute, a Proj attribute, and so on. After the joint information processed and filtered by Nuitrack is obtained, secondary filtering is needed. The Confidence attribute represents the reliability of the joint recognition: the larger the value, the more accurate and reliable the recognition; the smaller the value, the less reliable the result. For example, if the Confidence value is greater than 0.5, the recognition of the joint is considered credible; if it is less than 0.5, the recognition result is considered unreliable. Therefore, during joint instantiation only trusted joints are instantiated, and untrusted joint information is discarded, i.e., removed.
The Proj attribute in the joint information structure stores the position information of the joint recognized by Nuitrack. It contains three values, x, y and z, corresponding to the three coordinate axes. The x and y values lie in the range [0, 1]; they are obtained by dividing the absolute coordinates of the captured joint by the length and width of the video frame, and thus represent the relative coordinates of the joint position. To map a joint onto the corresponding image coordinates in Unity3D using these values, the length and width of the RawImage created in Unity3D must be obtained and a corresponding calculation performed. Since the coordinate axes recognized by Nuitrack do not necessarily match the coordinate axes of Unity3D, the x and y values cannot simply be multiplied by the length and width of the RawImage. Testing shows that the coordinate axes on the RawImage are opposite to the x and y axes of the joint coordinates provided by Nuitrack; that is, to project the joint information correctly onto the joints of the person in the image, the x and y values must be flipped about the axes of symmetry: the length of the RawImage multiplied by Proj.x is subtracted from the length of the RawImage, and the width of the RawImage multiplied by Proj.y is subtracted from the width of the RawImage, giving the correct coordinates, as shown in formula (1):
x' = RawImage.x - Proj.x * RawImage.x
y' = RawImage.y - Proj.y * RawImage.y    (1)
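Applied in Unity, formula (1) amounts to the small helper below; the class and method names are hypothetical, and RawImage.x / RawImage.y are treated simply as the pixel length and width of the displayed image.

using UnityEngine;

public static class JointProjection
{
    // rawImageSize.x is the length of the RawImage, rawImageSize.y its width;
    // proj holds the normalized [0, 1] joint coordinates reported by the tracker.
    public static Vector2 ToRawImage(Vector2 rawImageSize, Vector2 proj)
    {
        float x = rawImageSize.x - proj.x * rawImageSize.x;   // x' = RawImage.x - Proj.x * RawImage.x
        float y = rawImageSize.y - proj.y * rawImageSize.y;   // y' = RawImage.y - Proj.y * RawImage.y
        return new Vector2(x, y);
    }
}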
To better illustrate the technical solution of the present invention, a specific embodiment is given below. A method for interaction between a performer and a virtual object comprises:
acquiring depth video data of a performer through real-sense technology;
acquiring three-primary-color (RGB) video data of the performer through real-sense technology;
aligning the three primary color video data with the same frame of image in the depth video data to obtain synthesized video data;
identifying each frame of image of the synthetic video data to obtain position data of the bone joint;
creating a window;
initializing the window to 0;
continuously adding the currently traversed position data into the window in the process of traversing the List structure;
recording angle data for a first of said bone joints as a starting angle value;
judging whether the initial angle value is within a preset angle range or not;
when the initial angle value exceeds the preset angle range, eliminating the position data of the current frame;
when the initial angle value is within the preset angle range, carrying out difference processing on the angle data of the current frame and the angle data of the previous frame to obtain an angle difference value;
judging whether the angle difference value meets the determined bone joint action recognition state characteristics or not;
when the angle difference value meets the determined bone joint motion recognition state characteristics, dividing the angle difference value by the size of the window, and calculating to obtain the motion speed of the bone joint;
when the angle difference does not meet the determined bone joint motion recognition state characteristics, the position data are removed;
identifying the variable quantity of the vector data according to the current frame image and the vector data identified by the previous frame image, and calculating to obtain the motion direction and the motion distance of the bone joint;
traversing the List structure; obtaining the motion data;
constructing a virtual human body model;
constructing a corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
mapping the motion data into skeletal joints of the virtual human body model based on the correspondence.
Specifically, taking arm swinging as an example, the above steps can be refined as follows:
(I) Storing the bone joint information. In order to obtain the performer's limb state over a period of time, the skeletal information obtained during that period needs to be stored. The List data structure is chosen as the storage container for the skeletal information. The benefits of using a List structure as the storage container are:
(1) It is simple and convenient. After the joint information is stored, it naturally needs to be iterated over. A List can be iterated directly with a for loop, and its data can also be accessed simply by index.
(2) It is easy to clean. In Unity3D, 60 frames run per second, i.e., one second of video yields 60 picture frames; if joint information is stored for every frame, the container length reaches 60 after 1 second and 3600 after 1 minute. Joint information therefore requires not only real-time storage but also real-time maintenance and cleaning. The List structure in C# has a built-in Remove function, with which simple data cleaning can be realized.
(II) Judging the bone joint information. When the performer swings an arm, the program recognizes the direction and speed of the swing. To obtain the swing direction, the arm must first be described using the obtained skeletal joint information; for example, the vector from the elbow joint to the hand joint describes the arm. The included angle between the arm and the x-axis can then be calculated from this vector. The vector information gives the rough position of the arm, i.e., whether it is raised or lowered, while the calculated angle gives the swing direction: if the calculated angle increases gradually over the iteration of the List, it can be determined that the performer's arm is swinging inward; likewise, if the angle decreases gradually, it can be judged that the arm is swinging outward.
When the traversal of the List structure begins, the program creates a window of size 0, and the currently traversed joint data are continuously added to the window during the traversal. After the program obtains the position of the arm, it first judges which quadrant of fig. 3 the arm lies in, and on that basis first determines the "up" or "down" component of the 8 limb movements. The program then calculates the arm angle information and, at the beginning of the traversal, records the angle value of the first joint entry as the starting angle value. The starting angle is then checked: if it does not fall within the set starting-angle range, the data in the window are cleared directly, the loop is exited, and no traversal judgment is made in the current frame.
During the traversal, the program compares the currently traversed joint angle with the angle of the previous frame. If the angle is decreasing, the program considers that the performer is performing an arm "outward" movement; likewise, if the angle increases, the program records an arm "inward" movement. After traversing the first two entries of the List, the program knows which of the 8 limb movements in Table 1 the performer intends to perform. The remaining traversal determines whether recognition succeeds. If the difference between the arm angle of a frame and that of the previous frame does not satisfy the characteristics of the determined limb-movement recognition state (for example, if the arm angle of the current frame is smaller than that of the previous frame while the arm is in the "inward" swing state), the recognition process for that state has ended. At this point all of the traversed data have been recorded in the window, and the joint angle information in the window is monotonic. The program then judges the last joint angle in the window: if it satisfies the set end-angle range, the limb-movement recognition succeeds; the program takes the difference between the recorded starting and ending angles, divides it by the window size to obtain the arm movement speed, clears the window data, sets the recognition result, and stops the loop. Otherwise the recognition fails, the window data are cleared, and the loop exits.
(III) Storing the joint motion recognition result. The limb-movement recognition interface provides external access in the form of variables. As described above, the program provides two classes, corresponding to the left and right arms and containing the arm movement direction and arm movement speed, for access by external scripts. When the iterative recognition process finishes, there are two possible results: recognition failure and recognition success. On failure, the program sets the state variable of the corresponding arm to Default and its movement speed variable to 0, indicating that the arm is not currently performing any recognized operation. On success, the program records the recognition state and the calculated swing speed of the corresponding arm and assigns them to variables for external access.
(IV) RealSense image projection and bone node addition. To project the image captured by RealSense into the scene, the RawImage game object in Unity3D is used to receive the image obtained from Nuitrack. The video stream captured by RealSense is first intercepted by Nuitrack for processing, and Nuitrack also provides developers with a method of acquiring the video stream. The user can directly call the Nuitrack external interface to obtain the video frames shot by RealSense. The video frames can then be processed directly and sent to the RawImage for display. Meanwhile, old video frames need to be destroyed and cleaned up in time.
To let the user visually check Nuitrack's bone recognition and the recognized bone positions, the next step is to attach the Nuitrack-processed bone data and joint points to the image. First, since the depth image obtained by RealSense does not correspond exactly to the RGB image, the RealSense depth image must be aligned with the RGB image using a Nuitrack plug-in, in order to eliminate the offset and obtain an accurate recognition result. After acquiring the RGB image and depth image captured by RealSense, Nuitrack processes the images and identifies the performer; if the performer is identified, it automatically processes and stores the skeletal data. The selected bone nodes are added using Unity3D's instantiation mechanism: when the performer's joints are detected, prepared joint objects in Unity3D are instantiated at the positions of the recognized skeletal joints, and the stage objects of the corresponding joints are destroyed when the performer's joints lose focus.
As described above, a joint's object needs to be destroyed when the joint disappears, so the instantiated joints must be managed so that, when a joint loses focus, the corresponding joint object can easily be found and destroyed. A joint-to-object mapping dictionary is therefore created to establish a one-to-one correspondence between joint types and the objects bound to those joints. When Nuitrack identifies a human body in the video stream captured by RealSense, the corresponding bone data are traversed, the detected joints are instantiated at the corresponding positions of the person using the joint positions stored by Nuitrack, and the instantiated joint objects are added to the created dictionary by joint type for unified management.
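The joint-to-object dictionary and the confidence-based secondary filtering could look roughly like the following Unity C# sketch; TrackedJointType, RecognizedJoint and the marker prefab are stand-ins introduced here, not Nuitrack's real API, and the 0.5 threshold follows the text above.

using System.Collections.Generic;
using UnityEngine;

public enum TrackedJointType { Head, Torso, LeftElbow, LeftHand, RightElbow, RightHand }

public struct RecognizedJoint
{
    public TrackedJointType Type;
    public float Confidence;     // larger value means a more reliable recognition
    public Vector2 ScreenPos;    // already converted with formula (1)
}

public class JointMarkerManager : MonoBehaviour
{
    public GameObject markerPrefab;                 // prepared joint object to instantiate

    private readonly Dictionary<TrackedJointType, GameObject> markers =
        new Dictionary<TrackedJointType, GameObject>();

    public void UpdateJoints(IEnumerable<RecognizedJoint> joints)
    {
        foreach (var joint in joints)
        {
            if (joint.Confidence < 0.5f)            // secondary filtering: discard untrusted joints
            {
                if (markers.TryGetValue(joint.Type, out var stale))
                {
                    Destroy(stale);                 // joint lost focus: destroy its stage object
                    markers.Remove(joint.Type);
                }
                continue;
            }

            if (!markers.TryGetValue(joint.Type, out var marker))
            {
                marker = Instantiate(markerPrefab); // instantiate the prepared joint only once per type
                markers[joint.Type] = marker;
            }
            marker.transform.position = joint.ScreenPos;   // track the projected joint position
        }
    }
}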
Using Nuitrack's Joint structure, each piece of bone joint information recognized in the frame can be obtained by traversing the captured bone joints through the interface provided by Nuitrack; the joint information is stored in the form of the Joint structure. The Joint structure contains a Confidence attribute, a JointType attribute, a Proj attribute, and so on. After the joint information processed and filtered by Nuitrack is obtained, secondary filtering is needed. The Confidence attribute represents the reliability of the joint recognition: the larger the value, the more accurate and reliable the recognition; the smaller the value, the less reliable the result. For example, if the Confidence value is greater than 0.5, the recognition of the joint is considered credible; if it is less than 0.5, the recognition result is considered unreliable. Therefore, during joint instantiation only trusted joints are instantiated, and untrusted joint information is discarded, i.e., removed.
The Proj attribute in the joint information structure stores the position information of the joint recognized by Nuitrack. It contains three values, x, y and z, corresponding to the three coordinate axes. The x and y values lie in the range [0, 1]; they are obtained by dividing the absolute coordinates of the captured joint by the length and width of the video frame, and thus represent the relative coordinates of the joint position. To map a joint onto the corresponding image coordinates in Unity3D using these values, the length and width of the RawImage created in Unity3D must be obtained and a corresponding calculation performed. Since the coordinate axes recognized by Nuitrack do not necessarily match the coordinate axes of Unity3D, the x and y values cannot simply be multiplied by the length and width of the RawImage. Testing shows that the coordinate axes on the RawImage are opposite to the x and y axes of the joint coordinates provided by Nuitrack; that is, to project the joint information correctly onto the joints of the person in the image, the x and y values must be flipped about the axes of symmetry: the length of the RawImage multiplied by Proj.x is subtracted from the length of the RawImage, and the width of the RawImage multiplied by Proj.y is subtracted from the width of the RawImage, giving the correct coordinates.
In each frame of the video stream, the position of the joint object in Unity3D is adjusted using the x and y values obtained from this calculation, so that the joint objects in Unity3D track the joint positions projected from the RealSense image in real time.
(V) Tracking the virtual human body model. The virtual human body model is downloaded from the Unity3D Asset Store and imported. At model initialization, the model must be dragged into the scene and posed in a T-pose to obtain the correct joint matching. Likewise, in order to drive the virtual figure, the corresponding skeleton-processing function also needs to be started once a performer is detected. In the indirect mapping, the position of the whole model is determined from the position of the torso joint. Depending on how the scene is built, this position may need to be rotated or mirrored; otherwise the model may move in the opposite direction. To set up a one-to-one correspondence between the model and the performer's joints, a list of stored joints needs to be created in the script that processes the skeletal joint data. Generally, as in a human being, the model's skeleton usually contains two clavicles, which are also included in Nuitrack's joint types. Note, however, that the bone data obtained from RealSense typically contain only one clavicle, located between the two. For convenience, the invention creates a dictionary that maps Nuitrack's JointType to the joints of the model. The model's joint array is iterated over, the rotation angles of the model bones are recorded, and each model joint and its corresponding joint type are added to the dictionary. The information of each Nuitrack joint can then be obtained through the skeleton-processing function, the rotation of the model joint is calculated, and the angle is assigned to the model joint once computed. The model joints then move with the motion of the performer; that is, the model follows the performer, so that when the performer raises the right hand, the model also raises its right hand. If desired, the captured picture can also be mirror-flipped; alternatively, the joint information of the model can be modified to the same end by exchanging each joint with its mirror counterpart, for example replacing the left-shoulder joint with the right shoulder, to achieve the mirror effect. The performer in this embodiment may also be referred to as the user.
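A minimal sketch of the JointType-to-model-bone dictionary and the per-frame rotation assignment, reusing the hypothetical TrackedJointType enum from the previous sketch; how each joint's orientation is computed is left abstract, since it comes from whatever the skeleton-processing step yields.

using System.Collections.Generic;
using UnityEngine;

public class ModelBoneMapper : MonoBehaviour
{
    private readonly Dictionary<TrackedJointType, Transform> modelBones =
        new Dictionary<TrackedJointType, Transform>();
    private readonly Dictionary<TrackedJointType, Quaternion> tPoseRotations =
        new Dictionary<TrackedJointType, Quaternion>();

    // Called once per model joint while the model still stands in its T-pose.
    public void Register(TrackedJointType type, Transform bone)
    {
        modelBones[type] = bone;
        tPoseRotations[type] = bone.rotation;       // record the model bone's initial rotation
    }

    // Called every frame for every recognized joint orientation.
    public void Apply(TrackedJointType type, Quaternion jointRotation)
    {
        if (!modelBones.TryGetValue(type, out var bone)) return;
        bone.rotation = jointRotation * tPoseRotations[type];   // the model joint follows the performer
    }
}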
In one embodiment, an apparatus for interaction between an actor and a virtual object is provided, and the apparatus for interaction between the actor and the virtual object is implemented by using the method for interaction between the actor and the virtual object according to any one of the above embodiments. In one embodiment, the performer and virtual object interaction device comprises corresponding modules for implementing the steps of the performer and virtual object interaction method.
In one embodiment, an apparatus for interaction of a performer with a virtual object is provided, the apparatus comprising:
the video module is used for acquiring depth video data of a performer through real-sense technology;
the identification module is used for identifying each frame of image of the depth video data to obtain the position data of the skeletal joint of the performer;
the storage module is used for sequentially storing the position data into a List structure body;
the traversal module is used for traversing the List structure body and calculating motion data of skeletal joints of the performer;
the building module is used for building a virtual human body model;
the corresponding module is used for constructing the corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
and the mapping module is used for mapping the motion data into the bone joints of the virtual human body model based on the corresponding relation.
The interaction apparatus for the performer and the virtual object obtains depth video data of the performer through real-sense technology, identifies the depth video data to obtain the position data of the performer's skeletal joints, stores the identified position data in a List structure, traverses the List structure to obtain the motion data of the skeletal joints, and maps the motion data into the virtual human body model, so that the virtual human body model reads the performer's actions in real time and generates corresponding behaviors, thereby realizing interaction between the performer and the virtual human body model.
In one embodiment, the traversal module comprises:
the traversing unit is used for traversing the List structure body and calculating the movement direction and the movement distance of the bone joint according to the variation of the position data identified by two adjacent frames of images;
the time interval unit is used for acquiring the time interval between two adjacent frames of images;
and the movement speed calculation unit is used for calculating the movement speed of the bone joint according to the time interval and the movement distance; an illustrative sketch of this calculation follows.
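In the sketch below, the JointSample and MotionSample types and their field names are assumptions introduced only for illustration; the motion direction, distance and speed are derived from the position data and timestamps of adjacent frames as described above.

```csharp
using System.Collections.Generic;
using UnityEngine;

// One stored sample of a skeletal joint: its identified position and the capture time of the frame.
public struct JointSample
{
    public Vector3 Position;   // position data of the joint
    public float Timestamp;    // capture time of the frame, in seconds
}

// Motion data derived from two adjacent frames.
public struct MotionSample
{
    public Vector3 Direction;  // unit vector of the movement direction
    public float Distance;     // movement distance between the two frames
    public float Speed;        // movement speed = distance / time interval
}

public static class JointMotionCalculator
{
    // Traverses the stored List and derives motion data from each pair of adjacent frames.
    public static List<MotionSample> Traverse(IReadOnlyList<JointSample> samples)
    {
        var motion = new List<MotionSample>();
        for (int i = 1; i < samples.Count; i++)
        {
            Vector3 delta = samples[i].Position - samples[i - 1].Position;    // variation of the position data
            float interval = samples[i].Timestamp - samples[i - 1].Timestamp; // time interval of adjacent frames
            motion.Add(new MotionSample
            {
                Direction = delta.normalized,
                Distance = delta.magnitude,
                Speed = interval > 0f ? delta.magnitude / interval : 0f
            });
        }
        return motion;
    }
}
```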
In one embodiment, the traversal module comprises a rejection unit and a traversing unit;
the rejection unit is used for removing the position data which do not meet the preset conditions in the List structure body, to obtain the removed List structure body;
and the traversing unit is used for traversing the removed List structure body and calculating to obtain the motion data; an illustrative sketch of this rejection and traversal is given below.
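In the following sketch, the angle field, the preset angle range and the per-frame difference threshold are assumptions standing in for the preset conditions mentioned above; the type names are likewise illustrative only.

```csharp
using System.Collections.Generic;
using UnityEngine;

// A joint sample extended with the angle data used by the rejection conditions.
public struct AngledJointSample
{
    public Vector3 Position;   // vector data of the joint
    public float Angle;        // angle data of the joint, in degrees
    public float Timestamp;    // capture time of the frame
}

public static class JointSampleFilter
{
    // Removes samples whose angle falls outside the preset range, or whose angle differs
    // from the previous kept frame by more than maxStep, returning the filtered List.
    public static List<AngledJointSample> Reject(
        IReadOnlyList<AngledJointSample> samples, float minAngle, float maxAngle, float maxStep)
    {
        var kept = new List<AngledJointSample>();
        foreach (AngledJointSample s in samples)
        {
            if (s.Angle < minAngle || s.Angle > maxAngle)
                continue;   // angle outside the preset angle range
            if (kept.Count > 0 && Mathf.Abs(s.Angle - kept[kept.Count - 1].Angle) > maxStep)
                continue;   // abnormal jump between adjacent frames
            kept.Add(s);
        }
        return kept;        // the List structure after removal, ready to be traversed
    }
}
```

The filtered list returned by Reject can then be handed to the traversal sketch above, mirroring the "remove first, then traverse" order described in this embodiment.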
In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the method of interaction of a performer with a virtual object. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
Those skilled in the art will appreciate that the architecture shown in FIG. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor that implements the steps of the method of any of the above embodiments when executing the computer program.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring depth video data of a performer through a real sensing technology;
identifying each frame of image of the depth video data to obtain position data of skeletal joints of the performer;
storing the position data into a List structure in sequence;
traversing the List structure body, and calculating to obtain the motion data of the skeletal joints of the performer;
constructing a virtual human body model;
constructing a corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
mapping the motion data into skeletal joints of the virtual human body model based on the correspondence; a sketch illustrating the order of these steps follows.
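This sketch is for orientation only: the delegate parameters are assumptions standing in for the RealSense capture, the skeleton recognition and the Unity-side model handling described in the embodiments, and the motion computation is reduced to per-frame displacements.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

// Illustrative wiring of the seven steps above into a single routine.
public static class InteractionMethodSketch
{
    public static void Run<TFrame>(
        Func<IEnumerable<TFrame>> acquireDepthVideo,        // step 1: depth video data
        Func<TFrame, Vector3> identifyJointPosition,        // step 2: one joint per frame, simplified
        Action buildVirtualModel,                           // step 5: virtual human body model
        Action buildJointCorrespondence,                    // step 6: joint correspondence
        Action<IReadOnlyList<Vector3>> mapMotionToModel)    // step 7: map motion data
    {
        var positions = new List<Vector3>();                // step 3: store positions in a List
        foreach (TFrame frame in acquireDepthVideo())
            positions.Add(identifyJointPosition(frame));

        var motion = new List<Vector3>();                   // step 4: traverse and compute motion
        for (int i = 1; i < positions.Count; i++)
            motion.Add(positions[i] - positions[i - 1]);    // per-frame displacement

        buildVirtualModel();
        buildJointCorrespondence();
        mapMotionToModel(motion);
    }
}
```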
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the method of interaction of a performer with a virtual object as described in any of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program, the computer program implementing the following steps when executed by a processor:
acquiring depth video data of a performer through a real sensing technology;
identifying each frame of image of the depth video data to obtain position data of skeletal joints of the performer;
storing the position data into a List structure in sequence;
traversing the List structure body, and calculating to obtain the motion data of the skeletal joints of the performer;
constructing a virtual human body model;
constructing a corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
mapping the motion data into skeletal joints of the virtual human body model based on the correspondence.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to fall within the scope of this specification.
The above-mentioned embodiments merely express several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A method of interaction of a performer with a virtual object, comprising:
acquiring depth video data of a performer through a real sensing technology;
identifying each frame of image of the depth video data to obtain position data of skeletal joints of the performer;
storing the position data into a List structure in sequence;
traversing the List structure body, and calculating to obtain the motion data of the skeletal joints of the performer;
constructing a virtual human body model;
constructing a corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
mapping the motion data into skeletal joints of the virtual human body model based on the correspondence.
2. The method of claim 1, wherein the step of traversing the List structure to calculate motion data of skeletal joints of the performer comprises:
traversing the List structure, and calculating the movement direction and the movement distance of the bone joint according to the variation of the position data identified by two adjacent frames of images;
acquiring the time interval of two adjacent frames of images;
and calculating the movement speed of the bone joint according to the time interval and the movement distance.
3. The method of claim 1, wherein the step of traversing the List structure to calculate motion data of skeletal joints of the performer comprises:
removing the position data which do not meet preset conditions in the List structure body to obtain the removed List structure body;
and traversing the removed List structure body, and calculating to obtain the motion data.
4. The method of performer interaction with a virtual object as recited in claim 3 wherein the position data comprises: vector data and angle data;
the step of removing the position data which do not meet the preset condition from the List structure body to obtain a removed List structure body includes:
removing the position data of which the initial angle does not meet a preset angle range in the List structure;
and removing the position data of which the difference value between the angle data of the current frame and the angle data of the previous frame in the List structure exceeds a preset threshold value to obtain the removed List structure.
5. The method of claim 1, wherein the position data includes vector data and angle data;
the step of traversing the List structure and calculating motion data of skeletal joints of the performer comprises:
creating a window;
initializing the window to 0;
continuously adding the currently traversed position data into the window in the process of traversing the List structure;
recording the angle data of the first bone joint as a starting angle value;
judging whether the initial angle value is within a preset angle range or not;
when the initial angle value exceeds the preset angle range, eliminating the position data of the current frame;
when the initial angle value is within the preset angle range, carrying out difference processing on the angle data of the current frame and the angle data of the previous frame to obtain an angle difference value;
judging whether the angle difference value meets the determined bone joint action recognition state characteristics or not;
when the angle difference value meets the determined bone joint motion recognition state characteristics, dividing the angle difference value by the size of the window, and calculating to obtain the motion speed of the bone joint;
when the angle difference does not meet the determined bone joint motion recognition state characteristics, the position data are removed;
determining the variation of the vector data according to the vector data identified from the current frame image and the previous frame image, and calculating the motion direction and the motion distance of the bone joint;
and continuing until the List structure has been traversed, thereby obtaining the motion data.
6. The method of performer interaction with a virtual object as recited in claim 1, further comprising:
acquiring three-primary-color video data of the performer through a real sensing technology;
aligning the three primary color video data with the same frame of image in the depth video data to obtain synthesized video data;
the step of identifying each frame of image of the depth video data to obtain position data of skeletal joints of the performer comprises:
and identifying each frame of image of the synthesized video data to obtain the position data of the bone joints.
7. The method of claim 1, wherein the position data is obtained using the following equation:
x'=RawImage.x-Proj.x*RawImage.x
y'=RawImage.y-Proj.y*RawImage.y
wherein x' is the x-axis coordinate of the bone joint, y' is the y-axis coordinate of the bone joint, RawImage.x is the length of the image, RawImage.y is the width of the image, Proj.x is the recognized x-axis coordinate value of the bone joint, and Proj.y is the recognized y-axis coordinate value of the bone joint.
8. An apparatus for performer interaction with a virtual object, comprising:
the video module is used for acquiring depth video data of a performer through a real sensing technology;
the identification module is used for identifying each frame of image of the depth video data to obtain the position data of the skeletal joint of the performer;
the storage module is used for sequentially storing the position data into a List structure body;
the traversal module is used for traversing the List structure body and calculating motion data of skeletal joints of the performer;
the building module is used for building a virtual human body model;
the corresponding module is used for constructing the corresponding relation between the skeletal joints of the virtual human body model and the skeletal joints of the performer;
and the mapping module is used for mapping the motion data into the bone joints of the virtual human body model based on the corresponding relation.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011526981.5A CN112651325A (en) | 2020-12-22 | 2020-12-22 | Interaction method and device of performer and virtual object and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112651325A true CN112651325A (en) | 2021-04-13 |
Family
ID=75358910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011526981.5A Pending CN112651325A (en) | 2020-12-22 | 2020-12-22 | Interaction method and device of performer and virtual object and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112651325A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158906A (en) * | 2021-04-23 | 2021-07-23 | 天津大学 | Motion capture-based guqin experience learning system and implementation method |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110148875A1 (en) * | 2009-12-18 | 2011-06-23 | Electronics And Telecommunications Research Institute | Method and apparatus for capturing motion of dynamic object |
CN107253192A (en) * | 2017-05-24 | 2017-10-17 | 湖北众与和智能装备科技有限公司 | It is a kind of based on Kinect without demarcation human-computer interactive control system and method |
CN107225573A (en) * | 2017-07-05 | 2017-10-03 | 上海未来伙伴机器人有限公司 | The method of controlling operation and device of robot |
CN110728220A (en) * | 2019-09-30 | 2020-01-24 | 上海大学 | Gymnastics auxiliary training method based on human body action skeleton information |
Non-Patent Citations (2)
Title |
---|
XU Zhengze: "Research on the application of a virtual host based on depth-image motion capture technology", Modern Film Technology (现代电影技术), no. 08, 11 August 2016 (2016-08-11) *
LI Hongbo; SUN Boyuan; LI Shuangsheng: "Virtual character control method based on skeletal information", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (重庆邮电大学学报(自然科学版)), no. 01, 15 February 2016 (2016-02-15), pages 79-83 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158906A (en) * | 2021-04-23 | 2021-07-23 | 天津大学 | Motion capture-based guqin experience learning system and implementation method |
CN113158906B (en) * | 2021-04-23 | 2022-09-02 | 天津大学 | Motion capture-based guqin experience learning system and implementation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10950271B1 (en) | Method for triggering events in a video | |
CN108369643B (en) | Method and system for 3D hand skeleton tracking | |
CN109636831B (en) | Method for estimating three-dimensional human body posture and hand information | |
US9697635B2 (en) | Generating an avatar from real time image data | |
WO2023071964A1 (en) | Data processing method and apparatus, and electronic device and computer-readable storage medium | |
CN109034397A (en) | Model training method, device, computer equipment and storage medium | |
WO2017152794A1 (en) | Method and device for target tracking | |
US8509484B2 (en) | Information processing device and information processing method | |
CN110637323A (en) | Robust mesh tracking and fusion by using part-based keyframes and prior models | |
CN109635752B (en) | Method for positioning key points of human face, method for processing human face image and related device | |
WO2011075082A1 (en) | Method and system for single view image 3 d face synthesis | |
CN109035415B (en) | Virtual model processing method, device, equipment and computer readable storage medium | |
CN111240476A (en) | Interaction method and device based on augmented reality, storage medium and computer equipment | |
WO2022179603A1 (en) | Augmented reality method and related device thereof | |
CN115331309A (en) | Method, apparatus, device and medium for recognizing human body action | |
EP4191540A1 (en) | 3d data system and 3d data generation method | |
CN116580151A (en) | Human body three-dimensional model construction method, electronic equipment and storage medium | |
CN115601480A (en) | Virtual object driving method, device, electronic equipment and storage medium | |
US10269165B1 (en) | Facial animation models | |
Shi et al. | I understand you: Blind 3d human attention inference from the perspective of third-person | |
CN112651325A (en) | Interaction method and device of performer and virtual object and computer equipment | |
CN113886510A (en) | Terminal interaction method, device, equipment and storage medium | |
WO2023185241A1 (en) | Data processing method and apparatus, device and medium | |
CN114546125B (en) | Keyboard tracking method and tracking system | |
CN115138063A (en) | Image processing method, device and program electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |