CN109448131B - Kinect-based virtual piano playing system construction method - Google Patents


Info

Publication number
CN109448131B
Authority
CN
China
Prior art keywords
point
point cloud
key
kinect
keys
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811243690.8A
Other languages
Chinese (zh)
Other versions
CN109448131A (en)
Inventor
吴俊
张子涵
王凯
王家霈
张瑶
何贵青
蒋晓悦
谢红梅
夏召强
冯晓毅
李会方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201811243690.8A priority Critical patent/CN109448131B/en
Publication of CN109448131A publication Critical patent/CN109448131A/en
Application granted granted Critical
Publication of CN109448131B publication Critical patent/CN109448131B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/162 Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/30 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/08 Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation

Abstract

The invention provides a method for constructing a virtual piano playing system based on Kinect. As a mode of human-machine interaction, the system is a simple and convenient virtual keyboard that can be extended to fields such as smart homes, games, and robotics. The OpenGL library is used for display, and the key state is judged in combination with the point-cloud values at the fingertip positions, which improves the accuracy of key playing and gives a good user experience. When the virtual piano is realized, a three-dimensional model is established, so the displayed scene is more stereoscopic and satisfies the user's demand for an immersive experience.

Description

Kinect-based virtual piano playing system construction method
Technical Field
The invention relates to the field of electronics and computers, and in particular to a Kinect-based system construction method.
Background
In the current market, virtual piano programs that simulate piano keys with computer keyboard keys dominate, and somatosensory (motion-sensing) interactive playing applications are few. Although existing virtual piano technology can reproduce the effect of piano playing to a certain extent, most of it relies on computer keyboard input, and a computer keyboard can hardly simulate the feel of playing real piano keys. Meanwhile, the sound quality of the prior art is poor, far from the sound of a real piano, and can hardly meet the requirements of professional users.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method for constructing a virtual piano playing system based on Kinect, designed for ordinary users who want a simple virtual piano playing system that meets the most basic musical requirements. The system is simple and easy to use: the user only needs to select, on a suitable plane, the area in which the virtual keyboard is created, and can immediately start playing the virtual piano.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
Besides virtual piano playing, the system can also be used for the medical rehabilitation of patients with bone injuries. During rehabilitation a patient must repeat a large number of movements under a doctor's direction. With the help of the Kinect device and the on-screen display, the patient can perform rehabilitation movements independently without a doctor's assistance. Moreover, because the patient is in a game-like environment, the mood is more pleasant, which also speeds up recovery.
So that the user only needs to select, on a suitable plane, the area in which the virtual keys are created in order to play the virtual piano immediately, the system is studied as follows:
step 1: three-dimensional reconstruction of a scene
Opening the Kinect device camera, capturing spatial depth information, obtaining the depth information of the real scene with the camera, calculating the point-cloud coordinates of each pixel in every frame of the depth image, triangulating the point cloud, and after triangulation calculating the normal vector of each pixel from its coordinate and depth information, thereby realizing three-dimensional reconstruction;
the detailed steps are as follows:
step 1.1: depth map acquisition
Controlling the Kinect with Microsoft's Kinect SDK and using openFrameworks for three-dimensional drawing; when three-dimensional reconstruction is started, setting the total number of point-cloud points for OpenGL and the three-dimensional scene, marking out the key area of the virtual piano on the screen with mouse clicks, obtaining the depth information captured by the Kinect camera with the NuiImageStreamGetNextFrame() function of the SDK, updating the depth information frame by frame in update(), and storing the depth information of each frame in a preset buffer;
step 1.2: point cloud picture acquisition
Establishing a class and, from the depth information obtained in step 1.1, calculating the point cloud with the NuiTransformDepthImageToSkeleton() function of the Kinect SDK; the resulting point cloud is stored as a matrix in which each element represents one point of the point cloud and corresponds to the pixel with the same row and column coordinates in the depth image;
step 1.3: point cloud triangulation
After the point cloud data matrix is obtained, the point cloud is triangulated:
the point cloud is treated as an image and traversed; for each point, all of its adjacent points in the point-cloud space are obtained, and with the OpenGL primitive-connection functions each point is connected with the two preceding points into a triangular face in front-to-back order, i.e. each row of the point cloud is triangulated into a strip;
step 1.4: vertex normal map
The invention takes the direction facing the camera as the positive direction of every normal and uses the least squares method to fit an optimal plane to all points adjacent to a given point. The normal direction of this optimal plane is the normal direction of the point, and the plane minimizes the sum of squared distances from the surrounding adjacent points to it, i.e. the following expression is minimal:

M = Σ_i (a x_i + b y_i + c z_i − 1)²    (1)

where i indexes the adjacent points of the point under calculation, M is the sum of squared distances from these surrounding adjacent points to the optimal plane, (a, b, c) is the normal vector of the plane, and x_i, y_i, z_i are the coordinates of point i. Taking the partial derivative of M with respect to each of the three parameters of the normal vector gives:

∂M/∂a = 2 Σ_i (a x_i + b y_i + c z_i − 1) x_i
∂M/∂b = 2 Σ_i (a x_i + b y_i + c z_i − 1) y_i
∂M/∂c = 2 Σ_i (a x_i + b y_i + c z_i − 1) z_i

For equation (1) to attain its minimum, all three partial derivatives must be zero, which yields the three equations:

a Σ_i x_i² + b Σ_i x_i y_i + c Σ_i x_i z_i = Σ_i x_i
a Σ_i x_i y_i + b Σ_i y_i² + c Σ_i y_i z_i = Σ_i y_i
a Σ_i x_i z_i + b Σ_i y_i z_i + c Σ_i z_i² = Σ_i z_i

According to Cramer's rule, the solution that minimizes equation (1) is

a = D_a / D,  b = D_b / D,  c = D_c / D

where D is the determinant of the coefficient matrix and D_a, D_b, D_c are the determinants obtained by replacing the corresponding column of D with the right-hand side. Once a normal has been computed for every point, the environment scene of the virtual piano system is reconstructed;
Step 2: Virtual key generation
In the scene reconstructed from the Kinect data, the user clicks on the PC screen with the mouse, on any plane without occluding objects, to select the specific region in which keys are to be generated; the virtual keys are then drawn in that region with OpenGL;
after the virtual piano environment scene has been built, the key generation position is selected in the built virtual environment: the user clicks the selection area with the mouse, mousePressed() listens for the clicks, and the two-dimensional screen coordinates of the user's key selection area are obtained. These two-dimensional coordinates are screen coordinates whose origin is the center of the computer screen, whereas the three-dimensional keys are built from the point-cloud coordinates obtained in step 1.2, which lie in the depth-image space whose origin is the Kinect camera. The point-cloud data computed earlier are stored as a matrix, so the corresponding point-cloud coordinates are obtained from the row-column correspondence between screen coordinates and depth-image coordinates;
the point-cloud coordinates of the key area are acquired in the Update() function, the createKey() function is called, and the three-dimensional coordinate of the spatial position of each key is calculated. The number of keys is set according to the screen size and the frame rate of the device, and dividing the total number of point-cloud points set in step 1.1 by the current number of keys gives the number of points per key. In the draw() function the three-dimensional keys are drawn from the OpenGL camera viewpoint, i.e. the key area drawn on the screen with the mouse is rendered directly; after the virtual keys are set up, the note file corresponding to the position of each key can be played with a play command;
Step 3: Piano key press detection
When the user plays the piano after setup is finished, whether the user presses a key is detected in real time, i.e. press detection is performed;
the point-cloud values of the fingers are compared with the point-cloud counts of the key areas to judge changes in key state: once the point-cloud count of the corresponding area is detected to change, the finger position lies within the key area at that moment, and the magnitude of the change is consistent with a finger, the key state is considered to have changed, i.e. the key is judged to be pressed;
Step 4: Playing of corresponding musical notes
Through the key press detection, if a key is pressed the corresponding note is played, and the virtual key changes color to show the user that the press has been detected.
To realize real-time playing, asynchronous playing is adopted first, so that playing a note does not affect the graphics display, i.e. the graphics are displayed while the key is pressed; second, a separate thread is started to play the sound, so that sound playback and the display part do not affect each other;
The method uses the midi player built into the Windows system, sends a play command to the virtual key system through the midiOutShortMsg() function, uses the Acoustica Pianissimo sound library when several notes are played at the same time, and calls a thread pool when several keys are pressed simultaneously to avoid misusing a single thread.
The invention performs median filtering on the depth image: in the update() function, a 5×5 window is applied to the acquired depth data.
In the normal map calculation of step 1.4, the 8 neighborhood points around the point are taken as adjacent points; the depth of these 8 neighborhood points is checked, and a neighborhood point is used as an adjacent point for the subsequent calculation only when the difference between its depth and the depth of the central point is within 5 percent of the central point's depth value.
When an object moves through the key selection area, the point-cloud count of that area changes. A global variable is set to record the number of image frames refreshed by the camera during the time the object moves; the point-cloud counts of at least the last 20 of these frames are obtained, the highest and lowest values are removed, and the average of the remaining values is used as the current point-cloud count.
The method first obtains the point-cloud count of each key while it is untouched, selects more than 20 frames, removes the highest and lowest values and computes the average, and takes ten percent of each key's total point-cloud count as the threshold; when the fluctuation of the average point-cloud count of a key is smaller than this threshold, the key neither changes color nor sounds.
The invention has the advantage of using the currently popular Kinect device to realize a virtual piano playing system, which satisfies users' entertainment needs and can, to a certain extent, serve as a general input device. As a mode of human-machine interaction, this simple and convenient virtual keyboard can be extended to fields such as smart homes, games, and robotics. The depth map provided by the Kinect has low resolution and its depth values are easily affected by noise, which hampers accurate playing; to address this, the OpenGL library is used for display and the key state is judged in combination with the values at the fingertip positions, which improves the accuracy of key playing and gives a good user experience. When the virtual piano is realized, a three-dimensional model is established, so the displayed scene is more stereoscopic and satisfies the user's demand for an immersive experience.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic view of the triangularization of the present invention.
Fig. 3 is an environmental scenario of the virtual piano system of the present invention.
Fig. 4 is a schematic diagram of a key constructed by the present invention.
Fig. 5 is a press detection diagram of the present invention.
Fig. 6 is a comparison between before and after filtering according to the present invention, wherein fig. 6(a) is a schematic diagram before filtering and fig. 6(b) is a schematic diagram after filtering.
FIG. 7 is a real-time effect diagram of a computer screen when a user performs a performance using the present invention.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the drawings.
Step 1: three-dimensional reconstruction of a scene
Opening the Kinect device camera, capturing spatial depth information, obtaining the depth information of the real scene with the camera, calculating the point-cloud coordinates of each pixel in every frame of the depth image, triangulating the point cloud, and after triangulation calculating the normal vector of each pixel from its coordinate and depth information, thereby realizing three-dimensional reconstruction;
the detailed steps are as follows:
step 1.1: depth map acquisition
Controlling the Kinect with Microsoft's Kinect SDK and using openFrameworks for three-dimensional drawing; when three-dimensional reconstruction is started, setting the total number of point-cloud points for OpenGL and the three-dimensional scene, marking out the key area of the virtual piano on the screen with mouse clicks, obtaining the depth information captured by the Kinect camera with the NuiImageStreamGetNextFrame() function of the SDK, updating the depth information frame by frame in update(), and storing the depth information of each frame in a preset buffer;
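As an illustration of this step, the following is a minimal sketch of depth acquisition assuming the C++ Kinect SDK 1.x API and a 640x480 depth stream; the stream handle (opened earlier with NuiImageStreamOpen) and the buffer passed in are hypothetical names introduced only for the example:

```cpp
// Sketch: grab one depth frame inside openFrameworks' update() (Kinect SDK 1.x assumed).
// depthStream and depthBuffer are illustrative names, not names from the patent.
#include <Windows.h>
#include <NuiApi.h>
#include <vector>

void grabDepthFrame(HANDLE depthStream, std::vector<USHORT>& depthBuffer)
{
    const NUI_IMAGE_FRAME* frame = nullptr;
    // Wait 0 ms for the next depth frame captured by the Kinect camera.
    if (FAILED(NuiImageStreamGetNextFrame(depthStream, 0, &frame)))
        return;                                   // no new frame this update()

    NUI_LOCKED_RECT lockedRect;
    frame->pFrameTexture->LockRect(0, &lockedRect, nullptr, 0);
    if (lockedRect.Pitch != 0)
    {
        // Copy the 640x480 16-bit depth pixels of this frame into the preset buffer.
        const USHORT* src = reinterpret_cast<const USHORT*>(lockedRect.pBits);
        depthBuffer.assign(src, src + 640 * 480);
    }
    frame->pFrameTexture->UnlockRect(0);
    NuiImageStreamReleaseFrame(depthStream, frame);   // return the frame to the SDK
}
```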
step 1.2: point cloud picture acquisition
Establishing a class and, from the depth information obtained in step 1.1, calculating the point cloud with the NuiTransformDepthImageToSkeleton() function of the Kinect SDK; the resulting point cloud is stored as a matrix in which each element represents one point of the point cloud and corresponds to the pixel with the same row and column coordinates in the depth image;
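For illustration, a sketch (under the same Kinect SDK 1.x assumption) of how each depth pixel could be converted to a point-cloud entry with NuiTransformDepthImageToSkeleton and stored at the same row and column as in the depth image; the container choice is an assumption for the example:

```cpp
// Sketch: build the point-cloud matrix from one depth frame (Kinect SDK 1.x assumed).
#include <Windows.h>
#include <NuiApi.h>
#include <vector>

// One skeleton-space point (in meters) per depth pixel, stored row-major like the depth image.
std::vector<Vector4> depthToPointCloud(const std::vector<USHORT>& depth, int width, int height)
{
    std::vector<Vector4> cloud(width * height);
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            USHORT packed = depth[y * width + x];   // depth value as delivered by the stream
            // The SDK expects the packed depth value (depth in mm shifted left by 3 bits).
            cloud[y * width + x] = NuiTransformDepthImageToSkeleton(
                (LONG)x, (LONG)y, packed, NUI_IMAGE_RESOLUTION_640x480);
        }
    }
    return cloud;
}
```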
step 1.3: point cloud triangularization
After the point cloud data matrix is obtained, the point cloud is triangulated:
the point cloud returned by the Kinect is structured rather than unordered, so the computational cost of triangulating it is not too large. In step 1.2 the point cloud was stored in a matrix whose size is exactly that of the depth map (rows × columns). The point cloud is therefore treated as an image and traversed; for each point, all of its adjacent points in the point-cloud space are obtained, and with the OpenGL primitive-connection functions each point is connected with the two preceding points into a triangular face in front-to-back order, i.e. each row of the point cloud is triangulated into a strip. As shown in FIG. 2, the numbers next to the dots give the order in which the points are drawn.
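A sketch of this row-wise strip triangulation using an openFrameworks ofMesh in triangle-strip mode (one strip per pair of adjacent rows, so each new vertex forms a triangle with the two previous ones); the function name is illustrative and openFrameworks 0.10+ (glm vertices) is assumed:

```cpp
// Sketch: triangulate the structured point cloud row by row as triangle strips.
#include "ofMain.h"
#include <vector>

std::vector<ofMesh> triangulateCloud(const std::vector<glm::vec3>& cloud, int width, int height)
{
    std::vector<ofMesh> strips;
    for (int y = 0; y + 1 < height; ++y)
    {
        ofMesh strip;
        strip.setMode(OF_PRIMITIVE_TRIANGLE_STRIP);
        // Alternate between row y and row y+1 so that each new vertex forms a
        // triangle with the two previously added vertices, as in FIG. 2.
        for (int x = 0; x < width; ++x)
        {
            strip.addVertex(cloud[y * width + x]);
            strip.addVertex(cloud[(y + 1) * width + x]);
        }
        strips.push_back(strip);
    }
    return strips;   // each strip can be rendered with strip.draw() inside draw()
}
```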
step 1.4: vertex normal map
There are many ways to calculate a normal. One could take the two points adjacent to each point, form a triangle, and use the triangle's normal as the normal of the point, but this method is too inaccurate: a slight change in the coordinates of a single point changes the final normal direction, and the lighting effect becomes very unstable. To obtain a relatively accurate result, the invention takes the direction facing the camera as the positive direction of every normal and uses the least squares method to fit an optimal plane to all points adjacent to a given point. The normal direction of this optimal plane is the normal direction of the point, and the plane minimizes the sum of squared distances from the surrounding adjacent points to it, i.e. the following expression is minimal:

M = Σ_i (a x_i + b y_i + c z_i − 1)²    (1)

where i indexes the adjacent points of the point under calculation, M is the sum of squared distances from these surrounding adjacent points to the optimal plane, (a, b, c) is the normal vector of the plane, and x_i, y_i, z_i are the coordinates of point i. Taking the partial derivative of M with respect to each of the three parameters of the normal vector gives:

∂M/∂a = 2 Σ_i (a x_i + b y_i + c z_i − 1) x_i
∂M/∂b = 2 Σ_i (a x_i + b y_i + c z_i − 1) y_i
∂M/∂c = 2 Σ_i (a x_i + b y_i + c z_i − 1) z_i

For equation (1) to attain its minimum, all three partial derivatives must be zero, which yields the three equations:

a Σ_i x_i² + b Σ_i x_i y_i + c Σ_i x_i z_i = Σ_i x_i
a Σ_i x_i y_i + b Σ_i y_i² + c Σ_i y_i z_i = Σ_i y_i
a Σ_i x_i z_i + b Σ_i y_i z_i + c Σ_i z_i² = Σ_i z_i

According to Cramer's rule, the solution that minimizes equation (1) is

a = D_a / D,  b = D_b / D,  c = D_c / D

where D is the determinant of the coefficient matrix and D_a, D_b, D_c are the determinants obtained by replacing the corresponding column of D with the right-hand side. Once a normal has been computed for every point, the environment scene of the virtual piano system is reconstructed, as shown in FIG. 3.
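As a worked illustration of the least-squares fit and Cramer's rule above, here is a small sketch that fits the plane a·x + b·y + c·z = 1 to a point's accepted neighbors and orients the resulting normal toward the camera (assumed at the skeleton-space origin); the function and its inputs are illustrative, not the patent's code:

```cpp
// Sketch: least-squares plane normal of a point from its accepted neighbors,
// solving the 3x3 normal equations with Cramer's rule.
#include <glm/glm.hpp>
#include <vector>

glm::vec3 leastSquaresNormal(const std::vector<glm::vec3>& nbrs)
{
    // Accumulate the coefficient matrix and right-hand side of the normal equations.
    double sxx = 0, sxy = 0, sxz = 0, syy = 0, syz = 0, szz = 0, sx = 0, sy = 0, sz = 0;
    for (const glm::vec3& p : nbrs)
    {
        sxx += p.x * p.x; sxy += p.x * p.y; sxz += p.x * p.z;
        syy += p.y * p.y; syz += p.y * p.z; szz += p.z * p.z;
        sx  += p.x;       sy  += p.y;       sz  += p.z;
    }
    auto det3 = [](double a1, double a2, double a3,
                   double b1, double b2, double b3,
                   double c1, double c2, double c3)
    {
        return a1 * (b2 * c3 - b3 * c2) - a2 * (b1 * c3 - b3 * c1) + a3 * (b1 * c2 - b2 * c1);
    };
    double D = det3(sxx, sxy, sxz,  sxy, syy, syz,  sxz, syz, szz);
    if (D == 0.0) return glm::vec3(0, 0, 1);          // degenerate neighborhood
    double Da = det3(sx,  sxy, sxz,  sy,  syy, syz,  sz,  syz, szz);
    double Db = det3(sxx, sx,  sxz,  sxy, sy,  syz,  sxz, sz,  szz);
    double Dc = det3(sxx, sxy, sx,   sxy, syy, sy,   sxz, syz, sz);
    glm::vec3 n(Da / D, Db / D, Dc / D);              // (a, b, c) of the fitted plane
    // Orient the normal so its positive direction faces the camera at the origin:
    // for a surface point p, the direction from p toward the camera is -p.
    if (glm::dot(n, -nbrs.front()) < 0) n = -n;
    return glm::normalize(n);
}
```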
Step 2: Virtual key generation
In the scene reconstructed from the Kinect data, the user clicks on the PC screen with the mouse, on any plane without occluding objects, to select the specific region in which keys are to be generated; the virtual keys are then drawn in that region with OpenGL;
after the virtual piano environment scene has been constructed, the key generation position is selected in the constructed virtual environment: the user clicks the selection area with the mouse, mousePressed() listens for the clicks, and the two-dimensional screen coordinates of the user's key selection area are obtained. These two-dimensional coordinates are screen coordinates whose origin is the center of the computer screen, whereas the three-dimensional keys are built from the point-cloud coordinates obtained in step 1.2, which lie in the depth-image space whose origin is the Kinect camera. The point-cloud data computed earlier are stored as a matrix, so the corresponding point-cloud coordinates are obtained from the row-column correspondence between screen coordinates and depth-image coordinates;
the point-cloud coordinates of the key region are acquired in the Update() function, the createKey() function is called, and the three-dimensional coordinate of the spatial position of each key is calculated. The number of keys is set according to the screen size and the frame rate of the device, and dividing the total number of point-cloud points set in step 1.1 by the current number of keys gives the number of points per key. In the draw() function the three-dimensional keys are drawn from the OpenGL camera viewpoint, i.e. the key region drawn on the screen with the mouse is rendered directly; after the virtual keys are set up, the note file corresponding to the position of each key is prepared so that it can later be played with a play command;
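A sketch of how the selected region could be split into keys, each key owning an equal slice of the point-cloud points, as described above; the Key structure and the function body are illustrative assumptions, not the patent's actual createKey() code:

```cpp
// Sketch: divide the user-selected region into keys and record each key's
// share of point-cloud points and its 3D position (centroid of its points).
#include <glm/glm.hpp>
#include <vector>

struct Key
{
    int firstIndex = 0;     // first point-cloud index belonging to this key
    int pointCount = 0;     // number of point-cloud points assigned to this key
    glm::vec3 position{};   // 3D position used when drawing the key
};

std::vector<Key> createKeys(const std::vector<glm::vec3>& regionCloud, int numKeys)
{
    std::vector<Key> keys(numKeys);
    int perKey = static_cast<int>(regionCloud.size()) / numKeys;   // points per key
    for (int k = 0; k < numKeys; ++k)
    {
        keys[k].firstIndex = k * perKey;
        keys[k].pointCount = perKey;
        glm::vec3 sum(0.0f);
        for (int i = 0; i < perKey; ++i)
            sum += regionCloud[keys[k].firstIndex + i];
        keys[k].position = sum / static_cast<float>(perKey);       // centroid of the slice
    }
    return keys;
}
```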
Step 3: Piano key press detection
When the user plays the piano after setup is finished, whether the user presses a key is detected in real time, i.e. press detection is performed.
The point-cloud values of the fingers are compared with the point-cloud counts of the key areas to judge changes in key state: once the point-cloud count of the corresponding area is detected to change, the finger position lies within the key area at that moment, and the magnitude of the change is consistent with a finger, the key state is considered to have changed, i.e. the key is judged to be pressed. This double criterion improves the accuracy of piano playing and gives the user a good experience;
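For illustration, a sketch of the double criterion: the key's current point-cloud count must deviate from its untouched baseline by a finger-sized amount, and the fingertip must lie inside the key area; all names, the bounds representation, and the fingertip test are assumptions made for the example:

```cpp
// Sketch: double-criterion press test for one key.
#include <cstdlib>
#include <glm/glm.hpp>

struct KeyState
{
    int   baselineCount;   // point-cloud count of the untouched key (calibrated earlier)
    int   currentCount;    // point-cloud count measured in this frame
    float minX, maxX;      // key area bounds in the point-cloud coordinates
    float minZ, maxZ;
};

bool isPressed(const KeyState& k, const glm::vec3& fingertip,
               int fingerMinDelta, int fingerMaxDelta)
{
    int delta = std::abs(k.currentCount - k.baselineCount);
    // Criterion 1: the change in point count matches the size of a finger,
    // not a whole hand or a large occluding object.
    bool fingerSizedChange = delta >= fingerMinDelta && delta <= fingerMaxDelta;
    // Criterion 2: the fingertip position lies inside this key's area.
    bool fingerInside = fingertip.x >= k.minX && fingertip.x <= k.maxX &&
                        fingertip.z >= k.minZ && fingertip.z <= k.maxZ;
    return fingerSizedChange && fingerInside;
}
```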
Step 4: Corresponding note playing
Through the key press detection, if a key is pressed the corresponding note is played, and at the same time the virtual key changes color to show the user that the press has been detected.
When a virtual key is detected as pressed, the virtual piano responds immediately by playing the note sound file corresponding to that key and changing the key's color in the interface, indicating that the corresponding key has been pressed; considering that a user may play quickly, the system has a high real-time requirement;
The method uses the midi player built into the Windows system to play sound and sends a play command to the virtual key system through the midiOutShortMsg() function. Because the sound quality degrades when several notes are played at the same time, a public professional piano sound library with higher quality, the Acoustica Pianissimo sound library, is used so that the keys sound better; and so that notes do not conflict when several keys are pressed simultaneously, a thread pool is used instead of misusing a single thread, ensuring that notes played at the same time do not affect one another.
Since the depth map obtained and drawn with OpenCV has many holes and noise and its edges are not smooth, the invention filters the image; median filtering is used so that the edges are not blurred, and a 5×5 window is applied to the acquired depth data in the update() function.
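A sketch of the 5×5 median filter on the 16-bit depth frame using OpenCV (cv::medianBlur accepts 16-bit input for a 5×5 aperture); the wrapping function is illustrative:

```cpp
// Sketch: 5x5 median filter on the raw 16-bit depth image to suppress holes and
// speckle noise while keeping edges sharp (assumes OpenCV is available).
#include <opencv2/imgproc.hpp>
#include <vector>

void medianFilterDepth(std::vector<unsigned short>& depth, int width, int height)
{
    cv::Mat img(height, width, CV_16UC1, depth.data());   // wraps the buffer, no copy
    cv::Mat filtered;
    cv::medianBlur(img, filtered, 5);                      // 5x5 window; CV_16U is allowed for ksize 5
    filtered.copyTo(img);                                  // write the result back into the buffer
}
```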
In the normal map calculation of step 1.4, if each point were simply combined with its adjacent points into a triangle and the triangle's normal vector used, a change in one point would affect the final normal direction; using more surrounding points makes the normal more stable, but the computation becomes too large. Borrowing the idea of a two-dimensional image filter, the 8 neighborhood points around the point of interest are used as adjacent points. Because the pixels of the image carry depth information, the depth of these 8 neighborhood points must be checked: a neighborhood point is used as an adjacent point for the subsequent calculation only when the difference between its depth and the depth of the central point is within 5 percent of the central point's depth value.
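A sketch of this neighbor selection rule: the 8 neighborhood pixels are accepted only if their depth differs from the central pixel's depth by no more than 5 percent; the names and parameter layout are illustrative, and the accepted points would feed the least-squares fit sketched above:

```cpp
// Sketch: collect the 8-neighborhood points whose depth is within 5% of the center depth.
#include <glm/glm.hpp>
#include <cmath>
#include <vector>

std::vector<glm::vec3> validNeighbors(const std::vector<glm::vec3>& cloud,
                                      const std::vector<unsigned short>& depth,
                                      int width, int height, int x, int y)
{
    std::vector<glm::vec3> nbrs;
    float centerDepth = static_cast<float>(depth[y * width + x]);
    if (centerDepth == 0.0f) return nbrs;                 // no measurement at the center
    for (int dy = -1; dy <= 1; ++dy)
    {
        for (int dx = -1; dx <= 1; ++dx)
        {
            if (dx == 0 && dy == 0) continue;             // skip the center itself
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            float d = static_cast<float>(depth[ny * width + nx]);
            // Accept the neighbor only if its depth deviates by at most 5% of the center depth.
            if (std::fabs(d - centerDepth) <= 0.05f * centerDepth)
                nbrs.push_back(cloud[ny * width + nx]);
        }
    }
    return nbrs;                                          // feed these into the least-squares fit
}
```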
The finger distinguishing process is further explained. When an object moves through the key selection area, the point-cloud count of that area changes, so a global variable is set to record the number of image frames refreshed by the camera during the time the object moves; the point-cloud counts of at least the last 20 of these frames are obtained, the highest and lowest values are removed, and the average of the remaining values is used as the current point-cloud count, which reduces the error as much as possible without occupying extra resources.
Playing a single note by invoking midi gives an acceptable result, but when several notes are played at the same time the blending of the sound is too poor. To use the professional audio library freely, an audio file is defined for each note; a thread pool is created, and for each pressed key a thread plays the corresponding audio file. The thread pool solves the problem of thread reuse and thus reduces overhead.
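A sketch of the thread-pool idea for note playback: a fixed set of worker threads pulls play requests from a queue, so pressing many keys at once does not spawn unbounded threads. The playback callback is left abstract because the patent does not give the actual sound-library API; the class and its names are assumptions for the example.

```cpp
// Sketch: a minimal fixed-size thread pool that plays queued note files.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

class NotePool
{
public:
    NotePool(size_t workers, std::function<void(const std::string&)> playFile)
        : play_(std::move(playFile))
    {
        for (size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }
    ~NotePool()
    {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    // Called from the key-press handler: queue the note file of the pressed key.
    void enqueue(const std::string& noteFile)
    {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(noteFile); }
        cv_.notify_one();
    }
private:
    void run()
    {
        for (;;)
        {
            std::string file;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
                if (stop_ && queue_.empty()) return;
                file = queue_.front();
                queue_.pop();
            }
            play_(file);   // e.g. a blocking call into the piano sound library
        }
    }
    std::function<void(const std::string&)> play_;
    std::vector<std::thread> threads_;
    std::queue<std::string> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```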
Even closely spaced keys contain different numbers of point-cloud points, so their thresholds should differ; a fixed per-key point count and a fixed variation range are clearly not advisable. The method first obtains the point-cloud count of each key while it is untouched over more than 20 frames, removes the highest and lowest values and averages the rest, and takes ten percent of each key's total point-cloud count as its threshold. When the fluctuation of the average point-cloud count of a key is smaller than this threshold, the key neither changes color nor sounds; as shown in FIG. 7, the key state is not affected by an object occluding the space in front of the key.
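A sketch of this per-key calibration: average each untouched key's point count over more than 20 frames with the highest and lowest samples dropped, and set the threshold to ten percent of the key's total point count; the structure and names are illustrative:

```cpp
// Sketch: per-key baseline and threshold from >20 untouched frames (trimmed mean).
#include <algorithm>
#include <numeric>
#include <vector>

struct KeyCalibration
{
    double baseline = 0.0;   // average untouched point-cloud count
    double threshold = 0.0;  // 10% of the key's total point count
};

KeyCalibration calibrateKey(std::vector<int> counts /* >20 per-frame counts */, int totalPoints)
{
    KeyCalibration cal;
    if (counts.size() > 2)
    {
        std::sort(counts.begin(), counts.end());
        // Drop the lowest and highest samples, then average the rest.
        double sum = std::accumulate(counts.begin() + 1, counts.end() - 1, 0.0);
        cal.baseline = sum / static_cast<double>(counts.size() - 2);
    }
    cal.threshold = 0.10 * totalPoints;   // fluctuations below this neither color nor sound the key
    return cal;
}
```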
In order to verify the feasibility of the technical solution of the invention, a computer was connected to the Kinect device, the system of the invention was installed, and several public demonstrations were carried out: playing with the hands on a classroom desktop and playing with the feet on a dormitory floor. The invention achieved good timbre and sound quality and a fast response speed.

Claims (7)

1. A method for constructing a virtual piano playing system based on Kinect is characterized by comprising the following steps:
step 1: three-dimensional reconstruction of a scene
Opening the Kinect device camera, capturing spatial depth information, obtaining the depth information of the real scene with the camera, calculating the point-cloud coordinates of each pixel in every frame of the depth image, triangulating the point cloud, and after triangulation calculating the normal vector of each pixel from its coordinate and depth information, thereby realizing three-dimensional reconstruction;
the detailed steps are as follows:
step 1.1: depth map acquisition
Controlling the Kinect with Microsoft's Kinect SDK and using openFrameworks for three-dimensional drawing; when three-dimensional reconstruction is started, setting the total number of point-cloud points for OpenGL and the three-dimensional scene, marking out the key area of the virtual piano on the screen with mouse clicks, obtaining the depth information captured by the Kinect camera with the NuiImageStreamGetNextFrame() function of the SDK, updating the depth information frame by frame in update(), and storing the depth information of each frame in a preset buffer;
step 1.2: point cloud picture acquisition
Establishing a class and, from the depth information obtained in step 1.1, calculating the point cloud with the NuiTransformDepthImageToSkeleton() function of the Kinect SDK; the resulting point cloud is stored as a matrix in which each element represents one point of the point cloud and corresponds to the pixel with the same row and column coordinates in the depth image;
step 1.3: point cloud triangularization
After the point cloud data matrix is obtained, the point cloud is triangulated:
the point cloud is treated as an image and traversed; for each point, all of its adjacent points in the point-cloud space are obtained, and with the OpenGL primitive-connection functions each point is connected with the two preceding points into a triangular face in front-to-back order, i.e. each row of the point cloud is triangulated into a strip;
step 1.4: vertex normal map
Selecting the direction facing the camera as the positive direction of all normals, and fitting an optimal plane to all points adjacent to a point by the least squares method, wherein the normal direction of the optimal plane is the normal direction of the point and the sum of squared distances from the surrounding adjacent points to the optimal plane is minimal, namely the following formula is minimal:

M = Σ_i (a x_i + b y_i + c z_i − 1)²    (1)

where i indexes the adjacent points of the point under calculation, M is the sum of squared distances from these surrounding adjacent points to the optimal plane, (a, b, c) is the normal vector of the plane, and x_i, y_i, z_i are the coordinates of point i; taking the partial derivative of M with respect to each of the three parameters of the normal vector gives:

∂M/∂a = 2 Σ_i (a x_i + b y_i + c z_i − 1) x_i
∂M/∂b = 2 Σ_i (a x_i + b y_i + c z_i − 1) y_i
∂M/∂c = 2 Σ_i (a x_i + b y_i + c z_i − 1) z_i

for equation (1) to attain its minimum, all three partial derivatives must be zero, which yields the three equations:

a Σ_i x_i² + b Σ_i x_i y_i + c Σ_i x_i z_i = Σ_i x_i
a Σ_i x_i y_i + b Σ_i y_i² + c Σ_i y_i z_i = Σ_i y_i
a Σ_i x_i z_i + b Σ_i y_i z_i + c Σ_i z_i² = Σ_i z_i

according to Cramer's rule, the solution that minimizes equation (1) is

a = D_a / D,  b = D_b / D,  c = D_c / D

wherein D is the determinant of the coefficient matrix and D_a, D_b, D_c are the determinants obtained by replacing the corresponding column of D with the right-hand side; once a normal has been computed for every point, the environmental scene of the virtual piano system is reconstructed;
Step 2: Generation of virtual keys
In the scene reconstructed from the Kinect data, clicking on the PC screen with the mouse, on any plane without occluding objects, to select the specific area in which keys are to be generated, drawing with OpenGL, and constructing the virtual keys in that area;
after the virtual piano environment scene has been constructed, selecting the key generation position in the constructed virtual environment: the user clicks the selection area with the mouse, mousePressed() listens for the clicks, and the two-dimensional screen coordinates of the user's key selection area are obtained; these two-dimensional coordinates are screen coordinates whose origin is the center of the computer screen, whereas the three-dimensional keys are built from the point-cloud coordinates obtained in step 1.2, which lie in the depth-image space whose origin is the Kinect camera; the point-cloud data computed earlier are stored as a matrix, and the corresponding point-cloud coordinates are obtained from the row-column correspondence between screen coordinates and depth-image coordinates;
acquiring the point-cloud coordinates of the key area in the Update() function, calling the createKey() function, and calculating the three-dimensional coordinate of the spatial position of each key; the number of keys is set according to the screen size and the frame rate of the device, and dividing the total number of point-cloud points set in step 1.1 by the current number of keys gives the number of points per key; in the draw() function the three-dimensional keys are drawn from the OpenGL camera viewpoint, i.e. the key area drawn on the screen with the mouse is rendered directly; after the virtual keys are set, the note file corresponding to the position of each key is played by means of a play command;
Step 3: Piano key press detection
when the user plays the piano after setup is finished, detecting in real time whether the user presses a key, i.e. performing press detection;
comparing the point-cloud values of the fingers with the point-cloud counts of the key areas to judge changes in key state: once the point-cloud count of the corresponding area is detected to change, the finger position lies within the key area at that moment, and the magnitude of the change is consistent with a finger, the key state is considered to have changed, i.e. the key is judged to be pressed;
Step 4: Playing of corresponding musical notes
through the key press detection, if a key is pressed the corresponding note is played, and the virtual key changes color to show the user that the press has been detected.
2. The method for constructing a virtual piano playing system based on Kinect as claimed in claim 1, wherein:
in order to realize real-time playing, firstly, asynchronous playing is adopted so that playing a note does not affect the graphics display, i.e. the graphics are displayed while the key is pressed; secondly, a separate thread is started to play the sound, so that sound playback and the display part do not affect each other.
3. The method for constructing a virtual piano playing system based on Kinect as claimed in claim 1, wherein:
the method is characterized in that the Windows system is used for playing sound, a midi player being built into the Windows system; a play command is sent to the virtual key system through the midiOutShortMsg() function, the Acoustica Pianissimo sound library is used when several notes are played at the same time, and a thread pool is called when several keys are pressed simultaneously to avoid misusing a single thread.
4. The method for constructing a virtual piano playing system based on Kinect as claimed in claim 1, wherein:
the image is median filtered, using a 5×5 window on the acquired depth data in the update() function.
5. The method for constructing a virtual piano playing system based on Kinect as claimed in claim 1, wherein:
in the normal map calculation of step 1.4, the 8 neighborhood points around the point are used as adjacent points; the depth of these 8 neighborhood points is checked, and a neighborhood point is used as an adjacent point for the subsequent calculation only when the difference between its depth and the depth of the central point is within 5% of the central point's depth value.
6. The method for constructing a virtual piano playing system based on Kinect as claimed in claim 1, wherein:
when an object moves through the key selection area, the point-cloud count of that area changes; a global variable is set to record the number of image frames refreshed by the camera during the time the object moves, the point-cloud counts of at least the last 20 of these frames are obtained, the highest and lowest values are removed, and the average of the remaining values is used as the current point-cloud count.
7. The method for constructing a virtual piano playing system based on Kinect as claimed in claim 1, wherein:
firstly, the point-cloud count of each key while it is untouched is obtained, more than 20 frames are selected, the highest and lowest values are removed and the average is computed, and ten percent of each key's total point-cloud count is taken as the threshold; when the fluctuation of the average point-cloud count of a key is smaller than this threshold, the key neither changes color nor sounds.
CN201811243690.8A 2018-10-24 2018-10-24 Kinect-based virtual piano playing system construction method Active CN109448131B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811243690.8A CN109448131B (en) 2018-10-24 2018-10-24 Kinect-based virtual piano playing system construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811243690.8A CN109448131B (en) 2018-10-24 2018-10-24 Kinect-based virtual piano playing system construction method

Publications (2)

Publication Number Publication Date
CN109448131A CN109448131A (en) 2019-03-08
CN109448131B true CN109448131B (en) 2022-07-26

Family

ID=65548118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811243690.8A Active CN109448131B (en) 2018-10-24 2018-10-24 Kinect-based virtual piano playing system construction method

Country Status (1)

Country Link
CN (1) CN109448131B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022052941A1 (en) * 2020-09-09 2022-03-17 桂林智神信息技术股份有限公司 Intelligent identification method and system for giving assistance with piano teaching, and intelligent piano training method and system
CN112420006B (en) * 2020-10-30 2022-08-05 天津亚克互动科技有限公司 Method and device for operating simulated musical instrument assembly, storage medium and computer equipment
CN113158906B (en) * 2021-04-23 2022-09-02 天津大学 Motion capture-based guqin experience learning system and implementation method
CN114359314B (en) * 2022-03-18 2022-06-24 之江实验室 Real-time visual key detection and positioning method for humanoid piano playing robot

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648083A (en) * 2016-12-09 2017-05-10 广州华多网络科技有限公司 Playing scene synthesis enhancement control method and device
CN107067879A (en) * 2017-04-07 2017-08-18 济宁学院 A kind of intelligent Piano Teaching system
CN107329660A (en) * 2017-07-03 2017-11-07 武汉理工大学 A kind of piano class network virtual musical instrument

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8928590B1 (en) * 2012-04-03 2015-01-06 Edge 3 Technologies, Inc. Gesture keyboard method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648083A (en) * 2016-12-09 2017-05-10 广州华多网络科技有限公司 Playing scene synthesis enhancement control method and device
CN107067879A (en) * 2017-04-07 2017-08-18 济宁学院 A kind of intelligent Piano Teaching system
CN107329660A (en) * 2017-07-03 2017-11-07 武汉理工大学 A kind of piano class network virtual musical instrument

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Indoor localization on depth data and digital-piano decoder algorithm (D-PDA); Juan Zhang et al.; 2014 IEEE International Conference on Information and Automation (ICIA); 2014-10-23; pp. 244-249 *
A preliminary study of gesture control technology based on Leap Motion (基于Leap Motion的手势控制技术初探); Huang Jun et al.; 《计算机系统应用》 (Computer Systems & Applications); 2015-10-15 (No. 10); pp. 261-265 *
Implementation of a virtual band based on virtual reality technology (基于虚拟现实技术的虚拟乐队的实现); Cheng Xuemin et al.; 《现代科学仪器》 (Modern Scientific Instruments); 2001-08-25 (No. 04); pp. 38-40 *

Also Published As

Publication number Publication date
CN109448131A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109448131B (en) Kinect-based virtual piano playing system construction method
Cai et al. RGB-D datasets using microsoft kinect or similar sensors: a survey
Fels Intimacy and embodiment: implications for art and technology
US8165345B2 (en) Method, system, and computer program for detecting and characterizing motion
CN105073210B (en) Extracted using the user's body angle of depth image, curvature and average terminal position
US9684968B2 (en) Method, system and computer program for detecting and characterizing motion
GB2543913A (en) Virtual conference room
CN105872381A (en) Interesting image shooting method
CN110178158A (en) Information processing unit, information processing method and program
Berard et al. The object inside: Assessing 3d examination with a spherical handheld perspective-corrected display
JP2010017360A (en) Game device, game control method, game control program, and recording medium recording the program
Khattak et al. A real-time reconstructed 3D environment augmented with virtual objects rendered with correct occlusion
Maes et al. Dance-the-Music: an educational platform for the modeling, recognition and audiovisual monitoring of dance steps using spatiotemporal motion templates
KR102009400B1 (en) Method for providing realistic type image contents and server using the same
JP2015097639A (en) Karaoke device, dance scoring method, and program
US11393153B2 (en) Systems and methods performing object occlusion in augmented reality-based assembly instructions
US20200293127A1 (en) Device, method, and program for generating multidimensional reaction-type image, and method, and program for reproducing multidimensional reaction-type image
CN112973110A (en) Cloud game control method and device, network television and computer readable storage medium
TW201016275A (en) Image system for adjusting displaying angle by detecting human face and visual simulation control apparatus thereof
KR20020011851A (en) Simulation game system using machine vision and pattern-recognition
Caggianese et al. Design and preliminary evaluation of a touchless interface for manipulating virtual heritage artefacts
KR102495213B1 (en) Apparatus and method for experiencing augmented reality-based screen sports
Schwede et al. HoloR: Interactive mixed-reality rooms
WO2021065694A1 (en) Information processing system and method
KR20120092960A (en) System and method for controlling virtual character

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant