CN115870980A - Vision-based piano playing robot control method and device - Google Patents


Info

Publication number
CN115870980A
Authority
CN
China
Prior art keywords
image
pixel
robot
paw
mechanical arm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211585522.3A
Other languages
Chinese (zh)
Inventor
罗俊琦
朱留存
陈明友
张振宇
刘纪元
经展鹏
李亮
白文豪
洪培涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beibu Gulf University
Original Assignee
Beibu Gulf University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beibu Gulf University
Priority to CN202211585522.3A
Publication of CN115870980A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a vision-based piano playing robot control method and device, wherein the method comprises the following steps: calibrating the paw at the tail end of the robot and the keys by using the acquired visual information to obtain the transformation relation between the key image coordinate system and the paw space coordinate system; performing autonomous recognition by using the acquired music score image to obtain music score information; planning the optimal action combination of the mechanical arm and the paw according to the music score information; and determining control instructions according to the optimal action combination and sending them to the mechanical arm and the paw so as to make the mechanical arm and the paw act. The invention realizes convenient "hand-piano" calibration of the piano playing robot and autonomous recognition of music score images, optimizes the playing action combination of the piano playing robot, and effectively improves the playing efficiency and effect of the piano playing robot.

Description

Vision-based piano playing robot control method and device
Technical Field
The invention relates to the technical field of intelligent robots, in particular to a method and a device for controlling a piano playing robot based on vision.
Background
Pianos, which produce sound using the basic principles of strings, damping and vibration, are very popular instruments. With the development of intelligent robot technology in recent years, piano playing has gradually expanded from a typically human task to an automated playing task executed by robots, showing good social and economic application value.
The existing piano playing robot is composed of a mechanical arm and a paw: the mechanical arm moves to the playing position and the paw presses the keys to play music. This gives rise to the following technical problems:
(1) In order to realize accurate robot playing, the paw and the keys are usually calibrated one by one in a teaching mode, and the calibration must be redone whenever the relative position of the robot and the keys changes, so the procedure is complex and time-consuming.
(2) The music score played by the robot needs to be input manually, which adversely affects the flexibility of the robot's playing task.
(3) The mapping between the mechanical fingers and the keys is fixed, so the action combination of the arm and paw cannot be flexibly planned according to the music score information, which restricts the playing effect.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a vision-based piano playing robot control method and device, which can realize convenient calibration of a 'hand-piano' of the piano playing robot and autonomous recognition of a music score image, optimize the playing action combination of the piano playing robot and effectively improve the playing efficiency and effect of the piano playing robot.
In order to achieve the purpose, the invention provides the following scheme:
a vision-based piano playing robot control method comprises the following steps:
calibrating the paw and the key at the tail end of the robot by using the acquired visual information to obtain a transformation relation between a key image coordinate system and a paw space coordinate system;
performing autonomous identification by using the acquired music score image to obtain music score information;
planning the optimal action combination of the mechanical arm and the paw according to the music score information;
and determining a control command according to the optimal action combination, and sending the control command to the mechanical arm and the paw so as to enable the mechanical arm and the paw to act.
Preferably, the calibrating the claws and the keys at the tail end of the robot by using the acquired visual information to obtain the transformation relation between the key image coordinate system and the claw space coordinate system includes:
performing hand-eye calibration of the robot by using the Zhang Zhengyou calibration method, and acquiring the transformation matrix between the camera coordinate system and the robot base coordinate system;
extracting key image position information according to the visual information;
acquiring the transformation relation between the position information of each image and the coordinates of a camera space coordinate system;
and acquiring the transformation relation between the position information of each image and the coordinates of the paw space coordinate system.
Preferably, the extracting key image position information from the visual information includes:
acquiring a first keyboard region image of a target musical instrument;
placing a target musical instrument in a complete field of view right below a depth camera, and acquiring a musical instrument image;
taking the keyboard area image as a preset image recognition model, taking the instrument image as an image to be recognized, and executing a scale-invariant feature transformation algorithm to obtain a second keyboard area image of the target instrument;
carrying out binarization processing on the second keyboard region image to obtain a binarization image;
acquiring a pixel value of the binary image in the y direction, and performing horizontal filtering on the binary image by using a preset threshold value to obtain an upper part image containing a black key and a lower part image not containing the black key; the preset threshold is determined by the pixel value;
filtering out the slender black pixel area of the upper part of the image, namely removing the image of the gap area between the adjacent white keys;
performing edge extraction and enhancement on the residual black pixel blocks to obtain a circumscribed rectangle corresponding to each black key;
selecting an image coordinate corresponding to the geometric center of the circumscribed rectangle as an image coordinate of the black key;
performing edge extraction and enhancement on the white pixel block of the lower part image, and selecting an image coordinate corresponding to the geometric center of the circumscribed rectangle as an image coordinate of a white key;
and calculating the abscissa and the ordinate of all the keys in the image coordinate system, and directly acquiring the depth information of the keys in the camera coordinate system by using a depth camera.
Preferably, the performing the autonomous recognition by using the acquired music score image to obtain music score information includes:
carrying out image preprocessing on the music score image to obtain a preprocessed image; the image preprocessing comprises image binarization, staff-line region segmentation, staff-line removal, abnormal note image restoration and note image segmentation;
carrying out note recognition according to the connected domains in the preprocessed image to obtain note recognition information; the note recognition includes: stem recognition, note-head recognition, tail and beam recognition, and pitch recognition;
and recognizing accidentals and rests according to the music score image to obtain symbol recognition information.
Preferably, the carrying out image preprocessing on the music score image to obtain a preprocessed image includes:
searching for the optimal threshold value minimizing the intra-class variance by using the maximum between-class variance method, and obtaining a binarized music score image according to the optimal threshold value and the music score image;
projecting the binarized music score image data in the horizontal direction by using horizontal projection to construct a histogram;
acquiring 5 peak values of the histogram in the y-axis direction, and constructing a peak value vector from the 5 peak values; the pixel region in the horizontal direction between the maximum value and the minimum value in the peak value vector is the staff image region;
removing all black pixels in the horizontal direction of each element in the peak value vector to obtain an image with staff lines removed;
recovering the notes whose shapes were changed by the removed staff lines by using morphological processing based on the image with staff lines removed, to obtain a restored image;
searching each pixel in the restored image until the current pixel has a value of 255 and has not been marked;
selecting the current pixel as a seed pixel, assigning a label to the current pixel, and checking each pixel adjacent to the seed pixel;
if a pixel with the value of 255 is adjacent to the seed pixel, returning to the step of selecting the current pixel as the seed pixel, assigning a label to the current pixel, and checking each pixel adjacent to the seed pixel until a pixel with the pixel value of 255 cannot be found;
and updating the label and returning to the step of searching each pixel in the restored image until the value of the current pixel is 255 and not marked, and obtaining the marked connected domain after all the pixels in the image are checked.
Preferably, the performing note recognition according to the connected domains in the preprocessed image to obtain note recognition information includes:
based on the connected domains, applying vertical projection and obtaining a histogram;
determining the positions of the stems according to the peaks of the histogram;
traversing the heights of the circumscribed rectangles of the connected domains, and taking the connected domains whose circumscribed-rectangle height differs from the staff spacing by less than 6 as candidate note heads;
determining the symmetry rate of each connected component according to the candidate note heads;
determining the recognized note heads according to the symmetry rate;
removing the stems and note heads in the connected domains, performing a primary screening on the resulting image, and retaining the images whose width is larger than 2 times the preset width and whose length is larger than the staff spacing;
carrying out Hough transform on the retained images for straight-line feature detection, and identifying the images with straight-line features as beams; the beam count is the number of straight-line features in the Hough transform detection result;
separating the stem and the tail, and defining the reference line as the position of the third line of the staff;
deducing the pitch from the difference between the note-head center position and the reference line.
Preferably, the recognizing accidentals and rests according to the music score image to obtain symbol recognition information includes:
detecting whether a connected-region mark exists in the beginning region of the staff and in the region to the left of each note;
if so, normalizing the image to the same size as the template image, and applying a logical calculation formula to obtain a result image;
and determining the template image corresponding to the result image with a matching degree larger than 0.8 and the maximum number of pixels with value 255 as the corresponding accidental or rest.
Preferably, planning the optimal motion combination of the mechanical arm and the paw according to the music score information comprises:
mapping the note sequence into a key image coordinate sequence according to the one-to-one correspondence of the notes and the keys;
the conversion of the musical notes to the target positions of the robot is realized by the conversion relation between the key image coordinate system and the paw space coordinate system and the key image coordinate sequence;
identifying a music score image, constructing a note sequence according to the playing sequence, and calculating the number of notes to be played;
initializing particle population parameters, and randomly generating various robot playing action sequences aiming at the music score with the note number;
acquiring a path length fitness function and a mechanical arm moving time fitness function of the action sequence, and determining a final particle fitness function according to the path length fitness function and the mechanical arm moving time fitness function;
evaluating the initial fitness of each particle, and calculating the initial fitness function of each particle to obtain the initial individual position of each particle at a position node and the optimal position of the particle swarm overall at the position node;
updating the speed and the position of each particle by using a preset updating formula;
calculating the fitness of all the particles, obtaining the individual optimal position vector of each particle at the position node, and updating the individual optimal position searched by the particle at the position node and the optimal position of the position node of the particle swarm overall by utilizing an updating strategy;
and judging whether the maximum iteration times or the convergence condition is met, if so, outputting an optimal action sequence, and otherwise, jumping to the step of updating the speed and the position of each particle by using a preset updating formula.
Preferably, the determining a control command according to the optimal action combination and sending the control command to the mechanical arm and the gripper comprises:
sending control instructions containing the information of the optimal action combination to the mechanical arm and the paw through a bus protocol; the control instructions are executed sequentially, and a closed-loop detection mechanism is added to the mechanical arm position instructions.
A vision-based piano playing robot control device is used for the vision-based piano playing robot control method, and the device comprises:
the device comprises a depth camera, a paw, a six-degree-of-freedom mechanical arm and an upper computer control cabinet;
the tail end of the six-degree-of-freedom mechanical arm is connected with the paw, and the tail end of the six-degree-of-freedom mechanical arm is also connected with the depth camera through a switching frame and used for acquiring a target image and depth information. The six-degree-of-freedom mechanical arm is communicated with the upper computer control cabinet through an interface; a nonvolatile storage, a random access memory and a processor are arranged in the upper computer control cabinet; the nonvolatile memory is used for storing computer programs and image information and is coupled with the memory; the random access memory is used for loading the executed computer program and image information; the processor is used for realizing the automatic identification of the music score image and planning the optimal action combination of the mechanical arm and the paw.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a vision-based piano playing robot control method and a vision-based piano playing robot control device, wherein the method comprises the following steps: calibrating the paw and the key at the tail end of the robot by using the acquired visual information to obtain a transformation relation between a key image coordinate system and a paw space coordinate system; performing autonomous identification by using the acquired music score image to obtain music score information; planning the optimal action combination of the mechanical arm and the paw according to the music score information; and determining a control instruction according to the optimal action combination, and sending the control instruction to the mechanical arm and the paw so as to enable the mechanical arm and the paw to act. The invention can realize convenient calibration of a 'hand-organ' of the piano playing robot and autonomous identification of music score images, optimizes the playing action combination of the piano playing robot and effectively improves the playing efficiency and effect of the piano playing robot.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a method provided by an embodiment of the present invention;
FIG. 2 is an original input image of a paper music score provided by an embodiment of the present invention;
FIG. 3 is a diagram illustrating the effect of note identification on a music score image according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, the inclusion of a list of steps, processes, methods, etc. is not limited to only those steps recited, but may alternatively include additional steps not recited, or may alternatively include additional steps inherent to such processes, methods, articles, or devices.
The invention aims to provide a vision-based piano playing robot control method and device, which can realize convenient calibration of a piano playing robot 'hand-piano' and autonomous identification of music score images, optimize the playing action combination of the piano playing robot and effectively improve the playing efficiency and effect of the piano playing robot.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a method provided in an embodiment of the present invention, and as shown in fig. 1, the present invention provides a vision-based piano playing robot control method, including:
Step 1: calibrating the paw and the keys at the tail end of the robot by using the acquired visual information to obtain the transformation relation between the key image coordinate system and the paw space coordinate system;
Step 2: performing autonomous recognition by using the acquired music score image to obtain music score information;
Step 3: planning the optimal action combination of the mechanical arm and the paw according to the music score information;
Step 4: determining control commands according to the optimal action combination, and sending the control commands to the mechanical arm and the paw so as to make the mechanical arm and the paw act.
Specifically, step 1 in this embodiment: the robot hand-piano calibration module calibrates the paw at the tail end of the robot and the keys by using visual information, realizing the position transformation between key image coordinates and paw coordinates. The robot hand-piano calibration method specifically comprises the following steps:
S1.1: Use the Zhang Zhengyou calibration method to perform hand-eye calibration of the robot, and obtain the transformation matrix ${}^{b}T_{c}$ between the camera coordinate system and the robot base coordinate system.
S1.2: Extract the key image position information. The method specifically comprises the following steps:
S1.2.1: Acquire a keyboard region image img_benchmark of the target musical instrument.
S1.2.2: Place the target musical instrument in the complete field of view directly below the depth camera, and acquire an image img_piano.
S1.2.3: Take the image img_benchmark as the preset image recognition template and the image img_piano as the image to be recognized, and execute the Scale-Invariant Feature Transform (SIFT) algorithm to obtain the keyboard region image img_keyboard of the target instrument.
S1.2.4: Binarize the image img_keyboard to obtain the image img_bw.
S1.2.5: Acquire the pixel count pixnum_y of the image img_bw in the y direction and, taking 0.67·pixnum_y as the threshold, filter the image horizontally into an upper image img_upper containing the black keys and a lower image img_lower containing no black keys.
S1.2.6: Filter out the slender black pixel areas of the image img_upper, i.e., remove the images of the gap regions between adjacent white keys. Perform edge extraction and enhancement on the remaining black pixel blocks to obtain the circumscribed rectangle corresponding to each black key. Select the image coordinate corresponding to the geometric center of each circumscribed rectangle as the image coordinate of that black key.
S1.2.7: Perform edge extraction and enhancement on the white pixel blocks of the image img_lower, and select the image coordinate corresponding to the geometric center of each circumscribed rectangle as the image coordinate of the corresponding white key.
S1.2.8: Calculate the abscissa x and ordinate y of all keys in the image coordinate system, and directly acquire the depth information $Z_c$ of the keys in the camera coordinate system using the depth camera.
S1.3: Acquire the transformation relation between each key's image coordinates and the camera coordinate system, specifically: using the key image coordinate information acquired in S1.2, the following transformation is performed according to the camera model:
$$X_c = \frac{x\, Z_c}{f}, \qquad Y_c = \frac{y\, Z_c}{f}$$
where $X_c$ and $Y_c$ are the coordinates of the key in the x and y directions of the camera coordinate system, and $f$ is the camera focal length.
S1.4: Acquire the transformation relation between each key's image coordinates and the paw coordinate system, specifically: the transformation matrix ${}^{h}T_{b}$ between the robot base coordinate system and the paw coordinate system is obtained through the forward kinematics of the robot; the transformation matrix between the camera coordinate system and the paw coordinate system is then
$${}^{h}T_{c} = {}^{h}T_{b}\,{}^{b}T_{c}$$
Finally, the transformation relation between key image coordinates and paw space coordinates is obtained as:
$$[X_h,\; Y_h,\; Z_h,\; 1]^{T} = {}^{h}T_{c}\,[X_c,\; Y_c,\; Z_c,\; 1]^{T}$$
where $X_h$, $Y_h$ and $Z_h$ are the coordinates along the x, y and z axes of the paw coordinate system.
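As a concrete illustration of S1.2-S1.4, a minimal Python/OpenCV sketch of the pipeline from a keyboard image to paw-frame key coordinates could look as follows. It assumes img_keyboard has already been extracted by SIFT (S1.2.3); the 0.67 split ratio comes from the description above, while the slender-region filter width is an illustrative assumption rather than the patented implementation:

```python
import cv2
import numpy as np

def locate_key_centers(img_keyboard):
    """S1.2.4-S1.2.7: binarize, split at 0.67 of the height, then take the
    bounding-rectangle centers of black/white key regions."""
    gray = cv2.cvtColor(img_keyboard, cv2.COLOR_BGR2GRAY)
    _, img_bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    pixnum_y = img_bw.shape[0]
    split = int(0.67 * pixnum_y)            # threshold from S1.2.5
    img_upper, img_lower = img_bw[:split], img_bw[split:]

    black = []
    # black keys are dark blobs in the upper image: invert and find contours
    cnts, _ = cv2.findContours(cv2.bitwise_not(img_upper),
                               cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in cnts:
        x, y, w, h = cv2.boundingRect(c)
        if w > 4:                           # drop slender gap regions (assumed width)
            black.append((x + w // 2, y + h // 2))

    white = []
    cnts, _ = cv2.findContours(img_lower, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
    for c in cnts:
        x, y, w, h = cv2.boundingRect(c)
        white.append((x + w // 2, split + y + h // 2))
    return black, white

def key_image_to_paw(x, y, Z_c, f, T_b_c, T_h_b):
    """S1.3-S1.4: pinhole back-projection, then camera-to-paw transform.
    T_b_c: 4x4 camera-to-base matrix (Zhang hand-eye calibration).
    T_h_b: 4x4 base-to-paw matrix (forward kinematics)."""
    X_c, Y_c = x * Z_c / f, y * Z_c / f     # camera-frame coordinates
    T_h_c = T_h_b @ T_b_c                   # camera-to-paw transform
    return (T_h_c @ np.array([X_c, Y_c, Z_c, 1.0]))[:3]
```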
S2: autonomously identifying a music score image;
the method for autonomously identifying the music score image mainly comprises the following steps: image preprocessing, note identification, inflexion mark and rest mark identification. The original curvelet image is shown in fig. 2.
S2.1: Image preprocessing, specifically including image binarization, staff-line region segmentation, staff-line removal, abnormal note image restoration and note image segmentation.
S2.1.1: Image binarization, which aims to convert the input image into a binarized image. Specifically, the maximum between-class variance (Otsu) method is applied to find the optimal threshold T that minimizes the intra-class variance. Given an input image I and the threshold T, the binarized image $I_B$ can be obtained by the following formula:
$$I_B(x,y) = \begin{cases} 255, & I(x,y) \le T \\ 0, & I(x,y) > T \end{cases}$$
where x and y represent the row and column index values of the image matrix, so that the dark ink pixels map to 255 (foreground) and the background maps to 0.
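In OpenCV terms this threshold search is built in; a one-call sketch, assuming a grayscale input and the ink-as-255 convention used above:

```python
import cv2

def binarize_score(gray):
    # THRESH_BINARY_INV maps dark ink to 255 and background to 0;
    # THRESH_OTSU picks T by the maximum between-class variance criterion.
    T, I_B = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return T, I_B
```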
S2.1.2: Staff-line region localization
First, the binarized music score image data is projected in the horizontal direction using horizontal projection to construct a histogram, specifically:
$$h(y) = \sum_{x=1}^{M} \left[\, I_B(x,y) = 255 \,\right], \qquad 1 \le y \le N$$
where M and N are the numbers of horizontal and vertical pixels of the image, respectively, and h(y) represents the projection value of the y-th pixel row of the binarized image $I_B$.
Then, staff-line region localization is performed, the purpose of which is to acquire the position, width and spacing of the staff lines. Specifically: obtain the 5 peaks of the histogram in the y-axis direction, i.e. the vector $R_p = \{r_{p1}, r_{p2}, r_{p3}, r_{p4}, r_{p5}\}$. The horizontal pixel region between the maximum and minimum values in $R_p$ represents the staff image region. The line width $H_L$ is the width of the histogram peaks. The line spacing $S_L$ is the absolute value of the difference between any two adjacent elements of $R_p$, i.e.:
$$S_L = \left| r_{pn} - r_{p(n+1)} \right|, \qquad 1 \le n \le 4$$
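A sketch of the projection and peak analysis under the foreground-as-255 convention. Taking the five rows with the largest projection values as the staff lines is a simplification that assumes thin lines; the final line already performs the staff-line deletion described in S2.1.3 below:

```python
import numpy as np

def locate_and_remove_staff(I_B):
    """Horizontal projection -> staff rows R_p -> spacing S_L -> removal."""
    h = (I_B == 255).sum(axis=1)             # h(y): foreground pixels per row
    R_p = np.sort(np.argsort(h)[-5:])        # rows of the 5 highest peaks
    S_L = int(np.abs(np.diff(R_p)).mean())   # mean staff-line spacing

    I_sf = I_B.copy()
    I_sf[R_p, :] = 0                         # S2.1.3: delete the staff-line rows
    return I_sf, R_p, S_L
```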
S2.1.3: Staff-line removal, specifically: remove all foreground pixels in the horizontal row of each element of $R_p$, and denote the image with staff lines removed as $I_{sf}$.
S2.1.4: Abnormal note image restoration: the notes whose shapes were changed by staff-line removal are recovered using morphological processing. Specifically, the image closing operation is defined as follows:
$$I_{mp} = (I_{sf} \oplus E) \ominus E$$
where $I_{mp}$ is the morphologically processed image, E is the structuring element, $\oplus$ is the dilation operation, and $\ominus$ is the erosion operation.
S2.1.5: Note image segmentation, with the purpose of segmenting and extracting the individual note images in $I_{mp}$. Specifically, the connected regions are marked: (1) search each pixel in the image until the current pixel P has value 255 and is unmarked; (2) select P as a seed pixel, assign a label L to P, and then check each pixel adjacent to the seed pixel; (3) if a pixel Q with value 255 is adjacent to the seed pixel, return to step (2) until no pixel with value 255 can be found; (4) update the label and return to step (1) until all pixels in the image have been checked.
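S2.1.4-S2.1.5 map directly onto a morphological close followed by connected-component labeling; a sketch where the 3x3 structuring element is an assumed choice, and cv2.connectedComponents stands in for the seed-fill marking described above:

```python
import cv2
import numpy as np

def restore_and_label(I_sf):
    # S2.1.4: closing (dilation then erosion) repairs notes that were cut
    # by staff-line removal
    E = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))   # assumed size
    I_mp = cv2.morphologyEx(I_sf, cv2.MORPH_CLOSE, E)

    # S2.1.5: label every 255-valued connected region
    num_labels, labels = cv2.connectedComponents(
        (I_mp == 255).astype(np.uint8))
    return I_mp, labels, num_labels
```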
S2.2: Note recognition, which aims to recognize the connected domains obtained in S2.1.5; the result is shown in fig. 3. It specifically includes: stem recognition, note-head recognition, tail (beam) recognition, and pitch recognition.
Specifically, once the connected regions of the notes are identified and marked, the last step is to recognize them. On the staff, a note can be structurally divided into a note head, a stem and a tail. The pitch is determined mainly by the position of the note relative to the staff; the tail is used to determine the beat; the stem connects the note head and the tail.
S2.2.1: Stem recognition
Vertical projection is applied and a histogram is obtained to determine the approximate position of each note. In the vertical projection process, the peaks of the histogram are the positions of the stems; thus the stem of each note can be identified.
S2.2.2: Note-head recognition
The height of a note head is generally the same as the staff spacing, so the staff spacing $S_L$ is used as a threshold to determine whether an image region is a note head. Specifically: traverse the height $S_{CC}$ of the circumscribed rectangle of each connected domain, and take the connected components for which the difference between $S_{CC}$ and $S_L$ is less than 6 as candidate note heads L. Since the note-head shape is generally an ellipse, it is symmetric about its center; the note-head recognition rate is therefore further improved by calculating the symmetry rate of the candidate connected-component features. Specifically, for a candidate note head L, a point $P \in L$ is counted as symmetric if there exists a point $P_S \in L$ such that $P - P_C = P_C - P_S$, where $P_C$ is the center of the connected component L, defined as follows:
$$f(P) = \begin{cases} 1, & \exists\, P_S \in L : \; P - P_C = P_C - P_S \\ 0, & \text{otherwise} \end{cases}$$
From the above equation, the following symmetry rate R of each connected component can be obtained:
$$R = \frac{sum_S}{sum_A}, \qquad sum_S = \sum_{P \in L} f(P)$$
where $sum_S$ is the total number of symmetric pixels and $sum_A$ is the total number of all pixels in the connected component. Connected components with R greater than 0.9 are identified as note heads.
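The symmetry test can be written directly over a component's pixel set; a sketch, with the less-than-6 height tolerance and the 0.9 rate taken from the description above:

```python
import numpy as np

def is_note_head(component_mask, S_L):
    """component_mask: boolean image of one connected component."""
    ys, xs = np.nonzero(component_mask)
    height = ys.max() - ys.min() + 1
    if abs(height - S_L) >= 6:           # candidate filter on rectangle height
        return False
    pts = set(zip(ys.tolist(), xs.tolist()))
    cy, cx = ys.mean(), xs.mean()        # component center P_C
    # P is symmetric if its mirror P_S = 2*P_C - P also lies in the component
    sum_S = sum((round(2 * cy - y), round(2 * cx - x)) in pts for y, x in pts)
    return sum_S / len(pts) > 0.9        # symmetry rate R
```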
S2.2.3: Tail (beam) recognition
Tail recognition is performed by an optical character method. Beam recognition is specifically: remove the stems and note heads; perform a primary screening of the image, retaining the images whose width is greater than $2H_L$ and whose length is greater than $S_L$; then perform the Hough transform to detect straight-line features, and identify the images with straight-line features as beams. The beam count equals the number of straight-line features in the Hough transform detection result.
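For the Hough step, a sketch using OpenCV's probabilistic transform on a binarized sub-image; the vote threshold and gap parameters are illustrative choices:

```python
import cv2
import numpy as np

def count_beams(img_candidate, H_L, S_L):
    """Return the number of straight-line features (= beam count) in a
    stem/head-removed 8-bit binary sub-image."""
    h, w = img_candidate.shape
    if not (w > 2 * H_L and h > S_L):        # primary screening from S2.2.3
        return 0
    lines = cv2.HoughLinesP(img_candidate, 1, np.pi / 180, 30,
                            minLineLength=int(2 * H_L),
                            maxLineGap=3)    # parameters are illustrative
    return 0 if lines is None else len(lines)
```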
S2.2.4: Pitch recognition
The pitch of a note is determined by its position relative to the staff. The pitch difference between notes is measured in scale steps, and the distance of each step is half the height of the staff spacing. Therefore, this embodiment determines the pitch of each note by calculating the distance between the note and a reference line. Specifically: separate the stem and the tail (beam), define the reference line as the position of the third staff line ($r_{p3}$), and infer the pitch from the difference between the note-head center position $L_c$ and $r_{p3}$.
S2.3: Accidental and rest recognition
Accidentals are usually notated in two ways: (1) at the beginning of the staff, marking the key signature; (2) on the left side of a note, to adjust the pitch of that note. A rest is a music symbol used to indicate a pause in the music. Unlike a note, a rest has no pitch, so the height of a rest is fixed; however, the shapes of rests of different beats are more irregular than those of notes.
The recognition method of this embodiment specifically employs template matching to identify accidentals and rests. This technique compares an unknown image with the template images known in the database (i.e., template images of all types of accidentals and rests) to identify the symbol. First, the presence of a connected-region mark is detected in the staff head region and in the region to the left of each note. If present, the image is normalized to the same size as the template image, and the logical XOR operation shown below is applied to obtain the result image. Finally, the template image corresponding to the result image with a matching degree greater than 0.8 and the largest number of pixels with value 255 is the corresponding accidental or rest:
$$I_{XOR}(x,y) = I_T(x,y) \oplus I_C(x,y)$$
where $I_T$ represents the template image, $I_C$ represents the sub-image containing the symbol, and $I_{XOR}$ represents the result image after the XOR operation.
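A sketch of the template matching, where defining the matching degree as the fraction of agreeing pixels after XOR is an assumed interpretation of the 0.8 threshold above:

```python
import cv2
import numpy as np

def match_symbol(I_C, templates):
    """templates: dict name -> binarized template image I_T.
    Returns the best-matching accidental/rest name, or None."""
    best_name, best_score = None, 0.0
    for name, I_T in templates.items():
        sub = cv2.resize(I_C, (I_T.shape[1], I_T.shape[0]))  # normalize size
        I_XOR = cv2.bitwise_xor(sub, I_T)
        # assumed: matching degree = fraction of pixels that agree (XOR == 0)
        score = 1.0 - np.count_nonzero(I_XOR) / I_XOR.size
        if score > 0.8 and score > best_score:
            best_name, best_score = name, score
    return best_name
```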
S3: and according to the music score information identified in the S2, constructing a position conversion relation between the note information and the paw coordinates, and planning the optimal action combination of the mechanical arm and the paw.
S3.1: establishing a position transformation relation between the note information and the paw coordinate, specifically comprising the following steps: mapping the note sequence into a key image coordinate sequence according to the one-to-one corresponding relationship between the notes and the keys, and finally realizing the conversion from the notes to the target position of the robot through the conversion relationship between the key image coordinates and the paw coordinates in the S1.
S3.2: and optimizing the optimal action combination of the mechanical arm and the paw.
Each key corresponds to five candidate pressing fingers, so the number of robot playing-action combinations grows exponentially with the number of notes: if the number of notes to be played is n, the robot has $5^{n}$ possible action combinations. Efficiently searching for a good playing combination is therefore a challenging task. The total displacement delay generated by robot motion is the main factor affecting the playing effect: the smaller the displacement amount and the number of displacements of the robot, the smaller the resulting displacement delay and the better the playing effect. Planning the optimal action combination of the mechanical arm and the paw in the present invention essentially converts the playing optimization problem into a minimization problem over the displacement amount and displacement count of the mechanical arm. The invention optimizes the action combination of the mechanical arm and the paw based on the particle swarm optimization algorithm, with the following specific operation steps:
S3.2.1: Recognize the music score image, construct the note sequence according to the playing order, and calculate the number n of notes to be played.
S3.2.2: Initialize the particle population parameters, including: the number of population particles N, and the initial velocity $v_{ij}^{0}$ and initial position $x_{ij}^{0}$ of the j-th dimension of the i-th particle.
S3.2.3: For a music score with n notes, randomly generate N robot playing-action sequences, with the mathematical expression shown in the following formula:
$$X_i = \left\{ x_{i1}^{k},\; x_{i2}^{k},\; \dots,\; x_{in}^{k} \right\}$$
where $X_i$ represents the i-th particle, i.e. the i-th playing-action sequence composed of n position nodes; $x_{in}^{k}$ indicates the position node corresponding to the n-th note in the i-th particle at the k-th iteration (the initial action sequences are generated here with k = 0), and each position node carries its coordinates in the x-, y- and z-axis directions. The initial flight velocity of the particle is $v_{ij}^{0}$.
S3.2.4: Define the particle fitness function. First, the path-length fitness function of an action sequence is defined as follows:
$$f_1 = \sum_{j=1}^{n-1} \left\| x_{i(j+1)}^{k} - x_{ij}^{k} \right\|$$
where $f_1$ is the sum of the path lengths in Cartesian space coordinates, $x_{i(j+1)}^{k}$ represents the coordinates of the next position node on the particle, and $x_{ij}^{k}$ represents the coordinates of the particle's current position node.
The fitness function $f_2$ for the number of mechanical-arm movements in an action sequence is expressed as follows:
$$f_2 = \sum_{j=1}^{n} M_j$$
where $M_j$ represents the number of movements of the mechanical arm at the current position node.
The final particle fitness function F is:
$$F = \alpha f_1 + \beta f_2$$
where α and β are the weighting factors of the respective functions; here α = 0.7 and β = 0.3.
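A direct transcription of this fitness definition, where a particle is an (n, 3) array of Cartesian position nodes and move_counts supplies $M_j$; α = 0.7 and β = 0.3 as above:

```python
import numpy as np

def fitness(nodes, move_counts, alpha=0.7, beta=0.3):
    """nodes: (n, 3) array of position nodes; move_counts: (n,) array M_j."""
    f1 = np.linalg.norm(np.diff(nodes, axis=0), axis=1).sum()  # path length
    f2 = np.sum(move_counts)                                   # movement count
    return alpha * f1 + beta * f2
```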
S3.2.5: Evaluate the initial fitness of each particle: calculate the initial fitness function $F(X_i)$ of each particle to obtain the initial individual best position $p_{ij}^{0}$ of particle i at position node j and the global best position $g_{j}^{0}$ of the particle swarm at position node j.
S3.2.6: Update the velocity and position of each particle, with the update formulas as follows:
$$v_{ij}^{k+1} = w\, v_{ij}^{k} + c_1 r_1 \left( p_{ij}^{k} - x_{ij}^{k} \right) + c_2 r_2 \left( g_{j}^{k} - x_{ij}^{k} \right)$$
$$x_{ij}^{k+1} = x_{ij}^{k} + v_{ij}^{k+1}$$
where $v_{ij}^{k}$ represents the velocity of particle i at position node j at the k-th iteration; $x_{ij}^{k}$ represents the position of particle i at position node j at the k-th iteration; $p_{ij}^{k}$ represents the individual optimal position found by particle i at position node j at the k-th iteration; and $g_{j}^{k}$ represents the global optimal position of the particle swarm at position node j at the k-th iteration. The individual weight factor is $c_1 = 2$ and the global weight factor is $c_2 = 1$; $r_1$ and $r_2$ are random numbers taking values in (0, 1). The inertia weight factor w is taken as:
$$w = \omega_{max} - \left( \omega_{max} - \omega_{min} \right) \frac{k}{k_{max}}$$
where $\omega_{max}$ and $\omega_{min}$ represent the maximum and minimum values of the inertia weight factor, assigned 0.8 and 0.3 respectively; k is the current iteration number, and $k_{max} = 100$ is the maximum number of iterations.
S3.2.7: Calculate the fitness of all particles to obtain the individual optimal position vector $p_{ij}^{k+1}$ of each particle at position node j, and then update $p_{ij}^{k+1}$ and $g_{j}^{k+1}$. The update strategy is specifically:
$$p_{ij}^{k+1} = \underset{p \,\in\, \{p_{ij}^{k},\, x_{ij}^{k+1}\}}{\arg\min}\, F(p)$$
$$g_{j}^{k+1} = \underset{p \,\in\, \{p_{1j}^{k+1},\, \dots,\, p_{Nj}^{k+1}\}}{\arg\min}\, F(p)$$
where the minimum is taken over the fitness values of the candidate positions.
S3.2.8: Judge whether the maximum number of iterations or the convergence condition is met; if so, output the optimal action sequence, otherwise jump to S3.2.6 and continue. Specifically, the convergence condition is that over 10 consecutive iterations the variation of the optimal fitness value is less than $10^{-2}$.
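Condensing S3.2.2-S3.2.8, a minimal PSO sketch over continuous node coordinates; fitness_fn is any callable over an (n, 3) node array (for example, a wrapper around the fitness() sketch above with a fixed movement-count model), and the decoding of nodes into discrete key/finger assignments is omitted:

```python
import numpy as np

def pso_plan(fitness_fn, n_nodes, N=30, k_max=100,
             c1=2.0, c2=1.0, w_max=0.8, w_min=0.3, tol=1e-2):
    rng = np.random.default_rng()
    x = rng.uniform(-1, 1, (N, n_nodes, 3))   # positions x_ij (assumed range)
    v = np.zeros_like(x)                      # velocities v_ij
    p = x.copy()                              # individual bests p_ij
    p_fit = np.array([fitness_fn(xi) for xi in x])
    g = p[p_fit.argmin()].copy()              # global best g_j
    g_fit, stall = p_fit.min(), 0

    for k in range(k_max):
        w = w_max - (w_max - w_min) * k / k_max        # inertia weight
        r1, r2 = rng.random((2, N, 1, 1))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = x + v
        fit = np.array([fitness_fn(xi) for xi in x])
        better = fit < p_fit                           # update individual bests
        p[better], p_fit[better] = x[better], fit[better]
        new_g_fit = p_fit.min()
        stall = stall + 1 if g_fit - new_g_fit < tol else 0
        if new_g_fit < g_fit:                          # update global best
            g, g_fit = p[p_fit.argmin()].copy(), new_g_fit
        if stall >= 10:                                # convergence condition
            break
    return g, g_fit
```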
S4: sending a control command to the mechanical arm and the paw actuator;
and sending the control instruction obtained in the step S3 to the mechanical arm and the paw through a TCP-IP bus protocol by using a control instruction sending module.
In addition, the present embodiment further includes step 5, which specifically includes the following steps:
S5: Action execution module. The control commands sent in S4 are received by a control-command receiving module, and coordinated playing of the arm and paw actions is realized through a serial working mode. The control commands received in S5 alternate between the mechanical arm and the paw: 'arm position command 1 - paw action command 1 - arm position command 2 - paw action command 2 - ... - arm position command n - paw action command n'. Because the mechanical arm and the paw are two different sets of actuators, a concurrent working mode would require a timing system to separately constrain the operating times of the arm and the paw, resulting in high task complexity and poor flexibility. The serial working mode adopted here is specifically: the control commands are executed sequentially, and a closed-loop detection mechanism is added to the arm position commands, described as follows:
$$e = s - s^{*}$$
$$\dot{\theta} = -\lambda\, J^{+}\, e$$
where s and $s^{*}$ represent the current and desired positions respectively, e is the position error, $J^{+}$ is the generalized inverse of the Jacobian matrix, λ is a velocity control coefficient, and $\dot{\theta}$ is the velocity control quantity. In the closed-loop detection, the two formulas are executed cyclically, and only when the error between the tail-end position of the mechanical arm and the target position is less than 1 mm does the loop exit and the next paw control command execute.
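A sketch of this closed-loop check, assuming hypothetical controller callables (get_pose, get_jacobian, send_joint_velocity) since the actual arm API is not specified; λ and the 10 ms period are illustrative, while the 1 mm tolerance follows the description:

```python
import numpy as np
import time

def move_until_reached(s_star, get_pose, get_jacobian, send_joint_velocity,
                       lam=0.5, tol_mm=1.0):
    """Cycle e = s - s*, theta_dot = -lam * J^+ e until |e| < 1 mm, then
    return so the next paw command can execute. The three callables are
    hypothetical stand-ins for the real arm controller interface."""
    while True:
        e = get_pose() - s_star                   # position error (meters)
        if np.linalg.norm(e) * 1000.0 < tol_mm:   # within 1 mm: exit loop
            return
        J = get_jacobian()                        # 3xN arm Jacobian
        theta_dot = -lam * np.linalg.pinv(J) @ e  # generalized-inverse law
        send_joint_velocity(theta_dot)            # hypothetical actuator call
        time.sleep(0.01)                          # assumed 10 ms control period
```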
Referring to fig. 4, the invention further provides a vision-based piano playing robot control device for implementing the vision-based piano playing robot control method. The device comprises a depth camera, a paw, a six-degree-of-freedom mechanical arm and an upper computer control cabinet. The tail end of the six-degree-of-freedom mechanical arm is connected with the paw, and is also connected with the depth camera through an adapter frame; the camera moves together with the arm and is used to acquire the target image and depth information. The mechanical arm communicates with the upper computer control cabinet through an RS-232 interface. The upper computer control cabinet is internally provided with a nonvolatile memory, a random access memory and a processor.
The nonvolatile memory stores the computer program and image information and is coupled to the random access memory.
The random access memory loads the computer program and image information to be executed.
The processor, which may be a CPU or a GPU, is configured to execute the methods of S2 and S3 based on the instructions stored in the random access memory.
The invention has the following beneficial effects:
the invention can realize convenient calibration of a 'hand-organ' of the piano playing robot and autonomous identification of music score images, optimizes the playing action combination of the piano playing robot and effectively improves the playing efficiency and effect of the piano playing robot.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.
The principle and the embodiment of the present invention are explained by applying specific examples, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A vision-based piano playing robot control method is characterized by comprising the following steps:
calibrating the paws and the keys at the tail end of the robot by using the acquired visual information to obtain a transformation relation between a key image coordinate system and a paw space coordinate system;
performing autonomous identification by using the acquired music score image to obtain music score information;
planning the optimal action combination of the mechanical arm and the paw according to the music score information;
and determining a control instruction according to the optimal action combination, and sending the control instruction to the mechanical arm and the paw so as to enable the mechanical arm and the paw to act.
2. The vision-based piano playing robot control method of claim 1, wherein the calibration of the paws and the keys at the tail end of the robot is performed by using the acquired vision information to obtain the transformation relation between a key image coordinate system and a paw space coordinate system, and the method comprises the following steps:
performing hand-eye calibration of the robot by using the Zhang Zhengyou calibration method, and acquiring the transformation matrix between the camera coordinate system and the robot base coordinate system;
extracting key image position information according to the visual information;
acquiring the transformation relation between the position information of each image and the coordinates of a camera space coordinate system;
and acquiring the transformation relation between the position information of each image and the coordinates of the paw space coordinate system.
3. The vision-based piano robot control method of claim 2, wherein said extracting key image position information according to the vision information comprises:
acquiring a first keyboard region image of a target musical instrument;
placing a target musical instrument in a complete field of view right below a depth camera, and acquiring a musical instrument image;
taking the keyboard area image as a preset image recognition model, taking the musical instrument image as an image to be recognized, and executing a scale-invariant feature transformation algorithm to obtain a second keyboard area image of the target musical instrument;
carrying out binarization processing on the second keyboard region image to obtain a binarization image;
acquiring a pixel value of the binary image in the y direction, and performing horizontal filtering on the binary image by using a preset threshold value to obtain an upper part image containing a black key and a lower part image not containing the black key; the preset threshold is determined by the pixel value;
filtering out the slender black pixel area of the upper part of the image, namely removing the image of the gap area between the adjacent white keys;
performing edge extraction and enhancement on the residual black pixel blocks to obtain a circumscribed rectangle corresponding to each black key;
selecting an image coordinate corresponding to the geometric center of the circumscribed rectangle as an image coordinate of the black key;
performing edge extraction and enhancement on the white pixel block of the lower part image, and selecting an image coordinate corresponding to the geometric center of the circumscribed rectangle as an image coordinate of a white key;
and calculating the abscissa and the ordinate of all the keys in the image coordinate system, and directly acquiring the depth information of the keys in the camera coordinate system by using a depth camera.
4. The vision-based control method for the piano playing robot according to claim 1, wherein the step of performing autonomous recognition by using the acquired music score image to obtain music score information comprises the following steps:
carrying out image preprocessing on the music score image to obtain a preprocessed image; the image preprocessing comprises image binarization, staff-line region segmentation, staff-line removal, abnormal note image restoration and note image segmentation;
carrying out note recognition according to the connected domains in the preprocessed image to obtain note recognition information; the note recognition comprises: stem recognition, note-head recognition, tail and beam recognition, and pitch recognition;
and recognizing accidentals and rests according to the music score image to obtain symbol recognition information.
5. The vision-based piano playing robot control method of claim 4, wherein the carrying out image preprocessing on the music score image to obtain a preprocessed image comprises:
searching for the optimal threshold value minimizing the intra-class variance by applying the maximum between-class variance method, and obtaining a binarized music score image according to the optimal threshold value and the music score image;
projecting the binarized music score image data in the horizontal direction by using horizontal projection to construct a histogram;
acquiring 5 peak values of the histogram in the y-axis direction, and constructing a peak value vector from the 5 peak values; the pixel region in the horizontal direction between the maximum value and the minimum value in the peak value vector is the staff image region;
removing all black pixels in the horizontal direction of each element in the peak value vector to obtain an image with staff lines removed;
recovering the notes whose shapes were changed by the removed staff lines by using morphological processing based on the image with staff lines removed, to obtain a restored image;
searching each pixel in the restored image until the current pixel has a value of 255 and has not been marked;
selecting the current pixel as a seed pixel, assigning a label to the current pixel, and checking each pixel adjacent to the seed pixel;
if a pixel with the value of 255 is adjacent to the seed pixel, returning to the step of selecting the current pixel as the seed pixel, distributing a label to the current pixel, and checking each pixel adjacent to the seed pixel until no pixel with the pixel value of 255 is found;
and updating the label and returning to the step of searching each pixel in the restored image until the value of the current pixel is 255 and the pixel is not marked, until all the pixels in the image are checked, and obtaining the marked connected domain.
6. The vision-based piano playing robot control method of claim 4, wherein the performing note recognition according to the connected domains in the preprocessed image to obtain note recognition information comprises:
based on the connected domains, applying vertical projection and obtaining a histogram;
determining the positions of the stems according to the peaks of the histogram;
traversing the heights of the circumscribed rectangles of the connected domains, and taking the connected domains whose circumscribed-rectangle height differs from the staff spacing by less than 6 as candidate note heads;
determining the symmetry rate of each connected component according to the candidate note heads;
determining the recognized note heads according to the symmetry rate;
removing the stems and note heads in the connected domains, performing a primary screening on the resulting image, and retaining the images whose width is larger than 2 times the preset width and whose length is larger than the staff spacing;
carrying out Hough transform on the retained images for straight-line feature detection, and identifying the images with straight-line features as beams; the beam count is the number of straight-line features in the Hough transform detection result;
separating the stem and the tail, and defining the reference line as the position of the third line of the staff;
and deducing the pitch from the difference between the note-head center position and the reference line.
7. The vision-based piano playing robot control method of claim 4, wherein the recognizing accidentals and rests according to the music score image to obtain symbol recognition information comprises:
detecting whether a connected-region mark exists in the beginning region of the staff and in the region to the left of each note;
if so, normalizing the image to the same size as the template image, and applying a logical calculation formula to obtain a result image;
and determining the template image corresponding to the result image with a matching degree larger than 0.8 and the maximum number of pixels with value 255 as the corresponding accidental or rest.
8. The vision-based control method for a piano playing robot of claim 1, wherein planning an optimal motion combination of a mechanical arm and a paw according to the music score information comprises:
mapping the note sequence into a key image coordinate sequence according to the one-to-one correspondence of the notes and the keys;
the conversion of the musical notes to the target positions of the robot is realized by the conversion relation between the key image coordinate system and the paw space coordinate system and the key image coordinate sequence;
identifying a music score image, constructing a note sequence according to the playing sequence, and calculating the number of notes to be played;
initializing particle population parameters, and randomly generating various robot playing action sequences aiming at the music score with the note number;
acquiring a path length fitness function and a mechanical arm moving time fitness function of an action sequence, and determining a final particle fitness function according to the path length fitness function and the mechanical arm moving time fitness function;
evaluating the initial fitness of each particle, and calculating the initial fitness function of each particle to obtain the initial individual position of each particle at a position node and the optimal position of the particle swarm overall at the position node;
updating the speed and the position of each particle by using a preset updating formula;
calculating the fitness of all the particles, obtaining the individual optimal position vector of each particle at the position node, and updating the individual optimal position searched by the particle at the position node and the optimal position of the position node of the particle swarm overall by utilizing an updating strategy;
and judging whether the maximum iteration times or the convergence condition is met, if so, outputting an optimal action sequence, and otherwise, jumping to the step of updating the speed and the position of each particle by using a preset updating formula.
9. The vision-based piano robot control method of claim 1, wherein the determining a control command according to the optimal action combination and sending the control command to the robot arm and the gripper comprises:
sending control instructions containing the information of the optimal action combination to the mechanical arm and the paw through a bus protocol; the control instructions are executed sequentially, and a closed-loop detection mechanism is added to the mechanical arm position instructions.
10. A vision-based piano robot control device for implementing the vision-based piano robot control method of any one of claims 1 to 9, the device comprising:
the device comprises a depth camera, a paw, a six-degree-of-freedom mechanical arm and an upper computer control cabinet;
the tail end of the six-degree-of-freedom mechanical arm is connected with the paw, and is further connected with the depth camera through an adapter frame for acquiring a target image and depth information; the six-degree-of-freedom mechanical arm communicates with the upper computer control cabinet through an interface; a nonvolatile memory, a random access memory and a processor are arranged in the upper computer control cabinet; the nonvolatile memory is used for storing the computer program and image information and is coupled with the random access memory; the random access memory is used for loading the executed computer program and image information; and the processor is used for realizing the autonomous recognition of the music score image and planning the optimal action combination of the mechanical arm and the paw.
CN202211585522.3A 2022-12-09 2022-12-09 Vision-based piano playing robot control method and device Pending CN115870980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211585522.3A CN115870980A (en) 2022-12-09 2022-12-09 Vision-based piano playing robot control method and device


Publications (1)

Publication Number Publication Date
CN115870980A true CN115870980A (en) 2023-03-31

Family

ID=85766968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211585522.3A Pending CN115870980A (en) 2022-12-09 2022-12-09 Vision-based piano playing robot control method and device

Country Status (1)

Country Link
CN (1) CN115870980A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024008217A1 (en) * 2023-06-08 2024-01-11 之江实验室 Humanoid piano playing robot
CN117207204A (en) * 2023-11-09 2023-12-12 之江实验室 Control method and control device of playing robot
CN117207204B (en) * 2023-11-09 2024-01-30 之江实验室 Control method and control device of playing robot


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination