CN101409070A

CN101409070A - Music reconstruction method based on movement image analysis

Info

Publication number: CN101409070A
Application number: CNA2008100471644A
Authority: CN
Inventors: 徐开笑
Original assignee: Individual
Current assignee: Individual
Priority date: 2008-03-28
Filing date: 2008-03-28
Publication date: 2009-04-15

Abstract

The invention relates to a music reconstruction method, in particular to a music reconstruction method based on moving-image analysis. The method comprises the following steps: step 1, two-dimensional images, namely the motion postures of the human body, are acquired and stored in a temporary database in the order in which the postures occur; step 2, music data matching the posture sequence stored in the temporary database are retrieved from a music database; step 3, the music data are arranged in the order of the matched postures and input to a musical instrument. The method has the following advantages: 1, the motion images of the human body are captured directly by an ordinary video camera, which greatly reduces the cost of the system; 2, image identification codes are mapped to the music database, which accomplishes the conversion from image to music.

Description

Music reconstruction method based on movement image analysis

Technical field

The present invention relates to a music reconstruction method, and in particular to a music reconstruction method based on movement image analysis.

Background technology

Since the 20th century, computer applications have brought revolutionary breakthroughs to the development of music, and science and technology have become ever more closely linked with music. The rise of electronic musical instruments and electronic music, computer composition and "color music" has opened up much broader space for the development of music. The way musicians create music has changed completely: with the help of special instruments such as computers, a musician's creation can be converted into an actual score or even concrete sound. Introducing various technologies into composition frees musicians from much simple, repetitive labor and at the same time greatly extends their creative abilities and techniques. The use of technical means has pushed music in two directions:

On the one hand, with the help of technology, music has become ever stronger both technically and artistically, delivering sound experiences beyond imagination. For example, the music and sound effects in Hollywood films such as "Waterworld", "Titanic" and "Braveheart" are a "feast of music" produced for audiences by singers, composers and technicians using such technical means. It is fair to say that in today's society it is almost impossible to hear a piece of music with no technological content at all.

On the other hand, thanks to technology, the threshold for music production keeps falling, music distribution keeps getting more convenient, and music reaches ever more people. Inventions such as mobile-phone ringback tones and MP3 players have become part of everyday life. Constantly updated technical means make music more vivid and more pleasant to hear, and continually replace the old with the new. In vocal music, for example, high technology makes it possible to perform sound-spectrum tests on singing: a multimedia computer analyzes a singer's vocal condition, singing technique, tone quality and other aspects, giving teachers and learners a reference for voice quality from the standpoint of singing acoustics and supplementing traditional music teaching.

Abroad, research in this field is led by an alliance of early Japanese electronic-instrument makers, YAMAHA, ROLAND and KORG, which concentrate their strength on tackling key problems in the related technologies. Creative Technology of Singapore and many U.S. manufacturers are also doing research and development, and the science and arts institute of the Massachusetts Institute of Technology is likewise devoted to related research. Application controllers of all kinds keep appearing on the market, but many gaps remain in both technology and applications.

By comparison, although China has built relatively strong research and development teams after a period of development, the combination of music with science and technology and the related disciplines started relatively late, and related research remains weak and at a comparatively low level. In 2000, Beijing Aerospace Electronic Engineering published an article entitled "A method for converting images to music", which discussed the mapping relations and mapping principles between images and music. Using the correspondence between image and music, it proposed an algorithm based on certain scientific and artistic principles to convert the raw information of an image into MIDI music. However, the article simply mapped the seven colors red, orange, yellow, green, cyan, blue and purple to the seven tones C, D, ♭E, F, G, A and ♭B, and controlled the rhythm through color changes, so it lacked flexibility and practicality.

The prior art has the following defects in both image acquisition and image-recognition applications:

1. Image acquisition: at home and abroad, images are currently generally acquired by covering the human joints with sensing points; each sensing point is a high-precision sensor, so a complete set of equipment is very expensive.

2. Image-recognition applications: the application range of related image-recognition technology at home and abroad is currently limited to the reproduction of two-dimensional or three-dimensional images.

Summary of the invention

The present invention mainly solves the technical problems of the prior art that the sensing points used for image acquisition are high-precision sensors and a complete set of equipment is very expensive. It provides a music reconstruction method based on movement image analysis that captures human motion images directly with an ordinary camera, greatly reducing the cost of the system.

A further object of the present invention is to solve the technical problem of the prior art that the application range of related image-recognition technology at home and abroad is limited to the reproduction of two- or three-dimensional images. It provides a music reconstruction method based on movement image analysis in which image identification codes are mapped to a music library, accomplishing the conversion from image to music.

The above technical problems of the present invention are mainly solved by the following technical solution:

A music reconstruction method based on movement image analysis, characterized by comprising the following steps:

Step 1: acquire two-dimensional images in sequence, namely the motion postures of the human body, and store them in a temporary database in the order in which the postures occur;

Step 2: retrieve from a music database the music data that match the motion postures stored in the temporary database;

Step 3: arrange the music data in the order of the motion postures they match, and input them to a musical instrument.
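The three steps can be illustrated with a minimal Python sketch. The posture labels, the lookup table `MUSIC_DB` and the function `reconstruct` are hypothetical names chosen for illustration; the patent does not specify data structures. Recognition (step 1) is assumed to have already produced a list of posture labels, and music data are assumed to be MIDI note numbers, either a single note or a short phrase.

```python
from typing import Dict, List, Sequence, Union

# Music data for one posture: a single MIDI note number or a phrase (list of notes).
MusicData = Union[int, List[int]]

# Hypothetical posture -> music-data table standing in for the music database.
MUSIC_DB: Dict[str, MusicData] = {
    "left_arm_up": 60,             # single note C4
    "right_arm_up": 62,            # single note D4
    "both_arms_up": [64, 65, 67],  # short phrase E4-F4-G4
}


def reconstruct(postures: Sequence[str], music_db: Dict[str, MusicData]) -> List[int]:
    """Steps 2 and 3: look up each posture in order and concatenate its music data."""
    temp_db = list(postures)  # step 1 result: postures kept in the order they occurred
    notes: List[int] = []
    for label in temp_db:
        data = music_db[label]  # step 2: raises KeyError if no music data matches
        if isinstance(data, int):
            notes.append(data)  # single note
        else:
            notes.extend(data)  # phrase from the music library
    return notes  # step 3: music data arranged in posture order


print(reconstruct(["left_arm_up", "both_arms_up", "right_arm_up"], MUSIC_DB))
# [60, 64, 65, 67, 62]
```

Raising `KeyError` for an unrecognized posture is a design choice of this sketch; a real system might instead skip the posture or insert a rest.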

In the above music reconstruction method based on movement image analysis, acquiring the motion postures of the human body in step 1 comprises:

Step a: acquire two-dimensional image data, namely the data of the motion postures of the human body;

Step b: process the acquired data to eliminate interference and noise;

Step c: perform feature selection and extraction on the data from step b, i.e., select the most effective features from a set of features and then extract features from them;

Step d: classify the extracted features.

In the above music reconstruction method based on movement image analysis, the music data in step 2 are single notes or a music library.

In the above music reconstruction method based on movement image analysis, the motion postures include single-arm, two-arm, single-leg and two-leg movements of the human body.

In the above music reconstruction method based on movement image analysis, the music library comprises phrases or periods.

In the above music reconstruction method based on movement image analysis, the music data are in MIDI format.
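The text only states that the music data are in MIDI format. As a hedged illustration of what "input to a musical instrument" might involve at the byte level, the sketch below builds raw MIDI channel messages (Note On status 0x90 and Note Off status 0x80, each followed by a note number and a velocity in the range 0 to 127). Timing and the Standard MIDI File container are deliberately omitted, and the function names are hypothetical.

```python
from typing import List, Tuple

NOTE_ON = 0x90   # status nibble for Note On
NOTE_OFF = 0x80  # status nibble for Note Off


def note_messages(note: int, velocity: int = 64, channel: int = 0) -> Tuple[bytes, bytes]:
    """Return the raw 3-byte Note On and Note Off messages for one note."""
    if not (0 <= note <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("note and velocity must be 0-127, channel 0-15")
    on = bytes([NOTE_ON | channel, note, velocity])
    off = bytes([NOTE_OFF | channel, note, 0])
    return on, off


def sequence_to_messages(notes: List[int]) -> List[bytes]:
    """Turn an ordered note list into alternating Note On / Note Off messages."""
    messages: List[bytes] = []
    for n in notes:
        messages.extend(note_messages(n))
    return messages


print([m.hex() for m in sequence_to_messages([60, 62])])
# ['903c40', '803c00', '903e40', '803e00']
```

Sending these bytes to an instrument would additionally need a MIDI output port and note durations, both outside the scope of this sketch.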

Therefore, the present invention has the following advantages: 1. human motion images are captured directly by an ordinary camera, greatly reducing the cost of the system; 2. mapping image identification codes to the music library accomplishes the conversion from image to music.

Description of drawings

Fig. 1 shows the motion postures corresponding to the eight notes of the high register in the present invention;

Fig. 2 shows the motion postures corresponding to the eight notes of the middle register in the present invention;

Fig. 3 shows the motion postures corresponding to the eight notes of the low register in the present invention;

Fig. 4 shows the motion postures corresponding to the eight notes of the low register after optimization;

Fig. 5 shows the motion postures corresponding to the eight notes of the middle register after optimization;

Fig. 6 shows the motion postures corresponding to the eight notes of the high register after optimization;

Embodiment

The technical solution of the present invention is described in further detail below through an embodiment, with reference to the accompanying drawings.

Embodiment:

The pattern-recognition method and principles involved in the present invention are described below:

First, the definitions of pattern and pattern recognition

Broadly speaking, any observable things existing in time and space can be called patterns if they can be distinguished as identical or similar. More narrowly, a pattern is a quantitative or structural description of an object of interest, and a pattern class is a set of patterns sharing some common features, i.e., the category to which a pattern belongs, or the totality of patterns in the same class. Some authors call the pattern class a pattern and a concrete pattern a sample. Pattern recognition studies automatic techniques by which a computer automatically (or with little human intervention) discriminates, analyzes and recognizes patterns and assigns them to their respective pattern classes.

Next, the methods and principles of pattern recognition

Pattern recognition is a data-extraction technique for multivariate analysis; here it mainly means classifying and identifying patterns by combining several methods through algorithms and software. Different pattern-recognition theories and methods are used for different objects and purposes; the mainstream techniques at present are statistical pattern recognition, syntactic pattern recognition, fuzzy mathematics, neural networks and artificial intelligence, which are related to and draw on one another. This invention mainly uses statistical pattern recognition, whose theory is relatively mature, with many methods that are usually effective and together form a complete system. Statistical pattern recognition represents each sample by its feature parameters as a point in a multidimensional space (before classification, the numerical values of the conditions describing a pattern, such as pH, acidity, total nitrogen or reducing sugar, are taken as "feature parameters"). Following the principle that "like attracts like", similar samples should lie close together and dissimilar samples far apart. Samples can therefore be discriminated and classified according to the distance between them, or a function of that distance, and the result used to predict unknown samples. Statistical pattern recognition is a basic method in materials design and industrial optimization; here it is applied to the classification and discrimination of motion postures.

The above involves the template-matching classification method, whose principle is as follows:

The primary goal of statistical pattern recognition is to classify samples and their representative points in a multidimensional space. The simplest classification problem divides samples into two mutually exclusive classes (the first class contains all samples that meet some criterion, the second all samples that do not). If a hyperplane or hypersurface can be found in the multidimensional space that separates the representative points of the samples into two regions, then computing that hyperplane or hypersurface is called "training" or "learning". The samples or sample points on which it is based are called "training points" or the "training set". When the training set is classified with the pattern-recognition method, the correct-classification rate is called the recognition rate. If the recognition rate is high enough, the discrimination method is considered established. This method is then used to classify the remaining samples (which the computer was "not told" about during training, so they are "unknown" samples to it); the correct-classification rate on them is called the prediction ability.

Suppose there are two standard motion postures with sample templates A and B, whose feature vectors are d-dimensional: X_A = (x_{A1}, x_{A2}, \ldots, x_{Ad})^T and X_B = (x_{B1}, x_{B2}, \ldots, x_{Bd})^T. Any motion posture X to be recognized has the feature vector X = (x_1, x_2, \ldots, x_d)^T.

In template matching, if X = X_A the motion posture is A, and if X = X_B it is B. Since an exact match rarely occurs in practice, the simplest recognition method discriminates by distance: if X is closer to X_A than to X_B, it belongs to A; otherwise it belongs to B. This is the minimum-distance discrimination method. The distance between any two points x and y is:

d(x, y) = \left[ \sum_{i=1}^{d} (x_i - y_i)^2 \right]^{1/2}
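The distance formula translates directly into code. This is a minimal sketch; Python 3.8 and later also provide `math.dist`, which computes the same Euclidean distance.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) for two d-dimensional feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension d")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```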

Using this distance as the criterion, a distance classifier is constructed with the following decision rule:

\begin{cases} d(X, X_A) < d(X, X_B) \Rightarrow X \in A \\ d(X, X_A) > d(X, X_B) \Rightarrow X \in B \end{cases}
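A two-class minimum-distance classifier following this rule might look like the sketch below. The text does not say what happens when the two distances are equal, so returning `None` in that case is an assumption, and `classify_two` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def classify_two(x: Sequence[float], x_a: Sequence[float], x_b: Sequence[float]) -> Optional[str]:
    """Assign X to A or B by minimum Euclidean distance to the two templates."""
    d_a = math.dist(x, x_a)  # d(X, X_A)
    d_b = math.dist(x, x_b)  # d(X, X_B)
    if d_a < d_b:
        return "A"
    if d_a > d_b:
        return "B"
    return None  # equidistant: undecided (assumption, not specified in the text)


print(classify_two([1, 1], [0, 0], [5, 5]))  # A
```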

Pattern-recognition computation can also divide samples into multiple classes at once. When samples must be divided into several classes, a common approach is first to split them into two classes and then split each resulting class into two again with the pattern-recognition method; this repeated binary splitting usually works better than dividing into many classes at once. Other methods are sometimes used for multiclass classification; for example, the KNN (k-nearest-neighbor) method can perform multiclass classification directly.

Suppose there are M classes ω_1, ω_2, ..., ω_M, each represented by several vectors; for example, class ω_i has:

X_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ x_{i3} \\ \vdots \\ x_{in} \end{pmatrix}

For any posture X to be recognized,

X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}

compute the distances d(X_i, X). If for some i

d(X_i, X) < d(X_j, X), \quad j = 1, 2, \ldots, M, \; j \neq i

then X ∈ ω_i.

For the actual discrimination, the distance between two points X and Y can be expressed by the squared distance |X - Y|^2, which preserves the ordering of distances, that is

d(X, X_i) = |X - X_i|^2 = (X - X_i)^T (X - X_i)

= X^T X - X^T X_i - X_i^T X + X_i^T X_i

= X^T X - (X^T X_i + X_i^T X - X_i^T X_i)

Since X^T X is the same for every class, the term X^T X_i + X_i^T X - X_i^T X_i, which is a linear function of the features, can be used as the discriminant function:

d_i(X) = X^T X_i + X_i^T X - X_i^T X_i

Because d(X, X_i) = X^T X - d_i(X), the smallest distance corresponds to the largest discriminant: if d_i(X) = \max_j \{ d_j(X) \}, i.e. d(X, X_i) = \min_j \{ d(X, X_j) \}, then X ∈ ω_i. This is minimum-distance classification for the multiclass problem.
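Because d(X, X_i) = X^T X - d_i(X), picking the largest linear discriminant d_i(X) = 2 X_i^T X - X_i^T X_i selects the same class as picking the smallest distance. The sketch below (hypothetical names; each class may hold several vectors, as in the text) scores every stored vector with the discriminant and returns the class of the best one.

```python
import math
from typing import Dict, List, Optional, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def discriminant(x: Sequence[float], xi: Sequence[float]) -> float:
    """d_i(X) = X^T X_i + X_i^T X - X_i^T X_i = 2 X_i^T X - X_i^T X_i."""
    return 2 * dot(xi, x) - dot(xi, xi)


def classify_multi(x: Sequence[float], classes: Dict[str, List[Sequence[float]]]) -> Optional[str]:
    """Return the class whose vector maximizes d_i(X), i.e. minimizes |X - X_i|^2."""
    best_label: Optional[str] = None
    best_score = -math.inf
    for label, vectors in classes.items():
        for xi in vectors:
            score = discriminant(x, xi)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


CLASSES = {"w1": [[0, 0], [1, 0]], "w2": [[5, 5]], "w3": [[10, 0]]}
print(classify_multi([4, 4], CLASSES))  # w2
```

The discriminant form avoids computing X^T X and square roots, which is the practical reason for rewriting the distance in this way.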

In practice, few or even no samples may be held back as unknowns, so as to enlarge the training set and improve prediction ability. A common technique is the "leave-one-out" method: each time one sample is removed, the remaining samples are used as training points, and the resulting predictor makes an "unknown" prediction of the removed sample's class. After every sample has been predicted in this way, the mean prediction success rate is taken as the average prediction ability. When there are many samples, "leave-ten-out" or "leave-one-quarter-out" can be used instead to check prediction ability.
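Leave-one-out can be sketched as follows, assuming a 1-nearest-neighbor (minimum-distance) classifier as the method under test; the text does not tie leave-one-out to a specific classifier, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_label(x: Sequence[float], train: List[Sample]) -> str:
    """1-nearest-neighbor prediction by Euclidean distance."""
    return min(train, key=lambda s: math.dist(x, s[0]))[1]


def leave_one_out_accuracy(samples: List[Sample]) -> float:
    """Hold out each sample in turn, train on the rest, and average the hits."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # all samples except the held-out one
        if nearest_label(x, train) == label:
            correct += 1
    return correct / len(samples)


SAMPLES = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "B"), ([5, 6], "B")]
print(leave_one_out_accuracy(SAMPLES))  # 1.0
```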

In this embodiment, the following pattern-recognition system can be used:

This pattern-recognition system is based on statistical methods, and its analysis process consists of four main parts: data acquisition, preprocessing, feature extraction and selection, and classification decision. The classification decision uses statistical methods to assign the object being recognized in feature space to a particular class.

The operating steps of the present invention are set out below:

Step 1: acquire two-dimensional images in sequence, namely the motion postures of the human body, and store them in the temporary database in the order in which the postures occur. In this step, the motion postures of the human body are acquired as follows:

Step a: acquire two-dimensional image data. The detailed process of this step is as follows:

For a computer to classify and recognize phenomena, the objects studied must be represented by symbols that the computer can operate on. The input information about an object is usually of three types:

(1) Two-dimensional images, such as text, fingerprints, maps and photographs.

(2) One-dimensional waveforms, such as electroencephalograms, electrocardiograms and mechanical vibration waveforms.

(3) Physical parameters and logical values, such as body temperature, laboratory analysis data and descriptions of whether a parameter is correct. Through measurement, sampling and quantization, these can be represented as matrices or vectors; this is the data-acquisition process.

The objects recognized in the present invention belong to the first type, namely grayscale images among two-dimensional images. Grayscale images are briefly introduced below:

A digital image in a computer is usually represented by a matrix composed of the values of its sampling points:

\begin{pmatrix} f(0,0) & f(0,1) & \cdots & f(0, M-1) \\ f(1,0) & f(1,1) & \cdots & f(1, M-1) \\ \vdots & \vdots & \ddots & \vdots \\ f(N-1,0) & f(N-1,1) & \cdots & f(N-1, M-1) \end{pmatrix}

Each sampling unit is called a pixel. In the matrix above, M and N are the numbers of pixels of the digital image in the horizontal and vertical directions respectively. Image files have different extensions depending on their digital image format; the most common format is the bitmap, whose files have the extension BMP.

The color depth of a digital image is the number of bits occupied by the color value of each pixel; the greater the color depth, the more colors can be represented. Different color depths give different kinds of image files; those commonly used in computers are monochrome, grayscale, pseudo-color and 24-bit true-color images. Because 24-bit true-color images need a great deal of storage and are slow to process, when little storage is available and real-time processing is required they are generally approximated with corresponding 8-bit bitmaps. The 8-bit bitmap, i.e. the grayscale image, is therefore one of the most widely used image representations in image technology, and it is the representation adopted by the present invention.

Grayscale images have the following characteristics:

(1) The storage file of a grayscale image contains a color table with 256 entries; each entry consists of red, green and blue components, and the three components are equal, i.e.

f_{red}(x, y) = f_{green}(x, y) = f_{blue}(x, y)

(2) Each pixel occupies eight bits, with values from 0 to 255 representing 256 different gray levels. The pixel value f(x, y) of each pixel is the index of its entry in the color table. For example, the value matrix of a grayscale image of 16 × 6 pixels is as follows:

125, 153, 158, 157, 127, 70, 103, 120, 129, 114, 114, 150, 150, 147, 150, 160
133, 154, 158, 100, 116, 120, 97, 74, 54, 74, 118, 146, 148, 150, 145, 157
155, 163, 95, 112, 123, 101, 137, 108, 81, 71, 63, 81, 137, 142, 146, 152
167, 69, 85, 59, 65, 43, 85, 34, 69, 78, 104, 101, 117, 132, 134, 149
54, 46, 38, 44, 38, 36, 44, 36, 25, 48, 115, 113, 114, 124, 135, 152
58, 30, 44, 35, 28, 69, 144, 147, 57, 60, 93, 106, 119, 124, 131, 144
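The feature extraction in step c counts "black pixels", which presupposes a black-and-white image, but the text does not describe how gray levels become black or white. As an assumption for illustration only, the sketch below thresholds the 16 × 6 example at 128 (darker pixels become black, coded 1) and reads the values as 6 rows of 16; both the threshold and the row layout are assumptions.

```python
from typing import List

# The 16 x 6 example above, read as 6 rows of 16 gray levels (assumed layout).
GRAY: List[List[int]] = [
    [125, 153, 158, 157, 127, 70, 103, 120, 129, 114, 114, 150, 150, 147, 150, 160],
    [133, 154, 158, 100, 116, 120, 97, 74, 54, 74, 118, 146, 148, 150, 145, 157],
    [155, 163, 95, 112, 123, 101, 137, 108, 81, 71, 63, 81, 137, 142, 146, 152],
    [167, 69, 85, 59, 65, 43, 85, 34, 69, 78, 104, 101, 117, 132, 134, 149],
    [54, 46, 38, 44, 38, 36, 44, 36, 25, 48, 115, 113, 114, 124, 135, 152],
    [58, 30, 44, 35, 28, 69, 144, 147, 57, 60, 93, 106, 119, 124, 131, 144],
]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Mark pixels darker than the threshold as black (1) and the rest as white (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


for row in binarize(GRAY):
    print("".join("#" if bit else "." for bit in row))
```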

Step b: process the acquired data to eliminate interference and noise. The detailed process of this step is as follows:

During the generation and transmission of digital images, various factors inevitably introduce some noise. To recognize the image content, the acquired image information must first be processed to eliminate interference and noise. Here, image preprocessing mainly means filtering the image, with the aim of removing noise.

In pattern recognition, the main kinds of noise are:

(1) During image acquisition, dust, fog, smoke and the like in the air degrade image quality and introduce noise.

(2) An unclean lens causes isolated-point noise, and poor focusing blurs the image. These kinds of noise can be minimized only by careful operation.

In addition, there is random noise caused by random disturbances, quantization noise introduced during image acquisition, and noise from the CCD camera system. These factors combine to make the nature of the noise rather complex.

Noise degrades image quality and affects image analysis and recognition. To reduce or filter out noise and random disturbances, enhance useful information, improve the validity and reliability of subsequent processing and create good conditions for image segmentation, a suitable denoising method should be applied.

When the image information is too weak to be recognized, the image must also be enhanced, for example by geometric adjustment and color correction, to facilitate analysis by humans and machines. The purpose of preprocessing is to remove noise, enhance useful information and eliminate the effects of differing units of measurement and weights.
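The text says preprocessing mainly means filtering but does not name a filter. As an illustration only, a 3 × 3 median filter, a common choice for the isolated-point noise described in (2), could look like this; leaving border pixels unchanged is a simplifying assumption of the sketch.

```python
from statistics import median
from typing import List


def median_filter_3x3(img: List[List[int]]) -> List[List[int]]:
    """Replace each interior pixel with the median of its 3 x 3 neighborhood."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # copy; border pixels are left unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))  # 9 values, so the median is an actual pixel value
    return out


noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255  # an isolated bright point, e.g. from a dirty lens
print(median_filter_3x3(noisy)[2][2])  # 100
```

Unlike a mean filter, the median discards an isolated outlier entirely instead of smearing it over its neighbors.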

Step c: perform feature selection and extraction. The volume of acquired data is considerable; to classify and recognize effectively, the raw data must be transformed to obtain the features that best reflect the essence of the classes. This is the process of feature extraction and selection. The space formed by the raw data is generally called the measurement space, and the space in which classification and recognition are performed is called the feature space. Through a transformation, a pattern represented in the higher-dimensional measurement space can be represented in the lower-dimensional feature space. A pattern in feature space is usually also called a sample; it can often be expressed as a vector, i.e., a point in feature space.

Selecting the most effective features from a set of features so as to reduce the dimensionality of the feature space is called feature selection. The number of original features may be very large, i.e., the samples lie in a high-dimensional space; representing the samples in a lower-dimensional space by a mapping (or transformation) is called feature extraction.

In practice, feature selection and extraction are not strictly separate. For example, the original feature space can first be mapped to a lower-dimensional space, and in that space the dimensionality can be reduced further by discarding features that clearly carry no class information. Choosing suitable features to span a suitable feature space is a key to the success of pattern recognition. In actual computation, one always tries to discard feature vectors that contribute little to classification and to minimize the number of features (while still ensuring good classification), because:

(1) Unnecessary features bring no substantial benefit and may interfere with classification.

(2) To keep the ratio of the number of samples to the space dimensionality at least 3 (preferably at least 10) without requiring too many samples (in many practical problems, increasing the number of samples greatly increases the experimental workload), the space dimensionality should be reduced to a minimum.

There are several methods for extracting features from a sample; a simple template method is used here. First, the starting position of each sample is found, and the width and height of the sample are searched from there. The length and width of each sample are divided into N equal parts to form an N × N template; for each small region, the number of black pixels is counted and divided by the area of that region, giving a feature value. The benefit is that samples of the same shape but different sizes yield similar feature values, so they can be treated as the same class. Here N = 5, which requires the object to be larger than 5 pixels in both width and length; otherwise it is too small to be classified correctly. The larger N is, the larger the template and the more features there are, so the ability to distinguish different objects is stronger; but the amount of computation, the running time and the required sample library all grow accordingly. The sample library is generally 5 to 10 times the number of features. Here there are 5 × 5 = 25 features, so even at the minimum sample-to-dimension ratio of 3, each posture needs at least 75 standard samples and seven postures need 525, which shows how quickly the number grows. If N is too small, the differences between objects are hard to capture.

A 5 × 5 = 25-dimensional feature vector is extracted for each posture. The feature-template extraction steps for a motion posture are as follows:

(1) Search the data area to find the top, bottom, left and right boundaries of the human body.

(2) Divide the human-body region equally into 5 × 5 small regions.

(3) Compute the proportion of black pixels in each of the 5 × 5 small regions. The five ratios of the first row are stored as features 1 to 5, those of the second row as features 6 to 10, and so on.
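The three extraction steps can be sketched in Python as below. Assumptions, none of which the text specifies: the input is already a binary image with 1 for black (foreground) pixels, cell boundaries use integer division when the bounding box is not a multiple of N, and the object must span at least N pixels in each direction so that no cell is empty. `grid_features` is a hypothetical name.

```python
from typing import List


def grid_features(binary: List[List[int]], n: int = 5) -> List[float]:
    """N x N black-pixel-ratio features of the foreground bounding box (1 = black)."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    if not rows:
        raise ValueError("no black (foreground) pixels found")
    # (1) top, bottom, left and right boundaries of the body region.
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    height, width = bottom - top + 1, right - left + 1
    if height < n or width < n:
        raise ValueError(f"object must span at least {n} pixels in each direction")
    features: List[float] = []
    for i in range(n):  # (2) split the region into n x n cells
        y0, y1 = top + height * i // n, top + height * (i + 1) // n
        for j in range(n):
            x0, x1 = left + width * j // n, left + width * (j + 1) // n
            cell = [binary[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cell) / len(cell))  # (3) black-pixel ratio, row-major order
    return features


img = [[0] * 7 for _ in range(7)]
for k in range(1, 6):
    img[k][k] = 1  # a diagonal stroke inside a blank margin
print(len(grid_features(img)))  # 25
```

Because the grid is laid over the bounding box rather than the whole frame, the same shape at a different position or scale yields similar features, which is the size-invariance the text describes.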

Step d: classify the extracted features.

The classification decision assigns the object being recognized in feature space to a particular class using statistical methods. The basic approach is to determine a decision rule from a training set of samples, so that classifying objects by this rule yields the lowest misclassification rate or the smallest loss.

Classifier design is a supervised learning method. In supervised recognition, to classify unknown things, a certain number of samples whose classes are known must be input to build a training set. The features of these samples are extracted and, using the known class of each training sample, discriminant functions are established and a classifier is constructed; this classifier is then used to determine the class of any pattern whose class is unknown.

Once the d-dimensional feature space has been fixed, classifier design is the problem of choosing a criterion and using it to divide the d-dimensional feature space into decision regions. There are two basic approaches to classifier design: template matching and the discriminant-function method. The method mainly used here is template matching, described in detail below.

A template-matching classifier uses every sample in the training set as a standard template. A sample to be tested is compared with each standard template to find the most similar (nearest-neighbor) template, whose class is then taken as the sample's class. For example, if class A has ten training samples it has 10 templates, and if class B has 8 training samples it has 8 templates. When classified, any test sample is compared with all 18 templates to find the most similar one; if that template belongs to class B, the test sample is assigned to class B, otherwise to class A. In principle, therefore, template matching is the simplest pattern-recognition method. Its obvious drawback is the large amount of computation and storage: many templates must be stored, and each test sample must be compared with every template, so when the number of templates is large the computation is large too.
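A template-matching classifier of this kind can be sketched as follows, taking Euclidean distance as the (assumed) similarity measure, consistent with the minimum-distance rule earlier; the class name `TemplateMatcher` is hypothetical. The usage mirrors the text's example of 10 templates for class A and 8 for class B.

```python
import math
from typing import List, Sequence, Tuple


class TemplateMatcher:
    """Nearest-template classifier: every training sample is kept as a template."""

    def __init__(self) -> None:
        self.templates: List[Tuple[Tuple[float, ...], str]] = []

    def fit(self, samples: Sequence[Sequence[float]], labels: Sequence[str]) -> "TemplateMatcher":
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.templates = [(tuple(s), lab) for s, lab in zip(samples, labels)]
        return self

    def predict(self, x: Sequence[float]) -> str:
        if not self.templates:
            raise ValueError("call fit() before predict()")
        # One distance computation per stored template: cost grows with the template count.
        _, label = min(self.templates, key=lambda t: math.dist(x, t[0]))
        return label


class_a = [(i * 0.1, 0.0) for i in range(10)]          # 10 templates for class A
class_b = [(10.0, 10.0 + i * 0.1) for i in range(8)]   # 8 templates for class B
matcher = TemplateMatcher().fit(class_a + class_b, ["A"] * 10 + ["B"] * 8)
print(len(matcher.templates), matcher.predict((1, 1)))  # 18 A
```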

The classification in this embodiment designs the classifier from a training set of samples whose classes are known, i.e., supervised classification, also commonly called discriminant analysis. If a pattern sample has n features, it forms an n-dimensional feature vector, which corresponds to a point in n-dimensional space. The classification problem is to divide the feature space into mutually exclusive regions corresponding to different classes, each region corresponding to a specific pattern class, with the boundaries between classes described by "discriminant functions". For supervised classification, the discriminant functions must be determined from the feature vectors of the samples, and only after they are determined can unknown patterns be classified with them; sufficient prior knowledge of the patterns to be classified is also required. These methods can generally be divided into parametric and nonparametric methods. Parametric methods are usually proposed by statisticians, and their effectiveness depends on whether the samples follow the assumed statistical distribution; nonparametric discrimination methods are mostly proposed by experimental scientists or computer-science pattern-recognition experts and place no special requirements on the sample distribution.

Cluster analysis is a method in mathematical statistics for studying how things are classified. It does not know the classes of the samples in advance and builds the classifier from the samples themselves, i.e., unsupervised classification. When prior knowledge of the patterns to be classified is lacking, unsupervised classification, that is, cluster analysis, must be used. Cluster analysis uses mathematical methods to analyze the distances and the degree of dispersion among the feature vectors. Some feature vectors may gather into several groups, which can be separated according to the distances between the groups; the cluster center of each class is the core of such a group. A sample set contains subsets with different properties, and the task of cluster analysis is to find these subsets. "Birds of a feather flock together" is the basic starting point of cluster analysis.
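The text does not name a particular clustering algorithm. As one common example, k-means repeatedly assigns each feature vector to its nearest cluster center and then moves each center to the mean of its members. The sketch below assumes numeric feature vectors and, for brevity, initializes the centers with the first k points; real use would need a better initialization.

```python
import math
from typing import List, Sequence, Tuple


def k_means(
    points: Sequence[Sequence[float]], k: int, iterations: int = 20
) -> Tuple[List[List[float]], List[int]]:
    """Group points into k clusters without any class labels (unsupervised).

    Returns the cluster centers and the cluster index of each point.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]  # naive initialization (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels


centers, labels = k_means([(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)], k=2)
print(labels)  # [0, 1, 0, 1, 0]
```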

The pattern recognition techniques above extract the feature information of human motion. How can this information be used more fully and more reasonably, and translated into control of musical features? Two schemes are mainly adopted here: in the first, a human posture is mapped to a fixed note; in the second, a human posture is mapped to a music library.

The emotional information conveyed by body postures in this embodiment is described below.

The body is like a sensor that cannot be switched off, constantly conveying a person's mood, state and feelings. People cry bitterly when sad, dance for joy when excited, flush red when angry, laugh heartily when happy, and close their eyes when frightened. Musical performance, in turn, can express specific emotions such as sadness, anger, joy and fear. To grasp the emotional information conveyed by the human body more accurately and to find the link between posture and music, different human postures were studied; the most representative ones are listed below:

(1) One arm raised

(2) Both hands behind the back

(3) Arms crossed in front of the body

(4) Both arms raised

The emotional information conveyed by a posture is usually influenced by external factors such as the situation at the time, the closeness of the relationship and the cultural background. Even an identical posture can carry different meanings in different environments, cultures or social roles. This introduces considerable uncertainty into the research, which is affected by external factors in many ways. To keep the experimental results simple, intuitive and clear, the influence of external factors such as environment, culture and identity was excluded, and the motion postures were mapped directly to different notes or music modules, starting from musical form, musical style and the emotion expressed by the melody.

The conversion of motion postures into notes in this embodiment is described below:

The performer's left hand has only three postures: forearm raised upward, forearm held horizontal, and forearm pointing straight down, representing the high, middle and low registers respectively.

The seven notes of the high-register octave are denoted in order C3, D3, E♭3, F3, G3, A3 and B♭3; the corresponding motion postures are shown in Figure 1.

The seven notes of the middle-register octave are denoted in order C2, D2, E♭2, F2, G2, A2 and B♭2; the corresponding motion postures are shown in Figure 2.

The seven notes of the low-register octave are denoted in order C1, D1, E♭1, F1, G1, A1 and B♭1; the corresponding motion postures are shown in Figure 3.
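A minimal sketch of this first scheme, assuming the recognizer already outputs a label for the left-forearm posture ("up", "flat", "down") and a note name for the right-hand posture. The note names and the three registers come from the text; the absolute MIDI key numbers are an assumption (the middle-register C2 is taken as MIDI key 60, middle C), since the text does not fix them.

```python
# Semitone offsets above C for the seven notes C, D, E-flat, F, G, A, B-flat (Figures 1-3).
SCALE_OFFSETS = {"C": 0, "D": 2, "Eb": 3, "F": 5, "G": 7, "A": 9, "Bb": 10}

# Left-forearm posture -> register. The MIDI numbers are assumed:
# low register C1 -> 48, middle register C2 -> 60 (middle C), high register C3 -> 72.
REGISTER_BASE = {"down": 48, "flat": 60, "up": 72}


def posture_to_midi(left_forearm: str, right_hand_note: str) -> int:
    """Combine the left-hand register and the right-hand note into one MIDI key number."""
    try:
        return REGISTER_BASE[left_forearm] + SCALE_OFFSETS[right_hand_note]
    except KeyError as err:
        raise ValueError(f"unrecognised posture label: {err}") from None


print(posture_to_midi("up", "Bb"))  # 82, B-flat in the high register
```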

This scheme covers a wide range: the right-hand postures are uniform across the three octaves (high, middle and low), the postures for the seven notes are simple and clear, and the performer can easily associate them with the notes. However, a "transition repetition" phenomenon often occurs during recognition. "Transition repetition" means that while the performer moves from the posture of one note to the posture of another, a posture representing some other note appears in between. For example, moving from the C1 posture to the E♭1 posture necessarily passes through the D1 posture, and during image processing the D1 posture is also recognized as a note, which corrupts the melody the performer intended to play.

The optimized scheme is introduced below.

The optimized scheme adds leg postures: legs together indicate the low register, legs slightly apart indicate the middle register, and legs wide apart indicate the high register.

Figures 4, 5 and 6 show the motion postures representing the low, middle and high registers, respectively, in the optimized scheme.

Adjacent keys on a piano (including black keys) differ by one semitone. The seven notes within an octave are denoted in order C, D, E♭, F, G, A and B♭, and different combined postures of both hands represent these seven notes. The advantage of this scheme is that, because the hand postures are based mainly on forearm movement, the "transition repetition" phenomenon essentially does not occur during performance.
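In the optimized scheme the register comes from the legs. A minimal sketch, assuming the image analysis supplies the distance between the ankles and the hip width (in pixels); the ratio thresholds 0.5 and 1.5 are hypothetical values chosen only for illustration, the MIDI base numbers are assumed, and the hand-combination recognizer is assumed to output a note name directly.

```python
SCALE_OFFSETS = {"C": 0, "D": 2, "Eb": 3, "F": 5, "G": 7, "A": 9, "Bb": 10}
REGISTER_BASE = {"low": 48, "middle": 60, "high": 72}  # assumed MIDI numbers

# Hypothetical thresholds on the ratio ankle distance / hip width.
LEGS_TOGETHER_MAX = 0.5
LEGS_SLIGHTLY_APART_MAX = 1.5


def register_from_legs(ankle_distance: float, hip_width: float) -> str:
    """Legs together -> low, slightly apart -> middle, wide apart -> high."""
    if hip_width <= 0:
        raise ValueError("hip_width must be positive")
    ratio = ankle_distance / hip_width  # normalised so the result does not depend on image scale
    if ratio < LEGS_TOGETHER_MAX:
        return "low"
    if ratio < LEGS_SLIGHTLY_APART_MAX:
        return "middle"
    return "high"


def optimized_posture_to_midi(ankle_distance: float, hip_width: float, hand_note: str) -> int:
    """Leg posture selects the register; the two-hand posture selects the note."""
    return REGISTER_BASE[register_from_legs(ankle_distance, hip_width)] + SCALE_OFFSETS[hand_note]


print(optimized_posture_to_midi(10.0, 40.0, "G"))  # legs together: 48 + 7 = 55
```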

However, the actions in this scheme are not intuitive, and the performer cannot easily remember what each gesture means. To play a given melody, the performer must first learn the meaning of every posture, then mentally translate the intended melody into the corresponding postures, and finally perform the resulting sequence of actions smoothly. This scheme therefore places higher demands on the performer.

Next is the conversion of motion postures into a music library. In the conversion of postures into notes, each posture corresponds to a single note, so playing a whole passage with harmony requires very complicated actions. To overcome this problem, the different motion postures are mapped to a music library: a posture no longer represents a single note but several bars or a melodic passage. When the performer makes a series of movements at varying speeds, combinations of several melodic passages are played at correspondingly varying speeds. Because the passages are chosen freely, combining passages of different styles may sound harsh or strange, but it may also produce fairly graceful melodies. Repeated trials and experiments showed that when the music snippets are similar in style and form, the reconstructed music sounds more complete and interesting, without a patched-together feel. On this basis, a large number of music files were collected, then segmented, trimmed and compressed, finally building a music library with different artistic styles and forms of expression. The library was then classified according to musical form, musical style and the emotion expressed by the melody.
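A minimal sketch of the music-library scheme, following steps 2 and 3 of the method (fetch the music data matching each stored posture, then arrange it in posture order). The library layout (style, then posture name, then segment), the posture and style names, and the choice to skip unmatched postures are all hypothetical. Drawing every segment from one style reflects the observation above that similar styles sound less patched together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Segment:
    """One library entry: a few bars or a melodic passage."""
    name: str
    notes: List[Tuple[int, float]] = field(default_factory=list)  # (MIDI key, duration in beats)


def reconstruct(
    posture_sequence: Sequence[str],
    library: Dict[str, Dict[str, Segment]],
    style: str,
) -> List[Segment]:
    """Look up the segment for each recognised posture (step 2) and
    arrange the segments in the order the postures occurred (step 3)."""
    bank = library[style]  # all segments come from one style
    piece = []
    for posture in posture_sequence:
        segment = bank.get(posture)
        if segment is not None:  # postures without a library entry are skipped (assumption)
            piece.append(segment)
    return piece


library = {
    "calm": {
        "one_arm_raised": Segment("calm-1", [(60, 1.0), (64, 1.0)]),
        "both_arms_raised": Segment("calm-2", [(67, 2.0)]),
    }
}
piece = reconstruct(["both_arms_raised", "arms_crossed", "one_arm_raised"], library, "calm")
print([s.name for s in piece])  # ['calm-2', 'calm-1']
```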

First, the basis for dividing the music library is introduced.

There are three main methods for dividing phrases:

1. Division by bars: this method is relatively simple and suits common, typical periods; in a melodic line of medium tempo and moderate complexity, such a period usually contains 8 to 16 bars.

2. Division by markers: phrases are divided according to markers such as rests, long sustained notes, changes of register, changes of dynamics, different harmonic cadences, and the return of the original key or rhythmic pattern.

3. Intelligent division: combines the advantages of the two methods above and also divides phrases according to the autocorrelation of the melody.

Because repetition is the most important factor in musical form, periods are divided mainly by searching for similar phrases.
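The text does not define how the autocorrelation of a melody is measured. A minimal sketch, assuming it means the normalized autocorrelation of the pitch sequence (MIDI key numbers): the lag with the strongest self-similarity is taken as a candidate phrase length, which matches the idea of finding similar, repeated phrases. Real melodies would also need rhythm and transposition handling, which is not shown.

```python
from typing import Sequence


def autocorrelation(values: Sequence[float], lag: int) -> float:
    """Normalised autocorrelation of a mean-centred sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    energy = sum(c * c for c in centred)
    if energy == 0:
        return 0.0
    return sum(centred[i] * centred[i + lag] for i in range(n - lag)) / energy


def estimate_phrase_length(pitches: Sequence[int], min_lag: int, max_lag: int) -> int:
    """Return the lag (in notes) at which the melody best repeats itself."""
    if not 1 <= min_lag <= max_lag < len(pitches):
        raise ValueError("need 1 <= min_lag <= max_lag < len(pitches)")
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(pitches, lag))


phrase = [60, 62, 64, 65, 67, 65, 64, 62]
melody = phrase * 3  # the same 8-note phrase repeated three times
print(estimate_phrase_length(melody, 2, 12))  # 8
```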

Once the music library has been divided in this way, the performer can systematically choose suitable music modules according to his or her mood. This makes the composition process more user-friendly: the performer can reconstruct and re-create music snippets according to his or her own mood.

After image recognition and conversion have been implemented, music generation is the last problem the system must solve. Many mature music generation techniques are in use today, and their respective advantages and disadvantages were weighed. Starting from the technical feasibility of MIDI music, the various messages and channels used in music generation, and the ways they are transmitted and controlled, are described in detail below.

First, the concept and composition of MIDI are introduced.

MIDI is the abbreviation of Musical Instrument Digital Interface. The purpose of defining MIDI is to establish a communication protocol and format through which digital music equipment and computer software and hardware can communicate; otherwise different instruments could not be linked and combined to provide powerful musical functions. MIDI can therefore be described as a computerized extension of traditional music. To communicate musically, a common language must be defined. There are many kinds of musical notation, of which Western staff notation is the most widely used: it represents the duration and pitch of a note by its shape and position, together with text and symbols indicating volume, playing technique and other attributes. The MIDI format covers the performance instructions of traditional scores and adds digitized attributes and instructions for instrument assignment and for software and hardware, so that different digital music devices can communicate in the form of MIDI messages. For example, when a MIDI file is played in MIDI editing software, the software sends a sequence of MIDI messages to the sound card, and the sound card, following the MIDI format, produces the corresponding sounds with the instrument, pitch, volume, duration and other attributes specified by the messages.

Next, the formats and structure of MIDI are introduced.

The three most widely used MIDI formats are:

(1) General MIDI: the most basic MIDI format.

(2) GS: a MIDI format defined by Roland that offers more timbres and sound-effect variations on top of General MIDI.

(3) XG: a MIDI format defined by Yamaha that offers more timbres and sound-effect variations on top of General MIDI.

This system uses the basic General MIDI format. MIDI devices communicate by transmitting MIDI messages, which travel from a MIDI port over a MIDI cable.

There are three kinds of MIDI port: IN, OUT and THRU. A MIDI message is sent from the OUT port of one device to the IN port of another. The THRU port passes the messages received at the IN port on to other MIDI devices; it is needed when more than two MIDI devices are connected.

Each MIDI port supports 16 channels. A channel carries one timbre, and the 16 channels can sound simultaneously, like 16 different instruments playing together. Channels 1 to 9 and 11 to 16 are defined for melodic instruments such as piano, guitar and flute, while channel 10 is defined for non-melodic rhythm instruments, such as a drum set. There are 128 selectable melodic timbres, and both the instrument types and their order are specified. Channel 10 is reserved for rhythm timbres and includes 47 of them. At least 24 notes can sound simultaneously, including at least 16 melodic notes and 8 rhythm notes (channel 10). Sound production on a MIDI device is controlled by MIDI messages, which are transmitted with priority given to channel 10, then channels 1 to 9, then channels 11 to 16. When so many notes sound at once that the MIDI device is overloaded, or when notes on the same channel overlap, the earlier note is cut off abruptly so that the next note can sound. Middle C is defined as MIDI key 60.
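Since middle C is key 60 and each octave spans 12 keys, a note name converts to a key number with simple arithmetic. A minimal sketch, assuming scientific pitch notation in which middle C is written C4; octave-naming conventions differ between manufacturers (some label key 60 as C3), so the octave labels here are an assumption and are unrelated to the register labels C1, C2 and C3 used for the postures earlier.

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_key(name: str, octave: int, accidental: int = 0) -> int:
    """Convert a note name to a MIDI key number (0-127).

    Assumes scientific pitch notation, in which middle C is C4 = key 60.
    `accidental` is -1 for a flat and +1 for a sharp.
    """
    key = 12 * (octave + 1) + NOTE_OFFSETS[name] + accidental
    if not 0 <= key <= 127:
        raise ValueError(f"{name}{octave} is outside the MIDI key range 0-127")
    return key


print(note_to_key("C", 4))      # 60, middle C
print(note_to_key("E", 4, -1))  # 63, E-flat above middle C
```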

The main attributes of a MIDI note are:

(1) Channel: the channel to which the note belongs.

(2) Pitch: the pitch of the note.

(3) Velocity: the intensity of the note, generally how hard a key on the MIDI keyboard is struck; for a MIDI guitar it represents how hard the string is plucked. MIDI controllers (Control Change messages) can change the sounding characteristics of a MIDI note.
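The three attributes above map directly onto the bytes of a MIDI Note On message: the status byte carries the message type in its high four bits and the channel in its low four bits, followed by the key number and the velocity, each 0 to 127. A minimal sketch, assuming channels are numbered 1 to 16 as in the text (on the wire they are 0 to 15); the helper names are illustrative, not from any library.

```python
def _status(kind: int, channel: int) -> int:
    """Combine a message type (high nibble) with a channel numbered 1-16."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    return kind | (channel - 1)  # channels are 0-15 in the low nibble


def _data(value: int) -> int:
    if not 0 <= value <= 127:
        raise ValueError("MIDI data bytes must be 0-127")
    return value


def note_on(channel: int, key: int, velocity: int) -> bytes:
    """Channel, pitch (key number) and velocity packed into a 3-byte Note On message."""
    return bytes([_status(0x90, channel), _data(key), _data(velocity)])


def note_off(channel: int, key: int, velocity: int = 0) -> bytes:
    return bytes([_status(0x80, channel), _data(key), _data(velocity)])


print(note_on(1, 60, 100).hex())  # 903c64: middle C on channel 1
print(note_on(10, 38, 90).hex())  # 99265a: key 38 (Acoustic Snare in General MIDI) on channel 10
```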

Other commonly used MIDI messages:

(1) Program Change: changes the instrument.

(2) Pitch Bend Change: bends (glides) the pitch; like Modulation, it is usually controlled with the joystick or wheel of a MIDI keyboard.

(3) Channel Key Pressure (Aftertouch): while a key is held during performance, varying the pressure on it produces a transition in intensity; only higher-end MIDI keyboards support this function.
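A minimal sketch of how the three messages above are encoded in MIDI 1.0, assuming channels numbered 1 to 16 as in the text. Program Change (status 0xCn) and Channel Pressure (0xDn) carry one data byte; Pitch Bend (0xEn) carries a 14-bit value, sent low 7 bits first, where 8192 means no bend. The helper names are illustrative.

```python
def _status(kind: int, channel: int) -> int:
    """Combine a message type (high nibble) with a channel numbered 1-16."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    return kind | (channel - 1)


def program_change(channel: int, program: int) -> bytes:
    """Select one of 128 instruments (program 0-127) on a channel."""
    if not 0 <= program <= 127:
        raise ValueError("program must be 0-127")
    return bytes([_status(0xC0, channel), program])


def pitch_bend(channel: int, value: int) -> bytes:
    """14-bit bend value 0-16383 (8192 = no bend), sent as LSB then MSB, 7 bits each."""
    if not 0 <= value <= 0x3FFF:
        raise ValueError("pitch bend must be 0-16383")
    return bytes([_status(0xE0, channel), value & 0x7F, value >> 7])


def channel_pressure(channel: int, pressure: int) -> bytes:
    """Channel aftertouch: one pressure value (0-127) for the whole channel."""
    if not 0 <= pressure <= 127:
        raise ValueError("pressure must be 0-127")
    return bytes([_status(0xD0, channel), pressure])


print(program_change(1, 24).hex())  # c018: program 24 (zero-based) is nylon acoustic guitar in General MIDI
print(pitch_bend(1, 8192).hex())    # e00040: centre position, no bend
```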

Finally, the feasibility and advantages of using MIDI in this embodiment are described.

In industrial and scientific applications, a common sound-producing device is a single-chip microcontroller programmed in assembly language or C to drive a buzzer that emits a few tones of fixed frequency. This system, however, places high demands on sound production: once a person enters the image-capture area, the performer may move freely, so the system must be able to generate large amounts of music in different styles. The sound-producing part must therefore control the three main musical elements, timbre, pitch and duration, accurately. A buzzer emits only a few monotonous notes of fixed duration, with no pitch accuracy and poor timbre, and falls far short of these requirements.

To address these problems, another sound-generation approach was tried: writing MIDI files. While studying the format and generation of Standard MIDI Files, it was found that although writing MIDI files can create melodies in many styles, a MIDI file cannot be modified once it has been generated; modification is cumbersome and amounts to producing a new MIDI file, which does not meet the requirements of this system. In this system the information generated by human motion is voluminous and continuous, with different postures at different moments, so the system must respond in real time.

Various programs commonly used in professional music production were also examined, such as CuteMIDI numbered-notation composer (shareware) V1.50, Cubase, SONAR and Pro Tools 7.0. However, these programs are all integrated packages: their input is standard numbered notation or staff notation and their output is the corresponding melody, so they also do not meet the requirements of this system. In this system the images are ultimately converted into codes, and the system must be able to control the music by means of those codes.

The solution finally adopted was to access and control the sound-card device through device-independent function calls. Sound cards usually also include a MIDI device, hardware that plays notes in response to short binary command messages. MIDI hardware can usually also be connected by cable to a MIDI input device such as a music keyboard, and an external MIDI synthesizer can also be added to the sound card. The Windows API includes functions prefixed midiIn and midiOut, used respectively to read MIDI sequences from external controllers and to play music on internal or external synthesizers; using these functions does not require knowledge of the hardware interface of the MIDI card. To prepare for playing music, a MIDI output device is opened by calling midiOutOpen; once the device is open and its handle has been obtained, MIDI messages can be sent to it. For example, to play middle C (note 0x3C) at velocity 0x7F on MIDI channel 5 (counting channels from 0), the 3-byte Note On message 0x95 0x3C 0x7F is needed, and it is sent by calling midiOutShortMsg(hMidiOut, dwMessage). The 128 data values from 00000000 to 01111111 (0 to 127) represent 128 keys. Another typical MIDI message, Program Change, changes the instrument sound on a particular channel. Its general format is Cn pp: the first byte, called the status byte, carries the channel number n, and pp ranges from 0 to 127 to select one of 128 different sounds. Because access and control are implemented through function calls, the codes can easily be passed into each function, and the control logic can be adjusted according to different input conditions. This method not only provides precise control of timbre, pitch and duration but also achieves real-time output, meeting the requirements of the system.
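midiOutShortMsg takes the whole short message as a single 32-bit value, with the first message byte (the status byte) in the low-order byte, the first data byte next, and the second data byte above it. A minimal sketch of that packing, using the Note On example from the text; the function names are illustrative. The actual call into the Windows multimedia library (winmm) is not shown, because it only runs on Windows.

```python
def pack_short_message(status: int, data1: int = 0, data2: int = 0) -> int:
    """Pack a MIDI short message into the DWORD layout used by midiOutShortMsg:
    status byte lowest, then the first data byte, then the second data byte."""
    if not 0x80 <= status <= 0xFF:
        raise ValueError("status byte must be 0x80-0xFF")
    if not (0 <= data1 <= 0x7F and 0 <= data2 <= 0x7F):
        raise ValueError("data bytes must be 0x00-0x7F")
    return status | (data1 << 8) | (data2 << 16)


def unpack_short_message(dw_message: int) -> tuple[int, int, int]:
    return dw_message & 0xFF, (dw_message >> 8) & 0xFF, (dw_message >> 16) & 0xFF


# Example from the text: Note On, channel 5 (counting from 0), middle C, velocity 0x7F.
dw_message = pack_short_message(0x95, 0x3C, 0x7F)
print(hex(dw_message))  # 0x7f3c95

# Program Change ("Cn pp") on the same channel, selecting sound pp = 24.
print(hex(pack_short_message(0xC5, 24)))  # 0x18c5
```

On Windows, the resulting integer is what would be passed as the message argument of midiOutShortMsg on a handle obtained from midiOutOpen.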

Because MIDI itself has a simple structure, it also has some shortcomings that cannot be overcome for now: the sound quality lacks realism, the timbre differs completely across different software and hardware combinations, and standard audio-visual equipment cannot play it. Weighing MIDI's overall advantages and disadvantages against the needs of music conversion and generation, real-time performance and user-defined melody editing, MIDI is a suitable choice for music generation. Because it depends relatively little on hardware, it also reduces research cost and improves the feasibility of implementing the system.

The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.

Claims

1. A music reconstruction method based on movement image analysis, characterized by comprising the following steps:

Step 1: acquiring two-dimensional images, i.e., the motion postures of the human body, and storing them in a temporary database in the order in which the motion postures occur;

2. The music reconstruction method based on movement image analysis according to claim 1, characterized in that the step of acquiring the human motion postures in step 1 comprises:

Step d: classifying the extracted features.

3. The music reconstruction method based on movement image analysis according to claim 1, characterized in that the music data in step 2 are single notes or a music library.

4. The music reconstruction method based on movement image analysis according to any one of claims 1 to 3, characterized in that the motion postures comprise single-arm, two-arm, single-leg and two-leg movements of the human body.

5. The music reconstruction method based on movement image analysis according to claim 3, characterized in that the music library comprises phrases or periods.

6. The music reconstruction method based on movement image analysis according to claim 4, characterized in that the music data are in MIDI format.