CN110599987A - Piano note recognition algorithm based on convolutional neural network - Google Patents
Piano note recognition algorithm based on convolutional neural network
- Publication number
- CN110599987A (application CN201910787062.4A)
- Authority
- CN
- China
- Prior art keywords
- note
- neural network
- piano
- short
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013527 convolutional neural network Methods 0.000 title claims abstract description 15
- 238000013528 artificial neural network Methods 0.000 claims abstract description 11
- 238000001514 detection method Methods 0.000 claims abstract description 11
- 230000005236 sound signal Effects 0.000 claims abstract description 9
- 239000011295 pitch Substances 0.000 claims description 14
- 238000012216 screening Methods 0.000 claims description 6
- 210000002569 neuron Anatomy 0.000 claims description 4
- 238000011176 pooling Methods 0.000 claims description 4
- 238000010606 normalization Methods 0.000 claims description 2
- 230000035772 mutation Effects 0.000 claims 1
- 238000000034 method Methods 0.000 abstract description 12
- 238000004364 calculation method Methods 0.000 abstract description 4
- 230000007547 defect Effects 0.000 abstract 1
- 238000005070 sampling Methods 0.000 description 6
- 238000005311 autocorrelation function Methods 0.000 description 5
- 238000009432 framing Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 238000001914 filtration Methods 0.000 description 3
- 230000011218 segmentation Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 230000000737 periodic effect Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Auxiliary Devices For Music (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention discloses a piano note recognition algorithm based on a convolutional neural network, which mainly comprises the following steps: finding the starting point and end point of each note in a continuous piano recording through an endpoint detection algorithm; dividing the complete piano audio into a set of single-note audio files; drawing a spectrogram of each note; and inputting the spectrogram into a trained neural network to complete recognition. The invention provides an algorithm that searches for peak points of the short-time energy difference and combines them with double thresholds, overcoming the traditional double-threshold algorithm's excessive dependence on threshold settings. By drawing spectrograms, the audio signal is converted into digital images for recognition, which avoids the frequency-doubling errors produced when the fundamental frequency is extracted by traditional time-domain methods, and greatly improves calculation speed and accuracy compared with traditional frequency-domain methods.
Description
Technical Field
The invention belongs to an audio signal processing technology, and particularly relates to a piano note identification algorithm based on a convolutional neural network.
Background
With economic development and cultural growth, the number of music enthusiasts keeps increasing, but constrained by factors such as energy and time, a considerable proportion of them choose to self-study and practice in their spare time. Lacking professional guidance, they often play wrong notes without being able to judge this themselves; software that can automatically recognize the notes of a piano performance can help them to a great extent. At the same time, piano note recognition can reduce the workload of music professionals and benefit the intelligent processing and creation of music.
The piano note identification algorithm mainly comprises an end point detection part, a note segmentation part and a pitch identification part.
Endpoint detection and note segmentation are key steps before note recognition, and accurate endpoint detection is a precondition for accurate note recognition. The double-threshold algorithm is the most classical endpoint detection method. It sets high and low thresholds for the short-time energy (denoted δ1 and δ2) and for the short-time zero-crossing rate (denoted Z1 and Z2), dividing a complete audio file into four stages: 1. silent segment: short-time energy below δ2; 2. transition segment: short-time energy greater than δ2 but below δ1, and short-time zero-crossing rate greater than Z2; 3. music segment: short-time energy greater than δ1 and short-time zero-crossing rate greater than Z1; 4. ending segment: short-time energy below δ2 or short-time zero-crossing rate below Z2. In practice, noise must also be taken into account, so in addition to the four thresholds above, a shortest tone-segment length and a longest transition-segment length are set to distinguish noise and prevent premature truncation of a tone. The accuracy of the algorithm therefore depends mainly on the threshold settings, and the thresholds are usually derived from the background sound of the first few frames of the recording. This places requirements on the recording itself: if a small pop occurs at the beginning of the recording, the accuracy drops sharply, so the method lacks practicality.
Conventional pitch recognition research has focused on the time domain and the frequency domain. The short-time autocorrelation function measures the similarity of a signal with a delayed copy of itself and is commonly used to detect synchronism and periodicity. Because the autocorrelation necessarily reaches a maximum at integer multiples of the period, it provides an important basis for extracting the piano pitch, i.e. the fundamental frequency, with the short-time autocorrelation function. The traditional autocorrelation method extracts the fundamental frequency by drawing the short-time autocorrelation curve: the autocorrelation shows a peak at each pitch period, so the interval between two adjacent peaks is the pitch period. In general, however, the fundamental component is not the strongest component, and rich harmonic components make the waveform of the audio signal very complex, so frequency-doubling errors often occur, i.e. the estimated fundamental frequency is double or half the actual fundamental frequency. The wavelet analysis method, from applied mathematics, performs localized transforms of the signal in time and frequency and can effectively extract the fundamental-frequency information in a music signal. Concretely, a wavelet-component curve at a given level is drawn; the number n of sampling points between two maxima in the curve reflects the pitch period. The level is then changed repeatedly and the number of sampling points between adjacent maxima is recalculated at each level; if this number no longer changes, the fundamental frequency is determined.
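As a concrete illustration of the time-domain approach described above (not part of the patent), a minimal autocorrelation pitch estimator can be sketched in Python; the search band and the test tone are illustrative assumptions:

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=60.0, fmax=2000.0):
    """Estimate the fundamental frequency from the dominant autocorrelation peak.

    fmin/fmax bound the lag search band and are illustrative choices.
    """
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation, lags >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)            # lag range for the band
    lag = lo + np.argmax(r[lo:hi])                     # lag of the strongest peak
    return fs / lag

fs = 8000
x = np.sin(2 * np.pi * 220.0 * np.arange(2048) / fs)   # a pure 220 Hz test tone
pitch = autocorr_pitch(x, fs)
print(round(pitch))   # close to 220 Hz
```

On a pure tone this recovers the fundamental to within the lag quantization; on real piano audio with strong harmonics it exhibits exactly the doubling/halving errors the passage describes.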
Therefore, although the wavelet analysis method can effectively extract the fundamental frequency, the calculation amount is huge because wavelet components under different levels are calculated.
In summary, for endpoint detection the traditional double-threshold algorithm depends too heavily on the threshold settings. For fundamental-frequency extraction in pitch recognition, traditional time-domain methods are prone to frequency-doubling errors and have low accuracy, while traditional frequency-domain methods have high algorithmic complexity, a large computational load and low efficiency. Both classes of methods also demand a high signal-to-noise ratio and cannot accurately process audio signals with a low signal-to-noise ratio.
Disclosure of Invention
The invention aims to provide a piano note identification algorithm based on a convolutional neural network.
The technical solution for realizing the purpose of the invention is as follows: a piano note identification algorithm based on a convolutional neural network comprises the following steps:
step 1, finding out a starting point and an end point of each note from a continuous piano audio through an end point detection algorithm;
step 2, dividing the complete piano audio into a set of single note audio files according to the starting point and the ending point of each note;
step 3, drawing a spectrogram of each note;
and 4, inputting the spectrogram into the trained neural network to finish recognition.
Compared with the prior art, the invention has the following remarkable advantages: (1) compared with the traditional double-threshold algorithm, the short-time energy difference and double-threshold-based endpoint detection algorithm provided by the invention does not excessively depend on the setting of the threshold value, and has high accuracy; (2) compared with the traditional time-frequency domain method, the algorithm for identifying the piano pitch by using the convolutional neural network provided by the invention has the advantages of no frequency doubling error, strong noise resistance, simple algorithm, high operation speed and high accuracy.
Drawings
FIG. 1 is a flow chart of the piano note identification algorithm based on the convolutional neural network of the present invention.
FIG. 2 is a diagram of a neural network used in the present invention.
Fig. 3 is a short time energy plot.
Fig. 4 is a graph illustrating a short-time energy difference curve.
Fig. 5 is a diagram illustrating a short-time energy difference peak point.
FIG. 6 is a schematic diagram of short-term energy difference peak screening.
Detailed Description
As shown in FIG. 1, the piano note identification algorithm based on the convolutional neural network of the present invention comprises the following steps:
step 1, reading a section of audio signal, performing framing and windowing on the audio signal, and performing normalization pretreatment.
Framing and windowing represent the music signal, a non-stationary process, as a combination of frame sequences that are approximately stationary and time-invariant; this is the basis for the subsequent steps of computing the relevant features of the music signal.
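As an illustration of this preprocessing step, normalization, framing and Hanning windowing can be sketched as follows; the frame length and hop size are illustrative assumptions, since the patent fixes only the window type and the 44100 Hz sampling rate:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Normalize, split into overlapping frames, and apply a Hanning window.

    frame_len and hop are illustrative values, not specified by the patent.
    """
    x = x / (np.max(np.abs(x)) + 1e-12)      # amplitude normalization
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hanning(frame_len)    # window every frame

# One second of a 440 Hz test tone at the 44100 Hz sampling rate used in the patent
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
frames = frame_signal(x)
print(frames.shape)   # (85, 1024)
```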
Step 2, calculating and drawing the short-time energy difference curve of two adjacent frames. The short-time energy and short-time energy difference formulas are:

Ei = Σ Si(m)^2, summed over m = 1, ..., L

ΔEi = Ei − Ei-1

where Si(m) is the amplitude of the m-th point of the i-th frame and L is the frame length.
since short-time energy difference information between frames is calculated, Δ EiFiltering micro energy fluctuation in a part of original signals, smoothing energy change of the whole audio information, and adopting difference operation to calculate difference value delta E of two adjacent framesiThe note onset is easier to determine than the energy of the short duration of each frame.
And step 3, searching and marking all maximum value points (peak value points) in the curve as candidate note starting points.
At this point the set of peak points contains both peaks caused by background noise in the audio signal and the extreme points of the note signal, so it must be filtered.
And 4, setting the minimum peak height according to the background environment sound, and setting the shortest distance between adjacent peak points according to the playing speed.
The minimum peak height is mainly used to filter background noise; the shortest distance between adjacent peak points is mainly used to filter pseudo end points within a note, preventing one note from being cut multiple times, and needs to be adjusted according to the tempo of the performance.
And 5, screening the peak points marked in step 3 according to the minimum peak height and shortest peak distance set in step 4; the frames corresponding to the retained points are the starting points of the notes.
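Steps 3 to 5 amount to peak picking with a minimum height and a minimum inter-peak distance, which can be sketched with SciPy's `find_peaks`; the curve and threshold values below are illustrative, not the patent's:

```python
import numpy as np
from scipy.signal import find_peaks

# A toy short-time energy difference curve (one value per frame); the
# height and distance thresholds are illustrative, not the patent's values.
delta_E = np.array([0.0, 0.1, 2.0, 0.1, 0.05, 1.8, 0.1, 0.02, 0.3, 0.0])
peaks, _ = find_peaks(delta_E, height=0.5, distance=2)
print(list(peaks))   # [2, 5] -- frames retained as note starting points
```

Here `height` plays the role of the minimum peak height (rejecting the small noise peak at frame 8) and `distance` the role of the shortest inter-peak distance.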
Step 6, calculating the short-time zero-crossing rate of each frame, with the formula:

Zi = (1/2) Σ |sgn(Si(m)) − sgn(Si(m−1))| · w(m), summed over m = 1, ..., L − 1

where w(n) is a window function and sgn denotes the sign function, defined as:

sgn(x) = 1 for x ≥ 0; sgn(x) = −1 for x < 0.
the short-time zero-crossing rate measurement has the significance that the periodic change of the signal can be reflected to a certain extent. For sampled sinusoidal periodic signals, the average zero crossing rate must be twice the signal frequency multiplied by the sampling period, and when the sampling period is fixed, the zero crossing rate reflects the signal frequency information. Especially for regular musical tone signals, the zero-crossing rate is distributed in a certain range, and the rule can be used for distinguishing musical tones from noise because the zero-crossing rate of the noise is larger.
And 7, setting two thresholds of short-time energy and short-time zero-crossing rate, and respectively calculating corresponding end points of each starting point obtained in the step 5.
And 8, judging the position of the end point corresponding to each start point: if an end point falls after the next start point, take the frame 10 frames before that next start point as the corresponding end point.
And 9, calculating the difference value of each pair of start and stop points, judging the difference value as noise if the difference value is smaller than the set shortest note length, deleting the pair of start and stop points from the set, and finally obtaining the start and stop points of all notes.
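The screening of step 9 reduces to a simple filter on start/end pairs; the `min_len` threshold and the toy pairs below are illustrative:

```python
def filter_short_notes(pairs, min_len):
    """Keep only start/end pairs at least min_len frames long.

    min_len is the shortest-note-length threshold; the value is illustrative.
    """
    return [(s, e) for (s, e) in pairs if e - s >= min_len]

pairs = [(10, 60), (65, 68), (70, 130)]        # (65, 68) is a 3-frame burst: noise
print(filter_short_notes(pairs, min_len=10))   # [(10, 60), (70, 130)]
```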
Because steps 8 and 9 re-judge and re-screen each start and end point, the algorithm's dependence on threshold settings is greatly reduced and the accuracy is improved.
And step 10, dividing the continuous notes in the audio into single notes according to the start and stop point information obtained in the step 9.
And step 11, drawing a spectrogram of each note.
And step 12, inputting the spectrogram into a trained neural network to obtain the pitch. The neural network structure is shown in fig. 2. All convolution kernels in the network are 3 x 3 in size, the pooling layers are in maximum pooling, the number of neurons in the fully-connected layer 1 is 1024, the number of neurons in the fully-connected layer 2 is 88, and the size corresponds to 88 pitches of the piano.
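The patent specifies 3 x 3 kernels, max pooling, and fully connected layers of 1024 and 88 units, but not the network depth or input resolution. Under the assumption of two conv/pool stages and a 128 x 128 input (both illustrative, not from the patent), the spatial sizes work out as:

```python
def conv_out(size, k=3, stride=1, pad=0):
    # spatial size after a k x k convolution
    return (size - k + 2 * pad) // stride + 1

def pool_out(size, k=2):
    # spatial size after k x k max pooling
    return size // k

h = w = 128                      # assumed input resolution (not stated in the patent)
for _ in range(2):               # assumed two conv/pool stages (not stated either)
    h, w = conv_out(h), conv_out(w)
    h, w = pool_out(h), pool_out(w)
print(h, w)   # 30 30 -> flattened, then fc1 (1024 units), then fc2 (88 pitches)
```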
The present invention will be described in detail below with reference to the accompanying drawings and examples.
Examples
The audio file used in the embodiment is a manually recorded piano performance containing 8 notes in total.
Step 1, after the recording file is obtained, framing and windowing are performed on it. The sampling rate is 44100 Hz, and the window function is the commonly used Hanning window, defined as:

w(n) = 0.5 × (1 − cos(2πn / (N − 1))), 0 ≤ n ≤ N − 1

where N is the window length.
step 2, after the framing operation is finished, according to a formula:
ΔEi=Ei-Ei-1
respectively calculating the short-time energy and the short-time zero crossing rate of each frame and the short-time energy difference of two adjacent frames, storing the results in an array and drawing a curve, wherein the short-time energy is shown in figure 3, and the short-time energy difference is shown in figure 4.
And 3, after the short-time energy difference curve is obtained, all peaks in the curve, i.e. the maximum points, are found and marked with red asterisks, and the peak positions and values are stored in an array for later use. As shown in FIG. 5, the peaks caused by background noise are generally small and differ markedly from the peaks near note onsets. Several peaks are detected within the duration of one note: the highest is the true note start, while the adjacent smaller peaks are pseudo end points; the peak of a pseudo end point is slightly lower than that of the true start, and its distance to the true start is small.
And 4, setting the minimum peak height and the shortest peak distance. The minimum peak height is an empirical value that only needs to separate piano tones from environmental background sound; the shortest peak distance is related to the tempo used when playing. All marked peak points are then screened against these two thresholds: points below the minimum peak height, i.e. noise, are deleted from the array together with their peak values; among points above the minimum peak height whose distance to an adjacent peak is below the shortest peak distance, the point with the larger peak value is kept and the other, a pseudo end point, is deleted. The final screening result is shown in fig. 6.
And 5, after the candidate start points of all notes are obtained, starting from each start point, judge frame by frame whether the short-time energy and the short-time zero-crossing rate simultaneously satisfy the threshold conditions. If one frame satisfies the end-point conditions, continue to check whether the next 9 frames also satisfy them; if so, that frame is a candidate end point, otherwise continue until one is found. After a candidate end point is obtained, check whether it lies before the next start point; if so, set it as the end point corresponding to the current start point; otherwise the search has failed, and the end point corresponding to the current start point is set to 5 frames before the next start point.
Repeat this step until the end points corresponding to all start points have been found, then store the start and end points in an array in one-to-one correspondence as the candidate note endpoints.
And 6, calculating the difference of each pair of start and end points and checking whether it is greater than the shortest note length; if so, the pair is kept, otherwise it is judged to be noise and deleted from the candidate note endpoints. This completes the endpoint detection.
And 7, after the start and end points of each note are obtained, for each pair read the segment of the original recording between the start point and its corresponding end point from step 6 and extract it into an independent audio file. Repeating this step completes the audio segmentation and yields one audio file per note, 8 in total, named 1 to 8 in note order.
And 8, drawing a spectrogram of all the note audio files obtained in the step 7, wherein the abscissa of the spectrogram represents time, the ordinate represents frequency, the color represents energy, and the picture name is consistent with the audio name.
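Computing the underlying time-frequency data of such a spectrogram can be sketched with SciPy; the FFT length and the test tone are illustrative assumptions:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 44100                                  # sampling rate used in the embodiment
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 880.0 * t)           # a pure tone near the pitch A5

# f: frequency bins (ordinate), ts: time bins (abscissa), S: energy (color)
f, ts, S = spectrogram(x, fs=fs, nperseg=1024)
peak_bin = f[np.argmax(S.mean(axis=1))]
print(round(peak_bin))   # strongest frequency bin, within one bin of 880 Hz
```

Plotting `S` (e.g. on a log scale) against `ts` and `f` produces the image that is fed to the network.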
And 9, inputting the spectrograms obtained in step 8 into the neural network, which automatically scales each image to the required input size; the network's computation finally yields the pitch corresponding to each note, output as a pitch name. The example audio contains 8 piano notes, and the final recognition results are A5, G5, E5, G5, C6, A5, G5 and A5, consistent with the pitches actually played and all correct.
Claims (6)
1. A piano note identification algorithm based on a convolutional neural network is characterized by comprising the following steps:
step 1, finding out a starting point and an end point of each note from a continuous piano audio through an end point detection algorithm;
step 2, dividing the complete piano audio into a set of single note audio files according to the starting point and the ending point of each note;
step 3, drawing a spectrogram of each note;
and 4, inputting the spectrogram into the trained neural network to finish recognition.
2. The convolutional neural network-based piano note identification algorithm as claimed in claim 1, wherein step 1 finds the start point of each note using abrupt-change information of the energy in the time domain, and calculates the end point of each note using the double-threshold algorithm combined with the start-point information;
the short-time energy formula is:

Ei = Σ Si(m)^2, summed over m = 1, ..., L

where Si(m) is the amplitude of the m-th point of the i-th frame audio signal and L is the frame length;
the short-time energy difference is the energy difference ΔEi between two adjacent frames, namely:

ΔEi = Ei − Ei-1
3. the convolutional neural network-based piano note identification algorithm as claimed in claim 2, wherein the end point detection algorithm based on short-time energy difference comprises the following steps:
A) calculating and drawing a short-time energy difference curve of two adjacent frames;
B) searching and marking all maximum value points in the curve as candidate note starting points;
C) setting a minimum peak height according to background environment sounds, and setting a shortest distance between adjacent peak points according to playing speed;
D) screening the peak points in the step B according to the minimum peak height and the minimum peak distance set in the step C, wherein the frame corresponding to the reserved points is the starting point of each note;
E) calculating the short-time zero-crossing rate of each frame, with the formula:

Zi = (1/2) Σ |sgn(Si(m)) − sgn(Si(m−1))| · w(m), summed over m = 1, ..., L − 1

where w(n) is a window function and sgn denotes the sign function, defined as:

sgn(x) = 1 for x ≥ 0; sgn(x) = −1 for x < 0;
F) setting two thresholds for the short-time energy and the short-time zero-crossing rate, and respectively calculating the end point corresponding to each start point obtained in step D;
G) judging the position of the end point corresponding to each start point: if an end point falls after the next start point, take the frame 10 frames before that next start point as the corresponding end point.
4. The convolutional neural network-based piano note identification algorithm as claimed in claim 3, wherein the read-in audio signal is subjected to frame windowing and normalization before endpoint detection.
5. The convolutional neural network-based piano note identification algorithm as claimed in claim 3, wherein the difference between each pair of start and stop points is calculated, if the difference is smaller than the set shortest note length, it is determined as noise, the pair of start and stop points is deleted from the set, and finally the start and stop points of all notes are obtained.
6. The convolutional neural network-based piano note identification algorithm as claimed in claim 1, wherein step 4 inputs the spectrogram into the trained neural network to obtain the pitch; all convolution kernel sizes in the neural network are 3 x 3, the pooling layer is maximum pooling, the number of neurons of the full connection layer 1 is 1024, the number of neurons of the full connection layer 2 is 88, and the size corresponds to 88 pitches of the piano.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910787062.4A CN110599987A (en) | 2019-08-25 | 2019-08-25 | Piano note recognition algorithm based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910787062.4A CN110599987A (en) | 2019-08-25 | 2019-08-25 | Piano note recognition algorithm based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110599987A true CN110599987A (en) | 2019-12-20 |
Family
ID=68855426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910787062.4A Pending CN110599987A (en) | 2019-08-25 | 2019-08-25 | Piano note recognition algorithm based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110599987A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111415681A (en) * | 2020-03-17 | 2020-07-14 | 北京奇艺世纪科技有限公司 | Method and device for determining musical notes based on audio data |
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111508480A (en) * | 2020-04-20 | 2020-08-07 | 网易(杭州)网络有限公司 | Training method of audio recognition model, audio recognition method, device and equipment |
CN111540378A (en) * | 2020-04-13 | 2020-08-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio detection method, device and storage medium |
CN112259063A (en) * | 2020-09-08 | 2021-01-22 | 华南理工大学 | Multi-tone overestimation method based on note transient dictionary and steady dictionary |
CN112420071A (en) * | 2020-11-09 | 2021-02-26 | 上海交通大学 | Constant Q transformation based polyphonic electronic organ music note identification method |
CN112509601A (en) * | 2020-11-18 | 2021-03-16 | 中电海康集团有限公司 | Note starting point detection method and system |
CN113593504A (en) * | 2020-04-30 | 2021-11-02 | 小叶子(北京)科技有限公司 | Pitch recognition model establishing method, pitch recognition method and pitch recognition device |
CN113658612A (en) * | 2021-08-25 | 2021-11-16 | 桂林智神信息技术股份有限公司 | Method and system for identifying played keys based on audio |
CN114283841A (en) * | 2021-12-20 | 2022-04-05 | 天翼爱音乐文化科技有限公司 | Audio classification method, system, device and storage medium |
CN116884438A (en) * | 2023-09-08 | 2023-10-13 | 杭州育恩科技有限公司 | Method and system for detecting musical instrument training sound level based on acoustic characteristics |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1093310A2 (en) * | 1999-09-28 | 2001-04-18 | Nortel Networks Limited | Tone detection using neural network |
US20060095254A1 (en) * | 2004-10-29 | 2006-05-04 | Walker John Q Ii | Methods, systems and computer program products for detecting musical notes in an audio signal |
US20080188967A1 (en) * | 2007-02-01 | 2008-08-07 | Princeton Music Labs, Llc | Music Transcription |
US20090151544A1 (en) * | 2007-12-17 | 2009-06-18 | Sony Corporation | Method for music structure analysis |
CN103325382A (en) * | 2013-06-07 | 2013-09-25 | 大连民族学院 | Method for automatically identifying Chinese national minority traditional instrument audio data |
CN104021789A (en) * | 2014-06-25 | 2014-09-03 | 厦门大学 | Self-adaption endpoint detection method using short-time time-frequency value |
CN104143324A (en) * | 2014-07-14 | 2014-11-12 | 电子科技大学 | Musical tone note identification method |
CN104217731A (en) * | 2014-08-28 | 2014-12-17 | 东南大学 | Quick solo music score recognizing method |
CN105976803A (en) * | 2016-04-25 | 2016-09-28 | 南京理工大学 | Note segmentation method based on music score |
CN108038146A (en) * | 2017-11-29 | 2018-05-15 | 无锡同芯微纳科技有限公司 | Musical performance artificial intelligence analysis method, system and equipment |
CN110136730A (en) * | 2019-04-08 | 2019-08-16 | 华南理工大学 | A kind of automatic allocation system of piano harmony and method based on deep learning |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1093310A2 (en) * | 1999-09-28 | 2001-04-18 | Nortel Networks Limited | Tone detection using neural network |
US20060095254A1 (en) * | 2004-10-29 | 2006-05-04 | Walker John Q Ii | Methods, systems and computer program products for detecting musical notes in an audio signal |
US20080188967A1 (en) * | 2007-02-01 | 2008-08-07 | Princeton Music Labs, Llc | Music Transcription |
CN101652807A (en) * | 2007-02-01 | 2010-02-17 | 缪斯亚米有限公司 | Music transcription |
US20090151544A1 (en) * | 2007-12-17 | 2009-06-18 | Sony Corporation | Method for music structure analysis |
CN103325382A (en) * | 2013-06-07 | 2013-09-25 | 大连民族学院 | Method for automatically identifying Chinese national minority traditional instrument audio data |
CN104021789A (en) * | 2014-06-25 | 2014-09-03 | 厦门大学 | Adaptive endpoint detection method using short-time time-frequency values |
CN104143324A (en) * | 2014-07-14 | 2014-11-12 | 电子科技大学 | Musical tone note identification method |
CN104217731A (en) * | 2014-08-28 | 2014-12-17 | 东南大学 | Fast solo music score recognition method |
CN105976803A (en) * | 2016-04-25 | 2016-09-28 | 南京理工大学 | Note segmentation method based on music score |
CN108038146A (en) * | 2017-11-29 | 2018-05-15 | 无锡同芯微纳科技有限公司 | Musical performance artificial intelligence analysis method, system and equipment |
CN110136730A (en) * | 2019-04-08 | 2019-08-16 | 华南理工大学 | Automatic piano harmony arrangement system and method based on deep learning |
Non-Patent Citations (2)
Title |
---|
Wu Yang (伍洋): "Fundamental Frequency Recognition of Musical Tones Based on MFCC and BP Neural Network", Information Science and Technology Collection (《信息科技辑》) * |
Li Siquan et al. (黎思泉等): "A Piano Note Endpoint Detection Algorithm Fusing Time-Frequency Information", Science Technology and Innovation (《科技与创新》) * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111415681A (en) * | 2020-03-17 | 2020-07-14 | 北京奇艺世纪科技有限公司 | Method and device for determining musical notes based on audio data |
CN111415681B (en) * | 2020-03-17 | 2023-09-01 | 北京奇艺世纪科技有限公司 | Method and device for determining notes based on audio data |
CN111508526B (en) * | 2020-04-10 | 2022-07-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111540378A (en) * | 2020-04-13 | 2020-08-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio detection method, device and storage medium |
CN111508480A (en) * | 2020-04-20 | 2020-08-07 | 网易(杭州)网络有限公司 | Training method of audio recognition model, audio recognition method, device and equipment |
CN113593504A (en) * | 2020-04-30 | 2021-11-02 | 小叶子(北京)科技有限公司 | Pitch recognition model establishing method, pitch recognition method and pitch recognition device |
CN112259063A (en) * | 2020-09-08 | 2021-01-22 | 华南理工大学 | Multi-pitch estimation method based on note transient dictionary and steady-state dictionary |
CN112420071A (en) * | 2020-11-09 | 2021-02-26 | 上海交通大学 | Constant Q transformation based polyphonic electronic organ music note identification method |
CN112509601A (en) * | 2020-11-18 | 2021-03-16 | 中电海康集团有限公司 | Note starting point detection method and system |
CN113658612A (en) * | 2021-08-25 | 2021-11-16 | 桂林智神信息技术股份有限公司 | Method and system for identifying played keys based on audio |
CN113658612B (en) * | 2021-08-25 | 2024-02-09 | 桂林智神信息技术股份有限公司 | Method and system for identifying played keys based on audio frequency |
CN114283841A (en) * | 2021-12-20 | 2022-04-05 | 天翼爱音乐文化科技有限公司 | Audio classification method, system, device and storage medium |
CN116884438A (en) * | 2023-09-08 | 2023-10-13 | 杭州育恩科技有限公司 | Method and system for detecting musical instrument training sound level based on acoustic characteristics |
CN116884438B (en) * | 2023-09-08 | 2023-12-01 | 杭州育恩科技有限公司 | Method and system for detecting musical instrument training sound level based on acoustic characteristics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110599987A (en) | Piano note recognition algorithm based on convolutional neural network | |
Gillet et al. | Transcription and separation of drum signals from polyphonic music | |
US8193436B2 (en) | Segmenting a humming signal into musical notes | |
Ryynänen et al. | Automatic transcription of melody, bass line, and chords in polyphonic music | |
JP5282548B2 (en) | Information processing apparatus, sound material extraction method, and program | |
Kroher et al. | Automatic transcription of flamenco singing from polyphonic music recordings | |
CN111369982A (en) | Training method of audio classification model, audio classification method, device and equipment | |
CN109979488B (en) | System for converting human voice into music score based on stress analysis | |
JP2009511954A (en) | Neural network discriminator for separating audio sources from mono audio signals | |
CN110136730B (en) | Deep learning-based automatic piano harmony arrangement system and method |
CN106997765B (en) | Quantitative characterization method for human voice timbre | |
CN110516102B (en) | Lyric time stamp generation method based on spectrogram recognition | |
Kirchhoff et al. | Evaluation of features for audio-to-audio alignment | |
Azarloo et al. | Automatic musical instrument recognition using K-NN and MLP neural networks | |
CN113192471B (en) | Musical main melody track recognition method based on neural network | |
CN112420071B (en) | Constant Q transformation based polyphonic electronic organ music note identification method | |
Arumugam et al. | An efficient approach for segmentation, feature extraction and classification of audio signals | |
CN105895079A (en) | Voice data processing method and device | |
TWI299855B (en) | Detection method for voice activity endpoint | |
Gao et al. | Vocal melody extraction via dnn-based pitch estimation and salience-based pitch refinement | |
Oudre et al. | Chord recognition using measures of fit, chord templates and filtering methods | |
Gurunath Reddy et al. | Predominant melody extraction from vocal polyphonic music signal by time-domain adaptive filtering-based method | |
CN112634841B (en) | Guitar music automatic generation method based on voice recognition | |
CN111681674B (en) | Musical instrument type identification method and system based on naive Bayesian model | |
CN114678039A (en) | Singing evaluation method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191220 ||