CN102521281A

CN102521281A - Humming computer music searching method based on longest matching subsequence algorithm

Info

Publication number: CN102521281A
Application number: CN2011103821590A
Authority: CN
Inventors: 王醒策; 陈卓然; 周明全; 武仲科
Original assignee: Beijing Normal University
Current assignee: Beijing Normal University
Priority date: 2011-11-25
Filing date: 2011-11-25
Publication date: 2012-06-27
Anticipated expiration: 2031-11-25
Also published as: CN102521281B

Abstract

The invention discloses a humming-based computer music retrieval method built on the longest matching subsequence algorithm. Its main steps are: (1) pitch frequency extraction; (2) construction of a music feature database; (3) feature representation; and (4) retrieval matching. The invention increases the overall speed of similarity calculation and the search efficiency of search engines. It provides an accurate music retrieval platform for karaoke, content-based web search engines and multifunctional smart mobile terminals, and can be widely used in fields such as plug-ins for web search engines. The music feature extraction, music feature representation and accurate similarity calculation provided by the invention let a humming retrieval system compute results accurately. This makes music retrieval accurate, easy and pleasant, with strong practical value and significance.

Description

A Humming Computer Music Retrieval Method Based on Longest Matching Subsequence Algorithm

Technical Field

The invention relates to a humming-based computer music retrieval method built on the longest matching subsequence algorithm. It belongs to the field of computer applications for content-based music information retrieval.

Background

In recent years, with the development of the Internet, the amount of audio data has grown exponentially. Traditional retrieval methods based on text annotation can no longer meet the needs of retrieving massive multimedia data. Content-based Music Information Retrieval (MIR) has therefore become an active research topic in signal processing, pattern recognition and data mining. Research on content-based multimedia retrieval has concentrated mainly on images and video, and few audio retrieval techniques have yet been deployed, in China or abroad. As users take more interest in classifying and searching web content, a retrieval mechanism for audio data on the web has become essential. The key technical problems restricting content-based music retrieval are how to extract audio features that characterize and describe the musical content, and which method to use for feature matching.

The extraction and representation of melody features is the foundation of content-based music retrieval. The melody features extracted from a music fragment must express the music's semantic information objectively and accurately. Whether they do so determines whether the musical features are conveyed correctly, and therefore directly affects whether the subsequent matching and retrieval are effective. The similarity algorithm for music fragments and its matching mechanism must also agree with general auditory and psychological perception, and this is the key factor in whether the retrieval results are accurate. Melody feature extraction and representation, together with similarity calculation, are therefore the most important factors affecting the performance of a query-by-humming or content-based music retrieval system.

For an acoustic signal, the perceived pitch is determined by its fundamental frequency (F0) sequence. Pitch extraction converts the acoustic signal input by the user into a fundamental frequency sequence. Common feature extraction algorithms include:

- the autocorrelation function;
- cepstral analysis;
- the cross-correlation function (CCF);
- the average magnitude difference function (AMDF);
- the normalized cross-correlation function (NCCF);
- the Integrated Pitch Tracker.

As related technologies have developed, these algorithms no longer meet application requirements in many scenarios. They easily cause the feature representation to deviate from, or blur, the actual semantic content of the music.

Common feature representation methods and their shortcomings are as follows:

1. The pitch contour method cannot quantify pitch changes, which easily blurs the link between the feature representation and the actual musical content. As the song collection grows, it becomes very likely that melodies with the same pitch contour are in fact very different.

2. The MIDI note approximation method rounds the natural pitch of the user's humming to the integer values of discrete MIDI notes, which makes the melody representation inaccurate. Figure 1 shows the same melody in C major and in A major. The MIDI pitch values of all corresponding notes in the two fragments are completely different, yet the two sound, and are perceived musically, as almost exactly the same. A reasonable feature representation should treat the two melodies as having the same melodic features. In this respect, the MIDI note approximation method is neither appropriate nor complete.

3. The absolute pitch method avoids the errors introduced by approximation. However, when it is combined with string comparison algorithms and some dynamic programming algorithms, an overall vertical pitch offset (pitch shiftiness) causes serious matching errors. This representation is therefore not suitable for general similarity calculation mechanisms.

4. The in-key scale degree method avoids the effects of an overall pitch shift and of humming in different keys. However, it requires the tonic and the mode as additional information, which usually cannot be obtained directly in humming applications. When the hummed fragment is short and carries little information, the method can deviate significantly.

   Figure 2 shows a melody fragment in C major that also fits the mode of G major. This happens because the G major scale differs from the C major scale by only one altered note, F#. When neither this altered note nor its natural form appears in a fragment, a fragment made up of the remaining notes fits both C major and G major. Methods that fix each note by its scale degree relative to the tonic (Degrees of the Scale from the Tonic) then fail.

   Moreover, many musical styles frequently use techniques that break a single key, such as modulation and out-of-key notes. In these cases, humming retrieval based on this method produces large errors.

5. In the traditional triple melody representation, the interval attribute expresses the size of the frequency change between adjacent notes, measured in hertz. The pitch unit used in music, however, is the semitone. Semitones increase with hertz, but the relationship is logarithmic rather than linear. In different pitch regions, two notes the same number of semitones apart therefore differ by different amounts in hertz. If the frequency difference is used to measure the interval between adjacent notes, the same melody produces different interval sequences in different pitch regions, which seriously distorts the musical features. In Figure 4, melody 1 and melody 2 have the same melodic features but are hummed in different keys. Their notes are distributed very differently along the frequency axis, so the triple melody representation cannot express the melodic features objectively.
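The distortion described in item 5 can be checked numerically. The Python sketch below compares two ways of measuring intervals for a short phrase and for the same phrase one octave higher: as Hz differences, and in semitones via 12 * log2 of the frequency ratio. The frequencies are illustrative values, not data from the patent.

```python
import math

def hz_differences(freqs):
    """Interval measured as a raw frequency difference in Hz."""
    return [b - a for a, b in zip(freqs, freqs[1:])]

def semitone_intervals(freqs):
    """Interval in twelve-tone equal-tempered semitones: 12 * log2(f_next / f)."""
    return [12 * math.log2(b / a) for a, b in zip(freqs, freqs[1:])]

# Illustrative phrase (roughly A3, B3, C#4, A3) and the same phrase one octave higher.
low = [220.0, 246.94, 277.18, 220.0]
high = [2 * f for f in low]

print(hz_differences(low))       # Hz steps in the lower register
print(hz_differences(high))      # every Hz step doubles one octave up
print(semitone_intervals(low))   # roughly [2, 2, -4]
print(semitone_intervals(high))  # same as the line above, up to rounding
```

Measured in hertz, the same melody looks like a different melody after transposition. Measured in semitones, it does not.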

Existing similarity calculation methods are as follows:

1. Edit distance algorithm

The traditional edit distance between two strings is the minimum total cost of the operations needed to transform string A into string B. The basic edit distance algorithm (Levenshtein distance) works only on character strings and cannot be applied directly to the similarity of musical melodies. An extended edit distance algorithm can compute distances between real-valued sequences. Its advantage is that it quantifies the cost of transforming one real-valued sequence into another, which measures the similarity of the two melodies the sequences represent.

However, this real-valued edit distance is better suited to global comparison. When the two input melody sequences do not match as a whole, its similarity measure degrades significantly. For example, when a phrase hummed by the user is matched against the complete melody of a song, the edit distance charges the extra cost of many insertions or deletions. This greatly lowers the melody similarity and makes the algorithm fail. In the region enclosed by the dotted line in Figure 5, melody B is highly similar to part of melody A and can be regarded as a matching fragment. Because melody B is mechanically compared with melody A as a whole, however, its similarity is greatly reduced. This is the weakness of the edit distance algorithm.
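To make the global-comparison problem concrete, here is a minimal real-valued edit distance in Python. The costs are assumptions for illustration: substitution costs |x - y|, and every insertion or deletion costs a fixed 1.0. The interval sequences are made up.

```python
def real_edit_distance(a, b, indel_cost=1.0):
    """Edit distance between two real-valued sequences.

    Substitution costs |x - y|; insertion and deletion cost a fixed
    indel_cost (an assumed value for illustration).
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                    # delete a[i-1]
                d[i][j - 1] + indel_cost,                    # insert b[j-1]
                d[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # substitute
            )
    return d[n][m]

song = [2, 2, 1, -5, 0, 3, 2, -2, -1, 5, -2]  # intervals of a whole song (illustrative)
query = [3, 2, -2, -1]                          # hummed phrase that occurs at song[5:9]

print(real_edit_distance(query, song[5:9]))  # 0.0: perfect local match
print(real_edit_distance(query, song))        # 7.0: paid for the 7 unmatched song elements
```

The hummed phrase is 7 elements shorter than the song, so any global alignment pays for at least 7 insertions. The score is then dominated by the length difference rather than by how well the phrase matches.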

2. Longest common subsequence algorithm

The longest common subsequence (LCS) algorithm finds matching subsequences in two strings A and B, so it can extract matching fragments from two melodies. However, LCS charges no cost for inserting or deleting elements. When a phrase hummed by the user is matched against the complete melody of a song, the user's shorter melody sequence can therefore be stretched without limit. Two unrelated melodies can then be matched simply by forcibly stretching one of them. This kind of matching badly distorts the melodic features: the stretched melodies do match, but the method has in effect failed.
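The stretching problem shows up with a textbook LCS on integer interval sequences (illustrative values). The sketch below returns the matched index pairs, so you can see how much of the longer sequence the match spans.

```python
def lcs_with_positions(a, b):
    """Classic LCS with exact equality; returns matched index pairs (i in a, j in b)."""
    n, m = len(a), len(b)
    # L[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk the table forwards to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

query = [2, -2, 3]
song = [2, 5, 7, -2, 9, 1, 4, 3]
pairs = lcs_with_positions(query, song)
print(pairs)                           # [(0, 0), (1, 3), (2, 7)]
print(pairs[-1][1] - pairs[0][1] + 1)  # 8: song positions spanned by a 3-element query
```

All three query elements match, but they are spread over 8 positions of the song with unrelated material in between. LCS charges nothing for those gaps.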

3. Dynamic time warping algorithm

A user's hummed voice signal is highly variable: different singing habits and recording environments produce notes of different durations. The dynamic time warping (DTW) algorithm stretches or compresses the signal to the length of the reference pattern. It warps the time axis of the unknown input so that its features line up with the reference. Because sequences can be stretched along the time axis, similar contours can be aligned with one another, and DTW is widely used in content-based music retrieval, signal processing and speech recognition.

DTW also has drawbacks. First, its time complexity is high. Second, when whole phrases of varying length are matched and their notes are similar, the matching results are poorly discriminated.
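Here is a minimal DTW sketch in Python. It uses |x - y| as the local cost on MIDI-like pitch values, with illustrative data. The nested loops over both sequences show where the O(n·m) time cost criticised above comes from.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):          # O(n * m) cells in total
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j],      # a advances while b's element repeats
                D[i][j - 1],      # b advances while a's element repeats
                D[i - 1][j - 1],  # both advance
            )
    return D[n][m]

sung = [60, 60, 62, 62, 62, 64]  # notes held for different lengths
ref = [60, 62, 64]
print(dtw_distance(sung, ref))  # 0.0: the warping absorbs the duration differences
```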

4. Hidden Markov model

The hidden Markov model (HMM) is a statistical model that can be used for speaker-independent speech recognition. In query by humming, the hummed melody is itself a voice signal and can serve as the HMM's observation sequence. The pitch feature sequences in the melody feature database have probabilistic, statistical properties and can serve as its hidden states. In practice, the melody features of each song are modeled to form the retrieval space, and the models are trained accordingly. During retrieval, the system returns the probability that the user's hummed signal matches each song model in the retrieval space.

HMM-based humming retrieval systems return results with good precision for users of different singing abilities. They have an unavoidable drawback, however: a separate model must be trained for each record in the music feature database. As the database grows, the training workload becomes very large, which makes HMMs impractical.

Summary of the Invention

The object of the invention is to provide a humming-based computer music retrieval method, built on the longest matching subsequence algorithm, that overcomes the technical problems above and lets a computer recognize changes in musical pitch. The basic technical idea is as follows. After analyzing current music feature extraction and representation methods, the invention adopts a feature sequence made up of the semitone intervals between adjacent notes. It uses the RAPT algorithm to extract the fundamental frequency. This avoids the feature extraction bias caused by humming in different keys and lays the foundation for accurate melody feature extraction.

For melody feature representation, the invention takes twelve-tone equal temperament as its tuning basis. A logarithmic transformation converts the fundamental frequency contour sequence into an interval sequence measured in semitones. This removes the effect of different users humming in different keys, and at the same time normalizes the features extracted from MIDI data. The MOMEL algorithm models the macro melody contour, and the logarithmic transformation based on twelve-tone equal temperament serves as the feature extraction and representation method.

As a result, a fundamental frequency sequence originally on the order of 10³ points is reduced to a contour sequence on the order of 10 points. No melodic features are lost, and the effect of lyrics and intonation on the macro melody's fundamental frequency is removed. This is an important contribution to improving the matching speed of the whole system.

For similarity calculation, a mechanism based on the Longest Matched Subsequence (LMS) is combined with traditional string matching methods. This effectively avoids the limitations of other related algorithms in practice.

The main steps of the invention are:

(1) Pitch frequency extraction. Audio processing converts the user's hummed voice signal into a fundamental frequency contour sequence. The RAPT algorithm extracts the fundamental frequency, low-pass and high-pass filters regularize the F0 sequence, median filtering and linear smoothing smooth it, and the MOMEL algorithm models the melody.

(2) Construction of the music feature database. The MIDI files of all songs in the database are preprocessed, and their MIDI pitch sequences are extracted and stored as a separate field in the music feature database. The later retrieval stage then skips MIDI file processing and reads the pitch sequences directly from the feature database.

(3) Feature representation. The fundamental frequency contour sequence from the audio processing module and the MIDI pitch sequences read from the music feature database are converted into a unified melody interval sequence. The two resulting sequences represent the melodic features of the user's humming and of the database records respectively.

(4) Retrieval matching. The similarity between the melody feature sequence extracted from the user's humming and every music feature sequence in the retrieval space is calculated. The matching results are then ranked by similarity according to the longest matching subsequence (LMS) mechanism.
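As a rough illustration of steps (2) and (4), the sketch below stores MIDI pitch sequences in an in-memory SQLite table and ranks songs against a query interval sequence. The table layout, the song data and `toy_similarity` are all hypothetical. `toy_similarity` is a simple windowed count that stands in for the LMS-based similarity described later; it is not the patent's method.

```python
import json
import sqlite3

def build_feature_db(songs):
    """Step (2): store each song's MIDI pitch sequence as its own field.

    `songs` maps a title to a list of MIDI note numbers. The table
    layout songs(title, midi_pitches) is hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE songs (title TEXT PRIMARY KEY, midi_pitches TEXT)")
    con.executemany(
        "INSERT INTO songs VALUES (?, ?)",
        [(title, json.dumps(pitches)) for title, pitches in songs.items()],
    )
    return con

def intervals(midi):
    """Differences between consecutive MIDI note numbers."""
    return [b - a for a, b in zip(midi, midi[1:])]

def toy_similarity(query, target, delta=0.5):
    """Hypothetical stand-in score (NOT the LMS algorithm): the largest number
    of elements within `delta` over any contiguous window of `target`."""
    best = 0
    for start in range(max(1, len(target) - len(query) + 1)):
        window = target[start:start + len(query)]
        best = max(best, sum(abs(q - t) <= delta for q, t in zip(query, window)))
    return best

def rank(query_intervals, con, similarity=toy_similarity):
    """Step (4): score the query against every stored song, best first."""
    scored = []
    for title, pitches in con.execute("SELECT title, midi_pitches FROM songs"):
        score = similarity(query_intervals, intervals(json.loads(pitches)))
        scored.append((score, title))
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken by title
    return [title for _, title in scored]

con = build_feature_db({
    "song_a": [60, 62, 64, 65, 67],
    "song_b": [60, 60, 67, 67, 69, 69, 67],
    "song_c": [67, 65, 64, 62, 60],
})
print(rank([2.1, 1.9, 0.9], con))  # hummed intervals, slightly out of tune
```

Storing the pitch sequence as a ready-made field means the ranking loop never touches MIDI files, which is the point of step (2).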

The invention increases the overall speed of similarity calculation and the search efficiency of search engines. It provides an accurate music retrieval platform for karaoke, content-based web search engines and multifunctional smart mobile terminals, and can be widely used in fields such as plug-ins for web search engines. The music feature extraction, music feature representation and accurate similarity calculation provided by the invention let a humming retrieval system compute results accurately. This makes music retrieval accurate, easy and pleasant, with strong practical value and significance.

Description of Drawings

Fig. 1 shows the same melody in C major and in G major;

Fig. 2 shows a melody that fits both the C major and the G major mode;

Fig. 3 shows the numerical relationship between semitones and hertz;

Fig. 4 shows the frequency curves of the same melody in different keys;

Fig. 5 illustrates matching a local melody against a complete melody;

Fig. 6 is an overall flow chart of audio feature extraction in the invention;

Fig. 7 shows the interval curves of the same melody in different pitch regions;

Fig. 8 illustrates melody modeling based on the MOMEL algorithm in the invention;

Fig. 9 is a flow chart of similarity calculation based on the LMS algorithm in the invention.

Detailed Description

The invention is described in detail below with reference to the drawings and embodiments. The main steps of the invention are:

(1) Fundamental frequency sequence extraction and processing

In content-based music information retrieval, the accuracy of feature extraction from the audio input is crucial to the overall performance of the retrieval system. Ideal audio feature extraction expresses the melody in the user's audio query objectively and accurately. To improve retrieval accuracy and efficiency, the invention proposes a multi-step melody feature extraction process that combines fundamental frequency extraction, frequency-range filtering, median filtering and melody modeling. Fig. 6 shows the overall flow of fundamental frequency sequence extraction and processing:

1) The RAPT algorithm is applied to the input WAV file to extract the fundamental frequency, producing a fundamental frequency sequence;

2) The raw fundamental frequency sequence is passed through a high-pass and a low-pass filter to remove spikes and noise points and smooth the F0 curve. The human vocal range generally lies between E2 (82 Hz) and C6 (1047 Hz). Based on this natural range, the high-pass threshold is set to 80 Hz and the low-pass threshold to 1100 Hz, and F0 values outside these thresholds are removed;

3) Linear smoothing is applied to the fundamental frequency sequence to remove noise points and further smooth its contour. In the embodiment of the invention, the smoothing window is 50 ms;

4) Median filtering is then applied to the sequence. It removes the remaining noise points while preserving the step changes between continuous segments of the curve. In the embodiment of the invention, the F0 sequence produced by fundamental frequency extraction has 100 points per second, and the median filter window is 77 ms.
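Steps 2) to 4) above can be sketched in Python as follows. The thresholds (80 Hz, 1100 Hz), the 100 points/s rate and the window lengths come from the embodiment. Three choices are assumptions: rounding the 77 ms median window to 7 points, dropping out-of-range frames instead of marking them, and the edge handling. RAPT itself (step 1) is not reproduced; the input is assumed to be an F0 track it has already produced.

```python
import statistics

SAMPLE_RATE = 100                           # F0 points per second, as in the embodiment
LINEAR_WINDOW = round(0.050 * SAMPLE_RATE)  # 50 ms -> 5 points
MEDIAN_WINDOW = 7                           # 77 ms is 7.7 points; odd width chosen (assumption)

def band_limit(f0, low_hz=80.0, high_hz=1100.0):
    """Step 2: drop F0 values outside the vocal-range thresholds.

    Dropping frames loses timing; a fuller version would keep timestamps.
    """
    return [f for f in f0 if low_hz <= f <= high_hz]

def moving_average(x, width):
    """Step 3: centred moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        segment = x[max(0, i - half):i + half + 1]
        out.append(sum(segment) / len(segment))
    return out

def median_filter(x, width):
    """Step 4: centred median filter; it removes isolated spikes but keeps steps."""
    half = width // 2
    return [statistics.median(x[max(0, i - half):i + half + 1]) for i in range(len(x))]

def postprocess(f0):
    """Apply steps 2 to 4 in the order given in the text."""
    return median_filter(moving_average(band_limit(f0), LINEAR_WINDOW), MEDIAN_WINDOW)

# 0.0 marks an unvoiced frame and 1500.0 an octave error (illustrative values).
raw = [0.0, 200.0, 201.0, 1500.0, 199.0, 200.0, 202.0]
print(postprocess(raw))
```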

(2) Music feature representation

1) Feature representation of the fundamental frequency curve

The melodic feature is the sequence of intervals between adjacent notes, measured in semitones. A melody fragment containing n notes is expressed as an interval sequence of n-1 real numbers. This representation has three advantages:

- It is quantitative, so different melodies can be told apart and the subsequent similarity calculation receives meaningful input.
- It is insensitive to an overall pitch shift, so users can hum in any key and the same melodic features are still extracted.
- It is stable, so the representation does not fail even when the melody information is limited.

For audio input hummed by the user, the interval is defined by formula (1):

$\mathrm{PitchInterval}_n = 12 \log_2\left(\frac{\mathrm{freq}_{n+1}}{\mathrm{freq}_n}\right) \qquad (1)$

By this definition, the pitch frequency sequence $F_x = (\mathrm{freq}_1, \mathrm{freq}_2, \mathrm{freq}_3, \ldots, \mathrm{freq}_n)$ maps to the interval sequence $P_i = (\mathrm{pitch\_interval}_1, \mathrm{pitch\_interval}_2, \mathrm{pitch\_interval}_3, \ldots, \mathrm{pitch\_interval}_{n-1})$.

The MIDI files in the music feature database must use the same melody feature representation, so that features extracted from the user input and from the database have the same format. For MIDI files, the interval is defined by formula (2), where $\mathrm{MIDI\_note}_{n+1}$ and $\mathrm{MIDI\_note}_n$ are pitch values (MIDI note numbers) in the MIDI file:

$\mathrm{PitchInterval}_n = \mathrm{MIDI\_note}_{n+1} - \mathrm{MIDI\_note}_n \qquad (2)$

After this conversion, the same melody in different keys yields the same normalized features, and humming in different keys no longer affects melody feature extraction. In Fig. 7, the corresponding points of the two curves coincide completely: the features of the same melody in different keys are extracted in a normalized way. Because both sides share one representation, a similarity evaluation mechanism can be designed to match the query against the feature information in the database.

The processing chain is as follows. Melody modeling of the processed fundamental frequency sequence yields a melody skeleton made of discrete points. A logarithmic transformation of this skeleton extracts the intervals between adjacent input notes, which form the feature sequence of the input audio. These melody features are sent to the matching module, which computes their similarity to the records in the music feature database to obtain the matching results.
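Formulas (1) and (2) can be checked against each other with a short Python sketch. Perfectly in-tune humming is simulated (an assumption) by converting MIDI note numbers to frequencies with the standard equal-temperament mapping (A4 = MIDI 69 = 440 Hz). The sketch shows that a melody transposed to another key yields the same interval sequence under both formulas.

```python
import math

def hz_to_intervals(freqs):
    """Formula (1): semitone intervals between consecutive F0 values in Hz."""
    return [12 * math.log2(f_next / f) for f, f_next in zip(freqs, freqs[1:])]

def midi_to_intervals(notes):
    """Formula (2): intervals between consecutive MIDI note numbers."""
    return [n_next - n for n, n_next in zip(notes, notes[1:])]

def midi_to_hz(note):
    """Standard equal-temperament mapping, A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

melody_c = [60, 62, 64, 60]                 # C4 D4 E4 C4
melody_a = [n - 3 for n in melody_c]        # same tune three semitones lower (A3 B3 C#4 A3)
hummed = [midi_to_hz(n) for n in melody_a]  # idealised in-tune humming of melody_a

print(midi_to_intervals(melody_c))  # [2, 2, -4]
print(hz_to_intervals(hummed))      # approximately [2.0, 2.0, -4.0]
```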

2) Representation of the melody

The fundamental frequency contour obtained by F0 extraction and filtering can be decomposed into two independent melodic components, a macro melody component and a micro melody component:

Macro melody component: reflects the intonation pattern of the speech and is closely related to global pitch changes of the fundamental frequency.

Micro melody component: reflects the phonemes in the speech and affects local variations of the fundamental frequency curve.

Humming, as a kind of vocal signal, can likewise be regarded as a combination of these two components. In a hummed melody, pitch changes depend only on the macro melody component of the F0 curve. Phonemic information, such as the syllables and lyrics being sung, is carried by the micro melody component. A quadratic spline function approximates the macro melody of the F0 curve by interpolation. The resulting macro melody is a sequence of discrete target points that represents the pitch melody of the F0 sequence. The hummed melody feature is based on this pitch sequence and is independent of syllable and phoneme information. Processing the filtered F0 contour with the MOMEL algorithm therefore yields its macro melody sequence, which is the basis for the subsequent melody feature representation.
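The interpolation step can be illustrated with the quadratic spline commonly described for MOMEL. Between two consecutive target points, two parabolas meet half-way, and the curve has zero slope at each target. This Python sketch is written under that assumption. It shows only how a macro melody curve is drawn through given targets, not how MOMEL selects them, and the target values are hypothetical.

```python
def quadratic_spline(targets, t):
    """Evaluate a MOMEL-style quadratic spline through (time, value) targets.

    `targets` must be sorted by strictly increasing time. Between two
    consecutive targets the curve is two parabolas meeting half-way,
    with zero slope at each target point.
    """
    if t <= targets[0][0]:
        return targets[0][1]
    if t >= targets[-1][0]:
        return targets[-1][1]
    for (t1, f1), (t2, f2) in zip(targets, targets[1:]):
        if t1 <= t <= t2:
            x = (t - t1) / (t2 - t1)  # position between the two targets, 0..1
            if x <= 0.5:
                return f1 + 2 * (f2 - f1) * x * x
            return f2 - 2 * (f2 - f1) * (1 - x) ** 2

# Hypothetical target points (seconds, Hz), sampled every 0.25 s.
targets = [(0.0, 200.0), (1.0, 300.0), (2.0, 250.0)]
curve = [quadratic_spline(targets, k * 0.25) for k in range(9)]
print(curve)  # passes through 200, 300 and 250 Hz at the targets
```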

Fig. 8 shows an example of MOMEL processing. After melody modeling, the macro melody (bottom) of the fundamental frequency contour (top) is successfully extracted. The raw MOMEL output, however, is still clearly insufficient for the subsequent melody feature representation. For example, in the last two segments of the F0 contour in Fig. 8, a contour segment that represents a single pitch is marked with two target points of very similar value. To handle such cases, the invention sets a parameterized threshold on the interval between adjacent notes. When the interval between two notes is below this threshold, the interval is either deleted or merged with a neighboring interval, depending on the situation.
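The patent does not specify the threshold value or the exact delete-or-merge rule. The Python sketch below assumes one plausible rule, which is hypothetical. An interval smaller than `threshold` semitones (an assumed 0.5 here) is folded into the next interval, or into the previous one at the end of the sequence, so the net pitch movement is preserved.

```python
def merge_small_intervals(intervals, threshold=0.5):
    """Hypothetical merge rule for near-duplicate MOMEL target points.

    An interval with |value| < threshold is treated as two targets on one
    sustained note. Its pitch change is carried into the next interval, or
    added to the last kept interval at the end. If every interval is small,
    the phrase is one sustained note and the result is empty.
    """
    merged, carry = [], 0.0
    for iv in intervals:
        if abs(iv) < threshold:
            carry += iv  # accumulate the tiny step instead of keeping it
        else:
            merged.append(iv + carry)
            carry = 0.0
    if merged and carry:
        merged[-1] += carry
    return merged

print(merge_small_intervals([2.0, 0.1, -3.0]))  # about [2.0, -2.9]
```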

(3) Similarity Calculation: Retrieval Matching Algorithm

(a) Longest Matched Subsequence (LMS)

The melody features obtained with the feature extraction method of the present invention are sequences of real numbers, whereas the melody features stored in the music feature database are all sequences of integers. If the longest common subsequence algorithm were applied mechanically to compute the similarity between two such sequences, many elements that should match would be missed, because a real-valued interval such as 2.1 never exactly equals the integer 2.

The longest matching subsequence algorithm solves this problem of the longest common subsequence (LCS) algorithm. As an improvement on LCS, it outputs two separate subsequences A' and B', which are subsequences of the input sequences A and B respectively.

The longest matching subsequence is defined as follows.

Given input sequences $A = (a_1, a_2, a_3, \dots, a_n)$ and $B = (b_1, b_2, b_3, \dots, b_m)$,

the algorithm produces subsequences $A' = (a'_1, a'_2, a'_3, \dots, a'_l)$ and $B' = (b'_1, b'_2, b'_3, \dots, b'_l)$.

The subsequences A' and B' satisfy the following conditions:

1) Every element of A' and of B' has a matching element in the other subsequence, namely:

in the subsequences A' and B',

the element $a'_k$ of A' corresponds to the element $a_i$ of A;

the element $b'_k$ of B' corresponds to the element $b_j$ of B;

and they satisfy $LD(a_i, b_j) \le \delta$, where $\delta$ is the given maximum value of the local similarity.

2) A' and B' are relatively contiguous within their respective original sequences, namely:

in the subsequences A' and B',

the element $a'_k$ of A' corresponds to the element $a_i$ of A, and the element $a'_{k+1}$ of A' corresponds to the element $a_s$ of A;

the element $b'_k$ of B' corresponds to the element $b_j$ of B, and the element $b'_{k+1}$ of B' corresponds to the element $b_t$ of B;

and they satisfy $s - i \le L$ and $t - j \le L$, where L is the maximum number of elements allowed to be inserted between consecutive matched elements.

3) A' and B' have the same length, and they are the longest subsequences of A and B that satisfy conditions 1) and 2), i.e. $|A'| = |B'| = \max\{|A_k|, |B_l|\}$.

In the longest matching subsequence algorithm, the notion of element equality is replaced by the notion of matching. Unlike in the LCS algorithm, A' and B' need not be exactly equal element by element; instead, they are the longest and most similar pair among all subsequences of A and B.

(b) Local similarity calculation

The local similarity, on which the longest matching subsequence algorithm is built, is computed as follows.

First, define an edit distance for sequences of real numbers.

Given input real-valued sequences $X = (x_1, x_2, x_3, \dots, x_m)$ and $Y = (y_1, y_2, y_3, \dots, y_n)$,

define the cost of converting one element into another by formula (3), where $\delta$ is the threshold below which two values are considered equal:

$w(a, b) = \begin{cases} 0, & \text{if } |a - b| < \delta \\ |a - b|, & \text{if } |a - b| \ge \delta \end{cases} \quad (3)$

Initialize the edit distance matrix $D_{m,n}$ as follows:

$d_{0,0} = 0$;

$d_{i,0} = d_{i-1,0} + w(x_i, 0)$, where $1 \le i \le m$;

$d_{0,j} = d_{0,j-1} + w(0, y_j)$, where $1 \le j \le n$.

For the cells with $1 \le i \le m$ and $1 \le j \le n$, fill the edit distance matrix $D_{m,n}$ using the recurrence in formula (4), where $W_{del}$, $W_{sub}$ and $W_{ins}$ are the weights of the deletion, substitution and insertion operations, respectively:

$d_{i,j} = \min \begin{cases} d_{i-1,j} + W_{del} \cdot w(x_i, 0) \\ d_{i-1,j-1} + W_{sub} \cdot w(x_i, y_j) \\ d_{i,j-1} + W_{ins} \cdot w(0, y_j) \end{cases} \quad (4)$

Finally, the edit distance $ED(X, Y)$ between the real-valued sequences X and Y is read from the bottom-right cell $d_{m,n}$ of $D_{m,n}$, as shown in formula (5):

$ED(X, Y) = d_{m,n} \quad (5)$
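Formulas (3) to (5) translate directly into a small dynamic program. The Python sketch below assumes unit weights $W_{del} = W_{sub} = W_{ins} = 1$ by default (the patent does not state their values) and, following the initialization above, leaves the boundary row and column unweighted; `weight` and `real_edit_distance` are hypothetical names.

```python
from typing import Sequence


def weight(a: float, b: float, delta: float) -> float:
    """Formula (3): differences below delta cost nothing."""
    diff = abs(a - b)
    return 0.0 if diff < delta else diff


def real_edit_distance(x: Sequence[float], y: Sequence[float], delta: float,
                       w_del: float = 1.0, w_sub: float = 1.0,
                       w_ins: float = 1.0) -> float:
    """Formulas (4)-(5): weighted edit distance between real-valued sequences."""
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]  # d[0][0] = 0
    for i in range(1, m + 1):  # boundary column: delete x_1 .. x_i
        d[i][0] = d[i - 1][0] + weight(x[i - 1], 0.0, delta)
    for j in range(1, n + 1):  # boundary row: insert y_1 .. y_j
        d[0][j] = d[0][j - 1] + weight(0.0, y[j - 1], delta)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + w_del * weight(x[i - 1], 0.0, delta),          # deletion
                d[i - 1][j - 1] + w_sub * weight(x[i - 1], y[j - 1], delta),  # substitution
                d[i][j - 1] + w_ins * weight(0.0, y[j - 1], delta),          # insertion
            )
    return d[m][n]  # formula (5): bottom-right cell
```

With $\delta = 0.5$, the hummed intervals (2.1, -1.9) and the database intervals (2, -2) are at distance 0, because both substitutions fall below the equality threshold.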

Next, the local similarity of a pair of elements is defined.

Given input melody feature sequences $A = (a_1, a_2, a_3, \dots, a_n)$ and $B = (b_1, b_2, b_3, \dots, b_m)$,

take one element from each of A and B to form a pair $(a_i, b_j)$. For each such pair, the local similarity is defined as follows, where k is the local radius.

Define the local subsequences $X = (a_{i-k}, \dots, a_i, \dots, a_{i+k})$ and $Y = (b_{j-k}, \dots, b_j, \dots, b_{j+k})$.

The local similarity $LD(a_i, b_j)$ around the pair $(a_i, b_j)$ is then the edit distance $ED(X, Y)$ between the local subsequences X and Y, as shown in formula (6):

$LD(a_i, b_j) = ED(X, Y) \quad (6)$
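The text does not say how windows that extend past either end of a sequence are handled. The Python sketch below assumes they are simply truncated; `local_window` and `local_distance` are hypothetical helpers, and the distance is passed in as a function so that any implementation of formula (5) can be plugged in.

```python
from typing import Callable, List, Sequence

Distance = Callable[[Sequence[float], Sequence[float]], float]


def local_window(seq: Sequence[float], center: int, k: int) -> List[float]:
    """Elements center-k .. center+k, truncated at the sequence ends (assumption)."""
    return list(seq[max(0, center - k): center + k + 1])


def local_distance(a: Sequence[float], b: Sequence[float], i: int, j: int,
                   k: int, dist: Distance) -> float:
    """Formula (6): LD(a_i, b_j) = ED(X, Y) over the two local windows.

    `dist` is expected to implement the real-valued edit distance of
    formula (5). Indices are 0-based here, unlike the 1-based text.
    """
    return dist(local_window(a, i, k), local_window(b, j, k))
```

Because LD compares short neighbourhoods rather than single values, a pair $(a_i, b_j)$ is judged similar only when its surrounding intervals also agree, which makes isolated coincidences less likely to count as matches.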

(c) Computing the longest matching subsequence by dynamic programming

Once the rule for computing the local similarity around elements $a_i$ and $b_j$ of the original sequences A and B is established, the longest matching subsequences A' and B' that satisfy the definition can be computed with a dynamic programming strategy.

First, recall how the longest common subsequence (LCS) is computed by dynamic programming.

Given input sequences $A = (a_1, a_2, a_3, \dots, a_m)$ and $B = (b_1, b_2, b_3, \dots, b_n)$,

construct the LCS matrix $C_{m,n}$ and initialize it as follows:

$c_{i,0} = 0$, $c_{0,j} = 0$, where $0 \le i \le m$ and $0 \le j \le n$;

then fill the matrix using the recurrence in formula (7), where $1 \le i \le m$ and $1 \le j \le n$:

$c_{i,j} = \begin{cases} c_{i-1,j-1} + 1, & \text{if } a_i = b_j \\ \max(c_{i,j-1}, c_{i-1,j}), & \text{otherwise} \end{cases} \quad (7)$

Finally, the length of the longest common subsequence is read from the bottom-right cell $c_{m,n}$ of $C_{m,n}$.
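The following Python sketch of recurrence (7) takes the equality test as a parameter. This makes the problem described for real-valued features concrete: with exact equality, hummed intervals such as 2.1 never match the integer 2, while a tolerance-based test (here $|p - q| < 0.5$, an illustrative value) recovers the matches. `lcs_length` is a hypothetical name, and the tolerant variant is only a stand-in for illustration, not the patent's LMS algorithm.

```python
from typing import Callable, Sequence


def lcs_length(a: Sequence[float], b: Sequence[float],
               equal: Callable[[float, float], bool] = lambda p, q: p == q) -> int:
    """Formula (7): LCS length by dynamic programming with a pluggable equality."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # c[i][0] = c[0][j] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if equal(a[i - 1], b[j - 1]):
                c[i][j] = c[i - 1][j - 1] + 1  # extend the common subsequence
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[m][n]  # length read from the bottom-right cell


hummed = [2.1, -1.9, 3.05, -1.2]  # real-valued intervals from humming
stored = [2, -2, 3, -1]           # integer intervals from a MIDI file
print(lcs_length(hummed, stored))                                # 0
print(lcs_length(hummed, stored, lambda p, q: abs(p - q) < 0.5))  # 4
```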

The longest matching subsequence (LMS) is computed as follows.

Define three matrices $C_{m,n}$, $R_{m,n}$ and $S_{m,n}$ for computing the longest matching subsequence:

the integer matrix $C_{m,n}$, whose cell $c_{i,j}$ stores the length of the longest matching subsequence between the prefixes $(a_1, a_2, a_3, \dots, a_i)$ and $(b_1, b_2, b_3, \dots, b_j)$;

the integer matrix $R_{m,n}$, whose cell $r_{i,j}$ stores the number of non-contiguous (skipped) elements in these two prefixes;

the character matrix $S_{m,n}$, whose cell $s_{i,j}$ records the computation path through $C_{m,n}$ and $R_{m,n}$, i.e. the direction in which the longest matching subsequence grew at each step.

Initialize the three matrices as follows:

$c_{i,0} = 0$, $c_{0,j} = 0$, $r_{0,j} = 0$, $r_{i,0} = 0$, $s_{0,j} = \text{'\_'}$, $s_{i,0} = \text{'\_'}$, where $0 \le i \le m$ and $0 \le j \le n$;

then compute $C_{m,n}$, $R_{m,n}$ and $S_{m,n}$ with the recurrences in formulas (8) to (10), where $1 \le i \le m$ and $1 \le j \le n$:

$c_{i,j} = \begin{cases} c_{i-1,j-1} + 1, & \text{if } LD(a_i, b_j) \le \delta \text{ and } r_{i-1,j-1} \le L \\ 1, & \text{if } LD(a_i, b_j) \le \delta \text{ and } r_{i-1,j-1} > L \\ \max(c_{i,j-1}, c_{i-1,j}), & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1}, r_{i-1,j} \le L \\ c_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \le L < r_{i-1,j} \\ c_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \le L < r_{i,j-1} \\ 0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L \end{cases} \quad (8)$

$r_{i,j} = \begin{cases} 0, & \text{if } LD(a_i, b_j) \le \delta \\ r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} > c_{i-1,j} \\ r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} < c_{i-1,j} \\ \min(r_{i,j-1}, r_{i-1,j}) + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} = c_{i-1,j} \\ r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \le L < r_{i-1,j} \\ r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \le L < r_{i,j-1} \\ 0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L \end{cases} \quad (9)$

$s_{i,j} = \begin{cases} \text{'S'}, & \text{if } LD(a_i, b_j) \le \delta \text{ and } c_{i,j} = 1 \\ \text{'O'}, & \text{if } LD(a_i, b_j) \le \delta \text{ and } c_{i,j} \ne 1 \\ \text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} > c_{i-1,j} \\ \text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} < c_{i-1,j} \\ \text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1} \le r_{i-1,j} \le L \text{ and } c_{i,j-1} = c_{i-1,j} \\ \text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i-1,j} \le r_{i,j-1} \le L \text{ and } c_{i,j-1} = c_{i-1,j} \\ \text{'R'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \le L < r_{i-1,j} \\ \text{'D'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \le L < r_{i,j-1} \end{cases} \quad (10)$

Finally, the output of the longest matching subsequence algorithm of the present invention is obtained from the path recorded in the symbol matrix $S_{m,n}$ together with the values stored in $C_{m,n}$, and the length of the longest matching subsequence is read directly from $c_{m,n}$.
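A direct Python transcription of recurrences (8) to (10) is sketched below. It takes a precomputed matrix of local similarities `ld` (0-based, so `ld[i-1][j-1]` holds $LD(a_i, b_j)$) together with $\delta$ and L, and returns C, R and S. The sketch reproduces the printed cases literally, including the one-sided cases of (8) that add 1 to the neighbouring length; when both neighbours tie in (10) it picks 'R', and cells not covered by (10) keep '_'. The function name and argument layout are assumptions, and the traceback through S is omitted because the source does not spell it out.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def lms_tables(ld: List[List[float]], delta: float, gap: int
               ) -> Tuple[Matrix, Matrix, List[List[str]]]:
    """Fill C, R and S following recurrences (8)-(10); `gap` is L."""
    m = len(ld)
    n = len(ld[0]) if m else 0
    c = [[0] * (n + 1) for _ in range(m + 1)]
    r = [[0] * (n + 1) for _ in range(m + 1)]
    s = [["_"] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            left_c, up_c = c[i][j - 1], c[i - 1][j]
            left_r, up_r = r[i][j - 1], r[i - 1][j]
            if ld[i - 1][j - 1] <= delta:
                # a_i matches b_j: extend the diagonal run unless its gap exceeds L.
                c[i][j] = c[i - 1][j - 1] + 1 if r[i - 1][j - 1] <= gap else 1
                r[i][j] = 0
                s[i][j] = "S" if c[i][j] == 1 else "O"
            elif left_r <= gap and up_r <= gap:
                # No match; both neighbours are still within the gap limit.
                c[i][j] = max(left_c, up_c)
                if left_c != up_c:
                    from_left = left_c > up_c
                    r[i][j] = (left_r if from_left else up_r) + 1
                else:
                    from_left = left_r <= up_r
                    r[i][j] = min(left_r, up_r) + 1
                s[i][j] = "R" if from_left else "D"
            elif left_r <= gap:
                # Only the left neighbour is usable (printed case, with +1).
                c[i][j], r[i][j], s[i][j] = left_c + 1, left_r + 1, "R"
            elif up_r <= gap:
                # Only the upper neighbour is usable (printed case, with +1).
                c[i][j], r[i][j], s[i][j] = up_c + 1, up_r + 1, "D"
            else:
                # Gap limit exceeded on both sides: restart; (10) leaves s unset.
                c[i][j], r[i][j] = 0, 0
    return c, r, s
```

For a 2 x 2 case in which only the diagonal pairs match, the sketch yields $c_{2,2} = 2$ with the path 'S' at (1, 1) and 'O' at (2, 2).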

The overall flow of the longest-matching-subsequence melody similarity algorithm of the present invention is shown in Figure 9:

1) Given the input melody feature sequences A and B, use the LMS algorithm to compute the subsequences A' and B' of A and B that are most similar to each other, i.e. the longest matching subsequences;

2) from the lengths of the original feature sequences A and B and of the matching subsequences A' and B', compute the proportion of A and B that is matched;

3) compute the edit distance between A' and B' with the real-valued edit distance algorithm;

4) for every pair of feature sequences compared during retrieval, sort the candidates by the matched proportion between A and B in descending order as the primary key, and by the edit distance between A' and B' in ascending order as the secondary key, producing a list in decreasing order of similarity.
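Steps 2) and 4) can be sketched in Python as follows. The source does not give the exact formula for the matched proportion; the sketch assumes, hypothetically, the matched length divided by the length of the shorter original sequence. `Candidate`, its fields and `rank` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    song_id: str          # hypothetical identifier of a database melody
    query_len: int        # |A|, length of the hummed feature sequence
    target_len: int       # |B|, length of the database feature sequence
    match_len: int        # |A'| = |B'|, e.g. c_{m,n} from the LMS tables
    edit_distance: float  # ED(A', B') from the real-valued edit distance

    @property
    def match_ratio(self) -> float:
        # Assumed ratio: matched length over the shorter original sequence.
        shorter = min(self.query_len, self.target_len)
        return self.match_len / shorter if shorter else 0.0


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Step 4): ratio descending (first key), edit distance ascending (second key)."""
    return sorted(candidates, key=lambda c: (-c.match_ratio, c.edit_distance))
```

Sorting on the tuple `(-ratio, distance)` lets the edit distance act purely as a tie-breaker: two songs with the same matched proportion are ordered by how closely their matched parts agree.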

In the test of the embodiment, six subjects were selected at random and provided the system with queries in the form of humming. To ensure that each humming was valid and to keep the results from being affected by the subjects' subjective factors, a humming was counted as a valid trial only if more than half of the other five subjects agreed that the hummed song was indeed the target song; otherwise it was not recorded as valid experimental data. The experiment produced 87 valid audio queries, and for most of them the expected song appeared at an acceptable rank in the search results. For 58.62% of the queries the target song was ranked first, the largest share of any rank position. For over 88.51% of the queries the target song was within the top five results, and for 95.40% it was within the top ten. Each independent query took between 150 ms and 550 ms to execute, with a mean of 289.47 ms; given the hardware used in the experiment, this running time is acceptable for a music feature database of over a hundred songs. The accuracy achieved was satisfactory, and overall the experimental results show that the feature extraction and expression methods and the melody similarity calculation proposed by the present invention are effective.

The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the claims of the present invention.

Claims

1. A humming-based computer music retrieval method based on the longest matching subsequence algorithm, characterized in that it comprises the following steps:

(1) Pitch frequency sequence extraction and processing;

1) Apply the RAPT algorithm to the input WAV waveform file to extract the pitch frequency, thereby obtaining the pitch frequency sequence;

2) Pass the original pitch frequency sequence through a high-pass filter and a low-pass filter to remove spikes and noise points and to smooth the pitch curve. The human vocal range generally lies between E2 (82 Hz) and C6 (1047 Hz); in line with this natural range, the high-pass threshold is set to 80 Hz and the low-pass threshold to 1100 Hz, and fundamental frequency values outside these thresholds are removed;

3) Apply linear smoothing filtering to the pitch frequency sequence to remove noise points and further smooth the contour of the sequence;

4) Apply median filtering to the resulting pitch frequency sequence, which effectively removes the remaining noise points while keeping the step changes between continuous segments of the sequence intact;
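A minimal Python sketch of steps 2) to 4) follows. It assumes that out-of-range values are simply dropped and uses a centred moving average for the linear smoothing; the window radius of 1, the truncated windows at the ends and the function names are illustrative assumptions, while the 80 Hz and 1100 Hz limits come from the claim. The RAPT pitch extraction of step 1) is not reproduced here.

```python
import statistics
from typing import List

LOW_HZ, HIGH_HZ = 80.0, 1100.0  # thresholds stated in step 2)


def band_limit(f0: List[float]) -> List[float]:
    """Step 2): drop F0 values outside the 80-1100 Hz range."""
    return [f for f in f0 if LOW_HZ <= f <= HIGH_HZ]


def moving_average(f0: List[float], radius: int = 1) -> List[float]:
    """Step 3): linear smoothing with a centred window, truncated at the ends."""
    out = []
    for i in range(len(f0)):
        window = f0[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def median_filter(f0: List[float], radius: int = 1) -> List[float]:
    """Step 4): median filtering removes isolated spikes but keeps step edges."""
    return [statistics.median(f0[max(0, i - radius): i + radius + 1])
            for i in range(len(f0))]


def preprocess(f0: List[float]) -> List[float]:
    """Apply steps 2) to 4) in the order given in the claim."""
    return median_filter(moving_average(band_limit(f0)))
```

The median filter is the step that preserves note boundaries: on `[100, 100, 100, 200, 200, 200]` it returns the sequence unchanged, whereas a single spike such as `[100, 100, 500, 100, 100]` is flattened to 100.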

(2) Music feature expression;

1) Feature expression of the pitch frequency curve;

The melody feature is the sequence of intervals between adjacent tones, measured in semitones, so a melody fragment containing n natural notes is expressed as a sequence of n-1 real numbers. This expresses the melody features quantitatively; the features of different melodies are distinguishable, which provides an effective basis for the subsequent similarity calculation; the representation is insensitive to an overall pitch offset, so users may hum in any key and the same melody features are still extracted; and the representation is stable. For audio input by the user through humming, the interval is defined by formula (1):

$\text{Pitch\_Interval}_n = 12 \log_2\left(\frac{freq_{n+1}}{freq_n}\right) \quad (1)$

According to this definition, the pitch frequency sequence $F_x = (freq_1, freq_2, freq_3, \dots, freq_n)$ is mapped to the pitch sequence $P_i = (\text{pitch\_interval}_1, \text{pitch\_interval}_2, \text{pitch\_interval}_3, \dots, \text{pitch\_interval}_{n-1})$;

The MIDI files stored in the music feature database must use the same melody feature expression, so that the melody features extracted on the user input side and on the database side have the same format. For MIDI files, the interval is defined by formula (2), where $\text{MIDI\_note}_{n+1}$ and $\text{MIDI\_note}_n$ are pitch values in the MIDI file:

$\text{Pitch\_Interval}_n = \text{MIDI\_note}_{n+1} - \text{MIDI\_note}_n \quad (2)$
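Formulas (1) and (2) are sketched below in Python. The consistency check relies on the standard MIDI tuning convention (note 69 = A4 = 440 Hz, 12 equal-tempered notes per octave), under which the two formulas agree and both are invariant to transposition; `hz_intervals`, `midi_intervals` and `midi_to_hz` are hypothetical names.

```python
import math
from typing import List, Sequence


def hz_intervals(freqs: Sequence[float]) -> List[float]:
    """Formula (1): interval in semitones between adjacent pitches given in Hz."""
    return [12 * math.log2(f2 / f1) for f1, f2 in zip(freqs, freqs[1:])]


def midi_intervals(notes: Sequence[int]) -> List[int]:
    """Formula (2): interval between adjacent MIDI note numbers."""
    return [n2 - n1 for n1, n2 in zip(notes, notes[1:])]


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


notes = [60, 62, 64, 60]  # C4 D4 E4 C4
print(midi_intervals(notes))  # [2, 2, -4]
# The same melody as frequencies gives the same intervals up to rounding,
# and multiplying every frequency by a constant (a key change) does not alter them.
print([round(x, 6) for x in hz_intervals([midi_to_hz(n) for n in notes])])
```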

After this transformation, the features of the same melody in different keys are normalized and the influence of different humming keys on melody feature extraction is eliminated, so the features of the same melody in different keys are extracted in a consistent, normalized form. Based on this common feature expression, a similarity evaluation mechanism can be designed to match the query against the feature information in the database. Melody modeling is performed on the processed pitch frequency sequence to obtain a melody skeleton made up of discrete points; the skeleton is log-transformed, the intervals between adjacent input tones are extracted and used as the feature sequence of the input audio, and the extracted melody features are sent to the matching module, where their similarity to the information in the music feature database is computed to obtain the matching result;

2) Feature expression of the melody;

The pitch frequency contour curve obtained through pitch frequency extraction and filtering can be split into a combination of two independent melody components: a macro melody component and a micro melody component; their definitions are as follows:

Macro-melodic component: reflects the tone pattern in the speech information, which is closely related to the global pitch change of the pitch frequency;

Micro-melodic components: reflect the phoneme components in the voice information, and affect the local changes of the pitch frequency curve;

Similarly, humming, as a kind of voice signal, is also regarded as a combination of the two melodic components. For a melody sung by the human voice, the pitch changes depend only on the macro-melodic component of its fundamental frequency curve, while phonetic information such as the syllables and lyrics of the humming is determined by the micro-melodic component of the pitch frequency curve. A quadratic spline function is used to approximate the macro-melody of the pitch frequency curve by interpolation; the resulting macro-melody takes the form of a sequence of discrete target points and represents the pitch-melody feature of the pitch frequency sequence. The humming melody feature is based on the pitch sequence and is independent of phonetic and phoneme information. Therefore, the macro-melodic sequence of the pitch frequency contour can be obtained by processing the filtered pitch frequency contour with the MOMEL algorithm, and it serves as the basis for the subsequent melody feature expression;

The raw output of the MOMEL algorithm is still clearly inadequate for the subsequent melody feature expression, so a parameterized threshold is set to control the interval between adjacent tones: when the interval between two tones is below the threshold, that interval is deleted or merged with an adjacent interval, depending on the specific situation;

(3) Similarity Calculation: Retrieval Matching Algorithm

(a) Longest Matched Subsequence (LMS);

The melody features obtained with this feature extraction method are sequences of real numbers, whereas the melody features stored in the music feature database are all sequences of integers. If the similarity between two such sequences were computed mechanically with the longest common subsequence algorithm, many elements that should match would be missed;

The longest matching subsequence algorithm solves this problem of the longest common subsequence (LCS) algorithm. As an improvement on LCS, it outputs two separate subsequences A' and B', which are subsequences of the input sequences A and B respectively;

Define the longest matching subsequence as follows:

Given input sequences $A = (a_1, a_2, a_3, \dots, a_n)$ and $B = (b_1, b_2, b_3, \dots, b_m)$,

the algorithm produces subsequences $A' = (a'_1, a'_2, a'_3, \dots, a'_l)$ and $B' = (b'_1, b'_2, b'_3, \dots, b'_l)$;

The subsequences A' and B' satisfy the following conditions:

First, every element of A' and of B' has a matching element in the other subsequence, namely:

in the subsequences A' and B',

the element $a'_k$ of A' corresponds to the element $a_i$ of A;

the element $b'_k$ of B' corresponds to the element $b_j$ of B;

and they satisfy $LD(a_i, b_j) \le \delta$, where $\delta$ is the given maximum value of the local similarity;

Second, A' and B' are relatively contiguous within their respective original sequences, namely:

in the subsequences A' and B',

the element $a'_k$ of A' corresponds to the element $a_i$ of A, and the element $a'_{k+1}$ of A' corresponds to the element $a_s$ of A;

the element $b'_k$ of B' corresponds to the element $b_j$ of B, and the element $b'_{k+1}$ of B' corresponds to the element $b_t$ of B;

and they satisfy $s - i \le L$ and $t - j \le L$, where L is the maximum number of elements allowed to be inserted between consecutive matched elements;

Third, A' and B' have the same length, and they are the longest subsequences of A and B that satisfy the first and second conditions, i.e. $|A'| = |B'| = \max\{|A_k|, |B_l|\}$;

In the longest matching subsequence algorithm, the notion of element equality is replaced by the notion of matching. Unlike in the LCS algorithm, A' and B' need not be exactly equal element by element; they are the longest and most similar pair among all subsequences of A and B;

(b) Local similarity calculation;

The local similarity, on which the longest matching subsequence algorithm is built, is computed as follows;

First, define an edit distance for sequences of real numbers:

Given input real-valued sequences $X = (x_1, x_2, x_3, \dots, x_m)$ and $Y = (y_1, y_2, y_3, \dots, y_n)$;

the cost of converting one element of a real-valued sequence into another is defined by formula (3), where $\delta$ is the threshold for judging equality:

$w(a, b) = \begin{cases} 0, & \text{if } |a - b| < \delta \\ |a - b|, & \text{if } |a - b| \ge \delta \end{cases} \quad (3)$

Initialize the edit distance matrix $D_{m,n}$ as follows:

$d_{0,0} = 0$;

$d_{i,0} = d_{i-1,0} + w(x_i, 0)$, where $1 \le i \le m$;

$d_{0,j} = d_{0,j-1} + w(0, y_j)$, where $1 \le j \le n$;

For the cells with $1 \le i \le m$ and $1 \le j \le n$, fill the edit distance matrix $D_{m,n}$ using the recurrence in formula (4), where $W_{del}$, $W_{sub}$ and $W_{ins}$ are the weights of the deletion, substitution and insertion operations, respectively:

$d_{i,j} = \min \begin{cases} d_{i-1,j} + W_{del} \cdot w(x_i, 0) \\ d_{i-1,j-1} + W_{sub} \cdot w(x_i, y_j) \\ d_{i,j-1} + W_{ins} \cdot w(0, y_j) \end{cases} \quad (4)$

Finally, the edit distance $ED(X, Y)$ between the real-valued sequences X and Y is read from the bottom-right cell $d_{m,n}$ of $D_{m,n}$, as shown in formula (5):

$ED(X, Y) = d_{m,n} \quad (5)$

Next, the local similarity of a pair of elements is defined:

Given input melody feature sequences $A = (a_1, a_2, a_3, \dots, a_n)$ and $B = (b_1, b_2, b_3, \dots, b_m)$;

take one element from each of A and B to form a pair $(a_i, b_j)$. For each such pair, the local similarity is defined as follows, where k is the local radius:

define the local subsequences $X = (a_{i-k}, \dots, a_i, \dots, a_{i+k})$ and $Y = (b_{j-k}, \dots, b_j, \dots, b_{j+k})$;

the local similarity $LD(a_i, b_j)$ around the pair $(a_i, b_j)$ is then the edit distance $ED(X, Y)$ between the local subsequences X and Y, as shown in formula (6):

$LD(a_i, b_j) = ED(X, Y) \quad (6)$

(c) Computing the longest matching subsequence by dynamic programming;

Once the rule for computing the local similarity around elements $a_i$ and $b_j$ of the original sequences A and B is established, the longest matching subsequences A' and B' that satisfy the definition can be computed with a dynamic programming strategy;

First, recall how the longest common subsequence (LCS) is computed by dynamic programming:

Given input sequences $A = (a_1, a_2, a_3, \dots, a_m)$ and $B = (b_1, b_2, b_3, \dots, b_n)$;

construct the LCS matrix $C_{m,n}$ and initialize it as follows:

$c_{i,0} = 0$, $c_{0,j} = 0$, where $0 \le i \le m$ and $0 \le j \le n$;

then fill the matrix using the recurrence in formula (7), where $1 \le i \le m$ and $1 \le j \le n$:

$c_{i,j} = \begin{cases} c_{i-1,j-1} + 1, & \text{if } a_i = b_j \\ \max(c_{i,j-1}, c_{i-1,j}), & \text{otherwise} \end{cases} \quad (7)$

Finally, the length of the longest common subsequence is read from the bottom-right cell $c_{m,n}$ of $C_{m,n}$;

The longest matching subsequence (LMS) is computed as follows:

Define the matrices $C_{m,n}$, $R_{m,n}$ and $S_{m,n}$ for computing the longest matching subsequence, as follows:

the integer matrix $C_{m,n}$, whose cell $c_{i,j}$ stores the length of the longest matching subsequence between the prefixes $(a_1, a_2, a_3, \dots, a_i)$ and $(b_1, b_2, b_3, \dots, b_j)$;

the integer matrix $R_{m,n}$, whose cell $r_{i,j}$ stores the number of non-contiguous (skipped) elements in these two prefixes;

the character matrix $S_{m,n}$, whose cell $s_{i,j}$ records the computation path through $C_{m,n}$ and $R_{m,n}$, i.e. the direction in which the longest matching subsequence grew at each step;

Initialize the three matrices as follows:

$c_{i,0} = 0$, $c_{0,j} = 0$, $r_{0,j} = 0$, $r_{i,0} = 0$, $s_{0,j} = \text{'\_'}$, $s_{i,0} = \text{'\_'}$, where $0 \le i \le m$ and $0 \le j \le n$;

Compute the matrices $C_{m,n}$, $R_{m,n}$ and $S_{m,n}$ with the recurrences in formulas (8) to (10), where $1 \le i \le m$ and $1 \le j \le n$:

$c_{i,j} = \begin{cases} c_{i-1,j-1} + 1, & \text{if } LD(a_i, b_j) \le \delta \text{ and } r_{i-1,j-1} \le L \\ 1, & \text{if } LD(a_i, b_j) \le \delta \text{ and } r_{i-1,j-1} > L \\ \max(c_{i,j-1}, c_{i-1,j}), & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1}, r_{i-1,j} \le L \\ c_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \le L < r_{i-1,j} \\ c_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \le L < r_{i,j-1} \\ 0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L \end{cases} \quad (8)$

$r_{i,j} = \begin{cases} 0, & \text{if } LD(a_i, b_j) \le \delta \\ r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} > c_{i-1,j} \\ r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} < c_{i-1,j} \\ \min(r_{i,j-1}, r_{i-1,j}) + 1, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} = c_{i-1,j} \\ r_{i,j-1} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \le L < r_{i-1,j} \\ r_{i-1,j} + 1, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \le L < r_{i,j-1} \\ 0, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j}, r_{i,j-1} > L \end{cases} \quad (9)$

$s_{i,j} = \begin{cases} \text{'S'}, & \text{if } LD(a_i, b_j) \le \delta \text{ and } c_{i,j} = 1 \\ \text{'O'}, & \text{if } LD(a_i, b_j) \le \delta \text{ and } c_{i,j} \ne 1 \\ \text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} > c_{i-1,j} \\ \text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1}, r_{i-1,j} \le L \text{ and } c_{i,j-1} < c_{i-1,j} \\ \text{'R'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i,j-1} \le r_{i-1,j} \le L \text{ and } c_{i,j-1} = c_{i-1,j} \\ \text{'D'}, & \text{if } LD(a_i, b_j) > \delta,\ r_{i-1,j} \le r_{i,j-1} \le L \text{ and } c_{i,j-1} = c_{i-1,j} \\ \text{'R'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i,j-1} \le L < r_{i-1,j} \\ \text{'D'}, & \text{if } LD(a_i, b_j) > \delta \text{ and } r_{i-1,j} \le L < r_{i,j-1} \end{cases} \quad (10)$

Finally, the output of the longest matching subsequence algorithm is obtained from the path recorded in the symbol matrix $S_{m,n}$ together with the values stored in $C_{m,n}$, and the length of the longest matching subsequence is read directly from $c_{m,n}$.

2. The humming-based computer music retrieval method based on the longest matching subsequence algorithm according to claim 1, characterized in that the longest matching subsequence algorithm comprises the following specific steps:

(1) Given the input melody feature sequences A and B, use the LMS algorithm to compute the subsequences A' and B' of A and B that are most similar to each other, i.e. the longest matching subsequences;

(2) from the lengths of the original feature sequences A and B and of the matching subsequences A' and B', compute the proportion of A and B that is matched;

(3) compute the edit distance between A' and B' with the real-valued edit distance algorithm;

(4) for every pair of feature sequences compared during retrieval, sort the candidates by the matched proportion between A and B in descending order as the primary key, and by the edit distance between A' and B' in ascending order as the secondary key, producing a list in decreasing order of similarity.