CN103440250A

CN103440250A - Embedded humming retrieval method and system based on 16-bit DSP (Digital Signal Processing) platform application

Info

Publication number: CN103440250A
Application number: CN2013103089474A
Authority: CN
Inventors: 曹宏
Original assignee: BEIJING HELIOS-ADSP SCIENCE AND TECHNOLOGY Co Ltd
Current assignee: BEIJING HELIOS-ADSP SCIENCE AND TECHNOLOGY Co Ltd
Priority date: 2013-07-22
Filing date: 2013-07-22
Publication date: 2013-12-11

Abstract

The invention provides an embedded humming retrieval method based on 16-bit DSP (Digital Signal Processing) platform application. The method comprises the steps of acquiring a humming melody of a user and storing the humming melody according to a PCM (Pulse Code Modulation) format; extracting fundamental tones from the humming melody according to a short-time autocorrelation algorithm to obtain PV (pitch value) information; performing note segmentation and post-processing to the PV information, converting the PV information into NOTE format information and removing false fundamental tone information to obtain a melody to be retrieved; converting MIDI (Musical Instrument Digital Interface) music library information into NOTE format information to obtain a standard MIDI music template library; matching the melody to be retrieved with the standard MIDI music template library through an NLS (Note-based Linear Scaling) algorithm and an NRA (Note-based Recursive Alignment) algorithm to obtain matching scores, wherein the NLS algorithm is obtained by modifying a traditional frame-based LS (Linear Scaling) algorithm into an algorithm based on NOTE information, and the NRA algorithm is obtained by modifying a traditional frame-based RA (Recursive Alignment) algorithm into an algorithm based on the NOTE information; and ordering according to the matching scores and outputting matching results. The embedded humming retrieval method based on the 16-bit DSP platform application can realize high-efficiency accurate retrieval, and the adaptability is wide.

Description

Embedded humming retrieval method and system based on a 16-bit DSP platform application

Technical field

The present invention relates to the field of automatic music retrieval, and in particular to a retrieval method and system for humming retrieval that can be applied to 16-bit DSP platforms.

Background

At present, retrieving target information efficiently and conveniently from massive amounts of information has become a major challenge for music information retrieval. With the continuous renewal and development of data storage methods and means, digital information has penetrated every corner of our lives. Traditional retrieval requires the information to be classified and annotated, and the amount of work involved is beyond imagination; as the quantity of information increases further, the drawbacks of traditional retrieval become ever more prominent.

Humming retrieval is a novel, content-based retrieval mode. As scholars have continued to explore and study it, this content-based music retrieval mode has shown advantages in several respects. First, with the rapid development of the Internet and other electronic products, the amount of stored music keeps growing; at this rate, annotating and classifying songs in the conventional way becomes an ever larger engineering effort. Content-based humming retrieval needs no classification or annotation: the target music file only has to be converted to the corresponding template format with suitable conversion software. Moreover, thanks to the development of humming retrieval algorithms, target music information can now be retrieved efficiently and accurately. Second, people often remember only the melody of a piece of music, but not its lyrics or its name. For such a search requirement, traditional retrieval shows a functional drawback and may fail to find the intended target at all. Here the advantage of content-based retrieval is evident: the searcher only needs to hum the remembered melody to find the corresponding music among a large number of music files. Such a retrieval mode is also promising for the intelligent-product market: introducing embedded humming retrieval technology into systems such as MP3 players, mobile phones, karaoke machines and intelligent toys has good market potential.

Because of the particular advantages and development potential of humming retrieval, researchers at home and abroad keep extending and deepening research in this field, and new algorithms and application platforms are continually introduced. In short, as a content-based retrieval mode, humming retrieval has important theoretical research value and a broad engineering background. However, there is still considerable room for development in commercial products. The products currently in practical use on the market are all large PC-based systems, for example the ring-back tone (CRBT) retrieval services provided by China Mobile and China Telecom. Constrained by the complexity and retrieval accuracy of existing algorithms, the existing PC-based humming retrieval matching algorithms cannot be applied directly to most existing embedded systems.

Summary of the invention

The present invention proposes an embedded humming retrieval system based on a 16-bit DSP platform application that can retrieve efficiently and accurately and is suitable for low-level embedded platforms.

The technical solution of the present invention is realized as follows:

An embedded humming retrieval method based on a 16-bit DSP platform application comprises:

acquiring the user's humming melody, and storing the humming melody in PCM format;

performing pitch extraction on the humming melody according to a short-time autocorrelation algorithm to obtain PV information;

performing note segmentation and post-processing on the PV information, converting it to NOTE format information, and rejecting false pitch information to obtain the melody to be retrieved;

converting the MIDI music library information to NOTE format information to obtain a standard MIDI music template library;

matching the melody to be retrieved against the standard MIDI music template library by the NLS algorithm and the NRA algorithm to obtain matching scores, wherein the NLS algorithm is the traditional frame-based LS algorithm modified to operate on NOTE information, and the NRA algorithm is the traditional frame-based RA algorithm modified to operate on NOTE information;

sorting according to the matching scores, and outputting the matching result.

The present invention further provides an embedded humming retrieval system based on a 16-bit DSP platform application, comprising:

an acquisition module, for acquiring the user's humming melody and storing the humming melody in PCM format;

a pitch extraction module, for performing pitch extraction on the humming melody according to a short-time autocorrelation algorithm to obtain PV information;

a note segmentation module, for performing note segmentation on the PV information by merging or splitting pitch information that is adjacent in time order;

a post-processing module, for performing the fundamental frequency path cost calculation on the PV information and obtaining the PV information of the input melody after the Boersma calculation;

an algorithm matching module, for converting the PV information and the MIDI music library information to NOTE format information, and for matching the converted PV information against the MIDI music templates in the MIDI music library information according to the NLS algorithm module and the NRA algorithm module to obtain matching scores;

a matching result output terminal, for sorting and outputting according to the matching scores and obtaining the serial number of the MIDI music template corresponding to the maximum matching distance value.

In the embedded humming retrieval method and system based on a 16-bit DSP platform application provided by the present invention, after PV information is obtained by pitch extraction, the PV information is converted to note information, the note information is segmented, and false note information is rejected, yielding robust note information. Because note information describes a song melody better than frame-based pitch information, note-based matching effectively improves the accuracy of the matching calculation. In addition, the NLS and NRA algorithms replace the traditional frame-based matching methods, which greatly improves the accuracy of the matching calculation.

Brief description of the drawings

In order to illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the embedded humming retrieval method based on a 16-bit DSP platform application according to the present invention;

Fig. 2 is a block diagram of the embedded humming retrieval system based on a 16-bit DSP platform application according to the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Acquire the user's humming melody, and store the humming melody in PCM format; (step 1)

Perform pitch extraction on the humming melody according to a short-time autocorrelation algorithm to obtain PV (pitch value) information; (step 2)

Perform note segmentation and post-processing on the PV information, convert it to NOTE format information, and reject false pitch value information to obtain the melody to be retrieved; (step 3)

Convert the MIDI music library information to NOTE format information to obtain a standard MIDI music template library; (step 4)

Match the melody to be retrieved against the standard MIDI music template library by the NLS algorithm and the NRA algorithm to obtain matching scores, wherein the NLS algorithm is the traditional frame-based LS algorithm modified to operate on NOTE information, and the NRA algorithm is the traditional frame-based RA algorithm modified to operate on NOTE information; (step 5)

Sort according to the matching scores, and output the matching result. (step 6)

When step 1 is performed, the duration of the user's humming needs to be greater than 0.3 seconds, and preferably exceeds 6 seconds. The humming melody is first saved in PCM (Pulse Code Modulation) format.
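As an illustration of step 1, the following minimal sketch packs samples as 16-bit PCM and checks the 0.3 second minimum. The sampling rate of 8000 Hz and the function names `to_pcm16` and `duration_ok` are assumptions made for the example; the source does not specify them.

```python
import struct

SAMPLE_RATE = 8000  # Hz; assumed, the source does not state a sampling rate


def to_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] as little-endian signed 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, x)) for x in samples]
    ints = [int(round(x * 32767)) for x in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def duration_ok(pcm_bytes, sample_rate=SAMPLE_RATE, min_seconds=0.3):
    """Return True if the recording is longer than the minimum humming time."""
    n_samples = len(pcm_bytes) // 2  # two bytes per 16-bit sample
    return n_samples / sample_rate > min_seconds
```

A recording that fails `duration_ok` would simply be rejected before pitch extraction; passing `min_seconds=6.0` checks the preferred length instead.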

When step 2 is performed: the short-time autocorrelation algorithm is one of the most common algorithms in time-domain signal processing. The fundamental frequency is extracted with the short-time autocorrelation algorithm, assisted by fundamental frequency post-processing, and pitch extraction is carried out at the maximum resolution and the minimum frame shift. Compared with autocorrelation processing in the frequency domain and other transform domains, the short-time autocorrelation algorithm significantly reduces computational complexity while preserving computational precision, and is therefore better suited to embedded chip platforms.
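The short-time autocorrelation step can be sketched as follows. This is a simplified floating-point illustration, not the fixed-point routine a 16-bit DSP would run; the frame length, hop size and the search bounds `f_min` and `f_max` are assumed values, and the sketch simply takes the largest autocorrelation peak, which is exactly the raw estimate that the post-processing of step 3 is meant to correct.

```python
def autocorr(frame, lag):
    """Short-time autocorrelation R(lag) of one frame."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def frame_pitch(frame, sample_rate, f_min=80.0, f_max=800.0):
    """Estimate the pitch (Hz) of one frame from the largest autocorrelation peak.

    Returns 0.0 for a frame with no energy. f_min and f_max are assumed
    search bounds, not values from the source.
    """
    if autocorr(frame, 0) <= 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(len(frame) - 1, int(sample_rate / f_min))
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: autocorr(frame, lag))
    return sample_rate / best_lag


def pitch_track(samples, sample_rate, frame_len=512, hop=256):
    """Slide a window over the signal and return one pitch value per frame."""
    return [
        frame_pitch(samples[start:start + frame_len], sample_rate)
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

The list returned by `pitch_track` plays the role of the PV information: one pitch value per frame, with 0.0 marking frames in which no pitch was found.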

When step 3 is performed, the PV information is converted to NOTE format information after note segmentation and post-processing, and false pitch information is rejected. In practice, the largest peak of the autocorrelation function does not necessarily lie at the pitch period, so taking the reciprocal of the largest peak position directly as the fundamental frequency may give a wrong result; post-processing is therefore required. Post-processing mainly removes false pitch values from the pitch information already obtained, leaving the true pitch values. Conversion to NOTE format information is a basic conversion method.

The specific method of note segmentation is as follows: a weighted averaging method is used to decide whether a new note should be split off and to determine its corresponding pitch value.
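The text does not give the weights or the decision rule, so the following is only one possible reading of the weighted-averaging segmentation. It assumes that pitch is compared on a semitone scale, that a frame starts a new note when it departs from the running weighted mean of the current note by more than `split_threshold` semitones, and that unvoiced frames (0 Hz) end a note. All names and the threshold value are hypothetical.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a MIDI-style semitone number (A4 = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def segment_notes(pitches_hz, weights=None, split_threshold=0.8):
    """Group consecutive voiced frames into notes.

    Returns a list of (semitone, n_frames). The pitch of a note is the
    weighted mean of its frames; weights default to 1.0 per frame.
    """
    if weights is None:
        weights = [1.0] * len(pitches_hz)
    notes = []
    wsum = psum = 0.0  # running weight sum and weighted pitch sum
    count = 0          # frames in the current note

    def close():
        nonlocal wsum, psum, count
        if count:
            notes.append((psum / wsum, count))
        wsum = psum = 0.0
        count = 0

    for f, w in zip(pitches_hz, weights):
        if f <= 0.0 or w <= 0.0:
            close()  # an unvoiced frame ends the current note
            continue
        s = hz_to_semitone(f)
        if count and abs(s - psum / wsum) > split_threshold:
            close()  # too far from the running mean: start a new note
        wsum += w
        psum += w * s
        count += 1
    close()
    return notes
```

Frame energy would be a natural choice for `weights`, so that loud, stable frames dominate the pitch assigned to a note; this too is an assumption rather than something stated in the source.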

The post-processing comprises two steps: (1) fundamental frequency path cost calculation; (2) Boersma minimal path calculation.

Method of calculating the fundamental frequency path cost function: the first n peaks are taken as candidates, and the corresponding pitch periods are denoted p_k(0) to p_k(n-1). The cost function between adjacent frames is defined as:

In the formula, R(·) is the autocorrelation function. This inter-frame cost function gives the distance between two adjacent frames. Because the pitches of adjacent frames are related and usually vary continuously, it is more reasonable to consider the cost over the whole path, so the path cost function is defined as:

D = \sum_{n=k-s}^{k+s-1} \bar{d}\left(p_n(i_n),\, p_{n+1}(i_{n+1})\right)

In the above formula, s denotes the length of the path considered before and after the current frame, and k denotes the index of the current frame.

Boersma minimal path calculation: to reduce half-frequency and double-frequency pitch errors, the computation is as follows:

First, the n fundamental frequency candidates in each frame are found, and a strength value is then defined for each candidate as:

In the definition, S is the strength value obtained above; Th_voice is the voicing threshold, and the speech is considered voiced when its amplitude is greater than this value; Th_silence is the silence threshold, and the speech is considered silent when its amplitude is less than this value. The definition also uses the local peak value and the global peak value of the speech signal. R(τ_max) is the maximum value of the autocorrelation function; C_octave is a cost factor, and the larger its value, the more the selection favors high-frequency fundamental frequency candidates; Pitch_min is the minimum fundamental frequency, determined by the selected window length; τ_max is the value of τ at which the autocorrelation function attains its maximum.

After the strength value is obtained, it is paired with the candidate fundamental frequency as (F_{n,i}, S_{n,i}), where n is the index of the current frame and i denotes the i-th candidate in frame n. From these candidates and their strength values, the total path cost over the whole speech frame sequence for a given choice of candidates is:

\mathrm{Cost} = \sum_{n=1}^{N-1} C_{tran}\left(F_{n-1,p_{n-1}},\, F_{n,p_n}\right) - \sum_{n=1}^{N-1} S_{n,p_n}

Here C_{tran}(F_{n-1,p_{n-1}}, F_{n,p_n}) is the transition cost from the p_{n-1}-th fundamental frequency candidate of frame n-1 to the p_n-th fundamental frequency candidate of frame n, which is defined as follows:

In the formula, C_inverse is the transition cost from voiced to unvoiced or from unvoiced to voiced; increasing this value reduces the number of transitions between voiced and unvoiced. C_jump is the fundamental frequency jump cost between voiced frames; increasing this value reduces the likelihood of large fundamental frequency jumps. Both are fixed values; for example, both can be set to 0.2. When both are set to 0, the transition cost between adjacent frames is 0, and only the local optimum is considered when selecting fundamental frequency candidates.

The final fundamental frequency is selected by minimizing Cost: dynamic programming is used to find the path with the minimum Cost, and this path gives the best fundamental frequency value for each frame.
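The dynamic programming search can be sketched as below. The definition of the transition cost is not reproduced in this text, so the sketch assumes the form used in Boersma's autocorrelation pitch tracker, which the symbols C_inverse and C_jump suggest: zero between two unvoiced candidates, C_inverse between a voiced and an unvoiced candidate, and C_jump times the absolute octave distance between two voiced candidates. A candidate frequency of 0 Hz stands for "unvoiced". The sketch also subtracts the strength of the first frame, which differs slightly from the printed summation limits.

```python
import math


def transition_cost(f_prev, f_cur, c_inverse=0.2, c_jump=0.2):
    """Assumed Boersma-style cost of moving between two candidates (0 Hz = unvoiced)."""
    if f_prev == 0.0 and f_cur == 0.0:
        return 0.0
    if f_prev == 0.0 or f_cur == 0.0:
        return c_inverse
    return c_jump * abs(math.log2(f_prev / f_cur))


def best_path(candidates, c_inverse=0.2, c_jump=0.2):
    """Pick one candidate per frame so that the total Cost is minimal.

    candidates[n] is a list of (F, S) pairs for frame n.
    Returns the chosen fundamental frequency of every frame.
    """
    if not candidates:
        return []
    # cost[i]: minimal cost of any path ending in candidate i of the current frame
    cost = [-s for _, s in candidates[0]]
    back = []  # back[n-1][i]: best predecessor of candidate i in frame n
    for n in range(1, len(candidates)):
        new_cost, pointers = [], []
        for f_cur, s_cur in candidates[n]:
            options = [
                cost[j] + transition_cost(f_prev, f_cur, c_inverse, c_jump)
                for j, (f_prev, _) in enumerate(candidates[n - 1])
            ]
            j_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[j_best] - s_cur)
            pointers.append(j_best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    i = min(range(len(cost)), key=cost.__getitem__)
    path = [i]
    for pointers in reversed(back):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [candidates[n][i][0] for n, i in enumerate(path)]
```

With both costs set to 0 the search reduces to picking the strongest candidate in each frame, which matches the remark above that only the local optimum is then considered.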

Preferably, two typical errors are considered in note segmentation: insertion errors and deletion errors. An insertion error occurs when data that originally belong to one note are cut into several notes, so that notes which did not exist are introduced. A deletion error occurs when data that originally belong to several notes are cut into a single note, so that several notes are merged into one. To reduce insertion and deletion errors, false pitch value information needs to be rejected: a segment-wise statistical method is used, in which the notes obtained within a window of a given time span are analyzed and those whose duration and amplitude fall outside the normal range are deleted.
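The segment-wise statistical rejection could look like the sketch below. The source does not define the "normal range", so the example assumes that a note is abnormal when it lasts fewer than `min_frames` frames or lies more than `max_dev` semitones from the median pitch of its window. It also treats the "amplitude" of a note as its pitch value. The window size and both limits are hypothetical tuning values.

```python
from statistics import median


def reject_false_notes(notes, window=8, min_frames=3, max_dev=12.0):
    """Drop notes that are too short or too far from the local median pitch.

    notes is a list of (semitone, n_frames). window, min_frames and max_dev
    are assumed tuning values, not taken from the source.
    """
    kept = []
    for start in range(0, len(notes), window):
        chunk = notes[start:start + window]
        centre = median(s for s, _ in chunk)  # typical pitch in this window
        for s, n in chunk:
            if n >= min_frames and abs(s - centre) <= max_dev:
                kept.append((s, n))
    return kept
```

Very short notes are the usual symptom of an insertion error, while octave-sized outliers typically come from half-frequency or double-frequency pitch estimates that survived the earlier post-processing.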

When step 4 is performed, the information in the MIDI (Musical Instrument Digital Interface) music library is likewise converted to NOTE format information. The MIDI templates in the MIDI music library directly use standard MIDI music fragments; less important information in the MIDI music is ignored, and important information such as track, instrument and rhythm is repackaged as NOTE format information.
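A minimal sketch of the MIDI-to-NOTE conversion is given below. It assumes that the melody track has already been parsed into note-on and note-off events of the form `(tick, kind, pitch)` and keeps only pitch and length; the event layout, the function name and the default of 480 ticks per beat are assumptions, and the track and instrument information mentioned above is left out for brevity.

```python
def midi_events_to_notes(events, ticks_per_beat=480):
    """Convert monophonic note events to NOTE pairs (semitone, length in beats).

    events is a list of (tick, kind, pitch) with kind "on" or "off",
    assumed to be already parsed from the melody track of a MIDI file.
    """
    notes = []
    active = {}  # pitch -> tick at which the note started
    for tick, kind, pitch in sorted(events, key=lambda e: e[0]):
        if kind == "on":
            active[pitch] = tick
        elif kind == "off" and pitch in active:
            start = active.pop(pitch)
            notes.append((start, pitch, (tick - start) / ticks_per_beat))
    notes.sort()  # order notes by start tick
    return [(pitch, length) for _, pitch, length in notes]
```

Because MIDI note numbers are already semitone values, the resulting pairs have the same (pitch, length) shape as the notes extracted from the humming.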

When step 5 is performed, melody matching is carried out by the NLS (Note-based Linear Scaling) algorithm and the NRA (Note-based Recursive Alignment) algorithm to obtain matching scores. The NLS algorithm is the traditional frame-based LS (Linear Scaling) algorithm modified to operate on NOTE information; this modification greatly reduces the amount of computation and improves speed.

The NLS (Note-based Linear Scaling) algorithm is computed as follows:

Because the linear scaling algorithm is a frame-based matching algorithm, a note-based linear scaling algorithm is introduced to further improve its speed. Its steps are similar to those of the LS algorithm, but both the humming features and the template features in NLS are note features. The specific algorithm is as follows:

Algorithm 1: note-based linear scaling algorithm
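The listing of Algorithm 1 is not reproduced in this text. The following sketch shows one plausible form of note-based linear scaling under stated assumptions: notes are (semitone, length) pairs, both sequences are transposed so that their first notes coincide, the distance is the time-weighted mean absolute pitch interval over the compared span, and the set of scaling factors is an arbitrary example. None of these choices is confirmed by the source.

```python
def overlap_distance(hum, tpl):
    """Mean |pitch interval| over the overlapping time of two note sequences.

    Both arguments are lists of (semitone, length). The comparison stops at
    the end of the shorter sequence.
    """
    i = j = 0
    left_h = hum[0][1] if hum else 0.0  # time left in the current hum note
    left_t = tpl[0][1] if tpl else 0.0  # time left in the current template note
    total = covered = 0.0
    while i < len(hum) and j < len(tpl):
        step = min(left_h, left_t)
        total += abs(hum[i][0] - tpl[j][0]) * step
        covered += step
        left_h -= step
        left_t -= step
        if left_h <= 1e-12:
            i += 1
            left_h = hum[i][1] if i < len(hum) else 0.0
        if left_t <= 1e-12:
            j += 1
            left_t = tpl[j][1] if j < len(tpl) else 0.0
    return total / covered if covered else float("inf")


def nls_distance(hum, tpl, factors=(0.5, 0.75, 1.0, 1.25, 1.5, 2.0)):
    """Return (best distance, best factor) over the preset scaling factors."""
    # Transpose both melodies so that they start on the same pitch.
    hum_rel = [(s - hum[0][0], t) for s, t in hum]
    tpl_rel = [(s - tpl[0][0], t) for s, t in tpl]
    best = (float("inf"), None)
    for f in factors:
        scaled = [(s, t * f) for s, t in hum_rel]
        d = overlap_distance(scaled, tpl_rel)
        if d < best[0]:
            best = (d, f)
    return best
```

Because the loops run over notes rather than frames, the cost of one comparison grows with the number of notes instead of the number of frames, which is the source of the speed-up claimed for NLS.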

The NRA (Note-based Recursive Alignment) algorithm is computed as follows:

The NRA algorithm is built on the RA algorithm and uses note features throughout. In the algorithm, let the input humming note sequence be S = {f_1, f_2, ..., f_n} = {(s_1, t_1), (s_2, t_2), ..., (s_n, t_n)} and the song template note sequence be H = {f'_1, f'_2, ..., f'_n} = {(s'_1, t'_1), (s'_2, t'_2), ..., (s'_n, t'_n)}, where f denotes a note feature, s denotes the semitone pitch value of the note, and t denotes the length of the note. In the NRA algorithm the maximum number of iterations is defined as I, and the distance NRA(S, H, I) between the humming and the template is calculated as follows:

Specifically, Algorithm 2: NRA algorithm

When step 6 is performed, sorting is carried out according to the matching scores and the matching result is output. Preferably, the specific operation is as follows: the melody to be retrieved is matched against each standard MIDI template in the standard MIDI music template library and the corresponding matching score is calculated; the scores are sorted, and the serial number of the standard MIDI template with the highest score is selected as the final output matching result.
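The ranking in step 6 can be illustrated as follows. The sketch assumes that matching produces a distance (smaller means more similar) and takes the score as the negated distance; the function name `rank_templates` and the callable `distance` argument are hypothetical.

```python
def rank_templates(query, templates, distance):
    """Return (template index, score) pairs ordered from best to worst match.

    distance(query, template) is any melody distance where smaller means
    more similar; the score used here is the negated distance.
    """
    scored = [(-distance(query, tpl), idx) for idx, tpl in enumerate(templates)]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest score first
    return [(idx, score) for score, idx in scored]
```

The first entry of the returned list carries the serial number that would be output as the final result; keeping the full list also allows a top-N result to be presented.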

Preferably, the humming retrieval method provided by the present invention operates as follows. The user inputs a humming melody through a microphone; its duration is greater than 0.3 s and preferably exceeds 6 s. The humming melody is then converted by the algorithm and stored in the system in PV format; the MIDI music library information is likewise converted to a standard MIDI music template library in NOTE format and stored. After pitch extraction and format conversion, the humming melody undergoes note segmentation and fundamental frequency post-processing, which removes the false fundamental frequency information produced during fundamental frequency extraction. In view of the limited system resources of the embedded platform, the system uses the short-time autocorrelation algorithm to extract the fundamental frequency.

After this processing the humming melody becomes the melody to be retrieved and enters the melody matching stage, where matching is computed by the NLS and NRA algorithms to obtain the matching scores. After the humming melody has been matched, the result is output in the form of voice.

Preferably, the present invention also provides an embedded humming retrieval system based on a 16-bit DSP platform application corresponding to the above method, comprising:

an algorithm matching module, for converting the PV information and the MIDI music library information to NOTE format information, and for matching the converted PV information against the MIDI music templates in the MIDI music library information according to the NLS algorithm module and the NRA algorithm module to obtain matching scores;

a matching result output terminal, for sorting and outputting according to the matching scores and obtaining the serial number of the MIDI music template corresponding to the maximum matching distance value.

The acquisition module comprises three parts: microphone signal acquisition, A/D (analog-to-digital) conversion, and PCM information storage. When the user hums, the signal is acquired by the microphone, converted to a digital signal by the A/D converter, and then stored as PCM information.

The pitch extraction module comprises windowing, short-time autocorrelation calculation, and deletion of false pitch information. The short-time autocorrelation algorithm it uses is a common algorithm and is not described again here.

The note segmentation module segments the valid pitch information, merging or splitting pitch information that is adjacent in time order.

The post-processing module comprises the fundamental frequency path cost calculation and the Boersma calculation. Both have been described in the method above and are not elaborated further here.

The algorithm matching module comprises an NRA algorithm module and an NLS algorithm module. The NRA algorithm module calculates the distance between the humming information and the song template information iteratively, and mainly comprises the following four steps:

1. The humming segment is stretched and aligned with the MIDI music template.

2. The song template is divided in the middle into two parts of equal length, H_1 and H_2, and the humming segment is cut into two parts, S_1 and S_2, whose lengths are determined by a predefined series of ratios and are f·T_s and (1-f)·T_s respectively. The ratios can be set by experiment; f is the scale factor, a value in [0, 1], and T_s is the total duration of the humming melody segment.

3. S_1 and S_2 are stretched and aligned with H_1 and H_2 respectively, and the overall distance is calculated.

3a. Starting from the first humming note, the interval between the humming note and the template note is multiplied by the duration of that interval to obtain the distance value v_1; from the beginning to the end of the humming, all distance values v_1, v_2, ..., v_n are calculated.

3b. The overall distance between the humming and the template is calculated: v = |v_1| + |v_2| + ... + |v_n|.

In the above steps, when the distance between the pitch of a humming note and the pitch of a template note is calculated, the interval between the two can be used directly as the distance value, that is, the Euclidean distance; other distances can also be used, such as the square of the interval. The present invention uses the Euclidean distance when calculating distances.

4. An optimal segmentation ratio f_best is found that minimizes the distance between the humming segment and the MIDI music template. The two newly divided parts H_1 and H_2 are then each iterated according to the above steps until the iteration count is 0. The matching distance value obtained is the matching distance between the humming melody and the song template information.
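The four steps above can be sketched as a recursive function. This is an illustration under stated assumptions rather than the listing of Algorithm 2, which is not reproduced in this text: both sequences are assumed to be in the same key already, the set of candidate ratios is an example, and a split is accepted only when it lowers the distance. The helper names are hypothetical.

```python
def total_time(notes):
    """Total duration of a (semitone, length) sequence."""
    return sum(t for _, t in notes)


def split_at(notes, cut):
    """Split a note sequence at time `cut`, dividing a note if needed."""
    first, second, elapsed = [], [], 0.0
    for s, t in notes:
        if elapsed + t <= cut:
            first.append((s, t))
        elif elapsed >= cut:
            second.append((s, t))
        else:
            first.append((s, cut - elapsed))
            second.append((s, elapsed + t - cut))
        elapsed += t
    return first, second


def stretched_distance(hum, tpl):
    """Steps 1 and 3: stretch hum to the length of tpl, sum |interval| x time."""
    if not hum or not tpl:
        return float("inf")
    scale = total_time(tpl) / total_time(hum)
    i = j = 0
    left_h, left_t = hum[0][1] * scale, tpl[0][1]
    dist = 0.0
    while i < len(hum) and j < len(tpl):
        step = min(left_h, left_t)
        dist += abs(hum[i][0] - tpl[j][0]) * step  # v_k of step 3a
        left_h -= step
        left_t -= step
        if left_h <= 1e-9:
            i += 1
            left_h = hum[i][1] * scale if i < len(hum) else 0.0
        if left_t <= 1e-9:
            j += 1
            left_t = tpl[j][1] if j < len(tpl) else 0.0
    return dist


def nra(hum, tpl, iterations, ratios=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Note-based recursive alignment distance between hum (S) and tpl (H)."""
    whole = stretched_distance(hum, tpl)
    if iterations == 0 or len(hum) < 2 or len(tpl) < 2:
        return whole
    h1, h2 = split_at(tpl, total_time(tpl) / 2.0)  # step 2: halve the template
    ts = total_time(hum)

    def halves(f):
        s1, s2 = split_at(hum, f * ts)
        return s1, s2, stretched_distance(s1, h1) + stretched_distance(s2, h2)

    # Step 4: keep the ratio f_best with the smallest overall distance.
    s1, s2, split_dist = min((halves(f) for f in ratios), key=lambda r: r[2])
    if split_dist >= whole:
        return whole
    return nra(s1, h1, iterations - 1, ratios) + nra(s2, h2, iterations - 1, ratios)
```

Each level of recursion lets the two halves of the humming be stretched by different amounts, so a singer who slows down in one half of the phrase can still be aligned; with `iterations=0` the function reduces to a single linear stretch.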

Preferably, the NLS module is used to perform linear scaling on the NOTE format information. The specific steps are:

presetting a series of scaling factors;

stretching the humming melody;

finding the minimum distance between the humming melody and the MIDI music template, that is, the data with the highest matching value.

Preferably, the matching result output terminal presents the final matching result in the form of voice.

In the embedded humming retrieval method and system based on a 16-bit DSP platform application provided by the present invention, after PV information is obtained by pitch extraction, the PV information is converted to note information, the note information is segmented, and false note information is rejected, yielding robust note information. Matching is then performed on notes. Because note information describes a song melody better than frame-based pitch information, note-based matching effectively improves the accuracy of the matching calculation. In addition, the NLS and NRA algorithms replace the traditional frame-based matching algorithms, which further improves matching accuracy. Furthermore, the fundamental frequency is calculated by the short-time autocorrelation method, which significantly reduces the amount of computation and the system resource usage compared with frequency-domain processing, and replacing the traditional LS and RA algorithms with the NLS and NRA algorithms reduces the amount of computation as well. As a result, humming retrieval algorithms that could previously run only on a PC can be implemented on a resource-constrained embedded chip.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims

1. An embedded humming retrieval method based on a 16-bit DSP platform application, characterized by comprising:

sorting according to the matching scores, and outputting the matching result.

2. The embedded humming retrieval method based on a 16-bit DSP platform application as claimed in claim 1, characterized in that, when the step of acquiring the user's humming melody is performed, the duration of the user's humming is greater than 0.3 s.

3. The embedded humming retrieval method based on a 16-bit DSP platform application as claimed in claim 1, characterized in that, when the step of converting the PV information to NOTE format information after note segmentation and post-processing is performed, the post-processing specifically comprises: fundamental frequency path cost calculation and Boersma minimal path calculation.

4. The embedded humming retrieval method based on a 16-bit DSP platform application as claimed in claim 1, characterized in that, when the step of sorting according to the matching scores and outputting the matching result is performed, the specific operation is as follows: the melody to be retrieved is matched against each standard MIDI music template in the standard MIDI music template library and the corresponding matching score is calculated; the scores are sorted, and the serial number of the standard MIDI music template with the highest score is selected as the final output matching result.

5. The embedded humming retrieval method based on a 16-bit DSP platform application as claimed in claim 1, characterized in that, when the step of acquiring the user's humming melody is performed, the duration of the user's humming is greater than 6 s.

6. An embedded humming retrieval system based on a 16-bit DSP platform application, characterized by comprising:

7. The embedded humming retrieval system based on a 16-bit DSP platform application as claimed in claim 6, characterized in that the NLS algorithm module is used to perform linear scaling on the NOTE format information, the specific steps being:

presetting a series of scaling factors;

stretching the humming melody;

finding the minimum distance between the humming melody and the MIDI music template.

8. The embedded humming retrieval system based on a 16-bit DSP platform application as claimed in claim 6, characterized in that the specific steps of the NRA algorithm module comprise:

stretching the humming melody and aligning it with the MIDI music template;

dividing the MIDI music template in the middle into two parts of equal length, H1 and H2, and cutting the humming melody into two parts, S1 and S2, whose lengths are determined according to a predefined ratio and are f·Ts and (1-f)·Ts respectively, wherein f is a scale factor in [0, 1] and Ts is the total duration of the humming melody segment;

stretching S1 and S2 and aligning them with H1 and H2 respectively, and calculating the overall distance;

finding the optimal segmentation ratio f_best according to the minimum distance between the humming melody and the MIDI music template;

iterating on the two newly divided parts H1 and H2 according to the above steps until the iteration count is 0, the matching distance value obtained being the matching distance between the humming melody and the song template information.