CN1953046B

CN1953046B - Automatic music selection device and method based on humming

Info

Publication number: CN1953046B
Application number: CN2006101224306A
Authority: CN
Inventors: 凌若天; 罗笑南
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2006-09-26
Filing date: 2006-09-26
Publication date: 2010-09-01
Anticipated expiration: 2026-09-26
Also published as: CN1953046A

Abstract

This invention discloses an automatic music selection device and method. The device comprises an audio collection device, a server, and an output device. In the method, songs are selected by singing or by stating an attribute: the user only needs to sing the song or speak a basic attribute of it; the audio collection device collects the data and sends it to the server; the server analyzes the audio data and finds the matching music in the database; finally, the relevant results are displayed through the output device.

Description

An automatic melody selection device and method based on humming

Technical field

The present invention relates to an automatic melody selection device and method based on humming. Specifically, it relates to a device and method by which a user who remembers only the theme of a melody, but has forgotten information such as its name and singer, can select the song by humming the melody.

Background technology

1. Introduction to content-based audio retrieval

A computer can search for audio clips using text annotations based on titles or file names, for example by labeling audio data as "music" or "speech". However, because file names and text descriptions are incomplete and subjective, it is hard to find audio clips that meet a specific requirement. Content-based audio retrieval emerged to solve this problem. It analyzes audio features and assigns different semantics to different audio data, so that audio with the same semantics is perceptually similar. The simplest content-based approach compares the query clip with the stored clips sample by sample. However, audio signals are variable, different clips may be recorded at different sampling rates, and each sample may use a different number of bits, so this method works poorly. Content-based audio retrieval is therefore usually based on a set of extracted audio features, such as average amplitude and frequency distribution.

Speech recognition technology also supports content-based audio search. If the audio segment to be matched contains lyrics, speech recognition can convert the speech signal into text, which is then indexed and retrieved using information retrieval (IR) techniques. Besides the words actually spoken, other information carried in the voice, such as the speaker's identity and mood, also helps speech indexing and retrieval.

2. Introduction to the existing manual song-selection mode

At karaoke venues and similar places, customers typically select a number of backing tracks to be played as accompaniment while they sing. Songs are usually selected in one of the following ways: 1. by the number of characters in the song title; 2. by the number of characters in the singer's name; 3. by dividing songs into three categories (male, female, and chorus) and then running a secondary search by the number of characters in the song title or singer's name; 4. by classifying songs by language and then running a secondary search.

Because every search uses conditions that many songs are very likely to share, such as the number of characters in the song title or singer's name, each result set contains a large amount of irrelevant information, so a second or third search is needed. Even then the final result usually contains many melodies, and because the display terminal can show only a few results at a time, the results must be paged. Every search therefore requires the user to browse and visually screen a large number of entries before picking the desired melody.

3. Related products and patents

Many research institutions are currently studying content-based audio search.

The content-based audio matching research at NUS was carried out by Jonathan Foote. First, this research requires accumulating an audio sample library of a certain scale and automatically processing it into feature vectors. Second, every audio file in the sample library must be manually labeled, that is, each file must be assigned to a class.

The main purpose of the research in this area at the University of Mannheim, Germany, is to analyze advertisements in television programs. The researchers first roughly segment the audio track of pre-recorded television commercials using audio features such as loudness, obtaining audio files of music and ambient sound (noise). They then automatically analyze the segmented music files with a fundamental-frequency sequence extraction method, extract the corresponding fundamental-frequency time series, and use it to index the corresponding music files.

MIT, the University of Southern California, and others have also carried out audio retrieval research, covering query by humming, audio classification, structured audio representation, and speaker-based segmentation and indexing.

However, current content-based audio search technology is still immature. Building on the prior art, our system adopts an improved search algorithm to raise matching accuracy. Even so, because of the limits of current technology, the system's matching accuracy still cannot fully meet users' needs, so we combine traditional manual search with humming search to search the music database. When humming search fails to retrieve a satisfactory result, the system falls back to the traditional manual search method.

The device of the present invention addresses the cumbersome, time-consuming nature of traditional search and, at the same time, the immaturity of content-based search technology, and therefore has practical commercial value.

Summary of the invention

Prior techniques take a long time to enter search conditions and have complicated search procedures. It is also common for people to remember only the theme of a melody while forgetting information such as its name and singer; in that case, traditional search cannot fully meet their needs. To address this, the present invention proposes an automatic melody selection device and method based on humming.

The automatic melody selection device based on humming comprises an audio collecting device, a server, an output terminal, and an input terminal.

The audio collecting device captures the melody hummed by the user and the basic melody attributes dictated by the user, and sends the captured data to the server of the system.

The server stores melodies, receives the data captured by the audio collecting device, matches the data against the stored melodies, sends the matching result back to the output terminal, and analyzes the commands sent by the input terminal.

The output terminal displays the matching result from the server and the responses to the user's inputs.

The input terminal is used to enter matching conditions and to send commands to the server.

An automatic melody selection method based on humming comprises song selection by humming and song selection by dictating basic melody attributes.

The steps of song selection by humming are as follows:

1) the user sends a command to the server through the input terminal, indicating that a melody is about to be selected by humming;

2) the audio collecting device is switched on, and the user hums the melody into it;

3) the server analyzes the audio data sent by the audio collecting device and matches it against the melodies in the music database;

4) when there are one or more matching results, the server sends them to the output terminal. If there is only one result, the system waits for the user to confirm it; if there are several, the system waits for the user to select one and confirm it. If the user decides that none of the matching results is the one wanted, go to step 6); if the wanted result is among them, go to step 7);

5) when there are no matching results, the server returns a match-failure result to the output terminal, and the system automatically goes to step 6);

6) the system enters the traditional manual song-selection mode, in which the user enters conditions such as singer name, singer gender, and song title through the input terminal to screen manually;

7) the system plays the corresponding melody.
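
The decision logic of steps 4) to 7) can be sketched as follows. This is only a minimal illustration in Python, not the patented implementation: the function name select_melody and the callbacks choose and manual_search are hypothetical stand-ins for the interaction through the output terminal and input terminal and for the manual song-selection mode.

```python
def select_melody(matches, choose, manual_search):
    """Decide which melody to play, following steps 4) to 7).

    matches: list of candidate titles returned by the server (may be empty).
    choose: callback that shows the candidates and returns the confirmed
        title, or None if the user rejects all of them.
    manual_search: callback that runs the manual song-selection mode and
        returns the title found there.
    """
    if matches:
        # Step 4: one or more results, wait for the user to confirm or select.
        picked = choose(matches)
        if picked is not None:
            # Step 7: the confirmed melody is the one to play.
            return picked
    # Step 5 (no results) and a rejection in step 4 both lead to step 6.
    return manual_search()


# Hypothetical usage: the user confirms the second candidate.
title = select_melody(["Song A", "Song B"], lambda m: m[1], lambda: "Manual pick")
print(title)
```

In this sketch an empty match list and a rejected match list both fall through to manual_search, mirroring how steps 4) and 5) both lead to step 6).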

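The steps of song selection by dictating basic melody attributes are as follows:

1) the user sends a command to the server through the input terminal, indicating that a melody is about to be selected by dictating a melody attribute;

2) the audio collecting device is switched on, and the user speaks an attribute of the melody into it;

3) the server analyzes the audio data sent by the audio collecting device and matches it against the melodies in the music database;

4) when there are one or more matching results, the server sends them to the output terminal. If there is only one result, the system waits for the user to confirm it; if there are several, the system waits for the user to select one and confirm it. If the user decides that none of the matching results is the one wanted, go to step 6); if the wanted result is among them, go to step 7);

5) when there are no matching results, the server returns a match-failure result to the output terminal, and the system automatically goes to step 6);

6) the system enters the traditional manual song-selection mode, in which the user enters conditions such as singer name, singer gender, and song title through the input terminal to screen manually;

7) the system plays the corresponding melody.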
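
As an illustration of how a dictated attribute could be matched against the database, the sketch below assumes that speech recognition has already converted the user's speech to text, and it uses approximate string matching from Python's standard difflib module. The function name match_spoken_title, the cutoff value, and the sample titles are assumptions made for this example; the invention does not prescribe a particular matching technique.

```python
import difflib


def match_spoken_title(recognized, titles, limit=5, cutoff=0.5):
    """Return stored titles that approximately match the recognized text.

    recognized: text produced by a speech recognizer (assumed to exist).
    titles: song titles stored in the music database.
    limit, cutoff: illustrative tuning values, not taken from the source.
    """
    # Compare in lower case, then map the hits back to the stored spelling.
    lowered = {title.lower(): title for title in titles}
    hits = difflib.get_close_matches(
        recognized.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


# Hypothetical usage: the recognizer dropped a letter from the title.
stored = ["Hey Jude", "Yesterday", "Yesterday Once More"]
print(match_spoken_title("Yesterdy", stored))
```

Because recognition is imperfect, such a matcher can return several candidates, which corresponds to the case in which the user selects one result and confirms it.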

Compared with conventional techniques, the present invention has the following features:

1) The user only needs to hum a song, with or without lyrics. The system matches the hummed melody (which may contain lyrics) against the melodies in the music database and returns the titles of the successfully matched songs for the user to choose from. This removes the inconvenience of having to remember a song's name in order to select it, and humming also makes song selection more efficient.

2) The user can select a melody by humming only part of a song or piece of music; that is, the user does not need to hum the whole song into the microphone. Humming the climax of the song is usually enough.

3) The user can enter restrictive conditions to improve the precision of humming-based selection. Restrictive conditions can be entered by voice through the microphone or through an external keyboard. They can be some words or characters of the song title, or some characters of the singer's name.

4) The system retains manual song selection. When the user cannot select the desired song by humming, manual selection serves as a fallback.
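
Feature 3) can be illustrated by filtering the candidate melodies before the humming match is run. The sketch below is an assumption about one possible realization: the catalog layout (dictionaries with title and singer keys), the sample entries, and the function name apply_constraints are all hypothetical.

```python
def apply_constraints(catalog, title_part="", singer_part=""):
    """Keep only melodies whose title and singer contain the given fragments.

    An empty fragment places no restriction on that field.
    """
    return [
        song
        for song in catalog
        if title_part in song["title"] and singer_part in song["singer"]
    ]


# Hypothetical catalog used only for illustration.
catalog = [
    {"title": "Moon River", "singer": "Singer A"},
    {"title": "Moon Dance", "singer": "Singer B"},
    {"title": "River Song", "singer": "Singer A"},
]
candidates = apply_constraints(catalog, title_part="Moon", singer_part="Singer A")
print([song["title"] for song in candidates])
```

Narrowing the candidate set first means that the humming match only has to discriminate among the remaining melodies, which is one way a restrictive condition can improve precision.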

Description of drawings

Fig. 1 is a basic architecture diagram of the device;

Fig. 2 is a flowchart of song selection by humming;

Fig. 3 is a flowchart of song selection by dictating melody attributes;

Fig. 4 is a diagram of an implementation case of the device.

Embodiment

The present invention is further described below with reference to the accompanying drawings.

As shown in Fig. 1, an automatic melody selection device based on humming comprises an audio collecting device, a server, an output terminal, and an input terminal.

The audio collecting device is a microphone, or any device with the same function as a microphone, that can capture sound from the surrounding environment. In the system it captures the melody hummed by the user and the basic melody attributes dictated by the user. It can also transmit the captured data to the system's server, either while capturing or after capture has finished. The device has two states, on and off, and captures ambient sound only when it is on.

The server is a computer that contains the music database and on which melodies can be stored, accessed, and analyzed. The information collected by the audio collecting device is sent to the server, which analyzes the hummed passage or the dictated basic melody attribute and matches it against the melodies in the database. The matching computation is performed on the server, which sends the result back to the output terminal. The server also analyzes the commands sent by the input terminal, including: matching conditions entered by the user before song selection by humming; the user's further selection after matching has finished and the results have been delivered to the output terminal; and commands entered during fully manual screening. The server matches melodies in the following steps:

1) The captured audio of the user's humming is sent to the server.

2) The audio is pre-processed so that the computer can handle it more easily in the later steps. The mean of the audio corresponds to a DC component. The mean μ_x of the audio x(n) is estimated by the following formula:

\bar{\mu}_x = \frac{1}{N} \sum_{n=0}^{N-1} x_N(n)

where x_N(n) is a record of N points of x(n), and \bar{\mu}_x is the estimate of the true mean μ_x of x(n).

3) Audio features are extracted. The audio is a kind of random data, and it is described with the methods of random-data processing: first the mean square deviation of the audio is computed to obtain the amplitude-domain statistics; then the time-domain statistics are obtained from the autocorrelation function; finally the frequency-domain statistics are obtained from the auto-power spectral density function. Once the amplitude-domain feature A, the time-domain feature T, and the frequency-domain feature F of the audio have been obtained, the useful features of the captured audio are considered to have been extracted.

4) The features of the audio are compared with the features of the music in the database. Suppose the database contains n pieces of music. The amplitude-domain feature A_i, time-domain feature T_i, and frequency-domain feature F_i of each piece of music i (i = 1, 2, ..., n) are computed in advance. The distance D_i is computed for every i (i = 1, 2, ..., n). Let D_j denote the smallest D_i; music j is then the best match.

where:

D_{i} = \sqrt{(A - A_{i})^{2} + (T - T_{i})^{2} + (F - F_{i})^{2}}

5) The set K = {k | D_k ≤ D_j + Δ} is obtained, where Δ is a coefficient that controls the range of the matching results. The set K, that is, all successfully matched music, is returned.
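
Steps 2) to 5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The source does not say how the autocorrelation function and the power spectral density are reduced to the scalar features T and F, so the lag-1 normalized autocorrelation and the spectral centroid of a periodogram are used here purely as illustrative choices, and the pre-processing is assumed to subtract the estimated mean. All function names and the synthetic test tones are hypothetical.

```python
import cmath
import math


def remove_mean(x):
    """Step 2: estimate the mean (DC component) of x and subtract it (assumed)."""
    mu = sum(x) / len(x)
    return [v - mu for v in x]


def features(x):
    """Step 3: return scalar features (A, T, F) of a mean-removed signal x."""
    n = len(x)
    energy = sum(v * v for v in x)
    a = energy / n  # amplitude domain: mean square deviation
    # Time domain: lag-1 normalized autocorrelation (assumed scalar summary).
    r1 = sum(x[k] * x[k + 1] for k in range(n - 1))
    t = r1 / energy if energy else 0.0
    # Frequency domain: centroid of a periodogram from a direct DFT (assumed).
    power = []
    for k in range(n // 2 + 1):
        s = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        power.append(abs(s) ** 2 / n)
    total = sum(power)
    f = sum(k / n * p for k, p in enumerate(power)) / total if total else 0.0
    return a, t, f


def distance(q, c):
    """Step 4: Euclidean distance D_i between two (A, T, F) triples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(q, c)))


def match(query, database, delta):
    """Step 5: return every name k with D_k <= D_j + delta, best match first."""
    dists = {name: distance(query, feats) for name, feats in database.items()}
    if not dists:
        return []
    d_min = min(dists.values())
    hits = [name for name, d in dists.items() if d <= d_min + delta]
    return sorted(hits, key=dists.get)


def tone(cycles, n=64, offset=0.0):
    """Synthetic test signal: a sine with a whole number of cycles plus DC."""
    return [offset + math.sin(2 * math.pi * cycles * m / n) for m in range(n)]


# Hypothetical database of two "melodies" and a query carrying a DC offset.
database = {name: features(remove_mean(tone(c))) for name, c in (("low", 2), ("high", 10))}
query = features(remove_mean(tone(2, offset=0.5)))
print(match(query, database, delta=0.0))
```

The three features in this sketch live on different scales, so a practical system would probably normalize them before computing D_i; the source does not address this point. A larger Δ widens the returned set, which trades precision for a better chance that the wanted melody is among the results.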

The output terminal can be any device with a display function, such as an ordinary monitor or a projector. It displays the results returned by the server and the responses to the user's inputs.

The input terminal can be any dedicated or general-purpose device with an input function, such as a keyboard or a touch screen. The user uses it to enter matching conditions and to send commands to the server.

An automatic melody selection method based on humming comprises song selection by humming and song selection by dictating basic melody attributes.

Fig. 2 shows the steps of song selection by humming.

Fig. 3 shows the steps of song selection by dictating basic melody attributes.

Fig. 4 illustrates an implementation of our automatic song-selection system. The implementation case diagram differs from the system structure diagram: the structure diagram only outlines the components of the system, whereas in a real system one server should handle requests from many sets of automatic melody selection equipment at the same time and return the results to each of them.

Suppose a user comes to one of our melody selection stations and wants to select a melody. He has forgotten the name of the melody and the name of the original singer, so he cannot get a result through a traditional manual query. He still remembers the theme of the song, however, so he hums the part he remembers into the microphone. Because the pitch and tempo of his humming are not very accurate, the system returns several matches, which are shown on the display. Prompted by the results, he recalls the melody's name and selects the melody he wants from them. He then selects the next melody, again by humming. This time he first enters a filter condition on the original singer, and then hums. Because the humming is too inaccurate, the match fails, and the system prompts him to enter manual song-selection mode, where he searches by melody name and original singer. The user then selects a third melody. This time he remembers the full name of the melody, so he selects the song by speaking the name into the microphone. The system matches his speech against the song names in the database. Again, because of various kinds of interference, the match cannot be fully accurate, so the system returns several successfully matched melody names and waits for the user to choose. The user selects the melody he wants from the returned results with the keyboard.

Through this process, the user has selected three melodies to play using the system.

Claims

1. An automatic melody selection device based on humming, comprising:

an audio collecting device, which, when switched on, captures the melody hummed by the user and the basic melody attributes dictated by the user, and sends the captured data to the server of the system;

a server, which stores melodies, receives the data captured by the audio collecting device, matches the data against the stored melodies, sends the matching result back to an output terminal, and analyzes the commands sent by an input terminal;

the output terminal, which displays the matching result from the server and the responses to the user's inputs; and

the input terminal, which is used to enter matching conditions and to send commands to the server.
2. The automatic melody selection device based on humming according to claim 1, characterized in that the audio collecting device has two states, on and off, and captures ambient sound only when it is on.
3. The automatic melody selection device based on humming according to claim 1, characterized in that the audio collecting device is a microphone or a device with the same function as a microphone.
4. The automatic melody selection device based on humming according to claim 1, characterized in that the server is a computer that contains a music database and can store, access, and analyze melodies.
5. The automatic melody selection device based on humming according to claim 1, characterized in that the output terminal is a device with a display function.
6. The automatic melody selection device based on humming according to claim 1, characterized in that the input terminal is a device with an input function.
7. An automatic melody selection method based on humming, characterized in that it comprises song selection by humming and song selection by dictating basic melody attributes, wherein the steps of song selection by humming are as follows:

1) the user sends a command to the server through the input terminal, indicating that a melody is about to be selected by humming;

2) the audio collecting device is switched on, and the user hums the melody into it;

3) the server analyzes the audio data sent by the audio collecting device and matches it against the melodies in the music database;

4) when there are one or more matching results, the server sends them to the output terminal. If there is only one result, the system waits for the user to confirm it; if there are several, the system waits for the user to select one and confirm it. If the user decides that none of the matching results is the one wanted, go to step 6); if the wanted result is among them, go to step 7);

5) when there are no matching results, the server returns a match-failure result to the output terminal, and the system automatically goes to step 6);

6) the system enters the traditional manual song-selection mode, in which the user enters conditions such as singer name, singer gender, and song title through the input terminal to screen manually;

7) the system plays the corresponding melody.
8. The automatic melody selection method based on humming according to claim 7, characterized in that the steps of song selection by dictating basic melody attributes are as follows:

1) the user sends a command to the server through the input terminal, indicating that a melody is about to be selected by dictating a melody attribute;

2) the audio collecting device is switched on, and the user speaks an attribute of the melody into it;

3) the server analyzes the audio data sent by the audio collecting device and matches it against the melodies in the music database;

4) when there are one or more matching results, the server sends them to the output terminal. If there is only one result, the system waits for the user to confirm it; if there are several, the system waits for the user to select one and confirm it. If the user decides that none of the matching results is the one wanted, go to step 6); if the wanted result is among them, go to step 7);

5) when there are no matching results, the server returns a match-failure result to the output terminal, and the system automatically goes to step 6);

6) the system enters the traditional manual song-selection mode, in which the user enters conditions such as singer name, singer gender, and song title through the input terminal to screen manually;

7) the system plays the corresponding melody.