CN201397672Y - Music learning system - Google Patents

Info

Publication number
CN201397672Y
Authority
CN
China
Prior art keywords
information
characteristic information
characteristic
similarity
media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009201065794U
Other languages
Chinese (zh)
Inventor
须清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Paragon Technology Co Ltd
Original Assignee
Beijing Paragon Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Paragon Technology Co Ltd
Priority to CN2009201065794U

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The utility model provides a music learning system comprising an information store that holds the information of at least one piece of music. The system is characterized in that it further comprises a sound input part, a feature extraction part, a media information selection part, a media information feature extraction part, a feature similarity calculation part, and an information cue part, wherein: the feature extraction part extracts first feature information from the sound signal or information supplied by the sound input part; the media information selection part selects the information of the music to be learned; the media information feature extraction part calculates second feature information of the selected music information; the feature similarity calculation part calculates and judges the similarity between the first feature information and the second feature information of the selected music information; and the information cue part outputs the differences between the input sound and the selected music information according to that similarity. The utility model can further report the differences between the input sound and the selected music information for the whole piece, for partial segments and/or for single syllables, so that a learner can efficiently correct mistakes according to the differences between the music as played or sung and the original music.

Description

Music learning system
Technical field
The utility model relates to a music learning system, and in particular to a music learning system that identifies differences while a piece of music is being played or sung.
Background art
Multimedia players such as MP3 players, MP4 players, portable terminals and computers are now very common. These devices usually have a large storage capacity and hold many items of multimedia information. To select an item for playback, the multimedia information is usually first classified according to some rule, and the operator then picks the item through menus in the user interface. When there is a great deal of content, the menu hierarchy becomes deep and choosing the desired item becomes difficult. Moreover, the menu options usually show only the title of each item. With a large collection, people often cannot tell from the title alone whether an item is the one they want, so they frequently select an item, listen to or preview it, find that it is not what they wanted, and have to choose again.
With the growth of the internet, the amount of multimedia content on the network is larger still, and finding the desired content is not easy. The search is especially difficult when people cannot clearly remember the title of the content.
In recent years there has been much research on speech recognition and on operating electronic devices by voice, and some of it has been commercialized on mobile terminals, for example for placing calls by voice. U.S. Patents No. 4,277,644 and No. 6,101,467 cover various aspects of speech recognition software. Methods for characterizing audio content have also been described. In particular, U.S. Patents No. 6,054,646 and No. 6,173,250 cover methods for characterizing music by features such as beat, energy and pitch.
Despite the recent progress in speech recognition, audio signal analysis and the characterization of musical features, and although voice control has been implemented on some electronic devices, these applications often fail to meet people's needs. A common situation is that a person has difficulty selecting a favorite item on a multimedia player: they can hum a fragment or a phrase of the melody, or only an approximation of one bar of it, but cannot remember the title, and therefore cannot find the desired content effectively.
Chinese invention patent application publication CN1639975A, published on July 13, 2005, mentions selecting the content of a desired signal source by means of speech features extracted from the signal source. In particular, it discloses a watchdog function (Watch Dog): the user can sing or hum a pattern into the audio analyzer of an audio recorder-player, which can then monitor different channels for that specific tone; the user can also input spoken words into the audio recorder-player through speech recognition software, and the recorder-player can then monitor different channels for some or all dialogue and monologue containing these words. An advanced matching algorithm is used, that is, one that declares a match when a phrase occurs two or three times within a predetermined number of seconds. When a match occurs, a control event can be generated to control channel switching.
However, the technique described above has shortcomings when applied to a multimedia player with a large capacity. Users of multimedia players are not all professionals, so the fragment of content or melody that they sing or hum is often not a standard rendition: the key of the melody may differ, or the beat may differ, yet what they hum or sing still has a certain similarity to the content they want to select. For example, a melody may be in the key of C, and the recorded multimedia information is also in C, but a person may hum or sing it in F, C sharp or C flat; the rhythm of the melody is essentially the same, and a listener can tell that it is the same melody. Or a melody may be in 2/4 time while the hummed or sung version is in 4/4 time; again the rhythm is essentially the same and a listener can tell that it is the same melody. The prior art has no good solution for this case.
In addition, users of a media player sometimes want an item to start playing from a particular point. The prior art normally relies on fast-forward and rewind buttons, but with these the operator can only estimate the position, which is usually inaccurate and takes several attempts. Existing digital media recording formats offer directory-style menus for selecting an item to play, but they still do not let the user quickly set the playback starting point as desired.
Furthermore, music copyright is receiving more and more attention, and cases of musical plagiarism are reported from time to time. To exploit loopholes in the relevant legal provisions, some plagiarists slightly adjust the key or beat of a melody so that it differs from the original song in form while its substance remains similar. The prior art proposes no method for determining when such similarity amounts to plagiarism.
Summary of the invention
The technical problem to be solved by the utility model is how to select the required multimedia information more effectively from a media store or from the internet, and how to control the starting point of media playback effectively and freely. The utility model uses speech feature extraction, segment extraction, similarity calculation and similarity judgment so that required multimedia information can be obtained automatically by voice control of an electronic device or of a network operation. The same technique can also be used for the automatic judgment of melody plagiarism or similarity, and for assisted learning of a melody.
Explanation of terms: the speech feature referred to here is characteristic information related to the rhythm of the input speech, and the rhythm is based on each recognizable syllable. That is, a section of multimedia contains many syllables, and the speech features are extracted syllable by syllable. The features of the syllables, combined in order, make up the overall rhythm or melody of that section of multimedia information, so any segment can be cut out of the extracted feature sequence and used as the basis for feature comparison in the utility model. When a section of speech input contains several melodies, either only the features of the theme or the features of all the melodies may be extracted. "Speech feature" and "characteristic information" have the same meaning in this document.
Explanation of terms: in the utility model, "media information" has the same meaning as "multimedia information". Both refer to speech information, music information, video information or data information that contains sound, or any combination of these.
Explanation of terms: in the utility model, "similarity" means a value expressing the correlation between two pieces of information, obtained with a correlation algorithm. The correlation algorithm may be a linear or a non-linear correlation method. Many mathematical models and calculation methods for linear and non-linear correlation are available in mathematics and experimental physics, and they are cited here as prior art related to the utility model.
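As an illustration of one linear correlation method, the following is a minimal sketch that uses the Pearson correlation coefficient as the similarity value. The text does not name a specific formula, so the choice of Pearson correlation, the function name `linear_similarity` and the handling of constant sequences are all assumptions.

```python
import math


def linear_similarity(a, b):
    """Pearson correlation of two equal-length numeric sequences.

    Assumption: the patent text only says "linear correlation"; Pearson's
    coefficient is used here as one common instance of it.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A constant sequence has no contour to compare; treat as unrelated.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# The second sequence is the first one shifted up by one step.
print(linear_similarity([1, 2, 3, 5], [2, 3, 4, 6]))
```

Because the coefficient ignores a constant offset between the two sequences, a sequence and a uniformly shifted copy of it score close to 1.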
To solve the above problems, the following technical solutions are proposed:
1. A first scheme of a multimedia playing apparatus, comprising a storage medium storing at least one piece of multimedia information and a multimedia signal output component, characterized in that it further comprises:
A sound input component, which may take sound through a sound sensor or take a previously prepared audio file as input information;
A feature extraction component, which extracts first characteristic information from the sound signal or information supplied by the sound input component;
A media information feature storage component, which stores second characteristic information corresponding to each piece of multimedia information;
A feature similarity calculation component, which calculates the similarity between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;
A feature similarity judgment component, which selects the maximum similarity from the similarity data;
A multimedia information selection component, which selects from the storage medium the piece of multimedia information containing the segment with the maximum similarity and sends it to the multimedia signal output component.
2. A second scheme of a multimedia playing apparatus, comprising a storage medium storing at least one piece of multimedia information and a multimedia signal output component, characterized in that it further comprises:
A sound input component, which may take sound through a sound sensor or take a previously prepared audio file as input information;
A feature extraction component, which extracts first characteristic information from the sound signal or information supplied by the sound input component;
A media information feature calculation component, which calculates second characteristic information corresponding to each piece of multimedia information;
A feature similarity calculation component, which calculates the similarity between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;
A feature similarity judgment component, which selects the maximum similarity from the similarity data;
A multimedia information selection component, which selects from the storage medium the piece of multimedia information containing the segment with the maximum similarity and sends it to the multimedia signal output component.
3. As to how the first and second characteristic information are extracted, take a song that everyone knows as an example. The characteristic information of the theme of the song can be extracted and represented, for example, in numbered musical notation or staff notation; numbered musical notation includes information on beat and key. This theme characteristic information can serve as the second characteristic information of the utility model. When different people sing or hum the song, their beat and/or key may differ from the beat and key defined by the song itself, and may also differ from the beat and key of the second characteristic information of the segment recorded in the multimedia information. But as long as they are all singing the same song, their themes are very similar. Therefore the similarity with the first characteristic information is calculated after the beat and/or key of the second characteristic information has been adjusted. The melody may also be represented in staff notation or some other notation.
In the processing of musical multimedia information, one music media format is the score file, which stores sound as data representing notes, instruments and sharpness information; the most popular such data format is MIDI. A MIDI file contains the specification of how the sound is to be reproduced and can be regarded as a score in electronically readable form. It contains information about the channels to be used, the devices used and the acoustic parameters needed when replaying the score represented by the data stored in the MIDI file. The collective term "acoustic parameters" denotes, for example, descriptions that define the pitch, the note or rest value, the response level, the sound velocity, the timbre, or special effects such as vibrato or reverberation.
A MIDI file therefore contains the second characteristic information required by the utility model, and the MIDI file corresponding to each piece of multimedia information can be used as the second characteristic information. Correspondingly, the first characteristic information is extracted in the same way: the MIDI file derived from the input speech is used as the first characteristic information. Alternatively, the MIDI file corresponding to each piece of multimedia information may be used as the second characteristic information after one or more features such as instrument, response level, timbre, vibrato and reverberation have been removed from its data; correspondingly, the MIDI file derived from the input speech is used as the first characteristic information after one or more features such as sound velocity, instrument, response level, timbre, vibrato and reverberation have been removed.
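The idea of discarding instrument, loudness and effect data and keeping only the melodic content can be sketched as follows. This is not a MIDI parser: `NoteEvent` is a hypothetical in-memory record standing in for already-decoded note data, and its field names (`pitch`, `duration`, `velocity`, `instrument`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEvent:
    """Hypothetical decoded note; not a real MIDI library type."""
    pitch: int         # note number
    duration: float    # length in beats
    velocity: int      # loudness-related value, dropped below
    instrument: int    # program number, dropped below


def melodic_feature(events, keep_duration=True):
    """Reduce note events to the data used for comparison.

    With keep_duration=False only the pitch sequence is kept, which
    corresponds to characteristic information without beat length.
    """
    if keep_duration:
        return [(e.pitch, e.duration) for e in events]
    return [e.pitch for e in events]


events = [NoteEvent(60, 1.0, 90, 0), NoteEvent(62, 0.5, 40, 24)]
print(melodic_feature(events))
print(melodic_feature(events, keep_duration=False))
```

Two recordings of the same tune on different instruments and at different loudness would reduce to the same feature under this sketch.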
U.S. Patent No. 6,054,646 describes methods for extracting feature signals from a sound signal, including Mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It also describes a parameter mapping that converts MFCC features into a MIDI file. The utility model incorporates the content of U.S. Patent No. 6,054,646 here by reference in its entirety. In addition, software that converts a captured sound waveform file (WAVE) into a MIDI file is easy to find on the internet, as is software that converts a MIDI file into numbered musical notation or into staff notation. The utility model builds on this existing knowledge; its content is judging the correlation between the input sound information and the stored multimedia information. One implementation can be described as follows:
For the input sound signal, MFCC coefficients are extracted, a MIDI file is generated from the MFCC coefficients, the MIDI file is converted into a numbered musical notation file, and that file is used as the first characteristic information. For the stored multimedia information, the same method is used: MFCC coefficients are extracted, a MIDI file is generated from them, the MIDI file is converted into a numbered musical notation file, and that file is used as the second characteristic information. The similarity between the first and second characteristic information is then calculated, and the functions intended by the utility model can be realized from the result. Depending on the application, the first and second characteristic information may be transformed further. For example, the second characteristic information may also include a set of numbered musical notation files in various major keys generated from the numbered musical notation file of the multimedia information: if the original file is in C major, files in D major, E major, G major and so on can be generated as part of the second characteristic information. As another example, the second characteristic information may also include a set of numbered musical notation files in various time signatures generated from the numbered musical notation file of the multimedia information: if the original file is in 2/4 time, files in 4/4 time, 6/8 time and so on can be generated as part of the second characteristic information. As a further example, each tone in the numbered musical notation file can be represented by a number and adjacent identical tones merged into one before the similarity is calculated, which removes similarity differences caused by the input sound being out of tune or having a different beat.
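The last transformation, merging adjacent identical tones into one, can be sketched in a few lines. The function name `collapse_repeats` and the representation of the tune as a plain list of numbers are assumptions for illustration.

```python
from itertools import groupby


def collapse_repeats(tones):
    """Merge each run of adjacent identical tones into a single tone."""
    return [tone for tone, _run in groupby(tones)]


# The same three-tone contour held for different numbers of beats.
stored = [5, 5, 3, 3, 3, 1]
hummed = [5, 3, 1, 1]
print(collapse_repeats(stored))  # [5, 3, 1]
print(collapse_repeats(hummed))  # [5, 3, 1]
```

After collapsing, the two sequences have the same length and can be compared directly, at the cost of losing genuinely repeated notes in the melody.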
In an optional implementation, the first and second characteristic information may simply be the MFCC or LPC coefficients, and the similarity is calculated directly on those coefficients; or they may simply be the MIDI files, and the similarity is calculated directly on the MIDI files.
4. The first characteristic information comprises the tone information and/or tone variation information of the sound; the second characteristic information comprises the tone information and/or tone variation information contained in the multimedia information.
5. Alternatively, the first characteristic information comprises the pitch information and/or pitch variation information of the sound; the second characteristic information comprises the pitch information and/or pitch variation information contained in the multimedia information.
6. A first scheme of a multimedia information processing method, which selects the required multimedia information from a storage medium storing at least one piece of multimedia information and the second characteristic information corresponding to each piece, characterized in that it comprises the following steps:
Step 1: input a sound signal or information through a sound input component;
Step 2: extract first characteristic information from the sound signal or information supplied by the sound input component;
Step 3: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;
Step 4: select the maximum similarity from the similarity data;
Step 5: select from the storage medium the second characteristic information to which the segment with the maximum similarity belongs;
Step 6: retrieve the corresponding piece of multimedia information from the storage medium according to that second characteristic information.
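The similarity search in the steps above can be sketched as a sliding-window comparison. This is a minimal sketch under several assumptions: characteristic information is a list of numbers, the similarity is the Pearson correlation, and the names `pearson`, `best_match` and the `library` dictionary are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_match(query, library):
    """Return (title, offset, score) of the most similar stored segment.

    `library` maps a title to its second characteristic information.
    Every window of len(query) values in every item is compared.
    """
    best = (None, -1, -2.0)  # score below any possible correlation
    n = len(query)
    for title, feature in library.items():
        for start in range(len(feature) - n + 1):
            score = pearson(query, feature[start:start + n])
            if score > best[2]:
                best = (title, start, score)
    return best


library = {
    "a": [1, 2, 3, 5, 5, 3, 2, 1],
    "b": [5, 5, 6, 5, 8, 7, 5, 5],
}
title, offset, score = best_match([6, 6, 4, 3], library)
print(title, offset, round(score, 3))
```

The returned offset identifies where in the item the matching segment lies, which is the kind of information that could also serve as a playback starting point.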
7. The method further comprises the step of outputting the corresponding piece of multimedia information.
8. The method further comprises the step of inputting multimedia information into the storage medium, either from other media or by downloading it into the storage medium over a wired or wireless network connection.
9. Further, the method also comprises the step of calculating the corresponding second characteristic information for the input multimedia information and storing it in the storage medium.
10. Alternatively, the method further comprises the step of directly inputting the multimedia information together with its corresponding second characteristic information into the storage medium.
11. The length of any segment of the second characteristic information is the same as the length of the first characteristic information, or is the same as the length of the first characteristic information after beat adjustment and/or key adjustment.
12. The second characteristic information and the first characteristic information are the rhythm or melody information of music.
13. Alternatively, the second characteristic information and the first characteristic information are rhythm or melody information from which the beat length has been removed.
14. The calculation method in step 3 is a linear correlation method. In one implementation, the first and second characteristic information are based on numbered musical notation, because numbered musical notation can usually express notes over three octaves, together with the beat, completely. Notes are written with the digits 1 to 7, with marks for the high and low octaves, and rests are usually written as 0. They can be converted into the characteristic information of the utility model as follows: the high octave (third octave) is represented by the seven numbers 8 to 14, the low octave (first octave) by the seven numbers -7 to -1, the middle octave (second octave) by the seven numbers 1 to 7, and a rest by 0. In this implementation the characteristic information is thus converted into numerical information, with one number per beat. With a linear correlation method the similarity between the first and second characteristic information is easy to calculate. Even if the pitch or key of the first characteristic information differs from that of the second, when the two are similar the pitch of every beat changes correspondingly. For example, the second characteristic information may be in the key of C while the first characteristic information is in the key of B: the number for each beat shifts according to the key, so although the individual numbers differ, the calculated similarity is still very high. The mathematics of linear correlation is well known and is not repeated here. Sometimes the beat expressed by the first characteristic information of the input speech also differs from the beat of the second characteristic information of the multimedia information, for example when the second characteristic information is in 2/4 time and the first is in 4/4 time, although the themes they express may be similar. The beat of the first and/or second characteristic information therefore needs to be adjusted before the similarity is calculated. The first adjustment method expands the data of one beat into several beats with the same data: if the data of a beat is 5, it can be adjusted into two beats, each with the value 5. The second adjustment method reduces two consecutive beats with identical data to one beat: if two consecutive beats both have the value 5, they can be adjusted into a single beat with the value 5.
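The numeric encoding and the two beat adjustments can be sketched as follows. The function names and the `octave` argument (-1 low, 0 middle, 1 high) are assumptions, and the high octave is taken as 8 to 14 so that each octave has seven values.

```python
def encode_note(degree, octave=0):
    """Map a numbered-notation note to one number per beat.

    degree: 0 for a rest, otherwise 1..7.
    octave: -1 low, 0 middle, 1 high (hypothetical convention).
    """
    if degree == 0:
        return 0
    if not 1 <= degree <= 7:
        raise ValueError("degree must be 0 (rest) or 1..7")
    if octave == -1:
        return degree - 8   # 1..7 -> -7..-1
    if octave == 0:
        return degree       # 1..7 -> 1..7
    if octave == 1:
        return degree + 7   # 1..7 -> 8..14
    raise ValueError("octave must be -1, 0 or 1")


def expand_beats(seq):
    """First adjustment: turn every beat into two beats with the same data."""
    return [value for value in seq for _ in range(2)]


def reduce_beats(seq):
    """Second adjustment: merge pairs of consecutive identical beats."""
    out, i = [], 0
    while i < len(seq):
        out.append(seq[i])
        # Skip the partner beat only when it carries the same data.
        i += 2 if i + 1 < len(seq) and seq[i + 1] == seq[i] else 1
    return out


print(expand_beats([5, 3]))        # [5, 5, 3, 3]
print(reduce_beats([5, 5, 3, 3]))  # [5, 3]
```

Applying `expand_beats` to a 2/4 sequence, or `reduce_beats` to a 4/4 sequence, brings the two to a comparable length before the correlation is computed.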
15. A second scheme of a multimedia information processing method, which selects the required multimedia information from a storage medium storing at least one piece of multimedia information, characterized in that it comprises the following steps:
Step 1: input a sound signal or information through a sound input component;
Step 2: extract first characteristic information from the sound signal supplied by the sound input component;
Step 3: calculate the second characteristic information corresponding to each piece of multimedia information;
Step 4: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;
Step 5: select the maximum similarity from the similarity data;
Step 6: retrieve the corresponding piece of multimedia information according to the second characteristic information to which the segment with the maximum similarity belongs.
The difference between the first and second schemes is whether the second characteristic information of each piece of multimedia information is stored in the memory in advance or is calculated only when the application needs it.
16. A first scheme of a method of operating a multimedia information player, which selects the required multimedia information for playback from a storage medium storing at least one piece of multimedia information and the second characteristic information corresponding to each piece, characterized in that it comprises the following steps:
Step 1: input a sound signal or information through a sound input component;
Step 2: extract first characteristic information from the sound signal or information supplied by the sound input component;
Step 3: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;
Step 4: select the maximum similarity from the similarity data;
Step 5: select from the storage medium the second characteristic information to which the segment with the maximum similarity belongs;
Step 6: retrieve the corresponding piece of multimedia information from the storage medium according to that second characteristic information and play it.
The second characteristic information corresponding to each piece of multimedia information may be a MIDI file, or a subset of the elements of a MIDI file.
17. A second scheme of a method of operating a multimedia information player, which selects the required multimedia information from a storage medium storing at least one piece of multimedia information, characterized in that it comprises the following steps:
Step 1: input a sound signal or information through a sound input component;
Step 2: extract first characteristic information from the sound signal or information supplied by the sound input component;
Step 3: calculate the second characteristic information corresponding to each piece of multimedia information;
Step 4: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;
Step 5: select the maximum similarity from the similarity data;
Step 6: retrieve the corresponding piece of multimedia information according to the second characteristic information to which the segment with the maximum similarity belongs, and play it.
The technique of the utility model can also be used to judge the similarity of two songs, which is of considerable use in judging whether a piece of music has been plagiarized.
18. A music similarity judgment method for judging the similarity between a first piece of music and a second piece of music, characterized by comprising the following steps:
Step 1: input the first characteristic information of the multimedia information of the first piece of music, or input the multimedia information of the first piece of music and then extract the first characteristic information from it;
Step 2: decompose the first characteristic information into a plurality of information segments of a certain length, each beginning at an arbitrary starting point;
Step 3: input the second characteristic information of the multimedia information of the second piece of music, or input the multimedia information of the second piece of music and then extract the second characteristic information from it;
Step 4: calculate similarity data between any one of the plurality of information segments and any segment of the second characteristic information;
Step 5: choose the maximum similarity value from the similarity data;
Step 6: judge whether the maximum similarity value exceeds a set threshold; if it does, the first and second pieces of music are judged to be highly similar; otherwise their similarity is judged to be low.
For the plurality of information segments of a certain length described above, the length can be tied to the definition in the relevant legal documents; for example, if it is stipulated that similarity over 7 consecutive beats is identified as plagiarism, the length can be set to 7 beats.
The set threshold is determined by how strictly the relevant law is enforced. If only strict similarity counts as plagiarism, the threshold is set very high, close to 1; when enforcement is less strict, the threshold can be lowered accordingly, for example to 0.8 or 0.9.
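The six steps of this judgment method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not fix a similarity measure, so the fraction of matching positions is assumed here, characteristic information is assumed to be a list of per-beat values, and all function names are hypothetical.

```python
def segment_similarity(a, b):
    """Fraction of positions at which two equal-length segments agree (assumed metric)."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def windows(features, length):
    """All segments of the given length, one per starting point (steps 2 and 4)."""
    return [features[i:i + length] for i in range(len(features) - length + 1)]


def max_similarity(first, second, length):
    """Largest similarity between any segment of `first` and any segment of `second` (step 5)."""
    best = 0.0
    for seg_a in windows(first, length):
        for seg_b in windows(second, length):
            best = max(best, segment_similarity(seg_a, seg_b))
    return best


def is_highly_similar(first, second, length=7, threshold=0.9):
    """Step 6: the pieces count as highly similar when the maximum exceeds the threshold."""
    return max_similarity(first, second, length) > threshold


# Two pieces that share seven consecutive beats, and one unrelated piece.
first = [60, 62, 64, 65, 67, 69, 71, 72, 74]
second = [50, 60, 62, 64, 65, 67, 69, 71, 40]
unrelated = [1, 2, 3, 4, 5, 6, 7, 8]
print(is_highly_similar(first, second))     # True
print(is_highly_similar(first, unrelated))  # False
```

The defaults of 7 beats and 0.9 simply mirror the examples given in the text; both would be set according to the applicable rules.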
19. A method for judging the similarity of music on the Internet, characterized by comprising the following steps:
Step 1: input the first characteristic information of the multimedia information of the first piece of music, or input the multimedia information of the first piece of music and then extract the first characteristic information from it;
Step 2: decompose the first characteristic information into a plurality of information segments of a certain length, each beginning at an arbitrary starting point;
Step 3: download from the Internet the second characteristic information of the multimedia information of the second piece of music, or download the multimedia information of the second piece of music from the Internet and then extract the second characteristic information from it;
Step 4: calculate similarity data between any one of the plurality of information segments and any segment of the second characteristic information;
Step 5: choose the maximum similarity value from the similarity data;
Step 6: judge whether the maximum similarity value exceeds a set threshold; if it does, the first and second pieces of music are judged to be highly similar; otherwise their similarity is judged to be low.
For the plurality of information segments of a certain length described above, the length can be tied to the definition in the relevant legal documents; for example, if it is stipulated that similarity over 7 consecutive beats is identified as plagiarism, the length can be set to 7 beats.
The set threshold is determined by how strictly the relevant law is enforced. If only strict similarity counts as plagiarism, the threshold is set very high, close to 1; when enforcement is less strict, the threshold can be lowered accordingly, for example to 0.8 or 0.9.
The technology of the present invention can also be used for media information search on the Internet, providing a more effective search system and search method.
20. A first scheme of a network search system, comprising a remote server component and a near-end component connected to each other through the Internet or a LAN, characterized in that:
The near-end component comprises:
Sound input component;
Characteristic extraction component, which extracts first characteristic information from the sound signal or information input through the sound input component;
Information sending component, which transmits the first characteristic information to the remote server component over the network;
First information receiving component, which receives the multimedia information sent by the remote server component;
The remote server component comprises:
Second information receiving component, which receives the first characteristic information sent from the near-end component;
Media information storage component, which stores at least one item of multimedia information, the second characteristic information corresponding to each item (either calculated and then stored, or stored in advance), and the correspondence between each item and its second characteristic information;
Characteristic similarity calculation component, which judges the similarity between the first characteristic information and any segment of the second characteristic information of each item of multimedia information;
Characteristic similarity judgment component, which chooses from the similarity data either the maximum similarity value or a plurality of similarity values that exceed a set threshold;
Multimedia information selection component, which selects from the media information storage component the one or more items of multimedia information corresponding to the second characteristic information containing the information segment with the maximum similarity value, or the segments whose similarity exceeds the set threshold, and sends them to the near-end component.
21. A multimedia information search method realized with the network search system of the first scheme, characterized by comprising the following operation steps:
Step 1: input a sound signal or information at the near-end component;
Step 2: the near-end component extracts the first characteristic information of the sound signal or information;
Step 3: the first characteristic information is sent to the remote server component through the Internet or a LAN;
Step 4: the remote server component calculates the similarity between the first characteristic information and the second characteristic information of each item of media information stored in the remote server component;
Step 5: according to the second characteristic information corresponding to the maximum similarity value, or to the similarity values exceeding the set threshold, the remote server component retrieves the corresponding multimedia information from its own storage as the selected multimedia information;
Step 6: the remote server component sends the selected multimedia information to the near-end component through the Internet or a LAN.
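The division of work in the first scheme, where the near-end component extracts characteristics and the server does the matching, can be sketched as follows. This is only an illustration under assumptions: an ordinary method call stands in for the Internet or LAN transfer, the similarity measure (fraction of matching positions) is assumed, and the class names `RemoteServer` and `NearEnd` and the `extract` parameter are hypothetical.

```python
def best_score(first, second):
    """Highest fraction of matching positions over all windows of `second`."""
    n = len(first)
    if n == 0 or n > len(second):
        return 0.0
    return max(
        sum(x == y for x, y in zip(first, second[i:i + n])) / n
        for i in range(len(second) - n + 1)
    )


class RemoteServer:
    """Holds (second characteristic information, media) pairs and performs steps 4 to 6."""

    def __init__(self, library):
        self.library = library

    def search(self, first, threshold=None):
        scored = [(best_score(first, second), media) for second, media in self.library]
        if not scored:
            return []
        if threshold is None:
            # Return only the item with the maximum similarity.
            return [max(scored, key=lambda item: item[0])[1]]
        # Otherwise return every item whose similarity exceeds the threshold.
        return [media for score, media in scored if score > threshold]


class NearEnd:
    """Performs steps 1 to 3; `extract` is a stand-in for real characteristic extraction."""

    def __init__(self, server, extract):
        self.server = server
        self.extract = extract

    def query(self, sound):
        first = self.extract(sound)        # step 2
        return self.server.search(first)   # step 3, then steps 4 to 6 on the server


server = RemoteServer([([1, 2, 3, 4, 5, 6], "song A"), ([9, 9, 9, 9, 9, 9], "song B")])
client = NearEnd(server, extract=list)
print(client.query([3, 4, 5]))  # ['song A']
```

Only the short first characteristic information travels to the server in this arrangement, which keeps the upload small compared with sending raw audio.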
22. A second scheme of a network search system, comprising a remote server component and a near-end component connected to each other through the Internet or a LAN, characterized in that:
The near-end component comprises:
Sound input component;
Characteristic extraction component, which extracts first characteristic information from the sound signal or information input through the sound input component;
Download component, which downloads the second characteristic information of each item of multimedia information from the remote server component;
Near-end storage component, which stores the second characteristic information of each item of multimedia information obtained by the download component;
Characteristic similarity calculation component, which judges the similarity between the first characteristic information and any segment of the second characteristic information of each item of multimedia information;
Characteristic similarity judgment component, which chooses from the similarity data either the maximum similarity value or a plurality of similarity values that exceed a set threshold;
Selection component, which takes from the near-end storage component the second characteristic information corresponding to the maximum similarity value or to the similarity values exceeding the set threshold;
Information sending component, which transmits the second characteristic information chosen by the selection component to the remote server component over the network;
First information receiving component, which receives the multimedia information sent by the remote server component;
The remote server component comprises:
Second information receiving component, which receives the second characteristic information sent from the near-end component;
Media information storage component, which stores at least one item of multimedia information, the second characteristic information corresponding to each item (either calculated and then stored, or stored in advance), and the correspondence between each item and its second characteristic information;
Multimedia information selection component, which selects from the media information storage component the one or more items of multimedia information corresponding to the second characteristic information received by the second information receiving component and sends them to the near-end component.
23. A multimedia information search method realized with the network search system of the second scheme, characterized by comprising the following operation steps:
Step 1: the near-end component downloads the second characteristic information of each item of multimedia information from the remote server component through the Internet or a LAN;
Step 2: input a sound signal or information at the near-end component;
Step 3: the near-end component extracts the first characteristic information of the sound signal or information;
Step 4: the near-end component calculates the similarity between the first characteristic information and the second characteristic information of each item of media information;
Step 5: the second characteristic information corresponding to the maximum similarity value, or to the similarity values exceeding the set threshold, is sent to the remote server component through the Internet or a LAN;
Step 6: the remote server component retrieves from its own storage the multimedia information corresponding to the received second characteristic information as the selected multimedia information;
Step 7: the remote server component sends the selected multimedia information to the near-end component through the Internet or a LAN.
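In the second scheme the matching moves to the near-end component, and the server only looks up media by the second characteristic information it receives. A minimal sketch of that flow follows, again with a plain method call standing in for the network and with an assumed similarity measure; `RemoteServer`, `download_index`, `fetch` and `near_end_search` are hypothetical names.

```python
def best_score(first, second):
    """Highest fraction of matching positions over all windows of `second`."""
    n = len(first)
    if n == 0 or n > len(second):
        return 0.0
    return max(
        sum(x == y for x, y in zip(first, second[i:i + n])) / n
        for i in range(len(second) - n + 1)
    )


class RemoteServer:
    """Maps second characteristic information (stored as a tuple) to its media item."""

    def __init__(self, library):
        self.library = library

    def download_index(self):
        # Step 1: the near end downloads every item's second characteristic information.
        return list(self.library.keys())

    def fetch(self, second):
        # Steps 6 and 7: look up and return the media for the received characteristics.
        return self.library[second]


def near_end_search(first, server):
    index = server.download_index()                               # step 1
    best = max(index, key=lambda second: best_score(first, second))  # steps 4 and 5
    return server.fetch(best)                                     # steps 6 and 7


server = RemoteServer({
    (1, 2, 3, 4, 5, 6): "song A",
    (7, 8, 7, 8, 7, 8): "song B",
})
print(near_end_search([8, 7, 8], server))  # song B
```

The trade-off compared with the first scheme is that the near end must hold the whole characteristic index, while the server does no similarity computation at all.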
24. A third scheme of a network search system, comprising a remote server component and a near-end component connected to each other through the Internet or a LAN, characterized in that:
The near-end component comprises:
Sound input component;
Information sending component, which transmits the sound signal or information from the sound input component to the remote server component over the network;
First information receiving component, which receives the multimedia information sent by the remote server component;
The remote server component comprises:
Second information receiving component, which receives the sound signal or information sent from the near-end component;
Characteristic extraction component, which extracts first characteristic information from the sound signal or information received by the second information receiving component;
Media information storage component, which stores at least one item of multimedia information, the second characteristic information corresponding to each item, and the correspondence between each item and its second characteristic information;
Characteristic similarity calculation component, which judges the similarity between the first characteristic information and any segment of the second characteristic information of each item of multimedia information;
Characteristic similarity judgment component, which chooses from the similarity data either the maximum similarity value or a plurality of similarity values that exceed a set threshold;
Multimedia information selection component, which selects from the media information storage component the one or more items of multimedia information corresponding to the second characteristic information containing the information segment with the maximum similarity value, or the segments whose similarity exceeds the set threshold, and sends them to the near-end component.
25. A multimedia information search method realized with the network search system of the third scheme, characterized by comprising the following operation steps:
Step 1: input a sound signal or information at the near-end component;
Step 2: the sound signal or information is sent to the remote server component through the Internet or a LAN;
Step 3: the remote server component extracts the first characteristic information of the received sound signal or information;
Step 4: the remote server component calculates the similarity between the first characteristic information and the second characteristic information of each item of media information stored in the remote server component;
Step 5: according to the second characteristic information corresponding to the maximum similarity value, or to the similarity values exceeding the set threshold, the remote server component retrieves the corresponding multimedia information from its own storage as the selected multimedia information;
Step 6: the remote server component sends the selected multimedia information to the near-end component through the Internet or a LAN.
26. The multimedia information is one of the following or a combination thereof: text, pictures, sound, music, film, television.
The technology of the present invention can also be used in devices that turn pages automatically according to sound input, for example to turn the pages of a score for a concert performer.
27. An automatic page-turning device, comprising a media information storage body storing at least one item of multimedia information and a display component, characterized by further comprising:
Sound input component;
Characteristic extraction component, which extracts first characteristic information from the sound signal input through the sound input component;
Media information characteristic storage component, which calculates and stores, or has stored in advance, the second characteristic information corresponding to each item of multimedia information;
Characteristic similarity calculation component, which determines the current position in the multimedia information, namely the position of the information segment that, within the second characteristic information of the portion of multimedia information shown by the display component, has the maximum similarity to the first characteristic information;
Page-turning judgment component: when the current position in the multimedia information is the end of the portion shown by the display component, the display component shows the next page of the multimedia information.
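The page-turning decision can be sketched as follows: locate the segment of the displayed page that best matches what was just heard, and turn the page when that segment is the last one on the page. This is a hedged illustration only; the matching measure, the `min_score` guard against weak matches, and both function names are assumptions that do not come from the text.

```python
def best_match_position(heard, page_features):
    """Start index and score of the page segment most similar to the heard characteristics."""
    n = len(heard)
    if n == 0 or n > len(page_features):
        return -1, 0.0
    best_pos, best_score = -1, -1.0
    for start in range(len(page_features) - n + 1):
        window = page_features[start:start + n]
        score = sum(x == y for x, y in zip(heard, window)) / n
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score


def should_turn_page(heard, page_features, min_score=0.75):
    """Turn the page when the best-matching segment ends exactly at the end of the page."""
    pos, score = best_match_position(heard, page_features)
    at_end = pos >= 0 and pos + len(heard) == len(page_features)
    return at_end and score >= min_score


page = [1, 2, 3, 4, 5, 6, 7, 8]         # second characteristic information of the shown page
print(should_turn_page([6, 7, 8], page))  # True: the performer has reached the end
print(should_turn_page([2, 3, 4], page))  # False: still near the top of the page
```

A practical device would probably turn slightly before the very last note; that lead time is not specified in the text and is left out here.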
28. A first scheme of a melody learning system, comprising a storage body storing at least one piece of music information, characterized by further comprising:
Sound input component;
Characteristic extraction component, which extracts first characteristic information from the sound signal input through the sound input component;
Media information selection component, which selects the piece of music information to be learned;
Media information characteristic extraction component, which extracts second characteristic information from the selected piece of music information;
Characteristic similarity calculation component, which calculates and judges the similarity between the first characteristic information and the second characteristic information of the selected piece of music information;
Information prompt component, which presents the differences between the input sound and the selected piece of music information according to the similarity.
29. A second scheme of a melody learning system, comprising a storage body storing at least one piece of music information and the second characteristic information corresponding to each piece, characterized by further comprising:
Sound input component;
Characteristic extraction component, which extracts first characteristic information from the sound signal input through the sound input component;
Media information selection component, which selects the piece of music information to be learned;
Characteristic similarity calculation component, which calculates and judges the similarity between the first characteristic information and the second characteristic information of the selected piece of music information;
Information prompt component, which presents the differences between the input sound and the selected piece of music information according to the similarity.
30. In the above melody learning systems, the differences between the input sound and the music information comprise differences over the whole piece and/or over local fragments and/or at single syllables. That is, the difference between the input sound as a whole and the selected piece as a whole can be given: the higher the calculated similarity, the smaller the difference, and the lower the similarity, the larger the difference. Because the characteristic information in the present invention is extracted syllable by syllable, the differences over local fragments of the input sound and the selected piece can also be given, as can the difference at a single syllable. A learner can thus find exactly where the music as played or sung departs from the piece itself and correct it effectively.
31. In the above melody learning systems, the information prompt component comprises a sound output component and/or an information display component. That is, the differences between the input sound and the selected piece can be presented by sound or on a display, for example by playing back through a loudspeaker and/or showing on a display the differing fragments or syllables or the similarity data.
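The whole-piece and single-syllable differences described for the melody learning system can be illustrated with a short sketch. It assumes, purely for illustration, that both performances have already been reduced to one characteristic value per syllable and are aligned position by position; `compare_performance` is a hypothetical name.

```python
def compare_performance(played, reference):
    """Return (overall similarity, indices of syllables that differ).

    Syllables are compared position by position; missing or extra syllables at the
    end lower the overall similarity but are not listed individually.
    """
    common = min(len(played), len(reference))
    mismatches = [i for i in range(common) if played[i] != reference[i]]
    total = max(len(played), len(reference))
    overall = (common - len(mismatches)) / total if total else 1.0
    return overall, mismatches


reference = [60, 62, 64, 65]   # characteristics of the selected piece, one per syllable
played = [60, 62, 63, 65]      # what the learner sang
overall, wrong = compare_performance(played, reference)
print(overall)  # 0.75
print(wrong)    # [2]
```

The overall figure corresponds to the whole-piece difference, and the index list is what a display or loudspeaker prompt could use to point the learner at individual syllables.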
Beneficial effects of the present invention: with the technology of the present invention, the required multimedia information can be selected more effectively from a media storage body or from the Internet by inputting part of the sound characteristic information related to the media; for example, humming a fragment of a song retrieves the complete information of that song. The technology can also distinguish more effectively whether one piece of music plagiarizes another. The present invention uses sound characteristic extraction, segment extraction, similarity calculation and similarity judgment to let sound control an electronic device or a network operation so that the required multimedia information is obtained automatically, or to judge plagiarism or similarity between pieces of music automatically. It can also provide automatic page turning of a score, so that the performer can concentrate on playing without turning pages by hand, as well as an assisted melody learning function. A media player realized with the present invention can select, by voice input, the media information in the player that is most similar to the input voice. This completely changes the way existing media players are operated: media information is located more accurately, and in most cases no manual operation is needed; the user can select media information, and also control the playback starting point, simply by speaking or singing. This greatly reduces the difficulty of operation, and even blind users or users unfamiliar with the player can operate it. A media search system realized with the present invention can select, by voice input, the media information most similar to the input voice from the various media servers on the Internet or a LAN. This completely changes the way existing network search engines or search tools are used: media information is located more accurately, and in most cases no manual operation is needed; a search can be carried out simply by speaking or singing. This greatly simplifies operation, and even blind users or users unfamiliar with computers can search for media information.
Description of the drawings:
Fig. 1 is a schematic diagram of the working principle of the first system of the present invention for realizing multimedia information retrieval.
Fig. 2 is a schematic diagram of the working principle of the second system of the present invention for realizing multimedia information retrieval.
Fig. 3 is a schematic diagram of the working principle of the first algorithm for calculating the similarity between the first and second characteristic information in the present invention.
Fig. 4 is a schematic diagram of the working principle of the second algorithm for calculating the similarity between the first and second characteristic information in the present invention.
Fig. 5 is a schematic diagram of the working principle of the third algorithm for calculating the similarity between the first and second characteristic information in the present invention.
Fig. 6 is a schematic workflow diagram of selecting multimedia information through sound input in the present invention.
Fig. 7 is a schematic diagram of the first system implementation of the present invention for selecting multimedia information from the Internet by sound.
Fig. 8 is a schematic diagram of the second system implementation of the present invention for selecting multimedia information from the Internet by sound.
Fig. 9 is a schematic diagram of the principle of the automatic score page-turning system realized by the present invention.
Fig. 10 is a schematic diagram of the principle of the melody learning system realized by the present invention.
Fig. 11 is a schematic diagram of the principle of the media player realized by the present invention.
Fig. 12 is a schematic flow chart of judging the similarity of two songs in the present invention.
Embodiments:
The core of the present invention is to process the input sound information, extract the first characteristic information, and then use a specific algorithm to calculate its similarity to the second characteristic information of the multimedia information. The item of multimedia information with the maximum similarity is taken as the multimedia information that the input sound is intended to select. When the multimedia information, the sound input and its processing components are all concentrated in one embedded system, portable devices based on the present invention can be designed, such as media players, palmtop computers, mobile terminals and notebook computers. When the multimedia information is stored on a server and the sound is input at a client, the sound processing components can be integrated either in the server or in the client, with server and client connected through a LAN or the Internet; on this basis a media search system, a music infringement judgment system, a singing learning system and an automatic score page-turning device can be designed.
Specific embodiments of the present invention are further described below with reference to the accompanying drawings.
Fig. 1 shows a first implementation of a media playback system in which media is selected by sound input according to the present invention. In this scheme, the characteristic similarity calculation component 105 has two inputs. One comes from the first characteristic information extraction component 103, which processes the voice information from the voice input component 101 and extracts characteristic information from it. The other comes from the component 104 that intercepts an arbitrary segment of the second characteristic information; it takes the characteristics of the media information from the media information characteristic storage component 102 and then intercepts an arbitrary segment of that characteristic information. The characteristic similarity calculation component 105 outputs the calculated similarity data to the characteristic similarity judgment component 106, which screens and compares them and picks the second characteristic information to which the segment with the maximum similarity belongs; the multimedia information selection component 108 uses this to select the required multimedia information from the storage medium 107. The second characteristic information stored in the media information characteristic storage component 102 and the media information stored in the storage medium 107 are in one-to-one correspondence, that is, each item of second characteristic information in component 102 corresponds to exactly one item of media information in storage medium 107. This correspondence is also stored, either in the media information characteristic storage component 102 or in the storage medium 107. In a specific implementation, component 102 and storage medium 107 can be merged into a single storage component, and the correspondence between the second characteristic information and the media information can be stored as a data table or in a database. A typical implementation of the voice input component 101 consists of a microphone, a microphone signal processing circuit and a digital voice signal acquisition circuit. The characteristics that the first characteristic information extraction component 103 extracts from the input voice are, for example, prosodic information and pitch information, which can further be converted into score information and used as the characteristic. When the scheme is realized as a media playback system, in a specific design components 103, 104, 105, 106 and 108 are all implemented in software by the processor of the media player. The effect is that when a user wants the media player to play a certain item of media information, the user can hum into the microphone of the voice input component 101 a fragment of the music contained in that media information, and the media player, using the method of the present invention, automatically selects and plays the media information closest to the hummed fragment. This spares the user the trouble of having forgotten the title, or of navigating multi-level menus because there are too many items. The fragment hummed by the user can even be quite inaccurate, as long as the basic rhythm is similar, so the approach is highly practical, adaptable and easy to operate. A media player realized with the present invention can select, by voice input, the media information in the player that is most similar to the input voice. This completely changes the way existing media players are operated: media information is located more accurately, and in most cases no manual operation is needed; media information can be selected simply by speaking or singing. This greatly simplifies operation, and even blind users or users unfamiliar with the player can operate it.
Fig. 2 shows a second implementation of a media playback system in which media is selected by sound input according to the present invention. The difference from the scheme of Fig. 1 is that the second characteristic information is not stored in the storage body in advance; instead, the media information characteristic calculation component 202 calculates it from the media information read from storage medium 107. The advantage over the first scheme is that the algorithm of component 202 can be updated at any time, so that further research results on voice characteristics can be used to extract characteristics more efficiently or to adjust which characteristics are extracted.
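The benefit claimed for the scheme of Fig. 2, being able to swap the characteristic calculation algorithm without touching the stored media, can be illustrated by passing the extractor in as a parameter. The sketch below is an assumption-laden illustration: media items are taken to be lists of note numbers, and `build_index`, `pitches` and `contour` are hypothetical names, not anything defined in the text.

```python
def pitches(notes):
    """A first, simple extractor: use the note numbers themselves as characteristics."""
    return list(notes)


def contour(notes):
    """A replacement extractor: up (U), down (D) or same (S) between neighbouring notes."""
    steps = []
    for prev, cur in zip(notes, notes[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "S")
    return steps


def build_index(storage, extract):
    """Compute second characteristic information on demand from the stored media."""
    return {title: extract(media) for title, media in storage.items()}


storage = {"song A": [60, 62, 62, 59]}
print(build_index(storage, pitches))  # {'song A': [60, 62, 62, 59]}
print(build_index(storage, contour))  # {'song A': ['U', 'S', 'D']}
```

Whatever extractor is chosen for the stored media would also have to be applied to the input sound so that the two sets of characteristics remain comparable.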
Fig. 3 is a schematic diagram of the principle of the first method of calculating the similarity between the first and second characteristic information according to the present invention. In this figure, the first characteristic information 302 is assumed to be 4 bytes long, with its byte positions labelled a, b, c, d; the second characteristic information 301 is 16 bytes long, with its byte positions labelled 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16. The second characteristic information is intercepted by taking, from every byte as a starting point, a run of bytes of the same length as the first characteristic information, and discarding the runs that would be too short. This gives 13 intercepted segments, whose byte positions are 1, 2, 3, 4; 2, 3, 4, 5; 3, 4, 5, 6; 4, 5, 6, 7; 5, 6, 7, 8; 6, 7, 8, 9; 7, 8, 9, 10; 8, 9, 10, 11; 9, 10, 11, 12; 10, 11, 12, 13; 11, 12, 13, 14; 12, 13, 14, 15; 13, 14, 15, 16. The similarity of each intercepted segment to the first characteristic information is calculated, giving calculation result 303, which comprises 13 values denoted R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13. For n items of multimedia information, assuming the second characteristic information of every item has the same length of 16 bytes, the above similarity calculation yields 13 × n values in total. The maximum of these 13 × n values is chosen; the segment of second characteristic information that produced this maximum identifies the second characteristic information it belongs to, and the corresponding item of media information is then retrieved through the correspondence between media information and second characteristic information.
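The interception scheme of Fig. 3 amounts to a sliding window. The sketch below reproduces the 4-byte against 16-byte example, giving 13 scores per item, and then picks the item with the overall maximum. The similarity measure (fraction of matching bytes) is an assumption, since the text does not define one, and the function names are hypothetical.

```python
def similarity(a, b):
    """Fraction of matching positions between two equal-length segments (assumed metric)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def segment_scores(first, second):
    """One score per intercepted segment; segments that would be too short are skipped."""
    n = len(first)
    return [similarity(first, second[i:i + n]) for i in range(len(second) - n + 1)]


def retrieve(first, library):
    """Return the title and score of the item holding the best-matching segment."""
    best_title, best_score = None, -1.0
    for title, second in library.items():
        for score in segment_scores(first, second):
            if score > best_score:
                best_title, best_score = title, score
    return best_title, best_score


first = [5, 6, 7, 8]                      # 4-byte first characteristic information
library = {
    "song_a": list(range(1, 17)),         # 16 bytes: 1, 2, ..., 16
    "song_b": [0] * 16,
}
scores = segment_scores(first, library["song_a"])
print(len(scores))               # 13, i.e. R1 to R13
print(retrieve(first, library))  # ('song_a', 1.0)
```

With n items of 16 bytes each, `retrieve` examines 13 × n scores in total, matching the count given in the text.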
Because different people humming the same tune, or saying the same words, do not necessarily do so at the same speed, the prosodic characteristics of the hummed or spoken content may differ in length from those of the same fragment of the media information. For example, a syllable that is a single 1/4 beat in the media information may be hummed or spoken as two 1/4 beats; conversely, a syllable that is two 1/4 beats in the media information may be hummed or spoken as a single 1/4 beat. Therefore, to improve the tolerance and reliability of the similarity calculation, the calculation also covers the case in which adjacent identical characteristic bytes of the first and/or second characteristic information are merged into one characteristic byte. Fig. 4 is a schematic diagram of the principle of the second method of calculating the similarity between the first and second characteristic information according to the present invention. In this figure, similarity result 403 is first calculated in the manner of Fig. 3, with neither the first characteristic information 402 nor the second characteristic information 401 merged; result 403 comprises 13 values, denoted R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13. In addition, the second characteristic information 401 in the figure has two places where adjacent characteristics are identical, namely characteristic 2 and characteristic 6. Adjacent identical characteristics are merged into one, which turns the second characteristic information into its merged information 404. The first characteristic information 402 is then compared with the merged information 404 by the same similarity calculation method to give result 405, which comprises 10 values, denoted R14, R15, R16, R17, R18, R19, R20, R21, R22, R23. For n items of multimedia information, the same processing and calculation are performed and the maximum is chosen; the segment of second characteristic information that produced this maximum identifies the second characteristic information it belongs to, and the corresponding item of media information is then retrieved through the correspondence between media information and second characteristic information.
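Merging adjacent identical characteristics is a run-collapsing operation, which `itertools.groupby` expresses directly. The sketch below shows how merging the second characteristic information can recover a match that the unmerged comparison misses; the values and the similarity measure are illustrative assumptions, and the function names are hypothetical.

```python
from itertools import groupby


def merge_adjacent(features):
    """Collapse each run of identical neighbouring characteristics into a single one."""
    return [key for key, _ in groupby(features)]


def best_score(first, second):
    """Highest fraction of matching positions over all windows of `second`."""
    n = len(first)
    if n == 0 or n > len(second):
        return 0.0
    return max(
        sum(x == y for x, y in zip(first, second[i:i + n])) / n
        for i in range(len(second) - n + 1)
    )


second = [1, 2, 2, 3, 4, 5, 5, 6]   # two syllables are held for two units each
first = [1, 2, 3, 4]                # the hummer gave every syllable one unit

merged = merge_adjacent(second)
print(merged)                       # [1, 2, 3, 4, 5, 6]
print(best_score(first, second))    # 0.75
print(best_score(first, merged))    # 1.0
```

Taking the larger of the unmerged and merged results, as the text describes, makes the match tolerant of a hummer who shortens held notes.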
Fig. 5 is a schematic diagram of the third method of calculating similarity between the first characteristic information and the second characteristic information according to the present invention. Compared with Fig. 4, it is the first characteristic information in this figure that contains adjacent identical features to be merged. Similarity is first calculated between the original first characteristic information 502 and the second characteristic information 501 to obtain result 503, which comprises 13 values, denoted R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13. Similarity is then calculated between the merged information 504 of the first characteristic information and the second characteristic information 501 to obtain result 505, which comprises 14 values, denoted R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27. The same processing and calculation are performed for each of the n items of multimedia information, and the maximum value is then selected. The second characteristic information segment corresponding to this maximum identifies the second characteristic information, and the corresponding media information is then retrieved according to the correspondence between media information and second characteristic information.
When both the first characteristic information and the second characteristic information contain features that can be merged, four cases are calculated: the similarity between the first characteristic information and the second characteristic information directly; the similarity between the first characteristic information and the merged information of the second characteristic information; the similarity between the merged information of the first characteristic information and the second characteristic information; and the similarity between the merged information of the first characteristic information and the merged information of the second characteristic information.
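The four cases can be enumerated mechanically. The sketch below is a hedged illustration: `four_case_similarity`, the sample sequences and the match-fraction score are hypothetical choices made here, and the source does not say how the four results are combined beyond choosing the maximum.

```python
from itertools import groupby, product


def merge_adjacent(features):
    """Collapse each run of adjacent identical features into one feature."""
    return [key for key, _ in groupby(features)]


def best_window_score(first, second):
    """Best fraction of matching positions over all segments of `second`."""
    width = len(first)
    if width == 0 or width > len(second):
        return 0.0
    return max(
        sum(a == b for a, b in zip(first, second[start:start + width])) / width
        for start in range(len(second) - width + 1)
    )


def four_case_similarity(first, second):
    """Score all four raw/merged combinations and return the best one."""
    cases = {}
    for merge_first, merge_second in product((False, True), repeat=2):
        a = merge_adjacent(first) if merge_first else first
        b = merge_adjacent(second) if merge_second else second
        cases[(merge_first, merge_second)] = best_window_score(a, b)
    best_case = max(cases, key=cases.get)
    return best_case, cases[best_case], cases


# Hypothetical features: both sequences contain a doubled note.
first = [1, 1, 2, 3]
second = [5, 1, 2, 2, 3, 6]
best_case, best_score, cases = four_case_similarity(first, second)
```

Here the keys of `cases` are `(first merged?, second merged?)` pairs, so the entry `(True, True)` corresponds to the fourth case in the paragraph above.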
Fig. 6 is a flow chart of selecting multimedia information by sound input according to the present invention. The figure gives a further implementation example in which MFCC coefficients are extracted and converted to a MIDI file, and the MIDI file is then converted to numbered musical notation information that serves as the characteristic information. The specific flow is as follows. A sound signal, such as a hummed passage, is input in step 601. MFCC coefficients are extracted from the input sound signal in step 602, the MFCC coefficients are converted to a MIDI file in step 603, the MIDI file is converted to numbered musical notation information in step 604, and the first characteristic information is generated in step 605. It is assumed that the media library already stores the MIDI file corresponding to each item of multimedia information; if not, the MIDI files can be produced by conversion beforehand. Step 606 reads the MIDI file of the first item of multimedia information, step 607 converts it to numbered musical notation information, step 608 generates the second characteristic information, and step 609 calculates the similarity between the first characteristic information and the second characteristic information. Step 610 determines whether the current item is the last item of multimedia information. If it is not, step 614 reads the MIDI file of the next item of multimedia information and the processing of steps 607, 608, 609 and 610 is repeated. If it is, step 611 determines the MIDI file corresponding to the maximum similarity, step 612 reads the multimedia file associated with that MIDI file, and step 613 outputs the selected multimedia file.
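The loop of steps 604 to 613 can be sketched as follows, starting from MIDI note numbers rather than from audio (MFCC extraction and MFCC-to-MIDI conversion are outside the scope of this sketch). The key of C major, the dropping of octave and duration, the library contents and all function names are simplifying assumptions introduced here for illustration.

```python
# Pitch classes of the C major scale mapped to numbered-notation degrees.
DEGREE_OF_PITCH_CLASS = {0: 1, 2: 2, 4: 3, 5: 4, 7: 5, 9: 6, 11: 7}


def to_numbered_notation(midi_notes):
    """Map MIDI note numbers to scale degrees 1-7, assuming the key of C major.

    Octave and duration are ignored; notes outside the scale map to 0.
    """
    return [DEGREE_OF_PITCH_CLASS.get(note % 12, 0) for note in midi_notes]


def best_window_score(first, second):
    """Best fraction of matching positions over all segments of `second`."""
    width = len(first)
    if width == 0 or width > len(second):
        return 0.0
    return max(
        sum(a == b for a, b in zip(first, second[start:start + width])) / width
        for start in range(len(second) - width + 1)
    )


def select_media(query_notes, library):
    """Return (title, score) of the library entry most similar to the query."""
    first = to_numbered_notation(query_notes)          # steps 604-605
    best_title, best_score = None, -1.0
    for title, notes in library.items():               # steps 606, 610, 614
        second = to_numbered_notation(notes)           # steps 607-608
        score = best_window_score(first, second)       # step 609
        if score > best_score:                         # step 611
            best_title, best_score = title, score
    return best_title, best_score                      # steps 612-613


# Hypothetical library of MIDI note sequences (60 is middle C).
library = {
    "song_a": [60, 60, 67, 67, 69, 69, 67],
    "song_b": [64, 62, 60, 62, 64, 64, 64],
}
hummed = [79, 79, 81, 81]  # the same degrees as part of song_a, one octave up
selected = select_media(hummed, library)
```

Because the octave is discarded in this sketch, a fragment hummed an octave higher than the stored melody still selects the same entry.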
Fig. 7 is a schematic diagram of the first way of implementing a media search system according to the present invention. The media search system comprises a server 700 and a client 710; the client 710 connects to the server 700 through the Internet or a local area network (LAN) 704. The server 700 comprises a media information database 701, a media access processing component 702 and a network interface 703. The client 710 comprises an information display component 706, a voice input component 707, a voice signal processing component 708 and a network interface 705. The user inputs voice through the voice input component 707, for example by humming a segment of a melody or by copying in a voice file prepared in advance. The voice signal processing component 708 processes the input, which includes digitizing the voice signal and extracting the first characteristic information of the voice, and the extracted first characteristic information is then sent through the network interface 705 onto the Internet or LAN 704. The network interface 703 of the server 700 receives the first characteristic information and passes it to the media access processing component 702. The media access processing component 702 takes the second characteristic information of each item of media information from the media information database 701, uses the similarity calculation method to calculate the similarity between each segment of each item of second characteristic information and the received first characteristic information, and selects the second characteristic information corresponding to the maximum similarity. According to the correspondence between each item of media information and its second characteristic information, it then takes from the media information database 701 the media information associated with that second characteristic information and sends the selected media information through the network interface 703 onto the Internet or LAN 704. The network interface 705 of the client 710 receives the media information and passes it to the voice signal processing component 708, which passes it to the information display component 706 for presentation. If the media information is pure music, the information display component 706 may be a sound signal output amplifier with a loudspeaker or earphones. If the media information is a video containing music, the information display component 706 may be a combined component comprising a display screen and a sound signal output amplifier with a loudspeaker or earphones. If the received media information comprises several candidate items, the items can be listed on the display screen of the information display component 706 for the user to choose from. The media search system of the present invention can thus use voice input to select, from the various media servers on the Internet or a LAN, the media information most similar to the input voice. This departs from the search method of existing network search engines and search tools and locates media information more accurately. In most cases no manual operation is needed: the search is carried out simply by speaking or singing, which greatly reduces the difficulty of operation, so that even blind users or users unfamiliar with computers can search for media information.
Fig. 8 is a schematic diagram of the second way of implementing a media search system according to the present invention. The media search system comprises a server 800 and a client 810; the client 810 connects to the server 800 through the Internet or LAN 704. The server 800 comprises a media information database 701, a media access processing component 802 and a network interface 703. The client 810 comprises an information display component 706, a voice input component 707, a voice signal processing component 808, a network interface 705 and a local second characteristic information storage component 809. Before a voice search is carried out, the client 810 first downloads the second characteristic information corresponding to each item of media information from the server 800 through the Internet or LAN 704 and stores it in the second characteristic information storage component 809. The user inputs voice through the voice input component 707, for example by humming a segment of a melody or by copying in a voice file prepared in advance. The voice signal processing component 808 processes the input, which includes digitizing the voice signal and extracting the first characteristic information of the voice. It then reads the second characteristic information of each item of media information from the second characteristic information storage component 809, uses the similarity calculation method to calculate the similarity between each segment of each item of second characteristic information and the extracted first characteristic information, selects the second characteristic information corresponding to the maximum similarity, and sends the selected second characteristic information through the network interface 705 onto the Internet or LAN 704. The network interface 703 of the server 800 receives the second characteristic information and passes it to the media access processing component 802. According to the correspondence between each item of media information and its second characteristic information, the media access processing component 802 takes from the media information database 701 the media information associated with the received second characteristic information and sends the selected media information through the network interface 703 onto the Internet or LAN 704. The network interface 705 of the client 810 receives the media information and passes it to the voice signal processing component 808, which passes it to the information display component 706 for presentation. If the media information is pure music, the information display component 706 may be a sound signal output amplifier with a loudspeaker or earphones. If the media information is a video containing music, the information display component 706 may be a combined component comprising a display screen and a sound signal output amplifier with a loudspeaker or earphones. If the received media information comprises several candidate items, the items can be listed on the display screen of the information display component 706 for the user to choose from. The media search system of the present invention can thus use voice input to select, from the various media servers on the Internet or a LAN, the media information most similar to the input voice. This departs from the search method of existing network search engines and search tools and locates media information more accurately. In most cases no manual operation is needed: the search is carried out simply by speaking or singing, which greatly reduces the difficulty of operation, so that even blind users or users unfamiliar with computers can search for media information.
Fig. 9 is a schematic diagram of an automatic score page-turning system implemented according to the present invention. The automatic score page-turning system comprises a score display unit 901, a processing component 902 and a voice input component 903. The processing component 902 comprises a memory bank storing score information, a processor, and a memory bank storing program software. The voice input component 903 comprises a microphone for collecting sound and a circuit for digitizing, acquiring and storing the sound. The score display unit 901 is an electronic display component, such as a liquid crystal display, an organic light-emitting diode display unit or an electronic paper display unit. When a piece is played, the score display unit 901 shows the first page of the score of that piece under the control of the processing component 902. During the performance, the voice input component 903 continuously captures the sound being played, and the processing component 902 extracts the prosody of the sound as the first characteristic information and calculates its similarity with segments of the second characteristic information of the piece being played, which is stored in advance. The position reached in the score can be determined from the maximum similarity. The processing component 902 thereby determines when the score content shown on the display unit 901 has been played to the end and then automatically shows the next page of the score on the display unit 901, which avoids the brief interruption of the performance caused by the player turning the page by hand. The tempo of a concert performer's playing is usually very close to that of the score, so adjacent identical features need not be merged when the similarity is calculated.
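A minimal sketch of the page-turning decision, assuming the score has already been reduced to a flat sequence of features and that each page is described by the index at which it ends. The names `locate_position` and `page_to_show`, the page layout and the match-count score are hypothetical illustrations, not details given in the source.

```python
def locate_position(played, score_features):
    """Return the index in the score just past the best match for `played`."""
    width = len(played)
    if width == 0 or width > len(score_features):
        return 0
    best_start, best_score = 0, -1
    for start in range(len(score_features) - width + 1):
        window = score_features[start:start + width]
        score = sum(a == b for a, b in zip(played, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start + width


def page_to_show(position, page_ends):
    """Return the index of the page that contains the next unplayed feature.

    `page_ends[i]` is the (exclusive) feature index at which page i ends.
    """
    for page, end in enumerate(page_ends):
        if position < end:
            return page
    return len(page_ends) - 1  # stay on the last page once the piece is over


# Hypothetical score of 12 features split across two pages of 6 features each.
score_features = [1, 2, 3, 4, 5, 6, 7, 1, 3, 5, 2, 4]
page_ends = [6, 12]

mid_page = page_to_show(locate_position([2, 3, 4], score_features), page_ends)
end_of_page = page_to_show(locate_position([4, 5, 6], score_features), page_ends)
```

In this sketch the page changes as soon as the most recently played features match the end of the displayed page; a practical system would probably turn slightly earlier so the player can read ahead.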
Fig. 10 is a schematic diagram of a music learning system implemented according to the present invention. The music learning system comprises a display unit 1001, a processing component 1002 and a voice input component 1003. The processing component 1002 comprises a memory bank storing music information, a processor, and a memory bank storing program software. The voice input component 1003 comprises a microphone for collecting sound and a circuit for digitizing, acquiring and storing the sound. The display unit 1001 is an electronic display component, such as a liquid crystal display, an organic light-emitting diode display unit or an electronic paper display unit. When a piece is sung or played, the display unit 1001 shows the score of that piece under the control of the processing component 1002. During the singing or playing, the voice input component 1003 continuously captures the sound, and the processing component 1002 extracts the prosody of the sound as the first characteristic information. After the piece is finished, the similarity between the extracted first characteristic information and the second characteristic information of the piece, which is stored in advance, is calculated syllable by syllable. From the similarity results, the difference between each sung or played syllable and the corresponding syllable of the standard piece is obtained, and the processing component 1002 shows these differences on the display unit 1001. The singer or player can then find the mistakes from the displayed per-syllable differences and adjust the performance, which achieves the purpose of assisted learning.
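The syllable-by-syllable comparison can be illustrated with a small report generator. This is a sketch under stated assumptions: the `Note` representation (scale degree plus duration in beats), the pairing of syllables by position and the report format are all choices made here and are not specified in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    degree: int    # numbered-notation scale degree, 1-7
    beats: float   # duration in beats


def syllable_differences(performed, reference):
    """List per-syllable differences between a performance and the reference.

    Syllables are paired by position, which assumes the performer produced
    the same number of syllables as the reference; a mismatch in count is
    reported as a final entry rather than aligned.
    Each entry is (syllable number, kind, expected value, actual value).
    """
    report = []
    for index, (got, want) in enumerate(zip(performed, reference), start=1):
        if got.degree != want.degree:
            report.append((index, "pitch", want.degree, got.degree))
        if got.beats != want.beats:
            report.append((index, "duration", want.beats, got.beats))
    if len(performed) != len(reference):
        report.append((None, "count", len(reference), len(performed)))
    return report


# Hypothetical three-syllable phrase: wrong pitch on syllable 2,
# and syllable 3 held for one beat instead of two.
reference = [Note(1, 1.0), Note(2, 1.0), Note(3, 2.0)]
performed = [Note(1, 1.0), Note(3, 1.0), Note(3, 1.0)]
report = syllable_differences(performed, reference)
```

A display layer could highlight the reported syllable numbers on the score so that the learner sees where pitch or duration deviated.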
Fig. 11 is a schematic diagram of a media player implemented according to the present invention. The media player 1100 comprises a processor host 1101, control operation buttons 1102, earphones 1103 and a microphone 1104. The processor host 1101 is connected to the control operation buttons 1102, the earphones 1103 and the microphone 1104 by a connecting lead 1105. The signals on this connection are bidirectional: the button signals of the control operation buttons 1102 and the sound signal input by the microphone 1104 are sent to the processor host 1101, and the output signal of the processor host 1101 is output to the earphones 1103. In other implementations, the processor host 1101 is connected wirelessly to the control operation buttons 1102, the earphones 1103 and the microphone 1104, for example using Bluetooth or Wi-Fi; both wired and wireless connections are existing mature technologies. The processor host 1101 comprises a memory bank 1105 holding the media information and its second characteristic information, and an information processing device 1106. The control operation buttons 1102 comprise a first button 1107 and a second button 1108. People using a media player often hum along with the media being played, while the player of the present invention also relies on the operator humming a media segment to select media and to control the playback start point. To let the media player distinguish humming along with the media being played from humming a segment in order to reselect media or a playback start point, the first button 1107 and the second button 1108 on the control operation buttons 1102 are used. When the operator presses the first button 1107, the hummed segment is used to select media. When the operator presses the second button 1108, the hummed segment is used to select the playback start point of the media. When neither the first button 1107 nor the second button 1108 is pressed, the operator is humming along with the media being played. The button signals of the control operation buttons 1102 are sent to the processor host 1101, which makes the decision. If the operator presses the first button 1107, the information processing device 1106 processes the voice information from the microphone 1104 and extracts characteristic information from it. It takes the features of the media information from the memory bank 1105, intercepts arbitrary segments of that characteristic information, calculates the similarity for each segment, and screens and compares the resulting similarity data. The second characteristic information to which the segment with the maximum similarity belongs is chosen as the basis for selecting the required multimedia information, and the media information is then selected according to the correspondence between second characteristic information and media information and is played. If the operator presses the second button 1108, the information processing device 1106 performs the same extraction, segment-by-segment similarity calculation and selection, and then plays the selected media information starting from the position of the second characteristic information segment with the maximum similarity. In this way the media player achieves both media selection and automatic positioning of the playback start point.
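The two-button logic can be sketched as a small dispatch function. Everything below is a hypothetical illustration: the names, the library contents and the match-fraction score are assumptions, and the source does not say what happens when both buttons are pressed, so this sketch arbitrarily gives the first button priority.

```python
from enum import Enum


class HumMode(Enum):
    SELECT_MEDIA = "select media"
    SELECT_START = "select playback start point"
    HUM_ALONG = "hum along"


def hum_mode(first_pressed, second_pressed):
    """Map the button state to a mode (first button wins if both are pressed)."""
    if first_pressed:
        return HumMode.SELECT_MEDIA
    if second_pressed:
        return HumMode.SELECT_START
    return HumMode.HUM_ALONG


def best_match(hummed, library):
    """Return (title, start_index, score) of the best-matching segment."""
    best = (None, 0, -1.0)
    width = len(hummed)
    if width == 0:
        return best
    for title, features in library.items():
        for start in range(len(features) - width + 1):
            window = features[start:start + width]
            score = sum(a == b for a, b in zip(hummed, window)) / width
            if score > best[2]:
                best = (title, start, score)
    return best


def handle_hum(first_pressed, second_pressed, hummed, library):
    """Return (title, start_index) to play, or None to keep playing as is."""
    mode = hum_mode(first_pressed, second_pressed)
    if mode is HumMode.HUM_ALONG:
        return None                   # humming along: leave playback untouched
    title, start, _ = best_match(hummed, library)
    if mode is HumMode.SELECT_MEDIA:
        return (title, 0)             # play the chosen media from the beginning
    return (title, start)             # play from the matched position


# Hypothetical stored second characteristic information.
library = {"song_a": [1, 1, 5, 5, 6, 6, 5], "song_b": [3, 2, 1, 2, 3, 3, 3]}
hummed = [5, 6, 6]
```

With this sample data, pressing the first button yields `("song_a", 0)`, pressing the second yields `("song_a", 3)`, and pressing neither returns `None`.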
Fig. 12 is a flow chart of judging the similarity of two songs according to the present invention. The figure gives a further implementation example in which MFCC coefficients are extracted and converted to a MIDI file, and the MIDI file is then converted to numbered musical notation information that serves as the characteristic information. The specific flow is as follows. The first music is input in step 1201, MFCC coefficients are extracted from the first music in step 1202, the MFCC coefficients are converted to a MIDI file in step 1203, the MIDI file is converted to numbered musical notation information in step 1204, and the first characteristic information is generated in step 1205. The second music is processed in the same way: it is input in step 1206, MFCC coefficients are extracted from it in step 1207, the MFCC coefficients are converted to a MIDI file in step 1208, the MIDI file is converted to numbered musical notation information in step 1209, and the second characteristic information is generated in step 1210. Step 1211 then calculates the similarity between the first characteristic information and the second characteristic information. Step 1212 selects the maximum similarity from the similarity data, and step 1213 determines whether the maximum similarity exceeds a threshold. If it exceeds the threshold, step 1214 concludes that the similarity between the first music and the second music is high. If it does not exceed the threshold, step 1215 concludes that the similarity between the first music and the second music is low.
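Steps 1211 to 1215 reduce to a maximum over segment similarities followed by a threshold test. The sketch below assumes both songs are already available as feature sequences; the threshold value of 0.8, the function names and the match-fraction score are hypothetical, since the source gives neither a threshold nor the similarity formula.

```python
def max_similarity(first, second):
    """Slide the shorter sequence over the longer one; return the best score."""
    short, long_ = sorted((first, second), key=len)
    width = len(short)
    if width == 0:
        return 0.0
    return max(
        sum(a == b for a, b in zip(short, long_[start:start + width])) / width
        for start in range(len(long_) - width + 1)
    )


def songs_are_similar(first, second, threshold=0.8):
    """Steps 1212-1215: high similarity only if the maximum exceeds the threshold."""
    return max_similarity(first, second) > threshold


# Hypothetical numbered-notation features for three songs.
song_a = [1, 2, 3, 4, 5]
song_b = [7, 1, 2, 3, 4, 5, 6]
song_c = [5, 5, 5, 5, 5]

a_vs_b = songs_are_similar(song_a, song_b)
c_vs_b = songs_are_similar(song_c, song_b)
```

In this example `song_a` appears in full inside `song_b`, so the pair is judged highly similar, whereas `song_c` matches at most one position in any segment and is judged dissimilar.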

Claims (10)

1. A music learning system comprising a storage medium storing information of at least one piece of music, characterized in that it further comprises:
a sound input component;
a feature extraction component, which extracts first characteristic information from the sound signal or information input through said sound input component;
a media information selection component, which selects the music information to be learned;
a media information feature extraction component, which calculates second characteristic information of the selected music information;
a feature similarity calculation component, which calculates and judges the similarity between said first characteristic information and the second characteristic information of the selected music information;
an information prompting component, which outputs the differences between the input sound and the selected music information according to the information similarity.
2. The system according to claim 1, characterized in that the differences between said input sound and the selected music information comprise differences between the input sound and the music information as a whole, and/or in local segments, and/or in single syllables.
3. The system according to claim 1, characterized in that said first characteristic information comprises sound pitch information and/or inflection information; and said second characteristic information comprises the sound pitch information and/or inflection information contained in the multimedia information.
4. A music learning system, characterized in that it comprises a display unit, a processing component and a voice input component; wherein said processing component comprises a memory bank storing music information, a processor, and a memory bank storing program software; said voice input component comprises a microphone for collecting sound and a circuit for digitizing, acquiring and storing the sound; and said music score display unit is an electronic display component.
5. The system according to any one of claims 1 to 3, characterized in that said feature extraction component, media information selection component, feature similarity calculation component and storage medium are implemented by an information processing device comprising a processor, and said information processing device and the sound input component are connected by a wired lead or by a wireless signal.
6. A music learning system comprising a storage medium storing information of at least one piece of music and second characteristic information corresponding to each piece of music information, characterized in that it further comprises:
a sound input component;
a feature extraction component, which extracts first characteristic information from the sound signal or information input through said sound input component;
a media information selection component, which selects the music information to be learned;
a feature similarity calculation component, which calculates and judges the similarity between said first characteristic information and the second characteristic information corresponding to the selected music information;
an information prompting component, which outputs the differences between the input sound and the selected music information according to the information similarity.
7. The system according to claim 6, characterized in that the differences between said input sound and the selected music information comprise differences between the input sound and the music information as a whole, and/or in local segments, and/or in single syllables.
8. The system according to claim 6, characterized in that said first characteristic information comprises sound pitch information and/or inflection information; and said second characteristic information comprises the sound pitch information and/or inflection information contained in the multimedia information.
9. The system according to claim 4, characterized in that said electronic display component is a liquid crystal display, an organic light-emitting diode display unit or an electronic paper display unit.
10. The system according to any one of claims 6 to 8, characterized in that said feature extraction component, media information selection component, feature similarity calculation component and storage medium are implemented by an information processing device comprising a processor, and said information processing device and the sound input component are connected by a wired lead or by a wireless signal.
CN2009201065794U 2009-03-23 2009-03-23 Music learning system Expired - Fee Related CN201397672Y (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009201065794U CN201397672Y (en) 2009-03-23 2009-03-23 Music learning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009201065794U CN201397672Y (en) 2009-03-23 2009-03-23 Music learning system

Publications (1)

Publication Number Publication Date
CN201397672Y true CN201397672Y (en) 2010-02-03

Family

ID=41620196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009201065794U Expired - Fee Related CN201397672Y (en) 2009-03-23 2009-03-23 Music learning system

Country Status (1)

Country Link
CN (1) CN201397672Y (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955490A (en) * 2014-04-16 2014-07-30 华为技术有限公司 Audio playing method and audio playing equipment
CN106056503A (en) * 2016-06-01 2016-10-26 苏州科技学院 Intelligent music teaching platform and application method thereof
CN106919583A (en) * 2015-12-25 2017-07-04 广州酷狗计算机科技有限公司 The method for pushing and device of audio file
CN107800879A (en) * 2017-10-23 2018-03-13 努比亚技术有限公司 A kind of audio regulation method, terminal and computer-readable recording medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955490A (en) * 2014-04-16 2014-07-30 华为技术有限公司 Audio playing method and audio playing equipment
CN106919583A (en) * 2015-12-25 2017-07-04 广州酷狗计算机科技有限公司 The method for pushing and device of audio file
CN106919583B (en) * 2015-12-25 2020-11-10 广州酷狗计算机科技有限公司 Audio file pushing method and device
CN106056503A (en) * 2016-06-01 2016-10-26 苏州科技学院 Intelligent music teaching platform and application method thereof
CN107800879A (en) * 2017-10-23 2018-03-13 努比亚技术有限公司 A kind of audio regulation method, terminal and computer-readable recording medium

Similar Documents

Publication Publication Date Title
CN101552000B (en) Music similarity processing method
US7244885B2 (en) Server apparatus streaming musical composition data matching performance skill of user
JP2006106818A (en) Music retrieval device, music retrieval method and music retrieval program
JP2010518459A (en) Web portal for editing distributed audio files
CN101551997B (en) Assisted learning system of music
CN101657816A (en) The portal website that is used for distributed audio file editing
JP2009244789A (en) Karaoke system with guide vocal creation function
CN201397672Y (en) Music learning system
JP2016070999A (en) Karaoke effective sound setting system
JP2007264569A (en) Retrieval device, control method, and program
JP2004233698A (en) Device, server and method to support music, and program
CN201397671Y (en) Media player
CN101551999B (en) Automatic page overturning device
CN101552002B (en) Media broadcasting device and media operating method
CN101552003B (en) Media information processing method
CN201397670Y (en) Network searching system
CN101552001B (en) Network searching system and information searching method
JP2006251697A (en) Karaoke device
CN201397673Y (en) Music score indicating device
JP2003131674A (en) Music search system
JP2006276560A (en) Music playback device and music playback method
JP5969421B2 (en) Musical instrument sound output device and musical instrument sound output program
JP5537246B2 (en) Singing position display system
JP5234950B2 (en) Singing recording system
JP4498221B2 (en) Karaoke device and program

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100203

Termination date: 20120323