CN101551999B - Automatic page turning device

Info

Publication number: CN101551999B
Application number: CN2009100784855A
Authority: CN
Inventors: 须清
Original assignee: Beijing Paragon Technology Co Ltd
Current assignee: Beijing Paragon Technology Co Ltd
Priority date: 2009-02-25
Filing date: 2009-02-25
Publication date: 2012-06-27
Anticipated expiration: 2029-02-25
Also published as: CN101551999A

Abstract

The present invention provides an automatic page turning device comprising an information storage body that stores at least one piece of multimedia information, and a display component. The device further comprises: a sound input component; a characteristic extracting component for extracting first characteristic information from the sound signal or information input through the sound input component; a media information characteristic storage component that stores second characteristic information corresponding to each piece of multimedia information; a characteristic similarity computing component for computing which information segment, within the part of the multimedia information shown by the display component, has the maximum similarity with the first characteristic information, and thereby determining the current position in the multimedia information; and a page turning component. When the current position is the end of the part of the multimedia information shown by the display component, the display component displays the content of the next page. The performer can thus concentrate on playing the music without turning the pages of the score by hand.
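
The page turning rule described above can be illustrated with a minimal Python sketch. It assumes the current position within the displayed page has already been determined by the similarity computation; the function name `next_page_index` and the representation of pages as a list of lengths in beats are illustrative assumptions, not taken from the patent.

```python
def next_page_index(page_lengths, current_page, position_in_page):
    """Return the index of the page that should be displayed.

    page_lengths: hypothetical list with the length (in beats) of each page.
    position_in_page: current position inside the displayed page, as
    determined elsewhere by the similarity computation.
    """
    at_end = position_in_page >= page_lengths[current_page]
    has_next = current_page + 1 < len(page_lengths)
    # Turn the page only when the end of the displayed part is reached
    # and there is a following page to show.
    return current_page + 1 if at_end and has_next else current_page


pages = [16, 16, 8]  # three pages of an imaginary score
```

With these assumed page lengths, a position in the middle of page 0 keeps page 0 on screen, while a position at the end of page 0 switches to page 1.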

Description

Automatic page turning device

Technical field

The present invention relates to an automatic page turning device, and in particular to a device that turns pages automatically during a musical performance.

Background technology

Multimedia players such as MP3 players, MP4 players, portable terminals and computers are now very common. These devices usually have a large storage capacity and hold many pieces of multimedia information. To select a particular piece for playback, the multimedia information is normally first classified according to some rule, and the operator then picks an item from menus on the operation interface. When there is a great deal of content, the menus become deeply nested and choosing the required item becomes difficult. Moreover, the menu options of a typical operation interface show only the title of each piece of multimedia information. With a large amount of content, the title alone often does not tell the user whether the item is the required one, so the user frequently selects an item, listens to or previews it, finds that it is not the one wanted, and has to select again.

With today's development of the internet, the amount of multimedia content on the network is enormous, and searching it for the required content is not easy. The search is especially difficult when the user does not clearly remember the title of the content.

In recent years there has been much research on speech recognition and on sound-controlled operation of electronic equipment, and some of it has been commercialized on mobile terminal devices, for example placing a call by voice selection. US Patents No. 4,277,644 and No. 6,101,467 cover various aspects of speech recognition software. Methods for characterizing audio content have also been described; in particular, US Patents No. 6,054,646 and No. 6,173,250 cover methods for characterizing music through features such as beat, energy and pitch.

Despite recent progress in speech recognition, audio signal analysis and the characterization of musical features, and although voice control has been implemented on some electronic equipment, in many cases these applications still do not meet users' needs. A common situation is that a user has difficulty selecting a favorite item on a multimedia player: the user may be able to hum a fragment or a phrase of the melody, or only the approximate content of one bar, but cannot remember the title and therefore cannot find the required media content efficiently.

Chinese invention patent application publication CN1639975A, published on July 13, 2005, mentions extracting the phonetic features of a signal source and then using phonetic features to select the content of the desired signal source. In particular, it discloses a watchdog function (Watch Dog): the user can sing or hum a pattern to the audio analyzer in the audio recorder-player, and the recorder-player then monitors different frequency channels for that specific tone. The user can also input words to the recorder-player through voice recognition software, and the recorder-player then monitors different channels for some or all of the dialogues and monologues that contain these words. An advanced matching algorithm is used, namely one that declares a match when a phrase occurs two or three times within a predetermined number of seconds. When a match occurs, a control event can be generated to switch the channel.

However, the technique described above has shortcomings when applied to multimedia players of larger capacity. Users of multimedia players are not all professionals, so the fragment or melody they sing or hum is often not a standard rendition: its beat or its key may differ from the original, yet what is hummed or sung still bears a certain similarity to the content the user wants to select. For example, a piece may be in the key of C and the recorded multimedia information is also in C, but the user may hum or sing it in F, C sharp or C flat. Or a piece may be in 2/4 time while the hummed or sung content is in 4/4 time. In both cases the rhythm of the melody is essentially similar and a person can tell that it is the same piece. The prior art has no good solution for these situations.

On the other hand, for a media player there is also the situation that the user wants a certain piece of media information to start playing from a certain point. The prior art normally uses fast-forward or rewind buttons, but with these the operator can only estimate the forward or backward position. The result is usually inaccurate and the buttons have to be pressed repeatedly. Although existing digital media recording formats provide catalog-style menu selection for choosing a particular piece, this still does not solve the problem of quickly setting the playback starting point the user expects.

In addition, music copyright is receiving more and more attention, and cases of musical plagiarism are reported from time to time. To exploit loopholes in the relevant legal provisions, some plagiarists slightly adjust the key or the beat of a melody so that it differs from the original song in form while its substance remains similar. The prior art proposes no method for determining how much similarity of this kind should be regarded as plagiarism.

Summary of the invention

The technical problem to be solved by the present invention is how to select the required multimedia information more effectively from a media storage body or from the internet, and how to control the starting point of media playback effectively and at will. The present invention uses phonetic feature extraction, segment extraction, similarity calculation and similarity judgment so that sound can control an electronic device or a network operation and the required multimedia information is obtained automatically. The technique of the present invention can also be used for automatic judgment of melody plagiarism or similarity. It can further provide an automatic page turning function for musical scores, so that the performer can concentrate on the performance without manually switching the pages of the score, as well as a singing CAL function.

Explanation of terms: the phonetic feature referred to herein is characteristic information related to the rhythm of the input sound. The rhythm is based on each recognizable syllable; that is, a section of multimedia contains many syllables, phonetic features are extracted on the basis of each syllable, and the features of the syllables combined in order constitute the overall rhythm or melody of that section of multimedia information. Any segment can therefore be cut out of the extracted feature combination and used as the basis for feature comparison in the present invention. When a section of sound input contains several melodies, either only the features of the main melody or the features of all the melodies may be extracted. "Phonetic feature" and "characteristic information" have the same meaning herein.

Explanation of terms: media information and multimedia information have the same meaning in the present invention. Both refer to voice information, music information, video information or data information that includes sound information, or any combination of these.

Explanation of terms: similarity in the present invention means data, obtained with a correlation algorithm, that expresses the degree of correlation between two pieces of information. The correlation algorithm may be a linear or a nonlinear correlation calculation method. Many mathematical models and calculation methods for both are available in existing mathematics and experimental physics, and they are cited here as prior art associated with the present invention.

To solve the above problems, the following technical solutions are proposed:

1. A first scheme of a multimedia playing apparatus, comprising a storage medium storing at least one piece of multimedia information and a multimedia signal output component, characterized in that it further comprises:

A sound input component, which may take sound through a sound sensor or take a pre-made sound file as input;

A characteristic extracting component, which extracts first characteristic information from the sound signal or information input through the sound input component;

A media information characteristic storage component, which stores second characteristic information corresponding to each piece of multimedia information;

A characteristic similarity computing component, which calculates the similarity between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;

A characteristic similarity judging component, which chooses the maximum similarity from the similarity data;

A multimedia information selecting component, which selects from the storage medium the piece of multimedia information containing the segment with the maximum similarity and sends it to the multimedia signal output component.

2. A second scheme of a multimedia playing apparatus, comprising a storage medium storing at least one piece of multimedia information and a multimedia signal output component, characterized in that it further comprises:

A media information characteristic calculating component, which calculates second characteristic information corresponding to each piece of multimedia information;

3, for the method for distilling of first characteristic information and second characteristic information, be example with the song of well known, can extract the theme characteristic information of this first song, as representing, comprised the information of tempo and tone in the numbered musical notation with numbered musical notation or staff.Can be the theme characteristic information as second characteristic information of the present invention; And different people is when singing out or groaning out this first song; Its tempo and/or tone maybe be different with tempo, tone that this first song itself is confirmed; Also maybe be different with tempo, the tone of second characteristic information in the message segment of record into multimedia messages; If but all be that their theme is to have very big similarity to same first singing songs.Therefore after carrying out beat adjustment and/or tone adjustment for second characteristic information, carry out similarity with first characteristic information again and calculate.Said melody also can be represented with staff or other melody.In the multimedia messages of music was handled, wherein a kind of music media form was a music score file, and this file is with the data mode stored sound of expression note, musical instrument and sharpness information, and most popular data layout is the MIDI data layout.The MIDI file comprises standard how to reproduce sound, can be considered to a music score of electronically readable form, the sound channel that will consider when it comprises the represented music score of relevant data of in each MIDI file of resetting, storing, used device and the information of the parameter of entering a higher school.Collective term " parameters,acoustic " expression for example defines, and pitch, note or its residual value are respectively the description that responds grade, velocity of sound, tone color or special-effect such as trill or reverberation.Therefore said MIDI 
file has comprised second characteristic information of wanting required for the present invention; Can be directed against each bar or the pairing MIDI file of each first multimedia messages as second characteristic information of the present invention; Accordingly; Same procedure is also adopted in the extraction of first characteristic information, and the MIDI file that extracts the input voice is as first characteristic information.Perhaps carry out one of characteristics such as data extract removal musical instrument, response grade, tone color trill, reverberation or several back again as second characteristic information of the present invention for each bar or the pairing MIDI file of each first multimedia messages; Accordingly; Same procedure is also adopted in the extraction of first characteristic information, and the MIDI file that extracts the input voice is removed one of characteristics such as velocity of sound, musical instrument, response grade, tone color trill, reverberation or several back as first characteristic information.
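
Removing selected features from MIDI-derived data can be sketched as follows. This is only an illustration: the note-event dictionaries and the names `KEPT_FIELDS` and `strip_events` are hypothetical, and a real MIDI file would first have to be parsed into such records by a MIDI library.

```python
# Hypothetical note-event records; not the format of any real MIDI parser.
KEPT_FIELDS = ("pitch", "duration")


def strip_events(events, kept=KEPT_FIELDS):
    """Keep only the fields used as characteristic information,
    dropping features such as velocity or instrument."""
    return [{key: event[key] for key in kept} for event in events]


events = [
    {"pitch": 60, "duration": 1.0, "velocity": 90, "instrument": "piano"},
    {"pitch": 62, "duration": 0.5, "velocity": 70, "instrument": "piano"},
]
features = strip_events(events)
```

The same reduction would be applied to the input sound and to the stored media so that both sides are compared on the same fields.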

US Patent No. 6,054,646 gives methods for extracting characteristic signals from a sound signal, including Mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It also describes a parameter mapping that converts MFCC features into a MIDI file. The content of US Patent No. 6,054,646 is incorporated herein in its entirety. In addition, software that converts a captured sound waveform file (WAVE) into a MIDI file is readily available on the internet, as is software that converts a MIDI file into numbered notation or into staff notation. The present invention builds on this existing knowledge to judge the correlation between the input sound information and the stored multimedia information. One implementation can be described as follows:

For the input sound signal, MFCC coefficients are extracted, a MIDI file is generated from the MFCC coefficients, the MIDI file is converted into a numbered notation file, and the numbered notation file is used as the first characteristic information. For the stored multimedia information, the same method is used: MFCC coefficients are extracted, a MIDI file is generated from them, the MIDI file is converted into a numbered notation file, and that file is used as the second characteristic information. The similarity between the first and second characteristic information is then calculated, and the functions required by the present invention are realized from the similarity result. Depending on the application, the first and second characteristic information may be transformed further. For example, the second characteristic information may also include a set of numbered notation files in various major keys generated from the numbered notation file of the multimedia information: if the original numbered notation file is in C major, numbered notation files in D major, E major, G major and so on can be generated as part of the second characteristic information. As another example, the second characteristic information may also include a set of numbered notation files in various time signatures generated from the numbered notation file of the multimedia information: if the original file is in 2/4 time, files in 4/4 time, 6/8 time and so on can be generated as part of the second characteristic information. As a further example, each tone in the numbered notation file can be represented by a number and adjacent identical tones merged into a single tone before the similarity is calculated, which removes similarity differences caused by the input sound being out of tune or having a different beat.
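
The last transformation, merging adjacent identical tones, can be sketched in Python as follows. The function name `merge_repeats` and the sample sequences are illustrative assumptions; the sequences stand for numbered-notation tones with one number per beat.

```python
def merge_repeats(notes):
    """Collapse every run of adjacent identical tones into a single tone."""
    merged = []
    for note in notes:
        if not merged or merged[-1] != note:
            merged.append(note)
    return merged


# The same phrase with different note lengths (assumed sample data).
stored = [1, 1, 5, 5, 6, 6, 5]
hummed = [1, 5, 5, 5, 6, 5, 5]
```

After merging, both sequences reduce to the same tone sequence, so differences in how long each tone is held no longer affect the comparison.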

In an optional implementation, the first and second characteristic information may simply be the MFCC or LPC coefficients, and the similarity is calculated directly on those coefficients; or they may simply be the MIDI files, and the similarity is calculated directly on the MIDI files.

4. The first characteristic information comprises sound tone information and/or tone variation information; the second characteristic information comprises the sound tone information and/or tone variation information contained in the multimedia information.

5. Alternatively, the first characteristic information comprises sound pitch information and/or pitch variation information; the second characteristic information comprises the sound pitch information and/or pitch variation information contained in the multimedia information.

6. A first scheme of a multimedia information processing method, for selecting the required multimedia information from a storage medium that stores at least one piece of multimedia information and the second characteristic information corresponding to each piece, characterized by comprising the following steps:

First step: input a sound signal or information through a sound input component;

Second step: extract first characteristic information from the sound signal or information input through the sound input component;

Third step: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;

Fourth step: choose the maximum similarity from the similarity data;

Fifth step: select from the storage medium the second characteristic information to which the segment with the maximum similarity belongs;

Sixth step: retrieve the corresponding piece of multimedia information from the storage medium according to that second characteristic information.
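
The third to sixth steps can be sketched in Python as a sliding-window search. This is a minimal illustration under stated assumptions: characteristic information is taken to be a list of numbers, the similarity is taken to be the Pearson linear correlation, and the names `pearson` and `best_match` as well as the sample library are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Linear (Pearson) correlation of two equal-length number sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / sqrt(var_a * var_b)


def best_match(query, library):
    """Return (title, offset, similarity) of the best-matching segment."""
    best = (None, -1, float("-inf"))
    for title, features in library.items():
        # Compare the query with every segment of the same length.
        for offset in range(len(features) - len(query) + 1):
            score = pearson(query, features[offset:offset + len(query)])
            if score > best[2]:
                best = (title, offset, score)
    return best


library = {
    "song_a": [1, 1, 5, 5, 6, 6, 5, 0, 4, 4, 3, 3, 2, 2, 1, 0],
    "song_b": [3, 2, 1, 2, 3, 3, 3, 0, 2, 2, 2, 0, 3, 5, 5, 0],
}
title, offset, score = best_match([6, 6, 5, 0, 4, 4], library)
```

The returned title identifies the piece to retrieve, and the offset gives the position of the matching segment inside its characteristic information.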

7. The method further comprises a step of outputting the corresponding piece of multimedia information.

8. The method further comprises a step of inputting multimedia information into the storage medium: the multimedia information is input into the storage medium from another medium, or downloaded into the storage medium over a wired or wireless network connection.

9. Further, the method also comprises a step of calculating the second characteristic information corresponding to the input multimedia information and storing it in the storage medium.

10. Alternatively, the method further comprises a step of directly inputting the multimedia information together with its corresponding second characteristic information into the storage medium.

11. The length of any segment of the second characteristic information is the same as the length of the first characteristic information, or is the same as the length of the first characteristic information after beat adjustment and/or key adjustment.

12. The second characteristic information and the first characteristic information are the rhythm or melody information of music.

13. Alternatively, the second characteristic information and the first characteristic information are rhythm or melody information from which the beat lengths have been removed.

14. The calculation method in the third step is a linear correlation calculation method. One implementation bases the first and second characteristic information on numbered notation, because numbered notation can usually represent a piece completely with notes in three octaves plus the beat. Every note is written as a digit from 1 to 7, marked as high or low where needed, and a rest is usually written as 0. When converting to the characteristic information of the present invention, the notes can be handled as follows: the high register (third octave) is represented by the numbers 8 to 15 (7 numbers in total), the low register (first octave) by the 7 numbers -7 to -1, the middle register (second octave) by the 7 numbers 1 to 7, and a rest by 0. In this implementation the characteristic information is thus converted into numerical information, with one number per beat. With a linear correlation calculation method the similarity of the first and second characteristic information is easy to calculate. Even if the pitch or key of the first characteristic information differs from that of the second, as long as the two are similar the pitch of every beat changes correspondingly. For example, the second characteristic information may be in the key of C and the first characteristic information in the key of B; because the number for every beat changes correspondingly with the key, the calculated similarity is still very high even though the individual numbers differ. The mathematics of linear similarity is a known algorithm and is not repeated here. Sometimes the beat represented by the first characteristic information of the input sound also differs from the beat of the second characteristic information of the multimedia information, for example the second characteristic information is in 2/4 time and the first is in 4/4 time, although the main melody they represent may be similar. The beat of the first and/or second characteristic information therefore needs to be adjusted before the similarity is calculated. The first adjustment method expands the data of one beat into several beats with the same data: if the data of a beat is 5, it can be adjusted into two beats, each with the value 5. The second adjustment method reduces two consecutive beats with identical data to one beat: if two consecutive beats both have the value 5, they can be adjusted into one beat with the value 5.
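
The key-independence of linear correlation and the two beat adjustment methods can be illustrated with a short Python sketch. It assumes the Pearson coefficient as the linear correlation measure and models a key change as a constant shift of every number; the names `linear_similarity`, `expand_beats` and `reduce_beats` are illustrative, not taken from the patent.

```python
from math import sqrt


def linear_similarity(a, b):
    """Pearson correlation; unchanged when a constant is added to a sequence."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a and var_b else 0.0


def expand_beats(seq, factor=2):
    """First adjustment method: repeat the data of each beat `factor` times."""
    return [value for value in seq for _ in range(factor)]


def reduce_beats(seq):
    """Second adjustment method: merge two consecutive identical beats into one."""
    out, i = [], 0
    while i < len(seq):
        out.append(seq[i])
        # Skip the partner beat when it carries the same data.
        i += 2 if i + 1 < len(seq) and seq[i] == seq[i + 1] else 1
    return out


stored = [1, 2, 3, 1, 1, 2, 3, 1]          # one number per beat
hummed = [value + 1 for value in stored]   # same melody, every number shifted
similarity = linear_similarity(stored, hummed)
```

Here `similarity` comes out at 1 (up to floating-point rounding) although no individual number matches, which is the behavior the paragraph above relies on.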

15. A second scheme of a multimedia information processing method, for selecting the required multimedia information from a storage medium that stores at least one piece of multimedia information, characterized by comprising the following steps:

Second step: extract first characteristic information from the sound signal input through the sound input component;

Third step: calculate the second characteristic information corresponding to each piece of multimedia information;

Fourth step: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each piece of multimedia information;

Fifth step: choose the maximum similarity from the similarity data;

Sixth step: retrieve the corresponding piece of multimedia information according to the second characteristic information to which the segment with the maximum similarity belongs.

The difference between the second scheme and the first scheme is whether the second characteristic information of each piece of multimedia information is stored in the storage body in advance or is calculated only when the application needs it.
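
The difference between the two schemes can be sketched as eager versus on-demand feature calculation. The class `FeatureStore`, the toy extractor and the digit-string media below are hypothetical stand-ins chosen only to make the example runnable.

```python
class FeatureStore:
    """Holds second characteristic information keyed by title."""

    def __init__(self, extract):
        self._extract = extract   # function: media -> characteristic information
        self._features = {}

    def precompute(self, library):
        """First scheme: calculate and store everything in advance."""
        for title, media in library.items():
            self._features[title] = self._extract(media)

    def get(self, title, media):
        """Second scheme: calculate only when needed, then remember it."""
        if title not in self._features:
            self._features[title] = self._extract(media)
        return self._features[title]


calls = []


def toy_extract(media):
    """Stand-in extractor: records each call and turns digits into numbers."""
    calls.append(media)
    return [int(ch) for ch in media]


library = {"song_a": "1155665", "song_b": "3212333"}
eager = FeatureStore(toy_extract)
eager.precompute(library)          # extraction runs once per piece, up front
lazy = FeatureStore(toy_extract)   # nothing is extracted yet
```

The eager store pays the extraction cost when media are added, while the lazy store defers it to the first query that needs a given piece.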

16. A first scheme of a multimedia information player operation method, for selecting the required multimedia information for playback from a storage medium that stores at least one piece of multimedia information and the second characteristic information corresponding to each piece, characterized by comprising the following steps:

Fourth step: choose the maximum similarity from the similarity data;

Sixth step: retrieve the corresponding piece of multimedia information from the storage medium according to the second characteristic information to which the segment belongs, and play it.

The second characteristic information corresponding to each piece of multimedia information may be a MIDI file, or a subset of the elements of a MIDI file.

17. A second scheme of a multimedia information player operation method, for selecting the required multimedia information from a storage medium that stores at least one piece of multimedia information, characterized by comprising the following steps:

Fifth step: choose the maximum similarity from the similarity data;

Sixth step: retrieve the corresponding piece of multimedia information according to the second characteristic information to which the segment with the maximum similarity belongs, and play it.

The technique of the present invention can also be used to judge the similarity of two songs, which is of considerable use in judging whether a piece of music has been plagiarized.

18. A music similarity judging method, which judges the similarity between a first piece of music and a second piece of music, characterized by comprising the following steps:

First step: input the first characteristic information of the multimedia information of the first piece of music, or input the multimedia information of the first piece of music and then extract the first characteristic information from it;

Second step: decompose the first characteristic information into a plurality of segments of a given length, each beginning at an arbitrary starting point;

Third step: input the second characteristic information of the multimedia information of the second piece of music, or input the multimedia information of the second piece of music and then extract the second characteristic information from it;

Fourth step: calculate similarity data between any one of the plurality of segments and any segment of the second characteristic information;

Fifth step: choose the maximum similarity from the similarity data;

Sixth step: judge whether the maximum similarity exceeds a set threshold; if it does, the first and second pieces of music are judged to be highly similar, otherwise their similarity is judged to be low.

The given length of the segments above can be tied to the definition in the relevant legal documents: for example, if 7 consecutive similar beats are defined as plagiarism, the length can be set to 7 beats.

The set threshold is determined by how strictly the relevant law is enforced. If only strict similarity counts as plagiarism, the threshold is set very high, close to 1; when enforcement is less strict, the threshold can be lowered appropriately, for example to 0.8 or 0.9.
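
The six steps, with a segment length of 7 beats and a threshold of 0.9, can be sketched as follows. The sketch assumes that characteristic information is a list with one number per beat and that similarity is the Pearson linear correlation; all function names and sample melodies are illustrative.

```python
from math import sqrt


def linear_similarity(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a and var_b else 0.0


def windows(seq, length):
    """All segments of the given length, one per possible starting point."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def max_similarity(first, second, length=7):
    """Highest similarity between any segment of `first` and any of `second`."""
    return max(
        (linear_similarity(x, y)
         for x in windows(first, length)
         for y in windows(second, length)),
        default=0.0,
    )


def is_highly_similar(first, second, length=7, threshold=0.9):
    """Sixth step: compare the maximum similarity with the set threshold."""
    return max_similarity(first, second, length) > threshold


first = [1, 2, 3, 1, 1, 2, 3, 1, 3, 4, 5, 0]
# Second piece that reuses 7 beats of the first one, shifted to another key.
copied = [5, 5, 6, 5] + [value + 2 for value in first[:7]] + [0, 4]
unrelated = [1, 7, 1, 7, 1, 7, 1, 7]
```

With this sample data the shifted copy is judged highly similar and the alternating sequence is not; a lower threshold such as 0.8 would make the judgment more permissive.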

19. A method for judging the similarity of music on the internet, characterized by comprising the following steps:

Third step: download from the internet the second characteristic information of the multimedia information of the second piece of music, or download the multimedia information of the second piece of music from the internet and then extract the second characteristic information from it;

Fifth step: choose the maximum similarity from the similarity data;

The technique of the present invention can also be used for media information search on the internet, providing a more effective search system and search method.

20. A first scheme of a network search system, comprising a remote server component and a near-end component connected to each other through the internet or a LAN, characterized in that:

The near-end component comprises:

A sound input component;

An information sending component, which transmits the first characteristic information to the remote server component over the network;

A first information receiving component, which receives the multimedia information sent by the remote server component;

The remote server component comprises:

A second information receiving component, which receives the first characteristic information sent from the near-end component;

A media information storage component, which stores at least one piece of multimedia information, the second characteristic information corresponding to each piece (either calculated and then stored, or stored in advance), and the correspondence between each piece of multimedia information and its second characteristic information;

A characteristic similarity judging component, which chooses from the similarity data the maximum similarity, or a plurality of similarity data whose similarity exceeds a set threshold;

A multimedia information selecting component, which selects from the media information storage body the one or more pieces of multimedia information corresponding to the second characteristic information containing the segment with the maximum similarity, or the segments whose similarity exceeds the set threshold, and sends them to the near-end component.

21. A multimedia information search method using the network search system of the first scheme, characterized by comprising the following operation steps:

Step 1: a sound signal or information is input at the near-end component;

Step 2: the near-end component extracts the first characteristic information of the sound signal or information;

Step 3: the first characteristic information is sent to the remote server component through the internet or a LAN;

Step 4: the remote server component calculates the similarity between the first characteristic information and the second characteristic information of each piece of media information stored in the remote server component;

Step 5: the remote server component retrieves, as the chosen multimedia information, the multimedia information corresponding to the second characteristic information that has the maximum similarity, or to the second characteristic information of a plurality of similarity data whose similarity exceeds a set threshold;

Step 6: the remote server component sends the chosen multimedia information to the near-end component through the internet or a LAN.

22. A second scheme of a network search system comprises a remote server component and a near-end component, said remote server component and said near-end component being connected through the internet or a LAN, characterized in that:

Said near-end component comprises:

A sound input component;

A download component, which downloads the second characteristic information of each piece of multimedia information from said remote server component;

A near-end storage component, which stores the second characteristic information of each piece of multimedia information obtained by the download component;

A selection component, which takes from said near-end storage component the second characteristic information corresponding to the similarity maximum, or to a plurality of similarity data whose similarity exceeds a set threshold;

An information sending component, which delivers the second characteristic information selected by the selection component to said remote server component over the network;

Said remote server component comprises:

A second information receiving component, which receives the second characteristic information sent by said near-end component;

A multimedia information selection component, which selects from said media information storage the one or more pieces of multimedia information corresponding to the second characteristic information received by said second information receiving component, and sends them to said near-end component.

23. A multimedia information search method implemented with the network search system of the second scheme, characterized by comprising the following operation steps:

Step 1: said near-end component downloads the second characteristic information of each piece of multimedia information from said remote server component through the internet or a LAN;

Step 2: a sound signal or information is input at said near-end component;

Step 3: said near-end component extracts first characteristic information from said sound signal or information;

Step 4: said near-end component calculates the similarity between said first characteristic information and the second characteristic information of each piece of media information;

Step 5: the second characteristic information corresponding to the similarity maximum, or to a plurality of similarity data whose similarity exceeds a set threshold, is sent to the remote server component through the internet or a LAN;

Step 6: according to the second characteristic information received, said remote server component retrieves the corresponding multimedia information from its storage as the selected multimedia information;

Step 7: said remote server component sends the selected multimedia information to the near-end component through the internet or a LAN.

24. A third scheme of a network search system comprises a remote server component and a near-end component, said remote server component and said near-end component being connected through the internet or a LAN, characterized in that:

Said near-end component comprises:

A sound input component;

An information sending component, which delivers the sound signal or information from said sound input component to said remote server component over the network;

Said remote server component comprises:

A second information receiving component, which receives the sound signal or information sent by said near-end component;

A characteristic extraction component, which extracts first characteristic information from the sound signal or information received by said second information receiving component;

A media information storage component, which stores at least one piece of multimedia information, the second characteristic information corresponding to each piece of multimedia information, and the correspondence between each piece of multimedia information and its second characteristic information;

25. A multimedia information search method implemented with the network search system of the third scheme, characterized by comprising the following operation steps:

Step 1: a sound signal or information is input at said near-end component;

Step 2: said sound signal or information is sent to the remote server component through the internet or a LAN;

Step 3: said remote server component extracts first characteristic information from the sound signal or information received;

26. Said multimedia information is one of the following or a combination thereof: text, pictures, sound, music, film, television.

The technology of the present invention can also be used in a device that turns pages automatically according to sound input, for example to turn the pages of a concert performer's music score.

27. An automatic page turning device comprises a media information storage, in which at least one piece of multimedia information is stored, and a display component, characterized by further comprising:

A sound input component;

A characteristic extraction component, which extracts first characteristic information from the sound signal input through said sound input component;

A media information characteristic storage component, which holds the second characteristic information corresponding to each piece of multimedia information, either calculated and then stored or stored in advance;

A characteristic similarity calculation component, which determines the current position in the multimedia information as the position of the information segment whose second characteristic information, within the part of the multimedia information shown by said display component, has the greatest similarity to said first characteristic information;

A page-turning judging component: when the current position in said multimedia information is the end of the part of the multimedia information shown by said display component, said display component shows the next page of said multimedia information.

28. A first scheme of a singing-assisted learning system comprises a storage in which at least one piece of multimedia information is stored, characterized by further comprising:

A sound input component;

A media information selection component, which selects the piece of multimedia information to be learned;

A media information characteristic extraction component, which extracts the second characteristic information of the selected multimedia information;

A characteristic similarity calculation component, which calculates and judges the similarity between said first characteristic information and the second characteristic information of the selected multimedia information;

An information prompting component, which indicates the differences between the input sound and the multimedia information according to the similarity.

29. A second scheme of a singing-assisted learning system comprises a storage in which at least one piece of multimedia information and the second characteristic information corresponding to each piece of multimedia information are stored, characterized by further comprising:

A sound input component;

A characteristic similarity calculation component, which calculates and judges the similarity between said first characteristic information and the second characteristic information of the selected multimedia information;

Beneficial effects of the present invention: with the technology of the present invention, the required multimedia information can be selected more effectively from a media storage or from the internet. By inputting part of the sound characteristic information of an item of media, for example by humming a fragment of a song, the complete information of that song can be retrieved. The technology can also determine more effectively whether one piece of music plagiarizes another.

The present invention uses sound feature extraction, segment-wise interception, similarity calculation and similarity judgment so that sound can control an electronic device or a network operation and the required multimedia information is obtained automatically. It can likewise judge automatically whether pieces of music are plagiarized or similar, turn the pages of a score automatically so that the performer can concentrate on playing without turning pages by hand, and provide a computer-aided singing-learning function.

A media player based on the present invention can, through sound input, select from the media player the media information most similar to the input sound. This changes the way existing media players are operated and locates media information more accurately. In most cases no manual operation is needed: media information can be selected, and the starting point of playback controlled, simply by speaking or singing. This greatly reduces the difficulty of operation, and even blind users or users unfamiliar with the player can operate it.

A media search system based on the present invention can, through sound input, select from the various media servers on the internet or a LAN the media information most similar to the input sound. This changes the way existing network search engines and search tools search and locates media information more accurately. In most cases no manual operation is needed: a search can be carried out simply by speaking or singing. This greatly reduces the difficulty of operation, and even blind users or users unfamiliar with computers can search for media information.

Description of drawings:

Fig. 1 is a schematic diagram of the working principle of a first system of the present invention for multimedia information retrieval.

Fig. 2 is a schematic diagram of the working principle of a second system of the present invention for multimedia information retrieval.

Fig. 3 is a schematic diagram of the working principle of the first algorithm of the present invention for calculating the similarity between first and second characteristic information.

Fig. 4 is a schematic diagram of the working principle of the second algorithm of the present invention for calculating the similarity between first and second characteristic information.

Fig. 5 is a schematic diagram of the working principle of the third algorithm of the present invention for calculating the similarity between first and second characteristic information.

Fig. 6 is a schematic workflow diagram of selecting multimedia information by sound input according to the present invention.

Fig. 7 is a schematic diagram of a first system of the present invention for selecting multimedia information from the internet by sound.

Fig. 8 is a schematic diagram of a second system of the present invention for selecting multimedia information from the internet by sound.

Fig. 9 is a schematic diagram of the principle of the automatic score page-turning system of the present invention.

Fig. 10 is a schematic diagram of the principle of the singing-assisted learning system of the present invention.

Fig. 11 is a schematic diagram of the principle of the media player of the present invention.

Fig. 12 is a schematic flow diagram of judging the similarity of two songs according to the present invention.

Embodiment:

The core of the present invention is to process the input sound information and extract first characteristic information, and then to calculate, with a specific algorithm, its similarity to the second characteristic information of the multimedia information. The piece of multimedia information with the greatest similarity is taken as the one the input sound was meant to select. When the multimedia information, the sound input and the sound processing components are all placed in one embedded system, portable devices based on the present invention can be designed, such as media players, palmtop computers, portable terminals and notebook computers. When the multimedia information is stored on a server and the sound is input at a client, the sound processing components can be integrated in either the server or the client. With the server and the client connected through a LAN or the internet, media search systems, music infringement judgment systems, singing learning systems and automatic score page-turning devices based on the present invention can be designed.

Specific embodiments of the present invention are described further below with reference to the accompanying drawings.

Fig. 1 shows a first implementation of a media playing system based on the present invention in which media is selected by sound input. In this scheme the characteristic similarity calculation component 105 has two inputs. One comes from the first characteristic information extraction component 103, which processes the sound information from the sound input component 101 and extracts characteristic information from it. The other comes from the segment interception component 104, which takes the characteristics of the media information from the media information characteristic storage component 102 and then intercepts an arbitrary segment of that second characteristic information. The characteristic similarity calculation component 105 outputs the similarity data it calculates to the characteristic similarity judging component 106, which screens and compares them and picks the second characteristic information to which the segment with the greatest similarity belongs; the multimedia information selection component 108 uses this to select the required multimedia information from the storage medium 107. The second characteristic information stored in the media information characteristic storage component 102 and the media information stored in the storage medium 107 correspond one to one, and this correspondence is also stored, either in component 102 or in the storage medium 107. In a concrete implementation, component 102 and the storage medium 107 can be merged into a single storage component, and the correspondence between second characteristic information and media information can be stored as a data table or in a database.

A typical sound input component 101 consists of a microphone, a microphone signal processing circuit and a circuit that digitizes and acquires the sound signal. The characteristics that the first characteristic information extraction component 103 extracts from the input sound are, for example, its rhythm information and pitch information, which can further be converted into score information and used as the characteristic. When the media playing system is implemented, the first characteristic information extraction component 103, the segment interception component 104, the characteristic similarity calculation component 105, the characteristic similarity judging component 106 and the multimedia information selection component 108 are all implemented in software by the processor of the media player.

The effect is that when a user wants the media player to play a certain item of media information, the user can hum a fragment of the music it contains into the microphone of the sound input component 101, and the media player, using the method of the present invention, automatically selects and plays the item closest to the hummed fragment. This spares the user the trouble of forgetting the title of the item or of navigating multilevel menus because there are too many items. The fragment the user hums may be quite inaccurate; it is enough that the basic rhythm is similar, which makes the approach practical, adaptable and easy to operate. A media player based on the present invention can thus, through sound input, select from the media player the media information most similar to the input sound. This changes the way existing media players are operated and locates media information more accurately. In most cases no manual operation is needed: media information can be selected simply by speaking or singing, which greatly reduces the difficulty of operation, and even blind users or users unfamiliar with the player can operate it.

Fig. 2 shows a second implementation of a media playing system based on the present invention in which media is selected by sound input. It differs from the scheme of Fig. 1 in that the second characteristic information is not stored in a storage in advance; instead, the media information characteristic calculation component 202 calculates it from the media information read from the storage medium 107. The advantage over the first scheme is that the algorithm of the media information characteristic calculation component 202 can be updated at any time to take in further research results on sound features, improving the efficiency of feature extraction or adjusting which features are extracted.

Fig. 3 is a schematic diagram of the first method, based on the present invention, of calculating the similarity between first and second characteristic information. In the figure, the first characteristic information 302 is assumed to be 4 bytes long, with its byte positions labelled a, b, c, d; the second characteristic information 301 is 16 bytes long, with its byte positions labelled 1 to 16. The second characteristic information is intercepted by taking, from any byte as the starting point, a run of bytes of the same length as the first characteristic information, and discarding starting points from which too few bytes remain. This gives 13 segments, whose byte positions are 1,2,3,4; 2,3,4,5; 3,4,5,6; 4,5,6,7; 5,6,7,8; 6,7,8,9; 7,8,9,10; 8,9,10,11; 9,10,11,12; 10,11,12,13; 11,12,13,14; 12,13,14,15; 13,14,15,16. Each segment is compared with the first characteristic information, giving the calculation result 303, which consists of 13 values R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13. For n pieces of multimedia information, supposing the second characteristic information of each has the same length of 16 bytes, the similarity calculation above gives 13*n values in total. The maximum of these 13*n values is chosen; the segment of second characteristic information that produced it identifies the second characteristic information, and the correspondence between media information and second characteristic information then yields the matching piece of media information.
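The segment-by-segment comparison described for Fig. 3 can be sketched as follows. This is only an illustration: the text does not say how a single similarity value is computed, so `match_ratio` (the fraction of byte positions that agree) is an assumed measure, and the function names and feature values are made up for the example.

```python
from typing import List, Sequence


def match_ratio(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity measure: fraction of positions where the features agree."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_similarities(first: Sequence[int], second: Sequence[int]) -> List[float]:
    """Compare `first` with every equal-length segment of `second`.

    Starting points with too few bytes left are skipped, so the result has
    len(second) - len(first) + 1 entries (R1, R2, ...).
    """
    n = len(first)
    return [match_ratio(first, second[i:i + n]) for i in range(len(second) - n + 1)]


# Hypothetical feature bytes: 4 for the hummed input, 16 for the stored item.
first_info = [5, 3, 2, 1]
second_info = [1, 1, 5, 6, 5, 3, 2, 1, 1, 2, 3, 4, 5, 6, 5, 3]

results = window_similarities(first_info, second_info)
# Index of the segment with the greatest similarity (0-based start position).
best_start = max(range(len(results)), key=results.__getitem__)
```

With a 4-byte input and a 16-byte stored sequence, `results` holds 13 values, matching R1 to R13 in the figure; repeating the call for each of n stored items gives the 13*n values from which the maximum is taken.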

People humming the same tune, or speaking the same words, do not necessarily do so at the same speed, so the rhythm features of what is hummed or spoken may differ in length from the same fragment of the rhythm features of the media information. For example, a syllable that lasts a single quarter beat in the media information may last two quarter beats when hummed or spoken, or a syllable that lasts two quarter beats in the media information may last a single quarter beat when hummed or spoken. To make the similarity calculation more tolerant and more reliable, the calculation therefore also covers the case in which adjacent identical feature bytes of the first and/or second characteristic information are merged into one feature byte. Fig. 4 is a schematic diagram of the second method, based on the present invention, of calculating the similarity between first and second characteristic information. In this figure, the first characteristic information 402 and the second characteristic information 401 are first compared without any merging, in the manner of Fig. 3, giving the similarity result 403, which consists of 13 values R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13. The second characteristic information 401 in the figure contains adjacent identical features in two places, feature 2 and feature 6. Merging adjacent identical features into one feature turns the second characteristic information into the merged second characteristic information 404. The first characteristic information 402 is then compared with the merged information 404 by the same similarity calculation method, giving result 405, which consists of 10 values R14, R15, R16, R17, R18, R19, R20, R21, R22, R23. For n pieces of multimedia information, the same processing and calculation are carried out and the maximum is chosen; the segment of second characteristic information that produced it identifies the second characteristic information, and the correspondence between media information and second characteristic information then yields the matching piece of media information.
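The merging of adjacent identical features can be sketched in a few lines. This is a minimal illustration with made-up feature values; `merge_adjacent` is a hypothetical helper name, not one used in the text.

```python
from itertools import groupby
from typing import List, Sequence


def merge_adjacent(features: Sequence[int]) -> List[int]:
    """Collapse each run of identical neighbouring features into one feature."""
    return [value for value, _run in groupby(features)]


# Hypothetical stored features in which 2 and 6 are each held for two units.
second_info = [1, 2, 2, 3, 5, 6, 6, 5]
merged_info = merge_adjacent(second_info)
```

Only neighbouring repeats are collapsed: the two separate occurrences of 5 in the example both survive, so a note that recurs later in the tune is not lost.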

Fig. 5 is a schematic diagram of the third method, based on the present invention, of calculating the similarity between first and second characteristic information. Unlike Fig. 4, here it is the first characteristic information that contains adjacent identical features to be merged. The original first characteristic information 502 is first compared with the second characteristic information 501, giving result 503, which consists of 13 values R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13. The merged first characteristic information 504 is then compared with the second characteristic information 501, giving result 505, which consists of 14 values R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27. For n pieces of multimedia information, the same processing and calculation are carried out and the maximum is chosen; the segment of second characteristic information that produced it identifies the second characteristic information, and the correspondence between media information and second characteristic information then yields the matching piece of media information.

When both the first and the second characteristic information contain features that can be merged, four cases are calculated: the first characteristic information against the second characteristic information directly; the merged first characteristic information against the second characteristic information; the first characteristic information against the merged second characteristic information; and the merged first characteristic information against the merged second characteristic information.
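The four cases can be run as one loop over the original and merged forms of both sequences. The sketch below assumes the same illustrative similarity measure as before (fraction of agreeing positions); the helper names and feature values are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def merge_adjacent(features: Sequence[int]) -> List[int]:
    """Collapse each run of identical neighbouring features into one feature."""
    return [value for value, _run in groupby(features)]


def best_window_score(first: Sequence[int], second: Sequence[int]) -> float:
    """Greatest similarity of `first` against any equal-length segment of `second`."""
    n = len(first)
    if n == 0:
        return 0.0
    scores = [
        sum(a == b for a, b in zip(first, second[i:i + n])) / n  # assumed measure
        for i in range(len(second) - n + 1)
    ]
    return max(scores, default=0.0)


def best_of_four(first: Sequence[int], second: Sequence[int]) -> float:
    """Try original and merged forms of both sequences and keep the best score."""
    firsts = [list(first), merge_adjacent(first)]
    seconds = [list(second), merge_adjacent(second)]
    return max(best_window_score(f, s) for f in firsts for s in seconds)


hummed = [1, 1, 2, 3]        # first note held twice as long as written
stored = [5, 1, 2, 3, 3, 5]  # hypothetical second characteristic information
direct = best_window_score(hummed, stored)
combined = best_of_four(hummed, stored)
```

In this example the direct comparison is penalised by the held first note, while the merged form of the hummed input lines up exactly with a segment of the stored sequence, which is the tolerance to tempo differences that the merging is meant to give.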

Fig. 6 is a schematic flow diagram, based on the present invention, of selecting multimedia information by sound input. The figure gives a further implementation example in which MFCC coefficients are extracted and converted into a MIDI file, which is then converted into numbered musical notation used as the characteristic information. The flow is as follows. In step 601 a sound signal is input, for example by humming a passage. In step 602 the MFCC coefficients of the input sound signal are extracted; step 603 converts the MFCC coefficients into a MIDI file; step 604 converts this into numbered notation; and step 605 generates the first characteristic information. It is assumed that the media storage already holds the MIDI file corresponding to each piece of multimedia information; if it does not, the MIDI files can be produced by conversion first. Step 606 reads the MIDI file of the first piece of multimedia information, step 607 converts it into numbered notation, step 608 generates the second characteristic information, and step 609 calculates the similarity between the first and second characteristic information. Step 610 checks whether this is the last piece of multimedia information. If it is not, step 614 reads the MIDI file of the next piece of multimedia information and steps 607, 608, 609 and 610 are repeated. If it is, step 611 determines the MIDI file corresponding to the similarity maximum, step 612 reads the multimedia file associated with that MIDI file, and step 613 outputs the selected multimedia file.
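The selection loop of steps 604 to 613 can be sketched as below. Several things here are assumptions made only for illustration: each MIDI file is represented simply as a list of MIDI note numbers, the tune is taken to be in C major so that pitch classes map onto the degrees 1 to 7 of numbered notation, and similarity is the fraction of agreeing positions. MFCC extraction and the MFCC-to-MIDI conversion (steps 602 and 603) are not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: C major, so pitch class (note % 12) maps to scale degrees 1..7.
DEGREE_OF_PITCH_CLASS = {0: 1, 2: 2, 4: 3, 5: 4, 7: 5, 9: 6, 11: 7}


def to_numbered_notation(midi_notes: Sequence[int]) -> List[int]:
    """Steps 604/607: map note numbers to degrees; 0 marks a note outside the scale."""
    return [DEGREE_OF_PITCH_CLASS.get(note % 12, 0) for note in midi_notes]


def best_window_score(first: Sequence[int], second: Sequence[int]) -> float:
    """Step 609: greatest similarity of `first` against any segment of `second`."""
    n = len(first)
    if n == 0:
        return 0.0
    scores = [
        sum(a == b for a, b in zip(first, second[i:i + n])) / n
        for i in range(len(second) - n + 1)
    ]
    return max(scores, default=0.0)


def select_item(hummed: Sequence[int], library: Dict[str, Sequence[int]]) -> Tuple[str, float]:
    """Steps 605 to 611: score every stored item and keep the one with the maximum."""
    first = to_numbered_notation(hummed)
    best_name, best_score = "", -1.0
    for name, midi_notes in library.items():  # steps 606 and 614
        score = best_window_score(first, to_numbered_notation(midi_notes))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical library keyed by MIDI file name.
library = {
    "scale.mid": [60, 62, 64, 65, 67, 69, 71, 72],
    "tune.mid": [64, 62, 60, 62, 64, 64, 64],
}
selected = select_item([76, 74, 72, 74], library)  # hummed one octave higher
```

Because the mapping uses only the pitch class, a fragment hummed an octave above the stored tune still matches; the returned file name stands in for the lookup of the associated multimedia file in steps 612 and 613.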

Fig. 7 is a schematic diagram of a first media search system based on the present invention. The media search system comprises a server end 700 and a client 710; the client 710 connects to the server end 700 through the internet or a LAN 704. The server end 700 comprises a media information database 701, a media access processing component 702 and a network interface 703; the client 710 comprises an information display component 706, a sound input component 707, a sound signal processing component 708 and a network interface 705. The user inputs sound through the sound input component 707, for example by humming a fragment of a tune, or copies in a sound file made in advance. The sound signal processing component 708 processes it, which includes digitizing the sound signal and extracting its first characteristic information, and the extracted first characteristic information is sent through the network interface 705 into the internet or LAN 704. The network interface 703 of the server end 700 receives the first characteristic information and passes it to the media access processing component 702. The media access processing component 702 takes the second characteristic information of each piece of media information from the media information database 701, uses the similarity calculation method to calculate the similarity between the received first characteristic information and each segment of each second characteristic information, and chooses the second characteristic information corresponding to the similarity maximum. Using the correspondence between each piece of media information and its second characteristic information, it then takes from the media information database 701 the media information associated with that second characteristic information and sends the selected media information through the network interface 703 into the internet or LAN 704. The network interface 705 of the client 710 receives this media information and passes it to the sound signal processing component 708, which delivers it to the information display component 706 for presentation.

If the media information is purely music, the information display component 706 can be a sound output amplifier with a loudspeaker or earphones. If the media information is video containing music, the information display component 706 can be a combination of a display screen with a sound output amplifier and a loudspeaker or earphones. If the received media information contains several candidate items, they can be listed as entries on the display screen of the information display component 706 for the user to choose from. A media search system based on the present invention can thus, through sound input, select from the various media servers on the internet or a LAN the media information most similar to the input sound. This changes the way existing network search engines and search tools search and locates media information more accurately. In most cases no manual operation is needed: a search can be carried out simply by speaking or singing, which greatly reduces the difficulty of operation, and even blind users or users unfamiliar with computers can search for media information.
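Returning either the single best match or several candidates above a threshold, as the schemes and the Fig. 7 description allow, can be sketched as follows. The function name, the threshold value and the scores are hypothetical, and falling back to the single best item when nothing clears the threshold is a design choice made for this example rather than something stated in the text.

```python
from typing import Dict, List, Tuple


def choose_candidates(scores: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Return the items whose similarity exceeds `threshold`, best first.

    If no item exceeds the threshold, return only the item with the
    similarity maximum (assumed fallback).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    above = [item for item in ranked if item[1] > threshold]
    return above if above else ranked[:1]


# Hypothetical best-segment similarity of each stored item on the server.
scores = {"song_a": 0.9, "song_b": 0.4, "song_c": 0.8}
several = choose_candidates(scores, threshold=0.75)
single = choose_candidates(scores, threshold=0.95)
```

When more than one candidate comes back, the client can list them as entries on its display for the user to choose from.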

Fig. 8 is a schematic diagram of a second media search system based on the present invention. The media search system comprises a server end 800 and a client 810; the client 810 connects to the server end 800 through the internet or a LAN 704. The server end 800 comprises a media information database 701, a media access processing component 802 and a network interface 703; the client 810 comprises an information display component 706, a sound input component 707, a sound signal processing component 808, a network interface 705 and a local second characteristic information storage component 809. Before a sound search is carried out, the client 810 first downloads the second characteristic information corresponding to each piece of media information from the server end 800 through the internet or LAN 704 and stores it in the second characteristic information storage component 809. The user inputs sound through the sound input component 707, for example by humming a fragment of a tune, or copies in a sound file made in advance. The sound signal processing component 808 processes it, which includes digitizing the sound signal and extracting its first characteristic information. The sound signal processing component 808 then reads the second characteristic information of each piece of media information from the second characteristic information storage component 809, uses the similarity calculation method to calculate the similarity between the extracted first characteristic information and each segment of each second characteristic information, and chooses the second characteristic information corresponding to the similarity maximum. The chosen second characteristic information is sent through the network interface 705 into the internet or LAN 704; the network interface 703 of the server end 800 receives it and passes it to the media access processing component 802. Using the correspondence between each piece of media information and its second characteristic information, the media access processing component 802 takes from the media information database 701 the media information associated with the received second characteristic information and sends the selected media information through the network interface 703 into the internet or LAN 704. The network interface 705 of the client 810 receives this media information and passes it to the sound signal processing component 808, which delivers it to the information display component 706 for presentation.

If the media information is purely music, the information display component 706 can be a sound output amplifier with a loudspeaker or earphones. If the media information is video containing music, the information display component 706 can be a combination of a display screen with a sound output amplifier and a loudspeaker or earphones. If the received media information contains several candidate items, they can be listed as entries on the display screen of the information display component 706 for the user to choose from. A media search system based on the present invention can thus, through sound input, select from the various media servers on the internet or a LAN the media information most similar to the input sound. This changes the way existing network search engines and search tools search and locates media information more accurately. In most cases no manual operation is needed: a search can be carried out simply by speaking or singing, which greatly reduces the difficulty of operation, and even blind users or users unfamiliar with computers can search for media information.

Fig. 9 is a schematic diagram of an automatic music score page turning system implemented according to the present invention. The automatic music score page turning system comprises a music score display unit 901, a processing component 902, and a voice input component 903. The processing component 902 comprises a memory storing music score information, a processor, and a memory storing program software. The voice input component 903 comprises a microphone for collecting sound and a circuit for digitizing, acquiring, and storing the sound. The music score display unit 901 is an electronic display, such as a liquid crystal display, an organic light-emitting diode display, or an electronic paper display. When a piece is played, the music score display unit 901, under the control of the processing component 902, shows the first page of the score of the corresponding piece. During the performance, the voice input component 903 continuously captures the sound being played; the processing component 902 extracts the rhythm of the sound as the first characteristic information and calculates its similarity with segments of the pre-stored second characteristic information of the piece being played. The position reached in the score can be determined from the maximum similarity. As soon as the processing component 902 determines that the score content shown on the display unit 901 has been played to the end, it automatically shows the next page of the score on the display unit 901, which spares the player the brief interruption of the performance caused by turning the page by hand. The tempo at which a concert performer plays is usually very close to that of the score, so adjacent identical features need not be merged when the similarity is calculated.
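The page turning decision described above can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: features are assumed to be integer note codes, similarity is the fraction of matching positions, and the `min_score` guard and the names `locate_end` and `should_turn_page` are hypothetical additions.

```python
from typing import Sequence, Tuple


def locate_end(played: Sequence[int], page: Sequence[int]) -> Tuple[float, int]:
    """Return (max similarity, end offset) of the best-matching window on the page.

    The end offset is the index just past the matched window, i.e. the position
    in the displayed page that the performer is assumed to have reached.
    Returns (0.0, -1) if the played excerpt is empty or longer than the page.
    """
    n = len(played)
    if n == 0 or n > len(page):
        return 0.0, -1
    best_score, best_end = -1.0, -1
    for start in range(len(page) - n + 1):
        window = page[start:start + n]
        score = sum(p == q for p, q in zip(played, window)) / n
        if score > best_score:
            best_score, best_end = score, start + n
    return best_score, best_end


def should_turn_page(
    played: Sequence[int], page: Sequence[int], min_score: float = 0.8
) -> bool:
    """Turn the page when the best match ends exactly at the end of the displayed page.

    min_score is an assumed guard against turning on a poor match.
    """
    score, end = locate_end(played, page)
    return score >= min_score and end == len(page)


# Hypothetical second characteristic information for the displayed page.
page = [1, 2, 3, 5, 5, 6, 7, 1]
```

Here `played` would hold the most recent features extracted from the microphone input; the check is repeated as new sound arrives.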

Figure 10 is a schematic diagram of a singing assisted-learning system implemented according to the present invention. The singing assisted-learning system comprises a display unit 1001, a processing component 1002, and a voice input component 1003. The processing component 1002 comprises a memory storing musical piece information, a processor, and a memory storing program software. The voice input component 1003 comprises a microphone for collecting sound and a circuit for digitizing, acquiring, and storing the sound. The display unit 1001 is an electronic display, such as a liquid crystal display, an organic light-emitting diode display, or an electronic paper display. When a piece is sung or played, the display unit 1001 shows the score of the corresponding piece under the control of the processing component 1002. During the singing or playing, the voice input component 1003 continuously captures the sound, and the processing component 1002 extracts the rhythm of the sound as the first characteristic information. After the piece ends, the extracted first characteristic information is compared syllable by syllable with the pre-stored second characteristic information of the piece, and the similarity results give the difference between each syllable as sung or played and the corresponding syllable of the standard piece. The processing component 1002 shows these differences on the display unit 1001. The singer or player finds mistakes from the displayed syllable differences and adjusts the performance accordingly, which achieves the purpose of assisted learning.
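The syllable-by-syllable comparison described above can be sketched as follows. The patent does not specify how the performed and reference sequences are aligned; this sketch swaps in the standard-library `difflib.SequenceMatcher` for the alignment, assumes each syllable is represented by an integer note code, and uses the hypothetical name `note_differences`.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def note_differences(
    reference: Sequence[int], performed: Sequence[int]
) -> List[Tuple[int, List[int], List[int]]]:
    """List the places where the performance departs from the reference.

    Each entry is (index in reference, expected notes, performed notes).
    An empty "performed" list means notes were skipped; an empty "expected"
    list means extra notes were added.
    """
    matcher = SequenceMatcher(None, reference, performed, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((i1, list(reference[i1:i2]), list(performed[j1:j2])))
    return diffs


# Hypothetical reference melody and a performance with one wrong note.
wrong_note = note_differences([1, 2, 3, 4, 5], [1, 2, 4, 4, 5])
skipped_note = note_differences([1, 2, 3], [1, 3])
```

The returned list is what a display step could render next to the score so that the learner sees which notes to correct.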

Figure 11 is a schematic diagram of a media player implemented according to the present invention. The media player 1100 comprises a processor host 1101, a control button unit 1102, an earphone 1103, and a microphone 1104. The processor host 1101 is connected to the control button unit 1102, the earphone 1103, and the microphone 1104 by a connecting lead 1105. The connection carries signals in both directions: the button signals of the control button unit 1102 and the voice signal input through the microphone 1104 are sent to the processor host 1101, and the output signal of the processor host 1101 is output to the earphone 1103. In other implementations, the processor host 1101 is connected wirelessly to the control button unit 1102, the earphone 1103, and the microphone 1104, for example using Bluetooth or WiFi; both wired and wireless connections are existing mature technologies. The processor host 1101 comprises a memory 1105 storing the media information and its second characteristic information, and an information processor 1106. The control button unit 1102 carries a first button 1107 and a second button 1108. People using a media player often hum along with the music of the medium being played, and the player of the present invention also relies on the operator humming a media segment to select a medium and to control its playback start point. The first button 1107 and the second button 1108 on the control button unit 1102 let the media player tell whether the user is humming along with the medium being played or humming a segment in order to make the player reselect a medium or a playback start point. Humming a segment while the first button 1107 is pressed indicates selecting a medium; humming a segment while the second button 1108 is pressed indicates selecting the playback start point of a medium; when neither the first button 1107 nor the second button 1108 is pressed, the user is humming along with the medium being played. The button signals of the control button unit 1102 are sent to the processor host 1101, which makes the determination. If the operator presses the first button 1107, the information processor 1106 processes the voice information from the microphone 1104 and extracts characteristic information from it, retrieves the characteristic information of the media information from the memory 1105, and compares the similarity values calculated for the intercepted segments of characteristic information. It takes the second characteristic information to which the segment with the maximum similarity belongs as the basis for selecting the required multimedia information, then selects the media information according to the correspondence between second characteristic information and media information and plays it. If the operator presses the second button 1108, the information processor 1106 processes the voice information from the microphone 1104 and extracts characteristic information from it, retrieves the characteristic information of the media information from the memory 1105, and compares the similarity values calculated for the intercepted segments of characteristic information. It takes the second characteristic information to which the segment with the maximum similarity belongs as the basis for selecting the required multimedia information, then selects the media information according to the correspondence between second characteristic information and media information and begins playback from the position of the segment of second characteristic information with the maximum similarity. This realizes both media selection and automatic location of the playback start point in the media player.
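The two-button logic described above can be sketched as follows. This is an illustration under assumptions: features are integer note codes, similarity is the fraction of matching positions, the first button is assumed to take priority if both are pressed (the source does not cover that case), and the names `HumMode`, `hum_mode`, `best_window`, and `handle_hum` are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Sequence, Tuple


class HumMode(Enum):
    SELECT_MEDIA = 1   # first button held: humming selects a medium
    SELECT_START = 2   # second button held: humming selects the start point
    HUM_ALONG = 3      # no button held: user is just humming along


def hum_mode(first_pressed: bool, second_pressed: bool) -> HumMode:
    """Map button state to a mode; first button wins if both are held (assumption)."""
    if first_pressed:
        return HumMode.SELECT_MEDIA
    if second_pressed:
        return HumMode.SELECT_START
    return HumMode.HUM_ALONG


def best_window(query: Sequence[int], reference: Sequence[int]) -> Tuple[float, int]:
    """Return (max similarity, start offset) over all full-length windows, or (-1.0, -1)."""
    n = len(query)
    best_score, best_start = -1.0, -1
    if n == 0:
        return best_score, best_start
    for start in range(len(reference) - n + 1):
        score = sum(x == y for x, y in zip(query, reference[start:start + n])) / n
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def handle_hum(
    query: Sequence[int],
    library: Dict[str, Sequence[int]],
    first_pressed: bool,
    second_pressed: bool,
) -> Optional[Tuple[str, int]]:
    """Return (media id, start offset) to play, or None when the hum is ignored."""
    mode = hum_mode(first_pressed, second_pressed)
    if mode is HumMode.HUM_ALONG:
        return None
    best_id, best_score, best_start = None, -1.0, 0
    for media_id, features in library.items():
        score, start = best_window(query, features)
        if start >= 0 and score > best_score:
            best_id, best_score, best_start = media_id, score, start
    if best_id is None:
        return None
    return (best_id, 0 if mode is HumMode.SELECT_MEDIA else best_start)


# Hypothetical stored second characteristic information.
library = {"a": [1, 2, 3, 4, 5, 6], "b": [5, 5, 3, 2, 1, 1]}
```

With the first button held the selected medium plays from its beginning; with the second button held it plays from the offset of the best-matching segment.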

Figure 12 is a flow diagram of how the present invention determines the similarity of two songs. The figure gives a further implementation example in which MFCC coefficients are extracted and converted into a MIDI file, which is then converted into numbered musical notation information that serves as the characteristic information. The detailed flow is as follows. The first music is input in step 1201; MFCC coefficients are extracted from the first music in step 1202; the resulting MFCC coefficients are converted into a MIDI file in step 1203 and then into numbered musical notation information in step 1204; the first characteristic information is generated in step 1205. The second music is processed in the same way: the second music is input in step 1206; MFCC coefficients are extracted from the second music in step 1207; the resulting MFCC coefficients are converted into a MIDI file in step 1208 and then into numbered musical notation information in step 1209; the second characteristic information is generated in step 1210. The similarity of the first characteristic information and the second characteristic information is then calculated in step 1211, the maximum similarity is chosen from the similarity data in step 1212, and step 1213 determines whether the maximum similarity exceeds a threshold. If it exceeds the threshold, step 1214 concludes that the similarity between the first music and the second music is high; if it does not, step 1215 concludes that the similarity between the first music and the second music is low.
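The later steps of this flow (numbered notation, similarity, threshold decision) can be sketched as follows. The MFCC extraction and MFCC-to-MIDI conversion are not shown; the sketch assumes MIDI note numbers are already available. It further assumes a key of C (1 = C), ignores octaves, slides the shorter sequence over the longer one, and uses a hypothetical threshold of 0.8. The names `midi_to_numbered`, `max_similarity`, and `compare_songs` are illustrative.

```python
from typing import List, Sequence

# Numbered notation for the 12 pitch classes with 1 = C (assumption); sharps marked "#".
JIANPU = ["1", "#1", "2", "#2", "3", "4", "#4", "5", "#5", "6", "#6", "7"]


def midi_to_numbered(notes: Sequence[int]) -> List[str]:
    """Convert MIDI note numbers to numbered-notation symbols, ignoring the octave."""
    return [JIANPU[n % 12] for n in notes]


def max_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Slide the shorter sequence over the longer one; return the best match fraction."""
    query, reference = (a, b) if len(a) <= len(b) else (b, a)
    n = len(query)
    if n == 0:
        return 0.0
    best = 0.0
    for start in range(len(reference) - n + 1):
        matches = sum(x == y for x, y in zip(query, reference[start:start + n]))
        best = max(best, matches / n)
    return best


def compare_songs(
    first_midi: Sequence[int], second_midi: Sequence[int], threshold: float = 0.8
) -> str:
    """Return "high" if the maximum similarity exceeds the threshold, else "low"."""
    first = midi_to_numbered(first_midi)     # corresponds to steps 1204-1205
    second = midi_to_numbered(second_midi)   # corresponds to steps 1209-1210
    return "high" if max_similarity(first, second) > threshold else "low"
```

For example, a five-note ascending phrase and the same phrase one octave higher with an extra note map to the same numbered symbols, so the comparison reports high similarity.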

Claims

1. An automatic page turning device, comprising a storage medium storing at least one piece of multimedia information and a display unit, characterized in that it further comprises:

Sound input component;

Characteristic extracting component, which extracts first characteristic information from the sound signal input by said sound input component; when said first characteristic information comprises MIDI data, one or more of the musical instrument, response grade, tone color trill, and reverberation characteristics in the MIDI data are removed from said first characteristic information;

Medium information characteristic memory unit, which stores second characteristic information corresponding to each piece of said multimedia information; when said second characteristic information comprises MIDI data, one or more of the musical instrument, response grade, tone color trill, and reverberation characteristics in the MIDI data are removed from said second characteristic information;

Characteristic similarity calculating unit, used to calculate and determine the current position in the multimedia information, that is, the position corresponding to the information segment which, within the second characteristic information of the part of the multimedia information shown by said display unit, has the maximum similarity with said first characteristic information; said similarity calculation includes beat adjustment of the first characteristic information and/or the second characteristic information; and/or said similarity calculation includes tone adjustment of the second characteristic information; and/or said similarity calculation includes merging adjacent identical features of the first characteristic information and/or the second characteristic information;

Said similarity calculation method comprises: for the second characteristic information of the multimedia information shown by said display unit, taking any byte of the second characteristic information as a starting point, intercepting a number of bytes equal to the length of the first characteristic information, and discarding intercepted segments that fall short of that length; performing a similarity calculation between each intercepted segment and the first characteristic information to obtain calculation results; and choosing the maximum value from said calculation results; or

Said similarity calculation method comprises: for the second characteristic information of the multimedia information shown by said display unit, taking any byte of the second characteristic information as a starting point, intercepting a number of bytes equal to the length of the first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the first characteristic information to obtain calculation results; and, after adjacent identical features of the second characteristic information of the multimedia information shown by said display unit are merged into one feature, taking any byte of the merged second characteristic information as a starting point, intercepting a number of bytes equal to the length of the first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the first characteristic information to obtain calculation results; and then choosing the maximum value from all the calculation results; or

Said similarity calculation method comprises: for the second characteristic information of the multimedia information shown by said display unit, taking any byte of the second characteristic information as a starting point, intercepting a number of bytes equal to the length of the first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the first characteristic information to obtain calculation results; and, after adjacent identical features of the first characteristic information are merged into one feature, taking any byte of the second characteristic information of the multimedia information shown by said display unit as a starting point, intercepting a number of bytes equal to the length of the merged first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the merged first characteristic information to obtain calculation results; and then choosing the maximum value from all the calculation results; or

Said similarity calculation method comprises: for the second characteristic information of the multimedia information shown by said display unit, taking any byte of the second characteristic information as a starting point, intercepting a number of bytes equal to the length of the first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the first characteristic information to obtain calculation results; and, after adjacent identical features of the second characteristic information of the multimedia information shown by said display unit are merged into one feature, taking any byte of the merged second characteristic information as a starting point, intercepting a number of bytes equal to the length of the first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the first characteristic information to obtain calculation results; and, after adjacent identical features of the first characteristic information are merged into one feature, taking any byte of the second characteristic information of the multimedia information shown by said display unit as a starting point, intercepting a number of bytes equal to the length of the merged first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the merged first characteristic information to obtain calculation results; and, after adjacent identical features of the first characteristic information are merged into one feature and adjacent identical features of the second characteristic information of the multimedia information shown by said display unit are likewise merged into one feature, taking any byte of the merged second characteristic information as a starting point, intercepting a number of bytes equal to the length of the merged first characteristic information, discarding intercepted segments that fall short of that length, and performing a similarity calculation between each intercepted segment and the merged first characteristic information to obtain calculation results; and then choosing the maximum value from all the calculation results;

Page turning decision component: when the current position of said multimedia information is the end of the current page of the multimedia information shown by said display unit, said display unit shows the next page of said multimedia information.
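The similarity calculation recited in claim 1, in its broadest alternative (raw and merged forms of both sequences), can be illustrated with the following sketch. It is not part of the claim: features are assumed to be integer codes rather than raw bytes, the fraction of matching positions stands in for the unspecified similarity measure, and the names `merge_adjacent`, `window_scores`, and `max_similarity_all_variants` are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def merge_adjacent(features: Sequence[int]) -> List[int]:
    """Collapse each run of adjacent identical features into a single feature."""
    return [key for key, _ in groupby(features)]


def window_scores(query: Sequence[int], reference: Sequence[int]) -> List[float]:
    """Score every full-length window of the reference against the query.

    Windows that would be shorter than the query are not generated.
    """
    n = len(query)
    if n == 0:
        return []
    return [
        sum(x == y for x, y in zip(query, reference[s:s + n])) / n
        for s in range(len(reference) - n + 1)
    ]


def max_similarity_all_variants(first: Sequence[int], second: Sequence[int]) -> float:
    """Maximum score over the four raw/merged combinations of the two sequences."""
    merged_first = merge_adjacent(first)
    merged_second = merge_adjacent(second)
    pairs = [
        (first, second),
        (first, merged_second),
        (merged_first, second),
        (merged_first, merged_second),
    ]
    scores = [s for query, reference in pairs for s in window_scores(query, reference)]
    return max(scores, default=0.0)


# Hypothetical input: held notes in the performance appear as repeated features.
first = [1, 1, 2, 2, 3]
second = [5, 1, 2, 3, 5]
```

In this example the raw comparison yields a single window scoring 0.4, while merging the repeated features of the performed sequence exposes an exact match, which is why the merged variants are included before the maximum is taken.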

2. The device according to claim 1, characterized in that said first characteristic information comprises acoustic tone information and/or inflection information; and said second characteristic information comprises the acoustic tone information and/or inflection information contained in the multimedia information.

3. The device according to claim 1, characterized in that said first characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, cepstral coefficient data, linear predictive coding data; and said second characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, cepstral coefficient data, linear predictive coding data, prosodic information of the music, melodic information of the music.

4. The device according to claim 1, 2, or 3, characterized in that said storage medium and said medium information characteristic memory unit are independent memories or a shared memory; and said storage medium and/or said medium information characteristic memory unit also store the correspondence between each piece of multimedia information and its second characteristic information.

5. An automatic page turning device, comprising a storage medium storing at least one piece of multimedia information and a display unit, characterized in that it further comprises:

Sound input component;

Medium information characteristic calculating unit, which calculates second characteristic information corresponding to each piece of said multimedia information; when said second characteristic information comprises MIDI data, one or more of the musical instrument, response grade, tone color trill, and reverberation characteristics in the MIDI data are removed from said second characteristic information;

6. The device according to claim 5, characterized in that said first characteristic information comprises acoustic tone information and/or inflection information; and said second characteristic information comprises the acoustic tone information and/or inflection information contained in the multimedia information.

7. The device according to claim 5, characterized in that said first characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, cepstral coefficient data, linear predictive coding data; and said second characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, cepstral coefficient data, linear predictive coding data, prosodic information of the music, melodic information of the music.