CN109543063A

CN109543063A - A method for analyzing the degree of matching between lyrics and melody

Info

Publication number: CN109543063A
Application number: CN201811180642.9A
Authority: CN
Inventors: 李永徽; 刘彦迪; 牛徐策; 李胡林; 代玉
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-10-09
Filing date: 2018-10-09
Publication date: 2019-03-29
Anticipated expiration: 2038-10-09
Also published as: CN109543063B

Abstract

The invention discloses a method for analyzing the degree of matching between lyrics and melody. A large number of Chinese character pronunciations are converted into two-dimensional images by a time-frequency spectrum algorithm; these images, together with the corresponding tones, are used to train a classifier, so that for any sound converted into a two-dimensional image by the time-frequency spectrum algorithm the classifier gives four probability scores, one for each of the four tones of standard Chinese; the musical score is converted into a two-dimensional image of the melody itself by the time-frequency spectrum algorithm; an evaluator feeds the melody's two-dimensional image into the classifier, and the classifier returns to the evaluator the tones ranked from most suitable to least suitable; the tones of the lyrics are input into the evaluator, which gives a matching score based on the information the lyrics provide. The method can score a single character, a sentence or a whole passage of lyrics to help musicians improve the lyrics or the melody, avoiding the situation where a song cannot be understood without reading the lyrics and making the song more likely to become popular.

Description

A method for analyzing the degree of matching between lyrics and melody

Technical field

The present invention relates to an analysis method, specifically a method for analyzing the degree of matching between lyrics and melody, and belongs to the field of data processing technology.

Background art

Many mobile devices currently support music playback. A music player is multimedia software for playing music files. Existing audio players can search for lyrics according to a lyric-melody matching classification method: a server recommends the retrieved version of the lyrics to the player, which then displays the recommended lyrics during playback. During singing, the singer's pronunciation can also be matched against the melody, which helps the singer identify where listeners may have difficulty understanding, so that singing technique can be applied to those parts of the song in a targeted way to reduce that difficulty.

Existing lyric-melody matching classification methods have two shortcomings. First, they cannot evaluate the match between a melody and lyrics written in an arbitrary tonal dialect; they support only standard Chinese, which does not meet singers' needs. Second, their matching precision is low, which often results in mismatches between lyrics and melody and degrades the user experience.

Summary of the invention

The purpose of the invention is to provide a method for analyzing the degree of matching between lyrics and melody that solves the above problems.

The invention achieves this purpose through the following technical solution: a method for analyzing the degree of matching between lyrics and melody, comprising the following steps:

Step A: a large number of Chinese character pronunciations are converted into two-dimensional images by a time-frequency spectrum algorithm;

Here, a time-frequency spectrum is an image of how each frequency component in the audio changes over time; in some contexts it is also called a voiceprint or sound spectrum. One axis of the image (x or y) represents time and the other represents frequency. Each column (or row) of the time-frequency spectrum is obtained by a Fourier transform of the audio near the corresponding moment, and taking the audio "near a moment" can be implemented with a window function. A segment of audio near each moment thus yields one column (or row) of the time-frequency spectrum, and repeating this process at successive moments yields the two-dimensional time-frequency image;
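The windowed Fourier transform described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Hann window, the frame length of 64 samples, the hop of 32 samples and the function names `hann_window` and `spectrogram` are all assumptions chosen for the example.

```python
import cmath
import math


def hann_window(n):
    """Hann window of length n (an assumed choice of window function)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of columns, one per moment in time.

    Each column holds the DFT magnitudes of one windowed frame,
    i.e. the frequency content of the audio near that moment.
    """
    window = hann_window(frame_len)
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        column = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            column.append(abs(acc))
        columns.append(column)
    return columns


# A pure 128 Hz tone sampled at 1024 Hz falls exactly on bin 8 (128 / (1024 / 64)).
sample_rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / sample_rate) for t in range(256)]
image = spectrogram(tone)
peak_bins = [max(range(len(col)), key=col.__getitem__) for col in image]
```

In practice a fast Fourier transform from a numerical library would replace the direct DFT loop; the loop is used here only to keep the sketch dependency-free.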

Step B: a classifier is trained on these images together with the corresponding tones, so that for any sound converted into a two-dimensional image by the time-frequency spectrum algorithm the classifier gives four probability scores, one for each of the four tones of standard Chinese;

Here, the classifier is a component built in advance into the lyric-melody matcher and can be implemented in many ways, for example with a supervised learning method such as a support vector machine (SVM) algorithm or a neural network algorithm;
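As a rough illustration of a supervised classifier that outputs one probability per tone, the sketch below trains a single-layer softmax network (the simplest neural-network form) with stochastic gradient descent. It is a stand-in under stated assumptions: the class name `ToneClassifier`, the learning rate, the epoch count and the 4-pixel toy "images" are hypothetical, and a real system would use flattened time-frequency images and a full SVM or neural network.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToneClassifier:
    """Single-layer softmax network: flattened image -> one probability per tone."""

    def __init__(self, n_features, n_tones):
        self.weights = [[0.0] * n_features for _ in range(n_tones)]
        self.bias = [0.0] * n_tones

    def predict_proba(self, x):
        scores = [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(scores)

    def fit(self, xs, labels, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x, label in zip(xs, labels):
                probs = self.predict_proba(x)
                for tone, p in enumerate(probs):
                    # Gradient of the cross-entropy loss w.r.t. the raw score.
                    grad = p - (1.0 if tone == label else 0.0)
                    self.bias[tone] -= lr * grad
                    for j, v in enumerate(x):
                        self.weights[tone][j] -= lr * grad * v


# Hypothetical toy data: each "image" has 4 pixels, and tone k lights up pixel k.
rng = random.Random(0)
xs, labels = [], []
for _ in range(20):
    for tone in range(4):
        pixels = [rng.uniform(0.0, 0.2) for _ in range(4)]
        pixels[tone] += 1.0
        xs.append(pixels)
        labels.append(tone)

clf = ToneClassifier(n_features=4, n_tones=4)
clf.fit(xs, labels)
probs = clf.predict_proba([0.1, 1.1, 0.1, 0.1])  # resembles tone index 1
```

The four values in `probs` play the role of the four probability scores from step B; index 0 corresponds to the first tone, index 1 to the second tone, and so on.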

Step C: the musical score is converted into a two-dimensional image of the melody itself by the time-frequency spectrum algorithm;

Step D: the evaluator feeds the melody's two-dimensional image into the classifier, and the classifier returns to the evaluator the tones ranked from most suitable to least suitable;

Step E: the tones of the lyrics are input into the evaluator, and the evaluator gives a matching score based on the information the lyrics provide.
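The ranking that the classifier returns in step D can be sketched as a sort over the four probability scores. The function name `rank_tones`, the tone labels and the example probabilities are hypothetical and serve only to illustrate the ordering.

```python
TONES = ("tone 1", "tone 2", "tone 3", "tone 4")


def rank_tones(probabilities, tone_names=TONES):
    """Return (tone, probability) pairs ordered from most to least suitable."""
    if len(probabilities) != len(tone_names):
        raise ValueError("expected one probability per tone")
    pairs = zip(tone_names, probabilities)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical classifier output for one melody segment.
ranking = rank_tones([0.10, 0.60, 0.05, 0.25])
```

For a Cantonese classifier the same function would be called with nine tone names and nine probabilities.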

Preferably, so that the image can be cut out for each character of the corresponding lyrics, in step C the function of melody frequency over time produced by the time-frequency spectrum algorithm is broadened with a Gaussian spread.
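One way to read the Gaussian spread is that each single frequency value of the melody is widened into a band centered on that frequency, which turns a frequency-versus-time curve into a two-dimensional image. The sketch below follows that reading; the function name `gaussian_spread`, the bin spacing and the width `sigma` are assumptions, since the source does not give values.

```python
import math


def gaussian_spread(pitch_track, freq_bins, sigma):
    """Turn a frequency-versus-time track into a 2-D image.

    pitch_track: melody frequency in Hz, one value per time step
    freq_bins:   center frequency in Hz of each image row
    sigma:       band width in Hz (assumed parameter)
    Returns one column per time step, each column indexed by frequency bin.
    """
    image = []
    for pitch in pitch_track:
        column = [
            math.exp(-((f - pitch) ** 2) / (2 * sigma ** 2)) for f in freq_bins
        ]
        image.append(column)
    return image


freq_bins = [100 + 20 * i for i in range(21)]  # 100 Hz to 500 Hz in 20 Hz steps
track = [220.0, 240.0, 260.0]                  # a short rising melody fragment
image = gaussian_spread(track, freq_bins, sigma=30.0)
peaks = [max(range(len(col)), key=col.__getitem__) for col in image]
```

Each column peaks at the bin of the melody frequency and falls off smoothly on both sides, which makes the image resemble the band-like traces in a spectrogram of real speech.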

Preferably, so that the classifier's accuracy reaches 95% or more, in step B the training of the classifier collects data for the four classes of standard Chinese pronunciation (first tone, second tone, third tone, fourth tone), at least 1,000 samples per class, as the training set; a Cantonese classifier would instead use nine tones (yin level, yin entering, yin rising, yin departing, middle entering, yang level, yang rising, yang departing, yang entering), and other languages are handled analogously.
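The training-set requirement above (a fixed number of tone classes per language and at least 1,000 samples per class) can be checked before training. The sketch below is one hypothetical way to do so; the names `TONE_COUNTS`, `MIN_SAMPLES_PER_TONE` and `check_training_set` are assumptions, as are the example label counts.

```python
from collections import Counter

# Number of tone classes per language, as stated in the text.
TONE_COUNTS = {"mandarin": 4, "cantonese": 9}
MIN_SAMPLES_PER_TONE = 1000


def check_training_set(labels, language):
    """Return the tones (numbered from 1) that have too few samples."""
    counts = Counter(labels)
    return [
        tone
        for tone in range(1, TONE_COUNTS[language] + 1)
        if counts[tone] < MIN_SAMPLES_PER_TONE
    ]


# Hypothetical label list: tone 3 is one sample short.
labels = [1] * 1000 + [2] * 1000 + [3] * 999 + [4] * 1200
missing = check_training_set(labels, "mandarin")
```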

Preferably, to achieve high accuracy, in step E the evaluator determines which probability score to take according to the pronunciation of the Chinese character in the lyrics, and repeats this evaluation over the entire lyrics and melody.

Preferably, so that every character of existing lyrics can be scored, the lyrics input in step E may be a single character, a sentence or a whole passage of lyrics.
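The evaluation in step E can be sketched as a lookup followed by an average: for each character, the evaluator takes the probability that the classifier assigned to that character's own tone, and the mean over all characters gives the score of the sentence or passage. The function names and the probability values below are hypothetical.

```python
def character_score(tone_probabilities, lyric_tone):
    """Probability the melody segment assigns to the lyric's own tone (tones numbered from 1)."""
    return tone_probabilities[lyric_tone - 1]


def matching_score(segment_probabilities, lyric_tones):
    """Return per-character scores and their average over the passage."""
    if not lyric_tones or len(segment_probabilities) != len(lyric_tones):
        raise ValueError("need one probability vector per lyric character")
    scores = [
        character_score(probs, tone)
        for probs, tone in zip(segment_probabilities, lyric_tones)
    ]
    return scores, sum(scores) / len(scores)


# Hypothetical classifier outputs for two melody segments.
segments = [
    [0.25, 0.125, 0.125, 0.5],   # melody suits the fourth tone best
    [0.5, 0.25, 0.125, 0.125],   # melody suits the first tone best
]
tones = [4, 2]  # the first character is fourth tone, the second is second tone
scores, average = matching_score(segments, tones)
```

Here the first character matches its melody segment well (0.5) and the second less so (0.25), giving an average of 0.375, or 37.5 on a hundred-point scale.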

The beneficial effects of the invention are as follows. The method is reasonably designed. In step C, the function of melody frequency over time produced by the time-frequency spectrum algorithm is broadened with a Gaussian spread; once each frequency has been expanded into a band, the two-dimensional time-frequency image of the melody itself is obtained. The image is cut out for each character of the corresponding lyrics, so that it resembles the image of a real Chinese character pronunciation and can be input into the evaluator. In step B, the training of the classifier collects data for the four classes of standard Chinese pronunciation (first tone, second tone, third tone, fourth tone), at least 1,000 samples per class, as the training set; a Cantonese classifier would instead use nine tones (yin level, yin entering, yin rising, yin departing, middle entering, yang level, yang rising, yang departing, yang entering), and other languages are handled analogously, so that the classifier's accuracy reaches 95% or more. In step E, the evaluator determines which probability score to take according to the pronunciation of each Chinese character in the lyrics and repeats this evaluation over the entire lyrics and melody; the average of all matching scores is the lyric-melody matching score of the character, sentence, paragraph or song, which gives high accuracy. The lyrics input in step E may be a single character, a sentence or a whole passage, so every character of existing lyrics can be scored. This helps the singer identify where listeners may have difficulty understanding, so that singing technique can be applied to those parts of the song in a targeted way to reduce that difficulty.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of the invention;

Fig. 2 is a schematic diagram of the lyrics and melody in an embodiment of the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

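Referring to Fig. 1, a method for analyzing the degree of matching between lyrics and melody comprises the following steps:

Step A: a large number of Chinese character pronunciations are converted into two-dimensional images by a time-frequency spectrum algorithm.

Here, a time-frequency spectrum is an image of how each frequency component in the audio changes over time; in some contexts it is also called a voiceprint or sound spectrum. One axis of the image (x or y) represents time and the other represents frequency. Each column (or row) of the time-frequency spectrum is obtained by a Fourier transform of the audio near the corresponding moment, and taking the audio "near a moment" can be implemented with a window function. A segment of audio near each moment thus yields one column (or row) of the time-frequency spectrum, and repeating this process at successive moments yields the two-dimensional time-frequency image.

Step B: a classifier is trained on these images together with the corresponding tones, so that for any sound converted into a two-dimensional image by the time-frequency spectrum algorithm the classifier gives four probability scores, one for each of the four tones of standard Chinese.

Here, the classifier is a component built in advance into the lyric-melody matcher and can be implemented in many ways, for example with a supervised learning method such as a support vector machine (SVM) algorithm or a neural network algorithm.

Step C: the musical score is converted into a two-dimensional image of the melody itself by the time-frequency spectrum algorithm.

Step D: the evaluator feeds the melody's two-dimensional image into the classifier, and the classifier returns to the evaluator the tones ranked from most suitable to least suitable.

Step E: the tones of the lyrics are input into the evaluator, and the evaluator gives a matching score based on the information the lyrics provide.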

In step C, the function of melody frequency over time produced by the time-frequency spectrum algorithm is broadened with a Gaussian spread; once each frequency has been expanded into a band, the two-dimensional time-frequency image of the melody itself is obtained. The image is cut out for each character of the corresponding lyrics, so that it resembles the image of a real Chinese character pronunciation and can be input into the evaluator. In step B, the training of the classifier collects data for the four classes of standard Chinese pronunciation (first tone, second tone, third tone, fourth tone), at least 1,000 samples per class, as the training set; a Cantonese classifier would instead use nine tones (yin level, yin entering, yin rising, yin departing, middle entering, yang level, yang rising, yang departing, yang entering), and other languages are handled analogously, so that the classifier's accuracy reaches 95% or more. In step E, the evaluator determines which probability score to take according to the pronunciation of each Chinese character in the lyrics and repeats this evaluation over the entire lyrics and melody; the average of all matching scores is the lyric-melody matching score of the character, sentence, paragraph or song, which gives high accuracy. The lyrics input in step E may be a single character, a sentence or a whole passage, so every character of existing lyrics can be scored. This helps the singer identify where listeners may have difficulty understanding, so that singing technique can be applied to those parts of the song in a targeted way to reduce that difficulty.

Embodiment:

The method is used here to evaluate the score of one sentence of the song "Massif": "crossing the massif, although white-headed, chattering". First, the classifier is trained, with training data collected according to the tonal language in question. Standard Chinese has four tones, so data are collected for four classes of pronunciation (first tone, second tone, third tone, fourth tone), at least 1,000 samples per class, as the training set. Each collected sound is converted by the time-frequency transform into a two-dimensional image, and the images together with their corresponding tones are input into the classification algorithm.

Next, the lyrics and melody of the song are obtained; the melody is converted into a function of frequency over time and cut according to the lyrics, yielding a series of two-dimensional time-frequency images of the score itself.
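Cutting the melody according to the lyrics can be sketched as slicing the frequency-versus-time track at each character's start and end time. The source does not say how the alignment between characters and time is obtained, so the `(character, start, end)` timing list below is a hypothetical input, as are the function name `cut_by_lyrics` and the frame rate.

```python
def cut_by_lyrics(pitch_track, frame_rate, char_timings):
    """Slice a pitch track into one segment per lyric character.

    pitch_track:  melody frequency in Hz, one value per frame
    frame_rate:   frames per second of the pitch track
    char_timings: (character, start_seconds, end_seconds) tuples (assumed alignment)
    """
    segments = []
    for char, start, end in char_timings:
        first = int(round(start * frame_rate))
        last = int(round(end * frame_rate))
        segments.append((char, pitch_track[first:last]))
    return segments


# Hypothetical 2-second track at 10 frames per second, rising from 200 Hz.
track = [200.0 + 5.0 * i for i in range(20)]
timings = [("A", 0.0, 0.5), ("B", 0.5, 1.2)]  # placeholder characters
segments = cut_by_lyrics(track, frame_rate=10, char_timings=timings)
```

Each resulting segment would then be spread into a two-dimensional image and passed to the classifier.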

Finally, the two-dimensional time-frequency images of the score itself and the pronunciation of the corresponding lyrics are input into the evaluator; the evaluator calls the classifier to obtain specific scores and, according to the user's needs, outputs the score of a character, a word, a sentence or the entire song.

The lyrics and melody are shown in Fig. 2.

The matching results are listed as follows (the underlined values are the lyric-melody matching scores; multiply by 100 to convert to a hundred-point scale).

The two characters of "crossing" score low and are easily misheard; that is, the lyrics and melody do not match there. In addition, characters such as "right", " " and "head" also match poorly. The two characters of "talkative" are not easily misheard and score higher, as do the remaining characters. The overall score for this sentence is 47 (on a hundred-point scale), which is acceptable. The songwriter should therefore either adjust the melody at "crossing" or change the lyrics there, and the singer should take particular care when singing "crossing".

It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that it can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting that claim.

In addition, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity; a person skilled in the art should treat it as a whole, and the technical solutions of the various embodiments may be suitably combined to form other embodiments that a person skilled in the art can understand.

Claims

1. A method for analyzing the degree of matching between lyrics and melody, characterized by comprising the following steps:

Step A: a large number of Chinese character pronunciations are converted into two-dimensional images by a time-frequency spectrum algorithm;

wherein a time-frequency spectrum is an image of how each frequency component in the audio changes over time, and is in some contexts also called a voiceprint or sound spectrum; one axis of the image (x or y) represents time and the other represents frequency; each column (or row) of the time-frequency spectrum is obtained by a Fourier transform of the audio near the corresponding moment, and taking the audio near a moment can be implemented with a window function; a segment of audio near each moment yields one column (or row) of the time-frequency spectrum, and repeating this process at successive moments yields the two-dimensional time-frequency image;

Step B: a classifier is trained on these images together with the corresponding tones, so that for any sound converted into a two-dimensional image by the time-frequency spectrum algorithm the classifier gives four probability scores, one for each of the four tones of standard Chinese;

wherein the classifier is a component built in advance into the lyric-melody matcher and can be implemented in many ways, using a supervised learning method such as a support vector machine algorithm or a neural network algorithm;

Step C: the musical score is converted into a two-dimensional image of the melody itself by the time-frequency spectrum algorithm;

Step D: the evaluator feeds the melody's two-dimensional image into the classifier, and the classifier returns to the evaluator the tones ranked from most suitable to least suitable;

Step E: the tones of the lyrics are input into the evaluator, and the evaluator gives a matching score based on the information the lyrics provide.

2. The method for analyzing the degree of matching between lyrics and melody according to claim 1, characterized in that: in step C, the function of melody frequency over time produced by the time-frequency spectrum algorithm is broadened with a Gaussian spread.

3. The method for analyzing the degree of matching between lyrics and melody according to claim 1, characterized in that: in step B, the training of the classifier collects data for the four classes of standard Chinese pronunciation, at least 1,000 samples per class, as the training set; a Cantonese classifier would instead use nine tones, and other languages are handled analogously.

4. The method for analyzing the degree of matching between lyrics and melody according to claim 1, characterized in that: in step E, the evaluator determines which probability score to take according to the pronunciation of the Chinese character in the lyrics, and repeats this evaluation over the entire lyrics and melody.

5. The method for analyzing the degree of matching between lyrics and melody according to claim 1, characterized in that: the lyrics input in step E may be a single character, a sentence or a whole passage of lyrics.