CN106991163A - Song recommendation method based on a singer's voice characteristics - Google Patents
A song recommendation method based on a singer's voice characteristics
- Publication number: CN106991163A
- Application number: CN201710206783.2A
- Authority
- CN
- China
- Prior art keywords
- song
- singer
- tone color
- range
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
- G06F16/637—Administration of user profiles, e.g. generation, initialization, adaptation or distribution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Abstract
The present invention relates to a song recommendation method based on a singer's voice characteristics. The method uses information such as the numbered musical notation of songs and the original singers' a cappella tracks to build a song feature library, extracts the singing range of each song, constructs a voice timbre embedding space, and obtains the timbre representation of each original singer. From the user's a cappella recording it extracts the user's singing range and timbre representation, thereby profiling the user's voice characteristics. Using the sound level distribution of a song and the user's performance ability evaluation value at each sound level, it calculates the matching degree between the user's singing range and the song's range requirement; it also embeds the user's voice clips into the timbre embedding space and calculates the timbre similarity with each original singer in that space. By jointly considering the range matching degree and the timbre similarity, the present invention calculates the recommendation degree of every song for the user.
Description
Technical field
The present invention relates to an audio signal processing method in the field of singing, and more particularly to a song recommendation method based on a singer's voice characteristics.
Background technology
Music recommendation systems focus on recommending songs that a user may like to listen to. The recommendation techniques in use can mainly be divided into content-based recommendation and collaborative-filtering-based recommendation. Content-based algorithms recommend mainly according to the audio features of the music itself, including low-level features such as MFCCs and higher-level features such as melody, rhythm, genre, and emotion. Collaborative-filtering-based algorithms recommend mainly according to users' on-demand listening behavior, making recommendations based on play records and the similarity between users.
In recent years, under the dual stimulus of fast-developing mobile Internet applications and the many large-scale live-singing talent shows, the application scenario of music recommendation systems has gradually migrated from the traditional one of recommending songs a user may like to listen to, and penetrated into emerging scenarios such as recommending songs the user may like to sing.
However, the migration of application scenarios has not been accompanied by a synchronous migration of the recommendation methods. Taking karaoke apps as an example, the recommendation function in such apps recommends songs based on what is currently popular. Popular songs, however, are not suitable for every user to sing. Because of the limits of the user's own singing range and ability, the key of a song may be too high, so that the user cannot reach the high notes; a song may also be best delivered with a rough, powerful voice while the user is a girl with a sweet voice.
Clearly, new recommendation scenarios call for new recommendation patterns. In a karaoke scenario, the user does not merely listen to a song; more importantly, the user should be able to deliver it as well as possible. This is a process of bi-directional matching. On the one hand, the characteristics of the user's own voice must be considered, such as the user's actual singing range and vocal timbre; on the other hand, the song's demands on singing ability must be considered, such as the range the song requires and what kind of timbre is better suited to deliver the song's emotion.
To better introduce the concept of song recommendation based on a singer's voice characteristics, some basic concepts of music and vocal theory are introduced first.
Timbre: timbre refers to the perceptual attribute of sound by which a listener can distinguish two sounds that are presented in the same way and have identical pitch and loudness.
Range: there are two kinds of range, the total range and the range of an individual voice or instrument. The total range refers to the whole span of the musical pitch series, i.e., from the lowest bass to the highest treble. The range of an individual voice or instrument refers to the span, within the total range, from the lowest to the highest pitch that the voice or instrument can reach. The range of an instrument is relatively fixed, whereas the range of a human voice varies considerably from person to person because of innate differences in vocal cord size, length, and thickness, and because of whether one has received systematic vocal training.
MIDI (Musical Instrument Digital Interface) is a communication standard for digital music and instruments. MIDI files can flexibly record information such as the pitches and durations of a song, making it convenient for a computer to perform pitch analysis and processing.
CQT (Constant-Q Transform) spectrum: a physical timbre-frequency feature. Through a filter bank whose center frequencies are exponentially distributed, it expresses a musical signal as the spectral energy of individual musical tones; the quality factor Q of the filter bank remains constant.
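As an illustrative sketch (not part of the patent), the constant-Q property can be checked numerically: with B bins per octave, the filter center frequencies grow geometrically while the ratio of center frequency to bandwidth, the quality factor Q, stays the same for every bin. The values of B and f_min below are arbitrary choices for illustration.

```python
import numpy as np

B = 24          # bins per octave (illustrative choice)
f_min = 55.0    # lowest filter center frequency in Hz (illustrative)
k = np.arange(48)

f_k = f_min * 2.0 ** (k / B)            # geometrically spaced center frequencies
bw_k = f_k * (2.0 ** (1.0 / B) - 1.0)   # bandwidth grows in proportion to f_k
Q = f_k / bw_k                          # quality factor, identical for every bin
```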
An individual's singing ability includes the width of the singer's range and the pitch accuracy control at each sound level within it. Phonation ability is the basis of singing ability; in medicine, a person's phonation range is documented with a voice range profile recording the producible pitches and the dynamic range of loudness. Professional singers improve their singing ability through systematic vocal training methods, whereas ordinary singers generally do not use specific training methods.
Therefore, based on the above analysis, this application uses information such as the songs' numbered musical notation and the original singers' a cappella tracks to build a song feature library and to extract each song's singing range and each original singer's timbre features. Meanwhile, using the a cappella recording made when the user sings a song, together with the song's numbered notation, and on the premise of high sound level completion quality, it extracts the user's singing range and timbre features. By jointly considering the matching degree between the user's singing range and the range requirements of the songs in the library and the similarity between the user's timbre and the singers' timbres in the library, it calculates the recommendation degree of every song in the library for the user and recommends the songs with a high recommendation degree.
Content of the invention
In view of this, the object of the present invention is to provide a song recommendation method based on a singer's voice characteristics, which analyzes the range matching degree and the timbre similarity between the user and the original singers and recommends songs accordingly.
The present invention is realized by the following scheme: a song recommendation method based on a singer's voice characteristics, comprising the following steps:
Step S1: analyze the numbered musical notation of the songs in the song library, obtain the reference MIDI pitch sequence of each song, analyze the sound level distribution histogram of each song, and obtain the singing range requirement of each song;
Step S2: analyze the user's a cappella recording with the MELODIA algorithm to obtain the MIDI pitch sequence of the user's rendition of the song; fetch the reference MIDI pitch sequence of the same song obtained in step S1, calculate the user's benchmark singing ability, and extract the user's singing range;
Step S3: extract time-frequency representations from the original singers' a cappella files and input them into a deep convolutional neural network for iterative training, obtaining the trained deep convolutional neural network and the voice timbre embedding space;
Step S4: extract the time-frequency representation from an original singer's a cappella file and input it into the deep convolutional neural network trained in step S3; the output of the network is the 3-dimensional timbre feature vector in the voice timbre embedding space, and this vector is taken as the timbre representation of the original singer;
Step S5: analyze the user's a cappella voice clips and, by the same method as step S4, obtain a group of 3-dimensional timbre feature vectors in the voice timbre embedding space as the representation of the user's voice timbre;
Step S6: according to the song's singing range requirement and the user's singing range, calculate the range matching degree between the user and the song;
Step S7: according to the timbre representations of the original singers and of the user, calculate the timbre similarity between the user and each original singer;
Step S8: according to the range matching degree and the timbre similarity, calculate the recommendation degree of every song in the library for the user.
Further, the step S6 specifically comprises the following steps:
Step S61: obtain the weight of each sound level from the song's sound level distribution histogram; the weight of a sound level equals the number of times that sound level occurs divided by the total number of occurrences of all sound levels in the song. The calculation formula is defined as:
w(X) = num(X) / Σ num(Y), with Y running from Xmin to Xmax
wherein num(X) denotes the number of times sound level X occurs in the numbered notation, Xmax denotes the maximum MIDI value of the notes in the numbered notation, and Xmin denotes the minimum MIDI value of the notes in the numbered notation;
Step S62: using the sound level distribution of the song and the user's performance ability evaluation value at each sound level, calculate the matching degree between the user's singing range and the song's range requirement. The calculation formula of the range matching degree is defined as:
Ran_mat(u, s) = Σ w(X)·U(X), with X running from Xmin to Xmax
wherein U(X) denotes the user's performance ability evaluation value at sound level X.
Further, in the step S7, after the user's voice clips are embedded into the timbre embedding space, the timbre similarity between the user and each original singer in the embedding space is calculated respectively. The calculation formula of the timbre similarity is defined as:
Tim_sim(u, s) = 1 - tanh(μ·||Z1 - Z2||_2)
wherein ||Z1 - Z2||_2 denotes the Euclidean distance between the two points, μ is an empirical coefficient, and tanh is the hyperbolic tangent function.
Further, in the step S8, when making the final recommendation, the matching degree between the user's singing range and the range requirement of each song in the library and the timbre similarity between the user's timbre and the singers' timbres in the library are jointly considered, and the recommendation degree of every song in the library for the user is calculated. The calculation formula of the recommendation degree is defined as:
Recom(u, s) = c·Ran_mat(u, s) + (1 - c)·Tim_sim(u, s)
wherein Recom(u, s) denotes the recommendation degree of song s for user u, Ran_mat(u, s) denotes the range matching degree of song s for user u, Tim_sim(u, s) denotes the similarity between user u's timbre and the original singer's timbre of song s, and c takes the value 0.7.
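As a minimal sketch of the weighted combination above (the three-song library is invented for illustration; only c = 0.7 comes from the text):

```python
def recom(ran_mat, tim_sim, c=0.7):
    # Recom(u, s) = c*Ran_mat(u, s) + (1 - c)*Tim_sim(u, s)
    return c * ran_mat + (1.0 - c) * tim_sim

# rank an invented three-song library by recommendation degree
library = [("s1", 0.36, 0.55), ("s2", 0.80, 0.22), ("s3", 0.50, 0.40)]
ranked = sorted(library, key=lambda t: recom(t[1], t[2]), reverse=True)
```

With c = 0.7 the range matching degree dominates, which matches the emphasis the claims place on singability over timbre.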
Compared with the prior art, the invention has the following advantages. The method uses the a cappella recording made when the user sings a song, together with the song's numbered musical notation, and, on the premise of high sound level completion quality, extracts the user's singing range. Meanwhile, it uses information such as the songs' numbered notation and the original singers' a cappella tracks to build a song feature library and to extract each song's singing range and each singer's timbre features. Computing the user-song matching degree from range width alone does not apply in some cases; instead, the sound level distribution histogram analyzed from the song's MIDI pitches is combined with the user's performance ability evaluation value at each sound level to calculate the matching degree between the user's singing range and the song's range requirement, which is both more widely applicable and more accurate, and the user can also improve his or her singing range through practice. Moreover, by the powerful dimensionality reduction and feature learning ability of deep convolutional networks, the high-dimensional, sequential voice spectral features are embedded into a 3-dimensional timbre embedding space, so that timbre similarity becomes measurable in that space. Finally, the matching degree between the user's singing range and the range requirements of the songs in the library and the timbre similarity between the user's timbre and the singers' timbres in the library are jointly considered to calculate the recommendation degree of every song in the library for the user. Different parameter values can be set for different application scenarios, so that suitable songs with a high recommendation degree are recommended to the user.
Brief description of the drawings
Fig. 1 is a schematic block diagram of the flow of the method of the present invention.
Fig. 2 is a list of part of the song recommendation results for user U3.
Embodiment
The present invention will be further described below with reference to the accompanying drawings and an embodiment.
The present embodiment provides a song recommendation method based on a singer's voice characteristics, which, as shown in Fig. 1, comprises the following steps:
Step S1: analyze the numbered musical notation of the songs in the song library, obtain the reference MIDI pitch sequence of each song, analyze the sound level distribution histogram of each song, and obtain the singing range requirement of each song;
Step S2: analyze the user's a cappella recording with the MELODIA algorithm to obtain the MIDI pitch sequence of the user's rendition of the song; fetch the reference MIDI pitch sequence of the same song obtained in step S1, calculate the user's benchmark singing ability, and extract the user's singing range;
Step S3: extract time-frequency representations from the original singers' a cappella files and input them into a deep convolutional neural network for iterative training, obtaining the trained deep convolutional neural network and the voice timbre embedding space;
Step S4: extract the time-frequency representation from an original singer's a cappella file and input it into the deep convolutional neural network trained in step S3; the output of the network is the 3-dimensional timbre feature vector in the voice timbre embedding space, and this vector is taken as the timbre representation of the original singer;
Step S5: analyze the user's a cappella voice clips and, by the same method as step S4, obtain a group of 3-dimensional timbre feature vectors in the voice timbre embedding space as the representation of the user's voice timbre;
Step S6: according to the song's singing range requirement and the user's singing range, calculate the range matching degree between the user and the song;
Step S7: according to the timbre representations of the original singers and of the user, calculate the timbre similarity between the user and each original singer;
Step S8: according to the range matching degree and the timbre similarity, calculate the recommendation degree of every song in the library for the user.
In the present embodiment, the step S6 specifically comprises the following steps:
Step S61: obtain the weight of each sound level from the song's sound level distribution histogram; the weight of a sound level equals the number of times that sound level occurs divided by the total number of occurrences of all sound levels in the song. The calculation formula is defined as:
w(X) = num(X) / Σ num(Y), with Y running from Xmin to Xmax
wherein num(X) denotes the number of times sound level X occurs in the numbered notation, Xmax denotes the maximum MIDI value of the notes in the numbered notation, and Xmin denotes the minimum MIDI value of the notes in the numbered notation.
Step S62: using the sound level distribution of the song and the user's performance ability evaluation value at each sound level, calculate the matching degree between the user's singing range and the song's range requirement. The calculation formula of the range matching degree is defined as:
Ran_mat(u, s) = Σ w(X)·U(X), with X running from Xmin to Xmax
wherein U(X) denotes the user's performance ability evaluation value at sound level X.
In the present embodiment, in the step S7, after the user's voice clips are embedded into the timbre embedding space, the timbre similarity between the user and each original singer in the embedding space is calculated respectively. The calculation formula of the timbre similarity is defined as:
Tim_sim(u, s) = 1 - tanh(μ·||Z1 - Z2||_2)
wherein ||Z1 - Z2||_2 denotes the Euclidean distance between the two points, μ is an empirical coefficient, and tanh is the hyperbolic tangent function.
In the present embodiment, in the step S8, when making the final recommendation, the matching degree between the user's singing range and the range requirement of each song in the library and the timbre similarity between the user's timbre and the singers' timbres in the library are jointly considered, and the recommendation degree of every song in the library for the user is calculated. The calculation formula of the recommendation degree is defined as:
Recom(u, s) = c·Ran_mat(u, s) + (1 - c)·Tim_sim(u, s)
wherein Recom(u, s) denotes the recommendation degree of song s for user u, Ran_mat(u, s) denotes the range matching degree of song s for user u, Tim_sim(u, s) denotes the similarity between user u's timbre and the original singer's timbre of song s, and c takes the value 0.7.
In the present embodiment, an example is given according to the above method, specifically comprising the following steps:
Step 1: analyze the numbered musical notation of the songs in the song library, obtain the reference MIDI pitch sequence of each song, analyze the sound level distribution histogram of each song, and obtain the singing range requirement of each song. The specific steps are as follows:
Step 11: collect and organize the numbered notation of the songs users sing in the library, including information such as the key signature and note names in the notation; convert the notation into the corresponding MIDI pitch sequence and, using information such as the start times and durations in the corresponding accompaniment, build the standard MIDI pitch parameter file of the song.
Step 12: derive the song's sound level distribution histogram from its MIDI pitch sequence. The histogram counts the number of times each sound level occurs in the song's MIDI pitch sequence and arranges the counts in order of the pitch series from low to high. The range delimited by the leftmost and rightmost sound levels of the histogram is therefore the song's singing range requirement.
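A minimal sketch of step 12 (assuming the song's numbered notation has already been converted to a list of MIDI pitch values; the function names are illustrative, not from the patent):

```python
from collections import Counter

def sound_level_histogram(midi_pitches):
    # occurrence count of each sound level, arranged from low to high
    return sorted(Counter(midi_pitches).items())

def range_requirement(midi_pitches):
    # the leftmost and rightmost sound levels of the histogram
    return min(midi_pitches), max(midi_pitches)
```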
Step 2: analyze the user's a cappella recording with the MELODIA algorithm to obtain the MIDI pitch sequence of the user's rendition of the song; obtain the reference MIDI pitch sequence of the same song in the library; calculate the user's benchmark singing ability and extract the user's singing range. The specific steps are as follows:
Step 21: analyze the user's a cappella recording with the MELODIA algorithm, which can automatically detect the fundamental frequency F0 of the main melody in a song. The specific parameters are set to {"minfqr": 82.0, "maxfqr": 1047.0, "voicing": 0.2, "minpeaksalience": 0.0}. The fundamental frequency F0 is converted into a MIDI pitch value p; the conversion formula is defined as:
p = 69 + 12·log2(F0/440)
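The F0-to-MIDI conversion is the standard mapping with A4 = 440 Hz = MIDI note 69; a sketch:

```python
import math

def f0_to_midi(f0_hz):
    # p = 69 + 12 * log2(F0 / 440), rounded to the nearest sound level
    return round(69 + 12 * math.log2(f0_hz / 440.0))
```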
Step 22: user U1 is taken as an example below; U1 is male, has some singing experience, but cannot reach notes that are too high. Obtain the reference MIDI pitch sequence of the same song according to step 11; count the number of times sound level X appears for U1 across the song collection and the number of times it is sung in tune, so as to evaluate the completion quality of sound level X, denoted β(X); add the result to the user's history file and update the user's singing range history. The lower bound of the Wilson confidence interval of each sound level is calculated as the updated sound level completion quality.
Step 23: traverse the updated sound level completion qualities, then calculate the benchmark singing ability α. The calculation formula of α is defined as:
α = mean(β(X)) - std(β(X))
wherein mean(β(X)) denotes the average completion quality of the sound levels over the total range, and std(β(X)) denotes the standard deviation of the completion qualities over the total range. By statistics, after user U1 has sung 20 songs, the average completion quality of the sound levels is 0.84 with a standard deviation of 0.16, so his benchmark singing ability α is calculated to be 0.68.
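A sketch of the α computation (whether the patent intends the population or the sample standard deviation is not stated; the population form is assumed here):

```python
from statistics import mean, pstdev

def benchmark_ability(betas):
    # alpha = mean(beta(X)) - std(beta(X)) over the sound levels
    return mean(betas) - pstdev(betas)
```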
Step 24: the singing range of a user is represented by the five-tuple <α, [Xmin, Xmax], sequ[β(Xmin), β(Xmax)], min(β(X))>, wherein Xmin is the sound level with the minimum MIDI value, Xmax is the sound level with the maximum MIDI value, sequ[β(Xmin), β(Xmax)] is the sequence of sound level completion qualities between Xmin and Xmax, and min(β(X)) is the floor value of the completion quality. Analysis shows that the lowest sound level at which user U1 exceeds his benchmark singing ability is A3 and the highest is G5, the sequence of sound level completion qualities is [0.86, 0.99, ...], and the floor value of the completion quality is 0.28. The singing range of user U1 is finally obtained as <0.68, [A3, G5], [0.86, 0.99, ...], 0.28>.
Step 3: extract time-frequency representations from the original singers' a cappella files and input them into a deep convolutional neural network for iterative training, obtaining the trained deep convolutional neural network and the voice timbre embedding space. The specific steps are as follows:
Step 31: the singers are divided into groups of 15. The a cappella audio of each singer's songs is divided into frames and CQT features are extracted from them; the CQT coefficients of each frame are 192-dimensional, and the coefficients of 60 frames are chosen to constitute the input matrix of the neural network, whose size is 60*192.
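The CQT extraction itself needs an audio analysis library, but the framing into 60*192 network inputs can be sketched with plain NumPy, assuming the per-frame 192-dimensional CQT coefficients are already available:

```python
import numpy as np

def make_input_matrices(cqt_frames, frames_per_window=60):
    # slice a (num_frames, 192) CQT coefficient array into
    # non-overlapping (60, 192) network input matrices
    n = cqt_frames.shape[0] // frames_per_window
    return [cqt_frames[i * frames_per_window:(i + 1) * frames_per_window]
            for i in range(n)]
```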
Step 32: the input matrices obtained in step 31 are input into the deep convolutional neural network, and the network is trained iteratively with a pairwise training method, yielding the trained deep convolutional neural network and the corresponding timbre embedding space.
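The patent does not specify the pairwise loss used for training. One common choice for learning such an embedding is a contrastive loss over pairs of embedding vectors, sketched here (the margin value is an assumption; the network itself is elided):

```python
import numpy as np

def contrastive_loss(z1, z2, same_singer, margin=1.0):
    # pull same-singer embeddings together; push different-singer
    # embeddings apart until they are at least `margin` away
    d = np.linalg.norm(z1 - z2)
    if same_singer:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```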
Step 4: extract the time-frequency representation from an original singer's a cappella file and input it into the deep convolutional neural network trained in step S3; the output of the network is the 3-dimensional timbre feature vector in the voice timbre embedding space, and this vector is taken as the timbre representation of the original singer. The specific steps are as follows:
Step 41: for the voice clips of each original singer, extract CQT features by the same method as step 31 to obtain input matrices of size 60*192.
Step 42: the CQT features obtained in step 41 are input into the deep convolutional network trained in step 32; the output of the network is the 3-dimensional timbre feature vector in the voice timbre embedding space, and this vector is taken as the original singer's voice timbre representation.
Step 5: analyze the user's a cappella voice clips and, by the method of step S4, obtain a group of 3-dimensional timbre feature vectors in the voice timbre embedding space as the user's voice timbre representation. The specific steps are as follows:
Step 51: for the user's voice clips, extract CQT features by the same method as step 31 to obtain input matrices of size 60*192.
Step 52: the CQT features obtained in step 51 are input into the deep convolutional network trained in step 32; the output of the network is the 3-dimensional timbre feature vector in the voice timbre embedding space, and this vector is taken as the user's voice timbre representation.
Step 6: according to the song's singing range requirement and the user's singing range, calculate the range matching degree between the user and the song. The specific steps are as follows:
Step 61: obtain the weight of each sound level from the song's sound level distribution histogram; the weight of a sound level equals the number of times that sound level occurs divided by the total number of occurrences of all sound levels in the song. The calculation formula is defined as:
w(X) = num(X) / Σ num(Y), with Y running from Xmin to Xmax
wherein num(X) denotes the number of times sound level X occurs in the numbered notation, Xmax denotes the maximum MIDI value of the notes in the numbered notation, and Xmin denotes the minimum MIDI value of the notes in the numbered notation.
Step 62: an instance analysis is carried out with user U2's singing range and the song "When Love Is Close" sung by Liu Ruoying. User U2 is female, and her singing range is <0.18, [F4, G5], [0.183, 0.409, ...], 0.08>, i.e., the completion quality of each sound level within the range [F4, G5] is 0.183, 0.409, ..., and the performance ability valuation outside the range is 0.08. The range requirement of the song is [D4, G5].
After the weight of each sound level in the song's sound level distribution histogram is calculated, the range matching degree equals the sum, over the sound levels, of the user's performance ability evaluation value multiplied by the corresponding histogram weight. The calculation formula of the range matching degree is defined as:
Ran_mat(u, s) = Σ w(X)·U(X), with X running from Xmin to Xmax
wherein U(X) denotes the user's performance ability evaluation value at sound level X; U(X) takes different values according to whether sound level X belongs to the user's singing range. The specific calculation formula is defined as:
U(X) = β(X) if X lies within the user's singing range, and U(X) = min(β(X)) otherwise,
wherein β(X) denotes the user's completion quality at each sound level within the singing range, and min(β(X)) denotes the minimum of the user's sound level completion qualities.
The range matching degree between user U2 and the song 《When Love Is Close》 sung by Liu Ruoying is finally obtained as 0.36.
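A minimal Python sketch of this range matching computation, assuming the piecewise U(X) above (in-range levels scored by their completeness β(X), out-of-range levels by the out-of-range ability estimate times min β(X)); all names are illustrative, not the patent's exact code:

```python
def range_matching_degree(weights, completeness, out_of_range_ability):
    """Ran_mat: sum over the song's sound levels of the user's
    singing-ability value U(X) times the level's weight in the song.

    weights: {midi_level: weight} from the song's sound-level histogram
    completeness: {midi_level: beta} for levels inside the user's range
    out_of_range_ability: the user's ability estimate outside the range
    """
    min_beta = min(completeness.values())
    score = 0.0
    for level, w in weights.items():
        if level in completeness:        # X inside the user's singing range
            u = completeness[level]
        else:                            # X outside the singing range
            u = out_of_range_ability * min_beta
        score += u * w
    return score

# Toy profile: the song uses two sound levels, one inside the user's range.
match = range_matching_degree({60: 0.5, 62: 0.5}, {60: 0.8}, 0.5)
```

Because the weights sum to 1, the result is a weighted average of per-level ability values and stays comparable across songs of different lengths.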
Step 7: Using the methods of steps 4 and 5, obtain the groups of 3-dimensional timbre feature vectors of the original singer and the singer in each timbre embedding space, and, using the Euclidean distance between them, calculate the timbre similarity with each singer in each embedding space. Take the singer subset C8 {Liang Jingru, Liu Ruoying, Chen Xiaochun, Zhang Xueyou, Fan Weiqi, Fan Xiaoxuan, TANK, Ren Xianqi} as an example.
The calculation formula of the timbre similarity is defined as:
Tim_sim(u, s) = 1 - tanh(μ||Z1 - Z2||₂)
where ||Z1 - Z2||₂ denotes the Euclidean distance between the two points, μ is an empirical coefficient taken as 0.05, and tanh is the hyperbolic tangent function. The similarities between singer Liang Jingru and the other singers are {{Fan Weiqi: 0.53}, {Liu Ruoying: 0.55}, {Fan Xiaoxuan: 0.55}, {Chen Xiaochun: 0.22}, {TANK: 0.37}, {Ren Xianqi: 0.40}, {Zhang Xueyou: 0.24}}.
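The similarity formula can be sketched directly; with μ = 0.05, identical embeddings score 1 and the score decays smoothly toward 0 as the Euclidean distance grows (function and variable names are illustrative):

```python
import math

def timbre_similarity(z1, z2, mu=0.05):
    """Tim_sim = 1 - tanh(mu * ||z1 - z2||_2) on 3-dimensional
    timbre embedding vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z1, z2)))
    return 1.0 - math.tanh(mu * dist)
```

Since tanh saturates at 1, the similarity is bounded below by 0, and the small μ keeps the score sensitive over the typical range of embedding distances.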
Step 8: When making the final recommendation, both the matching degree between the user's singing range and the songs' range requirements in the song library and the similarity between the user's timbre and the timbres of the singers in the library are considered, and the recommendation degree of every song in the library for the user is calculated. The range matching degree is calculated as in step 6 above, and the timbre similarity as in step 7. An instance analysis is carried out with user U3: male, with singing range profile <0.58, [A3, G5], [0.835, 0.602, …], 0.26>, i.e., the completeness of each sound level within user U3's range [A3, G5] is 0.835, 0.602, …, and the estimated singing ability outside the range is 0.26.
The calculation formula of the recommendation degree is defined as:
Recom(u, s) = c·Ran_mat(u, s) + (1 - c)·Tim_sim(u, s)
where Recom(u, s) denotes the recommendation degree of song s for user u, Ran_mat(u, s) denotes the range matching degree of song s for user u, Tim_sim(u, s) denotes the similarity between user u's timbre and the timbre of song s's original singer, and c is a constant with 0 ≤ c ≤ 1, here taken as 0.7. Each song s's recommendation degree is computed from its range matching degree and timbre similarity for user u, and songs are finally recommended to the user in order of recommendation degree.
The song with the highest recommendation degree for user U3 is: song 《Ten Years》, singer Chen Yixun, range requirement [C4, F5]. Its range matching degree is 0.84 and its timbre similarity is 0.88, giving a final recommendation degree of 0.85. Fig. 2 gives a fuller view of partial recommendation results for user U3.
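The weighted combination of step 8 is a one-liner; replaying the worked example for user U3 (range matching 0.84, timbre similarity 0.88, c = 0.7) reproduces the reported score:

```python
def recommendation_degree(range_match, timbre_sim, c=0.7):
    """Recom(u, s) = c * Ran_mat(u, s) + (1 - c) * Tim_sim(u, s)."""
    return c * range_match + (1.0 - c) * timbre_sim

# The song "Ten Years" for user U3 from the example above.
score = recommendation_degree(0.84, 0.88)  # 0.852, reported rounded to 0.85
```

Raising c favors singable songs over timbre-alike ones; the patent fixes c = 0.7, weighting range fit more heavily than timbre similarity.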
The foregoing is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the present patent shall fall within the coverage of the present invention.
Claims (4)
1. A song recommendation method based on the singer's voice characteristics, characterized in that it comprises the following steps:
Step S1: Analyze the numbered-musical-notation information of the songs in the song library, obtain the MIDI pitch consensus sequence of each song, analyze each song's sound-level distribution histogram, and obtain the singing range requirement of each song;
Step S2: Analyze the user's a cappella recording file with the MELODIA algorithm to obtain the sequence of MIDI pitch values sung by the singer; together with the MIDI pitch consensus sequence of the same song obtained in step S1, calculate the singer's baseline singing ability and extract the singer's singing range;
Step S3: Extract the time-frequency signal representation from the singer's a cappella file and input it into a deep convolutional neural network for iterative training, obtaining the trained deep convolutional neural network and the voice timbre embedding space;
Step S4: Extract the time-frequency signal representation from the original singer's a cappella file and feed it into the deep convolutional neural network trained in step S3; the network's output is the 3-dimensional timbre feature vector in the voice timbre embedding space, which is taken as the characterization of the original singer's voice timbre;
Step S5: Analyze the singer's a cappella sound clips and, with the same method as in step S4, obtain one group of 3-dimensional timbre feature vectors in the voice timbre embedding space as the characterization of the singer's voice timbre;
Step S6: According to the song's singing range requirement and the singer's singing range, calculate the range matching degree between the user and the song;
Step S7: According to the voice timbre characterizations of the original singer and the singer, calculate the timbre similarity between the singer and each original singer;
Step S8: According to the range matching degree and the timbre similarity, calculate the recommendation degree of every song in the library for the user.
2. The song recommendation method based on the singer's voice characteristics according to claim 1, characterized in that step S6 specifically comprises the following steps:
Step S61: Obtain the weight of each sound level from the song's sound-level distribution histogram; the weight of a sound level equals the number of times that sound level occurs divided by the total number of occurrences of all sound levels in the song; the calculation formula is defined as:
weight(X) = num(X) / Σ_{Y = Xmin .. Xmax} num(Y)
where num(X) denotes the number of times sound level X occurs in the numbered musical notation, Xmax denotes the maximum MIDI value of the notes in the notation, and Xmin denotes the minimum MIDI value;
Step S62: Using the sound-level distribution of the song and the user's singing-ability evaluation value at each sound level, calculate the matching degree between the user's singing range and the song's range requirement; the calculation formula of the range matching degree is defined as:
Ran_mat(u, s) = Σ_{X = Xmin .. Xmax} U(X) · weight(X)
where U(X) denotes the user's singing-ability evaluation value at sound level X.
3. The song recommendation method based on the singer's voice characteristics according to claim 1, characterized in that in step S7, after the singer's sound clips are embedded into the timbre embedding space, the timbre similarity between the singer and each singer in the embedding space is calculated; the calculation formula of the timbre similarity is defined as:
Tim_sim(u, s) = 1 - tanh(μ||Z1 - Z2||₂)
where ||Z1 - Z2||₂ denotes the Euclidean distance between the two points, μ is an empirical coefficient, and tanh is the hyperbolic tangent function.
4. The song recommendation method based on the singer's voice characteristics according to claim 1, characterized in that in step S8, when making the final recommendation, both the matching degree between the user's singing range and the songs' range requirements in the song library and the similarity between the user's timbre and the timbres of the singers in the library are considered, and the recommendation degree of every song in the library for the user is calculated; the calculation formula of the recommendation degree is defined as:
Recom(u, s) = c·Ran_mat(u, s) + (1 - c)·Tim_sim(u, s)
where Recom(u, s) denotes the recommendation degree of song s for user u, Ran_mat(u, s) denotes the range matching degree of song s for user u, Tim_sim(u, s) denotes the similarity between user u's timbre and the timbre of song s's original singer, and c takes the value 0.7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710206783.2A CN106991163A (en) | 2017-03-31 | 2017-03-31 | A kind of song recommendations method based on singer's sound speciality |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106991163A true CN106991163A (en) | 2017-07-28 |
Family
ID=59415935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710206783.2A Pending CN106991163A (en) | 2017-03-31 | 2017-03-31 | A kind of song recommendations method based on singer's sound speciality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106991163A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108417228A (en) * | 2018-02-02 | 2018-08-17 | 福州大学 | Voice tone color method for measuring similarity under instrument tamber migration |
CN109710797A (en) * | 2018-11-14 | 2019-05-03 | 腾讯科技(深圳)有限公司 | Method for pushing, device, electronic device and the storage medium of audio file |
CN109726310A (en) * | 2018-11-15 | 2019-05-07 | 量子云未来(北京)信息科技有限公司 | A kind of determination method, apparatus and storage medium for recommending music track |
CN109754820A (en) * | 2018-12-07 | 2019-05-14 | 百度在线网络技术(北京)有限公司 | Target audio acquisition methods and device, storage medium and terminal |
CN110083772A (en) * | 2019-04-29 | 2019-08-02 | 北京小唱科技有限公司 | Singer's recommended method and device based on singing skills |
CN110111773A (en) * | 2019-04-01 | 2019-08-09 | 华南理工大学 | The more New Method for Instrument Recognition of music signal based on convolutional neural networks |
CN110364182A (en) * | 2019-08-01 | 2019-10-22 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of audio signal processing method and device |
CN111475672A (en) * | 2020-03-27 | 2020-07-31 | 咪咕音乐有限公司 | Lyric distribution method, electronic equipment and storage medium |
CN111488485A (en) * | 2020-04-16 | 2020-08-04 | 北京雷石天地电子技术有限公司 | Music recommendation method based on convolutional neural network, storage medium and electronic device |
CN113140228A (en) * | 2021-04-14 | 2021-07-20 | 广东工业大学 | Vocal music scoring method based on graph neural network |
CN113593506A (en) * | 2021-08-03 | 2021-11-02 | 深圳媲客科技有限公司 | Singing scoring-based singing voice evaluation system |
CN113744759A (en) * | 2021-09-17 | 2021-12-03 | 广州酷狗计算机科技有限公司 | Tone template customizing method and device, equipment, medium and product thereof |
CN113762477A (en) * | 2021-09-08 | 2021-12-07 | 中山大学 | Method for constructing sequence recommendation model and sequence recommendation method |
CN113779219A (en) * | 2021-09-13 | 2021-12-10 | 内蒙古工业大学 | Question-answering method for embedding multiple knowledge maps by combining hyperbolic segmented knowledge of text |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095925A (en) * | 2016-06-12 | 2016-11-09 | 北京邮电大学 | A kind of personalized song recommendations system based on vocal music feature |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108417228B (en) * | 2018-02-02 | 2021-03-30 | 福州大学 | Human voice tone similarity measurement method under musical instrument tone migration |
CN108417228A (en) * | 2018-02-02 | 2018-08-17 | 福州大学 | Voice tone color method for measuring similarity under instrument tamber migration |
CN109710797B (en) * | 2018-11-14 | 2021-03-26 | 腾讯科技(深圳)有限公司 | Audio file pushing method and device, electronic device and storage medium |
CN109710797A (en) * | 2018-11-14 | 2019-05-03 | 腾讯科技(深圳)有限公司 | Method for pushing, device, electronic device and the storage medium of audio file |
CN109726310A (en) * | 2018-11-15 | 2019-05-07 | 量子云未来(北京)信息科技有限公司 | A kind of determination method, apparatus and storage medium for recommending music track |
CN109754820A (en) * | 2018-12-07 | 2019-05-14 | 百度在线网络技术(北京)有限公司 | Target audio acquisition methods and device, storage medium and terminal |
CN110111773A (en) * | 2019-04-01 | 2019-08-09 | 华南理工大学 | The more New Method for Instrument Recognition of music signal based on convolutional neural networks |
CN110083772A (en) * | 2019-04-29 | 2019-08-02 | 北京小唱科技有限公司 | Singer's recommended method and device based on singing skills |
CN110364182B (en) * | 2019-08-01 | 2022-06-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Sound signal processing method and device |
CN110364182A (en) * | 2019-08-01 | 2019-10-22 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of audio signal processing method and device |
CN111475672B (en) * | 2020-03-27 | 2023-12-08 | 咪咕音乐有限公司 | Lyric distribution method, electronic equipment and storage medium |
CN111475672A (en) * | 2020-03-27 | 2020-07-31 | 咪咕音乐有限公司 | Lyric distribution method, electronic equipment and storage medium |
CN111488485B (en) * | 2020-04-16 | 2023-11-17 | 北京雷石天地电子技术有限公司 | Music recommendation method based on convolutional neural network, storage medium and electronic device |
CN111488485A (en) * | 2020-04-16 | 2020-08-04 | 北京雷石天地电子技术有限公司 | Music recommendation method based on convolutional neural network, storage medium and electronic device |
CN113140228A (en) * | 2021-04-14 | 2021-07-20 | 广东工业大学 | Vocal music scoring method based on graph neural network |
CN113593506A (en) * | 2021-08-03 | 2021-11-02 | 深圳媲客科技有限公司 | Singing scoring-based singing voice evaluation system |
CN113762477A (en) * | 2021-09-08 | 2021-12-07 | 中山大学 | Method for constructing sequence recommendation model and sequence recommendation method |
CN113762477B (en) * | 2021-09-08 | 2023-06-30 | 中山大学 | Method for constructing sequence recommendation model and sequence recommendation method |
CN113779219A (en) * | 2021-09-13 | 2021-12-10 | 内蒙古工业大学 | Question-answering method for embedding multiple knowledge maps by combining hyperbolic segmented knowledge of text |
CN113779219B (en) * | 2021-09-13 | 2023-07-21 | 内蒙古工业大学 | Question-answering method for embedding multiple knowledge patterns by combining text hyperbolic segmentation knowledge |
CN113744759B (en) * | 2021-09-17 | 2023-09-22 | 广州酷狗计算机科技有限公司 | Tone color template customizing method and device, equipment, medium and product thereof |
CN113744759A (en) * | 2021-09-17 | 2021-12-03 | 广州酷狗计算机科技有限公司 | Tone template customizing method and device, equipment, medium and product thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106991163A (en) | A kind of song recommendations method based on singer's sound speciality | |
CN103823867B (en) | Humming type music retrieval method and system based on note modeling | |
CN110019931B (en) | Audio classification method and device, intelligent equipment and storage medium | |
CN102654859B (en) | Method and system for recommending songs | |
Typke | Music retrieval based on melodic similarity | |
CN109166564A (en) | For the method, apparatus and computer readable storage medium of lyrics text generation melody | |
CN103824565A (en) | Humming music reading method and system based on music note and duration modeling | |
Saitis et al. | Perceptual evaluation of violins: A psycholinguistic analysis of preference verbal descriptions by experienced musicians | |
CN105575393A (en) | Personalized song recommendation method based on voice timbre | |
CN110111773A (en) | The more New Method for Instrument Recognition of music signal based on convolutional neural networks | |
Streich | Music complexity: a multi-faceted description of audio content | |
CN110209869A (en) | A kind of audio file recommended method, device and storage medium | |
CN106898339B (en) | Song chorusing method and terminal | |
CN106095925A (en) | A kind of personalized song recommendations system based on vocal music feature | |
CN106375780A (en) | Method and apparatus for generating multimedia file | |
Rocha et al. | Segmentation and timbre-and rhythm-similarity in Electronic Dance Music | |
CN106302987A (en) | A kind of audio frequency recommends method and apparatus | |
CN109920446A (en) | A kind of audio data processing method, device and computer storage medium | |
US9037278B2 (en) | System and method of predicting user audio file preferences | |
CN112632318A (en) | Audio recommendation method, device and system and storage medium | |
Elowsson et al. | Predicting the perception of performed dynamics in music audio with ensemble learning | |
Mehrabi et al. | Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders | |
CN105825868A (en) | Singer effective range extraction method | |
Foucard et al. | Multi-scale temporal fusion by boosting for music classification. | |
CN106970950B (en) | Similar audio data searching method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170728 |