CN110660383A - Singing scoring method based on lyric and singing alignment - Google Patents
- Publication number: CN110660383A (application CN201910890520.7A)
- Authority
- CN
- China
- Prior art keywords
- singing voice
- singing
- score
- user
- amplitude
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/0272—Voice signal separating (under G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation)
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit (under G10L15/00—Speech recognition)
- G10L15/26—Speech to text systems (under G10L15/00—Speech recognition)
- G10L21/0208—Noise filtering (under G10L21/02—Speech enhancement)
- G10L25/03—Speech or voice analysis techniques characterised by the type of extracted parameters (under G10L25/00)
- G10L25/69—Speech or voice analysis specially adapted for evaluating synthetic or decoded voice signals (under G10L25/48)
Abstract
The invention discloses a singing scoring method based on the alignment of lyrics and singing voice, comprising the following steps in sequence: recording the song; separating the singing voice from the accompaniment and removing noise; extracting the fundamental (pitch) frequency and amplitude; aligning the lyrics with the singing voice sentence by sentence; segmenting the pitch contour of each word in the aligned singing voice; calculating a pitch similarity score; calculating a rhythm score from the duration of each sentence and the start and end times of each word in the user's and the standard singing voice; normalizing the amplitudes of the user's singing voice and the standard singing voice; calculating an amplitude similarity score; and multiplying the pitch score, rhythm score, and amplitude score by weight coefficients and summing them to obtain the composite score of the song. The method reduces the influence of accompaniment and noise on singing evaluation; it makes reasonable use of the lyric label information, so that the evaluation of the user's pitch and rhythm is more accurate; and it evaluates the user's song from multiple aspects, making the scoring result more objective and comprehensive.
Description
Technical Field
The invention relates to the technical field of speech signal processing, and in particular to a singing scoring method based on the alignment of lyrics and singing voice.
Background
With the development of the internet, demand for online singing entertainment has grown, and users pay increasing attention to how their singing ability ranks, so an accurate and comprehensive singing scoring method is necessary. A common industrial approach shifts the recorded audio to be scored by n candidate time offsets and searches for the offset that best aligns the recording with the standard audio in time, so as to improve the song's score. However, this method requires n searches per scoring pass to find the best score, and is still not accurate enough. Researchers have therefore proposed singing scoring based on dynamic time warping: the audio to be scored and the reference audio are collected and converted into pitch-frequency vectors; dynamic time warping is used to compute the path distance, which determines the intonation score of the audio to be scored; the degree of alignment determines its rhythm score; and the final score is derived from the intonation and rhythm scores. However, the warping path may distort the correspondence between the pitch contour being scored and the reference pitch contour, and the method considers only intonation and rhythm, evaluating the song purely on singing technique without considering emotion. Academic singing scoring methods are more advanced than industrial ones, but are also more complex.
Early singing scoring methods were mainly feature-matching methods: vocal features are extracted from the songs, and dynamic time warping (DTW) is used to compute the similarity distance between the user's vocal features and those of the standard song. Wu et al. extracted three features, fundamental frequency, Mel-frequency cepstral coefficients (MFCC), and sound intensity, and computed their similarity with a DTW algorithm to obtain the song score; Changhong Lin et al., also building on DTW, rated songs using features such as RMS energy, pitch, spectral centroid, spectral flatness, and spectral spread. Because these methods handle song rhythm and emotion poorly, Wei-Ho Tsai et al. improved on them by building a hidden Markov model (HMM) to judge whether the rhythm of the song being scored is correct while using DTW to compute vocal-feature similarity. However, this rhythm evaluation requires an independent HMM for every song, so the training cost is high and practical applicability is limited. PeiPai Chen et al. extracted five features related to a singer's enthusiasm and trained a support vector regression model on a large amount of data to predict it; Florian Eyben et al. used 205 singing-voice features to place emotion-label words in an arousal–valence space and trained a support vector machine to analyze a song's emotion in that space; Ning Zhang et al. built a densely connected convolutional neural network as a binary classification model, trained on a large amount of emotion-labeled song data, to evaluate the quality of a user's song end to end.
Although academia has studied singing scoring more deeply, these methods all require large amounts of data and training time; and while there is some research on song emotion analysis, its recognition accuracy is low and the recognized emotion types are limited, making it difficult to apply in practice.
In short, although singing scoring technology has developed considerably over the last decade, most academic methods are complex and have limitations that make them hard to apply directly in real life; as a result, the industry still uses simpler scoring methods whose scoring ability is insufficient.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and to provide a singing scoring method based on the alignment of lyrics and singing voice. The method uses lyric label information and automatic speech recognition to align the user's song audio, from which accompaniment and noise have been separated; compares the aligned user audio with the standard audio; computes the user's rhythm score, pitch score, and amplitude score; and finally combines the three into a composite score, thereby weakening the influence of accompaniment and noise on singing evaluation and assessing the user's song more accurately.
The purpose of the invention is realized by the following technical scheme:
A singing scoring method based on the alignment of lyrics and singing voice comprises the following steps:
S1, recording the song;
S2, separating the singing voice from the accompaniment and removing noise;
S3, extracting the fundamental (pitch) frequency and amplitude;
S4, aligning the lyrics with the singing voice sentence by sentence;
S501, segmenting the pitch contour of each word in the aligned singing voice;
S502, calculating the pitch similarity score ScoreP;
S6, calculating the rhythm score ScoreR from the duration of each sentence and the start and end times of each word in the user's and the standard singing voice;
S701, normalizing the amplitudes of the user's singing voice and the standard singing voice;
S702, calculating the amplitude similarity score ScoreV;
S8, multiplying the pitch score, the rhythm score, and the amplitude score by weight coefficients and summing them to calculate the composite Score of the song.
The recorded song refers to song audio recorded in a non-professional recording environment, such as a daily environment or a karaoke environment.
Step S2 proceeds as follows: the singing voice and the accompaniment are separated with a sound-source separation algorithm, and noise removal is then applied to the separated singing voice.
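The patent does not fix a particular noise-removal algorithm here; the embodiment later mentions zeroing out signal below frequency and amplitude thresholds. A minimal sketch of such an amplitude gate, with an assumed threshold value, might look like:

```python
# Illustrative sketch only (not the patent's exact algorithm): a simple noise
# gate that zeroes samples whose magnitude falls below a chosen threshold.
# The threshold value 0.02 is an assumption for illustration.

def noise_gate(samples, amp_threshold=0.02):
    """Zero out samples whose absolute amplitude is below amp_threshold."""
    return [s if abs(s) >= amp_threshold else 0.0 for s in samples]

gated = noise_gate([0.5, 0.01, -0.3, 0.005])  # quiet samples are zeroed
```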
Lyric–singing alignment takes each sentence of lyrics as the division unit: each lyric sentence is aligned with the corresponding singing audio using automatic speech recognition, dividing the singing voice into a set A = {A1, A2, ..., An}, where Ai denotes the singing voice of the i-th sentence aligned with the lyrics.
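Once a speech recognizer has produced word-level timestamps, building the sentence-level set A = {A1, ..., An} reduces to a grouping step. A sketch under the assumption that the number of words per lyric sentence is known from the lyric labels:

```python
# Hypothetical helper (not named in the patent): group ASR word timestamps
# into per-sentence lists A_1..A_n using the known word count of each
# lyric sentence.

def group_words_into_sentences(word_times, sentence_word_counts):
    """word_times: list of (start, end) tuples, one per recognized word.
    sentence_word_counts: words per lyric sentence, from the lyric labels."""
    sentences, idx = [], 0
    for count in sentence_word_counts:
        sentences.append(word_times[idx:idx + count])
        idx += count
    return sentences
```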
Steps S501 and S502 comprise:
For the i-th sentence of singing voice Ai, identify the pronunciation region of each word and record its start and end times within the sentence, dividing the sentence into a pitch set Pi = {Pi1, Pi2, ..., Pim}, where Pij corresponds to the j-th word of the i-th lyric sentence.
Calculate the pitch similarity of the i-th sentence Ai by comparing and matching the user's pitch against that of the standard singing voice: denote the user's pitch set Pci = {Pci1, Pci2, ..., Pcim} and the standard set Psi = {Psi1, Psi2, ..., Psim}. Before comparing Pcij and Psij, unify their start times at the same point, then compute the similarity with the feature similarity measurement algorithm.
The song rhythm evaluation comprises overall rhythm evaluation and local rhythm evaluation. Overall rhythm evaluation takes each aligned sentence as the evaluation unit and compares the user's sentence duration Tci with the standard duration Tsi. Local rhythm evaluation takes each word in the aligned sentence as the evaluation unit and compares the start and end times of the user's pitch segment Pcij with those of the standard segment Psij.
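The patent gives no explicit formulas for these two rhythm differences; a minimal sketch consistent with the description (sentence durations, word start/end times) could be:

```python
# Assumed formulations, not taken verbatim from the patent: absolute
# differences of durations (overall) and of word boundaries (local).

def overall_rhythm_diff(tc, ts):
    """Difference between user sentence duration tc and standard duration ts."""
    return abs(tc - ts)

def local_rhythm_diff(user_word, std_word):
    """Sum of start-time and end-time differences for one word.
    user_word / std_word are (start, end) tuples in seconds."""
    return abs(user_word[0] - std_word[0]) + abs(user_word[1] - std_word[1])
```

A larger difference would map to a worse rhythm score, matching the "the larger the difference, the worse the score" rule in the embodiment.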
Calculating the amplitude similarity score involves two comparisons for each sentence of singing voice Ai: comparing the user amplitude Vci with the standard amplitude Vsi, and comparing their average amplitudes mean(Vci) and mean(Vsi). The amplitude similarity is computed with the feature similarity measurement algorithm, and the two comparisons are combined into the amplitude score.
The overall rhythm score and the amplitude similarity score each take a sentence of singing voice Ai as the evaluation unit; for the whole song they are the sums of the corresponding per-sentence feature scores: Score_f = Σi fi, where fi is the evaluation score of the given feature for sentence Ai.
The pitch score and the local rhythm score each take a word Aij within each sentence as the evaluation unit; for the whole song they are the sums of the corresponding per-word feature scores: Score_f = Σi Σj fij, where fij is the evaluation score of the given feature for word Aij.
In step S8, the weight coefficients λi may be assigned appropriate values manually, with the λi summing to 1; alternatively, a regression model may be built to fit the λi so that the result matches human auditory perception. The composite score is calculated as: Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
The feature similarity measurement algorithm proceeds as follows: convert the user's and the standard singing-voice features into feature vectors Fc and Fs; compare their lengths and zero-pad the shorter vector until the two have the same length, obtaining F'c and F's; then measure their similarity with the Euclidean distance:
F_euclidean = ||F'c − F's|| = sqrt(Σk (F'c(k) − F's(k))²)
where F_euclidean is the Euclidean distance between the feature vectors; the smaller F_euclidean, the greater the similarity, and vice versa.
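The measurement algorithm described above (zero-pad the shorter vector, then take the Euclidean distance) can be sketched directly:

```python
import math

def feature_similarity_distance(fc, fs):
    """Zero-pad the shorter feature vector, then return the Euclidean
    distance; smaller distance means greater similarity."""
    n = max(len(fc), len(fs))
    fc_p = list(fc) + [0.0] * (n - len(fc))
    fs_p = list(fs) + [0.0] * (n - len(fs))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fc_p, fs_p)))
```

How a distance maps to a score on, say, a 0–100 scale is left open by the patent, so no such mapping is assumed here.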
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention can give an objective and accurate score to songs recorded in a non-professional environment. In real life, accompaniment usually plays while the user records a song, and the recording is further affected by the recording equipment and the surrounding environment, so the recording is not the user's singing voice alone but a mixture of several sounds. Feature matching such a mixture against the standard audio lowers the measured similarity. The invention therefore separates the user's singing voice from the accompaniment in the mixed audio and denoises the separated voice before feature matching, avoiding the influence of accompaniment and noise and making the song score more accurate.
(2) Each sentence of lyrics is aligned with the singing audio using the lyric label information and automatic speech recognition, constraining the user's singing voice and the standard singing voice to the same comparison region. Feature similarity is then compared per sentence or per word within that region, which improves the accuracy of the feature evaluation results.
(3) Volume reflects, to some extent, the singer's emotion: sad songs are generally sung more quietly and happy songs more loudly, and since volume is determined by amplitude, evaluating the song by amplitude indirectly reflects the user's emotion. The amplitudes of the user's and the standard singing voice are normalized before the similarity comparison; normalization reduces amplitude differences caused by extraneous factors, so the computed amplitude similarity indicates whether the emotion the user puts into the song matches that of the standard song.
(4) Evaluating the song by combining pitch, rhythm, and amplitude reflects the user's singing level more comprehensively, and the weight coefficients of the three aspects can be adjusted when computing the composite score so that the result better matches human auditory perception and scoring standards.
Drawings
Fig. 1 is a flowchart of a singing scoring method based on lyric and singing alignment according to the present invention.
FIG. 2 is a modular schematic of an embodiment of the method of FIG. 1.
FIG. 3 is a diagram illustrating the user's singing voice aligned with the lyrics and the standard singing voice.
Fig. 4 is a schematic diagram showing the rhythm difference between the user's singing voice and the standard singing voice after a certain sentence of lyrics is aligned.
FIG. 5 shows a pitch subsequence Pij before and after pitch alignment.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Referring to fig. 1 to 5, a singing scoring method based on alignment of lyrics and singing voice comprises the following steps in sequence:
s1, recording songs;
s2, separating singing voice accompaniment and removing noise;
s3, extracting fundamental tone frequency and amplitude;
s4, aligning the lyrics with the singing voice by taking sentences as units;
s501, dividing the pitch frequency of each word in the aligned singing voice;
s502, calculating a fundamental tone frequency similarity score ScoreP;
s6, calculating a rhythm score ScoreR of the song according to the singing voice of the user, the duration of each sentence of the standard singing voice and the starting and ending time of each word;
s701, normalizing the amplitudes of the singing voice of the user and the standard singing voice;
s702, calculating an amplitude similarity score ScoreV;
S8, multiplying the pitch score, the rhythm score, and the amplitude score by weight coefficients λi and summing them to calculate the composite Score of the song.
The recorded songs refer to song audio recorded in a non-professional recording environment, and the non-professional recording environment comprises a daily environment and a karaoke environment.
The step S2 specifically includes: separating singing voice and accompaniment by using a sound source separation algorithm, and carrying out noise removal processing on the separated singing voice.
In step S4, lyric–singing alignment means that each sentence of lyrics is used as the division unit; each lyric sentence is aligned with the corresponding singing audio using automatic speech recognition, dividing the singing voice into a set A = {A1, A2, ..., An}, where Ai denotes the singing voice of the i-th sentence aligned with the lyrics, and 1 ≤ i ≤ n.
Steps S501 and S502 specifically comprise:
For the singing voice Ai of the i-th aligned sentence, identify each pronunciation region and record its start and end times within the sentence, dividing the sentence into a pitch set Pi = {Pi1, Pi2, ..., Pim}, where Pij corresponds to the j-th word of the i-th lyric sentence;
calculate the pitch similarity of Ai by comparing and matching the user's pitch against that of the standard singing voice: denote the user's pitch set Pci = {Pci1, Pci2, ..., Pcim} and the standard set Psi = {Psi1, Psi2, ..., Psim}; before comparing Pcij with Psij, unify their start times at the same point, then compute the similarity with the feature similarity measurement algorithm;
with the j-th word Pij of the i-th sentence as the evaluation unit, the pitch score of the whole song is the sum of the pitch evaluation scores of all words Pij.
In step S6, the song rhythm evaluation comprises overall rhythm evaluation and local rhythm evaluation. Overall rhythm evaluation takes each aligned sentence as the evaluation unit and compares the user's sentence duration Tci with the standard duration Tsi; local rhythm evaluation takes each word in the aligned sentence as the evaluation unit and compares the start and end times of the user's pitch segment Pcij with those of the standard segment Psij.
With the singing voice Ai of the i-th aligned sentence as the evaluation unit, the overall rhythm score of the whole song is the sum of the per-sentence rhythm feature scores;
with the j-th word Pij of the i-th sentence as the evaluation unit, the local rhythm score of the whole song is the sum of the per-word local rhythm scores.
Step S702 specifically comprises: for the singing voice Ai of the i-th aligned sentence, comparing the user amplitude Vci with the standard amplitude Vsi, and comparing their average amplitudes mean(Vci) and mean(Vsi); the amplitude similarity is computed with the feature similarity measurement algorithm, and the two comparisons are combined into the amplitude score.
With each aligned sentence Ai as the evaluation unit, the amplitude score of the whole song is the sum of the per-sentence amplitude feature scores.
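The normalization scheme for the amplitudes is not specified in the patent; a sketch assuming simple peak normalization, together with the average-amplitude comparison it feeds, might be:

```python
# Assumed scheme: peak normalization to [0, 1]. The patent only requires
# that both amplitude sequences be normalized before comparison.

def normalize(amps):
    """Peak-normalize an amplitude sequence to the range [0, 1]."""
    peak = max(abs(a) for a in amps)
    return [abs(a) / peak for a in amps] if peak > 0 else list(amps)

def mean_amplitude(amps):
    """Average amplitude of a (normalized) sequence."""
    return sum(amps) / len(amps)
```

The per-sentence amplitude score would then combine a vector comparison of the normalized sequences with the absolute difference of their means, as described above.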
In step S8, the weight coefficients λi are either assigned preset values manually, with the λi summing to 1, or fitted by building a regression model so that the result matches human auditory perception. The composite score is calculated as:
Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
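The weighted combination in step S8 is a plain dot product; the example weights below are arbitrary placeholders (the patent only requires that the coefficients sum to 1):

```python
def composite_score(score_r, score_p, score_v, weights=(0.4, 0.4, 0.2)):
    """Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
    The default weights are illustrative, not from the patent."""
    l1, l2, l3 = weights
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9, "weight coefficients must sum to 1"
    return l1 * score_r + l2 * score_p + l3 * score_v
```

The patent also allows fitting the λi with a regression model against human ratings; that variant would replace the fixed tuple with learned coefficients.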
The feature similarity measurement algorithm specifically comprises: converting the user's and the standard singing-voice features into feature vectors Fc and Fs; comparing their lengths and zero-padding the shorter vector until the two are of equal length, obtaining F'c and F's; and measuring their similarity with the Euclidean distance:
F_euclidean = ||F'c − F's|| = sqrt(Σk (F'c(k) − F's(k))²)
where F_euclidean is the Euclidean distance between the feature vectors; the smaller F_euclidean, the greater the similarity, and vice versa.
Specifically, the method comprises the following steps:
a singing scoring method based on lyric singing alignment comprises the following steps:
(1) recording songs sung by a user in a non-professional recording environment such as KTV and uploading the songs to a cloud server;
(2) estimating the score of the user's song in the cloud server, through the following steps:
(2.1) Song preprocessing. Separate the singing voice from the accompaniment, for example by harmonic/percussive source separation or by a U-Net-based singing-voice separation technique, to obtain accompaniment-free singing voice; then denoise the singing audio, for example by setting frequency and amplitude thresholds and zeroing out and removing audio signal below both thresholds. Finally, extract the fundamental (pitch) frequency and amplitude of the singing voice and pass them to the audio segmentation and alignment module.
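The patent does not name a pitch-extraction algorithm; a common choice for singing voice is autocorrelation over short frames, sketched here under that assumption:

```python
import math

def estimate_f0(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental frequency of one audio frame by picking the
    lag with maximum autocorrelation inside an assumed vocal range
    (fmin..fmax Hz). Illustrative only; real systems use refined variants."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0
```

Running this over successive frames yields the pitch contour that the later segmentation and similarity steps operate on.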
(2.2) Audio segmentation and alignment. The cloud server retrieves the lyrics of the recorded song from the database and divides them into sentence-level evaluation units; it then recognizes the user's singing voice according to the divided sentences using automatic speech recognition, segments the singing audio, and aligns it with the lyric sentences, yielding the aligned singing-voice set A = {A1, A2, ..., An}. The server likewise segments and aligns the standard singing voice, or directly loads the already segmented and aligned version from the database. The user's singing voice is then aligned with the standard singing voice with the lyrics as the common reference, as shown in FIG. 3, where the sequence Pi = {Pi1, Pi2, ..., Pim} is the set of audio segments corresponding to the words of the i-th lyric sentence, Pcij is the user's pitch segment, Psij is the standard pitch segment, and the horizontal axis is the distribution of the sequence Pij over time.
(2.3) Rhythm scoring. As shown in FIG. 4, the overall rhythm score takes each sentence Ai as the evaluation unit, using the difference between the user's duration Tci and the standard duration Tsi as the index: the larger |Tsi − Tci|, the worse the overall rhythm score Rhythm1. The local rhythm score takes each subsequence Pij as the unit, comparing the start and end times of the user's subsequence Pcij with those of the standard subsequence Psij: the larger the difference, the worse the local rhythm score Rhythm2. The overall and local rhythm scores are combined into the rhythm score ScoreR.
(2.4) Pitch scoring. The pitch score takes each subsequence Pij as the evaluation unit; each Pij must first be aligned, as shown in FIG. 5. The singing voice is then converted into feature-vector form, the shorter vector is zero-padded so that the user's and the standard feature vectors have the same length, and the similarity of the two vectors is measured with the Euclidean distance: the smaller the distance, the greater the similarity and the higher the pitch score ScoreP.
(2.5) Amplitude scoring. With each sentence Ai as the evaluation unit, the amplitudes of the user's and the standard singing voice are normalized, reducing differences caused by their different recording environments. The amplitudes are then expressed as vectors, the shorter vector is zero-padded to equal length, and the Euclidean distance between the two is computed as the amplitude score Volume1; the average volume of each evaluation unit is also computed, and the absolute distance between the two averages is the amplitude score Volume2. The two are combined into the amplitude score ScoreV.
(2.6) Comprehensive scoring. The rhythm score ScoreR, the pitch score ScoreP, and the amplitude score ScoreV are each multiplied by a manually assigned weight coefficient λi and summed to obtain the comprehensive Score of the user audio. The weight coefficients sum to 1, and the calculation formula is as follows:
Score=λ1*ScoreR+λ2*ScoreP+λ3*ScoreV
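As a minimal illustration of this weighted sum under the constraint λ1 + λ2 + λ3 = 1 (the example weights are illustrative, not values given in the patent):

```python
# Comprehensive score of (2.6): a weighted sum with weights summing to 1.
# The default weights below are hypothetical examples.

def composite_score(score_r, score_p, score_v, weights=(0.3, 0.5, 0.2)):
    assert abs(sum(weights) - 1.0) < 1e-9, "weight coefficients must sum to 1"
    l1, l2, l3 = weights
    return l1 * score_r + l2 * score_p + l3 * score_v
```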
(3) The comprehensive Score is downloaded from the cloud server to the display terminal and fed back to the user.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (9)
1. A singing scoring method based on lyric and singing alignment, characterized by comprising the following steps:
s1, recording songs;
s2, separating singing voice accompaniment and removing noise;
s3, extracting fundamental tone frequency and amplitude;
s4, aligning the lyrics with the singing voice by taking sentences as units;
s501, dividing the pitch frequency of each word in the aligned singing voice;
s502, calculating a fundamental tone frequency similarity score ScoreP;
s6, calculating a rhythm score ScoreR of the song according to the singing voice of the user, the duration of each sentence of the standard singing voice and the starting and ending time of each word;
s701, normalizing the amplitudes of the singing voice of the user and the standard singing voice;
s702, calculating an amplitude similarity score ScoreV;
s8, multiplying the fundamental tone frequency score, the rhythm score and the amplitude score by weight coefficients λi and adding them to calculate a composite Score for the song.
2. The singing scoring method according to claim 1, wherein the recorded song is song audio recorded in a non-professional recording environment, the non-professional recording environment comprising a daily-life environment and a karaoke environment.
3. The singing scoring method according to claim 1, wherein step S2 specifically comprises: separating the singing voice and the accompaniment using a sound source separation algorithm, and performing noise removal processing on the separated singing voice.
4. The singing scoring method according to claim 1, wherein in step S4, lyric-singing alignment means that each line of lyrics is aligned with its corresponding singing voice audio using automatic speech recognition technology, dividing the singing voice into a set of sentences A={A1,A2,...,An}, where Ai denotes the singing voice aligned with the i-th lyric line and 1 ≤ i ≤ n.
5. The singing scoring method based on the alignment of lyrics and singing as claimed in claim 1, wherein steps S501 and S502 specifically comprise:
for the singing voice Ai aligned with the i-th lyric line, identifying each pronunciation region, recording its start time and end time within the sentence, and dividing it into a pitch frequency set Pi={Pi1,Pi2,...,Pim}, where Pij denotes the j-th word of the i-th lyric line;
when calculating the pitch frequency similarity of the singing voice Ai aligned with the i-th lyric line, comparing and matching the user's pitch frequencies against those of the standard singing voice, the pitch frequency set of the user singing voice being denoted Pci={Pci1,Pci2,...,Pcim} and that of the standard singing voice Psi={Psi1,Psi2,...,Psim}; when comparing Pcij and Psij, their start times are first unified to the same starting point, and the similarity is then calculated using a feature similarity measurement algorithm;
with the j-th word Pij of the i-th lyric line as the evaluation unit, the pitch frequency score of the entire song is the sum of the pitch frequency evaluation scores corresponding to each word Pij in the singing voice.
6. The singing scoring method according to claim 1, wherein in step S6 the song rhythm is evaluated through an overall rhythm evaluation and a local rhythm evaluation: the overall rhythm evaluation takes each aligned sentence of singing voice as the evaluation unit and compares the user singing voice duration Tci with the standard singing voice duration Tsi; the local rhythm evaluation takes each word in the aligned singing voice as the evaluation unit and compares the differences in start and end times between the user singing voice pitch frequency Pcij and the standard singing voice pitch frequency Psij;
with the singing voice Ai aligned with the i-th lyric line as the evaluation unit, the overall rhythm score of the entire song is the sum of the rhythm feature scores corresponding to each sentence of singing voice;
with the j-th word Pij of the i-th lyric line as the evaluation unit, the local rhythm score of the entire song is the sum of the local rhythm scores corresponding to each word Pij in the singing voice.
7. The singing scoring method according to claim 1, wherein step S702 specifically comprises: comparing the user amplitude Vci and the standard amplitude Vsi corresponding to the singing voice Ai aligned with the i-th lyric line, as well as their average amplitudes, calculating the amplitude similarity using a feature similarity measurement algorithm, and synthesizing the two aspects to obtain the amplitude score;
with the singing voice Ai aligned with the i-th lyric line as the evaluation unit, the amplitude score of the entire song is the sum of the amplitude feature scores corresponding to each sentence.
8. The singing scoring method based on the alignment of lyrics and singing voice of claim 1, wherein in step S8 the weight coefficients λi are manually preset values whose sum is 1;
or, the weight coefficients λi are fitted by establishing a regression model so that the result accords with human auditory perception, the composite score being calculated as:
Score=λ1*ScoreR+λ2*ScoreP+λ3*ScoreV.
9. The singing scoring method based on the alignment of the lyrics and the singing voice according to claim 5 or 7, characterized in that the feature similarity measurement algorithm specifically comprises: converting the user singing voice features and the standard singing voice features into feature vectors Fc and Fs; comparing the lengths of the user singing voice feature vector Fc and the standard singing voice feature vector Fs, and zero-filling the shorter feature vector until the two have the same length, yielding equal-length zero-padded feature vectors Fc' and Fs'; and measuring the similarity between the two feature vectors by the Euclidean distance, calculated as:
Feuclidean = sqrt( Σk (Fc'(k) − Fs'(k))² )
where Feuclidean is the Euclidean distance between the feature vectors; the smaller Feuclidean, the greater the similarity, and vice versa.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910890520.7A CN110660383A (en) | 2019-09-20 | 2019-09-20 | Singing scoring method based on lyric and singing alignment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110660383A true CN110660383A (en) | 2020-01-07 |
Family
ID=69037426
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110660383A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101894552A (en) * | 2010-07-16 | 2010-11-24 | 安徽科大讯飞信息科技股份有限公司 | Speech spectrum segmentation based singing evaluating system |
CN107507628A (en) * | 2017-08-31 | 2017-12-22 | 广州酷狗计算机科技有限公司 | Singing methods of marking, device and terminal |
CN107978308A (en) * | 2017-11-28 | 2018-05-01 | 广东小天才科技有限公司 | A kind of K songs methods of marking, device, equipment and storage medium |
CN108492835A (en) * | 2018-02-06 | 2018-09-04 | 南京陶特思软件科技有限公司 | A kind of methods of marking of singing |
CN108922562A (en) * | 2018-06-15 | 2018-11-30 | 广州酷狗计算机科技有限公司 | Sing evaluation result display methods and device |
CN109300485A (en) * | 2018-11-19 | 2019-02-01 | 北京达佳互联信息技术有限公司 | Methods of marking, device, electronic equipment and the computer storage medium of audio signal |
CN109448754A (en) * | 2018-09-07 | 2019-03-08 | 南京光辉互动网络科技股份有限公司 | A kind of various dimensions singing marking system |
Non-Patent Citations (4)
Title |
---|
M. Narasimhamurty (trans. Wang Zhenyong): "Pattern Recognition: Algorithms and Implementation Methods", Harbin: Harbin Institute of Technology Press, 31 December 2017 *
Lin Weiwei: "A Docker cluster scheduling strategy based on genetic algorithms", Journal of South China University of Technology (Natural Science Edition) *
Wang Jiadi: "Research on robust music scoring methods", China Master's Theses Full-text Database, Information Science and Technology series *
Wang Qixiang: "Modern Public Address Technology and Engineering Cases", Beijing: National Defense Industry Press, 31 August 2011 *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210850A (en) * | 2020-01-10 | 2020-05-29 | 腾讯音乐娱乐科技(深圳)有限公司 | Lyric alignment method and related product |
CN111210850B (en) * | 2020-01-10 | 2021-06-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Lyric alignment method and related product |
CN111369975A (en) * | 2020-03-17 | 2020-07-03 | 郑州工程技术学院 | University music scoring method, device, equipment and storage medium based on artificial intelligence |
WO2021245234A1 (en) * | 2020-06-05 | 2021-12-09 | Sony Group Corporation | Electronic device, method and computer program |
CN112133269A (en) * | 2020-09-22 | 2020-12-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method, device, equipment and medium |
CN112133269B (en) * | 2020-09-22 | 2024-03-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method, device, equipment and medium |
CN113096689A (en) * | 2021-04-02 | 2021-07-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Song singing evaluation method, equipment and medium |
CN112802494A (en) * | 2021-04-12 | 2021-05-14 | 北京世纪好未来教育科技有限公司 | Voice evaluation method, device, computer equipment and medium |
CN112802494B (en) * | 2021-04-12 | 2021-07-16 | 北京世纪好未来教育科技有限公司 | Voice evaluation method, device, computer equipment and medium |
CN113853047A (en) * | 2021-09-29 | 2021-12-28 | 深圳市火乐科技发展有限公司 | Light control method and device, storage medium and electronic equipment |
CN114093386A (en) * | 2021-11-10 | 2022-02-25 | 厦门大学 | Education-oriented multi-dimensional singing evaluation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110660383A (en) | Singing scoring method based on lyric and singing alignment | |
Paulus et al. | Measuring the similarity of Rhythmic Patterns. | |
Ryynänen et al. | Automatic transcription of melody, bass line, and chords in polyphonic music | |
Mesaros et al. | Singer identification in polyphonic music using vocal separation and pattern recognition methods. | |
Nakano et al. | An automatic singing skill evaluation method for unknown melodies using pitch interval accuracy and vibrato features | |
CN104272382B (en) | Personalized singing synthetic method based on template and system | |
Ryynänen et al. | Transcription of the Singing Melody in Polyphonic Music. | |
Tsai et al. | Automatic evaluation of karaoke singing based on pitch, volume, and rhythm features | |
CN109979488B (en) | System for converting human voice into music score based on stress analysis | |
Kroher et al. | Automatic transcription of flamenco singing from polyphonic music recordings | |
CN109545191B (en) | Real-time detection method for initial position of human voice in song | |
Lagrange et al. | Normalized cuts for predominant melodic source separation | |
Pandit et al. | Feature selection for a DTW-based speaker verification system | |
Fujihara et al. | F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search | |
Toh et al. | Multiple-Feature Fusion Based Onset Detection for Solo Singing Voice. | |
CN115050387A (en) | Multi-dimensional singing playing analysis evaluation method and system in art evaluation | |
Dzhambazov et al. | On the use of note onsets for improved lyrics-to-audio alignment in turkish makam music | |
Jha et al. | Assessing vowel quality for singing evaluation | |
CN109410968B (en) | Efficient detection method for initial position of voice in song | |
Ikemiya et al. | Transcribing vocal expression from polyphonic music | |
CN113823270B (en) | Determination method, medium, device and computing equipment of rhythm score | |
Barthet et al. | Speech/music discrimination in audio podcast using structural segmentation and timbre recognition | |
CN111681674B (en) | Musical instrument type identification method and system based on naive Bayesian model | |
Ikemiya et al. | Transferring vocal expression of f0 contour using singing voice synthesizer | |
CN113129923A (en) | Multi-dimensional singing playing analysis evaluation method and system in art evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |
Application publication date: 2020-01-07