CN110660383A - Singing scoring method based on lyric and singing alignment - Google Patents

Singing scoring method based on lyric and singing alignment

Info

Publication number
CN110660383A
CN110660383A (application CN201910890520.7A)
Authority
CN
China
Prior art keywords
singing voice
singing
score
user
amplitude
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910890520.7A
Other languages
Chinese (zh)
Inventor
林伟伟
胡康立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910890520.7A
Publication of CN110660383A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/69 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention discloses a singing scoring method based on lyric-singing alignment, which comprises the following steps in sequence: recording the song; separating the singing voice from the accompaniment and removing noise; extracting the pitch frequency and the amplitude; aligning the lyrics with the singing voice sentence by sentence; segmenting the pitch frequency of each word in the aligned singing voice; calculating a pitch-frequency similarity score; calculating a rhythm score from the per-sentence durations and per-word start and end times of the user and standard singing voices; normalizing the amplitudes of the user singing voice and the standard singing voice; calculating an amplitude similarity score; and multiplying the pitch-frequency score, the rhythm score and the amplitude score by weight coefficients and summing them to obtain the comprehensive score of the song. The method reduces the influence of accompaniment and noise on singing evaluation; it makes reasonable use of the lyric label information, so that the evaluation of the user's pitch frequency and rhythm is more accurate; and it evaluates the user's song from multiple aspects, making the scoring result more objective and comprehensive.

Description

Singing scoring method based on lyric and singing alignment
Technical Field
The invention relates to the technical field of voice signal processing, and in particular to a singing scoring method based on lyric-singing alignment.
Background
With the development of the internet, the demand for online singing entertainment keeps growing, and users pay increasing attention to how their singing ability is ranked, so an accurate and comprehensive singing scoring method is highly desirable. A common industrial approach translates the recorded audio to be scored by n candidate time offsets to search for a better temporal correspondence between the recorded audio and the standard audio, and uses the best match to improve the song score. However, this method requires n searches per scoring to pick the optimal candidate, and is still not accurate enough. Researchers have therefore proposed a singing scoring method based on dynamic time warping: the audio to be scored and the reference audio are collected and converted into pitch-frequency vectors; the path distance computed by dynamic time warping determines the intonation score, the degree of alignment determines the rhythm score, and the final score is determined from the two. However, the warped path may distort the correspondence between the pitch-frequency trajectory to be scored and the reference trajectory, and the method considers only intonation and rhythm, evaluating the song purely on singing technique without addressing emotion.
Academic singing scoring methods are more advanced than industrial ones, but also more complex. Early methods were mainly feature-matching approaches: extract vocal features from songs and compute the similarity distance between the user's features and the standard song's features using dynamic time warping (DTW). Wu Guoxi extracted three features, pitch frequency, Mel-frequency cepstral coefficients (MFCC) and sound intensity, and computed their similarity with the DTW algorithm to obtain a song score; Changhong Lin et al., also based on the DTW algorithm, rated songs using features such as RMS energy, pitch, spectral centroid, spectral flatness and spectral spread. These methods, however, handle song rhythm and emotion poorly, so Wei-Ho Tsai et al. improved on them by building a hidden Markov model (HMM) to judge whether the rhythm of the song to be scored is correct while computing vocal-feature similarity with the DTW algorithm. But this rhythm evaluation requires an independent HMM for every song, so the training cost is high and practical applicability is limited. PeiPai Chen et al. extracted five features related to the singer's expressiveness and trained a support vector regression model on a large dataset to predict it; Florian Eyben et al. used 205 singing-voice features to partition emotion label words in the arousal-valence space and trained a support vector machine to analyze song emotion in that space.
Ning Zhang et al. used a large amount of emotion-labeled song data to build a binary classification model based on a densely connected convolutional neural network, evaluating the quality of a user's song end to end. Although academic research on singing scoring is deeper, these methods require large amounts of data and training time; and although song emotion analysis has been studied to some extent, its recognition accuracy is low and the recognized emotion categories are limited, so these methods are hard to apply in practice.
In short, although singing scoring technology has advanced considerably over the past decade, most academic methods are complex and limited and are difficult to apply directly in real life; the industry therefore still relies on simpler scoring methods, whose scoring ability is insufficient.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provide a singing scoring method based on the alignment of lyrics and singing voice. The method uses the lyric label information and automatic speech recognition to align the user's song audio, from which accompaniment and noise have been separated, compares the aligned user audio with the standard audio, calculates the user's rhythm score, pitch-frequency score and amplitude score, and finally combines the three into a comprehensive score, thereby weakening the influence of accompaniment and noise on singing evaluation and scoring the user's song more accurately.
The purpose of the invention is realized by the following technical scheme:
A singing scoring method based on lyric-singing alignment comprises the following steps:
S1, recording the song;
S2, separating the singing voice from the accompaniment and removing noise;
S3, extracting the pitch (fundamental) frequency and the amplitude;
S4, aligning the lyrics with the singing voice sentence by sentence;
S501, segmenting the pitch frequency of each word in the aligned singing voice;
S502, calculating a pitch-frequency similarity score ScoreP;
S6, calculating a rhythm score ScoreR from the per-sentence durations and per-word start and end times of the user and standard singing voices;
S701, normalizing the amplitudes of the user singing voice and the standard singing voice;
S702, calculating an amplitude similarity score ScoreV;
S8, multiplying the pitch-frequency score, the rhythm score and the amplitude score by weight coefficients and summing them to obtain the comprehensive Score of the song.
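By way of illustration only, the following sketch shows how steps S1 to S8 might be chained in code; every helper named here (separate_vocals, denoise, extract_pitch_amplitude, align_lyrics and the three component scorers) is a hypothetical placeholder for the techniques detailed below, and the example weights are arbitrary.

```python
# Illustrative skeleton of steps S1-S8. All helper functions are
# hypothetical placeholders, not an API defined by the patent.
def score_song(user_audio, standard_audio, lyrics, weights=(0.4, 0.4, 0.2)):
    # S2: separate the singing voice from the accompaniment, then denoise it
    user_vocals = denoise(separate_vocals(user_audio))
    # S3: extract the pitch-frequency contour and the amplitude envelope
    user_f0, user_amp = extract_pitch_amplitude(user_vocals)
    std_f0, std_amp = extract_pitch_amplitude(standard_audio)
    # S4: align lyrics with the singing voice sentence by sentence
    user_sents = align_lyrics(user_vocals, lyrics)
    std_sents = align_lyrics(standard_audio, lyrics)
    # S501-S702: the three component scores
    score_p = pitch_score(user_f0, std_f0, user_sents, std_sents)        # ScoreP
    score_r = rhythm_score(user_sents, std_sents)                        # ScoreR
    score_v = amplitude_score(user_amp, std_amp, user_sents, std_sents)  # ScoreV
    # S8: weighted sum; the weights must sum to 1 (values here are arbitrary)
    l1, l2, l3 = weights
    return l1 * score_r + l2 * score_p + l3 * score_v
```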
The recorded song refers to song audio recorded in a daily environment or another non-professional recording environment such as a karaoke (KTV) environment.
Step S2 proceeds as follows: a sound source separation algorithm separates the singing voice from the accompaniment, and the separated singing voice is then denoised.
In the lyric-singing alignment, each sentence of lyrics is taken as a division unit. Using automatic speech recognition, each sentence of lyrics is aligned with the corresponding singing-voice audio, and the singing voice is divided into a set A = {A1, A2, ..., An}, where Ai denotes the singing voice of the i-th sentence aligned with the lyrics.
The steps S501 and S502 include:
For the i-th sentence of singing voice Ai, identify the pronunciation region of each word, record its start and end times within the sentence, and segment the sentence's pitch contour into a set Pi = {Pi1, Pi2, ..., Pim}, where Pij denotes the pitch-frequency segment of the j-th word of the i-th sentence of lyrics.
To calculate the pitch-frequency similarity of the i-th sentence Ai, the user's pitch frequencies are compared and matched against those of the standard singing voice. Denote the user's pitch-frequency set as Pci = {Pci1, Pci2, ..., Pcim} and the standard set as Psi = {Psi1, Psi2, ..., Psim}. Before comparing Pcij with Psij, their start times are shifted to a common starting point; the similarity is then calculated with the feature similarity measurement algorithm.
The song rhythm evaluation comprises an overall rhythm evaluation and a local rhythm evaluation. The overall rhythm evaluation takes each aligned sentence of singing voice as the evaluation unit and compares the user's sentence duration Tci with the standard sentence duration Tsi. The local rhythm evaluation takes each word within an aligned sentence as the evaluation unit and compares the start and end times of the user's pitch segment Pcij with those of the standard segment Psij.
In calculating the amplitude similarity score, two aspects are evaluated for each sentence of singing voice Ai: the user amplitude sequence Vci is compared with the standard amplitude sequence Vsi, and their average amplitudes are compared. The amplitude similarity is calculated with the feature similarity measurement algorithm, and the two aspects are combined into the amplitude score.
The overall rhythm score and the amplitude similarity score each take a sentence of singing voice Ai as the evaluation unit; the overall rhythm score and the amplitude score of the whole song are therefore the sums of the corresponding per-sentence feature scores:
Score = Σ(i=1..n) f_i
where f_i is the evaluation score of the given feature for the singing voice Ai.
The pitch-frequency score and the local rhythm score each take a word within a sentence of singing voice as the evaluation unit; the pitch-frequency score and the local rhythm score of the whole song are therefore the sums of the corresponding per-word feature evaluation scores:
Score = Σ(i=1..n) Σ(j=1..m) f_ij
where f_ij is the evaluation score of the given feature for the j-th word of the i-th sentence of singing voice.
In step S8, the weight coefficients λi may be assigned appropriate values manually, with the constraint that they sum to 1; alternatively, a regression model may be built to fit the λi so that the score conforms to human auditory perception. The comprehensive score is calculated as: Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
The feature similarity measurement algorithm proceeds as follows: convert the user singing-voice feature and the standard singing-voice feature into feature vectors Fc and Fs; compare the lengths of Fc and Fs and zero-pad the shorter vector until the two lengths match, giving the padded vectors F'c and F's; then measure the similarity between the two vectors with the Euclidean distance:
Feuclidean = sqrt( Σk (F'c(k) - F's(k))^2 )
where Feuclidean is the Euclidean distance between the feature vectors; the smaller Feuclidean is, the greater the similarity, and vice versa.
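As a minimal illustration, this measurement can be sketched in Python/NumPy as follows, assuming the features have already been converted to one-dimensional vectors; the function name is illustrative.

```python
import numpy as np

def feature_similarity_distance(fc, fs):
    """Zero-pad the shorter feature vector, then return the Euclidean
    distance F_euclidean between the padded vectors F'_c and F'_s
    (the smaller the distance, the greater the similarity)."""
    fc = np.asarray(fc, dtype=float)
    fs = np.asarray(fs, dtype=float)
    n = max(len(fc), len(fs))
    fc_pad = np.pad(fc, (0, n - len(fc)))  # F'_c: user feature, zero-padded
    fs_pad = np.pad(fs, (0, n - len(fs)))  # F'_s: standard feature, zero-padded
    return float(np.sqrt(np.sum((fc_pad - fs_pad) ** 2)))
```

The patent does not specify how the distance is mapped to a score; any monotonically decreasing mapping, for example score = 1 / (1 + Feuclidean), would be one possible choice.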
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention can give an objective and accurate score to songs recorded in non-professional environments. In real life, the accompaniment usually plays along while the user records, and the recording is further affected by the recording device and the surrounding environment, so the recorded track is not the user's pure singing voice but a mixture of several sound sources. When such audio is feature-matched against the standard audio, the similarity drops. The invention therefore separates the user's singing voice from the accompaniment in the mixed audio and denoises the separated voice before feature matching, removing the influence of accompaniment and noise and making the song score more accurate.
(2) Each sentence of lyrics is aligned with the singing-voice audio using the lyric label information and automatic speech recognition, confining the user's singing voice and the standard singing voice to the same comparison region. Feature similarity is then compared sentence by sentence or word by word within that region, which improves the accuracy of the feature evaluation results.
(3) Volume reflects, to some extent, the emotion of the singer: sad songs are generally sung more quietly and happy songs more loudly, and since volume is determined by amplitude, evaluating the song's amplitude indirectly reflects the user's emotion. The amplitudes of the user and standard singing voices are normalized before the similarity comparison; the normalization reduces amplitude differences caused by extraneous factors, so the computed amplitude similarity indicates whether the emotion the user puts into the song matches that of the standard song.
(4) Evaluating the song jointly on pitch frequency, rhythm and amplitude reflects the user's singing level more comprehensively, and the weight coefficients of the three aspects can be adjusted when computing the comprehensive score so that it better matches human auditory perception and scoring standards.
Drawings
Fig. 1 is a flowchart of a singing scoring method based on lyric and singing alignment according to the present invention.
FIG. 2 is a modular schematic of an embodiment of the method of FIG. 1.
FIG. 3 is a diagram illustrating the user's singing voice aligned with the lyrics and the standard singing voice.
Fig. 4 is a schematic diagram showing the rhythm difference between the user's singing voice and the standard singing voice after a certain sentence of lyrics is aligned.
FIG. 5 is a diagram of a pitch-frequency subsequence Pij before and after pitch-frequency alignment.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Referring to fig. 1 to 5, a singing scoring method based on alignment of lyrics and singing voice comprises the following steps in sequence:
S1, recording the song;
S2, separating the singing voice from the accompaniment and removing noise;
S3, extracting the pitch frequency and the amplitude;
S4, aligning the lyrics with the singing voice sentence by sentence;
S501, segmenting the pitch frequency of each word in the aligned singing voice;
S502, calculating a pitch-frequency similarity score ScoreP;
S6, calculating a rhythm score ScoreR of the song from the per-sentence durations and per-word start and end times of the user and standard singing voices;
S701, normalizing the amplitudes of the user singing voice and the standard singing voice;
S702, calculating an amplitude similarity score ScoreV;
S8, multiplying the pitch-frequency score, the rhythm score and the amplitude score by weight coefficients λi and summing them to obtain the comprehensive Score of the song.
The recorded songs refer to song audio recorded in a non-professional recording environment, and the non-professional recording environment comprises a daily environment and a karaoke environment.
The step S2 specifically includes: separating singing voice and accompaniment by using a sound source separation algorithm, and carrying out noise removal processing on the separated singing voice.
In step S4, the words and singing voice alignment means that each word of words is used as a division unit, and each word of words is aligned with the corresponding singing voice audio frequency by using an automatic speech recognition technique, and is divided into a singing voice set a ═ a1,A2,...,AnIn which A isiThe singing voice of the ith sentence aligned with the lyrics is expressed, and i is more than or equal to 1 and less than or equal to n.
Step S501 and step S502 specifically include:
For the singing voice Ai of the i-th aligned sentence of lyrics, identify each pronunciation region, record its start and end times within the sentence, and segment the pitch contour into a set Pi = {Pi1, Pi2, ..., Pim}, where Pij denotes the pitch segment of the j-th word of the i-th sentence of lyrics;
to calculate the pitch-frequency similarity of the aligned i-th sentence Ai, the user's pitch frequencies are compared and matched against those of the standard singing voice. Denote the user's pitch-frequency set as Pci = {Pci1, Pci2, ..., Pcim} and the standard set as Psi = {Psi1, Psi2, ..., Psim}. Before comparing Pcij with Psij, their start times are shifted to a common starting point, and the similarity is then calculated with the feature similarity measurement algorithm;
with the j-th word Pij of the i-th sentence as the evaluation unit, the pitch-frequency score of the entire song is the sum of the pitch-frequency evaluation scores of the words Pij.
In step S6, the evaluation of the tempo of the song includes overall tempo evaluation and local tempo evaluation: the overall rhythm evaluation takes the singing voice after each sentence is aligned as an evaluation unit, and the evaluation method is to compare the singing voice time length Tc of the useriAnd a standard singing voice duration TsiDifference of (2)(ii) a The local rhythm evaluation is based on the word in the singing voice aligned with each sentence as evaluation unit, and the evaluation method is to compare the pitch frequency Pc of the singing voice of the userijAnd the standard singing tone frequency PsijDifference in start and end times;
singing voice A of lyrics aligned in ith sentenceiThe integral rhythm score of the whole song is the total score of the rhythm characteristics of the singing voice corresponding to each sentence;
the jth word P of the lyric of the ith sentenceijFor the evaluation unit, the local rhythm score of the whole song is the word P in the singing voiceijThe corresponding local tempo score sums.
Step S702 specifically includes: comparing singing voice A of the ith sentence aligned with lyricsiCorresponding user amplitude VciAnd a standard amplitude VsiAverage amplitude of
Figure BDA0002208603040000072
And
Figure BDA0002208603040000073
calculating amplitude similarity by using a characteristic similarity measurement algorithm, and synthesizing the two aspects to obtain an amplitude score;
singing voice A of lyrics aligned in ith sentenceiFor the evaluation unit, the amplitude score of the whole song is the sum of the amplitude feature scores corresponding to each sentence.
In step S8, the weight coefficient λiA preset value is given artificially, and a weight coefficient lambdaiThe sum of (1);
or, the weight coefficient lambdaiFitting by establishing a regression model to ensure that the regression model accords with human auditory perception, wherein a comprehensive score calculation formula is as follows:
Socre=λ1*ScoreR+λ2*ScoreP+λ3*ScoreV。
The feature similarity measurement algorithm is specifically as follows: convert the user and standard singing-voice features into feature vectors Fc and Fs; compare the lengths of Fc and Fs and zero-pad the shorter vector until the two lengths match, giving the padded vectors F'c and F's; then measure the similarity between the two vectors with the Euclidean distance:
Feuclidean = sqrt( Σk (F'c(k) - F's(k))^2 )
where Feuclidean is the Euclidean distance between the feature vectors; the smaller Feuclidean is, the greater the similarity, and vice versa.
Specifically, the method comprises the following steps:
a singing scoring method based on lyric singing alignment comprises the following steps:
(1) The song sung by the user in a non-professional recording environment such as a KTV is recorded and uploaded to a cloud server;
(2) Score estimation is carried out on the user's song in the cloud server through the following steps:
(2.1) Song preprocessing. The singing voice and the accompaniment are separated using harmonic/percussive source separation or a U-Net based singing voice separation technique, yielding accompaniment-free singing voice. The singing-voice audio is then denoised, for example by setting frequency and amplitude thresholds and zeroing out audio signals below the two thresholds. Finally, the pitch frequency and amplitude of the singing voice are extracted and passed to the audio segmentation and alignment module.
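A rough, non-authoritative sketch of this preprocessing using librosa is given below; HPSS stands in for the separation step, and the frequency and amplitude thresholds are illustrative values, not taken from the patent.

```python
import numpy as np
import librosa

def preprocess(path, f0_min=80.0, f0_max=1000.0, amp_floor=0.01):
    """Sketch of step (2.1): separate, denoise, and extract F0 and amplitude.
    Threshold values here are illustrative assumptions."""
    y, sr = librosa.load(path, sr=None, mono=True)
    # Harmonic/percussive source separation: keep the harmonic part, which
    # contains the singing voice (a crude stand-in for U-Net separation).
    vocals, _ = librosa.effects.hpss(y)
    # F0 contour via pYIN; frames outside [f0_min, f0_max] count as unvoiced.
    f0, voiced, _ = librosa.pyin(vocals, fmin=f0_min, fmax=f0_max, sr=sr)
    f0 = np.where(voiced, f0, 0.0)        # zero out unvoiced/noisy frames
    # Frame-level amplitude via RMS; zero frames below the amplitude floor.
    amp = librosa.feature.rms(y=vocals)[0]
    amp = np.where(amp >= amp_floor, amp, 0.0)
    return f0, amp, sr
```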
(2.2) Audio segmentation and alignment. The cloud server looks up the lyrics of the recorded song in its database and divides them into evaluation units sentence by sentence; it then recognizes the user's singing voice with automatic speech recognition, segments the singing-voice audio according to the divided sentences, and aligns it with the lyric sentences, giving the aligned singing-voice set A = {A1, A2, ..., An}. The server likewise segments and aligns the standard singing voice, or loads the already segmented and aligned standard singing voice directly from the database. The user's singing voice is then aligned with the standard singing voice with the lyrics as the common reference, as shown in FIG. 3, where Pi = {Pi1, Pi2, ..., Pim} is the set of audio segments corresponding to the words of the i-th sentence of lyrics, Pcij is the user's pitch-frequency segment, Psij is the standard pitch-frequency segment, and the horizontal axis shows the distribution of the segments Pij over time.
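The aligned result might be represented with a structure like the following sketch; in practice the timestamps would come from a forced-alignment or speech recognition tool, which is abstracted away here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WordSegment:            # P_ij: one lyric word within a sentence
    text: str
    start: float              # start time (seconds) within the song
    end: float                # end time (seconds)

@dataclass
class SentenceSegment:        # A_i: one lyric sentence aligned to audio
    words: List[WordSegment]

    @property
    def duration(self) -> float:    # T_i: duration of the aligned sentence
        return self.words[-1].end - self.words[0].start
```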
(2.3) Rhythm scoring. As shown in FIG. 4, the overall rhythm score takes each sentence Ai as the evaluation unit and uses the difference between the user's sentence duration Tci and the standard duration Tsi as the index: the larger |Tsi - Tci| is, the worse the overall rhythm score Rhythm1. The local rhythm score takes each subsequence Pij as the unit and compares the start and end times of the user's subsequence Pcij with those of the standard subsequence Psij: the larger the difference, the worse the local rhythm score Rhythm2. The overall and local rhythm scores are combined into the rhythm score ScoreR.
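The patent does not fix the exact mapping from timing differences to Rhythm1 and Rhythm2; the sketch below, reusing the SentenceSegment structure from step (2.2), uses an assumed exponential penalty so that larger differences give lower scores.

```python
import math

def rhythm_scores(user_sent, std_sent):
    """Rhythm1 from the sentence-duration difference |Ts_i - Tc_i|; Rhythm2
    from the per-word start/end time differences between Pc_ij and Ps_ij.
    The exp(-difference) mapping is an assumption, not taken from the patent."""
    rhythm1 = math.exp(-abs(std_sent.duration - user_sent.duration))
    word_diffs = [abs(uw.start - sw.start) + abs(uw.end - sw.end)
                  for uw, sw in zip(user_sent.words, std_sent.words)]
    rhythm2 = math.exp(-sum(word_diffs) / max(len(word_diffs), 1))
    return rhythm1, rhythm2
```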
(2.4) Pitch scoring. The pitch score takes each subsequence Pij as the evaluation unit; during evaluation each pair of subsequences must first be aligned, as shown in FIG. 5. The singing voice is then converted into feature-vector form, the shorter vector is zero-padded so that the user and standard feature vectors have the same length, and the similarity of the two vectors is measured with the Euclidean distance: the smaller the distance, the greater the similarity and the higher the pitch score ScoreP.
(2.5) Amplitude scoring. With each sentence Ai as the evaluation unit, the amplitudes of the user and standard singing voices are normalized, reducing the differences between them caused by different recording environments. The amplitudes are then expressed in vector form, the shorter vector is zero-padded to match lengths, and the Euclidean distance of the two vectors is taken as amplitude score Volume1; the average volume of each evaluation unit is also computed, and the absolute distance between the two averages is taken as amplitude score Volume2. The two amplitude scores are combined into the amplitude score ScoreV.
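A sketch of this step follows, assuming peak normalization (the patent does not name the normalization scheme) and returning the two distances Volume1 and Volume2 described above.

```python
import numpy as np

def _peak_normalize(x):
    peak = np.max(np.abs(x))          # normalize to unit peak amplitude
    return x / peak if peak > 0 else x

def amplitude_scores(user_amp, std_amp):
    """Volume1: Euclidean distance between the zero-padded, normalized
    amplitude envelopes; Volume2: absolute distance between the mean
    amplitudes. Peak normalization is an assumed choice of scheme."""
    uc = _peak_normalize(np.asarray(user_amp, dtype=float))
    sc = _peak_normalize(np.asarray(std_amp, dtype=float))
    volume2 = float(abs(uc.mean() - sc.mean()))   # mean-volume distance
    n = max(len(uc), len(sc))
    uc = np.pad(uc, (0, n - len(uc)))             # zero-pad to equal length
    sc = np.pad(sc, (0, n - len(sc)))
    volume1 = float(np.sqrt(np.sum((uc - sc) ** 2)))
    return volume1, volume2
```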
(2.6) Comprehensive scoring. The rhythm score ScoreR, the pitch score ScoreP and the amplitude score ScoreV are multiplied by manually assigned weight coefficients λi and summed to give the comprehensive Score of the user's audio, where the weight coefficients sum to 1:
Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV
λ1 + λ2 + λ3 = 1
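In code, the combination is a single weighted sum; the weight values below are arbitrary examples, since the patent leaves them to manual assignment or a fitted regression model.

```python
def composite_score(score_r, score_p, score_v, weights=(0.4, 0.4, 0.2)):
    """Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV with λ1 + λ2 + λ3 = 1.
    The weight values here are arbitrary examples, not from the patent."""
    l1, l2, l3 = weights
    assert abs((l1 + l2 + l3) - 1.0) < 1e-9, "weight coefficients must sum to 1"
    return l1 * score_r + l2 * score_p + l3 * score_v
```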
(3) The comprehensive Score is downloaded from the cloud server to a display terminal and fed back to the user.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be regarded as an equivalent replacement and is included within the scope of the present invention.

Claims (9)

1. A singing scoring method based on lyric-singing alignment, characterized by comprising the following steps:
S1, recording the song;
S2, separating the singing voice from the accompaniment and removing noise;
S3, extracting the pitch frequency and the amplitude;
S4, aligning the lyrics with the singing voice sentence by sentence;
S501, segmenting the pitch frequency of each word in the aligned singing voice;
S502, calculating a pitch-frequency similarity score ScoreP;
S6, calculating a rhythm score ScoreR of the song from the per-sentence durations and per-word start and end times of the user and standard singing voices;
S701, normalizing the amplitudes of the user singing voice and the standard singing voice;
S702, calculating an amplitude similarity score ScoreV;
S8, multiplying the pitch-frequency score, the rhythm score and the amplitude score by weight coefficients λi and summing them to obtain the comprehensive Score of the song.
2. The singing scoring method based on lyric-singing alignment according to claim 1, wherein the recorded song is song audio recorded in a non-professional recording environment, the non-professional recording environment comprising a daily environment and a karaoke environment.
3. The singing scoring method based on lyric-singing alignment according to claim 1, wherein step S2 specifically comprises: separating the singing voice from the accompaniment with a sound source separation algorithm, and denoising the separated singing voice.
4. The singing scoring method based on lyric-singing alignment according to claim 1, wherein in step S4, lyric-singing alignment means that each sentence of lyrics is taken as a division unit; using automatic speech recognition, each sentence is aligned with the corresponding singing-voice audio, and the singing voice is divided into a set A = {A1, A2, ..., An}, where Ai denotes the singing voice of the i-th sentence aligned with the lyrics, 1 ≤ i ≤ n.
5. The singing scoring method based on lyric-singing alignment according to claim 1, wherein steps S501 and S502 specifically comprise:
for the singing voice Ai of the i-th aligned sentence of lyrics, identifying each pronunciation region, recording its start and end times within the sentence, and segmenting the pitch contour into a set Pi = {Pi1, Pi2, ..., Pim}, where Pij denotes the pitch segment of the j-th word of the i-th sentence of lyrics;
calculating the pitch-frequency similarity of the aligned i-th sentence Ai by comparing and matching the user's pitch frequencies against those of the standard singing voice, denoting the user's pitch-frequency set as Pci = {Pci1, Pci2, ..., Pcim} and the standard set as Psi = {Psi1, Psi2, ..., Psim}; before comparing Pcij with Psij, their start times are shifted to a common starting point, and the similarity is then calculated with the feature similarity measurement algorithm;
with the j-th word Pij of the i-th sentence as the evaluation unit, the pitch-frequency score of the entire song being the sum of the pitch-frequency evaluation scores of the words Pij.
6. The singing scoring method based on lyric-singing alignment according to claim 1, wherein in step S6, the song rhythm evaluation comprises an overall rhythm evaluation and a local rhythm evaluation: the overall rhythm evaluation takes each aligned sentence as the evaluation unit and compares the user's sentence duration Tci with the standard duration Tsi; the local rhythm evaluation takes each word within an aligned sentence as the evaluation unit and compares the start and end times of the user's pitch segment Pcij with those of the standard segment Psij;
with the aligned i-th sentence Ai as the evaluation unit, the overall rhythm score of the whole song being the sum of the per-sentence rhythm feature scores;
with the j-th word Pij of the i-th sentence as the evaluation unit, the local rhythm score of the whole song being the sum of the per-word local rhythm scores.
7. The singing scoring method based on lyric-singing alignment according to claim 1, wherein step S702 specifically comprises: for the aligned i-th sentence Ai, comparing the user amplitude sequence Vci with the standard amplitude sequence Vsi and comparing their average amplitudes, calculating the amplitude similarity with the feature similarity measurement algorithm, and combining the two aspects into the amplitude score;
with the aligned i-th sentence Ai as the evaluation unit, the amplitude score of the whole song being the sum of the per-sentence amplitude feature scores.
8. The singing scoring method based on lyric-singing alignment according to claim 1, wherein in step S8, the weight coefficients λi are either manually assigned preset values, with the constraint that they sum to 1, or fitted by building a regression model so that the score conforms to human auditory perception; the comprehensive score being calculated as:
Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
9. The singing scoring method based on lyric-singing alignment according to claim 5 or 7, wherein the feature similarity measurement algorithm specifically comprises: converting the user and standard singing-voice features into feature vectors Fc and Fs; comparing the lengths of Fc and Fs and zero-padding the shorter vector until the two lengths match, giving the padded vectors F'c and F's; and measuring the similarity between the two vectors with the Euclidean distance:
Feuclidean = sqrt( Σk (F'c(k) - F's(k))^2 )
where Feuclidean is the Euclidean distance between the feature vectors; the smaller Feuclidean is, the greater the similarity, and vice versa.
CN201910890520.7A 2019-09-20 2019-09-20 Singing scoring method based on lyric and singing alignment Pending CN110660383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910890520.7A CN110660383A (en) 2019-09-20 2019-09-20 Singing scoring method based on lyric and singing alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910890520.7A CN110660383A (en) 2019-09-20 2019-09-20 Singing scoring method based on lyric and singing alignment

Publications (1)

Publication Number Publication Date
CN110660383A true CN110660383A (en) 2020-01-07

Family

ID=69037426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910890520.7A Pending CN110660383A (en) 2019-09-20 2019-09-20 Singing scoring method based on lyric and singing alignment

Country Status (1)

Country Link
CN (1) CN110660383A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894552A (en) * 2010-07-16 2010-11-24 安徽科大讯飞信息科技股份有限公司 Speech spectrum segmentation based singing evaluating system
CN107507628A (en) * 2017-08-31 2017-12-22 广州酷狗计算机科技有限公司 Singing methods of marking, device and terminal
CN107978308A (en) * 2017-11-28 2018-05-01 广东小天才科技有限公司 A kind of K songs methods of marking, device, equipment and storage medium
CN108492835A (en) * 2018-02-06 2018-09-04 南京陶特思软件科技有限公司 A kind of methods of marking of singing
CN108922562A (en) * 2018-06-15 2018-11-30 广州酷狗计算机科技有限公司 Sing evaluation result display methods and device
CN109448754A (en) * 2018-09-07 2019-03-08 南京光辉互动网络科技股份有限公司 A kind of various dimensions singing marking system
CN109300485A (en) * 2018-11-19 2019-02-01 北京达佳互联信息技术有限公司 Methods of marking, device, electronic equipment and the computer storage medium of audio signal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
M. Narasimha Murty (translated by Wang Zhenyong): "Pattern Recognition: Algorithms and Implementation Methods", 31 December 2017, Harbin: Harbin Institute of Technology Press *
Lin Weiwei: "Docker cluster scheduling strategy based on genetic algorithm", Journal of South China University of Technology (Natural Science Edition) *
Wang Jiadi: "Research on robust music scoring methods", China Masters' Theses Full-text Database, Information Science and Technology *
Wang Qixiang: "Modern Public Broadcasting Technology and Engineering Cases", 31 August 2011, Beijing: National Defense Industry Press *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210850A (en) * 2020-01-10 2020-05-29 腾讯音乐娱乐科技(深圳)有限公司 Lyric alignment method and related product
CN111210850B (en) * 2020-01-10 2021-06-25 腾讯音乐娱乐科技(深圳)有限公司 Lyric alignment method and related product
CN111369975A (en) * 2020-03-17 2020-07-03 郑州工程技术学院 University music scoring method, device, equipment and storage medium based on artificial intelligence
WO2021245234A1 (en) * 2020-06-05 2021-12-09 Sony Group Corporation Electronic device, method and computer program
CN112133269A (en) * 2020-09-22 2020-12-25 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device, equipment and medium
CN112133269B (en) * 2020-09-22 2024-03-15 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device, equipment and medium
CN113096689A (en) * 2021-04-02 2021-07-09 腾讯音乐娱乐科技(深圳)有限公司 Song singing evaluation method, equipment and medium
CN112802494A (en) * 2021-04-12 2021-05-14 北京世纪好未来教育科技有限公司 Voice evaluation method, device, computer equipment and medium
CN112802494B (en) * 2021-04-12 2021-07-16 北京世纪好未来教育科技有限公司 Voice evaluation method, device, computer equipment and medium
CN113853047A (en) * 2021-09-29 2021-12-28 深圳市火乐科技发展有限公司 Light control method and device, storage medium and electronic equipment
CN114093386A (en) * 2021-11-10 2022-02-25 厦门大学 Education-oriented multi-dimensional singing evaluation method

Similar Documents

Publication Publication Date Title
CN110660383A (en) Singing scoring method based on lyric and singing alignment
Paulus et al. Measuring the similarity of Rhythmic Patterns.
Ryynänen et al. Automatic transcription of melody, bass line, and chords in polyphonic music
Mesaros et al. Singer identification in polyphonic music using vocal separation and pattern recognition methods.
Nakano et al. An automatic singing skill evaluation method for unknown melodies using pitch interval accuracy and vibrato features
CN104272382B (en) Personalized singing synthetic method based on template and system
Ryynänen et al. Transcription of the Singing Melody in Polyphonic Music.
Tsai et al. Automatic evaluation of karaoke singing based on pitch, volume, and rhythm features
CN109979488B (en) System for converting human voice into music score based on stress analysis
Kroher et al. Automatic transcription of flamenco singing from polyphonic music recordings
CN109545191B (en) Real-time detection method for initial position of human voice in song
Lagrange et al. Normalized cuts for predominant melodic source separation
Pandit et al. Feature selection for a DTW-based speaker verification system
Fujihara et al. F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search
Toh et al. Multiple-Feature Fusion Based Onset Detection for Solo Singing Voice.
CN115050387A (en) Multi-dimensional singing playing analysis evaluation method and system in art evaluation
Dzhambazov et al. On the use of note onsets for improved lyrics-to-audio alignment in turkish makam music
Jha et al. Assessing vowel quality for singing evaluation
CN109410968B (en) Efficient detection method for initial position of voice in song
Ikemiya et al. Transcribing vocal expression from polyphonic music
CN113823270B (en) Determination method, medium, device and computing equipment of rhythm score
Barthet et al. Speech/music discrimination in audio podcast using structural segmentation and timbre recognition
CN111681674B (en) Musical instrument type identification method and system based on naive Bayesian model
Ikemiya et al. Transferring vocal expression of f0 contour using singing voice synthesizer
CN113129923A (en) Multi-dimensional singing playing analysis evaluation method and system in art evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200107