CN110660383A - Singing scoring method based on lyric and singing alignment - Google Patents
- Publication number: CN110660383A (application CN201910890520.7A)
- Authority
- CN
- China
- Prior art keywords
- singing voice
- singing
- score
- user
- amplitude
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/0272—Voice signal separating (under G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation)
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit (under G10L15/00—Speech recognition)
- G10L15/26—Speech to text systems (under G10L15/00—Speech recognition)
- G10L21/0208—Noise filtering (under G10L21/02—Speech enhancement)
- G10L25/03—Speech or voice analysis techniques characterised by the type of extracted parameters (under G10L25/00)
- G10L25/69—Speech or voice analysis specially adapted for evaluating synthetic or decoded voice signals (under G10L25/48)
Abstract
The invention discloses a singing scoring method based on the alignment of lyrics and singing voice, comprising the following steps in sequence: recording the song; separating the singing voice from the accompaniment and removing noise; extracting the fundamental (pitch) frequency and amplitude; aligning the lyrics with the singing voice sentence by sentence; segmenting the pitch contour of each word in the aligned singing voice; calculating a pitch similarity score; calculating a rhythm score from the duration of each sentence and the start and end times of each word in the user's and the standard singing voice; normalizing the amplitudes of the user's singing voice and the standard singing voice; calculating an amplitude similarity score; and multiplying the pitch score, rhythm score, and amplitude score by weight coefficients and summing them to obtain the composite score of the song. The method reduces the influence of accompaniment and noise on singing evaluation; it makes reasonable use of the lyric label information, so that the evaluation of the user's pitch and rhythm is more accurate; and it evaluates the user's song from multiple aspects, making the scoring result more objective and comprehensive.
Description
Technical Field
The invention relates to the technical field of speech signal processing, and in particular to a singing scoring method based on the alignment of lyrics and singing voice.
Background
With the development of the internet, demand for online singing entertainment has grown, and users pay increasing attention to how their singing ability ranks, so an accurate and comprehensive singing scoring method is necessary. A common industrial approach shifts the recorded audio to be scored by n candidate time offsets and searches for the offset that best aligns the recording with the standard audio in time, so as to improve the song's score. However, this method requires n searches per scoring pass to find the best score, and is still not accurate enough. Researchers have therefore proposed singing scoring based on dynamic time warping: the audio to be scored and the reference audio are collected and converted into pitch-frequency vectors; dynamic time warping is used to compute the path distance, which determines the intonation score of the audio to be scored; the degree of alignment determines its rhythm score; and the final score is derived from the intonation and rhythm scores. However, the warping path may distort the correspondence between the pitch contour being scored and the reference pitch contour, and the method considers only intonation and rhythm, evaluating the song purely on singing technique without considering emotion. Academic singing scoring methods are more advanced than industrial ones, but are also more complex.
Early singing scoring methods were mainly feature-matching methods: vocal features are extracted from the songs, and dynamic time warping (DTW) is used to compute the similarity distance between the user's vocal features and those of the standard song. Wu et al. extracted three features, fundamental frequency, Mel-frequency cepstral coefficients (MFCC), and sound intensity, and computed their similarity with a DTW algorithm to obtain the song score; Changhong Lin et al., also building on DTW, rated songs using features such as RMS energy, pitch, spectral centroid, spectral flatness, and spectral spread. Because these methods handle song rhythm and emotion poorly, Wei-Ho Tsai et al. improved on them by building a hidden Markov model (HMM) to judge whether the rhythm of the song being scored is correct while using DTW to compute vocal-feature similarity. However, this rhythm evaluation requires an independent HMM for every song, so the training cost is high and practical applicability is limited. PeiPai Chen et al. extracted five features related to a singer's enthusiasm and trained a support vector regression model on a large amount of data to predict it; Florian Eyben et al. used 205 singing-voice features to place emotion-label words in an arousal–valence space and trained a support vector machine to analyze a song's emotion in that space; Ning Zhang et al. built a densely connected convolutional neural network as a binary classification model, trained on a large amount of emotion-labeled song data, to evaluate the quality of a user's song end to end.
Although academia has studied singing scoring more deeply, these methods all require large amounts of data and training time; and while there is some research on song emotion analysis, its recognition accuracy is low and the recognized emotion types are limited, making it difficult to apply in practice.
In short, although singing scoring technology has developed considerably over the last decade, most academic methods are complex and have limitations that make them hard to apply directly in real life; as a result, the industry still uses simpler scoring methods whose scoring ability is insufficient.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and to provide a singing scoring method based on the alignment of lyrics and singing voice. The method uses lyric label information and automatic speech recognition to align the user's song audio, from which accompaniment and noise have been separated; compares the aligned user audio with the standard audio; computes the user's rhythm score, pitch score, and amplitude score; and finally combines the three into a composite score, thereby weakening the influence of accompaniment and noise on singing evaluation and assessing the user's song more accurately.
The purpose of the invention is realized by the following technical scheme:
A singing scoring method based on the alignment of lyrics and singing voice comprises the following steps:
S1, recording the song;
S2, separating the singing voice from the accompaniment and removing noise;
S3, extracting the fundamental (pitch) frequency and amplitude;
S4, aligning the lyrics with the singing voice sentence by sentence;
S501, segmenting the pitch contour of each word in the aligned singing voice;
S502, calculating the pitch similarity score ScoreP;
S6, calculating the rhythm score ScoreR from the duration of each sentence and the start and end times of each word in the user's and the standard singing voice;
S701, normalizing the amplitudes of the user's singing voice and the standard singing voice;
S702, calculating the amplitude similarity score ScoreV;
S8, multiplying the pitch score, the rhythm score, and the amplitude score by weight coefficients and summing them to calculate the composite Score of the song.
The recorded song refers to song audio recorded in a non-professional recording environment, such as a daily environment or a karaoke environment.
Step S2 proceeds as follows: the singing voice and the accompaniment are separated with a sound-source separation algorithm, and noise removal is then applied to the separated singing voice.
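The patent does not fix a particular noise-removal algorithm here; the embodiment later mentions zeroing out signal below frequency and amplitude thresholds. A minimal sketch of such an amplitude gate, with an assumed threshold value, might look like:

```python
# Illustrative sketch only (not the patent's exact algorithm): a simple noise
# gate that zeroes samples whose magnitude falls below a chosen threshold.
# The threshold value 0.02 is an assumption for illustration.

def noise_gate(samples, amp_threshold=0.02):
    """Zero out samples whose absolute amplitude is below amp_threshold."""
    return [s if abs(s) >= amp_threshold else 0.0 for s in samples]

gated = noise_gate([0.5, 0.01, -0.3, 0.005])  # quiet samples are zeroed
```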
Lyric–singing alignment takes each sentence of lyrics as the division unit: each lyric sentence is aligned with the corresponding singing audio using automatic speech recognition, dividing the singing voice into a set A = {A1, A2, ..., An}, where Ai denotes the singing voice of the i-th sentence aligned with the lyrics.
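Once a speech recognizer has produced word-level timestamps, building the sentence-level set A = {A1, ..., An} reduces to a grouping step. A sketch under the assumption that the number of words per lyric sentence is known from the lyric labels:

```python
# Hypothetical helper (not named in the patent): group ASR word timestamps
# into per-sentence lists A_1..A_n using the known word count of each
# lyric sentence.

def group_words_into_sentences(word_times, sentence_word_counts):
    """word_times: list of (start, end) tuples, one per recognized word.
    sentence_word_counts: words per lyric sentence, from the lyric labels."""
    sentences, idx = [], 0
    for count in sentence_word_counts:
        sentences.append(word_times[idx:idx + count])
        idx += count
    return sentences
```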
Steps S501 and S502 comprise:
For the i-th sentence of singing voice Ai, identify the pronunciation region of each word and record its start and end times within the sentence, dividing the sentence into a pitch set Pi = {Pi1, Pi2, ..., Pim}, where Pij corresponds to the j-th word of the i-th lyric sentence.
Calculate the pitch similarity of the i-th sentence Ai by comparing and matching the user's pitch against that of the standard singing voice: denote the user's pitch set Pci = {Pci1, Pci2, ..., Pcim} and the standard set Psi = {Psi1, Psi2, ..., Psim}. Before comparing Pcij and Psij, unify their start times at the same point, then compute the similarity with the feature similarity measurement algorithm.
The song rhythm evaluation comprises overall rhythm evaluation and local rhythm evaluation. Overall rhythm evaluation takes each aligned sentence as the evaluation unit and compares the user's sentence duration Tci with the standard duration Tsi. Local rhythm evaluation takes each word in the aligned sentence as the evaluation unit and compares the start and end times of the user's pitch segment Pcij with those of the standard segment Psij.
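The patent gives no explicit formulas for these two rhythm differences; a minimal sketch consistent with the description (sentence durations, word start/end times) could be:

```python
# Assumed formulations, not taken verbatim from the patent: absolute
# differences of durations (overall) and of word boundaries (local).

def overall_rhythm_diff(tc, ts):
    """Difference between user sentence duration tc and standard duration ts."""
    return abs(tc - ts)

def local_rhythm_diff(user_word, std_word):
    """Sum of start-time and end-time differences for one word.
    user_word / std_word are (start, end) tuples in seconds."""
    return abs(user_word[0] - std_word[0]) + abs(user_word[1] - std_word[1])
```

A larger difference would map to a worse rhythm score, matching the "the larger the difference, the worse the score" rule in the embodiment.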
Calculating the amplitude similarity score involves two comparisons for each sentence of singing voice Ai: comparing the user amplitude Vci with the standard amplitude Vsi, and comparing their average amplitudes mean(Vci) and mean(Vsi). The amplitude similarity is computed with the feature similarity measurement algorithm, and the two comparisons are combined into the amplitude score.
The overall rhythm score and the amplitude similarity score each take a sentence of singing voice Ai as the evaluation unit; for the whole song they are the sums of the corresponding per-sentence feature scores: Score_f = Σi fi, where fi is the evaluation score of the given feature for sentence Ai.
The pitch score and the local rhythm score each take a word Aij within each sentence as the evaluation unit; for the whole song they are the sums of the corresponding per-word feature scores: Score_f = Σi Σj fij, where fij is the evaluation score of the given feature for word Aij.
In step S8, the weight coefficients λi may be assigned appropriate values manually, with the λi summing to 1; alternatively, a regression model may be built to fit the λi so that the result matches human auditory perception. The composite score is calculated as: Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
The feature similarity measurement algorithm proceeds as follows: convert the user's and the standard singing-voice features into feature vectors Fc and Fs; compare their lengths and zero-pad the shorter vector until the two have the same length, obtaining F'c and F's; then measure their similarity with the Euclidean distance:
F_euclidean = ||F'c − F's|| = sqrt(Σk (F'c(k) − F's(k))²)
where F_euclidean is the Euclidean distance between the feature vectors; the smaller F_euclidean, the greater the similarity, and vice versa.
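The measurement algorithm described above (zero-pad the shorter vector, then take the Euclidean distance) can be sketched directly:

```python
import math

def feature_similarity_distance(fc, fs):
    """Zero-pad the shorter feature vector, then return the Euclidean
    distance; smaller distance means greater similarity."""
    n = max(len(fc), len(fs))
    fc_p = list(fc) + [0.0] * (n - len(fc))
    fs_p = list(fs) + [0.0] * (n - len(fs))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fc_p, fs_p)))
```

How a distance maps to a score on, say, a 0–100 scale is left open by the patent, so no such mapping is assumed here.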
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention can give an objective and accurate score to songs recorded in a non-professional environment. In real life, accompaniment usually plays while the user records a song, and the recording is further affected by the recording equipment and the surrounding environment, so the recording is not the user's singing voice alone but a mixture of several sounds. Feature matching such a mixture against the standard audio lowers the measured similarity. The invention therefore separates the user's singing voice from the accompaniment in the mixed audio and denoises the separated voice before feature matching, avoiding the influence of accompaniment and noise and making the song score more accurate.
(2) Each sentence of lyrics is aligned with the singing audio using the lyric label information and automatic speech recognition, constraining the user's singing voice and the standard singing voice to the same comparison region. Feature similarity is then compared per sentence or per word within that region, which improves the accuracy of the feature evaluation results.
(3) Volume reflects, to some extent, the singer's emotion: sad songs are generally sung more quietly and happy songs more loudly, and since volume is determined by amplitude, evaluating the song by amplitude indirectly reflects the user's emotion. The amplitudes of the user's and the standard singing voice are normalized before the similarity comparison; normalization reduces amplitude differences caused by extraneous factors, so the computed amplitude similarity indicates whether the emotion the user puts into the song matches that of the standard song.
(4) Evaluating the song by combining pitch, rhythm, and amplitude reflects the user's singing level more comprehensively, and the weight coefficients of the three aspects can be adjusted when computing the composite score so that the result better matches human auditory perception and scoring standards.
Drawings
Fig. 1 is a flowchart of a singing scoring method based on lyric and singing alignment according to the present invention.
FIG. 2 is a modular schematic of an embodiment of the method of FIG. 1.
FIG. 3 is a diagram illustrating the user's singing voice aligned with the lyrics and the standard singing voice.
Fig. 4 is a schematic diagram showing the rhythm difference between the user's singing voice and the standard singing voice after a certain sentence of lyrics is aligned.
FIG. 5 shows a pitch subsequence Pij before and after pitch alignment.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Referring to fig. 1 to 5, a singing scoring method based on alignment of lyrics and singing voice comprises the following steps in sequence:
s1, recording songs;
s2, separating singing voice accompaniment and removing noise;
s3, extracting fundamental tone frequency and amplitude;
s4, aligning the lyrics with the singing voice by taking sentences as units;
s501, dividing the pitch frequency of each word in the aligned singing voice;
s502, calculating a fundamental tone frequency similarity score ScoreP;
s6, calculating a rhythm score ScoreR of the song according to the singing voice of the user, the duration of each sentence of the standard singing voice and the starting and ending time of each word;
s701, normalizing the amplitudes of the singing voice of the user and the standard singing voice;
s702, calculating an amplitude similarity score ScoreV;
S8, multiplying the pitch score, the rhythm score, and the amplitude score by weight coefficients λi and summing them to calculate the composite Score of the song.
The recorded songs refer to song audio recorded in a non-professional recording environment, and the non-professional recording environment comprises a daily environment and a karaoke environment.
The step S2 specifically includes: separating singing voice and accompaniment by using a sound source separation algorithm, and carrying out noise removal processing on the separated singing voice.
In step S4, lyric–singing alignment means that each sentence of lyrics is used as the division unit; each lyric sentence is aligned with the corresponding singing audio using automatic speech recognition, dividing the singing voice into a set A = {A1, A2, ..., An}, where Ai denotes the singing voice of the i-th sentence aligned with the lyrics, and 1 ≤ i ≤ n.
Steps S501 and S502 specifically comprise:
For the singing voice Ai of the i-th aligned sentence, identify each pronunciation region and record its start and end times within the sentence, dividing the sentence into a pitch set Pi = {Pi1, Pi2, ..., Pim}, where Pij corresponds to the j-th word of the i-th lyric sentence;
calculate the pitch similarity of Ai by comparing and matching the user's pitch against that of the standard singing voice: denote the user's pitch set Pci = {Pci1, Pci2, ..., Pcim} and the standard set Psi = {Psi1, Psi2, ..., Psim}; before comparing Pcij with Psij, unify their start times at the same point, then compute the similarity with the feature similarity measurement algorithm;
with the j-th word Pij of the i-th sentence as the evaluation unit, the pitch score of the whole song is the sum of the pitch evaluation scores of all words Pij.
In step S6, the song rhythm evaluation comprises overall rhythm evaluation and local rhythm evaluation. Overall rhythm evaluation takes each aligned sentence as the evaluation unit and compares the user's sentence duration Tci with the standard duration Tsi; local rhythm evaluation takes each word in the aligned sentence as the evaluation unit and compares the start and end times of the user's pitch segment Pcij with those of the standard segment Psij.
With the singing voice Ai of the i-th aligned sentence as the evaluation unit, the overall rhythm score of the whole song is the sum of the per-sentence rhythm feature scores;
with the j-th word Pij of the i-th sentence as the evaluation unit, the local rhythm score of the whole song is the sum of the per-word local rhythm scores.
Step S702 specifically comprises: for the singing voice Ai of the i-th aligned sentence, comparing the user amplitude Vci with the standard amplitude Vsi, and comparing their average amplitudes mean(Vci) and mean(Vsi); the amplitude similarity is computed with the feature similarity measurement algorithm, and the two comparisons are combined into the amplitude score.
With each aligned sentence Ai as the evaluation unit, the amplitude score of the whole song is the sum of the per-sentence amplitude feature scores.
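The normalization scheme for the amplitudes is not specified in the patent; a sketch assuming simple peak normalization, together with the average-amplitude comparison it feeds, might be:

```python
# Assumed scheme: peak normalization to [0, 1]. The patent only requires
# that both amplitude sequences be normalized before comparison.

def normalize(amps):
    """Peak-normalize an amplitude sequence to the range [0, 1]."""
    peak = max(abs(a) for a in amps)
    return [abs(a) / peak for a in amps] if peak > 0 else list(amps)

def mean_amplitude(amps):
    """Average amplitude of a (normalized) sequence."""
    return sum(amps) / len(amps)
```

The per-sentence amplitude score would then combine a vector comparison of the normalized sequences with the absolute difference of their means, as described above.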
In step S8, the weight coefficients λi are either assigned preset values manually, with the λi summing to 1, or fitted by building a regression model so that the result matches human auditory perception. The composite score is calculated as:
Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
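The weighted combination in step S8 is a plain dot product; the example weights below are arbitrary placeholders (the patent only requires that the coefficients sum to 1):

```python
def composite_score(score_r, score_p, score_v, weights=(0.4, 0.4, 0.2)):
    """Score = λ1*ScoreR + λ2*ScoreP + λ3*ScoreV.
    The default weights are illustrative, not from the patent."""
    l1, l2, l3 = weights
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9, "weight coefficients must sum to 1"
    return l1 * score_r + l2 * score_p + l3 * score_v
```

The patent also allows fitting the λi with a regression model against human ratings; that variant would replace the fixed tuple with learned coefficients.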
The feature similarity measurement algorithm specifically comprises: converting the user's and the standard singing-voice features into feature vectors Fc and Fs; comparing their lengths and zero-padding the shorter vector until the two are of equal length, obtaining F'c and F's; and measuring their similarity with the Euclidean distance:
F_euclidean = ||F'c − F's|| = sqrt(Σk (F'c(k) − F's(k))²)
where F_euclidean is the Euclidean distance between the feature vectors; the smaller F_euclidean, the greater the similarity, and vice versa.
Specifically, the method comprises the following steps:
a singing scoring method based on lyric singing alignment comprises the following steps:
(1) recording songs sung by a user in a non-professional recording environment such as KTV and uploading the songs to a cloud server;
(2) estimating the score of the user's song in the cloud server, through the following steps:
(2.1) Song preprocessing. Separate the singing voice from the accompaniment, for example by harmonic/percussive source separation or by a U-Net-based singing-voice separation technique, to obtain accompaniment-free singing voice; then denoise the singing audio, for example by setting frequency and amplitude thresholds and zeroing out and removing audio signal below both thresholds. Finally, extract the fundamental (pitch) frequency and amplitude of the singing voice and pass them to the audio segmentation and alignment module.
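The patent does not name a pitch-extraction algorithm; a common choice for singing voice is autocorrelation over short frames, sketched here under that assumption:

```python
import math

def estimate_f0(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental frequency of one audio frame by picking the
    lag with maximum autocorrelation inside an assumed vocal range
    (fmin..fmax Hz). Illustrative only; real systems use refined variants."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0
```

Running this over successive frames yields the pitch contour that the later segmentation and similarity steps operate on.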
(2.2) Audio segmentation and alignment. The cloud server retrieves the lyrics of the recorded song from the database and divides them into sentence-level evaluation units; it then recognizes the user's singing voice according to the divided sentences using automatic speech recognition, segments the singing audio, and aligns it with the lyric sentences, yielding the aligned singing-voice set A = {A1, A2, ..., An}. The server likewise segments and aligns the standard singing voice, or directly loads the already segmented and aligned version from the database. The user's singing voice is then aligned with the standard singing voice with the lyrics as the common reference, as shown in FIG. 3, where the sequence Pi = {Pi1, Pi2, ..., Pim} is the set of audio segments corresponding to the words of the i-th lyric sentence, Pcij is the user's pitch segment, Psij is the standard pitch segment, and the horizontal axis is the distribution of the sequence Pij over time.
(2.3) Rhythm scoring. As shown in FIG. 4, the overall rhythm score takes each sentence Ai as the evaluation unit, using the difference between the user's duration Tci and the standard duration Tsi as the index: the larger |Tsi − Tci|, the worse the overall rhythm score Rhythm1. The local rhythm score takes each subsequence Pij as the unit, comparing the start and end times of the user's subsequence Pcij with those of the standard subsequence Psij: the larger the difference, the worse the local rhythm score Rhythm2. The overall and local rhythm scores are combined into the rhythm score ScoreR.
(2.4) Pitch scoring. The pitch score takes each subsequence Pij as the evaluation unit; each Pij must first be aligned, as shown in FIG. 5. The singing voice is then converted into feature-vector form, the shorter vector is zero-padded so that the user's and the standard feature vectors have the same length, and the similarity of the two vectors is measured with the Euclidean distance: the smaller the distance, the greater the similarity and the higher the pitch score ScoreP.
(2.5) Amplitude scoring. With each sentence Ai as the evaluation unit, the amplitudes of the user's and the standard singing voice are normalized, reducing differences caused by their different recording environments. The amplitudes are then expressed as vectors, the shorter vector is zero-padded to equal length, and the Euclidean distance between the two is computed as the amplitude score Volume1; the average volume of each evaluation unit is also computed, and the absolute distance between the two averages is the amplitude score Volume2. The two are combined into the amplitude score ScoreV.
(2.6) Comprehensive scoring. The rhythm score ScoreR, the pitch score ScoreP, and the amplitude score ScoreV are each multiplied by a manually assigned weight coefficient λi and summed to obtain the comprehensive Score of the user audio. The weight coefficients sum to 1, and the calculation formula is as follows:
Score=λ1*ScoreR+λ2*ScoreP+λ3*ScoreV
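As a minimal illustration of this weighted sum under the constraint λ1 + λ2 + λ3 = 1 (the example weights are illustrative, not values given in the patent):

```python
# Comprehensive score of (2.6): a weighted sum with weights summing to 1.
# The default weights below are hypothetical examples.

def composite_score(score_r, score_p, score_v, weights=(0.3, 0.5, 0.2)):
    assert abs(sum(weights) - 1.0) < 1e-9, "weight coefficients must sum to 1"
    l1, l2, l3 = weights
    return l1 * score_r + l2 * score_p + l3 * score_v
```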
(3) The comprehensive Score is downloaded from the cloud server to the display terminal and fed back to the user.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (9)
1. A singing scoring method based on lyric and singing alignment, characterized by comprising the following steps:
s1, recording songs;
s2, separating singing voice accompaniment and removing noise;
s3, extracting fundamental tone frequency and amplitude;
s4, aligning the lyrics with the singing voice by taking sentences as units;
s501, dividing the pitch frequency of each word in the aligned singing voice;
s502, calculating a fundamental tone frequency similarity score ScoreP;
s6, calculating a rhythm score ScoreR of the song according to the singing voice of the user, the duration of each sentence of the standard singing voice and the starting and ending time of each word;
s701, normalizing the amplitudes of the singing voice of the user and the standard singing voice;
s702, calculating an amplitude similarity score ScoreV;
s8, multiplying the fundamental tone frequency score, the rhythm score and the amplitude score by weight coefficients λi and adding them to calculate a composite Score for the song.
2. The singing scoring method according to claim 1, wherein the recorded song is song audio recorded in a non-professional recording environment, the non-professional recording environment comprising a daily-life environment and a karaoke environment.
3. The singing scoring method according to claim 1, wherein step S2 specifically comprises: separating the singing voice and the accompaniment using a sound source separation algorithm, and performing noise removal processing on the separated singing voice.
4. The singing scoring method according to claim 1, wherein in step S4, lyric-singing alignment means that each line of lyrics is aligned with its corresponding singing voice audio using automatic speech recognition technology, dividing the singing voice into a set of sentences A={A1,A2,...,An}, where Ai denotes the singing voice aligned with the i-th lyric line and 1 ≤ i ≤ n.
5. The singing scoring method based on the alignment of lyrics and singing as claimed in claim 1, wherein steps S501 and S502 specifically comprise:
for the singing voice Ai aligned with the i-th lyric line, identifying each pronunciation region, recording its start time and end time within the sentence, and dividing it into a pitch frequency set Pi={Pi1,Pi2,...,Pim}, where Pij denotes the j-th word of the i-th lyric line;
when calculating the pitch frequency similarity of the singing voice Ai aligned with the i-th lyric line, comparing and matching the user's pitch frequencies against those of the standard singing voice, the pitch frequency set of the user singing voice being denoted Pci={Pci1,Pci2,...,Pcim} and that of the standard singing voice Psi={Psi1,Psi2,...,Psim}; when comparing Pcij and Psij, their start times are first unified to the same starting point, and the similarity is then calculated using a feature similarity measurement algorithm;
with the j-th word Pij of the i-th lyric line as the evaluation unit, the pitch frequency score of the entire song is the sum of the pitch frequency evaluation scores corresponding to each word Pij in the singing voice.
6. The singing scoring method according to claim 1, wherein in step S6 the song rhythm is evaluated through an overall rhythm evaluation and a local rhythm evaluation: the overall rhythm evaluation takes each aligned sentence of singing voice as the evaluation unit and compares the user singing voice duration Tci with the standard singing voice duration Tsi; the local rhythm evaluation takes each word in the aligned singing voice as the evaluation unit and compares the differences in start and end times between the user singing voice pitch frequency Pcij and the standard singing voice pitch frequency Psij;
with the singing voice Ai aligned with the i-th lyric line as the evaluation unit, the overall rhythm score of the entire song is the sum of the rhythm feature scores corresponding to each sentence of singing voice;
with the j-th word Pij of the i-th lyric line as the evaluation unit, the local rhythm score of the entire song is the sum of the local rhythm scores corresponding to each word Pij in the singing voice.
7. The singing scoring method according to claim 1, wherein step S702 specifically comprises: comparing the user amplitude Vci and the standard amplitude Vsi corresponding to the singing voice Ai aligned with the i-th lyric line, as well as their average amplitudes, calculating the amplitude similarity using a feature similarity measurement algorithm, and synthesizing the two aspects to obtain the amplitude score;
with the singing voice Ai aligned with the i-th lyric line as the evaluation unit, the amplitude score of the entire song is the sum of the amplitude feature scores corresponding to each sentence.
8. The singing scoring method based on the alignment of lyrics and singing voice of claim 1, wherein in step S8 the weight coefficients λi are manually preset values whose sum is 1;
or, the weight coefficients λi are fitted by establishing a regression model so that the result accords with human auditory perception, the composite score being calculated as:
Score=λ1*ScoreR+λ2*ScoreP+λ3*ScoreV.
9. The singing scoring method based on the alignment of the lyrics and the singing voice according to claim 5 or 7, characterized in that the feature similarity measurement algorithm specifically comprises: converting the user singing voice features and the standard singing voice features into feature vectors Fc and Fs; comparing the lengths of the user singing voice feature vector Fc and the standard singing voice feature vector Fs, and zero-filling the shorter feature vector until the two have the same length, yielding equal-length zero-padded feature vectors Fc' and Fs'; and measuring the similarity between the two feature vectors by the Euclidean distance, calculated as:
Feuclidean = sqrt( Σk (Fc'(k) − Fs'(k))² )
where Feuclidean is the Euclidean distance between the feature vectors; the smaller Feuclidean, the greater the similarity, and vice versa.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910890520.7A CN110660383A (en) | 2019-09-20 | 2019-09-20 | Singing scoring method based on lyric and singing alignment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110660383A true CN110660383A (en) | 2020-01-07 |
Family
ID=69037426
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110660383A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101894552A (en) * | 2010-07-16 | 2010-11-24 | 安徽科大讯飞信息科技股份有限公司 | Speech spectrum segmentation based singing evaluating system |
CN107507628A (en) * | 2017-08-31 | 2017-12-22 | 广州酷狗计算机科技有限公司 | Singing methods of marking, device and terminal |
CN107978308A (en) * | 2017-11-28 | 2018-05-01 | 广东小天才科技有限公司 | A kind of K songs methods of marking, device, equipment and storage medium |
CN108492835A (en) * | 2018-02-06 | 2018-09-04 | 南京陶特思软件科技有限公司 | A kind of methods of marking of singing |
CN108922562A (en) * | 2018-06-15 | 2018-11-30 | 广州酷狗计算机科技有限公司 | Sing evaluation result display methods and device |
CN109300485A (en) * | 2018-11-19 | 2019-02-01 | 北京达佳互联信息技术有限公司 | Methods of marking, device, electronic equipment and the computer storage medium of audio signal |
CN109448754A (en) * | 2018-09-07 | 2019-03-08 | 南京光辉互动网络科技股份有限公司 | A kind of various dimensions singing marking system |
Non-Patent Citations (4)
Title |
---|
M. Narasimhamurty (trans. Wang Zhenyong): "Pattern Recognition: Algorithms and Implementation Methods", Harbin: Harbin Institute of Technology Press, 31 December 2017 *
Lin Weiwei: "A Docker cluster scheduling strategy based on genetic algorithms", Journal of South China University of Technology (Natural Science Edition) *
Wang Jiadi: "Research on robust music scoring methods", China Master's Theses Full-text Database, Information Science and Technology series *
Wang Qixiang: "Modern Public Address Technology and Engineering Cases", Beijing: National Defense Industry Press, 31 August 2011 *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210850A (en) * | 2020-01-10 | 2020-05-29 | 腾讯音乐娱乐科技(深圳)有限公司 | Lyric alignment method and related product |
CN111210850B (en) * | 2020-01-10 | 2021-06-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Lyric alignment method and related product |
CN111369975A (en) * | 2020-03-17 | 2020-07-03 | 郑州工程技术学院 | University music scoring method, device, equipment and storage medium based on artificial intelligence |
WO2021245234A1 (en) * | 2020-06-05 | 2021-12-09 | Sony Group Corporation | Electronic device, method and computer program |
CN112133269A (en) * | 2020-09-22 | 2020-12-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method, device, equipment and medium |
CN112133269B (en) * | 2020-09-22 | 2024-03-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method, device, equipment and medium |
CN113096689A (en) * | 2021-04-02 | 2021-07-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Song singing evaluation method, equipment and medium |
CN112802494A (en) * | 2021-04-12 | 2021-05-14 | 北京世纪好未来教育科技有限公司 | Voice evaluation method, device, computer equipment and medium |
CN112802494B (en) * | 2021-04-12 | 2021-07-16 | 北京世纪好未来教育科技有限公司 | Voice evaluation method, device, computer equipment and medium |
CN113853047A (en) * | 2021-09-29 | 2021-12-28 | 深圳市火乐科技发展有限公司 | Light control method and device, storage medium and electronic equipment |
CN114093386A (en) * | 2021-11-10 | 2022-02-25 | 厦门大学 | Education-oriented multi-dimensional singing evaluation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110660383A (en) | Singing scoring method based on lyric and singing alignment | |
Paulus et al. | Measuring the similarity of Rhythmic Patterns. | |
Ryynänen et al. | Automatic transcription of melody, bass line, and chords in polyphonic music | |
Mesaros et al. | Singer identification in polyphonic music using vocal separation and pattern recognition methods. | |
Nakano et al. | An automatic singing skill evaluation method for unknown melodies using pitch interval accuracy and vibrato features | |
CN104272382B (en) | Personalized singing synthetic method based on template and system | |
Ryynänen et al. | Transcription of the Singing Melody in Polyphonic Music. | |
Tsai et al. | Automatic evaluation of karaoke singing based on pitch, volume, and rhythm features | |
CN109979488B (en) | System for converting human voice into music score based on stress analysis | |
Kroher et al. | Automatic transcription of flamenco singing from polyphonic music recordings | |
CN109545191B (en) | Real-time detection method for initial position of human voice in song | |
Lagrange et al. | Normalized cuts for predominant melodic source separation | |
Pandit et al. | Feature selection for a DTW-based speaker verification system | |
Fujihara et al. | F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search | |
Toh et al. | Multiple-Feature Fusion Based Onset Detection for Solo Singing Voice. | |
CN115050387A (en) | Multi-dimensional singing playing analysis evaluation method and system in art evaluation | |
Dzhambazov et al. | On the use of note onsets for improved lyrics-to-audio alignment in turkish makam music | |
Jha et al. | Assessing vowel quality for singing evaluation | |
CN109410968B (en) | Efficient detection method for initial position of voice in song | |
Ikemiya et al. | Transcribing vocal expression from polyphonic music | |
CN113823270B (en) | Determination method, medium, device and computing equipment of rhythm score | |
Barthet et al. | Speech/music discrimination in audio podcast using structural segmentation and timbre recognition | |
CN111681674B (en) | Musical instrument type identification method and system based on naive Bayesian model | |
Ikemiya et al. | Transferring vocal expression of f0 contour using singing voice synthesizer | |
CN113129923A (en) | Multi-dimensional singing playing analysis evaluation method and system in art evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |
Application publication date: 2020-01-07