CN106157976B - Singing evaluation method and system - Google Patents

Singing evaluation method and system

Info

Publication number
CN106157976B
Authority
CN
China
Prior art keywords
data
pitch
data segment
calculating
starting
Prior art date
Legal status
Active
Application number
CN201510169264.4A
Other languages
Chinese (zh)
Other versions
CN106157976A (en)
Inventor
李啸
蒋成林
梅林海
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201510169264.4A
Publication of CN106157976A
Application granted
Publication of CN106157976B
Legal status: Active
Anticipated expiration


Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a singing evaluation method and a singing evaluation system, and belongs to the technical field of voice signal processing. The singing evaluation method comprises the following steps: acquiring recording data of a song sung by a user; sequentially intercepting, from different positions between the starting end point and the ending end point of the recording data, multiple segments of recording data of a set length to obtain multiple data segments; calculating, one by one, the pitch difference value between each data segment and the standard audio data of the song, and taking the starting end point of the data segment with the smallest pitch difference as the detection starting point; calculating a similarity score between the data segment starting from the detection starting point and the standard audio data of the song; and taking the similarity score as the evaluation result. The singing evaluation method and system have a simple calculation process, score accurately, and can meet the application requirements of real-time scoring.

Description

Singing evaluation method and system
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a singing evaluation method and system.
Background
In a karaoke singing evaluation system, in order to evaluate a user's singing level, a standard pitch for the song is usually preset, forming a standard pitch-versus-time curve. After the pitch curve of the recording data of the song sung by the user is obtained, it is compared with the standard pitch curve, and the user's singing level is evaluated by calculating the area error between the two curves: the closer the pitch curve of the user's recording is to the standard pitch curve (i.e., the smaller the area error between the curves), the higher the user's singing level.
In order to enhance the user experience, a singing evaluation system usually adopts a real-time scoring mode, so that the user receives feedback during the singing process. In this mode, each time the user finishes singing a sentence, the system must return a score for that sentence within a short response time. When the starting time point of the user's recording data does not coincide with that of the standard pitch data (for example, when defects of the recording equipment or network problems during data transmission delay or shift the user's recording), the score is easily too low and cannot reflect the user's real singing level.
In the prior art, a Dynamic Time Warping (DTW) algorithm is usually adopted to solve the problem of scoring errors caused by the non-correspondence between user recording data and standard pitch data. DTW is a nonlinear warping technique that combines time warping with distance-measure calculation: it determines the time correspondence between the user recording signal (the signal to be measured) and the standard signal (the template signal) by finding the time warping that minimizes the distance measure. Specifically, when measuring the similarity of two signal sequences of equal time length, a Euclidean distance calculation is usually adopted to find a suitable time warping function such that, after warping according to this function, the distance between the signal to be measured and the template signal is minimal. If the time lengths of the two sequences differ, the signal to be measured must be lengthened or shortened, i.e., distorted on the time axis, so that its points correspond one-to-one with those of the template signal before the Euclidean distance is calculated. Assuming the template signal and the signal to be measured have lengths m and n respectively, an m x n matrix is first constructed, in which matrix element (i, j) is the Euclidean distance between the i-th point of the template signal and the j-th point of the signal to be measured. A time warping function is then a path from (1, 1) to (m, n), where a point (i, j) on the path indicates that the j-th point of the signal to be measured is aligned with the i-th point of the template signal. For each path, the values of all points on the path are accumulated to obtain a cumulative distance; the path with the smallest cumulative distance among all paths corresponds to the required time warping function.
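The DTW procedure described in this background section can be sketched as follows. This is a minimal illustration of the cumulative-distance matrix, not code from the patent; the function name and the use of absolute difference as the local distance are our own choices:

```python
def dtw_distance(template, signal):
    """Minimal dynamic time warping: smallest cumulative distance
    aligning a template sequence (length m) with a signal (length n)."""
    m, n = len(template), len(signal)
    INF = float("inf")
    # D[i][j] = minimal cumulative distance aligning template[:i] with signal[:j]
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(template[i - 1] - signal[j - 1])  # local distance
            D[i][j] = cost + min(D[i - 1][j],      # stretch the template
                                 D[i][j - 1],      # stretch the signal
                                 D[i - 1][j - 1])  # one-to-one match
    return D[m][n]
```

For sequences of lengths m and n this costs O(m x n) time and memory per comparison, which is exactly the computational burden the invention aims to avoid.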
Obtaining the time warping function through Euclidean distance calculation can, to a certain extent, solve the problem of scoring errors caused by the time non-correspondence between user recording data and standard pitch data. However, because the amount of calculation is large and the calculation process is complex, it cannot meet the application requirements of real-time scoring.
Disclosure of Invention
The embodiment of the invention provides a singing evaluation method and a singing evaluation system, which are simple in calculation process and accurate in scoring and can meet the application requirement of real-time scoring.
The technical scheme provided by the embodiment of the invention is as follows:
in one aspect, a singing evaluation method is provided, and includes:
acquiring recording data of a song sung by a user;
sequentially intercepting a plurality of sections of recording data with set lengths from different positions between a starting end point and an ending end point of the recording data to obtain a plurality of data sections;
calculating the pitch difference value of each data segment and the standard audio data of the song one by one, and taking the starting end point of the data segment with the minimum pitch difference value as a detection starting point;
calculating a similarity score between a data segment starting from the detection starting point and standard audio data of the song;
and taking the similarity score as an evaluation result.
Preferably, the calculating a pitch difference value between the data segment and the standard audio data of the song includes:
step 201: extracting a stable fundamental frequency of the data segment;
step 202: calculating the key offset between the stable fundamental frequency of the data segment and the standard audio data;
step 203: normalizing the half-frequency and double-frequency components of the stable fundamental frequency according to the key offset;
step 204: iteratively executing steps 202 to 203 until the number of calculations reaches a preset number or the stable fundamental frequency converges;
step 205: when the number of calculations reaches the preset number or the stable fundamental frequency converges, taking the key offset between the stable fundamental frequency of the data segment and the standard audio data as the pitch difference value.
Preferably, before the calculating of a similarity score between the data segment starting from the detection starting point and the standard audio data of the song, the method further comprises: performing smoothing and singularity removal on the data segments whose pitch difference value is smaller than a preset threshold.
Preferably, the calculating the key offset between the stable fundamental frequency of the data segment and the standard audio data comprises:
calculating a pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data;
and searching, in an interval containing the pitch mean difference, for the pitch offset value whose sum with the stable fundamental frequency is closest to the standard pitch data, and taking that pitch offset value as the key offset.
Preferably, the sequentially intercepting multiple segments of recording data of a set length from different positions between the starting end point and the ending end point of the recording data includes:
determining an interception starting position;
intercepting recording data of the set length from the interception starting position;
and adjusting the interception starting position multiple times by a preset step length, and intercepting a further segment of recording data of the set length from the starting position after each adjustment.
In another aspect, a singing evaluation system is provided, including:
the acquisition module is used for acquiring recording data of songs sung by a user;
the intercepting module is used for sequentially intercepting a plurality of sections of recording data with set lengths from different positions between a starting end point and an ending end point of the recording data to obtain a plurality of data sections;
the first calculation module is used for calculating the pitch difference value of each data segment and the standard audio data of the song one by one;
a detection starting point determining module, configured to select a starting end point of the data segment with the smallest pitch difference from the pitch difference values calculated by the first calculating module as a detection starting point;
and the second calculation module is used for calculating the similarity score between the data segment starting from the detection starting point and the standard audio data of the song and taking the similarity score as an evaluation result.
Preferably, the first calculation module comprises:
an extraction unit, configured to extract a stable fundamental frequency of the data segment;
the calculating unit is used for iteratively calculating the key offset between the stable fundamental frequency of the data segment and the standard audio data, and for taking that key offset as the pitch difference value when the number of calculations reaches a preset number or the stable fundamental frequency converges;
and the warping unit is used for normalizing the half-frequency and double-frequency components of the stable fundamental frequency according to the key offset calculated by the calculating unit, and for outputting the warped stable fundamental frequency back to the calculating unit for the next iteration.
Preferably, the system further comprises:
and the smoothing and singularity-removal module is used for performing smoothing and singularity removal on the data segments whose pitch difference value is smaller than a preset threshold, before the second calculating module calculates the similarity score between the data segment starting from the detection starting point and the standard audio data of the song.
Preferably, the calculation unit includes:
an average difference calculating unit, configured to calculate a pitch average difference between the stable fundamental frequency of the data segment and the standard audio data;
and the searching unit is used for searching, in an interval containing the pitch mean difference, for the pitch offset value whose sum with the stable fundamental frequency is closest to the standard pitch data, and taking that value as the key offset.
Preferably, the intercepting module comprises:
a determination unit for determining an interception start position;
the intercepting unit is used for intercepting the recording data with set length from the intercepting initial position;
the adjusting unit is used for adjusting the intercepting starting position for multiple times according to a preset step length;
the intercepting unit is further used for intercepting a segment of recording data of the set length from the interception starting position after each adjustment by the adjusting unit.
In the singing evaluation method and system provided by the embodiments of the invention, multiple segments of recording data of a set length are intercepted from different positions between the starting end point and the ending end point of the recording data of a song sung by a user to obtain multiple data segments; the pitch difference value between each data segment and the standard audio data of the song is calculated; the starting end point of the data segment with the smallest pitch difference is taken as the detection starting point; the similarity score between the data segment starting from the detection starting point and the standard audio data of the song is calculated; and the similarity score is taken as the evaluation result. Because the starting end point of the data segment with the smallest pitch difference serves as the detection starting point, the time difference between the user's recording data and the standard pitch data is guaranteed to be minimal, so the user can be scored accurately even when the recording data and the standard pitch data do not correspond in time. In addition, because the calculation process is simple, the method and system can meet the application requirements of real-time scoring and have strong practicability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a singing evaluation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of calculating the pitch difference of FIG. 1;
fig. 3 is a flowchart of another singing evaluation method according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method of calculating the key offset in FIG. 3;
FIG. 5 is a flowchart of a recording data interception method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a singing evaluation system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a second singing evaluation system according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a third singing evaluation system according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a fourth singing evaluation system according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a fifth singing evaluation system according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the solutions of the embodiments of the invention, the embodiments are described in further detail below with reference to the drawings and implementations.
The embodiment of the invention provides a singing evaluation method, which comprises the following steps as shown in figure 1:
step 101: recording data of a song sung by a user is obtained.
In the embodiment of the invention, the recording data of the song sung by the user can be obtained sentence by sentence, and each sentence of recording data is evaluated and scored as soon as it is obtained, which ensures real-time scoring and enhances the user experience.
Step 102: and sequentially intercepting a plurality of sections of recording data with set lengths from different positions between the starting end point and the ending end point of the recording data to obtain a plurality of data sections.
While singing a song, the starting time point of the user's recording data may deviate from that of the song's standard audio data, because the user starts singing at an improper time or because of defects in the recording equipment, data transmission network problems, and the like. The pitch curves of the recording data and the standard audio data then differ, which affects scoring accuracy and prevents the score from reflecting the user's real singing level.
In the embodiment of the invention, multiple segments of recording data of a set length are sequentially intercepted from different positions between the starting end point and the ending end point of the recording data to obtain multiple data segments, and these data segments are then compared with the standard pitch data separately to determine the closest starting time point. The set length (the length of each intercepted data segment) is less than or equal to the length of the recording data of the song sung by the user. Specifically, the set length can be chosen according to the length of the standard audio data, so that it equals the length of the standard audio data and the intercepted data segments can conveniently be evaluated and scored later.
Specifically, as shown in fig. 5, sequentially intercepting a plurality of pieces of recording data with a set length from different positions between a start end point and an end point of the recording data may include the following steps:
step 501: and determining a interception starting position.
Assuming that the length of the recording data is L + 2E and the length of the standard audio data is L, the ending points of the recording data and the standard audio data may be aligned, and the interception starting position may then be chosen anywhere within the 2E range by which the recording data is longer than the standard audio data. The specific interception starting position can be set according to an empirical value; for example, it may be set to the starting end point of the recording data, or to the middle of the 2E range. The embodiment of the present invention imposes no particular limitation here.
Step 502: and intercepting the recording data with set length from the interception initial position.
After the interception starting position is determined in step 501, recording data of the same length as the standard audio data is intercepted starting from that position to obtain the first data segment.
Step 503: and adjusting the interception initial position for multiple times by a preset step length, and respectively intercepting multiple sections of recording data with set lengths from the intercepted initial position after each adjustment.
After the first data segment is obtained, further data segments can be obtained by adjusting the interception starting position. Specifically, the starting position may be adjusted according to a preset rule, for example by a preset step length. Assume the interception starting position of the first data segment is the starting end point of the recording data and the preset step length is step: the interception starting position of the second data segment is then moved backward from the starting end point by step, the interception starting position of the third data segment is moved backward by 2 x step, and so on, so that multiple segments of recording data of the set length are intercepted.
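Steps 501 to 503 amount to sliding an interception window over the recording. A small sketch under our own naming (the patent does not prescribe an implementation):

```python
def intercept_segments(recording, set_length, step):
    """Slide an interception window of set_length samples over the
    recording, advancing by `step` samples each time (steps 501-503).
    Returns (start_index, segment) pairs, one per candidate data segment."""
    segments = []
    start = 0
    while start + set_length <= len(recording):
        segments.append((start, recording[start:start + set_length]))
        start += step
    return segments
```

Each returned start index is a candidate detection starting point; the next step scores every candidate against the standard audio data.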
Step 103: and calculating the pitch difference value of each data segment and the standard audio data of the song one by one, and taking the starting end point of the data segment with the minimum pitch difference value as a detection starting point.
By calculating the pitch difference value between each data segment and the standard audio data one by one and taking the starting end point of the data segment with the smallest pitch difference as the detection starting point, it is ensured that the recording data is closest in starting time point to the standard pitch data. The pitch difference between a data segment and the standard audio data of the song may be calculated by any prior-art method; the embodiment of the present invention preferably uses the method shown in FIG. 2, which comprises the following steps:
step 201: the stable fundamental frequency of the data segment is extracted.
The recording of a song sung by a user is composed of a number of phonemes, and the boundary positions of the phonemes can be obtained from the standard audio data. Within the boundaries of each phoneme, the longest continuous run of non-zero fundamental frequency is selected as the stable fundamental frequency of that phoneme; after the stable fundamental frequencies of all phonemes in the intercepted data segment are extracted, the stable fundamental frequency of the data segment is obtained.
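The selection of the longest continuous non-zero fundamental-frequency run per phoneme can be sketched as follows; here `f0` is assumed to be a per-frame fundamental-frequency sequence in which unvoiced frames are 0, and the function name and data layout are illustrative:

```python
def stable_f0(f0, phoneme_bounds):
    """For each phoneme (start, end) frame range, select the longest
    continuous run of non-zero fundamental-frequency values as that
    phoneme's stable fundamental frequency (step 201)."""
    stable = []
    for start, end in phoneme_bounds:
        best, run = [], []
        for v in f0[start:end]:
            if v > 0:
                run.append(v)          # extend the current voiced run
            else:
                if len(run) > len(best):
                    best = run         # a zero frame ends the run
                run = []
        if len(run) > len(best):       # run may extend to the boundary
            best = run
        stable.append(best)
    return stable
```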
Step 202: the pitch-on deviation of the stable fundamental frequency of the data segment from the standard audio data is calculated.
The key offset is the difference between the pitch curve of the song as sung by the user and the standard pitch curve that arises when the user sings in a key different from the one specified in the standard audio data.
As shown in fig. 4, a typical method for calculating the key offset according to an embodiment of the present invention includes the following steps:
step 401: and calculating the pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data.
After the stable fundamental frequency of the data segment has been extracted, its pitch mean is easy to calculate; the pitch mean of the standard audio data is obtained in the same way, and the pitch mean difference is the difference between the two pitch means.
Step 402: and searching a pitch deviation value which is closest to the standard pitch data to the sum of the stable fundamental frequencies in the interval containing the pitch average value difference, and taking the pitch deviation value as the start tone deviation.
After the pitch mean difference has been calculated, the interval containing it is searched for the pitch offset value whose sum with the stable fundamental frequency is closest to the standard pitch data. The width of the search interval can be determined according to requirements such as precision, and the pitch offset value found is taken as the key offset.
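Steps 401 and 402 can be sketched as follows. The search width and grid spacing are illustrative assumptions (the patent only says the interval width is chosen according to precision requirements), and the sum of absolute errors stands in for "closest to the standard pitch data":

```python
def key_offset(user_pitch, std_pitch, search_width=2.0, grid=0.1):
    """Estimate the key offset: start from the pitch mean difference
    (step 401), then grid-search an interval around it for the shift that
    brings the shifted user pitch closest to the standard pitch (step 402)."""
    mean_diff = (sum(std_pitch) / len(std_pitch)
                 - sum(user_pitch) / len(user_pitch))
    best_shift, best_err = mean_diff, float("inf")
    steps = int(2 * search_width / grid) + 1
    for k in range(steps):
        shift = mean_diff - search_width + k * grid
        # total deviation of the shifted user pitch from the standard pitch
        err = sum(abs(u + shift - s) for u, s in zip(user_pitch, std_pitch))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift
```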
Step 203: and regulating the half frequency and the frequency multiplication in the stable fundamental frequency according to the start-up deviation.
According to the MIDI (Musical Instrument Digital Interface) standard, the scaling relationship between the pitch value p and the fundamental frequency f is as follows:
p = 69 + 12 x log2(f / 440)
because the half-frequency and frequency multiplication phenomena are generated in the process of extracting the stable fundamental frequency, the extracted pitch value may be 12 (half-frequency) smaller than the real pitch value or 12 (frequency multiplication) larger than the real pitch value. By comparing the pitch value, which is closest to the standard pitch value by adding 12 to the pitch value and subtracting 12 from the pitch value, the pitch value is correspondingly processed by adding or subtracting 12, so that the half-frequency and the frequency multiplication are normalized, and the purpose of eliminating the half-frequency and the frequency multiplication is achieved.
Step 204: and (3) iteratively executing the steps 202 to 203 until the calculation times reach a preset number or the stable fundamental frequency converges.
Because the half-frequency and double-frequency warping modifies the user's pitch data, the key offset must be recalculated; that is, steps 202 and 203 are repeated so that the key offset is computed iteratively, until the number of calculations reaches the preset number or the stable fundamental frequency converges. The preset number can be set according to an empirical value, for example 2 or 3. When two consecutive calculations yield the same key offset, the stable fundamental frequency is considered to have converged.
Step 205: and taking the pitch deviation of the stable fundamental frequency of the data segment and the standard audio data as a pitch difference value when the calculation times reach the preset times or the stable fundamental frequency is converged.
When the number of calculations reaches the preset number or the stable fundamental frequency converges, a sufficiently accurate key offset can be considered to have been found; superimposing this key offset on the user's pitch data yields pitch data very close to the standard pitch data.
Step 104: a similarity score of the data segment from the detection start point and the standard audio data of the song is calculated.
Specifically, feature parameters may be extracted from the data segment (recording data of the set length) starting from the detection starting point and from the standard audio data, and the feature parameters of the data segment may be compared with those of the standard audio data to obtain the similarity score.
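The patent leaves the concrete similarity measure open here. Purely as an illustration (the 0-100 mapping and the `max_err` constant are our assumptions, not part of the invention), one simple possibility is to score the mean absolute pitch error of the aligned segment:

```python
def similarity_score(user_pitch, std_pitch, max_err=12.0):
    """Illustrative similarity score: map the mean absolute pitch error
    between the aligned data segment and the standard audio data onto a
    0-100 scale, where an average error of max_err semitones scores 0."""
    err = sum(abs(u - s) for u, s in zip(user_pitch, std_pitch)) / len(std_pitch)
    return max(0.0, 100.0 * (1.0 - err / max_err))
```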
More preferably, as shown in fig. 3, before calculating the similarity score between the data segment from the detection start point and the standard audio data of the song, the singing evaluation method further includes:
step 301: and carrying out smoothing processing and singularity removing processing on the data segment with the pitch difference value smaller than the preset threshold value.
After the pitch difference has been calculated in step 205, smoothing the user pitch data that differs little from the standard pitch effectively reduces the influence of the staircase effect as well as noise and distortion. Specifically, a threshold for the pitch difference (the preset threshold) may be set in advance, and pitch values in the data segment whose pitch difference is smaller than this threshold may be adjusted to the standard pitch, thereby implementing the smoothing.
Because of poor pronunciation quality during singing or problems in the stable-fundamental-frequency extraction method, the pitch values of some points in the user's pitch data are too high or too low (i.e., singular points appear). These singular points can be removed by a specific operation, thereby eliminating the influence of a small number of singular points on the overall score. The operation is as follows: sort the points of the user's sung data in ascending order of their error against the standard pitch data and select a preset range (for example, the first 20%) as singularity-free data; let the mean of this 20% of the data be x; find all points whose deviation from x is within a set threshold (for example, 1) and regard them as normal points; let the mean of the normal points be y; and fill all remaining points with y, thereby removing the singular points.
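The singular-point removal just described can be sketched as follows; `keep_ratio` and `threshold` correspond to the 20% and the threshold of 1 in the example, and the function name is ours:

```python
def remove_singularities(user_pitch, std_pitch, keep_ratio=0.2, threshold=1.0):
    """Singularity removal: sort points by error against the standard pitch,
    treat the best keep_ratio fraction as singularity-free with mean x, call
    every point within `threshold` of x normal, and overwrite the remaining
    (singular) points with the mean y of the normal points."""
    order = sorted(range(len(user_pitch)),
                   key=lambda i: abs(user_pitch[i] - std_pitch[i]))
    k = max(1, int(len(user_pitch) * keep_ratio))
    clean = [user_pitch[i] for i in order[:k]]   # assumed singularity-free
    x = sum(clean) / len(clean)
    normal = [v for v in user_pitch if abs(v - x) <= threshold]
    y = sum(normal) / len(normal)
    return [v if abs(v - x) <= threshold else y for v in user_pitch]
```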
Step 105: and taking the similarity score as an evaluation result.
After the similarity score is calculated, it can be used directly as the evaluation result, i.e., the scoring result, and fed back to the user as needed through voice broadcast, on-screen display, or the like, so that the user learns the evaluation result in time.
In the singing evaluation method provided by the embodiment of the invention, multiple segments of recording data of a set length are intercepted from different positions between the starting end point and the ending end point of the recording data of a song sung by a user to obtain multiple data segments; the pitch difference value between each data segment and the standard audio data of the song is calculated; the starting end point of the data segment with the smallest pitch difference is taken as the detection starting point; the similarity score between the data segment starting from the detection starting point and the standard audio data of the song is calculated; and the similarity score is taken as the evaluation result. Because the starting end point of the data segment with the smallest pitch difference serves as the detection starting point, the time difference between the user's recording data and the standard pitch data is guaranteed to be minimal, so the user can be scored accurately even when the recording data and the standard pitch data do not correspond in time. In addition, because the calculation process is simple, the method can meet the application requirements of real-time scoring and has strong practicability.
Correspondingly, an embodiment of the present invention further provides a singing evaluation system, whose structure is shown in fig. 6, including:
an obtaining module 601, configured to obtain recording data of a song sung by a user;
an intercepting module 602, configured to sequentially intercept multiple segments of recording data with a set length at different positions between a start endpoint and an end endpoint of the recording data to obtain multiple data segments;
a first calculating module 603, configured to calculate a pitch difference between each data segment and standard audio data of a song one by one;
a detection starting point determining module 604, configured to select a starting end point of the data segment with the smallest pitch difference from the pitch difference values calculated by the first calculating module 603 as a detection starting point;
and a second calculating module 605, configured to calculate a similarity score between the data segment starting from the detection starting point and the standard audio data of the song, and use the similarity score as an evaluation result.
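As a rough illustration of how modules 601 to 605 could fit together, the following is a minimal Python sketch. It is an assumption-laden simplification: the recording and the standard audio are taken to be already reduced to pitch tracks (NumPy arrays of semitone values), the names `evaluate`, `seg_len`, and `step` are illustrative rather than the patent's own, and the similarity score shown is just a placeholder mapping of mean pitch error onto a 0 to 100 scale.

```python
import numpy as np

def evaluate(user_pitch, standard_pitch, seg_len, step):
    """Slide a window over the user's pitch track (modules 602/603),
    take the start of the least-deviating segment as the detection
    starting point (module 604), then score from there (module 605)."""
    best_start, best_diff = 0, float("inf")
    for start in range(0, len(user_pitch) - seg_len + 1, step):
        seg = user_pitch[start:start + seg_len]
        # Pitch difference between this data segment and the standard data.
        diff = float(np.mean(np.abs(seg - standard_pitch[:seg_len])))
        if diff < best_diff:
            best_start, best_diff = start, diff
    # Placeholder similarity score: smaller mean error -> higher score.
    aligned = user_pitch[best_start:best_start + seg_len]
    score = 100.0 / (1.0 + float(np.mean(np.abs(aligned - standard_pitch[:seg_len]))))
    return best_start, score
```

The segment length and step control the trade-off between alignment precision and computation, which is why the method can stay cheap enough for real-time scoring.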
As shown in fig. 7, a specific structure of the first calculating module 603 includes:
an extracting unit 701, configured to extract a stable fundamental frequency of a data segment;
a calculating unit 702, configured to calculate, in an iterative manner, a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data, and to take that starting-pitch deviation as the pitch difference when the number of calculations reaches a preset number or the stable fundamental frequency converges;
and a warping unit 703, configured to normalize half-frequency and double-frequency (octave) components in the stable fundamental frequency according to the starting-pitch deviation calculated by the calculating unit 702, and to output the warped stable fundamental frequency back to the calculating unit 702 for the next iterative calculation.
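The iterate-and-warp loop of units 702 and 703 could be sketched as below. This is only an illustration under stated assumptions: pitch is compared in the semitone domain, the starting-pitch deviation is taken as a simple mean difference, and half-frequency/double-frequency frames are folded back by whole octaves (multiples of 12 semitones). The patent does not prescribe these exact formulas, and `pitch_difference`, `max_iter`, and `tol` are hypothetical names.

```python
import numpy as np

def pitch_difference(f0_hz, std_semitone, max_iter=5, tol=1e-6):
    """Iteratively estimate the starting-pitch deviation and fold
    half/double-frequency (octave) errors back, as units 702/703 do."""
    semis = 69.0 + 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / 440.0)  # Hz -> semitones
    std = np.asarray(std_semitone, dtype=float)
    for _ in range(max_iter):
        offset = float(np.mean(semis - std))          # starting-pitch deviation
        err = semis - std - offset
        folded = semis - 12.0 * np.round(err / 12.0)  # warp octave outliers
        if float(np.max(np.abs(folded - semis))) < tol:
            break                                     # stable fundamental frequency converged
        semis = folded
    return float(np.mean(semis - std))                # deviation used as the pitch difference
```

Folding relative to the current deviation estimate, rather than to the raw mean, is what lets a few doubled or halved frames be corrected without dragging the key estimate with them.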
As shown in fig. 8, the singing evaluation system further includes:
and a smoothing and outlier-removal module 606, configured to perform smoothing and singular-point removal on the data segments whose pitch difference is smaller than a preset threshold before the second calculating module 605 calculates the similarity score between the data segment starting from the detection starting point and the standard audio data of the song. The smoothing and outlier-removal functions can be integrated into a single module 606 as shown, or implemented as separate smoothing and outlier-removal modules that respectively smooth the data segments and remove singular points from them.
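One plausible realization of module 606 is sketched below: a median filter for smoothing, with frames far from the local median (measured in median absolute deviations) treated as singular points and replaced. Both choices are illustrative assumptions, not the patent's specified algorithm, and the names `win` and `z` are hypothetical parameters.

```python
import numpy as np

def smooth_and_clean(pitch, win=3, z=2.5):
    """Median-smooth a pitch track and replace singular points, i.e.
    frames far from the local median, by that local median."""
    p = np.asarray(pitch, dtype=float)
    pad = win // 2
    padded = np.pad(p, pad, mode="edge")
    med = np.array([np.median(padded[i:i + win]) for i in range(len(p))])
    resid = np.abs(p - med)
    mad = float(np.median(resid)) + 1e-9   # robust scale estimate; epsilon avoids /0
    return np.where(resid / mad > z, med, p)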
As shown in fig. 9, the calculating unit 702 specifically includes:
a mean difference calculating unit 801, configured to calculate a pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data;
a searching unit 802, configured to search, in an interval containing the pitch mean difference, for the pitch deviation value whose sum with the stable fundamental frequency is closest to the standard pitch data, and to take that pitch deviation value as the starting-pitch deviation.
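The two-stage search of units 801 and 802 could look like the following sketch: a coarse pitch mean difference, then a fine scan in an interval around it for the shift whose sum with the stable fundamental frequency comes closest to the standard pitch data. The search `radius`, grid `step`, and mean-absolute-error cost are assumptions for illustration.

```python
import numpy as np

def starting_pitch_deviation(stable_f0, std_pitch, radius=6.0, step=0.1):
    """Coarse mean difference (unit 801) followed by a fine local
    search for the best pitch deviation value (unit 802)."""
    f0 = np.asarray(stable_f0, dtype=float)
    std = np.asarray(std_pitch, dtype=float)
    mean_diff = float(np.mean(std - f0))               # pitch mean difference
    candidates = mean_diff + np.arange(-radius, radius + step, step)
    costs = [float(np.mean(np.abs(f0 + c - std))) for c in candidates]
    return float(candidates[int(np.argmin(costs))])    # starting-pitch deviation
```

Restricting the fine search to an interval around the mean difference keeps the cost linear in the number of candidates, consistent with the document's emphasis on a simple, real-time-capable calculation.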
As shown in fig. 10, the intercepting module 602 specifically includes:
a determining unit 901, configured to determine an interception start position;
an intercepting unit 902, configured to intercept the recording data with a set length from an interception start position;
an adjusting unit 903, configured to adjust the interception start position multiple times by a preset step;
the intercepting unit 902 is further configured to intercept a segment of recording data of the set length from the interception start position after each adjustment by the adjusting unit 903, thereby obtaining the multiple segments of recording data.
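The cooperation of units 901 to 903 amounts to a sliding window, which could be sketched as follows. The function and parameter names are hypothetical; the recording is treated abstractly as any indexable sequence.

```python
def intercept_segments(recording, seg_len, step):
    """Determine an interception start position (unit 901), cut a
    fixed-length segment (unit 902), shift the position by the preset
    step (unit 903), and repeat until the end endpoint is reached."""
    segments = []
    start = 0                         # initial interception start position
    while start + seg_len <= len(recording):
        segments.append((start, recording[start:start + seg_len]))
        start += step                 # adjust the start by the preset step
    return segments
```

Each returned pair carries the candidate start endpoint alongside the segment, which is exactly what the detection starting point determining module needs once the per-segment pitch differences are known.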
In the singing evaluation system provided by the embodiment of the invention, multiple segments of recording data of a set length are intercepted from different positions between the start endpoint and the end endpoint of the recording of a song sung by the user to obtain multiple data segments; the pitch difference between each data segment and the standard audio data of the song is calculated; the start endpoint of the data segment with the smallest pitch difference is taken as the detection starting point; the similarity score between the data segment starting from that detection starting point and the standard audio data is calculated; and the similarity score is taken as the evaluation result. Because the start endpoint of the data segment with the smallest pitch difference serves as the detection starting point, the time offset between the user's recording and the standard pitch data is minimized, so the system scores accurately even when the recording and the standard pitch data are not aligned in time. In addition, the calculation process is simple, so the system meets the requirement of real-time scoring and has strong practicability.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: the modules or units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. A singing evaluation method, comprising:
acquiring recording data of a song sung by a user;
sequentially intercepting multiple segments of recording data of a set length from different positions between a start endpoint and an end endpoint of the recording data to obtain multiple data segments;
calculating the pitch difference value of each data segment and the standard audio data of the song one by one, and taking the starting end point of the data segment with the minimum pitch difference value as a detection starting point;
calculating a similarity score between a data segment starting from the detection starting point and standard audio data of the song;
and taking the similarity score as an evaluation result.
2. The singing evaluation method of claim 1, wherein the calculating a pitch difference between the data segment and standard audio data of a song comprises:
step 201: extracting a stable fundamental frequency of the data segment;
step 202: calculating a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data;
step 203: normalizing the half-frequency and double-frequency components in the stable fundamental frequency according to the starting-pitch deviation;
step 204: iteratively executing steps 202 to 203 until the number of calculations reaches a preset number or the stable fundamental frequency converges;
step 205: taking the starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data when the number of calculations reaches the preset number or the stable fundamental frequency converges as the pitch difference.
3. The singing evaluation method according to claim 2, wherein before the calculating the similarity score between the data segment starting from the detection starting point and the standard audio data of the song, the method further comprises: performing smoothing and singular-point removal on the data segments whose pitch difference is smaller than a preset threshold.
4. The singing evaluation method of claim 3, wherein the calculating a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data comprises:
calculating a pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data;
and searching, in an interval containing the pitch mean difference, for the pitch deviation value whose sum with the stable fundamental frequency is closest to the standard pitch data, and taking that pitch deviation value as the starting-pitch deviation.
5. A singing evaluation method according to any one of claims 1 to 4, wherein the sequentially intercepting a plurality of segments of recorded data of a set length from different positions between a start end point and an end point of the recorded data comprises:
determining an interception starting position;
intercepting the recording data with a set length from the intercepting starting position;
and adjusting the interception start position multiple times by a preset step, and intercepting a segment of recording data of the set length from the interception start position after each adjustment, thereby obtaining the multiple segments of recording data.
6. A singing evaluation system, comprising:
the acquisition module is used for acquiring recording data of songs sung by a user;
the intercepting module is used for sequentially intercepting a plurality of sections of recording data with set lengths from different positions between a starting end point and an ending end point of the recording data to obtain a plurality of data sections;
the first calculation module is used for calculating the pitch difference value of each data segment and the standard audio data of the song one by one;
a detection starting point determining module, configured to select a starting end point of the data segment with the smallest pitch difference from the pitch difference values calculated by the first calculating module as a detection starting point;
and the second calculation module is used for calculating the similarity score between the data segment starting from the detection starting point and the standard audio data of the song and taking the similarity score as an evaluation result.
7. The singing evaluation system according to claim 6, wherein the first calculation module comprises:
an extraction unit, configured to extract a stable fundamental frequency of the data segment;
the computing unit is configured to calculate, in an iterative manner, a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data, and to take that starting-pitch deviation as the pitch difference when the number of calculations reaches a preset number or the stable fundamental frequency converges;
and the warping unit is configured to normalize the half-frequency and double-frequency components in the stable fundamental frequency according to the starting-pitch deviation calculated by the computing unit, and to output the warped stable fundamental frequency to the computing unit for the next iterative calculation.
8. A singing evaluation system according to claim 7, wherein the system further comprises:
and the smoothing and outlier-removal module is configured to perform smoothing and singular-point removal on the data segments whose pitch difference is smaller than a preset threshold before the second calculation module calculates the similarity score between the data segment starting from the detection starting point and the standard audio data of the song.
9. A singing evaluation system according to claim 8, wherein the computing unit comprises:
an average difference calculating unit, configured to calculate a pitch average difference between the stable fundamental frequency of the data segment and the standard audio data;
and the searching unit is configured to search, in an interval containing the pitch mean difference, for the pitch deviation value whose sum with the stable fundamental frequency is closest to the standard pitch data, and to take that pitch deviation value as the starting-pitch deviation.
10. A singing evaluation system according to any one of claims 6 to 9, wherein the intercept module comprises:
a determination unit for determining an interception start position;
the intercepting unit is used for intercepting the recording data with set length from the intercepting initial position;
the adjusting unit is used for adjusting the intercepting starting position for multiple times according to a preset step length;
the intercepting unit is further configured to intercept a segment of recording data of the set length from the interception start position after each adjustment by the adjusting unit, thereby obtaining the multiple segments of recording data.
CN201510169264.4A 2015-04-10 2015-04-10 Singing evaluation method and system Active CN106157976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510169264.4A CN106157976B (en) 2015-04-10 2015-04-10 Singing evaluation method and system

Publications (2)

Publication Number Publication Date
CN106157976A CN106157976A (en) 2016-11-23
CN106157976B true CN106157976B (en) 2020-02-07

Family

ID=57335643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510169264.4A Active CN106157976B (en) 2015-04-10 2015-04-10 Singing evaluation method and system

Country Status (1)

Country Link
CN (1) CN106157976B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358969A (en) * 2017-07-19 2017-11-17 无锡冰河计算机科技发展有限公司 One kind recording fusion method
CN107507628B (en) * 2017-08-31 2021-01-15 广州酷狗计算机科技有限公司 Singing scoring method, singing scoring device and terminal
CN107785010A (en) * 2017-09-15 2018-03-09 广州酷狗计算机科技有限公司 Singing songses evaluation method, equipment, evaluation system and readable storage medium storing program for executing
CN108022604A (en) * 2017-11-28 2018-05-11 北京小唱科技有限公司 The method and apparatus of amended record audio content
CN108206026B (en) * 2017-12-05 2021-12-03 北京小唱科技有限公司 Method and device for determining pitch deviation of audio content
CN108257613B (en) * 2017-12-05 2021-12-10 北京小唱科技有限公司 Method and device for correcting pitch deviation of audio content
CN108172206B (en) * 2017-12-27 2021-05-07 广州酷狗计算机科技有限公司 Audio processing method, device and system
CN110890086B (en) * 2018-08-17 2023-12-26 嘉楠明芯(北京)科技有限公司 Voice similarity calculation method and device based on greedy algorithm
CN109524025B (en) * 2018-11-26 2021-12-14 北京达佳互联信息技术有限公司 Singing scoring method and device, electronic equipment and storage medium
CN110070847B (en) * 2019-03-28 2023-09-26 深圳市芒果未来科技有限公司 Musical tone evaluation method and related products
CN110120216B (en) * 2019-04-29 2021-11-12 北京小唱科技有限公司 Audio data processing method and device for singing evaluation
CN112581976B (en) * 2019-09-29 2023-06-27 骅讯电子企业股份有限公司 Singing scoring method and system based on streaming media
CN111081277B (en) * 2019-12-19 2022-07-12 广州酷狗计算机科技有限公司 Audio evaluation method, device, equipment and storage medium
CN111785238B (en) * 2020-06-24 2024-02-27 腾讯音乐娱乐科技(深圳)有限公司 Audio calibration method, device and storage medium
CN112365868B (en) * 2020-11-17 2024-05-28 北京达佳互联信息技术有限公司 Sound processing method, device, electronic equipment and storage medium
CN112885374A (en) * 2021-01-27 2021-06-01 吴怡然 Sound accuracy judgment method and system based on spectrum analysis
CN115881159A (en) * 2022-10-10 2023-03-31 中央民族大学 Singing evaluation method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101430876B (en) * 2007-11-08 2012-03-14 中国科学院声学研究所 Singing marking system and method
CN101894552B (en) * 2010-07-16 2012-09-26 安徽科大讯飞信息科技股份有限公司 Speech spectrum segmentation based singing evaluating system
CN103716470B (en) * 2012-09-29 2016-12-07 华为技术有限公司 The method and apparatus of Voice Quality Monitor
CN104318921B (en) * 2014-11-06 2017-08-25 科大讯飞股份有限公司 Segment cutting detection method and system, method and system for evaluating spoken language



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant