CN106157976B - Singing evaluation method and system - Google Patents

Singing evaluation method and system

Info

Publication number
CN106157976B
Authority
CN
China
Prior art keywords
data
pitch
data segment
calculating
starting
Prior art date
Legal status
Active
Application number
CN201510169264.4A
Other languages
Chinese (zh)
Other versions
CN106157976A (en)
Inventor
李啸
蒋成林
梅林海
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201510169264.4A
Publication of CN106157976A
Application granted
Publication of CN106157976B
Legal status: Active
Anticipated expiration


Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a singing evaluation method and a singing evaluation system, and belongs to the technical field of voice signal processing. The singing evaluation method comprises the following steps: acquiring recording data of a song sung by a user; sequentially intercepting, from different positions between the starting end point and the ending end point of the recording data, multiple segments of recording data of a set length to obtain multiple data segments; calculating, one by one, the pitch difference value between each data segment and the standard audio data of the song, and taking the starting end point of the data segment with the smallest pitch difference as the detection starting point; calculating a similarity score between the data segment starting from the detection starting point and the standard audio data of the song; and taking the similarity score as the evaluation result. The singing evaluation method and system have a simple calculation process, score accurately, and can meet the application requirements of real-time scoring.

Description

Singing evaluation method and system
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a singing evaluation method and system.
Background
In a karaoke singing evaluation system, in order to evaluate a user's singing level, a standard pitch for the song is usually preset, forming a standard pitch-versus-time curve. After the pitch curve of the recording data of the song sung by the user is obtained, it is compared with the standard pitch curve, and the user's singing level is evaluated by calculating the area error between the two curves: the closer the pitch curve of the user's recording is to the standard pitch curve (i.e., the smaller the area error between the curves), the higher the user's singing level.
In order to enhance the user experience, a singing evaluation system usually adopts a real-time scoring mode, so that the user receives feedback during the singing process. In this mode, each time the user finishes singing a sentence, the system must return a score for that sentence within a short response time. When the starting time point of the user's recording data does not coincide with that of the standard pitch data (for example, when defects of the recording equipment or network problems during data transmission delay or shift the user's recording), the score is easily too low and cannot reflect the user's real singing level.
In the prior art, a Dynamic Time Warping (DTW) algorithm is usually adopted to solve the problem of scoring errors caused by the non-correspondence between user recording data and standard pitch data. DTW is a nonlinear warping technique that combines time warping with distance-measure calculation: it determines the time correspondence between the user recording signal (the signal to be measured) and the standard signal (the template signal) by finding the time warping that minimizes the distance measure. Specifically, when measuring the similarity of two signal sequences of equal time length, a Euclidean distance calculation is usually adopted to find a suitable time warping function such that, after warping according to this function, the distance between the signal to be measured and the template signal is minimal. If the time lengths of the two sequences differ, the signal to be measured must be lengthened or shortened, i.e., distorted on the time axis, so that its points correspond one-to-one with those of the template signal before the Euclidean distance is calculated. Assuming the template signal and the signal to be measured have lengths m and n respectively, an m x n matrix is first constructed, in which matrix element (i, j) is the Euclidean distance between the i-th point of the template signal and the j-th point of the signal to be measured. A time warping function is then a path from (1, 1) to (m, n), where a point (i, j) on the path indicates that the j-th point of the signal to be measured is aligned with the i-th point of the template signal. For each path, the values of all points on the path are accumulated to obtain a cumulative distance; the path with the smallest cumulative distance among all paths corresponds to the required time warping function.
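The DTW procedure described in this background section can be sketched as follows. This is a minimal illustration of the cumulative-distance matrix, not code from the patent; the function name and the use of absolute difference as the local distance are our own choices:

```python
def dtw_distance(template, signal):
    """Minimal dynamic time warping: smallest cumulative distance
    aligning a template sequence (length m) with a signal (length n)."""
    m, n = len(template), len(signal)
    INF = float("inf")
    # D[i][j] = minimal cumulative distance aligning template[:i] with signal[:j]
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(template[i - 1] - signal[j - 1])  # local distance
            D[i][j] = cost + min(D[i - 1][j],      # stretch the template
                                 D[i][j - 1],      # stretch the signal
                                 D[i - 1][j - 1])  # one-to-one match
    return D[m][n]
```

For sequences of lengths m and n this costs O(m x n) time and memory per comparison, which is exactly the computational burden the invention aims to avoid.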
Obtaining the time warping function through Euclidean distance calculation can, to a certain extent, solve the problem of scoring errors caused by the time non-correspondence between user recording data and standard pitch data. However, because the amount of calculation is large and the calculation process is complex, it cannot meet the application requirements of real-time scoring.
Disclosure of Invention
The embodiment of the invention provides a singing evaluation method and a singing evaluation system, which are simple in calculation process and accurate in scoring and can meet the application requirement of real-time scoring.
The technical scheme provided by the embodiment of the invention is as follows:
in one aspect, a singing evaluation method is provided, and includes:
acquiring recording data of a song sung by a user;
sequentially intercepting a plurality of sections of recording data with set lengths from different positions between a starting end point and an ending end point of the recording data to obtain a plurality of data sections;
calculating the pitch difference value of each data segment and the standard audio data of the song one by one, and taking the starting end point of the data segment with the minimum pitch difference value as a detection starting point;
calculating a similarity score between a data segment starting from the detection starting point and standard audio data of the song;
and taking the similarity score as an evaluation result.
Preferably, the calculating a pitch difference value between the data segment and the standard audio data of the song includes:
step 201: extracting a stable fundamental frequency of the data segment;
step 202: calculating the key offset between the stable fundamental frequency of the data segment and the standard audio data;
step 203: normalizing the half-frequency and double-frequency components of the stable fundamental frequency according to the key offset;
step 204: iteratively executing steps 202 to 203 until the number of calculations reaches a preset number or the stable fundamental frequency converges;
step 205: when the number of calculations reaches the preset number or the stable fundamental frequency converges, taking the key offset between the stable fundamental frequency of the data segment and the standard audio data as the pitch difference value.
Preferably, before the calculating of a similarity score between the data segment starting from the detection starting point and the standard audio data of the song, the method further comprises: performing smoothing and singularity removal on the data segments whose pitch difference value is smaller than a preset threshold.
Preferably, the calculating the key offset between the stable fundamental frequency of the data segment and the standard audio data comprises:
calculating a pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data;
and searching, in an interval containing the pitch mean difference, for the pitch offset value whose sum with the stable fundamental frequency is closest to the standard pitch data, and taking that pitch offset value as the key offset.
Preferably, the sequentially intercepting multiple segments of recording data of a set length from different positions between the starting end point and the ending end point of the recording data includes:
determining an interception starting position;
intercepting recording data of the set length from the interception starting position;
and adjusting the interception starting position multiple times by a preset step length, and intercepting a further segment of recording data of the set length from the starting position after each adjustment.
In another aspect, a singing evaluation system is provided, including:
the acquisition module is used for acquiring recording data of songs sung by a user;
the intercepting module is used for sequentially intercepting a plurality of sections of recording data with set lengths from different positions between a starting end point and an ending end point of the recording data to obtain a plurality of data sections;
the first calculation module is used for calculating the pitch difference value of each data segment and the standard audio data of the song one by one;
a detection starting point determining module, configured to select a starting end point of the data segment with the smallest pitch difference from the pitch difference values calculated by the first calculating module as a detection starting point;
and the second calculation module is used for calculating the similarity score between the data segment starting from the detection starting point and the standard audio data of the song and taking the similarity score as an evaluation result.
Preferably, the first calculation module comprises:
an extraction unit, configured to extract a stable fundamental frequency of the data segment;
the calculating unit is used for iteratively calculating the key offset between the stable fundamental frequency of the data segment and the standard audio data, and for taking that key offset as the pitch difference value when the number of calculations reaches a preset number or the stable fundamental frequency converges;
and the warping unit is used for normalizing the half-frequency and double-frequency components of the stable fundamental frequency according to the key offset calculated by the calculating unit, and for outputting the warped stable fundamental frequency back to the calculating unit for the next iteration.
Preferably, the system further comprises:
and the smoothing and singularity-removal module is used for performing smoothing and singularity removal on the data segments whose pitch difference value is smaller than a preset threshold, before the second calculating module calculates the similarity score between the data segment starting from the detection starting point and the standard audio data of the song.
Preferably, the calculation unit includes:
an average difference calculating unit, configured to calculate a pitch average difference between the stable fundamental frequency of the data segment and the standard audio data;
and the searching unit is used for searching, in an interval containing the pitch mean difference, for the pitch offset value whose sum with the stable fundamental frequency is closest to the standard pitch data, and taking that value as the key offset.
Preferably, the intercepting module comprises:
a determination unit for determining an interception start position;
the intercepting unit is used for intercepting the recording data with set length from the intercepting initial position;
the adjusting unit is used for adjusting the intercepting starting position for multiple times according to a preset step length;
the intercepting unit is further used for intercepting a segment of recording data of the set length from the interception starting position after each adjustment by the adjusting unit.
In the singing evaluation method and system provided by the embodiments of the invention, multiple segments of recording data of a set length are intercepted from different positions between the starting end point and the ending end point of the recording data of a song sung by a user to obtain multiple data segments; the pitch difference value between each data segment and the standard audio data of the song is calculated; the starting end point of the data segment with the smallest pitch difference is taken as the detection starting point; the similarity score between the data segment starting from the detection starting point and the standard audio data of the song is calculated; and the similarity score is taken as the evaluation result. Because the starting end point of the data segment with the smallest pitch difference serves as the detection starting point, the time difference between the user's recording data and the standard pitch data is guaranteed to be minimal, so the user can be scored accurately even when the recording data and the standard pitch data do not correspond in time. In addition, because the calculation process is simple, the method and system can meet the application requirements of real-time scoring and have strong practicability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a singing evaluation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of calculating the pitch difference of FIG. 1;
fig. 3 is a flowchart of another singing evaluation method according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method of calculating the key offset in FIG. 3;
FIG. 5 is a flowchart of a recording data interception method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a singing evaluation system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a second singing evaluation system according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a third singing evaluation system according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a fourth singing evaluation system according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a fifth singing evaluation system according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the solutions of the embodiments of the invention, the embodiments are described in further detail below with reference to the drawings and implementations.
The embodiment of the invention provides a singing evaluation method, which comprises the following steps as shown in figure 1:
step 101: recording data of a song sung by a user is obtained.
In the embodiment of the invention, the recording data of the song sung by the user can be obtained sentence by sentence, and each sentence of recording data is evaluated and scored as soon as it is obtained, which ensures real-time scoring and enhances the user experience.
Step 102: and sequentially intercepting a plurality of sections of recording data with set lengths from different positions between the starting end point and the ending end point of the recording data to obtain a plurality of data sections.
While singing a song, the starting time point of the user's recording data may deviate from that of the song's standard audio data, because the user starts singing at an improper time or because of defects in the recording equipment, data transmission network problems, and the like. The pitch curves of the recording data and the standard audio data then differ, which affects scoring accuracy and prevents the score from reflecting the user's real singing level.
In the embodiment of the invention, multiple segments of recording data of a set length are sequentially intercepted from different positions between the starting end point and the ending end point of the recording data to obtain multiple data segments, and these data segments are then compared with the standard pitch data separately to determine the closest starting time point. The set length (the length of each intercepted data segment) is less than or equal to the length of the recording data of the song sung by the user. Specifically, the set length can be chosen according to the length of the standard audio data, so that it equals the length of the standard audio data and the intercepted data segments can conveniently be evaluated and scored later.
Specifically, as shown in fig. 5, sequentially intercepting a plurality of pieces of recording data with a set length from different positions between a start end point and an end point of the recording data may include the following steps:
step 501: and determining a interception starting position.
Assuming that the length of the recording data is L + 2E and the length of the standard audio data is L, the ending points of the recording data and the standard audio data may be aligned, and the interception starting position may then be chosen anywhere within the 2E range by which the recording data is longer than the standard audio data. The specific interception starting position can be set according to an empirical value; for example, it may be set to the starting end point of the recording data, or to the middle of the 2E range. The embodiment of the present invention imposes no particular limitation here.
Step 502: and intercepting the recording data with set length from the interception initial position.
After the interception starting position is determined in step 501, recording data of the same length as the standard audio data is intercepted starting from that position to obtain the first data segment.
Step 503: and adjusting the interception initial position for multiple times by a preset step length, and respectively intercepting multiple sections of recording data with set lengths from the intercepted initial position after each adjustment.
After the first data segment is obtained, further data segments can be obtained by adjusting the interception starting position. Specifically, the starting position may be adjusted according to a preset rule, for example by a preset step length. Assume the interception starting position of the first data segment is the starting end point of the recording data and the preset step length is step: the interception starting position of the second data segment is then moved backward from the starting end point by step, the interception starting position of the third data segment is moved backward by 2 x step, and so on, so that multiple segments of recording data of the set length are intercepted.
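Steps 501 to 503 amount to sliding an interception window over the recording. A small sketch under our own naming (the patent does not prescribe an implementation):

```python
def intercept_segments(recording, set_length, step):
    """Slide an interception window of set_length samples over the
    recording, advancing by `step` samples each time (steps 501-503).
    Returns (start_index, segment) pairs, one per candidate data segment."""
    segments = []
    start = 0
    while start + set_length <= len(recording):
        segments.append((start, recording[start:start + set_length]))
        start += step
    return segments
```

Each returned start index is a candidate detection starting point; the next step scores every candidate against the standard audio data.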
Step 103: and calculating the pitch difference value of each data segment and the standard audio data of the song one by one, and taking the starting end point of the data segment with the minimum pitch difference value as a detection starting point.
By calculating the pitch difference value between each data segment and the standard audio data one by one and taking the starting end point of the data segment with the smallest pitch difference as the detection starting point, it is ensured that the recording data is closest in starting time point to the standard pitch data. The pitch difference between a data segment and the standard audio data of the song may be calculated by any prior-art method; the embodiment of the present invention preferably uses the method shown in FIG. 2, which comprises the following steps:
step 201: the stable fundamental frequency of the data segment is extracted.
The recording of a song sung by a user is composed of a number of phonemes, and the boundary positions of the phonemes can be obtained from the standard audio data. Within the boundaries of each phoneme, the longest continuous run of non-zero fundamental frequency is selected as the stable fundamental frequency of that phoneme; after the stable fundamental frequencies of all phonemes in the intercepted data segment are extracted, the stable fundamental frequency of the data segment is obtained.
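The selection of the longest continuous non-zero fundamental-frequency run per phoneme can be sketched as follows; here `f0` is assumed to be a per-frame fundamental-frequency sequence in which unvoiced frames are 0, and the function name and data layout are illustrative:

```python
def stable_f0(f0, phoneme_bounds):
    """For each phoneme (start, end) frame range, select the longest
    continuous run of non-zero fundamental-frequency values as that
    phoneme's stable fundamental frequency (step 201)."""
    stable = []
    for start, end in phoneme_bounds:
        best, run = [], []
        for v in f0[start:end]:
            if v > 0:
                run.append(v)          # extend the current voiced run
            else:
                if len(run) > len(best):
                    best = run         # a zero frame ends the run
                run = []
        if len(run) > len(best):       # run may extend to the boundary
            best = run
        stable.append(best)
    return stable
```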
Step 202: the pitch-on deviation of the stable fundamental frequency of the data segment from the standard audio data is calculated.
The key offset is the difference between the pitch curve of the song as sung by the user and the standard pitch curve that arises when the user sings in a key different from the one specified in the standard audio data.
As shown in fig. 4, a typical method for calculating the key offset according to an embodiment of the present invention includes the following steps:
step 401: and calculating the pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data.
After the stable fundamental frequency of the data segment has been extracted, its pitch mean is easy to calculate; the pitch mean of the standard audio data is obtained in the same way, and the pitch mean difference is the difference between the two pitch means.
Step 402: and searching a pitch deviation value which is closest to the standard pitch data to the sum of the stable fundamental frequencies in the interval containing the pitch average value difference, and taking the pitch deviation value as the start tone deviation.
After the pitch mean difference has been calculated, the interval containing it is searched for the pitch offset value whose sum with the stable fundamental frequency is closest to the standard pitch data. The width of the search interval can be determined according to requirements such as precision, and the pitch offset value found is taken as the key offset.
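Steps 401 and 402 can be sketched as follows. The search width and grid spacing are illustrative assumptions (the patent only says the interval width is chosen according to precision requirements), and the sum of absolute errors stands in for "closest to the standard pitch data":

```python
def key_offset(user_pitch, std_pitch, search_width=2.0, grid=0.1):
    """Estimate the key offset: start from the pitch mean difference
    (step 401), then grid-search an interval around it for the shift that
    brings the shifted user pitch closest to the standard pitch (step 402)."""
    mean_diff = (sum(std_pitch) / len(std_pitch)
                 - sum(user_pitch) / len(user_pitch))
    best_shift, best_err = mean_diff, float("inf")
    steps = int(2 * search_width / grid) + 1
    for k in range(steps):
        shift = mean_diff - search_width + k * grid
        # total deviation of the shifted user pitch from the standard pitch
        err = sum(abs(u + shift - s) for u, s in zip(user_pitch, std_pitch))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift
```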
Step 203: and regulating the half frequency and the frequency multiplication in the stable fundamental frequency according to the start-up deviation.
According to the MIDI (Musical Instrument Digital Interface) standard, the scaling relationship between the pitch value p and the fundamental frequency f is as follows:
p = 69 + 12 x log2(f / 440)
because the half-frequency and frequency multiplication phenomena are generated in the process of extracting the stable fundamental frequency, the extracted pitch value may be 12 (half-frequency) smaller than the real pitch value or 12 (frequency multiplication) larger than the real pitch value. By comparing the pitch value, which is closest to the standard pitch value by adding 12 to the pitch value and subtracting 12 from the pitch value, the pitch value is correspondingly processed by adding or subtracting 12, so that the half-frequency and the frequency multiplication are normalized, and the purpose of eliminating the half-frequency and the frequency multiplication is achieved.
Step 204: and (3) iteratively executing the steps 202 to 203 until the calculation times reach a preset number or the stable fundamental frequency converges.
Because the half-frequency and double-frequency warping modifies the user's pitch data, the key offset must be recalculated; that is, steps 202 and 203 are repeated so that the key offset is computed iteratively, until the number of calculations reaches the preset number or the stable fundamental frequency converges. The preset number can be set according to an empirical value, for example 2 or 3. When two consecutive calculations yield the same key offset, the stable fundamental frequency is considered to have converged.
Step 205: and taking the pitch deviation of the stable fundamental frequency of the data segment and the standard audio data as a pitch difference value when the calculation times reach the preset times or the stable fundamental frequency is converged.
When the number of calculations reaches the preset number or the stable fundamental frequency converges, a sufficiently accurate key offset can be considered to have been found; superimposing this key offset on the user's pitch data yields pitch data very close to the standard pitch data.
Step 104: a similarity score of the data segment from the detection start point and the standard audio data of the song is calculated.
Specifically, feature parameters may be extracted from the data segment (recording data of the set length) starting from the detection starting point and from the standard audio data, and the feature parameters of the data segment may be compared with those of the standard audio data to obtain the similarity score.
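The patent leaves the concrete similarity measure open here. Purely as an illustration (the 0-100 mapping and the `max_err` constant are our assumptions, not part of the invention), one simple possibility is to score the mean absolute pitch error of the aligned segment:

```python
def similarity_score(user_pitch, std_pitch, max_err=12.0):
    """Illustrative similarity score: map the mean absolute pitch error
    between the aligned data segment and the standard audio data onto a
    0-100 scale, where an average error of max_err semitones scores 0."""
    err = sum(abs(u - s) for u, s in zip(user_pitch, std_pitch)) / len(std_pitch)
    return max(0.0, 100.0 * (1.0 - err / max_err))
```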
More preferably, as shown in fig. 3, before calculating the similarity score between the data segment from the detection start point and the standard audio data of the song, the singing evaluation method further includes:
step 301: and carrying out smoothing processing and singularity removing processing on the data segment with the pitch difference value smaller than the preset threshold value.
After the pitch difference has been calculated in step 205, smoothing the user pitch data that differs little from the standard pitch effectively reduces the influence of the staircase effect as well as noise and distortion. Specifically, a threshold for the pitch difference (the preset threshold) may be set in advance, and pitch values in the data segment whose pitch difference is smaller than this threshold may be adjusted to the standard pitch, thereby implementing the smoothing.
Because of poor pronunciation quality during singing or problems in the stable-fundamental-frequency extraction method, the pitch values of some points in the user's pitch data are too high or too low (i.e., singular points appear). These singular points can be removed by a specific operation, thereby eliminating the influence of a small number of singular points on the overall score. The operation is as follows: sort the points of the user's sung data in ascending order of their error against the standard pitch data and select a preset range (for example, the first 20%) as singularity-free data; let the mean of this 20% of the data be x; find all points whose deviation from x is within a set threshold (for example, 1) and regard them as normal points; let the mean of the normal points be y; and fill all remaining points with y, thereby removing the singular points.
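The singular-point removal just described can be sketched as follows; `keep_ratio` and `threshold` correspond to the 20% and the threshold of 1 in the example, and the function name is ours:

```python
def remove_singularities(user_pitch, std_pitch, keep_ratio=0.2, threshold=1.0):
    """Singularity removal: sort points by error against the standard pitch,
    treat the best keep_ratio fraction as singularity-free with mean x, call
    every point within `threshold` of x normal, and overwrite the remaining
    (singular) points with the mean y of the normal points."""
    order = sorted(range(len(user_pitch)),
                   key=lambda i: abs(user_pitch[i] - std_pitch[i]))
    k = max(1, int(len(user_pitch) * keep_ratio))
    clean = [user_pitch[i] for i in order[:k]]   # assumed singularity-free
    x = sum(clean) / len(clean)
    normal = [v for v in user_pitch if abs(v - x) <= threshold]
    y = sum(normal) / len(normal)
    return [v if abs(v - x) <= threshold else y for v in user_pitch]
```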
Step 105: and taking the similarity score as an evaluation result.
After the similarity score is calculated, it can be used directly as the evaluation result, i.e., the scoring result, and fed back to the user as needed through voice broadcast, on-screen display, or the like, so that the user learns the evaluation result in time.
In the singing evaluation method provided by the embodiment of the invention, multiple segments of recording data of a set length are intercepted from different positions between the starting end point and the ending end point of the recording data of a song sung by a user to obtain multiple data segments; the pitch difference value between each data segment and the standard audio data of the song is calculated; the starting end point of the data segment with the smallest pitch difference is taken as the detection starting point; the similarity score between the data segment starting from the detection starting point and the standard audio data of the song is calculated; and the similarity score is taken as the evaluation result. Because the starting end point of the data segment with the smallest pitch difference serves as the detection starting point, the time difference between the user's recording data and the standard pitch data is guaranteed to be minimal, so the user can be scored accurately even when the recording data and the standard pitch data do not correspond in time. In addition, because the calculation process is simple, the method can meet the application requirements of real-time scoring and has strong practicability.
Correspondingly, an embodiment of the present invention further provides a singing evaluation system, whose structure is shown in fig. 6, including:
an obtaining module 601, configured to obtain recording data of a song sung by a user;
an intercepting module 602, configured to sequentially intercept multiple segments of recording data with a set length at different positions between a start endpoint and an end endpoint of the recording data to obtain multiple data segments;
a first calculating module 603, configured to calculate a pitch difference between each data segment and standard audio data of a song one by one;
a detection starting point determining module 604, configured to select a starting end point of the data segment with the smallest pitch difference from the pitch difference values calculated by the first calculating module 603 as a detection starting point;
and a second calculating module 605, configured to calculate a similarity score between the data segment starting from the detection starting point and the standard audio data of the song, and use the similarity score as an evaluation result.
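As a rough illustration of how modules 601 to 605 could fit together, the following is a minimal Python sketch. It is an assumption-laden simplification: the recording and the standard audio are taken to be already reduced to pitch tracks (NumPy arrays of semitone values), the names `evaluate`, `seg_len`, and `step` are illustrative rather than the patent's own, and the similarity score shown is just a placeholder mapping of mean pitch error onto a 0 to 100 scale.

```python
import numpy as np

def evaluate(user_pitch, standard_pitch, seg_len, step):
    """Slide a window over the user's pitch track (modules 602/603),
    take the start of the least-deviating segment as the detection
    starting point (module 604), then score from there (module 605)."""
    best_start, best_diff = 0, float("inf")
    for start in range(0, len(user_pitch) - seg_len + 1, step):
        seg = user_pitch[start:start + seg_len]
        # Pitch difference between this data segment and the standard data.
        diff = float(np.mean(np.abs(seg - standard_pitch[:seg_len])))
        if diff < best_diff:
            best_start, best_diff = start, diff
    # Placeholder similarity score: smaller mean error -> higher score.
    aligned = user_pitch[best_start:best_start + seg_len]
    score = 100.0 / (1.0 + float(np.mean(np.abs(aligned - standard_pitch[:seg_len]))))
    return best_start, score
```

The segment length and step control the trade-off between alignment precision and computation, which is why the method can stay cheap enough for real-time scoring.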
As shown in fig. 7, a specific structure of the first calculating module 603 includes:
an extracting unit 701, configured to extract a stable fundamental frequency of a data segment;
a calculating unit 702, configured to calculate, in an iterative manner, a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data, and to take that starting-pitch deviation as the pitch difference when the number of calculations reaches a preset number or the stable fundamental frequency converges;
and a warping unit 703, configured to normalize half-frequency and double-frequency (octave) components in the stable fundamental frequency according to the starting-pitch deviation calculated by the calculating unit 702, and to output the warped stable fundamental frequency back to the calculating unit 702 for the next iterative calculation.
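The iterate-and-warp loop of units 702 and 703 could be sketched as below. This is only an illustration under stated assumptions: pitch is compared in the semitone domain, the starting-pitch deviation is taken as a simple mean difference, and half-frequency/double-frequency frames are folded back by whole octaves (multiples of 12 semitones). The patent does not prescribe these exact formulas, and `pitch_difference`, `max_iter`, and `tol` are hypothetical names.

```python
import numpy as np

def pitch_difference(f0_hz, std_semitone, max_iter=5, tol=1e-6):
    """Iteratively estimate the starting-pitch deviation and fold
    half/double-frequency (octave) errors back, as units 702/703 do."""
    semis = 69.0 + 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / 440.0)  # Hz -> semitones
    std = np.asarray(std_semitone, dtype=float)
    for _ in range(max_iter):
        offset = float(np.mean(semis - std))          # starting-pitch deviation
        err = semis - std - offset
        folded = semis - 12.0 * np.round(err / 12.0)  # warp octave outliers
        if float(np.max(np.abs(folded - semis))) < tol:
            break                                     # stable fundamental frequency converged
        semis = folded
    return float(np.mean(semis - std))                # deviation used as the pitch difference
```

Folding relative to the current deviation estimate, rather than to the raw mean, is what lets a few doubled or halved frames be corrected without dragging the key estimate with them.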
As shown in fig. 8, the singing evaluation system further includes:
and a smoothing and outlier-removal module 606, configured to perform smoothing and singular-point removal on the data segments whose pitch difference is smaller than a preset threshold before the second calculating module 605 calculates the similarity score between the data segment starting from the detection starting point and the standard audio data of the song. The smoothing and outlier-removal functions can be integrated into a single module 606 as shown, or implemented as separate smoothing and outlier-removal modules that respectively smooth the data segments and remove singular points from them.
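One plausible realization of module 606 is sketched below: a median filter for smoothing, with frames far from the local median (measured in median absolute deviations) treated as singular points and replaced. Both choices are illustrative assumptions, not the patent's specified algorithm, and the names `win` and `z` are hypothetical parameters.

```python
import numpy as np

def smooth_and_clean(pitch, win=3, z=2.5):
    """Median-smooth a pitch track and replace singular points, i.e.
    frames far from the local median, by that local median."""
    p = np.asarray(pitch, dtype=float)
    pad = win // 2
    padded = np.pad(p, pad, mode="edge")
    med = np.array([np.median(padded[i:i + win]) for i in range(len(p))])
    resid = np.abs(p - med)
    mad = float(np.median(resid)) + 1e-9   # robust scale estimate; epsilon avoids /0
    return np.where(resid / mad > z, med, p)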
As shown in fig. 9, the calculating unit 702 specifically includes:
a mean difference calculating unit 801, configured to calculate a pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data;
a searching unit 802, configured to search, in an interval containing the pitch mean difference, for the pitch deviation value whose sum with the stable fundamental frequency is closest to the standard pitch data, and to take that pitch deviation value as the starting-pitch deviation.
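The two-stage search of units 801 and 802 could look like the following sketch: a coarse pitch mean difference, then a fine scan in an interval around it for the shift whose sum with the stable fundamental frequency comes closest to the standard pitch data. The search `radius`, grid `step`, and mean-absolute-error cost are assumptions for illustration.

```python
import numpy as np

def starting_pitch_deviation(stable_f0, std_pitch, radius=6.0, step=0.1):
    """Coarse mean difference (unit 801) followed by a fine local
    search for the best pitch deviation value (unit 802)."""
    f0 = np.asarray(stable_f0, dtype=float)
    std = np.asarray(std_pitch, dtype=float)
    mean_diff = float(np.mean(std - f0))               # pitch mean difference
    candidates = mean_diff + np.arange(-radius, radius + step, step)
    costs = [float(np.mean(np.abs(f0 + c - std))) for c in candidates]
    return float(candidates[int(np.argmin(costs))])    # starting-pitch deviation
```

Restricting the fine search to an interval around the mean difference keeps the cost linear in the number of candidates, consistent with the document's emphasis on a simple, real-time-capable calculation.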
As shown in fig. 10, the intercepting module 602 specifically includes:
a determining unit 901, configured to determine an interception start position;
an intercepting unit 902, configured to intercept the recording data with a set length from an interception start position;
an adjusting unit 903, configured to adjust the interception start position multiple times by a preset step;
the intercepting unit 902 is further configured to intercept a segment of recording data of the set length from the interception start position after each adjustment by the adjusting unit 903, thereby obtaining the multiple segments of recording data.
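The cooperation of units 901 to 903 amounts to a sliding window, which could be sketched as follows. The function and parameter names are hypothetical; the recording is treated abstractly as any indexable sequence.

```python
def intercept_segments(recording, seg_len, step):
    """Determine an interception start position (unit 901), cut a
    fixed-length segment (unit 902), shift the position by the preset
    step (unit 903), and repeat until the end endpoint is reached."""
    segments = []
    start = 0                         # initial interception start position
    while start + seg_len <= len(recording):
        segments.append((start, recording[start:start + seg_len]))
        start += step                 # adjust the start by the preset step
    return segments
```

Each returned pair carries the candidate start endpoint alongside the segment, which is exactly what the detection starting point determining module needs once the per-segment pitch differences are known.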
In the singing evaluation system provided by the embodiment of the invention, multiple segments of recording data of a set length are intercepted from different positions between the start endpoint and the end endpoint of the recording of a song sung by the user to obtain multiple data segments; the pitch difference between each data segment and the standard audio data of the song is calculated; the start endpoint of the data segment with the smallest pitch difference is taken as the detection starting point; the similarity score between the data segment starting from that detection starting point and the standard audio data is calculated; and the similarity score is taken as the evaluation result. Because the start endpoint of the data segment with the smallest pitch difference serves as the detection starting point, the time offset between the user's recording and the standard pitch data is minimized, so the system scores accurately even when the recording and the standard pitch data are not aligned in time. In addition, the calculation process is simple, so the system meets the requirement of real-time scoring and has strong practicability.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: the modules or units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. A singing evaluation method, comprising:
acquiring recording data of a song sung by a user;
sequentially intercepting multiple segments of recording data of a set length from different positions between a start endpoint and an end endpoint of the recording data to obtain multiple data segments;
calculating the pitch difference value of each data segment and the standard audio data of the song one by one, and taking the starting end point of the data segment with the minimum pitch difference value as a detection starting point;
calculating a similarity score between a data segment starting from the detection starting point and standard audio data of the song;
and taking the similarity score as an evaluation result.
2. The singing evaluation method of claim 1, wherein the calculating a pitch difference between the data segment and standard audio data of a song comprises:
step 201: extracting a stable fundamental frequency of the data segment;
step 202: calculating a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data;
step 203: normalizing the half-frequency and double-frequency components in the stable fundamental frequency according to the starting-pitch deviation;
step 204: iteratively executing steps 202 to 203 until the number of calculations reaches a preset number or the stable fundamental frequency converges;
step 205: taking the starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data when the number of calculations reaches the preset number or the stable fundamental frequency converges as the pitch difference.
3. The singing evaluation method according to claim 2, wherein before the calculating the similarity score between the data segment starting from the detection starting point and the standard audio data of the song, the method further comprises: performing smoothing and singular-point removal on the data segments whose pitch difference is smaller than a preset threshold.
4. The singing evaluation method of claim 3, wherein the calculating a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data comprises:
calculating a pitch mean difference between the stable fundamental frequency of the data segment and the standard audio data;
and searching, in an interval containing the pitch mean difference, for the pitch deviation value whose sum with the stable fundamental frequency is closest to the standard pitch data, and taking that pitch deviation value as the starting-pitch deviation.
5. A singing evaluation method according to any one of claims 1 to 4, wherein the sequentially intercepting a plurality of segments of recorded data of a set length from different positions between a start end point and an end point of the recorded data comprises:
determining an interception starting position;
intercepting the recording data with a set length from the intercepting starting position;
and adjusting the interception start position multiple times by a preset step, and intercepting a segment of recording data of the set length from the interception start position after each adjustment, thereby obtaining the multiple segments of recording data.
6. A singing evaluation system, comprising:
the acquisition module is used for acquiring recording data of songs sung by a user;
the intercepting module is used for sequentially intercepting a plurality of sections of recording data with set lengths from different positions between a starting end point and an ending end point of the recording data to obtain a plurality of data sections;
the first calculation module is used for calculating the pitch difference value of each data segment and the standard audio data of the song one by one;
a detection starting point determining module, configured to select a starting end point of the data segment with the smallest pitch difference from the pitch difference values calculated by the first calculating module as a detection starting point;
and the second calculation module is used for calculating the similarity score between the data segment starting from the detection starting point and the standard audio data of the song and taking the similarity score as an evaluation result.
7. The singing evaluation system according to claim 6, wherein the first calculation module comprises:
an extraction unit, configured to extract a stable fundamental frequency of the data segment;
the computing unit is configured to calculate, in an iterative manner, a starting-pitch deviation between the stable fundamental frequency of the data segment and the standard audio data, and to take that starting-pitch deviation as the pitch difference when the number of calculations reaches a preset number or the stable fundamental frequency converges;
and the warping unit is configured to normalize the half-frequency and double-frequency components in the stable fundamental frequency according to the starting-pitch deviation calculated by the computing unit, and to output the warped stable fundamental frequency to the computing unit for the next iterative calculation.
8. A singing evaluation system according to claim 7, wherein the system further comprises:
and the smoothing and outlier-removal module is configured to perform smoothing and singular-point removal on the data segments whose pitch difference is smaller than a preset threshold before the second calculation module calculates the similarity score between the data segment starting from the detection starting point and the standard audio data of the song.
9. A singing evaluation system according to claim 8, wherein the computing unit comprises:
an average difference calculating unit, configured to calculate a pitch average difference between the stable fundamental frequency of the data segment and the standard audio data;
and the searching unit is configured to search, in an interval containing the pitch mean difference, for the pitch deviation value whose sum with the stable fundamental frequency is closest to the standard pitch data, and to take that pitch deviation value as the starting-pitch deviation.
10. A singing evaluation system according to any one of claims 6 to 9, wherein the intercept module comprises:
a determination unit for determining an interception start position;
the intercepting unit is used for intercepting the recording data with set length from the intercepting initial position;
the adjusting unit is used for adjusting the intercepting starting position for multiple times according to a preset step length;
the intercepting unit is further configured to intercept a segment of recording data of the set length from the interception start position after each adjustment by the adjusting unit, thereby obtaining the multiple segments of recording data.
CN201510169264.4A 2015-04-10 2015-04-10 Singing evaluation method and system Active CN106157976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510169264.4A CN106157976B (en) 2015-04-10 2015-04-10 Singing evaluation method and system

Publications (2)

Publication Number Publication Date
CN106157976A CN106157976A (en) 2016-11-23
CN106157976B true CN106157976B (en) 2020-02-07

Family

ID=57335643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510169264.4A Active CN106157976B (en) 2015-04-10 2015-04-10 Singing evaluation method and system

Country Status (1)

Country Link
CN (1) CN106157976B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358969A (en) * 2017-07-19 2017-11-17 无锡冰河计算机科技发展有限公司 One kind recording fusion method
CN107507628B (en) * 2017-08-31 2021-01-15 广州酷狗计算机科技有限公司 Singing scoring method, singing scoring device and terminal
CN107785010A (en) * 2017-09-15 2018-03-09 广州酷狗计算机科技有限公司 Singing songses evaluation method, equipment, evaluation system and readable storage medium storing program for executing
CN108022604A (en) * 2017-11-28 2018-05-11 北京小唱科技有限公司 The method and apparatus of amended record audio content
CN108206026B (en) * 2017-12-05 2021-12-03 北京小唱科技有限公司 Method and device for determining pitch deviation of audio content
CN108257613B (en) * 2017-12-05 2021-12-10 北京小唱科技有限公司 Method and device for correcting pitch deviation of audio content
CN108172206B (en) * 2017-12-27 2021-05-07 广州酷狗计算机科技有限公司 Audio processing method, device and system
CN110890086B (en) * 2018-08-17 2023-12-26 嘉楠明芯(北京)科技有限公司 Voice similarity calculation method and device based on greedy algorithm
CN109524025B (en) * 2018-11-26 2021-12-14 北京达佳互联信息技术有限公司 Singing scoring method and device, electronic equipment and storage medium
CN110070847B (en) * 2019-03-28 2023-09-26 深圳市芒果未来科技有限公司 Musical tone evaluation method and related products
CN110120216B (en) * 2019-04-29 2021-11-12 北京小唱科技有限公司 Audio data processing method and device for singing evaluation
CN112581976B (en) * 2019-09-29 2023-06-27 骅讯电子企业股份有限公司 Singing scoring method and system based on streaming media
CN111081277B (en) * 2019-12-19 2022-07-12 广州酷狗计算机科技有限公司 Audio evaluation method, device, equipment and storage medium
CN111785238B (en) * 2020-06-24 2024-02-27 腾讯音乐娱乐科技(深圳)有限公司 Audio calibration method, device and storage medium
CN112365868B (en) * 2020-11-17 2024-05-28 北京达佳互联信息技术有限公司 Sound processing method, device, electronic equipment and storage medium
CN112885374A (en) * 2021-01-27 2021-06-01 吴怡然 Sound accuracy judgment method and system based on spectrum analysis
CN115881159A (en) * 2022-10-10 2023-03-31 中央民族大学 Singing evaluation method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101430876B (en) * 2007-11-08 2012-03-14 中国科学院声学研究所 Singing marking system and method
CN101894552B (en) * 2010-07-16 2012-09-26 安徽科大讯飞信息科技股份有限公司 Speech spectrum segmentation based singing evaluating system
CN103716470B (en) * 2012-09-29 2016-12-07 华为技术有限公司 The method and apparatus of Voice Quality Monitor
CN104318921B (en) * 2014-11-06 2017-08-25 科大讯飞股份有限公司 Segment cutting detection method and system, method and system for evaluating spoken language



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant