CN113823270B - Determination method, medium, device and computing equipment of rhythm score - Google Patents

Publication number
CN113823270B
Authority
CN
China
Prior art keywords
target
starting point
determining
note
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111266761.8A
Other languages
Chinese (zh)
Other versions
CN113823270A (en)
Inventor
高月洁
郑博
刘华平
曹偲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202111266761.8A
Publication of CN113823270A
Application granted
Publication of CN113823270B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The embodiments of the disclosure provide a method, medium, device and computing equipment for determining a rhythm score. The method comprises: acquiring a dry sound signal corresponding to a target song sung by a user; determining a first starting point of each note sung by the user in the dry sound signal; and determining a rhythm score for the user's singing of the target song according to the first starting points, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file. The method and device can accurately obtain the rhythm score of a song sung by the user and improve the user experience.

Description

Determination method, medium, device and computing equipment of rhythm score
Technical Field
The embodiment of the disclosure relates to the technical field of voice signal processing, and more particularly relates to a method, medium, device and computing equipment for determining a rhythm score.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Existing karaoke systems can score a user's singing, which makes them more entertaining and interactive. Scoring the rhythm is a common scoring approach.
In the related art, rhythm scoring generally uses the pitch contour as its feature: a dynamic time warping (DTW) operation is performed on the pitch line sung by the user and a template pitch line to obtain a DTW curve, and the root mean square (RMS) value, i.e., the least-squares deviation, between the DTW curve and a straight line fitted to it is then calculated to obtain the rhythm score. The rhythm score obtained in this manner is not accurate enough.
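For orientation, the related-art approach described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of any particular karaoke system: the function names `dtw_path` and `rms_from_fitted_line`, the absolute-difference local cost, and the use of plain pitch lists are all assumptions made for the example.

```python
import math


def dtw_path(a, b):
    """Return the DTW warping path between pitch sequences a and b as (i, j) pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute pitch difference
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def rms_from_fitted_line(path):
    """Fit j = slope * i + intercept by least squares and return the RMS residual."""
    n = len(path)
    mean_x = sum(x for x, _ in path) / n
    mean_y = sum(y for _, y in path) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in path)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in path)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in path) / n)


template = [60, 62, 64, 65, 67]
sung = [60, 62, 64, 65, 67]
deviation = rms_from_fitted_line(dtw_path(sung, template))
```

A path that hugs a straight line gives a small RMS value. How that value is mapped to a displayed score is not specified in the text and is left out here.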
Disclosure of Invention
The disclosure provides a method, medium, device and computing equipment for determining a tempo score so as to accurately determine the tempo score.
In a first aspect, an embodiment of the present disclosure provides a method for determining a tempo score, including:
acquiring a dry sound signal corresponding to a target song sung by a user;
determining a first starting point of each user singing note in the dry sound signal;
determining a rhythm score for the user's singing of the target song according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, wherein the weight represents how important each target note in the pitch line file is to the perceived rhythm, the starting point detection interval comprises a preset offset range centered on a second starting point of the target note, the starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used to determine the target score of the target note.
In one possible implementation manner, determining the tempo score of the target song by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file includes: determining whether a corresponding first starting point exists in the starting point detection interval; if the corresponding first starting point exists in the starting point detection interval, determining the target score of the target note according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval; and determining the rhythm score of the target song sung by the user according to the weight and the target score of the target note.
In one possible implementation, determining the tempo score for the user to sing the target song based on the weights of the target notes and the target score includes: obtaining the product of the weight of the target note corresponding to each starting point detection interval and the target score; and determining the rhythm score of the target song sung by the user according to the ratio of the sum of the products to the weight sum of each target note in the pitch line file.
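The weighted ratio described above, the sum of weight times target score divided by the sum of the weights, can be sketched as below. The function name `rhythm_score`, the 0 to 100 score scale in the example, and the handling of an empty or zero-weight input are assumptions for illustration only.

```python
def rhythm_score(weights, target_scores):
    """Weighted average of per-note target scores.

    weights[i] is the weight of target note i and target_scores[i] is the
    target score determined for that note.
    """
    if len(weights) != len(target_scores):
        raise ValueError("weights and target_scores must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # assumed convention when there is nothing to score
    weighted_sum = sum(w * s for w, s in zip(weights, target_scores))
    return weighted_sum / total_weight


# A heavily weighted note hit perfectly and a lightly weighted note hit late.
example = rhythm_score([4, 1], [100, 50])  # (4*100 + 1*50) / 5 = 90.0
```

Because the result is normalized by the total weight, songs with different numbers of notes land on the same scale.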
In one possible implementation manner, the method for determining the tempo score further includes: if the starting point detection interval is determined to not have the corresponding first starting point, determining that the target score of the target note corresponding to the starting point detection interval is a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
In one possible implementation, the start point detection interval corresponding to the target note includes a plurality of different start point detection subintervals, and the different start point detection subintervals correspond to different first preset scores of the target note; the different starting point detection subintervals comprise different preset offset ranges centering on the second starting point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second starting point of the target note and different preset detection values.
In one possible implementation manner, the method for determining the tempo score further includes: if the fact that a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note is determined, determining target starting point detection subintervals corresponding to the first starting points in the starting point detection interval respectively; and determining the highest score in the first preset scores of the target notes corresponding to the target starting point detection subintervals as the target score of the target notes.
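One way to picture the sub-interval logic of the preceding paragraphs is the sketch below. The text only says that each offset range is derived from the note length, the note's second starting point and a preset detection value; treating the detection value as a fraction of the note length that gives the half-width of the range is an assumption, as are the tier values, the miss score and the name `note_target_score`.

```python
def note_target_score(
    onsets,
    note_start,
    note_length,
    tiers=((0.1, 100.0), (0.25, 80.0), (0.5, 60.0)),  # hypothetical (detection value, first preset score)
    miss_score=0.0,  # hypothetical second preset score, lower than every tier score
):
    """Return the target score of one target note.

    onsets: first starting points detected in the user's dry sound signal (seconds).
    note_start: second starting point of the target note (seconds).
    note_length: duration of the target note (seconds).
    """
    best = None
    for onset in onsets:
        offset = abs(onset - note_start)
        # Tiers are ordered narrowest first, so the first match is the best tier for this onset.
        for detection_value, preset_score in tiers:
            if offset <= detection_value * note_length:
                best = preset_score if best is None else max(best, preset_score)
                break
    # No onset inside the widest range: fall back to the lower second preset score.
    return miss_score if best is None else best


two_hits = note_target_score([10.3, 10.05], note_start=10.0, note_length=1.0)
no_hit = note_target_score([11.0], note_start=10.0, note_length=1.0)
```

When several onsets fall in the interval, the highest-scoring sub-interval wins, which matches the rule stated above.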
In one possible implementation, determining a first starting point for each user's singing notes in the dry acoustic signal includes: and determining a first starting point of each user singing note in the dry sound signal based on a preset starting point detection algorithm.
In one possible implementation, the preset starting point detection algorithm includes a spectrum-based starting point detection algorithm and a pYIN-based starting point detection algorithm, and determining a first starting point of each note sung by the user in the dry sound signal includes: determining a third starting point of each note sung by the user in the dry sound signal based on the spectrum-based starting point detection algorithm; determining a fourth starting point of each note sung by the user in the dry sound signal based on the pYIN-based starting point detection algorithm; and determining the first starting point of each note sung by the user in the dry sound signal according to the union of the third starting points and the fourth starting points.
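Taking the union of the two detectors' results can be sketched as follows. The two input lists stand in for the outputs of the spectrum-based and pitch-based detectors, which are not implemented here. The optional `tolerance` parameter, which collapses near-duplicate onsets reported by both detectors, is an addition for illustration and is not described in the text; with the default of 0.0 the function is a plain set union.

```python
def merge_onsets(spectral_onsets, pitch_onsets, tolerance=0.0):
    """Union of two onset lists (seconds), sorted in ascending order.

    With tolerance > 0, an onset within `tolerance` seconds of the previously
    kept onset is treated as a duplicate and dropped (assumed refinement).
    """
    merged = []
    for t in sorted(set(spectral_onsets) | set(pitch_onsets)):
        if merged and tolerance > 0 and t - merged[-1] <= tolerance:
            continue
        merged.append(t)
    return merged


plain_union = merge_onsets([0.5, 1.2], [0.5, 2.0])
deduplicated = merge_onsets([0.50, 1.20], [0.52, 2.0], tolerance=0.05)
```

Using the union means an onset missed by one detector can still be supplied by the other.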
In one possible implementation, before determining the rhythm score for the user's singing of the target song according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, the method for determining the rhythm score further includes obtaining the weight of each target note in the pitch line file corresponding to the target song as follows: determining the weight of the first target note after a breath point to be a first weight, and determining the weight of the initial note in the pitch line file to be the first weight; determining the weight of the first target note in a run of consecutive identical notes to be a second weight, wherein the second weight is smaller than the first weight; determining the weight of target notes other than the first target note after a breath point and the consecutive identical notes to be a third weight, wherein the third weight is smaller than the second weight; determining the weight of the target notes in a run of consecutive identical notes other than the first one to be a fourth weight, wherein the fourth weight is smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
In one possible implementation, the method for determining the rhythm score further includes: if the time interval between two adjacent target notes is greater than a breath point threshold, determining that a breath point exists between the two adjacent target notes; if at least two target notes have the same pitch and the time interval between adjacent ones is smaller than the breath point threshold, determining that the at least two target notes are consecutive identical notes.
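The four-tier weighting and the gap rules above can be sketched as below. Several details are assumptions, since the text does not fix them: the numeric weights and threshold, measuring the time interval from the end of one note to the start of the next, and giving the first weight priority when a note both follows a breath gap and opens a run of identical notes. `Note` and `note_weights` are hypothetical names.

```python
from typing import NamedTuple


class Note(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    pitch: int    # e.g. a MIDI note number


def note_weights(notes, breath_threshold=0.3, tier_weights=(4.0, 3.0, 2.0, 1.0)):
    """Assign one of four weights (first > second > third > fourth) to each target note."""
    w1, w2, w3, w4 = tier_weights
    weights = []
    for i, note in enumerate(notes):
        prev = notes[i - 1] if i > 0 else None
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        # The initial note, or a note preceded by a gap longer than the threshold.
        after_breath = prev is None or note.start - prev.end > breath_threshold
        same_as_prev = (
            prev is not None
            and prev.pitch == note.pitch
            and note.start - prev.end < breath_threshold
        )
        same_as_next = (
            nxt is not None
            and nxt.pitch == note.pitch
            and nxt.start - note.end < breath_threshold
        )
        if after_breath:
            weights.append(w1)  # first weight
        elif same_as_prev:
            weights.append(w4)  # later note in a run of identical notes
        elif same_as_next:
            weights.append(w2)  # first note in a run of identical notes
        else:
            weights.append(w3)  # every other note
    return weights


melody = [
    Note(0.0, 0.5, 60),
    Note(0.5, 1.0, 62),
    Note(1.0, 1.5, 62),
    Note(1.5, 2.0, 64),
    Note(3.0, 3.5, 64),
]
weights = note_weights(melody)
```

In this example the last note follows a one-second gap, so it receives the first weight even though it repeats the pitch of the note before it.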
In a second aspect, an embodiment of the present disclosure provides a determining apparatus for a tempo score, including:
the acquisition module is used for acquiring a dry sound signal corresponding to a target song sung by a user;
a determining module, configured to determine a first starting point of each user singing a note in the dry sound signal;
the processing module is used for determining the rhythm score for the user's singing of the target song according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, wherein the weight represents how important each target note in the pitch line file is to the perceived rhythm, the starting point detection interval comprises a preset offset range centered on the second starting point of the target note, the starting point detection interval corresponds to the first preset score of the target note, and the starting point detection interval and the first starting point are used to determine the target score of the target note.
In one possible implementation, the processing module is specifically configured to: determining whether a corresponding first starting point exists in the starting point detection interval; if the corresponding first starting point exists in the starting point detection interval, determining the target score of the target note according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval; and determining the rhythm score of the target song sung by the user according to the weight and the target score of the target note.
In one possible implementation, the processing module is specifically configured to: obtaining the product of the weight of the target note corresponding to each starting point detection interval and the target score; and determining the rhythm score of the target song sung by the user according to the ratio of the sum of the products to the weight sum of each target note in the pitch line file.
In one possible implementation, the processing module is further configured to: if the starting point detection interval is determined to not have the corresponding first starting point, determining that the target score of the target note corresponding to the starting point detection interval is a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
In one possible implementation, the start point detection interval corresponding to the target note includes a plurality of different start point detection subintervals, and the different start point detection subintervals correspond to different first preset scores of the target note; the different starting point detection subintervals comprise different preset offset ranges centering on the second starting point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second starting point of the target note and different preset detection values.
In one possible implementation, the processing module is further configured to: if the fact that a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note is determined, determining target starting point detection subintervals corresponding to the first starting points in the starting point detection interval respectively;
and determining the highest score in the first preset scores of the target notes corresponding to the target starting point detection subintervals as the target score of the target notes.
In one possible implementation, the determining module is specifically configured to: and determining a first starting point of each user singing note in the dry sound signal based on a preset starting point detection algorithm.
In one possible implementation, the preset starting point detection algorithm includes a spectrum-based starting point detection algorithm and a pYIN-based starting point detection algorithm, and the determining module is specifically configured to: determine a third starting point of each note sung by the user in the dry sound signal based on the spectrum-based starting point detection algorithm; determine a fourth starting point of each note sung by the user in the dry sound signal based on the pYIN-based starting point detection algorithm; and determine the first starting point of each note sung by the user in the dry sound signal according to the union of the third starting points and the fourth starting points.
In one possible implementation, the processing module is further configured to: before determining the rhythm score for the user's singing of the target song according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, obtain the weight of each target note in the pitch line file corresponding to the target song as follows: determining the weight of the first target note after a breath point to be a first weight, and determining the weight of the initial note in the pitch line file to be the first weight; determining the weight of the first target note in a run of consecutive identical notes to be a second weight, wherein the second weight is smaller than the first weight; determining the weight of target notes other than the first target note after a breath point and the consecutive identical notes to be a third weight, wherein the third weight is smaller than the second weight; determining the weight of the target notes in a run of consecutive identical notes other than the first one to be a fourth weight, wherein the fourth weight is smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
In one possible implementation, the processing module is further configured to: if the time interval between two adjacent target notes is greater than a breath point threshold, determine that a breath point exists between the two adjacent target notes; if at least two target notes have the same pitch and the time interval between adjacent ones is smaller than the breath point threshold, determine that the at least two target notes are consecutive identical notes.
In a third aspect, embodiments of the present disclosure provide a computing device comprising: a processor, a memory communicatively coupled to the processor;
The memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory to implement the method of determining a tempo score according to the first aspect of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides a computer readable storage medium, in which computer program instructions are stored, which when executed by a processor, implement a method for determining a tempo score according to the first aspect of the present disclosure.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements a method of determining a tempo score according to the first aspect of the present disclosure.
According to the method, medium, device and computing equipment for determining a rhythm score provided by the embodiments of the present disclosure, a dry sound signal corresponding to a target song sung by a user is acquired, a first starting point of each note sung by the user in the dry sound signal is determined, and the rhythm score for the user's singing of the target song is determined according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file. The embodiments of the present disclosure can thus accurately obtain the rhythm score of a song sung by the user and improve the user experience.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for determining a tempo score according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a start point detection sub-interval A corresponding to a note in a pitch line file according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a method of determining a tempo score provided by another embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a spectrum-based onset detection algorithm for detecting onset of notes according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of note onsets detected by a pYIN-based starting point detection algorithm according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram showing that the pYIN-based starting point detection algorithm cannot detect pitch for consonants, according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of determining a target score of a target note according to one embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a tempo score determining apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a program product provided by an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a computing device according to an embodiment of the disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the disclosure, a method, medium, device and computing equipment for determining a rhythm score are provided.
In this context, the terms involved are to be understood as follows: endpoint detection is an algorithm that marks the starting points of notes and is commonly referred to as onset detection; dry sound is a pure human voice without accompaniment or post-processing; the frequency spectrum is a way of analyzing audio that shows the relation between signal frequency and energy, generally as a two-dimensional image whose horizontal axis represents time, whose vertical axis represents frequency, and whose color shade represents the energy level. Furthermore, any number of elements in the figures is for illustration and not limitation, and any naming is used only for distinction and carries no limiting sense.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
The present inventors have found that existing karaoke systems typically score the intonation of a user's singing. Specifically, a score for the user's singing is derived from the difference between the user's singing voice and the melody (i.e., fundamental frequency) of the song in a template (such as a MIDI file). When a piece of singing is rated manually, however, factors such as rhythm, breath, tone and singing skill are considered in addition to intonation. Scoring the rhythm is a common scoring approach. In the related art, rhythm scoring generally uses the pitch contour as its feature: a DTW operation is performed on the pitch line sung by the user and a template pitch line to obtain a DTW curve, and the RMS between the DTW curve and a straight line fitted to it is then calculated to obtain the rhythm score. Rhythm scoring in this manner has the following defects: (1) the method can effectively evaluate the rhythm of different singers singing the same song, but scores for different songs cannot be compared with each other; (2) the pitch line of the entire song must be computed, so real-time scoring sentence by sentence is not possible; (3) the user is required to sing strictly on pitch, and a proper score cannot be given when the user sings off-key but with correct rhythm.
To handle the case in defect (3) where the user sings off-key but with correct rhythm, 13-dimensional mel-frequency cepstral coefficient (MFCC) features have been used together with DTW to score the rhythm, but this scoring approach still has the following defects: (1) MFCC features allow the rhythm to be evaluated when the user sings off-key, but in return the user must sing the lyrics correctly, otherwise a proper score cannot be given; (2) the method can effectively evaluate the rhythm of different singers singing the same song, but scores for different songs cannot be compared with each other; (3) the pitch line of the entire song must be computed, so real-time scoring sentence by sentence is not possible.
In addition, in the related art, the rhythm may also be scored in the following ways.
(1) Lyrics are identified by speech recognition and matched sentence by sentence to obtain a singing voice set E = {E1, E2, ...}. The start and end points of each word in a sentence of singing are located based on frequency, resulting in a single-note set Pi = {Pi1, Pi2, ..., Pij}. The user's singing duration for the i-th singing voice Ei in the set E is compared with the standard singing duration to evaluate an overall rhythm score. Similarly, the differences between the user's and the standard start and end times of the j-th word of the i-th sentence in the set E are compared to evaluate a local rhythm score. Mode (1) has the following disadvantages: the user must sing the lyrics correctly, otherwise the algorithm cannot give a proper score; when phrases with the same lyrics appear in the same song, the matching is easily confused; the local rhythm score requires the user to sing strictly on pitch, and the algorithm cannot give a proper score when the user sings off-key but with correct rhythm.
(2) Scoring follows the rhythm of an instrumental performance: the midpoint of each note's playing duration is taken as its playing time point, matching is done by dynamic programming, and points are deducted in proportion to the share of extra or missing notes in the whole. Mode (2) has the following disadvantages: no proper score can be given for the timing deviation of each individual note; when a phrase contains both extra and missing notes, the resulting score is biased.
(3) Matching and scoring are performed with the bar as the minimum unit, and the rhythm is scored based on the difference between the accuracy of the bar's starting beat and the integral value of the bar. Mode (3) has the following disadvantages: the minimum offset unit of a bar is a 32nd note, so neither the matching nor the scoring is accurate enough; when the user's singing rhythm differs from the original by more than a quarter note, the upper limit of the matching method is exceeded and the corresponding phrases are not matched; missed, wrong or rushed beats of individual notes cannot be detected.
The above ways of scoring rhythm are relatively crude, their algorithms are rigid, their accuracy is low, and the conditions under which they apply are relatively narrow.
Based on the above problems, the present disclosure provides a method, medium, device and computing equipment for determining a rhythm score: by acquiring the dry sound signal corresponding to the song sung by the user, the rhythm score of the song sung by the user can be accurately obtained from the dry sound signal and the pitch line file, thereby improving the user experience.
Application scene overview
An application scenario of the solution provided in the present disclosure is first illustrated with reference to fig. 1. Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present disclosure, as shown in fig. 1, in the application scenario, a server 102 obtains a dry sound signal of a song sung by a user through a client 101, the server 102 determines a tempo score of the song sung by the user according to the dry sound signal and a pitch line file of the song sung by the user, the tempo score is transmitted to the client 101 through a network, and the client 101 displays the tempo score of the song sung by the user. The specific implementation process of determining the tempo score of the song sung by the user by the server 102 according to the dry sound signal and the pitch line file of the song sung by the user may be referred to as the schemes of the following embodiments.
It should be noted that fig. 1 is only a schematic diagram of an application scenario provided by an embodiment of the present disclosure, and the embodiment of the present disclosure does not limit the devices included in fig. 1 or limit the positional relationship between the devices in fig. 1. For example, in the application scenario shown in fig. 1, a data storage device may be an external memory with respect to the server 102, or an internal memory integrated into the server 102.
Exemplary method
A method of determining a tempo score according to an exemplary embodiment of the present disclosure is described below in conjunction with the application scenario of fig. 1. It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any applicable scenario.
First, a method for determining a tempo score is described by way of specific embodiments.
Fig. 2 is a flowchart of a method for determining a tempo score according to an embodiment of the present disclosure. The method of the embodiments of the present disclosure may be applied in a computing device, which may be a server or a server cluster, or the like. As shown in fig. 2, the method of the embodiment of the present disclosure includes:
S201, acquiring a dry sound signal corresponding to a target song sung by a user.
In the embodiment of the present disclosure, how the dry sound signal corresponding to the target song sung by the user is obtained is not specifically limited. For example, the user listens to the accompaniment of the target song through headphones while singing; a karaoke application (APP) can then directly record the user's pure vocals without accompaniment, namely the dry sound signal. As another example, when the user sings the target song while the accompaniment is played aloud, an audio file containing both the user's singing and the accompaniment is obtained, and a related vocal-accompaniment separation algorithm is used to extract the dry sound signal from that audio file.
S202, determining a first starting point of each note sung by the user in the dry sound signal.
In this step, the first starting point is the start time of a note sung by the user, which may also be referred to as the onset of that note. Determining the first starting point of each user-sung note in the dry sound signal therefore means determining the onset of each such note. After the dry sound signal corresponding to the target song sung by the user is obtained, the onsets can be extracted from the dry sound signal by an existing onset detection algorithm, that is, the first starting point of each user-sung note in the dry sound signal is determined. For how to determine the first starting point of each user-sung note in the dry sound signal, reference is made to the following embodiments, which are not repeated here.
S203, determining the rhythm score of the target song sung by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file.
Here, the weight represents the importance of each target note in the pitch line file to the perceived rhythm; the starting point detection interval comprises a preset offset range centered on the second starting point of the target note and corresponds to a first preset score of the target note; and the starting point detection interval and the first starting point are used to determine the target score of the target note.
Illustratively, a pitch line file, such as a MIDI file, may be purchased from a song provider or may be generated by algorithms such as song melody extraction and spectrum conversion. Table 1 shows a pitch line file format provided by an embodiment of the present disclosure. As shown in Table 1, the pitch line file is a matrix of three columns, whose meanings are respectively: note start time, in milliseconds (ms); note end time, in milliseconds (ms); and pitch, as a MIDI note number. The note length is obtained as the time interval between the note start time and the note end time.
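As a minimal sketch of how such a three-column matrix might be consumed, the following Python snippet turns rows into note records and derives the note length. The `Note` class and the `parse_pitch_line` function are illustrative names assumed here, not part of the disclosure, and the sample rows are hypothetical values chosen only to exercise the code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Note:
    start_ms: int  # note start time, in milliseconds
    end_ms: int    # note end time, in milliseconds
    pitch: int     # pitch as a MIDI note number

    @property
    def length_ms(self) -> int:
        # Note length is the interval between start time and end time.
        return self.end_ms - self.start_ms


def parse_pitch_line(rows: Sequence[Sequence[int]]) -> List[Note]:
    """Turn a three-column matrix (start, end, pitch) into Note records."""
    notes = []
    for start_ms, end_ms, pitch in rows:
        if end_ms <= start_ms:
            raise ValueError(f"note must end after it starts: {start_ms}, {end_ms}")
        notes.append(Note(start_ms, end_ms, pitch))
    return notes


# Hypothetical sample rows (illustrative values only).
notes = parse_pitch_line([(5000, 7000, 60), (7000, 12000, 62), (12000, 15000, 64)])
lengths = [n.length_ms for n in notes]
```

Keeping the times as integer milliseconds mirrors the file format and avoids rounding when lengths are later divided by the preset detection values.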
TABLE 1
Different notes in the pitch line file have different contribution degrees to whether the rhythm is accurate or not on the auditory sense, so that the weight of each note in the pitch line file can be determined according to the importance degree of different notes in the pitch line file on the auditory sense of the rhythm. For how to determine the weights of the target notes in the pitch line file corresponding to the target song, reference may be made to the following embodiments, which are not described herein.
The second starting point of a target note is the start time of that target note in the pitch line file. By centering on the second starting point of the target note and applying the preset offset range, the starting point detection interval corresponding to each target note in the pitch line file can be determined. The starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used for determining the target score of the target note.
Optionally, the start point detection interval corresponding to the target note includes a plurality of different start point detection subintervals, and the different start point detection subintervals correspond to different first preset scores of the target note; the different starting point detection subintervals comprise different preset offset ranges centering on the second starting point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second starting point of the target note and different preset detection values.
The starting point detection section corresponding to the target note includes, for example, three different starting point detection subintervals, namely, a starting point detection subinterval a, a starting point detection subinterval B and a starting point detection subinterval C, where the three different starting point detection subintervals correspond to different first preset scores of the target note; the three different start point detection subintervals include different preset offset ranges centered on the second start point of the target note, and specifically, the preset offset ranges corresponding to the three different start point detection subintervals may be determined as follows:
Starting point detection subinterval A: note start time point ± note length/4, where 4 is a preset detection value; if, according to the starting point detection subinterval A and the first starting point of a note sung by the user, the first starting point falls into this subinterval, the corresponding first preset score x is obtained, where x is, for example, 1;
Starting point detection subinterval B: note start time point ± note length/3, where 3 is a preset detection value; if, according to the starting point detection subinterval B and the first starting point of a note sung by the user, the first starting point falls into this subinterval, the corresponding first preset score y is obtained, where y is, for example, 0.7;
Starting point detection subinterval C: note start time point ± note length/2, where 2 is a preset detection value; if, according to the starting point detection subinterval C and the first starting point of a note sung by the user, the first starting point falls into this subinterval, the corresponding first preset score z is obtained, where z is, for example, 0.5.
The score requirements for the three different starting point detection subintervals may be: x > y > z >0. The preset offset ranges corresponding to the three different start point detection subintervals may be determined in advance through experiments. Illustratively, one possible experimental method is: acquiring dry sound signals covering different singing levels, different song styles and different languages, and manually determining the rhythm score of each singing segment; setting a plurality of sets of preset offset corresponding to the three different starting point detection subintervals, comparing the rhythm scores obtained by the method for determining the rhythm scores provided by the embodiment of the disclosure with the manually determined rhythm scores, and determining each preset offset range corresponding to the three different starting point detection subintervals according to the parameter scheme with the highest correlation degree with the manually determined rhythm scores.
After the preset offset ranges corresponding to the three different starting point detection subintervals are determined in the above manner, the three subintervals may be denoted threshold_A, threshold_B and threshold_C, respectively, and the first preset scores corresponding to them may be denoted score_A, score_B and score_C, respectively. Fig. 3 is a schematic diagram of the starting point detection subinterval A corresponding to notes in a pitch line file according to an embodiment of the present disclosure. As shown in fig. 3, for note 1, note 2 and note 3 in the pitch line file, the note start time points are 5 s, 7 s and 12 s, respectively, and the note end time points are 7 s, 12 s and 15 s, respectively, so the note lengths are 2 s, 5 s and 3 s. The starting point detection interval corresponding to each note comprises the three subintervals threshold_A, threshold_B and threshold_C. Specifically, the interval corresponding to note 1 comprises subintervals A1, B1 and C1; the interval corresponding to note 2 comprises subintervals A2, B2 and C2; and the interval corresponding to note 3 comprises subintervals A3, B3 and C3. Taking threshold_A as an example, as shown in fig. 3, the starting point detection subintervals corresponding to note 1, note 2 and note 3 are A1, A2 and A3, respectively.
The first preset score score_A corresponding to each of the starting point detection subintervals A1, A2 and A3 is 1.
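The subinterval bounds for the notes of fig. 3 can be worked out with a short sketch. The function name `detection_subintervals` and the dictionary layout are assumptions made for illustration; the preset detection values 4, 3 and 2 are the example values given above.

```python
from typing import Dict, Tuple

# Example preset detection values for subintervals A, B and C.
DETECTION_VALUES = {"A": 4, "B": 3, "C": 2}


def detection_subintervals(start_s: float, end_s: float) -> Dict[str, Tuple[float, float]]:
    """Return (lower, upper) bounds of each subinterval, centered on the note start."""
    length = end_s - start_s
    return {
        name: (start_s - length / divisor, start_s + length / divisor)
        for name, divisor in DETECTION_VALUES.items()
    }


# Notes 1 to 3 of fig. 3 as (start, end) in seconds.
fig3_notes = [(5.0, 7.0), (7.0, 12.0), (12.0, 15.0)]
subinterval_a = [detection_subintervals(s, e)["A"] for s, e in fig3_notes]
```

For note 1 this gives a subinterval A of 4.5 s to 5.5 s, which matches the range used in the description of S403. Because the offset is proportional to note length, longer notes tolerate a larger absolute deviation.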
Optionally, the weights of the notes in the pitch line file corresponding to the songs sung by the user and the start point detection intervals corresponding to the notes in the pitch line file may be obtained according to the pitch line file corresponding to the songs sung by the user, and stored in the memory, so that when the rhythms of the songs sung by the user are scored, the weights of the target notes in the pitch line file corresponding to the songs sung by the user and the start point detection intervals corresponding to the target notes in the pitch line file are directly obtained from the memory.
In the embodiment of the disclosure, after determining the first starting point of each user singing note in the dry sound signal, the rhythm score of the user singing the target song may be determined according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file. For determining the rhythm score of the target song sung by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, reference may be made to the subsequent embodiments, which are not described herein.
For example, after determining the tempo score of the user singing the target song, the tempo score of the user singing the target song may be displayed to the user.
According to the method for determining the rhythm score, the first starting point of each user singing note in the dry sound signal is determined by acquiring the dry sound signal corresponding to the target song by the user, and the rhythm score of the target song by the user is determined according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song and the starting point detection interval corresponding to each target note in the pitch line file. According to the embodiment of the invention, the rhythm score of the song singed by the user is determined according to the dry sound signal corresponding to the song singed by the user, the weight of each note in the pitch line file and the starting point detection interval corresponding to each note in the pitch line file, so that the rhythm score of the song singed by the user can be accurately obtained, and the user experience is improved.
On the basis of the above embodiment, when determining the weight of each note in the pitch line file for the rhythm score, it may be determined from the pitch line file whether an air port (a breathing gap) exists between two adjacent target notes and whether at least two target notes are consecutive same notes. In one possible implementation, if the time interval between two adjacent target notes is greater than an air port threshold, it is determined that an air port exists between them; if at least two target notes have the same pitch and the time interval between every two adjacent ones of them is smaller than the air port threshold, the at least two target notes are determined to be consecutive same notes.
Illustratively, breathing between phrases typically takes between 450 ms and 2000 ms, and a quick catch breath or an extreme case takes between 100 ms and 450 ms. Therefore, in the pitch line file, an air port can be considered to exist wherever a segment without pitch exceeds a certain length (the air port threshold, denoted for example by T_breath). Specifically, the air port threshold T_breath is, for example, 100 ms: in the pitch line file, if the time interval between the end time of the first target note and the start time of the second target note is greater than 100 ms, it is determined that an air port exists between these two adjacent target notes. For consecutive same notes, illustratively, if 3 target notes have the same pitch and the time interval between every two adjacent ones of the 3 target notes is less than the air port threshold, the 3 target notes are determined to be consecutive same notes.
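The two checks described above can be sketched as follows. The tuple layout, the function names `has_air_port` and `consecutive_same_runs`, and the sample notes are assumptions for illustration; the 100 ms threshold is the example value of T_breath.

```python
from typing import List, Tuple

Note = Tuple[int, int, int]  # (start_ms, end_ms, MIDI pitch)

T_BREATH_MS = 100  # example air port threshold


def has_air_port(prev: Note, curr: Note, threshold_ms: int = T_BREATH_MS) -> bool:
    """An air port exists when the gap between two adjacent notes exceeds the threshold."""
    return curr[0] - prev[1] > threshold_ms


def consecutive_same_runs(notes: List[Note], threshold_ms: int = T_BREATH_MS) -> List[List[int]]:
    """Return index runs (length >= 2) of same-pitch notes whose gaps are below the threshold."""
    runs: List[List[int]] = []
    current: List[int] = []
    for i, note in enumerate(notes):
        prev = notes[i - 1] if i > 0 else None
        joins = (prev is not None and note[2] == prev[2]
                 and note[0] - prev[1] < threshold_ms)
        if joins:
            current.append(i)
        else:
            if len(current) >= 2:
                runs.append(current)
            current = [i]  # start a new candidate run at this note
    if len(current) >= 2:
        runs.append(current)
    return runs


# Hypothetical notes: three same-pitch notes, a long gap, then two different notes.
sample = [(0, 400, 60), (420, 800, 60), (820, 1200, 60), (1800, 2200, 62), (2250, 2600, 64)]
runs = consecutive_same_runs(sample)
```

A gap exactly equal to the threshold satisfies neither condition, since the text uses a strict "greater than" for air ports and a strict "smaller than" for consecutive same notes.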
Optionally, before determining the tempo score of the target song sung by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, the method for determining the tempo score provided by the embodiment of the present disclosure may further include obtaining the weight of each target note in the pitch line file corresponding to the target song as follows: determining the weight of the first target note after an air port as a first weight, and determining the weight of the starting note in the pitch line file as the first weight; determining the weight of the first target note of consecutive same notes as a second weight, the second weight being smaller than the first weight; determining the weight of the target notes other than the first target note after an air port and the consecutive same notes as a third weight, the third weight being smaller than the second weight; determining the weight of the target notes of consecutive same notes other than the first target note as a fourth weight, the fourth weight being smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
Illustratively, the weights of the individual notes in the pitch line file are obtained by:
a. The weight of the first note after an air port (i.e., the first note of each sentence) is the first weight X, and the weight of the starting note in the pitch line file is also the first weight X, where X is, for example, 3; it can be understood that the starting note in the pitch line file is the first of all notes contained in the pitch line file;
b. The weight of the first note of consecutive same notes is the second weight Y, where Y is, for example, 2;
c. The weight of the notes other than the first note after an air port and the consecutive same notes is the third weight Z, where Z is, for example, 1;
d. The weight of the notes other than the first note (i.e., the non-initial notes) of consecutive same notes is the fourth weight W, where W is, for example, 0.
Wherein, the requirements for the weight value are: x > Y > Z > W. The specific weight values may be determined experimentally in advance. Illustratively, one possible experimental method is: acquiring dry sound signals covering different singing levels, different song styles and different languages; manually determining the rhythm score of each singing segment; setting a plurality of parameter schemes for determining weight values, comparing the rhythm scores obtained by the method for determining the rhythm scores provided by the embodiment of the disclosure with the manually determined rhythm scores, and determining the weight values in the a, b, c and d according to the parameter scheme with the highest correlation degree with the manually determined rhythm scores.
It should be noted that the first note of each sentence of the song and the first note of consecutive same notes are more important to the perceived rhythm, so their weights are determined by a and b above; and, to reduce the influence on the tempo score of possibly inaccurate detection of the non-initial notes of consecutive same notes, their weight is determined by d above.
After the weights of the target notes in the pitch line file corresponding to the target song are obtained in the above manner, the weight corresponding to each target note in the pitch line file may be represented by weight[j], for example.
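Rules a to d can be sketched as a single pass over the notes, producing the weight[j] array. This is a sketch under stated assumptions: the function name `note_weights` and the sample notes are hypothetical, and when a note is both the first note after an air port and the first of consecutive same notes, the sketch assumes the first weight applies, which the text does not specify.

```python
from typing import List, Tuple

Note = Tuple[int, int, int]  # (start_ms, end_ms, MIDI pitch)

X, Y, Z, W = 3.0, 2.0, 1.0, 0.0  # example weights, with X > Y > Z > W
T_BREATH_MS = 100                # example air port threshold


def note_weights(notes: List[Note], threshold_ms: int = T_BREATH_MS) -> List[float]:
    """Assign weight[j] to every note according to rules a to d."""
    weights = []
    for j, note in enumerate(notes):
        prev = notes[j - 1] if j > 0 else None
        nxt = notes[j + 1] if j + 1 < len(notes) else None
        # Rule a: starting note of the file, or first note after an air port.
        after_air_port = prev is None or note[0] - prev[1] > threshold_ms
        # Rule d: same pitch as the previous note and a gap below the threshold.
        continues_run = (prev is not None and prev[2] == note[2]
                         and note[0] - prev[1] < threshold_ms)
        # Rule b: opens a run of consecutive same notes.
        starts_run = (not continues_run and nxt is not None and nxt[2] == note[2]
                      and nxt[0] - note[1] < threshold_ms)
        if after_air_port:    # assumption: the first weight wins on overlap
            weights.append(X)
        elif continues_run:
            weights.append(W)
        elif starts_run:
            weights.append(Y)
        else:                 # rule c: every remaining note
            weights.append(Z)
    return weights


# Hypothetical notes: a starting note, a run of three same notes, an air port, two more notes.
sample = [(0, 400, 60), (450, 800, 62), (820, 1200, 62), (1220, 1600, 62),
          (2000, 2400, 64), (2450, 2800, 65)]
weight = note_weights(sample)
```

With W set to 0, the non-initial notes of a run drop out of both the numerator and the denominator of the final ratio, so an undetected onset there cannot lower the score.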
Fig. 4 is a flowchart of a method for determining a tempo score according to another embodiment of the present disclosure. Based on the above embodiments, the embodiments of the present disclosure further describe how to determine the tempo score of a song sung by a user. As shown in fig. 4, a method of an embodiment of the present disclosure may include:
S401, acquiring a dry sound signal corresponding to a target song sung by a user.
A detailed description of this step may be referred to the related description of S201 in the embodiment shown in fig. 2, and will not be repeated here.
In an embodiment of the present disclosure, step S202 in fig. 2 may further include the following step S402:
S402, determining a first starting point of each user singing note in the dry sound signal based on a preset starting point detection algorithm.
In this step, the preset starting point detection algorithm may be determined in advance based on existing onset detection algorithms. Accordingly, the first starting point of each user-sung note in the dry sound signal may be determined based on the preset starting point detection algorithm.
Further, the preset starting point detection algorithm includes a frequency spectrum-based starting point detection algorithm and a pyin-based starting point detection algorithm, and determining a first starting point of each user singing note in the dry sound signal may include: determining a third starting point of each user singing note in the dry sound signal based on a starting point detection algorithm of the frequency spectrum; determining a fourth starting point of the singing notes of each user in the dry sound signal based on a pyin starting point detection algorithm; and determining a first starting point of each user singing note in the dry sound signal according to the union of the third starting point and the fourth starting point.
Illustratively, the principle of the spectrum-based starting point detection algorithm is that when the articulation of a word or the pitch in the audio changes, an abrupt structural change occurs in the frequency spectrum. Fig. 5 is a schematic diagram of detecting note starting points with the spectrum-based starting point detection algorithm according to an embodiment of the present disclosure. As shown in fig. 5, for the lyric line "let me feel difficult" sung by a user, the volume waveform 501 is decomposed into the corresponding spectrogram 502, and the onset of each note is found by detecting the abrupt change points of spectral energy in the spectrogram 502. It should be noted that the spectrum-based starting point detection algorithm is a general-purpose onset feature extractor.
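The disclosure does not name a specific spectrum-based detector. One common way of locating abrupt changes in spectral energy is spectral flux with peak picking, sketched below as an assumed stand-in rather than the method actually used. The input is taken to be a precomputed magnitude spectrogram (one list of bin magnitudes per frame); the function names, threshold and toy spectrogram are hypothetical.

```python
from typing import List, Sequence


def spectral_flux(frames: Sequence[Sequence[float]]) -> List[float]:
    """Sum of positive magnitude increases between consecutive frames."""
    flux = [0.0]  # the first frame has no predecessor
    for prev, curr in zip(frames, frames[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, curr)))
    return flux


def pick_onset_frames(flux: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices where the flux is a local maximum above the threshold."""
    onsets = []
    for t in range(1, len(flux)):
        is_last = t == len(flux) - 1
        rises = flux[t] > flux[t - 1]
        not_exceeded = is_last or flux[t] >= flux[t + 1]
        if flux[t] > threshold and rises and not_exceeded:
            onsets.append(t)
    return onsets


# Toy spectrogram: silence, a note with energy in bins 0-1, then a note in bins 1-2.
toy = [[0, 0, 0], [0, 0, 0], [1, 2, 0], [1, 2, 0], [1, 2, 0], [0, 1, 3], [0, 1, 3]]
flux = spectral_flux(toy)
onset_frames = pick_onset_frames(flux, threshold=1.0)
```

Frame indices convert to seconds by multiplying by the hop duration of the spectrogram. Only energy increases are counted, so the decay of a note does not register as an onset.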
Illustratively, fig. 6 is a schematic diagram of a note start point detected by a start point detection algorithm based on pyin according to an embodiment of the present disclosure, and referring to fig. 6, the start point detection algorithm based on pyin includes the following four steps: the first step, detecting the pitch corresponding to each frame contained in the audio; second, smoothing the obtained pitch to obtain a pitch line curve 601 as shown in fig. 6; third, the notes are divided according to the pitch lines, and the corresponding straight lines 602 shown in fig. 6 are obtained; fourth, determining the initial time of the note as the detected onset. Note that, the starting point detection algorithm based on pyin is a generic note starting point feature extractor.
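Steps two to four can be illustrated with a simplified sketch. It is not pyin itself: the per-frame pitch track of step one is assumed to be already available (for example from a pitch tracker), expressed in MIDI semitones with `None` for frames without pitch. The median smoothing, the 0.5-semitone tolerance and the function names are assumptions chosen for illustration.

```python
from statistics import median
from typing import List, Optional


def smooth(pitches: List[Optional[float]], width: int = 3) -> List[Optional[float]]:
    """Median-smooth frames that have pitch; frames without pitch (None) stay None."""
    half = width // 2
    out: List[Optional[float]] = []
    for t, p in enumerate(pitches):
        if p is None:
            out.append(None)
            continue
        window = [q for q in pitches[max(0, t - half): t + half + 1] if q is not None]
        out.append(median(window))
    return out


def note_onset_frames(pitches: List[Optional[float]], tol: float = 0.5) -> List[int]:
    """Return the frame index at which each note of the smoothed pitch line starts."""
    onsets: List[int] = []
    current: Optional[float] = None  # pitch of the note currently being sung
    for t, p in enumerate(smooth(pitches)):
        if p is None:
            current = None           # silence or consonant: no note is active
        elif current is None or abs(p - current) > tol:
            onsets.append(t)         # pitch appears or jumps: a new note begins
            current = p
    return onsets


# Hypothetical pitch track: silence, a note near 60, a note near 62, a gap, a note at 64.
track = [None, None, 60.0, 60.1, 59.9, 60.0, 62.0, 62.1, 62.0, None, 64.0, 64.0]
onset_frames = note_onset_frames(track)
```

Because a new note is reported only when pitch appears or changes, repeated notes of the same pitch without a gap produce a single onset, and frames without pitch delay the reported start. Both effects are limitations of any detector that relies on pitch alone.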
Experiments show that the spectrum-based starting point detection algorithm and the pyin-based starting point detection algorithm each have advantages and disadvantages. The spectrum-based algorithm has the advantage that the sounding time of consonants is taken into account when detecting onsets; its disadvantage is that melisma (several pitches sung on one word) is often missed, that is, for a word sung over several pitches the algorithm is insensitive to the non-initial onsets. The pyin-based algorithm has the advantage of being sensitive to pitch, so the onsets within a melisma are handled well; its disadvantage is that the consonant part of an utterance has no pitch, so the detected onset is later than the actual onset, which causes errors. Illustratively, fig. 7 is a schematic diagram, provided by an embodiment of the present disclosure, showing that the pyin-based starting point detection algorithm cannot detect pitch for a consonant. As shown in fig. 7, the numerals 241, 242, 243, 244, 174, 177, 179 and 180 represent the detected pitch frequencies in Hz; for the second note shown in fig. 7, the onset detected by the pyin-based algorithm is onset 701, whereas the actual onset of the second note should be onset 702; onset 701 is later than onset 702, which causes an error. In addition, for consecutive notes of the same pitch, the pyin-based algorithm cannot detect the non-initial onsets.
Therefore, after the third starting point of each user-sung note in the dry sound signal is determined by the spectrum-based starting point detection algorithm and the fourth starting point of each user-sung note in the dry sound signal is determined by the pyin-based starting point detection algorithm, the first starting point of each user-sung note in the dry sound signal is determined as the union of the third starting points and the fourth starting points, which effectively improves the accuracy of onset detection. Illustratively, the first starting point of each user-sung note detected in the dry sound signal may be represented by onset[i].
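Taking the union of the two onset lists can be sketched as below. The function name `merge_onsets` is hypothetical, and the optional `min_gap_s` parameter, which collapses onsets reported by both detectors at almost the same time, is an assumption added for illustration; with its default of 0 the function is a plain sorted union.

```python
from typing import Iterable, List


def merge_onsets(spectral: Iterable[float], pitch_based: Iterable[float],
                 min_gap_s: float = 0.0) -> List[float]:
    """Return the sorted union of two onset lists, optionally collapsing near-duplicates."""
    merged: List[float] = []
    for t in sorted(set(spectral) | set(pitch_based)):
        if min_gap_s > 0 and merged and t - merged[-1] <= min_gap_s:
            continue  # keep the earlier onset of a near-duplicate pair
        merged.append(t)
    return merged


# Hypothetical third and fourth starting points, in seconds.
third = [1.0, 2.5]
fourth = [1.02, 3.0]
onset = merge_onsets(third, fourth, min_gap_s=0.05)
```

Keeping the earlier onset of a near-duplicate pair favors the spectrum-based estimate when the pitch-based one is delayed by a consonant.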
In the embodiment of the present disclosure, the step S203 in fig. 2 may further include three steps S403 to S405 as follows:
S403, determining whether a corresponding first starting point exists in the starting point detection interval.
In this step, after determining the first start point of each user's singing note in the dry signal based on a preset start point detection algorithm, it may be determined whether the start point detection interval has a corresponding first start point based on the start point detection interval corresponding to each target note in the pitch line file. Illustratively, referring to fig. 3, the first start point is, for example, 4.8s, the start point detection interval is, for example, a start point detection subinterval A1 corresponding to note 1, the start point detection subinterval A1 is 4.5s to 5.5s, and 4.8s is in the range of 4.5s to 5.5s, and thus, it can be determined that the start point detection subinterval A1 has the corresponding first start point of 4.8s.
If it is determined that the start point detection interval has the corresponding first start point, executing step S404; if it is determined that the start point detection section does not have the corresponding first start point, step S406 is performed.
S404, if the corresponding first starting point exists in the starting point detection interval, determining the target score of the target note according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval.
In this step, after it is determined that the starting point detection interval has a corresponding first starting point, the target score of the target note may be determined according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval. For example, referring to fig. 3, the first starting point is, for example, 4.8 s, and the starting point detection interval is, for example, the starting point detection subinterval A1 corresponding to note 1. After it is determined that the subinterval A1 contains the first starting point of 4.8 s, since the first preset score of note 1 corresponding to the subinterval A1 is 1, the target score of the target note corresponding to this first starting point is determined to be 1.
Further, the method for determining the tempo score provided by the embodiment of the present disclosure may further include: if the fact that a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note is determined, determining target starting point detection subintervals corresponding to the first starting points in the starting point detection interval respectively; and determining the highest score in the first preset scores of the target notes corresponding to the target starting point detection subintervals as the target score of the target notes.
For example, fig. 8 is a schematic diagram of determining the target score of a target note. As shown in fig. 8, the target note is, for example, note 2, and the corresponding starting point detection interval includes three different starting point detection subintervals: subinterval A (first preset score, for example, 1), subinterval B (first preset score, for example, 0.7) and subinterval C (first preset score, for example, 0.5). If the starting point detection interval corresponding to note 2 contains 3 first starting points, namely first starting point 1, first starting point 2 and first starting point 3, it may be determined that the target subinterval of first starting point 1 is subinterval A, that of first starting point 2 is subinterval B, and that of first starting point 3 is subinterval C. The score of subinterval A, which corresponds to first starting point 1, is the highest, namely 1, so the target score of note 2 is determined to be 1.
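The selection of the highest score among several first starting points can be sketched as follows. The function name `target_score` and the onset times are hypothetical (fig. 8 does not give numeric positions); the detection values and scores are the example values used above, and 0 is returned when no onset falls in any subinterval, in line with the second preset score described for S406.

```python
from typing import Iterable

# (preset detection value, first preset score) for subintervals A, B and C.
SUBINTERVALS = ((4, 1.0), (3, 0.7), (2, 0.5))
SECOND_PRESET_SCORE = 0.0


def target_score(note_start: float, note_length: float, onsets: Iterable[float]) -> float:
    """Return the highest first preset score reached by any onset for one note."""
    best = SECOND_PRESET_SCORE
    for onset in onsets:
        offset = abs(onset - note_start)
        for divisor, score in SUBINTERVALS:  # narrowest subinterval first
            if offset <= note_length / divisor:
                best = max(best, score)
                break                        # wider subintervals score lower
    return best


# Note 2 of fig. 3: starts at 7 s and lasts 5 s. Onset times are illustrative only.
score_three_onsets = target_score(7.0, 5.0, [7.5, 8.5, 9.2])  # onsets in A, B and C
score_two_onsets = target_score(7.0, 5.0, [8.5, 9.2])         # onsets in B and C only
score_no_onset = target_score(7.0, 5.0, [10.0])               # outside every subinterval
```

Since the subintervals are nested, checking the narrowest one first yields the best score an onset can earn, and the maximum over all onsets gives the target score of the note.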
S405, determining the rhythm score of the target song sung by the user according to the weight and the target score of the target note.
In this step, based on the pitch line file corresponding to the target song, after determining the target score of the target note, the tempo score of the target song by the user may be determined according to the weight of the target note and the target score.
Further, determining a tempo score for the user singing the target song according to the weights and the target scores of the target notes may include: obtaining the product of the weight of the target note corresponding to each starting point detection interval and the target score; and determining the rhythm score of the target song sung by the user according to the ratio of the sum of the products to the weight sum of each target note in the pitch line file.
Illustratively, the tempo score of the user singing the target song may be determined by the following formula:

$$\mathrm{rhythm\_score} = \frac{\sum_{j} \mathrm{weight}[j] \cdot \mathrm{score}[j]}{\sum_{j} \mathrm{weight}[j]}$$

where

$$\mathrm{score}[j] = \begin{cases} \mathrm{score}_A, & \text{if some } \mathrm{onset}[i] \in \mathrm{notes}[j].\mathrm{threshold}_A \\ \mathrm{score}_B, & \text{else if some } \mathrm{onset}[i] \in \mathrm{notes}[j].\mathrm{threshold}_B \\ \mathrm{score}_C, & \text{else if some } \mathrm{onset}[i] \in \mathrm{notes}[j].\mathrm{threshold}_C \\ 0, & \text{otherwise} \end{cases}$$

Here rhythm_score represents the tempo score of the user singing the target song; onset[i] represents the onset of a user-sung note detected in the user's dry sound signal, i being the index of the onset in the onset array; score[j] represents the score obtained for the target note (i.e., score_A, score_B or score_C in the above embodiments) when the onset of a note sung by the user falls in the corresponding starting point detection subinterval (i.e., subinterval A, subinterval B or subinterval C in the above embodiments); notes[j] represents the j-th target note in the pitch line file; weight[j] represents the weight of the j-th target note in the pitch line file; and notes[j].threshold_A, notes[j].threshold_B and notes[j].threshold_C represent the starting point detection subintervals A, B and C of the j-th target note in the pitch line file, respectively.
S406, if it is determined that the initial point detection interval does not have the corresponding first initial point, determining that the target score of the target note corresponding to the initial point detection interval is the second preset score.
The second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
For example, the second preset score may be 0, which is smaller than the first preset score of the target note corresponding to the starting point detection interval. In that case, if it is determined that the starting point detection interval has no corresponding first starting point, the target score of the target note corresponding to that interval is determined to be 0. Referring to score[j] in the above embodiment, when no corresponding first starting point exists in the starting point detection interval, that is, in all other cases, score[j] takes the value 0.
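The weighted ratio described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the `TargetNote` class, the `rhythm_score` function name, and the choice to return 0 when the total weight is zero are assumptions introduced here for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TargetNote:
    """Hypothetical container for one target note of the pitch line file."""
    weight: float            # importance of the note for the perceived rhythm
    score: Optional[float]   # first preset score if an onset matched, else None


def rhythm_score(notes: Sequence[TargetNote], second_preset_score: float = 0.0) -> float:
    """Ratio of sum(weight * target score) to sum(weight) over all target notes."""
    total_weight = sum(n.weight for n in notes)
    if total_weight == 0:
        # Assumption: an empty or zero-weight pitch line file scores 0.
        return 0.0
    weighted_sum = sum(
        # Notes without a matching first starting point get the second preset score.
        n.weight * (n.score if n.score is not None else second_preset_score)
        for n in notes
    )
    return weighted_sum / total_weight
```

With target scores on a 0 to 100 scale, the result is also on a 0 to 100 scale, because the denominator covers every target note in the file, including missed ones.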
According to the method for determining the rhythm score provided by this embodiment, the dry sound signal corresponding to the target song sung by the user is acquired, and the first starting point of each user-sung note in the dry sound signal is determined based on a preset starting point detection algorithm. It is then determined whether a corresponding first starting point exists in each starting point detection interval: if so, the target score of the target note corresponding to that interval is determined according to the first starting point and the first preset score of that target note; if not, the target score of the target note is determined to be the second preset score. The rhythm score of the target song sung by the user is then determined according to the weights and the target scores of the target notes. Because the embodiment of the invention determines the rhythm score from the dry sound signal corresponding to the song sung by the user, the weight of each note in the pitch line file, and the starting point detection interval corresponding to each note in the pitch line file, the rhythm score of the user's singing can be obtained accurately, which improves the user experience.
Based on the above embodiment, in one possible implementation manner, according to a first start point of each user singing note in the dry signal and a start point detection interval corresponding to each target note in the pitch line file, it may be determined whether the first start point has a corresponding start point detection interval; if the first starting point is determined to have the corresponding starting point detection interval, determining the score and the weight corresponding to the first starting point according to the first starting point, the first preset score of the target notes corresponding to the starting point detection interval and the weight of each target note in the pitch line file corresponding to the target song; obtaining the product of the score and the weight corresponding to each first starting point according to the score and the weight corresponding to each first starting point; and determining the rhythm score of the target song sung by the user according to the ratio of the summation of the products to the summation of the weights corresponding to the first starting points.
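The onset-driven variant in the preceding paragraph can be sketched as follows. The tuple layout, the function name `onset_driven_score`, and the rule that an onset is credited to the first interval containing it are assumptions made for illustration; the source does not specify how ties between intervals are resolved.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (interval_start, interval_end, preset_score, weight), times in seconds.
NoteInterval = Tuple[float, float, float, float]


def onset_driven_score(onsets: Sequence[float], notes: Sequence[NoteInterval]) -> float:
    """Score each detected onset, then take sum(score * weight) / sum(weight)."""
    matched: List[Tuple[float, float]] = []
    for t in onsets:
        for start, end, score, weight in notes:
            if start <= t <= end:
                matched.append((score, weight))
                break  # assumption: credit the onset to the first interval containing it
    total_weight = sum(w for _, w in matched)
    if total_weight == 0:
        return 0.0  # assumption: no matched onset yields a score of 0
    return sum(s * w for s, w in matched) / total_weight
```

In this variant the denominator sums only the weights associated with matched first starting points, so onsets that fall outside every interval are ignored rather than penalized.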
In summary, the technical scheme provided by the present disclosure has at least the following advantages:
(1) The method can replace a manual evaluation flow, and provide rhythm evaluation for singing in real time and objectively;
(2) Because pitch is not used as a reference, the method is suitable for singers of all levels and does not require pitch alignment or lyrics;
(3) Since the rhythm scores obtained by the formula are in a percentage system, the rhythm scores of different songs have the same evaluation scale, and the rhythm scores of different songs can be compared;
(4) The rhythm score can be given in real time by taking sentences as a unit, and evaluation is not required to be carried out after the complete music is finished.
Exemplary apparatus
Having described the method of the exemplary embodiment of the present disclosure, next, a device for determining the rhythm score of the exemplary embodiment of the present disclosure will be described with reference to fig. 9. The device of the exemplary embodiment of the present disclosure can implement each process in the foregoing embodiments of the method for determining the rhythm score and achieve the same functions and effects.
Fig. 9 is a schematic structural diagram of a device for determining a tempo score according to an embodiment of the present disclosure, and as shown in fig. 9, a device 900 for determining a tempo score according to an embodiment of the present disclosure includes: an acquisition module 901, a determination module 902 and a processing module 903. Wherein:
The acquiring module 901 is configured to acquire a dry sound signal corresponding to a target song sung by a user.
A determining module 902 is configured to determine a first starting point for each user's singing note in the dry acoustic signal.
The processing module 903 is configured to determine a tempo score for a user to sing a target song according to a first starting point, weights of each target note in a pitch line file corresponding to the target song, and a starting point detection interval corresponding to each target note in the pitch line file, where the weights are used to represent importance degrees of different target notes in the pitch line file to a tempo listening feel, the starting point detection interval includes a preset offset range centered on a second starting point of the target note, the starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used to determine the target score of the target note.
In one possible implementation, the processing module 903 may be specifically configured to: determining whether a corresponding first starting point exists in the starting point detection interval; if the corresponding first starting point exists in the starting point detection interval, determining the target score of the target note according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval; and determining the rhythm score of the target song sung by the user according to the weight and the target score of the target note.
In one possible implementation, the processing module 903 may be specifically configured to: obtaining the product of the weight of the target note corresponding to each starting point detection interval and the target score; and determining the rhythm score of the target song sung by the user according to the ratio of the sum of the products to the weight sum of each target note in the pitch line file.
In one possible implementation, the processing module 903 may also be configured to: if the starting point detection interval is determined to not have the corresponding first starting point, determining that the target score of the target note corresponding to the starting point detection interval is a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
In one possible implementation, the start point detection interval corresponding to the target note includes a plurality of different start point detection subintervals, and the different start point detection subintervals correspond to different first preset scores of the target note; the different starting point detection subintervals comprise different preset offset ranges centering on the second starting point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second starting point of the target note and different preset detection values.
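As a rough sketch of how such subintervals might be built, the following Python function derives one symmetric window per preset detection value. The rule used here (half-width equals detection value times note length) is purely an assumption for illustration, since this passage only states that the ranges depend on the note length, the second starting point, and the preset detection values; the function and parameter names are likewise hypothetical.

```python
from typing import Dict, Tuple


def onset_subintervals(
    note_start: float,
    note_length: float,
    detection_values: Dict[str, float],
) -> Dict[str, Tuple[float, float]]:
    """Return one window per label, each centered on the note's second starting point."""
    windows: Dict[str, Tuple[float, float]] = {}
    for label, value in detection_values.items():
        # Hypothetical rule: the allowed offset scales with the note length.
        half_width = value * note_length
        windows[label] = (note_start - half_width, note_start + half_width)
    return windows
```

For instance, a note starting at 2.0 s with a length of 0.5 s and detection values of 0.1 and 0.2 would get windows of roughly 1.95 to 2.05 s and 1.9 to 2.1 s under this assumed rule.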
In one possible implementation, the processing module 903 may also be configured to: if the fact that a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note is determined, determining target starting point detection subintervals corresponding to the first starting points in the starting point detection interval respectively;
and determining the highest score in the first preset scores of the target notes corresponding to the target starting point detection subintervals as the target score of the target notes.
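The highest-score rule for multiple first starting points can be sketched as below. The tuple layout and the function name `target_score` are assumptions for the example, and the sketch assumes nested windows in which a narrower window carries a higher preset score.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: (window_start, window_end, first_preset_score), times in seconds.
Subinterval = Tuple[float, float, float]


def target_score(
    onsets: Sequence[float],
    subintervals: Sequence[Subinterval],
    second_preset_score: float = 0.0,
) -> float:
    """Return the highest preset score among all subintervals hit by any onset."""
    best: Optional[float] = None
    for t in onsets:
        for start, end, score in subintervals:
            if start <= t <= end and (best is None or score > best):
                best = score
    # No onset inside any subinterval: fall back to the second preset score.
    return second_preset_score if best is None else best
```

Taking the maximum means that an extra early or late onset near a note cannot lower the score earned by a well-timed onset for the same note.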
In one possible implementation, the determining module 902 may be specifically configured to: determine a first starting point of each user singing note in the dry sound signal based on a preset starting point detection algorithm.
In one possible implementation, the preset starting point detection algorithm includes a spectrum-based starting point detection algorithm and a pyin-based starting point detection algorithm, and the determining module 902 may specifically be configured to: determining a third starting point of each user singing note in the dry sound signal based on a starting point detection algorithm of the frequency spectrum; determining a fourth starting point of the singing notes of each user in the dry sound signal based on a pyin starting point detection algorithm; and determining a first starting point of each user singing note in the dry sound signal according to the union of the third starting point and the fourth starting point.
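The union step can be sketched in Python as follows, taking the two onset lists as already computed by the spectrum-based and pyin-based detectors. The function name `merge_onsets` and the optional `tolerance` parameter are assumptions: the source only specifies a union, and the tolerance is a hypothetical extension for dropping near-duplicate onsets reported by both detectors.

```python
from typing import List, Sequence


def merge_onsets(
    spectral: Sequence[float],
    pyin: Sequence[float],
    tolerance: float = 0.0,
) -> List[float]:
    """Union of the third and fourth starting points, sorted by time (seconds)."""
    merged: List[float] = []
    for t in sorted(set(spectral) | set(pyin)):
        # With tolerance 0 this is a plain set union. A positive tolerance
        # (an assumption, not from the source) skips onsets that lie within
        # `tolerance` seconds of the previously kept onset.
        if tolerance > 0 and merged and t - merged[-1] <= tolerance:
            continue
        merged.append(t)
    return merged
```

Using the union rather than the intersection favors recall: an onset missed by one detector is still available for matching against the starting point detection intervals.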
In one possible implementation, the processing module 903 may also be configured to: before determining the rhythm score of the target song sung by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, obtain the weight of each target note in the pitch line file corresponding to the target song in the following manner: determining the weight of the first target note after an air port as a first weight, and determining the weight of the initial note in the pitch line file as the first weight; determining the weight of the first target note in continuous same notes as a second weight, wherein the second weight is smaller than the first weight; determining the weight of the target notes other than the first target note after an air port and other than the continuous same notes as a third weight, wherein the third weight is smaller than the second weight; determining the weight of the target notes in the continuous same notes other than the first target note as a fourth weight, wherein the fourth weight is smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
In one possible implementation, the processing module 903 may also be configured to: if the time interval between two adjacent target notes is greater than the air port threshold value, determining that an air port exists between the two adjacent target notes; if the pitches of at least two target notes are the same and the time interval between two adjacent target notes is smaller than the threshold value of the air port, determining that the at least two target notes are continuous same notes.
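The air port test and the four-level weighting can be sketched together as below. Several details are assumptions made for illustration: the `Note` class, measuring the time interval as the next note's start minus the previous note's end, the concrete weight values (chosen only to satisfy first > second > third > fourth), and giving the first weight priority when a note is both the first after an air port and the first of a run of same notes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Note:
    """Hypothetical target note from the pitch line file."""
    start: float  # second starting point, in seconds
    end: float    # end time, in seconds
    pitch: int    # e.g. a MIDI note number


def note_weights(
    notes: Sequence[Note],
    air_port_threshold: float,
    w1: float = 1.0, w2: float = 0.8, w3: float = 0.6, w4: float = 0.4,
) -> List[float]:
    """Assign one of four weights to each note, with w1 > w2 > w3 > w4."""
    weights: List[float] = []
    for k, note in enumerate(notes):
        prev = notes[k - 1] if k > 0 else None
        nxt = notes[k + 1] if k + 1 < len(notes) else None
        # Air port: the gap to the previous note exceeds the threshold.
        after_air_port = prev is None or note.start - prev.end > air_port_threshold
        same_as_prev = (prev is not None and prev.pitch == note.pitch
                        and note.start - prev.end < air_port_threshold)
        same_as_next = (nxt is not None and nxt.pitch == note.pitch
                        and nxt.start - note.end < air_port_threshold)
        if after_air_port:
            weights.append(w1)  # initial note, or first note after an air port
        elif same_as_prev:
            weights.append(w4)  # later note inside a run of continuous same notes
        elif same_as_next:
            weights.append(w2)  # first note of a run of continuous same notes
        else:
            weights.append(w3)  # any other note
    return weights
```

The ordering of the weights reflects the idea that entries after a breath are the most audible rhythmic events, while repeated same-pitch notes are the hardest to separate by onset and so count the least.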
The device of the embodiment of the disclosure may be used to implement the scheme of the method for determining the tempo score in any of the above method embodiments, and its implementation principle and technical effects are similar, and are not repeated here.
Exemplary Medium
Having described the method of the exemplary embodiments of the present disclosure, next, a storage medium of the exemplary embodiments of the present disclosure will be described with reference to fig. 10.
Fig. 10 is a schematic diagram of a program product provided by an embodiment of the present disclosure, and with reference to fig. 10, a program product 1000 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the context of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary computing device
Having described the methods, media, and apparatus of exemplary embodiments of the present disclosure, a computing device of exemplary embodiments of the present disclosure is next described with reference to fig. 11.
The computing device 1100 shown in fig. 11 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present disclosure.
Fig. 11 is a schematic structural diagram of a computing device according to an embodiment of the disclosure, and as shown in fig. 11, a computing device 1100 is represented in the form of a general-purpose computing device. Components of computing device 1100 may include, but are not limited to: at least one processing unit 1101, at least one storage unit 1102, and a bus 1103 that connects the various system components (including the processing unit 1101 and the storage unit 1102).
The bus 1103 includes a data bus, a control bus, and an address bus.
The storage unit 1102 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 11021 and/or cache memory 11022, and may further include readable media in the form of nonvolatile memory, such as Read Only Memory (ROM) 11023.
The storage unit 1102 may also include a program/utility 11025 having a set (at least one) of program modules 11024, such program modules 11024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Computing device 1100 can also communicate with one or more external devices 1104 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 1105. Moreover, computing device 1100 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter 1106. As shown in fig. 11, network adapter 1106 communicates with other modules of computing device 1100 over bus 1103. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 1100, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the device for determining the rhythm score are mentioned in the above detailed description, such a division is only exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that this disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in these aspects cannot be combined to advantage; this division is made for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (21)

1. A method of determining a tempo score, comprising:
acquiring a dry sound signal corresponding to a target song sung by a user;
determining a first starting point of each user's singing note in the dry sound signal;
determining a rhythm score of the target song by the user according to the first starting point, the weight of each target note in a pitch line file corresponding to the target song and a starting point detection interval corresponding to each target note in the pitch line file, wherein the weight is used for representing the importance degree of different target notes in the pitch line file on rhythm hearing, the starting point detection interval comprises a preset offset range taking a second starting point of the target note as a center, the starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used for determining the target score of the target note;
Determining the rhythm score of the target song by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song and the starting point detection interval corresponding to each target note in the pitch line file, including:
Determining whether the corresponding first starting point exists in the starting point detection interval;
If it is determined that the corresponding first starting point exists in the starting point detection interval, determining a target score of the target note according to the first starting point and a first preset score of the target note corresponding to the starting point detection interval;
and determining the rhythm score of the target song sung by the user according to the weight and the target score of the target note.
2. The method for determining a tempo score according to claim 1, wherein said determining a tempo score for said user singing a target song based on weights and target scores of said target notes includes:
obtaining the product of the weight and the target score of the target note corresponding to each starting point detection interval;
and determining the rhythm score of the target song sung by the user according to the ratio of the sum of the products to the weight sum of each target note in the pitch line file.
3. The method of determining a tempo score according to claim 1 and further comprising:
If the fact that the corresponding first starting point does not exist in the starting point detection interval is determined, determining that the target score of the target note corresponding to the starting point detection interval is a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
4. The method for determining a tempo score according to claim 1, wherein the starting point detection interval corresponding to the target note includes a plurality of different starting point detection subintervals corresponding to different first preset scores of the target note;
The different starting point detection subintervals comprise different preset offset ranges taking the second starting point of the target note as a center, and the different preset offset ranges are obtained according to the length of the target note, the second starting point of the target note and different preset detection values.
5. The method of determining a tempo score of claim 4 and further comprising:
If the fact that a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target notes is determined, determining target starting point detection subintervals corresponding to the first starting points in the starting point detection interval respectively;
And determining the highest score in the first preset scores of the target notes corresponding to the target starting point detection subintervals as the target score of the target notes.
6. The method of determining a tempo score according to claim 1 wherein said determining a first starting point for each user's singing notes in said dry sound signal includes:
And determining a first starting point of singing notes of each user in the dry sound signal based on a preset starting point detection algorithm.
7. The method of claim 6, wherein the preset starting point detection algorithm includes a spectrum-based starting point detection algorithm and a pyin-based starting point detection algorithm, and wherein determining the first starting point of each user singing note in the dry sound signal includes:
Determining a third starting point of each user singing note in the dry sound signal based on a frequency spectrum starting point detection algorithm;
Determining a fourth starting point of singing notes of each user in the dry sound signal based on a pyin starting point detection algorithm;
And determining a first starting point of singing notes of each user in the dry sound signal according to the union set of the third starting point and the fourth starting point.
8. The method according to any one of claims 1 to 7, wherein before determining the tempo score for the target song by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, further comprises:
The weight of each target note in the pitch line file corresponding to the target song is obtained by the following steps:
Determining the weight of a first target note after an air port as a first weight, and determining the weight of a starting note in the pitch line file as the first weight;
determining the weight of a first target note in the continuous same notes as a second weight, wherein the second weight is smaller than the first weight;
Determining that the weight of the target notes other than the first target note after the air port and other than the continuous same notes is a third weight, wherein the third weight is smaller than the second weight;
Determining that the weight of the target notes other than the first target note in the continuous same note is a fourth weight, wherein the fourth weight is smaller than the third weight;
And obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
9. The method of determining a tempo score of claim 8 and further comprising:
If the time interval between two adjacent target notes is greater than the air port threshold value, determining that an air port exists between the two adjacent target notes;
If the pitches of at least two target notes are the same and the time interval between two adjacent target notes is smaller than the threshold value of the air port, determining that the at least two target notes are continuous same notes.
10. A cadence score determination device, comprising:
the acquisition module is used for acquiring a dry sound signal corresponding to a target song sung by a user;
a determining module, configured to determine a first starting point of each user singing note in the dry sound signal;
the processing module is used for determining a rhythm score of the target song by the user according to the first starting point, the weight of each target note in a pitch line file corresponding to the target song and a starting point detection interval corresponding to each target note in the pitch line file, wherein the weight is used for representing the importance degree of different target notes in the pitch line file to rhythm hearing, the starting point detection interval comprises a preset offset range taking a second starting point of the target note as a center, the starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used for determining the target score of the target note;
the processing module is specifically configured to:
Determining whether the corresponding first starting point exists in the starting point detection interval;
If it is determined that the corresponding first starting point exists in the starting point detection interval, determining a target score of the target note according to the first starting point and a first preset score of the target note corresponding to the starting point detection interval;
and determining the rhythm score of the target song sung by the user according to the weight and the target score of the target note.
11. The cadence score determination device of claim 10, wherein the processing module is configured to:
obtaining the product of the weight and the target score of the target note corresponding to each starting point detection interval;
and determining the rhythm score of the target song sung by the user according to the ratio of the sum of the products to the weight sum of each target note in the pitch line file.
12. The cadence score determination device of claim 10, wherein the processing module is further to:
If the fact that the corresponding first starting point does not exist in the starting point detection interval is determined, determining that the target score of the target note corresponding to the starting point detection interval is a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
13. The cadence score determination device of claim 10, wherein the starting point detection interval corresponding to the target note includes a plurality of different starting point detection subintervals corresponding to different first preset scores of the target note;
The different starting point detection subintervals comprise different preset offset ranges taking the second starting point of the target note as a center, and the different preset offset ranges are obtained according to the length of the target note, the second starting point of the target note and different preset detection values.
14. The cadence score determination device of claim 13, wherein the processing module is further configured to:
If the fact that a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target notes is determined, determining target starting point detection subintervals corresponding to the first starting points in the starting point detection interval respectively;
And determining the highest score in the first preset scores of the target notes corresponding to the target starting point detection subintervals as the target score of the target notes.
15. The cadence score determination device of claim 10, wherein the determination module is configured to:
And determining a first starting point of singing notes of each user in the dry sound signal based on a preset starting point detection algorithm.
16. The cadence score determination device of claim 15, wherein the preset starting point detection algorithm includes a spectrum-based starting point detection algorithm and a pyin-based starting point detection algorithm, and the determination module is configured to:
Determining a third starting point of each user singing note in the dry sound signal based on a frequency spectrum starting point detection algorithm;
Determining a fourth starting point of singing notes of each user in the dry sound signal based on a pyin starting point detection algorithm;
And determining a first starting point of singing notes of each user in the dry sound signal according to the union set of the third starting point and the fourth starting point.
17. The apparatus according to any one of claims 10 to 16, wherein the processing module, before determining the tempo score of the target song by the user based on the first start point, the weight of each target note in the pitch line file corresponding to the target song, and the start point detection interval corresponding to each target note in the pitch line file, is further configured to:
The weight of each target note in the pitch line file corresponding to the target song is obtained by the following steps:
Determining the weight of a first target note after an air port as a first weight, and determining the weight of a starting note in the pitch line file as the first weight;
determining the weight of a first target note in the continuous same notes as a second weight, wherein the second weight is smaller than the first weight;
Determining that the weight of the target notes other than the first target note after the air port and other than the continuous same notes is a third weight, wherein the third weight is smaller than the second weight;
Determining that the weight of the target notes other than the first target note in the continuous same note is a fourth weight, wherein the fourth weight is smaller than the third weight;
And obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
18. The apparatus according to claim 17, wherein the processing module is further configured to:
if the time interval between two adjacent target notes is greater than an air port threshold, determine that an air port exists between the two adjacent target notes;
if the pitches of at least two target notes are the same and the time interval between each two adjacent target notes among them is smaller than the air port threshold, determine that the at least two target notes are continuous same notes.
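The weighting rules of claims 17 and 18 can be sketched as follows. This is a hedged illustration only. The `Note` type, the numeric weights, the 0.3 second threshold, and the definition of the time interval as the gap between one note's end and the next note's start are all assumptions; the claims only require that the first weight exceed the second, the second the third, and the third the fourth. Giving a note that both follows an air port and opens a run of same notes the first weight is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Note:
    start: float  # onset time in seconds
    end: float    # offset time in seconds
    pitch: int    # e.g. a MIDI note number


# Hypothetical values; the claims only require W1 > W2 > W3 > W4.
W1, W2, W3, W4 = 1.0, 0.8, 0.6, 0.4


def assign_weights(notes, air_port_threshold=0.3):
    """Assign a weight to each note of a pitch line, in order."""
    weights = []
    for i, note in enumerate(notes):
        prev = notes[i - 1] if i > 0 else None
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        gap_before = note.start - prev.end if prev is not None else None
        gap_after = nxt.start - note.end if nxt is not None else None

        # Note repeats the previous pitch with no air port in between.
        continues_run = (prev is not None and prev.pitch == note.pitch
                         and gap_before < air_port_threshold)
        # Note opens a run: the next note repeats its pitch without an air port.
        starts_run = (not continues_run and nxt is not None
                      and nxt.pitch == note.pitch
                      and gap_after < air_port_threshold)

        if prev is None or gap_before > air_port_threshold:
            weights.append(W1)  # starting note, or first note after an air port
        elif continues_run:
            weights.append(W4)  # non-first note in continuous same notes
        elif starts_run:
            weights.append(W2)  # first note in continuous same notes
        else:
            weights.append(W3)  # every other note
    return weights


notes = [
    Note(0.0, 0.5, 60),  # starting note
    Note(0.5, 1.0, 62),  # ordinary note
    Note(1.0, 1.5, 64),  # opens a run of same notes
    Note(1.6, 2.0, 64),  # continues the run
    Note(3.0, 3.5, 64),  # first note after an air port
]
print(assign_weights(notes))  # [1.0, 0.6, 0.8, 0.4, 1.0]
```

The last note has the same pitch as the one before it, but the one second gap exceeds the threshold, so it counts as the first note after an air port rather than as part of the run.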
19. A computing device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the method of determining a rhythm score according to any one of claims 1 to 9.
20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions which, when executed by a processor, implement the method of determining a rhythm score according to any one of claims 1 to 9.
21. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of determining a rhythm score according to any one of claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111266761.8A CN113823270B (en) 2021-10-28 2021-10-28 Determination method, medium, device and computing equipment of rhythm score

Publications (2)

Publication Number Publication Date
CN113823270A CN113823270A (en) 2021-12-21
CN113823270B CN113823270B (en) 2024-05-03

Family

ID=78917573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111266761.8A Active CN113823270B (en) 2021-10-28 2021-10-28 Determination method, medium, device and computing equipment of rhythm score

Country Status (1)

Country Link
CN (1) CN113823270B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429949B (en) * 2020-04-16 2023-10-13 广州繁星互娱信息科技有限公司 Pitch line generation method, device, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004184506A (en) * 2002-11-29 2004-07-02 Brother Ind Ltd Karaoke machine and program
JP2005107329A (en) * 2003-09-30 2005-04-21 Yamaha Corp Karaoke machine
WO2010115298A1 (en) * 2009-04-07 2010-10-14 Lin Wen Hsin Automatic scoring method for karaoke singing accompaniment
CN107767850A (en) * 2016-08-23 2018-03-06 冯山泉 A kind of singing marking method and system
CN108008930A (en) * 2017-11-30 2018-05-08 广州酷狗计算机科技有限公司 The method and apparatus for determining K song score values
CN109300485A (en) * 2018-11-19 2019-02-01 北京达佳互联信息技术有限公司 Methods of marking, device, electronic equipment and the computer storage medium of audio signal
KR102107588B1 (en) * 2018-10-31 2020-05-07 미디어스코프 주식회사 Method for evaluating about singing and apparatus for executing the method
CN112309351A (en) * 2019-07-31 2021-02-02 武汉Tcl集团工业研究院有限公司 Song generation method and device, intelligent terminal and storage medium
CN113096689A (en) * 2021-04-02 2021-07-09 腾讯音乐娱乐科技(深圳)有限公司 Song singing evaluation method, equipment and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8575465B2 (en) * 2009-06-02 2013-11-05 Indian Institute Of Technology, Bombay System and method for scoring a singing voice

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
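Application of an improved melody matching algorithm in a MIDI performance system; Lan Fan et al.; Computer and Modernization; 2009-12-31 (No. 06); pp. 151-157 *
A rhythm feature matching model for motion and music; Fan Rukun et al.; Journal of Computer-Aided Design & Computer Graphics; 2010-06-30; Vol. 22 (No. 06); pp. 990-996 *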

Also Published As

Publication number Publication date
CN113823270A (en) 2021-12-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant