CN113823270A - Rhythm score determination method, medium, device and computing equipment - Google Patents
- Publication number
- CN113823270A (application number CN202111266761.8A)
- Authority
- CN
- China
- Prior art keywords
- target
- starting point
- note
- determining
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
Embodiments of the present disclosure provide a rhythm score determination method, medium, apparatus, and computing device. The method includes: acquiring a dry sound signal corresponding to a target song sung by a user; determining a first starting point of each note sung by the user in the dry sound signal; and determining a rhythm score for the user's singing of the target song according to the first starting points, the weight of each target note in a pitch line file corresponding to the target song, and a starting point detection interval corresponding to each target note in the pitch line file. The method can accurately obtain the rhythm score of the song sung by the user and improves the user experience.
Description
Technical Field
The embodiments of the present disclosure relate to the technical field of voice signal processing, and more particularly to a rhythm score determination method, medium, apparatus, and computing device.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Existing karaoke systems can score a user's singing, which makes the entertainment more interactive. Rhythm scoring is a common scoring method.
In the related art, rhythm evaluation generally uses the pitch contour as its feature: a Dynamic Time Warping (DTW) operation is performed on the pitch line sung by the user and a template pitch line to obtain a corresponding DTW curve, and the Root Mean Square (RMS) difference between the DTW curve and a straight line fitted to the DTW curve is then calculated to obtain the rhythm evaluation. The rhythm score obtained in this manner is not accurate enough.
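For orientation only, the related-art DTW approach described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular system: the function names `dtw_path` and `rms_from_fitted_line`, the absolute pitch difference used as the local cost, and the ordinary least-squares line fit are all illustrative choices.

```python
import math

def dtw_path(user_pitch, template_pitch):
    """Return the DTW alignment path as a list of (i, j) index pairs."""
    n, m = len(user_pitch), len(template_pitch)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(user_pitch[i - 1] - template_pitch[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def rms_from_fitted_line(path):
    """RMS distance between the DTW path and its least-squares straight line."""
    n = len(path)
    mean_x = sum(x for x, _ in path) / n
    mean_y = sum(y for _, y in path) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in path)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in path)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in path) / n)
```

A perfectly aligned performance gives a straight diagonal path and an RMS of zero; timing deviations bend the path and raise the RMS. Because the path depends on the pitch values themselves, off-key singing also distorts the result, which is one of the weaknesses the disclosure points out.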
Disclosure of Invention
The present disclosure provides a rhythm score determination method, medium, apparatus, and computing device to accurately determine a rhythm score.
In a first aspect, an embodiment of the present disclosure provides a rhythm score determination method, including:
acquiring a dry sound signal corresponding to a target song sung by a user;
determining a first starting point of each note sung by the user in the dry sound signal;
determining a rhythm score for the user's singing of the target song according to the first starting points, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, wherein the weight indicates how important each target note in the pitch line file is to the perceived rhythm, the starting point detection interval comprises a preset offset range centered on the second starting point of the target note, the starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used to determine the target score of the target note.
In a possible implementation, determining the rhythm score for the user's singing of the target song according to the first starting points, the weights of the target notes in the pitch line file corresponding to the target song, and the starting point detection intervals corresponding to the target notes in the pitch line file includes: determining whether a corresponding first starting point exists in the starting point detection interval; if so, determining the target score of the target note according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval; and determining the rhythm score for the user's singing of the target song according to the weight of the target note and the target score.
In one possible implementation, determining the rhythm score for the user's singing of the target song according to the weight of the target note and the target score includes: acquiring, for each starting point detection interval, the product of the weight of the corresponding target note and its target score; and determining the rhythm score for the user's singing of the target song according to the ratio of the sum of the products to the sum of the weights of the target notes in the pitch line file.
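The ratio described above is a weighted average of the per-note target scores. A minimal Python sketch follows; the function name `rhythm_score` and the numeric weights and scores in the usage note are hypothetical and chosen only for illustration.

```python
def rhythm_score(weights, target_scores):
    """Weighted average: sum(w_k * s_k) / sum(w_k) over all target notes."""
    if len(weights) != len(target_scores):
        raise ValueError("each target note needs exactly one weight and one score")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # assumption: an empty or zero-weight pitch line file scores 0
    weighted_sum = sum(w * s for w, s in zip(weights, target_scores))
    return weighted_sum / total_weight
```

For example, with weights 4, 3, 2, 1 and target scores 100, 50, 0, 100, the result is (400 + 150 + 0 + 100) / 10 = 65. Normalizing by the total weight keeps scores on the same scale regardless of how many notes a song contains.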
In a possible embodiment, the rhythm score determination method further comprises: if it is determined that no corresponding first starting point exists in the starting point detection interval, determining the target score of the target note corresponding to the starting point detection interval as a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
In a possible implementation manner, the starting point detection interval corresponding to the target note comprises a plurality of different starting point detection sub-intervals, and the different starting point detection sub-intervals correspond to different first preset scores of the target note; the different start point detection subintervals include different preset offset ranges centered at a second start point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second start point of the target note, and different preset detection values.
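As an illustration of how such sub-intervals might be constructed, the following Python sketch assumes that the half-width of each sub-interval equals a preset detection value multiplied by the note length. The text above names the inputs (note length, second starting point, preset detection values) but not the exact formula, so this relation, the function name `build_subintervals`, and the example values are assumptions.

```python
def build_subintervals(onset, length, detection_levels):
    """Return (low, high, score) tuples centered on the template note onset.

    onset: second starting point of the target note, in seconds.
    length: duration of the target note, in seconds.
    detection_levels: iterable of (detection_value, first_preset_score) pairs.
    Assumption: half-width of a sub-interval = detection_value * length.
    """
    subintervals = []
    for detection_value, score in detection_levels:
        half_width = detection_value * length
        subintervals.append((onset - half_width, onset + half_width, score))
    return subintervals
```

With a note starting at 10.0 s and lasting 2.0 s, detection values 0.25 and 0.5 would give a tight interval of 9.5 s to 10.5 s and a wider one of 9.0 s to 11.0 s. Scaling by note length means long notes tolerate a larger absolute deviation than short ones.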
In a possible embodiment, the rhythm score determination method further comprises: if a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note, determining the target starting point detection sub-interval corresponding to each of the first starting points in the starting point detection interval; and determining the highest of the first preset scores of the target note corresponding to the target starting point detection sub-intervals as the target score of the target note.
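The selection rule above, together with the fallback to the second preset score when no onset falls in the interval, can be sketched as follows. This is a hedged illustration: the function name `target_score`, the tuple layout of the sub-intervals, and the default miss score of 0 are assumptions, and an onset lying in several nested sub-intervals is credited with the best score among them.

```python
def target_score(user_onsets, subintervals, second_preset_score=0.0):
    """Pick the target score of one target note.

    user_onsets: first starting points detected in the dry sound signal (seconds).
    subintervals: (low, high, first_preset_score) tuples for this target note.
    Returns the highest first preset score reached by any onset, or the
    second preset score when no onset lies in any sub-interval.
    """
    best = None
    for t in user_onsets:
        reached = [score for low, high, score in subintervals if low <= t <= high]
        if reached:
            candidate = max(reached)
            best = candidate if best is None else max(best, candidate)
    return second_preset_score if best is None else best
```

Taking the maximum means that a spurious extra onset near the edge of the interval does not lower the score of a note that was also hit accurately.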
In one possible embodiment, determining a first starting point of each note sung by the user in the dry sound signal comprises: determining the first starting point of each note sung by the user in the dry sound signal based on a preset starting point detection algorithm.
In one possible embodiment, the preset starting point (onset) detection algorithm includes a spectrum-based onset detection algorithm and a pYIN-based onset detection algorithm, and determining the first starting point of each note sung by the user in the dry sound signal includes: determining a third starting point of each note sung by the user in the dry sound signal based on the spectrum-based onset detection algorithm; determining a fourth starting point of each note sung by the user in the dry sound signal based on the pYIN-based onset detection algorithm; and determining the first starting point of each note sung by the user in the dry sound signal according to the union of the third starting points and the fourth starting points.
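The combination step can be illustrated with a short Python sketch that takes the two onset lists as already computed; the detectors themselves are out of scope here. The function name `merge_onsets` and the optional `tolerance` parameter, which collapses near-duplicate detections of the same note, are assumptions and are not stated in the text. With the default tolerance of 0 the function is a plain set union.

```python
def merge_onsets(spectral_onsets, pyin_onsets, tolerance=0.0):
    """Union of two onset lists (seconds), sorted in ascending order.

    An onset is dropped when it lies within `tolerance` seconds of the
    previously kept onset; tolerance=0.0 removes only exact duplicates.
    """
    merged = []
    for t in sorted(list(spectral_onsets) + list(pyin_onsets)):
        if not merged or t - merged[-1] > tolerance:
            merged.append(t)
    return merged
```

Taking the union lets each detector cover the other's blind spots, at the cost of possibly more candidate onsets per note.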
In a possible implementation, before determining the rhythm score for the user's singing of the target song according to the first starting points, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, the rhythm score determination method further includes acquiring the weight of each target note in the pitch line file corresponding to the target song as follows: determining the weight of the first target note after a breath gap (air port) as a first weight, and determining the weight of the initial note in the pitch line file as the first weight; determining the weight of the first target note among consecutive same notes as a second weight, the second weight being smaller than the first weight; determining the weight of each target note other than the first target note after a breath gap and the consecutive same notes as a third weight, the third weight being smaller than the second weight; determining the weight of each target note other than the first among consecutive same notes as a fourth weight, the fourth weight being smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first, second, third, and fourth weights.
In a possible embodiment, the rhythm score determination method further comprises: if the time interval between two adjacent target notes is larger than a breath gap (air port) threshold, determining that there is a breath gap between the two adjacent target notes; and if at least two target notes have the same pitch and the time interval between each two adjacent ones is smaller than the breath gap threshold, determining that the at least two target notes are consecutive same notes.
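The breath gap test and the four-level weighting can be sketched together in Python. Several details here are assumptions rather than statements of the disclosure: the function name `note_weights`, the representation of a note as a `(start, end, pitch)` tuple, measuring the interval from the end of one note to the start of the next, the example weights 4, 3, 2, 1, and the precedence rule that a note following a breath gap receives the first weight even if it also begins a run of same notes.

```python
def note_weights(notes, gap_threshold, weights=(4.0, 3.0, 2.0, 1.0)):
    """Assign one of four weights to every target note.

    notes: list of (start, end, pitch) tuples in time order.
    gap_threshold: breath gap threshold in seconds.
    weights: (first, second, third, fourth), strictly decreasing.
    """
    w1, w2, w3, w4 = weights
    result = []
    for i, (start, end, pitch) in enumerate(notes):
        prev = notes[i - 1] if i > 0 else None
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        after_gap = prev is None or start - prev[1] > gap_threshold
        same_as_prev = (prev is not None and prev[2] == pitch
                        and start - prev[1] < gap_threshold)
        same_as_next = (nxt is not None and nxt[2] == pitch
                        and nxt[0] - end < gap_threshold)
        if after_gap:            # initial note or first note after a breath gap
            result.append(w1)
        elif same_as_prev:       # later note inside a run of same notes
            result.append(w4)
        elif same_as_next:       # first note of a run of same notes
            result.append(w2)
        else:                    # every other note
            result.append(w3)
    return result
```

For a phrase of one note, three repeated notes, a one-second pause, and two further notes, with a threshold of 0.5 s, this yields the weights 4, 3, 1, 1, 4, 2.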
In a second aspect, an embodiment of the present disclosure provides a rhythm score determining apparatus, including:
the acquisition module is used for acquiring a dry sound signal corresponding to a target song sung by a user;
the determining module is used for determining a first starting point of each note sung by the user in the dry sound signal;
the processing module is used for determining a rhythm score for the user's singing of the target song according to the first starting points, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, wherein the weight indicates how important each target note in the pitch line file is to the perceived rhythm, the starting point detection interval comprises a preset offset range centered on the second starting point of the target note, the starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used to determine the target score of the target note.
In a possible implementation, the processing module is specifically configured to: determining whether a corresponding first starting point exists in the starting point detection interval; if the starting point detection interval is determined to have a corresponding first starting point, determining a target score of a target note according to the first starting point and a first preset score of the target note corresponding to the starting point detection interval; and determining the rhythm score of the target song sung by the user according to the weight of the target note and the target score.
In a possible implementation, the processing module is specifically configured to: acquiring the product of the weight of the target note corresponding to each starting point detection interval and the target score; and determining the rhythm score of the target song sung by the user according to the ratio of the summation of the products to the summation of the weights of the target notes in the pitch line file.
In one possible implementation, the processing module is further configured to: and if the starting point detection interval is determined to have no corresponding first starting point, determining the target score of the target note corresponding to the starting point detection interval as a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
In a possible implementation manner, the starting point detection interval corresponding to the target note comprises a plurality of different starting point detection sub-intervals, and the different starting point detection sub-intervals correspond to different first preset scores of the target note; the different start point detection subintervals include different preset offset ranges centered at a second start point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second start point of the target note, and different preset detection values.
In one possible implementation, the processing module is further configured to: if a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note, determine the target starting point detection sub-interval corresponding to each of the first starting points in the starting point detection interval; and determine the highest of the first preset scores of the target note corresponding to the target starting point detection sub-intervals as the target score of the target note.
In a possible implementation, the determining module is specifically configured to: and determining a first starting point of each user singing note in the dry sound signal based on a preset starting point detection algorithm.
In a possible implementation, the preset starting point (onset) detection algorithm includes a spectrum-based onset detection algorithm and a pYIN-based onset detection algorithm, and the determining module is specifically configured to: determine a third starting point of each note sung by the user in the dry sound signal based on the spectrum-based onset detection algorithm; determine a fourth starting point of each note sung by the user in the dry sound signal based on the pYIN-based onset detection algorithm; and determine the first starting point of each note sung by the user in the dry sound signal according to the union of the third starting points and the fourth starting points.
In one possible implementation, the processing module is further configured to: before determining the rhythm score for the user's singing of the target song according to the first starting points, the weight of each target note in the pitch line file corresponding to the target song, and the starting point detection interval corresponding to each target note in the pitch line file, acquire the weight of each target note in the pitch line file corresponding to the target song as follows: determining the weight of the first target note after a breath gap (air port) as a first weight, and determining the weight of the initial note in the pitch line file as the first weight; determining the weight of the first target note among consecutive same notes as a second weight, the second weight being smaller than the first weight; determining the weight of each target note other than the first target note after a breath gap and the consecutive same notes as a third weight, the third weight being smaller than the second weight; determining the weight of each target note other than the first among consecutive same notes as a fourth weight, the fourth weight being smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first, second, third, and fourth weights.
In one possible implementation, the processing module is further configured to: if the time interval between two adjacent target notes is larger than a breath gap (air port) threshold, determine that there is a breath gap between the two adjacent target notes; and if at least two target notes have the same pitch and the time interval between each two adjacent ones is smaller than the breath gap threshold, determine that the at least two target notes are consecutive same notes.
In a third aspect, an embodiment of the present disclosure provides a computing device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the rhythm score determination method described in the first aspect of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure provide a computer-readable storage medium in which computer program instructions are stored; when the computer program instructions are executed by a processor, the rhythm score determination method according to the first aspect of the present disclosure is implemented.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the rhythm score determination method according to the first aspect of the present disclosure.
According to the rhythm score determining method, medium, device and computing equipment provided by the embodiment of the disclosure, the first starting point of each user singing note in the dry sound signal is determined by acquiring the dry sound signal corresponding to the target song sung by the user, and the rhythm score of the target song sung by the user is determined according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song and the starting point detection interval corresponding to each target note in the pitch line file. According to the method and the device, the rhythm score of the song sung by the user is determined according to the dry sound signal corresponding to the song sung by the user, the weight of each note in the pitch line file and the starting point detection interval corresponding to each note in the pitch line file, so that the rhythm score of the song sung by the user can be accurately obtained, and the user experience is improved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 is a schematic view of an application scenario provided in the embodiment of the present disclosure;
fig. 2 is a flowchart of a rhythm score determining method according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating a start point detection subinterval A corresponding to a note in a pitch line file according to an embodiment of the disclosure;
fig. 4 is a flowchart of a rhythm score determination method according to another embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a spectrum-based onset detection algorithm for detecting onset of notes according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating note onsets detected by a pyin-based onset detection algorithm according to an embodiment of the disclosure;
FIG. 7 is a diagram illustrating that a pyin-based start point detection algorithm provided by an embodiment of the present disclosure cannot detect pitch for consonants;
FIG. 8 is a diagram illustrating a determination of a target score for a target note according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a rhythm score determining apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a program product provided by an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the disclosure, a rhythm score determining method, a medium, a device and a computing device are provided.
In this context, the terms used are to be understood as follows: onset detection (also rendered as end point detection) is an algorithm that locates the starting point of each note; dry sound is the pure human voice without accompaniment and without post-processing; a frequency spectrum is a representation used to analyze audio that shows the relationship between signal frequency and energy, generally as a two-dimensional image in which the horizontal axis represents time, the vertical axis represents frequency, and the color depth represents energy. Moreover, the number of any element in the drawings is illustrative rather than limiting, and any naming is used only for distinction and has no limiting meaning.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of The Invention
The inventor has found that when an existing karaoke system scores a user's singing, it generally scores pitch accuracy. Specifically, the score is derived from the difference between the user's singing and the melody (i.e., the fundamental frequency) of the song in a template (e.g., a MIDI file). When a segment of singing is scored manually, however, factors such as rhythm, breath, timbre, and singing skill are considered in addition to pitch accuracy. Rhythm scoring is a common scoring method. For rhythm evaluation, the related art generally uses the pitch contour as the feature: a DTW operation is performed on the pitch line sung by the user and a template pitch line to obtain a corresponding DTW curve, and the RMS difference between the DTW curve and a straight line fitted to the DTW curve is calculated to obtain the rhythm evaluation. Rhythm scoring in this manner has the following disadvantages: (1) the same song sung by different singers can be evaluated effectively, but scores for different songs cannot be compared with one another; (2) the pitch of the entire song must be computed, so real-time scoring at sentence granularity is not possible; (3) the user is required to sing strictly on pitch, and if the singing is off-key but the rhythm is correct, a proper score cannot be given.
To handle the case in disadvantage (3), in which the user sings off-key but the rhythm is correct, 13-dimensional Mel-Frequency Cepstral Coefficient (MFCC) features have been used, with rhythm scoring again performed by DTW. This scoring method still has the following disadvantages: (1) MFCC features allow rhythm evaluation when the user sings off-key, but the user must then sing the lyrics correctly, otherwise no suitable score can be given; (2) the same song sung by different singers can be evaluated effectively, but scores for different songs cannot be compared with one another; (3) the pitch of the entire song must be computed, so real-time scoring at sentence granularity is not possible.
In addition, in the related art, rhythm may also be scored in the following ways:
(1) Lyrics are recognized by speech recognition and matched sentence by sentence to obtain a singing voice set E = {E1, E2, ...}. The start and end points of each word in the singing voice of a sentence are located based on frequency to obtain a tone set {Pi1, Pi2, ...}. The duration of the user's singing of the i-th sentence Ei in the set E is compared with the standard singing duration to evaluate an overall rhythm score. Similarly, the start-time and end-time differences between the j-th word of the i-th sentence in the set E and the standard singing are compared to evaluate a local rhythm score. Mode (1) has the following disadvantages: the user must sing the lyrics correctly, otherwise the algorithm cannot give a proper score; when phrases with the same lyrics appear in the same song, the matching is easily confused; and the local rhythm scoring requires the user to sing strictly on pitch, so if the singing is off-key but the rhythm is correct, the algorithm cannot give a proper score.
(2) The rhythm of an instrumental performance is scored by taking the midpoint of each note's playing duration as its playing time point, matching with dynamic programming, and deducting points in proportion to the share of extra or missing notes in the whole performance. Mode (2) has the following disadvantages: it cannot score appropriately according to the deviation of each individual note; and when a phrase contains both extra and missing notes, the resulting score is biased.
(3) Matching and scoring are performed with the bar as the minimum unit, and the rhythm is scored based on the difference between the accuracy of the bar's starting beat and the overall time value of the bar. Mode (3) has the following disadvantages: the minimum deviation unit within a bar is a thirty-second note, so the matching, and therefore the scoring, is not accurate enough; when the user's singing rhythm differs from the original by more than four quarter notes, the upper limit of the matching method is exceeded and the corresponding phrase cannot be matched; and missed, wrong, or false beats of individual notes cannot be detected.
These rhythm scoring methods are relatively simple, have many inherent algorithmic flaws, are not very accurate, and apply only under narrow conditions.
Based on the above problems, the present disclosure provides a rhythm score determining method, medium, device and computing device, which can accurately obtain the rhythm score of a song sung by a user according to the dry sound signal corresponding to the song and the pitch line file, thereby improving user experience.
Application scene overview
An application scenario of the scheme provided by the present disclosure is first illustrated with reference to fig. 1. Fig. 1 is a schematic view of an application scenario provided by an embodiment of the present disclosure. As shown in fig. 1, in the application scenario, a server 102 obtains, through a client 101, a dry sound signal of a song sung by a user; the server 102 determines a rhythm score of the song sung by the user according to the dry sound signal and the pitch line file of the song, and transmits the rhythm score to the client 101 through a network; the client 101 then displays the rhythm score of the song sung by the user. For the specific implementation process of determining the rhythm score of the song sung by the user according to the dry sound signal and the pitch line file of the song, reference may be made to the schemes of the following embodiments.
It should be noted that fig. 1 is only a schematic diagram of an application scenario provided by the embodiment of the present disclosure, and the embodiment of the present disclosure does not limit the devices included in fig. 1, nor does it limit the positional relationship between the devices in fig. 1. For example, in the application scenario shown in fig. 1, a data storage device may be further included, and the data storage device may be an external memory with respect to the server 102 or an internal memory integrated in the server 102.
Exemplary method
A rhythm score determining method according to an exemplary embodiment of the present disclosure is described below in conjunction with the application scenario of fig. 1. It should be noted that the above application scenario is merely illustrated for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, embodiments of the present disclosure may be applied to any applicable scenario.
First, the rhythm score determining method will be described by way of specific embodiments.
Fig. 2 is a flowchart of a rhythm score determining method according to an embodiment of the present disclosure. The method of the disclosed embodiments may be applied in a computing device, which may be a server or a server cluster or the like. As shown in fig. 2, the method of the embodiment of the present disclosure includes:
S201, obtaining a dry sound signal corresponding to a target song sung by a user.
The present disclosure does not specifically limit how the dry sound signal corresponding to the target song sung by the user is obtained. In one example, the user sings the target song while listening to its accompaniment through earphones. The karaoke application (APP) can then directly record the unaccompanied pure voice, i.e., the dry sound signal, corresponding to the target song sung by the user. In another example, the user sings the target song while the accompaniment is played aloud, so that an audio file of the user's singing mixed with the accompaniment is obtained, and the dry sound signal corresponding to the target song sung by the user is extracted from the audio file by using a related vocal and accompaniment separation algorithm.
S202, determining a first starting point of each user singing note in the dry sound signal.
In this step, the first starting point is the starting time of a note sung by the user, and may also be referred to as the onset of the note sung by the user. Determining the first starting point of each note sung by the user in the dry sound signal therefore means determining the onset of each note sung by the user in the dry sound signal. After the dry sound signal corresponding to the target song sung by the user is obtained, the onsets can be extracted from the dry sound signal by an existing start point detection algorithm, that is, the first starting point of each note sung by the user in the dry sound signal is determined. For how to determine the first starting point of each note sung by the user in the dry sound signal, reference may be made to the following embodiments, which are not described herein again.
S203, determining rhythm scores of the target song sung by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song and the starting point detection interval corresponding to each target note in the pitch line file.
The weights represent how important the different target notes in the pitch line file are to the perceived rhythm. The starting point detection interval comprises a preset offset range centered on a second starting point of the target note and corresponds to a first preset score of the target note; the starting point detection interval and the first starting point are used for determining the target score of the target note.
Illustratively, the pitch line file, such as a midi file, may be purchased from a song supplier or may be generated by algorithms such as song melody extraction, score conversion, and the like. Table 1 shows a pitch line file format provided in an embodiment of the present disclosure, as shown in table 1, the pitch line file is a matrix with three columns, and each column has the following meanings: note onset time in milliseconds (ms); note ending time in milliseconds (ms); pitch, in units of midi key number. The corresponding note length can be obtained from the time interval of the note-on time and the note-off time.
TABLE 1
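Illustratively, the three-column structure described above may be read as in the following minimal sketch. It assumes, purely for illustration, that the pitch line data is available as whitespace-separated text with one note per row; the names Note and parse_pitch_lines and the pitch values in the example are hypothetical and are not taken from the disclosure.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """One row of the pitch line matrix (hypothetical container)."""
    start_ms: float  # note onset time, in milliseconds
    end_ms: float    # note ending time, in milliseconds
    pitch: int       # pitch as a MIDI key number

    @property
    def length_ms(self) -> float:
        # Note length is the interval between onset time and ending time.
        return self.end_ms - self.start_ms


def parse_pitch_lines(text: str) -> List[Note]:
    """Parse rows of 'start_ms end_ms pitch' (assumed text layout)."""
    notes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        notes.append(Note(float(fields[0]), float(fields[1]), int(fields[2])))
    return notes


# Made-up rows; the times match the three notes later used for fig. 3.
example = """5000 7000 60
7000 12000 62
12000 15000 64"""
notes = parse_pitch_lines(example)
lengths = [n.length_ms for n in notes]
```

In this sketch the note length is derived on demand rather than stored, which mirrors the statement that the length follows from the onset time and the ending time.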
Different notes in the pitch line file contribute different degrees to whether the rhythm is accurate in listening, so that the weights of the notes in the pitch line file can be determined according to the importance degree of the different notes in the pitch line file to the rhythm listening. For how to determine the weight of each target note in the pitch line file corresponding to the target song, reference may be made to subsequent embodiments, which are not described herein again.
The second starting point of the target note is the starting time of the target note in the pitch line file. Taking the second starting point of the target note as the center, the starting point detection interval corresponding to each target note in the pitch line file can be determined according to the preset offset range. The starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used for determining the target score of the target note.
Optionally, the starting point detection interval corresponding to the target note includes a plurality of different starting point detection subintervals, and the different starting point detection subintervals correspond to different first preset scores of the target note; the different start point detection subintervals include different preset offset ranges centered at a second start point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second start point of the target note, and different preset detection values.
Illustratively, the starting point detection interval corresponding to the target note includes, for example, three different starting point detection sub-intervals, namely a starting point detection sub-interval a, a starting point detection sub-interval B, and a starting point detection sub-interval C, where the three different starting point detection sub-intervals correspond to different first preset scores of the target note; the three different starting point detection subintervals include different preset offset ranges centered at the second starting point of the target note, and specifically, the preset offset ranges respectively corresponding to the three different starting point detection subintervals may be determined as follows:
starting point detection subinterval A: note starting time point ± note length/4, wherein 4 is a preset detection value; if the first starting point of a note sung by the user is checked against the starting point detection subinterval A and is found to fall within it, a corresponding first preset score x is obtained, wherein x is, for example, 1;
starting point detection subinterval B: note starting time point ± note length/3, wherein 3 is a preset detection value; if the first starting point of a note sung by the user is checked against the starting point detection subinterval B and is found to fall within it, a corresponding first preset score y is obtained, wherein y is, for example, 0.7;
starting point detection subinterval C: note starting time point ± note length/2, wherein 2 is a preset detection value; if the first starting point of a note sung by the user is checked against the starting point detection subinterval C and is found to fall within it, a corresponding first preset score z is obtained, wherein z is, for example, 0.5.
The score values of the three different starting point detection subintervals are required to satisfy x > y > z > 0. The preset offset ranges corresponding to the three different starting point detection subintervals may be determined in advance through experiments. Exemplarily, one possible experimental approach is: acquiring dry sound signals covering different singing levels, different song styles and different languages, and manually determining the rhythm score of each segment of singing; then setting a plurality of sets of preset offsets for the three different starting point detection subintervals, comparing the rhythm scores obtained by the rhythm score determining method provided by the embodiment of the disclosure with the manually determined rhythm scores, and adopting the preset offset ranges of the parameter scheme with the highest correlation with the manually determined rhythm scores.
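Illustratively, the three subintervals defined above may be computed as in the following minimal sketch. The function name detection_subintervals and the dictionary layout are hypothetical; the detection values 4, 3 and 2 are the example values given above.

```python
from typing import Dict, Tuple

# Example preset detection values from the text: A -> 4, B -> 3, C -> 2.
DETECTION_VALUES: Dict[str, int] = {"A": 4, "B": 3, "C": 2}


def detection_subintervals(
    start: float,
    length: float,
    detection_values: Dict[str, int] = DETECTION_VALUES,
) -> Dict[str, Tuple[float, float]]:
    """Return (lower, upper) bounds of start +/- length / k for each k."""
    return {
        name: (start - length / k, start + length / k)
        for name, k in detection_values.items()
    }


# A note that starts at 5 s and lasts 2 s.
intervals = detection_subintervals(5.0, 2.0)
# intervals["A"] is (4.5, 5.5) and intervals["C"] is (4.0, 6.0)
```

Because a larger detection value gives a smaller offset, subinterval A lies inside B, and B lies inside C, which is consistent with the narrowest subinterval carrying the highest score.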
After the preset offset ranges respectively corresponding to the three different starting point detection subintervals are determined in the above manner, the three starting point detection subintervals can be denoted by threshold_A, threshold_B and threshold_C respectively, and correspondingly, the first preset scores corresponding to the three starting point detection subintervals can be denoted by score_A, score_B and score_C. Fig. 3 is a schematic diagram of the start point detection subinterval A corresponding to notes in a pitch line file according to an embodiment of the disclosure. As shown in fig. 3, for note 1, note 2 and note 3 in the pitch line file, the note start time points are 5s, 7s and 12s respectively, and the note ending time points are 7s, 12s and 15s respectively, so the note lengths are 2s, 5s and 3s respectively. The start point detection interval corresponding to each note comprises the three starting point detection subintervals threshold_A, threshold_B and threshold_C. Specifically, the starting point detection interval corresponding to note 1 includes a starting point detection subinterval A1, a starting point detection subinterval B1 and a starting point detection subinterval C1; the starting point detection interval corresponding to note 2 includes a starting point detection subinterval A2, a starting point detection subinterval B2 and a starting point detection subinterval C2; and the starting point detection interval corresponding to note 3 includes a starting point detection subinterval A3, a starting point detection subinterval B3 and a starting point detection subinterval C3. Taking threshold_A of note 1, note 2 and note 3 as an example, as shown in fig. 3, the start point detection subintervals corresponding to note 1, note 2 and note 3 are the starting point detection subinterval A1, the starting point detection subinterval A2 and the starting point detection subinterval A3 respectively. The first preset scores score_A corresponding to the starting point detection subinterval A1, the starting point detection subinterval A2 and the starting point detection subinterval A3 are all 1 point.
Optionally, the weights of the notes in the pitch line file corresponding to the songs and the start point detection intervals corresponding to the notes in the pitch line file may be obtained according to a pre-obtained pitch line file corresponding to each song to be sung by the user, and stored in the memory, so that when performing rhythm evaluation on the songs sung by the user, the weights of the target notes in the pitch line file corresponding to the songs sung by the user and the start point detection intervals corresponding to the target notes in the pitch line file are directly obtained from the memory.
In the embodiment of the present disclosure, after the first start point of each note sung by the user in the dry sound signal is determined, the rhythm score of the target song sung by the user may be determined according to the first start point, the weight of each target note in the pitch line file corresponding to the target song, and the start point detection interval corresponding to each target note in the pitch line file. For how to determine the rhythm score in this way, reference may be made to subsequent embodiments, which are not described herein again.
Illustratively, after the rhythm score of the target song sung by the user is determined, the rhythm score may be displayed to the user.
According to the rhythm score determining method provided by the embodiment of the disclosure, a dry sound signal corresponding to the target song sung by the user is obtained, the first starting point of each note sung by the user in the dry sound signal is determined, and the rhythm score of the target song sung by the user is determined according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song and the starting point detection interval corresponding to each target note in the pitch line file. Because the rhythm score is determined from the dry sound signal corresponding to the song sung by the user, the weight of each note in the pitch line file and the starting point detection interval corresponding to each note in the pitch line file, the rhythm score of the song sung by the user can be obtained accurately, and the user experience is improved.
On the basis of the above embodiment, to determine the weight of each note in the pitch line file for the rhythm score, it is first determined, according to the pitch line file, whether there is a breath point (a gap in which the singer takes a breath) between two adjacent target notes in the pitch line file and whether at least two target notes are consecutive same-pitch notes. One possible implementation is as follows: if the time interval between two adjacent target notes is greater than the breath point threshold, it is determined that there is a breath point between the two adjacent target notes; and if the pitches of at least two target notes are the same and the time interval between every two adjacent target notes among them is smaller than the breath point threshold, it is determined that the at least two target notes are consecutive same-pitch notes.
Illustratively, the time required to take a breath between phrases is typically between 450ms and 2000ms, and the time for a breath during fast singing or a very quick inhalation is between 100ms and 450ms. Therefore, in the pitch line file, any silence longer than a certain duration (i.e., the breath point threshold, denoted, for example, by T_breath) is considered to contain a breath point. Specifically, the breath point threshold T_breath is, for example, 100ms: in the pitch line file, if the time interval between the start time of the second target note and the end time of the first target note of two adjacent target notes is greater than 100ms, it is determined that there is a breath point between the two adjacent target notes. For consecutive same-pitch notes, for example, if 3 target notes have the same pitch and the time interval between every two adjacent target notes among the 3 target notes is less than the breath point threshold, the 3 target notes are determined to be consecutive same-pitch notes.
Optionally, before determining the rhythm score of the target song sung by the user according to the first start point, the weight of each target note in the pitch line file corresponding to the target song, and the start point detection interval corresponding to each target note in the pitch line file, the rhythm score determining method provided in the embodiment of the present disclosure may further include acquiring the weight of each target note in the pitch line file corresponding to the target song in the following manner: determining the weight of the first target note after a breath point as a first weight, and determining the weight of the initial note in the pitch line file as the first weight; determining the weight of the first target note among consecutive same-pitch notes as a second weight, wherein the second weight is smaller than the first weight; determining the weights of the target notes other than the first target note after a breath point and the consecutive same-pitch notes as a third weight, wherein the third weight is smaller than the second weight; determining the weights of the target notes other than the first target note among consecutive same-pitch notes as a fourth weight, wherein the fourth weight is smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
Illustratively, the weights of the individual notes in the pitch line file are obtained by:
a. the first note after a breath point (i.e., the first note of each sentence) has the first weight X, and the first note in the pitch line file also has the first weight X, wherein X is, for example, 3; it can be understood that the first note in the pitch line file is the first of all the notes contained in the pitch line file;
b. the first note of consecutive same-pitch notes has the second weight Y, wherein Y is, for example, 2;
c. the notes other than the first note after a breath point and the consecutive same-pitch notes have the third weight Z, wherein Z is, for example, 1;
d. the notes other than the first note in consecutive same-pitch notes (i.e., the non-first notes) have the fourth weight W, wherein W is, for example, 0.
Wherein, the requirement for the weight value is as follows: x > Y > Z > W. The specific weight value may be determined in advance through experiments. Exemplarily, one possible experimental approach is: acquiring dry sound signals covering different singing levels, different song styles and different languages; manually determining the rhythm score of each segment of singing; and setting a plurality of sets of parameter schemes for determining weight values, comparing the rhythm scores obtained by the rhythm score determining method provided by the embodiment of the disclosure with the artificially determined rhythm scores, and determining the weight values in the a, the b, the c and the d according to the parameter scheme with the highest correlation degree with the artificially determined rhythm scores.
It should be noted that the first note of each sentence of the song and the first note of consecutive same-pitch notes are more important to the perceived rhythm, so their weights are determined by items a and b above; in order to reduce the influence on the rhythm score of possibly inaccurate detection of the non-first notes of consecutive same-pitch notes, their weight is determined by item d above.
After the weights of the target notes in the pitch line file corresponding to the target song are obtained in the above manner, the weight corresponding to each target note in the pitch line file may be represented by weight [ j ], for example.
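Illustratively, the breath point test, the consecutive same-pitch test and rules a to d may be combined as in the following minimal sketch. The function name note_weights and the note tuple layout are hypothetical. The order in which the rules are checked when a note satisfies more than one of them (for example, a note that both follows a breath point and begins a run of same-pitch notes receives the first weight) is an assumption of this sketch, as is treating a gap exactly equal to the threshold as neither a breath point nor a same-pitch continuation.

```python
from typing import List, Tuple

Note = Tuple[float, float, int]  # (start_ms, end_ms, midi_pitch), hypothetical


def note_weights(
    notes: List[Note],
    t_breath_ms: float = 100.0,  # example breath point threshold
    x: float = 3.0, y: float = 2.0, z: float = 1.0, w: float = 0.0,
) -> List[float]:
    """Assign weight[j] to each note using rules a to d (assumed priority)."""
    weights = []
    for j, (start, end, pitch) in enumerate(notes):
        # Gap to the previous note: this start minus the previous end.
        gap_prev = start - notes[j - 1][1] if j > 0 else None
        continues_run = (
            j > 0 and notes[j - 1][2] == pitch and gap_prev < t_breath_ms
        )
        starts_run = (
            j + 1 < len(notes)
            and notes[j + 1][2] == pitch
            and notes[j + 1][0] - end < t_breath_ms
        )
        if continues_run:
            weights.append(w)   # rule d: non-first note of a same-pitch run
        elif j == 0 or gap_prev > t_breath_ms:
            weights.append(x)   # rule a: first note of file or after a breath
        elif starts_run:
            weights.append(y)   # rule b: first note of a same-pitch run
        else:
            weights.append(z)   # rule c: every other note
    return weights


# Made-up notes: a run of three notes at pitch 62, then a 700 ms silence.
example_notes = [
    (0, 500, 60), (500, 1000, 62), (1000, 1400, 62),
    (1400, 1800, 62), (2500, 3000, 64), (3000, 3500, 65),
]
weights = note_weights(example_notes)
# weights is [3.0, 2.0, 0.0, 0.0, 3.0, 1.0]
```

With W equal to 0, the non-first notes of a same-pitch run drop out of both the numerator and the denominator of the final ratio, so a missed detection on those notes does not lower the rhythm score.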
Fig. 4 is a flowchart of a rhythm score determining method according to another embodiment of the present disclosure. On the basis of the above embodiments, the embodiments of the present disclosure further describe how to determine the tempo score of the song sung by the user. As shown in fig. 4, a method of an embodiment of the present disclosure may include:
S401, obtaining a dry sound signal corresponding to a target song sung by a user.
For a detailed description of this step, reference may be made to the description related to S201 in the embodiment shown in fig. 2, and details are not described here.
In the embodiment of the present disclosure, the step S202 in fig. 2 may further include the following step S402:
S402, determining a first starting point of each note sung by the user in the dry sound signal based on a preset starting point detection algorithm.
In this step, the preset starting point detection algorithm may be predetermined based on the current starting point detection algorithm. Therefore, a first start point at which each user sings a note in the dry sound signal may be determined based on a preset start point detection algorithm.
Further, the preset onset detection algorithm includes a spectrum-based onset detection algorithm and a pyin-based onset detection algorithm, and determining a first onset point at which each user sings a note in the dry sound signal may include: determining a third starting point of each user singing note in the dry sound signal based on a starting point detection algorithm of the frequency spectrum; determining a fourth starting point of each user singing note in the dry sound signal based on a pyin starting point detection algorithm; and determining a first starting point of each user singing the note in the dry sound signal according to the union set of the third starting point and the fourth starting point.
Illustratively, the principle of the spectrum-based start point detection method is that when the words and pitches in the audio change, the spectrum shows abrupt structural changes. Fig. 5 is a schematic diagram of detecting note start points by a spectrum-based start point detection algorithm according to an embodiment of the present disclosure. As shown in fig. 5, for the lyric "make me feel awkward" sung by the user, the volume waveform 501 is decomposed into the corresponding spectrogram 502, and the onset of each note is found by detecting abrupt changes of spectral energy in the spectrogram 502. It should be noted that the spectrum-based start point detection algorithm is a general onset feature extractor.
Exemplarily, fig. 6 is a schematic diagram of note onsets detected by a pyin-based start point detection algorithm according to an embodiment of the present disclosure. Referring to fig. 6, the pyin-based start point detection algorithm includes the following four steps: first, detecting the pitch corresponding to each frame contained in the audio; second, smoothing the obtained pitch to obtain the pitch contour curve 601 shown in fig. 6; third, dividing notes according to the pitch contour to obtain the corresponding straight lines 602 shown in fig. 6; fourth, determining the start time of each note as a detected onset. It should be noted that the pyin-based start point detection algorithm is a generic note start point feature extractor.
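Illustratively, steps two to four may be sketched as follows, assuming the per-frame pitch of step one has already been estimated by some pyin implementation. Everything specific here is an assumption made for illustration and is not stated in the disclosure: pitch is given in (fractional) MIDI note numbers with None for unvoiced frames, smoothing uses a short median window, and a new note begins when a voiced frame deviates by more than half a semitone from the note in progress or follows an unvoiced frame.

```python
from statistics import median
from typing import List, Optional


def note_onsets_from_pitch(
    frame_pitch: List[Optional[float]],
    hop_s: float,
    win: int = 3,          # median window length in frames (assumed)
    max_dev: float = 0.5,  # semitone deviation that starts a new note (assumed)
) -> List[float]:
    """Return note start times (seconds) from a per-frame pitch track."""
    half = win // 2
    # Step two: smooth the pitch track, ignoring unvoiced neighbours.
    smoothed: List[Optional[float]] = []
    for t, p in enumerate(frame_pitch):
        if p is None:
            smoothed.append(None)
            continue
        window = frame_pitch[max(0, t - half): t + half + 1]
        smoothed.append(median(v for v in window if v is not None))
    # Steps three and four: split into notes and keep each note's start time.
    onsets = []
    current = None  # pitch (rounded to a semitone) of the note in progress
    for t, p in enumerate(smoothed):
        if p is None:
            current = None  # silence or consonant: the note has ended
            continue
        if current is None or abs(p - current) > max_dev:
            onsets.append(t * hop_s)
            current = round(p)
    return onsets


# Made-up track: two unvoiced frames, a note near 60, then a note near 62.
track = [None, None, 60.1, 60.0, 59.9, 62.0, 62.1, 62.0, None]
detected = note_onsets_from_pitch(track, hop_s=0.5)
# detected is [1.0, 2.5]
```

Because unvoiced frames carry no pitch, a note whose consonant occupies the first frames is reported only from its first voiced frame, and a repeated note at the same pitch with no unvoiced gap produces no new onset in this sketch.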
Experiments show that the spectrum-based start point detection algorithm and the pyin-based start point detection algorithm each have advantages and disadvantages. The advantage of the spectrum-based start point detection algorithm is that the sounding time of consonants is taken into account when detecting onsets; its disadvantages are that onsets are often missed when the pitch changes within a word (multiple pitches correspond to the same word), and that, for one word sung over multiple pitches, it is insensitive to the non-initial onsets. The advantage of the pyin-based start point detection algorithm is that it is sensitive to pitch and handles onsets at pitch changes well; its disadvantage is that the consonant part of a pronunciation has no pitch, so the detected onset is later than the actual onset, which causes errors. For example, fig. 7 is a schematic diagram, provided by an embodiment of the present disclosure, showing that the pyin-based start point detection algorithm cannot detect pitch for consonants. As shown in fig. 7, the numbers 241, 242, 243, 244, 174, 177, 179 and 180 represent the frequencies of the detected pitches in Hz; for the second note shown in fig. 7, the onset detected by the pyin-based start point detection algorithm is onset 701, whereas the actual onset of the second note is onset 702, and onset 701 is later than onset 702, which causes an error. In addition, the pyin-based start point detection algorithm cannot detect the non-initial onsets of consecutive same-pitch notes.
Therefore, after the third starting point of each user singing note in the dry sound signal is determined based on the starting point detection algorithm of the frequency spectrum, and the fourth starting point of each user singing note in the dry sound signal is determined based on the starting point detection algorithm of the pyin, the first starting point of each user singing note in the dry sound signal is determined according to the union set of the third starting point and the fourth starting point, and the detection accuracy of the onset can be effectively improved. Illustratively, the first starting point of each user singing a note in the detected dry sound signal may be denoted by onset [ i ].
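Illustratively, taking the union of the two detection results may be sketched as follows. The function name merge_onsets is hypothetical, and the sketch performs a plain set union; whether near-duplicate onsets from the two detectors should additionally be merged is not specified in the text and is left out here.

```python
from typing import Iterable, List


def merge_onsets(
    spectral_onsets: Iterable[float],
    pyin_onsets: Iterable[float],
) -> List[float]:
    """Return onset[i]: the sorted union of the third and fourth start points."""
    return sorted(set(spectral_onsets) | set(pyin_onsets))


# Made-up detector outputs, in seconds.
onset = merge_onsets([4.8, 7.1], [4.9, 7.1, 12.0])
# onset is [4.8, 4.9, 7.1, 12.0]
```

Keeping both 4.8 and 4.9 is harmless for scoring, because when several first start points fall into one note's detection interval only the highest corresponding preset score is used.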
In the embodiment of the present disclosure, the step S203 in fig. 2 may further include the following three steps S403 to S405:
S403, determining whether a corresponding first starting point exists in the starting point detection interval.
In this step, after the first start point of each note sung by the user in the dry sound signal is determined based on the preset start point detection algorithm, whether a corresponding first start point exists in the start point detection interval may be determined based on the start point detection interval corresponding to each target note in the pitch line file. Exemplarily, referring to fig. 3, the first start point is 4.8s and the start point detection interval is, for example, the start point detection subinterval A1 corresponding to note 1. The start point detection subinterval A1 is 4.5s to 5.5s, and 4.8s falls within this range, so it can be determined that the start point detection subinterval A1 has a corresponding first start point of 4.8s.
If it is determined that a corresponding first starting point exists in the starting point detection interval, S404 is executed; if it is determined that no corresponding first starting point exists in the starting point detection interval, S406 is executed.
S404, if it is determined that a corresponding first starting point exists in the starting point detection interval, determining the target score of the target note according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval.
In this step, after it is determined that a corresponding first start point exists in the start point detection interval, the target score of the target note may be determined according to the first start point and the first preset score of the target note corresponding to the start point detection interval. For example, referring to fig. 3, the first start point is 4.8s and the start point detection interval is the start point detection subinterval A1 corresponding to note 1. After it is determined that the start point detection subinterval A1 has the corresponding first start point of 4.8s, since the first preset score of note 1 corresponding to the start point detection subinterval A1 is 1 point, the target score of the target note corresponding to the first start point may be determined to be 1 point.
Further, the method for determining a tempo score provided by the embodiment of the present disclosure may further include: if a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note, determining target starting point detection sub-intervals corresponding to the first starting points in the starting point detection interval respectively; and determining the highest score in the first preset scores of the target notes corresponding to the target starting point detection subintervals as the target score of the target note.
Exemplarily, fig. 8 is a schematic diagram of determining a target score of a target note according to an embodiment of the present disclosure, as shown in fig. 8, where the target note is, for example, note 2, and the corresponding start point detection interval includes, for example, three different start point detection sub-intervals, which are, respectively, a start point detection sub-interval a (corresponding first preset score is, for example, 1 score), a start point detection sub-interval B (corresponding first preset score is, for example, 0.7), and a start point detection sub-interval C (corresponding first preset score is, for example, 0.5); if there are 3 first start points corresponding to the start point detection section corresponding to the note 2, which are respectively the first start point 1, the first start point 2, and the first start point 3, it can be determined that the target start point detection subinterval corresponding to the first start point 1 is the start point detection subinterval a, the target start point detection subinterval corresponding to the first start point 2 is the start point detection subinterval B, and the target start point detection subinterval corresponding to the first start point 3 is the start point detection subinterval C, and further it can be determined that the score of the start point detection subinterval a corresponding to the first start point 1 is the highest, and the score is 1, so it can be determined that the target score of the note 2 is 1.
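Illustratively, selecting the highest score among several first starting points may be sketched as follows. The function name target_score is hypothetical; the detection values and scores are the example values given earlier, a miss score of 0 is used as the fallback when no first starting point falls into any subinterval, and treating the subinterval boundaries as inclusive is an assumption of this sketch.

```python
from typing import Iterable, Sequence, Tuple

# (preset detection value, first preset score) for subintervals A, B and C.
LEVELS: Sequence[Tuple[int, float]] = ((4, 1.0), (3, 0.7), (2, 0.5))


def target_score(
    onsets: Iterable[float],
    start: float,
    length: float,
    levels: Sequence[Tuple[int, float]] = LEVELS,
    miss_score: float = 0.0,
) -> float:
    """Best preset score reached by any onset for one target note."""
    best = miss_score
    for onset in onsets:
        for k, score in levels:  # narrowest (highest-scoring) level first
            if abs(onset - start) <= length / k:
                best = max(best, score)
                break
    return best


# Note 2 of fig. 3: starts at 7 s and lasts 5 s; the onsets are made up.
score_three = target_score([7.5, 8.5, 9.2], start=7.0, length=5.0)  # 1.0
score_two = target_score([8.5, 9.2], start=7.0, length=5.0)         # 0.7
score_none = target_score([], start=7.0, length=5.0)                # 0.0
```

In the first call the onsets at 7.5 s, 8.5 s and 9.2 s fall into subintervals A, B and C respectively, so the note receives the score of subinterval A.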
S405, determining a rhythm score of the target song sung by the user according to the weight of the target note and the target score.
In this step, after the target score of the target note is determined based on the pitch line file corresponding to the target song, the rhythm score of the target song sung by the user can be determined according to the weight of the target note and the target score.
Further, determining a tempo score for the user singing the target song according to the weight of the target note and the target score may include: acquiring the product of the weight of the target note corresponding to each starting point detection interval and the target score; and determining the rhythm score of the target song sung by the user according to the ratio of the summation of the products to the summation of the weights of the target notes in the pitch line file.
Illustratively, the rhythm score of the user singing the target song may be determined by the following formula:

$$rhythm_{score} = \frac{\sum_{j} weight[j] \cdot score[j]}{\sum_{j} weight[j]}$$

wherein,

$$score[j] = \begin{cases} score_A, & \text{if an } onset[i] \text{ falls in } notes[j].threshold_A \\ score_B, & \text{if an } onset[i] \text{ falls in } notes[j].threshold_B \\ score_C, & \text{if an } onset[i] \text{ falls in } notes[j].threshold_C \\ 0, & \text{in other cases} \end{cases}$$

wherein $rhythm_{score}$ represents the rhythm score of the target song sung by the user; $onset[i]$ represents the onset of each user singing note detected in the user dry sound signal, and $i$ denotes the $i$-th onset in the onset array; $score[j]$ represents the score obtained by the target note when the onset of a note sung by the user falls in one of the different start point detection subintervals (i.e., the start point detection subinterval A, the start point detection subinterval B, and the start point detection subinterval C in the above-described embodiment), namely $score_A$, $score_B$, or $score_C$; $notes[j]$ represents a target note in the pitch line file, and $j$ denotes the $j$-th target note in the pitch line file; $weight[j]$ represents the weight of the $j$-th target note in the pitch line file; and $notes[j].threshold_A$, $notes[j].threshold_B$, and $notes[j].threshold_C$ represent the start point detection subinterval A, the start point detection subinterval B, and the start point detection subinterval C of the $j$-th target note in the pitch line file, respectively.
And S406, if the starting point detection interval is determined to have no corresponding first starting point, determining the target score of the target note corresponding to the starting point detection interval as a second preset score.
The second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
For example, if the second preset score is 0, the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval. And if the starting point detection section is determined to have no corresponding first starting point, determining that the target score of the target note corresponding to the starting point detection section is 0. Exemplarily, referring to score [ j ] in the above embodiment, when there is no corresponding first starting point in the starting point detection interval, that is, it belongs to other cases, score [ j ] takes a value of 0.
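The weighted ratio described in steps S405 and S406 can be sketched as below. This is a minimal illustration: the function name `rhythm_score` and the sample weights and scores are assumptions made for the example, and a note with no matching first starting point is given the second preset score of 0 as in the text.

```python
def rhythm_score(weights, scores):
    """Ratio of the sum of weight*score products to the sum of weights."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # assumed convention for an empty pitch line file
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return weighted_sum / total_weight


# Hypothetical values for four target notes; the third note was missed,
# so its target score is the second preset score (0).
weights = [1.0, 0.8, 0.4, 0.6]
scores = [1.0, 0.7, 0.0, 0.5]
song_score = rhythm_score(weights, scores)  # 1.86 / 2.8, about 0.664
```

Because the denominator sums the weights of all target notes in the pitch line file, a missed note lowers the result in proportion to its weight rather than being ignored.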
According to the rhythm score determination method provided by the embodiment of the present disclosure, the dry sound signal corresponding to the target song sung by the user is acquired, and the first starting point of each user singing note in the dry sound signal is determined based on the preset starting point detection algorithm. It is then determined whether a corresponding first starting point exists in the starting point detection interval: if so, the target score of the target note is determined according to the first starting point and the first preset score of the target note corresponding to the starting point detection interval; if not, the target score of the target note corresponding to the starting point detection interval is determined as the second preset score. The rhythm score of the target song sung by the user is then determined according to the weight and the target score of the target note. Because the rhythm score is determined from the dry sound signal corresponding to the song sung by the user, the weight of each note in the pitch line file, and the starting point detection interval corresponding to each note in the pitch line file, the rhythm score of the user's singing can be obtained accurately, which improves the user experience.
On the basis of the above embodiment, in a possible implementation manner, whether each first starting point has a corresponding starting point detection interval may be determined according to the first starting point of each user singing note in the dry sound signal and the starting point detection interval corresponding to each target note in the pitch line file; if it is determined that the first starting point has a corresponding starting point detection interval, a score and a weight corresponding to the first starting point are determined according to the first starting point, the first preset score of the target note corresponding to the starting point detection interval, and the weight of each target note in the pitch line file corresponding to the target song; the product of the score and the weight corresponding to each first starting point is obtained; and the rhythm score of the target song sung by the user is determined according to the ratio of the sum of the products to the sum of the weights corresponding to the first starting points.
In summary, the technical solution provided by the present disclosure has at least the following advantages:
(1) the method can replace the manual evaluation process, and provide rhythm evaluation for singing in real time and objectively;
(2) because the method does not refer to pitch, it is suitable for singers of various levels and places no requirement on intonation or lyrics;
(3) because the rhythm scores obtained by the formula are in percentage, the rhythm scores of different songs have the same evaluation scale, and the rhythm scores of different songs can be compared;
(4) rhythm scores can be given in real time by taking sentences as units, and evaluation is not required to be carried out after the whole music is finished.
Exemplary devices
Having described the method of an exemplary embodiment of the present disclosure, next, a rhythm score determination device of an exemplary embodiment of the present disclosure will be explained with reference to fig. 9. The device of the exemplary embodiment of the present disclosure can implement each process in the foregoing rhythm score determination method embodiments, and achieve the same function and effect.
Fig. 9 is a schematic structural diagram of a rhythm score determining apparatus according to an embodiment of the present disclosure, and as shown in fig. 9, a rhythm score determining apparatus 900 according to an embodiment of the present disclosure includes: an acquisition module 901, a determination module 902 and a processing module 903. Wherein:
The acquisition module 901 is configured to acquire a dry sound signal corresponding to a target song sung by a user.
A determining module 902 is configured to determine a first starting point of each user singing a note in the dry sound signal.
The processing module 903 is configured to determine a rhythm score of the target song sung by the user according to the first start point, a weight of each target note in the pitch line file corresponding to the target song, and a start point detection interval corresponding to each target note in the pitch line file, where the weight is used to indicate an importance degree of different target notes in the pitch line file to rhythm audibility, the start point detection interval includes a preset offset range centered on a second start point of the target note, the start point detection interval corresponds to a first preset score of the target note, and the start point detection interval and the first start point are used to determine a target score of the target note.
In a possible implementation, the processing module 903 may be specifically configured to: determining whether a corresponding first starting point exists in the starting point detection interval; if the starting point detection interval is determined to have a corresponding first starting point, determining a target score of a target note according to the first starting point and a first preset score of the target note corresponding to the starting point detection interval; and determining the rhythm score of the target song sung by the user according to the weight of the target note and the target score.
In a possible implementation, the processing module 903 may be specifically configured to: acquiring the product of the weight of the target note corresponding to each starting point detection interval and the target score; and determining the rhythm score of the target song sung by the user according to the ratio of the summation of the products to the summation of the weights of the target notes in the pitch line file.
In one possible implementation, the processing module 903 may further be configured to: and if the starting point detection interval is determined to have no corresponding first starting point, determining the target score of the target note corresponding to the starting point detection interval as a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
In a possible implementation manner, the starting point detection interval corresponding to the target note comprises a plurality of different starting point detection sub-intervals, and the different starting point detection sub-intervals correspond to different first preset scores of the target note; the different start point detection subintervals include different preset offset ranges centered at a second start point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second start point of the target note, and different preset detection values.
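One way to picture how such sub-intervals could be derived is sketched below. The rule used here, a half-width equal to a preset detection value multiplied by the note length, is purely an assumption chosen for illustration; the passage above only states that the ranges depend on the note length, the second start point, and the preset detection values. The function name `build_subintervals` and all numbers are hypothetical.

```python
def build_subintervals(ref_onset, note_length, detection_values, preset_scores):
    """Build (low, high, preset_score) windows centered on ref_onset.

    Assumed rule (not from the disclosure): each window extends
    detection_value * note_length on either side of the reference onset.
    """
    if len(detection_values) != len(preset_scores):
        raise ValueError("one preset score is needed per detection value")
    windows = []
    for value, preset in zip(detection_values, preset_scores):
        half_width = value * note_length
        windows.append((ref_onset - half_width, ref_onset + half_width, preset))
    return windows


# Hypothetical note: reference onset at 10.0 s, length 0.5 s.
windows = build_subintervals(
    ref_onset=10.0,
    note_length=0.5,
    detection_values=(0.1, 0.2, 0.4),  # assumed preset detection values
    preset_scores=(1.0, 0.7, 0.5),     # scores for sub-intervals A, B, C
)
```

Under this assumption a longer note gets wider windows, so the timing tolerance scales with note duration; other rules consistent with the text are equally possible.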
In one possible implementation, the processing module 903 may further be configured to: if a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note, determine the target starting point detection sub-intervals respectively corresponding to the first starting points in the starting point detection interval; and determine the highest score among the first preset scores of the target note corresponding to the target starting point detection sub-intervals as the target score of the target note.
In a possible implementation, the determining module 902 may be specifically configured to: and determining a first starting point of each user singing note in the dry sound signal based on a preset starting point detection algorithm.
In a possible implementation manner, the preset starting point detection algorithm includes a spectrum-based starting point detection algorithm and a pyin-based starting point detection algorithm, and the determining module 902 may specifically be configured to: determining a third starting point of each user singing note in the dry sound signal based on a starting point detection algorithm of the frequency spectrum; determining a fourth starting point of each user singing note in the dry sound signal based on a pyin starting point detection algorithm; and determining a first starting point of each user singing the note in the dry sound signal according to the union set of the third starting point and the fourth starting point.
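The union step can be sketched without committing to any particular detector. In the minimal example below, the two input lists stand in for the outputs of the spectrum-based and pyin-based detectors; the function name `merge_onsets` and the `tolerance` parameter, which collapses near-identical times reported by both detectors, are assumptions added for illustration. Setting `tolerance=0` yields the plain set union.

```python
def merge_onsets(spectral_onsets, pyin_onsets, tolerance=0.03):
    """Union of two onset lists (seconds), sorted in ascending order.

    Onsets closer than `tolerance` to the previously kept onset are
    treated as the same event and dropped (an assumed refinement).
    """
    merged = []
    for t in sorted(set(spectral_onsets) | set(pyin_onsets)):
        if merged and t - merged[-1] <= tolerance:
            continue  # same onset reported by both detectors
        merged.append(t)
    return merged


# Hypothetical detector outputs.
third_points = [0.50, 1.20, 2.00]   # spectrum-based detector
fourth_points = [0.51, 1.75, 2.00]  # pyin-based detector
first_points = merge_onsets(third_points, fourth_points)
```

Taking the union rather than the intersection favors recall: an onset found by either detector is kept, so a note start missed by one detector can still be matched against its detection interval.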
In one possible implementation, the processing module 903 may further be configured to: before determining the rhythm score of the target song sung by the user according to the first starting point, the weight of each target note in the pitch line file corresponding to the target song and the starting point detection interval corresponding to each target note in the pitch line file, acquire the weight of each target note in the pitch line file corresponding to the target song in the following way: determining the weight of the first target note after an air port (that is, a breath gap between sung phrases) as a first weight, and determining the weight of the initial note in the pitch line file as the first weight; determining the weight of the first target note in a run of consecutive identical notes as a second weight, wherein the second weight is smaller than the first weight; determining the weight of each target note that is neither the first target note after an air port nor part of a run of consecutive identical notes as a third weight, wherein the third weight is smaller than the second weight; determining the weight of each target note other than the first target note in a run of consecutive identical notes as a fourth weight, wherein the fourth weight is smaller than the third weight; and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
In one possible implementation, the processing module 903 may further be configured to: if the time interval between two adjacent target notes is larger than the air port threshold, determine that an air port exists between the two adjacent target notes; and if the pitches of at least two target notes are the same and the time interval between each two adjacent ones of them is smaller than the air port threshold, determine that the at least two target notes are consecutive identical notes.
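The four-tier weighting and the air port test can be sketched together as follows. This is an illustration under stated assumptions: the `Note` fields, the function name `assign_weights`, the concrete weight values, and the tie-breaking order (a note that both follows an air port and starts a run of identical notes receives the first weight) are hypothetical; the text fixes only the ordering first > second > third > fourth.

```python
from dataclasses import dataclass


@dataclass
class Note:
    start: float  # onset time in seconds
    end: float    # offset time in seconds
    pitch: int    # e.g. a MIDI note number


def assign_weights(notes, gap_threshold, tiers=(1.0, 0.8, 0.6, 0.4)):
    """Assign one of four weights to each note (tier values are assumed)."""
    w1, w2, w3, w4 = tiers
    weights = []
    for k, note in enumerate(notes):
        prev = notes[k - 1] if k > 0 else None
        nxt = notes[k + 1] if k + 1 < len(notes) else None
        gap_before = note.start - prev.end if prev is not None else None
        # Initial note, or first note after an air port (gap above threshold).
        after_air_port = prev is None or gap_before > gap_threshold
        same_as_prev = (prev is not None and prev.pitch == note.pitch
                        and gap_before < gap_threshold)
        same_as_next = (nxt is not None and nxt.pitch == note.pitch
                        and nxt.start - note.end < gap_threshold)
        if after_air_port:
            weights.append(w1)
        elif same_as_prev:
            weights.append(w4)  # later note in a run of identical notes
        elif same_as_next:
            weights.append(w2)  # first note in a run of identical notes
        else:
            weights.append(w3)  # ordinary note
    return weights


# Hypothetical phrase: three repeated notes, then a long gap.
melody = [
    Note(0.00, 0.50, 60), Note(0.55, 1.00, 62), Note(1.05, 1.50, 62),
    Note(1.55, 2.00, 62), Note(3.00, 3.50, 64), Note(3.55, 4.00, 65),
]
note_weights = assign_weights(melody, gap_threshold=0.5)
```

In this sample the first note and the note after the 1.0 s gap receive the first weight, the first of the three repeated notes receives the second weight, the two repeats receive the fourth weight, and the final note receives the third weight.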
The apparatus of the embodiment of the present disclosure may be configured to execute a scheme of a method for determining a tempo score in any one of the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Exemplary Medium
Having described the method of the exemplary embodiment of the present disclosure, next, a storage medium of the exemplary embodiment of the present disclosure will be described with reference to fig. 10.
Fig. 10 is a schematic diagram of a program product provided in an embodiment of the present disclosure, and referring to fig. 10, a program product 1000 for implementing the method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. The readable signal medium may also be any readable medium other than a readable storage medium.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary computing device
Having described the methods, media, and apparatus of the exemplary embodiments of the present disclosure, a computing device of the exemplary embodiments of the present disclosure is described next with reference to fig. 11.
The computing device 1100 shown in fig. 11 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.
Fig. 11 is a schematic structural diagram of a computing device provided in an embodiment of the present disclosure, and as shown in fig. 11, the computing device 1100 is represented in the form of a general-purpose computing device. Components of computing device 1100 may include, but are not limited to: the at least one processing unit 1101, the at least one storage unit 1102, and a bus 1103 connecting different system components (including the processing unit 1101 and the storage unit 1102).
The bus 1103 includes a data bus, a control bus, and an address bus.
The storage unit 1102 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 11021 and/or cache memory 11022, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 11023.
The storage unit 1102 may also include a program/utility 11025 having a set (at least one) of program modules 11024, such program modules 11024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
It should be noted that although several units/modules or sub-units/modules of the rhythm score determination device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, and that the division into aspects is for convenience of presentation only and does not mean that features in those aspects cannot be combined to advantage. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. A method of tempo scoring determination comprising:
acquiring dry sound signals corresponding to a target song sung by a user;
determining a first starting point of each user singing note in the dry sound signal;
according to the first starting point, the weight of each target note in a pitch line file corresponding to the target song and a starting point detection interval corresponding to each target note in the pitch line file, determining a rhythm score of the target song sung by the user, wherein the weight is used for expressing the importance degree of different target notes in the pitch line file to rhythm audibility, the starting point detection interval comprises a preset offset range taking a second starting point of the target note as a center, the starting point detection interval corresponds to a first preset score of the target note, and the starting point detection interval and the first starting point are used for determining the target score of the target note.
2. The method for determining a tempo score according to claim 1, wherein said determining the tempo score for the user to sing the target song according to the first start point, the weight of each target note in the pitch-line file corresponding to the target song, and the start point detection interval corresponding to each target note in the pitch-line file comprises:
determining whether the starting point detection interval has the corresponding first starting point;
if it is determined that the starting point detection interval has the corresponding first starting point, determining a target score of the target note according to the first starting point and a first preset score of the target note corresponding to the starting point detection interval;
and determining the rhythm score of the target song sung by the user according to the weight and the target score of the target musical note.
3. The method for determining a tempo score according to claim 2, wherein said determining a tempo score for said user singing a target song according to said target note weight and target score comprises:
acquiring the product of the weight and the target score of the target note corresponding to each starting point detection interval;
and determining the rhythm score of the target song sung by the user according to the ratio of the sum of the products to the sum of the weights of the target notes in the pitch line file.
4. The method of determining a tempo score according to claim 2, further comprising:
and if the starting point detection interval is determined to have no corresponding first starting point, determining that the target score of the target note corresponding to the starting point detection interval is a second preset score, wherein the second preset score is smaller than the first preset score of the target note corresponding to the starting point detection interval.
5. The method for determining a tempo score according to claim 2, wherein the start point detection interval corresponding to the target note comprises a plurality of different start point detection sub-intervals, and the different start point detection sub-intervals correspond to different first preset scores of the target note;
the different starting point detection subintervals include different preset offset ranges centered at a second starting point of the target note, and the different preset offset ranges are obtained according to the length of the target note, the second starting point of the target note and different preset detection values.
6. The method of determining a tempo score according to claim 5, further comprising:
if a plurality of corresponding first starting points exist in the starting point detection interval corresponding to the target note, determining target starting point detection subintervals corresponding to the first starting points in the starting point detection interval respectively;
and determining the highest score among the first preset scores of the target note corresponding to the target starting point detection subintervals as the target score of the target note.
7. A method for tempo scoring determination according to claim 1, wherein said determining a first starting point at which each user sings a note in said dry sound signal comprises:
and determining a first starting point of each user singing note in the dry sound signal based on a preset starting point detection algorithm.
8. The method of determining a tempo score according to claim 7, wherein said preset onset detection algorithm comprises a spectrum-based onset detection algorithm and a pyin-based onset detection algorithm, and said determining a first onset point at which each user sings a note in said dry sound signal comprises:
determining a third starting point of each user singing note in the dry sound signal based on a starting point detection algorithm of a frequency spectrum;
determining a fourth starting point of each user singing note in the dry sound signal based on a pyin starting point detection algorithm;
and determining a first starting point of each user singing note in the dry sound signal according to the union set of the third starting point and the fourth starting point.
9. The method for determining a tempo score according to any one of claims 1-8, wherein before determining the tempo score for the target song sung by the user according to the first start point, the weight of each target note in the pitch-line file corresponding to the target song, and the start point detection interval corresponding to each target note in the pitch-line file, the method further comprises:
acquiring the weight of each target note in the pitch line file corresponding to the target song by the following method:
determining the weight of the first target note behind the air port as a first weight and determining the weight of the initial note in the pitch line file as a first weight;
determining the weight of a first target note in consecutive identical notes as a second weight, wherein the second weight is smaller than the first weight;
determining the weights of the target notes other than the first target note after the air port and other than the consecutive identical notes as a third weight, wherein the third weight is smaller than the second weight;
determining the weights of the target notes except the first target note in the consecutive identical notes as a fourth weight, wherein the fourth weight is smaller than the third weight;
and obtaining the weight of each target note in the pitch line file corresponding to the target song according to the first weight, the second weight, the third weight and the fourth weight.
10. The method of determining a tempo score according to claim 9, further comprising:
if the time interval between two adjacent target notes is greater than the threshold value of the air port, determining that an air port exists between the two adjacent target notes;
if the pitches of the at least two target notes are the same and the time interval between two adjacent target notes is smaller than the air port threshold value, determining that the at least two target notes are continuous same notes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111266761.8A CN113823270B (en) | 2021-10-28 | 2021-10-28 | Determination method, medium, device and computing equipment of rhythm score |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111266761.8A CN113823270B (en) | 2021-10-28 | 2021-10-28 | Determination method, medium, device and computing equipment of rhythm score |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113823270A (en) | 2021-12-21 |
CN113823270B (en) | 2024-05-03 |
Family
ID=78917573
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111266761.8A Active CN113823270B (en) | 2021-10-28 | 2021-10-28 | Determination method, medium, device and computing equipment of rhythm score |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113823270B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004184506A (en) * | 2002-11-29 | 2004-07-02 | Brother Ind Ltd | Karaoke machine and program |
JP2005107329A (en) * | 2003-09-30 | 2005-04-21 | Yamaha Corp | Karaoke machine |
WO2010115298A1 (en) * | 2009-04-07 | 2010-10-14 | Lin Wen Hsin | Automatic scoring method for karaoke singing accompaniment |
US20120067196A1 (en) * | 2009-06-02 | 2012-03-22 | Indian Institute of Technology Autonomous Research and Educational Institution | System and method for scoring a singing voice |
CN107767850A (en) * | 2016-08-23 | 2018-03-06 | 冯山泉 | A kind of singing marking method and system |
CN108008930A (en) * | 2017-11-30 | 2018-05-08 | 广州酷狗计算机科技有限公司 | The method and apparatus for determining K song score values |
CN109300485A (en) * | 2018-11-19 | 2019-02-01 | 北京达佳互联信息技术有限公司 | Methods of marking, device, electronic equipment and the computer storage medium of audio signal |
KR102107588B1 (en) * | 2018-10-31 | 2020-05-07 | 미디어스코프 주식회사 | Method for evaluating about singing and apparatus for executing the method |
CN112309351A (en) * | 2019-07-31 | 2021-02-02 | 武汉Tcl集团工业研究院有限公司 | Song generation method and device, intelligent terminal and storage medium |
CN113096689A (en) * | 2021-04-02 | 2021-07-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Song singing evaluation method, equipment and medium |
Non-Patent Citations (2)
Title |
---|
兰帆 等: "一种改进旋律匹配算法在MIDI演奏系统中的应用", 计算机与现代化, no. 06, 31 December 2009 (2009-12-31), pages 151 - 157 * |
樊儒昆 等: "动作与音乐的节奏特征匹配模型", 计算机辅助设计与图形学学报, vol. 22, no. 06, 30 June 2010 (2010-06-30), pages 990 - 996 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111429949A (en) * | 2020-04-16 | 2020-07-17 | 广州繁星互娱信息科技有限公司 | Pitch line generation method, device, equipment and storage medium |
CN111429949B (en) * | 2020-04-16 | 2023-10-13 | 广州繁星互娱信息科技有限公司 | Pitch line generation method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113823270B (en) | 2024-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||