CN110853678B - Trill identification scoring method, trill identification scoring device, terminal and non-transitory computer-readable storage medium - Google Patents
- Publication number: CN110853678B (application CN201911138267.6A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L25/51 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination
- G10L25/90 — Pitch determination of speech signals
Abstract
The invention relates to the technical field of voice signal processing, and provides a vibrato recognition scoring method, a vibrato recognition scoring device, a terminal, and a non-transitory computer-readable storage medium for real-time, fair, and scientific scoring of vibrato. The method comprises the following steps: judging whether the currently sung word needs vibrato recognition; if so, framing the audio data of the currently sung word and calculating the pitch mean, pitch standard deviation, and vibrato period of the word's current pitch sequence grouping; calculating the vibrato score of the current pitch sequence grouping from the pitch mean, pitch standard deviation, and vibrato period; and obtaining a real-time vibrato score for the currently sung word from the vibrato score of the current pitch sequence grouping and the vibrato scores of the word's other C-1 pitch sequence groupings. The method ensures real-time, objective, and scientific vibrato identification and evaluation.
Description
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a vibrato recognition scoring method, a vibrato recognition scoring device, a terminal and a non-transitory computer readable storage medium.
Background
A trill, also called a rolled sound, is a type of consonant produced by continuous vibration when the airstream strikes a soft articulatory organ in a relaxed state. Specifically, one articulator contacts another and vibrates under suitable aerodynamic conditions: a flexible articulator in the vocal tract rests against the surface of another articulator, forming a gap that opens and closes repeatedly while the airflow is steady; both the size of the gap and the airflow must stay within a narrow range.
During scored singing, a singer, moved by the song and the live atmosphere, may produce vibrato on certain words. Vibrato makes the performance more expressive, and recognizing and scoring it during singing evaluation stimulates singers' enthusiasm and interest in singing. More importantly, it prevents the pitch fluctuation of vibrato from being mistaken for a deviation from the original melody, which would otherwise leave a performance that deserves a high score with an unsatisfactory one.
The existing trill recognition method is a static one: it intercepts a sound segment to be examined, containing at least the pronunciation of one sung word, and then analyzes that segment to decide whether it is a trill. However, on one hand, the static method extracts only a single feature of the segment (whether or not it is a trill) and analyzes it only after the whole word has been sung, so it cannot be used for real-time vibrato scoring; on the other hand, when a singer is rated, vibrato is easily misjudged as inaccurate pitch, making the singing score unfair and unscientific.
Disclosure of Invention
The invention provides a vibrato identification and scoring method, a device, a terminal, and a non-transitory computer-readable storage medium, which identify the vibrato in a singer's performance in real time and score the singer's pitch scientifically and objectively.
In one aspect, the invention provides a vibrato identification scoring method, which comprises the following steps:
judging whether the currently sung word needs to be subjected to vibrato recognition;
if the currently sung word needs to be subjected to vibrato recognition, calculating the tone average value, the tone standard deviation and the vibrato period of the currently sung word in the current tone sequence group after framing the audio data of the currently sung word, wherein each plurality of tone sequences of the currently sung word form a tone sequence group;
calculating the trill score of the current tone sequence group of the currently sung word according to the tone average value, the tone standard deviation and the trill period of the current tone sequence group of the currently sung word;
and calculating the real-time trill score of the currently sung word based on the trill score of the current tone sequence group of the currently sung word and the trill scores of C-1 tone sequence groups of the currently sung word, wherein C is a natural number, and the C tone sequence groups form all the tone sequence groups of the currently sung word.
Specifically, the calculating the pitch mean, the pitch standard deviation and the vibrato period of the current pitch sequence group of the currently sung word after framing the audio data of the currently sung word includes:
processing the audio data of the currently sung word by adopting a sliding window mechanism to obtain a tone sequence group of the currently sung word;
calculating the pitch average value AVGTone of the current pitch sequence group of the currently sung word and the trill period of the current pitch sequence group of the currently sung word;
the pitch standard deviation SDTone of the current pitch sequence grouping of the currently sung word is calculated based on the pitch sliding window size ToneWLen in the sliding window grouping method and the pitch mean AVGTone of the current pitch sequence grouping of the currently sung word.
Specifically, the processing the audio data of the currently sung word by using the sliding window mechanism to obtain the tone sequence packet of the currently sung word includes:
framing the audio data of the currently sung word by a sliding window framing method to obtain a plurality of frames of audio data;
calculating the tone value of each frame of audio data of a plurality of frames of audio data to obtain a tone sequence;
filtering the tone sequence to obtain a filtered tone sequence;
and performing sliding window grouping on the filtered tone sequences by adopting a sliding window grouping method to obtain all tone sequence groups of the currently sung words.
Specifically, the calculating a vibrato score of the current pitch sequence group of the currently sung word according to the pitch mean, the pitch standard deviation and the vibrato period of the current pitch sequence group of the currently sung word includes:
obtaining an average tone score AVGToneScore of the current tone sequence group of the currently sung word by comparing the tone average of the current tone sequence group of the currently sung word with the reference tone of the currently sung word in the tone reference file;
determining the pitch standard deviation score SDToneScore of the current pitch sequence group of the currently sung word by judging whether the pitch standard deviation of the current pitch sequence group of the currently sung word is positioned in an interval formed by an upper limit and a lower limit of a preset pitch standard deviation;
determining a trill period score TrillTimeScore of the current pitch sequence packet of the currently sung word by judging whether the trill period of the current pitch sequence packet of the currently sung word is within an interval formed by the upper limit and the lower limit of a preset trill period;
according to the formula TScore = AVGToneScore × [ToneRate + (1 − ToneRate) × TrillTimeScore × SDToneScore], a vibrato score TScore of the current pitch sequence group of the currently sung word is calculated, where ToneRate is the weight given to the average tone score.
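A minimal sketch of this scoring formula follows; the weight ToneRate is not assigned a value in this text, so the 0.6 below is an assumed example, and the function name is illustrative:

```python
def trill_score(avg_tone_score, sd_tone_score, trill_time_score, tone_rate=0.6):
    """Combine the three sub-scores as
    TScore = AVGToneScore * [ToneRate + (1 - ToneRate) * TrillTimeScore * SDToneScore].
    tone_rate = 0.6 is an assumed example weight, not a value given in the text."""
    return avg_tone_score * (tone_rate + (1 - tone_rate) * trill_time_score * sd_tone_score)
```

With sub-scores in [0, 1], the bracketed factor stays between ToneRate and 1, so vibrato quality scales the tone score rather than adding to it.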
Specifically, the obtaining a real-time vibrato score of the currently sung word based on the vibrato score of the current pitch sequence group of the currently sung word and the vibrato scores of the C-1 pitch sequence groups of the currently sung word includes:
according to the same method as that for obtaining the trill score of the current tone sequence group of the currently sung word, obtaining the sum TSUM of the trill scores of C-1 tone sequence groups before the current tone sequence group of the currently sung word;
dividing the sum of the trill score of the current tone sequence group of the currently sung word and TSUM by the number of all tone sequence groups of the currently sung word, and taking the obtained result as the real-time trill score of the currently sung word.
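The running average described above can be sketched as follows (helper name is illustrative):

```python
def realtime_trill_score(group_scores, total_groups):
    """Running vibrato score for the word: the sum of the vibrato scores of the
    groupings processed so far (the current grouping's score plus TSUM),
    divided by the total number C of tone sequence groupings of the word."""
    return sum(group_scores) / total_groups
```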
In another aspect, the present invention provides a vibrato identification scoring apparatus, including:
the judging module is used for judging whether the currently sung word needs to be subjected to vibrato recognition;
the first calculation module is used for calculating the average value of the tones, the standard deviation of the tones and the trill period of the current tone sequence group of the currently sung word after framing the audio data of the currently sung word if the currently sung word needs to be subjected to trill recognition, wherein each plurality of tone sequences of the currently sung word form a tone sequence group;
the second calculation module is used for calculating the vibrato score of the current tone sequence group of the currently sung word according to the tone average value, the tone standard deviation and the vibrato period of the current tone sequence group of the currently sung word;
and the real-time vibrato score calculating module is used for calculating the real-time vibrato score of the currently sung word based on the vibrato score of the current tone sequence group of the currently sung word and the vibrato scores of the C-1 tone sequence groups of the currently sung word, wherein C is a natural number, and the C tone sequence groups form all the tone sequence groups of the currently sung word.
Specifically, the first calculation module includes:
a tone sequence grouping determination unit, which is used for processing the audio data of the currently sung word by adopting a sliding window mechanism to obtain the tone sequence grouping of the currently sung word;
a trill period calculation unit for calculating the pitch average AVGTone of the current pitch sequence group of the currently sung word and the trill period of the current pitch sequence group of the currently sung word;
and the pitch standard deviation calculating unit is used for calculating the pitch standard deviation SDTone of the current pitch sequence group of the currently sung word according to the pitch sliding window size ToneWLen in the sliding window grouping method and the pitch average value AVGTone.
Specifically, the tone sequence group determination unit includes:
the framing unit is used for framing the audio data of the currently sung word by a sliding window framing method to obtain a plurality of frames of audio data;
the pitch value calculation unit is used for calculating the pitch value of each frame of audio data of a plurality of frames of audio data to obtain a pitch sequence;
a filtering unit, configured to filter the tone sequence to obtain a filtered tone sequence;
and the grouping unit is used for performing sliding window grouping on the filtered tone sequences by adopting a sliding window grouping method to obtain all tone sequence groupings of the currently sung words.
In a third aspect, the present invention provides a terminal, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method according to the above technical solution are implemented.
In a fourth aspect, the invention provides a non-transitory computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, performs the steps of the method according to the above technical solution.
Unlike the prior art, which evaluates the trill of a word only after the word has been completely sung, the invention identifies and evaluates the trill of the word currently being sung: framing and subsequent processing begin as soon as the singer produces sound. On one hand, this ensures real-time trill identification and evaluation; on the other hand, when identifying and evaluating the trill, the method considers features of several dimensions, such as trill period, duration, pitch parameters, and trill amplitude, achieving the technical effect of evaluating the singer's level in real time from different aspects.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a vibrato identification scoring method according to an embodiment of the present invention;
FIG. 2 is a flow chart for calculating the pitch mean, pitch standard deviation, and trill period for a current pitch sequence grouping of a currently sung word provided by an embodiment of the present invention;
FIG. 3 is a flow chart of audio data processing of a currently sung word to obtain a tone sequence grouping of the currently sung word provided by an embodiment of the present invention;
FIG. 4 is a flow chart for calculating the vibrato score for the current pitch sequence grouping of the currently sung word provided by an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a vibrato identification scoring apparatus provided in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a first computing module according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a tone sequence grouping determination unit provided in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a second computing module according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a real-time vibrato score calculating module according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this specification, adjectives such as first and second may only be used to distinguish one element or action from another, without necessarily requiring or implying any actual such relationship or order. References to an element or component or step (etc.) should not be construed as limited to only one of the element, component, or step, but rather to one or more of the element, component, or step, etc., where the context permits.
In the present specification, the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The invention provides a vibrato identification scoring method and a vibrato identification scoring device, wherein the method is shown in figure 1 and mainly comprises steps S101 to S104, which are detailed as follows:
step S101: and judging whether the currently sung character needs to be subjected to vibrato recognition or not.
In singing, because every word singer can or can not sing vibrato, the action of carrying out vibrato recognition on single words without vibrato at a large probability is a waste of resources, and therefore, in the embodiment of the invention, whether the currently sung word needs vibrato recognition or not can be judged firstly.
The tone reference file records information such as the reference tone of each word, the start time of each sentence, and the duration of each word, and may also mark whether each word in the song carries vibrato. One way to judge whether the currently sung word needs vibrato recognition is therefore to parse the tone reference file: the words needing vibrato recognition can be confirmed directly from the vibrato marks. For a tone reference file without vibrato marks, another way is to determine, by analyzing the file, which words are candidates for vibrato recognition and scoring. Specifically, the minimum duration for which a word can plausibly produce vibrato (hereinafter the "minimum vibrato duration", denoted TMinTime) is 0.9 seconds: if a word is voiced for more than 0.9 seconds, the probability of vibrato can reach over 70%; otherwise it is below 50%, and the word is generally considered not to meet the conditions for producing vibrato. The minimum vibrato duration may be set between 0.7 and 1.1 seconds. Whether any word is the last word of a line of lyrics is determined from the start time of each sentence in the song reference file and the voiced duration of the word, and whether the word needs vibrato recognition is confirmed by comparing its duration with the minimum vibrato duration. If the currently sung word is the last word of a line of lyrics and its voiced duration exceeds the minimum vibrato duration, it is determined that the currently sung word needs vibrato recognition.
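The eligibility test above can be sketched as follows; the function and its reduction of the reference file to three numbers per word are illustrative assumptions:

```python
TMIN_TIME = 0.9  # minimum vibrato duration in seconds (the text suggests 0.7-1.1 s)

def needs_vibrato_recognition(word_start, word_duration, sentence_end):
    """A word qualifies when it is the last word of a lyric line and its
    voiced duration exceeds the minimum vibrato duration.  The reference
    file is reduced here to start time, duration, and line end time."""
    is_last_word = abs((word_start + word_duration) - sentence_end) < 1e-6
    return is_last_word and word_duration > TMIN_TIME
```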
It should be noted that, to bring the amplitudes of audio data of different bit widths to the same order of magnitude for subsequent operations, in the embodiment of the present invention the audio data stream may be normalized: Pulse Code Modulation (PCM) audio values of different bit widths are mapped into the range −1 to +1 according to
nval = val / 2^(bitnum − 1)
where val is a PCM audio sample represented as a fixed-point number, nval is the normalized audio value represented as a floating-point number, and bitnum is the bit width of the fixed-point value val.
In order to filter out human voice harmonic signals and high-frequency noise which contribute less to tone extraction and improve tone extraction accuracy, in the embodiment of the invention, after normalization processing is carried out on an audio data stream, low-pass filtering can be carried out on the normalized audio data stream, and the filtering frequency point is 8000 Hz.
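The normalization step can be sketched as below, assuming the usual mapping val / 2^(bitnum − 1) for fixed-point PCM of width bitnum (consistent with the variables described above):

```python
def normalize_pcm(samples, bitnum):
    """Map fixed-point PCM samples of bit width `bitnum` into [-1, +1)
    by dividing by 2**(bitnum - 1)."""
    scale = float(2 ** (bitnum - 1))
    return [s / scale for s in samples]
```

For 16-bit audio this maps −32768 to −1.0 and 16384 to 0.5, so streams of different bit widths become directly comparable.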
Step S102: if the word sung at present needs to carry out vibrato recognition, calculating the pitch average value, the pitch standard deviation and the vibrato period of the current pitch sequence group of the word sung at present by framing the audio data of the word sung at present, wherein each plurality of pitch sequences of the word sung at present form a pitch sequence group.
Several concepts related to the present invention need to be explained herein. Firstly, a tone sequence is a set of tone values arranged according to time sequence, the tone sequence of a word sung at present is a set of all the tone values of the word arranged according to time sequence, a plurality of tone sequences are divided into a group, the group is the tone sequence group, therefore, one word is formed by grouping a plurality of tone sequences, and each plurality of tone sequences form a tone sequence group; since the present invention is processed in units of one tone sequence grouping of a currently sung word through steps S101 to S104, the current tone sequence grouping of the currently sung word refers to one tone sequence grouping of the currently sung word that is currently being processed, and for convenience of description, hereinafter, when referring to "the current tone sequence grouping" refers to the current tone sequence grouping of the currently sung word.
As an embodiment of the present invention, calculating the pitch mean, the pitch standard deviation and the vibrato period of the current pitch sequence group of the currently sung word after framing the audio data of the currently sung word can be realized by the following steps S201 to S203, as shown in fig. 2, which are described in detail as follows:
step S201: and processing the audio data of the currently sung word by adopting a sliding window mechanism to obtain the tone sequence grouping of the currently sung word.
The sliding window mechanism truncates the signal with a suitable window function in order to reduce spectral energy leakage during signal processing. As an embodiment of the present invention, processing the audio data of the currently sung word with a sliding window mechanism to obtain the tone sequence groupings of the currently sung word can be implemented by the following steps S301 to S304, as shown in fig. 3, which are described in detail as follows:
step S301: and framing the audio data of the currently sung word by a sliding window framing method to obtain a plurality of frames of audio data.
As mentioned above, the sliding window mechanism truncates the signal with an appropriate window function to reduce spectral energy leakage. Since the main lobe of the Hamming window is twice as wide as that of the rectangular window (i.e., the bandwidth roughly doubles) while its side lobes are smaller, which benefits frequency-domain analysis, in the embodiment of the present invention the Hamming window may be selected as the window function when framing the audio data of the currently sung word with the sliding-window framing method, with a window duration of 20-40 ms, preferably 30 ms. As for the step of the Hamming window, its value trades computation against scoring accuracy: the smaller the step, the larger the computation and the more accurate the scoring result; the larger the step, the smaller the computation and the larger the scoring error. The step can therefore be chosen according to the processor's computing power while keeping scoring accuracy in mind; in the embodiment of the present invention it can be set in the range of 1-10 ms, preferably 6 ms.
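A sketch of the sliding-window framing with a Hamming window, using the preferred 30 ms window and 6 ms step (function names are illustrative):

```python
import math

def hamming_window(n):
    """Standard Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_audio(samples, sample_rate, win_ms=30, hop_ms=6):
    """Cut the signal into overlapping Hamming-windowed frames
    (30 ms window, 6 ms step are the preferred values above)."""
    win = sample_rate * win_ms // 1000
    hop = sample_rate * hop_ms // 1000
    w = hamming_window(win)
    return [[samples[start + i] * w[i] for i in range(win)]
            for start in range(0, len(samples) - win + 1, hop)]
```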
Step S302: and calculating the pitch value of each frame of audio data of the plurality of frames of audio data obtained in the step S301 to obtain a pitch sequence.
Each pitch value of each frame of audio data constitutes an element of a set of pitch sequences, and a pitch sequence is formed when all pitch values are calculated and arranged into a set according to time sequence.
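The text does not prescribe how the per-frame pitch value is computed; a common choice, sketched here purely for illustration, is a time-domain autocorrelation estimate over the vocal lag range:

```python
import math

def frame_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate one pitch value for a frame: find the lag (within the
    vocal range fmin..fmax) whose autocorrelation is largest, then
    convert that lag to a frequency in Hz."""
    lo = int(sample_rate / fmax)                 # shortest candidate period
    hi = min(int(sample_rate / fmin), len(frame) - 1)  # longest candidate period
    best_lag, best_corr = 0, 0.0
    for lag in range(lo, hi + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0
```

Applying this to each windowed frame in time order yields the pitch sequence used below.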
Step S303: the pitch sequence calculated in step S302 is filtered to obtain a filtered pitch sequence.
First, median filtering is performed on the pitch sequence obtained in step S302 to remove outliers. As a compromise between computation and accuracy, in the embodiment of the present invention 5-point median filtering may be chosen. To smooth the pitch fluctuation curve as much as possible while retaining the vibrato frequency, the median-filtered pitch sequence may further be low-pass filtered, where the sampling rate FS of the low-pass filter can be derived from the framing step duration HOPTime (in milliseconds) as
FS = 1000 / HOPTime
As for the cutoff frequency f0 of the low-pass filter, it can be determined from the vibrato period. Generally, the vibrato period ranges from 138 to 191 milliseconds; the corresponding frequency value is 1000 / (vibrato period), giving a vibrato frequency range of about 5.24 to 7.24 Hz. Based on the upper limit of 7.24 Hz and allowing some margin, in the embodiment of the invention the cutoff frequency of the low-pass filter may be chosen as f0 = 15 Hz.
Step S304: and performing sliding window grouping on the filtered tone sequences by adopting a sliding window grouping method to obtain all tone sequence groups of the currently sung words.
Further, the filtered pitch sequence is grouped with a sliding window. To compute the vibrato period within a grouping reliably, the length of the sliding window is generally chosen as 3-4 times the average vibrato period, and the window can be rectangular. Given the vibrato period range of 138-192 ms, 600 ms is preferably used as the sliding window duration ToneWTime, and the corresponding sliding window size ToneWLen is calculated as
ToneWLen = Round(ToneWTime / FrameHopTime)
where Round() is a rounding operation and FrameHopTime is the audio framing step in milliseconds. The step ToneHop of the sliding window is generally an integer multiple of the framing step FrameHopTime, i.e., an integer number of pitch values, in the range of 1-10 times; it can be chosen according to processor performance. Specifically, a smaller window step yields more tone groupings per word, higher calculation precision, and a larger amount of computation; a larger step yields fewer groupings, less computation, and lower precision. The preferred value is 2.
Step S202: the pitch mean AVGTone of the current pitch sequence grouping of the currently sung word and the trill period of the current pitch sequence grouping of the currently sung word are calculated.
The pitch mean AVGTone of the current tone sequence group is simply the average of all pitch values in the group, computed in the usual way: sum the pitch values of all tones in the current tone sequence group and divide by their number to obtain the pitch mean AVGTone of the group.
As for the vibrato period of the current tone sequence group: the tone sequence contained in the group is segmented by a candidate vibrato period to be detected, yielding several tone sequence segments, and the average pitch distance between adjacent segments is calculated. After the average pitch distances corresponding to all candidate vibrato periods have been computed, the candidate that yields the minimum average pitch distance is taken as the vibrato period of the group. The candidate vibrato periods may be taken at equal intervals in the range of 100-300 milliseconds, 10-30 points generally being suitable. The pitch distance D(k) of the k-th pair of adjacent tone segments is calculated by the following formula:

D(k) = (1/SegSize) × Σ_{i=1..SegSize} |SegTone_k(i) − SegTone_{k+1}(i)|
where k = 1 to SegNum, SegTone denotes a tone data segment cut out by the candidate vibrato period, and k indexes the adjacent segment pairs: when k = 1, D(1) is the pitch distance between the 1st and 2nd tone sequence segments; when k = 2, D(2) is the pitch distance between the 2nd and 3rd segments; and so on. SegSize denotes the segment length after the tone sequence contained in the group is cut with the candidate vibrato period as the segmentation unit, and SegNum denotes the number of tone sequence segments so obtained.
The average pitch distance TestD corresponding to the candidate vibrato period is calculated according to the following formula:

TestD = (1/SegNum) × Σ_k D(k)
the trill period TrillTime for the current pitch sequence packet is calculated according to the following expression:
TrillTime = MIN(TestD(n)), where n = 1 to TestNum
In the above expression, TestNum represents the number of vibrato cycles to be detected.
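The period search above can be sketched as follows; the segmentation and distance definitions follow the textual description, and the candidate range of 100-300 ms in 10 ms steps is one of the suggested choices:

```python
import math

def trill_period(tones, frame_hop_ms=10.0, candidates_ms=range(100, 301, 10)):
    """Return the candidate period (ms) whose adjacent segments match best."""
    best_period, best_dist = None, math.inf
    for period_ms in candidates_ms:
        seg_size = int(period_ms / frame_hop_ms)  # SegSize: segment length in tones
        seg_num = len(tones) // seg_size          # SegNum: number of segments
        if seg_num < 2:
            continue
        segs = [tones[k * seg_size:(k + 1) * seg_size] for k in range(seg_num)]
        # D(k): mean absolute pitch distance between adjacent segments
        dists = [sum(abs(a - b) for a, b in zip(segs[k], segs[k + 1])) / seg_size
                 for k in range(seg_num - 1)]
        test_d = sum(dists) / len(dists)          # TestD for this candidate
        if test_d < best_dist:
            best_dist, best_period = test_d, period_ms
    return best_period

# Synthetic vibrato with a 160 ms period (one cycle = 16 tones at 10 ms/hop)
tones = [60.0 + 0.5 * math.sin(2 * math.pi * i / 16) for i in range(60)]
print(trill_period(tones))  # 160
```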
Step S203: the pitch standard deviation SDTone of the current pitch sequence grouping of the currently sung word is calculated based on the pitch sliding window size ToneWLen in the sliding window grouping method and the pitch mean AVGTone of the current pitch sequence grouping of the currently sung word.
The pitch sliding-window size ToneWLen was given in the previous embodiments and the pitch mean AVGTone of the current tone sequence group of the currently sung word has already been found, so the pitch standard deviation SDTone of the current tone sequence group of the currently sung word can be calculated according to the following formula:

SDTone = sqrt( (1/ToneWLen) × Σ_{j=1..ToneWLen} (tone_m(j) − AVGTone)² )
where tone_m(j) denotes the pitch value of the j-th tone of the current tone sequence group.
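A minimal sketch of the AVGTone and SDTone computation for one group, using the ordinary population standard deviation implied by the text:

```python
import math

def pitch_stats(group):
    """Return (AVGTone, SDTone) for one pitch-sequence group of length ToneWLen."""
    avg_tone = sum(group) / len(group)                            # AVGTone
    variance = sum((t - avg_tone) ** 2 for t in group) / len(group)
    return avg_tone, math.sqrt(variance)                          # SDTone

avg, sd = pitch_stats([60.0, 60.4, 60.0, 59.6])
print(avg, round(sd, 4))  # mean 60.0 with a small spread
```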
Step S103: a trill score is calculated for the current pitch sequence grouping of the currently sung word based on the pitch mean, the pitch standard deviation, and the trill period of the current pitch sequence grouping of the currently sung word.
In an embodiment of the invention, the trill score of the current pitch sequence group of the currently sung word depends on factors such as the trill period score TrillTimeScore, the average pitch score AVGToneScore and the pitch standard deviation score SDToneScore of the current pitch sequence group, so these three scores need to be calculated first. As an embodiment of the present invention, calculating the vibrato score of the current pitch sequence group of the currently sung word from the pitch mean, the pitch standard deviation and the vibrato period of the group can be realized by steps S401 to S404 as described in fig. 4, detailed as follows:
step S401: and determining the trill period score TrillTimeScore of the current pitch sequence packet of the currently sung word by judging whether the trill period of the current pitch sequence packet of the currently sung word is within an interval formed by the upper limit and the lower limit of a preset trill period.
In the embodiment of the present invention, the trill period score TrillTimeScore of the current tone sequence group of the currently sung word is determined by comparing the calculated trill period TrillTime of the group with the preset trill period upper limit TTimeH and lower limit TTimeL. When TrillTime lies between TTimeL and TTimeH, the trill period of the current tone sequence group satisfies the trill period condition; otherwise it does not. Specifically, the trill period score TrillTimeScore of the current tone sequence group is calculated as follows:

TrillTimeScore = 1, if TTimeL ≤ TrillTime ≤ TTimeH; otherwise TrillTimeScore = 0
step S402: the average pitch score AVGToneScore for the current pitch sequence grouping of the currently sung word is obtained by comparing the pitch average for the current pitch sequence grouping of the currently sung word with the pitch of the currently sung word in the pitch reference file.
In the embodiment of the present invention, the pitch mean AVGTone of the current pitch sequence group calculated in the foregoing embodiment may be compared with the reference pitch of the currently sung word in the pitch reference file. The average pitch score of the group is determined on the principle that the closer AVGTone is to the reference pitch, the higher the score, and the further AVGTone deviates from the reference pitch, the lower the score. It should be noted that in the embodiment of the present invention the average pitch is a prerequisite for the overall vibrato score: the vibrato score is meaningful only when the average pitch matches the reference pitch of the pitch reference file. When the difference between the average pitch and the reference pitch exceeds one octave, i.e. 12 semitones, the singing is severely off pitch and the average pitch score AVGToneScore of the current pitch sequence group is 0; when the difference is less than one octave, AVGToneScore is calculated according to the degree of deviation of the average pitch from the reference pitch. The calculation formula of the average pitch score AVGToneScore of the current pitch sequence group is as follows:

AVGToneScore = 1 − |AVGTone − BaseTone| / 12, if |AVGTone − BaseTone| < 12; otherwise AVGToneScore = 0
wherein BaseTone is the reference tone of the currently sung word in the tone reference file.
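The piecewise rule above (zero beyond an octave, higher when closer) can be sketched as a linear falloff; the exact falloff shape is an assumption consistent with the description, not the patent's published formula:

```python
def avg_tone_score(avg_tone, base_tone):
    """0 when off by an octave (12 semitones) or more, else linear falloff to 1."""
    diff = abs(avg_tone - base_tone)
    return 0.0 if diff >= 12.0 else 1.0 - diff / 12.0

print(avg_tone_score(62.0, 60.0))  # 2 semitones off: about 0.833
print(avg_tone_score(75.0, 60.0))  # more than an octave off: 0.0
```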
Step S403: and determining the pitch standard deviation score SDToneScore of the current pitch sequence group of the currently sung word by judging whether the pitch standard deviation of the current pitch sequence group of the currently sung word is positioned in an interval formed by an upper limit and a lower limit of a preset pitch standard deviation.
The pitch standard deviation score of the current pitch sequence group reflects the degree of departure of the vibrato pitch from the average pitch of the current pitch sequence group, and within a certain range, the larger the pitch standard deviation of the current pitch sequence group is, the more obvious the vibrato is, and the higher the pitch standard deviation score of the current pitch sequence group is; the smaller the pitch standard deviation of the current pitch sequence packet, the more ambiguous the vibrato, and the lower the pitch standard deviation score of the current pitch sequence packet.
The frequency fluctuation of vibrato is generally 1.843 to 11.703 Hz, the fundamental frequency of the human voice is roughly 70 to 800 Hz, and the fluctuation of the vibrato around the average tone increases with the fundamental frequency. The maximum pitch difference maxdiv_tone and minimum pitch difference mindiv_tone between the fundamental frequency and the fundamental frequency plus the maximum vibrato fluctuation are calculated with the frequency-to-pitch conversion formula below; mindiv_tone/2 is taken as the lower limit SDLimit_L of the pitch standard deviation, and maxdiv_tone/2 as the upper limit SDLimit_H of the pitch standard deviation:

Pitch = 69 + 12 × log2(basefreq / 440)
In the above frequency-to-pitch conversion formula, Pitch denotes the pitch and basefreq denotes the fundamental frequency. Assuming a linear relationship between the fundamental frequency of the human voice and the vibrato fluctuation, the lower limit SDLimit_L of the pitch standard deviation calculated at a fundamental frequency of 800 Hz with a fluctuation of 11.703 Hz is 0.12, and the upper limit SDLimit_H calculated at a fundamental frequency of 70 Hz with a fluctuation of 1.843 Hz is 0.225. It should be noted that in actual singing the frequency fluctuation of the vibrato varies from singer to singer even at the same fundamental frequency. Considering the extreme cases, the lower limit SDLimit_L calculated at a fundamental frequency of 800 Hz with a vibrato fluctuation of 1.843 Hz is 0.0398, and the upper limit SDLimit_H calculated at a fundamental frequency of 70 Hz with a vibrato fluctuation of 11.703 Hz is 2.676. Allowing for error in the pitch calculation process and these extremes, in the embodiment of the present invention the upper limit SDLimit_H may be set to 1.5 and the lower limit SDLimit_L to 0.1.
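The limit values quoted above can be checked with the standard MIDI-style conversion Pitch = 69 + 12·log2(f/440), which reproduces the 0.225 figure and gives about 0.126 for the case the text rounds to 0.12 (a sketch under that assumed conversion):

```python
import math

def pitch(freq_hz):
    """MIDI-style conversion: Pitch = 69 + 12 * log2(basefreq / 440)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def sd_limit(base_freq_hz, fluctuation_hz):
    """Half the pitch spread caused by the vibrato fluctuation, as in the text."""
    return (pitch(base_freq_hz + fluctuation_hz) - pitch(base_freq_hz)) / 2.0

upper = sd_limit(70.0, 1.843)    # the SDLimit_H case in the text: about 0.225
lower = sd_limit(800.0, 11.703)  # the SDLimit_L case: about 0.126 (text: 0.12)
print(round(upper, 3), round(lower, 3))
```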
In an embodiment of the invention, the pitch standard deviation score SDToneScore of the current pitch sequence group of the currently sung word is calculated as follows:

SDToneScore = 1, if SDLimit_L ≤ SDTone ≤ SDLimit_H; otherwise SDToneScore = 0
step S404: according to the formula TScore ═ AVGToneScore [ ToneRate + (1-ToneRate) × (TrillTimeScore) × SDToneScore ], a vibrato score TScore of the current pitch sequence group of the currently sung word is calculated.
In this formula, ToneRate is the score proportion, i.e. the weight of pitch accuracy in the group's total vibrato score, in the range 0-1. A larger value places a higher requirement on vibrato pitch accuracy and reduces the weight of the vibrato portion of the score; conversely, a smaller value relaxes the pitch-accuracy requirement and increases the weight of the vibrato portion.
As can be seen from the formula for the vibrato score TScore of the current pitch sequence group, TScore consists of two parts: the average pitch score AVGToneScore of the group and the vibrato-related score of the group, where the vibrato-related score in turn comprises the vibrato period score TrillTimeScore and the pitch standard deviation score SDToneScore of the group. The average pitch score AVGToneScore is an important prerequisite for the vibrato score TScore; in other words, the pitch must be accurate before the subsequent vibrato score is meaningful. The vibrato period score TrillTimeScore and the pitch standard deviation score SDToneScore are both indispensable to the vibrato-related score, and neither is meaningful on its own: if the vibrato period does not satisfy the vibrato requirement, no vibrato-related score can be obtained, and likewise, if the pitch standard deviation does not reach the lower standard-deviation limit, no vibrato-related score can be obtained. The two scores are therefore multiplicative, and the vibrato-related score of the group is determined by both. Finally, in engineering practice, the score proportion can be adjusted to increase the weight of the vibrato-related score or of the average pitch score AVGToneScore in the vibrato score TScore of the current pitch sequence group.
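The multiplicative combination can be sketched as follows; the 0/1 gate scores and the example limits TTimeL = 100 ms and TTimeH = 300 ms are assumptions for illustration (the patent does not fix these preset values here), while the SD limits 0.1 and 1.5 come from the embodiment above:

```python
def trill_time_score(trill_time_ms, t_time_l=100.0, t_time_h=300.0):
    """0/1 gate on the vibrato period; the preset limits are illustrative."""
    return 1.0 if t_time_l <= trill_time_ms <= t_time_h else 0.0

def sd_tone_score(sd_tone, sd_limit_l=0.1, sd_limit_h=1.5):
    """0/1 gate on the pitch standard deviation (limits from the embodiment)."""
    return 1.0 if sd_limit_l <= sd_tone <= sd_limit_h else 0.0

def t_score(avg_score, period_score, sd_score, tone_rate=0.5):
    """TScore = AVGToneScore * [ToneRate + (1 - ToneRate) * TrillTimeScore * SDToneScore]."""
    return avg_score * (tone_rate + (1.0 - tone_rate) * period_score * sd_score)

print(t_score(1.0, trill_time_score(160), sd_tone_score(0.3)))  # pitch + valid vibrato
print(t_score(1.0, trill_time_score(500), sd_tone_score(0.3)))  # vibrato gate fails
```

Note how a failed period gate leaves only the ToneRate fraction, and a zero average pitch score zeroes everything, matching the prerequisite role of pitch accuracy.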
Step S104: and calculating the real-time trill score of the currently sung word based on the trill score of the current tone sequence group of the currently sung word and the trill scores of C-1 tone sequence groups of the currently sung word, wherein C is a natural number, and the C tone sequence groups form all the tone sequence groups of the currently sung word.
As an embodiment of the invention, deriving the real-time vibrato score of the currently sung word based on the vibrato score of the current tone sequence group and the vibrato scores of the C-1 other tone sequence groups of the word may proceed as follows: obtain the sum TSUM of the vibrato scores of the C-1 tone sequence groups preceding the current tone sequence group, using the same method as that used to obtain the vibrato score of the current group; then divide the sum of the current group's vibrato score and TSUM by the number of all tone sequence groups of the word, and take the result as the real-time vibrato score of the word. In other words, the vibrato score of each tone sequence group is in turn calculated as the score of the current group; when the score of the last tone sequence group of the word is calculated, the sum TSUM of the scores of the C-1 preceding groups is already available, so the score of the last group (denoted TScoreCur) plus TSUM, divided by the number of all tone sequence groups of the word, i.e. C, gives the real-time vibrato score of the word. If the real-time vibrato score of the currently sung word is denoted AScore, the calculation formula is as follows:

AScore = (TScoreCur + TSUM) / C
wherein, the number of all tone sequence groups of the currently sung word, i.e. C, can be obtained by the following calculation formula:
wherein, FLOOR () represents the value of the integer part, WordTime represents the exact duration of the currently sung word stored in the reference tone file, in milliseconds, FrameTime represents the time to collect a frame of data, FrameTime represents FrameLen 1000/FS, where FrameLen represents the number of audio sample data included in a frame. It should be noted that the value of C may be calculated before the currently sung word is sung. It should be noted that the real-time vibrato score AScore of the currently sung word may be added to an existing non-tonal scoring system for intensity, similarity, etc. When the existing grading method of tone and the like is used, the ToneRate can be set to be 0, the average tone score AVGToneSCore is calculated by the original tone grading method, and the real-time vibrato score ASore of the currently sung word is accumulated into the original tone grading value in an additional score mode, so that the portability of the real-time vibrato grading method in an original grading system is facilitated, and the software development period is shortened. It should be noted that the real-time vibrato score AScore of the currently sung word may be used not only for real-time evaluation of the vibrato itself of a word, i.e. how many parts of the word are obtained in terms of vibrato in the above evaluation method, but also for judging whether a word actually sung vibrato, and the specific method may be to compare AScore with (ToneRate 0.6) × TeamNUM, which is 0.6 times the full tone of all tone sequence groups of the currently sung word, and if AScore > (ToneRate 0.6) × TeamNUM, it may be determined that the currently sung word sung vibrato, where TeamNUM represents the number of all tone sequence group groups of the currently sung word, and the meaning is equivalent to the meaning of C described above.
As can be seen from the trill recognition scoring method illustrated in fig. 1, unlike the prior art, in which the vibrato of a word is evaluated only after the word has been sung, the invention performs vibrato recognition and evaluation on the word currently being sung, i.e. framing and subsequent processing begin as soon as the singer's voice arrives. On the one hand, this guarantees the real-time performance of the vibrato recognition and evaluation; on the other hand, the method considers features of multiple dimensions, such as the vibrato period, duration, pitch parameters and vibrato amplitude, achieving the technical effect of evaluating the singer's level in real time from different aspects.
Referring to fig. 5, a vibrato identification and scoring apparatus provided in an embodiment of the present invention includes a determining module 501, a first calculating module 502, a second calculating module 503, and a real-time vibrato score calculating module 504, which are detailed as follows:
a judging module 501, configured to judge whether a currently sung word needs to be subjected to vibrato recognition;
a first calculating module 502, configured to calculate a pitch mean, a pitch standard deviation and a vibrato period of the current pitch sequence group of the currently sung word after framing the audio data of the currently sung word, if the currently sung word needs to be subjected to vibrato recognition, where every several pitch sequences of the currently sung word form one pitch sequence group;
a second calculating module 503, configured to calculate a vibrato score of the current pitch sequence group of the currently sung word according to the pitch average, the pitch standard deviation, and the vibrato period of the current pitch sequence group of the currently sung word;
a real-time trill score calculation module 504, configured to obtain a real-time trill score of the currently sung word based on the trill score of the current tone sequence group of the currently sung word and the trill scores of the C-1 tone sequence groups of the currently sung word, where C is a natural number, and the C tone sequence groups constitute all the tone sequence groups of the currently sung word.
Specifically, the first calculating module 502 illustrated in fig. 5 may include a pitch sequence grouping determining unit 601, a vibrato period calculating unit 602, and a pitch standard deviation calculating unit 603, and the structure thereof is shown in fig. 6, and detailed as follows:
a tone sequence grouping determination unit 601, configured to process audio data of a currently sung word by using a sliding window mechanism, to obtain a tone sequence grouping of the currently sung word;
a trill period calculation unit 602 for calculating a pitch average AVGTone of the current pitch sequence group of the currently sung word and a trill period of the current pitch sequence group of the currently sung word;
a pitch standard deviation calculating unit 603, configured to calculate a pitch standard deviation SDTone of a current pitch sequence grouping of a currently sung word according to a pitch sliding window size ToneWLen in a sliding window grouping method and the pitch average AVGTone.
Specifically, the pitch sequence grouping determination unit 601 illustrated in fig. 6 may include a framing unit 701, a pitch value calculation unit 702, a filtering unit 703 and a grouping unit 704, and the structure thereof is shown in fig. 7, and detailed as follows:
a framing unit 701, configured to frame the audio data of the currently sung word by using a sliding window framing method to obtain a plurality of frames of audio data;
a pitch value calculating unit 702, configured to calculate a pitch value of each frame of audio data of a plurality of frames of audio data, so as to obtain a pitch sequence;
a filtering unit 703, configured to filter the tone sequence to obtain a filtered tone sequence;
a grouping unit 704, configured to perform sliding window grouping on the filtered tone sequence by using a sliding window grouping method, so as to obtain all tone sequence groups of the currently sung word.
Specifically, the second calculating module 503 illustrated in fig. 5 may include an average pitch score calculating unit 801, a pitch standard deviation score calculating unit 802, a vibrato period score calculating unit 803, and a vibrato score calculating unit 804, which are shown in fig. 8 and detailed as follows:
an average pitch score calculation unit 801 configured to obtain an average pitch score AVGToneScore of a current pitch sequence group of a currently sung word by comparing a pitch average of the current pitch sequence group of the currently sung word with a reference pitch of the currently sung word in a pitch reference file;
a tone standard deviation score calculating unit 802, configured to determine a tone standard deviation score SDToneScore of a current tone sequence grouping of a currently sung word by determining whether a tone standard deviation of the current tone sequence grouping of the currently sung word is within an interval formed by upper and lower limits of a preset tone standard deviation;
a trill period score calculation unit 803, configured to determine a trill period score trilltiescore of a current pitch sequence packet of a currently sung word by determining whether a trill period of the current pitch sequence packet of the currently sung word is within an interval formed by upper and lower limits of a preset trill period;
a vibrato score calculating unit 804, configured to calculate the vibrato score TScore of the current pitch sequence group of the currently sung word according to the formula TScore = AVGToneScore × [ToneRate + (1 − ToneRate) × TrillTimeScore × SDToneScore].
Specifically, the real-time vibrato score calculating module 504 illustrated in fig. 5 may include an obtaining unit 901 and an average value calculating unit 902, which are shown in fig. 9 in a structural diagram, and detailed as follows:
an obtaining unit 901, configured to obtain a sum TSUM of trill scores of C-1 pitch sequence groups before a current pitch sequence group of a currently sung word, according to a method similar to that for obtaining a trill score of the current pitch sequence group of the currently sung word;
and an average value calculating unit 902, configured to divide the sum of the trill score of the current tone sequence group of the currently sung word and the TSUM by the number of all tone sequence groups of the currently sung word, and obtain a result as a real-time trill score of the currently sung word.
From the above description, it can be seen that the invention realizes real-time scoring of the song based on the basic unit words by real-time scoring of the tone, rhythm and tone intensity of each word in the song sung by the singer, thereby solving the technical problems of inaccurate scoring, lack of interest and interactivity of the existing singing scoring system and achieving the technical effect of real-time evaluation of the singer level from different aspects.
Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present invention. As shown in fig. 10, the terminal 10 of this embodiment may be a microphone speaker as exemplified in fig. 2. The terminal illustrated in fig. 10 mainly includes: a processor 100, a memory 101, and a computer program 102, such as a program for a vibrato recognition scoring method, stored in the memory 101 and operable on the processor 100. The processor 100 executes the computer program 102 to implement the steps of the above-described embodiments of the vibrato identification and scoring method, such as steps S101 to S104 shown in fig. 1. Alternatively, the processor 100 executes the computer program 102 to implement the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the determining module 501, the first calculating module 502, the second calculating module 503 and the real-time vibrato score calculating module 504 shown in fig. 5.
Illustratively, the computer program 102 of the vibrato recognition scoring method mainly includes: judging whether the currently sung word needs to be subjected to vibrato recognition; if so, calculating the pitch mean, the pitch standard deviation and the vibrato period of the current pitch sequence group of the currently sung word after framing the audio data of the word, where every several pitch sequences of the word form one pitch sequence group; calculating the vibrato score of the current pitch sequence group of the word according to the pitch mean, the pitch standard deviation and the vibrato period of the group; and calculating the real-time vibrato score of the word based on the vibrato score of the current pitch sequence group and the vibrato scores of C-1 pitch sequence groups of the word, where C is a natural number and the C pitch sequence groups constitute all the pitch sequence groups of the word. The computer program 102 may be partitioned into one or more modules/units, which are stored in the memory 101 and executed by the processor 100 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 102 in the terminal 10.
For example, the computer program 102 may be divided into the functions of the determination module 501, the first calculation module 502, the second calculation module 503, and the real-time vibrato score calculation module 504 (modules in the virtual device), and the specific functions of each module are as follows: a judging module 501, configured to judge whether a currently sung word needs to be subjected to vibrato recognition; a first calculating module 502, configured to calculate a pitch average, a pitch standard deviation, and a vibrato period of a current pitch sequence group of a currently sung word after framing audio data of the currently sung word if the currently sung word needs to be subjected to vibrato recognition, where each of a plurality of pitch sequences of the currently sung word constitutes a pitch sequence group; a second calculating module 503, configured to calculate a vibrato score of the current pitch sequence group of the currently sung word according to the pitch average, the pitch standard deviation, and the vibrato period of the current pitch sequence group of the currently sung word; a real-time trill score calculating module 504, configured to obtain a real-time trill score of the currently sung word based on the trill score of the current pitch sequence group of the currently sung word and the trill scores of the C-1 pitch sequence groups of the currently sung word, where C is a natural number, and the C pitch sequence groups constitute all the pitch sequence groups of the currently sung word.
The terminal 10 may include, but is not limited to, a processor 100, a memory 101. Those skilled in the art will appreciate that fig. 10 is merely exemplary of a terminal 10 and does not constitute a limitation of the terminal 10 and may include more or fewer components than illustrated, or some of the components may be combined, or different components, e.g., computing devices may also include input output devices, network access devices, buses, etc.
The Processor 100 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable gate array (FPGA) or other Programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 101 may be an internal storage unit of the terminal 10, such as a hard disk or a memory of the terminal 10. The memory 101 may also be an external storage device of the terminal 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the terminal 10. Further, the memory 101 may also include both internal and external memory units of the terminal 10. The memory 101 is used for storing computer programs and other programs and data required by the terminal. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the device is divided into different functional units or modules so as to complete all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described device/terminal embodiments are merely illustrative, and for example, a division of a module or a unit is only one type of logical function division, and other division manners may be available in actual implementation, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a non-transitory computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the embodiments of the present invention may also be implemented by instructing related hardware through a computer program. The computer program of the vibrato recognition scoring method may be stored in a non-transitory computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments, namely: determining whether the currently sung word needs vibrato recognition; if so, framing the audio data of the currently sung word and then calculating the pitch average, the pitch standard deviation, and the vibrato period of the current pitch sequence grouping of the word, wherein every plurality of pitch sequences of the word forms a pitch sequence grouping; calculating the vibrato score of the current pitch sequence grouping of the word from the pitch average, the pitch standard deviation, and the vibrato period of that grouping; and calculating the real-time vibrato score of the word based on the vibrato score of the current pitch sequence grouping and the vibrato scores of the C-1 other pitch sequence groupings of the word, wherein C is a natural number and the C pitch sequence groupings constitute all pitch sequence groupings of the word. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
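For illustration only (not the patented implementation), the stored-program steps just listed can be sketched as a top-level driver in Python. All callable arguments here are hypothetical stand-ins for the steps named in the text; each step is elaborated in the claims below.

```python
def score_word_vibrato(word_audio, needs_vibrato, stats_of, score_of, earlier_scores):
    """Top-level flow of the vibrato recognition scoring method (sketch).

    Hypothetical stand-ins:
      needs_vibrato  - step 1: does this word need vibrato recognition?
      stats_of       - step 2: frame the audio and return
                       (pitch_avg, pitch_sd, vibrato_period) for the current grouping
      score_of       - step 3: turn those statistics into the grouping's vibrato score
      earlier_scores - the C-1 previous grouping scores of this word
    """
    if not needs_vibrato(word_audio):
        return None
    avg, sd, period = stats_of(word_audio)
    current = score_of(avg, sd, period)
    # step 4: real-time score averaged over all C groupings of the word
    scores = earlier_scores + [current]
    return sum(scores) / len(scores)
```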
The non-transitory computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the non-transitory computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, non-transitory computer-readable media do not include electrical carrier signals or telecommunications signals. The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as being included therein.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (11)
1. A vibrato recognition scoring method, wherein the method comprises:
judging whether a currently sung word needs to be subjected to vibrato recognition;
if the currently sung word needs to be subjected to vibrato recognition, framing the audio data of the word and then calculating a pitch average, a pitch standard deviation, and a vibrato period of a current pitch sequence grouping of the word, wherein every plurality of pitch sequences of the word forms a pitch sequence grouping;
calculating a vibrato score for a current pitch sequence grouping of the word based on a pitch mean, a pitch standard deviation, and a vibrato period of the current pitch sequence grouping;
calculating a real-time vibrato score for the word based on the vibrato score of the current pitch sequence grouping and the vibrato scores of the C-1 other pitch sequence groupings of the word, wherein C is a natural number and the C pitch sequence groupings constitute all pitch sequence groupings of the word;
wherein said calculating a vibrato score for the current pitch sequence grouping of the word based on the pitch average, the pitch standard deviation, and the vibrato period of the current pitch sequence grouping comprises:
obtaining an average pitch score AVGToneScore for the current pitch sequence grouping by comparing the pitch average for the current pitch sequence grouping to the reference pitch for the word in a pitch reference file;
determining a pitch standard deviation score SDToneScore of the current pitch sequence grouping by judging whether the pitch standard deviation of the current pitch sequence grouping is within an interval formed by a preset upper limit and lower limit of the pitch standard deviation;
determining a vibrato period score TrillTimeScore of the current pitch sequence grouping by judging whether the vibrato period of the current pitch sequence grouping is within an interval formed by a preset upper limit and lower limit of the vibrato period;
calculating the vibrato score TScore of the current pitch sequence grouping according to the formula TScore = AVGToneScore × [ToneRate + (1 − ToneRate) × TrillTimeScore × SDToneScore], wherein ToneRate denotes a score weight and takes a value between 0 and 1.
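As an illustration only (not part of the claims), the score combination could be sketched in Python as follows; the bracket placement is reconstructed from the garbled formula text, and the example inputs are hypothetical:

```python
def vibrato_score(avg_tone_score, sd_tone_score, trill_time_score, tone_rate=0.5):
    """Combine the three sub-scores into one vibrato score.

    TScore = AVGToneScore * [ToneRate + (1 - ToneRate) * TrillTimeScore * SDToneScore]
    ToneRate (the score weight) must lie between 0 and 1.
    """
    assert 0.0 <= tone_rate <= 1.0
    return avg_tone_score * (tone_rate + (1.0 - tone_rate) * trill_time_score * sd_tone_score)
```

With tone_rate = 1 the result reduces to the average pitch score alone; with tone_rate < 1 the period and standard-deviation scores scale the remainder.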
2. The vibrato recognition scoring method of claim 1, wherein said calculating a pitch average, a pitch standard deviation, and a vibrato period of the current pitch sequence grouping of the word after framing the audio data of the word comprises:
processing the audio data of the words by adopting a sliding window mechanism to obtain tone sequence groups of the words;
calculating a pitch average AVGTone of the current pitch sequence grouping and a vibrato period of the current pitch sequence grouping;
and calculating a pitch standard deviation SDTone of the current pitch sequence grouping according to a pitch sliding-window size ToneWLen in a sliding-window grouping method and the pitch average AVGTone.
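A minimal sketch of these per-grouping statistics, assuming a grouping is simply a list of ToneWLen pitch values. The claim does not spell out the vibrato period estimator, so a hypothetical sign-change count of the mean-removed pitch curve stands in for it here:

```python
import math

def grouping_stats(pitches):
    """Return (AVGTone, SDTone, TrillTime) for one pitch sequence grouping.

    AVGTone   - mean pitch over the grouping
    SDTone    - (population) standard deviation over the ToneWLen pitches
    TrillTime - hypothetical period estimate: frames per full oscillation,
                derived from sign changes of (pitch - AVGTone)
    """
    tone_wlen = len(pitches)                  # pitch sliding-window size ToneWLen
    avg_tone = sum(pitches) / tone_wlen
    sd_tone = math.sqrt(sum((p - avg_tone) ** 2 for p in pitches) / tone_wlen)
    signs = [p - avg_tone >= 0 for p in pitches]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    trill_time = (2 * tone_wlen / crossings) if crossings else float("inf")
    return avg_tone, sd_tone, trill_time
```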
3. The vibrato recognition scoring method of claim 2, wherein said processing the audio data of the word using a sliding-window mechanism to obtain the pitch sequence groupings of the word comprises:
framing the audio data of the word by a sliding-window framing method to obtain multiple frames of audio data;
calculating a pitch value for each frame of the multiple frames of audio data to obtain a pitch sequence;
filtering the pitch sequence to obtain a filtered pitch sequence;
and performing sliding-window grouping on the filtered pitch sequence using the sliding-window grouping method to obtain all pitch sequence groupings of the word.
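The four steps above can be sketched as one pipeline, illustrative only and under stated assumptions: the per-frame pitch estimator is passed in as a callable (the claim does not fix one), and a hypothetical median filter stands in for the unspecified filtering step.

```python
def sliding_windows(seq, size, hop):
    """Generic sliding window: yields frames for framing, groups for grouping."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, hop)]

def median_filter(tones, radius=1):
    """Hypothetical stand-in for the unspecified pitch-sequence filtering."""
    out = []
    for i in range(len(tones)):
        window = sorted(tones[max(0, i - radius):i + radius + 1])
        out.append(window[len(window) // 2])
    return out

def pitch_groupings(samples, frame_len, frame_hop, tone_wlen, tone_hop, pitch_of_frame):
    """Frame the audio, compute per-frame pitch, filter, then group by sliding window."""
    frames = sliding_windows(samples, frame_len, frame_hop)   # sliding-window framing
    tone_seq = [pitch_of_frame(f) for f in frames]            # pitch value per frame
    tone_seq = median_filter(tone_seq)                        # filter the pitch sequence
    return sliding_windows(tone_seq, tone_wlen, tone_hop)     # sliding-window grouping
```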
4. The vibrato recognition scoring method of claim 2, wherein
the vibrato period score TrillTimeScore of the current pitch sequence grouping is calculated as follows:
wherein TrillTime is the vibrato period, TTimeH is a preset upper vibrato period limit, and TTimeL is a preset lower vibrato period limit;
the average pitch score AVGToneScore of the current pitch sequence grouping is calculated as follows:
wherein BaseTone is the reference pitch of the currently sung word in the pitch reference file, AVGTone is the pitch average of the current pitch sequence grouping, and SDTone is the pitch standard deviation of the current pitch sequence grouping;
and the pitch standard deviation score SDToneScore of the current pitch sequence grouping of the currently sung word is calculated as follows:
wherein SDLimit_H is a preset upper pitch standard deviation limit and SDLimit_L is a preset lower pitch standard deviation limit.
5. The vibrato recognition scoring method of claim 1, wherein said calculating a real-time vibrato score for the word based on the vibrato score of the current pitch sequence grouping and the vibrato scores of the C-1 other pitch sequence groupings of the word comprises:
obtaining a sum TSUM of the vibrato scores of the C-1 pitch sequence groupings preceding the current pitch sequence grouping, using the same method as for obtaining the vibrato score of the current pitch sequence grouping;
and dividing the sum of the vibrato score of the current pitch sequence grouping and TSUM by the number of all pitch sequence groupings of the word, the result being the real-time vibrato score of the word.
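Illustrative only: the real-time score is a running average of the grouping scores, which could be sketched as:

```python
def realtime_vibrato_score(group_scores):
    """Real-time vibrato score after C groupings: the current grouping's
    score plus TSUM (the sum of the C-1 earlier grouping scores), divided
    by the number of all pitch sequence groupings of the word so far."""
    c = len(group_scores)              # C pitch sequence groupings so far
    tsum = sum(group_scores[:-1])      # TSUM over the C-1 earlier groupings
    return (group_scores[-1] + tsum) / c
```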
6. A vibrato recognition scoring device, wherein the device comprises:
a judging module, used for judging whether a currently sung word needs to be subjected to vibrato recognition;
a first calculation module, used for, if the currently sung word needs to be subjected to vibrato recognition, framing the audio data of the word and then calculating a pitch average, a pitch standard deviation, and a vibrato period of a current pitch sequence grouping of the word, wherein every plurality of pitch sequences of the word forms a pitch sequence grouping;
a second calculation module, used for calculating a vibrato score of the current pitch sequence grouping according to the pitch average, the pitch standard deviation, and the vibrato period of the current pitch sequence grouping;
a real-time vibrato score calculation module, used for calculating the real-time vibrato score of the word based on the vibrato score of the current pitch sequence grouping and the vibrato scores of the C-1 other pitch sequence groupings of the word, wherein C is a natural number and the C pitch sequence groupings constitute all pitch sequence groupings of the word;
wherein the average pitch score AVGToneScore of the current pitch sequence grouping is obtained by comparing the pitch average of the current pitch sequence grouping with the reference pitch of the word in a pitch reference file;
a pitch standard deviation score SDToneScore of the current pitch sequence grouping is determined by judging whether the pitch standard deviation of the current pitch sequence grouping is within an interval formed by a preset upper limit and lower limit of the pitch standard deviation;
a vibrato period score TrillTimeScore of the current pitch sequence grouping is determined by judging whether the vibrato period of the current pitch sequence grouping is within an interval formed by a preset upper limit and lower limit of the vibrato period;
and the vibrato score TScore of the current pitch sequence grouping is calculated according to the formula TScore = AVGToneScore × [ToneRate + (1 − ToneRate) × TrillTimeScore × SDToneScore], wherein ToneRate denotes a score weight and takes a value between 0 and 1.
7. The vibrato recognition scoring device of claim 6, wherein the first calculation module comprises:
a pitch sequence grouping determination unit, configured to process the audio data of the word using a sliding-window mechanism to obtain the pitch sequence groupings of the word;
a vibrato period calculation unit, configured to calculate a pitch average AVGTone of the current pitch sequence grouping and a vibrato period of the current pitch sequence grouping;
and a pitch standard deviation calculation unit, configured to calculate the pitch standard deviation SDTone of the current pitch sequence grouping according to the pitch sliding-window size ToneWLen in the sliding-window grouping method and the pitch average AVGTone.
8. The vibrato recognition scoring device of claim 7, wherein the pitch sequence grouping determination unit comprises:
a framing unit, configured to frame the audio data of the word by a sliding-window framing method to obtain multiple frames of audio data;
a pitch value calculation unit, configured to calculate a pitch value for each frame of the multiple frames of audio data to obtain a pitch sequence;
a filtering unit, configured to filter the pitch sequence to obtain a filtered pitch sequence;
and a grouping unit, configured to perform sliding-window grouping on the filtered pitch sequence using the sliding-window grouping method to obtain all pitch sequence groupings of the word.
9. The vibrato recognition scoring device of claim 7, wherein
the vibrato period score TrillTimeScore of the current pitch sequence grouping is calculated as follows:
wherein TrillTime is the vibrato period, TTimeH is a preset upper vibrato period limit, and TTimeL is a preset lower vibrato period limit;
the average pitch score AVGToneScore of the current pitch sequence grouping is calculated as follows:
wherein BaseTone is the reference pitch of the currently sung word in the pitch reference file, AVGTone is the pitch average of the current pitch sequence grouping, and SDTone is the pitch standard deviation of the current pitch sequence grouping;
and the pitch standard deviation score SDToneScore of the current pitch sequence grouping of the currently sung word is calculated as follows:
wherein SDLimit_H is a preset upper pitch standard deviation limit and SDLimit_L is a preset lower pitch standard deviation limit.
10. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.
11. A non-transitory computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911138267.6A CN110853678B (en) | 2019-11-20 | 2019-11-20 | Trill identification scoring method, trill identification scoring device, terminal and non-transitory computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110853678A CN110853678A (en) | 2020-02-28 |
CN110853678B true CN110853678B (en) | 2022-09-06 |
Family
ID=69602563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911138267.6A Active CN110853678B (en) | 2019-11-20 | 2019-11-20 | Trill identification scoring method, trill identification scoring device, terminal and non-transitory computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110853678B (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008015214A (en) * | 2006-07-06 | 2008-01-24 | Dds:Kk | Singing skill evaluation method and karaoke machine |
JP4900017B2 (en) * | 2007-04-17 | 2012-03-21 | ヤマハ株式会社 | Vibrato detection device, vibrato evaluation device, vibrato detection method, vibrato evaluation method and program |
JP5147389B2 (en) * | 2007-12-28 | 2013-02-20 | 任天堂株式会社 | Music presenting apparatus, music presenting program, music presenting system, music presenting method |
CN106997769B (en) * | 2017-03-25 | 2020-04-24 | 腾讯音乐娱乐(深圳)有限公司 | Trill recognition method and device |
CN107978322A (en) * | 2017-11-27 | 2018-05-01 | 北京酷我科技有限公司 | A kind of K songs marking algorithm |
CN108415942B (en) * | 2018-01-30 | 2021-06-25 | 福建星网视易信息系统有限公司 | Personalized teaching and singing scoring two-dimensional code generation method, device and system |
CN109979485B (en) * | 2019-04-29 | 2023-05-23 | 北京小唱科技有限公司 | Audio evaluation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||