CN110867194A - Audio scoring method, device, equipment and storage medium - Google Patents


Info

Publication number
CN110867194A
Authority
CN
China
Prior art keywords
vibrato
fragment
determining
sequence
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911072491.XA
Other languages
Chinese (zh)
Other versions
CN110867194B (en)
Inventor
江益靓
林森
庄晓滨
张超鹏
曹蜀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN201911072491.XA
Publication of CN110867194A
Application granted
Publication of CN110867194B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
    • G10L25/18: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/21: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The application discloses an audio scoring method, apparatus, device and storage medium, and belongs to the technical field of computers. The method includes: acquiring a plurality of vibrato segments corresponding to a fundamental frequency sequence of the audio to be scored; acquiring vibrato feature information of the fundamental frequency sequences corresponding to the plurality of vibrato segments, wherein the vibrato feature information includes at least spectral distribution stability and sequence amplitude; determining vibrato scores of the plurality of vibrato segments according to the vibrato segment durations of the plurality of vibrato segments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato segments; determining an audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato segments and the number of vibrato segments; and scoring the audio to be scored based on the audio vibrato score. In this way, the method considers the vibrato segments in addition to the intonation considered by traditional scoring, and solves the problem in the prior art that a scoring method which considers only the intonation of the audio to be scored is too narrow and therefore yields a one-sided score.

Description

Audio scoring method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular to an audio scoring method, apparatus, device, and storage medium.
Background
With the development of network technology, more and more music software provides a singing mode. To help a user better master the singing skills of a song, such software can provide a scoring function that scores the audio to be scored.
The existing scoring method generally compares the fundamental frequency sequence of the original audio with the fundamental frequency sequence of the audio to be scored, and scores the audio to be scored according to a related matching algorithm. For example, the fundamental frequency sequence of the original singing audio and the fundamental frequency sequence of the audio to be scored can be respectively extracted through a fundamental frequency extraction method, the similarity between the fundamental frequency sequence of the original singing audio and the fundamental frequency sequence of the audio to be scored is determined through a dynamic time warping method, and then the score of the audio to be scored is determined according to the similarity.
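The prior-art comparison described above can be sketched as follows. This is only a minimal illustration of dynamic time warping between two fundamental frequency sequences; the function names and the mapping from distance to a 0 to 100 score are hypothetical and are not taken from the source.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def similarity_score(reference_f0, user_f0, scale=100.0):
    """Hypothetical mapping: smaller DTW distance gives a higher score."""
    dist = dtw_distance(reference_f0, user_f0)
    longest = max(len(reference_f0), len(user_f0))
    return scale / (1.0 + dist / longest)
```

Because warping absorbs timing differences, a held note that is slightly longer in the user's recording still aligns with the original at zero cost, which is why this kind of score reflects pitch accuracy only.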
However, the similarity determined in this way reflects only the intonation of the audio to be scored, that is, whether the pitch is correct; as long as the pitch is correct, a high score is obtained.
Disclosure of Invention
The application provides an audio scoring method, apparatus, device, and storage medium, which can solve the problem in the related art that scoring methods consider only a single aspect and therefore produce one-sided scores. The technical solutions are as follows:
In one aspect, a method for scoring audio is provided, the method including:
acquiring a plurality of vibrato segments corresponding to a fundamental frequency sequence of the audio to be scored;
acquiring vibrato feature information of the fundamental frequency sequences corresponding to the plurality of vibrato segments, wherein the vibrato feature information includes at least spectral distribution stability and sequence amplitude;
determining vibrato scores of the plurality of vibrato segments according to the vibrato segment durations of the plurality of vibrato segments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato segments;
determining an audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato segments and the number of vibrato segments;
and scoring the audio to be scored based on the audio vibrato score.
In a possible implementation manner of the present application, the determining vibrato scores of the plurality of vibrato segments according to the vibrato segment durations of the plurality of vibrato segments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato segments includes:
determining duration scores of the plurality of vibrato segments based on the vibrato segment durations of the plurality of vibrato segments;
multiplying the spectral distribution stability of the fundamental frequency sequence corresponding to each vibrato segment by a first numerical value to obtain a stability score of each vibrato segment;
determining amplitude scores of the plurality of vibrato segments based on the sequence amplitudes of the fundamental frequency sequences corresponding to the plurality of vibrato segments;
and determining, for each vibrato segment, the sum of its duration score, stability score and amplitude score as the vibrato score of that vibrato segment, to obtain the vibrato scores of the plurality of vibrato segments.
In a possible implementation manner of the present application, the determining duration scores of the plurality of vibrato segments based on the vibrato segment durations of the plurality of vibrato segments includes:
for a first vibrato segment in the plurality of vibrato segments, when the vibrato segment duration of the first vibrato segment is less than a first duration threshold, determining the duration score of the first vibrato segment according to the vibrato segment duration of the first vibrato segment and a second numerical value, wherein the first vibrato segment is any vibrato segment in the plurality of vibrato segments; or,
when the vibrato segment duration of the first vibrato segment is greater than or equal to the first duration threshold, determining the second numerical value as the duration score of the first vibrato segment.
In a possible implementation manner of the present application, the determining amplitude scores of the plurality of vibrato segments based on the sequence amplitudes of the fundamental frequency sequences corresponding to the plurality of vibrato segments includes:
for a second vibrato segment in the plurality of vibrato segments, when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato segment is less than an amplitude threshold, determining the amplitude score of the second vibrato segment according to the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato segment and a third numerical value, wherein the second vibrato segment is any vibrato segment in the plurality of vibrato segments; or,
when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato segment is greater than or equal to the amplitude threshold, determining the third numerical value as the amplitude score of the second vibrato segment.
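The per-segment scoring described above can be sketched as follows. The text states only that, below the threshold, the duration score and amplitude score are determined from the measured value and the second or third numerical value; the proportional rule used here is an assumption, as are all default numbers and function names.

```python
def capped_linear(value, threshold, max_score):
    """Full marks at or above the threshold; below it, proportional credit.

    The proportional rule is an assumption made for illustration only.
    """
    if value >= threshold:
        return max_score
    return max_score * value / threshold


def vibrato_segment_score(duration_s, stability, amplitude,
                          first_value=40.0,    # hypothetical "first numerical value"
                          second_value=30.0,   # hypothetical "second numerical value"
                          third_value=30.0,    # hypothetical "third numerical value"
                          duration_threshold_s=1.0,
                          amplitude_threshold=1.0):
    """Sum of duration score, stability score and amplitude score."""
    duration_score = capped_linear(duration_s, duration_threshold_s, second_value)
    stability_score = stability * first_value
    amplitude_score = capped_linear(amplitude, amplitude_threshold, third_value)
    return duration_score + stability_score + amplitude_score
```

With these illustrative defaults, a segment that meets both thresholds and has a stability of 1.0 reaches 100, while a segment at half of each threshold with a stability of 0.5 reaches 50.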
In a possible implementation manner of the present application, the determining an audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato segments and the number of vibrato segments includes:
determining a highest vibrato score from the vibrato scores of the plurality of vibrato segments;
determining a vibrato count score according to the number of vibrato segments in the plurality of vibrato segments;
and determining the sum of the highest vibrato score and the vibrato count score as the audio vibrato score of the audio to be scored.
In a possible implementation manner of the present application, the determining a vibrato count score according to the number of vibrato segments in the plurality of vibrato segments includes:
when the number of vibrato segments is less than a number threshold, determining the number of vibrato segments as the vibrato count score; or,
when the number of vibrato segments is greater than or equal to the number threshold, determining a fourth numerical value as the vibrato count score.
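The aggregation described above, the highest segment score plus a capped count score, can be sketched as follows. The default threshold, the default fourth numerical value and the handling of audio with no vibrato segments are assumptions.

```python
def vibrato_count_score(num_segments, count_threshold=5, fourth_value=5.0):
    """The segment count itself below the threshold, a fixed value otherwise."""
    if num_segments < count_threshold:
        return float(num_segments)
    return fourth_value


def audio_vibrato_score(segment_scores, count_threshold=5, fourth_value=5.0):
    """Highest per-segment vibrato score plus the vibrato count score."""
    if not segment_scores:
        return 0.0  # assumption: audio without vibrato segments scores zero
    return max(segment_scores) + vibrato_count_score(
        len(segment_scores), count_threshold, fourth_value)
```

Taking the maximum rather than the mean means a single well-executed vibrato is not diluted by weaker ones, while the count score adds a bounded bonus for using vibrato more often.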
In a possible implementation manner of the present application, the acquiring a plurality of vibrato segments corresponding to a fundamental frequency sequence of the audio to be scored includes:
taking a specified duration as the window duration and a specified step length as the moving distance, and performing fast Fourier transform processing on the fundamental frequency sequence of the audio to be scored within a plurality of windows to obtain frequency spectra corresponding to the plurality of windows;
squaring the frequency spectra corresponding to the plurality of windows respectively to obtain power spectra corresponding to the plurality of windows;
determining, according to the power spectra corresponding to the plurality of windows, the ratio of the power spectrum energy within a specified frequency band to the total power spectrum energy in each window, to obtain a vibrato likelihood value corresponding to each window;
and determining the plurality of vibrato segments from the audio to be scored according to the vibrato likelihood values corresponding to the plurality of windows.
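The windowed analysis described above can be sketched with NumPy as follows. The window duration, step length and the 5 to 8 Hz band are illustrative values only, since the text leaves them unspecified. Subtracting the window mean before the transform is also an assumption, made here so that the constant pitch level does not dominate the total energy.

```python
import numpy as np


def vibrato_likelihood(f0, frame_rate, window_s=0.5, step_s=0.1,
                       band_hz=(5.0, 8.0)):
    """Per-window ratio of in-band power to total power of an F0 sequence.

    f0         : fundamental frequency values, one per analysis frame
    frame_rate : number of F0 values per second
    """
    f0 = np.asarray(f0, dtype=float)
    win = int(round(window_s * frame_rate))   # window length in frames
    hop = int(round(step_s * frame_rate))     # moving distance in frames
    freqs = np.fft.rfftfreq(win, d=1.0 / frame_rate)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])

    values = []
    for start in range(0, len(f0) - win + 1, hop):
        frame = f0[start:start + win]
        frame = frame - frame.mean()          # assumption: remove the DC level
        power = np.abs(np.fft.rfft(frame)) ** 2   # squared spectrum
        total = power.sum()
        values.append(power[in_band].sum() / total if total > 0 else 0.0)
    return np.array(values)
```

For a pitch track that oscillates at about 6 Hz, nearly all of the power falls inside the band and the value approaches 1; for a flat pitch track the value is 0.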
In a possible implementation manner of the present application, the determining the plurality of vibrato segments from the audio to be scored according to the vibrato likelihood values corresponding to the plurality of windows includes:
determining, from the fundamental frequency sequence of the audio to be scored and according to the vibrato likelihood values corresponding to the plurality of windows, candidate fundamental frequency sequences whose vibrato likelihood values are greater than or equal to a likelihood threshold;
determining the spectral distribution stability, frequency and sequence amplitude of each target candidate fundamental frequency sequence, a target candidate fundamental frequency sequence being a candidate whose continuous duration is greater than a second duration threshold;
and determining the plurality of vibrato segments from the audio to be scored according to the spectral distribution stability, frequency and sequence amplitude of the target candidate fundamental frequency sequences.
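The first two steps above, thresholding the likelihood values and keeping only sufficiently long continuous stretches, can be sketched as follows. How a run of consecutive windows maps to a time span is an assumption (from the start of the first window to the end of the last), and the default thresholds and function name are hypothetical.

```python
def candidate_runs(likelihoods, step_s, window_s,
                   likelihood_threshold=0.6, min_duration_s=0.3):
    """Return (start_s, end_s) spans of consecutive windows above threshold.

    Only spans strictly longer than min_duration_s are kept.
    """
    runs = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, value in enumerate(list(likelihoods) + [float("-inf")]):
        if value >= likelihood_threshold:
            if start is None:
                start = i
        elif start is not None:
            begin_s = start * step_s
            end_s = (i - 1) * step_s + window_s
            if end_s - begin_s > min_duration_s:
                runs.append((begin_s, end_s))
            start = None
    return runs
```

Each returned span would then be analysed for spectral distribution stability, frequency and sequence amplitude before being accepted as a vibrato segment.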
In a possible implementation manner of the present application, the determining the spectral distribution stability, frequency and sequence amplitude of the target candidate fundamental frequency sequence whose continuous duration is greater than the second duration threshold includes:
performing fast Fourier transform processing on the target candidate fundamental frequency sequence, and squaring the result to obtain a power spectrum of the target candidate fundamental frequency sequence;
determining, according to the power spectrum of the target candidate fundamental frequency sequence, the ratio of the power spectrum energy of the target candidate fundamental frequency sequence within a preset frequency band to the total power spectrum energy of the target candidate fundamental frequency sequence, to obtain the spectral distribution stability of the target candidate fundamental frequency sequence;
determining the frequency of the target candidate fundamental frequency sequence according to the cycle durations and the number of cycles of a plurality of fundamental frequency vibration cycles of the target candidate fundamental frequency sequence, wherein, after the mean is removed, the fundamental frequency sequence crosses zero twice within each fundamental frequency vibration cycle;
and determining the sequence amplitude of the target candidate fundamental frequency sequence according to the fundamental frequency differences and the number of cycles of the plurality of fundamental frequency vibration cycles of the target candidate fundamental frequency sequence, wherein the fundamental frequency difference of each fundamental frequency vibration cycle is the difference between the maximum value and the minimum value of the fundamental frequency within that cycle.
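The three features described above can be sketched with NumPy as follows. The text does not give the exact formulas, so this sketch assumes that the frequency is the number of cycles divided by their total duration and that the sequence amplitude is the mean per-cycle difference between maximum and minimum. The 5 to 8 Hz band and the function name are hypothetical.

```python
import numpy as np


def vibrato_features(f0, frame_rate, band_hz=(5.0, 8.0)):
    """Return (stability, frequency_hz, amplitude) of a candidate F0 sequence."""
    x = np.asarray(f0, dtype=float)
    x = x - x.mean()                               # mean removal

    # Spectral distribution stability: in-band power over total power.
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / frame_rate)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    total = power.sum()
    stability = float(power[in_band].sum() / total) if total > 0 else 0.0

    # Upward zero crossings: previous sample negative, current non-negative.
    nonneg = x >= 0
    ups = np.flatnonzero(~nonneg[:-1] & nonneg[1:]) + 1
    # Two consecutive upward crossings delimit one vibration cycle, which
    # therefore contains two zero crossings (one down, one up).
    cycles = list(zip(ups[:-1], ups[1:]))
    if not cycles:
        return stability, 0.0, 0.0

    total_s = (ups[-1] - ups[0]) / frame_rate      # summed cycle durations
    frequency = len(cycles) / total_s              # assumption: cycles per second
    amplitude = float(np.mean([x[a:b].max() - x[a:b].min() for a, b in cycles]))
    return stability, float(frequency), amplitude
```

Because the cycle boundaries fall on sample positions, the estimated frequency and amplitude are approximate; a finer F0 frame rate reduces the error.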
In a possible implementation manner of the present application, after obtaining a plurality of vibrato segments corresponding to a fundamental frequency sequence of an audio to be scored, the method further includes:
for each vibrato segment in the plurality of vibrato segments, highlighting the progress bar corresponding to that vibrato segment starting from the start time of that vibrato segment.
In another aspect, an apparatus for scoring audio is provided, the apparatus including:
a first acquisition module, configured to acquire a plurality of vibrato segments corresponding to a fundamental frequency sequence of the audio to be scored;
a second acquisition module, configured to acquire vibrato feature information of the fundamental frequency sequences corresponding to the plurality of vibrato segments, wherein the vibrato feature information includes at least spectral distribution stability and sequence amplitude;
a first determining module, configured to determine vibrato scores of the plurality of vibrato segments according to the vibrato segment durations of the plurality of vibrato segments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato segments;
a second determining module, configured to determine an audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato segments and the number of vibrato segments;
and a scoring module, configured to score the audio to be scored based on the audio vibrato score.
In one possible implementation manner of the present application, the first determining module is configured to:
determine duration scores of the plurality of vibrato segments based on the vibrato segment durations of the plurality of vibrato segments;
multiply the spectral distribution stability of the fundamental frequency sequence corresponding to each vibrato segment by a first numerical value to obtain a stability score of each vibrato segment;
determine amplitude scores of the plurality of vibrato segments based on the sequence amplitudes of the fundamental frequency sequences corresponding to the plurality of vibrato segments;
and determine, for each vibrato segment, the sum of its duration score, stability score and amplitude score as the vibrato score of that vibrato segment, to obtain the vibrato scores of the plurality of vibrato segments.
In one possible implementation manner of the present application, the first determining module is configured to:
for a first vibrato segment in the plurality of vibrato segments, when the vibrato segment duration of the first vibrato segment is less than a first duration threshold, determine the duration score of the first vibrato segment according to the vibrato segment duration of the first vibrato segment and a second numerical value, wherein the first vibrato segment is any vibrato segment in the plurality of vibrato segments; or,
when the vibrato segment duration of the first vibrato segment is greater than or equal to the first duration threshold, determine the second numerical value as the duration score of the first vibrato segment.
In one possible implementation manner of the present application, the first determining module is configured to:
for a second vibrato segment in the plurality of vibrato segments, when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato segment is less than an amplitude threshold, determine the amplitude score of the second vibrato segment according to the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato segment and a third numerical value, wherein the second vibrato segment is any vibrato segment in the plurality of vibrato segments; or,
when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato segment is greater than or equal to the amplitude threshold, determine the third numerical value as the amplitude score of the second vibrato segment.
In one possible implementation manner of the present application, the second determining module is configured to:
determine a highest vibrato score from the vibrato scores of the plurality of vibrato segments;
determine a vibrato count score according to the number of vibrato segments in the plurality of vibrato segments;
and determine the sum of the highest vibrato score and the vibrato count score as the audio vibrato score of the audio to be scored.
In one possible implementation manner of the present application, the second determining module is configured to:
when the number of vibrato segments is less than a number threshold, determine the number of vibrato segments as the vibrato count score; or,
when the number of vibrato segments is greater than or equal to the number threshold, determine a fourth numerical value as the vibrato count score.
In one possible implementation manner of the present application, the first acquisition module is configured to:
take a specified duration as the window duration and a specified step length as the moving distance, and perform fast Fourier transform processing on the fundamental frequency sequence of the audio to be scored within a plurality of windows to obtain frequency spectra corresponding to the plurality of windows;
square the frequency spectra corresponding to the plurality of windows respectively to obtain power spectra corresponding to the plurality of windows;
determine, according to the power spectra corresponding to the plurality of windows, the ratio of the power spectrum energy within a specified frequency band to the total power spectrum energy in each window, to obtain a vibrato likelihood value corresponding to each window;
and determine the plurality of vibrato segments from the audio to be scored according to the vibrato likelihood values corresponding to the plurality of windows.
In one possible implementation manner of the present application, the first acquisition module is configured to:
determine, from the fundamental frequency sequence of the audio to be scored and according to the vibrato likelihood values corresponding to the plurality of windows, candidate fundamental frequency sequences whose vibrato likelihood values are greater than or equal to a likelihood threshold;
determine the spectral distribution stability, frequency and sequence amplitude of each target candidate fundamental frequency sequence, a target candidate fundamental frequency sequence being a candidate whose continuous duration is greater than a second duration threshold;
and determine the plurality of vibrato segments from the audio to be scored according to the spectral distribution stability, frequency and sequence amplitude of the target candidate fundamental frequency sequences.
In one possible implementation manner of the present application, the first acquisition module is configured to:
perform fast Fourier transform processing on the target candidate fundamental frequency sequence, and square the result to obtain a power spectrum of the target candidate fundamental frequency sequence;
determine, according to the power spectrum of the target candidate fundamental frequency sequence, the ratio of the power spectrum energy of the target candidate fundamental frequency sequence within a preset frequency band to the total power spectrum energy of the target candidate fundamental frequency sequence, to obtain the spectral distribution stability of the target candidate fundamental frequency sequence;
determine the frequency of the target candidate fundamental frequency sequence according to the cycle durations and the number of cycles of a plurality of fundamental frequency vibration cycles of the target candidate fundamental frequency sequence, wherein, after the mean is removed, the fundamental frequency sequence crosses zero twice within each fundamental frequency vibration cycle;
and determine the sequence amplitude of the target candidate fundamental frequency sequence according to the fundamental frequency differences and the number of cycles of the plurality of fundamental frequency vibration cycles of the target candidate fundamental frequency sequence, wherein the fundamental frequency difference of each fundamental frequency vibration cycle is the difference between the maximum value and the minimum value of the fundamental frequency within that cycle.
In a possible implementation manner of the present application, the first acquisition module is further configured to:
for each vibrato segment in the plurality of vibrato segments, highlight the progress bar corresponding to that vibrato segment starting from the start time of that vibrato segment.
In another aspect, an apparatus is provided, which includes a memory for storing a computer program and a processor for executing the computer program stored in the memory to implement the steps of the audio scoring method described above.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, implements the steps of the audio scoring method described above.
In another aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the steps of the audio scoring method described above.
The technical scheme provided by the application can at least bring the following beneficial effects:
In the embodiments of the present application, a plurality of vibrato segments corresponding to the fundamental frequency sequence of the audio to be scored are first acquired, that is, the vibrato segments are extracted separately from the audio to be scored. Vibrato feature information of the fundamental frequency sequences corresponding to the plurality of vibrato segments is then acquired, the vibrato feature information including at least spectral distribution stability and sequence amplitude. The vibrato scores of the plurality of vibrato segments, which describe how the audio to be scored performs in terms of vibrato, are determined according to the vibrato segment durations of the plurality of vibrato segments and the spectral distribution stability and sequence amplitude of the corresponding fundamental frequency sequences. The audio vibrato score of the audio to be scored is determined according to the vibrato scores of the plurality of vibrato segments and the number of vibrato segments, and the audio to be scored is scored based on the audio vibrato score. In this way, the method considers the vibrato segments in addition to the intonation considered by traditional scoring, and solves the problem in the prior art that a scoring method which considers only the intonation of the audio to be scored is too narrow and therefore yields a one-sided score.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram illustrating a method for scoring audio according to an exemplary embodiment;
FIG. 2 is a diagram illustrating a sequence of fundamental frequencies of audio to be scored, according to an exemplary embodiment;
FIG. 3 is a spectral diagram shown in accordance with an example embodiment;
FIG. 4 is a schematic diagram illustrating a vibrato fragment analysis of audio to be scored, according to an example embodiment;
FIG. 5 is a schematic diagram illustrating a vibrato fragment analysis of audio to be scored, according to another illustrative embodiment;
FIG. 6 is a diagram illustrating a fundamental frequency sequence after mean removal, in accordance with an exemplary embodiment;
FIG. 7 is a schematic diagram illustrating the determination of a vibrato segment corresponding to a fundamental frequency sequence, in accordance with an exemplary embodiment;
FIG. 8 is a schematic diagram illustrating a page displaying a vibrato fragment in accordance with an exemplary embodiment;
FIG. 9 is a schematic diagram of a page showing audio vibrato scores for audio to be scored, according to an example embodiment;
FIG. 10 is a schematic diagram illustrating the structure of an audio scoring apparatus according to an exemplary embodiment;
FIG. 11 is a schematic diagram illustrating the structure of an apparatus according to an exemplary embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the audio scoring method provided by the embodiment of the present application in detail, an implementation environment of the embodiment of the present application is described.
The audio scoring method provided by the embodiments of the present application can be executed by a device. Music software can be installed on the device, and the device can include a recording function.
As an example, the device may be any electronic product that can perform human-computer interaction with a user through one or more means such as a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or a handwriting device, for example, a PC (Personal Computer), a mobile phone, a smart phone, a PDA (Personal Digital Assistant), a wearable device, a Pocket PC, a tablet computer, a smart car, a smart television, a smart speaker, and the like, which is not limited in this embodiment.
It will be understood by those skilled in the art that the foregoing is by way of example only and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
Having described the implementation environment, the audio scoring method provided by the embodiments of the present application is explained in detail below with reference to the drawings.
Fig. 1 is a flowchart illustrating a scoring method of audio, which may be applied to the above-described apparatus, according to an exemplary embodiment. Referring to fig. 1, the method may include the following steps:
Step 101: Acquire a plurality of vibrato segments corresponding to the fundamental frequency sequence of the audio to be scored.
The fundamental frequency sequence is composed of a plurality of fundamental frequencies and can be used for determining the melody of the audio to be scored. The fundamental frequency is the frequency of the fundamental tone, and natural sounds can generally be decomposed into different sine waves, where the sine wave with the lowest frequency is the fundamental tone, and the frequency of the fundamental tone can be referred to as the fundamental frequency.
It should be noted that, to determine a plurality of vibrato segments from the audio to be scored, the determination needs to be performed through the fundamental frequency sequence of the audio to be scored, and therefore, the fundamental frequency sequence of the audio to be scored can be extracted from the audio to be scored through a fundamental frequency extraction algorithm. Illustratively, referring to fig. 2, fig. 2 is a schematic diagram of a fundamental frequency sequence of audio to be scored.
As an example, when obtaining a plurality of vibrato segments corresponding to the fundamental frequency sequence of the audio to be scored, fast Fourier transform processing may be performed on the fundamental frequency sequence of the audio to be scored, and the plurality of vibrato segments may then be determined according to the transform result. Since the signal subjected to the fast Fourier transform must be a stationary signal, the fundamental frequency sequence of the audio to be scored can be processed in units of windows.
In some embodiments, windows are taken over the fundamental frequency sequence with the specified duration as the window duration and the specified step length as the moving distance, and fast Fourier transform processing is performed on the fundamental frequency sequence of the audio to be scored within each window to obtain the frequency spectrums corresponding to the plurality of windows. According to the power spectrums corresponding to the plurality of windows, the ratio of the power spectrum energy in the specified frequency band to the total power spectrum energy is determined for each window, giving the vibrato likelihood value corresponding to each window. A plurality of vibrato segments are then determined from the audio to be scored according to the vibrato likelihood values corresponding to the plurality of windows.
The specified duration and the specified step length may be set by the user according to actual needs, or may be set by default by the device, which is not limited in the embodiment of the present application.
The specified frequency band is the frequency band in which the frequency of a vibrato segment theoretically lies, and may be 4-8 Hz.
The vibrato likelihood value indicates how likely it is that the audio segment of the audio to be scored corresponding to each window is a vibrato segment.
As an example, in order to improve calculation efficiency and accuracy, fast Fourier transform processing may be performed on the fundamental frequency sequence of the audio to be scored with a window duration of 50 ms and a moving distance of 5 ms, so as to obtain the frequency spectrums corresponding to a plurality of windows.
For example, the frequency spectrum corresponding to the first window may be obtained by performing fast Fourier transform processing on the 0-50 ms portion of the fundamental frequency sequence of the audio to be scored, the frequency spectrum corresponding to the second window may be obtained by performing fast Fourier transform processing on the 5-55 ms portion, and so on, until the frequency spectrums corresponding to all of the windows are obtained. Referring to fig. 3, fig. 3 is a diagram of the frequency spectrum obtained by performing fast Fourier transform processing on the fundamental frequency sequence of an audio to be scored within one window.
It should be noted that, the above is only described by taking the window duration as 50ms and the moving distance as 5ms as an example, in an actual implementation, the window duration may take any value, and the moving distance may also take any value, and may be adjusted according to an actual need, which is not limited in this embodiment of the application.
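As a minimal sketch of the windowing described above, the following Python splits a fundamental frequency sequence into overlapping windows. The function name `sliding_windows` and the parameter `frame_period_s` (the assumed time spacing between consecutive fundamental frequency values) are illustrative assumptions and do not come from the present application.

```python
def sliding_windows(f0, frame_period_s, window_s=0.050, hop_s=0.005):
    """Split a fundamental frequency (f0) sequence into overlapping windows.

    f0             -- fundamental frequency values, one per frame
    frame_period_s -- assumed spacing between frames, in seconds
    window_s       -- window duration (the "specified duration")
    hop_s          -- moving distance (the "specified step length")
    Returns a list of (start_index, samples) pairs.
    """
    win = round(window_s / frame_period_s)
    hop = round(hop_s / frame_period_s)
    if win < 1 or hop < 1:
        raise ValueError("window and hop must each span at least one frame")
    # Only full windows are kept; a trailing partial window is dropped.
    return [(start, f0[start:start + win])
            for start in range(0, len(f0) - win + 1, hop)]
```

With an assumed frame period of 1 ms, the first window covers frames 0-49 (0-50 ms) and the second covers frames 5-54 (5-55 ms), which matches the example above.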
As an example, after obtaining the frequency spectrums corresponding to the multiple windows, the frequency spectrum corresponding to each window may be squared to obtain the power spectrum corresponding to each window.
As an example, after the power spectrums corresponding to the plurality of windows are obtained, for any one of the windows, the ratio of the power spectrum energy in the specified frequency band to the total power spectrum energy in that window is determined from the power spectrum corresponding to the window, and the ratio is taken as the vibrato likelihood value, that is, the likelihood that the audio segment of the audio to be scored corresponding to the window is a vibrato segment.
In one possible implementation, the vibrato likelihood value of the window may be calculated by the following equation (1).
$$P = \frac{\int_{4}^{8} X(f, t)\,df}{\int X(f, t)\,df} \tag{1}$$
where P represents the vibrato likelihood value of the window, X(f, t) represents the normalized power spectrum, f represents frequency, and t represents time. The power spectrum is obtained by performing fast Fourier transform processing on the fundamental frequency sequence within the specified duration starting at time t and squaring the result; $\int_{4}^{8} X(f, t)\,df$ represents the power spectrum energy within the 4 Hz-8 Hz frequency band, and $\int X(f, t)\,df$ represents the total energy of the power spectrum within the window.
As another example, after the frequency spectrum corresponding to each window is obtained, the ratio of the spectral energy in the specified frequency band to the total spectral energy in each window may be determined directly from that frequency spectrum, to obtain the vibrato likelihood value corresponding to each window.
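The band-energy ratio described above can be sketched as follows. This is only an illustration under stated assumptions: the names `power_spectrum`, `vibrato_likelihood` and `sample_rate_hz` are hypothetical; a direct discrete Fourier transform from the Python standard library is used in place of a fast Fourier transform (it yields the same values, only more slowly); only the non-negative frequency bins are summed; and the mean is subtracted before the transform, which is an assumption made here so that the constant pitch offset does not dominate the total energy.

```python
import cmath


def power_spectrum(samples, sample_rate_hz):
    """Return (frequencies, power) for the non-negative frequency bins."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # assumption: remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient for bin k (O(n^2) overall, fine for a sketch).
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(centered))
        freqs.append(k * sample_rate_hz / n)
        power.append(abs(coeff) ** 2)  # squared magnitude = power spectrum
    return freqs, power


def vibrato_likelihood(samples, sample_rate_hz, band=(4.0, 8.0)):
    """Ratio of power inside `band` (Hz) to the total power of the window."""
    freqs, power = power_spectrum(samples, sample_rate_hz)
    total = sum(power)
    if total == 0:
        return 0.0  # a perfectly flat window carries no modulation energy
    in_band = sum(p for f, p in zip(freqs, power) if band[0] <= f <= band[1])
    return in_band / total
```

The spacing between frequency bins equals the reciprocal of the window duration. For a 1 s window sampled at 100 Hz (values chosen here only for illustration), a 6 Hz sinusoidal modulation of the fundamental frequency falls exactly on a bin and gives a value close to 1, whereas a constant fundamental frequency gives 0.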
In some embodiments, after the vibrato likelihood values corresponding to the plurality of windows are determined, a plurality of vibrato segments may be determined from the audio to be scored according to those values. A specific implementation may include: according to the vibrato likelihood values corresponding to the plurality of windows, determining, from the fundamental frequency sequence of the audio to be scored, candidate fundamental frequency sequences whose vibrato likelihood value is greater than or equal to a likelihood threshold; determining the spectrum distribution stability, the frequency and the sequence amplitude of each target candidate fundamental frequency sequence whose continuous duration is greater than a second duration threshold; and determining a plurality of vibrato segments from the audio to be scored according to the spectrum distribution stability, the frequency and the sequence amplitude of the target candidate fundamental frequency sequences.
The likelihood threshold may be set by a user according to actual needs, or may be set by default by the device, which is not limited in the embodiment of the present application.
A candidate fundamental frequency sequence is a portion of the fundamental frequency sequence of the audio to be scored whose vibrato likelihood value is greater than or equal to the likelihood threshold; a target candidate fundamental frequency sequence is a candidate fundamental frequency sequence whose continuous duration is greater than the second duration threshold.
The second duration threshold may be set by a user according to actual needs, or may be set by default by the device, which is not limited in the embodiment of the present application.
As an example, the vibrato likelihood value corresponding to each window may be compared with a likelihood threshold according to the vibrato likelihood values corresponding to a plurality of windows, when the vibrato likelihood value of a certain window is greater than or equal to the likelihood threshold, it may be considered that the audio segment corresponding to the fundamental frequency sequence in the window may be a vibrato segment, and the fundamental frequency sequence in the window may be determined as the candidate fundamental frequency sequence.
Illustratively, referring to fig. 4, fig. 4 shows a time versus vibrato likelihood value graph and a time versus fundamental frequency sequence graph, in which the likelihood threshold is 0.25. In the time versus vibrato likelihood value graph, the portions of the curve lying on or above the threshold line correspond to candidate fundamental frequency sequences, from which the start time and the end time of each candidate fundamental frequency sequence can be determined. The fundamental frequency sequences between the corresponding start and end times in the time versus fundamental frequency sequence graph are the candidate fundamental frequency sequences, and there are usually several of them. Referring to fig. 5, fig. 5 is a schematic diagram of a vibrato segment analysis of an audio to be scored.
As an example, some of the candidate fundamental frequency sequences last only a very short time, and the audio segments of the audio to be scored corresponding to them may be considered not to be vibrato segments. When the continuous duration of a candidate fundamental frequency sequence is greater than the second duration threshold, the corresponding audio segment of the audio to be scored is more likely to be a vibrato segment. The candidate fundamental frequency sequences whose continuous duration is greater than the second duration threshold are therefore determined as the target candidate fundamental frequency sequences, and their spectrum distribution stability, frequency and sequence amplitude are then determined.
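The selection of candidate and target candidate fundamental frequency sequences described above can be sketched as a scan for runs of consecutive windows. The function name `candidate_runs` and its parameters are hypothetical, and the continuous duration of a run is approximated here as the number of consecutive windows multiplied by the moving distance, which is a simplification that ignores the window length.

```python
def candidate_runs(likelihoods, hop_s, likelihood_threshold, min_duration_s):
    """Find runs of windows that may correspond to vibrato.

    likelihoods          -- one vibrato likelihood value per window
    hop_s                -- moving distance between windows, in seconds
    likelihood_threshold -- windows at or above this value are candidates
    min_duration_s       -- the "second duration threshold"
    Returns (start_index, end_index_exclusive) pairs of window indices.
    """
    runs = []
    start = None
    # The trailing sentinel closes a run that extends to the last window.
    for i, p in enumerate(list(likelihoods) + [float("-inf")]):
        if p >= likelihood_threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * hop_s > min_duration_s:
                runs.append((start, i))  # long enough: target candidate
            start = None
    return runs
```

Multiplying the returned indices by the moving distance gives approximate start and end times of each target candidate fundamental frequency sequence.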
In one possible implementation, determining the spectrum distribution stability, the frequency and the sequence amplitude of a target candidate fundamental frequency sequence whose continuous duration is greater than the second duration threshold may include: performing fast Fourier transform processing on the target candidate fundamental frequency sequence and squaring the result to obtain the power spectrum of the target candidate fundamental frequency sequence; determining, according to that power spectrum, the ratio of the power spectrum energy of the target candidate fundamental frequency sequence in the specified frequency band to its total power spectrum energy, to obtain the spectrum distribution stability of the target candidate fundamental frequency sequence; determining the frequency of the target candidate fundamental frequency sequence according to the period durations and the number of periods of its fundamental frequency vibration periods, where the fundamental frequency sequence within each fundamental frequency vibration period crosses zero twice after mean removal; and determining the sequence amplitude of the target candidate fundamental frequency sequence according to the fundamental frequency differences and the number of periods of its fundamental frequency vibration periods, where the fundamental frequency difference of a fundamental frequency vibration period is the difference between the maximum and minimum fundamental frequency values within that period.
As an example, the number of the target candidate base frequency sequences is usually multiple, when the frequency spectrum distribution stability of the target candidate base frequency sequence with continuous duration greater than the second duration threshold is determined, for any target candidate base frequency sequence, fast fourier transform processing may be performed on the target candidate base frequency sequence, a square is taken for a processing result to obtain a power spectrum of the target candidate base frequency sequence, then according to the power spectrum of the target candidate base frequency sequence, a ratio of power spectrum energy of the target candidate base frequency sequence in a specified frequency band to total power spectrum energy of the target candidate base frequency sequence is determined, and the ratio is determined as the frequency spectrum distribution stability of the target candidate base frequency sequence.
It should be noted that, the specific implementation of performing fast fourier transform processing on the target candidate fundamental frequency sequence and processing the result to obtain the power spectrum is the same as the process of determining the power spectrum corresponding to the multiple windows in the foregoing embodiment, which may be referred to specifically in the foregoing embodiment, and this embodiment is not described herein again.
Illustratively, based on the power spectrum of the target candidate fundamental frequency sequence, the stability of the spectral distribution of the target candidate fundamental frequency sequence may be determined by the following formula (2).
$$\text{likelihood} = \frac{\int_{4}^{8} X(f, t)\,df}{\int X(f, t)\,df} \tag{2}$$
where likelihood represents the spectrum distribution stability of the target candidate fundamental frequency sequence, X(f, t) represents the power spectrum of the target candidate fundamental frequency sequence, f represents frequency, and t represents time; $\int_{4}^{8} X(f, t)\,df$ represents the power spectrum energy of the target candidate fundamental frequency sequence within the 4 Hz-8 Hz frequency band, and $\int X(f, t)\,df$ represents the total power spectrum energy of the target candidate fundamental frequency sequence.
As an example, when determining the frequency of a target candidate fundamental frequency sequence whose continuous duration is greater than the second duration threshold, for any target candidate fundamental frequency sequence, the frequency may be determined according to the period durations and the number of periods of its fundamental frequency vibration periods, where the mean-removed fundamental frequency sequence within each fundamental frequency vibration period crosses zero twice.
Mean removal refers to summing the fundamental frequencies of all points in the target candidate fundamental frequency sequence and averaging them to obtain a mean value, and then subtracting the mean value from the fundamental frequency of each point. The resulting relationship between fundamental frequency and time is the mean-removed fundamental frequency sequence.
That is, the mean-removed fundamental frequency sequence crosses zero twice within each fundamental frequency vibration period. Referring to fig. 6, A is the fundamental frequency sequence of one complete fundamental frequency vibration period, B is the fundamental frequency sequence of another complete fundamental frequency vibration period, and C, although it crosses zero twice, is not the fundamental frequency sequence of a complete fundamental frequency vibration period.
Illustratively, the frequency of the target candidate fundamental frequency sequence may be determined by the following formula (3).
Figure BDA0002261392150000131
Wherein,
Figure BDA0002261392150000132
represents the frequency of the target candidate fundamental frequency sequence, $R_n$ represents the period duration of the n-th fundamental frequency vibration period in the target candidate fundamental frequency sequence, and N represents the number of fundamental frequency vibration periods in the target candidate fundamental frequency sequence.
Illustratively, referring to FIG. 6, it can be determined that the cycle duration of the first fundamental oscillation period in FIG. 6 is 4ms and the cycle duration of the second fundamental oscillation period is 12 ms.
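One way to obtain the period durations from the zero crossings of the mean-removed sequence is sketched below. Several choices here are assumptions rather than statements of the present application: a period is taken to span from one zero crossing to the second crossing after it, crossing times are located by linear interpolation between frames, and the frequency is computed as the mean of the per-period frequencies 1/R_n, since the exact form of formula (3) is not reproduced in this text. All function names and `frame_period_s` are hypothetical.

```python
def remove_mean(f0):
    """Subtract the average fundamental frequency from every point."""
    mean = sum(f0) / len(f0)
    return [x - mean for x in f0]


def zero_crossing_times(centered, frame_period_s):
    """Times (in seconds) at which the mean-removed sequence changes sign."""
    times = []
    for i in range(1, len(centered)):
        a, b = centered[i - 1], centered[i]
        if (a < 0 <= b) or (a >= 0 > b):
            frac = a / (a - b)  # linear interpolation between the two frames
            times.append((i - 1 + frac) * frame_period_s)
    return times


def period_durations(crossings):
    """Assumed definition: a period runs from crossing i to crossing i + 2."""
    return [crossings[i + 2] - crossings[i]
            for i in range(0, len(crossings) - 2, 2)]


def vibrato_rate(durations):
    """Assumed form: mean of the per-period frequencies 1 / R_n."""
    if not durations:
        raise ValueError("at least one complete period is required")
    return sum(1.0 / r for r in durations) / len(durations)
```

For a fundamental frequency that oscillates sinusoidally five times per second, each period duration found this way is about 0.2 s and the resulting frequency is about 5 Hz.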
As an example, when determining the sequence amplitude of the target candidate fundamental frequency sequence with the continuous duration greater than the second duration threshold, for any target candidate fundamental frequency sequence, the sequence amplitude of the target candidate fundamental frequency sequence may be determined according to the fundamental frequency difference and the number of cycles of a plurality of fundamental frequency vibration cycles of the target candidate fundamental frequency sequence, where the fundamental frequency difference of each fundamental frequency vibration cycle refers to the difference between the maximum fundamental frequency value and the minimum fundamental frequency value in each fundamental frequency vibration cycle.
Illustratively, the sequence amplitude of the target candidate fundamental frequency sequence may be determined by the following formula (4).
Figure BDA0002261392150000133
where extent denotes the sequence amplitude of the target candidate fundamental frequency sequence, $E_n$ denotes the fundamental frequency difference of the n-th fundamental frequency vibration period in the target candidate fundamental frequency sequence, and N denotes the number of fundamental frequency vibration periods in the target candidate fundamental frequency sequence.
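The sequence amplitude can be illustrated with a short sketch. It assumes that the fundamental frequency values have already been grouped by vibration period and that the per-period differences E_n are combined by a plain average over the N periods; the averaging is an assumption, because the exact form of formula (4) is not reproduced in this text. The name `sequence_extent` is hypothetical.

```python
def sequence_extent(periods):
    """Average of (max - min) over the fundamental frequency vibration periods.

    periods -- list of lists; each inner list holds the fundamental frequency
               values that fall within one vibration period
    """
    if not periods:
        raise ValueError("at least one vibration period is required")
    diffs = [max(p) - min(p) for p in periods]  # E_n for each period
    return sum(diffs) / len(diffs)              # assumed: mean over N periods
```

For two periods whose fundamental frequencies span 218-222 Hz and 217-223 Hz, the differences are 4 Hz and 6 Hz, and their average is 5 Hz.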
As an example, after the spectrum distribution stability, the frequency and the sequence amplitude of the target candidate fundamental frequency sequences are determined, the target candidate fundamental frequency sequences may be filtered by a threshold method according to these three quantities to identify the target candidate fundamental frequency sequences that meet the requirement. The audio segments corresponding to those sequences are then selected from the audio to be scored, thereby determining the plurality of vibrato segments.
Exemplarily, when the stability of the frequency spectrum distribution of the target candidate base frequency sequence is greater than a first threshold, the frequency of the target candidate base frequency sequence is greater than a second threshold, and the sequence amplitude of the target candidate base frequency sequence is greater than a third threshold, it may be determined that the audio segment of the audio to be scored corresponding to the target candidate base frequency sequence is a vibrato segment; or when the stability of the frequency spectrum distribution of the target candidate base frequency sequence is greater than a first threshold, or the frequency of the target candidate base frequency sequence is greater than a second threshold, or the sequence amplitude of the target candidate base frequency sequence is greater than a third threshold, it may be determined that the audio segment of the audio to be scored corresponding to the target candidate base frequency sequence is a vibrato segment. Referring to fig. 7, in the time-base frequency sequence shown in fig. 7, the audio segments corresponding to the base frequency sequence located within the two dotted lines are vibrato segments.
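The two threshold variants described above (all three conditions, or any one of them) can be sketched as a single predicate. The function name, the parameter names and the `require_all` switch are hypothetical, and the threshold values are left to the caller because the present application does not fix them.

```python
def is_vibrato(stability, rate_hz, extent,
               first_threshold, second_threshold, third_threshold,
               require_all=True):
    """Decide whether a target candidate sequence is a vibrato segment.

    require_all=True  -- all three quantities must exceed their thresholds
    require_all=False -- exceeding any one threshold is sufficient
    """
    checks = (stability > first_threshold,
              rate_hz > second_threshold,
              extent > third_threshold)
    return all(checks) if require_all else any(checks)
```

A candidate that exceeds the stability and frequency thresholds but not the amplitude threshold is rejected by the first variant and accepted by the second.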
It is noted that in other embodiments, the plurality of vibrato segments may be determined from the audio to be scored by other methods.
In a first implementation, a plurality of vibrato segments may be determined from the audio to be scored by a hidden Markov model based on the fundamental frequency sequence of the audio to be scored. Exemplarily, a plurality of fundamental frequency sequences with vibrato labels may be obtained and input into a hidden Markov model to train it, yielding a vibrato recognition model. The fundamental frequency sequence of the audio to be scored is then input into the vibrato recognition model, which outputs a vibrato judgment result for the fundamental frequency sequence of each preset time period, and the audio segments corresponding to consecutive fundamental frequency sequences whose judgment result is vibrato are determined to be vibrato segments.
The vibrato judgment result is either yes or no.
For example, the preset time period may be 5ms, and the preset time period may be set by a user according to actual needs, or may be set by default by a device, which is not limited in this embodiment of the application.
In a second implementation, short-time Fourier transform processing may be performed on the fundamental frequency sequence of the audio to be scored to obtain the corresponding spectrogram. The spectrogram is divided according to a certain rule into a plurality of sub-spectrograms, and each sub-spectrogram is matched against pre-stored spectrograms of short vibrato segments with different frequencies and amplitudes. If matching succeeds, the start and end times are determined from the consecutively matched sub-spectrograms, and the audio segment of the audio to be scored within that time period is determined to be a vibrato segment.
In a third implementation, short-time Fourier transform processing may be performed on the fundamental frequency sequence of the audio to be scored to obtain the corresponding spectrogram. The spectrogram is input into a trained vibrato detection network, which may output the start and end times of a plurality of audio segments, and each audio segment of the audio to be scored whose duration between its start and end times is greater than a time threshold is determined to be a vibrato segment.
The time threshold may be set by a user according to actual needs, or may be set by default by the device, which is not limited in the embodiment of the present application.
Further, after a plurality of vibrato segments are determined from the audio to be scored according to its fundamental frequency sequence, for each vibrato segment in the plurality of vibrato segments, the progress bar corresponding to that vibrato segment may be highlighted starting from the start time of the vibrato segment.
As an example, while the user records the audio to be scored, the progress bar corresponding to a vibrato segment may be highlighted with a wavy line or by highlighting the progress bar itself. Illustratively, referring to fig. 8, during recording a wavy line may be displayed above the progress bar from the start time of a vibrato segment until its end time, at which point the wavy line is no longer displayed.
In this way, whenever the user sings vibrato during recording, the vibrato is displayed in real time, so that the user can gauge their own singing skill in real time, which improves the user experience.
Step 102: obtaining trill characteristic information of a fundamental frequency sequence corresponding to a plurality of trill segments, wherein the trill characteristic information at least comprises spectrum distribution stability and sequence amplitude.
Because the plurality of vibrato fragments are selected from the vibrato fragments corresponding to the target fundamental frequency sequence, and the stability of the frequency spectrum distribution and the sequence amplitude of the target fundamental frequency sequence are determined in the step 101, the vibrato characteristic information of the fundamental frequency sequence corresponding to the plurality of vibrato fragments can be directly obtained.
Step 103: and determining the trill scores of the trill segments according to the trill segment durations of the trill segments and the spectral distribution stability and the sequence amplitude of the fundamental frequency sequences corresponding to the trill segments.
In some embodiments, determining the vibrato scores for a plurality of vibrato fragments may include the steps of:
(1) Determining duration scores for the plurality of vibrato segments based on the vibrato segment durations of the plurality of vibrato segments.
As an example, for a first vibrato fragment in the plurality of vibrato fragments, when the vibrato fragment duration of the first vibrato fragment is smaller than a first duration threshold, the duration score of the first vibrato fragment is determined according to the vibrato fragment duration of the first vibrato fragment and a second numerical value, where the first vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or, when the vibrato fragment duration of the first vibrato fragment is greater than or equal to the first duration threshold, the second numerical value is determined as the duration score of the first vibrato fragment.
The first duration threshold may be set by a user according to actual needs, or may be set by default by the device, which is not limited in the embodiment of the present application.
That is, the duration score of each of the plurality of vibrato fragments may be determined by the above method, and one of two calculations applies depending on the duration of the vibrato fragment.
Illustratively, the time-length score of the vibrato fragment may be calculated by the following formula (5).
Figure BDA0002261392150000151
Where $S_1$ represents the duration score of a single vibrato fragment, t represents the duration of a single vibrato fragment, and 35 is the second numerical value.
It should be noted that, in the above formula (5), only 35 is taken as an example as the second numerical value, and 0.4 and 1.5 in the formula are also only exemplary numerical values, in an actual implementation, both the second numerical value and other numerical values in the formula (5) may be set by a user according to actual needs, or may be set by default by a device, which is not limited in this embodiment of the application.
(2) Multiplying the spectrum distribution stability of the fundamental frequency sequence corresponding to each vibrato fragment by the first numerical value to obtain the stability score of each vibrato fragment.
Illustratively, the stability score of a vibrato fragment may be calculated by the following equation (6).
$$S_2 = 25 \times \text{likelihood} \tag{6}$$
where $S_2$ represents the stability score of a single vibrato fragment, likelihood represents the spectrum distribution stability of a single vibrato fragment, and 25 is the first numerical value.
It should be noted that, in the above formula (6), only 25 is taken as an example of the first numerical value, and in an actual implementation, the first numerical value may be set by a user according to an actual need, or may be set by a default of a device, which is not limited in this embodiment of the application.
(3) Determining amplitude scores of the plurality of vibrato fragments based on the sequence amplitudes of the fundamental frequency sequences corresponding to the plurality of vibrato fragments.
As an example, for a second vibrato fragment in the plurality of vibrato fragments, when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is smaller than an amplitude threshold, according to the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment and a third numerical value, determining the amplitude score of the second vibrato fragment, wherein the second vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is greater than or equal to the amplitude threshold value, determining the third numerical value as the amplitude score of the second vibrato fragment.
The amplitude threshold may be set by a user according to actual needs, or may be set by default by a device, which is not limited in the embodiment of the present application.
That is, the amplitude score of each of the plurality of vibrato fragments may be determined by the above method, and one of two calculations applies depending on the sequence amplitude of the fundamental frequency sequence corresponding to the vibrato fragment.
Illustratively, the amplitude score of a vibrato fragment may be calculated by the following formula (7).
Figure BDA0002261392150000161
Where $S_3$ represents the amplitude score of a single vibrato fragment, extent represents the sequence amplitude of a single vibrato fragment, and 10 is the third numerical value.
It should be noted that, in the above formula (7), only 10 is taken as an example for the third numerical value, and 0.15 and 1 in the formula are also only exemplary numerical values, in an actual implementation, the third numerical value and other numerical values in the formula (7) may be set by a user according to an actual need, or may be set by default by a device, which is not limited in this embodiment of the application.
(4) Determining the sum of the duration score, the stability score and the amplitude score of each vibrato fragment as the vibrato score of that vibrato fragment, to obtain the vibrato scores of the plurality of vibrato fragments.
That is, for a single vibrato fragment, the duration score, the stability score and the amplitude score of the vibrato fragment are added, and the resulting sum is determined as the vibrato score of the vibrato fragment. Performing this operation for each of the plurality of vibrato fragments gives the vibrato scores of the plurality of vibrato fragments.
Illustratively, the vibrato score of a single vibrato fragment may be denoted by S.
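As a small sketch of steps (2) and (4), the following Python computes the stability score from formula (6) and sums the three component scores. The duration score and the amplitude score are taken as inputs, because the exact forms of formulas (5) and (7) are not reproduced in this text. The function names are hypothetical, and 25 is only the example first numerical value.

```python
def stability_score(likelihood, first_value=25.0):
    """Formula (6): spectrum distribution stability times the first value."""
    return first_value * likelihood


def vibrato_score(duration_score, stability, amplitude_score):
    """Step (4): S is the sum of the three component scores."""
    return duration_score + stability + amplitude_score
```

For instance, a fragment with a spectrum distribution stability of 0.8 has a stability score of 20, which is then added to its duration score and amplitude score.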
It should be noted that the vibrato score of each vibrato fragment can be displayed in real time during the recording of the audio to be scored.
Step 104: Determining the audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato segments and the number of vibrato segments.
In some embodiments, the specific implementation of step 104 may include: determining the highest vibrato score from the vibrato scores of the plurality of vibrato segments, and determining a vibrato number score according to the number of vibrato segments; and determining the sum of the highest vibrato score and the vibrato number score as the audio vibrato score of the audio to be scored.
As an example, the plurality of vibrato fragments may be arranged in order of their vibrato scores from high to low, and the vibrato score of the vibrato fragment sequentially ranked first may be determined as the highest vibrato score.
Exemplarily, assuming that the vibrato score of the vibrato fragment a is 60, the vibrato score of the vibrato fragment B is 70, the vibrato score of the vibrato fragment C is 68, and the vibrato score of the vibrato fragment D is 62, the four vibrato fragments are sorted to obtain the BCDA, and the vibrato score of the vibrato fragment B can be determined as the highest vibrato score, i.e., the highest vibrato score is 70.
As an example, the vibrato number score is determined in one of two ways depending on the number of vibrato segments: when the number of vibrato segments is smaller than a number threshold, the number of vibrato segments is determined as the vibrato number score; or, when the number of vibrato segments is greater than or equal to the number threshold, a fourth numerical value is determined as the vibrato number score.
The number threshold may be set by a user according to actual needs, or may be set by default by the device, which is not limited in the embodiment of the present application.
Illustratively, the vibrato number score may be calculated by the following formula (8).
$$S_4 = \begin{cases} x, & x < N \\ 30, & x \ge N \end{cases} \tag{8}$$
Where $S_4$ is the vibrato number score, $x$ is the number of vibrato fragments, $N$ is the number threshold, and 30 is the fourth numerical value.
It should be noted that, in the above formula (8), only 30 is taken as an example of the fourth numerical value, and in an actual implementation, the fourth numerical value may be set by a user according to an actual need, or may be set by a default of a device, which is not limited in this embodiment of the application.
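The piecewise rule of formula (8) can be sketched in a few lines of Python. This is only an illustration: the function name and parameter names are hypothetical, and the default number threshold of 30 is an assumption chosen to match the example fourth numerical value, since the text leaves both values configurable.

```python
def vibrato_number_score(num_fragments, number_threshold=30, fourth_value=30):
    """Vibrato number score per formula (8).

    number_threshold=30 is an assumed default; the source only gives 30
    as an example of the fourth numerical value.
    """
    if num_fragments < 0:
        raise ValueError("number of vibrato fragments cannot be negative")
    if num_fragments < number_threshold:
        # Below the threshold the count itself is the score.
        return num_fragments
    # At or above the threshold the score is capped at the fourth value.
    return fourth_value


print(vibrato_number_score(4))   # 4
print(vibrato_number_score(45))  # 30
```

Capping the score keeps a recording with very many short vibrato fragments from dominating the audio vibrato score.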
As an example, after determining the highest vibrato score and vibrato number score, the audio vibrato score of the audio to be scored may be determined by the following equation (9).
$$\text{audio vibrato score} = \max(S) + S_4 \tag{9}$$
Continuing with the above example, assuming that the audio to be scored includes 4 vibrato fragments, the vibrato number score may be determined to be 4, the highest vibrato score is 70, and the audio vibrato score of the audio to be scored is therefore 70 + 4 = 74.
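As a minimal sketch, formula (9) can be applied to the four example fragments as follows. The function name, the default threshold and cap of 30, and the choice to return 0 when no vibrato fragment is found are illustrative assumptions, not details given in the text.

```python
def audio_vibrato_score_max(fragment_scores, number_threshold=30, fourth_value=30):
    """Highest fragment score plus the vibrato number score (formula (9))."""
    scores = list(fragment_scores)
    if not scores:
        return 0  # assumption: no vibrato fragments means no vibrato score
    count = len(scores)
    number_score = count if count < number_threshold else fourth_value
    return max(scores) + number_score


scores = {"A": 60, "B": 70, "C": 68, "D": 62}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)                                    # ['B', 'C', 'D', 'A']
print(audio_vibrato_score_max(scores.values()))   # 74
```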
In other embodiments, an average vibrato score of the plurality of vibrato fragments may be determined according to the vibrato scores of the plurality of vibrato fragments, and a vibrato number score may be determined according to the number of vibrato fragments; the sum of the average vibrato score and the vibrato number score is then determined as the audio vibrato score of the audio to be scored.
Continuing with the above example, the average vibrato score may be determined as (60 + 70 + 68 + 62) ÷ 4 = 65, the vibrato number score is 4, and the audio vibrato score of the audio to be scored may be determined as 65 + 4 = 69.
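The averaging variant differs from formula (9) only in replacing the maximum with the mean. A hedged sketch, using the same hypothetical threshold and cap of 30 as example values:

```python
def audio_vibrato_score_mean(fragment_scores, number_threshold=30, fourth_value=30):
    """Average fragment score plus the vibrato number score."""
    scores = list(fragment_scores)
    if not scores:
        return 0.0  # assumption: no vibrato fragments means no vibrato score
    count = len(scores)
    number_score = count if count < number_threshold else fourth_value
    return sum(scores) / count + number_score


print(audio_vibrato_score_mean([60, 70, 68, 62]))  # 69.0
```

For the four example fragments the average is 65 and the vibrato number score is 4, so the sum is 69.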
Step 105: scoring the audio to be scored based on the audio vibrato score.
In some embodiments, after determining the audio vibrato score of the audio to be scored, the audio to be scored may be scored based on the audio vibrato score.
Illustratively, the fundamental frequency sequence of the original singing audio and the fundamental frequency sequence of the audio to be scored may each be extracted by a fundamental frequency extraction method, the similarity between the two sequences may be determined by dynamic time warping, and the score of the audio to be scored may then be determined according to the similarity. For example, the similarity itself may be used as the score of the audio to be scored, or the product of the similarity and a preset value may be used as the score. The audio vibrato score and the score of the audio to be scored may then be added to obtain the total score of the audio to be scored; alternatively, the two may be combined as a weighted sum to obtain the total score; alternatively, the score of the audio to be scored and the audio vibrato score may be displayed separately on a page.
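The intonation scoring and score combination described above can be sketched as follows. The dynamic time warping routine is the textbook recurrence with an absolute-difference local cost. The mapping from warping distance to a similarity in (0, 1], the preset value of 100, and the weights are all assumptions introduced for illustration; the text does not specify them.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # skip a frame of a
                                     cost[i][j - 1],      # skip a frame of b
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def similarity(reference_f0, sung_f0):
    """Map the DTW distance to (0, 1]; this mapping is an assumption."""
    distance = dtw_distance(reference_f0, sung_f0)
    return 1.0 / (1.0 + distance / max(len(reference_f0), len(sung_f0)))


def total_score(sim, audio_vibrato_score, preset_value=100.0,
                w_intonation=1.0, w_vibrato=1.0):
    """Weighted sum of the intonation score and the audio vibrato score."""
    return w_intonation * sim * preset_value + w_vibrato * audio_vibrato_score
```

With both weights set to 1 this reduces to the plain addition described in the text; other weights give the weighted variant.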
The preset value may be set by a user according to actual needs, or may be set by default by the device, which is not limited in the embodiment of the present application.
As an example, the total score of the audio to be scored may be displayed on the page when recording of the audio to be scored ends; alternatively, referring to fig. 9, the score of the audio to be scored may be displayed on the page, and the audio vibrato score may be converted into a star rating and displayed on the page as stars.
In the embodiment of the present application, a plurality of vibrato fragments corresponding to the fundamental frequency sequence of the audio to be scored are first obtained; that is, the vibrato fragments are extracted from the audio to be scored separately. Vibrato feature information of the fundamental frequency sequences corresponding to the vibrato fragments is then obtained, the vibrato feature information including at least spectral distribution stability and sequence amplitude. The vibrato score of each vibrato fragment is determined according to its duration and the spectral distribution stability and sequence amplitude of its fundamental frequency sequence, so that the performance of the audio to be scored in terms of vibrato can be described. The audio vibrato score of the audio to be scored is determined according to the vibrato scores and the number of the vibrato fragments, and the audio to be scored is scored based on the audio vibrato score. In this way, vibrato is considered in addition to the intonation considered in conventional scoring, which addresses the problem in the prior art that a scoring method focusing only on intonation is too narrow and yields a one-sided score.
Fig. 10 is a schematic structural diagram illustrating an audio scoring apparatus, which may be implemented by software, hardware or a combination of the two as a part or all of a device, according to an exemplary embodiment. Referring to fig. 10, the apparatus includes: a first obtaining module 1001, a second obtaining module 1002, a first determining module 1003, a second determining module 1004, and a scoring module 1005.
A first obtaining module 1001, configured to obtain a plurality of vibrato fragments corresponding to a fundamental frequency sequence of an audio to be scored;
a second obtaining module 1002, configured to obtain vibrato feature information of the fundamental frequency sequences corresponding to the plurality of vibrato fragments, where the vibrato feature information includes at least spectral distribution stability and sequence amplitude;
a first determining module 1003, configured to determine vibrato scores of the plurality of vibrato fragments according to the vibrato fragment durations of the plurality of vibrato fragments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato fragments;
a second determining module 1004, configured to determine the audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato fragments and the number of vibrato fragments;
a scoring module 1005, configured to score the audio to be scored based on the audio vibrato score.
In a possible implementation manner of the present application, the first determining module 1003 is configured to:
determining duration scores of the plurality of vibrato fragments based on the vibrato fragment durations of the plurality of vibrato fragments;
multiplying the spectral distribution stability of the fundamental frequency sequence corresponding to each vibrato fragment by a first numerical value to obtain the stability score of each vibrato fragment;
determining amplitude scores of the plurality of vibrato fragments based on the sequence amplitudes of the fundamental frequency sequences corresponding to the plurality of vibrato fragments;
and determining the sum of the duration score, the stability score, and the amplitude score of each vibrato fragment as the vibrato score of that vibrato fragment, to obtain the vibrato scores of the plurality of vibrato fragments.
In a possible implementation manner of the present application, the first determining module 1003 is configured to:
for a first vibrato fragment in the plurality of vibrato fragments, when the vibrato fragment duration of the first vibrato fragment is smaller than a first duration threshold, determining the duration score of the first vibrato fragment according to the vibrato fragment duration of the first vibrato fragment and a second numerical value, wherein the first vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or,
when the vibrato fragment duration of the first vibrato fragment is greater than or equal to the first duration threshold, determining the second numerical value as the duration score of the first vibrato fragment.
In a possible implementation manner of the present application, the first determining module 1003 is configured to:
for a second vibrato fragment in the plurality of vibrato fragments, when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is smaller than an amplitude threshold, determining the amplitude score of the second vibrato fragment according to the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment and a third numerical value, wherein the second vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or,
when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is greater than or equal to the amplitude threshold, determining the third numerical value as the amplitude score of the second vibrato fragment.
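The three components handled by the first determining module 1003 can be combined as in the following sketch. Several details are assumptions made only for illustration: below each threshold the score is taken to grow linearly with the measured value (the text here says only that it depends on the value and the cap), the stability is taken to lie between 0 and 1, and the default first, second, and third numerical values and both thresholds are hypothetical.

```python
def capped_linear(value, threshold, cap):
    """Grow linearly up to `threshold`, then hold at `cap` (assumed shape)."""
    if value >= threshold:
        return cap
    return cap * value / threshold


def fragment_vibrato_score(duration_s, stability, amplitude,
                           first_value=40.0,          # hypothetical stability weight
                           second_value=30.0,         # hypothetical duration cap
                           third_value=30.0,          # hypothetical amplitude cap
                           duration_threshold_s=2.0,  # hypothetical threshold
                           amplitude_threshold=1.0):  # hypothetical threshold
    """Sum of duration score, stability score, and amplitude score."""
    duration_score = capped_linear(duration_s, duration_threshold_s, second_value)
    stability_score = stability * first_value
    amplitude_score = capped_linear(amplitude, amplitude_threshold, third_value)
    return duration_score + stability_score + amplitude_score


print(fragment_vibrato_score(1.0, 0.5, 0.5))  # 50.0
print(fragment_vibrato_score(3.0, 1.0, 2.0))  # 100.0
```

With these example values a long, stable, wide vibrato reaches the full 100 points, and the caps prevent any single property from contributing more than its share.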
In one possible implementation manner of the present application, the second determining module 1004 is configured to:
determining the highest vibrato score from the vibrato scores of the plurality of vibrato fragments;
determining the vibrato number score according to the number of vibrato fragments;
and determining the sum of the highest vibrato score and the vibrato number score as the audio vibrato score of the audio to be scored.
In one possible implementation manner of the present application, the second determining module 1004 is configured to:
when the number of vibrato fragments is smaller than a number threshold, determining the number of vibrato fragments as the vibrato number score; or,
when the number of vibrato fragments is greater than or equal to the number threshold, determining a fourth numerical value as the vibrato number score.
In one possible implementation manner of the present application, the first obtaining module 1001 is configured to:
performing fast Fourier transform processing on the fundamental frequency sequence of the audio to be scored within a plurality of windows, using a designated duration as the window duration and a designated step length as the moving distance, to obtain frequency spectrums corresponding to the plurality of windows;
squaring the frequency spectrums corresponding to the plurality of windows respectively to obtain power spectrums corresponding to the plurality of windows;
determining, according to the power spectrums corresponding to the plurality of windows, the ratio of the power spectrum energy within a designated frequency band to the total power spectrum energy in each window, to obtain a vibrato possibility value corresponding to each window;
and determining the plurality of vibrato fragments from the audio to be scored according to the vibrato possibility values corresponding to the plurality of windows.
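The windowed analysis above can be sketched as follows. To stay self-contained, the sketch uses a naive discrete Fourier transform in place of a fast Fourier transform; the two give the same spectrum. The 0.5 s window, 0.1 s step, and 4 to 8 Hz band are hypothetical example values, and subtracting the window mean before the transform is an added assumption so that the constant pitch level does not dominate the total energy.

```python
import cmath
import math


def power_spectrum(frame):
    """Squared magnitude of a naive DFT for bins 0..N//2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energy_ratio(frame, frame_rate, band=(4.0, 8.0)):
    """Share of power-spectrum energy that falls inside `band` (in Hz)."""
    n = len(frame)
    mean = sum(frame) / n
    power = power_spectrum([x - mean for x in frame])  # mean removal is an assumption
    total = sum(power)
    if total == 0.0:
        return 0.0  # perfectly flat pitch: no modulation at all
    in_band = sum(p for k, p in enumerate(power)
                  if band[0] <= k * frame_rate / n <= band[1])
    return in_band / total


def vibrato_likelihoods(f0, frame_rate, window_s=0.5, step_s=0.1, band=(4.0, 8.0)):
    """One vibrato possibility value per sliding window over the f0 sequence."""
    win = max(2, int(round(window_s * frame_rate)))
    step = max(1, int(round(step_s * frame_rate)))
    return [band_energy_ratio(f0[start:start + win], frame_rate, band)
            for start in range(0, len(f0) - win + 1, step)]


# Synthetic pitch track: 220 with a 6 Hz modulation, 100 f0 frames per second.
demo_f0 = [220.0 + 3.0 * math.sin(2 * math.pi * 6.0 * t / 100.0) for t in range(200)]
demo_likelihoods = vibrato_likelihoods(demo_f0, frame_rate=100.0)
```

In this synthetic case every window has nearly all of its energy inside the band, so each possibility value is close to 1, whereas a constant pitch track yields 0 for every window.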
In one possible implementation manner of the present application, the first obtaining module 1001 is configured to:
determining, from the fundamental frequency sequence of the audio to be scored, a candidate fundamental frequency sequence whose vibrato possibility value is greater than or equal to a possibility threshold, according to the vibrato possibility values corresponding to the plurality of windows;
determining the spectral distribution stability, frequency, and sequence amplitude of a target candidate fundamental frequency sequence whose continuous duration is greater than a second duration threshold;
and determining the plurality of vibrato fragments from the audio to be scored according to the spectral distribution stability, frequency, and sequence amplitude of the target candidate fundamental frequency sequence.
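The first two steps of this selection can be sketched as grouping consecutive windows. The function name, the possibility threshold of 0.5, and the second duration threshold of 0.6 s are hypothetical; the final filtering on stability, frequency, and sequence amplitude is not shown because its criteria are not stated in this passage.

```python
def candidate_spans(likelihoods, step_s, window_s,
                    likelihood_threshold=0.5, min_duration_s=0.6):
    """Return (start_s, end_s) spans of consecutive windows at or above the
    possibility threshold whose continuous duration exceeds min_duration_s."""
    spans = []
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last window.
    for i, value in enumerate(list(likelihoods) + [float("-inf")]):
        if value >= likelihood_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            begin = run_start * step_s
            end = (i - 1) * step_s + window_s  # end of the last window in the run
            if end - begin > min_duration_s:
                spans.append((begin, end))
            run_start = None
    return spans


spans = candidate_spans([0.1, 0.9, 0.9, 0.9, 0.2, 0.8, 0.1],
                        step_s=0.1, window_s=0.5)
```

Here the three adjacent high-likelihood windows form one span of about 0.7 s and are kept, while the isolated window covers only 0.5 s and is dropped.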
In one possible implementation manner of the present application, the first obtaining module 1001 is configured to:
performing fast Fourier transform processing on the target candidate fundamental frequency sequence, and squaring the result to obtain the power spectrum of the target candidate fundamental frequency sequence;
determining, according to the power spectrum of the target candidate fundamental frequency sequence, the ratio of the power spectrum energy of the target candidate fundamental frequency sequence within a preset frequency band to its total power spectrum energy, to obtain the spectral distribution stability of the target candidate fundamental frequency sequence;
determining the frequency of the target candidate fundamental frequency sequence according to the period durations and the number of periods of a plurality of fundamental frequency vibration periods of the target candidate fundamental frequency sequence, wherein, after mean removal, the fundamental frequency sequence crosses zero twice within each fundamental frequency vibration period;
determining the sequence amplitude of the target candidate fundamental frequency sequence according to the fundamental frequency differences and the number of periods of the plurality of fundamental frequency vibration periods of the target candidate fundamental frequency sequence, wherein the fundamental frequency difference of each fundamental frequency vibration period is the difference between the maximum value and the minimum value of the fundamental frequency within that period.
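The period-based frequency and sequence amplitude estimates can be sketched as below. The sketch assumes that "frequency" means the number of periods divided by their total duration and that "sequence amplitude" means the average per-period difference between maximum and minimum; both readings, and all function names, are assumptions.

```python
import math


def vibration_periods(f0):
    """Split a mean-removed f0 sequence into periods of two zero crossings each.

    Returns (start_index, end_index) pairs; end_index is exclusive.
    """
    mean = sum(f0) / len(f0)
    centered = [x - mean for x in f0]
    crossings = [i for i in range(1, len(centered))
                 if (centered[i - 1] < 0) != (centered[i] < 0)]
    bounds = crossings[::2]  # every second crossing starts a new period
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


def frequency_and_amplitude(f0, frame_rate):
    """Vibrato frequency in Hz and average peak-to-peak amplitude."""
    periods = vibration_periods(f0)
    if not periods:
        return 0.0, 0.0  # assumption: too short to contain one full period
    total_frames = sum(end - start for start, end in periods)
    frequency = len(periods) * frame_rate / total_frames
    amplitude = sum(max(f0[s:e]) - min(f0[s:e]) for s, e in periods) / len(periods)
    return frequency, amplitude


# Synthetic 5 Hz vibrato, +/-3 units deep, sampled at 100 f0 frames per second.
demo_f0 = [220.0 + 3.0 * math.sin(2 * math.pi * 5.0 * t / 100.0 + 0.3)
           for t in range(100)]
demo_frequency, demo_amplitude = frequency_and_amplitude(demo_f0, frame_rate=100.0)
```

For this synthetic input the estimate recovers a frequency of 5 Hz and a peak-to-peak amplitude just under 6, as expected for a modulation depth of 3.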
In one possible implementation manner of the present application, the first obtaining module 1001 is further configured to:
for each vibrato fragment in the plurality of vibrato fragments, highlighting the progress bar corresponding to each vibrato fragment from the start time of each vibrato fragment.
In the embodiment of the present application, a plurality of vibrato fragments corresponding to the fundamental frequency sequence of the audio to be scored are first obtained; that is, the vibrato fragments are extracted from the audio to be scored separately. Vibrato feature information of the fundamental frequency sequences corresponding to the vibrato fragments is then obtained, the vibrato feature information including at least spectral distribution stability and sequence amplitude. The vibrato score of each vibrato fragment is determined according to its duration and the spectral distribution stability and sequence amplitude of its fundamental frequency sequence, so that the performance of the audio to be scored in terms of vibrato can be described. The audio vibrato score of the audio to be scored is determined according to the vibrato scores and the number of the vibrato fragments, and the audio to be scored is scored based on the audio vibrato score. In this way, vibrato is considered in addition to the intonation considered in conventional scoring, which addresses the problem in the prior art that a scoring method focusing only on intonation is too narrow and yields a one-sided score.
It should be noted that, when the audio scoring apparatus provided in the above embodiment scores audio, the division into the above functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio scoring apparatus and the audio scoring method provided by the above embodiments belong to the same concept; the specific implementation process is described in the method embodiments and is not repeated here.
Fig. 11 is a block diagram of a device 1100 according to an embodiment of the present disclosure. The device 1100 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The device 1100 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the device 1100 includes: a processor 1101 and a memory 1102.
Processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor, the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 can also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one instruction for execution by processor 1101 to implement the scoring method for audio provided by method embodiments herein.
In some embodiments, the apparatus 1100 may also optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102 and peripheral interface 1103 may be connected by a bus or signal lines. Various peripheral devices may be connected to the peripheral interface 1103 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, touch display screen 1105, camera 1106, audio circuitry 1107, positioning component 1108, and power supply 1109.
The peripheral interface 1103 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral device interface 1103 may be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The Radio Frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1104 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to capture touch signals on or over the surface of the display screen 1105. The touch signal may be input to the processor 1101 as a control signal for processing. At this point, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display screen 1105 may be one, providing the front panel of the device 1100; in other embodiments, the display screens 1105 may be at least two, each disposed on a different surface of the device 1100 or in a folded design; in still other embodiments, the display 1105 may be a flexible display disposed on a curved surface or on a folded surface of the device 1100. Even further, the display screen 1105 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display screen 1105 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
Camera assembly 1106 is used to capture images or video. Optionally, camera assembly 1106 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the device, and the rear camera is disposed on the rear surface of the device. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be combined to realize a background blurring function, and the main camera and the wide-angle camera can be combined to realize panoramic shooting and VR (Virtual Reality) shooting functions or other combined shooting functions. In some embodiments, camera assembly 1106 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuitry 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing or inputting the electric signals to the radio frequency circuit 1104 to achieve voice communication. The microphones may be multiple and placed at different locations on the device 1100 for stereo sound acquisition or noise reduction purposes. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1107 may also include a headphone jack.
The positioning component 1108 is used to locate the current geographic location of the device 1100 for navigation or LBS (Location Based Service). The positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 1109 is used to provide power to the various components of the device 1100. The power supply 1109 may be alternating current, direct current, disposable or rechargeable. When the power supply 1109 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the device 1100 also includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyro sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
The acceleration sensor 1111 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the apparatus 1100. For example, the acceleration sensor 1111 may be configured to detect components of the gravitational acceleration in three coordinate axes. The processor 1101 may control the touch display screen 1105 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1112 may detect a body direction and a rotation angle of the device 1100, and the gyro sensor 1112 may acquire a 3D motion of the user to the device 1100 in cooperation with the acceleration sensor 1111. From the data collected by gyroscope sensor 1112, processor 1101 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensors 1113 may be disposed on the side bezel of the device 1100 and/or on the underlying layers of the touch display screen 1105. When the pressure sensor 1113 is disposed on the side frame of the device 1100, the holding signal of the user to the device 1100 can be detected, and the processor 1101 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed at the lower layer of the touch display screen 1105, the processor 1101 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 1105. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1114 is configured to collect a fingerprint of the user, and the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the user is authorized by the processor 1101 to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 1114 may be disposed on the front, back, or side of the device 1100. When a physical key or vendor Logo is provided on the device 1100, the fingerprint sensor 1114 may be integrated with the physical key or vendor Logo.
Optical sensor 1115 is used to collect ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the touch display screen 1105 based on the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1105 is turned down. In another embodiment, processor 1101 may also dynamically adjust the shooting parameters of camera assembly 1106 based on the ambient light intensity collected by optical sensor 1115.
A proximity sensor 1116, also referred to as a distance sensor, is typically provided on the front panel of the device 1100. The proximity sensor 1116 is used to capture the distance between the user and the front of the device 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front of the device 1100 is gradually decreasing, the processor 1101 controls the touch display screen 1105 to switch from a screen-on state to a screen-off state; when the proximity sensor 1116 detects that the distance between the user and the front of the device 1100 is gradually increasing, the processor 1101 controls the touch display screen 1105 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 11 does not constitute a limitation of the device 1100, which may include more or fewer components than those shown, combine some components, or employ a different arrangement of components.
In some embodiments, a computer-readable storage medium is also provided, in which a computer program is stored, which when executed by a processor implements the steps of the scoring method for audio in the above embodiments. For example, the computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is noted that the computer-readable storage medium referred to herein may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the audio scoring method described above.
The above-mentioned embodiments are provided not to limit the present application, and any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (22)

1. A method for scoring audio, the method comprising:
acquiring a plurality of vibrato fragments corresponding to a fundamental frequency sequence of the audio to be scored;
acquiring vibrato feature information of the fundamental frequency sequences corresponding to the plurality of vibrato fragments, wherein the vibrato feature information includes at least spectral distribution stability and sequence amplitude;
determining the vibrato scores of the plurality of vibrato fragments according to the vibrato fragment durations of the plurality of vibrato fragments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato fragments;
determining the audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato fragments and the number of vibrato fragments;
and scoring the audio to be scored based on the audio vibrato score.
2. The method according to claim 1, wherein determining the vibrato scores of the plurality of vibrato segments according to the vibrato segment durations of the plurality of vibrato segments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato segments comprises:
determining time length scores of the plurality of vibrato fragments based on vibrato fragment time lengths of the plurality of vibrato fragments;
multiplying the spectral distribution stability of the fundamental frequency sequence corresponding to each vibrato fragment by a first numerical value to obtain the stability score of each vibrato fragment;
determining amplitude scores of the plurality of vibrato fragments based on sequence amplitudes of fundamental frequency sequences corresponding to the plurality of vibrato fragments;
and determining, for each vibrato fragment, the sum of its time length score, stability score and amplitude score as the vibrato score of that vibrato fragment, to obtain the vibrato scores of the plurality of vibrato fragments.
3. The method of claim 2, wherein determining the time duration scores for the plurality of vibrato segments based on vibrato segment time durations for the plurality of vibrato segments comprises:
for a first vibrato fragment in the plurality of vibrato fragments, when the vibrato fragment time length of the first vibrato fragment is smaller than a first time length threshold value, determining the time length score of the first vibrato fragment according to the vibrato fragment time length of the first vibrato fragment and a second numerical value, wherein the first vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or,
when the vibrato fragment time length of the first vibrato fragment is greater than or equal to the first time length threshold value, determining the second numerical value as the time length score of the first vibrato fragment.
4. The method of claim 2, wherein determining amplitude scores for the plurality of vibrato segments based on sequence amplitudes of the sequence of fundamental frequencies to which the plurality of vibrato segments correspond comprises:
for a second vibrato fragment in the plurality of vibrato fragments, when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is smaller than an amplitude threshold value, determining the amplitude score of the second vibrato fragment according to the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment and a third numerical value, wherein the second vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or,
when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is greater than or equal to the amplitude threshold value, determining the third numerical value as the amplitude score of the second vibrato fragment.
5. The method of claim 1, wherein determining the audio vibrato score of the audio to be scored according to the vibrato scores and the number of vibrato segments of the plurality of vibrato segments comprises:
determining a highest vibrato score from the vibrato scores of the plurality of vibrato fragments;
determining a vibrato number score according to the number of vibrato fragments in the plurality of vibrato fragments;
and determining the sum of the highest vibrato score and the vibrato number score as the audio vibrato score of the audio to be scored.
6. The method of claim 5, wherein determining a vibrato number score based on a vibrato segment number of the plurality of vibrato segments comprises:
determining the number of the vibrato segments as the vibrato number score when the number of the vibrato segments is smaller than a number threshold; or,
when the number of the vibrato segments is larger than or equal to the number threshold, determining a fourth numerical value as the vibrato number score.
7. The method of claim 1, wherein the obtaining of the plurality of vibrato segments corresponding to the fundamental frequency sequence of the audio to be scored comprises:
using a specified duration as the window duration and a specified step as the moving distance, performing fast Fourier transform processing on the fundamental frequency sequence of the audio to be scored in a plurality of windows to obtain frequency spectrums corresponding to the plurality of windows;
squaring the frequency spectrum corresponding to each window to obtain power spectrums corresponding to the plurality of windows;
determining, according to the power spectrums corresponding to the plurality of windows, the ratio of the power spectrum energy within a specified frequency band to the total power spectrum energy in each window, to obtain a vibrato likelihood value corresponding to each window;
and determining the plurality of vibrato fragments from the audio to be scored according to the vibrato likelihood values corresponding to the plurality of windows.
8. The method of claim 7, wherein determining the plurality of vibrato segments from the audio to be scored according to the vibrato likelihood values corresponding to the plurality of windows comprises:
determining, from the fundamental frequency sequence of the audio to be scored and according to the vibrato likelihood values corresponding to the plurality of windows, candidate fundamental frequency sequences whose vibrato likelihood values are greater than or equal to a likelihood threshold;
determining the spectral distribution stability, frequency and sequence amplitude of a target candidate fundamental frequency sequence whose continuous duration is greater than a second duration threshold;
and determining the plurality of vibrato fragments from the audio to be scored according to the spectral distribution stability, the frequency and the sequence amplitude of the target candidate fundamental frequency sequence.
9. The method of claim 8, wherein the determining the stability of the spectral distribution, the frequency and the sequence amplitude of the target candidate sequence of fundamental frequencies having a continuous duration greater than a second duration threshold comprises:
performing fast Fourier transform processing on the target candidate fundamental frequency sequence, and squaring the result to obtain a power spectrum of the target candidate fundamental frequency sequence;
determining, according to the power spectrum of the target candidate fundamental frequency sequence, the ratio of the power spectrum energy of the target candidate fundamental frequency sequence within a preset frequency band to its total power spectrum energy, to obtain the spectral distribution stability of the target candidate fundamental frequency sequence;
determining the frequency of the target candidate fundamental frequency sequence according to the period durations and the number of periods of a plurality of fundamental frequency vibration periods of the target candidate fundamental frequency sequence, wherein, after mean removal, the fundamental frequency sequence crosses zero twice within each fundamental frequency vibration period;
and determining the sequence amplitude of the target candidate fundamental frequency sequence according to the fundamental frequency differences and the number of periods of the plurality of fundamental frequency vibration periods of the target candidate fundamental frequency sequence, wherein the fundamental frequency difference of each fundamental frequency vibration period is the difference between the maximum value and the minimum value of the fundamental frequency within that period.
10. The method of claim 1, wherein after obtaining the plurality of vibrato segments corresponding to the fundamental frequency sequence of the audio to be scored, the method further comprises:
for each vibrato fragment in the plurality of vibrato fragments, highlighting the progress bar corresponding to that vibrato fragment, starting from the start time of that vibrato fragment.
11. An apparatus for scoring audio, the apparatus comprising:
a first obtaining module, configured to obtain a plurality of vibrato segments corresponding to a fundamental frequency sequence of the audio to be scored;
a second obtaining module, configured to obtain vibrato feature information of the fundamental frequency sequences corresponding to the plurality of vibrato segments, wherein the vibrato feature information at least comprises spectral distribution stability and sequence amplitude;
a first determining module, configured to determine vibrato scores of the plurality of vibrato segments according to vibrato segment durations of the plurality of vibrato segments and the spectral distribution stability and sequence amplitude of the fundamental frequency sequences corresponding to the plurality of vibrato segments;
a second determining module, configured to determine an audio vibrato score of the audio to be scored according to the vibrato scores of the plurality of vibrato segments and the number of vibrato segments;
and a scoring module, configured to score the audio to be scored based on the audio vibrato score.
12. The apparatus of claim 11, wherein the first determining module is configured for:
determining time length scores of the plurality of vibrato fragments based on vibrato fragment time lengths of the plurality of vibrato fragments;
multiplying the spectral distribution stability of the fundamental frequency sequence corresponding to each vibrato fragment by a first numerical value to obtain the stability score of each vibrato fragment;
determining amplitude scores of the plurality of vibrato fragments based on sequence amplitudes of fundamental frequency sequences corresponding to the plurality of vibrato fragments;
and determining, for each vibrato fragment, the sum of its time length score, stability score and amplitude score as the vibrato score of that vibrato fragment, to obtain the vibrato scores of the plurality of vibrato fragments.
13. The apparatus of claim 12, wherein the first determining module is configured for:
for a first vibrato fragment in the plurality of vibrato fragments, when the vibrato fragment time length of the first vibrato fragment is smaller than a first time length threshold value, determining the time length score of the first vibrato fragment according to the vibrato fragment time length of the first vibrato fragment and a second numerical value, wherein the first vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or,
when the vibrato fragment time length of the first vibrato fragment is greater than or equal to the first time length threshold value, determining the second numerical value as the time length score of the first vibrato fragment.
14. The apparatus of claim 12, wherein the first determining module is configured for:
for a second vibrato fragment in the plurality of vibrato fragments, when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is smaller than an amplitude threshold value, determining the amplitude score of the second vibrato fragment according to the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment and a third numerical value, wherein the second vibrato fragment is any vibrato fragment in the plurality of vibrato fragments; or,
when the sequence amplitude of the fundamental frequency sequence corresponding to the second vibrato fragment is greater than or equal to the amplitude threshold value, determining the third numerical value as the amplitude score of the second vibrato fragment.
15. The apparatus of claim 11, wherein the second determining module is configured for:
determining a highest vibrato score from the vibrato scores of the plurality of vibrato fragments;
determining a vibrato number score according to the number of vibrato fragments in the plurality of vibrato fragments;
and determining the sum of the highest vibrato score and the vibrato number score as the audio vibrato score of the audio to be scored.
16. The apparatus of claim 15, wherein the second determining module is configured for:
determining the number of the vibrato segments as the vibrato number score when the number of the vibrato segments is smaller than a number threshold; or,
when the number of the vibrato segments is larger than or equal to the number threshold, determining a fourth numerical value as the vibrato number score.
17. The apparatus of claim 11, wherein the first obtaining module is configured for:
using a specified duration as the window duration and a specified step as the moving distance, performing fast Fourier transform processing on the fundamental frequency sequence of the audio to be scored in a plurality of windows to obtain frequency spectrums corresponding to the plurality of windows;
squaring the frequency spectrum corresponding to each window to obtain power spectrums corresponding to the plurality of windows;
determining, according to the power spectrums corresponding to the plurality of windows, the ratio of the power spectrum energy within a specified frequency band to the total power spectrum energy in each window, to obtain a vibrato likelihood value corresponding to each window;
and determining the plurality of vibrato fragments from the audio to be scored according to the vibrato likelihood values corresponding to the plurality of windows.
18. The apparatus of claim 17, wherein the first obtaining module is configured for:
determining, from the fundamental frequency sequence of the audio to be scored and according to the vibrato likelihood values corresponding to the plurality of windows, candidate fundamental frequency sequences whose vibrato likelihood values are greater than or equal to a likelihood threshold;
determining the spectral distribution stability, frequency and sequence amplitude of a target candidate fundamental frequency sequence whose continuous duration is greater than a second duration threshold;
and determining the plurality of vibrato fragments from the audio to be scored according to the spectral distribution stability, the frequency and the sequence amplitude of the target candidate fundamental frequency sequence.
19. The apparatus of claim 18, wherein the first obtaining module is configured for:
performing fast Fourier transform processing on the target candidate fundamental frequency sequence, and squaring the result to obtain a power spectrum of the target candidate fundamental frequency sequence;
determining, according to the power spectrum of the target candidate fundamental frequency sequence, the ratio of the power spectrum energy of the target candidate fundamental frequency sequence within a preset frequency band to its total power spectrum energy, to obtain the spectral distribution stability of the target candidate fundamental frequency sequence;
determining the frequency of the target candidate fundamental frequency sequence according to the period durations and the number of periods of a plurality of fundamental frequency vibration periods of the target candidate fundamental frequency sequence, wherein, after mean removal, the fundamental frequency sequence crosses zero twice within each fundamental frequency vibration period;
and determining the sequence amplitude of the target candidate fundamental frequency sequence according to the fundamental frequency differences and the number of periods of the plurality of fundamental frequency vibration periods of the target candidate fundamental frequency sequence, wherein the fundamental frequency difference of each fundamental frequency vibration period is the difference between the maximum value and the minimum value of the fundamental frequency within that period.
20. The apparatus of claim 11, wherein the first obtaining module is further configured for:
for each vibrato fragment in the plurality of vibrato fragments, highlighting the progress bar corresponding to that vibrato fragment, starting from the start time of that vibrato fragment.
21. An apparatus comprising a memory for storing a computer program and a processor for executing the computer program stored in the memory to perform the steps of the method of any one of claims 1 to 10.
22. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
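
For illustration only, the score combination recited in claims 2 to 6 may be sketched in Python as follows. The claims name the first to fourth numerical values and the thresholds without fixing them, so every constant below is an assumed placeholder, as are the names `VibratoSegment`, `capped_score`, `segment_score`, `count_score` and `audio_vibrato_score`. The claims also do not state how a duration or amplitude below its threshold maps to a score; the linear ramp used here is one possible choice, not the claimed formula.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions; the claims only name them.
FIRST_VALUE = 40.0         # "first numerical value": weight on stability
SECOND_VALUE = 30.0        # "second numerical value": maximum duration score
THIRD_VALUE = 20.0         # "third numerical value": maximum amplitude score
FOURTH_VALUE = 10.0        # "fourth numerical value": maximum number score
DURATION_THRESHOLD = 2.0   # "first time length threshold" (assumed seconds)
AMPLITUDE_THRESHOLD = 1.0  # "amplitude threshold" (assumed unit)
COUNT_THRESHOLD = 10       # "number threshold"


@dataclass
class VibratoSegment:
    duration: float   # vibrato segment duration
    stability: float  # spectral distribution stability (an energy ratio)
    amplitude: float  # sequence amplitude of the fundamental frequency sequence


def capped_score(value, threshold, cap):
    """Return `cap` at or above the threshold, per claims 3 and 4.

    Below the threshold the claims leave the mapping open; a linear ramp
    is ASSUMED here.
    """
    if value < threshold:
        return value / threshold * cap
    return cap


def segment_score(seg):
    """Claim 2: duration score + stability score + amplitude score."""
    duration_score = capped_score(seg.duration, DURATION_THRESHOLD, SECOND_VALUE)
    stability_score = seg.stability * FIRST_VALUE
    amplitude_score = capped_score(seg.amplitude, AMPLITUDE_THRESHOLD, THIRD_VALUE)
    return duration_score + stability_score + amplitude_score


def count_score(segment_count):
    """Claim 6: the count itself below the threshold, else the fourth value."""
    if segment_count < COUNT_THRESHOLD:
        return segment_count
    return FOURTH_VALUE


def audio_vibrato_score(segments):
    """Claim 5: highest segment score plus the number score."""
    if not segments:
        # Assumption: the claims presuppose a plurality of segments, so an
        # empty input is simply scored as zero here.
        return 0.0
    best = max(segment_score(seg) for seg in segments)
    return best + count_score(len(segments))
```

A caller would build one `VibratoSegment` per detected segment and pass the list to `audio_vibrato_score`. The last step of claim 1, scoring the audio based on this value, is left out because the claims do not specify how it is done.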

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911072491.XA CN110867194B (en) 2019-11-05 2019-11-05 Audio scoring method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911072491.XA CN110867194B (en) 2019-11-05 2019-11-05 Audio scoring method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110867194A (en) 2020-03-06
CN110867194B (en) 2022-05-17

Family

ID=69653554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911072491.XA Active CN110867194B (en) 2019-11-05 2019-11-05 Audio scoring method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110867194B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090125298A1 (en) * 2007-11-02 2009-05-14 Melodis Inc. Vibrato detection modules in a system for automatic transcription of sung or hummed melodies
CN104903956A (en) * 2012-10-10 2015-09-09 弗兰霍菲尔运输应用研究公司 Apparatus and method for efficient synthesis of sinusoids and sweeps by employing spectral patterns
CN109817191A (en) * 2019-01-04 2019-05-28 平安科技(深圳)有限公司 Trill modeling method, device, computer equipment and storage medium
CN109979485A (en) * 2019-04-29 2019-07-05 北京小唱科技有限公司 Audio evaluation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李锦珑等: "基于视唱语料的颤音分析及其应用研究", 《自动化与仪器仪表》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114534130A (en) * 2020-11-25 2022-05-27 深圳市安联消防技术有限公司 Method for eliminating airflow noise of breathing mask
CN113593604A (en) * 2021-07-22 2021-11-02 腾讯音乐娱乐科技(深圳)有限公司 Method, device and storage medium for detecting audio quality
CN114061568A (en) * 2021-11-30 2022-02-18 北京信息科技大学 Method, device and system for measuring rotating speed of flying object based on geomagnetic data
CN114061568B (en) * 2021-11-30 2023-11-14 北京信息科技大学 Method, device and system for measuring rotating speed of flying body based on geomagnetic data

Also Published As

Publication number Publication date
CN110867194B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN108008930B (en) Method and device for determining K song score
CN109994127B (en) Audio detection method and device, electronic equipment and storage medium
CN108538302B (en) Method and apparatus for synthesizing audio
CN110688082B (en) Method, device, equipment and storage medium for determining adjustment proportion information of volume
CN110956971B (en) Audio processing method, device, terminal and storage medium
CN109147757B (en) Singing voice synthesis method and device
CN110931048B (en) Voice endpoint detection method, device, computer equipment and storage medium
CN109003621B (en) Audio processing method and device and storage medium
CN109192218B (en) Method and apparatus for audio processing
WO2022111168A1 (en) Video classification method and apparatus
CN110867194B (en) Audio scoring method, device, equipment and storage medium
CN111128232B (en) Music section information determination method and device, storage medium and equipment
CN112735429B (en) Method for determining lyric timestamp information and training method of acoustic model
CN109065068B (en) Audio processing method, device and storage medium
CN109192223B (en) Audio alignment method and device
CN109102811B (en) Audio fingerprint generation method and device and storage medium
CN111081277B (en) Audio evaluation method, device, equipment and storage medium
CN108053832B (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN110600034B (en) Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN111428079A (en) Text content processing method and device, computer equipment and storage medium
CN113362836B (en) Vocoder training method, terminal and storage medium
CN112086102B (en) Method, apparatus, device and storage medium for expanding audio frequency band
CN110337030B (en) Video playing method, device, terminal and computer readable storage medium
CN110377208B (en) Audio playing method, device, terminal and computer readable storage medium
CN109003627B (en) Method, device, terminal and storage medium for determining audio score

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant