WO2020199381A1 - Melody detection method and apparatus for audio signals, and electronic device - Google Patents

Melody detection method and apparatus for audio signals, and electronic device (音频信号的旋律检测方法、装置以及电子设备)

Info

Publication number: WO2020199381A1
Authority: WIPO (PCT)
Application number: PCT/CN2019/093204
Other languages: English (en), French (fr)
Inventor: 吴晓婕
Original Assignee: 广州市百果园信息技术有限公司 (Guangzhou Baiguoyuan Information Technology Co., Ltd.)
Application PCT/CN2019/093204 filed by 广州市百果园信息技术有限公司
Priority to SG11202110700SA
Priority to EP19922753.9A (EP3929921B1)
Priority to US17/441,640 (US20220165239A1)
Publication of WO2020199381A1

Classifications

    • G10H1/383 Chord detection and/or recognition, e.g. for correction, or automatic bass generation
    • G10H1/0008 Associated control or indicating means
    • G10H1/40 Rhythm (accompaniment arrangements)
    • G10L25/18 Speech or voice analysis techniques characterised by the extracted parameters being spectral information of each sub-band
    • G10L25/90 Pitch determination of speech signals
    • G10H2210/056 Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass
    • G10H2210/066 Musical analysis for pitch analysis, e.g. transcription, pitch recognition in polyphonic sounds, estimation of a missing fundamental
    • G10H2210/071 Musical analysis for rhythm pattern analysis or rhythm style recognition
    • G10H2210/076 Musical analysis for extraction of timing or tempo; beat detection
    • G10H2210/081 Musical analysis for automatic key or tonality recognition
    • G10H2210/086 Musical analysis for transcription of raw audio or music data to a displayed or printed staff representation or to displayable MIDI-like note-oriented data
    • G10H2210/471 Natural or just intonation scales
    • G10H2240/141 Library retrieval matching, e.g. query by humming, singing or playing
    • G10L2025/906 Pitch tracking



Abstract

A melody detection method and apparatus for audio signals, and an electronic device. The method comprises: dividing an audio signal into multiple audio segments according to the beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment according to the fundamental frequencies (S1); determining the note name corresponding to each audio segment according to the frequency range to which the pitch value belongs (S2); estimating the mode of the audio signal using the note names of the audio segments to obtain the scale of the audio signal (S3); and determining the melody of the audio signal according to the frequency interval of the pitch value of each audio segment within the scale (S4). The melody detection method can detect the melody of an audio signal sung by a non-professional singer, and can correctly estimate the user's hummed melody even when the user sings off pitch or out of tune.

Description

Melody detection method and apparatus for audio signals, and electronic device

Technical Field

The present invention relates to the field of audio processing, and in particular to a melody detection method and apparatus for audio signals, and an electronic device.

Background

In daily life, singing is an important cultural activity and form of entertainment. As this form of entertainment develops, melody recognition of the songs a user sings becomes necessary, for example to classify the songs or to perform automatic chord matching according to the user's preferences. For users without professional musical training, however, slight pitch inaccuracy (singing out of tune) is unavoidable, which poses a challenge to accurate melody recognition.

The existing technical solution performs speech recognition on the song sung by the user: it recognizes the lyrics in the audio signal of the song and matches the recognized lyrics against a database to obtain the melody information of the sung song. In practice, however, the user may merely hum a melody without clear lyrics, or may repeat simple lyrics of one or two characters that carry no actual lyrical meaning; in such cases the existing speech-recognition-based method fails. Furthermore, the user may sing a melody of his or her own creation, in which case the existing database-matching method is likewise no longer applicable.

Summary

The purpose of the present invention is to solve at least one of the above technical defects. The present invention does not require the user to sing clear lyrics; the user only needs to hum a melody. At the same time, when the user is a non-professional singer who sings slightly out of tune, the invention can still recognize a comparatively accurate melody for what the user sang.

To achieve the above purpose, the present invention provides a melody detection method for audio signals, comprising the following steps: dividing the audio signal into multiple audio segments according to the beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment according to the fundamental frequencies; determining the note name corresponding to each audio segment according to the frequency range to which the pitch value belongs; estimating the mode of the audio signal using the note names of the audio segments to obtain the scale of the audio signal; and determining the melody of the audio signal according to the frequency interval of the pitch value of each audio segment within the scale.
In the melody detection method of an embodiment, the step of dividing the audio signal into multiple audio segments according to the beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment according to the fundamental frequencies includes: determining the duration of each audio segment according to the set beat type; dividing the audio signal into several audio segments according to the duration, wherein an audio segment is a measure determined according to the beat; dividing each audio segment equally into several audio sub-segments; detecting the fundamental frequency of each frame of audio sub-signal in each audio sub-segment; and taking the mean of the fundamental frequencies of a continuously stable run of frames of audio sub-signals in the audio sub-segment as the pitch value.

In the melody detection method of an embodiment, after the step of taking the mean of the fundamental frequencies of a continuously stable run of frames of audio sub-signals in the audio sub-segment as the pitch value, the method further includes: calculating the stable duration of the pitch value in each audio sub-segment; and, when the stable duration is less than a set threshold, setting the pitch value of the corresponding audio sub-segment to zero.

In the melody detection method of an embodiment, the step of determining the note name corresponding to each audio segment according to the frequency range to which the pitch value belongs includes: inputting the pitch value into a note-name-number generation model to obtain a note name number; and looking up, according to the note name number, the frequency range to which the pitch value of each audio segment belongs in a note name sequence table, to determine the note name corresponding to the pitch value.

In the melody detection method of an embodiment, in the step of inputting the pitch value into the note-name-number generation model to obtain the note name number, the note-name-number generation model is expressed as:

K = mod(12·log2(f_{m-n} / a) + 1, 12)

where K is the note name number, f_{m-n} is the frequency of the pitch value of the n-th tone in the m-th audio segment, a is the frequency of the note name used for positioning, and mod is the remainder function.
In the melody detection method of an embodiment, the step of estimating the mode of the audio signal using the note names of the audio segments to obtain the scale of the audio signal includes: obtaining the note name corresponding to each audio segment in the audio signal; processing the note names with a key-finding algorithm to estimate the mode of the audio signal; and determining the interval semitone number of the positioning note according to the mode, and calculating the scale corresponding to the audio signal according to the interval semitone number.

In the melody detection method of an embodiment, the step of determining the melody of the audio signal according to the frequency interval of the pitch value of each audio segment within the scale includes: obtaining a pitch list of the scale of the audio signal, wherein the pitch list records the correspondence between pitch values and the scale; looking up, according to the pitch value of each audio segment in the audio signal, the note corresponding to the pitch value in the pitch list; and sorting the notes in the time order corresponding to the pitch values in the audio segments, and converting the sorted notes into the melody of the corresponding audio signal.

In the melody detection method of an embodiment, before the step of dividing the audio signal into multiple audio segments according to the beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment according to the fundamental frequencies, the method further includes: performing a short-time Fourier transform on the audio signal, wherein the audio signal is a humming or a cappella audio signal; performing fundamental frequency detection on the result of the short-time Fourier transform to obtain the fundamental frequency, wherein the fundamental frequency is used for detecting the pitch value; if no fundamental frequency can be detected, inserting an interpolated frequency at the signal position corresponding to the frame of audio sub-signal; and using the interpolated frequency as the fundamental frequency of the audio signal of the corresponding frame.

In the melody detection method of an embodiment, before the step of dividing the audio signal into multiple audio segments according to the beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment according to the fundamental frequencies, the method further includes: generating the musical rhythm of the audio signal according to set rhythm information; and generating beat and time prompt information according to the musical rhythm.

The present invention further provides a melody detection apparatus for audio signals, comprising: a pitch detection unit configured to divide the audio signal into multiple audio segments according to the beat, detect the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimate the pitch value of each audio segment according to the fundamental frequencies; a note name detection unit configured to determine the note name corresponding to each audio segment according to the frequency range to which the pitch value belongs; a mode detection unit configured to estimate the mode of the audio signal using the note names of the audio segments and obtain the scale of the audio signal; and a melody detection unit configured to determine the melody of the audio signal according to the frequency interval of the pitch value of each audio segment within the scale.

The present invention further provides an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the melody detection method for audio signals according to any one of the above embodiments.

The present invention further provides a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is enabled to perform the melody detection method for audio signals according to any one of the above embodiments.

The melody detection scheme for audio signals provided in this embodiment divides the audio signal into multiple audio segments according to the beat, detects the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimates the pitch value of each audio segment according to the fundamental frequencies; determines the note name corresponding to each audio segment according to the frequency range to which the pitch value belongs; estimates the mode of the audio signal using the note names of the audio segments to obtain the scale of the audio signal; and determines the melody of the audio signal according to the frequency interval of the pitch value of each audio segment within the scale. Through this technical solution, the fundamental frequencies corresponding to the multi-frame audio sub-signals in the audio segments into which the audio signal is divided undergo processing steps such as pitch value estimation, note name determination, mode estimation and scale determination, and the melody of the user's hummed or a cappella audio signal is finally output. The technical solution provided by the present invention can perform accurate melody detection on poorly sung and non-professionally sung audio signals, such as self-composed tunes, meaningless humming, wrongly sung lyrics, slurred singing, unstable vocalization, inaccurate intonation, off-pitch singing or voice cracking, without depending on standard pronunciation or accurate singing by the user. With the technical solution of the present invention, even when the user sings off pitch or out of tune, the hummed melody can be corrected and the correct melody finally output. The technical solution of the present invention therefore has better robustness in obtaining an accurate melody, and achieves a good recognition effect even when the singer's pitch deviation is less than 1.5 semitones.
Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart of a melody detection method for audio signals according to an embodiment;

FIG. 2 is a flowchart of a method for determining the pitch value of each audio segment in an audio signal according to an embodiment;

FIG. 3 is a schematic diagram of one audio segment of an audio signal divided into eight audio sub-segments;

FIG. 4 is a flowchart of a method for setting to zero a pitch value whose stable duration is less than a threshold;

FIG. 5 is a flowchart of a method for determining the note name according to the frequency range in which the pitch value lies, according to an embodiment;

FIG. 6 is a flowchart of a method for determining the key and the scale from the note names of the audio segments, according to an embodiment;

FIG. 7 shows a relationship among interval semitone numbers, note names and frequency values, and a relationship between pitch values and the scale, as used in this embodiment;

FIG. 8 is a flowchart of a method for generating a melody from pitch values according to the mode and the scale, according to an embodiment;

FIG. 9 is a flowchart of a method for preprocessing the audio signal according to an embodiment;

FIG. 10 is a flowchart of a method for generating prompt information from selected rhythm information, according to an embodiment;

FIG. 11 is a structural diagram of a melody detection apparatus for audio signals according to an embodiment;

FIG. 12 is a structural diagram of an electronic device for melody detection of audio signals according to an embodiment.
Detailed Description

Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary: they serve only to explain the present invention and shall not be construed as limiting it.

To overcome the technical defects that melody recognition has low accuracy and places high demands on the singer's intonation, without which effective and accurate melody information cannot be obtained, the present invention provides a technical solution for melody detection of audio signals that can recognize and output the melody formed in an audio signal, and that is especially suitable for a cappella or hummed singing and for singing with inaccurate intonation. In addition, the present invention is also suitable for scenarios such as singing without lyrics.

Referring to FIG. 1, the present invention provides a melody detection method for audio signals, comprising the following steps:

Step S1: dividing the audio signal into multiple audio segments according to the beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment according to the fundamental frequencies;

Step S2: determining the note name corresponding to each audio segment according to the frequency range to which the pitch value belongs;

Step S3: estimating the mode of the audio signal using the note names of the audio segments to obtain the scale of the audio signal;

Step S4: determining the melody of the audio signal according to the frequency interval of the pitch value of each audio segment within the scale.

In the above technical solution, recognizing the melody of an audio signal hummed by a user is taken as an example. A designated beat can be selected; the designated beat is the beat of the melody of the audio signal, for example a 1/4 beat, 1/2 beat, 1 beat, 2 beats or 4 beats. According to the designated beat, the audio signal is divided into multiple audio segments, each audio segment corresponding to one measure of the beat and including multiple frames of audio sub-signals.

In this embodiment, the standard duration of the selected beat can be set as one measure, and the audio signal is divided into multiple audio segments according to this standard duration, i.e., the audio segments are divided according to the standard duration of one measure. The audio segment of each measure is then divided equally, for example into eight audio sub-segments, and the duration of each audio sub-segment can serve as the output time of one stable pitch value.

Depending on the user's singing speed, an audio signal is generally classified as fast (120 beats per minute), medium (90 beats per minute) or slow (30 beats per minute). Taking a measure containing two beats as an example, the standard duration of one measure is roughly between 1 and 2 seconds, so the output time of each pitch value above is roughly between 125 and 250 milliseconds.
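As an illustration of this timing arithmetic, the following sketch computes the sub-segment duration from a tempo and a beats-per-measure setting (the function name and default values are illustrative choices, not taken from the patent):

    def subsegment_duration_ms(bpm: float, beats_per_measure: int = 2,
                               subsegments_per_measure: int = 8) -> float:
        """Duration of one pitch-value output slot, in milliseconds."""
        beat_s = 60.0 / bpm                     # one beat, in seconds
        measure_s = beat_s * beats_per_measure  # one measure (audio segment)
        return measure_s / subsegments_per_measure * 1000.0

    # At the fast tempo of 120 beats per minute, a two-beat measure lasts
    # 1 second, giving 1000 ms / 8 = 125 ms per sub-segment; at 60 beats
    # per minute the same division gives 250 ms.
    print(subsegment_duration_ms(120))  # 125.0
    print(subsegment_duration_ms(60))   # 250.0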
When the above step S1 is performed, when the user hums to the m-th measure, the audio segment of the m-th measure is detected. If the audio segment of the m-th measure is divided equally into eight audio sub-segments, each audio sub-segment determines one pitch value, i.e., each sub-segment corresponds to one pitch value.

Specifically, each audio sub-segment includes multiple frames of audio sub-signals; the fundamental frequency of each frame of audio sub-signal can be detected, and the pitch value corresponding to each audio sub-segment is obtained from those fundamental frequencies. The pitch value of every audio sub-segment in each audio segment is obtained, and from it the note name corresponding to every audio sub-segment in each audio segment is determined. Accordingly, an audio segment may include multiple note names, or the user may hum only a single note name throughout it.

Using the note names of the audio segments, the mode of the audio signal obtained from the user's humming is estimated, and the scale of the corresponding audio signal is obtained. After the note names corresponding to the audio segments are obtained, mode estimation is performed on the changes across the multiple note names to obtain the mode of the corresponding audio signal. The mode determines the key in which the user hums, for example the key of C or of F#. The scale of the audio signal hummed by the user is then determined from the determined mode and the interval relationships.

Each tone of the scale corresponds to a certain frequency range. According to the pitch value of each audio segment, the melody of the audio signal is determined by judging into which tone's frequency interval of the above scale the fundamental frequency of each audio segment falls.
Referring to FIG. 2, in order to obtain more accurate pitch values, an embodiment of the present invention provides a technical solution in which the above step S1, dividing the audio signal into multiple audio segments according to the beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment according to the fundamental frequencies, specifically includes:

Step S11: determining the duration of each audio segment according to the set beat type.

Step S12: dividing the audio signal into several audio segments according to the duration, wherein an audio segment is a measure determined according to the beat.

Step S13: dividing each audio segment equally into several audio sub-segments.

Step S14: detecting the fundamental frequency of each frame of audio sub-signal in each audio sub-segment.

Step S15: taking the mean of the fundamental frequencies of a continuously stable run of frames of audio sub-signals in the audio sub-segment as the pitch value.

In the above technical solution, the duration of each audio segment can be determined according to the set beat type. According to the duration of the audio segments, an audio signal of a certain length is divided into several audio segments, each corresponding to a measure determined by the beat as described above.

To better explain the above step S13, please refer to FIG. 3, which shows an example in which one audio segment (one measure) of an audio signal is divided equally into eight audio sub-segments: X-1, X-2, X-3, X-4, X-5, X-6, X-7 and X-8.

In an audio signal obtained from a user's humming, each audio sub-segment generally comprises three phases: onset, sustain and release. Within each audio sub-segment shown in FIG. 3, the fundamental frequency whose pitch variation is most stable and lasts longest is detected and taken as the pitch value of that sub-segment. During this detection, the onset and release phases of each sub-segment are generally regions where the pitch changes drastically, and such regions affect the accuracy of the detected pitch value. In a further improved technical solution, the regions of drastic pitch change can be removed before the pitch value is detected, to improve the accuracy of the pitch detection result.

Specifically, in each audio sub-segment, according to the fundamental frequency detection results, the longest-lasting stretch in which the fundamental frequency varies within ±5 Hz is taken as the continuously stable stretch of that sub-segment.

If the length of this longest-lasting stretch is greater than a certain threshold, the mean of all fundamental frequencies within the stretch is computed and output as the pitch value of the audio sub-segment. Here, the threshold is the minimum stable duration of each audio sub-segment; in this embodiment it is set, by way of example, to one third of the sub-segment length. Within one measure (one audio segment), if the longest-lasting stretch of each audio sub-segment exceeds the threshold, the measure outputs eight tones, one per audio sub-segment.
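A minimal sketch of this stable-run selection, assuming frame-level fundamental frequencies for one audio sub-segment are already available (the function and variable names are illustrative, and the ±5 Hz band is measured here against the first frame of each run):

    import numpy as np

    def subsegment_pitch(f0_frames: np.ndarray, min_stable_frames: int) -> float:
        """Mean f0 of the longest run staying within +/- 5 Hz of the run's
        first frame; 0.0 if the run is shorter than the threshold."""
        best_start, best_len, start = 0, 0, 0
        for i in range(1, len(f0_frames) + 1):
            # A run breaks when a frame leaves the +/- 5 Hz band of its start.
            if i == len(f0_frames) or abs(f0_frames[i] - f0_frames[start]) > 5.0:
                if i - start > best_len:
                    best_start, best_len = start, i - start
                start = i
        if best_len < min_stable_frames:  # stable duration below the threshold
            return 0.0                    # pitch value set to zero (steps S16-S17 below)
        return float(np.mean(f0_frames[best_start:best_start + best_len]))

    # 20 frames; threshold = one third of the sub-segment length.
    f0 = np.array([180, 250, 221, 220, 219, 222, 221, 220, 220, 221,
                   222, 221, 220, 219, 220, 221, 300, 310, 150, 140], float)
    print(subsegment_pitch(f0, min_stable_frames=len(f0) // 3))  # ~220.5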
To this end, referring to FIG. 4, an embodiment of the present invention provides a technical solution in which, after step S15 of taking the mean of the fundamental frequencies of a continuously stable run of frames in the audio sub-segment as the pitch value, the method further includes:

Step S16: calculating the stable duration of the pitch value in each audio sub-segment.

Step S17: when the stable duration is less than the set threshold, setting the pitch value of the corresponding audio sub-segment to zero. Here, the threshold is the minimum stable duration of each audio sub-segment.

In the pitch detection process, the duration of the longest-lasting stretch in each audio sub-segment is the stable duration of the pitch value. When the stable duration of this longest stretch is less than the set threshold, the pitch value of the corresponding audio sub-segment is set to zero.

An embodiment of the present invention further provides a technical solution for accurately detecting the note names of the audio segments. Referring to FIG. 5, step S2 of determining the note name corresponding to each audio segment according to the frequency range to which the pitch value belongs includes:

Step S21: inputting the pitch value into the note-name-number generation model to obtain the note name number.

Step S22: looking up, according to the note name number, the frequency range to which the pitch value of each audio segment belongs in the note name sequence table, and determining the note name corresponding to the pitch value.

In the above process, the pitch value of each audio segment is input into the note-name-number generation model to obtain the note name number.

According to the note name number of each audio segment, the frequency range to which the pitch value of the audio segment belongs is looked up in the note name sequence table, and the note name corresponding to the pitch value is determined. In this embodiment, the range within which the value of the note name number lies can likewise correspond to a note name in the note name sequence table.

The present invention further provides a note-name-number generation model, expressed as:

K = mod(12·log2(f_{m-n} / a) + 1, 12)

where K is the note name number, f_{m-n} is the frequency of the pitch value of the n-th tone (corresponding to the n-th audio sub-segment) in the m-th audio segment (the m-th measure), a is the frequency of the note name used for positioning, and mod is the remainder function. The number of note name numbers, 12, is set according to twelve-tone equal temperament, i.e., one octave contains 12 note names.

For example, suppose the estimated pitch value of the second audio sub-segment X-2 of the fourth audio segment (the fourth measure) is f_{4-2} = 450 Hz. In this embodiment the note name used for positioning is determined to be A, and the frequency corresponding to this note name is 440 Hz, i.e., a = 440 Hz. In this embodiment, the 12 note name numbers are set according to twelve-tone equal temperament.

When f_{4-2} = 450 Hz, the note name number of the second tone of this audio segment is K ≈ 1. Through the note name sequence table (see FIG. 7; the relationship among interval semitone numbers, note names and frequency values shown in FIG. 7 constitutes the note name sequence table), it can be found that the note name of the second tone of this audio segment is A, i.e., the note name of audio sub-segment X-2 is A.
A note name sequence table is shown below; it records the one-to-one correspondence between the note name number range in which the value of the note name number K lies and the note name.

The note name number range corresponding to note name A is: 0.5 < K ≤ 1.5;

The note name number range corresponding to note name A# is: 1.5 < K ≤ 2.5;

The note name number range corresponding to note name B is: 2.5 < K ≤ 3.5;

The note name number range corresponding to note name C is: 3.5 < K ≤ 4.5;

The note name number range corresponding to note name C# is: 4.5 < K ≤ 5.5;

The note name number range corresponding to note name D is: 5.5 < K ≤ 6.5;

The note name number range corresponding to note name D# is: 6.5 < K ≤ 7.5;

The note name number range corresponding to note name E is: 7.5 < K ≤ 8.5;

The note name number range corresponding to note name F is: 8.5 < K ≤ 9.5;

The note name number range corresponding to note name F# is: 9.5 < K ≤ 10.5;

The note name number range corresponding to note name G is: 10.5 < K ≤ 11.5;

The note name number range corresponding to note name G# is: 11.5 < K or K ≤ 0.5.

Through these note name number ranges, the pitches of off-key or out-of-tune singing can be preliminarily mapped onto the note name closest to accurate singing, which facilitates the subsequent mode estimation, scale determination and melody detection, and improves the accuracy of the melody finally output.
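The following sketch implements the number model and the table lookup as given above; note that the exact expression of the model is inferred from the worked 450 Hz example and the table ranges, so it should be read as an assumption rather than the patent's literal formula:

    import math

    # Index 0..11 chosen so that A occupies (0.5, 1.5], A# (1.5, 2.5], and
    # G# takes the wrap-around range K > 11.5 or K <= 0.5.
    NOTE_NAMES = ["G#", "A", "A#", "B", "C", "C#",
                  "D", "D#", "E", "F", "F#", "G"]

    def note_name_number(f_hz: float, a_hz: float = 440.0) -> float:
        """Note name number K = mod(12*log2(f/a) + 1, 12), a = 440 Hz (A)."""
        return (12.0 * math.log2(f_hz / a_hz) + 1.0) % 12.0

    def note_name(f_hz: float) -> str:
        """Look K up in the note name sequence table of half-open ranges."""
        k = note_name_number(f_hz)
        return NOTE_NAMES[math.ceil(k - 0.5) % 12]

    print(round(note_name_number(450.0), 2))  # 1.39, within (0.5, 1.5]
    print(note_name(450.0))                   # 'A', matching the example above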
Referring to FIG. 6, the present invention provides a technical solution that can determine the mode of the audio signal hummed by the user and the corresponding scale. Step S3 of the present invention, estimating the mode of the audio signal using the note names of the audio segments and obtaining the scale of the audio signal, includes:

Step S31: obtaining the note name corresponding to each audio segment in the audio signal.

Step S32: processing the note names with a key-finding algorithm to estimate the mode of the audio signal.

Step S33: determining the interval semitone number of the positioning note according to the mode, and calculating the scale corresponding to the audio signal according to the interval semitone number.

In the above process, the note names corresponding to the audio segments of the audio signal can be obtained, and mode estimation is performed on the multiple note names of the audio signal. The mode estimation is handled by a key-finding algorithm, which may be, for example, the Krumhansl-Schmuckler key-finding algorithm. The key-finding algorithm can output the mode of the audio signal hummed by the user. For example, the mode output in this embodiment can be represented by an interval semitone number; the mode can also be represented by a note name, the interval semitone numbers corresponding one-to-one with the 12 note names described above.
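As a sketch of the Krumhansl-Schmuckler step, the profile weights below are the published Krumhansl-Schmuckler major-key probe-tone ratings; feeding the algorithm a pitch-class histogram of the detected note names, and restricting it to major keys, are simplifications assumed here rather than details given by the patent:

    import numpy as np

    # Krumhansl-Schmuckler major-key profile (pitch classes C..B).
    KS_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                         2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
    PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                     "F#", "G", "G#", "A", "A#", "B"]

    def estimate_major_key(pc_histogram: np.ndarray) -> str:
        """Correlate a 12-bin pitch-class histogram with the major profile
        rotated to every candidate tonic; return the best-scoring key."""
        scores = [np.corrcoef(pc_histogram, np.roll(KS_MAJOR, k))[0, 1]
                  for k in range(12)]
        return PITCH_CLASSES[int(np.argmax(scores))]

    # Toy input: weights (e.g., durations) of the note names in the humming.
    hist = np.zeros(12)
    for name, weight in [("F#", 5), ("G#", 3), ("A#", 4), ("B", 2),
                         ("C#", 4), ("D#", 2), ("F", 1)]:
        hist[PITCH_CLASSES.index(name)] = weight
    print(estimate_major_key(hist))  # 'F#' for this F#-major-like histogram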
According to the mode determined by the key-finding algorithm, the interval semitone number of the positioning note can be determined. For example, in this embodiment it is determined that the mode of the audio signal is F#, its interval semitone number is 9, and its note name is F#. Mode F# means that F# serves as Do (the solfège syllable); Do is the positioning note, i.e., the first note of the scale. Of course, in other possible processing approaches, the positioning note can be set to any note of the scale, with the corresponding conversion applied; in this embodiment, using the first note as the positioning note saves some processing.

In this embodiment, the interval semitone number of the positioning note (Do) can be determined to be 9 according to the mode (F#) of the audio signal, and the scale corresponding to the audio signal is calculated from this interval semitone number.

In the above process, the positioning note (Do) is determined according to the mode (F#); the positioning note is the first note of the scale, i.e., the note corresponding to the solfège syllable Do. From the interval relationships of the major scale of mode F# (whole-whole-half-whole-whole-whole-half), the scale can be determined. The scale of mode F#, expressed in note names in order, is: F#, G#, A#, B, C#, D#, F. Expressed in solfège syllables in order, it is: Do, Re, Mi, Fa, Sol, La, Si.

In this embodiment, when the key-finding algorithm yields the interval semitone number, the scale can be obtained through the following conversion relationships:

Do = (Key + 3) mod 12;

Re = (Key + 5) mod 12;

Mi = (Key + 7) mod 12;

Fa = (Key + 8) mod 12;

Sol = (Key + 10) mod 12;

La = Key;

Si = (Key + 2) mod 12;

In the above conversion relationships, Key is the interval semitone number of the positioning note determined by the mode, and mod is the remainder function; Do, Re, Mi, Fa, Sol, La and Si are the interval semitone numbers of the solfège syllables of the scale. Once the interval semitone number of each solfège syllable is obtained, the note name of each scale degree can be determined through FIG. 7.
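A sketch of this conversion follows. FIG. 7's numbering of the note names is not reproduced in the text, so the mapping below assumes the interval semitone numbers run A = 0, A# = 1, ..., G# = 11, an assumption chosen because it makes Key = 0 reproduce the C major scale listed below:

    # Assumed FIG. 7 numbering: interval semitone number -> note name.
    NUMBER_TO_NAME = ["A", "A#", "B", "C", "C#", "D",
                      "D#", "E", "F", "F#", "G", "G#"]

    def scale_from_key(key: int) -> dict:
        """Apply the conversion relationships above to get the interval
        semitone number of each solfege syllable, then map to note names."""
        degrees = {"Do": (key + 3) % 12, "Re": (key + 5) % 12,
                   "Mi": (key + 7) % 12, "Fa": (key + 8) % 12,
                   "Sol": (key + 10) % 12, "La": key % 12,
                   "Si": (key + 2) % 12}
        return {syll: NUMBER_TO_NAME[n] for syll, n in degrees.items()}

    print(scale_from_key(0))
    # {'Do': 'C', 'Re': 'D', 'Mi': 'E', 'Fa': 'F', 'Sol': 'G', 'La': 'A', 'Si': 'B'}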
FIG. 7 shows the relationship between interval semitone number, pitch name and frequency value, including the multiplicative frequency relationships between interval semitone numbers and pitch names.
In this embodiment, if the key-finding algorithm outputs mode C, whose interval semitone number is 3, the scale of an audio signal in C is obtained through the interval-relationship conversion. Expressed in pitch names in order, the scale is C, D, E, F, G, A, B; expressed in solfège names in order, it is Do, Re, Mi, Fa, Sol, La, Si.
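Combining the conversions with the FIG. 7 numbering, and assuming (as noted above) that Key is the key-finding output with C = 0, the scale computation can be sketched as:

```python
PITCH_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

# Offsets taken from the conversions above (all mod 12).
SOLFEGE_OFFSETS = {"Do": 3, "Re": 5, "Mi": 7, "Fa": 8,
                   "Sol": 10, "La": 0, "Si": 2}

def major_scale(key_pc):
    """Scale degrees as (interval semitone number, pitch name) pairs,
    in the A-based numbering of FIG. 7."""
    return {name: ((key_pc + off) % 12, PITCH_NAMES[(key_pc + off) % 12])
            for name, off in SOLFEGE_OFFSETS.items()}

# major_scale(0) gives Do -> (3, 'C'); major_scale(6) gives Do -> (9, 'F#')
```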
Referring to FIG. 8, an embodiment of the present invention provides a technical solution in which step S4, determining the melody of the audio signal from the frequency intervals of the scale into which the pitch values of the individual audio segments fall, includes:
Step S41: obtaining a pitch list for the scale of the audio signal.
Here, the pitch list records the correspondence between pitch values and the scale. The pitch list can be seen in FIG. 7 (the relationship between pitch values and the scale shown in FIG. 7 constitutes the pitch list); each pitch name of the scale corresponds to one pitch value, expressed as a frequency in hertz.
Step S42: looking up, in the pitch list, the note corresponding to the pitch value of each audio segment of the audio signal.
Step S43: sorting the notes in the time order of the pitch values in the individual audio segments, and converting the sorted notes into the melody of the audio signal.
In this process, the pitch list of the scale of the audio signal (FIG. 7) is obtained, and the note corresponding to the pitch value of each audio segment is looked up in the pitch list. The notes may be expressed as pitch names.
In this embodiment, for example, a pitch value of 440 Hz is looked up in the pitch list and yields the note with pitch name A1. The note sounding at each point in time, and its duration, can thus be found from the frequency of the pitch value of each audio segment.
The notes are then sorted in the time order in which the pitch values appear in the individual audio segments, and the time-ordered note sequence is converted into the melody of the audio signal. The resulting melody may be presented as numbered musical notation, staff notation, pitch names or solfège names, or output as music at standard pitch.
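A sketch of this lookup and time ordering; the relative matching tolerance tol is an assumption, since the patent only requires the pitch value to fall within a note's frequency interval:

```python
def detect_melody(segment_pitches, pitch_list, tol=0.03):
    """segment_pitches: time-ordered (time_s, pitch_hz) pairs per sub-segment,
    with 0.0 marking unstable sub-segments (treated as rests here).
    pitch_list: {pitch_name: frequency_hz}, the FIG. 7 correspondence."""
    melody = []
    for t, f in segment_pitches:
        if f <= 0.0:
            melody.append((t, "rest"))
            continue
        name = min(pitch_list, key=lambda n: abs(pitch_list[n] - f))
        ok = abs(pitch_list[name] - f) <= tol * f
        melody.append((t, name if ok else "rest"))
    return melody
```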
In this embodiment, once the melody has been obtained, it can be used for humming-based retrieval, that is, retrieval of the corresponding track information; the hummed melody can be processed with chords, accompaniment and harmony; and the genre of the hummed song can be determined for user-profile analysis. In addition, the difference between the user's humming and the detected melody can be computed to score the accuracy of the user's singing.
In an embodiment of the present invention, referring to FIG. 9, before step S1 of dividing the audio signal into multiple audio segments by beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment from the fundamental frequencies, the method further includes:
Step A1: applying a short-time Fourier transform to the audio signal, the audio signal being a hummed or a cappella audio signal.
Step A2: performing fundamental-frequency detection on the short-time Fourier transform result to obtain the fundamental frequency, which is used for pitch-value detection.
Step A3: if no fundamental frequency can be detected, inserting an interpolated frequency at the signal position corresponding to the affected frame of audio sub-signal.
Step A4: taking the interpolated frequency as the fundamental frequency of the audio signal of the corresponding frame.
In this process, the audio signal hummed by the user can be captured through a sound-pickup device. A short-time Fourier transform is applied to the audio signal, producing the short-time Fourier transform result; given a frame length and a frame shift, the transform yields a multi-frame result.
The audio signal may be collected from the user singing a cappella or humming a track, which may be a song of the user's own composition. Fundamental-frequency detection is performed on each frame of the short-time Fourier transform result, yielding the multi-frame fundamental frequencies of the audio signal; these are used for the subsequent pitch detection of the audio signal.
Because the user may hum quietly or the captured audio signal may be weak, the fundamental frequency may be undetectable in places. When the fundamental frequency cannot be detected in certain audio sub-segments, an interpolated frequency, obtainable by an interpolation algorithm, is inserted at the corresponding signal position within the sub-segment and taken as the fundamental frequency there.
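A sketch of steps A1 to A4, under the assumption that librosa's pyin tracker stands in for the unspecified fundamental-frequency detector of the patent, with linear interpolation filling the frames where no fundamental frequency is detected:

```python
import numpy as np
import librosa

def framewise_f0(path):
    """Frame the signal, detect f0 per frame, and interpolate gaps."""
    y, sr = librosa.load(path, sr=None, mono=True)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
        sr=sr, frame_length=2048, hop_length=512)
    idx = np.arange(len(f0))
    good = ~np.isnan(f0)
    if good.any():
        # Interpolated frequency where detection failed (steps A3-A4).
        f0[~good] = np.interp(idx[~good], idx[good], f0[good])
    return f0, sr
```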
Referring to FIG. 10, to further improve the accuracy of melody recognition, an embodiment of the present invention provides a technical solution in which, before step S1 of dividing the audio signal into multiple audio segments by beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment from the fundamental frequencies, the method further includes:
Step B1: generating the musical rhythm of the audio signal according to set rhythm information.
Step B2: generating beat and timing prompt information according to the musical rhythm.
In this process, the user can select rhythm information for the track about to be hummed, and the musical rhythm of the corresponding audio signal is generated from the rhythm information set by the user.
Prompt information is then generated from the rhythm information. The prompt information tells the user the beat and timing of the audio signal about to be produced. The beat may take the form of drum beats or piano notes, or be conveyed through vibration or flashes of the device held by the user.
As an example in this embodiment, the user selects 1/4 time; a musical rhythm is generated from the selected 1/4 time, and beats conforming to it are fed back to the device held by the user (for example a mobile phone or a singing tool), which prompts the user with the 1/4 beat in the form of vibration. In addition, drum beats or a piano accompaniment assisting the user's humming can be generated from the 1/4 beat and played to the user through the device or earphones, improving the accuracy of the melody obtained from the audio signal.
According to the time length selected by the user, prompts such as vibration or a tone can mark the start and end of the humming; the prompt information can also be presented visually, for example on a display screen.
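A small sketch of scheduling such prompts; bpm and beats_per_bar are assumed stand-ins for the user-selected rhythm information:

```python
def beat_prompt_times(bpm, total_s, beats_per_bar=1):
    """Timestamps (in seconds) at which to vibrate or play a click;
    the flag marks the first beat of a bar."""
    period = 60.0 / bpm
    n_beats = int(total_s / period) + 1
    return [(i * period, i % beats_per_bar == 0) for i in range(n_beats)]
```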
Referring to FIG. 11, to overcome the technical defects of demanding very high accuracy of the input audio signal, achieving low recognition accuracy, and failing to obtain valid and accurate melody information, the present invention provides an apparatus for melody detection of an audio signal, comprising:
a pitch detection unit 111, configured to divide the audio signal into multiple audio segments by beat, detect the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimate the pitch value of each audio segment from the fundamental frequencies;
a pitch-name detection unit 112, configured to determine the pitch name of each audio segment from the frequency range to which its pitch value belongs;
a mode detection unit 113, configured to estimate the mode of the audio signal from the pitch names of the individual audio segments and obtain the scale of the audio signal; and
a melody detection unit 114, configured to determine the melody of the audio signal from the frequency intervals of the scale into which the pitch values of the individual audio segments fall.
Referring to FIG. 12, this embodiment further provides an electronic device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor is configured to perform the melody detection method of any of the embodiments above.
Specifically, FIG. 12 is a block diagram of an electronic device for performing the melody detection method according to an exemplary embodiment. For example, the electronic device 1200 may be provided as a server. Referring to FIG. 12, the electronic device 1200 includes a processing component 1222, which in turn includes one or more processors, and memory resources represented by a memory 1232 for storing instructions executable by the processing component 1222, for example an application program. The application program stored in the memory 1232 may include one or more modules, each corresponding to a set of instructions. The processing component 1222 is configured to execute the instructions so as to perform the melody detection method described above.
The electronic device 1200 may further include a power component 1226 configured to perform power management of the electronic device 1200, a wired or wireless network interface 1250 configured to connect the electronic device 1200 to a network, and an input/output (I/O) interface 1258. The electronic device 1200 may operate based on an operating system stored in the memory 1232, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like. The electronic device may be a terminal such as a computer, a mobile phone or a tablet.
This embodiment further provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the melody detection method of any of the embodiments above.
The melody detection solution provided in this embodiment divides the audio signal into multiple audio segments by beat, detects the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimates the pitch value of each audio segment from the fundamental frequencies; determines the pitch name of each audio segment from the frequency range to which its pitch value belongs; estimates the mode of the audio signal from the pitch names of the individual audio segments to obtain the scale of the audio signal; and determines the melody of the audio signal from the frequency intervals of the scale into which the pitch values of the individual audio segments fall. By processing the fundamental frequencies of the multi-frame audio sub-signals in the divided audio segments through pitch-value estimation, pitch-name determination, mode estimation and scale determination, the solution finally outputs the melody of the user's hummed or a cappella audio signal. The technical solution of the embodiments of the present invention can accurately detect the melody of poorly or unprofessionally sung audio signals, such as self-composed tunes, meaningless humming, singing with wrong lyrics, slurred articulation, unstable voicing, poor intonation, off-key singing or cracked notes, without depending on the user's pronunciation being standard or the singing being accurate. With this solution, even when the user sings off key or out of tune, the hummed melody can be corrected and the correct melody finally output. The solution is therefore more robust in obtaining an accurate melody, and recognizes well even when the singer's deviation from the key is less than 1.5 semitones.
It should be understood that, although the steps in the flowcharts of the drawings are displayed sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered and may proceed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above describes only some implementations of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (12)

  1. A melody detection method for an audio signal, characterized by comprising the following steps:
    dividing an audio signal into multiple audio segments by beat, detecting a fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating a pitch value of each audio segment from the fundamental frequencies;
    determining a pitch name corresponding to each audio segment from a frequency range to which the pitch value belongs;
    estimating a mode of the audio signal using the pitch names of the individual audio segments, and obtaining a scale of the audio signal;
    determining a melody of the audio signal from frequency intervals of the scale into which the pitch values of the individual audio segments fall.
  2. The melody detection method for an audio signal according to claim 1, wherein the step of dividing the audio signal into multiple audio segments by beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment from the fundamental frequencies comprises:
    determining a duration of each audio segment according to a set beat type;
    dividing the audio signal into a number of audio segments according to the duration, each audio segment being a bar determined by the beat;
    dividing each audio segment into several equal audio sub-segments;
    detecting the fundamental frequency of each frame of audio sub-signal in each audio sub-segment;
    taking, as the pitch value, a mean of the fundamental frequencies of a sustained and stable run of multiple frames of audio sub-signals within the audio sub-segment.
  3. The melody detection method for an audio signal according to claim 2, wherein after the step of taking, as the pitch value, the mean of the fundamental frequencies of the sustained and stable run of multiple frames of audio sub-signals within the audio sub-segment, the method further comprises:
    calculating a stable duration of the pitch value in each audio sub-segment;
    when the stable duration is less than a set threshold, setting the pitch value of the corresponding audio sub-segment to zero.
  4. The melody detection method for an audio signal according to claim 1, wherein the step of determining the pitch name corresponding to each audio segment from the frequency range to which the pitch value belongs comprises:
    inputting the pitch value into a pitch-name-number generation model to obtain a pitch-name number;
    looking up, by the pitch-name number, the frequency range to which the pitch value of each audio segment belongs in a pitch-name sequence table, and determining the pitch name corresponding to the pitch value.
  5. The melody detection method for an audio signal according to claim 4, wherein in the step of inputting the pitch value into the pitch-name-number generation model to obtain the pitch-name number, the pitch-name-number generation model is expressed as:
    K = mod(12 × log₂(f_{m-n} / a), 12) + 1
    wherein K is the pitch-name number, f_{m-n} is the frequency of the pitch value of the n-th note in the m-th audio segment, a is the frequency of the pitch name used for positioning, and mod is the modulo function.
  6. The melody detection method for an audio signal according to claim 1, wherein the step of estimating the mode of the audio signal using the pitch names of the individual audio segments and obtaining the scale of the audio signal comprises:
    obtaining the pitch name corresponding to each audio segment of the audio signal;
    estimating the mode of the audio signal by processing the pitch names with a key-finding algorithm;
    determining an interval semitone number of a positioning note according to the mode, and computing the scale of the audio signal from the interval semitone number.
  7. The melody detection method for an audio signal according to claim 1, wherein the step of determining the melody of the audio signal from the frequency intervals of the scale into which the pitch values of the individual audio segments fall comprises:
    obtaining a pitch list for the scale of the audio signal, the pitch list recording a correspondence between pitch values and the scale;
    looking up, in the pitch list, a note corresponding to the pitch value of each audio segment of the audio signal;
    sorting the notes in the time order of the pitch values in the individual audio segments, and converting the sorted notes into the melody of the audio signal.
  8. The melody detection method for an audio signal according to claim 1, wherein before the step of dividing the audio signal into multiple audio segments by beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment from the fundamental frequencies, the method further comprises:
    applying a short-time Fourier transform to the audio signal, the audio signal being a hummed or a cappella audio signal;
    performing fundamental-frequency detection on the short-time Fourier transform result to obtain the fundamental frequency, the fundamental frequency being used for pitch-value detection;
    if no fundamental frequency can be detected, inserting an interpolated frequency at the signal position corresponding to each affected frame of audio sub-signal;
    taking the interpolated frequency as the fundamental frequency of the audio signal of the corresponding frame.
  9. The melody detection method for an audio signal according to claim 1, wherein before the step of dividing the audio signal into multiple audio segments by beat, detecting the fundamental frequency of each frame of audio sub-signal in each audio segment, and estimating the pitch value of each audio segment from the fundamental frequencies, the method further comprises:
    generating a musical rhythm of the audio signal according to set rhythm information;
    generating beat and timing prompt information according to the musical rhythm.
  10. A melody detection apparatus for an audio signal, characterized by comprising:
    a pitch detection unit, configured to divide an audio signal into multiple audio segments by beat, detect a fundamental frequency of each frame of audio sub-signal in each audio segment, and estimate a pitch value of each audio segment from the fundamental frequencies;
    a pitch-name detection unit, configured to determine a pitch name corresponding to each audio segment from a frequency range to which the pitch value belongs;
    a mode detection unit, configured to estimate a mode of the audio signal using the pitch names of the individual audio segments and obtain a scale of the audio signal;
    a melody detection unit, configured to determine a melody of the audio signal from frequency intervals of the scale into which the pitch values of the individual audio segments fall.
  11. An electronic device, characterized by comprising:
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to perform the melody detection method for an audio signal according to any one of claims 1 to 9.
  12. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the melody detection method for an audio signal according to any one of claims 1 to 9.
PCT/CN2019/093204 2019-03-29 2019-06-27 音频信号的旋律检测方法、装置以及电子设备 WO2020199381A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SG11202110700SA SG11202110700SA (en) 2019-03-29 2019-06-27 Melody detection method for audio signal, device and electronic apparatus
EP19922753.9A EP3929921B1 (en) 2019-03-29 2019-06-27 Melody detection method for audio signal, device, and electronic apparatus
US17/441,640 US20220165239A1 (en) 2019-03-29 2019-06-27 Method for detecting melody of audio signal and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910251678.X 2019-03-29
CN201910251678.XA CN109979483B (zh) 2019-03-29 2019-03-29 音频信号的旋律检测方法、装置以及电子设备

Publications (1)

Publication Number Publication Date
WO2020199381A1 true WO2020199381A1 (zh) 2020-10-08

Also Published As

Publication number Publication date
SG11202110700SA (en) 2021-10-28
EP3929921A4 (en) 2022-04-27
EP3929921B1 (en) 2024-07-31
CN109979483A (zh) 2019-07-05
EP3929921A1 (en) 2021-12-29
CN109979483B (zh) 2020-11-03
US20220165239A1 (en) 2022-05-26
