WO2023040332A1 - Music score generation method, electronic device and readable storage medium - Google Patents

Music score generation method, electronic device and readable storage medium

Info

Publication number
WO2023040332A1
WO2023040332A1 PCT/CN2022/094961 CN2022094961W WO2023040332A1 WO 2023040332 A1 WO2023040332 A1 WO 2023040332A1 CN 2022094961 W CN2022094961 W CN 2022094961W WO 2023040332 A1 WO2023040332 A1 WO 2023040332A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
audio
score
information
beat
Prior art date
Application number
PCT/CN2022/094961
Other languages
English (en)
French (fr)
Inventor
芮元庆
蒋义勇
李毓磊
Original Assignee
腾讯音乐娱乐科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯音乐娱乐科技(深圳)有限公司
Publication of WO2023040332A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 - Details of electrophonic musical instruments
    • G10H 1/36 - Accompaniment arrangements
    • G10H 1/38 - Chord
    • G10H 1/383 - Chord detection and/or recognition, e.g. for correction, or automatic bass generation
    • G10H 2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/031 - Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H 2210/071 - Musical analysis for rhythm pattern analysis or rhythm style recognition
    • G10H 2210/076 - Musical analysis for extraction of timing, tempo; Beat detection
    • G10H 2250/00 - Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H 2250/311 - Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Definitions

  • The present application relates to the technical field of audio processing, and in particular to a music score generation method, an electronic device for music score generation, and a computer-readable storage medium.
  • A music score, that is, sheet music, is a regular combination of written symbols that record the pitch or rhythm of music; common numbered notation, staff notation, guitar tablature, guqin notation and various other modern or ancient notations are all called music scores.
  • At present, music scores such as guitar tablature usually need to be produced by manual transcription (picking up the score by ear), but manual transcription is inefficient and the accuracy of the resulting score is poor.
  • the purpose of the present application is to provide a music score generation method, an electronic device and a computer-readable storage medium, which can efficiently generate accurate music scores.
  • In a first aspect, the present application provides a method for generating a music score, including: obtaining target audio; generating a chromagram corresponding to the target audio and each pitch class, and using the chromagram to identify chords of the target audio to obtain chord information; performing key detection on the target audio to obtain original key information; performing rhythm detection on the target audio to obtain a number of beats; identifying the beat type of each audio frame of the target audio, and determining an audio time signature based on the correspondence between beat types and time signatures; and using the chord information, the original key information, the number of beats and the audio time signature to draw a music score to obtain a target score.
  • Optionally, using the chord information, the original key information, the number of beats and the audio time signature to draw a music score to obtain a target score includes: determining position information of each word of the target lyrics in the target audio, where the target lyrics are the lyrics corresponding to the target audio; using the duration of each word to determine the corresponding note type; and generating a first score using the chord information, the original key information, the number of beats and the audio time signature, and marking the first score with the target lyrics based on the position information and the note types to obtain the target score.
  • Optionally, using the chord information, the original key information, the number of beats and the audio time signature to draw a music score to obtain a target score includes: determining fingering images using the chord information; splicing the fingering images based on the chord information to obtain a second score; and marking the second score with the original key information, the number of beats and the audio time signature to obtain the target score.
  • Optionally, using the chord information, the original key information, the number of beats and the audio time signature to draw a music score to obtain a target score includes: adjusting target information according to obtained score adjustment information to obtain adjusted information, where the target information is at least one of the original key information, the chord information, the score drawing rules and the number of beats; and generating the target score using the unadjusted non-target information and the adjusted information.
  • Optionally, performing key detection on the target audio to obtain original key information includes: extracting a note sequence of the target audio; performing modulo calculation on the note sequence based on a plurality of different tonic parameters to obtain a plurality of calculation result sequences; comparing each calculation result sequence with the major and minor sequences to obtain corresponding numbers of matching notes; and determining, as the original key information, the key corresponding to both the major/minor sequence and the tonic parameter associated with the maximum number of matching notes.
  • Optionally, performing rhythm detection on the target audio to obtain the number of beats includes: calculating the energy value of each audio frame in the target audio; dividing the target audio into several intervals and using the energy values to calculate the average energy value of each interval; determining that a beat is detected if an energy value is greater than an energy value threshold, where the energy value threshold is obtained by multiplying the average energy value by a weight value of the interval, and the weight value is obtained based on the variance of the energy values in each interval; and counting the number of beats per minute to obtain the number of beats.
  • Optionally, performing rhythm detection on the target audio to obtain the number of beats includes: generating a log-magnitude spectrum corresponding to the target audio; inputting the log-magnitude spectrum into a trained neural network to obtain a probability value that each audio frame in the target audio is a beat; performing autocorrelation calculation on the probability value sequence to obtain several autocorrelation parameters; and determining the maximum autocorrelation parameter within a preset range as the number of beats.
  • Optionally, the method further includes: establishing an audio-score correspondence between the target audio and the target score, and storing the target score and the audio-score correspondence; if a score output request is detected, judging, using the stored audio-score correspondences, whether a requested score corresponding to the score output request exists; and outputting the requested score if it exists.
  • Optionally, the method further includes: determining beat audio according to a target number of beats in the target score; after a start signal is detected, playing the beat audio and counting the playing time; and determining a target part in the target score according to the target number of beats and the playing time, and marking the target part with a reminder.
  • In a second aspect, the present application also provides an electronic device, including a memory and a processor, where the memory is used to store a computer program and the processor is configured to execute the computer program to implement the above music score generation method.
  • In a third aspect, the present application also provides a computer-readable storage medium for storing a computer program, where the above music score generation method is implemented when the computer program is executed by a processor.
  • The music score generation method obtains target audio; generates a chromagram corresponding to the target audio and each pitch class, and uses the chromagram to identify the chords of the target audio to obtain chord information; performs key detection on the target audio to obtain original key information; performs rhythm detection on the target audio to obtain the number of beats; identifies the beat type of each audio frame of the target audio and determines the audio time signature based on the correspondence between beat types and time signatures; and uses the chord information, the original key information, the number of beats and the audio time signature to draw the score to obtain the target score.
  • After the target audio is obtained, the method uses a chromagram to represent the energy distribution of the target audio in the frequency domain, and then recognizes the chords of the target audio to obtain chord information.
  • The key and the time signature are important bases for performance and need to be reflected in the score, so key detection is performed on the target audio to obtain the original key information.
  • The beat type of each audio frame is identified, and the audio time signature is determined based on the combination of beat types. The number of beats (i.e. beats per minute) characterizes the speed of the audio rhythm and is used to determine the time corresponding to each chord.
  • After the above information is obtained, the target score can be drawn using the chord information, the original key information, the number of beats and the audio time signature.
  • By processing the target audio, the data and information necessary for drawing the score are obtained and then used to draw the target score. Compared with manual transcription, accurate scores can be generated efficiently, so both the efficiency and the accuracy of score generation are high, which solves the problems of low efficiency and poor score accuracy in the related art.
  • the present application also provides an electronic device and a computer-readable storage medium, which also have the above beneficial effects.
  • Fig. 1 is a schematic diagram of a hardware composition framework applicable to a music score generation method provided by an embodiment of the present application;
  • Fig. 2 is a schematic diagram of a hardware composition framework applicable to another music score generation method provided by an embodiment of the present application;
  • Fig. 3 is a schematic flowchart of a music score generation method provided by an embodiment of the present application;
  • Fig. 4 is a chromagram provided by an embodiment of the present application;
  • Fig. 5 is a specific second score provided by an embodiment of the present application;
  • Fig. 6 is a specific target score provided by an embodiment of the present application;
  • Fig. 7 is a fingering image provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a hardware composition framework applicable to a musical score generation method provided in an embodiment of the present application.
  • the electronic device 100 may include a processor 101 and a memory 102 , and may further include one or more of a multimedia component 103 , an information input/information output (I/O) interface 104 and a communication component 105 .
  • The processor 101 is used to control the overall operation of the electronic device 100, so as to complete all or part of the steps in the music score generation method; the memory 102 is used to store various types of data to support operation on the electronic device 100, and these data may include, for example, instructions for any application or method operating on the electronic device 100, as well as application-related data.
  • The memory 102 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as one or more of static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
  • In this embodiment, the memory 102 stores at least programs and/or data for realizing the following functions: obtaining target audio; generating a chromagram corresponding to the target audio and each pitch class, and identifying chords of the target audio by using the chromagram to obtain chord information; performing key detection on the target audio to obtain original key information; performing rhythm detection on the target audio to obtain the number of beats; identifying the beat type of each audio frame of the target audio and determining the audio time signature based on the correspondence between beat types and time signatures; and using the chord information, the original key information, the number of beats and the audio time signature to draw the score and obtain the target score.
  • Multimedia components 103 may include screen and audio components.
  • the screen can be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals.
  • an audio component may include a microphone for receiving external audio signals.
  • the received audio signal may be further stored in the memory 102 or sent via the communication component 105 .
  • the audio component also includes at least one speaker for outputting audio signals.
  • the I/O interface 104 provides an interface between the processor 101 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. These buttons can be virtual buttons or physical buttons.
  • the communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices.
  • The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component 105 may include a Wi-Fi component, a Bluetooth component and an NFC component.
  • The electronic device 100 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, and is used to execute the music score generation method.
  • the structure of the electronic device 100 shown in FIG. 1 does not constitute a limitation on the electronic device in the embodiment of the present application.
  • the electronic device 100 may include more or fewer components than those shown in FIG. 1 , or combine certain parts.
  • FIG. 2 is a schematic diagram of a hardware composition framework applicable to another music score generation method provided in the embodiment of the present application.
  • the hardware composition framework may include: a first electronic device 11 and a second electronic device 12 connected through a network 13 .
  • the hardware structure of the first electronic device 11 and the second electronic device 12 may refer to the electronic device 100 in FIG. 1 . That is, it can be understood that there are two electronic devices 100 in this embodiment, and the two perform data interaction.
  • the form of the network 13 is not limited in the embodiment of the present application, that is, the network 13 may be a wireless network (such as WIFI, Bluetooth, etc.), or a wired network.
  • The first electronic device 11 and the second electronic device 12 may be the same kind of electronic device, for example, both may be servers; they may also be different types of electronic devices, for example, the first electronic device 11 may be a smart phone or another smart terminal, and the second electronic device 12 may be a server.
  • a server with strong computing power can be used as the second electronic device 12 to improve data processing efficiency and reliability, and further improve the processing efficiency of score generation.
  • a smart phone with low cost and wide application range is used as the first electronic device 11 to realize the interaction between the second electronic device 12 and the user.
  • the interaction process may be: the smart phone acquires the target audio, and sends the target audio to the server, and the server generates the target score.
  • the server sends the target music score to the smart phone, and the smart phone displays the target music score.
  • FIG. 3 is a schematic flowchart of a music score generation method provided by an embodiment of the present application.
  • the method in this example includes:
  • S101: Acquire target audio. The target audio refers to the audio for which a corresponding score needs to be generated, and its number and type are not limited.
  • the target audio can be a song with lyrics, or can be pure music without lyrics.
  • the specific method of obtaining the target audio is not limited.
  • the audio information can be obtained first, and then used to filter the locally pre-stored audio to obtain the target audio; or the data transmission interface can be used to obtain the externally input target audio.
  • S102: Generate a chromagram corresponding to the target audio and each pitch class, and use the chromagram to identify the chords of the target audio to obtain chord information.
  • The chromaticity spectrum is the chromagram; the chroma features are the collective name of the chroma vector and the chromagram.
  • The chroma vector is a vector containing 12 elements, which represent the energy in the 12 pitch classes within a period of time (such as one frame), with the energies of the same pitch class in different octaves accumulated; the chromagram is the sequence of chroma vectors. Taking the piano as an example, it can play 88 pitches, and these pitches appear as cycles of a group consisting of the seven white-key notes do, re, mi, fa, so, la, ti (and the five black keys between them); the do in one group and the do in the next group are an octave apart. If the concept of groups is ignored, these twelve tones constitute the twelve pitch classes.
  • The chromagram is usually generated by the Constant-Q Transform (CQT). Specifically, a Fourier transform is performed on the target audio to convert it from the time domain to the frequency domain, noise reduction is performed on the frequency-domain signal, and tuning is then performed, which has an effect similar to "tuning different pianos to the standard frequency". Absolute time is then converted into frames according to the length of the selected window, and the energy of each pitch in each frame is recorded as a pitch spectrum. On the basis of the pitch spectrum, the energies of notes at the same time, with the same pitch class but in different octaves, are superimposed onto the element of that pitch class in the chroma vector to form the chromagram. Please refer to FIG. 4.
  • FIG. 4 is a chromagram provided by an embodiment of the present application. In its first large cell, the three pitch classes C, E and G are very bright; according to music theory, it can be determined that the tonic chord of C major (Cmaj), also called the C major triad, is played during this period of the target audio.
  • A chord is a concept in music theory that refers to a group of tones with certain interval relationships. Combining three or more tones vertically according to a third-based or non-third-based stacking relationship forms a chord. An interval refers to the relationship between two tones in pitch, that is, the distance between the two tones in pitch, and its unit is the degree. In this way, by combining the chromagram with music theory knowledge, the chords corresponding to different times of the target audio can be determined to obtain the chord information.
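To make the chromagram-plus-music-theory idea above more concrete, the sketch below computes a CQT chromagram with librosa and matches each frame against simple major and minor triad templates. The per-frame template matching and the librosa calls are illustrative assumptions rather than the patent's exact recognition procedure, and the input file name is hypothetical.

```python
# Minimal sketch: CQT chromagram + triad-template chord guess (assumed approach, not the patent's exact method).
import numpy as np
import librosa

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Build 24 binary templates: 12 major and 12 minor triads over the 12 pitch classes."""
    templates = {}
    for root in range(12):
        major = np.zeros(12); major[[root, (root + 4) % 12, (root + 7) % 12]] = 1.0
        minor = np.zeros(12); minor[[root, (root + 3) % 12, (root + 7) % 12]] = 1.0
        templates[NOTE_NAMES[root] + "maj"] = major
        templates[NOTE_NAMES[root] + "min"] = minor
    return templates

def estimate_chords(path):
    y, sr = librosa.load(path, sr=None, mono=True)
    # Constant-Q chromagram: energy per pitch class per frame, octaves folded together.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    templates = chord_templates()
    labels = []
    for frame in chroma.T:
        norm = frame / (np.linalg.norm(frame) + 1e-9)
        # Pick the triad template with the highest correlation to this chroma vector.
        best = max(templates.items(), key=lambda kv: float(norm @ kv[1]))
        labels.append(best[0])
    return labels

if __name__ == "__main__":
    print(estimate_chords("target_audio.wav")[:20])  # hypothetical input file
```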
  • S103: Perform key detection on the target audio to obtain original key information.
  • A mode (key) is an organic whole composed of several musical tones of different pitches, organized according to certain interval relationships around one core tone. Modes are divided into major and minor, which follow different interval relationships.
  • Specifically, the relationship of a major scale is whole-whole-half-whole-whole-whole-half, and the interval relationship between the tones is 0-2-4-5-7-9-11-12: the distance from the first tone to the second tone is 2, i.e. a whole step; the distance from the second tone to the third tone is 2, i.e. a whole step; the distance from the third tone to the fourth tone is 1, i.e. a half step; and so on.
  • The relationship of a minor scale is whole-half-whole-whole-half-whole-whole, and the interval relationship between the tones is 0-2-3-5-7-8-10-12. When the tones of a mode are arranged into a scale, the tonic is the core of the mode, and the most stable tone in the mode is the tonic.
  • There are 12 scale degrees in total, and each of them can serve as the tonic, namely C, C# (or Db), D, D# (or Eb), E, F, F# (or Gb), G, G# (or Ab), A, A# (or Bb) and B, where # denotes a sharp, half a step higher than the natural tone, and b denotes a flat, half a step lower than the natural tone. Since a mode is either major or minor, there are 24 modes in total.
  • This embodiment does not limit the specific manner of key detection. In one implementation, the target audio can be input into a trained convolutional neural network, which is obtained by training on a large amount of training data with key labels and whose structure may be a multi-layer convolutional neural network; after the target audio is input, the most probable one of the 24 key categories is selected as the key of the target audio.
  • In another implementation, modulo calculation may be performed on the note sequence and matched against the major and minor patterns, and the original key information is obtained from the matching result.
  • S104 Perform rhythm detection on the target audio to obtain the number of beats.
  • BPM is the abbreviation of Beats Per Minute, i.e. the number of beats per minute.
  • BPM is the tempo mark of the whole piece, a speed standard independent of the score; generally one quarter note is one beat, so 60 BPM means playing an even 60 quarter notes (or an equivalent combination of notes) per minute.
  • Rhythm detection is BPM detection. The number of beats is used to control the playing speed of the audio; the same chord is played with a different rhythm at different BPM values.
  • autocorrelation calculation may be performed based on the probability sequence that each audio frame of the target audio is a beat (ie, beat), and the calculated result is determined as BPM.
  • the beat can be detected based on the energy distribution of each audio frame within a period of time, and then the BPM can be determined according to the detected beat.
  • S105 Identify the beat type of each audio frame of the target audio, and determine the audio time signature based on the correspondence between the beat type and the time signature.
  • A time signature is a symbol used in musical notation that is written in the form of a fraction. There is a time signature at the front of every score, and if the rhythm changes in the middle, the changed time signature is marked there.
  • The time signature looks like a fraction, such as 2/4, 3/4, etc.
  • The denominator represents the time value of one beat, that is, which note value counts as one beat; for example, in 2/4 a quarter note represents one beat and there are two beats in each measure.
  • The numerator represents how many beats there are in each measure: 2/4 means a quarter note is one beat with two beats per measure, 3/4 means a quarter note is one beat with three beats per measure, and so on. An indispensable element of music is rhythm.
  • Rhythm is a series of organized long and short relationships, and these relationships need to be divided in a standardized way by the time signature.
  • The function of the time signature is to separate the notes according to rules so that the rhythm is distinct. For example, in 4/4 the beat distribution of each measure is strong beat, weak beat, secondary strong beat, secondary weak beat, while in 3/4 it is strong beat, weak beat, weak beat.
  • Because different time signatures can be distinguished by detecting the distribution of strong and weak beats, the beat of each frame is classified as non-beat, strong beat (downbeat) or weak beat (beat).
  • This classification problem can be solved with a convolutional neural network or a recurrent neural network, which detects the activation probability of the three different beat types in each frame; the distribution of downbeats and beats can then be determined through some post-processing.
  • The time signature can therefore be identified the other way around: the rhythm is related to the strength and distribution of the beats, so the beat type of each audio frame in the target audio can be identified. For example, a convolutional neural network or a recurrent neural network can be used to classify each audio frame as non-beat, downbeat or beat, and according to the strength and distribution of the beats, the audio time signature corresponding to the target audio is determined by using the correspondence between beat types and time signatures.
  • It should be noted that the above beat type detection method is only one specific implementation, and the beats may also be detected in other ways.
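As a rough illustration of how per-frame beat types could be turned into a time signature, the sketch below counts the beats between consecutive downbeats and takes a majority vote. The labels are assumed to come from a classifier such as the one described above, and the majority-vote post-processing is an assumption rather than the patent's exact rule.

```python
# Sketch: infer the time-signature numerator from a per-frame beat-type sequence (assumed post-processing).
from collections import Counter

def infer_numerator(frame_labels):
    """frame_labels: list of 'downbeat', 'beat' or 'non-beat', one per audio frame."""
    beats_per_measure = []
    count = 0
    seen_first_downbeat = False
    for label in frame_labels:
        if label == "downbeat":
            if seen_first_downbeat and count > 0:
                beats_per_measure.append(count)
            seen_first_downbeat = True
            count = 1                         # the downbeat itself is beat 1 of the new measure
        elif label == "beat" and seen_first_downbeat:
            count += 1
    if not beats_per_measure:
        return None
    # Majority vote over measures: mostly 4 -> 4/4, mostly 3 -> 3/4 (quarter-note beat assumed).
    return Counter(beats_per_measure).most_common(1)[0][0]

labels = ["downbeat", "beat", "beat", "beat"] * 8 + ["non-beat"] * 4
print(infer_numerator(labels), "/ 4")         # -> 4 / 4
```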
  • The specific execution order of the four steps S102, S103, S104 and S105 is not limited; they may be executed in parallel or in series.
  • S106: After the chord information, the original key information, the number of beats and the audio time signature required for the score are obtained, the score can be drawn based on them to obtain the target score corresponding to the target audio.
  • Specifically, the score can be drawn based on preset drawing rules. There are multiple drawing rules, each of which is related to the score type of the target score, for example a guitar score or a piano score.
  • In one implementation, the score drawing rule is the correspondence between chords and pre-stored fingering images; according to the above information, the corresponding fingering images can be selected and spliced to obtain the target score.
  • In another implementation, the score drawing rule is a rule set according to music theory, for example: the first beat of a C chord has two notes, on strings 5 and 3 respectively, and the second beat has two notes, on strings 2 and 3 respectively; the corresponding score drawing rule can then be expressed in data form, for example as C(1:5,2;2,3).
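Such a data-form rule can be handled mechanically. The sketch below parses a rule string of the assumed form "Chord(beat:string,string;beat:string,string)"; this layout is an interpretation of the C(1:5,2;2,3) example above, not a format specified by the patent.

```python
# Sketch: one possible parser for rules like "C(1:5,3;2:2,3)" -- chord C, beat 1 on strings 5 and 3,
# beat 2 on strings 2 and 3. The exact encoding is an assumption based on the example in the text.
import re

def parse_rule(rule):
    chord, body = re.fullmatch(r"([A-G][#b]?m?)\((.*)\)", rule).groups()
    beats = {}
    for part in body.split(";"):
        beat, strings = part.split(":")
        beats[int(beat)] = [int(s) for s in strings.split(",")]
    return chord, beats

print(parse_rule("C(1:5,3;2:2,3)"))
# -> ('C', {1: [5, 3], 2: [2, 3]})
```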
  • By applying the music score generation method provided by the embodiments of the present application, after the target audio is obtained, the energy distribution of the target audio in the frequency domain is represented by a chromagram, and the chords of the target audio are then identified to obtain chord information.
  • The key and the time signature are important bases for performance and need to be reflected in the score, so key detection is performed on the target audio to obtain the original key information.
  • The beat type of each frame is identified, and the audio time signature is determined based on the combination of beat types. The number of beats (i.e. beats per minute) characterizes the speed of the audio rhythm and is used to determine the time corresponding to each chord.
  • After the above information is obtained, the target score can be drawn using the chord information, the original key information, the number of beats and the audio time signature.
  • By processing the target audio, the data and information necessary for drawing the score are obtained and then used to draw the target score. Compared with manual transcription, accurate scores can be generated efficiently, so both the efficiency and the accuracy of score generation are high, which solves the problems of low efficiency and poor score accuracy in the related art.
  • Based on the above embodiments, this embodiment describes some of the foregoing steps in detail.
  • In one implementation, in order to obtain accurate original key information, performing key detection on the target audio to obtain the original key information may include the following steps:
  • Step 11 Extract the note sequence of the target audio.
  • Step 12 Perform modulo calculation on the note sequence based on multiple different tonic parameters respectively, to obtain multiple calculation result sequences.
  • Step 13 use each calculation result sequence to compare with the major and minor sequence respectively, and obtain the corresponding number of matching notes.
  • Step 14 Determine the major and minor sequence corresponding to the maximum number of matching notes and the mode corresponding to the tonic parameter as the original tone information.
  • the note sequence refers to the sound corresponding to each audio frame in the target audio, which can be represented by note_array, and each value in the sequence, ie note_array[i], is an integer.
  • The tonic parameter is a parameter used to represent the tonic of the target audio. Since there are 12 possible tonics, there are 12 tonic parameters in total, which can be set as the 12 integers from 0 to 11 and represented by shift. Through the modulo calculation, a calculation result sequence can be obtained; by selecting different tonic parameters, the resulting calculation result sequences represent the key of the target audio under the assumption that the note represented by the tonic parameter is the tonic.
  • The modulo calculation is the calculation of (note_array[i]+shift)%12, where % denotes the modulo operation.
  • After the modulo calculation, 12 calculation result sequences can be obtained.
  • The major/minor sequence may be a major sequence or a minor sequence; the major sequence is (0 2 4 5 7 9 11 12) and the minor sequence is (0 2 3 5 7 8 10 12). If all values in a calculation result sequence fall into the major sequence and the tonic parameter is 0, the key of the target audio is C major. In general, not all values in a calculation result sequence will fall into the major or the minor sequence; in this case, the number of notes falling into the major sequence and the number falling into the minor sequence can be counted, that is, each calculation result sequence is compared with the major and minor sequences to obtain the corresponding numbers of matching notes.
  • Specifically, if the calculation result sequence is (… 0 5 7 …), since the three values 0, 5 and 7 fall into both the major sequence and the minor sequence, that is, they match both, the number of matching notes corresponding to the major sequence and the number corresponding to the minor sequence can each be increased by 3.
  • If the calculation result sequence is (… 4 9 11 …), these values fall only into the major sequence, so 3 is added only to the number of matching notes corresponding to the major sequence.
  • Since there are 12 calculation result sequences corresponding to the different tonic parameters, and each calculation result sequence has 2 matching-note counts corresponding to the major sequence and the minor sequence respectively, there are 24 matching-note counts in total, corresponding to the 24 keys.
  • After the 24 matching-note counts are obtained, the maximum one is selected, that is, the maximum number of matching notes, and the corresponding key is determined from its major/minor sequence and tonic parameter.
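Steps 11 to 14 can be expressed directly in code. The sketch below follows the modulo-and-match procedure described above; the note sequence is assumed to already be available as integers, and the tie-breaking when two keys score equally is an arbitrary choice made here for illustration.

```python
# Sketch of the modulo-and-match key detection (steps 11-14); note extraction (note_array) is assumed given.
MAJOR = {0, 2, 4, 5, 7, 9, 11, 12}
MINOR = {0, 2, 3, 5, 7, 8, 10, 12}
TONICS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def detect_key(note_array):
    best = (-1, None)
    for shift in range(12):                            # 12 candidate tonic parameters
        shifted = [(n + shift) % 12 for n in note_array]
        maj_hits = sum(1 for n in shifted if n in MAJOR)
        min_hits = sum(1 for n in shifted if n in MINOR)
        # The note that maps to 0 after shifting is the tonic assumed by this shift value.
        tonic = TONICS[(12 - shift) % 12]
        for hits, mode in ((maj_hits, "major"), (min_hits, "minor")):
            if hits > best[0]:
                best = (hits, f"{tonic} {mode}")
    return best[1]

# Toy example: pitch classes of a C-major scale fragment.
print(detect_key([0, 2, 4, 5, 7, 9, 11, 0, 4, 7]))     # -> "C major"
```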
  • the rhythm detection is performed on the target audio, and the process of obtaining the beat number may specifically include the following steps:
  • Step 21 Calculate the energy value of each audio frame in the target audio.
  • Step 22 Divide the target audio into several intervals, and use the energy value to calculate the average energy value of the interval.
  • Step 23 If the energy value is greater than the energy value threshold, determine that a beat is detected.
  • Step 24 Count the number of beats per minute to obtain the number of beats.
  • the energy value threshold is obtained by multiplying the average energy value and the weight value of the interval, and the weight value is obtained based on the variance of the energy value in each interval.
  • Audio has a high sampling rate, sometimes up to 44100 Hz. When dividing audio frames, each frame usually contains 1024 sampling points, so at a 44100 Hz sampling rate one second of target audio can be divided into about 43 audio frames. The energy value corresponding to an audio frame can be calculated according to the formula in the original description (given there as an image), where E_j is the energy value of the audio frame with serial number j, input(i) is the sample value at sampling point i, and i is the serial number of each sampling point within the current audio frame.
  • Since the number of beats is the BPM, the beats per minute need to be counted. In this embodiment, the target audio is divided into several intervals, either evenly or unevenly, so as to determine the average energy value of each interval; the average energy value is used to determine the energy value threshold of that interval, and the energy value threshold is used to judge whether a beat is recorded in a certain audio frame.
  • Usually the intervals can be divided evenly, with each interval being 1 second long; avg(E) denotes the average energy value of the frames in the interval.
  • After the average energy value is obtained, it is combined with a weight value to obtain the energy value threshold. Specifically, the weight value is C = -0.0000015 · var(E) + 1.5142857, where C is the weight value and var(E) is the variance of the energy values in the interval, and the energy value threshold is C * avg(E). If the energy value of an audio frame in the interval is greater than the energy value threshold, it means that this audio frame records a beat.
  • By counting the number of beats per minute, the number of beats can be obtained. Specifically, the number of beats per minute in each interval may be counted separately to obtain multiple candidate beat counts, and the most frequent candidate is determined as the number of beats; alternatively, the total number of beats of the entire target audio may be counted, and the number of beats may be calculated from this total and the length of the target audio.
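A minimal sketch of the energy-threshold procedure in steps 21 to 24 is shown below. The per-frame energy is assumed to be a sum of squared samples (the original formula is only given as an image), while the 1024-sample frames, 1-second intervals and the weight formula C = -0.0000015 · var(E) + 1.5142857 follow the text.

```python
# Sketch of the energy-threshold beat detector (steps 21-24). Frame energy assumed to be sum of squares.
import numpy as np

def estimate_bpm_energy(samples, sr=44100, frame_len=1024):
    n_frames = len(samples) // frame_len
    energy = np.array([np.sum(samples[j * frame_len:(j + 1) * frame_len] ** 2)
                       for j in range(n_frames)])
    frames_per_sec = sr / frame_len                    # about 43 frames per second at 44100 Hz
    interval = max(1, int(round(frames_per_sec)))      # 1-second intervals, as in the text
    beats = 0
    for start in range(0, n_frames, interval):
        e = energy[start:start + interval]
        if e.size == 0:
            continue
        c = -0.0000015 * np.var(e) + 1.5142857         # weight value C from the description
        threshold = c * np.mean(e)
        beats += int(np.sum(e > threshold))            # frames above the threshold count as beats
    duration_min = n_frames / frames_per_sec / 60.0
    return beats / duration_min if duration_min > 0 else 0.0

# Hypothetical usage: estimate_bpm_energy(samples_of_target_audio) -> estimated beats per minute.
```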
  • rhythm detection may also be performed by means of deep learning. Perform rhythm detection on the target audio to obtain the number of beats, including:
  • Step 31 Generate a log magnitude spectrum corresponding to the target audio.
  • Step 32 Input the log magnitude spectrum into the trained neural network to obtain the probability value that each audio frame in the target audio is a beat.
  • Step 33 Perform autocorrelation calculation on the probability value sequence composed of probability values to obtain several autocorrelation parameters.
  • Step 34 Determine the maximum autocorrelation parameter within a preset range as the number of beats.
  • The log-magnitude spectrum is a kind of spectrogram in which the amplitude of each spectral line is the logarithm of the original amplitude A, so the unit of the ordinate is dB (decibel).
  • The purpose of this transformation is to raise the lower-amplitude components relative to the higher-amplitude components, so that periodic signals masked by low-amplitude noise can be observed.
  • The trained neural network is used to predict whether a beat is recorded in each audio frame of the target audio. After the log-magnitude spectrum is input into the neural network, the network outputs the probability that each audio frame records a beat, and autocorrelation calculation is performed on the probability value sequence composed of these probabilities.
  • The number of beats is then determined within a preset range; specifically, the maximum autocorrelation parameter within the preset range is determined as the number of beats.
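The sketch below illustrates steps 32 to 34, starting from an already computed per-frame beat-probability sequence; the neural network itself is not shown, and the frame rate, BPM search range and the lag-to-BPM conversion are illustrative assumptions about how the "maximum autocorrelation parameter within a preset range" is mapped to a beat count.

```python
# Sketch: BPM from a beat-probability sequence via autocorrelation (steps 32-34, assumptions noted above).
import numpy as np

def bpm_from_beat_probability(beat_prob, frames_per_sec=43.0, bpm_range=(40, 220)):
    p = np.asarray(beat_prob, dtype=float)
    p = p - p.mean()
    # Autocorrelation at every positive lag (in frames).
    ac = np.correlate(p, p, mode="full")[len(p):]
    best_bpm, best_val = None, -np.inf
    for lag in range(1, len(ac)):
        bpm = 60.0 * frames_per_sec / lag              # a beat every `lag` frames corresponds to this BPM
        if bpm_range[0] <= bpm <= bpm_range[1] and ac[lag] > best_val:
            best_bpm, best_val = bpm, ac[lag]
    return best_bpm

# Toy usage: a probability sequence that peaks every 21 frames (~123 BPM at 43 frames per second).
probs = np.zeros(43 * 20)
probs[::21] = 1.0
print(round(bpm_from_beat_probability(probs), 1))
```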
  • In one implementation, the target score is a guitar score. In this case, multiple candidate fingering images can be pre-stored, and the target score is generated by selecting existing images and splicing them.
  • The process of drawing the score by using the chord information, the original key information, the number of beats and the audio time signature to obtain the target score may include the following steps:
  • Step 41 Using the chord information to determine the fingering image.
  • Step 42 Concatenate the fingering images based on the chord information to obtain a second musical score.
  • Step 43 Use the original key information, the number of beats and the audio time signature to mark the second musical score to obtain the target musical score.
  • A candidate fingering image is an image used to show how the fingers control the strings when playing the guitar, and a fingering image is a candidate fingering image corresponding to the chord information. It is understandable that different chords require different fingerings; therefore, once the chord information is determined, the corresponding playing manner is determined, so the chord information can be used to determine the fingering images. It should be noted that the same chord is usually played differently in different keys, so the chord information and the original key information can be used jointly to determine the fingering images. Since the chords change, and one fingering image can only correspond to one tone or a few tones, the number of fingering images determined from the chord information is necessarily more than one.
  • The second score can be obtained by splicing the fingering images; that is, the second score is the score obtained by splicing the fingering images.
  • The fingering images include images of pressing the strings and images of controlling the strings, where string control includes playing techniques such as picking and strumming.
  • FIG. 5 is a specific second score provided by an embodiment of the present application.
  • The target score can then be obtained by marking the second score with the original key information, the number of beats and the audio time signature.
  • FIG. 6 is a specific target score provided by an embodiment of the present application, in which the original key is C, the number of beats is 60, and the time signature is 4/4.
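One possible way to realize steps 41 to 43 in code is sketched below: a pre-stored fingering image is looked up for each (key, chord) pair and the images are spliced horizontally into the second score. The library layout, file names and the use of Pillow are assumptions for illustration only.

```python
# Sketch of steps 41-43: look up fingering images and splice them into a second score (assumed setup).
from PIL import Image

FINGERING_LIBRARY = {                 # hypothetical mapping: (key, chord) -> candidate fingering image path
    ("C", "C"): "fingering/C_C.png",
    ("C", "G"): "fingering/C_G.png",
    ("C", "Am"): "fingering/C_Am.png",
}

def splice_fingering(chord_sequence, key="C"):
    images = [Image.open(FINGERING_LIBRARY[(key, chord)]) for chord in chord_sequence]
    width = sum(im.width for im in images)
    height = max(im.height for im in images)
    second_score = Image.new("RGB", (width, height), "white")
    x = 0
    for im in images:
        second_score.paste(im, (x, 0))                 # concatenate fingering images left to right
        x += im.width
    return second_score

# Hypothetical usage: splice_fingering(["C", "Am", "G"], key="C").save("second_score.png")
```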
  • the target audio may be audio with lyrics, in this case, a mark corresponding to the lyrics may be set in the target score.
  • the target score in Fig. 6 also includes lyrics.
  • the process of drawing the score by using the chord information, the original key information, the number of beats and the audio time signature to obtain the target score may include the following steps:
  • Step 51 Determine the position information of each word in the target lyrics in the target audio.
  • Step 52 Use the duration of each word to determine the corresponding note type.
  • Step 53 Using the chord information, the original key information, the number of beats and the audio time signature to generate the first musical score, and based on the position information and note type, use the target lyrics to identify the first musical score to obtain the target musical score.
  • the first music score is generated by using the chord information, the original key information, the number of beats and the audio time signature, which can be obtained in the manner of steps 41-43.
  • the target lyrics are the lyrics corresponding to the target audio. After the target lyrics are obtained, the position information of each word in the target audio needs to be determined.
  • The location information may include a timestamp, for example the word-by-word lyric information corresponding to the song "Ten Miles of Spring Breeze" (not reproduced here), in which startTime is the timestamp and duration is the duration of each word.
  • Alternatively, the location information may be the measure in which each word is located and the beat within that measure. In this case the location information is calculated from the timestamp above:
  • measure index = start time of the word (i.e. the timestamp) / duration of one measure;
  • position within the measure = (start time of the word - measure index × duration of one measure) / (60/BPM).
  • After the position information is obtained, it can be used to determine the position of each word of the target lyrics in the first score. Since the duration of each word differs, the corresponding note type also differs, for example a 16th note, an 8th note or a quarter note. In order to indicate how each word is sung in the target score, the corresponding note type needs to be determined from the duration. Once the note types and the position information are determined, they are used as a reference, and the target lyrics are used to mark the first score to obtain the target score.
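The measure/beat formulas above and the duration-to-note-type mapping can be written down directly; in the sketch below the position formulas follow the text, while the note-type cutoffs are illustrative assumptions.

```python
# Sketch of steps 51-52: locate each lyric word by measure/beat and map its duration to a note type.
def locate_word(start_time_s, bpm, beats_per_measure=4):
    beat_len = 60.0 / bpm
    measure_len = beats_per_measure * beat_len
    measure = int(start_time_s // measure_len)                     # which measure the word falls in
    beat_in_measure = (start_time_s - measure * measure_len) / beat_len
    return measure, beat_in_measure

def note_type(duration_s, bpm):
    beats = duration_s / (60.0 / bpm)                              # duration expressed in beats
    for label, min_beats in (("whole", 4), ("half", 2), ("quarter", 1), ("8th", 0.5), ("16th", 0.25)):
        if beats >= min_beats:
            return label
    return "16th"

# Toy usage: a word starting at 9.3 s lasting 0.24 s in a 4/4 song at 60 BPM.
print(locate_word(9.3, bpm=60), note_type(0.24, bpm=60))           # -> (2, ~1.3) '16th'
```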
  • In one implementation, since a performer often cannot play the music in exactly the same key, playing style and speed as the original, some information of the original music can be modified as needed when generating the target score, so that the generated target score meets the user's own needs. Therefore, drawing the score using the chord information, the original key information, the number of beats and the audio time signature to obtain the target score includes:
  • Step 61 Adjust the target information according to the obtained score adjustment information to obtain adjusted information.
  • Step 62 Generating a target score by using the unadjusted non-target information and the adjusted information.
  • the target information is at least one of the original key information, chord information, score drawing rules, and beat count
  • the non-target information refers to information other than the selected and adjusted target information.
  • The score adjustment information is used to adjust the specified target information.
  • The number of beats directly determines the playing speed of the audio, and adjusting the number of beats can make the playing speed of the target audio faster or slower.
  • A change of key, which may also be called modulation, is constrained by the range of guitar keys the user has mastered; for example, beginners usually only play in the key of C, so the original key can be converted into the key selected by the user, that is, the original key information is adjusted, for example from the key of G to the key of C.
  • The adjustment of the key usually leads to an adjustment of the chords, that is, the chords corresponding to each beat on the original score need to be converted into the chords corresponding to the selected key.
  • The chords can also be adjusted individually if desired.
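A minimal sketch of the modulation step described above: every chord root is shifted by the interval between the original key and the user-selected key. The chord-name handling is simplified and is an assumption, not the patent's rule set.

```python
# Sketch: transpose chords when the original key (e.g. G) is changed to a user-selected key (e.g. C).
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose_chord(chord, from_key, to_key):
    shift = (NOTES.index(to_key) - NOTES.index(from_key)) % 12
    root = chord[:2] if len(chord) > 1 and chord[1] == "#" else chord[:1]
    quality = chord[len(root):]                        # e.g. "m", "7", "" for a plain major triad
    return NOTES[(NOTES.index(root) + shift) % 12] + quality

print([transpose_chord(c, "G", "C") for c in ["G", "Em", "C", "D7"]])
# -> ['C', 'Am', 'F', 'G7']
```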
  • the adjustment of the score drawing rules can modify the performance style of the score and other information.
  • the music score drawing rule is specifically the correspondence between chords and fingerings (and corresponding fingering images).
  • The strings can be played by picking or strumming, so the fingering images corresponding to the same chord can be decomposed-chord (arpeggio) images or rhythm-pattern images.
  • different time signatures correspond to a series of different decomposition chords and rhythm patterns. Please refer to FIG. 7 .
  • FIG. 7 is a fingering image provided by the embodiment of the present application, in which several rhythm patterns corresponding to common 4/4 beats are recorded.
  • the scores can be stored for reuse after they are generated. Specifically, the following steps may be included:
  • Step 71 Establish the corresponding relationship between the target audio and the target music score, and store the corresponding relationship between the target music score and the audio score.
  • Step 72 If a musical score output request is detected, then use the corresponding relationship of each audio musical score to determine whether there is a requested musical score corresponding to the musical score output request.
  • Step 73 If there is a requested score, output the requested score.
  • The specific manner of storing the score is not limited.
  • In one implementation, data such as the chord of each beat, the corresponding lyrics and the note type of each lyric character can be recorded and saved.
  • For example, a record may state: in the 9th measure, 1st beat, the corresponding chord is the G chord; the corresponding lyrics contain three words; the first word ("band") is a 16th note and its position in the lyrics area under the tablature staff is the second slot; and so on.
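One possible storage layout for step 71 is sketched below as a JSON record holding, per beat, the chord, the lyric words and their note types; the field names and the file-based store are assumptions for illustration.

```python
# Sketch: store and look up a generated score as a JSON record (assumed layout, not the patent's format).
import json

score_record = {
    "audio_id": "target_audio_001",                    # hypothetical identifier linking audio and score
    "key": "C", "bpm": 60, "time_signature": "4/4",
    "measures": [
        {
            "index": 9,
            "beats": [
                {"beat": 1, "chord": "G",
                 "lyrics": [{"text": "band", "note": "16th", "slot": 2}]},   # values from the example above
            ],
        }
    ],
}

with open("score_store.json", "w", encoding="utf-8") as f:
    json.dump(score_record, f, ensure_ascii=False, indent=2)

# Lookup on a score-output request (step 72): load the store and match by audio_id.
with open("score_store.json", encoding="utf-8") as f:
    stored = json.load(f)
print(stored["audio_id"] == "target_audio_001")
```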
  • the user can be guided to perform. Specifically, the following steps can also be included:
  • Step 81 Determine the beat audio according to the target beat number in the target score.
  • Step 82 After detecting the start signal, play beat audio, and count the playing time.
  • Step 83 Determine the target part in the target score according to the target beat number and performance duration, and mark the target part as a reminder.
  • Beat audio refers to audio with regular beat reminders, and the time interval between two adjacent beat sounds in different beat audios is different.
  • The target beat count may be the unadjusted number of beats, or may be the adjusted number of beats.
  • From the target beat count, the time interval between two adjacent beat sounds can be determined, and thus the beat audio can be determined; specifically, the time interval between two adjacent beat sounds is (60/target beat count) seconds.
  • After the start signal is detected, it means that the user starts to play.
  • In this case the beat audio is played, and the playing time is counted at the same time.
  • The playing time refers to the time elapsed since the target score started to be played.
  • From the target beat count and the playing time, the part of the target score that is currently being played can be determined, that is, the target part.
  • In order to remind the user of the position that should currently be played, the target part may be marked with a reminder.
  • The specific manner of the reminder marking is not limited; for example, it may be marked by coloring.
  • The user may choose to play the entire target score each time, or only part of it, so the target part may be any part of the target score, or a part within a certain range of the target score, which can be specified by the user.
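The beat-audio interval and the currently played position can be computed as sketched below; the (60 / target beat count) interval comes from the text, while mapping the elapsed time to a measure and beat assumes a fixed beats-per-measure value.

```python
# Sketch of steps 81-83: metronome interval from the target BPM, and the score position to highlight.
def beat_interval_seconds(target_bpm):
    return 60.0 / target_bpm                           # time between two adjacent metronome clicks

def current_position(elapsed_s, target_bpm, beats_per_measure=4):
    beat_index = int(elapsed_s // beat_interval_seconds(target_bpm))
    measure = beat_index // beats_per_measure
    beat_in_measure = beat_index % beats_per_measure + 1
    return measure, beat_in_measure                    # the "target part" to mark in the score

print(beat_interval_seconds(60))                       # -> 1.0 second between clicks at 60 BPM
print(current_position(9.5, target_bpm=60))            # -> (2, 2): measure 2 (0-based), beat 2
```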
  • the computer-readable storage medium provided by the embodiment of the present application is introduced below, and the computer-readable storage medium described below and the music score generation method described above can be referred to in correspondence.
  • the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above-mentioned music score generation method are realized.
  • The computer-readable storage medium may include various media that can store program code, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other.
  • the description is relatively simple, and for the related information, please refer to the description of the method part.
  • A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other known form of storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

A music score generation method, a device and a computer-readable storage medium. The method includes: acquiring target audio (S101); generating a chromagram corresponding to the target audio and each pitch class, and identifying chords of the target audio by using the chromagram to obtain chord information (S102); performing key detection on the target audio to obtain original key information (S103); performing rhythm detection on the target audio to obtain the number of beats (S104); identifying the beat type of each audio frame of the target audio, and determining the audio time signature based on the correspondence between beat types and time signatures (S105); and drawing a score by using the chord information, the original key information, the number of beats and the audio time signature to obtain a target score (S106). By processing the target audio, the data and information necessary for drawing the score are obtained and then used to draw the target score; compared with manual transcription, accurate scores can be generated efficiently, so both the efficiency and the accuracy of score generation are high.

Description

Music score generation method, electronic device and readable storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 16, 2021, with application number 202111088919.7 and entitled "Music score generation method, electronic device and readable storage medium", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the technical field of audio processing, and in particular to a music score generation method, an electronic device for music score generation, and a computer-readable storage medium.
BACKGROUND
A music score, that is, sheet music, is a regular combination of written symbols that record the pitch or rhythm of music; common numbered notation, staff notation, guitar tablature, guqin notation and various other modern or ancient notations are all called music scores. At present, music scores such as guitar tablature usually need to be produced by manual transcription, which is inefficient and yields scores of poor accuracy.
SUMMARY
In view of this, the purpose of the present application is to provide a music score generation method, an electronic device and a computer-readable storage medium that can efficiently generate accurate music scores.
To solve the above technical problem, in a first aspect, the present application provides a music score generation method, including:
acquiring target audio;
generating a chromagram corresponding to the target audio and each pitch class, and identifying chords of the target audio by using the chromagram to obtain chord information;
performing key detection on the target audio to obtain original key information;
performing rhythm detection on the target audio to obtain a number of beats;
identifying a beat type of each audio frame of the target audio, and determining an audio time signature based on a correspondence between beat types and time signatures;
drawing a music score by using the chord information, the original key information, the number of beats and the audio time signature to obtain a target score.
Optionally, the drawing a music score by using the chord information, the original key information, the number of beats and the audio time signature to obtain a target score includes:
determining position information of each word of target lyrics in the target audio, where the target lyrics are the lyrics corresponding to the target audio;
determining a corresponding note type by using the duration of each word;
generating a first score by using the chord information, the original key information, the number of beats and the audio time signature, and marking the first score with the target lyrics based on the position information and the note types to obtain the target score.
Optionally, the drawing a music score by using the chord information, the original key information, the number of beats and the audio time signature to obtain a target score includes:
determining fingering images by using the chord information;
splicing the fingering images based on the chord information to obtain a second score;
marking the second score with the original key information, the number of beats and the audio time signature to obtain the target score.
Optionally, the drawing a music score by using the chord information, the original key information, the number of beats and the audio time signature to obtain a target score includes:
adjusting target information according to obtained score adjustment information to obtain adjusted information, where the target information is at least one of the original key information, the chord information, score drawing rules and the number of beats;
generating the target score by using the unadjusted non-target information and the adjusted information.
Optionally, the performing key detection on the target audio to obtain original key information includes:
extracting a note sequence of the target audio;
performing modulo calculation on the note sequence based on a plurality of different tonic parameters to obtain a plurality of calculation result sequences;
comparing each calculation result sequence with major and minor sequences to obtain corresponding numbers of matching notes;
determining, as the original key information, the key corresponding to both the major/minor sequence and the tonic parameter associated with the maximum number of matching notes.
Optionally, the performing rhythm detection on the target audio to obtain a number of beats includes:
calculating an energy value of each audio frame in the target audio;
dividing the target audio into several intervals, and calculating an average energy value of each interval by using the energy values;
determining that a beat is detected if an energy value is greater than an energy value threshold, where the energy value threshold is obtained by multiplying the average energy value by a weight value of the interval, and the weight value is obtained based on the variance of the energy values in each interval;
counting the number of beats per minute to obtain the number of beats.
Optionally, the performing rhythm detection on the target audio to obtain a number of beats includes:
generating a log-magnitude spectrum corresponding to the target audio;
inputting the log-magnitude spectrum into a trained neural network to obtain a probability value that each audio frame in the target audio is a beat;
performing autocorrelation calculation on the probability value sequence formed by the probability values to obtain several autocorrelation parameters;
determining the maximum autocorrelation parameter within a preset range as the number of beats.
Optionally, the method further includes:
establishing an audio-score correspondence between the target audio and the target score, and storing the target score and the audio-score correspondence;
if a score output request is detected, judging, by using the stored audio-score correspondences, whether a requested score corresponding to the score output request exists;
outputting the requested score if it exists.
Optionally, the method further includes:
determining beat audio according to a target number of beats in the target score;
after a start signal is detected, playing the beat audio and counting the performance duration;
determining a target part in the target score according to the target number of beats and the performance duration, and marking the target part with a reminder.
In a second aspect, the present application further provides an electronic device, including a memory and a processor, where:
the memory is configured to store a computer program;
the processor is configured to execute the computer program to implement the above music score generation method.
In a third aspect, the present application further provides a computer-readable storage medium for storing a computer program, where the above music score generation method is implemented when the computer program is executed by a processor.
According to the music score generation method provided by the present application, target audio is acquired; a chromagram corresponding to the target audio and each pitch class is generated, and chords of the target audio are identified by using the chromagram to obtain chord information; key detection is performed on the target audio to obtain original key information; rhythm detection is performed on the target audio to obtain the number of beats; the beat type of each audio frame of the target audio is identified, and the audio time signature is determined based on the correspondence between beat types and time signatures; and the chord information, the original key information, the number of beats and the audio time signature are used to draw a score to obtain a target score.
It can be seen that, after acquiring the target audio, the method represents the energy distribution of the target audio in the frequency domain by means of a chromagram, and then identifies the chords of the target audio to obtain chord information. The key and the time signature are important bases for performance and need to be reflected in the score, so key detection is performed on the target audio to obtain the original key information. The beat types are identified, and the audio time signature is determined based on the combination of beat types. The number of beats (i.e. beats per minute) characterizes the speed of the audio rhythm and is used to determine the time corresponding to each chord. After the above information is obtained, the target score can be obtained by drawing the score with the chord information, the original key information, the number of beats and the audio time signature. By processing the target audio, the data and information necessary for drawing the score are obtained and then used to draw the target score; compared with manual transcription, accurate scores can be generated efficiently, so both the efficiency and the accuracy of score generation are high, which solves the problems of low efficiency and poor score accuracy in the related art.
In addition, the present application also provides an electronic device and a computer-readable storage medium, which likewise have the above beneficial effects.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to more clearly describe the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application, and other drawings can be obtained from the provided drawings by a person of ordinary skill in the art without creative effort.
Fig. 1 is a schematic diagram of a hardware composition framework applicable to a music score generation method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a hardware composition framework applicable to another music score generation method provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of a music score generation method provided by an embodiment of the present application;
Fig. 4 is a chromagram provided by an embodiment of the present application;
Fig. 5 is a specific second score provided by an embodiment of the present application;
Fig. 6 is a specific target score provided by an embodiment of the present application;
Fig. 7 is a fingering image provided by an embodiment of the present application.
DETAILED DESCRIPTION
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
For ease of understanding, the hardware composition framework used by the solution corresponding to the music score generation method provided by the embodiments of the present application is introduced first. Please refer to Fig. 1, which is a schematic diagram of a hardware composition framework applicable to a music score generation method provided by an embodiment of the present application. The electronic device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an information input/information output (I/O) interface 104 and a communication component 105.
The processor 101 is used to control the overall operation of the electronic device 100 to complete all or part of the steps of the music score generation method; the memory 102 is used to store various types of data to support operation on the electronic device 100, and these data may include, for example, instructions for any application or method operating on the electronic device 100, as well as application-related data. The memory 102 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as one or more of static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk. In this embodiment, the memory 102 stores at least programs and/or data for realizing the following functions:
acquiring target audio;
generating a chromagram corresponding to the target audio and each pitch class, and identifying chords of the target audio by using the chromagram to obtain chord information;
performing key detection on the target audio to obtain original key information;
performing rhythm detection on the target audio to obtain the number of beats;
identifying the beat type of each audio frame of the target audio, and determining the audio time signature based on the correspondence between beat types and time signatures;
drawing a score by using the chord information, the original key information, the number of beats and the audio time signature to obtain a target score.
The multimedia component 103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signal may be further stored in the memory 102 or sent via the communication component 105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules, which may be a keyboard, a mouse, buttons and the like; these buttons may be virtual or physical buttons. The communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component 105 may include a Wi-Fi component, a Bluetooth component and an NFC component.
The electronic device 100 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, and is used to execute the music score generation method.
Of course, the structure of the electronic device 100 shown in Fig. 1 does not constitute a limitation on the electronic device in the embodiments of the present application; in practical applications, the electronic device 100 may include more or fewer components than shown in Fig. 1, or some components may be combined.
It can be understood that the number of electronic devices is not limited in the embodiments of the present application; multiple electronic devices may cooperate to complete the music score generation method. In one possible implementation, please refer to Fig. 2, which is a schematic diagram of a hardware composition framework applicable to another music score generation method provided by an embodiment of the present application. As can be seen from Fig. 2, the hardware composition framework may include a first electronic device 11 and a second electronic device 12 connected through a network 13.
In the embodiments of the present application, the hardware structures of the first electronic device 11 and the second electronic device 12 may refer to the electronic device 100 in Fig. 1; that is, it can be understood that there are two electronic devices 100 in this embodiment, and the two perform data interaction. Further, the form of the network 13 is not limited in the embodiments of the present application; that is, the network 13 may be a wireless network (such as Wi-Fi, Bluetooth, etc.) or a wired network.
The first electronic device 11 and the second electronic device 12 may be the same kind of electronic device, for example both servers; they may also be different types of electronic devices, for example, the first electronic device 11 may be a smart phone or another smart terminal, and the second electronic device 12 may be a server. In one possible implementation, a server with strong computing power may be used as the second electronic device 12 to improve data processing efficiency and reliability, and thus the processing efficiency of score generation; meanwhile, a smart phone, which is low-cost and widely used, is used as the first electronic device 11 to realize the interaction between the second electronic device 12 and the user. It can be understood that the interaction process may be: the smart phone acquires the target audio and sends it to the server, the server generates the target score, the server sends the target score to the smart phone, and the smart phone displays the target score.
Based on the above description, please refer to Fig. 3, which is a schematic flowchart of a music score generation method provided by an embodiment of the present application. The method in this embodiment includes:
S101: Acquire target audio.
The target audio refers to the audio for which a corresponding score needs to be generated; its number and type are not limited. Specifically, the target audio may be a song with lyrics, or pure music without lyrics. The specific manner of acquiring the target audio is not limited; for example, audio information may be obtained first and used to filter locally pre-stored audio to obtain the target audio, or a data transmission interface may be used to obtain externally input target audio.
S102: Generate a chromagram corresponding to the target audio and each pitch class, and identify chords of the target audio by using the chromagram to obtain chord information.
The chromaticity spectrum is the chromagram; the chroma features are the collective name of the chroma vector and the chromagram. The chroma vector is a vector containing 12 elements, which represent the energy in the 12 pitch classes within a period of time (such as one frame), with the energies of the same pitch class in different octaves accumulated; the chromagram is the sequence of chroma vectors. Taking the piano as an example, it can play 88 pitches, which appear as cycles of a group consisting of the seven white-key notes do, re, mi, fa, so, la, ti (and the five black keys between them); the do in one group and the do in the next group are an octave apart. If the concept of groups is ignored, these twelve tones constitute the twelve pitch classes.
The chromagram is usually generated by the Constant-Q Transform (CQT). Specifically, a Fourier transform is performed on the target audio to convert it from the time domain to the frequency domain, noise reduction is performed on the frequency-domain signal, and tuning is then performed, which has an effect similar to "tuning different pianos to the standard frequency". Absolute time is then converted into frames according to the length of the selected window, and the energy of each pitch in each frame is recorded as a pitch spectrum. On the basis of the pitch spectrum, the energies of notes at the same time, with the same pitch class but in different octaves, are superimposed onto the element of that pitch class in the chroma vector to form the chromagram. Please refer to Fig. 4, which is a chromagram provided by an embodiment of the present application. In its first large cell, the three pitch classes C, E and G are very bright; according to music theory, it can be determined that the tonic chord of C major (Cmaj), also called the C major triad, is played during this period of the target audio.
A chord is a concept in music theory that refers to a group of tones with certain interval relationships. Combining three or more tones vertically according to a third-based or non-third-based stacking relationship forms a chord. An interval refers to the relationship between two tones in pitch, that is, the distance between the two tones in pitch, and its unit is the degree. In this way, by combining the chromagram with music theory knowledge, the chords corresponding to different times of the target audio can be determined to obtain the chord information.
S103: Perform key detection on the target audio to obtain original key information.
A mode (key) is an organic whole composed of several musical tones of different pitches organized according to certain interval relationships around one core tone. Modes are divided into major and minor, which follow different interval relationships.
Specifically, the relationship of a major scale is whole-whole-half-whole-whole-whole-half, and the interval relationship between the tones is 0-2-4-5-7-9-11-12: the distance from the first tone to the second tone is 2, i.e. a whole step; the distance from the second tone to the third tone is 2, i.e. a whole step; the distance from the third tone to the fourth tone is 1, i.e. a half step; and so on. The relationship of a minor scale is whole-half-whole-whole-half-whole-whole, and the interval relationship between the tones is 0-2-3-5-7-8-10-12. When the tones of a mode are arranged into a scale, the tonic is the core of the mode, and the most stable tone in the mode is the tonic. Since there are 12 scale degrees in total, each of them can serve as the tonic, namely C, C# (or Db), D, D# (or Eb), E, F, F# (or Gb), G, G# (or Ab), A, A# (or Bb) and B, where # denotes a sharp, half a step higher than the natural tone, and b denotes a flat, half a step lower than the natural tone. Since a mode is either major or minor, there are 24 modes in total.
This embodiment does not limit the specific manner of key detection. In one implementation, the target audio may be input into a trained convolutional neural network, which is obtained by training on a large amount of training data with key labels and whose structure may be a multi-layer convolutional neural network; after the target audio is input, the most probable one of the 24 key categories is selected as the key of the target audio. In another implementation, modulo calculation may be performed on the note sequence and matched against the major and minor patterns, and the original key information is obtained from the matching result.
S104:对目标音频进行节奏检测,得到拍子数。
BPM是Beat Per Minute的简称，中文名为拍子数，释义为每分钟节拍数的单位。BPM是全曲速度标记，为独立在曲谱外的速度标准，一般以一个四分音符为一拍，60BPM即为一分钟均匀演奏60个四分音符（或等效的音符组合）。节奏检测即为BPM检测，拍子数用于控制音频的演奏速度，相同的和弦在不同的BPM下演奏节奏不同。
本实施例并不限定节奏检测的具体方式，在一种实施方式中，可以基于目标音频各个音频帧是节拍（即beat）的概率序列进行自相关计算，并将计算得到的结果确定为BPM。在另一种实施方式中，可以基于一段时间内各个音频帧的能量分布情况检测节拍，进而根据检测到的节拍确定BPM。
S105:对目标音频各个音频帧的节拍类型进行识别,并基于节拍类型与拍号对应关系确定音频拍号。
拍号，是一种在乐谱中使用的符号，用分数的形式来标记。每一个乐谱前面都有拍号，中间如果改变节奏会标出改变的拍号，拍号如同分数，如2/4、3/4等。其分母表示拍子的时值，即用几分音符来作为一拍，例如2/4代表以四分音符为一拍，每一小节有两拍。分子代表每一小节有多少拍子，例如2/4拍就是以四分音符为一拍，一小节有两拍，3/4拍以四分音符为一拍，每小节有三拍，以此类推。音乐中不可缺少的一个要素就是节奏，节奏即为一系列组织起来的长短关系，这种长短关系需要利用拍号进行规范的划分，拍号的作用就是把众多的音符按规矩分隔，使节奏鲜明。例如，对于4/4拍和3/4拍来说，4/4拍每小节的节拍分布为强拍、弱拍、次强拍、次弱拍，而3/4拍为强拍、弱拍、弱拍。
由于不同拍号的强弱拍分布不同，因此可以通过检测强拍和弱拍的分布来区分它们。可以将每一帧所属的节拍类型分为无拍（non-beat）、强拍（downbeat）和弱拍（beat），该分类问题可以通过卷积神经网络或者循环神经网络实现，即检测出每帧三种不同节拍类型的激活概率，再通过一些后处理即可确定强拍和弱拍的分布。
基于这一点，可以反过来根据节拍的分布对拍号进行识别。具体的，节奏与节拍的强弱和分布情况相关，因此可以对目标音频中各个音频帧的节拍类型进行识别，例如可以利用卷积神经网络或循环神经网络对各个音频帧进行分类，判断其为无拍（non-beat）、强拍（downbeat）或弱拍（beat），并根据节拍的强弱和分布情况，利用节拍类型与拍号的对应关系，确定目标音频对应的音频拍号。需要说明的是，上述节拍类型检测方式仅为一种具体的实施方式，还可以采用其他的方式对节拍进行检测。
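作为一种示意性的后处理草图，可以统计相邻两个强拍之间的节拍数量来推断拍号的分子。以下代码假设分母固定为4，且输入为已完成分类、滤除无拍后的节拍标签序列，这些均为示例假设：
# 根据强拍（downbeat）与弱拍（beat）的分布推断拍号分子的示意
from collections import Counter

def estimate_numerator(beat_labels):
    # beat_labels 为逐节拍的标签序列，元素取 "downbeat" 或 "beat"
    # 统计相邻两个强拍之间的节拍总数，取出现次数最多的值作为每小节的拍数
    downbeat_positions = [i for i, label in enumerate(beat_labels) if label == "downbeat"]
    beats_per_bar = [b - a for a, b in zip(downbeat_positions, downbeat_positions[1:])]
    return Counter(beats_per_bar).most_common(1)[0][0] if beats_per_bar else None

labels = ["downbeat", "beat", "beat", "downbeat", "beat", "beat", "downbeat"]
print(f"{estimate_numerator(labels)}/4")  # 输出 3/4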
S106:利用和弦信息、原调信息、拍子数和音频拍号进行曲谱绘制,得到目标曲谱。
需要说明的是，S102、S103、S104、S105四个步骤的具体执行顺序不做限定，其可以并行执行，或者可以串行执行。在得到曲谱所需的和弦信息、原调信息、拍子数和音频拍号后，可以基于其进行曲谱绘制，得到与目标音频相对应的目标曲谱。具体的，可以基于预设的绘制规则进行曲谱绘制，绘制规则有多个，各个绘制规则分别与目标曲谱的曲谱类型相关，例如为吉他曲谱，或者为钢琴曲谱等。在一种实施方式中，曲谱绘制规则为和弦与预存的指法图像的对应关系，根据上述信息，可以选择对应的指法图像，并将所述指法图像进行拼接，得到目标曲谱。在另一种实施方式中，曲谱绘制规则为根据乐理知识设定的曲谱绘制规则，例如C和弦第一拍两个音，分别为5弦和3弦，第二拍两个音，分别为2弦和3弦，则其对应的曲谱绘制规则可以呈数据形式，例如为C(1:5,2;2,3)。
应用本申请实施例提供的曲谱生成方法,在获取目标音频后,利用色度图谱的方式对目标音频在频域范围的能量分布进行表示,进而识别目标音频的和弦,得到和弦信息。调式和拍号是演奏的重要依据,其需要在曲谱中进行体现,因此对目标音频进行调式检测得到原调信息。并通过对节拍类型进行识别,基于节拍类型的组合确定音频拍号。拍子数(或称为每分钟节拍数)可以表征音频节奏的快慢,利用其确定和弦对应的时间。在得到上述信息后,利用和弦信息、原调信息、拍子数和音频拍号进行曲谱绘制,即可得到目标曲谱。通过对目标音频进行处理,得到绘制曲谱所必须的数据和信息,进而利用其绘制得到目标曲谱,相比人工扒谱的方式,能够高效地生成准确的曲谱,使得曲谱生成的效率和准确性均较高,解决了相关技术效率较低,且曲谱准确性较差的问题。
基于上述实施例,本实施例对上述实施例中的部分步骤进行具体说明。在一种实施方式中,为了得到准确的原调信息,对目标音频进行调式检测,得到原调信息的过程可以包括如下步骤:
步骤11:提取目标音频的音符序列。
步骤12:分别基于多个不同的主音参数,对音符序列进行取模计算,得到多个计算结果序列。
步骤13:利用各个计算结果序列分别与大小调序列比对,得到对应的匹配音符数。
步骤14:将最大匹配音符数对应的大小调序列和主音参数均对应的调式确定为原调信息。
其中,音符序列是指目标音频中各个音频帧对应的音,其可以用note_array表示,序列中的每个值,即note_array[i]均为整数。主音参数,是指用于表示目标音频的主音的参数,由于主音有12种可能,因此共有12个主音参数,可以设置为0~11共12个整数。主音参数可以用shift表示。通过该取模计算,可以得到计算结果序列,通过选择不同的主音参数,得到的计算结果序列能够表示在以主音参数表示的音符为主音这一情况下的目标音频的调式。
具体的，取模计算为(note_array[i]+shift)%12的计算，其中%表示取模。经过取模计算，可以得到12个计算结果序列。大小调序列具体可以为大调序列或小调序列，大调序列即为(0 2 4 5 7 9 11 12)，小调序列为(0 2 3 5 7 8 10 12)。若计算结果序列中的参数全部落入大调序列中，且主音参数为0，则说明目标音频的调式为C大调。当然，实际中通常不会出现计算结果序列中所有的参数均恰好落入大调序列或小调序列中的情况。在这种情况下，可以统计计算结果序列中落入大调序列中的音符数和落入小调序列中的音符数，即利用各个计算结果序列分别与大小调序列进行比对，得到对应的匹配音符数。
具体的,若计算结果序列为(…0 5 7…),由于0 5 7这三个参数既落入大调序列也落入小调序列,即与大调序列和小调序列均匹配,因此可以为大调序列对应的匹配音符数和小调序列对应的匹配音符数各加3。若计算结果序列为(…4 9 11),则其仅落入大调序列,因此可以为大调序列对应的匹配音符数加3。可以理解的是,由于共有12个对应于不同主音参数的计算结果序列,每个计算结果序列具有2个分别对应于大调序列和小调序列的匹配音符数,因此共有24个匹配音符数,分别对应于24种调式。在得到24个匹配音符数后,从中选择最大值,即选择最大匹配音符数,并根据其对应的大小调序列和主音参数确定对应的调式。
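下面给出步骤11至步骤14的一个简化示意代码，其中note_array为示例音符序列；主音参数shift与调式名称之间的换算按“shift为0且全部落入大调序列时对应C大调”的规则反推，属于本示例的假设：
# 取模匹配调式检测示意（note_array 为示例数据，实际应从目标音频中提取）
MAJOR = {0, 2, 4, 5, 7, 9, 11, 12}
MINOR = {0, 2, 3, 5, 7, 8, 10, 12}
TONICS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def detect_key(note_array):
    best_hits, best_key = -1, None
    for shift in range(12):
        shifted = [(note + shift) % 12 for note in note_array]
        major_hits = sum(1 for n in shifted if n in MAJOR)
        minor_hits = sum(1 for n in shifted if n in MINOR)
        # 假设：shift 把主音平移到 0 的位置，因此主音的音级为 (12 - shift) % 12
        tonic = TONICS[(12 - shift) % 12]
        if major_hits > best_hits:
            best_hits, best_key = major_hits, tonic + "大调"
        if minor_hits > best_hits:
            best_hits, best_key = minor_hits, tonic + "小调"
    return best_key

print(detect_key([0, 2, 4, 5, 7, 9, 11, 0, 4, 7]))  # 示例音符全部落入大调序列，输出 C大调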
进一步,在一种实施方式中,对于节奏检测的过程,为了提高拍子数的准确性,对目标音频进行节奏检测,得到拍子数的过程具体可以包括如下步骤:
步骤21:计算目标音频中各个音频帧的能量值。
步骤22:对目标音频划分为若干个区间,并利用能量值计算所处区间的平均能量值。
步骤23:若能量值大于能量值阈值,则确定检测到一个节拍。
步骤24:统计每分钟内的节拍数,得到拍子数。
其中,能量值阈值由平均能量值和区间的权重值相乘得到,权重值基于各个区间内的能量值的方差得到。音频的采样率较高,有时可以达到44100Hz,在划分音频帧时,通常以每帧1024个采样点进行划分,因此在44100Hz采样率的前提下,一秒的目标音频可以划分为43个音频帧。在计算音频帧对应的能量值时,可以按照:
E_j=∑_i input(i)^2
其中，E_j为序号为j的音频帧的能量值，input(i)为采样点的采样值，i为当前音频帧内各个采样点的序号。
由于拍子数为BPM，因此需要统计每分钟内的节拍数。本实施例中，将目标音频划分为若干个区间，具体可以为平均划分或非平均划分，以便确定各个区间内的平均能量值，平均能量值用于确定该区间内的能量值阈值，能量值阈值则用于判断某一音频帧中是否记录有节拍。通常情况下，区间可以为平均划分，每个区间的长度为1秒。则平均能量值为：
avg(E)=(1/N)·∑_j E_j（N为该区间内音频帧的数量）
avg(E)即为平均能量值。在得到平均能量值之后,利用其与权重值得到能量值阈值。具体的,权重值为:
C=-0.0000015·var(E)+1.5142857
var(E)=(1/N)·∑_j (E_j-avg(E))^2
其中，C为权重值，var(E)为区间内的能量值的方差，能量值阈值为C*avg(E)。若该区间内某一个音频帧的能量值大于能量值阈值，则说明该音频帧记录了一个节拍，即beat。通过统计每分钟的节拍数，即可得到拍子数。具体的，可以分别统计各个区间对应的每分钟节拍数，得到多个候选拍子数，并将出现次数最多的候选拍子数确定为拍子数。或者，可以计算整个目标音频的节拍数，并利用该节拍数和目标音频的长度计算得到拍子数。
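下面给出步骤21至步骤24的一个简化示意代码，帧长与区间长度沿用正文中的示例取值，权重值公式沿用上式；其中将每个能量超过阈值的音频帧都计为一个节拍属于简化处理，各系数在实际数据上可能需要调整：
# 能量法节奏检测示意（44100Hz 采样率、每帧 1024 个采样点、区间长度 1 秒均为正文示例取值）
import numpy as np

def detect_bpm_by_energy(samples, sr=44100, frame_size=1024):
    # 按每帧 1024 个采样点划分音频帧，并计算各帧能量 E_j = ∑ input(i)^2
    samples = np.asarray(samples, dtype=float)
    n_frames = len(samples) // frame_size
    energies = np.array([
        np.sum(samples[j * frame_size:(j + 1) * frame_size] ** 2.0)
        for j in range(n_frames)
    ])

    frames_per_second = sr // frame_size          # 约 43 帧/秒
    window = frames_per_second                    # 每个区间取 1 秒
    beats = 0
    for start in range(0, n_frames - window, window):
        local = energies[start:start + window]
        avg_e = np.mean(local)
        c = -0.0000015 * np.var(local) + 1.5142857    # 正文给出的权重值公式
        beats += int(np.sum(local > c * avg_e))        # 能量大于阈值即记为一个节拍（简化处理）

    duration_minutes = n_frames / frames_per_second / 60.0
    return beats / duration_minutes if duration_minutes > 0 else 0.0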
在另一种实施方式中,还可以利用深度学习的方式进行节奏检测。对目标音频进行节奏检测,得到拍子数,包括:
步骤31:生成目标音频对应的对数幅度谱。
步骤32:将对数幅度谱输入训练好的神经网络,得到目标音频中每个音频帧为节拍的概率值。
步骤33:对概率值组成的概率值序列进行自相关计算,得到若干个自相关参数。
步骤34:将处于预设范围内的最大自相关参数确定为拍子数。
对数幅度谱是频谱图中的一种,其中各谱线的振幅都对原振幅A作了对数计算,所以其纵坐标的单位是dB(分贝)。这个变换的目的是使那些振幅较低的成分相对高振幅成分得以拉高,以便观察掩盖在低幅噪声中的周期信号。训练好的神经网络,其用于对目标音频中各个音频帧是否记录了节拍进行预测,将对数幅度谱输入神经网络后,神经网络输出每个音频帧记录了节拍的概率值,并对概率值组成的概率值序列进行自相关计算。在自相关计算后,通常会得到超过一个自相关参数。由于音频的BPM通常处于一个固定的区间,即预设范围,因此可以在预设范围内确定拍子数。具体的,将处于预设范围内的最大自相关参数确定为拍子数。
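下面给出步骤33与步骤34的一个简化示意代码，其中帧率frames_per_second与BPM预设范围均为示例假设；将自相关延迟换算为BPM后再在预设范围内取自相关值最大的候选，是本示例对“将处于预设范围内的最大自相关参数确定为拍子数”的一种理解：
# 对节拍概率序列做自相关并在预设范围内确定 BPM 的示意
import numpy as np

def bpm_from_beat_probabilities(beat_probs, frames_per_second=100, bpm_range=(60, 200)):
    # beat_probs 为神经网络输出的每个音频帧是节拍的概率序列
    probs = np.asarray(beat_probs, dtype=float)
    probs = probs - np.mean(probs)
    autocorr = np.correlate(probs, probs, mode="full")[len(probs) - 1:]  # 只保留非负延迟部分

    best_bpm, best_value = None, -np.inf
    for lag in range(1, len(autocorr)):
        bpm = 60.0 * frames_per_second / lag      # 将延迟（帧数）换算为每分钟节拍数
        if bpm_range[0] <= bpm <= bpm_range[1] and autocorr[lag] > best_value:
            best_bpm, best_value = bpm, autocorr[lag]
    return best_bpm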
进一步,在一种实施方式中,目标曲谱为吉他曲谱,为了提高绘制目标曲谱的速度,可以预存有多个候选指法图像,通过选择已有的图像并进行拼接的方式生成目标曲谱。具体的,利用和弦信息、原调信息、拍子数和音频拍号进行曲谱绘制,得到目标曲谱的过程可以包括如下步骤:
步骤41:利用和弦信息确定指法图像。
步骤42:基于和弦信息对指法图像进行拼接,得到第二曲谱。
步骤43:利用原调信息、拍子数和音频拍号标记第二曲谱,得到目标曲谱。
其中,候选指法图像,是指用于反映弹奏吉他时手指控制琴弦的方式的图像,而指法图像是指与和弦信息相对应的候选指法图像。可以理解的是,不同的和弦需要采用不同的指法来控制琴弦才能弹出。因此当和弦信息确定后,其对应的弹奏方式必然已经确定,因此可以利用其确定指法图像。需要说明的是,通常情况下,不同调式下的相同和弦的演奏方式不同,因此可以利用和弦信息和原调信息共同确定指法图像。由于和弦是变化的,而一个指法图像仅能对应一个音或数量较少的几个音,因此利用和弦信息确定的指法图像的数量必然是多个。
在得到指法图像后，通过对各个指法图像进行拼接，即可得到第二曲谱，第二曲谱为由指法图像拼接得到的曲谱。需要说明的是，指法图像包括按弦的图像和控弦的图像，其中，控弦包括拨弦和扫弦等弹奏方式。请参考图5，图5为本申请实施例提供的一种具体的第二曲谱。在得到第二曲谱后，利用原调信息、拍子数和音频拍号对第二曲谱进行标记，即可得到目标曲谱。请参考图6，图6为本申请实施例提供的一种具体的目标曲谱。其中，原调为C调，拍子数为60，拍号为4/4。
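作为步骤41至步骤43中图像拼接部分的一个示意草图，以下代码假设候选指法图像以和弦名称命名、预存在fingerings目录下，图像尺寸与每行和弦数量均为示例值：
# 基于 PIL 的指法图像拼接示意（文件组织方式与尺寸均为假设）
from PIL import Image

def stitch_fingering_images(chord_sequence, image_dir="fingerings", cell_size=(200, 260), per_row=4):
    # 按和弦顺序将预存的指法图像拼接成第二曲谱，每行 per_row 个和弦
    rows = (len(chord_sequence) + per_row - 1) // per_row
    canvas = Image.new("RGB", (cell_size[0] * per_row, cell_size[1] * rows), "white")
    for idx, chord in enumerate(chord_sequence):
        img = Image.open(f"{image_dir}/{chord}.png").resize(cell_size)
        canvas.paste(img, ((idx % per_row) * cell_size[0], (idx // per_row) * cell_size[1]))
    return canvas

# 用法示意：stitch_fingering_images(["C", "G", "Am", "F"]).save("second_score.png")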
在一种实施方式中，目标音频可以为具有歌词的音频，在这种情况下，可以在目标曲谱中设置与歌词对应的标记。例如，可以看出，图6中的目标曲谱还包括歌词。具体的，利用和弦信息、原调信息、拍子数和音频拍号进行曲谱绘制，得到目标曲谱的过程可以包括如下步骤：
步骤51:确定目标歌词中各个字在目标音频中的位置信息。
步骤52:利用各个字的持续时长确定对应的音符类型。
步骤53:利用和弦信息、原调信息、拍子数和音频拍号生成第一曲谱,并基于位置信息和音符类型,利用目标歌词标识第一曲谱,得到目标曲谱。
该实施方式中，利用和弦信息、原调信息、拍子数和音频拍号生成的是第一曲谱，其具体可以采用步骤41-步骤43中的方式得到。目标歌词为目标音频对应的歌词，在获取目标歌词后，需要确定其中每个字在目标音频中的位置信息。在一种实施方式中，位置信息包括时间戳，如歌曲《春风十里》对应的逐字歌词信息：
<LyricLine LineStartTime="23036" LineDuration="5520">
  <LyricWord word="我" startTime="23036" duration="216"/>
  <LyricWord word="在" startTime="23252" duration="553"/>
  <LyricWord word="二" startTime="23805" duration="240"/>
  <LyricWord word="环" startTime="24045" duration="279"/>
  <LyricWord word="路" startTime="24324" duration="552"/>
  <LyricWord word="的" startTime="24876" duration="281"/>
  <LyricWord word="里" startTime="25157" duration="199"/>
  <LyricWord word="边" startTime="25356" duration="872"/>
  <LyricWord word="想" startTime="26228" duration="952"/>
  <LyricWord word="着" startTime="27180" duration="320"/>
  <LyricWord word="你" startTime="27500" duration="1056"/>
</LyricLine>
其中,startTime为时间戳,duration为持续时间。
在另一种实施方式中，位置信息可以为各个字所在的小节以及在该小节中的哪一拍上。在这种情况下，需要利用上述的时间戳计算位置信息：
所在小节=字的开始时间(即时间戳)/一小节的持续时间
小节中的位置=(字的开始时间-所在小节*小节的持续时间)/(60/BPM)。
在得到位置信息后，则可利用其确定目标歌词中各个字在第一曲谱中的位置。由于每个字的持续时长不同，其对应的音符类型也不同，例如为16分音符、8分音符或4分音符。为了将每个字的演唱方式在目标曲谱中标明，需要根据持续时长确定对应的音符类型。在确定音符类型和位置信息后，将其作为基准，利用目标歌词标识第一曲谱，得到目标曲谱。
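下面给出按照上述两个公式计算字的位置信息，并按持续时长粗略确定音符类型的示意代码，其中BPM、每小节拍数与音符类型的划分阈值均为示例假设，时间单位与上文逐字歌词示例一致（毫秒）：
# 根据逐字歌词时间戳计算所在小节、拍位置与音符类型的示意
def locate_word(start_time_ms, bpm, beats_per_bar):
    beat_ms = 60000.0 / bpm                      # 一拍的持续时间
    bar_ms = beat_ms * beats_per_bar             # 一小节的持续时间
    bar_index = int(start_time_ms // bar_ms)     # 所在小节 = 字的开始时间 / 一小节的持续时间
    beat_in_bar = (start_time_ms - bar_index * bar_ms) / beat_ms  # 小节中的位置
    return bar_index, beat_in_bar

def note_type(duration_ms, bpm):
    # 按持续时长与一拍时长的比例粗略确定音符类型（划分阈值为假设值）
    ratio = duration_ms / (60000.0 / bpm)
    if ratio >= 1.0:
        return "四分音符"
    if ratio >= 0.5:
        return "八分音符"
    return "十六分音符"

# 以“我”（startTime=23036，duration=216）为例，假设 BPM 为 60、拍号为 4/4
print(locate_word(start_time_ms=23036, bpm=60, beats_per_bar=4))  # 输出 (5, 3.036)
print(note_type(duration_ms=216, bpm=60))                          # 输出 十六分音符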
进一步的，由于演奏者通常并非能够演奏任意调式、演奏方式、演奏速度的乐曲，因此在生成目标曲谱时，可以根据需要对原曲的某些信息进行修改，使得生成的目标曲谱能够符合用户自身的需要。因此，利用和弦信息、原调信息、拍子数和音频拍号进行曲谱绘制，得到目标曲谱，包括：
步骤61:根据得到的曲谱调整信息,对目标信息进行调整,得到调整后信息。
步骤62:利用未经调整的非目标信息和调整后信息生成目标曲谱。
其中，目标信息为原调信息、和弦信息、曲谱绘制规则、拍子数中的至少一项，非目标信息即为选中被调整的目标信息以外的其他信息。曲谱调整信息，用于对指定的目标信息进行调整。拍子数可以直接决定音频的演奏速度，调整拍子数可以使演奏速度比目标音频更快或更慢。调式的变化也可以称为转调，受用户能够掌握的吉他调式范围的限制，例如初学者通常只会弹奏C调，因此可以将原曲调式转换成用户选择的调式，即调整原调信息，例如将G调调整为C调。需要说明的是，根据乐理知识，调式的调整通常会引起和弦的调整，即需要将原曲谱上各拍对应的和弦转换成选择调式对应的和弦。例如将调式由G调调整为C调时，需要将G调的一级和弦G，转换成C调对应的一级和弦C。当然，也可以根据需要单独调整和弦。
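作为转调时和弦转换的一个简化示意，以下代码仅考虑根音的平移并保留和弦后缀（如m、7等），未处理等音记法等细节，属于假设性实现：
# 将某一调式下的和弦平移到目标调式的示意
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose_chord(chord, from_key, to_key):
    # 例如将 G 调中的一级和弦 G 转换为 C 调中的一级和弦 C
    shift = (PITCH_CLASSES.index(to_key) - PITCH_CLASSES.index(from_key)) % 12
    root = chord[:2] if len(chord) > 1 and chord[1] == "#" else chord[:1]
    suffix = chord[len(root):]                   # 保留 m、7 等和弦后缀
    return PITCH_CLASSES[(PITCH_CLASSES.index(root) + shift) % 12] + suffix

print(transpose_chord("G", from_key="G", to_key="C"))    # 输出 C
print(transpose_chord("Em", from_key="G", to_key="C"))   # 输出 Am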
曲谱绘制规则的调整,可以修改曲谱的演奏风格等信息。在一种具体的实施方式中,若采用指法图像拼接的方式生成目标曲谱,则曲谱绘制规则具体为和弦与指法(以及对应的指法图像)之间的对应关系。而在吉他演奏中,可以采用拨弦或扫弦的方式进行演奏,因此相同的和弦对应的指法图像可以为分解和弦图像或节奏型图像。根据乐理知识,不同的拍号,对应着一系列不同的分解和弦和节奏型。请参考图7,图7为本申请实施例提供的指法图像,其中记录了常见的4/4拍对应的若干个节奏型。
进一步,为了避免计算资源的浪费,在生成曲谱后可以对其进行存储,以便重复利用。具体的,可以包括如下步骤:
步骤71:建立目标音频与目标曲谱之间的音频曲谱对应关系,并存储目标曲谱和音频曲谱对应关系。
步骤72:若检测到曲谱输出请求,则利用各个音频曲谱对应关系判断是否存在曲谱输出请求对应的请求曲谱。
步骤73:若存在请求曲谱,则输出请求曲谱。
其中，若不存在曲谱输出请求对应的请求曲谱，则利用本申请提供的曲谱生成方法生成该请求曲谱，并将其输出。曲谱保存的具体形式不做限定，例如在一种实施方式中，可以对每一拍的和弦、对应歌词以及该歌词的音符类型等数据进行记录，并保存。记录内容可以如下所示：
<BeatInfo chord="G" segment="9" beat="1">
  <LyricInfo>
    <LyricWord word="带" startPos="2" note="16"/>
    <LyricWord word="出" startPos="3" note="16"/>
    <LyricWord word="温" startPos="4" note="16"/>
  </LyricInfo>
</BeatInfo>
意思是：第9小节第1拍对应的和弦是G和弦，对应歌词有三个字，第一个字“带”是一个16分音符，在六线谱下方歌词区域对应的位置是第二位，以此类推。
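步骤71至步骤73的查询与复用逻辑可以用如下示意代码表示，其中以内存字典模拟音频曲谱对应关系的存储，generate_score为假设的曲谱生成函数：
# 曲谱缓存与复用示意（存储形式与函数名均为假设）
score_cache = {}   # 音频标识 -> 目标曲谱 的音频曲谱对应关系

def get_score(audio_id, generate_score):
    # 若已存在请求曲谱则直接输出，否则调用曲谱生成方法生成并缓存
    if audio_id in score_cache:
        return score_cache[audio_id]
    score = generate_score(audio_id)
    score_cache[audio_id] = score
    return score

# 用法示意：get_score("song_001", generate_score=lambda aid: f"{aid} 的目标曲谱")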
此外,还可以在生成曲谱或输出曲谱后,引导用户进行演奏,具体的,还可以包括如下步骤:
步骤81:根据目标曲谱中的目标拍子数确定节拍音频。
步骤82:在检测到开始信号后,播放节拍音频,并统计演奏时长。
步骤83:按照目标拍子数和演奏时长,确定目标曲谱中的目标部分,对目标部分进行提醒标注。
节拍音频,是指具有规律的进行节拍提醒的音频,不同的节拍音频中相邻的两个节拍音的时间间隔不同。目标拍子数可以为未经调整的拍子数,或者可以为经过调整的拍子数。利用目标拍子数,可以确定相邻的两个节拍音的时间间隔大小,进而确定节拍音频。具体的,相邻的两个节拍音之间的时间间隔为(60/目标拍子数)秒。
在检测到开始信号后，说明用户开始演奏，为了对用户进行演奏节奏的提醒，播放节拍音频，同时统计本次的演奏时长。演奏时长，是指开始演奏目标曲谱后经过的时长，根据目标拍子数和演奏时长，可以确定当前演奏的目标曲谱的部分，即目标部分。为了能够提醒用户当前应当演奏的位置，可以对目标部分进行提醒标注。提醒标注的具体方式不做限定，例如可以为着色标注。进一步的，用户可以选择每次演奏时演奏目标曲谱的全部内容，或者可以演奏其中的部分内容，因此目标部分可以为目标曲谱中的任意一个部分，或者可以为目标曲谱中某一范围内的部分，具体可以由用户指定。
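步骤81至步骤83中节拍音频的时间间隔与目标部分的定位，可以用如下示意代码说明，其中目标拍子数与演奏时长均为示例值：
# 根据目标拍子数生成节拍时间点，并按演奏时长定位当前应演奏的拍
def click_times(target_bpm, total_beats):
    # 相邻两个节拍音之间的时间间隔为 60/目标拍子数 秒
    interval = 60.0 / target_bpm
    return [i * interval for i in range(total_beats)]

def current_beat(target_bpm, elapsed_seconds):
    # 按目标拍子数和演奏时长确定当前演奏到的拍序号，可用于对目标部分做着色等提醒标注
    return int(elapsed_seconds / (60.0 / target_bpm))

print(click_times(60, 4))        # 输出 [0.0, 1.0, 2.0, 3.0]
print(current_beat(60, 12.5))    # 演奏 12.5 秒时位于第 12 拍（从 0 开始计数）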
下面对本申请实施例提供的计算机可读存储介质进行介绍,下文描述的计算机可读存储介质与上文描述的曲谱生成方法可相互对应参照。
本申请还提供一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,计算机程序被处理器执行时实现上述的曲谱生成方法的步骤。
该计算机可读存储介质可以包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
本领域技术人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件的方式来执行,取决于技术方案的特定应用和设计约束条件。本领域技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应该认为超出本申请的范围。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块，或者二者的结合来实施。软件模块可以置于随机存储器（RAM）、内存、只读存储器（ROM）、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
最后，还需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或者操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语包括、包含或者其他任何变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者设备所固有的要素。
本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (11)

  1. 一种曲谱生成方法,其特征在于,包括:
    获取目标音频;
    生成所述目标音频与各个音级对应的色度图谱,并利用所述色度图谱识别所述目标音频的和弦,得到和弦信息;
    对所述目标音频进行调式检测,得到原调信息;
    对所述目标音频进行节奏检测,得到拍子数;
    对所述目标音频各个音频帧的节拍类型进行识别,并基于节拍类型与拍号对应关系确定音频拍号;
    利用所述和弦信息、所述原调信息、所述拍子数和所述音频拍号进行曲谱绘制,得到目标曲谱。
  2. 根据权利要求1所述的曲谱生成方法,其特征在于,所述利用所述和弦信息、所述原调信息、所述拍子数和所述音频拍号进行曲谱绘制,得到目标曲谱,包括:
    确定目标歌词中各个字在所述目标音频中的位置信息;所述目标歌词为所述目标音频对应的歌词;
    利用各个所述字的持续时长确定对应的音符类型;
    利用所述和弦信息、所述原调信息、所述拍子数和所述音频拍号生成第一曲谱,并基于所述位置信息和所述音符类型,利用所述目标歌词标识所述第一曲谱,得到所述目标曲谱。
  3. 根据权利要求1所述的曲谱生成方法,其特征在于,所述利用所述和弦信息、所述原调信息、所述拍子数和所述音频拍号进行曲谱绘制,得到目标曲谱,包括:
    利用所述和弦信息确定指法图像;
    基于所述和弦信息对所述指法图像进行拼接,得到第二曲谱;
    利用所述原调信息、所述拍子数和所述音频拍号标记所述第二曲谱,得到所述目标曲谱。
  4. 根据权利要求1所述的曲谱生成方法，其特征在于，所述利用所述和弦信息、所述原调信息、所述拍子数和所述音频拍号进行曲谱绘制，得到目标曲谱，包括：
    根据得到的曲谱调整信息,对目标信息进行调整,得到调整后信息;其中,所述目标信息为所述原调信息、所述和弦信息、曲谱绘制规则、所述拍子数中的至少一项;
    利用未经调整的非目标信息和所述调整后信息生成所述目标曲谱。
  5. 根据权利要求1所述的曲谱生成方法,其特征在于,所述对所述目标音频进行调式检测,得到原调信息,包括:
    提取所述目标音频的音符序列;
    分别基于多个不同的主音参数,对所述音符序列进行取模计算,得到多个计算结果序列;
    利用各个所述计算结果序列分别与大小调序列比对,得到对应的匹配音符数;
    将最大匹配音符数对应的所述大小调序列和所述主音参数均对应的调式确定为所述原调信息。
  6. 根据权利要求1所述的曲谱生成方法,其特征在于,所述对所述目标音频进行节奏检测,得到拍子数,包括:
    计算目标音频中各个音频帧的能量值;
    对所述目标音频划分为若干个区间,并利用所述能量值计算所处区间的平均能量值;
    若所述能量值大于能量值阈值,则确定检测到一个节拍;所述能量值阈值由平均能量值和所述区间的权重值相乘得到,所述权重值基于各个所述区间内的能量值的方差得到;
    统计每分钟内的节拍数,得到所述拍子数。
  7. 根据权利要求1所述的曲谱生成方法,其特征在于,所述对所述目标音频进行节奏检测,得到拍子数,包括:
    生成所述目标音频对应的对数幅度谱;
    将所述对数幅度谱输入训练好的神经网络,得到所述目标音频中每个音频帧为节拍的概率值;
    对所述概率值组成的概率值序列进行自相关计算，得到若干个自相关参数；
    将处于预设范围内的最大自相关参数确定为所述拍子数。
  8. 根据权利要求1所述的曲谱生成方法,其特征在于,还包括:
    建立所述目标音频与所述目标曲谱之间的音频曲谱对应关系,并存储所述目标曲谱和所述音频曲谱对应关系;
    若检测到曲谱输出请求,则利用各个所述音频曲谱对应关系判断是否存在所述曲谱输出请求对应的请求曲谱;
    若存在所述请求曲谱,则输出所述请求曲谱。
  9. 根据权利要求1所述的曲谱生成方法,其特征在于,还包括:
    根据所述目标曲谱中的目标拍子数确定节拍音频;
    在检测到开始信号后,播放所述节拍音频,并统计演奏时长;
    按照所述目标拍子数和所述演奏时长,确定所述目标曲谱中的目标部分,对所述目标部分进行提醒标注。
  10. 一种电子设备,其特征在于,包括存储器和处理器,其中:
    所述存储器,用于保存计算机程序;
    所述处理器,用于执行所述计算机程序,以实现如权利要求1至9任一项所述的曲谱生成方法。
  11. 一种计算机可读存储介质,其特征在于,用于保存计算机程序,其中,所述计算机程序被处理器执行时实现如权利要求1至9任一项所述的曲谱生成方法。
PCT/CN2022/094961 2021-09-16 2022-05-25 一种曲谱生成方法、电子设备及可读存储介质 WO2023040332A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111088919.7A CN113763913A (zh) 2021-09-16 2021-09-16 一种曲谱生成方法、电子设备及可读存储介质
CN202111088919.7 2021-09-16

Publications (1)

Publication Number Publication Date
WO2023040332A1 true WO2023040332A1 (zh) 2023-03-23

Family

ID=78796104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/094961 WO2023040332A1 (zh) 2021-09-16 2022-05-25 一种曲谱生成方法、电子设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN113763913A (zh)
WO (1) WO2023040332A1 (zh)
